Re: [Qemu-devel] pvpanic plans?
On 08/27/2013 11:06 AM, Richard W.M. Jones wrote:
> On Thu, Aug 22, 2013 at 03:09:06PM -0500, Anthony Liguori wrote:
>> Paolo Bonzini pbonz...@redhat.com writes:
>>> Also, a virtio watchdog device makes little sense, IMHO. PV makes sense if emulation has insufficient performance, excessive CPU usage, or excessive complexity. We already have both an ISA and a PCI watchdog, and they serve their purpose wonderfully.
>> Neither of which actually work with modern versions of Windows FWIW.
> Correct, although someone could write a driver!
>> Plus emulated watchdogs do not take into account steal time or overcommit in general. I've seen multiple cases where a naive watchdog has a problem in the field when the system is under heavy load.
> The watchdog devices in qemu run on guest time. However the watchdog *daemon* inside the guest probably does behave badly as you describe. Changing the device model isn't going to help this, but it would definitely make sense to fix the daemon (although I don't know how -- is steal time exposed to guests?)
>
> I don't necessarily think a virtio-watchdog is a bad idea. For one thing it'd mean we would have a watchdog device that works on ARM.
>
> Rich.

I believe that a watchdog is not the way to go. You need host-side decision making.
Say that the guest did not receive CPU/disk/network resources for a lengthy period of time, but the host knows that this is due to host resource availability. In such cases, you certainly do not want to reboot all the guests, especially since rebooting 50 Windows VMs could be a nightmare.
BTW, Windows guests disable some of their watchdogs when they detect the presence of Hyper-V; we use this to overcome BSODs!
So the right solution is to send a heartbeat to a management application (using qemu-ga or whatever), and let it decide how to handle it.

Ronen.
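To make the host-side alternative concrete, here is a minimal C sketch of the decision a management application could take when a guest's heartbeat is late. Everything in it (struct guest_health, decide(), and the idea that the host tracks how long it starved the guest) is hypothetical and is not part of QEMU, qemu-ga or libvirt; it only illustrates why the host, unlike an in-guest watchdog, can forgive silence it caused itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-guest bookkeeping kept by a management application. */
struct guest_health {
    uint64_t last_heartbeat_ms; /* when the guest agent last reported in */
    uint64_t host_starved_ms;   /* time since then that the host could not run the guest */
};

enum guest_action { GUEST_OK, GUEST_WAIT, GUEST_RESET };

/*
 * Decide what to do when a heartbeat may be late. The host knows how much
 * of the silence it caused itself, so that time is forgiven before a reset
 * is even considered.
 */
enum guest_action decide(const struct guest_health *g, uint64_t now_ms,
                         uint64_t timeout_ms)
{
    assert(now_ms >= g->last_heartbeat_ms);
    uint64_t silent = now_ms - g->last_heartbeat_ms;
    if (silent <= timeout_ms)
        return GUEST_OK;
    /* Only count the silence the guest itself is responsible for. */
    uint64_t guest_fault = silent > g->host_starved_ms
                               ? silent - g->host_starved_ms
                               : 0;
    return guest_fault > timeout_ms ? GUEST_RESET : GUEST_WAIT;
}
```

GUEST_WAIT is the case a naive watchdog cannot express: the guest is late, but only because the host was overloaded, so what happens next (keep waiting, alert an admin, reset) stays a policy decision for the management layer.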
Re: [Qemu-devel] [PATCH for-1.6 V2 0/2] pvpanic: Separate pvpanic from machine type
How about adding a flag that tells QEMU whether to pause or reboot the guest after the panic?
We cannot assume that we always have a management layer that takes care of this. One example is Microsoft's WHQL, which deliberately generates a BSOD and then examines the dump files.

Ronen.

On 08/11/2013 06:10 PM, Marcel Apfelbaum wrote:
> Creating the pvpanic device as part of the machine type has the potential to trigger guest OS, guest firmware and driver bugs. The potential of such was originally viewed as minimal. However, since releasing 1.5 with pvpanic as part of the builtin machine type, several issues were observed in the field:
> - Some Windows versions triggered 'New Hardware Wizard' and an unidentified device appeared in Device Manager.
> - Issue reported off list: on Linux >= 3.10 the pvpanic driver breaks the reset on crash option: VM stops instead of being reset.
>
> pvpanic device also changes monitor command behaviour in some cases, and such silent incompatible changes aren't expected by management tools:
> - Monitor command requires 'cont' before 'system_reset' in order to restart the VM after kernel panic/BSOD
>
> Note that libvirt is the main user and libvirt people indicated their preference for creating the device with -device pvpanic rather than a built-in one that can't be removed.
>
> These issues were raised at the last KVM call. The agreement reached there was that we were a bit too rash to make the device a builtin, and that for 1.6 we should drop the pvpanic device from the default machine type, and instead teach management tools to add it by default using -device pvpanic.
> It's not clear whether changing 1.5 behaviour at this point is a sane thing, so this patchset doesn't touch the 1.5 machine type.
>
> This patch series reworks the patchset from Hu Tao (don't create pvpanic device by default), addressing comments and modifying behaviour according to what was discussed on the call. Please review and consider for 1.6.
>
> A related discussion can be followed at http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00036.html.
>
> This is a continuation of patches sent by Hu Tao:
> http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00124.html
> http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00125.html
>
> Changes from v1 (by Hu Tao):
> - Keep pvpanic device enabled by default for 1.5 for backport compatibility
> - Addressed Andreas Färber review (removed bus type)
> - Small changes to make it possible to enable pvpanic both from the command line and from machine_init
> - Added pvpanic to MISC category
>
> Marcel Apfelbaum (2):
>   hw/misc: don't create pvpanic device by default
>   hw/misc: make pvpanic known to user
>
>  hw/i386/pc_piix.c |  9 -
>  hw/i386/pc_q35.c  |  7 ---
>  hw/misc/pvpanic.c | 25 ++---
>  3 files changed, 18 insertions(+), 23 deletions(-)
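The proposed flag could be as small as a two-valued policy. The sketch below is a hypothetical illustration only: there is no such option in the patches above, and the option values "pause" and "reset" as well as the names parse_panic_action() and reset_without_management() are invented here. With no value it keeps the behaviour the cover letter describes for pvpanic (the VM stops).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical action QEMU takes when the guest signals a panic. */
enum panic_action { PANIC_PAUSE, PANIC_RESET, PANIC_INVALID };

/*
 * Parse a hypothetical option value. With no value we keep the pvpanic
 * behaviour described in the cover letter: the VM stops.
 */
enum panic_action parse_panic_action(const char *value)
{
    if (value == NULL || strcmp(value, "pause") == 0)
        return PANIC_PAUSE;
    if (strcmp(value, "reset") == 0)
        return PANIC_RESET;
    return PANIC_INVALID;
}

/* True if the guest should be rebooted without waiting for a management layer. */
bool reset_without_management(enum panic_action action)
{
    assert(action != PANIC_INVALID);
    return action == PANIC_RESET;
}
```

A WHQL-style run with no management layer would select "reset", so the guest is rebooted after the crash instead of sitting paused until someone issues 'cont' and 'system_reset'.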
Re: [Qemu-devel] [RFC PATCH 1/2] qemu-help: Sort devices by logical functionality
On 07/18/2013 05:28 PM, Anthony Liguori wrote:
> Marcel Apfelbaum marce...@redhat.com writes:
>
>> Categorize devices that appear as output to -device ? command by logical functionality. Sort the devices by logical categories before showing them to user.
>>
>> Signed-off-by: Marcel Apfelbaum marce...@redhat.com
>> Reviewed-by: Kevin Wolf kw...@redhat.com
>> ---
>>  include/hw/qdev-core.h |  7 +++
>>  qdev-monitor.c         | 23 ++-
>>  2 files changed, 29 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
>> index 7fbffcb..4f7a9b8 100644
>> --- a/include/hw/qdev-core.h
>> +++ b/include/hw/qdev-core.h
>> @@ -17,6 +17,12 @@ enum {
>>  #define DEVICE_CLASS(klass) OBJECT_CLASS_CHECK(DeviceClass, (klass), TYPE_DEVICE)
>>  #define DEVICE_GET_CLASS(obj) OBJECT_GET_CLASS(DeviceClass, (obj), TYPE_DEVICE)
>> +#define DEVICE_CATEGORY_STORAGE "storage"
>> +#define DEVICE_CATEGORY_NETWORK "network"
>> +#define DEVICE_CATEGORY_INPUT "input"
>> +#define DEVICE_CATEGORY_DISPLAY "display"
>> +#define DEVICE_CATEGORY_SOUND "sound"
>> +
>
> Looks reasonable, but please make this a bitmap. There are cases, particularly if we start modeling multifunction PCI cards as a single device, where a single device can support multiple types of functionality.

I agree. On the one hand, there is no proper sorting by DEVICE_CATEGORY for devices with multiple functions, but OTOH that's the reality.
If you add get_all_devices_by_category(dev_category), it can be useful on its own, and also for looping over all the categories when printing. It will output those devices in multiple categories, which is what we want.

Thanks,
Ronen.

> Regards,
>
> Anthony Liguori
>
>>  typedef int (*qdev_initfn)(DeviceState *dev);
>>  typedef int (*qdev_event)(DeviceState *dev);
>>  typedef void (*qdev_resetfn)(DeviceState *dev);
>> @@ -81,6 +87,7 @@ typedef struct DeviceClass {
>>      /* public */
>>      const char *fw_name;
>> +    const char *category;
>>      const char *desc;
>>      Property *props;
>>      int no_user;
>> diff --git a/qdev-monitor.c b/qdev-monitor.c
>> index e54dbc2..1446b6e 100644
>> --- a/qdev-monitor.c
>> +++ b/qdev-monitor.c
>> @@ -93,6 +93,9 @@ static void qdev_print_devinfo(ObjectClass *klass, void *opaque)
>>      if (qdev_class_has_alias(dc)) {
>>          error_printf(", alias \"%s\"", qdev_class_get_alias(dc));
>>      }
>> +    if (dc->category) {
>> +        error_printf(", category \"%s\"", dc->category);
>> +    }
>>      if (dc->desc) {
>>          error_printf(", desc \"%s\"", dc->desc);
>>      }
>> @@ -139,16 +142,34 @@ static const char *find_typename_by_alias(const char *alias)
>>      return NULL;
>>  }
>>
>> +static gint qdev_device_compare(gconstpointer item1, gconstpointer item2)
>> +{
>> +    DeviceClass *dc1, *dc2;
>> +
>> +    dc1 = (DeviceClass *)object_class_dynamic_cast((ObjectClass *)item1,
>> +                                                   TYPE_DEVICE);
>> +    dc2 = (DeviceClass *)object_class_dynamic_cast((ObjectClass *)item2,
>> +                                                   TYPE_DEVICE);
>> +
>> +    return g_strcmp0(dc1->category, dc2->category);
>> +}
>> +
>>  int qdev_device_help(QemuOpts *opts)
>>  {
>>      const char *driver;
>>      Property *prop;
>>      ObjectClass *klass;
>> +    GSList *list;
>>
>>      driver = qemu_opt_get(opts, "driver");
>>      if (driver && is_help_option(driver)) {
>>          bool show_no_user = false;
>> -        object_class_foreach(qdev_print_devinfo, TYPE_DEVICE, false, &show_no_user);
>> +
>> +        list = object_class_get_list(TYPE_DEVICE, false);
>> +        list = g_slist_sort(list, qdev_device_compare);
>> +        g_slist_foreach(list, (GFunc)qdev_print_devinfo, &show_no_user);
>> +        g_slist_free(list);
>> +
>>          return 1;
>>      }
>> --
>> 1.8.3.1
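A minimal standalone sketch of the bitmap idea combined with the suggested per-category lookup. The CAT_* bits, struct dev_info and devices_by_category() are hypothetical stand-ins for DeviceClass and the proposed get_all_devices_by_category(); they are not QEMU code. A multifunction device simply sets several bits, so it is listed under each matching category.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical category bits; a device may set several of them. */
enum {
    CAT_STORAGE = 1u << 0,
    CAT_NETWORK = 1u << 1,
    CAT_INPUT   = 1u << 2,
    CAT_DISPLAY = 1u << 3,
    CAT_SOUND   = 1u << 4,
};

struct dev_info {
    const char *name;
    unsigned categories;   /* bitmap of CAT_* values */
};

/*
 * Collect every device that has the given category bit set (in the spirit
 * of get_all_devices_by_category()). Returns the number of matches written
 * to out, at most max.
 */
size_t devices_by_category(const struct dev_info *devs, size_t n,
                           unsigned category, const struct dev_info **out,
                           size_t max)
{
    assert(out != NULL || max == 0);
    size_t found = 0;
    for (size_t i = 0; i < n && found < max; i++)
        if (devs[i].categories & category)
            out[found++] = &devs[i];
    return found;
}

/* Print devices grouped by category; multifunction devices appear more than once. */
void print_by_category(const struct dev_info *devs, size_t n)
{
    static const char *names[] = { "storage", "network", "input", "display", "sound" };
    for (unsigned bit = 0; bit < sizeof(names) / sizeof(names[0]); bit++) {
        const struct dev_info *hits[64];
        size_t k = devices_by_category(devs, n, 1u << bit, hits, 64);
        for (size_t j = 0; j < k; j++)
            printf("%s: %s\n", names[bit], hits[j]->name);
    }
}
```

Looping over the categories this way replaces the single sort by category string, which cannot place one device in two groups.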
Re: [Qemu-devel] [Resend][Seabios PATCH] don't boot from un-selected devices
On 12/19/2012 11:32 AM, Gleb Natapov wrote:
> On Wed, Dec 19, 2012 at 03:24:45PM +0800, Amos Kong wrote:
>> Current seabios will try to boot from the selected devices first; if they all fail, seabios will also try to boot from un-selected devices.
>>
>> For example:
>> @ qemu-kvm -boot order=n,menu=on ...
>>
>> Guest will boot from network first; if that fails, guest will try to boot from the other un-selected devices (floppy, cdrom, disk) one by one.
>>
>> Sometimes, users don't want to boot from some devices. This patch changes
> And sometimes they do. The patch changes behaviour unconditionally. New behaviour should be user selectable. Something like -boot order=strict on the qemu command line.

Another option would be to add a terminator symbol, say T (I couldn't find a good terminator), so that order=ndT would mean strict "nd".

Ronen.

>> seabios to boot only from selected devices.
>> If the user chooses the first boot device from the menu, then seabios will try all the devices, even if some of them are not selected.
>>
>> Signed-off-by: Amos Kong ak...@redhat.com
>> ---
>> Resend for CCing seabios maillist.
>> ---
>>  src/boot.c | 13 -
>>  1 files changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/boot.c b/src/boot.c
>> index 3ca7960..ee810ac 100644
>> --- a/src/boot.c
>> +++ b/src/boot.c
>> @@ -424,6 +424,10 @@ interactive_bootmenu(void)
>>          maxmenu++;
>>          printf("%d. %s\n", maxmenu
>>                 , strtcpy(desc, pos->description, ARRAY_SIZE(desc)));
>> +        /* If user chooses first boot device from menu, we will treat
>> +           all the devices as selected. */
>> +        if (pos->priority == DEFAULT_PRIO)
>> +            pos->priority = DEFAULT_PRIO - 1;
>>          pos = pos->next;
>>      }
>>
>> @@ -490,7 +494,10 @@ boot_prep(void)
>>
>>      // Map drives and populate BEV list
>>      struct bootentry_s *pos = BootList;
>> -    while (pos) {
>> +
>> +    /* The priority of un-selected device is not changed,
>> +       we only boot from user selected devices. */
>> +    while (pos && pos->priority != DEFAULT_PRIO) {
>>          switch (pos->type) {
>>          case IPL_TYPE_BCV:
>>              call_bcv(pos->vector.seg, pos->vector.offset);
>> @@ -513,10 +520,6 @@ boot_prep(void)
>>          }
>>          pos = pos->next;
>>      }
>> -
>> -    // If nothing added a floppy/hd boot - add it manually.
>> -    add_bev(IPL_TYPE_FLOPPY, 0);
>> -    add_bev(IPL_TYPE_HARDDISK, 0);
>>  }
>> --
>> 1.7.1
>
> --
> Gleb.
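A sketch of how the suggested terminator could be handled. The trailing 'T' syntax is only the proposal in this thread (neither QEMU nor SeaBIOS accepts it), and parse_boot_order() and boot_allowed() are hypothetical helpers. Device letters follow QEMU's -boot convention (for example c for disk, d for CD-ROM, n for network).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Split a -boot order= value such as "ndT" into the device letters ("nd")
 * and a strict flag. The trailing 'T' terminator is the syntax proposed
 * in this thread; it is hypothetical.
 */
bool parse_boot_order(const char *order, char *devs, size_t devs_size)
{
    size_t len = strlen(order);
    bool strict = len > 0 && order[len - 1] == 'T';
    if (strict)
        len--;                      /* drop the terminator */
    assert(devs_size > len);        /* room for the letters plus NUL */
    memcpy(devs, order, len);
    devs[len] = '\0';
    return strict;
}

/* In strict mode only listed devices may boot; otherwise any device may. */
bool boot_allowed(char dev, const char *devs, bool strict)
{
    return !strict || (dev != '\0' && strchr(devs, dev) != NULL);
}
```

Without the terminator the behaviour stays as it is today, which addresses Gleb's point that the stricter behaviour should be user-selectable rather than unconditional.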
Re: [Qemu-devel] Better qemu/kvm defaults (was Re: [RFC PATCH 0/4] Gang scheduling in CFS)
On 01/01/2012 12:16 PM, Dor Laor wrote:
> On 12/29/2011 06:16 PM, Anthony Liguori wrote:
>> On 12/29/2011 10:07 AM, Dor Laor wrote:
>>> On 12/26/2011 11:05 AM, Avi Kivity wrote:
>>>> On 12/26/2011 05:14 AM, Nikunj A Dadhania wrote:
>>>>>> btw you can get an additional speedup by enabling x2apic, for default_send_IPI_mask_logical().
>>>>> In the host?
>>>> In the host, for the guest:
>>>>
>>>> qemu -cpu ...,+x2apic
>>>
>>> It seems to me that we should improve our default flags. So many times users fail to submit the proper huge command-line options that we require. Honestly, we can't blame them, there are so many flags and so many use cases it's just too hard to get it right for humans.

You might want to take migration considerations into account, i.e., the target host's optimal setup.
Also, we need to beware of too much automation, since hardware changes might void Windows license activations.
Some of the parameters will depend on dynamic factors such as the total guest's nCPUs, mem, sharing (KSM), or whatever.
As a minimum, we can automatically suggest the qemu parameters and the host setup.

Ronen.

>>> I propose a basic idea and folks are welcome to discuss it:
>>>
>>> 1. Improve qemu/kvm defaults
>>> Break the current backward compatibility (but add a --default-backward-compat-mode) and set better values for:
>>> - rtc slew time
>>
>> What do you specifically mean?
>
> -rtc localtime,driftfix=slew
>
>>> - cache=none
>>
>> I'm not sure I see this as a better default particularly since O_DIRECT fails on certain file systems. I think we really need to let WCE be toggable from the guest and then have a caching mode independent of WCE. We then need some heuristics to only enable cache=off when we know it's safe.
>
> cache=none is still faster when the FS supports it. qemu can test-run O_DIRECT and fall back to cache mode or just test the filesystem capabilities.
>
>>> - x2apic, maybe enhance qemu64 or move to -cpu host?
>>
>> Alex posted a patch for this. I'm planning on merging it although so far no one has chimed up either way.
>>
>>> - aio=native|threads (auto-sense?)
>>
>> aio=native is unsafe to default because linux-aio is just fubar. It falls back to synchronous I/O if the underlying filesystem doesn't support aio. There's no way in userspace to probe if it's actually supported or not either...
>
> Can we test-run this too? Maybe as a separate qemu mode or even a binary that, given a qemu cmdline, will try to suggest better parameters?
>
>>> - use virtio devices by default
>>
>> I don't think this is realistic since appropriately licensed signed virtio drivers do not exist for Windows. (Please note the phrase "appropriately licensed signed".)
>
> What's the percentage of qemu invocations w/ a windows guest and a short cmd line? My hunch is that a plain short cmdline indicates a developer, and they'll probably use a linux guest.
>
>>> - more?
>>>
>>> Different defaults may be picked automatically when TCG|KVM used.
>>>
>>> 2. External hardening configuration file kept in qemu.git
>>> For non qemu/kvm specific definitions like the io scheduler we should maintain a script in our tree that sets/senses the optimal settings of the host kernel (maybe a similar one for the guest).
>>
>> What are appropriate host settings and why aren't we suggesting that distros and/or upstream just set them by default?
>
> It's hard to set the right default for a distribution since the same distro should optimize for various usages of the same OS. For example, Fedora has tuned-adm w/ available profiles:
> - desktop-powersave
> - server-powersave
> - enterprise-storage
> - spindown-disk
> - laptop-battery-powersave
> - default
> - throughput-performance
> - latency-performance
> - laptop-ac-powersave
>
> We need to keep on recommending the best profile for virtualization; for Fedora I think it is either enterprise-storage or maybe throughput-performance.
> If we have such a script, it can call the matching tuned profile instead of tweaking every /sys option.
>
>> Regards,
>>
>> Anthony Liguori
>
> HTH, Dor
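The "test-run O_DIRECT and fall back" idea can be sketched with plain POSIX/Linux calls. probe_o_direct() is a hypothetical helper, not QEMU code; it relies on Linux open(2) reporting EINVAL when the file system does not support O_DIRECT, and it distinguishes that from other errors such as a missing file.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Probe whether the file at path can be opened with O_DIRECT.
 * Returns 1 if it can, 0 if the file system rejects O_DIRECT (EINVAL on
 * Linux), and -1 for any other error, such as a missing file.
 */
int probe_o_direct(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno == EINVAL ? 0 : -1;
}
```

A caller would pick cache=none on 1 and fall back to a cached mode on 0. A successful open is not a full guarantee, so a more careful probe would also issue one suitably aligned read before trusting the result.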
Re: [Qemu-devel] [PATCH 1/5] vfio: Introduce documentation for VFIO driver
On 12/21/2011 11:42 PM, Alex Williamson wrote:
> Including rationale for design, example usage and API description.
>
> Signed-off-by: Alex Williamson alex.william...@redhat.com
> ---
>
>  Documentation/vfio.txt |  352
>  1 files changed, 352 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/vfio.txt
>
> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> new file mode 100644
> index 000..09a5a5b
> --- /dev/null
> +++ b/Documentation/vfio.txt
> @@ -0,0 +1,352 @@
> +VFIO - Virtual Function I/O[1]
> +---
> +Many modern systems now provide DMA and interrupt remapping facilities
> +to help ensure I/O devices behave within the boundaries they've been
> +allotted. This includes x86 hardware with AMD-Vi and Intel VT-d,
> +POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC
> +systems such as Freescale PAMU. The VFIO driver is an IOMMU/device
> +agnostic framework for exposing direct device access to userspace, in
> +a secure, IOMMU protected environment. In other words, this allows
> +safe[2], non-privileged, userspace drivers.
> +
> +Why do we want that? Virtual machines often make use of direct device
> +access (device assignment) when configured for the highest possible
> +I/O performance. From a device and host perspective, this simply
> +turns the VM into a userspace driver, with the benefits of
> +significantly reduced latency, higher bandwidth, and direct use of
> +bare-metal device drivers[3].
> +
> +Some applications, particularly in the high performance computing
> +field, also benefit from low-overhead, direct device access from
> +userspace. Examples include network adapters (often non-TCP/IP based)
> +and compute accelerators. Prior to VFIO, these drivers had to either
> +go through the full development cycle to become a proper upstream
> +driver, be maintained out of tree, or make use of the UIO framework,
> +which has no notion of IOMMU protection, limited interrupt support,
> +and requires root privileges to access things like PCI configuration
> +space.
> +
> +The VFIO driver framework intends to unify these, replacing both the
> +KVM PCI specific device assignment code as well as providing a more
> +secure, more featureful userspace driver environment than UIO.
> +
> +Groups, Devices, and IOMMUs
> +---
> +
> +Userspace drivers are primarily concerned with manipulating individual
> +devices and setting up mappings in the IOMMU for those devices.
> +Unfortunately, the IOMMU doesn't always have the granularity to track
> +mappings for an individual device. Sometimes this is a topology
> +barrier, such as a PCIe-to-PCI bridge interposing the device and
> +IOMMU, other times this is an IOMMU limitation. In any case, the
> +reality is that devices are not always independent with respect to the
> +IOMMU. Translations set up for one device can be used by another
> +device in these scenarios.
> +
> +The IOMMU API exposes these relationships by identifying an IOMMU
> +group for these dependent devices. Devices on the same bus with the
> +same IOMMU group (or just "group" for this document) are not isolated
> +from each other with respect to DMA mappings. For userspace usage,
> +this logically means that instead of being able to grant ownership of
> +an individual device, we must grant ownership of a group, which may
> +contain one or more devices.
> +
> +These groups therefore become a fundamental component of VFIO and the
> +working unit we use for exposing devices and granting permissions to
> +userspace. In addition, VFIO makes efforts to ensure the integrity of
> +the group for user access. This includes ensuring that all devices
> +within the group are controlled by VFIO (vs native host drivers)
> +before allowing a user to access any member of the group or the IOMMU
> +mappings, as well as maintaining the group viability as devices are
> +dynamically added or removed from the system.
> +
> +To access a device through VFIO, a user must open a character device
> +for the group that the device belongs to and then issue an ioctl to
> +retrieve a file descriptor for the individual device. This ensures
> +that the user has permissions to the group (file based access to the
> +/dev entry) and allows a check point at which VFIO can deny access to
> +the device if the group is not viable (all devices within the group
> +controlled by VFIO). A file descriptor for the IOMMU is obtained in the
> +same fashion.
> +
> +VFIO defines a standard set of APIs for access to devices and a
> +modular interface for adding new, bus-specific VFIO device drivers.
> +We call these VFIO bus drivers. The vfio-pci module is an example
> +of a bus driver for exposing PCI devices. When the bus driver module
> +is loaded it enumerates all of the devices for its bus, registering
> +each device with the vfio core along with a set of callbacks. For
> +buses that support hotplug, the bus driver also adds itself to the
> +notification chain for such events.
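The viability rule described above ("all devices within the group controlled by VFIO") reduces to a simple check. The sketch below is a hypothetical model of that rule only; it is not the kernel's implementation, and group_viable() and the bound_to_vfio array are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of group viability: bound_to_vfio[i] says whether
 * device i of the group is controlled by VFIO rather than a native host
 * driver. Access to any member is allowed only if every member is VFIO's,
 * because devices in one group share DMA translations.
 */
bool group_viable(const bool *bound_to_vfio, size_t ndevs)
{
    assert(bound_to_vfio != NULL && ndevs > 0);
    for (size_t i = 0; i < ndevs; i++)
        if (!bound_to_vfio[i])
            return false;   /* one host-owned device makes the whole group unusable */
    return true;
}
```

Because the check covers the whole group, hotplugging a device into a group that is already in use must re-evaluate it, which is why the text stresses maintaining viability as devices come and go.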
Re: [Qemu-devel] [RFC] Migration convergence - a suggestion
On 12/20/2011 03:39 PM, Anthony Liguori wrote:
> On 12/20/2011 01:06 AM, Ronen Hod wrote:
>> Well, the issue is not new; anyhow, following a conversation with Orit ...
>>
>> Since we want the migration to finish, I believe that the migration speed parameter alone cannot do the job.
>> I suggest using two distinct parameters:
>> 1. Migration speed - will be used to limit the network resources utilization
>> 2. aggressionLevel - A number between 0.0 and 1.0, where low values imply minimal interruption to the guest, and 1.0 means that the guest will be completely stalled.
>>
>> In any case the migration will have to do its work and finish given any actual migration-speed, so even low aggressionLevel values will sometimes imply that the guest will be throttled substantially.
>>
>> The algorithm:
>> The aggressionLevel should determine the targetGuest%CPU (how much CPU time we want to allocate to the guest)
>
> QEMU has no way to limit the guest CPU time.

Wouldn't any yield (sleep / whatever) limit the guest's CPU time, be it in qemu or in KVM?
My intention is to suggest an algorithm that is based on guest throttling. Looking at the relevant BZs, I do not see how we can avoid it. I certainly make no claims regarding the architecture.
Avi and mst believe that it is better to continuously control the guest's CPU from the outside (libvirt) using cgroups. Although less responsive to changes, it should still work.

In the meantime, I also discovered that everybody has a different point of view regarding the requirements. Regardless, I believe that the same basic mechanics (once decided) can do the work.
Some relevant configuration requirements are:
1. Max bandwidth
2. Min CPU per guest
3. Max guest stall time
4. Max migration time
These requirements will often conflict, and may imply changes in behavior over time.
I would also suggest that the management GUI let the user select the aggression level (or whatever), and display the implications for all the other parameters (total time, %CPU) based on the current behavior of the guest and network.

Regards,
Ronen

> Regards,
>
> Anthony Liguori
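To show "the implications on the other parameters" in a GUI, a tool needs some estimate of total migration time for a given guest CPU share. The sketch below is one hypothetical way to get it, under the same assumption the proposal uses for minGuest%CPU: the dirty-page rate scales linearly with the guest's CPU share. The function name and parameters are invented for illustration.

```c
#include <assert.h>
#include <math.h>

/*
 * Estimate how long migration takes if the guest is throttled to
 * target_cpu (fraction 0.0..1.0), assuming the dirty-page rate scales
 * linearly with the guest's CPU share. Rates are in pages per second.
 * Returns INFINITY if migration would never converge.
 */
double estimate_migration_secs(double remaining_pages, double migration_rate,
                               double dirty_rate, double current_cpu,
                               double target_cpu)
{
    assert(current_cpu > 0.0 && target_cpu >= 0.0);
    double projected_dirty = dirty_rate * (target_cpu / current_cpu);
    double net = migration_rate - projected_dirty;   /* pages gained per second */
    return net > 0.0 ? remaining_pages / net : INFINITY;
}
```

Evaluating this for a range of target CPU shares gives the total-time versus %CPU trade-off that the GUI could display next to the aggression slider.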
[Qemu-devel] [RFC] Migration convergence - a suggestion
Well, the issue is not new; anyhow, following a conversation with Orit ...

Since we want the migration to finish, I believe that the migration speed parameter alone cannot do the job.
I suggest using two distinct parameters:
1. Migration speed - will be used to limit the network resources utilization
2. aggressionLevel - A number between 0.0 and 1.0, where low values imply minimal interruption to the guest, and 1.0 means that the guest will be completely stalled.

In any case the migration will have to do its work and finish given any actual migration-speed, so even low aggressionLevel values will sometimes imply that the guest will be throttled substantially.

The algorithm:
The aggressionLevel should determine the targetGuest%CPU (how much CPU time we want to allocate to the guest).
- With aggressionLevel = 1.0, the guest gets no CPU resources (stalled).
- With aggressionLevel = 0.0, the guest gets minGuest%CPU, such that migrationRate == dirtyPagesRate. This minGuest%CPU is continuously updated based on the running average of the recent samples (more below).
Note that the targetGuest%CPU allocation is continuously updated due to changes in guest behavior, network congestion, and the like.

Some more details:
- minGuest%CPU (i.e., for dirtyPagesRate == migrationRate) is easy to calculate as a running average of (migrationRate / dirtyPagesRate * guest%CPU).
- There are several methods to calculate the running average; my favorite is IIR, where, roughly speaking, newVal = 0.99 * oldVal + 0.01 * newSample.
- I would use two measures to ensure that there are more migrated pages than dirty pages:
  1. The running average (based on recent samples) of the migrated pages is larger than that of the new dirty pages.
  2. The total number of migrated pages so far is larger than the total number of new dirty pages.

And yes, many details are still missing.

Ronen.
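The pieces of the algorithm can be sketched directly in C. The IIR update and the minGuest%CPU sample follow the formulas in the message; CPU shares are expressed as fractions (0.0..1.0) rather than percentages. The proposal only fixes the end points of the aggressionLevel mapping (0.0 gives minGuest%CPU, 1.0 stalls the guest), so the linear interpolation in target_guest_cpu() is an assumption of this sketch, as are all the function names.

```c
#include <assert.h>
#include <stdbool.h>

/* IIR running average: newVal = 0.99 * oldVal + 0.01 * newSample. */
double iir_update(double old_val, double sample)
{
    return 0.99 * old_val + 0.01 * sample;
}

/*
 * One minGuest%CPU sample: the CPU share at which the guest would dirty
 * pages exactly as fast as migration transfers them, assuming the dirty
 * rate scales linearly with guest CPU.
 */
double min_guest_cpu_sample(double migration_rate, double dirty_rate,
                            double guest_cpu)
{
    assert(dirty_rate > 0.0);
    double s = migration_rate / dirty_rate * guest_cpu;
    return s > 1.0 ? 1.0 : s;   /* cannot grant more than the whole CPU */
}

/* aggressionLevel -> targetGuest%CPU; linear in between is an assumption. */
double target_guest_cpu(double aggression, double min_guest_cpu)
{
    assert(aggression >= 0.0 && aggression <= 1.0);
    return (1.0 - aggression) * min_guest_cpu;
}

/* Both convergence measures: recent running averages and cumulative totals. */
bool converging(double avg_migrated, double avg_dirtied,
                unsigned long total_migrated, unsigned long total_dirtied)
{
    return avg_migrated > avg_dirtied && total_migrated > total_dirtied;
}
```

Each sampling period would feed min_guest_cpu_sample() into iir_update(), then throttle the guest to target_guest_cpu() of the smoothed value; how the throttling itself is applied (in QEMU, in KVM, or via cgroups from libvirt) is left open, as in the thread.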
Re: [Qemu-devel] [PATCH] qemu_timedate_diff() shouldn't modify its argument.
On 11/06/2011 06:00 PM, Gleb Natapov wrote:
> The caller of qemu_timedate_diff() does not expect that tm it passes to the function will be modified, but mktime() is destructive and modifies its argument. Pass a copy of tm to it and set tm_isdst so that mktime() will not rely on it since its value may be outdated.

I believe that the original issue was not related to outdated data at the moment of the daylight saving time transition.
Using tmp.tm_isdst = -1 sounds good, but why use a copy of tm? The only significant field that will change in the tm is tm_isdst itself, which will be set (correctly) to 0 or 1.

Acked-by: Ronen Hod r...@redhat.com

> Signed-off-by: Gleb Natapov g...@redhat.com
> diff --git a/vl.c b/vl.c
> index 624da0f..641629b 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -460,8 +460,11 @@ int qemu_timedate_diff(struct tm *tm)
>      if (rtc_date_offset == -1)
>          if (rtc_utc)
>              seconds = mktimegm(tm);
> -        else
> -            seconds = mktime(tm);
> +        else {
> +            struct tm tmp = *tm;
> +            tmp.tm_isdst = -1; /* use timezone to figure it out */
> +            seconds = mktime(&tmp);
> +        }
>      else
>          seconds = mktimegm(tm) + rtc_date_offset;
>
> --
> Gleb.
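Standard mktime() normalizes out-of-range fields of its argument in place and fills in tm_wday, tm_yday and tm_isdst, so tm_isdst is not the only field that can change. The minimal sketch below (local_seconds() is a hypothetical helper modelled on the patch) shows the copy-based approach keeping the caller's struct intact even for an out-of-range date.

```c
#include <assert.h>
#include <time.h>

/*
 * Convert a broken-down local time to time_t without touching the
 * caller's struct. mktime() normalizes its argument in place (and fills
 * in tm_wday, tm_yday and tm_isdst), so it is handed a copy;
 * tm_isdst = -1 asks mktime() to work out DST from the time zone rules.
 */
time_t local_seconds(const struct tm *tm)
{
    assert(tm != NULL);
    struct tm tmp = *tm;
    tmp.tm_isdst = -1;
    return mktime(&tmp);
}
```

For example, a struct holding "January 32, 12:00" stays as-is after local_seconds(), whereas calling mktime() on it directly rewrites it to February 1.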
Re: [Qemu-devel] Question on kvm_clock working ...
On 09/09/2011 06:28 PM, al pat wrote:
> We are doing an experiment with kvm-clock to validate its effectiveness, particularly when running NTP on the host to make sure the host’s clock stays properly in sync.
> Our observations lead us to a few unanswered questions, including the possibility of a bug (or our misunderstanding of how kvm_clock should work).
> Our understanding is that kvm_clock will help sync the clock between the host and the guest. We do not observe this to happen in reality and thus this question.
> We are using Ubuntu 11.04 on the host and the guest.
> The command we issue to launch the VM is the following:
> $ sudo kvm -m 500 -rtc clock=host guestos.img
> We also arranged for Ubuntu to show the seconds on the clock displayed in the menu.
> Observation 1: Upon launching the VM, we see a time difference between the two clocks ranging from 1 to 2 seconds.
> Observation 2: If we change the date on the host (with a command such as: date --set "10:00:00 AM Sep 9, 2011"), the time on the guest remains the same, unaffected.
> Observation 3: After running for a while without NTP on the host, we run “ntpdate” to sync up the host, but the guest sticks with whatever previous time.

You probably meant ntpd -q.

> Another test we will run is to have ntpd on the host and wait for an extended time to see if the guest drifts away from that original 1 or 2 second lag.
> In the meantime, we are asking you for some input in this regard:
> Questions
> - What is the “-rtc clock” option supposed to mean exactly? According to the man page, the guest should get its time from the host, but neither date nor an “ntpdate” affected the clock on the guest.
> - What are the other options that we should use?

-rtc [base=utc|localtime|date][,clock=host|vm][,driftfix=none|slew]
    Specify base as utc or localtime to let the RTC start at the current UTC or local time, respectively. localtime is required for correct date in MS-DOS or Windows. To start at a specific point in time, provide date in the format 2006-06-17T16:01:21 or 2006-06-17. The default base is UTC.

    By default the RTC is driven by the host system time. This allows to use the RTC as accurate reference clock inside the guest, specifically if the host time is smoothly following an accurate external reference clock, e.g. via NTP. If you want to isolate the guest time from the host, even prevent it from progressing during suspension, you can set clock to vm instead.

    Enable driftfix (i386 targets only) if you experience time drift problems, specifically with Windows' ACPI HAL. This option will try to figure out how many timer interrupts were not processed by the Windows guest and will re-inject them.

> Can someone shed light on what we are missing? Any pointers will be helpful.
> Thanks
> -a
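The driftfix description boils down to simple bookkeeping: compare the timer ticks that should have been delivered with those the guest actually processed, and re-inject the difference. The sketch below is a simplified, hypothetical model of that arithmetic only; it is not QEMU's implementation, and lost_ticks() is an invented name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of tick re-injection: given the elapsed time, the
 * timer period, and how many ticks the guest actually processed, return
 * how many ticks were lost and would need to be re-injected.
 */
uint64_t lost_ticks(uint64_t elapsed_ns, uint64_t period_ns, uint64_t processed)
{
    assert(period_ns > 0);
    uint64_t expected = elapsed_ns / period_ns;
    return expected > processed ? expected - processed : 0;
}
```

For a 100 Hz timer (10 ms period) over one second, a guest that processed 95 interrupts would get 5 re-injected, which is how a guest that counts ticks to keep time catches up.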
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
Well, we want to support Microsoft's VSS, and that requires a guest agent that communicates with all the writers (applications), waiting for them to flush their app data in order to generate a consistent app-level snapshot. The VSS platform does most of the work. Still, at the bottom line, the agent's role is only to find the right moment in time. This moment can be relayed back to libvirt, and from there the freeze can be done according to your suggestion, so that the guest agent does not perform the freeze itself and is actually not a mandatory component.

Ronen.
Re: [Qemu-devel] RFC: Qemu Guest Tools ISO
On 06/22/2011 09:55 PM, Michael Roth wrote:
> Goal:
>
> Provide a mechanism, similar to vmware and virtualbox guest tools ISOs, that allows us to easily distribute guest tools (and potentially drivers) for linux and windows guests.
>
> Advantages (rough list to start the discussion, feel free to add/comment):
>
> - Simplify deployment of guest additions. ISO-installable additions can be pulled from QEMU/KVM/virtio/etc upstream or external projects as needed rather than worked into distros as independent packages. Users do not need to worry about installing lists of packages for full support. Pre-made ISOs can be pulled into QEMU/KVM in a manner similar to BIOSs/option roms.
> - Reduce complexity involved with needing to manage guests with outdated/missing tools or drivers. No need to rely on distros to pull drivers/features/bug fixes from upstream before relying on them; we can assume these fixes/features are immediately available from an upstream perspective, and distros can still maintain compatibility within a distro-centric environment by shipping specific versions of the guest tools ISO (hopefully the version committed to qemu.git at time of rebase or newer).
> - Simplify updates: Hypervisor can push guest tools updates by building QMP/guest agent interfaces around an ISO.
> - Extend support to older guests (and windows) where new repo packages are not a realistic option.
> - ?
>
> Disadvantages:
>
> - Need to test changes to tools against supported distros/platforms rather than punting to / or leveraging distro maintainers. KVM Autotest would likely be a big part of this.
> - Potentially less integration from a distro-centric perspective. Upstream mandates guest tools, distros need to keep up or rebase to remain in sync. Can still elect to support specific versions of a guest tools ISO, however.
> - ?
>
> Implementation:
>
> I hope to follow up in fairly short order with a basic prototype of the tools/workflow to create/install a guest additions ISO. A rough overview of the approach I'm currently pursuing:
>
> - Use PyInstaller (built around py2exe, linux/windows compatible, with logic to pull in required shared libs and windows/tcl/cmd.exe support as needed) to generate executables from python scripts.
> - Each project exists as a free-form directory with source code, or 32/64 bit pre-compiled binaries, windows-based installers, etc. To add a project to an ISO, a symlink to this directory would be added along with a python installer script which accepts arch/distro as arguments. install/update/uninstall logic is handled completely by this install script.
> - Top-level installer will iterate through guest additions projects and execute installers in turn (some basic dependency support or explicit ordering may be needed).
> - Install scripts (top-level and per-project) will be run through a set of scripts built around PyInstaller to generate a group of executable installers for linux as well as for windows (installers can be do-nothings for unsupported platforms, or simply call out to other binaries if using, say, an MSI windows installer). Both will co-exist on the same ISO, and share the top-level projects directory containing the individual code/binaries for individual projects.
>
> Thoughts?

The Windows drivers are an issue. You do not want to compile them, since you need the hard-to-get Microsoft certification. Given that you have to provide them in binary form anyway, the question is whether it makes sense to treat the Windows agent differently.
Other than building the Windows drivers, I don't see an issue.

Ronen.
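The "basic dependency support" the top-level installer may need is a topological sort over projects. The proposal itself uses Python installers; this sketch is in C only to match the other code in this archive. install_order(), MAX_PROJECTS and the adjacency-matrix representation are hypothetical, and the ordering uses Kahn's algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PROJECTS 16

/*
 * deps[i][j] is true if project i must be installed after project j.
 * Writes an install order into order[] using Kahn's algorithm and
 * returns false if the dependencies contain a cycle.
 */
bool install_order(size_t n, bool deps[][MAX_PROJECTS], size_t order[])
{
    assert(n <= MAX_PROJECTS);
    size_t indegree[MAX_PROJECTS] = {0};
    bool done[MAX_PROJECTS] = {false};

    /* Count how many prerequisites each project still waits for. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (deps[i][j])
                indegree[i]++;

    for (size_t k = 0; k < n; k++) {
        size_t pick = n;
        for (size_t i = 0; i < n; i++)
            if (!done[i] && indegree[i] == 0) {
                pick = i;
                break;
            }
        if (pick == n)
            return false;          /* cycle: no project is ready */
        done[pick] = true;
        order[k] = pick;
        /* Installing pick satisfies one prerequisite of its dependents. */
        for (size_t i = 0; i < n; i++)
            if (deps[i][pick])
                indegree[i]--;
    }
    return true;
}
```

With, say, drivers as project 0, a guest agent depending on them as project 1, and a tool depending on the agent as project 2, the order comes out 0, 1, 2; a circular dependency is reported instead of silently installing in a wrong order.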