Re: [Qemu-devel] pvpanic plans?

2013-08-27 Thread Ronen Hod

On 08/27/2013 11:06 AM, Richard W.M. Jones wrote:

On Thu, Aug 22, 2013 at 03:09:06PM -0500, Anthony Liguori wrote:

Paolo Bonzini pbonz...@redhat.com writes:

Also, a virtio watchdog device makes little sense, IMHO.  PV makes sense
if emulation has insufficient performance, excessive CPU usage, or
excessive complexity.  We already have both an ISA and a PCI watchdog,
and they serve their purpose wonderfully.

Neither of which actually work with modern versions of Windows FWIW.

Correct, although someone could write a driver!


Plus emulated watchdogs do not take into account steal time or
overcommit in general.  I've seen multiple cases where a naive watchdog
has a problem in the field when the system is under heavy load.

The watchdog devices in qemu run on guest time.  However the watchdog
*daemon* inside the guest probably does behave badly as you describe.
Changing the device model isn't going to help this, but it would
definitely make sense to fix the daemon (although I don't know how --
is steal time exposed to guests?)
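(On Linux guests under KVM, steal time is in fact exposed: it is the eighth
field of the "cpu" lines in /proc/stat when the kernel has paravirt
steal-time accounting. A watchdog daemon could, in principle, read it and
discount stolen ticks before declaring the system hung. A minimal sketch,
assuming such a guest:

#include <stdio.h>

/* Read aggregate steal time, in USER_HZ ticks, from /proc/stat.
 * Field order on the "cpu" line: user nice system idle iowait irq
 * softirq steal ...; returns -1 if unavailable. */
long long read_steal_ticks(void)
{
    FILE *f = fopen("/proc/stat", "r");
    long long v[8];
    int ok;

    if (!f)
        return -1;
    ok = fscanf(f, "cpu %lld %lld %lld %lld %lld %lld %lld %lld",
                &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]) == 8;
    fclose(f);
    return ok ? v[7] : -1;
}

The daemon would compare deltas of this value against wall-clock deltas and
relax its timeout accordingly.)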

I don't necessarily think a virtio-watchdog is a bad idea.  For one
thing it'd mean we would have a watchdog device that works on ARM.

Rich.


I believe that a watchdog is not the way to go. You need host-side decision making.
Say that the guest did not receive CPU/disk/network resources for a lengthy period of time, but the host knows that this is due to host resource availability. In such cases, you certainly do not want to reboot all the guests, especially since rebooting 50 Windows VMs could be a nightmare.
BTW, Windows guests disable some of their watchdogs when they detect the presence of Hyper-V; we use that to overcome BSODs!
So the right solution is to send a heartbeat to a management application (using qemu-ga or whatever), and let it decide how to handle it.
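(A minimal host-side sketch of such a heartbeat using qemu-ga's guest-ping
command; the socket path is hypothetical, and timeouts/error handling are
elided:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Returns nonzero if the guest agent answered a guest-ping. */
int guest_alive(const char *sock_path)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    const char *cmd = "{\"execute\": \"guest-ping\"}\n";
    char buf[256];
    int fd;
    ssize_t n;

    strncpy(addr.sun_path, sock_path, sizeof(addr.sun_path) - 1);
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return 0;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return 0;
    }
    write(fd, cmd, strlen(cmd));
    n = read(fd, buf, sizeof(buf) - 1);  /* expect {"return": {}} */
    close(fd);
    return n > 0;
}

The management application would call this periodically against the agent's
virtio-serial chardev socket and apply its own policy on misses.)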

Ronen.




Re: [Qemu-devel] [PATCH for-1.6 V2 0/2] pvpanic: Separate pvpanic from machine type

2013-08-14 Thread Ronen Hod

How about adding a flag that tells QEMU whether to pause or reboot the guest
after the panic?
We cannot assume that we always have a management layer that takes care of this.
One example is Microsoft's WHQL, which deliberately generates a BSOD and then
examines the dump files.
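(For testing which of the two behaviours you get, a guest can raise the
panic event by hand: the pvpanic device is an ISA device at I/O port 0x505,
and writing bit 0 reports a guest panic. A minimal x86 Linux sketch, run as
root:

#include <stdio.h>
#include <sys/io.h>

int main(void)
{
    /* port 0x505 is above the ioperm() range, so raise IOPL instead */
    if (iopl(3) < 0) {
        perror("iopl");
        return 1;
    }
    outb(1, 0x505);   /* bit 0 = "guest panicked" */
    return 0;
}

This is only a test-harness sketch; a real guest reports the panic from its
kernel panic/BSOD path via the pvpanic driver.)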

Ronen.

On 08/11/2013 06:10 PM, Marcel Apfelbaum wrote:

Creating the pvpanic device as part of the machine type has the
potential to trigger guest OS, guest firmware and driver bugs.
The potential for such issues was originally viewed as minimal.
However, since releasing 1.5 with pvpanic as part
of the builtin machine type, several issues were observed
in the field:
  - Some Windows versions triggered 'New Hardware Wizard' and
an unidentified device appeared in Device Manager.
  - Issue reported off list: on Linux >= 3.10
    the pvpanic driver breaks the 'reset on crash' option:
    the VM stops instead of being reset.

The pvpanic device also changes monitor command behaviour in some cases;
such silent incompatible changes aren't expected by management tools:
  - Monitor command requires 'cont' before 'system_reset'
    in order to restart the VM after a kernel panic/BSOD
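(Concretely, at the HMP prompt after a guest panic the sequence becomes:

(qemu) cont
(qemu) system_reset

whereas without pvpanic, system_reset alone restarts the VM.)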

Note that libvirt is the main user, and libvirt people indicated their
preference for creating the device with -device pvpanic rather than a
built-in one that can't be removed.

These issues were raised at the last KVM call. The agreement reached
there was that we were a bit too rash to make the device
a builtin, and that for 1.6 we should drop the pvpanic device from the
default machine type, and instead teach management tools to add it by
default using -device pvpanic.
It's not clear whether changing 1.5 behaviour at this point
is a sane thing, so this patchset doesn't touch 1.5 machine type.

This patch series reworks the patchset from Hu Tao
(don't create pvpanic device by default)
addressing comments and modifying behaviour according
to what was discussed on the call.
Please review and consider for 1.6.

A related discussion can be followed at
http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00036.html.

This is a continuation of patches sent by Hu Tao:
http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00124.html
http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00125.html

Changes from v1 (by Hu Tao):
  - Keep pvpanic device enabled by default for 1.5
for backport compatibility
  - Addressed Andreas Färber's review (removed bus type)
  - Small changes to make it possible to enable pvpanic
    both from the command line and from machine_init
  - Added pvpanic to MISC category

Marcel Apfelbaum (2):
   hw/misc: don't create pvpanic device by default
   hw/misc: make pvpanic known to user

  hw/i386/pc_piix.c |  9 -
  hw/i386/pc_q35.c  |  7 ---
  hw/misc/pvpanic.c | 25 ++---
  3 files changed, 18 insertions(+), 23 deletions(-)






Re: [Qemu-devel] [RFC PATCH 1/2] qemu-help: Sort devices by logical functionality

2013-07-21 Thread Ronen Hod

On 07/18/2013 05:28 PM, Anthony Liguori wrote:

Marcel Apfelbaum marce...@redhat.com writes:


Categorize devices that appear as output of the -device ? command
by logical functionality. Sort the devices by logical categories
before showing them to the user.

Signed-off-by: Marcel Apfelbaum marce...@redhat.com
Reviewed-by: Kevin Wolf kw...@redhat.com
---
  include/hw/qdev-core.h |  7 +++
  qdev-monitor.c | 23 ++-
  2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 7fbffcb..4f7a9b8 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -17,6 +17,12 @@ enum {
  #define DEVICE_CLASS(klass) OBJECT_CLASS_CHECK(DeviceClass, (klass), TYPE_DEVICE)
  #define DEVICE_GET_CLASS(obj) OBJECT_GET_CLASS(DeviceClass, (obj), TYPE_DEVICE)
  
+#define DEVICE_CATEGORY_STORAGE "storage"
+#define DEVICE_CATEGORY_NETWORK "network"
+#define DEVICE_CATEGORY_INPUT "input"
+#define DEVICE_CATEGORY_DISPLAY "display"
+#define DEVICE_CATEGORY_SOUND "sound"
+

Looks reasonable, but please make this a bitmap.  There are cases,
particularly if we start modeling multifunction PCI cards as a single
device, where a single device can support multiple types of
functionality.
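(A minimal sketch of what that bitmap shape could look like; names and
layout here are purely illustrative, not what the series settled on:

#include <stdbool.h>

enum DeviceCategory {
    DEVICE_CATEGORY_STORAGE,
    DEVICE_CATEGORY_NETWORK,
    DEVICE_CATEGORY_INPUT,
    DEVICE_CATEGORY_DISPLAY,
    DEVICE_CATEGORY_SOUND,
    DEVICE_CATEGORY_MAX
};

typedef struct {
    unsigned long categories;   /* stand-in for a DeviceClass bitmap field */
} DeviceClassSketch;

static inline void device_add_category(DeviceClassSketch *dc,
                                       enum DeviceCategory cat)
{
    dc->categories |= 1ul << cat;   /* a multifunction card sets several */
}

static inline bool device_in_category(const DeviceClassSketch *dc,
                                      enum DeviceCategory cat)
{
    return dc->categories & (1ul << cat);
}

Printing per category then loops over DEVICE_CATEGORY_MAX bit positions
instead of string-comparing category names.)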


I agree.
On the one hand, there is no proper sorting by DEVICE_CATEGORY for devices with multiple functions, but OTOH that's the reality.
If you add get_all_devices_by_category(dev_category), then it can be useful as is, and also for looping over all the categories for printout. It will output those devices in multiple categories, which is what we want.

Thanks, Ronen.



Regards,

Anthony Liguori


  typedef int (*qdev_initfn)(DeviceState *dev);
  typedef int (*qdev_event)(DeviceState *dev);
  typedef void (*qdev_resetfn)(DeviceState *dev);
@@ -81,6 +87,7 @@ typedef struct DeviceClass {
  /* public */
  
  const char *fw_name;

+    const char *category;
  const char *desc;
  Property *props;
  int no_user;
diff --git a/qdev-monitor.c b/qdev-monitor.c
index e54dbc2..1446b6e 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -93,6 +93,9 @@ static void qdev_print_devinfo(ObjectClass *klass, void *opaque)
     if (qdev_class_has_alias(dc)) {
         error_printf(", alias \"%s\"", qdev_class_get_alias(dc));
     }
+    if (dc->category) {
+        error_printf(", category \"%s\"", dc->category);
+    }
     if (dc->desc) {
         error_printf(", desc \"%s\"", dc->desc);
     }
@@ -139,16 +142,34 @@ static const char *find_typename_by_alias(const char *alias)
     return NULL;
 }
  
+static gint qdev_device_compare(gconstpointer item1, gconstpointer item2)
+{
+    DeviceClass *dc1, *dc2;
+
+    dc1 = (DeviceClass *)object_class_dynamic_cast((ObjectClass *)item1,
+                                                   TYPE_DEVICE);
+    dc2 = (DeviceClass *)object_class_dynamic_cast((ObjectClass *)item2,
+                                                   TYPE_DEVICE);
+
+    return g_strcmp0(dc1->category, dc2->category);
+}
+
 int qdev_device_help(QemuOpts *opts)
 {
     const char *driver;
     Property *prop;
     ObjectClass *klass;
+    GSList *list;
 
     driver = qemu_opt_get(opts, "driver");
     if (driver && is_help_option(driver)) {
         bool show_no_user = false;
-        object_class_foreach(qdev_print_devinfo, TYPE_DEVICE, false, &show_no_user);
+
+        list = object_class_get_list(TYPE_DEVICE, false);
+        list = g_slist_sort(list, qdev_device_compare);
+        g_slist_foreach(list, (GFunc)qdev_print_devinfo, &show_no_user);
+        g_slist_free(list);
+
         return 1;
     }
  
--

1.8.3.1







Re: [Qemu-devel] [Resend][Seabios PATCH] don't boot from un-selected devices

2012-12-25 Thread Ronen Hod

On 12/19/2012 11:32 AM, Gleb Natapov wrote:

On Wed, Dec 19, 2012 at 03:24:45PM +0800, Amos Kong wrote:

Current seabios will try to boot from selected devices first;
if they all fail, seabios will also try to boot from
un-selected devices.

For example:
@ qemu-kvm -boot order=n,menu=on ...

The guest will boot from network first; if that fails, the guest will try to
boot from other un-selected devices (floppy, cdrom, disk) one by one.

Sometimes, users don't want to boot from some devices. This patch changes

And sometimes he wants. The patch changes behaviour unconditionally. New
behaviour should be user selectable. Something like -boot order=strict
on the qemu command line.


Another option would be to add a terminator symbol, say T (I couldn't find a good
terminator), so that order=ndT would mean strict nd.
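(Illustrating both proposals with hypothetical syntax, mirroring the example
above:

@ qemu-kvm -boot order=nd,strict=on ...
@ qemu-kvm -boot order=ndT ...

either spelling would try network, then disk, and then stop rather than
falling through to unselected devices.)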

Ronen.




seabios to boot only from selected devices.

If the user chooses the first boot device from the menu, then seabios will
try all the devices, even those that are not selected.

Signed-off-by: Amos Kong ak...@redhat.com
---
Resend for CCing seabios maillist.
---
  src/boot.c |   13 -
  1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/boot.c b/src/boot.c
index 3ca7960..ee810ac 100644
--- a/src/boot.c
+++ b/src/boot.c
@@ -424,6 +424,10 @@ interactive_bootmenu(void)
         maxmenu++;
         printf("%d. %s\n", maxmenu
                , strtcpy(desc, pos->description, ARRAY_SIZE(desc)));
+        /* If user chooses first boot device from menu, we will treat
+           all the devices as selected. */
+        if (pos->priority == DEFAULT_PRIO)
+            pos->priority = DEFAULT_PRIO - 1;
         pos = pos->next;
     }
  
@@ -490,7 +494,10 @@ boot_prep(void)
 
     // Map drives and populate BEV list
     struct bootentry_s *pos = BootList;
-    while (pos) {
+
+    /* The priority of un-selected device is not changed,
+       we only boot from user selected devices. */
+    while (pos && pos->priority != DEFAULT_PRIO) {
         switch (pos->type) {
         case IPL_TYPE_BCV:
             call_bcv(pos->vector.seg, pos->vector.offset);
@@ -513,10 +520,6 @@ boot_prep(void)
         }
         pos = pos->next;
     }
-
-    // If nothing added a floppy/hd boot - add it manually.
-    add_bev(IPL_TYPE_FLOPPY, 0);
-    add_bev(IPL_TYPE_HARDDISK, 0);
 }
  
  
--

1.7.1

--
Gleb.






Re: [Qemu-devel] Better qemu/kvm defaults (was Re: [RFC PATCH 0/4] Gang scheduling in CFS)

2012-01-01 Thread Ronen Hod

On 01/01/2012 12:16 PM, Dor Laor wrote:

On 12/29/2011 06:16 PM, Anthony Liguori wrote:

On 12/29/2011 10:07 AM, Dor Laor wrote:

On 12/26/2011 11:05 AM, Avi Kivity wrote:

On 12/26/2011 05:14 AM, Nikunj A Dadhania wrote:


btw you can get an additional speedup by enabling x2apic, for
default_send_IPI_mask_logical().


In the host?



In the host, for the guest:

qemu -cpu ...,+x2apic



It seems to me that we should improve our default flags.
So many times users fail to submit the proper huge command-line options that
we require. Honestly, we can't blame them; there are so many flags and so
many use cases, it's just too hard to get it right for humans.


You might want to take into account migration considerations, i.e., the
target host's optimal setup.
Also, we need to beware of too much automation, since hardware changes
might void Windows license activations.
Some of the parameters will depend on dynamic factors such as the guest's
total nCPUs, mem, sharing (KSM), or whatever.
As a minimum, we can automatically suggest the qemu parameters and the
host setup.


Ronen.



I propose a basic idea and folks are welcome to discuss it:

1. Improve qemu/kvm defaults
Break the current backward compatibility (but add a
--default-backward-compat-mode) and set better values for:
- rtc slew time


What do you specifically mean?


-rtc localtime,driftfix=slew




- cache=none


I'm not sure I see this as a better default, particularly since
O_DIRECT fails on certain file systems. I think we really need to let
WCE be toggleable from the guest and then have a caching mode independent
of WCE. We then need some heuristics to only enable cache=off when we
know it's safe.


cache=none is still faster when it has the FS support.
qemu can test-run O_DIRECT and fall back to cache mode, or just test
the filesystem capabilities.
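(A minimal sketch of such a probe — not qemu code, just the idea:

#define _GNU_SOURCE
#include <fcntl.h>
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Returns true if the file's filesystem accepts O_DIRECT opens. */
static bool fs_supports_o_direct(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECT);

    if (fd < 0)
        return errno != EINVAL;   /* EINVAL is the classic "no O_DIRECT"
                                     answer, e.g. from tmpfs */
    close(fd);
    return true;
}

qemu would run this against the image file at startup and silently pick a
cached mode when it fails.)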





- x2apic, maybe enhance qemu64 or move to -cpu host?


Alex posted a patch for this. I'm planning on merging it although so far
no one has chimed in either way.


- aio=native|threads (auto-sense?)


aio=native is unsafe to default because linux-aio is just fubar. It
falls back to synchronous I/O if the underlying filesystem doesn't
support aio. There's no way in userspace to probe whether it's actually
supported or not either...


Can we test-run this too? Maybe as a separate qemu mode, or even a binary
that, given a qemu cmdline, will try to suggest better parameters?



- use virtio devices by default


I don't think this is realistic since appropriately licensed signed
virtio drivers do not exist for Windows. (Please note the phrase
"appropriately licensed signed".)


What's the percentage of qemu invocations w/ a windows guest and a short
cmd line? My hunch is that a plain short cmdline indicates a developer,
and probably they'll use a linux guest.





- more?

Different defaults may be picked automatically when TCG|KVM is used.

2. External hardening configuration file kept in qemu.git
For non qemu/kvm specific definitions like the io scheduler we
should maintain a script in our tree that sets/senses the optimal
settings of the host kernel (maybe a similar one for the guest).


What are appropriate host settings and why aren't we suggesting that
distros and/or upstream just set them by default?


It's hard to set the right default for a distribution since the same 
distro should optimize for various usages of the same OS. For example, 
Fedora has tuned-adm w/ available profiles:

- desktop-powersave
- server-powersave
- enterprise-storage
- spindown-disk
- laptop-battery-powersave
- default
- throughput-performance
- latency-performance
- laptop-ac-powersave

We need to keep on recommending the best profile for virtualization;
for Fedora I think it's either enterprise-storage or maybe
throughput-performance.


If we have such a script, it can call the matching tuned profile
instead of tweaking every /sys option.




Regards,

Anthony Liguori


HTH,
Dor












Re: [Qemu-devel] [PATCH 1/5] vfio: Introduce documentation for VFIO driver

2011-12-28 Thread Ronen Hod

On 12/21/2011 11:42 PM, Alex Williamson wrote:

Including rationale for design, example usage and API description.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

  Documentation/vfio.txt |  352 
  1 files changed, 352 insertions(+), 0 deletions(-)
  create mode 100644 Documentation/vfio.txt

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
new file mode 100644
index 000..09a5a5b
--- /dev/null
+++ b/Documentation/vfio.txt
@@ -0,0 +1,352 @@
+VFIO - Virtual Function I/O[1]
+---
+Many modern systems now provide DMA and interrupt remapping facilities
+to help ensure I/O devices behave within the boundaries they've been
+allotted.  This includes x86 hardware with AMD-Vi and Intel VT-d,
+POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC
+systems such as Freescale PAMU.  The VFIO driver is an IOMMU/device
+agnostic framework for exposing direct device access to userspace, in
+a secure, IOMMU protected environment.  In other words, this allows
+safe[2], non-privileged, userspace drivers.
+
+Why do we want that?  Virtual machines often make use of direct device
+access (device assignment) when configured for the highest possible
+I/O performance.  From a device and host perspective, this simply
+turns the VM into a userspace driver, with the benefits of
+significantly reduced latency, higher bandwidth, and direct use of
+bare-metal device drivers[3].
+
+Some applications, particularly in the high performance computing
+field, also benefit from low-overhead, direct device access from
+userspace.  Examples include network adapters (often non-TCP/IP based)
+and compute accelerators.  Prior to VFIO, these drivers had to either
+go through the full development cycle to become a proper upstream
+driver, be maintained out of tree, or make use of the UIO framework,
+which has no notion of IOMMU protection, limited interrupt support,
+and requires root privileges to access things like PCI configuration
+space.
+
+The VFIO driver framework intends to unify these, replacing both the
+KVM PCI specific device assignment code as well as provide a more
+secure, more featureful userspace driver environment than UIO.
+
+Groups, Devices, and IOMMUs
+---
+
+Userspace drivers are primarily concerned with manipulating individual
+devices and setting up mappings in the IOMMU for those devices.
+Unfortunately, the IOMMU doesn't always have the granularity to track
+mappings for an individual device.  Sometimes this is a topology
+barrier, such as a PCIe-to-PCI bridge interposing the device and
+IOMMU, other times this is an IOMMU limitation.  In any case, the
+reality is that devices are not always independent with respect to the
+IOMMU.  Translations setup for one device can be used by another
+device in these scenarios.
+
+The IOMMU API exposes these relationships by identifying an IOMMU
+group for these dependent devices.  Devices on the same bus with the
+same IOMMU group (or just "group" for this document) are not isolated
+from each other with respect to DMA mappings.  For userspace usage,
+this logically means that instead of being able to grant ownership of
+an individual device, we must grant ownership of a group, which may
+contain one or more devices.
+
+These groups therefore become a fundamental component of VFIO and the
+working unit we use for exposing devices and granting permissions to
+userspace.  In addition, VFIO makes efforts to ensure the integrity of
+the group for user access.  This includes ensuring that all devices
+within the group are controlled by VFIO (vs native host drivers)
+before allowing a user to access any member of the group or the IOMMU
+mappings, as well as maintaining the group viability as devices are
+dynamically added or removed from the system.
+
+To access a device through VFIO, a user must open a character device
+for the group that the device belongs to and then issue an ioctl to
+retrieve a file descriptor for the individual device.  This ensures
+that the user has permissions to the group (file based access to the
+/dev entry) and allows a check point at which VFIO can deny access to
+the device if the group is not viable (all devices within the group
+controlled by VFIO).  A file descriptor for the IOMMU is obtained in the
+same fashion.
+
+VFIO defines a standard set of APIs for access to devices and a
+modular interface for adding new, bus-specific VFIO device drivers.
+We call these "VFIO bus drivers".  The vfio-pci module is an example
+of a bus driver for exposing PCI devices.  When the bus driver module
+is loaded it enumerates all of the devices for its bus, registering
+each device with the vfio core along with a set of callbacks.  For
+buses that support hotplug, the bus driver also adds itself to the
+notification chain for such events.  
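(A condensed userspace sketch of the group/device flow described above.
The ioctl names follow the eventual mainline vfio API and may not match
this RFC exactly; container/IOMMU setup is omitted, so treat it as
illustrative only:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Open a group chardev, check viability, ask for a device fd. */
int vfio_get_device_fd(void)
{
    int group = open("/dev/vfio/26", O_RDWR);   /* group 26: example */
    struct vfio_group_status status = { .argsz = sizeof(status) };

    if (group < 0)
        return -1;
    ioctl(group, VFIO_GROUP_GET_STATUS, &status);
    if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE))
        return -1;   /* some group member not bound to vfio */

    /* file-based permission was checked at open(); now retrieve the
       individual device, here PCI device 0000:06:0d.0 */
    return ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
}

In the mainline API the group must additionally be attached to a container
and an IOMMU model set before device fds are handed out.)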

Re: [Qemu-devel] [RFC] Migration convergence - a suggestion

2011-12-21 Thread Ronen Hod

On 12/20/2011 03:39 PM, Anthony Liguori wrote:

On 12/20/2011 01:06 AM, Ronen Hod wrote:
Well the issue is not new, anyhow, following a conversation with Orit ...

Since we want the migration to finish, I believe that the migration speed
parameter alone cannot do the job.
I suggest using two distinct parameters:
1. Migration speed - will be used to limit the network resources utilization
2. aggressionLevel - A number between 0.0 and 1.0, where low values imply
minimal interruption to the guest, and 1.0 means that the guest will be
completely stalled.

In any case the migration will have to do its work and finish given any
actual migration-speed, so even low aggressionLevel values will sometimes
imply that the guest will be throttled substantially.

The algorithm:
The aggressionLevel should determine the targetGuest%CPU (how much CPU time
we want to allocate to the guest).


QEMU has no way to limit the guest CPU time.


Wouldn't any yield (sleep / whatever) limit the guest's CPU time, be it
in qemu or in KVM?
My intention is to suggest an algorithm that is based on guest
throttling. Looking at the relevant BZs, I do not see how we can avoid
it. I certainly have no claims regarding the architecture.
Avi and mst believe that it is better to continuously control the
guest's CPU from the outside (libvirt) using cgroups. Although less
responsive to changes, it should still work.
In the meantime, I also discovered that everybody has a different point
of view regarding the requirements. Regardless, I believe that the same
basic mechanics (once decided) can do the work.

Some relevant configuration requirements are:
1. Max bandwidth
2. Min CPU per guest
3. Max guest stall time
4. Max migration time
These requirements will often conflict, and may imply changes in 
behavior over time.


I would also suggest that the management GUI will let the user select 
the aggression-level (or whatever), and display the implication on all 
the other parameters (total-time, %CPU) based on the current behavior of 
the guest and network.


Regards, Ronen



Regards,

Anthony Liguori





[Qemu-devel] [RFC] Migration convergence - a suggestion

2011-12-19 Thread Ronen Hod

Well the issue is not new, anyhow, following a conversation with Orit ...

Since we want the migration to finish, I believe that the migration 
speed parameter alone cannot do the job.

I suggest using two distinct parameters:
1. Migration speed - will be used to limit the network resources utilization
2. aggressionLevel - A number between 0.0 and 1.0, where low values
imply minimal interruption to the guest, and 1.0 means that the guest
will be completely stalled.


In any case the migration will have to do its work and finish given any 
actual migration-speed, so even low aggressionLevel values will 
sometimes imply that the guest will be throttled substantially.


The algorithm:
The aggressionLevel should determine the targetGuest%CPU (how much CPU 
time we want to allocate to the guest)

With aggressionLevel = 1.0, the guest gets no CPU-resources (stalled).
With aggressionLevel = 0.0, the guest gets minGuest%CPU, such that 
migrationRate == dirtyPagesRate. This minGuest%CPU is continuously 
updated based on the running average of the recent samples (more below).


Note that the targetGuest%CPU allocation is continuously updated due to
changes in guest behavior, network congestion, and the like.


Some more details
- minGuest%CPU (i.e., for dirtyPagesRate == migrationRate) is easy to 
calculate as a running average of

  (migrationRate / dirtyPagesRate * guest%CPU)
- There are several methods to calculate the running average, my 
favorite is IIR, where, roughly speaking,

  newVal = 0.99 * oldVal + 0.01 * newSample
- I would use two measures to ensure that there are more migrated pages 
than dirty pages.
  1. The running average (based on recent samples) of the migrated 
pages is larger than that of the new dirty pages
  2. The total number of migrated pages so far is larger than the total 
number of new dirty pages.
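(A minimal sketch of the estimate above, with the proposal's names; the
0.99/0.01 weights are the ones given:

/* Running (IIR) estimate of minGuest%CPU: the guest CPU share at
 * which migrationRate == dirtyPagesRate. */
static double min_guest_cpu;

static void update_min_guest_cpu(double migration_rate,
                                 double dirty_pages_rate,
                                 double guest_cpu)
{
    /* scale the observed guest %CPU to the break-even point */
    double sample = migration_rate / dirty_pages_rate * guest_cpu;

    /* newVal = 0.99 * oldVal + 0.01 * newSample */
    min_guest_cpu = 0.99 * min_guest_cpu + 0.01 * sample;
}

targetGuest%CPU then interpolates between this value (aggressionLevel 0.0)
and zero (aggressionLevel 1.0).)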


And yes, many details are still missing.

Ronen.



Re: [Qemu-devel] [PATCH] qemu_timedate_diff() shouldn't modify its argument.

2011-11-07 Thread Ronen Hod

On 11/06/2011 06:00 PM, Gleb Natapov wrote:

The caller of qemu_timedate_diff() does not expect that tm it passes to
the function will be modified, but mktime() is destructive and modifies
its argument. Pass a copy of tm to it and set tm_isdst so that mktime()
will not rely on it since its value may be outdated.


I believe that the original issue was not related to outdated data at
the moment of the daylight-saving-time transition.
Using tmp.tm_isdst = -1 sounds good, but why use a copy of tm? The only
significant field that will change in the tm is tm_isdst itself, which
will be set to 0/1 (correctly).


Acked-by: Ronen Hod r...@redhat.com


Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/vl.c b/vl.c
index 624da0f..641629b 100644
--- a/vl.c
+++ b/vl.c
@@ -460,8 +460,11 @@ int qemu_timedate_diff(struct tm *tm)
     if (rtc_date_offset == -1)
         if (rtc_utc)
             seconds = mktimegm(tm);
-        else
-            seconds = mktime(tm);
+        else {
+            struct tm tmp = *tm;
+            tmp.tm_isdst = -1; /* use timezone to figure it out */
+            seconds = mktime(&tmp);
+        }
     else
         seconds = mktimegm(tm) + rtc_date_offset;
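(A standalone demo of the destructiveness the commit message describes;
mktime(3) normalizes its argument in place, overwriting tm_isdst, tm_wday
and tm_yday:

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct tm tm = { .tm_year = 111, .tm_mon = 10, .tm_mday = 6,
                     .tm_hour = 18, .tm_isdst = -1 };
    struct tm tmp = tm;          /* the copy the patch introduces */

    mktime(&tmp);                /* tmp is mutated, caller's tm is not */
    printf("caller still sees tm_isdst=%d, mktime decided %d\n",
           tm.tm_isdst, tmp.tm_isdst);
    return 0;
}

Without the copy, the caller's tm would come back normalized.)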

--
Gleb.






Re: [Qemu-devel] Question on kvm_clock working ...

2011-09-26 Thread Ronen Hod

On 09/09/2011 06:28 PM, al pat wrote:


We are doing an experiment with kvm-clock to validate its
effectiveness, particularly when running NTP on the host to make sure
the host's clock stays properly synced.
Our observations lead us to a few unanswered questions, including the
possibility of a bug (or our misunderstanding of how kvm_clock should
work).


Our understanding is that kvm_clock will help sync the clock between 
the host and the guest. We do not observe this to happen in reality 
and thus this question.


We are using Ubuntu 11.04 on the host and the guest.

The command we issue to launch the VM is the following:

$ sudo kvm -m 500 -rtc clock=host guestos.img

We also arranged for Ubuntu to show the seconds on the clock displayed 
in the menu.


Observation 1:
Upon launching the VM, we see a time difference between the 2 clocks
ranging from 1 to 2 seconds.


Observation 2:
If we change the date on the host (with a command such as “date --set 
10:00:00 AM Sep 9, 2011”), the time on the guest remains the same, 
unaffected.


Observation 3:
After running for a while without NTP on the host, we run “ntpdate” to
sync up the host, but the guest sticks with whatever time it had before.


You probably meant ntpd -q




Another test we will run is to have ntpd on the host and wait for an
extended time to see if the guest drifts away from that original 1 or
2 second lag. In the meantime, we are asking you for some input in
this regard:

Questions
-What is the “-rtc clock” option supposed to mean exactly?
 According to the man page, the guest should get its time from the
host, but neither date nor an “ntpdate” affected the clock on the guest.

-What are the other options that we should use?

   -rtc [base=utc|localtime|date][,clock=host|vm][,driftfix=none|slew]
       Specify base as utc or localtime to let the RTC start at the
       current UTC or local time, respectively. localtime is required
       for correct date in MS-DOS or Windows. To start at a specific
       point in time, provide date in the format 2006-06-17T16:01:21
       or 2006-06-17. The default base is UTC.

       By default the RTC is driven by the host system time. This
       allows to use the RTC as accurate reference clock inside the
       guest, specifically if the host time is smoothly following an
       accurate external reference clock, e.g. via NTP. If you want
       to isolate the guest time from the host, even prevent it from
       progressing during suspension, you can set clock to vm instead.

       Enable driftfix (i386 targets only) if you experience time
       drift problems, specifically with Windows' ACPI HAL. This
       option will try to figure out how many timer interrupts were
       not processed by the Windows guest and will re-inject them.


Can someone shed light on what we are missing? Any pointers will be 
helpful.


Thanks
-a





Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-08-07 Thread Ronen Hod
Well, we want to support Microsoft's VSS, and that requires a guest
agent that communicates with all the writers (applications), waiting
for them to flush their app data in order to generate a consistent
app-level snapshot. The VSS platform does most of the work.
Still, at the bottom line, the agent's role is only to find the right
moment in time. This moment can be relayed back to libvirt, and from
there it can be done according to your suggestion, so that the guest agent
does not do the freeze, and it is actually not a mandatory component.
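(On the Linux side, the kernel freeze primitive under discussion is the
FIFREEZE/FITHAW ioctl pair on a mounted filesystem; whoever owns the
"right moment" — agent or libvirt — would drive something like:

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>          /* FIFREEZE, FITHAW */

/* Freeze (freeze != 0) or thaw a filesystem by its mount point. */
int fs_set_frozen(const char *mountpoint, int freeze)
{
    int fd = open(mountpoint, O_RDONLY);
    int ret;

    if (fd < 0)
        return -1;
    ret = ioctl(fd, freeze ? FIFREEZE : FITHAW, 0);
    close(fd);
    return ret;
}

The Windows/VSS path has no equivalent host-drivable ioctl, which is why
the agent stays in the picture there.)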


Ronen.



Re: [Qemu-devel] RFC: Qemu Guest Tools ISO

2011-06-23 Thread Ronen Hod

On 06/22/2011 09:55 PM, Michael Roth wrote:

Goal:

Provide a mechanism, similar to vmware and virtualbox guest tools 
ISOs, that allows us to easily distribute guest tools (and potentially 
drivers) for linux and windows guests.


Advantages (rough list to start the discussion, feel free to 
add/comment):


- Simplify deployment of guest additions. ISO-installable additions 
can be pulled from QEMU/KVM/virtio/etc upstream or external projects 
as needed rather than worked into distros as independent packages. 
Users do not need to worry about installing lists of packages for full 
support. Pre-made ISOs can be pulled into QEMU/KVM in a manner similar 
to BIOSs/option roms.


- Reduce complexity involved with needing to manage guests with 
outdated/missing tools or drivers. No need to rely on distros to pull 
drivers/features/bug fixes from upstream before relying on them; we 
can assume these fixes/features are immediately available from an 
upstream perspective, and distros can still maintain compatibility 
within a distro-centric environment by shipping specific versions of 
the guest tools ISO (hopefully the version committed to qemu.git at 
time of rebase or newer)


- Simplify updates: Hypervisor can push guest tools updates by 
building QMP/guest agent interfaces around an ISO.


- Extend support to older guests (and windows) where new repo packages 
are not a realistic option.


- ?

Disadvantages:

- Need to test changes to tools against supported distros/platforms 
rather than punting to / or leveraging distro maintainers. KVM 
Autotest would likely be a big part of this.


- Potentially less integration from a distro-centric perspective. 
Upstream mandates guest tools, distros need to keep up or rebase to 
remain in sync. Can still elect to support specific versions of a 
guest tools ISO, however.


- ?

Implementation:

I hope to follow-up in fairly short order with a basic prototype of 
the tools/workflow to create/install a guest additions ISO. A rough 
overview of the approach I'm currently pursuing:


- Use PyInstaller (built around py2exe, linux/windows compatible,
with logic to pull in required shared libs and windows/tcl/cmd.exe
support as needed) to generate executables from python scripts.


- Each project exists as a free-form directory with source code, or
32/64 bit pre-compiled binaries, windows-based installers, etc. To add
it to an ISO, a symlink to this directory would be added along with a
python installer script which accepts arch/distro as arguments.
Install/update/uninstall logic is handled completely by this install script.


- Top-level installer will iterate through guest additions projects
and execute installers in turn. (Some basic dependency support or
explicit ordering may be needed.)


- Install scripts (top-level and per-project) will be run through a 
set of scripts built around PyInstaller to generate a group of 
executable installers for linux as well as for windows (installers can 
be do-nothings for unsupported platforms, or simply call out to other 
binaries if using, say, an MSI windows installer). Both will co-exist 
on the same ISO, and share the top-level projects directory containing 
the individual code/binaries for individual projects.


Thoughts?

The windows drivers are an issue. You do not want to compile them since
you need the hard-to-get Microsoft certification. Given that you have to
provide them in binary form, the question is whether it makes sense to
treat the Windows agent differently.

Other than building the windows drivers, I don't see an issue.
Ronen.