date:20140723

[Qemu-devel] [PULL 1/1] usb: mtp: tag root property as experimental

2014-07-23 Thread Gerd Hoffmann

Reason: we don't want to commit to that interface yet.  Possibly
the implementation will be switched over to use fsdev.

Suggested-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/usb/dev-mtp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c
index 1b51a90..384d4a5 100644
--- a/hw/usb/dev-mtp.c
+++ b/hw/usb/dev-mtp.c
@@ -1090,7 +1090,7 @@ static const VMStateDescription vmstate_usb_mtp = {
 };
 
 static Property mtp_properties[] = {
-    DEFINE_PROP_STRING("root", MTPState, root),
+    DEFINE_PROP_STRING("x-root", MTPState, root),
     DEFINE_PROP_STRING("desc", MTPState, desc),
     DEFINE_PROP_END_OF_LIST(),
 };
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH for-2.1] docs: document remaining QMP events

2014-07-23 Thread Markus Armbruster

Eric Blake ebl...@redhat.com writes:

 Commit dfab4892 restored this file, but did not address any of the
 grammar problems that had been fixed in passing when moving events
 out of this file.  There are also a couple events that were
 undocumented since introduction, and one that had been added only
 in the time that this file was temporarily deleted.

 * docs/qmp/qmp-events.txt (POWERDOWN, SPICE_MIGRATE_COMPLETED)
 (VSERPORT_CHANGE): Add.
 (RESET, SPICE_INITIALIZED): Fix grammar.
 (SPICE_CONNECTED, SPICE_DISCONNECTED): Split.

GNU ChangeLog style, unusual in QEMU commit messages.  Not that I mind.

 Signed-off-by: Eric Blake ebl...@redhat.com
 ---
  docs/qmp/qmp-events.txt | 80 
 +
  1 file changed, 74 insertions(+), 6 deletions(-)

 diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
 index 4a6c2a2..78dd76a 100644
 --- a/docs/qmp/qmp-events.txt
 +++ b/docs/qmp/qmp-events.txt
 @@ -243,6 +243,19 @@ Data:
timestamp: { seconds: 1368697518, microseconds: 326866 } }
  }

 +POWERDOWN
 +---------
 +
 +Emitted when the Virtual Machine is powered down through the power
 +control system, such as via ACPI.
 +
 +Data: None.
 +
 +Example:
 +
 +{ "event": "POWERDOWN",
 +    "timestamp": { "seconds": 1267040730, "microseconds": 682951 } }
 +
  QUORUM_FAILURE
  --------------

 @@ -285,7 +298,7 @@ Example:
  RESET
  -----

 -Emitted when the Virtual Machine is reseted.
 +Emitted when the Virtual Machine is reset.

  Data: None.

 @@ -325,7 +338,8 @@ Example:
  SHUTDOWN
  --------

 -Emitted when the Virtual Machine is powered down.
 +Emitted when the Virtual Machine has shut down, indicating that qemu
 +is about to exit.

  Data: None.

 @@ -337,10 +351,10 @@ Example:
  Note: If the command-line option -no-shutdown has been specified, a STOP
  event will eventually follow the SHUTDOWN event.

 -SPICE_CONNECTED, SPICE_DISCONNECTED
 ------------------------------------
 +SPICE_CONNECTED
 +---------------

 -Emitted when a SPICE client connects or disconnects.
 +Emitted when a SPICE client connects.

Wording doesn't match qapi-event.json exactly.  I doubt we care.


  Data:

 @@ -362,11 +376,36 @@ Example:
  client: {port: 52873, family: ipv4, host: 127.0.0.1}
  }}

 +SPICE_DISCONNECTED
 +------------------
 +
 +Emitted when a SPICE client disconnects.
 +
 +Data:
 +
 +- "server": Server information (json-object)
 +  - "host": IP address (json-string)
 +  - "port": port number (json-string)
 +  - "family": address family (json-string, "ipv4" or "ipv6")
 +- "client": Client information (json-object)
 +  - "host": IP address (json-string)
 +  - "port": port number (json-string)
 +  - "family": address family (json-string, "ipv4" or "ipv6")
 +
 +Example:
 +
 +{ "timestamp": {"seconds": 1290688046, "microseconds": 388707},
 +  "event": "SPICE_DISCONNECTED",
 +  "data": {
 +    "server": { "port": "5920", "family": "ipv4", "host": "127.0.0.1"},
 +    "client": {"port": "52873", "family": "ipv4", "host": "127.0.0.1"}
 +}}
 +
  SPICE_INITIALIZED
  -----------------

  Emitted after initial handshake and authentication takes place (if any)
 -and the SPICE channel is up'n'running
 +and the SPICE channel is up and running

  Data:

 @@ -399,6 +438,19 @@ Example:
channel-id: 0, tls: true}
  }}

 +SPICE_INITIALIZED

Another SPICE_INITIALIZED?  Do you mean SPICE_MIGRATE_COMPLETED?

 +-
 +
 +Emitted when SPICE migration has completed
 +
 +Data: None.
 +
 +Example:
 +
 +{ "timestamp": {"seconds": 1290688046, "microseconds": 417172},
 +  "event": "SPICE_MIGRATE_COMPLETED" }
 +
 +
  STOP
  ----

 @@ -527,6 +579,22 @@ Example:
          "host": "127.0.0.1", "sasl_username": "luiz" } },
      "timestamp": { "seconds": 1263475302, "microseconds": 150772 } }

 +VSERPORT_CHANGE
 +---------------
 +
 +Emitted when the guest opens or closes a virtio-serial port.
 +
 +Data:
 +
 +- "id": device identifier of the virtio-serial port (json-string)
 +- "open": true if the guest has opened the virtio-serial port (json-bool)
 +
 +Example:
 +
 +{ "event": "VSERPORT_CHANGE",
 +    "data": { "id": "channel0", "open": true },
 +    "timestamp": { "seconds": 1401385907, "microseconds": 422329 } }
 +
  WAKEUP
  ------

Assuming you do mean SPICE_MIGRATE_COMPLETED: list is complete now.

Would you mind splitting this patch?

* Either one patch per undocumented event (if you want to be nice to
  downstreams cherry-picking events), or one patch for all of them.

* One patch for the rest.  Or if you feel generous, two: one for the
  grammar fixes, one for the spice split.

Re: [Qemu-devel] [PATCH] pci: Don't deliver MSI/MSI-X messages if bus master support is off

2014-07-23 Thread Jan Kiszka

On 2014-07-22 21:06, Michael S. Tsirkin wrote:
 On Mon, Jul 21, 2014 at 12:04:22AM +0200, Jan Kiszka wrote:
 On 2014-07-20 23:03, Michael S. Tsirkin wrote:
 On Sun, Jul 20, 2014 at 11:45:10PM +0200, Jan Kiszka wrote:
 On 2014-07-20 21:48, Michael S. Tsirkin wrote:
 On Sat, Jul 19, 2014 at 06:55:48PM +0200, Jan Kiszka wrote:
 From: Jan Kiszka jan.kis...@siemens.com

 The spec says (and real HW confirms this) that, if the bus master bit
 is 0, the device will not generate any PCI accesses. MSI and MSI-X
 messages fall among these.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

 I guess an alternative is for callers to check before
 invoking msi_notify. Please note that this is the only option
 when using e.g. irqfd, so this has some advantages.
 Is there a specific device that is affected by this?
 I would expect drivers to disable msi before clearing
 bus master bit ...

 This is about emulating conforming behaviour without touching each and
 every device. I stumbled over this while playing with emulated vs. real
 Intel HDA.

 Right so that's my question.
 How did you hit it? With a custom driver?

 So to say: with a handful of lines of code to tickle some MSI event out
 of that device for testing purposes.

 Doesn't a regular driver disable MSI?

 Sure. This is not fixing a regular driver's problem. It's a behavioral
 correction for faulty corner cases.

 Jan
 
 OK based on this I think this is not 2.1 material. Agree?

Agree.

I'll look into Paolo's suggestion how to model this asap.

Jan
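
The rule quoted from the spec can be pictured as a single gate in front of message delivery. The following is only an illustrative sketch, not the patch under discussion: ToyPCIDevice, toy_msi_notify() and the delivery counter are hypothetical names invented here; only the Command register offset (0x04) and the Bus Master Enable bit (bit 2) are taken from the PCI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND        0x04   /* Command register offset in config space */
#define PCI_COMMAND_MASTER 0x0004 /* Bus Master Enable bit */

/* Hypothetical, minimal device model: only what this sketch needs. */
typedef struct ToyPCIDevice {
    uint8_t config[256];     /* little-endian PCI configuration space */
    unsigned msi_delivered;  /* number of messages that reached the bus */
} ToyPCIDevice;

/* Read a 16-bit little-endian value from config space. */
static uint16_t toy_pci_get_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static bool toy_bus_master_enabled(const ToyPCIDevice *dev)
{
    return (toy_pci_get_word(dev->config + PCI_COMMAND) &
            PCI_COMMAND_MASTER) != 0;
}

/*
 * An MSI is a memory write issued by the device, so it is dropped while
 * bus mastering is disabled.  Returns true if the message was delivered.
 */
static bool toy_msi_notify(ToyPCIDevice *dev, unsigned vector)
{
    (void)vector; /* the vector does not matter for this sketch */
    if (!toy_bus_master_enabled(dev)) {
        return false;
    }
    dev->msi_delivered++;
    return true;
}
```

Placing the check in the common notify path, rather than in each device model, is what keeps individual devices untouched; callers that bypass that path (the irqfd case mentioned above) would still need their own check.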





Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation

2014-07-23 Thread Paolo Bonzini

On 22/07/2014 17:47, Le Tan wrote:
 +static inline void define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
 +                        uint64_t wmask, uint64_t w1cmask)
 +{
 +    *((uint64_t *)&s->csr[addr]) = val;

All these casts are not endian-safe.  Please use ldl_le_p, ldq_le_p,
stl_le_p, stq_le_p.

 +    *((uint64_t *)&s->wmask[addr]) = wmask;
 +    *((uint64_t *)&s->w1cmask[addr]) = w1cmask;
 +}
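
To illustrate the endianness comment above: a store through a cast pointer writes the value in host byte order, so a register image that is meant to be little-endian ends up byte-swapped on a big-endian host. Below is a minimal sketch of byte-wise accessors. The toy_ helpers are hypothetical stand-ins written for this example; they assume nothing about how QEMU implements ldq_le_p() and friends.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 64-bit value as little-endian bytes, whatever the host order. */
static void toy_stq_le(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Load a 64-bit little-endian value from a byte buffer. */
static uint64_t toy_ldq_le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* 32-bit variants of the same idea. */
static void toy_stl_le(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

static uint32_t toy_ldl_le(const uint8_t *p)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        v |= (uint32_t)p[i] << (8 * i);
    }
    return v;
}
```

Because these helpers only touch individual bytes, they also work at any alignment, which a cast to uint64_t * does not guarantee.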
 +
 +static inline void define_quad_wo(IntelIOMMUState *s, hwaddr addr,
 +                                  uint64_t mask)
 +{
 +    *((uint64_t *)&s->womask[addr]) = mask;
 +}
 +
 +static inline void define_long(IntelIOMMUState *s, hwaddr addr, uint32_t val,
 +                        uint32_t wmask, uint32_t w1cmask)
 +{
 +    *((uint32_t *)&s->csr[addr]) = val;
 +    *((uint32_t *)&s->wmask[addr]) = wmask;
 +    *((uint32_t *)&s->w1cmask[addr]) = w1cmask;
 +}
 +
 +static inline void define_long_wo(IntelIOMMUState *s, hwaddr addr,
 +                                  uint32_t mask)
 +{
 +    *((uint32_t *)&s->womask[addr]) = mask;
 +}
 +
 +/* External get/set operations */
 +static inline void set_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val)
 +{
 +    uint64_t oldval = *((uint64_t *)&s->csr[addr]);
 +    uint64_t wmask = *((uint64_t *)&s->wmask[addr]);
 +    uint64_t w1cmask = *((uint64_t *)&s->w1cmask[addr]);
 +    *((uint64_t *)&s->csr[addr]) =
 +        ((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
 +}
 +
 +static inline void set_long(IntelIOMMUState *s, hwaddr addr, uint32_t val)
 +{
 +    uint32_t oldval = *((uint32_t *)&s->csr[addr]);
 +    uint32_t wmask = *((uint32_t *)&s->wmask[addr]);
 +    uint32_t w1cmask = *((uint32_t *)&s->w1cmask[addr]);
 +    *((uint32_t *)&s->csr[addr]) =
 +        ((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
 +}
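
The update rule in set_quad()/set_long() above combines two masks. A small worked sketch on plain values may make it easier to follow. It assumes, going by the names only, that wmask marks ordinary software-writable bits and w1cmask marks write-1-to-clear bits; toy_reg_write() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the new register value for a guest write of `val`:
 *  - bits in wmask take the written value,
 *  - bits in w1cmask are cleared where the guest wrote 1,
 *  - all other bits keep their old value (read-only to the guest).
 */
static uint32_t toy_reg_write(uint32_t oldval, uint32_t val,
                              uint32_t wmask, uint32_t w1cmask)
{
    uint32_t merged = (oldval & ~wmask) | (val & wmask);
    return merged & ~(w1cmask & val);
}
```

For example, with wmask = 0x000000ff and w1cmask = 0x0000f000, writing 0x0000303c over 0x0000f0a5 replaces the low byte with 0x3c and clears the two status bits in 0x3000, giving 0x0000c03c.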
 +
 +static inline uint64_t get_quad(IntelIOMMUState *s, hwaddr addr)
 +{
 +    uint64_t val = *((uint64_t *)&s->csr[addr]);
 +    uint64_t womask = *((uint64_t *)&s->womask[addr]);
 +    return val & ~womask;
 +}
 +
 +
 +static inline uint32_t get_long(IntelIOMMUState *s, hwaddr addr)
 +{
 +    uint32_t val = *((uint32_t *)&s->csr[addr]);
 +    uint32_t womask = *((uint32_t *)&s->womask[addr]);
 +    return val & ~womask;
 +}
 +
 +
 +
 +/* Internal get/set operations */
 +static inline uint64_t __get_quad(IntelIOMMUState *s, hwaddr addr)

get_quad_raw?

 +{
 +    return *((uint64_t *)&s->csr[addr]);
 +}
 +
 +static inline uint32_t __get_long(IntelIOMMUState *s, hwaddr addr)

get_long_raw?

 +{
 +    return *((uint32_t *)&s->csr[addr]);
 +}
 +
 +
 +/* val = (val & ~clear) | mask */
 +static inline uint32_t set_mask_long(IntelIOMMUState *s, hwaddr addr,

set_clear_long?

 +                                     uint32_t clear, uint32_t mask)
 +{
 +    uint32_t *ptr = (uint32_t *)&s->csr[addr];
 +    uint32_t val = (*ptr & ~clear) | mask;
 +    *ptr = val;
 +    return val;
 +}
 +
 +/* val = (val & ~clear) | mask */
 +static inline uint64_t set_mask_quad(IntelIOMMUState *s, hwaddr addr,

set_clear_quad?
 +                                     uint64_t clear, uint64_t mask)
 +{
 +    uint64_t *ptr = (uint64_t *)&s->csr[addr];
 +    uint64_t val = (*ptr & ~clear) | mask;
 +    *ptr = val;
 +    return val;
 +}
 +
 +

Re: [Qemu-devel] [PATCH 1/2] qemu-img: Allow source cache mode specification

2014-07-23 Thread Kevin Wolf

On 22.07.2014 at 22:06, Max Reitz wrote:
 On 21.07.2014 17:52, Eric Blake wrote:
 On 07/19/2014 02:35 PM, Max Reitz wrote:
 Many qemu-img subcommands only read the source file(s) once. For these
 use cases, a full write-back cache is unnecessary and mainly clutters
 host cache memory. Though this is generally no concern as cache memory
 is freely available and can be scaled by the host OS, it may become a
 concern with thin provisioning.
 
 For these cases, it makes sense to allow users to freely specify the
 source cache mode (e.g. use no cache at all).
 
 This commit adds a new switch (-T) for the qemu-img subcommands check,
 compare, convert and rebase to specify the cache to be used for source
 images (the backing file in case of rebase).
 What mnemonic did you have in mind when choosing -T? Or was it just a
 universally available letter for the subcommands you were touching?
 
 To be honest, I just didn't know what -t stands for. Therefore I
 just thought it might be remotely logical if the lower-cased letter
 is used for destination and the upper-cased letter for source
 caching.

Things might get a bit confusing there, though, because upper-case
often means the other image, i.e. destination or backing file, in other
commands (create -F, compare -F, convert -O and -B, rebase -F).

Of course, most of them are deprecated, so I wouldn't make that a reason
to block this series, but perhaps we should consider using more long
options instead of randomly assigning the letters that are still free.

Kevin

[Qemu-devel] [PATCH] icount: Fix virtual clock start value on ARM

2014-07-23 Thread Sebastian Tanase

When using the icount option on ARM, the virtual
clock starts counting at realtime clock but it
should start at 0.
This small fix addresses this issue.

Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr
---
 cpus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 5e7f2cf..de18ece 100644
--- a/cpus.c
+++ b/cpus.c
@@ -104,7 +104,7 @@ static bool all_cpu_threads_idle(void)
 
 /* Compensate for varying guest execution speed.  */
 static int64_t qemu_icount_bias;
-static int64_t vm_clock_warp_start;
+static int64_t vm_clock_warp_start = -1;
 /* Conversion factor from emulated instructions to virtual clock ticks.  */
 static int icount_time_shift;
 /* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
-- 
2.0.0.rc2

Re: [Qemu-devel] [PATCH] icount: Fix virtual clock start value on ARM

2014-07-23 Thread Paolo Bonzini

On 23/07/2014 11:11, Sebastian Tanase wrote:
 When using the icount option on ARM, the virtual
 clock starts counting at realtime clock but it
 should start at 0.
 This small fix addresses this issue.
 
 Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr

Thanks, this is ok for 2.2.

Paolo

 ---
  cpus.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/cpus.c b/cpus.c
 index 5e7f2cf..de18ece 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -104,7 +104,7 @@ static bool all_cpu_threads_idle(void)
  
  /* Compensate for varying guest execution speed.  */
  static int64_t qemu_icount_bias;
 -static int64_t vm_clock_warp_start;
 +static int64_t vm_clock_warp_start = -1;
  /* Conversion factor from emulated instructions to virtual clock ticks.  */
  static int icount_time_shift;
  /* Arbitrarily pick 1MIPS as the minimum allowable speed.  */

Re: [Qemu-devel] [PATCH] icount: Fix virtual clock start value on ARM

2014-07-23 Thread Andreas Färber

On 23.07.2014 11:16, Paolo Bonzini wrote:
 On 23/07/2014 11:11, Sebastian Tanase wrote:
 When using the icount option on ARM, the virtual
 clock starts counting at realtime clock but it
 should start at 0.
 This small fix addresses this issue.

 Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr
 
 Thanks, this is ok for 2.2.

Could we get an explanation (in the commit message) of why this fixes
that issue? :) By my reading -1 != 0.

Thanks,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH] icount: Fix virtual clock start value on ARM

2014-07-23 Thread Peter Maydell

On 23 July 2014 10:11, Sebastian Tanase sebastian.tan...@openwide.fr wrote:
 When using the icount option on ARM, the virtual
 clock starts counting at realtime clock but it
 should start at 0.
 This small fix addresses this issue.

 Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr
 ---
  cpus.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/cpus.c b/cpus.c
 index 5e7f2cf..de18ece 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -104,7 +104,7 @@ static bool all_cpu_threads_idle(void)

  /* Compensate for varying guest execution speed.  */
  static int64_t qemu_icount_bias;
 -static int64_t vm_clock_warp_start;
 +static int64_t vm_clock_warp_start = -1;
  /* Conversion factor from emulated instructions to virtual clock ticks.  */
  static int icount_time_shift;
  /* Arbitrarily pick 1MIPS as the minimum allowable speed.  */

Commit message says this is fixing an ARM bug but this is
a generic file. Is this actually a bug with wider scope than just
ARM?

thanks
-- PMM

Re: [Qemu-devel] [PATCH] icount: Fix virtual clock start value on ARM

2014-07-23 Thread Paolo Bonzini

On 23/07/2014 11:41, Peter Maydell wrote:
 On 23 July 2014 10:11, Sebastian Tanase sebastian.tan...@openwide.fr wrote:
 When using the icount option on ARM, the virtual
 clock starts counting at realtime clock but it
 should start at 0.
 This small fix addresses this issue.

 Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr
 ---
  cpus.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/cpus.c b/cpus.c
 index 5e7f2cf..de18ece 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -104,7 +104,7 @@ static bool all_cpu_threads_idle(void)

  /* Compensate for varying guest execution speed.  */
  static int64_t qemu_icount_bias;
 -static int64_t vm_clock_warp_start;
 +static int64_t vm_clock_warp_start = -1;
  /* Conversion factor from emulated instructions to virtual clock ticks.  */
  static int icount_time_shift;
  /* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
 
 Commit message says this is fixing an ARM bug but this is
 a generic file. Is this actually a bug with wider scope than just
 ARM?

Yes, see the discussion yesterday under Re: [RFC PATCH V4 6/6] monitor:
Add drift info to 'info jit' and Re: [RFC PATCH V4 0/6] icount:
Implement delay algorithm between guest and host clocks.

Paolo

[Qemu-devel] [PATCH v2] icount: Fix virtual clock start value on ARM

2014-07-23 Thread Sebastian Tanase

When using the icount option on ARM, the virtual
clock starts counting at realtime clock but it
should start at 0.

The reason why the virtual clock starts at realtime clock
is because the first time we call qemu_clock_warp (which
calls icount_warp_rt) in tcg_exec_all, qemu_icount_bias
(which is part of the virtual time computation mechanism)
will increment by realtime - vm_clock_warp_start, with
vm_clock_warp_start being 0 (see icount_warp_rt in cpus.c).

By changing the initial value of vm_clock_warp_start from 0 to -1,
the first call to qemu_clock_warp (which calls icount_warp_rt)
returns immediately, because icount_warp_rt first checks whether
vm_clock_warp_start is -1 and returns if it is. Therefore, qemu_icount_bias
will first be incremented by the value of a virtual timer
deadline when the virtual cpu goes from active to inactive.

The virtual time will start at 0 and increment based
on the instruction counter when the vcpu is active or
the qemu_icount_bias value when inactive.

Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr
---
 cpus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 5e7f2cf..de18ece 100644
--- a/cpus.c
+++ b/cpus.c
@@ -104,7 +104,7 @@ static bool all_cpu_threads_idle(void)
 
 /* Compensate for varying guest execution speed.  */
 static int64_t qemu_icount_bias;
-static int64_t vm_clock_warp_start;
+static int64_t vm_clock_warp_start = -1;
 /* Conversion factor from emulated instructions to virtual clock ticks.  */
 static int icount_time_shift;
 /* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
-- 
2.0.0.rc2
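
The effect described in the commit message can be reproduced with a deliberately simplified model. This is a sketch under stated assumptions, not the code in cpus.c: ToyIcount and toy_icount_warp_rt() are hypothetical, and the model keeps only the two details the commit message names, namely the early return when the warp start is -1 and the bias growing by realtime minus warp start.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduction of the state involved. */
typedef struct ToyIcount {
    int64_t bias;        /* stands in for qemu_icount_bias */
    int64_t warp_start;  /* stands in for vm_clock_warp_start, -1 = idle */
} ToyIcount;

/* Simplified warp step: do nothing unless a warp is in progress. */
static void toy_icount_warp_rt(ToyIcount *s, int64_t realtime_ns)
{
    if (s->warp_start == -1) {
        return;
    }
    s->bias += realtime_ns - s->warp_start;
    s->warp_start = -1;
}
```

With warp_start initialised to 0, the first call adds the whole realtime clock value to the bias; with -1, the first call leaves the bias at 0, so the virtual clock starts from 0.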

Re: [Qemu-devel] [RFC v4 03/13] hw/vfio/pci: Remove unneeded include files

2014-07-23 Thread Eric Auger

On 07/08/2014 08:55 PM, Alex Williamson wrote:
 On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
 Signed-off-by: Eric Auger eric.au...@linaro.org
 ---
  hw/vfio/pci.c | 12 
  1 file changed, 12 deletions(-)

 diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
 index 5c7bfd5..a7df3de 100644
 --- a/hw/vfio/pci.c
 +++ b/hw/vfio/pci.c
 @@ -18,26 +18,14 @@
   *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (m...@il.ibm.com)
   */
  
 -#include <dirent.h>
  #include <linux/vfio.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
 -#include <sys/stat.h>
 -#include <sys/types.h>
 -#include <unistd.h>
 -
 -#include "config.h"
  #include "exec/address-spaces.h"
 -#include "exec/memory.h"
  #include "hw/pci/msi.h"
  #include "hw/pci/msix.h"
 -#include "hw/pci/pci.h"
 -#include "qemu-common.h"
  #include "qemu/error-report.h"
 -#include "qemu/event_notifier.h"
 -#include "qemu/queue.h"
  #include "qemu/range.h"
 -#include "sysemu/kvm.h"
  #include "sysemu/sysemu.h"
  #include "hw/vfio/vfio.h"
 
 Was this just a remove and see if it still compiles exercise?  I'm not
 sure I'm a fan of removing includes that are arbitrarily included via
 another include chain.  Thanks,

Hi Alex.

Sorry for the delay, coming back from vacation period...

Yes it was a lazy way to sort things out for PCI/platform split.

Then I will drop that patch file.

Besides, some system includes might be removed thanks to the inclusion
of qemu-common.h, which sounds stable/reliable? dirent.h as well? Anyway
it does not help in any way for my matters.

Best Regards

Eric
 
 Alex

Re: [Qemu-devel] [RFC v4 04/13] hw/vfio/pci: introduce VFIODevice

2014-07-23 Thread Peter Maydell

On 23 July 2014 11:02, Eric Auger eric.au...@linaro.org wrote:
 On 07/09/2014 12:41 AM, Alex Williamson wrote:
 On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
 +    vdev->vbasedev.ops = &vfio_pci_ops;
 +
 +    vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
 +    vdev->vbasedev.name = g_malloc0(PATH_MAX);
 +    snprintf(vdev->vbasedev.name, PATH_MAX, "%04x:%02x:%02x.%01x",
 +             vdev->host.domain, vdev->host.bus, vdev->host.slot,
 +             vdev->host.function);
 +

 asprintf(3)?  This is a deterministic length, so PATH_MAX is especially
 ridiculous.
 agreed, will use asprintf instead.

A minor nit given this is going to be in Linux-only
code, but we generally prefer g_strdup_printf() over
raw asprintf() (they do the same thing, but the glib
function is guaranteed to be present everywhere,
and the returned memory is freeable with g_free()
like most of our strings, rather than needing to remember
that it needs to be freed via free().)

thanks
-- PMM
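
For comparison with the PATH_MAX buffer quoted above, here is a sketch of the allocate-exactly-what-is-needed pattern. In QEMU code the suggestion is simply g_strdup_printf(); so that the example is self-contained and needs no glib, it uses a hypothetical stand-in, toy_strdup_printf(), built on the standard vsnprintf(), and toy_pci_name() is likewise invented for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Format into a freshly allocated buffer of exactly the right size.
 * Returns NULL on formatting or allocation failure; the caller frees the
 * result with free().
 */
static char *toy_strdup_printf(const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf = NULL;
    int len;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap);   /* first pass: measure */
    va_end(ap);

    if (len >= 0) {
        buf = malloc((size_t)len + 1);
        if (buf != NULL) {
            vsnprintf(buf, (size_t)len + 1, fmt, ap2);  /* second pass */
        }
    }
    va_end(ap2);
    return buf;
}

/* Build a "dddd:bb:ss.f" PCI address string, as in the quoted hunk. */
static char *toy_pci_name(unsigned domain, unsigned bus,
                          unsigned slot, unsigned function)
{
    return toy_strdup_printf("%04x:%02x:%02x.%01x",
                             domain, bus, slot, function);
}
```

The stand-in returns memory that must be released with free(); the point made above is that a string from g_strdup_printf() is released with g_free() like most other strings in the code base.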

Re: [Qemu-devel] [PATCH v2 2/3] tap-bsd: implement a FreeBSD only version of tap_open

2014-07-23 Thread Stefan Hajnoczi

On Tue, Jul 22, 2014 at 01:26:00PM +0100, Stefano Stabellini wrote:
 On Tue, 22 Jul 2014, Roger Pau Monné wrote:
  On 27/05/14 15:29, Stefan Hajnoczi wrote:
   On Fri, May 23, 2014 at 05:57:48PM +0200, Roger Pau Monne wrote:
   The current behaviour of tap_open for BSD systems differs greatly from
   its Linux counterpart. Since FreeBSD supports interface renaming and
   tap device cloning by opening /dev/tap, implement a FreeBSD specific
   version of tap_open that behaves like its Linux counterpart.
  
   This is especially important for toolstacks that use Qemu (like Xen
   libxl), in order to have a unified behaviour across supported
   platforms.
  
   Signed-off-by: Roger Pau Monné roger@citrix.com
   Cc: xen-de...@lists.xenproject.org
   Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
   Cc: Anthony Liguori aligu...@us.ibm.com
   Cc: Stefan Hajnoczi stefa...@redhat.com
   ---
net/tap-bsd.c |   70 
   -
1 files changed, 69 insertions(+), 1 deletions(-)
   
   Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
  
  I still don't see this committed to the repository, should I ping someone?
 
 I was assuming that this patch would go via some other tree.  But if
 Stefan is OK I could pick it up and submit a pull request for both patch
 1 and 2 of this series.

I'm fine with that.  I only reviewed this patch since it affects the net
subsystem and left the series as a whole for someone to merge.

Stefan



[Qemu-devel] Fix a bug in debug printing of memory translation tables

2014-07-23 Thread Mikhail Ilin


Hi,

I've enabled DEBUG_MMAP in linux-user/mmap.c and got debug info of memory
layout.

This is the debug output of guest memory layout from qemu (including
the last mmap call marked with *).

mmap: start=0x0804a000 len=0x00021000 prot=rw- flags=MAP_ANON 
MAP_PRIVATE fd=0 offset=

ret=0x0804a000
  start    end      size     prot
  00048000-00049000 1000 r-x
* 00049000-0006b000 00022000 rw-
  002f6400-002f7400 1000 rw-
  002f7400-003ff400 00108000 r-x
  003ff400-003ff400  r--
  003ff400-003f6400 7000 rw-
  003fe400-003ff400 1000 rw-
  003ff400-003ff400  r-x
  003ff400-003fe400 f000 r--
  003fe400-003ff400 1000 rw-
  003ff400-000f6800 ffcf7400 ---
  000f6800-000f7000 0800 rw-

It looks completely insane, with weird records where the start is bigger
than the end, the size is apparently negative, and in general all addresses
fall outside the expected ranges.

I found a bug in the function that textualizes the memory translation tables
and fixed it. Now I have the following output:

mmap: start=0x0804a000 len=0x00021000 prot=rw- flags=MAP_ANON 
MAP_PRIVATE fd=0 offset=

ret=0x0804a000
  start    end      size     prot
  08048000-08049000 00001000 r-x
* 08049000-0806b000 00022000 rw-
  f6612000-f6615000 00003000 rw-
  f6615000-f67bb000 001a6000 r-x
  f67bb000-f67bd000 00002000 r--
  f67bd000-f67c2000 00005000 rw-
  f67da000-f67dd000 00003000 rw-
  f67dd000-f67fd000 00020000 r-x
  f67fd000-f67fe000 00001000 r--
  f67fe000-f67ff000 00001000 rw-
  f67ff000-f6800000 00001000 ---

This looks much better.

From 297045c6e7da0089c6ea4ee271000c507c5a8bf8 Mon Sep 17 00:00:00 2001
From: Mikhail Ilyin m.i...@samsung.com
Date: Wed, 23 Jul 2014 13:06:15 +0400
Subject: [PATCH] Fix a bug in debug printing of memory translation tables.

Signed-off-by: Mikhail Ilyin m.i...@samsung.com
---
 translate-all.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 8f7e11b..cb7a33d 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1728,9 +1728,8 @@ int walk_memory_regions(void *priv, walk_memory_regions_fn fn)
     data.prot = 0;
 
     for (i = 0; i < V_L1_SIZE; i++) {
-        int rc = walk_memory_regions_1(&data, (abi_ulong)i << V_L1_SHIFT,
+        int rc = walk_memory_regions_1(&data, (abi_ulong)i << (V_L1_SHIFT + TARGET_PAGE_BITS),
                                        V_L1_SHIFT / V_L2_BITS - 1, l1_map + i);
-
         if (rc != 0) {
             return rc;
         }
--
1.9.1
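
As I read the patch, the table is indexed by page number, so turning a level-1 index back into an address needs the page-offset bits as well as the level-1 shift. The sketch below shows the difference with made-up constants: TOY_PAGE_BITS and TOY_L1_SHIFT are assumptions chosen for the example, not the values QEMU computes for any target, and the toy_ functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_BITS 12 /* assumed: 4 KiB pages */
#define TOY_L1_SHIFT  10 /* assumed: page-index bits resolved below level 1 */

/* Level-1 index of the page containing addr. */
static unsigned toy_l1_index(uint32_t addr)
{
    return (unsigned)((addr >> TOY_PAGE_BITS) >> TOY_L1_SHIFT);
}

/* Old computation: the result is a page number, not an address. */
static uint32_t toy_l1_base_buggy(unsigned i)
{
    return (uint32_t)i << TOY_L1_SHIFT;
}

/* Fixed computation: also shift by the page-offset bits. */
static uint32_t toy_l1_base_fixed(unsigned i)
{
    return (uint32_t)i << (TOY_L1_SHIFT + TOY_PAGE_BITS);
}
```

With these constants, address 0x08048000 falls in level-1 slot 0x20; the fixed formula maps that slot back to base 0x08000000, while the old one yields 0x00008000, which is the kind of too-small address seen in the first dump.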

[Qemu-devel] QEMU and other libusb application cause segfaults in libusb

2014-07-23 Thread Erik Rull

Hi all,

I post this to both QEMU and libusb because I'm not sure where the error could
be located.

I have an application using libusb which is running for months without any
issues on the host system.

When I start QEMU - which uses libusb, too - the errors begin. I route some USB
ports to my QEMU-KVM guest but NOT the port where my hardware is attached.
From time to time I start receiving things from my own application that are
never sent by my USB hardware, it seems to be more a heap of memory coming from
somewhere else. And sometimes the libusb gets segfaulted in
libusb_handle_events_timeout_completed().

As long as my application is running alone everything is fine. And there is no
dmesg output that processes are fighting for my device.

When I exit QEMU early enough, the application stays alive and remains stable.

Any hints how to prevent that would be appreciated.

My system is an i5 CPU with a vanilla kernel 3.4.67.

Thanks.

Best regards,

Erik

Re: [Qemu-devel] [RFC v4 04/13] hw/vfio/pci: introduce VFIODevice

2014-07-23 Thread Eric Auger

On 07/23/2014 12:24 PM, Peter Maydell wrote:
 On 23 July 2014 11:02, Eric Auger eric.au...@linaro.org wrote:
 On 07/09/2014 12:41 AM, Alex Williamson wrote:
 On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote:
 +    vdev->vbasedev.ops = &vfio_pci_ops;
 +
 +    vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
 +    vdev->vbasedev.name = g_malloc0(PATH_MAX);
 +    snprintf(vdev->vbasedev.name, PATH_MAX, "%04x:%02x:%02x.%01x",
 +             vdev->host.domain, vdev->host.bus, vdev->host.slot,
 +             vdev->host.function);
 +

 asprintf(3)?  This is a deterministic length, so PATH_MAX is especially
 ridiculous.
 agreed, will use asprintf instead.
 
 A minor nit given this is going to be in Linux-only
 code, but we generally prefer g_strdup_printf() over
 raw asprintf() (they do the same thing, but the glib
 function is guaranteed to be present everywhere,
 and the returned memory is freeable with g_free()
 like most of our strings, rather than needing to remember
 that it needs to be freed via free().)

Hi Peter,

thanks. this is noted.

BR

Eric
 
 thanks
 -- PMM

Re: [Qemu-devel] [PATCH for-2.1] docs: document remaining QMP events

2014-07-23 Thread Eric Blake

On 07/23/2014 01:25 AM, Markus Armbruster wrote:
 Eric Blake ebl...@redhat.com writes:
 
 Commit dfab4892 restored this file, but did not address any of the
 grammar problems that had been fixed in passing when moving events
 out of this file.  There are also a couple events that were
 undocumented since introduction, and one that had been added only
 in the time that this file was temporarily deleted.

 -SPICE_CONNECTED, SPICE_DISCONNECTED
 ------------------------------------
 +SPICE_CONNECTED
 +---------------

 -Emitted when a SPICE client connects or disconnects.
 +Emitted when a SPICE client connects.
 
 Wording doesn't match qapi-event.json exactly.  I doubt we care.

Not the only place where they don't match.  And I personally don't care :)



 +SPICE_INITIALIZED
 
 Another SPICE_INITIALIZED?  Do you mean SPICE_MIGRATE_COMPLETED?
 

Copy-and-paste strikes again. Yes, I'll fix that.


 
 Assuming you do mean SPICE_MIGRATE_COMPLETED: list is complete now.
 
 Would you mind splitting this patch?
 
 * Either one patch per undocumented event (if you want to be nice to
   downstreams cherry-picking events), or one patch for all of them.
 
 * One patch for the rest.  Or if you feel generous, two: one for the
   grammar fixes, one for the spice split.

v2 coming up as a full series.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH] target-i386/FPU: wrong conversion infinity from float80 to int32/int64

2014-07-23 Thread Dmitry Poletaev

14.07.2014, 18:59, Peter Maydell peter.mayd...@linaro.org:

  Since softfloat's status flags are sticky ...

What does it mean?
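
For context: in IEEE 754 usage, "sticky" status flags are flags that an operation may set but never clears, so they accumulate until software resets them explicitly. A minimal sketch of that behaviour follows. ToyFloatStatus, toy_float_raise() and toy_to_int32() are hypothetical names for this example and do not reproduce softfloat's real types; the conversion truncates toward zero for simplicity, and the out-of-range result mirrors the x86 "integer indefinite" value.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

enum {
    TOY_FLAG_INVALID = 1,
    TOY_FLAG_INEXACT = 2,
};

/* Hypothetical status word shared by a sequence of operations. */
typedef struct ToyFloatStatus {
    uint8_t exception_flags;
} ToyFloatStatus;

/* Raising a flag only ever ORs bits in; nothing here clears them. */
static void toy_float_raise(ToyFloatStatus *st, uint8_t flags)
{
    st->exception_flags |= flags;
}

/* Toy conversion: NaN, infinities and out-of-range values are invalid. */
static int32_t toy_to_int32(double x, ToyFloatStatus *st)
{
    if (!(x >= -2147483648.0 && x <= 2147483647.0)) {
        toy_float_raise(st, TOY_FLAG_INVALID);
        return INT32_MIN; /* 0x80000000, the "integer indefinite" pattern */
    }
    return (int32_t)x; /* truncates toward zero */
}
```

The practical consequence is that a caller who wants to know whether one particular operation raised a flag has to clear the flags before that operation and inspect them right after it; a flag left over from an earlier operation looks exactly the same.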

[Qemu-devel] [PATCH v3] docs/multiple-iothreads.txt: add documentation on IOThread programming

2014-07-23 Thread Stefan Hajnoczi

This document explains how IOThreads and the main loop are related,
especially how to write code that can run in an IOThread.  Currently
only virtio-blk-data-plane uses these techniques.  The next obvious
target is virtio-scsi; there has also been work on virtio-net.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 docs/multiple-iothreads.txt | 134 
 1 file changed, 134 insertions(+)
 create mode 100644 docs/multiple-iothreads.txt

diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt
new file mode 100644
index 0000000..40b8419
--- /dev/null
+++ b/docs/multiple-iothreads.txt
@@ -0,0 +1,134 @@
+Copyright (c) 2014 Red Hat Inc.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.  See
+the COPYING file in the top-level directory.
+
+
+This document explains the IOThread feature and how to write code that runs
+outside the QEMU global mutex.
+
+The main loop and IOThreads
+---------------------------
+QEMU is an event-driven program that can do several things at once using an
+event loop.  The VNC server and the QMP monitor are both processed from the
+same event loop, which monitors their file descriptors until they become
+readable and then invokes a callback.
+
+The default event loop is called the main loop (see main-loop.c).  It is
+possible to create additional event loop threads using -object
+iothread,id=my-iothread.
+
+Side note: The main loop and IOThread are both event loops but their code is
+not shared completely.  Sometimes it is useful to remember that although they
+are conceptually similar they are currently not interchangeable.
+
+Why IOThreads are useful
+------------------------
+IOThreads allow the user to control the placement of work.  The main loop is a
+scalability bottleneck on hosts with many CPUs.  Work can be spread across
+several IOThreads instead of just one main loop.  When set up correctly this
+can improve I/O latency and reduce jitter seen by the guest.
+
+The main loop is also deeply associated with the QEMU global mutex, which is a
+scalability bottleneck in itself.  vCPU threads and the main loop use the QEMU
+global mutex to serialize execution of QEMU code.  This mutex is necessary
+because a lot of QEMU's code historically was not thread-safe.
+
+The fact that all I/O processing is done in a single main loop and that the
+QEMU global mutex is contended by all vCPU threads and the main loop explains
+why it is desirable to place work into IOThreads.
+
+The experimental virtio-blk data-plane implementation has been benchmarked and
+shows these effects:
+ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
+
+How to program for IOThreads
+----------------------------
+The main difference between legacy code and new code that can run in an
+IOThread is dealing explicitly with the event loop object, AioContext
+(see include/block/aio.h).  Code that only works in the main loop
+implicitly uses the main loop's AioContext.  Code that supports running
+in IOThreads must be aware of its AioContext.
+
+AioContext supports the following services:
+ * File descriptor monitoring (read/write/error on POSIX hosts)
+ * Event notifiers (inter-thread signalling)
+ * Timers
+ * Bottom Halves (BH) deferred callbacks
+
+There are several old APIs that use the main loop AioContext:
+ * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
+ * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
+ * LEGACY timer_new_ms() - create a timer
+ * LEGACY qemu_bh_new() - create a BH
+ * LEGACY qemu_aio_wait() - run an event loop iteration
+
+Since they implicitly work on the main loop they cannot be used in code that
+runs in an IOThread.  They might cause a crash or deadlock if called from an
+IOThread since the QEMU global mutex is not held.
+
+Instead, use the AioContext functions directly (see include/block/aio.h):
+ * aio_set_fd_handler() - monitor a file descriptor
+ * aio_set_event_notifier() - monitor an event notifier
+ * aio_timer_new() - create a timer
+ * aio_bh_new() - create a BH
+ * aio_poll() - run an event loop iteration
+
+The AioContext can be obtained from the IOThread using
+iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
+Code that takes an AioContext argument works both in IOThreads or the main
+loop, depending on which AioContext instance the caller passes in.
+
+How to synchronize with an IOThread
+---
+AioContext is not thread-safe so some rules must be followed when using file
+descriptors, event notifiers, timers, or BHs across threads:
+
+1. AioContext functions can be called safely from file descriptor, event
+notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
+necessary.
+
+2. Other threads wishing to access the AioContext must use
+aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
+context is acquired no other thread can

[Qemu-devel] [PATCH v2 for-2.1 0/5] docs: document remaining QMP events

2014-07-23 Thread Eric Blake

diff from v1:
 split into series [Markus]
 fix SPICE_MIGRATE_COMPLETE typo [Markus]

Eric Blake (5):
  docs: grammar fixes to qmp-events
  docs: split SPICE_* event docs
  docs: document missing SPICE_MIGRATE_COMPLETED event
  docs: document missing POWERDOWN event
  docs: document missing VSERPORT_CHANGE event

 docs/qmp/qmp-events.txt | 80 +
 1 file changed, 74 insertions(+), 6 deletions(-)

-- 
1.9.3

[Qemu-devel] [PATCH v2 for-2.1 3/5] docs: document missing SPICE_MIGRATE_COMPLETED event

2014-07-23 Thread Eric Blake

The SPICE_MIGRATE_COMPLETED event was first documented in
7cfadb6b.  But since dfab4892 later restored this flie to the
state prior to qmp events, and we never documented it in the
past, anyone using this file instead of qapi will miss out on
this event.

* docs/qmp/qmp-events.txt (SPICE_MIGRATE_COMPLETED): Add.

Signed-off-by: Eric Blake ebl...@redhat.com
---
 docs/qmp/qmp-events.txt | 13 +
 1 file changed, 13 insertions(+)

diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
index 9b7ee7c..22d552f 100644
--- a/docs/qmp/qmp-events.txt
+++ b/docs/qmp/qmp-events.txt
@@ -424,6 +424,19 @@ Example:
   channel-id: 0, tls: true}
 }}

+SPICE_MIGRATE_COMPLETED
+-----------------------
+
+Emitted when SPICE migration has completed
+
+Data: None.
+
+Example:
+
+{ "timestamp": {"seconds": 1290688046, "microseconds": 417172},
+  "event": "SPICE_MIGRATE_COMPLETED" }
+
+
 STOP
 

-- 
1.9.3

[Qemu-devel] [PATCH v2 for-2.1 4/5] docs: document missing POWERDOWN event

2014-07-23 Thread Eric Blake

The POWERDOWN event was first documented in 0aab9ec3.  But since
dfab4892 later restored this file to the state prior to qmp events,
and we never documented it in the past, anyone using this file
instead of qapi will miss out on this event.  Tweak the existing
wording of SHUTDOWN to match 84321831, and make the difference
between the two events apparent.

* docs/qmp/qmp-events.txt (POWERDOWN): Add.
(SHUTDOWN): Tweak.

Signed-off-by: Eric Blake ebl...@redhat.com
---
 docs/qmp/qmp-events.txt | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
index 22d552f..9d7439e 100644
--- a/docs/qmp/qmp-events.txt
+++ b/docs/qmp/qmp-events.txt
@@ -243,6 +243,19 @@ Data:
   timestamp: { seconds: 1368697518, microseconds: 326866 } }
 }

+POWERDOWN
+---------
+
+Emitted when the Virtual Machine is powered down through the power
+control system, such as via ACPI.
+
+Data: None.
+
+Example:
+
+{ "event": "POWERDOWN",
+    "timestamp": { "seconds": 1267040730, "microseconds": 682951 } }
+
 QUORUM_FAILURE
 --

@@ -325,7 +338,8 @@ Example:
 SHUTDOWN
 

-Emitted when the Virtual Machine is powered down.
+Emitted when the Virtual Machine has shut down, indicating that qemu
+is about to exit.

 Data: None.

-- 
1.9.3

[Qemu-devel] [PATCH v2 for-2.1 5/5] docs: document missing VSERPORT_CHANGE event

2014-07-23 Thread Eric Blake

The VSERPORT_CHANGE event was added in e2ae6159.  The patch for
this event was prepared at a time when this file was gone, even
though it got applied immediately after dfab4892 restored this
file.  Duplicate the documentation into this file, so that
anyone using this file instead of qapi will not miss out on this
new event.

* docs/qmp/qmp-events.txt (VSERPORT_CHANGE): Add.

Signed-off-by: Eric Blake ebl...@redhat.com
---
 docs/qmp/qmp-events.txt | 16 
 1 file changed, 16 insertions(+)

diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
index 9d7439e..d759d19 100644
--- a/docs/qmp/qmp-events.txt
+++ b/docs/qmp/qmp-events.txt
@@ -579,6 +579,22 @@ Example:
 host: 127.0.0.1, sasl_username: luiz } },
 timestamp: { seconds: 1263475302, microseconds: 150772 } }

+VSERPORT_CHANGE
+---------------
+
+Emitted when the guest opens or closes a virtio-serial port.
+
+Data:
+
+- "id": device identifier of the virtio-serial port (json-string)
+- "open": true if the guest has opened the virtio-serial port (json-bool)
+
+Example:
+
+{ "event": "VSERPORT_CHANGE",
+  "data": { "id": "channel0", "open": true },
+  "timestamp": { "seconds": 1401385907, "microseconds": 422329 } }
+
 WAKEUP
 --

-- 
1.9.3

[Qemu-devel] [PATCH v2 for-2.1 1/5] docs: grammar fixes to qmp-events

2014-07-23 Thread Eric Blake

When converting to qmp events, commits 7cfadb6b and a6330785
fixed some grammar as part of moving text between files.  But
since dfab4892 later restored this file to the state prior to
qmp events, we have to do it again.

* docs/qmp/qmp-events.txt (RESET, SPICE_INITIALIZED): Tweak.

Signed-off-by: Eric Blake ebl...@redhat.com
---
 docs/qmp/qmp-events.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
index 4a6c2a2..524eadf 100644
--- a/docs/qmp/qmp-events.txt
+++ b/docs/qmp/qmp-events.txt
@@ -285,7 +285,7 @@ Example:
 RESET
 -

-Emitted when the Virtual Machine is reseted.
+Emitted when the Virtual Machine is reset.

 Data: None.

@@ -366,7 +366,7 @@ SPICE_INITIALIZED
 -

 Emitted after initial handshake and authentication takes place (if any)
-and the SPICE channel is up'n'running
+and the SPICE channel is up and running

 Data:

-- 
1.9.3

[Qemu-devel] [PATCH v2 for-2.1 2/5] docs: split SPICE_* event docs

2014-07-23 Thread Eric Blake

For consistency with the rest of this file, every event should be
listed in isolation.  Compare how commit 7cfadb6b split
SPICE_CONNECTED and SPICE_DISCONNECTED into separate qmp events.

* docs/qmp/qmp-events.txt (SPICE_CONNECTED, SPICE_DISCONNECTED):
Split.

Signed-off-by: Eric Blake ebl...@redhat.com
---
 docs/qmp/qmp-events.txt | 31 ---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
index 524eadf..9b7ee7c 100644
--- a/docs/qmp/qmp-events.txt
+++ b/docs/qmp/qmp-events.txt
@@ -337,10 +337,10 @@ Example:
 Note: If the command-line option -no-shutdown has been specified, a STOP
 event will eventually follow the SHUTDOWN event.

-SPICE_CONNECTED, SPICE_DISCONNECTED
------------------------------------
+SPICE_CONNECTED
+---------------

-Emitted when a SPICE client connects or disconnects.
+Emitted when a SPICE client connects.

 Data:

@@ -362,6 +362,31 @@ Example:
 client: {port: 52873, family: ipv4, host: 127.0.0.1}
 }}

+SPICE_DISCONNECTED
+------------------
+
+Emitted when a SPICE client disconnects.
+
+Data:
+
+- "server": Server information (json-object)
+  - "host": IP address (json-string)
+  - "port": port number (json-string)
+  - "family": address family (json-string, "ipv4" or "ipv6")
+- "client": Client information (json-object)
+  - "host": IP address (json-string)
+  - "port": port number (json-string)
+  - "family": address family (json-string, "ipv4" or "ipv6")
+
+Example:
+
+{ "timestamp": {"seconds": 1290688046, "microseconds": 388707},
+  "event": "SPICE_DISCONNECTED",
+  "data": {
+    "server": { "port": "5920", "family": "ipv4", "host": "127.0.0.1"},
+    "client": {"port": "52873", "family": "ipv4", "host": "127.0.0.1"}
+}}
+
 SPICE_INITIALIZED
 -

-- 
1.9.3

Re: [Qemu-devel] [PATCH v3] docs/multiple-iothreads.txt: add documentation on IOThread programming

2014-07-23 Thread Eric Blake

On 07/23/2014 05:55 AM, Stefan Hajnoczi wrote:
 This document explains how IOThreads and the main loop are related,
 especially how to write code that can run in an IOThread.  Currently
 only virtio-blk-data-plane uses these techniques.  The next obvious
 target is virtio-scsi; there has also been work on virtio-net.
 
 Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
 ---

Would have been nice to explain the diff to v2...

  docs/multiple-iothreads.txt | 134 
 
  1 file changed, 134 insertions(+)
  create mode 100644 docs/multiple-iothreads.txt
 

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH] target-i386/FPU: wrong conversion infinity from float80 to int32/int64

2014-07-23 Thread Peter Maydell

On 23 July 2014 12:55, Dmitry Poletaev poletaev-q...@yandex.ru wrote:
 14.07.2014, 18:59, Peter Maydell peter.mayd...@linaro.org:

  Since softfloat's status flags are sticky ...

 What does it mean?

Sticky here means that the status flags accumulate the
status from a sequence of operations: a softfloat function
will set the flag if the relevant exception occurred, but if
the exceptional condition did not happen then the flag will
be left at whatever its preceding value was. So you can't
just say if the flag is set then the last operation I did set
it, because it might have been set by some operation
before that. (That is, once a bit gets set in the flags word
it sticks and doesn't go away.)

This matches the IEEE mandated behaviour for
floating point exception flags, which is why we do it.

thanks
-- PMM
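
A minimal sketch of the "sticky" behaviour described above, using a toy
status word rather than softfloat itself. The names fp_status, FLAG_INVALID,
to_int32() and convert_and_check() are hypothetical and exist only for this
illustration. The point is the save/clear/test/merge pattern: to learn
whether one particular operation raised a flag, clear the flags first, run
the operation, test, and then OR the saved flags back in so nothing
accumulated earlier is lost.

```c
#include <stdint.h>

/* Hypothetical flag bits and status word; softfloat's real names differ. */
enum { FLAG_INVALID = 1u << 0, FLAG_INEXACT = 1u << 1 };

typedef struct {
    unsigned flags;   /* sticky: operations only ever OR bits in */
} fp_status;

/* Toy conversion: sets flags on error, never clears them. */
int32_t to_int32(double x, fp_status *s)
{
    /* NaN, infinity and out-of-range values raise "invalid". */
    if (x != x || x >= 2147483648.0 || x <= -2147483649.0) {
        s->flags |= FLAG_INVALID;
        return INT32_MIN;
    }
    int32_t r = (int32_t)x;          /* truncates toward zero */
    if ((double)r != x) {
        s->flags |= FLAG_INEXACT;
    }
    return r;
}

/* Report whether THIS conversion was invalid, then merge old flags back. */
int convert_and_check(double x, fp_status *s, int32_t *out)
{
    unsigned saved = s->flags;       /* remember what was already set */
    int invalid;

    s->flags = 0;                    /* start from a clean slate */
    *out = to_int32(x, s);
    invalid = (s->flags & FLAG_INVALID) != 0;
    s->flags |= saved;               /* keep the flags sticky for callers */
    return invalid;
}
```

Calling to_int32() with an out-of-range value and then with 42.0 leaves
FLAG_INVALID set after the second call, which is why testing the flag alone
cannot tell you which operation set it.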

[Qemu-devel] [RFC 3/3] QMP: extend BLOCK_IO_ERROR event with no-space indicator

2014-07-23 Thread Luiz Capitulino

Management software, such as OpenStack and RHEV's vdsm, wants to be able
to allocate disk space on demand. The basic use case is to start a VM
with a small disk and then the disk is enlarged when QEMU hits an ENOSPC
condition.

To this end, the management software has to be notified when QEMU
encounters ENOSPC. The solution implemented by this commit is simple:
it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true
when QEMU is stopped due to ENOSPC.

Note that support for querying this event is already present in
query-block by means of the 'io-status' key and that the new 'nospace'
BLOCK_IO_ERROR field shares the same semantics with 'io-status',
which basically means that werror= has to be set to either
'stop' or 'enospc'.

Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---
 block.c  | 22 ++
 qapi/block-core.json |  7 ++-
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index 8cf519b..566ef56 100644
--- a/block.c
+++ b/block.c
@@ -3596,6 +3596,18 @@ BlockErrorAction bdrv_get_error_action(BlockDriverState *bs, bool is_read, int e
 }
 }
 
+static void send_qmp_error_event(BlockDriverState *bs,
+                                 BlockErrorAction action,
+                                 bool is_read, int error)
+{
+    IoOperationType ac;
+
+    ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
+    qapi_event_send_block_io_error(bdrv_get_device_name(bs), ac, action,
+                                   bdrv_iostatus_is_enabled(bs),
+                                   error == ENOSPC, &error_abort);
+}
+
 /* This is done by device models because, while the block layer knows
  * about the error, it does not know whether an operation comes from
  * the device or the block layer (from a job, for example).
@@ -3621,16 +3633,10 @@ void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action,
  * also ensures that the STOP/RESUME pair of events is emitted.
  */
 qemu_system_vmstop_request_prepare();
-qapi_event_send_block_io_error(bdrv_get_device_name(bs),
-   is_read ? IO_OPERATION_TYPE_READ :
-   IO_OPERATION_TYPE_WRITE,
-   action, error_abort);
+send_qmp_error_event(bs, action, is_read, error);
 qemu_system_vmstop_request(RUN_STATE_IO_ERROR);
 } else {
-qapi_event_send_block_io_error(bdrv_get_device_name(bs),
-   is_read ? IO_OPERATION_TYPE_READ :
-   IO_OPERATION_TYPE_WRITE,
-   action, error_abort);
+send_qmp_error_event(bs, action, is_read, error);
 }
 }
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1069679..d659165 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1534,6 +1534,11 @@
 #
 # @action: action that has been taken
 #
+# @nospace: #optional true if I/O error was caused due to a no-space
+#   condition. This key is only present if query-block's
+#   io-status is present, please see query-block documentation
+#   for more information (since: 2.2)
+#
 # Note: If action is stop, a STOP event will eventually follow the
 # BLOCK_IO_ERROR event
 #
@@ -1541,7 +1546,7 @@
 ##
 { 'event': 'BLOCK_IO_ERROR',
   'data': { 'device': 'str', 'operation': 'IoOperationType',
-'action': 'BlockErrorAction' } }
+'action': 'BlockErrorAction', '*nospace': 'bool' } }
 
 ##
 # @BLOCK_JOB_COMPLETED
-- 
1.9.3

[Qemu-devel] [RFC 0/3] QMP: extend BLOCK_IO_ERROR event

2014-07-23 Thread Luiz Capitulino


Management software, such as OpenStack and RHEV's vdsm, wants to be able
to allocate VM disk space on demand. The basic use case is to start a VM
with a small disk and then the disk is enlarged when QEMU hits an ENOSPC
condition.

To this end, the management software has to be notified when QEMU
encounters ENOSPC. The most straightforward solution is to extend QMP's
BLOCK_IO_ERROR event with that information.

This series does exactly that. The approach taken is the simplest possible:
the BLOCK_IO_ERROR event is extended to contain a nospace key, which
will be true whenever the guest runs out of space *and* werror=stop|enospc.
Here's an example:

{ "event": "BLOCK_IO_ERROR",
    "data": { "device": "ide0-hd1",
              "operation": "write",
              "action": "stop",
              "nospace": true },
    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
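
As a rough model of the semantics described above (not the code in this
series), the sketch below shows how the werror= policy and the errno could
combine into the event fields, and what a management application might
test before enlarging the disk. All type and function names here are
hypothetical; the only things taken from the series are that nospace is
true for ENOSPC and that the key is only present when io-status is enabled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the werror= policy and the resulting action. */
typedef enum {
    WERROR_REPORT, WERROR_IGNORE, WERROR_ENOSPC, WERROR_STOP
} werror_policy;

typedef enum { ACTION_REPORT, ACTION_IGNORE, ACTION_STOP } error_action;

typedef struct {
    error_action action;
    bool has_nospace;   /* optional key: present only with io-status enabled */
    bool nospace;
} io_error_event;

/* werror=enospc stops only on ENOSPC; werror=stop stops on any error. */
error_action action_for(werror_policy policy, int error)
{
    switch (policy) {
    case WERROR_ENOSPC:
        return error == ENOSPC ? ACTION_STOP : ACTION_REPORT;
    case WERROR_STOP:
        return ACTION_STOP;
    case WERROR_IGNORE:
        return ACTION_IGNORE;
    default:
        return ACTION_REPORT;
    }
}

io_error_event make_event(werror_policy policy, bool iostatus_enabled,
                          int error)
{
    io_error_event ev;

    ev.action = action_for(policy, error);
    ev.has_nospace = iostatus_enabled;
    ev.nospace = (error == ENOSPC);
    return ev;
}

/* What a management application might check before enlarging the disk. */
bool should_grow_disk(const io_error_event *ev)
{
    return ev->action == ACTION_STOP && ev->has_nospace && ev->nospace;
}
```

With werror=enospc, an EIO error yields a "report" action and the check
returns false; only a stopped guest with nospace set to true asks for more
space.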

There are three important things to observe:

 1. query-block already supports querying the event by means of the
    io-status key. Actually, the nospace and io-status keys share
    the same semantics. This is a big advantage of this approach: no
    further extension of query-block is needed.

 2. The event could also contain an error message key for debugging.
    But if we add it to the event, should we add it to query-block too?

 3. I'm not extending BLOCK_JOB_ERROR. The reason is that it seems
    that BLOCK_IO_ERROR is also emitted on BLOCK_JOB_ERROR.

Now, this series is an RFC because there's an alternative solution for
this problem: instead of extending the BLOCK_IO_ERROR event with a no-space
indicator, we could have a stringified errno. This way management apps
would also be able to distinguish among other errors.

For example, we could have an error-details dict containing a
reason and a message key:

{ "event": "BLOCK_IO_ERROR",
    "data": { "device": "ide0-hd1",
              "operation": "write",
              "action": "stop",
              "error-details": { "reason": "eio", "message": "I/O error" } },
    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }

And then query-block would have to be extended to contain the same
information.

IMO, this series' implementation is good enough for the requirement we
currently have, but I'm open to a more complex approach if needed.

Luiz Capitulino (3):
  qapi: block-core.json: improve query-block doc
  QMP: rate limit BLOCK_IO_ERROR
  QMP: extend BLOCK_IO_ERROR event with no-space indicator

 block.c  | 22 ++
 monitor.c|  1 +
 qapi/block-core.json |  8 +++-
 3 files changed, 22 insertions(+), 9 deletions(-)

-- 
1.9.3

[Qemu-devel] [RFC 2/3] QMP: rate limit BLOCK_IO_ERROR

2014-07-23 Thread Luiz Capitulino

This event has the same characteristics as the other rate-limited
events, mainly that we can emit dozens of them. Rate limit it then.

Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---
 monitor.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/monitor.c b/monitor.c
index 5bc70a6..33abe6c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -589,6 +589,7 @@ static void monitor_qapi_event_init(void)
 monitor_qapi_event_throttle(QAPI_EVENT_QUORUM_REPORT_BAD, 1000);
 monitor_qapi_event_throttle(QAPI_EVENT_QUORUM_FAILURE, 1000);
 monitor_qapi_event_throttle(QAPI_EVENT_VSERPORT_CHANGE, 1000);
+monitor_qapi_event_throttle(QAPI_EVENT_BLOCK_IO_ERROR, 1000);
 
 qmp_event_set_func_emit(monitor_qapi_event_queue);
 }
-- 
1.9.3
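
For readers unfamiliar with event throttling, here is a minimal sketch of
the general technique, assuming a policy where the first event goes out
immediately and later ones within the interval are coalesced so that only
the most recent is delivered once the interval expires. The event_throttle
type and its functions are hypothetical and are not the monitor.c
implementation; timestamps are plain millisecond counters supplied by the
caller.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int64_t rate_ms;      /* minimum spacing between emitted events */
    int64_t last_ms;      /* time of the last emission */
    bool emitted_once;    /* false until the first event has gone out */
    bool has_pending;     /* a delayed event is waiting */
    int pending;          /* payload of the most recent delayed event */
} event_throttle;

void throttle_init(event_throttle *t, int64_t rate_ms)
{
    t->rate_ms = rate_ms;
    t->last_ms = 0;
    t->emitted_once = false;
    t->has_pending = false;
    t->pending = 0;
}

/* Returns true and sets *out if the event goes out now; otherwise the
 * event replaces any older pending one and waits for throttle_flush(). */
bool throttle_submit(event_throttle *t, int64_t now_ms, int payload, int *out)
{
    if (t->emitted_once && now_ms - t->last_ms < t->rate_ms) {
        t->pending = payload;
        t->has_pending = true;
        return false;
    }
    t->last_ms = now_ms;
    t->emitted_once = true;
    t->has_pending = false;
    *out = payload;
    return true;
}

/* Timer callback: emit the pending event once the interval has elapsed. */
bool throttle_flush(event_throttle *t, int64_t now_ms, int *out)
{
    if (!t->has_pending || now_ms - t->last_ms < t->rate_ms) {
        return false;
    }
    t->has_pending = false;
    t->last_ms = now_ms;
    *out = t->pending;
    return true;
}
```

With a 1000 ms interval, three events at 0, 100 and 200 ms result in two
emissions: the first at 0 ms and the third when the timer fires at 1000 ms.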

Re: [Qemu-devel] [PATCH v2 for-2.1 3/5] docs: document missing SPICE_MIGRATE_COMPLETED event

2014-07-23 Thread Markus Armbruster

Eric Blake ebl...@redhat.com writes:

 The SPICE_MIGRATE_COMPLETED event was first documented in
 7cfadb6b.  But since dfab4892 later restored this flie to the

this file

 state prior to qmp events, and we never documented it in the
 past, anyone using this file instead of qapi will miss out on
 this event.

 * docs/qmp/qmp-events.txt (SPICE_MIGRATE_COMPLETED): Add.

 Signed-off-by: Eric Blake ebl...@redhat.com

Patch is fine.

Re: [Qemu-devel] [PATCH v2 2/3] tap-bsd: implement a FreeBSD only version of tap_open

2014-07-23 Thread Roger Pau Monné

On 22/07/14 14:26, Stefano Stabellini wrote:
 On Tue, 22 Jul 2014, Roger Pau Monné wrote:
 On 27/05/14 15:29, Stefan Hajnoczi wrote:
 On Fri, May 23, 2014 at 05:57:48PM +0200, Roger Pau Monne wrote:
 The current behaviour of tap_open for BSD systems differ greatly from
 it's Linux counterpart. Since FreeBSD supports interface renaming and
 tap device cloning by opening /dev/tap, implement a FreeBSD specific
 version of tap_open that behaves like it's Linux counterpart.

 This is specially important for toolstacks that use Qemu (like Xen
 libxl), in order to have a unified behaviour across suported
 platforms.

 Signed-off-by: Roger Pau Monné roger@citrix.com
 Cc: xen-de...@lists.xenproject.org
 Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
 Cc: Anthony Liguori aligu...@us.ibm.com
 Cc: Stefan Hajnoczi stefa...@redhat.com
 ---
  net/tap-bsd.c |   70 
 -
  1 files changed, 69 insertions(+), 1 deletions(-)

 Reviewed-by: Stefan Hajnoczi stefa...@redhat.com

 I still don't see this committed to the repository, should I ping someone?
 
 I was assuming that this patch would go via some other tree.  But if
 Stefan is OK I could pick it up and submit a pull request for both patch
 1 and 2 of this series.

Would you do the backport of those three patches (one is already
committed as e02bc6) to the qemu-xen repo at the same time, or would you
like me to remind you about this in a month or so?

I would really like to have all these patches in Xen 4.5 if possible.

Thanks, Roger.

Re: [Qemu-devel] [PATCH v2 2/3] tap-bsd: implement a FreeBSD only version of tap_open

2014-07-23 Thread Stefano Stabellini

On Wed, 23 Jul 2014, Roger Pau Monné wrote:
 On 22/07/14 14:26, Stefano Stabellini wrote:
  On Tue, 22 Jul 2014, Roger Pau Monné wrote:
  On 27/05/14 15:29, Stefan Hajnoczi wrote:
  On Fri, May 23, 2014 at 05:57:48PM +0200, Roger Pau Monne wrote:
  The current behaviour of tap_open for BSD systems differ greatly from
  it's Linux counterpart. Since FreeBSD supports interface renaming and
  tap device cloning by opening /dev/tap, implement a FreeBSD specific
  version of tap_open that behaves like it's Linux counterpart.
 
  This is specially important for toolstacks that use Qemu (like Xen
  libxl), in order to have a unified behaviour across suported
  platforms.
 
  Signed-off-by: Roger Pau Monné roger@citrix.com
  Cc: xen-de...@lists.xenproject.org
  Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
  Cc: Anthony Liguori aligu...@us.ibm.com
  Cc: Stefan Hajnoczi stefa...@redhat.com
  ---
   net/tap-bsd.c |   70 
  -
   1 files changed, 69 insertions(+), 1 deletions(-)
 
  Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
 
  I still don't see this committed to the repository, should I ping someone?
  
  I was assuming that this patch would go via some other tree.  But if
  Stefan is OK I could pick it up and submit a pull request for both patch
  1 and 2 of this series.
 
 Would you do the backport of those three patches (one is already
 committed as e02bc6) to the qemu-xen repo at the same time, or would you
 like me to remind you about this in a month or so?
 
 I would really like to have all this patches in Xen 4.5 if possible.

I should remember when I send a pull request (when 2.1 is
out). But please remind me if I forget.

[Qemu-devel] [PATCH v7 3/5] block/archipelago: Add support for creating images

2014-07-23 Thread Chrysostomos Nanakos

qemu-img archipelago:volumename[/mport=mapperd_port[:vport=vlmcd_port]
 [:segment=segment_name]] [size]

Signed-off-by: Chrysostomos Nanakos cnana...@grnet.gr
---
 block/archipelago.c |  146 +++
 1 file changed, 146 insertions(+)

diff --git a/block/archipelago.c b/block/archipelago.c
index 5a9fc68..b5c66fd 100644
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -592,6 +592,137 @@ err_exit:
     xseg_leave(s->xseg);
 }
 
+static int qemu_archipelago_create_volume(Error **errp, const char *volname,
+                                          char *segment_name,
+                                          uint64_t size, xport mportno,
+                                          xport vportno)
+{
+    int ret, targetlen;
+    struct xseg *xseg = NULL;
+    struct xseg_request *req;
+    struct xseg_request_clone *xclone;
+    struct xseg_port *port;
+    xport srcport = NoPort, sport = NoPort;
+    char *target;
+
+    /* Try default values if none has been set */
+    if (mportno == (xport) -1) {
+        mportno = ARCHIPELAGO_DFL_MPORT;
+    }
+
+    if (vportno == (xport) -1) {
+        vportno = ARCHIPELAGO_DFL_VPORT;
+    }
+
+    if (xseg_initialize()) {
+        error_setg(errp, "Cannot initialize XSEG");
+        return -1;
+    }
+
+    xseg = xseg_join("posix", segment_name,
+                     "posixfd", NULL);
+
+    if (!xseg) {
+        error_setg(errp, "Cannot join XSEG shared memory segment");
+        return -1;
+    }
+
+    port = xseg_bind_dynport(xseg);
+    srcport = port->portno;
+    init_local_signal(xseg, sport, srcport);
+
+    req = xseg_get_request(xseg, srcport, mportno, X_ALLOC);
+    if (!req) {
+        error_setg(errp, "Cannot get XSEG request");
+        return -1;
+    }
+
+    targetlen = strlen(volname);
+    ret = xseg_prep_request(xseg, req, targetlen,
+                            sizeof(struct xseg_request_clone));
+    if (ret < 0) {
+        error_setg(errp, "Cannot prepare XSEG request");
+        goto err_exit;
+    }
+
+    target = xseg_get_target(xseg, req);
+    if (!target) {
+        error_setg(errp, "Cannot get XSEG target.\n");
+        goto err_exit;
+    }
+    memcpy(target, volname, targetlen);
+    xclone = (struct xseg_request_clone *) xseg_get_data(xseg, req);
+    memset(xclone->target, 0 , XSEG_MAX_TARGETLEN);
+    xclone->targetlen = 0;
+    xclone->size = size;
+    req->offset = 0;
+    req->size = req->datalen;
+    req->op = X_CLONE;
+
+    xport p = xseg_submit(xseg, req, srcport, X_ALLOC);
+    if (p == NoPort) {
+        error_setg(errp, "Could not submit XSEG request");
+        goto err_exit;
+    }
+    xseg_signal(xseg, p);
+
+    ret = wait_reply(xseg, srcport, port, req);
+    if (ret < 0) {
+        error_setg(errp, "wait_reply() error.");
+    }
+
+    xseg_put_request(xseg, req, srcport);
+    xseg_quit_local_signal(xseg, srcport);
+    xseg_leave_dynport(xseg, port);
+    xseg_leave(xseg);
+    return ret;
+
+err_exit:
+    xseg_put_request(xseg, req, srcport);
+    xseg_quit_local_signal(xseg, srcport);
+    xseg_leave_dynport(xseg, port);
+    xseg_leave(xseg);
+    return -1;
+}
+
+static int qemu_archipelago_create(const char *filename,
+                                   QemuOpts *options,
+                                   Error **errp)
+{
+    int ret = 0;
+    uint64_t total_size = 0;
+    char *volname = NULL, *segment_name = NULL;
+    const char *start;
+    xport mport = NoPort, vport = NoPort;
+
+    if (!strstart(filename, "archipelago:", &start)) {
+        error_setg(errp, "File name must start with 'archipelago:'");
+        return -1;
+    }
+
+    if (!strlen(start) || strstart(start, "/", NULL)) {
+        error_setg(errp, "volume name must be specified");
+        return -1;
+    }
+
+    parse_filename_opts(filename, errp, &volname, &segment_name, &mport,
+                        &vport);
+    total_size = qemu_opt_get_size_del(options, BLOCK_OPT_SIZE, 0);
+
+    if (segment_name == NULL) {
+        segment_name = g_strdup("archipelago");
+    }
+
+    /* Create an Archipelago volume */
+    ret = qemu_archipelago_create_volume(errp, volname, segment_name,
+                                         total_size, mport,
+                                         vport);
+
+    g_free(volname);
+    g_free(segment_name);
+    return ret;
+}
+
 static void qemu_archipelago_aio_cancel(BlockDriverAIOCB *blockacb)
 {
 ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) blockacb;
@@ -892,6 +1023,19 @@ static int64_t qemu_archipelago_getlength(BlockDriverState *bs)
 return ret;
 }
 
+static QemuOptsList qemu_archipelago_create_opts = {
+    .name = "archipelago-create-opts",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_archipelago_create_opts.head),
+    .desc = {
+        {
+            .name = BLOCK_OPT_SIZE,
+            .type = QEMU_OPT_SIZE,
+            .help = "Virtual disk size"
+        },
+        { /* end of list */ }
+    }
+};
+
 static BlockDriverAIOCB *qemu_archipelago_aio_flush(BlockDriverState

[Qemu-devel] [PATCH v7 4/5] QMP: Add support for Archipelago

2014-07-23 Thread Chrysostomos Nanakos

Introduce new enum BlockdevOptionsArchipelago.

@volume:  #Name of the Archipelago volume image

@mport:   #'mport' is the port number on which mapperd is
  listening. This is optional and if not specified,
  QEMU will make Archipelago use the default port.

@vport:   #'vport' is the port number on which vlmcd is
  listening. This is optional and if not specified,
  QEMU will make Archipelago use the default port.

@segment: #optional The name of the shared memory segment
  Archipelago stack is using. This is optional
  and if not specified, QEMU will make Archipelago
  use the default value, 'archipelago'.

Signed-off-by: Chrysostomos Nanakos cnana...@grnet.gr
---
 qapi/block-core.json |   38 +++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index e378653..0fa0c12 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -190,8 +190,8 @@
 # @ro: true if the backing device was open read-only
 #
 # @drv: the name of the block format used to open the backing device. As of
-#   0.14.0 this can be: 'blkdebug', 'bochs', 'cloop', 'cow', 'dmg',
-#   'file', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
+#   0.14.0 this can be: 'archipelago', 'blkdebug', 'bochs', 'cloop', 'cow',
+#   'dmg', 'file', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
 #   'host_floppy', 'http', 'https', 'nbd', 'parallels', 'qcow',
 #   'qcow2', 'raw', 'tftp', 'vdi', 'vmdk', 'vpc', 'vvfat'
 #
@@ -1143,7 +1143,7 @@
 # Since: 2.0
 ##
 { 'enum': 'BlockdevDriver',
-  'data': [ 'file', 'host_device', 'host_cdrom', 'host_floppy',
+  'data': [ 'archipelago', 'file', 'host_device', 'host_cdrom', 'host_floppy',
 'http', 'https', 'ftp', 'ftps', 'tftp', 'vvfat', 'blkdebug',
 'blkverify', 'bochs', 'cloop', 'cow', 'dmg', 'parallels', 'qcow',
 'qcow2', 'qed', 'raw', 'vdi', 'vhdx', 'vmdk', 'vpc', 'quorum' ] }
@@ -1273,6 +1273,37 @@
 '*pass-discard-snapshot': 'bool',
 '*pass-discard-other': 'bool' } }
 
+
+##
+# @BlockdevOptionsArchipelago
+#
+# Driver specific block device options for Archipelago.
+#
+# @volume:  Name of the Archipelago volume image
+#
+# @mport:   #optional The port number on which mapperd is
+#   listening. This is optional
+#   and if not specified, QEMU will make Archipelago
+#   use the default port (1001).
+#
+# @vport:   #optional The port number on which vlmcd is
+#   listening. This is optional
+#   and if not specified, QEMU will make Archipelago
+#   use the default port (501).
+#
+# @segment: #optional The name of the shared memory segment
+#   Archipelago stack is using. This is optional
+#   and if not specified, QEMU will make Archipelago
+#   use the default value, 'archipelago'.
+# Since: 2.2
+##
+{ 'type': 'BlockdevOptionsArchipelago',
+  'data': { 'volume': 'str',
+'*mport': 'int',
+'*vport': 'int',
+'*segment': 'str' } }
+
+
 ##
 # @BlkdebugEvent
 #
@@ -1416,6 +1447,7 @@
   'base': 'BlockdevOptionsBase',
   'discriminator': 'driver',
   'data': {
+  'archipelago':'BlockdevOptionsArchipelago',
   'file':   'BlockdevOptionsFile',
   'host_device':'BlockdevOptionsFile',
   'host_cdrom': 'BlockdevOptionsFile',
-- 
1.7.10.4

[Qemu-devel] [PATCH v7 0/5] Support Archipelago as a QEMU block backend

2014-07-23 Thread Chrysostomos Nanakos

v7:
 - Fix coding style issues.
 - Rename __archipelago_submit_request function to archipelago_submit_request.
 - Set X_NONBLOCK flag to xseg_receive().
 - Return -EIO to .bdrv_getlength() if archipelago_volume_info() fails.
 - Fix segment_name mem leak.
 - Bump version number from 2.1 to 2.2 in qapi/block-core.json file concerning
   QEMU Archipelago support.
 - Convert qemu_aio_wait() to aio_poll().
 - Remove qemu_blockalign() and memcpy() call and use qemu_iovec_to_buf()
   directly.

v6:
 - Split v5 1/4 patch into two different patches. First one implements
   QMP structured options and the second one implements bdrv_parse_filename().

v5:
 - Remove useless qemu_aio_count variable from BDRVArchipelagoState struct.
 - Cleanup xseg signal descriptor, call xseg_quit_local_signal() when closing
   block device.
 - Fix ds and volname leaks.
 - Make xseg request handler thread joinable and wait until exits before
   destroying condition variables and mutexes. Thanks to Stefan Hajnoczi for
   pointing this out.
 - Remove error_propagate() useless call.
 - Use memcpy instead of strncpy.
 - Remove check after trying to allocate memory with g_malloc().
 - Remove pipe code and complete AIO by introducing QEMU bottom-half.
 - Add Archipelago shared memory segment name in options list and QMP.
 - Remove functions archipelago_aio_read()/_write() and introduce new
   and simpler function, __archipelago_submit_request().
   Refactor archipelago_aio_segmented_rw() function.
 - Enable Archipelago support in qemu-iotests

v4:
 - Move Archipelago QMP support from qapi-schema.json file to
   qapi/block-core.json. Fix various typographic errors, thanks to
   Kevin Wolf and Eric Blake.
 - Use new .create_opts format, define new QemuOptsList structure and refactor
   qemu_archipelago_create function.

v3:
 - Break down initial patch from one to three. First patch implements
   Archipelago QEMU block backend with read/write functionality.
   Second patch implements .bdrv_create() and adds support for creating
   Archipelago images. Third patch adds QMP support.
 - Remove global variable g_xseg_init, make xseg_initialize(), xseg_join()
   and xseg_leave() reentrant and thread-safe.
 - Introduce new enum BlockdevOptionsArchipelago for the QMP support.

v2:
 - Implement .bdrv_parse_filename() function to convert the shortcut version
   with a single string to the individual options.
 - Remove global variables and move relevant fields to ArchipelagoAIOCB struct.
 - Remove ArchipelagoConf struct and use the relevant fields as individual
   arguments.
 - Remove ArchipelagoCB struct and use ArchipelagoAIOCB instead.
 - Remove ArchipelagoThread struct and move relevant fields to
   ArchipelagoAIOCB instead. Now an I/O thread is spawned per device to
   handle all async I/O requests.
 - Remove double data copy, use qemu_iovec_from_buf() and copy data directly
   to the destination buffer.
 - Remove archipelago_aio_bh_cb() function, a full request is completed in
   qemu_archipelago_complete_aio() instead.
 - Resolve proposed changes from Kevin Wolf and miscellaneous style issues.

Chrysostomos Nanakos (5):
  block: Support Archipelago as a QEMU block backend
  block/archipelago: Implement bdrv_parse_filename()
  block/archipelago: Add support for creating images
  QMP: Add support for Archipelago
  qemu-iotests: add support for Archipelago protocol

 MAINTAINERS  |6 +
 block/Makefile.objs  |2 +
 block/archipelago.c  | 1064 ++
 configure|   40 ++
 qapi/block-core.json |   38 +-
 tests/qemu-iotests/common|6 +
 tests/qemu-iotests/common.rc |9 +-
 7 files changed, 1161 insertions(+), 4 deletions(-)
 create mode 100644 block/archipelago.c

-- 
1.7.10.4

[Qemu-devel] [PATCH v7 1/5] block: Support Archipelago as a QEMU block backend

2014-07-23 Thread Chrysostomos Nanakos

A VM image on an Archipelago volume is specified like this:

file.driver=archipelago,file.volume=volumename[,file.mport=mapperd_port[,
file.vport=vlmcd_port][,file.segment=segment_name]]

'archipelago' is the protocol.

'mport' is the port number on which mapperd is listening. This is optional;
if not specified, QEMU will make Archipelago use the default port.

'vport' is the port number on which vlmcd is listening. This is optional;
if not specified, QEMU will make Archipelago use the default port.

'segment' is the name of the shared memory segment the Archipelago stack is
using. This is optional; if not specified, QEMU will make Archipelago use the
default value, 'archipelago'.

Examples:

file.driver=archipelago,file.volume=my_vm_volume
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
file.vport=1234
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
file.vport=1234,file.segment=my_segment

Signed-off-by: Chrysostomos Nanakos cnana...@grnet.gr
---
 MAINTAINERS |6 +
 block/Makefile.objs |2 +
 block/archipelago.c |  785 +++
 configure   |   40 +++
 4 files changed, 833 insertions(+)
 create mode 100644 block/archipelago.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 906f252..59940f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1000,3 +1000,9 @@ SSH
 M: Richard W.M. Jones rjo...@redhat.com
 S: Supported
 F: block/ssh.c
+
+ARCHIPELAGO
+M: Chrysostomos Nanakos cnana...@grnet.gr
+M: Chrysostomos Nanakos ch...@include.gr
+S: Maintained
+F: block/archipelago.c
diff --git a/block/Makefile.objs b/block/Makefile.objs
index fd88c03..858d2b3 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -17,6 +17,7 @@ block-obj-$(CONFIG_LIBNFS) += nfs.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
 block-obj-$(CONFIG_GLUSTERFS) += gluster.o
+block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 endif
 
@@ -35,5 +36,6 @@ gluster.o-cflags   := $(GLUSTERFS_CFLAGS)
 gluster.o-libs := $(GLUSTERFS_LIBS)
 ssh.o-cflags   := $(LIBSSH2_CFLAGS)
 ssh.o-libs := $(LIBSSH2_LIBS)
+archipelago.o-libs := $(ARCHIPELAGO_LIBS)
 qcow.o-libs:= -lz
 linux-aio.o-libs   := -laio
diff --git a/block/archipelago.c b/block/archipelago.c
new file mode 100644
index 000..1c21d36
--- /dev/null
+++ b/block/archipelago.c
@@ -0,0 +1,785 @@
+/*
+ * QEMU Block driver for Archipelago
+ *
+ * Copyright (C) 2014 Chrysostomos Nanakos cnana...@grnet.gr
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/*
+ * VM Image on Archipelago volume is specified like this:
+ *
+ * file.driver=archipelago,file.volume=volumename
+ * [,file.mport=mapperd_port[,file.vport=vlmcd_port]
+ * [,file.segment=segment_name]]
+ *
+ * 'archipelago' is the protocol.
+ *
+ * 'mport' is the port number on which mapperd is listening. This is optional
+ * and if not specified, QEMU will make Archipelago to use the default port.
+ *
+ * 'vport' is the port number on which vlmcd is listening. This is optional
+ * and if not specified, QEMU will make Archipelago to use the default port.
+ *
+ * 'segment' is the name of the shared memory segment Archipelago stack
+ * is using. This is optional and if not specified, QEMU will make Archipelago
+ * to use the default value, 'archipelago'.
+ *
+ * Examples:
+ *
+ * file.driver=archipelago,file.volume=my_vm_volume
+ * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
+ * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
+ * file.vport=1234
+ * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
+ * file.vport=1234,file.segment=my_segment
+ */
+
+#include "block/block_int.h"
+#include "qemu/error-report.h"
+#include "qemu/thread.h"
+#include "qapi/qmp/qint.h"
+#include "qapi/qmp/qstring.h"
+#include "qapi/qmp/qjson.h"
+
+#include <inttypes.h>
+#include <xseg/xseg.h>
+#include <xseg/protocol.h>
+
+#define ARCHIP_FD_READ      0
+#define ARCHIP_FD_WRITE     1
+#define MAX_REQUEST_SIZE    524288
+
+#define ARCHIPELAGO_OPT_VOLUME      "volume"
+#define ARCHIPELAGO_OPT_SEGMENT     "segment"
+#define ARCHIPELAGO_OPT_MPORT       "mport"
+#define ARCHIPELAGO_OPT_VPORT       "vport"
+#define ARCHIPELAGO_DFL_MPORT       1001
+#define ARCHIPELAGO_DFL_VPORT       501
+
+#define archipelagolog(fmt, ...) \
+    do { \
+        fprintf(stderr, "archipelago\t%-24s: " fmt, __func__, ##__VA_ARGS__); \
+    } while (0)
+
+typedef enum {
+ARCHIP_OP_READ,
+ARCHIP_OP_WRITE,
+ARCHIP_OP_FLUSH,
+ARCHIP_OP_VOLINFO,
+} ARCHIPCmd;
+
+typedef struct ArchipelagoAIOCB {
+BlockDriverAIOCB common;
+QEMUBH *bh;
+struct BDRVArchipelagoState *s;
+QEMUIOVector *qiov;
+ARCHIPCmd cmd;
+bool cancelled;
+int status;
+int64_t size;
+int64_t ret;

[Qemu-devel] [PATCH v7 2/5] block/archipelago: Implement bdrv_parse_filename()

2014-07-23 Thread Chrysostomos Nanakos

A VM image on an Archipelago volume can also be specified like this:

file=archipelago:volumename[/mport=mapperd_port[:vport=vlmcd_port][:
segment=segment_name]]

Examples:

file=archipelago:my_vm_volume
file=archipelago:my_vm_volume/mport=123
file=archipelago:my_vm_volume/mport=123:vport=1234
file=archipelago:my_vm_volume/mport=123:vport=1234:segment=my_segment
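
The shortcut syntax above can be illustrated with a small standalone
parser. This is only a sketch and not the code from this patch: it uses
plain libc instead of QEMU's strstart()/QDict helpers, and the names
archip_parse(), struct archip_opts and NO_PORT are hypothetical. It is
also a bit more lenient than the documented grammar, because it accepts
the mport/vport/segment options in any order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NO_PORT (-1L)   /* hypothetical "unset" marker, not xseg's NoPort */

struct archip_opts {
    char volume[64];
    char segment[64];
    long mport;
    long vport;
};

/* If tok starts with key, parse the decimal number after it into *out.
 * Returns 0 on success or when tok has a different key, -1 when the
 * number is missing, negative or followed by garbage. */
static int parse_port(const char *tok, const char *key, long *out)
{
    size_t klen = strlen(key);
    char *end = NULL;
    long val;

    if (strncmp(tok, key, klen) != 0) {
        return 0;
    }
    if (tok[klen] == '\0') {
        return -1;
    }
    val = strtol(tok + klen, &end, 10);
    if (*end != '\0' || val < 0) {
        return -1;
    }
    *out = val;
    return 0;
}

/* Parse "archipelago:volume[/opt[:opt...]]". Returns 0 or -1. */
int archip_parse(const char *filename, struct archip_opts *o)
{
    static const char prefix[] = "archipelago:";
    const size_t plen = sizeof(prefix) - 1;
    char buf[256];
    char *tok;

    assert(filename != NULL && o != NULL);
    o->volume[0] = '\0';
    o->segment[0] = '\0';
    o->mport = NO_PORT;
    o->vport = NO_PORT;

    if (strncmp(filename, prefix, plen) != 0 ||
        strlen(filename + plen) >= sizeof(buf)) {
        return -1;
    }
    strcpy(buf, filename + plen);

    /* Everything before the first '/' is the volume name. */
    tok = strchr(buf, '/');
    if (tok) {
        *tok++ = '\0';
    }
    if (buf[0] == '\0' || strlen(buf) >= sizeof(o->volume)) {
        return -1;
    }
    strcpy(o->volume, buf);

    /* The remaining options are separated by ':'. */
    while (tok && *tok) {
        char *next = strchr(tok, ':');
        if (next) {
            *next++ = '\0';
        }
        if (parse_port(tok, "mport=", &o->mport) < 0 ||
            parse_port(tok, "vport=", &o->vport) < 0) {
            return -1;
        }
        if (strncmp(tok, "segment=", 8) == 0) {
            if (strlen(tok + 8) >= sizeof(o->segment)) {
                return -1;
            }
            strcpy(o->segment, tok + 8);
        }
        tok = next;
    }
    return 0;
}
```

With this sketch, "archipelago:my_vm_volume/mport=123:vport=1234" yields
volume "my_vm_volume", mport 123 and vport 1234, while a string without
a volume name (for example "archipelago:/mport=123") is rejected.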

Signed-off-by: Chrysostomos Nanakos cnana...@grnet.gr
---
 block/archipelago.c |  140 ++-
 1 file changed, 138 insertions(+), 2 deletions(-)

diff --git a/block/archipelago.c b/block/archipelago.c
index 1c21d36..5a9fc68 100644
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -15,6 +15,11 @@
  * [,file.mport=mapperd_port[,file.vport=vlmcd_port]
  * [,file.segment=segment_name]]
  *
+ * or
+ *
+ * file=archipelago:volumename[/mport=mapperd_port[:vport=vlmcd_port][:
+ * segment=segment_name]]
+ *
  * 'archipelago' is the protocol.
  *
  * 'mport' is the port number on which mapperd is listening. This is optional
@@ -32,11 +37,20 @@
  * file.driver=archipelago,file.volume=my_vm_volume
  * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
  * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
- * file.vport=1234
+ *  file.vport=1234
  * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
- * file.vport=1234,file.segment=my_segment
+ *  file.vport=1234,file.segment=my_segment
+ *
+ * or
+ *
+ * file=archipelago:my_vm_volume
+ * file=archipelago:my_vm_volume/mport=123
+ * file=archipelago:my_vm_volume/mport=123:vport=1234
+ * file=archipelago:my_vm_volume/mport=123:vport=1234:segment=my_segment
+ *
  */
 
+#include "qemu-common.h"
 #include "block/block_int.h"
 #include "qemu/error-report.h"
 #include "qemu/thread.h"
@@ -309,6 +323,127 @@ static void qemu_archipelago_complete_aio(void *opaque)
 g_free(reqdata);
 }
 
+static void xseg_find_port(char *pstr, const char *needle, xport *aport)
+{
+    const char *a;
+    char *endptr = NULL;
+    unsigned long port;
+    if (strstart(pstr, needle, &a)) {
+        if (strlen(a) > 0) {
+            port = strtoul(a, &endptr, 10);
+            if (strlen(endptr)) {
+                *aport = -2;
+                return;
+            }
+            *aport = (xport) port;
+        }
+    }
+}
+
+static void xseg_find_segment(char *pstr, const char *needle,
+                              char **segment_name)
+{
+    const char *a;
+    if (strstart(pstr, needle, &a)) {
+        if (strlen(a) > 0) {
+            *segment_name = g_strdup(a);
+        }
+    }
+}
+
+static void parse_filename_opts(const char *filename, Error **errp,
+                                char **volume, char **segment_name,
+                                xport *mport, xport *vport)
+{
+    const char *start;
+    char *tokens[4], *ds;
+    int idx;
+    xport lmport = NoPort, lvport = NoPort;
+
+    strstart(filename, "archipelago:", &start);
+
+    ds = g_strdup(start);
+    tokens[0] = strtok(ds, "/");
+    tokens[1] = strtok(NULL, ":");
+    tokens[2] = strtok(NULL, ":");
+    tokens[3] = strtok(NULL, "\0");
+
+    if (!strlen(tokens[0])) {
+        error_setg(errp, "volume name must be specified first");
+        g_free(ds);
+        return;
+    }
+
+    for (idx = 1; idx < 4; idx++) {
+        if (tokens[idx] != NULL) {
+            if (strstart(tokens[idx], "mport=", NULL)) {
+                xseg_find_port(tokens[idx], "mport=", &lmport);
+            }
+            if (strstart(tokens[idx], "vport=", NULL)) {
+                xseg_find_port(tokens[idx], "vport=", &lvport);
+            }
+            if (strstart(tokens[idx], "segment=", NULL)) {
+                xseg_find_segment(tokens[idx], "segment=", segment_name);
+            }
+        }
+    }
+
+    if ((lmport == -2) || (lvport == -2)) {
+        error_setg(errp, "mport and/or vport must be set");
+        g_free(ds);
+        return;
+    }
+    *volume = g_strdup(tokens[0]);
+    *mport = lmport;
+    *vport = lvport;
+    g_free(ds);
+}
+
+static void archipelago_parse_filename(const char *filename, QDict *options,
+                                       Error **errp)
+{
+    const char *start;
+    char *volume = NULL, *segment_name = NULL;
+    xport mport = NoPort, vport = NoPort;
+
+    if (qdict_haskey(options, ARCHIPELAGO_OPT_VOLUME)
+            || qdict_haskey(options, ARCHIPELAGO_OPT_SEGMENT)
+            || qdict_haskey(options, ARCHIPELAGO_OPT_MPORT)
+            || qdict_haskey(options, ARCHIPELAGO_OPT_VPORT)) {
+        error_setg(errp, "volume/mport/vport/segment and a file name may not"
+                         " be specified at the same time");
+        return;
+    }
+
+    if (!strstart(filename, "archipelago:", &start)) {
+        error_setg(errp, "File name must start with 'archipelago:'");
+        return;
+    }
+
+    if (!strlen(start) || strstart(start, "/", NULL)) {
+        error_setg(errp, "volume name must be specified");
+        return;
+    }
+
+    parse_filename_opts(filename, errp, &volume,

[Qemu-devel] [PATCH v7 5/5] qemu-iotests: add support for Archipelago protocol

2014-07-23 Thread Chrysostomos Nanakos

Signed-off-by: Chrysostomos Nanakos cnana...@grnet.gr
---
 tests/qemu-iotests/common|6 ++
 tests/qemu-iotests/common.rc |9 -
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index e4083f4..70df659 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -152,6 +152,7 @@ check options
 -nbdtest nbd
 -sshtest ssh
 -nfstest nfs
+-archipelagotest archipelago
 -xdiff  graphical mode diff
 -nocacheuse O_DIRECT on backing file
 -misalign   misalign memory allocations
@@ -263,6 +264,11 @@ testlist options
 xpand=false
 ;;
 
+-archipelago)
+IMGPROTO=archipelago
+xpand=false
+;;
+
 -nocache)
 CACHEMODE=none
 CACHEMODE_IS_DEFAULT=false
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index e0ea7e3..3fd691e 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -64,6 +64,8 @@ elif [ $IMGPROTO = ssh ]; then
 elif [ $IMGPROTO = nfs ]; then
 TEST_DIR=nfs://127.0.0.1/$TEST_DIR
 TEST_IMG=$TEST_DIR/t.$IMGFMT
+elif [ $IMGPROTO = archipelago ]; then
+TEST_IMG=archipelago:at.$IMGFMT
 else
 TEST_IMG=$IMGPROTO:$TEST_DIR/t.$IMGFMT
 fi
@@ -163,7 +165,8 @@ _make_test_img()
 -e s# lazy_refcounts=\\(on\\|off\\)##g \
 -e s# block_size=[0-9]\\+##g \
 -e s# block_state_zero=\\(on\\|off\\)##g \
--e s# log_size=[0-9]\\+##g
+-e s# log_size=[0-9]\\+##g \
+-e s/archipelago:a/TEST_DIR\//g
 
 # Start an NBD server on the image file, which is what we'll be talking to
 if [ $IMGPROTO = nbd ]; then
@@ -206,6 +209,10 @@ _cleanup_test_img()
 rbd --no-progress rm $TEST_DIR/t.$IMGFMT > /dev/null
 ;;
 
+archipelago)
+vlmc remove at.$IMGFMT > /dev/null
+;;
+
 sheepdog)
 collie vdi delete $TEST_DIR/t.$IMGFMT
 ;;
-- 
1.7.10.4

Re: [Qemu-devel] [PATCH v2 for-2.1 4/5] docs: document missing POWERDOWN event

2014-07-23 Thread Wenchao Xia

On 2014/7/23 20:26, Eric Blake wrote:
 The POWERDOWN event was first documented in 0aab9ec3.  But since
 dfab4892 later restored this file to the state prior to qmp events,
 and we never documented it in the past, anyone using this file
 instead of qapi will miss out on this event.  Tweak the existing
 wording of SHUTDOWN to match 84321831, and make the difference
 between the two events apparent.
 
 * docs/qmp/qmp-events.txt (POWERDOWN): Add.
 (SHUTDOWN): Tweak.
 
 Signed-off-by: Eric Blake ebl...@redhat.com
 ---
   docs/qmp/qmp-events.txt | 16 +++-
   1 file changed, 15 insertions(+), 1 deletion(-)
 
 diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
 index 22d552f..9d7439e 100644
 --- a/docs/qmp/qmp-events.txt
 +++ b/docs/qmp/qmp-events.txt
 @@ -243,6 +243,19 @@ Data:
 timestamp: { seconds: 1368697518, microseconds: 326866 } }
   }
 
 +POWERDOWN
 +-
 +
 +Emitted when the Virtual Machine is powered down through the power
 +control system, such as via ACPI.
 +
 +Data: None.
 +
 +Example:
 +
 +{ event: POWERDOWN,
 +timestamp: { seconds: 1267040730, microseconds: 682951 } }
 +
   QUORUM_FAILURE
   --
 
 @@ -325,7 +338,8 @@ Example:
   SHUTDOWN
   
 
 -Emitted when the Virtual Machine is powered down.
 +Emitted when the Virtual Machine has shut down, indicating that qemu
 +is about to exit.
 
   Data: None.
 
 Nice to have an explanation of the difference.

Re: [Qemu-devel] [PATCH v2 for-2.1 0/5] docs: document remaining QMP events

2014-07-23 Thread Wenchao Xia

Reviewed-by: Wenchao Xia wenchaoq...@gmail.com

Re: [Qemu-devel] [PATCH] scripts: qapi-event.py: support vendor extension

2014-07-23 Thread Wenchao Xia


Reviewed-by: Wenchao Xia wenchaoq...@gmail.com

I didn't expect a dot in the schema before.

[Qemu-devel] [RFC PATCH 04/17] COLO info: use colo info to tell migration target colo is enabled

2014-07-23 Thread Yang Hongyang

Migrate the COLO info to the migration target to tell the target that
COLO is enabled.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 Makefile.objs  |  1 +
 include/migration/migration-colo.h |  3 ++
 migration-colo-comm.c  | 68 ++
 vl.c   |  4 +++
 4 files changed, 76 insertions(+)
 create mode 100644 migration-colo-comm.c

diff --git a/Makefile.objs b/Makefile.objs
index cab5824..1836a68 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -50,6 +50,7 @@ common-obj-$(CONFIG_POSIX) += os-posix.o
 common-obj-$(CONFIG_LINUX) += fsdev/
 
 common-obj-y += migration.o migration-tcp.o
+common-obj-y += migration-colo-comm.o
 common-obj-$(CONFIG_COLO) += migration-colo.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o
diff --git a/include/migration/migration-colo.h 
b/include/migration/migration-colo.h
index 35b384c..e3735d8 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -12,6 +12,9 @@
 #define QEMU_MIGRATION_COLO_H
 
 #include "qemu-common.h"
+#include "migration/migration.h"
+
+void colo_info_mig_init(void);
 
 bool colo_supported(void);
 
diff --git a/migration-colo-comm.c b/migration-colo-comm.c
new file mode 100644
index 000..ccbc246
--- /dev/null
+++ b/migration-colo-comm.c
@@ -0,0 +1,68 @@
+/*
+ *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ *  (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ *  Copyright (C) 2014 FUJITSU LIMITED
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "migration/migration-colo.h"
+
+#define DEBUG_COLO
+
+#ifdef DEBUG_COLO
+#define DPRINTF(fmt, ...) \
+    do { fprintf(stdout, "COLO: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+static bool colo_requested;
+
+/* save */
+
+static bool migrate_use_colo(void)
+{
+    MigrationState *s = migrate_get_current();
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_COLO];
+}
+
+static void colo_info_save(QEMUFile *f, void *opaque)
+{
+    qemu_put_byte(f, migrate_use_colo());
+}
+
+/* restore */
+
+static int colo_info_load(QEMUFile *f, void *opaque, int version_id)
+{
+    int value = qemu_get_byte(f);
+
+    if (value && !colo_supported()) {
+        fprintf(stderr, "COLO is not supported\n");
+        return -EINVAL;
+    }
+
+    if (value && !colo_requested) {
+        DPRINTF("COLO requested!\n");
+    }
+
+    colo_requested = value;
+
+    return 0;
+}
+
+static SaveVMHandlers savevm_colo_info_handlers = {
+    .save_state = colo_info_save,
+    .load_state = colo_info_load,
+};
+
+void colo_info_mig_init(void)
+{
+    register_savevm_live(NULL, "colo info", -1, 1,
+                         &savevm_colo_info_handlers, NULL);
+}
diff --git a/vl.c b/vl.c
index fe451aa..1a282d8 100644
--- a/vl.c
+++ b/vl.c
@@ -89,6 +89,7 @@ int main(int argc, char **argv)
 #include "sysemu/dma.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/migration-colo.h"
 #include "sysemu/kvm.h"
 #include "qapi/qmp/qjson.h"
 #include "qemu/option.h"
@@ -4339,6 +4340,9 @@ int main(int argc, char **argv, char **envp)
 
 blk_mig_init();
 ram_mig_init();
+if (colo_supported()) {
+colo_info_mig_init();
+}
 
 /* open the virtual block devices */
 if (snapshot)
-- 
1.9.1

[Qemu-devel] [RFC PATCH 07/17] COLO buffer: implement colo buffer as well as QEMUFileOps based on it

2014-07-23 Thread Yang Hongyang

We need a buffer to store migration data.

On the save side:
  All saved data is written into the colo buffer first, so that we know
the total size of the migration data. This also separates the data
transmission from the colo control data; we send colo control data over
the socket fd to synchronize the state of both sides.

On the restore side:
  All migration data is read into the colo buffer first and then loaded
from the buffer. If a network error happens during data transmission,
the slave can still function because the migration data has not yet
been loaded.
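
The staging idea described above can be sketched as a standalone
growable buffer with a write cursor ("used") and a read cursor
("freed"). This is a minimal sketch under assumptions, not the patch
code: it uses realloc() instead of g_realloc(), returns an error instead
of exiting on overflow, and the names stage_buf_*, BUF_BASE_SIZE and
BUF_MAX_SIZE (with much smaller values than in the patch) are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUF_BASE_SIZE 4096u            /* hypothetical growth granularity */
#define BUF_MAX_SIZE  (1024u * 1024u)  /* hypothetical upper bound */

struct stage_buf {
    uint8_t *data;
    size_t used;    /* bytes written so far */
    size_t freed;   /* bytes already handed back to the reader */
    size_t size;    /* current capacity */
};

/* Append len bytes, growing the buffer in BUF_BASE_SIZE steps.
 * Returns 0 on success, -1 if the limit is exceeded or realloc fails. */
int stage_buf_put(struct stage_buf *b, const void *src, size_t len)
{
    assert(b != NULL);
    if (len == 0) {
        return 0;
    }
    if (len > b->size - b->used) {
        size_t need = b->used + len;
        size_t cap = ((need + BUF_BASE_SIZE - 1) / BUF_BASE_SIZE)
                     * BUF_BASE_SIZE;
        uint8_t *p;

        if (need > BUF_MAX_SIZE) {
            return -1;
        }
        p = realloc(b->data, cap);
        if (!p) {
            return -1;
        }
        b->data = p;
        b->size = cap;
    }
    memcpy(b->data + b->used, src, len);
    b->used += len;
    return 0;
}

/* Copy out at most len unread bytes; returns the number copied. */
size_t stage_buf_get(struct stage_buf *b, void *dst, size_t len)
{
    size_t avail;

    assert(b != NULL);
    avail = b->used - b->freed;
    if (len > avail) {
        len = avail;
    }
    if (len == 0) {
        return 0;
    }
    memcpy(dst, b->data + b->freed, len);
    b->freed += len;
    return len;
}

/* Release the storage and reset all cursors. */
void stage_buf_destroy(struct stage_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->used = b->freed = b->size = 0;
}
```

Because everything is staged before it is sent or loaded, "used" gives
the total size of one checkpoint up front, and a failed transfer leaves
the receiver's loaded state untouched.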

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 migration-colo.c | 112 +++
 1 file changed, 112 insertions(+)

diff --git a/migration-colo.c b/migration-colo.c
index d566b9d..b90d9b6 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -11,6 +11,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/thread.h"
 #include "block/coroutine.h"
+#include "qemu/error-report.h"
 #include "migration/migration-colo.h"
 
 static QEMUBH *colo_bh;
@@ -20,14 +21,122 @@ bool colo_supported(void)
 return true;
 }
 
+/* colo buffer */
+
+#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
+#define COLO_BUFFER_MAX_SIZE (1000*1000*1000*10ULL)
+
+typedef struct colo_buffer {
+uint8_t *data;
+uint64_t used;
+uint64_t freed;
+uint64_t size;
+} colo_buffer_t;
+
+static colo_buffer_t colo_buffer;
+
+static void colo_buffer_init(void)
+{
+if (colo_buffer.size == 0) {
+colo_buffer.data = g_malloc(COLO_BUFFER_BASE_SIZE);
+colo_buffer.size = COLO_BUFFER_BASE_SIZE;
+}
+colo_buffer.used = 0;
+colo_buffer.freed = 0;
+}
+
+static void colo_buffer_destroy(void)
+{
+if (colo_buffer.data) {
+g_free(colo_buffer.data);
+colo_buffer.data = NULL;
+}
+colo_buffer.used = 0;
+colo_buffer.freed = 0;
+colo_buffer.size = 0;
+}
+
+static void colo_buffer_extend(uint64_t len)
+{
+    if (len > colo_buffer.size - colo_buffer.used) {
+        len = len + colo_buffer.used - colo_buffer.size;
+        len = ROUND_UP(len, COLO_BUFFER_BASE_SIZE) + COLO_BUFFER_BASE_SIZE;
+
+        colo_buffer.size += len;
+        if (colo_buffer.size > COLO_BUFFER_MAX_SIZE) {
+            error_report("colo_buffer overflow!\n");
+            exit(EXIT_FAILURE);
+        }
+        colo_buffer.data = g_realloc(colo_buffer.data, colo_buffer.size);
+    }
+}
+
+static int colo_put_buffer(void *opaque, const uint8_t *buf,
+   int64_t pos, int size)
+{
+colo_buffer_extend(size);
+memcpy(colo_buffer.data + colo_buffer.used, buf, size);
+colo_buffer.used += size;
+
+return size;
+}
+
+static int colo_get_buffer_internal(uint8_t *buf, int size)
+{
+    if ((size + colo_buffer.freed) > colo_buffer.used) {
+        size = colo_buffer.used - colo_buffer.freed;
+    }
+    memcpy(buf, colo_buffer.data + colo_buffer.freed, size);
+    colo_buffer.freed += size;
+
+    return size;
+}
+
+static int colo_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+return colo_get_buffer_internal(buf, size);
+}
+
+static int colo_close(void *opaque)
+{
+    colo_buffer_t *cb = opaque;
+
+    cb->used = 0;
+    cb->freed = 0;
+
+    return 0;
+}
+
+static int colo_get_fd(void *opaque)
+{
+/* colo buffer, no fd */
+return -1;
+}
+
+static const QEMUFileOps colo_write_ops = {
+.put_buffer = colo_put_buffer,
+.get_fd = colo_get_fd,
+.close = colo_close,
+};
+
+static const QEMUFileOps colo_read_ops = {
+.get_buffer = colo_get_buffer,
+.get_fd = colo_get_fd,
+.close = colo_close,
+};
+
 /* save */
 
 static void *colo_thread(void *opaque)
 {
 MigrationState *s = opaque;
 
+colo_buffer_init();
+
 /*TODO: COLO checkpointed save loop*/
 
+colo_buffer_destroy();
+
     if (s->state != MIG_STATE_ERROR) {
 migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
 }
@@ -77,8 +186,11 @@ void colo_process_incoming_checkpoints(QEMUFile *f)
 colo = qemu_coroutine_self();
 assert(colo != NULL);
 
+colo_buffer_init();
+
 /* TODO: COLO checkpointed restore loop */
 
+colo_buffer_destroy();
 colo = NULL;
 restore_exit_colo();
 
-- 
1.9.1

[Qemu-devel] [RFC PATCH 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service

2014-07-23 Thread Yang Hongyang

Virtual machine (VM) replication is a well-known technique for
providing application-agnostic, software-implemented hardware fault
tolerance, also known as non-stop service. COLO is a high availability
solution. Both the primary VM (PVM) and the secondary VM (SVM) run in
parallel. They receive the same requests from the client and generate
responses in parallel too. If the response packets from the PVM and SVM
are identical, they are released immediately. Otherwise, a VM checkpoint
(on demand) is conducted. The idea was presented at Xen Summit 2012 and
2013, and in an academic paper at SOCC 2013. It was also presented at
KVM Forum 2013:
http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
Please refer to the above document for detailed information.
Please also refer to the previously posted RFC proposal:
http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
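
The release-or-checkpoint rule described above can be sketched in a few
lines. This is only an illustration: in this series the comparison is
done by a separate kernel-side agent, and the function and enum names
below (colo_decide, COLO_RELEASE, COLO_CHECKPOINT) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum colo_action {
    COLO_RELEASE,     /* outputs match: forward the packet to the client */
    COLO_CHECKPOINT   /* outputs differ: resynchronize the SVM first */
};

/* Compare one response packet from the PVM with the one from the SVM. */
enum colo_action colo_decide(const void *pvm, size_t pvm_len,
                             const void *svm, size_t svm_len)
{
    assert(pvm != NULL && svm != NULL);
    if (pvm_len == svm_len && memcmp(pvm, svm, pvm_len) == 0) {
        return COLO_RELEASE;
    }
    return COLO_CHECKPOINT;
}
```

The point of the design is that a checkpoint is only paid for when the
two VMs have visibly diverged from the client's point of view.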

The patchset is also hosted on github:
https://github.com/macrosheep/qemu/tree/colo_v0.1

This patchset is an RFC. It implements the framework of COLO, without
failover and NIC/disk replication, but it is ready to demonstrate
the COLO idea on top of QEMU-KVM.
Steps for using this patchset to get an overview of COLO:
1. configure the source with the --enable-colo option
2. compile
3. just like QEMU's normal migration, run 2 QEMU VMs:
   - Primary VM
   - Secondary VM with the -incoming tcp:[IP]:[PORT] option
4. on the Primary VM's QEMU monitor, run the following commands:
   migrate_set_capability colo on
   migrate tcp:[IP]:[PORT]
5. done
You will see two running VMs; whenever you make changes to the PVM, the SVM
will be synced to the PVM's state.

TODO list:
1. failover
2. nic replication
3. disk replication[COLO Disk manager]

Any comments and feedback are warmly welcome.

Thanks,
Yang

Yang Hongyang (17):
  configure: add CONFIG_COLO to switch COLO support
  COLO: introduce an api colo_supported() to indicate COLO support
  COLO migration: add a migration capability 'colo'
  COLO info: use colo info to tell migration target colo is enabled
  COLO save: integrate COLO checkpointed save into qemu migration
  COLO restore: integrate COLO checkpointed restore into qemu restore
  COLO buffer: implement colo buffer as well as QEMUFileOps based on it
  COLO: disable qdev hotplug
  COLO ctl: implement API's that communicate with colo agent
  COLO ctl: introduce is_slave() and is_master()
  COLO ctl: implement colo checkpoint protocol
  COLO ctl: add a RunState RUN_STATE_COLO
  COLO ctl: implement colo save
  COLO ctl: implement colo restore
  COLO save: reuse migration bitmap under colo checkpoint
  COLO ram cache: implement colo ram cache on slaver
  HACK: trigger checkpoint every 500ms

 Makefile.objs  |   2 +
 arch_init.c| 174 +-
 configure  |  14 +
 include/exec/cpu-all.h |   1 +
 include/migration/migration-colo.h |  36 +++
 include/migration/migration.h  |  13 +
 include/qapi/qmp/qerror.h  |   3 +
 migration-colo-comm.c  |  78 +
 migration-colo.c   | 643 +
 migration.c|  45 ++-
 qapi-schema.json   |   9 +-
 stubs/Makefile.objs|   1 +
 stubs/migration-colo.c |  34 ++
 vl.c   |  12 +
 14 files changed, 1044 insertions(+), 21 deletions(-)
 create mode 100644 include/migration/migration-colo.h
 create mode 100644 migration-colo-comm.c
 create mode 100644 migration-colo.c
 create mode 100644 stubs/migration-colo.c

-- 
1.9.1

[Qemu-devel] [RFC PATCH 03/17] COLO migration: add a migration capability 'colo'

2014-07-23 Thread Yang Hongyang

Add a migration capability 'colo'. If this capability is on,
the migration will never end, and the VM will be continuously
checkpointed.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 include/qapi/qmp/qerror.h | 3 +++
 migration.c   | 6 ++
 qapi-schema.json  | 5 -
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h
index 902d1a7..226b805 100644
--- a/include/qapi/qmp/qerror.h
+++ b/include/qapi/qmp/qerror.h
@@ -166,4 +166,7 @@ void qerror_report_err(Error *err);
 #define QERR_SOCKET_CREATE_FAILED \
     ERROR_CLASS_GENERIC_ERROR, "Failed to create socket"
 
+#define QERR_COLO_UNSUPPORTED \
+    ERROR_CLASS_GENERIC_ERROR, "COLO is not currently supported, please rerun configure with --enable-colo option in order to support COLO feature"
+
 #endif /* QERROR_H */
diff --git a/migration.c b/migration.c
index 8d675b3..ca83310 100644
--- a/migration.c
+++ b/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "migration/migration-colo.h"
 
 enum {
 MIG_STATE_ERROR = -1,
@@ -277,6 +278,11 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     }
 
     for (cap = params; cap; cap = cap->next) {
+        if (cap->value->capability == MIGRATION_CAPABILITY_COLO &&
+            cap->value->state && !colo_supported()) {
+            error_set(errp, QERR_COLO_UNSUPPORTED);
+            continue;
+        }
         s->enabled_capabilities[cap->value->capability] = cap->value->state;
     }
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index b11aad2..807f5a2 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -491,10 +491,13 @@
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #  to speed up convergence of RAM migration. (since 1.6)
 #
+# @colo: The migration will never end, and the VM will instead be continuously
+#checkpointed. The feature is disabled by default. (since 2.1)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'colo'] }
 
 ##
 # @MigrationCapabilityStatus
-- 
1.9.1

[Qemu-devel] [RFC PATCH 05/17] COLO save: integrate COLO checkpointed save into qemu migration

2014-07-23 Thread Yang Hongyang

  Integrate the COLO checkpointed save flow into QEMU migration.
  Add a migration state, MIG_STATE_COLO, which is entered after the
first live migration has finished successfully.
  Create a colo thread to do the checkpointed save.

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 include/migration/migration-colo.h |  4 
 include/migration/migration.h  | 13 +++
 migration-colo-comm.c  |  2 +-
 migration-colo.c   | 48 ++
 migration.c| 36 
 stubs/migration-colo.c |  4 
 6 files changed, 91 insertions(+), 16 deletions(-)

diff --git a/include/migration/migration-colo.h 
b/include/migration/migration-colo.h
index e3735d8..24589c0 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -18,4 +18,8 @@ void colo_info_mig_init(void);
 
 bool colo_supported(void);
 
+/* save */
+bool migrate_use_colo(void);
+void colo_init_checkpointer(MigrationState *s);
+
 #endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3cb5ba8..3e81a27 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -64,6 +64,19 @@ struct MigrationState
 int64_t dirty_sync_count;
 };
 
+enum {
+MIG_STATE_ERROR = -1,
+MIG_STATE_NONE,
+MIG_STATE_SETUP,
+MIG_STATE_CANCELLING,
+MIG_STATE_CANCELLED,
+MIG_STATE_ACTIVE,
+MIG_STATE_COLO,
+MIG_STATE_COMPLETED,
+};
+
+void migrate_set_state(MigrationState *s, int old_state, int new_state);
+
 void process_incoming_migration(QEMUFile *f);
 
 void qemu_start_incoming_migration(const char *uri, Error **errp);
diff --git a/migration-colo-comm.c b/migration-colo-comm.c
index ccbc246..4504ceb 100644
--- a/migration-colo-comm.c
+++ b/migration-colo-comm.c
@@ -25,7 +25,7 @@ static bool colo_requested;
 
 /* save */
 
-static bool migrate_use_colo(void)
+bool migrate_use_colo(void)
 {
 MigrationState *s = migrate_get_current();
     return s->enabled_capabilities[MIGRATION_CAPABILITY_COLO];
diff --git a/migration-colo.c b/migration-colo.c
index 1d3bef8..0cef8bd 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -8,9 +8,57 @@
  * the COPYING file in the top-level directory.
  */
 
+#include "qemu/main-loop.h"
+#include "qemu/thread.h"
 #include "migration/migration-colo.h"
 
+static QEMUBH *colo_bh;
+
 bool colo_supported(void)
 {
 return true;
 }
+
+/* save */
+
+static void *colo_thread(void *opaque)
+{
+    MigrationState *s = opaque;
+
+    /*TODO: COLO checkpointed save loop*/
+
+    if (s->state != MIG_STATE_ERROR) {
+        migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
+    }
+
+    qemu_mutex_lock_iothread();
+    qemu_bh_schedule(s->cleanup_bh);
+    qemu_mutex_unlock_iothread();
+
+    return NULL;
+}
+
+static void colo_start_checkpointer(void *opaque)
+{
+    MigrationState *s = opaque;
+
+    if (colo_bh) {
+        qemu_bh_delete(colo_bh);
+        colo_bh = NULL;
+    }
+
+    qemu_mutex_unlock_iothread();
+    qemu_thread_join(&s->thread);
+    qemu_mutex_lock_iothread();
+
+    migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COLO);
+
+    qemu_thread_create(&s->thread, "colo", colo_thread, s,
+                       QEMU_THREAD_JOINABLE);
+}
+
+void colo_init_checkpointer(MigrationState *s)
+{
+    colo_bh = qemu_bh_new(colo_start_checkpointer, s);
+    qemu_bh_schedule(colo_bh);
+}
diff --git a/migration.c b/migration.c
index ca83310..b7f8e7e 100644
--- a/migration.c
+++ b/migration.c
@@ -27,16 +27,6 @@
 #include "trace.h"
 #include "migration/migration-colo.h"
 
-enum {
-MIG_STATE_ERROR = -1,
-MIG_STATE_NONE,
-MIG_STATE_SETUP,
-MIG_STATE_CANCELLING,
-MIG_STATE_CANCELLED,
-MIG_STATE_ACTIVE,
-MIG_STATE_COMPLETED,
-};
-
 #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
 
 /* Amount of time to allocate to each chunk of bandwidth-throttled
@@ -229,6 +219,11 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
 get_xbzrle_cache_stats(info);
 break;
+    case MIG_STATE_COLO:
+        info->has_status = true;
+        info->status = g_strdup("colo");
+        /* TODO: display COLO specific information (checkpoint info etc.) */
+        break;
 case MIG_STATE_COMPLETED:
 get_xbzrle_cache_stats(info);
 
@@ -272,7 +267,8 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     MigrationState *s = migrate_get_current();
     MigrationCapabilityStatusList *cap;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
+        s->state == MIG_STATE_COLO) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -289,7 +285,7 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+void

[Qemu-devel] [RFC PATCH 09/17] COLO ctl: implement API's that communicate with colo agent

2014-07-23 Thread Yang Hongyang

We use the COLO agent to compare the packets returned by the
Primary VM and the Secondary VM, and to decide whether to start a
checkpoint according to some rules. It is a Linux kernel
module for the host.
The COLO controller communicates with the agent through ioctl().
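
The return convention this patch documents for the wait ioctl (0 means
"start a checkpoint", ETIME or ERESTART means "try again", any other
errno is fatal) can be sketched as a small classifier. This is a
sketch, not code from the patch: classify_wait() and the enum names are
hypothetical, and ERESTART is guarded because it is not defined on
every platform.

```c
#include <assert.h>
#include <errno.h>

enum wait_result {
    WAIT_CHECKPOINT,  /* agent asked for a checkpoint */
    WAIT_RETRY,       /* timeout or interrupted wait: poll again */
    WAIT_FATAL        /* anything else: leave the COLO save loop */
};

/* Map an ioctl() return value and the errno it left behind to an action. */
enum wait_result classify_wait(int ioctl_ret, int err)
{
    if (ioctl_ret == 0) {
        return WAIT_CHECKPOINT;
    }
    /* A failing call is expected to carry a non-zero errno. */
    assert(err != 0);
    if (err == ETIME) {
        return WAIT_RETRY;
    }
#ifdef ERESTART
    if (err == ERESTART) {
        return WAIT_RETRY;
    }
#endif
    return WAIT_FATAL;
}
```

A caller would pass the result of the wait ioctl together with errno,
for example classify_wait(ret, errno), and loop while the answer is
WAIT_RETRY.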

Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
---
 migration-colo.c | 115 +--
 1 file changed, 112 insertions(+), 3 deletions(-)

diff --git a/migration-colo.c b/migration-colo.c
index f295e56..802f8b0 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -13,7 +13,16 @@
 #include "block/coroutine.h"
 #include "qemu/error-report.h"
 #include "hw/qdev-core.h"
+#include "qemu/timer.h"
 #include "migration/migration-colo.h"
+#include <sys/ioctl.h>
+
+/*
+ * checkpoint timer: unit ms
+ * this is large because COLO checkpoint will mostly depend on
+ * COLO compare module.
+ */
+#define CHKPOINT_TIMER 1
 
 static QEMUBH *colo_bh;
 
@@ -22,6 +31,56 @@ bool colo_supported(void)
 return true;
 }
 
+/* colo compare */
+#define COMP_IOC_MAGIC 'k'
+#define COMP_IOCTWAIT   _IO(COMP_IOC_MAGIC, 0)
+#define COMP_IOCTFLUSH  _IO(COMP_IOC_MAGIC, 1)
+#define COMP_IOCTRESUME _IO(COMP_IOC_MAGIC, 2)
+
+#define COMPARE_DEV "/dev/HA_compare"
+/* COLO compare module FD */
+static int comp_fd = -1;
+
+static int colo_compare_init(void)
+{
+comp_fd = open(COMPARE_DEV, O_RDONLY);
+if (comp_fd < 0) {
+return -1;
+}
+
+return 0;
+}
+
+static void colo_compare_destroy(void)
+{
+if (comp_fd >= 0) {
+close(comp_fd);
+comp_fd = -1;
+}
+}
+
+/*
+ * Communicate with COLO Agent through ioctl.
+ * return:
+ * 0: start a checkpoint
+ * other: errno == ETIME or ERESTART, try again
+ *errno == other, error, quit colo save
+ */
+static int colo_compare(void)
+{
+return ioctl(comp_fd, COMP_IOCTWAIT, 250);
+}
+
+static __attribute__((unused)) int colo_compare_flush(void)
+{
+return ioctl(comp_fd, COMP_IOCTFLUSH, 1);
+}
+
+static __attribute__((unused)) int colo_compare_resume(void)
+{
+return ioctl(comp_fd, COMP_IOCTRESUME, 1);
+}
+
 /* colo buffer */
 
 #define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
@@ -131,15 +190,48 @@ static const QEMUFileOps colo_read_ops = {
 static void *colo_thread(void *opaque)
 {
 MigrationState *s = opaque;
-int dev_hotplug = qdev_hotplug;
+int dev_hotplug = qdev_hotplug, wait_cp = 0;
+int64_t start_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+int64_t current_time;
+
+if (colo_compare_init() < 0) {
+error_report("Init colo compare error\n");
+goto out;
+}
 
 qdev_hotplug = 0;
 
 colo_buffer_init();
 
-/*TODO: COLO checkpointed save loop*/
+while (s->state == MIG_STATE_COLO) {
+/* wait for a colo checkpoint */
+wait_cp = colo_compare();
+if (wait_cp) {
+if (errno != ETIME && errno != ERESTART) {
+error_report("compare module failed(%s)", strerror(errno));
+goto out;
+}
+/*
+ * no checkpoint is needed, wait for 1ms and then
+ * check if we need checkpoint
+ */
+current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+if (current_time - start_time < CHKPOINT_TIMER) {
+usleep(1000);
+continue;
+}
+}
+
+/* start a colo checkpoint */
+
+/*TODO: COLO save */
 
+start_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+}
+
+out:
 colo_buffer_destroy();
+colo_compare_destroy();
 
 if (s->state != MIG_STATE_ERROR) {
 migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
@@ -183,6 +275,17 @@ void colo_init_checkpointer(MigrationState *s)
 
 static Coroutine *colo;
 
+/*
+ * return:
+ * 0: start a checkpoint
+ * 1: some error happend, exit colo restore
+ */
+static int slave_wait_new_checkpoint(QEMUFile *f)
+{
+/* TODO: wait checkpoint start command from master */
+return 1;
+}
+
 void colo_process_incoming_checkpoints(QEMUFile *f)
 {
 int dev_hotplug = qdev_hotplug;
@@ -198,7 +301,13 @@ void colo_process_incoming_checkpoints(QEMUFile *f)
 
 colo_buffer_init();
 
-/* TODO: COLO checkpointed restore loop */
+while (true) {
+if (slave_wait_new_checkpoint(f)) {
+break;
+}
+
+/* TODO: COLO restore */
+}
 
 colo_buffer_destroy();
 colo = NULL;
-- 
1.9.1

[Qemu-devel] [RFC PATCH 11/17] COLO ctl: implement colo checkpoint protocol

2014-07-23 Thread Yang Hongyang

Implement the COLO checkpoint protocol.

Checkpoint synchronizing points:

                  Primary                 Secondary
  NEW             @
                                          Suspend
  SUSPENDED                               @
                  Suspend&Save state
  SEND            @
                  Send state              Receive state
  RECEIVED                                @
                  Flush network           Load state
  LOADED                                  @
                  Resume                  Resume

                  Start Comparing
NOTE:
 1) '@' marks the side that sends the message.
 2) Every sync-point is synchronized by the two sides with only
    one handshake (single direction) for low latency.
    If stricter synchronization is required, a sync-point in the
    opposite direction should be added.
 3) Since sync-points are single direction, the remote side may
    have moved far ahead by the time this side receives the sync-point.
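
The ordering of the sync-points can be written down as a small checker. The
enum values follow the patch (the list starts at COLO_READY = 0x46);
sent_by_primary() and is_valid_round_sequence() are hypothetical helpers
added here purely for illustration, under the assumption that a checkpoint
round is exactly NEW, SUSPENDED, SEND, RECEIVED, LOADED in that order.

```c
#include <assert.h>
#include <stdbool.h>

/* Message values as declared in the patch. */
enum {
    COLO_READY = 0x46,
    COLO_CHECKPOINT_NEW,
    COLO_CHECKPOINT_SUSPENDED,
    COLO_CHECKPOINT_SEND,
    COLO_CHECKPOINT_RECEIVED,
    COLO_CHECKPOINT_LOADED,
};

/* Hypothetical helper: true if the primary is the sender of msg. */
static bool sent_by_primary(int msg)
{
    return msg == COLO_CHECKPOINT_NEW || msg == COLO_CHECKPOINT_SEND;
}

/*
 * Hypothetical helper: check that msgs[] consists of whole checkpoint
 * rounds NEW -> SUSPENDED -> SEND -> RECEIVED -> LOADED.
 */
static bool is_valid_round_sequence(const int *msgs, int n)
{
    int expected = COLO_CHECKPOINT_NEW;

    for (int i = 0; i < n; i++) {
        if (msgs[i] != expected) {
            return false;
        }
        /* After LOADED the next round starts again with NEW. */
        expected = (expected == COLO_CHECKPOINT_LOADED)
                   ? COLO_CHECKPOINT_NEW : expected + 1;
    }
    /* A trailing partial round is rejected. */
    return expected == COLO_CHECKPOINT_NEW;
}
```

Because each sync-point is a one-way handshake, such a checker only describes
the order seen on one channel; it says nothing about how far the peer has
progressed in the meantime.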

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 migration-colo.c | 268 +--
 1 file changed, 262 insertions(+), 6 deletions(-)

diff --git a/migration-colo.c b/migration-colo.c
index 2699e77..a708872 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -24,6 +24,41 @@
  */
 #define CHKPOINT_TIMER 1
 
+enum {
+COLO_READY = 0x46,
+
+/*
+ * Checkpoint synchronzing points.
+ *
+ *                  Primary                 Secondary
+ *  NEW             @
+ *                                          Suspend
+ *  SUSPENDED                               @
+ *                  Suspend&Save state
+ *  SEND            @
+ *                  Send state              Receive state
+ *  RECEIVED                                @
+ *                  Flush network           Load state
+ *  LOADED                                  @
+ *                  Resume                  Resume
+ *
+ *                  Start Comparing
+ * NOTE:
+ * 1) '@' who sends the message
+ * 2) Every sync-point is synchronized by two sides with only
+ *    one handshake(single direction) for low-latency.
+ *    If more strict synchronization is required, a opposite direction
+ *    sync-point should be added.
+ * 3) Since sync-points are single direction, the remote side may
+ *    go forward a lot when this side just receives the sync-point.
+ */
+COLO_CHECKPOINT_NEW,
+COLO_CHECKPOINT_SUSPENDED,
+COLO_CHECKPOINT_SEND,
+COLO_CHECKPOINT_RECEIVED,
+COLO_CHECKPOINT_LOADED,
+};
+
 static QEMUBH *colo_bh;
 
 bool colo_supported(void)
@@ -185,30 +220,161 @@ static const QEMUFileOps colo_read_ops = {
 .close = colo_close,
 };
 
+/* colo checkpoint control helper */
+static bool is_master(void);
+static bool is_slave(void);
+
+static void ctl_error_handler(void *opaque, int err)
+{
+if (is_slave()) {
+/* TODO: determine whether we need to failover */
+/* FIXME: we will not failover currently, just kill slave */
+error_report("error: colo transmission failed!\n");
+exit(1);
+} else if (is_master()) {
+/* Master still alive, do not failover */
+error_report("error: colo transmission failed!\n");
+return;
+} else {
+error_report("COLO: Unexpected error happend!\n");
+exit(EXIT_FAILURE);
+}
+}
+
+static int colo_ctl_put(QEMUFile *f, uint64_t request)
+{
+int ret = 0;
+
+qemu_put_be64(f, request);
+qemu_fflush(f);
+
+ret = qemu_file_get_error(f);
+if (ret < 0) {
+ctl_error_handler(f, ret);
+return 1;
+}
+
+return ret;
+}
+
+static int colo_ctl_get_value(QEMUFile *f, uint64_t *value)
+{
+int ret = 0;
+uint64_t temp;
+
+temp = qemu_get_be64(f);
+
+ret = qemu_file_get_error(f);
+if (ret < 0) {
+ctl_error_handler(f, ret);
+return 1;
+}
+
+*value = temp;
+return 0;
+}
+
+static int colo_ctl_get(QEMUFile *f, uint64_t require)
+{
+int ret;
+uint64_t value;
+
+ret = colo_ctl_get_value(f, &value);
+if (ret) {
+return ret;
+}
+
+if (value != require) {
+error_report("unexpected state received!\n");
+exit(1);
+}
+
+return ret;
+}
+
 /* save */
 
-static __attribute__((unused)) bool is_master(void)
+static bool is_master(void)
 {
 MigrationState *s = migrate_get_current();
 return (s->state == MIG_STATE_COLO);
 }
 
+static int do_colo_transaction(MigrationState *s, QEMUFile *control,
+   QEMUFile *trans)
+{
+int ret;
+
+ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
+if (ret) {
+goto out;
+}
+
+ret = colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED);
+if (ret) {
+goto out;
+}
+
+/* TODO: suspend and save vm state to colo buffer */
+
+ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
+if (ret) {
+goto out;
+}
+
+/*

[Qemu-devel] [RFC PATCH 14/17] COLO ctl: implement colo restore

2014-07-23 Thread Yang Hongyang

implement colo restore

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 migration-colo.c | 43 +++
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/migration-colo.c b/migration-colo.c
index 03ac157..8596845 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -535,8 +535,9 @@ void colo_process_incoming_checkpoints(QEMUFile *f)
 {
 int fd = qemu_get_fd(f);
 int dev_hotplug = qdev_hotplug;
-QEMUFile *ctl = NULL;
+QEMUFile *ctl = NULL, *fb = NULL;
 int ret;
+uint64_t total_size;
 
 if (!restore_use_colo()) {
 return;
@@ -560,7 +561,8 @@ void colo_process_incoming_checkpoints(QEMUFile *f)
 goto out;
 }
 
-/* TODO: in COLO mode, slave is runing, so start the vm */
+/* in COLO mode, slave is runing, so start the vm */
+vm_start();
 
 while (true) {
 if (slave_wait_new_checkpoint(f)) {
@@ -569,43 +571,68 @@ void colo_process_incoming_checkpoints(QEMUFile *f)
 
 /* start colo checkpoint */
 
-/* TODO: suspend guest */
+/* suspend guest */
+vm_stop_force_state(RUN_STATE_COLO);
 
 ret = colo_ctl_put(ctl, COLO_CHECKPOINT_SUSPENDED);
 if (ret) {
 goto out;
 }
 
-/* TODO: open colo buffer for read */
+/* open colo buffer for read */
+fb = qemu_fopen_ops(&colo_buffer, &colo_read_ops);
+if (!fb) {
+error_report("can't open colo buffer\n");
+goto out;
+}
 
 ret = colo_ctl_get(f, COLO_CHECKPOINT_SEND);
 if (ret) {
 goto out;
 }
 
-/* TODO: read migration data into colo buffer */
+/* read migration data into colo buffer */
+
+/* read the vmstate total size first */
+ret = colo_ctl_get_value(f, &total_size);
+if (ret) {
+goto out;
+}
+colo_buffer_extend(total_size);
+qemu_get_buffer(f, colo_buffer.data, total_size);
+colo_buffer.used = total_size;
 
 ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED);
 if (ret) {
 goto out;
 }
 
-/* TODO: load vm state */
+/* load vm state */
+if (qemu_loadvm_state(fb) < 0) {
+error_report("COLO: loadvm failed\n");
+goto out;
+}
 
 ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
 if (ret) {
 goto out;
 }
 
-/* TODO: resume guest */
+/* resume guest */
+vm_start();
 
-/* TODO: close colo buffer */
+qemu_fclose(fb);
+fb = NULL;
 }
 
 out:
 colo_buffer_destroy();
 colo = NULL;
 
+if (fb) {
+qemu_fclose(fb);
+}
+
 if (ctl) {
 qemu_fclose(ctl);
 }
-- 
1.9.1

[Qemu-devel] [RFC PATCH 15/17] COLO save: reuse migration bitmap under colo checkpoint

2014-07-23 Thread Yang Hongyang

Reuse the migration bitmap under COLO checkpoint, so that only dirty
pages are sent at each checkpoint.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 arch_init.c| 20 +++-
 include/migration/migration-colo.h |  2 ++
 migration-colo.c   |  6 ++
 stubs/migration-colo.c | 10 ++
 4 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 8ddaf35..c84e6c8 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -52,6 +52,7 @@
 #include "exec/ram_addr.h"
 #include "hw/acpi/acpi.h"
 #include "qemu/host-utils.h"
+#include "migration/migration-colo.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -769,6 +770,15 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 RAMBlock *block;
 int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
 
+/*
+ * migration has already setup the bitmap, reuse it.
+ */
+if (is_master()) {
+qemu_mutex_lock_ramlist();
+reset_ram_globals();
+goto out_setup;
+}
+
 mig_throttle_on = false;
 dirty_rate_high_cnt = 0;
 bitmap_sync_count = 0;
@@ -828,6 +838,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 migration_bitmap_sync();
 qemu_mutex_unlock_iothread();
 
+out_setup:
 qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
 QTAILQ_FOREACH(block, &ram_list.blocks, next) {
@@ -937,7 +948,14 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 }
 
 ram_control_after_iterate(f, RAM_CONTROL_FINISH);
-migration_end();
+
+/*
+ * Since we need to reuse dirty bitmap in colo,
+ * don't cleanup the bitmap.
+ */
+if (!migrate_use_colo() || migration_has_failed(migrate_get_current())) {
+migration_end();
+}
 
 qemu_mutex_unlock_ramlist();
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index 861fa27..c286a60 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -21,10 +21,12 @@ bool colo_supported(void);
 /* save */
 bool migrate_use_colo(void);
 void colo_init_checkpointer(MigrationState *s);
+bool is_master(void);
 
 /* restore */
 bool restore_use_colo(void);
 void restore_exit_colo(void);
+bool is_slave(void);
 
 void colo_process_incoming_checkpoints(QEMUFile *f);
 
diff --git a/migration-colo.c b/migration-colo.c
index 8596845..13a6a57 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -222,8 +222,6 @@ static const QEMUFileOps colo_read_ops = {
 };
 
 /* colo checkpoint control helper */
-static bool is_master(void);
-static bool is_slave(void);
 
 static void ctl_error_handler(void *opaque, int err)
 {
@@ -295,7 +293,7 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require)
 
 /* save */
 
-static bool is_master(void)
+bool is_master(void)
 {
 MigrationState *s = migrate_get_current();
 return (s->state == MIG_STATE_COLO);
@@ -499,7 +497,7 @@ void colo_init_checkpointer(MigrationState *s)
 
 static Coroutine *colo;
 
-static bool is_slave(void)
+bool is_slave(void)
 {
 return colo != NULL;
 }
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 55f0d37..ef65be6 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -22,3 +22,13 @@ void colo_init_checkpointer(MigrationState *s)
 void colo_process_incoming_checkpoints(QEMUFile *f)
 {
 }
+
+bool is_master(void)
+{
+return false;
+}
+
+bool is_slave(void)
+{
+return false;
+}
-- 
1.9.1

[Qemu-devel] [RFC PATCH 16/17] COLO ram cache: implement colo ram cache on slaver

2014-07-23 Thread Yang Hongyang

The RAM cache is initially the same as the PVM's memory. At each
checkpoint, we cache the dirty memory of the PVM into the RAM cache
(so that the RAM cache is always the same as the PVM's memory at
every checkpoint), then flush the cached memory to the SVM after we
have received all of the PVM's dirty memory (only memory that was
dirty on the PVM and the SVM since the last checkpoint needs to be
flushed).
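
The flush step can be pictured with a toy model. The sketch below is not the
patch's ram_flush_cache(): flush_dirty_pages() is a hypothetical function,
memory is a flat byte array, and one byte per page stands in for the
migration bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy every page marked dirty from the cache into guest RAM and clear
 * its dirty mark. Returns the number of pages copied.
 * A byte per page is used as the dirty map to keep the example short.
 */
static size_t flush_dirty_pages(uint8_t *ram, const uint8_t *cache,
                                uint8_t *dirty, size_t npages,
                                size_t page_size)
{
    size_t copied = 0;

    assert(page_size > 0);
    for (size_t i = 0; i < npages; i++) {
        if (!dirty[i]) {
            continue;               /* clean page: RAM already matches */
        }
        memcpy(ram + i * page_size, cache + i * page_size, page_size);
        dirty[i] = 0;
        copied++;
    }
    return copied;
}
```

Keeping the cache as a full copy means a page only has to be touched when its
dirty mark is set, which is what bounds the work done per checkpoint.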

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 arch_init.c| 154 -
 include/exec/cpu-all.h |   1 +
 include/migration/migration-colo.h |   3 +
 migration-colo.c   |   4 +
 4 files changed, 159 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index c84e6c8..009bcb5 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1013,6 +1013,7 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
 return 0;
 }
 
+static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block);
 static inline void *host_from_stream_offset(QEMUFile *f,
 ram_addr_t offset,
 int flags)
@@ -1027,7 +1028,12 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 return NULL;
 }
 
-return memory_region_get_ram_ptr(block->mr) + offset;
+if (is_slave()) {
+migration_bitmap_set_dirty(block->mr->ram_addr + offset);
+return memory_region_get_ram_cache_ptr(block->mr, block) + offset;
+} else {
+return memory_region_get_ram_ptr(block->mr) + offset;
+}
 }
 
 len = qemu_get_byte(f);
@@ -1035,8 +1041,15 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 id[len] = 0;
 
 QTAILQ_FOREACH(block, &ram_list.blocks, next) {
-if (!strncmp(id, block->idstr, sizeof(id)))
-return memory_region_get_ram_ptr(block->mr) + offset;
+if (!strncmp(id, block->idstr, sizeof(id))) {
+if (is_slave()) {
+migration_bitmap_set_dirty(block->mr->ram_addr + offset);
+return memory_region_get_ram_cache_ptr(block->mr, block)
+   + offset;
+} else {
+return memory_region_get_ram_ptr(block->mr) + offset;
+}
+}
 }
 
 error_report("Can't find block %s!", id);
@@ -1054,11 +1067,13 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
 }
 }
 
+static void ram_flush_cache(void);
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
 ram_addr_t addr;
 int flags, ret = 0;
 static uint64_t seq_iter;
+bool need_flush = false;
 
 seq_iter++;
 
@@ -1121,6 +1136,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 break;
 }
 
+need_flush = true;
 ch = qemu_get_byte(f);
 ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
 } else if (flags & RAM_SAVE_FLAG_PAGE) {
@@ -1133,6 +1149,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 break;
 }
 
+need_flush = true;
 qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
 } else if (flags & RAM_SAVE_FLAG_XBZRLE) {
 void *host = host_from_stream_offset(f, addr, flags);
@@ -1148,6 +1165,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 ret = -EINVAL;
 break;
 }
+need_flush = true;
 } else if (flags & RAM_SAVE_FLAG_HOOK) {
 ram_control_load_hook(f, flags);
 } else if (flags & RAM_SAVE_FLAG_EOS) {
@@ -1161,11 +1179,141 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 ret = qemu_file_get_error(f);
 }
 
+if (!ret && is_slave() && need_flush) {
+ram_flush_cache();
+}
+
 DPRINTF("Completed load of VM with exit code %d seq iteration "
 "%" PRIu64 "\n", ret, seq_iter);
 return ret;
 }
 
+/*
+ * colo cache: this is for secondary VM, we cache the whole
+ * memory of the secondary VM.
+ */
+void create_and_init_ram_cache(void)
+{
+/*
+ * called after first migration
+ */
+RAMBlock *block;
+int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+block->host_cache = g_malloc(block->length);
+memcpy(block->host_cache, block->host, block->length);
+}
+
+migration_bitmap = bitmap_new(ram_cache_pages);
+migration_dirty_pages = 0;
+memory_global_dirty_log_start();
+}
+
+void release_ram_cache(void)
+{
+RAMBlock *block;
+
+if (migration_bitmap) {
+memory_global_dirty_log_stop();
+g_free(migration_bitmap);
+migration_bitmap = NULL;
+}
+
+QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+g_free(block->host_cache);
+}
+}
+
+static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block)
+{
+   if (mr->alias) {
+

[Qemu-devel] [RFC PATCH 10/17] COLO ctl: introduce is_slave() and is_master()

2014-07-23 Thread Yang Hongyang

is_slave() determines whether the QEMU instance is a
slave (migration target) at runtime.
is_master() determines whether the QEMU instance is a
master (migration starter) at runtime.
These two APIs will be used later.

Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
---
 migration-colo.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/migration-colo.c b/migration-colo.c
index 802f8b0..2699e77 100644
--- a/migration-colo.c
+++ b/migration-colo.c
@@ -187,6 +187,12 @@ static const QEMUFileOps colo_read_ops = {
 
 /* save */
 
+static __attribute__((unused)) bool is_master(void)
+{
+MigrationState *s = migrate_get_current();
+return (s->state == MIG_STATE_COLO);
+}
+
 static void *colo_thread(void *opaque)
 {
 MigrationState *s = opaque;
@@ -275,6 +281,11 @@ void colo_init_checkpointer(MigrationState *s)
 
 static Coroutine *colo;
 
+static __attribute__((unused)) bool is_slave(void)
+{
+return colo != NULL;
+}
+
 /*
  * return:
  * 0: start a checkpoint
-- 
1.9.1

Re: [Qemu-devel] [RFC PATCH 03/17] COLO migration: add a migration capability 'colo'

2014-07-23 Thread Eric Blake

On 07/23/2014 08:25 AM, Yang Hongyang wrote:
> Add a migration capability 'colo'. If this capability is on,
> The migration will never end, and the VM will be continuously
> checkpointed.
>
> Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
> ---
>  include/qapi/qmp/qerror.h | 3 +++
>  migration.c               | 6 ++
>  qapi-schema.json          | 5 -
>  3 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h
> index 902d1a7..226b805 100644
> --- a/include/qapi/qmp/qerror.h
> +++ b/include/qapi/qmp/qerror.h
> @@ -166,4 +166,7 @@ void qerror_report_err(Error *err);
>  #define QERR_SOCKET_CREATE_FAILED \
>  ERROR_CLASS_GENERIC_ERROR, "Failed to create socket"
>
> +#define QERR_COLO_UNSUPPORTED \
> +ERROR_CLASS_GENERIC_ERROR, "COLO is not currently supported, please rerun configure with --enable-colo option in order to support COLO feature"

Unless you plan on using this message in more than one place, we prefer
that you don't add new #defines here.  Instead, just use error_setg with
the message inline.


> +++ b/qapi-schema.json
> @@ -491,10 +491,13 @@
>  # @auto-converge: If enabled, QEMU will automatically throttle down the guest
>  #  to speed up convergence of RAM migration. (since 1.6)
>  #
> +# @colo: The migration will never end, and the VM will instead be continuously
> +#checkpointed. The feature is disabled by default. (since 2.1)

You missed 2.1.  This has to be since 2.2.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH 1/7] hw/misc/platform_devices: helpers for dynamic instantiation of platform devices

2014-07-23 Thread Eric Auger

On 07/08/2014 03:43 PM, Alexander Graf wrote:
 
> On 07.07.14 09:08, Eric Auger wrote:
>> This new module implements routines which help in dynamic instantiation
>> of sysbus devices. Machine files can use those generic routines.
>>
>> ---
>>
>> Dynamic sysbus device allocation fully written by Alex Graf.
>>
>> [Eric Auger]
>> Those functions were initially in ppc e500 machine file. Now moved to a
>> separate module.
>>
>> PPCE500Params is replaced by a generic struct named PlatformParams
>>
>> Signed-off-by: Alexander Graf <ag...@suse.de>
>> Signed-off-by: Eric Auger <eric.au...@linaro.org>
>> ---
>>   hw/misc/Makefile.objs              |   1 +
>>   hw/misc/platform_devices.c         | 217 +
>>   include/hw/misc/platform_devices.h |  61 +++
>>   3 files changed, 279 insertions(+)
>>   create mode 100644 hw/misc/platform_devices.c
>>   create mode 100644 include/hw/misc/platform_devices.h
>>
>> diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
>> index e47fea8..d081606 100644
>> --- a/hw/misc/Makefile.objs
>> +++ b/hw/misc/Makefile.objs
>> @@ -40,3 +40,4 @@ obj-$(CONFIG_SLAVIO) += slavio_misc.o
>>   obj-$(CONFIG_ZYNQ) += zynq_slcr.o
>>   obj-$(CONFIG_PVPANIC) += pvpanic.o
>> +obj-y += platform_devices.o
>> diff --git a/hw/misc/platform_devices.c b/hw/misc/platform_devices.c
>> new file mode 100644
>> index 000..96ab272
>> --- /dev/null
>> +++ b/hw/misc/platform_devices.c
>> @@ -0,0 +1,217 @@
>> +#include "hw/misc/platform_devices.h"
>> +#include "hw/sysbus.h"
>> +#include "qemu/error-report.h"
>> +
>> +#define PAGE_SHIFT 12
>> +
>> +int sysbus_device_create_devtree(Object *obj, void *opaque)
>> +{
>> +PlatformDevtreeData *data = opaque;
>> +Object *dev;
>> +SysBusDevice *sbdev;
>> +bool matched = false;
>> +
>> +dev = object_dynamic_cast(obj, TYPE_SYS_BUS_DEVICE);
>> +sbdev = (SysBusDevice *)dev;
>> +
>> +if (!sbdev) {
>> +/* Container, traverse it for children */
>> +return object_child_foreach(obj, sysbus_device_create_devtree, data);
>> +}
>> +
>> +if (!matched) {
>> +error_report("Device %s is not supported by this machine yet.",
>> + qdev_fw_name(DEVICE(dev)));
>> +exit(1);
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +void platform_bus_create_devtree(PlatformParams *params, void *fdt,
>> +const char *mpic)
>> +{
>> +gchar *node = g_strdup_printf("/platform@%"PRIx64,
>> +  params->platform_bus_base);
>> +const char platcomp[] = "qemu,platform\0simple-bus";
>> +PlatformDevtreeData data;
>> +Object *container;
>> +uint64_t addr = params->platform_bus_base;
>> +uint64_t size = params->platform_bus_size;
>> +int irq_start = params->platform_bus_first_irq;
>> +
>> +/* Create a /platform node that we can put all devices into */
>> +
>> +qemu_fdt_add_subnode(fdt, node);
>> +qemu_fdt_setprop(fdt, node, "compatible", platcomp, sizeof(platcomp));
>> +
>> +/* Our platform bus region is less than 32bit big, so 1 cell is enough for
>> +   address and size */
>> +qemu_fdt_setprop_cells(fdt, node, "#size-cells", 1);
>> +qemu_fdt_setprop_cells(fdt, node, "#address-cells", 1);
>> +qemu_fdt_setprop_cells(fdt, node, "ranges", 0, addr >> 32, addr, size);
>> +
>> +qemu_fdt_setprop_phandle(fdt, node, "interrupt-parent", mpic);
>> +
>> +/* Loop through all devices and create nodes for known ones */
>> +data.fdt = fdt;
>> +data.mpic = mpic;
>> +data.irq_start = irq_start;
>> +data.node = node;
>> +
>> +container = container_get(qdev_get_machine(), "/peripheral");
>> +sysbus_device_create_devtree(container, &data);
>> +container = container_get(qdev_get_machine(), "/peripheral-anon");
>> +sysbus_device_create_devtree(container, &data);
>> +
>> +g_free(node);
>> +}
 
> Device trees are pretty platform (and even machine) specific. Just to
> give you an example - the interrupt specifier on most e500 systems
> really is 4 cells big:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt#n80
>
> |   Interrupt specifiers consists of 4 cells encoded as follows:
>
>     <1st-cell>   interrupt-number
>
>                  Identifies the interrupt source.  The meaning
>                  depends on the type of interrupt.
>
>                  Note: If the interrupt-type cell is undefined
>                  (i.e. #interrupt-cells = <2>), this cell
>                  should be interpreted the same as for
>                  interrupt-type 0-- i.e. an external or
>                  normal SoC device interrupt.
>
>     <2nd-cell>   level-sense information, encoded as follows:
>                  0 = low-to-high edge triggered
>                  1 = active low level-sensitive
>                  2 = active high level-sensitive
>                  3 = high-to-low edge triggered
>
>     <3rd-cell>   interrupt-type
>
>                  The following types are supported:
>
>                    0 = external or normal SoC device interrupt
>
>                    The interrupt-number cell contains
>                    the

Re: [Qemu-devel] [PATCH 4/7] hw/arm/virt: Support dynamically spawned sysbus devices

2014-07-23 Thread Eric Auger

On 07/08/2014 03:51 PM, Alexander Graf wrote:
 
> On 07.07.14 09:08, Eric Auger wrote:
>> Allows sysbus devices to be instantiated from command line by
>> using -device option
>>
>> ---
>>
>> Inspired from what Alex Graf did in ppc e500
>> https://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00012.html
>>
>> Signed-off-by: Alexander Graf <ag...@suse.de>
>> Signed-off-by: Eric Auger <eric.au...@linaro.org>
>> ---
>>   hw/arm/virt.c | 58 +-
>>   1 file changed, 57 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index eeecdbf..3a21db4 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -40,6 +40,8 @@
>>   #include "exec/address-spaces.h"
>>   #include "qemu/bitops.h"
>>   #include "qemu/error-report.h"
>> +#include "hw/misc/platform_devices.h"
>> +#include "hw/vfio/vfio-platform.h"
>>
>>   #define NUM_VIRTIO_TRANSPORTS 32
>> @@ -57,6 +59,14 @@
>>   #define GIC_FDT_IRQ_PPI_CPU_START 8
>>   #define GIC_FDT_IRQ_PPI_CPU_WIDTH 8
>> +#define MACHVIRT_PLATFORM_BASE 0xa004000
>
> That's an odd address for a 128MB window. Can you make it 128MB aligned?
> Maybe move the virtio region behind this one?
Yes you're right. I didn't pay attention to that. Now we have to find a
hole agreed with everybody if that's feasible ;-)
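
The alignment concern can be checked with a few lines of arithmetic. This is
only a sketch: is_aligned() and align_up() are hypothetical helpers, and the
rounded-up value is just the arithmetic result, not a proposal for the virt
machine memory map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_SIZE (128ULL * 1024 * 1024)  /* 128 MB, as in the patch */

/* True if addr is a multiple of size; size must be a power of two. */
static bool is_aligned(uint64_t addr, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* Round addr up to the next multiple of size (power of two). */
static uint64_t align_up(uint64_t addr, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr + size - 1) & ~(size - 1);
}
```

With these helpers, is_aligned(0xa004000, WINDOW_SIZE) is false because the
low 27 bits of the address are not zero.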
 
> With a bit of smartness we don't need a virtio-mmio region with this
> patch set anymore btw. We could just generate the virtio-mmio devices on
> our platform bus on the fly.
>
>> +#define MACHVIRT_PLATFORM_HOLE (128ULL * 1024 * 1024) /* 128 MB */
>
> As Scott mentioned in the e500 review round, hole is an odd name ;).
OK I will rename that.
>
>
> Alex

Re: [Qemu-devel] [PATCH] target-i386/FPU: wrong conversion infinity from float80 to int32/int64

2014-07-23 Thread Dmitry Poletaev

Understood. So, is this right?

From: Dmitry Poletaev <poletaev-q...@yandex.ru>
Signed-off-by: Dmitry Poletaev <poletaev-q...@yandex.ru>

---
 target-i386/fpu_helper.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/target-i386/fpu_helper.c b/target-i386/fpu_helper.c
index 1b2900d..c4fdad8 100644
--- a/target-i386/fpu_helper.c
+++ b/target-i386/fpu_helper.c
@@ -251,16 +251,31 @@ int32_t helper_fist_ST0(CPUX86State *env)
 int32_t helper_fistl_ST0(CPUX86State *env)
 {
 int32_t val;
-
+signed char old_exp_flags;
+
+old_exp_flags = env->fp_status.float_exception_flags;
+env->fp_status.float_exception_flags = 0;
 val = floatx80_to_int32(ST0, &env->fp_status);
+if (env->fp_status.float_exception_flags & FPUS_IE) {
+val = 0x8000;
+}
+env->fp_status.float_exception_flags |= old_exp_flags;
 return val;
 }
 
 int64_t helper_fistll_ST0(CPUX86State *env)
 {
 int64_t val;
-
-val = floatx80_to_int64(ST0, &env->fp_status);
+signed char old_exp_flags;
+
+old_exp_flags = env->fp_status.float_exception_flags;
+env->fp_status.float_exception_flags = 0;
+
+val = floatx80_to_int64(ST0, &env->fp_status);
+if (env->fp_status.float_exception_flags & FPUS_IE) {
+val = 0x8000;
+}
+env->fp_status.float_exception_flags |= old_exp_flags;
 return val;
 }
 
-- 
1.8.4.msysgit.0



23.07.2014, 16:42, "Peter Maydell" <peter.mayd...@linaro.org>:
> On 23 July 2014 12:55, Dmitry Poletaev <poletaev-q...@yandex.ru> wrote:
>>  14.07.2014, 18:59, "Peter Maydell" <peter.mayd...@linaro.org>:
>>>  Since softfloat's status flags are sticky ...
>>  What does it mean?
>
> "Sticky" here means that the status flags accumulate the
> status from a sequence of operations: a softfloat function
> will set the flag if the relevant exception occurred, but if
> the exceptional condition did not happen then the flag will
> be left at whatever its preceding value was. So you can't
> just say "if the flag is set then the last operation I did set
> it", because it might have been set by some operation
> before that. (That is, once a bit gets set in the flags word
> it "sticks" and doesn't go away.)
>
> This matches the IEEE mandated behaviour for
> floating point exception flags, which is why we do it.
>
> thanks
> -- PMM
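
The save, clear, test, and merge pattern that follows from sticky flags can
be shown with a toy status word. Nothing below is softfloat's API:
FloatStatus, FLAG_INVALID, to_int32() and to_int32_x87() are hypothetical
stand-ins, and double replaces float80 to keep the sketch self-contained.
The substituted result is the x87 "integer indefinite" value for a 32-bit
store, 0x80000000.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define FLAG_INVALID 0x01           /* hypothetical flag bit */

typedef struct {
    uint8_t exception_flags;        /* sticky: only ever ORed into */
} FloatStatus;

/* Toy conversion: raises FLAG_INVALID on NaN or out-of-range input. */
static int32_t to_int32(double x, FloatStatus *st)
{
    if (isnan(x) || x >= 2147483648.0 || x < -2147483648.0) {
        st->exception_flags |= FLAG_INVALID;
        return INT32_MAX;
    }
    return (int32_t)x;
}

/* Convert, and detect whether *this* conversion was invalid. */
static int32_t to_int32_x87(double x, FloatStatus *st)
{
    uint8_t old_flags = st->exception_flags;    /* save earlier flags */
    int32_t val;

    st->exception_flags = 0;                    /* start from a clean slate */
    val = to_int32(x, st);
    if (st->exception_flags & FLAG_INVALID) {
        val = INT32_MIN;                        /* 0x80000000 */
    }
    st->exception_flags |= old_flags;           /* merge earlier flags back */
    return val;
}
```

Without the save and clear steps, a flag left over from an earlier operation
would make a perfectly valid conversion look invalid.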

[Qemu-devel] [Bug 1347387] [NEW] while i was created the new virtual machine using qemu the following error was shown in fedora version 20

2014-07-23 Thread selvakumar

Public bug reported:

[root@localhost pkgs]# qemu-img create virtualdisk.img 100M
qemu-img: symbol lookup error: qemu-img: undefined symbol: glfs_discard_async
[root@localhost pkgs]# qemu-i386 create virtualdisk.img 100M
Error while loading create: No such file or directory

[root@localhost pkgs]# rpm -qa qemu-kvm libvirt
qemu-kvm-1.6.2-6.fc20.x86_64
libvirt-1.1.3.5-2.fc20.x86_64
[root@localhost pkgs]# 

[root@localhost pkgs]# rpm -qa|grep *qemu*
qemu-system-m68k-1.6.2-6.fc20.x86_64
qemu-kvm-1.6.2-6.fc20.x86_64
qemu-system-microblaze-1.6.2-6.fc20.x86_64
ipxe-roms-qemu-20130517-3.gitc4bce43.fc20.noarch
qemu-common-1.6.2-6.fc20.x86_64
qemu-system-sh4-1.6.2-6.fc20.x86_64
qemu-system-sparc-1.6.2-6.fc20.x86_64
qemu-system-lm32-1.6.2-6.fc20.x86_64
qemu-img-1.6.2-6.fc20.x86_64
qemu-system-s390x-1.6.2-6.fc20.x86_64
qemu-system-cris-1.6.2-6.fc20.x86_64
qemu-1.6.2-6.fc20.x86_64
qemu-system-xtensa-1.6.2-6.fc20.x86_64
qemu-system-moxie-1.6.2-6.fc20.x86_64
qemu-system-ppc-1.6.2-6.fc20.x86_64
libvirt-daemon-driver-qemu-1.1.3.5-2.fc20.x86_64
qemu-system-mips-1.6.2-6.fc20.x86_64
qemu-system-alpha-1.6.2-6.fc20.x86_64
qemu-guest-agent-1.6.1-2.fc20.x86_64
qemu-user-1.6.2-6.fc20.x86_64
qemu-system-x86-1.6.2-6.fc20.x86_64
qemu-system-arm-1.6.2-6.fc20.x86_64
qemu-system-unicore32-1.6.2-6.fc20.x86_64
qemu-system-or32-1.6.2-6.fc20.x86_64
[root@localhost pkgs]#

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1347387


[Qemu-devel] [Bug 1347555] [NEW] qemu build failure, hxtool is a bash script, not a /bin/sh script

2014-07-23 Thread Felix von Leitner

Public bug reported:

hxtool (part of the early build process) is a bash script.  Running it
with /bin/sh yields a syntax error on line 10:

 10 STEXI*|ETEXI*|SQMP*|EQMP*) flag=$(($flag^1))

$(( expr )) is a bash extension, not part of /bin/sh.

Note that replacing the sh in the first line in hxtool with /bin/bash
does not help, because the script is run manually from the Makefile with
sh:

154 $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN   $@")

The fix is to change those lines to

154 $(call quiet-command,bash $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN   $@")

(there are five or so).

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1347555

Title:
  qemu build failure, hxtool is a bash script, not a /bin/sh script

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1347555/+subscriptions

Re: [Qemu-devel] [PATCH 5/7] hw/core/sysbus: add fdt_add_node method

2014-07-23 Thread Eric Auger

On 07/08/2014 03:52 PM, Alexander Graf wrote:
 
 On 07.07.14 09:08, Eric Auger wrote:
 This method is meant to be called on sysbus device dynamic
 instantiation (-device option). Devices that support this
 kind of instantiation must implement this method.

 Signed-off-by: Eric Auger eric.au...@linaro.org
 
 For the reason I stated earlier, I don't think it's a good idea to put
 device tree code into our device models.

Hi Alex,

I propose we discuss this topic during the next KVM call, if you are
available. I hope Peter will be able to join too. I feel stuck between
not putting things in the machine file (1), although obviously we could
put them in a helper module (2), and not putting them in the
device (3).

Whatever the solution, I fear we are going to pollute something: any time
a new device wants to support dynamic instantiation, we would need to
modify the machine file with (1) or the helper module with (2). If we
put the code in the device (3), we pollute the device instead.

My hope was that only a few QEMU platform devices would need to support
that feature, and hence implement this dt node generation method. To me,
dynamic instantiation of platform devices was not the mainstream
solution.

Then there is the fundamental question of whether it is technically
feasible to devise a generic PlatformParams that matches all the
specialization needs. Here I lack experience. If we know the machine
type and a small set of additional fields, couldn't we do the
adaptations you talked about, related to IRQs?

Best Regards

Eric

 
 
 Alex

Re: [Qemu-devel] [PATCH 7/7] hw/misc/platform_devices: Add platform_bus_base to PlatformDevtreeData

2014-07-23 Thread Eric Auger

On 07/08/2014 03:53 PM, Alexander Graf wrote:
 
 On 07.07.14 09:08, Eric Auger wrote:
 The base address of the platform bus sometimes is used to build the
 reg property.

 ---

 Actually I did not succeed in doing it another way with Calxeda xgmac.
 If someone knows how to do without, please advise.
 
 Not sure I understand. The regs properties live inside the parent's
 ranges. So in device tree the only thing that should be aware of the
 bus offset is the platform bus node, no?
Yes, I fully agree with you, as I mentioned in a previous email. I tried
to use offset instead of range, but I never succeeded in making it work.
Maybe it is a syntax issue. I need to spend some more time on it, get
some help, and fix that.

BR

Eric
 
 
 Alex

Re: [Qemu-devel] [RFC PATCH 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service

2014-07-23 Thread Eric Blake

On 07/23/2014 08:25 AM, Yang Hongyang wrote:
 Virtual machine (VM) replication is a well known technique for
 providing application-agnostic software-implemented hardware fault
 tolerance non-stop service. COLO is a high availability solution.
 Both primary VM (PVM) and secondary VM (SVM) run in parallel. They
 receive the same request from client, and generate response in parallel
 too. If the response packets from PVM and SVM are identical, they are
 released immediately. Otherwise, a VM checkpoint (on demand) is
 conducted. The idea is presented in Xen summit 2012, and 2013,
 and academia paper in SOCC 2013. It's also presented in KVM forum
 2013:
 http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
 Please refer to above document for detailed information. 
 Please also refer to previous posted RFC proposal:
 http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
 
 The patchset is also hosted on github:
 https://github.com/macrosheep/qemu/tree/colo_v0.1
 
 This patchset is RFC, implements the frame of colo, without
 failover and nic/disk replication. But it is ready for demo
 the COLO idea above QEMU-Kvm.
 Steps using this patchset to get an overview of COLO:
 1. configure the source with --enable-colo option

Code that has to be opt-in tends to bitrot, because people don't
configure their build-bots to opt in.  What sort of penalties does
opting in cause to the code if colo is not used?  I'd much rather make
the default to compile colo unless configured --disable-colo.  Are there
any pre-req libraries required for it to work?  That would be the only
reason to make the default of on or off conditional, rather than
defaulting to on.
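
A minimal sketch of the default-on behaviour suggested above, assuming a
configure-style option loop. The function and variable names are
hypothetical and are not taken from QEMU's actual configure script:

```shell
# Hypothetical sketch of a default-on configure switch; the names here
# are illustrative and do not come from QEMU's configure script.
parse_colo_option() {
    colo="yes"                      # default: build COLO support
    for opt in "$@"; do
        case "$opt" in
            --enable-colo)  colo="yes" ;;
            --disable-colo) colo="no" ;;
        esac
    done
    echo "$colo"
}

parse_colo_option                   # no option given: default applies
parse_colo_option --disable-colo    # explicit opt-out
```

With this shape, build-bots that pass no option still compile the COLO
code, which is the point about bitrot made above.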


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [RFC PATCH 02/17] COLO: introduce an api colo_supported() to indicate COLO support

2014-07-23 Thread Eric Blake

On 07/23/2014 08:25 AM, Yang Hongyang wrote:
 introduce an api colo_supported() to indicate COLO support, returns
 true if colo supported(configured with --enable-colo).

Space before () in English sentences:
 s/supported(configured/supported (configured/

As I mentioned in the cover letter, defaulting to off is probably a bad
idea; I'd rather default to on or even make it unconditional if it
doesn't negatively affect the code base when not used.

 
 Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
 ---
  Makefile.objs  |  1 +
  include/migration/migration-colo.h | 18 ++
  migration-colo.c   | 16 
  stubs/Makefile.objs|  1 +
  stubs/migration-colo.c | 16 
  5 files changed, 52 insertions(+)
  create mode 100644 include/migration/migration-colo.h
  create mode 100644 migration-colo.c
  create mode 100644 stubs/migration-colo.c
 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [RFC PATCH 12/17] COLO ctl: add a RunState RUN_STATE_COLO

2014-07-23 Thread Eric Blake

On 07/23/2014 08:25 AM, Yang Hongyang wrote:
 Guest will enter this state when paused to save/resore VM state

s/resore/restore/

 under colo checkpoint.
 
 Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
 ---
  qapi-schema.json | 4 +++-
  vl.c | 8 
  2 files changed, 11 insertions(+), 1 deletion(-)
 
 diff --git a/qapi-schema.json b/qapi-schema.json
 index 807f5a2..b42171c 100644
 --- a/qapi-schema.json
 +++ b/qapi-schema.json
 @@ -145,12 +145,14 @@
  # @watchdog: the watchdog action is configured to pause and has been 
 triggered
  #
  # @guest-panicked: guest has been panicked as a result of guest OS panic
 +#
 +# @colo: guest is paused to save/restore VM state under colo checkpoint

Missing a '(since 2.2)' designation.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [Bug 1347555] [NEW] qemu build failure, hxtool is a bash script, not a /bin/sh script

2014-07-23 Thread Eric Blake

On 07/23/2014 04:21 AM, Felix von Leitner wrote:
 Public bug reported:
 
 hxtool (part of the early build process) is a bash script.  Running it
 with /bin/sh yields a syntax error on line 10:
 
  10 STEXI*|ETEXI*|SQMP*|EQMP*) flag=$(($flag^1))
 
 $(( expr )) is a bash extension, not part of /bin/sh.

Wrong.  $(( expr )) is mandated by POSIX.  What system are you on where
/bin/sh is not POSIX?  (Solaris is the only platform where /bin/sh does
not try to be POSIX-compliant, but who uses that for qemu?)

What is the actual syntax error you are seeing?  Is this a bug in dash
on your distribution?  I can't get dash to fail for me on Fedora:

$ dash -c 'f=1; f=$(($f^1)); echo $f'
0
$ dash -n scripts/hxtool; echo $?
0

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




[Qemu-devel] [Bug 1347555] Re: qemu build failure, hxtool is a bash script, not a /bin/sh script

2014-07-23 Thread Felix von Leitner

I actually have bash installed as /bin/sh and /bin/bash.
But I also have heirloom sh installed, which installs itself as /sbin/sh, and 
that happened to be first in my $PATH.

Since the makefiles use "sh script" to run the scripts, that called the
heirloom sh.

http://heirloom.sourceforge.net/sh.html

It is, it turns out, derived from OpenSolaris.  So there you go :-)

When I delete /sbin/sh, qemu builds.
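
A quick way to check for this situation, sketched under the assumption
that the only question is whether the first sh on $PATH handles
arithmetic expansion. The helper name supports_arith is made up for
illustration:

```shell
# Return success if the given shell evaluates POSIX arithmetic expansion.
# supports_arith is a hypothetical helper, not part of any build script.
supports_arith() {
    out=$("$1" -c 'f=1; f=$(($f^1)); echo $f' 2>/dev/null) || return 1
    [ "$out" = "0" ]
}

command -v sh                       # shows which sh $PATH resolves to
if supports_arith sh; then
    echo "sh handles arithmetic expansion"
else
    echo "sh lacks arithmetic expansion"
fi
```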

Re: [Qemu-devel] [Bug 1347555] Re: qemu build failure, hxtool is a bash script, not a /bin/sh script

2014-07-23 Thread Eric Blake

On 07/23/2014 10:13 AM, Felix von Leitner wrote:
 I actually have bash installed as /bin/sh and /bin/bash.
 But I also have heirloom sh installed, which installs itself as /sbin/sh, and 
 that happened to be first in my $PATH.
 
 Since the makefiles use sh script to run the scripts, that called the
 heirloom sh.
 
 http://heirloom.sourceforge.net/sh.html
 
 It is, it turns out, derived from OpenSolaris.  So there you go :-)
 
 When I delete /sbin/sh, qemu builds.

Then the bug is not in qemu, but in your environment.  Installing
known-broken heirloom where it can be found first on a PATH search for
sh is just asking for problems, not just with qemu, but with all SORTS
of programs that expect POSIX semantics from a Linux /bin/sh.

Rather than change the Makefile to invoke the script with bash, we could
instead bend over backwards to rewrite the script in a way that works
with non-POSIX shells (as in, flag=`expr $flag ^ 1`), but that feels
backwards to me.  Until someone is actively worried about porting qemu
to a true Solaris environment, rather than just an heirloom-as-/bin/sh
Linux environment, I don't think it's worth the effort.
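
For illustration only, the two styles of toggling a 0/1 flag can be
sketched as below. The function names are hypothetical. Note that POSIX
expr has no XOR operator, so the expr variant here uses subtraction in
place of the ^ shown in the message; that substitution is an assumption
of this sketch and is not something hxtool does:

```shell
# Toggle a 0/1 flag two ways.

# POSIX arithmetic expansion, as used by hxtool.
toggle_arith() { echo $(( $1 ^ 1 )); }

# Pre-POSIX style via expr.  expr exits with status 1 when it prints 0,
# so "|| :" keeps the function's own exit status at success.
toggle_expr() { expr 1 - "$1" || :; }

toggle_arith 0      # prints 1
toggle_expr 1       # prints 0
```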

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




[Qemu-devel] [PATCH 0/2] pc: fix /etc/acpi/tables size in fw_cfg for -M pc-2.0

2014-07-23 Thread Paolo Bonzini

Changing the ACPI table size causes migration to break, and the memory
hotplug work opened our eyes on how horribly we were breaking things in
2.0 already.

Unfortunately when reviewing the design I assumed incorrectly that all
tables would be placed in separate fw_cfg files.  This would have been
better, because you can always move stuff to a new SSDT (and thus a new
file), keeping the sizes under control.

Hard-code 64k as the maximum ACPI table size; for -M pc-i440fx-2.0
and -M pc-i440fx-1.7 compute the payload size of QEMU 2.0 and always
use that one.  This works always for QEMU 2.0, and also for 1.7
except for a few values of -smp maxcpus.

The first patch is needed to shrink the ACPI tables and make them
smaller than they used to be in 2.0.

Please test and ack.  I'll do more testing tomorrow.

Paolo


Paolo Bonzini (2):
  acpi-dsdt: procedurally generate _PRT
  pc: hack for migration compatibility from QEMU 2.0

 hw/i386/acpi-build.c  | 61 +++---
 hw/i386/acpi-dsdt.dsl | 90 ++-
 hw/i386/pc_piix.c | 20 
 hw/i386/pc_q35.c  |  5 +++
 include/hw/i386/pc.h  |  1 +
 5 files changed, 122 insertions(+), 55 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH 1/2] acpi-dsdt: procedurally generate _PRT

2014-07-23 Thread Paolo Bonzini

This replaces the _PRT constant with a method that computes it.

The problem is that the DSDT+SSDT have grown from 2.0 to 2.1,
enough to cross the 8k barrier (we align the ACPI tables to 4k
before putting them in fw_cfg).  This causes problems with
migration and the pc-2.0 machine type.

The solution to the problem is to hardcode 64k as the limit,
but this doesn't solve the bug with pc-2.0.  The fix will be
for QEMU 2.1 to use exactly the same size as QEMU 2.0 for the
ACPI tables.  First, however, we must make the actual AML size
equal or smaller; to do this, rewrite _PRT in a way that saves
over 1k of bytecode.
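
The routing rule that the new method encodes can be sketched outside of
ASL as follows. This is only an illustration of the arithmetic in the
patch (four pins per slot, link chosen by (index + slot) & 3, with
LNKS for the power-management device); prt_link is a hypothetical name
and the output format is invented for readability:

```shell
# Print the _PRT entry for index 0..127 (four pins per slot).
# prt_link is a hypothetical helper mirroring the ASL loop in the patch.
prt_link() {
    idx=$1
    slot=$(( idx >> 2 ))
    pin=$(( idx & 3 ))
    case $(( (idx + slot) & 3 )) in
        0) lnk=LNKD ;;
        1) # device 1, pin 0 is the power-management device (SCI)
           if [ "$idx" -eq 4 ]; then lnk=LNKS; else lnk=LNKA; fi ;;
        2) lnk=LNKB ;;
        3) lnk=LNKC ;;
    esac
    # Address field: slot in the high word, 0xFFFF (any function) below.
    printf '0x%04xffff %d %s\n' "$slot" "$pin" "$lnk"
}

prt_link 0      # slot 0, pin 0
prt_link 4      # slot 1, pin 0: the LNKS special case
```

Running it for indices 0 to 7 gives LNKD, LNKA, LNKB, LNKC for slot 0 and
LNKS, LNKB, LNKC, LNKD for slot 1, which matches the first two rows of
the constant table being removed.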

Tested on Windows XP.  Q35 already uses a method for _PRT
so most guests should be okay.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/i386/acpi-dsdt.dsl | 90 ++-
 1 file changed, 39 insertions(+), 51 deletions(-)

diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
index 3cc0ea0..6ba0170 100644
--- a/hw/i386/acpi-dsdt.dsl
+++ b/hw/i386/acpi-dsdt.dsl
@@ -181,57 +181,45 @@ DefinitionBlock (
 
 Scope(\_SB) {
 Scope(PCI0) {
-Name(_PRT, Package() {
-/* PCI IRQ routing table, example from ACPI 2.0a specification,
-   section 6.2.8.1 */
-/* Note: we provide the same info as the PCI routing
-   table of the Bochs BIOS */
-
-#define prt_slot(nr, lnk0, lnk1, lnk2, lnk3) \
-Package() { nr##ffff, 0, lnk0, 0 }, \
-Package() { nr##ffff, 1, lnk1, 0 }, \
-Package() { nr##ffff, 2, lnk2, 0 }, \
-Package() { nr##ffff, 3, lnk3, 0 }
-
-#define prt_slot0(nr) prt_slot(nr, LNKD, LNKA, LNKB, LNKC)
-#define prt_slot1(nr) prt_slot(nr, LNKA, LNKB, LNKC, LNKD)
-#define prt_slot2(nr) prt_slot(nr, LNKB, LNKC, LNKD, LNKA)
-#define prt_slot3(nr) prt_slot(nr, LNKC, LNKD, LNKA, LNKB)
-
-prt_slot0(0x0000),
-/* Device 1 is power mgmt device, and can only use irq 9 */
-prt_slot(0x0001, LNKS, LNKB, LNKC, LNKD),
-prt_slot2(0x0002),
-prt_slot3(0x0003),
-prt_slot0(0x0004),
-prt_slot1(0x0005),
-prt_slot2(0x0006),
-prt_slot3(0x0007),
-prt_slot0(0x0008),
-prt_slot1(0x0009),
-prt_slot2(0x000a),
-prt_slot3(0x000b),
-prt_slot0(0x000c),
-prt_slot1(0x000d),
-prt_slot2(0x000e),
-prt_slot3(0x000f),
-prt_slot0(0x0010),
-prt_slot1(0x0011),
-prt_slot2(0x0012),
-prt_slot3(0x0013),
-prt_slot0(0x0014),
-prt_slot1(0x0015),
-prt_slot2(0x0016),
-prt_slot3(0x0017),
-prt_slot0(0x0018),
-prt_slot1(0x0019),
-prt_slot2(0x001a),
-prt_slot3(0x001b),
-prt_slot0(0x001c),
-prt_slot1(0x001d),
-prt_slot2(0x001e),
-prt_slot3(0x001f),
-})
+Method (_PRT, 0) {
+Store(Package(128) {}, Local0)
+Store(Zero, Local1)
+While(LLess(Local1, 128)) {
+// slot = pin >> 2
+Store(ShiftRight(Local1, 2), Local2)
+
+// lnk = (slot + pin) & 3
+Store(And(Add(Local1, Local2), 3), Local3)
+If (LEqual(Local3, 0)) {
+Store(Package(4) { Zero, Zero, LNKD, Zero }, Local4)
+}
+If (LEqual(Local3, 1)) {
+// device 1 is the power-management device, needs SCI
+If (LEqual(Local1, 4)) {
+Store(Package(4) { Zero, Zero, LNKS, Zero }, Local4)
+} Else {
+Store(Package(4) { Zero, Zero, LNKA, Zero }, Local4)
+}
+}
+If (LEqual(Local3, 2)) {
+Store(Package(4) { Zero, Zero, LNKB, Zero }, Local4)
+}
+If (LEqual(Local3, 3)) {
+Store(Package(4) { Zero, Zero, LNKC, Zero }, Local4)
+}
+
+// Complete the interrupt routing entry:
+//    Package(4) { 0x[slot]FFFF, [pin], [link], 0) }
+
+Store(Or(ShiftLeft(Local2, 16), 0xFFFF), Index(Local4, 0))
+Store(And(Local1, 3), Index(Local4, 1))
+Store(Local4, Index(Local0, Local1))
+
+Increment(Local1)
+}
+
+Return(Local0)
+}
 }
 
 Field(PCI0.ISA.P40C, ByteAcc, NoLock, Preserve) {
-- 
1.8.3.1

[Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0

2014-07-23 Thread Paolo Bonzini

Changing the ACPI table size causes migration to break, and the memory
hotplug work opened our eyes on how horribly we were breaking things in
2.0 already.

The ACPI table size is rounded to the next 4k, which one would think
gives some headroom.  In practice this is not the case, because the user
can control the ACPI table size (each CPU adds 105 bytes) and so some
-smp values will break the 4k boundary and fail to migrate.  Similarly,
PCI bridges add ~1870 bytes to the SSDT.
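
A small sketch of why the 4k rounding gives less headroom than it seems.
The 105 bytes per CPU and the 4k alignment come from the text above; the
base size of 7000 bytes is a made-up number chosen only to show a
boundary crossing, and the function names are hypothetical:

```shell
# Round a size up to the next multiple of the alignment.
align_up() { echo $(( ($1 + $2 - 1) / $2 * $2 )); }

# Size exposed to the guest for a given CPU count.  base is hypothetical;
# 105 bytes per CPU and 4k alignment are the figures quoted above.
table_size() {
    base=$1
    cpus=$2
    align_up $(( $base + 105 * $cpus )) 4096
}

table_size 7000 10     # 7000 + 1050 = 8050, rounds to 8192
table_size 7000 11     # 7000 + 1155 = 8155, rounds to 8192
table_size 7000 12     # 7000 + 1260 = 8260, rounds to 12288
```

One extra CPU moves the rounded size from 8192 to 12288 in this example,
and a size change of that kind is what breaks migration.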

To fix this, hard-code 64k as the maximum ACPI table size, which
(despite being an order of magnitude smaller than 640k) should be enough
for everyone.

To fix migration from QEMU 2.0, compute the payload size of QEMU 2.0
and always use that one.  The previous patch shrunk the ACPI tables
enough that the QEMU 2.0 size should always be enough.

Non-AML tables can change depending on the configuration (especially
MADT, SRAT, HPET) but they remain the same between QEMU 2.0 and 2.1,
so we only compute our padding based on the sizes of the SSDT and DSDT.

Migration from QEMU 1.7 should work for guests that have a number of CPUs
other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140, and that have no
PCI bridges.  It was already broken from QEMU 1.7 to QEMU 2.0 in the
same way, though.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/i386/acpi-build.c | 61 
 hw/i386/pc_piix.c| 20 +
 hw/i386/pc_q35.c |  5 +
 include/hw/i386/pc.h |  1 +
 4 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index ebc5f03..7373d93 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -25,7 +25,9 @@
 #include <glib.h>
 #include "qemu-common.h"
 #include "qemu/bitmap.h"
+#include "qemu/osdep.h"
 #include "qemu/range.h"
+#include "qemu/error-report.h"
 #include "hw/pci/pci.h"
 #include "qom/cpu.h"
 #include "hw/i386/pc.h"
@@ -87,6 +89,8 @@ typedef struct AcpiBuildPciBusHotplugState {
 struct AcpiBuildPciBusHotplugState *parent;
 } AcpiBuildPciBusHotplugState;
 
+unsigned bsel_alloc;
+
 static void acpi_get_dsdt(AcpiMiscInfo *info)
 {
 uint16_t *applesmc_sta;
@@ -759,8 +763,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
 static void acpi_set_pci_info(void)
 {
 PCIBus *bus = find_i440fx(); /* TODO: Q35 support */
-unsigned bsel_alloc = 0;
 
+assert(bsel_alloc == 0);
 if (bus) {
 /* Scan all PCI buses. Set property to enable acpi based hotplug. */
 pci_for_each_bus_depth_first(bus, acpi_set_bsel, NULL, &bsel_alloc);
@@ -1440,13 +1444,14 @@ static
 void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
 {
 GArray *table_offsets;
-unsigned facs, dsdt, rsdt;
+unsigned facs, ssdt, dsdt, rsdt;
 AcpiCpuInfo cpu;
 AcpiPmInfo pm;
 AcpiMiscInfo misc;
 AcpiMcfgInfo mcfg;
 PcPciInfo pci;
 uint8_t *u;
+size_t aml_len = 0;
 
 acpi_get_cpu_info(&cpu);
 acpi_get_pm_info(&pm);
@@ -1474,13 +1479,20 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
 dsdt = tables->table_data->len;
 build_dsdt(tables->table_data, tables->linker, &misc);
 
+/* Count the size of the DSDT and SSDT, we will need it for legacy
+ * sizing of ACPI tables.
+ */
+aml_len += tables->table_data->len - dsdt;
+
 /* ACPI tables pointed to by RSDT */
 acpi_add_table(table_offsets, tables->table_data);
 build_fadt(tables->table_data, tables->linker, &pm, facs, dsdt);
 
+ssdt = tables->table_data->len;
 acpi_add_table(table_offsets, tables->table_data);
 build_ssdt(tables->table_data, tables->linker, &cpu, &pm, &misc, &pci,
 guest_info);
+aml_len += tables->table_data->len - ssdt;
 
 acpi_add_table(table_offsets, tables->table_data);
 build_madt(tables->table_data, tables->linker, &cpu, guest_info);
@@ -1513,12 +1525,53 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
 /* RSDP is in FSEG memory, so allocate it separately */
 build_rsdp(tables->rsdp, tables->linker, rsdt);
 
-/* We'll expose it all to Guest so align size to reduce
+/* We'll expose it all to Guest so we want to reduce
  * chance of size changes.
  * RSDP is small so it's easy to keep it immutable, no need to
  * bother with alignment.
+ *
+ * We used to align the tables to 4k, but of course this would
+ * too simple to be enough.  4k turned out to be too small an
+ * alignment very soon, and in fact it is almost impossible to
+ * keep the table size stable for all (max_cpus, max_memory_slots)
+ * combinations.  So the table size is always 64k for pc-2.1 and
+ * we give an error if the table grows beyond that limit.
+ *
+ * We still have the problem of migrating from -M pc-2.0.  For that,
+ * we exploit the fact that QEMU 2.1 generates _smaller_ tables than 2.0
+ * and we can always pad the smaller tables with zeros.  We can then use
+ * the exact size of

Re: [Qemu-devel] [PATCH] target-i386/FPU: wrong conversion infinity from float80 to int32/int64

2014-07-23 Thread Peter Maydell

On 23 July 2014 16:04, Dmitry Poletaev poletaev-q...@yandex.ru wrote:
 I'm understood. So, am I right?

Pretty much, except it's better to use the accessor functions
get_float_exception_flags() and set_float_exception_flags().

 +if (env->fp_status.float_exception_flags & FPUS_IE) {
 +val = 0x8000000000000000;

Also this constant needs a ULL suffix or it won't build on
32 bit hosts.

thanks
-- PMM

Re: [Qemu-devel] [Bug 1347555] Re: qemu build failure, hxtool is a bash script, not a /bin/sh script

2014-07-23 Thread Peter Maydell

On 23 July 2014 17:31, Eric Blake ebl...@redhat.com wrote:
 Rather than change the Makefile to invoke the script with bash, we could
 instead bend over backwards to rewrite the script in a way that works
 with non-POSIX shells (as in, flag=`expr $flag ^ 1`), but that feels
 backwards to me.  Until someone is actively worried about porting qemu
 to a true Solaris environment, rather than just an heirloom-as-/bin/sh
 Linux environment, I don't think it's worth the effort.

My view on this has always been we shouldn't assume bash,
but we can assume POSIX shell semantics. (And also that
we should assume /bin/sh is a POSIX shell, because it's the
21st century, and Solaris should just get with it :-))

thanks
-- PMM

[Qemu-devel] [ANNOUNCE] QEMU 1.7.2 Stable released

2014-07-23 Thread Michael Roth

Hi everyone,

I am pleased to announce that the QEMU v1.7.2 stable release is now
available at:

  http://wiki.qemu.org/download/qemu-1.7.2.tar.bz2

v1.7.2 is now tagged in the official qemu.git repository,
and the stable-1.7 branch has been updated accordingly:

  http://git.qemu.org/?p=qemu.git;a=shortlog;h=refs/heads/stable-1.7

This release contains 155 build/bug fixes, including important security
updates relating to untrusted guest image files and migration/savevm
sources. See the changelog below for relevant CVEs and additional
details.

Thank you to everyone involved!

CHANGELOG:

adba377: Update VERSION for 1.7.2 release (Michael Roth)
   
8fde73e: Allow mismatched virtio config-len (Dr. David Alan Gilbert)
14d9fb0: pci: assign devfn to pci_dev before calling pci_device_iommu_address_space() (Le Tan)
53e4895: hw: Fix qemu_allocate_irqs() leaks (Andreas Färber)
bb485bf: sdhci: Fix misuse of qemu_free_irqs() (Andreas Färber)
02835d5: vnc: Fix tight_detect_smooth_image() for lossless case (Markus Armbruster)
41ee918: qapi: zero-initialize all QMP command parameters (Michael Roth)
0c60b74: nbd: Shutdown socket before closing. (Hani Benhabiles)
25351f6: nbd: Close socket on negotiation failure. (Hani Benhabiles)
cf392d2: nbd: Don't validate from and len in NBD_CMD_DISC. (Hani Benhabiles)
3c3d8c6: nbd: Don't export a block device with no medium. (Hani Benhabiles)
62c754e: virtio-serial: don't migrate the config space (Alexander Graf)
0fd14a5: virtio-net: byteswap virtio-net header (Cédric Le Goater)
7a3cd5a: target-i386: Filter FEAT_7_0_EBX TCG features too (Eduardo Habkost)
8a93721: coroutine-win32.c: Add noinline attribute to work around gcc bug (Peter Maydell)
b47506f: KVM: Fix GSI number space limit (Alexander Graf)
f0c609d: usb: Fix usb-bt-dongle initialization. (Hani Benhabiles)
79bd778: vhost: fix resource leak in error handling (Michael S. Tsirkin)
36afdba: scsi-disk: fix bug in scsi_block_new_request() introduced by commit 137745c (Ulrich Obergfell)
63bf1e0: rdma: bug fixes (Michael R. Hines)
23dbc56: qga: Fix handle fd leak in acquire_privilege() (Gonglei)
4041945: aio: fix qemu_bh_schedule() bh-ctx race condition (Stefan Hajnoczi)
5019106: s390x/css: handle emw correctly for tsch (Cornelia Huck)
f784615: target-arm: Fix errors in writes to generic timer control registers (Peter Maydell)
e34feec: tcg-i386: Fix win64 qemu store (Richard Henderson)
ccb08f5: linux-user: Don't overrun guest buffer in sched_getaffinity (Peter Maydell)
cb34d1e: qemu-img: Plug memory leak in convert command (Markus Armbruster)
df9c108: block/sheepdog: Plug memory leak in sd_snapshot_create() (Markus Armbruster)
d3cd48a: block/vvfat: Plug memory leak in read_directory() (Markus Armbruster)
501da93: block/vvfat: Plug memory leak in check_directory_consistency() (Markus Armbruster)
7267e51: block/qapi: Plug memory leak in dump_qobject() case QTYPE_QERROR (Markus Armbruster)
d1775fe: blockdev: Plug memory leak in drive_init() (Markus Armbruster)
d2b9874: blockdev: Plug memory leak in blockdev_init() (Markus Armbruster)
c2fb0f2: cputlb: Fix regression with TCG interpreter (bug 1310324) (Stefan Weil)
26b5102: target-xtensa: fix cross-page jumps/calls at the end of TB (Max Filippov)
44564f8: virtio-scsi: Plug memory leak on virtio_scsi_push_event() error path (Markus Armbruster)
2f1eb04: qcow1: Stricter backing file length check (Kevin Wolf)
b53d866: qcow1: Validate image size (CVE-2014-0223) (Kevin Wolf)
8b17eb6: qcow1: Validate L2 table size (CVE-2014-0222) (Kevin Wolf)
e6c55cf: qcow1: Check maximum cluster size (Kevin Wolf)
41819e9: qcow1: Make padding in the header explicit (Kevin Wolf)
97a0e27: parallels: Sanity check for s-tracks (CVE-2014-0142) (Kevin Wolf)
750336b: parallels: Fix catalog size integer overflow (CVE-2014-0143) (Kevin Wolf)
cfa8008: qcow2: Check maximum L1 size in qcow2_snapshot_load_tmp() (CVE-2014-0143) (Kevin Wolf)
d99c4e2: qcow2: Fix L1 allocation size in qcow2_snapshot_load_tmp() (CVE-2014-0145) (Kevin Wolf)
641c3ec: qcow2: Fix copy_sectors() with VM state (Kevin Wolf)
c2c5272: qcow2: Fix NULL dereference in qcow2_open() error path (CVE-2014-0146) (Kevin Wolf)
759d386: block: Limit request size (CVE-2014-0143) (Kevin Wolf)
b6f7fbd: dmg: prevent chunk buffer overflow (CVE-2014-0145) (Stefan Hajnoczi)
d400b5d: dmg: use uint64_t consistently for sectors and lengths (Stefan Hajnoczi)
758c484: dmg: sanitize chunk length and sectorcount (CVE-2014-0145) (Stefan Hajnoczi)
4b50bd7: dmg: use appropriate types when reading chunks (Stefan Hajnoczi)
4ee5b9c: dmg: drop broken bdrv_pread() loop (Stefan Hajnoczi)
ad08cae: dmg: prevent out-of-bounds array access on terminator (Stefan Hajnoczi)
dedf4a5: dmg: coding style and indentation cleanup (Stefan Hajnoczi)
3c6347c: qcow2: Fix new L1 table size check (CVE-2014-0143) (Kevin Wolf)
e1c8770: qcow2: Protect against some integer overflows in bdrv_check (Kevin Wolf)
c874837: qcow2: Fix types in

Re: [Qemu-devel] [RFC PATCH 07/17] COLO buffer: implement colo buffer as well as QEMUFileOps based on it

2014-07-23 Thread Eric Blake

On 07/23/2014 08:25 AM, Yang Hongyang wrote:
 We need a buffer to store migration data.
 
 On save side:
   all saved data was write into colo buffer first, so that we can know

s/was write/is written/

 the total size of the migration data. this can also separate the data
 transmission from colo control data, we use colo control data over
 socket fd to synchronous both side's stat.
 
 On restore side:
   all migration data was read into colo buffer first, then load data
 from the buffer: If network error happens while data transmission,

s/while/during/

 the slaver can still functinal because the migration data are not yet

s/slaver/slave/
s/functinal/function/
s/are/is/

 loaded.
 
 Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
 ---
  migration-colo.c | 112 
 +++
  1 file changed, 112 insertions(+)
 

 +/* colo buffer */
 +
 +#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
 +#define COLO_BUFFER_MAX_SIZE (1000*1000*1000*10ULL)

Spaces around binary operators.

 +
 +typedef struct colo_buffer {

For consistency with the rest of the code base, name this ColoBuffer,
not colo_buffer.

 +uint8_t *data;
 +uint64_t used;
 +uint64_t freed;
 +uint64_t size;
 +} colo_buffer_t;

HACKING says to NOT name types with a trailing _t.  Just name the
typedef ColoBuffer.


 +static void colo_buffer_destroy(void)
 +{
 +if (colo_buffer.data) {
 +g_free(colo_buffer.data);
 +colo_buffer.data = NULL;

g_free(NULL) behaves sanely, just make these two lines unconditional.


 +static void colo_buffer_extend(uint64_t len)
 +{
 +if (len > colo_buffer.size - colo_buffer.used) {
 +len = len + colo_buffer.used - colo_buffer.size;
 +len = ROUND_UP(len, COLO_BUFFER_BASE_SIZE) + COLO_BUFFER_BASE_SIZE;
 +
 +colo_buffer.size += len;
 +if (colo_buffer.size > COLO_BUFFER_MAX_SIZE) {
 +error_report("colo_buffer overflow!\n");

No trailing \n in error_report().

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH 1/2] acpi-dsdt: procedurally generate _PRT

2014-07-23 Thread Laszlo Ersek

On 07/23/14 18:37, Paolo Bonzini wrote:
 This replaces the _PRT constant with a method that computes it.
 
 The problem is that the DSDT+SSDT have grown from 2.0 to 2.1,
 enough to cross the 8k barrier (we align the ACPI tables to 4k
 before putting them in fw_cfg).  This causes problems with
 migration and the pc-2.0 machine type.
 
 The solution to the problem is to hardcode 64k as the limit,
 but this doesn't solve the bug with pc-2.0.  The fix will be
 for QEMU 2.1 to use exactly the same size as QEMU 2.0 for the
 ACPI tables.  First, however, we must make the actual AML size
 equal or smaller; to do this, rewrite _PRT in a way that saves
 over 1k of bytecode.
 
 Tested on Windows XP.  Q35 already uses a method for _PRT
 so most guests should be okay.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  hw/i386/acpi-dsdt.dsl | 90 
 ++-
  1 file changed, 39 insertions(+), 51 deletions(-)
 
 diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
 index 3cc0ea0..6ba0170 100644
 --- a/hw/i386/acpi-dsdt.dsl
 +++ b/hw/i386/acpi-dsdt.dsl
 @@ -181,57 +181,45 @@ DefinitionBlock (
  
  Scope(\_SB) {
  Scope(PCI0) {
 -Name(_PRT, Package() {
 -/* PCI IRQ routing table, example from ACPI 2.0a 
 specification,
 -   section 6.2.8.1 */
 -/* Note: we provide the same info as the PCI routing
 -   table of the Bochs BIOS */
 -
 -#define prt_slot(nr, lnk0, lnk1, lnk2, lnk3) \
 -Package() { nr##ffff, 0, lnk0, 0 }, \
 -Package() { nr##ffff, 1, lnk1, 0 }, \
 -Package() { nr##ffff, 2, lnk2, 0 }, \
 -Package() { nr##ffff, 3, lnk3, 0 }
 -
 -#define prt_slot0(nr) prt_slot(nr, LNKD, LNKA, LNKB, LNKC)
 -#define prt_slot1(nr) prt_slot(nr, LNKA, LNKB, LNKC, LNKD)
 -#define prt_slot2(nr) prt_slot(nr, LNKB, LNKC, LNKD, LNKA)
 -#define prt_slot3(nr) prt_slot(nr, LNKC, LNKD, LNKA, LNKB)
 -
 -prt_slot0(0x0000),
 -/* Device 1 is power mgmt device, and can only use irq 9 */
 -prt_slot(0x0001, LNKS, LNKB, LNKC, LNKD),
 -prt_slot2(0x0002),
 -prt_slot3(0x0003),
 -prt_slot0(0x0004),
 -prt_slot1(0x0005),
 -prt_slot2(0x0006),
 -prt_slot3(0x0007),
 -prt_slot0(0x0008),
 -prt_slot1(0x0009),
 -prt_slot2(0x000a),
 -prt_slot3(0x000b),
 -prt_slot0(0x000c),
 -prt_slot1(0x000d),
 -prt_slot2(0x000e),
 -prt_slot3(0x000f),
 -prt_slot0(0x0010),
 -prt_slot1(0x0011),
 -prt_slot2(0x0012),
 -prt_slot3(0x0013),
 -prt_slot0(0x0014),
 -prt_slot1(0x0015),
 -prt_slot2(0x0016),
 -prt_slot3(0x0017),
 -prt_slot0(0x0018),
 -prt_slot1(0x0019),
 -prt_slot2(0x001a),
 -prt_slot3(0x001b),
 -prt_slot0(0x001c),
 -prt_slot1(0x001d),
 -prt_slot2(0x001e),
 -prt_slot3(0x001f),
 -})
 +Method (_PRT, 0) {
 +Store(Package(128) {}, Local0)
 +Store(Zero, Local1)
 +While(LLess(Local1, 128)) {
 +// slot = pin >> 2
 +Store(ShiftRight(Local1, 2), Local2)
 +
 +// lnk = (slot + pin) & 3
 +Store(And(Add(Local1, Local2), 3), Local3)
 +If (LEqual(Local3, 0)) {
 +Store(Package(4) { Zero, Zero, LNKD, Zero }, Local4)
 +}
 +If (LEqual(Local3, 1)) {
 +// device 1 is the power-management device, needs SCI
 +If (LEqual(Local1, 4)) {
 +Store(Package(4) { Zero, Zero, LNKS, Zero }, Local4)
 +} Else {
 +Store(Package(4) { Zero, Zero, LNKA, Zero }, Local4)
 +}
 +}
 +If (LEqual(Local3, 2)) {
 +Store(Package(4) { Zero, Zero, LNKB, Zero }, Local4)
 +}
 +If (LEqual(Local3, 3)) {
 +Store(Package(4) { Zero, Zero, LNKC, Zero }, Local4)
 +}
 +
 +// Complete the interrupt routing entry:
 +//Package(4) { 0x[slot]FFFF, [pin], [link], 0) }
 +
 +Store(Or(ShiftLeft(Local2, 16), 0xFFFF), Index(Local4, 0))
 +Store(And(Local1, 3), Index(Local4, 1))
 +Store(Local4, Index(Local0, Local1))
 +
 +Increment(Local1)
 +}
 +
 +
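
The index arithmetic of the new _PRT method can be checked outside of ASL. What follows is a minimal C sketch, not part of the patch: the enum and the function names are made up for illustration, and the 0xFFFF low word of the address is the usual ACPI "any function" encoding, assumed here because the archived text shows the constant truncated. It recomputes the link for each of the 128 (slot, pin) entries the way the While loop does, so the result can be compared with the old prt_slot0..prt_slot3 rotation.

```c
#include <stdint.h>

/* Hypothetical names for the PCI interrupt link devices. */
enum prt_link { LNKA, LNKB, LNKC, LNKD, LNKS };

/* Mirror of the loop body: i runs from 0 to 127 (32 slots x 4 pins). */
static enum prt_link prt_link_for(unsigned i)
{
    unsigned slot = i >> 2;          /* Local2 = Local1 >> 2 */
    unsigned lnk = (i + slot) & 3;   /* Local3 = (Local1 + Local2) & 3 */

    switch (lnk) {
    case 0:
        return LNKD;
    case 1:
        /* Entry 4 is slot 1, pin 0: the power-management device. */
        return i == 4 ? LNKS : LNKA;
    case 2:
        return LNKB;
    default:
        return LNKC;
    }
}

/* First package element: slot in the high word, 0xFFFF (assumed) below. */
static uint32_t prt_address_for(unsigned i)
{
    return ((uint32_t)(i >> 2) << 16) | 0xFFFF;
}
```

Because 4 * slot is a multiple of 4, (i + slot) & 3 equals (slot + pin) & 3, which is why the loop can add the raw index instead of extracting the pin first.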

Re: [Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0

2014-07-23 Thread Laszlo Ersek

On 07/23/14 18:37, Paolo Bonzini wrote:
 Changing the ACPI table size causes migration to break, and the memory
 hotplug work opened our eyes on how horribly we were breaking things in
 2.0 already.
 
 The ACPI table size is rounded to the next 4k, which one would think
 gives some headroom.  In practice this is not the case, because the user
 can control the ACPI table size (each CPU adds 105 bytes) and so some
 -smp values will break the 4k boundary and fail to migrate.  Similarly,
 PCI bridges add ~1870 bytes to the SSDT.
 
 To fix this, hard-code 64k as the maximum ACPI table size, which
 (despite being an order of magnitude smaller than 640k) should be enough
 for everyone.
 
 To fix migration from QEMU 2.0, compute the payload size of QEMU 2.0
 and always use that one.  The previous patch shrunk the ACPI tables
 enough that the QEMU 2.0 size should always be enough.
 
 Non-AML tables can change depending on the configuration (especially
 MADT, SRAT, HPET) but they remain the same between QEMU 2.0 and 2.1,
 so we only compute our padding based on the sizes of the SSDT and DSDT.
 
 Migration from QEMU 1.7 should work for guests that have a number of CPUs
 other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140, and that have no
 PCI bridges.  It was already broken from QEMU 1.7 to QEMU 2.0 in the
 same way, though.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  hw/i386/acpi-build.c | 61 
 
  hw/i386/pc_piix.c| 20 +
  hw/i386/pc_q35.c |  5 +
  include/hw/i386/pc.h |  1 +
  4 files changed, 83 insertions(+), 4 deletions(-)
 
 diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
 index ebc5f03..7373d93 100644
 --- a/hw/i386/acpi-build.c
 +++ b/hw/i386/acpi-build.c
 @@ -25,7 +25,9 @@
  #include glib.h
  #include qemu-common.h
  #include qemu/bitmap.h
 +#include qemu/osdep.h
  #include qemu/range.h
 +#include qemu/error-report.h
  #include hw/pci/pci.h
  #include qom/cpu.h
  #include hw/i386/pc.h
 @@ -87,6 +89,8 @@ typedef struct AcpiBuildPciBusHotplugState {
  struct AcpiBuildPciBusHotplugState *parent;
  } AcpiBuildPciBusHotplugState;
  
 +unsigned bsel_alloc;
 +
  static void acpi_get_dsdt(AcpiMiscInfo *info)
  {
  uint16_t *applesmc_sta;
 @@ -759,8 +763,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
  static void acpi_set_pci_info(void)
  {
  PCIBus *bus = find_i440fx(); /* TODO: Q35 support */
 -unsigned bsel_alloc = 0;
  
 +assert(bsel_alloc == 0);
  if (bus) {
  /* Scan all PCI buses. Set property to enable acpi based hotplug. */
  pci_for_each_bus_depth_first(bus, acpi_set_bsel, NULL, bsel_alloc);
 @@ -1440,13 +1444,14 @@ static
  void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
  {
  GArray *table_offsets;
 -unsigned facs, dsdt, rsdt;
 +unsigned facs, ssdt, dsdt, rsdt;
  AcpiCpuInfo cpu;
  AcpiPmInfo pm;
  AcpiMiscInfo misc;
  AcpiMcfgInfo mcfg;
  PcPciInfo pci;
  uint8_t *u;
 +size_t aml_len = 0;
  
  acpi_get_cpu_info(cpu);
  acpi_get_pm_info(pm);
 @@ -1474,13 +1479,20 @@ void acpi_build(PcGuestInfo *guest_info, 
 AcpiBuildTables *tables)
  dsdt = tables-table_data-len;
  build_dsdt(tables-table_data, tables-linker, misc);
  
 +/* Count the size of the DSDT and SSDT, we will need it for legacy
 + * sizing of ACPI tables.
 + */
 +aml_len += tables-table_data-len - dsdt;
 +
  /* ACPI tables pointed to by RSDT */
  acpi_add_table(table_offsets, tables-table_data);
  build_fadt(tables-table_data, tables-linker, pm, facs, dsdt);
  
 +ssdt = tables-table_data-len;
  acpi_add_table(table_offsets, tables-table_data);
  build_ssdt(tables-table_data, tables-linker, cpu, pm, misc, pci,
 guest_info);
 +aml_len += tables-table_data-len - ssdt;
  
  acpi_add_table(table_offsets, tables-table_data);
  build_madt(tables-table_data, tables-linker, cpu, guest_info);
 @@ -1513,12 +1525,53 @@ void acpi_build(PcGuestInfo *guest_info, 
 AcpiBuildTables *tables)
  /* RSDP is in FSEG memory, so allocate it separately */
  build_rsdp(tables-rsdp, tables-linker, rsdt);
  
 -/* We'll expose it all to Guest so align size to reduce
 +/* We'll expose it all to Guest so we want to reduce
   * chance of size changes.
   * RSDP is small so it's easy to keep it immutable, no need to
   * bother with alignment.
 + *
 + * We used to align the tables to 4k, but of course this would
 + * too simple to be enough.  4k turned out to be too small an
 + * alignment very soon, and in fact it is almost impossible to
 + * keep the table size stable for all (max_cpus, max_memory_slots)
 + * combinations.  So the table size is always 64k for pc-2.1 and
 + * we give an error if the table grows beyond that limit.
 + *
 + * We still have the problem of migrating from -M pc-2.0.  For that,
 + * we
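
The two sizing policies described in the commit message can be sketched in a few lines of C. This is an illustration only, not the code from the patch: the function and macro names are hypothetical, and only the 4k alignment, the 64k cap and the 105 bytes per CPU come from the mail.

```c
#include <stddef.h>

#define ACPI_LEGACY_ALIGN 4096u          /* old policy: round up to 4k */
#define ACPI_FIXED_SIZE   (64u * 1024u)  /* new policy: always 64k */

/* Round len up to the next multiple of align (align is a power of two). */
static size_t acpi_align_up(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/* Old behaviour: the blob size moves when the payload crosses a 4k boundary. */
static size_t acpi_blob_size_legacy(size_t payload)
{
    return acpi_align_up(payload, ACPI_LEGACY_ALIGN);
}

/* New behaviour: constant size, or 0 to tell the caller the tables are too big. */
static size_t acpi_blob_size_fixed(size_t payload)
{
    return payload <= ACPI_FIXED_SIZE ? ACPI_FIXED_SIZE : 0;
}
```

With a made-up payload of 8150 bytes, adding one CPU (105 bytes) moves the legacy size from 8192 to 12288, while the fixed policy keeps returning 65536 on both sides of the boundary.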

[Qemu-devel] [PATCH] arm64: 64K pages and 1024MB guest

2014-07-23 Thread Joel Schopp

kvm_set_phys_mem doesn't work on arm64 with memory > 1GB.  It exits with:
kvm_set_phys_mem: error registering slot: Invalid argument

An example of a failing address and size is start_addr == 0x90011000
and size == 0xaffef000.  As you can see, both of these are 4K aligned, not
64K aligned.

At 1024MB or smaller qemu only makes one call to kvm_set_user_memory_region,
so the start_addr and size are aligned by accident and the bug doesn't happen.

The following patch makes things work for me on an arm64 SOC.  I also smoke
tested the patch on an x86-64 box and qemu seemed to still run fine there
with the patch applied.

Cc: Peter Maydell peter.mayd...@linaro.org
Signed-off-by: Joel Schopp joel.sch...@amd.com
---
 kvm-all.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 1402f4f..1975862 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -618,14 +618,14 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, bool add)
 
     /* kvm works in page size chunks, but the function may be called
        with sub-page size and unaligned start address. */
-    delta = TARGET_PAGE_ALIGN(size) - size;
+    delta = HOST_PAGE_ALIGN(start_addr) - start_addr;
     if (delta > size) {
         return;
     }
     start_addr += delta;
     size -= delta;
-    size &= TARGET_PAGE_MASK;
-    if (!size || (start_addr & ~TARGET_PAGE_MASK)) {
+    size &= qemu_host_page_mask;
+    if (!size || (start_addr & ~qemu_host_page_mask)) {
         return;
     }
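
To see what the changed lines do with the failing values quoted above, here is a small standalone C sketch. It is not QEMU code: the helper name is invented, and the 64K host page size is hard-coded as an assumption for the arm64 case, where QEMU would use HOST_PAGE_ALIGN and qemu_host_page_mask instead.

```c
#include <stdint.h>

/* Assumed host page size for the example: 64K pages on arm64. */
#define HOST_PAGE_SIZE 0x10000ull
#define HOST_PAGE_MASK (~(HOST_PAGE_SIZE - 1))

/*
 * Trim [*start, *start + *size) to whole host pages.
 * Returns 0 on success, -1 if no full page remains.
 */
static int trim_to_host_pages(uint64_t *start, uint64_t *size)
{
    uint64_t aligned = (*start + HOST_PAGE_SIZE - 1) & HOST_PAGE_MASK;
    uint64_t delta = aligned - *start;

    if (delta > *size) {
        return -1;
    }
    *start += delta;          /* round the start address up */
    *size -= delta;           /* give up the bytes that were skipped */
    *size &= HOST_PAGE_MASK;  /* round the length down */
    return *size ? 0 : -1;
}
```

For start_addr == 0x90011000 and size == 0xaffef000 this yields a start of 0x90020000 and a size of 0xaffe0000, both 64K aligned.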

Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation

2014-07-23 Thread Stefan Weil

On 22.07.2014 17:47, Le Tan wrote:
 Add support for emulating Intel IOMMU according to the VT-d specification for
 the q35 chipset machine. Implement the logic for DMAR (DMA remapping) without
 PASID support. Use register-based invalidation for context-cache invalidation
 and IOTLB invalidation.
 Basic fault reporting and caching are not implemented yet.

 Signed-off-by: Le Tan tamlokv...@gmail.com
 ---
  hw/i386/Makefile.objs |1 +
  hw/i386/intel_iommu.c | 1139 
 +
  include/hw/i386/intel_iommu.h |  350 +
  3 files changed, 1490 insertions(+)
  create mode 100644 hw/i386/intel_iommu.c
  create mode 100644 include/hw/i386/intel_iommu.h

[...]
 diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
 new file mode 100644
 index 000..3ba0e1e
 --- /dev/null
 +++ b/hw/i386/intel_iommu.c
 @@ -0,0 +1,1139 @@
 +/*
 + * QEMU emulation of an Intel IOMMU (VT-d)
 + *   (DMA Remapping device)
 + *
 + * Copyright (c) 2013 Knut Omang, Oracle knut.om...@oracle.com
 + * Copyright (C) 2014 Le Tan, tamlokv...@gmail.com
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License as published by
 + * the Free Software Foundation; either version 2 of the License, or
 + * (at your option) any later version.
 +
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 +
 + * You should have received a copy of the GNU General Public License
 + * along with this program; if not, write to the Free Software
 + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 + */
 +

I suggest replacing the FSF address here (and in other files) with the URL:

 * You should have received a copy of the GNU General Public License along
 * with this program; if not, see http://www.gnu.org/licenses/.

This is the standard used for most GPL text in QEMU source files.

Regards
Stefan W.

Re: [Qemu-devel] [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-07-23 Thread Konrad Rzeszutek Wilk

On Sat, Jul 19, 2014 at 12:27:21AM +, Kay, Allen M wrote:
  For the MCH PCI registers that do need to be read - can you tell us which 
  ones those are?
 
 In qemu/hw/xen_pt_igd.c/igd_pci_read(), following MCH PCI config register 
 reads are passthrough to the host HW.   Some of the registers are needed by 
 Ironlake GFX driver which we probably can remove.  I did a trace recently on 
 Broadwell,  the number of register accessed are even smaller (0, 2, 2c, 2e, 
 50, 52, a0, a4).  Given that we now have integrated MCH and GPU in the same 
 socket, looks like driver can easily remove reads for offsets 0 - 0x2e.
 
   case 0x00:/* vendor id */
   case 0x02:/* device id */
   case 0x08:/* revision id */
   case 0x2c:/* sybsystem vendor id */
   case 0x2e:/* sybsystem id */

Right. We can fix the i915 to use the mechanism that Michael mentioned.
(see attached RFC patches)

   case 0x50:/* SNB: processor graphics control register */
   case 0x52:/* processor graphics control register */
   case 0xa0:/* top of memory */
   case 0xb0:/* ILK: BSM: should read from dev 2 offset 
 0x5c */
   case 0x58:/* SNB: PAVPC Offset */
   case 0xa4:/* SNB: graphics base of stolen memory */
   case 0xa8:/* SNB: base of GTT stolen memory */

I dug into intel-gtt.c (see ironlake_gtt_driver) to figure this out
a bit more. That is, I speculated: what if we returned 0 (and did not
implement any support for reading from these registers)? What would happen?

0x52 for Ironlake (g5):
--
It looks like intel_gmch_probe is called when i915_gem_gtt_init
starts (there is a lot of code that looks to be used between
intel-gtt.c and i915.c).

Anyhow, the interesting part is that i9xx_setup ends up calling
ioremap on the i915 BAR to set up some of these registers for older generations.

Then i965_gtt_total_entries gets called, which reads 0x52, but that is only
needed for the v5 generation. For the others (v4 and G33) it reads the value
from the GPU's 0x2020 register.

If there is a mismatch, it writes to the GPU at 0x2020 to update
the size based on the bridge. It then reads from 0x2020, and that value
is returned and stored in intel_private.gtt_total_entries.

So 0x52 in the emulated bridge could be populated with what the
GPU has at 0x2020. And the writes go in the emulated area.

0x52 for v6 - v8:
-
We seem to go to gen6_gmch_probe, which just figures out
the GTT based on the GPU's BAR sizes. The stolen values
are read from 0x50 from the GPU. So there is no need to touch the bridge
(see gen6_gmch_probe).

OK, so no need to have 0x52 or 0x50 in the bridge.

0xA0:
-
Could not find any reference in the Linux code. Why would the
Windows driver need this? If we returned the _real_ TOM, would
it matter? Is it used to figure out whether the device should use 32-bit
DMA operations or 40-bit?

0xb0 or 0x5c:
-
No mention of them in the Linux code.

0x58, 0xa4, 0xa8:
-
No usage of them in the Linux code. We seem to be using the 0x52
from the bridge and 0x2020 from the GPU for this functionality.


So in theory*, if using Ironlake we need to have a proper value
in 0x52 in the bridge. But for the later chipsets we do not need
these values (I am assuming that intel_setup_mchbar can safely
return as it does that for Ironlake and could very well for later
generations).

Can this be reflected in the Windows driver as well?

P.S.
*theory: That is assuming we modify the Linux i915_drv.c:intel_detect_pch
to pick up the id as suggested earlier. See the RFC patches attached.
(Not compile tested at all!)
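
The per-register conclusions above can be written down as a small lookup, which may help when comparing them with the case list in igd_pci_read(). This is only a sketch of the analysis in this mail, under the assumption that the Linux driver is the only consumer; the function name and the generation parameter are hypothetical, and nothing here reflects what the Windows driver needs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical GPU generation tag; 5 stands for Ironlake. */
typedef unsigned gpu_gen_t;

/*
 * Does a Linux guest need a meaningful value at this MCH config offset,
 * according to the reading of intel-gtt.c and i915 summarized above?
 */
static bool mch_offset_needed_by_linux(gpu_gen_t gen, uint8_t offset)
{
    switch (offset) {
    case 0x52:
        /* i965_gtt_total_entries reads it only for generation 5. */
        return gen == 5;
    case 0x50:   /* gen6+ reads the stolen values from the GPU itself */
    case 0xa0:   /* no reference found in the Linux code */
    case 0xb0:
    case 0x58:
    case 0xa4:
    case 0xa8:
        return false;
    default:
        /* Offsets outside this analysis: make no claim, keep them. */
        return true;
    }
}
```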
 
 Allen
 
 -Original Message-
 From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com] 
 Sent: Friday, July 18, 2014 6:45 AM
 To: Kay, Allen M
 Cc: Michael S. Tsirkin; Jesse Barnes; peter.mayd...@linaro.org; 
 xen-de...@lists.xensource.com; Ross Philipson; airl...@linux.ie; 
 daniel.vet...@ffwll.ch; intel-...@lists.freedesktop.org; 
 kelly.zyta...@amd.com; qemu-devel@nongnu.org; Anthony Perard; Stefano 
 Stabellini; anth...@codemonkey.ws; Paolo Bonzini; Zhang, Yang Z; Chen, Tiejun
 Subject: Re: [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel 
 IGD passthrough support
 
 On Thu, Jul 17, 2014 at 05:37:12PM +, Kay, Allen M wrote:
   That sounds great. Tiejun could you confirm that with windows driver guys 
   too?
  
  I believe windows driver can also assume specific CPU/PCH combos.  I will 
  discuss this with native Windows driver guys.  Preferably, the same code 
  path can be used for both native and virtualization cases to avoid frequent 
  breakage as most developers and QA do not test new code changes in 
  virtualization environment.
  
  I have verified that Windows driver do not need to write to any MCH PCI 
  registers on HSW/BDW so the PCI write function can be

[Qemu-devel] [PATCH v3 1/5] block: allow bdrv_unref() to be passed NULL pointers

2014-07-23 Thread Jeff Cody

If bdrv_unref() is passed a NULL BDS pointer, it is safe to
exit with no operation.  This will allow cleanup code to blindly
call bdrv_unref() on a BDS that has been initialized to NULL.

Reviewed-by: Max Reitz mre...@redhat.com
Signed-off-by: Jeff Cody jc...@redhat.com
---
 block.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block.c b/block.c
index 23f366d..f79efc8 100644
--- a/block.c
+++ b/block.c
@@ -5385,6 +5385,9 @@ void bdrv_ref(BlockDriverState *bs)
  * deleted. */
 void bdrv_unref(BlockDriverState *bs)
 {
+    if (!bs) {
+        return;
+    }
     assert(bs->refcnt > 0);
     if (--bs->refcnt == 0) {
         bdrv_delete(bs);
-- 
1.9.3
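
The cleanup pattern this change enables can be shown with a toy refcounted object. The sketch below is standalone C and does not use the block layer: Obj, obj_new(), obj_unref() and do_work() are invented stand-ins, and live_objects exists only to make the effect observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object standing in for a BlockDriverState. */
typedef struct Obj {
    int refcnt;
} Obj;

static int live_objects;   /* for demonstration only */

static Obj *obj_new(void)
{
    Obj *o = calloc(1, sizeof(*o));
    if (o) {
        o->refcnt = 1;
        live_objects++;
    }
    return o;
}

/* Like the patched bdrv_unref(): a NULL pointer is a no-op. */
static void obj_unref(Obj *o)
{
    if (!o) {
        return;
    }
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        live_objects--;
        free(o);
    }
}

/* One exit label serves both the early-failure and the success path. */
static int do_work(int fail_early)
{
    Obj *o = NULL;
    int ret = -1;

    if (fail_early) {
        goto exit;          /* o is still NULL here */
    }
    o = obj_new();
    if (!o) {
        goto exit;
    }
    ret = 0;

exit:
    obj_unref(o);           /* safe in every case */
    return ret;
}
```

Without the NULL check, the early-failure path would need its own label or an explicit test before the unref call.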

[Qemu-devel] [PATCH v3 2/5] block: vdi - use block layer ops in vdi_create, instead of posix calls

2014-07-23 Thread Jeff Cody

Use the block layer to create, and write to, the image file in the
VDI .bdrv_create() operation.

This has a couple of benefits: Images can now be created over protocols,
and hacks such as NOCOW are not needed in the image format driver, and
the underlying file protocol appropriate for the host OS can be relied
upon.

Also some minor cleanup for error handling.

Reviewed-by: Max Reitz mre...@redhat.com
Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/vdi.c | 75 -
 1 file changed, 29 insertions(+), 46 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 197bd77..070acb6 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -53,13 +53,6 @@
 #include block/block_int.h
 #include qemu/module.h
 #include migration/migration.h
-#ifdef __linux__
-#include linux/fs.h
-#include sys/ioctl.h
-#ifndef FS_NOCOW_FL
-#define FS_NOCOW_FL 0x0080 /* Do not cow file */
-#endif
-#endif
 
 #if defined(CONFIG_UUID)
 #include uuid/uuid.h
@@ -681,7 +674,6 @@ static int vdi_co_write(BlockDriverState *bs,
 
 static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
 {
-int fd;
 int result = 0;
 uint64_t bytes = 0;
 uint32_t blocks;
@@ -690,7 +682,10 @@ static int vdi_create(const char *filename, QemuOpts 
*opts, Error **errp)
 VdiHeader header;
 size_t i;
 size_t bmap_size;
-bool nocow = false;
+int64_t offset = 0;
+Error *local_err = NULL;
+BlockDriverState *bs = NULL;
+uint32_t *bmap = NULL;
 
 logout(\n);
 
@@ -707,7 +702,6 @@ static int vdi_create(const char *filename, QemuOpts *opts, 
Error **errp)
 image_type = VDI_TYPE_STATIC;
 }
 #endif
-nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false);
 
 if (bytes  VDI_DISK_SIZE_MAX) {
 result = -ENOTSUP;
@@ -717,27 +711,16 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
         goto exit;
     }
 
-    fd = qemu_open(filename,
-                   O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
-                   0644);
-    if (fd < 0) {
-        result = -errno;
+    result = bdrv_create_file(filename, opts, &local_err);
+    if (result < 0) {
+        error_propagate(errp, local_err);
         goto exit;
     }
-
-    if (nocow) {
-#ifdef __linux__
-        /* Set NOCOW flag to solve performance issue on fs like btrfs.
-         * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value will
-         * be ignored since any failure of this operation should not block the
-         * left work.
-         */
-        int attr;
-        if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
-            attr |= FS_NOCOW_FL;
-            ioctl(fd, FS_IOC_SETFLAGS, &attr);
-        }
-#endif
+    result = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
+                       NULL, &local_err);
+    if (result < 0) {
+        error_propagate(errp, local_err);
+        goto exit;
     }
 
     /* We need enough blocks to store the given disk size,
@@ -769,13 +752,15 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
     vdi_header_print(&header);
 #endif
     vdi_header_to_le(&header);
-    if (write(fd, &header, sizeof(header)) < 0) {
-        result = -errno;
-        goto close_and_exit;
+    result = bdrv_pwrite_sync(bs, offset, &header, sizeof(header));
+    if (result < 0) {
+        error_setg(errp, "Error writing header to %s", filename);
+        goto exit;
     }
+    offset += sizeof(header);
 
     if (bmap_size > 0) {
-        uint32_t *bmap = g_malloc0(bmap_size);
+        bmap = g_malloc0(bmap_size);
         for (i = 0; i < blocks; i++) {
             if (image_type == VDI_TYPE_STATIC) {
                 bmap[i] = i;
@@ -783,27 +768,25 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
                 bmap[i] = VDI_UNALLOCATED;
             }
         }
-        if (write(fd, bmap, bmap_size) < 0) {
-            result = -errno;
-            g_free(bmap);
-            goto close_and_exit;
+        result = bdrv_pwrite_sync(bs, offset, bmap, bmap_size);
+        if (result < 0) {
+            error_setg(errp, "Error writing bmap to %s", filename);
+            goto exit;
         }
-        g_free(bmap);
+        offset += bmap_size;
     }
 
     if (image_type == VDI_TYPE_STATIC) {
-        if (ftruncate(fd, sizeof(header) + bmap_size + blocks * block_size)) {
-            result = -errno;
-            goto close_and_exit;
+        result = bdrv_truncate(bs, offset + blocks * block_size);
+        if (result < 0) {
+            error_setg(errp, "Failed to statically allocate %s", filename);
+            goto exit;
         }
     }
 
-close_and_exit:
-    if ((close(fd) < 0) && !result) {
-        result = -errno;
-    }
-
 exit:
+    bdrv_unref(bs);
+    g_free(bmap);
     return result;
 }
 
-- 
1.9.3

[Qemu-devel] [PATCH v3 0/5] Allow VPC and VDI to be created over protocols

2014-07-23 Thread Jeff Cody

Changes from v2 -> v3:
* Patch 2: Removed extra #ifdef __linux__ from top of file (Max)
* Patch 4: Removed extra #ifdef __linux__ from top of file (Max)
* Patch 5: Removed output from debug cruft (Max)
* Added Max's R-b to remaining patches

Changes from v1 -> v2:
* Patch 2: Use 'bs' instead of 'bs->file' (Max)
* Patch 3: Same as patch 2 (ripple through)
* Patch 5: Update VDI test for static image (Kevin)
* Added Max's R-b to patches 1,3,4

This allows VPC and VDI to be created over protocols; currently, they use
posix calls directly to open, seek, and write into new image files.  This
obviously precludes them from being able to be created over a protocol, like
glusterfs.

Jeff Cody (5):
  block: allow bdrv_unref() to be passed NULL pointers
  block: vdi - use block layer ops in vdi_create, instead of posix calls
  block: use the standard 'ret' instead of 'result'
  block: vpc - use block layer ops in vpc_create, instead of posix calls
  block: iotest - update 084 to test static VDI image creation

 block.c|   3 ++
 block/vdi.c|  89 +++--
 block/vpc.c| 106 ++---
 tests/qemu-iotests/084 |  16 ++-
 tests/qemu-iotests/084.out |  14 ++
 5 files changed, 110 insertions(+), 118 deletions(-)

-- 
1.9.3

[Qemu-devel] [PATCH v3 4/5] block: vpc - use block layer ops in vpc_create, instead of posix calls

2014-07-23 Thread Jeff Cody

Use the block layer to create, and write to, the image file in the VPC
.bdrv_create() operation.

This has a couple of benefits: Images can now be created over protocols,
and hacks such as NOCOW are not needed in the image format driver, and
the underlying file protocol appropriate for the host OS can be relied
upon.

Reviewed-by: Max Reitz mre...@redhat.com
Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/vpc.c | 106 
 1 file changed, 43 insertions(+), 63 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 8b376a4..9690344 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -29,13 +29,6 @@
 #if defined(CONFIG_UUID)
 #include uuid/uuid.h
 #endif
-#ifdef __linux__
-#include linux/fs.h
-#include sys/ioctl.h
-#ifndef FS_NOCOW_FL
-#define FS_NOCOW_FL 0x0080 /* Do not cow file */
-#endif
-#endif
 
 /**/
 
@@ -656,39 +649,41 @@ static int calculate_geometry(int64_t total_sectors, 
uint16_t* cyls,
 return 0;
 }
 
-static int create_dynamic_disk(int fd, uint8_t *buf, int64_t total_sectors)
+static int create_dynamic_disk(BlockDriverState *bs, uint8_t *buf,
+   int64_t total_sectors)
 {
 VHDDynDiskHeader *dyndisk_header =
 (VHDDynDiskHeader *) buf;
 size_t block_size, num_bat_entries;
 int i;
-int ret = -EIO;
+int ret;
+int64_t offset = 0;
 
 // Write the footer (twice: at the beginning and at the end)
 block_size = 0x20;
 num_bat_entries = (total_sectors + block_size / 512) / (block_size / 512);
 
-if (write(fd, buf, HEADER_SIZE) != HEADER_SIZE) {
+ret = bdrv_pwrite_sync(bs, offset, buf, HEADER_SIZE);
+if (ret) {
 goto fail;
 }
 
-if (lseek(fd, 1536 + ((num_bat_entries * 4 + 511)  ~511), SEEK_SET)  0) {
-goto fail;
-}
-if (write(fd, buf, HEADER_SIZE) != HEADER_SIZE) {
+offset = 1536 + ((num_bat_entries * 4 + 511)  ~511);
+ret = bdrv_pwrite_sync(bs, offset, buf, HEADER_SIZE);
+if (ret  0) {
 goto fail;
 }
 
 // Write the initial BAT
-if (lseek(fd, 3 * 512, SEEK_SET)  0) {
-goto fail;
-}
+offset = 3 * 512;
 
 memset(buf, 0xFF, 512);
 for (i = 0; i  (num_bat_entries * 4 + 511) / 512; i++) {
-if (write(fd, buf, 512) != 512) {
+ret = bdrv_pwrite_sync(bs, offset, buf, 512);
+if (ret  0) {
 goto fail;
 }
+offset += 512;
 }
 
 // Prepare the Dynamic Disk Header
@@ -709,39 +704,35 @@ static int create_dynamic_disk(int fd, uint8_t *buf, 
int64_t total_sectors)
 dyndisk_header-checksum = be32_to_cpu(vpc_checksum(buf, 1024));
 
 // Write the header
-if (lseek(fd, 512, SEEK_SET)  0) {
-goto fail;
-}
+offset = 512;
 
-if (write(fd, buf, 1024) != 1024) {
+ret = bdrv_pwrite_sync(bs, offset, buf, 1024);
+if (ret  0) {
 goto fail;
 }
-ret = 0;
 
  fail:
 return ret;
 }
 
-static int create_fixed_disk(int fd, uint8_t *buf, int64_t total_size)
+static int create_fixed_disk(BlockDriverState *bs, uint8_t *buf,
+ int64_t total_size)
 {
-int ret = -EIO;
+int ret;
 
 /* Add footer to total size */
-total_size += 512;
-if (ftruncate(fd, total_size) != 0) {
-ret = -errno;
-goto fail;
-}
-if (lseek(fd, -512, SEEK_END)  0) {
-goto fail;
-}
-if (write(fd, buf, HEADER_SIZE) != HEADER_SIZE) {
-goto fail;
+total_size += HEADER_SIZE;
+
+ret = bdrv_truncate(bs, total_size);
+if (ret  0) {
+return ret;
 }
 
-ret = 0;
+ret = bdrv_pwrite_sync(bs, total_size - HEADER_SIZE, buf, HEADER_SIZE);
+if (ret  0) {
+return ret;
+}
 
- fail:
 return ret;
 }
 
@@ -750,7 +741,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, 
Error **errp)
 uint8_t buf[1024];
 VHDFooter *footer = (VHDFooter *) buf;
 char *disk_type_param;
-int fd, i;
+int i;
 uint16_t cyls = 0;
 uint8_t heads = 0;
 uint8_t secs_per_cyl = 0;
@@ -758,7 +749,8 @@ static int vpc_create(const char *filename, QemuOpts *opts, 
Error **errp)
 int64_t total_size;
 int disk_type;
 int ret = -EIO;
-bool nocow = false;
+Error *local_err = NULL;
+BlockDriverState *bs = NULL;
 
 /* Read out options */
 total_size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
@@ -775,28 +767,17 @@ static int vpc_create(const char *filename, QemuOpts 
*opts, Error **errp)
 } else {
 disk_type = VHD_DYNAMIC;
 }
-nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false);
 
-/* Create the file */
-fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644);
-if (fd  0) {
-ret = -EIO;
+ret = bdrv_create_file(filename, opts, local_err);
+if (ret  0) {
+error_propagate(errp, local_err);
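
Since create_dynamic_disk() now passes explicit offsets instead of relying on lseek(), the resulting file layout can be worked out up front. The C sketch below does that arithmetic on its own; the struct and function names are invented, and the 2 MiB block size is an assumption (the VHD default), because the archived diff shows the constant truncated.

```c
#include <stdint.h>

#define VHD_SECTOR      512u
#define VHD_BLOCK_SIZE  0x200000u   /* assumption: 2 MiB blocks */

typedef struct VhdLayout {
    uint64_t footer_copy;   /* footer copy at the start of the file */
    uint64_t dyn_header;    /* dynamic disk header, 1024 bytes long */
    uint64_t bat;           /* start of the block allocation table */
    uint64_t bat_sectors;   /* BAT length in 512-byte sectors */
    uint64_t footer;        /* footer written after the BAT */
} VhdLayout;

static VhdLayout vhd_dynamic_layout(uint64_t total_sectors)
{
    uint64_t spb = VHD_BLOCK_SIZE / VHD_SECTOR;   /* sectors per block */
    uint64_t num_bat_entries = (total_sectors + spb) / spb;
    VhdLayout l;

    l.footer_copy = 0;
    l.dyn_header = 512;
    l.bat = 3 * 512;
    /* Each BAT entry is 4 bytes; the table is padded to whole sectors. */
    l.bat_sectors = (num_bat_entries * 4 + 511) / 512;
    l.footer = 1536 + ((num_bat_entries * 4 + 511) & ~(uint64_t)511);
    return l;
}
```

For a 64 MiB image (131072 sectors) this gives 33 BAT entries, a one-sector BAT at offset 1536, and the trailing footer at offset 2048.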

[Qemu-devel] [PATCH v3 5/5] block: iotest - update 084 to test static VDI image creation

2014-07-23 Thread Jeff Cody

This updates the VDI corruption test to also test static VDI image
creation, as well as the default dynamic image creation.

Reviewed-by: Max Reitz mre...@redhat.com
Signed-off-by: Jeff Cody jc...@redhat.com
---
 tests/qemu-iotests/084 | 16 ++--
 tests/qemu-iotests/084.out | 14 ++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/084 b/tests/qemu-iotests/084
index cb4d7b7..ae33c2c 100755
--- a/tests/qemu-iotests/084
+++ b/tests/qemu-iotests/084
@@ -1,6 +1,7 @@
 #!/bin/bash
 #
-# Test case for VDI header corruption; image too large, and too many blocks
+# Test case for VDI header corruption; image too large, and too many blocks.
+# Also simple test for creating dynamic and static VDI images.
 #
 # Copyright (C) 2013 Red Hat, Inc.
 #
@@ -43,14 +44,25 @@ _supported_fmt vdi
 _supported_proto generic
 _supported_os Linux
 
+size=64M
 ds_offset=368  # disk image size field offset
 bs_offset=376  # block size field offset
 bii_offset=384 # block in image field offset
 
 echo
+echo "=== Statically allocated image creation ==="
+echo
+_make_test_img $size -o static
+_img_info
+stat -c"disk image file size in bytes: %s" "${TEST_IMG}"
+_cleanup_test_img
+
+echo
 echo "=== Testing image size bounds ==="
 echo
-_make_test_img 64M
+_make_test_img $size
+_img_info
+stat -c"disk image file size in bytes: %s" "${TEST_IMG}"
 
 # check for image size too large
 # poke max image size, and appropriate blocks_in_image value
diff --git a/tests/qemu-iotests/084.out b/tests/qemu-iotests/084.out
index c7120d9..ea29ae0 100644
--- a/tests/qemu-iotests/084.out
+++ b/tests/qemu-iotests/084.out
@@ -1,8 +1,22 @@
 QA output created by 084
 
+=== Statically allocated image creation ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+image: TEST_DIR/t.IMGFMT
+file format: IMGFMT
+virtual size: 64M (67108864 bytes)
+cluster_size: 1048576
+disk image file size in bytes: 67109888
+
 === Testing image size bounds ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+image: TEST_DIR/t.IMGFMT
+file format: IMGFMT
+virtual size: 64M (67108864 bytes)
+cluster_size: 1048576
+disk image file size in bytes: 1024
 Test 1: Maximum size (1024 TB):
 qemu-img: Could not open 'TEST_DIR/t.IMGFMT': Could not open 'TEST_DIR/t.IMGFMT': Invalid argument
 
-- 
1.9.3
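
The two file sizes in the expected output (1024 bytes for the dynamic image, 67109888 for the static one) follow from the layout that vdi_create() writes: header, then block map, then data blocks for static images only. The C sketch below reproduces the arithmetic. It is not taken from block/vdi.c; the names are made up, and the 512-byte header and the sector padding of the block map are assumptions chosen to be consistent with the numbers in 084.out.

```c
#include <stdint.h>

#define SECTOR_SIZE       512u
#define VDI_HEADER_SIZE   512u        /* assumption, consistent with 084.out */
#define VDI_CLUSTER_SIZE  1048576u    /* cluster_size reported by the test */

/* Block map: one 32-bit entry per block, padded to a whole sector (assumed). */
static uint64_t vdi_bmap_size(uint64_t blocks)
{
    uint64_t raw = blocks * sizeof(uint32_t);
    return (raw + SECTOR_SIZE - 1) & ~(uint64_t)(SECTOR_SIZE - 1);
}

/* Expected file size right after creation. */
static uint64_t vdi_file_size(uint64_t virtual_size, int is_static)
{
    uint64_t blocks = (virtual_size + VDI_CLUSTER_SIZE - 1) / VDI_CLUSTER_SIZE;
    uint64_t size = VDI_HEADER_SIZE + vdi_bmap_size(blocks);

    if (is_static) {
        size += blocks * VDI_CLUSTER_SIZE;   /* data area is preallocated */
    }
    return size;
}
```

For the 64M test image there are 64 blocks, so the block map takes 256 bytes and is padded to 512; header plus map is 1024 bytes, and the static image adds 64 * 1048576 bytes on top of that.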

[Qemu-devel] [PATCH v3 3/5] block: use the standard 'ret' instead of 'result'

2014-07-23 Thread Jeff Cody

Most QEMU code uses 'ret' for function return values. The VDI driver
uses a mix of 'result' and 'ret'.  This cleans that up, switching over
to the standard 'ret' usage.

Reviewed-by: Max Reitz mre...@redhat.com
Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/vdi.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 070acb6..9d62a3c 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -350,23 +350,23 @@ static int vdi_make_empty(BlockDriverState *bs)
 static int vdi_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
 const VdiHeader *header = (const VdiHeader *)buf;
-int result = 0;
+int ret = 0;
 
 logout(\n);
 
 if (buf_size  sizeof(*header)) {
 /* Header too small, no VDI. */
 } else if (le32_to_cpu(header-signature) == VDI_SIGNATURE) {
-result = 100;
+ret = 100;
 }
 
-if (result == 0) {
+if (ret == 0) {
 logout(no vdi image\n);
 } else {
 logout(%s, header-text);
 }
 
-return result;
+return ret;
 }
 
 static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
@@ -674,7 +674,7 @@ static int vdi_co_write(BlockDriverState *bs,
 
 static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
 {
-int result = 0;
+int ret = 0;
 uint64_t bytes = 0;
 uint32_t blocks;
 size_t block_size = DEFAULT_CLUSTER_SIZE;
@@ -704,21 +704,21 @@ static int vdi_create(const char *filename, QemuOpts 
*opts, Error **errp)
 #endif
 
 if (bytes  VDI_DISK_SIZE_MAX) {
-result = -ENOTSUP;
+ret = -ENOTSUP;
 error_setg(errp, Unsupported VDI image size (size is 0x% PRIx64
   , max supported is 0x% PRIx64 ),
   bytes, VDI_DISK_SIZE_MAX);
 goto exit;
 }
 
-result = bdrv_create_file(filename, opts, local_err);
-if (result  0) {
+ret = bdrv_create_file(filename, opts, local_err);
+if (ret  0) {
 error_propagate(errp, local_err);
 goto exit;
 }
-result = bdrv_open(bs, filename, NULL, NULL, BDRV_O_RDWR | 
BDRV_O_PROTOCOL,
-   NULL, local_err);
-if (result  0) {
+ret = bdrv_open(bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
+NULL, local_err);
+if (ret  0) {
 error_propagate(errp, local_err);
 goto exit;
 }
@@ -752,8 +752,8 @@ static int vdi_create(const char *filename, QemuOpts *opts, 
Error **errp)
 vdi_header_print(header);
 #endif
 vdi_header_to_le(header);
-result = bdrv_pwrite_sync(bs, offset, header, sizeof(header));
-if (result  0) {
+ret = bdrv_pwrite_sync(bs, offset, header, sizeof(header));
+if (ret  0) {
 error_setg(errp, Error writing header to %s, filename);
 goto exit;
 }
@@ -768,8 +768,8 @@ static int vdi_create(const char *filename, QemuOpts *opts, 
Error **errp)
 bmap[i] = VDI_UNALLOCATED;
 }
 }
-result = bdrv_pwrite_sync(bs, offset, bmap, bmap_size);
-if (result  0) {
+ret = bdrv_pwrite_sync(bs, offset, bmap, bmap_size);
+if (ret  0) {
 error_setg(errp, Error writing bmap to %s, filename);
 goto exit;
 }
@@ -777,8 +777,8 @@ static int vdi_create(const char *filename, QemuOpts *opts, 
Error **errp)
 }
 
 if (image_type == VDI_TYPE_STATIC) {
-result = bdrv_truncate(bs, offset + blocks * block_size);
-if (result  0) {
+ret = bdrv_truncate(bs, offset + blocks * block_size);
+if (ret  0) {
 error_setg(errp, Failed to statically allocate %s, filename);
 goto exit;
 }
@@ -787,7 +787,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, 
Error **errp)
 exit:
 bdrv_unref(bs);
 g_free(bmap);
-return result;
+return ret;
 }
 
 static void vdi_close(BlockDriverState *bs)
-- 
1.9.3

Re: [Qemu-devel] [PATCH 5/7] hw/core/sysbus: add fdt_add_node method

2014-07-23 Thread Alexander Graf



On 23.07.14 17:33, Eric Auger wrote:

On 07/08/2014 03:52 PM, Alexander Graf wrote:

On 07.07.14 09:08, Eric Auger wrote:

This method is meant to be called on sysbus device dynamic
instantiation (-device option). Devices that support this
kind of instantiation must implement this method.

Signed-off-by: Eric Auger eric.au...@linaro.org

For the reason I stated earlier, I don't think it's a good idea to put
device tree code into our device models.

Hi Alex,

I would propose we discuss that topic during next KVM call if you are
available.


I lost track when that would be. Next week would work fine, the week 
after not :).



I hope Peter will be available to join too, because I feel stuck
between not putting things in the machine file (1), although we could
obviously put them in a helper module (2), and not putting them in the
device (3).

Whatever the solution, I fear we are going to pollute something: any time
a new device wants to support dynamic instantiation, we would need to
modify the machine file or the helper module with options 1 and 2
respectively. If we put it in the device, we pollute the device instead.

My hope was that quite few QEMU platform devices would need to support
that feature and hence would need to implement this dt node generation
method. To me dynamic instantiation of platform device was not the
mainstream solution.


Quite frankly I don't think it'd be that many. I think we'll cover 99.9% 
of all use cases if we just enable it for the virt machines of e500 and arm.



Then there is the fundamental question of whether it is technically
feasible to devise a generic PlatformParams that matches all the
specialization needs. Here I lack experience. If we know the machine type
and a small set of additional fields, couldn't we do the adaptations you
talked about, related to IRQs?


The problem is that I don't know all the boards and different things 
people come up with either. There's also no reason machine files have to 
stick to the platform bus model - they could just take those devices 
and stick them into an existing other virtual bus.


I don't feel comfortable generalizing something where I'm pretty sure 
things will blow up sooner or later.



Alex

Re: [Qemu-devel] [PATCH 1/7] hw/misc/platform_devices: helpers for dynamic instantiation of platform devices

2014-07-23 Thread Alexander Graf



On 23.07.14 16:58, Eric Auger wrote:

On 07/08/2014 03:43 PM, Alexander Graf wrote:

On 07.07.14 09:08, Eric Auger wrote:

This new module implements routines which help in dynamic instantiation
of sysbus devices. Machine files can use those generic routines.

---

Dynamic sysbus device allocation fully written by Alex Graf.

[Eric Auger]
Those functions were initially in ppc e500 machine file. Now moved to a
separate module.

PPCE500Params is replaced by a generic struct named PlatformParams

Signed-off-by: Alexander Graf ag...@suse.de
Signed-off-by: Eric Auger eric.au...@linaro.org
---
   hw/misc/Makefile.objs  |   1 +
   hw/misc/platform_devices.c | 217
+
   include/hw/misc/platform_devices.h |  61 +++
   3 files changed, 279 insertions(+)
   create mode 100644 hw/misc/platform_devices.c
   create mode 100644 include/hw/misc/platform_devices.h

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index e47fea8..d081606 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -40,3 +40,4 @@ obj-$(CONFIG_SLAVIO) += slavio_misc.o
   obj-$(CONFIG_ZYNQ) += zynq_slcr.o
 obj-$(CONFIG_PVPANIC) += pvpanic.o
+obj-y += platform_devices.o
diff --git a/hw/misc/platform_devices.c b/hw/misc/platform_devices.c
new file mode 100644
index 000..96ab272
--- /dev/null
+++ b/hw/misc/platform_devices.c
@@ -0,0 +1,217 @@
+#include "hw/misc/platform_devices.h"
+#include "hw/sysbus.h"
+#include "qemu/error-report.h"
+
+#define PAGE_SHIFT 12
+
+int sysbus_device_create_devtree(Object *obj, void *opaque)
+{
+PlatformDevtreeData *data = opaque;
+Object *dev;
+SysBusDevice *sbdev;
+bool matched = false;
+
+dev = object_dynamic_cast(obj, TYPE_SYS_BUS_DEVICE);
+sbdev = (SysBusDevice *)dev;
+
+if (!sbdev) {
+/* Container, traverse it for children */
+return object_child_foreach(obj, sysbus_device_create_devtree, data);
+}
+
+if (!matched) {
+error_report("Device %s is not supported by this machine yet.",
+ qdev_fw_name(DEVICE(dev)));
+exit(1);
+}
+
+return 0;
+}
+
+void platform_bus_create_devtree(PlatformParams *params, void *fdt,
+const char *mpic)
+{
+gchar *node = g_strdup_printf("/platform@%"PRIx64,
+  params->platform_bus_base);
+const char platcomp[] = "qemu,platform\0simple-bus";
+PlatformDevtreeData data;
+Object *container;
+uint64_t addr = params->platform_bus_base;
+uint64_t size = params->platform_bus_size;
+int irq_start = params->platform_bus_first_irq;
+
+/* Create a /platform node that we can put all devices into */
+
+qemu_fdt_add_subnode(fdt, node);
+qemu_fdt_setprop(fdt, node, "compatible", platcomp, sizeof(platcomp));
+
+/* Our platform bus region is less than 32bit big, so 1 cell is enough for
+   address and size */
+qemu_fdt_setprop_cells(fdt, node, "#size-cells", 1);
+qemu_fdt_setprop_cells(fdt, node, "#address-cells", 1);
+qemu_fdt_setprop_cells(fdt, node, "ranges", 0, addr >> 32, addr, size);
+
+qemu_fdt_setprop_phandle(fdt, node, "interrupt-parent", mpic);
+
+/* Loop through all devices and create nodes for known ones */
+data.fdt = fdt;
+data.mpic = mpic;
+data.irq_start = irq_start;
+data.node = node;
+
+container = container_get(qdev_get_machine(), "/peripheral");
+sysbus_device_create_devtree(container, &data);
+container = container_get(qdev_get_machine(), "/peripheral-anon");
+sysbus_device_create_devtree(container, &data);
+
+g_free(node);
+}
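
As a rough illustration of what the "ranges" property set in platform_bus_create_devtree above encodes, the Python sketch below packs the four 32-bit cells (child address 0, parent address high word, parent address low word, size) in the big-endian layout that flattened device trees use. The helper name pack_ranges and the sample base and size values are made up for this example and do not come from the patch.

```python
import struct


def pack_ranges(bus_base, bus_size):
    """Pack <child-addr parent-addr-hi parent-addr-lo size> as FDT cells.

    Assumes one address cell and one size cell on the platform node and a
    two-cell parent address, mirroring the quoted qemu_fdt_setprop_cells call.
    """
    if bus_size >= 1 << 32:
        raise ValueError("size does not fit in a single cell")
    cells = (0, bus_base >> 32, bus_base & 0xFFFFFFFF, bus_size)
    # Device tree cells are 32-bit big-endian values.
    return struct.pack(">4I", *cells)


# Hypothetical values, chosen only for the example.
blob = pack_ranges(0xF00000000, 0x8000000)
print(blob.hex())
```

A guest parsing this property would read back the same four cells regardless of host byte order, which is why the packing is explicit.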

Device trees are pretty platform (and even machine) specific. Just to
give you an example - the interrupt specifier on most e500 systems
really is 4 cells big:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt#n80


|   Interrupt specifiers consists of 4 cells encoded as
   follows:

1st-cell   interrupt-number

 Identifies the interrupt source.  The meaning
 depends on the type of interrupt.

 Note: If the interrupt-type cell is undefined
 (i.e. #interrupt-cells = 2), this cell
 should be interpreted the same as for
 interrupt-type 0-- i.e. an external or
 normal SoC device interrupt.

2nd-cell   level-sense information, encoded as follows:
 0 = low-to-high edge triggered
 1 = active low level-sensitive
 2 = active high level-sensitive
 3 = high-to-low edge triggered

3rd-cell   interrupt-type

 The following types are supported:

   0 = external or normal SoC device interrupt

   The interrupt-number cell contains
   the SoC device interrupt number.  The
   type-specific

Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation

2014-07-23 Thread Le Tan

Hi Paolo,

2014-07-23 15:58 GMT+08:00 Paolo Bonzini pbonz...@redhat.com:
 Il 22/07/2014 17:47, Le Tan ha scritto:
 +static inline void define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
 +uint64_t wmask, uint64_t w1cmask)
 +{
 +*((uint64_t *)&s->csr[addr]) = val;

 All these casts are not endian-safe.  Please use ldl_le_p, ldq_le_p,
 stl_le_p, stq_le_p.

Thanks very much. I finally get the idea here. :) Thanks also for your
renaming suggestions.
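
To illustrate the endianness point, here is a small Python sketch (not QEMU code). A register file stored as bytes is accessed with an explicit little-endian format, which is the guarantee helpers such as ldl_le_p and stl_le_p give in C, whereas a native-order access (the analogue of a raw pointer cast) depends on the host. The function names below only mimic the QEMU helpers; they are assumptions made for the example.

```python
import struct


def stl_le(buf, addr, val):
    """Store a 32-bit value at buf[addr:addr+4] in little-endian order."""
    struct.pack_into("<I", buf, addr, val & 0xFFFFFFFF)


def ldl_le(buf, addr):
    """Load a 32-bit little-endian value from buf[addr:addr+4]."""
    return struct.unpack_from("<I", buf, addr)[0]


def ldl_native(buf, addr):
    """Load using host byte order, like a raw pointer cast would."""
    return struct.unpack_from("=I", buf, addr)[0]


csr = bytearray(16)
stl_le(csr, 4, 0x11223344)
print(csr[4:8].hex())           # byte layout is the same on every host
print(hex(ldl_le(csr, 4)))      # always recovers the stored value
print(hex(ldl_native(csr, 4)))  # matches only on little-endian hosts
```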

 +*((uint64_t *)&s->wmask[addr]) = wmask;
 +*((uint64_t *)&s->w1cmask[addr]) = w1cmask;
 +}
 +
 +static inline void define_quad_wo(IntelIOMMUState *s, hwaddr addr,
 +  uint64_t mask)
 +{
 +*((uint64_t *)&s->womask[addr]) = mask;
 +}
 +
 +static inline void define_long(IntelIOMMUState *s, hwaddr addr, uint32_t val,
 +uint32_t wmask, uint32_t w1cmask)
 +{
 +*((uint32_t *)&s->csr[addr]) = val;
 +*((uint32_t *)&s->wmask[addr]) = wmask;
 +*((uint32_t *)&s->w1cmask[addr]) = w1cmask;
 +}
 +
 +static inline void define_long_wo(IntelIOMMUState *s, hwaddr addr,
 +  uint32_t mask)
 +{
 +*((uint32_t *)&s->womask[addr]) = mask;
 +}
 +
 +/* External get/set operations */
 +static inline void set_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val)
 +{
 +uint64_t oldval = *((uint64_t *)&s->csr[addr]);
 +uint64_t wmask = *((uint64_t *)&s->wmask[addr]);
 +uint64_t w1cmask = *((uint64_t *)&s->w1cmask[addr]);
 +*((uint64_t *)&s->csr[addr]) =
 +((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
 +}
 +
 +static inline void set_long(IntelIOMMUState *s, hwaddr addr, uint32_t val)
 +{
 +uint32_t oldval = *((uint32_t *)&s->csr[addr]);
 +uint32_t wmask = *((uint32_t *)&s->wmask[addr]);
 +uint32_t w1cmask = *((uint32_t *)&s->w1cmask[addr]);
 +*((uint32_t *)&s->csr[addr]) =
 +((oldval & ~wmask) | (val & wmask)) & ~(w1cmask & val);
 +}
 +
 +static inline uint64_t get_quad(IntelIOMMUState *s, hwaddr addr)
 +{
 +uint64_t val = *((uint64_t *)&s->csr[addr]);
 +uint64_t womask = *((uint64_t *)&s->womask[addr]);
 +return val & ~womask;
 +}
 +
 +
 +static inline uint32_t get_long(IntelIOMMUState *s, hwaddr addr)
 +{
 +uint32_t val = *((uint32_t *)&s->csr[addr]);
 +uint32_t womask = *((uint32_t *)&s->womask[addr]);
 +return val & ~womask;
 +}
 +
 +
 +
 +/* Internal get/set operations */
 +static inline uint64_t __get_quad(IntelIOMMUState *s, hwaddr addr)

 get_quad_raw?

 +{
 +return *((uint64_t *)&s->csr[addr]);
 +}
 +
 +static inline uint32_t __get_long(IntelIOMMUState *s, hwaddr addr)

 get_long_raw?

 +{
 +return *((uint32_t *)&s->csr[addr]);
 +}
 +
 +
 +/* val = (val & ~clear) | mask */
 +static inline uint32_t set_mask_long(IntelIOMMUState *s, hwaddr addr,

 set_clear_long?

 + uint32_t clear, uint32_t mask)
 +{
 +uint32_t *ptr = (uint32_t *)&s->csr[addr];
 +uint32_t val = (*ptr & ~clear) | mask;
 +*ptr = val;
 +return val;
 +}
 +
 +/* val = (val & ~clear) | mask */
 +static inline uint64_t set_mask_quad(IntelIOMMUState *s, hwaddr addr,

 set_clear_quad?
 + uint64_t clear, uint64_t mask)
 +{
 +uint64_t *ptr = (uint64_t *)&s->csr[addr];
 +uint64_t val = (*ptr & ~clear) | mask;
 +*ptr = val;
 +return val;
 +}
 +
 +


Regards,
Le
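
For readers following the register helpers quoted in this message, the write-mask logic of set_quad/set_long and the read-side masking of get_quad/get_long can be sketched in Python as below. This is only a model of the bit arithmetic; the function names reg_write and reg_read are invented for the example and the sample values are arbitrary.

```python
def reg_write(old, val, wmask, w1cmask):
    """Apply a guest write to a register value.

    wmask   : bits the guest may overwrite directly
    w1cmask : bits that are cleared when the guest writes 1 to them
    """
    merged = (old & ~wmask) | (val & wmask)
    return merged & ~(w1cmask & val)


def reg_read(val, womask):
    """Hide write-only bits from a guest read."""
    return val & ~womask


# Low nibble is writable, bit 7 is write-1-to-clear (arbitrary choice).
after_clear = reg_write(0xA1, 0x86, wmask=0x0F, w1cmask=0x80)
after_keep = reg_write(0xA1, 0x06, wmask=0x0F, w1cmask=0x80)
print(hex(after_clear), hex(after_keep))
```

Writing 1 to the write-1-to-clear bit drops it, while writing 0 leaves it untouched; bits outside both masks keep their old value.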

Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation

2014-07-23 Thread Le Tan

Hi Stefan,

2014-07-24 4:29 GMT+08:00 Stefan Weil s...@weilnetz.de:
 Am 22.07.2014 17:47, schrieb Le Tan:
 Add support for emulating Intel IOMMU according to the VT-d specification for
 the q35 chipset machine. Implement the logic for DMAR (DMA remapping) without
 PASID support. Use register-based invalidation for context-cache invalidation
 and IOTLB invalidation.
 Basic fault reporting and caching are not implemented yet.

 Signed-off-by: Le Tan tamlokv...@gmail.com
 ---
  hw/i386/Makefile.objs |1 +
  hw/i386/intel_iommu.c | 1139 
 +
  include/hw/i386/intel_iommu.h |  350 +
  3 files changed, 1490 insertions(+)
  create mode 100644 hw/i386/intel_iommu.c
  create mode 100644 include/hw/i386/intel_iommu.h

 [...]
 diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
 new file mode 100644
 index 000..3ba0e1e
 --- /dev/null
 +++ b/hw/i386/intel_iommu.c
 @@ -0,0 +1,1139 @@
 +/*
 + * QEMU emulation of an Intel IOMMU (VT-d)
 + *   (DMA Remapping device)
 + *
 + * Copyright (c) 2013 Knut Omang, Oracle knut.om...@oracle.com
 + * Copyright (C) 2014 Le Tan, tamlokv...@gmail.com
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License as published by
 + * the Free Software Foundation; either version 2 of the License, or
 + * (at your option) any later version.
 +
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 +
 + * You should have received a copy of the GNU General Public License
 + * along with this program; if not, write to the Free Software
 + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 + */
 +

 I suggest replacing the FSF address here (and in other files) by the URL:

  * You should have received a copy of the GNU General Public License along
  * with this program; if not, see http://www.gnu.org/licenses/.

 This is the standard used for most GPL text in QEMU source files.

Got it. I copied it from the Linux kernel tree.
Thanks very much!

 Regards
 Stefan W.


Regards,
Le

Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases

2014-07-23 Thread ronnie sahlberg

On Wed, Jul 16, 2014 at 10:29 AM, Michael Tokarev m...@tls.msk.ru wrote:
 16.07.2014 21:23, ronnie sahlberg wrote:

 If you ask Debian to upgrade, could you ask them to wait and upgrade after I
 have released the next version, hopefully, if all goes well, at the end
 of this week?

 There's no problem in updating now to fix missing .pc file and to update
 next week to include a new version.


Please find a new version 1.12 on the website.

Thanks.
ronnie sahlberg

Re: [Qemu-devel] [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-07-23 Thread Chen, Tiejun


On 2014/7/24 4:54, Konrad Rzeszutek Wilk wrote:

On Sat, Jul 19, 2014 at 12:27:21AM +, Kay, Allen M wrote:

For the MCH PCI registers that do need to be read - can you tell us which ones 
those are?


In qemu/hw/xen_pt_igd.c/igd_pci_read(), the following MCH PCI config register
reads are passed through to the host HW. Some of the registers are needed by
the Ironlake GFX driver; those we can probably remove. I did a trace recently
on Broadwell, and the number of registers accessed is even smaller (0, 2, 2c,
2e, 50, 52, a0, a4). Given that we now have the MCH and GPU integrated in the
same socket, it looks like the driver can easily remove the reads for offsets
0 - 0x2e.

case 0x00:/* vendor id */
case 0x02:/* device id */
case 0x08:/* revision id */
case 0x2c:/* subsystem vendor id */
case 0x2e:/* subsystem id */


Right. We can fix the i915 to use the mechanism that Michael mentioned.
(see attached RFC patches)


case 0x50:/* SNB: processor graphics control register */
case 0x52:/* processor graphics control register */
case 0xa0:/* top of memory */
case 0xb0:/* ILK: BSM: should read from dev 2 offset 0x5c */
case 0x58:/* SNB: PAVPC Offset */
case 0xa4:/* SNB: graphics base of stolen memory */
case 0xa8:/* SNB: base of GTT stolen memory */
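
The case list quoted above amounts to a whitelist of bridge config offsets whose reads go to the host hardware. A minimal Python sketch of that routing follows; the offsets are copied from the list, while read_config and the two callbacks host_read and emulated_read are placeholders invented for the example and are not part of the QEMU code.

```python
# Offsets taken from the quoted igd_pci_read() case list.
PASSTHROUGH_OFFSETS = {
    0x00, 0x02, 0x08, 0x2C, 0x2E,
    0x50, 0x52, 0xA0, 0xB0, 0x58, 0xA4, 0xA8,
}


def read_config(offset, host_read, emulated_read):
    """Route a bridge config read to host hardware or to emulation."""
    if offset in PASSTHROUGH_OFFSETS:
        return host_read(offset)
    return emulated_read(offset)


# Stand-in callbacks that just report which path was taken.
print(read_config(0x52, lambda o: "host", lambda o: "emulated"))
print(read_config(0x10, lambda o: "host", lambda o: "emulated"))
```

Shrinking the set, as discussed in this thread, would only change PASSTHROUGH_OFFSETS and leave the routing untouched.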


I dug in the intel-gtt.c (see ironlake_gtt_driver) to figure out this
a bit more. As in, I speculated, what if we returned 0 (and not implement
any support for reading from these registers). What would happen?

0x52 for Ironlake (g5):
--
It looks like intel_gmch_probe is called when i915_gem_gtt_init
starts (there is a lot of code that looks to be used between
intel-gtt.c and i915.c).

Anyhow the interesting parts are that i9xx_setup ends up calling
ioremap on the i915 BAR to set up some of these registers for older generations.

Then i965_gtt_total_entries gets called, which reads 0x52, but it is only
needed for the v5 generation. For the others (v4 and G33) it reads it from
the GPU's 0x2020 register.

If there is a mismatch, it writes to the GPU at 0x2020 to update
the size based on the bridge. And then it reads from 0x2020 and that
is returned and stuck in intel_private.gtt_total_entries.

So 0x52 in the emulated bridge could be populated with what the
GPU has at 0x2020. And the writes go in the emulated area.

0x52 for v6 - v8:
-
We seem to go to gen6_gmch_probe, which just figures out
the GTT based on the GPU's BAR sizes. The stolen values
are read from 0x50 from the GPU. So no need to touch the bridge
(see gen6_gmch_probe).

OK, so no need to have 0x52 or 0x50 in the bridge.

0xA0:
-
Could not find any reference in the Linux code. Why would the
Windows driver need this? If we returned the _real_ TOM would
it matter? Is it used to figure out whether the device should use
32-bit DMA operations or 40-bit?

0xb0 or 0x5c:
-
No mention of them in the Linux code.

0x58, 0xa4, 0xa8:
-
No usage of them in the Linux code. We seem to be using the 0x52
from the bridge and 0x2020 from the GPU for this functionality.


So in theory*, if using Ironlake we need to have a proper value
in 0x52 in the bridge. But for the later chipsets we do not need
these values (I am assuming that intel_setup_mchbar can safely
return as it does that for Ironlake and could very well for later
generations).

Can this be reflected in the Windows driver as well?

P.S.
*theory: That is assuming we modify the Linux i915_drv.c:intel_detect_pch
to pick up the id as suggested earlier. See the RFC patches attached.
(Not compile tested at all!)


I took a look at these patches; it looks like we still scan all PCI bridges
to walk all PCHs. So this means we still need to fake a PCI bridge, right?
Or maybe you don't cover this problem this time.


I would prefer that we check the dev slot to get the PCH, like my previous
patch, gpu:drm:i915:intel_detect_pch: back to check devfn instead of
check class type. Windows always does it this way, so I think this
point should be the same between Linux and Windows.


Or we need another, better way to unify all OSes.

Thanks
Tiejun



Allen

-Original Message-
From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
Sent: Friday, July 18, 2014 6:45 AM
To: Kay, Allen M
Cc: Michael S. Tsirkin; Jesse Barnes; peter.mayd...@linaro.org; 
xen-de...@lists.xensource.com; Ross Philipson; airl...@linux.ie; 
daniel.vet...@ffwll.ch; intel-...@lists.freedesktop.org; kelly.zyta...@amd.com; 
qemu-devel@nongnu.org; Anthony Perard; Stefano Stabellini; 
anth...@codemonkey.ws; Paolo Bonzini; Zhang, Yang Z; Chen, Tiejun
Subject: Re: [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel 
IGD passthrough support

On Thu, Jul 17, 2014 at 05:37:12PM +, Kay, Allen M wrote:

[Qemu-devel] [RFC PATCH v2] add memory hotunplug support

2014-07-23 Thread Zhu Guihua

From: Hu Tao hu...@cn.fujitsu.com

This patch solves the problem that hot-pluggable memory, once added,
cannot be removed.

The approach is for QEMU to set the GPE status bit and then trigger an SCI
interrupt to notify the guest OS. The guest OS checks the device status, frees
the memory resource if possible, and then generates an OST. Finally, QEMU
handles the OST event to free the DIMM device.
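
The flow described above can be modelled roughly as a small state machine. The Python sketch below follows the OST event and status codes that appear in the diff further down (0x03 and 0x103 events, 0x0 success, 0x84 in progress, and the listed failure codes); the class and function names are invented for illustration, and the GPE/SCI signalling itself is only noted in a comment rather than modelled.

```python
from dataclasses import dataclass

OST_EJECT, OST_OSPM_EJECT = 0x03, 0x103
SUCCESS, IN_PROGRESS = 0x0, 0x84
FAILURES = {0x1, 0x2, 0x80, 0x81, 0x82, 0x83}


@dataclass
class MemStatus:
    dimm: object = None
    is_enabled: bool = False
    is_inserting: bool = False
    is_removing: bool = False

    def flags(self):
        """Pack the is_* fields the way the 0x14 register read does."""
        return ((1 if self.is_enabled else 0)
                | (2 if self.is_inserting else 0)
                | (4 if self.is_removing else 0))


def request_unplug(mdev):
    """QEMU side: mark the slot; the real code then raises the GPE and SCI."""
    mdev.is_removing = True


def handle_ost(mdev, event, status):
    """Update slot state from the guest's OST report."""
    if event not in (OST_EJECT, OST_OSPM_EJECT):
        return
    if status == SUCCESS:
        mdev.dimm = None
        mdev.is_removing = False
    elif event == OST_EJECT and status in FAILURES:
        mdev.is_removing = False
        mdev.is_enabled = True
    elif event == OST_OSPM_EJECT and status == IN_PROGRESS:
        mdev.is_enabled = False
        mdev.is_removing = True


slot = MemStatus(dimm="dimm0", is_enabled=True)
request_unplug(slot)
handle_ost(slot, OST_EJECT, 0x81)   # guest reports "device in use"
print(slot)
```

In this model a failed eject leaves the DIMM attached and enabled, so the unplug can simply be requested again later.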

Signed-off-by: Hu Tao hu...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/memory_hotplug.c | 74 +++-
 hw/acpi/piix4.c  |  2 ++
 hw/core/qdev.c   |  9 +
 hw/i386/pc.c | 31 +
 hw/i386/ssdt-mem.dsl |  4 +++
 hw/i386/ssdt-misc.dsl| 11 +-
 hw/mem/pc-dimm.c | 10 ++
 include/hw/acpi/memory_hotplug.h |  3 ++
 include/qom/object.h |  1 +
 qdev-monitor.c   | 25 +-
 qom/object.c |  2 +-
 11 files changed, 168 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index ed39241..b43b2b4 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -75,12 +75,14 @@ static uint64_t acpi_memory_hotplug_read(void *opaque, hwaddr addr,
 case 0x14: /* pack and return is_* fields */
 val |= mdev->is_enabled   ? 1 : 0;
 val |= mdev->is_inserting ? 2 : 0;
+val |= mdev->is_removing ? 4 : 0;
 trace_mhp_acpi_read_flags(mem_st->selector, val);
 break;
 default:
 val = ~0;
 break;
 }
+
 return val;
 }
 
@@ -126,17 +128,57 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
 info = acpi_memory_device_status(mem_st->selector, mdev);
 qapi_event_send_acpi_device_ost(info, &error_abort);
 qapi_free_ACPIOSTInfo(info);
+switch (mdev->ost_event) {
+case 0x03: /* EJECT */
+switch (mdev->ost_status) {
+case 0x0: /* SUCCESS */
+object_unparent(OBJECT(mdev->dimm));
+mdev->is_removing = false;
+mdev->dimm = NULL;
+break;
+case 0x1: /* FAILURE */
+case 0x2: /* UNRECOGNIZED NOTIFY */
+case 0x80: /* EJECT NOT SUPPORTED */
+case 0x81: /* DEVICE IN USE */
+case 0x82: /* DEVICE BUSY */
+case 0x83: /* EJECT_DEPENDENCY_BUSY */
+mdev->is_removing = false;
+mdev->is_enabled = true;
+break;
+case 0x84: /* EJECTION IN PROGRESS */
+break;
+default:
+break;
+}
+break;
+case 0x103: /* OSPM EJECT */
+switch (mdev->ost_status) {
+case 0x0: /* SUCCESS */
+object_unparent(OBJECT(mdev->dimm));
+mdev->is_removing = false;
+mdev->dimm = NULL;
+break;
+case 0x84: /* EJECTION IN PROGRESS */
+mdev->is_enabled = false;
+mdev->is_removing = true;
+break;
+default:
+break;
+}
+}
 break;
 case 0x14:
 mdev = &mem_st->devs[mem_st->selector];
 if (data & 2) { /* clear insert event */
 mdev->is_inserting  = false;
 trace_mhp_acpi_clear_insert_evt(mem_st->selector);
+} else if (data & 4) { /* MRMV */
+mdev->is_enabled = false;
 }
 break;
 }
-
 }
+
 static const MemoryRegionOps acpi_memory_hotplug_ops = {
 .read = acpi_memory_hotplug_read,
 .write = acpi_memory_hotplug_write,
@@ -195,6 +237,36 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
 return;
 }
 
+void acpi_memory_unplug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
+   DeviceState *dev, Error **errp)
+{
+MemStatus *mdev;
+Error *local_err = NULL;
+int slot = object_property_get_int(OBJECT(dev), "slot", &local_err);
+
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+if (slot >= mem_st->dev_count) {
+char *dev_path = object_get_canonical_path(OBJECT(dev));
+error_setg(errp, "acpi_memory_plug_cb: "
+   "device [%s] returned invalid memory slot[%d]",
+dev_path, slot);
+g_free(dev_path);
+return;
+}
+
+mdev = &mem_st->devs[slot];
+mdev->is_removing = true;
+
+/* do ACPI magic */
+ar->gpe.sts[0] |= ACPI_MEMORY_HOTPLUG_STATUS;
+acpi_update_sci(ar, irq);
+return;
+}
+
 static const VMStateDescription vmstate_memhp_sts = {
 .name = "memory hotplug device state",
 .version_id = 1,
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index b72b34e..37d593a 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -362,6 +362,8 @@ static void

Re: [Qemu-devel] [RFC PATCH 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service

2014-07-23 Thread Hongyang Yang


On 07/23/2014 11:44 PM, Eric Blake wrote:

On 07/23/2014 08:25 AM, Yang Hongyang wrote:

Virtual machine (VM) replication is a well known technique for
providing application-agnostic software-implemented hardware fault
tolerance non-stop service. COLO is a high availability solution.
Both primary VM (PVM) and secondary VM (SVM) run in parallel. They
receive the same request from client, and generate response in parallel
too. If the response packets from PVM and SVM are identical, they are
released immediately. Otherwise, a VM checkpoint (on demand) is
conducted. The idea was presented at Xen Summit 2012 and 2013,
and in an academic paper at SOCC 2013. It was also presented at KVM Forum
2013:
http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
Please refer to the above document for detailed information.
Please also refer to the previously posted RFC proposal:
http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
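
The compare-and-checkpoint idea described above can be sketched in a few lines of Python. This is a toy model only: colo_step and the checkpoint callback are names invented for the example, and the choice to release the primary's packet after a checkpoint is an assumption of the sketch, not something stated in this cover letter.

```python
def colo_step(primary_reply, secondary_reply, checkpoint):
    """Compare one pair of replies; release directly or checkpoint first.

    Returns the packet to send to the client (the primary's reply).
    """
    if primary_reply != secondary_reply:
        # Outputs diverged: resynchronise the secondary before releasing.
        checkpoint()
    return primary_reply


checkpoints = []
replies = [(b"ok", b"ok"), (b"seq=7", b"seq=9"), (b"bye", b"bye")]
sent = [colo_step(p, s, lambda: checkpoints.append(1)) for p, s in replies]
print(sent, len(checkpoints))
```

As long as both VMs produce identical output no checkpoint is taken, which is where the scheme gets its advantage over periodic checkpointing.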

The patchset is also hosted on github:
https://github.com/macrosheep/qemu/tree/colo_v0.1

This patchset is an RFC. It implements the COLO framework, without
failover and NIC/disk replication, but it is ready to demo
the COLO idea on top of QEMU-KVM.
Steps for using this patchset to get an overview of COLO:
1. configure the source with the --enable-colo option


Code that has to be opt-in tends to bitrot, because people don't
configure their build-bots to opt in.  What sort of penalties does
opting in cause to the code if colo is not used?  I'd much rather make
the default to compile colo unless configured --disable-colo.  Are there
any pre-req libraries required for it to work?  That would be the only
reason to make the default of on or off conditional, rather than
defaulting to on.


Thanks for all your comments on this patchset, will address them.
For this one, it will not affect the rest of the code if COLO is compiled
but not used, and it does not require pre-req libraries for now, so we can
make COLO support default to on next time.






--
Thanks,
Yang.

Re: [Qemu-devel] [PATCH V4 2/5] runner: Tool for fuzz tests execution

2014-07-23 Thread Fam Zheng

On Mon, 07/21 14:18, Maria Kustova wrote:
 The purpose of the test runner is to prepare the test environment (e.g. create
 a work directory, a test image, etc), execute a program under test with
 parameters, indicate a test failure if the program was killed during the test
 execution and collect core dumps, logs and other test artifacts.
 
 The test runner doesn't depend on an image format or on the program to be tested,
 so it can be used with any external image generator and program under test.
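
As a hedged illustration of the "killed during the test execution" check described above: on POSIX, Python's subprocess module reports death by signal as a negative return code, so a runner can classify the outcome as sketched below. This is written for Python 3, unlike the Python 2 patch quoted in this message, and describe_exit is a name made up for the example.

```python
import signal


def describe_exit(returncode):
    """Classify a subprocess return code.

    A negative value means the child was terminated by that signal number.
    """
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number without a symbolic name on this platform.
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with status %d" % returncode


print(describe_exit(-signal.SIGSEGV))
print(describe_exit(1))
```

A runner would treat the "killed by" case as a test failure and keep the core dump and logs, while a plain non-zero exit may be an expected rejection of a corrupted image.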
 
 Signed-off-by: Maria Kustova mari...@catit.be

Looks good. Only two minor comments below but neither is a stopper.

 ---
  tests/image-fuzzer/runner/runner.py | 360 
 
  1 file changed, 360 insertions(+)
  create mode 100755 tests/image-fuzzer/runner/runner.py
 
 diff --git a/tests/image-fuzzer/runner/runner.py b/tests/image-fuzzer/runner/runner.py
 new file mode 100755
 index 000..3e9e65d
 --- /dev/null
 +++ b/tests/image-fuzzer/runner/runner.py
 @@ -0,0 +1,360 @@
 +#!/usr/bin/env python
 +
 +# Tool for running fuzz tests
 +#
 +# Copyright (C) 2014 Maria Kustova mari...@catit.be
 +#
 +# This program is free software: you can redistribute it and/or modify
 +# it under the terms of the GNU General Public License as published by
 +# the Free Software Foundation, either version 2 of the License, or
 +# (at your option) any later version.
 +#
 +# This program is distributed in the hope that it will be useful,
 +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 +# GNU General Public License for more details.
 +#
 +# You should have received a copy of the GNU General Public License
 +# along with this program.  If not, see http://www.gnu.org/licenses/.
 +#
 +
 +import sys, os, signal
 +import subprocess
 +import random
 +from itertools import count
 +from shutil import rmtree
 +import getopt
 +try:
 +import json
 +except ImportError:
 +try:
 +import simplejson as json
 +except ImportError:
 +print "Warning: Module for JSON processing is not found.\n" + \
 +"'--config' and '--command' options are not supported."
 +import resource
 +resource.setrlimit(resource.RLIMIT_CORE, (-1, -1))
 +
 +
 +def multilog(msg, *output):
 +""" Write an object to all of specified file descriptors
 +"""
 +
 +for fd in output:
 +fd.write(msg)
 +fd.flush()
 +
 +
 +def str_signal(sig):
 +""" Convert a numeric value of a system signal to the string one
 +defined by the current operational system
 +"""
 +
 +for k, v in signal.__dict__.items():
 +if v == sig:
 +return k
 +
 +
 +class TestException(Exception):
 +"""Exception for errors risen by TestEnv objects"""
 +pass
 +
 +
 +class TestEnv(object):
 +""" Trivial test object
 +
 +The class sets up test environment, generates backing and test images
 +and executes application under tests with specified arguments and a test
 +image provided.
 +All logs are collected.
 +Summary log will contain short descriptions and statuses of tests in
 +a run.
 +Test log will include application (e.g. 'qemu-img') logs besides info sent
 +to the summary log.
 +"""
 +
 +def __init__(self, test_id, seed, work_dir, run_log,
 + cleanup=True, log_all=False):
 +"""Set test environment in a specified work directory.
 +
 +Path to qemu-img and qemu-io will be retrieved from 'QEMU_IMG' and
 +'QEMU_IO' environment variables
 +"""
 +if seed is not None:
 +self.seed = seed
 +else:
 +self.seed = str(random.randint(0, sys.maxint))
 +random.seed(self.seed)
 +
 +self.init_path = os.getcwd()
 +self.work_dir = work_dir
 +self.current_dir = os.path.join(work_dir, 'test-' + test_id)
 +self.qemu_img = \
 +os.environ.get('QEMU_IMG', 'qemu-img')\
 +  .strip().split(' ')
 +self.qemu_io = \
 +   os.environ.get('QEMU_IO', 'qemu-io').strip().split(' ')
 +self.commands = [['qemu-img', 'check', '-f', 'qcow2', '$test_img'],
 + ['qemu-img', 'info', '-f', 'qcow2', '$test_img'],
 + ['qemu-io', '$test_img', '-c', 'read $off $len'],
 + ['qemu-io', '$test_img', '-c', 'write $off $len'],
 + ['qemu-io', '$test_img', '-c',
 +  'aio_read $off $len'],
 + ['qemu-io', '$test_img', '-c',
 +  'aio_write $off $len'],
 + ['qemu-io', '$test_img', '-c', 'flush'],
 + ['qemu-io', '$test_img', '-c',
 +  'discard $off $len'],
 + ['qemu-io', '$test_img', '-c',
 +  'truncate $off']]
 +for fmt in ['raw', 'vmdk', 'vdi',

[Qemu-devel] [Bug 1336801] Re: 12.04 guest hangs on a 14.04 host server with cirrus graphics

2014-07-23 Thread Serge Hallyn

Note that on a successful boot, dmesg | grep cirrus shows:

[9.064581] fb: conflicting fb hw usage cirrusdrmfb vs EFI VGA - removing generic driver
[9.133808] fbcon: cirrusdrmfb (fb0) is primary device
[9.431359] cirrus :00:02.0: fb0: cirrusdrmfb frame buffer device
[9.431362] cirrus :00:02.0: registered panic notifier
[9.652851] [drm] Initialized cirrus 1.0.0 20110418 for :00:02.0 on minor 0

I can also reproduce this on qemu built from upstream git head (earlier
this week) so marking as affecting the upstream project.

** Package changed: libvirt (Ubuntu) => qemu (Ubuntu)

** Also affects: qemu
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1336801

Title:
  12.04 guest hangs on a 14.04 host  server with cirrus graphics

Status in QEMU:
  New
Status in “qemu” package in Ubuntu:
  Triaged

Bug description:
  A new 12.04.4 server guest installation hangs on a 14.04 server host
  machine.

  I did the following:

  Created a new Virtual Machine with the Ubuntu 12.04 template using virt-manager
  Ran through the installation without a hitch to install a LAMP+SSH server. All standard options apart from that.
  On reboot the 12.04 guest started but then hung after doing fsck step.
  Trying different options (change disk driver, etc) made it progress a couple more steps but still hung.

  The thing that fixed it in the end was to switch to a VGA display
  driver, away from the default.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1336801/+subscriptions
