[Qemu-devel] [PULL 1/1] usb: mtp: tag root property as experimental
Reason: we don't want commit to that interface yet. Possibly the implementation will be switched over to use fsdev. Suggested-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/usb/dev-mtp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c index 1b51a90..384d4a5 100644 --- a/hw/usb/dev-mtp.c +++ b/hw/usb/dev-mtp.c @@ -1090,7 +1090,7 @@ static const VMStateDescription vmstate_usb_mtp = { }; static Property mtp_properties[] = { -DEFINE_PROP_STRING(root, MTPState, root), +DEFINE_PROP_STRING(x-root, MTPState, root), DEFINE_PROP_STRING(desc, MTPState, desc), DEFINE_PROP_END_OF_LIST(), }; -- 1.8.3.1
Re: [Qemu-devel] [PATCH for-2.1] docs: document remaining QMP events
Eric Blake ebl...@redhat.com writes: Commit dfab4892 restored this file, but did not address any of the grammar problems that had been fixed in passing when moving events out of this file. There are also a couple events that were undocumented since introduction, and one that had been added only in the time that this file was temporarily deleted. * docs/qmp/qmp-events.txt (POWERDOWN, SPICE_MIGRATE_COMPLETED) (VSERPORT_CHANGE): Add. (RESET, SPICE_INITIALIZED): Fix grammar. (SPICE_CONNECTED, SPICE_DISCONNECTED): Split. GNU ChangeLog style, unusual in QEMU commit messages. Not that I mind. Signed-off-by: Eric Blake ebl...@redhat.com --- docs/qmp/qmp-events.txt | 80 + 1 file changed, 74 insertions(+), 6 deletions(-) diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt index 4a6c2a2..78dd76a 100644 --- a/docs/qmp/qmp-events.txt +++ b/docs/qmp/qmp-events.txt @@ -243,6 +243,19 @@ Data: timestamp: { seconds: 1368697518, microseconds: 326866 } } } +POWERDOWN +- + +Emitted when the Virtual Machine is powered down through the power +control system, such as via ACPI. + +Data: None. + +Example: + +{ event: POWERDOWN, +timestamp: { seconds: 1267040730, microseconds: 682951 } } + QUORUM_FAILURE -- @@ -285,7 +298,7 @@ Example: RESET - -Emitted when the Virtual Machine is reseted. +Emitted when the Virtual Machine is reset. Data: None. @@ -325,7 +338,8 @@ Example: SHUTDOWN -Emitted when the Virtual Machine is powered down. +Emitted when the Virtual Machine has shut down, indicating that qemu +is about to exit. Data: None. @@ -337,10 +351,10 @@ Example: Note: If the command-line option -no-shutdown has been specified, a STOP event will eventually follow the SHUTDOWN event. -SPICE_CONNECTED, SPICE_DISCONNECTED +SPICE_CONNECTED +--- -Emitted when a SPICE client connects or disconnects. +Emitted when a SPICE client connects. Wording doesn't match qapi-event.json exactly. I doubt we care. Data: @@ -362,11 +376,36 @@ Example: client: {port: 52873, family: ipv4, host: 127.0.0.1} }} +SPICE_DISCONNECTED +-- + +Emitted when a SPICE client disconnects. + +Data: + +- server: Server information (json-object) + - host: IP address (json-string) + - port: port number (json-string) + - family: address family (json-string, ipv4 or ipv6) +- client: Client information (json-object) + - host: IP address (json-string) + - port: port number (json-string) + - family: address family (json-string, ipv4 or ipv6) + +Example: + +{ timestamp: {seconds: 1290688046, microseconds: 388707}, + event: SPICE_DISCONNECTED, + data: { +server: { port: 5920, family: ipv4, host: 127.0.0.1}, +client: {port: 52873, family: ipv4, host: 127.0.0.1} +}} + SPICE_INITIALIZED - Emitted after initial handshake and authentication takes place (if any) -and the SPICE channel is up'n'running +and the SPICE channel is up and running Data: @@ -399,6 +438,19 @@ Example: channel-id: 0, tls: true} }} +SPICE_INITIALIZED Another SPICE_INITIALIZED? Do you mean SPICE_MIGRATE_COMPLETED? +- + +Emitted when SPICE migration has completed + +Data: None. + +Example: + +{ timestamp: {seconds: 1290688046, microseconds: 417172}, + event: SPICE_MIGRATE_COMPLETED } + + STOP @@ -527,6 +579,22 @@ Example: host: 127.0.0.1, sasl_username: luiz } }, timestamp: { seconds: 1263475302, microseconds: 150772 } } +VSERPORT_CHANGE +--- + +Emitted when the guest opens or closes a virtio-serial port. + +Data: + +- id: device identifier of the virtio-serial port (json-string) +- open: true if the guest has opened the virtio-serial port (json-bool) + +Example: + +{ event: VSERPORT_CHANGE, +data: { id: channel0, open: true }, +timestamp: { seconds: 1401385907, microseconds: 422329 } } + WAKEUP -- Assuming you do mean SPICE_MIGRATE_COMPLETED: list is complete now. Would you mind splitting this patch? * Either one patch per undocumented event (if you want to be nice to downstreams cherry-picking events), or one patch for all of them. * One patch for the rest. Or if you feel generous, two: one for the grammar fixes, one for the spice split.
Re: [Qemu-devel] [PATCH] pci: Don't deliver MSI/MSI-X messages if bus master support is off
On 2014-07-22 21:06, Michael S. Tsirkin wrote: On Mon, Jul 21, 2014 at 12:04:22AM +0200, Jan Kiszka wrote: On 2014-07-20 23:03, Michael S. Tsirkin wrote: On Sun, Jul 20, 2014 at 11:45:10PM +0200, Jan Kiszka wrote: On 2014-07-20 21:48, Michael S. Tsirkin wrote: On Sat, Jul 19, 2014 at 06:55:48PM +0200, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com The spec says (and real HW confirms this) that, if the bus master bit is 0, the device will not generate any PCI accesses. MSI and MSI-X messages fall among these. Signed-off-by: Jan Kiszka jan.kis...@siemens.com I guess an alternative is for callers to check before invoking msi_notify. Please note is this is only option when using e.g. irqfd, so this has some advantages. Is there a specific device that is affected by this? I would expect drivers to disable msi before clearing bus master bit ... This is about emulating conforming behaviour without touching each and every device. I stumbled over this while playing with emulated vs. real Intel HDA. Right so that's my question. How did you hit it? With a custom driver? So to say: with a hand full lines of code to tickle some MSI event out of that device for testing purposes. Doesn't regulat driver disable MSI? Sure. This is not fixing a regular's driver problem. It's a behavioral correction for faulty corner cases. Jan OK based on this I think this is not 2.1 material. Agree? Agree. I'll look into Paolo's suggestion how to model this asap. Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation
Il 22/07/2014 17:47, Le Tan ha scritto: +static inline void define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val, +uint64_t wmask, uint64_t w1cmask) +{ +*((uint64_t *)s-csr[addr]) = val; All these casts are not endian-safe. Please use ldl_le_p, ldq_le_p, stl_le_p, stq_le_p. +*((uint64_t *)s-wmask[addr]) = wmask; +*((uint64_t *)s-w1cmask[addr]) = w1cmask; +} + +static inline void define_quad_wo(IntelIOMMUState *s, hwaddr addr, + uint64_t mask) +{ +*((uint64_t *)s-womask[addr]) = mask; +} + +static inline void define_long(IntelIOMMUState *s, hwaddr addr, uint32_t val, +uint32_t wmask, uint32_t w1cmask) +{ +*((uint32_t *)s-csr[addr]) = val; +*((uint32_t *)s-wmask[addr]) = wmask; +*((uint32_t *)s-w1cmask[addr]) = w1cmask; +} + +static inline void define_long_wo(IntelIOMMUState *s, hwaddr addr, + uint32_t mask) +{ +*((uint32_t *)s-womask[addr]) = mask; +} + +/* External get/set operations */ +static inline void set_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val) +{ +uint64_t oldval = *((uint64_t *)s-csr[addr]); +uint64_t wmask = *((uint64_t *)s-wmask[addr]); +uint64_t w1cmask = *((uint64_t *)s-w1cmask[addr]); +*((uint64_t *)s-csr[addr]) = +((oldval ~wmask) | (val wmask)) ~(w1cmask val); +} + +static inline void set_long(IntelIOMMUState *s, hwaddr addr, uint32_t val) +{ +uint32_t oldval = *((uint32_t *)s-csr[addr]); +uint32_t wmask = *((uint32_t *)s-wmask[addr]); +uint32_t w1cmask = *((uint32_t *)s-w1cmask[addr]); +*((uint32_t *)s-csr[addr]) = +((oldval ~wmask) | (val wmask)) ~(w1cmask val); +} + +static inline uint64_t get_quad(IntelIOMMUState *s, hwaddr addr) +{ +uint64_t val = *((uint64_t *)s-csr[addr]); +uint64_t womask = *((uint64_t *)s-womask[addr]); +return val ~womask; +} + + +static inline uint32_t get_long(IntelIOMMUState *s, hwaddr addr) +{ +uint32_t val = *((uint32_t *)s-csr[addr]); +uint32_t womask = *((uint32_t *)s-womask[addr]); +return val ~womask; +} + + + +/* Internal get/set operations */ +static inline uint64_t __get_quad(IntelIOMMUState *s, hwaddr addr) get_quad_raw? +{ +return *((uint64_t *)s-csr[addr]); +} + +static inline uint32_t __get_long(IntelIOMMUState *s, hwaddr addr) get_long_raw? +{ +return *((uint32_t *)s-csr[addr]); +} + + +/* val = (val ~clear) | mask */ +static inline uint32_t set_mask_long(IntelIOMMUState *s, hwaddr addr, set_clear_long? + uint32_t clear, uint32_t mask) +{ +uint32_t *ptr = (uint32_t *)s-csr[addr]; +uint32_t val = (*ptr ~clear) | mask; +*ptr = val; +return val; +} + +/* val = (val ~clear) | mask */ +static inline uint64_t set_mask_quad(IntelIOMMUState *s, hwaddr addr, set_clear_quad? + uint64_t clear, uint64_t mask) +{ +uint64_t *ptr = (uint64_t *)s-csr[addr]; +uint64_t val = (*ptr ~clear) | mask; +*ptr = val; +return val; +} + +
Re: [Qemu-devel] [PATCH 1/2] qemu-img: Allow source cache mode specification
Am 22.07.2014 um 22:06 hat Max Reitz geschrieben: On 21.07.2014 17:52, Eric Blake wrote: On 07/19/2014 02:35 PM, Max Reitz wrote: Many qemu-img subcommands only read the source file(s) once. For these use cases, a full write-back cache is unnecessary and mainly clutters host cache memory. Though this is generally no concern as cache memory is freely available and can be scaled by the host OS, it may become a concern with thin provisioning. For these cases, it makes sense to allow users to freely specify the source cache mode (e.g. use no cache at all). This commit adds a new switch (-T) for the qemu-img subcommands check, compare, convert and rebase to specify the cache to be used for source images (the backing file in case of rebase). What mnemonic did you have in mind when choosing -T? Or was it just a universally available letter for the subcommands you were touching? To be honest, I just didn't know what -t stands for. Therefore I just thought it might be remotely logical if the lower-cased letter is used for destination and the upper-cased letter for source caching. Things might get a bit confusing there, though, because upper-case often means the other image, i.e. destination or backing file, in other commands (create -F, compare -F, convert -O and -B, rebase -F). Of course, most of them are deprecated, so I wouldn't make that a reason to block this series, but perhaps we should consider using more long options instead of randomly assigning the letters that are still free. Kevin
[Qemu-devel] [PATCH] icount: Fix virtual clock start value on ARM
When using the icount option on ARM, the virtual clock starts counting at realtime clock but it should start at 0. This small fix addresses this issue. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr --- cpus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cpus.c b/cpus.c index 5e7f2cf..de18ece 100644 --- a/cpus.c +++ b/cpus.c @@ -104,7 +104,7 @@ static bool all_cpu_threads_idle(void) /* Compensate for varying guest execution speed. */ static int64_t qemu_icount_bias; -static int64_t vm_clock_warp_start; +static int64_t vm_clock_warp_start = -1; /* Conversion factor from emulated instructions to virtual clock ticks. */ static int icount_time_shift; /* Arbitrarily pick 1MIPS as the minimum allowable speed. */ -- 2.0.0.rc2
Re: [Qemu-devel] [PATCH] icount: Fix virtual clock start value on ARM
Il 23/07/2014 11:11, Sebastian Tanase ha scritto: When using the icount option on ARM, the virtual clock starts counting at realtime clock but it should start at 0. This small fix addresses this issue. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Thanks, this is ok for 2.2. Paolo --- cpus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cpus.c b/cpus.c index 5e7f2cf..de18ece 100644 --- a/cpus.c +++ b/cpus.c @@ -104,7 +104,7 @@ static bool all_cpu_threads_idle(void) /* Compensate for varying guest execution speed. */ static int64_t qemu_icount_bias; -static int64_t vm_clock_warp_start; +static int64_t vm_clock_warp_start = -1; /* Conversion factor from emulated instructions to virtual clock ticks. */ static int icount_time_shift; /* Arbitrarily pick 1MIPS as the minimum allowable speed. */
Re: [Qemu-devel] [PATCH] icount: Fix virtual clock start value on ARM
Am 23.07.2014 11:16, schrieb Paolo Bonzini: Il 23/07/2014 11:11, Sebastian Tanase ha scritto: When using the icount option on ARM, the virtual clock starts counting at realtime clock but it should start at 0. This small fix addresses this issue. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Thanks, this is ok for 2.2. Could we get an explanation (in the commit message) of why this fixes that issue? :) By my reading -1 != 0. Thanks, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH] icount: Fix virtual clock start value on ARM
On 23 July 2014 10:11, Sebastian Tanase sebastian.tan...@openwide.fr wrote: When using the icount option on ARM, the virtual clock starts counting at realtime clock but it should start at 0. This small fix addresses this issue. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr --- cpus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cpus.c b/cpus.c index 5e7f2cf..de18ece 100644 --- a/cpus.c +++ b/cpus.c @@ -104,7 +104,7 @@ static bool all_cpu_threads_idle(void) /* Compensate for varying guest execution speed. */ static int64_t qemu_icount_bias; -static int64_t vm_clock_warp_start; +static int64_t vm_clock_warp_start = -1; /* Conversion factor from emulated instructions to virtual clock ticks. */ static int icount_time_shift; /* Arbitrarily pick 1MIPS as the minimum allowable speed. */ Commit message says this is fixing an ARM bug but this is a generic file. Is this actually a bug with wider scope than just ARM? thanks -- PMM
Re: [Qemu-devel] [PATCH] icount: Fix virtual clock start value on ARM
Il 23/07/2014 11:41, Peter Maydell ha scritto: On 23 July 2014 10:11, Sebastian Tanase sebastian.tan...@openwide.fr wrote: When using the icount option on ARM, the virtual clock starts counting at realtime clock but it should start at 0. This small fix addresses this issue. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr --- cpus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cpus.c b/cpus.c index 5e7f2cf..de18ece 100644 --- a/cpus.c +++ b/cpus.c @@ -104,7 +104,7 @@ static bool all_cpu_threads_idle(void) /* Compensate for varying guest execution speed. */ static int64_t qemu_icount_bias; -static int64_t vm_clock_warp_start; +static int64_t vm_clock_warp_start = -1; /* Conversion factor from emulated instructions to virtual clock ticks. */ static int icount_time_shift; /* Arbitrarily pick 1MIPS as the minimum allowable speed. */ Commit message says this is fixing an ARM bug but this is a generic file. Is this actually a bug with wider scope than just ARM? Yes, see the discussion yesterday under Re: [RFC PATCH V4 6/6] monitor: Add drift info to 'info jit' and Re: [RFC PATCH V4 0/6] icount: Implement delay algorithm between guest and host clocks. Paolo
[Qemu-devel] [PATCH v2] icount: Fix virtual clock start value on ARM
When using the icount option on ARM, the virtual clock starts counting at realtime clock but it should start at 0. The reason why the virtual clock starts at realtime clock is because the first time we call qemu_clock_warp (which calls icount_warp_rt) in tcg_exec_all, qemu_icount_bias (which is part of the virtual time computation mechanism) will increment by realtime - vm_clock_warp_start, with vm_clock_warp_start being 0 (see icount_warp_rt in cpus.c). By changing the value of vm_clock_warp_start from 0 to -1, the first time we call qemu_clock_warp which calls icount_warp_rt, we will return immediatly because icount_warp_rt first checks if vm_clock_warp_start is -1 and if it's the case it returns. Therefore, qemu_icount_bias will first be incremented by the value of a virtual timer deadline when the virtual cpu goes from active to inactive. The virtual time will start at 0 and increment based on the instruction counter when the vcpu is active or the qemu_icount_bias value when inactive. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr --- cpus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cpus.c b/cpus.c index 5e7f2cf..de18ece 100644 --- a/cpus.c +++ b/cpus.c @@ -104,7 +104,7 @@ static bool all_cpu_threads_idle(void) /* Compensate for varying guest execution speed. */ static int64_t qemu_icount_bias; -static int64_t vm_clock_warp_start; +static int64_t vm_clock_warp_start = -1; /* Conversion factor from emulated instructions to virtual clock ticks. */ static int icount_time_shift; /* Arbitrarily pick 1MIPS as the minimum allowable speed. */ -- 2.0.0.rc2
Re: [Qemu-devel] [RFC v4 03/13] hw/vfio/pci: Remove unneeded include files
On 07/08/2014 08:55 PM, Alex Williamson wrote: On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote: Signed-off-by: Eric Auger eric.au...@linaro.org --- hw/vfio/pci.c | 12 1 file changed, 12 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 5c7bfd5..a7df3de 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -18,26 +18,14 @@ * Copyright (C) 2008, IBM, Muli Ben-Yehuda (m...@il.ibm.com) */ -#include dirent.h #include linux/vfio.h #include sys/ioctl.h #include sys/mman.h -#include sys/stat.h -#include sys/types.h -#include unistd.h - -#include config.h #include exec/address-spaces.h -#include exec/memory.h #include hw/pci/msi.h #include hw/pci/msix.h -#include hw/pci/pci.h -#include qemu-common.h #include qemu/error-report.h -#include qemu/event_notifier.h -#include qemu/queue.h #include qemu/range.h -#include sysemu/kvm.h #include sysemu/sysemu.h #include hw/vfio/vfio.h Was this just a remove and see if it still compiles exercise? I'm not sure I'm a fan of removing includes that are arbitrarily included via another include chain. Thanks, Hi Alex. Sorry for the delay, coming back from vacation period... Yes it was a lazy way to sort things out for PCI/platform split. Then I will drop that patch file. Besides, some system includes might be removed thanks to the inclusion of qemu-common.h, which sounds stable/reliable? dirent.h as well? Anyway it does not help in any way for my matters. Best Regards Eric Alex
Re: [Qemu-devel] [RFC v4 04/13] hw/vfio/pci: introduce VFIODevice
On 23 July 2014 11:02, Eric Auger eric.au...@linaro.org wrote: On 07/09/2014 12:41 AM, Alex Williamson wrote: On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote: +vdev-vbasedev.ops = vfio_pci_ops; + +vdev-vbasedev.type = VFIO_DEVICE_TYPE_PCI; +vdev-vbasedev.name = g_malloc0(PATH_MAX); +snprintf(vdev-vbasedev.name, PATH_MAX, %04x:%02x:%02x.%01x, +vdev-host.domain, vdev-host.bus, vdev-host.slot, +vdev-host.function); + asprintf(3)? This is a deterministic length, so PATH_MAX is especially ridiculous. agreed, will use asprintf instead. A minor nit given this is going to be in only on Linux code, but we generally prefer g_strdup_printf() over raw asprintf() (they do the same thing, but the glib function is guaranteed to be present everywhere, and the returned memory is freeable with g_free() like most of our strings, rather than needing to remember that it needs to be freed via free().) thanks -- PMM
Re: [Qemu-devel] [PATCH v2 2/3] tap-bsd: implement a FreeBSD only version of tap_open
On Tue, Jul 22, 2014 at 01:26:00PM +0100, Stefano Stabellini wrote: On Tue, 22 Jul 2014, Roger Pau Monné wrote: On 27/05/14 15:29, Stefan Hajnoczi wrote: On Fri, May 23, 2014 at 05:57:48PM +0200, Roger Pau Monne wrote: The current behaviour of tap_open for BSD systems differ greatly from it's Linux counterpart. Since FreeBSD supports interface renaming and tap device cloning by opening /dev/tap, implement a FreeBSD specific version of tap_open that behaves like it's Linux counterpart. This is specially important for toolstacks that use Qemu (like Xen libxl), in order to have a unified behaviour across suported platforms. Signed-off-by: Roger Pau Monné roger@citrix.com Cc: xen-de...@lists.xenproject.org Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Anthony Liguori aligu...@us.ibm.com Cc: Stefan Hajnoczi stefa...@redhat.com --- net/tap-bsd.c | 70 - 1 files changed, 69 insertions(+), 1 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@redhat.com I still don't see this committed to the repository, should I ping someone? I was assuming that this patch would go via some other tree. But if Stefan is OK I could pick it up and submit a pull request for both patch 1 and 2 of this series. I'm fine with that. I only reviewed this patch since it affects the net subsystem and left the series as a whole for someone to merge. Stefan pgp6LKGK11rH8.pgp Description: PGP signature
[Qemu-devel] Fix a bug in debug printing of memory translation tables
Hi, I've enabled DEBUG_MMAP in linux-user/mmap.c and got debug info of memory layout. This is the debug output of guest memory layout from qemu (including the last mmap call marked with *). mmap: start=0x0804a000 len=0x00021000 prot=rw- flags=MAP_ANON MAP_PRIVATE fd=0 offset= ret=0x0804a000 startend size prot 00048000-00049000 1000 r-x * 00049000-0006b000 00022000 rw- 002f6400-002f7400 1000 rw- 002f7400-003ff400 00108000 r-x 003ff400-003ff400 r-- 003ff400-003f6400 7000 rw- 003fe400-003ff400 1000 rw- 003ff400-003ff400 r-x 003ff400-003fe400 f000 r-- 003fe400-003ff400 1000 rw- 003ff400-000f6800 ffcf7400 --- 000f6800-000f7000 0800 rw- It looks completely insane with weird records where the start is bigger than the end, the size is likely negative and in general all addresses are in wrong boundaries. Found a bug in the function which textualize memory translation tables. Made a fix. Now I have the following output: mmap: start=0x0804a000 len=0x00021000 prot=rw- flags=MAP_ANON MAP_PRIVATE fd=0 offset= ret=0x0804a000 startend size prot 08048000-08049000 1000 r-x * 08049000-0806b000 00022000 rw- f6612000-f6615000 3000 rw- f6615000-f67bb000 001a6000 r-x f67bb000-f67bd000 2000 r-- f67bd000-f67c2000 5000 rw- f67da000-f67dd000 3000 rw- f67dd000-f67fd000 0002 r-x f67fd000-f67fe000 1000 r-- f67fe000-f67ff000 1000 rw- f67ff000-f680 1000 --- This looks much better. From 297045c6e7da0089c6ea4ee271000c507c5a8bf8 Mon Sep 17 00:00:00 2001 From: Mikhail Ilyin m.i...@samsung.com Date: Wed, 23 Jul 2014 13:06:15 +0400 Subject: [PATCH] Fix a bug in debug printing of memory translation tables. Signed-off-by: Mikhail Ilyin m.i...@samsung.com --- translate-all.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/translate-all.c b/translate-all.c index 8f7e11b..cb7a33d 100644 --- a/translate-all.c +++ b/translate-all.c @@ -1728,9 +1728,8 @@ int walk_memory_regions(void *priv, walk_memory_regions_fn fn) data.prot = 0; for (i = 0; i V_L1_SIZE; i++) { -int rc = walk_memory_regions_1(data, (abi_ulong)i V_L1_SHIFT, +int rc = walk_memory_regions_1(data, (abi_ulong)i (V_L1_SHIFT + TARGET_PAGE_BITS), V_L1_SHIFT / V_L2_BITS - 1, l1_map + i); - if (rc != 0) { return rc; } -- 1.9.1
[Qemu-devel] QEMU and other libusb application cause segfaults in libusb
Hi all, I post this to both QEMU and libusb because I'm not sure where the error could be located. I have an application using libusb which is running for months without any issues on the host system. When I start QEMU - which uses libusb, too - the errors begin. I route some USB ports to my QEMU-KVM guest but NOT the port where my hardware is attached. From time to time I start receiving things from my own application that are never sent by my USB hardware, it seems to be more a heap of memory coming from somewhere else. And sometimes the libusb gets segfaulted in libusb_handle_events_timeout_completed(). As long as my application is running alone everything is fine. And there is no dmesg output that processes are fighting for my device. When I exit QEMU early enough, the application stays alive and remains stable. Any hints how to prevent that would be appreciated. My system is an i5 CPU with a vanilla kernel 3.4.67. Thanks. Best regards, Erik
Re: [Qemu-devel] [RFC v4 04/13] hw/vfio/pci: introduce VFIODevice
On 07/23/2014 12:24 PM, Peter Maydell wrote: On 23 July 2014 11:02, Eric Auger eric.au...@linaro.org wrote: On 07/09/2014 12:41 AM, Alex Williamson wrote: On Mon, 2014-07-07 at 13:27 +0100, Eric Auger wrote: +vdev-vbasedev.ops = vfio_pci_ops; + +vdev-vbasedev.type = VFIO_DEVICE_TYPE_PCI; +vdev-vbasedev.name = g_malloc0(PATH_MAX); +snprintf(vdev-vbasedev.name, PATH_MAX, %04x:%02x:%02x.%01x, +vdev-host.domain, vdev-host.bus, vdev-host.slot, +vdev-host.function); + asprintf(3)? This is a deterministic length, so PATH_MAX is especially ridiculous. agreed, will use asprintf instead. A minor nit given this is going to be in only on Linux code, but we generally prefer g_strdup_printf() over raw asprintf() (they do the same thing, but the glib function is guaranteed to be present everywhere, and the returned memory is freeable with g_free() like most of our strings, rather than needing to remember that it needs to be freed via free().) Hi Peter, thanks. this is noted. BR Eric thanks -- PMM
Re: [Qemu-devel] [PATCH for-2.1] docs: document remaining QMP events
On 07/23/2014 01:25 AM, Markus Armbruster wrote: Eric Blake ebl...@redhat.com writes: Commit dfab4892 restored this file, but did not address any of the grammar problems that had been fixed in passing when moving events out of this file. There are also a couple events that were undocumented since introduction, and one that had been added only in the time that this file was temporarily deleted. -SPICE_CONNECTED, SPICE_DISCONNECTED +SPICE_CONNECTED +--- -Emitted when a SPICE client connects or disconnects. +Emitted when a SPICE client connects. Wording doesn't match qapi-event.json exactly. I doubt we care. Not the only place where they don't match. And I personally don't care :) +SPICE_INITIALIZED Another SPICE_INITIALIZED? Do you mean SPICE_MIGRATE_COMPLETED? Copy-and-paste strikes again. Yes, I'll fix that. Assuming you do mean SPICE_MIGRATE_COMPLETED: list is complete now. Would you mind splitting this patch? * Either one patch per undocumented event (if you want to be nice to downstreams cherry-picking events), or one patch for all of them. * One patch for the rest. Or if you feel generous, two: one for the grammar fixes, one for the spice split. v2 coming up as a full series. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH] target-i386/FPU: wrong conversion infinity from float80 to int32/int64
14.07.2014, 18:59, Peter Maydell peter.mayd...@linaro.org: Since softfloat's status flags are sticky ... What does it mean?
[Qemu-devel] [PATCH v3] docs/multiple-iothreads.txt: add documentation on IOThread programming
This document explains how IOThreads and the main loop are related, especially how to write code that can run in an IOThread. Currently only virtio-blk-data-plane uses these techniques. The next obvious target is virtio-scsi; there has also been work on virtio-net. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- docs/multiple-iothreads.txt | 134 1 file changed, 134 insertions(+) create mode 100644 docs/multiple-iothreads.txt diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt new file mode 100644 index 000..40b8419 --- /dev/null +++ b/docs/multiple-iothreads.txt @@ -0,0 +1,134 @@ +Copyright (c) 2014 Red Hat Inc. + +This work is licensed under the terms of the GNU GPL, version 2 or later. See +the COPYING file in the top-level directory. + + +This document explains the IOThread feature and how to write code that runs +outside the QEMU global mutex. + +The main loop and IOThreads +--- +QEMU is an event-driven program that can do several things at once using an +event loop. The VNC server and the QMP monitor are both processed from the +same event loop, which monitors their file descriptors until they become +readable and then invokes a callback. + +The default event loop is called the main loop (see main-loop.c). It is +possible to create additional event loop threads using -object +iothread,id=my-iothread. + +Side note: The main loop and IOThread are both event loops but their code is +not shared completely. Sometimes it is useful to remember that although they +are conceptually similar they are currently not interchangeable. + +Why IOThreads are useful + +IOThreads allow the user to control the placement of work. The main loop is a +scalability bottleneck on hosts with many CPUs. Work can be spread across +several IOThreads instead of just one main loop. When set up correctly this +can improve I/O latency and reduce jitter seen by the guest. + +The main loop is also deeply associated with the QEMU global mutex, which is a +scalability bottleneck in itself. vCPU threads and the main loop use the QEMU +global mutex to serialize execution of QEMU code. This mutex is necessary +because a lot of QEMU's code historically was not thread-safe. + +The fact that all I/O processing is done in a single main loop and that the +QEMU global mutex is contended by all vCPU threads and the main loop explain +why it is desirable to place work into IOThreads. + +The experimental virtio-blk data-plane implementation has been benchmarked and +shows these effects: +ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf + +How to program for IOThreads + +The main difference between legacy code and new code that can run in an +IOThread is dealing explicitly with the event loop object, AioContext +(see include/block/aio.h). Code that only works in the main loop +implicitly uses the main loop's AioContext. Code that supports running +in IOThreads must be aware of its AioContext. + +AioContext supports the following services: + * File descriptor monitoring (read/write/error on POSIX hosts) + * Event notifiers (inter-thread signalling) + * Timers + * Bottom Halves (BH) deferred callbacks + +There are several old APIs that use the main loop AioContext: + * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor + * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier + * LEGACY timer_new_ms() - create a timer + * LEGACY qemu_bh_new() - create a BH + * LEGACY qemu_aio_wait() - run an event loop iteration + +Since they implicitly work on the main loop they cannot be used in code that +runs in an IOThread. They might cause a crash or deadlock if called from an +IOThread since the QEMU global mutex is not held. + +Instead, use the AioContext functions directly (see include/block/aio.h): + * aio_set_fd_handler() - monitor a file descriptor + * aio_set_event_notifier() - monitor an event notifier + * aio_timer_new() - create a timer + * aio_bh_new() - create a BH + * aio_poll() - run an event loop iteration + +The AioContext can be obtained from the IOThread using +iothread_get_aio_context() or for the main loop using qemu_get_aio_context(). +Code that takes an AioContext argument works both in IOThreads or the main +loop, depending on which AioContext instance the caller passes in. + +How to synchronize with an IOThread +--- +AioContext is not thread-safe so some rules must be followed when using file +descriptors, event notifiers, timers, or BHs across threads: + +1. AioContext functions can be called safely from file descriptor, event +notifier, timer, or BH callbacks invoked by the AioContext. No locking is +necessary. + +2. Other threads wishing to access the AioContext must use +aio_context_acquire()/aio_context_release() for mutual exclusion. Once the +context is acquired no other thread can
[Qemu-devel] [PATCH v2 for-2.1 0/5] docs: document remaining QMP events
diff from v1: split into series [Markus] fix SPICE_MIGRATE_COMPLETE typo [Markus] Eric Blake (5): docs: grammar fixes to qmp-events docs: split SPICE_* event docs docs: document missing SPICE_MIGRATE_COMPLETED event docs: document missing POWERDOWN event docs: document missing VSERPORT_CHANGE event docs/qmp/qmp-events.txt | 80 + 1 file changed, 74 insertions(+), 6 deletions(-) -- 1.9.3
[Qemu-devel] [PATCH v2 for-2.1 3/5] docs: document missing SPICE_MIGRATE_COMPLETED event
The SPICE_MIGRATE_COMPLETED event was first documented in 7cfadb6b. But since dfab4892 later restored this flie to the state prior to qmp events, and we never documented it in the past, anyone using this file instead of qapi will miss out on this event. * docs/qmp/qmp-events.txt (SPICE_MIGRATE_COMPLETED): Add. Signed-off-by: Eric Blake ebl...@redhat.com --- docs/qmp/qmp-events.txt | 13 + 1 file changed, 13 insertions(+) diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt index 9b7ee7c..22d552f 100644 --- a/docs/qmp/qmp-events.txt +++ b/docs/qmp/qmp-events.txt @@ -424,6 +424,19 @@ Example: channel-id: 0, tls: true} }} +SPICE_MIGRATE_COMPLETED +--- + +Emitted when SPICE migration has completed + +Data: None. + +Example: + +{ timestamp: {seconds: 1290688046, microseconds: 417172}, + event: SPICE_MIGRATE_COMPLETED } + + STOP -- 1.9.3
[Qemu-devel] [PATCH v2 for-2.1 4/5] docs: document missing POWERDOWN event
The POWERDOWN event was first documented in 0aab9ec3. But since dfab4892 later restored this file to the state prior to qmp events, and we never documented it in the past, anyone using this file instead of qapi will miss out on this event. Tweak the existing wording of SHUTDOWN to match 84321831, and make the difference between the two events apparent. * docs/qmp/qmp-events.txt (POWERDOWN): Add. (SHUTDOWN): Tweak. Signed-off-by: Eric Blake ebl...@redhat.com --- docs/qmp/qmp-events.txt | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt index 22d552f..9d7439e 100644 --- a/docs/qmp/qmp-events.txt +++ b/docs/qmp/qmp-events.txt @@ -243,6 +243,19 @@ Data: timestamp: { seconds: 1368697518, microseconds: 326866 } } } +POWERDOWN +- + +Emitted when the Virtual Machine is powered down through the power +control system, such as via ACPI. + +Data: None. + +Example: + +{ event: POWERDOWN, +timestamp: { seconds: 1267040730, microseconds: 682951 } } + QUORUM_FAILURE -- @@ -325,7 +338,8 @@ Example: SHUTDOWN -Emitted when the Virtual Machine is powered down. +Emitted when the Virtual Machine has shut down, indicating that qemu +is about to exit. Data: None. -- 1.9.3
[Qemu-devel] [PATCH v2 for-2.1 5/5] docs: document missing VSERPORT_CHANGE event
The VSERPORT_CHANGE event was added in e2ae6159. The patch for this event was prepared at a time when this file was gone, even though it got applied immediately after dfab4892 restored this file. Duplicate the documentation into this file, so that anyone using this file instead of qapi will not miss out on this new event. * docs/qmp/qmp-events.txt (VSERPORT_CHANGE): Add. Signed-off-by: Eric Blake ebl...@redhat.com --- docs/qmp/qmp-events.txt | 16 1 file changed, 16 insertions(+) diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt index 9d7439e..d759d19 100644 --- a/docs/qmp/qmp-events.txt +++ b/docs/qmp/qmp-events.txt @@ -579,6 +579,22 @@ Example: host: 127.0.0.1, sasl_username: luiz } }, timestamp: { seconds: 1263475302, microseconds: 150772 } } +VSERPORT_CHANGE +--- + +Emitted when the guest opens or closes a virtio-serial port. + +Data: + +- id: device identifier of the virtio-serial port (json-string) +- open: true if the guest has opened the virtio-serial port (json-bool) + +Example: + +{ event: VSERPORT_CHANGE, +data: { id: channel0, open: true }, +timestamp: { seconds: 1401385907, microseconds: 422329 } } + WAKEUP -- -- 1.9.3
[Qemu-devel] [PATCH v2 for-2.1 1/5] docs: grammar fixes to qmp-events
When converting to qmp events, commits 7cfadb6b and a6330785 fixed some grammar as part of moving text between files. But since dfab4892 later restored this file to the state prior to qmp events, we have to do it again. * docs/qmp/qmp-events.txt (RESET, SPICE_INITIALIZED): Tweak. Signed-off-by: Eric Blake ebl...@redhat.com --- docs/qmp/qmp-events.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt index 4a6c2a2..524eadf 100644 --- a/docs/qmp/qmp-events.txt +++ b/docs/qmp/qmp-events.txt @@ -285,7 +285,7 @@ Example: RESET - -Emitted when the Virtual Machine is reseted. +Emitted when the Virtual Machine is reset. Data: None. @@ -366,7 +366,7 @@ SPICE_INITIALIZED - Emitted after initial handshake and authentication takes place (if any) -and the SPICE channel is up'n'running +and the SPICE channel is up and running Data: -- 1.9.3
[Qemu-devel] [PATCH v2 for-2.1 2/5] docs: split SPICE_* event docs
For consistency with the rest of this file, every event should be listed in isolation. Compare how commit 7cfadb6b split SPICE_CONNECTED and SPICE_DISCONNECTED into separate qmp events. * docs/qmp/qmp-events.txt (SPICE_CONNECTED, SPICE_DISCONNECTED): Split. Signed-off-by: Eric Blake ebl...@redhat.com --- docs/qmp/qmp-events.txt | 31 --- 1 file changed, 28 insertions(+), 3 deletions(-) diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt index 524eadf..9b7ee7c 100644 --- a/docs/qmp/qmp-events.txt +++ b/docs/qmp/qmp-events.txt @@ -337,10 +337,10 @@ Example: Note: If the command-line option -no-shutdown has been specified, a STOP event will eventually follow the SHUTDOWN event. -SPICE_CONNECTED, SPICE_DISCONNECTED +SPICE_CONNECTED +--- -Emitted when a SPICE client connects or disconnects. +Emitted when a SPICE client connects. Data: @@ -362,6 +362,31 @@ Example: client: {port: 52873, family: ipv4, host: 127.0.0.1} }} +SPICE_DISCONNECTED +-- + +Emitted when a SPICE client disconnects. + +Data: + +- server: Server information (json-object) + - host: IP address (json-string) + - port: port number (json-string) + - family: address family (json-string, ipv4 or ipv6) +- client: Client information (json-object) + - host: IP address (json-string) + - port: port number (json-string) + - family: address family (json-string, ipv4 or ipv6) + +Example: + +{ timestamp: {seconds: 1290688046, microseconds: 388707}, + event: SPICE_DISCONNECTED, + data: { +server: { port: 5920, family: ipv4, host: 127.0.0.1}, +client: {port: 52873, family: ipv4, host: 127.0.0.1} +}} + SPICE_INITIALIZED - -- 1.9.3
Re: [Qemu-devel] [PATCH v3] docs/multiple-iothreads.txt: add documentation on IOThread programming
On 07/23/2014 05:55 AM, Stefan Hajnoczi wrote: This document explains how IOThreads and the main loop are related, especially how to write code that can run in an IOThread. Currently only virtio-blk-data-plane uses these techniques. The next obvious target is virtio-scsi; there has also been work on virtio-net. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- Would have been nice to explain the diff to v2... docs/multiple-iothreads.txt | 134 1 file changed, 134 insertions(+) create mode 100644 docs/multiple-iothreads.txt Reviewed-by: Eric Blake ebl...@redhat.com -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH] target-i386/FPU: wrong conversion infinity from float80 to int32/int64
On 23 July 2014 12:55, Dmitry Poletaev poletaev-q...@yandex.ru wrote: 14.07.2014, 18:59, Peter Maydell peter.mayd...@linaro.org: Since softfloat's status flags are sticky ... What does it mean? Sticky here means that the status flags accumulate the status from a sequence of operations: a softfloat function will set the flag if the relevant exception occurred, but if the exceptional condition did not happen then the flag will be left at whatever its preceding value was. So you can't just say if the flag is set then the last operation I did set it, because it might have been set by some operation before that. (That is, once a bit gets set in the flags word it sticks and doesn't go away.) This matches the IEEE mandated behaviour for floating point exception flags, which is why we do it. thanks -- PMM
[Qemu-devel] [RFC 3/3] QMP: extend BLOCK_IO_ERROR event with no-space indicator
Management software, such as OpenStack and RHEV's vdsm, want to be able to allocate disk space on demand. The basic use case is to start a VM with a small disk and then the disk is enlarged when QEMU hits a ENOSPC condition. To this end, the management software has to be notified when QEMU encounters ENOSPC. The solution implemented by this commit is simple: it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true when QEMU is stopped due to ENOSPC. Note that support for quering this event is already present in query-block by means of the 'io-status' key and that the new 'nospace' BLOCK_IO_ERROR field shares the same semantics with 'io-status', which basically means that werror= has to be set to either 'stop' or 'enospc'. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- block.c | 22 ++ qapi/block-core.json | 7 ++- 2 files changed, 20 insertions(+), 9 deletions(-) diff --git a/block.c b/block.c index 8cf519b..566ef56 100644 --- a/block.c +++ b/block.c @@ -3596,6 +3596,18 @@ BlockErrorAction bdrv_get_error_action(BlockDriverState *bs, bool is_read, int e } } +static void send_qmp_error_event(BlockDriverState *bs, + BlockErrorAction action, + bool is_read, int error) +{ +BlockErrorAction ac; + +ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE; +qapi_event_send_block_io_error(bdrv_get_device_name(bs), ac, action, + bdrv_iostatus_is_enabled(bs), + error == ENOSPC, error_abort); +} + /* This is done by device models because, while the block layer knows * about the error, it does not know whether an operation comes from * the device or the block layer (from a job, for example). @@ -3621,16 +3633,10 @@ void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action, * also ensures that the STOP/RESUME pair of events is emitted. */ qemu_system_vmstop_request_prepare(); -qapi_event_send_block_io_error(bdrv_get_device_name(bs), - is_read ? IO_OPERATION_TYPE_READ : - IO_OPERATION_TYPE_WRITE, - action, error_abort); +send_qmp_error_event(bs, action, is_read, error); qemu_system_vmstop_request(RUN_STATE_IO_ERROR); } else { -qapi_event_send_block_io_error(bdrv_get_device_name(bs), - is_read ? IO_OPERATION_TYPE_READ : - IO_OPERATION_TYPE_WRITE, - action, error_abort); +send_qmp_error_event(bs, action, is_read, error); } } diff --git a/qapi/block-core.json b/qapi/block-core.json index 1069679..d659165 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -1534,6 +1534,11 @@ # # @action: action that has been taken # +# @nospace: #optional true if I/O error was caused due to a no-space +# condition. This key is only present if query-block's +# io-status is present, please see query-block documentation +# for more information (since: 2.2) +# # Note: If action is stop, a STOP event will eventually follow the # BLOCK_IO_ERROR event # @@ -1541,7 +1546,7 @@ ## { 'event': 'BLOCK_IO_ERROR', 'data': { 'device': 'str', 'operation': 'IoOperationType', -'action': 'BlockErrorAction' } } +'action': 'BlockErrorAction', '*nospace': 'bool' } } ## # @BLOCK_JOB_COMPLETED -- 1.9.3
[Qemu-devel] [RFC 0/3] QMP: extend BLOCK_IO_ERROR event
Management software, such as OpenStack and RHEV's vdsm, wants to be able to allocate VM disk space on demand. The basic use case is to start a VM with a small disk and then the disk is enlarged when QEMU hits a ENOSPC condition. To this end, the management software has to be notified when QEMU encounters ENOSPC. The most straightforward solution is to extend QMP's BLOCK_IO_ERROR event with that information. This series does exactly that. The approach taken is the simplest possible: the BLOCK_IO_ERROR event is extended to contain a nospace key, which will be true whenever the guest runs out of space *and* werror=stop|enospc. Here's an example: { event: BLOCK_IO_ERROR, data: { device: ide0-hd1, operation: write, action: stop, nospace: true }, timestamp: { seconds: 1265044230, microseconds: 450486 } } There are three important things to observe: 1. query-block already supports querying the event by means of the io-status key. Actually, nospace and io-status keys share the same semantics. This is a big advantage of this approach, no further extension of query-block is needed 2. The event could also contain an error message key for debugging, But if we add it to the event, should we add it to query-block too? 3. I'm not extending BLOCK_JOB_ERROR. The reason is that it seems that BLOCK_IO_ERROR is also emitted on BLOCK_JOB_ERROR Now, this series is an RFC because there's an alternative solution for this problem: instead of extending the BLOCK_IO_ERROR event with no-space indicator, we could have a stringfied errno. This way management apps would also be able to distinguish among other errors. For example, we could have a error-details dict containing a reason and a message key: { event: BLOCK_IO_ERROR, data: { device: ide0-hd1, operation: write, action: stop, error-details: { reason: eio, message: I/O error }, timestamp: { seconds: 1265044230, microseconds: 450486 } } And then query-block would have to be extended to contain the same information. IMO, this series implementation is good enough for the requirement we currently have but I'm open to go complex if needed. Luiz Capitulino (3): qapi: block-core.json: improve query-block doc QMP: rate limit BLOCK_IO_ERROR QMP: extend BLOCK_IO_ERROR event with no-space indicator block.c | 22 ++ monitor.c| 1 + qapi/block-core.json | 8 +++- 3 files changed, 22 insertions(+), 9 deletions(-) -- 1.9.3
[Qemu-devel] [RFC 2/3] QMP: rate limit BLOCK_IO_ERROR
This event has the same characteristics of the other rate-limited events, mainly we can emit dozens of it. Rate limit it then. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- monitor.c | 1 + 1 file changed, 1 insertion(+) diff --git a/monitor.c b/monitor.c index 5bc70a6..33abe6c 100644 --- a/monitor.c +++ b/monitor.c @@ -589,6 +589,7 @@ static void monitor_qapi_event_init(void) monitor_qapi_event_throttle(QAPI_EVENT_QUORUM_REPORT_BAD, 1000); monitor_qapi_event_throttle(QAPI_EVENT_QUORUM_FAILURE, 1000); monitor_qapi_event_throttle(QAPI_EVENT_VSERPORT_CHANGE, 1000); +monitor_qapi_event_throttle(QAPI_EVENT_BLOCK_IO_ERROR, 1000); qmp_event_set_func_emit(monitor_qapi_event_queue); } -- 1.9.3
Re: [Qemu-devel] [PATCH v2 for-2.1 3/5] docs: document missing SPICE_MIGRATE_COMPLETED event
Eric Blake ebl...@redhat.com writes: The SPICE_MIGRATE_COMPLETED event was first documented in 7cfadb6b. But since dfab4892 later restored this flie to the this file state prior to qmp events, and we never documented it in the past, anyone using this file instead of qapi will miss out on this event. * docs/qmp/qmp-events.txt (SPICE_MIGRATE_COMPLETED): Add. Signed-off-by: Eric Blake ebl...@redhat.com Patch is fine.
Re: [Qemu-devel] [PATCH v2 2/3] tap-bsd: implement a FreeBSD only version of tap_open
On 22/07/14 14:26, Stefano Stabellini wrote: On Tue, 22 Jul 2014, Roger Pau Monné wrote: On 27/05/14 15:29, Stefan Hajnoczi wrote: On Fri, May 23, 2014 at 05:57:48PM +0200, Roger Pau Monne wrote: The current behaviour of tap_open for BSD systems differ greatly from it's Linux counterpart. Since FreeBSD supports interface renaming and tap device cloning by opening /dev/tap, implement a FreeBSD specific version of tap_open that behaves like it's Linux counterpart. This is specially important for toolstacks that use Qemu (like Xen libxl), in order to have a unified behaviour across suported platforms. Signed-off-by: Roger Pau Monné roger@citrix.com Cc: xen-de...@lists.xenproject.org Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Anthony Liguori aligu...@us.ibm.com Cc: Stefan Hajnoczi stefa...@redhat.com --- net/tap-bsd.c | 70 - 1 files changed, 69 insertions(+), 1 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@redhat.com I still don't see this committed to the repository, should I ping someone? I was assuming that this patch would go via some other tree. But if Stefan is OK I could pick it up and submit a pull request for both patch 1 and 2 of this series. Would you do the backport of those three patches (one is already committed as e02bc6) to the qemu-xen repo at the same time, or would you like me to remind you about this in a month or so? I would really like to have all this patches in Xen 4.5 if possible. Thanks, Roger.
Re: [Qemu-devel] [PATCH v2 2/3] tap-bsd: implement a FreeBSD only version of tap_open
On Wed, 23 Jul 2014, Roger Pau Monné wrote: On 22/07/14 14:26, Stefano Stabellini wrote: On Tue, 22 Jul 2014, Roger Pau Monné wrote: On 27/05/14 15:29, Stefan Hajnoczi wrote: On Fri, May 23, 2014 at 05:57:48PM +0200, Roger Pau Monne wrote: The current behaviour of tap_open for BSD systems differ greatly from it's Linux counterpart. Since FreeBSD supports interface renaming and tap device cloning by opening /dev/tap, implement a FreeBSD specific version of tap_open that behaves like it's Linux counterpart. This is specially important for toolstacks that use Qemu (like Xen libxl), in order to have a unified behaviour across suported platforms. Signed-off-by: Roger Pau Monné roger@citrix.com Cc: xen-de...@lists.xenproject.org Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Anthony Liguori aligu...@us.ibm.com Cc: Stefan Hajnoczi stefa...@redhat.com --- net/tap-bsd.c | 70 - 1 files changed, 69 insertions(+), 1 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@redhat.com I still don't see this committed to the repository, should I ping someone? I was assuming that this patch would go via some other tree. But if Stefan is OK I could pick it up and submit a pull request for both patch 1 and 2 of this series. Would you do the backport of those three patches (one is already committed as e02bc6) to the qemu-xen repo at the same time, or would you like me to remind you about this in a month or so? I would really like to have all this patches in Xen 4.5 if possible. I should remember when I'll send a pull request (when 2.1 will be out). But please remind me if I'll forget.
[Qemu-devel] [PATCH v7 3/5] block/archipelago: Add support for creating images
qemu-img archipelago:volumename[/mport=mapperd_port[:vport=vlmcd_port] [:segment=segment_name]] [size] Signed-off-by: Chrysostomos Nanakos cnana...@grnet.gr --- block/archipelago.c | 146 +++ 1 file changed, 146 insertions(+) diff --git a/block/archipelago.c b/block/archipelago.c index 5a9fc68..b5c66fd 100644 --- a/block/archipelago.c +++ b/block/archipelago.c @@ -592,6 +592,137 @@ err_exit: xseg_leave(s-xseg); } +static int qemu_archipelago_create_volume(Error **errp, const char *volname, + char *segment_name, + uint64_t size, xport mportno, + xport vportno) +{ +int ret, targetlen; +struct xseg *xseg = NULL; +struct xseg_request *req; +struct xseg_request_clone *xclone; +struct xseg_port *port; +xport srcport = NoPort, sport = NoPort; +char *target; + +/* Try default values if none has been set */ +if (mportno == (xport) -1) { +mportno = ARCHIPELAGO_DFL_MPORT; +} + +if (vportno == (xport) -1) { +vportno = ARCHIPELAGO_DFL_VPORT; +} + +if (xseg_initialize()) { +error_setg(errp, Cannot initialize XSEG); +return -1; +} + +xseg = xseg_join(posix, segment_name, + posixfd, NULL); + +if (!xseg) { +error_setg(errp, Cannot join XSEG shared memory segment); +return -1; +} + +port = xseg_bind_dynport(xseg); +srcport = port-portno; +init_local_signal(xseg, sport, srcport); + +req = xseg_get_request(xseg, srcport, mportno, X_ALLOC); +if (!req) { +error_setg(errp, Cannot get XSEG request); +return -1; +} + +targetlen = strlen(volname); +ret = xseg_prep_request(xseg, req, targetlen, +sizeof(struct xseg_request_clone)); +if (ret 0) { +error_setg(errp, Cannot prepare XSEG request); +goto err_exit; +} + +target = xseg_get_target(xseg, req); +if (!target) { +error_setg(errp, Cannot get XSEG target.\n); +goto err_exit; +} +memcpy(target, volname, targetlen); +xclone = (struct xseg_request_clone *) xseg_get_data(xseg, req); +memset(xclone-target, 0 , XSEG_MAX_TARGETLEN); +xclone-targetlen = 0; +xclone-size = size; +req-offset = 0; +req-size = req-datalen; +req-op = X_CLONE; + +xport p = xseg_submit(xseg, req, srcport, X_ALLOC); +if (p == NoPort) { +error_setg(errp, Could not submit XSEG request); +goto err_exit; +} +xseg_signal(xseg, p); + +ret = wait_reply(xseg, srcport, port, req); +if (ret 0) { +error_setg(errp, wait_reply() error.); +} + +xseg_put_request(xseg, req, srcport); +xseg_quit_local_signal(xseg, srcport); +xseg_leave_dynport(xseg, port); +xseg_leave(xseg); +return ret; + +err_exit: +xseg_put_request(xseg, req, srcport); +xseg_quit_local_signal(xseg, srcport); +xseg_leave_dynport(xseg, port); +xseg_leave(xseg); +return -1; +} + +static int qemu_archipelago_create(const char *filename, + QemuOpts *options, + Error **errp) +{ +int ret = 0; +uint64_t total_size = 0; +char *volname = NULL, *segment_name = NULL; +const char *start; +xport mport = NoPort, vport = NoPort; + +if (!strstart(filename, archipelago:, start)) { +error_setg(errp, File name must start with 'archipelago:'); +return -1; +} + +if (!strlen(start) || strstart(start, /, NULL)) { +error_setg(errp, volume name must be specified); +return -1; +} + +parse_filename_opts(filename, errp, volname, segment_name, mport, +vport); +total_size = qemu_opt_get_size_del(options, BLOCK_OPT_SIZE, 0); + +if (segment_name == NULL) { +segment_name = g_strdup(archipelago); +} + +/* Create an Archipelago volume */ +ret = qemu_archipelago_create_volume(errp, volname, segment_name, + total_size, mport, + vport); + +g_free(volname); +g_free(segment_name); +return ret; +} + static void qemu_archipelago_aio_cancel(BlockDriverAIOCB *blockacb) { ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) blockacb; @@ -892,6 +1023,19 @@ static int64_t qemu_archipelago_getlength(BlockDriverState *bs) return ret; } +static QemuOptsList qemu_archipelago_create_opts = { +.name = archipelago-create-opts, +.head = QTAILQ_HEAD_INITIALIZER(qemu_archipelago_create_opts.head), +.desc = { +{ +.name = BLOCK_OPT_SIZE, +.type = QEMU_OPT_SIZE, +.help = Virtual disk size +}, +{ /* end of list */ } +} +}; + static BlockDriverAIOCB *qemu_archipelago_aio_flush(BlockDriverState
[Qemu-devel] [PATCH v7 4/5] QMP: Add support for Archipelago
Introduce new enum BlockdevOptionsArchipelago. @volume: #Name of the Archipelago volume image @mport: #'mport' is the port number on which mapperd is listening. This is optional and if not specified, QEMU will make Archipelago to use the default port. @vport: #'vport' is the port number on which vlmcd is listening. This is optional and if not specified, QEMU will make Archipelago to use the default port. @segment: #optional The name of the shared memory segment Archipelago stack is using. This is optional and if not specified, QEMU will make Archipelago use the default value, 'archipelago'. Signed-off-by: Chrysostomos Nanakos cnana...@grnet.gr --- qapi/block-core.json | 38 +++--- 1 file changed, 35 insertions(+), 3 deletions(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index e378653..0fa0c12 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -190,8 +190,8 @@ # @ro: true if the backing device was open read-only # # @drv: the name of the block format used to open the backing device. As of -# 0.14.0 this can be: 'blkdebug', 'bochs', 'cloop', 'cow', 'dmg', -# 'file', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device', +# 0.14.0 this can be: 'archipelago', 'blkdebug', 'bochs', 'cloop', 'cow', +# 'dmg', 'file', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device', # 'host_floppy', 'http', 'https', 'nbd', 'parallels', 'qcow', # 'qcow2', 'raw', 'tftp', 'vdi', 'vmdk', 'vpc', 'vvfat' # @@ -1143,7 +1143,7 @@ # Since: 2.0 ## { 'enum': 'BlockdevDriver', - 'data': [ 'file', 'host_device', 'host_cdrom', 'host_floppy', + 'data': [ 'archipelago', 'file', 'host_device', 'host_cdrom', 'host_floppy', 'http', 'https', 'ftp', 'ftps', 'tftp', 'vvfat', 'blkdebug', 'blkverify', 'bochs', 'cloop', 'cow', 'dmg', 'parallels', 'qcow', 'qcow2', 'qed', 'raw', 'vdi', 'vhdx', 'vmdk', 'vpc', 'quorum' ] } @@ -1273,6 +1273,37 @@ '*pass-discard-snapshot': 'bool', '*pass-discard-other': 'bool' } } + +## +# @BlockdevOptionsArchipelago +# +# Driver specific block device options for Archipelago. +# +# @volume: Name of the Archipelago volume image +# +# @mport: #optional The port number on which mapperd is +# listening. This is optional +# and if not specified, QEMU will make Archipelago +# use the default port (1001). +# +# @vport: #optional The port number on which vlmcd is +# listening. This is optional +# and if not specified, QEMU will make Archipelago +# use the default port (501). +# +# @segment: #optional The name of the shared memory segment +# Archipelago stack is using. This is optional +# and if not specified, QEMU will make Archipelago +# use the default value, 'archipelago'. +# Since: 2.2 +## +{ 'type': 'BlockdevOptionsArchipelago', + 'data': { 'volume': 'str', +'*mport': 'int', +'*vport': 'int', +'*segment': 'str' } } + + ## # @BlkdebugEvent # @@ -1416,6 +1447,7 @@ 'base': 'BlockdevOptionsBase', 'discriminator': 'driver', 'data': { + 'archipelago':'BlockdevOptionsArchipelago', 'file': 'BlockdevOptionsFile', 'host_device':'BlockdevOptionsFile', 'host_cdrom': 'BlockdevOptionsFile', -- 1.7.10.4
[Qemu-devel] [PATCH v7 0/5] Support Archipelago as a QEMU block backend
v7: - Fix coding style issues. - Rename __archipelago_submit_request function to archipelago_submit_request. - Set X_NONBLOCK flag to xseg_receive(). - Return -EIO to .bdrv_getlength() if archipelago_volume_info() fails. - Fix segment_name mem leak. - Bump version number from 2.1 to 2.2 in qapi/block-core.json file concerning QEMU Archipelago support. - Convert qemu_aio_wait() to aio_poll(). - Remove qemu_blockalign() and memcpy() call and use qemu_iovec_to_buf() directly. v6: - Split v5 1/4 patch into two different patches. First one implements QMP structured options and the second one implements bdrv_parse_filename(). v5: - Remove useless qemu_aio_count variable from BDRVArchipelagoState struct. - Cleanup xseg signal descriptor, call xseg_quit_local_signal() when closing block device. - Fix ds and volname leaks. - Make xseg request handler thread joinable and wait until exits before destroying condition variables and mutexes. Thanks to Stefan Hajnoczi for pointing this out. - Remove error_propagate() useless call. - Use memcpy instead of strncpy. - Remove check after trying to allocate memory with g_malloc(). - Remove pipe code and complete AIO by introducing QEMU bottom-half. - Add Archipelago shared memory segment name in options list and QMP. - Remove functions archipelago_aio_read()/_write() and introduce new and simpler function, __archipelago_submit_request(). Refactor archipelago_aio_segmented_rw() function. - Enable Archipelago support in qemu-iotests v4: - Move Archipelago QMP support from qapi-schema.json file to qapi/block-core.json. Fixe various typographic errors, thanks to Kevin Wolf and Eric Blake. - Use new .create_opts format, define new QemuOptsList structure and refactor qemu_archipelago_create function. v3: - Break down initial patch from one to three. First patch implements Archipelago QEMU block backend with read/write functionality. Second patch implements .bdrv_create() and adds support for creating Archipelago images. Third patch adds QMP support. - Remove global variable g_xseg_init, make xseg_initialize(), xseg_join() and xseg_leave() reentrant and thread-safe. - Introduce new enum BlockdevOptionsArchipelago for the QMP support. v2: - Implement .bdrv_parse_filename() function to convert the shortuct version with a single string to the individual options. - Remove global variables and move relevant fields to ArchipelagoAIOCB struct. - Remove ArchipelagoConf struct and use the relevant fields as individual arguments. - Remove ArchipelagoCB struct and use ArchipelagoAIOCB instead. - Remove ArchipelagoThread struct and move relevant fields to ArchipelagoAIOCB instead. Now an I/O thread is spawned for per-device to handle all async I/O requests. - Remove double data copy, use qemu_iovec_from_buf() and copy data directly to the destination buffer. - Remove archipelago_aio_bh_cb() function, a full request is completed in qemu_archipelago_complete_aio() instead. - Resolve proposed changes from Kevin Wolf and miscellaneous style issues. Chrysostomos Nanakos (5): block: Support Archipelago as a QEMU block backend block/archipelago: Implement bdrv_parse_filename() block/archipelago: Add support for creating images QMP: Add support for Archipelago qemu-iotests: add support for Archipelago protocol MAINTAINERS |6 + block/Makefile.objs |2 + block/archipelago.c | 1064 ++ configure| 40 ++ qapi/block-core.json | 38 +- tests/qemu-iotests/common|6 + tests/qemu-iotests/common.rc |9 +- 7 files changed, 1161 insertions(+), 4 deletions(-) create mode 100644 block/archipelago.c -- 1.7.10.4
[Qemu-devel] [PATCH v7 1/5] block: Support Archipelago as a QEMU block backend
VM Image on Archipelago volume is specified like this: file.driver=archipelago,file.volume=volumename[,file.mport=mapperd_port[, file.vport=vlmcd_port][,file.segment=segment_name]] 'archipelago' is the protocol. 'mport' is the port number on which mapperd is listening. This is optional and if not specified, QEMU will make Archipelago to use the default port. 'vport' is the port number on which vlmcd is listening. This is optional and if not specified, QEMU will make Archipelago to use the default port. 'segment' is the name of the shared memory segment Archipelago stack is using. This is optional and if not specified, QEMU will make Archipelago to use the default value, 'archipelago'. Examples: file.driver=archipelago,file.volume=my_vm_volume file.driver=archipelago,file.volume=my_vm_volume,file.mport=123 file.driver=archipelago,file.volume=my_vm_volume,file.mport=123, file.vport=1234 file.driver=archipelago,file.volume=my_vm_volume,file.mport=123, file.vport=1234,file.segment=my_segment Signed-off-by: Chrysostomos Nanakos cnana...@grnet.gr --- MAINTAINERS |6 + block/Makefile.objs |2 + block/archipelago.c | 785 +++ configure | 40 +++ 4 files changed, 833 insertions(+) create mode 100644 block/archipelago.c diff --git a/MAINTAINERS b/MAINTAINERS index 906f252..59940f9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1000,3 +1000,9 @@ SSH M: Richard W.M. Jones rjo...@redhat.com S: Supported F: block/ssh.c + +ARCHIPELAGO +M: Chrysostomos Nanakos cnana...@grnet.gr +M: Chrysostomos Nanakos ch...@include.gr +S: Maintained +F: block/archipelago.c diff --git a/block/Makefile.objs b/block/Makefile.objs index fd88c03..858d2b3 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -17,6 +17,7 @@ block-obj-$(CONFIG_LIBNFS) += nfs.o block-obj-$(CONFIG_CURL) += curl.o block-obj-$(CONFIG_RBD) += rbd.o block-obj-$(CONFIG_GLUSTERFS) += gluster.o +block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o block-obj-$(CONFIG_LIBSSH2) += ssh.o endif @@ -35,5 +36,6 @@ gluster.o-cflags := $(GLUSTERFS_CFLAGS) gluster.o-libs := $(GLUSTERFS_LIBS) ssh.o-cflags := $(LIBSSH2_CFLAGS) ssh.o-libs := $(LIBSSH2_LIBS) +archipelago.o-libs := $(ARCHIPELAGO_LIBS) qcow.o-libs:= -lz linux-aio.o-libs := -laio diff --git a/block/archipelago.c b/block/archipelago.c new file mode 100644 index 000..1c21d36 --- /dev/null +++ b/block/archipelago.c @@ -0,0 +1,785 @@ +/* + * QEMU Block driver for Archipelago + * + * Copyright (C) 2014 Chrysostomos Nanakos cnana...@grnet.gr + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + * + */ + +/* + * VM Image on Archipelago volume is specified like this: + * + * file.driver=archipelago,file.volume=volumename + * [,file.mport=mapperd_port[,file.vport=vlmcd_port] + * [,file.segment=segment_name]] + * + * 'archipelago' is the protocol. + * + * 'mport' is the port number on which mapperd is listening. This is optional + * and if not specified, QEMU will make Archipelago to use the default port. + * + * 'vport' is the port number on which vlmcd is listening. This is optional + * and if not specified, QEMU will make Archipelago to use the default port. + * + * 'segment' is the name of the shared memory segment Archipelago stack + * is using. This is optional and if not specified, QEMU will make Archipelago + * to use the default value, 'archipelago'. + * + * Examples: + * + * file.driver=archipelago,file.volume=my_vm_volume + * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123 + * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123, + * file.vport=1234 + * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123, + * file.vport=1234,file.segment=my_segment + */ + +#include block/block_int.h +#include qemu/error-report.h +#include qemu/thread.h +#include qapi/qmp/qint.h +#include qapi/qmp/qstring.h +#include qapi/qmp/qjson.h + +#include inttypes.h +#include xseg/xseg.h +#include xseg/protocol.h + +#define ARCHIP_FD_READ 0 +#define ARCHIP_FD_WRITE 1 +#define MAX_REQUEST_SIZE524288 + +#define ARCHIPELAGO_OPT_VOLUME volume +#define ARCHIPELAGO_OPT_SEGMENT segment +#define ARCHIPELAGO_OPT_MPORT mport +#define ARCHIPELAGO_OPT_VPORT vport +#define ARCHIPELAGO_DFL_MPORT 1001 +#define ARCHIPELAGO_DFL_VPORT 501 + +#define archipelagolog(fmt, ...) \ +do { \ +fprintf(stderr, archipelago\t%-24s: fmt, __func__, ##__VA_ARGS__); \ +} while (0) + +typedef enum { +ARCHIP_OP_READ, +ARCHIP_OP_WRITE, +ARCHIP_OP_FLUSH, +ARCHIP_OP_VOLINFO, +} ARCHIPCmd; + +typedef struct ArchipelagoAIOCB { +BlockDriverAIOCB common; +QEMUBH *bh; +struct BDRVArchipelagoState *s; +QEMUIOVector *qiov; +ARCHIPCmd cmd; +bool cancelled; +int status; +int64_t size; +int64_t ret;
[Qemu-devel] [PATCH v7 2/5] block/archipelago: Implement bdrv_parse_filename()
VM Image on Archipelago volume can also be specified like this: file=archipelago:volumename[/mport=mapperd_port[:vport=vlmcd_port][: segment=segment_name]] Examples: file=archipelago:my_vm_volume file=archipelago:my_vm_volume/mport=123 file=archipelago:my_vm_volume/mport=123:vport=1234 file=archipelago:my_vm_volume/mport=123:vport=1234:segment=my_segment Signed-off-by: Chrysostomos Nanakos cnana...@grnet.gr --- block/archipelago.c | 140 ++- 1 file changed, 138 insertions(+), 2 deletions(-) diff --git a/block/archipelago.c b/block/archipelago.c index 1c21d36..5a9fc68 100644 --- a/block/archipelago.c +++ b/block/archipelago.c @@ -15,6 +15,11 @@ * [,file.mport=mapperd_port[,file.vport=vlmcd_port] * [,file.segment=segment_name]] * + * or + * + * file=archipelago:volumename[/mport=mapperd_port[:vport=vlmcd_port][: + * segment=segment_name]] + * * 'archipelago' is the protocol. * * 'mport' is the port number on which mapperd is listening. This is optional @@ -32,11 +37,20 @@ * file.driver=archipelago,file.volume=my_vm_volume * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123 * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123, - * file.vport=1234 + * file.vport=1234 * file.driver=archipelago,file.volume=my_vm_volume,file.mport=123, - * file.vport=1234,file.segment=my_segment + * file.vport=1234,file.segment=my_segment + * + * or + * + * file=archipelago:my_vm_volume + * file=archipelago:my_vm_volume/mport=123 + * file=archipelago:my_vm_volume/mport=123:vport=1234 + * file=archipelago:my_vm_volume/mport=123:vport=1234:segment=my_segment + * */ +#include qemu-common.h #include block/block_int.h #include qemu/error-report.h #include qemu/thread.h @@ -309,6 +323,127 @@ static void qemu_archipelago_complete_aio(void *opaque) g_free(reqdata); } +static void xseg_find_port(char *pstr, const char *needle, xport *aport) +{ +const char *a; +char *endptr = NULL; +unsigned long port; +if (strstart(pstr, needle, a)) { +if (strlen(a) 0) { +port = strtoul(a, endptr, 10); +if (strlen(endptr)) { +*aport = -2; +return; +} +*aport = (xport) port; +} +} +} + +static void xseg_find_segment(char *pstr, const char *needle, + char **segment_name) +{ +const char *a; +if (strstart(pstr, needle, a)) { +if (strlen(a) 0) { +*segment_name = g_strdup(a); +} +} +} + +static void parse_filename_opts(const char *filename, Error **errp, +char **volume, char **segment_name, +xport *mport, xport *vport) +{ +const char *start; +char *tokens[4], *ds; +int idx; +xport lmport = NoPort, lvport = NoPort; + +strstart(filename, archipelago:, start); + +ds = g_strdup(start); +tokens[0] = strtok(ds, /); +tokens[1] = strtok(NULL, :); +tokens[2] = strtok(NULL, :); +tokens[3] = strtok(NULL, \0); + +if (!strlen(tokens[0])) { +error_setg(errp, volume name must be specified first); +g_free(ds); +return; +} + +for (idx = 1; idx 4; idx++) { +if (tokens[idx] != NULL) { +if (strstart(tokens[idx], mport=, NULL)) { +xseg_find_port(tokens[idx], mport=, lmport); +} +if (strstart(tokens[idx], vport=, NULL)) { +xseg_find_port(tokens[idx], vport=, lvport); +} +if (strstart(tokens[idx], segment=, NULL)) { +xseg_find_segment(tokens[idx], segment=, segment_name); +} +} +} + +if ((lmport == -2) || (lvport == -2)) { +error_setg(errp, mport and/or vport must be set); +g_free(ds); +return; +} +*volume = g_strdup(tokens[0]); +*mport = lmport; +*vport = lvport; +g_free(ds); +} + +static void archipelago_parse_filename(const char *filename, QDict *options, + Error **errp) +{ +const char *start; +char *volume = NULL, *segment_name = NULL; +xport mport = NoPort, vport = NoPort; + +if (qdict_haskey(options, ARCHIPELAGO_OPT_VOLUME) +|| qdict_haskey(options, ARCHIPELAGO_OPT_SEGMENT) +|| qdict_haskey(options, ARCHIPELAGO_OPT_MPORT) +|| qdict_haskey(options, ARCHIPELAGO_OPT_VPORT)) { +error_setg(errp, volume/mport/vport/segment and a file name may not + be specified at the same time); +return; +} + +if (!strstart(filename, archipelago:, start)) { +error_setg(errp, File name must start with 'archipelago:'); +return; +} + +if (!strlen(start) || strstart(start, /, NULL)) { +error_setg(errp, volume name must be specified); +return; +} + +parse_filename_opts(filename, errp, volume,
[Qemu-devel] [PATCH v7 5/5] qemu-iotests: add support for Archipelago protocol
Signed-off-by: Chrysostomos Nanakos cnana...@grnet.gr --- tests/qemu-iotests/common|6 ++ tests/qemu-iotests/common.rc |9 - 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common index e4083f4..70df659 100644 --- a/tests/qemu-iotests/common +++ b/tests/qemu-iotests/common @@ -152,6 +152,7 @@ check options -nbdtest nbd -sshtest ssh -nfstest nfs +-archipelagotest archipelago -xdiff graphical mode diff -nocacheuse O_DIRECT on backing file -misalign misalign memory allocations @@ -263,6 +264,11 @@ testlist options xpand=false ;; +-archipelago) +IMGPROTO=archipelago +xpand=false +;; + -nocache) CACHEMODE=none CACHEMODE_IS_DEFAULT=false diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc index e0ea7e3..3fd691e 100644 --- a/tests/qemu-iotests/common.rc +++ b/tests/qemu-iotests/common.rc @@ -64,6 +64,8 @@ elif [ $IMGPROTO = ssh ]; then elif [ $IMGPROTO = nfs ]; then TEST_DIR=nfs://127.0.0.1/$TEST_DIR TEST_IMG=$TEST_DIR/t.$IMGFMT +elif [ $IMGPROTO = archipelago ]; then +TEST_IMG=archipelago:at.$IMGFMT else TEST_IMG=$IMGPROTO:$TEST_DIR/t.$IMGFMT fi @@ -163,7 +165,8 @@ _make_test_img() -e s# lazy_refcounts=\\(on\\|off\\)##g \ -e s# block_size=[0-9]\\+##g \ -e s# block_state_zero=\\(on\\|off\\)##g \ --e s# log_size=[0-9]\\+##g +-e s# log_size=[0-9]\\+##g \ +-e s/archipelago:a/TEST_DIR\//g # Start an NBD server on the image file, which is what we'll be talking to if [ $IMGPROTO = nbd ]; then @@ -206,6 +209,10 @@ _cleanup_test_img() rbd --no-progress rm $TEST_DIR/t.$IMGFMT /dev/null ;; +archipelago) +vlmc remove at.$IMGFMT /dev/null +;; + sheepdog) collie vdi delete $TEST_DIR/t.$IMGFMT ;; -- 1.7.10.4
Re: [Qemu-devel] [PATCH v2 for-2.1 4/5] docs: document missing POWERDOWN event
于 2014/7/23 20:26, Eric Blake 写道: The POWERDOWN event was first documented in 0aab9ec3. But since dfab4892 later restored this file to the state prior to qmp events, and we never documented it in the past, anyone using this file instead of qapi will miss out on this event. Tweak the existing wording of SHUTDOWN to match 84321831, and make the difference between the two events apparent. * docs/qmp/qmp-events.txt (POWERDOWN): Add. (SHUTDOWN): Tweak. Signed-off-by: Eric Blake ebl...@redhat.com --- docs/qmp/qmp-events.txt | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt index 22d552f..9d7439e 100644 --- a/docs/qmp/qmp-events.txt +++ b/docs/qmp/qmp-events.txt @@ -243,6 +243,19 @@ Data: timestamp: { seconds: 1368697518, microseconds: 326866 } } } +POWERDOWN +- + +Emitted when the Virtual Machine is powered down through the power +control system, such as via ACPI. + +Data: None. + +Example: + +{ event: POWERDOWN, +timestamp: { seconds: 1267040730, microseconds: 682951 } } + QUORUM_FAILURE -- @@ -325,7 +338,8 @@ Example: SHUTDOWN -Emitted when the Virtual Machine is powered down. +Emitted when the Virtual Machine has shut down, indicating that qemu +is about to exit. Data: None. Nice to have explantion about the difference.
Re: [Qemu-devel] [PATCH v2 for-2.1 0/5] docs: document remaining QMP events
Reviewed-by: Wenchao Xia wenchaoq...@gmail.com
Re: [Qemu-devel] [PATCH] scripts: qapi-event.py: support vendor extension
Reviewed-by: Wenchao Xia wenchaoq...@gmail.com I didn't expect dot in schema before.
[Qemu-devel] [RFC PATCH 04/17] COLO info: use colo info to tell migration target colo is enabled
migrate colo info to migration target to tell the target colo is enabled. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- Makefile.objs | 1 + include/migration/migration-colo.h | 3 ++ migration-colo-comm.c | 68 ++ vl.c | 4 +++ 4 files changed, 76 insertions(+) create mode 100644 migration-colo-comm.c diff --git a/Makefile.objs b/Makefile.objs index cab5824..1836a68 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -50,6 +50,7 @@ common-obj-$(CONFIG_POSIX) += os-posix.o common-obj-$(CONFIG_LINUX) += fsdev/ common-obj-y += migration.o migration-tcp.o +common-obj-y += migration-colo-comm.o common-obj-$(CONFIG_COLO) += migration-colo.o common-obj-y += vmstate.o common-obj-y += qemu-file.o diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h index 35b384c..e3735d8 100644 --- a/include/migration/migration-colo.h +++ b/include/migration/migration-colo.h @@ -12,6 +12,9 @@ #define QEMU_MIGRATION_COLO_H #include qemu-common.h +#include migration/migration.h + +void colo_info_mig_init(void); bool colo_supported(void); diff --git a/migration-colo-comm.c b/migration-colo-comm.c new file mode 100644 index 000..ccbc246 --- /dev/null +++ b/migration-colo-comm.c @@ -0,0 +1,68 @@ +/* + * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO) + * (a.k.a. Fault Tolerance or Continuous Replication) + * + * Copyright (C) 2014 FUJITSU LIMITED + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + * + */ + +#include migration/migration-colo.h + +#define DEBUG_COLO + +#ifdef DEBUG_COLO +#define DPRINTF(fmt, ...) \ +do { fprintf(stdout, COLO: fmt, ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) \ +do { } while (0) +#endif + +static bool colo_requested; + +/* save */ + +static bool migrate_use_colo(void) +{ +MigrationState *s = migrate_get_current(); +return s-enabled_capabilities[MIGRATION_CAPABILITY_COLO]; +} + +static void colo_info_save(QEMUFile *f, void *opaque) +{ +qemu_put_byte(f, migrate_use_colo()); +} + +/* restore */ + +static int colo_info_load(QEMUFile *f, void *opaque, int version_id) +{ +int value = qemu_get_byte(f); + +if (value !colo_supported()) { +fprintf(stderr, COLO is not supported\n); +return -EINVAL; +} + +if (value !colo_requested) { +DPRINTF(COLO requested!\n); +} + +colo_requested = value; + +return 0; +} + +static SaveVMHandlers savevm_colo_info_handlers = { +.save_state = colo_info_save, +.load_state = colo_info_load, +}; + +void colo_info_mig_init(void) +{ +register_savevm_live(NULL, colo info, -1, 1, + savevm_colo_info_handlers, NULL); +} diff --git a/vl.c b/vl.c index fe451aa..1a282d8 100644 --- a/vl.c +++ b/vl.c @@ -89,6 +89,7 @@ int main(int argc, char **argv) #include sysemu/dma.h #include audio/audio.h #include migration/migration.h +#include migration/migration-colo.h #include sysemu/kvm.h #include qapi/qmp/qjson.h #include qemu/option.h @@ -4339,6 +4340,9 @@ int main(int argc, char **argv, char **envp) blk_mig_init(); ram_mig_init(); +if (colo_supported()) { +colo_info_mig_init(); +} /* open the virtual block devices */ if (snapshot) -- 1.9.1
[Qemu-devel] [RFC PATCH 07/17] COLO buffer: implement colo buffer as well as QEMUFileOps based on it
We need a buffer to store migration data. On save side: all saved data was write into colo buffer first, so that we can know the total size of the migration data. this can also separate the data transmission from colo control data, we use colo control data over socket fd to synchronous both side's stat. On restore side: all migration data was read into colo buffer first, then load data from the buffer: If network error happens while data transmission, the slaver can still functinal because the migration data are not yet loaded. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- migration-colo.c | 112 +++ 1 file changed, 112 insertions(+) diff --git a/migration-colo.c b/migration-colo.c index d566b9d..b90d9b6 100644 --- a/migration-colo.c +++ b/migration-colo.c @@ -11,6 +11,7 @@ #include qemu/main-loop.h #include qemu/thread.h #include block/coroutine.h +#include qemu/error-report.h #include migration/migration-colo.h static QEMUBH *colo_bh; @@ -20,14 +21,122 @@ bool colo_supported(void) return true; } +/* colo buffer */ + +#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL) +#define COLO_BUFFER_MAX_SIZE (1000*1000*1000*10ULL) + +typedef struct colo_buffer { +uint8_t *data; +uint64_t used; +uint64_t freed; +uint64_t size; +} colo_buffer_t; + +static colo_buffer_t colo_buffer; + +static void colo_buffer_init(void) +{ +if (colo_buffer.size == 0) { +colo_buffer.data = g_malloc(COLO_BUFFER_BASE_SIZE); +colo_buffer.size = COLO_BUFFER_BASE_SIZE; +} +colo_buffer.used = 0; +colo_buffer.freed = 0; +} + +static void colo_buffer_destroy(void) +{ +if (colo_buffer.data) { +g_free(colo_buffer.data); +colo_buffer.data = NULL; +} +colo_buffer.used = 0; +colo_buffer.freed = 0; +colo_buffer.size = 0; +} + +static void colo_buffer_extend(uint64_t len) +{ +if (len colo_buffer.size - colo_buffer.used) { +len = len + colo_buffer.used - colo_buffer.size; +len = ROUND_UP(len, COLO_BUFFER_BASE_SIZE) + COLO_BUFFER_BASE_SIZE; + +colo_buffer.size += len; +if (colo_buffer.size COLO_BUFFER_MAX_SIZE) { +error_report(colo_buffer overflow!\n); +exit(EXIT_FAILURE); +} +colo_buffer.data = g_realloc(colo_buffer.data, colo_buffer.size); +} +} + +static int colo_put_buffer(void *opaque, const uint8_t *buf, + int64_t pos, int size) +{ +colo_buffer_extend(size); +memcpy(colo_buffer.data + colo_buffer.used, buf, size); +colo_buffer.used += size; + +return size; +} + +static int colo_get_buffer_internal(uint8_t *buf, int size) +{ +if ((size + colo_buffer.freed) colo_buffer.used) { +size = colo_buffer.used - colo_buffer.freed; +} +memcpy(buf, colo_buffer.data + colo_buffer.freed, size); +colo_buffer.freed += size; + +return size; +} + +static int colo_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size) +{ +return colo_get_buffer_internal(buf, size); +} + +static int colo_close(void *opaque) +{ +colo_buffer_t *cb = opaque ; + +cb-used = 0; +cb-freed = 0; + +return 0; +} + +static int colo_get_fd(void *opaque) +{ +/* colo buffer, no fd */ +return -1; +} + +static const QEMUFileOps colo_write_ops = { +.put_buffer = colo_put_buffer, +.get_fd = colo_get_fd, +.close = colo_close, +}; + +static const QEMUFileOps colo_read_ops = { +.get_buffer = colo_get_buffer, +.get_fd = colo_get_fd, +.close = colo_close, +}; + /* save */ static void *colo_thread(void *opaque) { MigrationState *s = opaque; +colo_buffer_init(); + /*TODO: COLO checkpointed save loop*/ +colo_buffer_destroy(); + if (s-state != MIG_STATE_ERROR) { migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED); } @@ -77,8 +186,11 @@ void colo_process_incoming_checkpoints(QEMUFile *f) colo = qemu_coroutine_self(); assert(colo != NULL); +colo_buffer_init(); + /* TODO: COLO checkpointed restore loop */ +colo_buffer_destroy(); colo = NULL; restore_exit_colo(); -- 1.9.1
[Qemu-devel] [RFC PATCH 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
Virtual machine (VM) replication is a well known technique for providing application-agnostic software-implemented hardware fault tolerance non-stop service. COLO is a high availability solution. Both primary VM (PVM) and secondary VM (SVM) run in parallel. They receive the same request from client, and generate response in parallel too. If the response packets from PVM and SVM are identical, they are released immediately. Otherwise, a VM checkpoint (on demand) is conducted. The idea is presented in Xen summit 2012, and 2013, and academia paper in SOCC 2013. It's also presented in KVM forum 2013: http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf Please refer to above document for detailed information. Please also refer to previous posted RFC proposal: http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html The patchset is also hosted on github: https://github.com/macrosheep/qemu/tree/colo_v0.1 This patchset is RFC, implements the frame of colo, without failover and nic/disk replication. But it is ready for demo the COLO idea above QEMU-Kvm. Steps using this patchset to get an overview of COLO: 1. configure the source with --enable-colo option 2. compile 3. just like QEMU's normal migration, run 2 QEMU VM: - Primary VM - Secondary VM with -incoming tcp:[IP]:[PORT] option 4. on Primary VM's QEMU monitor, run following command: migrate_set_capability colo on migrate tcp:[IP]:[PORT] 5. done you will see two runing VMs, whenever you make changes to PVM, SVM will be synced to PVM's state. TODO list: 1. failover 2. nic replication 3. disk replication[COLO Disk manager] Any comments/feedbacks are warmly welcomed. Thanks, Yang Yang Hongyang (17): configure: add CONFIG_COLO to switch COLO support COLO: introduce an api colo_supported() to indicate COLO support COLO migration: add a migration capability 'colo' COLO info: use colo info to tell migration target colo is enabled COLO save: integrate COLO checkpointed save into qemu migration COLO restore: integrate COLO checkpointed restore into qemu restore COLO buffer: implement colo buffer as well as QEMUFileOps based on it COLO: disable qdev hotplug COLO ctl: implement API's that communicate with colo agent COLO ctl: introduce is_slave() and is_master() COLO ctl: implement colo checkpoint protocol COLO ctl: add a RunState RUN_STATE_COLO COLO ctl: implement colo save COLO ctl: implement colo restore COLO save: reuse migration bitmap under colo checkpoint COLO ram cache: implement colo ram cache on slaver HACK: trigger checkpoint every 500ms Makefile.objs | 2 + arch_init.c| 174 +- configure | 14 + include/exec/cpu-all.h | 1 + include/migration/migration-colo.h | 36 +++ include/migration/migration.h | 13 + include/qapi/qmp/qerror.h | 3 + migration-colo-comm.c | 78 + migration-colo.c | 643 + migration.c| 45 ++- qapi-schema.json | 9 +- stubs/Makefile.objs| 1 + stubs/migration-colo.c | 34 ++ vl.c | 12 + 14 files changed, 1044 insertions(+), 21 deletions(-) create mode 100644 include/migration/migration-colo.h create mode 100644 migration-colo-comm.c create mode 100644 migration-colo.c create mode 100644 stubs/migration-colo.c -- 1.9.1
[Qemu-devel] [RFC PATCH 03/17] COLO migration: add a migration capability 'colo'
Add a migration capability 'colo'. If this capability is on, The migration will never end, and the VM will be continuously checkpointed. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- include/qapi/qmp/qerror.h | 3 +++ migration.c | 6 ++ qapi-schema.json | 5 - 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h index 902d1a7..226b805 100644 --- a/include/qapi/qmp/qerror.h +++ b/include/qapi/qmp/qerror.h @@ -166,4 +166,7 @@ void qerror_report_err(Error *err); #define QERR_SOCKET_CREATE_FAILED \ ERROR_CLASS_GENERIC_ERROR, Failed to create socket +#define QERR_COLO_UNSUPPORTED \ +ERROR_CLASS_GENERIC_ERROR, COLO is not currently supported, please rerun configure with --enable-colo option in order to support COLO feature + #endif /* QERROR_H */ diff --git a/migration.c b/migration.c index 8d675b3..ca83310 100644 --- a/migration.c +++ b/migration.c @@ -25,6 +25,7 @@ #include qemu/thread.h #include qmp-commands.h #include trace.h +#include migration/migration-colo.h enum { MIG_STATE_ERROR = -1, @@ -277,6 +278,11 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params, } for (cap = params; cap; cap = cap-next) { +if (cap-value-capability == MIGRATION_CAPABILITY_COLO +cap-value-state !colo_supported()) { +error_set(errp, QERR_COLO_UNSUPPORTED); +continue; +} s-enabled_capabilities[cap-value-capability] = cap-value-state; } } diff --git a/qapi-schema.json b/qapi-schema.json index b11aad2..807f5a2 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -491,10 +491,13 @@ # @auto-converge: If enabled, QEMU will automatically throttle down the guest # to speed up convergence of RAM migration. (since 1.6) # +# @colo: The migration will never end, and the VM will instead be continuously +#checkpointed. The feature is disabled by default. (since 2.1) +# # Since: 1.2 ## { 'enum': 'MigrationCapability', - 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] } + 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', 'colo'] } ## # @MigrationCapabilityStatus -- 1.9.1
[Qemu-devel] [RFC PATCH 05/17] COLO save: integrate COLO checkpointed save into qemu migration
Integrate COLO checkpointed save flow into qemu migration. Add a migrate state: MIG_STATE_COLO, enter this migrate state after the first live migration successfully finished. Create a colo thread to do the checkpointed save. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- include/migration/migration-colo.h | 4 include/migration/migration.h | 13 +++ migration-colo-comm.c | 2 +- migration-colo.c | 48 ++ migration.c| 36 stubs/migration-colo.c | 4 6 files changed, 91 insertions(+), 16 deletions(-) diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h index e3735d8..24589c0 100644 --- a/include/migration/migration-colo.h +++ b/include/migration/migration-colo.h @@ -18,4 +18,8 @@ void colo_info_mig_init(void); bool colo_supported(void); +/* save */ +bool migrate_use_colo(void); +void colo_init_checkpointer(MigrationState *s); + #endif diff --git a/include/migration/migration.h b/include/migration/migration.h index 3cb5ba8..3e81a27 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -64,6 +64,19 @@ struct MigrationState int64_t dirty_sync_count; }; +enum { +MIG_STATE_ERROR = -1, +MIG_STATE_NONE, +MIG_STATE_SETUP, +MIG_STATE_CANCELLING, +MIG_STATE_CANCELLED, +MIG_STATE_ACTIVE, +MIG_STATE_COLO, +MIG_STATE_COMPLETED, +}; + +void migrate_set_state(MigrationState *s, int old_state, int new_state); + void process_incoming_migration(QEMUFile *f); void qemu_start_incoming_migration(const char *uri, Error **errp); diff --git a/migration-colo-comm.c b/migration-colo-comm.c index ccbc246..4504ceb 100644 --- a/migration-colo-comm.c +++ b/migration-colo-comm.c @@ -25,7 +25,7 @@ static bool colo_requested; /* save */ -static bool migrate_use_colo(void) +bool migrate_use_colo(void) { MigrationState *s = migrate_get_current(); return s-enabled_capabilities[MIGRATION_CAPABILITY_COLO]; diff --git a/migration-colo.c b/migration-colo.c index 1d3bef8..0cef8bd 100644 --- a/migration-colo.c +++ b/migration-colo.c @@ -8,9 +8,57 @@ * the COPYING file in the top-level directory. */ +#include qemu/main-loop.h +#include qemu/thread.h #include migration/migration-colo.h +static QEMUBH *colo_bh; + bool colo_supported(void) { return true; } + +/* save */ + +static void *colo_thread(void *opaque) +{ +MigrationState *s = opaque; + +/*TODO: COLO checkpointed save loop*/ + +if (s-state != MIG_STATE_ERROR) { +migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED); +} + +qemu_mutex_lock_iothread(); +qemu_bh_schedule(s-cleanup_bh); +qemu_mutex_unlock_iothread(); + +return NULL; +} + +static void colo_start_checkpointer(void *opaque) +{ +MigrationState *s = opaque; + +if (colo_bh) { +qemu_bh_delete(colo_bh); +colo_bh = NULL; +} + +qemu_mutex_unlock_iothread(); +qemu_thread_join(s-thread); +qemu_mutex_lock_iothread(); + +migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COLO); + +qemu_thread_create(s-thread, colo, colo_thread, s, + QEMU_THREAD_JOINABLE); +} + +void colo_init_checkpointer(MigrationState *s) +{ +colo_bh = qemu_bh_new(colo_start_checkpointer, s); +qemu_bh_schedule(colo_bh); +} diff --git a/migration.c b/migration.c index ca83310..b7f8e7e 100644 --- a/migration.c +++ b/migration.c @@ -27,16 +27,6 @@ #include trace.h #include migration/migration-colo.h -enum { -MIG_STATE_ERROR = -1, -MIG_STATE_NONE, -MIG_STATE_SETUP, -MIG_STATE_CANCELLING, -MIG_STATE_CANCELLED, -MIG_STATE_ACTIVE, -MIG_STATE_COMPLETED, -}; - #define MAX_THROTTLE (32 20) /* Migration speed throttling */ /* Amount of time to allocate to each chunk of bandwidth-throttled @@ -229,6 +219,11 @@ MigrationInfo *qmp_query_migrate(Error **errp) get_xbzrle_cache_stats(info); break; +case MIG_STATE_COLO: +info-has_status = true; +info-status = g_strdup(colo); +/* TODO: display COLO specific informations(checkpoint info etc.),*/ +break; case MIG_STATE_COMPLETED: get_xbzrle_cache_stats(info); @@ -272,7 +267,8 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params, MigrationState *s = migrate_get_current(); MigrationCapabilityStatusList *cap; -if (s-state == MIG_STATE_ACTIVE || s-state == MIG_STATE_SETUP) { +if (s-state == MIG_STATE_ACTIVE || s-state == MIG_STATE_SETUP || +s-state == MIG_STATE_COLO) { error_set(errp, QERR_MIGRATION_ACTIVE); return; } @@ -289,7 +285,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params, /* shared migration helpers */ -static void migrate_set_state(MigrationState *s, int old_state, int new_state) +void
[Qemu-devel] [RFC PATCH 09/17] COLO ctl: implement API's that communicate with colo agent
We use COLO agent to compare the packets returned by Primary VM and Secondary VM, and decide whether to start a checkpoint according to some rules. It is a linux kernel module for host. COLO controller communicate with the agent through ioctl(). Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- migration-colo.c | 115 +-- 1 file changed, 112 insertions(+), 3 deletions(-) diff --git a/migration-colo.c b/migration-colo.c index f295e56..802f8b0 100644 --- a/migration-colo.c +++ b/migration-colo.c @@ -13,7 +13,16 @@ #include block/coroutine.h #include qemu/error-report.h #include hw/qdev-core.h +#include qemu/timer.h #include migration/migration-colo.h +#include sys/ioctl.h + +/* + * checkpoint timer: unit ms + * this is large because COLO checkpoint will mostly depend on + * COLO compare module. + */ +#define CHKPOINT_TIMER 1 static QEMUBH *colo_bh; @@ -22,6 +31,56 @@ bool colo_supported(void) return true; } +/* colo compare */ +#define COMP_IOC_MAGIC 'k' +#define COMP_IOCTWAIT _IO(COMP_IOC_MAGIC, 0) +#define COMP_IOCTFLUSH _IO(COMP_IOC_MAGIC, 1) +#define COMP_IOCTRESUME _IO(COMP_IOC_MAGIC, 2) + +#define COMPARE_DEV /dev/HA_compare +/* COLO compare module FD */ +static int comp_fd = -1; + +static int colo_compare_init(void) +{ +comp_fd = open(COMPARE_DEV, O_RDONLY); +if (comp_fd 0) { +return -1; +} + +return 0; +} + +static void colo_compare_destroy(void) +{ +if (comp_fd = 0) { +close(comp_fd); +comp_fd = -1; +} +} + +/* + * Communicate with COLO Agent through ioctl. + * return: + * 0: start a checkpoint + * other: errno == ETIME or ERESTART, try again + *errno == other, error, quit colo save + */ +static int colo_compare(void) +{ +return ioctl(comp_fd, COMP_IOCTWAIT, 250); +} + +static __attribute__((unused)) int colo_compare_flush(void) +{ +return ioctl(comp_fd, COMP_IOCTFLUSH, 1); +} + +static __attribute__((unused)) int colo_compare_resume(void) +{ +return ioctl(comp_fd, COMP_IOCTRESUME, 1); +} + /* colo buffer */ #define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL) @@ -131,15 +190,48 @@ static const QEMUFileOps colo_read_ops = { static void *colo_thread(void *opaque) { MigrationState *s = opaque; -int dev_hotplug = qdev_hotplug; +int dev_hotplug = qdev_hotplug, wait_cp = 0; +int64_t start_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); +int64_t current_time; + +if (colo_compare_init() 0) { +error_report(Init colo compare error\n); +goto out; +} qdev_hotplug = 0; colo_buffer_init(); -/*TODO: COLO checkpointed save loop*/ +while (s-state == MIG_STATE_COLO) { +/* wait for a colo checkpoint */ +wait_cp = colo_compare(); +if (wait_cp) { +if (errno != ETIME errno != ERESTART) { +error_report(compare module failed(%s), strerror(errno)); +goto out; +} +/* + * no checkpoint is needed, wait for 1ms and then + * check if we need checkpoint + */ +current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); +if (current_time - start_time CHKPOINT_TIMER) { +usleep(1000); +continue; +} +} + +/* start a colo checkpoint */ + +/*TODO: COLO save */ +start_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); +} + +out: colo_buffer_destroy(); +colo_compare_destroy(); if (s-state != MIG_STATE_ERROR) { migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED); @@ -183,6 +275,17 @@ void colo_init_checkpointer(MigrationState *s) static Coroutine *colo; +/* + * return: + * 0: start a checkpoint + * 1: some error happend, exit colo restore + */ +static int slave_wait_new_checkpoint(QEMUFile *f) +{ +/* TODO: wait checkpoint start command from master */ +return 1; +} + void colo_process_incoming_checkpoints(QEMUFile *f) { int dev_hotplug = qdev_hotplug; @@ -198,7 +301,13 @@ void colo_process_incoming_checkpoints(QEMUFile *f) colo_buffer_init(); -/* TODO: COLO checkpointed restore loop */ +while (true) { +if (slave_wait_new_checkpoint(f)) { +break; +} + +/* TODO: COLO restore */ +} colo_buffer_destroy(); colo = NULL; -- 1.9.1
[Qemu-devel] [RFC PATCH 11/17] COLO ctl: implement colo checkpoint protocol
implement colo checkpoint protocol. Checkpoint synchronzing points. Primary Secondary NEW @ Suspend SUSPENDED @ SuspendSave state SEND@ Send state Receive state RECEIVED@ Flush network Load state LOADED @ Resume Resume Start Comparing NOTE: 1) '@' who sends the message 2) Every sync-point is synchronized by two sides with only one handshake(single direction) for low-latency. If more strict synchronization is required, a opposite direction sync-point should be added. 3) Since sync-points are single direction, the remote side may go forward a lot when this side just receives the sync-point. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- migration-colo.c | 268 +-- 1 file changed, 262 insertions(+), 6 deletions(-) diff --git a/migration-colo.c b/migration-colo.c index 2699e77..a708872 100644 --- a/migration-colo.c +++ b/migration-colo.c @@ -24,6 +24,41 @@ */ #define CHKPOINT_TIMER 1 +enum { +COLO_READY = 0x46, + +/* + * Checkpoint synchronzing points. + * + * Primary Secondary + * NEW @ + * Suspend + * SUSPENDED @ + * SuspendSave state + * SEND@ + * Send state Receive state + * RECEIVED@ + * Flush network Load state + * LOADED @ + * Resume Resume + * + * Start Comparing + * NOTE: + * 1) '@' who sends the message + * 2) Every sync-point is synchronized by two sides with only + *one handshake(single direction) for low-latency. + *If more strict synchronization is required, a opposite direction + *sync-point should be added. + * 3) Since sync-points are single direction, the remote side may + *go forward a lot when this side just receives the sync-point. + */ +COLO_CHECKPOINT_NEW, +COLO_CHECKPOINT_SUSPENDED, +COLO_CHECKPOINT_SEND, +COLO_CHECKPOINT_RECEIVED, +COLO_CHECKPOINT_LOADED, +}; + static QEMUBH *colo_bh; bool colo_supported(void) @@ -185,30 +220,161 @@ static const QEMUFileOps colo_read_ops = { .close = colo_close, }; +/* colo checkpoint control helper */ +static bool is_master(void); +static bool is_slave(void); + +static void ctl_error_handler(void *opaque, int err) +{ +if (is_slave()) { +/* TODO: determine whether we need to failover */ +/* FIXME: we will not failover currently, just kill slave */ +error_report(error: colo transmission failed!\n); +exit(1); +} else if (is_master()) { +/* Master still alive, do not failover */ +error_report(error: colo transmission failed!\n); +return; +} else { +error_report(COLO: Unexpected error happend!\n); +exit(EXIT_FAILURE); +} +} + +static int colo_ctl_put(QEMUFile *f, uint64_t request) +{ +int ret = 0; + +qemu_put_be64(f, request); +qemu_fflush(f); + +ret = qemu_file_get_error(f); +if (ret 0) { +ctl_error_handler(f, ret); +return 1; +} + +return ret; +} + +static int colo_ctl_get_value(QEMUFile *f, uint64_t *value) +{ +int ret = 0; +uint64_t temp; + +temp = qemu_get_be64(f); + +ret = qemu_file_get_error(f); +if (ret 0) { +ctl_error_handler(f, ret); +return 1; +} + +*value = temp; +return 0; +} + +static int colo_ctl_get(QEMUFile *f, uint64_t require) +{ +int ret; +uint64_t value; + +ret = colo_ctl_get_value(f, value); +if (ret) { +return ret; +} + +if (value != require) { +error_report(unexpected state received!\n); +exit(1); +} + +return ret; +} + /* save */ -static __attribute__((unused)) bool is_master(void) +static bool is_master(void) { MigrationState *s = migrate_get_current(); return (s-state == MIG_STATE_COLO); } +static int do_colo_transaction(MigrationState *s, QEMUFile *control, + QEMUFile *trans) +{ +int ret; + +ret = colo_ctl_put(s-file, COLO_CHECKPOINT_NEW); +if (ret) { +goto out; +} + +ret = colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED); +if (ret) { +goto out; +} + +/* TODO: suspend and save vm state to colo buffer */ + +ret = colo_ctl_put(s-file, COLO_CHECKPOINT_SEND); +if (ret) { +goto out; +} + +/*
[Qemu-devel] [RFC PATCH 14/17] COLO ctl: implement colo restore
implement colo restore Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- migration-colo.c | 43 +++ 1 file changed, 35 insertions(+), 8 deletions(-) diff --git a/migration-colo.c b/migration-colo.c index 03ac157..8596845 100644 --- a/migration-colo.c +++ b/migration-colo.c @@ -535,8 +535,9 @@ void colo_process_incoming_checkpoints(QEMUFile *f) { int fd = qemu_get_fd(f); int dev_hotplug = qdev_hotplug; -QEMUFile *ctl = NULL; +QEMUFile *ctl = NULL, *fb = NULL; int ret; +uint64_t total_size; if (!restore_use_colo()) { return; @@ -560,7 +561,8 @@ void colo_process_incoming_checkpoints(QEMUFile *f) goto out; } -/* TODO: in COLO mode, slave is runing, so start the vm */ +/* in COLO mode, slave is runing, so start the vm */ +vm_start(); while (true) { if (slave_wait_new_checkpoint(f)) { @@ -569,43 +571,68 @@ void colo_process_incoming_checkpoints(QEMUFile *f) /* start colo checkpoint */ -/* TODO: suspend guest */ +/* suspend guest */ +vm_stop_force_state(RUN_STATE_COLO); ret = colo_ctl_put(ctl, COLO_CHECKPOINT_SUSPENDED); if (ret) { goto out; } -/* TODO: open colo buffer for read */ +/* open colo buffer for read */ +fb = qemu_fopen_ops(colo_buffer, colo_read_ops); +if (!fb) { +error_report(can't open colo buffer\n); +goto out; +} ret = colo_ctl_get(f, COLO_CHECKPOINT_SEND); if (ret) { goto out; } -/* TODO: read migration data into colo buffer */ +/* read migration data into colo buffer */ + +/* read the vmstate total size first */ +ret = colo_ctl_get_value(f, total_size); +if (ret) { +goto out; +} +colo_buffer_extend(total_size); +qemu_get_buffer(f, colo_buffer.data, total_size); +colo_buffer.used = total_size; ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED); if (ret) { goto out; } -/* TODO: load vm state */ +/* load vm state */ +if (qemu_loadvm_state(fb) 0) { +error_report(COLO: loadvm failed\n); +goto out; +} ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED); if (ret) { goto out; } -/* TODO: resume guest */ +/* resume guest */ +vm_start(); -/* TODO: close colo buffer */ +qemu_fclose(fb); +fb = NULL; } out: colo_buffer_destroy(); colo = NULL; +if (fb) { +qemu_fclose(fb); +} + if (ctl) { qemu_fclose(ctl); } -- 1.9.1
[Qemu-devel] [RFC PATCH 15/17] COLO save: reuse migration bitmap under colo checkpoint
reuse migration bitmap under colo checkpoint, only send dirty pages per-checkpoint. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- arch_init.c| 20 +++- include/migration/migration-colo.h | 2 ++ migration-colo.c | 6 ++ stubs/migration-colo.c | 10 ++ 4 files changed, 33 insertions(+), 5 deletions(-) diff --git a/arch_init.c b/arch_init.c index 8ddaf35..c84e6c8 100644 --- a/arch_init.c +++ b/arch_init.c @@ -52,6 +52,7 @@ #include exec/ram_addr.h #include hw/acpi/acpi.h #include qemu/host-utils.h +#include migration/migration-colo.h #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...) \ @@ -769,6 +770,15 @@ static int ram_save_setup(QEMUFile *f, void *opaque) RAMBlock *block; int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */ +/* + * migration has already setup the bitmap, reuse it. + */ +if (is_master()) { +qemu_mutex_lock_ramlist(); +reset_ram_globals(); +goto out_setup; +} + mig_throttle_on = false; dirty_rate_high_cnt = 0; bitmap_sync_count = 0; @@ -828,6 +838,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) migration_bitmap_sync(); qemu_mutex_unlock_iothread(); +out_setup: qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE); QTAILQ_FOREACH(block, ram_list.blocks, next) { @@ -937,7 +948,14 @@ static int ram_save_complete(QEMUFile *f, void *opaque) } ram_control_after_iterate(f, RAM_CONTROL_FINISH); -migration_end(); + +/* + * Since we need to reuse dirty bitmap in colo, + * don't cleanup the bitmap. + */ +if (!migrate_use_colo() || migration_has_failed(migrate_get_current())) { +migration_end(); +} qemu_mutex_unlock_ramlist(); qemu_put_be64(f, RAM_SAVE_FLAG_EOS); diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h index 861fa27..c286a60 100644 --- a/include/migration/migration-colo.h +++ b/include/migration/migration-colo.h @@ -21,10 +21,12 @@ bool colo_supported(void); /* save */ bool migrate_use_colo(void); void colo_init_checkpointer(MigrationState *s); +bool is_master(void); /* restore */ bool restore_use_colo(void); void restore_exit_colo(void); +bool is_slave(void); void colo_process_incoming_checkpoints(QEMUFile *f); diff --git a/migration-colo.c b/migration-colo.c index 8596845..13a6a57 100644 --- a/migration-colo.c +++ b/migration-colo.c @@ -222,8 +222,6 @@ static const QEMUFileOps colo_read_ops = { }; /* colo checkpoint control helper */ -static bool is_master(void); -static bool is_slave(void); static void ctl_error_handler(void *opaque, int err) { @@ -295,7 +293,7 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require) /* save */ -static bool is_master(void) +bool is_master(void) { MigrationState *s = migrate_get_current(); return (s-state == MIG_STATE_COLO); @@ -499,7 +497,7 @@ void colo_init_checkpointer(MigrationState *s) static Coroutine *colo; -static bool is_slave(void) +bool is_slave(void) { return colo != NULL; } diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c index 55f0d37..ef65be6 100644 --- a/stubs/migration-colo.c +++ b/stubs/migration-colo.c @@ -22,3 +22,13 @@ void colo_init_checkpointer(MigrationState *s) void colo_process_incoming_checkpoints(QEMUFile *f) { } + +bool is_master(void) +{ +return false; +} + +bool is_slave(void) +{ +return false; +} -- 1.9.1
[Qemu-devel] [RFC PATCH 16/17] COLO ram cache: implement colo ram cache on slaver
The ram cache was initially the same as PVM's memory. At checkpoint, we cache the dirty memory of PVM into ram cache (so that ram cache always the same as PVM's memory at every checkpoint), flush cached memory to SVM after we received all PVM dirty memory(only needed to flush memory that was both dirty on PVM and SVM since last checkpoint). Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- arch_init.c| 154 - include/exec/cpu-all.h | 1 + include/migration/migration-colo.h | 3 + migration-colo.c | 4 + 4 files changed, 159 insertions(+), 3 deletions(-) diff --git a/arch_init.c b/arch_init.c index c84e6c8..009bcb5 100644 --- a/arch_init.c +++ b/arch_init.c @@ -1013,6 +1013,7 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host) return 0; } +static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block); static inline void *host_from_stream_offset(QEMUFile *f, ram_addr_t offset, int flags) @@ -1027,7 +1028,12 @@ static inline void *host_from_stream_offset(QEMUFile *f, return NULL; } -return memory_region_get_ram_ptr(block-mr) + offset; +if (is_slave()) { +migration_bitmap_set_dirty(block-mr-ram_addr + offset); +return memory_region_get_ram_cache_ptr(block-mr, block) + offset; +} else { +return memory_region_get_ram_ptr(block-mr) + offset; +} } len = qemu_get_byte(f); @@ -1035,8 +1041,15 @@ static inline void *host_from_stream_offset(QEMUFile *f, id[len] = 0; QTAILQ_FOREACH(block, ram_list.blocks, next) { -if (!strncmp(id, block-idstr, sizeof(id))) -return memory_region_get_ram_ptr(block-mr) + offset; +if (!strncmp(id, block-idstr, sizeof(id))) { +if (is_slave()) { +migration_bitmap_set_dirty(block-mr-ram_addr + offset); +return memory_region_get_ram_cache_ptr(block-mr, block) + + offset; +} else { +return memory_region_get_ram_ptr(block-mr) + offset; +} +} } error_report(Can't find block %s!, id); @@ -1054,11 +1067,13 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size) } } +static void ram_flush_cache(void); static int ram_load(QEMUFile *f, void *opaque, int version_id) { ram_addr_t addr; int flags, ret = 0; static uint64_t seq_iter; +bool need_flush = false; seq_iter++; @@ -1121,6 +1136,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) break; } +need_flush = true; ch = qemu_get_byte(f); ram_handle_compressed(host, ch, TARGET_PAGE_SIZE); } else if (flags RAM_SAVE_FLAG_PAGE) { @@ -1133,6 +1149,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) break; } +need_flush = true; qemu_get_buffer(f, host, TARGET_PAGE_SIZE); } else if (flags RAM_SAVE_FLAG_XBZRLE) { void *host = host_from_stream_offset(f, addr, flags); @@ -1148,6 +1165,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) ret = -EINVAL; break; } +need_flush = true; } else if (flags RAM_SAVE_FLAG_HOOK) { ram_control_load_hook(f, flags); } else if (flags RAM_SAVE_FLAG_EOS) { @@ -1161,11 +1179,141 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) ret = qemu_file_get_error(f); } +if (!ret is_slave() need_flush) { +ram_flush_cache(); +} + DPRINTF(Completed load of VM with exit code %d seq iteration % PRIu64 \n, ret, seq_iter); return ret; } +/* + * colo cache: this is for secondary VM, we cache the whole + * memory of the secondary VM. + */ +void create_and_init_ram_cache(void) +{ +/* + * called after first migration + */ +RAMBlock *block; +int64_t ram_cache_pages = last_ram_offset() TARGET_PAGE_BITS; + +QTAILQ_FOREACH(block, ram_list.blocks, next) { +block-host_cache = g_malloc(block-length); +memcpy(block-host_cache, block-host, block-length); +} + +migration_bitmap = bitmap_new(ram_cache_pages); +migration_dirty_pages = 0; +memory_global_dirty_log_start(); +} + +void release_ram_cache(void) +{ +RAMBlock *block; + +if (migration_bitmap) { +memory_global_dirty_log_stop(); +g_free(migration_bitmap); +migration_bitmap = NULL; +} + +QTAILQ_FOREACH(block, ram_list.blocks, next) { +g_free(block-host_cache); +} +} + +static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block) +{ + if (mr-alias) { +
[Qemu-devel] [RFC PATCH 10/17] COLO ctl: introduce is_slave() and is_master()
is_slaver is to determine whether the QEMU instance is a slaver(migration target) at runtime. is_master is to determine whether the QEMU instance is a master(migration starter) at runtime. This 2 APIs will be used later. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- migration-colo.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/migration-colo.c b/migration-colo.c index 802f8b0..2699e77 100644 --- a/migration-colo.c +++ b/migration-colo.c @@ -187,6 +187,12 @@ static const QEMUFileOps colo_read_ops = { /* save */ +static __attribute__((unused)) bool is_master(void) +{ +MigrationState *s = migrate_get_current(); +return (s-state == MIG_STATE_COLO); +} + static void *colo_thread(void *opaque) { MigrationState *s = opaque; @@ -275,6 +281,11 @@ void colo_init_checkpointer(MigrationState *s) static Coroutine *colo; +static __attribute__((unused)) bool is_slave(void) +{ +return colo != NULL; +} + /* * return: * 0: start a checkpoint -- 1.9.1
Re: [Qemu-devel] [RFC PATCH 03/17] COLO migration: add a migration capability 'colo'
On 07/23/2014 08:25 AM, Yang Hongyang wrote: Add a migration capability 'colo'. If this capability is on, The migration will never end, and the VM will be continuously checkpointed. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- include/qapi/qmp/qerror.h | 3 +++ migration.c | 6 ++ qapi-schema.json | 5 - 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h index 902d1a7..226b805 100644 --- a/include/qapi/qmp/qerror.h +++ b/include/qapi/qmp/qerror.h @@ -166,4 +166,7 @@ void qerror_report_err(Error *err); #define QERR_SOCKET_CREATE_FAILED \ ERROR_CLASS_GENERIC_ERROR, Failed to create socket +#define QERR_COLO_UNSUPPORTED \ +ERROR_CLASS_GENERIC_ERROR, COLO is not currently supported, please rerun configure with --enable-colo option in order to support COLO feature Unless you plan on using this message in more than one place, we prefer that you don't add new #defines here. Instead, just use error_setg with the message inline. +++ b/qapi-schema.json @@ -491,10 +491,13 @@ # @auto-converge: If enabled, QEMU will automatically throttle down the guest # to speed up convergence of RAM migration. (since 1.6) # +# @colo: The migration will never end, and the VM will instead be continuously +#checkpointed. The feature is disabled by default. (since 2.1) You missed 2.1. This has to be since 2.2. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 1/7] hw/misc/platform_devices: helpers for dynamic instantiation of platform devices
On 07/08/2014 03:43 PM, Alexander Graf wrote: On 07.07.14 09:08, Eric Auger wrote: This new module implements routines which help in dynamic instantiation of sysbus devices. Machine files can use those generic routines. --- Dynamic sysbus device allocation fully written by Alex Graf. [Eric Auger] Those functions were initially in ppc e500 machine file. Now moved to a separate module. PPCE500Params is replaced by a generic struct named PlatformParams Signed-off-by: Alexander Graf ag...@suse.de Signed-off-by: Eric Auger eric.au...@linaro.org --- hw/misc/Makefile.objs | 1 + hw/misc/platform_devices.c | 217 + include/hw/misc/platform_devices.h | 61 +++ 3 files changed, 279 insertions(+) create mode 100644 hw/misc/platform_devices.c create mode 100644 include/hw/misc/platform_devices.h diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs index e47fea8..d081606 100644 --- a/hw/misc/Makefile.objs +++ b/hw/misc/Makefile.objs @@ -40,3 +40,4 @@ obj-$(CONFIG_SLAVIO) += slavio_misc.o obj-$(CONFIG_ZYNQ) += zynq_slcr.o obj-$(CONFIG_PVPANIC) += pvpanic.o +obj-y += platform_devices.o diff --git a/hw/misc/platform_devices.c b/hw/misc/platform_devices.c new file mode 100644 index 000..96ab272 --- /dev/null +++ b/hw/misc/platform_devices.c @@ -0,0 +1,217 @@ +#include hw/misc/platform_devices.h +#include hw/sysbus.h +#include qemu/error-report.h + +#define PAGE_SHIFT 12 + +int sysbus_device_create_devtree(Object *obj, void *opaque) +{ +PlatformDevtreeData *data = opaque; +Object *dev; +SysBusDevice *sbdev; +bool matched = false; + +dev = object_dynamic_cast(obj, TYPE_SYS_BUS_DEVICE); +sbdev = (SysBusDevice *)dev; + +if (!sbdev) { +/* Container, traverse it for children */ +return object_child_foreach(obj, sysbus_device_create_devtree, data); +} + +if (!matched) { +error_report(Device %s is not supported by this machine yet., + qdev_fw_name(DEVICE(dev))); +exit(1); +} + +return 0; +} + +void platform_bus_create_devtree(PlatformParams *params, void *fdt, +const char *mpic) +{ +gchar *node = g_strdup_printf(/platform@%PRIx64, + params-platform_bus_base); +const char platcomp[] = qemu,platform\0simple-bus; +PlatformDevtreeData data; +Object *container; +uint64_t addr = params-platform_bus_base; +uint64_t size = params-platform_bus_size; +int irq_start = params-platform_bus_first_irq; + +/* Create a /platform node that we can put all devices into */ + +qemu_fdt_add_subnode(fdt, node); +qemu_fdt_setprop(fdt, node, compatible, platcomp, sizeof(platcomp)); + +/* Our platform bus region is less than 32bit big, so 1 cell is enough for + address and size */ +qemu_fdt_setprop_cells(fdt, node, #size-cells, 1); +qemu_fdt_setprop_cells(fdt, node, #address-cells, 1); +qemu_fdt_setprop_cells(fdt, node, ranges, 0, addr 32, addr, size); + +qemu_fdt_setprop_phandle(fdt, node, interrupt-parent, mpic); + +/* Loop through all devices and create nodes for known ones */ +data.fdt = fdt; +data.mpic = mpic; +data.irq_start = irq_start; +data.node = node; + +container = container_get(qdev_get_machine(), /peripheral); +sysbus_device_create_devtree(container, data); +container = container_get(qdev_get_machine(), /peripheral-anon); +sysbus_device_create_devtree(container, data); + +g_free(node); +} Device trees are pretty platform (and even machine) specific. Just to give you an example - the interrupt specifier on most e500 systems really is 4 cells big: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt#n80 | Interrupt specifiers consists of 4 cells encoded as follows: 1st-cell interrupt-number Identifies the interrupt source. The meaning depends on the type of interrupt. Note: If the interrupt-type cell is undefined (i.e. #interrupt-cells = 2), this cell should be interpreted the same as for interrupt-type 0-- i.e. an external or normal SoC device interrupt. 2nd-cell level-sense information, encoded as follows: 0 = low-to-high edge triggered 1 = active low level-sensitive 2 = active high level-sensitive 3 = high-to-low edge triggered 3rd-cell interrupt-type The following types are supported: 0 = external or normal SoC device interrupt The interrupt-number cell contains the
Re: [Qemu-devel] [PATCH 4/7] hw/arm/virt: Support dynamically spawned sysbus devices
On 07/08/2014 03:51 PM, Alexander Graf wrote: On 07.07.14 09:08, Eric Auger wrote: Allows sysbus devices to be instantiated from command line by using -device option --- Inspired from what Alex Graf did in ppc e500 https://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00012.html Signed-off-by: Alexander Graf ag...@suse.de Signed-off-by: Eric Auger eric.au...@linaro.org --- hw/arm/virt.c | 58 +- 1 file changed, 57 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index eeecdbf..3a21db4 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -40,6 +40,8 @@ #include exec/address-spaces.h #include qemu/bitops.h #include qemu/error-report.h +#include hw/misc/platform_devices.h +#include hw/vfio/vfio-platform.h #define NUM_VIRTIO_TRANSPORTS 32 @@ -57,6 +59,14 @@ #define GIC_FDT_IRQ_PPI_CPU_START 8 #define GIC_FDT_IRQ_PPI_CPU_WIDTH 8 +#define MACHVIRT_PLATFORM_BASE 0xa004000 That's an odd address for a 128MB window. Can you make it 128MB aligned? Maybe move the virtio region behind this one? Yes you're right. I didn't pay attention to that. Now we have to find a hole agreed with everybody if that's feasible ;-) With a bit of smartness we don't need a virtio-mmio region with this patch set anymore btw. We could just generate the virtio-mmio devices on our platform bus on the fly. +#define MACHVIRT_PLATFORM_HOLE (128ULL * 1024 * 1024) /* 128 MB */ As Scott mentioned in the e500 review round, hole is an odd name ;). OK I will rename that. Alex
Re: [Qemu-devel] [PATCH] target-i386/FPU: wrong conversion infinity from float80 to int32/int64
I'm understood. So, am I right? From: Dmitry Poletaev poletaev-q...@yandex.ru Signed-off-by: Dmitry Poletaev poletaev-q...@yandex.ru --- target-i386/fpu_helper.c | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/target-i386/fpu_helper.c b/target-i386/fpu_helper.c index 1b2900d..c4fdad8 100644 --- a/target-i386/fpu_helper.c +++ b/target-i386/fpu_helper.c @@ -251,16 +251,31 @@ int32_t helper_fist_ST0(CPUX86State *env) int32_t helper_fistl_ST0(CPUX86State *env) { int32_t val; - +signed char old_exp_flags; + +old_exp_flags = env-fp_status.float_exception_flags; +env-fp_status.float_exception_flags = 0; val = floatx80_to_int32(ST0, env-fp_status); +if (env-fp_status.float_exception_flags FPUS_IE) { +val = 0x8000; +} +env-fp_status.float_exception_flags |= old_exp_flags; return val; } int64_t helper_fistll_ST0(CPUX86State *env) { int64_t val; - -val = floatx80_to_int64(ST0, env-fp_status); +signed char old_exp_flags; + +old_exp_flags = env-fp_status.float_exception_flags; +env-fp_status.float_exception_flags = 0; + +val = floatx80_to_int64(ST0, env-fp_status); +if (env-fp_status.float_exception_flags FPUS_IE) { +val = 0x8000; +} +env-fp_status.float_exception_flags |= old_exp_flags; return val; } -- 1.8.4.msysgit.0 23.07.2014, 16:42, Peter Maydell peter.mayd...@linaro.org: On 23 July 2014 12:55, Dmitry Poletaev poletaev-q...@yandex.ru wrote: 14.07.2014, 18:59, Peter Maydell peter.mayd...@linaro.org: Since softfloat's status flags are sticky ... What does it mean? Sticky here means that the status flags accumulate the status from a sequence of operations: a softfloat function will set the flag if the relevant exception occurred, but if the exceptional condition did not happen then the flag will be left at whatever its preceding value was. So you can't just say if the flag is set then the last operation I did set it, because it might have been set by some operation before that. (That is, once a bit gets set in the flags word it sticks and doesn't go away.) This matches the IEEE mandated behaviour for floating point exception flags, which is why we do it. thanks -- PMM
[Qemu-devel] [Bug 1347387] [NEW] while i was created the new virtual machine using qemu the following error was shown in fedora version 20
Public bug reported: [root@localhost pkgs]# qemu-img create virtualdisk.img 100M qemu-img: symbol lookup error: qemu-img: undefined symbol: glfs_discard_async [root@localhost pkgs]# qemu-i386 create virtualdisk.img 100M Error while loading create: No such file or directory [root@localhost pkgs]# rpm -qa qemu-kvm libvirt qemu-kvm-1.6.2-6.fc20.x86_64 libvirt-1.1.3.5-2.fc20.x86_64 [root@localhost pkgs]# [root@localhost pkgs]# rpm -qa|grep *qemu* qemu-system-m68k-1.6.2-6.fc20.x86_64 qemu-kvm-1.6.2-6.fc20.x86_64 qemu-system-microblaze-1.6.2-6.fc20.x86_64 ipxe-roms-qemu-20130517-3.gitc4bce43.fc20.noarch qemu-common-1.6.2-6.fc20.x86_64 qemu-system-sh4-1.6.2-6.fc20.x86_64 qemu-system-sparc-1.6.2-6.fc20.x86_64 qemu-system-lm32-1.6.2-6.fc20.x86_64 qemu-img-1.6.2-6.fc20.x86_64 qemu-system-s390x-1.6.2-6.fc20.x86_64 qemu-system-cris-1.6.2-6.fc20.x86_64 qemu-1.6.2-6.fc20.x86_64 qemu-system-xtensa-1.6.2-6.fc20.x86_64 qemu-system-moxie-1.6.2-6.fc20.x86_64 qemu-system-ppc-1.6.2-6.fc20.x86_64 libvirt-daemon-driver-qemu-1.1.3.5-2.fc20.x86_64 qemu-system-mips-1.6.2-6.fc20.x86_64 qemu-system-alpha-1.6.2-6.fc20.x86_64 qemu-guest-agent-1.6.1-2.fc20.x86_64 qemu-user-1.6.2-6.fc20.x86_64 qemu-system-x86-1.6.2-6.fc20.x86_64 qemu-system-arm-1.6.2-6.fc20.x86_64 qemu-system-unicore32-1.6.2-6.fc20.x86_64 qemu-system-or32-1.6.2-6.fc20.x86_64 [root@localhost pkgs]# ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1347387 Title: while i was created the new virtual machine using qemu the following error was shown in fedora version 20 Status in QEMU: New Bug description: [root@localhost pkgs]# qemu-img create virtualdisk.img 100M qemu-img: symbol lookup error: qemu-img: undefined symbol: glfs_discard_async [root@localhost pkgs]# qemu-i386 create virtualdisk.img 100M Error while loading create: No such file or directory [root@localhost pkgs]# rpm -qa qemu-kvm libvirt qemu-kvm-1.6.2-6.fc20.x86_64 libvirt-1.1.3.5-2.fc20.x86_64 [root@localhost pkgs]# [root@localhost pkgs]# rpm -qa|grep *qemu* qemu-system-m68k-1.6.2-6.fc20.x86_64 qemu-kvm-1.6.2-6.fc20.x86_64 qemu-system-microblaze-1.6.2-6.fc20.x86_64 ipxe-roms-qemu-20130517-3.gitc4bce43.fc20.noarch qemu-common-1.6.2-6.fc20.x86_64 qemu-system-sh4-1.6.2-6.fc20.x86_64 qemu-system-sparc-1.6.2-6.fc20.x86_64 qemu-system-lm32-1.6.2-6.fc20.x86_64 qemu-img-1.6.2-6.fc20.x86_64 qemu-system-s390x-1.6.2-6.fc20.x86_64 qemu-system-cris-1.6.2-6.fc20.x86_64 qemu-1.6.2-6.fc20.x86_64 qemu-system-xtensa-1.6.2-6.fc20.x86_64 qemu-system-moxie-1.6.2-6.fc20.x86_64 qemu-system-ppc-1.6.2-6.fc20.x86_64 libvirt-daemon-driver-qemu-1.1.3.5-2.fc20.x86_64 qemu-system-mips-1.6.2-6.fc20.x86_64 qemu-system-alpha-1.6.2-6.fc20.x86_64 qemu-guest-agent-1.6.1-2.fc20.x86_64 qemu-user-1.6.2-6.fc20.x86_64 qemu-system-x86-1.6.2-6.fc20.x86_64 qemu-system-arm-1.6.2-6.fc20.x86_64 qemu-system-unicore32-1.6.2-6.fc20.x86_64 qemu-system-or32-1.6.2-6.fc20.x86_64 [root@localhost pkgs]# To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1347387/+subscriptions
[Qemu-devel] [Bug 1347555] [NEW] qemu build failure, hxtool is a bash script, not a /bin/sh script
Public bug reported: hxtool (part of the early build process) is a bash script. Running it with /bin/sh yields a syntax error on line 10: 10 STEXI*|ETEXI*|SQMP*|EQMP*) flag=$(($flag^1)) $(( expr )) is a bash extension, not part of /bin/sh. Note that replacing the sh in the first line in hxtool with /bin/bash does not help, because the script is run manually from the Makefile with sh: 154 $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h $ $@, GEN $@) The fix is to change those lines to 154 $(call quiet-command,bash $(SRC_PATH)/scripts/hxtool -h $ $@, GEN $@) (there are five or so). ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1347555 Title: qemu build failure, hxtool is a bash script, not a /bin/sh script Status in QEMU: New Bug description: hxtool (part of the early build process) is a bash script. Running it with /bin/sh yields a syntax error on line 10: 10 STEXI*|ETEXI*|SQMP*|EQMP*) flag=$(($flag^1)) $(( expr )) is a bash extension, not part of /bin/sh. Note that replacing the sh in the first line in hxtool with /bin/bash does not help, because the script is run manually from the Makefile with sh: 154 $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h $ $@, GEN $@) The fix is to change those lines to 154 $(call quiet-command,bash $(SRC_PATH)/scripts/hxtool -h $ $@, GEN $@) (there are five or so). To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1347555/+subscriptions
Re: [Qemu-devel] [PATCH 5/7] hw/core/sysbus: add fdt_add_node method
On 07/08/2014 03:52 PM, Alexander Graf wrote: On 07.07.14 09:08, Eric Auger wrote: This method is meant to be called on sysbus device dynamic instantiation (-device option). Devices that support this kind of instantiation must implement this method. Signed-off-by: Eric Auger eric.au...@linaro.org For the reason I stated earlier, I don't think it's a good idea to put device tree code into our device models. Hi Alex, I would propose we discuss that topic during next KVM call if you are available. Hope Peter will be available to join too. Because I feel stuck between not putting things in the machine file (1) - obviously we could put them in a helper module (2) - and not putting them in the device (3). Whatever the solution I fear we are going to pollute something: Any time a new device wants to support dynamic instantiation, we would need to modify the machine file or the helper module with 1 and 2 resp. In case we put it in the device we pollute this latter... My hope was that quite few QEMU platform devices would need to support that feature and hence would need to implement this dt node generation method. To me dynamic instantiation of platform device was not the mainstream solution. Then there is the fundamental question of technical feasibility of devising a generic PlatformParams that match all the specialization needs? Here I miss experience. In case we know the machine type and a small set of additional fields couldn't we do the adaptations you talked about, related to IRQs? Best Regards Eric Alex
Re: [Qemu-devel] [PATCH 7/7] hw/misc/platform_devices: Add platform_bus_base to PlatformDevtreeData
On 07/08/2014 03:53 PM, Alexander Graf wrote: On 07.07.14 09:08, Eric Auger wrote: The base address of the platform bus sometimes is used to build the reg property. --- Actually I did not succeed in doing it another way with Calxeda xgmac. If someone knows how to do without, please advise. Not sure I understand. The regs properties live inside the parent's ranges. So in device tree the only thing that should be aware of the bus offset is the platform bus node, no? yes I full agree with you as I mentioned in a previous email. I tried to use offset instead of range but I never succeeded in making it work. Maybe a syntax issue, ... I need to spend some more time on it/ get some help and fix that ... BR Eric Alex
Re: [Qemu-devel] [RFC PATCH 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
On 07/23/2014 08:25 AM, Yang Hongyang wrote: Virtual machine (VM) replication is a well known technique for providing application-agnostic software-implemented hardware fault tolerance non-stop service. COLO is a high availability solution. Both primary VM (PVM) and secondary VM (SVM) run in parallel. They receive the same request from client, and generate response in parallel too. If the response packets from PVM and SVM are identical, they are released immediately. Otherwise, a VM checkpoint (on demand) is conducted. The idea is presented in Xen summit 2012, and 2013, and academia paper in SOCC 2013. It's also presented in KVM forum 2013: http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf Please refer to above document for detailed information. Please also refer to previous posted RFC proposal: http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html The patchset is also hosted on github: https://github.com/macrosheep/qemu/tree/colo_v0.1 This patchset is RFC, implements the frame of colo, without failover and nic/disk replication. But it is ready for demo the COLO idea above QEMU-Kvm. Steps using this patchset to get an overview of COLO: 1. configure the source with --enable-colo option Code that has to be opt-in tends to bitrot, because people don't configure their build-bots to opt in. What sort of penalties does opting in cause to the code if colo is not used? I'd much rather make the default to compile colo unless configured --disable-colo. Are there any pre-req libraries required for it to work? That would be the only reason to make the default of on or off conditional, rather than defaulting to on. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [RFC PATCH 02/17] COLO: introduce an api colo_supported() to indicate COLO support
On 07/23/2014 08:25 AM, Yang Hongyang wrote: introduce an api colo_supported() to indicate COLO support, returns true if colo supported(configured with --enable-colo). Space before () in English sentences: s/supported(configured/supported (configured/ As I mentioned in the cover letter, defaulting to off is probably a bad idea; I'd rather default to on or even make it unconditional if it doesn't negatively affect the code base when not used. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- Makefile.objs | 1 + include/migration/migration-colo.h | 18 ++ migration-colo.c | 16 stubs/Makefile.objs| 1 + stubs/migration-colo.c | 16 5 files changed, 52 insertions(+) create mode 100644 include/migration/migration-colo.h create mode 100644 migration-colo.c create mode 100644 stubs/migration-colo.c -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [RFC PATCH 12/17] COLO ctl: add a RunState RUN_STATE_COLO
On 07/23/2014 08:25 AM, Yang Hongyang wrote: Guest will enter this state when paused to save/resore VM state s/resore/restore/ under colo checkpoint. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- qapi-schema.json | 4 +++- vl.c | 8 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/qapi-schema.json b/qapi-schema.json index 807f5a2..b42171c 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -145,12 +145,14 @@ # @watchdog: the watchdog action is configured to pause and has been triggered # # @guest-panicked: guest has been panicked as a result of guest OS panic +# +# @colo: guest is paused to save/restore VM state under colo checkpoint Missing a '(since 2.2)' designation. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [Bug 1347555] [NEW] qemu build failure, hxtool is a bash script, not a /bin/sh script
On 07/23/2014 04:21 AM, Felix von Leitner wrote: Public bug reported: hxtool (part of the early build process) is a bash script. Running it with /bin/sh yields a syntax error on line 10: 10 STEXI*|ETEXI*|SQMP*|EQMP*) flag=$(($flag^1)) $(( expr )) is a bash extension, not part of /bin/sh. Wrong. $(( expr )) is mandated by POSIX. What system are you on where /bin/sh is not POSIX? (Solaris is the only platform where /bin/sh does not try to be POSIX-compliant, but who uses that for qemu?) What is the actual syntax error you are seeing? Is this a bug in dash on your distribution? I can't get dash to fail for me on Fedora: $ dash -c 'f=1; f=$(($f^1)); echo $f' 0 $ dash -n scripts/hxtool; echo $? 0 -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
[Qemu-devel] [Bug 1347555] Re: qemu build failure, hxtool is a bash script, not a /bin/sh script
I actually have bash installed as /bin/sh and /bin/bash. But I also have heirloom sh installed, which installs itself as /sbin/sh, and that happened to be first in my $PATH. Since the makefiles use sh script to run the scripts, that called the heirloom sh. http://heirloom.sourceforge.net/sh.html It is, it turns out, derived from OpenSolaris. So there you go :-) When I delete /sbin/sh, qemu builds. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1347555 Title: qemu build failure, hxtool is a bash script, not a /bin/sh script Status in QEMU: New Bug description: hxtool (part of the early build process) is a bash script. Running it with /bin/sh yields a syntax error on line 10: 10 STEXI*|ETEXI*|SQMP*|EQMP*) flag=$(($flag^1)) $(( expr )) is a bash extension, not part of /bin/sh. Note that replacing the sh in the first line in hxtool with /bin/bash does not help, because the script is run manually from the Makefile with sh: 154 $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h $ $@, GEN $@) The fix is to change those lines to 154 $(call quiet-command,bash $(SRC_PATH)/scripts/hxtool -h $ $@, GEN $@) (there are five or so). To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1347555/+subscriptions
Re: [Qemu-devel] [Bug 1347555] Re: qemu build failure, hxtool is a bash script, not a /bin/sh script
On 07/23/2014 10:13 AM, Felix von Leitner wrote: I actually have bash installed as /bin/sh and /bin/bash. But I also have heirloom sh installed, which installs itself as /sbin/sh, and that happened to be first in my $PATH. Since the makefiles use sh script to run the scripts, that called the heirloom sh. http://heirloom.sourceforge.net/sh.html It is, it turns out, derived from OpenSolaris. So there you go :-) When I delete /sbin/sh, qemu builds. Then the bug is not in qemu, but in your environment. Installing known-broken heirloom where it can be found first on a PATH search for sh is just asking for problems, not just with qemu, but with all SORTS of programs that expect POSIX semantics from a Linux /bin/sh. Rather than change the Makefile to invoke the script with bash, we could instead bend over backwards to rewrite the script in a way that works with non-POSIX shells (as in, flag=`expr $flag ^ 1`), but that feels backwards to me. Until someone is actively worried about porting qemu to a true Solaris environment, rather than just an heirloom-as-/bin/sh Linux environment, I don't think it's worth the effort. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
[Qemu-devel] [PATCH 0/2] pc: fix /etc/acpi/tables size in fw_cfg for -M pc-2.0
Changing the ACPI table size causes migration to break, and the memory hotplug work opened our eyes on how horribly we were breaking things in 2.0 already. Unfortunately when reviewing the design I assumed incorrectly that all tables would be placed in separate fw_cfg files. This would have been better, because you can always move stuff to a new SSDT (and thus a new file), keeping the sizes under control. Hard-code 64k as the maximum ACPI table size; for -M pc-i440fx-2.0 and -M pc-i440fx-1.7 compute the payload size of QEMU 2.0 and always use that one. This works always for QEMU 2.0, and also for 1.7 except for a few values of -smp maxcpus. The first patch is needed to shrink the ACPI tables and make them smaller than they used to be in 2.0. Please test and ack. I'll do more testing tomorrow. Paolo Paolo Bonzini (2): acpi-dsdt: procedurally generate _PRT pc: hack for migration compatibility from QEMU 2.0 hw/i386/acpi-build.c | 61 +++--- hw/i386/acpi-dsdt.dsl | 90 ++- hw/i386/pc_piix.c | 20 hw/i386/pc_q35.c | 5 +++ include/hw/i386/pc.h | 1 + 5 files changed, 122 insertions(+), 55 deletions(-) -- 1.8.3.1
[Qemu-devel] [PATCH 1/2] acpi-dsdt: procedurally generate _PRT
This replaces the _PRT constant with a method that computes it. The problem is that the DSDT+SSDT have grown from 2.0 to 2.1, enough to cross the 8k barrier (we align the ACPI tables to 4k before putting them in fw_cfg). This causes problems with migration and the pc-2.0 machine type. The solution to the problem is to hardcode 64k as the limit, but this doesn't solve the bug with pc-2.0. The fix will be for QEMU 2.1 to use exactly the same size as QEMU 2.0 for the ACPI tables. First, however, we must make the actual AML size equal or smaller; to do this, rewrite _PRT in a way that saves over 1k of bytecode. Tested on Windows XP. Q35 already uses a method for _PRT so most guests should be okay. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-dsdt.dsl | 90 ++- 1 file changed, 39 insertions(+), 51 deletions(-) diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl index 3cc0ea0..6ba0170 100644 --- a/hw/i386/acpi-dsdt.dsl +++ b/hw/i386/acpi-dsdt.dsl @@ -181,57 +181,45 @@ DefinitionBlock ( Scope(\_SB) { Scope(PCI0) { -Name(_PRT, Package() { -/* PCI IRQ routing table, example from ACPI 2.0a specification, - section 6.2.8.1 */ -/* Note: we provide the same info as the PCI routing - table of the Bochs BIOS */ - -#define prt_slot(nr, lnk0, lnk1, lnk2, lnk3) \ -Package() { nr##, 0, lnk0, 0 }, \ -Package() { nr##, 1, lnk1, 0 }, \ -Package() { nr##, 2, lnk2, 0 }, \ -Package() { nr##, 3, lnk3, 0 } - -#define prt_slot0(nr) prt_slot(nr, LNKD, LNKA, LNKB, LNKC) -#define prt_slot1(nr) prt_slot(nr, LNKA, LNKB, LNKC, LNKD) -#define prt_slot2(nr) prt_slot(nr, LNKB, LNKC, LNKD, LNKA) -#define prt_slot3(nr) prt_slot(nr, LNKC, LNKD, LNKA, LNKB) - -prt_slot0(0x), -/* Device 1 is power mgmt device, and can only use irq 9 */ -prt_slot(0x0001, LNKS, LNKB, LNKC, LNKD), -prt_slot2(0x0002), -prt_slot3(0x0003), -prt_slot0(0x0004), -prt_slot1(0x0005), -prt_slot2(0x0006), -prt_slot3(0x0007), -prt_slot0(0x0008), -prt_slot1(0x0009), -prt_slot2(0x000a), -prt_slot3(0x000b), -prt_slot0(0x000c), -prt_slot1(0x000d), -prt_slot2(0x000e), -prt_slot3(0x000f), -prt_slot0(0x0010), -prt_slot1(0x0011), -prt_slot2(0x0012), -prt_slot3(0x0013), -prt_slot0(0x0014), -prt_slot1(0x0015), -prt_slot2(0x0016), -prt_slot3(0x0017), -prt_slot0(0x0018), -prt_slot1(0x0019), -prt_slot2(0x001a), -prt_slot3(0x001b), -prt_slot0(0x001c), -prt_slot1(0x001d), -prt_slot2(0x001e), -prt_slot3(0x001f), -}) +Method (_PRT, 0) { +Store(Package(128) {}, Local0) +Store(Zero, Local1) +While(LLess(Local1, 128)) { +// slot = pin 2 +Store(ShiftRight(Local1, 2), Local2) + +// lnk = (slot + pin) 3 +Store(And(Add(Local1, Local2), 3), Local3) +If (LEqual(Local3, 0)) { +Store(Package(4) { Zero, Zero, LNKD, Zero }, Local4) +} +If (LEqual(Local3, 1)) { +// device 1 is the power-management device, needs SCI +If (LEqual(Local1, 4)) { +Store(Package(4) { Zero, Zero, LNKS, Zero }, Local4) +} Else { +Store(Package(4) { Zero, Zero, LNKA, Zero }, Local4) +} +} +If (LEqual(Local3, 2)) { +Store(Package(4) { Zero, Zero, LNKB, Zero }, Local4) +} +If (LEqual(Local3, 3)) { +Store(Package(4) { Zero, Zero, LNKC, Zero }, Local4) +} + +// Complete the interrupt routing entry: +//Package(4) { 0x[slot], [pin], [link], 0) } + +Store(Or(ShiftLeft(Local2, 16), 0x), Index(Local4, 0)) +Store(And(Local1, 3),Index(Local4, 1)) +Store(Local4,Index(Local0, Local1)) + +Increment(Local1) +} + +Return(Local0) +} } Field(PCI0.ISA.P40C, ByteAcc, NoLock, Preserve) { -- 1.8.3.1
[Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0
Changing the ACPI table size causes migration to break, and the memory hotplug work opened our eyes on how horribly we were breaking things in 2.0 already. The ACPI table size is rounded to the next 4k, which one would think gives some headroom. In practice this is not the case, because the user can control the ACPI table size (each CPU adds 105 bytes) and so some -smp values will break the 4k boundary and fail to migrate. Similarly, PCI bridges add ~1870 bytes to the SSDT. To fix this, hard-code 64k as the maximum ACPI table size, which (despite being an order of magnitude smaller than 640k) should be enough for everyone. To fix migration from QEMU 2.0, compute the payload size of QEMU 2.0 and always use that one. The previous patch shrunk the ACPI tables enough that the QEMU 2.0 size should always be enough. Non-AML tables can change depending on the configuration (especially MADT, SRAT, HPET) but they remain the same between QEMU 2.0 and 2.1, so we only compute our padding based on the sizes of the SSDT and DSDT. Migration from QEMU 1.7 should work for guests that have a number of CPUs other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140, and that have no PCI bridges. It was already broken from QEMU 1.7 to QEMU 2.0 in the same way, though. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-build.c | 61 hw/i386/pc_piix.c| 20 + hw/i386/pc_q35.c | 5 + include/hw/i386/pc.h | 1 + 4 files changed, 83 insertions(+), 4 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index ebc5f03..7373d93 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -25,7 +25,9 @@ #include glib.h #include qemu-common.h #include qemu/bitmap.h +#include qemu/osdep.h #include qemu/range.h +#include qemu/error-report.h #include hw/pci/pci.h #include qom/cpu.h #include hw/i386/pc.h @@ -87,6 +89,8 @@ typedef struct AcpiBuildPciBusHotplugState { struct AcpiBuildPciBusHotplugState *parent; } AcpiBuildPciBusHotplugState; +unsigned bsel_alloc; + static void acpi_get_dsdt(AcpiMiscInfo *info) { uint16_t *applesmc_sta; @@ -759,8 +763,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque) static void acpi_set_pci_info(void) { PCIBus *bus = find_i440fx(); /* TODO: Q35 support */ -unsigned bsel_alloc = 0; +assert(bsel_alloc == 0); if (bus) { /* Scan all PCI buses. Set property to enable acpi based hotplug. */ pci_for_each_bus_depth_first(bus, acpi_set_bsel, NULL, bsel_alloc); @@ -1440,13 +1444,14 @@ static void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) { GArray *table_offsets; -unsigned facs, dsdt, rsdt; +unsigned facs, ssdt, dsdt, rsdt; AcpiCpuInfo cpu; AcpiPmInfo pm; AcpiMiscInfo misc; AcpiMcfgInfo mcfg; PcPciInfo pci; uint8_t *u; +size_t aml_len = 0; acpi_get_cpu_info(cpu); acpi_get_pm_info(pm); @@ -1474,13 +1479,20 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) dsdt = tables-table_data-len; build_dsdt(tables-table_data, tables-linker, misc); +/* Count the size of the DSDT and SSDT, we will need it for legacy + * sizing of ACPI tables. + */ +aml_len += tables-table_data-len - dsdt; + /* ACPI tables pointed to by RSDT */ acpi_add_table(table_offsets, tables-table_data); build_fadt(tables-table_data, tables-linker, pm, facs, dsdt); +ssdt = tables-table_data-len; acpi_add_table(table_offsets, tables-table_data); build_ssdt(tables-table_data, tables-linker, cpu, pm, misc, pci, guest_info); +aml_len += tables-table_data-len - ssdt; acpi_add_table(table_offsets, tables-table_data); build_madt(tables-table_data, tables-linker, cpu, guest_info); @@ -1513,12 +1525,53 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) /* RSDP is in FSEG memory, so allocate it separately */ build_rsdp(tables-rsdp, tables-linker, rsdt); -/* We'll expose it all to Guest so align size to reduce +/* We'll expose it all to Guest so we want to reduce * chance of size changes. * RSDP is small so it's easy to keep it immutable, no need to * bother with alignment. + * + * We used to align the tables to 4k, but of course this would + * too simple to be enough. 4k turned out to be too small an + * alignment very soon, and in fact it is almost impossible to + * keep the table size stable for all (max_cpus, max_memory_slots) + * combinations. So the table size is always 64k for pc-2.1 and + * we give an error if the table grows beyond that limit. + * + * We still have the problem of migrating from -M pc-2.0. For that, + * we exploit the fact that QEMU 2.1 generates _smaller_ tables than 2.0 + * and we can always pad the smaller tables with zeros. We can then use + * the exact size of
Re: [Qemu-devel] [PATCH] target-i386/FPU: wrong conversion infinity from float80 to int32/int64
On 23 July 2014 16:04, Dmitry Poletaev poletaev-q...@yandex.ru wrote: I'm understood. So, am I right? Pretty much, except it's better to use the accessor functions get_float_exception_flags() and set_float_exception_flags(). +if (env-fp_status.float_exception_flags FPUS_IE) { +val = 0x8000; Also this constant needs a ULL suffix or it won't build on 32 bit hosts. thanks -- PMM
Re: [Qemu-devel] [Bug 1347555] Re: qemu build failure, hxtool is a bash script, not a /bin/sh script
On 23 July 2014 17:31, Eric Blake ebl...@redhat.com wrote: Rather than change the Makefile to invoke the script with bash, we could instead bend over backwards to rewrite the script in a way that works with non-POSIX shells (as in, flag=`expr $flag ^ 1`), but that feels backwards to me. Until someone is actively worried about porting qemu to a true Solaris environment, rather than just an heirloom-as-/bin/sh Linux environment, I don't think it's worth the effort. My view on this has always been we shouldn't assume bash, but we can assume POSIX shell semantics. (And also that we should assume /bin/sh is a POSIX shell, because it's the 21st century, and Solaris should just get with it :-)) thanks -- PMM
[Qemu-devel] [ANNOUNCE] QEMU 1.7.2 Stable released
Hi everyone, I am pleased to announce that the QEMU v1.7.2 stable release is now available at: http://wiki.qemu.org/download/qemu-1.7.2.tar.bz2 v1.7.2 is now tagged in the official qemu.git repository, and the stable-1.7 branch has been updated accordingly: http://git.qemu.org/?p=qemu.git;a=shortlog;h=refs/heads/stable-1.7 This release contains 155 build/bug fixes, including important security updates relating to untrusted guest image files and migration/savevm sources. See the changelog below for relevant CVEs and additional details. Thank you to everyone involved! CHANGELOG: adba377: Update VERSION for 1.7.2 release (Michael Roth) 8fde73e: Allow mismatched virtio config-len (Dr. David Alan Gilbert) 14d9fb0: pci: assign devfn to pci_dev before calling pci_device_iommu_address_space() (Le Tan) 53e4895: hw: Fix qemu_allocate_irqs() leaks (Andreas Färber) bb485bf: sdhci: Fix misuse of qemu_free_irqs() (Andreas Färber) 02835d5: vnc: Fix tight_detect_smooth_image() for lossless case (Markus Armbruster) 41ee918: qapi: zero-initialize all QMP command parameters (Michael Roth) 0c60b74: nbd: Shutdown socket before closing. (Hani Benhabiles) 25351f6: nbd: Close socket on negotiation failure. (Hani Benhabiles) cf392d2: nbd: Don't validate from and len in NBD_CMD_DISC. (Hani Benhabiles) 3c3d8c6: nbd: Don't export a block device with no medium. (Hani Benhabiles) 62c754e: virtio-serial: don't migrate the config space (Alexander Graf) 0fd14a5: virtio-net: byteswap virtio-net header (Cédric Le Goater) 7a3cd5a: target-i386: Filter FEAT_7_0_EBX TCG features too (Eduardo Habkost) 8a93721: coroutine-win32.c: Add noinline attribute to work around gcc bug (Peter Maydell) b47506f: KVM: Fix GSI number space limit (Alexander Graf) f0c609d: usb: Fix usb-bt-dongle initialization. (Hani Benhabiles) 79bd778: vhost: fix resource leak in error handling (Michael S. Tsirkin) 36afdba: scsi-disk: fix bug in scsi_block_new_request() introduced by commit 137745c (Ulrich Obergfell) 63bf1e0: rdma: bug fixes (Michael R. Hines) 23dbc56: qga: Fix handle fd leak in acquire_privilege() (Gonglei) 4041945: aio: fix qemu_bh_schedule() bh-ctx race condition (Stefan Hajnoczi) 5019106: s390x/css: handle emw correctly for tsch (Cornelia Huck) f784615: target-arm: Fix errors in writes to generic timer control registers (Peter Maydell) e34feec: tcg-i386: Fix win64 qemu store (Richard Henderson) ccb08f5: linux-user: Don't overrun guest buffer in sched_getaffinity (Peter Maydell) cb34d1e: qemu-img: Plug memory leak in convert command (Markus Armbruster) df9c108: block/sheepdog: Plug memory leak in sd_snapshot_create() (Markus Armbruster) d3cd48a: block/vvfat: Plug memory leak in read_directory() (Markus Armbruster) 501da93: block/vvfat: Plug memory leak in check_directory_consistency() (Markus Armbruster) 7267e51: block/qapi: Plug memory leak in dump_qobject() case QTYPE_QERROR (Markus Armbruster) d1775fe: blockdev: Plug memory leak in drive_init() (Markus Armbruster) d2b9874: blockdev: Plug memory leak in blockdev_init() (Markus Armbruster) c2fb0f2: cputlb: Fix regression with TCG interpreter (bug 1310324) (Stefan Weil) 26b5102: target-xtensa: fix cross-page jumps/calls at the end of TB (Max Filippov) 44564f8: virtio-scsi: Plug memory leak on virtio_scsi_push_event() error path (Markus Armbruster) 2f1eb04: qcow1: Stricter backing file length check (Kevin Wolf) b53d866: qcow1: Validate image size (CVE-2014-0223) (Kevin Wolf) 8b17eb6: qcow1: Validate L2 table size (CVE-2014-0222) (Kevin Wolf) e6c55cf: qcow1: Check maximum cluster size (Kevin Wolf) 41819e9: qcow1: Make padding in the header explicit (Kevin Wolf) 97a0e27: parallels: Sanity check for s-tracks (CVE-2014-0142) (Kevin Wolf) 750336b: parallels: Fix catalog size integer overflow (CVE-2014-0143) (Kevin Wolf) cfa8008: qcow2: Check maximum L1 size in qcow2_snapshot_load_tmp() (CVE-2014-0143) (Kevin Wolf) d99c4e2: qcow2: Fix L1 allocation size in qcow2_snapshot_load_tmp() (CVE-2014-0145) (Kevin Wolf) 641c3ec: qcow2: Fix copy_sectors() with VM state (Kevin Wolf) c2c5272: qcow2: Fix NULL dereference in qcow2_open() error path (CVE-2014-0146) (Kevin Wolf) 759d386: block: Limit request size (CVE-2014-0143) (Kevin Wolf) b6f7fbd: dmg: prevent chunk buffer overflow (CVE-2014-0145) (Stefan Hajnoczi) d400b5d: dmg: use uint64_t consistently for sectors and lengths (Stefan Hajnoczi) 758c484: dmg: sanitize chunk length and sectorcount (CVE-2014-0145) (Stefan Hajnoczi) 4b50bd7: dmg: use appropriate types when reading chunks (Stefan Hajnoczi) 4ee5b9c: dmg: drop broken bdrv_pread() loop (Stefan Hajnoczi) ad08cae: dmg: prevent out-of-bounds array access on terminator (Stefan Hajnoczi) dedf4a5: dmg: coding style and indentation cleanup (Stefan Hajnoczi) 3c6347c: qcow2: Fix new L1 table size check (CVE-2014-0143) (Kevin Wolf) e1c8770: qcow2: Protect against some integer overflows in bdrv_check (Kevin Wolf) c874837: qcow2: Fix types in
Re: [Qemu-devel] [RFC PATCH 07/17] COLO buffer: implement colo buffer as well as QEMUFileOps based on it
On 07/23/2014 08:25 AM, Yang Hongyang wrote: We need a buffer to store migration data. On save side: all saved data was write into colo buffer first, so that we can know s/was write/is written/ the total size of the migration data. this can also separate the data transmission from colo control data, we use colo control data over socket fd to synchronous both side's stat. On restore side: all migration data was read into colo buffer first, then load data from the buffer: If network error happens while data transmission, s/while/during/ the slaver can still functinal because the migration data are not yet s/slaver/slave/ s/functinal/function/ s/are/is/ loaded. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- migration-colo.c | 112 +++ 1 file changed, 112 insertions(+) +/* colo buffer */ + +#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL) +#define COLO_BUFFER_MAX_SIZE (1000*1000*1000*10ULL) Spaces around binary operators. + +typedef struct colo_buffer { For consistency with the rest of the code base, name this ColoBuffer, not colo_buffer. +uint8_t *data; +uint64_t used; +uint64_t freed; +uint64_t size; +} colo_buffer_t; HACKING says to NOT name types with a trailing _t. Just name the typedef ColoBuffer. +static void colo_buffer_destroy(void) +{ +if (colo_buffer.data) { +g_free(colo_buffer.data); +colo_buffer.data = NULL; g_free(NULL) behaves sanely, just make these two lines unconditional. +static void colo_buffer_extend(uint64_t len) +{ +if (len colo_buffer.size - colo_buffer.used) { +len = len + colo_buffer.used - colo_buffer.size; +len = ROUND_UP(len, COLO_BUFFER_BASE_SIZE) + COLO_BUFFER_BASE_SIZE; + +colo_buffer.size += len; +if (colo_buffer.size COLO_BUFFER_MAX_SIZE) { +error_report(colo_buffer overflow!\n); No trailing \n in error_report(). -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 1/2] acpi-dsdt: procedurally generate _PRT
On 07/23/14 18:37, Paolo Bonzini wrote: This replaces the _PRT constant with a method that computes it. The problem is that the DSDT+SSDT have grown from 2.0 to 2.1, enough to cross the 8k barrier (we align the ACPI tables to 4k before putting them in fw_cfg). This causes problems with migration and the pc-2.0 machine type. The solution to the problem is to hardcode 64k as the limit, but this doesn't solve the bug with pc-2.0. The fix will be for QEMU 2.1 to use exactly the same size as QEMU 2.0 for the ACPI tables. First, however, we must make the actual AML size equal or smaller; to do this, rewrite _PRT in a way that saves over 1k of bytecode. Tested on Windows XP. Q35 already uses a method for _PRT so most guests should be okay. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-dsdt.dsl | 90 ++- 1 file changed, 39 insertions(+), 51 deletions(-) diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl index 3cc0ea0..6ba0170 100644 --- a/hw/i386/acpi-dsdt.dsl +++ b/hw/i386/acpi-dsdt.dsl @@ -181,57 +181,45 @@ DefinitionBlock ( Scope(\_SB) { Scope(PCI0) { -Name(_PRT, Package() { -/* PCI IRQ routing table, example from ACPI 2.0a specification, - section 6.2.8.1 */ -/* Note: we provide the same info as the PCI routing - table of the Bochs BIOS */ - -#define prt_slot(nr, lnk0, lnk1, lnk2, lnk3) \ -Package() { nr##, 0, lnk0, 0 }, \ -Package() { nr##, 1, lnk1, 0 }, \ -Package() { nr##, 2, lnk2, 0 }, \ -Package() { nr##, 3, lnk3, 0 } - -#define prt_slot0(nr) prt_slot(nr, LNKD, LNKA, LNKB, LNKC) -#define prt_slot1(nr) prt_slot(nr, LNKA, LNKB, LNKC, LNKD) -#define prt_slot2(nr) prt_slot(nr, LNKB, LNKC, LNKD, LNKA) -#define prt_slot3(nr) prt_slot(nr, LNKC, LNKD, LNKA, LNKB) - -prt_slot0(0x), -/* Device 1 is power mgmt device, and can only use irq 9 */ -prt_slot(0x0001, LNKS, LNKB, LNKC, LNKD), -prt_slot2(0x0002), -prt_slot3(0x0003), -prt_slot0(0x0004), -prt_slot1(0x0005), -prt_slot2(0x0006), -prt_slot3(0x0007), -prt_slot0(0x0008), -prt_slot1(0x0009), -prt_slot2(0x000a), -prt_slot3(0x000b), -prt_slot0(0x000c), -prt_slot1(0x000d), -prt_slot2(0x000e), -prt_slot3(0x000f), -prt_slot0(0x0010), -prt_slot1(0x0011), -prt_slot2(0x0012), -prt_slot3(0x0013), -prt_slot0(0x0014), -prt_slot1(0x0015), -prt_slot2(0x0016), -prt_slot3(0x0017), -prt_slot0(0x0018), -prt_slot1(0x0019), -prt_slot2(0x001a), -prt_slot3(0x001b), -prt_slot0(0x001c), -prt_slot1(0x001d), -prt_slot2(0x001e), -prt_slot3(0x001f), -}) +Method (_PRT, 0) { +Store(Package(128) {}, Local0) +Store(Zero, Local1) +While(LLess(Local1, 128)) { +// slot = pin 2 +Store(ShiftRight(Local1, 2), Local2) + +// lnk = (slot + pin) 3 +Store(And(Add(Local1, Local2), 3), Local3) +If (LEqual(Local3, 0)) { +Store(Package(4) { Zero, Zero, LNKD, Zero }, Local4) +} +If (LEqual(Local3, 1)) { +// device 1 is the power-management device, needs SCI +If (LEqual(Local1, 4)) { +Store(Package(4) { Zero, Zero, LNKS, Zero }, Local4) +} Else { +Store(Package(4) { Zero, Zero, LNKA, Zero }, Local4) +} +} +If (LEqual(Local3, 2)) { +Store(Package(4) { Zero, Zero, LNKB, Zero }, Local4) +} +If (LEqual(Local3, 3)) { +Store(Package(4) { Zero, Zero, LNKC, Zero }, Local4) +} + +// Complete the interrupt routing entry: +//Package(4) { 0x[slot], [pin], [link], 0) } + +Store(Or(ShiftLeft(Local2, 16), 0x), Index(Local4, 0)) +Store(And(Local1, 3),Index(Local4, 1)) +Store(Local4,Index(Local0, Local1)) + +Increment(Local1) +} + +
Re: [Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0
On 07/23/14 18:37, Paolo Bonzini wrote: Changing the ACPI table size causes migration to break, and the memory hotplug work opened our eyes on how horribly we were breaking things in 2.0 already. The ACPI table size is rounded to the next 4k, which one would think gives some headroom. In practice this is not the case, because the user can control the ACPI table size (each CPU adds 105 bytes) and so some -smp values will break the 4k boundary and fail to migrate. Similarly, PCI bridges add ~1870 bytes to the SSDT. To fix this, hard-code 64k as the maximum ACPI table size, which (despite being an order of magnitude smaller than 640k) should be enough for everyone. To fix migration from QEMU 2.0, compute the payload size of QEMU 2.0 and always use that one. The previous patch shrunk the ACPI tables enough that the QEMU 2.0 size should always be enough. Non-AML tables can change depending on the configuration (especially MADT, SRAT, HPET) but they remain the same between QEMU 2.0 and 2.1, so we only compute our padding based on the sizes of the SSDT and DSDT. Migration from QEMU 1.7 should work for guests that have a number of CPUs other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140, and that have no PCI bridges. It was already broken from QEMU 1.7 to QEMU 2.0 in the same way, though. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-build.c | 61 hw/i386/pc_piix.c| 20 + hw/i386/pc_q35.c | 5 + include/hw/i386/pc.h | 1 + 4 files changed, 83 insertions(+), 4 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index ebc5f03..7373d93 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -25,7 +25,9 @@ #include glib.h #include qemu-common.h #include qemu/bitmap.h +#include qemu/osdep.h #include qemu/range.h +#include qemu/error-report.h #include hw/pci/pci.h #include qom/cpu.h #include hw/i386/pc.h @@ -87,6 +89,8 @@ typedef struct AcpiBuildPciBusHotplugState { struct AcpiBuildPciBusHotplugState *parent; } AcpiBuildPciBusHotplugState; +unsigned bsel_alloc; + static void acpi_get_dsdt(AcpiMiscInfo *info) { uint16_t *applesmc_sta; @@ -759,8 +763,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque) static void acpi_set_pci_info(void) { PCIBus *bus = find_i440fx(); /* TODO: Q35 support */ -unsigned bsel_alloc = 0; +assert(bsel_alloc == 0); if (bus) { /* Scan all PCI buses. Set property to enable acpi based hotplug. */ pci_for_each_bus_depth_first(bus, acpi_set_bsel, NULL, bsel_alloc); @@ -1440,13 +1444,14 @@ static void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) { GArray *table_offsets; -unsigned facs, dsdt, rsdt; +unsigned facs, ssdt, dsdt, rsdt; AcpiCpuInfo cpu; AcpiPmInfo pm; AcpiMiscInfo misc; AcpiMcfgInfo mcfg; PcPciInfo pci; uint8_t *u; +size_t aml_len = 0; acpi_get_cpu_info(cpu); acpi_get_pm_info(pm); @@ -1474,13 +1479,20 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) dsdt = tables-table_data-len; build_dsdt(tables-table_data, tables-linker, misc); +/* Count the size of the DSDT and SSDT, we will need it for legacy + * sizing of ACPI tables. + */ +aml_len += tables-table_data-len - dsdt; + /* ACPI tables pointed to by RSDT */ acpi_add_table(table_offsets, tables-table_data); build_fadt(tables-table_data, tables-linker, pm, facs, dsdt); +ssdt = tables-table_data-len; acpi_add_table(table_offsets, tables-table_data); build_ssdt(tables-table_data, tables-linker, cpu, pm, misc, pci, guest_info); +aml_len += tables-table_data-len - ssdt; acpi_add_table(table_offsets, tables-table_data); build_madt(tables-table_data, tables-linker, cpu, guest_info); @@ -1513,12 +1525,53 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) /* RSDP is in FSEG memory, so allocate it separately */ build_rsdp(tables-rsdp, tables-linker, rsdt); -/* We'll expose it all to Guest so align size to reduce +/* We'll expose it all to Guest so we want to reduce * chance of size changes. * RSDP is small so it's easy to keep it immutable, no need to * bother with alignment. + * + * We used to align the tables to 4k, but of course this would + * too simple to be enough. 4k turned out to be too small an + * alignment very soon, and in fact it is almost impossible to + * keep the table size stable for all (max_cpus, max_memory_slots) + * combinations. So the table size is always 64k for pc-2.1 and + * we give an error if the table grows beyond that limit. + * + * We still have the problem of migrating from -M pc-2.0. For that, + * we
[Qemu-devel] [PATCH] arm64: 64K pages and 1024MB guest
kvm_set_phys_mem doesn't work on arm64 with memory 1GB. It exits with: kvm_set_phys_mem: error registering slot: Invalid argument An example of the failing address and size are start_addr == 0x90011000 and size=0xaffef000. As you can see both of these are 4K aligned, not 64K aligned. At 1024MB or smaller qemu only makes one call to kvm_set_user_memory_region, so the start_addr and size are aligned by accident and the bug doesn't happen. The following patch makes things work for me on an arm64 SOC. I also smoke tested the patch on an x86-64 box and qemu seemed to still run fine there with the patch applied. Cc: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Joel Schopp joel.sch...@amd.com --- kvm-all.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 1402f4f..1975862 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -618,14 +618,14 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, bool add) /* kvm works in page size chunks, but the function may be called with sub-page size and unaligned start address. */ -delta = TARGET_PAGE_ALIGN(size) - size; +delta = HOST_PAGE_ALIGN(start_addr) - start_addr; if (delta size) { return; } start_addr += delta; size -= delta; -size = TARGET_PAGE_MASK; -if (!size || (start_addr ~TARGET_PAGE_MASK)) { +size = qemu_host_page_mask; +if (!size || (start_addr ~qemu_host_page_mask)) { return; }
Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation
Am 22.07.2014 17:47, schrieb Le Tan: Add support for emulating Intel IOMMU according to the VT-d specification for the q35 chipset machine. Implement the logic for DMAR (DMA remapping) without PASID support. Use register-based invalidation for context-cache invalidation and IOTLB invalidation. Basic fault reporting and caching are not implemented yet. Signed-off-by: Le Tan tamlokv...@gmail.com --- hw/i386/Makefile.objs |1 + hw/i386/intel_iommu.c | 1139 + include/hw/i386/intel_iommu.h | 350 + 3 files changed, 1490 insertions(+) create mode 100644 hw/i386/intel_iommu.c create mode 100644 include/hw/i386/intel_iommu.h [...] diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c new file mode 100644 index 000..3ba0e1e --- /dev/null +++ b/hw/i386/intel_iommu.c @@ -0,0 +1,1139 @@ +/* + * QEMU emulation of an Intel IOMMU (VT-d) + * (DMA Remapping device) + * + * Copyright (c) 2013 Knut Omang, Oracle knut.om...@oracle.com + * Copyright (C) 2014 Le Tan, tamlokv...@gmail.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + */ + I suggest replacing the FSF address here (and in other files) by the URL: * You should have received a copy of the GNU General Public License along * with this program; if not, see http://www.gnu.org/licenses/. This is the standard used for most GPL text in QEMU source files. Regards Stefan W.
Re: [Qemu-devel] [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On Sat, Jul 19, 2014 at 12:27:21AM +, Kay, Allen M wrote: For the MCH PCI registers that do need to be read - can you tell us which ones those are? In qemu/hw/xen_pt_igd.c/igd_pci_read(), following MCH PCI config register reads are passthrough to the host HW. Some of the registers are needed by Ironlake GFX driver which we probably can remove. I did a trace recently on Broadwell, the number of register accessed are even smaller (0, 2, 2c, 2e, 50, 52, a0, a4). Given that we now have integrated MCH and GPU in the same socket, looks like driver can easily remove reads for offsets 0 - 0x2e. case 0x00:/* vendor id */ case 0x02:/* device id */ case 0x08:/* revision id */ case 0x2c:/* sybsystem vendor id */ case 0x2e:/* sybsystem id */ Right. We can fix the i915 to use the mechanism that Michael mentioned. (see attached RFC patches) case 0x50:/* SNB: processor graphics control register */ case 0x52:/* processor graphics control register */ case 0xa0:/* top of memory */ case 0xb0:/* ILK: BSM: should read from dev 2 offset 0x5c */ case 0x58:/* SNB: PAVPC Offset */ case 0xa4:/* SNB: graphics base of stolen memory */ case 0xa8:/* SNB: base of GTT stolen memory */ I dug in the intel-gtt.c (see ironlake_gtt_driver) to figure out this a bit more. As in, I speculated, what if we returned 0 (and not implement any support for reading from these registers). What would happen? 0x52 for Ironlake (g5): -- It looks like intel_gmch_probe is called when i915_gem_gtt_init starts (there is a lot of code that looks to be used between intel-gtt.c and i915.c). Anyhow the interesting parts are that i9xx_setup ends up calling ioremap the i915 BAR to setup some of these registers for older generations. Then i965_gtt_total_entries gets which reads 0x52, but it is only needed for v5 generation. For other (v4 and G33) it reads it from the GPU's 0x2020 register. If there is a mismatch, it writes to the GPU at 0x2020 to update the the size based on the bridge. And then it reads from 0x2020 and that is returned and stuck in intel_private.gtt_total_entries. So 0x52 in the emulated bridge could be populated with what the GPU has at 0x2020. And the writes go in the emulated area. 0x52 for v6 - v8: - We seem to go to gen6_gmch_probe which just figures out the the GTT based on the GPU's BAR sizes. The stolen values are read from 0x50 from the GPU. So no need to touch the bridge (see gen6_gmch_probe) OK, so no need to have 0x52 or 0x50 in the bridge. 0xA0: - Could not find any reference in the Linux code. Why would Windows driver need this? If we returned the _real_ TOM would it matter? Is it used to figure out the device should use 32-bit DMA operations or 40-bit? 0xb0 or 0x5c: - No mention of them in the Linux code. 0x58, 0xa4, 0xa8: - No usage of them in the Linux code. We seem to be using the 0x52 from the bridge and 0x2020 from the GPU for this functionality. So in theory*, if using Ironlake we need to have a proper value in 0x52 in the bridge. But for the later chipsets we do not need these values (I am assuming that intel_setup_mchbar can safely return as it does that for Ironlake and could very well for later generations). Can this be reflected in the Windows driver as well? P.S. *theory: That is assuming we modify the Linux i915_drv.c:intel_detect_pch to pick up the id as suggested earlier. See the RFC patches attached. (Not compile tested at all!) Allen -Original Message- From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com] Sent: Friday, July 18, 2014 6:45 AM To: Kay, Allen M Cc: Michael S. Tsirkin; Jesse Barnes; peter.mayd...@linaro.org; xen-de...@lists.xensource.com; Ross Philipson; airl...@linux.ie; daniel.vet...@ffwll.ch; intel-...@lists.freedesktop.org; kelly.zyta...@amd.com; qemu-devel@nongnu.org; Anthony Perard; Stefano Stabellini; anth...@codemonkey.ws; Paolo Bonzini; Zhang, Yang Z; Chen, Tiejun Subject: Re: [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support On Thu, Jul 17, 2014 at 05:37:12PM +, Kay, Allen M wrote: That sounds great. Tiejun could you confirm that with windows driver guys too? I believe windows driver can also assume specific CPU/PCH combos. I will discuss this with native Windows driver guys. Preferably, the same code path can be used for both native and virtualization cases to avoid frequent breakage as most developers and QA do not test new code changes in virtualization environment. I have verified that Windows driver do not need to write to any MCH PCI registers on HSW/BDW so the PCI write function can be
[Qemu-devel] [PATCH v3 1/5] block: allow bdrv_unref() to be passed NULL pointers
If bdrv_unref() is passed a NULL BDS pointer, it is safe to exit with no operation. This will allow cleanup code to blindly call bdrv_unref() on a BDS that has been initialized to NULL. Reviewed-by: Max Reitz mre...@redhat.com Signed-off-by: Jeff Cody jc...@redhat.com --- block.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/block.c b/block.c index 23f366d..f79efc8 100644 --- a/block.c +++ b/block.c @@ -5385,6 +5385,9 @@ void bdrv_ref(BlockDriverState *bs) * deleted. */ void bdrv_unref(BlockDriverState *bs) { +if (!bs) { +return; +} assert(bs-refcnt 0); if (--bs-refcnt == 0) { bdrv_delete(bs); -- 1.9.3
[Qemu-devel] [PATCH v3 2/5] block: vdi - use block layer ops in vdi_create, instead of posix calls
Use the block layer to create, and write to, the image file in the VDI .bdrv_create() operation. This has a couple of benefits: Images can now be created over protocols, and hacks such as NOCOW are not needed in the image format driver, and the underlying file protocol appropriate for the host OS can be relied upon. Also some minor cleanup for error handling. Reviewed-by: Max Reitz mre...@redhat.com Signed-off-by: Jeff Cody jc...@redhat.com --- block/vdi.c | 75 - 1 file changed, 29 insertions(+), 46 deletions(-) diff --git a/block/vdi.c b/block/vdi.c index 197bd77..070acb6 100644 --- a/block/vdi.c +++ b/block/vdi.c @@ -53,13 +53,6 @@ #include block/block_int.h #include qemu/module.h #include migration/migration.h -#ifdef __linux__ -#include linux/fs.h -#include sys/ioctl.h -#ifndef FS_NOCOW_FL -#define FS_NOCOW_FL 0x0080 /* Do not cow file */ -#endif -#endif #if defined(CONFIG_UUID) #include uuid/uuid.h @@ -681,7 +674,6 @@ static int vdi_co_write(BlockDriverState *bs, static int vdi_create(const char *filename, QemuOpts *opts, Error **errp) { -int fd; int result = 0; uint64_t bytes = 0; uint32_t blocks; @@ -690,7 +682,10 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp) VdiHeader header; size_t i; size_t bmap_size; -bool nocow = false; +int64_t offset = 0; +Error *local_err = NULL; +BlockDriverState *bs = NULL; +uint32_t *bmap = NULL; logout(\n); @@ -707,7 +702,6 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp) image_type = VDI_TYPE_STATIC; } #endif -nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false); if (bytes VDI_DISK_SIZE_MAX) { result = -ENOTSUP; @@ -717,27 +711,16 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp) goto exit; } -fd = qemu_open(filename, - O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE, - 0644); -if (fd 0) { -result = -errno; +result = bdrv_create_file(filename, opts, local_err); +if (result 0) { +error_propagate(errp, local_err); goto exit; } - -if (nocow) { -#ifdef __linux__ -/* Set NOCOW flag to solve performance issue on fs like btrfs. - * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value will - * be ignored since any failure of this operation should not block the - * left work. - */ -int attr; -if (ioctl(fd, FS_IOC_GETFLAGS, attr) == 0) { -attr |= FS_NOCOW_FL; -ioctl(fd, FS_IOC_SETFLAGS, attr); -} -#endif +result = bdrv_open(bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL, + NULL, local_err); +if (result 0) { +error_propagate(errp, local_err); +goto exit; } /* We need enough blocks to store the given disk size, @@ -769,13 +752,15 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp) vdi_header_print(header); #endif vdi_header_to_le(header); -if (write(fd, header, sizeof(header)) 0) { -result = -errno; -goto close_and_exit; +result = bdrv_pwrite_sync(bs, offset, header, sizeof(header)); +if (result 0) { +error_setg(errp, Error writing header to %s, filename); +goto exit; } +offset += sizeof(header); if (bmap_size 0) { -uint32_t *bmap = g_malloc0(bmap_size); +bmap = g_malloc0(bmap_size); for (i = 0; i blocks; i++) { if (image_type == VDI_TYPE_STATIC) { bmap[i] = i; @@ -783,27 +768,25 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp) bmap[i] = VDI_UNALLOCATED; } } -if (write(fd, bmap, bmap_size) 0) { -result = -errno; -g_free(bmap); -goto close_and_exit; +result = bdrv_pwrite_sync(bs, offset, bmap, bmap_size); +if (result 0) { +error_setg(errp, Error writing bmap to %s, filename); +goto exit; } -g_free(bmap); +offset += bmap_size; } if (image_type == VDI_TYPE_STATIC) { -if (ftruncate(fd, sizeof(header) + bmap_size + blocks * block_size)) { -result = -errno; -goto close_and_exit; +result = bdrv_truncate(bs, offset + blocks * block_size); +if (result 0) { +error_setg(errp, Failed to statically allocate %s, filename); +goto exit; } } -close_and_exit: -if ((close(fd) 0) !result) { -result = -errno; -} - exit: +bdrv_unref(bs); +g_free(bmap); return result; } -- 1.9.3
[Qemu-devel] [PATCH v3 0/5] Allow VPC and VDI to be created over protocols
Changes from v2 - v3: * Patch 2: Removed extra #ifdef __linux__ from top of file (Max) * Patch 4: Removed extra #ifdef __linux__ from top of file (Max) * Patch 5: Removed output from debug cruft (Max) * Added Max's R-b to remaining patches Changes from v1 - v2: * Patch 2: Use'bs' instead of 'bs-file' (Max) * Patch 3: Same as patch 2 (ripple through) * Patch 5: Update VDI test for static image (Kevin) * Added Max's R-b to patches 1,3,4 This allows VPC and VDI to be created over protocols; currently, they use posix calls directly to open, seek, and write into new image files. This obviously precludes them from being able to be created over a protocol, like glusterfs. Jeff Cody (5): block: allow bdrv_unref() to be passed NULL pointers block: vdi - use block layer ops in vdi_create, instead of posix calls block: use the standard 'ret' instead of 'result' block: vpc - use block layer ops in vpc_create, instead of posix calls block: iotest - update 084 to test static VDI image creation block.c| 3 ++ block/vdi.c| 89 +++-- block/vpc.c| 106 ++--- tests/qemu-iotests/084 | 16 ++- tests/qemu-iotests/084.out | 14 ++ 5 files changed, 110 insertions(+), 118 deletions(-) -- 1.9.3
[Qemu-devel] [PATCH v3 4/5] block: vpc - use block layer ops in vpc_create, instead of posix calls
Use the block layer to create, and write to, the image file in the VPC .bdrv_create() operation. This has a couple of benefits: Images can now be created over protocols, and hacks such as NOCOW are not needed in the image format driver, and the underlying file protocol appropriate for the host OS can be relied upon. Reviewed-by: Max Reitz mre...@redhat.com Signed-off-by: Jeff Cody jc...@redhat.com --- block/vpc.c | 106 1 file changed, 43 insertions(+), 63 deletions(-) diff --git a/block/vpc.c b/block/vpc.c index 8b376a4..9690344 100644 --- a/block/vpc.c +++ b/block/vpc.c @@ -29,13 +29,6 @@ #if defined(CONFIG_UUID) #include uuid/uuid.h #endif -#ifdef __linux__ -#include linux/fs.h -#include sys/ioctl.h -#ifndef FS_NOCOW_FL -#define FS_NOCOW_FL 0x0080 /* Do not cow file */ -#endif -#endif /**/ @@ -656,39 +649,41 @@ static int calculate_geometry(int64_t total_sectors, uint16_t* cyls, return 0; } -static int create_dynamic_disk(int fd, uint8_t *buf, int64_t total_sectors) +static int create_dynamic_disk(BlockDriverState *bs, uint8_t *buf, + int64_t total_sectors) { VHDDynDiskHeader *dyndisk_header = (VHDDynDiskHeader *) buf; size_t block_size, num_bat_entries; int i; -int ret = -EIO; +int ret; +int64_t offset = 0; // Write the footer (twice: at the beginning and at the end) block_size = 0x20; num_bat_entries = (total_sectors + block_size / 512) / (block_size / 512); -if (write(fd, buf, HEADER_SIZE) != HEADER_SIZE) { +ret = bdrv_pwrite_sync(bs, offset, buf, HEADER_SIZE); +if (ret) { goto fail; } -if (lseek(fd, 1536 + ((num_bat_entries * 4 + 511) ~511), SEEK_SET) 0) { -goto fail; -} -if (write(fd, buf, HEADER_SIZE) != HEADER_SIZE) { +offset = 1536 + ((num_bat_entries * 4 + 511) ~511); +ret = bdrv_pwrite_sync(bs, offset, buf, HEADER_SIZE); +if (ret 0) { goto fail; } // Write the initial BAT -if (lseek(fd, 3 * 512, SEEK_SET) 0) { -goto fail; -} +offset = 3 * 512; memset(buf, 0xFF, 512); for (i = 0; i (num_bat_entries * 4 + 511) / 512; i++) { -if (write(fd, buf, 512) != 512) { +ret = bdrv_pwrite_sync(bs, offset, buf, 512); +if (ret 0) { goto fail; } +offset += 512; } // Prepare the Dynamic Disk Header @@ -709,39 +704,35 @@ static int create_dynamic_disk(int fd, uint8_t *buf, int64_t total_sectors) dyndisk_header-checksum = be32_to_cpu(vpc_checksum(buf, 1024)); // Write the header -if (lseek(fd, 512, SEEK_SET) 0) { -goto fail; -} +offset = 512; -if (write(fd, buf, 1024) != 1024) { +ret = bdrv_pwrite_sync(bs, offset, buf, 1024); +if (ret 0) { goto fail; } -ret = 0; fail: return ret; } -static int create_fixed_disk(int fd, uint8_t *buf, int64_t total_size) +static int create_fixed_disk(BlockDriverState *bs, uint8_t *buf, + int64_t total_size) { -int ret = -EIO; +int ret; /* Add footer to total size */ -total_size += 512; -if (ftruncate(fd, total_size) != 0) { -ret = -errno; -goto fail; -} -if (lseek(fd, -512, SEEK_END) 0) { -goto fail; -} -if (write(fd, buf, HEADER_SIZE) != HEADER_SIZE) { -goto fail; +total_size += HEADER_SIZE; + +ret = bdrv_truncate(bs, total_size); +if (ret 0) { +return ret; } -ret = 0; +ret = bdrv_pwrite_sync(bs, total_size - HEADER_SIZE, buf, HEADER_SIZE); +if (ret 0) { +return ret; +} - fail: return ret; } @@ -750,7 +741,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp) uint8_t buf[1024]; VHDFooter *footer = (VHDFooter *) buf; char *disk_type_param; -int fd, i; +int i; uint16_t cyls = 0; uint8_t heads = 0; uint8_t secs_per_cyl = 0; @@ -758,7 +749,8 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp) int64_t total_size; int disk_type; int ret = -EIO; -bool nocow = false; +Error *local_err = NULL; +BlockDriverState *bs = NULL; /* Read out options */ total_size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0); @@ -775,28 +767,17 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp) } else { disk_type = VHD_DYNAMIC; } -nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false); -/* Create the file */ -fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644); -if (fd 0) { -ret = -EIO; +ret = bdrv_create_file(filename, opts, local_err); +if (ret 0) { +error_propagate(errp, local_err);
[Qemu-devel] [PATCH v3 5/5] block: iotest - update 084 to test static VDI image creation
This updates the VDI corruption test to also test static VDI image creation, as well as the default dynamic image creation. Reviewed-by: Max Reitz mre...@redhat.com Signed-off-by: Jeff Cody jc...@redhat.com --- tests/qemu-iotests/084 | 16 ++-- tests/qemu-iotests/084.out | 14 ++ 2 files changed, 28 insertions(+), 2 deletions(-) diff --git a/tests/qemu-iotests/084 b/tests/qemu-iotests/084 index cb4d7b7..ae33c2c 100755 --- a/tests/qemu-iotests/084 +++ b/tests/qemu-iotests/084 @@ -1,6 +1,7 @@ #!/bin/bash # -# Test case for VDI header corruption; image too large, and too many blocks +# Test case for VDI header corruption; image too large, and too many blocks. +# Also simple test for creating dynamic and static VDI images. # # Copyright (C) 2013 Red Hat, Inc. # @@ -43,14 +44,25 @@ _supported_fmt vdi _supported_proto generic _supported_os Linux +size=64M ds_offset=368 # disk image size field offset bs_offset=376 # block size field offset bii_offset=384 # block in image field offset echo +echo === Statically allocated image creation === +echo +_make_test_img $size -o static +_img_info +stat -cdisk image file size in bytes: %s ${TEST_IMG} +_cleanup_test_img + +echo echo === Testing image size bounds === echo -_make_test_img 64M +_make_test_img $size +_img_info +stat -cdisk image file size in bytes: %s ${TEST_IMG} # check for image size too large # poke max image size, and appropriate blocks_in_image value diff --git a/tests/qemu-iotests/084.out b/tests/qemu-iotests/084.out index c7120d9..ea29ae0 100644 --- a/tests/qemu-iotests/084.out +++ b/tests/qemu-iotests/084.out @@ -1,8 +1,22 @@ QA output created by 084 +=== Statically allocated image creation === + +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 +image: TEST_DIR/t.IMGFMT +file format: IMGFMT +virtual size: 64M (67108864 bytes) +cluster_size: 1048576 +disk image file size in bytes: 67109888 + === Testing image size bounds === Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 +image: TEST_DIR/t.IMGFMT +file format: IMGFMT +virtual size: 64M (67108864 bytes) +cluster_size: 1048576 +disk image file size in bytes: 1024 Test 1: Maximum size (1024 TB): qemu-img: Could not open 'TEST_DIR/t.IMGFMT': Could not open 'TEST_DIR/t.IMGFMT': Invalid argument -- 1.9.3
[Qemu-devel] [PATCH v3 3/5] block: use the standard 'ret' instead of 'result'
Most QEMU code uses 'ret' for function return values. The VDI driver uses a mix of 'result' and 'ret'. This cleans that up, switching over to the standard 'ret' usage. Reviewed-by: Max Reitz mre...@redhat.com Signed-off-by: Jeff Cody jc...@redhat.com --- block/vdi.c | 36 ++-- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/block/vdi.c b/block/vdi.c index 070acb6..9d62a3c 100644 --- a/block/vdi.c +++ b/block/vdi.c @@ -350,23 +350,23 @@ static int vdi_make_empty(BlockDriverState *bs) static int vdi_probe(const uint8_t *buf, int buf_size, const char *filename) { const VdiHeader *header = (const VdiHeader *)buf; -int result = 0; +int ret = 0; logout(\n); if (buf_size sizeof(*header)) { /* Header too small, no VDI. */ } else if (le32_to_cpu(header-signature) == VDI_SIGNATURE) { -result = 100; +ret = 100; } -if (result == 0) { +if (ret == 0) { logout(no vdi image\n); } else { logout(%s, header-text); } -return result; +return ret; } static int vdi_open(BlockDriverState *bs, QDict *options, int flags, @@ -674,7 +674,7 @@ static int vdi_co_write(BlockDriverState *bs, static int vdi_create(const char *filename, QemuOpts *opts, Error **errp) { -int result = 0; +int ret = 0; uint64_t bytes = 0; uint32_t blocks; size_t block_size = DEFAULT_CLUSTER_SIZE; @@ -704,21 +704,21 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp) #endif if (bytes VDI_DISK_SIZE_MAX) { -result = -ENOTSUP; +ret = -ENOTSUP; error_setg(errp, Unsupported VDI image size (size is 0x% PRIx64 , max supported is 0x% PRIx64 ), bytes, VDI_DISK_SIZE_MAX); goto exit; } -result = bdrv_create_file(filename, opts, local_err); -if (result 0) { +ret = bdrv_create_file(filename, opts, local_err); +if (ret 0) { error_propagate(errp, local_err); goto exit; } -result = bdrv_open(bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL, - NULL, local_err); -if (result 0) { +ret = bdrv_open(bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL, +NULL, local_err); +if (ret 0) { error_propagate(errp, local_err); goto exit; } @@ -752,8 +752,8 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp) vdi_header_print(header); #endif vdi_header_to_le(header); -result = bdrv_pwrite_sync(bs, offset, header, sizeof(header)); -if (result 0) { +ret = bdrv_pwrite_sync(bs, offset, header, sizeof(header)); +if (ret 0) { error_setg(errp, Error writing header to %s, filename); goto exit; } @@ -768,8 +768,8 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp) bmap[i] = VDI_UNALLOCATED; } } -result = bdrv_pwrite_sync(bs, offset, bmap, bmap_size); -if (result 0) { +ret = bdrv_pwrite_sync(bs, offset, bmap, bmap_size); +if (ret 0) { error_setg(errp, Error writing bmap to %s, filename); goto exit; } @@ -777,8 +777,8 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp) } if (image_type == VDI_TYPE_STATIC) { -result = bdrv_truncate(bs, offset + blocks * block_size); -if (result 0) { +ret = bdrv_truncate(bs, offset + blocks * block_size); +if (ret 0) { error_setg(errp, Failed to statically allocate %s, filename); goto exit; } @@ -787,7 +787,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp) exit: bdrv_unref(bs); g_free(bmap); -return result; +return ret; } static void vdi_close(BlockDriverState *bs) -- 1.9.3
Re: [Qemu-devel] [PATCH 5/7] hw/core/sysbus: add fdt_add_node method
On 23.07.14 17:33, Eric Auger wrote: On 07/08/2014 03:52 PM, Alexander Graf wrote: On 07.07.14 09:08, Eric Auger wrote: This method is meant to be called on sysbus device dynamic instantiation (-device option). Devices that support this kind of instantiation must implement this method. Signed-off-by: Eric Auger eric.au...@linaro.org For the reason I stated earlier, I don't think it's a good idea to put device tree code into our device models. Hi Alex, I would propose we discuss that topic during next KVM call if you are available. I lost track when that would be. Next week would work fine, the week after not :). Hope Peter will be available to join too. Because I feel stuck between not putting things in the machine file (1) - obviously we could put them in a helper module (2) - and not putting them in the device (3). Whatever the solution I fear we are going to pollute something: Any time a new device wants to support dynamic instantiation, we would need to modify the machine file or the helper module with 1 and 2 resp. In case we put it in the device we pollute this latter... My hope was that quite few QEMU platform devices would need to support that feature and hence would need to implement this dt node generation method. To me dynamic instantiation of platform device was not the mainstream solution. Quite frankly I don't think it'd be that many. I think we'll cover 99.9% of all use cases if we just enable it for the virt machines of e500 and arm. Then there is the fundamental question of technical feasibility of devising a generic PlatformParams that match all the specialization needs? Here I miss experience. In case we know the machine type and a small set of additional fields couldn't we do the adaptations you talked about, related to IRQs? The problem is that I don't know all the boards and different things people come up with either. There's also no reason machine files have to stick to the platform bus model - they could just take those devices and stick them into an existing other virtual bus. I don't feel comfortable generalizing something where I'm pretty sure things will blow up sooner or later. Alex
Re: [Qemu-devel] [PATCH 1/7] hw/misc/platform_devices: helpers for dynamic instantiation of platform devices
On 23.07.14 16:58, Eric Auger wrote: On 07/08/2014 03:43 PM, Alexander Graf wrote: On 07.07.14 09:08, Eric Auger wrote: This new module implements routines which help in dynamic instantiation of sysbus devices. Machine files can use those generic routines. --- Dynamic sysbus device allocation fully written by Alex Graf. [Eric Auger] Those functions were initially in ppc e500 machine file. Now moved to a separate module. PPCE500Params is replaced by a generic struct named PlatformParams Signed-off-by: Alexander Graf ag...@suse.de Signed-off-by: Eric Auger eric.au...@linaro.org --- hw/misc/Makefile.objs | 1 + hw/misc/platform_devices.c | 217 + include/hw/misc/platform_devices.h | 61 +++ 3 files changed, 279 insertions(+) create mode 100644 hw/misc/platform_devices.c create mode 100644 include/hw/misc/platform_devices.h diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs index e47fea8..d081606 100644 --- a/hw/misc/Makefile.objs +++ b/hw/misc/Makefile.objs @@ -40,3 +40,4 @@ obj-$(CONFIG_SLAVIO) += slavio_misc.o obj-$(CONFIG_ZYNQ) += zynq_slcr.o obj-$(CONFIG_PVPANIC) += pvpanic.o +obj-y += platform_devices.o diff --git a/hw/misc/platform_devices.c b/hw/misc/platform_devices.c new file mode 100644 index 000..96ab272 --- /dev/null +++ b/hw/misc/platform_devices.c @@ -0,0 +1,217 @@ +#include hw/misc/platform_devices.h +#include hw/sysbus.h +#include qemu/error-report.h + +#define PAGE_SHIFT 12 + +int sysbus_device_create_devtree(Object *obj, void *opaque) +{ +PlatformDevtreeData *data = opaque; +Object *dev; +SysBusDevice *sbdev; +bool matched = false; + +dev = object_dynamic_cast(obj, TYPE_SYS_BUS_DEVICE); +sbdev = (SysBusDevice *)dev; + +if (!sbdev) { +/* Container, traverse it for children */ +return object_child_foreach(obj, sysbus_device_create_devtree, data); +} + +if (!matched) { +error_report(Device %s is not supported by this machine yet., + qdev_fw_name(DEVICE(dev))); +exit(1); +} + +return 0; +} + +void platform_bus_create_devtree(PlatformParams *params, void *fdt, +const char *mpic) +{ +gchar *node = g_strdup_printf(/platform@%PRIx64, + params-platform_bus_base); +const char platcomp[] = qemu,platform\0simple-bus; +PlatformDevtreeData data; +Object *container; +uint64_t addr = params-platform_bus_base; +uint64_t size = params-platform_bus_size; +int irq_start = params-platform_bus_first_irq; + +/* Create a /platform node that we can put all devices into */ + +qemu_fdt_add_subnode(fdt, node); +qemu_fdt_setprop(fdt, node, compatible, platcomp, sizeof(platcomp)); + +/* Our platform bus region is less than 32bit big, so 1 cell is enough for + address and size */ +qemu_fdt_setprop_cells(fdt, node, #size-cells, 1); +qemu_fdt_setprop_cells(fdt, node, #address-cells, 1); +qemu_fdt_setprop_cells(fdt, node, ranges, 0, addr 32, addr, size); + +qemu_fdt_setprop_phandle(fdt, node, interrupt-parent, mpic); + +/* Loop through all devices and create nodes for known ones */ +data.fdt = fdt; +data.mpic = mpic; +data.irq_start = irq_start; +data.node = node; + +container = container_get(qdev_get_machine(), /peripheral); +sysbus_device_create_devtree(container, data); +container = container_get(qdev_get_machine(), /peripheral-anon); +sysbus_device_create_devtree(container, data); + +g_free(node); +} Device trees are pretty platform (and even machine) specific. Just to give you an example - the interrupt specifier on most e500 systems really is 4 cells big: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/powerpc/fsl/mpic.txt#n80 | Interrupt specifiers consists of 4 cells encoded as follows: 1st-cell interrupt-number Identifies the interrupt source. The meaning depends on the type of interrupt. Note: If the interrupt-type cell is undefined (i.e. #interrupt-cells = 2), this cell should be interpreted the same as for interrupt-type 0-- i.e. an external or normal SoC device interrupt. 2nd-cell level-sense information, encoded as follows: 0 = low-to-high edge triggered 1 = active low level-sensitive 2 = active high level-sensitive 3 = high-to-low edge triggered 3rd-cell interrupt-type The following types are supported: 0 = external or normal SoC device interrupt The interrupt-number cell contains the SoC device interrupt number. The type-specific
Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation
Hi Paolo, 2014-07-23 15:58 GMT+08:00 Paolo Bonzini pbonz...@redhat.com: Il 22/07/2014 17:47, Le Tan ha scritto: +static inline void define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val, +uint64_t wmask, uint64_t w1cmask) +{ +*((uint64_t *)s-csr[addr]) = val; All these casts are not endian-safe. Please use ldl_le_p, ldq_le_p, stl_le_p, stq_le_p. Thanks very much. Finally I got the idea here.:) Also thanks for your renaming suggestions. +*((uint64_t *)s-wmask[addr]) = wmask; +*((uint64_t *)s-w1cmask[addr]) = w1cmask; +} + +static inline void define_quad_wo(IntelIOMMUState *s, hwaddr addr, + uint64_t mask) +{ +*((uint64_t *)s-womask[addr]) = mask; +} + +static inline void define_long(IntelIOMMUState *s, hwaddr addr, uint32_t val, +uint32_t wmask, uint32_t w1cmask) +{ +*((uint32_t *)s-csr[addr]) = val; +*((uint32_t *)s-wmask[addr]) = wmask; +*((uint32_t *)s-w1cmask[addr]) = w1cmask; +} + +static inline void define_long_wo(IntelIOMMUState *s, hwaddr addr, + uint32_t mask) +{ +*((uint32_t *)s-womask[addr]) = mask; +} + +/* External get/set operations */ +static inline void set_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val) +{ +uint64_t oldval = *((uint64_t *)s-csr[addr]); +uint64_t wmask = *((uint64_t *)s-wmask[addr]); +uint64_t w1cmask = *((uint64_t *)s-w1cmask[addr]); +*((uint64_t *)s-csr[addr]) = +((oldval ~wmask) | (val wmask)) ~(w1cmask val); +} + +static inline void set_long(IntelIOMMUState *s, hwaddr addr, uint32_t val) +{ +uint32_t oldval = *((uint32_t *)s-csr[addr]); +uint32_t wmask = *((uint32_t *)s-wmask[addr]); +uint32_t w1cmask = *((uint32_t *)s-w1cmask[addr]); +*((uint32_t *)s-csr[addr]) = +((oldval ~wmask) | (val wmask)) ~(w1cmask val); +} + +static inline uint64_t get_quad(IntelIOMMUState *s, hwaddr addr) +{ +uint64_t val = *((uint64_t *)s-csr[addr]); +uint64_t womask = *((uint64_t *)s-womask[addr]); +return val ~womask; +} + + +static inline uint32_t get_long(IntelIOMMUState *s, hwaddr addr) +{ +uint32_t val = *((uint32_t *)s-csr[addr]); +uint32_t womask = *((uint32_t *)s-womask[addr]); +return val ~womask; +} + + + +/* Internal get/set operations */ +static inline uint64_t __get_quad(IntelIOMMUState *s, hwaddr addr) get_quad_raw? +{ +return *((uint64_t *)s-csr[addr]); +} + +static inline uint32_t __get_long(IntelIOMMUState *s, hwaddr addr) get_long_raw? +{ +return *((uint32_t *)s-csr[addr]); +} + + +/* val = (val ~clear) | mask */ +static inline uint32_t set_mask_long(IntelIOMMUState *s, hwaddr addr, set_clear_long? + uint32_t clear, uint32_t mask) +{ +uint32_t *ptr = (uint32_t *)s-csr[addr]; +uint32_t val = (*ptr ~clear) | mask; +*ptr = val; +return val; +} + +/* val = (val ~clear) | mask */ +static inline uint64_t set_mask_quad(IntelIOMMUState *s, hwaddr addr, set_clear_quad? + uint64_t clear, uint64_t mask) +{ +uint64_t *ptr = (uint64_t *)s-csr[addr]; +uint64_t val = (*ptr ~clear) | mask; +*ptr = val; +return val; +} + + Regards, Le
Re: [Qemu-devel] [PATCH 1/3] intel-iommu: introduce Intel IOMMU (VT-d) emulation
Hi Stefan, 2014-07-24 4:29 GMT+08:00 Stefan Weil s...@weilnetz.de: Am 22.07.2014 17:47, schrieb Le Tan: Add support for emulating Intel IOMMU according to the VT-d specification for the q35 chipset machine. Implement the logic for DMAR (DMA remapping) without PASID support. Use register-based invalidation for context-cache invalidation and IOTLB invalidation. Basic fault reporting and caching are not implemented yet. Signed-off-by: Le Tan tamlokv...@gmail.com --- hw/i386/Makefile.objs |1 + hw/i386/intel_iommu.c | 1139 + include/hw/i386/intel_iommu.h | 350 + 3 files changed, 1490 insertions(+) create mode 100644 hw/i386/intel_iommu.c create mode 100644 include/hw/i386/intel_iommu.h [...] diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c new file mode 100644 index 000..3ba0e1e --- /dev/null +++ b/hw/i386/intel_iommu.c @@ -0,0 +1,1139 @@ +/* + * QEMU emulation of an Intel IOMMU (VT-d) + * (DMA Remapping device) + * + * Copyright (c) 2013 Knut Omang, Oracle knut.om...@oracle.com + * Copyright (C) 2014 Le Tan, tamlokv...@gmail.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + */ + I suggest replacing the FSF address here (and in other files) by the URL: * You should have received a copy of the GNU General Public License along * with this program; if not, see http://www.gnu.org/licenses/. This is the standard used for most GPL text in QEMU source files. Get it. I copied it from the Linux kernel tree. Thanks very much! Regards Stefan W. Regards, Le
Re: [Qemu-devel] [RFC] How to handle feature regressions in new QEMU releases
On Wed, Jul 16, 2014 at 10:29 AM, Michael Tokarev m...@tls.msk.ru wrote: 16.07.2014 21:23, ronnie sahlberg wrote: If you ask debian to upgrade. Could you ask them to wait and upgrade after I have release the next version, hopefully if all goes well, at the end of this week? There's no problem in updating now to fix missing .pc file and to update next week to include a new version. Please find a new version 1.12 on the website. Thanks. ronnie sahlberg
Re: [Qemu-devel] [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On 2014/7/24 4:54, Konrad Rzeszutek Wilk wrote: On Sat, Jul 19, 2014 at 12:27:21AM +, Kay, Allen M wrote: For the MCH PCI registers that do need to be read - can you tell us which ones those are? In qemu/hw/xen_pt_igd.c/igd_pci_read(), following MCH PCI config register reads are passthrough to the host HW. Some of the registers are needed by Ironlake GFX driver which we probably can remove. I did a trace recently on Broadwell, the number of register accessed are even smaller (0, 2, 2c, 2e, 50, 52, a0, a4). Given that we now have integrated MCH and GPU in the same socket, looks like driver can easily remove reads for offsets 0 - 0x2e. case 0x00:/* vendor id */ case 0x02:/* device id */ case 0x08:/* revision id */ case 0x2c:/* sybsystem vendor id */ case 0x2e:/* sybsystem id */ Right. We can fix the i915 to use the mechanism that Michael mentioned. (see attached RFC patches) case 0x50:/* SNB: processor graphics control register */ case 0x52:/* processor graphics control register */ case 0xa0:/* top of memory */ case 0xb0:/* ILK: BSM: should read from dev 2 offset 0x5c */ case 0x58:/* SNB: PAVPC Offset */ case 0xa4:/* SNB: graphics base of stolen memory */ case 0xa8:/* SNB: base of GTT stolen memory */ I dug in the intel-gtt.c (see ironlake_gtt_driver) to figure out this a bit more. As in, I speculated, what if we returned 0 (and not implement any support for reading from these registers). What would happen? 0x52 for Ironlake (g5): -- It looks like intel_gmch_probe is called when i915_gem_gtt_init starts (there is a lot of code that looks to be used between intel-gtt.c and i915.c). Anyhow the interesting parts are that i9xx_setup ends up calling ioremap the i915 BAR to setup some of these registers for older generations. Then i965_gtt_total_entries gets which reads 0x52, but it is only needed for v5 generation. For other (v4 and G33) it reads it from the GPU's 0x2020 register. If there is a mismatch, it writes to the GPU at 0x2020 to update the the size based on the bridge. And then it reads from 0x2020 and that is returned and stuck in intel_private.gtt_total_entries. So 0x52 in the emulated bridge could be populated with what the GPU has at 0x2020. And the writes go in the emulated area. 0x52 for v6 - v8: - We seem to go to gen6_gmch_probe which just figures out the the GTT based on the GPU's BAR sizes. The stolen values are read from 0x50 from the GPU. So no need to touch the bridge (see gen6_gmch_probe) OK, so no need to have 0x52 or 0x50 in the bridge. 0xA0: - Could not find any reference in the Linux code. Why would Windows driver need this? If we returned the _real_ TOM would it matter? Is it used to figure out the device should use 32-bit DMA operations or 40-bit? 0xb0 or 0x5c: - No mention of them in the Linux code. 0x58, 0xa4, 0xa8: - No usage of them in the Linux code. We seem to be using the 0x52 from the bridge and 0x2020 from the GPU for this functionality. So in theory*, if using Ironlake we need to have a proper value in 0x52 in the bridge. But for the later chipsets we do not need these values (I am assuming that intel_setup_mchbar can safely return as it does that for Ironlake and could very well for later generations). Can this be reflected in the Windows driver as well? P.S. *theory: That is assuming we modify the Linux i915_drv.c:intel_detect_pch to pick up the id as suggested earlier. See the RFC patches attached. (Not compile tested at all!) I take a look these patches, looks we still scan all PCI Bridge to walk all PCHs. So this means we still need to fake a PCI bridge, right? Or maybe you don't cover this problem this time. I prefer we should check dev slot to get that PCH like my previous patch, gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class type. Because Windows always use this way, so I think this point should be same between Linux and Windows. Or we need anther better way to unify all OSs. Thanks Tiejun Allen -Original Message- From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com] Sent: Friday, July 18, 2014 6:45 AM To: Kay, Allen M Cc: Michael S. Tsirkin; Jesse Barnes; peter.mayd...@linaro.org; xen-de...@lists.xensource.com; Ross Philipson; airl...@linux.ie; daniel.vet...@ffwll.ch; intel-...@lists.freedesktop.org; kelly.zyta...@amd.com; qemu-devel@nongnu.org; Anthony Perard; Stefano Stabellini; anth...@codemonkey.ws; Paolo Bonzini; Zhang, Yang Z; Chen, Tiejun Subject: Re: [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support On Thu, Jul 17, 2014 at 05:37:12PM +, Kay, Allen M wrote:
[Qemu-devel] [RFC PATCH v2] add memory hotunplug support
From: Hu Tao hu...@cn.fujitsu.com This patch is to solve a problem that when you add a hot-pluggable memory, you can't remove the memory. Its approach is to set GPE status bit by qemu, then trigger SCI interrupt to notify guest os. Guest os checks device status, and free memory resource if possible, then generate OST. Finally, qemu handles OST events to free dimm device. Signed-off-by: Hu Tao hu...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- hw/acpi/memory_hotplug.c | 74 +++- hw/acpi/piix4.c | 2 ++ hw/core/qdev.c | 9 + hw/i386/pc.c | 31 + hw/i386/ssdt-mem.dsl | 4 +++ hw/i386/ssdt-misc.dsl| 11 +- hw/mem/pc-dimm.c | 10 ++ include/hw/acpi/memory_hotplug.h | 3 ++ include/qom/object.h | 1 + qdev-monitor.c | 25 +- qom/object.c | 2 +- 11 files changed, 168 insertions(+), 4 deletions(-) diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c index ed39241..b43b2b4 100644 --- a/hw/acpi/memory_hotplug.c +++ b/hw/acpi/memory_hotplug.c @@ -75,12 +75,14 @@ static uint64_t acpi_memory_hotplug_read(void *opaque, hwaddr addr, case 0x14: /* pack and return is_* fields */ val |= mdev-is_enabled ? 1 : 0; val |= mdev-is_inserting ? 2 : 0; +val |= mdev-is_removing ? 4 : 0; trace_mhp_acpi_read_flags(mem_st-selector, val); break; default: val = ~0; break; } + return val; } @@ -126,17 +128,57 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data, info = acpi_memory_device_status(mem_st-selector, mdev); qapi_event_send_acpi_device_ost(info, error_abort); qapi_free_ACPIOSTInfo(info); +switch (mdev-ost_event) { +case 0x03: /* EJECT */ +switch (mdev-ost_status) { +case 0x0: /* SUCCESS */ +object_unparent(OBJECT(mdev-dimm)); +mdev-is_removing = false; +mdev-dimm = NULL; +break; +case 0x1: /* FAILURE */ +case 0x2: /* UNRECOGNIZED NOTIFY */ +case 0x80: /* EJECT NOT SUPPORTED */ +case 0x81: /* DEVICE IN USE */ +case 0x82: /* DEVICE BUSY */ +case 0x83: /* EJECT_DEPENDENCY_BUSY */ +mdev-is_removing = false; +mdev-is_enabled = true; +break; +case 0x84: /* EJECTION IN PROGRESS */ +break; +default: +break; +} +break; +case 0x103: /* OSPM EJECT */ +switch (mdev-ost_status) { +case 0x0: /* SUCCESS */ +object_unparent(OBJECT(mdev-dimm)); +mdev-is_removing = false; +mdev-dimm = NULL; +break; +case 0x84: /* EJECTION IN PROGRESS */ +mdev-is_enabled = false; +mdev-is_removing = true; +break; +default: +break; +} +} break; case 0x14: mdev = mem_st-devs[mem_st-selector]; if (data 2) { /* clear insert event */ mdev-is_inserting = false; trace_mhp_acpi_clear_insert_evt(mem_st-selector); +} else if (data 4) { /* MRMV */ +mdev-is_enabled = false; } break; } - } + static const MemoryRegionOps acpi_memory_hotplug_ops = { .read = acpi_memory_hotplug_read, .write = acpi_memory_hotplug_write, @@ -195,6 +237,36 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st, return; } +void acpi_memory_unplug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st, + DeviceState *dev, Error **errp) +{ +MemStatus *mdev; +Error *local_err = NULL; +int slot = object_property_get_int(OBJECT(dev), slot, local_err); + +if (local_err) { +error_propagate(errp, local_err); +return; +} + +if (slot = mem_st-dev_count) { +char *dev_path = object_get_canonical_path(OBJECT(dev)); +error_setg(errp, acpi_memory_plug_cb: + device [%s] returned invalid memory slot[%d], +dev_path, slot); +g_free(dev_path); +return; +} + +mdev = mem_st-devs[slot]; +mdev-is_removing = true; + +/* do ACPI magic */ +ar-gpe.sts[0] |= ACPI_MEMORY_HOTPLUG_STATUS; +acpi_update_sci(ar, irq); +return; +} + static const VMStateDescription vmstate_memhp_sts = { .name = memory hotplug device state, .version_id = 1, diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c index b72b34e..37d593a 100644 --- a/hw/acpi/piix4.c +++ b/hw/acpi/piix4.c @@ -362,6 +362,8 @@ static void
Re: [Qemu-devel] [RFC PATCH 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
On 07/23/2014 11:44 PM, Eric Blake wrote: On 07/23/2014 08:25 AM, Yang Hongyang wrote: Virtual machine (VM) replication is a well known technique for providing application-agnostic software-implemented hardware fault tolerance non-stop service. COLO is a high availability solution. Both primary VM (PVM) and secondary VM (SVM) run in parallel. They receive the same request from client, and generate response in parallel too. If the response packets from PVM and SVM are identical, they are released immediately. Otherwise, a VM checkpoint (on demand) is conducted. The idea is presented in Xen summit 2012, and 2013, and academia paper in SOCC 2013. It's also presented in KVM forum 2013: http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf Please refer to above document for detailed information. Please also refer to previous posted RFC proposal: http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html The patchset is also hosted on github: https://github.com/macrosheep/qemu/tree/colo_v0.1 This patchset is RFC, implements the frame of colo, without failover and nic/disk replication. But it is ready for demo the COLO idea above QEMU-Kvm. Steps using this patchset to get an overview of COLO: 1. configure the source with --enable-colo option Code that has to be opt-in tends to bitrot, because people don't configure their build-bots to opt in. What sort of penalties does opting in cause to the code if colo is not used? I'd much rather make the default to compile colo unless configured --disable-colo. Are there any pre-req libraries required for it to work? That would be the only reason to make the default of on or off conditional, rather than defaulting to on. Thanks for all your comments on this patchset, will address them. For this one, it will not affect the rest of the code if COLO is compiled but not used, and it does not require pre-req libraries for now, so we can make COLO support default to on next time. -- Thanks, Yang.
Re: [Qemu-devel] [PATCH V4 2/5] runner: Tool for fuzz tests execution
On Mon, 07/21 14:18, Maria Kustova wrote: The purpose of the test runner is to prepare the test environment (e.g. create a work directory, a test image, etc), execute a program under test with parameters, indicate a test failure if the program was killed during the test execution and collect core dumps, logs and other test artifacts. The test runner doesn't depend on an image format or a program will be tested, so it can be used with any external image generator and program under test. Signed-off-by: Maria Kustova mari...@catit.be Looks good. Only two minor comments below but neither is a stopper. --- tests/image-fuzzer/runner/runner.py | 360 1 file changed, 360 insertions(+) create mode 100755 tests/image-fuzzer/runner/runner.py diff --git a/tests/image-fuzzer/runner/runner.py b/tests/image-fuzzer/runner/runner.py new file mode 100755 index 000..3e9e65d --- /dev/null +++ b/tests/image-fuzzer/runner/runner.py @@ -0,0 +1,360 @@ +#!/usr/bin/env python + +# Tool for running fuzz tests +# +# Copyright (C) 2014 Maria Kustova mari...@catit.be +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +import sys, os, signal +import subprocess +import random +from itertools import count +from shutil import rmtree +import getopt +try: +import json +except ImportError: +try: +import simplejson as json +except ImportError: +print Warning: Module for JSON processing is not found.\n + \ +'--config' and '--command' options are not supported. +import resource +resource.setrlimit(resource.RLIMIT_CORE, (-1, -1)) + + +def multilog(msg, *output): + Write an object to all of specified file descriptors + + +for fd in output: +fd.write(msg) +fd.flush() + + +def str_signal(sig): + Convert a numeric value of a system signal to the string one +defined by the current operational system + + +for k, v in signal.__dict__.items(): +if v == sig: +return k + + +class TestException(Exception): +Exception for errors risen by TestEnv objects +pass + + +class TestEnv(object): + Trivial test object + +The class sets up test environment, generates backing and test images +and executes application under tests with specified arguments and a test +image provided. +All logs are collected. +Summary log will contain short descriptions and statuses of tests in +a run. +Test log will include application (e.g. 'qemu-img') logs besides info sent +to the summary log. + + +def __init__(self, test_id, seed, work_dir, run_log, + cleanup=True, log_all=False): +Set test environment in a specified work directory. + +Path to qemu-img and qemu-io will be retrieved from 'QEMU_IMG' and +'QEMU_IO' environment variables + +if seed is not None: +self.seed = seed +else: +self.seed = str(random.randint(0, sys.maxint)) +random.seed(self.seed) + +self.init_path = os.getcwd() +self.work_dir = work_dir +self.current_dir = os.path.join(work_dir, 'test-' + test_id) +self.qemu_img = \ +os.environ.get('QEMU_IMG', 'qemu-img')\ + .strip().split(' ') +self.qemu_io = \ + os.environ.get('QEMU_IO', 'qemu-io').strip().split(' ') +self.commands = [['qemu-img', 'check', '-f', 'qcow2', '$test_img'], + ['qemu-img', 'info', '-f', 'qcow2', '$test_img'], + ['qemu-io', '$test_img', '-c', 'read $off $len'], + ['qemu-io', '$test_img', '-c', 'write $off $len'], + ['qemu-io', '$test_img', '-c', + 'aio_read $off $len'], + ['qemu-io', '$test_img', '-c', + 'aio_write $off $len'], + ['qemu-io', '$test_img', '-c', 'flush'], + ['qemu-io', '$test_img', '-c', + 'discard $off $len'], + ['qemu-io', '$test_img', '-c', + 'truncate $off']] +for fmt in ['raw', 'vmdk', 'vdi',
[Qemu-devel] [Bug 1336801] Re: 12.04 guest hangs on a 14.04 host server with cirrus graphics
Note that on a successful boot, dmesg | grep cirrus shows: [9.064581] fb: conflicting fb hw usage cirrusdrmfb vs EFI VGA - removing generic driver [9.133808] fbcon: cirrusdrmfb (fb0) is primary device [9.431359] cirrus :00:02.0: fb0: cirrusdrmfb frame buffer device [9.431362] cirrus :00:02.0: registered panic notifier [9.652851] [drm] Initialized cirrus 1.0.0 20110418 for :00:02.0 on minor 0 I can also reproduce this on qemu built from upstream git head (earlier this week) so marking as affecting the upstream project. ** Package changed: libvirt (Ubuntu) = qemu (Ubuntu) ** Also affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1336801 Title: 12.04 guest hangs on a 14.04 host server with cirrus graphics Status in QEMU: New Status in “qemu” package in Ubuntu: Triaged Bug description: A new 12.04.4 server guest installation hangs on a 14.04 server host machine. I did the following: Created a new Virtual Machine with the Ubuntu 12.04 template using virt-manager Ran through the installation without a hitch to install a LAMP+SSH server. All standard options apart from that. On reboot the 12.04 guest started but then hung after doing fsck step. Trying different options (change disk driver, etc) made it progress a couple more steps but still hung. The thing that fixed it in the end was to switch to a VGA display driver, away from the default. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1336801/+subscriptions