Re: [Qemu-devel] [Qemu-ppc] [PATCH for-2.2] spapr: add host Linux version information to device tree
On 07/24/2014 11:15 PM, Alexander Graf wrote: On 18.07.14 06:31, cyril...@gmail.com wrote: It may prove useful know which Linux distribution version the host machine is running when an issue in the guest arises but a user cannot access the host. Signed-off-by: Cyril Bur cyril@au1.ibm.com --- hw/ppc/spapr.c | 8 +++ target-ppc/kvm.c | 62 target-ppc/kvm_ppc.h | 6 + 3 files changed, 76 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 6b48a26..391d47a 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -375,6 +375,14 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base, _FDT((fdt_property_string(fdt, vm,uuid, buf))); g_free(buf); +/* + * Add info to the guest FDT to tell it what linux the host is + */ +if (kvmppc_get_linux_host(buf)) { +_FDT((fdt_property_string(fdt, linux,host, buf))); Is this even specified in sPAPR? PAPR does not know about any linux,xxx properties. +g_free(buf); +} + _FDT((fdt_property_cell(fdt, #address-cells, 0x2))); _FDT((fdt_property_cell(fdt, #size-cells, 0x2))); diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 8c9e79c..95e0970 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -1415,6 +1415,68 @@ bool kvmppc_get_host_model(char **value) return g_file_get_contents(/proc/device-tree/model, value, NULL, NULL); } +bool kvmppc_get_linux_host(char **value) +{ +FILE *f; +int i; +char line[512]; +const char *names[] = {NAME, VERSION, BUILD_ID}; +bool names_found[ARRAY_SIZE(names)] = { 0 }; +GString *output = NULL; +f = fopen(/etc/os-release, r); A few comments: 1) Why would anyone care? Useful debug info when the host is not reachable. 2) I'm not sure KVM is the right decision maker on whether we want this exposed or not. Good point, there is no reason not to show this info in TCG. After all, the files you read here are available on an x86 host just as well Does qemu-x86 has any way to pass information like this to the guest? 3) Use glib functions to read files Alex -- Alexey
[Qemu-devel] [PATCH] target-i386/cpu.c: Fix two error output indentation
Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com --- target-i386/cpu.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 6d008ab..217500c 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -1716,9 +1716,9 @@ static void x86_set_hv_spinlocks(Object *obj, Visitor *v, void *opaque, if (value min || value max) { error_setg(errp, Property %s.%s doesn't take value % PRId64 - (minimum: % PRId64 , maximum: % PRId64 ), - object_get_typename(obj), name ? name : null, - value, min, max); +(minimum: % PRId64 , maximum: % PRId64 ), + object_get_typename(obj), name ? name : null, + value, min, max); return; } cpu-hyperv_spinlock_attempts = value; @@ -1808,8 +1808,8 @@ static void x86_cpu_parse_featurestr(CPUState *cs, char *features, } if (numvalue min) { error_report(hv-spinlocks value shall always be = 0x%x -, fixup will be removed in future versions, -min); + , fixup will be removed in future versions, + min); numvalue = min; } snprintf(num, sizeof(num), % PRId32, numvalue); -- 1.9.3
Re: [Qemu-devel] [PATCH for-2.1 1/2] migration: load smaller RAMBlock to a bigger one if permitted
On Fri, 25 Jul 2014 19:56:40 +0200 Laszlo Ersek ler...@redhat.com wrote: On 07/25/14 17:48, Igor Mammedov wrote: Add API to mark memory region as extend-able on migration, to allow migration code to load smaller RAMBlock into a bigger one on destination QEMU instance. This will allow to fix broken migration from QEMU 1.7/2.0 to QEMU 2.1 due to ACPI tables size changes across 1.7/2.0/2.1 versions by marking ACPI tables ROM blob as extend-able. So that smaller tables from previos version could be always (previous) migrated to a bigger rom blob on new version. Credits-for-idea: Michael S. Tsirkin m...@redhat.com Signed-off-by: Igor Mammedov imamm...@redhat.com --- arch_init.c | 19 ++- exec.c | 15 +-- include/exec/cpu-all.h | 15 ++- include/exec/memory.h | 10 ++ include/exec/ram_addr.h | 1 + memory.c| 5 + 6 files changed, 53 insertions(+), 12 deletions(-) (Please pass the -O flag to git-format-patch, with an order file that moves header files (*.h) to the front. Header hunks should lead in a patch. Thanks.) diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index 6593be1..248c075 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -33,6 +33,7 @@ void *qemu_get_ram_block_host_ptr(ram_addr_t addr); void *qemu_get_ram_ptr(ram_addr_t addr); void qemu_ram_free(ram_addr_t addr); void qemu_ram_free_from_ptr(ram_addr_t addr); +void qemu_ram_set_extendable_on_migration(ram_addr_t addr); static inline bool cpu_physical_memory_get_dirty(ram_addr_t start, ram_addr_t length, (Ugh. The declarations (prototypes) of qemu_ram_*() functions are scattered between ram_addr.h and cpu-common.h (both under include/exec). I can't see why that is a good thing (the function definitions all seem to be in exec.c).) Trying to avoid putting stuff to global cpu-common.h, I've put definition in ram_addr.h. The rest should be moved from cpu-common.h to ram_addr.h when 2.2 is open. [...] diff --git a/arch_init.c b/arch_init.c index 8ddaf35..812f8b5 100644 --- a/arch_init.c +++ b/arch_init.c @@ -1071,11 +1071,20 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) QTAILQ_FOREACH(block, ram_list.blocks, next) { if (!strncmp(id, block-idstr, sizeof(id))) { -if (block-length != length) { -error_report(Length mismatch: %s: RAM_ADDR_FMT - in != RAM_ADDR_FMT, id, length, - block-length); -ret = -EINVAL; +if (block-flags RAM_EXTENDABLE_ON_MIGRATE) { +if (block-length length) { +error_report(Length too big: %s: RAM_ADDR_FMT missing space between '' and RAM_ADDR_FMT (for readability) +RAM_ADDR_FMT, id, length, you dropped the important word in here (cf. in != above). That word explains it to the user that the incoming size (stored in length) is too big. + block-length); +ret = -EINVAL; +} +} else { +if (block-length != length) { +error_report(Length mismatch: %s: RAM_ADDR_FMT missing space between '' and RAM_ADDR_FMT (for readability) + in != RAM_ADDR_FMT, id, length, + block-length); +ret = -EINVAL; +} } break; } Can you please explain where it is ensured that the non-overwritten part of a greater RAMBlock is zero-filled? As far as I can see: main() [vl.c] qemu_run_machine_init_done_notifiers() [vl.c] notifier_list_notify() [util/notify.c] pc_guest_info_machine_done() [hw/i386/pc.c] acpi_setup() [hw/i386/acpi-build.c] acpi_add_rom_blob() [hw/i386/acpi-build.c] rom_add_blob() [hw/core/loader.c] rom_set_mr() [hw/core/loader.c] memory_region_init_ram() [memory.c] qemu_ram_alloc() [exec.c] memcpy() -- populates it fw_cfg_add_file_callback() [hw/nvram/fw_cfg.c] fw_cfg_add_bytes_read_callback() [hw/nvram/fw_cfg.c] - stores len qemu_start_incoming_migration() [migration.c] I currently cannot see where the trailing portion of the bigger RAMBlock
Re: [Qemu-devel] [RFC PATCH v2 00/49] Series short description
From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini Il 17/07/2014 13:01, Pavel Dovgalyuk ha scritto: This set of patches is related to the reverse execution and deterministic replay of qemu execution Our implementation of deterministic replay can be used for deterministic and reverse debugging of guest code through gdb remote interface. Execution recording writes non-deterministic events log, which can be later used for replaying the execution anywhere and for unlimited number of times. It also supports checkpointing for faster rewinding during reverse debugging. Execution replaying reads the log and replays all non-deterministic events including external input, hardware clocks, and interrupts. From a first look: - patches 2 to 13 probably should try to use subsections, so that VMs that do not use the devices try not to save the extra data and keep backwards migration compatibility (at least try to) Could you give me and example? As I know, subsection is loaded when some predicate function returns true. How can I construct such a function for integratorcp module? What kind of condition should it check? In this module I just added missed vmstates (it does not saved/restored at all by the master version). - patch 16 should also use subsections, and perhaps apply to all other CPUs too? We implemented replay only for i386 and ARM. If we'll change other targets, it will not add record/replay capabilities to them, but can confuse the reviewers. - patches 23-24-25 perhaps could try using icount, like Konrad's patch do? Using faster icount (like in Konrad's patches) is the our next aim. It obviously will increase the speed of recording process. But now I submitted slower, but more conservative version of icount which we had already tested. - patch 27 makes sense but VIRTUAL is used to skip blinking when the VM is stopped Right, this is kind of hack. I haven't found better solution yet. - the others I haven't yet looked at, but they look like something that would bitrot really, really fast. For patch 33, I think changing INSERT_HEAD to INSERT_TAIL would be just fine, and I wonder if it's the same for other patches here. How do you plan on testing them and keeping them up to date? We're constantly keeping these patches up to date, because we are using deterministic replay and reverse debugging for solving our tasks. We re-test all the features when pulling new patches from the master branch. Pavel Dovgalyuk Reverse execution has the following features: * Deterministically replays whole system execution and all contents of the memory, state of the hadrware devices, clocks, and screen of the VM. * Writes execution log into the file for latter replaying for multiple times on different machines. * Supports i386, x86_64, and ARM hardware platforms. * Performs deterministic replay of all operations with keyboard, mouse, network adapters, audio devices, serial interfaces, and physical USB devices connected to the emulator. * Provides support for gdb reverse debugging commands like reverse-step and reverse- continue. * Supports auto-checkpointing for convenient reverse debugging. Usage of the record/replay: * First, record the execution, by adding '-record fname=replay.bin' to the command line. * Then you can replay it for the multiple times by using another command line option: '-replay fname=replay.bin' * Virtual machine should have at least one virtual disk, which is used to store checkpoints. If you want to enable automatic checkpointing, simply add ',period=XX' to record options, where XX is the checkpointing period in seconds. * Using of the network adapters in record/replay mode is possible with the following command-line options: - '-net user' (or another host adapter) in record mode - '-net replay' in replay mode. Every host network adapter should be replaced by 'replay' when replaying the execution. * Reverse debugging can be used through gdb remote interface. reverse-stepi and reverse-continue commands are supported. Other reverse commands should also work, because they reuse these ones. * Monitor is extended by the following commands: - replay_info - prints information about replay mode and current step (number of instructions executed) - replay_break - sets breakpoint at the specified instructions count. - replay_seek - rewinds (using the checkpoints, if possible) to the specified step of replay log. Paper with short description of deterministic replay implementation: http://www.computer.org/csdl/proceedings/csmr/2012/4666/00/4666a553-abs.html Modifications of qemu include: * adding missed fields of the virtual devices' states to the vmstate structures to allow deterministic saving and restoring the VM state * adding virtual
Re: [Qemu-devel] [Qemu-ppc] [PATCH for-2.2] spapr: add host Linux version information to device tree
Am 28.07.2014 um 08:47 schrieb Alexey Kardashevskiy a...@ozlabs.ru: On 07/24/2014 11:15 PM, Alexander Graf wrote: On 18.07.14 06:31, cyril...@gmail.com wrote: It may prove useful know which Linux distribution version the host machine is running when an issue in the guest arises but a user cannot access the host. Signed-off-by: Cyril Bur cyril@au1.ibm.com --- hw/ppc/spapr.c | 8 +++ target-ppc/kvm.c | 62 target-ppc/kvm_ppc.h | 6 + 3 files changed, 76 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 6b48a26..391d47a 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -375,6 +375,14 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base, _FDT((fdt_property_string(fdt, vm,uuid, buf))); g_free(buf); +/* + * Add info to the guest FDT to tell it what linux the host is + */ +if (kvmppc_get_linux_host(buf)) { +_FDT((fdt_property_string(fdt, linux,host, buf))); Is this even specified in sPAPR? PAPR does not know about any linux,xxx properties. +g_free(buf); +} + _FDT((fdt_property_cell(fdt, #address-cells, 0x2))); _FDT((fdt_property_cell(fdt, #size-cells, 0x2))); diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 8c9e79c..95e0970 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -1415,6 +1415,68 @@ bool kvmppc_get_host_model(char **value) return g_file_get_contents(/proc/device-tree/model, value, NULL, NULL); } +bool kvmppc_get_linux_host(char **value) +{ +FILE *f; +int i; +char line[512]; +const char *names[] = {NAME, VERSION, BUILD_ID}; +bool names_found[ARRAY_SIZE(names)] = { 0 }; +GString *output = NULL; +f = fopen(/etc/os-release, r); A few comments: 1) Why would anyone care? Useful debug info when the host is not reachable. 2) I'm not sure KVM is the right decision maker on whether we want this exposed or not. Good point, there is no reason not to show this info in TCG. After all, the files you read here are available on an x86 host just as well Does qemu-x86 has any way to pass information like this to the guest? Yes and no. It has fw_cfg where we could put it and it has DSST generation where we could also put it I guess. Just like here, there's no standardized way to expose it though. Alex 3) Use glib functions to read files Alex -- Alexey
Re: [Qemu-devel] [PATCH for-2.1 2/2] acpi: mark ACPI tables ROM blob as extend-able on migration
On Fri, 25 Jul 2014 19:37:41 +0200 Paolo Bonzini pbonz...@redhat.com wrote: Il 25/07/2014 17:48, Igor Mammedov ha scritto: It fixes migration failure for machine type pc-i440fx-1.7 from QEMU 1.7/2.0 to QEMU 2.1 Migration fails due to ACPI tables size grows across 1.7-2.1 versions. That causes ACPI tables ROM blob to change its size differently for the same configurations on different QEMU versions. As result migration code bails out when attempting to load smaller ROM blob into a bigger one on a newer QEMU. To trigger failure it's enough to add pci-bridge device to QEMU CLI Marking ACPI tables ROM blob as extend-able on migration allows QEMU to load smaller ACPI tables from QEMU 1.7/2.0, fixing forward migration failure introduced since 2.0 which affects only configurations that cause ACPI ROM blob size change. This works in this case, and it is more friendly to downstreams indeed. It also is simpler, at least on the surface. I think the ramifications could be a bit larger than with my own patches, but still I guess it's more appropriate at this point of the release cycle. It doesn't handle the case of ACPI tables that shrink, which can happen as well. I guess if this ever happens we can just hard-code the table size of the old versions to something big enough (64K?) and keep using fine-grained sizing. That was intentionally omitted in here so far size only goes up from 1.7 to 2.1. My though was that can enforce minimum size later during 2.2 cycle Anyway, I'll think more about it, and maybe post additional patch on top of this to set minimum size if I find a reason for it to be in 2.1. I'd like a day or two to mull about it, but I have it even if the patches are applied. Peter, feel free to go ahead with Igor's patches. Paolo
[Qemu-devel] [PATCH for-2.1 v2 0/2] Fix migration failure due to ACPI tables size changes
Changes since v2: - addressed Laszlo's comments * fixing typos, rewording comments * dropping enum-ification of RAMBlock flags * adding zeroing out destination ramblock * replacing 'if' with assert() Changing the ACPI table size causes migration to break, and the memory hotplug work opened our eyes on how horribly we were breaking things in 2.0 already. To trigger issue start QEMU-1.7 with -M pc-i440fx-1.7 -device pci-bridge,chassis_nr=1 and try to migrate to QEMU-2.1 or QEMU-2.0 as result target will fail with: qemu-system-x86_64: Length mismatch: /rom@etc/acpi/tables: 2000 in != 3000 This fix allows target QEMU to load smaller RAMBlock into a bigger one and fixes regression which was introduced since 2.0, allowing forward migration from 1.7/2.0 to 2.1 Fix is also suitable for stable-2.0. Igor Mammedov (2): migration: load smaller RAMBlock to a bigger one if permitted acpi: mark ACPI tables ROM blob as extend-able on migration arch_init.c | 22 +- exec.c | 8 hw/core/loader.c| 6 +- hw/i386/acpi-build.c| 2 +- include/exec/memory.h | 11 +++ include/exec/ram_addr.h | 3 +++ include/hw/loader.h | 5 +++-- memory.c| 5 + 8 files changed, 53 insertions(+), 9 deletions(-) -- 1.8.3.1
[Qemu-devel] [PATCH for-2.1 v2 1/2] migration: load smaller RAMBlock to a bigger one if permitted
Add API to mark memory region as extend-able on migration, to allow migration code to load smaller RAMBlock into a bigger one on destination QEMU instance. This will allow to fix broken migration from QEMU 1.7/2.0 to QEMU 2.1 due to ACPI tables size changes across 1.7/2.0/2.1 versions by marking ACPI tables ROM blob as extend-able. So that smaller tables from previous version could be always migrated to a bigger rom blob on new version. Credits-for-idea: Michael S. Tsirkin m...@redhat.com Signed-off-by: Igor Mammedov imamm...@redhat.com --- v2: fixed patch as suggested by Laszlo --- arch_init.c | 22 +- exec.c | 8 include/exec/memory.h | 11 +++ include/exec/ram_addr.h | 3 +++ memory.c| 5 + 5 files changed, 44 insertions(+), 5 deletions(-) diff --git a/arch_init.c b/arch_init.c index 8ddaf35..2c0c238 100644 --- a/arch_init.c +++ b/arch_init.c @@ -1071,11 +1071,23 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) QTAILQ_FOREACH(block, ram_list.blocks, next) { if (!strncmp(id, block-idstr, sizeof(id))) { -if (block-length != length) { -error_report(Length mismatch: %s: RAM_ADDR_FMT - in != RAM_ADDR_FMT, id, length, - block-length); -ret = -EINVAL; +if (block-flags RAM_EXTENDABLE_ON_MIGRATE) { +if (block-length length) { +error_report(Length too big: %s: RAM_ADDR_FMT + in RAM_ADDR_FMT, id, length, + block-length); +ret = -EINVAL; +} else { +memset(block-host, 0, block-length); +} +} else { +if (block-length != length) { +error_report(Length mismatch: %s: + RAM_ADDR_FMT in != + RAM_ADDR_FMT, + id, length, block-length); +ret = -EINVAL; +} } break; } diff --git a/exec.c b/exec.c index 765bd94..02536f8e 100644 --- a/exec.c +++ b/exec.c @@ -1214,6 +1214,14 @@ void qemu_ram_unset_idstr(ram_addr_t addr) } } +void qemu_ram_set_extendable_on_migration(ram_addr_t addr) +{ +RAMBlock *block = find_ram_block(addr); + +assert(block != NULL); +block-flags |= RAM_EXTENDABLE_ON_MIGRATE; +} + static int memory_try_enable_merging(void *addr, size_t len) { if (!qemu_opt_get_bool(qemu_get_machine_opts(), mem-merge, true)) { diff --git a/include/exec/memory.h b/include/exec/memory.h index e2c8e3e..f96ddbb 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -894,6 +894,17 @@ bool memory_region_present(MemoryRegion *container, hwaddr addr); bool memory_region_is_mapped(MemoryRegion *mr); /** + * memory_region_permit_extendable_migration: marks #MemoryRegion + * as extendable on migration, allowing the migration code to load + * source memory block of smaller size than destination memory block + * at migration time + * + * @mr: a #MemoryRegion whose #RAMBlock should be marked as + * extendable on migration + */ +void memory_region_permit_extendable_migration(MemoryRegion *mr); + +/** * memory_region_find: translate an address/size relative to a * MemoryRegion into a #MemoryRegionSection. * diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index 6593be1..7a6b782 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -34,6 +34,9 @@ void *qemu_get_ram_ptr(ram_addr_t addr); void qemu_ram_free(ram_addr_t addr); void qemu_ram_free_from_ptr(ram_addr_t addr); +#define RAM_EXTENDABLE_ON_MIGRATE (1U 31) +void qemu_ram_set_extendable_on_migration(ram_addr_t addr); + static inline bool cpu_physical_memory_get_dirty(ram_addr_t start, ram_addr_t length, unsigned client) diff --git a/memory.c b/memory.c index 64d7176..744c746 100644 --- a/memory.c +++ b/memory.c @@ -1791,6 +1791,11 @@ bool memory_region_is_mapped(MemoryRegion *mr) return mr-container ? true : false; } +void memory_region_permit_extendable_migration(MemoryRegion *mr) +{ +qemu_ram_set_extendable_on_migration(mr-ram_addr); +} + MemoryRegionSection memory_region_find(MemoryRegion *mr, hwaddr addr, uint64_t size) { -- 1.8.3.1
[Qemu-devel] [PATCH for-2.1 v2 2/2] acpi: mark ACPI tables ROM blob as extend-able on migration
It fixes migration failure for machine type pc-i440fx-1.7 from QEMU 1.7/2.0 to QEMU 2.1 Migration fails due to ACPI tables size grows across 1.7-2.1 versions. That causes ACPI tables ROM blob to change its size differently for the same configurations on different QEMU versions. As result migration code bails out when attempting to load smaller ROM blob into a bigger one on a newer QEMU. To trigger failure it's enough to add pci-bridge device to QEMU CLI Marking ACPI tables ROM blob as extend-able on migration allows QEMU to load smaller ACPI tables from QEMU 1.7/2.0, fixing forward migration failure introduced since 2.0 which affects only configurations that cause ACPI ROM blob size change. Signed-off-by: Igor Mammedov imamm...@redhat.com Reviewed-by: Laszlo Ersek ler...@redhat.com --- hw/core/loader.c | 6 +- hw/i386/acpi-build.c | 2 +- include/hw/loader.h | 5 +++-- 3 files changed, 9 insertions(+), 4 deletions(-) diff --git a/hw/core/loader.c b/hw/core/loader.c index 2bf6b8f..d09b894 100644 --- a/hw/core/loader.c +++ b/hw/core/loader.c @@ -722,7 +722,8 @@ err: void *rom_add_blob(const char *name, const void *blob, size_t len, hwaddr addr, const char *fw_file_name, - FWCfgReadCallback fw_callback, void *callback_opaque) + FWCfgReadCallback fw_callback, void *callback_opaque, + bool extendable) { Rom *rom; void *data = NULL; @@ -742,6 +743,9 @@ void *rom_add_blob(const char *name, const void *blob, size_t len, if (rom_file_has_mr) { data = rom_set_mr(rom, OBJECT(fw_cfg), devpath); +if (extendable) { +memory_region_permit_extendable_migration(rom-mr); +} } else { data = rom-data; } diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index ebc5f03..651c06b 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -1558,7 +1558,7 @@ static void *acpi_add_rom_blob(AcpiBuildState *build_state, GArray *blob, const char *name) { return rom_add_blob(name, blob-data, acpi_data_len(blob), -1, name, -acpi_build_update, build_state); +acpi_build_update, build_state, true); } static const VMStateDescription vmstate_acpi_build = { diff --git a/include/hw/loader.h b/include/hw/loader.h index 796cbf9..e5a24ac 100644 --- a/include/hw/loader.h +++ b/include/hw/loader.h @@ -57,7 +57,8 @@ int rom_add_file(const char *file, const char *fw_dir, bool option_rom); void *rom_add_blob(const char *name, const void *blob, size_t len, hwaddr addr, const char *fw_file_name, - FWCfgReadCallback fw_callback, void *callback_opaque); + FWCfgReadCallback fw_callback, void *callback_opaque, + bool extendable); int rom_add_elf_program(const char *name, void *data, size_t datasize, size_t romsize, hwaddr addr); int rom_load_all(void); @@ -70,7 +71,7 @@ void do_info_roms(Monitor *mon, const QDict *qdict); #define rom_add_file_fixed(_f, _a, _i) \ rom_add_file(_f, NULL, _a, _i, false) #define rom_add_blob_fixed(_f, _b, _l, _a) \ -rom_add_blob(_f, _b, _l, _a, NULL, NULL, NULL) +rom_add_blob(_f, _b, _l, _a, NULL, NULL, NULL, false) #define PC_ROM_MIN_VGA 0xc #define PC_ROM_MIN_OPTION 0xc8000 -- 1.8.3.1
Re: [Qemu-devel] [PATCH for-2.1 1/2] migration: load smaller RAMBlock to a bigger one if permitted
On 07/28/14 09:40, Igor Mammedov wrote: On Fri, 25 Jul 2014 19:56:40 +0200 Laszlo Ersek ler...@redhat.com wrote: On 07/25/14 17:48, Igor Mammedov wrote: Add API to mark memory region as extend-able on migration, to allow migration code to load smaller RAMBlock into a bigger one on destination QEMU instance. This will allow to fix broken migration from QEMU 1.7/2.0 to QEMU 2.1 due to ACPI tables size changes across 1.7/2.0/2.1 versions by marking ACPI tables ROM blob as extend-able. So that smaller tables from previos version could be always (previous) migrated to a bigger rom blob on new version. Credits-for-idea: Michael S. Tsirkin m...@redhat.com Signed-off-by: Igor Mammedov imamm...@redhat.com --- arch_init.c | 19 ++- exec.c | 15 +-- include/exec/cpu-all.h | 15 ++- include/exec/memory.h | 10 ++ include/exec/ram_addr.h | 1 + memory.c| 5 + 6 files changed, 53 insertions(+), 12 deletions(-) (Please pass the -O flag to git-format-patch, with an order file that moves header files (*.h) to the front. Header hunks should lead in a patch. Thanks.) diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index 6593be1..248c075 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -33,6 +33,7 @@ void *qemu_get_ram_block_host_ptr(ram_addr_t addr); void *qemu_get_ram_ptr(ram_addr_t addr); void qemu_ram_free(ram_addr_t addr); void qemu_ram_free_from_ptr(ram_addr_t addr); +void qemu_ram_set_extendable_on_migration(ram_addr_t addr); static inline bool cpu_physical_memory_get_dirty(ram_addr_t start, ram_addr_t length, (Ugh. The declarations (prototypes) of qemu_ram_*() functions are scattered between ram_addr.h and cpu-common.h (both under include/exec). I can't see why that is a good thing (the function definitions all seem to be in exec.c).) Trying to avoid putting stuff to global cpu-common.h, I've put definition in ram_addr.h. The rest should be moved from cpu-common.h to ram_addr.h when 2.2 is open. [...] diff --git a/arch_init.c b/arch_init.c index 8ddaf35..812f8b5 100644 --- a/arch_init.c +++ b/arch_init.c @@ -1071,11 +1071,20 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) QTAILQ_FOREACH(block, ram_list.blocks, next) { if (!strncmp(id, block-idstr, sizeof(id))) { -if (block-length != length) { -error_report(Length mismatch: %s: RAM_ADDR_FMT - in != RAM_ADDR_FMT, id, length, - block-length); -ret = -EINVAL; +if (block-flags RAM_EXTENDABLE_ON_MIGRATE) { +if (block-length length) { +error_report(Length too big: %s: RAM_ADDR_FMT missing space between '' and RAM_ADDR_FMT (for readability) +RAM_ADDR_FMT, id, length, you dropped the important word in here (cf. in != above). That word explains it to the user that the incoming size (stored in length) is too big. + block-length); +ret = -EINVAL; +} +} else { +if (block-length != length) { +error_report(Length mismatch: %s: RAM_ADDR_FMT missing space between '' and RAM_ADDR_FMT (for readability) + in != RAM_ADDR_FMT, id, length, + block-length); +ret = -EINVAL; +} } break; } Can you please explain where it is ensured that the non-overwritten part of a greater RAMBlock is zero-filled? As far as I can see: main() [vl.c] qemu_run_machine_init_done_notifiers() [vl.c] notifier_list_notify() [util/notify.c] pc_guest_info_machine_done() [hw/i386/pc.c] acpi_setup() [hw/i386/acpi-build.c] acpi_add_rom_blob() [hw/i386/acpi-build.c] rom_add_blob() [hw/core/loader.c] rom_set_mr() [hw/core/loader.c] memory_region_init_ram() [memory.c] qemu_ram_alloc() [exec.c] memcpy() -- populates it fw_cfg_add_file_callback() [hw/nvram/fw_cfg.c] fw_cfg_add_bytes_read_callback() [hw/nvram/fw_cfg.c] - stores len qemu_start_incoming_migration() [migration.c] I currently cannot see where the trailing portion of the bigger RAMBlock would be overwritten with zeroes.
Re: [Qemu-devel] [PATCH] hw/audio/intel-hda: Fix MSI capability address
Il 27/07/2014 08:57, Jan Kiszka ha scritto: From: Jan Kiszka jan.kis...@siemens.com According to ICH9 spec, the MSI capability is located at 0x60. This is important for guest drivers that do not parse the capability chain and use absolute addresses instead. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/audio/intel-hda.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c index aa49b47..09c4118 100644 --- a/hw/audio/intel-hda.c +++ b/hw/audio/intel-hda.c @@ -1141,7 +1141,7 @@ static int intel_hda_init(PCIDevice *pci) intel-hda, 0x4000); pci_register_bar(d-pci, 0, 0, d-mmio); if (d-msi) { -msi_init(d-pci, 0x50, 1, true, false); +msi_init(d-pci, 0x60, 1, true, false); } hda_codec_bus_init(DEVICE(pci), d-codecs, sizeof(d-codecs), Does this need a compat property? Paolo
Re: [Qemu-devel] [PATCH] hw/audio/intel-hda: Fix MSI capability address
On 2014-07-28 10:11, Paolo Bonzini wrote: Il 27/07/2014 08:57, Jan Kiszka ha scritto: From: Jan Kiszka jan.kis...@siemens.com According to ICH9 spec, the MSI capability is located at 0x60. This is important for guest drivers that do not parse the capability chain and use absolute addresses instead. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/audio/intel-hda.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c index aa49b47..09c4118 100644 --- a/hw/audio/intel-hda.c +++ b/hw/audio/intel-hda.c @@ -1141,7 +1141,7 @@ static int intel_hda_init(PCIDevice *pci) intel-hda, 0x4000); pci_register_bar(d-pci, 0, 0, d-mmio); if (d-msi) { -msi_init(d-pci, 0x50, 1, true, false); +msi_init(d-pci, 0x60, 1, true, false); } hda_codec_bus_init(DEVICE(pci), d-codecs, sizeof(d-codecs), Does this need a compat property? Sigh, right, it's guest visible. Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH for-2.1 2/2] acpi: mark ACPI tables ROM blob as extend-able on migration
Il 28/07/2014 09:56, Igor Mammedov ha scritto: It doesn't handle the case of ACPI tables that shrink, which can happen as well. I guess if this ever happens we can just hard-code the table size of the old versions to something big enough (64K?) and keep using fine-grained sizing. That was intentionally omitted in here so far size only goes up from 1.7 to 2.1. My though was that can enforce minimum size later during 2.2 cycle Anyway, I'll think more about it, and maybe post additional patch on top of this to set minimum size if I find a reason for it to be in 2.1. No, there's no need for it in 2.1. Paolo
Re: [Qemu-devel] [PATCH for-2.1 v2 1/2] migration: load smaller RAMBlock to a bigger one if permitted
On 07/28/14 10:03, Igor Mammedov wrote: Add API to mark memory region as extend-able on migration, to allow migration code to load smaller RAMBlock into a bigger one on destination QEMU instance. This will allow to fix broken migration from QEMU 1.7/2.0 to QEMU 2.1 due to ACPI tables size changes across 1.7/2.0/2.1 versions by marking ACPI tables ROM blob as extend-able. So that smaller tables from previous version could be always migrated to a bigger rom blob on new version. Credits-for-idea: Michael S. Tsirkin m...@redhat.com Signed-off-by: Igor Mammedov imamm...@redhat.com --- v2: fixed patch as suggested by Laszlo --- arch_init.c | 22 +- exec.c | 8 include/exec/memory.h | 11 +++ include/exec/ram_addr.h | 3 +++ memory.c| 5 + 5 files changed, 44 insertions(+), 5 deletions(-) Thank you. Reviewed-by: Laszlo Ersek ler...@redhat.com
Re: [Qemu-devel] Possible null-ptr dereference
Hey, Yup, thanks, task closed ;-) Best regards, Mateusz Krzywicki From: arei.gong...@huawei.com To: mateusz.krzywi...@windowslive.com; qemu-devel@nongnu.org CC: stefa...@redhat.com; kw...@redhat.com Subject: RE: [Qemu-devel] Possible null-ptr dereference Date: Mon, 28 Jul 2014 06:03:45 + Hi, Should be easy to fix though. Does the following help? (Cc’ing Stefan Kevin) -- xen_disk: fix possible null-ptr dereference Signed-off-by: Gonglei arei.gong...@huawei.com --- hw/block/xen_disk.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c index aed5b5b..a221d0b 100644 --- a/hw/block/xen_disk.c +++ b/hw/block/xen_disk.c @@ -589,6 +589,7 @@ static int blk_send_response_one(struct ioreq *ioreq) break; default: dst = NULL; +return 0; } memcpy(dst, resp, sizeof(resp)); blkdev-rings.common.rsp_prod_pvt++; -- Best regards, -Gonglei From: qemu-devel-bounces+arei.gonglei=huawei@nongnu.org [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On Behalf Of mateusz.krzywi...@windowslive.com Sent: Saturday, July 26, 2014 6:52 PM To: qemu-devel@nongnu.org Subject: [Qemu-devel] Possible null-ptr dereference Hey, Found a little bug in latest qemu: In function: static int blk_send_response_one(struct ioreq *ioreq) File: qemu\hw\block\xen_disk.c Code: default: dst = NULL; } memcpy(dst, resp, sizeof(resp)); Just add simple check for dst and it will be all cool ;-) Best regards, Mateusz Krzywicki
Re: [Qemu-devel] [PATCH v2 1/7] bootindex: add modify_boot_device_path function
Hi, ... because you can just copy the suffix from the old entry here, instead of expecting the caller pass it in. Okay, agreed. But we should also think about the situation which a device don't have old entry in global fw_boot_order list. Throw an error? I think it is ok to allow only *changing* the bootindex. cheers, Gerd
Re: [Qemu-devel] [RFC PATCH v2 08/49] hpet: fixing saving and loading process
Il 17/07/2014 13:02, Pavel Dovgalyuk ha scritto: VM clock does not run while saving, so there is no need for saving the ticks in HPET. Also added saving of hpet_offset field. Signed-off-by: Pavel Dovgalyuk pavel.dovga...@ispras.ru --- hw/timer/hpet.c | 13 + 1 files changed, 1 insertions(+), 12 deletions(-) diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c index e160e8f..73401b9 100644 --- a/hw/timer/hpet.c +++ b/hw/timer/hpet.c @@ -222,14 +222,6 @@ static void update_irq(struct HPETTimer *timer, int set) } } -static void hpet_pre_save(void *opaque) -{ -HPETState *s = opaque; - -/* save current counter value */ -s-hpet_counter = hpet_get_ticks(s); -} - static int hpet_pre_load(void *opaque) { HPETState *s = opaque; @@ -255,9 +247,6 @@ static int hpet_post_load(void *opaque, int version_id) { HPETState *s = opaque; -/* Recalculate the offset between the main counter and guest time */ -s-hpet_offset = ticks_to_ns(s-hpet_counter) - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); - /* Push number of timers into capability returned via HPET_ID */ s-capability = ~HPET_ID_NUM_TIM_MASK; s-capability |= (s-num_timers - 1) HPET_ID_NUM_TIM_SHIFT; @@ -308,13 +297,13 @@ static const VMStateDescription vmstate_hpet = { .name = hpet, .version_id = 2, .minimum_version_id = 1, -.pre_save = hpet_pre_save, .pre_load = hpet_pre_load, .post_load = hpet_post_load, .fields = (VMStateField[]) { VMSTATE_UINT64(config, HPETState), VMSTATE_UINT64(isr, HPETState), VMSTATE_UINT64(hpet_counter, HPETState), +VMSTATE_UINT64(hpet_offset, HPETState), This needs a version bump. Paolo VMSTATE_UINT8_V(num_timers, HPETState, 2), VMSTATE_VALIDATE(num_timers in range, hpet_validate_num_timers), VMSTATE_STRUCT_VARRAY_UINT8(timer, HPETState, num_timers, 0,
Re: [Qemu-devel] [PATCH v2 1/7] bootindex: add modify_boot_device_path function
-Original Message- From: Gerd Hoffmann [mailto:kra...@redhat.com] Sent: Monday, July 28, 2014 4:31 PM Subject: Re: [PATCH v2 1/7] bootindex: add modify_boot_device_path function Hi, ... because you can just copy the suffix from the old entry here, instead of expecting the caller pass it in. Okay, agreed. But we should also think about the situation which a device don't have old entry in global fw_boot_order list. Throw an error? No. I should firstly use the suffix which the caller pass in IMHO, just like V3 I have posted, Thanks. I think it is ok to allow only *changing* the bootindex. Yes, that's no problem. Best regards, -Gonglei
Re: [Qemu-devel] [PATCH for-2.1 v2 2/2] acpi: mark ACPI tables ROM blob as extend-able on migration
On Mon, 28 Jul 2014 08:03:25 + Igor Mammedov imamm...@redhat.com wrote: It fixes migration failure for machine type pc-i440fx-1.7 from QEMU 1.7/2.0 to QEMU 2.1 Migration fails due to ACPI tables size grows across 1.7-2.1 versions. That causes ACPI tables ROM blob to change its size differently for the same configurations on different QEMU versions. As result migration code bails out when attempting to load smaller ROM blob into a bigger one on a newer QEMU. To trigger failure it's enough to add pci-bridge device to QEMU CLI Marking ACPI tables ROM blob as extend-able on migration allows QEMU to load smaller ACPI tables from QEMU 1.7/2.0, fixing forward migration failure introduced since 2.0 which affects only configurations that cause ACPI ROM blob size change. Signed-off-by: Igor Mammedov imamm...@redhat.com Reviewed-by: Laszlo Ersek ler...@redhat.com Self-NACK I'm sorry for mess, I've tested it on for i386 target, but patch breaks build for other targets. I'll resubmit v3 shortly. --- hw/core/loader.c | 6 +- hw/i386/acpi-build.c | 2 +- include/hw/loader.h | 5 +++-- 3 files changed, 9 insertions(+), 4 deletions(-) diff --git a/hw/core/loader.c b/hw/core/loader.c index 2bf6b8f..d09b894 100644 --- a/hw/core/loader.c +++ b/hw/core/loader.c @@ -722,7 +722,8 @@ err: void *rom_add_blob(const char *name, const void *blob, size_t len, hwaddr addr, const char *fw_file_name, - FWCfgReadCallback fw_callback, void *callback_opaque) + FWCfgReadCallback fw_callback, void *callback_opaque, + bool extendable) { Rom *rom; void *data = NULL; @@ -742,6 +743,9 @@ void *rom_add_blob(const char *name, const void *blob, size_t len, if (rom_file_has_mr) { data = rom_set_mr(rom, OBJECT(fw_cfg), devpath); +if (extendable) { +memory_region_permit_extendable_migration(rom-mr); +} } else { data = rom-data; } diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index ebc5f03..651c06b 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -1558,7 +1558,7 @@ static void *acpi_add_rom_blob(AcpiBuildState *build_state, GArray *blob, const char *name) { return rom_add_blob(name, blob-data, acpi_data_len(blob), -1, name, -acpi_build_update, build_state); +acpi_build_update, build_state, true); } static const VMStateDescription vmstate_acpi_build = { diff --git a/include/hw/loader.h b/include/hw/loader.h index 796cbf9..e5a24ac 100644 --- a/include/hw/loader.h +++ b/include/hw/loader.h @@ -57,7 +57,8 @@ int rom_add_file(const char *file, const char *fw_dir, bool option_rom); void *rom_add_blob(const char *name, const void *blob, size_t len, hwaddr addr, const char *fw_file_name, - FWCfgReadCallback fw_callback, void *callback_opaque); + FWCfgReadCallback fw_callback, void *callback_opaque, + bool extendable); int rom_add_elf_program(const char *name, void *data, size_t datasize, size_t romsize, hwaddr addr); int rom_load_all(void); @@ -70,7 +71,7 @@ void do_info_roms(Monitor *mon, const QDict *qdict); #define rom_add_file_fixed(_f, _a, _i) \ rom_add_file(_f, NULL, _a, _i, false) #define rom_add_blob_fixed(_f, _b, _l, _a) \ -rom_add_blob(_f, _b, _l, _a, NULL, NULL, NULL) +rom_add_blob(_f, _b, _l, _a, NULL, NULL, NULL, false) #define PC_ROM_MIN_VGA 0xc #define PC_ROM_MIN_OPTION 0xc8000
Re: [Qemu-devel] [RFC PATCH v2 07/49] kvmapic: fixing loading vmstate
Il 17/07/2014 13:02, Pavel Dovgalyuk ha scritto: vapic state should not be synchronized with APIC while loading, because APIC state could be not loaded yet at that moment. We just save vapic_paddr in APIC VMState instead of synchronization. This comment is now obsolete: include/hw/i386/apic_internal.h:hwaddr vapic_paddr; /* note: persistence via kvmvapic */ Signed-off-by: Pavel Dovgalyuk pavel.dovga...@ispras.ru --- hw/i386/kvmvapic.c| 22 +- hw/intc/apic_common.c |5 - 2 files changed, 25 insertions(+), 2 deletions(-) diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c index cb855c7..417ab6a 100644 --- a/hw/i386/kvmvapic.c +++ b/hw/i386/kvmvapic.c @@ -351,6 +351,24 @@ static int get_kpcr_number(X86CPU *cpu) return kpcr.number; } +static int vapic_enable_post_load(VAPICROMState *s, X86CPU *cpu) +{ +int cpu_number = get_kpcr_number(cpu); +hwaddr vapic_paddr; +static const uint8_t enabled = 1; + +if (cpu_number 0) { +return -1; +} +vapic_paddr = s-vapic_paddr + +(((hwaddr)cpu_number) VAPIC_CPU_SHIFT); +cpu_physical_memory_rw(vapic_paddr + offsetof(VAPICState, enabled), + (void *)enabled, sizeof(enabled), 1); +s-state = VAPIC_ACTIVE; + +return 0; +} + static int vapic_enable(VAPICROMState *s, X86CPU *cpu) { int cpu_number = get_kpcr_number(cpu); @@ -731,7 +749,9 @@ static void do_vapic_enable(void *data) VAPICROMState *s = data; X86CPU *cpu = X86_CPU(first_cpu); -vapic_enable(s, cpu); +/* Do not synchronize with APIC, because it was not loaded yet. + Just call the enable function which does not have synchronization. */ +vapic_enable_post_load(s, cpu); } static int vapic_post_load(void *opaque, int version_id) diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c index ce3d903..9d75ee0 100644 --- a/hw/intc/apic_common.c +++ b/hw/intc/apic_common.c @@ -347,7 +347,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id) static const VMStateDescription vmstate_apic_common = { .name = apic, -.version_id = 3, +.version_id = 4, .minimum_version_id = 3, .minimum_version_id_old = 1, .load_state_old = apic_load_old, @@ -374,6 +374,9 @@ static const VMStateDescription vmstate_apic_common = { VMSTATE_INT64(next_time, APICCommonState), VMSTATE_INT64(timer_expiry, APICCommonState), /* open-coded timer state */ +VMSTATE_INT32_V(sipi_vector, APICCommonState, 4), +VMSTATE_INT32_V(wait_for_sipi, APICCommonState, 4), This could be a subsection. sipi_vector is only used (needed) if wait_for_sipi != 0. +VMSTATE_UINT64_V(vapic_paddr, APICCommonState, 4), Here you could also use a subsection, where the needed function returns false if vapic_paddr == 0. Paolo VMSTATE_END_OF_LIST() } };
Re: [Qemu-devel] [PATCH v12 0/6] qcow2, raw: add preallocation=full and preallocation=falloc
ping... All the 6 patches have reviewed-by now. On Fri, Jul 11, 2014 at 02:09:57PM +0800, Hu Tao wrote: This series adds two preallocation mode to qcow2 and raw: Option preallocation=full preallocates disk space for image by writing zeros to disk, this ensures disk space in any cases. Option preallocation=falloc preallocates disk space by calling posix_fallocate(). This is faster than preallocation=full. This series depends on patches 1-3 of Max's series 'qemu-img: Implement commit like QMP'. Specifically, patch 6 'qcow2: Add falloc and full preallocation option' uses minimal_blob_size() introduced by Max's patch 'qcow2: Optimize bdrv_make_empty()'. The series is also at https://github.com/taohu/qemu/commits/preallocation-v12 for you to check out, including depended patches from Max. Eric, I'm afraid now we missed qemu 2.1, so patch 1 is still sent with this series. changes to v11: - fix test case 049 (patch 4) - unsigned nl2e - uint64_t nl2e (patch 6) - use instead of / (patch 6) Hu Tao (6): block: round up file size to nearest sector raw, qcow2: don't convert file size to sector size rename parse_enum_option to qapi_enum_parse and make it public qapi: introduce PreallocMode and a new PreallocMode full. raw-posix: Add falloc and full preallocation option qcow2: Add falloc and full preallocation option block/qcow2.c | 56 +--- block/raw-posix.c | 92 +++--- block/raw-win32.c | 6 +-- blockdev.c | 30 +++ include/qapi/util.h| 17 + qapi/Makefile.objs | 1 + qapi/block-core.json | 17 + qapi/qapi-util.c | 32 tests/qemu-iotests/049.out | 2 +- tests/qemu-iotests/082.out | 54 +-- tests/qemu-iotests/096 | 64 tests/qemu-iotests/096.out | 14 +++ tests/qemu-iotests/group | 1 + 13 files changed, 296 insertions(+), 90 deletions(-) create mode 100644 include/qapi/util.h create mode 100644 qapi/qapi-util.c create mode 100755 tests/qemu-iotests/096 create mode 100644 tests/qemu-iotests/096.out -- 1.9.3
[Qemu-devel] [PATCH for-2.1 v3 2/2] acpi: mark ACPI tables ROM blob as extend-able on migration
It fixes migration failure for machine type pc-i440fx-1.7 from QEMU 1.7/2.0 to QEMU 2.1 Migration fails due to ACPI tables size grows across 1.7-2.1 versions. That causes ACPI tables ROM blob to change its size differently for the same configurations on different QEMU versions. As result migration code bails out when attempting to load smaller ROM blob into a bigger one on a newer QEMU. To trigger failure it's enough to add pci-bridge device to QEMU CLI Marking ACPI tables ROM blob as extend-able on migration allows QEMU to load smaller ACPI tables from QEMU 1.7/2.0, fixing forward migration failure introduced since 2.0 which affects only configurations that cause ACPI ROM blob size change. Signed-off-by: Igor Mammedov imamm...@redhat.com Reviewed-by: Laszlo Ersek ler...@redhat.com --- v3: fix build breakage of lm32 target --- hw/core/loader.c | 6 +- hw/i386/acpi-build.c | 2 +- hw/lm32/lm32_hwsetup.h | 2 +- include/hw/loader.h| 5 +++-- 4 files changed, 10 insertions(+), 5 deletions(-) diff --git a/hw/core/loader.c b/hw/core/loader.c index 2bf6b8f..d09b894 100644 --- a/hw/core/loader.c +++ b/hw/core/loader.c @@ -722,7 +722,8 @@ err: void *rom_add_blob(const char *name, const void *blob, size_t len, hwaddr addr, const char *fw_file_name, - FWCfgReadCallback fw_callback, void *callback_opaque) + FWCfgReadCallback fw_callback, void *callback_opaque, + bool extendable) { Rom *rom; void *data = NULL; @@ -742,6 +743,9 @@ void *rom_add_blob(const char *name, const void *blob, size_t len, if (rom_file_has_mr) { data = rom_set_mr(rom, OBJECT(fw_cfg), devpath); +if (extendable) { +memory_region_permit_extendable_migration(rom-mr); +} } else { data = rom-data; } diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index ebc5f03..651c06b 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -1558,7 +1558,7 @@ static void *acpi_add_rom_blob(AcpiBuildState *build_state, GArray *blob, const char *name) { return rom_add_blob(name, blob-data, acpi_data_len(blob), -1, name, -acpi_build_update, build_state); +acpi_build_update, build_state, true); } static const VMStateDescription vmstate_acpi_build = { diff --git a/hw/lm32/lm32_hwsetup.h b/hw/lm32/lm32_hwsetup.h index 9fd5e69..943130c 100644 --- a/hw/lm32/lm32_hwsetup.h +++ b/hw/lm32/lm32_hwsetup.h @@ -73,7 +73,7 @@ static inline void hwsetup_free(HWSetup *hw) static inline void hwsetup_create_rom(HWSetup *hw, hwaddr base) { -rom_add_blob(hwsetup, hw-data, TARGET_PAGE_SIZE, base, NULL, NULL, NULL); +rom_add_blob_fixed(hwsetup, hw-data, TARGET_PAGE_SIZE, base); } static inline void hwsetup_add_u8(HWSetup *hw, uint8_t u) diff --git a/include/hw/loader.h b/include/hw/loader.h index 796cbf9..e5a24ac 100644 --- a/include/hw/loader.h +++ b/include/hw/loader.h @@ -57,7 +57,8 @@ int rom_add_file(const char *file, const char *fw_dir, bool option_rom); void *rom_add_blob(const char *name, const void *blob, size_t len, hwaddr addr, const char *fw_file_name, - FWCfgReadCallback fw_callback, void *callback_opaque); + FWCfgReadCallback fw_callback, void *callback_opaque, + bool extendable); int rom_add_elf_program(const char *name, void *data, size_t datasize, size_t romsize, hwaddr addr); int rom_load_all(void); @@ -70,7 +71,7 @@ void do_info_roms(Monitor *mon, const QDict *qdict); #define rom_add_file_fixed(_f, _a, _i) \ rom_add_file(_f, NULL, _a, _i, false) #define rom_add_blob_fixed(_f, _b, _l, _a) \ -rom_add_blob(_f, _b, _l, _a, NULL, NULL, NULL) +rom_add_blob(_f, _b, _l, _a, NULL, NULL, NULL, false) #define PC_ROM_MIN_VGA 0xc #define PC_ROM_MIN_OPTION 0xc8000 -- 1.8.3.1
Re: [Qemu-devel] [PATCH 2/7] tests: Add virtio device initialization
On Fri, Jul 25, 2014 at 07:01:47PM +0200, Marc Marà wrote: @@ -73,3 +97,11 @@ QVirtioPCIDevice *qvirtio_pci_device_find(QPCIBus *bus, uint16_t device_type) return dev; } + +void qvirtio_pci_enable_device(QVirtioPCIDevice *d) +{ +qpci_device_enable(d-pdev); +d-addr = qpci_iomap(d-pdev, 0); +g_assert(d-addr != NULL); +} Where is qpci_iounmap() called to clean up? Missed. Also, it is unimplemented. It would be much harder to add in the appropriate guest_free() calls later so users should still call it. Just like we should call guest_free() even if it is currently unimplemented. Stefan pgpw3RjuSra6Y.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH v2 1/7] bootindex: add modify_boot_device_path function
I think it is ok to allow only *changing* the bootindex. Yes, that's no problem. But then yoy always will have a old entry where you can take the suffix from, and you don't need the suffix as parameter for the monitor command. cheers, Gerd
Re: [Qemu-devel] [RFC PATCH v2 09/49] pckbd: adding new fields to vmstate
Il 17/07/2014 13:02, Pavel Dovgalyuk ha scritto: This patch adds outport to VMState to allow correct saving and restoring the state of PC keyboard controller. Signed-off-by: Pavel Dovgalyuk pavel.dovga...@ispras.ru --- hw/input/pckbd.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/hw/input/pckbd.c b/hw/input/pckbd.c index ca1cffc..19f6658 100644 --- a/hw/input/pckbd.c +++ b/hw/input/pckbd.c @@ -371,13 +371,14 @@ static void kbd_reset(void *opaque) static const VMStateDescription vmstate_kbd = { .name = pckbd, -.version_id = 3, +.version_id = 4, .minimum_version_id = 3, .fields = (VMStateField[]) { VMSTATE_UINT8(write_cmd, KBDState), VMSTATE_UINT8(status, KBDState), VMSTATE_UINT8(mode, KBDState), VMSTATE_UINT8(pending, KBDState), +VMSTATE_UINT8_V(outport, KBDState, 4), VMSTATE_END_OF_LIST() } }; Again it would be nice to use a subsection. You can use as the default value KBD_OUT_RESET | KBD_OUT_A20 | (kbd-status KBD_STAT_OBF ? KBD_OUT_OBF : 0) | (kbd-status KBD_STAT_MOUSE_OBF ? KBD_OUT_MOUSE_OBF : 0) If the value of outport matches this, you need not write it. It's not trivial, but you could do it like this: - needed: return false if outport doesn't have the value above - subsection post_load: set kbd-outport_present = 1 - device post_load: reconstruct outport if kbd-outport_present == 0 Paolo
Re: [Qemu-devel] [PATCH v2 1/7] bootindex: add modify_boot_device_path function
Hi, -Original Message- From: Gerd Hoffmann [mailto:kra...@redhat.com] Sent: Monday, July 28, 2014 5:28 PM Subject: Re: [PATCH v2 1/7] bootindex: add modify_boot_device_path function I think it is ok to allow only *changing* the bootindex. Yes, that's no problem. But then yoy always will have a old entry where you can take the suffix from, and you don't need the suffix as parameter for the monitor command. No, optional. Because the bootindex property is not a necessary property for devices. If a device, such as IDE cdrom haven't attach the bootindex in qemu booting command line, the global fw_boot_order will not have its entry. So, we should save the suffix as parameter. Best regards, -Gonglei
Re: [Qemu-devel] [RFC PATCH v2 10/49] rtl8139: adding new fields to vmstate
Il 17/07/2014 13:02, Pavel Dovgalyuk ha scritto: This patch adds virtual clock-dependent timers to VMState to allow correct saving and restoring the state of RTL8139 network controller. Signed-off-by: Pavel Dovgalyuk pavel.dovga...@ispras.ru --- hw/net/rtl8139.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 90bc5ec..992caf0 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -3289,7 +3289,7 @@ static void rtl8139_pre_save(void *opaque) static const VMStateDescription vmstate_rtl8139 = { .name = rtl8139, -.version_id = 4, +.version_id = 5, .minimum_version_id = 3, .post_load = rtl8139_post_load, .pre_save = rtl8139_pre_save, @@ -3363,6 +3363,9 @@ static const VMStateDescription vmstate_rtl8139 = { VMSTATE_STRUCT(tally_counters, RTL8139State, 0, vmstate_tally_counters, RTL8139TallyCounters), +VMSTATE_TIMER_V(timer, RTL8139State, 5), timer need not be migrated, because it is reinstated by rtl8139_post_load. +VMSTATE_INT64_V(TimerExpire, RTL8139State, 5), This can be in a subsection, migrated only if non-zero. Paolo VMSTATE_UINT32_V(cplus_enabled, RTL8139State, 4), VMSTATE_END_OF_LIST() },
Re: [Qemu-devel] [RFC PATCH v2 12/49] mc146818rtc: add missed field to vmstate
Il 17/07/2014 13:03, Pavel Dovgalyuk ha scritto: This patch adds irq_reinject_on_ack_count field to VMState to allow correct saving/loading the state of MC146818 RTC. Signed-off-by: Pavel Dovgalyuk pavel.dovga...@ispras.ru --- hw/timer/mc146818rtc.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c index 9d817ca..c204abb 100644 --- a/hw/timer/mc146818rtc.c +++ b/hw/timer/mc146818rtc.c @@ -735,7 +735,7 @@ static int rtc_post_load(void *opaque, int version_id) static const VMStateDescription vmstate_rtc = { .name = mc146818rtc, -.version_id = 3, +.version_id = 4, .minimum_version_id = 1, .post_load = rtc_post_load, .fields = (VMStateField[]) { @@ -752,6 +752,7 @@ static const VMStateDescription vmstate_rtc = { VMSTATE_INT64_V(offset, RTCState, 3), VMSTATE_TIMER_V(update_timer, RTCState, 3), VMSTATE_UINT64_V(next_alarm_time, RTCState, 3), +VMSTATE_UINT16_V(irq_reinject_on_ack_count, RTCState, 4), Also can be a subsection, migrated only if nonzero. Paolo VMSTATE_END_OF_LIST() } };
Re: [Qemu-devel] [RFC PATCH v2 04/49] fdc: adding vmstate for save/restore
Il 17/07/2014 13:02, Pavel Dovgalyuk ha scritto: VMState added by this patch preserves correct loading of the FDC device state. Signed-off-by: Pavel Dovgalyuk pavel.dovga...@ispras.ru --- hw/block/fdc.c | 11 +-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/hw/block/fdc.c b/hw/block/fdc.c index 490d127..132310a 100644 --- a/hw/block/fdc.c +++ b/hw/block/fdc.c @@ -697,12 +697,17 @@ static const VMStateDescription vmstate_fdrive_media_rate = { static const VMStateDescription vmstate_fdrive = { .name = fdrive, -.version_id = 1, +.version_id = 2, .minimum_version_id = 1, .fields = (VMStateField[]) { VMSTATE_UINT8(head, FDrive), VMSTATE_UINT8(track, FDrive), VMSTATE_UINT8(sect, FDrive), +VMSTATE_UINT8_V(last_sect, FDrive, 2), +VMSTATE_UINT8_V(max_track, FDrive, 2), +VMSTATE_UINT16_V(bps, FDrive, 2), +VMSTATE_UINT8_V(ro, FDrive, 2), +VMSTATE_UINT8_V(perpendicular, FDrive, 2), Perpendicular can be added to a subsection, migrated only if nonzero. The others can be reconstructed by calling fd_revalidate in vmstate_fdrive's post_load callback. VMSTATE_END_OF_LIST() }, .subsections = (VMStateSubsection[]) { @@ -736,7 +741,7 @@ static int fdc_post_load(void *opaque, int version_id) static const VMStateDescription vmstate_fdc = { .name = fdc, -.version_id = 2, +.version_id = 3, .minimum_version_id = 2, .pre_save = fdc_pre_save, .post_load = fdc_post_load, @@ -769,6 +774,8 @@ static const VMStateDescription vmstate_fdc = { VMSTATE_UINT8_EQUAL(num_floppies, FDCtrl), VMSTATE_STRUCT_ARRAY(drives, FDCtrl, MAX_FD, 1, vmstate_fdrive, FDrive), +VMSTATE_INT32_V(reset_sensei, FDCtrl, 3), Subsection, only migrated if nonzero. +VMSTATE_TIMER_V(result_timer, FDCtrl, 3), Subsection, only migrated if pending. Paolo VMSTATE_END_OF_LIST() } };
Re: [Qemu-devel] Possible null-ptr dereference
On Mon, Jul 28, 2014 at 06:03:45AM +, Gonglei (Arei) wrote: Hi, Should be easy to fix though. Does the following help? (Cc'ing Stefan Kevin) -- xen_disk: fix possible null-ptr dereference Signed-off-by: Gonglei arei.gong...@huawei.com --- hw/block/xen_disk.c | 1 + 1 file changed, 1 insertion(+) This code path can never be reached since protocol is always set to one of 3 valid values in xen_disk.c. Therefore, I'm not merging this for QEMU 2.1 where we are only taking critical bug fixes now. Still, it will help silence static checkers and make the intent clear to readers. Thanks, applied to my block-next tree for QEMU 2.2: https://github.com/stefanha/qemu/commits/block-next Stefan pgpu0OfPHR42l.pgp Description: PGP signature
Re: [Qemu-devel] Possible null-ptr dereference
-Original Message- From: Stefan Hajnoczi [mailto:stefa...@redhat.com] Sent: Monday, July 28, 2014 5:49 PM To: Gonglei (Arei) Cc: mateusz.krzywi...@windowslive.com; qemu-devel@nongnu.org; kw...@redhat.com Subject: Re: [Qemu-devel] Possible null-ptr dereference On Mon, Jul 28, 2014 at 06:03:45AM +, Gonglei (Arei) wrote: Hi, Should be easy to fix though. Does the following help? (Cc'ing Stefan Kevin) -- xen_disk: fix possible null-ptr dereference Signed-off-by: Gonglei arei.gong...@huawei.com --- hw/block/xen_disk.c | 1 + 1 file changed, 1 insertion(+) This code path can never be reached since protocol is always set to one of 3 valid values in xen_disk.c. Therefore, I'm not merging this for QEMU 2.1 where we are only taking critical bug fixes now. OK. Still, it will help silence static checkers and make the intent clear to readers. Thanks, applied to my block-next tree for QEMU 2.2: https://github.com/stefanha/qemu/commits/block-next Thanks. Best regards, -Gonglei
Re: [Qemu-devel] [RFC PATCH v2 10/49] rtl8139: adding new fields to vmstate
From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini Il 17/07/2014 13:02, Pavel Dovgalyuk ha scritto: This patch adds virtual clock-dependent timers to VMState to allow correct saving and restoring the state of RTL8139 network controller. Signed-off-by: Pavel Dovgalyuk pavel.dovga...@ispras.ru --- hw/net/rtl8139.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 90bc5ec..992caf0 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -3289,7 +3289,7 @@ static void rtl8139_pre_save(void *opaque) static const VMStateDescription vmstate_rtl8139 = { .name = rtl8139, -.version_id = 4, +.version_id = 5, .minimum_version_id = 3, .post_load = rtl8139_post_load, .pre_save = rtl8139_pre_save, @@ -3363,6 +3363,9 @@ static const VMStateDescription vmstate_rtl8139 = { VMSTATE_STRUCT(tally_counters, RTL8139State, 0, vmstate_tally_counters, RTL8139TallyCounters), +VMSTATE_TIMER_V(timer, RTL8139State, 5), timer need not be migrated, because it is reinstated by rtl8139_post_load. That's true for normal execution. In replay execution mode post_load can be called before cached virtual clock values are loaded. This may cause invalid setting of the timer and raising an IRQ which didn't happen in record mode. I will update this patch to fix post_load function and avoid this non-deterministic behavior. Pavel Dovgalyuk
Re: [Qemu-devel] [RFC PATCH v2 06/49] serial: fixing vmstate for save/restore
Il 17/07/2014 13:02, Pavel Dovgalyuk ha scritto: -.version_id = 3, +.version_id = 4, .minimum_version_id = 2, .pre_save = serial_pre_save, .post_load = serial_post_load, .fields = (VMStateField[]) { VMSTATE_UINT16_V(divider, SerialState, 2), VMSTATE_UINT8(rbr, SerialState), +VMSTATE_UINT8_V(thr, SerialState, 4), +VMSTATE_UINT8_V(tsr, SerialState, 4), VMSTATE_UINT8(ier, SerialState), VMSTATE_UINT8(iir, SerialState), VMSTATE_UINT8(lcr, SerialState), @@ -613,6 +627,15 @@ const VMStateDescription vmstate_serial = { VMSTATE_UINT8(msr, SerialState), VMSTATE_UINT8(scr, SerialState), VMSTATE_UINT8_V(fcr_vmstate, SerialState, 3), +VMSTATE_INT32_V(thr_ipending, SerialState, 4), Subsection, only migrated if it doesn't match (s-iir UART_IIR_ID) == UART_IIR_THRI. +VMSTATE_INT32_V(last_break_enable, SerialState, 4), Can be reconstructed in the post_load callback from s-lcr. +VMSTATE_INT32_V(tsr_retry, SerialState, 4), Subsection, only migrated if nonzero. thr/tsr can be in this subsection as well. +VMSTATE_STRUCT(recv_fifo, SerialState, 4, vmstate_fifo8, Fifo8), +VMSTATE_STRUCT(xmit_fifo, SerialState, 4, vmstate_fifo8, Fifo8), Two subsections, only transmitted if nonempty. +VMSTATE_TIMER_V(fifo_timeout_timer, SerialState, 4), Subsection, only transmitted if pending. +VMSTATE_INT32_V(timeout_ipending, SerialState, 4), Subsection, transmitted only if nonzero. +VMSTATE_INT32_V(poll_msl, SerialState, 4), +VMSTATE_TIMER_V(modem_status_poll, SerialState, 4), Both in a subsection, only migrated if poll_msl is not -1. Paolo VMSTATE_END_OF_LIST()
Re: [Qemu-devel] Possible null-ptr dereference
On 28 July 2014 10:49, Stefan Hajnoczi stefa...@redhat.com wrote: On Mon, Jul 28, 2014 at 06:03:45AM +, Gonglei (Arei) wrote: Hi, Should be easy to fix though. Does the following help? (Cc'ing Stefan Kevin) -- xen_disk: fix possible null-ptr dereference Signed-off-by: Gonglei arei.gong...@huawei.com --- hw/block/xen_disk.c | 1 + 1 file changed, 1 insertion(+) This code path can never be reached since protocol is always set to one of 3 valid values in xen_disk.c. Maybe g_assert_not_reached(); ? thanks -- PMM
Re: [Qemu-devel] [RFC PATCH v2 05/49] parallel: adding vmstate for save/restore
Il 17/07/2014 13:02, Pavel Dovgalyuk ha scritto: +.fields = (VMStateField []) { +VMSTATE_UINT8(state.dataw, ISAParallelState), +VMSTATE_UINT8(state.datar, ISAParallelState), +VMSTATE_UINT8(state.status, ISAParallelState), +VMSTATE_UINT8(state.control, ISAParallelState), +VMSTATE_INT32(state.irq_pending, ISAParallelState), +VMSTATE_INT32(state.hw_driver, ISAParallelState), Static, doesn't need migration. +VMSTATE_INT32(state.epp_timeout, ISAParallelState), +VMSTATE_INT32(state.it_shift, ISAParallelState), Static, doesn't need migration. +VMSTATE_END_OF_LIST()
Re: [Qemu-devel] [PATCH v2 1/7] bootindex: add modify_boot_device_path function
Hi, I think it is ok to allow only *changing* the bootindex. Yes, that's no problem. But then yoy always will have a old entry where you can take the suffix from, and you don't need the suffix as parameter for the monitor command. No, optional. Because the bootindex property is not a necessary property for devices. If a device, such as IDE cdrom haven't attach the bootindex in qemu booting command line, the global fw_boot_order will not have its entry. I'd suggest to simply not support this and throw an error then. So, we should save the suffix as parameter. I think it is a bad idea to have the suffix as monitor command parameter. It is a implementation detail which the monitor users should not have to worry about. Easiest way to do this is to allow *changing* an existing bootindex entry only, and not support *adding* new boot entries. The user has to set a bootindex for any device it might want to boot from in the future then. I think this is acceptable. If you want support adding bootentries at runtime (and it probably makes sense to support removing entries too in that case) the suffix handling should be reworked. The suffix could be stored in DeviceState instead of being passed to the add_bootentry function, so you can add boot entries later on without expecting the user to know what the correct suffix is. cheers, Gerd
Re: [Qemu-devel] [RFC PATCH v2 10/49] rtl8139: adding new fields to vmstate
Il 28/07/2014 11:54, Pavel Dovgaluk ha scritto: +VMSTATE_TIMER_V(timer, RTL8139State, 5), timer need not be migrated, because it is reinstated by rtl8139_post_load. That's true for normal execution. In replay execution mode post_load can be called before cached virtual clock values are loaded. This may cause invalid setting of the timer and raising an IRQ which didn't happen in record mode. I will update this patch to fix post_load function and avoid this non-deterministic behavior. This is what worries me of this series. These invariants are not documented anywhere, and people will break them unless you add assertions that also hold in normal mode. Paolo
Re: [Qemu-devel] [RFC PATCH v2 00/49] Series short description
Il 28/07/2014 09:50, Pavel Dovgaluk ha scritto: - patches 2 to 13 probably should try to use subsections, so that VMs that do not use the devices try not to save the extra data and keep backwards migration compatibility (at least try to) Could you give me and example? As I know, subsection is loaded when some predicate function returns true. How can I construct such a function for integratorcp module? What kind of condition should it check? In this module I just added missed vmstates (it does not saved/restored at all by the master version). I tried to review the x86 ones. These are the most interesting ones, because it's the only board where we support cross-version migration. Parallel is (quite obviously) unsalvageable, and I think HPET is too. - patch 16 should also use subsections, and perhaps apply to all other CPUs too? We implemented replay only for i386 and ARM. If we'll change other targets, it will not add record/replay capabilities to them, but can confuse the reviewers. Why are these fields only required by record/replay? - patches 23-24-25 perhaps could try using icount, like Konrad's patch do? Using faster icount (like in Konrad's patches) is the our next aim. It obviously will increase the speed of recording process. But now I submitted slower, but more conservative version of icount which we had already tested. I see. - patch 27 makes sense but VIRTUAL is used to skip blinking when the VM is stopped Right, this is kind of hack. I haven't found better solution yet. I think it's okay to use REALTIME, just add runstate_is_running() to the if (now = s-cursor_blink_time). That said, every read to virtual clock is written to the replay log worries me a bit from the point of view of thread-safety. Do you need to log all reads because you don't use icount? Reads can only happen at given points if you use icount, and you could simply log the icount value at each synchronization point. Paolo
Re: [Qemu-devel] [PATCH v2 1/7] bootindex: add modify_boot_device_path function
-Original Message- From: Gerd Hoffmann [mailto:kra...@redhat.com] Sent: Monday, July 28, 2014 6:02 PM Subject: Re: [PATCH v2 1/7] bootindex: add modify_boot_device_path function Hi, I think it is ok to allow only *changing* the bootindex. Yes, that's no problem. But then yoy always will have a old entry where you can take the suffix from, and you don't need the suffix as parameter for the monitor command. No, optional. Because the bootindex property is not a necessary property for devices. If a device, such as IDE cdrom haven't attach the bootindex in qemu booting command line, the global fw_boot_order will not have its entry. I'd suggest to simply not support this and throw an error then. Ok. So, we should save the suffix as parameter. I think it is a bad idea to have the suffix as monitor command parameter. It is a implementation detail which the monitor users should not have to worry about. Yes. Actually I also have this misgivings. Easiest way to do this is to allow *changing* an existing bootindex entry only, and not support *adding* new boot entries. The user has to set a bootindex for any device it might want to boot from in the future then. I think this is acceptable. Hmm.. If you want support adding bootentries at runtime (and it probably makes sense to support removing entries too in that case) the suffix handling should be reworked. The suffix could be stored in DeviceState instead of being passed to the add_bootentry function, so you can add boot entries later on without expecting the user to know what the correct suffix is. That's a good idea! I think this is a good improvement. Any other comments? Thanks! Best regards, -Gonglei
Re: [Qemu-devel] [PATCH v2 7/7] libqos: Added basic virtqueue support to virtio implementation
El Fri, 25 Jul 2014 23:37:43 +0200 Marc Marà marc.mari.barc...@gmail.com escribió: + +data = g_malloc0(512); +memread(r_req+16, data, 512); +g_assert_cmpstr(data, ==, TEST); +g_free(data); + guest_free() for both requests should be added here. Will be added for v3 +/* End test */ +guest_free(alloc, vq-desc); qvirtio_pci_disable_device(dev); g_free(dev); test_end();
Re: [Qemu-devel] [PULL for-2.1] Trivial patch for 2014-07-26
On 26 July 2014 08:18, Michael Tokarev m...@tls.msk.ru wrote: There's just one trivial patch this time, fixing another occurence of allows to in help text. Please consider applying for 2.1. Thanks, /mjt The following changes since commit c60a57ff497667780132a3fcdc1500c83af5d5c0: Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging (2014-07-25 16:58:41 +0100) are available in the git repository at: git://git.corpit.ru/qemu.git tags/trivial-patches-2014-07-26 for you to fetch changes up to 2f47b403bd494b6717ef14c0d278d4b2d364b382: qemu-options: fix another allows-to for -net l2tpv3 (2014-07-26 11:16:44 +0400) trivial patches for 2014-07-26 Michael Tokarev (1): qemu-options: fix another allows-to for -net l2tpv3 qemu-options.hx |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Applied, thanks. -- PMM
Re: [Qemu-devel] [PATCH v2 7/7] spapr: fix possible memory leak
On 25.07.14 08:52, arei.gong...@huawei.com wrote: From: Gonglei arei.gong...@huawei.com get_boot_devices_list() will malloc memory, spapr_finalize_fdt doesn't free it. Signed-off-by: Chenliang chenlian...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com Thanks, applied to ppc-next-2.2. Alex
[Qemu-devel] [PATCH] target-mips: Ignore unassigned accesses with KVM
MIPS registers an unassigned access handler which raises a guest bus error exception. However this causes QEMU to crash when KVM is enabled as it isn't called from the main execution loop so longjmp() gets called without a corresponding setjmp(). Until the KVM API can be updated to trigger a guest exception in response to an MMIO exit, prevent the bus error exception being raised from mips_cpu_unassigned_access() if KVM is enabled. The check is at run time since the do_unassigned_access callback is initialised before it is known whether KVM will be enabled. The problem can be triggered with Malta emulation by making the guest write to the reset region at physical address 0x1bf0, since it is marked read-only which is treated as unassigned for writes. Signed-off-by: James Hogan james.ho...@imgtec.com Cc: Aurelien Jarno aurel...@aurel32.net Cc: Peter Maydell peter.mayd...@linaro.org Cc: Paolo Bonzini pbonz...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Christoffer Dall christoffer.d...@linaro.org Cc: Sanjay Lal sanj...@kymasys.com --- target-mips/op_helper.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c index 27651a4a00c1..df97b35f8701 100644 --- a/target-mips/op_helper.c +++ b/target-mips/op_helper.c @@ -21,6 +21,7 @@ #include qemu/host-utils.h #include exec/helper-proto.h #include exec/cpu_ldst.h +#include sysemu/kvm.h #ifndef CONFIG_USER_ONLY static inline void cpu_mips_tlb_flush (CPUMIPSState *env, int flush_global); @@ -2168,6 +2169,16 @@ void mips_cpu_unassigned_access(CPUState *cs, hwaddr addr, MIPSCPU *cpu = MIPS_CPU(cs); CPUMIPSState *env = cpu-env; +/* + * Raising an exception with KVM enabled will crash because it won't be from + * the main execution loop so the longjmp won't have a matching setjmp. + * Until we can trigger a bus error exception through KVM lets just ignore + * the access. + */ +if (kvm_enabled()) { +return; +} + if (is_exec) { helper_raise_exception(env, EXCP_IBE); } else { -- 1.8.5.5
Re: [Qemu-devel] [PATCH v2 for-2.1 2/2] pc: hack for migration compatibility from QEMU 2.0
Il 28/07/2014 13:45, Michael S. Tsirkin ha scritto: +/* These are used to size the ACPI tables for -M pc-i440fx-1.7 and + * -M pc-i440fx-2.0. Let's just say 2.0 and earlier? This would give the idea that 1.6 is broken, but it isn't. Even if the actual amount of AML generated grows + * a little bit, there should be plenty of free space since the DSDT + * shrunk by ~1.5k between QEMU 2.0 and QEMU 2.1. + */ +#define ACPI_BUILD_CPU_AML_SIZE97 +#define ACPI_BUILD_BRIDGE_AML_SIZE 1875 Let's put _LEGACY_ somewhere here? Ok. + +#define ACPI_BUILD_TABLE_SIZE 0x1 + typedef struct AcpiCpuInfo { DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT); } AcpiCpuInfo; @@ -87,6 +99,8 @@ typedef struct AcpiBuildPciBusHotplugState { struct AcpiBuildPciBusHotplugState *parent; } AcpiBuildPciBusHotplugState; +unsigned bsel_alloc; + Patch will be better contained if instead of using a global bsel_alloc, we actually go and count the devices that have ACPI_PCIHP_PROP_BSEL. You can just scan all devices, or all pci devices, it should not matter. This way, this code will be local to the legacy path. Ok. static void acpi_get_dsdt(AcpiMiscInfo *info) { uint16_t *applesmc_sta; @@ -759,8 +773,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque) static void acpi_set_pci_info(void) { PCIBus *bus = find_i440fx(); /* TODO: Q35 support */ -unsigned bsel_alloc = 0; +assert(bsel_alloc == 0); if (bus) { /* Scan all PCI buses. Set property to enable acpi based hotplug. */ pci_for_each_bus_depth_first(bus, acpi_set_bsel, NULL, bsel_alloc); @@ -1440,13 +1454,14 @@ static void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) { GArray *table_offsets; -unsigned facs, dsdt, rsdt; +unsigned facs, ssdt, dsdt, rsdt; AcpiCpuInfo cpu; AcpiPmInfo pm; AcpiMiscInfo misc; AcpiMcfgInfo mcfg; PcPciInfo pci; uint8_t *u; +size_t aml_len = 0; acpi_get_cpu_info(cpu); acpi_get_pm_info(pm); @@ -1474,13 +1489,20 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) dsdt = tables-table_data-len; build_dsdt(tables-table_data, tables-linker, misc); +/* Count the size of the DSDT and SSDT, we will need it for legacy + * sizing of ACPI tables. + */ +aml_len += tables-table_data-len - dsdt; + /* ACPI tables pointed to by RSDT */ acpi_add_table(table_offsets, tables-table_data); build_fadt(tables-table_data, tables-linker, pm, facs, dsdt); +ssdt = tables-table_data-len; acpi_add_table(table_offsets, tables-table_data); build_ssdt(tables-table_data, tables-linker, cpu, pm, misc, pci, guest_info); +aml_len += tables-table_data-len - ssdt; acpi_add_table(table_offsets, tables-table_data); build_madt(tables-table_data, tables-linker, cpu, guest_info); @@ -1513,12 +1535,53 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) /* RSDP is in FSEG memory, so allocate it separately */ build_rsdp(tables-rsdp, tables-linker, rsdt); -/* We'll expose it all to Guest so align size to reduce +/* We'll expose it all to Guest so we want to reduce * chance of size changes. * RSDP is small so it's easy to keep it immutable, no need to * bother with alignment. + * + * We used to align the tables to 4k, but of course this would + * too simple to be enough. 4k turned out to be too small an + * alignment very soon, and in fact it is almost impossible to + * keep the table size stable for all (max_cpus, max_memory_slots) + * combinations. So the table size is always 64k for pc-i440fx-2.1 + * and we give an error if the table grows beyond that limit. + * + * We still have the problem of migrating from -M pc-i440fx-2.0. For + * that, we exploit the fact that QEMU 2.1 generates _smaller_ tables + * than 2.0 and we can always pad the smaller tables with zeros. We can + * then use the exact size of the 2.0 tables. + * + * All this is for PIIX4, since QEMU 2.0 didn't support Q35 migration. */ -acpi_align_size(tables-table_data, 0x1000); +if (guest_info-legacy_acpi_table_size) { +/* Subtracting aml_len gives the size of fixed tables. Then add the + * size of the PIIX4 DSDT/SSDT in QEMU 2.0. + */ +int legacy_aml_len = +guest_info-legacy_acpi_table_size + +ACPI_BUILD_CPU_AML_SIZE * max_cpus + +ACPI_BUILD_BRIDGE_AML_SIZE * (MAX(bsel_alloc, 1) - 1); +int legacy_table_size = +ROUND_UP(tables-table_data-len - aml_len + legacy_aml_len, 0x1000); +if (tables-table_data-len legacy_table_size) { +/* -M pc-i440fx-2.0 doesn't support memory hotplug, so this should + * never happen.
Re: [Qemu-devel] [PATCH v12 1/4] spapr_pci: Make find_phb()/find_dev() public
On 16.07.14 02:20, Gavin Shan wrote: From: Alexey Kardashevskiy a...@ozlabs.ru This makes find_phb()/find_dev() public and changed its names to spapr_pci_find_phb()/spapr_pci_find_dev() as they are going to be used from other parts of QEMU such as VFIO DDW (dynamic DMA window) or VFIO PCI error injection or VFIO EEH handling - in all these cases there are RTAS calls which are addressed to BUID+config_addr in IEEE1275 format. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Do we need this still? Alex
Re: [Qemu-devel] [PATCH v12 3/4] headers: Update kernel header
On 16.07.14 03:40, Gavin Shan wrote: On Wed, Jul 16, 2014 at 11:32:13AM +1000, Alexey Kardashevskiy wrote: On 07/16/2014 11:16 AM, Gavin Shan wrote: On Wed, Jul 16, 2014 at 11:09:44AM +1000, Alexey Kardashevskiy wrote: On 07/16/2014 10:20 AM, Gavin Shan wrote: This updates kernel header (vfio.h) for EEH support on VFIO PCI devices. Has this reached kernel upstream? The way linux headers update normally happens is you have to run scripts/update-linux-headers.sh against some linux kernel tag which you know that it won't change (like v3.16-rc5) and post all the changes as a single patch. It is never a header update for a specific feature, it is just an update. The kernel part isn't merged yet. I guess that's for 3.17 merge window. Ok, good to know scripts/update-linux-headers.sh. So this patch should be dropped and some one run the script to update QEMU (linux-headers directory) ? Once your changes are in upstream kernel, you wait till kernel tree gets new v3.xx-rcX tag, then you run the script and make a separate patch for QEMU. Then you wait till it reaches QEMU upstream (because I do not know who will pull it to what tree, look at git history) or ppc-next (if Alex pulls it and you are basing your work on ppc-next) and then repost other patches. Thanks for detailed explaining, Alexey. I guess I have to suspend a bit until v3.17.rc1 is coming out. It's also perfectly fine to keep the kernel header update inside your patch set, but please make sure to mention which commit it is against. If I pick it up, I will generate the headers myself though and drop your patch while applying the rest of the set. Alex
Re: [Qemu-devel] [PATCH] pty: Fix byte loss bug when connecting to pty
Il 28/07/2014 13:39, Sebastian Tanase ha scritto: When trying to print data to the pty, we first check if it is connected. If not, we try to reconnect, but we drop the pending data even if we have successfully reconnected; this makes us lose the first byte of the very first transmission. This small fix addresses the issue by checking once more if the pty is connected after having tried to reconnect. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr --- To reproduce the bug, launch a qemu image that has a parallel port (say lp0) and redirect it to a pty (-parallel pty). After the VM is launched, open the corresponding pty on your host (cat /dev/pts/X) and send some data from the VM to the host: echo abcd /dev/lp0 The first time, the received string will be bcd instead of abcd. This bug can have important consequences if you try, for example, to send a postscript file from a printer within the VM. Losing the first character will render the .ps file unusable. --- qemu-char.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/qemu-char.c b/qemu-char.c index 7acc03f..ce52d0f 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -1160,7 +1160,9 @@ static int pty_chr_write(CharDriverState *chr, const uint8_t *buf, int len) if (!s-connected) { /* guest sends data, check for (re-)connect */ pty_chr_update_read_handler_locked(chr); -return 0; +if (!s-connected) { +return 0; +} } return io_channel_send(s-fd, buf, len); } Looks ok, though only for 2.2 and 2.1.1. Gerd, can you take care of this patch? Paolo
Re: [Qemu-devel] [PULL for-2.1 0/3] Last minute patches
Peter Maydell peter.mayd...@linaro.org wrote: On 25 July 2014 15:22, Paolo Bonzini pbonz...@redhat.com wrote: Since Igor hasn't sent his patches, and I'm leaving the office, I pushed this to git://github.com/bonzini/qemu.git tags/for-upstream-full I don't know about tests/acpi-test-data/pc. It makes sense that this patch should modify something there, but: * make check passes * the test warns even before patch 1, for both the DSDT (modified by the patch) and SSDT (which this series doesn't touch at all) * I cannot get it to pass, except by blindly copying the actual output on the expected files * mst is on vacation and Marcel is off on Fridays Based on my understanding of the problem, it is not possible to fix the bug without hacks like this one, and even reverting all patches in this area would be more risky. Hmm. I'm not really sure what the right thing is, so what I'm planning to do is: * just apply the qemu-char fix for now * not tag -rc4 today * see if things are clearer on Monday (I see Igor has now sent out a patchset) * tag -rc4 Mon or Tues * slip the release date a few days (not a big deal, I think) I am reading both patch-sets. I preffer very much Paolo solution to Igor one. But I have to say that I don't understand PATCH 1 (neither before or after the change). Solution does what we should do, that is generate the size that destination is expecting, and no simply blindy accept packages that are smaller. The compatibility bits of PATCH2 look ok (that ones, I can kind of understand them). Igor? Later, Juan.
Re: [Qemu-devel] [Bug 1343827] [NEW] block.c: multiwrite_merge() truncates overlapping requests
On Fri, Jul 18, 2014 at 10:15 AM, Slava Pestov sviatoslav.pes...@gmail.com wrote: Public bug reported: If the list of requests passed to multiwrite_merge() contains two requests where the first is for a range of sectors that is a strict subset of the second's, the second request is truncated to end where the first starts, so the second half of the second request is lost. This is easy to reproduce by running fio against a virtio-blk device running on qemu 2.1.0-rc1 with the below fio script. At least with fio 2.0.13, the randwrite pass will issue overlapping bios to the block driver, which the kernel is happy to pass along to qemu: [global] randrepeat=0 ioengine=libaio iodepth=64 direct=1 size=1M numjobs=1 verify_fatal=1 verify_dump=1 filename=$dev [seqwrite] blocksize_range=4k-1M rw=write verify=crc32c-intel [randwrite] stonewall blocksize_range=4k-1M rw=randwrite verify=meta Here is a naive fix for the problem that simply avoids merging problematic requests. I guess a better solution would be to redo qemu_iovec_concat() to do the right thing. diff -ur old/qemu-2.1.0-rc2/block.c qemu-2.1.0-rc2/block.c --- old/qemu-2.1.0-rc2/block.c 2014-07-15 14:49:14.0 -0700 +++ qemu-2.1.0-rc2/block.c 2014-07-17 23:03:14.224169741 -0700 @@ -4460,7 +4460,9 @@ int64_t oldreq_last = reqs[outidx].sector + reqs[outidx].nb_sectors; // Handle exactly sequential writes and overlapping writes. -if (reqs[i].sector = oldreq_last) { +// If this request ends before the previous one, don't merge. +if (reqs[i].sector = oldreq_last +reqs[i].sector + reqs[i].nb_sectors = oldreq_last) { merge = 1; } ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1343827 Title: block.c: multiwrite_merge() truncates overlapping requests Status in QEMU: New Bug description: If the list of requests passed to multiwrite_merge() contains two requests where the first is for a range of sectors that is a strict subset of the second's, the second request is truncated to end where the first starts, so the second half of the second request is lost. This is easy to reproduce by running fio against a virtio-blk device running on qemu 2.1.0-rc1 with the below fio script. At least with fio 2.0.13, the randwrite pass will issue overlapping bios to the block driver, which the kernel is happy to pass along to qemu: [global] randrepeat=0 ioengine=libaio iodepth=64 direct=1 size=1M numjobs=1 verify_fatal=1 verify_dump=1 filename=$dev [seqwrite] blocksize_range=4k-1M rw=write verify=crc32c-intel [randwrite] stonewall blocksize_range=4k-1M rw=randwrite verify=meta Here is a naive fix for the problem that simply avoids merging problematic requests. I guess a better solution would be to redo qemu_iovec_concat() to do the right thing. diff -ur old/qemu-2.1.0-rc2/block.c qemu-2.1.0-rc2/block.c --- old/qemu-2.1.0-rc2/block.c 2014-07-15 14:49:14.0 -0700 +++ qemu-2.1.0-rc2/block.c 2014-07-17 23:03:14.224169741 -0700 @@ -4460,7 +4460,9 @@ int64_t oldreq_last = reqs[outidx].sector + reqs[outidx].nb_sectors; // Handle exactly sequential writes and overlapping writes. -if (reqs[i].sector = oldreq_last) { +// If this request ends before the previous one, don't merge. +if (reqs[i].sector = oldreq_last +reqs[i].sector + reqs[i].nb_sectors = oldreq_last) { merge = 1; } To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1343827/+subscriptions Hello, bug is still here in the master and can be easily reproduced (and, of course, looks like blocker since data corruption takes place). Does anyone have an idea on when the fix (at least suggested one) will be merged?
Re: [Qemu-devel] [PULL for-2.1 0/3] Last minute patches
On Mon, 28 Jul 2014 14:59:26 +0200 Juan Quintela quint...@redhat.com wrote: Peter Maydell peter.mayd...@linaro.org wrote: On 25 July 2014 15:22, Paolo Bonzini pbonz...@redhat.com wrote: Since Igor hasn't sent his patches, and I'm leaving the office, I pushed this to git://github.com/bonzini/qemu.git tags/for-upstream-full I don't know about tests/acpi-test-data/pc. It makes sense that this patch should modify something there, but: * make check passes * the test warns even before patch 1, for both the DSDT (modified by the patch) and SSDT (which this series doesn't touch at all) * I cannot get it to pass, except by blindly copying the actual output on the expected files * mst is on vacation and Marcel is off on Fridays Based on my understanding of the problem, it is not possible to fix the bug without hacks like this one, and even reverting all patches in this area would be more risky. Hmm. I'm not really sure what the right thing is, so what I'm planning to do is: * just apply the qemu-char fix for now * not tag -rc4 today * see if things are clearer on Monday (I see Igor has now sent out a patchset) * tag -rc4 Mon or Tues * slip the release date a few days (not a big deal, I think) I am reading both patch-sets. I preffer very much Paolo solution to Igor one. But I have to say that I don't understand PATCH 1 (neither before or after the change). Solution does what we should do, that is generate the size that destination is expecting, and no simply blindy accept packages that are smaller. The compatibility bits of PATCH2 look ok (that ones, I can kind of understand them). Igor? These patches work for 2.0-2.1 for -M pc-1.7 (Ubuntu case), however they doesn't for 1.7-2.1 (RHEL7 case). Also chasing exact size is a bit problematic considering that different iasl versions produce differently sized AML output. As far as I understood from chat on #qemu, Michael is going to pick-up this series + extendable RAMBlock series + disabling PCI bridge hotplug patch and may be cook additional one to make working backwards migration. Later, Juan.
Re: [Qemu-devel] [PATCH 0/4 v8] ppc: Add debug stub support
On 14.07.14 11:15, Bharat Bhushan wrote: This patchset add support for - software breakpoint - h/w breakpoint - h/w watchpoint Please find description in individual patch. Thanks, applied (with minor edits in comments) to ppc-next-2.2. Alex
Re: [Qemu-devel] [PATCH 3/3] ppc/spapr: Fix MAX_CPUS to 255
On 27.06.14 08:47, Nikunj A Dadhania wrote: MAX_CPUS 256 is inconsistent with qemu supporting upto 255 cpus. This MAX_CPUS number was percolated back to virsh capabilities with wrong max_cpus. Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com Nice one :). Thanks, applied to ppc-next-2.2. Alex
Re: [Qemu-devel] [PATCH/RFC 0/5] s390x/kvm: track the logical cpu state in QEMU and propagate it to kvm
On 10.07.14 15:27, David Hildenbrand wrote: This is the qemu part of kernel series Let user space control the cpu states Christian Borntraeger (1): update linux headers with with cpustate changes David Hildenbrand (4): s390x/kvm: introduce proper states for s390 cpus s390x/kvm: proper use of the cpu states OPERATING and STOPPED s390x/kvm: test whether a cpu is STOPPED when checking has_work s390x/kvm: propagate s390 cpu state to kvm hw/s390x/ipl.c| 2 +- hw/s390x/s390-virtio.c| 32 -- linux-headers/linux/kvm.h | 7 ++- target-s390x/cpu.c| 106 +++--- target-s390x/cpu.h| 33 +-- target-s390x/helper.c | 11 ++--- target-s390x/kvm.c| 49 ++--- trace-events | 6 +++ 8 files changed, 179 insertions(+), 67 deletions(-) Looks good to me -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html @all thought it was the final internal review :) It's a perfectly good thing to say looks good to me in public too. The only major difference is that usually you would say reviewed-by ;). Alex
Re: [Qemu-devel] [PATCH/RFC 0/5] s390x/kvm: track the logical cpu state in QEMU and propagate it to kvm
On 28.07.2014, at 15:43, Alexander Graf ag...@suse.de wrote: On 10.07.14 15:27, David Hildenbrand wrote: This is the qemu part of kernel series Let user space control the cpu states Christian Borntraeger (1): update linux headers with with cpustate changes David Hildenbrand (4): s390x/kvm: introduce proper states for s390 cpus s390x/kvm: proper use of the cpu states OPERATING and STOPPED s390x/kvm: test whether a cpu is STOPPED when checking has_work s390x/kvm: propagate s390 cpu state to kvm hw/s390x/ipl.c| 2 +- hw/s390x/s390-virtio.c| 32 -- linux-headers/linux/kvm.h | 7 ++- target-s390x/cpu.c| 106 +++--- target-s390x/cpu.h| 33 +-- target-s390x/helper.c | 11 ++--- target-s390x/kvm.c| 49 ++--- trace-events | 6 +++ 8 files changed, 179 insertions(+), 67 deletions(-) Looks good to me -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html @all thought it was the final internal review :) It's a perfectly good thing to say looks good to me in public too. The only major difference is that usually you would say reviewed-by ;). Meh - only realized after I sent this that all those patches are From: you. Then of course it's not useful ;) Alex
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work
On 10.07.14 15:10, Christian Borntraeger wrote: From: David Hildenbrand d...@linux.vnet.ibm.com If a cpu is stopped, it must never be allowed to run and no interrupt may wake it up. A cpu also has to be unhalted if it is halted and has work to do - this scenario wasn't hit in kvm case yet, as only disabled wait is processed within QEMU. Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com This looks like it's something that generic infrastructure should take care of, no? How does this work for the other archs? They always get an interrupt on the transition between !has_work - has_work. Why don't we get one for s390x? Alex
Re: [Qemu-devel] [PATCH v12 1/4] spapr_pci: Make find_phb()/find_dev() public
On 07/28/2014 10:49 PM, Alexander Graf wrote: On 16.07.14 02:20, Gavin Shan wrote: From: Alexey Kardashevskiy a...@ozlabs.ru This makes find_phb()/find_dev() public and changed its names to spapr_pci_find_phb()/spapr_pci_find_dev() as they are going to be used from other parts of QEMU such as VFIO DDW (dynamic DMA window) or VFIO PCI error injection or VFIO EEH handling - in all these cases there are RTAS calls which are addressed to BUID+config_addr in IEEE1275 format. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Do we need this still? Ufff. I missed when this patchet stopped needing this patch :) Looks to me that this patchset does not need it so I'll repost it for DDW later. -- Alexey
[Qemu-devel] [PATCH v3 0/5] ACPI fixes for QEMU 2.1
v2-v3: fix tests/acpi-test-data/pc/DSDT [Peter] track down make check failure, fix it [patch 4, me] split patch 2 in two parts [mst] do not make bsel_alloc global [mst] include Igor's bridge patch [mst, as discussed on IRC] Igor Mammedov (1): pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is disabled Paolo Bonzini (4): acpi-dsdt: procedurally generate _PRT pc: hack for migration compatibility from QEMU 2.0 pc: future-proof migration-compatibility of ACPI tables bios-tables-test: fix ASL normalization false positive hw/i386/acpi-build.c| 119 ++- hw/i386/acpi-dsdt.dsl | 90 +- hw/i386/acpi-dsdt.hex.generated | 1910 +++ hw/i386/pc_piix.c | 19 + hw/i386/pc_q35.c|5 + include/hw/i386/pc.h|1 + tests/acpi-test-data/pc/DSDT| Bin 4499 - 2807 bytes tests/bios-tables-test.c|6 +- 8 files changed, 288 insertions(+), 1862 deletions(-) -- 1.8.3.1
[Qemu-devel] [PATCH 1/5] acpi-dsdt: procedurally generate _PRT
This replaces the _PRT constant with a method that computes it. The problem is that the DSDT+SSDT have grown from 2.0 to 2.1, enough to cross the 8k barrier (we align the ACPI tables to 4k before putting them in fw_cfg). This causes problems with migration and the pc-i440fx-2.0 machine type. The solution to the problem is to hardcode 64k as the limit, but this doesn't solve the bug with pc-i440fx-2.0. The fix will be for QEMU 2.1 to use exactly the same size as QEMU 2.0 for the ACPI tables. First, however, we must make the actual AML equal or smaller; to do this, rewrite _PRT in a way that saves over 1k of bytecode. Reviewed-by: Laszlo Ersek ler...@redhat.com Tested-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-dsdt.dsl | 90 +- hw/i386/acpi-dsdt.hex.generated | 1910 +++ tests/acpi-test-data/pc/DSDT| Bin 4499 - 2807 bytes 3 files changed, 148 insertions(+), 1852 deletions(-) diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl index 3cc0ea0..6ba0170 100644 --- a/hw/i386/acpi-dsdt.dsl +++ b/hw/i386/acpi-dsdt.dsl @@ -181,57 +181,45 @@ DefinitionBlock ( Scope(\_SB) { Scope(PCI0) { -Name(_PRT, Package() { -/* PCI IRQ routing table, example from ACPI 2.0a specification, - section 6.2.8.1 */ -/* Note: we provide the same info as the PCI routing - table of the Bochs BIOS */ - -#define prt_slot(nr, lnk0, lnk1, lnk2, lnk3) \ -Package() { nr##, 0, lnk0, 0 }, \ -Package() { nr##, 1, lnk1, 0 }, \ -Package() { nr##, 2, lnk2, 0 }, \ -Package() { nr##, 3, lnk3, 0 } - -#define prt_slot0(nr) prt_slot(nr, LNKD, LNKA, LNKB, LNKC) -#define prt_slot1(nr) prt_slot(nr, LNKA, LNKB, LNKC, LNKD) -#define prt_slot2(nr) prt_slot(nr, LNKB, LNKC, LNKD, LNKA) -#define prt_slot3(nr) prt_slot(nr, LNKC, LNKD, LNKA, LNKB) - -prt_slot0(0x), -/* Device 1 is power mgmt device, and can only use irq 9 */ -prt_slot(0x0001, LNKS, LNKB, LNKC, LNKD), -prt_slot2(0x0002), -prt_slot3(0x0003), -prt_slot0(0x0004), -prt_slot1(0x0005), -prt_slot2(0x0006), -prt_slot3(0x0007), -prt_slot0(0x0008), -prt_slot1(0x0009), -prt_slot2(0x000a), -prt_slot3(0x000b), -prt_slot0(0x000c), -prt_slot1(0x000d), -prt_slot2(0x000e), -prt_slot3(0x000f), -prt_slot0(0x0010), -prt_slot1(0x0011), -prt_slot2(0x0012), -prt_slot3(0x0013), -prt_slot0(0x0014), -prt_slot1(0x0015), -prt_slot2(0x0016), -prt_slot3(0x0017), -prt_slot0(0x0018), -prt_slot1(0x0019), -prt_slot2(0x001a), -prt_slot3(0x001b), -prt_slot0(0x001c), -prt_slot1(0x001d), -prt_slot2(0x001e), -prt_slot3(0x001f), -}) +Method (_PRT, 0) { +Store(Package(128) {}, Local0) +Store(Zero, Local1) +While(LLess(Local1, 128)) { +// slot = pin 2 +Store(ShiftRight(Local1, 2), Local2) + +// lnk = (slot + pin) 3 +Store(And(Add(Local1, Local2), 3), Local3) +If (LEqual(Local3, 0)) { +Store(Package(4) { Zero, Zero, LNKD, Zero }, Local4) +} +If (LEqual(Local3, 1)) { +// device 1 is the power-management device, needs SCI +If (LEqual(Local1, 4)) { +Store(Package(4) { Zero, Zero, LNKS, Zero }, Local4) +} Else { +Store(Package(4) { Zero, Zero, LNKA, Zero }, Local4) +} +} +If (LEqual(Local3, 2)) { +Store(Package(4) { Zero, Zero, LNKB, Zero }, Local4) +} +If (LEqual(Local3, 3)) { +Store(Package(4) { Zero, Zero, LNKC, Zero }, Local4) +} + +// Complete the interrupt routing entry: +//Package(4) { 0x[slot], [pin], [link], 0) } + +Store(Or(ShiftLeft(Local2, 16), 0x), Index(Local4, 0)) +Store(And(Local1, 3),Index(Local4, 1)) +Store(Local4,Index(Local0, Local1)) + +Increment(Local1) +} + +Return(Local0) +} }
[Qemu-devel] [PATCH 2/5] pc: hack for migration compatibility from QEMU 2.0
Changing the ACPI table size causes migration to break, and the memory hotplug work opened our eyes on how horribly we were breaking things in 2.0 already. The ACPI table size is rounded to the next 4k, which one would think gives some headroom. In practice this is not the case, because the user can control the ACPI table size (each CPU adds 97 bytes to the SSDT and 8 to the MADT) and so some -smp values will break the 4k boundary and fail to migrate. Similarly, PCI bridges add ~1870 bytes to the SSDT. This patch concerns itself with fixing migration from QEMU 2.0. It computes the payload size of QEMU 2.0 and always uses that one. The previous patch shrunk the ACPI tables enough that the QEMU 2.0 size should always be enough; non-AML tables can change depending on the configuration (especially MADT, SRAT, HPET) but they remain the same between QEMU 2.0 and 2.1, so we only compute our padding based on the sizes of the SSDT and DSDT. Migration from QEMU 1.7 should work for guests that have a number of CPUs other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140. It was already broken from QEMU 1.7 to QEMU 2.0 in the same way, though. The amount of AML for a bridge varies a little bit between 1872 and 1875 due to optimized number encodings. Use the smallest value, and let any extra chew away at the slack left by the shrinking of the DSDT. Reviewed-by: Laszlo Ersek ler...@redhat.com Tested-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-build.c | 84 +--- hw/i386/pc_piix.c| 19 hw/i386/pc_q35.c | 5 include/hw/i386/pc.h | 1 + 4 files changed, 105 insertions(+), 4 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index ebc5f03..7d2251f 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -25,7 +25,9 @@ #include glib.h #include qemu-common.h #include qemu/bitmap.h +#include qemu/osdep.h #include qemu/range.h +#include qemu/error-report.h #include hw/pci/pci.h #include qom/cpu.h #include hw/i386/pc.h @@ -52,6 +54,15 @@ #include qapi/qmp/qint.h #include qom/qom-qobject.h +/* These are used to size the ACPI tables for -M pc-i440fx-1.7 and + * -M pc-i440fx-2.0. Even if the actual amount of AML generated grows + * a little bit, there should be plenty of free space since the DSDT + * shrunk by ~1.5k between QEMU 2.0 and QEMU 2.1. + */ +#define ACPI_BUILD_LEGACY_CPU_AML_SIZE97 +#define ACPI_BUILD_LEGACY_BRIDGE_AML_SIZE 1872 +#define ACPI_BUILD_ALIGN_SIZE 0x1000 + typedef struct AcpiCpuInfo { DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT); } AcpiCpuInfo; @@ -737,6 +748,27 @@ static void patch_pciqxl(int slot, uint8_t *ssdt_ptr) ssdt_ptr[ACPI_PCIQXL_OFFSET_ADR + 2] = slot; } +static void *pci_for_each_bus_incr_func(PCIBus *bus, void *opaque) +{ +unsigned *bsel_alloc = opaque; + +if (bus-qbus.allow_hotplug) { +(*bsel_alloc)++; +} + +return bsel_alloc; +} + +static unsigned acpi_count_hotpluggable_pci_buses(void) +{ +PCIBus *bus = find_i440fx(); /* TODO: Q35 support */ +unsigned bus_count = 0; + +pci_for_each_bus_depth_first(bus, pci_for_each_bus_incr_func, + NULL, bus_count); +return bus_count; +} + /* Assign BSEL property to all buses. In the future, this can be changed * to only assign to buses that support hotplug. */ @@ -1440,13 +1472,14 @@ static void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) { GArray *table_offsets; -unsigned facs, dsdt, rsdt; +unsigned facs, ssdt, dsdt, rsdt; AcpiCpuInfo cpu; AcpiPmInfo pm; AcpiMiscInfo misc; AcpiMcfgInfo mcfg; PcPciInfo pci; uint8_t *u; +size_t aml_len = 0; acpi_get_cpu_info(cpu); acpi_get_pm_info(pm); @@ -1474,13 +1507,20 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) dsdt = tables-table_data-len; build_dsdt(tables-table_data, tables-linker, misc); +/* Count the size of the DSDT and SSDT, we will need it for legacy + * sizing of ACPI tables. + */ +aml_len += tables-table_data-len - dsdt; + /* ACPI tables pointed to by RSDT */ acpi_add_table(table_offsets, tables-table_data); build_fadt(tables-table_data, tables-linker, pm, facs, dsdt); +ssdt = tables-table_data-len; acpi_add_table(table_offsets, tables-table_data); build_ssdt(tables-table_data, tables-linker, cpu, pm, misc, pci, guest_info); +aml_len += tables-table_data-len - ssdt; acpi_add_table(table_offsets, tables-table_data); build_madt(tables-table_data, tables-linker, cpu, guest_info); @@ -1513,14 +1553,50 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) /* RSDP is in FSEG memory, so allocate it separately */ build_rsdp(tables-rsdp, tables-linker, rsdt); -/* We'll expose it all to Guest so align size
[Qemu-devel] [PATCH 3/5] pc: future-proof migration-compatibility of ACPI tables
This patch avoids that similar changes break QEMU again in the future. QEMU will now hard-code 64k as the maximum ACPI table size, which (despite being an order of magnitude smaller than 640k) should be enough for everyone. Reviewed-by: Laszlo Ersek ler...@redhat.com Tested-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-build.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 7d2251f..8d42eaf 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -63,6 +63,8 @@ #define ACPI_BUILD_LEGACY_BRIDGE_AML_SIZE 1875 #define ACPI_BUILD_ALIGN_SIZE 0x1000 +#define ACPI_BUILD_TABLE_SIZE 0x1 + typedef struct AcpiCpuInfo { DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT); } AcpiCpuInfo; @@ -1593,7 +1595,13 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) } g_array_set_size(tables-table_data, legacy_table_size); } else { -acpi_align_size(tables-table_data, ACPI_BUILD_ALIGN_SIZE); +if (tables-table_data-len ACPI_BUILD_TABLE_SIZE) { +/* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots. */ +error_report(ACPI tables are larger than 64k. Please remove); +error_report(CPUs, NUMA nodes, memory slots or PCI bridges.); +exit(1); +} +g_array_set_size(tables-table_data, ACPI_BUILD_TABLE_SIZE); } acpi_align_size(tables-linker, ACPI_BUILD_ALIGN_SIZE); -- 1.8.3.1
[Qemu-devel] [PATCH 4/5] bios-tables-test: fix ASL normalization false positive
My version of IASL (from RHEL7) puts two newlines between the head comment and the DefinitionBlock property. One was already removed because the test uses sizeof instead of strlen, but the extra one breaks the detection of DefinitionBlock. Killing all newlines after the comment drops the warning. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- tests/bios-tables-test.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c index 62771f7..045eb27 100644 --- a/tests/bios-tables-test.c +++ b/tests/bios-tables-test.c @@ -487,7 +487,11 @@ static GString *normalize_asl(gchar *asl_code) /* strip comments (different generation days) */ comment = g_strstr_len(asl-str, asl-len, COMMENT_END); if (comment) { -asl = g_string_erase(asl, 0, comment + sizeof(COMMENT_END) - asl-str); +comment += strlen(COMMENT_END); +while (*comment == '\n') { +comment++; +} +asl = g_string_erase(asl, 0, comment - asl-str); } /* strip def block name (it has file path in it) */ -- 1.8.3.1
[Qemu-devel] [PATCH 5/5] pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is disabled
From: Igor Mammedov imamm...@redhat.com Fixes migration regression from QEMU-1.7 to a newer QEMUs. SSDT table size in QEMU-1.7 doesn't change regardless of a number of PCI bridge devices present at startup. However in QEMU-2.0 since addition of hotplug on PCI bridges, each PCI bridge adds ~1875 bytes to SSDT table, including pc-i440fx-1.7 machine type where PCI bridge hotplug disabled via compat property. It breaks migration from QEMU-1.7 to QEMU-2.[01] -M pc-i440fx-1.7 since RAMBlock size of ACPI tables on target becomes larger then on source and migration fails with: Length mismatch: /rom@etc/acpi/tables: 2000 in != 3000 error. Fix this by generating AML only for PCI0 bus if hotplug on PCI bridges is disabled and preserves PCI brigde description in AML as it was done in QEMU-1.7 for pc-i440fx-1.7. It will help to maintain size of SSDT static regardless of number of PCI bridges on startup for pc-i440fx-1.7 machine type. Signed-off-by: Igor Mammedov imamm...@redhat.com [Affect bus_count too. - Paolo] Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-build.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 8d42eaf..90f1b90 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -77,6 +77,7 @@ typedef struct AcpiMcfgInfo { typedef struct AcpiPmInfo { bool s3_disabled; bool s4_disabled; +bool pcihp_bridge_en; uint8_t s4_val; uint16_t sci_int; uint8_t acpi_enable_cmd; @@ -98,6 +99,7 @@ typedef struct AcpiBuildPciBusHotplugState { GArray *device_table; GArray *notify_table; struct AcpiBuildPciBusHotplugState *parent; +bool pcihp_bridge_en; } AcpiBuildPciBusHotplugState; static void acpi_get_dsdt(AcpiMiscInfo *info) @@ -201,6 +203,9 @@ static void acpi_get_pm_info(AcpiPmInfo *pm) NULL); pm-gpe0_blk_len = object_property_get_int(obj, ACPI_PM_PROP_GPE0_BLK_LEN, NULL); +pm-pcihp_bridge_en = +object_property_get_bool(obj, acpi-pci-hotplug-with-bridge-support, + NULL); } static void acpi_get_misc_info(AcpiMiscInfo *info) @@ -802,11 +807,13 @@ static void acpi_set_pci_info(void) } static void build_pci_bus_state_init(AcpiBuildPciBusHotplugState *state, - AcpiBuildPciBusHotplugState *parent) + AcpiBuildPciBusHotplugState *parent, + bool pcihp_bridge_en) { state-parent = parent; state-device_table = build_alloc_array(); state-notify_table = build_alloc_array(); +state-pcihp_bridge_en = pcihp_bridge_en; } static void build_pci_bus_state_cleanup(AcpiBuildPciBusHotplugState *state) @@ -820,7 +827,7 @@ static void *build_pci_bus_begin(PCIBus *bus, void *parent_state) AcpiBuildPciBusHotplugState *parent = parent_state; AcpiBuildPciBusHotplugState *child = g_malloc(sizeof *child); -build_pci_bus_state_init(child, parent); +build_pci_bus_state_init(child, parent, parent-pcihp_bridge_en); return child; } @@ -841,6 +848,14 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) GArray *method; bool bus_hotplug_support = false; +/* +skip bridge subtree creation if bridge hotplug is disabled +to make it compatible with 1.7 machine type +*/ +if (!child-pcihp_bridge_en bus-parent_dev) { +return; +} + if (bus-parent_dev) { op = 0x82; /* DeviceOp */ build_append_nameseg(bus_table, S%.02X_, @@ -887,7 +902,8 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) pc = PCI_DEVICE_GET_CLASS(pdev); dc = DEVICE_GET_CLASS(pdev); -if (pc-class_id == PCI_CLASS_BRIDGE_ISA || pc-is_bridge) { +if (pc-class_id == PCI_CLASS_BRIDGE_ISA || +(pc-is_bridge child-pcihp_bridge_en)) { set_bit(slot, slot_device_system); } @@ -899,7 +915,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) } } -if (!dc-hotpluggable || pc-is_bridge) { +if (!dc-hotpluggable || (pc-is_bridge child-pcihp_bridge_en)) { clear_bit(slot, slot_hotplug_enable); } } @@ -1164,7 +1180,7 @@ build_ssdt(GArray *table_data, GArray *linker, bus = PCI_HOST_BRIDGE(pci_host)-bus; } -build_pci_bus_state_init(hotplug_state, NULL); +build_pci_bus_state_init(hotplug_state, NULL, pm-pcihp_bridge_en); if (bus) { /* Scan all PCI buses. Generate tables to support hotplug. */ @@ -1578,7 +1594,8 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) /* Subtracting aml_len gives the size of fixed tables. Then add the * size of the PIIX4 DSDT/SSDT in QEMU 2.0.
Re: [Qemu-devel] [PATCH] e1000: Delay LSC until mask is active
On 03.07.14 22:18, Michael S. Tsirkin wrote: On Thu, Jul 03, 2014 at 08:00:04PM +0200, Alexander Graf wrote: On 03.07.14 19:57, Peter Maydell wrote: On 3 July 2014 18:39, Alexander Graf ag...@suse.de wrote: Mac OS X reads ICR on every interrupt. When the IRQ line is shared, this may result in a race where LSC is not interpreted yet, but already gets cleared. The guest already has a way of telling us that it can interpret LSC events though and that's via the interrupt mask register (IMS). So if we just leave the LSC interrupt bit pending, but invisible to the guest as long as it's not ready to receive LSC interrupts, we basically defer the interrupt to the earliest point in time when the guest would know how to handle it. This would break any guests dealing with this in a polling mode (ie permanently leave interrupts masked and read ICR periodically to find out whether anything interesting has happened), right? If those guests would wait for a link detect event that way, yes. Considering all the hackery we already have about link negotiation (delay it until a random amount of ms passed) I'd say the breakage this patch fixes is a lot more likely than a polling guest that waits for a link based on ICR.LSC :). Alex Well that hackery was justified by the claim that real hardware behaves this way: it has a random delay since it needs a bit of time to bring the link up. What's the justification here? How come this driver works with real hardware? Real hardware never shares interrupts? ;) Alex
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work
On 10.07.14 15:10, Christian Borntraeger wrote: From: David Hildenbrand d...@linux.vnet.ibm.com If a cpu is stopped, it must never be allowed to run and no interrupt may wake it up. A cpu also has to be unhalted if it is halted and has work to do - this scenario wasn't hit in kvm case yet, as only disabled wait is processed within QEMU. Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com This looks like it's something that generic infrastructure should take care of, no? How does this work for the other archs? They always get an interrupt on the transition between !has_work - has_work. Why don't we get one for s390x? Alex Well, we have the special case on s390 as a CPU that is in the STOPPED or the CHECK STOP state may never run - even if there is an interrupt. It's basically like this CPU has been switched off. Imagine that it is tried to inject an interrupt into a stopped vcpu. It will kick the stopped vcpu and thus lead to a call to kvm_arch_process_async_events(). We have to deny that this vcpu will ever run as long as it is stopped. It's like a way to suppress the interrupt for such a transition you mentioned. Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a SIGP START to that vcpu). I am not sure if such a mechanism/scenario is applicable to any other arch. They all seem to reset the cs-halted flag if they know they are able to run (e.g. due to an interrupt) - they have no such thing as stopped cpus, only halted/waiting cpus. David
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work
Il 28/07/2014 16:16, David Hildenbrand ha scritto: Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a SIGP START to that vcpu). I am not sure if such a mechanism/scenario is applicable to any other arch. They all seem to reset the cs-halted flag if they know they are able to run (e.g. due to an interrupt) - they have no such thing as stopped cpus, only halted/waiting cpus. On x86, INIT_RECEIVED is pretty much a stopped CPU. It can only run (and receive interrupts) after getting a special startup interrupt (SIPI). Paolo
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work
On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote: On 10.07.14 15:10, Christian Borntraeger wrote: From: David Hildenbrand d...@linux.vnet.ibm.com If a cpu is stopped, it must never be allowed to run and no interrupt may wake it up. A cpu also has to be unhalted if it is halted and has work to do - this scenario wasn't hit in kvm case yet, as only disabled wait is processed within QEMU. Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com This looks like it's something that generic infrastructure should take care of, no? How does this work for the other archs? They always get an interrupt on the transition between !has_work - has_work. Why don't we get one for s390x? Alex Well, we have the special case on s390 as a CPU that is in the STOPPED or the CHECK STOP state may never run - even if there is an interrupt. It's basically like this CPU has been switched off. Imagine that it is tried to inject an interrupt into a stopped vcpu. It will kick the stopped vcpu and thus lead to a call to kvm_arch_process_async_events(). We have to deny that this vcpu will ever run as long as it is stopped. It's like a way to suppress the interrupt for such a transition you mentioned. An interrupt kick usually just means we go back into the main loop. From there we check the interrupt bitmap which interrupt to handle. Check out the handling code here: http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580 If you just check for the stopped state in here, do_interrupt() will never get called and thus the CPU shouldn't ever get executed. Unless I'm heavily mistaken :). Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a SIGP START to that vcpu). Yes, in that case that other CPU generates a signal (a different bit in interrupt_request) and the first CPU would see that it has to wake up and wake up. I am not sure if such a mechanism/scenario is applicable to any other arch. They all seem to reset the cs-halted flag if they know they are able to run (e.g. due to an interrupt) - they have no such thing as stopped cpus, only halted/waiting cpus. There's not really much difference between the two. The only difference from a software point of view is that a stopped CPU has its external interrupt bits masked off, no? Alex
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work
On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote: On 10.07.14 15:10, Christian Borntraeger wrote: From: David Hildenbrand d...@linux.vnet.ibm.com If a cpu is stopped, it must never be allowed to run and no interrupt may wake it up. A cpu also has to be unhalted if it is halted and has work to do - this scenario wasn't hit in kvm case yet, as only disabled wait is processed within QEMU. Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com This looks like it's something that generic infrastructure should take care of, no? How does this work for the other archs? They always get an interrupt on the transition between !has_work - has_work. Why don't we get one for s390x? Alex Well, we have the special case on s390 as a CPU that is in the STOPPED or the CHECK STOP state may never run - even if there is an interrupt. It's basically like this CPU has been switched off. Imagine that it is tried to inject an interrupt into a stopped vcpu. It will kick the stopped vcpu and thus lead to a call to kvm_arch_process_async_events(). We have to deny that this vcpu will ever run as long as it is stopped. It's like a way to suppress the interrupt for such a transition you mentioned. An interrupt kick usually just means we go back into the main loop. From there we check the interrupt bitmap which interrupt to handle. Check out the handling code here: http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580 If you just check for the stopped state in here, do_interrupt() will never get called and thus the CPU shouldn't ever get executed. Unless I'm heavily mistaken :). So you would rather move the check out of has_work() into the main loop in cpu-exec.c and directly into kvm_arch_process_async_events()? This would on the other hand lead to an unhalt of the vcpu in cpu_exec() on any CPU_INTERRUPT_HARD. A VCPU might thus be unhalted although it is not able to run. Is okay? Looking at cpu.c:cpu_thread_is_idle(), we would maybe return false, although we are idle (because we are idle when we are stopped)? My qemu kvm knowledge is way better than the qemu emulation knowledge, so I appreciate any insights :) Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a SIGP START to that vcpu). Yes, in that case that other CPU generates a signal (a different bit in interrupt_request) and the first CPU would see that it has to wake up and wake up. I am not sure if such a mechanism/scenario is applicable to any other arch. They all seem to reset the cs-halted flag if they know they are able to run (e.g. due to an interrupt) - they have no such thing as stopped cpus, only halted/waiting cpus. There's not really much difference between the two. The only difference from a software point of view is that a stopped CPU has its external interrupt bits masked off, no? Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like a SIPI++ as it performs a psw exchange - NMI). So we basically have two paths that can lead to a state change. All interrupt bits may be in any combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START be denied). The other thing may be that on s390, each vcpu (including itself) can put another vcpu into the STOPPED state - I assume that this is different for x86 INIT_RECEIVED. For this reason we have to watch out for bad race conditions (e.g. multiple vcpus working on another vcpu)... David Alex
Re: [Qemu-devel] [PATCH 2/5] pc: hack for migration compatibility from QEMU 2.0
On Mon, Jul 28, 2014 at 04:02:12PM +0200, Paolo Bonzini wrote: Changing the ACPI table size causes migration to break, and the memory hotplug work opened our eyes on how horribly we were breaking things in 2.0 already. The ACPI table size is rounded to the next 4k, which one would think gives some headroom. In practice this is not the case, because the user can control the ACPI table size (each CPU adds 97 bytes to the SSDT and 8 to the MADT) and so some -smp values will break the 4k boundary and fail to migrate. Similarly, PCI bridges add ~1870 bytes to the SSDT. This patch concerns itself with fixing migration from QEMU 2.0. It computes the payload size of QEMU 2.0 and always uses that one. The previous patch shrunk the ACPI tables enough that the QEMU 2.0 size should always be enough; non-AML tables can change depending on the configuration (especially MADT, SRAT, HPET) but they remain the same between QEMU 2.0 and 2.1, so we only compute our padding based on the sizes of the SSDT and DSDT. Migration from QEMU 1.7 should work for guests that have a number of CPUs other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140. It was already broken from QEMU 1.7 to QEMU 2.0 in the same way, though. The amount of AML for a bridge varies a little bit between 1872 and 1875 due to optimized number encodings. Use the smallest value, and let any extra chew away at the slack left by the shrinking of the DSDT. Reviewed-by: Laszlo Ersek ler...@redhat.com Tested-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-build.c | 84 +--- hw/i386/pc_piix.c| 19 hw/i386/pc_q35.c | 5 include/hw/i386/pc.h | 1 + 4 files changed, 105 insertions(+), 4 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index ebc5f03..7d2251f 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -25,7 +25,9 @@ #include glib.h #include qemu-common.h #include qemu/bitmap.h +#include qemu/osdep.h #include qemu/range.h +#include qemu/error-report.h #include hw/pci/pci.h #include qom/cpu.h #include hw/i386/pc.h @@ -52,6 +54,15 @@ #include qapi/qmp/qint.h #include qom/qom-qobject.h +/* These are used to size the ACPI tables for -M pc-i440fx-1.7 and + * -M pc-i440fx-2.0. Even if the actual amount of AML generated grows + * a little bit, there should be plenty of free space since the DSDT + * shrunk by ~1.5k between QEMU 2.0 and QEMU 2.1. + */ +#define ACPI_BUILD_LEGACY_CPU_AML_SIZE97 +#define ACPI_BUILD_LEGACY_BRIDGE_AML_SIZE 1872 Hmm this is wrong if some slot is occupied by a non hotpluggable device, is it not? +#define ACPI_BUILD_ALIGN_SIZE 0x1000 + typedef struct AcpiCpuInfo { DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT); } AcpiCpuInfo; @@ -737,6 +748,27 @@ static void patch_pciqxl(int slot, uint8_t *ssdt_ptr) ssdt_ptr[ACPI_PCIQXL_OFFSET_ADR + 2] = slot; } +static void *pci_for_each_bus_incr_func(PCIBus *bus, void *opaque) +{ +unsigned *bsel_alloc = opaque; + +if (bus-qbus.allow_hotplug) { +(*bsel_alloc)++; +} Hmm I don't know why we do allow_hotplug in the place where you copied this from :( + +return bsel_alloc; +} + +static unsigned acpi_count_hotpluggable_pci_buses(void) +{ +PCIBus *bus = find_i440fx(); /* TODO: Q35 support */ +unsigned bus_count = 0; + +pci_for_each_bus_depth_first(bus, pci_for_each_bus_incr_func, + NULL, bus_count); +return bus_count; +} + /* Assign BSEL property to all buses. In the future, this can be changed * to only assign to buses that support hotplug. */ @@ -1440,13 +1472,14 @@ static void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) { GArray *table_offsets; -unsigned facs, dsdt, rsdt; +unsigned facs, ssdt, dsdt, rsdt; AcpiCpuInfo cpu; AcpiPmInfo pm; AcpiMiscInfo misc; AcpiMcfgInfo mcfg; PcPciInfo pci; uint8_t *u; +size_t aml_len = 0; acpi_get_cpu_info(cpu); acpi_get_pm_info(pm); @@ -1474,13 +1507,20 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) dsdt = tables-table_data-len; build_dsdt(tables-table_data, tables-linker, misc); +/* Count the size of the DSDT and SSDT, we will need it for legacy + * sizing of ACPI tables. + */ +aml_len += tables-table_data-len - dsdt; + /* ACPI tables pointed to by RSDT */ acpi_add_table(table_offsets, tables-table_data); build_fadt(tables-table_data, tables-linker, pm, facs, dsdt); +ssdt = tables-table_data-len; acpi_add_table(table_offsets, tables-table_data); build_ssdt(tables-table_data, tables-linker, cpu, pm, misc, pci, guest_info); +aml_len += tables-table_data-len - ssdt;
Re: [Qemu-devel] [PATCH 2/5] pc: hack for migration compatibility from QEMU 2.0
Il 28/07/2014 17:05, Michael S. Tsirkin ha scritto: +/* These are used to size the ACPI tables for -M pc-i440fx-1.7 and + * -M pc-i440fx-2.0. Even if the actual amount of AML generated grows + * a little bit, there should be plenty of free space since the DSDT + * shrunk by ~1.5k between QEMU 2.0 and QEMU 2.1. + */ +#define ACPI_BUILD_LEGACY_CPU_AML_SIZE97 +#define ACPI_BUILD_LEGACY_BRIDGE_AML_SIZE 1872 Hmm this is wrong if some slot is occupied by a non hotpluggable device, is it not? I honestly have no idea. If even more complicated code is needed, I'd rather get rid of all this bridge stuff altogether (as far as computing the AML size goes), and declare 2.0-2.1 migration broken if the VM has bridges. Paolo
[Qemu-devel] [PATCH 1/5] scsi-bus: prepare scsi_req_new for introduction of parse_cdb
The per-SCSIDevice parse_cdb callback must not be called if the request will go through special SCSIReqOps, so detect the special cases early enough. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi/scsi-bus.c | 51 ++- 1 file changed, 30 insertions(+), 21 deletions(-) diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index 4341754..ca4e9f3 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -561,13 +561,38 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun, uint8_t *buf, void *hba_private) { SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, d-qdev.parent_bus); +const SCSIReqOps *ops; SCSIRequest *req; -SCSICommand cmd; +SCSICommand cmd = { .len = 0 }; +int ret; + +if ((d-unit_attention.key == UNIT_ATTENTION || + bus-unit_attention.key == UNIT_ATTENTION) +(buf[0] != INQUIRY + buf[0] != REPORT_LUNS + buf[0] != GET_CONFIGURATION + buf[0] != GET_EVENT_STATUS_NOTIFICATION + + /* + * If we already have a pending unit attention condition, + * report this one before triggering another one. + */ + !(buf[0] == REQUEST_SENSE d-sense_is_ua))) { +ops = reqops_unit_attention; +} else if (lun != d-lun || + buf[0] == REPORT_LUNS || + (buf[0] == REQUEST_SENSE d-sense_len)) { +ops = reqops_target_command; +} else { +ops = NULL; +} -if (scsi_req_parse(cmd, d, buf) != 0) { +ret = scsi_req_parse(cmd, d, buf); +if (ret != 0) { trace_scsi_req_parse_bad(d-id, lun, tag, buf[0]); req = scsi_req_alloc(reqops_invalid_opcode, d, tag, lun, hba_private); } else { +assert(cmd.len != 0); trace_scsi_req_parsed(d-id, lun, tag, buf[0], cmd.mode, cmd.xfer); if (cmd.lba != -1) { @@ -577,25 +602,8 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun, if (cmd.xfer INT32_MAX) { req = scsi_req_alloc(reqops_invalid_field, d, tag, lun, hba_private); -} else if ((d-unit_attention.key == UNIT_ATTENTION || - bus-unit_attention.key == UNIT_ATTENTION) - (buf[0] != INQUIRY - buf[0] != REPORT_LUNS - buf[0] != GET_CONFIGURATION - buf[0] != GET_EVENT_STATUS_NOTIFICATION - - /* -* If we already have a pending unit attention condition, -* report this one before triggering another one. -*/ - !(buf[0] == REQUEST_SENSE d-sense_is_ua))) { -req = scsi_req_alloc(reqops_unit_attention, d, tag, lun, - hba_private); -} else if (lun != d-lun || - buf[0] == REPORT_LUNS || - (buf[0] == REQUEST_SENSE d-sense_len)) { -req = scsi_req_alloc(reqops_target_command, d, tag, lun, - hba_private); +} else if (ops) { +req = scsi_req_alloc(ops, d, tag, lun, hba_private); } else { req = scsi_device_alloc_req(d, tag, lun, buf, hba_private); } @@ -1186,6 +1194,7 @@ static int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf) { int rc; +cmd-lba = -1; switch (buf[0] 5) { case 0: cmd-len = 6; -- 1.9.3
[Qemu-devel] [PATCH for-2.2 v2 0/5] scsi: enable passthrough of vendor-specific commands
Right now scsi-generic is parsing the CDB, in order to compute the expected number of bytes to be transferred. This is necessary if DMA is done by the HBA via scsi_req_data, but it prevents executing vendor-specific commands via scsi-generic because we don't know how to parse them. If DMA is delegated to the SCSI layer via get_sg_list, we know in advance how many bytes the guest will want to receive and we can pass the information straight from the guest to SG_IO. In this case, it is unnecessary to parse the CDB to get the same information. scsi-disk needs it to detect underruns and overruns, but scsi-generic and scsi-block can just ask the HBA about the transfer direction and size. This series introduces a new parse_cdb callback in both the device and the HBA. The latter is called by scsi_bus_parse_cdb, which devices can call for passthrough requests in their implementation of parse_cdb. Paolo v1-v2: use the right CDB size for non-vendor-specific commands, as some drivers and/or firmware expect that and complain if you pass a READ(10) command in a 16-byte CDB. Interdiff here. diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index f999bfa..6f4462b 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -57,12 +57,14 @@ int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf, void *hba_private) { SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, dev-qdev.parent_bus); +int rc; +assert(cmd-len == 0); +rc = scsi_req_parse_cdb(dev, cmd, buf); if (bus-info-parse_cdb) { -return bus-info-parse_cdb(dev, cmd, buf, hba_private); -} else { -return scsi_req_parse_cdb(dev, cmd, buf); +rc = bus-info-parse_cdb(dev, cmd, buf, hba_private); } +return rc; } static SCSIRequest *scsi_device_alloc_req(SCSIDevice *s, uint32_t tag, uint32_t lun, @@ -575,7 +577,7 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun, const SCSIReqOps *ops; SCSIDeviceClass *sc = SCSI_DEVICE_GET_CLASS(d); SCSIRequest *req; -SCSICommand cmd; +SCSICommand cmd = { .len = 0 }; int ret; if ((d-unit_attention.key == UNIT_ATTENTION || @@ -609,6 +611,7 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun, trace_scsi_req_parse_bad(d-id, lun, tag, buf[0]); req = scsi_req_alloc(reqops_invalid_opcode, d, tag, lun, hba_private); } else { +assert(cmd.len != 0); trace_scsi_req_parsed(d-id, lun, tag, buf[0], cmd.mode, cmd.xfer); if (cmd.lba != -1) { @@ -1210,6 +1213,7 @@ int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf) { int rc; +cmd-lba = -1; switch (buf[0] 5) { case 0: cmd-len = 6; diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index e8f4c0c..2dd9255 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -411,9 +411,10 @@ static int virtio_scsi_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, { VirtIOSCSIReq *req = hba_private; -cmd-lba = -1; -cmd-len = MIN(VIRTIO_SCSI_CDB_SIZE, SCSI_CMD_BUF_SIZE); -memcpy(cmd-buf, buf, cmd-len); +if (cmd-len == 0) { +cmd-len = MIN(VIRTIO_SCSI_CDB_SIZE, SCSI_CMD_BUF_SIZE); +memcpy(cmd-buf, buf, cmd-len); +} /* Extract the direction and mode directly from the request, for * host device passthrough. Paolo Bonzini (5): scsi-bus: prepare scsi_req_new for introduction of parse_cdb scsi-bus: introduce parse_cdb in SCSIDeviceClass and SCSIBusInfo scsi-block: extract scsi_block_is_passthrough scsi-block, scsi-generic: implement parse_cdb virtio-scsi: implement parse_cdb hw/scsi/scsi-bus.c | 74 ++ hw/scsi/scsi-disk.c| 52 +++ hw/scsi/scsi-generic.c | 7 + hw/scsi/virtio-scsi.c | 25 + include/hw/scsi/scsi.h | 7 + 5 files changed, 130 insertions(+), 35 deletions(-) -- 1.9.3
[Qemu-devel] [PATCH 3/5] scsi-block: extract scsi_block_is_passthrough
This will be used for both scsi_block_new_request and the scsi-block implementation of parse_cdb. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi/scsi-disk.c | 38 ++ 1 file changed, 26 insertions(+), 12 deletions(-) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index d47ecd6..81b7276 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -2501,12 +2501,8 @@ static int scsi_block_initfn(SCSIDevice *dev) return scsi_initfn(s-qdev); } -static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, - uint32_t lun, uint8_t *buf, - void *hba_private) +static bool scsi_block_is_passthrough(SCSIDiskState *s, uint8_t *buf) { -SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d); - switch (buf[0]) { case READ_6: case READ_10: @@ -2523,9 +2519,9 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, case WRITE_VERIFY_12: case WRITE_VERIFY_16: /* If we are not using O_DIRECT, we might read stale data from the -* host cache if writes were made using other commands than these -* ones (such as WRITE SAME or EXTENDED COPY, etc.). So, without -* O_DIRECT everything must go through SG_IO. + * host cache if writes were made using other commands than these + * ones (such as WRITE SAME or EXTENDED COPY, etc.). So, without + * O_DIRECT everything must go through SG_IO. */ if (!(bdrv_get_flags(s-qdev.conf.bs) BDRV_O_NOCACHE)) { break; @@ -2542,13 +2538,31 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, * just make scsi-block operate the same as scsi-generic for them. */ if (s-qdev.type != TYPE_ROM) { -return scsi_req_alloc(scsi_disk_dma_reqops, s-qdev, tag, lun, - hba_private); +return false; } +break; + +default: +break; } -return scsi_req_alloc(scsi_generic_req_ops, s-qdev, tag, lun, - hba_private); +return true; +} + + +static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, + uint32_t lun, uint8_t *buf, + void *hba_private) +{ +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d); + +if (scsi_block_is_passthrough(s, buf)) { +return scsi_req_alloc(scsi_generic_req_ops, s-qdev, tag, lun, + hba_private); +} else { +return scsi_req_alloc(scsi_disk_dma_reqops, s-qdev, tag, lun, + hba_private); +} } #endif -- 1.9.3
[Qemu-devel] [PATCH 4/5] scsi-block, scsi-generic: implement parse_cdb
The callback lets the bus provide the direction and transfer count for passthrough commands, enabling passthrough of vendor-specific commands. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi/scsi-bus.c | 3 +-- hw/scsi/scsi-disk.c| 14 ++ hw/scsi/scsi-generic.c | 7 +++ include/hw/scsi/scsi.h | 1 + 4 files changed, 23 insertions(+), 2 deletions(-) diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index d97d18a..6f4462b 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -9,7 +9,6 @@ static char *scsibus_get_dev_path(DeviceState *dev); static char *scsibus_get_fw_dev_path(DeviceState *dev); -static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf); static void scsi_req_dequeue(SCSIRequest *req); static uint8_t *scsi_target_alloc_buf(SCSIRequest *req, size_t len); static void scsi_target_free_buf(SCSIRequest *req); @@ -1210,7 +1209,7 @@ static uint64_t scsi_cmd_lba(SCSICommand *cmd) return lba; } -static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf) +int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf) { int rc; diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 81b7276..d55521d 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -2564,6 +2564,19 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, hba_private); } } + +static int scsi_block_parse_cdb(SCSIDevice *d, SCSICommand *cmd, + uint8_t *buf, void *hba_private) +{ +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d); + +if (scsi_block_is_passthrough(s, buf)) { +return scsi_bus_parse_cdb(s-qdev, cmd, buf, hba_private); +} else { +return scsi_req_parse_cdb(s-qdev, cmd, buf); +} +} + #endif #define DEFINE_SCSI_DISK_PROPERTIES()\ @@ -2672,6 +2685,7 @@ static void scsi_block_class_initfn(ObjectClass *klass, void *data) sc-init = scsi_block_initfn; sc-destroy = scsi_destroy; sc-alloc_req= scsi_block_new_request; +sc-parse_cdb= scsi_block_parse_cdb; dc-fw_name = disk; dc-desc = SCSI block device passthrough; dc-reset = scsi_disk_reset; diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c index 3733d2c..0b2ff90 100644 --- a/hw/scsi/scsi-generic.c +++ b/hw/scsi/scsi-generic.c @@ -490,6 +490,12 @@ static Property scsi_generic_properties[] = { DEFINE_PROP_END_OF_LIST(), }; +static int scsi_generic_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, + uint8_t *buf, void *hba_private) +{ +return scsi_bus_parse_cdb(dev, cmd, buf, hba_private); +} + static void scsi_generic_class_initfn(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -498,6 +504,7 @@ static void scsi_generic_class_initfn(ObjectClass *klass, void *data) sc-init = scsi_generic_initfn; sc-destroy = scsi_destroy; sc-alloc_req= scsi_new_request; +sc-parse_cdb= scsi_generic_parse_cdb; dc-fw_name = disk; dc-desc = pass through generic scsi device (/dev/sg*); dc-reset = scsi_generic_reset; diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h index 4a0b860..a7a28e6 100644 --- a/include/hw/scsi/scsi.h +++ b/include/hw/scsi/scsi.h @@ -250,6 +250,7 @@ void scsi_req_unref(SCSIRequest *req); int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf, void *hba_private); +int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf); void scsi_req_build_sense(SCSIRequest *req, SCSISense sense); void scsi_req_print(SCSIRequest *req); void scsi_req_continue(SCSIRequest *req); -- 1.9.3
[Qemu-devel] [PATCH 2/5] scsi-bus: introduce parse_cdb in SCSIDeviceClass and SCSIBusInfo
These callbacks will let devices do their own request parsing, or defer it to the bus. If the bus does not provide an implementation, in turn, fall back to the default parsing routine. Swap the first two arguments to scsi_req_parse, and rename it to scsi_req_parse_cdb, for consistency. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi/scsi-bus.c | 26 +++--- include/hw/scsi/scsi.h | 6 ++ 2 files changed, 29 insertions(+), 3 deletions(-) diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index ca4e9f3..d97d18a 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -9,7 +9,7 @@ static char *scsibus_get_dev_path(DeviceState *dev); static char *scsibus_get_fw_dev_path(DeviceState *dev); -static int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf); +static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf); static void scsi_req_dequeue(SCSIRequest *req); static uint8_t *scsi_target_alloc_buf(SCSIRequest *req, size_t len); static void scsi_target_free_buf(SCSIRequest *req); @@ -54,6 +54,20 @@ static void scsi_device_destroy(SCSIDevice *s) } } +int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf, + void *hba_private) +{ +SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, dev-qdev.parent_bus); +int rc; + +assert(cmd-len == 0); +rc = scsi_req_parse_cdb(dev, cmd, buf); +if (bus-info-parse_cdb) { +rc = bus-info-parse_cdb(dev, cmd, buf, hba_private); +} +return rc; +} + static SCSIRequest *scsi_device_alloc_req(SCSIDevice *s, uint32_t tag, uint32_t lun, uint8_t *buf, void *hba_private) { @@ -562,6 +576,7 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun, { SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, d-qdev.parent_bus); const SCSIReqOps *ops; +SCSIDeviceClass *sc = SCSI_DEVICE_GET_CLASS(d); SCSIRequest *req; SCSICommand cmd = { .len = 0 }; int ret; @@ -587,7 +602,12 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun, ops = NULL; } -ret = scsi_req_parse(cmd, d, buf); +if (ops != NULL || !sc-parse_cdb) { +ret = scsi_req_parse_cdb(d, cmd, buf); +} else { +ret = sc-parse_cdb(d, cmd, buf, hba_private); +} + if (ret != 0) { trace_scsi_req_parse_bad(d-id, lun, tag, buf[0]); req = scsi_req_alloc(reqops_invalid_opcode, d, tag, lun, hba_private); @@ -1190,7 +1210,7 @@ static uint64_t scsi_cmd_lba(SCSICommand *cmd) return lba; } -static int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf) +static int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf) { int rc; diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h index 1adb549..4a0b860 100644 --- a/include/hw/scsi/scsi.h +++ b/include/hw/scsi/scsi.h @@ -76,6 +76,8 @@ typedef struct SCSIDeviceClass { DeviceClass parent_class; int (*init)(SCSIDevice *dev); void (*destroy)(SCSIDevice *s); +int (*parse_cdb)(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf, + void *hba_private); SCSIRequest *(*alloc_req)(SCSIDevice *s, uint32_t tag, uint32_t lun, uint8_t *buf, void *hba_private); void (*unit_attention_reported)(SCSIDevice *s); @@ -131,6 +133,8 @@ struct SCSIReqOps { struct SCSIBusInfo { int tcq; int max_channel, max_target, max_lun; +int (*parse_cdb)(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf, + void *hba_private); void (*transfer_data)(SCSIRequest *req, uint32_t arg); void (*complete)(SCSIRequest *req, uint32_t arg, size_t resid); void (*cancel)(SCSIRequest *req); @@ -244,6 +248,8 @@ void scsi_req_free(SCSIRequest *req); SCSIRequest *scsi_req_ref(SCSIRequest *req); void scsi_req_unref(SCSIRequest *req); +int scsi_bus_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf, + void *hba_private); void scsi_req_build_sense(SCSIRequest *req, SCSISense sense); void scsi_req_print(SCSIRequest *req); void scsi_req_continue(SCSIRequest *req); -- 1.9.3
[Qemu-devel] [PATCH 5/5] virtio-scsi: implement parse_cdb
Enable passthrough of vendor-specific commands. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi/virtio-scsi.c | 25 + 1 file changed, 25 insertions(+) diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index 0eb069a..2dd9255 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -406,6 +406,30 @@ static void virtio_scsi_command_complete(SCSIRequest *r, uint32_t status, virtio_scsi_complete_cmd_req(req); } +static int virtio_scsi_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, + uint8_t *buf, void *hba_private) +{ +VirtIOSCSIReq *req = hba_private; + +if (cmd-len == 0) { +cmd-len = MIN(VIRTIO_SCSI_CDB_SIZE, SCSI_CMD_BUF_SIZE); +memcpy(cmd-buf, buf, cmd-len); +} + +/* Extract the direction and mode directly from the request, for + * host device passthrough. + */ +cmd-xfer = req-qsgl.size; +if (cmd-xfer == 0) { +cmd-mode = SCSI_XFER_NONE; +} else if (iov_size(req-elem.in_sg, req-elem.in_num) req-resp_size) { +cmd-mode = SCSI_XFER_FROM_DEV; +} else { +cmd-mode = SCSI_XFER_TO_DEV; +} +return 0; +} + static QEMUSGList *virtio_scsi_get_sg_list(SCSIRequest *r) { VirtIOSCSIReq *req = r-hba_private; @@ -658,6 +682,7 @@ static struct SCSIBusInfo virtio_scsi_scsi_info = { .change = virtio_scsi_change, .hotplug = virtio_scsi_hotplug, .hot_unplug = virtio_scsi_hot_unplug, +.parse_cdb = virtio_scsi_parse_cdb, .get_sg_list = virtio_scsi_get_sg_list, .save_request = virtio_scsi_save_request, .load_request = virtio_scsi_load_request, -- 1.9.3
Re: [Qemu-devel] [PATCH v5 2/2] vmdk: Optimize cluster allocation
On Tue, May 27, 2014 at 04:49:22PM +0800, Fam Zheng wrote: +if (!bs-backing_hd) { +memset(whole_grain, 0, skip_start_sector BDRV_SECTOR_BITS); +memset(whole_grain + (skip_end_sector BDRV_SECTOR_BITS), 0, + cluster_bytes - (skip_end_sector BDRV_SECTOR_BITS)); +} + +assert(skip_end_sector = sector_num + extent-cluster_sectors); Does this assertion make sense? skip_end_sector is a small number of sectors (relative to start of cluster), while sector_num + extent-cluster_sectors is a large absolute sector offset. +/** + * get_cluster_offset + * + * Look up cluster offset in extent file by sector number, and store in + * @cluster_offset. + * + * For flat extent, the start offset as parsed from the description file is s/extent/extents/ + * returned. + * + * For sparse extent, look up in L1, L2 table. If allocate is true, return an s/extent/extents/ + * offset for a new cluster and update L2 cache. If there is a backing file, + * COW is done before returning; otherwise, zeroes are written to the allocated + * cluster. Both COW and zero writting skips the sector range s/writting/writing/ pgpcGqEk_CN0h.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH for 2.1 V3] qemu-img info: show nocow info
On Wed, Jul 09, 2014 at 10:43:13AM +0800, Chunyan Liu wrote: Add nocow info in 'qemu-img info' output to show whether the file currently has NOCOW flag set or not. Signed-off-by: Chunyan Liu cy...@suse.com --- Changes: - update output info to NOCOW flag: set block/qapi.c | 25 + qapi/block-core.json | 5 - 2 files changed, 29 insertions(+), 1 deletion(-) This patch was sent on July 9th, after the 2.1 soft freeze when we stop merging new features. Soft freeze was 17th of June. Please resend for QEMU 2.2 and update the qapi-schema.json version comment. Thanks, Stefan pgpnaVyvL6A24.pgp Description: PGP signature
[Qemu-devel] [PATCH v4 0/5] ACPI fixes for QEMU 2.1
v3-v4: drop all pretense of supporting bridges [me] v2-v3: fix tests/acpi-test-data/pc/DSDT [Peter] track down make check failure, fix it [patch 4, me] split patch 2 in two parts [mst] do not make bsel_alloc global [mst] include Igor's bridge patch [mst, as discussed on IRC] Igor Mammedov (1): pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is disabled Paolo Bonzini (4): acpi-dsdt: procedurally generate _PRT pc: hack for migration compatibility from QEMU 2.0 pc: future-proof migration-compatibility of ACPI tables bios-tables-test: fix ASL normalization false positive hw/i386/acpi-build.c| 94 +- hw/i386/acpi-dsdt.dsl | 90 +- hw/i386/acpi-dsdt.hex.generated | 1910 +++ hw/i386/pc_piix.c | 19 + hw/i386/pc_q35.c|5 + include/hw/i386/pc.h|1 + tests/acpi-test-data/pc/DSDT| Bin 4499 - 2807 bytes tests/bios-tables-test.c|6 +- 8 files changed, 263 insertions(+), 1862 deletions(-) -- 1.8.3.1
[Qemu-devel] [PATCH 4/5] bios-tables-test: fix ASL normalization false positive
My version of IASL (from RHEL7) puts two newlines between the head comment and the DefinitionBlock property. Kill all newlines after the comment, so that normalize_asl works properly. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- tests/bios-tables-test.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c index 62771f7..045eb27 100644 --- a/tests/bios-tables-test.c +++ b/tests/bios-tables-test.c @@ -487,7 +487,11 @@ static GString *normalize_asl(gchar *asl_code) /* strip comments (different generation days) */ comment = g_strstr_len(asl-str, asl-len, COMMENT_END); if (comment) { -asl = g_string_erase(asl, 0, comment + sizeof(COMMENT_END) - asl-str); +comment += strlen(COMMENT_END); +while (*comment == '\n') { +comment++; +} +asl = g_string_erase(asl, 0, comment - asl-str); } /* strip def block name (it has file path in it) */ -- 1.8.3.1
[Qemu-devel] [PATCH 1/5] acpi-dsdt: procedurally generate _PRT
This replaces the _PRT constant with a method that computes it. The problem is that the DSDT+SSDT have grown from 2.0 to 2.1, enough to cross the 8k barrier (we align the ACPI tables to 4k before putting them in fw_cfg). This causes problems with migration and the pc-i440fx-2.0 machine type. The solution to the problem is to hardcode 64k as the limit, but this doesn't solve the bug with pc-i440fx-2.0. The fix will be for QEMU 2.1 to use exactly the same size as QEMU 2.0 for the ACPI tables. First, however, we must make the actual AML equal or smaller; to do this, rewrite _PRT in a way that saves over 1k of bytecode. Reviewed-by: Laszlo Ersek ler...@redhat.com Tested-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-dsdt.dsl | 90 +- hw/i386/acpi-dsdt.hex.generated | 1910 +++ tests/acpi-test-data/pc/DSDT| Bin 4499 - 2807 bytes 3 files changed, 148 insertions(+), 1852 deletions(-) diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl index 3cc0ea0..6ba0170 100644 --- a/hw/i386/acpi-dsdt.dsl +++ b/hw/i386/acpi-dsdt.dsl @@ -181,57 +181,45 @@ DefinitionBlock ( Scope(\_SB) { Scope(PCI0) { -Name(_PRT, Package() { -/* PCI IRQ routing table, example from ACPI 2.0a specification, - section 6.2.8.1 */ -/* Note: we provide the same info as the PCI routing - table of the Bochs BIOS */ - -#define prt_slot(nr, lnk0, lnk1, lnk2, lnk3) \ -Package() { nr##, 0, lnk0, 0 }, \ -Package() { nr##, 1, lnk1, 0 }, \ -Package() { nr##, 2, lnk2, 0 }, \ -Package() { nr##, 3, lnk3, 0 } - -#define prt_slot0(nr) prt_slot(nr, LNKD, LNKA, LNKB, LNKC) -#define prt_slot1(nr) prt_slot(nr, LNKA, LNKB, LNKC, LNKD) -#define prt_slot2(nr) prt_slot(nr, LNKB, LNKC, LNKD, LNKA) -#define prt_slot3(nr) prt_slot(nr, LNKC, LNKD, LNKA, LNKB) - -prt_slot0(0x), -/* Device 1 is power mgmt device, and can only use irq 9 */ -prt_slot(0x0001, LNKS, LNKB, LNKC, LNKD), -prt_slot2(0x0002), -prt_slot3(0x0003), -prt_slot0(0x0004), -prt_slot1(0x0005), -prt_slot2(0x0006), -prt_slot3(0x0007), -prt_slot0(0x0008), -prt_slot1(0x0009), -prt_slot2(0x000a), -prt_slot3(0x000b), -prt_slot0(0x000c), -prt_slot1(0x000d), -prt_slot2(0x000e), -prt_slot3(0x000f), -prt_slot0(0x0010), -prt_slot1(0x0011), -prt_slot2(0x0012), -prt_slot3(0x0013), -prt_slot0(0x0014), -prt_slot1(0x0015), -prt_slot2(0x0016), -prt_slot3(0x0017), -prt_slot0(0x0018), -prt_slot1(0x0019), -prt_slot2(0x001a), -prt_slot3(0x001b), -prt_slot0(0x001c), -prt_slot1(0x001d), -prt_slot2(0x001e), -prt_slot3(0x001f), -}) +Method (_PRT, 0) { +Store(Package(128) {}, Local0) +Store(Zero, Local1) +While(LLess(Local1, 128)) { +// slot = pin 2 +Store(ShiftRight(Local1, 2), Local2) + +// lnk = (slot + pin) 3 +Store(And(Add(Local1, Local2), 3), Local3) +If (LEqual(Local3, 0)) { +Store(Package(4) { Zero, Zero, LNKD, Zero }, Local4) +} +If (LEqual(Local3, 1)) { +// device 1 is the power-management device, needs SCI +If (LEqual(Local1, 4)) { +Store(Package(4) { Zero, Zero, LNKS, Zero }, Local4) +} Else { +Store(Package(4) { Zero, Zero, LNKA, Zero }, Local4) +} +} +If (LEqual(Local3, 2)) { +Store(Package(4) { Zero, Zero, LNKB, Zero }, Local4) +} +If (LEqual(Local3, 3)) { +Store(Package(4) { Zero, Zero, LNKC, Zero }, Local4) +} + +// Complete the interrupt routing entry: +//Package(4) { 0x[slot], [pin], [link], 0) } + +Store(Or(ShiftLeft(Local2, 16), 0x), Index(Local4, 0)) +Store(And(Local1, 3),Index(Local4, 1)) +Store(Local4,Index(Local0, Local1)) + +Increment(Local1) +} + +Return(Local0) +} }
[Qemu-devel] [PATCH 3/5] pc: future-proof migration-compatibility of ACPI tables
This patch avoids that similar changes break QEMU again in the future. QEMU will now hard-code 64k as the maximum ACPI table size, which (despite being an order of magnitude smaller than 640k) should be enough for everyone. Reviewed-by: Laszlo Ersek ler...@redhat.com Tested-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-build.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index a3d5822..25cf297 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -62,6 +62,8 @@ #define ACPI_BUILD_LEGACY_CPU_AML_SIZE97 #define ACPI_BUILD_ALIGN_SIZE 0x1000 +#define ACPI_BUILD_TABLE_SIZE 0x1 + typedef struct AcpiCpuInfo { DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT); } AcpiCpuInfo; @@ -1569,7 +1571,13 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) } g_array_set_size(tables-table_data, legacy_table_size); } else { -acpi_align_size(tables-table_data, ACPI_BUILD_ALIGN_SIZE); +if (tables-table_data-len ACPI_BUILD_TABLE_SIZE) { +/* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots. */ +error_report(ACPI tables are larger than 64k. Please remove); +error_report(CPUs, NUMA nodes, memory slots or PCI bridges.); +exit(1); +} +g_array_set_size(tables-table_data, ACPI_BUILD_TABLE_SIZE); } acpi_align_size(tables-linker, ACPI_BUILD_ALIGN_SIZE); -- 1.8.3.1
[Qemu-devel] [PATCH 5/5] pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is disabled
From: Igor Mammedov imamm...@redhat.com Fixes migration regression from QEMU-1.7 to a newer QEMUs. SSDT table size in QEMU-1.7 doesn't change regardless of a number of PCI bridge devices present at startup. However in QEMU-2.0 since addition of hotplug on PCI bridges, each PCI bridge adds ~1875 bytes to SSDT table, including pc-i440fx-1.7 machine type where PCI bridge hotplug disabled via compat property. It breaks migration from QEMU-1.7 to QEMU-2.[01] -M pc-i440fx-1.7 since RAMBlock size of ACPI tables on target becomes larger then on source and migration fails with: Length mismatch: /rom@etc/acpi/tables: 2000 in != 3000 error. Fix this by generating AML only for PCI0 bus if hotplug on PCI bridges is disabled and preserves PCI brigde description in AML as it was done in QEMU-1.7 for pc-i440fx-1.7. It will help to maintain size of SSDT static regardless of number of PCI bridges on startup for pc-i440fx-1.7 machine type. Signed-off-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-build.c | 26 +- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 25cf297..b92c531 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -76,6 +76,7 @@ typedef struct AcpiMcfgInfo { typedef struct AcpiPmInfo { bool s3_disabled; bool s4_disabled; +bool pcihp_bridge_en; uint8_t s4_val; uint16_t sci_int; uint8_t acpi_enable_cmd; @@ -97,6 +98,7 @@ typedef struct AcpiBuildPciBusHotplugState { GArray *device_table; GArray *notify_table; struct AcpiBuildPciBusHotplugState *parent; +bool pcihp_bridge_en; } AcpiBuildPciBusHotplugState; static void acpi_get_dsdt(AcpiMiscInfo *info) @@ -200,6 +202,9 @@ static void acpi_get_pm_info(AcpiPmInfo *pm) NULL); pm-gpe0_blk_len = object_property_get_int(obj, ACPI_PM_PROP_GPE0_BLK_LEN, NULL); +pm-pcihp_bridge_en = +object_property_get_bool(obj, acpi-pci-hotplug-with-bridge-support, + NULL); } static void acpi_get_misc_info(AcpiMiscInfo *info) @@ -780,11 +785,13 @@ static void acpi_set_pci_info(void) } static void build_pci_bus_state_init(AcpiBuildPciBusHotplugState *state, - AcpiBuildPciBusHotplugState *parent) + AcpiBuildPciBusHotplugState *parent, + bool pcihp_bridge_en) { state-parent = parent; state-device_table = build_alloc_array(); state-notify_table = build_alloc_array(); +state-pcihp_bridge_en = pcihp_bridge_en; } static void build_pci_bus_state_cleanup(AcpiBuildPciBusHotplugState *state) @@ -798,7 +805,7 @@ static void *build_pci_bus_begin(PCIBus *bus, void *parent_state) AcpiBuildPciBusHotplugState *parent = parent_state; AcpiBuildPciBusHotplugState *child = g_malloc(sizeof *child); -build_pci_bus_state_init(child, parent); +build_pci_bus_state_init(child, parent, parent-pcihp_bridge_en); return child; } @@ -819,6 +826,14 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) GArray *method; bool bus_hotplug_support = false; +/* +skip bridge subtree creation if bridge hotplug is disabled +to make it compatible with 1.7 machine type +*/ +if (!child-pcihp_bridge_en bus-parent_dev) { +return; +} + if (bus-parent_dev) { op = 0x82; /* DeviceOp */ build_append_nameseg(bus_table, S%.02X_, @@ -865,7 +880,8 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) pc = PCI_DEVICE_GET_CLASS(pdev); dc = DEVICE_GET_CLASS(pdev); -if (pc-class_id == PCI_CLASS_BRIDGE_ISA || pc-is_bridge) { +if (pc-class_id == PCI_CLASS_BRIDGE_ISA || +(pc-is_bridge child-pcihp_bridge_en)) { set_bit(slot, slot_device_system); } @@ -877,7 +893,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) } } -if (!dc-hotpluggable || pc-is_bridge) { +if (!dc-hotpluggable || (pc-is_bridge child-pcihp_bridge_en)) { clear_bit(slot, slot_hotplug_enable); } } @@ -1142,7 +1158,7 @@ build_ssdt(GArray *table_data, GArray *linker, bus = PCI_HOST_BRIDGE(pci_host)-bus; } -build_pci_bus_state_init(hotplug_state, NULL); +build_pci_bus_state_init(hotplug_state, NULL, pm-pcihp_bridge_en); if (bus) { /* Scan all PCI buses. Generate tables to support hotplug. */ -- 1.8.3.1
[Qemu-devel] [PATCH 2/5] pc: hack for migration compatibility from QEMU 2.0
Changing the ACPI table size causes migration to break, and the memory hotplug work opened our eyes on how horribly we were breaking things in 2.0 already. The ACPI table size is rounded to the next 4k, which one would think gives some headroom. In practice this is not the case, because the user can control the ACPI table size (each CPU adds 97 bytes to the SSDT and 8 to the MADT) and so some -smp values will break the 4k boundary and fail to migrate. Similarly, PCI bridges add ~1870 bytes to the SSDT. This patch concerns itself with fixing migration from QEMU 2.0. It computes the payload size of QEMU 2.0 and always uses that one. The previous patch shrunk the ACPI tables enough that the QEMU 2.0 size should always be enough; non-AML tables can change depending on the configuration (especially MADT, SRAT, HPET) but they remain the same between QEMU 2.0 and 2.1, so we only compute our padding based on the sizes of the SSDT and DSDT. Migration from QEMU 1.7 should work for guests that have a number of CPUs other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140. It was already broken from QEMU 1.7 to QEMU 2.0 in the same way, though. Even with this patch, QEMU 1.7 and 2.0 have two different ideas of -M pc-i440fx-2.0 when there are PCI bridges. Igor sent a patch to adopt the QEMU 1.7 definition. I think distributions should apply it if they move directly from QEMU 1.7 to 2.1+ without ever packaging version 2.0. Reviewed-by: Laszlo Ersek ler...@redhat.com Tested-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-build.c | 60 hw/i386/pc_piix.c| 19 + hw/i386/pc_q35.c | 5 + include/hw/i386/pc.h | 1 + 4 files changed, 81 insertions(+), 4 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index ebc5f03..a3d5822 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -25,7 +25,9 @@ #include glib.h #include qemu-common.h #include qemu/bitmap.h +#include qemu/osdep.h #include qemu/range.h +#include qemu/error-report.h #include hw/pci/pci.h #include qom/cpu.h #include hw/i386/pc.h @@ -52,6 +54,14 @@ #include qapi/qmp/qint.h #include qom/qom-qobject.h +/* These are used to size the ACPI tables for -M pc-i440fx-1.7 and + * -M pc-i440fx-2.0. Even if the actual amount of AML generated grows + * a little bit, there should be plenty of free space since the DSDT + * shrunk by ~1.5k between QEMU 2.0 and QEMU 2.1. + */ +#define ACPI_BUILD_LEGACY_CPU_AML_SIZE97 +#define ACPI_BUILD_ALIGN_SIZE 0x1000 + typedef struct AcpiCpuInfo { DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT); } AcpiCpuInfo; @@ -1440,13 +1450,14 @@ static void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) { GArray *table_offsets; -unsigned facs, dsdt, rsdt; +unsigned facs, ssdt, dsdt, rsdt; AcpiCpuInfo cpu; AcpiPmInfo pm; AcpiMiscInfo misc; AcpiMcfgInfo mcfg; PcPciInfo pci; uint8_t *u; +size_t aml_len = 0; acpi_get_cpu_info(cpu); acpi_get_pm_info(pm); @@ -1474,13 +1485,20 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) dsdt = tables-table_data-len; build_dsdt(tables-table_data, tables-linker, misc); +/* Count the size of the DSDT and SSDT, we will need it for legacy + * sizing of ACPI tables. + */ +aml_len += tables-table_data-len - dsdt; + /* ACPI tables pointed to by RSDT */ acpi_add_table(table_offsets, tables-table_data); build_fadt(tables-table_data, tables-linker, pm, facs, dsdt); +ssdt = tables-table_data-len; acpi_add_table(table_offsets, tables-table_data); build_ssdt(tables-table_data, tables-linker, cpu, pm, misc, pci, guest_info); +aml_len += tables-table_data-len - ssdt; acpi_add_table(table_offsets, tables-table_data); build_madt(tables-table_data, tables-linker, cpu, guest_info); @@ -1513,14 +1531,45 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) /* RSDP is in FSEG memory, so allocate it separately */ build_rsdp(tables-rsdp, tables-linker, rsdt); -/* We'll expose it all to Guest so align size to reduce +/* We'll expose it all to Guest so we want to reduce * chance of size changes. * RSDP is small so it's easy to keep it immutable, no need to * bother with alignment. + * + * We used to align the tables to 4k, but of course this would + * too simple to be enough. 4k turned out to be too small an + * alignment very soon, and in fact it is almost impossible to + * keep the table size stable for all (max_cpus, max_memory_slots) + * combinations. So the table size is always 64k for pc-i440fx-2.1 + * and we give an error if the table grows beyond that limit. + * + * We still have the problem of migrating from -M pc-i440fx-2.0. For + * that, we exploit
[Qemu-devel] [PATCH 1/3] loader: add support for resizeable blobs
Support resizeable blobs: we allocate more memory than currently available in the blob, which can later be filled in. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- include/hw/loader.h | 14 +++-- include/hw/nvram/fw_cfg.h | 2 +- hw/core/loader.c | 15 + hw/nvram/fw_cfg.c | 79 --- 4 files changed, 96 insertions(+), 14 deletions(-) diff --git a/include/hw/loader.h b/include/hw/loader.h index 796cbf9..ecce654 100644 --- a/include/hw/loader.h +++ b/include/hw/loader.h @@ -55,9 +55,19 @@ extern bool rom_file_has_mr; int rom_add_file(const char *file, const char *fw_dir, hwaddr addr, int32_t bootindex, bool option_rom); -void *rom_add_blob(const char *name, const void *blob, size_t len, +void *rom_add_blob_resizeable(const char *name, const void *blob, + size_t len, size_t max_len, + hwaddr addr, const char *fw_file_name, + FWCfgReadCallback fw_callback, + void *callback_opaque); +static inline void *rom_add_blob(const char *name, const void *blob, size_t len, hwaddr addr, const char *fw_file_name, - FWCfgReadCallback fw_callback, void *callback_opaque); + FWCfgReadCallback fw_callback, void *callback_opaque) +{ +return rom_add_blob_resizeable(name, blob, len, len, addr, + fw_file_name, fw_callback, callback_opaque); +} + int rom_add_elf_program(const char *name, void *data, size_t datasize, size_t romsize, hwaddr addr); int rom_load_all(void); diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h index 72b1549..da4f5c5 100644 --- a/include/hw/nvram/fw_cfg.h +++ b/include/hw/nvram/fw_cfg.h @@ -75,7 +75,7 @@ void fw_cfg_add_file(FWCfgState *s, const char *filename, void *data, size_t len); void fw_cfg_add_file_callback(FWCfgState *s, const char *filename, FWCfgReadCallback callback, void *callback_opaque, - void *data, size_t len); + void *data, size_t len, size_t max_len); FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t data_port, hwaddr crl_addr, hwaddr data_addr); diff --git a/hw/core/loader.c b/hw/core/loader.c index 2bf6b8f..ad6ec67 100644 --- a/hw/core/loader.c +++ b/hw/core/loader.c @@ -720,9 +720,11 @@ err: return -1; } -void *rom_add_blob(const char *name, const void *blob, size_t len, - hwaddr addr, const char *fw_file_name, - FWCfgReadCallback fw_callback, void *callback_opaque) +void *rom_add_blob_resizeable(const char *name, const void *blob, + size_t len, size_t max_len, + hwaddr addr, const char *fw_file_name, + FWCfgReadCallback fw_callback, + void *callback_opaque) { Rom *rom; void *data = NULL; @@ -730,9 +732,10 @@ void *rom_add_blob(const char *name, const void *blob, size_t len, rom = g_malloc0(sizeof(*rom)); rom-name = g_strdup(name); rom-addr = addr; -rom-romsize = len; -rom-datasize = len; +rom-romsize = max_len; +rom-datasize = max_len; rom-data = g_malloc0(rom-datasize); +assert(len = rom-datasize); memcpy(rom-data, blob, len); rom_insert(rom); if (fw_file_name fw_cfg) { @@ -748,7 +751,7 @@ void *rom_add_blob(const char *name, const void *blob, size_t len, fw_cfg_add_file_callback(fw_cfg, fw_file_name, fw_callback, callback_opaque, - data, rom-romsize); + data, rom-romsize, rom-datasize); } return data; } diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c index b71d251..65f233e 100644 --- a/hw/nvram/fw_cfg.c +++ b/hw/nvram/fw_cfg.c @@ -38,6 +38,8 @@ #define FW_CFG(obj) OBJECT_CHECK(FWCfgState, (obj), TYPE_FW_CFG) typedef struct FWCfgEntry { +uint32_t max_len; +uint32_t reset_len; uint32_t len; uint8_t *data; void *callback_opaque; @@ -57,6 +59,9 @@ struct FWCfgState { uint16_t cur_entry; uint32_t cur_offset; Notifier machine_ready; +/* Entry length: used for migration */ +#define FW_CFG_LEN_ENTRIES (2 * FW_CFG_MAX_ENTRY) +uint32_t len[FW_CFG_LEN_ENTRIES]; }; #define JPG_FILE 0 @@ -336,6 +341,13 @@ static const MemoryRegionOps fw_cfg_comb_mem_ops = { static void fw_cfg_reset(DeviceState *d) { FWCfgState *s = FW_CFG(d); +int i, j; + +for (i = 0; i ARRAY_SIZE(s-entries); ++i) { +for (j = 0; j ARRAY_SIZE(s-entries[0]); ++j) { +s-entries[i][j].len = s-entries[i][j].reset_len; +} +} fw_cfg_select(s, 0);
[Qemu-devel] [PATCH 2/3] migration: load smaller RAMBlock to a bigger one if permitted
From: Igor Mammedov imamm...@redhat.com Add API to mark memory region as extend-able on migration, to allow migration code to load smaller RAMBlock into a bigger one on destination QEMU instance. This will allow to fix broken migration from QEMU 1.7/2.0 to QEMU 2.1 due to ACPI tables size changes across 1.7/2.0/2.1 versions by marking ACPI tables ROM blob as extend-able. So that smaller tables from previous version could be always migrated to a bigger rom blob on new version. Credits-for-idea: Michael S. Tsirkin m...@redhat.com Signed-off-by: Igor Mammedov imamm...@redhat.com Reviewed-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- include/exec/memory.h | 11 +++ include/exec/ram_addr.h | 3 +++ arch_init.c | 22 +- exec.c | 8 memory.c| 5 + 5 files changed, 44 insertions(+), 5 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index e2c8e3e..f96ddbb 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -894,6 +894,17 @@ bool memory_region_present(MemoryRegion *container, hwaddr addr); bool memory_region_is_mapped(MemoryRegion *mr); /** + * memory_region_permit_extendable_migration: marks #MemoryRegion + * as extendable on migration, allowing the migration code to load + * source memory block of smaller size than destination memory block + * at migration time + * + * @mr: a #MemoryRegion whose #RAMBlock should be marked as + * extendable on migration + */ +void memory_region_permit_extendable_migration(MemoryRegion *mr); + +/** * memory_region_find: translate an address/size relative to a * MemoryRegion into a #MemoryRegionSection. * diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index 6593be1..7a6b782 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -34,6 +34,9 @@ void *qemu_get_ram_ptr(ram_addr_t addr); void qemu_ram_free(ram_addr_t addr); void qemu_ram_free_from_ptr(ram_addr_t addr); +#define RAM_EXTENDABLE_ON_MIGRATE (1U 31) +void qemu_ram_set_extendable_on_migration(ram_addr_t addr); + static inline bool cpu_physical_memory_get_dirty(ram_addr_t start, ram_addr_t length, unsigned client) diff --git a/arch_init.c b/arch_init.c index 8ddaf35..2c0c238 100644 --- a/arch_init.c +++ b/arch_init.c @@ -1071,11 +1071,23 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) QTAILQ_FOREACH(block, ram_list.blocks, next) { if (!strncmp(id, block-idstr, sizeof(id))) { -if (block-length != length) { -error_report(Length mismatch: %s: RAM_ADDR_FMT - in != RAM_ADDR_FMT, id, length, - block-length); -ret = -EINVAL; +if (block-flags RAM_EXTENDABLE_ON_MIGRATE) { +if (block-length length) { +error_report(Length too big: %s: RAM_ADDR_FMT + in RAM_ADDR_FMT, id, length, + block-length); +ret = -EINVAL; +} else { +memset(block-host, 0, block-length); +} +} else { +if (block-length != length) { +error_report(Length mismatch: %s: + RAM_ADDR_FMT in != + RAM_ADDR_FMT, + id, length, block-length); +ret = -EINVAL; +} } break; } diff --git a/exec.c b/exec.c index 765bd94..02536f8e 100644 --- a/exec.c +++ b/exec.c @@ -1214,6 +1214,14 @@ void qemu_ram_unset_idstr(ram_addr_t addr) } } +void qemu_ram_set_extendable_on_migration(ram_addr_t addr) +{ +RAMBlock *block = find_ram_block(addr); + +assert(block != NULL); +block-flags |= RAM_EXTENDABLE_ON_MIGRATE; +} + static int memory_try_enable_merging(void *addr, size_t len) { if (!qemu_opt_get_bool(qemu_get_machine_opts(), mem-merge, true)) { diff --git a/memory.c b/memory.c index 64d7176..744c746 100644 --- a/memory.c +++ b/memory.c @@ -1791,6 +1791,11 @@ bool memory_region_is_mapped(MemoryRegion *mr) return mr-container ? true : false; } +void memory_region_permit_extendable_migration(MemoryRegion *mr) +{ +qemu_ram_set_extendable_on_migration(mr-ram_addr); +} + MemoryRegionSection memory_region_find(MemoryRegion *mr, hwaddr addr,
[Qemu-devel] [PATCH 3/3] loader: mark MR for resizeable blobs as extendable
This makes migration from older QEMU versions more robust. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/core/loader.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/hw/core/loader.c b/hw/core/loader.c index ad6ec67..fc00a87 100644 --- a/hw/core/loader.c +++ b/hw/core/loader.c @@ -745,6 +745,13 @@ void *rom_add_blob_resizeable(const char *name, const void *blob, if (rom_file_has_mr) { data = rom_set_mr(rom, OBJECT(fw_cfg), devpath); +/* If there's padding at tail, blob is resizeable. + * Set flag to allow migration from older QEMU versions + * where this region could have been smaller. + */ +if (max_len != len) { +memory_region_permit_extendable_migration(rom-mr); +} } else { data = rom-data; } -- MST
Re: [Qemu-devel] [PATCH 3/5] pc: future-proof migration-compatibility of ACPI tables
On Mon, Jul 28, 2014 at 05:34:16PM +0200, Paolo Bonzini wrote: This patch avoids that similar changes break QEMU again in the future. QEMU will now hard-code 64k as the maximum ACPI table size, which (despite being an order of magnitude smaller than 640k) should be enough for everyone. Famous last words :) So what worries me here, is that we are potentially breaking legal configurations for the benefit of the minority that cares about cross-version migration. So I'm inclined to apply everything except this patch, and instead, use the patches that I sent to make the ram block very large, something like 1 Megabyte. This localizes the pain to cross-version migration. Reviewed-by: Laszlo Ersek ler...@redhat.com Tested-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-build.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index a3d5822..25cf297 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -62,6 +62,8 @@ #define ACPI_BUILD_LEGACY_CPU_AML_SIZE97 #define ACPI_BUILD_ALIGN_SIZE 0x1000 +#define ACPI_BUILD_TABLE_SIZE 0x1 + typedef struct AcpiCpuInfo { DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT); } AcpiCpuInfo; @@ -1569,7 +1571,13 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) } g_array_set_size(tables-table_data, legacy_table_size); } else { -acpi_align_size(tables-table_data, ACPI_BUILD_ALIGN_SIZE); +if (tables-table_data-len ACPI_BUILD_TABLE_SIZE) { +/* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots. */ +error_report(ACPI tables are larger than 64k. Please remove); +error_report(CPUs, NUMA nodes, memory slots or PCI bridges.); +exit(1); +} +g_array_set_size(tables-table_data, ACPI_BUILD_TABLE_SIZE); } acpi_align_size(tables-linker, ACPI_BUILD_ALIGN_SIZE); -- 1.8.3.1
Re: [Qemu-devel] nic is lost, in the virtual machine running
On Sat, Jul 19, 2014 at 03:31:00PM +0800, melolinux wrote: 1. /usr/bin/qemu-system-x86_64 -machine accel=kvm -name a58970a2-10aa-4973-9261-7384b2f53221 -S -machine pc-0.14,accel=kvm,usb=off -m 4096 -smp 4,sockets=1,cores=4,threads=1 -uuid f3a2217f-ae5c-461b-a922-d37a3a98edc6 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/a58970a2-10aa-4973-9261-7384b2f53221.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/dev/493bdd93-6eb4-4499-a344-99cd0abef251/95eb250a-05ab-4da1-8673-8a616b4d0a6e,if=none,id=drive-ide0-0-0,format=qcow2,cache=writeback -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive file=/dev/493bdd93-6eb4-4499-a344-99cd0abef251/27d2df8a-dc00-4f6a-8669-f5dffc7a92d8,if=none,id=drive-ide0-0-1,format=qcow2,cache=writeback -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -netdev tap,ifname=tapa589700,script=no,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=c6:b0:5c:de:54:39,bus=pci.0,addr=0x3 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5900,addr=10.9.40.213,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=9437184 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 2. Problem found :the client can't find the network device (It takes a long time since we create the vm) 3. we use qemu-monitor-commond --hmp xx info pci to check it and find that we lost the ethernet controller. There is not enough information in #2 and #3 to say what is going on. What does lspci report inside the guest? Does rmmod and modprobe of the NIC driver inside the guest bring the interface back? Did you check that QEMU's info pci reports the NIC when the guest has booted but doesn't report it when the network becomes unavailable? If yes, then please check the libvirt log for hot unplug events (/var/log/libvirt). Stefan pgpGud3TZbN1S.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH 3/5] pc: future-proof migration-compatibility of ACPI tables
Il 28/07/2014 17:59, Michael S. Tsirkin ha scritto: On Mon, Jul 28, 2014 at 05:34:16PM +0200, Paolo Bonzini wrote: This patch avoids that similar changes break QEMU again in the future. QEMU will now hard-code 64k as the maximum ACPI table size, which (despite being an order of magnitude smaller than 640k) should be enough for everyone. Famous last words :) So what worries me here, is that we are potentially breaking legal configurations for the benefit of the minority that cares about cross-version migration. So I'm inclined to apply everything except this patch, and instead, use the patches that I sent to make the ram block very large, something like 1 Megabyte. Even just 128k are enough for 160 VCPUs, 255 memory slots and 35-40 PCI bridges. And for 2.2 I'd rather move to the other model where all user-defined elements (MADT, SSDT) are in a separate file and we guarantee that *all* changes are versioned by machine type. What do you think about just changing 64k-128k? Your patch is a huge amount of code for -rc4. Paolo
Re: [Qemu-devel] [Bug 1347387] [NEW] while i was created the new virtual machine using qemu the following error was shown in fedora version 20
On Wed, Jul 23, 2014 at 04:43:27AM -, selvakumar wrote: Public bug reported: [root@localhost pkgs]# qemu-img create virtualdisk.img 100M qemu-img: symbol lookup error: qemu-img: undefined symbol: glfs_discard_async CCing Fedora qemu-kvm package maintainer. Stefan [root@localhost pkgs]# qemu-i386 create virtualdisk.img 100M Error while loading create: No such file or directory [root@localhost pkgs]# rpm -qa qemu-kvm libvirt qemu-kvm-1.6.2-6.fc20.x86_64 libvirt-1.1.3.5-2.fc20.x86_64 [root@localhost pkgs]# [root@localhost pkgs]# rpm -qa|grep *qemu* qemu-system-m68k-1.6.2-6.fc20.x86_64 qemu-kvm-1.6.2-6.fc20.x86_64 qemu-system-microblaze-1.6.2-6.fc20.x86_64 ipxe-roms-qemu-20130517-3.gitc4bce43.fc20.noarch qemu-common-1.6.2-6.fc20.x86_64 qemu-system-sh4-1.6.2-6.fc20.x86_64 qemu-system-sparc-1.6.2-6.fc20.x86_64 qemu-system-lm32-1.6.2-6.fc20.x86_64 qemu-img-1.6.2-6.fc20.x86_64 qemu-system-s390x-1.6.2-6.fc20.x86_64 qemu-system-cris-1.6.2-6.fc20.x86_64 qemu-1.6.2-6.fc20.x86_64 qemu-system-xtensa-1.6.2-6.fc20.x86_64 qemu-system-moxie-1.6.2-6.fc20.x86_64 qemu-system-ppc-1.6.2-6.fc20.x86_64 libvirt-daemon-driver-qemu-1.1.3.5-2.fc20.x86_64 qemu-system-mips-1.6.2-6.fc20.x86_64 qemu-system-alpha-1.6.2-6.fc20.x86_64 qemu-guest-agent-1.6.1-2.fc20.x86_64 qemu-user-1.6.2-6.fc20.x86_64 qemu-system-x86-1.6.2-6.fc20.x86_64 qemu-system-arm-1.6.2-6.fc20.x86_64 qemu-system-unicore32-1.6.2-6.fc20.x86_64 qemu-system-or32-1.6.2-6.fc20.x86_64 [root@localhost pkgs]# ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1347387 Title: while i was created the new virtual machine using qemu the following error was shown in fedora version 20 Status in QEMU: New Bug description: [root@localhost pkgs]# qemu-img create virtualdisk.img 100M qemu-img: symbol lookup error: qemu-img: undefined symbol: glfs_discard_async [root@localhost pkgs]# qemu-i386 create virtualdisk.img 100M Error while loading create: No such file or directory [root@localhost pkgs]# rpm -qa qemu-kvm libvirt qemu-kvm-1.6.2-6.fc20.x86_64 libvirt-1.1.3.5-2.fc20.x86_64 [root@localhost pkgs]# [root@localhost pkgs]# rpm -qa|grep *qemu* qemu-system-m68k-1.6.2-6.fc20.x86_64 qemu-kvm-1.6.2-6.fc20.x86_64 qemu-system-microblaze-1.6.2-6.fc20.x86_64 ipxe-roms-qemu-20130517-3.gitc4bce43.fc20.noarch qemu-common-1.6.2-6.fc20.x86_64 qemu-system-sh4-1.6.2-6.fc20.x86_64 qemu-system-sparc-1.6.2-6.fc20.x86_64 qemu-system-lm32-1.6.2-6.fc20.x86_64 qemu-img-1.6.2-6.fc20.x86_64 qemu-system-s390x-1.6.2-6.fc20.x86_64 qemu-system-cris-1.6.2-6.fc20.x86_64 qemu-1.6.2-6.fc20.x86_64 qemu-system-xtensa-1.6.2-6.fc20.x86_64 qemu-system-moxie-1.6.2-6.fc20.x86_64 qemu-system-ppc-1.6.2-6.fc20.x86_64 libvirt-daemon-driver-qemu-1.1.3.5-2.fc20.x86_64 qemu-system-mips-1.6.2-6.fc20.x86_64 qemu-system-alpha-1.6.2-6.fc20.x86_64 qemu-guest-agent-1.6.1-2.fc20.x86_64 qemu-user-1.6.2-6.fc20.x86_64 qemu-system-x86-1.6.2-6.fc20.x86_64 qemu-system-arm-1.6.2-6.fc20.x86_64 qemu-system-unicore32-1.6.2-6.fc20.x86_64 qemu-system-or32-1.6.2-6.fc20.x86_64 [root@localhost pkgs]# To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1347387/+subscriptions pgpgvyCSnx7qc.pgp Description: PGP signature
[Qemu-devel] Dynamic QEMU platform device instantiation in machine files: phone call on Wed July 30
Dear all, For your information, a phone call will be held this week on Wed July 30, 17h-18h CET to address the topic of dynamic instantiation of QEMU platform devices in machine files (using the -device qemu option). Related threads are: http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg01019.html The objective of the meeting is to discuss (and hopefully reach a consensus) on where we can put the code associated to platform device dt node generation and also qom binding stuff. In case you are interested in this discussion, please contact me and I will send you the phone call details. Best Regards Eric
Re: [Qemu-devel] [PATCH v3] docs/multiple-iothreads.txt: add documentation on IOThread programming
On Wed, Jul 23, 2014 at 12:55:32PM +0100, Stefan Hajnoczi wrote: This document explains how IOThreads and the main loop are related, especially how to write code that can run in an IOThread. Currently only virtio-blk-data-plane uses these techniques. The next obvious target is virtio-scsi; there has also been work on virtio-net. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- docs/multiple-iothreads.txt | 134 1 file changed, 134 insertions(+) create mode 100644 docs/multiple-iothreads.txt Applied to my block-next tree: https://github.com/stefanha/qemu/commits/block-next Stefan pgpG3fkA8n1To.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH for-2.1] qemu-options: fix another allows-to for -net l2tpv3
On Thu, Jul 24, 2014 at 08:11:26PM +0400, Michael Tokarev wrote: Signed-off-by: Michael Tokarev m...@tls.msk.ru --- qemu-options.hx |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/qemu-options.hx b/qemu-options.hx index 9e54686..1549625 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -1437,7 +1437,7 @@ DEF(net, HAS_ARG, QEMU_OPTION_net, -net l2tpv3[,vlan=n][,name=str],src=srcaddr,dst=dstaddr[,srcport=srcport][,dstport=dstport],txsession=txsession[,rxsession=rxsession][,ipv6=on/off][,udp=on/off][,cookie64=on/off][,counter][,pincounter][,txcookie=txcookie][,rxcookie=rxcookie][,offset=offset]\n connect the VLAN to an Ethernet over L2TPv3 pseudowire\n Linux kernel 3.3+ as well as most routers can talk\n -L2TPv3. This transport allows to connect a VM to a VM,\n +L2TPv3. This transport allows connecting a VM to a VM,\n Reviewed-by: Stefan Hajnoczi stefa...@redhat.com pgpkBKzxPozae.pgp Description: PGP signature
[Qemu-devel] [PATCH v2 1/4] parallels: extend parallels format header with actual data values
Parallels image format has several additional fields inside: - nb_sectors is actually 64 bit wide. Upper 32bits are not used for images with signature WithoutFreeSpace and must be explicitly zeroed according to Parallels. They will be used for images with signature WithouFreSpacExt - inuse is magic which means that the image is currently opened for read/write or was not closed correctly, the magic is 0x746f6e59 - data_off is the location of the first data block. It can be zero and in this case data starts just beyond the header aligned to 512 bytes. Though this field does not matter for read-only driver This patch adds these values to struct parallels_header and adds proper handling of nb_sectors for currently supported WithoutFreeSpace images. WithouFreSpacExt will be covered in next patches. Signed-off-by: Denis V. Lunev d...@openvz.org CC: Kevin Wolf kw...@redhat.com CC: Stefan Hajnoczi stefa...@redhat.com CC: Jeff Cody jc...@redhat.com --- block/parallels.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index 1a5bd35..e39 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -41,8 +41,10 @@ struct parallels_header { uint32_t cylinders; uint32_t tracks; uint32_t catalog_entries; -uint32_t nb_sectors; -char padding[24]; +uint64_t nb_sectors; +uint32_t inuse; +uint32_t data_off; +char padding[12]; } QEMU_PACKED; typedef struct BDRVParallelsState { @@ -90,7 +92,7 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags, goto fail; } -bs-total_sectors = le32_to_cpu(ph.nb_sectors); +bs-total_sectors = 0x le64_to_cpu(ph.nb_sectors); s-tracks = le32_to_cpu(ph.tracks); if (s-tracks == 0) { -- 1.9.1
[Qemu-devel] [PATCH v2 2/4] parallels: replace tabs with spaces in block/parallels.c
Signed-off-by: Denis V. Lunev d...@openvz.org Reviewed-by: Jeff Cody jc...@redhat.com CC: Kevin Wolf kw...@redhat.com CC: Stefan Hajnoczi stefa...@redhat.com --- block/parallels.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index e39..16d14ad 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -61,11 +61,11 @@ static int parallels_probe(const uint8_t *buf, int buf_size, const char *filenam const struct parallels_header *ph = (const void *)buf; if (buf_size HEADER_SIZE) - return 0; +return 0; if (!memcmp(ph-magic, HEADER_MAGIC, 16) - (le32_to_cpu(ph-version) == HEADER_VERSION)) - return 100; +(le32_to_cpu(ph-version) == HEADER_VERSION)) +return 100; return 0; } @@ -115,7 +115,7 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags, } for (i = 0; i s-catalog_size; i++) - le32_to_cpus(s-catalog_bitmap[i]); +le32_to_cpus(s-catalog_bitmap[i]); qemu_co_mutex_init(s-lock); return 0; @@ -135,7 +135,7 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num) /* not allocated */ if ((index s-catalog_size) || (s-catalog_bitmap[index] == 0)) - return -1; +return -1; return (uint64_t)(s-catalog_bitmap[index] + offset) * 512; } -- 1.9.1
[Qemu-devel] [PATCH v2 3/4] parallels: split check for parallels format in parallels_open
and rework error path a bit. There is no difference at the moment, but the code will be definitely shorter when additional processing will be required for WithouFreSpacExt Signed-off-by: Denis V. Lunev d...@openvz.org CC: Jeff Cody jc...@redhat.com CC: Kevin Wolf kw...@redhat.com CC: Stefan Hajnoczi stefa...@redhat.com --- block/parallels.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index 16d14ad..466705e 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -85,11 +85,11 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags, goto fail; } -if (memcmp(ph.magic, HEADER_MAGIC, 16) || -(le32_to_cpu(ph.version) != HEADER_VERSION)) { -error_setg(errp, Image not in Parallels format); -ret = -EINVAL; -goto fail; +if (le32_to_cpu(ph.version) != HEADER_VERSION) { +goto fail_format; +} +if (memcmp(ph.magic, HEADER_MAGIC, 16)) { +goto fail_format; } bs-total_sectors = 0x le64_to_cpu(ph.nb_sectors); @@ -120,6 +120,9 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags, qemu_co_mutex_init(s-lock); return 0; +fail_format: +error_setg(errp, Image not in Parallels format); +ret = -EINVAL; fail: g_free(s-catalog_bitmap); return ret; -- 1.9.1
[Qemu-devel] [PATCH v2 0/4] block/parallels: 2TB+ parallels images support
Parallels has released in the recent updates of Parallels Server 5/6 new addition to his image format. Images with signature WithouFreSpacExt have offsets in the catalog coded not as offsets in sectors (multiple of 512 bytes) but offsets coded in blocks (i.e. header-tracks * 512) In this case to code the virtual disk size for such images nb_sectors field is extended to 64 bits. The reader of older images with signature WithoutFreeSpace must manually zero most valuable bits of nb_sectors on open. Changes from v1: - fixed message in patch 1 - added braces to conform qemu coding style in patches 3 4 - added check for ph.tracks in patch 4 to avoid offset overflow as suggested by Jeff Signed-off-by: Denis V. Lunev d...@openvz.org CC: Kevin Wolf kw...@redhat.com CC: Stefan Hajnoczi stefa...@redhat.com CC: Jeff Cody jc...@redhat.com
[Qemu-devel] [PATCH v2 4/4] parallels: 2TB+ parallels images support
Parallels has released in the recent updates of Parallels Server 5/6 new addition to his image format. Images with signature WithouFreSpacExt have offsets in the catalog coded not as offsets in sectors (multiple of 512 bytes) but offsets coded in blocks (i.e. header-tracks * 512) In this case all 64 bits of header-nb_sectors are used for image size. This patch implements support of this for qemu-img and also adds specific check for an incorrect image. Images with block size greater than INT_MAX/513 are not supported. The biggest available Parallels image cluster size in the field is 1 Mb. Thus this limit will not hurt anyone. Signed-off-by: Denis V. Lunev d...@openvz.org CC: Jeff Cody jc...@redhat.com CC: Kevin Wolf kw...@redhat.com CC: Stefan Hajnoczi stefa...@redhat.com --- block/parallels.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index 466705e..4414a9d 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -30,6 +30,7 @@ /**/ #define HEADER_MAGIC WithoutFreeSpace +#define HEADER_MAGIC2 WithouFreSpacExt #define HEADER_VERSION 2 #define HEADER_SIZE 64 @@ -54,6 +55,8 @@ typedef struct BDRVParallelsState { unsigned int catalog_size; unsigned int tracks; + +unsigned int off_multiplier; } BDRVParallelsState; static int parallels_probe(const uint8_t *buf, int buf_size, const char *filename) @@ -63,7 +66,8 @@ static int parallels_probe(const uint8_t *buf, int buf_size, const char *filenam if (buf_size HEADER_SIZE) return 0; -if (!memcmp(ph-magic, HEADER_MAGIC, 16) +if ((!memcmp(ph-magic, HEADER_MAGIC, 16) || +!memcmp(ph-magic, HEADER_MAGIC2, 16)) (le32_to_cpu(ph-version) == HEADER_VERSION)) return 100; @@ -85,21 +89,31 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags, goto fail; } +bs-total_sectors = le64_to_cpu(ph.nb_sectors); + if (le32_to_cpu(ph.version) != HEADER_VERSION) { goto fail_format; } -if (memcmp(ph.magic, HEADER_MAGIC, 16)) { +if (!memcmp(ph.magic, HEADER_MAGIC, 16)) { +s-off_multiplier = 1; +bs-total_sectors = 0x bs-total_sectors; +} else if (!memcmp(ph.magic, HEADER_MAGIC2, 16)) { +s-off_multiplier = le32_to_cpu(ph.tracks); +} else { goto fail_format; } -bs-total_sectors = 0x le64_to_cpu(ph.nb_sectors); - s-tracks = le32_to_cpu(ph.tracks); if (s-tracks == 0) { error_setg(errp, Invalid image: Zero sectors per track); ret = -EINVAL; goto fail; } +if (s-tracks INT32_MAX/513) { +error_setg(errp, Invalid image: Too big cluster); +ret = -EFBIG; +goto fail; +} s-catalog_size = le32_to_cpu(ph.catalog_entries); if (s-catalog_size INT_MAX / 4) { @@ -139,7 +153,8 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num) /* not allocated */ if ((index s-catalog_size) || (s-catalog_bitmap[index] == 0)) return -1; -return (uint64_t)(s-catalog_bitmap[index] + offset) * 512; +return +((uint64_t)s-catalog_bitmap[index] * s-off_multiplier + offset) * 512; } static int parallels_read(BlockDriverState *bs, int64_t sector_num, -- 1.9.1
Re: [Qemu-devel] [Bug 1347387] [NEW] while i was created the new virtual machine using qemu the following error was shown in fedora version 20
On 07/28/2014 12:09 PM, Stefan Hajnoczi wrote: On Wed, Jul 23, 2014 at 04:43:27AM -, selvakumar wrote: Public bug reported: [root@localhost pkgs]# qemu-img create virtualdisk.img 100M qemu-img: symbol lookup error: qemu-img: undefined symbol: glfs_discard_async You need to update glusterfs-libs. gluster doesn't have API symbol versioning, so RPM doesn't automatically pick this dependency up like it does for well behaved libraries: https://bugzilla.redhat.com/show_bug.cgi?id=1096654 - Cole
[Qemu-devel] [Bug 1347387] Re: while i was created the new virtual machine using qemu the following error was shown in fedora version 20
** Changed in: qemu Status: New = Invalid ** Bug watch added: Red Hat Bugzilla #1096654 https://bugzilla.redhat.com/show_bug.cgi?id=1096654 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1347387 Title: while i was created the new virtual machine using qemu the following error was shown in fedora version 20 Status in QEMU: Invalid Bug description: [root@localhost pkgs]# qemu-img create virtualdisk.img 100M qemu-img: symbol lookup error: qemu-img: undefined symbol: glfs_discard_async [root@localhost pkgs]# qemu-i386 create virtualdisk.img 100M Error while loading create: No such file or directory [root@localhost pkgs]# rpm -qa qemu-kvm libvirt qemu-kvm-1.6.2-6.fc20.x86_64 libvirt-1.1.3.5-2.fc20.x86_64 [root@localhost pkgs]# [root@localhost pkgs]# rpm -qa|grep *qemu* qemu-system-m68k-1.6.2-6.fc20.x86_64 qemu-kvm-1.6.2-6.fc20.x86_64 qemu-system-microblaze-1.6.2-6.fc20.x86_64 ipxe-roms-qemu-20130517-3.gitc4bce43.fc20.noarch qemu-common-1.6.2-6.fc20.x86_64 qemu-system-sh4-1.6.2-6.fc20.x86_64 qemu-system-sparc-1.6.2-6.fc20.x86_64 qemu-system-lm32-1.6.2-6.fc20.x86_64 qemu-img-1.6.2-6.fc20.x86_64 qemu-system-s390x-1.6.2-6.fc20.x86_64 qemu-system-cris-1.6.2-6.fc20.x86_64 qemu-1.6.2-6.fc20.x86_64 qemu-system-xtensa-1.6.2-6.fc20.x86_64 qemu-system-moxie-1.6.2-6.fc20.x86_64 qemu-system-ppc-1.6.2-6.fc20.x86_64 libvirt-daemon-driver-qemu-1.1.3.5-2.fc20.x86_64 qemu-system-mips-1.6.2-6.fc20.x86_64 qemu-system-alpha-1.6.2-6.fc20.x86_64 qemu-guest-agent-1.6.1-2.fc20.x86_64 qemu-user-1.6.2-6.fc20.x86_64 qemu-system-x86-1.6.2-6.fc20.x86_64 qemu-system-arm-1.6.2-6.fc20.x86_64 qemu-system-unicore32-1.6.2-6.fc20.x86_64 qemu-system-or32-1.6.2-6.fc20.x86_64 [root@localhost pkgs]# To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1347387/+subscriptions