[Qemu-devel] [PATCH v16 1/2] sPAPR: Implement EEH RTAS calls
The emulation for EEH RTAS requests from guest isn't covered by QEMU yet
and the patch implements them.

The patch defines constants used by EEH RTAS calls and adds callback
sPAPRPHBClass::eeh_handler, which is going to be used this way:

* RTAS calls are received in spapr_pci.c, sanity check is done there.
* RTAS handlers handle what they can. If there is something it cannot
  handle and sPAPRPHBClass::eeh_handler callback is defined, it is called.
* sPAPRPHBClass::eeh_handler is only implemented for VFIO now. It does
  ioctl() to the IOMMU container fd to complete the call. Error codes from
  that ioctl() are transferred back to the guest.

[aik: defined RTAS tokens for EEH RTAS calls]
Signed-off-by: Gavin Shan <gws...@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci.c          | 310
 include/hw/pci-host/spapr.h |   7 +
 include/hw/ppc/spapr.h      |  43 +-
 3 files changed, 358 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 6deeb19..3fac5a9 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -406,6 +406,297 @@ static void rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu,
     rtas_st(rets, 2, 1);/* 0 == level; 1 == edge */
 }
 
+static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu,
+                                    sPAPREnvironment *spapr,
+                                    uint32_t token, uint32_t nargs,
+                                    target_ulong args, uint32_t nret,
+                                    target_ulong rets)
+{
+    sPAPRPHBState *sphb;
+    sPAPRPHBClass *spc;
+    uint32_t addr, option;
+    uint64_t buid;
+    int ret;
+
+    if ((nargs != 4) || (nret != 1)) {
+        goto param_error_exit;
+    }
+
+    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
+    addr = rtas_ld(args, 0);
+    option = rtas_ld(args, 3);
+
+    sphb = find_phb(spapr, buid);
+    if (!sphb) {
+        goto param_error_exit;
+    }
+
+    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+    if (!spc->eeh_handler) {
+        goto param_error_exit;
+    }
+
+    switch (option) {
+    case RTAS_EEH_ENABLE:
+        if (!find_dev(spapr, buid, addr)) {
+            goto param_error_exit;
+        }
+        break;
+    case RTAS_EEH_DISABLE:
+    case RTAS_EEH_THAW_IO:
+    case RTAS_EEH_THAW_DMA:
+        break;
+    default:
+        goto param_error_exit;
+    }
+
+    ret = spc->eeh_handler(sphb, RTAS_EEH_REQ_SET_OPTION, option);
+    if (ret < 0) {
+        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
+        return;
+    }
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    return;
+
+param_error_exit:
+    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+}
+
+static void rtas_ibm_get_config_addr_info2(PowerPCCPU *cpu,
+                                           sPAPREnvironment *spapr,
+                                           uint32_t token, uint32_t nargs,
+                                           target_ulong args, uint32_t nret,
+                                           target_ulong rets)
+{
+    sPAPRPHBState *sphb;
+    sPAPRPHBClass *spc;
+    PCIDevice *pdev;
+    uint32_t addr, option;
+    uint64_t buid;
+
+    if ((nargs != 4) || (nret != 2)) {
+        goto param_error_exit;
+    }
+
+    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
+    sphb = find_phb(spapr, buid);
+    if (!sphb) {
+        goto param_error_exit;
+    }
+
+    spc = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
+    if (!spc->eeh_handler) {
+        goto param_error_exit;
+    }
+
+    addr = rtas_ld(args, 0);
+    option = rtas_ld(args, 3);
+    if (option != RTAS_GET_PE_ADDR && option != RTAS_GET_PE_MODE) {
+        goto param_error_exit;
+    }
+
+    pdev = find_dev(spapr, buid, addr);
+    if (!pdev) {
+        goto param_error_exit;
+    }
+
+    /*
+     * For now, we always have bus level PE whose address
+     * has format 00BBSS00. The guest OS might regard
+     * PE address 0 as invalid. We avoid that simply by
+     * extending it with one.
+     */
+    if (option == RTAS_GET_PE_ADDR) {
+        rtas_st(rets, 1, (pci_bus_num(pdev->bus) << 16) + 1);
+    } else {
+        rtas_st(rets, 1, RTAS_PE_MODE_SHARED);
+    }
+
+    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+    return;
+
+param_error_exit:
+    rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+}
+
+static void rtas_ibm_read_slot_reset_state2(PowerPCCPU *cpu,
+                                            sPAPREnvironment *spapr,
+                                            uint32_t token, uint32_t nargs,
+                                            target_ulong args, uint32_t nret,
+                                            target_ulong rets)
+{
+    sPAPRPHBState *sphb;
+    sPAPRPHBClass *spc;
+    uint64_t buid;
+    int ret;
+
+    if ((nargs != 3) || (nret != 4 && nret != 5)) {
+        goto param_error_exit;
+    }
+
+    buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
+    sphb = find_phb(spapr, buid);
+    if (!sphb) {
[Qemu-devel] [PATCH v16 0/2] EEH Support for VFIO Devices
This series of patches adds EEH support for VFIO PCI devices on the sPAPR
platform. It requires corresponding host kernel support, which was merged
during the 3.17 merge window. This patchset has been rebased to Alex Graf's
QEMU repository:

    git://github.com/agraf/qemu.git (branch: ppc-next)

The implementation notes are below. Please consider for merging!

* RTAS calls are received in spapr_pci.c, sanity check is done there. RTAS
  handlers handle what they can. If there is something it cannot handle and
  sPAPRPHBClass::eeh_handler callback is defined, it is called.
* sPAPRPHBClass::eeh_handler is only implemented for VFIO now. It does
  ioctl() to the IOMMU container fd to complete the call. Error codes from
  that ioctl() are transferred back to the guest.

Changelog
=========
v12 -> v13:
    * Rebase to Alex Graf's QEMU repository (ppc-next branch).
    * Drop the patch for header file (vfio.h) changes, which was merged to
      QEMU repository by commit a9fd1654 (linux-headers: update to 3.17-rc7).
    * Retested on Emulex adapter and EEH errors are recovered successfully.
v13 -> v14:
    * Check if sPAPRPHBState instance is valid before converting it to the
      corresponding class, as pointed out by Alex Graf.
v14 -> v15:
    * Dropped unrelated patch making find_phb()/find_dev() public.
    * Check the number of RTAS parameters before accessing the RTAS
      parameter buffer, for more safety.
    * Return hardware error from RTAS calls ibm,set-eeh-option and
      ibm,set-slot-reset for some cases, according to the PAPR spec.
v15 -> v16:
    * Drop rtas_handle_eeh_request() and merge the logic into its callers so
      that more accurate return values can be returned for RTAS calls in
      the callers.
    * Always return 1 (No error log) for RTAS call ibm,slot-error-detail and
      correct wrong return values for other RTAS calls according to David
      Gibson's suggestions.
    * Make fall-through more obvious for the case of a negative return value
      from sPAPRPHBClass::eeh_handler().
    * Clear the argument buffer passed to ioctl().
    * Rename sPAPRPHBClass variable from info to spc.

Gavin Shan (2):
  sPAPR: Implement EEH RTAS calls
  sPAPR: Implement sPAPRPHBClass::eeh_handler

 hw/ppc/spapr_pci.c          | 310
 hw/ppc/spapr_pci_vfio.c     |  58 +
 hw/vfio/common.c            |   1 +
 include/hw/pci-host/spapr.h |   7 +
 include/hw/ppc/spapr.h      |  43 +-
 5 files changed, 417 insertions(+), 2 deletions(-)

-- 
1.8.3.2
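
The flow in the implementation notes (validate in the RTAS front end, then
defer to an optional backend callback and map its failure to a hardware
error) can be sketched in plain C as below. This is only an illustration
under stated assumptions, not the code from the series: struct phb,
struct phb_class, set_eeh_option(), the OUT_* and OPT_* values and the stub
backends are all hypothetical stand-ins for sPAPRPHBState, sPAPRPHBClass and
the RTAS constants, and the backend here is a stub rather than a VFIO ioctl().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative return codes; stand-ins for the RTAS_OUT_* constants. */
enum { OUT_SUCCESS = 0, OUT_HW_ERROR = -1, OUT_PARAM_ERROR = -3 };

/* Illustrative option range; stand-in for the RTAS_EEH_* options. */
enum { OPT_DISABLE = 0, OPT_ENABLE = 1, OPT_THAW_IO = 2, OPT_THAW_DMA = 3 };

struct phb;

/* Stand-in for sPAPRPHBClass: the backend hook is optional (may be NULL). */
struct phb_class {
    int (*eeh_handler)(struct phb *phb, int option);
};

/* Stand-in for sPAPRPHBState. */
struct phb {
    uint64_t buid;
    const struct phb_class *cls;
};

/* The BUID is passed as two 32-bit cells, high word first. */
uint64_t make_buid(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Front end: validate everything, then defer to the backend hook. */
int set_eeh_option(struct phb *phb, uint32_t nargs, uint32_t nret, int option)
{
    if (nargs != 4 || nret != 1) {
        return OUT_PARAM_ERROR;     /* check counts before touching args */
    }
    if (phb == NULL || phb->cls == NULL || phb->cls->eeh_handler == NULL) {
        return OUT_PARAM_ERROR;     /* no bridge, or no backend support */
    }
    if (option < OPT_DISABLE || option > OPT_THAW_DMA) {
        return OUT_PARAM_ERROR;
    }
    if (phb->cls->eeh_handler(phb, option) < 0) {
        return OUT_HW_ERROR;        /* backend failure goes back to the guest */
    }
    return OUT_SUCCESS;
}

/* Two stub backends: one that succeeds, one that fails like an ioctl(). */
int backend_ok(struct phb *phb, int option)
{
    (void)phb; (void)option;
    return 0;
}

int backend_fail(struct phb *phb, int option)
{
    (void)phb; (void)option;
    return -1;
}

const struct phb_class class_ok = { backend_ok };
const struct phb_class class_fail = { backend_fail };
const struct phb_class class_none = { NULL };

/* Walks through the three outcomes described in the notes. */
void eeh_demo(void)
{
    struct phb good = { make_buid(0x800, 0x20000000), &class_ok };
    struct phb bad = { make_buid(0x800, 0x20000001), &class_fail };
    struct phb bare = { make_buid(0x800, 0x20000002), &class_none };

    assert(set_eeh_option(&good, 4, 1, OPT_ENABLE) == OUT_SUCCESS);
    assert(set_eeh_option(&bad, 4, 1, OPT_ENABLE) == OUT_HW_ERROR);
    assert(set_eeh_option(&bare, 4, 1, OPT_ENABLE) == OUT_PARAM_ERROR);
    assert(set_eeh_option(&good, 3, 1, OPT_ENABLE) == OUT_PARAM_ERROR);
}
```

Keeping the hook optional means a host bridge type without EEH support
simply leaves it unset, and the front end can reject the call without
knowing anything about the backend.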
Re: [Qemu-devel] [PATCH 6/9] cosmetic changes preparing for the following patches
On Tue, 02/03 13:52, Paolo Bonzini wrote:
> From: Mike Day <ncm...@ncultra.org>
>
> Signed-off-by: Mike Day <ncm...@ncultra.org>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  arch_init.c            |  5 +--
>  exec.c                 | 84 +-
>  include/exec/cpu-all.h |  1 +
>  3 files changed, 57 insertions(+), 33 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 89c8fa4..b13f74b 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -688,9 +688,9 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
>              }
>          }
>      }
> +
>      last_seen_block = block;
>      last_offset = offset;
> -
>      return bytes_sent;
>  }
> @@ -1117,7 +1117,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                  ret = -EINVAL;
>                  break;
>              }
> -
>              ch = qemu_get_byte(f);
>              ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
>              break;
> @@ -1128,7 +1127,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                  ret = -EINVAL;
>                  break;
>              }
> -
>              qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>              break;
>          case RAM_SAVE_FLAG_XBZRLE:
> @@ -1138,7 +1136,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                  ret = -EINVAL;
>                  break;
>              }
> -
>              if (load_xbzrle(f, addr, host) < 0) {
>                  error_report("Failed to decompress XBZRLE page at "
>                               RAM_ADDR_FMT, addr);
> diff --git a/exec.c b/exec.c
> index 05c5b44..8239370 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1265,11 +1265,12 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
>      return NULL;
>  }
>
> +/* Called with iothread lock held. */
>  void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
>  {
> -    RAMBlock *new_block = find_ram_block(addr);
> -    RAMBlock *block;
> +    RAMBlock *new_block, *block;
>
> +    new_block = find_ram_block(addr);
>      assert(new_block);
>      assert(!new_block->idstr[0]);
>
> @@ -1282,7 +1283,6 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
>      }
>      pstrcat(new_block->idstr, sizeof(new_block->idstr), name);
>
> -    /* This assumes the iothread lock is taken here too. */
>      qemu_mutex_lock_ramlist();
>      QTAILQ_FOREACH(block, &ram_list.blocks, next) {
>          if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
> @@ -1294,10 +1294,17 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
>      qemu_mutex_unlock_ramlist();
>  }
>
> +/* Called with iothread lock held. */
>  void qemu_ram_unset_idstr(ram_addr_t addr)
>  {
> -    RAMBlock *block = find_ram_block(addr);
> +    RAMBlock *block;
>
> +    /* FIXME: arch_init.c assumes that this is not called throughout
> +     * migration. Ignore the problem since hot-unplug during migration
> +     * does not work anyway.
> +     */
> +
> +    block = find_ram_block(addr);
>      if (block) {
>          memset(block->idstr, 0, sizeof(block->idstr));
>      }
> @@ -1585,7 +1592,6 @@ void qemu_ram_free(ram_addr_t addr)
>          }
>      }
>      qemu_mutex_unlock_ramlist();
> -
>  }
>
>  #ifndef _WIN32
> @@ -1633,7 +1639,6 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
>                  memory_try_enable_merging(vaddr, length);
>                  qemu_ram_setup_dump(vaddr, length);
>              }
> -            return;

Other changes are equivalent, but not quite for this one. But I think it is
still correct, so:

Reviewed-by: Fam Zheng <f...@redhat.com>

>          }
>      }
>  }
> @@ -1641,49 +1646,60 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
>
>  int qemu_get_ram_fd(ram_addr_t addr)
>  {
> -    RAMBlock *block = qemu_get_ram_block(addr);
> +    RAMBlock *block;
> +    int fd;
>
> -    return block->fd;
> +    block = qemu_get_ram_block(addr);
> +    fd = block->fd;
> +    return fd;
>  }
>
>  void *qemu_get_ram_block_host_ptr(ram_addr_t addr)
>  {
> -    RAMBlock *block = qemu_get_ram_block(addr);
> +    RAMBlock *block;
> +    void *ptr;
>
> -    return ramblock_ptr(block, 0);
> +    block = qemu_get_ram_block(addr);
> +    ptr = ramblock_ptr(block, 0);
> +    return ptr;
>  }
>
>  /* Return a host pointer to ram allocated with qemu_ram_alloc.
> -   With the exception of the softmmu code in this file, this should
> -   only be used for local memory (e.g. video ram) that the device owns,
> -   and knows it isn't going to access beyond the end of the block.
> -
> -   It should not be used for general purpose DMA.
> -   Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
> + * This should not be used for general purpose DMA.  Use address_space_map
> + * or address_space_rw instead. For local memory (e.g. video ram) that the
> + * device owns, use memory_region_get_ram_ptr.
>   */
>  void *qemu_get_ram_ptr(ram_addr_t addr)
>  {
> -    RAMBlock *block =
Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx
On Tue, 2015-02-03 at 16:19 -0800, David Miller wrote:
> From: David Woodhouse <dw...@infradead.org>
> Date: Mon, 02 Feb 2015 07:27:10 +
>
> > I'm guessing you don't want to push the *whole* management of the TLS
> > control connection *and* the UDP transport, and probing the latter with
> > keepalives, into the kernel? I certainly don't :)
>
> Whilst Herbert Xu and I have discussed in the past supporting automatic
> SSL handling of socket data during socket writes in the kernel, doing
> TLS stuff would be a bit of a stretch :-)

Right. For the DTLS I was thinking we'd do the handshake in userspace
and then hand the UDP socket down. At that point it's basically the same
as ESP with the bytes in a slightly different place.

So I really am looking at an option for "here's a UDP socket to send
those tun packets out on, with this encryption setup" as the sanest plan
I can come up with.

-- 
dwmw2
[Qemu-devel] [PATCH v2 09/12] acpi, piix4: Add memory hot unplug support for piix4.
From: Tang Chen <tangc...@cn.fujitsu.com>

Call memory unplug cb in piix4_device_unplug_cb().

Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
---
 hw/acpi/piix4.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 8bd9007..acd054e 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -377,8 +377,15 @@ static void piix4_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 static void piix4_device_unplug_cb(HotplugHandler *hotplug_dev,
                                    DeviceState *dev, Error **errp)
 {
-    error_setg(errp, "acpi: device unplug for not supported device"
-               " type: %s", object_get_typename(OBJECT(dev)));
+    PIIX4PMState *s = PIIX4_PM(hotplug_dev);
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        acpi_memory_unplug_cb(&s->ar, s->irq, &s->acpi_memory_hotplug,
+                              dev, errp);
+    } else {
+        error_setg(errp, "acpi: device unplug for not supported device"
+                   " type: %s", object_get_typename(OBJECT(dev)));
+    }
 }
 
 static void piix4_update_bus_hotplug(PCIBus *pci_bus, void *opaque)
-- 
1.9.3
[Qemu-devel] [PATCH v2 06/12] acpi, ich9: Add memory hot unplug request support for ich9.
From: Tang Chen <tangc...@cn.fujitsu.com>

Call memory unplug request cb in ich9_pm_device_unplug_request_cb().

Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
---
 hw/acpi/ich9.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 5352e19..b85eed4 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -400,8 +400,14 @@ void ich9_pm_device_plug_cb(ICH9LPCPMRegs *pm, DeviceState *dev, Error **errp)
 void ich9_pm_device_unplug_request_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
                                       Error **errp)
 {
-    error_setg(errp, "acpi: device unplug request for not supported device"
-               " type: %s", object_get_typename(OBJECT(dev)));
+    if (pm->acpi_memory_hotplug.is_enabled &&
+        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        acpi_memory_unplug_request_cb(&pm->acpi_regs, pm->irq,
+                                      &pm->acpi_memory_hotplug, dev, errp);
+    } else {
+        error_setg(errp, "acpi: device unplug request for not supported device"
+                   " type: %s", object_get_typename(OBJECT(dev)));
+    }
 }
 
 void ich9_pm_device_unplug_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
-- 
1.9.3
[Qemu-devel] [PATCH v2 08/12] acpi, mem-hotplug: Add unplug cb for memory device.
From: Tang Chen <tangc...@cn.fujitsu.com>

Reset all memory status, and unparent the memory device.

Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
---
 hw/acpi/memory_hotplug.c         | 34 ++
 hw/core/qdev.c                   |  2 +-
 include/hw/acpi/memory_hotplug.h |  2 ++
 include/hw/qdev-core.h           |  1 +
 4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index 3d3c1ec..3ae9629 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -1,6 +1,7 @@
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/pc-hotplug.h"
 #include "hw/mem/pc-dimm.h"
+#include "hw/i386/pc.h"
 #include "hw/boards.h"
 #include "trace.h"
 #include "qapi-event.h"
@@ -221,6 +222,39 @@ void acpi_memory_unplug_request_cb(ACPIREGS *ar, qemu_irq irq,
     acpi_send_gpe_event(ar, irq, ACPI_MEMORY_HOTPLUG_STATUS);
 }
 
+void acpi_memory_unplug_cb(ACPIREGS *ar, qemu_irq irq,
+                           MemHotplugState *mem_st,
+                           DeviceState *dev, Error **errp)
+{
+    MemStatus *mdev;
+    HotplugHandler *hotplug_dev;
+    PCMachineState *pcms;
+    PCDIMMDevice *dimm;
+    PCDIMMDeviceClass *ddc;
+    MemoryRegion *mr;
+
+    if (!mem_st->is_enabled) {
+        error_setg(errp, "memory hotplug is not supported");
+        return;
+    }
+
+    mdev = acpi_memory_slot_status(mem_st, dev, errp);
+    if (!mdev)
+        return;
+
+    mdev->is_enabled = false;
+    mdev->dimm = NULL;
+
+    hotplug_dev = qdev_get_hotplug_handler(dev);
+    pcms = PC_MACHINE(hotplug_dev);
+    dimm = PC_DIMM(dev);
+    ddc = PC_DIMM_GET_CLASS(dimm);
+    mr = ddc->get_memory_region(dimm);
+
+    memory_region_del_subregion(&pcms->hotplug_memory, mr);
+    vmstate_unregister_ram(mr, dev);
+}
+
 static const VMStateDescription vmstate_memhp_sts = {
     .name = "memory hotplug device state",
     .version_id = 1,
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 2eacac0..2f3d1df 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -273,7 +273,7 @@ void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
     dev->alias_required_for_version = required_for_version;
 }
 
-static HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev)
+HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev)
 {
     HotplugHandler *hotplug_ctrl = NULL;
 
diff --git a/include/hw/acpi/memory_hotplug.h b/include/hw/acpi/memory_hotplug.h
index c437a85..6b8d9f7 100644
--- a/include/hw/acpi/memory_hotplug.h
+++ b/include/hw/acpi/memory_hotplug.h
@@ -32,6 +32,8 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
 void acpi_memory_unplug_request_cb(ACPIREGS *ar, qemu_irq irq,
                                    MemHotplugState *mem_st,
                                    DeviceState *dev, Error **errp);
+void acpi_memory_unplug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
+                           DeviceState *dev, Error **errp);
 
 extern const VMStateDescription vmstate_memory_hotplug;
 #define VMSTATE_MEMORY_HOTPLUG(memhp, state) \
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 15a226f..03d6239 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -266,6 +266,7 @@ int qdev_init(DeviceState *dev) QEMU_WARN_UNUSED_RESULT;
 void qdev_init_nofail(DeviceState *dev);
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
                                  int required_for_version);
+HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev);
 void qdev_unplug(DeviceState *dev, Error **errp);
 void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
                                   DeviceState *dev, Error **errp);
-- 
1.9.3
[Qemu-devel] [PATCH v2 04/12] acpi, mem-hotplug: Add unplug request cb for memory device.
From: Tang Chen <tangc...@cn.fujitsu.com>

Memory hot unplug is an asynchronous procedure. When the unplug operation
happens, the unplug request cb is called first. When the guest OS has
finished handling the unplug, the unplug cb is called to do the real
removal of the device.

This patch adds the unplug request cb for the memory device. Add a new bool
member named is_removing to MemStatus, indicating that the memory slot is
being removed. Set it to true in acpi_memory_unplug_request_cb(), and send
an SCI to the guest.

Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
---
 hw/acpi/memory_hotplug.c         | 16
 include/hw/acpi/memory_hotplug.h |  4
 2 files changed, 20 insertions(+)

diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index f30d8f9..3d3c1ec 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -205,6 +205,22 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
     acpi_send_gpe_event(ar, irq, ACPI_MEMORY_HOTPLUG_STATUS);
 }
 
+void acpi_memory_unplug_request_cb(ACPIREGS *ar, qemu_irq irq,
+                                   MemHotplugState *mem_st,
+                                   DeviceState *dev, Error **errp)
+{
+    MemStatus *mdev;
+
+    mdev = acpi_memory_slot_status(mem_st, dev, errp);
+    if (!mdev)
+        return;
+
+    mdev->is_removing = true;
+
+    /* Do ACPI magic */
+    acpi_send_gpe_event(ar, irq, ACPI_MEMORY_HOTPLUG_STATUS);
+}
+
 static const VMStateDescription vmstate_memhp_sts = {
     .name = "memory hotplug device state",
     .version_id = 1,
diff --git a/include/hw/acpi/memory_hotplug.h b/include/hw/acpi/memory_hotplug.h
index 7bbf8a0..c437a85 100644
--- a/include/hw/acpi/memory_hotplug.h
+++ b/include/hw/acpi/memory_hotplug.h
@@ -11,6 +11,7 @@ typedef struct MemStatus {
     DeviceState *dimm;
     bool is_enabled;
     bool is_inserting;
+    bool is_removing;
     uint32_t ost_event;
     uint32_t ost_status;
 } MemStatus;
@@ -28,6 +29,9 @@ void acpi_memory_hotplug_init(MemoryRegion *as, Object *owner,
 
 void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
                          DeviceState *dev, Error **errp);
+void acpi_memory_unplug_request_cb(ACPIREGS *ar, qemu_irq irq,
+                                   MemHotplugState *mem_st,
+                                   DeviceState *dev, Error **errp);
 
 extern const VMStateDescription vmstate_memory_hotplug;
 #define VMSTATE_MEMORY_HOTPLUG(memhp, state) \
-- 
1.9.3
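
The two-step procedure described in the commit message (a request that only
flags the slot and notifies the guest, followed later by the real removal)
can be sketched as a small state machine. The sketch below is hypothetical
and much simpler than MemStatus and the ACPI callbacks: struct slot_status,
unplug_request(), unplug_complete() and notify_guest() are made-up names,
notify_guest() merely counts calls instead of raising an SCI, and the rule
that completion is refused without a prior request is an assumption of this
sketch, not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-slot state (loosely modelled on MemStatus). */
struct slot_status {
    void *dimm;        /* device occupying the slot, NULL when empty */
    bool is_enabled;   /* slot currently holds usable memory */
    bool is_removing;  /* an unplug request is pending in the guest */
};

/* Counts notifications; a stand-in for sending an event to the guest. */
int notifications;

void notify_guest(void)
{
    notifications++;
}

/* Phase 1: mark the slot and ask the guest to release the memory. */
int unplug_request(struct slot_status *s)
{
    if (s == NULL || !s->is_enabled) {
        return -1;                 /* nothing to unplug */
    }
    s->is_removing = true;
    notify_guest();
    return 0;
}

/* Phase 2: the guest is done; only now is the device actually dropped. */
int unplug_complete(struct slot_status *s)
{
    if (s == NULL || !s->is_removing) {
        return -1;                 /* assumption: needs a prior request */
    }
    s->is_removing = false;
    s->is_enabled = false;
    s->dimm = NULL;
    return 0;
}

/* Walks one slot through request and completion. */
void unplug_demo(void)
{
    int fake_dimm = 0;
    struct slot_status slot = { &fake_dimm, true, false };
    int before = notifications;

    assert(unplug_complete(&slot) == -1);      /* too early */
    assert(unplug_request(&slot) == 0);
    assert(slot.is_removing && slot.is_enabled);
    assert(notifications == before + 1);
    assert(unplug_complete(&slot) == 0);
    assert(!slot.is_enabled && slot.dimm == NULL);
}
```

The point of splitting the work this way is that the slot stays enabled
between the two calls, so the guest keeps access to the memory until it has
acknowledged the removal.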
Re: [Qemu-devel] [PATCH v4] sheepdog: selectable object size support
(2015/02/02 15:52), Liu Yuan wrote:
> On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
>> Previously, qemu block driver of sheepdog used hard-coded VDI object size.
>> This patch enables users to handle block_size_shift value for
>> calculating VDI object size.
>>
>> When you start qemu, you don't need to specify additional command option.
>>
>> But when you create the VDI which doesn't have default object size
>> with qemu-img command, you specify block_size_shift option.
>>
>> If you want to create a VDI of 8MB (1 << 23) object size,
>> you need to specify following command option.
>>
>>  # qemu-img create -o block_size_shift=23 sheepdog:test1 100M
>
> Is it possible to make this option more user friendly? such as
>
>  $ qemu-img create -o object_size=8M sheepdog:test 1G

At first, I thought that object_size was more user friendly.
But Sheepdog already stores block_size_shift in the inode layout,
and it carries the same meaning as object_size.

An 'object_size' value doesn't always map to a 'block_size_shift'.
On the other hand, a 'block_size_shift' always maps to an 'object_size'.

I think that the existing layout shouldn't be changed lightly, and
it would be difficult for users to specify an object_size value
that maps to a 'block_size_shift'.

Thanks,
Teruaki Ishizaki
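
The asymmetry discussed above can be made concrete with a little
arithmetic: every block_size_shift yields exactly one object size
(1 << shift), but an arbitrary object_size only maps back to a shift when
it is a power of two. The helpers below are hypothetical and are not part
of the sheepdog driver or of qemu-img; they only illustrate that
relationship.

```c
#include <assert.h>
#include <stdint.h>

/* Object size in bytes for a given shift; always well defined. */
uint64_t object_size_from_shift(unsigned shift)
{
    assert(shift < 64);
    return UINT64_C(1) << shift;
}

/* Shift for a given object size, or -1 if the size is not a power of two. */
int shift_from_object_size(uint64_t size)
{
    int shift = 0;

    if (size == 0 || (size & (size - 1)) != 0) {
        return -1;                 /* e.g. 6 MiB has no matching shift */
    }
    while (size > 1) {
        size >>= 1;
        shift++;
    }
    return shift;
}
```

With these, a shift of 23 gives 8388608 bytes (8 MiB) and 8 MiB maps back
to 23, while a size such as 6 MiB is rejected. That is the case a
user-facing object_size option would have to round or refuse.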
Re: [Qemu-devel] [PATCH 3/9] exec: RCUify AddressSpaceDispatch
On Tue, 02/03 13:52, Paolo Bonzini wrote:
> Note that even after this patch, most callers of address_space_*
> functions must still be under the big QEMU lock, otherwise the memory
> region returned by address_space_translate can disappear as soon as
> address_space_translate returns. This will be fixed in the next part
> of this series.
>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  cpu-exec.c              | 25 -
>  cpus.c                  |  2 +-
>  cputlb.c                |  8 ++--
>  exec.c                  | 34 ++
>  hw/i386/intel_iommu.c   |  3 +++
>  hw/pci-host/apb.c       |  1 +
>  hw/ppc/spapr_iommu.c    |  1 +
>  include/exec/exec-all.h |  1 +
>  8 files changed, 63 insertions(+), 12 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 98f968d..adb939a 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -26,6 +26,7 @@
>  #include "qemu/timer.h"
>  #include "exec/address-spaces.h"
>  #include "exec/memory-internal.h"
> +#include "qemu/rcu.h"
>
>  /* -icount align implementation. */
>
> @@ -146,8 +147,27 @@ void cpu_resume_from_signal(CPUState *cpu, void *puc)
>
>  void cpu_reload_memory_map(CPUState *cpu)
>  {
> +    AddressSpaceDispatch *d;
> +
> +    if (qemu_in_vcpu_thread()) {
> +        /* Do not let the guest prolong the critical section as much as it
> +         * as it desires.
> +         *
> +         * Currently, this is prevented by the I/O thread's periodinc kicking
> +         * of the VCPU thread (iothread_requesting_mutex, qemu_cpu_kick_thread)
> +         * but this will go away once TCG's execution moves out of the global
> +         * mutex.
> +         *
> +         * This pair matches cpu_exec's rcu_read_lock()/rcu_read_unlock(), which
> +         * only protects cpu->as->dispatch. Since we reload it below, we can
> +         * split the critical section.
> +         */
> +        rcu_read_unlock();
> +        rcu_read_lock();
> +    }
> +
>      /* The CPU and TLB are protected by the iothread lock. */
> -    AddressSpaceDispatch *d = cpu->as->dispatch;
> +    d = atomic_rcu_read(&cpu->as->dispatch);
>      cpu->memory_dispatch = d;
>      tlb_flush(cpu, 1);
>  }
> @@ -362,6 +382,8 @@ int cpu_exec(CPUArchState *env)
>       * an instruction scheduling constraint on modern architectures. */
>      smp_mb();
>
> +    rcu_read_lock();
> +
>      if (unlikely(exit_request)) {
>          cpu->exit_request = 1;
>      }
> @@ -564,6 +586,7 @@ int cpu_exec(CPUArchState *env)
>      } /* for(;;) */
>
>      cc->cpu_exec_exit(cpu);
> +    rcu_read_unlock();
>
>      /* fail safe : never use current_cpu outside cpu_exec() */
>      current_cpu = NULL;
> diff --git a/cpus.c b/cpus.c
> index 0cdd1d7..b826fac 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1104,7 +1104,7 @@ bool qemu_cpu_is_self(CPUState *cpu)
>      return qemu_thread_is_self(cpu->thread);
>  }
>
> -static bool qemu_in_vcpu_thread(void)
> +bool qemu_in_vcpu_thread(void)
>  {
>      return current_cpu && qemu_cpu_is_self(current_cpu);
>  }
> diff --git a/cputlb.c b/cputlb.c
> index f92db5e..38f2151 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -243,8 +243,12 @@ static void tlb_add_large_page(CPUArchState *env, target_ulong vaddr,
>  }
>
>  /* Add a new TLB entry. At most one entry for a given virtual address
> -   is permitted. Only a single TARGET_PAGE_SIZE region is mapped, the
> -   supplied size is only used by tlb_flush_page. */
> + * is permitted. Only a single TARGET_PAGE_SIZE region is mapped, the
> + * supplied size is only used by tlb_flush_page.
> + *
> + * Called from TCG-generated code, which is under an RCU read-side
> + * critical section.
> + */
>  void tlb_set_page(CPUState *cpu, target_ulong vaddr,
>                    hwaddr paddr, int prot,
>                    int mmu_idx, target_ulong size)
> diff --git a/exec.c b/exec.c
> index 1854c95..a423def 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -115,6 +115,8 @@ struct PhysPageEntry {
>  typedef PhysPageEntry Node[P_L2_SIZE];
>
>  typedef struct PhysPageMap {
> +    struct rcu_head rcu;
> +
>      unsigned sections_nb;
>      unsigned sections_nb_alloc;
>      unsigned nodes_nb;
> @@ -124,6 +126,8 @@ typedef struct PhysPageMap {
>  } PhysPageMap;
>
>  struct AddressSpaceDispatch {
> +    struct rcu_head rcu;
> +
>      /* This is a multi-level map on the physical address space.
>       * The bottom level has pointers to MemoryRegionSections.
>       */
> @@ -315,6 +319,7 @@ bool memory_region_is_unassigned(MemoryRegion *mr)
>          && mr != &io_mem_watch;
>  }
>
> +/* Called from RCU critical section */
>  static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch *d,
>                                                          hwaddr addr,
>                                                          bool resolve_subpage)
> @@ -330,6 +335,7 @@ static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch *d,
>      return section;
>  }
>
> +/* Called from RCU critical section */
>  static MemoryRegionSection *
>  address_space_translate_internal(AddressSpaceDispatch *d, hwaddr
[Qemu-devel] [PATCH v2 05/12] acpi, piix4: Add memory hot unplug request support for piix4.
From: Hu Tao <hu...@cn.fujitsu.com>

Call memory unplug request cb in piix4_device_unplug_request_cb().

Signed-off-by: Hu Tao <hu...@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangc...@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
---
 hw/acpi/piix4.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 14d40a0..8bd9007 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -361,7 +361,11 @@ static void piix4_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 {
     PIIX4PMState *s = PIIX4_PM(hotplug_dev);
 
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+    if (s->acpi_memory_hotplug.is_enabled &&
+        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        acpi_memory_unplug_request_cb(&s->ar, s->irq, &s->acpi_memory_hotplug,
+                                      dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
         acpi_pcihp_device_unplug_cb(&s->ar, s->irq, &s->acpi_pci_hotplug, dev,
                                     errp);
     } else {
-- 
1.9.3
[Qemu-devel] [PATCH v2 07/12] pc-dimm: Add memory hot unplug request support for pc-dimm.
From: Tang Chen <tangc...@cn.fujitsu.com>

Implement memory unplug request cb for pc-dimm, and call it in
pc_machine_device_unplug_request_cb().

Signed-off-by: Zhu Guihua <zhugh.f...@cn.fujitsu.com>
---
 hw/i386/pc.c | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 850b6b5..ddc0190 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1641,6 +1641,26 @@ out:
     error_propagate(errp, local_err);
 }
 
+static void pc_dimm_unplug_request(HotplugHandler *hotplug_dev,
+                                   DeviceState *dev, Error **errp)
+{
+    HotplugHandlerClass *hhc;
+    Error *local_err = NULL;
+    PCMachineState *pcms = PC_MACHINE(hotplug_dev);
+
+    if (!pcms->acpi_dev) {
+        error_setg(&local_err,
+                   "memory hotplug is not enabled: missing acpi device");
+        goto out;
+    }
+
+    hhc = HOTPLUG_HANDLER_GET_CLASS(pcms->acpi_dev);
+    hhc->unplug_request(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
+
+out:
+    error_propagate(errp, local_err);
+}
+
 static void pc_cpu_plug(HotplugHandler *hotplug_dev,
                         DeviceState *dev, Error **errp)
 {
@@ -1683,8 +1703,12 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
 static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
                                                 DeviceState *dev, Error **errp)
 {
-    error_setg(errp, "acpi: device unplug request for not supported device"
-               " type: %s", object_get_typename(OBJECT(dev)));
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        pc_dimm_unplug_request(hotplug_dev, dev, errp);
+    } else {
+        error_setg(errp, "acpi: device unplug request for not supported device"
+                   " type: %s", object_get_typename(OBJECT(dev)));
+    }
 }
 
 static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
-- 
1.9.3
Re: [Qemu-devel] [PATCH 4/9] rcu: introduce RCU-enabled QLIST
On Tue, 02/03 13:52, Paolo Bonzini wrote:
> From: Mike Day <ncm...@ncultra.org>
>
> Add RCU-enabled variants on the existing bsd DQ facility. Each operation
> has the same interface as the existing (non-RCU) version. Also, each
> operation is implemented as macro.
>
> Using the RCU-enabled QLIST, existing QLIST users will be able to
> convert to RCU without using a different list interface.
>
> Signed-off-by: Mike Day <ncm...@ncultra.org>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  hw/9pfs/virtio-9p-synth.c |   2 +-
>  include/qemu/queue.h      |  11 --
>  include/qemu/rcu_queue.h  | 134
>  tests/Makefile            |   5 +-
>  tests/test-rcu-list.c     | 306 ++
>  5 files changed, 445 insertions(+), 13 deletions(-)
>  create mode 100644 include/qemu/rcu_queue.h
>  create mode 100644 tests/test-rcu-list.c
>
> diff --git a/hw/9pfs/virtio-9p-synth.c b/hw/9pfs/virtio-9p-synth.c
> index e75aa87..a0ab9a8 100644
> --- a/hw/9pfs/virtio-9p-synth.c
> +++ b/hw/9pfs/virtio-9p-synth.c
> @@ -18,7 +18,7 @@
>  #include "fsdev/qemu-fsdev.h"
>  #include "virtio-9p-synth.h"
>  #include "qemu/rcu.h"
> -
> +#include "qemu/rcu_queue.h"
>  #include <sys/stat.h>
>
>  /* Root node for synth file system */
> diff --git a/include/qemu/queue.h b/include/qemu/queue.h
> index c602797..8094150 100644
> --- a/include/qemu/queue.h
> +++ b/include/qemu/queue.h
> @@ -139,17 +139,6 @@ struct {                                                \
>          (elm)->field.le_prev = &(head)->lh_first;                       \
>  } while (/*CONSTCOND*/0)
>
> -#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {                    \
> -        (elm)->field.le_prev = &(head)->lh_first;                       \
> -        (elm)->field.le_next = (head)->lh_first;                        \
> -        smp_wmb(); /* fill elm before linking it */                     \
> -        if ((head)->lh_first != NULL) {                                 \
> -            (head)->lh_first->field.le_prev = &(elm)->field.le_next;    \
> -        }                                                               \
> -        (head)->lh_first = (elm);                                       \
> -        smp_wmb();                                                      \
> -} while (/* CONSTCOND*/0)
> -
>  #define QLIST_REMOVE(elm, field) do {                                   \
>          if ((elm)->field.le_next != NULL)                               \
>                  (elm)->field.le_next->field.le_prev =                   \
> diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h
> new file mode 100644
> index 000..3aca7a5
> --- /dev/null
> +++ b/include/qemu/rcu_queue.h
> @@ -0,0 +1,134 @@
> +#ifndef QEMU_RCU_QUEUE_H
> +#define QEMU_RCU_QUEUE_H
> +
> +/*
> + * rcu_queue.h
> + *
> + * RCU-friendly versions of the queue.h primitives.
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + *
> + * Copyright (c) 2013 Mike D. Day, IBM Corporation.
> + *
> + * IBM's contributions to this file may be relicensed under LGPLv2 or later.
> + */
> +
> +#include "qemu/queue.h"
> +#include "qemu/atomic.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +
> +/*
> + * List access methods.
> + */
> +#define QLIST_EMPTY_RCU(head) (atomic_rcu_read(&(head)->lh_first) == NULL)
> +#define QLIST_FIRST_RCU(head) (atomic_rcu_read(&(head)->lh_first))
> +#define QLIST_NEXT_RCU(elm, field) (atomic_rcu_read(&(elm)->field.le_next))
> +
> +/*
> + * List functions.
> + */
> +
> +
> +/*
> + * The difference between atomic_read/set and atomic_rcu_read/set
> + * is in the including of a read/write memory barrier to the volatile
> + * access. atomic_rcu_* macros include the memory barrier, the
> + * plain atomic macros do not. Therefore, it should be correct to
> + * issue a series of reads or writes to the same element using only
> + * the atomic_* macro, until the last read or write, which should be
> + * atomic_rcu_* to introduce a read or write memory barrier as
> + * appropriate.
> + */
> +
> +/* Upon publication of the listelm->next value, list readers
> + * will see the new node when following next pointers from
> + * antecedent nodes, but may not see the new node when following
> + * prev pointers from subsequent nodes until after the RCU grace
> + * period expires.
> + * see linux/include/rculist.h
Re: [Qemu-devel] [PATCH v2 0/1] dataplane vs. endianness
On Tue, Jan 27, 2015 at 04:15:23PM +1100, David Gibson wrote:
> On Mon, Jan 26, 2015 at 05:26:41PM +0100, Cornelia Huck wrote:
> > Stefan: Here's v2 of my endianness patch for dataplane, with the
> > extraneous vdev argument dropped from get_desc().
> >
> > I originally planned to send my virtio-1 patchset as well, but I
> > haven't found the time for it; therefore, I think this should be
> > applied independently.
> >
> > David: I take it your r-b still holds?
>
> Yes. I also retested this version and it still works fine.
>
> Tested-by: David Gibson da...@gibson.dropbear.id.au

Any word on getting this merged?

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
[Qemu-devel] [PATCH v2 10/12] acpi, ich9: Add memory hot unplug support for ich9.
From: Tang Chen tangc...@cn.fujitsu.com

Call memory unplug cb in ich9_pm_device_unplug_cb().

Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/ich9.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index b85eed4..3a8d712 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -413,8 +413,14 @@ void ich9_pm_device_unplug_request_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
 void ich9_pm_device_unplug_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
                               Error **errp)
 {
-    error_setg(errp, "acpi: device unplug for not supported device"
-               " type: %s", object_get_typename(OBJECT(dev)));
+    if (pm->acpi_memory_hotplug.is_enabled &&
+        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        acpi_memory_unplug_cb(&pm->acpi_regs, pm->irq,
+                              &pm->acpi_memory_hotplug, dev, errp);
+    } else {
+        error_setg(errp, "acpi: device unplug for not supported device"
+                   " type: %s", object_get_typename(OBJECT(dev)));
+    }
 }
 
 void ich9_pm_ospm_status(AcpiDeviceIf *adev, ACPIOSTInfoList ***list)
-- 
1.9.3
[Qemu-devel] [PATCH v2 12/12] acpi: Add hardware implementation for memory hot unplug.
From: Tang Chen tangc...@cn.fujitsu.com

This patch adds a new bit to the memory hotplug IO port indicating that
ej0 has been evaluated by the guest OS, and calls the pc-dimm unplug cb
to do the real removal.

Signed-off-by: Hu Tao hu...@cn.fujitsu.com
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 docs/specs/acpi_mem_hotplug.txt   |  9 +++--
 hw/acpi/memory_hotplug.c          | 25 ++---
 hw/i386/acpi-dsdt-mem-hotplug.dsl | 11 ++-
 hw/i386/ssdt-mem.dsl              |  5 +
 include/hw/acpi/pc-hotplug.h      |  2 ++
 5 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/docs/specs/acpi_mem_hotplug.txt b/docs/specs/acpi_mem_hotplug.txt
index 1290994..9805f1a 100644
--- a/docs/specs/acpi_mem_hotplug.txt
+++ b/docs/specs/acpi_mem_hotplug.txt
@@ -19,7 +19,9 @@ Memory hot-plug interface (IO port 0xa00-0xa17, 1-4 byte access):
               1: Device insert event, used to distinguish device for which
                  no device check event to OSPM was issued.
                  It's valid only when bit 1 is set.
-              2-7: reserved and should be ignored by OSPM
+              2: Device remove event, used to distinguish device for which
+                 no device check event to OSPM was issued.
+              3-7: reserved and should be ignored by OSPM
       [0x15-0x17] reserved
 
   write access:
@@ -35,7 +37,10 @@ Memory hot-plug interface (IO port 0xa00-0xa17, 1-4 byte access):
               1: if set to 1 clears device insert event, set by OSPM
                  after it has emitted device check event for the
                  selected memory device
-              2-7: reserved, OSPM must clear them before writing to register
+              2: if set to 1 clears device remove event, set by OSPM
+                 after it has emitted device check event for the
+                 selected memory device
+              3-7: reserved, OSPM must clear them before writing to register
 
 Selecting memory device slot beyond present range has no effect on platform:
    - write accesses to memory hot-plug registers not documented above are
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index 3ae9629..a6fc3b3 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -3,6 +3,7 @@
 #include "hw/mem/pc-dimm.h"
 #include "hw/i386/pc.h"
 #include "hw/boards.h"
+#include "hw/qdev-core.h"
 #include "trace.h"
 #include "qapi-event.h"
 
@@ -76,6 +77,7 @@ static uint64_t acpi_memory_hotplug_read(void *opaque, hwaddr addr,
     case 0x14: /* pack and return is_* fields */
         val |= mdev->is_enabled   ? 1 : 0;
         val |= mdev->is_inserting ? 2 : 0;
+        val |= mdev->is_removing  ? 4 : 0;
         trace_mhp_acpi_read_flags(mem_st->selector, val);
         break;
     default:
@@ -91,6 +93,8 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
     MemHotplugState *mem_st = opaque;
     MemStatus *mdev;
     ACPIOSTInfo *info;
+    DeviceState *dev = NULL;
+    HotplugHandler *hotplug_ctrl = NULL;
 
     if (!mem_st->dev_count) {
         return;
@@ -122,21 +126,36 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
         mdev = &mem_st->devs[mem_st->selector];
         mdev->ost_status = data;
         trace_mhp_acpi_write_ost_status(mem_st->selector, mdev->ost_status);
-        /* TODO: implement memory removal on guest signal */
 
         info = acpi_memory_device_status(mem_st->selector, mdev);
         qapi_event_send_acpi_device_ost(info, &error_abort);
         qapi_free_ACPIOSTInfo(info);
         break;
-    case 0x14:
+    case 0x14: /* set is_* fields */
         mdev = &mem_st->devs[mem_st->selector];
+
         if (data & 2) { /* clear insert event */
             mdev->is_inserting = false;
             trace_mhp_acpi_clear_insert_evt(mem_st->selector);
+        } else if (data & 4) { /* request removal of device */
+            mdev->is_removing = false;
+            trace_mhp_acpi_clear_remove_evt(mem_st->selector);
+            /*
+             * QEmu memory hot unplug is an asynchronized procedure. QEmu first
+             * calls pc-dimm unplug request cb to send a SCI to guest. When the
+             * Guest OS finished handling the SCI, it evaluates ACPI ej0, and
+             * QEmu calls pc-dimm unplug cb to remove memory device.
+             */
+            dev = DEVICE(mdev->dimm);
+            hotplug_ctrl = qdev_get_hotplug_handler(dev);
+            /* Call pc-dimm unplug cb. */
+            hotplug_handler_unplug(hotplug_ctrl, dev, NULL);
         }
+
+        break;
+    default:
         break;
     }
-
 }
 static const MemoryRegionOps acpi_memory_hotplug_ops = {
     .read = acpi_memory_hotplug_read,
diff --git a/hw/i386/acpi-dsdt-mem-hotplug.dsl b/hw/i386/acpi-dsdt-mem-hotplug.dsl
index 2a36c47..b53bf77 100644
--- a/hw/i386/acpi-dsdt-mem-hotplug.dsl
+++ b/hw/i386/acpi-dsdt-mem-hotplug.dsl
@@ -50,6 +50,7 @@
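The flags register at offset 0x14 described in this patch can be modelled in isolation. The following is a rough sketch, assuming a hypothetical `struct slot_status` whose field names merely follow the patch; it is not the QEMU implementation and leaves out tracing and the actual unplug call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-slot state; names follow the patch. */
struct slot_status {
    bool is_enabled;
    bool is_inserting;
    bool is_removing;
};

enum {
    FLAG_ENABLED   = 1u << 0,  /* bit 0: device enabled */
    FLAG_INSERTING = 1u << 1,  /* bit 1: device insert event */
    FLAG_REMOVING  = 1u << 2,  /* bit 2: device remove event (new) */
};

/* Read side: pack the is_* fields into one byte. */
static uint8_t flags_read(const struct slot_status *s)
{
    uint8_t val = 0;

    val |= s->is_enabled   ? FLAG_ENABLED   : 0;
    val |= s->is_inserting ? FLAG_INSERTING : 0;
    val |= s->is_removing  ? FLAG_REMOVING  : 0;
    return val;
}

/*
 * Write side: writing a 1 to an event bit clears that event.
 * Returns true when a remove event was acknowledged, which is the point
 * where the patch goes on to call the unplug handler.
 */
static bool flags_write(struct slot_status *s, uint8_t data)
{
    if (data & FLAG_INSERTING) {
        s->is_inserting = false;
    } else if (data & FLAG_REMOVING) {
        s->is_removing = false;
        return true;
    }
    return false;
}
```

Note that, as in the patch, the insert bit takes precedence: a write with both bits set only clears the insert event.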
[Qemu-devel] [PATCH v2 02/12] acpi, mem-hotplug: Add acpi_memory_slot_status() to get MemStatus.
From: Tang Chen tangc...@cn.fujitsu.com

Add a new API named acpi_memory_slot_status() to obtain the status of a
single memory slot. This is done because the procedure will be used by
other functions in the upcoming patches.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/memory_hotplug.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index c6580da..ddbe01b 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -163,29 +163,40 @@ void acpi_memory_hotplug_init(MemoryRegion *as, Object *owner,
     memory_region_add_subregion(as, ACPI_MEMORY_HOTPLUG_BASE, &state->io);
 }
 
-void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
-                         DeviceState *dev, Error **errp)
+static MemStatus *
+acpi_memory_slot_status(MemHotplugState *mem_st,
+                        DeviceState *dev, Error **errp)
 {
-    MemStatus *mdev;
     Error *local_err = NULL;
     int slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
                                        &local_err);
 
     if (local_err) {
         error_propagate(errp, local_err);
-        return;
+        return NULL;
     }
 
     if (slot >= mem_st->dev_count) {
         char *dev_path = object_get_canonical_path(OBJECT(dev));
-        error_setg(errp, "acpi_memory_plug_cb: "
+        error_setg(errp, "acpi_memory_get_slot_status_descriptor: "
                    "device [%s] returned invalid memory slot[%d]",
-                   dev_path, slot);
+                    dev_path, slot);
         g_free(dev_path);
-        return;
+        return NULL;
     }
 
-    mdev = &mem_st->devs[slot];
+    return &mem_st->devs[slot];
+}
+
+void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
+                         DeviceState *dev, Error **errp)
+{
+    MemStatus *mdev;
+
+    mdev = acpi_memory_slot_status(mem_st, dev, errp);
+    if (!mdev)
+        return;
+
     mdev->dimm = dev;
     mdev->is_enabled = true;
     mdev->is_inserting = true;
-- 
1.9.3
[Qemu-devel] [PATCH v2 00/12] QEmu memory hot unplug support
Memory hot unplug is an asynchronous procedure. When the unplug operation
happens, the unplug request cb is called first. When the guest OS has
finished handling the unplug, the unplug cb is called to do the real
removal of the device.

This series depends on the following patchset.
[PATCH v2 0/5] Common unplug and unplug request cb for memory and CPU hot-unplug
https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg03929.html

v2:
- add a generic acpi function to send a gpe event
- unparent object by PC_MACHINE
- update description in acpi_mem_hotplug.txt
- combine the last two patches of the previous version
- clean up external state in acpi_memory_unplug_cb

Hu Tao (1):
  acpi, piix4: Add memory hot unplug request support for piix4.

Tang Chen (11):
  acpi, mem-hotplug: Use PC_DIMM_SLOT_PROP in acpi_memory_plug_cb().
  acpi, mem-hotplug: Add acpi_memory_slot_status() to get MemStatus.
  acpi, mem-hotplug: Add acpi_memory_hotplug_sci() to rise sci for memory hotplug.
  acpi, mem-hotplug: Add unplug request cb for memory device.
  acpi, ich9: Add memory hot unplug request support for ich9.
  pc-dimm: Add memory hot unplug request support for pc-dimm.
  acpi, mem-hotplug: Add unplug cb for memory device.
  acpi, piix4: Add memory hot unplug support for piix4.
  acpi, ich9: Add memory hot unplug support for ich9.
  pc-dimm: Add memory hot unplug support for pc-dimm.
  acpi: Add hardware implementation for memory hot unplug.

 docs/specs/acpi_mem_hotplug.txt   |   9 +++-
 hw/acpi/core.c                    |   7 +++
 hw/acpi/ich9.c                    |  20 +--
 hw/acpi/memory_hotplug.c          | 111 --
 hw/acpi/piix4.c                   |  17 --
 hw/core/qdev.c                    |   2 +-
 hw/i386/acpi-dsdt-mem-hotplug.dsl |  11 +++-
 hw/i386/pc.c                      |  48 +++--
 hw/i386/ssdt-mem.dsl              |   5 ++
 include/hw/acpi/acpi.h            |   3 ++
 include/hw/acpi/memory_hotplug.h  |   6 +++
 include/hw/acpi/pc-hotplug.h      |   2 +
 include/hw/qdev-core.h            |   1 +
 13 files changed, 211 insertions(+), 31 deletions(-)

-- 
1.9.3
[Qemu-devel] [PATCH v16 2/2] sPAPR: Implement sPAPRPHBClass::eeh_handler
The patch implements sPAPRPHBClass::eeh_handler so that the EEH RTAS
requests can be routed to VFIO for further handling.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 hw/ppc/spapr_pci_vfio.c | 58 +
 hw/vfio/common.c        |  1 +
 2 files changed, 59 insertions(+)

diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index 144912b..b76c660 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -71,6 +71,63 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
                                         spapr_tce_get_iommu(tcet));
 }
 
+static int spapr_phb_vfio_eeh_handler(sPAPRPHBState *sphb, int req, int opt)
+{
+    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
+    struct vfio_eeh_pe_op op;
+    int cmd;
+
+    memset(&op, 0, sizeof(op));
+    op.argsz = sizeof(op);
+    switch (req) {
+    case RTAS_EEH_REQ_SET_OPTION:
+        switch (opt) {
+        case RTAS_EEH_DISABLE:
+            cmd = VFIO_EEH_PE_DISABLE;
+            break;
+        case RTAS_EEH_ENABLE:
+            cmd = VFIO_EEH_PE_ENABLE;
+            break;
+        case RTAS_EEH_THAW_IO:
+            cmd = VFIO_EEH_PE_UNFREEZE_IO;
+            break;
+        case RTAS_EEH_THAW_DMA:
+            cmd = VFIO_EEH_PE_UNFREEZE_DMA;
+            break;
+        default:
+            return -EINVAL;
+        }
+        break;
+    case RTAS_EEH_REQ_GET_STATE:
+        cmd = VFIO_EEH_PE_GET_STATE;
+        break;
+    case RTAS_EEH_REQ_RESET:
+        switch (opt) {
+        case RTAS_SLOT_RESET_DEACTIVATE:
+            cmd = VFIO_EEH_PE_RESET_DEACTIVATE;
+            break;
+        case RTAS_SLOT_RESET_HOT:
+            cmd = VFIO_EEH_PE_RESET_HOT;
+            break;
+        case RTAS_SLOT_RESET_FUNDAMENTAL:
+            cmd = VFIO_EEH_PE_RESET_FUNDAMENTAL;
+            break;
+        default:
+            return -EINVAL;
+        }
+        break;
+    case RTAS_EEH_REQ_CONFIGURE:
+        cmd = VFIO_EEH_PE_CONFIGURE;
+        break;
+    default:
+         return -EINVAL;
+    }
+
+    op.op = cmd;
+    return vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
+                                VFIO_EEH_PE_OP, &op);
+}
+
 static void spapr_phb_vfio_reset(DeviceState *qdev)
 {
     /* Do nothing */
@@ -84,6 +141,7 @@ static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data)
     dc->props = spapr_phb_vfio_properties;
     dc->reset = spapr_phb_vfio_reset;
     spc->finish_realize = spapr_phb_vfio_finish_realize;
+    spc->eeh_handler = spapr_phb_vfio_eeh_handler;
 }
 
 static const TypeInfo spapr_phb_vfio_info = {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index cf483ff..8a10c8b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -948,6 +948,7 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
     switch (req) {
     case VFIO_CHECK_EXTENSION:
     case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
+    case VFIO_EEH_PE_OP:
         break;
     default:
         /* Return an error on unknown requests */
-- 
1.8.3.2
Re: [Qemu-devel] [PATCH 2/9] exec: make iotlb RCU-friendly
On Tue, 02/03 13:52, Paolo Bonzini wrote:
> After the previous patch, TLBs will be flushed on every change to
> the memory mapping. This patch augments that with synchronization
> of the MemoryRegionSections referred to in the iotlb array.
>
> With this change, it is guaranteed that iotlb_to_region will access
> the correct memory map, even once the TLB will be accessed outside
> the BQL.
>
> Signed-off-by: Paolo Bonzini pbonz...@redhat.com
> ---
>  cpu-exec.c              |  6 +-
>  cputlb.c                |  5 ++---
>  exec.c                  | 13 -
>  include/exec/cputlb.h   |  2 +-
>  include/exec/exec-all.h |  3 ++-
>  include/qom/cpu.h       |  1 +
>  softmmu_template.h      |  4 ++--
>  7 files changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 78fe382..98f968d 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -24,6 +24,8 @@
>  #include "qemu/atomic.h"
>  #include "sysemu/qtest.h"
>  #include "qemu/timer.h"
> +#include "exec/address-spaces.h"
> +#include "exec/memory-internal.h"
>
>  /* -icount align implementation. */
>
> @@ -144,7 +146,9 @@ void cpu_resume_from_signal(CPUState *cpu, void *puc)
>
>  void cpu_reload_memory_map(CPUState *cpu)
>  {
> -    /* The TLB is protected by the iothread lock. */
> +    /* The CPU and TLB are protected by the iothread lock. */
> +    AddressSpaceDispatch *d = cpu->as->dispatch;
> +    cpu->memory_dispatch = d;
>      tlb_flush(cpu, 1);
>  }
>  #endif
> diff --git a/cputlb.c b/cputlb.c
> index 3b271d4..f92db5e 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -265,8 +265,7 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
>      }
>
>      sz = size;
> -    section = address_space_translate_for_iotlb(cpu->as, paddr,
> -                                                &xlat, &sz);
> +    section = address_space_translate_for_iotlb(cpu, paddr, &xlat, &sz);
>      assert(sz >= TARGET_PAGE_SIZE);
>
>  #if defined(DEBUG_TLB)
> @@ -347,7 +346,7 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
>          cpu_ldub_code(env1, addr);
>      }
>      pd = env1->iotlb[mmu_idx][page_index] & ~TARGET_PAGE_MASK;
> -    mr = iotlb_to_region(cpu->as, pd);
> +    mr = iotlb_to_region(cpu, pd);
>      if (memory_region_is_unassigned(mr)) {
>          CPUClass *cc = CPU_GET_CLASS(cpu);
>
> diff --git a/exec.c b/exec.c
> index 5a75909..1854c95 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -401,11 +401,12 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
>  }
>
>  MemoryRegionSection *
> -address_space_translate_for_iotlb(AddressSpace *as, hwaddr addr, hwaddr *xlat,
> -                                  hwaddr *plen)
> +address_space_translate_for_iotlb(CPUState *cpu, hwaddr addr,
> +                                  hwaddr *xlat, hwaddr *plen)
>  {
>      MemoryRegionSection *section;
> -    section = address_space_translate_internal(as->dispatch, addr, xlat, plen, false);
> +    section = address_space_translate_internal(cpu->memory_dispatch,
> +                                               addr, xlat, plen, false);
>
>      assert(!section->mr->iommu_ops);
>      return section;
> @@ -1961,9 +1962,11 @@ static uint16_t dummy_section(PhysPageMap *map, AddressSpace *as,
>      return phys_section_add(map, &section);
>  }
>
> -MemoryRegion *iotlb_to_region(AddressSpace *as, hwaddr index)
> +MemoryRegion *iotlb_to_region(CPUState *cpu, hwaddr index)
>  {
> -    return as->dispatch->map.sections[index & ~TARGET_PAGE_MASK].mr;
> +    MemoryRegionSection *sections = cpu->memory_dispatch->map.sections;
> +
> +    return sections[index & ~TARGET_PAGE_MASK].mr;
>  }
>
>  static void io_mem_init(void)
> diff --git a/include/exec/cputlb.h b/include/exec/cputlb.h
> index b8ecd6f..e0da9d7 100644
> --- a/include/exec/cputlb.h
> +++ b/include/exec/cputlb.h
> @@ -34,7 +34,7 @@ extern int tlb_flush_count;
>  void tb_flush_jmp_cache(CPUState *cpu, target_ulong addr);
>
>  MemoryRegionSection *
> -address_space_translate_for_iotlb(AddressSpace *as, hwaddr addr, hwaddr *xlat,
> +address_space_translate_for_iotlb(CPUState *cpu, hwaddr addr, hwaddr *xlat,
>                                    hwaddr *plen);
>  hwaddr memory_region_section_get_iotlb(CPUState *cpu,
>                                         MemoryRegionSection *section,
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 1b30813..bb3fd37 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -338,7 +338,8 @@ extern uintptr_t tci_tb_ptr;
>
>  void phys_mem_set_alloc(void *(*alloc)(size_t, uint64_t *align));
>
> -struct MemoryRegion *iotlb_to_region(AddressSpace *as, hwaddr index);
> +struct MemoryRegion *iotlb_to_region(CPUState *cpu,
> +                                     hwaddr index);
>  bool io_mem_read(struct MemoryRegion *mr, hwaddr addr,
>                   uint64_t *pvalue, unsigned size);
>  bool io_mem_write(struct MemoryRegion *mr, hwaddr addr,
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 2098f1c..48fd6fb 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -256,6 +256,7 @@
Re: [Qemu-devel] [v4 12/13] migration: Add command to set migration parameter
On 02/03/2015 06:26 PM, Li, Liang Z wrote:
>> Hmm - do we really need two parameters here? Remember, compress threads
>> is used only on the source, and decompress threads is used only on the
>> destination. Having a single parameter, 'threads', which is set to
>> compression threads on source and decompression threads on destination,
>> and which need not be equal between the two machines, should still work,
>> right?
>
> Yes, it works. The benefit of using one parameter instead of two is a
> smaller QMP command count. The side effect of using the same thread
> count for compression and decompression is a little waste if the user
> just wants to use the default settings, since decompression is usually
> about 4 times faster than compression. Using more decompression threads
> than needed wastes some RAM, which holds the data structures related to
> each decompression thread, about 4K bytes of RAM per thread. Is that
> acceptable?

The default setting is no compression. The user already has to
configure things on both sides to get compression, so it is not a burden
to ask them to configure thread count on both sides correctly.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[Qemu-devel] [PATCH v2 03/12] acpi, mem-hotplug: Add acpi_memory_hotplug_sci() to rise sci for memory hotplug.
From: Tang Chen tangc...@cn.fujitsu.com

Add a new API named acpi_send_gpe_event() to send the memory hotplug SCI.
This is done because the procedure will be used by other functions in
the upcoming patches.

Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/core.c           | 7 +++
 hw/acpi/memory_hotplug.c | 6 ++
 include/hw/acpi/acpi.h   | 3 +++
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index 51913d6..98ca994 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -666,6 +666,13 @@ uint32_t acpi_gpe_ioport_readb(ACPIREGS *ar, uint32_t addr)
     return val;
 }
 
+void acpi_send_gpe_event(ACPIREGS *ar, qemu_irq irq,
+                         unsigned int hotplug_status)
+{
+    ar->gpe.sts[0] |= hotplug_status;
+    acpi_update_sci(ar, irq);
+}
+
 void acpi_update_sci(ACPIREGS *regs, qemu_irq irq)
 {
     int sci_level, pm1a_sts;
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index ddbe01b..f30d8f9 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -201,10 +201,8 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
     mdev->is_enabled = true;
     mdev->is_inserting = true;
 
-    /* do ACPI magic */
-    ar->gpe.sts[0] |= ACPI_MEMORY_HOTPLUG_STATUS;
-    acpi_update_sci(ar, irq);
-    return;
+    /* Do ACPI magic */
+    acpi_send_gpe_event(ar, irq, ACPI_MEMORY_HOTPLUG_STATUS);
 }
 
 static const VMStateDescription vmstate_memhp_sts = {
diff --git a/include/hw/acpi/acpi.h b/include/hw/acpi/acpi.h
index 1f678b4..7a0a209 100644
--- a/include/hw/acpi/acpi.h
+++ b/include/hw/acpi/acpi.h
@@ -172,6 +172,9 @@ void acpi_gpe_reset(ACPIREGS *ar);
 void acpi_gpe_ioport_writeb(ACPIREGS *ar, uint32_t addr, uint32_t val);
 uint32_t acpi_gpe_ioport_readb(ACPIREGS *ar, uint32_t addr);
 
+void acpi_send_gpe_event(ACPIREGS *ar, qemu_irq irq,
+                         unsigned int hotplug_status);
+
 void acpi_update_sci(ACPIREGS *acpi_regs, qemu_irq irq);
 
 /* acpi.c */
-- 
1.9.3
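The helper factored out in this patch follows a common pattern: latch a status bit, then recompute the interrupt level. Below is a simplified model of that pattern. Everything in it is an assumption made for illustration: `struct gpe_model`, the `model_*` functions, and the `MODEL_MEM_HOTPLUG_STATUS` value are hypothetical, and the level computation (status AND enable) is deliberately simpler than what acpi_update_sci() really does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status bit; the real constant is defined by QEMU. */
#define MODEL_MEM_HOTPLUG_STATUS (1u << 3)

/* Hypothetical, reduced model of one GPE status/enable register pair. */
struct gpe_model {
    uint8_t sts;        /* latched event status bits */
    uint8_t en;         /* guest-controlled enable bits */
    int sci_level;      /* resulting interrupt line level */
};

/* Recompute the interrupt level from status and enable bits. */
static void model_update_sci(struct gpe_model *g)
{
    g->sci_level = (g->sts & g->en) != 0;
}

/* Latch the event, then let the level follow from the new status. */
static void model_send_gpe_event(struct gpe_model *g, uint8_t status_bit)
{
    g->sts |= status_bit;
    model_update_sci(g);
}
```

Keeping the two steps in one helper means later callers, such as an unplug request path, cannot set the status bit and forget to update the interrupt line.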
[Qemu-devel] [PATCH v2 01/12] acpi, mem-hotplug: Use PC_DIMM_SLOT_PROP in acpi_memory_plug_cb().
From: Tang Chen tangc...@cn.fujitsu.com

Replace string "slot" in acpi_memory_plug_cb() with MACRO PC_DIMM_SLOT_PROP.

Reviewed-by: Igor Mammedov imamm...@redhat.com
Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/memory_hotplug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index ed39241..c6580da 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -168,7 +168,8 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
 {
     MemStatus *mdev;
     Error *local_err = NULL;
-    int slot = object_property_get_int(OBJECT(dev), "slot", &local_err);
+    int slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
+                                       &local_err);
 
     if (local_err) {
         error_propagate(errp, local_err);
-- 
1.9.3
Re: [Qemu-devel] [PATCH 7/9] rcu: prod call_rcu thread when calling synchronize_rcu
On Tue, 02/03 13:52, Paolo Bonzini wrote:
> call_rcu operates on the principle that either there is a steady stream of
> incoming RCU callbacks, or it is not worthwhile to wake up and process the
> few that are there.
>
> This however makes it hard to assert in testcases that all RCU callbacks
> are processed. To avoid this, make call_rcu also process callbacks if there
> is a steady stream of synchronize_rcu calls.
>
> This avoids deadlocks in the upcoming test-rcu-list unit test, which waits
> for call_rcu to reclaim all nodes that it allocates. Especially with very
> high load on the host, call_rcu decided to wait for a few more callbacks
> to pile up, but the test was done and was not going to produce more.
>
> Signed-off-by: Paolo Bonzini pbonz...@redhat.com
> ---
>  util/rcu.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/util/rcu.c b/util/rcu.c
> index c9c3e6e..aa9f639 100644
> --- a/util/rcu.c
> +++ b/util/rcu.c
> @@ -48,6 +48,9 @@ unsigned long rcu_gp_ctr = RCU_GP_LOCKED;
>  QemuEvent rcu_gp_event;
>  static QemuMutex rcu_gp_lock;
>
> +static int rcu_call_count;
> +static QemuEvent rcu_call_ready_event;
> +
>  /*
>   * Check whether a quiescent state was crossed between the beginning of
>   * update_counter_and_wait and now.
> @@ -149,6 +152,9 @@ void synchronize_rcu(void)
>      }
>
>      qemu_mutex_unlock(&rcu_gp_lock);
> +    if (atomic_read(&rcu_call_count)) {
> +        qemu_event_set(&rcu_call_ready_event);
> +    }
>  }
>
>
> @@ -159,8 +165,6 @@ void synchronize_rcu(void)
>   */
>  static struct rcu_head dummy;
>  static struct rcu_head *head = &dummy, **tail = &dummy.next;
> -static int rcu_call_count;
> -static QemuEvent rcu_call_ready_event;
>
>  static void enqueue(struct rcu_head *node)
>  {
> -- 
> 1.8.3.1

Reviewed-by: Fam Zheng f...@redhat.com
[Qemu-devel] [PATCH v2 11/12] pc-dimm: Add memory hot unplug support for pc-dimm.
From: Tang Chen tangc...@cn.fujitsu.com

Implement unplug cb for pc-dimm. It removes the corresponding memory
region and unregisters vmstate. Finally, it calls the memory unplug cb
to reset the memory status and do the unparenting.

Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/i386/pc.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index ddc0190..4c03ee5 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1661,6 +1661,17 @@ out:
     error_propagate(errp, local_err);
 }
 
+static void pc_dimm_unplug(HotplugHandler *hotplug_dev,
+                           DeviceState *dev, Error **errp)
+{
+    PCMachineState *pcms = PC_MACHINE(hotplug_dev);
+    HotplugHandlerClass *hhc;
+    Error *local_err = NULL;
+
+    hhc = HOTPLUG_HANDLER_GET_CLASS(pcms->acpi_dev);
+    hhc->unplug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
+}
+
 static void pc_cpu_plug(HotplugHandler *hotplug_dev,
                         DeviceState *dev, Error **errp)
 {
@@ -1714,8 +1725,13 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
                                         DeviceState *dev, Error **errp)
 {
-    error_setg(errp, "acpi: device unplug for not supported device"
-               " type: %s", object_get_typename(OBJECT(dev)));
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        pc_dimm_unplug(hotplug_dev, dev, errp);
+        object_unparent(OBJECT(dev));
+    } else {
+        error_setg(errp, "acpi: device unplug for not supported device"
+                   " type: %s", object_get_typename(OBJECT(dev)));
+    }
 }
 
 static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
-- 
1.9.3
Re: [Qemu-devel] [PATCH RFC 1/1] KVM: s390: Add MEMOP ioctl for reading/writing guest memory
On Tue, 03 Feb 2015 16:22:32 +0100 Paolo Bonzini pbonz...@redhat.com wrote:
> On 03/02/2015 16:16, Thomas Huth wrote:
> > Actually, I'd prefer to keep the "virtual" in the defines for the type
> > of operation below: When it comes to s390 storage keys, we likely might
> > need some calls for reading and writing to physical memory, too. Then
> > we could simply extend this ioctl instead of inventing a new one.
>
> Can you explain why it is necessary to read/write physical addresses
> from user space? In the case of QEMU, I'm worried that you would have
> to invent your own memory read/write APIs that are different from
> everything else.
>
> On real s390 zPCI, does bus-master DMA update storage keys?

Ah, I was not thinking about bus-mastering/DMA here: AFAIK there are
some CPU instructions that access a parameter block in physical memory,
for example the SCLP instruction (see hw/s390x/sclp.c) - it's already
doing a cpu_physical_memory_read and ..._write for the parameters.
However, I haven't checked yet whether it is also supposed to touch the
storage keys, so if not, we also might be fine without the ioctls for
reading/writing to physical memory.

> > > Not really true, as you don't check it. So "It is not used by KVM with
> > > the currently defined set of flags" is a better explanation.
> >
> > ok ... and maybe add "should be set to zero" ?
>
> If you don't check it, it is misleading to document this.

True... so I'll omit that.

 Thomas
Re: [Qemu-devel] [PATCH] hw/arm/virt: explain device-to-transport mapping in create_virtio_devices()
On 30 January 2015 at 04:34, Laszlo Ersek ler...@redhat.com wrote:
> Peter,
>
> On 01/30/15 05:31, Laszlo Ersek wrote:
>> Signed-off-by: Laszlo Ersek ler...@redhat.com
>> ---
>>  hw/arm/virt.c | 32 
>>  1 file changed, 28 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 2353440..091e5ee 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -441,10 +441,27 @@ static void create_virtio_devices(const VirtBoardInfo *vbi, qemu_irq *pic)
>>      int i;
>>      hwaddr size = vbi->memmap[VIRT_MMIO].size;
>>
>> -    /* Note that we have to create the transports in forwards order
>> -     * so that command line devices are inserted lowest address first,
>> -     * and then add dtb nodes in reverse order so that they appear in
>> -     * the finished device tree lowest address first.
>> +    /* We create the transports in forwards order. Since qbus_realize()
>> +     * prepends (not appends) new child buses, the incrementing loop below will
>> +     * create a list of virtio-mmio buses with decreasing base addresses.
>> +     *
>> +     * When a -device option is processed from the command line,
>> +     * qbus_find_recursive() picks the next free virtio-mmio bus in forwards
>> +     * order. The upshot is that -device options in increasing command line
>> +     * order are mapped to virtio-mmio buses with decreasing base addresses.
>> +     *
>> +     * When this code was originally written, that arrangement ensured that the
>> +     * guest Linux kernel would give the lowest "name" (/dev/vda, eth0, etc) to
>> +     * the first -device on the command line. (The end-to-end order is a
>> +     * function of this loop, qbus_realize(), qbus_find_recursive(), and the
>> +     * guest kernel's name-to-address assignment strategy.)
>> +     *
>> +     * Meanwhile, the kernel's traversal seems to have been reserved; see eg.
>
> can you please s/reserved/reversed/? Result of over-editing, sorry.

Sure, no problem. I also suggest I add this para:

     *
     * In any case, the kernel makes no guarantee about the stability of
     * enumeration order of virtio devices (as demonstrated by it changing
     * between kernel versions). For reliable and stable identification
     * of disks users must use UUIDs or similar mechanisms.

-- PMM
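The ordering effect described in the quoted comment (an incrementing loop plus prepending yields a list with decreasing base addresses) can be reproduced with a tiny standalone model. This is only a sketch: `struct bus`, `prepend()`, `create_buses()`, and the addresses used below are hypothetical and do not correspond to QEMU's qbus structures.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUSES 4

/* Hypothetical stand-in for a virtio-mmio bus. */
struct bus {
    unsigned long base;
    struct bus *next;
};

/* Prepend, as the quoted comment says qbus_realize() does. */
static struct bus *prepend(struct bus *head, struct bus *b)
{
    b->next = head;
    return b;
}

/*
 * Create the buses in forwards (increasing address) order. Because each
 * new bus goes to the front, walking the list from its head visits the
 * highest base address first.
 */
static struct bus *create_buses(struct bus storage[NUM_BUSES],
                                unsigned long base, unsigned long stride)
{
    struct bus *head = NULL;

    for (int i = 0; i < NUM_BUSES; i++) {
        storage[i].base = base + (unsigned long)i * stride;
        head = prepend(head, &storage[i]);
    }
    return head;
}
```

A consumer that takes the next free bus by walking from the head would therefore hand the first device the highest address, which matches the mapping the comment describes.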
Re: [Qemu-devel] [PULL 0/9] s390x bugfixes and cleanups
On 03.02.2015 at 14:45, Peter Maydell wrote:
> On 3 February 2015 at 13:08, Cornelia Huck cornelia.h...@de.ibm.com wrote:
>> The following changes since commit 16017c48547960539fcadb1f91d252124f442482:
>>
>>   softfloat: Clarify license status (2015-01-29 16:45:45 +)
>>
>> are available in the git repository at:
>>
>>   git://github.com/cohuck/qemu tags/s390x-20150203
>>
>> for you to fetch changes up to 553ce81c31e49d834b1bf635ab486695a4694333:
>>
>>   pc-bios/s390-ccw: update binary (2015-02-03 13:42:40 +0100)
>>
>> Some bugfixes and cleanups for s390x, both in the new pci code and
>> in old code.
>
> I'm a bit sad my fix-clang-warnings-in-s390 code patches
> didn't make it in to this pull, because I think they're
> the only remaining obstacle to my enabling warnings-are-errors
> in that build config...

These fixes are tcg code, so Alex or Richard should take these.

Christian
[Qemu-devel] [PATCH 1/9] exec: introduce cpu_reload_memory_map
This for now is a simple TLB flush. This can change later for two
reasons:

1) an AddressSpaceDispatch will be cached in the CPUState object

2) it will not be possible to do tlb_flush once the TCG-generated code
runs outside the BQL.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 cpu-exec.c              | 6 ++
 exec.c                  | 2 +-
 include/exec/exec-all.h | 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index fa506e6..78fe382 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -141,6 +141,12 @@ void cpu_resume_from_signal(CPUState *cpu, void *puc)
     cpu->exception_index = -1;
     siglongjmp(cpu->jmp_env, 1);
 }
+
+void cpu_reload_memory_map(CPUState *cpu)
+{
+    /* The TLB is protected by the iothread lock. */
+    tlb_flush(cpu, 1);
+}
 #endif
 
 /* Execute a TB, and fix up the CPU state afterwards if necessary */
diff --git a/exec.c b/exec.c
index 6b79ad1..5a75909 100644
--- a/exec.c
+++ b/exec.c
@@ -2026,7 +2026,7 @@ static void tcg_commit(MemoryListener *listener)
         if (cpu->tcg_as_listener != listener) {
             continue;
         }
-        tlb_flush(cpu, 1);
+        cpu_reload_memory_map(cpu);
     }
 }
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 6a15448..1b30813 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -96,6 +96,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end,
                               int is_cpu_write_access);
 #if !defined(CONFIG_USER_ONLY)
+void cpu_reload_memory_map(CPUState *cpu);
 void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace *as);
 /* cputlb.c */
 void tlb_flush_page(CPUState *cpu, target_ulong addr);
-- 
1.8.3.1
Re: [Qemu-devel] [PATCH RFC 1/1] KVM: s390: Add MEMOP ioctl for reading/writing guest memory
On 03/02/2015 13:11, Thomas Huth wrote: On s390, we've got to make sure to hold the IPTE lock while accessing virtual memory. So let's add an ioctl for reading and writing virtual memory to provide this feature for userspace, too. Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com Reviewed-by: Dominik Dingel din...@linux.vnet.ibm.com Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com --- Documentation/virtual/kvm/api.txt | 44 + arch/s390/kvm/gaccess.c | 22 + arch/s390/kvm/gaccess.h |2 + arch/s390/kvm/kvm-s390.c | 63 + include/uapi/linux/kvm.h | 21 5 files changed, 152 insertions(+), 0 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index b112efc..bf44b53 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2716,6 +2716,50 @@ The fields in each entry are defined as follows: eax, ebx, ecx, edx: the values returned by the cpuid instruction for this function/index combination +4.89 KVM_GUEST_MEM_OP + +Capability: KVM_CAP_MEM_OP Put virtual somewhere in the ioctl name and capability? +Architectures: s390 +Type: vcpu ioctl +Parameters: struct kvm_guest_mem_op (in) +Returns: = 0 on success, + 0 on generic error (e.g. -EFAULT or -ENOMEM), + 0 if an exception occurred while walking the page tables + +Read or write data from/to the virtual memory of a VPCU. 
+ +Parameters are specified via the following structure: + +struct kvm_guest_mem_op { + __u64 gaddr;/* the guest address */ + __u64 flags;/* arch specific flags */ + __u32 size; /* amount of bytes */ + __u32 op; /* type of operation */ + __u64 buf; /* buffer in userspace */ + __u8 reserved[32]; /* should be set to 0 */ +}; + +The type of operation is specified in the op field, either KVM_MEMOP_VIRTREAD +for reading from memory, KVM_MEMOP_VIRTWRITE for writing to memory, or +KVM_MEMOP_CHECKVIRTREAD or KVM_MEMOP_CHECKVIRTWRITE to check whether the Better: #define KVM_MEMOP_READ 0 #define KVM_MEMOP_WRITE 1 and in the flags field: #define KVM_MEMOP_F_CHECK_ONLY (1 1) +corresponding memory access would create an access exception (without +changing the data in the memory at the destination). In case an access +exception occurred while walking the MMU tables of the guest, the ioctl +returns a positive error number to indicate the type of exception. The +exception is raised directly at the corresponding VCPU if the bit +KVM_MEMOP_F_INJECT_EXC is set in the flags field. KVM_MEMOP_F_INJECT_EXCEPTION. +The logical (virtual) start address of the memory region has to be specified +in the gaddr field, and the length of the region in the size field. +buf is the buffer supplied by the userspace application where the read data +should be written to for KVM_MEMOP_VIRTREAD, or where the data that should +be written is stored for a KVM_MEMOP_VIRTWRITE. buf can be NULL for both +CHECK operations. buf is unused and can be NULL for both CHECK operations. +The reserved field is meant for future extensions. It must currently be +set to 0. Not really true, as you don't check it. So It is not used by KVM with the currently defined set of flags is a better explanation. Paolo + 5. 
The kvm_run structure diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 8a1be90..d912362 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -697,6 +697,28 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, } /** + * check_gva_range - test a range of guest virtual addresses for accessibility + */ +int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva, + unsigned long length, int is_write) +{ + unsigned long gpa; + unsigned long currlen; + int rc = 0; + + ipte_lock(vcpu); + while (length 0 !rc) { + currlen = min(length, PAGE_SIZE - (gva % PAGE_SIZE)); + rc = guest_translate_address(vcpu, gva, gpa, is_write); + gva += currlen; + length -= currlen; + } + ipte_unlock(vcpu); + + return rc; +} + +/** * kvm_s390_check_low_addr_protection - check for low-address protection * @ga: Guest address * diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h index 0149cf1..268beb7 100644 --- a/arch/s390/kvm/gaccess.h +++ b/arch/s390/kvm/gaccess.h @@ -157,6 +157,8 @@ int read_guest_lc(struct kvm_vcpu *vcpu, unsigned long gra, void *data, int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, unsigned long *gpa, int write); +int check_gva_range(struct kvm_vcpu *vcpu, unsigned long
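For readers following the check_gva_range() hunk above: the loop translates one address per touched page and stops at the first non-zero result. The following is only a userspace sketch of that page-by-page walk, not the kernel code. SKETCH_PAGE_SIZE, chunk_len(), check_range(), and the two stub translators are hypothetical names made up for illustration, and the translator callback merely stands in for guest_translate_address(); no IPTE locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for guest_translate_address(): 0 means success,
 * a positive value means "an access exception would occur". */
typedef int (*translate_fn)(unsigned long gva, int is_write);

/* Length of the part of [gva, gva + length) that lies in gva's page. */
static unsigned long chunk_len(unsigned long gva, unsigned long length)
{
    unsigned long to_page_end = SKETCH_PAGE_SIZE - (gva % SKETCH_PAGE_SIZE);

    return length < to_page_end ? length : to_page_end;
}

/* Walk the range page by page; stop at the first failing translation.
 * *pages_checked reports how many translations were attempted. */
static int check_range(unsigned long gva, unsigned long length, int is_write,
                       translate_fn translate, unsigned long *pages_checked)
{
    int rc = 0;

    *pages_checked = 0;
    while (length > 0 && !rc) {
        unsigned long currlen = chunk_len(gva, length);

        rc = translate(gva, is_write);
        (*pages_checked)++;
        gva += currlen;
        length -= currlen;
    }
    return rc;
}

/* Stub translator: every address is accessible. */
static int always_ok(unsigned long gva, int is_write)
{
    (void)gva;
    (void)is_write;
    return 0;
}

/* Stub translator: anything beyond the first page faults with code 4. */
static int fail_after_first_page(unsigned long gva, int is_write)
{
    (void)is_write;
    return gva >= SKETCH_PAGE_SIZE ? 4 : 0;
}
```

A range that starts 6 bytes before a page boundary and is 100 bytes long is therefore checked with two translations, one per page it touches.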
Re: [Qemu-devel] [PULL 0/2] OpenRISC patch queue for 2.3
Hi Peter,

unfortunately you are right. The correct line is this:

     /* invalidate lock */
-    env->cpu_lock_addr = -1;
+    env->lock_addr = -1;

I am sorry. It was most likely the last line which I added. But I forgot, that I disabled the system emulation already. Therefore my make process didn't complain.

Should I send an updated patch, or can you do a hot-fix?

Sebastian

On 2/3/2015 11:40 AM, Peter Maydell wrote:
> On 3 February 2015 at 02:19, Jia Liu pro...@gmail.com wrote:
>> Hi Anthony,
>>
>> This is my OpenRISC patch queue for 2.3, it have been well tested, please pull.
>
> ...it can't have been very well tested, because it doesn't compile:
>
> target-openrisc/interrupt.c: In function ‘openrisc_cpu_do_interrupt’:
> target-openrisc/interrupt.c:58:8: error: ‘CPUOpenRISCState’ has no member named ‘cpu_lock_addr’
>
> thanks
> -- PMM
Re: [Qemu-devel] [PULL 0/2] OpenRISC patch queue for 2.3
On 3 February 2015 at 13:04, Sebastian Macke sebast...@macke.de wrote:
> Hi Peter,
>
> unfortunately you are right. The correct line is this:
>
>      /* invalidate lock */
> -    env->cpu_lock_addr = -1;
> +    env->lock_addr = -1;
>
> I am sorry. It was most likely the last line which I added. But I forgot, that I disabled the system emulation already. Therefore my make process didn't complain.
>
> Should I send an updated patch, or can you do a hot-fix?

You should send an updated patch, and then Jia needs to re-test and send a new pull request. Somebody ought to be testing these instructions in system emulation mode as well as linux-user...

thanks
-- PMM
Re: [Qemu-devel] [PULL 0/9] s390x bugfixes and cleanups
On 3 February 2015 at 13:08, Cornelia Huck cornelia.h...@de.ibm.com wrote:
> The following changes since commit 16017c48547960539fcadb1f91d252124f442482:
>
>   softfloat: Clarify license status (2015-01-29 16:45:45 +)
>
> are available in the git repository at:
>
>   git://github.com/cohuck/qemu tags/s390x-20150203
>
> for you to fetch changes up to 553ce81c31e49d834b1bf635ab486695a4694333:
>
>   pc-bios/s390-ccw: update binary (2015-02-03 13:42:40 +0100)
>
> Some bugfixes and cleanups for s390x, both in the new pci code and in old code.

I'm a bit sad my fix-clang-warnings-in-s390 code patches didn't make it in to this pull, because I think they're the only remaining obstacle to my enabling warnings-are-errors in that build config...

-- PMM
Re: [Qemu-devel] [PATCH] vfio: free dynamically-allocated data in instance_finalize
On Tue, 2015-02-03 at 13:48 +0100, Paolo Bonzini wrote:
> In order to enable out-of-BQL address space lookup, destruction of devices needs to be split in two phases.
>
> Unrealize is the first phase; once it completes, no new accesses will be started, but pending memory accesses can still be completed.
>
> The second part is freeing the device, which only happens once all memory accesses are complete. At this point the reference count has dropped to zero and an RCU grace period must have completed (because the RCU-protected FlatViews hold a reference to the device via memory_region_ref). This is when instance_finalize is called.
>
> Freeing data belongs in an instance_finalize callback, because the dynamically allocated memory can still be used after unrealize by the pending memory accesses.
>
> In the case of VFIO, the unrealize callback is too early to munmap the BARs. The munmap must be delayed until memory accesses are complete. To do this, split vfio_unmap_bars in two. The removal step, now called vfio_unregister_bars, remains in vfio_exitfn. The reclamation step is vfio_unmap_bars and is moved to the instance_finalize callback.
>
> Similarly, quirk MemoryRegions have to be removed during vfio_unregister_bars, but freeing the data structure must be delayed to vfio_unmap_bars.
>
> Cc: Alex Williamson alex.william...@redhat.com
> Signed-off-by: Paolo Bonzini pbonz...@redhat.com
> ---
> This patch is part of the third installment of the RCU work. Sending it out separately for Alex to review it.
>
>  hw/vfio/pci.c | 78 +-
>  1 file changed, 68 insertions(+), 10 deletions(-)

Looks good to me. I don't see any external dependencies, so do you want me to pull this in through my branch?
Thanks, Alex diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 014a92c..69d4a33 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -1997,12 +1997,23 @@ static void vfio_vga_quirk_setup(VFIOPCIDevice *vdev) static void vfio_vga_quirk_teardown(VFIOPCIDevice *vdev) { +VFIOQuirk *quirk; +int i; + +for (i = 0; i ARRAY_SIZE(vdev-vga.region); i++) { +QLIST_FOREACH(quirk, vdev-vga.region[i].quirks, next) { +memory_region_del_subregion(vdev-vga.region[i].mem, quirk-mem); +} +} +} + +static void vfio_vga_quirk_free(VFIOPCIDevice *vdev) +{ int i; for (i = 0; i ARRAY_SIZE(vdev-vga.region); i++) { while (!QLIST_EMPTY(vdev-vga.region[i].quirks)) { VFIOQuirk *quirk = QLIST_FIRST(vdev-vga.region[i].quirks); -memory_region_del_subregion(vdev-vga.region[i].mem, quirk-mem); object_unparent(OBJECT(quirk-mem)); QLIST_REMOVE(quirk, next); g_free(quirk); @@ -2023,10 +2034,19 @@ static void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr) static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr) { VFIOBAR *bar = vdev-bars[nr]; +VFIOQuirk *quirk; + +QLIST_FOREACH(quirk, bar-quirks, next) { +memory_region_del_subregion(bar-region.mem, quirk-mem); +} +} + +static void vfio_bar_quirk_free(VFIOPCIDevice *vdev, int nr) +{ +VFIOBAR *bar = vdev-bars[nr]; while (!QLIST_EMPTY(bar-quirks)) { VFIOQuirk *quirk = QLIST_FIRST(bar-quirks); -memory_region_del_subregion(bar-region.mem, quirk-mem); object_unparent(OBJECT(quirk-mem)); QLIST_REMOVE(quirk, next); g_free(quirk); @@ -2282,7 +2302,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled) } } -static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr) +static void vfio_unregister_bar(VFIOPCIDevice *vdev, int nr) { VFIOBAR *bar = vdev-bars[nr]; @@ -2293,10 +2313,25 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr) vfio_bar_quirk_teardown(vdev, nr); memory_region_del_subregion(bar-region.mem, bar-region.mmap_mem); -munmap(bar-region.mmap, memory_region_size(bar-region.mmap_mem)); if (vdev-msix vdev-msix-table_bar == nr) { 
memory_region_del_subregion(bar-region.mem, vdev-msix-mmap_mem); +} +} + +static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr) +{ +VFIOBAR *bar = vdev-bars[nr]; + +if (!bar-region.size) { +return; +} + +vfio_bar_quirk_free(vdev, nr); + +munmap(bar-region.mmap, memory_region_size(bar-region.mmap_mem)); + +if (vdev-msix vdev-msix-table_bar == nr) { munmap(vdev-msix-mmap, memory_region_size(vdev-msix-mmap_mem)); } } @@ -2413,6 +2448,19 @@ static void vfio_unmap_bars(VFIOPCIDevice *vdev) } if (vdev-has_vga) { +vfio_vga_quirk_free(vdev); +} +} + +static void vfio_unregister_bars(VFIOPCIDevice *vdev) +{ +int i; + +for (i = 0; i PCI_ROM_SLOT; i++) { +vfio_unregister_bar(vdev, i); +} + +if
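The two-phase teardown described in the commit message (unrealize stops new accesses, instance_finalize frees data once the last reference is gone) can be illustrated with a small standalone sketch. This is not QEMU code: SketchDevice and every sketch_* function are hypothetical names, a plain reference count stands in for memory_region_ref and the RCU grace period, and nothing here is thread-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device object; not a QEMU type. */
typedef struct SketchDevice {
    int refcount;      /* owner reference plus one per in-flight access */
    bool registered;   /* may new accesses still start? */
    int *buffer;       /* dynamically allocated data used by accesses */
    bool *freed_flag;  /* set by finalize, for demonstration only */
} SketchDevice;

static SketchDevice *sketch_device_new(bool *freed_flag)
{
    SketchDevice *d = calloc(1, sizeof(*d));

    assert(d);
    d->buffer = calloc(16, sizeof(int));
    assert(d->buffer);
    d->refcount = 1;            /* the owner's reference */
    d->registered = true;
    d->freed_flag = freed_flag;
    *freed_flag = false;
    return d;
}

/* Phase 2: reclamation. Runs only when the last reference is dropped,
 * so no pending access can still be touching d->buffer. */
static void sketch_finalize(SketchDevice *d)
{
    *d->freed_flag = true;
    free(d->buffer);
    free(d);
}

static void sketch_unref(SketchDevice *d)
{
    if (--d->refcount == 0) {
        sketch_finalize(d);
    }
}

/* Phase 1: removal. New accesses are refused, nothing is freed yet. */
static void sketch_unrealize(SketchDevice *d)
{
    d->registered = false;
}

/* An access takes a reference for its whole duration. */
static bool sketch_begin_access(SketchDevice *d)
{
    if (!d->registered) {
        return false;
    }
    d->refcount++;
    return true;
}
```

In this sketch, an access that began before sketch_unrealize() can keep using the buffer afterwards; the memory is released only when that access drops its reference, which mirrors why the munmap was moved out of the unrealize path.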
Re: [Qemu-devel] [PATCH] block: introduce BDRV_REQUEST_MAX_SECTORS
On 03/02/15 17:30, Peter Lieven wrote: Am 03.02.2015 um 14:29 schrieb Denis V. Lunev: On 03/02/15 15:12, Peter Lieven wrote: we check and adjust request sizes at several places with sometimes inconsistent checks or default values: INT_MAX INT_MAX BDRV_SECTOR_BITS UINT_MAX BDRV_SECTOR_BITS SIZE_MAX BDRV_SECTOR_BITS This patches introdocues a macro for the maximal allowed sectors per request and uses it at several places. Signed-off-by: Peter Lieven p...@kamp.de --- block.c | 19 --- hw/block/virtio-blk.c | 4 ++-- include/block/block.h | 3 +++ 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/block.c b/block.c index 8272ef9..4e58b35 100644 --- a/block.c +++ b/block.c @@ -2671,7 +2671,7 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset, static int bdrv_check_request(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { -if (nb_sectors 0 || nb_sectors INT_MAX / BDRV_SECTOR_SIZE) { +if (nb_sectors 0 || nb_sectors BDRV_REQUEST_MAX_SECTORS) { return -EIO; } @@ -2758,7 +2758,7 @@ static int bdrv_rw_co(BlockDriverState *bs, int64_t sector_num, uint8_t *buf, .iov_len = nb_sectors * BDRV_SECTOR_SIZE, }; -if (nb_sectors 0 || nb_sectors INT_MAX / BDRV_SECTOR_SIZE) { +if (nb_sectors 0 || nb_sectors BDRV_REQUEST_MAX_SECTORS) { return -EINVAL; } @@ -2826,13 +2826,10 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags) } for (;;) { -nb_sectors = target_sectors - sector_num; +nb_sectors = MIN(target_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS); if (nb_sectors = 0) { return 0; } -if (nb_sectors INT_MAX / BDRV_SECTOR_SIZE) { -nb_sectors = INT_MAX / BDRV_SECTOR_SIZE; -} ret = bdrv_get_block_status(bs, sector_num, nb_sectors, n); if (ret 0) { error_report(error getting block status at sector % PRId64 : %s, @@ -3167,7 +3164,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, BdrvRequestFlags flags) { -if (nb_sectors 0 || nb_sectors (UINT_MAX 
BDRV_SECTOR_BITS)) { +if (nb_sectors 0 || nb_sectors BDRV_REQUEST_MAX_SECTORS) { return -EINVAL; } @@ -3202,8 +3199,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, struct iovec iov = {0}; int ret = 0; -int max_write_zeroes = bs-bl.max_write_zeroes ? - bs-bl.max_write_zeroes : INT_MAX; +int max_write_zeroes = MIN_NON_ZERO(bs-bl.max_write_zeroes, + BDRV_REQUEST_MAX_SECTORS); while (nb_sectors 0 !ret) { int num = nb_sectors; @@ -3458,7 +3455,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, BdrvRequestFlags flags) { -if (nb_sectors 0 || nb_sectors (INT_MAX BDRV_SECTOR_BITS)) { +if (nb_sectors 0 || nb_sectors BDRV_REQUEST_MAX_SECTORS) { return -EINVAL; } @@ -5120,7 +5117,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, return 0; } -max_discard = bs-bl.max_discard ? bs-bl.max_discard : INT_MAX; +max_discard = MIN_NON_ZERO(bs-bl.max_discard, BDRV_REQUEST_MAX_SECTORS); while (nb_sectors 0) { int ret; int num = nb_sectors; diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 8c51a29..1a8a176 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -381,7 +381,7 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb) } max_xfer_len = blk_get_max_transfer_length(mrb-reqs[0]-dev-blk); -max_xfer_len = MIN_NON_ZERO(max_xfer_len, INT_MAX); +max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS); qsort(mrb-reqs, mrb-num_reqs, sizeof(*mrb-reqs), multireq_compare); @@ -447,7 +447,7 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev, uint64_t nb_sectors = size BDRV_SECTOR_BITS; uint64_t total_sectors; -if (nb_sectors INT_MAX) { +if (nb_sectors BDRV_REQUEST_MAX_SECTORS) { return false; } if (sector dev-sector_mask) { diff --git a/include/block/block.h b/include/block/block.h index 3082d2b..25a6d62 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -83,6 +83,9 @@ typedef enum { 
#define BDRV_SECTOR_SIZE (1ULL BDRV_SECTOR_BITS) #define BDRV_SECTOR_MASK ~(BDRV_SECTOR_SIZE - 1) +#define BDRV_REQUEST_MAX_SECTORS MIN(SIZE_MAX BDRV_SECTOR_BITS, \ + INT_MAX BDRV_SECTOR_BITS) + /* * Allocation status flags * BDRV_BLOCK_DATA: data is read from bs-file or another file Reviewed-by: Denis V. Lunev
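As a side note on the new macro: the limit is the smaller of SIZE_MAX and INT_MAX, each shifted right by the sector bits, so that a request's byte count fits both in a size_t and in an int. Below is a minimal standalone sketch of that arithmetic. The SKETCH_* names and sketch_check_request() are hypothetical, the shifts are written out as right shifts (the archive dropped the operators), and 9 sector bits (512-byte sectors) is assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumption for this sketch: 512-byte sectors. */
#define SKETCH_SECTOR_BITS 9
#define SKETCH_SECTOR_SIZE (1ULL << SKETCH_SECTOR_BITS)

#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Largest sector count whose byte length fits in both size_t and int. */
#define SKETCH_REQUEST_MAX_SECTORS \
    SKETCH_MIN(SIZE_MAX >> SKETCH_SECTOR_BITS, INT_MAX >> SKETCH_SECTOR_BITS)

/* One shared range check instead of several slightly different ones.
 * Returns 0 if the sector count is acceptable, -1 otherwise. */
static int sketch_check_request(int nb_sectors)
{
    if (nb_sectors < 0 ||
        (uint64_t)nb_sectors > (uint64_t)SKETCH_REQUEST_MAX_SECTORS) {
        return -1;
    }
    return 0;
}

/* Byte length of a request that already passed the check above. */
static uint64_t sketch_request_bytes(int nb_sectors)
{
    return (uint64_t)nb_sectors * SKETCH_SECTOR_SIZE;
}
```

With a 32-bit int, the largest accepted request is 4194303 sectors, whose byte length still fits in an int.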
Re: [Qemu-devel] [PULL 0/9] s390x bugfixes and cleanups
On 3 February 2015 at 13:08, Cornelia Huck cornelia.h...@de.ibm.com wrote:
> The following changes since commit 16017c48547960539fcadb1f91d252124f442482:
>
>   softfloat: Clarify license status (2015-01-29 16:45:45 +)
>
> are available in the git repository at:
>
>   git://github.com/cohuck/qemu tags/s390x-20150203
>
> for you to fetch changes up to 553ce81c31e49d834b1bf635ab486695a4694333:
>
>   pc-bios/s390-ccw: update binary (2015-02-03 13:42:40 +0100)
>
> Some bugfixes and cleanups for s390x, both in the new pci code and in old code.

Applied, thanks.

-- PMM
Re: [Qemu-devel] [PATCH 0/4] block: Drop BDS.filename
Am 03.02.2015 um 14:48 hat Max Reitz geschrieben: On 2015-02-03 at 04:32, Kevin Wolf wrote: Am 24.09.2014 um 21:48 hat Max Reitz geschrieben: The BDS filename field is generally only used when opening disk images or emitting error or warning messages, the only exception to this rule is the map command of qemu-img. However, using exact_filename there instead should not be a problem. Therefore, we can drop the filename field from the BlockDriverState and use a function instead which builds the filename from scratch when called. This is slower than reading a static char array but the problem of that static array is that it may become obsolete due to changes in any BlockDriverState or in the BDS graph. Using a function which rebuilds the filename every time it is called resolves this problem. The disadvantage of worse performance is negligible, on the other hand. After patch 2 of this series, which replaces some queries of BDS.filename by reads from somewhere else (mostly BDS.exact_filename), the filename field is only used when a disk image is opened or some message should be emitted, both of which cases do not suffer from the performance hit. Surprisingly (or not), this one needs rebasing. Well... I tried it and it doesn't look too hard, but it's a little bit more than what I'm comfortable with doing while applying a series. I admire your courage, but I'm not sure whether this series is ready for being applied at all. First we (or I) will have to look into how users like libvirt which identify a BDS based on the filename can break from applying this series. Well, I haven't reviewed it, so I can't tell. It didn't have a (Self-)NACK and it's still on your list of to-be-merged patches, so I took a look. You're talking about courage - but I just wasn't courageous enough yet to attack your larger series... ;-) Kevin
[Qemu-devel] Looking for Outreachy sponsors for QEMU, libvirt, and KVM internships (was Outreach Program for Women)
Outreach Program for Women is renaming to Outreachy. The new website is: http://outreachy.org/

What is Outreachy?

Outreachy helps people from underrepresented groups join the open source community through a 12-week full-time paid internship. The format is similar to Google Summer of Code. Instead of funding university students the focus is on funding women (cis and trans), trans men, and genderqueer people.

Last year QEMU participated with one intern, Maria, who developed a qcow2 image format fuzzer to find input validation bugs in QEMU's qcow2 block driver. GNOME, the Linux kernel community, and other projects have also been participating successfully for years.

What is the level of sponsorship?

Sponsorship is $6,500 per intern. Sponsors can choose their mentor if desired, otherwise we have experienced mentors who can participate.

If your company wants to be active in growing the open source community, this is a great way to engage without administrating your own internship program!

Dates:
 * Funding commitment: Monday, February 16
 * Participating orgs announced: February 17
 * Application deadline for interns: March 24
 * Internship dates: May 25 to August 25

Sponsors are listed for recognition on the Outreachy website and can promote job openings.

How do QEMU, libvirt, and KVM participate?

We try to participate in both Outreachy and Google Summer of Code each year. QEMU acts as an umbrella organization for libvirt and KVM. We have experienced mentors and are able to add new mentors who are active contributors to QEMU, libvirt, or KVM.

Full info for organizations: https://wiki.gnome.org/Outreachy/Admin/InfoForOrgs

Please let me know if you have any questions.

Stefan
[Qemu-devel] [PATCH v3 8/8] tcg: Remove unused opcodes
We no longer need INDEX_op_end to terminate the list, nor do we need 5 forms of nop, since we just remove the TCGOp instead.

Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de
Signed-off-by: Richard Henderson r...@twiddle.net
---
 tcg/tcg-opc.h |  9 ---------
 tcg/tcg.c     |  7 ++-----
 tci.c         | 13 -------------
 3 files changed, 2 insertions(+), 27 deletions(-)

diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 042d442..42d0cfe 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -27,15 +27,6 @@
  */
 
 /* predefined ops */
-DEF(end, 0, 0, 0, TCG_OPF_NOT_PRESENT) /* must be kept first */
-DEF(nop, 0, 0, 0, TCG_OPF_NOT_PRESENT)
-DEF(nop1, 0, 0, 1, TCG_OPF_NOT_PRESENT)
-DEF(nop2, 0, 0, 2, TCG_OPF_NOT_PRESENT)
-DEF(nop3, 0, 0, 3, TCG_OPF_NOT_PRESENT)
-
-/* variable number of parameters */
-DEF(nopn, 0, 0, 1, TCG_OPF_NOT_PRESENT)
-
 DEF(discard, 1, 0, 0, TCG_OPF_NOT_PRESENT)
 DEF(set_label, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_NOT_PRESENT)
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 4115e8b..3841e99 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1260,7 +1260,7 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
         s->gen_first_op_idx = next;
     }
 
-    *op = (TCGOp){ .opc = INDEX_op_nop, .next = -1, .prev = -1 };
+    memset(op, -1, sizeof(*op));
 
 #ifdef CONFIG_PROFILER
     s->del_op_count++;
@@ -1385,8 +1385,6 @@ static void tcg_liveness_analysis(TCGContext *s)
             }
             break;
         case INDEX_op_debug_insn_start:
-        case INDEX_op_nop:
-        case INDEX_op_end:
             break;
         case INDEX_op_discard:
             /* mark the temporary as dead */
@@ -2244,7 +2242,7 @@ void tcg_dump_op_count(FILE *f, fprintf_function cpu_fprintf)
 {
     int i;
 
-    for(i = INDEX_op_end; i < NB_OPS; i++) {
+    for (i = 0; i < NB_OPS; i++) {
         cpu_fprintf(f, "%s %" PRId64 "\n", tcg_op_defs[i].name,
                     tcg_table_op_count[i]);
     }
@@ -2328,7 +2326,6 @@ static inline int tcg_gen_code_common(TCGContext *s,
             tcg_reg_alloc_movi(s, args, dead_args, sync_args);
             break;
         case INDEX_op_debug_insn_start:
-        case INDEX_op_nop:
             break;
         case INDEX_op_discard:
             temp_dead(s, args[0]);
diff --git a/tci.c b/tci.c
index 4711ee4..28292b3 100644
--- a/tci.c
+++ b/tci.c
@@ -506,19 +506,6 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
         tb_ptr += 2;
 
         switch (opc) {
-        case INDEX_op_end:
-        case INDEX_op_nop:
-            break;
-        case INDEX_op_nop1:
-        case INDEX_op_nop2:
-        case INDEX_op_nop3:
-        case INDEX_op_nopn:
-        case INDEX_op_discard:
-            TODO();
-            break;
-        case INDEX_op_set_label:
-            TODO();
-            break;
         case INDEX_op_call:
             t0 = tci_read_ri(&tb_ptr);
 #if TCG_TARGET_REG_BITS == 32
-- 
2.1.0
[Qemu-devel] [PATCH v3 5/8] tcg: Put opcodes in a linked list
The previous setup required ops and args to be completely sequential, and was error prone when it came to both iteration and optimization. Signed-off-by: Richard Henderson r...@twiddle.net --- include/exec/gen-icount.h | 22 ++- tcg/optimize.c| 286 ++- tcg/tcg-op.c | 190 --- tcg/tcg.c | 376 +++--- tcg/tcg.h | 58 --- 5 files changed, 431 insertions(+), 501 deletions(-) diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h index a37a61d..6e5b012 100644 --- a/include/exec/gen-icount.h +++ b/include/exec/gen-icount.h @@ -11,8 +11,8 @@ static int exitreq_label; static inline void gen_tb_start(TranslationBlock *tb) { -TCGv_i32 count; -TCGv_i32 flag; +TCGv_i32 count, flag, imm; +int i; exitreq_label = gen_new_label(); flag = tcg_temp_new_i32(); @@ -21,16 +21,25 @@ static inline void gen_tb_start(TranslationBlock *tb) tcg_gen_brcondi_i32(TCG_COND_NE, flag, 0, exitreq_label); tcg_temp_free_i32(flag); -if (!(tb-cflags CF_USE_ICOUNT)) +if (!(tb-cflags CF_USE_ICOUNT)) { return; +} icount_label = gen_new_label(); count = tcg_temp_local_new_i32(); tcg_gen_ld_i32(count, cpu_env, -ENV_OFFSET + offsetof(CPUState, icount_decr.u32)); + +imm = tcg_temp_new_i32(); +tcg_gen_movi_i32(imm, 0xdeadbeef); + /* This is a horrid hack to allow fixing up the value later. */ -icount_arg = tcg_ctx.gen_opparam_ptr + 1; -tcg_gen_subi_i32(count, count, 0xdeadbeef); +i = tcg_ctx.gen_last_op_idx; +i = tcg_ctx.gen_op_buf[i].args; +icount_arg = tcg_ctx.gen_opparam_buf[i + 1]; + +tcg_gen_sub_i32(count, count, imm); +tcg_temp_free_i32(imm); tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, icount_label); tcg_gen_st16_i32(count, cpu_env, @@ -49,7 +58,8 @@ static void gen_tb_end(TranslationBlock *tb, int num_insns) tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_ICOUNT_EXPIRED); } -*tcg_ctx.gen_opc_ptr = INDEX_op_end; +/* Terminate the linked list. 
*/ +tcg_ctx.gen_op_buf[tcg_ctx.gen_last_op_idx].next = -1; } static inline void gen_io_start(void) diff --git a/tcg/optimize.c b/tcg/optimize.c index 34ae3c2..f2b8acf 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -162,13 +162,13 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2) return false; } -static void tcg_opt_gen_mov(TCGContext *s, int op_index, TCGArg *gen_args, +static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args, TCGOpcode old_op, TCGArg dst, TCGArg src) { TCGOpcode new_op = op_to_mov(old_op); tcg_target_ulong mask; -s-gen_opc_buf[op_index] = new_op; +op-opc = new_op; reset_temp(dst); mask = temps[src].mask; @@ -193,17 +193,17 @@ static void tcg_opt_gen_mov(TCGContext *s, int op_index, TCGArg *gen_args, temps[src].next_copy = dst; } -gen_args[0] = dst; -gen_args[1] = src; +args[0] = dst; +args[1] = src; } -static void tcg_opt_gen_movi(TCGContext *s, int op_index, TCGArg *gen_args, +static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args, TCGOpcode old_op, TCGArg dst, TCGArg val) { TCGOpcode new_op = op_to_movi(old_op); tcg_target_ulong mask; -s-gen_opc_buf[op_index] = new_op; +op-opc = new_op; reset_temp(dst); temps[dst].state = TCG_TEMP_CONST; @@ -215,8 +215,8 @@ static void tcg_opt_gen_movi(TCGContext *s, int op_index, TCGArg *gen_args, } temps[dst].mask = mask; -gen_args[0] = dst; -gen_args[1] = val; +args[0] = dst; +args[1] = val; } static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y) @@ -533,11 +533,9 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2) } /* Propagate constants and copies, fold constant expressions. */ -static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr, -TCGArg *args, TCGOpDef *tcg_op_defs) +static void tcg_constant_folding(TCGContext *s) { -int nb_ops, op_index, nb_temps, nb_globals; -TCGArg *gen_args; +int oi, oi_next, nb_temps, nb_globals; /* Array VALS has an element for each temp. 
If this temp holds a constant then its value is kept in VALS' element. @@ -548,24 +546,23 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr, nb_globals = s-nb_globals; reset_all_temps(nb_temps); -nb_ops = tcg_opc_ptr - s-gen_opc_buf; -gen_args = args; -for (op_index = 0; op_index nb_ops; op_index++) { -TCGOpcode op = s-gen_opc_buf[op_index]; -const TCGOpDef *def = tcg_op_defs[op]; +for (oi = s-gen_first_op_idx; oi = 0; oi = oi_next) { tcg_target_ulong mask, partmask, affected; -int nb_oargs, nb_iargs, nb_args, i;
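To make the data structure in this patch easier to picture: ops live in a flat buffer and are chained by prev/next indices, with -1 terminating the list, so an op can be unlinked in place instead of being overwritten with a nop. The following is a simplified standalone sketch of that idea only. SketchOp, SketchCtx, and the sketch_* functions are hypothetical and much smaller than the real TCGOp/TCGContext; the loop shape (fetching the next index before possibly removing the current op) follows the oi/oi_next pattern visible in the optimize.c hunk.

```c
#include <assert.h>

#define SKETCH_MAX_OPS 16

/* Hypothetical, simplified op: an opcode plus list links by index. */
typedef struct SketchOp {
    int opc;
    int prev;   /* index of the previous op, or -1 */
    int next;   /* index of the next op, or -1 */
} SketchOp;

typedef struct SketchCtx {
    SketchOp buf[SKETCH_MAX_OPS];
    int first;  /* index of the first op, or -1 if empty */
    int last;   /* index of the last op, or -1 if empty */
    int used;   /* number of buffer slots handed out so far */
} SketchCtx;

static void sketch_init(SketchCtx *s)
{
    s->first = s->last = -1;
    s->used = 0;
}

/* Append an op at the tail of the list and return its index. */
static int sketch_emit(SketchCtx *s, int opc)
{
    int oi = s->used++;

    assert(oi < SKETCH_MAX_OPS);
    s->buf[oi] = (SketchOp){ .opc = opc, .prev = s->last, .next = -1 };
    if (s->last >= 0) {
        s->buf[s->last].next = oi;
    } else {
        s->first = oi;
    }
    s->last = oi;
    return oi;
}

/* Unlink an op in O(1); its buffer slot is simply left unused. */
static void sketch_remove(SketchCtx *s, int oi)
{
    int prev = s->buf[oi].prev;
    int next = s->buf[oi].next;

    if (next >= 0) {
        s->buf[next].prev = prev;
    } else {
        s->last = prev;
    }
    if (prev >= 0) {
        s->buf[prev].next = next;
    } else {
        s->first = next;
    }
    s->buf[oi] = (SketchOp){ .opc = -1, .prev = -1, .next = -1 };
}

/* Remove every op with the given opcode. The next index is read before
 * the current op may be removed, so iteration stays valid. */
static int sketch_remove_all(SketchCtx *s, int opc)
{
    int removed = 0;
    int oi, oi_next;

    for (oi = s->first; oi >= 0; oi = oi_next) {
        oi_next = s->buf[oi].next;
        if (s->buf[oi].opc == opc) {
            sketch_remove(s, oi);
            removed++;
        }
    }
    return removed;
}

/* Sum of opcodes along the list, used here only to observe its contents. */
static int sketch_sum(const SketchCtx *s)
{
    int sum = 0;
    int oi;

    for (oi = s->first; oi >= 0; oi = s->buf[oi].next) {
        sum += s->buf[oi].opc;
    }
    return sum;
}
```

Because removal only rewrites the neighbours' links, no placeholder opcode and no end-of-list marker op are needed in such a layout.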
[Qemu-devel] [PATCH v3 2/8] tcg: Reduce ifdefs in tcg-op.c
Almost completely eliminates the ifdefs in this file, improving confidence in the lesser used 32-bit builds. Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de Signed-off-by: Richard Henderson r...@twiddle.net --- tcg/tcg-op.c | 449 +++ 1 file changed, 207 insertions(+), 242 deletions(-) diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c index a6fd0a6..5305f1d 100644 --- a/tcg/tcg-op.c +++ b/tcg/tcg-op.c @@ -25,6 +25,15 @@ #include tcg.h #include tcg-op.h +/* Reduce the number of ifdefs below. This assumes that all uses of + TCGV_HIGH and TCGV_LOW are properly protected by a conditional that + the compiler can eliminate. */ +#if TCG_TARGET_REG_BITS == 64 +extern TCGv_i32 TCGV_LOW_link_error(TCGv_i64); +extern TCGv_i32 TCGV_HIGH_link_error(TCGv_i64); +#define TCGV_LOW TCGV_LOW_link_error +#define TCGV_HIGH TCGV_HIGH_link_error +#endif void tcg_gen_op0(TCGContext *ctx, TCGOpcode opc) { @@ -901,11 +910,14 @@ void tcg_gen_subi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2) void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2) { -#if TCG_TARGET_REG_BITS == 32 -tcg_gen_andi_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2); -tcg_gen_andi_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 32); -#else TCGv_i64 t0; + +if (TCG_TARGET_REG_BITS == 32) { +tcg_gen_andi_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2); +tcg_gen_andi_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 32); +return; +} + /* Some cases can be optimized here. 
*/ switch (arg2) { case 0: @@ -937,15 +949,15 @@ void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2) t0 = tcg_const_i64(arg2); tcg_gen_and_i64(ret, arg1, t0); tcg_temp_free_i64(t0); -#endif } void tcg_gen_ori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2) { -#if TCG_TARGET_REG_BITS == 32 -tcg_gen_ori_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2); -tcg_gen_ori_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 32); -#else +if (TCG_TARGET_REG_BITS == 32) { +tcg_gen_ori_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2); +tcg_gen_ori_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 32); +return; +} /* Some cases can be optimized here. */ if (arg2 == -1) { tcg_gen_movi_i64(ret, -1); @@ -956,15 +968,15 @@ void tcg_gen_ori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2) tcg_gen_or_i64(ret, arg1, t0); tcg_temp_free_i64(t0); } -#endif } void tcg_gen_xori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2) { -#if TCG_TARGET_REG_BITS == 32 -tcg_gen_xori_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2); -tcg_gen_xori_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 32); -#else +if (TCG_TARGET_REG_BITS == 32) { +tcg_gen_xori_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2); +tcg_gen_xori_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 32); +return; +} /* Some cases can be optimized here. 
*/ if (arg2 == 0) { tcg_gen_mov_i64(ret, arg1); @@ -976,10 +988,8 @@ void tcg_gen_xori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2) tcg_gen_xor_i64(ret, arg1, t0); tcg_temp_free_i64(t0); } -#endif } -#if TCG_TARGET_REG_BITS == 32 static inline void tcg_gen_shifti_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned c, bool right, bool arith) { @@ -1031,23 +1041,10 @@ static inline void tcg_gen_shifti_i64(TCGv_i64 ret, TCGv_i64 arg1, void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2) { -tcg_gen_shifti_i64(ret, arg1, arg2, 0, 0); -} - -void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2) -{ -tcg_gen_shifti_i64(ret, arg1, arg2, 1, 0); -} - -void tcg_gen_sari_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2) -{ -tcg_gen_shifti_i64(ret, arg1, arg2, 1, 1); -} -#else /* TCG_TARGET_REG_SIZE == 64 */ -void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2) -{ tcg_debug_assert(arg2 64); -if (arg2 == 0) { +if (TCG_TARGET_REG_BITS == 32) { +tcg_gen_shifti_i64(ret, arg1, arg2, 0, 0); +} else if (arg2 == 0) { tcg_gen_mov_i64(ret, arg1); } else { TCGv_i64 t0 = tcg_const_i64(arg2); @@ -1059,7 +1056,9 @@ void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2) void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2) { tcg_debug_assert(arg2 64); -if (arg2 == 0) { +if (TCG_TARGET_REG_BITS == 32) { +tcg_gen_shifti_i64(ret, arg1, arg2, 1, 0); +} else if (arg2 == 0) { tcg_gen_mov_i64(ret, arg1); } else { TCGv_i64 t0 = tcg_const_i64(arg2); @@ -1071,7 +1070,9 @@ void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2) void tcg_gen_sari_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2) { tcg_debug_assert(arg2 64); -if (arg2 == 0) { +if (TCG_TARGET_REG_BITS == 32) { +tcg_gen_shifti_i64(ret, arg1, arg2, 1, 1); +} else if (arg2 == 0) {
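The pattern this patch applies, replacing preprocessor conditionals with an ordinary `if` on a compile-time constant, can be shown in isolation. The sketch below is an assumption-laden illustration rather than TCG code: SKETCH_REG_BITS stands in for TCG_TARGET_REG_BITS, and and32(), andi64_split(), and andi64() are hypothetical helpers operating on plain integers instead of TCG values. The extra trick in the patch of declaring TCGV_LOW/TCGV_HIGH as undefined externs on 64-bit hosts, so that a surviving use fails at link time, is deliberately not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for TCG_TARGET_REG_BITS; override with -D to try 32. */
#ifndef SKETCH_REG_BITS
#define SKETCH_REG_BITS 64
#endif

/* Hypothetical 32-bit primitive, as a 32-bit host would provide. */
static uint32_t and32(uint32_t a, uint32_t b)
{
    return a & b;
}

/* 64-bit AND assembled from two 32-bit halves (the "32-bit host" path). */
static uint64_t andi64_split(uint64_t arg1, uint64_t arg2)
{
    uint32_t lo = and32((uint32_t)arg1, (uint32_t)arg2);
    uint32_t hi = and32((uint32_t)(arg1 >> 32), (uint32_t)(arg2 >> 32));

    return ((uint64_t)hi << 32) | lo;
}

static uint64_t andi64(uint64_t arg1, uint64_t arg2)
{
    /* A plain `if` on a constant: both branches are always parsed and
     * type-checked on every host, and the compiler is free to discard
     * the branch that can never run. */
    if (SKETCH_REG_BITS == 32) {
        return andi64_split(arg1, arg2);
    }
    return arg1 & arg2;
}
```

The benefit the commit message points at is visible even in this toy: a typo in the 32-bit path would be caught when building for a 64-bit host, which is not the case when that path is hidden behind `#if`.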
Re: [Qemu-devel] [PATCH v5 07/10] qmp: add rocker device support
On 01/22/2015 01:03 AM, sfel...@gmail.com wrote: From: Scott Feldman sfel...@gmail.com Add QMP/HMP support for rocker devices. This is mostly for debugging purposes to see inside the device's tables and port configurations. Some examples: QMP interface review: +++ b/qapi-schema.json @@ -3523,3 +3523,6 @@ # Since: 2.1 ## { 'command': 'rtc-reset-reinjection' } + +# Rocker ethernet network switch +{ 'include': 'qapi/rocker.json' } diff --git a/qapi/rocker.json b/qapi/rocker.json new file mode 100644 index 000..326c6c7 --- /dev/null +++ b/qapi/rocker.json @@ -0,0 +1,259 @@ +## +# @Rocker: +# +# Rocker switch information. +# +# @name: switch name +# +# @id: switch ID +# +# @ports: number of front-panel ports +## Missing a 'Since: 2.3' designation. +{ 'type': 'RockerSwitch', + 'data': { 'name': 'str', 'id': 'uint64', 'ports': 'uint32' } } + +## +# @rocker: +# +# Return rocker switch information. +# +# Returns: @Rocker information +# +# Since: 2.3 +## +{ 'command': 'rocker', + 'data': { 'name': 'str' }, + 'returns': 'RockerSwitch' } Should this command be named 'query-rocker', as it is used for queries? Should the 'name' argument be optional, and the output be an array (all rocker devices, rather than just a given rocker name lookup)? + +## +# @RockerPortDuplex: +# +# An eumeration of port duplex states. +# +# @half: half duplex +# +# @full: full duplex +## Missing a 'Since: 2.3' designation. +{ 'enum': 'RockerPortDuplex', 'data': [ 'half', 'full' ] } + +## +# @RockerPortAutoneg: +# +# An eumeration of port autoneg states. +# +# @off: autoneg is off +# +# @on: autoneg is on +## Missing a 'Since: 2.3' designation. +{ 'enum': 'RockerPortAutoneg', 'data': [ 'off', 'on' ] } + +## +# @RockerPort: +# +# Rocker switch port information. 
+# +# @name: port name +# +# @enabled: port is enabled for I/O +# +# @link-up: physical link is UP on port +# +# @speed: port link speed in Mbps +# +# @duplex: port link duplex +# +# @autoneg: port link autoneg +## Missing a 'Since: 2.3' designation. +{ 'type': 'RockerPort', + 'data': { 'name': 'str', 'enabled': 'bool', 'link-up': 'bool', +'speed': 'uint32', 'duplex': 'RockerPortDuplex', +'autoneg': 'RockerPortAutoneg' } } + +## +# @rocker-ports: +# +# Return rocker switch information. +# +# Returns: @Rocker information +# +# Since: 2.3 +## +{ 'command': 'rocker-ports', Should this be named 'query-rocker-ports'? Should the port information be returned as part of the more generic 'rocker' command rather than having to do a two-stage query (what are my rocker devices, then for each device what are the ports)? + 'data': { 'name': 'str' }, + 'returns': ['RockerPort'] } + +## +# @RockerOfDpaFlowKey: +# +# Rocker switch OF-DPA flow key +# +# @priority: key priority, 0 being lowest priority +# +# @tbl-id: flow table ID +# +# @in-pport: physical input port +# +# @tunnel-id: tunnel ID +# +# @vlan-id: VLAN ID +# +# @eth-type: Ethernet header type +# +# @eth-src: Ethernet header source MAC address +# +# @eth-dst: Ethernet header destination MAC address +# +# @ip-proto: IP Header protocol field +# +# @ip-tos: IP header TOS field +# +# @ip-dst: IP header destination address +## Missing a 'Since: 2.3' designation. +{ 'type': 'RockerOfDpaFlowKey', + 'data' : { 'priority': 'uint32', 'tbl-id': 'uint32', '*in-pport': 'uint32', + '*tunnel-id': 'uint32', '*vlan-id': 'uint16', + '*eth-type': 'uint16', '*eth-src': 'str', '*eth-dst': 'str', + '*ip-proto': 'uint8', '*ip-tos': 'uint8', '*ip-dst': 'str' } } Missing '#optional' tags on the various optional fields. Why are certain fields optional? Does it mean they have a default value, or that they don't make sense in some configurations? The docs could be more clear on that. 
+ +## +# @RockerOfDpaFlowMask: +# +# Rocker switch OF-DPA flow mask +# +# @in-pport: physical input port +# +# @tunnel-id: tunnel ID +# +# @vlan-id: VLAN ID +# +# @eth-src: Ethernet header source MAC address +# +# @eth-dst: Ethernet header destination MAC address +# +# @ip-proto: IP Header protocol field +# +# @ip-tos: IP header TOS field +## Missing a 'Since: 2.3' designation. +{ 'type': 'RockerOfDpaFlowMask', + 'data' : { '*in-pport': 'uint32', '*tunnel-id': 'uint32', + '*vlan-id': 'uint16', '*eth-src': 'str', '*eth-dst': 'str', + '*ip-proto': 'uint8', '*ip-tos': 'uint8' } } Again, missing #optional tags in the docs, as well as what it means when a field is omitted. + +## +# @RockerOfDpaFlowAction: +# +# Rocker switch OF-DPA flow action +# +# @goto-tbl: next table ID +# +# @group-id: group ID +# +# @tunnel-lport: tunnel logical port ID +# +# @vlan-id: VLAN ID +# +# @new-vlan-id: new VLAN ID +# +# @out-pport: physical output port +## Missing a 'Since: 2.3' designation. +{ 'type':
Re: [Qemu-devel] [PATCH RFC 1/1] KVM: s390: Add MEMOP ioctl for reading/writing guest memory
On Tue, 03 Feb 2015 14:04:43 +0100
Paolo Bonzini pbonz...@redhat.com wrote:

> On 03/02/2015 13:11, Thomas Huth wrote:
>> On s390, we've got to make sure to hold the IPTE lock while accessing
>> virtual memory. So let's add an ioctl for reading and writing virtual
>> memory to provide this feature for userspace, too.
>>
>> Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com
>> Reviewed-by: Dominik Dingel din...@linux.vnet.ibm.com
>> Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
>> ---
>>  Documentation/virtual/kvm/api.txt |   44 +
>>  arch/s390/kvm/gaccess.c           |   22 +
>>  arch/s390/kvm/gaccess.h           |    2 +
>>  arch/s390/kvm/kvm-s390.c          |   63 +
>>  include/uapi/linux/kvm.h          |   21
>>  5 files changed, 152 insertions(+), 0 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index b112efc..bf44b53 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2716,6 +2716,50 @@ The fields in each entry are defined as follows:
>>     eax, ebx, ecx, edx: the values returned by the cpuid instruction for
>>           this function/index combination
>>
>> +4.89 KVM_GUEST_MEM_OP
>> +
>> +Capability: KVM_CAP_MEM_OP
>
> Put "virtual" somewhere in the ioctl name and capability?

Actually, I'd prefer to keep the "virtual" in the defines for the type
of operation below: When it comes to s390 storage keys, we likely might
need some calls for reading and writing to physical memory, too. Then
we could simply extend this ioctl instead of inventing a new one.

>> +Architectures: s390
>> +Type: vcpu ioctl
>> +Parameters: struct kvm_guest_mem_op (in)
>> +Returns: = 0 on success,
>> +         < 0 on generic error (e.g. -EFAULT or -ENOMEM),
>> +         > 0 if an exception occurred while walking the page tables
>> +
>> +Read or write data from/to the virtual memory of a VCPU.
>> +
>> +Parameters are specified via the following structure:
>> +
>> +struct kvm_guest_mem_op {
>> +	__u64 gaddr;		/* the guest address */
>> +	__u64 flags;		/* arch specific flags */
>> +	__u32 size;		/* amount of bytes */
>> +	__u32 op;		/* type of operation */
>> +	__u64 buf;		/* buffer in userspace */
>> +	__u8 reserved[32];	/* should be set to 0 */
>> +};
>> +
>> +The type of operation is specified in the "op" field, either KVM_MEMOP_VIRTREAD
>> +for reading from memory, KVM_MEMOP_VIRTWRITE for writing to memory, or
>> +KVM_MEMOP_CHECKVIRTREAD or KVM_MEMOP_CHECKVIRTWRITE to check whether the
>
> Better:
>
> #define KVM_MEMOP_READ   0
> #define KVM_MEMOP_WRITE  1
>
> and in the flags field:
>
> #define KVM_MEMOP_F_CHECK_ONLY (1 << 1)

Ok, a flag for the check operations is fine for me, too.

> ...
>> +The logical (virtual) start address of the memory region has to be specified
>> +in the "gaddr" field, and the length of the region in the "size" field.
>> +"buf" is the buffer supplied by the userspace application where the read data
>> +should be written to for KVM_MEMOP_VIRTREAD, or where the data that should
>> +be written is stored for a KVM_MEMOP_VIRTWRITE. "buf" can be NULL for both
>> +CHECK operations.
>
> "buf" is unused and can be NULL for both CHECK operations.
>
>> +The "reserved" field is meant for future extensions. It must currently be
>> +set to 0.
>
> Not really true, as you don't check it. So "It is not used by KVM with
> the currently defined set of flags" is a better explanation.

ok ... and maybe add "should be set to zero" ?

> Paolo

Thanks for the review!

 Thomas
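To make the interface under discussion concrete, here is a minimal sketch of how userspace might fill in a request using the READ/WRITE plus CHECK_ONLY-flag variant suggested in the review. The structure layout is copied from the RFC text quoted above; the define values follow the review suggestion (the shift in the flag value is reconstructed from the archived text), and memop_prepare() is a hypothetical helper that exists in neither the patch nor any kernel header. The ioctl call itself is left out because its request number is not part of this excerpt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout as given in the RFC documentation; not taken from a released header. */
struct kvm_guest_mem_op {
    uint64_t gaddr;        /* the guest address */
    uint64_t flags;        /* arch specific flags */
    uint32_t size;         /* amount of bytes */
    uint32_t op;           /* type of operation */
    uint64_t buf;          /* buffer in userspace */
    uint8_t  reserved[32]; /* should be set to 0 */
};

/* Names and values as proposed in the review; assumed, still under discussion. */
#define KVM_MEMOP_READ          0
#define KVM_MEMOP_WRITE         1
#define KVM_MEMOP_F_CHECK_ONLY  (1ULL << 1)

/*
 * Hypothetical helper: build a request for reading or writing "size" bytes
 * at guest virtual address "gaddr".  With check_only set, only the
 * accessibility of the range is to be checked, so no buffer is passed.
 */
static void memop_prepare(struct kvm_guest_mem_op *req, uint64_t gaddr,
                          uint32_t size, uint32_t op, void *buf,
                          int check_only)
{
    assert(op == KVM_MEMOP_READ || op == KVM_MEMOP_WRITE);

    /* Zero everything first so that the reserved bytes are 0. */
    memset(req, 0, sizeof(*req));
    req->gaddr = gaddr;
    req->size = size;
    req->op = op;
    if (check_only) {
        req->flags |= KVM_MEMOP_F_CHECK_ONLY;   /* buf stays 0: unused */
    } else {
        req->buf = (uint64_t)(uintptr_t)buf;
    }
}
```

Zeroing the whole structure up front keeps the reserved bytes at 0 regardless of whether the kernel ends up checking them, which is the point debated at the end of the review.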
[Qemu-devel] [PATCH v3 0/8] Linked list for tcg ops
Currently tcg ops are simply placed in a buffer in order, which is fine
until we want to actually do something with the opcode stream, such as
optimize it. Note the horrible things like call opcodes needing their
argument count both prefixed and postfixed so that we can iterate across
the call either forward or backward.

While I'm changing this, I also move quite a lot of tcg-op.h out of line.
There is very little benefit to having most of these functions inline,
since their arguments are extracted from the guest instructions being
translated, and thus their values are not really predictable.

I chose a cutoff of one function call: if a tcg-op.h function consists of
a single function call, inline it; otherwise move it out of line. This
also removes a bit of boilerplate from each target.

I haven't been able to measure a performance difference with this patch
set. I wouldn't really expect any, as the complexity level remains the
same. I simply find the linked list significantly more maintainable.

Changes v2->v3:
  * Parameter order bug affecting 32-bit hosts fixed (thanks Peter).

r~


Richard Henderson (8):
  tcg: Move some opcode generation functions out of line
  tcg: Reduce ifdefs in tcg-op.c
  tcg: Move emit of INDEX_op_end into gen_tb_end
  tcg: Introduce tcg_op_buf_count and tcg_op_buf_full
  tcg: Put opcodes in a linked list
  tcg: Remove opcodes instead of noping them out
  tcg: Implement insert_op_before
  tcg: Remove unused opcodes

 Makefile.target               |    2 +-
 include/exec/gen-icount.h     |   22 +-
 target-alpha/translate.c      |   16 +-
 target-arm/translate-a64.c    |   10 +-
 target-arm/translate.c        |   10 +-
 target-cris/translate.c       |   15 +-
 target-i386/translate.c       |   11 +-
 target-lm32/translate.c       |   16 +-
 target-m68k/translate.c       |   10 +-
 target-microblaze/translate.c |   22 +-
 target-mips/translate.c       |   10 +-
 target-moxie/translate.c      |   10 +-
 target-openrisc/translate.c   |   15 +-
 target-ppc/translate.c        |   11 +-
 target-s390x/translate.c      |   11 +-
 target-sh4/translate.c        |   10 +-
 target-sparc/translate.c      |   10 +-
 target-tricore/translate.c    |    5 +-
 target-unicore32/translate.c  |   10 +-
 target-xtensa/translate.c     |    8 +-
 tcg/optimize.c                |  307 +++--
 tcg/tcg-op.c                  | 1934
 tcg/tcg-op.h                  | 2487 ++---
 tcg/tcg-opc.h                 |    9 -
 tcg/tcg.c                     |  532 +++--
 tcg/tcg.h                     |   72 +-
 tci.c                         |   13 -
 27 files changed, 2751 insertions(+), 2837 deletions(-)
 create mode 100644 tcg/tcg-op.c

-- 
2.1.0
[Qemu-devel] [PATCH v3 3/8] tcg: Move emit of INDEX_op_end into gen_tb_end
Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de Signed-off-by: Richard Henderson r...@twiddle.net --- include/exec/gen-icount.h | 2 ++ target-alpha/translate.c | 2 +- target-arm/translate-a64.c| 1 - target-arm/translate.c| 1 - target-cris/translate.c | 2 +- target-i386/translate.c | 2 +- target-lm32/translate.c | 2 +- target-m68k/translate.c | 1 - target-microblaze/translate.c | 2 +- target-mips/translate.c | 2 +- target-moxie/translate.c | 2 +- target-openrisc/translate.c | 2 +- target-ppc/translate.c| 2 +- target-s390x/translate.c | 2 +- target-sh4/translate.c| 2 +- target-sparc/translate.c | 2 +- target-tricore/translate.c| 1 - target-unicore32/translate.c | 1 - target-xtensa/translate.c | 1 - 19 files changed, 14 insertions(+), 18 deletions(-) diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h index 221aad0..a37a61d 100644 --- a/include/exec/gen-icount.h +++ b/include/exec/gen-icount.h @@ -48,6 +48,8 @@ static void gen_tb_end(TranslationBlock *tb, int num_insns) gen_set_label(icount_label); tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_ICOUNT_EXPIRED); } + +*tcg_ctx.gen_opc_ptr = INDEX_op_end; } static inline void gen_io_start(void) diff --git a/target-alpha/translate.c b/target-alpha/translate.c index f888367..aa04c60 100644 --- a/target-alpha/translate.c +++ b/target-alpha/translate.c @@ -2912,7 +2912,7 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu, } gen_tb_end(tb, num_insns); -*tcg_ctx.gen_opc_ptr = INDEX_op_end; + if (search_pc) { j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf; lj++; diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c index 80d2359..10e09bc 100644 --- a/target-arm/translate-a64.c +++ b/target-arm/translate-a64.c @@ -11090,7 +11090,6 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu, done_generating: gen_tb_end(tb, num_insns); -*tcg_ctx.gen_opc_ptr = INDEX_op_end; #ifdef DEBUG_DISAS if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)) { diff --git a/target-arm/translate.c 
b/target-arm/translate.c index bdfcdf1..4b30698 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -11330,7 +11330,6 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu, done_generating: gen_tb_end(tb, num_insns); -*tcg_ctx.gen_opc_ptr = INDEX_op_end; #ifdef DEBUG_DISAS if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)) { diff --git a/target-cris/translate.c b/target-cris/translate.c index b675ed0..b5a792c 100644 --- a/target-cris/translate.c +++ b/target-cris/translate.c @@ -3344,7 +3344,7 @@ gen_intermediate_code_internal(CRISCPU *cpu, TranslationBlock *tb, } } gen_tb_end(tb, num_insns); -*tcg_ctx.gen_opc_ptr = INDEX_op_end; + if (search_pc) { j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf; lj++; diff --git a/target-i386/translate.c b/target-i386/translate.c index 9ebdf4b..e2e21e4 100644 --- a/target-i386/translate.c +++ b/target-i386/translate.c @@ -8077,7 +8077,7 @@ static inline void gen_intermediate_code_internal(X86CPU *cpu, gen_io_end(); done_generating: gen_tb_end(tb, num_insns); -*tcg_ctx.gen_opc_ptr = INDEX_op_end; + /* we don't forget to fill the last values */ if (search_pc) { j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf; diff --git a/target-lm32/translate.c b/target-lm32/translate.c index a7579dc..cd09293 100644 --- a/target-lm32/translate.c +++ b/target-lm32/translate.c @@ -1158,7 +1158,7 @@ void gen_intermediate_code_internal(LM32CPU *cpu, } gen_tb_end(tb, num_insns); -*tcg_ctx.gen_opc_ptr = INDEX_op_end; + if (search_pc) { j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf; lj++; diff --git a/target-m68k/translate.c b/target-m68k/translate.c index 47edc7a..7e98a17 100644 --- a/target-m68k/translate.c +++ b/target-m68k/translate.c @@ -3075,7 +3075,6 @@ gen_intermediate_code_internal(M68kCPU *cpu, TranslationBlock *tb, } } gen_tb_end(tb, num_insns); -*tcg_ctx.gen_opc_ptr = INDEX_op_end; #ifdef DEBUG_DISAS if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)) { diff --git a/target-microblaze/translate.c b/target-microblaze/translate.c index 
69ce4df..437a069 100644 --- a/target-microblaze/translate.c +++ b/target-microblaze/translate.c @@ -1846,7 +1846,7 @@ gen_intermediate_code_internal(MicroBlazeCPU *cpu, TranslationBlock *tb, } } gen_tb_end(tb, num_insns); -*tcg_ctx.gen_opc_ptr = INDEX_op_end; + if (search_pc) { j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf; lj++; diff --git a/target-mips/translate.c b/target-mips/translate.c index e9d86b2..70b5b45 100644 --- a/target-mips/translate.c +++ b/target-mips/translate.c @@ -19240,7 +19240,7 @@ gen_intermediate_code_internal(MIPSCPU *cpu, TranslationBlock *tb, }
[Qemu-devel] [PATCH v3 7/8] tcg: Implement insert_op_before
Rather reserving space in the op stream for optimization, let the optimizer add ops as necessary. Signed-off-by: Richard Henderson r...@twiddle.net --- tcg/optimize.c | 57 +++-- tcg/tcg-op.c | 21 - tcg/tcg-op.h | 1 - 3 files changed, 35 insertions(+), 44 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 973fbb4..067917c 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -67,6 +67,37 @@ static void reset_temp(TCGArg temp) temps[temp].mask = -1; } +static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op, +TCGOpcode opc, int nargs) +{ +int oi = s-gen_next_op_idx; +int pi = s-gen_next_parm_idx; +int prev = old_op-prev; +int next = old_op - s-gen_op_buf; +TCGOp *new_op; + +tcg_debug_assert(oi OPC_BUF_SIZE); +tcg_debug_assert(pi + nargs = OPPARAM_BUF_SIZE); +s-gen_next_op_idx = oi + 1; +s-gen_next_parm_idx = pi + nargs; + +new_op = s-gen_op_buf[oi]; +*new_op = (TCGOp){ +.opc = opc, +.args = pi, +.prev = prev, +.next = next +}; +if (prev = 0) { +s-gen_op_buf[prev].next = oi; +} else { +s-gen_first_op_idx = oi; +} +old_op-prev = oi; + +return new_op; +} + /* Reset all temporaries, given that there are NB_TEMPS of them. */ static void reset_all_temps(int nb_temps) { @@ -1108,8 +1139,8 @@ static void tcg_constant_folding(TCGContext *s) uint64_t a = ((uint64_t)ah 32) | al; uint64_t b = ((uint64_t)bh 32) | bl; TCGArg rl, rh; -TCGOp *op2; -TCGArg *args2; +TCGOp *op2 = insert_op_before(s, op, INDEX_op_movi_i32, 2); +TCGArg *args2 = s-gen_opparam_buf[op2-args]; if (opc == INDEX_op_add2_i32) { a += b; @@ -1117,15 +1148,6 @@ static void tcg_constant_folding(TCGContext *s) a -= b; } -/* We emit the extra nop when we emit the add2/sub2. */ -op2 = s-gen_op_buf[oi_next]; -assert(op2-opc == INDEX_op_nop); - -/* But we still have to allocate args for the op. 
*/ -op2-args = s-gen_next_parm_idx; -s-gen_next_parm_idx += 2; -args2 = s-gen_opparam_buf[op2-args]; - rl = args[0]; rh = args[1]; tcg_opt_gen_movi(s, op, args, opc, rl, (uint32_t)a); @@ -1144,17 +1166,8 @@ static void tcg_constant_folding(TCGContext *s) uint32_t b = temps[args[3]].val; uint64_t r = (uint64_t)a * b; TCGArg rl, rh; -TCGOp *op2; -TCGArg *args2; - -/* We emit the extra nop when we emit the mulu2. */ -op2 = s-gen_op_buf[oi_next]; -assert(op2-opc == INDEX_op_nop); - -/* But we still have to allocate args for the op. */ -op2-args = s-gen_next_parm_idx; -s-gen_next_parm_idx += 2; -args2 = s-gen_opparam_buf[op2-args]; +TCGOp *op2 = insert_op_before(s, op, INDEX_op_movi_i32, 2); +TCGArg *args2 = s-gen_opparam_buf[op2-args]; rl = args[0]; rh = args[1]; diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c index cbaa15c..afa351d 100644 --- a/tcg/tcg-op.c +++ b/tcg/tcg-op.c @@ -57,11 +57,6 @@ static void tcg_emit_op(TCGContext *ctx, TCGOpcode opc, int args) }; } -void tcg_gen_op0(TCGContext *ctx, TCGOpcode opc) -{ -tcg_emit_op(ctx, opc, -1); -} - void tcg_gen_op1(TCGContext *ctx, TCGOpcode opc, TCGArg a1) { int pi = ctx-gen_next_parm_idx; @@ -571,8 +566,6 @@ void tcg_gen_add2_i32(TCGv_i32 rl, TCGv_i32 rh, TCGv_i32 al, { if (TCG_TARGET_HAS_add2_i32) { tcg_gen_op6_i32(INDEX_op_add2_i32, rl, rh, al, ah, bl, bh); -/* Allow the optimizer room to replace add2 with two moves. */ -tcg_gen_op0(tcg_ctx, INDEX_op_nop); } else { TCGv_i64 t0 = tcg_temp_new_i64(); TCGv_i64 t1 = tcg_temp_new_i64(); @@ -590,8 +583,6 @@ void tcg_gen_sub2_i32(TCGv_i32 rl, TCGv_i32 rh, TCGv_i32 al, { if (TCG_TARGET_HAS_sub2_i32) { tcg_gen_op6_i32(INDEX_op_sub2_i32, rl, rh, al, ah, bl, bh); -/* Allow the optimizer room to replace sub2 with two moves. 
*/ -tcg_gen_op0(tcg_ctx, INDEX_op_nop); } else { TCGv_i64 t0 = tcg_temp_new_i64(); TCGv_i64 t1 = tcg_temp_new_i64(); @@ -608,8 +599,6 @@ void tcg_gen_mulu2_i32(TCGv_i32 rl, TCGv_i32 rh, TCGv_i32 arg1, TCGv_i32 arg2) { if (TCG_TARGET_HAS_mulu2_i32) { tcg_gen_op4_i32(INDEX_op_mulu2_i32, rl, rh, arg1, arg2); -/* Allow the optimizer room to replace mulu2 with two moves. */ -tcg_gen_op0(tcg_ctx, INDEX_op_nop); } else if
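The archived diff above has lost its "->", "<" and "&" characters, so the linking logic of insert_op_before is hard to follow. Below is a rough, self-contained sketch of the same idea on a toy list: ops live in a fixed buffer and are linked by array index, and a new op is allocated from the next free slot and spliced in ahead of an existing one. The types Op and OpList and the helper names are illustrative stand-ins and are not the QEMU definitions.

```c
#include <assert.h>

#define OP_BUF_SIZE 16          /* toy capacity, not QEMU's OPC_BUF_SIZE */

/* Toy stand-in for TCGOp: links are buffer indices, -1 means "none". */
typedef struct Op {
    int opc;
    int prev;
    int next;
} Op;

/* Toy stand-in for the op-related fields of TCGContext. */
typedef struct OpList {
    Op buf[OP_BUF_SIZE];
    int first;                  /* index of the first op, -1 if empty */
    int last;                   /* index of the last op, -1 if empty */
    int next_free;              /* next unused slot in buf */
} OpList;

static void list_init(OpList *l)
{
    l->first = -1;
    l->last = -1;
    l->next_free = 0;
}

/* Append an op at the tail, as normal emission would. */
static Op *list_append(OpList *l, int opc)
{
    int oi = l->next_free;

    assert(oi < OP_BUF_SIZE);
    l->next_free = oi + 1;
    l->buf[oi] = (Op){ .opc = opc, .prev = l->last, .next = -1 };
    if (l->last >= 0) {
        l->buf[l->last].next = oi;
    } else {
        l->first = oi;
    }
    l->last = oi;
    return &l->buf[oi];
}

/* Allocate a new op and link it immediately before old_op. */
static Op *list_insert_before(OpList *l, Op *old_op, int opc)
{
    int oi = l->next_free;
    int prev = old_op->prev;
    int next = (int)(old_op - l->buf);   /* index of old_op itself */

    assert(oi < OP_BUF_SIZE);
    l->next_free = oi + 1;
    l->buf[oi] = (Op){ .opc = opc, .prev = prev, .next = next };
    if (prev >= 0) {
        l->buf[prev].next = oi;
    } else {
        l->first = oi;                   /* new op becomes the head */
    }
    old_op->prev = oi;
    return &l->buf[oi];
}
```

Because the new op always lands in front of an existing one, the tail index never changes, which is why only the head index needs a special case here.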
[Qemu-devel] QEMU crash on PCI passthrough
Hello qemu-devel list, I have a problem with PCI passthrough of my second gfx card (Nvidia GTX 760). I use gentoo, gentoo-hardened kernel sources and I have SELinux enabled (anyway for purpose of these tests I've added svirt_t and virtd_t to permissive types, so it shouldn't make any problem). I use the following versions of applications: app-emulation/qemu-2.2.0 (but I've also tested app-emulation/qemu-2.1.2-r2) app-emulation/libvirt-1.2.10-r4 app-emulation/virt-manager-1.1.0 While running guest without PCI passthrough everything works fine so far, when I pass my PCI device to the guest I end up with the following message: __QUOTE_BEGIN_CANARY__ Error starting domain: internal error: Process exited while reading console log output: char device redirected to /dev/pts/3 (label charserial0) qemu: hardware error: pci read failed, ret = 0 errno = 0 CPU #0: EAX= EBX= ECX= EDX=0663 ESI= EDI= EBP= ESP= EIP=fff0 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= IDT= CR0=6010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80 FPR0= FPR1= FPR2= FPR3=0 __QUOTE_END_CANARY__ In /var/log/libvirtd/libvirtd.log there is: 2015-02-03 02:12:27.135+: 3073: error : qemuProcessReadLogOutput:1719 : internal error: Process exited while reading console log output: char device redirected to /dev/pts/3 (label charserial0) qemu: hardware error: pci read failed, ret = 0 errno = 0 __QUOTE_BEGIN_CANARY__ ... 
__QUOTE_END_CANARY__ And in /var/log/libvirtd/qemu/guest.log there is: 2015-02-03 02:12:27.025+: starting up LC_ALL=C PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin HOME=/ USER=root QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name debian7 -S -machine pc-i440fx-2.1,accel=kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -uuid a00b499e-ad90-4470-8b65-18e17e55dca4 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/debian7.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -no-hpet -no-shutdown -boot c -usb -drive file=/var/lib/libvirt/filesystems/debian.qcow2,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/var/lib/libvirt/images/debian-7.8.0-amd64-netinst.iso,if=none,media=cdrom, id=drive-ide0-0-0,readonly=on,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device pci-assign,host=06:00.0,id=hostdev0,bus=pci.0,addr=0x5 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 char device redirected to /dev/pts/3 (label charserial0) qemu: hardware error: pci read failed, ret = 0 errno = 0 __QUOTE_BEGIN_CANARY__ ... __QUOTE_END_CANARY__ I add that I was required to turn off integrated sound device because the earlier error was: 2015-02-03 01:47:40.189+: 3070: error : virPCIDeviceReset:985 : internal error: Unable to reset PCI device :06:00.0: internal error: Active :06:00.1 devices on bus with :06:00.0, not doing bus reset after typing: echo 1 /sys/devices/pci:00/:00:15.0/:06:00.1/remove this problem disappears and I'm stuck on the above one. Any help? Any work around? Is it a known issue? Could I provide any additional info to help you diagnose the problem? Thanks, Chris
Re: [Qemu-devel] [PATCH RFC 1/1] KVM: s390: Add MEMOP ioctl for reading/writing guest memory
On 03/02/2015 16:16, Thomas Huth wrote:
> Actually, I'd prefer to keep the "virtual" in the defines for the type of
> operation below: When it comes to s390 storage keys, we likely might need
> some calls for reading and writing to physical memory, too. Then we could
> simply extend this ioctl instead of inventing a new one.

Can you explain why it is necessary to read/write physical addresses
from user space? In the case of QEMU, I'm worried that you would have to
invent your own memory read/write APIs that are different from
everything else.

On real s390 zPCI, does bus-master DMA update storage keys?

>> Not really true, as you don't check it. So "It is not used by KVM with
>> the currently defined set of flags" is a better explanation.
>
> ok ... and maybe add "should be set to zero" ?

If you don't check it, it is misleading to document this.

Paolo
[Qemu-devel] [PATCH v3 6/8] tcg: Remove opcodes instead of noping them out
With the linked list scheme we need not leave nops in the stream that we need to process later. Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de Signed-off-by: Richard Henderson r...@twiddle.net --- tcg/optimize.c | 14 +++--- tcg/tcg.c | 28 tcg/tcg.h | 1 + 3 files changed, 32 insertions(+), 11 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index f2b8acf..973fbb4 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -758,7 +758,7 @@ static void tcg_constant_folding(TCGContext *s) break; do_mov3: if (temps_are_copies(args[0], args[1])) { -op-opc = INDEX_op_nop; +tcg_op_remove(s, op); } else { tcg_opt_gen_mov(s, op, args, opc, args[0], args[1]); } @@ -916,7 +916,7 @@ static void tcg_constant_folding(TCGContext *s) if (affected == 0) { assert(nb_oargs == 1); if (temps_are_copies(args[0], args[1])) { -op-opc = INDEX_op_nop; +tcg_op_remove(s, op); } else if (temps[args[1]].state != TCG_TEMP_CONST) { tcg_opt_gen_mov(s, op, args, opc, args[0], args[1]); } else { @@ -948,7 +948,7 @@ static void tcg_constant_folding(TCGContext *s) CASE_OP_32_64(and): if (temps_are_copies(args[1], args[2])) { if (temps_are_copies(args[0], args[1])) { -op-opc = INDEX_op_nop; +tcg_op_remove(s, op); } else { tcg_opt_gen_mov(s, op, args, opc, args[0], args[1]); } @@ -979,7 +979,7 @@ static void tcg_constant_folding(TCGContext *s) switch (opc) { CASE_OP_32_64(mov): if (temps_are_copies(args[0], args[1])) { -op-opc = INDEX_op_nop; +tcg_op_remove(s, op); break; } if (temps[args[1]].state != TCG_TEMP_CONST) { @@ -1074,7 +1074,7 @@ static void tcg_constant_folding(TCGContext *s) op-opc = INDEX_op_br; args[0] = args[3]; } else { -op-opc = INDEX_op_nop; +tcg_op_remove(s, op); } break; } @@ -1084,7 +1084,7 @@ static void tcg_constant_folding(TCGContext *s) tmp = do_constant_folding_cond(opc, args[1], args[2], args[5]); if (tmp != 2) { if (temps_are_copies(args[0], args[4-tmp])) { -op-opc = INDEX_op_nop; +tcg_op_remove(s, op); } else if (temps[args[4-tmp]].state == TCG_TEMP_CONST) { 
tcg_opt_gen_movi(s, op, args, opc, args[0], temps[args[4-tmp]].val); @@ -1177,7 +1177,7 @@ static void tcg_constant_folding(TCGContext *s) args[0] = args[5]; } else { do_brcond_false: -op-opc = INDEX_op_nop; +tcg_op_remove(s, op); } } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE) temps[args[2]].state == TCG_TEMP_CONST diff --git a/tcg/tcg.c b/tcg/tcg.c index ee041b9..4115e8b 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1244,6 +1244,29 @@ void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs) #endif } +void tcg_op_remove(TCGContext *s, TCGOp *op) +{ +int next = op-next; +int prev = op-prev; + +if (next = 0) { +s-gen_op_buf[next].prev = prev; +} else { +s-gen_last_op_idx = prev; +} +if (prev = 0) { +s-gen_op_buf[prev].next = next; +} else { +s-gen_first_op_idx = next; +} + +*op = (TCGOp){ .opc = INDEX_op_nop, .next = -1, .prev = -1 }; + +#ifdef CONFIG_PROFILER +s-del_op_count++; +#endif +} + #ifdef USE_LIVENESS_ANALYSIS /* liveness analysis: end of function: all temps are dead, and globals should be in memory. */ @@ -1466,10 +1489,7 @@ static void tcg_liveness_analysis(TCGContext *s) } } do_remove: -op-opc = INDEX_op_nop; -#ifdef CONFIG_PROFILER -s-del_op_count++; -#endif +tcg_op_remove(s, op); } else { do_not_remove: /* output args are dead */ diff --git a/tcg/tcg.h b/tcg/tcg.h index 596e30a..f941965 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -743,6 +743,7 @@ void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs); void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret, int nargs, TCGArg *args); +void tcg_op_remove(TCGContext *s, TCGOp *op); void tcg_optimize(TCGContext *s); /* only used for debugging purposes */ -- 2.1.0
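Since the archived diff has lost its comparison and arrow operators, the unlinking done by tcg_op_remove is easier to see in a small standalone sketch. The version below works on a toy index-linked list; Op, OpList, OP_NOP and the helper names are illustrative assumptions, and only the order of the link updates mirrors the patch.

```c
#include <assert.h>

#define OP_NOP  (-1)            /* placeholder opcode for a dead slot */
#define MAX_OPS 8               /* toy capacity */

/* Toy stand-in for TCGOp: links are buffer indices, -1 means "none". */
typedef struct Op {
    int opc;
    int prev;
    int next;
} Op;

typedef struct OpList {
    Op buf[MAX_OPS];
    int first;
    int last;
    int del_count;              /* analogous to the profiler counter */
} OpList;

/* Link buf[0..n-1] in buffer order, as plain emission would. */
static void list_build(OpList *l, const int *opcs, int n)
{
    assert(n > 0 && n <= MAX_OPS);
    for (int i = 0; i < n; i++) {
        l->buf[i] = (Op){
            .opc = opcs[i],
            .prev = i - 1,
            .next = (i + 1 < n) ? i + 1 : -1,
        };
    }
    l->first = 0;
    l->last = n - 1;
    l->del_count = 0;
}

/* Unlink op from the list and turn its slot into an unlinked nop. */
static void list_remove(OpList *l, Op *op)
{
    int next = op->next;
    int prev = op->prev;

    if (next >= 0) {
        l->buf[next].prev = prev;
    } else {
        l->last = prev;         /* removed the tail */
    }
    if (prev >= 0) {
        l->buf[prev].next = next;
    } else {
        l->first = next;        /* removed the head */
    }

    *op = (Op){ .opc = OP_NOP, .prev = -1, .next = -1 };
    l->del_count++;
}

/* Count the ops still reachable from the head. */
static int list_count(const OpList *l)
{
    int n = 0;

    for (int i = l->first; i >= 0; i = l->buf[i].next) {
        n++;
    }
    return n;
}
```

Later passes that walk the list from the head never see the removed slot, so they no longer have to skip nops explicitly.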
[Qemu-devel] [PATCH v3 4/8] tcg: Introduce tcg_op_buf_count and tcg_op_buf_full
The method by which we count the number of ops emitted is going to change. Abstract that away into some inlines. Reviewed-by: Bastian Koppelmann kbast...@mail.uni-paderborn.de Signed-off-by: Richard Henderson r...@twiddle.net --- target-alpha/translate.c | 14 +++--- target-arm/translate-a64.c| 9 +++-- target-arm/translate.c| 9 +++-- target-cris/translate.c | 13 + target-i386/translate.c | 9 +++-- target-lm32/translate.c | 14 +- target-m68k/translate.c | 9 +++-- target-microblaze/translate.c | 20 target-mips/translate.c | 8 +++- target-moxie/translate.c | 8 +++- target-openrisc/translate.c | 13 + target-ppc/translate.c| 9 +++-- target-s390x/translate.c | 9 +++-- target-sh4/translate.c| 8 +++- target-sparc/translate.c | 8 +++- target-tricore/translate.c| 4 +--- target-unicore32/translate.c | 9 +++-- target-xtensa/translate.c | 7 +++ tcg/tcg.h | 12 19 files changed, 79 insertions(+), 113 deletions(-) diff --git a/target-alpha/translate.c b/target-alpha/translate.c index aa04c60..9c77d46 100644 --- a/target-alpha/translate.c +++ b/target-alpha/translate.c @@ -2790,7 +2790,6 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu, target_ulong pc_start; target_ulong pc_mask; uint32_t insn; -uint16_t *gen_opc_end; CPUBreakpoint *bp; int j, lj = -1; ExitStatus ret; @@ -2798,7 +2797,6 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu, int max_insns; pc_start = tb-pc; -gen_opc_end = tcg_ctx.gen_opc_buf + OPC_MAX_SIZE; ctx.tb = tb; ctx.pc = pc_start; @@ -2839,11 +2837,12 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu, } } if (search_pc) { -j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf; +j = tcg_op_buf_count(); if (lj j) { lj++; -while (lj j) +while (lj j) { tcg_ctx.gen_opc_instr_start[lj++] = 0; +} } tcg_ctx.gen_opc_pc[lj] = ctx.pc; tcg_ctx.gen_opc_instr_start[lj] = 1; @@ -2881,7 +2880,7 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu, or exhaust instruction count, stop generation. 
*/ if (ret == NO_EXIT ((ctx.pc pc_mask) == 0 -|| tcg_ctx.gen_opc_ptr = gen_opc_end +|| tcg_op_buf_full() || num_insns = max_insns || singlestep || ctx.singlestep_enabled)) { @@ -2914,10 +2913,11 @@ static inline void gen_intermediate_code_internal(AlphaCPU *cpu, gen_tb_end(tb, num_insns); if (search_pc) { -j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf; +j = tcg_op_buf_count(); lj++; -while (lj = j) +while (lj = j) { tcg_ctx.gen_opc_instr_start[lj++] = 0; +} } else { tb-size = ctx.pc - pc_start; tb-icount = num_insns; diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c index 10e09bc..a85ca5d 100644 --- a/target-arm/translate-a64.c +++ b/target-arm/translate-a64.c @@ -10899,7 +10899,6 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu, CPUARMState *env = cpu-env; DisasContext dc1, *dc = dc1; CPUBreakpoint *bp; -uint16_t *gen_opc_end; int j, lj; target_ulong pc_start; target_ulong next_page_start; @@ -10910,8 +10909,6 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu, dc-tb = tb; -gen_opc_end = tcg_ctx.gen_opc_buf + OPC_MAX_SIZE; - dc-is_jmp = DISAS_NEXT; dc-pc = pc_start; dc-singlestep_enabled = cs-singlestep_enabled; @@ -10980,7 +10977,7 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu, } if (search_pc) { -j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf; +j = tcg_op_buf_count(); if (lj j) { lj++; while (lj j) { @@ -11030,7 +11027,7 @@ void gen_intermediate_code_internal_a64(ARMCPU *cpu, * ensures prefetch aborts occur at the right place. */ num_insns++; -} while (!dc-is_jmp tcg_ctx.gen_opc_ptr gen_opc_end +} while (!dc-is_jmp !tcg_op_buf_full() !cs-singlestep_enabled !singlestep !dc-ss_active @@ -11101,7 +11098,7 @@ done_generating: } #endif if (search_pc) { -j = tcg_ctx.gen_opc_ptr - tcg_ctx.gen_opc_buf; +j = tcg_op_buf_count(); lj++; while (lj = j) { tcg_ctx.gen_opc_instr_start[lj++] = 0; diff --git a/target-arm/translate.c b/target-arm/translate.c index 4b30698..24658f6 100644 --- a/target-arm/translate.c +++
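The tcg/tcg.h hunk that defines the two new inlines is not part of this excerpt. As a rough illustration of what such accessors abstract away, the sketch below wraps the pointer arithmetic and the "too many ops" test that each translator previously open-coded. The ToyCtx type, the toy_ names and the buffer sizes are assumptions made for this example and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_OPC_BUF_SIZE 64     /* total slots in the toy buffer */
#define TOY_OPC_MAX_SIZE 48     /* stop translating here, leaving headroom */

/* Toy stand-in for the opcode buffer fields of the TCG context. */
typedef struct ToyCtx {
    uint16_t opc_buf[TOY_OPC_BUF_SIZE];
    uint16_t *opc_ptr;          /* next slot to be written */
} ToyCtx;

static inline void toy_ctx_init(ToyCtx *s)
{
    s->opc_ptr = s->opc_buf;
}

/* Number of opcodes emitted so far. */
static inline int toy_op_buf_count(const ToyCtx *s)
{
    return (int)(s->opc_ptr - s->opc_buf);
}

/* Whether the translator should end the block for using too many ops. */
static inline bool toy_op_buf_full(const ToyCtx *s)
{
    return toy_op_buf_count(s) >= TOY_OPC_MAX_SIZE;
}

static inline void toy_emit(ToyCtx *s, uint16_t opc)
{
    assert(toy_op_buf_count(s) < TOY_OPC_BUF_SIZE);
    *s->opc_ptr++ = opc;
}
```

With callers going through the two accessors only, the bookkeeping behind them can later switch from a running pointer to an index without touching every target again.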
Re: [Qemu-devel] [PATCH] block: introduce BDRV_REQUEST_MAX_SECTORS
Am 03.02.2015 um 14:29 schrieb Denis V. Lunev: On 03/02/15 15:12, Peter Lieven wrote: we check and adjust request sizes at several places with sometimes inconsistent checks or default values: INT_MAX INT_MAX BDRV_SECTOR_BITS UINT_MAX BDRV_SECTOR_BITS SIZE_MAX BDRV_SECTOR_BITS This patches introdocues a macro for the maximal allowed sectors per request and uses it at several places. Signed-off-by: Peter Lieven p...@kamp.de --- block.c | 19 --- hw/block/virtio-blk.c | 4 ++-- include/block/block.h | 3 +++ 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/block.c b/block.c index 8272ef9..4e58b35 100644 --- a/block.c +++ b/block.c @@ -2671,7 +2671,7 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset, static int bdrv_check_request(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { -if (nb_sectors 0 || nb_sectors INT_MAX / BDRV_SECTOR_SIZE) { +if (nb_sectors 0 || nb_sectors BDRV_REQUEST_MAX_SECTORS) { return -EIO; } @@ -2758,7 +2758,7 @@ static int bdrv_rw_co(BlockDriverState *bs, int64_t sector_num, uint8_t *buf, .iov_len = nb_sectors * BDRV_SECTOR_SIZE, }; -if (nb_sectors 0 || nb_sectors INT_MAX / BDRV_SECTOR_SIZE) { +if (nb_sectors 0 || nb_sectors BDRV_REQUEST_MAX_SECTORS) { return -EINVAL; } @@ -2826,13 +2826,10 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags) } for (;;) { -nb_sectors = target_sectors - sector_num; +nb_sectors = MIN(target_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS); if (nb_sectors = 0) { return 0; } -if (nb_sectors INT_MAX / BDRV_SECTOR_SIZE) { -nb_sectors = INT_MAX / BDRV_SECTOR_SIZE; -} ret = bdrv_get_block_status(bs, sector_num, nb_sectors, n); if (ret 0) { error_report(error getting block status at sector % PRId64 : %s, @@ -3167,7 +3164,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, BdrvRequestFlags flags) { -if (nb_sectors 0 || nb_sectors (UINT_MAX BDRV_SECTOR_BITS)) { +if (nb_sectors 0 || nb_sectors 
BDRV_REQUEST_MAX_SECTORS) { return -EINVAL; } @@ -3202,8 +3199,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, struct iovec iov = {0}; int ret = 0; -int max_write_zeroes = bs-bl.max_write_zeroes ? - bs-bl.max_write_zeroes : INT_MAX; +int max_write_zeroes = MIN_NON_ZERO(bs-bl.max_write_zeroes, + BDRV_REQUEST_MAX_SECTORS); while (nb_sectors 0 !ret) { int num = nb_sectors; @@ -3458,7 +3455,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, BdrvRequestFlags flags) { -if (nb_sectors 0 || nb_sectors (INT_MAX BDRV_SECTOR_BITS)) { +if (nb_sectors 0 || nb_sectors BDRV_REQUEST_MAX_SECTORS) { return -EINVAL; } @@ -5120,7 +5117,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, return 0; } -max_discard = bs-bl.max_discard ? bs-bl.max_discard : INT_MAX; +max_discard = MIN_NON_ZERO(bs-bl.max_discard, BDRV_REQUEST_MAX_SECTORS); while (nb_sectors 0) { int ret; int num = nb_sectors; diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 8c51a29..1a8a176 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -381,7 +381,7 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb) } max_xfer_len = blk_get_max_transfer_length(mrb-reqs[0]-dev-blk); -max_xfer_len = MIN_NON_ZERO(max_xfer_len, INT_MAX); +max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS); qsort(mrb-reqs, mrb-num_reqs, sizeof(*mrb-reqs), multireq_compare); @@ -447,7 +447,7 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev, uint64_t nb_sectors = size BDRV_SECTOR_BITS; uint64_t total_sectors; -if (nb_sectors INT_MAX) { +if (nb_sectors BDRV_REQUEST_MAX_SECTORS) { return false; } if (sector dev-sector_mask) { diff --git a/include/block/block.h b/include/block/block.h index 3082d2b..25a6d62 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -83,6 +83,9 @@ typedef enum { #define BDRV_SECTOR_SIZE (1ULL BDRV_SECTOR_BITS) #define 
BDRV_SECTOR_MASK ~(BDRV_SECTOR_SIZE - 1) +#define BDRV_REQUEST_MAX_SECTORS MIN(SIZE_MAX BDRV_SECTOR_BITS, \ + INT_MAX BDRV_SECTOR_BITS) + /* * Allocation status flags * BDRV_BLOCK_DATA: data is read from bs-file or another file Reviewed-by: Denis V. Lunev d...@openvz.org On the other hand the limitation to INT_MAX for a request
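The shift operators in the quoted patch were lost in the archive, so the sketch below spells out the arithmetic behind the new limit. It assumes 512-byte sectors (a sector shift of 9) and uses local names (SECTOR_BITS, REQUEST_MAX_SECTORS, clamp_request, count_chunks) rather than the QEMU identifiers; the helpers are hypothetical and only show how a caller might cap a request the way the bdrv_make_zero() hunk does.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SECTOR_BITS 9                         /* assumed: 512-byte sectors */
#define SECTOR_SIZE (1ULL << SECTOR_BITS)
#define MIN(a, b)   ((a) < (b) ? (a) : (b))

/* Largest sector count whose byte length fits in both size_t and int. */
#define REQUEST_MAX_SECTORS MIN(SIZE_MAX >> SECTOR_BITS, \
                                INT_MAX >> SECTOR_BITS)

/* Cap the number of sectors handled by a single request. */
static int64_t clamp_request(int64_t remaining_sectors)
{
    int64_t max = (int64_t)REQUEST_MAX_SECTORS;

    return remaining_sectors < max ? remaining_sectors : max;
}

/* Number of requests needed to cover total_sectors with that cap. */
static int count_chunks(int64_t total_sectors)
{
    int chunks = 0;

    while (total_sectors > 0) {
        total_sectors -= clamp_request(total_sectors);
        chunks++;
    }
    return chunks;
}
```

Expressing the cap in sectors means the byte length of any accepted request (sectors times 512) still fits in an int, which is what the scattered INT_MAX-based checks were each trying to guarantee.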
Re: [Qemu-devel] [PATCH v2 0/5] vhost-scsi: support to assign boot order
On 2015/1/29 15:08, Gonglei (Arei) wrote:
From: Gonglei arei.gong...@huawei.com

QEMU does not yet provide a bootindex property for the vhost-scsi device, so we cannot currently assign a boot order to it. Some users need this in certain scenarios, and this series implements it on the QEMU side. Because QEMU accepts only a wwpn argument for vhost-scsi, we cannot assign a tpgt. That is, the tpg is transparent to QEMU: QEMU does not know which tpg can boot, and neither does the vhost-scsi driver module for a given wwpn. For now, we assume that only the first tpg can boot, and add a boot_tpgt property that defaults to 0. Users can of course pass another valid value on the QEMU command line.

Ping...

v2 -> v1: (Thanks to Paolo's suggestion)
- change calling qdev_get_own_fw_dev_path_from_handler in get_boot_devices_list, and convert non-NULL suffixes to implementations of FWPathProvider in Patch 1. (Paolo)
- add a boot_tpgt property for vhost-scsi in Patch 4. (Paolo)
- remove the ioctl call in Patch 4, because the kernel patch hasn't been accepted.

kernel patch: [PATCH] vhost-scsi: introduce an ioctl to get the minimum tpgt
http://news.gmane.org/gmane.comp.emulators.kvm.devel

Gonglei (5):
  qdev: support to get a device firmware path directly
  vhost-scsi: add bootindex property
  vhost-scsi: realize the TYPE_FW_PATH_PROVIDER interface
  vhost-scsi: add a property for booting
  vhost-scsi: set the bootable value of channel/target/lun

 bootdevice.c                    | 31 +--
 hw/core/qdev.c                  | 7 +++
 hw/scsi/vhost-scsi.c            | 35 +++
 hw/virtio/virtio-pci.c          | 2 ++
 include/hw/qdev-core.h          | 1 +
 include/hw/virtio/vhost-scsi.h  | 5 +
 include/hw/virtio/virtio-scsi.h | 1 +
 7 files changed, 68 insertions(+), 14 deletions(-)
Re: [Qemu-devel] [Qemu-ppc] [RFC] pseries: Enable in-kernel H_LOGICAL_CI_{LOAD, STORE} implementations
David Gibson da...@gibson.dropbear.id.au writes: qemu currently implements the hypercalls H_LOGICAL_CI_LOAD and H_LOGICAL_CI_STORE as PAPR extensions. These are used by the SLOF firmware for IO, because performing cache inhibited MMIO accesses with the MMU off (real mode) is very awkward on POWER. This approach breaks when SLOF needs to access IO devices implemented within KVM instead of in qemu. The simplest example would be virtio-blk using an iothread, because the iothread / dataplane mechanism relies on an in-kernel implementation of the virtio queue notification MMIO. To fix this, an in-kernel implementation of these hypercalls has been made, however, the hypercalls still need to be enabled from qemu. This performs the necessary calls to do so. Signed-off-by: David Gibson da...@gibson.dropbear.id.au Reviewed-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
Re: [Qemu-devel] [PATCH 1/3] nbd: Drop BDS backpointer
On 02/02/2015 22:40, Max Reitz wrote: Before this patch, the opaque pointer in an NBD BDS points to a BDRVNBDState, which contains an NbdClientSession object, which in turn contains a pointer to the BDS. This pointer may become invalid due to bdrv_swap(), so drop it, and instead pass the BDS directly to the nbd-client.c functions which then retrieve the NbdClientSession object from there. Looks good, but please change function names from nbd_client_session_foo to nbd_client_foo or even just nbd_foo if they do not take an NbdClientSession* as the first parameter. Thanks, Paolo Signed-off-by: Max Reitz mre...@redhat.com --- block/nbd-client.c | 95 -- block/nbd-client.h | 20 ++-- block/nbd.c| 37 - 3 files changed, 73 insertions(+), 79 deletions(-) diff --git a/block/nbd-client.c b/block/nbd-client.c index 28bfb62..4ede714 100644 --- a/block/nbd-client.c +++ b/block/nbd-client.c @@ -43,20 +43,23 @@ static void nbd_recv_coroutines_enter_all(NbdClientSession *s) } } -static void nbd_teardown_connection(NbdClientSession *client) +static void nbd_teardown_connection(BlockDriverState *bs) { +NbdClientSession *client = nbd_get_client_session(bs); + /* finish any pending coroutines */ shutdown(client-sock, 2); nbd_recv_coroutines_enter_all(client); -nbd_client_session_detach_aio_context(client); +nbd_client_session_detach_aio_context(bs); closesocket(client-sock); client-sock = -1; } static void nbd_reply_ready(void *opaque) { -NbdClientSession *s = opaque; +BlockDriverState *bs = opaque; +NbdClientSession *s = nbd_get_client_session(bs); uint64_t i; int ret; @@ -89,28 +92,29 @@ static void nbd_reply_ready(void *opaque) } fail: -nbd_teardown_connection(s); +nbd_teardown_connection(bs); } static void nbd_restart_write(void *opaque) { -NbdClientSession *s = opaque; +BlockDriverState *bs = opaque; -qemu_coroutine_enter(s-send_coroutine, NULL); +qemu_coroutine_enter(nbd_get_client_session(bs)-send_coroutine, NULL); } -static int nbd_co_send_request(NbdClientSession *s, -struct 
nbd_request *request, -QEMUIOVector *qiov, int offset) +static int nbd_co_send_request(BlockDriverState *bs, + struct nbd_request *request, + QEMUIOVector *qiov, int offset) { +NbdClientSession *s = nbd_get_client_session(bs); AioContext *aio_context; int rc, ret; qemu_co_mutex_lock(s-send_mutex); s-send_coroutine = qemu_coroutine_self(); -aio_context = bdrv_get_aio_context(s-bs); +aio_context = bdrv_get_aio_context(bs); aio_set_fd_handler(aio_context, s-sock, - nbd_reply_ready, nbd_restart_write, s); + nbd_reply_ready, nbd_restart_write, bs); if (qiov) { if (!s-is_unix) { socket_set_cork(s-sock, 1); @@ -129,7 +133,7 @@ static int nbd_co_send_request(NbdClientSession *s, } else { rc = nbd_send_request(s-sock, request); } -aio_set_fd_handler(aio_context, s-sock, nbd_reply_ready, NULL, s); +aio_set_fd_handler(aio_context, s-sock, nbd_reply_ready, NULL, bs); s-send_coroutine = NULL; qemu_co_mutex_unlock(s-send_mutex); return rc; @@ -195,10 +199,11 @@ static void nbd_coroutine_end(NbdClientSession *s, } } -static int nbd_co_readv_1(NbdClientSession *client, int64_t sector_num, +static int nbd_co_readv_1(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int offset) { +NbdClientSession *client = nbd_get_client_session(bs); struct nbd_request request = { .type = NBD_CMD_READ }; struct nbd_reply reply; ssize_t ret; @@ -207,7 +212,7 @@ static int nbd_co_readv_1(NbdClientSession *client, int64_t sector_num, request.len = nb_sectors * 512; nbd_coroutine_start(client, request); -ret = nbd_co_send_request(client, request, NULL, 0); +ret = nbd_co_send_request(bs, request, NULL, 0); if (ret 0) { reply.error = -ret; } else { @@ -218,15 +223,16 @@ static int nbd_co_readv_1(NbdClientSession *client, int64_t sector_num, } -static int nbd_co_writev_1(NbdClientSession *client, int64_t sector_num, +static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int offset) { +NbdClientSession *client = 
nbd_get_client_session(bs); struct nbd_request request = { .type = NBD_CMD_WRITE }; struct nbd_reply reply; ssize_t ret; -if (!bdrv_enable_write_cache(client-bs) +if (!bdrv_enable_write_cache(bs) (client-nbdflags
Re: [Qemu-devel] [PATCH 3/3] iotests: Add test for drive-mirror with NBD target
On 02/02/2015 22:40, Max Reitz wrote: When the drive-mirror block job is completed, it will call bdrv_swap() on the source and the target BDS; this should obviously not result in a segmentation fault. Signed-off-by: Max Reitz mre...@redhat.com --- tests/qemu-iotests/094 | 81 ++ tests/qemu-iotests/094.out | 11 +++ tests/qemu-iotests/group | 1 + 3 files changed, 93 insertions(+) create mode 100755 tests/qemu-iotests/094 create mode 100644 tests/qemu-iotests/094.out diff --git a/tests/qemu-iotests/094 b/tests/qemu-iotests/094 new file mode 100755 index 000..27a2be2 --- /dev/null +++ b/tests/qemu-iotests/094 @@ -0,0 +1,81 @@ +#!/bin/bash +# +# Test case for drive-mirror to NBD (especially bdrv_swap() on NBD BDS) +# +# Copyright (C) 2015 Red Hat, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +# creator +owner=mre...@redhat.com + +seq=$(basename $0) +echo QA output created by $seq + +here=$PWD +tmp=/tmp/$$ +status=1 # failure is the default! + +trap exit \$status 0 1 2 3 15 + +# get standard environment, filters and checks +. ./common.rc +. ./common.filter +. 
./common.qemu + +_supported_fmt generic +_supported_proto nbd +_supported_os Linux +_unsupported_imgopts subformat=monolithicFlat subformat=twoGbMaxExtentFlat + +_make_test_img 64M +$QEMU_IMG create -f $IMGFMT $TEST_DIR/source.$IMGFMT 64M | _filter_img_create + +_launch_qemu -drive if=none,id=src,file=$TEST_DIR/source.$IMGFMT,format=raw \ + -nodefaults + +_send_qemu_cmd $QEMU_HANDLE \ +{'execute': 'qmp_capabilities'} \ +'return' + +# 'format': 'nbd' is not actually correct, but this is probably the only way +# to test bdrv_swap() on an NBD BDS +_send_qemu_cmd $QEMU_HANDLE \ +{'execute': 'drive-mirror', + 'arguments': {'device': 'src', +'target': '$TEST_IMG', +'format': 'nbd', +'sync':'full', +'mode':'existing'}} \ +'BLOCK_JOB_READY' + +_send_qemu_cmd $QEMU_HANDLE \ +{'execute': 'block-job-complete', + 'arguments': {'device': 'src'}} \ +'BLOCK_JOB_COMPLETE' + +_send_qemu_cmd $QEMU_HANDLE \ +{'execute': 'quit'} \ +'return' + +wait=1 _cleanup_qemu + +_cleanup_test_img +rm -f $TEST_DIR/source.$IMGFMT + +# success, all done +echo '*** done' +rm -f $seq.full +status=0 diff --git a/tests/qemu-iotests/094.out b/tests/qemu-iotests/094.out new file mode 100644 index 000..b66dc07 --- /dev/null +++ b/tests/qemu-iotests/094.out @@ -0,0 +1,11 @@ +QA output created by 094 +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 +Formatting 'TEST_DIR/source.IMGFMT', fmt=IMGFMT size=67108864 +{return: {}} +{return: {}} +{timestamp: {seconds: TIMESTAMP, microseconds: TIMESTAMP}, event: BLOCK_JOB_READY, data: {device: src, len: 67108864, offset: 67108864, speed: 0, type: mirror}} +{return: {}} +{timestamp: {seconds: TIMESTAMP, microseconds: TIMESTAMP}, event: BLOCK_JOB_COMPLETED, data: {device: src, len: 67108864, offset: 67108864, speed: 0, type: mirror}} +{return: {}} +{timestamp: {seconds: TIMESTAMP, microseconds: TIMESTAMP}, event: SHUTDOWN} +*** done diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group index 4b2b93b..6e2447a 100644 --- a/tests/qemu-iotests/group +++ 
b/tests/qemu-iotests/group @@ -99,6 +99,7 @@ 090 rw auto quick 091 rw auto 092 rw auto quick +094 rw auto quick 095 rw auto quick 097 rw auto backing 098 rw auto backing quick Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Re: [Qemu-devel] [PATCH v2 0/5] Common unplug and unplug request cb for memory and CPU hot-unplug
Hi,

If you can push the patchset to a branch on GitHub, it will be convenient for others to test it.

On Wed, Jan 28, 2015 at 3:45 PM, Zhu Guihua zhugh.f...@cn.fujitsu.com wrote:
Memory and CPU hot unplug are both asynchronous procedures. When the unplug operation happens, the unplug request cb is called first. When the guest OS has finished handling the unplug, the unplug cb is called to do the real removal of the device. Both need pc-machine, piix4 and ich9 unplug and unplug request cbs. So this patchset introduces these common functions as part 1, and memory and CPU hot-unplug will come soon as parts 2 and 3.

This patch-set is based on QEMU 2.2.

v2:
- Commit message changes

Tang Chen (5):
  acpi, pc: Add hotunplug request cb for pc machine.
  acpi, ich9: Add hotunplug request cb for ich9.
  acpi, pc: Add unplug cb for pc machine.
  acpi, ich9: Add unplug cb for ich9.
  acpi, piix4: Add unplug cb for piix4.

 hw/acpi/ich9.c         | 14 ++
 hw/acpi/piix4.c        | 8
 hw/i386/pc.c           | 16
 hw/isa/lpc_ich9.c      | 14 --
 include/hw/acpi/ich9.h | 4
 5 files changed, 54 insertions(+), 2 deletions(-)
--
1.9.3

--
Regards,
Zhi Yong Wu
Re: [Qemu-devel] [PATCH 13/19] libqos/ahci: add ahci command size setters
On 02/02/2015 22:09, John Snow wrote: In this case, only the command header had a utility written for it to flip the bits for me. This is part of the FIS, instead, which has no explicit flip-on-write mechanism inside of commit. So, it's correct, but not terribly consistent. I can write a fis write helper to make this more internally consistent about when we handle it for the user and when we don't. Please do. :) Paolo
Re: [Qemu-devel] [PATCH v2 1/2] configure: Default to enable module build
On 03/02/2015 02:29, Fam Zheng wrote: Peter reported that module linking fails on ARM host: LINK block/curl.so /usr/bin/ld: block/curl.o: relocation R_ARM_THM_MOVW_ABS_NC against `__stack_chk_guard' can not be used when making a shared object; recompile with -fPIC block/curl.o: could not read symbols: Bad value collect2: error: ld returned 1 exit status I don't see how -fPIC is missed in ARM host :( Does the below patch fix this? I haven't yet tested on ARM host, hope to do so some time this week. Paolo
Re: [Qemu-devel] [PATCH 2/3] iotests: Add wait functionality to _cleanup_qemu
On 02/02/2015 22:40, Max Reitz wrote:
The qemu process does not always need to be killed, just waiting for it can be fine, too. This introduces a way to do so.

Signed-off-by: Max Reitz mre...@redhat.com
---
 tests/qemu-iotests/common.qemu | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 8e618b5..4e1996c 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -187,13 +187,23 @@ function _launch_qemu()
 # Silenty kills the QEMU process
+#
+# If $wait is set to anything other than the empty string, the process will not
+# be killed but only waited for, and any output will be forwarded to stdout. If
+# $wait is empty, the process will be killed and all output will be suppressed.
 function _cleanup_qemu()
 {
     # QEMU_PID[], QEMU_IN[], QEMU_OUT[] all use same indices
     for i in "${!QEMU_OUT[@]}"
     do
-        kill -KILL ${QEMU_PID[$i]} 2>/dev/null
+        if [ -z "${wait}" ]; then
+            kill -KILL ${QEMU_PID[$i]} 2>/dev/null
+        fi
         wait ${QEMU_PID[$i]} 2>/dev/null # silent kill
+        if [ -n "${wait}" ]; then
+            cat <&${QEMU_OUT[$i]} | _filter_testdir | _filter_qemu \
+                                  | _filter_qemu_io | _filter_qmp
+        fi
         rm -f "${QEMU_FIFO_IN}_${i}" "${QEMU_FIFO_OUT}_${i}"
         eval "exec ${QEMU_IN[$i]}<&-"   # close file descriptors
         eval "exec ${QEMU_OUT[$i]}<&-"

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Re: [Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support
Hi,

Can you push the patchset to a branch on GitHub? It will be convenient for others to test it.

On Wed, Jan 14, 2015 at 3:44 PM, Zhu Guihua zhugh.f...@cn.fujitsu.com wrote:
This series is based on chen fan's previous i386 cpu hot remove patchset:
https://lists.nongnu.org/archive/html/qemu-devel/2013-12/msg04266.html

By implementing the ACPI standard method _EJ0 in the ACPI table, after the guest OS removes an online vCPU, the firmware stores the removed-CPU bitmap to QEMU, so QEMU knows to notify the assigned vCPU to exit. Meanwhile, introduce the QOM command 'device_del' to remove the vCPU from QEMU itself.

The whole work is based on the new hot plug/unplug framework: the unplug request callback does the pre-check and sends the request, and the unplug callback does the removal handling.

This series depends on tangchen's common hot plug/unplug enhance patchset.
[RESEND PATCH v1 0/5] Common unplug and unplug request cb for memory and CPU hot-unplug
https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg00429.html

This is the second half of the previous series:
[RFC V2 00/10] cpu: add device_add foo-x86_64-cpu and i386 cpu hot remove support
https://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04779.html

If you want to test the series, you need to apply the 'device_add foo-x86_64-cpu' patchset first:
[PATCH v3 0/7] cpu: add device_add foo-x86_64-cpu support
https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg01552.html

---
Changelog since v1:
- rebase on the latest version.
- delete patch i386/cpu: add instance finalize callback, and put it into patchset [PATCH v3 0/6] cpu: add device_add foo-x86_64-cpu support.

Changelog since RFC:
- split the i386 cpu hot remove into a single thread.
- replaced apic_no with apic_id, and the related stuff as well, to make it work with arbitrary CPU hotadd.
- add the icc_device_unrealize callback to handle apic unrealize.
- rework on the new hot plug/unplug platform.
---

Chen Fan (2):
  x86: add x86_cpu_unrealizefn() for cpu apic remove
  cpu hotplug: implement function cpu_status_write() for vcpu ejection

Gu Zheng (5):
  acpi/cpu: add cpu hot unplug request callback function
  acpi/piix4: add cpu hot unplug callback support
  acpi/ich9: add cpu hot unplug support
  pc: add cpu hot unplug callback support
  cpus: reclaim allocated vCPU objects

Zhu Guihua (4):
  acpi/piix4: add cpu hot unplug request callback support
  acpi/ich9: add cpu hot unplug request callback support
  pc: add cpu hot unplug request callback support
  acpi/cpu: add cpu hot unplug callback function

 cpus.c                            | 44
 hw/acpi/cpu_hotplug.c             | 88 ---
 hw/acpi/ich9.c                    | 17 ++--
 hw/acpi/piix4.c                   | 12 +-
 hw/core/qdev.c                    | 2 +-
 hw/cpu/icc_bus.c                  | 11 +
 hw/i386/acpi-dsdt-cpu-hotplug.dsl | 6 ++-
 hw/i386/kvm/apic.c                | 8
 hw/i386/pc.c                      | 62 +--
 hw/intc/apic.c                    | 10 +
 hw/intc/apic_common.c             | 21 ++
 include/hw/acpi/cpu_hotplug.h     | 8
 include/hw/cpu/icc_bus.h          | 1 +
 include/hw/i386/apic_internal.h   | 1 +
 include/hw/qdev-core.h            | 1 +
 include/qom/cpu.h                 | 9
 include/sysemu/kvm.h              | 1 +
 kvm-all.c                         | 57 -
 target-i386/cpu.c                 | 46
 19 files changed, 378 insertions(+), 27 deletions(-)
--
1.9.3

--
Regards,
Zhi Yong Wu
Re: [Qemu-devel] [question] the patch which affect performance of virtio-scsi
On 03/02/2015 03:31, Wangting (Kathy) wrote:
Hi Paolo,

Recently I tested IO performance with virtio-scsi, and found that the patch "virtio-scsi: add support for the any_layout feature" affects IO performance for a 4KB, iodepth 32 sequential read workload. Why were cdb and sense removed from the structs VirtIOSCSICmdReq and VirtIOSCSICmdResp?

Because I could not find any other way to implement ANY_LAYOUT, and virtio 1.0 requires that feature. The performance however was already improved in commit faf1e1f (virtio-scsi: Optimize virtio_scsi_init_req, 2014-09-16).

Paolo

How do you view the performance impact of these changes? Although the latest version of qemu can improve performance through read merging, I think the effect of this patch cannot be ignored.
Re: [Qemu-devel] [RESEND PATCH v1 00/13] QEmu memory hot unplug support.
Hi,

Can you push the patchset to a branch on GitHub? It will be convenient for others to test it.

On Thu, Jan 8, 2015 at 9:06 AM, Tang Chen tangc...@cn.fujitsu.com wrote:
Memory hot unplug is an asynchronous procedure. When the unplug operation happens, the unplug request cb is called first. When the guest OS has finished handling the unplug, the unplug cb is called to do the real removal of the device.

This patch-set is based on QEMU 2.2.

This series depends on the following patchset.
[PATCH] Common unplug and unplug request cb for memory and CPU hot-unplug.
https://www.mail-archive.com/qemu-devel@nongnu.org/msg272745.html

Hu Tao (2):
  acpi, piix4: Add memory hot unplug request support for piix4.
  pc, acpi bios: Add memory hot unplug interface.

Tang Chen (11):
  acpi, mem-hotplug: Use PC_DIMM_SLOT_PROP in acpi_memory_plug_cb().
  acpi, mem-hotplug: Add acpi_memory_get_slot_status_descriptor() to get MemStatus.
  acpi, mem-hotplug: Add acpi_memory_hotplug_sci() to rise sci for memory hotplug.
  acpi, mem-hotplug: Add unplug request cb for memory device.
  acpi, ich9: Add memory hot unplug request support for ich9.
  pc-dimm: Add memory hot unplug request support for pc-dimm.
  acpi, mem-hotplug: Add unplug cb for memory device.
  acpi, piix4: Add memory hot unplug support for piix4.
  acpi, ich9: Add memory hot unplug support for ich9.
  pc-dimm: Add memory hot unplug support for pc-dimm.
  acpi: Add hardware implementation for memory hot unplug.

 docs/specs/acpi_mem_hotplug.txt   | 8 +++-
 hw/acpi/ich9.c                    | 20 ++--
 hw/acpi/memory_hotplug.c          | 97 ---
 hw/acpi/piix4.c                   | 18 ++--
 hw/core/qdev.c                    | 2 +-
 hw/i386/acpi-dsdt-mem-hotplug.dsl | 11 -
 hw/i386/pc.c                      | 53 +++--
 hw/i386/ssdt-mem.dsl              | 5 ++
 include/hw/acpi/memory_hotplug.h  | 6 +++
 include/hw/acpi/pc-hotplug.h      | 2 +
 include/hw/qdev-core.h            | 1 +
 11 files changed, 192 insertions(+), 31 deletions(-)
--
1.8.4.2

--
Regards,
Zhi Yong Wu
Re: [Qemu-devel] [question] the patch which affect performance of virtio-scsi
On 03/02/2015 03:56, Wangting (Kathy) wrote:
Sorry, I find that the patch "virtio-scsi: Optimize virtio_scsi_init_req" can solve this problem.

Great that you could confirm that. :)

By the way, can you tell me the reason for the change to cdb and sense?

cdb and sense are variable-size items. ANY_LAYOUT support changed VirtIOSCSIReq: instead of having a pointer to the request, it copies the request from guest memory into VirtIOSCSIReq. This is required because the request might not be contiguous in guest memory.

And because the request and response headers (e.g. VirtIOSCSICmdReq and VirtIOSCSICmdResp) are included by value in VirtIOSCSIReq, the variable-sized fields have to be treated specially. Only one of them can remain in VirtIOSCSIReq, because you cannot have a flexible array member (e.g. uint8_t sense[];) in the middle of a struct.

cdb is always used, so it is chosen for the variable-sized part of VirtIOSCSIReq: cdb was simply moved from VirtIOSCSICmdReq to VirtIOSCSIReq.

Requests that complete with sense data, on the other hand, are not a fast path. Hence sense is retrieved from the SCSIRequest, and virtio_scsi_command_complete copies it into the guest buffer via scsi_req_get_sense + qemu_iovec_from_buf.

Paolo
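For readers following along, the flexible-array-member constraint described above can be illustrated with a minimal sketch. The struct and function names below (ExampleReq, example_req_new, and so on) are hypothetical simplifications made up for this note, not the actual QEMU definitions; the only point is that when both headers are embedded by value, a single variable-sized field can follow them, and it has to be the last member.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size request header (stands in for a command
 * request header once the CDB has been moved out of it). */
typedef struct ExampleCmdReqHeader {
    uint8_t lun[8];
    uint64_t tag;
} ExampleCmdReqHeader;

/* Hypothetical fixed-size response header; sense data is NOT stored here. */
typedef struct ExampleCmdRespHeader {
    uint32_t sense_len;
    uint8_t status;
} ExampleCmdRespHeader;

/* Hypothetical request object. Both headers are embedded by value, so only
 * one variable-sized field fits, and it must be the final member. */
typedef struct ExampleReq {
    ExampleCmdRespHeader resp;
    ExampleCmdReqHeader req;
    size_t cdb_size;
    uint8_t cdb[];              /* flexible array member: must come last */
} ExampleReq;

/* Allocate a request with room for cdb_size bytes of CDB after the struct
 * and copy the CDB into it. Returns NULL on allocation failure. */
static ExampleReq *example_req_new(const uint8_t *cdb, size_t cdb_size)
{
    ExampleReq *r = calloc(1, sizeof(*r) + cdb_size);

    if (!r) {
        return NULL;
    }
    r->cdb_size = cdb_size;
    memcpy(r->cdb, cdb, cdb_size);
    return r;
}
```

Declaring a second flexible array (say, for sense) anywhere before cdb would be rejected by the compiler, which is why the less frequently needed field has to be fetched through a separate path.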
Re: [Qemu-devel] [PATCH 2/2] bootdevice: add check in restore_boot_order()
On 2015/2/3 15:49, Markus Armbruster wrote:
You're right. pc.c's set_boot_dev() fails when its boot order argument is invalid.

The boot order interface is crap, because it makes detecting configuration errors early hard. Two solutions:

A. It may be hard, but not too hard for the determined

1. If once is given, register reset handler to restore boot order.
2. Pass the normal boot order to machine creation. Should fail when the normal boot order is invalid.
3. If once is given, set it with qemu_boot_set(). Fails when the once boot order is invalid.
4. Start the machine.
5. On reset, the reset handler calls qemu_boot_set() to restore boot order. Should never fail.

What about the below patch?

diff --git a/vl.c b/vl.c
index 983259b..7d37191 100644
--- a/vl.c
+++ b/vl.c
@@ -126,6 +126,7 @@ int main(int argc, char **argv)
 static const char *data_dir[16];
 static int data_dir_idx;
+const char *once = NULL;
 const char *bios_name = NULL;
 enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB;
 DisplayType display_type = DT_DEFAULT;
@@ -4046,7 +4047,7 @@ int main(int argc, char **argv, char **envp)
     opts = qemu_opts_find(qemu_find_opts("boot-opts"), NULL);
     if (opts) {
         char *normal_boot_order;
-        const char *order, *once;
+        const char *order;
         Error *local_err = NULL;
         order = qemu_opt_get(opts, "order");
@@ -4067,7 +4068,6 @@ int main(int argc, char **argv, char **envp)
                 exit(1);
             }
             normal_boot_order = g_strdup(boot_order);
-            boot_order = once;
             qemu_register_reset(restore_boot_order, normal_boot_order);
         }
@@ -4246,6 +4246,15 @@ int main(int argc, char **argv, char **envp)
     net_check_clients();
+    if (once) {
+        Error *local_err = NULL;
+        qemu_boot_set(once, &local_err);
+        if (local_err) {
+            error_report("%s", error_get_pretty(local_err));
+            exit(1);
+        }
+    }
+

Regards,
-Gonglei

B. Fix the crappy interface

Separate parameter validation from the actual action. Only validation may fail. Validate before starting the guest.

* validate_bootdevices() fails
  Should never happen, because we've called it in main() already, treating failure as fatal error.

Yes.

* boot_set_handler is null
  MachineClass method init() may set this. main() could *easily* test whether it did! If it didn't, and -boot once is given, error out. Similar checks exist already, e.g. drive_check_orphaned(), net_check_clients(). They only warn, but that's detail.

I agree, we just need to report the error message.

Regards,
-Gonglei
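As a rough illustration of option B (validation separated from the action, with only validation allowed to fail), here is a minimal standalone sketch. All names (example_boot_order_is_valid, example_boot_order_apply, example_setup_boot) are hypothetical and are not QEMU functions; the validity rule used here (letters 'a' to 'p', each at most once) is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Validation step: the only place that may fail.
 * Assumed rule for this sketch: letters 'a'..'p', each at most once. */
static bool example_boot_order_is_valid(const char *order)
{
    unsigned seen = 0;
    const char *p;

    if (order == NULL || *order == '\0') {
        return false;
    }
    for (p = order; *p; p++) {
        unsigned bit;

        if (*p < 'a' || *p > 'p') {
            return false;
        }
        bit = 1u << (*p - 'a');
        if (seen & bit) {
            return false;       /* duplicate device letter */
        }
        seen |= bit;
    }
    return true;
}

/* At most 16 distinct letters plus the terminating NUL. */
static char example_current_order[17];

/* Action step: only ever called with validated input, so it has no
 * failure path. A reset handler could call this unconditionally. */
static void example_boot_order_apply(const char *order)
{
    strncpy(example_current_order, order, sizeof(example_current_order) - 1);
    example_current_order[sizeof(example_current_order) - 1] = '\0';
}

/* Startup flow: validate both orders before the guest starts, then apply.
 * Returns 0 on success, -1 if either order is invalid (nothing applied). */
static int example_setup_boot(const char *normal, const char *once)
{
    if (!example_boot_order_is_valid(normal)) {
        return -1;
    }
    if (once && !example_boot_order_is_valid(once)) {
        return -1;
    }
    example_boot_order_apply(once ? once : normal);
    return 0;
}
```

With this split, restoring the normal order on reset amounts to calling the apply step again with a string that was already checked at startup, so the reset path needs no error handling.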
Re: [Qemu-devel] [PATCH 17/19] qtest/ahci: Add a macro bootup routine
On 02/02/2015 22:12, John Snow wrote: It comes in handy later for testing migration so I don't have to do a lot of boilerplate for each instance, though it is just a convenience subroutine with no logic of its own. I like to cut down on boilerplate as much as possible to expose the logic of the test as much as possible. Have a suggestion for a better name, or are you very adamant about culling it? I'm adamant about culling it because I don't have a suggestion for a better name. In the long run, I think we should just have a qos_boot function that does everything including PCI scanning, mapping BARs and initializing devices. But we're of course very far from that. Paolo
Re: [Qemu-devel] [PATCH 0/4] nbd: iotest fixes and error message improvement
On 27.01.2015 at 03:02, Max Reitz wrote:
This series is a follow-up to my previous patch "iotests: Specify format for qemu-nbd" and as such relies on it.

The first three patches of this series fix the qemu-iotests so they once again pass when using NBD.

The fourth patch of this series improves NBD's error message for establishing connections, especially if the server's and the client's NBD versions differ (which, until now, was simply "Bad magic received").

Thanks, applied to the block branch.

Kevin
Re: [Qemu-devel] [RFC PATCH v8 11/21] replay: recording and replaying clock ticks
From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 03/02/2015 11:51, Pavel Dovgaluk wrote: From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 22/01/2015 09:52, Pavel Dovgalyuk wrote: Clock ticks are considered as the sources of non-deterministic data for virtual machine. This patch implements saving the clock values when they are acquired (virtual, host clock, rdtsc, and some other timers). When replaying the execution corresponding values are read from log and transfered to the module, which wants to read the values. Such a design required the clock polling to be synchronized. Sometimes it is not true - e.g. when timeouts for timer lists are checked. In this case we use a cached value of the clock, passing it to the client code. Signed-off-by: Pavel Dovgalyuk pavel.dovga...@ispras.ru --- cpus.c |3 +- include/qemu/timer.h | 10 + qemu-timer.c |7 ++-- replay/Makefile.objs |1 + replay/replay-internal.h | 13 +++ replay/replay-time.c | 84 ++ replay/replay.h | 25 ++ stubs/replay.c |9 + 8 files changed, 147 insertions(+), 5 deletions(-) create mode 100755 replay/replay-time.c diff --git a/cpus.c b/cpus.c index 8787277..01d89aa 100644 --- a/cpus.c +++ b/cpus.c @@ -353,7 +353,8 @@ static void icount_warp_rt(void *opaque) seqlock_write_lock(timers_state.vm_clock_seqlock); if (runstate_is_running()) { -int64_t clock = cpu_get_clock_locked(); +int64_t clock = REPLAY_CLOCK(REPLAY_CLOCK_VIRTUAL_RT, + cpu_get_clock_locked()); int64_t warp_delta; warp_delta = clock - vm_clock_warp_start; diff --git a/include/qemu/timer.h b/include/qemu/timer.h index 0666920..0c2472c 100644 --- a/include/qemu/timer.h +++ b/include/qemu/timer.h @@ -4,6 +4,7 @@ #include qemu/typedefs.h #include qemu-common.h #include qemu/notify.h +#include replay/replay.h /* timers */ @@ -760,6 +761,8 @@ int64_t cpu_icount_to_ns(int64_t icount); /***/ /* host CPU ticks (if available) */ +#define cpu_get_real_ticks cpu_get_real_ticks_impl + #if defined(_ARCH_PPC) static inline int64_t cpu_get_real_ticks(void) @@ 
-913,6 +916,13 @@ static inline int64_t cpu_get_real_ticks (void) } #endif +#undef cpu_get_real_ticks + +static inline int64_t cpu_get_real_ticks(void) cpu_get_real_ticks should never be used. Please instead wrap cpu_get_ticks() with REPLAY_CLOCK. I don't quite understand this comment. Do you mean that I should move REPLAY_CLOCK to the cpu_get_real_ticks usages instead of it's implementation? Only to the cpu_get_ticks usage. The others are okay. cpu_get_ticks cannot call cpu_get_real_ticks in icount mode. And other functions can. Then we should put REPLAY_CLOCK into those functions? +/*! Reads next clock value from the file. +If clock kind read from the file is different from the parameter, +the value is not used. +If the parameter is -1, the clock value is read to the cache anyway. */ In what case could the clock kind not match? It was used in full version which had to skip clock from the log while loading the VM state. So can it be removed for now? I think it can. Pavel Dovgalyuk
Re: [Qemu-devel] [PATCH 1/3] softfloat: Expand out the STATUS_PARAM macro
On 2 February 2015 at 21:37, Richard Henderson r...@twiddle.net wrote: On 02/02/2015 12:31 PM, Peter Maydell wrote: -void float_raise( int8 flags STATUS_PARAM ) +void float_raise(int8 flags , float_status *status) Extra space before comma. Thanks, fixed. I don't propose to send out a respin just for that. -- PMM
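For context, a small sketch of what expanding the macro means in practice. The macro definitions and the reduced float_status struct below are written from memory of the softfloat headers of that era and should be read as assumptions; the function names float_raise_old and float_raise_new are hypothetical and exist only to put both spellings side by side.

```c
#include <stdint.h>

/* Reduced stand-in for the softfloat status structure (assumption:
 * only the flags field matters for this illustration). */
typedef struct float_status {
    uint8_t float_exception_flags;
} float_status;

/* Macro style: the trailing parameter is hidden inside a macro,
 * including its leading comma. */
#define STATUS_PARAM , float_status *status
#define STATUS(field) status->field

/* Before: the signature does not show that a status pointer is passed. */
static void float_raise_old(int8_t flags STATUS_PARAM)
{
    STATUS(float_exception_flags) |= flags;
}

/* After expanding STATUS_PARAM: the parameter is spelled out, so tools
 * and readers see an ordinary two-argument function. */
static void float_raise_new(int8_t flags, float_status *status)
{
    STATUS(float_exception_flags) |= flags;
}
```

Both versions compile to the same thing; the change only affects readability, which is also why a stray space before the comma is easy to introduce when expanding mechanically.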
Re: [Qemu-devel] [PATCH v2 0/5] vhost-scsi: support to assign boot order
On 2015/2/3 19:11, Paolo Bonzini wrote:
On 03/02/2015 09:55, Gonglei wrote:
On 2015/1/29 15:08, Gonglei (Arei) wrote:
From: Gonglei arei.gong...@huawei.com

QEMU does not yet provide a bootindex property for the vhost-scsi device, so we cannot currently assign a boot order to it. Some users need this in certain scenarios, and this series implements it on the QEMU side. Because QEMU accepts only a wwpn argument for vhost-scsi, we cannot assign a tpgt. That is, the tpg is transparent to QEMU: QEMU does not know which tpg can boot, and neither does the vhost-scsi driver module for a given wwpn. For now, we assume that only the first tpg can boot, and add a boot_tpgt property that defaults to 0. Users can of course pass another valid value on the QEMU command line.

Ping...

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Thanks :)

Regards,
-Gonglei
Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard
On 03.02.2015 at 08:31, Denis V. Lunev wrote:
On 02/02/15 23:46, Denis V. Lunev wrote:
On 02/02/15 23:40, Peter Lieven wrote:
On 02.02.2015 at 21:09, Denis V. Lunev wrote:
qemu_gluster_co_discard calculates size to discard as follows

    size_t size = nb_sectors * BDRV_SECTOR_SIZE;
    ret = glfs_discard_async(s->fd, offset, size, gluster_finish_aiocb, acb);

glfs_discard_async is declared as follows:

    int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent, glfs_io_cbk fn, void *data) __THROW

This is problematic on i686 as sizeof(size_t) == 4. Set bl_max_discard to SIZE_MAX >> BDRV_SECTOR_BITS to avoid overflow on i386.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Kevin Wolf kw...@redhat.com
CC: Peter Lieven p...@kamp.de
---
 block/gluster.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index 1eb3a8c..8a8c153 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -622,6 +622,11 @@ out:
     return ret;
 }
+static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    bs->bl.max_discard = MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX);
+}
+

Looking at the gluster code, bl.max_transfer_length should have the same limit, but that's a different patch.

ha, the same applies to nbd code too. I'll do this stuff tomorrow and also I think that some audit in other drivers could reveal something interesting.

Den

ok. The situation is well rotten here on i686. The problem comes from the fact that QEMUIOVector and iovec use size_t as length. All API calls use this abstraction. Thus all conversion operations from nr_sectors to size could bang at any moment. Putting dirty hands here is problematic from my point of view. Should we really care about this? 32bit applications are becoming old good history of IT...

The host has to be 32bit to be in trouble. And at least if we have KVM the host has to support long mode.

I have on my todo to add generic code for honouring bl.max_transfer_length in block.c. We could change the default maximum from INT_MAX to SIZE_MAX >> BDRV_SECTOR_BITS for bl.max_transfer_length.

Peter
[Qemu-devel] [PATCH] libcacard: stop linking against every single 3rd party library
Building QEMU results in a libcacard.so that links against practically the entire world:

    linux-vdso.so.1 => (0x7fff71e99000)
    libssl3.so => /usr/lib64/libssl3.so (0x7f49f94b6000)
    libsmime3.so => /usr/lib64/libsmime3.so (0x7f49f928e000)
    libnss3.so => /usr/lib64/libnss3.so (0x7f49f8f67000)
    libnssutil3.so => /usr/lib64/libnssutil3.so (0x7f49f8d3b000)
    libplds4.so => /usr/lib64/libplds4.so (0x7f49f8b36000)
    libplc4.so => /usr/lib64/libplc4.so (0x7f49f8931000)
    libnspr4.so => /usr/lib64/libnspr4.so (0x7f49f86f2000)
    libdl.so.2 => /usr/lib64/libdl.so.2 (0x7f49f84ed000)
    libm.so.6 => /usr/lib64/libm.so.6 (0x7f49f81e5000)
    libgthread-2.0.so.0 => /usr/lib64/libgthread-2.0.so.0 (0x7f49f7fe3000)
    librt.so.1 => /usr/lib64/librt.so.1 (0x7f49f7dda000)
    libz.so.1 => /usr/lib64/libz.so.1 (0x7f49f7bc4000)
    libcap-ng.so.0 => /usr/lib64/libcap-ng.so.0 (0x7f49f79be000)
    libuuid.so.1 => /usr/lib64/libuuid.so.1 (0x7f49f77b8000)
    libgnutls.so.28 => /usr/lib64/libgnutls.so.28 (0x7f49f749a000)
    libSDL-1.2.so.0 => /usr/lib64/libSDL-1.2.so.0 (0x7f49f71fd000)
    libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x7f49f6fe)
    libvte.so.9 => /usr/lib64/libvte.so.9 (0x7f49f6d3f000)
    libXext.so.6 => /usr/lib64/libXext.so.6 (0x7f49f6b2d000)
    libgtk-x11-2.0.so.0 => /usr/lib64/libgtk-x11-2.0.so.0 (0x7f49f64a)
    libgdk-x11-2.0.so.0 => /usr/lib64/libgdk-x11-2.0.so.0 (0x7f49f61de000)
    libpangocairo-1.0.so.0 => /usr/lib64/libpangocairo-1.0.so.0 (0x7f49f5fd1000)
    libatk-1.0.so.0 => /usr/lib64/libatk-1.0.so.0 (0x7f49f5daa000)
    libcairo.so.2 => /usr/lib64/libcairo.so.2 (0x7f49f5a9d000)
    libgdk_pixbuf-2.0.so.0 => /usr/lib64/libgdk_pixbuf-2.0.so.0 (0x7f49f5878000)
    libgio-2.0.so.0 => /usr/lib64/libgio-2.0.so.0 (0x7f49f550)
    libpangoft2-1.0.so.0 => /usr/lib64/libpangoft2-1.0.so.0 (0x7f49f52eb000)
    libpango-1.0.so.0 => /usr/lib64/libpango-1.0.so.0 (0x7f49f50a)
    libgobject-2.0.so.0 => /usr/lib64/libgobject-2.0.so.0 (0x7f49f4e4e000)
    libglib-2.0.so.0 => /usr/lib64/libglib-2.0.so.0 (0x7f49f4b15000)
    libfontconfig.so.1 => /usr/lib64/libfontconfig.so.1 (0x7f49f48d6000)
    libfreetype.so.6 => /usr/lib64/libfreetype.so.6 (0x7f49f462b000)
    libX11.so.6 => /usr/lib64/libX11.so.6 (0x7f49f42e8000)
    libxenstore.so.3.0 => /usr/lib64/libxenstore.so.3.0 (0x7f49f40de000)
    libxenctrl.so.4.4 => /usr/lib64/libxenctrl.so.4.4 (0x7f49f3eb6000)
    libxenguest.so.4.4 => /usr/lib64/libxenguest.so.4.4 (0x7f49f3c8b000)
    libseccomp.so.2 => /usr/lib64/libseccomp.so.2 (0x7f49f3a74000)
    librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x7f49f385d000)
    libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x7f49f364a000)
    libutil.so.1 => /usr/lib64/libutil.so.1 (0x7f49f3447000)
    libc.so.6 => /usr/lib64/libc.so.6 (0x7f49f3089000)
    /lib64/ld-linux-x86-64.so.2 (0x7f49f9902000)
    libp11-kit.so.0 => /usr/lib64/libp11-kit.so.0 (0x7f49f2e23000)
    libtspi.so.1 => /usr/lib64/libtspi.so.1 (0x7f49f2bb2000)
    libtasn1.so.6 => /usr/lib64/libtasn1.so.6 (0x7f49f299f000)
    libnettle.so.4 => /usr/lib64/libnettle.so.4 (0x7f49f276d000)
    libhogweed.so.2 => /usr/lib64/libhogweed.so.2 (0x7f49f2545000)
    libgmp.so.10 => /usr/lib64/libgmp.so.10 (0x7f49f22cd000)
    libncurses.so.5 => /usr/lib64/libncurses.so.5 (0x7f49f20a5000)
    libtinfo.so.5 => /usr/lib64/libtinfo.so.5 (0x7f49f1e7a000)
    libgmodule-2.0.so.0 => /usr/lib64/libgmodule-2.0.so.0 (0x7f49f1c76000)
    libXfixes.so.3 => /usr/lib64/libXfixes.so.3 (0x7f49f1a6f000)
    libXrender.so.1 => /usr/lib64/libXrender.so.1 (0x7f49f1865000)
    libXinerama.so.1 => /usr/lib64/libXinerama.so.1 (0x7f49f1662000)
    libXi.so.6 => /usr/lib64/libXi.so.6 (0x7f49f1452000)
    libXrandr.so.2 => /usr/lib64/libXrandr.so.2 (0x7f49f1247000)
    libXcursor.so.1 => /usr/lib64/libXcursor.so.1 (0x7f49f103c000)
    libXcomposite.so.1 => /usr/lib64/libXcomposite.so.1 (0x7f49f0e39000)
    libXdamage.so.1 => /usr/lib64/libXdamage.so.1 (0x7f49f0c35000)
    libharfbuzz.so.0 => /usr/lib64/libharfbuzz.so.0 (0x7f49f09dd000)
    libpixman-1.so.0 => /usr/lib64/libpixman-1.so.0 (0x7f49f072f000)
    libEGL.so.1 => /usr/lib64/libEGL.so.1 (0x7f49f0505000)
    libpng16.so.16 => /usr/lib64/libpng16.so.16 (0x7f49f02d2000)
    libxcb-shm.so.0 => /usr/lib64/libxcb-shm.so.0 (0x7f49f00cd000)
    libxcb-render.so.0 => /usr/lib64/libxcb-render.so.0 (0x7f49efec3000)
    libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x7f49efca1000)
    libGL.so.1 => /usr/lib64/libGL.so.1 (0x7f49efa06000)
Re: [Qemu-devel] [PATCH v2 00/11] target-arm: handle mmu_idx/translation regimes properly
On 29 January 2015 at 18:55, Peter Maydell peter.mayd...@linaro.org wrote:

This patchseries fixes up our somewhat broken handling of mmu_idx values:
 * implement the full set of 7 mmu_idxes we need for supporting EL2 and EL3
 * pass the mmu_idx in the TB flags rather than EL or a priv flag, so we can generate code with the correct kind of access
 * identify the correct mmu_idx to use for AT/ATS system insns
 * pass mmu_idx into get_phys_addr() and use it within that family of functions as an indication of which translation regime to do a v-to-p lookup for, instead of relying on an is_user flag plus the current CPU state
 * some minor indent stuff on the end

It does not contain:
 * complete support for EL2 or 64-bit EL3; in some places I have added the code where it was obvious and easy; in others I have just left TODO marker comments
 * the 'tlb_flush_for_mmuidx' functionality I proposed in a previous mail; I preferred to get the semantics right in this patchset first before improving the efficiency later

I'm planning to put this series into my next target-arm pull, sometime tail end of the week.

-- PMM
Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard
On 03.02.2015 at 12:30, Peter Lieven wrote:
On 03.02.2015 at 08:31, Denis V. Lunev wrote:
On 02/02/15 23:46, Denis V. Lunev wrote:
On 02/02/15 23:40, Peter Lieven wrote:
On 02.02.2015 at 21:09, Denis V. Lunev wrote:

qemu_gluster_co_discard calculates size to discard as follows

    size_t size = nb_sectors * BDRV_SECTOR_SIZE;
    ret = glfs_discard_async(s->fd, offset, size, gluster_finish_aiocb, acb);

glfs_discard_async is declared as follows:

    int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent,
                            glfs_io_cbk fn, void *data) __THROW

This is problematic on i686 as sizeof(size_t) == 4. Set bl_max_discard to SIZE_MAX >> BDRV_SECTOR_BITS to avoid overflow on i386.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Kevin Wolf kw...@redhat.com
CC: Peter Lieven p...@kamp.de
---
 block/gluster.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index 1eb3a8c..8a8c153 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -622,6 +622,11 @@ out:
     return ret;
 }

+static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    bs->bl.max_discard = MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX);
+}
+

Looking at the gluster code bl.max_transfer_length should have the same limit, but that's a different patch.

ha, the same applies to nbd code too. I'll do this stuff tomorrow and also I think that some audit in other drivers could reveal something interesting. Den

ok. The situation is well rotten here on i686. The problem comes from the fact that QEMUIOVector and iovec use size_t as length. All API calls use this abstraction. Thus all conversion operations from nr_sectors to size could bang at any moment. Putting dirty hands here is problematic from my point of view. Should we really care about this? 32bit applications are becoming old good history of IT...

The host has to be 32bit to be in trouble. And at least if we have KVM the host has to support long mode. I have on my todo to add generic code for honouring bl.max_transfer_length in block.c. We could change default maximum from INT_MAX to SIZE_MAX >> BDRV_SECTOR_BITS for bl.max_transfer_length.

So the conclusion is that we'll apply this series as it is and you'll take care of the rest later?

Kevin
Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard
On 03.02.2015 at 12:37, Kevin Wolf wrote:
On 03.02.2015 at 12:30, Peter Lieven wrote:
On 03.02.2015 at 08:31, Denis V. Lunev wrote:
On 02/02/15 23:46, Denis V. Lunev wrote:
On 02/02/15 23:40, Peter Lieven wrote:
On 02.02.2015 at 21:09, Denis V. Lunev wrote:

qemu_gluster_co_discard calculates size to discard as follows

    size_t size = nb_sectors * BDRV_SECTOR_SIZE;
    ret = glfs_discard_async(s->fd, offset, size, gluster_finish_aiocb, acb);

glfs_discard_async is declared as follows:

    int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent,
                            glfs_io_cbk fn, void *data) __THROW

This is problematic on i686 as sizeof(size_t) == 4. Set bl_max_discard to SIZE_MAX >> BDRV_SECTOR_BITS to avoid overflow on i386.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Kevin Wolf kw...@redhat.com
CC: Peter Lieven p...@kamp.de
---
 block/gluster.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index 1eb3a8c..8a8c153 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -622,6 +622,11 @@ out:
     return ret;
 }

+static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    bs->bl.max_discard = MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX);
+}
+

Looking at the gluster code bl.max_transfer_length should have the same limit, but that's a different patch.

ha, the same applies to nbd code too. I'll do this stuff tomorrow and also I think that some audit in other drivers could reveal something interesting. Den

ok. The situation is well rotten here on i686. The problem comes from the fact that QEMUIOVector and iovec use size_t as length. All API calls use this abstraction. Thus all conversion operations from nr_sectors to size could bang at any moment. Putting dirty hands here is problematic from my point of view. Should we really care about this? 32bit applications are becoming old good history of IT...

The host has to be 32bit to be in trouble. And at least if we have KVM the host has to support long mode. I have on my todo to add generic code for honouring bl.max_transfer_length in block.c. We could change default maximum from INT_MAX to SIZE_MAX >> BDRV_SECTOR_BITS for bl.max_transfer_length.

So the conclusion is that we'll apply this series as it is and you'll take care of the rest later?

Yes, and actually we need a macro like

    #define BDRV_MAX_REQUEST_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX)

as limit for everything. Because bdrv_check_byte_request already has a size_t argument. So we could already create an overflow in bdrv_check_request when we convert nb_sectors to size_t. I will create a patch to catch at least this overflow shortly.

Peter
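
For illustration only, a minimal sketch of the limit discussed in this thread. It assumes 512-byte sectors (BDRV_SECTOR_BITS of 9) and defines MIN locally; sectors_to_bytes is a hypothetical helper, not an existing QEMU function. It shows that capping the sector count at BDRV_MAX_REQUEST_SECTORS keeps the sector-to-byte conversion inside size_t on both 32-bit and 64-bit hosts.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9   /* assumption: 512-byte sectors */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Limit from the thread: the sector count must fit in an int and the
 * resulting byte count must fit in a size_t. */
#define BDRV_MAX_REQUEST_SECTORS \
    MIN(SIZE_MAX >> BDRV_SECTOR_BITS, (size_t)INT_MAX)

/* Hypothetical helper: convert sectors to bytes, rejecting any request
 * whose byte length would not fit in size_t. Returns 0 or -1. */
static int sectors_to_bytes(int nb_sectors, size_t *bytes)
{
    if (nb_sectors < 0 || (size_t)nb_sectors > BDRV_MAX_REQUEST_SECTORS) {
        return -1;
    }
    *bytes = (size_t)nb_sectors << BDRV_SECTOR_BITS;
    return 0;
}
```

On a host with a 32-bit size_t the cap works out to SIZE_MAX >> 9, well below INT_MAX; on a 64-bit host INT_MAX is the binding limit.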
[Qemu-devel] [PATCH v2 12/19] libqos/ahci: add ahci command functions
This patch adds the AHCICommand structure, and a set of functions to operate on the structure.

ahci_command_create - Initialize and create a new AHCICommand in memory
ahci_command_free - Destroy this object.
ahci_command_set_buffer - Set where the guest memory DMA buffer is.
ahci_command_commit - Write this command to the AHCI HBA.
ahci_command_issue - Issue the committed command synchronously.
ahci_command_issue_async - Issue the committed command asynchronously.
ahci_command_wait - Wait for an asynchronous command to finish.
ahci_command_slot - Get the number of the command slot we committed to.

Helpers:
size_to_prdtl - Calculate the required minimum PRDTL size from a buffer size.
ahci_command_find - Given an ATA command mnemonic, look it up in the properties table to obtain info about the command.
command_header_init - Initialize the command header with sane values.
command_table_init - Initialize the command table with sane values.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c | 73 +--
 tests/libqos/ahci.c | 202
 tests/libqos/ahci.h | 15
 3 files changed, 234 insertions(+), 56 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 658956d..0834020 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -657,30 +657,28 @@ static void ahci_test_port_spec(AHCIQState *ahci, uint8_t port)
  */
 static void ahci_test_identify(AHCIQState *ahci)
 {
-    RegH2DFIS fis;
-    AHCICommandHeader cmd;
-    PRD prd;
     uint32_t data_ptr;
     uint16_t buff[256];
     unsigned i;
     int rc;
+    AHCICommand *cmd;
     uint8_t cx;
-    uint64_t table;

     g_assert(ahci != NULL);

     /* We need to:
-     * (1) Create a Command Table Buffer and update the Command List Slot #0
-     *     to point to this buffer.
-     * (2) Construct an FIS host-to-device command structure, and write it to
+     * (1) Create a data buffer for the IDENTIFY response to be sent to,
+     * (2) Create a Command Table Buffer
+     * (3) Construct an FIS host-to-device command structure, and write it to
      *     the top of the command table buffer.
-     * (3) Create a data buffer for the IDENTIFY response to be sent to
      * (4) Create a Physical Region Descriptor that points to the data buffer,
      *     and write it to the bottom (offset 0x80) of the command table.
-     * (5) Now, PxCLB points to the command list, command 0 points to
+     * (5) Obtain a Command List slot, and update this header to point to
+     *     the Command Table we built above.
+     * (6) Now, PxCLB points to the command list, command 0 points to
      *     our table, and our table contains an FIS instruction and a
      *     PRD that points to our rx buffer.
-     * (6) We inform the HBA via PxCI that there is a command ready in slot #0.
+     * (7) We inform the HBA via PxCI that there is a command ready in slot #0.
      */

     /* Pick the first implemented and running port */
@@ -690,61 +688,24 @@ static void ahci_test_identify(AHCIQState *ahci)
     /* Clear out the FIS Receive area and any pending interrupts. */
     ahci_port_clear(ahci, i);

-    /* Create a Command Table buffer. 0x80 is the smallest with a PRDTL of 0. */
-    /* We need at least one PRD, so round up to the nearest 0x80 multiple. */
-    table = ahci_alloc(ahci, CMD_TBL_SIZ(1));
-    g_assert(table);
-    ASSERT_BIT_CLEAR(table, 0x7F);
-
-    /* Create a data buffer ... where we will dump the IDENTIFY data to. */
+    /* Create a data buffer where we will dump the IDENTIFY data to. */
     data_ptr = ahci_alloc(ahci, 512);
     g_assert(data_ptr);

-    /* pick a command slot (should be 0!) */
-    cx = ahci_pick_cmd(ahci, i);
-
-    /* Construct our Command Header (set_command_header handles endianness.) */
-    memset(&cmd, 0x00, sizeof(cmd));
-    cmd.flags = 5;             /* reg_h2d_fis is 5 double-words long */
-    cmd.flags |= CMDH_CLR_BSY; /* clear PxTFD.STS.BSY when done */
-    cmd.prdtl = 1;             /* One PRD table entry. */
-    cmd.prdbc = 0;
-    cmd.ctba = table;
-
-    /* Construct our PRD, noting that DBC is 0-indexed. */
-    prd.dba = cpu_to_le64(data_ptr);
-    prd.res = 0;
-    /* 511+1 bytes, request DPS interrupt */
-    prd.dbc = cpu_to_le32(511 | 0x8000);
-
-    /* Construct our Command FIS, Based on http://wiki.osdev.org/AHCI */
-    memset(&fis, 0x00, sizeof(fis));
-    fis.fis_type = REG_H2D_FIS;  /* Register Host-to-Device FIS */
-    fis.command = CMD_IDENTIFY;
-    fis.device = 0;
-    fis.flags = REG_H2D_FIS_CMD; /* Indicate this is a command FIS */
-
-    /* We've committed nothing yet, no interrupts should be posted yet. */
-    g_assert_cmphex(ahci_px_rreg(ahci, i, AHCI_PX_IS), ==, 0);
-
-    /* Commit the Command FIS to the Command Table */
-    ahci_write_fis(ahci, &fis, table);
-
-    /* Commit the PRD entry to the Command Table */
-
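
For illustration only, a sketch of what a helper like size_to_prdtl computes. This is not the libqos implementation: the signature, the PRD_MAX_BYTES constant and prd_dbc_encode are assumptions made for this example. It relies on two facts mentioned or implied above: a request needs enough PRD entries to cover the whole buffer (a ceiling division), and the DBC field is 0-indexed, so 512 bytes are encoded as 511.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one PRD entry covers at most 4 MiB (22-bit, 0-based count). */
#define PRD_MAX_BYTES (4u * 1024u * 1024u)

/* Minimum number of PRD entries needed to describe `bytes` bytes when
 * each entry covers at most `bytes_per_prd` bytes (ceiling division). */
static unsigned size_to_prdtl(size_t bytes, size_t bytes_per_prd)
{
    return (unsigned)((bytes + bytes_per_prd - 1) / bytes_per_prd);
}

/* Hypothetical helper: DBC stores the byte count minus one.
 * The caller must pass a non-zero count. */
static uint32_t prd_dbc_encode(uint32_t bytes)
{
    return bytes - 1;
}
```

With these definitions the 512-byte IDENTIFY buffer from the test needs exactly one PRD entry, which matches the "cmd.prdtl = 1" line removed by the patch.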
Re: [Qemu-devel] [PATCH 4/8] guest agent: add guest-pipe-open
On 12/31/2014 06:06 AM, Denis V. Lunev wrote:

From: Simon Zolin szo...@parallels.com

Creates a FIFO pair that can be used with existing file read/write interfaces to communicate with processes spawned via the forthcoming guest-file-exec interface.

Signed-off-by: Simon Zolin szo...@parallels.com
Acked-by: Roman Kagan rka...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Michael Roth mdr...@linux.vnet.ibm.com
---

+++ b/qga/qapi-schema.json
@@ -212,12 +212,33 @@
   'returns': 'int' }

 ##
+# @guest-pipe-open
+#
+# Open a pipe to in the guest to associated with a qga-spawned processes
+# for communication.
+#
+# Returns: Guest file handle on success, as per guest-file-open. This
+# handle is useable with the same interfaces as a handle returned by

s/useable/usable/

+# guest-file-open.
+#
+# Since: 2.3
+##
+{ 'command': 'guest-pipe-open',
+  'data':    { 'mode': 'str' },
+  'returns': 'int' }

I'm not a fan of returning a bare 'int' - it is not extensible. Better is returning a dictionary, such as 'returns': { 'fd': 'int' }. That way, if we ever find a reason to return multiple pieces of information, we just return a larger dictionary. Yeah, I know guest-pipe-open breaks the rules here, and so consistency may be an argument in favor of also breaking the rules.

I don't like 'mode' encoded as a raw string. Make it an enum type (as in { 'enum':'PipeMode', 'data':['read', 'write']} ... 'mode':'PipeMode') or even a bool (as in 'read':'bool')

This only returns ONE end of a pipe (good for when the host is piping data into the child, or when the child is piping data into the host). But isn't your goal to also make it possible to string together multiple child processes where the output of one is the input of the other (no host involvement)? How would you wire that up?

+
+##
 # @guest-file-close:
 #
 # Close an open file in the guest
 #
 # @handle: filehandle returned by guest-file-open
 #
+# Please note that closing the write side of a pipe will block until the read
+# side is closed. If you passed the read-side of the pipe to a qga-spawned
+# process, make sure the process has exited before attempting to close the
+# write side.

How does one pass the read side of a pipe to a spawned child? Can you design the spawning API so that close cannot deadlock?

+#
 # Returns: Nothing on success.
 #
 # Since: 0.15.0

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] [PATCH 3/8] guest agent: guest-file-open: refactoring
On 12/31/2014 06:06 AM, Denis V. Lunev wrote:

From: Simon Zolin szo...@parallels.com

Moved the code that sets non-blocking flag on fd into a separate function.

Signed-off-by: Simon Zolin szo...@parallels.com
Acked-by: Roman Kagan rka...@parallels.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Michael Roth mdr...@linux.vnet.ibm.com
---
 qga/commands-posix.c | 31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index f6f3e3c..fd746db 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -376,13 +376,33 @@ safe_open_or_create(const char *path, const char *mode, Error **errp)
     return NULL;
 }

+static int guest_file_toggle_flags(int fd, long flags, bool set, Error **err)
+{

Why is 'flags' a long?

+    int ret, old_flags;
+
+    old_flags = fcntl(fd, F_GETFL);
+    if (old_flags == -1) {
+        error_set_errno(err, errno, QERR_QGA_COMMAND_FAILED,
+                        "failed to fetch filehandle flags");
+        return -1;
+    }
+
+    ret = fcntl(fd, F_SETFL, set ? (old_flags | flags) : (old_flags & ~flags));

Bug. 'int | long' is a long, but on 64-bit platforms, passing a 'long' as the var-arg third argument of fcntl where the interface expects 'int' is liable to corrupt things depending on endianness. You MUST pass an 'int' for F_SETFL.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
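
For illustration only, a minimal sketch of the shape the review asks for: the flags parameter is an int, so the value handed to fcntl() for F_SETFL is an int as well. The name toggle_fd_flags is hypothetical, and QEMU's Error reporting is left out; the function simply returns -1 and leaves errno as set by fcntl().

```c
#include <fcntl.h>
#include <stdbool.h>

/* Set or clear file status flags on fd.
 * Returns 0 on success, -1 on failure (errno set by fcntl). */
static int toggle_fd_flags(int fd, int flags, bool set)
{
    int old_flags, new_flags;

    old_flags = fcntl(fd, F_GETFL);
    if (old_flags == -1) {
        return -1;
    }

    /* Both operands are int, so the variadic argument is an int too. */
    new_flags = set ? (old_flags | flags) : (old_flags & ~flags);
    if (fcntl(fd, F_SETFL, new_flags) == -1) {
        return -1;
    }
    return 0;
}
```

A caller would use toggle_fd_flags(fd, O_NONBLOCK, true) to make a descriptor non-blocking and pass false to restore blocking mode.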
Re: [Qemu-devel] [PATCH 4/8] guest agent: add guest-pipe-open
On 02/03/2015 02:57 PM, Eric Blake wrote:

+# Returns: Guest file handle on success, as per guest-file-open. This
+# handle is useable with the same interfaces as a handle returned by

+  'returns': 'int' }

I'm not a fan of returning a bare 'int' - it is not extensible. Better is returning a dictionary, such as 'returns': { 'fd': 'int' }. That way, if we ever find a reason to return multiple pieces of information, we just return a larger dictionary. Yeah, I know guest-pipe-open breaks the rules here, and so consistency may be an argument in favor of also breaking the rules.

I meant to say 'guest-file-open' breaks the rules, and that you are proposing that 'guest-pipe-open' be consistent with 'guest-file-open'.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] RFC: Proposal to add QEMU Guest Environment Variables
Quoting Gabriel L. Somlo (2015-02-03 15:38:59) On Tue, Feb 03, 2015 at 02:11:12PM -0600, Michael Roth wrote: This does seem like useful functionality, but I think I'd like to know more about the actual use-cases being looked at. The proposed functionality is mostly equivalent to that offered by GuestInfo variables. So yes, initial activation scripts :) Is this mostly about executing initial activation scripts? Because after that point, a key-value store can be managed through the guest-file-read/write interfaces for anything on the guest-side that's interested in these variables. Even activation could be done using this approach, where the scripts start QGA and wait for the host to coordinate the initial creation of the file containing those variables, then setting a file marker that allows activation to proceed. And if that seems wonky, I'm fairly sure you could script the creation of the initial key-value store prior to starting the guest using libguestfs: http://libguestfs.org/ Specifically, I'm trying to port to QEMU a simulation/training setup where multiple VMs are started from the same base image, and guestinfo environment variables help each instance determine its personality. Editing the disk image is not feasible, since the idea is to share the base disk image across multiple VMs. And needing to connect to each VM Well, I assume by shared a base image you mean using a template image as the backing image for a COW image allocated for each guest prior to activation? As long as the editing is done against the COW image rather than the backing image it should work. Maybe it's not ideal, but it's feasible. I hadn't really considered the SMBIOS approach though. That might be more straightforward to get the initial store to the guest. 
after having started it, wait for it to bring up the QGA, then get it to accept environment variables, that's precisely the wonkiness I'm trying to avoid :)

Understandable :)

I can certainly start small and implement read-only, host-guest startup time values (the smbios type11 strings plus a way to read them via a guest-side binary associated with a guest-tools package), and we can decide whether we want to support set-env operations and exporting set-env and get-env via the agent at a later stage. That functionality is available with GuestInfo variables, but the system I'm trying to port to QEMU doesn't require it as far as I can tell.

Seems like a reasonable start to me.

Thanks much, --Gabriel

I think we'd need a very strong argument to bake what seems to be high-level guest management tasks into QEMU. If that can be avoided with some automated image modifications beforehand that seems to me the more reasonable approach. Libvirt could ostensibly even handle the task of writing those XML strings into the image's key-value store to make management easier, but I suspect even that is a bit too low in the stack for this level of management.
Re: [Qemu-devel] RFC: Proposal to add QEMU Guest Environment Variables
On Wed, Feb 04, 2015 at 12:49:22AM +0300, Denis V. Lunev wrote: On 04/02/15 00:38, Gabriel L. Somlo wrote: On Tue, Feb 03, 2015 at 02:11:12PM -0600, Michael Roth wrote: This does seem like useful functionality, but I think I'd like to know more about the actual use-cases being looked at. The proposed functionality is mostly equivalent to that offered by GuestInfo variables. So yes, initial activation scripts :) Is this mostly about executing initial activation scripts? Because after that point, a key-value store can be managed through the guest-file-read/write interfaces for anything on the guest-side that's interested in these variables. Even activation could be done using this approach, where the scripts start QGA and wait for the host to coordinate the initial creation of the file containing those variables, then setting a file marker that allows activation to proceed. And if that seems wonky, I'm fairly sure you could script the creation of the initial key-value store prior to starting the guest using libguestfs: http://libguestfs.org/ Specifically, I'm trying to port to QEMU a simulation/training setup where multiple VMs are started from the same base image, and guestinfo environment variables help each instance determine its personality. Editing the disk image is not feasible, since the idea is to share the base disk image across multiple VMs. And needing to connect to each VM after having started it, wait for it to bring up the QGA, then get it to accept environment variables, that's precisely the wonkiness I'm trying to avoid :) I can certainly start small and implement read-only, host-guest startup time values (the smbios type11 strings plus a way to read them via a guest-side binary associated with a guest-tools package), and we can decide whether we want to support set-env operations and exporting set-env and get-env via the agent at a later stage. 
That functionality is available with GuestInfo variables, but the system I'm trying to port to QEMU doesn't require it as far as I can tell.

guest exec with ability to pass an environment could solve your problem even without read/write. Boot the guest, wait for guest agent startup, start something you need from the agent with the desired environment.

I'm trying as hard as I can to avoid the bit where I have to wait for guest agent startup, connect, run a bunch of stuff on the guest through the agent... :)

The application I'm trying to port has a bunch of VM templates with some guestinfo/environment variables in them, many of them sharing the same disk image (vmdk) file. We start them all, and that's it. Fire and forget, no further fuss. If a VM hangs during startup, that's the application USER's problem, they can restart it, or whatever.

With your suggestion, I'd have to write a bunch of additional logic to monitor each starting VM, connect to it, handle errors and exceptions (what if the QGA doesn't start, what if it takes a long time to start, etc.) Now dealing with a failed boot is suddenly the application's problem, since I'm sitting there waiting to connect to the agent, and have to do something if the agent isn't coming up.

That's currently not necessary when using that other hypervisor from which I'm trying to migrate, so QEMU is at a bit of a disadvantage. I think there's value making it easy to port stuff over without imposing a major redesign of the application being ported...

Thanks, --Gabriel

this is a quote from the patchset being discussed at the moment.

[PATCH v2 0/8] qemu: guest agent: implement guest-exec command for Linux

+##
+# @guest-exec:
+#
+# Execute a command in the guest
+#
+# @path: path or executable name to execute
+# @params: #optional parameter list to pass to executable
+# @env: #optional environment variables to pass to executable
+# @handle_stdin: #optional handle to associate with process' stdin.
+# @handle_stdout: #optional handle to associate with process' stdout
+# @handle_stderr: #optional handle to associate with process' stderr
+#
+# Returns: PID on success.
+#
+# Since: 2.3
+##
+{ 'command': 'guest-exec',
+  'data':    { 'path': 'str', '*params': ['str'], '*env': ['str'],
+               '*handle_stdin': 'int', '*handle_stdout': 'int',
+               '*handle_stderr': 'int' },
+  'returns': 'int' }

Thanks much, --Gabriel

I think we'd need a very strong argument to bake what seems to be high-level guest management tasks into QEMU. If that can be avoided with some automated image modifications beforehand that seems to me the more reasonable approach. Libvirt could ostensibly even handle the task of writing those XML strings into the image's key-value store to make management easier, but I suspect even that is a bit too low in the stack for this level of management.
Re: [Qemu-devel] [PATCH v2] qga: add guest-set-admin-password command
On 01/12/2015 08:58 AM, Daniel P. Berrange wrote:

Add a new 'guest-set-admin-password' command for changing the root/administrator password. This command is needed to allow OpenStack to support its API for changing the admin password on a running guest.

Accepts either the raw password string:

    $ virsh -c qemu:///system qemu-agent-command f21x86_64 \
        '{ "execute": "guest-set-admin-password", "arguments": { "crypted": false, "password": "12345678" } }'
    {"return":{}}

Or a pre-encrypted string (recommended)

    $ virsh -c qemu:///system qemu-agent-command f21x86_64 \
        '{ "execute": "guest-set-admin-password", "arguments": { "crypted": true, "password": "$6$T9O/j/aGPrE...sniprQoRN4F0.GG0MPjNUNyml." } }'

NB windows support is desirable, but not implemented in this patch.

Signed-off-by: Daniel P. Berrange berra...@redhat.com
---
 qga/commands-posix.c | 90
 qga/commands-win32.c | 6
 qga/qapi-schema.json | 19 +++
 3 files changed, 115 insertions(+)

+++ b/qga/qapi-schema.json
@@ -738,3 +738,22 @@
 ##
 { 'command': 'guest-get-fsinfo',
   'returns': ['GuestFilesystemInfo'] }
+
+##
+# @guest-set-admin-password
+#
+# @crypted: true if password is already crypt()d, false if raw
+# @password: the new password entry
+#
+# If the @crypted flag is true, it is the callers responsibility

s/callers/caller's/

+# to ensure the correct crypt() encryption scheme is used. This
+# command does not attempt to interpret or report on the encryption
+# scheme. Refer to the documentation of the guest operating system
+# in question to determine what is supported.
+#
+# Returns: Nothing on success.
+#
+# Since 2.3
+##
+{ 'command': 'guest-set-admin-password',
+  'data': { 'crypted': 'bool', 'password': 'str' } }

Normally, 'password':'str' means we are passing UTF8 JSON. But what if the desired password is NOT valid UTF8, but still valid to the end user (for example, a user that intentionally wants a Latin1 encoded password that uses 8-bit characters)? In other interfaces, we've allowed an enum that specifies whether a raw data string is 'utf8' or 'base64' encoded; should we have such a parameter here?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] [v4 11/13] migration: Add interface to control compression
On 02/02/2015 04:05 AM, Liang Li wrote:

The multiple compression threads can be turned on/off through qmp and hmp interface before doing live migration.

Signed-off-by: Liang Li liang.z...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Reviewed-by: Dr.David Alan Gilbert dgilb...@redhat.com
---
 migration/migration.c | 7 +--
 qapi-schema.json | 7 ++-
 2 files changed, 11 insertions(+), 3 deletions(-)

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] [RFC 05/10] extract TBContext from TCGContext.
On 01/29/2015 07:44 AM, Peter Maydell wrote:
On 16 January 2015 at 17:19, fred.kon...@greensocs.com wrote:

From: KONRAD Frederic fred.kon...@greensocs.com

In order to have one TCGContext per thread and a single TBContext we have to extract TBContext from TCGContext.

This seems a bit odd. It's not clear to me what the advantages are of having one TCGContext per thread but only a single TBContext (as opposed to either (1) having a single TCGContext and TBContext with locks protecting against multiple threads generating code at once, or (2) having each thread have its own TCGContext and TBContext and completely independent codegen).

Maybe it would help if you sketched out your design in a little more detail in the cover letter, with emphasis on which data structures are going to be per-thread and which are going to be shared (and if so how shared).

(Long term we would want to be able to have multiple TBContexts to support heterogeneous systems where CPUs might be different architectures or have different views of physical memory...)

Seconded.

r~
Re: [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.
On 03/02/2015 17:17, Richard Henderson wrote:

@@ -759,7 +760,9 @@ static void page_flush_tb_1(int level, void **lp)
         PageDesc *pd = *lp;

         for (i = 0; i < V_L2_SIZE; ++i) {
-            pd[i].first_tb = NULL;
+            for (j = 0; j < MAX_CPUS; j++) {
+                pd[i].first_tb[j] = NULL;
+            }
             invalidate_page_bitmap(pd + i);
         }
     } else {

Surely you've got to do some locking somewhere in order to be able to modify another thread's cpu tb list.

But that's probably not even necessary. page_flush_tb_1 is called from tb_flush, which in turn is only called in very special circumstances. It should be possible to have something like the kernel's stop_machine that does the following:

1) schedule a callback on all TCG CPU threads
2) wait for all CPUs to have reached that callback
3) do tb_flush on all CPUs, while it knows they are not holding any lock
4) release all TCG CPU threads

With one TCG thread, just use qemu_bh_new (hidden behind a suitable API of course!). Once you have multiple TCG CPU threads, loop on all CPUs with the same run_on_cpu function that KVM uses.

Paolo
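
For illustration only, a self-contained sketch of the four steps above using plain POSIX threads. Nothing here is QEMU code: NUM_VCPUS, vcpu_callback, stop_flush_release and the flush_generation counter (which stands in for the real tb_flush) are all hypothetical, and the worker threads are created on the spot, whereas real vCPU threads would already exist and would be sent the callback via something like run_on_cpu.

```c
#include <pthread.h>
#include <stdbool.h>

#define NUM_VCPUS 4                /* hypothetical number of TCG CPU threads */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int parked;                 /* vCPU threads waiting in the callback */
static bool released;              /* set once the flush has been done */
static int flush_generation;       /* stands in for the shared TB state */
static int seen[NUM_VCPUS];        /* generation each vCPU saw on resume */

/* Steps 1-2: the callback each vCPU thread runs; it parks until released. */
static void *vcpu_callback(void *arg)
{
    int idx = *(int *)arg;

    pthread_mutex_lock(&lock);
    parked++;
    pthread_cond_broadcast(&cond);         /* tell the coordinator */
    while (!released) {
        pthread_cond_wait(&cond, &lock);   /* mutex is dropped while waiting */
    }
    seen[idx] = flush_generation;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Runs one stop/flush/release cycle; returns the new flush generation. */
static int stop_flush_release(void)
{
    pthread_t threads[NUM_VCPUS];
    int ids[NUM_VCPUS];
    int i;

    parked = 0;
    released = false;
    for (i = 0; i < NUM_VCPUS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, vcpu_callback, &ids[i]);
    }

    pthread_mutex_lock(&lock);
    while (parked < NUM_VCPUS) {           /* step 2: wait for every vCPU */
        pthread_cond_wait(&cond, &lock);
    }
    flush_generation++;                    /* step 3: the "flush" itself */
    released = true;                       /* step 4: let them all go */
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);

    for (i = 0; i < NUM_VCPUS; i++) {
        pthread_join(threads[i], NULL);
    }
    return flush_generation;
}
```

The point of the design is that the flush in step 3 only runs once every vCPU thread is parked inside the callback, so no per-CPU list needs its own lock during the flush.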
Re: [Qemu-devel] [RFC PATCH v2 09/11] hw/arm/virt-acpi-build: Generate XSDT table
On 02/03/15 17:19, Igor Mammedov wrote:
> On Thu, 29 Jan 2015 16:37:11 +0800
> Shannon Zhao zhaoshengl...@huawei.com wrote:
>
>> XSDT points to other tables except FACS and DSDT.
>
> Is there any reason to use XSDT instead of RSDT?
> If the ACPI tables are below 4GB, which would probably be the case, then RSDT
> could be used just fine and we could share more code between x86 and ARM.
>
> Laszlo,
> Do you know if OVMF allocates memory below 4G address range?

Yes, it does.

https://github.com/tianocore/edk2/blob/master/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c#L162

RSDT should suffice.

Thanks,
Laszlo
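
To make the RSDT-versus-XSDT point above concrete, here is a minimal sketch in C. It is not QEMU code: rsdt_suffices() and root_table_len() are hypothetical helpers written for this note. The only facts relied on are that RSDT entries are 32-bit physical addresses, XSDT entries are 64-bit, and the common ACPI table header is 36 bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACPI_HEADER_LEN 36u /* size of the common ACPI table header */

/* Hypothetical helper: true if every table address fits a 32-bit RSDT entry,
 * i.e. all tables were allocated below 4GB. */
static bool rsdt_suffices(const uint64_t *table_addrs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (table_addrs[i] > UINT32_MAX) {
            return false;
        }
    }
    return true;
}

/* Hypothetical helper: byte length of a root table holding n entries. */
static size_t root_table_len(size_t n, bool use_xsdt)
{
    size_t entry = use_xsdt ? sizeof(uint64_t) : sizeof(uint32_t);
    return ACPI_HEADER_LEN + n * entry;
}
```

With firmware that allocates below 4GB, rsdt_suffices() would return true for every table set, which is the situation the reply describes.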
Re: [Qemu-devel] [PATCH 0/7] MIPS: IEEE 754-2008 features support
Hi!

On Fri, 30 Jan 2015 13:47:17 +, Maciej W. Rozycki ma...@linux-mips.org wrote:
> On Fri, 30 Jan 2015, Peter Maydell wrote:
>
>>> This patch series comprises changes to QEMU, both the MIPS backend and
>>> generic SoftFloat support code, to support IEEE 754-2008 features
>>> introduced to revision 3.50 of the MIPS Architecture as follows.
>>
>> Just to let you know that:
>> (1) the softfloat relicensing has hit master, so this patchset
>> isn't blocked by anything now \o/
>> (2) I would like to see a definite "we are happy to license this patchset
>> under the SoftFloat-2a license" for these changes, because they were
>> submitted before we applied the relicensing, and therefore the
>> "changes after $DATE will be -2a license unless otherwise stated" note
>> in the sourcecode can't be assumed to apply to them.
>
> Thanks for the heads-up!
>
> At this stage however someone at Mentor will have to make such a statement
> on behalf of the company as I am no longer there and as far as this patch
> set is concerned I am merely a member of the public who can just make
> technical comments as anyone can, including you. I think Thomas, being
> the writer of the majority of code comprising these patches

Too bad that Git doesn't allow for listing several authors. ;-)

> is now in the best position to make such a statement happen.
>
> Thomas -- will you be able to take it from here? Thanks!

It is fine to license these changes under the SoftFloat-2a license.

Regards,
Thomas
[Qemu-devel] balloon vs postcopy migrate
Hi,

Andrea pointed out there is a risk that a guest inflating its balloon during a postcopy migrate could cause us problems, and I wanted to see what the best way of avoiding the problem was.

Guests inflating their balloon cause an madvise(MADV_DONTNEED) on the host, marking pages as not present; that will potentially trigger a userfault, which we are using in postcopy to detect pages that need to be fetched from the source.

In theory, at the moment guests *should* only ask for a balloon inflation if they've been asked to do so by the host; however there are no guards for that, and it's been suggested giving the guest more freedom might be a good idea anyway.

My alternatives seem to be:

1) Stop servicing the message queue from the guest so that we just don't notice the inflate messages until afterwards. (Easy for QEMU, not sure how the guests will like an unserviced queue.)

2) I could keep servicing the queue and ignore the messages. (Easy for everyone, not very nice in actual used memory - does it cause any long term problems other than that?)

3) I could keep servicing the queue but put the messages in a list somewhere that gets replayed after migrate has finished. (That list sounds bounded only in a very large way?)

Thoughts?

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
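
Alternative 3 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not QEMU code: BalloonDefer, DEFERRED_MAX and all function names are hypothetical, the fixed cap stands in for the "how is the list bounded" question, and count_inflate() is a stand-in for the real madvise(MADV_DONTNEED) path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEFERRED_MAX 1024 /* arbitrary cap chosen for this sketch */

typedef struct {
    uint64_t pfn[DEFERRED_MAX]; /* inflate requests seen during postcopy */
    size_t count;
    bool in_postcopy;
} BalloonDefer;

typedef void (*inflate_fn)(uint64_t pfn, void *opaque);

/* Stand-in for the real inflate action: just counts how often it ran. */
static void count_inflate(uint64_t pfn, void *opaque)
{
    (void)pfn;
    (*(size_t *)opaque)++;
}

/* Apply the request now, or queue it while postcopy is running.
 * Returns false only when the queue is full. */
static bool balloon_inflate_request(BalloonDefer *d, uint64_t pfn,
                                    inflate_fn apply, void *opaque)
{
    if (!d->in_postcopy) {
        apply(pfn, opaque);
        return true;
    }
    if (d->count == DEFERRED_MAX) {
        return false;
    }
    d->pfn[d->count++] = pfn;
    return true;
}

/* Called once migration has finished: replay everything that was queued.
 * Returns the number of replayed requests. */
static size_t balloon_postcopy_done(BalloonDefer *d, inflate_fn apply,
                                    void *opaque)
{
    size_t replayed = d->count;

    d->in_postcopy = false;
    for (size_t i = 0; i < replayed; i++) {
        apply(d->pfn[i], opaque);
    }
    d->count = 0;
    return replayed;
}
```

What to do when the queue fills up (drop, block, or fall back to alternative 1) is exactly the open question in the mail; the sketch only reports it to the caller.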
Re: [Qemu-devel] [RFC PATCH v2 09/11] hw/arm/virt-acpi-build: Generate XSDT table
On Thu, 29 Jan 2015 16:37:11 +0800
Shannon Zhao zhaoshengl...@huawei.com wrote:

> XSDT points to other tables except FACS and DSDT.

Is there any reason to use XSDT instead of RSDT?
If the ACPI tables are below 4GB, which would probably be the case, then RSDT
could be used just fine and we could share more code between x86 and ARM.

Laszlo,
Do you know if OVMF allocates memory below 4G address range?

> Signed-off-by: Shannon Zhao zhaoshengl...@huawei.com
> ---
>  hw/arm/virt-acpi-build.c    | 32 +++-
>  include/hw/acpi/acpi-defs.h |  9 +
>  2 files changed, 40 insertions(+), 1 deletions(-)
>
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index ac0a864..2a2b2ab 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -176,6 +176,32 @@ static void acpi_dsdt_add_virtio(AcpiAml *scope, const hwaddr *mmio_addrs,
>      }
>  }
>
> +
> +/* XSDT */
> +static void
> +build_xsdt(GArray *table_data, GArray *linker, GArray *table_offsets)
> +{
> +    AcpiXsdtDescriptor *xsdt;
> +    size_t xsdt_len;
> +    int i;
> +
> +    xsdt_len = sizeof(*xsdt) + sizeof(uint64_t) * table_offsets->len;
> +    xsdt = acpi_data_push(table_data, xsdt_len);
> +    memcpy(xsdt->table_offset_entry, table_offsets->data,
> +           sizeof(uint64_t) * table_offsets->len);
> +    for (i = 0; i < table_offsets->len; ++i) {
> +        /* xsdt->table_offset_entry to be filled by Guest linker */
> +        bios_linker_loader_add_pointer(linker,
> +                                       ACPI_BUILD_TABLE_FILE,
> +                                       ACPI_BUILD_TABLE_FILE,
> +                                       table_data, &xsdt->table_offset_entry[i],
> +                                       sizeof(uint64_t));
> +    }
> +    build_header(linker, table_data, (void *)xsdt, "XSDT",
> +                 ACPI_BUILD_APPNAME6, ACPI_BUILD_APPNAME4,
> +                 xsdt_len, 1);
> +}
> +
>  /* GTDT */
>  static void
>  build_gtdt(GArray *table_data, GArray *linker, VirtGuestInfo *guest_info)
> @@ -311,7 +337,7 @@ static
>  void virt_acpi_build(VirtGuestInfo *guest_info, AcpiBuildTables *tables)
>  {
>      GArray *table_offsets;
> -    unsigned dsdt;
> +    unsigned dsdt, xsdt;
>      VirtAcpiCpuInfo cpuinfo;
>
>      virt_acpi_get_cpu_info(&cpuinfo);
> @@ -346,6 +372,10 @@ void virt_acpi_build(VirtGuestInfo *guest_info, AcpiBuildTables *tables)
>      acpi_add_table(table_offsets, tables->table_data.buf);
>      build_gtdt(tables->table_data.buf, tables->linker, guest_info);
>
> +    /* XSDT is pointed to by RSDP */
> +    xsdt = tables->table_data.buf->len;
> +    build_xsdt(tables->table_data.buf, tables->linker, table_offsets);
> +
>      /* Cleanup memory that's no longer used. */
>      g_array_free(table_offsets, true);
>  }
> diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
> index ee40a5e..47c8c41 100644
> --- a/include/hw/acpi/acpi-defs.h
> +++ b/include/hw/acpi/acpi-defs.h
> @@ -88,6 +88,15 @@ struct AcpiTableHeader /* ACPI common table header */
>  typedef struct AcpiTableHeader AcpiTableHeader;
>
>  /*
> + * Extended System Description Table (XSDT)
> + */
> +struct AcpiXsdtDescriptor {
> +    ACPI_TABLE_HEADER_DEF
> +    uint64_t table_offset_entry[1]; /* Array of pointers to ACPI tables */
> +} QEMU_PACKED;
> +typedef struct AcpiXsdtDescriptor AcpiXsdtDescriptor;
> +
> +/*
>   * ACPI Fixed ACPI Description Table (FADT)
>   */
>  #define ACPI_FADT_COMMON_DEF /* FADT common definition */ \
Re: [Qemu-devel] [PATCH v2 3/7] softfloat: Convert `*_default_nan' variables into inline functions
On 01/30/2015 08:02 AM, Maciej W. Rozycki wrote:
> Hmm, so perhaps my idea for a later improvement:
>
> "Eventually we might want to move the new inline functions into a
> separate header to be included from softfloat.h instead of softfloat.c,
> but let's make changes one step at a time."
>
> will actually have to be made right away. I suspect GCC is more liberal
> here due to its convoluted extern/static/inline semantics history.
> Sigh...

GCC 5 is moving to -std=gnu11 as default, and so will have the same problem.

r~
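
For readers following the inline-semantics point: under gnu89 a plain `inline` definition also emits an external definition, while under C99/gnu11 it does not, so a call that is not inlined can end up as an undefined reference at link time. A `static inline` function in a header behaves the same under both. The sketch below shows that shape only; default_nan32_sketch() and float32_bits are hypothetical names, not the functions from the patch, and the two bit patterns are the usual single-precision defaults for the legacy and 2008 NaN encodings as far as I recall them.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t float32_bits; /* hypothetical stand-in for softfloat's float32 */

/* static inline: every translation unit that includes the header gets its own
 * private copy, so nothing depends on which unit emits an external symbol. */
static inline float32_bits default_nan32_sketch(bool snan_bit_is_one)
{
    /* legacy encoding (signalling bit is one) vs. IEEE 754-2008 encoding */
    return snan_bit_is_one ? 0x7FBFFFFFu : 0x7FC00000u;
}
```

Turning a `const` variable into a function like this is what allows the value to depend on run-time state, which is the motivation given in the series.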
Re: [Qemu-devel] [PATCH v5 02/10] virtio-net: use qemu_mac_strdup_printf
On 01/22/2015 01:03 AM, sfel...@gmail.com wrote:
> From: Scott Feldman sfel...@gmail.com
>
> Signed-off-by: Scott Feldman sfel...@gmail.com
> ---
>  hw/net/virtio-net.c | 12 +++-
>  1 file changed, 3 insertions(+), 9 deletions(-)

You could merge this with 1/10 without any dire consequences. But whether you merge or keep as two patches, feel free to add this on the respin:

Reviewed-by: Eric Blake ebl...@redhat.com

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] [PATCH] vfio: free dynamically-allocated data in instance_finalize
On 03/02/2015 16:20, Alex Williamson wrote:
> On Tue, 2015-02-03 at 13:48 +0100, Paolo Bonzini wrote:
>> In order to enable out-of-BQL address space lookup, destruction of
>> devices needs to be split in two phases.
>>
>> Unrealize is the first phase; once it completes no new accesses will
>> be started, but pending memory accesses can still be completed.
>>
>> The second part is freeing the device, which only happens once all memory
>> accesses are complete. At this point the reference count has dropped to
>> zero and an RCU grace period must have completed (because the RCU-protected
>> FlatViews hold a reference to the device via memory_region_ref). This is
>> when instance_finalize is called.
>>
>> Freeing data belongs in an instance_finalize callback, because the
>> dynamically allocated memory can still be used after unrealize by the
>> pending memory accesses.
>>
>> In the case of VFIO, the unrealize callback is too early to munmap the
>> BARs. The munmap must be delayed until memory accesses are complete.
>> To do this, split vfio_unmap_bars in two. The removal step, now called
>> vfio_unregister_bars, remains in vfio_exitfn. The reclamation step
>> is vfio_unmap_bars and is moved to the instance_finalize callback.
>>
>> Similarly, quirk MemoryRegions have to be removed during
>> vfio_unregister_bars, but freeing the data structure must be delayed
>> to vfio_unmap_bars.
>>
>> Cc: Alex Williamson alex.william...@redhat.com
>> Signed-off-by: Paolo Bonzini pbonz...@redhat.com
>> ---
>>     This patch is part of the third installment of the RCU work.
>>     Sending it out separately for Alex to review it.
>>
>>  hw/vfio/pci.c | 78 +-
>>  1 file changed, 68 insertions(+), 10 deletions(-)
>
> Looks good to me. I don't see any external dependencies, so do you want
> me to pull this in through my branch? Thanks,

Yes, please.

Paolo
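
The two-phase teardown described in the quoted commit message can be modelled in a few lines of plain C. This is a deliberately simplified sketch, not QEMU's object model: SketchDev and its functions are hypothetical, a plain reference count stands in for memory_region_ref, a malloc'ed buffer stands in for the mmap'ed BAR, and the RCU grace period is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    int refcount;    /* one reference for the owner, one per pending access */
    bool registered; /* can new accesses still find the device? */
    void *mapping;   /* stand-in for an mmap'ed BAR */
} SketchDev;

static SketchDev *sketch_dev_new(size_t len)
{
    SketchDev *d = malloc(sizeof(*d));

    if (!d) {
        return NULL;
    }
    d->mapping = malloc(len);
    if (!d->mapping) {
        free(d);
        return NULL;
    }
    d->refcount = 1;
    d->registered = true;
    return d;
}

/* A new access takes a reference; it is refused once the device is gone. */
static bool sketch_access_begin(SketchDev *d)
{
    if (!d->registered) {
        return false;
    }
    d->refcount++;
    return true;
}

/* Phase 1 ("unrealize"): stop new accesses, but free nothing yet. */
static void sketch_unrealize(SketchDev *d)
{
    d->registered = false;
}

/* Phase 2 ("instance_finalize") runs when the last reference is dropped.
 * Returns true if the device was actually freed by this call. */
static bool sketch_unref(SketchDev *d)
{
    assert(d->refcount > 0);
    if (--d->refcount > 0) {
        return false;
    }
    free(d->mapping); /* the delayed "munmap" */
    free(d);
    return true;
}
```

The point the sketch tries to show is the ordering: the mapping outlives sketch_unrealize() for as long as an access that started earlier still holds its reference.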
Re: [Qemu-devel] [PATCH v5 03/10] rocker: add register programming guide
On 01/22/2015 01:03 AM, sfel...@gmail.com wrote:
> From: Scott Feldman sfel...@gmail.com
>
> This is the register programming guide for the Rocker device. It's intended
> for driver writers and device writers. It covers the device's PCI space,
> the register set, DMA interface, and interrupts.

In addition to typos already pointed out by Stefan,

> +
> +Writing BASE_ADDR or SIZE will reset HEAD and TAIL to zero. HEAD cannot be
> +written passed TAIL. To do so would wrap the ring. An empty ring is when HEAD

s/passed/past/

> +
> +To support forward- and backward-compatibility, descriptor and completion
> +payloads are specified in TLV format. Fields are packed with Type=field name,
> +Length=field length, and Value=field value. Software will ignore unknown fields
> +filled in by the switch. Likewise, the switch will ignore unknown fields
> +filled in by software.

Is ignoring unknown fields always the wisest action? If the unknown fields are supposed to have an impact according to the writer, but get ignored by the reader, then the two can get out of sync with what they assume the other end is doing.

> +MSI-X vectors used for descriptor ring completions use a credit mechanism for
> +efficient device, PCIe bus, OS and driver operations. Each descriptor ring has
> +a credit count which represent the number of outstanding descriptors to be

s/represent/represents/

> +
> +    port            mapping
> +    ----            -------
> +    0               CPU port (for packets to/from host CPU)
> +    1-62            front-panel physical ports
> +    63              loopback port
> +    64-0x           RSVD
> +    0x0001-0x0001   logical tunnel ports
> +0x0002-0x RSVD

Alignment looks off.

> +Port Settings
> +-------------
> +
> +Links status for all front-panel ports is available via PORT_PHYS_LINK_STATUS:

s/Links/Link/

> +
> +    DESC_COMP_ERR       reason
> +    -------------       ------
> +    0                   OK
> +    -ROCKER_ENXIO       address or data read err on desc buf
> +    -ROCKER_ENOMEM      no memory for internal staging desc buf
> +    -ROCKER_EMSGSIZE    Rx descriptor buffer wasn't big enough to contain
> +                        pactet data TLV and other TLVs.

s/pactet/packet/

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
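
As an illustration of the "ignore unknown TLV fields" rule discussed above, here is a minimal parser sketch in C. The wire layout is an assumption made up for this note (one byte of type, one byte of length, then the value); it is not Rocker's actual encoding, and the field types and names are hypothetical. Counting the skipped fields is one cheap way for a reader to at least notice that the writer sent something it did not understand.

```c
#include <stddef.h>
#include <stdint.h>

enum { TLV_PORT = 1, TLV_FRAME_LEN = 2 }; /* hypothetical field types */

typedef struct {
    uint8_t port;
    uint16_t frame_len;
    unsigned unknown; /* number of fields that were skipped */
} ParsedDesc;

/* Returns 0 on success, -1 if a field is malformed or overruns the buffer. */
static int tlv_parse(const uint8_t *buf, size_t size, ParsedDesc *out)
{
    size_t pos = 0;

    out->port = 0;
    out->frame_len = 0;
    out->unknown = 0;

    while (pos < size) {
        uint8_t type, len;

        if (size - pos < 2) {
            return -1; /* truncated type/length pair */
        }
        type = buf[pos];
        len = buf[pos + 1];
        pos += 2;
        if (len > size - pos) {
            return -1; /* value runs past the end of the buffer */
        }

        switch (type) {
        case TLV_PORT:
            if (len != 1) {
                return -1;
            }
            out->port = buf[pos];
            break;
        case TLV_FRAME_LEN:
            if (len != 2) {
                return -1;
            }
            /* little-endian 16-bit value (an assumption of this sketch) */
            out->frame_len = (uint16_t)(buf[pos] | (buf[pos + 1] << 8));
            break;
        default:
            out->unknown++; /* unknown type: skip the value, keep parsing */
            break;
        }
        pos += len;
    }
    return 0;
}
```

Because the length is always present, an old reader can step over a new field without knowing what it means, which is the forward-compatibility property the guide is after.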
Re: [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.
On 01/16/2015 09:19 AM, fred.kon...@greensocs.com wrote:
> @@ -759,7 +760,9 @@ static void page_flush_tb_1(int level, void **lp)
>          PageDesc *pd = *lp;
>
>          for (i = 0; i < V_L2_SIZE; ++i) {
> -            pd[i].first_tb = NULL;
> +            for (j = 0; j < MAX_CPUS; j++) {
> +                pd[i].first_tb[j] = NULL;
> +            }
>              invalidate_page_bitmap(pd + i);
>          }
>      } else {

Surely you've got to do some locking somewhere in order to be able to modify another thread's cpu tb list.

I realize that we do have to solve this problem for x86, but for most other targets we ought, in principle, be able to avoid it. Which simply requires that we not treat icache flushes as nops.

When the kernel has modified a page, like so, it will also have notified the other cpus that like so,

    if (smp_call_function(ipi_flush_icache_page, mm, 1)) {

We ought to be able to leverage this to avoid some locking at the qemu level.

r~
Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard
On 03/02/15 14:47, Peter Lieven wrote:
> On 03.02.2015 at 12:37, Kevin Wolf wrote:
>> On 03.02.2015 at 12:30, Peter Lieven wrote:
>>> On 03.02.2015 at 08:31, Denis V. Lunev wrote:
>>>> On 02/02/15 23:46, Denis V. Lunev wrote:
>>>>> On 02/02/15 23:40, Peter Lieven wrote:
>>>>>> On 02.02.2015 at 21:09, Denis V. Lunev wrote:
>>>>>>> qemu_gluster_co_discard calculates size to discard as follows
>>>>>>>     size_t size = nb_sectors * BDRV_SECTOR_SIZE;
>>>>>>>     ret = glfs_discard_async(s->fd, offset, size, &gluster_finish_aiocb, acb);
>>>>>>>
>>>>>>> glfs_discard_async is declared as follows:
>>>>>>>     int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent,
>>>>>>>                             glfs_io_cbk fn, void *data) __THROW
>>>>>>> This is problematic on i686 as sizeof(size_t) == 4.
>>>>>>>
>>>>>>> Set bl_max_discard to SIZE_MAX >> BDRV_SECTOR_BITS to avoid overflow
>>>>>>> on i386.
>>>>>>>
>>>>>>> Signed-off-by: Denis V. Lunev d...@openvz.org
>>>>>>> CC: Kevin Wolf kw...@redhat.com
>>>>>>> CC: Peter Lieven p...@kamp.de
>>>>>>> ---
>>>>>>>  block/gluster.c | 9 +
>>>>>>>  1 file changed, 9 insertions(+)
>>>>>>>
>>>>>>> diff --git a/block/gluster.c b/block/gluster.c
>>>>>>> index 1eb3a8c..8a8c153 100644
>>>>>>> --- a/block/gluster.c
>>>>>>> +++ b/block/gluster.c
>>>>>>> @@ -622,6 +622,11 @@ out:
>>>>>>>      return ret;
>>>>>>>  }
>>>>>>>
>>>>>>> +static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
>>>>>>> +{
>>>>>>> +    bs->bl.max_discard = MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX);
>>>>>>> +}
>>>>>>> +
>>>>>>
>>>>>> Looking at the gluster code bl.max_transfer_length should have the same
>>>>>> limit, but that's a different patch.
>>>>>
>>>>> ha, the same applies to nbd code too.
>>>>>
>>>>> I'll do this stuff tomorrow and also I think that some
>>>>> audit in other drivers could reveal something interesting.
>>>>>
>>>>> Den
>>>>
>>>> ok. The situation is well rotten here on i686.
>>>>
>>>> The problem comes from the fact that QEMUIOVector
>>>> and iovec use size_t as length. All API calls use
>>>> this abstraction. Thus all conversion operations
>>>> from nr_sectors to size could bang at any moment.
>>>>
>>>> Putting dirty hands here is problematic from my point
>>>> of view. Should we really care about this? 32bit
>>>> applications are becoming old good history of IT...
>>>
>>> The host has to be 32bit to be in trouble. And at least if we have KVM
>>> the host has to support long mode.
>>>
>>> I have on my todo to add generic code for honouring bl.max_transfer_length
>>> in block.c. We could change default maximum from INT_MAX to
>>> SIZE_MAX >> BDRV_SECTOR_BITS for bl.max_transfer_length.
>>
>> So the conclusion is that we'll apply this series as it is and you'll
>> take care of the rest later?
>
> Yes, and actually we need a macro like
>
> #define BDRV_MAX_REQUEST_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX)
>
> as limit for everything. Because bdrv_check_byte_request already has a
> size_t argument. So we could already create an overflow in
> bdrv_check_request when we convert nb_sectors to size_t.
>
> I will create a patch to catch at least this overflow shortly.
>
> Peter

I like this macro :)

I vote to move MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX) into generic code on discard/write_zero paths immediately and drop this exact patch.

Patch 2 of this set would be better to have additional

+    bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;

I'll wait for Peter's patch and respin on top of it to avoid unnecessary commits.

Den
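
The overflow this thread is about is easy to check with a little arithmetic. The sketch below is a standalone illustration, not QEMU code: the helper names are hypothetical, SECTOR_BITS mirrors BDRV_SECTOR_BITS (512-byte sectors), and a 32-bit size_t is simulated with uint32_t so the example behaves the same on any host.

```c
#include <limits.h>
#include <stdint.h>

#define SECTOR_BITS 9 /* 512-byte sectors, like BDRV_SECTOR_BITS */

/* Largest sector count whose byte length still fits both in a size_t with
 * the given maximum value and in an int. This is the
 * MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX) expression from the thread,
 * with SIZE_MAX passed in as a parameter. */
static uint64_t max_request_sectors(uint64_t size_max)
{
    uint64_t by_size = size_max >> SECTOR_BITS;

    return by_size < (uint64_t)INT_MAX ? by_size : (uint64_t)INT_MAX;
}

/* Byte length as a host with a 32-bit size_t would compute it:
 * the result wraps modulo 2^32. */
static uint32_t bytes_on_32bit_host(uint32_t nb_sectors)
{
    return (uint32_t)(nb_sectors << SECTOR_BITS);
}
```

On a 32-bit host the limit works out to 0x7FFFFF sectors (just under 4GB); one sector more and the byte count wraps to zero. On a 64-bit host the INT_MAX term is the one that applies.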
[Qemu-devel] [PATCH v2 0/3] bootdevice: change the boot order validation logic
From: Gonglei arei.gong...@huawei.com

The reset logic can be done by both machine reset and boot handler, so patch 1 no longer returns an error when the boot handler callback isn't set.
Patch 2 checks the validity of the boot order argument before the VM runs.
Patch 3 passes error_abort instead of NULL.

v2 -> v1:
 - add patch 2 suggested by Markus.
 - rework patch 3. (Markus)
 - add R-by in patch 1.

Gonglei (3):
  bootdevice: remove the check about boot_set_handler
  bootdevice: check boot order argument validation before vm running
  bootdevice: add check in restore_boot_order()

 bootdevice.c | 12 
 vl.c         | 13 +++--
 2 files changed, 15 insertions(+), 10 deletions(-)

--
1.7.12.4
[Qemu-devel] [PATCH v2 2/3] bootdevice: check boot order argument validation before vm running
From: Gonglei arei.gong...@huawei.com

Only one of the 'once' and 'order' options of -boot can take effect at a time, that is to say initial startup processing can check only one. And pc.c's set_boot_dev() fails when its boot order argument is invalid. This patch provides a solution to fix this problem:

1. If once is given, register reset handler to restore boot order.

2. Pass the normal boot order to machine creation. Should fail when the normal boot order is invalid.

3. If once is given, set it with qemu_boot_set(). Fails when the once boot order is invalid.

4. Start the machine.

5. On reset, the reset handler calls qemu_boot_set() to restore boot order. Should never fail.

Suggested-by: Markus Armbruster arm...@redhat.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 vl.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index 983259b..0d90d98 100644
--- a/vl.c
+++ b/vl.c
@@ -2734,6 +2734,7 @@ int main(int argc, char **argv, char **envp)
     const char *initrd_filename;
     const char *kernel_filename, *kernel_cmdline;
     const char *boot_order;
+    const char *once = NULL;
     DisplayState *ds;
     int cyls, heads, secs, translation;
     QemuOpts *hda_opts = NULL, *opts, *machine_opts, *icount_opts = NULL;
@@ -4046,7 +4047,7 @@ int main(int argc, char **argv, char **envp)
     opts = qemu_opts_find(qemu_find_opts("boot-opts"), NULL);
     if (opts) {
         char *normal_boot_order;
-        const char *order, *once;
+        const char *order;
         Error *local_err = NULL;

         order = qemu_opt_get(opts, "order");
@@ -4067,7 +4068,6 @@ int main(int argc, char **argv, char **envp)
                 exit(1);
             }
             normal_boot_order = g_strdup(boot_order);
-            boot_order = once;
             qemu_register_reset(restore_boot_order, normal_boot_order);
         }

@@ -4246,6 +4246,15 @@ int main(int argc, char **argv, char **envp)

     net_check_clients();

+    if (once) {
+        Error *local_err = NULL;
+        qemu_boot_set(once, &local_err);
+        if (local_err) {
+            error_report("%s", error_get_pretty(local_err));
+            exit(1);
+        }
+    }
+
     ds = init_displaystate();

     /* init local displays */
--
1.7.12.4
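
The five steps in the commit message above can be modelled with a small, self-contained C sketch. Nothing here is QEMU's API: BootState and the boot_* functions are hypothetical, and the validator that accepts only the letters 'a' to 'p' is an assumption standing in for whatever check the machine's boot handler really performs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char current[8];      /* order the firmware would see right now */
    char normal[8];       /* order to restore on reset */
    bool restore_pending; /* models the registered reset handler */
} BootState;

/* Hypothetical validator: non-empty, short, letters 'a'..'p' only. */
static bool boot_order_valid(const char *order)
{
    size_t n = strlen(order);

    if (n == 0 || n >= sizeof(((BootState *)0)->current)) {
        return false;
    }
    for (; *order; order++) {
        if (*order < 'a' || *order > 'p') {
            return false;
        }
    }
    return true;
}

/* Models qemu_boot_set(): fails on an invalid order, otherwise applies it. */
static bool boot_set(BootState *s, const char *order)
{
    if (!boot_order_valid(order)) {
        return false;
    }
    strcpy(s->current, order);
    return true;
}

/* Steps 1-3: everything is validated here, before the machine starts. */
static bool boot_init(BootState *s, const char *normal, const char *once)
{
    memset(s, 0, sizeof(*s));
    if (once) {
        s->restore_pending = true;       /* step 1 */
    }
    if (!boot_set(s, normal)) {          /* step 2 */
        return false;
    }
    strcpy(s->normal, normal);
    if (once && !boot_set(s, once)) {    /* step 3 */
        return false;
    }
    return true;
}

/* Step 5: the saved order was validated in step 2, so this cannot fail. */
static void boot_reset(BootState *s)
{
    if (s->restore_pending) {
        boot_set(s, s->normal);
        s->restore_pending = false;
    }
}
```

The design point is that both orders are checked up front, so the restore on reset only ever replays a string that has already been accepted.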
[Qemu-devel] [PATCH v2 3/3] bootdevice: add check in restore_boot_order()
From: Gonglei arei.gong...@huawei.com

qemu_boot_set() can't fail in restore_boot_order(), so simply assert that it doesn't fail, by passing error_abort.

Signed-off-by: Gonglei arei.gong...@huawei.com
---
 bootdevice.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bootdevice.c b/bootdevice.c
index 52d3f9e..d3d4277 100644
--- a/bootdevice.c
+++ b/bootdevice.c
@@ -101,7 +101,7 @@ void restore_boot_order(void *opaque)
         return;
     }

-    qemu_boot_set(normal_boot_order, NULL);
+    qemu_boot_set(normal_boot_order, &error_abort);

     qemu_unregister_reset(restore_boot_order, normal_boot_order);
     g_free(normal_boot_order);
--
1.7.12.4
[Qemu-devel] [PATCH] vfio: free dynamically-allocated data in instance_finalize
In order to enable out-of-BQL address space lookup, destruction of devices needs to be split in two phases.

Unrealize is the first phase; once it completes no new accesses will be started, but pending memory accesses can still be completed.

The second part is freeing the device, which only happens once all memory accesses are complete. At this point the reference count has dropped to zero and an RCU grace period must have completed (because the RCU-protected FlatViews hold a reference to the device via memory_region_ref). This is when instance_finalize is called.

Freeing data belongs in an instance_finalize callback, because the dynamically allocated memory can still be used after unrealize by the pending memory accesses.

In the case of VFIO, the unrealize callback is too early to munmap the BARs. The munmap must be delayed until memory accesses are complete. To do this, split vfio_unmap_bars in two. The removal step, now called vfio_unregister_bars, remains in vfio_exitfn. The reclamation step is vfio_unmap_bars and is moved to the instance_finalize callback.

Similarly, quirk MemoryRegions have to be removed during vfio_unregister_bars, but freeing the data structure must be delayed to vfio_unmap_bars.

Cc: Alex Williamson alex.william...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
    This patch is part of the third installment of the RCU work.
    Sending it out separately for Alex to review it.

 hw/vfio/pci.c | 78 +-
 1 file changed, 68 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 014a92c..69d4a33 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1997,12 +1997,23 @@ static void vfio_vga_quirk_setup(VFIOPCIDevice *vdev)

 static void vfio_vga_quirk_teardown(VFIOPCIDevice *vdev)
 {
+    VFIOQuirk *quirk;
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(vdev->vga.region); i++) {
+        QLIST_FOREACH(quirk, &vdev->vga.region[i].quirks, next) {
+            memory_region_del_subregion(&vdev->vga.region[i].mem, &quirk->mem);
+        }
+    }
+}
+
+static void vfio_vga_quirk_free(VFIOPCIDevice *vdev)
+{
     int i;

     for (i = 0; i < ARRAY_SIZE(vdev->vga.region); i++) {
         while (!QLIST_EMPTY(&vdev->vga.region[i].quirks)) {
             VFIOQuirk *quirk = QLIST_FIRST(&vdev->vga.region[i].quirks);
-            memory_region_del_subregion(&vdev->vga.region[i].mem, &quirk->mem);
             object_unparent(OBJECT(&quirk->mem));
             QLIST_REMOVE(quirk, next);
             g_free(quirk);
@@ -2023,10 +2034,19 @@ static void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr)
 static void vfio_bar_quirk_teardown(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];
+    VFIOQuirk *quirk;
+
+    QLIST_FOREACH(quirk, &bar->quirks, next) {
+        memory_region_del_subregion(&bar->region.mem, &quirk->mem);
+    }
+}
+
+static void vfio_bar_quirk_free(VFIOPCIDevice *vdev, int nr)
+{
+    VFIOBAR *bar = &vdev->bars[nr];

     while (!QLIST_EMPTY(&bar->quirks)) {
         VFIOQuirk *quirk = QLIST_FIRST(&bar->quirks);
-        memory_region_del_subregion(&bar->region.mem, &quirk->mem);
         object_unparent(OBJECT(&quirk->mem));
         QLIST_REMOVE(quirk, next);
         g_free(quirk);
@@ -2282,7 +2302,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled)
     }
 }

-static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
+static void vfio_unregister_bar(VFIOPCIDevice *vdev, int nr)
 {
     VFIOBAR *bar = &vdev->bars[nr];

@@ -2293,10 +2313,25 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
     vfio_bar_quirk_teardown(vdev, nr);

     memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem);
-    munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem));

     if (vdev->msix && vdev->msix->table_bar == nr) {
         memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem);
+    }
+}
+
+static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
+{
+    VFIOBAR *bar = &vdev->bars[nr];
+
+    if (!bar->region.size) {
+        return;
+    }
+
+    vfio_bar_quirk_free(vdev, nr);
+
+    munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem));
+
+    if (vdev->msix && vdev->msix->table_bar == nr) {
         munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem));
     }
 }
@@ -2413,6 +2448,19 @@ static void vfio_unmap_bars(VFIOPCIDevice *vdev)
     }

     if (vdev->has_vga) {
+        vfio_vga_quirk_free(vdev);
+    }
+}
+
+static void vfio_unregister_bars(VFIOPCIDevice *vdev)
+{
+    int i;
+
+    for (i = 0; i < PCI_ROM_SLOT; i++) {
+        vfio_unregister_bar(vdev, i);
+    }
+
+    if (vdev->has_vga) {
         vfio_vga_quirk_teardown(vdev);
         pci_unregister_vga(&vdev->pdev);
     }
@@ -3324,6 +3372,7 @@ static int vfio_initfn(PCIDevice *pdev)
 out_teardown:
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
     vfio_teardown_msi(vdev);
+    vfio_unregister_bars(vdev);
Re: [Qemu-devel] [PATCH RFC 0/1] KVM: ioctl for reading/writing guest memory
On 03.02.2015 at 13:59, Paolo Bonzini wrote:
> On 03/02/2015 13:11, Thomas Huth wrote:
>> The userspace (QEMU) then can simply call this ioctl when it wants to read
>> or write from/to virtual guest memory. The kernel then takes the IPTE-lock,
>> walks the MMU table of the guest to find out the physical address that
>> corresponds to the virtual address, copies the requested amount of bytes
>> from the userspace buffer to guest memory or the other way round, and
>> finally frees the IPTE-lock again.
>>
>> Does that sound like a viable solution (IMHO it does ;-))? Or should I
>> maybe try to pursue another approach?
>
> It looks feasible to me as well.

Yes, we discussed this internally a lot and things are really tricky. The ipte lock could be exported to userspace, but we might also need to handle storage keys (and key protection) in an atomic fashion, so this really looks like the only safe way. I guess we will give it some more testing, but to me it looks like a good candidate for kvm/next after 3.20-rc1.

Christian
[Qemu-devel] [PULL 6/9] s390x/kvm: unknown DIAGNOSE code should give a specification exception
From: Christian Borntraeger borntrae...@de.ibm.com

As described in CP programming services, an unimplemented DIAGNOSE function should return a specification exception. Today we give the guest an operation exception.
As both exception types are suppressing and Linux as a guest does not care about the type of program check in its exception table handler as long as both types have the same kind of error handling (nullifying, terminating, suppressing etc.) this was unnoticed.

Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 target-s390x/kvm.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 6bf2719..6f2d5b4 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -1091,7 +1091,7 @@ static int handle_diag(S390CPU *cpu, struct kvm_run *run, uint32_t ipb)
         break;
     default:
         DPRINTF("KVM: unknown DIAG: 0x%x\n", func_code);
-        r = -1;
+        enter_pgmcheck(cpu, PGM_SPECIFICATION);
         break;
     }

--
1.7.9.5
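
The behavioural change in this patch boils down to what the default branch of the DIAGNOSE dispatch reports. The standalone sketch below shows that shape only; diag_dispatch_sketch() is a hypothetical function, the DIAG function codes are listed as I recall them from QEMU and should be treated as assumptions, and the real handler injects the exception through enter_pgmcheck() rather than returning a code.

```c
#include <stdint.h>

/* Program-interruption code for a specification exception; 0 means
 * "handled, nothing to inject" in this sketch. */
enum { PGM_NONE_SKETCH = 0, PGM_SPECIFICATION_SKETCH = 0x0006 };

/* DIAGNOSE function codes (assumed values, for illustration only). */
enum {
    DIAG_IPL_SKETCH = 0x308,
    DIAG_KVM_HYPERCALL_SKETCH = 0x500,
    DIAG_KVM_BREAKPOINT_SKETCH = 0x501
};

/* Returns the program-interruption code the guest should receive. */
static int diag_dispatch_sketch(uint16_t func_code)
{
    switch (func_code) {
    case DIAG_IPL_SKETCH:
    case DIAG_KVM_HYPERCALL_SKETCH:
    case DIAG_KVM_BREAKPOINT_SKETCH:
        return PGM_NONE_SKETCH; /* known function: would be handled here */
    default:
        /* unknown function: specification exception, per the patch */
        return PGM_SPECIFICATION_SKETCH;
    }
}
```

Before the patch the default branch returned -1, which the caller turned into an operation exception; the guest-visible difference is only the interruption code.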