Re: [Qemu-devel] [PATCH 2/3] qemu-char: add -chardev exit-on-eof option
Eric Blake ebl...@redhat.com writes: On 07/25/2014 02:12 AM, Markus Armbruster wrote: Stefan Hajnoczi stefa...@redhat.com writes: When QEMU is executed as part of a test case or from a script, it is usually desirable to exit if the parent process terminates. This ensures that leaked QEMU processes do not continue consuming resources after their parent has died. ## -{ 'type': 'ChardevHostdev', 'data': { 'device' : 'str' } } +{ 'type': 'ChardevHostdev', 'data': { 'device' : 'str', + '*exit-on-eof' : 'bool' } } ## # @ChardevSocket: Any use cases beyond libqtest? Libvirt probably won't use it for normal guests (we don't want to kill qemu just because the monitor disconnects), but does have the notion of an autodestroy guest where it might be useful (we WANT the guest to go away if libvirtd dies early). In fact, autodestroy guests are used during migration - we want to kill qemu on the destination side if libvirtd dies before the source side finishes sending the migration stream. But in that scenario, once migration succeeds, libvirt has to be able to convert an autodestroy guest back into a normal guest that no longer disappears when libvirtd does; would this be something that QMP can toggle the state of this attribute on the fly? Perhaps through qom-set? After migration completes, execution moves from source to target. Wouldn't you want to switch off target auto-destruct together with that move, atomically? If no, should this be x-exit-on-eof? Hmm, looks like there's no precedence for x- in QAPI. Ah, but we do. For example, x-rdma-pin-all in MigrationCapability in qemu 1.7; which has later been removed. However, we also have precedence of actions in QAPI that are very unlikely to be used outside of qtest, but which are not marked experimental; for example, the 'Abort' action in 'transaction' will probably never be used by libvirt Arguably not a conscious decision to make it ABI forever, more a case of nobody thought about *not* making it ABI :) (but as I type that, the thought in the back of my mind is that I could possibly do some sort of feature probing by setting up a transaction that will abort, and distinguish between whether the transaction aborted or whether it errored out because the feature I'm probing for isn't present). I'm fine leaving this as plain 'exit-on-eof'. I don't really mind this instance, but I'm a bit concerned about rank ABI growth.
[Qemu-devel] [PATCH for-2.1] po: update Italian translation
Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- po/it.po | 63 --- 1 file changed, 40 insertions(+), 23 deletions(-) diff --git a/po/it.po b/po/it.po index a62665c..e46fb3a 100644 --- a/po/it.po +++ b/po/it.po @@ -1,13 +1,13 @@ # Italian translation for QEMU. # This file is put in the public domain. -# Paolo Bonzini pbonz...@redhat.com, 2012. +# Paolo Bonzini pbonz...@redhat.com, 2012-2014. # msgid msgstr Project-Id-Version: QEMU 1.4.50\n Report-Msgid-Bugs-To: qemu-devel@nongnu.org\n -POT-Creation-Date: 2013-07-05 22:36+0200\n -PO-Revision-Date: 2012-02-27 08:23+0100\n +POT-Creation-Date: 2014-07-29 08:14+0200\n +PO-Revision-Date: 2014-07-29 08:25+0200\n Last-Translator: Paolo Bonzini pbonz...@redhat.com\n Language-Team: Italian i...@li.org\n Language: it\n @@ -16,49 +16,66 @@ msgstr Content-Transfer-Encoding: 8bit\n Plural-Forms: nplurals=2; plural=n != 1;\n -#: ui/gtk.c:214 +#: ui/gtk.c:321 msgid - Press Ctrl+Alt+G to release grab -msgstr +msgstr - Premere Ctrl+Alt+G per rilasciare l'input -#: ui/gtk.c:218 +#: ui/gtk.c:325 msgid [Paused] -msgstr +msgstr [Pausa] -#: ui/gtk.c:1318 +#: ui/gtk.c:1601 msgid _Pause -msgstr +msgstr _Pausa -#: ui/gtk.c:1324 +#: ui/gtk.c:1607 msgid _Reset -msgstr +msgstr _Reset -#: ui/gtk.c:1327 +#: ui/gtk.c:1610 msgid Power _Down -msgstr +msgstr _Spegni + +#: ui/gtk.c:1616 +msgid _Quit +msgstr _Esci -#: ui/gtk.c:1381 +#: ui/gtk.c:1702 +msgid Zoom _In +msgstr _Aumenta zoom + +#: ui/gtk.c:1709 +msgid Zoom _Out +msgstr _Riduci zoom + +#: ui/gtk.c:1716 +msgid Best _Fit +msgstr A_nnulla zoom + +#: ui/gtk.c:1723 msgid Zoom To _Fit msgstr Adatta alla _finestra -#: ui/gtk.c:1387 +#: ui/gtk.c:1729 msgid Grab On _Hover msgstr Cattura _automatica input -#: ui/gtk.c:1390 +#: ui/gtk.c:1732 msgid _Grab Input msgstr _Cattura input -#: ui/gtk.c:1416 +#: ui/gtk.c:1761 msgid Show _Tabs msgstr Mostra _tab -#: ui/gtk.c:1430 +#: ui/gtk.c:1764 +msgid Detach Tab +msgstr _Sposta in una nuova finestra + +#: ui/gtk.c:1778 msgid _Machine -msgstr +msgstr _Macchina virtuale -#: ui/gtk.c:1435 +#: ui/gtk.c:1783 msgid _View msgstr _Visualizza - -#~ msgid _File -#~ msgstr _File -- 1.9.3
Re: [Qemu-devel] [PATCH v4 0/5] ACPI fixes for QEMU 2.1
Paolo Bonzini pbonz...@redhat.com writes: v3-v4: drop all pretense of supporting bridges [me] Does this non-support need documentation? None visible in diffstat... v2-v3: fix tests/acpi-test-data/pc/DSDT [Peter] track down make check failure, fix it [patch 4, me] split patch 2 in two parts [mst] do not make bsel_alloc global [mst] include Igor's bridge patch [mst, as discussed on IRC] Igor Mammedov (1): pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is disabled Paolo Bonzini (4): acpi-dsdt: procedurally generate _PRT pc: hack for migration compatibility from QEMU 2.0 pc: future-proof migration-compatibility of ACPI tables bios-tables-test: fix ASL normalization false positive hw/i386/acpi-build.c| 94 +- hw/i386/acpi-dsdt.dsl | 90 +- hw/i386/acpi-dsdt.hex.generated | 1910 +++ hw/i386/pc_piix.c | 19 + hw/i386/pc_q35.c|5 + include/hw/i386/pc.h|1 + tests/acpi-test-data/pc/DSDT| Bin 4499 - 2807 bytes tests/bios-tables-test.c|6 +- 8 files changed, 263 insertions(+), 1862 deletions(-)
Re: [Qemu-devel] [PATCH v4 0/5] ACPI fixes for QEMU 2.1
Il 29/07/2014 08:16, Markus Armbruster ha scritto: v3-v4: drop all pretense of supporting bridges [me] Does this non-support need documentation? None visible in diffstat... Yes, it should. It was already broken for 1.7-2.0. Paolo
Re: [Qemu-devel] Dynamic QEMU platform device instantiation in machine files: phone call on Wed July 30
Eric Auger eric.au...@linaro.org writes: Dear all, For your information, a phone call will be held this week on Wed July 30, 17h-18h CET to address the topic of dynamic instantiation of QEMU platform devices in machine files (using the -device qemu option). Related threads are: http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg01019.html The objective of the meeting is to discuss (and hopefully reach a consensus) on where we can put the code associated to platform device dt node generation and also qom binding stuff. In case you are interested in this discussion, please contact me and I will send you the phone call details. Why not use the regular developers' conference call? Next one is scheduled for August 5th, but an extraordinary one could probably be arranged earlier (tomorrow usual time perhaps?).
Re: [Qemu-devel] [PATCH 3/5] scsi-block: extract scsi_block_is_passthrough
On Mon, 07/28 17:08, Paolo Bonzini wrote: This will be used for both scsi_block_new_request and the scsi-block implementation of parse_cdb. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/scsi/scsi-disk.c | 38 ++ 1 file changed, 26 insertions(+), 12 deletions(-) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index d47ecd6..81b7276 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -2501,12 +2501,8 @@ static int scsi_block_initfn(SCSIDevice *dev) return scsi_initfn(s-qdev); } -static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, - uint32_t lun, uint8_t *buf, - void *hba_private) +static bool scsi_block_is_passthrough(SCSIDiskState *s, uint8_t *buf) { -SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d); - switch (buf[0]) { case READ_6: case READ_10: @@ -2523,9 +2519,9 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, case WRITE_VERIFY_12: case WRITE_VERIFY_16: /* If we are not using O_DIRECT, we might read stale data from the - * host cache if writes were made using other commands than these - * ones (such as WRITE SAME or EXTENDED COPY, etc.). So, without - * O_DIRECT everything must go through SG_IO. + * host cache if writes were made using other commands than these + * ones (such as WRITE SAME or EXTENDED COPY, etc.). So, without + * O_DIRECT everything must go through SG_IO. */ if (!(bdrv_get_flags(s-qdev.conf.bs) BDRV_O_NOCACHE)) { break; @@ -2542,13 +2538,31 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, * just make scsi-block operate the same as scsi-generic for them. */ if (s-qdev.type != TYPE_ROM) { -return scsi_req_alloc(scsi_disk_dma_reqops, s-qdev, tag, lun, - hba_private); +return false; } +break; + +default: +break; Out of curiosity, why add this default branch? Fam } -return scsi_req_alloc(scsi_generic_req_ops, s-qdev, tag, lun, - hba_private); +return true; +} + + +static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, + uint32_t lun, uint8_t *buf, + void *hba_private) +{ +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d); + +if (scsi_block_is_passthrough(s, buf)) { +return scsi_req_alloc(scsi_generic_req_ops, s-qdev, tag, lun, + hba_private); +} else { +return scsi_req_alloc(scsi_disk_dma_reqops, s-qdev, tag, lun, + hba_private); +} } #endif -- 1.9.3
Re: [Qemu-devel] [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On 2014/7/26 1:01, Konrad Rzeszutek Wilk wrote: On Thu, Jul 24, 2014 at 09:44:41AM +0800, Chen, Tiejun wrote: On 2014/7/24 4:54, Konrad Rzeszutek Wilk wrote: On Sat, Jul 19, 2014 at 12:27:21AM +, Kay, Allen M wrote: For the MCH PCI registers that do need to be read - can you tell us which ones those are? In qemu/hw/xen_pt_igd.c/igd_pci_read(), following MCH PCI config register reads are passthrough to the host HW. Some of the registers are needed by Ironlake GFX driver which we probably can remove. I did a trace recently on Broadwell, the number of register accessed are even smaller (0, 2, 2c, 2e, 50, 52, a0, a4). Given that we now have integrated MCH and GPU in the same socket, looks like driver can easily remove reads for offsets 0 - 0x2e. case 0x00:/* vendor id */ case 0x02:/* device id */ case 0x08:/* revision id */ case 0x2c:/* sybsystem vendor id */ case 0x2e:/* sybsystem id */ Right. We can fix the i915 to use the mechanism that Michael mentioned. (see attached RFC patches) case 0x50:/* SNB: processor graphics control register */ case 0x52:/* processor graphics control register */ case 0xa0:/* top of memory */ case 0xb0:/* ILK: BSM: should read from dev 2 offset 0x5c */ case 0x58:/* SNB: PAVPC Offset */ case 0xa4:/* SNB: graphics base of stolen memory */ case 0xa8:/* SNB: base of GTT stolen memory */ I dug in the intel-gtt.c (see ironlake_gtt_driver) to figure out this a bit more. As in, I speculated, what if we returned 0 (and not implement any support for reading from these registers). What would happen? 0x52 for Ironlake (g5): -- It looks like intel_gmch_probe is called when i915_gem_gtt_init starts (there is a lot of code that looks to be used between intel-gtt.c and i915.c). Anyhow the interesting parts are that i9xx_setup ends up calling ioremap the i915 BAR to setup some of these registers for older generations. Then i965_gtt_total_entries gets which reads 0x52, but it is only needed for v5 generation. For other (v4 and G33) it reads it from the GPU's 0x2020 register. If there is a mismatch, it writes to the GPU at 0x2020 to update the the size based on the bridge. And then it reads from 0x2020 and that is returned and stuck in intel_private.gtt_total_entries. So 0x52 in the emulated bridge could be populated with what the GPU has at 0x2020. And the writes go in the emulated area. 0x52 for v6 - v8: - We seem to go to gen6_gmch_probe which just figures out the the GTT based on the GPU's BAR sizes. The stolen values are read from 0x50 from the GPU. So no need to touch the bridge (see gen6_gmch_probe) OK, so no need to have 0x52 or 0x50 in the bridge. 0xA0: - Could not find any reference in the Linux code. Why would Windows driver need this? If we returned the _real_ TOM would it matter? Is it used to figure out the device should use 32-bit DMA operations or 40-bit? 0xb0 or 0x5c: - No mention of them in the Linux code. 0x58, 0xa4, 0xa8: - No usage of them in the Linux code. We seem to be using the 0x52 from the bridge and 0x2020 from the GPU for this functionality. So in theory*, if using Ironlake we need to have a proper value in 0x52 in the bridge. But for the later chipsets we do not need these values (I am assuming that intel_setup_mchbar can safely return as it does that for Ironlake and could very well for later generations). Can this be reflected in the Windows driver as well? P.S. *theory: That is assuming we modify the Linux i915_drv.c:intel_detect_pch to pick up the id as suggested earlier. See the RFC patches attached. (Not compile tested at all!) I take a look these patches, looks we still scan all PCI Bridge to walk all PCHs. So this means we still need to fake a PCI bridge, right? Or maybe you don't cover this problem this time. I totally forgot. Feel free to fix that. Sorry for this delay response. I prefer we should check dev slot to get that PCH like my previous patch, Uh? Your patch was checking for 0:1f.0, not the slot of the device. Yes. (see https://lkml.org/lkml/2014/6/19/121) gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class type. Because Windows always use this way, so I think this point should be same between Linux and Windows. Didn't we discuss that we did not want to pass in PCH at all if we can? I'm a bit confused since I guess I'm missing something important in this long discussion. I guess we just fake a simple PCIe device but just has PCI_VENDOR_ID_XEN/Real_PCH_Device_ID, and then well probe such a PCIe device inside intel_detect_pch(), right? And this means we will not support those existing drivers? And from the
Re: [Qemu-devel] [PATCH for-2.2 v2 0/5] scsi: enable passthrough of vendor-specific commands
On Mon, 07/28 17:08, Paolo Bonzini wrote: Right now scsi-generic is parsing the CDB, in order to compute the expected number of bytes to be transferred. This is necessary if DMA is done by the HBA via scsi_req_data, but it prevents executing vendor-specific commands via scsi-generic because we don't know how to parse them. If DMA is delegated to the SCSI layer via get_sg_list, we know in advance how many bytes the guest will want to receive and we can pass the information straight from the guest to SG_IO. In this case, it is unnecessary to parse the CDB to get the same information. scsi-disk needs it to detect underruns and overruns, but scsi-generic and scsi-block can just ask the HBA about the transfer direction and size. This series introduces a new parse_cdb callback in both the device and the HBA. The latter is called by scsi_bus_parse_cdb, which devices can call for passthrough requests in their implementation of parse_cdb. Paolo v1-v2: use the right CDB size for non-vendor-specific commands, as some drivers and/or firmware expect that and complain if you pass a READ(10) command in a 16-byte CDB. Interdiff here. More learning than reviewing for me, so take this with a grain of salt: Reviewed-by: Fam Zheng f...@redhat.com
[Qemu-devel] [RESEND PATCH V3] qemu-img info: show nocow info
Add nocow info in 'qemu-img info' output to show whether the file currently has NOCOW flag set or not. Signed-off-by: Chunyan Liu cy...@suse.com Acked-by: Eric Blake ebl...@redhat.com --- Resend for QEMU 2.2. Change json version comment. block/qapi.c | 25 + qapi/block-core.json | 5 - 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/block/qapi.c b/block/qapi.c index f44f6b4..aa53f19 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -28,6 +28,13 @@ #include qapi-visit.h #include qapi/qmp-output-visitor.h #include qapi/qmp/types.h +#ifdef __linux__ +#include linux/fs.h +#include sys/ioctl.h +#ifndef FS_NOCOW_FL +#define FS_NOCOW_FL 0x0080 /* Do not cow file */ +#endif +#endif BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs) { @@ -173,6 +180,20 @@ void bdrv_query_image_info(BlockDriverState *bs, Error *err = NULL; ImageInfo *info = g_new0(ImageInfo, 1); +#ifdef __linux__ +int fd, attr; + +/* get NOCOW info */ +fd = qemu_open(bs-filename, O_RDONLY | O_NONBLOCK); +if (fd = 0) { +if (ioctl(fd, FS_IOC_GETFLAGS, attr) == 0 (attr FS_NOCOW_FL)) { +info-has_nocow = true; +info-nocow = true; +} +qemu_close(fd); +} +#endif + bdrv_get_geometry(bs, total_sectors); info-filename= g_strdup(bs-filename); @@ -625,4 +646,8 @@ void bdrv_image_info_dump(fprintf_function func_fprintf, void *f, func_fprintf(f, Format specific information:\n); bdrv_image_info_specific_dump(func_fprintf, f, info-format_specific); } + +if (info-has_nocow info-nocow) { +func_fprintf(f, NOCOW flag: set\n); +} } diff --git a/qapi/block-core.json b/qapi/block-core.json index e378653..72b4015 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -115,6 +115,8 @@ # @format-specific: #optional structure supplying additional format-specific # information (since 1.7) # +# @nocow: #optional info of whether NOCOW flag is set or not. (since 2.2) +# # Since: 1.3 # ## @@ -126,7 +128,8 @@ '*backing-filename': 'str', '*full-backing-filename': 'str', '*backing-filename-format': 'str', '*snapshots': ['SnapshotInfo'], '*backing-image': 'ImageInfo', - '*format-specific': 'ImageInfoSpecific' } } + '*format-specific': 'ImageInfoSpecific', + '*nocow': 'bool' } } ## # @ImageCheck: -- 1.8.4.5
Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm
Serge, I don't think that is in any way a problem. Is migrating to older versions ever actually expected to work? In either case I don't think for this particular case it's a problem. Good; no; and good - respectively. (The how to handle this in libvirt question is more interesting) I've been giving this some thought. However rococo we make this, I think the caller is often going to need to modify the command line anyway, i.e. is going to need to be aware of the migration problem. For instance, taking Ubuntu as an example, 12.04 shipped with qemu-kvm-1.0, and a pxe-virtio.rom of just under 64k, giving a 64k ROM slot. 14.04 ships with qemu-2.0 and a pxe-virtio.rom of over 80k, giving a 128k ROM slot. So however we fix the machine types, when migrating in a 12.04 initiated VM, qemu will need -global virtio-net-pci.romfile=/path/to/pxe-virtio.rom.12.04 on the command line (or, if you don't much care about PXE working on a soft reboot, a blank file of the same size), whereas when migrating in a 14.04 initiated VM, that must not be on the command line. Fixing this properly would entail requiring that the ROMs are (effectively) distributed with qemu or at least that all ROM sizes become part of the machine type standard. This would have the advantage that loading a larger ROM than the machine type specifies would fail unless the ROM size was explicitly configured on the command line. But this is a subject wider than this patch. So the long and the short of it is that libvirt (sadly) like anything else starting qemu machines needs to know a bit about different versions of qemu, and be able to replace a machine type option with a machine type option and more on the command line. My previous suggestion doesn't help much because qemu will still need to be passed something on the command line. I think the best way to go with this patch would be something like: * Add pc-1.0-qemu-kvm as a machine type (done) * Rename pc-1.0 to pc-1.0-qemu-git * Add an alias for pc-1.0 to either pc-1.0-qemu-git or pc-1.0-qemu-kvm, configurable at build time with a ./configure option. The distro can then set this appropriately. This would default to pc-1.0-qemu-git (i.e. the current behaviour). On distros that only every used one of the above, ./configure will sort things out, +/- self-inflicted injuries relating to ROM size changes. If life is more complicated, libvirt (and other callers) will need to be aware. pc-1.0-qemu-git and pc-1.0-qemu-kvm can be used to unambiguously mean the relevant machine type, which will fix things going forward for that machine type. WDYT? -- Alex Bligh
Re: [Qemu-devel] [PATCH 3/5] scsi-block: extract scsi_block_is_passthrough
Il 29/07/2014 08:28, Fam Zheng ha scritto: + +default: +break; Out of curiosity, why add this default branch? No particular reason, I guess I've changed my style since I wrote this code! Paolo
Re: [Qemu-devel] Dynamic QEMU platform device instantiation in machine files: phone call on Wed July 30
Hi Markus, Thank you for the proposal. Actually quite a lot of people are now OK with this time slot. This phone call was organized in non std way since Alex was not available next week. If you don't mind I would now prefer to keep this slot. Do you wish to join by the way? Best Regards Eric On 07/29/2014 08:24 AM, Markus Armbruster wrote: Eric Auger eric.au...@linaro.org writes: Dear all, For your information, a phone call will be held this week on Wed July 30, 17h-18h CET to address the topic of dynamic instantiation of QEMU platform devices in machine files (using the -device qemu option). Related threads are: http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg01019.html The objective of the meeting is to discuss (and hopefully reach a consensus) on where we can put the code associated to platform device dt node generation and also qom binding stuff. In case you are interested in this discussion, please contact me and I will send you the phone call details. Why not use the regular developers' conference call? Next one is scheduled for August 5th, but an extraordinary one could probably be arranged earlier (tomorrow usual time perhaps?).
Re: [Qemu-devel] [PATCH 1/3] loader: add support for resizeable blobs
* Michael S. Tsirkin (m...@redhat.com) wrote: On Mon, Jul 28, 2014 at 05:52:27PM +0100, Dr. David Alan Gilbert wrote: * Michael S. Tsirkin (m...@redhat.com) wrote: Support resizeable blobs: we allocate more memory than currently available in the blob, which can later be filled in. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- include/hw/loader.h | 14 +++-- include/hw/nvram/fw_cfg.h | 2 +- hw/core/loader.c | 15 + hw/nvram/fw_cfg.c | 79 --- 4 files changed, 96 insertions(+), 14 deletions(-) +static bool fw_cfg_len_needed(void *opaque, int version_id) +{ +FWCfgState *s = FW_CFG(opaque); +int i, j; + +for (i = 0; i ARRAY_SIZE(s-entries); ++i) { +for (j = 0; j ARRAY_SIZE(s-entries[0]); ++j) { +if (s-entries[i][j].len != s-entries[i][j].max_len) { +return true; +} +} +} + +return false; +} This feels a bit delicate; it means that increasing the 'max_len' changes the expected migration stream, which seems odd for a parameter whose job is to make things easier to migrate. Dave It's an API issue - so you are more comfortable with an explicit flag? Yes, if it was something that was just switched on with new machine types I wouldn't worry as much. Dave + static const VMStateDescription vmstate_fw_cfg = { .name = fw_cfg, .version_id = 2, .minimum_version_id = 1, +.post_load = fw_cfg_post_load, +.pre_save = fw_cfg_pre_save, .fields = (VMStateField[]) { VMSTATE_UINT16(cur_entry, FWCfgState), VMSTATE_UINT16_HACK(cur_offset, FWCfgState, is_version_1), VMSTATE_UINT32_V(cur_offset, FWCfgState, 2), +VMSTATE_ARRAY_TEST(len, FWCfgState, FW_CFG_LEN_ENTRIES, + fw_cfg_len_needed, + vmstate_info_uint32, uint32_t), VMSTATE_END_OF_LIST() } }; @@ -388,23 +449,28 @@ static const VMStateDescription vmstate_fw_cfg = { static void fw_cfg_add_bytes_read_callback(FWCfgState *s, uint16_t key, FWCfgReadCallback callback, void *callback_opaque, - void *data, size_t len) + void *data, size_t len, + size_t max_len) { int arch = !!(key FW_CFG_ARCH_LOCAL); +assert(len = max_len); + key = FW_CFG_ENTRY_MASK; assert(key FW_CFG_MAX_ENTRY len UINT32_MAX); s-entries[arch][key].data = data; s-entries[arch][key].len = (uint32_t)len; +s-entries[arch][key].reset_len = (uint32_t)len; +s-entries[arch][key].max_len = (uint32_t)max_len; s-entries[arch][key].read_callback = callback; s-entries[arch][key].callback_opaque = callback_opaque; } void fw_cfg_add_bytes(FWCfgState *s, uint16_t key, void *data, size_t len) { -fw_cfg_add_bytes_read_callback(s, key, NULL, NULL, data, len); +fw_cfg_add_bytes_read_callback(s, key, NULL, NULL, data, len, len); } void fw_cfg_add_string(FWCfgState *s, uint16_t key, const char *value) @@ -454,13 +520,15 @@ void fw_cfg_add_callback(FWCfgState *s, uint16_t key, FWCfgCallback callback, s-entries[arch][key].data = data; s-entries[arch][key].len = (uint32_t)len; +s-entries[arch][key].reset_len = (uint32_t)len; +s-entries[arch][key].max_len = (uint32_t)len; s-entries[arch][key].callback_opaque = callback_opaque; s-entries[arch][key].callback = callback; } void fw_cfg_add_file_callback(FWCfgState *s, const char *filename, FWCfgReadCallback callback, void *callback_opaque, - void *data, size_t len) + void *data, size_t len, size_t max_len) { int i, index; size_t dsize; @@ -475,7 +543,8 @@ void fw_cfg_add_file_callback(FWCfgState *s, const char *filename, assert(index FW_CFG_FILE_SLOTS); fw_cfg_add_bytes_read_callback(s, FW_CFG_FILE_FIRST + index, - callback, callback_opaque, data, len); + callback, callback_opaque, data, len, + max_len); pstrcpy(s-files-f[index].name, sizeof(s-files-f[index].name), filename); @@ -496,7 +565,7 @@ void fw_cfg_add_file_callback(FWCfgState *s, const char *filename, void fw_cfg_add_file(FWCfgState *s, const char *filename, void *data, size_t len) { -
[Qemu-devel] [Bug 1349722] [NEW] qemu-io: Exit code is always zero
Public bug reported: The qemu-io always returns zero on exit independently on errors occurred during the command execution. Example, $ qemu-io -c 'write 128 234' /tmp/run1/test-1/test.img offset 128 is not sector aligned $ echo $? 0 qemu.git HEAD: 41a1a9c42c4e ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1349722 Title: qemu-io: Exit code is always zero Status in QEMU: New Bug description: The qemu-io always returns zero on exit independently on errors occurred during the command execution. Example, $ qemu-io -c 'write 128 234' /tmp/run1/test-1/test.img offset 128 is not sector aligned $ echo $? 0 qemu.git HEAD: 41a1a9c42c4e To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1349722/+subscriptions
Re: [Qemu-devel] [PATCH] piix: set legacy table size for 1.7
Il 28/07/2014 23:01, Michael S. Tsirkin ha scritto: Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Paolo, so the following is needed on top of your patch? hw/i386/pc_piix.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 4524e6b..9694f88 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -326,6 +326,7 @@ static void pc_compat_1_7(MachineState *machine) smbios_defaults = false; gigabyte_align = false; option_rom_has_mr = true; +legacy_acpi_table_size = 6414; x86_cpu_compat_disable_kvm_features(FEAT_1_ECX, CPUID_EXT_X2APIC); } Only if you want to support pc-i440fx-1.7 migration for 1.7-2.1 (and not for 2.0-2.1). Probably better considering we're applying Igor's patch. Paolo
Re: [Qemu-devel] [PATCH] acpi-build: tweak acpi migration limits
Il 28/07/2014 23:15, Michael S. Tsirkin ha scritto: - Tweak error message for legacy machine type: Basically if table size exceeds the limits we set all bets are off for migration: e.g. it can start failing even within given qemu minor version simply because of a bugfix. - Increase table size to 128k. - Make sure we notice it long before we start getting close to the 128k limit: warn at 64k. - Don't fail if we exceed the limit: most people don't care about migration, even less people care about cross version miration. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/acpi-build.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 2178894..816c6d9 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -62,7 +62,7 @@ #define ACPI_BUILD_LEGACY_CPU_AML_SIZE97 #define ACPI_BUILD_ALIGN_SIZE 0x1000 -#define ACPI_BUILD_TABLE_SIZE 0x1 +#define ACPI_BUILD_TABLE_SIZE 0x2 typedef struct AcpiCpuInfo { DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT); @@ -1586,17 +1586,19 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) ACPI_BUILD_ALIGN_SIZE); if (tables-table_data-len legacy_table_size) { /* Should happen only with PCI bridges and -M pc-i440fx-2.0. */ -error_report(Warning: migration to QEMU 2.0 may not work.); +error_report(Warning: migration may not work.); } g_array_set_size(tables-table_data, legacy_table_size); } else { -if (tables-table_data-len ACPI_BUILD_TABLE_SIZE) { +/* Make sure we have a buffer in case we need to resize the tables. */ +if (tables-table_data-len ACPI_BUILD_TABLE_SIZE / 2) { /* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots. */ -error_report(ACPI tables are larger than 64k. Please remove); -error_report(CPUs, NUMA nodes, memory slots or PCI bridges.); -exit(1); +error_report(Warning: ACPI tables are larger than 64k.); +error_report(Warning: migration may not work.); +error_report(Warning: please remove CPUs, NUMA nodes, + memory slots or PCI bridges.); } -g_array_set_size(tables-table_data, ACPI_BUILD_TABLE_SIZE); +acpi_align_size(tables-table_data, ACPI_BUILD_TABLE_SIZE); } acpi_align_size(tables-linker, ACPI_BUILD_ALIGN_SIZE); Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Re: [Qemu-devel] [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
Il 29/07/2014 08:59, Chen, Tiejun ha scritto: (see https://lkml.org/lkml/2014/6/19/121) gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class type. Because Windows always use this way, so I think this point should be same between Linux and Windows. Didn't we discuss that we did not want to pass in PCH at all if we can? I'm a bit confused since I guess I'm missing something important in this long discussion. I guess we just fake a simple PCIe device but just has PCI_VENDOR_ID_XEN/Real_PCH_Device_ID, and then well probe such a PCIe device inside intel_detect_pch(), right? Yes, for the special PIIX4 legacy machine type we want to do that and place the device at 00:1f.0. Paolo
Re: [Qemu-devel] [PATCH for-2.2 1/2] rename parse_enum_option to qapi_enum_parse and make it public
Il 28/07/2014 22:06, Peter Lieven ha scritto: ...but you have relaxed the license to LGPLv2+ in your code motion. Then again, Peter is the original author of this code in commit 82a402e9, so you have the legal right to relax things. Actually, I was not aware. I just took Hu Taos original commit. As we want to make a collection of utilities I would go as far as put a BSD license here. I will send a v2 with your comments regarding Patch 2. And do that license change as well. LGPL v2.1+ is the standard license for qapi/, so I think it's okay to use either. Paolo
Re: [Qemu-devel] [PATCH] target-mips: Ignore unassigned accesses with KVM
Il 28/07/2014 23:36, Aurelien Jarno ha scritto: On Mon, Jul 28, 2014 at 12:37:50PM +0100, James Hogan wrote: MIPS registers an unassigned access handler which raises a guest bus error exception. However this causes QEMU to crash when KVM is enabled as it isn't called from the main execution loop so longjmp() gets called without a corresponding setjmp(). Until the KVM API can be updated to trigger a guest exception in response to an MMIO exit, prevent the bus error exception being raised from mips_cpu_unassigned_access() if KVM is enabled. The check is at run time since the do_unassigned_access callback is initialised before it is known whether KVM will be enabled. The problem can be triggered with Malta emulation by making the guest write to the reset region at physical address 0x1bf0, since it is marked read-only which is treated as unassigned for writes. Signed-off-by: James Hogan james.ho...@imgtec.com Cc: Aurelien Jarno aurel...@aurel32.net Cc: Peter Maydell peter.mayd...@linaro.org Cc: Paolo Bonzini pbonz...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Christoffer Dall christoffer.d...@linaro.org Cc: Sanjay Lal sanj...@kymasys.com --- target-mips/op_helper.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c index 27651a4a00c1..df97b35f8701 100644 --- a/target-mips/op_helper.c +++ b/target-mips/op_helper.c @@ -21,6 +21,7 @@ #include qemu/host-utils.h #include exec/helper-proto.h #include exec/cpu_ldst.h +#include sysemu/kvm.h #ifndef CONFIG_USER_ONLY static inline void cpu_mips_tlb_flush (CPUMIPSState *env, int flush_global); @@ -2168,6 +2169,16 @@ void mips_cpu_unassigned_access(CPUState *cs, hwaddr addr, MIPSCPU *cpu = MIPS_CPU(cs); CPUMIPSState *env = cpu-env; +/* + * Raising an exception with KVM enabled will crash because it won't be from + * the main execution loop so the longjmp won't have a matching setjmp. + * Until we can trigger a bus error exception through KVM lets just ignore + * the access. + */ +if (kvm_enabled()) { +return; +} + if (is_exec) { helper_raise_exception(env, EXCP_IBE); } else { Reviewed-by: Aurelien Jarno aurel...@aurel32.net Note that even if the test is added for each exception, it is light enough compared to triggering and handling an exception so that it has no impact on performance. Paolo, do you want to take this patch in your kvm tree? Sure, I'll include it for 2.2. Paolo
Re: [Qemu-devel] Dynamic QEMU platform device instantiation in machine files: phone call on Wed July 30
Eric Auger eric.au...@linaro.org writes: Hi Markus, Thank you for the proposal. Actually quite a lot of people are now OK with this time slot. This phone call was organized in non std way since Alex was not available next week. If you don't mind I would now prefer to keep this slot. Let's keep it. Don't be shy using the ordinary call in the future, even if that means asking for one outside the biweekly schedule. Do you wish to join by the way? I'd love to, but family obligations will make it difficult. You said 17h-18h CET. Do you mean CEST? Easiest way to avoid the confusion is UTC.
Re: [Qemu-devel] [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On 2014/7/29 16:32, Paolo Bonzini wrote: Il 29/07/2014 08:59, Chen, Tiejun ha scritto: (see https://lkml.org/lkml/2014/6/19/121) gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class type. Because Windows always use this way, so I think this point should be same between Linux and Windows. Didn't we discuss that we did not want to pass in PCH at all if we can? I'm a bit confused since I guess I'm missing something important in this long discussion. I guess we just fake a simple PCIe device but just has PCI_VENDOR_ID_XEN/Real_PCH_Device_ID, and then well probe such a PCIe device inside intel_detect_pch(), right? Yes, for the special PIIX4 legacy machine type we want to do that and place the device at 00:1f.0. Got it. BTW, please review those patches implemented a separate machine, xenigd, firstly. Then I can step on this point. Thanks Tiejun
Re: [Qemu-devel] [PATCH v2 1/7] bootindex: add modify_boot_device_path function
Hi, Gerd I think it is ok to allow only *changing* the bootindex. Yes, that's no problem. But then yoy always will have a old entry where you can take the suffix from, and you don't need the suffix as parameter for the monitor command. No, optional. Because the bootindex property is not a necessary property for devices. If a device, such as IDE cdrom haven't attach the bootindex in qemu booting command line, the global fw_boot_order will not have its entry. I'd suggest to simply not support this and throw an error then. Ok. So, we should save the suffix as parameter. I think it is a bad idea to have the suffix as monitor command parameter. It is a implementation detail which the monitor users should not have to worry about. Yes. Actually I also have this misgivings. Easiest way to do this is to allow *changing* an existing bootindex entry only, and not support *adding* new boot entries. The user has to set a bootindex for any device it might want to boot from in the future then. I think this is acceptable. Hmm.. If you want support adding bootentries at runtime (and it probably makes sense to support removing entries too in that case) the suffix handling should be reworked. The suffix could be stored in DeviceState instead of being passed to the add_bootentry function, so you can add boot entries later on without expecting the user to know what the correct suffix is. That's a good idea! I think this is a good improvement. Any other comments? Thanks! Maybe we should save the suffix as parameter in add_boot_device_path() There are two reasons: 1. the floppy device may have two bootindex, which configure as below: -global isa-fdc.driveA=drive-fdc0-0-0, bootindexA=xxx \ -global isa-fdc.driveB=drive-fdc0-0-1 bootindexB=xxx \ We can see it in isabus_fdc_realize() [hw/block/fdc.c] 2. The option rom don't need pass dev to add_bootentry, which realized in rom_add_file() [hw/core/loader.c] in those situation, we have to pass suffix to add_bootentry function. In addition, because an isa-fdc device can be configured two drive, we have to pass the suffix for changing floppy's bootindex. Both del_bootentry and add_bootentry need suffix to confirm the unique device. Because the suffix of bootindex is invisible for common user. Maybe we can introduce a query interface for user. Such as hmp monitor command info bootindex, which show the information of bootindex have configured, includeing suffix. But the suffix is optional, maybe just useful for floppy disk, that will be not a serious or trouble problem IMHO. So, I want to do those in the next version: - save the suffix as optional continue. - allow changing the existing bootindex entry only, not support adding new boot entries. - rework modify_boot_device_path(), add checking logic for the suffix when remove old bootindex entry from global boot order list. I'm looking forward to your reply, thanks! Best regards, -Gonglei
Re: [Qemu-devel] Dynamic QEMU platform device instantiation in machine files: phone call on Wed July 30
On 07/29/2014 11:07 AM, Markus Armbruster wrote: Eric Auger eric.au...@linaro.org writes: Hi Markus, Thank you for the proposal. Actually quite a lot of people are now OK with this time slot. This phone call was organized in non std way since Alex was not available next week. If you don't mind I would now prefer to keep this slot. Let's keep it. Don't be shy using the ordinary call in the future, even if that means asking for one outside the biweekly schedule. OK sure I will ;-) Do you wish to join by the way? I'd love to, but family obligations will make it difficult. You said 17h-18h CET. Do you mean CEST? Easiest way to avoid the confusion is UTC. Yes CEST = UTC + 2. Thanks for pointing out the need for clarification. Regards Eric
[Qemu-devel] [PATCH] acpi-build: minor code cleanup
Fix up and add comments to clarify code, plus a trivial code change for clarity. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/acpi-build.c | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 779160f..ec86f1b 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -825,9 +825,9 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) bool bus_hotplug_support = false; /* -skip bridge subtree creation if bridge hotplug is disabled -to make it compatible with 1.7 machine type -*/ + * Skip bridge subtree creation if bridge hotplug is disabled + * to make acpi tables compatible with legacy machine types. + */ if (!child-pcihp_bridge_en bus-parent_dev) { return; } @@ -869,6 +869,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) PCIDeviceClass *pc; PCIDevice *pdev = bus-devices[i]; int slot = PCI_SLOT(i); +bool bridge_in_acpi; if (!pdev) { continue; @@ -878,8 +879,13 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) pc = PCI_DEVICE_GET_CLASS(pdev); dc = DEVICE_GET_CLASS(pdev); -if (pc-class_id == PCI_CLASS_BRIDGE_ISA || -(pc-is_bridge child-pcihp_bridge_en)) { +/* When hotplug for bridges is enabled, bridges are + * described in ACPI separately (see build_pci_bus_end). + * In this case they aren't themselves hot-pluggable. + */ +bridge_in_acpi = pc-is_bridge child-pcihp_bridge_en; + +if (pc-class_id == PCI_CLASS_BRIDGE_ISA || bridge_in_acpi) { set_bit(slot, slot_device_system); } @@ -891,7 +897,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) } } -if (!dc-hotpluggable || (pc-is_bridge child-pcihp_bridge_en)) { +if (!dc-hotpluggable || bridge_in_acpi) { clear_bit(slot, slot_hotplug_enable); } } -- MST
Re: [Qemu-devel] [PATCH v4 0/5] ACPI fixes for QEMU 2.1
On Tue, Jul 29, 2014 at 12:25:56PM +0200, Laszlo Ersek wrote: On 07/28/14 23:27, Michael S. Tsirkin wrote: On Mon, Jul 28, 2014 at 05:34:13PM +0200, Paolo Bonzini wrote: v3-v4: drop all pretense of supporting bridges [me] v2-v3: fix tests/acpi-test-data/pc/DSDT [Peter] track down make check failure, fix it [patch 4, me] split patch 2 in two parts [mst] do not make bsel_alloc global [mst] include Igor's bridge patch [mst, as discussed on IRC] OK, I applied this, and did some tweaks on top that I think make it a bit safer. It's very very late in the release cycle, but also very late in the day so I don't want to risk sending pull request now. I did push it out: tag for_upstream in my tree Will send tomorrow: Paolo, Laszlo, Gerd, could you please take a look and ack? Any particular reason for reordering the patches from Paolo's v4? In that series, the order is: 1 acpi-dsdt: procedurally generate _PRT 2 pc: hack for migration compatibility from QEMU 2.0 3 pc: future-proof migration-compatibility of ACPI tables 4 bios-tables-test: fix ASL normalization false positive 5 pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is disabled In yours, 1 acpi-dsdt: procedurally generate _PRT 2 pc: hack for migration compatibility from QEMU 2.0 3 bios-tables-test: fix ASL normalization false positive 4 pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is disabled 5 acpi-build: minor code cleanup 6 pc: future-proof migration-compatibility of ACPI tables It's because of this patch. It was a bit controversial, so I deferred applying it for a while. I implemented an alternative solution just to see how it would look like. That patch turned out to be much bigger, so I agreed we should go ahead with Paolo's one for 2.1 even if it's not pretty, and will create more work for 2.2. 7 acpi-build: tweak acpi migration limits 8 piix: set legacy table size for 1.7 1 - 1 2 - 2 3 - 6 4 - 3 5 - 4 Patches 1 2 are identical between the two sets, and their order is the same. You cut out patch #3, moved up patches #4 and #5, added a new patch (acpi-build: minor code cleanup), and then reinserted #3. .. Patches taken from Paolo's v4 seem to be identical. For patch acpi-build: minor code cleanup: - two typos in the commit message (double space, for clarify) - seems OK otherwise For patch acpi-build: tweak acpi migration limits: len pre-patch post-patch message actionmessage action --- [ 0, 64 KB] none set to 64 KB none set to 128KB ( 64 KB, 128 KB] error exit warning set to 128KB (128 KB, inf) error exit warning round up to multiple of 128 KB I don't object. For patch piix: set legacy table size for 1.7: didn't Igor say something that such a migration wouldn't work anyway? I could be remembering wrong. Thanks Laszlo I don't recall this, but if there are more bug we could just fix them too. -- MST
Re: [Qemu-devel] [PATCH v4 0/5] ACPI fixes for QEMU 2.1
Il 29/07/2014 12:31, Michael S. Tsirkin ha scritto: For patch piix: set legacy table size for 1.7: didn't Igor say something that such a migration wouldn't work anyway? I could be remembering wrong. I don't recall this, but if there are more bug we could just fix them too. You have to choose between a -M pc-i440fx-1.7 that migrates from 1.7 to 2.1, and one that migrates from 2.0 to 2.1. - to make 1.7-2.1 work: you need both Igor's patch (generate AML only...) and piix: set legacy table size for 1.7. All configurations will work, including those with PCI bridges. - to make 2.0-2.1 work (with -M pc-i440fx-1.7): you need to omit piix: set legacy table size for 1.7, and configurations with PCI bridges remain broken. Igor's patch is not needed, because it only affects configurations with PCI bridges. mst prefers the first one, and he changed my view on that too. And he noticed that with piix: set legacy table size for 1.7 more things just work, so it's better to include it. Paolo
[Qemu-devel] [PULL 0/8] pc migration fixes
The following changes since commit c60a57ff497667780132a3fcdc1500c83af5d5c0: Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging (2014-07-25 16:58:41 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream for you to fetch changes up to f47337cb9181a1f9339b9b0516b92fe77a22f0f3: piix: set legacy table size for 1.7 (2014-07-29 12:26:12 +0200) pc migration fixes Last minute fixes for migration. It seems that if we don't fix it now, fixing it in the next version will be even more painful ... Signed-off-by: Michael S. Tsirkin m...@redhat.com Igor Mammedov (1): pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is disabled Michael S. Tsirkin (3): acpi-build: minor code cleanup acpi-build: tweak acpi migration limits piix: set legacy table size for 1.7 Paolo Bonzini (4): acpi-dsdt: procedurally generate _PRT pc: hack for migration compatibility from QEMU 2.0 bios-tables-test: fix ASL normalization false positive pc: future-proof migration-compatibility of ACPI tables include/hw/i386/pc.h|1 + hw/i386/acpi-build.c| 99 +- hw/i386/pc_piix.c | 20 + hw/i386/pc_q35.c|5 + tests/bios-tables-test.c|6 +- hw/i386/acpi-dsdt.dsl | 90 +- hw/i386/acpi-dsdt.hex.generated | 1910 +++ tests/acpi-test-data/pc/DSDT| Bin 4499 - 2807 bytes 8 files changed, 269 insertions(+), 1862 deletions(-)
[Qemu-devel] [PULL 2/8] pc: hack for migration compatibility from QEMU 2.0
From: Paolo Bonzini pbonz...@redhat.com Changing the ACPI table size causes migration to break, and the memory hotplug work opened our eyes on how horribly we were breaking things in 2.0 already. The ACPI table size is rounded to the next 4k, which one would think gives some headroom. In practice this is not the case, because the user can control the ACPI table size (each CPU adds 97 bytes to the SSDT and 8 to the MADT) and so some -smp values will break the 4k boundary and fail to migrate. Similarly, PCI bridges add ~1870 bytes to the SSDT. This patch concerns itself with fixing migration from QEMU 2.0. It computes the payload size of QEMU 2.0 and always uses that one. The previous patch shrunk the ACPI tables enough that the QEMU 2.0 size should always be enough; non-AML tables can change depending on the configuration (especially MADT, SRAT, HPET) but they remain the same between QEMU 2.0 and 2.1, so we only compute our padding based on the sizes of the SSDT and DSDT. Migration from QEMU 1.7 should work for guests that have a number of CPUs other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140. It was already broken from QEMU 1.7 to QEMU 2.0 in the same way, though. Even with this patch, QEMU 1.7 and 2.0 have two different ideas of -M pc-i440fx-2.0 when there are PCI bridges. Igor sent a patch to adopt the QEMU 1.7 definition. I think distributions should apply it if they move directly from QEMU 1.7 to 2.1+ without ever packaging version 2.0. Reviewed-by: Laszlo Ersek ler...@redhat.com Tested-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com Reviewed-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- include/hw/i386/pc.h | 1 + hw/i386/acpi-build.c | 57 hw/i386/pc_piix.c| 19 ++ hw/i386/pc_q35.c | 5 + 4 files changed, 78 insertions(+), 4 deletions(-) diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 1c0c382..f4b9b2b 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -94,6 +94,7 @@ struct PcGuestInfo { uint64_t *node_mem; uint64_t *node_cpu; FWCfgState *fw_cfg; +int legacy_acpi_table_size; bool has_acpi_build; bool has_reserved_memory; }; diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index ebc5f03..d90c471 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -25,7 +25,9 @@ #include glib.h #include qemu-common.h #include qemu/bitmap.h +#include qemu/osdep.h #include qemu/range.h +#include qemu/error-report.h #include hw/pci/pci.h #include qom/cpu.h #include hw/i386/pc.h @@ -52,6 +54,14 @@ #include qapi/qmp/qint.h #include qom/qom-qobject.h +/* These are used to size the ACPI tables for -M pc-i440fx-1.7 and + * -M pc-i440fx-2.0. Even if the actual amount of AML generated grows + * a little bit, there should be plenty of free space since the DSDT + * shrunk by ~1.5k between QEMU 2.0 and QEMU 2.1. + */ +#define ACPI_BUILD_LEGACY_CPU_AML_SIZE97 +#define ACPI_BUILD_ALIGN_SIZE 0x1000 + typedef struct AcpiCpuInfo { DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT); } AcpiCpuInfo; @@ -1440,13 +1450,14 @@ static void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) { GArray *table_offsets; -unsigned facs, dsdt, rsdt; +unsigned facs, ssdt, dsdt, rsdt; AcpiCpuInfo cpu; AcpiPmInfo pm; AcpiMiscInfo misc; AcpiMcfgInfo mcfg; PcPciInfo pci; uint8_t *u; +size_t aml_len = 0; acpi_get_cpu_info(cpu); acpi_get_pm_info(pm); @@ -1474,13 +1485,20 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) dsdt = tables-table_data-len; build_dsdt(tables-table_data, tables-linker, misc); +/* Count the size of the DSDT and SSDT, we will need it for legacy + * sizing of ACPI tables. + */ +aml_len += tables-table_data-len - dsdt; + /* ACPI tables pointed to by RSDT */ acpi_add_table(table_offsets, tables-table_data); build_fadt(tables-table_data, tables-linker, pm, facs, dsdt); +ssdt = tables-table_data-len; acpi_add_table(table_offsets, tables-table_data); build_ssdt(tables-table_data, tables-linker, cpu, pm, misc, pci, guest_info); +aml_len += tables-table_data-len - ssdt; acpi_add_table(table_offsets, tables-table_data); build_madt(tables-table_data, tables-linker, cpu, guest_info); @@ -1513,14 +1531,45 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) /* RSDP is in FSEG memory, so allocate it separately */ build_rsdp(tables-rsdp, tables-linker, rsdt); -/* We'll expose it all to Guest so align size to reduce +/* We'll expose it all to Guest so we want to reduce * chance of size changes. * RSDP is small so it's easy to keep it immutable, no need to * bother with alignment. + * + * We used to align the tables
[Qemu-devel] [PULL 7/8] acpi-build: tweak acpi migration limits
- Tweak error message for legacy machine type: Basically if table size exceeds the limits we set all bets are off for migration: e.g. it can start failing even within given qemu minor version simply because of a bugfix. - Increase table size to 128k. - Make sure we notice it long before we start getting close to the 128k limit: warn at 64k. - Don't fail if we exceed the limit: most people don't care about migration, even less people care about cross version miration. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Paolo Bonzini pbonz...@redhat.com --- hw/i386/acpi-build.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 2178894..816c6d9 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -62,7 +62,7 @@ #define ACPI_BUILD_LEGACY_CPU_AML_SIZE97 #define ACPI_BUILD_ALIGN_SIZE 0x1000 -#define ACPI_BUILD_TABLE_SIZE 0x1 +#define ACPI_BUILD_TABLE_SIZE 0x2 typedef struct AcpiCpuInfo { DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT); @@ -1586,17 +1586,19 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) ACPI_BUILD_ALIGN_SIZE); if (tables-table_data-len legacy_table_size) { /* Should happen only with PCI bridges and -M pc-i440fx-2.0. */ -error_report(Warning: migration to QEMU 2.0 may not work.); +error_report(Warning: migration may not work.); } g_array_set_size(tables-table_data, legacy_table_size); } else { -if (tables-table_data-len ACPI_BUILD_TABLE_SIZE) { +/* Make sure we have a buffer in case we need to resize the tables. */ +if (tables-table_data-len ACPI_BUILD_TABLE_SIZE / 2) { /* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots. */ -error_report(ACPI tables are larger than 64k. Please remove); -error_report(CPUs, NUMA nodes, memory slots or PCI bridges.); -exit(1); +error_report(Warning: ACPI tables are larger than 64k.); +error_report(Warning: migration may not work.); +error_report(Warning: please remove CPUs, NUMA nodes, + memory slots or PCI bridges.); } -g_array_set_size(tables-table_data, ACPI_BUILD_TABLE_SIZE); +acpi_align_size(tables-table_data, ACPI_BUILD_TABLE_SIZE); } acpi_align_size(tables-linker, ACPI_BUILD_ALIGN_SIZE); -- MST
[Qemu-devel] [PULL 8/8] piix: set legacy table size for 1.7
Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/pc_piix.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 4524e6b..9694f88 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -326,6 +326,7 @@ static void pc_compat_1_7(MachineState *machine) smbios_defaults = false; gigabyte_align = false; option_rom_has_mr = true; +legacy_acpi_table_size = 6414; x86_cpu_compat_disable_kvm_features(FEAT_1_ECX, CPUID_EXT_X2APIC); } -- MST
[Qemu-devel] [PULL 5/8] acpi-build: minor code cleanup
Fix up and add comments to clarify code, plus a trivial code change for clarity. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/acpi-build.c | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 779160f..ec86f1b 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -825,9 +825,9 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) bool bus_hotplug_support = false; /* -skip bridge subtree creation if bridge hotplug is disabled -to make it compatible with 1.7 machine type -*/ + * Skip bridge subtree creation if bridge hotplug is disabled + * to make acpi tables compatible with legacy machine types. + */ if (!child-pcihp_bridge_en bus-parent_dev) { return; } @@ -869,6 +869,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) PCIDeviceClass *pc; PCIDevice *pdev = bus-devices[i]; int slot = PCI_SLOT(i); +bool bridge_in_acpi; if (!pdev) { continue; @@ -878,8 +879,13 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) pc = PCI_DEVICE_GET_CLASS(pdev); dc = DEVICE_GET_CLASS(pdev); -if (pc-class_id == PCI_CLASS_BRIDGE_ISA || -(pc-is_bridge child-pcihp_bridge_en)) { +/* When hotplug for bridges is enabled, bridges are + * described in ACPI separately (see build_pci_bus_end). + * In this case they aren't themselves hot-pluggable. + */ +bridge_in_acpi = pc-is_bridge child-pcihp_bridge_en; + +if (pc-class_id == PCI_CLASS_BRIDGE_ISA || bridge_in_acpi) { set_bit(slot, slot_device_system); } @@ -891,7 +897,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) } } -if (!dc-hotpluggable || (pc-is_bridge child-pcihp_bridge_en)) { +if (!dc-hotpluggable || bridge_in_acpi) { clear_bit(slot, slot_hotplug_enable); } } -- MST
[Qemu-devel] [PULL 1/8] acpi-dsdt: procedurally generate _PRT
From: Paolo Bonzini pbonz...@redhat.com This replaces the _PRT constant with a method that computes it. The problem is that the DSDT+SSDT have grown from 2.0 to 2.1, enough to cross the 8k barrier (we align the ACPI tables to 4k before putting them in fw_cfg). This causes problems with migration and the pc-i440fx-2.0 machine type. The solution to the problem is to hardcode 64k as the limit, but this doesn't solve the bug with pc-i440fx-2.0. The fix will be for QEMU 2.1 to use exactly the same size as QEMU 2.0 for the ACPI tables. First, however, we must make the actual AML equal or smaller; to do this, rewrite _PRT in a way that saves over 1k of bytecode. Reviewed-by: Laszlo Ersek ler...@redhat.com Tested-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com Reviewed-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/acpi-dsdt.dsl | 90 +- hw/i386/acpi-dsdt.hex.generated | 1910 +++ tests/acpi-test-data/pc/DSDT| Bin 4499 - 2807 bytes 3 files changed, 148 insertions(+), 1852 deletions(-) diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl index 3cc0ea0..6ba0170 100644 --- a/hw/i386/acpi-dsdt.dsl +++ b/hw/i386/acpi-dsdt.dsl @@ -181,57 +181,45 @@ DefinitionBlock ( Scope(\_SB) { Scope(PCI0) { -Name(_PRT, Package() { -/* PCI IRQ routing table, example from ACPI 2.0a specification, - section 6.2.8.1 */ -/* Note: we provide the same info as the PCI routing - table of the Bochs BIOS */ - -#define prt_slot(nr, lnk0, lnk1, lnk2, lnk3) \ -Package() { nr##, 0, lnk0, 0 }, \ -Package() { nr##, 1, lnk1, 0 }, \ -Package() { nr##, 2, lnk2, 0 }, \ -Package() { nr##, 3, lnk3, 0 } - -#define prt_slot0(nr) prt_slot(nr, LNKD, LNKA, LNKB, LNKC) -#define prt_slot1(nr) prt_slot(nr, LNKA, LNKB, LNKC, LNKD) -#define prt_slot2(nr) prt_slot(nr, LNKB, LNKC, LNKD, LNKA) -#define prt_slot3(nr) prt_slot(nr, LNKC, LNKD, LNKA, LNKB) - -prt_slot0(0x), -/* Device 1 is power mgmt device, and can only use irq 9 */ -prt_slot(0x0001, LNKS, LNKB, LNKC, LNKD), -prt_slot2(0x0002), -prt_slot3(0x0003), -prt_slot0(0x0004), -prt_slot1(0x0005), -prt_slot2(0x0006), -prt_slot3(0x0007), -prt_slot0(0x0008), -prt_slot1(0x0009), -prt_slot2(0x000a), -prt_slot3(0x000b), -prt_slot0(0x000c), -prt_slot1(0x000d), -prt_slot2(0x000e), -prt_slot3(0x000f), -prt_slot0(0x0010), -prt_slot1(0x0011), -prt_slot2(0x0012), -prt_slot3(0x0013), -prt_slot0(0x0014), -prt_slot1(0x0015), -prt_slot2(0x0016), -prt_slot3(0x0017), -prt_slot0(0x0018), -prt_slot1(0x0019), -prt_slot2(0x001a), -prt_slot3(0x001b), -prt_slot0(0x001c), -prt_slot1(0x001d), -prt_slot2(0x001e), -prt_slot3(0x001f), -}) +Method (_PRT, 0) { +Store(Package(128) {}, Local0) +Store(Zero, Local1) +While(LLess(Local1, 128)) { +// slot = pin 2 +Store(ShiftRight(Local1, 2), Local2) + +// lnk = (slot + pin) 3 +Store(And(Add(Local1, Local2), 3), Local3) +If (LEqual(Local3, 0)) { +Store(Package(4) { Zero, Zero, LNKD, Zero }, Local4) +} +If (LEqual(Local3, 1)) { +// device 1 is the power-management device, needs SCI +If (LEqual(Local1, 4)) { +Store(Package(4) { Zero, Zero, LNKS, Zero }, Local4) +} Else { +Store(Package(4) { Zero, Zero, LNKA, Zero }, Local4) +} +} +If (LEqual(Local3, 2)) { +Store(Package(4) { Zero, Zero, LNKB, Zero }, Local4) +} +If (LEqual(Local3, 3)) { +Store(Package(4) { Zero, Zero, LNKC, Zero }, Local4) +} + +// Complete the interrupt routing entry: +//Package(4) { 0x[slot], [pin], [link], 0) } + +Store(Or(ShiftLeft(Local2, 16), 0x), Index(Local4, 0)) +Store(And(Local1, 3),Index(Local4, 1)) +Store(Local4,Index(Local0,
[Qemu-devel] [PULL 3/8] bios-tables-test: fix ASL normalization false positive
From: Paolo Bonzini pbonz...@redhat.com My version of IASL (from RHEL7) puts two newlines between the head comment and the DefinitionBlock property. Kill all newlines after the comment, so that normalize_asl works properly. Signed-off-by: Paolo Bonzini pbonz...@redhat.com Reviewed-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Laszlo Ersek ler...@redhat.com --- tests/bios-tables-test.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c index 62771f7..045eb27 100644 --- a/tests/bios-tables-test.c +++ b/tests/bios-tables-test.c @@ -487,7 +487,11 @@ static GString *normalize_asl(gchar *asl_code) /* strip comments (different generation days) */ comment = g_strstr_len(asl-str, asl-len, COMMENT_END); if (comment) { -asl = g_string_erase(asl, 0, comment + sizeof(COMMENT_END) - asl-str); +comment += strlen(COMMENT_END); +while (*comment == '\n') { +comment++; +} +asl = g_string_erase(asl, 0, comment - asl-str); } /* strip def block name (it has file path in it) */ -- MST
[Qemu-devel] [PULL 4/8] pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is disabled
From: Igor Mammedov imamm...@redhat.com Fixes migration regression from QEMU-1.7 to a newer QEMUs. SSDT table size in QEMU-1.7 doesn't change regardless of a number of PCI bridge devices present at startup. However in QEMU-2.0 since addition of hotplug on PCI bridges, each PCI bridge adds ~1875 bytes to SSDT table, including pc-i440fx-1.7 machine type where PCI bridge hotplug disabled via compat property. It breaks migration from QEMU-1.7 to QEMU-2.[01] -M pc-i440fx-1.7 since RAMBlock size of ACPI tables on target becomes larger then on source and migration fails with: Length mismatch: /rom@etc/acpi/tables: 2000 in != 3000 error. Fix this by generating AML only for PCI0 bus if hotplug on PCI bridges is disabled and preserves PCI brigde description in AML as it was done in QEMU-1.7 for pc-i440fx-1.7. It will help to maintain size of SSDT static regardless of number of PCI bridges on startup for pc-i440fx-1.7 machine type. Signed-off-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com Reviewed-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/acpi-build.c | 26 +- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index d90c471..779160f 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -74,6 +74,7 @@ typedef struct AcpiMcfgInfo { typedef struct AcpiPmInfo { bool s3_disabled; bool s4_disabled; +bool pcihp_bridge_en; uint8_t s4_val; uint16_t sci_int; uint8_t acpi_enable_cmd; @@ -95,6 +96,7 @@ typedef struct AcpiBuildPciBusHotplugState { GArray *device_table; GArray *notify_table; struct AcpiBuildPciBusHotplugState *parent; +bool pcihp_bridge_en; } AcpiBuildPciBusHotplugState; static void acpi_get_dsdt(AcpiMiscInfo *info) @@ -198,6 +200,9 @@ static void acpi_get_pm_info(AcpiPmInfo *pm) NULL); pm-gpe0_blk_len = object_property_get_int(obj, ACPI_PM_PROP_GPE0_BLK_LEN, NULL); +pm-pcihp_bridge_en = +object_property_get_bool(obj, acpi-pci-hotplug-with-bridge-support, + NULL); } static void acpi_get_misc_info(AcpiMiscInfo *info) @@ -778,11 +783,13 @@ static void acpi_set_pci_info(void) } static void build_pci_bus_state_init(AcpiBuildPciBusHotplugState *state, - AcpiBuildPciBusHotplugState *parent) + AcpiBuildPciBusHotplugState *parent, + bool pcihp_bridge_en) { state-parent = parent; state-device_table = build_alloc_array(); state-notify_table = build_alloc_array(); +state-pcihp_bridge_en = pcihp_bridge_en; } static void build_pci_bus_state_cleanup(AcpiBuildPciBusHotplugState *state) @@ -796,7 +803,7 @@ static void *build_pci_bus_begin(PCIBus *bus, void *parent_state) AcpiBuildPciBusHotplugState *parent = parent_state; AcpiBuildPciBusHotplugState *child = g_malloc(sizeof *child); -build_pci_bus_state_init(child, parent); +build_pci_bus_state_init(child, parent, parent-pcihp_bridge_en); return child; } @@ -817,6 +824,14 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) GArray *method; bool bus_hotplug_support = false; +/* +skip bridge subtree creation if bridge hotplug is disabled +to make it compatible with 1.7 machine type +*/ +if (!child-pcihp_bridge_en bus-parent_dev) { +return; +} + if (bus-parent_dev) { op = 0x82; /* DeviceOp */ build_append_nameseg(bus_table, S%.02X_, @@ -863,7 +878,8 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) pc = PCI_DEVICE_GET_CLASS(pdev); dc = DEVICE_GET_CLASS(pdev); -if (pc-class_id == PCI_CLASS_BRIDGE_ISA || pc-is_bridge) { +if (pc-class_id == PCI_CLASS_BRIDGE_ISA || +(pc-is_bridge child-pcihp_bridge_en)) { set_bit(slot, slot_device_system); } @@ -875,7 +891,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state) } } -if (!dc-hotpluggable || pc-is_bridge) { +if (!dc-hotpluggable || (pc-is_bridge child-pcihp_bridge_en)) { clear_bit(slot, slot_hotplug_enable); } } @@ -1140,7 +1156,7 @@ build_ssdt(GArray *table_data, GArray *linker, bus = PCI_HOST_BRIDGE(pci_host)-bus; } -build_pci_bus_state_init(hotplug_state, NULL); +build_pci_bus_state_init(hotplug_state, NULL, pm-pcihp_bridge_en); if (bus) { /* Scan all PCI buses. Generate tables to support hotplug. */ -- MST
[Qemu-devel] [PULL 6/8] pc: future-proof migration-compatibility of ACPI tables
From: Paolo Bonzini pbonz...@redhat.com This patch avoids that similar changes break QEMU again in the future. QEMU will now hard-code 64k as the maximum ACPI table size, which (despite being an order of magnitude smaller than 640k) should be enough for everyone. Reviewed-by: Laszlo Ersek ler...@redhat.com Tested-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com Reviewed-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/acpi-build.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index ec86f1b..2178894 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -62,6 +62,8 @@ #define ACPI_BUILD_LEGACY_CPU_AML_SIZE97 #define ACPI_BUILD_ALIGN_SIZE 0x1000 +#define ACPI_BUILD_TABLE_SIZE 0x1 + typedef struct AcpiCpuInfo { DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT); } AcpiCpuInfo; @@ -1588,7 +1590,13 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) } g_array_set_size(tables-table_data, legacy_table_size); } else { -acpi_align_size(tables-table_data, ACPI_BUILD_ALIGN_SIZE); +if (tables-table_data-len ACPI_BUILD_TABLE_SIZE) { +/* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots. */ +error_report(ACPI tables are larger than 64k. Please remove); +error_report(CPUs, NUMA nodes, memory slots or PCI bridges.); +exit(1); +} +g_array_set_size(tables-table_data, ACPI_BUILD_TABLE_SIZE); } acpi_align_size(tables-linker, ACPI_BUILD_ALIGN_SIZE); -- MST
[Qemu-devel] [PATCH v2] Add ACPI tables for TPM
From: Stefan Berger stef...@linux.vnet.ibm.com Add an SSDT ACPI table for the TPM device. Add a TCPA table for BIOS logging area when a TPM is being used. The latter follows this spec here: http://www.trustedcomputinggroup.org/files/static_page_files/DCD4188E-1A4B-B294-D050A155FB6F7385/TCG_ACPIGeneralSpecification_PublicReview.pdf Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com --- hw/i386/Makefile.objs | 3 ++- hw/i386/acpi-build.c | 46 ++ hw/i386/acpi-defs.h | 11 +++ hw/i386/ssdt-tpm.dsl | 43 +++ hw/tpm/tpm_tis.h | 5 + include/hw/acpi/tpm.h | 29 + include/sysemu/tpm.h | 5 + 7 files changed, 137 insertions(+), 5 deletions(-) create mode 100644 hw/i386/ssdt-tpm.dsl create mode 100644 include/hw/acpi/tpm.h diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs index 48014ab..3688cf8 100644 --- a/hw/i386/Makefile.objs +++ b/hw/i386/Makefile.objs @@ -10,7 +10,8 @@ obj-y += bios-linker-loader.o hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex \ hw/i386/ssdt-proc.hex hw/i386/ssdt-pcihp.hex hw/i386/ssdt-misc.hex \ hw/i386/acpi-dsdt.hex hw/i386/q35-acpi-dsdt.hex \ - hw/i386/q35-acpi-dsdt.hex hw/i386/ssdt-mem.hex + hw/i386/q35-acpi-dsdt.hex hw/i386/ssdt-mem.hex \ + hw/i386/ssdt-tpm.hex iasl-option=$(shell if test -z `$(1) $(2) 21 /dev/null` \ ; then echo $(2); else echo $(3); fi ;) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index ebc5f03..d767e37 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -38,6 +38,8 @@ #include hw/loader.h #include hw/isa/isa.h #include hw/acpi/memory_hotplug.h +#include sysemu/tpm.h +#include hw/acpi/tpm.h /* Supported chipsets: */ #include hw/acpi/piix4.h @@ -75,6 +77,7 @@ typedef struct AcpiPmInfo { typedef struct AcpiMiscInfo { bool has_hpet; +bool has_tpm; DECLARE_BITMAP(slot_hotplug_enable, PCI_SLOT_MAX); const unsigned char *dsdt_code; unsigned dsdt_size; @@ -193,6 +196,7 @@ static void acpi_get_pm_info(AcpiPmInfo *pm) static void acpi_get_misc_info(AcpiMiscInfo *info) { info-has_hpet = hpet_find(); +info-has_tpm = tpm_find(); info-pvpanic_port = pvpanic_port(); } @@ -681,6 +685,7 @@ static inline char acpi_get_hex(uint32_t val) #include hw/i386/ssdt-misc.hex #include hw/i386/ssdt-pcihp.hex +#include hw/i386/ssdt-tpm.hex static void build_append_notify_method(GArray *device, const char *name, @@ -1167,6 +1172,40 @@ build_hpet(GArray *table_data, GArray *linker) (void *)hpet, HPET, sizeof(*hpet), 1); } +static void +build_tpm_tcpa(GArray *table_data, GArray *linker) +{ +Acpi20Tcpa *tcpa; +uint32_t log_area_minimum_length = TPM_LOG_AREA_MINIMUM_SIZE; +uint64_t log_area_start_address; +size_t len = log_area_minimum_length + sizeof(*tcpa); + +log_area_start_address = table_data-len + sizeof(*tcpa); + +tcpa = acpi_data_push(table_data, len); + +tcpa-platform_class = cpu_to_le16(TPM_TCPA_ACPI_CLASS_CLIENT); +tcpa-log_area_minimum_length = cpu_to_le32(log_area_minimum_length); +tcpa-log_area_start_address = cpu_to_le64(log_area_start_address); + +/* LASA address to be filled by Guest linker */ +bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE, + ACPI_BUILD_TABLE_FILE, + table_data, tcpa-log_area_start_address, + sizeof(tcpa-log_area_start_address)); +build_header(linker, table_data, + (void *)tcpa, TCPA, sizeof(*tcpa), 2); +} + +static void +build_tpm_ssdt(GArray *table_data, GArray *linker) +{ +void *tpm_ptr; + +tpm_ptr = acpi_data_push(table_data, sizeof(ssdt_tpm_aml)); +memcpy(tpm_ptr, ssdt_tpm_aml, sizeof(ssdt_tpm_aml)); +} + typedef enum { MEM_AFFINITY_NOFLAGS = 0, MEM_AFFINITY_ENABLED = (1 0), @@ -1489,6 +1528,13 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) acpi_add_table(table_offsets, tables-table_data); build_hpet(tables-table_data, tables-linker); } +if (misc.has_tpm) { +acpi_add_table(table_offsets, tables-table_data); +build_tpm_ssdt(tables-table_data, tables-linker); + +acpi_add_table(table_offsets, tables-table_data); +build_tpm_tcpa(tables-table_data, tables-linker); +} if (guest_info-numa_nodes) { acpi_add_table(table_offsets, tables-table_data); build_srat(tables-table_data, tables-linker, cpu, guest_info); diff --git a/hw/i386/acpi-defs.h b/hw/i386/acpi-defs.h index e93babb..1bc974a 100644 --- a/hw/i386/acpi-defs.h +++ b/hw/i386/acpi-defs.h @@ -314,4 +314,15 @@ struct AcpiTableMcfg { } QEMU_PACKED; typedef struct AcpiTableMcfg AcpiTableMcfg; +/* + * TCPA Description Table + */ +struct Acpi20Tcpa { +
[Qemu-devel] [Bug 1343827] Re: block.c: multiwrite_merge() truncates overlapping requests
Thanks for reporting this bug. I'm writing a test case and fix, will CC you on the patches. Please keep in mind that no ordering is guaranteed between requests that are in-flight at the same time. Therefore it is unusual to submit overlapping requests and could indicate a bug in the application. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1343827 Title: block.c: multiwrite_merge() truncates overlapping requests Status in QEMU: New Bug description: If the list of requests passed to multiwrite_merge() contains two requests where the first is for a range of sectors that is a strict subset of the second's, the second request is truncated to end where the first starts, so the second half of the second request is lost. This is easy to reproduce by running fio against a virtio-blk device running on qemu 2.1.0-rc1 with the below fio script. At least with fio 2.0.13, the randwrite pass will issue overlapping bios to the block driver, which the kernel is happy to pass along to qemu: [global] randrepeat=0 ioengine=libaio iodepth=64 direct=1 size=1M numjobs=1 verify_fatal=1 verify_dump=1 filename=$dev [seqwrite] blocksize_range=4k-1M rw=write verify=crc32c-intel [randwrite] stonewall blocksize_range=4k-1M rw=randwrite verify=meta Here is a naive fix for the problem that simply avoids merging problematic requests. I guess a better solution would be to redo qemu_iovec_concat() to do the right thing. diff -ur old/qemu-2.1.0-rc2/block.c qemu-2.1.0-rc2/block.c --- old/qemu-2.1.0-rc2/block.c 2014-07-15 14:49:14.0 -0700 +++ qemu-2.1.0-rc2/block.c 2014-07-17 23:03:14.224169741 -0700 @@ -4460,7 +4460,9 @@ int64_t oldreq_last = reqs[outidx].sector + reqs[outidx].nb_sectors; // Handle exactly sequential writes and overlapping writes. -if (reqs[i].sector = oldreq_last) { +// If this request ends before the previous one, don't merge. +if (reqs[i].sector = oldreq_last +reqs[i].sector + reqs[i].nb_sectors = oldreq_last) { merge = 1; } To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1343827/+subscriptions
Re: [Qemu-devel] [PATCH v4 0/5] ACPI fixes for QEMU 2.1
On 07/29/14 12:39, Paolo Bonzini wrote: Il 29/07/2014 12:31, Michael S. Tsirkin ha scritto: For patch piix: set legacy table size for 1.7: didn't Igor say something that such a migration wouldn't work anyway? I could be remembering wrong. I don't recall this, but if there are more bug we could just fix them too. You have to choose between a -M pc-i440fx-1.7 that migrates from 1.7 to 2.1, and one that migrates from 2.0 to 2.1. - to make 1.7-2.1 work: you need both Igor's patch (generate AML only...) and piix: set legacy table size for 1.7. All configurations will work, including those with PCI bridges. - to make 2.0-2.1 work (with -M pc-i440fx-1.7): you need to omit piix: set legacy table size for 1.7, and configurations with PCI bridges remain broken. Igor's patch is not needed, because it only affects configurations with PCI bridges. mst prefers the first one, and he changed my view on that too. And he noticed that with piix: set legacy table size for 1.7 more things just work, so it's better to include it. I see. Thanks. 1 acd727e acpi-dsdt: procedurally generate _PRT 2 07fb617 pc: hack for migration compatibility from QEMU 2.0 These already have my R-b. 3 3d5061f bios-tables-test: fix ASL normalization false positive Just gave my R-b in this thread. 4 82631f6 pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is disabled Acked-by: Laszlo Ersek ler...@redhat.com 5 1cffcf8 acpi-build: minor code cleanup Typos in the commit message should be fixed. With those changes, Reviewed-by: Laszlo Ersek ler...@redhat.com 6 3200ac6 pc: future-proof migration-compatibility of ACPI tables Has my R-b already. 7 004af2c acpi-build: tweak acpi migration limits Reviewed-by: Laszlo Ersek ler...@redhat.com 8 5c35a24 piix: set legacy table size for 1.7 Acked-by: Laszlo Ersek ler...@redhat.com Laszlo
Re: [Qemu-devel] [PATCH 2/4] xen:hw:pci-host:piix: create host bridge to passthrough
On Thu, Jul 24, 2014 at 07:30:27PM +0800, Tiejun Chen wrote: Implement that pci s/that/a/ host bridge to specific s/to specific/specific/ to passthrough. Actually this just inherit s/inherit/inherits/ the standard one. This is based on http://patchwork.ozlabs.org/patch/363810/. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- hw/pci-host/piix.c | 43 +++ 1 file changed, 43 insertions(+) diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index e0e0946..9feddf5 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -34,6 +34,7 @@ #include sysemu/sysemu.h #include hw/i386/ioapic.h #include qapi/visitor.h +#include hw/xen/xen_pt.h What call needs this, exactly? /* * I440FX chipset data sheet. @@ -44,6 +45,10 @@ #define I440FX_PCI_HOST_BRIDGE(obj) \ OBJECT_CHECK(I440FXState, (obj), TYPE_I440FX_PCI_HOST_BRIDGE) OK cool, but don't you want to put igd and passthrough somewhere in the name? Maybe legacy as well since future drivers will work with existing machine type, right? Applies to functions and macro names as well. +#define TYPE_I440FX_XEN_PCI_DEVICE i440FX-xen +#define I440FX_XEN_PCI_DEVICE(obj) \ +OBJECT_CHECK(PCII440FXState, (obj), TYPE_I440FX_XEN_PCI_DEVICE) + typedef struct I440FXState { PCIHostState parent_obj; PcPciInfo pci_info; @@ -305,6 +310,16 @@ static int i440fx_initfn(PCIDevice *dev) return 0; } +static int i440fx_xen_initfn(PCIDevice *dev) +{ +PCII440FXState *d = I440FX_XEN_PCI_DEVICE(dev); + +dev-config[I440FX_SMRAM] = 0x02; + +cpu_smm_register(i440fx_set_smm, d); +return 0; +} + PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix3_devfn, ISABus **isa_bus, qemu_irq *pic, @@ -704,6 +719,33 @@ static const TypeInfo i440fx_info = { .class_init= i440fx_class_init, }; +static void i440fx_xen_class_init(ObjectClass *klass, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(klass); +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); + +k-init = i440fx_xen_initfn; +k-vendor_id = PCI_VENDOR_ID_INTEL; +k-device_id = PCI_DEVICE_ID_INTEL_82441; +k-revision = 0x02; +k-class_id = PCI_CLASS_BRIDGE_ISA; +dc-desc = XEN Host bridge; +dc-vmsd = vmstate_i440fx; +/* + * PCI-facing part of the host bridge, not usable without the + * host-facing part, which can't be device_add'ed, yet. + */ +dc-cannot_instantiate_with_device_add_yet = true; +dc-hotpluggable = false; +} + +static const TypeInfo i440fx_xen_info = { +.name = TYPE_I440FX_XEN_PCI_DEVICE, +.parent= TYPE_PCI_DEVICE, +.instance_size = sizeof(PCII440FXState), +.class_init= i440fx_xen_class_init, +}; + static const char *i440fx_pcihost_root_bus_path(PCIHostState *host_bridge, PCIBus *rootbus) { @@ -745,6 +787,7 @@ static const TypeInfo i440fx_pcihost_info = { static void i440fx_register_types(void) { type_register_static(i440fx_info); +type_register_static(i440fx_xen_info); type_register_static(piix3_info); type_register_static(piix3_xen_info); type_register_static(i440fx_pcihost_info); -- 1.9.1
Re: [Qemu-devel] [PATCH 3/4] xen:hw:pci-host:piix: introduce xen_igd_i440fx_init
On Thu, Jul 24, 2014 at 07:30:28PM +0800, Tiejun Chen wrote: This is almost same as an original i440fx_init but just work with that xen igd host bridge to passthrough. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- hw/pci-host/piix.c | 79 include/hw/i386/pc.h | 10 +++ 2 files changed, 89 insertions(+) diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index 9feddf5..7ef08d7 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -407,6 +407,85 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, return b; } +PCIBus *xen_igd_i440fx_init(PCII440FXState **pi440fx_state, +int *piix3_devfn, +ISABus **isa_bus, qemu_irq *pic, +MemoryRegion *address_space_mem, +MemoryRegion *address_space_io, +ram_addr_t ram_size, +ram_addr_t below_4g_mem_size, +ram_addr_t above_4g_mem_size, +MemoryRegion *pci_address_space, +MemoryRegion *ram_memory) +{ +DeviceState *dev; +PCIBus *b; +PCIDevice *d; +PCIHostState *s; +PIIX3State *piix3; +PCII440FXState *f; +unsigned i; +I440FXState *i440fx; + +dev = qdev_create(NULL, TYPE_I440FX_PCI_HOST_BRIDGE); +s = PCI_HOST_BRIDGE(dev); +b = pci_bus_new(dev, NULL, pci_address_space, +address_space_io, 0, TYPE_PCI_BUS); +s-bus = b; +object_property_add_child(qdev_get_machine(), i440fx, OBJECT(dev), NULL); +qdev_init_nofail(dev); + +d = pci_create_simple(b, 0, TYPE_I440FX_XEN_PCI_DEVICE); +*pi440fx_state = I440FX_XEN_PCI_DEVICE(d); +f = *pi440fx_state; +f-system_memory = address_space_mem; +f-pci_address_space = pci_address_space; +f-ram_memory = ram_memory; + +i440fx = I440FX_PCI_HOST_BRIDGE(dev); +i440fx-pci_info.w32.begin = below_4g_mem_size; + +/* setup pci memory mapping */ +pc_pci_as_mapping_init(OBJECT(f), f-system_memory, + f-pci_address_space); + +memory_region_init_alias(f-smram_region, OBJECT(d), smram-region, + f-pci_address_space, 0xa, 0x2); +memory_region_add_subregion_overlap(f-system_memory, 0xa, +f-smram_region, 1); +memory_region_set_enabled(f-smram_region, false); +init_pam(dev, f-ram_memory, f-system_memory, f-pci_address_space, + f-pam_regions[0], PAM_BIOS_BASE, PAM_BIOS_SIZE); +for (i = 0; i 12; ++i) { +init_pam(dev, f-ram_memory, f-system_memory, f-pci_address_space, + f-pam_regions[i+1], PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, + PAM_EXPAN_SIZE); +} + +/* Xen supports additional interrupt routes from the PCI devices to + * the IOAPIC: the four pins of each PCI device on the bus are also + * connected to the IOAPIC directly. + * These additional routes can be discovered through ACPI. */ +piix3 = DO_UPCAST(PIIX3State, dev, +pci_create_simple_multifunction(b, -1, true, PIIX3-xen)); +pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq, +piix3, XEN_PIIX_NUM_PIRQS); +piix3-pic = pic; +*isa_bus = ISA_BUS(qdev_get_child_bus(DEVICE(piix3), isa.0)); + +*piix3_devfn = piix3-dev.devfn; + +ram_size = ram_size / 8 / 1024 / 1024; +if (ram_size 255) { +ram_size = 255; +} +d-config[0x57] = ram_size; + +i440fx_update_memory_mappings(f); + +return b; +} Too much copy-paste. Please refactor to avoid code duplication. Is the only difference here the use of TYPE_I440FX_XEN_PCI_DEVICE? Then you can pass type in as a parameter to a common static sub-function. + PCIBus *find_i440fx(void) { PCIHostState *s = OBJECT_CHECK(PCIHostState, diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 1c0c382..51656d9 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -239,6 +239,16 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix_devfn, MemoryRegion *pci_memory, MemoryRegion *ram_memory); +PCIBus *xen_igd_i440fx_init(PCII440FXState **pi440fx_state, int *piix_devfn, +ISABus **isa_bus, qemu_irq *pic, +MemoryRegion *address_space_mem, +MemoryRegion *address_space_io, +ram_addr_t ram_size, +ram_addr_t below_4g_mem_size, +ram_addr_t above_4g_mem_size, +MemoryRegion *pci_memory, +MemoryRegion *ram_memory); + PCIBus *find_i440fx(void); /* piix4.c */ extern PCIDevice *piix4_dev; -- 1.9.1
Re: [Qemu-devel] [PATCH 1/4] hw:i386:pc_piix: split pc_init1()
On Thu, Jul 24, 2014 at 07:30:26PM +0800, Tiejun Chen wrote: We'd like to split pc_init1 and then we can share something with other stuff. Signed-off-by: Tiejun Chen tiejun.c...@intel.com Did you test this patch? It does not look like it can work. --- hw/i386/pc_piix.c | 93 +-- 1 file changed, 70 insertions(+), 23 deletions(-) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 7081c08..2391fda 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -70,34 +70,21 @@ static bool smbios_legacy_mode; static bool gigabyte_align = true; static bool has_reserved_memory = true; -/* PC hardware initialisation */ -static void pc_init1(MachineState *machine, +static ram_addr_t below_4g_mem_size; +static ram_addr_t above_4g_mem_size; +static void pc_machine_base_init(MachineState *machine, int pci_enabled, - int kvmclock_enabled) + int kvmclock_enabled, + DeviceState *icc_bridge, + MemoryRegion *ram_memory, + MemoryRegion *pci_memory, + qemu_irq *gsi, + GSIState *gsi_state, + FWCfgState *fw_cfg) { PCMachineState *pc_machine = PC_MACHINE(machine); MemoryRegion *system_memory = get_system_memory(); -MemoryRegion *system_io = get_system_io(); -int i; -ram_addr_t below_4g_mem_size, above_4g_mem_size; -PCIBus *pci_bus; -ISABus *isa_bus; -PCII440FXState *i440fx_state; -int piix3_devfn = -1; -qemu_irq *cpu_irq; -qemu_irq *gsi; -qemu_irq *i8259; -qemu_irq *smi_irq; -GSIState *gsi_state; -DriveInfo *hd[MAX_IDE_BUS * MAX_IDE_DEVS]; -BusState *idebus[MAX_IDE_BUS]; -ISADevice *rtc_state; -ISADevice *floppy; -MemoryRegion *ram_memory; -MemoryRegion *pci_memory; MemoryRegion *rom_memory; -DeviceState *icc_bridge; -FWCfgState *fw_cfg = NULL; PcGuestInfo *guest_info; ram_addr_t lowmem; @@ -190,6 +177,20 @@ static void pc_init1(MachineState *machine, } else { gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS); } +} + +static void pc_machine_pci_bus_init(MachineState *machine, + int pci_enabled, + PCII440FXState *i440fx_state, + int piix3_devfn, + PCIBus *pci_bus, + ISABus *isa_bus, + qemu_irq *gsi, + MemoryRegion *pci_memory, + MemoryRegion *ram_memory) +{ +MemoryRegion *system_memory = get_system_memory(); +MemoryRegion *system_io = get_system_io(); if (pci_enabled) { pci_bus = i440fx_init(i440fx_state, piix3_devfn, isa_bus, gsi, @@ -203,6 +204,28 @@ static void pc_init1(MachineState *machine, isa_bus = isa_bus_new(NULL, system_io); no_hpet = 1; } +} + +static void pc_machine_device_init(MachineState *machine, + int pci_enabled, + GSIState *gsi_state, + DeviceState *icc_bridge, + int piix3_devfn, + FWCfgState *fw_cfg, + PCIBus *pci_bus, + ISABus *isa_bus, + qemu_irq *gsi) +{ +int i; +DriveInfo *hd[MAX_IDE_BUS * MAX_IDE_DEVS]; +BusState *idebus[MAX_IDE_BUS]; +qemu_irq *smi_irq; +PCMachineState *pc_machine = PC_MACHINE(machine); +qemu_irq *cpu_irq; +qemu_irq *i8259; +ISADevice *rtc_state; +ISADevice *floppy; + isa_bus_irqs(isa_bus, gsi); if (kvm_irqchip_in_kernel()) { @@ -290,6 +313,30 @@ static void pc_init1(MachineState *machine, } } +/* PC hardware initialisation */ +static void pc_init1(MachineState *machine, + int pci_enabled, + int kvmclock_enabled) +{ +PCIBus *pci_bus = NULL; +ISABus *isa_bus = NULL; +PCII440FXState *i440fx_state = NULL; +int piix3_devfn = -1; +qemu_irq *gsi = NULL; +GSIState *gsi_state = NULL; +MemoryRegion *ram_memory = NULL; +MemoryRegion *pci_memory = NULL; +DeviceState *icc_bridge = NULL; +FWCfgState *fw_cfg = NULL; These are set to NULL here and never modified below. Why does it make sense? + +pc_machine_base_init(machine, pci_enabled, kvmclock_enabled, icc_bridge, +ram_memory, pci_memory, gsi, gsi_state, fw_cfg); +pc_machine_pci_bus_init(machine, pci_enabled, i440fx_state, piix3_devfn, +pci_bus, isa_bus, gsi, pci_memory, ram_memory); +pc_machine_device_init(machine, pci_enabled, gsi_state, icc_bridge, +piix3_devfn, fw_cfg, pci_bus, isa_bus, gsi); +} + static void pc_init_pci(MachineState *machine) {
[Qemu-devel] [PATCH] fix vcpu long time io blocking on tap, when too many packets was delivered to the guest os via tap interface.
Signed-off-by: Wangkai wangka...@huawei.com --- net/tap.c | 14 ++ 1 files changed, 14 insertions(+), 0 deletions(-) diff --git a/net/tap.c b/net/tap.c index a40f7f0..9a59934 100644 --- a/net/tap.c +++ b/net/tap.c @@ -189,6 +189,7 @@ static void tap_send(void *opaque) { TAPState *s = opaque; int size; +int packets = 0; while (qemu_can_send_packet(s-nc)) { uint8_t *buf = s-buf; @@ -210,6 +211,19 @@ static void tap_send(void *opaque) } else if (size 0) { break; } + +/* + * When receive packets on tap, QEMU io was locked, if too many + * packets was delivered to the guest os via tap interface, + * tap_send() would keep looping, if then the VM required a io + * operation, would be blocked for a long time. + * Here we set the number to limit one tap interface receive time, + * keep io events fair and lock time little. + */ +packets++; +if (packets = 50) { +break; +} } } -- 1.7.2.5
Re: [Qemu-devel] [PATCH 4/4] xen:hw:i386:pc_piix: introduce new machine for IGD passthrough
On Thu, Jul 24, 2014 at 07:30:29PM +0800, Tiejun Chen wrote: Now we can introduce a new machine, xenigd, specific to IGD passthrough. This can avoid involving other common codes. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- hw/i386/pc_piix.c | 87 +++ 1 file changed, 87 insertions(+) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 2391fda..46e5901 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -206,6 +206,33 @@ static void pc_machine_pci_bus_init(MachineState *machine, } } +static void xen_igd_pc_machine_pci_bus_init(MachineState *machine, + int pci_enabled, + PCII440FXState *i440fx_state, + int piix3_devfn, + PCIBus *pci_bus, + ISABus *isa_bus, + qemu_irq *gsi, + MemoryRegion *pci_memory, + MemoryRegion *ram_memory) +{ +MemoryRegion *system_memory = get_system_memory(); +MemoryRegion *system_io = get_system_io(); + +if (pci_enabled) { +pci_bus = xen_igd_i440fx_init(i440fx_state, piix3_devfn, isa_bus, + gsi, system_memory, system_io, machine-ram_size, + below_4g_mem_size, + above_4g_mem_size, + pci_memory, ram_memory); +} else { +pci_bus = NULL; +i440fx_state = NULL; what does this do? +isa_bus = isa_bus_new(NULL, system_io); +no_hpet = 1; +} no_hpet is code duplicated from pc in chunk above, better to move to a common function. +} + static void pc_machine_device_init(MachineState *machine, int pci_enabled, GSIState *gsi_state, @@ -337,11 +364,39 @@ static void pc_init1(MachineState *machine, piix3_devfn, fw_cfg, pci_bus, isa_bus, gsi); } +static void xen_igd_pc_init1(MachineState *machine, + int pci_enabled, + int kvmclock_enabled) +{ +PCIBus *pci_bus = NULL; +ISABus *isa_bus = NULL; +PCII440FXState *i440fx_state = NULL; +int piix3_devfn = -1; +qemu_irq *gsi = NULL; +GSIState *gsi_state = NULL; +MemoryRegion *ram_memory = NULL; +MemoryRegion *pci_memory = NULL; +DeviceState *icc_bridge = NULL; +FWCfgState *fw_cfg = NULL; + +pc_machine_base_init(machine, pci_enabled, kvmclock_enabled, icc_bridge, +ram_memory, pci_memory, gsi, gsi_state, fw_cfg); +xen_igd_pc_machine_pci_bus_init(machine, pci_enabled, i440fx_state, +piix3_devfn, pci_bus, isa_bus, gsi, pci_memory, ram_memory); +pc_machine_device_init(machine, pci_enabled, gsi_state, icc_bridge, +piix3_devfn, fw_cfg, pci_bus, isa_bus, gsi); +} + static void pc_init_pci(MachineState *machine) { pc_init1(machine, 1, 1); } +static void xen_igd_pc_init_pci(MachineState *machine) +{ +xen_igd_pc_init1(machine, 1, 1); +} + static void pc_compat_2_0(MachineState *machine) { smbios_legacy_mode = true; @@ -470,6 +525,17 @@ static void pc_xen_hvm_init(MachineState *machine) pci_create_simple(bus, -1, xen-platform); } } +static void xen_igd_pc_hvm_init(MachineState *machine) +{ +PCIBus *bus; + +xen_igd_pc_init_pci(machine); + +bus = pci_find_primary_bus(); +if (bus != NULL) { +pci_create_simple(bus, -1, xen-platform); +} +} #endif #define PC_I440FX_MACHINE_OPTIONS \ @@ -919,6 +985,26 @@ static QEMUMachine xenfv_machine = { { /* end of list */ } }, }; +static QEMUMachine xenigd_machine = { +PC_COMMON_MACHINE_OPTIONS, +.name = xenigd, +.desc = Xen Fully-virtualized PC specific to IGD, +.init = xen_igd_pc_hvm_init, +.max_cpus = HVM_MAX_VCPUS, +.default_machine_opts = accel=xen, +.hot_add_cpu = pc_hot_add_cpu, +.compat_props = (GlobalProperty[]) { +/* xenfv has no fwcfg and so does not load acpi from QEMU. + * as such new acpi features don't work. + */ +{ +.driver = PIIX4_PM, +.property = acpi-pci-hotplug-with-bridge-support, +.value= off, +}, +{ /* end of list */ } +}, +}; #endif static void pc_machine_init(void) @@ -942,6 +1028,7 @@ static void pc_machine_init(void) qemu_register_pc_machine(isapc_machine); #ifdef CONFIG_XEN qemu_register_pc_machine(xenfv_machine); +qemu_register_pc_machine(xenigd_machine); #endif } -- 1.9.1
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work
On 28/07/14 16:22, Alexander Graf wrote: On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote: On 10.07.14 15:10, Christian Borntraeger wrote: From: David Hildenbrand d...@linux.vnet.ibm.com If a cpu is stopped, it must never be allowed to run and no interrupt may wake it up. A cpu also has to be unhalted if it is halted and has work to do - this scenario wasn't hit in kvm case yet, as only disabled wait is processed within QEMU. Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com This looks like it's something that generic infrastructure should take care of, no? How does this work for the other archs? They always get an interrupt on the transition between !has_work - has_work. Why don't we get one for s390x? Alex Well, we have the special case on s390 as a CPU that is in the STOPPED or the CHECK STOP state may never run - even if there is an interrupt. It's basically like this CPU has been switched off. Imagine that it is tried to inject an interrupt into a stopped vcpu. It will kick the stopped vcpu and thus lead to a call to kvm_arch_process_async_events(). We have to deny that this vcpu will ever run as long as it is stopped. It's like a way to suppress the interrupt for such a transition you mentioned. An interrupt kick usually just means we go back into the main loop. From there we check the interrupt bitmap which interrupt to handle. Check out the handling code here: http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580 If you just check for the stopped state in here, do_interrupt() will never get called and thus the CPU shouldn't ever get executed. Unless I'm heavily mistaken :). Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a SIGP START to that vcpu). Yes, in that case that other CPU generates a signal (a different bit in interrupt_request) and the first CPU would see that it has to wake up and wake up. I am not sure if such a mechanism/scenario is applicable to any other arch. They all seem to reset the cs-halted flag if they know they are able to run (e.g. due to an interrupt) - they have no such thing as stopped cpus, only halted/waiting cpus. There's not really much difference between the two. The only difference from a software point of view is that a stopped CPU has its external interrupt bits masked off, no? We have - wait (wait bit in PSW) - disabled wait (wait bit and interrupt fencing in PSW) - STOPPED (not related to PSW, state change usually handled via service processor or hypervisor) I think we have to differentiate between KVM/TCG. On KVM we always do in kernel halt and qemu sees a halted only for STOPPED or disabled wait. TCG has to take care of the normal wait as well. From a first glimpse, a disabled wait and STOPPED look similar, but there are (important) differences, e.g. other CPUs get a different a different result from a SIGP SENSE. This makes a big difference, e.g. for Linux guests, that send a SIGP STOP, followed by a SIGP SENSE loop until the CPU is down on hotplug (and shutdown, kexec..) So I think we agree, that handling the cpu states natively makes sense. The question is now only how to model it correctly without breaking TCG/KVM and reuse as much common code as possible. Correct? Do I understand you correctly, that your collapsing of stopped and halted is only in the qemu coding sense, IOW maybe we could just modify kvm_arch_process_async_events to consider the STOPPED state, as TCGs sigp implementation does not support SMP anyway? David would that work? Christian
[Qemu-devel] [Bug 1224444] Re: virtio-serial loses writes when used over virtio-mmio
FWIW I am able to reproduce this quite easily on aarch64 too. My test program is: https://github.com/libguestfs/libguestfs/blob/master/tests/qemu/qemu-speed-test.c and you use it like this: qemu-speed-test --virtio-serial-upload (You can also test virtio-serial downloads and a few other things, but those don't appear to deadlock) Slowing down the upload, even just by enabling debugging, is sufficient to make the problem go away most of the time. I am testing with qemu from git (f45c56e0166e86d3b309ae72f4cb8e3d0949c7ef). -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/122 Title: virtio-serial loses writes when used over virtio-mmio Status in QEMU: New Bug description: virtio-serial appears to lose writes, but only when used on top of virtio-mmio. The scenario is this: /home/rjones/d/qemu/arm-softmmu/qemu-system-arm \ -global virtio-blk-device.scsi=off \ -nodefconfig \ -nodefaults \ -nographic \ -M vexpress-a15 \ -machine accel=kvm:tcg \ -m 500 \ -no-reboot \ -kernel /home/rjones/d/libguestfs/tmp/.guestfs-1001/kernel.27944 \ -dtb /home/rjones/d/libguestfs/tmp/.guestfs-1001/dtb.27944 \ -initrd /home/rjones/d/libguestfs/tmp/.guestfs-1001/initrd.27944 \ -device virtio-scsi-device,id=scsi \ -drive file=/home/rjones/d/libguestfs/tmp/libguestfsLa9dE2/scratch.1,cache=unsafe,format=raw,id=hd0,if=none \ -device scsi-hd,drive=hd0 \ -drive file=/home/rjones/d/libguestfs/tmp/.guestfs-1001/root.27944,snapshot=on,id=appliance,cache=unsafe,if=none \ -device scsi-hd,drive=appliance \ -device virtio-serial-device \ -serial stdio \ -chardev socket,path=/home/rjones/d/libguestfs/tmp/libguestfsLa9dE2/guestfsd.sock,id=channel0 \ -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \ -append 'panic=1 mem=500M console=ttyAMA0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm-256color' After the guest starts up, a daemon writes 4 bytes to a virtio-serial socket. The host side reads these 4 bytes correctly and writes a 64 byte message. The guest never sees this message. I enabled virtio-mmio debugging, and this is what is printed (## = my comment): ## guest opens the socket: trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0' virtio_mmio: virtio_mmio_write offset 0x50 value 0x3 opened the socket, sock = 3 udevadm settle ## guest writes 4 bytes to the socket: virtio_mmio: virtio_mmio_write offset 0x50 value 0x5 virtio_mmio: virtio_mmio setting IRQ 1 virtio_mmio: virtio_mmio_read offset 0x60 virtio_mmio: virtio_mmio_write offset 0x64 value 0x1 virtio_mmio: virtio_mmio setting IRQ 0 sent magic GUESTFS_LAUNCH_FLAG ## host reads 4 bytes successfully: main_loop libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG libguestfs: [14605ms] appliance is up Guest launched OK. ## host writes 64 bytes to socket: libguestfs: writing the data to the socket (size = 64) waiting for next request libguestfs: data written OK ## hangs here forever with guest in read() call never receiving any data I am using qemu from git today (2d1fe1873a984). Some notes: - It's not 100% reproducible. Sometimes everything works fine, although it fails often (eg 2/3rds of the time). - KVM is being used. - We've long used virtio-serial on x86 and have never seen anything like this. This is what the output looks like when it doesn't fail: trying to open virtio-serial channel '/dev/virtio-ports/org.libguestfs.channel.0 ' virtio_mmio: virtio_mmio_write offset 0x50 value 0x3 opened the socket, sock = 3 udevadm settle virtio_mmio: virtio_mmio_write offset 0x50 value 0x5 virtio_mmio: virtio_mmio setting IRQ 1 virtio_mmio: virtio_mmio_read offset 0x60 virtio_mmio: virtio_mmio_write offset 0x64 value 0x1 virtio_mmio: virtio_mmio setting IRQ 0 sent magic GUESTlibguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG libguestfs: [14710ms] appliance is up Guest launched OK. libguestfs: writing the data to the socket (size = 64) FS_LAUNCH_FLAG main_loop waiting for next request libguestfs: data written OK virtio_mmio: virtio_mmio_write offset 0x50 value 0x2 virtio_mmio: virtio_mmio setting IRQ 1 virtio_mmio: virtio_mmio setting IRQ 1 virtio_mmio: virtio_mmio_write offset 0x50 value 0x2 virtio_mmio: virtio_mmio_read offset 0x60 virtio_mmio: virtio_mmio setting IRQ 1 virtio_mmio: virtio_mmio_write offset 0x64 value 0x1 virtio_mmio: virtio_mmio setting IRQ 0 virtio_mmio: virtio_mmio_read offset 0x60 virtio_mmio: virtio_mmio_write offset 0x64 value 0x0 virtio_mmio: virtio_mmio setting IRQ 0 virtio_mmio: virtio_mmio_read offset 0x60 virtio_mmio: virtio_mmio_write offset
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work
On 29.07.14 13:44, Christian Borntraeger wrote: On 28/07/14 16:22, Alexander Graf wrote: On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote: On 10.07.14 15:10, Christian Borntraeger wrote: From: David Hildenbrand d...@linux.vnet.ibm.com If a cpu is stopped, it must never be allowed to run and no interrupt may wake it up. A cpu also has to be unhalted if it is halted and has work to do - this scenario wasn't hit in kvm case yet, as only disabled wait is processed within QEMU. Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com This looks like it's something that generic infrastructure should take care of, no? How does this work for the other archs? They always get an interrupt on the transition between !has_work - has_work. Why don't we get one for s390x? Alex Well, we have the special case on s390 as a CPU that is in the STOPPED or the CHECK STOP state may never run - even if there is an interrupt. It's basically like this CPU has been switched off. Imagine that it is tried to inject an interrupt into a stopped vcpu. It will kick the stopped vcpu and thus lead to a call to kvm_arch_process_async_events(). We have to deny that this vcpu will ever run as long as it is stopped. It's like a way to suppress the interrupt for such a transition you mentioned. An interrupt kick usually just means we go back into the main loop. From there we check the interrupt bitmap which interrupt to handle. Check out the handling code here: http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580 If you just check for the stopped state in here, do_interrupt() will never get called and thus the CPU shouldn't ever get executed. Unless I'm heavily mistaken :). Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a SIGP START to that vcpu). Yes, in that case that other CPU generates a signal (a different bit in interrupt_request) and the first CPU would see that it has to wake up and wake up. I am not sure if such a mechanism/scenario is applicable to any other arch. They all seem to reset the cs-halted flag if they know they are able to run (e.g. due to an interrupt) - they have no such thing as stopped cpus, only halted/waiting cpus. There's not really much difference between the two. The only difference from a software point of view is that a stopped CPU has its external interrupt bits masked off, no? We have - wait (wait bit in PSW) - disabled wait (wait bit and interrupt fencing in PSW) - STOPPED (not related to PSW, state change usually handled via service processor or hypervisor) I think we have to differentiate between KVM/TCG. On KVM we always do in kernel halt and qemu sees a halted only for STOPPED or disabled wait. TCG has to take care of the normal wait as well. From a first glimpse, a disabled wait and STOPPED look similar, but there are (important) differences, e.g. other CPUs get a different a different result from a SIGP SENSE. This makes a big difference, e.g. for Linux guests, that send a SIGP STOP, followed by a SIGP SENSE loop until the CPU is down on hotplug (and shutdown, kexec..) So I think we agree, that handling the cpu states natively makes sense. The question is now only how to model it correctly without breaking TCG/KVM and reuse as much common code as possible. Correct? Do I understand you correctly, that your collapsing of stopped and halted is only in the qemu coding sense, IOW maybe we could just modify kvm_arch_process_async_events to consider the STOPPED state, as TCGs sigp implementation does not support SMP anyway? That works for me, yes. Alex
[Qemu-devel] Live migration debugging
Hi folks, Recently there's been several patches to fix kvmclock issues during migrations, which were subsequently reverted. I hope the observations below can be helpful in pinning down the actual issues to make live migration work again in the future. Live migration has been broken since at least release 1.4.0 (as shipped with Ubuntu 13.04), and still has the same problems in 2.1.0-rc2, but briefly worked in 2.0-git-20140609. The problem is that once the live migration is complete and the guest gets started on the destination server, it will hang for a long time, consuming 100% cpu. This can be mere seconds, but I've also observed hangs for as long as 11 minutes. And then suddenly the guest starts to respond again as if nothing happens, but its clock has not progressed at all while the machine was hanging. What I have observed is that the time spent hanging is exactly the difference between the clock rate of the host, and the 'real' (NTP) time. If you multiply the time since the previous migration with the PPM offset as determined by NTP (see /var/lib/ntp/ntp.drift), that is exactly how many seconds the guest will spend at 100% CPU before becoming responsive again. I have observed this on two different pairs of KVM servers. Each of the servers has a negative PPM value according to NTP. Example: a guest having nearly 9 days of uptime, with (according to NTP) a clock rate of -34 ppm, froze for 27 seconds when I migrated it. I have done quite a few test migrations, and this relationship holds quite precisely. As the duration of the freeze is proportional to the time since the previous migration, debugging is a bit difficult as you have to wait a while before you can demonstrate the problem. It is also probably a reason this problem is underreported, because it is not very noticeable if you do it right after starting the VM, but looks like a complete crash if you have a few months of uptime. With the 2.0 sources from 2014-06-09, the problem does *not* occur. A side-effect of that patch is that the guest clock has a lot of jitter until the first migration, but behaves normally (yet without hangs) on subsequent migrations. Is there a way that I can directly read the kvmclock from the guest or host, so we can compare them before and after migration, and see what goes wrong precisely? See also https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1297218 Regards, Paul Boven. -- Paul Boven bo...@jive.nl +31 (0)521-596547 Unix/Linux/Networking specialist Joint Institute for VLBI in Europe - www.jive.nl VLBI - It's a fringe science
Re: [Qemu-devel] [PATCH v2] Tap: fix vcpu long time io blocking on tap
-Original Message- From: Stefan Hajnoczi [mailto:stefa...@gmail.com] Sent: Monday, July 28, 2014 11:53 PM To: Wangkai (Kevin,C) Cc: qemu-devel@nongnu.org; Lee yang; aligu...@amazon.com; Stefan Hajnoczi Subject: Re: [Qemu-devel] [PATCH v2] Tap: fix vcpu long time io blocking on tap On Fri, Jul 18, 2014 at 09:33:42AM +, Wangkai (Kevin,C) wrote: fix vcpu long time io blocking on tap, when too many packets was delivered to the guest os via tap interface. -- Signed-off-by: Wangkai wangka...@huawei.com Thanks, applied to my net-next tree: https://github.com/stefanha/qemu/commits/net-next The patch did not apply cleanly so I had to do it manually: Applying: Tap: fix vcpu long time io blocking on tap fatal: corrupt patch at line 32 Repository lacks necessary blobs to fall back on 3-way merge. Cannot fall back to three-way merge. Patch failed at 0001 Tap: fix vcpu long time io blocking on tap Please use git-send-email(1). I also adjusted the commit message and doc comments to fit QEMU style and for grammar. Stefan [Wangkai (Kevin,C)] Hi Stefan, I will send a patch v3 use git send-email to you. Wangkai
Re: [Qemu-devel] [RFC PATCH v2 07/49] kvmapic: fixing loading vmstate
From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini Il 17/07/2014 13:02, Pavel Dovgalyuk ha scritto: diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c index ce3d903..9d75ee0 100644 --- a/hw/intc/apic_common.c +++ b/hw/intc/apic_common.c @@ -347,7 +347,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id) static const VMStateDescription vmstate_apic_common = { .name = apic, -.version_id = 3, +.version_id = 4, .minimum_version_id = 3, .minimum_version_id_old = 1, .load_state_old = apic_load_old, @@ -374,6 +374,9 @@ static const VMStateDescription vmstate_apic_common = { VMSTATE_INT64(next_time, APICCommonState), VMSTATE_INT64(timer_expiry, APICCommonState), /* open-coded timer state */ +VMSTATE_INT32_V(sipi_vector, APICCommonState, 4), +VMSTATE_INT32_V(wait_for_sipi, APICCommonState, 4), This could be a subsection. sipi_vector is only used (needed) if wait_for_sipi != 0. Right, sipi_vector is used when wait_for_sipi != 0. But we can set sipi_vector to non-zero, save the snapshot, and then set wait_for_sipi. If that snapshot will be loaded, sipi_vector become incorrect. Isn't this scenario possible? Pavel Dovgalyuk
Re: [Qemu-devel] [RFC PATCH v2 07/49] kvmapic: fixing loading vmstate
Il 29/07/2014 14:03, Pavel Dovgaluk ha scritto: This could be a subsection. sipi_vector is only used (needed) if wait_for_sipi != 0. Right, sipi_vector is used when wait_for_sipi != 0. But we can set sipi_vector to non-zero, save the snapshot, and then set wait_for_sipi. If that snapshot will be loaded, sipi_vector become incorrect. Isn't this scenario possible? sipi_vector will not be used until CPU_INTERRUPT_SIPI is set, and then sipi_vector will have been overwritten with a new value. The architecture guarantees that. Paolo
Re: [Qemu-devel] [PATCH v6 3/3] sclp-s390: Add memory hotplug SCLPs
On 30/06/14 16:00, Matthew Rosato wrote: Add memory information to read SCP info and add handlers for Read Storage Element Information, Attach Storage Element, Assign Storage and Unassign Signed-off-by: Matthew Rosato mjros...@linux.vnet.ibm.com --- hw/s390x/sclp.c| 259 ++-- target-s390x/cpu.h | 15 +++ target-s390x/kvm.c |5 + 3 files changed, 273 insertions(+), 6 deletions(-) diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c index 769d7c3..936b189 100644 --- a/hw/s390x/sclp.c +++ b/hw/s390x/sclp.c @@ -16,7 +16,8 @@ #include sysemu/kvm.h #include exec/memory.h #include sysemu/sysemu.h - +#include exec/address-spaces.h +#include qemu/config-file.h #include hw/s390x/sclp.h #include hw/s390x/event-facility.h @@ -33,10 +34,19 @@ static inline SCLPEventFacility *get_event_facility(void) static void read_SCP_info(SCCB *sccb) { ReadInfo *read_info = (ReadInfo *) sccb; +sclpMemoryHotplugDev *mhd = get_sclp_memory_hotplug_dev(); CPUState *cpu; -int shift = 0; int cpu_count = 0; int i = 0; +int increment_size = 20; +int rnsize, rnmax; +QemuOpts *opts = qemu_opts_find(qemu_find_opts(memory), NULL); +int slots = qemu_opt_get_number(opts, slots, 0); +int max_avail_slots = s390_get_memslot_count(kvm_state); + +if (slots max_avail_slots) { +slots = max_avail_slots; +} CPU_FOREACH(cpu) { cpu_count++; @@ -54,14 +64,235 @@ static void read_SCP_info(SCCB *sccb) read_info-facilities = cpu_to_be64(SCLP_HAS_CPU_INFO); -while ((ram_size (20 + shift)) 65535) { -shift++; +/* I was about to apply this series, but I think it breaks the non-ccw machine for strange memory sizes (e.g. 40001MB). The s390-virtio.c file still does a shifting via 65535. Maybe just do a change over there as well? You should also start making up your mind about migration and prepare patches. I will send cpu migration patches soon. Christian + * The storage increment size is a multiple of 1M and is a power of 2. + * The number of storage increments must be MAX_STORAGE_INCREMENTS or fewer. + */ +while ((ram_size increment_size) MAX_STORAGE_INCREMENTS) { +increment_size++; +} +rnmax = ram_size increment_size; + +/* Memory Hotplug is only supported for the ccw machine type */ +if (mhd) { +while ((mhd-standby_mem_size increment_size) + MAX_STORAGE_INCREMENTS) { +increment_size++; +} +assert(increment_size == mhd-increment_size); + +mhd-standby_subregion_size = MEM_SECTION_SIZE; +/* Deduct the memory slot already used for core */ +if (slots 0) { +while ((mhd-standby_subregion_size * (slots - 1) + mhd-standby_mem_size)) { +mhd-standby_subregion_size = mhd-standby_subregion_size 1; +} +} +/* + * Initialize mapping of guest standby memory sections indicating which + * are and are not online. Assume all standby memory begins offline. + */ +if (mhd-standby_state_map == 0) { +if (mhd-standby_mem_size % mhd-standby_subregion_size) { +mhd-standby_state_map = g_malloc0((mhd-standby_mem_size / + mhd-standby_subregion_size + 1) * + (mhd-standby_subregion_size / + MEM_SECTION_SIZE)); +} else { +mhd-standby_state_map = g_malloc0(mhd-standby_mem_size / + MEM_SECTION_SIZE); +} +} +mhd-padded_ram_size = ram_size + mhd-pad_size; +mhd-rzm = 1 mhd-increment_size; +rnmax = ((ram_size + mhd-standby_mem_size + mhd-pad_size) + mhd-increment_size); + +read_info-facilities |= cpu_to_be64(SCLP_FC_ASSIGN_ATTACH_READ_STOR); +} + +rnsize = 1 (increment_size - 20); +if (rnsize = 128) { +read_info-rnsize = rnsize; +} else { +read_info-rnsize = 0; +read_info-rnsize2 = cpu_to_be32(rnsize); } -read_info-rnmax = cpu_to_be16(ram_size (20 + shift)); -read_info-rnsize = 1 shift; + +if (rnmax 0x1) { +read_info-rnmax = cpu_to_be16(rnmax); +} else { +read_info-rnmax = cpu_to_be16(0); +read_info-rnmax2 = cpu_to_be64(rnmax); +} + sccb-h.response_code = cpu_to_be16(SCLP_RC_NORMAL_READ_COMPLETION); } +static void read_storage_element0_info(SCCB *sccb) +{ +int i, assigned; +int subincrement_id = SCLP_STARTING_SUBINCREMENT_ID; +ReadStorageElementInfo *storage_info = (ReadStorageElementInfo *) sccb; +sclpMemoryHotplugDev *mhd =
Re: [Qemu-devel] [PULL 0/8] pc migration fixes
On 29 July 2014 11:48, Michael S. Tsirkin m...@redhat.com wrote: The following changes since commit c60a57ff497667780132a3fcdc1500c83af5d5c0: Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging (2014-07-25 16:58:41 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream for you to fetch changes up to f47337cb9181a1f9339b9b0516b92fe77a22f0f3: piix: set legacy table size for 1.7 (2014-07-29 12:26:12 +0200) pc migration fixes Last minute fixes for migration. It seems that if we don't fix it now, fixing it in the next version will be even more painful ... Signed-off-by: Michael S. Tsirkin m...@redhat.com Igor Mammedov (1): pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is disabled Michael S. Tsirkin (3): acpi-build: minor code cleanup acpi-build: tweak acpi migration limits piix: set legacy table size for 1.7 Paolo Bonzini (4): acpi-dsdt: procedurally generate _PRT pc: hack for migration compatibility from QEMU 2.0 bios-tables-test: fix ASL normalization false positive pc: future-proof migration-compatibility of ACPI tables Applied, thanks. -- PMM
[Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices
This patch set is based on a patch suggested by Einar Lueck on Feb 08, 2013. This patch set introduces: 1. s390x specific geometry detection: Add s390 specific version of hd_geometry_guess function, which uses HDIO_GETGEO ioctl. 2. A set of blocksize patches for autodetection of logical and physical blocksizes. Change history: 2.1 Original blocksize patch only configured autolookup for virtio-blk devices. There was a request from Stefan Hajnoczi to make this architecture-independent. Now autolookup is configured by default for all block devices. 2.2 Add driver method to probe blocksizes for raw and host_device drivers. (also requested by the reviewers of the original patch) Ekaterina Tumanova (4): hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x blocksize: support auto-sensing of blocksizes blocksize: Add driver method to get the blocksizes blocksize: add blkconf_blocksize call to all block devices block.c | 12 + block/raw-posix.c | 69 ++- block/raw_bsd.c | 14 ++ hw/block/Makefile.objs| 6 - hw/block/block.c | 25 + hw/block/hd-geometry.c| 56 ++ hw/block/nvme.c | 1 + hw/block/virtio-blk.c | 1 + hw/core/qdev-properties.c | 4 ++- hw/ide/qdev.c | 1 + hw/scsi/scsi-disk.c | 1 + hw/usb/dev-storage.c | 1 + include/block/block.h | 1 + include/block/block_int.h | 5 include/hw/block/block.h | 6 +++-- 15 files changed, 180 insertions(+), 23 deletions(-) -- 1.8.5.5
[Qemu-devel] [PATCH 3/4] blocksize: Add driver method to get the blocksizes
This patch introduces a new method of defining the physical and logical blocksizes for raw and host_device drivers. It uses various ioctls to determine the logical and physical blocksizes. For detecting the logical size it uses ioctl calls, that were previously coded in raw_probe_alignment (now moved into separate function probe_logical_blocksize). For detecting the physical blocksize it uses BLKPBSZGET ioctl. For raw devices driver calls the child's method. Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com Acked-by: Cornelia Huck cornelia.h...@de.ibm.com --- block/raw-posix.c | 69 --- block/raw_bsd.c | 14 +++ 2 files changed, 64 insertions(+), 19 deletions(-) diff --git a/block/raw-posix.c b/block/raw-posix.c index 8e9758e..6e0d80c 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -221,50 +221,70 @@ static int raw_normalize_devicepath(const char **filename) } #endif -static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp) +static unsigned int probe_logical_blocksize(BlockDriverState *bs, int fd) { -BDRVRawState *s = bs-opaque; -char *buf; -unsigned int sector_size; - -/* For /dev/sg devices the alignment is not really used. - With buffered I/O, we don't have any restrictions. */ -if (bs-sg || !(s-open_flags O_DIRECT)) { -bs-request_alignment = 1; -s-buf_align = 1; -return; -} +unsigned int sector_size = 0; /* Try a few ioctls to get the right size */ -bs-request_alignment = 0; -s-buf_align = 0; - #ifdef BLKSSZGET if (ioctl(fd, BLKSSZGET, sector_size) = 0) { -bs-request_alignment = sector_size; +return sector_size; } #endif #ifdef DKIOCGETBLOCKSIZE if (ioctl(fd, DKIOCGETBLOCKSIZE, sector_size) = 0) { -bs-request_alignment = sector_size; +return sector_size; } #endif #ifdef DIOCGSECTORSIZE if (ioctl(fd, DIOCGSECTORSIZE, sector_size) = 0) { -bs-request_alignment = sector_size; +return sector_size; } #endif #ifdef CONFIG_XFS if (s-is_xfs) { struct dioattr da; if (xfsctl(NULL, fd, XFS_IOC_DIOINFO, da) = 0) { -bs-request_alignment = da.d_miniosz; +sector_size = da.d_miniosz; /* The kernel returns wrong information for d_mem */ /* s-buf_align = da.d_mem; */ +return sector_size; } } #endif +return 0; +} + +static unsigned int probe_physical_blocksize(BlockDriverState *bs, int fd) +{ +unsigned int blk_size = 0; +#ifdef BLKPBSZGET +if (ioctl(fd, BLKPBSZGET, blk_size) = 0) { +return blk_size; +} +#endif + +return 0; +} + +static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp) +{ +BDRVRawState *s = bs-opaque; +char *buf; + +/* For /dev/sg devices the alignment is not really used. + With buffered I/O, we don't have any restrictions. */ +if (bs-sg || !(s-open_flags O_DIRECT)) { +bs-request_alignment = 1; +s-buf_align = 1; +return; +} + +s-buf_align = 0; +/* Let's try to use the logical blocksize for the alignment. */ +bs-request_alignment = probe_logical_blocksize(bs, fd); + /* If we could not get the sizes so far, we can only guess them */ if (!s-buf_align) { size_t align; @@ -642,6 +662,16 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp) bs-bl.opt_mem_alignment = s-buf_align; } +static int hdev_probe_blocksizes(BlockDriverState *bs) +{ +BDRVRawState *s = bs-opaque; + +bs-logical_block_size = probe_logical_blocksize(bs, s-fd); +bs-physical_block_size = probe_physical_blocksize(bs, s-fd); + +return 0; +} + static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb) { int ret; @@ -2009,6 +2039,7 @@ static BlockDriver bdrv_host_device = { .bdrv_get_info = raw_get_info, .bdrv_get_allocated_file_size = raw_get_allocated_file_size, +.bdrv_probe_blocksizes = hdev_probe_blocksizes, .bdrv_detach_aio_context = raw_detach_aio_context, .bdrv_attach_aio_context = raw_attach_aio_context, diff --git a/block/raw_bsd.c b/block/raw_bsd.c index f82f4c2..ae400a6 100644 --- a/block/raw_bsd.c +++ b/block/raw_bsd.c @@ -173,6 +173,19 @@ static int raw_probe(const uint8_t *buf, int buf_size, const char *filename) return 1; } +static int raw_probe_blocksizes(BlockDriverState *bs) +{ +int ret; + +ret = bdrv_probe_blocksizes(bs-file); +if (ret == 0) { +bs-logical_block_size = bs-file-logical_block_size; +bs-physical_block_size = bs-file-physical_block_size; +} + +return ret; +} + static BlockDriver bdrv_raw = { .format_name = raw, .bdrv_probe = raw_probe, @@ -190,6 +203,7 @@ static BlockDriver bdrv_raw = {
[Qemu-devel] [PATCH 2/4] blocksize: support auto-sensing of blocksizes
The block device model does not impose fixed block sizes for access to backing devices. This patch introduces support for auto lookup of the block sizes of the backing block device. To achieve this, a new function blkconf_blocksizes is implemented. This function tries to get values from the block driver. If this does not work 512 is used, so the default excecution logic is not changed. Based on 2013 patch from Einar Lueck elelu...@de.ibm.com Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com Acked-by: Cornelia Huck cornelia.h...@de.ibm.com --- block.c | 12 hw/block/block.c | 25 + hw/core/qdev-properties.c | 4 +++- include/block/block.h | 1 + include/block/block_int.h | 5 + include/hw/block/block.h | 2 ++ 6 files changed, 48 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index 8cf519b..67de6e9 100644 --- a/block.c +++ b/block.c @@ -552,6 +552,18 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp) } } +int bdrv_probe_blocksizes(BlockDriverState *bs) +{ +BlockDriver *drv = bs-drv; + +assert(drv != NULL); +if (drv-bdrv_probe_blocksizes) { +return drv-bdrv_probe_blocksizes(bs); +} + +return -1; +} + /* * Create a uniquely-named empty temporary file. * Return 0 upon success, otherwise a negative errno value. diff --git a/hw/block/block.c b/hw/block/block.c index 33dd3f3..29a0227 100644 --- a/hw/block/block.c +++ b/hw/block/block.c @@ -10,6 +10,10 @@ #include sysemu/blockdev.h #include hw/block/block.h #include qemu/error-report.h +#include block/block_int.h +#ifdef __linux__ +#include linux/fs.h +#endif void blkconf_serial(BlockConf *conf, char **serial) { @@ -22,6 +26,27 @@ void blkconf_serial(BlockConf *conf, char **serial) } } +void blkconf_blocksizes(BlockConf *conf) +{ +BlockDriverState *bs = conf-bs; + +/* default values as a basis - if probing fails */ +conf-physical_block_size = BLOCK_PROPERTY_STD_BLKSIZE; +conf-logical_block_size = BLOCK_PROPERTY_STD_BLKSIZE; +if (bdrv_probe_blocksizes(conf-bs) 0) { +return; +} +if (bs-logical_block_size) { +conf-logical_block_size = (uint16_t)(bs-logical_block_size); +} +if (bs-physical_block_size) { +conf-physical_block_size = (uint16_t)(bs-physical_block_size); +} else if (conf-logical_block_size) { +/* if driver sets no physical size, try to use logical size */ +conf-physical_block_size = conf-logical_block_size; +} +} + int blkconf_geometry(BlockConf *conf, int *ptrans, unsigned cyls_max, unsigned heads_max, unsigned secs_max) { diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c index 3d12560..49fb1e3 100644 --- a/hw/core/qdev-properties.c +++ b/hw/core/qdev-properties.c @@ -579,7 +579,9 @@ static void set_blocksize(Object *obj, Visitor *v, void *opaque, error_propagate(errp, local_err); return; } -if (value min || value max) { + +/* value == 0 indicates that block size should be sensed later on */ +if ((value min || value max) value 0) { error_set(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE, dev-id?:, name, (int64_t)value, min, max); return; diff --git a/include/block/block.h b/include/block/block.h index f08471d..43af424 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -583,6 +583,7 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs); * the old #AioContext is not executing. */ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context); +int bdrv_probe_blocksizes(BlockDriverState *bs); void bdrv_io_plug(BlockDriverState *bs); void bdrv_io_unplug(BlockDriverState *bs); diff --git a/include/block/block_int.h b/include/block/block_int.h index 7b541a0..572e954 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -266,6 +266,8 @@ struct BlockDriver { void (*bdrv_io_unplug)(BlockDriverState *bs); void (*bdrv_flush_io_queue)(BlockDriverState *bs); +int (*bdrv_probe_blocksizes)(BlockDriverState *bs); + QLIST_ENTRY(BlockDriver) list; }; @@ -360,6 +362,9 @@ struct BlockDriverState { /* the block size for which the guest device expects atomicity */ int guest_block_size; +unsigned int physical_block_size; +unsigned int logical_block_size; + /* do we need to tell the quest if we have a volatile write cache? */ int enable_write_cache; diff --git a/include/hw/block/block.h b/include/hw/block/block.h index 7c3d6c8..7a0092e 100644 --- a/include/hw/block/block.h +++ b/include/hw/block/block.h @@ -40,6 +40,7 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf) return exp; } +#define BLOCK_PROPERTY_STD_BLKSIZE 512 #define DEFINE_BLOCK_PROPERTIES(_state, _conf) \
[Qemu-devel] [PATCH 1/4] hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x
This patch extends the function hd_geometry_guess. It introduces a target specific hook. The default implementation for this target specific hook is empty, has therefore no effect and the existing logic works as before. For target-s390x, the behaviour is chosen as follows: If no geo could be guessed via guess_disk_lchs, a new function called guess_disk_pchs is called. The latter utilizes HDIO_GET_GEO ioctl to ask the underlying disk for geometry. If this is not successful (e.g. image files) geometry is derived from the size of the disk (as before). The new HDIO_GETGEO logic is required for two use cases: a) Support for geometries of Direct Attached Storage Disks (DASD) on s390x configured as backing of virtio block devices. b) Support for FCP attached SCSI disks that do not yet have a partition table. Without this patch, fdisk -l on the host would return different results then fdisk -l in the guest. Based on 2013 patch from Einar Lueck elelu...@de.ibm.com Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com Acked-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/block/Makefile.objs | 6 +- hw/block/hd-geometry.c | 56 ++ 2 files changed, 61 insertions(+), 1 deletion(-) diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs index bf46f03..e4f6205 100644 --- a/hw/block/Makefile.objs +++ b/hw/block/Makefile.objs @@ -1,4 +1,4 @@ -common-obj-y += block.o cdrom.o hd-geometry.o +common-obj-y += block.o cdrom.o common-obj-$(CONFIG_FDC) += fdc.o common-obj-$(CONFIG_SSI_M25P80) += m25p80.o common-obj-$(CONFIG_NAND) += nand.o @@ -13,3 +13,7 @@ obj-$(CONFIG_SH4) += tc58128.o obj-$(CONFIG_VIRTIO) += virtio-blk.o obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += dataplane/ + +# geometry is target/architecture dependent and therefore needs to be built +# for every target platform +obj-y += hd-geometry.o diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c index 6feb4f8..7988264 100644 --- a/hw/block/hd-geometry.c +++ b/hw/block/hd-geometry.c @@ -33,6 +33,10 @@ #include block/block.h #include hw/block/block.h #include trace.h +#ifdef __linux__ +#include linux/fs.h +#include linux/hdreg.h +#endif struct partition { uint8_t boot_ind; /* 0x80 - active */ @@ -47,6 +51,37 @@ struct partition { uint32_t nr_sects; /* nr of sectors in partition */ } QEMU_PACKED; +static void guess_chs_for_size(BlockDriverState *bs, uint32_t *pcyls, + uint32_t *pheads, uint32_t *psecs); + +/* try to get geometry from disk via HDIO_GETGEO ioctl + Return 0 if OK, -1 if ioctl does not work (e.g. image file) */ +static inline int guess_disk_pchs(BlockDriverState *bs, uint32_t *pcylinders, + uint32_t *pheads, uint32_t *psectors) +{ +#ifdef __linux__ +struct hd_geometry geo; + +if (bdrv_ioctl(bs, HDIO_GETGEO, geo)) { +return -1; +} + +/* HDIO_GETGEO may return success even though geo contains zeros + (e.g. certain multipath setups) */ +if (!geo.heads || !geo.sectors || !geo.cylinders) { +return -1; +} + +*pheads = geo.heads; +*psectors = geo.sectors; +*pcylinders = geo.cylinders; +return 0; +#else +return -1; +#endif +} + + /* try to guess the disk logical geometry from the MSDOS partition table. Return 0 if OK, -1 if could not guess */ static int guess_disk_lchs(BlockDriverState *bs, @@ -116,6 +151,26 @@ static void guess_chs_for_size(BlockDriverState *bs, *psecs = 63; } +#ifdef TARGET_S390X +void hd_geometry_guess(BlockDriverState *bs, uint32_t *pcyls, uint32_t *pheads, + uint32_t *psecs, int *ptrans) +{ +if (guess_disk_lchs(bs, (int *)pcyls, (int *)pheads, (int *)psecs) == 0) { +goto done; +} +if (guess_disk_pchs(bs, pcyls, pheads, psecs) == 0) { +goto done; +} +/* still no geometry - try to guess from disk size */ +guess_chs_for_size(bs, pcyls, pheads, psecs); +done: +if (ptrans) { +*ptrans = BIOS_ATA_TRANSLATION_NONE; +} +trace_hd_geometry_guess(bs, *pcyls, *pheads, *psecs, +BIOS_ATA_TRANSLATION_NONE); +} +#else void hd_geometry_guess(BlockDriverState *bs, uint32_t *pcyls, uint32_t *pheads, uint32_t *psecs, int *ptrans) @@ -148,6 +203,7 @@ void hd_geometry_guess(BlockDriverState *bs, } trace_hd_geometry_guess(bs, *pcyls, *pheads, *psecs, translation); } +#endif int hd_bios_chs_auto_trans(uint32_t cyls, uint32_t heads, uint32_t secs) { -- 1.8.5.5
[Qemu-devel] [PATCH 4/4] blocksize: add blkconf_blocksize call to all block devices
This patch add the blkconf_blocksize call to all devices, which use DEFINE_BLOCK_PROPERTIES. If the underlying driver function fails, blkconf_blocksizes will set blocksizes to default (512) value. Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com Acked-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/block/nvme.c | 1 + hw/block/virtio-blk.c| 1 + hw/ide/qdev.c| 1 + hw/scsi/scsi-disk.c | 1 + hw/usb/dev-storage.c | 1 + include/hw/block/block.h | 4 ++-- 6 files changed, 7 insertions(+), 2 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 5fd8f89..50fe769 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -761,6 +761,7 @@ static int nvme_init(PCIDevice *pci_dev) if (!n-serial) { return -1; } +blkconf_blocksizes(n-conf); pci_conf = pci_dev-config; pci_conf[PCI_INTERRUPT_PIN] = 1; diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index c241c50..b5027b1 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -743,6 +743,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) } blkconf_serial(blk-conf, blk-serial); +blkconf_blocksizes(blk-conf); s-original_wce = bdrv_enable_write_cache(blk-conf.bs); if (blkconf_geometry(blk-conf, NULL, 65535, 255, 255) 0) { error_setg(errp, Error setting geometry); diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c index 6e475e6..6d94d8f 100644 --- a/hw/ide/qdev.c +++ b/hw/ide/qdev.c @@ -161,6 +161,7 @@ static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind) } blkconf_serial(dev-conf, dev-serial); +blkconf_blocksizes(dev-conf); if (kind != IDE_CD blkconf_geometry(dev-conf, dev-chs_trans, 65536, 16, 255) 0) { return -1; diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index d47ecd6..bfae48b 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -2250,6 +2250,7 @@ static int scsi_initfn(SCSIDevice *dev) } blkconf_serial(s-qdev.conf, s-serial); +blkconf_blocksizes(s-qdev.conf); if (dev-type == TYPE_DISK blkconf_geometry(dev-conf, NULL, 65535, 255, 255) 0) { return -1; diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c index ae4efcb..cf50ac1 100644 --- a/hw/usb/dev-storage.c +++ b/hw/usb/dev-storage.c @@ -603,6 +603,7 @@ static int usb_msd_initfn_storage(USBDevice *dev) } blkconf_serial(s-conf, dev-serial); +blkconf_blocksizes(s-conf); /* * Hack alert: this pretends to be a block device, but it's really diff --git a/include/hw/block/block.h b/include/hw/block/block.h index 7a0092e..8560bb2 100644 --- a/include/hw/block/block.h +++ b/include/hw/block/block.h @@ -44,9 +44,9 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf) #define DEFINE_BLOCK_PROPERTIES(_state, _conf) \ DEFINE_PROP_DRIVE(drive, _state, _conf.bs), \ DEFINE_PROP_BLOCKSIZE(logical_block_size, _state, \ - _conf.logical_block_size, 512), \ + _conf.logical_block_size, 0), \ DEFINE_PROP_BLOCKSIZE(physical_block_size, _state,\ - _conf.physical_block_size, 512), \ + _conf.physical_block_size, 0),\ DEFINE_PROP_UINT16(min_io_size, _state, _conf.min_io_size, 0), \ DEFINE_PROP_UINT32(opt_io_size, _state, _conf.opt_io_size, 0),\ DEFINE_PROP_INT32(bootindex, _state, _conf.bootindex, -1),\ -- 1.8.5.5
Re: [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices
On 29/07/14 14:27, Ekaterina Tumanova wrote: This patch set is based on a patch suggested by Einar Lueck on Feb 08, 2013. This patch set introduces: 1. s390x specific geometry detection: Add s390 specific version of hd_geometry_guess function, which uses HDIO_GETGEO ioctl. 2. A set of blocksize patches for autodetection of logical and physical blocksizes. Change history: 2.1 Original blocksize patch only configured autolookup for virtio-blk devices. There was a request from Stefan Hajnoczi to make this architecture-independent. Now autolookup is configured by default for all block devices. 2.2 Add driver method to probe blocksizes for raw and host_device drivers. (also requested by the reviewers of the original patch) Ekaterina Tumanova (4): hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x blocksize: support auto-sensing of blocksizes blocksize: Add driver method to get the blocksizes blocksize: add blkconf_blocksize call to all block devices block.c | 12 + block/raw-posix.c | 69 ++- block/raw_bsd.c | 14 ++ hw/block/Makefile.objs| 6 - hw/block/block.c | 25 + hw/block/hd-geometry.c| 56 ++ hw/block/nvme.c | 1 + hw/block/virtio-blk.c | 1 + hw/core/qdev-properties.c | 4 ++- hw/ide/qdev.c | 1 + hw/scsi/scsi-disk.c | 1 + hw/usb/dev-storage.c | 1 + include/block/block.h | 1 + include/block/block_int.h | 5 include/hw/block/block.h | 6 +++-- 15 files changed, 180 insertions(+), 23 deletions(-) CCing Kevin and Stefan.
Re: [Qemu-devel] [PATCH for-2.1] po: Update French translation
On 28 July 2014 22:44, Aurelien Jarno aurel...@aurel32.net wrote: Add new translations for recently added messages. Signed-off-by: Aurelien Jarno aurel...@aurel32.net --- po/fr_FR.po | 54 +++--- Applied, thanks. -- PMM
Re: [Qemu-devel] [PATCH for-2.1] po: update Italian translation
On 29 July 2014 07:15, Paolo Bonzini pbonz...@redhat.com wrote: Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- po/it.po | 63 --- 1 file changed, 40 insertions(+), 23 deletions(-) Applied, thanks. -- PMM
Re: [Qemu-devel] [Bug 1343827] [NEW] block.c: multiwrite_merge() truncates overlapping requests
On Mon, Jul 28, 2014 at 2:12 PM, Andrey Korolyov and...@xdel.ru wrote: bug is still here in the master and can be easily reproduced (and, of course, looks like blocker since data corruption takes place). Does anyone have an idea on when the fix (at least suggested one) will be merged? I just sent the patches and CCed you. The fix will not be included last-minute in QEMU 2.1 unless you have a real-world application that depends on this behavior. It will be included in QEMU 2.2. If this is critical for you, please reply with details and we might be able to still get it into QEMU 2.1. Stefan
Re: [Qemu-devel] [PATCH 2/2] target-mips/translate.c: Add judgement for msb and lsb
On 07/29/2014 06:52 AM, Aurelien Jarno wrote: On Mon, Jul 28, 2014 at 11:34:30PM +0100, Peter Maydell wrote: On 28 July 2014 23:32, Aurelien Jarno aurel...@aurel32.net wrote: On Mon, Jul 28, 2014 at 11:01:02PM +0100, Peter Maydell wrote: This may be true, but the TCG README doesn't define negative lengths as being unspecified behaviour (ie guaranteed to at least not crash even if the result isn't specified), and in fact the implementation of tcg_gen_deposit will assert on negative lengths. We shouldn't implement guest unpredictable cases as crash QEMU. Well I tried this code under QEMU, and it clearly doesn't crash. It seems the assert are not enabled with the default configuration options. Try --enable-debug... That's my point, it's only in debug mode, not in the default configuration. Maybe remove the tcg_debug_assert in tcg_gen_deposit_i64 and tcg_gen_deposit_i64 is a better way. But it may cause other mistake in other architecture, i'm not sure. That said I agree it's something to avoid, but I don't think triggering a RI exception is the thing to do (even if it is correct according the MIPS ISA manual) when real silicon output a random result instead. Yes, you could emit code to do that instead if you like. When I said random, it didn't say in the sense of random generator, but in the sense a result that might depend on the input value and the silicon implementation. It would be silly to emit code just for that, but it would be smart for example to skip the deposit op in that case instead of triggering an exception. I think, debug mode shouldn't crash the qemu with an unpredictable operation, so i want to fix it. And you say there shouldn't raise RI, i agree with you. Or when lsb msb, just leave the code and do nothing. What do you think about this way?
[Qemu-devel] [PATCH 0/2] block: fix multiwrite_merge() overlapping requests
This is a fix for https://bugs.launchpad.net/qemu/+bug/1343827. Patch 1 fixes the bug. Patch 2 adds a qemu-iotests test case to prevent regressions. Stefan Hajnoczi (2): block: fix overlapping multiwrite requests qemu-iotests: add multiwrite test cases block.c| 6 +++ tests/qemu-iotests/100 | 97 ++ tests/qemu-iotests/100.out | 54 ++ tests/qemu-iotests/group | 1 + 4 files changed, 158 insertions(+) create mode 100755 tests/qemu-iotests/100 create mode 100644 tests/qemu-iotests/100.out -- 1.9.3
[Qemu-devel] [PATCH 1/2] block: fix overlapping multiwrite requests
When request A is a strict subset of request B: multiwrite_merge() merges them as follows: AA The tail of request A should have been included: AAAA This patch fixes data loss but this code path is probably rare. Since guests cannot assume ordering between in-flight requests, few applications submit overlapping write requests. Reported-by: Slava Pestov sviatoslav.pes...@gmail.com Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/block.c b/block.c index 8cf519b..0a3ac43 100644 --- a/block.c +++ b/block.c @@ -4498,6 +4498,12 @@ static int multiwrite_merge(BlockDriverState *bs, BlockRequest *reqs, // Add the second request qemu_iovec_concat(qiov, reqs[i].qiov, 0, reqs[i].qiov-size); +// Add tail of first request, if necessary +if (qiov-size reqs[outidx].qiov-size) { +qemu_iovec_concat(qiov, reqs[outidx].qiov, qiov-size, + reqs[outidx].qiov-size - qiov-size); +} + reqs[outidx].nb_sectors = qiov-size 9; reqs[outidx].qiov = qiov; -- 1.9.3
[Qemu-devel] [PATCH 2/2] qemu-iotests: add multiwrite test cases
This test case covers the basic bdrv_aio_multiwrite() scenarios: 1. Single request 2. Sequential requests 3. Overlapping requests 4. Disjoint requests Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- tests/qemu-iotests/100 | 97 ++ tests/qemu-iotests/100.out | 54 ++ tests/qemu-iotests/group | 1 + 3 files changed, 152 insertions(+) create mode 100755 tests/qemu-iotests/100 create mode 100644 tests/qemu-iotests/100.out diff --git a/tests/qemu-iotests/100 b/tests/qemu-iotests/100 new file mode 100755 index 000..636e0e0 --- /dev/null +++ b/tests/qemu-iotests/100 @@ -0,0 +1,97 @@ +#!/bin/bash +# +# Test simple read/write using plain bdrv_read/bdrv_write +# +# Copyright (C) 2014 Red Hat, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +# creator +owner=stefa...@redhat.com + +seq=`basename $0` +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! + +_cleanup() +{ + _cleanup_test_img +} +trap _cleanup; exit \$status 0 1 2 3 15 + +# get standard environment, filters and checks +. ./common.rc +. ./common.filter + +_supported_fmt generic +_supported_proto generic +_supported_os Linux + + +size=128M + +echo +echo == Single request == +_make_test_img $size +$QEMU_IO -c multiwrite 0 4k $TEST_IMG | _filter_qemu_io + +echo +echo == verify pattern == +$QEMU_IO -c read -P 0xcd 0 4k $TEST_IMG | _filter_qemu_io +$QEMU_IO -c read -P 0 4k 4k $TEST_IMG | _filter_qemu_io + +echo +echo == Sequential requests == +_make_test_img $size +$QEMU_IO -c multiwrite 0 4k ; 4k 4k $TEST_IMG | _filter_qemu_io + +echo +echo == verify pattern == +$QEMU_IO -c read -P 0xcd 0 4k $TEST_IMG | _filter_qemu_io +$QEMU_IO -c read -P 0xce 4k 4k $TEST_IMG | _filter_qemu_io +$QEMU_IO -c read -P 0 8k 4k $TEST_IMG | _filter_qemu_io + +echo +echo == Overlapping requests == +_make_test_img $size +$QEMU_IO -c multiwrite 0 4k ; 1k 2k $TEST_IMG | _filter_qemu_io + +echo +echo == verify pattern == +# Order of overlapping in-flight requests is not guaranteed so we cannot verify +# [1k, 3k) since it could have either pattern 0xcd or 0xce. +$QEMU_IO -c read -P 0xcd 0 1k $TEST_IMG | _filter_qemu_io +$QEMU_IO -c read -P 0xcd 3k 1k $TEST_IMG | _filter_qemu_io +$QEMU_IO -c read -P 0 4k 4k $TEST_IMG | _filter_qemu_io + +echo +echo == Disjoint requests == +_make_test_img $size +$QEMU_IO -c multiwrite 0 4k ; 64k 4k $TEST_IMG | _filter_qemu_io + +echo +echo == verify pattern == +$QEMU_IO -c read -P 0xcd 0 4k $TEST_IMG | _filter_qemu_io +$QEMU_IO -c read -P 0 4k 60k $TEST_IMG | _filter_qemu_io +$QEMU_IO -c read -P 0xce 64k 4k $TEST_IMG | _filter_qemu_io +$QEMU_IO -c read -P 0 68k 4k $TEST_IMG | _filter_qemu_io + +# success, all done +echo *** done +rm -f $seq.full +status=0 diff --git a/tests/qemu-iotests/100.out b/tests/qemu-iotests/100.out new file mode 100644 index 000..2b9f741 --- /dev/null +++ b/tests/qemu-iotests/100.out @@ -0,0 +1,54 @@ +QA output created by 100 + +== Single request == +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 +wrote 4096/4096 bytes at offset 0 +4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) + +== verify pattern == +read 4096/4096 bytes at offset 0 +4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 4096/4096 bytes at offset 4096 +4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) + +== Sequential requests == +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 +wrote 8192/8192 bytes at offset 0 +8 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) + +== verify pattern == +read 4096/4096 bytes at offset 0 +4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 4096/4096 bytes at offset 4096 +4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 4096/4096 bytes at offset 8192 +4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) + +== Overlapping requests == +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 +wrote 6144/6144 bytes at offset 0 +6 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) + +== verify pattern == +read 1024/1024 bytes at offset 0 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 1024/1024 bytes at offset 3072 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 4096/4096 bytes at offset 4096 +4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) + +== Disjoint requests == +Formatting
Re: [Qemu-devel] [PATCH v2] Tap: fix vcpu long time io blocking on tap
On Tue, Jul 29, 2014 at 1:02 PM, Wangkai (Kevin,C) wangka...@huawei.com wrote: -Original Message- From: Stefan Hajnoczi [mailto:stefa...@gmail.com] Sent: Monday, July 28, 2014 11:53 PM To: Wangkai (Kevin,C) Cc: qemu-devel@nongnu.org; Lee yang; aligu...@amazon.com; Stefan Hajnoczi Subject: Re: [Qemu-devel] [PATCH v2] Tap: fix vcpu long time io blocking on tap On Fri, Jul 18, 2014 at 09:33:42AM +, Wangkai (Kevin,C) wrote: fix vcpu long time io blocking on tap, when too many packets was delivered to the guest os via tap interface. -- Signed-off-by: Wangkai wangka...@huawei.com Thanks, applied to my net-next tree: https://github.com/stefanha/qemu/commits/net-next The patch did not apply cleanly so I had to do it manually: Applying: Tap: fix vcpu long time io blocking on tap fatal: corrupt patch at line 32 Repository lacks necessary blobs to fall back on 3-way merge. Cannot fall back to three-way merge. Patch failed at 0001 Tap: fix vcpu long time io blocking on tap Please use git-send-email(1). I also adjusted the commit message and doc comments to fit QEMU style and for grammar. Stefan [Wangkai (Kevin,C)] Hi Stefan, I will send a patch v3 use git send-email to you. There is no need, I have already applied v2. Thanks, Stefan
Re: [Qemu-devel] [PATCH RFC v2 02/12] reset handler for qdevified devices
Sanidhya Kashyap sanidhya.ii...@gmail.com wrote: I have added a structure containing the list of qdevified devices which have been added to the SaveVMHandlers. Since, I was unable to find any particular struct containing the information about all the qdevified devices. So, I have created my own version, which is very very specific. I shall be grateful if anyone can give some pointers on this. Signed-off-by: Sanidhya Kashyap sanidhya.ii...@gmail.com --- savevm.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/savevm.c b/savevm.c index e19ae0a..0255fa0 100644 --- a/savevm.c +++ b/savevm.c @@ -503,12 +503,29 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque) } } +/* + * Reset entry for qdevified devices. + * Contains all those devices which have been qdevified and are + * part of the SaveVMHandlers. This one allows resetting of + * single device at any time. + */ + +typedef struct VMStatesQdevResetEntry { +QTAILQ_ENTRY(VMStatesQdevResetEntry) entry; +DeviceState *dev; +char device_name[256]; +} VMStatesQdevResetEntry; + +static QTAILQ_HEAD(vmstate_reset_handlers, VMStatesQdevResetEntry) + vmstate_reset_handlers = QTAILQ_HEAD_INITIALIZER(vmstate_reset_handlers); + int vmstate_register_with_alias_id(DeviceState *dev, int instance_id, const VMStateDescription *vmsd, void *opaque, int alias_id, int required_for_version) { SaveStateEntry *se; +VMStatesQdevResetEntry *qre; /* If this triggers, alias support can be dropped for the vmsd. */ assert(alias_id == -1 || required_for_version = vmsd-minimum_version_id); @@ -521,6 +538,12 @@ int vmstate_register_with_alias_id(DeviceState *dev, int instance_id, se-alias_id = alias_id; if (dev) { + +qre = g_malloc0(sizeof(VMStatesQdevResetEntry)); +qre-dev = dev; +strcpy(qre-device_name, vmsd-name); +QTAILQ_INSERT_TAIL(vmstate_reset_handlers, qre, entry); + As stated on irc, you could add a dev field to SaveStateEntry and call it a day. Thanks, Juan. char *id = qdev_get_dev_path(dev); if (id) { pstrcpy(se-idstr, sizeof(se-idstr), id); @@ -551,6 +574,7 @@ void vmstate_unregister(DeviceState *dev, const VMStateDescription *vmsd, void *opaque) { SaveStateEntry *se, *new_se; +VMStatesQdevResetEntry *qre, *new_qre; QTAILQ_FOREACH_SAFE(se, savevm_handlers, entry, new_se) { if (se-vmsd == vmsd se-opaque == opaque) { @@ -561,6 +585,12 @@ void vmstate_unregister(DeviceState *dev, const VMStateDescription *vmsd, g_free(se); } } + +QTAILQ_FOREACH_SAFE(qre, vmstate_reset_handlers, entry, new_qre) { +if (dev qre-dev !strcmp(vmsd-name, qre-device_name)) { +QTAILQ_REMOVE(vmstate_reset_handlers, qre, entry); +} +} } static int vmstate_load(QEMUFile *f, SaveStateEntry *se, int version_id)
Re: [Qemu-devel] [RESEND PATCH V3] qemu-img info: show nocow info
On 07/29/2014 01:18 AM, Chunyan Liu wrote: Add nocow info in 'qemu-img info' output to show whether the file currently has NOCOW flag set or not. Signed-off-by: Chunyan Liu cy...@suse.com Acked-by: Eric Blake ebl...@redhat.com Actually, this should be Reviewed-by, not Acked-by. My understanding is that Reviewed-by is stronger (I actually read through the patch, and looked for possible problems) while Acked-by is weaker (a maintainer stating that the work is taken in on the basis of third-party reviews without actually reviewing the patch itself). --- Resend for QEMU 2.2. Change json version comment. block/qapi.c | 25 + qapi/block-core.json | 5 - 2 files changed, 29 insertions(+), 1 deletion(-) -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH RFC v2 03/12] VMState test: query command to extract the qdevified device names
Sanidhya Kashyap sanidhya.ii...@gmail.com wrote: I have provided a qmp interface for getting the list of qdevified devices that have been registered with SaveVMHandlers. Signed-off-by: Sanidhya Kashyap sanidhya.ii...@gmail.com --- qapi-schema.json | 22 ++ qmp-commands.hx | 25 + savevm.c | 34 ++ 3 files changed, 81 insertions(+) diff --git a/qapi-schema.json b/qapi-schema.json index b11aad2..996e6b5 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -3480,3 +3480,25 @@ # Since: 2.1 ## { 'command': 'rtc-reset-reinjection' } + +## +# @VMstatesQdevDevices +# +# list of qdevified devices that are registered with SaveStateEntry +# +# @device: list of qdevified device names Should we use qdev on the name? Or just list of devices? My understanding is that all devices are on this list, no? +# +# Since 2.2 +## +{ 'type': 'VMStatesQdevDevices', + 'data': { 'device': ['str'] } } + +## +# @query-qdev-devices +# +# returns the VMStatesQdevDevices that have the associated value +# +# Since 2.2 +## +{ 'command': 'query-qdev-devices', + 'returns': 'VMStatesQdevDevices' } diff --git a/qmp-commands.hx b/qmp-commands.hx index 4be4765..2e20032 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -3755,3 +3755,28 @@ Example: - { return: {} } EQMP + +{ +.name = query-qdev-devices, +.args_type = , +.mhandler.cmd_new = qmp_marshal_input_query_qdev_devices, +}, + +SQMP +query-qdev-devices +-- + +Shows registered Qdevified devices + + +Example (1): + +- { execute: query-qdev-devices } +- { return: [ + { + devices: [ kvm-tpr-opt, piix4_pm ] Once here, can we change this to also include the device version? i.e. something like: devices: [ [ device: [ name: kvm-tpr-opt, version, 15]]], ...] Or somesuch? + } + ] + } + +EQMP diff --git a/savevm.c b/savevm.c index 0255fa0..7c1600a 100644 --- a/savevm.c +++ b/savevm.c @@ -1167,6 +1167,40 @@ void do_savevm(Monitor *mon, const QDict *qdict) } } +static strList *create_qdev_list(const char *name, strList *list) +{ +strList *temp_list; +int len; + +if (!list) { +list = g_malloc0(sizeof(strList)); +len = strlen(name); +list-value = g_malloc0(sizeof(char)*(len+1)); +strcpy(list-value, name); +list-next = NULL; +return list; +} +temp_list = g_malloc0(sizeof(strList)); +len = strlen(name); +temp_list-value = g_malloc0(sizeof(char)*(len+1)); +strcpy(temp_list-value, name); +temp_list-next = list; +list = temp_list; +return list; +} + +VMStatesQdevDevices *qmp_query_qdev_devices(Error **errp) +{ +VMStatesQdevResetEntry *qre; +VMStatesQdevDevices *qdev_devices = g_malloc0(sizeof(VMStatesQdevDevices)); + +QTAILQ_FOREACH(qre, vmstate_reset_handlers, entry) { +qdev_devices-device = create_qdev_list(qre-device_name, + qdev_devices-device); +} +return qdev_devices; +} + void qmp_xen_save_devices_state(const char *filename, Error **errp) { QEMUFile *f;
Re: [Qemu-devel] [PATCH 1/2] block: fix overlapping multiwrite requests
On Tue, 07/29 13:41, Stefan Hajnoczi wrote: When request A is a strict subset of request B: s/subset/superset/ ? Fam
Re: [Qemu-devel] [PATCH 2/2] target-mips/translate.c: Add judgement for msb and lsb
On 28 July 2014 23:52, Aurelien Jarno aurel...@aurel32.net wrote: On Mon, Jul 28, 2014 at 11:34:30PM +0100, Peter Maydell wrote: On 28 July 2014 23:32, Aurelien Jarno aurel...@aurel32.net wrote: On Mon, Jul 28, 2014 at 11:01:02PM +0100, Peter Maydell wrote: This may be true, but the TCG README doesn't define negative lengths as being unspecified behaviour (ie guaranteed to at least not crash even if the result isn't specified), and in fact the implementation of tcg_gen_deposit will assert on negative lengths. We shouldn't implement guest unpredictable cases as crash QEMU. Well I tried this code under QEMU, and it clearly doesn't crash. It seems the assert are not enabled with the default configuration options. Try --enable-debug... That's my point, it's only in debug mode, not in the default configuration. Debug builds are pretty common though, it's not exactly something obscure like only crashes on SPARC hosts. That said I agree it's something to avoid, but I don't think triggering a RI exception is the thing to do (even if it is correct according the MIPS ISA manual) when real silicon output a random result instead. Yes, you could emit code to do that instead if you like. When I said random, it didn't say in the sense of random generator, but in the sense a result that might depend on the input value and the silicon implementation. It would be silly to emit code just for that, but it would be smart for example to skip the deposit op in that case instead of triggering an exception. That's what I had in mind, yes. thanks -- PMM
Re: [Qemu-devel] [PATCH v5 2/2] vmdk: Optimize cluster allocation
On Tue, Jul 29, 2014 at 09:00:43AM +0800, Fam Zheng wrote: On Mon, 07/28 16:11, Stefan Hajnoczi wrote: On Tue, May 27, 2014 at 04:49:22PM +0800, Fam Zheng wrote: +if (!bs-backing_hd) { +memset(whole_grain, 0, skip_start_sector BDRV_SECTOR_BITS); +memset(whole_grain + (skip_end_sector BDRV_SECTOR_BITS), 0, + cluster_bytes - (skip_end_sector BDRV_SECTOR_BITS)); +} + +assert(skip_end_sector = sector_num + extent-cluster_sectors); Does this assertion make sense? skip_end_sector is a small number of sectors (relative to start of cluster), while sector_num + extent-cluster_sectors is a large absolute sector offset. skip_end_sector is absolute sector number too. The caller hunk in this patch is: I disagree. If it was an absolute sector number then the memset() a few lines above would be incorrect: memset(whole_grain, 0, skip_start_sector BDRV_SECTOR_BITS); memset(whole_grain + (skip_end_sector BDRV_SECTOR_BITS), 0, cluster_bytes - (skip_end_sector BDRV_SECTOR_BITS)); Look at the code you pasted again: @@ -1406,12 +1468,17 @@ static int vmdk_write(BlockDriverState *bs, int64_t sector_num, if (!extent) { return -EIO; } -ret = get_cluster_offset( -bs, -extent, -m_data, -sector_num 9, !extent-compressed, -cluster_offset); +extent_begin_sector = extent-end_sector - extent-sectors; +extent_relative_sector_num = sector_num - extent_begin_sector; +index_in_cluster = extent_relative_sector_num % extent-cluster_sectors; +n = extent-cluster_sectors - index_in_cluster; +if (n nb_sectors) { +n = nb_sectors; +} +ret = get_cluster_offset(bs, extent, m_data, sector_num 9, + !(extent-compressed || zeroed), + cluster_offset, + index_in_cluster, index_in_cluster + n); if (extent-compressed) { if (ret == VMDK_OK) { /* Refuse write to allocated cluster for streamOptimized */ See the last parameter of get_cluster_offset. The last parameter is (extent_relative_sector_num % extent-cluster_sectors) + (extent-cluster_sectors - index_in_cluster). Those are definitely sector counts (like nb_sectors) and not absolute sector numbers (like sector_num). pgppqs4d2XJb8.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH 1/2] block: fix overlapping multiwrite requests
On Tue, Jul 29, 2014 at 1:46 PM, Fam Zheng f...@redhat.com wrote: On Tue, 07/29 13:41, Stefan Hajnoczi wrote: When request A is a strict subset of request B: s/subset/superset/ ? Yes :)
Re: [Qemu-devel] [PATCH 2/3] qemu-char: add -chardev exit-on-eof option
On 07/29/2014 12:12 AM, Markus Armbruster wrote: Libvirt probably won't use it for normal guests (we don't want to kill qemu just because the monitor disconnects), but does have the notion of an autodestroy guest where it might be useful (we WANT the guest to go away if libvirtd dies early). In fact, autodestroy guests are used during migration - we want to kill qemu on the destination side if libvirtd dies before the source side finishes sending the migration stream. But in that scenario, once migration succeeds, libvirt has to be able to convert an autodestroy guest back into a normal guest that no longer disappears when libvirtd does; would this be something that QMP can toggle the state of this attribute on the fly? Perhaps through qom-set? After migration completes, execution moves from source to target. Wouldn't you want to switch off target auto-destruct together with that move, atomically? Libvirt starts the destination with -S, and migration can't complete until libvirt resumes the destination CPUs with 'cont'. Libvirt's current timing of releasing auto-destruct is based on handshaking between source and destination; it occurs after source claims migration is done but before resuming CPUs on the destination, which satisfies the atomicity that you correctly observed to be necessary. However, we also have precedence of actions in QAPI that are very unlikely to be used outside of qtest, but which are not marked experimental; for example, the 'Abort' action in 'transaction' will probably never be used by libvirt Arguably not a conscious decision to make it ABI forever, more a case of nobody thought about *not* making it ABI :) Added in June 2013; and we *did* have a discussion on whether to hide the transaction name at the time... https://lists.gnu.org/archive/html/qemu-devel/2013-06/msg03281.html I don't really mind this instance, but I'm a bit concerned about rank ABI growth. And that's a good position to maintain - it's always good to justify new knobs, especially since once they are ABI, it's harder to refactor around them. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm
Quoting Alex Bligh (a...@alex.org.uk): Serge, I don't think that is in any way a problem. Is migrating to older versions ever actually expected to work? In either case I don't think for this particular case it's a problem. Good; no; and good - respectively. (The how to handle this in libvirt question is more interesting) I've been giving this some thought. However rococo we make this, I think the caller is often going to need to modify the command line anyway, i.e. is going to need to be aware of the migration problem. For instance, taking Ubuntu as an example, 12.04 shipped with qemu-kvm-1.0, and a pxe-virtio.rom of just under 64k, giving a 64k ROM slot. 14.04 ships with qemu-2.0 and a pxe-virtio.rom of over 80k, giving a 128k ROM slot. So however we fix the machine types, when migrating in a 12.04 initiated VM, qemu will need -global virtio-net-pci.romfile=/path/to/pxe-virtio.rom.12.04 on the command line (or, if you don't much care about PXE working on a soft reboot, a blank file of the same size), whereas when migrating in a 14.04 initiated VM, that must not be on the command line. Fixing this properly would entail requiring that the ROMs are (effectively) distributed with qemu or at least that all ROM sizes become part of the machine type standard. This would have the advantage that loading a larger ROM than the machine type specifies would fail unless the ROM size was explicitly configured on the command line. But this is a subject wider than this patch. So the long and the short of it is that libvirt (sadly) like anything else starting qemu machines needs to know a bit about different versions of qemu, and be able to replace a machine type option with a machine type option and more on the command line. My previous suggestion doesn't help much because qemu will still need to be passed something on the command line. I think the best way to go with this patch would be something like: * Add pc-1.0-qemu-kvm as a machine type (done) * Rename pc-1.0 to pc-1.0-qemu-git * Add an alias for pc-1.0 to either pc-1.0-qemu-git or pc-1.0-qemu-kvm, configurable at build time with a ./configure option. The distro can then set this appropriately. This would default to pc-1.0-qemu-git (i.e. the current behaviour). On distros that only every used one of the above, ./configure will sort things out, +/- self-inflicted injuries relating to ROM size changes. If life is more complicated, libvirt (and other callers) will need to be aware. pc-1.0-qemu-git and pc-1.0-qemu-kvm can be used to unambiguously mean the relevant machine type, which will fix things going forward for that machine type. WDYT? That sounds good. And from there I think the thing to do will be to introduce a transient alternate package that has the pc-1.0 alias pointing ot pc-1.0-qemu-kvm and depends on the legacy pxe rom. And maybe users can then choose that package for migration purposes if needed, without having to make any changes to libvirt at all, while others are completely unaffected. -serge
Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm
On 29 Jul 2014, at 14:03, Serge E. Hallyn se...@hallyn.com wrote: That sounds good. And from there I think the thing to do will be to introduce a transient alternate package that has the pc-1.0 alias pointing ot pc-1.0-qemu-kvm and depends on the legacy pxe rom. And maybe users can then choose that package for migration purposes if needed, without having to make any changes to libvirt at all, while others are completely unaffected. OK, I'll do that the next time I roll the patch. I'd like to hear from some others first and am mindful of Paolo's comment re waiting for the release to ship. -- Alex Bligh
Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm
Il 29/07/2014 15:03, Serge E. Hallyn ha scritto: And from there I think the thing to do will be to introduce a transient alternate package that has the pc-1.0 alias pointing ot pc-1.0-qemu-kvm This should be done in the main package, too. and depends on the legacy pxe rom. If you can make the pxe-virtio.rom file 64k or less, then that would be a good idea for 14.04 in general. Newer machine types use efi-virtio.rom, so you won't break -M pc migration. Paolo
Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm
Quoting Paolo Bonzini (pbonz...@redhat.com): Il 29/07/2014 15:03, Serge E. Hallyn ha scritto: And from there I think the thing to do will be to introduce a transient alternate package that has the pc-1.0 alias pointing ot pc-1.0-qemu-kvm This should be done in the main package, too. That seems like a problem, unless Im misunderstanding something. If we do that in the main package, then anyone running a pc-1.0 system under the qemu package won't be able to migrate. Wouldn't it be better to have pc-1.0 alias by default point to the pc-1.0-qemu machine type? and depends on the legacy pxe rom. If you can make the pxe-virtio.rom file 64k or less, then that would be a good idea for 14.04 in general. Newer machine types use efi-virtio.rom, so you won't break -M pc migration. Hm. No idea offhand how I'd do that, but it sounds worth looking into. thanks, -serge
Re: [Qemu-devel] [PATCH v5 2/2] vmdk: Optimize cluster allocation
On Tue, 07/29 13:51, Stefan Hajnoczi wrote: On Tue, Jul 29, 2014 at 09:00:43AM +0800, Fam Zheng wrote: On Mon, 07/28 16:11, Stefan Hajnoczi wrote: On Tue, May 27, 2014 at 04:49:22PM +0800, Fam Zheng wrote: +if (!bs-backing_hd) { +memset(whole_grain, 0, skip_start_sector BDRV_SECTOR_BITS); +memset(whole_grain + (skip_end_sector BDRV_SECTOR_BITS), 0, + cluster_bytes - (skip_end_sector BDRV_SECTOR_BITS)); +} + +assert(skip_end_sector = sector_num + extent-cluster_sectors); Does this assertion make sense? skip_end_sector is a small number of sectors (relative to start of cluster), while sector_num + extent-cluster_sectors is a large absolute sector offset. skip_end_sector is absolute sector number too. The caller hunk in this patch is: I disagree. You are right, I totally misread when replying. Will respin to fix the assertion and also the spellings. Thanks for reviewing and explaining my mistake :) Fam
Re: [Qemu-devel] [PATCH for-2.1 0/2] qdev-monitor: include QOM device properties in -device FOO, help output
On Wed, Jul 09, 2014 at 02:01:30PM +0200, Stefan Hajnoczi wrote: These two patches fix the -device FOO,help output regression that Cole spotted in QEMU 2.0-rc0. The problem is that virtio-blk-pci qdev properties have been converted to QOM alias properties but -device FOO,help shows only qdev properties. We simply need to update -device FOO,help code to use both qdev and QOM properties. Note that types change because a 'drive' qdev type is actually a 'str' QOM type. We're moving more and more to QOM properties where the final type for this property would be 'linkDrive' or similar. Cole: please confirm that this fixes the issue Stefan Hajnoczi (2): qmp: hide hotplugged device property from device-list-properties qdev-monitor: include QOM properties in -device FOO,help output qdev-monitor.c | 40 +--- qmp.c | 1 + 2 files changed, 18 insertions(+), 23 deletions(-) CCed qemu-stable since we ought to fix -device FOO,?. This patch was missed for QEMU 2.1 but not critical (see Cole's response). Applied to my block-next tree: https://github.com/stefanha/qemu/commits/block-next Stefan pgpHm5HOZydZ6.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm
Il 29/07/2014 15:27, Serge Hallyn ha scritto: Quoting Paolo Bonzini (pbonz...@redhat.com): Il 29/07/2014 15:03, Serge E. Hallyn ha scritto: And from there I think the thing to do will be to introduce a transient alternate package that has the pc-1.0 alias pointing ot pc-1.0-qemu-kvm This should be done in the main package, too. That seems like a problem, unless Im misunderstanding something. If we do that in the main package, then anyone running a pc-1.0 system under the qemu package won't be able to migrate. Wouldn't it be better to have pc-1.0 alias by default point to the pc-1.0-qemu machine type? You'd break that for people who have already upgraded from 12.04 and 14.04 and are keeping the old machine type. You'd fix it for people who are upgrading now. I think providing a smoother upgrade path is worthwhile, even if it annoys someone else. Unfortunately the only solution is a lot of testing *before* a release, and in fact this is why 2.1 was delayed by a migration problem. Once the release is out, you'll have to make someone unhappy. and depends on the legacy pxe rom. If you can make the pxe-virtio.rom file 64k or less, then that would be a good idea for 14.04 in general. Newer machine types use efi-virtio.rom, so you won't break -M pc migration. Hm. No idea offhand how I'd do that, but it sounds worth looking into. I'm not sure either. You could simply package the 12.04 ipxe ROMs into 14.04, and add a note about getting the sources for GPL friendliness. Paolo
Re: [Qemu-devel] [PATCH] vmdk: improve streamOptimized vmdk support
On Mon, Jul 07, 2014 at 10:54:27AM -0400, Milos Vyletel wrote: VMDK's streamOptimized format is different that regular sparse format. L1(GD) and L2(GT) tables are not predefined but rather generated and written during image creation mainly because there is no way to tell how much space data will occupy once they are compressed. Also the location of header, L1 and L2 tables differs. - L2 tables (grain tables) are written after all grains they point to - L1 tables are written after all grains and L2 tables - footer at the end is used instead of header in first sector This patch improves streamOptimized support and adds possibility to create true streamOptimized images using qemu-img. Some of the changes are from VMDK specs, some of them from hexdump-ing images from VMWare and VirtualBox. I have compared these images to the ones generated by VMWare and vbox and they are identical with the exception of DescriptorFile that has some differences but none that would change behavior(CID and some additional DDB entries differ) and streamOptimized image generated from raw image was succesfully imported (as OVA) into VMWare ESXi and Oracle OVM. Signed-off-by: Milos Vyletel milos.vyle...@gmail.com --- block/vmdk.c | 363 +- 1 files changed, 281 insertions(+), 82 deletions(-) What does this patch do beyond what QEMU already supports today? Is there a particular application that rejected QEMU's streamOptimized images? Is this a bug fix? Please use scripts/checkpatch.pl to check coding style. Fam: Please review diff --git a/block/vmdk.c b/block/vmdk.c index 27a78da..f482225 100644 --- a/block/vmdk.c +++ b/block/vmdk.c @@ -81,6 +81,21 @@ typedef struct { uint16_t compressAlgorithm; } QEMU_PACKED VMDK4Header; +typedef struct { +uint64_t val; +uint32_t size; +uint32_t type; +uint8_t pad[BDRV_SECTOR_SIZE - sizeof(uint64_t) - 2*sizeof(uint32_t)]; +} QEMU_PACKED VMDK4MetaMarker; + +typedef struct { +VMDK4MetaMarker footer_marker; +uint32_t magic; +VMDK4Header header; +uint8_t pad[BDRV_SECTOR_SIZE - sizeof(uint32_t) - sizeof(VMDK4Header)]; +VMDK4MetaMarker eos_marker; +} QEMU_PACKED VMDK4Footer; + #define L2_CACHE_SIZE 16 typedef struct VmdkExtent { @@ -89,24 +104,29 @@ typedef struct VmdkExtent { bool compressed; bool has_marker; bool has_zero_grain; +bool has_footer; int version; int64_t sectors; int64_t end_sector; int64_t flat_start_offset; int64_t l1_table_offset; int64_t l1_backup_table_offset; +uint32_t l1_index; uint32_t *l1_table; uint32_t *l1_backup_table; unsigned int l1_size; uint32_t l1_entry_sectors; unsigned int l2_size; +uint32_t *l2_table; uint32_t *l2_cache; uint32_t l2_cache_offsets[L2_CACHE_SIZE]; uint32_t l2_cache_counts[L2_CACHE_SIZE]; int64_t cluster_sectors; char *type; + +VMDK4Footer footer; } VmdkExtent; typedef struct BDRVVmdkState { @@ -555,14 +575,51 @@ static char *vmdk_read_desc(BlockDriverState *file, uint64_t desc_offset, return buf; } +static int vmdk_read_footer(BlockDriverState *bs, +VMDK4Footer *footer) +{ +int ret; + +/* +* footer starts 3 sectors from end +* - footer marker +* - footer +* - end-of-stream marker +*/ +ret = bdrv_pread(bs-file, +(bs-file-total_sectors - 3) * BDRV_SECTOR_SIZE, +footer, sizeof(*footer)); +if (ret 0) { +goto out; +} + +/* Some sanity checks for the footer */ +if (be32_to_cpu(footer-magic) != VMDK4_MAGIC || +le32_to_cpu(footer-footer_marker.size) != 0 || +le32_to_cpu(footer-footer_marker.type) != MARKER_FOOTER || +le64_to_cpu(footer-eos_marker.val) != 0 || +le32_to_cpu(footer-eos_marker.size) != 0 || +le32_to_cpu(footer-eos_marker.type) != MARKER_END_OF_STREAM) +{ +ret = -EINVAL; +goto out; +} + +ret = VMDK_OK; + out: +return ret; +} + static int vmdk_open_vmdk4(BlockDriverState *bs, BlockDriverState *file, int flags, Error **errp) { int ret; +bool has_footer = false; uint32_t magic; uint32_t l1_size, l1_entry_sectors; VMDK4Header header; +VMDK4Footer footer; VmdkExtent *extent; BDRVVmdkState *s = bs-opaque; int64_t l1_backup_offset = 0; @@ -593,48 +650,13 @@ static int vmdk_open_vmdk4(BlockDriverState *bs, if (le64_to_cpu(header.gd_offset) == VMDK4_GD_AT_END) { /* - * The footer takes precedence over the header, so read it in. The - * footer starts at offset -1024 from the end: One sector for the - * footer, and another one for the end-of-stream marker. + * The
Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm
On 29 Jul 2014, at 14:21, Paolo Bonzini pbonz...@redhat.com wrote: If you can make the pxe-virtio.rom file 64k or less, then that would be a good idea for 14.04 in general. Newer machine types use efi-virtio.rom, so you won't break -M pc migration. Without further, won't that break migration from 14.04 *with* the big ROM? -- Alex Bligh
Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm
On 29 Jul 2014, at 14:35, Paolo Bonzini pbonz...@redhat.com wrote: I'm not sure either. You could simply package the 12.04 ipxe ROMs into 14.04, and add a note about getting the sources for GPL friendliness. This would be my preference (or in Ubuntu's case, add it to the ipxe-qemu package) but I think it should only be used when the legacy machine type is used or it will break inbound migrations from other 14.04 machines started with 128k ROMs (AIUI). -- Alex Bligh
Re: [Qemu-devel] [PATCH RFC v2 05/12] VMstate test: basic VMState testing mechanism
Sanidhya Kashyap sanidhya.ii...@gmail.com wrote: In this patch, I have made the following changes: * changed the DPRINT statement. * renamed the variables. * added noqdev variable which decides which option to use for resetting. * added devices option which can help in resetting one or many devices (only qdevified ones). * updated the documentation. Signed-off-by: Sanidhya Kashyap sanidhya.ii...@gmail.com +## +# @test-vmstates +# +# tests the vmstates' value by dumping and loading in memory +# +# @iterations: (optional) The total iterations for vmstate testing. +# The min and max defind value is 10 and 100 respectively. +# +# @period: (optional) sleep interval between iteration (in milliseconds). +# The default interval is 100 milliseconds with min and max being +# 1 and 1 respectively. +# +# @noqdev: boolean variable which decides whether to use qdevified devices +# or not. Will be removed when all the devices have been qdevified. +# +# @devices: (optional) helps in resetting particular qdevified decices +# that have been registered with SaveStateEntry +# +# Since 2.2 +## +{ 'command': 'test-vmstates', + 'data': {'*iterations': 'int', + '*period': 'int', + 'noqdev': 'bool', Do we really care about noqdev, or should we just decree that it is false always? +#define DEBUG_TEST_VMSTATES 1 + +#ifndef DEBUG_TEST_VMSTATES +#define DEBUG_TEST_VMSTATES 0 +#endif If you have the previe line, this ones are not needed. + +#define DPRINTF(fmt, ...) \ +do { \ +if (DEBUG_TEST_VMSTATES) { \ +printf(vmstate_test: fmt, ## __VA_ARGS__); \ +} \ +} while (0) DPRINTF is *so* last year O:-) It is considedered better to just add tracepoints to the code. I think that all the DPRINTF's make sense to be a tracepoint, no? +struct VMStateLogState { +int64_t current_iteration; +int64_t iterations; +int64_t period; +bool active_state; +bool noqdev; +VMStatesQdevDevices *qdevices; +QEMUTimer *timer; + +QTAILQ_HEAD(qdev_list, VMStatesQdevResetEntry) qdev_list; +}; + +static VMStateLogState *vmstate_current_state(void) +{ +static VMStateLogState current_state = { +.active_state = false, +}; + +return current_state; +} + +static inline void test_vmstates_clear_qdev_entries(VMStateLogState *v) We need a better preffix that test_vmstates_ But I can't think of one right now. Will think later about it. +static inline bool check_device_name(VMStateLogState *v, + VMStatesQdevDevices *qdevices, + Error **errp) Is inline needed? I would expect the compiler to do a reasonable decision with an static function called only once? +{ +VMStatesQdevResetEntry *qre; +strList *devices_name = qdevices-device; +QTAILQ_INIT(v-qdev_list); +bool device_present; + +/* now, checking against each one */ +for (; devices_name; devices_name = devices_name-next) { +device_present = false; +VMStatesQdevResetEntry *new_qre; +QTAILQ_FOREACH(qre, vmstate_reset_handlers, entry) { +if (!strcmp(qre-device_name, devices_name-value)) { + +device_present = true; + +new_qre = g_malloc0(sizeof(VMStatesQdevResetEntry)); +new_qre-dev = qre-dev; +strcpy(new_qre-device_name, qre-device_name); +QTAILQ_INSERT_TAIL(v-qdev_list, new_qre, entry); + +break; return; And remove the whole device_present variable and assignation? +} +} +if (!device_present) { +test_vmstates_clear_qdev_entries(v); +error_setg(errp, Incorrect device name - %s\n, + devices_name-value); +return false; +} +} +return true; +} + +static inline void test_vmstates_reset_devices(VMStateLogState *v) +{ +VMStatesQdevResetEntry *qre; + +if (v-noqdev) { +DPRINTF(resetting all devices\n); +qemu_system_reset(VMRESET_SILENT); What is diffe9rent between calling with noqdev and with an empty device list? I would expect them to be the same list of devices. +} else if (!v-qdevices) { +QTAILQ_FOREACH(qre, vmstate_reset_handlers, entry) { +DPRINTF(resetting device: %s\n, qre-device_name); +device_reset(qre-dev); +} +} else { +QTAILQ_FOREACH(qre, v-qdev_list, entry) { +DPRINTF(resetting device: %s\n, qre-device_name); +device_reset(qre-dev); +} +} +} + +static void vmstate_test_cb(void *opaque) +{ +VMStateLogState *v = opaque; +int saved_vm_running = runstate_is_running(); +const QEMUSizedBuffer *qsb; +QEMUFile *f; +int ret; +int64_t save_vmstates_duration, load_vmstates_duration;
Re: [Qemu-devel] [PATCH for 2.1 V3] qemu-img info: show nocow info
On Mon, Jul 28, 2014 at 12:58:33PM -0600, Eric Blake wrote: On 07/28/2014 09:19 AM, Stefan Hajnoczi wrote: On Wed, Jul 09, 2014 at 10:43:13AM +0800, Chunyan Liu wrote: Add nocow info in 'qemu-img info' output to show whether the file currently has NOCOW flag set or not. Signed-off-by: Chunyan Liu cy...@suse.com --- Changes: - update output info to NOCOW flag: set block/qapi.c | 25 + qapi/block-core.json | 5 - 2 files changed, 29 insertions(+), 1 deletion(-) This patch was sent on July 9th, after the 2.1 soft freeze when we stop merging new features. Soft freeze was 17th of June. Please resend for QEMU 2.2 and update the qapi-schema.json version comment. There's still the argument that this is a bug fix for an incomplete implementation of a new feature that IS in qemu 2.1 (that is, 2.1 is adding the ability to set the nocow flag, but without this patch, that addition is a write-only interface, and this patch is correcting the bug to allow it to be a read-write interface). But it is fairly late in the game - what is the level of damage if 2.1 is released with a write-only setting, and this patch is deferred to 2.2? Without some strong justification, I can agree with the decision to postpone this patch. You can use lsattr(1), so I see no critical need. Stefan pgpEu4JGYUZON.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm
Quoting Alex Bligh (a...@alex.org.uk): On 29 Jul 2014, at 14:21, Paolo Bonzini pbonz...@redhat.com wrote: If you can make the pxe-virtio.rom file 64k or less, then that would be a good idea for 14.04 in general. Newer machine types use efi-virtio.rom, so you won't break -M pc migration. Without further, won't that break migration from 14.04 *with* the big ROM? Heh, good point.
Re: [Qemu-devel] [PATCH] vmdk: improve streamOptimized vmdk support
On Tue, Jul 29, 2014 at 9:37 AM, Stefan Hajnoczi stefa...@redhat.com wrote: On Mon, Jul 07, 2014 at 10:54:27AM -0400, Milos Vyletel wrote: VMDK's streamOptimized format is different that regular sparse format. L1(GD) and L2(GT) tables are not predefined but rather generated and written during image creation mainly because there is no way to tell how much space data will occupy once they are compressed. Also the location of header, L1 and L2 tables differs. - L2 tables (grain tables) are written after all grains they point to - L1 tables are written after all grains and L2 tables - footer at the end is used instead of header in first sector This patch improves streamOptimized support and adds possibility to create true streamOptimized images using qemu-img. Some of the changes are from VMDK specs, some of them from hexdump-ing images from VMWare and VirtualBox. I have compared these images to the ones generated by VMWare and vbox and they are identical with the exception of DescriptorFile that has some differences but none that would change behavior(CID and some additional DDB entries differ) and streamOptimized image generated from raw image was succesfully imported (as OVA) into VMWare ESXi and Oracle OVM. Signed-off-by: Milos Vyletel milos.vyle...@gmail.com --- block/vmdk.c | 363 +- 1 files changed, 281 insertions(+), 82 deletions(-) What does this patch do beyond what QEMU already supports today? Is there a particular application that rejected QEMU's streamOptimized images? Is this a bug fix? Please use scripts/checkpatch.pl to check coding style. Fam: Please review Images created/converted using QEMU were rejected by VMWare ESXi, vCloud, VirtualBox and Oracle OVM. I did not try anything else. Generally speaking streamOptimized format is not followed by QEMU. Instead regular VMDK format + encryption is used. We can say this is bug fix since streamOptimzed format is already included but does not work well. I'll check the style and fix the code. I'll postpone resubmit until further review is done since I'm sure there will be things that will need to be fixed. Milos diff --git a/block/vmdk.c b/block/vmdk.c index 27a78da..f482225 100644 --- a/block/vmdk.c +++ b/block/vmdk.c @@ -81,6 +81,21 @@ typedef struct { uint16_t compressAlgorithm; } QEMU_PACKED VMDK4Header; +typedef struct { +uint64_t val; +uint32_t size; +uint32_t type; +uint8_t pad[BDRV_SECTOR_SIZE - sizeof(uint64_t) - 2*sizeof(uint32_t)]; +} QEMU_PACKED VMDK4MetaMarker; + +typedef struct { +VMDK4MetaMarker footer_marker; +uint32_t magic; +VMDK4Header header; +uint8_t pad[BDRV_SECTOR_SIZE - sizeof(uint32_t) - sizeof(VMDK4Header)]; +VMDK4MetaMarker eos_marker; +} QEMU_PACKED VMDK4Footer; + #define L2_CACHE_SIZE 16 typedef struct VmdkExtent { @@ -89,24 +104,29 @@ typedef struct VmdkExtent { bool compressed; bool has_marker; bool has_zero_grain; +bool has_footer; int version; int64_t sectors; int64_t end_sector; int64_t flat_start_offset; int64_t l1_table_offset; int64_t l1_backup_table_offset; +uint32_t l1_index; uint32_t *l1_table; uint32_t *l1_backup_table; unsigned int l1_size; uint32_t l1_entry_sectors; unsigned int l2_size; +uint32_t *l2_table; uint32_t *l2_cache; uint32_t l2_cache_offsets[L2_CACHE_SIZE]; uint32_t l2_cache_counts[L2_CACHE_SIZE]; int64_t cluster_sectors; char *type; + +VMDK4Footer footer; } VmdkExtent; typedef struct BDRVVmdkState { @@ -555,14 +575,51 @@ static char *vmdk_read_desc(BlockDriverState *file, uint64_t desc_offset, return buf; } +static int vmdk_read_footer(BlockDriverState *bs, +VMDK4Footer *footer) +{ +int ret; + +/* +* footer starts 3 sectors from end +* - footer marker +* - footer +* - end-of-stream marker +*/ +ret = bdrv_pread(bs-file, +(bs-file-total_sectors - 3) * BDRV_SECTOR_SIZE, +footer, sizeof(*footer)); +if (ret 0) { +goto out; +} + +/* Some sanity checks for the footer */ +if (be32_to_cpu(footer-magic) != VMDK4_MAGIC || +le32_to_cpu(footer-footer_marker.size) != 0 || +le32_to_cpu(footer-footer_marker.type) != MARKER_FOOTER || +le64_to_cpu(footer-eos_marker.val) != 0 || +le32_to_cpu(footer-eos_marker.size) != 0 || +le32_to_cpu(footer-eos_marker.type) != MARKER_END_OF_STREAM) +{ +ret = -EINVAL; +goto out; +} + +ret = VMDK_OK; + out: +return ret; +} + static int vmdk_open_vmdk4(BlockDriverState *bs, BlockDriverState *file, int flags, Error **errp) { int ret; +bool has_footer = false; uint32_t magic;
[Qemu-devel] [PATCH 2/2] target-mips/translate.c: Update OPC_SYNCI
Update OPC_SYNCI with BS_STOP, in order to handle the instructions which saved in the same TB of the store instruction. Signed-off-by: Dongxue Zhang elta@gmail.com --- target-mips/translate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/target-mips/translate.c b/target-mips/translate.c index c381366..dc8afcf 100644 --- a/target-mips/translate.c +++ b/target-mips/translate.c @@ -15334,7 +15334,9 @@ static void decode_opc (CPUMIPSState *env, DisasContext *ctx) break; case OPC_SYNCI: check_insn(ctx, ISA_MIPS32R2); -/* Treat as NOP. */ +/* Break the TB to be able to sync copied instructions + immediately */ +ctx-bstate = BS_STOP; break; case OPC_BPOSGE32:/* MIPS DSP branch */ #if defined(TARGET_MIPS64) -- 1.8.1.2
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work
Il 28/07/2014 17:03, David Hildenbrand ha scritto: Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like a SIPI++ as it performs a psw exchange - NMI). So we basically have two paths that can lead to a state change. All interrupt bits may be in any combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START be denied). The other thing may be that on s390, each vcpu (including itself) can put another vcpu into the STOPPED state - I assume that this is different for x86 INIT_RECEIVED. For this reason we have to watch out for bad race conditions (e.g. multiple vcpus working on another vcpu)... You can do that in x86 by sending an INIT inter-processor interrupt. A SIPI is ignored if the CPU is not in INIT_RECEIVED state. Commit 66450a21f99636af4fafac2afd33f1a40631bc3a introduced the current implementation. - an INIT cancels a previous SIPI; - if both INIT and SIPI are sent, on real hardware you need to have a few hundred microseconds between them, but KVM will reliably process INIT before SIPI. See commit 299018f44ac553dce3caf84df1d14c4764faa279 for an example of the races that can happen. Note that x86 has KVM_MP_STATE_SIPI_RECEIVED state but it is obsolete, we go straight from KVM_MP_STATE_INIT_RECEIVED to KVM_MP_STATE_RUNNABLE.
Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm
On 29 Jul 2014, at 14:42, Paolo Bonzini pbonz...@redhat.com wrote: Il 29/07/2014 15:39, Alex Bligh ha scritto: I'm not sure either. You could simply package the 12.04 ipxe ROMs into 14.04, and add a note about getting the sources for GPL friendliness. This would be my preference (or in Ubuntu's case, add it to the ipxe-qemu package) but I think it should only be used when the legacy machine type is used or it will break inbound migrations from other 14.04 machines started with 128k ROMs (AIUI). -M pc and its alias -M pc-i440fx-2.0 doesn't use pxe-virtio.rom at all (at least upstream). Really? How does it pxeboot on virtio then (it definitely does pxeboot on virtio)? Does Ubuntu 14.04 have efi-virtio.rom? Yes. After 2 layers of symlinks you get to. ubuntu@trustytest:~$ ls -la /usr/lib/ipxe/qemu/efi-virtio.rom -rw-r--r-- 1 root root 220672 Jan 6 2014 /usr/lib/ipxe/qemu/efi-virtio.rom It would break -M pc-1.0 started on older 14.04, but I think that's acceptable. I was more worried about any previous versions of Ubuntu (newer than 12.04) which might also be using the larger rom size. But then I haven't investigated at what stage the rom size grew. -- Alex Bligh
Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm
Il 29/07/2014 15:56, Alex Bligh ha scritto: On 29 Jul 2014, at 14:42, Paolo Bonzini pbonz...@redhat.com wrote: Il 29/07/2014 15:39, Alex Bligh ha scritto: I'm not sure either. You could simply package the 12.04 ipxe ROMs into 14.04, and add a note about getting the sources for GPL friendliness. This would be my preference (or in Ubuntu's case, add it to the ipxe-qemu package) but I think it should only be used when the legacy machine type is used or it will break inbound migrations from other 14.04 machines started with 128k ROMs (AIUI). -M pc and its alias -M pc-i440fx-2.0 doesn't use pxe-virtio.rom at all (at least upstream). Really? How does it pxeboot on virtio then (it definitely does pxeboot on virtio)? efi-virtio.rom contains both BIOS and UEFI ROMs. Does Ubuntu 14.04 have efi-virtio.rom? Yes. After 2 layers of symlinks you get to. ubuntu@trustytest:~$ ls -la /usr/lib/ipxe/qemu/efi-virtio.rom -rw-r--r-- 1 root root 220672 Jan 6 2014 /usr/lib/ipxe/qemu/efi-virtio.rom It would break -M pc-1.0 started on older 14.04, but I think that's acceptable. I was more worried about any previous versions of Ubuntu (newer than 12.04) which might also be using the larger rom size. But then I haven't investigated at what stage the rom size grew. You're right, but in Serge's shoes I wouldn't bother about anything except LTS. Paolo
Re: [Qemu-devel] [PATCH] vmdk: improve streamOptimized vmdk support
On Tue, Jul 29, 2014 at 2:46 PM, Milos Vyletel milos.vyle...@gmail.com wrote: On Tue, Jul 29, 2014 at 9:37 AM, Stefan Hajnoczi stefa...@redhat.com wrote: On Mon, Jul 07, 2014 at 10:54:27AM -0400, Milos Vyletel wrote: VMDK's streamOptimized format is different that regular sparse format. L1(GD) and L2(GT) tables are not predefined but rather generated and written during image creation mainly because there is no way to tell how much space data will occupy once they are compressed. Also the location of header, L1 and L2 tables differs. - L2 tables (grain tables) are written after all grains they point to - L1 tables are written after all grains and L2 tables - footer at the end is used instead of header in first sector This patch improves streamOptimized support and adds possibility to create true streamOptimized images using qemu-img. Some of the changes are from VMDK specs, some of them from hexdump-ing images from VMWare and VirtualBox. I have compared these images to the ones generated by VMWare and vbox and they are identical with the exception of DescriptorFile that has some differences but none that would change behavior(CID and some additional DDB entries differ) and streamOptimized image generated from raw image was succesfully imported (as OVA) into VMWare ESXi and Oracle OVM. Signed-off-by: Milos Vyletel milos.vyle...@gmail.com --- block/vmdk.c | 363 +- 1 files changed, 281 insertions(+), 82 deletions(-) What does this patch do beyond what QEMU already supports today? Is there a particular application that rejected QEMU's streamOptimized images? Is this a bug fix? Please use scripts/checkpatch.pl to check coding style. Fam: Please review Images created/converted using QEMU were rejected by VMWare ESXi, vCloud, VirtualBox and Oracle OVM. I did not try anything else. Generally speaking streamOptimized format is not followed by QEMU. Instead regular VMDK format + encryption is used. We can say this is bug fix since streamOptimzed format is already included but does not work well. I'll check the style and fix the code. I'll postpone resubmit until further review is done since I'm sure there will be things that will need to be fixed. What was the command-line you used? qemu-img convert -f qcow2 -O vmdk -o subformat=streamOptimized input.qcow2 output.vmdk Stefan
Re: [Qemu-devel] [PATCH 2/3] qemu-char: add -chardev exit-on-eof option
Eric Blake ebl...@redhat.com writes: On 07/29/2014 12:12 AM, Markus Armbruster wrote: Libvirt probably won't use it for normal guests (we don't want to kill qemu just because the monitor disconnects), but does have the notion of an autodestroy guest where it might be useful (we WANT the guest to go away if libvirtd dies early). In fact, autodestroy guests are used during migration - we want to kill qemu on the destination side if libvirtd dies before the source side finishes sending the migration stream. But in that scenario, once migration succeeds, libvirt has to be able to convert an autodestroy guest back into a normal guest that no longer disappears when libvirtd does; would this be something that QMP can toggle the state of this attribute on the fly? Perhaps through qom-set? After migration completes, execution moves from source to target. Wouldn't you want to switch off target auto-destruct together with that move, atomically? Libvirt starts the destination with -S, and migration can't complete until libvirt resumes the destination CPUs with 'cont'. Libvirt's current timing of releasing auto-destruct is based on handshaking between source and destination; it occurs after source claims migration is done but before resuming CPUs on the destination, which satisfies the atomicity that you correctly observed to be necessary. Makes sense, thanks. However, we also have precedence of actions in QAPI that are very unlikely to be used outside of qtest, but which are not marked experimental; for example, the 'Abort' action in 'transaction' will probably never be used by libvirt Arguably not a conscious decision to make it ABI forever, more a case of nobody thought about *not* making it ABI :) Added in June 2013; and we *did* have a discussion on whether to hide the transaction name at the time... https://lists.gnu.org/archive/html/qemu-devel/2013-06/msg03281.html I stand corrected. I don't really mind this instance, but I'm a bit concerned about rank ABI growth. And that's a good position to maintain - it's always good to justify new knobs, especially since once they are ABI, it's harder to refactor around them. Exactly.
Re: [Qemu-devel] [PATCH] vmdk: improve streamOptimized vmdk support
On Tue, Jul 29, 2014 at 10:37 AM, Stefan Hajnoczi stefa...@gmail.com wrote: On Tue, Jul 29, 2014 at 2:46 PM, Milos Vyletel milos.vyle...@gmail.com wrote: On Tue, Jul 29, 2014 at 9:37 AM, Stefan Hajnoczi stefa...@redhat.com wrote: On Mon, Jul 07, 2014 at 10:54:27AM -0400, Milos Vyletel wrote: VMDK's streamOptimized format is different that regular sparse format. L1(GD) and L2(GT) tables are not predefined but rather generated and written during image creation mainly because there is no way to tell how much space data will occupy once they are compressed. Also the location of header, L1 and L2 tables differs. - L2 tables (grain tables) are written after all grains they point to - L1 tables are written after all grains and L2 tables - footer at the end is used instead of header in first sector This patch improves streamOptimized support and adds possibility to create true streamOptimized images using qemu-img. Some of the changes are from VMDK specs, some of them from hexdump-ing images from VMWare and VirtualBox. I have compared these images to the ones generated by VMWare and vbox and they are identical with the exception of DescriptorFile that has some differences but none that would change behavior(CID and some additional DDB entries differ) and streamOptimized image generated from raw image was succesfully imported (as OVA) into VMWare ESXi and Oracle OVM. Signed-off-by: Milos Vyletel milos.vyle...@gmail.com --- block/vmdk.c | 363 +- 1 files changed, 281 insertions(+), 82 deletions(-) What does this patch do beyond what QEMU already supports today? Is there a particular application that rejected QEMU's streamOptimized images? Is this a bug fix? Please use scripts/checkpatch.pl to check coding style. Fam: Please review Images created/converted using QEMU were rejected by VMWare ESXi, vCloud, VirtualBox and Oracle OVM. I did not try anything else. Generally speaking streamOptimized format is not followed by QEMU. Instead regular VMDK format + encryption is used. We can say this is bug fix since streamOptimzed format is already included but does not work well. I'll check the style and fix the code. I'll postpone resubmit until further review is done since I'm sure there will be things that will need to be fixed. What was the command-line you used? qemu-img convert -f qcow2 -O vmdk -o subformat=streamOptimized input.qcow2 output.vmdk Stefan I've been using test image from http://wiki.qemu.org/download/linux-0.2.img.bz2 and converting it like this: qemu-img convert -O vmdk -o adapter_type=lsilogic,compat6,subformat=streamOptimized linux-0.2.img linux-0.2.vmdk Milos
Re: [Qemu-devel] [PATCH v6 3/3] sclp-s390: Add memory hotplug SCLPs
On 07/29/2014 08:17 AM, Christian Borntraeger wrote: On 30/06/14 16:00, Matthew Rosato wrote: Add memory information to read SCP info and add handlers for Read Storage Element Information, Attach Storage Element, Assign Storage and Unassign Signed-off-by: Matthew Rosato mjros...@linux.vnet.ibm.com --- hw/s390x/sclp.c| 259 ++-- target-s390x/cpu.h | 15 +++ target-s390x/kvm.c |5 + 3 files changed, 273 insertions(+), 6 deletions(-) diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c index 769d7c3..936b189 100644 --- a/hw/s390x/sclp.c +++ b/hw/s390x/sclp.c @@ -16,7 +16,8 @@ #include sysemu/kvm.h #include exec/memory.h #include sysemu/sysemu.h - +#include exec/address-spaces.h +#include qemu/config-file.h #include hw/s390x/sclp.h #include hw/s390x/event-facility.h @@ -33,10 +34,19 @@ static inline SCLPEventFacility *get_event_facility(void) static void read_SCP_info(SCCB *sccb) { ReadInfo *read_info = (ReadInfo *) sccb; +sclpMemoryHotplugDev *mhd = get_sclp_memory_hotplug_dev(); CPUState *cpu; -int shift = 0; int cpu_count = 0; int i = 0; +int increment_size = 20; +int rnsize, rnmax; +QemuOpts *opts = qemu_opts_find(qemu_find_opts(memory), NULL); +int slots = qemu_opt_get_number(opts, slots, 0); +int max_avail_slots = s390_get_memslot_count(kvm_state); + +if (slots max_avail_slots) { +slots = max_avail_slots; +} CPU_FOREACH(cpu) { cpu_count++; @@ -54,14 +64,235 @@ static void read_SCP_info(SCCB *sccb) read_info-facilities = cpu_to_be64(SCLP_HAS_CPU_INFO); -while ((ram_size (20 + shift)) 65535) { -shift++; +/* I was about to apply this series, but I think it breaks the non-ccw machine for strange memory sizes (e.g. 40001MB). The s390-virtio.c file still does a shifting via 65535. Maybe just do a change over there as well? Verified, this breaks on the non-ccw machine. Thanks for catching it -- I'll submit a new series that includes a patch for it. You should also start making up your mind about migration and prepare patches. I will send cpu migration patches soon. OK, sure, I have code that I was testing a while ago, I'll start cleaning it up for submission. Christian + * The storage increment size is a multiple of 1M and is a power of 2. + * The number of storage increments must be MAX_STORAGE_INCREMENTS or fewer. + */ +while ((ram_size increment_size) MAX_STORAGE_INCREMENTS) { +increment_size++; +} +rnmax = ram_size increment_size; + +/* Memory Hotplug is only supported for the ccw machine type */ +if (mhd) { +while ((mhd-standby_mem_size increment_size) + MAX_STORAGE_INCREMENTS) { +increment_size++; +} +assert(increment_size == mhd-increment_size); + +mhd-standby_subregion_size = MEM_SECTION_SIZE; +/* Deduct the memory slot already used for core */ +if (slots 0) { +while ((mhd-standby_subregion_size * (slots - 1) + mhd-standby_mem_size)) { +mhd-standby_subregion_size = mhd-standby_subregion_size 1; +} +} +/* + * Initialize mapping of guest standby memory sections indicating which + * are and are not online. Assume all standby memory begins offline. + */ +if (mhd-standby_state_map == 0) { +if (mhd-standby_mem_size % mhd-standby_subregion_size) { +mhd-standby_state_map = g_malloc0((mhd-standby_mem_size / + mhd-standby_subregion_size + 1) * + (mhd-standby_subregion_size / + MEM_SECTION_SIZE)); +} else { +mhd-standby_state_map = g_malloc0(mhd-standby_mem_size / + MEM_SECTION_SIZE); +} +} +mhd-padded_ram_size = ram_size + mhd-pad_size; +mhd-rzm = 1 mhd-increment_size; +rnmax = ((ram_size + mhd-standby_mem_size + mhd-pad_size) + mhd-increment_size); + +read_info-facilities |= cpu_to_be64(SCLP_FC_ASSIGN_ATTACH_READ_STOR); +} + +rnsize = 1 (increment_size - 20); +if (rnsize = 128) { +read_info-rnsize = rnsize; +} else { +read_info-rnsize = 0; +read_info-rnsize2 = cpu_to_be32(rnsize); } -read_info-rnmax = cpu_to_be16(ram_size (20 + shift)); -read_info-rnsize = 1 shift; + +if (rnmax 0x1) { +read_info-rnmax = cpu_to_be16(rnmax); +} else { +read_info-rnmax = cpu_to_be16(0); +read_info-rnmax2 = cpu_to_be64(rnmax); +} + sccb-h.response_code =
Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm
On 29 Jul 2014, at 15:00, Paolo Bonzini pbonz...@redhat.com wrote: efi-virtio.rom contains both BIOS and UEFI ROMs. You learn a new thing every day. You're right, but in Serge's shoes I wouldn't bother about anything except LTS. Certainly this would be the most convenient path for me (with my 'LTS user' hat on), i.e. make it 'just work' out the box. -- Alex Bligh
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work
Il 28/07/2014 17:03, David Hildenbrand ha scritto: Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like a SIPI++ as it performs a psw exchange - NMI). So we basically have two paths that can lead to a state change. All interrupt bits may be in any combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START be denied). The other thing may be that on s390, each vcpu (including itself) can put another vcpu into the STOPPED state - I assume that this is different for x86 INIT_RECEIVED. For this reason we have to watch out for bad race conditions (e.g. multiple vcpus working on another vcpu)... You can do that in x86 by sending an INIT inter-processor interrupt. A SIPI is ignored if the CPU is not in INIT_RECEIVED state. Commit 66450a21f99636af4fafac2afd33f1a40631bc3a introduced the current implementation. - an INIT cancels a previous SIPI; - if both INIT and SIPI are sent, on real hardware you need to have a few hundred microseconds between them, but KVM will reliably process INIT before SIPI. See commit 299018f44ac553dce3caf84df1d14c4764faa279 for an example of the races that can happen. Note that x86 has KVM_MP_STATE_SIPI_RECEIVED state but it is obsolete, we go straight from KVM_MP_STATE_INIT_RECEIVED to KVM_MP_STATE_RUNNABLE. Thanks for the explanation Paolo! Looks like from an interrupt point of view, the states have a lot in common. The major thing that differs on s390 is probably the way these interrupts are generated and what else they influence (all the power of the SIGP facility :) + special check-stop state that can't be left by an interrupt, only by SIGP CPU resets). David
Re: [Qemu-devel] [PATCH 2/2] target-mips/translate.c: Add judgement for msb and lsb
Ok, I got you. I will re-build a new patch for all the bitops. 2014-07-29 22:08 GMT+08:00 Aurelien Jarno aurel...@aurel32.net: On Tue, Jul 29, 2014 at 08:41:08PM +0800, Elta wrote: I think, debug mode shouldn't crash the qemu with an unpredictable operation, so i want to fix it. And you say there shouldn't raise RI, i agree with you. Agreed. Or when lsb msb, just leave the code and do nothing. What do you think about this way? Yes, you can use something like: if (lsb = msb) { deposit(...) } -- Aurelien Jarno GPG: 4096R/1DDD8C9B aurel...@aurel32.net http://www.aurel32.net
Re: [Qemu-devel] [PATCH] hw/arm/virt: fix pl031 addr typo
On 29 July 2014 16:44, Andrew Jones drjo...@redhat.com wrote: pl031's base address should be 0x9001000, 0x9001. While in there also add some spacing and zeros to make it easier to read the map. Signed-off-by: Andrew Jones drjo...@redhat.com -[VIRT_RTC] = { 0x9001, 0x1000 }, +[VIRT_RTC] ={ 0x09001000, 0x1000 }, ...and assuming from the commit message that this is the only actual change, the alignment to 64K is deliberate, for the benefit of guests with 64K pages. thanks -- PMM