Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: There is a need to add some more fields to VirtIODevice that should be migrated (broken status, endianness). The problem is that we do not want to break compatibility while adding a new feature... This issue has been addressed in the generic VMState code with the use of optional subsections. As a *temporary* alternative to port the whole virtio migration code to VMState, this patch mimics a similar subsectionning ability for virtio. Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. Amit
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: There is a need to add some more fields to VirtIODevice that should be migrated (broken status, endianness). The problem is that we do not want to break compatibility while adding a new feature... This issue has been addressed in the generic VMState code with the use of optional subsections. As a *temporary* alternative to port the whole virtio migration code to VMState, this patch mimics a similar subsectionning ability for virtio. Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. Amit This calls for a way for devices to inherit properties from the bus, which doesn't exist ATM. Fine but let's not hold up this patchset because of this. -- MST
[Qemu-devel] pidfile option in config file
Hello, I'm using the `-pidfile` option in my qemu command line. Since I'm also using the `-readconfig` option I thought it was a good thing to include the pidfile option in my config file. Unfortunately I did not found any possibility to add such option in my config file. Is it something I could expect? Regards, -- William
Re: [Qemu-devel] [PATCH] SMI handler should set the CPL to zero and save and restore it on rsm.
Il 15/05/2014 03:32, Kevin O'Connor ha scritto: On Wed, May 14, 2014 at 08:20:59PM -0400, Kevin O'Connor wrote: On Wed, May 14, 2014 at 10:05:47AM +0200, Paolo Bonzini wrote: CPL isn't even altered when CS is reloaded, because you cannot jump out of ring-0 except with an inter-privilege IRET, and that reloads SS too. An IRET or task switch is also the only way to set EFLAGS.VM, and it will hardcode SS.DPL=3, again matching CPL=3. Finally, to get out of real mode you need to have CPL=0, and whatever got you at CPL has also loaded SS with a ring-0 stack. This means that SS.DPL=0 right after clearing CR0.PE. Using SS.DPL as the CPL really sounds like the right approach. I tried it on my KVM testcase, and it works well. For QEMU, even the special case of SYSRET will be handled fine because QEMU does set SS.DPL = 3: cpu_x86_load_seg_cache(env, R_SS, selector + 8, 0, 0x, DESC_G_MASK | DESC_B_MASK | DESC_P_MASK | DESC_S_MASK | (3 DESC_DPL_SHIFT) | DESC_W_MASK | DESC_A_MASK); SS.DPL=CPL=3, SS.RPL=selector 3 is a mix of Intel behavior (which is SS.DPL=SS.RPL=CPL=3) and AMD behavior (because they set CPL=3 but SS.DPL=SS.RPL=selector 3). We may want to match Intel behavior, but that's a different change. Can you check if this patch works for you, and if so reply with Tested-by/Reviewed-by? Your patch causes Freedos to crash when emm386 is loaded, so I think it broke VM86 mode. Below are some logs I took from qemu at the point of the crash. FYI, with the patch below my quick test cases all look okay. --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -974,7 +974,6 @@ static inline void cpu_x86_load_seg_cache(CPUX86State *env, /* update the hidden flags */ { if (seg_reg == R_CS) { -int cpl = selector 3; #ifdef TARGET_X86_64 if ((env-hflags HF_LMA_MASK) (flags DESC_L_MASK)) { /* long mode */ @@ -984,16 +983,17 @@ static inline void cpu_x86_load_seg_cache(CPUX86State *env, #endif { /* legacy / compatibility case */ -if (!(env-cr[0] CR0_PE_MASK)) { -cpl = 0; -} else if (env-eflags VM_MASK) { -cpl = 3; -} new_hflags = (env-segs[R_CS].flags DESC_B_MASK) (DESC_B_SHIFT - HF_CS32_SHIFT); env-hflags = (env-hflags ~(HF_CS32_MASK | HF_CS64_MASK)) | new_hflags; } +} +if (seg_reg == R_SS) { +int cpl = (flags DESC_DPL_SHIFT) 3; +if (env-eflags VM_MASK) { +cpl = 3; +} #if HF_CPL_MASK != 3 #error HF_CPL_MASK is hardcoded #endif Looks like a bug entering VM86 mode. I'll take a look, thanks for confirming what works and what doesn't! Paolo
Re: [Qemu-devel] [PATCH v3] serial-pci: Set prog interface field of pci config to 16550 compatible
On Thu, Feb 27, 2014 at 02:05:05AM +0100, BALATON Zoltan wrote: Signed-off-by: BALATON Zoltan bala...@eik.bme.hu --- This looks sane, a minor comment below (hopefully last). Thanks! v2: resubmission after pc-2.1 is added with the multiport case v3: added compatibility check to avoid changing earlier than pc-2.1 hw/char/serial-pci.c | 11 +++ include/hw/i386/pc.h | 15 +++ 2 files changed, 26 insertions(+) diff --git a/hw/char/serial-pci.c b/hw/char/serial-pci.c index 991c99f..ae57098 100644 --- a/hw/char/serial-pci.c +++ b/hw/char/serial-pci.c @@ -34,6 +34,7 @@ typedef struct PCISerialState { PCIDevice dev; SerialState state; +uint8_t compat; } PCISerialState; typedef struct PCIMultiSerialState { @@ -44,6 +45,7 @@ typedef struct PCIMultiSerialState { SerialState state[PCI_SERIAL_MAX_PORTS]; uint32_t level[PCI_SERIAL_MAX_PORTS]; qemu_irq *irqs; +uint8_t compat; } PCIMultiSerialState; static int serial_pci_init(PCIDevice *dev) This name isn't very informative. I think this should be a prog_if property defaulting to 0x2 and over-written for old machine types, Devices don't care why their prog_if is changed, it's the machine type that cares about compatibility. See commit aa93200b88fb1071eaf21bf766711762ed4630e2 Author: Gabriel L. Somlo gso...@gmail.com Date: Mon May 5 10:52:51 2014 -0400 apic: use emulated lapic version 0x14 on pc machines = 2.1 as an example. Alternatively, create a bit property legacy_prog_if and change field name to compat_flags, have property set flags here. See commit 2af234e61d59f39ae16ba882271e7c4fef2c41c1 Author: Michael S. Tsirkin m...@redhat.com Date: Thu Feb 14 19:11:27 2013 +0200 e1000: unbreak the guest network migration to 1.3 for an example. I think the 1st option is better but second one would be more or less ok. @@ -60,6 +62,9 @@ static int serial_pci_init(PCIDevice *dev) return -1; } +if (!pci-compat) { +pci-dev.config[PCI_CLASS_PROG] = 0x02; /* 16550 compatible */ +} pci-dev.config[PCI_INTERRUPT_PIN] = 0x01; s-irq = pci_allocate_irq(pci-dev); @@ -101,6 +106,9 @@ static int multi_serial_pci_init(PCIDevice *dev) assert(pci-ports 0); assert(pci-ports = PCI_SERIAL_MAX_PORTS); +if (!pci-compat) { +pci-dev.config[PCI_CLASS_PROG] = 0x02; /* 16550 compatible */ +} pci-dev.config[PCI_INTERRUPT_PIN] = 0x01; memory_region_init(pci-iobar, OBJECT(pci), multiserial, 8 * pci-ports); pci_register_bar(pci-dev, 0, PCI_BASE_ADDRESS_SPACE_IO, pci-iobar); @@ -177,12 +185,14 @@ static const VMStateDescription vmstate_pci_multi_serial = { static Property serial_pci_properties[] = { DEFINE_PROP_CHR(chardev, PCISerialState, state.chr), +DEFINE_PROP_UINT8(compat, PCISerialState, compat, 0), DEFINE_PROP_END_OF_LIST(), }; static Property multi_2x_serial_pci_properties[] = { DEFINE_PROP_CHR(chardev1, PCIMultiSerialState, state[0].chr), DEFINE_PROP_CHR(chardev2, PCIMultiSerialState, state[1].chr), +DEFINE_PROP_UINT8(compat, PCIMultiSerialState, compat, 0), DEFINE_PROP_END_OF_LIST(), }; @@ -191,6 +201,7 @@ static Property multi_4x_serial_pci_properties[] = { DEFINE_PROP_CHR(chardev2, PCIMultiSerialState, state[1].chr), DEFINE_PROP_CHR(chardev3, PCIMultiSerialState, state[2].chr), DEFINE_PROP_CHR(chardev4, PCIMultiSerialState, state[3].chr), +DEFINE_PROP_UINT8(compat, PCIMultiSerialState, compat, 0), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 32a7687..8fb8046 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -271,6 +271,21 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *); .driver = apic,\ .property = version,\ .value= stringify(0x11),\ +},\ +{\ +.driver = pci-serial,\ +.property = compat,\ +.value= stringify(1),\ +},\ +{\ +.driver = pci-serial-2x,\ +.property = compat,\ +.value= stringify(1),\ +},\ +{\ +.driver = pci-serial-4x,\ +.property = compat,\ +.value= stringify(1),\ } #define PC_COMPAT_1_7 \ -- 1.8.1.5
Re: [Qemu-devel] [PATCH v3] serial-pci: Set prog interface field of pci config to 16550 compatible
On Do, 2014-02-27 at 02:05 +0100, BALATON Zoltan wrote: ^^ Looks like your clock is _way_ off. +if (!pci-compat) { +pci-dev.config[PCI_CLASS_PROG] = 0x02; /* 16550 compatible */ +} static Property serial_pci_properties[] = { DEFINE_PROP_CHR(chardev, PCISerialState, state.chr), +DEFINE_PROP_UINT8(compat, PCISerialState, compat, 0), DEFINE_PROP_END_OF_LIST(), }; +{\ +.driver = pci-serial,\ +.property = compat,\ +.value= stringify(1),\ +},\ Reviewed-by: Gerd Hoffmann kra...@redhat.com mst, do you take that through the pci tree? cheers, Gerd
Re: [Qemu-devel] [PATCH v3] serial-pci: Set prog interface field of pci config to 16550 compatible
On Thu, May 15, 2014 at 08:42:17AM +0200, Gerd Hoffmann wrote: On Do, 2014-02-27 at 02:05 +0100, BALATON Zoltan wrote: ^^ Looks like your clock is _way_ off. +if (!pci-compat) { +pci-dev.config[PCI_CLASS_PROG] = 0x02; /* 16550 compatible */ +} static Property serial_pci_properties[] = { DEFINE_PROP_CHR(chardev, PCISerialState, state.chr), +DEFINE_PROP_UINT8(compat, PCISerialState, compat, 0), DEFINE_PROP_END_OF_LIST(), }; +{\ +.driver = pci-serial,\ +.property = compat,\ +.value= stringify(1),\ +},\ Reviewed-by: Gerd Hoffmann kra...@redhat.com mst, do you take that through the pci tree? cheers, Gerd Yes but I'd like the property renamed. Agree?
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: There is a need to add some more fields to VirtIODevice that should be migrated (broken status, endianness). The problem is that we do not want to break compatibility while adding a new feature... This issue has been addressed in the generic VMState code with the use of optional subsections. As a *temporary* alternative to port the whole virtio migration code to VMState, this patch mimics a similar subsectionning ability for virtio. BTW Greg, do you plan on working on vmstate for virtio? Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. This calls for a way for devices to inherit properties from the bus, which doesn't exist ATM. Fine but let's not hold up this patchset because of this. No, only suggestion is to add a migration section in the bus, and then it's easier to do this in the post-migrate functions for each device -- so only one new section gets introduced instead of all devices being modified to send a new subsection. Amit
Re: [Qemu-devel] [PATCH] rdma: Fix block during rdma migration
-Original Message- From: Michael R. Hines [mailto:mrhi...@linux.vnet.ibm.com] Sent: Thursday, May 15, 2014 8:44 AM To: Gonglei (Arei); qemu-devel@nongnu.org Cc: Huangweidong (C); quint...@redhat.com; dgilb...@redhat.com; owass...@redhat.com; mrhi...@us.ibm.com; pbonz...@redhat.com; Moyuxiang Subject: Re: [Qemu-devel] [PATCH] rdma: Fix block during rdma migration On 05/09/2014 12:25 PM, Gonglei (Arei) wrote: Hi, -Original Message- From: Michael R. Hines [mailto:mrhi...@linux.vnet.ibm.com] Sent: Tuesday, April 01, 2014 8:42 AM To: Gonglei (Arei); qemu-devel@nongnu.org Cc: Huangweidong (C); quint...@redhat.com; dgilb...@redhat.com; owass...@redhat.com; mrhi...@us.ibm.com; Moyuxiang; pbonz...@redhat.com Subject: Re: [Qemu-devel] [PATCH] rdma: Fix block during rdma migration On 03/29/2014 03:39 PM, arei.gong...@huawei.com wrote: From: Mo Yuxiang moyuxi...@huawei.com If the networking break or there's something wrong with rdma device(ib0 with no IP) during rdma migration, the main_loop of qemu will be blocked in rdma_destroy_id. I add rdma_ack_cm_event to fix this bug. Signed-off-by: Mo Yuxiang moyuxi...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- migration-rdma.c | 1 + 1 file changed, 1 insertion(+) diff --git a/migration-rdma.c b/migration-rdma.c index eeb4302..f60749b 100644 --- a/migration-rdma.c +++ b/migration-rdma.c @@ -949,6 +949,7 @@ route: ERROR(errp, result not equal to event_addr_resolved %s, rdma_event_str(cm_event-event)); perror(rdma_resolve_addr); +rdma_ack_cm_event(cm_event); ret = -EINVAL; goto err_resolve_get_addr; } Reviewed-by: Michael R. Hines mrhi...@us.ibm.com Good catch. =) That's an obvious bug. It looks like I need to do a much better job of kill -9 inside the regression testing scripts - probably i should try killing the migration prematurely at different periods just to be sure there are no more places where the connection state is not getting cleaned up.. - Michael Michael, do you have a plan to pull this patch to master? Thanks. Best regards, -Gonglei Sorry for the late reply, but I'm not the maintainer for migration, that's Juan (I can only signoff on patches like everyone else =). I also have outstanding RDMA patches myself that have not yet been pulled. Would you mind pinging Juan for both of us? Thanks. The patch is Cc'ing Juan, maybe he is very busy. I have post v2 even, but I have not gotten any reply. I have no idea how to do next. Best regards, -Gonglei
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On Thu, 15 May 2014 11:34:25 +0530 Amit Shah amit.s...@redhat.com wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: There is a need to add some more fields to VirtIODevice that should be migrated (broken status, endianness). The problem is that we do not want to break compatibility while adding a new feature... This issue has been addressed in the generic VMState code with the use of optional subsections. As a *temporary* alternative to port the whole virtio migration code to VMState, this patch mimics a similar subsectionning ability for virtio. Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Hmmm... you mean we support migration from a newer QEMU to an older one ? All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. That would be nice if possible. I will have a closer look. Amit Thanks. -- Gregory Kurz kurzg...@fr.ibm.com gk...@linux.vnet.ibm.com Software Engineer @ IBM/Meiosys http://www.ibm.com Tel +33 (0)562 165 496 Anarchy is about taking complete responsibility for yourself. Alan Moore.
Re: [Qemu-devel] usb: usb tablet freeze when save/restore guest os
Hi, Well then, may I post a formal patch for this issue, Gerd? Thanks. I'd like to know what the root cause for the lost interrupt is. Not implementing PIRQ enable could be it, especially as the guest os seems to use it (otherwise your patch would have no effect). The check for the PIRQ enable bit should be in uhci_update_irq though, and you should check the single bit only, not the whole legacy support register. cheers, Gerd
Re: [Qemu-devel] [PATCH] iotests: Use configured python
Max Reitz mre...@redhat.com writes: On 14.05.2014 14:33, Markus Armbruster wrote: Max Reitz mre...@redhat.com writes: Currently, QEMU's iotests rely on /usr/bin/env to start the correct Python (that is, at least Python 2.4, but not 3). On systems where Python 3 is the default, the user has no clean way of making the iotests use the correct binary. This commit makes the iotests use the Python selected by configure. Signed-off-by: Max Reitz mre...@redhat.com I'm afraid this broke qemu-iotests in a separate build tree: ./check: line 38: ./common.env: No such file or directory As I already replied to Peter, I see two (or maybe three) ways to fix this: The first is, we use the correct path to common.env. This would however result in modification of the source tree although this is probably not what the user intends with an out-of-tree build. On the other hand, this would just work. Writing to the source tree works only when the write is exactly the same for every build tree. Example: autoconf writing configure. Modifying files under git-control is right out. The second is, we do not create common.env for out-of-tree builds and add a default common.env to the repository (PYTHON = python should probably suffice). This would not introduce any regressions, however, the iotests would remain problematic on systems with Python 3 being the default when using out-of-tree builds. A difference between an in-tree and an out-of-tree build is a bug. If plain python is good enough for out-of-tree builds, it should be good enough for in-tree builds. As I guess that out-of-tree builds are actually recommended, Correct. this would mean that the better solution might be to revert my patch and instead change every python occurrence in the iotests' Shebangs to python2, as kind of a third way to go. However, for this I'm not sure whether all systems which are supposed to be supported by qemu actually have a python2 executable/symlink. I guess so, but then again... I don't know. Try and find out :) So, either we fix it but try to write to the source tree without the user intending to modify it; or we fix it for in-tree builds only; or we drop the magic and just use python2 in the iotests' Shebangs. The problem is including generated bits, namely results of configure, into source files. The Autoconf way is to substitute placeholders in FOO.in producing FOO. When you want to limit .in contents as much as possible, you factor out the stuff that needs substitutions into some SUB.in, which you then include into FOO. Requires a suitable include mechanism. In shell, builtin source. But then you need to find SUB from FOO. I've seen two ways used: 1. Assume SUB is in the current directory. Link FOO into the build tree to make it so. 2. Require FOO to be run in a way that lets it find its source directory. If whatever runs FOO puts the full path into argv[0], you can use that. Else, require a suitable argument or environment variable.
Re: [Qemu-devel] [PATCH v3] serial-pci: Set prog interface field of pci config to 16550 compatible
static Property serial_pci_properties[] = { DEFINE_PROP_CHR(chardev, PCISerialState, state.chr), +DEFINE_PROP_UINT8(compat, PCISerialState, compat, 0), DEFINE_PROP_END_OF_LIST(), }; mst, do you take that through the pci tree? cheers, Gerd Yes but I'd like the property renamed. Agree? Yes, something more descriptive is reasonable. cheers, Gerd
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On (Thu) 15 May 2014 [08:49:48], Greg Kurz wrote: On Thu, 15 May 2014 11:34:25 +0530 Amit Shah amit.s...@redhat.com wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: There is a need to add some more fields to VirtIODevice that should be migrated (broken status, endianness). The problem is that we do not want to break compatibility while adding a new feature... This issue has been addressed in the generic VMState code with the use of optional subsections. As a *temporary* alternative to port the whole virtio migration code to VMState, this patch mimics a similar subsectionning ability for virtio. Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Hmmm... you mean we support migration from a newer QEMU to an older one ? Well, not really support, since many things don't work anyway, but let's make that easier on downstreams that may want to. All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. That would be nice if possible. I will have a closer look. Thanks! Amit
[Qemu-devel] [PATCH v4 1/2] qemu-iotests: Add data pattern in version3 VMDK sample image in 059
It's possible that we diverge from the specification with our implementation. Having a reference image in the test cases may detect such problems when we introduce a bug that can read what it creates, but can't handle a real VMDK. Signed-off-by: Fam Zheng f...@redhat.com --- tests/qemu-iotests/059 | 4 + tests/qemu-iotests/059.out | 202 - .../sample_images/iotest-version3.vmdk.bz2 | Bin 414 - 4764 bytes 3 files changed, 205 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/059 b/tests/qemu-iotests/059 index 26a2fd3..3c053c2 100755 --- a/tests/qemu-iotests/059 +++ b/tests/qemu-iotests/059 @@ -114,6 +114,10 @@ echo echo === Testing version 3 === _use_sample_img iotest-version3.vmdk.bz2 _img_info +for i in {0..99}; do +$QEMU_IO -r -c read -P $(( i % 10 + 0x30 )) $(( i * 64 * 1024 * 10 + i * 512 )) 512 $TEST_IMG \ +| _filter_qemu_io +done echo echo === Testing 4TB monolithicFlat creation and IO === diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out index eba0ded..0dadba6 100644 --- a/tests/qemu-iotests/059.out +++ b/tests/qemu-iotests/059.out @@ -2056,8 +2056,208 @@ wrote 512/512 bytes at offset 10240 === Testing version 3 === image: TEST_DIR/iotest-version3.IMGFMT file format: IMGFMT -virtual size: 1.0G (1073741824 bytes) +virtual size: 16G (17179869184 bytes) cluster_size: 65536 +read 512/512 bytes at offset 0 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 655872 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 1311744 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 1967616 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 2623488 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 3279360 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 3935232 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 4591104 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 5246976 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 5902848 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 6558720 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 7214592 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 7870464 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 8526336 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 9182208 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 9838080 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 10493952 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 11149824 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 11805696 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 12461568 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 13117440 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 13773312 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 14429184 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 15085056 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 15740928 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 16396800 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 17052672 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 17708544 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 18364416 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 19020288 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 19676160 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 20332032 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 20987904 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 21643776 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 22299648 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512 bytes at offset 22955520 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 512/512
[Qemu-devel] [PATCH v4 0/2] vmdk: Optimize cluster allocation
Patch 01 is new. Patch 02 is V4 with Kevin's comments addressed. Fam Zheng (2): qemu-iotests: Add data pattern in version3 VMDK sample image in 059 vmdk: Optimize cluster allocation block/vmdk.c | 220 + tests/qemu-iotests/059 | 4 + tests/qemu-iotests/059.out | 202 ++- .../sample_images/iotest-version3.vmdk.bz2 | Bin 414 - 4764 bytes 4 files changed, 343 insertions(+), 83 deletions(-) -- 1.9.2
[Qemu-devel] [PATCH v4 2/2] vmdk: Optimize cluster allocation
This drops the unnecessary bdrv_truncate() from, and also improves, cluster allocation code path. Before, when we need a new cluster, get_cluster_offset truncates the image to bdrv_getlength() + cluster_size, and returns the offset of added area, i.e. the image length before truncating. This is not efficient, so it's now rewritten as: - Save the extent file length when opening. - When allocating cluster, use the saved length as cluster offset. - Don't truncate image, because we'll anyway write data there: just write any data at the EOF position, in descending priority: * New user data (cluster allocation happens in a write request). * Filling data in the beginning and/or ending of the new cluster, if not covered by user data: either backing file content (COW), or zero for standalone images. One major benifit of this change is, on host mounted NFS images, even over a fast network, ftruncate is slow (see the example below). This change significantly speeds up cluster allocation. Comparing by converting a cirros image (296M) to VMDK on an NFS mount point, over 1Gbe LAN: $ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk Before: real0m21.796s user0m0.130s sys 0m0.483s After: real0m2.017s user0m0.047s sys 0m0.190s We also get rid of unchecked bdrv_getlength() and bdrv_truncate(), and get a little more documentation in function comments. Tested that this passes qemu-iotests for all VMDK subformats. Signed-off-by: Fam Zheng f...@redhat.com --- V4: Fix L2 table cache and image, and improve according to any other comments. (Kevin) V3: A new implementation following Kevin's suggestion. --- block/vmdk.c | 220 +-- 1 file changed, 138 insertions(+), 82 deletions(-) diff --git a/block/vmdk.c b/block/vmdk.c index 480ea37..796bc01 100644 --- a/block/vmdk.c +++ b/block/vmdk.c @@ -106,6 +106,7 @@ typedef struct VmdkExtent { uint32_t l2_cache_counts[L2_CACHE_SIZE]; int64_t cluster_sectors; +int64_t next_cluster_sector; char *type; } VmdkExtent; @@ -124,7 +125,6 @@ typedef struct BDRVVmdkState { } BDRVVmdkState; typedef struct VmdkMetaData { -uint32_t offset; unsigned int l1_index; unsigned int l2_index; unsigned int l2_offset; @@ -397,6 +397,7 @@ static int vmdk_add_extent(BlockDriverState *bs, { VmdkExtent *extent; BDRVVmdkState *s = bs-opaque; +int64_t ret; if (cluster_sectors 0x20) { /* 0x20 * 512Bytes = 1GB for one cluster is unrealistic */ @@ -428,6 +429,12 @@ static int vmdk_add_extent(BlockDriverState *bs, extent-l2_size = l2_size; extent-cluster_sectors = flat ? sectors : cluster_sectors; +ret = bdrv_getlength(extent-file); +if (ret 0) { +return ret; +} +extent-next_cluster_sector = DIV_ROUND_UP(ret, BDRV_SECTOR_SIZE); + if (s-num_extents 1) { extent-end_sector = (*(extent - 1)).end_sector + extent-sectors; } else { @@ -954,57 +961,96 @@ static int vmdk_refresh_limits(BlockDriverState *bs) return 0; } +/** + * get_whole_cluster + * + * Copy backing file's cluster that covers @sector_num, otherwise write zero, + * to the cluster at @cluster_sector_num. + * + * If @skip_start @skip_end, the relative range [@skip_start, @skip_end) is + * not copied or written, and leave it for call to write user data in the + * request. + */ static int get_whole_cluster(BlockDriverState *bs, -VmdkExtent *extent, -uint64_t cluster_offset, -uint64_t offset, -bool allocate) + VmdkExtent *extent, + uint64_t cluster_sector_num, + uint64_t sector_num, + uint64_t skip_start, uint64_t skip_end) { int ret = VMDK_OK; -uint8_t *whole_grain = NULL; +int64_t cluster_bytes; +uint8_t *whole_grain; + +/* For COW, align request sector_num to cluster start */ +sector_num = QEMU_ALIGN_DOWN(sector_num, extent-cluster_sectors); +cluster_bytes = extent-cluster_sectors BDRV_SECTOR_BITS; +whole_grain = qemu_blockalign(bs, cluster_bytes); + +if (!bs-backing_hd) { +memset(whole_grain, 0, skip_start BDRV_SECTOR_BITS); +memset(whole_grain + (skip_end BDRV_SECTOR_BITS), 0, + cluster_bytes - (skip_end BDRV_SECTOR_BITS)); +} +assert(skip_end = sector_num + extent-cluster_sectors); /* we will be here if it's first write on non-exist grain(cluster). * try to read from parent image, if exist */ -if (bs-backing_hd) { -whole_grain = -qemu_blockalign(bs, extent-cluster_sectors BDRV_SECTOR_BITS); -if (!vmdk_is_cid_valid(bs)) { -ret = VMDK_ERROR; -goto exit; -} +if (bs-backing_hd
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On Thu, 15 May 2014 12:16:35 +0530 Amit Shah amit.s...@redhat.com wrote: On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: There is a need to add some more fields to VirtIODevice that should be migrated (broken status, endianness). The problem is that we do not want to break compatibility while adding a new feature... This issue has been addressed in the generic VMState code with the use of optional subsections. As a *temporary* alternative to port the whole virtio migration code to VMState, this patch mimics a similar subsectionning ability for virtio. BTW Greg, do you plan on working on vmstate for virtio? Yes. Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. This calls for a way for devices to inherit properties from the bus, which doesn't exist ATM. Fine but let's not hold up this patchset because of this. No, only suggestion is to add a migration section in the bus, and then it's easier to do this in the post-migrate functions for each device -- so only one new section gets introduced instead of all devices being modified to send a new subsection. The main problem I see is that virtio sucks: as you see in patch 8, we have to be careful not to call vring or virtqueue stuff before the device knows its endianness or it breaks... I need to study how the virtio-bus gets migrated to ensure the endian section is streamed before the devices. Amit Thanks. -- Gregory Kurz kurzg...@fr.ibm.com gk...@linux.vnet.ibm.com Software Engineer @ IBM/Meiosys http://www.ibm.com Tel +33 (0)562 165 496 Anarchy is about taking complete responsibility for yourself. Alan Moore.
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On Thu, May 15, 2014 at 08:49:48AM +0200, Greg Kurz wrote: On Thu, 15 May 2014 11:34:25 +0530 Amit Shah amit.s...@redhat.com wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: There is a need to add some more fields to VirtIODevice that should be migrated (broken status, endianness). The problem is that we do not want to break compatibility while adding a new feature... This issue has been addressed in the generic VMState code with the use of optional subsections. As a *temporary* alternative to port the whole virtio migration code to VMState, this patch mimics a similar subsectionning ability for virtio. Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Hmmm... you mean we support migration from a newer QEMU to an older one ? In theory yes, that's why we have the -M switch. This area isn't well tested unfortunately, but there are downstreams that test and productize it. All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. That would be nice if possible. I will have a closer look. Amit Thanks. -- Gregory Kurz kurzg...@fr.ibm.com gk...@linux.vnet.ibm.com Software Engineer @ IBM/Meiosys http://www.ibm.com Tel +33 (0)562 165 496 Anarchy is about taking complete responsibility for yourself. Alan Moore.
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On Thu, May 15, 2014 at 12:16:35PM +0530, Amit Shah wrote: On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: There is a need to add some more fields to VirtIODevice that should be migrated (broken status, endianness). The problem is that we do not want to break compatibility while adding a new feature... This issue has been addressed in the generic VMState code with the use of optional subsections. As a *temporary* alternative to port the whole virtio migration code to VMState, this patch mimics a similar subsectionning ability for virtio. BTW Greg, do you plan on working on vmstate for virtio? Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. This calls for a way for devices to inherit properties from the bus, which doesn't exist ATM. Fine but let's not hold up this patchset because of this. No, only suggestion is to add a migration section in the bus, and then it's easier to do this in the post-migrate functions for each device -- so only one new section gets introduced instead of all devices being modified to send a new subsection. Amit I don't mind but the gain isn't very big here. -- MST
Re: [Qemu-devel] [PATCH 0/4] Support more than 255 cpus
Kernel's kvm support is not here. x2APIC is needed, I will try to do that later. On 05/13/2014 06:53 PM, Jan Kiszka wrote: On 2014-05-13 09:09, Li, Zhen-Hua wrote: From: Li, ZhenHua zhen-h...@hp.com These series patches are trying to make Qemu support more than 255 CPUs. The max cpu number changed to 4096. Support more than 255 cpus: ACPI and APIC defines Support more than 255 cpus: max_cpus to 4096 Support more than 255 cpus: max cpumask bit to 4096 Support more than 255 cpus: runtime chec include/hw/acpi/cpu_hotplug_defs.h | 4 ++-- include/hw/i386/apic_internal.h| 2 +- include/hw/i386/pc.h | 2 +- include/sysemu/sysemu.h | 2 +- hw/i386/acpi-build.c | 8 Don't we need x2APIC support to provide 255 CPUs? Where so you enforce this, i.e. keep the restriction to 255 CPUs when we are not in KVM mode with in-kernel APIC (the emulate APIC lacks x2APIC mode, unfortunately)? But, wait, KVM only supports up to 255 VCPUs. So what are you targeting at? Jan
Re: [Qemu-devel] [PATCH 4/4] Support more than 255 cpus: runtime check
Maybe it should be 4 bytes for 4096 (0x1000). On 05/13/2014 04:19 PM, Max Filippov wrote: On Tue, May 13, 2014 at 11:09 AM, Li, Zhen-Hua zhen-h...@hp.com wrote: From: Li, ZhenHua zhen-h...@hp.com There is some runtime check for max cpu count. Make them support 4096 cpus. Signed-off-by: Li, ZhenHua zhen-h...@hp.com --- hw/i386/acpi-build.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index c98df88..5c3bf10 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c [...] @@ -680,7 +680,7 @@ build_append_notify_method(GArray *device, const char *name, for (i = 0; i count; i++) { GArray *target = build_alloc_array(); build_append_nameseg(target, format, i); -assert(i 256); /* Fits in 1 byte */ +assert(i 4096); /* Fits in 1 byte */ The comment is no longer true. Also the function build_append_notify_method is called with format argument set to CP%0.02X, looks like this should be changed to CP%0.03X.
[Qemu-devel] [PATCH] virtio-net: announce self by guest
It's hard to track all mac addresses and their configurations (e.g vlan or ipv6) in qemu. Without those informations, it's impossible to build proper garp packet after migration. The only possible solution to this is let guest (who knew all configurations) to do this. So, this patch introduces a new readonly config status bit of virtio-net, VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce presence of its link through config update interrupt.When guest has done the announcement, it should ack the notification through VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by Linux guest). During load, a counter of announcing rounds were set so that the after the vm is running it can trigger rounds of config interrupts to notify the guest to build and send the correct garps. Tested with ping to guest with vlan during migration. Without the patch, lots of the packets were lost after migration. With the patch, could not notice packet loss after migration. Reference: RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html V7: https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html Changes from RFC v2: - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME - compat self announce for 2.0 machine type Changes from RFC v1: - clean VIRTIO_NET_S_ANNOUNCE bit during reset - free announce timer during clean - make announce work for non-vhost case Changes from V7: - Instead of introducing a global method for each kind of nic, this version limits the changes to virtio-net itself. Cc: Liuyongan liuyon...@huawei.com Cc: Amos Kong ak...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- hw/net/virtio-net.c| 48 include/hw/i386/pc.h |5 include/hw/virtio/virtio-net.h | 16 + 3 files changed, 69 insertions(+), 0 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 940a7cf..98d59e9 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -25,6 +25,7 @@ #include monitor/monitor.h #define VIRTIO_NET_VM_VERSION11 +#define VIRTIO_NET_ANNOUNCE_ROUNDS3 #define MAC_TABLE_ENTRIES64 #define MAX_VLAN(1 12) /* Per 802.1Q definition */ @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status) (n-status VIRTIO_NET_S_LINK_UP) vdev-vm_running; } +static void virtio_net_announce_timer(void *opaque) +{ +VirtIONet *n = opaque; +VirtIODevice *vdev = VIRTIO_DEVICE(n); + +if (n-announce +virtio_net_started(n, vdev-status) +vdev-guest_features (0x1 VIRTIO_NET_F_GUEST_ANNOUNCE) +vdev-guest_features (0x1 VIRTIO_NET_F_CTRL_VQ)) { + +n-announce--; +n-status |= VIRTIO_NET_S_ANNOUNCE; +virtio_notify_config(vdev); +} else { +timer_del(n-announce_timer); +n-announce = 0; +} +} + static void virtio_net_vhost_status(VirtIONet *n, uint8_t status) { VirtIODevice *vdev = VIRTIO_DEVICE(n); @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status) virtio_net_vhost_status(n, status); +virtio_net_announce_timer(n); + for (i = 0; i n-max_queues; i++) { q = n-vqs[i]; @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev) n-nobcast = 0; /* multiqueue is disabled by default */ n-curr_queues = 1; +timer_del(n-announce_timer); +n-announce = 0; +n-status = ~VIRTIO_NET_S_ANNOUNCE; /* Flush any MAC and VLAN filter table state */ n-mac_table.in_use = 0; @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd, return VIRTIO_NET_OK; } +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd, + struct iovec *iov, unsigned int iov_cnt) +{ +if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) { +n-status = ~VIRTIO_NET_S_ANNOUNCE; +if (n-announce) { +timer_mod(n-announce_timer, + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 + + (VIRTIO_NET_ANNOUNCE_ROUNDS - n-announce - 1) * 100); +} +return VIRTIO_NET_OK; +} else { +return VIRTIO_NET_ERR; +} +} + static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd, struct iovec *iov, unsigned int iov_cnt) { @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq) status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt); } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) { status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt); +} else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) { +status = virtio_net_handle_announce(n,
Re: [Qemu-devel] [PATCH v3 21/25] vmdk: implement .bdrv_detach/attach_aio_context()
On Thu, May 15, 2014 at 09:34:20AM +0800, Fam Zheng wrote: On Thu, 05/08 16:34, Stefan Hajnoczi wrote: Implement .bdrv_detach/attach_aio_context() interfaces to propagate detach/attach to BDRVVmdkState-extents[].file. The block layer takes care of -file and -backing_hd but doesn't know about our extents BlockDriverStates, which is also part of the graph. Reviewed-by: Fam Zheng f...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- Noticed this doesn't apply any more due to recent VMDK fixes, because the context is different. Trivial to fix, though. Yep, me too. The resolution is trivial though. Stefan
Re: [Qemu-devel] [PATCH 1/3] throttle: add throttle_detach/attach_aio_context()
On Wed, May 14, 2014 at 05:05:58PM +0200, Benoît Canet wrote: The Wednesday 14 May 2014 à 16:22:45 (+0200), Stefan Hajnoczi wrote : Block I/O throttling uses timers and currently always adds them to the main loop. Throttling will break if bdrv_set_aio_context() is used to move a BlockDriverState to a different AioContext. This patch adds throttle_detach/attach_aio_context() interfaces so the s/so the/to the/ ? Thanks, let me know how you feel about my replies below. If there is no other need to respin then let's let this commit description typo slide. diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h index ab29b0b..b890613 100644 --- a/include/qemu/throttle.h +++ b/include/qemu/throttle.h @@ -67,6 +67,11 @@ typedef struct ThrottleState { Can we make sure this header knows about AioContext in case there is another throttle user not block related ? It already does because qemu-common.h includes qemu-typedefs.h. We get an opaque AioContext. The point of the typedefs.h is to avoid pulling in all the headers when callers just pass around pointers and don't need the actual definition. In this case, I think the split makes sense because callers might pass AioContext without actually using the AioContext API themselves. diff --git a/tests/test-throttle.c b/tests/test-throttle.c index 1d4ffd3..5fa5000 100644 --- a/tests/test-throttle.c +++ b/tests/test-throttle.c @@ -12,8 +12,10 @@ #include glib.h #include math.h +#include block/aio.h And remove this one eventually. This include pulls in the AioContext API because the test case itself manipulates the AioContext.
Re: [Qemu-devel] [PATCH 0/3] throttle: use AioContext for dataplane support
On Wed, May 14, 2014 at 07:40:18PM +0200, Benoît Canet wrote: The Wednesday 14 May 2014 à 16:22:44 (+0200), Stefan Hajnoczi wrote : This series applies on top of my dataplane: use QEMU block layer series. Now that the dataplane code path is using the QEMU block layer we should make I/O throttling limits safe to use. When the block_set_io_throttle monitor command is executed, the BlockDriverState's AioContext must be acquired in order to prevent race conditions with the IOThread that is processing requests from the guest. The new block layer AioContext detach/attach mechanism needs to be extended to move the throttling timer to a new AioContext. This makes throttling work across bdrv_set_aio_context() calls. The result of this series is that I/O throttling works with dataplane and limits may be changed at runtime using the monitor. Stefan Hajnoczi (3): throttle: add throttle_detach/attach_aio_context() throttle: add detach/attach test case blockdev: acquire AioContext in block_set_io_throttle block.c | 7 +++ blockdev.c | 6 ++ include/qemu/throttle.h | 10 ++ tests/test-throttle.c | 49 - util/throttle.c | 27 +++ 5 files changed, 90 insertions(+), 9 deletions(-) -- 1.9.0 I find One thing chocking is this series. I carefully decloupled the throttling code from the block layer so anyone could reuse it. I am under the impression that this series couples it back. The coupling is just due to the current header file layout. It is not a real coupling to the block layer. Throttling has a dependency on timers. Timers are part of an event loop (either AioContext or main loop). The AioContext prototypes happen to be in block/aio.h but they are not a dependency on block.h or BlockDriverState. This means we could extract them and move them to qemu/aiocontext.h in a separate series. Hope this explains the block/aio.h header and shows it's not true coupling. Stefan
Re: [Qemu-devel] [PATCH] iotests: Use configured python
Markus Armbruster arm...@redhat.com writes: [...] The problem is including generated bits, namely results of configure, into source files. The Autoconf way is to substitute placeholders in FOO.in producing FOO. When you want to limit .in contents as much as possible, you factor out the stuff that needs substitutions into some SUB.in, which you then include into FOO. Requires a suitable include mechanism. In shell, builtin source. But then you need to find SUB from FOO. I've seen two ways used: 1. Assume SUB is in the current directory. Link FOO into the build tree to make it so. 2. Require FOO to be run in a way that lets it find its source directory. If whatever runs FOO puts the full path into argv[0], you can use that. Else, require a suitable argument or environment variable. Uh, I left out some obvious details here. Revert the role of FOO and SUB. Generate FOO from FOO.in into the build tree, include the real meat from SUB.
Re: [Qemu-devel] pidfile option in config file
William Dauchy wdau...@gmail.com writes: Hello, I'm using the `-pidfile` option in my qemu command line. Since I'm also using the `-readconfig` option I thought it was a good thing to include the pidfile option in my config file. Unfortunately I did not found any possibility to add such option in my config file. Is it something I could expect? User interface bug: not all options are coded in a way that makes them work with -readconfig. This includes -pidfile. Sorry!
Re: [Qemu-devel] [PATCH] configure: Ensure tests/qemu-iotests exists before writing common.env
On 14 May 2014 23:47, Max Reitz mre...@redhat.com wrote: On 14.05.2014 16:26, Peter Maydell wrote: Before we write common.env to the tests/qemu-iotests directory, ensure that it exists. This fixes out-of-tree builds from clean. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- If somebody would like to review this I'll apply it to master as a buildfix... configure | 1 + 1 file changed, 1 insertion(+) diff --git a/configure b/configure index 6adfa72..c4e43ed 100755 --- a/configure +++ b/configure @@ -4770,6 +4770,7 @@ fi iotests_common_env=tests/qemu-iotests/common.env +mkdir -p $(dirname $iotests_common_env) echo # Automatically generated by configure - do not modify $iotests_common_env echo $iotests_common_env echo PYTHON='$python' $iotests_common_env If we do this, we'd have a commen.env which we cannot use, as all the tests are still in the original source tree. OK, this is busted then. We need to revert your original patch; when we've figured out how to do it properly we can apply a corrected version. I'd rather use $source_path/tests/qemu_iotests/common.env instead, or, if that is unacceptable as it modifies the original source tree (which is probably not desired when doing an out-of-tree build), write common.env only if this is an in-tree build. You're correct, we must not write to the original source tree. We mustn't behave differently on in-tree and out-of-tree builds either. thanks -- PMM
Re: [Qemu-devel] [PATCH 6/7] arm/virt: enable PSCI emulation support for system emulation
On 14 May 2014 23:58, Rob Herring rob.herr...@linaro.org wrote: On Wed, May 14, 2014 at 4:25 PM, Peter Maydell peter.mayd...@linaro.org wrote: An old host kernel, or an old guest kernel? The former is fine, because the KVM CPU init code will just ask for the KVM capability and fill in the ARMCPU field appropriately. For the latter, how are you supposed to determine what the guest kernel can support? Guest kernels and this was exactly my point that you can't determine it. The virt dtb is for the guest kernel and must be either 0.1 PSCI only or both 0.1 and 0.2. I think I misread what you meant. Reading the other thread, as long as you just mean changing the if statement like this, then we are in agreement: if (psci version is 0.1) { qemu_fdt_setprop_string(fdt, /psci, compatible, arm,psci); } else { const char compat[] = arm,psci-0.2\0arm,psci; qemu_fdt_setprop(fdt, /psci, compatible, compat, sizeof(compat)); } Yes, that's all I meant; sorry for the confusion. I hadn't noticed that we announce and support both the 0.1 and 0.2 interfaces in the else {} branch, which is probably why my phrasing was confusing. thanks -- PMM
Re: [Qemu-devel] pidfile option in config file
On Thu, May 15, 2014 at 10:15 AM, Markus Armbruster arm...@redhat.com wrote: User interface bug: not all options are coded in a way that makes them work with -readconfig. This includes -pidfile. Sorry! Thanks for the answer. Maybe I can try to send a patch one day or another. I wanted to make sure there was no reason for this current behavior. Regards, -- William
Re: [Qemu-devel] [PATCH] virtio-net: announce self by guest
Thanks, looks good. Some minor comments below, On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote: It's hard to track all mac addresses and their configurations (e.g vlan or ipv6) in qemu. Without those informations, it's impossible to s/those informations/this information/ build proper garp packet after migration. The only possible solution to this is let guest (who knew all configurations) to do this. s/knew/knows/ So, this patch introduces a new readonly config status bit of virtio-net, VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce presence of its link through config update interrupt.When guest has done the announcement, it should ack the notification through VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by Linux guest). During load, a counter of announcing rounds were set so that the after s/were/is/ s/the after/after/ the vm is running it can trigger rounds of config interrupts to notify the guest to build and send the correct garps. Tested with ping to guest with vlan during migration. Without the patch, lots of the packets were lost after migration. With the patch, could not notice packet loss after migration. below changelog should go after ---, until the ack list. Reference: RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html V7: https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html Changes from RFC v2: - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME - compat self announce for 2.0 machine type Changes from RFC v1: - clean VIRTIO_NET_S_ANNOUNCE bit during reset - free announce timer during clean - make announce work for non-vhost case Changes from V7: - Instead of introducing a global method for each kind of nic, this version limits the changes to virtio-net itself. Cc: Liuyongan liuyon...@huawei.com Cc: Amos Kong ak...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- hw/net/virtio-net.c| 48 include/hw/i386/pc.h |5 include/hw/virtio/virtio-net.h | 16 + 3 files changed, 69 insertions(+), 0 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 940a7cf..98d59e9 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -25,6 +25,7 @@ #include monitor/monitor.h #define VIRTIO_NET_VM_VERSION11 +#define VIRTIO_NET_ANNOUNCE_ROUNDS3 #define MAC_TABLE_ENTRIES64 #define MAX_VLAN(1 12) /* Per 802.1Q definition */ I would make it 5 to be consistent with SELF_ANNOUNCE_ROUNDS in savevm.c in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h and reuse it. @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status) (n-status VIRTIO_NET_S_LINK_UP) vdev-vm_running; } +static void virtio_net_announce_timer(void *opaque) +{ +VirtIONet *n = opaque; +VirtIODevice *vdev = VIRTIO_DEVICE(n); + +if (n-announce I would make it 0 here, just in case it becomes negative as a result of some bug. +virtio_net_started(n, vdev-status) +vdev-guest_features (0x1 VIRTIO_NET_F_GUEST_ANNOUNCE) +vdev-guest_features (0x1 VIRTIO_NET_F_CTRL_VQ)) { + +n-announce--; +n-status |= VIRTIO_NET_S_ANNOUNCE; +virtio_notify_config(vdev); +} else { +timer_del(n-announce_timer); why do this here? +n-announce = 0; why is this here? +} +} + static void virtio_net_vhost_status(VirtIONet *n, uint8_t status) { VirtIODevice *vdev = VIRTIO_DEVICE(n); @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status) virtio_net_vhost_status(n, status); +virtio_net_announce_timer(n); + why do this here? why not right after we set announce counter? for (i = 0; i n-max_queues; i++) { q = n-vqs[i]; @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev) n-nobcast = 0; /* multiqueue is disabled by default */ n-curr_queues = 1; +timer_del(n-announce_timer); +n-announce = 0; +n-status = ~VIRTIO_NET_S_ANNOUNCE; /* Flush any MAC and VLAN filter table state */ n-mac_table.in_use = 0; @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd, return VIRTIO_NET_OK; } +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd, + struct iovec *iov, unsigned int iov_cnt) +{ +if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) { +n-status = ~VIRTIO_NET_S_ANNOUNCE; +if (n-announce) { I would make it 0 here, just in case it becomes negative as a result of some bug. +
Re: [Qemu-devel] [PATCH 0/4] Support more than 255 cpus
On 2014-05-15 09:16, Li, ZhenHua wrote: Kernel's kvm support is not here. x2APIC is needed, I will try to do that later. x2APIC emulation can wait if KVM support is there. But we need at least one of them before starting to think about raising the limit. Jan PS: Please don't top-post. On 05/13/2014 06:53 PM, Jan Kiszka wrote: On 2014-05-13 09:09, Li, Zhen-Hua wrote: From: Li, ZhenHua zhen-h...@hp.com These series patches are trying to make Qemu support more than 255 CPUs. The max cpu number changed to 4096. Support more than 255 cpus: ACPI and APIC defines Support more than 255 cpus: max_cpus to 4096 Support more than 255 cpus: max cpumask bit to 4096 Support more than 255 cpus: runtime chec include/hw/acpi/cpu_hotplug_defs.h | 4 ++-- include/hw/i386/apic_internal.h| 2 +- include/hw/i386/pc.h | 2 +- include/sysemu/sysemu.h | 2 +- hw/i386/acpi-build.c | 8 Don't we need x2APIC support to provide 255 CPUs? Where so you enforce this, i.e. keep the restriction to 255 CPUs when we are not in KVM mode with in-kernel APIC (the emulate APIC lacks x2APIC mode, unfortunately)? But, wait, KVM only supports up to 255 VCPUs. So what are you targeting at? Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH v2 14/23] target-arm: add banked coprocessor register type and macros
On 14 May 2014, at 18:42, Greg Bellows greg.bell...@linaro.orgmailto:greg.bell...@linaro.org wrote: On 13 May 2014 11:15, Fabian Aggeler aggel...@ethz.chmailto:aggel...@ethz.ch wrote: Banked CP registers can be defined with a A32_BANKED_REG macro which defines a non-secure instance of the register followed by an adjacent secure instance. Using a union makes the code backwards-compatible since the non-secure instance can normally be accessed by its existing name. This comment indicates that the 0th entry or the default name is the non-secure bank, which differs from the code below. The code does indeed use the 0th entry as non-secure and 1st entry as secure. Using the union the default name points to the 0th entry. Am I missing something here? When translating a banked CP register access instruction in monitor mode, the SCR.NS bit determines which instance is going to be accessed. If EL3 is operating in Aarch64 state coprocessor registers are not banked anymore but in some cases have its own _EL3 instance. Signed-off-by: Sergey Fedorov s.fedo...@samsung.commailto:s.fedo...@samsung.com Signed-off-by: Fabian Aggeler aggel...@ethz.chmailto:aggel...@ethz.ch --- target-arm/cpu.h | 121 + target-arm/helper.c| 64 -- target-arm/translate.c | 19 +--- 3 files changed, 184 insertions(+), 20 deletions(-) diff --git a/target-arm/cpu.h b/target-arm/cpu.h index a970d55..9e325ac 100644 --- a/target-arm/cpu.h +++ b/target-arm/cpu.h @@ -80,6 +80,16 @@ #define offsetofhigh32(S, M) (offsetof(S, M) + sizeof(uint32_t)) #endif +/* Define a banked coprocessor register state field. Use %name as the + * register field name for the secure instance. The non-secure instance + * has a _nonsecure suffix. Where is the _nonsecure suffix? The above comment appears to be incorrect as the code assumes that the 0th entry as the non-secure bank. That comment is wrong and still reflects an earlier implementation where I didn’t yet use a union to be able to retain the existing name. I will adjust it. Thanks for catching this. + */ +#define A32_BANKED_REG(type, name) \ +union { \ +type name; \ +type name##_banked[2]; \ +} + /* Meanings of the ARMCPU object's two inbound GPIO lines */ #define ARM_CPU_IRQ 0 #define ARM_CPU_FIQ 1 @@ -89,6 +99,7 @@ typedef void ARMWriteCPFunc(void *opaque, int cp_info, typedef uint32_t ARMReadCPFunc(void *opaque, int cp_info, int dstreg, int operand); + struct arm_boot_info; #define NB_MMU_MODES 5 @@ -673,6 +684,78 @@ static inline bool arm_el_is_aa64(CPUARMState *env, int el) return arm_feature(env, ARM_FEATURE_AARCH64); } +/* When EL3 is operating in Aarch32 state, the NS-bit determines + * whether the secure instance of a cp-register should be used. */ +#define USE_SECURE_REG(env) ( \ +arm_feature(env, ARM_FEATURE_SECURITY_EXTENSIONS) \ +!arm_el_is_aa64(env, 3) \ +!((env)-cp15.c1_scr 1/*NS*/)) + +#define NONSECURE_BANK 0 +#define SECURE_BANK 1 + +#define A32_BANKED_REG_GET(env, regname) \ +((USE_SECURE_REG(env)) ? \ +(env)-cp15.regname##_banked[SECURE_BANK] : \ +(env)-cp15.regname) + +#define A32_MAPPED_EL3_REG_GET(env, regname) \ +(((arm_el_is_aa64(env, 3) arm_current_pl(env) == 3) || \ + (USE_SECURE_REG(env))) ? \ +(env)-cp15.regname##_el3 : \ +(env)-cp15.regname##_el1) + +#define A32_BANKED_REG_SET(env, regname, val) \ +do { \ +if (USE_SECURE_REG(env)) { \ +(env)-cp15.regname##_banked[SECURE_BANK] = (val); \ +} else { \ +(env)-cp15.regname = (val); \ +} \ +} while (0) + +#define A32_MAPPED_EL3_REG_SET(env, regname, val) \ +do { \ +if ((arm_el_is_aa64(env, 3) arm_current_pl(env) == 3) || \ +(USE_SECURE_REG(env))) { \ +(env)-cp15.regname##_el3 = (val); \ +} else { \ +(env)-cp15.regname##_el1 = (val); \ +} \ +} while (0) + + +#define A32_BANKED_CURRENT_REG_GET(env, regname) \ +((!arm_el_is_aa64(env, 3) arm_is_secure(env)) ? \ +(env)-cp15.regname##_banked[SECURE_BANK] : \ +(env)-cp15.regname) + +#define A32_MAPPED_EL3_CURRENT_REG_GET(env, regname) \ +(((arm_el_is_aa64(env, 3) arm_current_pl(env) == 3) || \ + (!arm_el_is_aa64(env, 3) arm_is_secure(env))) ? \ +(env)-cp15.regname##_el3 : \ +(env)-cp15.regname##_el1) + +#define A32_BANKED_CURRENT_REG_SET(env, regname, val) \ +do { \ +if (!arm_el_is_aa64(env, 3) arm_is_secure(env)) { \ +(env)-cp15.regname##_banked[SECURE_BANK] = (val); \ +} else { \ +(env)-cp15.regname = (val); \ +} \ +} while (0) + +#define
Re: [Qemu-devel] Qemu stucking
On 15/05/14 06:41, sonia verma wrote: Hi I'm getting below error when trying to boot the KVM with ethernet bridging,kvm support and universel TUN enabled by the following command.. /usr/bin/qemu-system-ppc64 -m 512 -nographic -hda /var/volatile/debian_lenny_ powerpc_standard.qcow2 FWIW I think this image will boot with normal qemu-system-ppc too. cannot manage 'OHCI USB controller' PCI device type 'usb': 106b 3f (c 3 10) = OpenBIOS 1.0 [Aug 28 2012 05:40] Wow, that's fairly old... Configuration device id QEMU version 1 machine id 3 CPUs: 1 Memory: 512M UUID: ---- CPU type PowerPC,970FX Welcome to OpenBIOS v1.0 built on Aug 28 2012 05:40 Second-stage QUIK loader Welcome to quik. mate is good. Debian GNU/Linux PowerPCchosen/bootargs = boot: ` Enter the kernel image name as [device:][partno]/path, where partno is a number from 0 to 16. Instead of /path you can type [mm-nn] to specify a range of disk blocks (512B) boot: Linux initrd imagename = /initrd.img, mem_size: 4406840 initrd_start: Starting at 51, , 1024 OF stdout device is: /pci@f000/mac-io@e/escc@13000/ch-b@13000 command line: console=ttyS0,9600 console=tty0 memory layout at init: alloc_bottom : 00c01000 alloc_top: 2000 alloc_top_hi : 2000 rmo_top : 2000 ram_top : 2000 Looking for displays found display : /pci@f000/QEMU,VGA@c, opening ... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x00c02000 - 0x00c024f0 Device tree struct 0x00c03000 - 0x00c05000 Calling quiesce ... returning from prom_init The prompt is stuck at init and is not able to proceed even after wating for 15-20 minutes. Please help regarding this. In my testing here, I tend to find that some versions of Linux don't particularly like running in -nographic mode and hangs at the point above. Some kernels even get upset running with a 32/15-bit display and must be forced to 8-bit instead :/ Can you try removing -nographic and instead try one or both of the following: -g 800x600x32 -g 800x600x8 And if those don't work, try the same image again but with qemu-system-ppc rather than qemu-system-ppc64. HTH, Mark.
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
Am 15.05.2014 09:04, schrieb Greg Kurz: On Thu, 15 May 2014 12:16:35 +0530 Amit Shah amit.s...@redhat.com wrote: On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Since backwards migration is not supported upstream, wouldn't it be easiest to just add support for the subsection marker and skipping to the end of section in that downstream? All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. IMO this is calling for inversion of control - i.e. let virtio devices call generic load/save functions that then dispatch to device-specific code and let us add common stuff in a central place without forgetting to add calls in some new device. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. This calls for a way for devices to inherit properties from the bus, which doesn't exist ATM. Fine but let's not hold up this patchset because of this. No, only suggestion is to add a migration section in the bus, and then it's easier to do this in the post-migrate functions for each device -- so only one new section gets introduced instead of all devices being modified to send a new subsection. The main problem I see is that virtio sucks: as you see in patch 8, we have to be careful not to call vring or virtqueue stuff before the device knows its endianness or it breaks... I need to study how the virtio-bus gets migrated to ensure the endian section is streamed before the devices. There is no ordering guarantee. The state needs to be migrated in the device or bus where it sits, if post-load processing is required; i.e., if it's in VirtIODevice then something like this series, if it were on VirtioBus exclusively (device asking bus for its endianness each time and does not do post-load stuff) then endianness could be migrated as a new bus section. Not sure if that would help the broken state though? Would touch on Stefan's alias properties for anything but virtio-mmio. Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH] virtio-net: announce self by guest
On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote: Thanks, looks good. Some minor comments below, On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote: It's hard to track all mac addresses and their configurations (e.g vlan or ipv6) in qemu. Without those informations, it's impossible to s/those informations/this information/ build proper garp packet after migration. The only possible solution to this is let guest (who knew all configurations) to do this. s/knew/knows/ So, this patch introduces a new readonly config status bit of virtio-net, VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce presence of its link through config update interrupt.When guest has done the announcement, it should ack the notification through VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by Linux guest). During load, a counter of announcing rounds were set so that the after s/were/is/ s/the after/after/ Will correct those typos. the vm is running it can trigger rounds of config interrupts to notify the guest to build and send the correct garps. Tested with ping to guest with vlan during migration. Without the patch, lots of the packets were lost after migration. With the patch, could not notice packet loss after migration. below changelog should go after ---, until the ack list. Ok. Reference: RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html V7: https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html Changes from RFC v2: - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME - compat self announce for 2.0 machine type Changes from RFC v1: - clean VIRTIO_NET_S_ANNOUNCE bit during reset - free announce timer during clean - make announce work for non-vhost case Changes from V7: - Instead of introducing a global method for each kind of nic, this version limits the changes to virtio-net itself. Cc: Liuyongan liuyon...@huawei.com Cc: Amos Kong ak...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- hw/net/virtio-net.c| 48 include/hw/i386/pc.h |5 include/hw/virtio/virtio-net.h | 16 + 3 files changed, 69 insertions(+), 0 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 940a7cf..98d59e9 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -25,6 +25,7 @@ #include monitor/monitor.h #define VIRTIO_NET_VM_VERSION11 +#define VIRTIO_NET_ANNOUNCE_ROUNDS3 #define MAC_TABLE_ENTRIES64 #define MAX_VLAN(1 12) /* Per 802.1Q definition */ I would make it 5 to be consistent with SELF_ANNOUNCE_ROUNDS in savevm.c in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h and reuse it. Ok. @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status) (n-status VIRTIO_NET_S_LINK_UP) vdev-vm_running; } +static void virtio_net_announce_timer(void *opaque) +{ +VirtIONet *n = opaque; +VirtIODevice *vdev = VIRTIO_DEVICE(n); + +if (n-announce I would make it 0 here, just in case it becomes negative as a result of some bug. Sure. +virtio_net_started(n, vdev-status) +vdev-guest_features (0x1 VIRTIO_NET_F_GUEST_ANNOUNCE) +vdev-guest_features (0x1 VIRTIO_NET_F_CTRL_VQ)) { + +n-announce--; +n-status |= VIRTIO_NET_S_ANNOUNCE; +virtio_notify_config(vdev); +} else { +timer_del(n-announce_timer); why do this here? +n-announce = 0; why is this here? If guest shuts down the device, there's no need to do the announcing. +} +} + static void virtio_net_vhost_status(VirtIONet *n, uint8_t status) { VirtIODevice *vdev = VIRTIO_DEVICE(n); @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status) virtio_net_vhost_status(n, status); +virtio_net_announce_timer(n); + why do this here? why not right after we set announce counter? The reasons are: - The counters were set in load, but the device is not running so we could not inject the interrupt at that time. - We can stop the progress when guest is shutting down the device. for (i = 0; i n-max_queues; i++) { q = n-vqs[i]; @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev) n-nobcast = 0; /* multiqueue is disabled by default */ n-curr_queues = 1; +timer_del(n-announce_timer); +n-announce = 0; +n-status = ~VIRTIO_NET_S_ANNOUNCE; /* Flush any MAC and VLAN filter table state */ n-mac_table.in_use = 0; @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd, return VIRTIO_NET_OK; } +static int
Re: [Qemu-devel] [PATCH v1 00/22] target-arm: Preparations for A64 EL2 and 3
Hi Greg Thanks for your comments. I still have to work through them. I am using OpenVirtualization in secure world, which then switches to a Linux kernel in non-secure world to test the patches. What about you? Best, Fabian On 14 May 2014, at 15:55, Greg Bellows greg.bell...@linaro.orgmailto:greg.bell...@linaro.org wrote: Hi Fabian, I too had been updating the core TZ patches provided by Samsung. From looking at your changes I see a lot of similarities in our code with the exception being the mechanism for banked register support. The difference being that your approach is a bit more explicit in the declaration of the banked registers. Whereas my approach was to update the banked registers once all the other registers were registered. Both approaches I believe work. I spoke with Peter M. and he and I are okay with your approach. I will be looking closer at your patches today and making comments. One thing that held me up from committing sooner was testing my changes. Do you have a good approach for testing the changes? Regards, Greg On 14 May 2014 03:58, Aggeler Fabian aggel...@student.ethz.chmailto:aggel...@student.ethz.ch wrote: I see. What is Greg Bellows working on exactly? Also peripherals like TZASC, TZPC,...? My plan is to focus on them now if no one else is working on them. What do you suggest to minimize overlap? Thanks, Fabian From: Peter Maydell [peter.mayd...@linaro.orgmailto:peter.mayd...@linaro.org] Sent: Monday, May 12, 2014 10:39 PM To: Aggeler Fabian Cc: Edgar E. Iglesias; Rob Herring; Peter Crosthwaite; QEMU Developers; Alexander Graf; John Williams; Alex Bennée; Greg Bellows Subject: Re: [Qemu-devel] [PATCH v1 00/22] target-arm: Preparations for A64 EL2 and 3 On 12 May 2014 20:13, Aggeler Fabian aggel...@student.ethz.chmailto:aggel...@student.ethz.ch wrote: I’ve been reworking the Samsung patches as part of my Master thesis and I wanted to send them some time this week. I am currently rebasing them when I noticed Edgar’s patches. Is there some branch with the patches so I could rebase on them? Hmm, that makes about three lots of people trying to do similar things at this point. We should try to coordinate so we don't duplicate work. thanks -- PMM
Re: [Qemu-devel] [PATCH v2 02/23] target-arm: move SCR into Security Extensions register list
On 14 May 2014, at 16:19, Greg Bellows greg.bell...@linaro.orgmailto:greg.bell...@linaro.org wrote: On 13 May 2014 11:15, Fabian Aggeler aggel...@ethz.chmailto:aggel...@ethz.ch wrote: From: Sergey Fedorov s.fedo...@samsung.commailto:s.fedo...@samsung.com Define a new ARM CP register info list for the Security Extension feature. Register that list only for ARM cores with Security Extension support. Moving SCR into Security Extension register group. Signed-off-by: Sergey Fedorov s.fedo...@samsung.commailto:s.fedo...@samsung.com Signed-off-by: Fabian Aggeler aggel...@ethz.chmailto:aggel...@ethz.ch --- target-arm/helper.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/target-arm/helper.c b/target-arm/helper.c index 3be917c..7898f40 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -768,9 +768,6 @@ static const ARMCPRegInfo v7_cp_reginfo[] = { .access = PL1_RW, .writefn = vbar_write, .fieldoffset = offsetof(CPUARMState, cp15.c12_vbar), .resetvalue = 0 }, -{ .name = SCR, .cp = 15, .crn = 1, .crm = 1, .opc1 = 0, .opc2 = 0, - .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.c1_scr), - .resetvalue = 0, }, { .name = CCSIDR, .state = ARM_CP_STATE_BOTH, .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 0, .access = PL1_R, .readfn = ccsidr_read, .type = ARM_CP_NO_MIGRATE }, @@ -2087,6 +2084,15 @@ static void sctlr_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush(CPU(cpu), 1); } +static const ARMCPRegInfo tz_cp_reginfo[] = { Sticking with the feature name switch from TRUSTZONE to SECURITY, for consistency we should call this security_cp_reginfo. Makes sense. I will change it. +#ifndef CONFIG_USER_ONLY +{ .name = SCR, .cp = 15, .crn = 1, .crm = 1, .opc1 = 0, .opc2 = 0, + .access = PL3_RW, .fieldoffset = offsetof(CPUARMState, cp15.c1_scr), + .resetvalue = 0, }, +#endif +REGINFO_SENTINEL +}; + static CPAccessResult ctr_el0_access(CPUARMState *env, const ARMCPRegInfo *ri) { /* Only accessible in EL0 if SCTLR.UCT is set (and only in AArch64, @@ -2364,6 +2370,9 @@ void register_cp_regs_for_features(ARMCPU *cpu) if (arm_feature(env, ARM_FEATURE_LPAE)) { define_arm_cp_regs(cpu, lpae_cp_reginfo); } +if (arm_feature(env, ARM_FEATURE_SECURITY_EXTENSIONS)) { +define_arm_cp_regs(cpu, tz_cp_reginfo); +} /* Slightly awkwardly, the OMAP and StrongARM cores need all of * cp15 crn=0 to be writes-ignored, whereas for other cores they should * be read-only (ie write causes UNDEF exception). -- 1.8.3.2
Re: [Qemu-devel] [Qemu-ppc] macio ide question/bug report
On 15/05/14 00:21, BALATON Zoltan wrote: Which part is it that's still confusing you? Putting breakpoints on pmac_ide_transfer() and pmac_ide_atapi_transfer_cb() will show you the iterations on each DMA request (be sure to compare against a known good example to understand how it should work first). If you can give more detail as to which bits are confusing, I will try my best to explain them. Looking at the backtrace: #0 ide_atapi_cmd_error (s=0x563cb238, sense_key=5, asc=33) at hw/ide/atapi.c:141 #1 0x556cecf5 in ide_atapi_io_error (s=0x563cb238, ret=-5) at hw/ide/atapi.c:160 #2 0x556d9d01 in pmac_ide_atapi_transfer_cb (opaque=0x563ccc68, ret=-5) at hw/ide/macio.c:64 #3 0x556780d2 in dma_complete (dbs=0x563ab840, ret=-5) at dma-helpers.c:121 #4 0x556781db in dma_bdrv_cb (opaque=0x563ab840, ret=-5) at dma-helpers.c:149 #5 0x55614dd1 in bdrv_co_em_bh (opaque=0x563b5000) at block.c:4602 #6 0x555f8170 in aio_bh_poll (ctx=0x5637fc00) at async.c:81 #7 0x555f7dc9 in aio_poll (ctx=0x5637fc00, blocking=false) at aio-posix.c:188 #8 0x555f8587 in aio_ctx_dispatch (source=0x5637fc00, callback= 0x0, user_data=0x0) at async.c:205 #9 0x778ca6d5 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 #10 0x557a0f42 in glib_pollfds_poll () at main-loop.c:190 #11 0x557a1042 in os_host_main_loop_wait (timeout=0) at main-loop.c:235 #12 0x557a1115 in main_loop_wait (nonblocking=1) at main-loop.c:484 #13 0x55844190 in main_loop () at vl.c:2075 #14 0x5584bc23 in main (argc=30, argv=0x7fffdc88, envp= 0x7fffdd80) at vl.c:4556 shows that pmac_ide_atapi_transfer_cb is called with ret=-5 which is why it fails, so putting a breakpoint there is too late. What I don't understand is where this -5 value comes from. I don't have a known good example because Darwin reads the TOC differently (probably before enabling DMA, I did not trace it more than the logs I've included earlier though) and MorphOS always fails. Have you noticed that the dma_bdrv_*() functions use a function pointer to pmac_ide_atapi_transfer_cb() as their callback function? The dma_bdrv_*() functions operate in a separate thread and then invoke the callback function when finished. The breakpoint you are hitting above is the invocation of pmac_ide_atapi_transfer_cb() as a result of the callback *after* the command has already failed, and not the invocation of pmac_ide_atapi_transfer_cb() which calls dma_bdrv_*() to setup the asynchronous transfer. Hence the DMA has already failed and the -5 value is being returned from the IDE code. I can only guess the reason this works with Darwin is because it is a non-DMA request. If you place a breakpoint on pmac_ide_transfer() then its invocation of pmac_ide_atapi_transfer_cb() should be the first iteration which sets up the DMA request, and the second invocation should be the (failing) callback from the dma_bdrv_*() functions. But as I mentioned before, I think the problem is that these functions shouldn't be called in the first place when requesting a TOC. HTH, Mark.
Re: [Qemu-devel] [PATCH v1 00/22] target-arm: Preparations for A64 EL2 and 3
On 15.05.2014 13:28, Aggeler Fabian wrote: Hi Greg Thanks for your comments. I still have to work through them. I am using OpenVirtualization in secure world, which then switches to a Linux kernel in non-secure world to test the patches. What about you? Best, Fabian Hi, Fabian, are there some secure OS with secure user-space tasks which can be used for testing whether world switching is performed correctly? Regards, Sergey. On 14 May 2014, at 15:55, Greg Bellows greg.bell...@linaro.orgmailto:greg.bell...@linaro.org wrote: Hi Fabian, I too had been updating the core TZ patches provided by Samsung. From looking at your changes I see a lot of similarities in our code with the exception being the mechanism for banked register support. The difference being that your approach is a bit more explicit in the declaration of the banked registers. Whereas my approach was to update the banked registers once all the other registers were registered. Both approaches I believe work. I spoke with Peter M. and he and I are okay with your approach. I will be looking closer at your patches today and making comments. One thing that held me up from committing sooner was testing my changes. Do you have a good approach for testing the changes? Regards, Greg On 14 May 2014 03:58, Aggeler Fabian aggel...@student.ethz.chmailto:aggel...@student.ethz.ch wrote: I see. What is Greg Bellows working on exactly? Also peripherals like TZASC, TZPC,...? My plan is to focus on them now if no one else is working on them. What do you suggest to minimize overlap? Thanks, Fabian From: Peter Maydell [peter.mayd...@linaro.orgmailto:peter.mayd...@linaro.org] Sent: Monday, May 12, 2014 10:39 PM To: Aggeler Fabian Cc: Edgar E. Iglesias; Rob Herring; Peter Crosthwaite; QEMU Developers; Alexander Graf; John Williams; Alex Bennée; Greg Bellows Subject: Re: [Qemu-devel] [PATCH v1 00/22] target-arm: Preparations for A64 EL2 and 3 On 12 May 2014 20:13, Aggeler Fabian aggel...@student.ethz.chmailto:aggel...@student.ethz.ch wrote: I’ve been reworking the Samsung patches as part of my Master thesis and I wanted to send them some time this week. I am currently rebasing them when I noticed Edgar’s patches. Is there some branch with the patches so I could rebase on them? Hmm, that makes about three lots of people trying to do similar things at this point. We should try to coordinate so we don't duplicate work. thanks -- PMM
Re: [Qemu-devel] Qemu stucking
Hi Mark Thanks for the reply. I'll test and let you know the result soon. On Thu, May 15, 2014 at 2:36 PM, Mark Cave-Ayland mark.cave-ayl...@ilande.co.uk wrote: On 15/05/14 06:41, sonia verma wrote: Hi I'm getting below error when trying to boot the KVM with ethernet bridging,kvm support and universel TUN enabled by the following command.. /usr/bin/qemu-system-ppc64 -m 512 -nographic -hda /var/volatile/debian_lenny_ powerpc_standard.qcow2 FWIW I think this image will boot with normal qemu-system-ppc too. cannot manage 'OHCI USB controller' PCI device type 'usb': 106b 3f (c 3 10) = OpenBIOS 1.0 [Aug 28 2012 05:40] Wow, that's fairly old... Configuration device id QEMU version 1 machine id 3 CPUs: 1 Memory: 512M UUID: ---- CPU type PowerPC,970FX Welcome to OpenBIOS v1.0 built on Aug 28 2012 05:40 Second-stage QUIK loader Welcome to quik. mate is good. Debian GNU/Linux PowerPCchosen/bootargs = boot: ` Enter the kernel image name as [device:][partno]/path, where partno is a number from 0 to 16. Instead of /path you can type [mm-nn] to specify a range of disk blocks (512B) boot: Linux initrd imagename = /initrd.img, mem_size: 4406840 initrd_start: Starting at 51, , 1024 OF stdout device is: /pci@f000/mac-io@e/escc@13000/ch-b@13000 command line: console=ttyS0,9600 console=tty0 memory layout at init: alloc_bottom : 00c01000 alloc_top: 2000 alloc_top_hi : 2000 rmo_top : 2000 ram_top : 2000 Looking for displays found display : /pci@f000/QEMU,VGA@c, opening ... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x00c02000 - 0x00c024f0 Device tree struct 0x00c03000 - 0x00c05000 Calling quiesce ... returning from prom_init The prompt is stuck at init and is not able to proceed even after wating for 15-20 minutes. Please help regarding this. In my testing here, I tend to find that some versions of Linux don't particularly like running in -nographic mode and hangs at the point above. Some kernels even get upset running with a 32/15-bit display and must be forced to 8-bit instead :/ Can you try removing -nographic and instead try one or both of the following: -g 800x600x32 -g 800x600x8 And if those don't work, try the same image again but with qemu-system-ppc rather than qemu-system-ppc64. HTH, Mark.
Re: [Qemu-devel] [PATCH] virtio-net: announce self by guest
On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote: On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote: Thanks, looks good. Some minor comments below, On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote: It's hard to track all mac addresses and their configurations (e.g vlan or ipv6) in qemu. Without those informations, it's impossible to s/those informations/this information/ build proper garp packet after migration. The only possible solution to this is let guest (who knew all configurations) to do this. s/knew/knows/ So, this patch introduces a new readonly config status bit of virtio-net, VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce presence of its link through config update interrupt.When guest has done the announcement, it should ack the notification through VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by Linux guest). During load, a counter of announcing rounds were set so that the after s/were/is/ s/the after/after/ Will correct those typos. the vm is running it can trigger rounds of config interrupts to notify the guest to build and send the correct garps. Tested with ping to guest with vlan during migration. Without the patch, lots of the packets were lost after migration. With the patch, could not notice packet loss after migration. below changelog should go after ---, until the ack list. Ok. Reference: RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html V7: https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html Changes from RFC v2: - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME - compat self announce for 2.0 machine type Changes from RFC v1: - clean VIRTIO_NET_S_ANNOUNCE bit during reset - free announce timer during clean - make announce work for non-vhost case Changes from V7: - Instead of introducing a global method for each kind of nic, this version limits the changes to virtio-net itself. Cc: Liuyongan liuyon...@huawei.com Cc: Amos Kong ak...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- hw/net/virtio-net.c| 48 include/hw/i386/pc.h |5 include/hw/virtio/virtio-net.h | 16 + 3 files changed, 69 insertions(+), 0 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 940a7cf..98d59e9 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -25,6 +25,7 @@ #include monitor/monitor.h #define VIRTIO_NET_VM_VERSION11 +#define VIRTIO_NET_ANNOUNCE_ROUNDS3 #define MAC_TABLE_ENTRIES64 #define MAX_VLAN(1 12) /* Per 802.1Q definition */ I would make it 5 to be consistent with SELF_ANNOUNCE_ROUNDS in savevm.c in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h and reuse it. Ok. @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status) (n-status VIRTIO_NET_S_LINK_UP) vdev-vm_running; } +static void virtio_net_announce_timer(void *opaque) +{ +VirtIONet *n = opaque; +VirtIODevice *vdev = VIRTIO_DEVICE(n); + +if (n-announce I would make it 0 here, just in case it becomes negative as a result of some bug. Sure. +virtio_net_started(n, vdev-status) +vdev-guest_features (0x1 VIRTIO_NET_F_GUEST_ANNOUNCE) +vdev-guest_features (0x1 VIRTIO_NET_F_CTRL_VQ)) { + +n-announce--; +n-status |= VIRTIO_NET_S_ANNOUNCE; +virtio_notify_config(vdev); +} else { +timer_del(n-announce_timer); why do this here? +n-announce = 0; why is this here? If guest shuts down the device, there's no need to do the announcing. It's still weird. Either announce is 0 and then timer was not running or it's 0 and then this won't trigger. +} +} + static void virtio_net_vhost_status(VirtIONet *n, uint8_t status) { VirtIODevice *vdev = VIRTIO_DEVICE(n); @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status) virtio_net_vhost_status(n, status); +virtio_net_announce_timer(n); + why do this here? why not right after we set announce counter? The reasons are: - The counters were set in load, but the device is not running so we could not inject the interrupt at that time. I see. This makes sense - but this isn't intuitive. Why don't we simply start timer with current time? Need to make sure it runs fine if time passes, but I think it's fine. - We can stop the progress when guest is shutting down the device. On shut down guest will reset device stopping timer - this seems enough.
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On Thu, May 15, 2014 at 11:20:18AM +0200, Andreas Färber wrote: Am 15.05.2014 09:04, schrieb Greg Kurz: On Thu, 15 May 2014 12:16:35 +0530 Amit Shah amit.s...@redhat.com wrote: On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Since backwards migration is not supported upstream, wouldn't it be easiest to just add support for the subsection marker and skipping to the end of section in that downstream? Backwards and forwards migration need to be supported, customers told us repeatedly. So some downstreams support this and not supporting it upstream just means downstreams need to do their own thing. As importantly, ping-pong migration is the only reliable way to stress migration. So if we want to test cross-version we need it to work both way. Finally, the real issue and difficulty with cross-version migration is making VM behave in a backwards compatible way. Serializing in a compatible way is a trivial problem, or would be if the code wasn't a mess :) Once you do the hard part, breaking migration because of the trivial serialization issue is just silly. And special-casing forward migration does not make code simpler, it really only leads to proliferation of hacks and lack of symmetry. So yes it's a useful feature, and no not supporting it does not help anyway. All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. IMO this is calling for inversion of control - i.e. let virtio devices call generic load/save functions that then dispatch to device-specific code and let us add common stuff in a central place without forgetting to add calls in some new device. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. This calls for a way for devices to inherit properties from the bus, which doesn't exist ATM. Fine but let's not hold up this patchset because of this. No, only suggestion is to add a migration section in the bus, and then it's easier to do this in the post-migrate functions for each device -- so only one new section gets introduced instead of all devices being modified to send a new subsection. The main problem I see is that virtio sucks: as you see in patch 8, we have to be careful not to call vring or virtqueue stuff before the device knows its endianness or it breaks... I need to study how the virtio-bus gets migrated to ensure the endian section is streamed before the devices. There is no ordering guarantee. The state needs to be migrated in the device or bus where it sits, if post-load processing is required; i.e., if it's in VirtIODevice then something like this series, if it were on VirtioBus exclusively (device asking bus for its endianness each time and does not do post-load stuff) then endianness could be migrated as a new bus section. Not sure if that would help the broken state though? Would touch on Stefan's alias properties for anything but virtio-mmio. Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCHv4] block: optimize zero writes with bdrv_write_zeroes
Am 15.05.2014 um 07:16 hat Peter Lieven geschrieben: Am 14.05.2014 13:41, schrieb Kevin Wolf: Am 08.05.2014 um 18:22 hat Peter Lieven geschrieben: this patch tries to optimize zero write requests by automatically using bdrv_write_zeroes if it is supported by the format. This significantly speeds up file system initialization and should speed zero write test used to test backend storage performance. I ran the following 2 tests on my internal SSD with a 50G QCOW2 container and on an attached iSCSI storage. a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX QCOW2 [off] [on] [unmap] - runtime: 14secs1.1secs 1.1secs filesize: 937M 18M 18M iSCSI [off] [on] [unmap] runtime: 9.3s 0.9s 0.9s b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct QCOW2 [off] [on] [unmap] - runtime: 246secs 18secs 18secs filesize: 51G 192K 192K throughput:203M/s2.3G/s 2.3G/s iSCSI*[off] [on] [unmap] runtime: 8mins 45secs 33secs throughput:106M/s1.2G/s 1.6G/s allocated: 100% 100% 0% * The storage was connected via an 1Gbit interface. It seems to internally handle writing zeroes via WRITESAME16 very fast. Signed-off-by: Peter Lieven p...@kamp.de --- v3-v4: - use QAPI generated enum and lookup table [Kevin] - added more details about the options in the comments of the qapi-schema [Eric] - changed the type of detect_zeroes from str to BlockdevDetectZeroesOptions. I left the name as is because it is consistent with e.g. BlockdevDiscardOptions or BlockdevAioOptions [Eric] - changed the parse function in blockdev_init to be generic usable for other enum parameters v2-v3: - moved parameter parsing to blockdev_init - added per device detect_zeroes status to hmp (info block -v) and qmp (query-block) [Eric] - added support to enable detect-zeroes also for hot added devices [Eric] - added missing entry to qemu_common_drive_opts - fixed description of qemu_iovec_is_zero [Fam] v1-v2: - added tests to commit message (Markus) RFCv2-v1: - fixed paramter parsing strncmp - strcmp (Eric) - fixed typo (choosen-chosen) (Eric) - added second example to commit msg RFCv1-RFCv2: - add detect-zeroes=off|on|unmap knob to drive cmdline parameter - call zero detection only for format (bs-file != NULL) block.c | 11 ++ block/qapi.c |1 + blockdev.c| 34 + hmp.c |5 + include/block/block_int.h |1 + include/qemu-common.h |1 + qapi-schema.json | 52 - qemu-options.hx |6 ++ qmp-commands.hx |3 +++ util/iov.c| 21 ++ 10 files changed, 120 insertions(+), 15 deletions(-) diff --git a/block.c b/block.c index b749d31..aea4c77 100644 --- a/block.c +++ b/block.c @@ -3244,6 +3244,17 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs, ret = notifier_with_return_list_notify(bs-before_write_notifiers, req); +if (!ret bs-detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF +!(flags BDRV_REQ_ZERO_WRITE) bs-file +drv-bdrv_co_write_zeroes qemu_iovec_is_zero(qiov)) { Pretty long condition. :-) Looks like most is obviously needed, but I wonder what the bs-file part is good for? That looks rather arbitrary. What I wanted to achieve is that this condition is only true if we handle the format (e.g. raw, qcow2, vmdk etc.). If e.g. qcow2 then sends a zero write this should always end on the disk and should not be optimizable. But why? This means setting an arbitrary policy for no good reason. You already have an option, and it already defaults to off, so unless someone specifically enables it for bs-file, we don't do the optimisation. But if someone wants to have it on bs-file, what reason is there to ignore that request? +flags |= BDRV_REQ_ZERO_WRITE; +/* if the device was not opened with discard=on the below flag + * is immediately cleared again in bdrv_co_do_write_zeroes */ Is it? I only see it being cleared in bdrv_co_write_zeroes(), but that's not a function that seems to be called from here. You are right. Question, do we want to support detect_zeroes = unmap if discard = ignore? If yes, I have to update the docs. Otherwise I have to check for BDRV_O_DISCARD before setting BDRV_REQ_MAY_UNMAP. I think it would be reasonable enough to just error out when you try to
[Qemu-devel] [Bug 1303926] Re: qemu-system-x86_64 crashed with SIGABRT
Hi Serge, I think I have already reported the required information a number of times with the Ubuntu built-in bug reporting facility (apport?), which asked me to report the crash information to developers. Are you able to find it out or do I need to manually open a new bug? Thanks you. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1303926 Title: qemu-system-x86_64 crashed with SIGABRT Status in QEMU: New Status in “qemu” package in Ubuntu: Fix Released Bug description: I've been getting this periodically since my upgrade to qemu 2.0 in trusty this morning. ProblemType: Crash DistroRelease: Ubuntu 14.04 Package: qemu-system-x86 2.0.0~rc1+dfsg-0ubuntu1 ProcVersionSignature: Ubuntu 3.13.0-23.45-generic 3.13.8 Uname: Linux 3.13.0-23-generic x86_64 ApportVersion: 2.14.1-0ubuntu1 Architecture: amd64 Date: Mon Apr 7 13:31:53 2014 ExecutablePath: /usr/bin/qemu-system-x86_64 InstallationDate: Installed on 2013-11-26 (131 days ago) InstallationMedia: Ubuntu 13.10 Saucy Salamander - Release amd64 (20131016.1) ProcEnviron: PATH=(custom, no user) Registers: No symbol table is loaded. Use the file command. Signal: 6 SourcePackage: qemu StacktraceTop: Title: qemu-system-x86_64 crashed with SIGABRT UpgradeStatus: Upgraded to trusty on 2014-01-17 (79 days ago) UserGroups: To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1303926/+subscriptions
Re: [Qemu-devel] [PATCH v2 01/16] x86: Convert conditional compilation of debug printfs to regular ifs
El Tue, 13 May 2014 08:38:26 -0600 Eric Blake ebl...@redhat.com escribió: Although __VA_ARGS__ is required by C99, the use of ##__VA_ARGS__ is a gcc extension; are you sure that all other supported compilers handle it? (I guess that's just clang) If you want something portable to C99, just use one fewer macro argument, so that you are guaranteed that __VA_ARGS__ will be non-empty (that is, subsume fmt into the ...): #define DEBUG(...)\ QEMU_DPRINTF(DEVICE_ASSIGNMENT_DEBUG_ENABLED, \ pci_assign, __VA_ARGS__) I found some problems to convert from ## __VA_ARGS__ to __VA_ARGS__ in some parts, and as I asked in IRC, it was pointed that our HACKING document recommends the fmt, ## __VA_ARGS__ approach, and it obviously works on all the compilers we care about. So I will leave this as it is now. Marc
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
Am 15.05.2014 11:52, schrieb Michael S. Tsirkin: On Thu, May 15, 2014 at 11:20:18AM +0200, Andreas Färber wrote: Am 15.05.2014 09:04, schrieb Greg Kurz: On Thu, 15 May 2014 12:16:35 +0530 Amit Shah amit.s...@redhat.com wrote: On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Since backwards migration is not supported upstream, wouldn't it be easiest to just add support for the subsection marker and skipping to the end of section in that downstream? Backwards and forwards migration need to be supported, customers told us repeatedly. So some downstreams support this and not supporting it upstream just means downstreams need to do their own thing. As importantly, ping-pong migration is the only reliable way to stress migration. So if we want to test cross-version we need it to work both way. Finally, the real issue and difficulty with cross-version migration is making VM behave in a backwards compatible way. Serializing in a compatible way is a trivial problem, or would be if the code wasn't a mess :) Once you do the hard part, breaking migration because of the trivial serialization issue is just silly. And special-casing forward migration does not make code simpler, it really only leads to proliferation of hacks and lack of symmetry. So yes it's a useful feature, and no not supporting it does not help anyway. It seems you misunderstand. I was not saying it's not useful. My point is that VMStateSubsections added in newer versions (and thus not existing in older versions) need to be handled for any VMState-converted devices. So why can't we make that work for virtio too? Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
[Qemu-devel] [PATCH v2 3/8] xics: Disable flags reset on xics reset
Since islsi[] array has been merged into the ICSState struct, we must not reset flags as they tell if the interrupt is in use. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/intc/xics.c | 7 +++ hw/intc/xics_kvm.c | 7 +++ 2 files changed, 14 insertions(+) diff --git a/hw/intc/xics.c b/hw/intc/xics.c index 7065978..83a809e 100644 --- a/hw/intc/xics.c +++ b/hw/intc/xics.c @@ -521,11 +521,18 @@ static void ics_reset(DeviceState *dev) { ICSState *ics = ICS(dev); int i; +uint8_t flags[ics-nr_irqs]; + +for (i = 0; i ics-nr_irqs; i++) { +flags[i] = ics-irqs[i].flags; +} memset(ics-irqs, 0, sizeof(ICSIRQState) * ics-nr_irqs); + for (i = 0; i ics-nr_irqs; i++) { ics-irqs[i].priority = 0xff; ics-irqs[i].saved_priority = 0xff; +ics-irqs[i].flags = flags[i]; } } diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c index 8719a88..6fa2412 100644 --- a/hw/intc/xics_kvm.c +++ b/hw/intc/xics_kvm.c @@ -271,11 +271,18 @@ static void ics_kvm_reset(DeviceState *dev) { ICSState *ics = ICS(dev); int i; +uint8_t flags[ics-nr_irqs]; + +for (i = 0; i ics-nr_irqs; i++) { +flags[i] = ics-irqs[i].flags; +} memset(ics-irqs, 0, sizeof(ICSIRQState) * ics-nr_irqs); + for (i = 0; i ics-nr_irqs; i++) { ics-irqs[i].priority = 0xff; ics-irqs[i].saved_priority = 0xff; +ics-irqs[i].flags = flags[i]; } ics_set_kvm_state(ics, 1); -- 1.9.rc0
Re: [Qemu-devel] Qemu stucking
Hi Mark I tried booting KVM using qemy-system-ppc with your suggesstion but ended up stucking at below logs.. /usr/bin/qemu-system-ppc -m 512 -nographic -hda kvm/debian_lenny_powerpc_standard.qcow2 qemu-system-ppc: pci_add_option_rom: failed to find romfile efi-ne2k_pci.rom set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle = OpenBIOS 1.1 [May 26 2013 13:52] Configuration device id QEMU version 1 machine id 2 CPUs: 1 Memory: 512M UUID: ---- CPU type PowerPC,750 Actually I'm trying to boot KVM from the powerpc board having ubuntu 13.10 on it,so i need to provide the -nographic option. Please help regarding this. On Thu, May 15, 2014 at 3:14 PM, sonia verma soniaverma9...@gmail.comwrote: Hi Mark Thanks for the reply. I'll test and let you know the result soon. On Thu, May 15, 2014 at 2:36 PM, Mark Cave-Ayland mark.cave-ayl...@ilande.co.uk wrote: On 15/05/14 06:41, sonia verma wrote: Hi I'm getting below error when trying to boot the KVM with ethernet bridging,kvm support and universel TUN enabled by the following command.. /usr/bin/qemu-system-ppc64 -m 512 -nographic -hda /var/volatile/debian_lenny_ powerpc_standard.qcow2 FWIW I think this image will boot with normal qemu-system-ppc too. cannot manage 'OHCI USB controller' PCI device type 'usb': 106b 3f (c 3 10) = OpenBIOS 1.0 [Aug 28 2012 05:40] Wow, that's fairly old... Configuration device id QEMU version 1 machine id 3 CPUs: 1 Memory: 512M UUID: ---- CPU type PowerPC,970FX Welcome to OpenBIOS v1.0 built on Aug 28 2012 05:40 Second-stage QUIK loader Welcome to quik. mate is good. Debian GNU/Linux PowerPCchosen/bootargs = boot: ` Enter the kernel image name as [device:][partno]/path, where partno is a number from 0 to 16. Instead of /path you can type [mm-nn] to specify a range of disk blocks (512B) boot: Linux initrd imagename = /initrd.img, mem_size: 4406840 initrd_start: Starting at 51, , 1024 OF stdout device is: /pci@f000/mac-io@e/escc@13000/ch-b@13000 command line: console=ttyS0,9600 console=tty0 memory layout at init: alloc_bottom : 00c01000 alloc_top: 2000 alloc_top_hi : 2000 rmo_top : 2000 ram_top : 2000 Looking for displays found display : /pci@f000/QEMU,VGA@c, opening ... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x00c02000 - 0x00c024f0 Device tree struct 0x00c03000 - 0x00c05000 Calling quiesce ... returning from prom_init The prompt is stuck at init and is not able to proceed even after wating for 15-20 minutes. Please help regarding this. In my testing here, I tend to find that some versions of Linux don't particularly like running in -nographic mode and hangs at the point above. Some kernels even get upset running with a 32/15-bit display and must be forced to 8-bit instead :/ Can you try removing -nographic and instead try one or both of the following: -g 800x600x32 -g 800x600x8 And if those don't work, try the same image again but with qemu-system-ppc rather than qemu-system-ppc64. HTH, Mark.
[Qemu-devel] [PATCH v2 1/8] xics: Add flags for interrupts
The existing interrupt allocation scheme in SPAPR assumes that interrupts are allocated at the start time, continously and the config will not change. However, there are cases when this is not going to work such as: 1. migration - we will have to have an ability to choose interrupt numbers for devices in the command line and this will create gaps in interrupt space. 2. PCI hotplug - interrupts from unplugged device need to be returned back to interrupt pool, otherwise we will quickly run out of interrupts. This replaces a separate lslsi[] array with a byte in the ICSIRQState struct and defines LSI and MSI flags. Neither of these flags set signals that the descriptor is not allocated and not in use. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/intc/xics.c| 23 --- hw/intc/xics_kvm.c| 5 ++--- include/hw/ppc/xics.h | 6 +- 3 files changed, 23 insertions(+), 11 deletions(-) diff --git a/hw/intc/xics.c b/hw/intc/xics.c index 64aabe7..220ca0e 100644 --- a/hw/intc/xics.c +++ b/hw/intc/xics.c @@ -438,7 +438,7 @@ static void ics_set_irq(void *opaque, int srcno, int val) { ICSState *ics = (ICSState *)opaque; -if (ics-islsi[srcno]) { +if (ics-irqs[srcno].flags XICS_FLAGS_IRQ_LSI) { set_irq_lsi(ics, srcno, val); } else { set_irq_msi(ics, srcno, val); @@ -475,7 +475,7 @@ static void ics_write_xive(ICSState *ics, int nr, int server, trace_xics_ics_write_xive(nr, srcno, server, priority); -if (ics-islsi[srcno]) { +if (ics-irqs[srcno].flags XICS_FLAGS_IRQ_LSI) { write_xive_lsi(ics, srcno); } else { write_xive_msi(ics, srcno); @@ -497,7 +497,7 @@ static void ics_resend(ICSState *ics) for (i = 0; i ics-nr_irqs; i++) { /* FIXME: filter by server#? */ -if (ics-islsi[i]) { +if (ics-irqs[i].flags XICS_FLAGS_IRQ_LSI) { resend_lsi(ics, i); } else { resend_msi(ics, i); @@ -512,7 +512,7 @@ static void ics_eoi(ICSState *ics, int nr) trace_xics_ics_eoi(nr); -if (ics-islsi[srcno]) { +if (ics-irqs[srcno].flags XICS_FLAGS_IRQ_LSI) { irq-status = ~XICS_STATUS_SENT; } } @@ -609,7 +609,6 @@ static void ics_realize(DeviceState *dev, Error **errp) return; } ics-irqs = g_malloc0(ics-nr_irqs * sizeof(ICSIRQState)); -ics-islsi = g_malloc0(ics-nr_irqs * sizeof(bool)); ics-qirqs = qemu_allocate_irqs(ics_set_irq, ics, ics-nr_irqs); } @@ -646,11 +645,21 @@ qemu_irq xics_get_qirq(XICSState *icp, int irq) return icp-ics-qirqs[irq - icp-ics-offset]; } +static void ics_set_irq_type(ICSState *ics, int srcno, bool lsi) +{ +assert(!(ics-irqs[srcno].flags XICS_FLAGS_IRQ_MASK)); + +ics-irqs[srcno].flags |= +lsi ? XICS_FLAGS_IRQ_LSI : XICS_FLAGS_IRQ_MSI; +} + void xics_set_irq_type(XICSState *icp, int irq, bool lsi) { -assert(ics_valid_irq(icp-ics, irq)); +ICSState *ics = icp-ics; -icp-ics-islsi[irq - icp-ics-offset] = lsi; +assert(ics_valid_irq(ics, irq)); + +ics_set_irq_type(ics, irq - ics-offset, lsi); } /* diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c index 09476ae..8719a88 100644 --- a/hw/intc/xics_kvm.c +++ b/hw/intc/xics_kvm.c @@ -224,7 +224,7 @@ static int ics_set_kvm_state(ICSState *ics, int version_id) state |= KVM_XICS_MASKED; } -if (ics-islsi[i]) { +if (ics-irqs[i].flags XICS_FLAGS_IRQ_LSI) { state |= KVM_XICS_LEVEL_SENSITIVE; if (irq-status XICS_STATUS_ASSERTED) { state |= KVM_XICS_PENDING; @@ -253,7 +253,7 @@ static void ics_kvm_set_irq(void *opaque, int srcno, int val) int rc; args.irq = srcno + ics-offset; -if (!ics-islsi[srcno]) { +if (ics-irqs[srcno].flags XICS_FLAGS_IRQ_MSI) { if (!val) { return; } @@ -290,7 +290,6 @@ static void ics_kvm_realize(DeviceState *dev, Error **errp) return; } ics-irqs = g_malloc0(ics-nr_irqs * sizeof(ICSIRQState)); -ics-islsi = g_malloc0(ics-nr_irqs * sizeof(bool)); ics-qirqs = qemu_allocate_irqs(ics_kvm_set_irq, ics, ics-nr_irqs); } diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h index 0d7673d..fa8e9c2 100644 --- a/include/hw/ppc/xics.h +++ b/include/hw/ppc/xics.h @@ -136,7 +136,6 @@ struct ICSState { uint32_t nr_irqs; uint32_t offset; qemu_irq *qirqs; -bool *islsi; ICSIRQState *irqs; XICSState *icp; }; @@ -150,6 +149,11 @@ struct ICSIRQState { #define XICS_STATUS_REJECTED 0x4 #define XICS_STATUS_MASKED_PENDING 0x8 uint8_t status; +/* (flags XICS_FLAGS_IRQ_MASK) == 0 means the interrupt is not allocated */ +#define XICS_FLAGS_IRQ_LSI 0x1 +#define XICS_FLAGS_IRQ_MSI 0x2 +#define XICS_FLAGS_IRQ_MASK0x3 +uint8_t flags; }; qemu_irq xics_get_qirq(XICSState *icp, int irq); -- 1.9.rc0
[Qemu-devel] [PATCH v2 5/8] xics: Remove obsolete xics_set_irq_type()
This removes xics_set_irq_type() as it is not used anymore. This is done by a separate patch to make the previous patch look nicer. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/intc/xics.c| 11 --- include/hw/ppc/xics.h | 1 - 2 files changed, 12 deletions(-) diff --git a/hw/intc/xics.c b/hw/intc/xics.c index fdcbb3a..1a8cfd7 100644 --- a/hw/intc/xics.c +++ b/hw/intc/xics.c @@ -678,17 +678,6 @@ static void ics_set_irq_type(ICSState *ics, int srcno, bool lsi) lsi ? XICS_FLAGS_IRQ_LSI : XICS_FLAGS_IRQ_MSI; } -void xics_set_irq_type(XICSState *icp, int irq, bool lsi) -{ -int src = xics_find_source(icp, irq); -ICSState *ics; - -assert(src = 0); - -ics = icp-ics[src]; -ics_set_irq_type(ics, irq - ics-offset, lsi); -} - #define ICS_IRQ_FREE(ics, srcno) \ (!((ics)-irqs[(srcno)].flags (XICS_FLAGS_IRQ_MASK))) diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h index 8e13488..0d8af1b 100644 --- a/include/hw/ppc/xics.h +++ b/include/hw/ppc/xics.h @@ -157,7 +157,6 @@ struct ICSIRQState { }; qemu_irq xics_get_qirq(XICSState *icp, int irq); -void xics_set_irq_type(XICSState *icp, int irq, bool lsi); int xics_alloc(XICSState *icp, int src, int irq_hint, bool lsi); int xics_alloc_block(XICSState *icp, int src, int num, bool lsi, bool align); -- 1.9.rc0
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On Thu, May 15, 2014 at 11:58:25AM +0200, Andreas Färber wrote: Am 15.05.2014 11:52, schrieb Michael S. Tsirkin: On Thu, May 15, 2014 at 11:20:18AM +0200, Andreas Färber wrote: Am 15.05.2014 09:04, schrieb Greg Kurz: On Thu, 15 May 2014 12:16:35 +0530 Amit Shah amit.s...@redhat.com wrote: On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Since backwards migration is not supported upstream, wouldn't it be easiest to just add support for the subsection marker and skipping to the end of section in that downstream? Backwards and forwards migration need to be supported, customers told us repeatedly. So some downstreams support this and not supporting it upstream just means downstreams need to do their own thing. As importantly, ping-pong migration is the only reliable way to stress migration. So if we want to test cross-version we need it to work both way. Finally, the real issue and difficulty with cross-version migration is making VM behave in a backwards compatible way. Serializing in a compatible way is a trivial problem, or would be if the code wasn't a mess :) Once you do the hard part, breaking migration because of the trivial serialization issue is just silly. And special-casing forward migration does not make code simpler, it really only leads to proliferation of hacks and lack of symmetry. So yes it's a useful feature, and no not supporting it does not help anyway. It seems you misunderstand. I was not saying it's not useful. My point is that VMStateSubsections added in newer versions (and thus not existing in older versions) need to be handled for any VMState-converted devices. So why can't we make that work for virtio too? Andreas Sure. AFAIK the way to this works is by adding an field_exists callback, and not sending the section when we are in a compat mode. -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
[Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as XICS used to be unable to reuse interrupts which becomes a problem for dynamic MSI reconfiguration which is happening on guest driver reload or PCI hot (un)plug. Another problem is that PHB has a limit of devices supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that. This makes use of new XICS ability to reuse interrupts. This removes cached MSI configuration from SPAPR PHB so the first IRQ number of a device is stored in MSI/MSIX config space so there is no need to store this anywhere else. From now on, SPAPR PHB only keeps flags telling what type of interrupt for which device it has configured in order to return error if (for example) MSIX was enabled and the guest is trying to disable MSI which it has not enabled. This removes a limit for the maximum number of MSIX-enabled devices per PHB, now XICS and PCI bus capacity are the only limitation. This changes migration stream as it fixes vmstate_spapr_pci_msi::name which was wrong since the beginning. This fixed traces to be more informative. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- In reality either MSIX or MSI is enabled, never both. So I could remove msi/msix bitmaps from this patch, would it make sense? --- hw/ppc/spapr_pci.c | 179 +++- include/hw/pci-host/spapr.h | 11 +-- trace-events| 5 +- 3 files changed, 99 insertions(+), 96 deletions(-) diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index e574014..49e0382 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -220,36 +220,12 @@ static void rtas_write_pci_config(PowerPCCPU *cpu, sPAPREnvironment *spapr, } /* - * Find an entry with config_addr or returns the empty one if not found AND - * alloc_new is set. - * At the moment the msi_table entries are never released so there is - * no point to look till the end of the list if we need to find the free entry. - */ -static int spapr_msicfg_find(sPAPRPHBState *phb, uint32_t config_addr, - bool alloc_new) -{ -int i; - -for (i = 0; i SPAPR_MSIX_MAX_DEVS; ++i) { -if (!phb-msi_table[i].nvec) { -break; -} -if (phb-msi_table[i].config_addr == config_addr) { -return i; -} -} -if ((i SPAPR_MSIX_MAX_DEVS) alloc_new) { -trace_spapr_pci_msi(Allocating new MSI config, i, config_addr); -return i; -} - -return -1; -} - -/* * Set MSI/MSIX message data. * This is required for msi_notify()/msix_notify() which * will write at the addresses via spapr_msi_write(). + * + * If hwaddr == 0, all entries will have .data == first_irq i.e. + * table will be reset. */ static void spapr_msi_setmsg(PCIDevice *pdev, hwaddr addr, bool msix, unsigned first_irq, unsigned req_num) @@ -263,12 +239,51 @@ static void spapr_msi_setmsg(PCIDevice *pdev, hwaddr addr, bool msix, return; } -for (i = 0; i req_num; ++i, ++msg.data) { +for (i = 0; i req_num; ++i) { msix_set_message(pdev, i, msg); trace_spapr_pci_msi_setup(pdev-name, i, msg.address); +if (addr) { +++msg.data; +} } } +static unsigned spapr_msi_get(sPAPRPHBState *phb, PCIDevice *pdev, + unsigned *num, bool *msix) +{ +MSIMessage msg; +unsigned irq = 0; +uint8_t offs = (pci_bus_num(pdev-bus) SPAPR_PCI_BUS_SHIFT) | +PCI_SLOT(pdev-devfn); + +if ((phb-msi[offs] (1 PCI_FUNC(pdev-devfn))) +(phb-msix[offs] (1 PCI_FUNC(pdev-devfn { +error_report(Both MSI and MSIX configured! MSIX will be used.); +} + +if (phb-msix[offs] (1 PCI_FUNC(pdev-devfn))) { +*num = pdev-msix_entries_nr; +if (*num) { +msg = msix_get_message(pdev, 0); +irq = msg.data; +if (msix) { +*msix = true; +} +} +} else if (phb-msi[offs] (1 PCI_FUNC(pdev-devfn))) { +*num = msi_nr_vectors_allocated(pdev); +if (*num) { +msg = msi_get_message(pdev, 0); +irq = msg.data; +if (msix) { +*msix = false; +} +} +} + +return irq; +} + static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr, uint32_t token, uint32_t nargs, target_ulong args, uint32_t nret, @@ -280,9 +295,10 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr, unsigned int req_num = rtas_ld(args, 4); /* 0 == remove all */ unsigned int seq_num = rtas_ld(args, 5); unsigned int ret_intr_type; -int ndev, irq, max_irqs = 0; +unsigned int irq, max_irqs = 0, num = 0, offs; sPAPRPHBState *phb = NULL; PCIDevice *pdev = NULL; +bool msix = false; switch (func) { case
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On Thu, 15 May 2014 11:20:18 +0200 Andreas Färber afaer...@suse.de wrote: Am 15.05.2014 09:04, schrieb Greg Kurz: On Thu, 15 May 2014 12:16:35 +0530 Amit Shah amit.s...@redhat.com wrote: On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Since backwards migration is not supported upstream, wouldn't it be easiest to just add support for the subsection marker and skipping to the end of section in that downstream? Not sure I understand well... Do you suggest to stream the markers first, then the device, then the subsections ? And then there would be a way we can have the subsections restored before the device ? All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. IMO this is calling for inversion of control - i.e. let virtio devices call generic load/save functions that then dispatch to device-specific code and let us add common stuff in a central place without forgetting to add calls in some new device. That makes a lot of sense. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. This calls for a way for devices to inherit properties from the bus, which doesn't exist ATM. Fine but let's not hold up this patchset because of this. No, only suggestion is to add a migration section in the bus, and then it's easier to do this in the post-migrate functions for each device -- so only one new section gets introduced instead of all devices being modified to send a new subsection. The main problem I see is that virtio sucks: as you see in patch 8, we have to be careful not to call vring or virtqueue stuff before the device knows its endianness or it breaks... I need to study how the virtio-bus gets migrated to ensure the endian section is streamed before the devices. There is no ordering guarantee. The state needs to be migrated in the device or bus where it sits, if post-load processing is required; i.e., if it's in VirtIODevice then something like this series, if it were on VirtioBus exclusively (device asking bus for its endianness each time and does not do post-load stuff) then endianness could be migrated as a new bus section. Not sure if that would help the broken state though? IIRW the broken state was proposed as a per-device property... Fam, Do you have plans about the broken property ? Is it still needed ? Would touch on Stefan's alias properties for anything but virtio-mmio. OMG... maybe I should hold on then. Regards, Andreas Thanks ! -- Gregory Kurz kurzg...@fr.ibm.com gk...@linux.vnet.ibm.com Software Engineer @ IBM/Meiosys http://www.ibm.com Tel +33 (0)562 165 496 Anarchy is about taking complete responsibility for yourself. Alan Moore.
Re: [Qemu-devel] [PATCH] rdma: Fix block during rdma migration
Michael R. Hines mrhi...@linux.vnet.ibm.com wrote: On 05/09/2014 12:25 PM, Gonglei (Arei) wrote: Hi, -Original Message- From: Michael R. Hines [mailto:mrhi...@linux.vnet.ibm.com] Sent: Tuesday, April 01, 2014 8:42 AM To: Gonglei (Arei); qemu-devel@nongnu.org Cc: Huangweidong (C); quint...@redhat.com; dgilb...@redhat.com; owass...@redhat.com; mrhi...@us.ibm.com; Moyuxiang; pbonz...@redhat.com Subject: Re: [Qemu-devel] [PATCH] rdma: Fix block during rdma migration On 03/29/2014 03:39 PM, arei.gong...@huawei.com wrote: From: Mo Yuxiang moyuxi...@huawei.com If the networking break or there's something wrong with rdma device(ib0 with no IP) during rdma migration, the main_loop of qemu will be blocked in rdma_destroy_id. I add rdma_ack_cm_event to fix this bug. Signed-off-by: Mo Yuxiang moyuxi...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- migration-rdma.c | 1 + 1 file changed, 1 insertion(+) diff --git a/migration-rdma.c b/migration-rdma.c index eeb4302..f60749b 100644 --- a/migration-rdma.c +++ b/migration-rdma.c @@ -949,6 +949,7 @@ route: ERROR(errp, result not equal to event_addr_resolved %s, rdma_event_str(cm_event-event)); perror(rdma_resolve_addr); +rdma_ack_cm_event(cm_event); ret = -EINVAL; goto err_resolve_get_addr; } Reviewed-by: Michael R. Hines mrhi...@us.ibm.com Good catch. =) That's an obvious bug. It looks like I need to do a much better job of kill -9 inside the regression testing scripts - probably i should try killing the migration prematurely at different periods just to be sure there are no more places where the connection state is not getting cleaned up.. - Michael Michael, do you have a plan to pull this patch to master? Thanks. Best regards, -Gonglei Sorry for the late reply, but I'm not the maintainer for migration, that's Juan (I can only signoff on patches like everyone else =). I also have outstanding RDMA patches myself that have not yet been pulled. Would you mind pinging Juan for both of us? Pointer, please? I was waiting for Michael Reviewed-by from Michael. Later, Juan. - Michael - Michael
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
Am 15.05.2014 12:03, schrieb Michael S. Tsirkin: On Thu, May 15, 2014 at 11:58:25AM +0200, Andreas Färber wrote: Am 15.05.2014 11:52, schrieb Michael S. Tsirkin: On Thu, May 15, 2014 at 11:20:18AM +0200, Andreas Färber wrote: Am 15.05.2014 09:04, schrieb Greg Kurz: On Thu, 15 May 2014 12:16:35 +0530 Amit Shah amit.s...@redhat.com wrote: On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Since backwards migration is not supported upstream, wouldn't it be easiest to just add support for the subsection marker and skipping to the end of section in that downstream? Backwards and forwards migration need to be supported, customers told us repeatedly. So some downstreams support this and not supporting it upstream just means downstreams need to do their own thing. As importantly, ping-pong migration is the only reliable way to stress migration. So if we want to test cross-version we need it to work both way. Finally, the real issue and difficulty with cross-version migration is making VM behave in a backwards compatible way. Serializing in a compatible way is a trivial problem, or would be if the code wasn't a mess :) Once you do the hard part, breaking migration because of the trivial serialization issue is just silly. And special-casing forward migration does not make code simpler, it really only leads to proliferation of hacks and lack of symmetry. So yes it's a useful feature, and no not supporting it does not help anyway. It seems you misunderstand. I was not saying it's not useful. My point is that VMStateSubsections added in newer versions (and thus not existing in older versions) need to be handled for any VMState-converted devices. So why can't we make that work for virtio too? Sure. AFAIK the way to this works is by adding an field_exists callback, and not sending the section when we are in a compat mode. OK, but upstream always sends the latest version today. So isn't that just two ifs that you would need to add in save and load functions added here for the downstream? x86_64 is unaffected from ppc's endianness changes and you still do ppc64 BE, so there's no real functional problem for RHEL, is there? Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
[Qemu-devel] [PATCH v2 2/8] xics: Add xics_find_source()
PAPR allows having multiple interrupt sources such as PHB. This adds a source lookup function and makes use of it. Since at the moment QEMU only supports a single source, no change in behaviour is expected. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/intc/xics.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/hw/intc/xics.c b/hw/intc/xics.c index 220ca0e..7065978 100644 --- a/hw/intc/xics.c +++ b/hw/intc/xics.c @@ -635,14 +635,32 @@ static const TypeInfo ics_info = { /* * Exported functions */ +static int xics_find_source(XICSState *icp, int irq) +{ +int sources = 1; +int src; + +/* FIXME: implement multiple sources */ +for (src = 0; src sources; ++src) { +ICSState *ics = icp-ics[src]; +if (ics_valid_irq(ics, irq)) { +return src; +} +} + +return -1; +} qemu_irq xics_get_qirq(XICSState *icp, int irq) { -if (!ics_valid_irq(icp-ics, irq)) { -return NULL; +int src = xics_find_source(icp, irq); + +if (src = 0) { +ICSState *ics = icp-ics[src]; +return ics-qirqs[irq - ics-offset]; } -return icp-ics-qirqs[irq - icp-ics-offset]; +return NULL; } static void ics_set_irq_type(ICSState *ics, int srcno, bool lsi) @@ -655,10 +673,12 @@ static void ics_set_irq_type(ICSState *ics, int srcno, bool lsi) void xics_set_irq_type(XICSState *icp, int irq, bool lsi) { -ICSState *ics = icp-ics; +int src = xics_find_source(icp, irq); +ICSState *ics; -assert(ics_valid_irq(ics, irq)); +assert(src = 0); +ics = icp-ics[src]; ics_set_irq_type(ics, irq - ics-offset, lsi); } -- 1.9.rc0
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On Thu, May 15, 2014 at 12:08:26PM +0200, Greg Kurz wrote: On Thu, 15 May 2014 11:20:18 +0200 Andreas Färber afaer...@suse.de wrote: Am 15.05.2014 09:04, schrieb Greg Kurz: On Thu, 15 May 2014 12:16:35 +0530 Amit Shah amit.s...@redhat.com wrote: On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Since backwards migration is not supported upstream, wouldn't it be easiest to just add support for the subsection marker and skipping to the end of section in that downstream? Not sure I understand well... Do you suggest to stream the markers first, then the device, then the subsections ? And then there would be a way we can have the subsections restored before the device ? All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. IMO this is calling for inversion of control - i.e. let virtio devices call generic load/save functions that then dispatch to device-specific code and let us add common stuff in a central place without forgetting to add calls in some new device. That makes a lot of sense. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. This calls for a way for devices to inherit properties from the bus, which doesn't exist ATM. Fine but let's not hold up this patchset because of this. No, only suggestion is to add a migration section in the bus, and then it's easier to do this in the post-migrate functions for each device -- so only one new section gets introduced instead of all devices being modified to send a new subsection. The main problem I see is that virtio sucks: as you see in patch 8, we have to be careful not to call vring or virtqueue stuff before the device knows its endianness or it breaks... I need to study how the virtio-bus gets migrated to ensure the endian section is streamed before the devices. There is no ordering guarantee. The state needs to be migrated in the device or bus where it sits, if post-load processing is required; i.e., if it's in VirtIODevice then something like this series, if it were on VirtioBus exclusively (device asking bus for its endianness each time and does not do post-load stuff) then endianness could be migrated as a new bus section. Not sure if that would help the broken state though? IIRW the broken state was proposed as a per-device property... Fam, Do you have plans about the broken property ? Is it still needed ? Would touch on Stefan's alias properties for anything but virtio-mmio. OMG... maybe I should hold on then. No need to wait imho. Can this be made even simpler - call this stuff from virtio_save/virtio_load? Why not? Regards, Andreas Thanks ! -- Gregory Kurz kurzg...@fr.ibm.com gk...@linux.vnet.ibm.com Software Engineer @ IBM/Meiosys http://www.ibm.com Tel +33 (0)562 165 496 Anarchy is about taking complete responsibility for yourself. Alan Moore.
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On Thu, 15 May 2014 12:08:26 +0200 Greg Kurz gk...@linux.vnet.ibm.com wrote: On Thu, 15 May 2014 11:20:18 +0200 Andreas Färber afaer...@suse.de wrote: Am 15.05.2014 09:04, schrieb Greg Kurz: On Thu, 15 May 2014 12:16:35 +0530 Amit Shah amit.s...@redhat.com wrote: On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Since backwards migration is not supported upstream, wouldn't it be easiest to just add support for the subsection marker and skipping to the end of section in that downstream? Not sure I understand well... Do you suggest to stream the markers first, then the device, then the subsections ? And then there would be a way we can have the subsections restored before the device ? Heh just got the clarification thanks to Michael's mail... Sorry for the confusion and noise. :P All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. IMO this is calling for inversion of control - i.e. let virtio devices call generic load/save functions that then dispatch to device-specific code and let us add common stuff in a central place without forgetting to add calls in some new device. That makes a lot of sense. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. This calls for a way for devices to inherit properties from the bus, which doesn't exist ATM. Fine but let's not hold up this patchset because of this. No, only suggestion is to add a migration section in the bus, and then it's easier to do this in the post-migrate functions for each device -- so only one new section gets introduced instead of all devices being modified to send a new subsection. The main problem I see is that virtio sucks: as you see in patch 8, we have to be careful not to call vring or virtqueue stuff before the device knows its endianness or it breaks... I need to study how the virtio-bus gets migrated to ensure the endian section is streamed before the devices. There is no ordering guarantee. The state needs to be migrated in the device or bus where it sits, if post-load processing is required; i.e., if it's in VirtIODevice then something like this series, if it were on VirtioBus exclusively (device asking bus for its endianness each time and does not do post-load stuff) then endianness could be migrated as a new bus section. Not sure if that would help the broken state though? IIRW the broken state was proposed as a per-device property... Fam, Do you have plans about the broken property ? Is it still needed ? Would touch on Stefan's alias properties for anything but virtio-mmio. OMG... maybe I should hold on then. Regards, Andreas Thanks ! -- Gregory Kurz kurzg...@fr.ibm.com gk...@linux.vnet.ibm.com Software Engineer @ IBM/Meiosys http://www.ibm.com Tel +33 (0)562 165 496 Anarchy is about taking complete responsibility for yourself. Alan Moore.
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On Thu, May 15, 2014 at 12:11:12PM +0200, Andreas Färber wrote: Am 15.05.2014 12:03, schrieb Michael S. Tsirkin: On Thu, May 15, 2014 at 11:58:25AM +0200, Andreas Färber wrote: Am 15.05.2014 11:52, schrieb Michael S. Tsirkin: On Thu, May 15, 2014 at 11:20:18AM +0200, Andreas Färber wrote: Am 15.05.2014 09:04, schrieb Greg Kurz: On Thu, 15 May 2014 12:16:35 +0530 Amit Shah amit.s...@redhat.com wrote: On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Since backwards migration is not supported upstream, wouldn't it be easiest to just add support for the subsection marker and skipping to the end of section in that downstream? Backwards and forwards migration need to be supported, customers told us repeatedly. So some downstreams support this and not supporting it upstream just means downstreams need to do their own thing. As importantly, ping-pong migration is the only reliable way to stress migration. So if we want to test cross-version we need it to work both way. Finally, the real issue and difficulty with cross-version migration is making VM behave in a backwards compatible way. Serializing in a compatible way is a trivial problem, or would be if the code wasn't a mess :) Once you do the hard part, breaking migration because of the trivial serialization issue is just silly. And special-casing forward migration does not make code simpler, it really only leads to proliferation of hacks and lack of symmetry. So yes it's a useful feature, and no not supporting it does not help anyway. It seems you misunderstand. I was not saying it's not useful. My point is that VMStateSubsections added in newer versions (and thus not existing in older versions) need to be handled for any VMState-converted devices. So why can't we make that work for virtio too? Sure. AFAIK the way to this works is by adding an field_exists callback, and not sending the section when we are in a compat mode. OK, but upstream always sends the latest version today. So isn't that just two ifs that you would need to add in save and load functions added here for the downstream? x86_64 is unaffected from ppc's endianness changes and you still do ppc64 BE, so there's no real functional problem for RHEL, is there? Andreas I'm sorry I don't understand what you are saying here. Simply put, if you specify a compatible -M then your device should behave, and migrate, exactly like and old qemu did. With *any* change, there's always the temptation to say oh but old behaviour was too buggy we should just ignore it. Seems to make sense in each case but does not scale, you end up with breakages in all version without exception. So this should be a very high bar to overcome. -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] Qemu stucking
On 15/05/14 11:01, sonia verma wrote: Hi Mark I tried booting KVM using qemy-system-ppc with your suggesstion but ended up stucking at below logs.. /usr/bin/qemu-system-ppc -m 512 -nographic -hda kvm/debian_lenny_powerpc_standard.qcow2 qemu-system-ppc: pci_add_option_rom: failed to find romfile efi-ne2k_pci.rom set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle This looks broken :/ I've seen similar errors to this caused by faulty gcc compilers around the 4.7 era breaking OpenBIOS binaries, but I always test the images supplied with stock QEMU to make sure they work for me here before sending a pull request. Perhaps it is something within the KVM code? Does it work if you use standard QEMU? ATB, Mark. P.S. You should also CC qemu-ppc as that's where the PPC users tend to hang out...
[Qemu-devel] [PATCH v2 4/8] spapr: Move interrupt allocator to xics
The current allocator returns IRQ numbers from a pool and does not support IRQs reuse in any form as it did not keep track of what it previously returned, it only keeps the last returned IRQ. Some use cases such as PCI hot(un)plug may require IRQ release and reallocation. This moves an allocator from SPAPR to XICS. This switches IRQ users to use new API. This uses LSI/MSI flags to know if interrupt is allocated. The interrupt release function will be posted as a separate patch. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/intc/xics.c | 88 ++ hw/ppc/spapr.c | 67 -- hw/ppc/spapr_events.c | 2 +- hw/ppc/spapr_pci.c | 6 ++-- hw/ppc/spapr_vio.c | 2 +- include/hw/ppc/spapr.h | 10 -- include/hw/ppc/xics.h | 2 ++ trace-events | 4 +++ 8 files changed, 99 insertions(+), 82 deletions(-) diff --git a/hw/intc/xics.c b/hw/intc/xics.c index 83a809e..fdcbb3a 100644 --- a/hw/intc/xics.c +++ b/hw/intc/xics.c @@ -689,6 +689,94 @@ void xics_set_irq_type(XICSState *icp, int irq, bool lsi) ics_set_irq_type(ics, irq - ics-offset, lsi); } +#define ICS_IRQ_FREE(ics, srcno) \ +(!((ics)-irqs[(srcno)].flags (XICS_FLAGS_IRQ_MASK))) + +static int ics_find_free_block(ICSState *ics, int num, int alignnum) +{ +int first, i; + +for (first = 0; first ics-nr_irqs; first += alignnum) { +if (num (ics-nr_irqs - first)) { +return -1; +} +for (i = first; i first + num; ++i) { +if (!ICS_IRQ_FREE(ics, i)) { +break; +} +} +if (i == (first + num)) { +return first; +} +} + +return -1; +} + +int xics_alloc(XICSState *icp, int src, int irq_hint, bool lsi) +{ +ICSState *ics = icp-ics[src]; +int irq; + +if (irq_hint) { +assert(src == xics_find_source(icp, irq_hint)); +if (!ICS_IRQ_FREE(ics, irq_hint - ics-offset)) { +trace_xics_alloc_failed_hint(src, irq_hint); +return -1; +} +irq = irq_hint; +} else { +irq = ics_find_free_block(ics, 1, 1); +if (irq 0) { +trace_xics_alloc_failed_no_left(src); +return -1; +} +irq += ics-offset; +} + +ics_set_irq_type(ics, irq - ics-offset, lsi); +trace_xics_alloc(src, irq); + +return irq; +} + +/* + * Allocate block of consequtive IRQs, returns a number of the first. + * If align==true, aligns the first IRQ number to num. + */ +int xics_alloc_block(XICSState *icp, int src, int num, bool lsi, bool align) +{ +int i, first = -1; +ICSState *ics = icp-ics[src]; + +assert(src == 0); +/* + * MSIMesage::data is used for storing VIRQ so + * it has to be aligned to num to support multiple + * MSI vectors. MSI-X is not affected by this. + * The hint is used for the first IRQ, the rest should + * be allocated continuously. + */ +if (align) { +assert((num == 1) || (num == 2) || (num == 4) || + (num == 8) || (num == 16) || (num == 32)); +first = ics_find_free_block(ics, num, num); +} else { +first = ics_find_free_block(ics, num, 1); +} + +if (first = 0) { +for (i = first; i first + num; ++i) { +ics_set_irq_type(ics, i, lsi); +} +} +first += ics-offset; + +trace_xics_alloc_block(src, first, num, lsi, align); + +return first; +} + /* * Guest interfaces */ diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index b898a39..fce7e1c 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -88,73 +88,6 @@ sPAPREnvironment *spapr; -int spapr_allocate_irq(int hint, bool lsi) -{ -int irq; - -if (hint) { -irq = hint; -if (hint = spapr-next_irq) { -spapr-next_irq = hint + 1; -} -/* FIXME: we should probably check for collisions somehow */ -} else { -irq = spapr-next_irq++; -} - -/* Configure irq type */ -if (!xics_get_qirq(spapr-icp, irq)) { -return 0; -} - -xics_set_irq_type(spapr-icp, irq, lsi); - -return irq; -} - -/* - * Allocate block of consequtive IRQs, returns a number of the first. - * If msi==true, aligns the first IRQ number to num. - */ -int spapr_allocate_irq_block(int num, bool lsi, bool msi) -{ -int first = -1; -int i, hint = 0; - -/* - * MSIMesage::data is used for storing VIRQ so - * it has to be aligned to num to support multiple - * MSI vectors. MSI-X is not affected by this. - * The hint is used for the first IRQ, the rest should - * be allocated continuously. - */ -if (msi) { -assert((num == 1) || (num == 2) || (num == 4) || - (num == 8) || (num == 16) || (num == 32)); -hint = (spapr-next_irq + num - 1) ~(num - 1); -} - -for (i = 0; i num; ++i) { -int irq; - -irq =
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
On Thu, 15 May 2014 13:12:12 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Thu, May 15, 2014 at 12:08:26PM +0200, Greg Kurz wrote: On Thu, 15 May 2014 11:20:18 +0200 Andreas Färber afaer...@suse.de wrote: Am 15.05.2014 09:04, schrieb Greg Kurz: On Thu, 15 May 2014 12:16:35 +0530 Amit Shah amit.s...@redhat.com wrote: On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Since backwards migration is not supported upstream, wouldn't it be easiest to just add support for the subsection marker and skipping to the end of section in that downstream? Not sure I understand well... Do you suggest to stream the markers first, then the device, then the subsections ? And then there would be a way we can have the subsections restored before the device ? All users of virtio_load()/virtio_save() need to be patched because the subsections are streamed AFTER the device itself. IMO this is calling for inversion of control - i.e. let virtio devices call generic load/save functions that then dispatch to device-specific code and let us add common stuff in a central place without forgetting to add calls in some new device. That makes a lot of sense. Since all have the same fixup, I'm wondering if a new section can be added to the virtio-bus itself, which gets propagated to all devices upon load in the dest. This calls for a way for devices to inherit properties from the bus, which doesn't exist ATM. Fine but let's not hold up this patchset because of this. No, only suggestion is to add a migration section in the bus, and then it's easier to do this in the post-migrate functions for each device -- so only one new section gets introduced instead of all devices being modified to send a new subsection. The main problem I see is that virtio sucks: as you see in patch 8, we have to be careful not to call vring or virtqueue stuff before the device knows its endianness or it breaks... I need to study how the virtio-bus gets migrated to ensure the endian section is streamed before the devices. There is no ordering guarantee. The state needs to be migrated in the device or bus where it sits, if post-load processing is required; i.e., if it's in VirtIODevice then something like this series, if it were on VirtioBus exclusively (device asking bus for its endianness each time and does not do post-load stuff) then endianness could be migrated as a new bus section. Not sure if that would help the broken state though? IIRW the broken state was proposed as a per-device property... Fam, Do you have plans about the broken property ? Is it still needed ? Would touch on Stefan's alias properties for anything but virtio-mmio. OMG... maybe I should hold on then. No need to wait imho. Can this be made even simpler - call this stuff from virtio_save/virtio_load? Andreas already suggested this inversion of control. Why not? No reason indeed. I'll rewrite the code that way ! :) Regards, Andreas Thanks ! -- Gregory Kurz kurzg...@fr.ibm.com gk...@linux.vnet.ibm.com Software Engineer @ IBM/Meiosys http://www.ibm.com Tel +33 (0)562 165 496 Anarchy is about taking complete responsibility for yourself. Alan Moore. Thnaks ! -- Gregory Kurz kurzg...@fr.ibm.com gk...@linux.vnet.ibm.com Software Engineer @ IBM/Meiosys http://www.ibm.com Tel +33 (0)562 165 496 Anarchy is about taking complete responsibility for yourself. Alan Moore.
[Qemu-devel] [PATCH v2 6/8] spapr: Remove @next_irq
This removes @next_irq from sPAPREnvironment which was used in old IRQ allocator as XICS is now responsible for IRQs and keeps track of allocated IRQs. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/ppc/spapr.c | 3 +-- include/hw/ppc/spapr.h | 1 - 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index fce7e1c..d95e9b4 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -767,7 +767,7 @@ static const VMStateDescription vmstate_spapr = { .minimum_version_id = 1, .minimum_version_id_old = 1, .fields = (VMStateField []) { -VMSTATE_UINT32(next_irq, sPAPREnvironment), +VMSTATE_UNUSED(4), /* used to be @next_irq */ /* RTC offset */ VMSTATE_UINT64(rtc_offset, sPAPREnvironment), @@ -1171,7 +1171,6 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) /* Set up Interrupt Controller before we create the VCPUs */ spapr-icp = xics_system_init(smp_cpus * kvmppc_smt_threads() / smp_threads, XICS_IRQS); -spapr-next_irq = XICS_IRQ_BASE; /* init CPUs */ if (cpu_model == NULL) { diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h index feb241a..f8d7326 100644 --- a/include/hw/ppc/spapr.h +++ b/include/hw/ppc/spapr.h @@ -27,7 +27,6 @@ typedef struct sPAPREnvironment { long rtas_size; void *fdt_skel; target_ulong entry_point; -uint32_t next_irq; uint64_t rtc_offset; bool has_graphics; -- 1.9.rc0
[Qemu-devel] [PATCH v2 7/8] xics: Implement xics_ics_free()
This implements interrupt release function so IRQs can be returned back to the pool for reuse in cases such as PCI hot plug. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/intc/xics.c| 25 + include/hw/ppc/xics.h | 1 + trace-events | 2 ++ 3 files changed, 28 insertions(+) diff --git a/hw/intc/xics.c b/hw/intc/xics.c index 1a8cfd7..f6a2066 100644 --- a/hw/intc/xics.c +++ b/hw/intc/xics.c @@ -766,6 +766,31 @@ int xics_alloc_block(XICSState *icp, int src, int num, bool lsi, bool align) return first; } +static void ics_free(ICSState *ics, int srcno, int num) +{ +int i; + +for (i = srcno; i srcno + num; ++i) { +if (ICS_IRQ_FREE(ics, i)) { +trace_xics_ics_free_warn(ics - ics-icp-ics, i + ics-offset); +} +memset(ics-irqs[i], 0, sizeof(ICSIRQState)); +} +} + +void xics_free(XICSState *icp, int src, int irq, int num) +{ +ICSState *ics = icp-ics[src]; + +/* FIXME: implement multiple sources */ +assert(src == 0); + +trace_xics_ics_free(ics - icp-ics, irq, num); +if (src = 0) { +ics_free(ics, irq - ics-offset, num); +} +} + /* * Guest interfaces */ diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h index 0d8af1b..267d045 100644 --- a/include/hw/ppc/xics.h +++ b/include/hw/ppc/xics.h @@ -159,6 +159,7 @@ struct ICSIRQState { qemu_irq xics_get_qirq(XICSState *icp, int irq); int xics_alloc(XICSState *icp, int src, int irq_hint, bool lsi); int xics_alloc_block(XICSState *icp, int src, int num, bool lsi, bool align); +void xics_free(XICSState *icp, int src, int irq, int num); void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu); diff --git a/trace-events b/trace-events index d8e83cc..0a66c71 100644 --- a/trace-events +++ b/trace-events @@ -1182,6 +1182,8 @@ xics_alloc(int src, int irq) source#%d, irq %d xics_alloc_failed_hint(int src, int irq) source#%d, irq %d is already in use xics_alloc_failed_no_left(int src) source#%d, no irq left xics_alloc_block(int src, int first, int num, bool lsi, int align) source#%d, first irq %d, %d irqs, lsi=%d, alignnum %d +xics_ics_free(int src, int irq, int num) Source#%d, first irq %d, %d irqs +xics_ics_free_warn(int src, int irq) Source#%d, irq %d is already free # hw/ppc/spapr_iommu.c spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) liobn=%PRIx64 ioba=0x%PRIx64 tce=0x%PRIx64 ret=%PRId64 -- 1.9.rc0
[Qemu-devel] [PATCH v2 0/8] Move interrupts from spapr to xics
This moves interrupts allocation business from SPAPR to XICS and makes use of it. Changes: v2: * s/server/source/ * fixed typos, code style, added an assert * added patch for spapr_pci for better IRQ reuse for MSI/MSIX There is just one source at the moment. We might create one per PHB and one per VIO device or VIO bus but I do not see any immediate profit from it. Makes any sense? Please comment. Thanks! Alexey Kardashevskiy (8): xics: Add flags for interrupts xics: Add xics_find_source() xics: Disable flags reset on xics reset spapr: Move interrupt allocator to xics xics: Remove obsolete xics_set_irq_type() spapr: Remove @next_irq xics: Implement xics_ics_free() spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB hw/intc/xics.c | 160 --- hw/intc/xics_kvm.c | 12 ++- hw/ppc/spapr.c | 70 + hw/ppc/spapr_events.c | 2 +- hw/ppc/spapr_pci.c | 181 +++- hw/ppc/spapr_vio.c | 2 +- include/hw/pci-host/spapr.h | 11 +-- include/hw/ppc/spapr.h | 11 --- include/hw/ppc/xics.h | 10 ++- trace-events| 11 ++- 10 files changed, 275 insertions(+), 195 deletions(-) -- 1.9.rc0
Re: [Qemu-devel] Qemu stucking
Hi Mark The gcc version I'm using is 4.8.1 . It is not working with the standard Qemu. On Thu, May 15, 2014 at 3:46 PM, Mark Cave-Ayland mark.cave-ayl...@ilande.co.uk wrote: On 15/05/14 11:01, sonia verma wrote: Hi Mark I tried booting KVM using qemy-system-ppc with your suggesstion but ended up stucking at below logs.. /usr/bin/qemu-system-ppc -m 512 -nographic -hda kvm/debian_lenny_powerpc_standard.qcow2 qemu-system-ppc: pci_add_option_rom: failed to find romfile efi-ne2k_pci.rom set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle set_property: NULL phandle This looks broken :/ I've seen similar errors to this caused by faulty gcc compilers around the 4.7 era breaking OpenBIOS binaries, but I always test the images supplied with stock QEMU to make sure they work for me here before sending a pull request. Perhaps it is something within the KVM code? Does it work if you use standard QEMU? ATB, Mark. P.S. You should also CC qemu-ppc as that's where the PPC users tend to hang out...
Re: [Qemu-devel] Qemu stucking
On 15/05/14 11:50, sonia verma wrote: Hi Mark The gcc version I'm using is 4.8.1 . It is not working with the standard Qemu. Unfortunately if it doesn't work with standard QEMU then it sounds as if there is something wrong with either your OpenBIOS binary or build environment. The debian_lenny_powerpc_standard.qcow2 image boots fine for me (albeit not with -nographic) using a fresh build from git. I would suggest trying the same, using the pre-built OpenBIOS images and see if that works. It may also be worth raising a bug report with your distro if they are shipping corrupt binaries. Kind regards, Mark.
Re: [Qemu-devel] Qemu stucking
Hi Mark Thanks for the information.It will help me alot. I'll let you know if any further issues. On Thu, May 15, 2014 at 4:32 PM, Mark Cave-Ayland mark.cave-ayl...@ilande.co.uk wrote: On 15/05/14 11:50, sonia verma wrote: Hi Mark The gcc version I'm using is 4.8.1 . It is not working with the standard Qemu. Unfortunately if it doesn't work with standard QEMU then it sounds as if there is something wrong with either your OpenBIOS binary or build environment. The debian_lenny_powerpc_standard.qcow2 image boots fine for me (albeit not with -nographic) using a fresh build from git. I would suggest trying the same, using the pre-built OpenBIOS images and see if that works. It may also be worth raising a bug report with your distro if they are shipping corrupt binaries. Kind regards, Mark.
[Qemu-devel] [PATCH v2 1/2] block: Move declaration of bdrv_get_aio_context to block.h
block_int.h is for block layer and block drivers, other code shouldn't include it. But similar to bdrv_set_aio_context, bdrv_get_aio_context should also be accessible from outside of block layer. Move it. Signed-off-by: Fam Zheng f...@redhat.com --- include/block/block.h | 7 +++ include/block/block_int.h | 7 --- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index d223e33..6676d1f 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -534,6 +534,13 @@ int bdrv_debug_resume(BlockDriverState *bs, const char *tag); bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag); /** + * bdrv_get_aio_context: + * + * Returns: the currently bound #AioContext + */ +AioContext *bdrv_get_aio_context(BlockDriverState *bs); + +/** * bdrv_set_aio_context: * * Changes the #AioContext used for fd handlers, timers, and BHs by this diff --git a/include/block/block_int.h b/include/block/block_int.h index 47c79b3..a9301d5 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -397,13 +397,6 @@ void bdrv_add_before_write_notifier(BlockDriverState *bs, NotifierWithReturn *notifier); /** - * bdrv_get_aio_context: - * - * Returns: the currently bound #AioContext - */ -AioContext *bdrv_get_aio_context(BlockDriverState *bs); - -/** * bdrv_detach_aio_context: * * May be called from .bdrv_detach_aio_context() to detach children from the -- 1.9.2
[Qemu-devel] [PATCH v2 0/2] dataplane: Enable config-wce
This applies on top of Stefan's dataplane series. v2: Patch 1 moves the declaration of bdrv_get_aio_context to block.h. Patch 2 is unchanged except for the dropped #include. Fam Zheng (2): block: Move declaration of bdrv_get_aio_context to block.h virtio-blk: Allow config-wce in dataplane hw/block/dataplane/virtio-blk.c | 6 -- hw/block/virtio-blk.c | 8 +++- include/block/block.h | 7 +++ include/block/block_int.h | 7 --- 4 files changed, 14 insertions(+), 14 deletions(-) -- 1.9.2
[Qemu-devel] [PATCH v2 2/2] virtio-blk: Allow config-wce in dataplane
Dataplane now uses block layer. Protect bdrv_set_enable_write_cache with aio_context_acquire and aio_context_release, so we can enable config-wce to allow guest to modify the write cache online. Signed-off-by: Fam Zheng f...@redhat.com --- hw/block/dataplane/virtio-blk.c | 6 -- hw/block/virtio-blk.c | 8 +++- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c index 79fb612..46a6824 100644 --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -332,12 +332,6 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *blk, return; } -if (blk-config_wce) { -error_setg(errp, device is incompatible with x-data-plane, - use config-wce=off); -return; -} - /* If dataplane is (re-)enabled while the guest is running there could be * block jobs that can conflict. */ diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 8a568e5..5e9433d 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -523,7 +523,10 @@ static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config) struct virtio_blk_config blkcfg; memcpy(blkcfg, config, sizeof(blkcfg)); + +aio_context_acquire(bdrv_get_aio_context(s-bs)); bdrv_set_enable_write_cache(s-bs, blkcfg.wce != 0); +aio_context_release(bdrv_get_aio_context(s-bs)); } static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features) @@ -582,7 +585,10 @@ static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status) * s-bs would erroneously be placed in writethrough mode. */ if (!(features (1 VIRTIO_BLK_F_CONFIG_WCE))) { -bdrv_set_enable_write_cache(s-bs, !!(features (1 VIRTIO_BLK_F_WCE))); +aio_context_acquire(bdrv_get_aio_context(s-bs)); +bdrv_set_enable_write_cache(s-bs, +!!(features (1 VIRTIO_BLK_F_WCE))); +aio_context_release(bdrv_get_aio_context(s-bs)); } } -- 1.9.2
Re: [Qemu-devel] usb: usb tablet freeze when save/restore guest os
-Original Message- From: Gerd Hoffmann [mailto:kra...@redhat.com] Sent: Thursday, May 15, 2014 2:50 PM To: Gonglei (Arei) Cc: Paolo Bonzini; qemu-devel@nongnu.org; Huangweidong (C); Michael S. Tsirkin Subject: Re: usb: usb tablet freeze when save/restore guest os Hi, Well then, may I post a formal patch for this issue, Gerd? Thanks. I'd like to know what the root cause for the lost interrupt is. Hi, Gerd. I must clarify the scene of this issue. 1)The problem occurred on Xen platform. the process of hibernate vm on Xen: suspend vm (pause all vcpus) call xc_save to save memory stop qemu (vm_stop) save qemu destroy vm 2)The process of hibernate vm on KVM: vm_stop do_vm_stop pause_all_vcpus runstate_set(state); //change the runstate to paused 3)the usb tablet backtrace: vnc_client_read protocol_client_msg pointer_event qemu_input_event_sync in qemu_input_event_sync(), there is a check vm runstate, as below: if (!runstate_is_running() !runstate_check(RUN_STATE_SUSPENDED)) { return; } On KVM platform, the usb tablet event will be return at there. As for Xen, It's too later. After suspend vm, the qemu process can response the event of usb tablet event. Because guest os's vcpus are paused, guest os cannot response interrupt injected by qemu. Then the interrupt will be lost. Not implementing PIRQ enable could be it, especially as the guest os seems to use it (otherwise your patch would have no effect). The check for the PIRQ enable bit should be in uhci_update_irq though, and you should check the single bit only, not the whole legacy support register. Agreed. Thanks. cheers, Gerd Best regards, -Gonglei
[Qemu-devel] [PATCH v2] spapr: Add ibm, chip-id property in device tree
This adds a ibm,chip-id property for CPU nodes which should be the same for all cores in the same CPU socket. The recent guest kernels use this information to associate threads with sockets. Refer to the kernel commit 256f2d4b463d3030ebc8d2b54f427543814a2bdc for more details. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- Changes: v2: * always put ibm,chip to the device tree * removed from migration as it is the user's responsibility to run QEMU on both sides with the same CPU configuration --- hw/ppc/spapr.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index b898a39..166c1c6 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -313,6 +313,9 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base, uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)}; int i, smt = kvmppc_smt_threads(); unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80}; +QemuOpts *opts = qemu_opts_find(qemu_find_opts(smp-opts), NULL); +unsigned sockets = opts ? qemu_opt_get_number(opts, sockets, 0) : 0; +uint32_t cpus_per_socket = sockets ? (smp_cpus / sockets) : 1; fdt = g_malloc0(FDT_MAX_SIZE); _FDT((fdt_create(fdt, FDT_MAX_SIZE))); @@ -470,6 +473,9 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base, page_sizes_prop, page_sizes_prop_size))); } +_FDT((fdt_property_cell(fdt, ibm,chip-id, +cs-cpu_index / cpus_per_socket))); + _FDT((fdt_end_node(fdt))); } -- 1.9.rc0
Re: [Qemu-devel] [PATCH 1/3] throttle: add throttle_detach/attach_aio_context()
The Wednesday 14 May 2014 à 16:22:45 (+0200), Stefan Hajnoczi wrote : Block I/O throttling uses timers and currently always adds them to the main loop. Throttling will break if bdrv_set_aio_context() is used to move a BlockDriverState to a different AioContext. This patch adds throttle_detach/attach_aio_context() interfaces so the throttling timers and uses them to move timers to the new AioContext. Note that bdrv_set_aio_context() already drains all requests so we're sure no throttled requests are pending. The test cases need to be updated since the throttle_init() interface has changed. Cc: Benoît Canet benoit.ca...@irqsave.net Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block.c | 7 +++ include/qemu/throttle.h | 10 ++ tests/test-throttle.c | 25 - util/throttle.c | 27 +++ 4 files changed, 60 insertions(+), 9 deletions(-) diff --git a/block.c b/block.c index 189fc0d..b30bcd5 100644 --- a/block.c +++ b/block.c @@ -179,6 +179,7 @@ void bdrv_io_limits_enable(BlockDriverState *bs) { assert(!bs-io_limits_enabled); throttle_init(bs-throttle_state, + bdrv_get_aio_context(bs), QEMU_CLOCK_VIRTUAL, bdrv_throttle_read_timer_cb, bdrv_throttle_write_timer_cb, @@ -5532,6 +5533,9 @@ void bdrv_detach_aio_context(BlockDriverState *bs) return; } +if (bs-io_limits_enabled) { +throttle_detach_aio_context(bs-throttle_state); +} if (bs-drv-bdrv_detach_aio_context) { bs-drv-bdrv_detach_aio_context(bs); } @@ -5563,6 +5567,9 @@ void bdrv_attach_aio_context(BlockDriverState *bs, if (bs-drv-bdrv_attach_aio_context) { bs-drv-bdrv_attach_aio_context(bs, new_context); } +if (bs-io_limits_enabled) { +throttle_attach_aio_context(bs-throttle_state, new_context); +} } void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context) diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h index ab29b0b..b890613 100644 --- a/include/qemu/throttle.h +++ b/include/qemu/throttle.h @@ -67,6 +67,11 @@ typedef struct ThrottleState { int64_t previous_leak;/* timestamp of the last leak done */ QEMUTimer * timers[2];/* timers used to do the throttling */ QEMUClockType clock_type; /* the clock used */ + +/* Callbacks */ +QEMUTimerCB *read_timer_cb; +QEMUTimerCB *write_timer_cb; +void *timer_opaque; } ThrottleState; /* operations on single leaky buckets */ @@ -82,6 +87,7 @@ bool throttle_compute_timer(ThrottleState *ts, /* init/destroy cycle */ void throttle_init(ThrottleState *ts, + AioContext *aio_context, QEMUClockType clock_type, void (read_timer)(void *), void (write_timer)(void *), @@ -89,6 +95,10 @@ void throttle_init(ThrottleState *ts, void throttle_destroy(ThrottleState *ts); +void throttle_detach_aio_context(ThrottleState *ts); + +void throttle_attach_aio_context(ThrottleState *ts, AioContext *new_context); + bool throttle_have_timer(ThrottleState *ts); /* configuration */ diff --git a/tests/test-throttle.c b/tests/test-throttle.c index 1d4ffd3..5fa5000 100644 --- a/tests/test-throttle.c +++ b/tests/test-throttle.c @@ -12,8 +12,10 @@ #include glib.h #include math.h +#include block/aio.h #include qemu/throttle.h +AioContext *ctx; LeakyBucketbkt; ThrottleConfig cfg; ThrottleState ts; @@ -104,7 +106,8 @@ static void test_init(void) memset(ts, 1, sizeof(ts)); /* init the structure */ -throttle_init(ts, QEMU_CLOCK_VIRTUAL, read_timer_cb, write_timer_cb, ts); +throttle_init(ts, ctx, QEMU_CLOCK_VIRTUAL, + read_timer_cb, write_timer_cb, ts); /* check initialized fields */ g_assert(ts.clock_type == QEMU_CLOCK_VIRTUAL); @@ -126,7 +129,8 @@ static void test_init(void) static void test_destroy(void) { int i; -throttle_init(ts, QEMU_CLOCK_VIRTUAL, read_timer_cb, write_timer_cb, ts); +throttle_init(ts, ctx, QEMU_CLOCK_VIRTUAL, + read_timer_cb, write_timer_cb, ts); throttle_destroy(ts); for (i = 0; i 2; i++) { g_assert(!ts.timers[i]); @@ -165,7 +169,8 @@ static void test_config_functions(void) orig_cfg.op_size = 1; -throttle_init(ts, QEMU_CLOCK_VIRTUAL, read_timer_cb, write_timer_cb, ts); +throttle_init(ts, ctx, QEMU_CLOCK_VIRTUAL, + read_timer_cb, write_timer_cb, ts); /* structure reset by throttle_init previous_leak should be null */ g_assert(!ts.previous_leak); throttle_config(ts, orig_cfg); @@ -324,7 +329,8 @@ static void test_have_timer(void) g_assert(!throttle_have_timer(ts)); /* init the structure */ -
[Qemu-devel] [PATCH 1/9] kvm: add set_one_reg/get_one_reg helpers
This adds QEMU wrappers for KVM_SET_ONE_REG/KVM_GET_ONE_REG ioctls. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- Changes: v3: * changed comments v2: * added Doc Comments * removed error_print --- include/sysemu/kvm.h | 21 + kvm-all.c| 18 ++ 2 files changed, 39 insertions(+) diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index 63e241a..b1566a4 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -383,4 +383,25 @@ void kvm_init_irq_routing(KVMState *s); * 0: irq chip was created */ int kvm_arch_irqchip_create(KVMState *s); + +/** + * kvm_set_one_reg - set a register value in KVM via KVM_SET_ONE_REG ioctl + * @id: The register ID + * @addr: The pointer to a value must point to a variable of the correct + * type/size for the register being accessed. + * + * Returns: 0 on success, or a negative errno on failure. + */ +int kvm_set_one_reg(CPUState *cs, uint64_t id, void *addr); + +/** + * kvm_get_one_reg - get a register value from KVM via KVM_GET_ONE_REG ioctl + * @id: The register ID + * @addr: The pointer to a value must point to a variable of the correct + * type/size for the register being accessed. + * + * Returns: 0 on success, or a negative errno on failure. + */ +int kvm_get_one_reg(CPUState *cs, uint64_t id, void *addr); + #endif diff --git a/kvm-all.c b/kvm-all.c index 5cb7f26..ebb1afa 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -2114,3 +2114,21 @@ int kvm_create_device(KVMState *s, uint64_t type, bool test) return test ? 0 : create_dev.fd; } + +int kvm_set_one_reg(CPUState *cs, uint64_t id, void *addr) +{ +struct kvm_one_reg reg = { +.id = id, +.addr = (uintptr_t)addr, +}; +return kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, reg); +} + +int kvm_get_one_reg(CPUState *cs, uint64_t id, void *addr) +{ +struct kvm_one_reg reg = { +.id = id, +.addr = (uintptr_t)addr, +}; +return kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, reg); +} -- 1.9.rc0
[Qemu-devel] [PATCH 2/9] target-ppc: Add compat CPU option
PowerISA defines a compatibility mode for server POWERPC CPUs which is supported by the PCR special register. To support this feature, SPAPR defines a set of virtual PVRs, once per PowerISA spec version. This introduces a compat CPU option which defines maximal compatibility mode enabled. The supported modes are power6/power7/power8. This does not change the existing behaviour, new property will be used by next patches. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- target-ppc/cpu-models.h | 11 +++ target-ppc/cpu-qom.h| 2 ++ target-ppc/translate_init.c | 75 + 3 files changed, 88 insertions(+) diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h index 9a003b4..33d2c51 100644 --- a/target-ppc/cpu-models.h +++ b/target-ppc/cpu-models.h @@ -595,6 +595,17 @@ enum { CPU_POWERPC_PA6T = 0x0090, }; +/* Logical PVR definitions for sPAPR */ +enum { +/* Logical CPUs */ +CPU_POWERPC_LOGICAL_2_04 = 0x0F01, +CPU_POWERPC_LOGICAL_2_05 = 0x0F02, +CPU_POWERPC_LOGICAL_2_06 = 0x0F03, +CPU_POWERPC_LOGICAL_2_06_PLUS = 0x0F13, +CPU_POWERPC_LOGICAL_2_07 = 0x0F04, +CPU_POWERPC_LOGICAL_2_08 = 0x0F05, +}; + /* System version register (used on MPC 8xxx)*/ enum { POWERPC_SVR_NONE = 0x, diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h index 47dc8e6..2a8515b 100644 --- a/target-ppc/cpu-qom.h +++ b/target-ppc/cpu-qom.h @@ -82,6 +82,7 @@ typedef struct PowerPCCPUClass { * PowerPCCPU: * @env: #CPUPPCState * @cpu_dt_id: CPU index used in the device tree. KVM uses this index too + * @max_compat: Maximal supported logical PVR from the command line * * A PowerPC CPU. */ @@ -92,6 +93,7 @@ struct PowerPCCPU { CPUPPCState env; int cpu_dt_id; +uint32_t max_compat; }; static inline PowerPCCPU *ppc_env_get_cpu(CPUPPCState *env) diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c index 1d64ec9..6f61b34 100644 --- a/target-ppc/translate_init.c +++ b/target-ppc/translate_init.c @@ -28,6 +28,8 @@ #include mmu-hash32.h #include mmu-hash64.h #include qemu/error-report.h +#include qapi/visitor.h +#include hw/qdev-properties.h //#define PPC_DUMP_CPU //#define PPC_DEBUG_SPR @@ -7654,6 +7656,76 @@ POWERPC_FAMILY(POWER5P)(ObjectClass *oc, void *data) pcc-l1_icache_size = 0x1; } +static void powerpc_get_compat(Object *obj, Visitor *v, + void *opaque, const char *name, Error **errp) +{ +char *value = (char *); +Property *prop = opaque; +uint32_t *max_compat = qdev_get_prop_ptr(DEVICE(obj), prop); + +switch (*max_compat) { +case CPU_POWERPC_LOGICAL_2_05: +value = (char *)power6; +break; +case CPU_POWERPC_LOGICAL_2_06: +value = (char *)power7; +break; +case CPU_POWERPC_LOGICAL_2_07: +value = (char *)power8; +break; +case 0: +break; +default: +error_setg(errp, Internal error: compat is set to %x, + max_compat ? *max_compat : -1); +break; +} + +visit_type_str(v, value, name, errp); +} + +static void powerpc_set_compat(Object *obj, Visitor *v, + void *opaque, const char *name, Error **errp) +{ +Error *error = NULL; +char *value = NULL; +Property *prop = opaque; +uint32_t *max_compat = qdev_get_prop_ptr(DEVICE(obj), prop); + +visit_type_str(v, value, name, error); +if (error) { +error_propagate(errp, error); +return; +} + +if (strcmp(value, power6) == 0) { +*max_compat = CPU_POWERPC_LOGICAL_2_05; +} else if (strcmp(value, power7) == 0) { +*max_compat = CPU_POWERPC_LOGICAL_2_06; +} else if (strcmp(value, power8) == 0) { +*max_compat = CPU_POWERPC_LOGICAL_2_07; +} else { +error_setg(errp, Invalid compatibility mode \%s\, value); +} + +g_free(value); +} + +static PropertyInfo powerpc_compat_propinfo = { +.name = str, +.legacy_name = powerpc-server-compat, +.get = powerpc_get_compat, +.set = powerpc_set_compat, +}; + +#define DEFINE_PROP_POWERPC_COMPAT(_n, _s, _f) \ +DEFINE_PROP(_n, _s, _f, powerpc_compat_propinfo, uint32_t) + +static Property powerpc_servercpu_properties[] = { +DEFINE_PROP_POWERPC_COMPAT(compat, PowerPCCPU, max_compat), +DEFINE_PROP_END_OF_LIST(), +}; + static void init_proc_POWER7 (CPUPPCState *env) { gen_spr_ne_601(env); @@ -7740,6 +7812,7 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data) dc-fw_name = PowerPC,POWER7; dc-desc = POWER7; +dc-props = powerpc_servercpu_properties; pcc-pvr = CPU_POWERPC_POWER7_BASE; pcc-pvr_mask = CPU_POWERPC_POWER7_MASK; pcc-init_proc = init_proc_POWER7; @@ -7798,6 +7871,7 @@ POWERPC_FAMILY(POWER7P)(ObjectClass *oc, void *data)
[Qemu-devel] [PATCH 4/9] target-ppc: Implement compat CPU option
This adds basic support for the compat CPU option. By specifying the compat property, the user can manually switch guest CPU mode from raw to architected. Since the actual compatibility mode is not implemented yet, this does not change the existing behavior. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/ppc/spapr.c | 15 ++- target-ppc/cpu-models.h | 6 ++ target-ppc/cpu-qom.h| 2 ++ target-ppc/cpu.h| 3 +++ target-ppc/translate_init.c | 35 +++ 5 files changed, 60 insertions(+), 1 deletion(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 0f8bd95..61d0675 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -212,7 +212,8 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr) CPU_FOREACH(cpu) { DeviceClass *dc = DEVICE_GET_CLASS(cpu); -int index = ppc_get_vcpu_dt_id(POWERPC_CPU(cpu)); +PowerPCCPU *pcpu = POWERPC_CPU(cpu); +int index = ppc_get_vcpu_dt_id(pcpu); uint32_t associativity[] = {cpu_to_be32(0x5), cpu_to_be32(0x0), cpu_to_be32(0x0), @@ -249,6 +250,14 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr) return ret; } +if (pcpu-cpu_version) { +ret = fdt_setprop(fdt, offset, cpu-version, + pcpu-cpu_version, sizeof(pcpu-cpu_version)); +if (ret 0) { +return ret; +} +} + /* Build interrupt servers and gservers properties */ for (i = 0; i smp_threads; i++) { servers_prop[i] = cpu_to_be32(index + i); @@ -1278,6 +1287,10 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) kvmppc_set_papr(cpu); } +if (cpu-max_compat) { +ppc_set_compat(cpu, cpu-max_compat); +} + xics_cpu_setup(spapr-icp, cpu); qemu_register_reset(spapr_cpu_reset, cpu); diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h index 33d2c51..45c0028 100644 --- a/target-ppc/cpu-models.h +++ b/target-ppc/cpu-models.h @@ -753,4 +753,10 @@ enum { POWERPC_SVR_8641D = 0x80900121, }; +/* Processor Compatibility mask (PCR) */ +enum { +POWERPC_ISA_COMPAT_2_05 = 0x02, +POWERPC_ISA_COMPAT_2_06 = 0x04, +}; + #endif diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h index 2a8515b..dfd1419 100644 --- a/target-ppc/cpu-qom.h +++ b/target-ppc/cpu-qom.h @@ -83,6 +83,7 @@ typedef struct PowerPCCPUClass { * @env: #CPUPPCState * @cpu_dt_id: CPU index used in the device tree. KVM uses this index too * @max_compat: Maximal supported logical PVR from the command line + * @cpu_version: Current logical PVR, zero if in raw mode * * A PowerPC CPU. */ @@ -94,6 +95,7 @@ struct PowerPCCPU { CPUPPCState env; int cpu_dt_id; uint32_t max_compat; +uint32_t cpu_version; }; static inline PowerPCCPU *ppc_env_get_cpu(CPUPPCState *env) diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index d498340..f61675a 100644 --- a/target-ppc/cpu.h +++ b/target-ppc/cpu.h @@ -1123,6 +1123,8 @@ void ppc_store_msr (CPUPPCState *env, target_ulong value); void ppc_cpu_list (FILE *f, fprintf_function cpu_fprintf); +void ppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version); + /* Time-base and decrementer management */ #ifndef NO_CPU_IO_DEFS uint64_t cpu_ppc_load_tbl (CPUPPCState *env); @@ -1338,6 +1340,7 @@ static inline int cpu_mmu_index (CPUPPCState *env) #define SPR_LPCR (0x13E) #define SPR_BOOKE_DVC2(0x13F) #define SPR_BOOKE_TSR (0x150) +#define SPR_PCR (0x152) #define SPR_BOOKE_TCR (0x154) #define SPR_BOOKE_TLB0PS (0x158) #define SPR_BOOKE_TLB1PS (0x159) diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c index 6f61b34..c4bd5de 100644 --- a/target-ppc/translate_init.c +++ b/target-ppc/translate_init.c @@ -7803,6 +7803,15 @@ static void init_proc_POWER7 (CPUPPCState *env) /* Can't find information on what this should be on reset. This * value is the one used by 74xx processors. */ vscr_init(env, 0x0001); + +/* + * Register PCR to report POWERPC_EXCP_PRIV_REG instead of + * POWERPC_EXCP_INVAL_SPR. + */ +spr_register(env, SPR_PCR, PCR, + SPR_NOACCESS, SPR_NOACCESS, + SPR_NOACCESS, SPR_NOACCESS, + 0x); } POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data) @@ -8880,6 +8889,32 @@ static void ppc_cpu_unrealizefn(DeviceState *dev, Error **errp) } } +void ppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version) +{ +CPUPPCState *env = cpu-env; + +cpu-cpu_version = cpu_version; + +/* + * Calculate PCR value from virtual PVR. + * TODO: support actual compatibility in TCG. + */ +switch (cpu_version) { +case
[Qemu-devel] [PATCH 5/9] target-ppc: Define Processor Compatibility Masks
This introduces PCR mask for supported compatibility modes. This will be used later by the ibm,client-architecture-support call. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- target-ppc/cpu-qom.h| 1 + target-ppc/translate_init.c | 3 +++ 2 files changed, 4 insertions(+) diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h index dfd1419..093f09a 100644 --- a/target-ppc/cpu-qom.h +++ b/target-ppc/cpu-qom.h @@ -57,6 +57,7 @@ typedef struct PowerPCCPUClass { uint32_t pvr; uint32_t pvr_mask; +uint64_t pcr_mask; uint32_t svr; uint64_t insns_flags; uint64_t insns_flags2; diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c index c4bd5de..1b72485 100644 --- a/target-ppc/translate_init.c +++ b/target-ppc/translate_init.c @@ -7824,6 +7824,7 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data) dc-props = powerpc_servercpu_properties; pcc-pvr = CPU_POWERPC_POWER7_BASE; pcc-pvr_mask = CPU_POWERPC_POWER7_MASK; +pcc-pcr_mask = POWERPC_ISA_COMPAT_2_05 | POWERPC_ISA_COMPAT_2_06; pcc-init_proc = init_proc_POWER7; pcc-check_pow = check_pow_nocheck; pcc-insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB | @@ -7883,6 +7884,7 @@ POWERPC_FAMILY(POWER7P)(ObjectClass *oc, void *data) dc-props = powerpc_servercpu_properties; pcc-pvr = CPU_POWERPC_POWER7P_BASE; pcc-pvr_mask = CPU_POWERPC_POWER7P_MASK; +pcc-pcr_mask = POWERPC_ISA_COMPAT_2_05 | POWERPC_ISA_COMPAT_2_06; pcc-init_proc = init_proc_POWER7; pcc-check_pow = check_pow_nocheck; pcc-insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB | @@ -7954,6 +7956,7 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data) dc-props = powerpc_servercpu_properties; pcc-pvr = CPU_POWERPC_POWER8_BASE; pcc-pvr_mask = CPU_POWERPC_POWER8_MASK; +pcc-pcr_mask = POWERPC_ISA_COMPAT_2_05 | POWERPC_ISA_COMPAT_2_06; pcc-init_proc = init_proc_POWER8; pcc-check_pow = check_pow_nocheck; pcc-insns_flags = PPC_INSNS_BASE | PPC_ISEL | PPC_STRING | PPC_MFTB | -- 1.9.rc0
[Qemu-devel] [PATCH 6/9] spapr: Add ibm, client-architecture-support call
The PAPR+ specification defines a ibm,client-architecture-support (CAS) RTAS call which purpose is to provide a negotiation mechanism for the guest and the hypervisor to work out the best compatibility parameters. During the negotiation process, the guest provides an array of various options and capabilities which it supports, the hypervisor adjusts the device tree and (optionally) reboots the guest. At the moment the Linux guest calls CAS method at early boot so SLOF gets called. SLOF allocates a memory buffer for the device tree changes and calls a custom KVMPPC_H_CAS hypercall. QEMU parses the options, composes a diff for the device tree, copies it to the buffer provided by SLOF and returns to SLOF. SLOF updates the device tree and returns control to the guest kernel. Only then the Linux guest parses the device tree so it is possible to avoid unnecessary reboot in most cases. The device tree diff is a header with the update format version (defined as 0 in this patch) followed by a device tree with the properties which require update. If QEMU detects during the option list parsing that it has to reboot the guest, it silently does so as the guest expects reboot to happen as pHyp firmware does it almost all the time. This defines custom KVMPPC_H_CAS hypercall. The current SLOF already has support for it. This implements stub which returns empty tree to the guest. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/ppc/spapr.c | 26 ++ hw/ppc/spapr_hcall.c | 21 + include/hw/ppc/spapr.h | 9 - trace-events | 4 4 files changed, 59 insertions(+), 1 deletion(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 61d0675..cf53a7a 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -53,6 +53,7 @@ #include hw/usb.h #include qemu/config-file.h #include qemu/error-report.h +#include trace.h #include libfdt.h @@ -562,6 +563,31 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base, return fdt; } +int spapr_h_cas_compose_response(target_ulong addr, target_ulong size) +{ +void *fdt; +sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 }; + +size -= sizeof(hdr); + +fdt = g_malloc0(size); +_FDT((fdt_create(fdt, size))); + +_FDT((fdt_finish(fdt))); + +if (fdt_totalsize(fdt) + sizeof(hdr) size) { +trace_spapr_cas_failed(size); +return -1; +} + +cpu_physical_memory_write(addr, hdr, sizeof(hdr)); +cpu_physical_memory_write(addr + sizeof(hdr), fdt, fdt_totalsize(fdt)); +trace_spapr_cas_continue(fdt_totalsize(fdt) + sizeof(hdr)); +g_free(fdt); + +return 0; +} + static int spapr_populate_memory(sPAPREnvironment *spapr, void *fdt) { uint32_t associativity[] = {cpu_to_be32(0x4), cpu_to_be32(0x0), diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c index 0bae053..2f6aa5c 100644 --- a/hw/ppc/spapr_hcall.c +++ b/hw/ppc/spapr_hcall.c @@ -752,6 +752,24 @@ out: return ret; } +static target_ulong h_client_architecture_support(PowerPCCPU *cpu_, + sPAPREnvironment *spapr, + target_ulong opcode, + target_ulong *args) +{ +target_ulong list = args[0]; + +if (!list) { +return H_SUCCESS; +} + +if (spapr_h_cas_compose_response(args[1], args[2])) { +qemu_system_reset_request(); +} + +return H_SUCCESS; +} + static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1]; static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1]; @@ -831,6 +849,9 @@ static void hypercall_register_types(void) spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas); spapr_register_hypercall(H_SET_MODE, h_set_mode); + +/* ibm,client-architecture-support support */ +spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support); } type_init(hypercall_register_types) diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h index 5fdac1e..dd6d97a 100644 --- a/include/hw/ppc/spapr.h +++ b/include/hw/ppc/spapr.h @@ -302,10 +302,16 @@ typedef struct sPAPREnvironment { #define KVMPPC_HCALL_BASE 0xf000 #define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0) #define KVMPPC_H_LOGICAL_MEMOP (KVMPPC_HCALL_BASE + 0x1) -#define KVMPPC_HCALL_MAXKVMPPC_H_LOGICAL_MEMOP +/* Client Architecture support */ +#define KVMPPC_H_CAS(KVMPPC_HCALL_BASE + 0x2) +#define KVMPPC_HCALL_MAXKVMPPC_H_CAS extern sPAPREnvironment *spapr; +typedef struct sPAPRDeviceTreeUpdateHeader { +uint32_t version_id; +} sPAPRDeviceTreeUpdateHeader; + /*#define DEBUG_SPAPR_HCALLS*/ #ifdef DEBUG_SPAPR_HCALLS @@ -401,6 +407,7 @@ struct sPAPRTCETable { void spapr_events_init(sPAPREnvironment *spapr); void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq); +int spapr_h_cas_compose_response(target_ulong addr,
[Qemu-devel] [PATCH 0/9] spapr: Enable ibm, client-architecture-support
This enables a ibm,client-architecture-support RTAS call. This allows older distros (such as SLES11 or RHEL6) to work on modern POWERPC hardware (such as POWER8) in architected mode. The previous try was RFC, so this is v1. The very first patch here is for the reference, it is already on its way to upstream. Please comment. Thanks! Alexey Kardashevskiy (9): kvm: add set_one_reg/get_one_reg helpers target-ppc: Add compat CPU option spapr: Move server# property out of skeleton fdt target-ppc: Implement compat CPU option target-ppc: Define Processor Compatibility Masks spapr: Add ibm,client-architecture-support call spapr: Limit threads per core according to current compatibility mode spapr: Implement processor compatibility in ibm,client-architecture-support KVM: PPC: Enable compatibility mode hw/ppc/spapr.c | 149 +++- hw/ppc/spapr_hcall.c| 108 include/hw/ppc/spapr.h | 9 ++- include/sysemu/kvm.h| 21 +++ kvm-all.c | 18 ++ target-ppc/cpu-models.h | 17 + target-ppc/cpu-qom.h| 5 ++ target-ppc/cpu.h| 3 + target-ppc/kvm.c| 5 ++ target-ppc/kvm_ppc.h| 6 ++ target-ppc/translate_init.c | 113 + trace-events| 9 +++ 12 files changed, 446 insertions(+), 17 deletions(-) -- 1.9.rc0
[Qemu-devel] [PATCH 9/9] KVM: PPC: Enable compatibility mode
The host kernel implements a KVM_REG_PPC_ARCH_COMPAT register which this uses to enable a compatibility mode if any chosen. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/ppc/spapr.c | 6 ++ hw/ppc/spapr_hcall.c | 4 target-ppc/kvm.c | 5 + target-ppc/kvm_ppc.h | 6 ++ 4 files changed, 21 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index a0882a1..f89be10 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1376,6 +1376,12 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args) /* Tell KVM that we're in PAPR mode */ if (kvm_enabled()) { kvmppc_set_papr(cpu); + +if (cpu-max_compat +kvmppc_set_compat(cpu, cpu-max_compat) 0) { +fprintf(stderr, Unable to set compatibility mode\n); +exit(1); +} } if (cpu-max_compat) { diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c index cb815c3..2ab21d3 100644 --- a/hw/ppc/spapr_hcall.c +++ b/hw/ppc/spapr_hcall.c @@ -834,6 +834,10 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_, CPU_FOREACH(cs) { PowerPCCPU *cpu = POWERPC_CPU(cs); +if (kvmppc_set_compat(cpu, cpu_version) 0) { +fprintf(stderr, Unable to set compatibility mode\n); +return H_HARDWARE; +} ppc_set_compat(cpu, cpu_version); } } diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index ff319fc..f8e8453 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -1513,6 +1513,11 @@ void kvmppc_set_papr(PowerPCCPU *cpu) cap_papr = 1; } +int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version) +{ +return kvm_set_one_reg(CPU(cpu), KVM_REG_PPC_ARCH_COMPAT, cpu_version); +} + void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy) { CPUState *cs = CPU(cpu); diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h index ff077ec..716c33d 100644 --- a/target-ppc/kvm_ppc.h +++ b/target-ppc/kvm_ppc.h @@ -23,6 +23,7 @@ int kvmppc_get_hasidle(CPUPPCState *env); int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int buf_len); int kvmppc_set_interrupt(PowerPCCPU *cpu, int irq, int level); void kvmppc_set_papr(PowerPCCPU *cpu); +int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version); void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy); int kvmppc_smt_threads(void); int kvmppc_clear_tsr_bits(PowerPCCPU *cpu, uint32_t tsr_bits); @@ -95,6 +96,11 @@ static inline void kvmppc_set_papr(PowerPCCPU *cpu) { } +static inline int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version) +{ +return 0; +} + static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy) { } -- 1.9.rc0
[Qemu-devel] [PATCH 3/9] spapr: Move server# property out of skeleton fdt
The upcoming support of the ibm,client-architecture-support reconfiguration method will be able to reduce the number of threads per core so the server# and gserver# device tree properties are not parts of the FDT skeleton anymore. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/ppc/spapr.c | 37 ++--- 1 file changed, 22 insertions(+), 15 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 166c1c6..0f8bd95 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -219,6 +219,9 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr) cpu_to_be32(0x0), cpu_to_be32(cpu-numa_node), cpu_to_be32(index)}; +uint32_t servers_prop[smp_threads]; +uint32_t gservers_prop[smp_threads * 2]; +int i; if ((index % smt) != 0) { continue; @@ -245,6 +248,24 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr) if (ret 0) { return ret; } + +/* Build interrupt servers and gservers properties */ +for (i = 0; i smp_threads; i++) { +servers_prop[i] = cpu_to_be32(index + i); +/* Hack, direct the group queues back to cpu 0 */ +gservers_prop[i*2] = cpu_to_be32(index + i); +gservers_prop[i*2 + 1] = 0; +} +ret = fdt_setprop(fdt, offset, ibm,ppc-interrupt-server#s, + servers_prop, sizeof(servers_prop)); +if (ret 0) { +return ret; +} +ret = fdt_setprop(fdt, offset, ibm,ppc-interrupt-gserver#s, + gservers_prop, sizeof(gservers_prop)); +if (ret 0) { +return ret; +} } return ret; } @@ -311,7 +332,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base, char qemu_hypertas_prop[] = hcall-memop1; uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)}; uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)}; -int i, smt = kvmppc_smt_threads(); +int smt = kvmppc_smt_threads(); unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80}; QemuOpts *opts = qemu_opts_find(qemu_find_opts(smp-opts), NULL); unsigned sockets = opts ? qemu_opt_get_number(opts, sockets, 0) : 0; @@ -378,8 +399,6 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base, DeviceClass *dc = DEVICE_GET_CLASS(cs); PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs); int index = ppc_get_vcpu_dt_id(cpu); -uint32_t servers_prop[smp_threads]; -uint32_t gservers_prop[smp_threads * 2]; char *nodename; uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40), 0x, 0x}; @@ -428,18 +447,6 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base, _FDT((fdt_property_string(fdt, status, okay))); _FDT((fdt_property(fdt, 64-bit, NULL, 0))); -/* Build interrupt servers and gservers properties */ -for (i = 0; i smp_threads; i++) { -servers_prop[i] = cpu_to_be32(index + i); -/* Hack, direct the group queues back to cpu 0 */ -gservers_prop[i*2] = cpu_to_be32(index + i); -gservers_prop[i*2 + 1] = 0; -} -_FDT((fdt_property(fdt, ibm,ppc-interrupt-server#s, - servers_prop, sizeof(servers_prop; -_FDT((fdt_property(fdt, ibm,ppc-interrupt-gserver#s, - gservers_prop, sizeof(gservers_prop; - if (env-spr_cb[SPR_PURR].oea_read) { _FDT((fdt_property(fdt, ibm,purr, NULL, 0))); } -- 1.9.rc0
[Qemu-devel] [PATCH 8/9] spapr: Implement processor compatibility in ibm, client-architecture-support
Modern Linux kernels support last POWERPC CPUs so when a kernel boots, in most cases it can find a matching cpu_spec in the kernel's cpu_specs list. However if the kernel is quite old, it may be missing a definition of the actual CPU. To provide an ability for old kernels to work on modern hardware, a Processor Compatibility Mode has been introduced by the PowerISA specification. From the hardware prospective, it is supported by the Processor Compatibility Register (PCR) which is defined in PowerISA. The register enables one of the compatibility modes (2.05/2.06/2.07). Since PCR is a hypervisor privileged register and cannot be accessed from the guest, the mode selection is done via ibm,client-architecture-support (CAS) RTAS call using which the guest specifies what raw and architected CPU versions it supports. QEMU works out the best match, changes a cpu-version property of every CPU and notifies the guest about the change by setting these properties in the buffer passed as a response on a custom H_CAS hypercall. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/ppc/spapr.c | 40 + hw/ppc/spapr_hcall.c | 83 trace-events | 5 3 files changed, 128 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index a2c9106..a0882a1 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -35,6 +35,7 @@ #include kvm_ppc.h #include mmu-hash64.h #include cpu-models.h +#include qom/cpu.h #include hw/boards.h #include hw/ppc/ppc.h @@ -592,11 +593,50 @@ int spapr_h_cas_compose_response(target_ulong addr, target_ulong size) { void *fdt; sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 }; +CPUState *cs; +int smt = kvmppc_smt_threads(); size -= sizeof(hdr); fdt = g_malloc0(size); _FDT((fdt_create(fdt, size))); +_FDT((fdt_begin_node(fdt, cpus))); + +CPU_FOREACH(cs) { +PowerPCCPU *cpu = POWERPC_CPU(cs); +DeviceClass *dc = DEVICE_GET_CLASS(cpu); +int smpt = spapr_get_compat_smp_threads(cpu); +int i, index = ppc_get_vcpu_dt_id(cpu); +uint32_t servers_prop[smpt]; +uint32_t gservers_prop[smpt * 2]; +char tmp[32]; + +if ((index % smt) != 0) { +continue; +} + +snprintf(tmp, 32, %s@%x, dc-fw_name, index); +trace_spapr_cas_add_subnode(tmp); + +_FDT((fdt_begin_node(fdt, tmp))); +_FDT((fdt_property_cell(fdt, cpu-version, cpu-cpu_version))); + +/* Build interrupt servers and gservers properties */ +for (i = 0; i smpt; i++) { +servers_prop[i] = cpu_to_be32(index + i); +/* Hack, direct the group queues back to cpu 0 */ +gservers_prop[i*2] = cpu_to_be32(index + i); +gservers_prop[i*2 + 1] = 0; +} +_FDT((fdt_property(fdt, ibm,ppc-interrupt-server#s, + servers_prop, sizeof(servers_prop; +_FDT((fdt_property(fdt, ibm,ppc-interrupt-gserver#s, + gservers_prop, sizeof(gservers_prop; + +_FDT((fdt_end_node(fdt))); +} + +_FDT((fdt_end_node(fdt))); _FDT((fdt_finish(fdt))); diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c index 2f6aa5c..cb815c3 100644 --- a/hw/ppc/spapr_hcall.c +++ b/hw/ppc/spapr_hcall.c @@ -3,6 +3,9 @@ #include helper_regs.h #include hw/ppc/spapr.h #include mmu-hash64.h +#include cpu-models.h +#include trace.h +#include kvm_ppc.h struct SPRSyncState { CPUState *cs; @@ -752,12 +755,92 @@ out: return ret; } +#define get_compat_level(cpuver) ( \ +((cpuver) == CPU_POWERPC_LOGICAL_2_05) ? 2050 : \ +((cpuver) == CPU_POWERPC_LOGICAL_2_06) ? 2060 : \ +((cpuver) == CPU_POWERPC_LOGICAL_2_06_PLUS) ? 2061 : \ +((cpuver) == CPU_POWERPC_LOGICAL_2_07) ? 2070 : 0) + static target_ulong h_client_architecture_support(PowerPCCPU *cpu_, sPAPREnvironment *spapr, target_ulong opcode, target_ulong *args) { target_ulong list = args[0]; +PowerPCCPUClass *pcc_ = POWERPC_CPU_GET_CLASS(cpu_); +CPUState *cs; +CPUState *cs_ = CPU(cpu_); +bool cpu_match = false; +unsigned old_cpu_version = cpu_-cpu_version; +unsigned compat_lvl = 0, cpu_version = 0; +unsigned max_lvl = get_compat_level(cpu_-max_compat); + +/* Parse PVR list */ +for ( ; ; ) { +uint32_t pvr, pvr_mask; + +pvr_mask = ldl_phys(cs_-as, list); +list += 4; +pvr = ldl_phys(cs_-as, list); +list += 4; + +trace_spapr_cas_pvr_try(pvr); +if (!max_lvl +((cpu_-env.spr[SPR_PVR] pvr_mask) == (pvr pvr_mask))) { +cpu_match = true; +cpu_version = 0; +} else if (pvr == cpu_-cpu_version) { +cpu_match = true; +
[Qemu-devel] [PATCH 7/9] spapr: Limit threads per core according to current compatibility mode
This puts a limit to the number of threads per core based on the current compatibility mode. Although PowerISA specs do not specify the maximum threads per core number, the linux guest still expects that PowerISA2.05-compatible CPU supports only 2 threads per core as this is what POWER6 (2.05 compliant CPU) implements, same is true for POWER7 (2.06, 4 threads) and POWER8 (2.07, 8 threads). Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/ppc/spapr.c | 31 --- 1 file changed, 28 insertions(+), 3 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index cf53a7a..a2c9106 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -34,6 +34,7 @@ #include sysemu/kvm.h #include kvm_ppc.h #include mmu-hash64.h +#include cpu-models.h #include hw/boards.h #include hw/ppc/ppc.h @@ -203,6 +204,29 @@ static XICSState *xics_system_init(int nr_servers, int nr_irqs) return icp; } +static int spapr_get_compat_smp_threads(PowerPCCPU *cpu) +{ +int ret = -1; + +switch (cpu-cpu_version) { +case CPU_POWERPC_LOGICAL_2_05: +ret = 2; +break; +case CPU_POWERPC_LOGICAL_2_06: +ret = 4; +break; +case CPU_POWERPC_LOGICAL_2_07: +ret = 8; +break; +default: +ret = smp_threads; +break; +} +ret = MIN(ret, smp_threads); + +return ret; +} + static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr) { int ret = 0, offset; @@ -221,8 +245,9 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr) cpu_to_be32(0x0), cpu_to_be32(cpu-numa_node), cpu_to_be32(index)}; -uint32_t servers_prop[smp_threads]; -uint32_t gservers_prop[smp_threads * 2]; +int smpt = spapr_get_compat_smp_threads(pcpu); +uint32_t servers_prop[smpt]; +uint32_t gservers_prop[smpt * 2]; int i; if ((index % smt) != 0) { @@ -260,7 +285,7 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr) } /* Build interrupt servers and gservers properties */ -for (i = 0; i smp_threads; i++) { +for (i = 0; i smpt; i++) { servers_prop[i] = cpu_to_be32(index + i); /* Hack, direct the group queues back to cpu 0 */ gservers_prop[i*2] = cpu_to_be32(index + i); -- 1.9.rc0
Re: [Qemu-devel] [PATCH] migration: static variables will not be reset at second migration
Hi, Juan Ping... Maybe you forgot to pull this patch, right? Thanks. Best regards, -Gonglei -Original Message- From: Juan Quintela [mailto:quint...@redhat.com] Sent: Friday, March 21, 2014 9:26 PM To: Gonglei (Arei) Cc: qemu-devel@nongnu.org; owass...@redhat.com; pbonz...@redhat.com; ebl...@redhat.com; dgilb...@redhat.com; chenliang (T) Subject: Re: [PATCH] migration: static variables will not be reset at second migration arei.gong...@huawei.com wrote: From: ChenLiang chenlian...@huawei.com The static variables in migration_bitmap_sync will not be reset in the case of a second attempted migration. Signed-off-by: ChenLiang chenlian...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com Good catch. Applied.. --- arch_init.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/arch_init.c b/arch_init.c index 60c975d..10516cb 100644 --- a/arch_init.c +++ b/arch_init.c @@ -468,15 +468,23 @@ static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length) /* Needs iothread lock! */ +/* Fix me: there are too many global variables used in migration process. */ +static int64_t start_time; +static int64_t bytes_xfer_prev; +static int64_t num_dirty_pages_period; + +static void migration_bitmap_sync_init(void) +{ +start_time = 0; +bytes_xfer_prev = 0; +num_dirty_pages_period = 0; +} static void migration_bitmap_sync(void) { RAMBlock *block; uint64_t num_dirty_pages_init = migration_dirty_pages; MigrationState *s = migrate_get_current(); -static int64_t start_time; -static int64_t bytes_xfer_prev; -static int64_t num_dirty_pages_period; int64_t end_time; int64_t bytes_xfer_now; @@ -733,6 +741,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) migration_dirty_pages = ram_pages; mig_throttle_on = false; dirty_rate_high_cnt = 0; +migration_bitmap_sync_init(); if (migrate_use_xbzrle()) { qemu_mutex_lock_iothread();
Re: [Qemu-devel] [PATCH 3/4] curl: Add sslverify option
Am 15.05.2014 um 01:28 hat Matthew Booth geschrieben: This allows qemu to use images over https with a self-signed certificate. It defaults to verifying the certificate. Signed-off-by: Matthew Booth mbo...@redhat.com --- block/curl.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/block/curl.c b/block/curl.c index 1b9f2f2..43d6646 100644 --- a/block/curl.c +++ b/block/curl.c @@ -23,6 +23,7 @@ */ #include qemu-common.h #include block/block_int.h +#include qapi/qmp/qbool.h #include curl/curl.h // #define DEBUG @@ -69,6 +70,7 @@ static CURLMcode __curl_multi_socket_action(CURLM *multi_handle, #define CURL_BLOCK_OPT_URL url #define CURL_BLOCK_OPT_READAHEAD readahead +#define CURL_BLOCK_OPT_SSLVERIFY sslverify struct BDRVCURLState; @@ -106,6 +108,7 @@ typedef struct BDRVCURLState { CURLState states[CURL_NUM_STATES]; char *url; size_t readahead_size; +bool sslverify; bool accept_range; } BDRVCURLState; @@ -372,6 +375,7 @@ static CURLState *curl_init_state(BDRVCURLState *s) return NULL; } curl_easy_setopt(state-curl, CURLOPT_URL, s-url); +curl_easy_setopt(state-curl, CURLOPT_SSL_VERIFYPEER, s-sslverify); block/curl.c: In function 'curl_init_state': block/curl.c:378:1043: error: call to '_curl_easy_setopt_err_long' declared with attribute warning: curl_easy_setopt expects a long argument for this option [-Werror] curl_easy_setopt(state-curl, CURLOPT_SSL_VERIFYPEER, s-sslverify); Squashing in this hunk: --- a/block/curl.c +++ b/block/curl.c @@ -375,7 +375,8 @@ static CURLState *curl_init_state(BDRVCURLState *s) return NULL; } curl_easy_setopt(state-curl, CURLOPT_URL, s-url); -curl_easy_setopt(state-curl, CURLOPT_SSL_VERIFYPEER, s-sslverify); +curl_easy_setopt(state-curl, CURLOPT_SSL_VERIFYPEER, + (long) s-sslverify); curl_easy_setopt(state-curl, CURLOPT_TIMEOUT, 5); curl_easy_setopt(state-curl, CURLOPT_WRITEFUNCTION, (void *)curl_read_cb); Kevin
[Qemu-devel] [PATCH] Revert iotests: Use configured python
This reverts commit f915db07ef9c368ea6db6430256de064fdd1525f. This commit is broken because it does not account for the build tree and the source tree being different, and can cause build failures for out-of-tree builds. Revert it until we can identify a better solution to the problem. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- I have an increasingly large queue of pull requests which I'm currently not processing pending a fix to this build breakage. Reverting the commit seems the simplest solution to unblocking this... Barring objections I'm going to commit this to master this afternoon. configure| 6 -- tests/qemu-iotests/031 | 9 - tests/qemu-iotests/036 | 7 +++ tests/qemu-iotests/039 | 19 +-- tests/qemu-iotests/054 | 3 +-- tests/qemu-iotests/060 | 21 ++--- tests/qemu-iotests/061 | 25 - tests/qemu-iotests/065 | 2 +- tests/qemu-iotests/083 | 3 +-- tests/qemu-iotests/check | 18 ++ 10 files changed, 43 insertions(+), 70 deletions(-) diff --git a/configure b/configure index 6adfa72..e565e59 100755 --- a/configure +++ b/configure @@ -4768,12 +4768,6 @@ if test $gcov = yes ; then echo GCOV=$gcov_tool $config_host_mak fi -iotests_common_env=tests/qemu-iotests/common.env - -echo # Automatically generated by configure - do not modify $iotests_common_env -echo $iotests_common_env -echo PYTHON='$python' $iotests_common_env - # use included Linux headers if test $linux = yes ; then mkdir -p linux-headers diff --git a/tests/qemu-iotests/031 b/tests/qemu-iotests/031 index 5aefb88..1d920ea 100755 --- a/tests/qemu-iotests/031 +++ b/tests/qemu-iotests/031 @@ -35,7 +35,6 @@ _cleanup() trap _cleanup; exit \$status 0 1 2 3 15 # get standard environment, filters and checks -. ./common.env . ./common.rc . ./common.filter . ./common.pattern @@ -57,22 +56,22 @@ for IMGOPTS in compat=0.10 compat=1.1; do echo === Create image with unknown header extension === echo _make_test_img 64M -$PYTHON qcow2.py $TEST_IMG add-header-ext 0x12345678 This is a test header extension -$PYTHON qcow2.py $TEST_IMG dump-header +./qcow2.py $TEST_IMG add-header-ext 0x12345678 This is a test header extension +./qcow2.py $TEST_IMG dump-header _check_test_img echo echo === Rewrite header with no backing file === echo $QEMU_IMG rebase -u -b $TEST_IMG -$PYTHON qcow2.py $TEST_IMG dump-header +./qcow2.py $TEST_IMG dump-header _check_test_img echo echo === Add a backing file and format === echo $QEMU_IMG rebase -u -b /some/backing/file/path -F host_device $TEST_IMG -$PYTHON qcow2.py $TEST_IMG dump-header +./qcow2.py $TEST_IMG dump-header done # success, all done diff --git a/tests/qemu-iotests/036 b/tests/qemu-iotests/036 index 29c35d1..03b6aa9 100755 --- a/tests/qemu-iotests/036 +++ b/tests/qemu-iotests/036 @@ -38,7 +38,6 @@ _cleanup() trap _cleanup; exit \$status 0 1 2 3 15 # get standard environment, filters and checks -. ./common.env . ./common.rc . ./common.filter . ./common.pattern @@ -54,15 +53,15 @@ IMGOPTS=compat=1.1 echo === Create image with unknown autoclear feature bit === echo _make_test_img 64M -$PYTHON qcow2.py $TEST_IMG set-feature-bit autoclear 63 -$PYTHON qcow2.py $TEST_IMG dump-header +./qcow2.py $TEST_IMG set-feature-bit autoclear 63 +./qcow2.py $TEST_IMG dump-header echo echo === Repair image === echo _check_test_img -r all -$PYTHON qcow2.py $TEST_IMG dump-header +./qcow2.py $TEST_IMG dump-header # success, all done echo *** done diff --git a/tests/qemu-iotests/039 b/tests/qemu-iotests/039 index b7b7030..b9cbe99 100755 --- a/tests/qemu-iotests/039 +++ b/tests/qemu-iotests/039 @@ -38,7 +38,6 @@ _cleanup() trap _cleanup; exit \$status 0 1 2 3 15 # get standard environment, filters and checks -. ./common.env . ./common.rc . ./common.filter @@ -59,7 +58,7 @@ _make_test_img $size $QEMU_IO -c write -P 0x5a 0 512 $TEST_IMG | _filter_qemu_io # The dirty bit must not be set -$PYTHON qcow2.py $TEST_IMG dump-header | grep incompatible_features +./qcow2.py $TEST_IMG dump-header | grep incompatible_features _check_test_img echo @@ -74,7 +73,7 @@ $QEMU_IO -c write -P 0x5a 0 512 -c abort $TEST_IMG | _filter_qemu_io ulimit -c $old_ulimit # The dirty bit must be set -$PYTHON qcow2.py $TEST_IMG dump-header | grep incompatible_features +./qcow2.py $TEST_IMG dump-header | grep incompatible_features _check_test_img echo @@ -83,7 +82,7 @@ echo == Read-only access must still work == $QEMU_IO -r -c read -P 0x5a 0 512 $TEST_IMG | _filter_qemu_io # The dirty bit must be set -$PYTHON qcow2.py $TEST_IMG dump-header | grep incompatible_features +./qcow2.py $TEST_IMG dump-header | grep incompatible_features echo echo == Repairing the image file must succeed == @@ -91,7 +90,7 @@ echo == Repairing the image file must succeed ==
Re: [Qemu-devel] [PATCH 1/4] curl: Fix build when curl_multi_socket_action isn't available
Am 15.05.2014 um 01:28 hat Matthew Booth geschrieben: Signed-off-by: Matthew Booth mbo...@redhat.com --- block/curl.c | 15 +++ 1 file changed, 15 insertions(+) Thanks, applied all to the block branch (with patch 3 fixed as commented there). Please don't forget --cover-letter and --subject-prefix=PATCH v2 next time, that makes it easier to reply to the whole series and to keep track of the current version. Kevin
Re: [Qemu-devel] [PATCH 3/5] block: make 'top' argument to block-commit optional
The Wednesday 14 May 2014 à 23:20:17 (-0400), Jeff Cody wrote : Now that active layer block-commit is supported, the 'top' argument no longer needs to be mandatory. Change it optional, with the default being the active layer in the Do you mean Change it to optional or Make it optional ? device chain. Signed-off-by: Jeff Cody jc...@redhat.com --- blockdev.c | 3 ++- qapi-schema.json | 7 --- qmp-commands.hx| 5 +++-- tests/qemu-iotests/040 | 28 ++-- 4 files changed, 27 insertions(+), 16 deletions(-) diff --git a/blockdev.c b/blockdev.c index 500707e..02c6525 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1868,7 +1868,8 @@ void qmp_block_stream(const char *device, bool has_base, } void qmp_block_commit(const char *device, - bool has_base, const char *base, const char *top, + bool has_base, const char *base, + bool has_top, const char *top, bool has_speed, int64_t speed, Error **errp) { diff --git a/qapi-schema.json b/qapi-schema.json index 36cb964..06a9b5d 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -2099,8 +2099,9 @@ # @base: #optional The file name of the backing image to write data into. #If not specified, this is the deepest backing image # -# @top: The file name of the backing image within the image chain, -#which contains the topmost data to be committed down. +# @top:#optional The file name of the backing image within the image chain, +#which contains the topmost data to be committed down. If +#not specified, this is the active layer. # #If top == base, that is an error. #If top == active, the job will not be completed by itself, @@ -2128,7 +2129,7 @@ # ## { 'command': 'block-commit', - 'data': { 'device': 'str', '*base': 'str', 'top': 'str', + 'data': { 'device': 'str', '*base': 'str', '*top': 'str', '*speed': 'int' } } ## diff --git a/qmp-commands.hx b/qmp-commands.hx index cae890e..1aa3c65 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -985,7 +985,7 @@ EQMP { .name = block-commit, -.args_type = device:B,base:s?,top:s,speed:o?, +.args_type = device:B,base:s?,top:s?,speed:o?, .mhandler.cmd_new = qmp_marshal_input_block_commit, }, @@ -1003,7 +1003,8 @@ Arguments: If not specified, this is the deepest backing image (json-string, optional) - top: The file name of the backing image within the image chain, - which contains the topmost data to be committed down. + which contains the topmost data to be committed down. If + not specified, this is the active layer. (json-string, optional) If top == base, that is an error. If top == active, the job will not be completed by itself, diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040 index 734b6a6..803b0c7 100755 --- a/tests/qemu-iotests/040 +++ b/tests/qemu-iotests/040 @@ -35,11 +35,7 @@ test_img = os.path.join(iotests.test_dir, 'test.img') class ImageCommitTestCase(iotests.QMPTestCase): '''Abstract base class for image commit test cases''' -def run_commit_test(self, top, base): -self.assert_no_active_block_jobs() -result = self.vm.qmp('block-commit', device='drive0', top=top, base=base) -self.assert_qmp(result, 'return', {}) - +def wait_for_complete(self): completed = False while not completed: for event in self.vm.get_qmp_events(wait=True): @@ -58,6 +54,18 @@ class ImageCommitTestCase(iotests.QMPTestCase): self.assert_no_active_block_jobs() self.vm.shutdown() +def run_commit_test(self, top, base): +self.assert_no_active_block_jobs() +result = self.vm.qmp('block-commit', device='drive0', top=top, base=base) +self.assert_qmp(result, 'return', {}) +self.wait_for_complete() + +def run_default_commit_test(self): +self.assert_no_active_block_jobs() +result = self.vm.qmp('block-commit', device='drive0') +self.assert_qmp(result, 'return', {}) +self.wait_for_complete() + class TestSingleDrive(ImageCommitTestCase): image_len = 1 * 1024 * 1024 test_len = 1 * 1024 * 256 @@ -109,17 +117,17 @@ class TestSingleDrive(ImageCommitTestCase): self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', backing_img).find(verification failed)) self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', backing_img).find(verification failed)) +def test_top_is_default_active(self): +self.run_default_commit_test() +self.assertEqual(-1,
Re: [Qemu-devel] [PATCH 2/5] block: add helper function to determine if a BDS is in a chain
The Wednesday 14 May 2014 à 23:20:16 (-0400), Jeff Cody wrote : This is a small helper function, to determine if 'base' is in the chain of BlockDriverState 'top'. It returns true if it is in the chain, and false otherwise. If either argument is NULL, it will also return false. Signed-off-by: Jeff Cody jc...@redhat.com --- block.c | 9 + include/block/block.h | 1 + 2 files changed, 10 insertions(+) diff --git a/block.c b/block.c index 81945d3..c4f77c2 100644 --- a/block.c +++ b/block.c @@ -3734,6 +3734,15 @@ BlockDriverState *bdrv_lookup_bs(const char *device, return NULL; } +bool bdrv_is_in_chain(BlockDriverState *top, BlockDriverState *base) +{ +while (top top != base) { +top = top-backing_hd; +} + +return top != NULL; +} + BlockDriverState *bdrv_next(BlockDriverState *bs) { if (!bs) { diff --git a/include/block/block.h b/include/block/block.h index 1b119aa..283a6f3 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -381,6 +381,7 @@ BlockDeviceInfoList *bdrv_named_nodes_list(void); BlockDriverState *bdrv_lookup_bs(const char *device, const char *node_name, Error **errp); +bool bdrv_is_in_chain(BlockDriverState *top, BlockDriverState *base); BlockDriverState *bdrv_next(BlockDriverState *bs); void bdrv_iterate(void (*it)(void *opaque, BlockDriverState *bs), void *opaque); -- 1.8.3.1 Reviewed-by: Benoit Canet ben...@irqsave.net
Re: [Qemu-devel] [PATCH 3/5] block: make 'top' argument to block-commit optional
On Thu, May 15, 2014 at 01:47:55PM +0200, Benoît Canet wrote: The Wednesday 14 May 2014 à 23:20:17 (-0400), Jeff Cody wrote : Now that active layer block-commit is supported, the 'top' argument no longer needs to be mandatory. Change it optional, with the default being the active layer in the Do you mean Change it to optional or Make it optional ? Thanks - forgot the to. device chain. Signed-off-by: Jeff Cody jc...@redhat.com --- blockdev.c | 3 ++- qapi-schema.json | 7 --- qmp-commands.hx| 5 +++-- tests/qemu-iotests/040 | 28 ++-- 4 files changed, 27 insertions(+), 16 deletions(-) diff --git a/blockdev.c b/blockdev.c index 500707e..02c6525 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1868,7 +1868,8 @@ void qmp_block_stream(const char *device, bool has_base, } void qmp_block_commit(const char *device, - bool has_base, const char *base, const char *top, + bool has_base, const char *base, + bool has_top, const char *top, bool has_speed, int64_t speed, Error **errp) { diff --git a/qapi-schema.json b/qapi-schema.json index 36cb964..06a9b5d 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -2099,8 +2099,9 @@ # @base: #optional The file name of the backing image to write data into. #If not specified, this is the deepest backing image # -# @top: The file name of the backing image within the image chain, -#which contains the topmost data to be committed down. +# @top:#optional The file name of the backing image within the image chain, +#which contains the topmost data to be committed down. If +#not specified, this is the active layer. # #If top == base, that is an error. #If top == active, the job will not be completed by itself, @@ -2128,7 +2129,7 @@ # ## { 'command': 'block-commit', - 'data': { 'device': 'str', '*base': 'str', 'top': 'str', + 'data': { 'device': 'str', '*base': 'str', '*top': 'str', '*speed': 'int' } } ## diff --git a/qmp-commands.hx b/qmp-commands.hx index cae890e..1aa3c65 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -985,7 +985,7 @@ EQMP { .name = block-commit, -.args_type = device:B,base:s?,top:s,speed:o?, +.args_type = device:B,base:s?,top:s?,speed:o?, .mhandler.cmd_new = qmp_marshal_input_block_commit, }, @@ -1003,7 +1003,8 @@ Arguments: If not specified, this is the deepest backing image (json-string, optional) - top: The file name of the backing image within the image chain, - which contains the topmost data to be committed down. + which contains the topmost data to be committed down. If + not specified, this is the active layer. (json-string, optional) If top == base, that is an error. If top == active, the job will not be completed by itself, diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040 index 734b6a6..803b0c7 100755 --- a/tests/qemu-iotests/040 +++ b/tests/qemu-iotests/040 @@ -35,11 +35,7 @@ test_img = os.path.join(iotests.test_dir, 'test.img') class ImageCommitTestCase(iotests.QMPTestCase): '''Abstract base class for image commit test cases''' -def run_commit_test(self, top, base): -self.assert_no_active_block_jobs() -result = self.vm.qmp('block-commit', device='drive0', top=top, base=base) -self.assert_qmp(result, 'return', {}) - +def wait_for_complete(self): completed = False while not completed: for event in self.vm.get_qmp_events(wait=True): @@ -58,6 +54,18 @@ class ImageCommitTestCase(iotests.QMPTestCase): self.assert_no_active_block_jobs() self.vm.shutdown() +def run_commit_test(self, top, base): +self.assert_no_active_block_jobs() +result = self.vm.qmp('block-commit', device='drive0', top=top, base=base) +self.assert_qmp(result, 'return', {}) +self.wait_for_complete() + +def run_default_commit_test(self): +self.assert_no_active_block_jobs() +result = self.vm.qmp('block-commit', device='drive0') +self.assert_qmp(result, 'return', {}) +self.wait_for_complete() + class TestSingleDrive(ImageCommitTestCase): image_len = 1 * 1024 * 1024 test_len = 1 * 1024 * 256 @@ -109,17 +117,17 @@ class TestSingleDrive(ImageCommitTestCase): self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', backing_img).find(verification failed))
[Qemu-devel] [PATCH v4] serial-pci: Set prog interface field of pci config to 16550 compatible
Signed-off-by: BALATON Zoltan bala...@eik.bme.hu --- v2: resubmission after pc-2.1 is added with the multiport case v3: added compatibility check to avoid changing earlier than pc-2.1 v4: renamed compat property to prog_if hw/char/serial-pci.c | 7 +++ include/hw/i386/pc.h | 15 +++ 2 files changed, 22 insertions(+) diff --git a/hw/char/serial-pci.c b/hw/char/serial-pci.c index 991c99f..a9c 100644 --- a/hw/char/serial-pci.c +++ b/hw/char/serial-pci.c @@ -34,6 +34,7 @@ typedef struct PCISerialState { PCIDevice dev; SerialState state; +uint8_t prog_if; } PCISerialState; typedef struct PCIMultiSerialState { @@ -44,6 +45,7 @@ typedef struct PCIMultiSerialState { SerialState state[PCI_SERIAL_MAX_PORTS]; uint32_t level[PCI_SERIAL_MAX_PORTS]; qemu_irq *irqs; +uint8_t prog_if; } PCIMultiSerialState; static int serial_pci_init(PCIDevice *dev) @@ -60,6 +62,7 @@ static int serial_pci_init(PCIDevice *dev) return -1; } +pci-dev.config[PCI_CLASS_PROG] = pci-prog_if; pci-dev.config[PCI_INTERRUPT_PIN] = 0x01; s-irq = pci_allocate_irq(pci-dev); @@ -101,6 +104,7 @@ static int multi_serial_pci_init(PCIDevice *dev) assert(pci-ports 0); assert(pci-ports = PCI_SERIAL_MAX_PORTS); +pci-dev.config[PCI_CLASS_PROG] = pci-prog_if; pci-dev.config[PCI_INTERRUPT_PIN] = 0x01; memory_region_init(pci-iobar, OBJECT(pci), multiserial, 8 * pci-ports); pci_register_bar(pci-dev, 0, PCI_BASE_ADDRESS_SPACE_IO, pci-iobar); @@ -177,12 +181,14 @@ static const VMStateDescription vmstate_pci_multi_serial = { static Property serial_pci_properties[] = { DEFINE_PROP_CHR(chardev, PCISerialState, state.chr), +DEFINE_PROP_UINT8(prog_if, PCISerialState, prog_if, 0x02), DEFINE_PROP_END_OF_LIST(), }; static Property multi_2x_serial_pci_properties[] = { DEFINE_PROP_CHR(chardev1, PCIMultiSerialState, state[0].chr), DEFINE_PROP_CHR(chardev2, PCIMultiSerialState, state[1].chr), +DEFINE_PROP_UINT8(prog_if, PCIMultiSerialState, prog_if, 0x02), DEFINE_PROP_END_OF_LIST(), }; @@ -191,6 +197,7 @@ static Property multi_4x_serial_pci_properties[] = { DEFINE_PROP_CHR(chardev2, PCIMultiSerialState, state[1].chr), DEFINE_PROP_CHR(chardev3, PCIMultiSerialState, state[2].chr), DEFINE_PROP_CHR(chardev4, PCIMultiSerialState, state[3].chr), +DEFINE_PROP_UINT8(prog_if, PCIMultiSerialState, prog_if, 0x02), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 32a7687..552fbd8 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -271,6 +271,21 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *); .driver = apic,\ .property = version,\ .value= stringify(0x11),\ +},\ +{\ +.driver = pci-serial,\ +.property = prog_if,\ +.value= stringify(0),\ +},\ +{\ +.driver = pci-serial-2x,\ +.property = prof_if,\ +.value= stringify(0),\ +},\ +{\ +.driver = pci-serial-4x,\ +.property = prog_if,\ +.value= stringify(0),\ } #define PC_COMPAT_1_7 \ -- 1.8.1.5
[Qemu-devel] Differential VHD
Hi, What all steps to be taken for differential vhd image? i created one differential image using vhd-util Thanks
Re: [Qemu-devel] [PATCH] Revert iotests: Use configured python
Am 15.05.2014 um 13:34 hat Peter Maydell geschrieben: This reverts commit f915db07ef9c368ea6db6430256de064fdd1525f. This commit is broken because it does not account for the build tree and the source tree being different, and can cause build failures for out-of-tree builds. Revert it until we can identify a better solution to the problem. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- I have an increasingly large queue of pull requests which I'm currently not processing pending a fix to this build breakage. Reverting the commit seems the simplest solution to unblocking this... Barring objections I'm going to commit this to master this afternoon. Acked-by: Kevin Wolf kw...@redhat.com
Re: [Qemu-devel] [PATCH 1/5] block: Auto-generate node_names for each BDS entry
The Wednesday 14 May 2014 à 23:20:15 (-0400), Jeff Cody wrote : Currently, node_name is only filled in when done so explicitly by the user. If no node_name is specified, then the node name field is not populated. If node_names are automatically generated when not specified, that means that all block job operations can be done by reference to the unique node_name field. This eliminates ambiguity in filename pathing (relative filenames, or file descriptors, symlinks, mounts, etc..) that qemu currently needs to deal with. If a node name is specified, then it will not be automatically generated for that BDS entry. If it is automatically generated, it will be prefaced with __qemu##, followed by 8 characters of a unique number, followed by 8 random ASCII characters in the range of 'A-Z'. Some sample generated node-name strings: __qemu##IAIYNXXR __qemu##0002METXTRBQ __qemu##0001FMBORDWG The prefix is to aid in identifying it as a qemu-generated name, the numeric portion is to guarantee uniqueness in a given qemu session, and the random characters are to further avoid any accidental collisions with user-specified node-names. Signed-off-by: Jeff Cody jc...@redhat.com --- block.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index c90c71a..81945d3 100644 --- a/block.c +++ b/block.c @@ -838,12 +838,26 @@ static int bdrv_open_flags(BlockDriverState *bs, int flags) return open_flags; } +#define GEN_NODE_NAME_PREFIX__qemu## +#define GEN_NODE_NAME_MAX_LEN (sizeof(GEN_NODE_NAME_PREFIX) + 8 + 8) static void bdrv_assign_node_name(BlockDriverState *bs, const char *node_name, Error **errp) { +char gen_node_name[GEN_NODE_NAME_MAX_LEN]; The room for the '\0' string termination seems to be missing: char gen_node_name[GEN_NODE_NAME_MAX_LEN + 1]; +static uint32_t counter; /* simple counter to guarantee uniqueness */ + +/* if node_name is NULL, auto-generate a node name */ if (!node_name) { -return; +int len; +snprintf(gen_node_name, GEN_NODE_NAME_MAX_LEN, + %s%08x, GEN_NODE_NAME_PREFIX, counter++); +len = strlen(gen_node_name); +while (len GEN_NODE_NAME_MAX_LEN - 1) { +gen_node_name[len++] = g_random_int_range('A', 'Z'); +} Is this code generating only 7 random chars instead of 8 ? +gen_node_name[GEN_NODE_NAME_MAX_LEN - 1] = '\0'; Could be: gen_node_name[GEN_NODE_NAME_MAX_LEN] = '\0'; if the array is properly declared. +node_name = gen_node_name; } /* empty string node name is invalid */ -- 1.8.3.1
Re: [Qemu-devel] [PATCH RFC 1/8] virtio: add subsections to the migration stream
Am 15.05.2014 12:16, schrieb Michael S. Tsirkin: On Thu, May 15, 2014 at 12:11:12PM +0200, Andreas Färber wrote: Am 15.05.2014 12:03, schrieb Michael S. Tsirkin: On Thu, May 15, 2014 at 11:58:25AM +0200, Andreas Färber wrote: Am 15.05.2014 11:52, schrieb Michael S. Tsirkin: On Thu, May 15, 2014 at 11:20:18AM +0200, Andreas Färber wrote: Am 15.05.2014 09:04, schrieb Greg Kurz: On Thu, 15 May 2014 12:16:35 +0530 Amit Shah amit.s...@redhat.com wrote: On (Thu) 15 May 2014 [09:23:51], Michael S. Tsirkin wrote: On Thu, May 15, 2014 at 11:34:25AM +0530, Amit Shah wrote: On (Wed) 14 May 2014 [17:41:38], Greg Kurz wrote: Since each virtio device is streamed in its own section, the idea is to stream subsections between the end of the device section and the start of the next sections. This allows an older QEMU to complain and exit when fed with subsections: Unknown savevm section type 5 Error -22 while loading VM state Please make this configurable -- either via configure or device properties. That avoids having to break existing configurations that work without this patch. Since backwards migration is not supported upstream, wouldn't it be easiest to just add support for the subsection marker and skipping to the end of section in that downstream? Backwards and forwards migration need to be supported, customers told us repeatedly. So some downstreams support this and not supporting it upstream just means downstreams need to do their own thing. As importantly, ping-pong migration is the only reliable way to stress migration. So if we want to test cross-version we need it to work both way. Finally, the real issue and difficulty with cross-version migration is making VM behave in a backwards compatible way. Serializing in a compatible way is a trivial problem, or would be if the code wasn't a mess :) Once you do the hard part, breaking migration because of the trivial serialization issue is just silly. And special-casing forward migration does not make code simpler, it really only leads to proliferation of hacks and lack of symmetry. So yes it's a useful feature, and no not supporting it does not help anyway. It seems you misunderstand. I was not saying it's not useful. My point is that VMStateSubsections added in newer versions (and thus not existing in older versions) need to be handled for any VMState-converted devices. So why can't we make that work for virtio too? Sure. AFAIK the way to this works is by adding an field_exists callback, and not sending the section when we are in a compat mode. OK, but upstream always sends the latest version today. So isn't that just two ifs that you would need to add in save and load functions added here for the downstream? x86_64 is unaffected from ppc's endianness changes and you still do ppc64 BE, so there's no real functional problem for RHEL, is there? I'm sorry I don't understand what you are saying here. Simply put, if you specify a compatible -M then your device should behave, and migrate, exactly like and old qemu did. Whatever the version_id fields are for, upstream QEMU *always* saves the newest version_id format that it knows. There is no mechanism that I'm aware of in upstream QEMU for migrating device fields dependent on -M. So a device in QEMU only migrates exactly like an old QEMU did if neither fields nor subsections were added. Subsections are usually migrated based on whether the state differs from the default state (didn't check whether the final patch does this correctly here, but doesn't matter for 1/8 concept). So for x86 the subsection should *never* get written and thus not be a problem for you. For ppc64 it should not get written either, unless you care about migration from SLES12 or upstream ppc64le to RHEL ppc64. As I understood the IRC discussion Alex and me had with Greg about this, this is copying the exact code VMState uses to write its subsections, so there would be no forward migration problems at all once we convert to VMState and VMStateSubsections. The only question remaining is how does your downstream react when it encounters a subsection marker for subsections it doesn't know - which is not something upstream can solve for you unless you contribute backwards migration support upstream. Don't forget that Greg is introducing subsection support *because of* your backwards migration wish, so telling him not to add subsections because of backwards migration is really weird! Either subsections work with your downstream or they don't; if they don't, then you have much larger problems than can be solved by blocking this one section for ppc. Therefore I was saying if your downstream forgot to handle the case of a non-VMState device having subsections, then it may be easier to fix exactly that than make Greg jump through more hoops here for a theoretical problem you are unlikely to encounter on your downstream. Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409
Re: [Qemu-devel] [PATCH 1/5] block: Auto-generate node_names for each BDS entry
On Thu, May 15, 2014 at 01:58:59PM +0200, Benoît Canet wrote: The Wednesday 14 May 2014 à 23:20:15 (-0400), Jeff Cody wrote : Currently, node_name is only filled in when done so explicitly by the user. If no node_name is specified, then the node name field is not populated. If node_names are automatically generated when not specified, that means that all block job operations can be done by reference to the unique node_name field. This eliminates ambiguity in filename pathing (relative filenames, or file descriptors, symlinks, mounts, etc..) that qemu currently needs to deal with. If a node name is specified, then it will not be automatically generated for that BDS entry. If it is automatically generated, it will be prefaced with __qemu##, followed by 8 characters of a unique number, followed by 8 random ASCII characters in the range of 'A-Z'. Some sample generated node-name strings: __qemu##IAIYNXXR __qemu##0002METXTRBQ __qemu##0001FMBORDWG The prefix is to aid in identifying it as a qemu-generated name, the numeric portion is to guarantee uniqueness in a given qemu session, and the random characters are to further avoid any accidental collisions with user-specified node-names. Signed-off-by: Jeff Cody jc...@redhat.com --- block.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index c90c71a..81945d3 100644 --- a/block.c +++ b/block.c @@ -838,12 +838,26 @@ static int bdrv_open_flags(BlockDriverState *bs, int flags) return open_flags; } +#define GEN_NODE_NAME_PREFIX__qemu## +#define GEN_NODE_NAME_MAX_LEN (sizeof(GEN_NODE_NAME_PREFIX) + 8 + 8) static void bdrv_assign_node_name(BlockDriverState *bs, const char *node_name, Error **errp) { +char gen_node_name[GEN_NODE_NAME_MAX_LEN]; The room for the '\0' string termination seems to be missing: char gen_node_name[GEN_NODE_NAME_MAX_LEN + 1]; The array includes room for it, note the use of 'sizeof()': #define GEN_NODE_NAME_MAX_LEN (sizeof(GEN_NODE_NAME_PREFIX) + 8 + 8) sizeof() includes the '\0' in the length, while strlen() does not; e.g.: sizeof(four) = 5 strlen(four) = 4 +static uint32_t counter; /* simple counter to guarantee uniqueness */ + +/* if node_name is NULL, auto-generate a node name */ if (!node_name) { -return; +int len; +snprintf(gen_node_name, GEN_NODE_NAME_MAX_LEN, + %s%08x, GEN_NODE_NAME_PREFIX, counter++); +len = strlen(gen_node_name); +while (len GEN_NODE_NAME_MAX_LEN - 1) { +gen_node_name[len++] = g_random_int_range('A', 'Z'); +} Is this code generating only 7 random chars instead of 8 ? It generates 8 random characters (the sample node-name strings in the commit message were pulled straight from the QMP command 'query-named-block-nodes') +gen_node_name[GEN_NODE_NAME_MAX_LEN - 1] = '\0'; Could be: gen_node_name[GEN_NODE_NAME_MAX_LEN] = '\0'; if the array is properly declared. That would go over the array bounds by 1. +node_name = gen_node_name; } /* empty string node name is invalid */ -- 1.8.3.1
Re: [Qemu-devel] [PATCH v3 4/4] sclp-s390: Add memory hotplug SCLPs
On 13/05/14 20:02, Matthew Rosato wrote: On 05/12/2014 03:35 AM, Christian Borntraeger wrote: On 07/05/14 20:05, Matthew Rosato wrote: Add memory information to read SCP info and add handlers for Read Storage Element Information, Attach Storage Element, Assign Storage and Unassign Storage. Signed-off-by: Matthew Rosato mjros...@linux.vnet.ibm.com --- hw/s390x/sclp.c| 245 ++-- target-s390x/cpu.h | 15 target-s390x/kvm.c |5 ++ 3 files changed, 258 insertions(+), 7 deletions(-) diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c index 338dbdf..b9425ca 100644 --- a/hw/s390x/sclp.c +++ b/hw/s390x/sclp.c @@ -16,7 +16,8 @@ #include sysemu/kvm.h #include exec/memory.h #include sysemu/sysemu.h - +#include exec/address-spaces.h +#include qemu/config-file.h #include hw/s390x/sclp.h #include hw/s390x/event-facility.h @@ -33,10 +34,18 @@ static inline SCLPEventFacility *get_event_facility(void) static void read_SCP_info(SCCB *sccb) { ReadInfo *read_info = (ReadInfo *) sccb; +sclpMemoryHotplugDev *mhd = get_sclp_memory_hotplug_dev(); This only works for the ccw machine. The legacy machine does not yet have this device and qemu complains about the system bus not being hotplug capable. Cant you just initialize the device in the ccw machine and check here only for the existence? Agh, forgot about the legacy machine. Sure, I can split out creation from the get_sclp_memory_hotplug_dev() code, and then rework the code here. But then I got to thinking -- should I not be setting SCLP_FC_ASSIGN_ATTACH_READ_STOR below when we are running in the legacy machine? Because without the sclp_memory_hotplug_dev, we really aren't supporting these new functions (assign_storage, unassign_storage, assign_storage_element, read_storage_element*_info). Then, I can probably add some assert(mhd) for the remaining get_sclp_memory_hotplug_dev() calls, since we shouldn't get getting there for the legacy machine. What do you think? Alternatively, I'd have to add code to fudge returns for each of these new fuctions. Given our test coverage for the legacy machine, I agree to not add memory hotplug to this machine. Please make sure to enhance read_SCP_info to be able to handle the non-existence of the hotplug device. CPUState *cpu; -int shift = 0; int cpu_count = 0; int i = 0; +int rnsize, rnmax; +QemuOpts *opts = qemu_opts_find(qemu_find_opts(memory), NULL); +int slots = qemu_opt_get_number(opts, slots, 0); +int max_avail_slots = s390_get_memslot_count(kvm_state); + +if (slots max_avail_slots) { +slots = max_avail_slots; +} CPU_FOREACH(cpu) { cpu_count++; @@ -52,16 +61,222 @@ static void read_SCP_info(SCCB *sccb) read_info-entries[i].type = 0; } -read_info-facilities = cpu_to_be64(SCLP_HAS_CPU_INFO); +/* + * The storage increment size is a multiple of 1M and is a power of 2. + * The number of storage increments must be MAX_STORAGE_INCREMENTS or fewer. + */ +while ((ram_size mhd-increment_size) MAX_STORAGE_INCREMENTS) { +mhd-increment_size++; +} +while ((mhd-standby_mem_size mhd-increment_size) + MAX_STORAGE_INCREMENTS) { +mhd-increment_size++; +} + +mhd-standby_subregion_size = MEM_SECTION_SIZE; +/* Deduct the memory slot already used for core */ +if (slots 0) { +while ((mhd-standby_subregion_size * (slots - 1) + mhd-standby_mem_size)) { +mhd-standby_subregion_size = mhd-standby_subregion_size 1; +} +} +/* + * Initialize mapping of guest standby memory sections indicating which + * are and are not online. Assume all standby memory begins offline. + */ +if (mhd-standby_state_map == 0) { +if (mhd-standby_mem_size % mhd-standby_subregion_size) { +mhd-standby_state_map = g_malloc0((mhd-standby_mem_size / + mhd-standby_subregion_size + 1) * + (mhd-standby_subregion_size / + MEM_SECTION_SIZE)); +} else { +mhd-standby_state_map = g_malloc0(mhd-standby_mem_size / + MEM_SECTION_SIZE); +} +} + +mhd-padded_ram_size = ram_size + mhd-pad_size; +mhd-rzm = 1 mhd-increment_size; +rnsize = 1 (mhd-increment_size - 20); +if (rnsize = 128) { +read_info-rnsize = rnsize; +} else { +read_info-rnsize = 0; +read_info-rnsize2 = cpu_to_be32(rnsize); +} -while ((ram_size (20 + shift)) 65535) { -shift++; +rnmax = ((ram_size + mhd-standby_mem_size + mhd-pad_size) + mhd-increment_size); +if (rnmax 0x1) { +read_info-rnmax