Re: [Qemu-devel] [PATCH 2/4] qxl: split qxl functions in common and pci files
On 08/24/12 21:14, Erlon Cruz wrote:
> From: Fabiano Fidêncio fabi...@fidencio.org
>
> This commit splits qxl functions into common functions (located in
> qxl.c) and pci-specific functions (located in qxl-pci.c). All
> prototypes are being kept in qxl.h, as common MACROS and inline
> functions.
>
> Moreover, this commit is exposing a lot of APIs, don't know if it's
> the correct way to do it, but it was the only way that we saw to do it.

Try enabling rename detection for this one (git format-patch -M).

> diff --git a/hw/qxl.h b/hw/qxl.h
> index f25e341..516e7da 100644
> --- a/hw/qxl.h
> +++ b/hw/qxl.h
> @@ -143,6 +143,44 @@ typedef struct QXLDevice {
>          } \
>      } while (0)
>
> +/*
> + * NOTE: SPICE_RING_PROD_ITEM accesses memory on the pci bar and as
> + * such can be changed by the guest, so to avoid a guest trigerrable
> + * abort we just qxl_set_guest_bug and set the return to NULL. Still
> + * it may happen as a result of emulator bug as well.
> + */

Why these are here and not in qxl-pci.c?

> +void init_qxl_rom(QXLDevice *d);
> +void init_qxl_ram(QXLDevice *d);

Same question.

> +void interface_get_init_info(QXLInstance *sin, QXLDevInitInfo *info);
> +int interface_get_command(QXLInstance *sin, struct QXLCommandExt *ext);
> +int interface_req_cmd_notification(QXLInstance *sin);
> +void interface_release_resource(QXLInstance *sin,
> +                                struct QXLReleaseInfoExt ext);
> +int interface_get_cursor_command(QXLInstance *sin, struct QXLCommandExt *ext);
> +int interface_req_cursor_notification(QXLInstance *sin);
> +void interface_notify_update(QXLInstance *sin, uint32_t update_id);
> +int interface_flush_resources(QXLInstance *sin);
> +void interface_update_area_complete(QXLInstance *sin, uint32_t surface_id,
> +                                    QXLRect *dirty, uint32_t num_updated_rects);
> +void interface_async_complete(QXLInstance *sin, uint64_t cookie_token);
> +ram_addr_t qxl_rom_size(void);

Same question. I'd expect at least some of these having a
virtio-specific implementation. interface_get_command() for example,
which gets a qxl command from the ring ...

cheers,
  Gerd
Re: [Qemu-devel] [PATCH 4/4] qxl: introducing virtio-qxl
  Hi,

> To enable the VirtIOQXL device, use '-virtio-qxl'. Video output will be

Please don't add a new option. 'qemu -vga none -device virtio-qxl'
should work these days. You could also make virtio-qxl a valid choice
for '-vga' for convenience.

cheers,
  Gerd
Re: [Qemu-devel] Implementing qxl-virtio on QEMU
On 08/24/12 21:14, Erlon Cruz wrote:
> The following patches provide video support to non-PCI
> architectures, please review!

Can you give an overview of the virtio-qxl virtual hardware design?

thanks,
  Gerd
Re: [Qemu-devel] [PATCH v2] register reset handler to write image into memory
On 26.08.2012, at 20:50, Yin Olivia-R63875 r63...@freescale.com wrote:

> Thanks to Dunrong and Andreas.
>
> $ scripts/get_maintainer.pl -f hw/loader.c
> Alexander Graf ag...@suse.de (commit_signer:3/6=50%)
> Anthony Liguori aligu...@us.ibm.com (commit_signer:2/6=33%)
> Stefan Weil w...@mail.berlios.de (commit_signer:1/6=17%)
> Benjamin Herrenschmidt b...@kernel.crashing.org (commit_signer:1/6=17%)
> Avi Kivity a...@redhat.com (commit_signer:1/6=17%)
>
> Dear maintainers,
>
> Could you please help review this patch?
> So far I got feedback from Andreas and try to answer the question.
>
>> This patch does not answer the question why you try to avoid the ROM
>> blobs and what ROM blobs are still being used for after your patch. I
>> don't think it makes much sense to work around them for your use cases
>> and to leave them behind - if there's something fundamentally wrong
>> with them they should be ripped out completely or fixed. But maybe I'm
>> misunderstanding in the absence of explanations?
>
> It's a general problem. For example, in my case, there're 3 different
> files loaded from host rootfs.
>
> $ qemu-system-ppc -enable-kvm -m 256 -nographic -M mpc8544ds \
>     -kernel uImage.8572.agraf -initrd /media/ram/guest-8572.rootfs.ext2.gz \
>     -append "root=/dev/ram rw loglevel=7 console=ttyS0,115200" \
>     -serial tcp::4445,server -net nic
>
> (qemu) info roms
> addr= size=0x782840 mem=ram name=uImage.8572.agraf
> addr=00c0 size=0x01 mem=ram name=mpc8544ds.dtb
> addr=0200 size=0x3f922f mem=ram name=/media/ram/guest-8572.rootfs.ext2.gz
>
> The problem is that rom_add_*() mallocs memory for the image, and then
> rom_reset() copies those images into the guest's memory, but the QEMU
> memory does not get freed. On a VM reset, the images get recopied from
> QEMU to guest.
>
> Comparing the memory map of qemu process before and after starting up
> guest, we can find that QEMU consumes much memory for those images.
>
> $ diff -urN pmap.pre.log pmap.post.log
> --- pmap.pre.log
> +++ pmap.post.log
> @@ -33,7 +33,14 @@
>  0ffee000      8K rwx-- /lib/ld-2.13.so
>  1000   3472K r-x-- qemu-system-ppc
>  10374000    112K rwx-- qemu-system-ppc
> -1039   6524K rwx--[ anon ]
> +1039   7100K rwx--[ anon ]
>  48002000     16K rw---[ anon ]
> +48006000      4K -[ anon ]
> +48007000   8188K rw---[ anon ]
> +48806000      8K rw-s-[ anon ]
> +48808000      4K rw---[ anon ]
> +48809000 262144K rw---[ anon ]
> +58809000   5280K rw---[ anon ]
> +5cb98000   7692K rw---[ anon ]
>  bf93e000    132K rw---[ stack ]
> - total    14456K
> + total   298352K
>
> Exactly we can re-load them from disk on a reset instead of holding
> onto the images in QEMU's memory.
>
> With this patch, the two big images (uImage and especially initrd)
> will not be loaded into QEMU's memory
>
> (qemu) info roms
> addr=00c0 size=0x01 mem=ram name=mpc8544ds.dtb
>
> It will save much memory space according to memory map of QEMU process.
>
> # diff -urN pmap.pre.log pmap.post.log
> --- pmap.pre.log
> +++ pmap.post.log
> @@ -33,7 +33,14 @@
>  0ffee000      8K rwx-- /lib/ld-2.13.so
>  1000   3472K r-x-- qemu-system-ppc
>  10374000    112K rwx-- qemu-system-ppc
> -1039   6524K rwx--[ anon ]
> +1039   7036K rwx--[ anon ]
>  48002000     16K rw---[ anon ]
> +48006000      4K -[ anon ]
> +48007000   8188K rw---[ anon ]
> +48806000      8K rw-s-[ anon ]
> +48808000      4K rw---[ anon ]
> +48809000 262144K rw---[ anon ]
> +58809000      4K rw---[ anon ]
> +58c04000   1204K rw---[ anon ]
>  bfb2a000    132K rw---[ stack ]
> - total    14456K
> + total   286524K
>
> This patch changes all the image load process called by load_uimage()
> and load_image_targphys() in platform initialization.

This doesn't explain why you leave the old in-RAM code alive though. The
only reason I can imagine would be to allow for reset to not reload new
roms after an update.

Anthony, any opinion here? Do we need the keep-in-RAM rom code? Or could
we just always load rom blobs on demand for everything?

Alex

> Best Regards,
> Olivia
>
> -----Original Message-----
> From: Dunrong Huang [mailto:riegama...@gmail.com]
> Sent: Thursday, August 23, 2012 6:44 PM
> To: Yin Olivia-R63875
> Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org
> Subject: Re: [Qemu-devel] [PATCH v2] register reset handler to write
> image into memory
>
> 2012/8/23 Yin Olivia-R63875 r63...@freescale.com:
>> Dear All,
>>
>> I can't find MAINTAINER of hw/loader.c.
>> Who can help review and apply this patch?
>
> Please use the script scripts/get_maintainer.pl, like:
> $ scripts/get_maintainer.pl your_patch_file.patch
> or
> $ scripts/get_maintainer.pl -f hw/loader.c
>
>> Best Regards,
>> Olivia Yin
>
> --
> Best Regards,
> Dunrong Huang
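[Editor's note] The idea discussed above, keeping only a reference to the image and re-reading it from disk on every reset instead of holding a private copy in QEMU's heap, can be sketched roughly as below. This is not the code from the patch under review: ImageResetData and image_reset_load() are hypothetical names, and guest RAM is modelled as a plain byte buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-image record: only the path and the target address
 * are kept between resets, never the image contents themselves. */
typedef struct {
    const char *path;    /* image file on the host */
    size_t guest_addr;   /* offset of the image inside guest RAM */
} ImageResetData;

/* Reset-time loader: read the image straight from disk into guest RAM.
 * Returns the number of bytes copied, or -1 if the file cannot be read
 * or does not fit (guest RAM may be partially written in that case). */
static long image_reset_load(const ImageResetData *img,
                             uint8_t *guest_ram, size_t ram_size)
{
    FILE *f;
    size_t room, n;

    assert(img != NULL && guest_ram != NULL);
    if (img->guest_addr > ram_size) {
        return -1;
    }
    f = fopen(img->path, "rb");
    if (!f) {
        return -1;
    }
    room = ram_size - img->guest_addr;
    n = fread(guest_ram + img->guest_addr, 1, room, f);
    /* Fail on a read error, or if data is left over after filling RAM. */
    if (ferror(f) || (n == room && fgetc(f) != EOF)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return (long)n;
}
```

The trade-off raised in the reply still applies to such a scheme: if the file changes on disk between two resets, the guest sees the new contents, which the keep-in-RAM code would not allow.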
Re: [Qemu-devel] [PATCH for-1.2 0/2] migrate PV EOI MSR
On 2012-08-26 17:59, Michael S. Tsirkin wrote:
> It turns out PV EOI gets disabled after migration - until next guest
> reset. This is because we are missing code to actually migrate it.
> This patch fixes it up: it does not do anything useful without kvm
> irqchip but applies cleanly to qemu.git as well as qemu-kvm.git, so I
> think it's cleaner to apply it in qemu.git to keep diff to minimum.

There is nothing except pci-assign left in qemu-kvm (which will be
posted for upstream in a minute), so you are intuitively doing the right
thing.

Patch 2 looks good to me, see patch 1 for the clean procedure.

Jan
Re: [Qemu-devel] [PATCH for-1.2 1/2] linux-headers: update asm/kvm_para.h to 3.6
On 2012-08-26 17:59, Michael S. Tsirkin wrote:
> Update asm-x86/kvm_para.h to version present in Linux 3.6.

Nope, we have update-linux-headers.sh for this. Just run it against
3.6-rcX, grab the result, and mention the source (release version or
kvm.git hash).

Jan

> This is needed for the new PV EOI feature.
>
> Signed-off-by: Michael S. Tsirkin m...@redhat.com
> ---
>  linux-headers/asm-x86/kvm_para.h | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
> index f2ac46a..a1c3d72 100644
> --- a/linux-headers/asm-x86/kvm_para.h
> +++ b/linux-headers/asm-x86/kvm_para.h
> @@ -22,6 +22,7 @@
>  #define KVM_FEATURE_CLOCKSOURCE2 3
>  #define KVM_FEATURE_ASYNC_PF 4
>  #define KVM_FEATURE_STEAL_TIME 5
> +#define KVM_FEATURE_PV_EOI 6
>
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> @@ -37,6 +38,7 @@
>  #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
>  #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
>  #define MSR_KVM_STEAL_TIME 0x4b564d03
> +#define MSR_KVM_PV_EOI_EN 0x4b564d04
>
>  struct kvm_steal_time {
>  	__u64 steal;
> @@ -89,5 +91,10 @@ struct kvm_vcpu_pv_apf_data {
>  	__u32 enabled;
>  };
>
> +#define KVM_PV_EOI_BIT 0
> +#define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)
> +#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> +#define KVM_PV_EOI_DISABLED 0x0
> +
>
>  #endif /* _ASM_X86_KVM_PARA_H */
[Qemu-devel] [PATCH 0/4] uq/master: Add classic PCI device assignment
I'm proud to present probably the last patch series to merge qemu-kvm
into upstream: This one adds PCI device assignment for x86 using the
classic interface that the KVM model provides. See the last patch for
reasons why we still want this while next-generation device assignment
via VFIO is approaching.

It's been a long journey, but once this is merged, I think we can close
the qemu-kvm chapter. I already did so, all work is based on QEMU now.

Jan Kiszka (4):
  kvm: Introduce kvm_irqchip_update_msi_route
  kvm: Introduce kvm_has_intx_set_mask
  kvm: i386: Add services required for PCI device assignment
  kvm: i386: Add classic PCI device assignment

 hw/kvm/Makefile.objs   |    2 +-
 hw/kvm/pci-assign.c    | 1929
 kvm-all.c              |   50 ++
 kvm.h                  |    2 +
 target-i386/kvm.c      |  141
 target-i386/kvm_i386.h |   22 +
 6 files changed, 2145 insertions(+), 1 deletions(-)
 create mode 100644 hw/kvm/pci-assign.c

-- 
1.7.3.4
[Qemu-devel] [PATCH 2/4] kvm: Introduce kvm_has_intx_set_mask
From: Jan Kiszka jan.kis...@siemens.com

Will be used by PCI device assignment code.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 kvm-all.c |    8
 kvm.h     |    1 +
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index fd9d9b4..84d4f7f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -88,6 +88,7 @@ struct KVMState
     int pit_state2;
     int xsave, xcrs;
     int many_ioeventfds;
+    int intx_set_mask;
     /* The man page (and posix) say ioctl numbers are signed int, but
      * they're not. Linux, glibc and *BSD all treat ioctl numbers as
      * unsigned, and treating them as signed here can break things */
@@ -1387,6 +1388,8 @@ int kvm_init(void)
         s->irq_set_ioctl = KVM_IRQ_LINE_STATUS;
     }
 
+    s->intx_set_mask = kvm_check_extension(s, KVM_CAP_PCI_2_3);
+
     ret = kvm_arch_init(s);
     if (ret < 0) {
         goto err;
@@ -1739,6 +1742,11 @@ int kvm_has_gsi_routing(void)
 #endif
 }
 
+int kvm_has_intx_set_mask(void)
+{
+    return kvm_state->intx_set_mask;
+}
+
 void *kvm_vmalloc(ram_addr_t size)
 {
 #ifdef TARGET_S390X
diff --git a/kvm.h b/kvm.h
index 5cefe3a..dea2998 100644
--- a/kvm.h
+++ b/kvm.h
@@ -117,6 +117,7 @@ int kvm_has_xcrs(void);
 int kvm_has_pit_state2(void);
 int kvm_has_many_ioeventfds(void);
 int kvm_has_gsi_routing(void);
+int kvm_has_intx_set_mask(void);
 
 #ifdef NEED_CPU_H
 int kvm_init_vcpu(CPUArchState *env);
-- 
1.7.3.4
[Qemu-devel] [PATCH 1/4] kvm: Introduce kvm_irqchip_update_msi_route
From: Jan Kiszka jan.kis...@siemens.com

This service allows to update an MSI route without
releasing/reacquiring the associated VIRQ. Will be used by PCI device
assignment, later on likely also by virtio/vhost and VFIO.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 kvm-all.c |   42 ++
 kvm.h     |    1 +
 2 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index d4d8a1f..fd9d9b4 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -963,6 +963,30 @@ static void kvm_add_routing_entry(KVMState *s,
     kvm_irqchip_commit_routes(s);
 }
 
+static int kvm_update_routing_entry(KVMState *s,
+                                    struct kvm_irq_routing_entry *new_entry)
+{
+    struct kvm_irq_routing_entry *entry;
+    int n;
+
+    for (n = 0; n < s->irq_routes->nr; n++) {
+        entry = &s->irq_routes->entries[n];
+        if (entry->gsi != new_entry->gsi) {
+            continue;
+        }
+
+        entry->type = new_entry->type;
+        entry->flags = new_entry->flags;
+        entry->u = new_entry->u;
+
+        kvm_irqchip_commit_routes(s);
+
+        return 0;
+    }
+
+    return -ESRCH;
+}
+
 void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin)
 {
     struct kvm_irq_routing_entry e;
@@ -1125,6 +1149,24 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
     return virq;
 }
 
+int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg)
+{
+    struct kvm_irq_routing_entry kroute;
+
+    if (!kvm_irqchip_in_kernel()) {
+        return -ENOSYS;
+    }
+
+    kroute.gsi = virq;
+    kroute.type = KVM_IRQ_ROUTING_MSI;
+    kroute.flags = 0;
+    kroute.u.msi.address_lo = (uint32_t)msg.address;
+    kroute.u.msi.address_hi = msg.address >> 32;
+    kroute.u.msi.data = msg.data;
+
+    return kvm_update_routing_entry(s, &kroute);
+}
+
 static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int virq, bool assign)
 {
     struct kvm_irqfd irqfd = {
diff --git a/kvm.h b/kvm.h
index 37d1f81..5cefe3a 100644
--- a/kvm.h
+++ b/kvm.h
@@ -270,6 +270,7 @@ int kvm_set_ioeventfd_mmio(int fd, uint32_t adr, uint32_t val, bool assign,
 int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign);
 
 int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg);
+int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg);
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
 int kvm_irqchip_add_irqfd_notifier(KVMState *s, EventNotifier *n, int virq);
-- 
1.7.3.4
[Qemu-devel] [PATCH 3/4] kvm: i386: Add services required for PCI device assignment
From: Jan Kiszka jan.kis...@siemens.com

These helpers abstract the interaction of upcoming pci-assign with the
KVM kernel services. Put them under i386 only as other archs will
implement device pass-through via VFIO and not this classic interface.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 target-i386/kvm.c      |  141
 target-i386/kvm_i386.h |   22
 2 files changed, 163 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 696b14a..5e2d4f5 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -31,6 +31,7 @@
 #include "hw/apic.h"
 #include "ioport.h"
 #include "hyperv.h"
+#include "hw/pci.h"
 
 //#define DEBUG_KVM
 
@@ -2055,3 +2056,143 @@ void kvm_arch_init_irq_routing(KVMState *s)
     kvm_msi_via_irqfd_allowed = true;
     kvm_gsi_routing_allowed = true;
 }
+
+/* Classic KVM device assignment interface. Will remain x86 only. */
+int kvm_device_pci_assign(KVMState *s, PCIHostDeviceAddress *dev_addr,
+                          uint32_t flags, uint32_t *dev_id)
+{
+    struct kvm_assigned_pci_dev dev_data = {
+        .segnr = dev_addr->domain,
+        .busnr = dev_addr->bus,
+        .devfn = PCI_DEVFN(dev_addr->slot, dev_addr->function),
+        .flags = flags,
+    };
+    int ret;
+
+    dev_data.assigned_dev_id =
+        (dev_addr->domain << 16) | (dev_addr->bus << 8) | dev_data.devfn;
+
+    ret = kvm_vm_ioctl(s, KVM_ASSIGN_PCI_DEVICE, &dev_data);
+    if (ret < 0) {
+        return ret;
+    }
+
+    *dev_id = dev_data.assigned_dev_id;
+
+    return 0;
+}
+
+int kvm_device_pci_deassign(KVMState *s, uint32_t dev_id)
+{
+    struct kvm_assigned_pci_dev dev_data = {
+        .assigned_dev_id = dev_id,
+    };
+
+    return kvm_vm_ioctl(s, KVM_DEASSIGN_PCI_DEVICE, &dev_data);
+}
+
+static int kvm_assign_irq_internal(KVMState *s, uint32_t dev_id,
+                                   uint32_t irq_type, uint32_t guest_irq)
+{
+    struct kvm_assigned_irq assigned_irq = {
+        .assigned_dev_id = dev_id,
+        .guest_irq = guest_irq,
+        .flags = irq_type,
+    };
+
+    if (kvm_check_extension(s, KVM_CAP_ASSIGN_DEV_IRQ)) {
+        return kvm_vm_ioctl(s, KVM_ASSIGN_DEV_IRQ, &assigned_irq);
+    } else {
+        return kvm_vm_ioctl(s, KVM_ASSIGN_IRQ, &assigned_irq);
+    }
+}
+
+int kvm_device_intx_assign(KVMState *s, uint32_t dev_id, bool use_host_msi,
+                           uint32_t guest_irq)
+{
+    uint32_t irq_type = KVM_DEV_IRQ_GUEST_INTX |
+        (use_host_msi ? KVM_DEV_IRQ_HOST_MSI : KVM_DEV_IRQ_HOST_INTX);
+
+    return kvm_assign_irq_internal(s, dev_id, irq_type, guest_irq);
+}
+
+int kvm_device_intx_set_mask(KVMState *s, uint32_t dev_id, bool masked)
+{
+    struct kvm_assigned_pci_dev dev_data = {
+        .assigned_dev_id = dev_id,
+        .flags = masked ? KVM_DEV_ASSIGN_MASK_INTX : 0,
+    };
+
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_INTX_MASK, &dev_data);
+}
+
+static int kvm_deassign_irq_internal(KVMState *s, uint32_t dev_id,
+                                     uint32_t type)
+{
+    struct kvm_assigned_irq assigned_irq = {
+        .assigned_dev_id = dev_id,
+        .flags = type,
+    };
+
+    return kvm_vm_ioctl(s, KVM_DEASSIGN_DEV_IRQ, &assigned_irq);
+}
+
+int kvm_device_intx_deassign(KVMState *s, uint32_t dev_id, bool use_host_msi)
+{
+    return kvm_deassign_irq_internal(s, dev_id, KVM_DEV_IRQ_GUEST_INTX |
+        (use_host_msi ? KVM_DEV_IRQ_HOST_MSI : KVM_DEV_IRQ_HOST_INTX));
+}
+
+int kvm_device_msi_assign(KVMState *s, uint32_t dev_id, int virq)
+{
+    return kvm_assign_irq_internal(s, dev_id, KVM_DEV_IRQ_HOST_MSI |
+                                              KVM_DEV_IRQ_GUEST_MSI, virq);
+}
+
+int kvm_device_msi_deassign(KVMState *s, uint32_t dev_id)
+{
+    return kvm_deassign_irq_internal(s, dev_id, KVM_DEV_IRQ_GUEST_MSI |
+                                                KVM_DEV_IRQ_HOST_MSI);
+}
+
+bool kvm_device_msix_supported(KVMState *s)
+{
+    /* The kernel lacks a corresponding KVM_CAP, so we probe by calling
+     * KVM_ASSIGN_SET_MSIX_NR with an invalid parameter. */
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, NULL) == -EFAULT;
+}
+
+int kvm_device_msix_init_vectors(KVMState *s, uint32_t dev_id,
+                                 uint32_t nr_vectors)
+{
+    struct kvm_assigned_msix_nr msix_nr = {
+        .assigned_dev_id = dev_id,
+        .entry_nr = nr_vectors,
+    };
+
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, &msix_nr);
+}
+
+int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t vector,
+                               int virq)
+{
+    struct kvm_assigned_msix_entry msix_entry = {
+        .assigned_dev_id = dev_id,
+        .gsi = virq,
+        .entry = vector,
+    };
+
+    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, &msix_entry);
+}
+
+int kvm_device_msix_assign(KVMState *s, uint32_t dev_id)
+{
+    return kvm_assign_irq_internal(s, dev_id, KVM_DEV_IRQ_HOST_MSIX |
+
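[Editor's note] kvm_device_pci_assign() in the patch above derives the device id by packing the PCI domain, bus and devfn into a single 32-bit value. The arithmetic can be checked in isolation with the sketch below. pci_devfn() mirrors the usual PCI_DEVFN() encoding (5-bit slot, 3-bit function); assigned_dev_id() is a hypothetical helper name introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same encoding as the PCI_DEVFN() macro: slot in bits 7..3,
 * function in bits 2..0. */
static uint8_t pci_devfn(unsigned slot, unsigned function)
{
    return (uint8_t)(((slot & 0x1f) << 3) | (function & 0x07));
}

/* Pack domain, bus and devfn the way the patch computes
 * assigned_dev_id: domain in bits 31..16, bus in 15..8, devfn in 7..0. */
static uint32_t assigned_dev_id(uint16_t domain, uint8_t bus,
                                unsigned slot, unsigned function)
{
    assert(slot < 32 && function < 8);
    return ((uint32_t)domain << 16) | ((uint32_t)bus << 8) |
           pci_devfn(slot, function);
}
```

For a host device at 0001:02:10.3 this yields 0x00010283, so the id can be decoded back into its address parts when reading debug output.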
Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycle problem
On 25/08/2012 09:42, liu ping fan wrote:
>> I don't see why MMIO dispatch should hold the IDEBus ref rather than
>> the PCIIDEState.
>
> When transfer memory_region_init_io() 3rd para from void* opaque to
> Object* obj, the obj : opaque is not neccessary 1:1 map. For such
> situation, in order to let MemoryRegionOps tell between them, we
> should pass PCIIDEState->bus[0], bus[1] separately.

The rule should be that the obj is the object that you want referenced,
and that should be the PCIIDEState.

But this is anyway moot because it only applies to objects that are
converted to use unlocked dispatch. This likely will not be the case
for IDE.

Paolo

>> In the case of the PIIX, the BARs are set up by the PCIIDEState in
>> bmdma_setup_bar (called by bmdma_setup_bar).
>
> Supposing we have convert PCIIDEState->bmdma[0]/[1] to Object. And in
> mmio-dispatch, object_ref will impose on bmdma[0]/[1], but this can
> not prevent PCIIDEState->refcnt=0, and then the whole object
> disappear!
[Qemu-devel] [PATCH V6 0/2] Add JSON output to qemu-img info
This patchset adds a JSON output mode to the qemu-img info command.
It's a rewrite from scratch of the original patchset by Wenchao Xia,
following Anthony Liguori's advice on JSON formatting.

The --output=(json|human) option is now mandatory on the command line.

Benoît Canet (3):
  qapi: Add SnapshotInfo.
  qapi: Add ImageInfo.
  qemu-img: Add json output option to the info command.

in v2:

eblake:
    - make some fields optional
    - squash the two qapi patches together
    - fix a typo on vm_clock_nsec

bcanet:
    - fix a potential memory leak

in v3:

lcapitulino:
    - remove unneeded test
    - put '\n' at the end of json in printf statement
    - drop the unneeded head pointer in collect_snapshots

in v4:

Wenchao Xia & Kevin Wolf:
    - Refactor to separate ImageInfo collection from human printing.

Kevin Wolf:
    - Use --output=(json|human).
    - make the two choices exclusive and print a message if none is
      specified.
    - cosmetic '=' alignment in collect_snapshots.

Benoît Canet:
    - add full-backing-filename to the ImageInfo structure
      (needed for human printing)
    - make ImageInfo->actual_size optional depending on the context.

in v5:

Eric Blake:
    - use a constant for getopt parsing to avoid future short options
      collision.
    - make the command default to --output=human.
    - fix spurious whitespace change.
    - split vm-clock-nsec in two fields vm-clock-sec and vm-clock-nsec.
    - declare JSON structure as Since 1.3

in v6:

Blue Swirl:
    - Add missing const in getopt structure declaration.

Eric Blake:
    - Remove spurious undef.
    - Use an enum instead of two booleans.

Benoît Canet (2):
  qapi: Add SnapshotInfo and ImageInfo.
  qemu-img: Add json output option to the info command.

 Makefile         |    3 +-
 qapi-schema.json |   64 ++
 qemu-img.c       |  259 +-
 3 files changed, 282 insertions(+), 44 deletions(-)

-- 
1.7.9.5
[Qemu-devel] [PATCH V6 2/2] qemu-img: Add json output option to the info command.
This option --output=[human|json] make qemu-img info output on
human or JSON representation at the choice of the user.

example:
{
    "snapshots": [
        {
            "vm-clock-nsec": 637102488,
            "name": "vm-20120821145509",
            "date-sec": 1345553709,
            "date-nsec": 220289000,
            "vm-clock-sec": 20,
            "id": "1",
            "vm-state-size": 96522745
        },
        {
            "vm-clock-nsec": 28210866,
            "name": "vm-20120821154059",
            "date-sec": 1345556459,
            "date-nsec": 171392000,
            "vm-clock-sec": 46,
            "id": "2",
            "vm-state-size": 101208714
        }
    ],
    "virtual-size": 1073741824,
    "filename": "snap.qcow2",
    "cluster-size": 65536,
    "format": "qcow2",
    "actual-size": 985587712,
    "dirty-flag": false
}

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 Makefile   |    3 +-
 qemu-img.c |  259 ++--
 2 files changed, 218 insertions(+), 44 deletions(-)

diff --git a/Makefile b/Makefile
index ab82ef3..9ba064b 100644
--- a/Makefile
+++ b/Makefile
@@ -160,7 +160,8 @@ tools-obj-y = $(oslib-obj-y) $(trace-obj-y) qemu-tool.o qemu-timer.o \
 	iohandler.o cutils.o iov.o async.o
 tools-obj-$(CONFIG_POSIX) += compatfd.o
 
-qemu-img$(EXESUF): qemu-img.o $(tools-obj-y) $(block-obj-y)
+qemu-img$(EXESUF): qemu-img.o $(tools-obj-y) $(block-obj-y) $(qapi-obj-y) \
+	qapi-visit.o qapi-types.o
 qemu-nbd$(EXESUF): qemu-nbd.o $(tools-obj-y) $(block-obj-y)
 qemu-io$(EXESUF): qemu-io.o cmd.o $(tools-obj-y) $(block-obj-y)
 
diff --git a/qemu-img.c b/qemu-img.c
index 80cfb9b..fe4a4fc 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -21,12 +21,16 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
+#include "qapi-visit.h"
+#include "qapi/qmp-output-visitor.h"
+#include "qjson.h"
 #include "qemu-common.h"
 #include "qemu-option.h"
 #include "qemu-error.h"
 #include "osdep.h"
 #include "sysemu.h"
 #include "block_int.h"
+#include <getopt.h>
 #include <stdio.h>
 
 #ifdef _WIN32
@@ -84,6 +88,7 @@ static void help(void)
            "  '-p' show progress of command (only certain commands)\n"
            "  '-S' indicates the consecutive number of bytes that must contain only zeros\n"
            "       for qemu-img to create a sparse image during conversion\n"
+           "  '--output' takes the format in which the output must be done (human or json)\n"
            "\n"
            "Parameters to check subcommand:\n"
            "  '-r' tries to repair any inconsistencies that are found during the check.\n"
@@ -1102,21 +1107,196 @@ static void dump_snapshots(BlockDriverState *bs)
     g_free(sn_tab);
 }
 
-static int img_info(int argc, char **argv)
+static void collect_snapshots(BlockDriverState *bs , ImageInfo *info)
+{
+    int i, sn_count;
+    QEMUSnapshotInfo *sn_tab = NULL;
+    SnapshotInfoList *info_list, *cur_item = NULL;
+    sn_count = bdrv_snapshot_list(bs, &sn_tab);
+
+    for (i = 0; i < sn_count; i++) {
+        info->has_snapshots = true;
+        info_list = g_new0(SnapshotInfoList, 1);
+
+        info_list->value                = g_new0(SnapshotInfo, 1);
+        info_list->value->id            = g_strdup(sn_tab[i].id_str);
+        info_list->value->name          = g_strdup(sn_tab[i].name);
+        info_list->value->vm_state_size = sn_tab[i].vm_state_size;
+        info_list->value->date_sec      = sn_tab[i].date_sec;
+        info_list->value->date_nsec     = sn_tab[i].date_nsec;
+        info_list->value->vm_clock_sec  = sn_tab[i].vm_clock_nsec / 1000000000;
+        info_list->value->vm_clock_nsec = sn_tab[i].vm_clock_nsec % 1000000000;
+
+        /* XXX: waiting for the qapi to support GSList */
+        if (!cur_item) {
+            info->snapshots = cur_item = info_list;
+        } else {
+            cur_item->next = info_list;
+            cur_item = info_list;
+        }
+
+    }
+
+    g_free(sn_tab);
+}
+
+static void dump_json_image_info(ImageInfo *info)
+{
+    Error *errp = NULL;
+    QString *str;
+    QmpOutputVisitor *ov = qmp_output_visitor_new();
+    QObject *obj;
+    visit_type_ImageInfo(qmp_output_get_visitor(ov),
+                         &info, NULL, &errp);
+    obj = qmp_output_get_qobject(ov);
+    str = qobject_to_json_pretty(obj);
+    assert(str != NULL);
+    printf("%s\n", qstring_get_str(str));
+    qobject_decref(obj);
+    qmp_output_visitor_cleanup(ov);
+    QDECREF(str);
+}
+
+static void collect_backing_file_format(ImageInfo *info, char *filename)
+{
+    BlockDriverState *bs = NULL;
+    bs = bdrv_new_open(filename, NULL,
+                       BDRV_O_FLAGS | BDRV_O_NO_BACKING);
+    if (!bs) {
+        return;
+    }
+    info->backing_filename_format =
+        g_strdup(bdrv_get_format_name(bs));
+    bdrv_delete(bs);
+    info->has_backing_filename_format = true;
+}
+
+static void collect_image_info(BlockDriverState *bs,
+                               ImageInfo *info,
+                               const char
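[Editor's note] collect_snapshots() in the patch above does two things per snapshot: it splits the VM clock, stored in nanoseconds, into a seconds part and a nanoseconds remainder, and it appends a node to a singly linked list through a tail pointer so the list keeps the input order. The stand-alone sketch below shows both steps without the QEMU types. SnapNode and collect_clock_list() are hypothetical names, and the 10^9 divisor is the usual nanoseconds-per-second constant.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical, reduced stand-in for SnapshotInfoList. */
typedef struct SnapNode {
    uint64_t clock_sec;    /* whole seconds of VM clock */
    uint64_t clock_nsec;   /* remaining nanoseconds, always < 10^9 */
    struct SnapNode *next;
} SnapNode;

/* Build a list in input order by appending at the tail, like the
 * cur_item pointer in collect_snapshots(). */
static SnapNode *collect_clock_list(const uint64_t *clock_ns, int count)
{
    SnapNode *head = NULL, *tail = NULL;

    for (int i = 0; i < count; i++) {
        SnapNode *node = calloc(1, sizeof(*node));
        if (!node) {
            abort();
        }
        node->clock_sec = clock_ns[i] / NSEC_PER_SEC;
        node->clock_nsec = clock_ns[i] % NSEC_PER_SEC;

        if (!tail) {
            head = tail = node;    /* first element becomes the head */
        } else {
            tail->next = node;     /* later elements go after the tail */
            tail = node;
        }
    }
    return head;
}

static void free_clock_list(SnapNode *head)
{
    while (head) {
        SnapNode *next = head->next;
        free(head);
        head = next;
    }
}
```

With a raw clock value of 20637102488 ns, the split gives 20 s and 637102488 ns, which is the same pair that appears as vm-clock-sec and vm-clock-nsec for the first snapshot in the JSON example of the commit message.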
[Qemu-devel] [PATCH V6 1/2] qapi: Add SnapshotInfo and ImageInfo.
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 qapi-schema.json |   64 ++
 1 file changed, 64 insertions(+)

diff --git a/qapi-schema.json b/qapi-schema.json
index a92adb1..ffe3a0a 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -126,6 +126,70 @@
             'running', 'save-vm', 'shutdown', 'suspended', 'watchdog' ] }
 
 ##
+# @SnapshotInfo
+#
+# @id: unique snapshot id
+#
+# @name: user chosen name
+#
+# @vm-state-size: size of the VM state
+#
+# @date-sec: UTC date of the snapshot in seconds
+#
+# @date-nsec: fractional part in nano seconds to be used with date-sec
+#
+# @vm-clock-sec: VM clock relative to boot in seconds
+#
+# @vm-clock-nsec: fractional part in nano seconds to be used with vm-clock-sec
+#
+# Since: 1.3
+#
+##
+
+{ 'type': 'SnapshotInfo',
+  'data': { 'id': 'str', 'name': 'str', 'vm-state-size': 'int',
+            'date-sec': 'int', 'date-nsec': 'int',
+            'vm-clock-sec': 'int', 'vm-clock-nsec': 'int' } }
+
+##
+# @ImageInfo:
+#
+# Information about a QEMU image file
+#
+# @filename: name of the image file
+#
+# @format: format of the image file
+#
+# @virtual-size: maximum capacity in bytes of the image
+#
+# @actual-size: #optional actual size on disk in bytes of the image
+#
+# @dirty-flag: #optional true if image is not cleanly closed
+#
+# @cluster-size: #optional size of a cluster in bytes
+#
+# @encrypted: #optional true if the image is encrypted
+#
+# @backing-filename: #optional name of the backing file
+#
+# @full-backing-filename: #optional full path of the backing file
+#
+# @backing-filename-format: #optional the format of the backing file
+#
+# @snapshots: #optional list of VM snapshots
+#
+# Since: 1.3
+#
+##
+
+{ 'type': 'ImageInfo',
+  'data': {'filename': 'str', 'format': 'str', '*dirty-flag': 'bool',
+           '*actual-size': 'int', 'virtual-size': 'int',
+           '*cluster-size': 'int', '*encrypted': 'bool',
+           '*backing-filename': 'str', '*full-backing-filename': 'str',
+           '*backing-filename-format': 'str', '*snapshots': ['SnapshotInfo'] } }
+
+##
 # @StatusInfo:
 #
 # Information about VCPU run state
-- 
1.7.9.5
Re: [Qemu-devel] [PATCH 8/9] qdev: make qdev_set_parent_bus() just set a link property
On Sun, Aug 26, 2012 at 11:51 PM, Anthony Liguori aligu...@us.ibm.com wrote:
> Also make setting the link to NULL break the bus link
>
> Signed-off-by: Anthony Liguori aligu...@us.ibm.com
> ---
>  hw/qdev.c |   48 ++--
>  1 files changed, 42 insertions(+), 6 deletions(-)
>
> diff --git a/hw/qdev.c b/hw/qdev.c
> index 86e1337..525a0cb 100644
> --- a/hw/qdev.c
> +++ b/hw/qdev.c
> @@ -100,8 +100,7 @@ static void bus_add_child(BusState *bus, DeviceState *child)
>
>  void qdev_set_parent_bus(DeviceState *dev, BusState *bus)
>  {
> -    dev->parent_bus = bus;
> -    bus_add_child(bus, dev);
> +    object_property_set_link(OBJECT(dev), OBJECT(bus), "parent_bus", NULL);
>  }
>
>  /* Create a new device. This only initializes the device state structure
> @@ -241,8 +240,8 @@ void qbus_reset_all_fn(void *opaque)
>  /* can be used as ->unplug() callback for the simple cases */
>  int qdev_simple_unplug_cb(DeviceState *dev)
>  {
> -    /* just zap it */
> -    qdev_free(dev);
> +    /* Unplug from parent bus via a forced eject */
> +    qdev_set_parent_bus(dev, NULL);

I think it is more reliable to remove the reference property(child,
link) before object_finialize(). So when uplug-finish, we delete all
the refers: bus-child, bus-child by _del_property not using
_set_property.

>      return 0;
>  }
>
> @@ -646,6 +645,40 @@ void qdev_property_add_static(DeviceState *dev, Property *prop,
>      assert_no_error(local_err);
>  }
>
> +static void qdev_set_link_property(Object *obj, Visitor *v, void *opaque,
> +                                   const char *name, Error **errp)
> +{
> +    DeviceState *dev = DEVICE(obj);
> +    BusState *parent_bus = dev->parent_bus;
> +
> +    object_set_link_property(obj, v, opaque, name, errp);
> +
> +    if (parent_bus) {
> +        bus_remove_child(parent_bus, dev);
> +    }
> +
> +    if (dev->parent_bus) {
> +        bus_add_child(dev->parent_bus, dev);
> +    }
> +
> +    if (!dev->parent_bus) {
> +        notifier_list_notify(&dev->eject_notifier, dev);
> +    }
> +}
> +
> +static void qdev_release_link_property(Object *obj, const char *name,
> +                                       void *opaque)
> +{
> +    DeviceState *dev = DEVICE(obj);
> +
> +    if (dev->parent_bus) {
> +        bus_remove_child(dev->parent_bus, dev);
> +        object_unref(OBJECT(dev->parent_bus));
> +    }
> +
> +    dev->parent_bus = NULL;
> +}
> +
>  static void device_initfn(Object *obj)
>  {
>      DeviceState *dev = DEVICE(obj);
> @@ -670,8 +703,11 @@ static void device_initfn(Object *obj)
>      } while (class != object_class_by_name(TYPE_DEVICE));
>      qdev_prop_set_globals(dev);
>
> -    object_property_add_link(OBJECT(dev), "parent_bus", TYPE_BUS,
> -                             (Object **)&dev->parent_bus, NULL);
> +    object_property_add(OBJECT(dev), "parent_bus", "link<" TYPE_BUS ">",
> +                        object_get_link_property,
> +                        qdev_set_link_property,
> +                        qdev_release_link_property,
> +                        &dev->parent_bus, NULL);
>      notifier_list_init(&dev->eject_notifier);
>  }
>
> --
> 1.7.5.4
Re: [Qemu-devel] [RFC PATCH 0/9] qom: improve reference counting and hotplug
On Sun, Aug 26, 2012 at 11:51 PM, Anthony Liguori aligu...@us.ibm.com wrote: Right now, you need to pair up object_new with object_delete. This is impractical when using reference counting because we would like to ensure that object_unref() also frees memory when needed. The first few patches fix this problem by introducing a release callback so that objects that need special release behavior (i.e. g_free) can do that. Since link and child properties all hold references, in order to actually free an object, we need to break those links. User created devices end up as children of a container. But child properties cannot be removed which means there's no obvious way to remove the reference and ultimately free the object. Why? Since we call _add_child() in qdev_device_add(), why can not we call object_property_del_child() for qmp_device_del(). Could you explain it more detail? We introduce the concept of nullable child properties to solve this. This is a child property that can be broken by writing NULL to the child link. Today we set all /peripheral* children to be nullable so that they can be deleted by management tools. In terms of modeling hotplug, we represent unplug by removing the object from the parent bus. We need to register a notifier for when this happens so that we can also remove the parent's child property to ultimately release the object. Putting it all together, we have: 1) qmp_device_del will issue a callback to a device. The default callback will do a forced eject (which means writing NULL to the parent_bus link). 2) PCI hotplug is a bit more sophisticated in that it waits for the guest to do the ejection. 3) qmp_device_del will register an eject notifier such that the device gets completely removed. There's a slightly change in behavior here. A device is not automatically destroyed based on a guest initiated eject. A management tool must explicitly break the parent's link to the child in order for the device to disappear completely. 
device_del behaves exactly as it does today though. This is an RFC. I've tested the series quite a lot (it was hard to get the reference counting right) but not enough to apply. I also don't think the series is quite split right and may not bisect cleanly. I also want to write up a document describing object life cycle since admittedly the above is probably not that easy to follow. I wanted to share this now though because it works and I think the concepts are right.
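For readers following along, the release-callback and nullable-child ideas from the cover letter can be sketched in a few lines of plain C. This is only a toy model under my own assumptions: Obj, obj_new(), obj_ref(), obj_unref(), obj_set_child() and released_count are hypothetical names invented for illustration, not the QOM API and not the code in this series.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature object, NOT QEMU's Object type. */
typedef struct Obj Obj;
struct Obj {
    unsigned ref;               /* reference count */
    Obj *child;                 /* child link, holds one reference, may be NULL */
    void (*release)(Obj *obj);  /* called when the last reference goes away */
};

static int released_count;      /* instrumentation for this example only */

/* Release callback for heap-allocated objects (the "g_free" case). */
static void obj_release_heap(Obj *obj)
{
    released_count++;
    free(obj);
}

static Obj *obj_new(void)
{
    Obj *obj = calloc(1, sizeof(*obj));

    if (!obj) {
        abort();
    }
    obj->ref = 1;
    obj->release = obj_release_heap;
    return obj;
}

static void obj_ref(Obj *obj)
{
    obj->ref++;
}

/* Dropping the last reference breaks the child link, then releases. */
static void obj_unref(Obj *obj)
{
    assert(obj->ref > 0);
    if (--obj->ref == 0) {
        if (obj->child) {
            obj_unref(obj->child);
        }
        if (obj->release) {
            obj->release(obj);
        }
    }
}

/* "Nullable" child link: writing NULL breaks the link and drops its ref. */
static void obj_set_child(Obj *parent, Obj *child)
{
    if (child) {
        obj_ref(child);
    }
    if (parent->child) {
        obj_unref(parent->child);
    }
    parent->child = child;
}
```

In this model a device that is only referenced by its container stays alive until someone calls obj_set_child(container, NULL), which mirrors the point above that a management tool has to break the parent's link explicitly before the object is freed.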
Re: [Qemu-devel] [RFC][PATCH v4 3/3] tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
On 2012-07-29 00:39, Yeongkyoon Lee wrote:
On 2012-07-25 23:00, Richard Henderson wrote:
On 07/25/2012 12:35 AM, Yeongkyoon Lee wrote:

+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
+/* Macros/structures for qemu_ld/st IR code optimization:
+   TCG_MAX_HELPER_LABELS is defined as same as OPC_BUF_SIZE in exec-all.h. */
+#define TCG_MAX_QEMU_LDST 640

Why statically size this ...

This just followed the other TCG's code style, the allocation of the labels of TCGContext in tcg.c.

+/* labels info for qemu_ld/st IRs
+   The labels help to generate TLB miss case codes at the end of TB */
+TCGLabelQemuLdst *qemu_ldst_labels;

... and then allocate the array dynamically?

ditto.

+/* jne slow_path */
+/* XXX: How to avoid using OPC_JCC_long for peephole optimization? */
+tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);

You can't, not and maintain the code-generate-until-address-reached exception invariant.

+#ifndef CONFIG_QEMU_LDST_OPTIMIZATION
 uint8_t __ldb_mmu(target_ulong addr, int mmu_idx);
 void __stb_mmu(target_ulong addr, uint8_t val, int mmu_idx);
 uint16_t __ldw_mmu(target_ulong addr, int mmu_idx);
@@ -28,6 +30,30 @@ void __stl_cmmu(target_ulong addr, uint32_t val, int mmu_idx);
 uint64_t __ldq_cmmu(target_ulong addr, int mmu_idx);
 void __stq_cmmu(target_ulong addr, uint64_t val, int mmu_idx);
 #else
+/* Extended versions of MMU helpers for qemu_ld/st optimization.
+   The additional argument is a host code address accessing guest memory */
+uint8_t ext_ldb_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);

Don't tie LDST_OPTIMIZATION directly to the extended function calls. For a host supporting predication, like ARM, the best code sequence may look like

(1) TLB check
(2) If hit, load value from memory
(3) If miss, call miss case (5)
(4) ... next code ...
...
(5) Load call parameters
(6) Tail call (aka jump) to MMU helper

so that (a) we need not explicitly load the address of (3) by hand for your RA parameter and (b) the mmu helper returns directly to (4).

r~

The difference between current HEAD and the code sequence you described is, I think, code locality. My LDST_OPTIMIZATION patches enhance the code locality and also remove one jump. It shows about a 4% rise in CoreMark performance on an x86 host which supports predication like ARM. Probably, the performance enhancement for AREG0 cases might be even larger. I'm not sure where the performance enhancement comes from now, and I'll check it with some tests later. In my humble opinion, there is nothing to lose with LDST_OPTIMIZATION except for implicitly adding one argument to the MMU helpers, which doesn't look so critical. What is your opinion? Thanks.

It's been a long time. I've tested the performance effect of the one-jump difference on the fast qemu_ld/st path (TLB hit). The result shows a 3.6% CoreMark improvement from removing one jump, with slow paths generated at the end of the block in both cases. That means removing one jump accounts for the majority of the performance improvement from LDST_OPTIMIZATION. As a result, the extended MMU helper functions are needed to attain that performance gain, and those extended functions are used only implicitly.

BTW, who will finally confirm my patches? I have sent four versions of my patches, in which I have applied all the reasonable feedback from this community. Currently, v4 is the final candidate, though it might need a merge with the latest HEAD because it was sent one month ago. Thanks.
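To make the "slow paths at the end of a block" idea in this thread concrete, here is a toy model in plain C. It is only a sketch under my own assumptions: Gen, Op, LdstLabel, gen_load() and gen_finalize() are hypothetical names and the "ops" are abstract tokens, not host instructions and not the TCG API from the patch. It shows the bookkeeping only: each load emits its fast path inline, records a label, and the miss handlers are appended once the block is finished, so a TLB hit falls through without a taken jump.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_OPS = 64, MAX_LDST_LABELS = 16 };

typedef enum {
    OP_TLB_CHECK,   /* compare address against the TLB entry */
    OP_JNE_SLOW,    /* forward jump taken only on a TLB miss */
    OP_FAST_LOAD,   /* inline load, reached by fall-through on a hit */
    OP_OTHER,       /* any unrelated op in the block */
    OP_SLOW_CALL,   /* out-of-line call to the MMU helper */
    OP_JMP_BACK     /* return to the op after the fast load */
} OpKind;

typedef struct {
    OpKind kind;
    int target;     /* op index for jumps, -1 otherwise */
} Op;

/* One pending slow path, recorded while the fast path is emitted. */
typedef struct {
    int jne_index;      /* jump to patch once the slow path exists */
    int resume_index;   /* where the slow path jumps back to */
} LdstLabel;

typedef struct {
    Op ops[MAX_OPS];
    int nb_ops;
    LdstLabel labels[MAX_LDST_LABELS];
    int nb_labels;
} Gen;

static int emit(Gen *g, OpKind kind)
{
    assert(g->nb_ops < MAX_OPS);
    g->ops[g->nb_ops].kind = kind;
    g->ops[g->nb_ops].target = -1;
    return g->nb_ops++;
}

/* Emit the fast path of a guest load and remember its slow path. */
static void gen_load(Gen *g)
{
    int jne;

    emit(g, OP_TLB_CHECK);
    jne = emit(g, OP_JNE_SLOW);     /* target is patched in gen_finalize() */
    emit(g, OP_FAST_LOAD);

    assert(g->nb_labels < MAX_LDST_LABELS);
    g->labels[g->nb_labels].jne_index = jne;
    g->labels[g->nb_labels].resume_index = g->nb_ops;
    g->nb_labels++;
}

/* At the end of the block, append all recorded slow paths. */
static void gen_finalize(Gen *g)
{
    int i;

    for (i = 0; i < g->nb_labels; i++) {
        int slow = emit(g, OP_SLOW_CALL);
        int back = emit(g, OP_JMP_BACK);

        g->ops[g->labels[i].jne_index].target = slow;
        g->ops[back].target = g->labels[i].resume_index;
    }
    g->nb_labels = 0;
}
```

The fixed-size labels array in this sketch mirrors the statically sized table questioned in the review; a real implementation could size it differently.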
Re: [Qemu-devel] [PATCH 9/9] hotplug: refactor hotplug to leverage new QOM functions
On Sun, Aug 26, 2012 at 11:51 PM, Anthony Liguori aligu...@us.ibm.com wrote: 1) DeviceState::unplug requests for an eject to happen - the default implementation is a forced eject 2) A bus can eject a device by setting the parent_bus to NULL - this detaches the device from the bus - this does *not* cause the device to disappear 3) The current implementation on unplug also registers an eject notifier - the eject notifier will detach the device the parent. This will cause the device to disappear 4) A purely guest initiated unplug will not delete a device but will cause the device to appear detached from the guests PoV. Signed-off-by: Anthony Liguori aligu...@us.ibm.com --- hw/acpi_piix4.c |3 ++- hw/pci.c | 10 +- hw/pcie.c |2 +- hw/qdev.c | 22 ++ hw/qdev.h |2 ++ hw/shpc.c |2 +- hw/xen_platform.c |2 +- 7 files changed, 38 insertions(+), 5 deletions(-) diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c index 72d6e5c..eac53b3 100644 --- a/hw/acpi_piix4.c +++ b/hw/acpi_piix4.c @@ -305,7 +305,8 @@ static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) if (pc-no_hotplug) { slot_free = false; } else { -qdev_free(qdev); +/* Force eject of device */ +qdev_set_parent_bus(qdev, NULL); Do we need to wait for guest's ACKs for all of this device's children. Then we can change this node's topology in the device tree. I think, we can color the current device as unplug_state=ACK, and then decide whether to detached it or not. Each unplug ack from guest, will first check 1st.whether current node can be release or not. 2nd. if can released, then go bottom-up through the device tree to check whether the upper device can be released or not. If the down device(devB) removal cause the up device(devA) becomes a leaf, then we can remove devA. A leaf device is defined as : has no BusState kids OR all of its BusState kids are empty. This method can avoid sudden remove. 
} } } diff --git a/hw/pci.c b/hw/pci.c index 437af70..cc555c2 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -46,6 +46,14 @@ static char *pcibus_get_dev_path(DeviceState *dev); static char *pcibus_get_fw_dev_path(DeviceState *dev); static int pcibus_reset(BusState *qbus); +static void pcibus_remove_child(BusState *bus, DeviceState *dev) +{ +PCIDevice *pci_dev = PCI_DEVICE(dev); +PCIBus *pci_bus = PCI_BUS(bus); + +pci_bus-devices[pci_dev-devfn] = NULL; +} + static Property pci_props[] = { DEFINE_PROP_PCI_DEVFN(addr, PCIDevice, devfn, -1), DEFINE_PROP_STRING(romfile, PCIDevice, romfile), @@ -65,6 +73,7 @@ static void pci_bus_class_init(ObjectClass *klass, void *data) k-get_dev_path = pcibus_get_dev_path; k-get_fw_dev_path = pcibus_get_fw_dev_path; k-reset = pcibus_reset; +k-remove_child = pcibus_remove_child; } static const TypeInfo pci_bus_info = { @@ -833,7 +842,6 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus, static void do_pci_unregister_device(PCIDevice *pci_dev) { qemu_free_irqs(pci_dev-irq); -pci_dev-bus-devices[pci_dev-devfn] = NULL; pci_config_free(pci_dev); } diff --git a/hw/pcie.c b/hw/pcie.c index 7c92f19..d10ffea 100644 --- a/hw/pcie.c +++ b/hw/pcie.c @@ -235,7 +235,7 @@ static int pcie_cap_slot_hotplug(DeviceState *qdev, PCI_EXP_SLTSTA_PDS); pcie_cap_slot_event(d, PCI_EXP_HP_EV_PDC); } else { -qdev_free(pci_dev-qdev); +qdev_set_parent_bus(DEVICE(pci_dev), NULL); pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_PDS); pcie_cap_slot_event(d, PCI_EXP_HP_EV_PDC); diff --git a/hw/qdev.c b/hw/qdev.c index 525a0cb..be41f00 100644 --- a/hw/qdev.c +++ b/hw/qdev.c @@ -62,6 +62,7 @@ static void qdev_property_add_legacy(DeviceState *dev, Property *prop, static void bus_remove_child(BusState *bus, DeviceState *child) { +BusClass *bc = BUS_GET_CLASS(bus); BusChild *kid; QTAILQ_FOREACH(kid, bus-children, sibling) { @@ -71,6 +72,11 @@ static void bus_remove_child(BusState *bus, DeviceState *child) snprintf(name, sizeof(name), 
child[%d], kid-index); QTAILQ_REMOVE(bus-children, kid, sibling); object_property_del(OBJECT(bus), name, NULL); + +if (bc-remove_child) { +bc-remove_child(bus, kid-child); +} + g_free(kid); return; } @@ -192,9 +198,20 @@ void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id, dev-alias_required_for_version = required_for_version; } +static void qdev_finish_unplug(Notifier *notifier, void *data) +{ +
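The leaf-device rule suggested in the reply above (a device is a leaf if it has no child buses, or all of its child buses are empty) could be sketched roughly as follows. This is one possible reading of the proposal, with hypothetical types and names (Dev, Bus, unplug_acked, dev_release_bottom_up()); it is not qdev code, and it assumes that a parent is only detached if it has itself been marked for unplug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_BUSES = 4, MAX_KIDS = 8 };

typedef struct Dev Dev;

/* Hypothetical stand-ins for BusState and DeviceState. */
typedef struct Bus {
    Dev *owner;                 /* device providing this bus, NULL for the root */
    Dev *kids[MAX_KIDS];
    int nb_kids;
} Bus;

struct Dev {
    Bus *parent_bus;
    Bus *child_buses[MAX_BUSES];
    int nb_buses;
    bool unplug_acked;          /* the "unplug_state=ACK" colour from the proposal */
};

static void bus_attach(Bus *bus, Dev *d)
{
    assert(bus->nb_kids < MAX_KIDS);
    bus->kids[bus->nb_kids++] = d;
    d->parent_bus = bus;
}

static void bus_detach(Bus *bus, Dev *d)
{
    int i;

    for (i = 0; i < bus->nb_kids; i++) {
        if (bus->kids[i] == d) {
            /* order of kids does not matter in this sketch */
            bus->kids[i] = bus->kids[--bus->nb_kids];
            break;
        }
    }
    d->parent_bus = NULL;
}

/* Leaf: no child buses, or every child bus is empty. */
static bool dev_is_leaf(const Dev *d)
{
    int i;

    for (i = 0; i < d->nb_buses; i++) {
        if (d->child_buses[i]->nb_kids > 0) {
            return false;
        }
    }
    return true;
}

/*
 * Called on each unplug ack. Detaches d if it is an acked leaf, then walks
 * up and detaches every ancestor that is acked and has become a leaf.
 * Returns the number of devices detached.
 */
static int dev_release_bottom_up(Dev *d)
{
    int released = 0;

    while (d && d->unplug_acked && d->parent_bus && dev_is_leaf(d)) {
        Bus *bus = d->parent_bus;

        bus_detach(bus, d);
        released++;
        d = bus->owner;
    }
    return released;
}
```

With devA providing a bus that holds devB, an ack for devA alone detaches nothing; once devB is acked as well, both are detached bottom-up.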
Re: [Qemu-devel] [PATCH v7 0/6] convert sendkey to qapi
On 20/08/12 23:08, Luiz Capitulino wrote:
On Mon, 20 Aug 2012 07:25:13 -0600 Eric Blake ebl...@redhat.com wrote:
On 08/19/2012 10:39 PM, Amos Kong wrote:

This series converted 'sendkey' command to qapi. The raw value in hexadecimal format is not supported by 'send-key' of qmp.

Are we still trying to get this into 1.2, or have we missed that deadline?

Too late for 1.2, IMO.

So I need to wait and repost a V8 (# Since: 1.3) after 1.2 is released?

-- 
Amos.
[Qemu-devel] [RFC V5 08/11] quorum: Add quorum mechanism.
Signed-off-by: Benoit Canet ben...@irqsave.net --- block/quorum.c | 222 +++- 1 file changed, 221 insertions(+), 1 deletion(-) diff --git a/block/quorum.c b/block/quorum.c index 791ef4a..3fa9d53 100644 --- a/block/quorum.c +++ b/block/quorum.c @@ -14,6 +14,20 @@ */ #include block_int.h +#include zlib.h + +typedef struct QuorumVoteItem { +int index; +QLIST_ENTRY(QuorumVoteItem) next; +} QuorumVoteItem; + +typedef struct QuorumVoteVersion { +unsigned long value; +int index; +int vote_count; +QLIST_HEAD(, QuorumVoteItem) items; +QLIST_ENTRY(QuorumVoteVersion) next; +} QuorumVoteVersion; typedef struct { BlockDriverState **bs; @@ -31,6 +45,10 @@ typedef struct QuorumSingleAIOCB { QuorumAIOCB *parent; } QuorumSingleAIOCB; +typedef struct QuorumVotes { +QLIST_HEAD(, QuorumVoteVersion) vote_list; +} QuorumVotes; + struct QuorumAIOCB { BlockDriverAIOCB common; BDRVQuorumState *bqs; @@ -48,6 +66,8 @@ struct QuorumAIOCB { int success_count; /* number of successfully completed AIOCB */ bool *finished; /* completion signal for cancel */ +QuorumVotes votes; + void (*vote)(QuorumAIOCB *acb); int vote_ret; }; @@ -204,6 +224,11 @@ static void quorum_aio_bh(void *opaque) } qemu_bh_delete(acb-bh); + +if (acb-vote_ret) { +ret = acb-vote_ret; +} + acb-common.cb(acb-common.opaque, ret); if (acb-finished) { *acb-finished = true; @@ -239,6 +264,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s, acb-nb_sectors = nb_sectors; acb-vote = NULL; acb-vote_ret = 0; +QLIST_INIT(acb-votes.vote_list); for (i = 0; i s-total; i++) { acb-aios[i].buf = NULL; @@ -266,10 +292,202 @@ static void quorum_aio_cb(void *opaque, int ret) return; } +/* Do the vote */ +if (acb-vote) { +acb-vote(acb); +} + acb-bh = qemu_bh_new(quorum_aio_bh, acb); qemu_bh_schedule(acb-bh); } +static void quorum_print_bad(QuorumAIOCB *acb, const char *filename) +{ +fprintf(stderr, quorum: corrected error in quorum file %s: sector_num=% +PRId64 nb_sectors=%i\n, filename, acb-sector_num, +acb-nb_sectors); +} + +static void 
quorum_print_failure(QuorumAIOCB *acb) +{ +fprintf(stderr, quorum: failure sector_num=% PRId64 nb_sectors=%i\n, +acb-sector_num, acb-nb_sectors); +} + +static void quorum_print_bad_versions(QuorumAIOCB *acb, + unsigned long checksum) +{ +QuorumVoteVersion *version; +QuorumVoteItem *item; +BDRVQuorumState *s = acb-bqs; + +QLIST_FOREACH(version, acb-votes.vote_list, next) { +if (version-value == checksum) { +continue; +} +QLIST_FOREACH(item, version-items, next) { +quorum_print_bad(acb, s-filenames[item-index]); +} +} +} + +static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source) +{ +int i; +assert(dest-niov == source-niov); +assert(dest-size == source-size); +for (i = 0; i source-niov; i++) { +assert(dest-iov[i].iov_len == source-iov[i].iov_len); +memcpy(dest-iov[i].iov_base, + source-iov[i].iov_base, + source-iov[i].iov_len); +} +} + +static void quorum_count_vote(QuorumVotes *votes, + unsigned long checksum, + int index) +{ +QuorumVoteVersion *v = NULL, *version = NULL; +QuorumVoteItem *item; + +/* look if we have something with this checksum */ +QLIST_FOREACH(v, votes-vote_list, next) { +if (v-value == checksum) { +version = v; +break; +} +} + +/* It's a version not yet in the list add it */ +if (!version) { +version = g_new0(QuorumVoteVersion, 1); +QLIST_INIT(version-items); +version-value = checksum; +version-index = index; +version-vote_count = 0; +QLIST_INSERT_HEAD(votes-vote_list, version, next); +} + +version-vote_count++; + +item = g_new0(QuorumVoteItem, 1); +item-index = index; +QLIST_INSERT_HEAD(version-items, item, next); +} + +static void quorum_free_vote_list(QuorumVotes *votes) +{ +QuorumVoteVersion *version, *next_version; +QuorumVoteItem *item, *next_item; + +QLIST_FOREACH_SAFE(version, votes-vote_list, next, next_version) { +QLIST_REMOVE(version, next); +QLIST_FOREACH_SAFE(item, version-items, next, next_item) { +QLIST_REMOVE(item, next); +g_free(item); +} +g_free(version); +} +} + +static unsigned long 
quorum_compute_checksum(QuorumAIOCB *acb, int i) +{ +int j; +unsigned long adler = adler32(0L, Z_NULL, 0); +QEMUIOVector *qiov = acb-qiovs[i]; + +for (j = 0; j qiov-niov; j++) { +adler = adler32(adler, +
Re: [Qemu-devel] [PATCH 1/2] migration: Allow the migrate command to work on file: urls
Adding Luiz to the thread since he is concerned by migration.

Luiz, do you have any hints on doing this properly?

Benoît

On Thursday 23 Aug 2012 at 13:34:01 (+0100), Daniel P. Berrange wrote:
On Thu, Aug 23, 2012 at 02:28:07PM +0200, Benoît Canet wrote:

Usage: (qemu) migrate file:/path/to/vm_statefile

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 migration-fd.c | 4 ++--
 migration.c    | 20 +++-
 migration.h    | 2 +-
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/migration-fd.c b/migration-fd.c
index 50138ed..d39e44a 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -73,9 +73,9 @@ static int fd_close(MigrationState *s)
     return 0;
 }
 
-int fd_start_outgoing_migration(MigrationState *s, const char *fdname)
+int fd_start_outgoing_migration(MigrationState *s, const char *fdname, int fd)
 {
-    s->fd = monitor_get_fd(cur_mon, fdname);
+    s->fd = fd ? fd : monitor_get_fd(cur_mon, fdname);
     if (s->fd == -1) {
         DPRINTF("fd_migration: invalid file descriptor identifier\n");
         goto err_after_get_fd;
diff --git a/migration.c b/migration.c
index 1edeec5..679847d 100644
--- a/migration.c
+++ b/migration.c
@@ -239,9 +239,14 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 static int migrate_fd_cleanup(MigrationState *s)
 {
     int ret = 0;
+    struct stat st;
 
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
 
+    if (!fstat(s->fd, &st) && S_ISREG(st.st_mode)) {
+        fsync(s->fd);
+    }
+
     if (s->file) {
         DPRINTF("closing file\n");
         ret = qemu_fclose(s->file);
@@ -475,6 +480,17 @@ void migrate_del_blocker(Error *reason)
     migration_blockers = g_slist_remove(migration_blockers, reason);
 }
 
+static int file_start_outgoing_migration(MigrationState *s,
+                                         const char *filename)
+{
+    int fd;
+    fd = open(filename, O_CREAT|O_TRUNC|O_WRONLY, S_IRUSR|S_IWUSR);
+    if (fd < 0) {
+        return -errno;
+    }
+    return fd_start_outgoing_migration(s, NULL, fd);

'fd_start_outgoing_migration' requires that the FD you give it supports non-blocking I/O. File descriptors opened from plain files or block devices do not honour that requirement. So this proposed code will cause the entire QEMU process to block while migration is taking place. This is why no one has ever implemented the 'file:' protocol in QEMU before.

To deal with this issue you'd either have to use the POSIX async I/O APIs (or QEMU's internal equivalent), or spawn a separate 'dd' helper process and give QEMU a pipe FD instead. The latter is what libvirt does to implement migrate to file.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
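As an illustration of the 'dd' helper approach mentioned above, the following is a minimal POSIX sketch. It is not what libvirt or QEMU actually does internally; spawn_dd_writer() is a hypothetical helper name, and the sketch assumes a 'dd' binary is available in PATH. The point is only that the caller ends up with the write end of a pipe, which, unlike a plain file descriptor, honours O_NONBLOCK, while the helper process does the blocking file writes.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: spawn "dd of=<path>" and return the write end of a
 * pipe connected to its stdin, set to non-blocking mode. The child's pid is
 * stored in *child so the caller can reap it. Returns -1 on error.
 */
static int spawn_dd_writer(const char *path, pid_t *child)
{
    int fds[2];
    int flags;
    pid_t pid;
    char of_arg[4096];

    if (snprintf(of_arg, sizeof(of_arg), "of=%s", path) >= (int)sizeof(of_arg)) {
        return -1;                      /* path too long for this sketch */
    }
    if (pipe(fds) < 0) {
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: read end of the pipe becomes stdin, then exec dd. */
        int devnull;

        if (fds[0] != STDIN_FILENO) {
            dup2(fds[0], STDIN_FILENO);
            close(fds[0]);
        }
        close(fds[1]);
        devnull = open("/dev/null", O_WRONLY);
        if (devnull >= 0) {
            dup2(devnull, STDERR_FILENO);   /* hide dd's transfer statistics */
        }
        execlp("dd", "dd", of_arg, (char *)NULL);
        _exit(127);                     /* exec failed */
    }

    /* Parent: keep only the write end and make it non-blocking. */
    close(fds[0]);
    flags = fcntl(fds[1], F_GETFL);
    if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[1]);                  /* child sees EOF and exits */
        waitpid(pid, NULL, 0);
        return -1;
    }

    *child = pid;
    return fds[1];
}
```

The returned descriptor could then be handed to code that expects a non-blocking fd; the caller closes it when done and calls waitpid() on the helper to learn whether the data reached the file.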
[Qemu-devel] [RFC V5 10/11] quorum: Add quorum_invalidate_cache().
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/quorum.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 09eed84..c9dcd9c 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -571,6 +571,16 @@ static int64_t quorum_getlength(BlockDriverState *bs)
     return value;
 }
 
+static void quorum_invalidate_cache(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->total; i++) {
+        bdrv_invalidate_cache(s->bs[i]);
+    }
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -585,6 +595,7 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_aio_readv     = quorum_aio_readv,
     .bdrv_aio_writev    = quorum_aio_writev,
+    .bdrv_invalidate_cache = quorum_invalidate_cache,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.7.9.5
[Qemu-devel] [RFC V5 04/11] quorum: Add quorum_aio_writev and its dependencies.
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/quorum.c | 112
 1 file changed, 112 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index b9fb2b9..cd11cfb 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -172,6 +172,116 @@ static void quorum_close(BlockDriverState *bs)
     g_free(s->bs);
 }
 
+static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+    QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
+    bool finished = false;
+
+    /* Wait for the request to finish */
+    acb->finished = &finished;
+    while (!finished) {
+        qemu_aio_wait();
+    }
+}
+
+static AIOPool quorum_aio_pool = {
+    .aiocb_size         = sizeof(QuorumAIOCB),
+    .cancel             = quorum_aio_cancel,
+};
+
+static void quorum_aio_bh(void *opaque)
+{
+    QuorumAIOCB *acb = opaque;
+    BDRVQuorumState *s = acb->bqs;
+    int ret;
+
+    ret = s->threshold <= acb->success_count ? 0 : -EIO;
+
+    qemu_bh_delete(acb->bh);
+    acb->common.cb(acb->common.opaque, ret);
+    if (acb->finished) {
+        *acb->finished = true;
+    }
+    g_free(acb->aios);
+    g_free(acb->qiovs);
+    qemu_aio_release(acb);
+}
+
+static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
+                                   BlockDriverState *bs,
+                                   QEMUIOVector *qiov,
+                                   int64_t sector_num,
+                                   int nb_sectors,
+                                   BlockDriverCompletionFunc *cb,
+                                   void *opaque)
+{
+    QuorumAIOCB *acb = qemu_aio_get(&quorum_aio_pool, bs, cb, opaque);
+    int i;
+
+    acb->aios = g_new0(QuorumSingleAIOCB, s->total);
+    acb->qiovs = g_new0(QEMUIOVector, s->total);
+
+    acb->bqs = s;
+    acb->qiov = qiov;
+    acb->bh = NULL;
+    acb->count = 0;
+    acb->success_count = 0;
+    acb->sector_num = sector_num;
+    acb->nb_sectors = nb_sectors;
+    acb->vote = NULL;
+    acb->vote_ret = 0;
+
+    for (i = 0; i < s->total; i++) {
+        acb->aios[i].buf = NULL;
+        acb->aios[i].ret = 0;
+        acb->aios[i].parent = acb;
+    }
+
+    return acb;
+}
+
+static void quorum_aio_cb(void *opaque, int ret)
+{
+    QuorumSingleAIOCB *sacb = opaque;
+    QuorumAIOCB *acb = sacb->parent;
+    BDRVQuorumState *s = acb->bqs;
+
+    sacb->ret = ret;
+    acb->count++;
+    if (ret == 0) {
+        acb->success_count++;
+    }
+    assert(acb->count <= s->total);
+    assert(acb->success_count <= s->total);
+    if (acb->count < s->total) {
+        return;
+    }
+
+    acb->bh = qemu_bh_new(quorum_aio_bh, acb);
+    qemu_bh_schedule(acb->bh);
+}
+
+static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
+                                           int64_t sector_num,
+                                           QEMUIOVector *qiov,
+                                           int nb_sectors,
+                                           BlockDriverCompletionFunc *cb,
+                                           void *opaque)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num, nb_sectors,
+                                      cb, opaque);
+    int i;
+
+    for (i = 0; i < s->total; i++) {
+        acb->aios[i].aiocb = bdrv_aio_writev(s->bs[i], sector_num, qiov,
+                                             nb_sectors, quorum_aio_cb,
+                                             &acb->aios[i]);
+    }
+
+    return &acb->common;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -180,6 +290,8 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_file_open     = quorum_open,
     .bdrv_close         = quorum_close,
+
+    .bdrv_aio_writev    = quorum_aio_writev,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.7.9.5
Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycle problem
On 2012-08-27 09:01, Paolo Bonzini wrote:
On 25/08/2012 09:42, liu ping fan wrote:

I don't see why MMIO dispatch should hold the IDEBus ref rather than the PCIIDEState.

When changing the 3rd parameter of memory_region_init_io() from void *opaque to Object *obj, obj and opaque are not necessarily a 1:1 mapping. In such a situation, in order to let MemoryRegionOps tell them apart, we should pass PCIIDEState->bus[0] and bus[1] separately.

The rule should be that the obj is the object that you want referenced, and that should be the PCIIDEState. But this is anyway moot because it only applies to objects that are converted to use unlocked dispatch. This likely will not be the case for IDE.

BTW, I'm pretty sure - after implementing the basics for BQL-free PIO dispatching - that device objects are the wrong target for reference counting. We keep memory regions in our dispatching tables (PIO dispatching needs some refactoring for this), and those regions need protection for BQL-free use. Devices can't pass away as long as they have referenced regions; memory region deregistration services will have to take care of this.

I'm currently not using reference counting at all, I'm enforcing that only BQL-protected regions can be deregistered.

Also note that there seems to be another misconception in the discussions: deregistration is not only bound to device unplug. It also happens on device reconfiguration, e.g. PCI BAR (re-)mapping. Another strong indicator that we should worry about individual memory regions, not devices.

Jan
[Qemu-devel] [RFC V5 02/11] quorum: Create BDRVQuorumState and BlkDriver and do init.
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/quorum.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 65a6b55..19a9a44 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -15,6 +15,13 @@
 
 #include "block_int.h"
 
+typedef struct {
+    BlockDriverState **bs;
+    int threshold;
+    int total;
+    char **filenames;
+} BDRVQuorumState;
+
 typedef struct QuorumAIOCB QuorumAIOCB;
 
 typedef struct QuorumSingleAIOCB {
@@ -26,6 +33,7 @@ typedef struct QuorumSingleAIOCB {
 
 struct QuorumAIOCB {
     BlockDriverAIOCB common;
+    BDRVQuorumState *bqs;
     QEMUBH *bh;
 
     /* Request metadata */
@@ -43,3 +51,17 @@ struct QuorumAIOCB {
     void (*vote)(QuorumAIOCB *acb);
     int vote_ret;
 };
+
+static BlockDriver bdrv_quorum = {
+    .format_name        = "quorum",
+    .protocol_name      = "quorum",
+
+    .instance_size      = sizeof(BDRVQuorumState),
+};
+
+static void bdrv_quorum_init(void)
+{
+    bdrv_register(&bdrv_quorum);
+}
+
+block_init(bdrv_quorum_init);
-- 
1.7.9.5
[Qemu-devel] [RFC V5 09/11] quorum: Add quorum_getlength().
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/quorum.c | 24
 1 file changed, 24 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 3fa9d53..09eed84 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -549,12 +549,36 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
     return 0;
 }
 
+static int64_t quorum_getlength(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumVoteVersion *winner = NULL;
+    QuorumVotes votes;
+    int64_t value;
+    int i;
+
+    QLIST_INIT(&votes.vote_list);
+    for (i = 0; i < s->total; i++) {
+        quorum_count_vote(&votes, (unsigned long) bdrv_getlength(s->bs[i]), i);
+    }
+
+    /* vote to select the most represented version */
+    winner = quorum_get_vote_winner(&votes);
+
+    value = (int64_t) winner->value;
+    quorum_free_vote_list(&votes);
+
+    return value;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
 
     .instance_size      = sizeof(BDRVQuorumState),
 
+    .bdrv_getlength     = quorum_getlength,
+
     .bdrv_file_open     = quorum_open,
     .bdrv_close         = quorum_close,
     .bdrv_co_flush_to_disk = quorum_co_flush,
-- 
1.7.9.5
[Qemu-devel] [RFC V5 11/11] quorum: Add quorum_co_is_allocated.
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/quorum.c | 32
 1 file changed, 32 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index c9dcd9c..5a9f598 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -581,6 +581,37 @@ static void quorum_invalidate_cache(BlockDriverState *bs)
     }
 }
 
+static int coroutine_fn quorum_co_is_allocated(BlockDriverState *bs,
+                                               int64_t sector_num,
+                                               int nb_sectors,
+                                               int *pnum)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumVoteVersion *winner = NULL;
+    QuorumVotes result_votes, num_votes;
+    int i, result, num;
+
+    QLIST_INIT(&result_votes.vote_list);
+    QLIST_INIT(&num_votes.vote_list);
+
+    for (i = 0; i < s->total; i++) {
+        result = bdrv_co_is_allocated(s->bs[i], sector_num, nb_sectors, &num);
+        quorum_count_vote(&result_votes, result, i);
+        quorum_count_vote(&num_votes, num, i);
+    }
+
+    winner = quorum_get_vote_winner(&result_votes);
+    result = winner->value;
+
+    winner = quorum_get_vote_winner(&num_votes);
+    *pnum = winner->value;
+
+    quorum_free_vote_list(&result_votes);
+    quorum_free_vote_list(&num_votes);
+
+    return result;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -596,6 +627,7 @@ static BlockDriver bdrv_quorum = {
     .bdrv_aio_readv     = quorum_aio_readv,
     .bdrv_aio_writev    = quorum_aio_writev,
     .bdrv_invalidate_cache = quorum_invalidate_cache,
+    .bdrv_co_is_allocated = quorum_co_is_allocated,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.7.9.5
[Qemu-devel] [RFC V5 07/11] quorum: Add quorum_aio_readv.
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/quorum.c | 38 +-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index f83b4cf..791ef4a 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -193,15 +193,24 @@ static void quorum_aio_bh(void *opaque)
 {
     QuorumAIOCB *acb = opaque;
     BDRVQuorumState *s = acb->bqs;
-    int ret;
+    int i, ret;
 
     ret = s->threshold <= acb->success_count ? 0 : -EIO;
 
+    for (i = 0; i < s->total; i++) {
+        qemu_vfree(acb->aios[i].buf);
+        acb->aios[i].buf = NULL;
+        acb->aios[i].ret = 0;
+    }
+
     qemu_bh_delete(acb->bh);
     acb->common.cb(acb->common.opaque, ret);
     if (acb->finished) {
         *acb->finished = true;
     }
+    for (i = 0; i < s->total; i++) {
+        qemu_iovec_destroy(&acb->qiovs[i]);
+    }
     g_free(acb->aios);
     g_free(acb->qiovs);
     qemu_aio_release(acb);
@@ -261,6 +270,32 @@ static void quorum_aio_cb(void *opaque, int ret)
     qemu_bh_schedule(acb->bh);
 }
 
+static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
+                                          int64_t sector_num,
+                                          QEMUIOVector *qiov,
+                                          int nb_sectors,
+                                          BlockDriverCompletionFunc *cb,
+                                          void *opaque)
+{
+    BDRVQuorumState *s = bs->opaque;
+    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
+                                      nb_sectors, cb, opaque);
+    int i;
+
+    for (i = 0; i < s->total; i++) {
+        acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
+        qemu_iovec_init(&acb->qiovs[i], qiov->niov);
+        qemu_iovec_clone(&acb->qiovs[i], qiov, acb->aios[i].buf);
+    }
+
+    for (i = 0; i < s->total; i++) {
+        bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
+                       quorum_aio_cb, &acb->aios[i]);
+    }
+
+    return &acb->common;
+}
+
 static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
                                            int64_t sector_num,
                                            QEMUIOVector *qiov,
@@ -304,6 +339,7 @@ static BlockDriver bdrv_quorum = {
     .bdrv_close         = quorum_close,
     .bdrv_co_flush_to_disk = quorum_co_flush,
 
+    .bdrv_aio_readv     = quorum_aio_readv,
     .bdrv_aio_writev    = quorum_aio_writev,
 };
 
-- 
1.7.9.5
[Qemu-devel] [RFC V5 06/11] quorum: Add quorum_co_flush().
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/quorum.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index cd11cfb..f83b4cf 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -282,6 +282,18 @@ static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
     return &acb->common;
 }
 
+static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->total; i++) {
+        bdrv_co_flush(s->bs[i]);
+    }
+
+    return 0;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name        = "quorum",
     .protocol_name      = "quorum",
@@ -290,6 +302,7 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_file_open     = quorum_open,
     .bdrv_close         = quorum_close,
+    .bdrv_co_flush_to_disk = quorum_co_flush,
 
     .bdrv_aio_writev    = quorum_aio_writev,
 };
-- 
1.7.9.5
[Qemu-devel] [RFC V5 00/11] Quorum disk image corruption resiliency
This patchset creates a block driver implementing a quorum using $total qemu disk images. Writes are mirrored on the $total files. For the reading part, the $total files are read at the same time and a vote is done to determine if a qiov version is present $threshold or more times. It then returns this majority version to the upper layers. When fewer than $threshold versions of the data are returned by the lower layer, the quorum is broken and the read returns -EIO.

The goal of this patchset is to be turned into a QEMU block filter living just above raw-*.c and below qcow2/qed when the required infrastructure is done.

The main users of this feature will be people using NFS appliances which can be subject to bitflip errors.

This patchset can be used to replace blkverify and the out of tree blkmirror.

usage: -drive file=quorum:threshold/total:image_1.raw,,...,,image_total.raw,if=virtio,cache=none

in v2:

    eblake:
        fix typos
        squash two first commits
    afärber:
        Modify the Makefile on first commit
    bcanet:
        move function prototype of quorum.c one patch down

in v3:

    Blue Swirl:
        change char * to uint8_t * in QuorumSingleAIOCB
    Eric Blake:
        Add escaping of the : separator
        Allow to specify the n/m ratio parameters of the Quorum
    Stefan Hajnoczi:
        Squash quorum_close and quorum_open patch to avoid leak
        Add missing bdrv_delete() in quorum_close
        simpler quorum_getlength
        make the quorum_check_ret threshold a user setting (bind it to n)
        move blkverify_iovec_clone() and blkverify_iovec_compare() to cutils.c
        free unconditionally qemu_blockalign() with qemu_vfree()
        turn assignment into assert in quorum_copy_qiov()

in v4:

    Eric Blake:
        verbose commit message for Add quorum_open() and quorum_close()
        use of a bool for the escape variable in the same commit
        simplify an if to a one liner in the same commit
        replace += 1 by ++ in a number of places
        make quorum_getlength return a quorum vote.
    Blue Swirl:
        replace n and m by threshold and total
        ignore flush errors in quorum_co_flush
    Stefan Hajnoczi:
        removal of a macro in Add quorum mechanism
        call qemu_iovec_destroy in the bh
    Benoît Canet:
        Now use QuorumVoteItem and QuorumVoteVersion as names for the voting structs
        refactor and rename function to quorum_count_vote.

in v5:

    Blue Swirl:
        replace ':' by ',' as separator to allow networked path
        replace remaining occurrences of n and m by threshold and total
    Eric Blake:
        fix commit message about escaping
    Benoît Canet:
        Factorise voting into quorum_get_vote_winner()
        Create quorum_invalidate_cache to enable live migration
        Create quorum_co_is_allocated to enable streaming.
        Fix escaping

Benoît Canet (11):
  quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
  quorum: Create BDRVQuorumState and BlkDriver and do init.
  quorum: Add quorum_open() and quorum_close().
  quorum: Add quorum_aio_writev and its dependencies.
  blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify.
  quorum: Add quorum_co_flush().
  quorum: Add quorum_aio_readv.
  quorum: Add quorum mechanism.
  quorum: Add quorum_getlength().
  quorum: Add quorum_invalidate_cache().
  quorum: Add quorum_co_is_allocated.

 block/Makefile.objs | 1 +
 block/blkverify.c   | 108 +
 block/quorum.c      | 638 +++
 cutils.c            | 103 +
 qemu-common.h       | 2 +
 5 files changed, 746 insertions(+), 106 deletions(-)
 create mode 100644 block/quorum.c

-- 
1.7.9.5
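The read-side vote described in the cover letter can be illustrated with a small standalone sketch. This is not the code from patch 08, which keeps its votes in QLIST structures; pick_majority() is a hypothetical helper that works on a plain array of per-image checksums. It returns the index of one image carrying the most common version, or -EIO when that version appears fewer than threshold times.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: checksums[i] is the checksum of the data read from
 * image i. Returns the index of an image holding the majority version if it
 * is present at least `threshold` times, otherwise -EIO (quorum broken).
 */
static int pick_majority(const unsigned long *checksums, int total,
                         int threshold)
{
    int best_index = -1;
    int best_count = 0;
    int i, j;

    for (i = 0; i < total; i++) {
        int count = 0;

        /* Count how many images agree with image i. */
        for (j = 0; j < total; j++) {
            if (checksums[j] == checksums[i]) {
                count++;
            }
        }
        if (count > best_count) {
            best_count = count;
            best_index = i;
        }
    }

    return best_count >= threshold ? best_index : -EIO;
}
```

With three images and a threshold of 2, one corrupted image is outvoted by the other two; if all three disagree, no version reaches the threshold and the read fails with -EIO. The quadratic counting is fine for a handful of images and keeps the sketch short.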
[Qemu-devel] [RFC V5 05/11] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify.
Signed-off-by: Benoit Canet ben...@irqsave.net --- block/blkverify.c | 108 + cutils.c | 103 ++ qemu-common.h |2 + 3 files changed, 107 insertions(+), 106 deletions(-) diff --git a/block/blkverify.c b/block/blkverify.c index 9d5f1ec..79d36d5 100644 --- a/block/blkverify.c +++ b/block/blkverify.c @@ -123,110 +123,6 @@ static int64_t blkverify_getlength(BlockDriverState *bs) return bdrv_getlength(s-test_file); } -/** - * Check that I/O vector contents are identical - * - * @a: I/O vector - * @b: I/O vector - * @ret:Offset to first mismatching byte or -1 if match - */ -static ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b) -{ -int i; -ssize_t offset = 0; - -assert(a-niov == b-niov); -for (i = 0; i a-niov; i++) { -size_t len = 0; -uint8_t *p = (uint8_t *)a-iov[i].iov_base; -uint8_t *q = (uint8_t *)b-iov[i].iov_base; - -assert(a-iov[i].iov_len == b-iov[i].iov_len); -while (len a-iov[i].iov_len *p++ == *q++) { -len++; -} - -offset += len; - -if (len != a-iov[i].iov_len) { -return offset; -} -} -return -1; -} - -typedef struct { -int src_index; -struct iovec *src_iov; -void *dest_base; -} IOVectorSortElem; - -static int sortelem_cmp_src_base(const void *a, const void *b) -{ -const IOVectorSortElem *elem_a = a; -const IOVectorSortElem *elem_b = b; - -/* Don't overflow */ -if (elem_a-src_iov-iov_base elem_b-src_iov-iov_base) { -return -1; -} else if (elem_a-src_iov-iov_base elem_b-src_iov-iov_base) { -return 1; -} else { -return 0; -} -} - -static int sortelem_cmp_src_index(const void *a, const void *b) -{ -const IOVectorSortElem *elem_a = a; -const IOVectorSortElem *elem_b = b; - -return elem_a-src_index - elem_b-src_index; -} - -/** - * Copy contents of I/O vector - * - * The relative relationships of overlapping iovecs are preserved. This is - * necessary to ensure identical semantics in the cloned I/O vector. 
- */ -static void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src, - void *buf) -{ -IOVectorSortElem sortelems[src-niov]; -void *last_end; -int i; - -/* Sort by source iovecs by base address */ -for (i = 0; i src-niov; i++) { -sortelems[i].src_index = i; -sortelems[i].src_iov = src-iov[i]; -} -qsort(sortelems, src-niov, sizeof(sortelems[0]), sortelem_cmp_src_base); - -/* Allocate buffer space taking into account overlapping iovecs */ -last_end = NULL; -for (i = 0; i src-niov; i++) { -struct iovec *cur = sortelems[i].src_iov; -ptrdiff_t rewind = 0; - -/* Detect overlap */ -if (last_end last_end cur-iov_base) { -rewind = last_end - cur-iov_base; -} - -sortelems[i].dest_base = buf - rewind; -buf += cur-iov_len - MIN(rewind, cur-iov_len); -last_end = MAX(cur-iov_base + cur-iov_len, last_end); -} - -/* Sort by source iovec index and build destination iovec */ -qsort(sortelems, src-niov, sizeof(sortelems[0]), sortelem_cmp_src_index); -for (i = 0; i src-niov; i++) { -qemu_iovec_add(dest, sortelems[i].dest_base, src-iov[i].iov_len); -} -} - static BlkverifyAIOCB *blkverify_aio_get(BlockDriverState *bs, bool is_write, int64_t sector_num, QEMUIOVector *qiov, int nb_sectors, @@ -290,7 +186,7 @@ static void blkverify_aio_cb(void *opaque, int ret) static void blkverify_verify_readv(BlkverifyAIOCB *acb) { -ssize_t offset = blkverify_iovec_compare(acb-qiov, acb-raw_qiov); +ssize_t offset = qemu_iovec_compare(acb-qiov, acb-raw_qiov); if (offset != -1) { blkverify_err(acb, contents mismatch in sector % PRId64, acb-sector_num + (int64_t)(offset / BDRV_SECTOR_SIZE)); @@ -308,7 +204,7 @@ static BlockDriverAIOCB *blkverify_aio_readv(BlockDriverState *bs, acb-verify = blkverify_verify_readv; acb-buf = qemu_blockalign(bs-file, qiov-size); qemu_iovec_init(acb-raw_qiov, acb-qiov-niov); -blkverify_iovec_clone(acb-raw_qiov, qiov, acb-buf); +qemu_iovec_clone(acb-raw_qiov, qiov, acb-buf); bdrv_aio_readv(s-test_file, sector_num, qiov, nb_sectors, blkverify_aio_cb, acb); diff 
--git a/cutils.c b/cutils.c index ee4614d..dcdd60f 100644 --- a/cutils.c +++ b/cutils.c @@ -245,6 +245,109 @@ size_t qemu_iovec_memset(QEMUIOVector *qiov, size_t offset, return iov_memset(qiov-iov, qiov-niov, offset, fillc, bytes); } +/** + * Check that I/O vector contents are identical + * + * @a: I/O vector + * @b: I/O vector + * @ret:Offset to first mismatching
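For readers following the compare helper that this patch moves into cutils.c, here is a minimal standalone sketch of the same idea. It is not the QEMU code: it works on plain POSIX struct iovec arrays rather than QEMUIOVector, and the name iovec_compare() is made up for illustration. Like the original, it assumes both vectors have the same number of elements with matching lengths.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Illustrative stand-in for the compare helper (hypothetical name).
 * Returns the offset of the first mismatching byte across the whole
 * vector, or -1 if the contents are identical.
 */
static ssize_t iovec_compare(const struct iovec *a, const struct iovec *b,
                             int niov)
{
    ssize_t offset = 0;
    int i;

    for (i = 0; i < niov; i++) {
        const uint8_t *p = a[i].iov_base;
        const uint8_t *q = b[i].iov_base;
        size_t len = 0;

        /* Same precondition as the original: element lengths match. */
        assert(a[i].iov_len == b[i].iov_len);

        /* Count how many leading bytes of this element agree. */
        while (len < a[i].iov_len && p[len] == q[len]) {
            len++;
        }

        offset += len;

        /* Stopped early: offset now points at the first mismatch. */
        if (len != a[i].iov_len) {
            return offset;
        }
    }
    return -1;
}
```

The running offset is what lets the caller turn a mismatch into a sector number, as blkverify does by dividing by the sector size.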
[Qemu-devel] [RFC V5 03/11] quorum: Add quorum_open() and quorum_close().
Valid quorum resources look like quorum:threshold/total:path/to/image_1, ... ,path/to/image_total ',' is used as a separator to allow to use networked path '\' is the escaping character for filename containing ',' '\' escape itself On the command line for quorum files img,test.raw, img2.raw and img3.raw invocation look like: -drive file=quorum:2/3:img\\,,test.raw,,img2.raw,,img3.raw (note the double ,, and \\) Signed-off-by: Benoit Canet ben...@irqsave.net --- block/quorum.c | 123 1 file changed, 123 insertions(+) diff --git a/block/quorum.c b/block/quorum.c index 19a9a44..b9fb2b9 100644 --- a/block/quorum.c +++ b/block/quorum.c @@ -52,11 +52,134 @@ struct QuorumAIOCB { int vote_ret; }; +/* Valid quorum resources look like + * quorum:threshold/total:path/to/image_1, ... ,path/to/image_total + * + * ',' is used as a separator to allow to use network path + * '\' is the escaping character for filename containing ',' + */ +static int quorum_open(BlockDriverState *bs, const char *filename, int flags) +{ +BDRVQuorumState *s = bs-opaque; +int i, j, k, len, ret = 0; +char *a, *b, *names; +bool escape; + +/* Parse the quorum: prefix */ +if (strncmp(filename, quorum:, strlen(quorum:))) { +return -EINVAL; +} + +filename += strlen(quorum:); + +/* Get threshold */ +errno = 0; +s-threshold = strtoul(filename, a, 10); +if (*a != '/' || errno) { +return -EINVAL; +} +a++; + +/* Get total */ +errno = 0; +s-total = strtoul(a, b, 10); +if (*b != ':' || errno) { +return -EINVAL; +} +b++; + +if (s-threshold 1 || s-total 2) { +return -EINVAL; +} + +if (s-threshold s-total) { +return -EINVAL; +} + +s-bs = g_malloc0(sizeof(BlockDriverState *) * s-total); +/* Two allocations for all filenames: simpler to free */ +s-filenames = g_malloc0(sizeof(char *) * s-total); +names = g_strdup(b); + +/* Get the filenames pointers */ +escape = false; +s-filenames[0] = names; +len = strlen(names); +for (i = j = k = 0; i len j s-total; i++) { +/* separation between two files */ +if (!escape names[i] == 
',') { +char *prev = s-filenames[j]; +prev[k] = '\0'; +s-filenames[++j] = prev + k + 1; +k = 0; +continue; +} + +escape = !escape names[i] == '\\'; + +/* if we are not escaping copy */ +if (!escape) { +s-filenames[j][k++] = names[i]; +} +} +/* terminate last string */ +s-filenames[j][k] = '\0'; + +if ((j + 1) != s-total) { +ret = -EINVAL; +goto free_exit; +} + +/* Open files */ +for (i = 0; i s-total; i++) { +s-bs[i] = bdrv_new(); +ret = bdrv_open(s-bs[i], s-filenames[i], flags, NULL); +if (ret 0) { +goto error_exit; +} +} + +goto exit; + +error_exit: +for (; i = 0; i--) { +bdrv_delete(s-bs[i]); +s-bs[i] = NULL; +} +free_exit: +g_free(s-filenames[0]); +g_free(s-filenames); +s-filenames = NULL; +g_free(s-bs); +exit: +return ret; +} + +static void quorum_close(BlockDriverState *bs) +{ +BDRVQuorumState *s = bs-opaque; +int i; + +for (i = 0; i s-total; i++) { +/* Ensure writes reach stable storage */ +bdrv_flush(s-bs[i]); +bdrv_delete(s-bs[i]); +} + +g_free(s-filenames[0]); +g_free(s-filenames); +s-filenames = NULL; +g_free(s-bs); +} + static BlockDriver bdrv_quorum = { .format_name= quorum, .protocol_name = quorum, .instance_size = sizeof(BDRVQuorumState), + +.bdrv_file_open = quorum_open, +.bdrv_close = quorum_close, }; static void bdrv_quorum_init(void) -- 1.7.9.5
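The separator and escaping rules from the commit message can be tried out in isolation. The following is a rough standalone sketch, not the code from the patch: split_escaped() is a hypothetical helper, and it assumes that '\' makes the following character literal and that an unescaped ',' ends a filename. It compacts the names in place in a single buffer, in the spirit of the parsing loop in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): split `list` in place on
 * unescaped ',' characters.  A '\' makes the next character literal, so
 * "\," yields ',' and "\\" yields '\'.
 * Stores at most `max` name pointers in `out`; returns the number of
 * names found, or -1 if the list holds more than `max` names.
 */
static int split_escaped(char *list, char **out, int max)
{
    size_t len, i;
    size_t k = 0;      /* write position inside the current name */
    int j = 0;         /* index of the current name */
    int escape = 0;    /* set when the previous character was '\' */

    assert(list != NULL && out != NULL);
    if (max < 1) {
        return -1;
    }

    len = strlen(list);
    out[0] = list;
    for (i = 0; i < len; i++) {
        char c = list[i];

        if (!escape && c == ',') {
            /* Unescaped separator: close this name, start the next. */
            char *prev = out[j];
            if (j + 1 >= max) {
                return -1;
            }
            prev[k] = '\0';
            out[++j] = prev + k + 1;
            k = 0;
            continue;
        }
        if (!escape && c == '\\') {
            /* Drop the escape character itself. */
            escape = 1;
            continue;
        }
        /* Ordinary or escaped character: copy it. */
        out[j][k++] = c;
        escape = 0;
    }
    out[j][k] = '\0';
    return j + 1;
}
```

Given the buffer `img\,test.raw,img2.raw,img3.raw` this yields the three names img,test.raw, img2.raw and img3.raw. The doubled ",," on the -drive command line is a separate layer of escaping handled by the option parser, which this sketch does not model.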
Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycle problem
On Mon, Aug 27, 2012 at 3:47 PM, Jan Kiszka jan.kis...@web.de wrote:
> On 2012-08-27 09:01, Paolo Bonzini wrote:
>> On 25/08/2012 09:42, liu ping fan wrote:
>>>> I don't see why MMIO dispatch should hold the IDEBus ref rather
>>>> than the PCIIDEState.
>>>
>>> When transfer memory_region_init_io() 3rd para from void* opaque to
>>> Object* obj, the obj : opaque is not necessarily a 1:1 map. For such
>>> situation, in order to let MemoryRegionOps tell between them, we
>>> should pass PCIIDEState->bus[0], bus[1] separately.
>>
>> The rule should be that the obj is the object that you want
>> referenced, and that should be the PCIIDEState.
>>
>> But this is anyway moot because it only applies to objects that are
>> converted to use unlocked dispatch.  This likely will not be the case
>> for IDE.
>
> BTW, I'm pretty sure - after implementing the basics for BQL-free PIO
> dispatching - that device objects are the wrong target for reference

Hi Jan, thanks for the reminder, but could you explain it in more
detail?  The mmio dispatch table holds 1 ref for the device before
releasing this ref.  (When unplugging, we detach all the device's mr
from memory, then drop the ref.)  So I think that no leak will be
exposed by mr and it is safe to use the device as target for reference.

> counting. We keep memory regions in our dispatching tables (PIO
> dispatching needs some refactoring for this), and those regions need
> protection for BQL-free use. Devices can't pass away as long as they
> have

Yes, it is right. Device can pass away only after mr removed from
dispatching tables

Thanx
pingfan

> referenced regions, memory region deregistration services will have to
> take care of this. I'm currently not using reference counting at all,
> I'm enforcing that only BQL-protected regions can be deregistered.
>
> Also note that there seems to be another misconception in the
> discussions: deregistration is not only bound to device unplug. It
> also happens on device reconfiguration, e.g. PCI BAR (re-)mapping.
> Another strong indicator that we should worry about individual memory
> regions, not devices.
>
> Jan
[Qemu-devel] [RFC V5 01/11] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
Signed-off-by: Benoit Canet ben...@irqsave.net --- block/Makefile.objs |1 + block/quorum.c | 45 + 2 files changed, 46 insertions(+) create mode 100644 block/quorum.c diff --git a/block/Makefile.objs b/block/Makefile.objs index b5754d3..66af6dc 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -4,6 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-obj-y += qed-check.o block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o block-obj-y += stream.o +block-obj-y += quorum.o block-obj-$(CONFIG_WIN32) += raw-win32.o block-obj-$(CONFIG_POSIX) += raw-posix.o block-obj-$(CONFIG_LIBISCSI) += iscsi.o diff --git a/block/quorum.c b/block/quorum.c new file mode 100644 index 000..65a6b55 --- /dev/null +++ b/block/quorum.c @@ -0,0 +1,45 @@ +/* + * Quorum Block filter + * + * Copyright (C) 2012 Nodalink, SARL. + * + * Author: + * Benoît Canet benoit.ca...@irqsave.net + * + * Based on the design and code of blkverify.c (Copyright (C) 2010 IBM, Corp) + * and blkmirror.c (Copyright (C) 2011 Red Hat, Inc). + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include block_int.h + +typedef struct QuorumAIOCB QuorumAIOCB; + +typedef struct QuorumSingleAIOCB { +BlockDriverAIOCB *aiocb; +uint8_t *buf; +int ret; +QuorumAIOCB *parent; +} QuorumSingleAIOCB; + +struct QuorumAIOCB { +BlockDriverAIOCB common; +QEMUBH *bh; + +/* Request metadata */ +int64_t sector_num; +int nb_sectors; + +QEMUIOVector *qiov; /* calling readv IOV */ + +QuorumSingleAIOCB *aios;/* individual AIOs */ +QEMUIOVector *qiovs;/* individual IOVs */ +int count; /* number of completed AIOCB */ +int success_count; /* number of successfully completed AIOCB */ +bool *finished; /* completion signal for cancel */ + +void (*vote)(QuorumAIOCB *acb); +int vote_ret; +}; -- 1.7.9.5
Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycle problem
On 2012-08-27 10:17, liu ping fan wrote:
> On Mon, Aug 27, 2012 at 3:47 PM, Jan Kiszka jan.kis...@web.de wrote:
>> On 2012-08-27 09:01, Paolo Bonzini wrote:
>>> On 25/08/2012 09:42, liu ping fan wrote:
>>>>> I don't see why MMIO dispatch should hold the IDEBus ref rather
>>>>> than the PCIIDEState.
>>>>
>>>> When transfer memory_region_init_io() 3rd para from void* opaque to
>>>> Object* obj, the obj : opaque is not necessarily a 1:1 map. For
>>>> such situation, in order to let MemoryRegionOps tell between them,
>>>> we should pass PCIIDEState->bus[0], bus[1] separately.
>>>
>>> The rule should be that the obj is the object that you want
>>> referenced, and that should be the PCIIDEState.
>>>
>>> But this is anyway moot because it only applies to objects that are
>>> converted to use unlocked dispatch.  This likely will not be the
>>> case for IDE.
>>
>> BTW, I'm pretty sure - after implementing the basics for BQL-free PIO
>> dispatching - that device objects are the wrong target for reference
>
> Hi Jan, thanks for reminder, but could you explain it more detail?
> mmio dispatch table holds 1 ref for device, before releasing this
> ref,( When unplugging, we detach all the device's mr from memory,
> then drop the ref. So I think that no leak will be exposed by mr and
> it is safe to use device as target for reference.

It would be a mistake to assume that memory regions can only be embedded
in device objects. Memory regions can be reconfigured or dynamically
added/removed (see e.g. portio lists) - there is no device in this
sentence. Regions are stored in the dispatching table, they will first
of all be touched without holding the BQL. So their content has to be
stable in that period, and it is the proper abstraction, IMHO, to focus
on their life cycle management and attach all the rest to them.

>> counting. We keep memory regions in our dispatching tables (PIO
>> dispatching needs some refactoring for this), and those regions need
>> protection for BQL-free use. Devices can't pass away as long as they
>> have
>
> Yes, it is right. Device can pass away only after mr removed from
> dispatching tables

Great, then you don't have to worry about device objects in the context
of dispatching.

Jan
[Qemu-devel] [PATCH] [MIPS] Fix operands of RECIP2.S and RECIP2.PS
Read the second input operand of RECIP2.S and RECIP2.PS from FT rather
than FD.  RECIP2.D is already correct.

Signed-off-by: Richard Sandiford rdsandif...@googlemail.com
---
 target-mips/translate.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 7104d30..d812986 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -6805,7 +6805,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp1 = tcg_temp_new_i32();
 
             gen_load_fpr32(fp0, fs);
-            gen_load_fpr32(fp1, fd);
+            gen_load_fpr32(fp1, ft);
             gen_helper_float_recip2_s(fp0, fp0, fp1);
             tcg_temp_free_i32(fp1);
             gen_store_fpr32(fp0, fd);
@@ -7543,7 +7543,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i64 fp1 = tcg_temp_new_i64();
 
             gen_load_fpr64(ctx, fp0, fs);
-            gen_load_fpr64(ctx, fp1, fd);
+            gen_load_fpr64(ctx, fp1, ft);
             gen_helper_float_recip2_ps(fp0, fp0, fp1);
             tcg_temp_free_i64(fp1);
             gen_store_fpr64(ctx, fp0, fd);
-- 
1.7.7.6
[Qemu-devel] [PATCH] [MIPS] Fix order of CVT.PS.S operands
The FS input to CVT.PS.S is the high half and FT is the low half.
tcg_gen_concat_i32_i64 takes the low half first, so the operands
were in the wrong order.

Signed-off-by: Richard Sandiford rdsandif...@googlemail.com
---
 target-mips/translate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 06f0ac6..defc021 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -6907,7 +6907,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
 
             gen_load_fpr32(fp32_0, fs);
             gen_load_fpr32(fp32_1, ft);
-            tcg_gen_concat_i32_i64(fp64, fp32_0, fp32_1);
+            tcg_gen_concat_i32_i64(fp64, fp32_1, fp32_0);
             tcg_temp_free_i32(fp32_1);
             tcg_temp_free_i32(fp32_0);
             gen_store_fpr64(ctx, fp64, fd);
-- 
1.7.7.6
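The operand-order point can be checked with a small plain-C model. This is only a sketch of the arithmetic, not TCG code: concat_lo_hi() and cvt_ps_s_model() are hypothetical names, and the model assumes, as the commit message states, that the concat operation takes the low half first and that FS supplies the high half of the paired-single result.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a "low half first" concat: lo fills bits 0..31, hi bits 32..63. */
static uint64_t concat_lo_hi(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/*
 * Hypothetical model of the CVT.PS.S pairing: FS is the upper half and
 * FT the lower half, so FT must be passed as the first (low) argument.
 */
static uint64_t cvt_ps_s_model(uint32_t fs, uint32_t ft)
{
    return concat_lo_hi(ft, fs);
}

/* Small usage example with easily recognisable bit patterns. */
static void concat_example(void)
{
    /* Correct order: FS lands in the upper 32 bits. */
    assert(cvt_ps_s_model(0x11111111u, 0x22222222u) == 0x1111111122222222ULL);
    /* Passing FS first, as before the fix, swaps the halves. */
    assert(concat_lo_hi(0x11111111u, 0x22222222u) == 0x2222222211111111ULL);
}
```

The second assertion shows the swapped result that the old argument order would have produced.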
Re: [Qemu-devel] qcow2: online snasphots : internal vs external ?
On Sun, Aug 26, 2012 at 10:56 AM, Alexandre DERUMIER aderum...@odiso.com wrote:
> It is possible to achieve the same behaviour with external snapshot ?
> (I would like to do it online)
>
> I don't see how I can rollback to the point of time of the snapshot.

The snapshot only captures the contents of the disk.  Rollback does not
make sense without shutting down the guest.  The OS/file system would be
very confused if the disk contents changed underneath it.

Existing hotplug can be used.  For example, if we have an external
snapshot of a virtio-blk drive, we can use hotplug to remove the drive,
choose the snapshot file and attach it again.  This only works for data
drives, the root file system usually cannot be changed while the guest
is running.

You may also wish to look at libvirt for higher level snapshot
primitives.

> Also I see that snapshot_blkdev qmp command give in his description:
>
> Otherwise the snapshot will be internal! (currently unsupported).
>
> is Live internal snapshots on the roadmap ?

I'm not aware of anyone working on adding internal snapshot in the
near future.  Patches are welcome.

Stefan
Re: [Qemu-devel] [PATCH] [MIPS] Fix order of CVT.PS.S operands
On Mon, Aug 27, 2012 at 9:53 AM, Richard Sandiford rdsandif...@googlemail.com wrote:
> The FS input to CVT.PS.S is the high half and FT is the low half.
> tcg_gen_concat_i32_i64 takes the low half first, so the operands
> were in the wrong order.
>
> Signed-off-by: Richard Sandiford rdsandif...@googlemail.com
> ---
>  target-mips/translate.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index 06f0ac6..defc021 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -6907,7 +6907,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>
>              gen_load_fpr32(fp32_0, fs);
>              gen_load_fpr32(fp32_1, ft);
> -            tcg_gen_concat_i32_i64(fp64, fp32_0, fp32_1);
> +            tcg_gen_concat_i32_i64(fp64, fp32_1, fp32_0);
>              tcg_temp_free_i32(fp32_1);
>              tcg_temp_free_i32(fp32_0);
>              gen_store_fpr64(ctx, fp64, fd);
> --
> 1.7.7.6

CCing Aurelien for MIPS.  You can look at ./MAINTAINERS to see who
should be CCed.

Stefan
Re: [Qemu-devel] [PATCH] [MIPS] Fix operands of RECIP2.S and RECIP2.PS
On Mon, Aug 27, 2012 at 9:50 AM, Richard Sandiford rdsandif...@googlemail.com wrote:
> Read the second input operand of RECIP2.S and RECIP2.PS from FT rather
> than FD.  RECIP2.D is already correct.
>
> Signed-off-by: Richard Sandiford rdsandif...@googlemail.com
> ---
>  target-mips/translate.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index 7104d30..d812986 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -6805,7 +6805,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>
>              gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32(fp1, fd);
> +            gen_load_fpr32(fp1, ft);
>              gen_helper_float_recip2_s(fp0, fp0, fp1);
>              tcg_temp_free_i32(fp1);
>              gen_store_fpr32(fp0, fd);
> @@ -7543,7 +7543,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i64 fp1 = tcg_temp_new_i64();
>
>              gen_load_fpr64(ctx, fp0, fs);
> -            gen_load_fpr64(ctx, fp1, fd);
> +            gen_load_fpr64(ctx, fp1, ft);
>              gen_helper_float_recip2_ps(fp0, fp0, fp1);
>              tcg_temp_free_i64(fp1);
>              gen_store_fpr64(ctx, fp0, fd);
> --
> 1.7.7.6

CCing Aurelien for MIPS.  You can look at ./MAINTAINERS to see who
should be CCed.

Stefan
Re: [Qemu-devel] [PATCH] hw/pl110: Fix spelling of 'palette'
On 27 August 2012 06:19, Stefan Weil s...@weilnetz.de wrote:
> On 26.08.2012 23:30, Peter Maydell wrote:
>> Fix the spelling of 'palette' used in various local variables
>> and structure members.
>
>>      if (offset >= 0x200 && offset < 0x400) {
>>          /* Pallette.  */
>
> What about this one?
>
> For V2 of your patch, you may add a
>
> Reviewed-by: Stefan Weil s...@weilnetz.de

Thanks; as you may have guessed I didn't do a case-insensitive search...

-- PMM
Re: [Qemu-devel] qcow2: online snasphots : internal vs external ?
Thanks again Stefan

> The snapshot only captures the contents of the disk.  Rollback does not
> make sense without shutting down the guest.  The OS/file system would be
> very confused if the disk contents changed underneath it.
>
> Existing hotplug can be used.  For example, if we have an external
> snapshot of a virtio-blk drive, we can use hotplug to remove the drive,
> choose the snapshot file and attach it again.  This only works for data
> drives, the root file system usually cannot be changed while the guest
> is running.

Yes, sure, rollback must be done offline.

But what I meant was: with an external snapshot, how can I roll back to
the point of the snapshot?

Example:

image1.qcow2
  file: /beforesnap1

take a snapshot (snap1), so qemu switches to snap1.qcow2

write some file:
  file: /aftersnap1
        /beforesnap1

Now, how can I roll back to the point in time of snap1?

I can reuse image1.qcow2, but if I write some data to it, I don't see
how I can return to the point in time of snap1 (like qemu-img
snapshot -a with internal snapshots).

> You may also wish to look at libvirt for higher level snapshot
> primitives.

Thanks, I'll look at libvirt to see how they do things.

----- Original Message -----

From: Stefan Hajnoczi stefa...@gmail.com
To: Alexandre DERUMIER aderum...@odiso.com
Cc: Jeff Cody jc...@redhat.com, qemu-devel qemu-devel@nongnu.org, Paolo Bonzini pbonz...@redhat.com, Eric Blake ebl...@redhat.com
Sent: Monday 27 August 2012 11:04:14
Subject: Re: [Qemu-devel] qcow2: online snasphots : internal vs external ?

On Sun, Aug 26, 2012 at 10:56 AM, Alexandre DERUMIER aderum...@odiso.com wrote:
> It is possible to achieve the same behaviour with external snapshot ?
> (I would like to do it online)
>
> I don't see how I can rollback to the point of time of the snapshot.

The snapshot only captures the contents of the disk.  Rollback does not
make sense without shutting down the guest.  The OS/file system would be
very confused if the disk contents changed underneath it.

Existing hotplug can be used.  For example, if we have an external
snapshot of a virtio-blk drive, we can use hotplug to remove the drive,
choose the snapshot file and attach it again.  This only works for data
drives, the root file system usually cannot be changed while the guest
is running.

You may also wish to look at libvirt for higher level snapshot
primitives.

> Also I see that snapshot_blkdev qmp command give in his description:
>
> Otherwise the snapshot will be internal! (currently unsupported).
>
> is Live internal snapshots on the roadmap ?

I'm not aware of anyone working on adding internal snapshot in the
near future.  Patches are welcome.

Stefan

-- 
Alexandre Derumier
Systems and Network Engineer

Landline: 03 20 68 88 85
Fax: 03 20 68 90 88

45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris
Re: [Qemu-devel] qcow2: online snasphots : internal vs external ?
On 27/08/2012 11:26, Alexandre DERUMIER wrote:
> how can I rollback to the point of the snapshot.
>
> example:
>
> image1.qcow2
>   file: /beforesnap1
>
> take a snapshot (snap1), so qemu switches to snap1.qcow2
>
> write some file:
>   file: /aftersnap1
>         /beforesnap1
>
> Now, how can I rollback to the point of time of snap1 ?
>
> I can reuse image1.qcow2, but if I write some data on it, I don't see
> how I can return to the point of time of the snap1.
> (like qemu-img snapshot -a with internal snapshots)

If you can drop snap1.qcow2 altogether, you just use image1.qcow2 the
next time you start QEMU.

If you cannot, you create snap2.qcow2 based on image1.qcow2:

    qemu-img create -f qcow2 -o backing_file=image1.qcow2 snap2.qcow2

and use it the next time you start QEMU.

Paolo
Re: [Qemu-devel] qcow2: online snasphots : internal vs external ?
On 27/08/2012 11:04, Stefan Hajnoczi wrote:
>> Also I see that snapshot_blkdev qmp command give in his description:
>>
>> Otherwise the snapshot will be internal! (currently unsupported).
>>
>> is Live internal snapshots on the roadmap ?
>
> I'm not aware of anyone working on adding internal snapshot in the
> near future.  Patches are welcome.

The main problem with internal snapshots is that it's difficult to work
with two snapshots at the same time, especially if you need to write to
two of them.  IIUC this is why people concentrated more on external
snapshots.  It's not intrinsic to internal snapshots, more like a wart
of the implementation, but not an easily fixed one.

Paolo
Re: [Qemu-devel] [PATCH 2/2] mips-linux-user: Always support rdhwr.
On Fri, Mar 30, 2012 at 01:16:37PM -0400, Richard Henderson wrote:
> The kernel will emulate this instruction if it's not supported
> natively.  This insn is used for TLS, among other things, and so is
> required by modern glibc.
>
> Signed-off-by: Richard Henderson r...@twiddle.net
> Cc: Riku Voipio riku.voi...@iki.fi
> ---
>  target-mips/translate.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index 300d95e..ed28ca8 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -8111,7 +8111,11 @@ gen_rdhwr (CPUMIPSState *env, DisasContext *ctx, int rt, int rd)
>  {
>      TCGv t0;
>
> +#if !defined(CONFIG_USER_ONLY)
> +    /* The Linux kernel will emulate rdhwr if it's not supported natively.
> +       Therefore only check the ISA in system mode.  */
>      check_insn(env, ctx, ISA_MIPS32R2);
> +#endif
>      t0 = tcg_temp_new();
>
>      switch (rd) {

Thanks, applied.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
Re: [Qemu-devel] [PATCH] [MIPS] Fix operands of RECIP2.S and RECIP2.PS
On Mon, Aug 27, 2012 at 09:50:38AM +0100, Richard Sandiford wrote:
> Read the second input operand of RECIP2.S and RECIP2.PS from FT rather
> than FD.  RECIP2.D is already correct.
>
> Signed-off-by: Richard Sandiford rdsandif...@googlemail.com
> ---
>  target-mips/translate.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index 7104d30..d812986 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -6805,7 +6805,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>
>              gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32(fp1, fd);
> +            gen_load_fpr32(fp1, ft);
>              gen_helper_float_recip2_s(fp0, fp0, fp1);
>              tcg_temp_free_i32(fp1);
>              gen_store_fpr32(fp0, fd);
> @@ -7543,7 +7543,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i64 fp1 = tcg_temp_new_i64();
>
>              gen_load_fpr64(ctx, fp0, fs);
> -            gen_load_fpr64(ctx, fp1, fd);
> +            gen_load_fpr64(ctx, fp1, ft);
>              gen_helper_float_recip2_ps(fp0, fp0, fp1);
>              tcg_temp_free_i64(fp1);
>              gen_store_fpr64(ctx, fp0, fd);

Thanks, applied.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
Re: [Qemu-devel] [PATCH] ATAPI: Add support for ASCQ in sense codes
On 31.07.2012 09:14, Paolo Bonzini wrote:
> On 31/07/2012 04:07, Ronnie Sahlberg wrote:
>> Add support for setting the ASCQ for SCSI sense codes in the ATAPI
>> driver.  Use this to set ASCQ==2 for the medium removal prevention
>> that is recommended in MMC for this condition.
>>
>> asc:0x53 ascq:0x02 is the recommended error for
>> MEDIUM_REMOVAL_PREVENTED and is listed in Annex F in MMC
>
> You also need to cover migration.  You could either add a subsection,
> or something like this:
>
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index f7f714c..89c0157 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -1143,3 +1143,20 @@ void ide_atapi_cmd(IDEState *s)
>
>      ide_atapi_cmd_error(s, ILLEGAL_REQUEST, ASC_ILLEGAL_OPCODE);
>  }
> +
> +void ide_atapi_post_load(IDEState *s, int version_id)
> +{
> +    if (version_id < 3) {
> +        if (s->sense_key == UNIT_ATTENTION &&
> +            s->asc == ASC_MEDIUM_MAY_HAVE_CHANGED) {
> +            s->cdrom_changed = 1;
> +        }
> +    }
> +
> +    /* This is simpler than adding a subsection just for the ascq.  */
> +    if (s->asc == ASC_MEDIA_REMOVAL_PREVENTED) {
> +        s->ascq = 2;
> +    } else {
> +        s->ascq = 0;
> +    }
> +}
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index cb5ca4b..959ac48 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -2154,12 +2154,7 @@ static int ide_drive_post_load(void *opaque, int version_id)
>  {
>      IDEState *s = opaque;
>
> -    if (version_id < 3) {
> -        if (s->sense_key == UNIT_ATTENTION &&
> -            s->asc == ASC_MEDIUM_MAY_HAVE_CHANGED) {
> -            s->cdrom_changed = 1;
> -        }
> -    }
> +    ide_atapi_post_load(s, version_id);
>      if (s->identify_set) {
>          bdrv_set_enable_write_cache(s->bs, !!(s->identify_data[85] & (1 << 5)));
>      }
> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> index 7170bd9..2572461 100644
> --- a/hw/ide/internal.h
> +++ b/hw/ide/internal.h
> @@ -572,6 +572,7 @@ BlockDriverAIOCB *ide_issue_trim(BlockDriverState *bs,
>  /* hw/ide/atapi.c */
>  void ide_atapi_cmd(IDEState *s);
>  void ide_atapi_cmd_reply_end(IDEState *s);
> +void ide_atapi_post_load(IDEState *s, int version_id);
>
>  /* hw/ide/qdev.c */
>  void ide_bus_new(IDEBus *idebus, DeviceState *dev, int bus_id);
>
> In fact, I wonder if it is simpler to make up the ascq directly in
> cmd_request_sense, instead of changing all invocations of
> ide_atapi_cmd_error.
>
> Kevin, any preferences?

Oh, I missed this question and wondered why there was no v2...

I'd prefer this patch with migration support added.

Kevin
Re: [Qemu-devel] [PATCH] [MIPS] Fix order of CVT.PS.S operands
On Mon, Aug 27, 2012 at 09:53:29AM +0100, Richard Sandiford wrote:
> The FS input to CVT.PS.S is the high half and FT is the low half.
> tcg_gen_concat_i32_i64 takes the low half first, so the operands
> were in the wrong order.
>
> Signed-off-by: Richard Sandiford rdsandif...@googlemail.com
> ---
>  target-mips/translate.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index 06f0ac6..defc021 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -6907,7 +6907,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>
>              gen_load_fpr32(fp32_0, fs);
>              gen_load_fpr32(fp32_1, ft);
> -            tcg_gen_concat_i32_i64(fp64, fp32_0, fp32_1);
> +            tcg_gen_concat_i32_i64(fp64, fp32_1, fp32_0);
>              tcg_temp_free_i32(fp32_1);
>              tcg_temp_free_i32(fp32_0);
>              gen_store_fpr64(ctx, fp64, fd);

Thanks, applied.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
Re: [Qemu-devel] [PATCH 1/2] target-mips: Streamline indexed cp1 memory addressing.
On Fri, Mar 30, 2012 at 01:16:36PM -0400, Richard Henderson wrote:
> We've already eliminated both base and index being zero.
> ---
>  target-mips/translate.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index a663b74..300d95e 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -7742,8 +7742,7 @@ static void gen_flt3_ldst (DisasContext *ctx, uint32_t opc,
>      } else if (index == 0) {
>          gen_load_gpr(t0, base);
>      } else {
> -        gen_load_gpr(t0, index);
> -        gen_op_addr_add(ctx, t0, cpu_gpr[base], t0);
> +        gen_op_addr_add(ctx, t0, cpu_gpr[base], cpu_gpr[index]);
>      }
>      /* Don't do NOP if destination is zero: we must perform the actual
>         memory access. */

Thanks, applied.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
Re: [Qemu-devel] [PATCH] ATAPI: STARTSTOPUNIT only eject/load media if powercondition is 0
On 31.07.2012 03:28, Ronnie Sahlberg wrote:
> The START STOP UNIT command will only eject/load media if
> power condition is zero.
>
> If power condition is !0 then LOEJ and START will be ignored.
>
> From MMC (sbc contains similar wordings too)
>   The Power Conditions field requests the block device to be placed
>   in the power condition defined in Table 558. If this field has a
>   value other than 0h then the Start and LoEj bits shall be ignored.
>
> Signed-off-by: Ronnie Sahlberg ronniesahlb...@gmail.com

Thanks, applied to block-next for 1.3.

Kevin
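As an illustration of the rule quoted from MMC, the decision can be written as a small decoder for byte 4 of the START STOP UNIT CDB. This is a sketch rather than the ATAPI code from the patch: the enum and function names are hypothetical, and it deliberately ignores other conditions that a real device model has to check, such as a locked tray.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
enum tray_action { TRAY_NONE, TRAY_EJECT, TRAY_LOAD };

/*
 * Decode CDB byte 4 of START STOP UNIT:
 *   bits 7..4  Power Conditions
 *   bit  1     LoEj
 *   bit  0     Start
 * If Power Conditions is non-zero, Start and LoEj are ignored.
 */
static enum tray_action start_stop_tray_action(const uint8_t *cdb)
{
    unsigned pwrcnd;
    bool loej, start;

    assert(cdb != NULL);
    pwrcnd = (cdb[4] >> 4) & 0x0f;
    loej = (cdb[4] & 0x02) != 0;
    start = (cdb[4] & 0x01) != 0;

    if (pwrcnd != 0) {
        return TRAY_NONE;       /* power management request only */
    }
    if (!loej) {
        return TRAY_NONE;       /* start/stop without touching the tray */
    }
    return start ? TRAY_LOAD : TRAY_EJECT;
}
```

With this decoding, a request with LoEj=1 and a non-zero Power Conditions field leaves the medium where it is, which is the behaviour the patch description asks for.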
[Qemu-devel] [PATCH] audio: previous audio buffer should be flushed
*** BLURB HERE *** munkyu.im (1): audio: previous audio buffer should be flushed audio/winwaveaudio.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) -- 1.7.4.1
[Qemu-devel] [PATCH] audio: previous audio buffer should be flushed
The buffer must be flushed when audio out is paused, but unlike the
other backends the Winwave audio backend does not do this.  As a
result, when the user stops and restarts playback of an audio file,
the previous audio data is played ahead of the sound the user expects.

So change it to use waveOutReset().
---
 audio/winwaveaudio.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/audio/winwaveaudio.c b/audio/winwaveaudio.c
index 663abb9..7de12a6 100644
--- a/audio/winwaveaudio.c
+++ b/audio/winwaveaudio.c
@@ -361,9 +361,9 @@ static int winwave_ctl_out (HWVoiceOut *hw, int cmd, ...)
 
     case VOICE_DISABLE:
         if (!wave->paused) {
-            mr = waveOutPause (wave->hwo);
+            mr = waveOutReset (wave->hwo);
             if (mr != MMSYSERR_NOERROR) {
-                winwave_logerr (mr, "waveOutPause");
+                winwave_logerr (mr, "waveOutReset");
             }
             else {
                 wave->paused = 1;
-- 
1.7.4.1
[Qemu-devel] [PATCH v2] hw/pl110: Fix spelling of 'palette'
Fix the spelling of 'palette' used in various local variables, structure members and comments. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Stefan Weil s...@weilnetz.de --- v1-v2 changes: fix a comment which I'd missed before because it wasn't all-lowercase (thanks Stefan). hw/pl110.c | 30 +++--- hw/pl110_template.h | 22 +++--- 2 files changed, 26 insertions(+), 26 deletions(-) diff --git a/hw/pl110.c b/hw/pl110.c index f94608c..a582640 100644 --- a/hw/pl110.c +++ b/hw/pl110.c @@ -55,8 +55,8 @@ typedef struct { enum pl110_bppmode bpp; int invalidate; uint32_t mux_ctrl; -uint32_t pallette[256]; -uint32_t raw_pallette[128]; +uint32_t palette[256]; +uint32_t raw_palette[128]; qemu_irq irq; } pl110_state; @@ -79,8 +79,8 @@ static const VMStateDescription vmstate_pl110 = { VMSTATE_INT32(rows, pl110_state), VMSTATE_UINT32(bpp, pl110_state), VMSTATE_INT32(invalidate, pl110_state), -VMSTATE_UINT32_ARRAY(pallette, pl110_state, 256), -VMSTATE_UINT32_ARRAY(raw_pallette, pl110_state, 128), +VMSTATE_UINT32_ARRAY(palette, pl110_state, 256), +VMSTATE_UINT32_ARRAY(raw_palette, pl110_state, 128), VMSTATE_UINT32_V(mux_ctrl, pl110_state, 2), VMSTATE_END_OF_LIST() } @@ -236,7 +236,7 @@ static void pl110_update_display(void *opaque) s-upbase, s-cols, s-rows, src_width, dest_width, 0, s-invalidate, - fn, s-pallette, + fn, s-palette, first, last); if (first = 0) { dpy_update(s-ds, 0, first, s-cols, last - first + 1); @@ -253,13 +253,13 @@ static void pl110_invalidate_display(void * opaque) } } -static void pl110_update_pallette(pl110_state *s, int n) +static void pl110_update_palette(pl110_state *s, int n) { int i; uint32_t raw; unsigned int r, g, b; -raw = s-raw_pallette[n]; +raw = s-raw_palette[n]; n = 1; for (i = 0; i 2; i++) { r = (raw 0x1f) 3; @@ -271,17 +271,17 @@ static void pl110_update_pallette(pl110_state *s, int n) raw = 6; switch (ds_get_bits_per_pixel(s-ds)) { case 8: -s-pallette[n] = rgb_to_pixel8(r, g, b); +s-palette[n] = rgb_to_pixel8(r, g, b); 
break; case 15: -s-pallette[n] = rgb_to_pixel15(r, g, b); +s-palette[n] = rgb_to_pixel15(r, g, b); break; case 16: -s-pallette[n] = rgb_to_pixel16(r, g, b); +s-palette[n] = rgb_to_pixel16(r, g, b); break; case 24: case 32: -s-pallette[n] = rgb_to_pixel32(r, g, b); +s-palette[n] = rgb_to_pixel32(r, g, b); break; } n++; @@ -314,7 +314,7 @@ static uint64_t pl110_read(void *opaque, target_phys_addr_t offset, return idregs[s-version][(offset - 0xfe0) 2]; } if (offset = 0x200 offset 0x400) { -return s-raw_pallette[(offset - 0x200) 2]; +return s-raw_palette[(offset - 0x200) 2]; } switch (offset 2) { case 0: /* LCDTiming0 */ @@ -364,10 +364,10 @@ static void pl110_write(void *opaque, target_phys_addr_t offset, is written to. */ s-invalidate = 1; if (offset = 0x200 offset 0x400) { -/* Pallette. */ +/* Palette. */ n = (offset - 0x200) 2; -s-raw_pallette[(offset - 0x200) 2] = val; -pl110_update_pallette(s, n); +s-raw_palette[(offset - 0x200) 2] = val; +pl110_update_palette(s, n); return; } switch (offset 2) { diff --git a/hw/pl110_template.h b/hw/pl110_template.h index 1dce32a..e738e4a 100644 --- a/hw/pl110_template.h +++ b/hw/pl110_template.h @@ -129,14 +129,14 @@ static drawfn glue(pl110_draw_fn_,BITS)[48] = static void glue(pl110_draw_line1_,NAME)(void *opaque, uint8_t *d, const uint8_t *src, int width, int deststep) { -uint32_t *pallette = opaque; +uint32_t *palette = opaque; uint32_t data; while (width 0) { data = *(uint32_t *)src; #ifdef SWAP_PIXELS -#define FN(x, y) COPY_PIXEL(d, pallette[(data (y + 7 - (x))) 1]); +#define FN(x, y) COPY_PIXEL(d, palette[(data (y + 7 - (x))) 1]); #else -#define FN(x, y) COPY_PIXEL(d, pallette[(data ((x) + y)) 1]); +#define FN(x, y) COPY_PIXEL(d, palette[(data ((x) + y)) 1]); #endif #ifdef SWAP_WORDS FN_8(24) @@ -157,14 +157,14 @@ static void glue(pl110_draw_line1_,NAME)(void *opaque, uint8_t *d, const uint8_t static void glue(pl110_draw_line2_,NAME)(void *opaque, uint8_t *d, const uint8_t *src, int width, int deststep) { -uint32_t 
*pallette = opaque; +uint32_t *palette = opaque; uint32_t data; while (width 0) { data =
Re: [Qemu-devel] [PATCH] Add privilege level check to several Cop0 instructions.
On Sat, Sep 17, 2011 at 05:05:32PM -0700, Eric Johnson wrote:
> The MIPS Architecture Verification Programs (AVPs) check privileged
> instructions for the required privilege level. These changes are needed
> to pass the AVP suite.
>
> Signed-off-by: Eric Johnson <er...@mips.com>
> ---
>  target-mips/translate.c | 10 ++
>  1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index d5b1c76..d99a716 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -5940,6 +5940,8 @@ static void gen_cp0 (CPUState *env, DisasContext *ctx, uint32_t opc, int rt, int
>  {
>      const char *opn = "ldst";
>
> +    check_cp0_enabled(ctx);
> +
>      switch (opc) {
>      case OPC_MFC0:
>          if (rt == 0) {
> @@ -10125,6 +10127,7 @@ static void gen_pool32axf (CPUState *env, DisasContext *ctx, int rt, int rs,
>  #ifndef CONFIG_USER_ONLY
>          case MFC0:
>          case MFC0 + 32:
> +            check_cp0_enabled(ctx);
>              if (rt == 0) {
>                  /* Treat as NOP. */
>                  break;
> @@ -10136,6 +10139,7 @@ static void gen_pool32axf (CPUState *env, DisasContext *ctx, int rt, int rs,
>              {
>                  TCGv t0 = tcg_temp_new();
>
> +                check_cp0_enabled(ctx);
>                  gen_load_gpr(t0, rt);
>                  gen_mtc0(env, ctx, t0, rs, (ctx->opcode >> 11) & 0x7);
>                  tcg_temp_free(t0);
> @@ -10230,10 +10234,12 @@ static void gen_pool32axf (CPUState *env, DisasContext *ctx, int rt, int rs,
>          switch (minor) {
>          case RDPGPR:
>              check_insn(env, ctx, ISA_MIPS32R2);
> +            check_cp0_enabled(ctx);
>              gen_load_srsgpr(rt, rs);
>              break;
>          case WRPGPR:
>              check_insn(env, ctx, ISA_MIPS32R2);
> +            check_cp0_enabled(ctx);
>              gen_store_srsgpr(rt, rs);
>              break;
>          default:
> @@ -10276,6 +10282,7 @@ static void gen_pool32axf (CPUState *env, DisasContext *ctx, int rt, int rs,
>              {
>                  TCGv t0 = tcg_temp_new();
>
> +                check_cp0_enabled(ctx);
>                  save_cpu_state(ctx, 1);
>                  gen_helper_di(t0);
>                  gen_store_gpr(t0, rs);
> @@ -10288,6 +10295,7 @@ static void gen_pool32axf (CPUState *env, DisasContext *ctx, int rt, int rs,
>              {
>                  TCGv t0 = tcg_temp_new();
>
> +                check_cp0_enabled(ctx);
>                  save_cpu_state(ctx, 1);
>                  gen_helper_ei(t0);
>                  gen_store_gpr(t0, rs);
> @@ -10765,6 +10773,7 @@ static void decode_micromips32_opc (CPUState *env, DisasContext *ctx,
>          minor = (ctx->opcode >> 12) & 0xf;
>          switch (minor) {
>          case CACHE:
> +            check_cp0_enabled(ctx);
>              /* Treat as no-op. */
>              break;
>          case LWC2:
> @@ -12216,6 +12225,7 @@ static void decode_opc (CPUState *env, DisasContext *ctx, int *is_branch)
>          break;
>      case OPC_CACHE:
>          check_insn(env, ctx, ISA_MIPS3 | ISA_MIPS32);
> +        check_cp0_enabled(ctx);
>          /* Treat as NOP. */
>          break;
>      case OPC_PREF:

Thanks, applied.

-- 
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
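
For readers following along, the privilege rule that the added check_cp0_enabled() calls enforce can be sketched roughly as below. This is not the QEMU implementation: cp0_accessible() and the ST_* macros are illustrative names, Debug Mode is ignored, and only the Status register bit positions are taken from the MIPS32 privileged architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status register fields (MIPS32 privileged architecture bit positions). */
#define ST_EXL (UINT32_C(1) << 1)   /* exception level */
#define ST_ERL (UINT32_C(1) << 2)   /* error level */
#define ST_KSU (UINT32_C(3) << 3)   /* 00 = kernel, 01 = supervisor, 10 = user */
#define ST_CU0 (UINT32_C(1) << 28)  /* coprocessor 0 usable outside kernel mode */

/*
 * Hypothetical helper, not a QEMU function: returns true when a Cop0
 * instruction may execute. Kernel mode is KSU == 0, or EXL or ERL set;
 * otherwise access needs CU0. Debug Mode is deliberately left out.
 */
static bool cp0_accessible(uint32_t status)
{
    bool kernel_mode = (status & ST_KSU) == 0 ||
                       (status & (ST_EXL | ST_ERL)) != 0;

    return kernel_mode || (status & ST_CU0) != 0;
}
```

When the predicate is false, the emulated CPU is expected to raise a Coprocessor Unusable exception instead of executing the instruction, which is the behaviour the AVPs test for.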
Re: [Qemu-devel] [PATCH] Allow microMIPS SWP and SDP to have RD equal to BASE.
On Sat, Sep 17, 2011 at 05:28:16PM -0700, Eric Johnson wrote:
> The microMIPS SWP and SDP instructions do not modify GPRs. So their
> behavior is well defined when RD equals BASE. The MIPS Architecture
> Verification Programs (AVPs) check that they work as expected. This is
> required for AVPs to pass.
>
> Signed-off-by: Eric Johnson <er...@mips.com>
> ---
>  target-mips/translate.c | 10 +-
>  1 files changed, 9 insertions(+), 1 deletions(-)
>
> The patch applies to a8467c7a0e8b024a18608ff7db31ca2f2297e641.
>
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index d5b1c76..82cf75b 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -10034,7 +10034,7 @@ static void gen_ldst_pair (DisasContext *ctx, uint32_t opc, int rd,
>      const char *opn = "ldst_pair";
>      TCGv t0, t1;
>
> -    if (ctx->hflags & MIPS_HFLAG_BMASK || rd == 31 || rd == base) {
> +    if (ctx->hflags & MIPS_HFLAG_BMASK || rd == 31) {
>          generate_exception(ctx, EXCP_RI);
>          return;
>      }
> @@ -10046,6 +10046,10 @@ static void gen_ldst_pair (DisasContext *ctx, uint32_t opc, int rd,
>
>      switch (opc) {
>      case LWP:
> +        if (rd == base) {
> +            generate_exception(ctx, EXCP_RI);
> +            return;
> +        }
>          save_cpu_state(ctx, 0);
>          op_ld_lw(t1, t0, ctx);
>          gen_store_gpr(t1, rd);
> @@ -10067,6 +10071,10 @@ static void gen_ldst_pair (DisasContext *ctx, uint32_t opc, int rd,
>          break;
>  #ifdef TARGET_MIPS64
>      case LDP:
> +        if (rd == base) {
> +            generate_exception(ctx, EXCP_RI);
> +            return;
> +        }
>          save_cpu_state(ctx, 0);
>          op_ld_ld(t1, t0, ctx);
>          gen_store_gpr(t1, rd);

Thanks, applied.

-- 
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
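
The decision the patch ends up making can be condensed into one predicate. The sketch below is only a restatement of the diff, assuming the enum and function names (pair_op, pair_is_reserved) purely for illustration; they do not exist in target-mips/translate.c.

```c
#include <stdbool.h>

/* Illustrative opcode tags; the real code switches on LWP/SWP/LDP/SDP. */
enum pair_op { PAIR_LWP, PAIR_SWP, PAIR_LDP, PAIR_SDP };

/*
 * Hypothetical helper: true when the load/store-pair encoding should raise
 * a Reserved Instruction exception. rd == 31 and use in a branch delay slot
 * are rejected for every opcode; rd == base is rejected only for the loads,
 * because only they overwrite a GPR while the base is still needed.
 */
static bool pair_is_reserved(enum pair_op op, int rd, int base,
                             bool in_delay_slot)
{
    bool is_load = (op == PAIR_LWP || op == PAIR_LDP);

    if (in_delay_slot || rd == 31) {
        return true;
    }
    return is_load && rd == base;
}
```

With this shape, SWP and SDP with rd equal to base fall through to normal translation, which is the case the AVPs exercise.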
Re: [Qemu-devel] [RFC PATCH 0/9] qom: improve reference counting and hotplug
On 27.08.2012 09:22, liu ping fan wrote:
> On Sun, Aug 26, 2012 at 11:51 PM, Anthony Liguori <aligu...@us.ibm.com> wrote:
>> Right now, you need to pair up object_new with object_delete. This is
>> impractical when using reference counting because we would like to ensure
>> that object_unref() also frees memory when needed.
>>
>> The first few patches fix this problem by introducing a release callback so
>> that objects that need special release behavior (i.e. g_free) can do that.
>>
>> Since link and child properties all hold references, in order to actually
>> free an object, we need to break those links. User created devices end up
>> as children of a container. But child properties cannot be removed which
>> means there's no obvious way to remove the reference and ultimately free
>> the object.
>>
> Why? Since we call _add_child() in qdev_device_add(), why can not we
> call object_property_del_child() for qmp_device_del(). Could you
> explain it more detail?

Seconded. If we hot-unplug a device, we should surely remove its child
property from /machine/unassigned or parent bus or whatever. Why is it
that child properties cannot be removed?

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment
Hi, Am 27.08.2012 08:28, schrieb Jan Kiszka: From: Jan Kiszka jan.kis...@siemens.com This adds PCI device assignment for i386 targets using the classic KVM interfaces. This version is 100% identical to what is being maintained in qemu-kvm for several years and is supported by libvirt as well. It is expected to remain relevant for another couple of years until kernels without full-features and performance-wise equivalent VFIO support are obsolete. A refactoring to-do that should be done in-tree is to model MSI and MSI-X support via the generic PCI layer, similar to what VFIO is already doing for MSI-X. This should improve the correctness and clean up the code from duplicate logic. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/kvm/Makefile.objs |2 +- hw/kvm/pci-assign.c | 1929 ++ 2 files changed, 1930 insertions(+), 1 deletions(-) create mode 100644 hw/kvm/pci-assign.c [...] diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c new file mode 100644 index 000..9cce02c --- /dev/null +++ b/hw/kvm/pci-assign.c @@ -0,0 +1,1929 @@ +/* + * Copyright (c) 2007, Neocleus Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. The downside of accepting this into qemu.git is that it gets us a huge blob of GPLv2-only code without history of contributors for GPLv2+ relicensing... + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. (Expect the usual GNU address reminder here.) 
+ * + * + * Assign a PCI device from the host to a guest VM. + * + * Adapted for KVM by Qumranet. + * + * Copyright (c) 2007, Neocleus, Alex Novik (a...@neocleus.com) + * Copyright (c) 2007, Neocleus, Guy Zana (g...@neocleus.com) + * Copyright (C) 2008, Qumranet, Amit Shah (amit.s...@qumranet.com) + * Copyright (C) 2008, Red Hat, Amit Shah (amit.s...@redhat.com) + * Copyright (C) 2008, IBM, Muli Ben-Yehuda (m...@il.ibm.com) + */ +#include stdio.h +#include unistd.h +#include sys/io.h +#include sys/mman.h +#include sys/types.h +#include sys/stat.h +#include hw/hw.h +#include hw/pc.h +#include qemu-error.h +#include console.h +#include hw/loader.h +#include monitor.h +#include range.h +#include sysemu.h +#include hw/pci.h +#include hw/msi.h +#include kvm_i386.h Am I correct to understand we compile this only for i386 / x86_64? (apic.o in kvm/Makefile.objs hints in that direction) You may want to update the description in the comment above accordingly, also mentioning that this is some deprecated backwards-compatibility thing. Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [RFC PATCH 0/9] qom: improve reference counting and hotplug
On 27/08/2012 13:46, Andreas Färber wrote:
>>> Since link and child properties all hold references, in order to actually
>>> free an object, we need to break those links. User created devices end up
>>> as children of a container. But child properties cannot be removed which
>>> means there's no obvious way to remove the reference and ultimately free
>>> the object.
>>>
>> Why? Since we call _add_child() in qdev_device_add(), why can not we
>> call object_property_del_child() for qmp_device_del(). Could you
>> explain it more detail?
>
> Seconded. If we hot-unplug a device, we should surely remove its child
> property from /machine/unassigned or parent bus or whatever.

Sure, as soon as the device is ejected by the guest. But until that
point we need to keep the device in the QOM tree so that: 1) it has a
canonical path; 2) it can be examined; 3) it keeps children alive.

> Why is it that child properties cannot be removed?

Yeah, I didn't quite understand the difference between unparenting and
setting the child property to NULL.

Paolo
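
As background for this subthread, the "release callback" idea from the cover letter can be illustrated with a toy reference-counted object. Everything below (Obj, obj_new_heap, obj_ref, obj_unref, released_count) is a made-up sketch under the assumption of a single-threaded caller; it is not the QOM API and says nothing about how child properties are stored.

```c
#include <stdlib.h>

typedef struct Obj Obj;

struct Obj {
    unsigned refcount;
    void (*release)(Obj *obj);  /* how to dispose of the storage, if at all */
};

/* Counts releases so the behaviour is observable in a test. */
static int released_count;

/* Release hook for heap-allocated objects. */
static void heap_release(Obj *obj)
{
    released_count++;
    free(obj);
}

/* Allocate an object that frees itself when the last reference goes away. */
static Obj *obj_new_heap(void)
{
    Obj *obj = calloc(1, sizeof(*obj));

    if (obj) {
        obj->refcount = 1;
        obj->release = heap_release;
    }
    return obj;
}

static void obj_ref(Obj *obj)
{
    obj->refcount++;
}

/* Dropping the last reference runs the release hook, so no separate
 * "delete" call has to be paired with the constructor. */
static void obj_unref(Obj *obj)
{
    if (--obj->refcount == 0 && obj->release) {
        obj->release(obj);
    }
}
```

In this model a parent that holds a child through a property is just one more obj_ref(); the object can only be released once that reference is dropped too, which is why the question of removing child properties matters for hot-unplug.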
Re: [Qemu-devel] [PATCH 4/4] kvm: i386: Add classic PCI device assignment
On 2012-08-27 14:07, Andreas Färber wrote: Hi, Am 27.08.2012 08:28, schrieb Jan Kiszka: From: Jan Kiszka jan.kis...@siemens.com This adds PCI device assignment for i386 targets using the classic KVM interfaces. This version is 100% identical to what is being maintained in qemu-kvm for several years and is supported by libvirt as well. It is expected to remain relevant for another couple of years until kernels without full-features and performance-wise equivalent VFIO support are obsolete. A refactoring to-do that should be done in-tree is to model MSI and MSI-X support via the generic PCI layer, similar to what VFIO is already doing for MSI-X. This should improve the correctness and clean up the code from duplicate logic. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/kvm/Makefile.objs |2 +- hw/kvm/pci-assign.c | 1929 ++ 2 files changed, 1930 insertions(+), 1 deletions(-) create mode 100644 hw/kvm/pci-assign.c [...] diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c new file mode 100644 index 000..9cce02c --- /dev/null +++ b/hw/kvm/pci-assign.c @@ -0,0 +1,1929 @@ +/* + * Copyright (c) 2007, Neocleus Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. The downside of accepting this into qemu.git is that it gets us a huge blob of GPLv2-only code without history of contributors for GPLv2+ relicensing... The history is documented in qemu-kvm. I personally don't see it will pay off going through this, but someone else may, and nothing will prevent trying this at least. I can leave a comment. BTW, VFIO will be GPLv2 only as well. If I understood Alex correctly, it is too much derived from this code. IOW: There is probably no PCI assignment without this restriction in the foreseeable future. 
+ * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. (Expect the usual GNU address reminder here.) Will fix. + * + * + * Assign a PCI device from the host to a guest VM. + * + * Adapted for KVM by Qumranet. + * + * Copyright (c) 2007, Neocleus, Alex Novik (a...@neocleus.com) + * Copyright (c) 2007, Neocleus, Guy Zana (g...@neocleus.com) + * Copyright (C) 2008, Qumranet, Amit Shah (amit.s...@qumranet.com) + * Copyright (C) 2008, Red Hat, Amit Shah (amit.s...@redhat.com) + * Copyright (C) 2008, IBM, Muli Ben-Yehuda (m...@il.ibm.com) + */ +#include stdio.h +#include unistd.h +#include sys/io.h +#include sys/mman.h +#include sys/types.h +#include sys/stat.h +#include hw/hw.h +#include hw/pc.h +#include qemu-error.h +#include console.h +#include hw/loader.h +#include monitor.h +#include range.h +#include sysemu.h +#include hw/pci.h +#include hw/msi.h +#include kvm_i386.h Am I correct to understand we compile this only for i386 / x86_64? This is correct. (apic.o in kvm/Makefile.objs hints in that direction) You may want to update the description in the comment above accordingly, also mentioning that this is some deprecated backwards-compatibility thing. You mean in the header of pci-assign.c? Can do. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCHv2 0/4] migrate PV EOI MSR
It turns out PV EOI gets disabled after migration, until the next guest
reset. This is because we are missing code to actually migrate it.
This patchset fixes it up: it applies cleanly to qemu.git as well as
qemu-kvm.git, so I think it's cleaner to apply it in qemu.git to keep the
diff to a minimum.

Note: there's talk about adding infrastructure for CPUID whitelisting,
which could conceivably be used for migration compat support. I am
guessing this won't be 1.2 material; when it's ready we can easily
replace the simple flag that this patchset adds with something else.
So this just adds minimal code to avoid regressing cross-version migration.

Note: there's a kernel bug in linux 3.6-rc3. Apply my patch
'kvm: fix KVM_GET_MSR for PV EOI' in order to use this patchset on it.

Needed for 1.2.

Changes from v1:
    Update all headers from 3.6-rc3 to keep them in sync (Jan)
    Disable cpuid flag for qemu 1.1 and older (Orit)

Michael S. Tsirkin (4):
  linux-headers: update to 3.6-rc3
  pc: refactor compat code
  cpuid: disable pv eoi for 1.1 and older compat types
  kvm: get/set PV EOI MSR

 hw/Makefile.objs                  |  2 +-
 hw/cpu_flags.c                    | 32 +++
 hw/cpu_flags.h                    |  9
 hw/pc_piix.c                      | 46 ---
 linux-headers/asm-s390/kvm.h      |  2 +-
 linux-headers/asm-s390/kvm_para.h |  2 +-
 linux-headers/asm-x86/kvm.h       |  1 +
 linux-headers/asm-x86/kvm_para.h  |  7 ++
 linux-headers/linux/kvm.h         |  3 +++
 target-i386/cpu.c                 |  8 +++
 target-i386/cpu.h                 |  1 +
 target-i386/kvm.c                 | 13 +++
 target-i386/machine.c             | 21 ++
 13 files changed, 136 insertions(+), 11 deletions(-)
 create mode 100644 hw/cpu_flags.c
 create mode 100644 hw/cpu_flags.h

-- 
MST
[Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3
Update linux-headers to version present in Linux 3.6-rc3.
Header asm-x86/kvm_para.h update is needed for the new PV EOI
feature.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 linux-headers/asm-s390/kvm.h      | 2 +-
 linux-headers/asm-s390/kvm_para.h | 2 +-
 linux-headers/asm-x86/kvm.h       | 1 +
 linux-headers/asm-x86/kvm_para.h  | 7 +++
 linux-headers/linux/kvm.h         | 3 +++
 5 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/linux-headers/asm-s390/kvm.h b/linux-headers/asm-s390/kvm.h
index bdcbe0f..d25da59 100644
--- a/linux-headers/asm-s390/kvm.h
+++ b/linux-headers/asm-s390/kvm.h
@@ -1,7 +1,7 @@
 #ifndef __LINUX_KVM_S390_H
 #define __LINUX_KVM_S390_H
 /*
- * asm-s390/kvm.h - KVM s390 specific structures and definitions
+ * KVM s390 specific structures and definitions
  *
  * Copyright IBM Corp. 2008
  *
diff --git a/linux-headers/asm-s390/kvm_para.h b/linux-headers/asm-s390/kvm_para.h
index 8e2dd67..870051f 100644
--- a/linux-headers/asm-s390/kvm_para.h
+++ b/linux-headers/asm-s390/kvm_para.h
@@ -1,5 +1,5 @@
 /*
- * asm-s390/kvm_para.h - definition for paravirtual devices on s390
+ * definition for paravirtual devices on s390
  *
  * Copyright IBM Corp. 2008
  *
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index e7d1c19..246617e 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -12,6 +12,7 @@
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
 #define __KVM_HAVE_IOAPIC
+#define __KVM_HAVE_IRQ_LINE
 #define __KVM_HAVE_DEVICE_ASSIGNMENT
 #define __KVM_HAVE_MSI
 #define __KVM_HAVE_USER_NMI
diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index f2ac46a..a1c3d72 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -22,6 +22,7 @@
 #define KVM_FEATURE_CLOCKSOURCE2        3
 #define KVM_FEATURE_ASYNC_PF            4
 #define KVM_FEATURE_STEAL_TIME          5
+#define KVM_FEATURE_PV_EOI              6

 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -37,6 +38,7 @@
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
+#define MSR_KVM_PV_EOI_EN   0x4b564d04

 struct kvm_steal_time {
     __u64 steal;
@@ -89,5 +91,10 @@ struct kvm_vcpu_pv_apf_data {
     __u32 enabled;
 };

+#define KVM_PV_EOI_BIT 0
+#define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)
+#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
+#define KVM_PV_EOI_DISABLED 0x0
+
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 5a9d4e3..4b9e575 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -617,6 +617,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_SIGNAL_MSI 77
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
+#define KVM_CAP_PPC_ALLOC_HTAB 80

 #ifdef KVM_CAP_IRQ_ROUTING

@@ -828,6 +829,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SIGNAL_MSI            _IOW(KVMIO,  0xa5, struct kvm_msi)
 /* Available with KVM_CAP_PPC_GET_SMMU_INFO */
 #define KVM_PPC_GET_SMMU_INFO     _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
+/* Available with KVM_CAP_PPC_ALLOC_HTAB */
+#define KVM_PPC_ALLOCATE_HTAB     _IOWR(KVMIO, 0xa7, __u32)

 /*
  * ioctls for vcpu fds
-- 
MST
[Qemu-devel] [PATCHv2 2/4] pc: refactor compat code
In preparation to adding PV EOI migration for 1.2, trivially refactor some some compat code to make it easier to add version specific cpuid tweaks. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pc_piix.c | 44 1 file changed, 36 insertions(+), 8 deletions(-) diff --git a/hw/pc_piix.c b/hw/pc_piix.c index a771d79..008d42f 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -369,6 +369,22 @@ static QEMUMachine pc_machine_v1_2 = { .default_machine_opts = KVM_MACHINE_OPTIONS, }; +static void pc_machine_v1_1_compat(void) +{ +} + +static void pc_init_pci_v1_1(ram_addr_t ram_size, + const char *boot_device, + const char *kernel_filename, + const char *kernel_cmdline, + const char *initrd_filename, + const char *cpu_model) +{ +pc_machine_v1_1_compat(); +pc_init_pci(ram_size, boot_device, kernel_filename, +kernel_cmdline, initrd_filename, cpu_model); +} + #define PC_COMPAT_1_1 \ {\ .driver = virtio-scsi-pci,\ @@ -403,7 +419,7 @@ static QEMUMachine pc_machine_v1_2 = { static QEMUMachine pc_machine_v1_1 = { .name = pc-1.1, .desc = Standard PC, -.init = pc_init_pci, +.init = pc_init_pci_v1_1, .max_cpus = 255, .default_machine_opts = KVM_MACHINE_OPTIONS, .compat_props = (GlobalProperty[]) { @@ -439,7 +455,7 @@ static QEMUMachine pc_machine_v1_1 = { static QEMUMachine pc_machine_v1_0 = { .name = pc-1.0, .desc = Standard PC, -.init = pc_init_pci, +.init = pc_init_pci_v1_1, .max_cpus = 255, .default_machine_opts = KVM_MACHINE_OPTIONS, .compat_props = (GlobalProperty[]) { @@ -455,7 +471,7 @@ static QEMUMachine pc_machine_v1_0 = { static QEMUMachine pc_machine_v0_15 = { .name = pc-0.15, .desc = Standard PC, -.init = pc_init_pci, +.init = pc_init_pci_v1_1, .max_cpus = 255, .default_machine_opts = KVM_MACHINE_OPTIONS, .compat_props = (GlobalProperty[]) { @@ -488,7 +504,7 @@ static QEMUMachine pc_machine_v0_15 = { static QEMUMachine pc_machine_v0_14 = { .name = pc-0.14, .desc = Standard PC, -.init = pc_init_pci, +.init = pc_init_pci_v1_1, .max_cpus = 255, .default_machine_opts = 
KVM_MACHINE_OPTIONS, .compat_props = (GlobalProperty[]) { @@ -519,10 +535,22 @@ static QEMUMachine pc_machine_v0_14 = { .value= stringify(1),\ } +static void pc_init_pci_v0_13(ram_addr_t ram_size, + const char *boot_device, + const char *kernel_filename, + const char *kernel_cmdline, + const char *initrd_filename, + const char *cpu_model) +{ +pc_machine_v1_1_compat(); +pc_init_pci_no_kvmclock(ram_size, boot_device, kernel_filename, +kernel_cmdline, initrd_filename, cpu_model); +} + static QEMUMachine pc_machine_v0_13 = { .name = pc-0.13, .desc = Standard PC, -.init = pc_init_pci_no_kvmclock, +.init = pc_init_pci_v0_13, .max_cpus = 255, .default_machine_opts = KVM_MACHINE_OPTIONS, .compat_props = (GlobalProperty[]) { @@ -560,7 +588,7 @@ static QEMUMachine pc_machine_v0_13 = { static QEMUMachine pc_machine_v0_12 = { .name = pc-0.12, .desc = Standard PC, -.init = pc_init_pci_no_kvmclock, +.init = pc_init_pci_v0_13, .max_cpus = 255, .default_machine_opts = KVM_MACHINE_OPTIONS, .compat_props = (GlobalProperty[]) { @@ -594,7 +622,7 @@ static QEMUMachine pc_machine_v0_12 = { static QEMUMachine pc_machine_v0_11 = { .name = pc-0.11, .desc = Standard PC, qemu 0.11, -.init = pc_init_pci_no_kvmclock, +.init = pc_init_pci_v0_13, .max_cpus = 255, .default_machine_opts = KVM_MACHINE_OPTIONS, .compat_props = (GlobalProperty[]) { @@ -616,7 +644,7 @@ static QEMUMachine pc_machine_v0_11 = { static QEMUMachine pc_machine_v0_10 = { .name = pc-0.10, .desc = Standard PC, qemu 0.10, -.init = pc_init_pci_no_kvmclock, +.init = pc_init_pci_v0_13, .max_cpus = 255, .default_machine_opts = KVM_MACHINE_OPTIONS, .compat_props = (GlobalProperty[]) { -- MST
[Qemu-devel] [PATCHv2 3/4] cpuid: disable pv eoi for 1.1 and older compat types
In preparation for adding PV EOI support, disable PV EOI by default for
1.1 and older machine types, to avoid CPUID changing during migration.

PV EOI can still be enabled/disabled by specifying it explicitly.
    Enable for 1.1
        -M pc-1.1 -cpu kvm64,+kvm_pv_eoi
    Disable for 1.2
        -M pc-1.2 -cpu kvm64,-kvm_pv_eoi

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 hw/Makefile.objs  |  2 +-
 hw/cpu_flags.c    | 32
 hw/cpu_flags.h    |  9 +
 hw/pc_piix.c      |  2 ++
 target-i386/cpu.c |  8
 5 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 hw/cpu_flags.c
 create mode 100644 hw/cpu_flags.h

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 850b87b..3f2532a 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -1,5 +1,5 @@
 hw-obj-y = usb/ ide/
-hw-obj-y += loader.o
+hw-obj-y += loader.o cpu_flags.o
 hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
 hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 hw-obj-y += fw_cfg.o
diff --git a/hw/cpu_flags.c b/hw/cpu_flags.c
new file mode 100644
index 000..2422d20
--- /dev/null
+++ b/hw/cpu_flags.c
@@ -0,0 +1,32 @@
+/*
+ * CPU compatibility flags.
+ *
+ * Copyright (c) 2012 Red Hat Inc.
+ * Author: Michael S. Tsirkin.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include "hw/cpu_flags.h"
+
+static bool __kvm_pv_eoi_disabled;
+
+void disable_kvm_pv_eoi(void)
+{
+    __kvm_pv_eoi_disabled = true;
+}
+
+bool kvm_pv_eoi_disabled(void)
+{
+    return __kvm_pv_eoi_disabled;
+}
diff --git a/hw/cpu_flags.h b/hw/cpu_flags.h
new file mode 100644
index 000..05777b6
--- /dev/null
+++ b/hw/cpu_flags.h
@@ -0,0 +1,9 @@
+#ifndef HW_CPU_FLAGS_H
+#define HW_CPU_FLAGS_H
+
+#include <stdbool.h>
+
+void disable_kvm_pv_eoi(void);
+bool kvm_pv_eoi_disabled(void);
+
+#endif
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 008d42f..bdbceda 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -46,6 +46,7 @@
 #ifdef CONFIG_XEN
 # include <xen/hvm/hvm_info_table.h>
 #endif
+#include "cpu_flags.h"

 #define MAX_IDE_BUS 2

@@ -371,6 +372,7 @@ static QEMUMachine pc_machine_v1_2 = {

 static void pc_machine_v1_1_compat(void)
 {
+    disable_kvm_pv_eoi();
 }

 static void pc_init_pci_v1_1(ram_addr_t ram_size,
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 120a2e3..0d02fd1 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -23,6 +23,7 @@

 #include "cpu.h"
 #include "kvm.h"
+#include <asm/kvm_para.h>

 #include "qemu-option.h"
 #include "qemu-config.h"
@@ -33,6 +34,7 @@
 #include "hyperv.h"

 #include "hw/hw.h"
+#include "hw/cpu_flags.h"

 /* feature flags taken from "Intel Processor Identification and the CPUID
  * Instruction" and AMD's "CPUID Specification". In cases of disagreement
@@ -889,6 +891,12 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)

     plus_kvm_features = ~0; /* not supported bits will be filtered out later */

+    /* Disable PV EOI for old machine types.
+     * Feature flags can still override. */
+    if (kvm_pv_eoi_disabled()) {
+        plus_kvm_features &= ~(0x1 << KVM_FEATURE_PV_EOI);
+    }
+
     add_flagname_to_bitmaps("hypervisor", plus_features,
             plus_ext_features, plus_ext2_features, plus_ext3_features,
             plus_kvm_features, plus_svm_features);
-- 
MST
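
The interaction between the machine-type default and an explicit +kvm_pv_eoi or -kvm_pv_eoi on the command line can be pictured as plain mask arithmetic. The sketch below is a simplification written for illustration: resolve_kvm_features() and FEATURE_PV_EOI_BIT are assumed names, and the real cpu_x86_find_by_name() tracks plus and minus sets differently and filters unsupported bits later.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit number as KVM_FEATURE_PV_EOI in the updated headers. */
#define FEATURE_PV_EOI_BIT 6

/*
 * Hypothetical helper: start from "everything on", drop PV EOI when an old
 * machine type asked for that, then let explicit flags from the -cpu option
 * win in either direction.
 */
static uint32_t resolve_kvm_features(bool compat_disable_pv_eoi,
                                     uint32_t plus, uint32_t minus)
{
    uint32_t features = ~UINT32_C(0);

    if (compat_disable_pv_eoi) {
        features &= ~(UINT32_C(1) << FEATURE_PV_EOI_BIT);
    }
    features |= plus;    /* e.g. -cpu kvm64,+kvm_pv_eoi */
    features &= ~minus;  /* e.g. -cpu kvm64,-kvm_pv_eoi */
    return features;
}
```

The point of keeping the default per machine type is that a guest started as pc-1.1 sees the same CPUID bits before and after migrating to a newer QEMU.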
[Qemu-devel] [PATCHv2 4/4] kvm: get/set PV EOI MSR
Support get/set of new PV EOI MSR, for migration.
Add an optional section for MSR value - send it
out in case MSR was changed from the default value (0).

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 target-i386/cpu.h     |  1 +
 target-i386/kvm.c     | 13 +
 target-i386/machine.c | 21 +
 3 files changed, 35 insertions(+)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index aabf993..3c57d8b 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -699,6 +699,7 @@ typedef struct CPUX86State {
     uint64_t system_time_msr;
     uint64_t wall_clock_msr;
     uint64_t async_pf_en_msr;
+    uint64_t pv_eoi_en_msr;

     uint64_t tsc;
     uint64_t tsc_deadline;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5e2d4f5..6790180 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -64,6 +64,7 @@ static bool has_msr_star;
 static bool has_msr_hsave_pa;
 static bool has_msr_tsc_deadline;
 static bool has_msr_async_pf_en;
+static bool has_msr_pv_eoi_en;
 static bool has_msr_misc_enable;
 static int lm_capable_kernel;

@@ -456,6 +457,8 @@ int kvm_arch_init_vcpu(CPUX86State *env)

     has_msr_async_pf_en = c->eax & (1 << KVM_FEATURE_ASYNC_PF);

+    has_msr_pv_eoi_en = c->eax & (1 << KVM_FEATURE_PV_EOI);
+
     cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);

     for (i = 0; i <= limit; i++) {
@@ -1018,6 +1021,10 @@ static int kvm_put_msrs(CPUX86State *env, int level)
             kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN,
                               env->async_pf_en_msr);
         }
+        if (has_msr_pv_eoi_en) {
+            kvm_msr_entry_set(&msrs[n++], MSR_KVM_PV_EOI_EN,
+                              env->pv_eoi_en_msr);
+        }
         if (hyperv_hypercall_available()) {
             kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_GUEST_OS_ID, 0);
             kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_HYPERCALL, 0);
@@ -1260,6 +1267,9 @@ static int kvm_get_msrs(CPUX86State *env)
     if (has_msr_async_pf_en) {
         msrs[n++].index = MSR_KVM_ASYNC_PF_EN;
     }
+    if (has_msr_pv_eoi_en) {
+        msrs[n++].index = MSR_KVM_PV_EOI_EN;
+    }

     if (env->mcg_cap) {
         msrs[n++].index = MSR_MCG_STATUS;
@@ -1339,6 +1349,9 @@ static int kvm_get_msrs(CPUX86State *env)
         case MSR_KVM_ASYNC_PF_EN:
             env->async_pf_en_msr = msrs[i].data;
             break;
+        case MSR_KVM_PV_EOI_EN:
+            env->pv_eoi_en_msr = msrs[i].data;
+            break;
         }
     }

diff --git a/target-i386/machine.c b/target-i386/machine.c
index a8be058..4771508 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -279,6 +279,13 @@ static bool async_pf_msr_needed(void *opaque)
     return cpu->async_pf_en_msr != 0;
 }

+static bool pv_eoi_msr_needed(void *opaque)
+{
+    CPUX86State *cpu = opaque;
+
+    return cpu->pv_eoi_en_msr != 0;
+}
+
 static const VMStateDescription vmstate_async_pf_msr = {
     .name = "cpu/async_pf_msr",
     .version_id = 1,
@@ -290,6 +297,17 @@ static const VMStateDescription vmstate_async_pf_msr = {
     }
 };

+static const VMStateDescription vmstate_pv_eoi_msr = {
+    .name = "cpu/async_pv_eoi_msr",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT64(pv_eoi_en_msr, CPUX86State),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static bool fpop_ip_dp_needed(void *opaque)
 {
     CPUX86State *env = opaque;
@@ -454,6 +472,9 @@ static const VMStateDescription vmstate_cpu = {
             .vmsd = &vmstate_async_pf_msr,
             .needed = async_pf_msr_needed,
         } , {
+            .vmsd = &vmstate_pv_eoi_msr,
+            .needed = pv_eoi_msr_needed,
+        } , {
             .vmsd = &vmstate_fpop_ip_dp,
             .needed = fpop_ip_dp_needed,
         }, {
-- 
MST
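
The "optional section" mechanism the commit message relies on boils down to a table of sections, each with a predicate that decides whether it goes on the wire. A minimal model follows, assuming toy types and names throughout (ToyCpu, Subsection, sections_to_send and the section strings are illustrative and are not QEMU's VMState structures).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the two MSR fields involved. */
typedef struct {
    uint64_t async_pf_en_msr;
    uint64_t pv_eoi_en_msr;
} ToyCpu;

/* One optional section: a name plus a "needed" predicate. */
typedef struct {
    const char *name;
    bool (*needed)(const ToyCpu *cpu);
} Subsection;

static bool toy_async_pf_needed(const ToyCpu *cpu)
{
    return cpu->async_pf_en_msr != 0;
}

static bool toy_pv_eoi_needed(const ToyCpu *cpu)
{
    return cpu->pv_eoi_en_msr != 0;
}

static const Subsection toy_subsections[] = {
    { "toy/async_pf_msr", toy_async_pf_needed },
    { "toy/pv_eoi_msr",   toy_pv_eoi_needed },
};

/* Collect the names of the sections that would be sent; returns the count. */
static size_t sections_to_send(const ToyCpu *cpu, const char **out, size_t max)
{
    size_t n = 0;
    size_t count = sizeof(toy_subsections) / sizeof(toy_subsections[0]);

    for (size_t i = 0; i < count && n < max; i++) {
        if (toy_subsections[i].needed(cpu)) {
            out[n++] = toy_subsections[i].name;
        }
    }
    return n;
}
```

Because a section is skipped while the MSR still holds its default of 0, a guest that never enabled PV EOI produces the same stream as before the patch, which is what keeps migration to older versions working in that case.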
[Qemu-devel] [PATCH] Save/load PC speaker internal state
Save PC speaker state to remove differences between system states after saving
the snapshot and after loading it again.
This patch is needed for deterministic replay of the execution.

Signed-off-by: Pavel Dovgalyuk <pavel.dovga...@gmail.com>
---
 hw/pcspk.c | 18 ++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/hw/pcspk.c b/hw/pcspk.c
index e430324..3fb3dd1 100644
--- a/hw/pcspk.c
+++ b/hw/pcspk.c
@@ -159,10 +159,28 @@ static const MemoryRegionOps pcspk_io_ops = {
     },
 };

+static const VMStateDescription vmstate_spk = {
+    .name = "pcspk",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8_ARRAY(sample_buf, PCSpkState, PCSPK_BUF_LEN),
+        VMSTATE_UINT32(pit_count, PCSpkState),
+        VMSTATE_UINT32(samples, PCSpkState),
+        VMSTATE_UINT32(play_pos, PCSpkState),
+        VMSTATE_INT32(data_on, PCSpkState),
+        VMSTATE_INT32(dummy_refresh_clock, PCSpkState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static int pcspk_initfn(ISADevice *dev)
 {
     PCSpkState *s = DO_UPCAST(PCSpkState, dev, dev);

+    vmstate_register(NULL, 0, &vmstate_spk, s);
+
     memory_region_init_io(&s->ioport, &pcspk_io_ops, s, "elcr", 1);
     isa_register_ioport(dev, &s->ioport, s->iobase);
Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3
On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote: Update linux-headers to version present in Linux 3.6-rc3. Header asm-x96_64/kvm_para.h update is needed for the new PV EOI feature. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- linux-headers/asm-s390/kvm.h | 2 +- linux-headers/asm-s390/kvm_para.h | 2 +- linux-headers/asm-x86/kvm.h | 1 + linux-headers/asm-x86/kvm_para.h | 7 +++ linux-headers/linux/kvm.h | 3 +++ 5 files changed, 13 insertions(+), 2 deletions(-) The latest version of update-linux-headers.sh should have caused this update to include asm-generic/kvm_para.h, I think. Did the script not pull that header in, or were you maybe using an old version of the script or forgot to git add the new file? thanks -- PMM
Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3
On 2012-08-27 14:42, Peter Maydell wrote: On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote: Update linux-headers to version present in Linux 3.6-rc3. Header asm-x96_64/kvm_para.h update is needed for the new PV EOI feature. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- linux-headers/asm-s390/kvm.h | 2 +- linux-headers/asm-s390/kvm_para.h | 2 +- linux-headers/asm-x86/kvm.h | 1 + linux-headers/asm-x86/kvm_para.h | 7 +++ linux-headers/linux/kvm.h | 3 +++ 5 files changed, 13 insertions(+), 2 deletions(-) The latest version of update-linux-headers.sh should have caused this update to include asm-generic/kvm_para.h, I think. Did the script not pull that header in, or were you maybe using an old version of the script or forgot to git add the new file? To be fair, that is hard to guess. We should add some magic to the update script to detect new files and maybe suggest them for addition. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH] Save/load PC speaker internal state
On 27 August 2012 13:21, Pavel Dovgaluk pavel.dovga...@ispras.ru wrote:

Save PC speaker state to remove differences between system states after saving the snapshot and after loading it again. This patch is needed for deterministic replay of the execution. Signed-off-by: Pavel Dovgalyuk pavel.dovga...@gmail.com

Hi Pavel; thanks for this patch. Couple of minor issues:

+static const VMStateDescription vmstate_spk = {
+    .name = "pcspk",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8_ARRAY(sample_buf, PCSpkState, PCSPK_BUF_LEN),
+        VMSTATE_UINT32(pit_count, PCSpkState),
+        VMSTATE_UINT32(samples, PCSpkState),
+        VMSTATE_UINT32(play_pos, PCSpkState),
+        VMSTATE_INT32(data_on, PCSpkState),
+        VMSTATE_INT32(dummy_refresh_clock, PCSpkState),

I think that you also need to update the types in the PCSpkState struct from int/unsigned int to int32_t/uint32_t, otherwise this won't compile on a 64 bit system.

+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static int pcspk_initfn(ISADevice *dev)
 {
     PCSpkState *s = DO_UPCAST(PCSpkState, dev, dev);
+    vmstate_register(NULL, 0, &vmstate_spk, s);
+

It's nicer to register the vmstate by setting dc->vmsd = &vmstate_spk; in pcspk_class_initfn(); then you don't need to explicitly call vmstate_register().

     memory_region_init_io(&s->ioport, &pcspk_io_ops, s, "elcr", 1);
     isa_register_ioport(dev, &s->ioport, s->iobase);

-- PMM
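As an aside on the review comment about field types: a saved state stream has to look the same regardless of which host wrote it, which is one reason each saved field wants a fixed-width type. The following is a minimal standalone sketch and not QEMU code. The names SpkState, put_be32, get_be32, spk_save and spk_load are hypothetical, the big-endian byte order is an assumption of the sketch, and the sample buffer is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the scalar part of the speaker state.
 * Every field has an explicit width, so the saved layout does not
 * depend on the host's idea of "int". */
typedef struct SpkState {
    uint32_t pit_count;
    uint32_t samples;
    uint32_t play_pos;
    int32_t  data_on;
    int32_t  dummy_refresh_clock;
} SpkState;

enum { SPK_WIRE_SIZE = 5 * 4 };   /* five 32-bit fields */

/* Store one 32-bit value in big-endian order; returns the new offset. */
static size_t put_be32(uint8_t *buf, size_t pos, uint32_t v)
{
    buf[pos + 0] = (uint8_t)(v >> 24);
    buf[pos + 1] = (uint8_t)(v >> 16);
    buf[pos + 2] = (uint8_t)(v >> 8);
    buf[pos + 3] = (uint8_t)v;
    return pos + 4;
}

/* Read one big-endian 32-bit value and advance the offset. */
static uint32_t get_be32(const uint8_t *buf, size_t *pos)
{
    uint32_t v = ((uint32_t)buf[*pos + 0] << 24) |
                 ((uint32_t)buf[*pos + 1] << 16) |
                 ((uint32_t)buf[*pos + 2] << 8)  |
                  (uint32_t)buf[*pos + 3];
    *pos += 4;
    return v;
}

/* Serialize the state; returns the number of bytes written. */
static size_t spk_save(const SpkState *s, uint8_t *buf)
{
    size_t pos = 0;
    pos = put_be32(buf, pos, s->pit_count);
    pos = put_be32(buf, pos, s->samples);
    pos = put_be32(buf, pos, s->play_pos);
    pos = put_be32(buf, pos, (uint32_t)s->data_on);
    pos = put_be32(buf, pos, (uint32_t)s->dummy_refresh_clock);
    return pos;
}

/* Restore the state from a buffer produced by spk_save(). */
static void spk_load(SpkState *s, const uint8_t *buf)
{
    size_t pos = 0;
    s->pit_count = get_be32(buf, &pos);
    s->samples = get_be32(buf, &pos);
    s->play_pos = get_be32(buf, &pos);
    s->data_on = (int32_t)get_be32(buf, &pos);
    s->dummy_refresh_clock = (int32_t)get_be32(buf, &pos);
}
```

A caller would fill a SpkState, pass it to spk_save() with a buffer of SPK_WIRE_SIZE bytes, and hand that buffer to spk_load() on the receiving side.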
Re: [Qemu-devel] [PATCH 8/9] qdev: make qdev_set_parent_bus() just set a link property
liu ping fan qemul...@gmail.com writes:

On Sun, Aug 26, 2012 at 11:51 PM, Anthony Liguori aligu...@us.ibm.com wrote:

Also make setting the link to NULL break the bus link

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
---
 hw/qdev.c | 48 ++--
 1 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/hw/qdev.c b/hw/qdev.c
index 86e1337..525a0cb 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -100,8 +100,7 @@ static void bus_add_child(BusState *bus, DeviceState *child)
 
 void qdev_set_parent_bus(DeviceState *dev, BusState *bus)
 {
-    dev->parent_bus = bus;
-    bus_add_child(bus, dev);
+    object_property_set_link(OBJECT(dev), OBJECT(bus), "parent_bus", NULL);
 }
 
 /* Create a new device. This only initializes the device state structure
@@ -241,8 +240,8 @@ void qbus_reset_all_fn(void *opaque)
 /* can be used as ->unplug() callback for the simple cases */
 int qdev_simple_unplug_cb(DeviceState *dev)
 {
-    /* just zap it */
-    qdev_free(dev);
+    /* Unplug from parent bus via a forced eject */
+    qdev_set_parent_bus(dev, NULL);

I think it is more reliable to remove the reference property (child, link) before object_finalize(). So when unplug finishes, we delete all the references: bus-child, bus-child by _del_property, not using _set_property.

object_finalize is called when ref=0. You cannot remove refs in finalize because by definition, ref=0.

Regards, Anthony Liguori
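The point about finalize and ref=0 can be sketched in a few lines of standalone C. This is a toy model under stated assumptions and not QOM: Obj, obj_new, obj_ref, obj_unref, obj_finalize and finalize_calls are hypothetical names. Finalization is reached only from the unref that drops the count to zero, so by the time it runs there is no reference left for it to drop.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how often the finalizer ran (only for demonstration). */
static int finalize_calls;

/* Hypothetical reference-counted object. */
typedef struct Obj {
    unsigned ref;
} Obj;

/* Runs exactly once, and only when the last reference is gone. */
static void obj_finalize(Obj *o)
{
    assert(o->ref == 0);   /* nothing left to unref at this point */
    finalize_calls++;
    free(o);
}

/* A new object starts with one reference owned by the creator. */
static Obj *obj_new(void)
{
    Obj *o = calloc(1, sizeof(*o));
    if (o) {
        o->ref = 1;
    }
    return o;
}

static void obj_ref(Obj *o)
{
    o->ref++;
}

/* Dropping the last reference triggers finalization. */
static void obj_unref(Obj *o)
{
    assert(o->ref > 0);
    if (--o->ref == 0) {
        obj_finalize(o);
    }
}
```

In this model any link or child reference has to be broken by its holder calling obj_unref() beforehand; the finalizer cannot be the place that does it.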
Re: [Qemu-devel] qcow2: online snasphots : internal vs external ?
Ok, got it. Thanks, Paolo!

- Original Message -
From: Paolo Bonzini pbonz...@redhat.com
To: qemu-devel@nongnu.org
Sent: Monday 27 August 2012 12:10:34
Subject: Re: [Qemu-devel] qcow2: online snasphots : internal vs external ?

On 27/08/2012 11:26, Alexandre DERUMIER wrote:

How can I roll back to the point of the snapshot? Example: image1.qcow2 contains the file /beforesnap1. I take a snapshot (snap1), so qemu switches to snap1.qcow2. I write some file, so the files are now /aftersnap1 and /beforesnap1. Now, how can I roll back to the point in time of snap1? I can reuse image1.qcow2, but if I write some data to it, I don't see how I can return to the point in time of snap1 (like qemu-img snapshot -a with internal snapshots).

If you can drop snap1.qcow2 altogether, you just use image1.qcow2 the next time you start QEMU. If you cannot, you create snap2.qcow2 based on image1.qcow2:

qemu-img create -f qcow2 -o backing_file=image1.qcow2 snap2.qcow2

and use it the next time you start QEMU.

Paolo

--
Alexandre Derumier
Systems and Network Engineer
Phone: 03 20 68 88 85
Fax: 03 20 68 90 88
45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris
Re: [Qemu-devel] [RFC PATCH 0/9] qom: improve reference counting and hotplug
Paolo Bonzini pbonz...@redhat.com writes:

On 27/08/2012 13:46, Andreas Färber wrote:

Since link and child properties all hold references, in order to actually free an object, we need to break those links. User created devices end up as children of a container. But child properties cannot be removed which means there's no obvious way to remove the reference and ultimately free the object.

Why? Since we call _add_child() in qdev_device_add(), why can we not call object_property_del_child() for qmp_device_del()? Could you explain it in more detail?

Seconded. If we hot-unplug a device, we should surely remove its child property from /machine/unassigned or parent bus or whatever.

That's exactly what is happening in this series. qmp_device_del adds an ejection notifier that unparents the device to remove the last reference count.

Sure, as soon as the device is ejected by the guest. But until that point we need to keep the device in the QOM tree so that: 1) it has a canonical path; 2) it can be examined; 3) it keeps children alive.

Why is it that child properties cannot be removed?

Yeah, I didn't quite understand the difference between unparenting and setting the child property to NULL.

They are exactly the same thing. Setting the child property to NULL is unparenting. Unparenting is essentially deleting. This series makes it such that there is a white list of devices that are capable of being deleted.

Regards, Anthony Liguori

Paolo
Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycle problem
Liu Ping Fan qemul...@gmail.com writes:

From: Liu Ping Fan pingf...@linux.vnet.ibm.com

Scene: obja lies in objA, when objA's ref->0, it will be freed, but at that time obja can still be in use. The real example is:

typedef struct PCIIDEState {
    PCIDevice dev;
    IDEBus bus[2]; -- create in place
    ...
}

Without big lock protection for mmio-dispatch, we will hold the obj's refcnt. So memory_region_init_io() will replace the third parameter void *opaque with Object *obj. With this patch, we can protect PCIIDEState from disappearing while mmio-dispatch holds the IDEBus->ref. And the ref circle has been broken when calling qdev_delete_subtree().

Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com

I think this is solving the wrong problem. There are many, many dependencies a device may have on other devices. Memory allocation isn't the only one.

The problem is that we want to make sure that a device doesn't go away while an MMIO dispatch is happening. This is easy to solve without touching reference counting. The device will hold a lock while the MMIO is being dispatched. The delete path simply needs to acquire that same lock. This will ensure that a delete operation cannot finish while MMIO is still in flight.

Regarding deleting a device, not all devices are capable of being deleted and specifically, devices that are composed within the memory of another device cannot be directly deleted (they can only be deleted as part of their parent's destruction).

Regards, Anthony Liguori

---
 hw/qdev.c | 2 ++
 hw/qdev.h | 1 +
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/qdev.c b/hw/qdev.c
index e2339a1..b09ebbf 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -510,6 +510,8 @@ void qbus_create_inplace(BusState *bus, const char *typename,
 {
     object_initialize(bus, typename);
 
+    bus->overlap = parent;
+    object_ref(OBJECT(bus->overlap));
     bus->parent = parent;
     bus->name = name ? g_strdup(name) : NULL;
     qbus_realize(bus);
diff --git a/hw/qdev.h b/hw/qdev.h
index 182cfa5..9bc5783 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -117,6 +117,7 @@ struct BusState {
     int allow_hotplug;
     bool qom_allocated;
     bool glib_allocated;
+    DeviceState *overlap;
     int max_index;
     QTAILQ_HEAD(ChildrenHead, BusChild) children;
     QLIST_ENTRY(BusState) sibling;
-- 
1.7.4.4
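The locking scheme suggested in the reply above (dispatch holds a per-device lock, delete takes the same lock) might look roughly like the following. This is a minimal sketch under stated assumptions and not QEMU code: Dev, dev_init, dev_mmio_write and dev_delete are hypothetical names, a POSIX mutex stands in for whatever lock the device would really use, and the question of who owns the lock's memory after deletion is deliberately left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device with one register and its own lock. */
typedef struct Dev {
    pthread_mutex_t lock;
    bool deleted;
    uint32_t reg;
} Dev;

static void dev_init(Dev *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->deleted = false;
    d->reg = 0;
}

/* MMIO dispatch: the lock is held for the whole access.
 * Returns 0 on success, -1 if the device is already gone. */
static int dev_mmio_write(Dev *d, uint32_t val)
{
    int ret = -1;

    pthread_mutex_lock(&d->lock);
    if (!d->deleted) {
        d->reg = val;
        ret = 0;
    }
    pthread_mutex_unlock(&d->lock);
    return ret;
}

/* Delete path: taking the same lock means this cannot complete
 * while a dispatch is still inside its critical section. */
static void dev_delete(Dev *d)
{
    pthread_mutex_lock(&d->lock);
    d->deleted = true;
    pthread_mutex_unlock(&d->lock);
}
```

With threads, one would call dev_mmio_write() from the dispatching thread and dev_delete() from the thread handling removal (building with -pthread); the mutex serializes the two.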
Re: [Qemu-devel] [PATCH v7 0/6] convert sendkey to qapi
On Mon, 27 Aug 2012 15:23:31 +0800 Amos Kong ak...@redhat.com wrote: On 20/08/12 23:08, Luiz Capitulino wrote: On Mon, 20 Aug 2012 07:25:13 -0600 Eric Blakeebl...@redhat.com wrote: On 08/19/2012 10:39 PM, Amos Kong wrote: This series converted 'sendkey' command to qapi. The raw value in hexadecimal format is not supported by 'send-key' of qmp. Are we still trying to get this into 1.2, or have we missed that deadline? Too late for 1.2, IMO. So I need to wait and repost a V8(# Since: 1.3) after 1.2 is released ? I haven't reviewed this yet. If it's good enough, then I can do the s/1.2/1.3 change myself when applying this to the qmp queue.
Re: [Qemu-devel] [PATCH 2/9] object: automatically free objects based on a release function
On 26.08.2012 17:51, Anthony Liguori wrote: Now object_delete() simply has the semantics of unref'ing an object and unparenting it. Signed-off-by: Anthony Liguori aligu...@us.ibm.com Acked-by: Andreas Färber afaer...@suse.de /-F -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH v2 0/6] Running Microport UNIX (ca 1987)
malc av1...@comtv.ru writes: On Thu, 23 Aug 2012, Matthew Ogilvie wrote: After applying this version 2 of this patch series, I can successfully run Micoport UNIX System V/386, v 2.1 (ca 1987) under qemu. (although not if I try to enable KVM) Version 1 of this series was posted about 4 weeks ago. See http://patchwork.ozlabs.org/project/qemu-devel/list/?submitter=15654 The patches are all independent, except that the documentation part of patch 5 (vga) adds onto patch 4 (retrace=) changes. [..snip..] Applied, thanks. malc, please revert these patches. They were not adequately reviewed and they also do not qualify for the stage of the release we're in. Regards, Anthony Liguori -- mailto:av1...@comtv.ru
Re: [Qemu-devel] [PATCH v2 6/6] i8259: add -no-spurious-interrupt-hack option
Matthew Ogilvie mmogilvi_q...@miniinfo.net writes:

This patch provides a way to optionally suppress spurious interrupts, as a workaround for systems described below: Some old operating systems do not handle spurious interrupts well, and qemu tends to generate them significantly more often than real hardware.

This is the wrong approach. You add a LostTickPolicy property to the i8259 device.

Regards, Anthony Liguori

Examples:

- Microport UNIX System V/386 v 2.1 (ca 1987) (The main problem I'm fixing: Without this patch, it panics sporadically when accessing the hard disk.)
- AT&T UNIX System V/386 Release 4.0 Version 2.1a (ca 1991) See screenshot in QEMU Official OS Support List: http://www.claunia.com/qemu/objectManager.php?sClass=applicationiId=9 (I don't have this system to test.)
- A report about OS/2 boot lockup from 2004 by Hampa Hug: http://lists.nongnu.org/archive/html/qemu-devel/2004-09/msg00367.html (My patch was partially inspired by his.) Also: http://lists.nongnu.org/archive/html/qemu-devel/2005-06/msg00243.html (I don't have this system to test.)

Signed-off-by: Matthew Ogilvie mmogilvi_q...@miniinfo.net
---

Note: checkpatches.pl gives an error about initializing the global "int no_spurious_interrupt_hack = 0;", even though existing lines near it are doing the same thing. Should I give precedence to checkpatches.pl, or nearby code?

There was no version 1 of this patch; this was the last thing I had to work around to get UNIX running.

High level symptoms:

1. Despite using this UNIX system for nearly 10 years (ca 1987-1996) on an early 80386, I don't remember ever seeing any crash like this. I vaguely remember I may have had one or two crashes for which I don't have other explanations that perhaps could have been this, but I don't remember the error messages to confirm it.

2. It is somewhat random when UNIX crashes when running in qemu.
   - Sometimes it crashes the first time the floppy-based installer tries to access the hard disk (partition table?).
   - Other times (though fairly rarely), it actually finishes formatting and copying the first disk's files to the hard disk without crashing.
   - On the other hand, I've never seen it successfully boot from the hard disk without this patch. An attempt to boot from the hard drive always panics quite early.

3. I tried -win2k-hack instead, thinking maybe the hard disk is just responding faster than UNIX expected. But it doesn't seem to have any effect. UNIX still panics sporadically the same way.
   - TANGENT: I was going to see if my patch provides an alternative fix for installing Windows 2000, but I was unable to reproduce the original -win2k-hack problem at all (with neither -win2k-hack NOR this patch). Maybe some other change has fixed it some other way? Or maybe it is only an issue in configurations I didn't test? (KVM instead of TCG? Less RAM? Something else?) It might be worth doing a little more investigation, and eliminating the -win2k-hack option if appropriate.

4. If I enable KVM, I get a different error very early in bootup (in splx function instead of splint), and this patch doesn't help.

My low level analysis of what is going on:

It is hard to track down all the details, but based on logging a lot of qemu IRQ stuff, and setting a breakpoint in the earliest panic-related UNIX function using gdb, it looks like:

1. It is near the end of servicing a previous IRQ14 from the hard disk.

2. The processor has interrupts disabled (I think), while UNIX clears the slave 8259's IMR (mask) register (sets it to 0), allowing all interrupts to be passed on to the master.

3. While in that state, IRQ14 is raised (on the slave), which gets propagated to the master (IRQ2), but the CPU is not interrupted yet.

4. UNIX then masks the slave 8259's IMR register completely (sets to 0xff).

5. Because the master elcr register is set (by BIOS; UNIX never touches it) to edge trigger for IRQ2, the master latched on to IRQ2 earlier, and continues to assert the processor's INT line (the env->interrupt_request & CPU_INTERRUPT_HARD bit) even after all slave IRQs have been masked off (clearing the input IRQ2).

6. Finally, UNIX enables CPU interrupts and the interrupt is delivered to the CPU, which ends up as a spurious IRQ15 due to the slave's imr register. UNIX doesn't know what to do with that, and panics/halts.

I'm not sure why it only sporadically hits this sequence of events. There doesn't seem to be other IRQs asserted or serviced anywhere in the near past; the last several were all IRQ14's. But I can't help
[Qemu-devel] [Bug 1042084] [NEW] Windows 7 guest cannot boot after seabios updated
Public bug reported:

Hi, I can no longer boot my Windows 7 guest after this commit (update seabios to latest master): http://git.qemu.org/?p=qemu.git;a=commitdiff;h=01afdadc92e71e29700e64f3a5f42c1c543e3cf9

When I tried to boot Windows, it showed a BSOD saying "The BIOS in this system is not fully ACPI compliant. Please contact your system vendor for an updated BIOS." Reverting this commit fixes the issue.

** Affects: qemu Importance: Undecided Status: New

-- You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1042084

Title: Windows 7 guest cannot boot after seabios updated
Status in QEMU: New
[Qemu-devel] [Bug 1036363] Re: Major network performance problems on AMD hardware
Thanks, Stefan. I compiled both 0.15 and 1.0 and they do not have that problem... but the Fedora package does. Perhaps it is the way the Fedora package was compiled? I'm going to grab a source package and attempt to compile from that.

-- https://bugs.launchpad.net/bugs/1036363

Title: Major network performance problems on AMD hardware
Status in QEMU: New
Status in qemu-kvm: New

Bug description: Hi, I am experiencing some major performance problems with all of our beefy AMD Opteron 6274 servers running Fedora 17 (kernel 3.4.4-5.fc17.x86_64, qemu 1.0-17). The network performance between host and the virtual machine is terrible:

# iperf -c 10.10.11.22 -r
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
Client connecting to 10.10.11.22, TCP port 5001
TCP window size: 197 KByte (default)
[ 5] local 10.10.11.199 port 44192 connected with 10.10.11.22 port 5001
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 2.45 GBytes 2.11 Gbits/sec
[ 4] local 10.10.11.199 port 5001 connected with 10.10.11.22 port 42601
[ 4] 0.0-10.0 sec 8.97 GBytes 7.71 Gbits/sec

So the VM's receive is super slow. I would be happy with 7.71 Gbps because it's closer to matching the speed of the 10G ethernet adapters, but the iSCSI drive's write performance is a few times faster than read. Now running a similar test on the slowest machine I have, an Intel Core i3, I see this:

# iperf -c 192.168.7.60 -r
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
Client connecting to 192.168.7.60, TCP port 5001
TCP window size: 306 KByte (default)
[ 5] local 192.168.7.98 port 53992 connected with 192.168.7.60 port 5001
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 22.5 GBytes 19.3 Gbits/sec
[ 4] local 192.168.7.98 port 5001 connected with 192.168.7.60 port 53339
[ 4] 0.0-10.0 sec 25.1 GBytes 21.5 Gbits/sec

As you can imagine, this is a huge difference in network IO. Most setups are identical down to the same versions. Vhost-net is enabled and it appears to use MSI-X on the VM. I've tried all kinds of settings and while they improve performance a little I feel it's just masking a bigger problem. All 12 of my AMD servers have this issue and it appears I'm not the only one complaining. Any help would be appreciated. Thanks.
Re: [Qemu-devel] [PATCH 3/9] qbus: remove glib_allocated/qom_allocated and use release hook to free memory
On 26.08.2012 17:51, Anthony Liguori wrote:

Signed-off-by: Anthony Liguori aligu...@us.ibm.com

That's a really nice solution for cleaning this up, thanks!

Acked-by: Andreas Färber afaer...@suse.de

However one conceptual detail...

---
 hw/pci.c | 7 ++-
 hw/qdev.c | 15 ---
 hw/qdev.h | 7 ---
 hw/sysbus.c | 7 ++-
 4 files changed, 12 insertions(+), 24 deletions(-)

[...]

diff --git a/hw/qdev.c b/hw/qdev.c
index b5a52ac..6b61daa 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
[...]
@@ -468,18 +466,6 @@ BusState *qbus_create(const char *typename, DeviceState *parent, const char *nam
     return bus;
 }
 
-void qbus_free(BusState *bus)
-{
-    if (bus->qom_allocated) {
-        object_delete(OBJECT(bus));
-    } else {
-        object_finalize(OBJECT(bus));
-        if (bus->glib_allocated) {
-            g_free(bus);
-        }
-    }
-}
-
 static char *bus_get_fw_dev_path(BusState *bus, DeviceState *dev)
 {
     BusClass *bc = BUS_GET_CLASS(bus);
@@ -698,7 +684,6 @@ static void device_finalize(Object *obj)
     if (dev->state == DEV_STATE_INITIALIZED) {
         while (dev->num_child_bus) {
             bus = QLIST_FIRST(&dev->child_bus);
-            qbus_free(bus);
         }
         if (qdev_get_vmsd(dev)) {
             vmstate_unregister(dev, qdev_get_vmsd(dev), dev);

I wonder how this is gonna work: The device used to be in charge of tearing down its bus children ... now it neither deletes nor finalizes nor unrefs? Is the while loop even still needed? Wouldn't the busses still have the device as parent, referencing it, blocking device_finalize?

Regards, Andreas

-- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH v2 0/6] Running Microport UNIX (ca 1987)
On Mon, 27 Aug 2012, Anthony Liguori wrote: malc av1...@comtv.ru writes: On Thu, 23 Aug 2012, Matthew Ogilvie wrote: After applying this version 2 of this patch series, I can successfully run Micoport UNIX System V/386, v 2.1 (ca 1987) under qemu. (although not if I try to enable KVM) Version 1 of this series was posted about 4 weeks ago. See http://patchwork.ozlabs.org/project/qemu-devel/list/?submitter=15654 The patches are all independent, except that the documentation part of patch 5 (vga) adds onto patch 4 (retrace=) changes. [..snip..] Applied, thanks. malc, please revert these patches. They were not adequately reviewed and they also do not qualify for the stage of the release we're in. Number 2 was, and should stay, as the emulation wasn't correct before it, don't really care about the rest. -- mailto:av1...@comtv.ru
Re: [Qemu-devel] [PATCH v2 0/6] Running Microport UNIX (ca 1987)
malc av1...@comtv.ru writes: On Mon, 27 Aug 2012, Anthony Liguori wrote: malc av1...@comtv.ru writes: On Thu, 23 Aug 2012, Matthew Ogilvie wrote: After applying this version 2 of this patch series, I can successfully run Micoport UNIX System V/386, v 2.1 (ca 1987) under qemu. (although not if I try to enable KVM) Version 1 of this series was posted about 4 weeks ago. See http://patchwork.ozlabs.org/project/qemu-devel/list/?submitter=15654 The patches are all independent, except that the documentation part of patch 5 (vga) adds onto patch 4 (retrace=) changes. [..snip..] Applied, thanks. malc, please revert these patches. They were not adequately reviewed and they also do not qualify for the stage of the release we're in. Number 2 was, and should stay, as the emulation wasn't correct before it, don't really care about the rest. Okay, please revert the rest then. Regards, Anthony Liguori -- mailto:av1...@comtv.ru
Re: [Qemu-devel] [PATCH 3/9] qbus: remove glib_allocated/qom_allocated and use release hook to free memory
Andreas Färber afaer...@suse.de writes:

On 26.08.2012 17:51, Anthony Liguori wrote:

Signed-off-by: Anthony Liguori aligu...@us.ibm.com

That's a really nice solution for cleaning this up, thanks!

Acked-by: Andreas Färber afaer...@suse.de

However one conceptual detail...

---
 hw/pci.c | 7 ++-
 hw/qdev.c | 15 ---
 hw/qdev.h | 7 ---
 hw/sysbus.c | 7 ++-
 4 files changed, 12 insertions(+), 24 deletions(-)

[...]

diff --git a/hw/qdev.c b/hw/qdev.c
index b5a52ac..6b61daa 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
[...]
@@ -468,18 +466,6 @@ BusState *qbus_create(const char *typename, DeviceState *parent, const char *nam
     return bus;
 }
 
-void qbus_free(BusState *bus)
-{
-    if (bus->qom_allocated) {
-        object_delete(OBJECT(bus));
-    } else {
-        object_finalize(OBJECT(bus));
-        if (bus->glib_allocated) {
-            g_free(bus);
-        }
-    }
-}
-
 static char *bus_get_fw_dev_path(BusState *bus, DeviceState *dev)
 {
     BusClass *bc = BUS_GET_CLASS(bus);
@@ -698,7 +684,6 @@ static void device_finalize(Object *obj)
     if (dev->state == DEV_STATE_INITIALIZED) {
         while (dev->num_child_bus) {
             bus = QLIST_FIRST(&dev->child_bus);
-            qbus_free(bus);
         }
         if (qdev_get_vmsd(dev)) {
             vmstate_unregister(dev, qdev_get_vmsd(dev), dev);

I wonder how this is gonna work: The device used to be in charge of tearing down its bus children ... now it neither deletes nor finalizes nor unrefs? Is the while loop even still needed? Wouldn't the busses still have the device as parent, referencing it, blocking device_finalize?

This has never been right. Just because a controller goes away, it doesn't mean that the devices ought to go away too. There are different types of remove so let's consider each.

1) Guest visible eject: if a controller is ejected, then the guest will obviously see everything behind it get removed too. This is an emulation detail, not a QOM thing.

2) Final deletion: this only happens when all references go away. If you eject a controller but there are still children that reference it, the controller won't go away. You actually need to delete each individual disk (or whatever is behind it) in order to break the reference counting.

The eject notifier could walk the full bus and attempt to break the connections but honestly, I'd much prefer that we deprecate the current device_del interface and just do everything through QOM properties. That would mean manually deleting all of the devices behind the bus if that's really what you wanted to do.

Regards, Anthony Liguori

Regards, Andreas

-- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
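The "final deletion" case in the reply above can be modelled in a few lines: every disk that links to the controller holds a reference, so ejecting the controller only drops the creator's reference, and the controller is finalized once the last disk breaks its link. This is a toy sketch under stated assumptions and not QOM; Controller, Disk, controller_unref and disk_set_parent are hypothetical names, and setting the parent to NULL stands in for breaking the link property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical controller with a reference count. */
typedef struct Controller {
    int ref;
    bool finalized;
} Controller;

/* Hypothetical disk holding a counted link to its controller. */
typedef struct Disk {
    Controller *parent_bus;
} Disk;

/* Drop one reference; the controller goes away at zero. */
static void controller_unref(Controller *c)
{
    if (--c->ref == 0) {
        c->finalized = true;
    }
}

/* Point the disk at a new parent, or at NULL to break the link.
 * The new parent is referenced before the old one is released. */
static void disk_set_parent(Disk *d, Controller *c)
{
    if (c) {
        c->ref++;
    }
    if (d->parent_bus) {
        controller_unref(d->parent_bus);
    }
    d->parent_bus = c;
}
```

In use, two disks attached with disk_set_parent() keep the controller alive after the creator calls controller_unref(); only detaching both disks (parent set to NULL) lets the count reach zero.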
Re: [Qemu-devel] [PATCH v2 6/6] i8259: add -no-spurious-interrupt-hack option
On 27/08/2012 15:55, Anthony Liguori wrote:

This patch provides a way to optionally suppress spurious interrupts, as a workaround for systems described below: Some old operating systems do not handle spurious interrupts well, and qemu tends to generate them significantly more often than real hardware.

This is the wrong approach. You add a LostTickPolicy property to the i8259 device.

Isn't the i8254 the one that would need a LostTickPolicy? But this seems like a bug that is either in the i8259 emulation, or in the firmware. Your own suggestion of setting IRQ2 to level-triggered in SeaBIOS is definitely a good one.

Paolo
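For readers following the edge versus level discussion, here is a toy model of the sequence from the patch description in this thread: IRQ14 raised while the slave is unmasked, the slave then fully masked, and the CPU acknowledging afterwards. It is a deliberately simplified sketch and neither QEMU's i8259 code nor a full 8259 model; Pic, pic_set_input, pic_output, pic_ack and run_scenario are hypothetical names, and the latch-on-rising-edge behaviour is an assumption taken from that description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy interrupt controller with only the state this scenario needs. */
typedef struct Pic {
    uint8_t irr;      /* pending requests */
    uint8_t imr;      /* mask register, 1 = masked */
    uint8_t last_in;  /* previous input levels, for edge detection */
    uint8_t elcr;     /* per line: 1 = level-triggered, 0 = edge */
} Pic;

static void pic_set_input(Pic *p, int line, bool level)
{
    uint8_t mask = (uint8_t)(1u << line);

    if (p->elcr & mask) {
        /* Level-triggered: the request simply follows the input. */
        if (level) {
            p->irr |= mask;
        } else {
            p->irr &= (uint8_t)~mask;
        }
    } else if (level && !(p->last_in & mask)) {
        /* Edge-triggered: latch on a rising edge and keep the
         * request even if the input drops again. */
        p->irr |= mask;
    }
    if (level) {
        p->last_in |= mask;
    } else {
        p->last_in &= (uint8_t)~mask;
    }
}

/* INT output: any pending request that is not masked. */
static bool pic_output(const Pic *p)
{
    return (p->irr & ~p->imr) != 0;
}

/* Acknowledge: lowest-numbered pending unmasked line, or line 7
 * (the spurious vector) if nothing is pending any more. */
static int pic_ack(Pic *p)
{
    uint8_t pending = (uint8_t)(p->irr & ~p->imr);

    for (int line = 0; line < 8; line++) {
        uint8_t mask = (uint8_t)(1u << line);
        if (pending & mask) {
            if (!(p->elcr & mask)) {
                p->irr &= (uint8_t)~mask;  /* edge latch is consumed */
            }
            return line;
        }
    }
    return 7;
}

/* Returns the IRQ number the CPU ends up with, or -1 if none. */
static int run_scenario(bool master_irq2_level)
{
    Pic master = { .elcr = master_irq2_level ? 0x04 : 0x00 };
    Pic slave = { 0 };

    slave.imr = 0x00;                               /* slave unmasked */
    pic_set_input(&slave, 6, true);                 /* IRQ14 raised */
    pic_set_input(&master, 2, pic_output(&slave));  /* cascade goes high */
    slave.imr = 0xff;                               /* slave fully masked */
    pic_set_input(&master, 2, pic_output(&slave));  /* cascade drops */

    if (!pic_output(&master)) {
        return -1;                                  /* nothing delivered */
    }
    int line = pic_ack(&master);
    if (line != 2) {
        return line;
    }
    return 8 + pic_ack(&slave);                     /* slave lines are 8..15 */
}
```

In this model run_scenario(false), with IRQ2 edge-triggered on the master, ends in the slave's spurious vector, while run_scenario(true), with IRQ2 level-triggered, delivers nothing because the master's request drops together with the cascade input.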
Re: [Qemu-devel] [RFC 1/8] move qemu_irq typedef out of cpu-common.h
- Original Message - From: Peter Maydell peter.mayd...@linaro.org ... I'm not objecting to this patch if it helps us move forwards, but adding the #include to sysemu.h is effectively just adding the definition to another grabbag header (183 files include sysemu.h). It would be nicer long-term to separate out the one thing in this header that cares about qemu_irq (the extern declaration of qemu_system_powerdown). Is there a preference/suggestion in which header it should be declared? ...
Re: [Qemu-devel] [PATCH v2 0/6] Running Microport UNIX (ca 1987)
On Mon, 27 Aug 2012, Anthony Liguori wrote: malc av1...@comtv.ru writes: [..snip..] Number 2 was, and should stay, as the emulation wasn't correct before it, don't really care about the rest. Okay, please revert the rest then. Done. [..snip..] -- mailto:av1...@comtv.ru
Re: [Qemu-devel] [PATCH 3/9] qbus: remove glib_allocated/qom_allocated and use release hook to free memory
On 27.08.2012 16:22, Anthony Liguori wrote:

Andreas Färber afaer...@suse.de writes:

I wonder how this is gonna work: The device used to be in charge of tearing down its bus children ... now it neither deletes nor finalizes nor unrefs? Is the while loop even still needed? Wouldn't the busses still have the device as parent, referencing it, blocking device_finalize?

This has never been right. Just because a controller goes away, it doesn't mean that the devices ought to go away too. There are different types of remove so let's consider each.

1) Guest visible eject: if a controller is ejected, then the guest will obviously see everything behind it get removed too. This is an emulation detail, not a QOM thing.

2) Final deletion: this only happens when all references go away. If you eject a controller but there are still children that reference it, the controller won't go away. You actually need to delete each individual disk (or whatever is behind it) in order to break the reference counting.

The eject notifier could walk the full bus and attempt to break the connections but honestly, I'd much prefer that we deprecate the current device_del interface and just do everything through QOM properties. That would mean manually deleting all of the devices behind the bus if that's really what you wanted to do.

I think we're talking about different scenarios here... I was thinking PCIHostState has-a PCIBus (not PCIBus has-a PCIDevice) and final deletion. In that case I would expect that it must be guaranteed that the device that created the bus has access to the bus until it destroys it. But IIUC the PCIHostState, once unparented from its SysBus (bad example!), has a refcount of 1 (its PCIBus) thereby not being finalized?

I do understand your concept of refcounting matches what Java, .NET, etc. do for objects but combined with the new QBus I feel this is blurring the encapsulations and expected semantics of the device-centric functions we have. To me the uninitfn means "the whole object goes away" and is incompatible with "part of its children may stay behind if there are still stray references to them"... we can no longer properly access them then, only devices have canonical paths, so we'd risk piling up garbage at runtime.

Regards, Andreas

-- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3
On Mon, Aug 27, 2012 at 01:42:03PM +0100, Peter Maydell wrote: On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote: Update linux-headers to version present in Linux 3.6-rc3. Header asm-x96_64/kvm_para.h update is needed for the new PV EOI feature. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- linux-headers/asm-s390/kvm.h | 2 +- linux-headers/asm-s390/kvm_para.h | 2 +- linux-headers/asm-x86/kvm.h | 1 + linux-headers/asm-x86/kvm_para.h | 7 +++ linux-headers/linux/kvm.h | 3 +++ 5 files changed, 13 insertions(+), 2 deletions(-) The latest version of update-linux-headers.sh should have caused this update to include asm-generic/kvm_para.h, I think. Did the script not pull that header in, or were you maybe using an old version of the script or forgot to git add the new file? thanks -- PMM I have no idea but adding new files is not the same as updating existing ones. Why don't you add it when you update headers to a version that actually uses it? -- MST
Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3
On Mon, Aug 27, 2012 at 02:48:57PM +0200, Jan Kiszka wrote: On 2012-08-27 14:42, Peter Maydell wrote: On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote: Update linux-headers to version present in Linux 3.6-rc3. Header asm-x96_64/kvm_para.h update is needed for the new PV EOI feature. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- linux-headers/asm-s390/kvm.h | 2 +- linux-headers/asm-s390/kvm_para.h | 2 +- linux-headers/asm-x86/kvm.h | 1 + linux-headers/asm-x86/kvm_para.h | 7 +++ linux-headers/linux/kvm.h | 3 +++ 5 files changed, 13 insertions(+), 2 deletions(-) The latest version of update-linux-headers.sh should have caused this update to include asm-generic/kvm_para.h, I think. Did the script not pull that header in, or were you maybe using an old version of the script or forgot to git add the new file? To be fair, that is hard to guess. We should add some magic to the update script to detect new files and maybe suggest them for addition. Jan But why did you add a header to qemu without adding it to git? That's a cleaner solution and needs no magic scripting. -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH] spice: increase the verbosity of spice section in qemu --help
On 08/26/2012 12:38 AM, Yonit Halperin wrote: On 08/21/2012 03:31 PM, Eric Blake wrote: On 08/21/2012 04:54 AM, Yonit Halperin wrote: Added all spice options to the help string. This can be used by libvirt to determine which spice related features are supported by qemu. For older releases, this is true; but for future versions of qemu, libvirt would much rather learn this information from QMP commands than from scraping -help output. Can we get at all of this information from QMP? No, we don't have qmp commands for any of spice config options. I don't think it should be in the scope of this patch. But since we have already declared that 1.2 is the last release where libvirt will be scraping -help output, and that 1.3 and later will allow libvirt to query all configuration information via QMP commands, I think that you really _do_ need to consider QMP commands in the scope of this patch series, if you expect libvirt to be able to react to this information in qemu 1.3. -- Eric Blake ebl...@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] [PATCHv2 1/4] linux-headers: update to 3.6-rc3
On 2012-08-27 16:53, Michael S. Tsirkin wrote: On Mon, Aug 27, 2012 at 02:48:57PM +0200, Jan Kiszka wrote: On 2012-08-27 14:42, Peter Maydell wrote: On 27 August 2012 13:20, Michael S. Tsirkin m...@redhat.com wrote: Update linux-headers to version present in Linux 3.6-rc3. Header asm-x96_64/kvm_para.h update is needed for the new PV EOI feature. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- linux-headers/asm-s390/kvm.h | 2 +- linux-headers/asm-s390/kvm_para.h | 2 +- linux-headers/asm-x86/kvm.h | 1 + linux-headers/asm-x86/kvm_para.h | 7 +++ linux-headers/linux/kvm.h | 3 +++ 5 files changed, 13 insertions(+), 2 deletions(-) The latest version of update-linux-headers.sh should have caused this update to include asm-generic/kvm_para.h, I think. Did the script not pull that header in, or were you maybe using an old version of the script or forgot to git add the new file? To be fair, that is hard to guess. We should add some magic to the update script to detect new files and maybe suggest them for addition. Jan But why did you add a header to qemu without adding it to git? That's a cleaner solution and needs no magic scripting. Yes, this would have been appropriate. Still, a simple git status -s linux-headers run at the end of the update script can help reminding people in the future. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
[Qemu-devel] [RFC PATCH 01/13] nbd: add more constants
Avoid magic numbers and magic size computations; hide them behind #defines. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- nbd.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/nbd.c b/nbd.c index 0dd60c5..8201b7a 100644 --- a/nbd.c +++ b/nbd.c @@ -57,9 +57,12 @@ /* This is all part of the official NBD API */ +#define NBD_REQUEST_SIZE (4 + 4 + 8 + 8 + 4) #define NBD_REPLY_SIZE (4 + 4 + 8) #define NBD_REQUEST_MAGIC 0x25609513 #define NBD_REPLY_MAGIC 0x67446698 +#define NBD_OPTS_MAGIC 0x49484156454F5054LL +#define NBD_CLIENT_MAGIC 0x420281861253LL #define NBD_SET_SOCK _IO(0xab, 0) #define NBD_SET_BLKSIZE _IO(0xab, 1) @@ -213,7 +216,7 @@ static int nbd_send_negotiate(int csock, off_t size, uint32_t flags) /* Negotiate [ 0 .. 7] passwd (NBDMAGIC) -[ 8 .. 15] magic (0x00420281861253) +[ 8 .. 15] magic (NBD_CLIENT_MAGIC) [16 .. 23] size [24 .. 27] flags [28 .. 151] reserved (0) @@ -224,7 +227,7 @@ static int nbd_send_negotiate(int csock, off_t size, uint32_t flags) TRACE("Beginning negotiation."); memcpy(buf, "NBDMAGIC", 8); -cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL); +cpu_to_be64w((uint64_t*)(buf + 8), NBD_CLIENT_MAGIC); cpu_to_be64w((uint64_t*)(buf + 16), size); cpu_to_be32w((uint32_t*)(buf + 24), flags | NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM | @@ -295,7 +298,7 @@ int nbd_receive_negotiate(int csock, const char *name, uint32_t *flags, uint32_t namesize; TRACE("Checking magic (opts_magic)"); -if (magic != 0x49484156454F5054LL) { +if (magic != NBD_OPTS_MAGIC) { LOG("Bad magic received"); goto fail; } @@ -334,7 +337,7 @@ int nbd_receive_negotiate(int csock, const char *name, uint32_t *flags, } else { TRACE("Checking magic (cli_magic)"); -if (magic != 0x00420281861253LL) { +if (magic != NBD_CLIENT_MAGIC) { LOG("Bad magic received"); goto fail; } @@ -477,7 +480,7 @@ int nbd_client(int fd) ssize_t nbd_send_request(int csock, struct nbd_request *request) { -uint8_t buf[4 + 4 + 8 + 8 + 4]; +uint8_t buf[NBD_REQUEST_SIZE]; ssize_t ret; cpu_to_be32w((uint32_t*)buf, NBD_REQUEST_MAGIC); @@ -504,7 +507,7 @@ ssize_t nbd_send_request(int csock, struct nbd_request *request) static ssize_t nbd_receive_request(int csock, struct nbd_request *request) { -uint8_t buf[4 + 4 + 8 + 8 + 4]; +uint8_t buf[NBD_REQUEST_SIZE]; uint32_t magic; ssize_t ret; @@ -582,7 +585,7 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply *reply) static ssize_t nbd_send_reply(int csock, struct nbd_reply *reply) { -uint8_t buf[4 + 4 + 8]; +uint8_t buf[NBD_REPLY_SIZE]; ssize_t ret; /* Reply -- 1.7.11.2
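To illustrate what the new NBD_REQUEST_SIZE constant stands for, here is a minimal sketch of packing a request header into a buffer of that size. The two #defines are taken from the patch; the field order (magic, type, handle, from, len) is an assumption based on the classic NBD request layout, and `put_be32`, `put_be64`, `struct nbd_request_sketch` and `nbd_pack_request` are hypothetical names, not functions from nbd.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants as introduced by the patch. */
#define NBD_REQUEST_SIZE  (4 + 4 + 8 + 8 + 4)
#define NBD_REQUEST_MAGIC 0x25609513

/* Hypothetical helper: store a 32-bit value in big-endian byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical helper: store a 64-bit value in big-endian byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)v);
}

/* Illustrative request structure; the field order is an assumption. */
struct nbd_request_sketch {
    uint32_t type;
    uint64_t handle;
    uint64_t from;
    uint32_t len;
};

/* Pack a request into buf and return the number of bytes written. */
static size_t nbd_pack_request(uint8_t *buf, size_t buflen,
                               const struct nbd_request_sketch *req)
{
    assert(buflen >= (size_t)NBD_REQUEST_SIZE);
    put_be32(buf, NBD_REQUEST_MAGIC);   /* [ 0 ..  3] magic  */
    put_be32(buf + 4, req->type);       /* [ 4 ..  7] type   */
    put_be64(buf + 8, req->handle);     /* [ 8 .. 15] handle */
    put_be64(buf + 16, req->from);      /* [16 .. 23] from   */
    put_be32(buf + 24, req->len);       /* [24 .. 27] len    */
    return NBD_REQUEST_SIZE;
}
```

With a named size, the buffer declaration and the offsets of the last field can be checked against each other at a glance, which is the point of replacing the repeated `4 + 4 + 8 + 8 + 4` expression.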
[Qemu-devel] [RFC PATCH 06/13] nbd: negotiate with named exports
Allow negotiation to receive the name of the requested export from the client. Passing a NULL export to nbd_client_new will cause the server to send the extended negotiation header. The exp field is then filled during negotiation. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- nbd.c | 155 +++--- 1 file changed, 140 insertions(+), 15 deletions(-) diff --git a/nbd.c b/nbd.c index 1249548..fe7551d 100644 --- a/nbd.c +++ b/nbd.c @@ -234,11 +234,23 @@ int unix_socket_outgoing(const char *path) return unix_connect(path); } -/* Basic flow +/* Basic flow for negotiation Server Client - Negotiate + + or + + Server Client + Negotiate #1 + Option + Negotiate #2 + + + + followed by + + Server Client Request Response Request @@ -246,20 +258,110 @@ int unix_socket_outgoing(const char *path) ... ... Request (type == 2) + */ +static int nbd_receive_options(NBDClient *client) +{ +int csock = client->sock; +char name[256]; +uint32_t tmp, length; +uint64_t magic; +int rc; + +/* Client sends: +[ 0 .. 3] reserved (0) +[ 4 .. 11] NBD_OPTS_MAGIC +[12 .. 15] NBD_OPT_EXPORT_NAME +[16 .. 19] length +[20 .. xx] export name (length bytes) + */ + +rc = -EINVAL; +if (read_sync(csock, &tmp, sizeof(tmp)) != sizeof(tmp)) { +LOG("read failed"); +goto fail; +} +TRACE("Checking reserved"); +if (tmp != 0) { +LOG("Bad reserved received"); +goto fail; +} + +if (read_sync(csock, &magic, sizeof(magic)) != sizeof(magic)) { +LOG("read failed"); +goto fail; +} +TRACE("Checking reserved"); +if (magic != be64_to_cpu(NBD_OPTS_MAGIC)) { +LOG("Bad magic received"); +goto fail; +} + +if (read_sync(csock, &tmp, sizeof(tmp)) != sizeof(tmp)) { +LOG("read failed"); +goto fail; +} +TRACE("Checking option"); +if (tmp != be32_to_cpu(NBD_OPT_EXPORT_NAME)) { +LOG("Bad option received"); +goto fail; +} + +if (read_sync(csock, &length, sizeof(length)) != sizeof(length)) { +LOG("read failed"); +goto fail; +} +TRACE("Checking length"); +length = be32_to_cpu(length); +if (length > 255) { +LOG("Bad length received"); +goto fail; +} +if (read_sync(csock, name, length) != length) { +LOG("read failed"); +goto fail; +} +name[length] = '\0'; + +client->exp = nbd_export_find(name); +if (!client->exp) { +LOG("export not found"); +goto fail; +} + +QTAILQ_INSERT_TAIL(&client->exp->clients, client, next); +TRACE("Option negotiation succeeded."); +rc = 0; +fail: +return rc; +} + static int nbd_send_negotiate(NBDClient *client) { int csock = client->sock; char buf[8 + 8 + 8 + 128]; int rc; +const int myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM | + NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA); -/* Negotiate -[ 0 .. 7] passwd (NBDMAGIC) -[ 8 .. 15] magic (NBD_CLIENT_MAGIC) +/* Negotiation header without options: +[ 0 .. 7] passwd (NBDMAGIC) +[ 8 .. 15] magic (NBD_CLIENT_MAGIC) [16 .. 23] size -[24 .. 27] flags -[28 .. 151] reserved (0) +[24 .. 25] server flags (0) +[24 .. 27] export flags +[28 .. 151] reserved (0) + + Negotiation header with options, part 1: +[ 0 .. 7] passwd (NBDMAGIC) +[ 8 .. 15] magic (NBD_OPTS_MAGIC) +[16 .. 17] server flags (0) + + part 2 (after options are sent): +[18 .. 25] size +[26 .. 27] export flags +[28 .. 151] reserved (0) */ socket_set_block(csock); @@ -267,16 +369,39 @@ static int nbd_send_negotiate(NBDClient *client) TRACE("Beginning negotiation."); memcpy(buf, "NBDMAGIC", 8); -cpu_to_be64w((uint64_t*)(buf + 8), NBD_CLIENT_MAGIC); -cpu_to_be64w((uint64_t*)(buf + 16), client->exp->size); -cpu_to_be32w((uint32_t*)(buf + 24), - client->exp->nbdflags | NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM | - NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA); +if (client->exp) { +assert ((client->exp->nbdflags & ~65535) == 0); +cpu_to_be64w((uint64_t*)(buf + 8), NBD_CLIENT_MAGIC); +cpu_to_be64w((uint64_t*)(buf + 16), client->exp->size); +cpu_to_be16w((uint16_t*)(buf + 26), client->exp->nbdflags | myflags); +} else { +cpu_to_be64w((uint64_t*)(buf + 8), NBD_OPTS_MAGIC); +} memset(buf + 28, 0, 124); -if (write_sync(csock, buf, sizeof(buf)) != sizeof(buf)) { -LOG("write failed"); -goto fail; +if (client->exp)
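The "Client sends" layout documented in nbd_receive_options() can be checked in isolation with a sketch that parses the option message from a memory buffer instead of a socket. NBD_OPTS_MAGIC comes from patch 01/13; the value 1 for NBD_OPT_EXPORT_NAME is an assumption (it is not shown in this patch), and `get_be32`, `get_be64` and `nbd_parse_export_option` are hypothetical names, not part of nbd.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBD_OPTS_MAGIC      0x49484156454F5054ULL  /* from patch 01/13 */
#define NBD_OPT_EXPORT_NAME 1                      /* assumed value */

/* Hypothetical helper: read a big-endian 32-bit value. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: read a big-endian 64-bit value. */
static uint64_t get_be64(const uint8_t *p)
{
    return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/*
 * Parse the option message:
 *   [ 0 ..  3] reserved (0)
 *   [ 4 .. 11] NBD_OPTS_MAGIC
 *   [12 .. 15] NBD_OPT_EXPORT_NAME
 *   [16 .. 19] length
 *   [20 .. xx] export name (length bytes)
 * Returns 0 and fills name (NUL-terminated) on success, -1 on any error.
 */
static int nbd_parse_export_option(const uint8_t *buf, size_t len,
                                   char name[256])
{
    uint32_t namelen;

    if (len < 20) {
        return -1;                      /* header incomplete */
    }
    if (get_be32(buf) != 0) {
        return -1;                      /* bad reserved field */
    }
    if (get_be64(buf + 4) != NBD_OPTS_MAGIC) {
        return -1;                      /* bad magic */
    }
    if (get_be32(buf + 12) != NBD_OPT_EXPORT_NAME) {
        return -1;                      /* unsupported option */
    }
    namelen = get_be32(buf + 16);
    if (namelen > 255 || len - 20 < namelen) {
        return -1;                      /* name too long or truncated */
    }
    memcpy(name, buf + 20, namelen);
    name[namelen] = '\0';
    return 0;
}
```

Unlike the patch, which compares the raw wire value against a byte-swapped constant, this sketch converts each field to host order first; both approaches reject the same inputs, the sketch simply keeps every comparison in host byte order.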
[Qemu-devel] [RFC PATCH 09/13] qmp: add NBD server commands
Adding an NBD server inside QEMU is trivial, since all the logic is in nbd.c and can be shared easily between qemu-nbd and QEMU itself. The main difference is that qemu-nbd serves a single unnamed export, while QEMU serves named exports. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- Makefile.objs | 2 +- blockdev-nbd.c | 93 qapi-schema.json | 69 + qmp-commands.hx | 16 ++ 4 files changed, 179 insertions(+), 1 deletion(-) create mode 100644 blockdev-nbd.c diff --git a/Makefile.objs b/Makefile.objs index 4412757..c42affc 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -59,7 +59,7 @@ endif # suppress *all* target specific code in case of system emulation, i.e. a # single QEMU executable should support all CPUs and machines. -common-obj-y = $(block-obj-y) blockdev.o +common-obj-y = $(block-obj-y) blockdev.o blockdev-nbd.o common-obj-y += net.o net/ common-obj-y += qom/ common-obj-y += readline.o console.o cursor.o diff --git a/blockdev-nbd.c b/blockdev-nbd.c new file mode 100644 index 0000000..5a415be --- /dev/null +++ b/blockdev-nbd.c @@ -0,0 +1,93 @@ +/* + * QEMU host block devices + * + * Copyright (c) 2003-2008 Fabrice Bellard + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#include "blockdev.h" +#include "hw/block-common.h" +#include "monitor.h" +#include "qerror.h" +#include "sysemu.h" +#include "qmp-commands.h" +#include "trace.h" +#include "nbd.h" +#include "qemu_socket.h" + +static int server_fd = -1; + +static void nbd_accept(void *opaque) +{ +struct sockaddr_in addr; +socklen_t addr_len = sizeof(addr); + +int fd = accept(server_fd, (struct sockaddr *)&addr, &addr_len); +if (fd >= 0) { +nbd_client_new(NULL, fd, NULL); +} +} + +static void nbd_server_start(QemuOpts *opts, Error **errp) +{ +if (server_fd != -1) { +/* TODO: error */ +return; +} + +server_fd = inet_listen_opts(opts, 0, errp); +if (server_fd != -1) { +qemu_set_fd_handler2(server_fd, NULL, nbd_accept, NULL, NULL); +} +} + +void qmp_nbd_server_start(IPSocketAddress *addr, Error **errp) +{ +QemuOpts *opts; + +opts = qemu_opts_create(&socket_opts, NULL, 0, NULL); +qemu_opt_set(opts, "host", addr->host); +qemu_opt_set(opts, "port", addr->port); + +addr->ipv4 |= !addr->has_ipv4; +addr->ipv6 |= !addr->has_ipv6; +if (!addr->ipv4 || !addr->ipv6) { +qemu_opt_set_bool(opts, "ipv4", addr->ipv4); +qemu_opt_set_bool(opts, "ipv6", addr->ipv6); +} + +nbd_server_start(opts, errp); +qemu_opts_del(opts); +} + + +void qmp_nbd_server_add(const char *device, bool has_writable, bool writable, +Error **errp) +{ +BlockDriverState *bs; +NBDExport *exp; + +bs = bdrv_find(device); +if (!bs) { +error_set(errp, QERR_DEVICE_NOT_FOUND, device); +return; +} + +if (nbd_export_find(bdrv_get_device_name(bs))) { +/* TODO: error */ +return; +} + +exp = nbd_export_new(bs, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY); +nbd_export_set_name(exp, device); +} + +void qmp_nbd_server_stop(Error **errp) +{ +nbd_export_close_all(); +qemu_set_fd_handler2(server_fd, NULL, NULL, NULL, NULL); +close(server_fd); +server_fd = -1; +} diff --git a/qapi-schema.json b/qapi-schema.json index 3d2b2d1..d792d2c 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -2275,6 +2275,30 @@ 'opts': 'NetClientOptions' } } ## +# @IPSocketAddress +# +# Captures the destination address of an IP socket +# +# @host: host part of the address +# +# @port: port part of the address, or lowest port if @to is present +# +# @to: highest port to try +# +# @ipv4: whether to accept IPv4 addresses, default try both IPv4 and IPv6 +# +# @ipv6: whether to accept IPv6 addresses, default try both IPv4 and IPv6 +# +# Since 1.3 +## +{ 'type': 'IPSocketAddress', + 'data': { +'host': 'str', +'port': 'str', +'*ipv4': 'bool', +'*ipv6': 'bool' } } + +## # @getfd: # # Receive a file descriptor via SCM rights and assign it a name @@ -2454,3 +2478,46 @@ # ## { 'command': 'query-fdsets', 'returns': ['FdsetInfo'] } + +## +# @nbd-server-start: +# +# Start an NBD server listening on the given host and port. +# +# @addr: Interface on which to listen, nothing for all interfaces. +# +# Since: 1.3.0 +# +## +{ 'command': 'nbd-server-start', + 'data': { 'addr': 'IPSocketAddress' } } + +## +# @nbd-server-add: +# +# Export a device to QEMU's embedded NBD server. +# +# @device: Block device to be exported +# +# @writable: Whether clients should be able to write to the device via the +# NBD connection (default false). +# +# Returns: error if the device is already marked for export. +# +# Since: 1.3.0 +# +## +{ 'command': 'nbd-server-add', 'data': {'device': 'str', '*writable': 'bool'} } + +## +#
[Qemu-devel] [RFC PATCH 11/13] hmp: add NBD server commands
At the HMP level there is no nbd_server_add command. nbd_server_start automatically exposes all of the VM's block devices, and an option -w makes them writable. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hmp-commands.hx | 29 + hmp.c | 66 + hmp.h | 2 ++ 3 files changed, 97 insertions(+) diff --git a/hmp-commands.hx b/hmp-commands.hx index f6104b0..cabb886 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1248,6 +1248,35 @@ Remove all matches from the access control list, and set the default policy back to @code{deny}. ETEXI +{ +.name = "nbd_server_start", +.args_type = "writable:-w,uri:s", +.params = "nbd_server_start [-w] host:port", +.help = "serve block devices on the given host and port", +.mhandler.cmd = hmp_nbd_server_start, +}, +STEXI +@item nbd_server_start @var{host}:@var{port} +@findex nbd_server_start +Start an NBD server on the given host and/or port, and serve all of the +virtual machine's block devices that have an inserted media on it. +The @option{-w} option makes the devices writable. +ETEXI + +{ +.name = "nbd_server_stop", +.args_type = "", +.params = "nbd_server_stop", +.help = "stop serving block devices using the NBD protocol", +.mhandler.cmd = hmp_nbd_server_stop, +}, +STEXI +@item nbd_server_stop +@findex nbd_server_stop +Stop the QEMU embedded NBD server. +ETEXI + + #if defined(TARGET_I386) { diff --git a/hmp.c b/hmp.c index a9d5675..10ff50d 100644 --- a/hmp.c +++ b/hmp.c @@ -18,6 +18,7 @@ #include "qemu-option.h" #include "qemu-timer.h" #include "qmp-commands.h" +#include "qemu_socket.h" #include "monitor.h" static void hmp_handle_error(Monitor *mon, Error **errp) @@ -1102,3 +1103,68 @@ void hmp_closefd(Monitor *mon, const QDict *qdict) qmp_closefd(fdname, &errp); hmp_handle_error(mon, &errp); } + +void hmp_nbd_server_start(Monitor *mon, const QDict *qdict) +{ +const char *uri = qdict_get_str(qdict, "uri"); +int writable = qdict_get_try_bool(qdict, "writable", 0); +Error *errp = NULL; +QemuOpts *opts; +BlockDriverState *bs; +IPSocketAddress addr; + +/* First check if the address is available and start the server. */ +opts = qemu_opts_create(&socket_opts, NULL, 0, NULL); +if (inet_parse(opts, uri) != 0) { +error_set(&errp, QERR_SOCKET_CREATE_FAILED); + goto exit; +} + +memset(&addr, 0, sizeof(addr)); +addr.host = (char *) qemu_opt_get(opts, "host"); +addr.port = (char *) qemu_opt_get(opts, "port"); +addr.ipv4 = qemu_opt_get_bool(opts, "ipv4", 0); +addr.ipv6 = qemu_opt_get_bool(opts, "ipv6", 0); +addr.has_ipv4 = addr.has_ipv6 = true; + +if (addr.host == NULL || addr.port == NULL) { +error_set(&errp, QERR_SOCKET_CREATE_FAILED); +goto exit; +} + +qmp_nbd_server_start(&addr, &errp); +if (errp != NULL) { +goto exit; +} + +/* Then try adding all block devices. If one fails, close all and + * exit. + */ +bs = NULL; +while ((bs = bdrv_next(bs))) { +if (!bdrv_is_inserted(bs)) { +continue; +} + +qmp_nbd_server_add(bdrv_get_device_name(bs), + true, !bdrv_is_read_only(bs) && writable, + &errp); + +if (errp != NULL) { +qmp_nbd_server_stop(NULL); +break; +} +} + +exit: +qemu_opts_del(opts); +hmp_handle_error(mon, &errp); +} + +void hmp_nbd_server_stop(Monitor *mon, const QDict *qdict) +{ +Error *errp = NULL; + +qmp_nbd_server_stop(&errp); +hmp_handle_error(mon, &errp); +} diff --git a/hmp.h b/hmp.h index 7dd93bf..89d3960 100644 --- a/hmp.h +++ b/hmp.h @@ -71,5 +71,7 @@ void hmp_netdev_add(Monitor *mon, const QDict *qdict); void hmp_netdev_del(Monitor *mon, const QDict *qdict); void hmp_getfd(Monitor *mon, const QDict *qdict); void hmp_closefd(Monitor *mon, const QDict *qdict); +void hmp_nbd_server_start(Monitor *mon, const QDict *qdict); +void hmp_nbd_server_stop(Monitor *mon, const QDict *qdict); #endif -- 1.7.11.2
Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycle problem
On 2012-08-27 15:19, Anthony Liguori wrote: Liu Ping Fan qemul...@gmail.com writes: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Scene: obja lies in objA, when objA's ref->0, it will be freed, but at that time obja can still be in use. The real example is: typedef struct PCIIDEState { PCIDevice dev; IDEBus bus[2]; --> create in place . } When without big lock protection for mmio-dispatch, we will hold obj's refcnt. So memory_region_init_io() will replace the third para void *opaque with Object *obj. With this patch, we can protect PCIIDEState from disappearing during mmio-dispatch hold the IDEBus->ref. And the ref circle has been broken when calling qdev_delete_subtree(). Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com I think this is solving the wrong problem. There are many, many dependencies a device may have on other devices. Memory allocation isn't the only one. The problem is that we want to make sure that a device doesn't go away while an MMIO dispatch is happening. This is easy to solve without touching reference counting. The device will hold a lock while the MMIO is being dispatched. The delete path simply needs to acquire that same lock. This will ensure that a delete operation cannot finish while MMIO is still in flight. That's a bit too simple. Quite a few MMIO/PIO fast-paths will work without any device-specific locking, e.g. just to read a simple register value. So we will need reference counting (for devices using private locks), but on the front-line object: the memory region. That region will block its owner from disappearing by waiting on dispatch when someone tries to unregister it. Also note that holding a lock is easily said but will be more tricky in practice. Quite a significant share of our code will continue to run under BQL, even for devices with their own locks. Init/cleanup functions will likely fall into this category, simply because the surrounding logic is hard to convert into fine-grained locking and is also not performance critical. At the same time, we can't take BQL -> device-lock as we have to support device-lock -> BQL ordering for (slow-path) calls into BQL-protected areas while holding a per-device lock (e.g. device mapping changes). Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
[Qemu-devel] [RFC PATCH 07/13] nbd: do not close BlockDriverState in nbd_export_close
This is not desirable when embedding the NBD server inside QEMU. Move the bdrv_close to qemu-nbd. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- nbd.c | 1 - qemu-nbd.c | 1 + 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/nbd.c b/nbd.c index fe7551d..1f65b1f 100644 --- a/nbd.c +++ b/nbd.c @@ -893,7 +893,6 @@ void nbd_export_close(NBDExport *exp) g_free(exp->name); } -bdrv_close(exp->bs); g_free(exp); } diff --git a/qemu-nbd.c b/qemu-nbd.c index 1c1cf6a..23392e0 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -586,6 +586,7 @@ int main(int argc, char **argv) } while (!sigterm_reported && (persistent || !nbd_started || nb_fds > 0)); nbd_export_close(exp); +bdrv_close(bs); if (sockpath) { unlink(sockpath); } -- 1.7.11.2
Re: [Qemu-devel] [PATCH v2 0/6] Running Microport UNIX (ca 1987)
malc av1...@comtv.ru writes: On Mon, 27 Aug 2012, Anthony Liguori wrote: malc av1...@comtv.ru writes: [..snip..] Number 2 was, and should stay, as the emulation wasn't correct before it, don't really care about the rest. Okay, please revert the rest then. Done. Thank you! Regards, Anthony Liguori [..snip..] -- mailto:av1...@comtv.ru
Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycle problem
Jan Kiszka jan.kis...@siemens.com writes: On 2012-08-27 15:19, Anthony Liguori wrote: Liu Ping Fan qemul...@gmail.com writes: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Scene: obja lies in objA, when objA's ref->0, it will be freed, but at that time obja can still be in use. The real example is: typedef struct PCIIDEState { PCIDevice dev; IDEBus bus[2]; --> create in place . } When without big lock protection for mmio-dispatch, we will hold obj's refcnt. So memory_region_init_io() will replace the third para void *opaque with Object *obj. With this patch, we can protect PCIIDEState from disappearing during mmio-dispatch hold the IDEBus->ref. And the ref circle has been broken when calling qdev_delete_subtree(). Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com I think this is solving the wrong problem. There are many, many dependencies a device may have on other devices. Memory allocation isn't the only one. The problem is that we want to make sure that a device doesn't go away while an MMIO dispatch is happening. This is easy to solve without touching reference counting. The device will hold a lock while the MMIO is being dispatched. The delete path simply needs to acquire that same lock. This will ensure that a delete operation cannot finish while MMIO is still in flight. That's a bit too simple. Quite a few MMIO/PIO fast-paths will work without any device-specific locking, e.g. just to read a simple register value. So we will need reference counting But then you'll need to acquire a lock to take the reference/remove the reference which sort of defeats the purpose of trying to fast path. (for devices using private locks), but on the front-line object: the memory region. That region will block its owner from disappearing by waiting on dispatch when someone tries to unregister it. Also note that holding a lock is easily said but will be more tricky in practice. Quite a significant share of our code will continue to run under BQL, even for devices with their own locks. Init/cleanup functions will likely fall into this category, I'm not sure I'm convinced of this--but it's hard to tell until we really start converting. BTW, I'm pretty sure we have to tackle main loop functions first before we try to convert any devices off the BQL. Regards, Anthony Liguori simply because the surrounding logic is hard to convert into fine-grained locking and is also not performance critical. At the same time, we can't take BQL -> device-lock as we have to support device-lock -> BQL ordering for (slow-path) calls into BQL-protected areas while holding a per-device lock (e.g. device mapping changes). Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH 4/9] object: remove object_finalize
On 26.08.2012 17:51, Anthony Liguori wrote: Callers should just use object_unref Signed-off-by: Anthony Liguori aligu...@us.ibm.com --- hw/qdev.c | 4 include/qemu/object.h | 9 - qom/object.c | 2 +- 3 files changed, 1 insertions(+), 14 deletions(-) diff --git a/hw/qdev.c b/hw/qdev.c index 6b61daa..fdee91f 100644 --- a/hw/qdev.c +++ b/hw/qdev.c @@ -678,13 +678,9 @@ static void device_initfn(Object *obj) static void device_finalize(Object *obj) { DeviceState *dev = DEVICE(obj); -BusState *bus; DeviceClass *dc = DEVICE_GET_CLASS(dev); if (dev->state == DEV_STATE_INITIALIZED) { -while (dev->num_child_bus) { -bus = QLIST_FIRST(&dev->child_bus); -} if (qdev_get_vmsd(dev)) { vmstate_unregister(dev, qdev_get_vmsd(dev), dev); } This seems to answer my remark on 3/9, should've been squashed into that one. diff --git a/include/qemu/object.h b/include/qemu/object.h index 487adcd..8bc9935 100644 --- a/include/qemu/object.h +++ b/include/qemu/object.h @@ -490,15 +490,6 @@ void object_initialize_with_type(void *data, Type type); void object_initialize(void *obj, const char *typename); /** - * object_finalize: - * @obj: The object to finalize. - * - * This function destroys and object without freeing the memory associated with - * it. - */ -void object_finalize(void *obj); - -/** * object_dynamic_cast: * @obj: The object to cast. * @typename: The @typename to cast to. diff --git a/qom/object.c b/qom/object.c index 44135c3..1144f79 100644 --- a/qom/object.c +++ b/qom/object.c @@ -375,7 +375,7 @@ static void object_deinit(Object *obj, TypeImpl *type) } } -void object_finalize(void *data) +static void object_finalize(void *data) { Object *obj = data; TypeImpl *ti = obj->class->type; This is what I was referring to with breaking the encapsulation on 3/9: When we have a PHB with embedded PCIDevice on its PCIBus, as demonstrated with i440fx and prep_pci, then when doing object_delete() on the whole thing I expect the main object's finalizer to call object_finalize() on its embedded children, forcing their uninit or an assert in case of a programming error. Not just an unref that might or might not finalize it. If however finalize is called only at refcount 0 then who will unref the self-created children? Finalize would never be called due to pending references by its children... Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycle problem
On 2012-08-27 17:14, Anthony Liguori wrote: Jan Kiszka jan.kis...@siemens.com writes: On 2012-08-27 15:19, Anthony Liguori wrote: Liu Ping Fan qemul...@gmail.com writes: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Scene: obja lies in objA, when objA's ref->0, it will be freed, but at that time obja can still be in use. The real example is: typedef struct PCIIDEState { PCIDevice dev; IDEBus bus[2]; --> create in place . } When without big lock protection for mmio-dispatch, we will hold obj's refcnt. So memory_region_init_io() will replace the third para void *opaque with Object *obj. With this patch, we can protect PCIIDEState from disappearing during mmio-dispatch hold the IDEBus->ref. And the ref circle has been broken when calling qdev_delete_subtree(). Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com I think this is solving the wrong problem. There are many, many dependencies a device may have on other devices. Memory allocation isn't the only one. The problem is that we want to make sure that a device doesn't go away while an MMIO dispatch is happening. This is easy to solve without touching reference counting. The device will hold a lock while the MMIO is being dispatched. The delete path simply needs to acquire that same lock. This will ensure that a delete operation cannot finish while MMIO is still in flight. That's a bit too simple. Quite a few MMIO/PIO fast-paths will work without any device-specific locking, e.g. just to read a simple register value. So we will need reference counting But then you'll need to acquire a lock to take the reference/remove the reference which sort of defeats the purpose of trying to fast path. Atomic ops? RCU? This problem won't be solved for the first time. (for devices using private locks), but on the front-line object: the memory region. That region will block its owner from disappearing by waiting on dispatch when someone tries to unregister it. Also note that holding a lock is easily said but will be more tricky in practice. Quite a significant share of our code will continue to run under BQL, even for devices with their own locks. Init/cleanup functions will likely fall into this category, I'm not sure I'm convinced of this--but it's hard to tell until we really start converting. BTW, I'm pretty sure we have to tackle main loop functions first before we try to convert any devices off the BQL. I'm sure we should leave existing code alone wherever possible, focusing on providing alternative versions for those paths that matter. Example: Most timers are fine under BQL. But some sensitive devices (RTC or HPET as clock source) will want their own timers. So the approach is to instantiate a separate, also prioritizeable instance of the timer subsystem for them and be done. We won't convert QEMU in a day, but we surely want results before the last corner is refactored (which would take years, at best). Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
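The "Atomic ops?" reply can be made concrete with a small sketch of lock-free reference counting using C11 atomics. This is only an illustration of the general technique under discussion, not QEMU's memory API: `RegionSketch`, `region_try_ref` and `region_unref` are hypothetical names, and a real implementation would still have to guarantee that the structure itself stays valid while a dispatcher attempts to take a reference (for example via RCU, as also suggested in the thread).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a memory region with a reference count. */
typedef struct RegionSketch {
    atomic_int refs;    /* starts at 1: the reference held by the owner */
} RegionSketch;

/*
 * Dispatch path: take a reference without any lock.
 * Fails once the count has dropped to zero, i.e. the region is going away.
 */
static bool region_try_ref(RegionSketch *r)
{
    int old = atomic_load(&r->refs);

    while (old > 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&r->refs, &old, old + 1)) {
            return true;
        }
    }
    return false;
}

/*
 * Drop a reference. Returns true for the caller that released the last
 * one and is therefore responsible for tearing the region down.
 */
static bool region_unref(RegionSketch *r)
{
    int old = atomic_fetch_sub(&r->refs, 1);

    assert(old > 0);
    return old == 1;
}
```

In this scheme an unregister request drops the owner's reference, and the region is finalized by whichever party, owner or in-flight dispatcher, releases the last one, so the fast path never takes a device lock just to pin the region.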