Re: [Qemu-devel] [PATCH] pci: add standard bridge device
At 09/07/2011 07:52 PM, Michael S. Tsirkin Write:
On Wed, Sep 07, 2011 at 12:39:09PM +0800, Wen Congyang wrote:
At 09/06/2011 03:45 PM, Avi Kivity Write:
On 09/06/2011 06:06 AM, Wen Congyang wrote:

Use the uio driver - http://docs.blackfin.uclinux.org/kernel/generated/uio-howto/. You just mmap() the BAR from userspace and play with it.

When I try to bind ivshmem to uio_pci_generic, I get the following message:

uio_pci_generic :01:01.0: No IRQ assigned to device: no support for interrupts?

No idea what this means.

PCI 3.0 6.2.4
For x86 based PCs, the values in this register correspond to IRQ numbers (0-15) of the standard dual 8259 configuration. The value 255 is defined as meaning unknown or no connection to the interrupt controller. Values between 15 and 254 are reserved.

The register in question is Interrupt Line. I read the config space of this device, and the interrupt line is 0, which would mean that it uses IRQ0.

The following is uio_pci_generic's code:

static int __devinit probe(struct pci_dev *pdev,
                           const struct pci_device_id *id)
{
    struct uio_pci_generic_dev *gdev;
    int err;

    err = pci_enable_device(pdev);
    if (err) {
        dev_err(&pdev->dev, "%s: pci_enable_device failed: %d\n",
                __func__, err);
        return err;
    }

    if (!pdev->irq) {
        dev_warn(&pdev->dev, "No IRQ assigned to device: "
                 "no support for interrupts?\n");
        pci_disable_device(pdev);
        return -ENODEV;
    }
    ...
}

This function is called when we write 'domain:bus:slot.function' to /sys/bus/pci/drivers/uio_pci_generic/bind. pdev->irq is 0, which means the device uses IRQ0, but we refuse it. I do not know why.

To Michael S. Tsirkin: this code was written by you. Do you know why you check whether pdev->irq is 0?

Thanks
Wen Congyang

Well I see this in linux:

/*
 * Read interrupt line and base address registers.
 * The architecture-dependent code can tweak these, of course.
 */
static void pci_read_irq(struct pci_dev *dev)
{
    unsigned char irq;

    pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &irq);
    dev->pin = irq;
    if (irq)
        pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
    dev->irq = irq;
}

Thus a device without an interrupt pin will get irq set to 0, and this seems the right way to detect such devices. I don't think PCI devices really use IRQ0 in practice, it's probably used for PC things. More likely the system is misconfigured. Try lspci -vv to see what went wrong.

Yes, the PCI device should not use IRQ0. I debugged qemu's code and found that the PCI_INTERRUPT_LINE register is not set by qemu:

=
Hardware watchpoint 6: ((uint8_t *) 0x164e410)[0x3c]

Old value = 0 '\000'
New value = 10 '\n'
pci_default_write_config (d=0x1653ed0, addr=60, val=10, l=1) at /home/wency/source/qemu/hw/pci.c:1115
1115        d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-gssapi-2.1.23-8.el6.x86_64 cyrus-sasl-md5-2.1.23-8.el6.x86_64 cyrus-sasl-plain-2.1.23-8.el6.x86_64 db4-4.7.25-16.el6.x86_64
(gdb) bt
#0  pci_default_write_config (d=0x1653ed0, addr=60, val=10, l=1) at /home/wency/source/qemu/hw/pci.c:1115
#1  0x004d5827 in pci_host_config_write_common (pci_dev=0x1653ed0, addr=60, limit=256, val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:54
#2  0x004d5939 in pci_data_write (s=0x15f95a0, addr=2147502140, val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:75
#3  0x004d5b19 in pci_host_data_write (handler=0x15f9570, addr=3324, val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:125
#4  0x0063ee06 in ioport_simple_writeb (opaque=0x15f9570, addr=3324, value=10) at /home/wency/source/qemu/rwhandler.c:48
#5  0x00470db9 in ioport_write (index=0, address=3324, data=10) at ioport.c:81
#6  0x004717bc in cpu_outb (addr=3324, val=10 '\n') at ioport.c:273
#7  0x005ef25d in kvm_handle_io (port=3324, data=0x77ff8000, direction=1, size=1, count=1) at /home/wency/source/qemu/kvm-all.c:834
#8  0x005ef7e6 in kvm_cpu_exec (env=0x13da0d0) at /home/wency/source/qemu/kvm-all.c:976
#9  0x005c1a7b in qemu_kvm_cpu_thread_fn (arg=0x13da0d0) at /home/wency/source/qemu/cpus.c:661
#10 0x0032864077e1 in start_thread () from /lib64/libpthread.so.0
#11 0x0032858e68ed in clone () from /lib64/libc.so.6
=

If I put ivshmem on bus 0, the PCI_INTERRUPT_LINE register does get set. So I guess this register is set by the BIOS. I use the newest seabios, and the PCI_INTERRUPT_LINE register is not set if the device is not on bus 0.

# lspci -vv
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
        Subsystem: Red Hat, Inc Qemu virtual machine
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
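
For readers following the IRQ discussion in this message, here is a minimal sketch of the pci_read_irq() logic quoted above, applied to a plain byte array instead of a struct pci_dev. The function name decode_irq() is hypothetical and only for illustration; the offsets 0x3c and 0x3d are the standard Interrupt Line and Interrupt Pin registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_INTERRUPT_LINE 0x3c  /* standard config-space offset */
#define PCI_INTERRUPT_PIN  0x3d  /* standard config-space offset */

/*
 * Hypothetical helper, for illustration only: mirrors pci_read_irq()
 * on a raw 256-byte config-space image.  Returns the value the kernel
 * would store in dev->irq: 0 when the device has no interrupt pin,
 * otherwise whatever firmware left in the Interrupt Line register.
 */
static uint8_t decode_irq(const uint8_t *config)
{
    uint8_t irq;

    assert(config != NULL);
    irq = config[PCI_INTERRUPT_PIN];
    if (irq) {
        irq = config[PCI_INTERRUPT_LINE];
    }
    return irq;
}
```

Note that a device with a pin but an unprogrammed Interrupt Line also yields 0, which matches the situation described here: the probe function then cannot tell it apart from a device with no interrupt at all.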
Re: [Qemu-devel] [PATCH v3 02/27] ide: Use a table to declare which drive kinds accept each command
Kevin Wolf kw...@redhat.com writes:

On 06.09.2011 18:58, Markus Armbruster wrote:
No functional change.

It would be nice to have handler functions in the table, like commit e1a064f9 did for ATAPI. Left for another day.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 hw/ide/core.c |  105 +++-
 1 files changed, 80 insertions(+), 25 deletions(-)

+    [IBM_SENSE_CONDITION] = CFA_OK,
+    [CFA_WEAR_LEVEL]      = CFA_OK,
+    [WIN_READ_NATIVE_MAX] = ALL_OK,
+};
+
+static bool ide_cmd_permitted(IDEState *s, uint32_t cmd)
+{
+    return cmd <= ARRAY_SIZE(ide_cmd_table)

Shouldn't it be < instead of <= ?

I plead temporary insanity. Want a v4, or want to fix it up yourself?
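
The bounds check questioned in this review can be sketched as follows. This is an illustration under stated assumptions, not the patch's code: the KIND_* flags, the contents and size of cmd_table, and cmd_permitted() are all hypothetical stand-ins for the real ide_cmd_table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical drive-kind bits, for illustration only. */
enum { KIND_HD = 1u << 0, KIND_CD = 1u << 1, KIND_CFATA = 1u << 2 };

/*
 * Hypothetical table: the index is the command code, the value is the
 * set of drive kinds that accept it.  Unlisted entries are 0, meaning
 * the command is rejected for every kind.
 */
static const uint8_t cmd_table[0xf0] = {
    [0x20] = KIND_HD | KIND_CFATA,
    [0xa0] = KIND_CD,
};

/*
 * Valid indexes are 0 .. ARRAY_SIZE - 1, so the test must use '<'.
 * With '<=', cmd == ARRAY_SIZE would read one element past the end.
 */
static bool cmd_permitted(unsigned kind, uint32_t cmd)
{
    return cmd < ARRAY_SIZE(cmd_table) && (cmd_table[cmd] & kind) != 0;
}
```

Because the range test comes first and && short-circuits, an out-of-range command never touches the table.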
Re: [Qemu-devel] [PATCH v3 14/27] block: Rename bdrv_set_locked() to bdrv_lock_medium()
Kevin Wolf kw...@redhat.com writes:

On 06.09.2011 18:58, Markus Armbruster wrote:
While there, make the locked parameter bool.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 block.c           |    8
 block.h           |    2 +-
 block/raw-posix.c |    8
 block/raw.c       |    6 +++---
 block_int.h       |    2 +-
 hw/ide/atapi.c    |    2 +-
 hw/scsi-disk.c    |    2 +-
 trace-events      |    1 +
 8 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index 1e4be73..7225b15 100644
--- a/block.c
+++ b/block.c
@@ -3072,14 +3072,14 @@ void bdrv_eject(BlockDriverState *bs, int eject_flag)
  * Lock or unlock the media (if it is locked, the user won't be able
  * to eject it manually).
  */
-void bdrv_set_locked(BlockDriverState *bs, int locked)
+void bdrv_lock_medium(BlockDriverState *bs, bool locked)
 {
     BlockDriver *drv = bs->drv;
 
-    trace_bdrv_set_locked(bs, locked);
+    trace_bdrv_lock_medium(bs, locked);
 
-    if (drv && drv->bdrv_set_locked) {
-        drv->bdrv_set_locked(bs, locked);
+    if (drv && drv->bdrv_lock_medium) {
+        drv->bdrv_lock_medium(bs, locked);
     }
 }
 
diff --git a/block.h b/block.h
index 396ca0e..4691090 100644
--- a/block.h
+++ b/block.h
@@ -212,7 +212,7 @@ int bdrv_is_sg(BlockDriverState *bs);
 int bdrv_enable_write_cache(BlockDriverState *bs);
 int bdrv_is_inserted(BlockDriverState *bs);
 int bdrv_media_changed(BlockDriverState *bs);
-void bdrv_set_locked(BlockDriverState *bs, int locked);
+void bdrv_lock_medium(BlockDriverState *bs, bool locked);
 void bdrv_eject(BlockDriverState *bs, int eject_flag);
 void bdrv_get_format(BlockDriverState *bs, char *buf, int buf_size);
 BlockDriverState *bdrv_find(const char *name);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index bcf50b2..a624f56 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1362,7 +1362,7 @@ static void cdrom_eject(BlockDriverState *bs, int eject_flag)
     }
 }
 
-static void cdrom_set_locked(BlockDriverState *bs, int locked)
+static void cdrom_lock_medium(BlockDriverState *bs, bool locked)
 {
     BDRVRawState *s = bs->opaque;
 
@@ -1400,7 +1400,7 @@ static BlockDriver bdrv_host_cdrom = {
     /* removable device support */
     .bdrv_is_inserted = cdrom_is_inserted,
     .bdrv_eject = cdrom_eject,
-    .bdrv_set_locked= cdrom_set_locked,
+    .bdrv_lock_medium = cdrom_lock_medium,
 
     /* generic scsi device */
     .bdrv_ioctl = hdev_ioctl,
@@ -1481,7 +1481,7 @@ static void cdrom_eject(BlockDriverState *bs, int eject_flag)
     cdrom_reopen(bs);
 }
 
-static void cdrom_set_locked(BlockDriverState *bs, int locked)
+static void cdrom_lock_medium(BlockDriverState *bs, bool locked)
 {
     BDRVRawState *s = bs->opaque;
 
@@ -1521,7 +1521,7 @@ static BlockDriver bdrv_host_cdrom = {
     /* removable device support */
     .bdrv_is_inserted = cdrom_is_inserted,
     .bdrv_eject = cdrom_eject,
-    .bdrv_set_locked= cdrom_set_locked,
+    .bdrv_lock_medium = cdrom_lock_medium,
 };
 #endif /* __FreeBSD__ */
 
diff --git a/block/raw.c b/block/raw.c
index f197479..63cf2d3 100644
--- a/block/raw.c
+++ b/block/raw.c
@@ -85,9 +85,9 @@ static void raw_eject(BlockDriverState *bs, int eject_flag)
     bdrv_eject(bs->file, eject_flag);
 }
 
-static void raw_set_locked(BlockDriverState *bs, int locked)
+static void raw_lock_medium(BlockDriverState *bs, bool locked)
 {
-    bdrv_set_locked(bs->file, locked);
+    bdrv_lock_medium(bs->file, locked);
 }
 
 static int raw_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
@@ -144,7 +144,7 @@ static BlockDriver bdrv_raw = {
     .bdrv_is_inserted = raw_is_inserted,
     .bdrv_media_changed = raw_media_changed,
     .bdrv_eject = raw_eject,
-    .bdrv_set_locked= raw_set_locked,
+    .bdrv_lock_medium = raw_lock_medium,
 
     .bdrv_ioctl = raw_ioctl,
     .bdrv_aio_ioctl = raw_aio_ioctl,
diff --git a/block_int.h b/block_int.h
index 4f7ff3b..f42af2c 100644
--- a/block_int.h
+++ b/block_int.h
@@ -120,7 +120,7 @@ struct BlockDriver {
     int (*bdrv_is_inserted)(BlockDriverState *bs);
     int (*bdrv_media_changed)(BlockDriverState *bs);
     void (*bdrv_eject)(BlockDriverState *bs, int eject_flag);
-    void (*bdrv_set_locked)(BlockDriverState *bs, int locked);
+    void (*bdrv_lock_medium)(BlockDriverState *bs, bool locked);
 
     /* to control generic scsi devices */
     int (*bdrv_ioctl)(BlockDriverState *bs, unsigned long int req, void *buf);
diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index afb27c6..06778f3 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -833,7 +833,7 @@ static void cmd_test_unit_ready(IDEState *s, uint8_t *buf)
 static void cmd_prevent_allow_medium_removal(IDEState *s, uint8_t* buf)
 {
     s->tray_locked = buf[4] & 1;
-    bdrv_set_locked(s->bs, buf[4] & 1);
+
Re: [Qemu-devel] [PATCH] linux-user: Implement new ARM 64 bit cmpxchg kernel helper
On 31 August 2011 17:24, Dr. David Alan Gilbert david.gilb...@linaro.org wrote:
linux-user: Implement new ARM 64 bit cmpxchg kernel helper

Linux 3.1 will have a new kernel-page helper for ARM implementing 64 bit cmpxchg. Implement this helper in QEMU linux-user mode:

 * Provide kernel helper emulation for 64bit cmpxchg
 * Allow guest to object to guest offset to ensure it can map a page
 * Populate page with kernel helper version

Signed-off-by: Dr. David Alan Gilbert david.gilb...@linaro.org

Reviewed-by: Peter Maydell peter.mayd...@linaro.org
Re: [Qemu-devel] [PATCH] pci: add standard bridge device
At 09/08/2011 02:15 PM, Wen Congyang Write:
At 09/07/2011 07:52 PM, Michael S. Tsirkin Write:
On Wed, Sep 07, 2011 at 12:39:09PM +0800, Wen Congyang wrote:
At 09/06/2011 03:45 PM, Avi Kivity Write:
On 09/06/2011 06:06 AM, Wen Congyang wrote:

Use the uio driver - http://docs.blackfin.uclinux.org/kernel/generated/uio-howto/. You just mmap() the BAR from userspace and play with it.

When I try to bind ivshmem to uio_pci_generic, I get the following message:

uio_pci_generic :01:01.0: No IRQ assigned to device: no support for interrupts?

No idea what this means.

PCI 3.0 6.2.4
For x86 based PCs, the values in this register correspond to IRQ numbers (0-15) of the standard dual 8259 configuration. The value 255 is defined as meaning unknown or no connection to the interrupt controller. Values between 15 and 254 are reserved.

The register in question is Interrupt Line. I read the config space of this device, and the interrupt line is 0, which would mean that it uses IRQ0.

The following is uio_pci_generic's code:

static int __devinit probe(struct pci_dev *pdev,
                           const struct pci_device_id *id)
{
    struct uio_pci_generic_dev *gdev;
    int err;

    err = pci_enable_device(pdev);
    if (err) {
        dev_err(&pdev->dev, "%s: pci_enable_device failed: %d\n",
                __func__, err);
        return err;
    }

    if (!pdev->irq) {
        dev_warn(&pdev->dev, "No IRQ assigned to device: "
                 "no support for interrupts?\n");
        pci_disable_device(pdev);
        return -ENODEV;
    }
    ...
}

This function is called when we write 'domain:bus:slot.function' to /sys/bus/pci/drivers/uio_pci_generic/bind. pdev->irq is 0, which means the device uses IRQ0, but we refuse it. I do not know why.

To Michael S. Tsirkin: this code was written by you. Do you know why you check whether pdev->irq is 0?

Thanks
Wen Congyang

Well I see this in linux:

/*
 * Read interrupt line and base address registers.
 * The architecture-dependent code can tweak these, of course.
 */
static void pci_read_irq(struct pci_dev *dev)
{
    unsigned char irq;

    pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &irq);
    dev->pin = irq;
    if (irq)
        pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
    dev->irq = irq;
}

Thus a device without an interrupt pin will get irq set to 0, and this seems the right way to detect such devices. I don't think PCI devices really use IRQ0 in practice, it's probably used for PC things. More likely the system is misconfigured. Try lspci -vv to see what went wrong.

Yes, the PCI device should not use IRQ0. I debugged qemu's code and found that the PCI_INTERRUPT_LINE register is not set by qemu:

=
Hardware watchpoint 6: ((uint8_t *) 0x164e410)[0x3c]

Old value = 0 '\000'
New value = 10 '\n'
pci_default_write_config (d=0x1653ed0, addr=60, val=10, l=1) at /home/wency/source/qemu/hw/pci.c:1115
1115        d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-gssapi-2.1.23-8.el6.x86_64 cyrus-sasl-md5-2.1.23-8.el6.x86_64 cyrus-sasl-plain-2.1.23-8.el6.x86_64 db4-4.7.25-16.el6.x86_64
(gdb) bt
#0  pci_default_write_config (d=0x1653ed0, addr=60, val=10, l=1) at /home/wency/source/qemu/hw/pci.c:1115
#1  0x004d5827 in pci_host_config_write_common (pci_dev=0x1653ed0, addr=60, limit=256, val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:54
#2  0x004d5939 in pci_data_write (s=0x15f95a0, addr=2147502140, val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:75
#3  0x004d5b19 in pci_host_data_write (handler=0x15f9570, addr=3324, val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:125
#4  0x0063ee06 in ioport_simple_writeb (opaque=0x15f9570, addr=3324, value=10) at /home/wency/source/qemu/rwhandler.c:48
#5  0x00470db9 in ioport_write (index=0, address=3324, data=10) at ioport.c:81
#6  0x004717bc in cpu_outb (addr=3324, val=10 '\n') at ioport.c:273
#7  0x005ef25d in kvm_handle_io (port=3324, data=0x77ff8000, direction=1, size=1, count=1) at /home/wency/source/qemu/kvm-all.c:834
#8  0x005ef7e6 in kvm_cpu_exec (env=0x13da0d0) at /home/wency/source/qemu/kvm-all.c:976
#9  0x005c1a7b in qemu_kvm_cpu_thread_fn (arg=0x13da0d0) at /home/wency/source/qemu/cpus.c:661
#10 0x0032864077e1 in start_thread () from /lib64/libpthread.so.0
#11 0x0032858e68ed in clone () from /lib64/libc.so.6
=

If I put ivshmem on bus 0, the PCI_INTERRUPT_LINE register does get set. So I guess this register is set by the BIOS. I use the newest seabios, and the PCI_INTERRUPT_LINE register is not set if the device is not on bus 0.

Here is seabios's code:

==
static void pci_bios_init_device(struct pci_device *pci)
{
    u16 bdf = pci->bdf;
    int pin, pic_irq;
Re: [Qemu-devel] [PATCH] ahci: add port I/O index-data pair
(Sorry for the slow response, was on vacation)

On Thu, Sep 1, 2011 at 7:58 AM, Alexander Graf ag...@suse.de wrote:
On 08/30/2011 05:07 AM, Daniel Verkamp wrote:
On Sun, Aug 28, 2011 at 11:48 AM, Alexander Graf ag...@suse.de wrote:
On 27.08.2011, at 04:12, Daniel Verkamp wrote:

Implement an I/O space index-data register pair as defined by the AHCI spec, including the corresponding SATA PCI capability and BAR.

This allows real-mode code to access the AHCI registers; real-mode code cannot address the memory-mapped register space because it is beyond the first megabyte.

Very nice patch! I'll check and compare with a real ICH-9 when I get back to .de, but I'd assume you also did that already ;). Once I checked that the IO region is set up similarly, I'll give you my ack.

Please do double check against real hardware if you get the chance - I don't have a real ICH-9 handy to test against. This is all written based on my reading of the spec and testing with an internal DOS developer tool from work.

I am mainly curious how the real thing handles writes to the index register that aren't divisible by 4 or are beyond the end of the register set (and how big that really is on ICH-9). Judging by the bits marked RO in the spec, I would guess writing 0x13 to the index and then reading it back should give 0x10, but I haven't tested it on real hw.

Phew. So I finally got at least an ICH-9 system booting. This is what lspci -vvv tells me:

00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02) (prog-if 01 [AHCI 1.0])
        Subsystem: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin B routed to IRQ 26
        Region 0: I/O ports at d000 [size=8]
        Region 1: I/O ports at cc00 [size=4]
        Region 2: I/O ports at c880 [size=8]
        Region 3: I/O ports at c800 [size=4]
        Region 4: I/O ports at c480 [size=32]
        Region 5: Memory at ffaf9000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit-
                Address: fee0f00c  Data: 4169
        Capabilities: [70] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=0004
        Capabilities: [b0] Vendor Specific Information: Len=06 <?>
        Kernel driver in use: ahci

So BAR4 is where the IDP info should be. Offset is 4 into that IO space and the space is 32 bytes long. Do you have the ICH-9 implementation spec? I can try to dig something up if you don't have it around.

I'm not sure I understand what you mean, but I think everything is in the right spot - compare with the real ICH-9 dump you provide (relevant parts quoted below; full lspci dump from QEMU device at end):

Real: Region 4: I/O ports at c480 [size=32]
QEMU: Region 4: I/O ports at c040 [size=32]
(I/O address is different, but that is ok)

Real: Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=0004
QEMU: Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=0004
(identical)

Please send me a small test program I can run on the machine to find out what happens for unaligned I/O accesses. That would be very helpful!

I will try to put something together in the next few days and send it along; is a DOS test app suitable?

Thanks,
-- Daniel Verkamp

Here is the lspci -vvv -nn -x dump of the QEMU-emulated AHCI controller with the patch applied:

00:04.0 SATA controller [0106]: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller [8086:2922] (rev 02) (prog-if 01 [AHCI 1.0])
        Subsystem: Red Hat, Inc Device [1af4:1100]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 24
        Region 4: I/O ports at c040 [size=32]
        Region 5: Memory at febf1000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=0004
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: fee0100c  Data: 4149
        Kernel driver in use: ahci
00: 86 80 22 29 07 04 10 00 02 01 06 01 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 41 c0 00 00 00 10 bf fe 00 00 00 00 f4 1a 00 11
30: 00 00 00 00 a8 00 00 00 00 00 00 00 0b 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 00 81 00 0c 10 e0 fe 00 00 00 00 49 41 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00
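
The index-data access scheme discussed in this thread can be sketched roughly as below. Everything here is an assumption for illustration: the struct and function names are hypothetical, REGS_SIZE is an arbitrary stand-in for the register set size, and the two behaviours under question (index bits 1:0 reading back as zero, out-of-range accesses being ignored) are simply taken from the guess stated above, which has not been confirmed on real hardware.

```c
#include <stdint.h>
#include <string.h>

#define REGS_SIZE 0x100u  /* hypothetical size of the register set */

/* Hypothetical model of an index-data pair in front of a register file. */
struct idp {
    uint32_t index;            /* value latched by the index port */
    uint8_t  regs[REGS_SIZE];  /* stand-in for the memory-mapped registers */
};

/* Assumed: the low two index bits are read-only zero (dword aligned). */
static void idp_write_index(struct idp *s, uint32_t val)
{
    s->index = val & ~3u;
}

/* Assumed: reads beyond the end of the register set return 0. */
static uint32_t idp_read_data(const struct idp *s)
{
    uint32_t v = 0;

    if (s->index <= REGS_SIZE - 4) {
        memcpy(&v, &s->regs[s->index], sizeof(v));
    }
    return v;
}

/* Assumed: writes beyond the end of the register set are dropped. */
static void idp_write_data(struct idp *s, uint32_t val)
{
    if (s->index <= REGS_SIZE - 4) {
        memcpy(&s->regs[s->index], &val, sizeof(val));
    }
}
```

With this model, writing 0x13 to the index latches 0x10, which is the read-back value guessed at above; a test on real hardware would show whether the model needs changing.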
Re: [Qemu-devel] [PATCH 11/31] block/raw: Fix to forward method bdrv_media_changed()
On 07.09.2011 21:25, Blue Swirl wrote:
On Tue, Sep 6, 2011 at 3:39 PM, Kevin Wolf kw...@redhat.com wrote:
From: Markus Armbruster arm...@redhat.com

Block driver raw forwards most methods to the underlying block driver. However, it doesn't implement method bdrv_media_changed(). Makes bdrv_media_changed() always return -ENOTSUP.

I believe -fda /dev/fd0 gives you raw over host_floppy, and disk change detection (fdc register 7 bit 7) is broken. Testing my theory requires a computer museum, though.

Or software to emulate ancient computers? Maybe such software could be already available to you? ;-)

In general, such software is buggy. ;-)

Kevin
Re: [Qemu-devel] [RFC PATCH 4/5] VFIO: Add PCI device support
On 09/07/2011 09:55 PM, Konrad Rzeszutek Wilk wrote:
  If you don't know what to do here, say N.
+
+menuconfig VFIO_PCI
+	bool "VFIO support for PCI devices"
+	depends on VFIO && PCI
+	default y if X86

Hahah.. And Linus is going to tear your behind for that. Default should be 'n'

It depends on VFIO, which presumably defaults to n.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [PATCH 0/2] improve qemu-img conversion performance
On Wed, Sep 07, 2011 at 04:06:51PM -0700, Yehuda Sadeh wrote:
The following set of patches improve the qemu-img conversion process performance. When using a higher latency backend, small writes have a severe impact on the time it takes to do image conversion. We switch to using async writes, and we avoid splitting writes due to holes when the holes are small enough.

Yehuda Sadeh (2):
  qemu-img: async write to block device when converting image
  qemu-img: don't skip writing small holes

 qemu-img.c |   34 +++---
 1 files changed, 27 insertions(+), 7 deletions(-)

--
2.7.5.1

This has nothing to do with the patch itself, but I've been curious about the existence of both a QEMU and a Linux kernel rbd block driver.

The I/O latency with qemu-img has been an issue for rbd users. But they have the option of using the Linux kernel rbd block driver, where qemu-img can take advantage of the page cache instead of performing direct I/O.

Does this mean you intend to support both QEMU block/rbd.c and Linux drivers/block/rbd.c? As a user I would go with the Linux kernel driver instead of the QEMU block driver because it offers page cache and host block device features. On the other hand a userspace driver is nice because it does not require privileges.

Stefan
Re: [Qemu-devel] Suspicious code in qcow2.
On 07.09.2011 18:42, Frediano Ziglio wrote:
Actually it does not cause problems, but this code order seems a bit wrong to me (block/qcow2-cluster.c):

QLIST_INSERT_HEAD(&s->cluster_allocs, m, next_in_flight);

/* allocate a new cluster */
cluster_offset = qcow2_alloc_clusters(bs, nb_clusters * s->cluster_size);
if (cluster_offset < 0) {
    ret = cluster_offset;
    goto fail;
}

/* save info needed for meta data update */
m->offset = offset;
m->n_start = n_start;
m->nb_clusters = nb_clusters;

The current metadata (m) gets inserted into the cluster allocation list with nb_clusters set to 0. The loop over cluster_allocs ignores this metadata (it either waits for this allocation or just skips it, depending on dirty data in the offset field). Currently all of this happens under a CoMutex, so it does not cause problems, but if qcow2_alloc_clusters were to unlock the mutex, two overlapping updates could be inserted into cluster_allocs. Perhaps a better order would be:

/* save info needed for meta data update */
m->offset = offset;
m->n_start = n_start;
m->nb_clusters = nb_clusters;

QLIST_INSERT_HEAD(&s->cluster_allocs, m, next_in_flight);

/* allocate a new cluster */
cluster_offset = qcow2_alloc_clusters(bs, nb_clusters * s->cluster_size);
if (cluster_offset < 0) {
    ret = cluster_offset;
    goto fail;
}

(tested successfully with iotests suite)

Yes, that makes sense. Once we run this code without holding the CoMutex, this becomes a real problem. Care to send a patch?

Kevin
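
The ordering concern raised here, namely filling in an entry completely before making it visible on a shared list, can be illustrated with a much simplified sketch. The types and functions below are hypothetical and are not the qcow2 structures; offsets and lengths are counted in clusters purely to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an in-flight allocation entry. */
struct alloc_meta {
    uint64_t offset;       /* first cluster covered by this allocation */
    uint64_t nb_clusters;  /* number of clusters covered */
    struct alloc_meta *next;
};

struct alloc_list {
    struct alloc_meta *head;
};

/* Returns 1 if any listed allocation overlaps [offset, offset + n). */
static int overlaps_in_flight(const struct alloc_list *l,
                              uint64_t offset, uint64_t n)
{
    const struct alloc_meta *m;

    for (m = l->head; m != NULL; m = m->next) {
        if (offset < m->offset + m->nb_clusters && m->offset < offset + n) {
            return 1;
        }
    }
    return 0;
}

/*
 * Fill in every field first, then link the entry into the list.  Anyone
 * walking the list afterwards sees a complete entry, never one whose
 * range is still unset.
 */
static void publish(struct alloc_list *l, struct alloc_meta *m,
                    uint64_t offset, uint64_t nb_clusters)
{
    m->offset = offset;
    m->nb_clusters = nb_clusters;
    m->next = l->head;
    l->head = m;
}
```

If the entry were linked in before its range was set, the overlap check could miss it, which is the situation that would matter once the allocation path may run without the lock held.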
Re: [Qemu-devel] [PATCH 3/5] tcg/s390: Only one call output register needed for 64 bit hosts
On 09/07/2011 12:32 PM, Alexander Graf wrote:
On 05.09.2011, at 11:07, Stefan Weil wrote:
The second register is only needed for 32 bit hosts.

Looks sane to me. Richard, mind to ack?

Alex

Cc: Alexander Graf ag...@suse.de
Signed-off-by: Stefan Weil w...@mail.berlios.de

Acked-by: Richard Henderson r...@twiddle.net

r~
[Qemu-devel] [PATCH] target-i386: Compute all flag data inside %cl != 0 test.
The (x << (cl - 1)) quantity is only used if CL != 0. Move the computation of that quantity nearer its use.

This avoids the creation of undefined TCG operations when the constant propagation optimization proves that CL == 0, and thus CL-1 is outside the range [0-wordsize).

Signed-off-by: Richard Henderson r...@twiddle.net
---
 target-i386/translate.c |   72 ---
 1 files changed, 43 insertions(+), 29 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index ccef381..b966762 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -1406,70 +1406,84 @@ static void gen_shift_rm_T1(DisasContext *s, int ot, int op1,
 {
     target_ulong mask;
     int shift_label;
-    TCGv t0, t1;
+    TCGv t0, t1, t2;
 
-    if (ot == OT_QUAD)
+    if (ot == OT_QUAD) {
         mask = 0x3f;
-    else
+    } else {
         mask = 0x1f;
+    }
 
     /* load */
-    if (op1 == OR_TMP0)
+    if (op1 == OR_TMP0) {
         gen_op_ld_T0_A0(ot + s->mem_index);
-    else
+    } else {
         gen_op_mov_TN_reg(ot, 0, op1);
+    }
 
-    tcg_gen_andi_tl(cpu_T[1], cpu_T[1], mask);
+    t0 = tcg_temp_local_new();
+    t1 = tcg_temp_local_new();
+    t2 = tcg_temp_local_new();
 
-    tcg_gen_addi_tl(cpu_tmp5, cpu_T[1], -1);
+    tcg_gen_andi_tl(t2, cpu_T[1], mask);
 
     if (is_right) {
         if (is_arith) {
             gen_exts(ot, cpu_T[0]);
-            tcg_gen_sar_tl(cpu_T3, cpu_T[0], cpu_tmp5);
-            tcg_gen_sar_tl(cpu_T[0], cpu_T[0], cpu_T[1]);
+            tcg_gen_mov_tl(t0, cpu_T[0]);
+            tcg_gen_sar_tl(cpu_T[0], cpu_T[0], t2);
         } else {
             gen_extu(ot, cpu_T[0]);
-            tcg_gen_shr_tl(cpu_T3, cpu_T[0], cpu_tmp5);
-            tcg_gen_shr_tl(cpu_T[0], cpu_T[0], cpu_T[1]);
+            tcg_gen_mov_tl(t0, cpu_T[0]);
+            tcg_gen_shr_tl(cpu_T[0], cpu_T[0], t2);
         }
     } else {
-        tcg_gen_shl_tl(cpu_T3, cpu_T[0], cpu_tmp5);
-        tcg_gen_shl_tl(cpu_T[0], cpu_T[0], cpu_T[1]);
+        tcg_gen_mov_tl(t0, cpu_T[0]);
+        tcg_gen_shl_tl(cpu_T[0], cpu_T[0], t2);
     }
 
     /* store */
-    if (op1 == OR_TMP0)
+    if (op1 == OR_TMP0) {
         gen_op_st_T0_A0(ot + s->mem_index);
-    else
+    } else {
         gen_op_mov_reg_T0(ot, op1);
-
+    }
+
     /* update eflags if non zero shift */
-    if (s->cc_op != CC_OP_DYNAMIC)
+    if (s->cc_op != CC_OP_DYNAMIC) {
         gen_op_set_cc_op(s->cc_op);
+    }
 
-    /* XXX: inefficient */
-    t0 = tcg_temp_local_new();
-    t1 = tcg_temp_local_new();
-
-    tcg_gen_mov_tl(t0, cpu_T[0]);
-    tcg_gen_mov_tl(t1, cpu_T3);
+    tcg_gen_mov_tl(t1, cpu_T[0]);
 
     shift_label = gen_new_label();
-    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_T[1], 0, shift_label);
+    tcg_gen_brcondi_tl(TCG_COND_EQ, t2, 0, shift_label);
 
-    tcg_gen_mov_tl(cpu_cc_src, t1);
-    tcg_gen_mov_tl(cpu_cc_dst, t0);
-    if (is_right)
+    tcg_gen_addi_tl(t2, t2, -1);
+    tcg_gen_mov_tl(cpu_cc_dst, t1);
+
+    if (is_right) {
+        if (is_arith) {
+            tcg_gen_sar_tl(cpu_cc_src, t0, t2);
+        } else {
+            tcg_gen_shr_tl(cpu_cc_src, t0, t2);
+        }
+    } else {
+        tcg_gen_shl_tl(cpu_cc_src, t0, t2);
+    }
+
+    if (is_right) {
         tcg_gen_movi_i32(cpu_cc_op, CC_OP_SARB + ot);
-    else
+    } else {
         tcg_gen_movi_i32(cpu_cc_op, CC_OP_SHLB + ot);
-
+    }
+
     gen_set_label(shift_label);
     s->cc_op = CC_OP_DYNAMIC; /* cannot predict flags after */
 
     tcg_temp_free(t0);
     tcg_temp_free(t1);
+    tcg_temp_free(t2);
 }
 
 static void gen_shift_rm_im(DisasContext *s, int ot, int op1, int op2,
--
1.7.4.4
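
The idea behind this patch can be shown in plain C, where shifting by an out-of-range count is likewise undefined. The sketch below is not the TCG code; struct shl_result and shl32() are hypothetical names, and only the 32-bit left-shift case is modelled. The point is that the "shifted by count - 1" value used for the flags is computed only inside the count != 0 branch.

```c
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
struct shl_result {
    uint32_t value;          /* x shifted left by the masked count */
    uint32_t cc_src;         /* x shifted by count - 1; top bit is the carry */
    int      flags_updated;  /* 0 when the count is 0 and flags are untouched */
};

static struct shl_result shl32(uint32_t x, uint32_t cl)
{
    struct shl_result r = { x, 0, 0 };
    uint32_t count = cl & 0x1f;   /* same masking as a 32-bit x86 shift */

    r.value = x << count;         /* count is in [0, 31], always defined */
    if (count != 0) {
        /*
         * count - 1 is in [0, 30] here.  Evaluating this outside the
         * branch with count == 0 would shift by 0xffffffff, which is
         * undefined.
         */
        r.cc_src = x << (count - 1);
        r.flags_updated = 1;
    }
    return r;
}
```

For example, shl32(0x80000001, 1) leaves the top bit set in cc_src, so the carry out is 1, while shl32(x, 0) returns x unchanged and reports that no flags were updated.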
Re: [Qemu-devel] [PATCH] [SPARC] Gdbstub: Fix back-trace on SPARC32
On 07/09/2011 21:02, Blue Swirl wrote:
On Tue, Sep 6, 2011 at 10:38 AM, Fabien Chouteau chout...@adacore.com wrote:
On 05/09/2011 21:22, Blue Swirl wrote:
On Mon, Sep 5, 2011 at 9:33 AM, Fabien Chouteau chout...@adacore.com wrote:
On 03/09/2011 11:25, Blue Swirl wrote:
On Thu, Sep 1, 2011 at 2:17 PM, Fabien Chouteau chout...@adacore.com wrote:

Gdb expects all registers windows to be flushed in ram, which is not the case in Qemu. Therefore the back-trace generation doesn't work. This patch adds a function to handle reads/writes in stack frames as if windows were flushed.

Signed-off-by: Fabien Chouteau chout...@adacore.com
---
 gdbstub.c             |   10 --
 target-sparc/cpu.h    |    7
 target-sparc/helper.c |   85 +
 3 files changed, 99 insertions(+), 3 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 3b87c27..85d5ad7 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -41,6 +41,9 @@
 #include "qemu_socket.h"
 #include "kvm.h"

+#ifndef TARGET_CPU_MEMORY_RW_DEBUG
+#define TARGET_CPU_MEMORY_RW_DEBUG cpu_memory_rw_debug

These days, inline functions are preferred over macros.

This is to allow target-specific implementation of the function.

That can be done with inline functions too.

OK, how do you do that?

#ifndef TARGET_CPU_MEMORY_RW_DEBUG
int target_memory_rw_debug(CPUState *env, target_ulong addr,
                           uint8_t *buf, int len, int is_write)
{
    return cpu_memory_rw_debug(env, addr, buf, len, is_write);
}
#else
/* target_memory_rw_debug() defined in cpu.h */
#endif

OK, understood.

+#endif

 enum {
     GDB_SIGNAL_0 = 0,
@@ -2013,7 +2016,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
         if (*p == ',')
             p++;
         len = strtoull(p, NULL, 16);
-        if (cpu_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 0) != 0) {
+        if (TARGET_CPU_MEMORY_RW_DEBUG(s->g_cpu, addr, mem_buf, len, 0) != 0) {

cpu_memory_rw_debug() could remain unwrapped with a generic function like cpu_gdb_sync_memory() which gdbstub should explicitly call.

Maybe the lazy condition codes etc. could be handled in similar way, cpu_gdb_sync_registers().

Excuse me, I don't understand here.

cpu_gdb_{read,write}_register needs to force calculation of lazy condition codes. On Sparc this is handled by cpu_get_psr(), so it is not explicit.

I still don't understand your point. Do you suggest a cpu_gdb_sync_memory() that will flush register windows?

Not really but nevermind.

+
+/* Gdb expects all registers windows to be flushed in ram. This function handles
+ * reads/writes in stack frames as if windows were flushed. We assume that the
+ * sparc ABI is followed.
+ */

We can't assume that, it depends on what we are executing (BIOS, OS, even application).

Well, maybe the statement is too strong. The ABI is required to get a valid result. Gdb cannot build back-traces if the ABI is not followed anyway.

But if the ABI assumption happens to be wrong (for example registers contain random values), memory may be corrupted because this would happily use whatever the registers contain.

This cannot corrupt memory, the point is to read/write in registers instead of memory.

Sorry, I misread a part of the patch, guest memory is not written unlike I mistakenly assumed (simple register to memory flush). However, wrong ABI assumption may instead corrupt the registers.

Another way to fix this would be that GDB would tell QEMU what ABI to use for flushing. But how would one tell GDB about a non-standard ABI?

For user emulators we can make ABI assumptions, there similar patch could make sense. But system emulators can't assume anything about the guest OS, it could be Linux, *BSD, a commercial OS or even a toy OS.

I think all of these kernels follow the SPARC32 ABI, and if they don't Gdb cannot handle them anyway. This solution covers 99% of the problem.

As is, it's not 100% correct and the failure case is destructive.

But would it make sense if the registers were not touched on write? Then to GDB the windows would appear as if flushed to memory, but like real hardware the registers would not automatically get updated from memory if it's changed by GDB. I don't think corruption would be possible in that case, though GDB (or the user) could get temporarily confused if a read from memory location would not return its true value.

I think this might be the best compromise. So I'll just handle reads in register windows.

BTW, cpu_cwp_inc() is called but there is no effort to restore CWP afterward.

The CWP in CPUState is never modified by cpu_cwp_inc().

Version 2 is on its way...

Regards,

--
Fabien Chouteau
[Qemu-devel] CFQ I/O starvation problem triggered by RHEL6.0 KVM guests
This is a report of strange cfq behaviour which seems to be triggered by QEMU posix aio threads. Host environment: OS: RHEL6.0 KVM/qemu-kvm (with no patch applied) IO scheduler: cfq (with the default parameters) On the host, we were running 3 linux guests to see if I/O from these guests would be handled fairly by host; each guest did dd write with oflag=direct. Guest virtual disk: We used a host local disk which had 3 partitions, and each guest was allocated one of these as dd write target. So our test was for checking if cfq could keep fairness for the 3 guests who shared the same disk. The result (strage starvation): Sometimes, one guest dominated cfq for more than 10sec and requests from other guests were not handled at all during that time. Below is the blktrace log which shows that a request to (8,27) in cfq2068S (*1) is not handled at all during cfq2095S and cfq2067S which hold requests to (8,26) are being handled alternately. *1) WS 104920578 + 64 Question: I guess that cfq_close_cooperator() was being called in an unusual manner. If so, do you think that cfq is responsible for keeping fairness for this kind of unusual write requests? Note: With RHEL6.1, this problem could not triggered. But I guess that was due to QEMU's block layer updates. 
Thanks,
  Takuya

--- blktrace log ---
  8,16   0   2010   0.275081840  2068  A  WS 104920578 + 64 <- (8,27) 0
  8,16   0   2011   0.275082180  2068  Q  WS 104920578 + 64 [qemu-kvm]
  8,16   0      0   0.275091369     0  m   N cfq2068S / alloced
  8,16   0   2012   0.275091909  2068  G  WS 104920578 + 64 [qemu-kvm]
  8,16   0   2013   0.275093352  2068  P   N [qemu-kvm]
  8,16   0   2014   0.275094059  2068  I   W 104920578 + 64 [qemu-kvm]
  8,16   0      0   0.275094887     0  m   N cfq2068S / insert_request
  8,16   0      0   0.275095742     0  m   N cfq2068S / add_to_rr
  8,16   0   2015   0.275097194  2068  U   N [qemu-kvm] 1
  8,16   2   2073   0.275189462  2095  A  WS 83979688 + 64 <- (8,26) 4
  8,16   2   2074   0.275189989  2095  Q  WS 83979688 + 64 [qemu-kvm]
  8,16   2   2075   0.275192534  2095  G  WS 83979688 + 64 [qemu-kvm]
  8,16   2   2076   0.275193909  2095  I   W 83979688 + 64 [qemu-kvm]
  8,16   2      0   0.275195609     0  m   N cfq2095S / insert_request
  8,16   2      0   0.275196404     0  m   N cfq2095S / add_to_rr
  8,16   2      0   0.275198004     0  m   N cfq2095S / preempt
  8,16   2      0   0.275198688     0  m   N cfq2067S / slice expired t=1
  8,16   2      0   0.275199631     0  m   N cfq2067S / resid=100
  8,16   2      0   0.275200413     0  m   N cfq2067S / sl_used=1
  8,16   2      0   0.275201521     0  m   N / served: vt=1671968768 min_vt=1671966720
  8,16   2      0   0.275202323     0  m   N cfq2067S / del_from_rr
  8,16   2      0   0.275204263     0  m   N cfq2095S / set_active wl_prio:0 wl_type:2
  8,16   2      0   0.275205131     0  m   N cfq2095S / fifo=(null)
  8,16   2      0   0.275205851     0  m   N cfq2095S / dispatch_insert
  8,16   2      0   0.275207121     0  m   N cfq2095S / dispatched a request
  8,16   2      0   0.275207873     0  m   N cfq2095S / activate rq, drv=1
  8,16   2   2077   0.275208198  2095  D   W 83979688 + 64 [qemu-kvm]
  8,16   2   2078   0.275269567  2095  U   N [qemu-kvm] 2
  8,16   4    836   0.275483550     0  C   W 83979688 + 64 [0]
  8,16   4      0   0.275496745     0  m   N cfq2095S / complete rqnoidle 0
  8,16   4      0   0.275497825     0  m   N cfq2095S / set_slice=100
  8,16   4      0   0.275499512     0  m   N cfq2095S / arm_idle: 8
  8,16   4      0   0.275499862     0  m   N cfq schedule dispatch
  8,16   6     85   0.275626195  2067  A  WS 83979752 + 64 <- (8,26) 40064
  8,16   6     86   0.275626598  2067  Q  WS 83979752 + 64 [qemu-kvm]
  8,16   6     87   0.275628580  2067  G  WS 83979752 + 64 [qemu-kvm]
  8,16   6     88   0.275629630  2067  I   W 83979752 + 64 [qemu-kvm]
  8,16   6      0   0.275631047     0  m   N cfq2067S / insert_request
  8,16   6      0   0.275631730     0  m   N cfq2067S / add_to_rr
  8,16   6      0   0.275633567     0  m   N cfq2067S / preempt
  8,16   6      0   0.275634275     0  m   N cfq2095S / slice expired t=1
  8,16   6      0   0.275635285     0  m   N cfq2095S / resid=100
  8,16   6      0   0.275635985     0  m   N cfq2095S / sl_used=1
  8,16   6      0   0.275636882     0  m   N / served: vt=1671970816 min_vt=1671968768
  8,16   6      0   0.275637585     0  m   N cfq2095S / del_from_rr
  8,16   6      0   0.275639382     0  m   N cfq2067S / set_active wl_prio:0 wl_type:2
  8,16   6      0   0.275640222     0  m   N cfq2067S / fifo=(null)
  8,16   6      0   0.275640809     0  m   N cfq2067S / dispatch_insert
  8,16   6      0   0.275641929     0  m   N cfq2067S / dispatched a request
Re: [Qemu-devel] [PATCH 1/2] build: fix missing trace dep on GENERATED_HEADERS
On Thu, Sep 8, 2011 at 12:40 AM, Michael Roth mdr...@linux.vnet.ibm.com wrote: fc764105 added an include for qemu-common.h to trace/control.h, which made all users of this header file dependent on GENERATED_HEADERS. Since it's used by pretty much all the trace backends now, make trace-obj-y dependent on GENERATED_HEADERS. Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com --- Makefile.objs | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Re: [Qemu-devel] [PATCH] pci: add standard bridge device
  Hi,

> I modify the code like this, and the PCI_INTERRUPT_LINE register is set,
> and I can bind it to uio_pci_generic:
>
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -575,6 +575,8 @@ static int pci_bios_init_root_regions(u32 start, u32 end)
>
>      pci_bios_init_bus_bases(&busses[0]);
>
> -    pci_bios_map_device_in_bus(0 /* host bus */);
> +    for (bus = 0; bus <= MaxPCIBus; bus++) {
> +        pci_bios_map_device_in_bus(bus /* host bus */);

No. pci_bios_map_device_in_bus goes down recursively when it finds a bridge, so it should cover all devices already.

> -    pci_bios_init_device_in_bus(0 /* host bus */);
> +        pci_bios_init_device_in_bus(bus /* host bus */);
> +    }

That is correct. It can be done more easily, though, by just not limiting device initialization to a specific bus, as in the attached patch. Does that one work for you?

cheers,
  Gerd

diff --git a/src/pciinit.c b/src/pciinit.c
index 597c8ea..676e35e 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -45,7 +45,7 @@ static struct pci_bus {
 } *busses;
 static int busses_count;
 
-static void pci_bios_init_device_in_bus(int bus);
+static void pci_bios_init_device_all(void);
 static void pci_bios_check_device_in_bus(int bus);
 static void pci_bios_init_bus_bases(struct pci_bus *bus);
 static void pci_bios_map_device_in_bus(int bus);
@@ -254,15 +254,10 @@ static void pci_bios_init_device(struct pci_device *pci)
     pci_init_device(pci_device_tbl, pci, NULL);
 }
 
-static void pci_bios_init_device_in_bus(int bus)
+static void pci_bios_init_device_all(void)
 {
     struct pci_device *pci;
     foreachpci(pci) {
-        u8 pci_bus = pci_bdf_to_bus(pci->bdf);
-        if (pci_bus < bus)
-            continue;
-        if (pci_bus > bus)
-            break;
         pci_bios_init_device(pci);
     }
 }
@@ -605,7 +600,7 @@ pci_setup(void)
     pci_bios_init_bus_bases(&busses[0]);
 
     pci_bios_map_device_in_bus(0 /* host bus */);
-    pci_bios_init_device_in_bus(0 /* host bus */);
+    pci_bios_init_device_all();
 
     struct pci_device *pci;
     foreachpci(pci) {
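
To illustrate why the per-bus filter left a device behind a bridge uninitialized, here is a minimal sketch in plain C. It is not SeaBIOS code: `struct toy_pci_device` and both `toy_*` functions are hypothetical stand-ins, and the device list is assumed to be sorted by bus number, which is what the `continue`/`break` pair in the removed code relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device entry; not the SeaBIOS struct. */
struct toy_pci_device {
    int bus;          /* bus number the device sits on */
    int initialized;  /* set to 1 once "initialized" */
};

/* Per-bus variant: only touches devices on the given bus.
 * Assumes devs[] is sorted by ascending bus number. */
int toy_init_device_in_bus(struct toy_pci_device *devs, size_t n, int bus)
{
    int count = 0;
    assert(devs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].bus < bus) {
            continue;   /* not there yet */
        }
        if (devs[i].bus > bus) {
            break;      /* past the requested bus, stop early */
        }
        devs[i].initialized = 1;
        count++;
    }
    return count;
}

/* All-device variant: no bus filter, every entry is initialized. */
int toy_init_device_all(struct toy_pci_device *devs, size_t n)
{
    int count = 0;
    assert(devs != NULL);
    for (size_t i = 0; i < n; i++) {
        devs[i].initialized = 1;
        count++;
    }
    return count;
}
```

With two devices on bus 0 and one on bus 1, calling the per-bus variant for bus 0 skips the third device, while the unfiltered variant reaches it.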
Re: [Qemu-devel] [PATCH 1/3] rbd: allow client id to be specified in config string
On Wed, Sep 7, 2011 at 5:28 PM, Sage Weil s...@newdream.net wrote: Allow the client id to be specified in the config string via 'id=' so that users can control who they authenticate as. Currently they are stuck with the default ('admin'). This is necessary for anyone using authentication in their environment. Signed-off-by: Sage Weil s...@newdream.net --- block/rbd.c | 52 1 files changed, 44 insertions(+), 8 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Re: [Qemu-devel] [PATCH 2/3] rbd: clean up, fix style
On Wed, Sep 7, 2011 at 5:28 PM, Sage Weil s...@newdream.net wrote: No assignment in condition. Remove duplicate ret < 0 check. Signed-off-by: Sage Weil s...@newdream.net --- block/rbd.c | 17 - 1 files changed, 8 insertions(+), 9 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Re: [Qemu-devel] [PATCH 3/3] rbd: fix leak in qemu_rbd_open failure paths
On Wed, Sep 7, 2011 at 5:28 PM, Sage Weil s...@newdream.net wrote: Fix leak of s->snap in failure path. Simplify error paths for the whole function. Reported-by: Stefan Hajnoczi stefa...@gmail.com Signed-off-by: Sage Weil s...@newdream.net --- block/rbd.c | 28 +--- 1 files changed, 13 insertions(+), 15 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
[Qemu-devel] Consistent crasher on reboot / shutdown
#0  0x7f0f5da502c4 in malloc_consolidate.part.3 () from /lib64/libc.so.6
#1  0x7f0f5da51274 in _int_malloc () from /lib64/libc.so.6
#2  0x7f0f5da53b00 in malloc () from /lib64/libc.so.6
#3  0x0066cfec in malloc_and_trace (n_bytes=4120) at /build/home/tlv/akivity/qemu/vl.c:2154
#4  0x7f0f5fbdc1de in ?? () from /lib64/libglib-2.0.so.0
#5  0x7f0f5fbdc668 in g_malloc0 () from /lib64/libglib-2.0.so.0
#6  0x004f7e67 in qdict_new () at qdict.c:38
#7  0x005e31e8 in handle_user_command (mon=0x2fccc30, cmdline=0x2fcd0b0 "help") at /build/home/tlv/akivity/qemu/monitor.c:4532
#8  0x005e4ed1 in monitor_command_cb (mon=0x2fccc30, cmdline=0x2fcd0b0 "help", opaque=0x0) at /build/home/tlv/akivity/qemu/monitor.c:5190
#9  0x0050b04c in readline_handle_byte (rs=0x2fcd0b0, ch=10) at readline.c:370
#10 0x005e4e15 in monitor_read (opaque=0x2fccc30, buf=0x7fff0a383860 "\n", size=1) at /build/home/tlv/akivity/qemu/monitor.c:5176
#11 0x004f8ff9 in qemu_chr_be_write (s=0x2e53ae0, buf=0x7fff0a383860 "\n", len=1) at qemu-char.c:163
#12 0x004fcb57 in tcp_chr_read (opaque=0x2e53ae0) at qemu-char.c:2106
#13 0x0046d87d in qemu_iohandler_poll (readfds=0x7fff0a384920, writefds=0x7fff0a3849a0, xfds=0x7fff0a384a20, ret=1) at iohandler.c:175
#14 0x0066b4cc in main_loop_wait (nonblocking=0) at /build/home/tlv/akivity/qemu/vl.c:1438
#15 0x0066b59c in main_loop () at /build/home/tlv/akivity/qemu/vl.c:1469
#16 0x006701e5 in main (argc=23, argv=0x7fff0a384ee8, envp=0x7fff0a384fa8) at /build/home/tlv/akivity/qemu/vl.c:3491

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] pci: add standard bridge device
At 09/08/2011 05:43 PM, Gerd Hoffmann Write:
>   Hi,
>
>> I modify the code like this, and the PCI_INTERRUPT_LINE register is set,
>> and I can bind it to uio_pci_generic:
>>
>> --- a/src/pciinit.c
>> +++ b/src/pciinit.c
>> @@ -575,6 +575,8 @@ static int pci_bios_init_root_regions(u32 start, u32 end)
>>
>>      pci_bios_init_bus_bases(&busses[0]);
>>
>> -    pci_bios_map_device_in_bus(0 /* host bus */);
>> +    for (bus = 0; bus <= MaxPCIBus; bus++) {
>> +        pci_bios_map_device_in_bus(bus /* host bus */);
>
> No. pci_bios_map_device_in_bus goes down recursively when it finds a
> bridge, so it should cover all devices already.

Yes, pci_bios_map_device() goes down recursively.

>> -    pci_bios_init_device_in_bus(0 /* host bus */);
>> +        pci_bios_init_device_in_bus(bus /* host bus */);
>> +    }
>
> That is correct. Can be done easier though by just not limiting device
> initialization to a specific bus like in the attached patch. Does that
> one work for you?

I tested it, and it works for me.

Thanks
Wen Congyang

> cheers,
>   Gerd
Re: [Qemu-devel] [PATCH 1/3] rbd: allow client id to be specified in config string
Am 07.09.2011 18:28, schrieb Sage Weil: Allow the client id to be specified in the config string via 'id=' so that users can control who they authenticate as. Currently they are stuck with the default ('admin'). This is necessary for anyone using authentication in their environment. Signed-off-by: Sage Weil s...@newdream.net Thanks, applied all to the block branch. Kevin
Re: [Qemu-devel] [PATCH -V2] iohandler: update qemu_fd_set_handler to work with null call back arg
On 09/07/2011 09:44 PM, Anthony Liguori wrote:
> I think this is a bit more complicated than is really needed. Here's
> what I came up with which also fixes another bug where the io channel
> could be freed twice. I stumbled across this via a very strange failure
> scenario. Avi, it might be worth trying this patch to see if it fixes
> your problem too.

Right now, I've got more than just one problem.

> One thing that I found challenging debugging this, coroutines make
> valgrind very unhappy. Is it possible that we could have a command line
> switch to fall back to the thread based coroutines so to make things
> more valgrind friendly?

How is valgrind even aware of coroutines? Unless it doesn't implement makecontext correctly, it shouldn't even be aware of them.

-- 
error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH v8 0/4] The intro of QEMU block I/O throttling
The main goal of the patch is to effectively cap the disk I/O speed or counts of one single VM. It is only a draft, so it unavoidably has some drawbacks; if you catch them, please let me know.

The patch mainly introduces one block I/O throttling algorithm, one timer and one block queue for each drive with I/O limits enabled.

When a block request comes in, the throttling algorithm checks whether its I/O rate or count exceeds the limits; if so, the request is enqueued to the block queue. The timer then handles the I/O requests in the queue.

Some available features follow as below:
(1) global bps limit.
    -drive bps=xxx            in bytes/s
(2) only read bps limit
    -drive bps_rd=xxx         in bytes/s
(3) only write bps limit
    -drive bps_wr=xxx         in bytes/s
(4) global iops limit
    -drive iops=xxx           in ios/s
(5) only read iops limit
    -drive iops_rd=xxx        in ios/s
(6) only write iops limit
    -drive iops_wr=xxx        in ios/s
(7) the combination of some limits.
    -drive bps=xxx,iops=xxx

Known Limitations:
(1) #1 can not coexist with #2, #3
(2) #4 can not coexist with #5, #6
(3) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this scenario.

Changes since code V7:
  fix the build per patch based on stefan's comments.

Zhi Yong Wu (4):
  block: add the command line support
  block: add the block queue support
  block: add block timer and throttling algorithm
  qmp/hmp: add block_set_io_throttle

 v7: Mainly simplify the block queue.
     Adjust codes based on stefan's comments.

 v6: Mainly fix the aio callback issue for block queue.
     Adjust codes based on Ram Pai's comments.

 v5: add qmp/hmp support.
     Adjust the codes based on stefan's comments
     qmp/hmp: add block_set_io_throttle

 v4: fix memory leaking based on ryan's feedback.

 v3: Added the code for extending slice time, and modified the method to compute wait time for the timer.

 v2: The codes V2 for QEMU disk I/O limits.
     Modified the codes mainly based on stefan's comments.

 v1: Submit the codes for QEMU disk I/O limits.
     Only a code draft.

 Makefile.objs     |    2 +-
 block.c           |  344 +++--
 block.h           |    6 +-
 block/blk-queue.c |  201 +++
 block/blk-queue.h |   59 +
 block_int.h       |   30 +
 blockdev.c        |   98 +++
 blockdev.h        |    2 +
 hmp-commands.hx   |   15 +++
 qemu-config.c     |   24
 qemu-options.hx   |    1 +
 qerror.c          |    4 +
 qerror.h          |    3 +
 qmp-commands.hx   |   52 -
 14 files changed, 825 insertions(+), 16 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

-- 
1.7.6
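
The check-then-queue flow described in the cover letter can be sketched as follows. This is only a toy model under stated assumptions, not the algorithm from the series: `struct toy_throttle`, `toy_exceeds_bps()` and the per-slice accounting are hypothetical, only a single bps limit is modelled, and the integer arithmetic assumes values small enough not to overflow 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NS_PER_SEC 1000000000ULL

/* Hypothetical per-drive throttle state (not a QEMU structure). */
struct toy_throttle {
    uint64_t bps_limit;        /* bytes per second, 0 means "no limit" */
    uint64_t slice_ns;         /* length of the accounting slice in ns */
    uint64_t bytes_dispatched; /* bytes already let through in this slice */
};

/* Decide whether a request of req_bytes must wait.
 * elapsed_ns is the time since the slice started.
 * Returns true if the request should be queued; *wait_ns then holds how
 * long to arm the timer for. Returns false if it may be dispatched now.
 * Toy arithmetic: assumes the products below fit in 64 bits. */
bool toy_exceeds_bps(const struct toy_throttle *t, uint64_t req_bytes,
                     uint64_t elapsed_ns, uint64_t *wait_ns)
{
    assert(t != NULL && wait_ns != NULL);
    *wait_ns = 0;

    if (t->bps_limit == 0) {
        return false;                       /* throttling disabled */
    }

    /* Bytes the drive may move within one slice at the configured rate. */
    uint64_t budget = t->bps_limit * t->slice_ns / TOY_NS_PER_SEC;
    uint64_t total = t->bytes_dispatched + req_bytes;

    if (total <= budget) {
        return false;                       /* still within the budget */
    }

    /* Time at which 'total' bytes would be allowed at the limit rate. */
    uint64_t needed_ns = total * TOY_NS_PER_SEC / t->bps_limit;
    *wait_ns = needed_ns > elapsed_ns ? needed_ns - elapsed_ns : 0;
    return true;
}
```

A caller would dispatch the request when the function returns false, and otherwise put it on the per-drive queue and re-arm the timer with the returned delay.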
[Qemu-devel] [PATCH v8 2/4] block: add the command line support
Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 block.c         |   59 +++
 block.h         |    5
 block_int.h     |    3 ++
 blockdev.c      |   29 +++
 qemu-config.c   |   24 ++
 qemu-options.hx |    1 +
 6 files changed, 121 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 43742b7..cd75183 100644
--- a/block.c
+++ b/block.c
@@ -104,6 +104,57 @@ int is_windows_drive(const char *filename)
 }
 #endif
 
+/* throttling disk I/O limits */
+void bdrv_io_limits_disable(BlockDriverState *bs)
+{
+    bs->io_limits_enabled = false;
+
+    if (bs->block_queue) {
+        qemu_block_queue_flush(bs->block_queue);
+        qemu_del_block_queue(bs->block_queue);
+        bs->block_queue = NULL;
+    }
+
+    if (bs->block_timer) {
+        qemu_del_timer(bs->block_timer);
+        qemu_free_timer(bs->block_timer);
+        bs->block_timer = NULL;
+    }
+
+    bs->slice_start = 0;
+
+    bs->slice_end = 0;
+}
+
+static void bdrv_block_timer(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BlockQueue *queue = bs->block_queue;
+
+    qemu_block_queue_flush(queue);
+}
+
+void bdrv_io_limits_enable(BlockDriverState *bs)
+{
+    bs->block_queue = qemu_new_block_queue();
+    bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+
+    bs->slice_start = qemu_get_clock_ns(vm_clock);
+
+    bs->slice_end = bs->slice_start + BLOCK_IO_SLICE_TIME;
+}
+
+bool bdrv_io_limits_enabled(BlockDriverState *bs)
+{
+    BlockIOLimit *io_limits = &bs->io_limits;
+    return io_limits->bps[BLOCK_IO_LIMIT_READ]
+         || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
+         || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
+         || io_limits->iops[BLOCK_IO_LIMIT_READ]
+         || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
+         || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
+}
+
 /* check if the path starts with protocol: */
 static int path_has_protocol(const char *path)
 {
@@ -1453,6 +1504,14 @@ void bdrv_get_geometry_hint(BlockDriverState *bs,
     *psecs = bs->secs;
 }
 
+/* throttling disk io limits */
+void bdrv_set_io_limits(BlockDriverState *bs,
+                        BlockIOLimit *io_limits)
+{
+    bs->io_limits = *io_limits;
+    bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
+}
+
 /* Recognize floppy formats */
 typedef struct FDFormat {
     FDriveType drive;
diff --git a/block.h b/block.h
index 3ac0b94..a3e69db 100644
--- a/block.h
+++ b/block.h
@@ -58,6 +58,11 @@ void bdrv_info(Monitor *mon, QObject **ret_data);
 void bdrv_stats_print(Monitor *mon, const QObject *data);
 void bdrv_info_stats(Monitor *mon, QObject **ret_data);
 
+/* disk I/O throttling */
+void bdrv_io_limits_enable(BlockDriverState *bs);
+void bdrv_io_limits_disable(BlockDriverState *bs);
+bool bdrv_io_limits_enabled(BlockDriverState *bs);
+
 void bdrv_init(void);
 void bdrv_init_with_whitelist(void);
 BlockDriver *bdrv_find_protocol(const char *filename);
diff --git a/block_int.h b/block_int.h
index 201e635..368c776 100644
--- a/block_int.h
+++ b/block_int.h
@@ -257,6 +257,9 @@ void qemu_aio_release(void *p);
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
+void bdrv_set_io_limits(BlockDriverState *bs,
+                        BlockIOLimit *io_limits);
+
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
diff --git a/blockdev.c b/blockdev.c
index 2602591..619ae9f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -236,6 +236,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
     int on_read_error, on_write_error;
     const char *devaddr;
     DriveInfo *dinfo;
+    BlockIOLimit io_limits;
     int snapshot = 0;
     int ret;
 
@@ -354,6 +355,31 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
         }
     }
 
+    /* disk I/O throttling */
+    io_limits.bps[BLOCK_IO_LIMIT_TOTAL] =
+                           qemu_opt_get_number(opts, "bps", 0);
+    io_limits.bps[BLOCK_IO_LIMIT_READ] =
+                           qemu_opt_get_number(opts, "bps_rd", 0);
+    io_limits.bps[BLOCK_IO_LIMIT_WRITE] =
+                           qemu_opt_get_number(opts, "bps_wr", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
+                           qemu_opt_get_number(opts, "iops", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_READ] =
+                           qemu_opt_get_number(opts, "iops_rd", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
+                           qemu_opt_get_number(opts, "iops_wr", 0);
+
+    if (((io_limits.bps[BLOCK_IO_LIMIT_TOTAL] != 0)
+            && ((io_limits.bps[BLOCK_IO_LIMIT_READ] != 0)
+            || (io_limits.bps[BLOCK_IO_LIMIT_WRITE] != 0)))
+            || ((io_limits.iops[BLOCK_IO_LIMIT_TOTAL] != 0)
+            && ((io_limits.iops[BLOCK_IO_LIMIT_READ] != 0)
+            || (io_limits.iops[BLOCK_IO_LIMIT_WRITE] != 0)))) {
+        error_report("bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr)"
+                     "cannot be used at the same time");
+        return
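
The option check in drive_init() enforces the rule from the cover letter that a total limit cannot be combined with a read-only or write-only limit of the same kind. A minimal standalone sketch of that rule, assuming a hypothetical `struct toy_io_limits` that merely mirrors the shape of the `bps[]`/`iops[]` arrays in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indices and struct, shaped like the patch's BlockIOLimit. */
enum { TOY_LIMIT_READ, TOY_LIMIT_WRITE, TOY_LIMIT_TOTAL, TOY_LIMIT_COUNT };

struct toy_io_limits {
    uint64_t bps[TOY_LIMIT_COUNT];   /* 0 means "not set" */
    uint64_t iops[TOY_LIMIT_COUNT];  /* 0 means "not set" */
};

/* True if a total limit is combined with a read or write limit. */
static bool toy_kind_conflicts(const uint64_t v[TOY_LIMIT_COUNT])
{
    return v[TOY_LIMIT_TOTAL] != 0
        && (v[TOY_LIMIT_READ] != 0 || v[TOY_LIMIT_WRITE] != 0);
}

/* True if the option set must be rejected. */
bool toy_limits_conflict(const struct toy_io_limits *l)
{
    assert(l != NULL);
    return toy_kind_conflicts(l->bps) || toy_kind_conflicts(l->iops);
}

/* True if any limit at all is set, i.e. throttling should be enabled. */
bool toy_limits_enabled(const struct toy_io_limits *l)
{
    assert(l != NULL);
    for (int i = 0; i < TOY_LIMIT_COUNT; i++) {
        if (l->bps[i] != 0 || l->iops[i] != 0) {
            return true;
        }
    }
    return false;
}
```

Note that a bps total may still be combined with iops limits, which matches feature (7) in the cover letter.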
[Qemu-devel] [PATCH v8 1/4] block: add the block queue support
Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 Makefile.objs     |    2 +-
 block/blk-queue.c |  201 +
 block/blk-queue.h |   59
 block_int.h       |   27 +++
 4 files changed, 288 insertions(+), 1 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

diff --git a/Makefile.objs b/Makefile.objs
index 26b885b..5dcf456 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -33,7 +33,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
-block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
+block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o blk-queue.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block/blk-queue.c b/block/blk-queue.c
new file mode 100644
index 0000000..adef497
--- /dev/null
+++ b/block/blk-queue.c
@@ -0,0 +1,201 @@
+/*
+ * QEMU System Emulator queue definition for block layer
+ *
+ * Copyright (c) IBM, Corp. 2011
+ *
+ * Authors:
+ *  Zhi Yong Wu  wu...@linux.vnet.ibm.com
+ *  Stefan Hajnoczi stefa...@linux.vnet.ibm.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block_int.h"
+#include "block/blk-queue.h"
+#include "qemu-common.h"
+
+/* The APIs for block request queue on qemu block layer.
+ */
+
+struct BlockQueueAIOCB {
+    BlockDriverAIOCB common;
+    QTAILQ_ENTRY(BlockQueueAIOCB) entry;
+    BlockRequestHandler *handler;
+    BlockDriverAIOCB *real_acb;
+
+    int64_t sector_num;
+    QEMUIOVector *qiov;
+    int nb_sectors;
+};
+
+typedef struct BlockQueueAIOCB BlockQueueAIOCB;
+
+struct BlockQueue {
+    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
+    bool req_failed;
+    bool flushing;
+};
+
+static void qemu_block_queue_dequeue(BlockQueue *queue,
+                                     BlockQueueAIOCB *request)
+{
+    BlockQueueAIOCB *req;
+
+    assert(queue);
+    while (!QTAILQ_EMPTY(&queue->requests)) {
+        req = QTAILQ_FIRST(&queue->requests);
+        if (req == request) {
+            QTAILQ_REMOVE(&queue->requests, req, entry);
+            break;
+        }
+    }
+}
+
+static void qemu_block_queue_cancel(BlockDriverAIOCB *acb)
+{
+    BlockQueueAIOCB *request = container_of(acb, BlockQueueAIOCB, common);
+    if (request->real_acb) {
+        bdrv_aio_cancel(request->real_acb);
+    } else {
+        assert(request->common.bs->block_queue);
+        qemu_block_queue_dequeue(request->common.bs->block_queue,
+                                 request);
+    }
+
+    qemu_aio_release(request);
+}
+
+static AIOPool block_queue_pool = {
+    .aiocb_size         = sizeof(struct BlockQueueAIOCB),
+    .cancel             = qemu_block_queue_cancel,
+};
+
+static void qemu_block_queue_callback(void *opaque, int ret)
+{
+    BlockQueueAIOCB *acb = opaque;
+
+    if (acb->common.cb) {
+        acb->common.cb(acb->common.opaque, ret);
+    }
+
+    qemu_aio_release(acb);
+}
+
+BlockQueue *qemu_new_block_queue(void)
+{
+    BlockQueue *queue;
+
+    queue = g_malloc0(sizeof(BlockQueue));
+
+    QTAILQ_INIT(&queue->requests);
+
+    queue->req_failed = true;
+    queue->flushing   = false;
+
+    return queue;
+}
+
+void qemu_del_block_queue(BlockQueue *queue)
+{
+    BlockQueueAIOCB *request, *next;
+
+    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
+        QTAILQ_REMOVE(&queue->requests, request, entry);
+        qemu_aio_release(request);
+    }
+
+    g_free(queue);
+}
+
+BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
+                        BlockDriverState *bs,
+                        BlockRequestHandler *handler,
+                        int64_t
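
One observation on qemu_block_queue_dequeue() as posted: the loop re-reads the first element on every iteration without advancing, so it appears it would not terminate if the request being cancelled is not at the head of the queue. The following is a minimal sketch of a walk-and-remove alternative, written against the BSD `TAILQ` macros from `<sys/queue.h>` (on which QEMU's `QTAILQ` macros are modelled) rather than against QEMU itself. `struct toy_request`, `struct toy_queue` and the `toy_queue_*` helpers are hypothetical names, and `<sys/queue.h>` is assumed to be available (glibc and the BSDs ship it).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical queued request; only the list linkage matters here. */
struct toy_request {
    int id;
    TAILQ_ENTRY(toy_request) entry;
};

TAILQ_HEAD(toy_request_list, toy_request);

struct toy_queue {
    struct toy_request_list requests;
};

void toy_queue_init(struct toy_queue *q)
{
    assert(q != NULL);
    TAILQ_INIT(&q->requests);
}

void toy_queue_enqueue(struct toy_queue *q, struct toy_request *req)
{
    assert(q != NULL && req != NULL);
    TAILQ_INSERT_TAIL(&q->requests, req, entry);
}

/* Remove 'request' wherever it sits in the queue.
 * Returns 1 if it was found and removed, 0 if it was not queued. */
int toy_queue_dequeue(struct toy_queue *q, struct toy_request *request)
{
    struct toy_request *req;

    assert(q != NULL);
    TAILQ_FOREACH(req, &q->requests, entry) {
        if (req == request) {
            TAILQ_REMOVE(&q->requests, req, entry);
            return 1;   /* stop right after removal, req is unlinked */
        }
    }
    return 0;
}

/* Count queued requests (helper for the usage example). */
int toy_queue_length(struct toy_queue *q)
{
    struct toy_request *req;
    int n = 0;

    assert(q != NULL);
    TAILQ_FOREACH(req, &q->requests, entry) {
        n++;
    }
    return n;
}
```

Returning immediately after the removal avoids touching the unlinked element, so the plain (non-"safe") iteration macro is sufficient here.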
[Qemu-devel] [PATCH v8 3/4] block: add block timer and throttling algorithm
Note:
     1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this scenario.
     2.) When the dd command is issued in the guest, if its option bs is set to a large value such as bs=1024K, the resulting speed will be slightly bigger than the limits.

For these problems, if you have nice thought, pls let us know.:)

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 block.c |  259 ---
 block.h |    1 -
 2 files changed, 248 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index cd75183..c08fde8 100644
--- a/block.c
+++ b/block.c
@@ -30,6 +30,9 @@
 #include "qemu-objects.h"
 #include "qemu-coroutine.h"
 
+#include "qemu-timer.h"
+#include "block/blk-queue.h"
+
 #ifdef CONFIG_BSD
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -72,6 +75,13 @@ static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
                                          QEMUIOVector *iov);
 static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);
 
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+        bool is_write, double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+        double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+        bool is_write, int64_t *wait);
+
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -745,6 +755,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
         bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
 
+    /* throttling disk I/O limits */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_enable(bs);
+    }
+
     return 0;
 
 unlink_and_fail:
@@ -783,6 +798,18 @@ void bdrv_close(BlockDriverState *bs)
         if (bs->change_cb)
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
+
+    /* throttling disk I/O limits */
+    if (bs->block_queue) {
+        qemu_del_block_queue(bs->block_queue);
+        bs->block_queue = NULL;
+    }
+
+    if (bs->block_timer) {
+        qemu_del_timer(bs->block_timer);
+        qemu_free_timer(bs->block_timer);
+        bs->block_timer = NULL;
+    }
 }
 
 void bdrv_close_all(void)
@@ -2341,16 +2368,48 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
                                  BlockDriverCompletionFunc *cb, void *opaque)
 {
     BlockDriver *drv = bs->drv;
-
+    BlockDriverAIOCB *ret;
+    int64_t wait_time = -1;
+    printf("sector_num=%ld, nb_sectors=%d\n", sector_num, nb_sectors);
     trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
 
-    if (!drv)
-        return NULL;
-    if (bdrv_check_request(bs, sector_num, nb_sectors))
+    if (!drv || bdrv_check_request(bs, sector_num, nb_sectors)) {
         return NULL;
+    }
+
+    /* throttling disk read I/O */
+    if (bs->io_limits_enabled) {
+        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
+            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
+                           sector_num, qiov, nb_sectors, cb, opaque);
+            printf("wait_time=%ld\n", wait_time);
+            if (wait_time != -1) {
+                printf("reset block timer\n");
+                qemu_mod_timer(bs->block_timer,
+                               wait_time + qemu_get_clock_ns(vm_clock));
+            }
+
+            if (ret) {
+                printf("ori ret is not null\n");
+            } else {
+                printf("ori ret is null\n");
+            }
+
+            return ret;
+        }
+    }
 
-    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
+    ret =  drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
                                cb, opaque);
+    if (ret) {
+        if (bs->io_limits_enabled) {
+            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
+                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
+        }
+    }
+
+    return ret;
 }
 
 typedef struct BlockCompleteData {
@@ -2396,15 +2455,14 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
     BlockDriver *drv = bs->drv;
     BlockDriverAIOCB *ret;
     BlockCompleteData *blk_cb_data;
+    int64_t wait_time = -1;
 
     trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
 
-    if (!drv)
-        return NULL;
-    if (bs->read_only)
-        return NULL;
-    if (bdrv_check_request(bs, sector_num, nb_sectors))
+    if (!drv || bs->read_only
+        || bdrv_check_request(bs, sector_num, nb_sectors)) {
         return NULL;
+    }
 
     if (bs->dirty_bitmap) {
         blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
@@ -2413,13 +2471,32 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
         opaque = blk_cb_data;
     }
 
+
[Qemu-devel] [PATCH v8 4/4] qmp/hmp: add block_set_io_throttle
The patch introduces one new command, block_set_io_throttle. If you have a better idea for its usage syntax, pls let me know.

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 block.c         |   26 +++-
 blockdev.c      |   69 +++
 blockdev.h      |    2 +
 hmp-commands.hx |   15
 qerror.c        |    4 +++
 qerror.h        |    3 ++
 qmp-commands.hx |   52 -
 7 files changed, 168 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index c08fde8..1d3f067 100644
--- a/block.c
+++ b/block.c
@@ -1938,6 +1938,16 @@ static void bdrv_print_dict(QObject *obj, void *opaque)
                             qdict_get_bool(qdict, "ro"),
                             qdict_get_str(qdict, "drv"),
                             qdict_get_bool(qdict, "encrypted"));
+
+        monitor_printf(mon, " bps=%" PRId64 " bps_rd=%" PRId64
+                            " bps_wr=%" PRId64 " iops=%" PRId64
+                            " iops_rd=%" PRId64 " iops_wr=%" PRId64,
+                            qdict_get_int(qdict, "bps"),
+                            qdict_get_int(qdict, "bps_rd"),
+                            qdict_get_int(qdict, "bps_wr"),
+                            qdict_get_int(qdict, "iops"),
+                            qdict_get_int(qdict, "iops_rd"),
+                            qdict_get_int(qdict, "iops_wr"));
     } else {
         monitor_printf(mon, " [not inserted]");
     }
@@ -1970,10 +1980,22 @@ void bdrv_info(Monitor *mon, QObject **ret_data)
             QDict *bs_dict = qobject_to_qdict(bs_obj);
 
             obj = qobject_from_jsonf("{ 'file': %s, 'ro': %i, 'drv': %s, "
-                                     "'encrypted': %i }",
+                                     "'encrypted': %i, "
+                                     "'bps': %" PRId64 ","
+                                     "'bps_rd': %" PRId64 ","
+                                     "'bps_wr': %" PRId64 ","
+                                     "'iops': %" PRId64 ","
+                                     "'iops_rd': %" PRId64 ","
+                                     "'iops_wr': %" PRId64 "}",
                                      bs->filename, bs->read_only,
                                      bs->drv->format_name,
-                                     bdrv_is_encrypted(bs));
+                                     bdrv_is_encrypted(bs),
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL],
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_READ],
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_READ],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE]);
             if (bs->backing_file[0] != '\0') {
                 QDict *qdict = qobject_to_qdict(obj);
                 qdict_put(qdict, "backing_file",
diff --git a/blockdev.c b/blockdev.c
index 619ae9f..7f5c4df 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -747,6 +747,75 @@ int do_change_block(Monitor *mon, const char *device,
     return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
 
+/* throttling disk I/O limits */
+int do_block_set_io_throttle(Monitor *mon,
+                       const QDict *qdict, QObject **ret_data)
+{
+    const char *devname = qdict_get_str(qdict, "device");
+    uint64_t bps        = qdict_get_try_int(qdict, "bps", -1);
+    uint64_t bps_rd     = qdict_get_try_int(qdict, "bps_rd", -1);
+    uint64_t bps_wr     = qdict_get_try_int(qdict, "bps_wr", -1);
+    uint64_t iops       = qdict_get_try_int(qdict, "iops", -1);
+    uint64_t iops_rd    = qdict_get_try_int(qdict, "iops_rd", -1);
+    uint64_t iops_wr    = qdict_get_try_int(qdict, "iops_wr", -1);
+    BlockDriverState *bs;
+
+    bs = bdrv_find(devname);
+    if (!bs) {
+        qerror_report(QERR_DEVICE_NOT_FOUND, devname);
+        return -1;
+    }
+
+    if ((bps == -1) && (bps_rd == -1) && (bps_wr == -1)
+        && (iops == -1) && (iops_rd == -1) && (iops_wr == -1)) {
+        qerror_report(QERR_MISSING_PARAMETER,
+                      "bps/bps_rd/bps_wr/iops/iops_rd/iops_wr");
+        return -1;
+    }
+
+    if (((bps != -1) && ((bps_rd != -1) || (bps_wr != -1)))
+        || ((iops != -1) && ((iops_rd != -1) || (iops_wr != -1)))) {
+        qerror_report(QERR_INVALID_PARAMETER_COMBINATION);
+        return -1;
+    }
+
+    if (bps != -1) {
+        bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL] = bps;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_READ]  = 0;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE] = 0;
+    }
+
+    if ((bps_rd != -1) || (bps_wr != -1)) {
+        bs->io_limits.bps[BLOCK_IO_LIMIT_READ]   =
+           (bps_rd == -1) ? bs->io_limits.bps[BLOCK_IO_LIMIT_READ] : bps_rd;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE]   =
+           (bps_wr == -1) ? bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE] : bps_wr;
+
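
The update rules visible in do_block_set_io_throttle() use -1 as a "parameter not given" marker. A small standalone sketch of those rules for the bps values only is shown below. `struct toy_bps` and `toy_set_bps()` are hypothetical, and clearing the total when a read or write limit is given is an assumption made for the sketch, since the posted code is cut off at that point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_UNSET (-1)  /* marker for "parameter not supplied" */

/* Hypothetical holder for the three bps limits of one drive. */
struct toy_bps {
    int64_t total;
    int64_t rd;
    int64_t wr;
};

/* Apply a throttle update. Returns 0 on success, -1 if nothing was
 * supplied or if a total limit is mixed with a read/write limit.
 * On error the current limits are left untouched. */
int toy_set_bps(struct toy_bps *cur, int64_t bps, int64_t bps_rd,
                int64_t bps_wr)
{
    assert(cur != NULL);

    if (bps == TOY_UNSET && bps_rd == TOY_UNSET && bps_wr == TOY_UNSET) {
        return -1;                      /* missing parameter */
    }
    if (bps != TOY_UNSET && (bps_rd != TOY_UNSET || bps_wr != TOY_UNSET)) {
        return -1;                      /* invalid combination */
    }

    if (bps != TOY_UNSET) {
        /* A total limit replaces any per-direction limits. */
        cur->total = bps;
        cur->rd = 0;
        cur->wr = 0;
    } else {
        /* Keep whichever direction was not supplied. */
        cur->rd = (bps_rd == TOY_UNSET) ? cur->rd : bps_rd;
        cur->wr = (bps_wr == TOY_UNSET) ? cur->wr : bps_wr;
        /* Assumption: the total is cleared so the limits never conflict. */
        cur->total = 0;
    }
    return 0;
}
```

The sketch uses a signed type for the sentinel; the posted code stores the values in uint64_t and compares them against -1, which relies on the usual integer conversion.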
Re: [Qemu-devel] [PATCH -V2] iohandler: update qemu_fd_set_handler to work with null call back arg
Am 08.09.2011 12:07, schrieb Avi Kivity:
> On 09/07/2011 09:44 PM, Anthony Liguori wrote:
>> I think this is a bit more complicated than is really needed. Here's
>> what I came up with which also fixes another bug where the io channel
>> could be freed twice. I stumbled across this via a very strange failure
>> scenario. Avi, it might be worth trying this patch to see if it fixes
>> your problem too.
>
> Right now, I've got more than just one problem.
>
>> One thing that I found challenging debugging this, coroutines make
>> valgrind very unhappy. Is it possible that we could have a command line
>> switch to fall back to the thread based coroutines so to make things
>> more valgrind friendly?
>
> How is valgrind even aware of coroutines? Unless it doesn't implement
> makecontext correctly, it shouldn't even be aware of them.

The F15 valgrind complains three times that the program is switching stacks, but then it shuts up and just works as normal.

Kevin
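
For readers unfamiliar with what "switching stacks" means here, the following is a minimal sketch of a makecontext()-based coroutine in plain C. It is not QEMU's coroutine implementation: `run_coroutine_demo()`, `co_entry()` and the 64 KiB stack size are illustrative choices, and the POSIX `<ucontext.h>` API is assumed to be available.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define CO_STACK_SIZE (64 * 1024)  /* illustrative stack size */

static ucontext_t main_ctx;  /* context of the caller */
static ucontext_t co_ctx;    /* context of the coroutine */
static int co_steps;         /* how far the coroutine has progressed */

/* Coroutine body: runs on its own heap-allocated stack. */
static void co_entry(void)
{
    co_steps++;                        /* first half of the work */
    swapcontext(&co_ctx, &main_ctx);   /* yield back to the caller */
    co_steps++;                        /* second half, after re-entry */
    /* Returning here resumes uc_link, i.e. main_ctx. */
}

/* Enter the coroutine twice and report progress.
 * Returns (steps after first entry) * 10 + (steps at the end). */
int run_coroutine_demo(void)
{
    void *stack = malloc(CO_STACK_SIZE);
    assert(stack != NULL);

    co_steps = 0;
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = CO_STACK_SIZE;
    co_ctx.uc_link = &main_ctx;        /* where to go when co_entry returns */
    makecontext(&co_ctx, co_entry, 0);

    swapcontext(&main_ctx, &co_ctx);   /* enter: stack pointer jumps to 'stack' */
    int after_first = co_steps;

    swapcontext(&main_ctx, &co_ctx);   /* resume after the yield */

    free(stack);
    return after_first * 10 + co_steps;
}
```

Each swapcontext() call moves the stack pointer between the original stack and the malloc'ed one, which is the kind of jump a tool tracking stack usage can notice.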
Re: [Qemu-devel] [PATCH V8 10/14] Encrypt state blobs using AES CBC encryption
On Wed, Sep 07, 2011 at 08:16:27PM -0400, Stefan Berger wrote:
> On 09/07/2011 02:55 PM, Michael S. Tsirkin wrote:
> > On Thu, Sep 01, 2011 at 10:23:51PM -0400, Stefan Berger wrote:
> > > An additional 'layer' for reading and writing the blobs to the underlying block storage is added. This layer encrypts the blobs for writing if a key is available. Similarly it decrypts the blobs after reading.
> >
> > So a couple of further thoughts:
> > 1. Raw storage should work too, and with e.g. NFS migration will be fine, right? So I'd say it's worth supporting.
>
> NFS via shared storage, yes, but not migration via Qemu's block migration mechanism. If snapshotting was supposed to be a feature to support then that's only possible via block storage (QCoW2 in particular).

As disk has the same limitation, that sounds fine. Let the user decide whether snapshotting is needed, same as disk.

> Adding plain file support to the TPM code so it can store its 3 blobs in them adds quite a bit of complexity to the code. The command line parameter that previously pointed to a QCoW2 image file would probably have to point to a directory where files for the 3 blobs can be written into. Besides that, snapshotting would actually have to be prevented maybe through registering a (fake) file of other than QCoW2 type since the plain TPM files won't handle snapshotting correctly, either, and QEMU pretty much would have to be prevented from doing snapshotting at all. Maybe there's an API for this, but I don't know. Though why create this additional complexity? I don't mind relaxing the requirement of using a QCoW2 image and allowing for example RAW images (that then automatically prevent the snapshotting from happening) but the same code I now have would work for writing the blobs into the single file.

Right. Write all blobs into a single file at different offsets, or something.

> > 2. File backed nvram is interesting outside tpm. For example, vpd and chassis number for pci, eeprom emulation for network cards. Using a file per device might be inconvenient though. So please think of a format and API that will allow sections for use by different devices.
>
> Also here 'snapshotting' is the most 'demanding' feature of QEMU I would say. Snapshotting isn't easily supported outside of the block layer from what I understand. Once you are tied to the block layer you end up having to use images and those don't grow quite well. So other devices wanting to use those types of devices would need to know what the worst case sizes are for writing their state into -- unless an image format is created that can grow.
>
> As for the format: Ideally all devices could write into one file, right? That would at least prevent too many files besides the VM's image file from floating around which presumably makes image management easier. Following the above, you add up all the worst case sizes the individual devices may need for their blobs and create an image with that capacity. Then you need some form of a (primitive?) directory that lets you write blobs into that storage. Assuming there were well defined names for those devices one could say for example store this blob under the name 'tpm-permanent-state' and later on load it under that name. The possible size of the directory would have to be considered as well... I do something like that for the TPM where I have up to 3 such blobs that I store.
>
> The bad thing about the above is of course the need to know what the sum of all the worst case sizes is.

A typical usecase I know about has prepared vpd/eeprom content. We'll typically need a tool to get binary blobs and put that into the file image. That tool can do the necessary math. We could also integrate this into qemu-img if we like.

> So a growable image format would be quite good to have. I haven't followed the conversations much, but is that something QCoW3 would support?

I don't follow - does TPM need a growable image format? Why? Hardware typically has fixed amount of memory :)

> Crazy idea: Is there a filesystem that one could use and mount a filesystem onto (some) sectors of an image? Again, the best format right now is QCoW2 for this (due to snapshotting support) where one would have to be able to mount a filesystem onto the current snapshot's available sectors. Then at least the handling of blobs would become a lot easier. Though I doubt this would be possible without custom code and lots of development.

Hmm, libguestfs can do all kinds of smart stuff. But we don't want qemu to depend on that.

> > 3. Home-grown file formats give us enough trouble in migration. Could this use one of the variants of ASN.1? There are portable libraries to read/write that, even.
>
> I am not sure what 'this' refers to. What I am doing with the TPM is writing 3 independent blobs at certain offsets into the QCoW2 block file. A directory in the first sector holds the offsets, sizes and crc32's of these (unencrypted)
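
The directory idea mentioned above (offsets, sizes and crc32's of up to 3 unencrypted blobs, kept in the first sector) could look roughly like the following. This is only a sketch under assumptions: the struct layout, the names blob_dir, blob_dir_set() and blob_dir_check(), and the choice of the zlib-style CRC-32 are all hypothetical and are not taken from the TPM patch series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BLOBS 3 /* the thread mentions up to 3 TPM blobs */

/* Hypothetical directory entry; the field layout is an assumption. */
struct blob_dir_entry {
    uint32_t offset; /* byte offset of the blob inside the image */
    uint32_t size;   /* length of the unencrypted blob in bytes */
    uint32_t crc32;  /* CRC-32 of the unencrypted blob */
};

/* Hypothetical directory that would live in the first sector. */
struct blob_dir {
    struct blob_dir_entry entry[NUM_BLOBS];
};

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, the zlib variant). */
static uint32_t crc32_ieee(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Record where a blob lives and what its checksum is. */
static void blob_dir_set(struct blob_dir *dir, int idx, uint32_t offset,
                         const uint8_t *data, uint32_t size)
{
    assert(idx >= 0 && idx < NUM_BLOBS);
    dir->entry[idx].offset = offset;
    dir->entry[idx].size = size;
    dir->entry[idx].crc32 = crc32_ieee(data, size);
}

/* Return 1 if data read back (and decrypted) matches the entry, else 0. */
static int blob_dir_check(const struct blob_dir *dir, int idx,
                          const uint8_t *data, uint32_t size)
{
    assert(idx >= 0 && idx < NUM_BLOBS);
    return dir->entry[idx].size == size &&
           dir->entry[idx].crc32 == crc32_ieee(data, size);
}
```

Because the checksum covers the unencrypted blob, a mismatch after decryption would also be a hint that the wrong key was used, not only that the data is corrupt.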
Re: [Qemu-devel] [PATCH V8 08/14] Introduce file lock for the block layer
On Wed, Sep 07, 2011 at 08:31:45PM -0400, Stefan Berger wrote:
On 09/07/2011 02:49 PM, Michael S. Tsirkin wrote:
On Wed, Sep 07, 2011 at 12:08:22PM -0400, Stefan Berger wrote:
On 09/07/2011 11:16 AM, Michael S. Tsirkin wrote:

So it's a bug in the code then?

From what I saw, yes. Migration is not complete until the passwords had been entered. Though the requirement for a correct password wasn't there before because Qemu just couldn't know which password is correct since it doesn't know what content in a VM image is correct -- just using the wrong key gives you content but it's of course not understandable.

OK, we covered that on irc - the issue is that monitor on destination is inactive until migration is complete. Yes we need to fix that but no, it's not a tpm only problem.

I think the TPM is the first device that needs that password before the migration switch-over happens.

Yes. But we want the monitor on dest for other reasons, for example to be able to check migration status.

The reason is that the TPM emulation layer needs the password/key to read the data from the QCoW2 to be able to initialize a device BEFORE the Qemu on the source side terminates thinking that the migration went ok. Previously an OS image that was 'opened' with the wrong key/password would probably cause the OS to not be able to read the data and hopefully not destroy it by writing wrongly encrypted data into it -- QEMU wasn't able to detect whether the QCoW2 encryption key was correct or not since it has no knowledge of the organization of the data inside the image. [[You'd need some form of reference point, like a sector that when written to a hash is calculated over its data and that hash is written into a location in the image. If a wrong key is given and the sector's hash ends up being != the reference hash you could say the key is wrong.]]

Similar problems occur when you start a VM with an encrypted QCoW2 image. The monitor will prompt you for the password and then you start the VM and if the password was wrong the OS just won't be able to access the image.

Stefan

Why can't you verify the password?

I do verify the key/password in the TPM driver. If the driver cannot make sense of the contents of the QCoW2 due to wrong key I simply put the driver into failure mode. That's all I can do with encrypted QCoW2.

You can return error from init script which will make qemu exit.

I can return an error code when the front- and backend interfaces are initialized, but that happens really early and the encryption key entered through the monitor is not available at this point. I also don't get a notification about when the key was entered. In case of QCoW2 encryption (and migration) I delay initialization until very late, basically when the VM accesses the tpm tis hardware emulation layer again (needs to be done this way I think to support block migration where I cannot even access the block device early on at all).

So do it in the loadvm callback. This happens when guest is stopped on source, so no need for locks.

Two bigger cases here:

1) Encryption key passed via command line:
   - Migration with shared storage: When Qemu is initializing on the destination side I try to access the QCoW2 file. I do some basic tests to check whether a key was needed but none was given or whether some of the content could be read to confirm a valid key. This is done really early on during startup of Qemu on the destination side while or before actually the memory pages were transferred. Graceful termination was easily possible here.
   - Migration using block migration: During initialization I only see an empty QCoW2 file (created by libvirt). I terminate at this point and do another initialization later on which basically comes down to initializing upon access of the TPM TIS interface. At this point graceful termination wasn't possible anymore. There may be a possibility to do this in the loadvm callback, assuming block migration at that point has already finished, which I am not quite sure about. Though along with case 2) below this would then end up in 3 different times for initialization of the emulation layer.

2) QCoW2 encryption:
   - This maps to the last case above. Also here graceful termination wasn't possible.

As for the loadvm callback: I have a note in my code that in case of QCoW2 encryption the key is not available, yet. So I even have to defer initialization further. In this case Qemu on the source machine will have terminated.

Stefan

The point is to decrypt when you start running on dest.

When the monitor gets the key for the QCoW2 it would have to invoke the TPM driver code (callback) and return an error code if the key was found to be wrong and display an error message that libvirt could react to. Afaik none of the callback and error display logic is in place. Is that something we could add later as an improvement?

What we
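
The "reference point" idea in the double brackets above (hash a known sector when writing, store the hash, and compare after decrypting with the supplied key) can be sketched as follows. Everything here is a toy under loud assumptions: the XOR "cipher" stands in for the real AES encryption, FNV-1a stands in for a real hash, and the names toy_xor_crypt(), fnv1a32() and key_matches_reference() are hypothetical. None of it is QEMU or QCoW2 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REF_SECTOR_SIZE 64 /* hypothetical size of the reference sector */

/* 32-bit FNV-1a, used here only as a stand-in for a real hash. */
static uint32_t fnv1a32(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;

    while (n--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

/* Toy cipher: XOR every byte with the key. NOT secure, illustration only. */
static void toy_xor_crypt(uint8_t *buf, size_t n, uint8_t key)
{
    for (size_t i = 0; i < n; i++) {
        buf[i] ^= key;
    }
}

/* Decrypt a copy of the encrypted reference sector with the candidate key
 * and compare its hash with the reference hash stored in the image.
 * Returns 1 if the key looks right, 0 otherwise. */
static int key_matches_reference(const uint8_t *enc, size_t n, uint8_t key,
                                 uint32_t ref_hash)
{
    uint8_t plain[REF_SECTOR_SIZE];

    assert(n <= sizeof(plain));
    memcpy(plain, enc, n);
    toy_xor_crypt(plain, n, key);
    return fnv1a32(plain, n) == ref_hash;
}
```

With such a check available at the point where the key is entered, a wrong key could be reported immediately instead of surfacing later as unreadable data.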
Re: [Qemu-devel] [PATCH] pci: add standard bridge device
On Thu, Sep 08, 2011 at 05:58:12PM +0800, Wen Congyang wrote:
> At 09/08/2011 05:43 PM, Gerd Hoffmann Write:
> > Hi,
> >
> > > I modify the code like this, and the PCI_INTERRUPT_LINE register is set,
> > > and I can bind it to uio_pci_generic:
> > >
> > > --- a/src/pciinit.c
> > > +++ b/src/pciinit.c
> > > @@ -575,6 +575,8 @@ static int pci_bios_init_root_regions(u32 start, u32 end)
> > >
> > >      pci_bios_init_bus_bases(&busses[0]);
> > >
> > > -    pci_bios_map_device_in_bus(0 /* host bus */);
> > > +    for (bus = 0; bus <= MaxPCIBus; bus++) {
> > > +        pci_bios_map_device_in_bus(bus /* host bus */);
> >
> > No. pci_bios_map_device_in_bus goes down recursively when it finds a
> > bridge, so it should cover all devices already.
>
> Yes, pci_bios_map_device() goes down recursively.

The value seems to be wrong though, I think. It seems to simply use the
interrupt pin as array index. Instead, each bridge should map interrupts as
follows:

/* Mapping mandated by PCI-to-PCI Bridge architecture specification,
 * revision 1.2 */
/* Table 9-1: Interrupt Binding for Devices Behind a Bridge */
static int pci_bridge_dev_map_irq_fn(PCIDevice *dev, int irq_num)
{
    return (irq_num + PCI_SLOT(dev->devfn)) % PCI_NUM_PINS;
}

until we get to the host bridge.

> > > -    pci_bios_init_device_in_bus(0 /* host bus */);
> > > +        pci_bios_init_device_in_bus(bus /* host bus */);
> > > +    }
> >
> > That is correct. Can be done easier though by just not limiting device
> > initialization to a specific bus like in the attached patch. Does that
> > one work for you?
>
> I test it, and it works for me.
>
> Thanks
> Wen Congyang
>
> > cheers,
> >   Gerd
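
To illustrate the "until we get to the host bridge" part: a minimal sketch that applies the swizzle once per bridge crossed. Only the formula (irq_num + PCI_SLOT(devfn)) % PCI_NUM_PINS comes from the message above; swizzle_pin(), route_to_host_bus() and the slots[] array are hypothetical names made up for this example and are not SeaBIOS or QEMU code.

```c
#include <assert.h>
#include <stddef.h>

#define PCI_NUM_PINS 4 /* INTA..INTD, numbered 0..3 here */

/* One step of the bridge swizzle: the pin asserted by a device sitting in
 * `slot` on a bridge's secondary bus appears as this pin on the primary
 * side of that bridge. */
static int swizzle_pin(int pin, int slot)
{
    assert(pin >= 0 && pin < PCI_NUM_PINS);
    assert(slot >= 0 && slot < 32);
    return (pin + slot) % PCI_NUM_PINS;
}

/* Hypothetical helper: slots[i] is the slot number, on the secondary bus of
 * the i-th bridge crossed on the way up, of the device (or lower bridge)
 * whose interrupt is being forwarded. Returns the pin as seen on the host
 * bus; with nbridges == 0 the pin is returned unchanged. */
static int route_to_host_bus(int pin, const int *slots, size_t nbridges)
{
    for (size_t i = 0; i < nbridges; i++) {
        pin = swizzle_pin(pin, slots[i]);
    }
    return pin;
}
```

For example, a device in slot 1 behind a bridge that itself sits in slot 3 behind another bridge would have its INTA (0) routed as (0 + 1) % 4 = 1, then (1 + 3) % 4 = 0 on the host bus.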
[Qemu-devel] [PATCH] pci: Remove unused pci_reserve_capability
eepro100 was the last user. Now pci_add_capability is powerful enough.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/pci.c |    6 ------
 hw/pci.h |    2 --
 2 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 57ff7b1..63c346d 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -2028,12 +2028,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
         pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
 }
 
-/* Reserve space for capability at a known offset (to call after load). */
-void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
-{
-    memset(pdev->used + offset, 0xff, size);
-}
-
 uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
 {
     return pci_find_capability_list(pdev, cap_id, NULL);
diff --git a/hw/pci.h b/hw/pci.h
index 391217e..f2dae63 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -209,8 +209,6 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
 
 void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
 
-void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t size);
-
 uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
 
-- 
1.7.3.4
[Qemu-devel] [PATCH V2] [SPARC] Gdbstub: Fix back-trace on SPARC32
Gdb expects all registers windows to be flushed in ram, which is not the case
in Qemu. Therefore the back-trace generation doesn't work. This patch adds a
function to handle reads (and only read) in stack frames as if windows were
flushed.

Signed-off-by: Fabien Chouteau chout...@adacore.com
---

V2:
 * only handle reads in stack frames

 gdbstub.c             |   16 +++++++++++++---
 target-sparc/cpu.h    |    7 +++++++
 target-sparc/helper.c |   84 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 3b87c27..7802c5f 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -41,6 +41,15 @@
 #include "qemu_socket.h"
 #include "kvm.h"
 
+#ifndef TARGET_CPU_MEMORY_RW_DEBUG
+static inline int target_memory_rw_debug(CPUState *env, target_ulong addr,
+                                         uint8_t *buf, int len, int is_write)
+{
+    return cpu_memory_rw_debug(env, addr, buf, len, is_write);
+}
+#else
+/* target_memory_rw_debug() defined in cpu.h */
+#endif
 
 enum {
     GDB_SIGNAL_0 = 0,
@@ -2013,7 +2022,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
         if (*p == ',')
             p++;
         len = strtoull(p, NULL, 16);
-        if (cpu_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 0) != 0) {
+        if (target_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 0) != 0) {
             put_packet (s, "E14");
         } else {
             memtohex(buf, mem_buf, len);
@@ -2028,10 +2037,11 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
         if (*p == ':')
             p++;
         hextomem(mem_buf, p, len);
-        if (cpu_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 1) != 0)
+        if (target_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 1) != 0) {
             put_packet(s, "E14");
-        else
+        } else {
             put_packet(s, "OK");
+        }
         break;
     case 'p':
         /* Older gdb are really dumb, and don't use 'g' if 'p' is avaialable.
diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
index 8654f26..19de5ba 100644
--- a/target-sparc/cpu.h
+++ b/target-sparc/cpu.h
@@ -495,6 +495,13 @@ int cpu_sparc_handle_mmu_fault(CPUSPARCState *env1, target_ulong address, int rw
 target_ulong mmu_probe(CPUSPARCState *env, target_ulong address, int mmulev);
 void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUState *env);
 
+#if !defined(TARGET_SPARC64) && !defined(CONFIG_USER_ONLY)
+int target_memory_rw_debug(CPUState *env, target_ulong addr,
+                           uint8_t *buf, int len, int is_write);
+#define TARGET_CPU_MEMORY_RW_DEBUG
+#endif
+
+
 /* translate.c */
 void gen_intermediate_code_init(CPUSPARCState *env);
 
diff --git a/target-sparc/helper.c b/target-sparc/helper.c
index 1fe1f07..c80531a 100644
--- a/target-sparc/helper.c
+++ b/target-sparc/helper.c
@@ -358,6 +358,90 @@ void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUState *env)
     }
 }
 
+#if !defined(CONFIG_USER_ONLY)
+
+/* Gdb expects all registers windows to be flushed in ram. This function handles
+ * reads (and only reads) in stack frames as if windows were flushed. We assume
+ * that the sparc ABI is followed.
+ */
+int target_memory_rw_debug(CPUState *env, target_ulong addr,
+                           uint8_t *buf, int len, int is_write)
+{
+    int i;
+    int len1;
+    int cwp = env->cwp;
+
+    if (!is_write) {
+        for (i = 0; i < env->nwindows; i++) {
+            int off;
+            target_ulong fp = env->regbase[cwp * 16 + 22];
+
+            /* Assume fp == 0 means end of frame. */
+            if (fp == 0) {
+                break;
+            }
+
+            cwp = cpu_cwp_inc(env, cwp + 1);
+
+            /* Invalid window ? */
+            if (env->wim & (1 << cwp)) {
+                break;
+            }
+
+            /* According to the ABI, the stack is growing downward. */
+            if (addr + len < fp) {
+                break;
+            }
+
+            /* Not in this frame. */
+            if (addr > fp + 64) {
+                continue;
+            }
+
+            /* Handle access before this window. */
+            if (addr < fp) {
+                len1 = fp - addr;
+                if (cpu_memory_rw_debug(env, addr, buf, len1, is_write) != 0) {
+                    return -1;
+                }
+                addr += len1;
+                len -= len1;
+                buf += len1;
+            }
+
+            /* Access byte per byte to registers. Not very efficient but speed
+             * is not critical.
+             */
+            off = addr - fp;
+            len1 = 64 - off;
+
+            if (len1 > len) {
+                len1 = len;
+            }
+
+            for (; len1; len1--) {
+                int reg = cwp * 16 + 8 + (off >> 2);
+                union {
+                    uint32_t v;
+                    uint8_t c[4];
+                } u;
+                u.v = cpu_to_be32(env->regbase[reg]);
+                *buf++ = u.c[off & 3];
+
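
The byte-wise register access in the loop above can be shown in isolation. The sketch below is an assumption-laden illustration, not part of the patch: save_area_byte() is a hypothetical name, and it uses shifts instead of the patch's cpu_to_be32() plus union, which should give the same big-endian byte regardless of host endianness. It assumes, as the patch's indexing suggests, that entries 8..23 of a window's 16-register stride hold the 16 registers that make up the 64-byte save area at the frame pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Return byte `off` (0..63) of the 64-byte save area of window `cwp`, as it
 * would appear in big-endian guest memory if the window had been flushed.
 * regbase is laid out like the patch expects: 16 registers per window step,
 * with the saved registers starting at index cwp * 16 + 8. */
static uint8_t save_area_byte(const uint32_t *regbase, int cwp, unsigned off)
{
    assert(off < 64);

    uint32_t v = regbase[cwp * 16 + 8 + (off >> 2)]; /* which register */
    unsigned shift = 8 * (3 - (off & 3));            /* which byte, MSB first */

    return (uint8_t)(v >> shift);
}
```

For instance, if the second saved register of window 1 holds 0x11223344, offsets 4..7 of that window's save area read back as 0x11, 0x22, 0x33, 0x44.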
Re: [Qemu-devel] [PATCH] pci: add standard bridge device
At 09/08/2011 06:42 PM, Michael S. Tsirkin Write:
> On Thu, Sep 08, 2011 at 05:58:12PM +0800, Wen Congyang wrote:
> > At 09/08/2011 05:43 PM, Gerd Hoffmann Write:
> > > Hi,
> > >
> > > > I modify the code like this, and the PCI_INTERRUPT_LINE register is set,
> > > > and I can bind it to uio_pci_generic:
> > > >
> > > > --- a/src/pciinit.c
> > > > +++ b/src/pciinit.c
> > > > @@ -575,6 +575,8 @@ static int pci_bios_init_root_regions(u32 start, u32 end)
> > > >
> > > >      pci_bios_init_bus_bases(&busses[0]);
> > > >
> > > > -    pci_bios_map_device_in_bus(0 /* host bus */);
> > > > +    for (bus = 0; bus <= MaxPCIBus; bus++) {
> > > > +        pci_bios_map_device_in_bus(bus /* host bus */);
> > >
> > > No. pci_bios_map_device_in_bus goes down recursively when it finds a
> > > bridge, so it should cover all devices already.
> >
> > Yes, pci_bios_map_device() goes down recursively.
>
> The value seems to be wrong though, I think. It seems to simply use the
> interrupt pin as array index. Instead, each bridge should map interrupts as
> follows:
>
> /* Mapping mandated by PCI-to-PCI Bridge architecture specification,
>  * revision 1.2 */
> /* Table 9-1: Interrupt Binding for Devices Behind a Bridge */
> static int pci_bridge_dev_map_irq_fn(PCIDevice *dev, int irq_num)
> {
>     return (irq_num + PCI_SLOT(dev->devfn)) % PCI_NUM_PINS;
> }
>
> until we get to the host bridge.

I use gdb to debug, and find that this function is never called.

Thanks
Wen Congyang

> > > > -    pci_bios_init_device_in_bus(0 /* host bus */);
> > > > +        pci_bios_init_device_in_bus(bus /* host bus */);
> > > > +    }
> > >
> > > That is correct. Can be done easier though by just not limiting device
> > > initialization to a specific bus like in the attached patch. Does that
> > > one work for you?
> >
> > I test it, and it works for me.
> >
> > Thanks
> > Wen Congyang
> >
> > > cheers,
> > >   Gerd
Re: [Qemu-devel] [PATCH] pci: add standard bridge device
On Thu, Sep 08, 2011 at 07:03:10PM +0800, Wen Congyang wrote:
> At 09/08/2011 06:42 PM, Michael S. Tsirkin Write:
> > On Thu, Sep 08, 2011 at 05:58:12PM +0800, Wen Congyang wrote:
> > > At 09/08/2011 05:43 PM, Gerd Hoffmann Write:
> > > > Hi,
> > > >
> > > > > I modify the code like this, and the PCI_INTERRUPT_LINE register is set,
> > > > > and I can bind it to uio_pci_generic:
> > > > >
> > > > > --- a/src/pciinit.c
> > > > > +++ b/src/pciinit.c
> > > > > @@ -575,6 +575,8 @@ static int pci_bios_init_root_regions(u32 start, u32 end)
> > > > >
> > > > >      pci_bios_init_bus_bases(&busses[0]);
> > > > >
> > > > > -    pci_bios_map_device_in_bus(0 /* host bus */);
> > > > > +    for (bus = 0; bus <= MaxPCIBus; bus++) {
> > > > > +        pci_bios_map_device_in_bus(bus /* host bus */);
> > > >
> > > > No. pci_bios_map_device_in_bus goes down recursively when it finds a
> > > > bridge, so it should cover all devices already.
> > >
> > > Yes, pci_bios_map_device() goes down recursively.
> >
> > The value seems to be wrong though, I think. It seems to simply use the
> > interrupt pin as array index. Instead, each bridge should map interrupts as
> > follows:
> >
> > /* Mapping mandated by PCI-to-PCI Bridge architecture specification,
> >  * revision 1.2 */
> > /* Table 9-1: Interrupt Binding for Devices Behind a Bridge */
> > static int pci_bridge_dev_map_irq_fn(PCIDevice *dev, int irq_num)
> > {
> >     return (irq_num + PCI_SLOT(dev->devfn)) % PCI_NUM_PINS;
> > }
> >
> > until we get to the host bridge.
>
> I use gdb to debug, and find that this function is never called.
>
> Thanks
> Wen Congyang

No, I mean that bios must implement this logic. You don't see it called
probably because ivshmem does not cause interrupts for you.

> > > > > -    pci_bios_init_device_in_bus(0 /* host bus */);
> > > > > +        pci_bios_init_device_in_bus(bus /* host bus */);
> > > > > +    }
> > > >
> > > > That is correct. Can be done easier though by just not limiting device
> > > > initialization to a specific bus like in the attached patch. Does that
> > > > one work for you?
> > >
> > > I test it, and it works for me.
> > >
> > > Thanks
> > > Wen Congyang
> > >
> > > > cheers,
> > > >   Gerd
Re: [Qemu-devel] [PATCH v3 00/27] Block layer cleanup fixes
On 06.09.2011 18:58, Markus Armbruster wrote:
> This patch series looks bigger than it is. All the patches are small
> and hopefully easy to review.
>
> Objectives:
>
> * Push BlockDriverState members locked, tray_open, media_changed into
>   device models, where they belong. Kevin picked the patches pushing
>   media_changed from v2, so that part's gone already.
>
> * BlockDriverState member removable is a confusing mess, replace it.
>
> * Improve eject -f.
>
> Also clean up minor messes as they get in the way.
>
> It is based on Kevin's block branch.
>
> Part I: Move tray state to device models
> PATCH 01-05 IDE tray open/closed
> PATCH 06-07 SCSI tray open/closed
> PATCH 08-09 Kill BlockDriverState tray_open
> PATCH 10-11 IDE & SCSI track tray lock
> PATCH 12-14 Kill BlockDriverState locked
> PATCH 15-16 IDE & SCSI tray bug fixes
> PATCH 17    IDE migrate tray state
>
> Part II: Miscellaneous
> PATCH 18-19 Replace BlockDriverState removable
> PATCH 20    Cover tray open/closed in info block
> PATCH 21-25 Reduce unclean use of block_int.h
> PATCH 26-27 Improve eject -f
>
> Naturally, I want all parts applied. But I did my best to make
> applying only a prefix workable, too.
>
> Review invited from:
>
> * Kevin, Christoph and Amit reviewed previous versions.
>
> * Hannes ACKed the SCSI stuff in v2.
>
> * Luiz reviewed the patches that affect QMP's query-block. I renamed
>   response member ejected to tray-open since then.
>
> * Paolo commented PATCH 17 `ide/atapi: Preserve tray state on
>   migration'.
>
> * Stefano reviewed v1 of PATCH 18 (affects -drive if=xen).
>
> Testing
>
> * Linux installs from CD to empty disk, then boots fine from disk.
>
> * For both IDE and SCSI:
>   - info block reports tray state correctly
>   - Guest locking the tray stops eject (without -f) and change
>   - eject -f; change works even while tray is locked by guest
>   - Reading /dev/sr0 with tray open behaves as before: IDE closes the
>     tray and reads (matches bare metal), SCSI reports no medium
>   - Tray state is migrated correctly (tested with savevm/loadvm)
>
> * Guest still notices CD media change (IDE only, SCSI doesn't work
>   before or after my patches because GESN isn't implemented)
>
> * Migrating ide-cd to older version works when tray is closed and
>   unlocked, else fails (tested with savevm/loadvm)
>
> v3:
> * Rebased to block branch cfc606da
>   - Old PATCH 01-05,25,28-34,40 already there, drop
>   - a couple of simple conflicts in hw/scsi-disk.c
> * Drop old PATCH v2 27 scsi-disk: Preserve tray state on migration,
>   because it doesn't make much sense without working SCSI migration.
> * Drop old PATCH v2 22 ide/atapi: Avoid physical/virtual tray state
>   mismatch, because it has a bug, how to best fix it isn't obvious,
>   and it's not essential to this series. Drop related PATCH v2 20,24,
>   too. I plan to revisit them later.
> * Clean up `ide: Use a table to declare which drive kinds accept each
>   command' a bit (Blue & Kevin)
> * Hannes's advice:
>   - Rename some SCSISense variables
> * Kevin's advice:
>   - Add comments to explain MMC-5 jargon loej
>   - Make bdrv_lock_medium() parameter locked bool.
>
> v2:
> * Rebased to block branch; non-trivial conflicts:
>   - Old PATCH 01-02,06-09 already there, drop
>   - `block: Attach non-qdev devices as well':
>     - cover new pci_piix3_xen_ide_unplug()
>     - hw/nand has been qdevified, drop hunk
>     - onenand_init() changed, rewrite hunk
>   - pci_piix3_xen_ide_unplug() needs new PATCH 33.
> * Drop old PATCH 18 `scsi-disk: Reject CD-specific SCSI commands to
>   disks' because Hannes wants to do it differently, and it's not
>   essential to this series.
> * Christoph's advice:
>   - Rework `ide: Update command code definitions as per ACS-2'
>   - Add comment to `ide: Fix ATA command READ to set ATAPI signature
>     for CD-ROM'
>   - Squash `ide/atapi: Track tray open/close state' and `ide/atapi:
>     Switch from BlockDriverState's tray_open to own'
>   - Squash `ide/atapi: Track tray locked state' and `ide/atapi: Switch
>     from BlockDriverState's locked to own tray_locked'
>   - Squash `scsi-disk: Track tray locked state' and `scsi-disk: Switch
>     from BlockDriverState's locked to own tray_locked'
>   - Drop `block: Move BlockDriverAIOCB & friends from block_int.h to
>     block.h'
> * Luiz's advice:
>   - Change query-block to always include ejected for removable
>     devices. Requires moving `block: Show whether the guest ejected
>     the medium in info block', which causes a bunch of conflicts.
> * A few cosmetic improvements
>
> Markus Armbruster (27):
>   ide: Fix ATA command READ to set ATAPI signature for CD-ROM
>   ide: Use a table to declare which drive kinds accept each command
>   ide: Reject ATA commands specific to drive kinds
>   ide/atapi: Clean up misleading name in cmd_start_stop_unit()
>   ide/atapi: Track tray open/close state
>   scsi-disk: Factor out scsi_disk_emulate_start_stop()
>   scsi-disk: Track tray open/close state
>   block: Revert entanglement of
[Qemu-devel] [PATCH] qcow2: initialize metadata before inserting in cluster_allocs
The QCow2Meta structure was inserted into the list before many of its fields
were initialized. Currently this is not a problem because everything happens
under a lock, but if qcow2_alloc_clusters were to drop this lock in the
future, some issues could arise. Initializing the fields before inserting
fixes the problem.

Signed-off-by: Frediano Ziglio fredd...@gmail.com
---
 block/qcow2-cluster.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 113db8b..428b5ad 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -806,6 +806,11 @@ again:
         abort();
     }
 
+    /* save info needed for meta data update */
+    m->offset = offset;
+    m->n_start = n_start;
+    m->nb_clusters = nb_clusters;
+
     QLIST_INSERT_HEAD(&s->cluster_allocs, m, next_in_flight);
 
     /* allocate a new cluster */
@@ -816,11 +821,6 @@ again:
         goto fail;
     }
 
-    /* save info needed for meta data update */
-    m->offset = offset;
-    m->n_start = n_start;
-    m->nb_clusters = nb_clusters;
-
 out:
     ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
     if (ret < 0) {
-- 
1.7.1
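
The ordering rule behind this patch (fill in an element completely, then make it reachable from the shared list) can be shown with a generic sketch. The types and names below (struct meta, struct alloc_list, publish_meta()) are hypothetical and use a plain singly linked list, not QEMU's QLIST macros or the real QCow2Meta layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the metadata record tracked per allocation. */
struct meta {
    uint64_t offset;
    int n_start;
    int nb_clusters;
    struct meta *next;
};

/* Hypothetical stand-in for the list of in-flight allocations. */
struct alloc_list {
    struct meta *head;
};

/* Initialize every field first, and only then link the element into the
 * list. Anyone who later walks the list (for example after a lock has been
 * dropped and retaken) never sees a half-initialized element. */
static void publish_meta(struct alloc_list *list, struct meta *m,
                         uint64_t offset, int n_start, int nb_clusters)
{
    assert(list != NULL && m != NULL);

    /* Step 1: save the info needed for the later metadata update. */
    m->offset = offset;
    m->n_start = n_start;
    m->nb_clusters = nb_clusters;

    /* Step 2: insert at the head of the list. */
    m->next = list->head;
    list->head = m;
}
```

Note that this only addresses ordering within one thread of control; it makes no claim about memory barriers or true multi-threaded access.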
Re: [Qemu-devel] [PATCH] qcow2: initialize metadata before inserting in cluster_allocs
On 08.09.2011 13:38, Frediano Ziglio wrote:
> The QCow2Meta structure was inserted into the list before many of its fields
> were initialized. Currently this is not a problem because everything happens
> under a lock, but if qcow2_alloc_clusters were to drop this lock in the
> future, some issues could arise. Initializing the fields before inserting
> fixes the problem.
>
> Signed-off-by: Frediano Ziglio fredd...@gmail.com

Thanks, applied to the block branch.

Kevin
Re: [Qemu-devel] [PATCH V8 10/14] Encrypt state blobs using AES CBC encryption
On 09/08/2011 06:32 AM, Michael S. Tsirkin wrote:
> On Wed, Sep 07, 2011 at 08:16:27PM -0400, Stefan Berger wrote:
> > On 09/07/2011 02:55 PM, Michael S. Tsirkin wrote:
> > > On Thu, Sep 01, 2011 at 10:23:51PM -0400, Stefan Berger wrote:
> > > > An additional 'layer' for reading and writing the blobs to the underlying block storage is added. This layer encrypts the blobs for writing if a key is available. Similarly it decrypts the blobs after reading.
> > >
> > > So a couple of further thoughts:
> > > 1. Raw storage should work too, and with e.g. NFS migration will be fine, right? So I'd say it's worth supporting.
> >
> > NFS via shared storage, yes, but not migration via Qemu's block migration mechanism. If snapshotting was supposed to be a feature to support then that's only possible via block storage (QCoW2 in particular).
>
> As disk has the same limitation, that sounds fine. Let the user decide whether snapshotting is needed, same as disk.
>
> > Adding plain file support to the TPM code so it can store its 3 blobs in them adds quite a bit of complexity to the code. The command line parameter that previously pointed to a QCoW2 image file would probably have to point to a directory where files for the 3 blobs can be written into. Besides that, snapshotting would actually have to be prevented maybe through registering a (fake) file of other than QCoW2 type since the plain TPM files won't handle snapshotting correctly, either, and QEMU pretty much would have to be prevented from doing snapshotting at all. Maybe there's an API for this, but I don't know. Though why create this additional complexity? I don't mind relaxing the requirement of using a QCoW2 image and allowing for example RAW images (that then automatically prevent the snapshotting from happening) but the same code I now have would work for writing the blobs into the single file.
>
> Right. Write all blobs into a single file at different offsets, or something.

That's exactly what I am doing already. Just that I am doing this with Qemu's BlockStorage (bdrv) writing to sectors rather than seek()ing in files. To avoid more complexity I'd rather not introduce more code handling plain files but rely on all the image formats that qemu already supports and that give features like encryption (QCoW2 only), snapshotting (QCoW2 only) and block migration (presumably all of them). Plain files offer none of that. Devices that need to write their state to persistent storage really have to aim for doing this through Qemu's bdrv since they will otherwise be the ones killing the snapshot feature. TPM certainly doesn't want to be one of them. If the user doesn't want snapshotting to be supported since his VM image files are not QCoW2 type of files, just create a raw image file for the TPM's persistent state and bdrv will automatically prevent snapshotting. The point is that the TPM code now using the bdrv layer works with any image format already.

> > > 2. File backed nvram is interesting outside tpm. For example, vpd and chassis number for pci, eeprom emulation for network cards. Using a file per device might be inconvenient though. So please think of a format and API that will allow sections for use by different devices.
> >
> > Also here 'snapshotting' is the most 'demanding' feature of QEMU I would say. Snapshotting isn't easily supported outside of the block layer from what I understand. Once you are tied to the block layer you end up having to use images and those don't grow quite well. So other devices wanting to use those types of devices would need to know what the worst case sizes are for writing their state into -- unless an image format is created that can grow.
> >
> > As for the format: Ideally all devices could write into one file, right? That would at least prevent too many files besides the VM's image file from floating around which presumably makes image management easier. Following the above, you add up all the worst case sizes the individual devices may need for their blobs and create an image with that capacity. Then you need some form of a (primitive?) directory that lets you write blobs into that storage. Assuming there were well defined names for those devices one could say for example store this blob under the name 'tpm-permanent-state' and later on load it under that name. The possible size of the directory would have to be considered as well... I do something like that for the TPM where I have up to 3 such blobs that I store.
> >
> > The bad thing about the above is of course the need to know what the sum of all the worst case sizes is.
>
> A typical usecase I know about has prepared vpd/eeprom content. We'll typically need a tool to get binary blobs and put that into the file image. That tool can do the necessary math. We could also integrate this into qemu-img if we like.
>
> > So a growable image format would be quite good to have. I haven't followed the conversations much, but is that something QCoW3 would support?
>
> I don't follow - does TPM need a growable image format? Why?
Re: [Qemu-devel] [PATCH -V2] iohandler: update qemu_fd_set_handler to work with null call back arg
On 09/08/2011 05:07 AM, Avi Kivity wrote:
> On 09/07/2011 09:44 PM, Anthony Liguori wrote:
> > I think this is a bit more complicated than is really needed. Here's
> > what I came up with which also fixes another bug where the io channel
> > could be freed twice. I stumbled across this via a very strange
> > failure scenario.
> >
> > Avi, it might be worth trying this patch to see if it fixes your
> > problem too.
>
> Right now, I've got more than just one problem.
>
> > One thing that I found challenging debugging this, coroutines make
> > valgrind very unhappy. Is it possible that we could have a command
> > line switch to fall back to the thread based coroutines so to make
> > things more valgrind friendly?
>
> How is valgrind even aware of coroutines? Unless it doesn't implement
> makecontext correctly, it shouldn't even be aware of them.

It detects stack switching and has trouble differentiating between a
legitimate stack switch and something more nefarious. I believe the
heuristic it currently uses is the distance that RSP moves. If it moves
more than a certain threshold, it assumes that's a stack switch.

Regards,

Anthony Liguori
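
A distance-based heuristic of the kind described above might look like the sketch below. This is not Valgrind's implementation: looks_like_stack_switch() is a hypothetical name, and the threshold is simply a parameter here because the message does not state the value Valgrind uses.

```c
#include <assert.h>
#include <stdint.h>

/* Guess whether a stack pointer change is a switch to a different stack
 * rather than an ordinary push/pop or frame allocation: treat any move
 * larger than `threshold` bytes, in either direction, as a switch. */
static int looks_like_stack_switch(uintptr_t old_sp, uintptr_t new_sp,
                                   uintptr_t threshold)
{
    uintptr_t delta = old_sp > new_sp ? old_sp - new_sp : new_sp - old_sp;

    assert(threshold > 0);
    return delta > threshold;
}
```

The weakness mentioned in the message follows directly from this shape: a coroutine stack allocated close to the current stack falls under the threshold and is not recognized as a switch, while a single very large frame can exceed it and be misread as one.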
Re: [Qemu-devel] [PULL 0/3] Trivial patches for August 25 to September 2 2011
On Fri, Sep 2, 2011 at 11:12 AM, Stefan Hajnoczi stefa...@linux.vnet.ibm.com wrote:
> The following changes since commit 625f9e1f54cd78ee98ac22030da527c9a1cc9d2b:
>
>   Merge remote-tracking branch 'stefanha/trivial-patches' into staging (2011-09-01 13:57:19 -0500)
>
> are available in the git repository at:
>
>   ssh://repo.or.cz/srv/git/qemu/stefanha.git trivial-patches
>
> Boris Figovsky (1):
>       x86: fix daa opcode for al register values higher than 0xf9
>
> Brad Smith (1):
>       libcacard: use INSTALL_DATA for data
>
> Stefan Weil (1):
>       sh4: Fix potential crash in debug code
>
>  hw/sh_intc.c            |    9 +++++----
>  libcacard/Makefile      |    2 +-
>  target-i386/op_helper.c |    6 +++---
>  3 files changed, 9 insertions(+), 8 deletions(-)

Ping?

Stefan
Re: [Qemu-devel] [PATCH V8 10/14] Encrypt state blobs using AES CBC encryption
On Thu, Sep 08, 2011 at 08:11:00AM -0400, Stefan Berger wrote: On 09/08/2011 06:32 AM, Michael S. Tsirkin wrote: On Wed, Sep 07, 2011 at 08:16:27PM -0400, Stefan Berger wrote: On 09/07/2011 02:55 PM, Michael S. Tsirkin wrote: On Thu, Sep 01, 2011 at 10:23:51PM -0400, Stefan Berger wrote: An additional 'layer' for reading and writing the blobs to the underlying block storage is added. This layer encrypts the blobs for writing if a key is available. Similarly it decrypts the blobs after reading. So a couple of further thoughts: 1. Raw storage should work too, and with e.g. NFS migration will be fine, right? So I'd say it's worth supporting. NFS via shared storage, yes, but not migration via Qemu's block migration mechanism. If snapshotting was supposed to be a feature to support then that's only possible via block storage (QCoW2 in particular). As disk has the same limitation, that sounds fine. Let the user decide whether snapshoting is needed, same as disk. Adding plain file support to the TPM code so it can store its 3 blobs into adds quite a bit of complexity to the code. The command line parameter that previously pointed to QCoW2 image file would probably have to point to a directory where files for the 3 blobs can be written into. Besides that, snapshotting would actually have to be prevented maybe through registering a (fake) file of other than QCoW2 type since the plain TPM files won't handle snapshotting correctly, either, and QEMU pretty much would have to be prevented from doing snapshotting at all. Maybe there's an API for this, but I don't know. Though why create this additional complexity? I don't mind relaxing the requirement of using a QCoW2 image and allowing for example RAW images (that then automatically prevent the snapshotting from happening) but the same code I now have would work for writing the blobs into it the single file. Right. Write all blobs into a single files at different offsets, or something. That's exactly what I am doing already. 
Just that I am doing this with Qemu's BlockStorage (bdrv) writing to sectors rather than seek()ing in files. To avoid more complexity I'd rather not introduce more code handling plain files but rely on all the image formats that qemu already supports and that give features like encryption (QCoW2 only), snapshotting (QCoW2 only) and block migration (presumably all of them). Plain files offer none of that. Devices that need to write their state to persistent storage really have to aim for doing this through Qemu's bdrv since they will otherwise be the ones killing the snapshot feature. TPM certainly doesn't want to be one of them. If the user doesn't want snapshotting to be supported since his VM image files are not QCoW2 type of files, just create a raw image file for the TPM's persistent state and bdrv will automatically prevent snapshotting. The point is that the TPM code now using the bdrv layer works with any image format already. Ah, that's fine then. I had an impression there was a qcow only limitation, not sure what in code gave me that impression. 2. File backed nvram is interesting outside tpm. For example, vpd and chassis number for pci, eeprom emulation for network cards. Using a file per device might be inconvenient though. So please think of a format and API that will allow sections for use by different devices. Also here 'snapshotting' is the most 'demanding' feature of QEMU I would say. Snapshotting isn't easily supported outside of the block layer from what I understand. Once you are tied to the block layer you end up having to use images and those don't grow quite well. So other devices wanting to use that kind of storage would need to know what the worst case sizes are for writing their state into -- unless an image format is created that can grow. As for the format: Ideally all devices could write into one file, right? 
That would at least prevent too many files besides the VM's image file from floating around which presumably makes image management easier. Following the above, you add up all the worst case sizes the individual devices may need for their blobs and create an image with that capacity. Then you need some form of a (primitive?) directory that lets you write blobs into that storage. Assuming there were well defined names for those devices one could say for example store this blob under the name 'tpm-permanent-state' and later on load it under that name. The possible size of the directory would have to be considered as well... I do something like that for the TPM where I have up to 3 such blobs that I store. The bad thing about the above is of course the need to know what the sum of all the worst case sizes is. A typical use case I know about has prepared vpd/eeprom content. We'll typically need a tool to get binary blobs and put that into the file image. That tool can do the necessary math. We could
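The "primitive directory" idea discussed above could look roughly like the following. This is only a sketch under stated assumptions, not the vTPM patch's actual on-disk format: the structure names, the limits (`DIR_MAX_ENTRIES`, `DIR_NAME_LEN`) and the blob sizes used in the usage note are all hypothetical. It shows each named blob reserving its worst-case size up front, so that the total image capacity is known before the image is created.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical limits, chosen only for illustration. */
#define DIR_MAX_ENTRIES 8
#define DIR_NAME_LEN    32

/* One directory entry: a named region with a reserved worst-case size. */
struct blob_entry {
    char     name[DIR_NAME_LEN];
    uint32_t offset;     /* byte offset of the blob inside the image */
    uint32_t max_size;   /* worst-case size reserved for this blob */
    uint32_t cur_size;   /* bytes currently stored (0 = empty) */
};

/* The directory lives at the start of the image; blobs follow it. */
struct blob_dir {
    uint32_t          num_entries;
    uint32_t          next_free;   /* first unreserved byte in the image */
    struct blob_entry entries[DIR_MAX_ENTRIES];
};

static void blob_dir_init(struct blob_dir *d)
{
    memset(d, 0, sizeof(*d));
    d->next_free = (uint32_t)sizeof(*d);   /* account for the directory itself */
}

static const struct blob_entry *blob_dir_lookup(const struct blob_dir *d,
                                                const char *name)
{
    for (uint32_t i = 0; i < d->num_entries; i++) {
        if (strcmp(d->entries[i].name, name) == 0) {
            return &d->entries[i];
        }
    }
    return NULL;
}

/* Reserve worst-case space for a named blob.
 * Returns its offset, or -1 if the directory is full, the name is too
 * long, or the name is already taken. Overflow of next_free is not
 * checked in this sketch. */
static int64_t blob_dir_reserve(struct blob_dir *d, const char *name,
                                uint32_t max_size)
{
    if (d->num_entries == DIR_MAX_ENTRIES ||
        strlen(name) >= DIR_NAME_LEN ||
        blob_dir_lookup(d, name) != NULL) {
        return -1;
    }
    struct blob_entry *e = &d->entries[d->num_entries++];
    strcpy(e->name, name);
    e->offset = d->next_free;
    e->max_size = max_size;
    e->cur_size = 0;
    d->next_free += max_size;
    return e->offset;
}

/* Capacity the backing image needs: directory plus all reservations. */
static uint32_t blob_dir_image_size(const struct blob_dir *d)
{
    return d->next_free;
}
```

A tool that prepares the image would call `blob_dir_reserve()` once per device blob (for example 'tpm-permanent-state' with an assumed worst case of 4096 bytes) and then create an image of `blob_dir_image_size()` bytes, which is the "necessary math" mentioned above.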
Re: [Qemu-devel] CFQ I/O starvation problem triggered by RHEL6.0 KVM guests
On Thu, Sep 08, 2011 at 06:13:53PM +0900, Takuya Yoshikawa wrote: This is a report of strange cfq behaviour which seems to be triggered by QEMU posix aio threads. Host environment: OS: RHEL6.0 KVM/qemu-kvm (with no patch applied) IO scheduler: cfq (with the default parameters) So you are using RHEL 6.0 for both the host and the guest kernel? Can you reproduce the same issue with upstream kernels? How easily/frequently can you reproduce this with a RHEL6.0 host? On the host, we were running 3 linux guests to see if I/O from these guests would be handled fairly by host; each guest did dd write with oflag=direct. Guest virtual disk: We used a host local disk which had 3 partitions, and each guest was allocated one of these as dd write target. So our test was for checking if cfq could keep fairness for the 3 guests who shared the same disk. The result (strange starvation): Sometimes, one guest dominated cfq for more than 10sec and requests from other guests were not handled at all during that time. Below is the blktrace log which shows that a request to (8,27) in cfq2068S (*1) is not handled at all during cfq2095S and cfq2067S which hold requests to (8,26) are being handled alternately. *1) WS 104920578 + 64 Question: I guess that cfq_close_cooperator() was being called in an unusual manner. If so, do you think that cfq is responsible for keeping fairness for this kind of unusual write requests? - If two guests are doing IO to separate partitions, they should really not be very close (until and unless partitions are really small). - Even if there are close cooperators, these queues are merged and they are treated as single queue from slice point of view. So cooperating queues should be merged and get a single slice instead of starving other queues in the system. Can you upload the blktrace logs somewhere that show what happened during those 10 seconds? Note: With RHEL6.1, this problem could not be triggered. But I guess that was due to QEMU's block layer updates. 
You can try reproducing this with fio. Thanks Vivek
Re: [Qemu-devel] [PATCH 0/2] improve qemu-img conversion performance
On 08.09.2011 01:06, Yehuda Sadeh wrote: The following set of patches improves the qemu-img conversion process performance. When using a higher latency backend, small writes have a severe impact on the time it takes to do image conversion. We switch to using async writes, and we avoid splitting writes due to holes when the holes are small enough. Yehuda Sadeh (2): qemu-img: async write to block device when converting image qemu-img: don't skip writing small holes qemu-img.c | 34 +++--- 1 files changed, 27 insertions(+), 7 deletions(-) This doesn't seem to be against git master or the block tree. Please rebase. I think that commit a22f123c may obsolete your patch 2/2. Kevin
Re: [Qemu-devel] [PATCH] target-i386: Compute all flag data inside %cl != 0 test.
On Thu, 8 Sep 2011, Richard Henderson wrote: The (x << (cl - 1)) quantity is only used if CL != 0. Move the computation of that quantity nearer its use. This avoids the creation of undefined TCG operations when the constant propagation optimization proves that CL == 0, and thus CL-1 is outside the range [0-wordsize). Signed-off-by: Richard Henderson r...@twiddle.net Applied, thanks. [..snip..] -- mailto:av1...@comtv.ru
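The reasoning in this commit message can be illustrated outside TCG with plain C, where a shift count outside [0, wordsize) is likewise undefined. The following is a minimal sketch, not QEMU code: `shl_with_carry` and the fixed 32-bit operand width are assumptions made purely for illustration. The intermediate value shifted by `cl - 1` is computed only inside the `cl != 0` test, which is the same restructuring the patch describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of QEMU: 32-bit shift-left that also
 * reports the carry flag, i.e. the last bit shifted out. */
static uint32_t shl_with_carry(uint32_t x, unsigned cl, bool *carry)
{
    cl &= 31;   /* x86 masks the count to 5 bits for 32-bit operands */

    if (cl != 0) {
        /* The partial shift is computed only here, where cl - 1 is
         * guaranteed to lie in [0, 31). With cl == 0 the count would
         * wrap around to a huge value, which is undefined. */
        uint32_t partial = x << (cl - 1);
        *carry = (partial >> 31) & 1;   /* the bit about to be shifted out */
        return partial << 1;
    }

    /* cl == 0: neither the value nor the flag is modified. */
    return x;
}
```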
Re: [Qemu-devel] [PULL] File descriptor reclaim patchset for VirtFS
On 09/01/2011 12:25 PM, Aneesh Kumar K.V wrote: The following changes since commit 56a7a874e962e28522857fbf72eaefb1a07e2001: Merge remote-tracking branch 'stefanha/trivial-patches' into staging (2011-08-25 07:50:07 -0500) are available in the git repository at: git://repo.or.cz/qemu/v9fs.git for-upstream-3 Pulled. Thanks. Regards, Anthony Liguori Aneesh Kumar K.V (6): hw/9pfs: Add reference counting for fid hw/9pfs: Add file descriptor reclaim support hw/9pfs: init fid list properly hw/9pfs: Use v9fs_do_close instead of close hw/9pfs: Add directory reclaim support hw/9pfs: mark directories also as un-reclaimable on unlink hw/9pfs/codir.c| 13 +- hw/9pfs/cofile.c | 19 ++- hw/9pfs/virtio-9p-coth.h |4 +- hw/9pfs/virtio-9p-device.c |2 + hw/9pfs/virtio-9p.c| 486 +++- hw/9pfs/virtio-9p.h| 24 ++- 6 files changed, 445 insertions(+), 103 deletions(-)
Re: [Qemu-devel] [PULL] usb patch queue
On 09/02/2011 04:56 AM, Gerd Hoffmann wrote: Hi, This is the current usb patch queue with the following changes: * musb improvements (qdev windup) * fix ehci emulation for FreeBSD guests. * a bunch of usb-host fixes. * misc minor tweaks. please pull, Gerd Pulled. Thanks. Regards, Anthony Liguori The following changes since commit 625f9e1f54cd78ee98ac22030da527c9a1cc9d2b: Merge remote-tracking branch 'stefanha/trivial-patches' into staging (2011-09-01 13:57:19 -0500) are available in the git repository at: git://git.kraxel.org/qemu usb.25 Gerd Hoffmann (15): usb-host: start tracing support usb-host: reapurb error report fix usb-host: fix halted endpoints usb-host: limit open retries usb-host: fix configuration tracking. usb-host: claim port usb-host: endpoint table fixup usb-ehci: handle siTDs usb-host: constify port usb-host: parse port in /proc/bus/usb/devices scan usb: fix use after free usb-ccid: switch to USBDesc* usb-ccid: remote wakeup support usb: claim port at device initialization time. usb-host: tag as unmigratable Juha Riihimäki (1): usb-musb: Add reset function Peter Maydell (2): usb: Remove leading underscores from __musb_irq_max usb-musb: Take a DeviceState* in init function hw/tusb6010.c | 11 +- hw/usb-bus.c | 110 -- hw/usb-ccid.c | 248 +++- hw/usb-desc.h |2 +- hw/usb-ehci.c | 65 +++-- hw/usb-hub.c | 12 +-- hw/usb-musb.c | 26 +++- hw/usb-ohci.c |4 +- hw/usb-uhci.c | 11 +- hw/usb.c | 37 +++--- hw/usb.h | 11 +- trace-events | 32 usb-linux.c | 448 ++--- 13 files changed, 561 insertions(+), 456 deletions(-)
Re: [Qemu-devel] [PULL] Memory API batch 5, v2
On 09/04/2011 10:28 AM, Avi Kivity wrote: Please pull from git://github.com/avikivity/qemu.git memory/batch v2: just a rebase to make sure bisects see the rom_device fix. Pulled. Thanks. Regards, Anthony Liguori Avi Kivity (22): mips_fulong2e: convert to memory API stellaris_enet: convert to memory API sysbus: add helpers to add and delete memory regions to the system bus pci_host: convert conf index and data ports to memory API ReadWriteHandler: remove an5206: convert to memory API armv7m: convert to memory API axis_dev88: convert to memory API (RAM only) sysbus: add sysbus_add_memory_overlap() integratorcp: convert to memory API (RAM/flash only) leon3: convert to memory API cirrus: wrap memory update in a transaction piix_pci: wrap memory update in a transaction Makefile.hw: allow hw/ files to include glib headers pflash_cfi01/pflash_cfi02: convert to memory API dummy_m68k: convert to memory API lm32_boards: convert to memory API mainstone: convert to memory API mcf5208: convert to memory API milkymist-minimac2: convert to memory API milkymist-softusb: convert to memory API milkymist: convert to memory API Makefile.hw | 1 + Makefile.target | 1 - hw/an5206.c | 12 +++-- hw/arm-misc.h | 5 ++- hw/armv7m.c | 22 + hw/axis_dev88.c | 16 +++--- hw/cirrus_vga.c | 2 + hw/collie.c | 7 +-- hw/dec_pci.c | 13 +++--- hw/dummy_m68k.c | 7 ++- hw/flash.h | 13 +- hw/grackle_pci.c | 13 +++--- hw/gumstix.c | 6 +-- hw/integratorcp.c | 28 +--- hw/leon3.c | 15 --- hw/lm32_boards.c | 23 +- hw/mainstone.c | 20 + hw/mcf5208.c | 72 ++- hw/milkymist-minimac2.c | 43 +- hw/milkymist-softusb.c | 48 ++-- hw/milkymist.c | 13 +++--- hw/mips_fulong2e.c | 17 --- hw/mips_malta.c | 54 +++ hw/mips_r4k.c | 13 +++--- hw/musicpal.c | 8 ++-- hw/omap_sx1.c | 8 ++-- hw/pci_host.c | 86 - hw/pci_host.h | 16 +++ hw/petalogix_ml605_mmu.c | 5 +- hw/petalogix_s3adsp1800_mmu.c | 5 +- hw/pflash_cfi01.c | 78 ++--- hw/pflash_cfi02.c | 95 + hw/piix_pci.c | 13 +- hw/ppc405_boards.c | 49 - hw/ppc4xx_pci.c | 10 +++-- 
hw/ppce500_pci.c | 21 - hw/prep_pci.c | 12 - hw/r2d.c | 2 +- hw/stellaris.c | 5 ++- hw/stellaris_enet.c | 29 +--- hw/sysbus.c | 29 hw/sysbus.h | 8 +++ hw/unin_pci.c | 82 ++-- hw/virtex_ml507.c | 4 +- hw/z2.c | 2 +- rwhandler.c | 87 - rwhandler.h | 27 47 files changed, 551 insertions(+), 594 deletions(-) delete mode 100644 rwhandler.c delete mode 100644 rwhandler.h
Re: [Qemu-devel] [PULL 00/31] Block patches
On 09/06/2011 10:39 AM, Kevin Wolf wrote: The following changes since commit f69539b14bdba7a5cd22e1f4bed439b476b17286: apb_pci: convert PCI space to memory API (2011-09-04 09:28:04 +) are available in the git repository at: git://repo.or.cz/qemu/kevin.git for-anthony Pulled. Thanks. Regards, Anthony Liguori Fam Zheng (8): VMDK: enable twoGbMaxExtentFlat VMDK: add twoGbMaxExtentSparse support VMDK: separate vmdk_read_extent/vmdk_write_extent VMDK: Opening compressed extent. VMDK: read/write compressed extent VMDK: creating streamOptimized subformat VMDK: bugfix, open Haiku vmdk image VMDK: bugfix, opening vSphere 4 exported image Frediano Ziglio (1): linux aio: some comments Kevin Wolf (3): qcow2: Properly initialise QcowL2Meta qcow2: Fix error cases to run depedent requests async: Allow nested qemu_bh_poll calls Markus Armbruster (14): block: Attach non-qdev devices as well block: Generalize change_cb() to BlockDevOps block: Split change_cb() into change_media_cb(), resize_cb() ide: Update command code definitions as per ACS-2 Table B.2 ide: Clean up case label indentation in ide_exec_cmd() ide: Give vmstate structs internal linkage where possible block/raw: Fix to forward method bdrv_media_changed() block: Leave tracking media change to device models fdc: Make media change detection more robust block: Clean up bdrv_flush_all() savevm: Include writable devices with removable media xen: Clean up pci_piix3_xen_ide_unplug()'s test for not a CD spitz tosa: Simplify drive is suitable for microdrive test block: Declare qemu_blockalign() in block.h, not block_int.h Paolo Bonzini (5): scsi: execute SYNCHRONIZE_CACHE asynchronously scsi: fix accounting of writes scsi: refine constants for READ CAPACITY 16 scsi: fill in additional sense length correctly scsi: improve MODE SENSE emulation async.c | 24 +++- block.c | 104 --- block.h | 28 +++- block/qcow2.c| 12 +- block/raw-posix.c|4 + block/raw.c |7 + block/vmdk.c | 346 +++--- block_int.h | 14 +-- blockdev.c |5 +- hw/fdc.c | 
46 hw/ide/core.c| 35 +++--- hw/ide/internal.h| 171 + hw/ide/piix.c|7 +- hw/pflash_cfi01.c|1 + hw/pflash_cfi02.c|1 + hw/qdev-properties.c |6 +- hw/scsi-bus.c|6 +- hw/scsi-defs.h |8 +- hw/scsi-disk.c | 157 +-- hw/sd.c | 14 +- hw/spitz.c | 10 +- hw/tosa.c| 10 +- hw/usb-msd.c |2 +- hw/virtio-blk.c | 12 +- hw/xen_disk.c|1 + linux-aio.c |1 + savevm.c |4 +- 27 files changed, 652 insertions(+), 384 deletions(-)
Re: [Qemu-devel] [PULL] spice patch queue
On 09/07/2011 02:38 AM, Gerd Hoffmann wrote: Hi, Here is the spice patch queue with a collection of bugfixes. A workaround for the much discussed spice-calls-us-from-wrong-thread issue is included because it turned out to be not *that* easily fixable in spice so it will probably take some time. Also a spice server fix wouldn't cover already released spice versions. cheers, Gerd Pulled. Thanks. Regards, Anthony Liguori The following changes since commit 344eecf6995f4a0ad1d887cec922f6806f91a3f8: mips: Support the MT TCStatus IXMT irq disable flag (2011-09-06 11:09:39 +0200) are available in the git repository at: git://anongit.freedesktop.org/spice/qemu spice.v42 Gerd Hoffmann (1): spice: workaround a spice server bug. Peter Maydell (2): spice-qemu-char.c: Use correct printf format char for ssize_t hw/qxl: Fix format string errors Yonit Halperin (3): qxl: send interrupt after migration in case ram->int_pending != 0, RHBZ #732949 qxl: s/qxl_set_irq/qxl_update_irq/ spice: set qxl->ssd.running=true before telling spice to start, RHBZ #733993 hw/qxl-logger.c|2 +- hw/qxl.c | 26 -- spice-qemu-char.c |2 +- ui/spice-core.c| 25 - ui/spice-display.c |3 ++- 5 files changed, 44 insertions(+), 14 deletions(-)
Re: [Qemu-devel] [PULL 0/2]: QMP queue
On 09/06/2011 11:44 AM, Luiz Capitulino wrote: Anthony, The following patches have been sent to the list and look good to me. I've also tested them. Pulled. Thanks. Regards, Anthony Liguori The changes (since 344eecf6995f4a0ad1d887cec922f6806f91a3f8) are available in the following repository: git://repo.or.cz/qemu/qmp-unstable.git queue/qmp Jan Kiszka (1): Fix qjson test of solidus encoding Luiz Capitulino (1): configure: Copy test data to build directory check-qjson.c |3 ++- configure |2 +- 2 files changed, 3 insertions(+), 2 deletions(-)
Re: [Qemu-devel] [PULL 0/3] Trivial patches for August 25 to September 2 2011
On 09/02/2011 05:12 AM, Stefan Hajnoczi wrote: The following changes since commit 625f9e1f54cd78ee98ac22030da527c9a1cc9d2b: Merge remote-tracking branch 'stefanha/trivial-patches' into staging (2011-09-01 13:57:19 -0500) are available in the git repository at: ssh://repo.or.cz/srv/git/qemu/stefanha.git trivial-patches Pulled. Thanks. Regards, Anthony Liguori Boris Figovsky (1): x86: fix daa opcode for al register values higher than 0xf9 Brad Smith (1): libcacard: use INSTALL_DATA for data Stefan Weil (1): sh4: Fix potential crash in debug code hw/sh_intc.c|9 + libcacard/Makefile |2 +- target-i386/op_helper.c |6 +++--- 3 files changed, 9 insertions(+), 8 deletions(-)
Re: [Qemu-devel] [PATCH] s390: remove boot image detection to fix boot with newer kernels
On 07/09/11 14:34, Alexander Graf wrote: No, in theory it could change arbitrarily. The vmlinux case is unfortunate, but in the end it's a shoot-yourself-in-the-foot situation; we just have to make sure that we allow a graceful exit from a looping qemu guest. That's not the answer I'd like to hear. Can't we put a magic constant somewhere for newer kernel versions that would identify those and keep the basr 13,0 hack around for older ones? I will wire up the elf loader for s390, to make vmlinux simply work. That should make the test no longer needed. There are some small problems left, e.g. the elf loader loads the kernel as bios afterwards and therefore overwrites the kernel parameter line. Will fix this in the next days. Christian
[Qemu-devel] [PATCH] Flexible array should be last in struct mbuf
The flexible array member should remain the last member in the structure, as the code relies on this assumption.

Signed-off-by: Elie Richa ri...@adacore.com
---
 slirp/mbuf.h |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/slirp/mbuf.h b/slirp/mbuf.h
index 55170e5..e13ff71 100644
--- a/slirp/mbuf.h
+++ b/slirp/mbuf.h
@@ -82,12 +82,13 @@ struct m_hdr {
 struct mbuf {
     struct m_hdr m_hdr;
     Slirp *slirp;
+    bool arp_requested;
+    uint64_t expiration_date;
     union M_dat {
         char    m_dat_[1]; /* ANSI don't like 0 sized arrays */
         char    *m_ext_;
-    } M_dat;
-    bool arp_requested;
-    uint64_t expiration_date;
+    } M_dat; /* This is a flexible array member. It should always remain
+                the last member of the structure */
 };
 
 #define m_next m_hdr.mh_next
-- 
1.7.4.1
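Why the trailing array has to stay last can be seen in a small standalone example. This is a sketch under stated assumptions, not the slirp code: `struct packet` and `packet_new` are hypothetical names, and the C99 `data[]` syntax is used in place of the older `m_dat_[1]` idiom that struct mbuf uses. The payload is allocated directly behind the structure, so any member placed after the array would be overwritten by payload bytes.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure for illustration; it mirrors the layout idea of
 * struct mbuf but is not the slirp definition. */
struct packet {
    size_t len;      /* number of payload bytes that follow */
    int    flags;    /* fixed-size fields must come before the payload */
    char   data[];   /* C99 flexible array member: must be the last member */
};

/* Allocate the header and the payload in one block. The payload starts at
 * data[0] and extends past sizeof(struct packet), which is only safe
 * because no other member follows the array. */
static struct packet *packet_new(const void *payload, size_t len)
{
    struct packet *p = malloc(sizeof(*p) + len);
    if (p == NULL) {
        return NULL;
    }
    p->len = len;
    p->flags = 0;
    memcpy(p->data, payload, len);
    return p;
}
```

With the pre-C99 `[1]` idiom the compiler does not enforce the ordering, which is presumably why the patch adds a comment to the structure.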
[Qemu-devel] [PATCH 00/12] nbd improvements
I find nbd quite useful to test migration, but it is limited: it can only do synchronous operation, it is not safe because it does not support flush, and it has no discard either. qemu-nbd is also limited to 1MB requests, and the nbd block driver does not take this into account. Luckily, flush/FUA support is being worked out by upstream, and discard can also be added with the same framework (patches 1 to 6). Asynchronous support is also very similar to what sheepdog is already doing (patches 7 to 12). Paolo Bonzini (12): nbd: support feature negotiation nbd: sync API definitions with upstream nbd: support NBD_SET_FLAGS ioctl nbd: add support for NBD_CMD_FLUSH nbd: add support for NBD_CMD_FLAG_FUA nbd: support NBD_CMD_TRIM in the server sheepdog: add coroutine_fn markers add socket_set_block sheepdog: move coroutine send/recv function to generic code block: add bdrv_co_flush support nbd: switch to asynchronous operation nbd: split requests block.c | 53 ++--- block/nbd.c | 225 block/sheepdog.c | 235 +++--- block_int.h |1 + cutils.c | 108 + nbd.c| 80 +-- nbd.h| 20 - oslib-posix.c|7 ++ oslib-win32.c|6 ++ qemu-common.h|3 + qemu-coroutine.c | 71 qemu-coroutine.h | 26 ++ qemu-nbd.c | 13 ++-- qemu_socket.h|1 + 14 files changed, 580 insertions(+), 269 deletions(-) -- 1.7.6
[Qemu-devel] [PATCH 06/12] nbd: support NBD_CMD_TRIM in the server
Map it to bdrv_discard. The server can now expose NBD_FLAG_SEND_TRIM.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/nbd.c |   31 +++++++++++++++++++++++++++++++
 nbd.c       |    9 ++++++++-
 2 files changed, 39 insertions(+), 1 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 5a7812c..964caa8 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -275,6 +275,36 @@ static int nbd_flush(BlockDriverState *bs)
     return 0;
 }
 
+static int nbd_discard(BlockDriverState *bs, int64_t sector_num,
+                       int nb_sectors)
+{
+    BDRVNBDState *s = bs->opaque;
+    struct nbd_request request;
+    struct nbd_reply reply;
+
+    if (!(s->nbdflags & NBD_FLAG_SEND_TRIM)) {
+        return 0;
+    }
+    request.type = NBD_CMD_TRIM;
+    request.handle = (uint64_t)(intptr_t)bs;
+    request.from = sector_num * 512;;
+    request.len = nb_sectors * 512;
+
+    if (nbd_send_request(s->sock, &request) == -1)
+        return -errno;
+
+    if (nbd_receive_reply(s->sock, &reply) == -1)
+        return -errno;
+
+    if (reply.error !=0)
+        return -reply.error;
+
+    if (reply.handle != request.handle)
+        return -EIO;
+
+    return 0;
+}
+
 static void nbd_close(BlockDriverState *bs)
 {
     BDRVNBDState *s = bs->opaque;
@@ -299,6 +329,7 @@ static BlockDriver bdrv_nbd = {
     .bdrv_write         = nbd_write,
     .bdrv_close         = nbd_close,
     .bdrv_flush         = nbd_flush,
+    .bdrv_discard       = nbd_discard,
     .bdrv_getlength     = nbd_getlength,
     .protocol_name      = "nbd",
 };
diff --git a/nbd.c b/nbd.c
index b65fb4a..f089904 100644
--- a/nbd.c
+++ b/nbd.c
@@ -194,7 +194,7 @@ int nbd_negotiate(int csock, off_t size, uint32_t flags)
     cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL);
     cpu_to_be64w((uint64_t*)(buf + 16), size);
     cpu_to_be32w((uint32_t*)(buf + 24),
-                 flags | NBD_FLAG_HAS_FLAGS |
+                 flags | NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
                  NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
     memset(buf + 28, 0, 124);
 
@@ -703,6 +703,13 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
         if (nbd_send_reply(csock, &reply) == -1)
             return -1;
         break;
+    case NBD_CMD_TRIM:
+        TRACE("Request type is TRIM");
+        bdrv_discard(bs, (request.from + dev_offset) / 512,
+                     request.len / 512);
+        if (nbd_send_reply(csock, &reply) == -1)
+            return -1;
+        break;
     default:
         LOG("invalid request type (%u) received", request.type);
         errno = EINVAL;
-- 
1.7.6
[Qemu-devel] [PATCH 12/12] nbd: split requests
qemu-nbd has a limit of slightly less than 1M per request. Work around this in the nbd block driver.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/nbd.c |   52 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 5a75263..468a517 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -213,8 +213,9 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
     return result;
 }
 
-static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
-                        int nb_sectors, QEMUIOVector *qiov)
+static int nbd_co_readv_1(BlockDriverState *bs, int64_t sector_num,
+                          int nb_sectors, QEMUIOVector *qiov,
+                          int offset)
 {
     BDRVNBDState *s = bs->opaque;
     struct nbd_request request;
@@ -241,7 +242,7 @@ static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
         reply.error = EIO;
         goto done;
     }
-    if (qemu_co_recvv(s->sock, qiov->iov, request.len, 0) != request.len) {
+    if (qemu_co_recvv(s->sock, qiov->iov, request.len, offset) != request.len) {
         reply.error = EIO;
     }
 
@@ -251,8 +252,9 @@ done:
 
 }
 
-static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
-                         int nb_sectors, QEMUIOVector *qiov)
+static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,
+                           int nb_sectors, QEMUIOVector *qiov,
+                           int offset)
 {
     BDRVNBDState *s = bs->opaque;
     struct nbd_request request;
@@ -273,7 +275,7 @@ static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
         reply.error = errno;
         goto done;
     }
-    ret = qemu_co_sendv(s->sock, qiov->iov, request.len, 0);
+    ret = qemu_co_sendv(s->sock, qiov->iov, request.len, offset);
     if (ret != request.len) {
         reply.error = EIO;
         goto done;
@@ -291,6 +293,44 @@ done:
     return -reply.error;
 }
 
+/* qemu-nbd has a limit of slightly less than 1M per request. For safety,
+ * transfer at most 512K per request. */
+#define NBD_MAX_SECTORS 1024
+
+static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
+                        int nb_sectors, QEMUIOVector *qiov)
+{
+    int offset = 0;
+    int ret;
+    while (nb_sectors > NBD_MAX_SECTORS) {
+        ret = nbd_co_readv_1(bs, sector_num, NBD_MAX_SECTORS, qiov, offset);
+        if (ret < 0) {
+            return ret;
+        }
+        offset += NBD_MAX_SECTORS * 512;
+        sector_num += NBD_MAX_SECTORS;
+        nb_sectors -= NBD_MAX_SECTORS;
+    }
+    return nbd_co_readv_1(bs, sector_num, nb_sectors, qiov, offset);
+}
+
+static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
+                         int nb_sectors, QEMUIOVector *qiov)
+{
+    int offset = 0;
+    int ret;
+    while (nb_sectors > NBD_MAX_SECTORS) {
+        ret = nbd_co_writev_1(bs, sector_num, NBD_MAX_SECTORS, qiov, offset);
+        if (ret < 0) {
+            return ret;
+        }
+        offset += NBD_MAX_SECTORS * 512;
+        sector_num += NBD_MAX_SECTORS;
+        nb_sectors -= NBD_MAX_SECTORS;
+    }
+    return nbd_co_writev_1(bs, sector_num, nb_sectors, qiov, offset);
+}
+
 static int nbd_co_flush(BlockDriverState *bs)
 {
     BDRVNBDState *s = bs->opaque;
-- 
1.7.6
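The splitting loop in this patch can be exercised in isolation. The following is a minimal sketch assuming a generic callback in place of nbd_co_readv_1/nbd_co_writev_1; `split_request`, `chunk_fn`, `struct chunk_log` and `count_chunk` are hypothetical names that do not exist in QEMU. It shows how a large request is cut into chunks of at most 1024 sectors while the byte offset into the I/O vector advances by 512K per chunk.

```c
#include <stdint.h>

#define SECTOR_SIZE 512
#define MAX_SECTORS 1024   /* 512K per request, the same limit as the patch */

/* Hypothetical callback type standing in for the per-chunk transfer. */
typedef int (*chunk_fn)(int64_t sector_num, int nb_sectors, int offset,
                        void *opaque);

/* Issue full-sized chunks while more than MAX_SECTORS remain, then one
 * final chunk with the remainder. Stops at the first failing chunk. */
static int split_request(int64_t sector_num, int nb_sectors,
                         chunk_fn fn, void *opaque)
{
    int offset = 0;   /* byte offset into the caller's buffer */

    while (nb_sectors > MAX_SECTORS) {
        int ret = fn(sector_num, MAX_SECTORS, offset, opaque);
        if (ret < 0) {
            return ret;
        }
        offset += MAX_SECTORS * SECTOR_SIZE;
        sector_num += MAX_SECTORS;
        nb_sectors -= MAX_SECTORS;
    }
    return fn(sector_num, nb_sectors, offset, opaque);
}

/* Example callback that only records what it was asked to transfer. */
struct chunk_log {
    int     calls;
    int64_t sectors;
    int     last_offset;
};

static int count_chunk(int64_t sector_num, int nb_sectors, int offset,
                       void *opaque)
{
    struct chunk_log *log = opaque;
    (void)sector_num;
    log->calls++;
    log->sectors += nb_sectors;
    log->last_offset = offset;
    return 0;
}
```

A request of 2500 sectors, for instance, is issued as chunks of 1024, 1024 and 452 sectors, while a request of exactly 1024 sectors goes out as a single chunk because the loop condition is a strict comparison.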
[Qemu-devel] [PATCH 11/12] nbd: switch to asynchronous operation
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/nbd.c |  167 ++
 nbd.c       |    8 +++
 2 files changed, 117 insertions(+), 58 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 964caa8..5a75263 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -52,6 +52,9 @@ typedef struct BDRVNBDState {
     size_t blocksize;
     char *export_name; /* An NBD server may export several devices */
 
+    CoMutex mutex;
+    Coroutine *coroutine;
+
     /* If it begins with '/', this is a UNIX domain socket. Otherwise,
      * it's a string of the form <hostname|ip4|\[ip6\]>:port
      */
@@ -104,6 +107,37 @@ out:
     return err;
 }
 
+static void nbd_coroutine_start(BDRVNBDState *s)
+{
+    qemu_co_mutex_lock(&s->mutex);
+    s->coroutine = qemu_coroutine_self();
+}
+
+static void nbd_coroutine_enter(void *opaque)
+{
+    BDRVNBDState *s = opaque;
+    qemu_coroutine_enter(s->coroutine, NULL);
+}
+
+static int nbd_co_send_request(BDRVNBDState *s, struct nbd_request *request)
+{
+    qemu_aio_set_fd_handler(s->sock, NULL, nbd_coroutine_enter, NULL, NULL, s);
+    return nbd_send_request(s->sock, request);
+}
+
+static int nbd_co_receive_reply(BDRVNBDState *s, struct nbd_reply *reply)
+{
+    qemu_aio_set_fd_handler(s->sock, nbd_coroutine_enter, NULL, NULL, NULL, s);
+    return nbd_receive_reply(s->sock, reply);
+}
+
+static void nbd_coroutine_end(BDRVNBDState *s)
+{
+    qemu_aio_set_fd_handler(s->sock, NULL, NULL, NULL, NULL, s);
+    s->coroutine = NULL;
+    qemu_co_mutex_unlock(&s->mutex);
+}
+
 static int nbd_establish_connection(BlockDriverState *bs)
 {
     BDRVNBDState *s = bs->opaque;
@@ -163,6 +197,8 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
     BDRVNBDState *s = bs->opaque;
     int result;
 
+    qemu_co_mutex_init(&s->mutex);
+
     /* Pop the config into our state object. Exit if invalid. */
     result = nbd_config(s, filename, flags);
     if (result != 0) {
@@ -177,8 +213,8 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
     return result;
 }
 
-static int nbd_read(BlockDriverState *bs, int64_t sector_num,
-                    uint8_t *buf, int nb_sectors)
+static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
+                        int nb_sectors, QEMUIOVector *qiov)
 {
     BDRVNBDState *s = bs->opaque;
     struct nbd_request request;
@@ -189,30 +225,39 @@ static int nbd_read(BlockDriverState *bs, int64_t sector_num,
     request.from = sector_num * 512;;
     request.len = nb_sectors * 512;
 
-    if (nbd_send_request(s->sock, &request) == -1)
-        return -errno;
-
-    if (nbd_receive_reply(s->sock, &reply) == -1)
-        return -errno;
-
-    if (reply.error !=0)
-        return -reply.error;
-
-    if (reply.handle != request.handle)
-        return -EIO;
+    nbd_coroutine_start(s);
+    if (nbd_co_send_request(s, &request) == -1) {
+        reply.error = errno;
+        goto done;
+    }
+    if (nbd_co_receive_reply(s, &reply) == -1) {
+        reply.error = errno;
+        goto done;
+    }
+    if (reply.error != 0) {
+        goto done;
+    }
+    if (reply.handle != request.handle) {
+        reply.error = EIO;
+        goto done;
+    }
+    if (qemu_co_recvv(s->sock, qiov->iov, request.len, 0) != request.len) {
+        reply.error = EIO;
+    }
 
-    if (nbd_wr_sync(s->sock, buf, request.len, 1) != request.len)
-        return -EIO;
+done:
+    nbd_coroutine_end(s);
+    return -reply.error;
 
-    return 0;
 }
 
-static int nbd_write(BlockDriverState *bs, int64_t sector_num,
-                     const uint8_t *buf, int nb_sectors)
+static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
+                         int nb_sectors, QEMUIOVector *qiov)
 {
     BDRVNBDState *s = bs->opaque;
     struct nbd_request request;
     struct nbd_reply reply;
+    int ret;
 
     request.type = NBD_CMD_WRITE;
     if (!bdrv_enable_write_cache(bs) && (s->nbdflags & NBD_FLAG_SEND_FUA)) {
@@ -223,25 +268,30 @@ static int nbd_write(BlockDriverState *bs, int64_t sector_num,
     request.from = sector_num * 512;;
     request.len = nb_sectors * 512;
 
-    if (nbd_send_request(s->sock, &request) == -1)
-        return -errno;
-
-    if (nbd_wr_sync(s->sock, (uint8_t*)buf, request.len, 0) != request.len)
-        return -EIO;
-
-    if (nbd_receive_reply(s->sock, &reply) == -1)
-        return -errno;
-
-    if (reply.error !=0)
-        return -reply.error;
-
-    if (reply.handle != request.handle)
-        return -EIO;
+    nbd_coroutine_start(s);
+    if (nbd_co_send_request(s, &request) == -1) {
+        reply.error = errno;
+        goto done;
+    }
+    ret = qemu_co_sendv(s->sock, qiov->iov, request.len, 0);
+    if (ret != request.len) {
+        reply.error = EIO;
+        goto done;
+    }
+    if (nbd_co_receive_reply(s, &reply) == -1) {
+        reply.error = errno;
+        goto done;
+    }
+    if (reply.handle != request.handle) {
+        reply.error =
Re: [Qemu-devel] [PATCH V8 10/14] Encrypt state blobs using AES CBC encryption
On 09/08/2011 09:16 AM, Michael S. Tsirkin wrote: On Thu, Sep 08, 2011 at 08:11:00AM -0400, Stefan Berger wrote: On 09/08/2011 06:32 AM, Michael S. Tsirkin wrote: On Wed, Sep 07, 2011 at 08:16:27PM -0400, Stefan Berger wrote: On 09/07/2011 02:55 PM, Michael S. Tsirkin wrote: On Thu, Sep 01, 2011 at 10:23:51PM -0400, Stefan Berger wrote: An additional 'layer' for reading and writing the blobs to the underlying block storage is added. This layer encrypts the blobs for writing if a key is available. Similarly it decrypts the blobs after reading. So a couple of further thoughts: 1. Raw storage should work too, and with e.g. NFS migration will be fine, right? So I'd say it's worth supporting. NFS via shared storage, yes, but not migration via Qemu's block migration mechanism. If snapshotting was supposed to be a feature to support then that's only possible via block storage (QCoW2 in particular). As disk has the same limitation, that sounds fine. Let the user decide whether snapshotting is needed, same as disk. Adding plain file support to the TPM code so it can store its 3 blobs in plain files adds quite a bit of complexity to the code. The command line parameter that previously pointed to a QCoW2 image file would probably have to point to a directory where files for the 3 blobs can be written. Besides that, snapshotting would actually have to be prevented, maybe through registering a (fake) file of other than QCoW2 type, since the plain TPM files won't handle snapshotting correctly either, and QEMU pretty much would have to be prevented from doing snapshotting at all. Maybe there's an API for this, but I don't know. Though why create this additional complexity? I don't mind relaxing the requirement of using a QCoW2 image and allowing for example RAW images (that then automatically prevent the snapshotting from happening) but the same code I now have would work for writing the blobs into the single file. Right. 
Write all blobs into a single file at different offsets, or something. That's exactly what I am doing already. Just that I am doing this with Qemu's BlockStorage (bdrv) writing to sectors rather than seek()ing in files. To avoid more complexity I'd rather not introduce more code handling plain files but rely on all the image formats that qemu already supports and that give features like encryption (QCoW2 only), snapshotting (QCoW2 only) and block migration (presumably all of them). Plain files offer none of that. Devices that need to write their state to persistent storage really have to aim for doing this through Qemu's bdrv since they will otherwise be the ones killing the snapshot feature. TPM certainly doesn't want to be one of them. If the user doesn't want snapshotting to be supported since his VM image files are not QCoW2 type of files, just create a raw image file for the TPM's persistent state and bdrv will automatically prevent snapshotting. The point is that the TPM code now using the bdrv layer works with any image format already. Ah, that's fine then. I had an impression there was a qcow only limitation, not sure what in code gave me that impression. Hm, currently I force the image to be a QCoW2. bdrv_get_format(bs, buf, sizeof(buf)); if (strcmp(buf, "qcow2")) { fprintf(stderr, "vTPM backing store must be of type qcow2\n"); goto err_exit; } I can remove this and we should be fine. 2. File backed nvram is interesting outside tpm. For example, vpd and chassis number for pci, eeprom emulation for network cards. Using a file per device might be inconvenient though. So please think of a format and API that will allow sections for use by different devices. Also here 'snapshotting' is the most 'demanding' feature of QEMU I would say. Snapshotting isn't easily supported outside of the block layer from what I understand. Once you are tied to the block layer you end up having to use images and those don't grow quite well. 
So other devices wanting to use those type of devices would need to know what the worst case sizes are for writing their state into -- unless an image format is created that can grow. As for the format: Ideally all devices could write into one file, right? That would at least prevent too many files besides the VM's image file from floating around which presumably makes image management easier. Following the above, you add up all the worst case sizes the individual devices may need for their blobs and create an image with that capacity. Then you need some form of a (primitive?) directory that lets you write blobs into that storage. Assuming there were well defined names for those devices one could say for example store this blobs under the name 'tpm-permanent-state' and later on load it under that name. The possible size of the directory would have to be considered as well... I do something like that for the TPM where I have up to 3 such blobs that I store. The bad thing about the above is of course the need to know
[Qemu-devel] [PATCH 09/12] sheepdog: move coroutine send/recv function to generic code
Outside coroutines, avoid busy waiting on EAGAIN by temporarily
making the socket blocking.

The API of qemu_recvv/qemu_sendv is slightly different from
do_readv/do_writev because they do not handle coroutines. They
return the number of bytes written before encountering an
EAGAIN. The specificity of yielding on EAGAIN is entirely in
qemu-coroutine.c.

Cc: MORITA Kazutaka <morita.kazut...@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 block/sheepdog.c |  221 +
 cutils.c         |  108 ++
 qemu-common.h    |    3 +
 qemu-coroutine.c |   71 +
 qemu-coroutine.h |   26 +++
 5 files changed, 229 insertions(+), 200 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index af696a5..188a8d8 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -443,129 +443,6 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,
     return acb;
 }
 
-#ifdef _WIN32
-
-struct msghdr {
-    struct iovec *msg_iov;
-    size_t        msg_iovlen;
-};
-
-static ssize_t sendmsg(int s, const struct msghdr *msg, int flags)
-{
-    size_t size = 0;
-    char *buf, *p;
-    int i, ret;
-
-    /* count the msg size */
-    for (i = 0; i < msg->msg_iovlen; i++) {
-        size += msg->msg_iov[i].iov_len;
-    }
-    buf = g_malloc(size);
-
-    p = buf;
-    for (i = 0; i < msg->msg_iovlen; i++) {
-        memcpy(p, msg->msg_iov[i].iov_base, msg->msg_iov[i].iov_len);
-        p += msg->msg_iov[i].iov_len;
-    }
-
-    ret = send(s, buf, size, flags);
-
-    g_free(buf);
-    return ret;
-}
-
-static ssize_t recvmsg(int s, struct msghdr *msg, int flags)
-{
-    size_t size = 0;
-    char *buf, *p;
-    int i, ret;
-
-    /* count the msg size */
-    for (i = 0; i < msg->msg_iovlen; i++) {
-        size += msg->msg_iov[i].iov_len;
-    }
-    buf = g_malloc(size);
-
-    ret = qemu_recv(s, buf, size, flags);
-    if (ret < 0) {
-        goto out;
-    }
-
-    p = buf;
-    for (i = 0; i < msg->msg_iovlen; i++) {
-        memcpy(msg->msg_iov[i].iov_base, p, msg->msg_iov[i].iov_len);
-        p += msg->msg_iov[i].iov_len;
-    }
-out:
-    g_free(buf);
-    return ret;
-}
-
-#endif
-
-/*
- * Send/recv data with iovec buffers
- *
- * This function send/recv data from/to the iovec buffer directly.
- * The first `offset' bytes in the iovec buffer are skipped and next
- * `len' bytes are used.
- *
- * For example,
- *
- *   do_send_recv(sockfd, iov, len, offset, 1);
- *
- * is equals to
- *
- *   char *buf = malloc(size);
- *   iov_to_buf(iov, iovcnt, buf, offset, size);
- *   send(sockfd, buf, size, 0);
- *   free(buf);
- */
-static int do_send_recv(int sockfd, struct iovec *iov, int len, int offset,
-                        int write)
-{
-    struct msghdr msg;
-    int ret, diff;
-
-    memset(&msg, 0, sizeof(msg));
-    msg.msg_iov = iov;
-    msg.msg_iovlen = 1;
-
-    len += offset;
-
-    while (iov->iov_len < len) {
-        len -= iov->iov_len;
-
-        iov++;
-        msg.msg_iovlen++;
-    }
-
-    diff = iov->iov_len - len;
-    iov->iov_len -= diff;
-
-    while (msg.msg_iov->iov_len <= offset) {
-        offset -= msg.msg_iov->iov_len;
-
-        msg.msg_iov++;
-        msg.msg_iovlen--;
-    }
-
-    msg.msg_iov->iov_base = (char *) msg.msg_iov->iov_base + offset;
-    msg.msg_iov->iov_len -= offset;
-
-    if (write) {
-        ret = sendmsg(sockfd, &msg, 0);
-    } else {
-        ret = recvmsg(sockfd, &msg, 0);
-    }
-
-    msg.msg_iov->iov_base = (char *) msg.msg_iov->iov_base - offset;
-    msg.msg_iov->iov_len += offset;
-
-    iov->iov_len += diff;
-    return ret;
-}
-
 static int connect_to_sdog(const char *addr, const char *port)
 {
     char hbuf[NI_MAXHOST], sbuf[NI_MAXSERV];
@@ -618,65 +495,6 @@ success:
     return fd;
 }
 
-static int do_readv_writev(int sockfd, struct iovec *iov, int len,
-                           int iov_offset, int write)
-{
-    int ret;
-again:
-    ret = do_send_recv(sockfd, iov, len, iov_offset, write);
-    if (ret < 0) {
-        if (errno == EINTR) {
-            goto again;
-        }
-        if (errno == EAGAIN) {
-            if (qemu_in_coroutine()) {
-                qemu_coroutine_yield();
-            }
-            goto again;
-        }
-        error_report("failed to recv a rsp, %s", strerror(errno));
-        return 1;
-    }
-
-    iov_offset += ret;
-    len -= ret;
-    if (len) {
-        goto again;
-    }
-
-    return 0;
-}
-
-static int do_readv(int sockfd, struct iovec *iov, int len, int iov_offset)
-{
-    return do_readv_writev(sockfd, iov, len, iov_offset, 0);
-}
-
-static int do_writev(int sockfd, struct iovec *iov, int len, int iov_offset)
-{
-    return do_readv_writev(sockfd, iov, len, iov_offset, 1);
-}
-
-static int do_read_write(int sockfd, void *buf, int len, int write)
-{
-    struct iovec iov;
-
-    iov.iov_base = buf;
-    iov.iov_len = len;
-
-    return do_readv_writev(sockfd, &iov, len, 0, write);
-}
-
-static int do_read(int
[Qemu-devel] [PATCH 10/12] block: add bdrv_co_flush support
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 block.c     |   53 ++---
 block_int.h |    1 +
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index 43742b7..3f32f75 100644
--- a/block.c
+++ b/block.c
@@ -64,6 +64,9 @@ static BlockDriverAIOCB *bdrv_co_aio_readv_em(BlockDriverState *bs,
 static BlockDriverAIOCB *bdrv_co_aio_writev_em(BlockDriverState *bs,
         int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
         BlockDriverCompletionFunc *cb, void *opaque);
+static BlockDriverAIOCB *bdrv_co_aio_flush_em(BlockDriverState *bs,
+                                              BlockDriverCompletionFunc *cb,
+                                              void *opaque);
 static int coroutine_fn bdrv_co_readv_em(BlockDriverState *bs,
                                          int64_t sector_num, int nb_sectors,
                                          QEMUIOVector *iov);
@@ -204,8 +207,18 @@ void bdrv_register(BlockDriver *bdrv)
         }
     }
 
-    if (!bdrv->bdrv_aio_flush)
-        bdrv->bdrv_aio_flush = bdrv_aio_flush_em;
+    if (bdrv->bdrv_aio_flush && !bdrv->bdrv_co_flush) {
+        /* Emulate coroutines by AIO */
+        bdrv->bdrv_co_flush = bdrv_co_flush_em;
+    }
+    if (!bdrv->bdrv_aio_flush) {
+        /* Emulate AIO by either coroutines or sync */
+        if (bdrv->bdrv_co_flush) {
+            bdrv->bdrv_aio_flush = bdrv_co_aio_flush_em;
+        } else {
+            bdrv->bdrv_aio_flush = bdrv_aio_flush_em;
+        }
+    }
 
     QLIST_INSERT_HEAD(&bdrv_drivers, bdrv, list);
 }
@@ -980,11 +993,6 @@ static inline bool bdrv_has_async_rw(BlockDriver *drv)
         || drv->bdrv_aio_readv != bdrv_aio_readv_em;
 }
 
-static inline bool bdrv_has_async_flush(BlockDriver *drv)
-{
-    return drv->bdrv_aio_flush != bdrv_aio_flush_em;
-}
-
 /* return < 0 if error. See bdrv_write() for the return codes */
 int bdrv_read(BlockDriverState *bs, int64_t sector_num,
               uint8_t *buf, int nb_sectors)
@@ -1713,8 +1721,8 @@ int bdrv_flush(BlockDriverState *bs)
         return 0;
     }
 
-    if (bs->drv && bdrv_has_async_flush(bs->drv) && qemu_in_coroutine()) {
-        return bdrv_co_flush_em(bs);
+    if (bs->drv && bs->drv->bdrv_co_flush && qemu_in_coroutine()) {
+        return bs->drv->bdrv_co_flush(bs);
     }
 
     if (bs->drv && bs->drv->bdrv_flush) {
@@ -2729,7 +2737,7 @@ static AIOPool bdrv_em_co_aio_pool = {
     .cancel             = bdrv_aio_co_cancel_em,
 };
 
-static void bdrv_co_rw_bh(void *opaque)
+static void bdrv_co_em_bh(void *opaque)
 {
     BlockDriverAIOCBCoroutine *acb = opaque;
 
@@ -2751,7 +2759,7 @@ static void coroutine_fn bdrv_co_rw(void *opaque)
             acb->req.nb_sectors, acb->req.qiov);
     }
 
-    acb->bh = qemu_bh_new(bdrv_co_rw_bh, acb);
+    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
     qemu_bh_schedule(acb->bh);
 }
 
@@ -2794,6 +2802,29 @@ static BlockDriverAIOCB *bdrv_co_aio_writev_em(BlockDriverState *bs,
                                  true);
 }
 
+static void coroutine_fn bdrv_co_flush(void *opaque)
+{
+    BlockDriverAIOCBCoroutine *acb = opaque;
+    BlockDriverState *bs = acb->common.bs;
+
+    acb->req.error = bs->drv->bdrv_co_flush(bs);
+    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
+    qemu_bh_schedule(acb->bh);
+}
+
+static BlockDriverAIOCB *bdrv_co_aio_flush_em(BlockDriverState *bs,
+                                              BlockDriverCompletionFunc *cb,
+                                              void *opaque)
+{
+    Coroutine *co;
+    BlockDriverAIOCBCoroutine *acb;
+
+    acb = qemu_aio_get(&bdrv_em_co_aio_pool, bs, cb, opaque);
+    co = qemu_coroutine_create(bdrv_co_flush);
+    qemu_coroutine_enter(co, acb);
+
+    return &acb->common;
+}
 static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
         BlockDriverCompletionFunc *cb, void *opaque)
 {
diff --git a/block_int.h b/block_int.h
index 8a72b80..b0cd5ea 100644
--- a/block_int.h
+++ b/block_int.h
@@ -83,6 +83,7 @@ struct BlockDriver {
         int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
     int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+    int coroutine_fn (*bdrv_co_flush)(BlockDriverState *bs);
 
     int (*bdrv_aio_multiwrite)(BlockDriverState *bs, BlockRequest *reqs,
         int num_reqs);
-- 
1.7.6
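
The bdrv_register() hunk in the patch above decides, per driver, which flush entry points stay native and which get an emulation wrapper. As a rough standalone illustration of that decision table only, here is a minimal sketch. MiniDriver, mini_register() and the string tags are hypothetical stand-ins invented for this example; they are not QEMU types or functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BlockDriver. The two bools record which flush
 * callbacks the driver supplies itself; the two strings are filled in by
 * mini_register() and name the implementation that would be used. */
typedef struct MiniDriver {
    bool has_aio_flush;    /* models a native .bdrv_aio_flush */
    bool has_co_flush;     /* models a native .bdrv_co_flush */
    const char *aio_impl;
    const char *co_impl;
} MiniDriver;

static void mini_register(MiniDriver *d)
{
    d->aio_impl = d->has_aio_flush ? "native" : NULL;
    d->co_impl = d->has_co_flush ? "native" : NULL;

    /* Same order as the patch: first emulate coroutines by AIO ... */
    if (d->aio_impl && !d->co_impl) {
        d->co_impl = "emulated-by-aio";
    }
    /* ... then emulate AIO by either coroutines or the sync path. */
    if (!d->aio_impl) {
        d->aio_impl = d->co_impl ? "emulated-by-coroutine"
                                 : "emulated-by-sync";
    }
}
```

Note that a driver with neither callback ends up with no coroutine flush at all in this sketch, which mirrors why bdrv_flush() in the patch tests bs->drv->bdrv_co_flush before calling it.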
[Qemu-devel] [PATCH 07/12] sheepdog: add coroutine_fn markers
This makes the following patch easier to review.

Cc: MORITA Kazutaka <morita.kazut...@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 block/sheepdog.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index c1f6e07..af696a5 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -396,7 +396,7 @@ static inline int free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
     return !QLIST_EMPTY(&acb->aioreq_head);
 }
 
-static void sd_finish_aiocb(SheepdogAIOCB *acb)
+static void coroutine_fn sd_finish_aiocb(SheepdogAIOCB *acb)
 {
     if (!acb->canceled) {
         qemu_coroutine_enter(acb->coroutine, NULL);
@@ -735,7 +735,7 @@ out:
     return ret;
 }
 
-static int add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
                            struct iovec *iov, int niov, int create,
                            enum AIOCBState aiocb_type);
 
@@ -743,7 +743,7 @@ static int add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
  * This function searchs pending requests to the object `oid', and
  * sends them.
  */
-static void send_pending_req(BDRVSheepdogState *s, uint64_t oid, uint32_t id)
+static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid, uint32_t id)
 {
     AIOReq *aio_req, *next;
     SheepdogAIOCB *acb;
@@ -777,7 +777,7 @@ static void send_pending_req(BDRVSheepdogState *s, uint64_t oid, uint32_t id)
  * This function is registered as a fd handler, and called from the
  * main loop when s->fd is ready for reading responses.
  */
-static void aio_read_response(void *opaque)
+static void coroutine_fn aio_read_response(void *opaque)
 {
     SheepdogObjRsp rsp;
     BDRVSheepdogState *s = opaque;
@@ -1064,7 +1064,7 @@ out:
     return ret;
 }
 
-static int add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
                            struct iovec *iov, int niov, int create,
                            enum AIOCBState aiocb_type)
 {
@@ -1517,7 +1517,7 @@ static int sd_truncate(BlockDriverState *bs, int64_t offset)
  * update metadata, this sends a write request to the vdi object.
  * Otherwise, this switches back to sd_co_readv/writev.
  */
-static void sd_write_done(SheepdogAIOCB *acb)
+static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
 {
     int ret;
     BDRVSheepdogState *s = acb->common.bs->opaque;
@@ -1615,7 +1615,7 @@ out:
  * Returns 1 when we need to wait a response, 0 when there is no sent
  * request and -errno in error cases.
  */
-static int sd_co_rw_vector(void *p)
+static int coroutine_fn sd_co_rw_vector(void *p)
 {
     SheepdogAIOCB *acb = p;
     int ret = 0;
-- 
1.7.6
[Qemu-devel] [PATCH 04/12] nbd: add support for NBD_CMD_FLUSH
Note for the brace police: the style in this commit and the following
is consistent with the rest of the file. It is then fixed together with
the introduction of coroutines.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 block/nbd.c |   31 +++
 nbd.c       |   14 +-
 2 files changed, 44 insertions(+), 1 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index ffc57a9..4a195dc 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -237,6 +237,36 @@ static int nbd_write(BlockDriverState *bs, int64_t sector_num,
     return 0;
 }
 
+static int nbd_flush(BlockDriverState *bs)
+{
+    BDRVNBDState *s = bs->opaque;
+    struct nbd_request request;
+    struct nbd_reply reply;
+
+    if (!(s->nbdflags & NBD_FLAG_SEND_FLUSH)) {
+        return 0;
+    }
+
+    request.type = NBD_CMD_FLUSH;
+    request.handle = (uint64_t)(intptr_t)bs;
+    request.from = 0;
+    request.len = 0;
+
+    if (nbd_send_request(s->sock, &request) == -1)
+        return -errno;
+
+    if (nbd_receive_reply(s->sock, &reply) == -1)
+        return -errno;
+
+    if (reply.error !=0)
+        return -reply.error;
+
+    if (reply.handle != request.handle)
+        return -EIO;
+
+    return 0;
+}
+
 static void nbd_close(BlockDriverState *bs)
 {
     BDRVNBDState *s = bs->opaque;
@@ -260,6 +290,7 @@ static BlockDriver bdrv_nbd = {
     .bdrv_read          = nbd_read,
     .bdrv_write         = nbd_write,
     .bdrv_close         = nbd_close,
+    .bdrv_flush         = nbd_flush,
     .bdrv_getlength     = nbd_getlength,
     .protocol_name      = "nbd",
 };
diff --git a/nbd.c b/nbd.c
index 30cd78f..4dbbc62 100644
--- a/nbd.c
+++ b/nbd.c
@@ -193,7 +193,8 @@ int nbd_negotiate(int csock, off_t size, uint32_t flags)
     memcpy(buf, "NBDMAGIC", 8);
     cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL);
     cpu_to_be64w((uint64_t*)(buf + 16), size);
-    cpu_to_be32w((uint32_t*)(buf + 24), flags | NBD_FLAG_HAS_FLAGS);
+    cpu_to_be32w((uint32_t*)(buf + 24),
+                 flags | NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH);
     memset(buf + 28, 0, 124);
 
     if (write_sync(csock, buf, sizeof(buf)) != sizeof(buf)) {
@@ -682,6 +683,18 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
         TRACE("Request type is DISCONNECT");
         errno = 0;
         return 1;
+    case NBD_CMD_FLUSH:
+        TRACE("Request type is FLUSH");
+
+        if (bdrv_flush(bs) == -1) {
+            LOG("flush failed");
+            errno = EINVAL;
+            return -1;
+        }
+
+        if (nbd_send_reply(csock, &reply) == -1)
+            return -1;
+        break;
     default:
         LOG("invalid request type (%u) received", request.type);
         errno = EINVAL;
-- 
1.7.6
[Qemu-devel] [PATCH 05/12] nbd: add support for NBD_CMD_FLAG_FUA
The server can use it to issue a flush automatically after a
write. The client can also use it to mimic a write-through
cache.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 block/nbd.c |    8 
 nbd.c       |   13 +++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 4a195dc..5a7812c 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -215,6 +215,10 @@ static int nbd_write(BlockDriverState *bs, int64_t sector_num,
     struct nbd_reply reply;
 
     request.type = NBD_CMD_WRITE;
+    if (!bdrv_enable_write_cache(bs) && (s->nbdflags & NBD_FLAG_SEND_FUA)) {
+        request.type |= NBD_CMD_FLAG_FUA;
+    }
+
     request.handle = (uint64_t)(intptr_t)bs;
     request.from = sector_num * 512;;
     request.len = nb_sectors * 512;
@@ -248,6 +252,10 @@ static int nbd_flush(BlockDriverState *bs)
     }
 
     request.type = NBD_CMD_FLUSH;
+    if (s->nbdflags & NBD_FLAG_SEND_FUA) {
+        request.type |= NBD_CMD_FLAG_FUA;
+    }
+
     request.handle = (uint64_t)(intptr_t)bs;
     request.from = 0;
     request.len = 0;
diff --git a/nbd.c b/nbd.c
index 4dbbc62..b65fb4a 100644
--- a/nbd.c
+++ b/nbd.c
@@ -194,7 +194,8 @@ int nbd_negotiate(int csock, off_t size, uint32_t flags)
     cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL);
     cpu_to_be64w((uint64_t*)(buf + 16), size);
     cpu_to_be32w((uint32_t*)(buf + 24),
-                 flags | NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH);
+                 flags | NBD_FLAG_HAS_FLAGS |
+                 NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
     memset(buf + 28, 0, 124);
 
     if (write_sync(csock, buf, sizeof(buf)) != sizeof(buf)) {
@@ -614,7 +615,7 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
     reply.handle = request.handle;
     reply.error = 0;
 
-    switch (request.type) {
+    switch (request.type & NBD_CMD_MASK_COMMAND) {
     case NBD_CMD_READ:
         TRACE("Request type is READ");
 
@@ -674,6 +675,14 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
             }
 
             *offset += request.len;
+
+            if (request.type & NBD_CMD_FLAG_FUA) {
+                if (bdrv_flush(bs) == -1) {
+                    LOG("flush failed");
+                    errno = EINVAL;
+                    return -1;
+                }
+            }
         }
 
         if (nbd_send_reply(csock, &reply) == -1)
-- 
1.7.6
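
The patch above packs a per-request flag into the same 32-bit type field as the command, so the server has to mask before switching on the command. A minimal sketch of that encoding follows. The numeric values of NBD_CMD_MASK_COMMAND and NBD_CMD_FLAG_FUA are assumptions made for this example (the patch uses the names but the quoted hunks do not show their definitions), and the three helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: command in the low 16 bits, flags above it.
 * These values are not shown in the quoted patch. */
#define NBD_CMD_MASK_COMMAND 0x0000ffffu
#define NBD_CMD_FLAG_FUA     (1u << 16)

enum { NBD_CMD_READ = 0, NBD_CMD_WRITE = 1 };

/* Server side: strip the flags to recover the command to dispatch on. */
static uint32_t nbd_request_command(uint32_t type)
{
    return type & NBD_CMD_MASK_COMMAND;
}

/* Server side: does this request ask for a flush after the write? */
static bool nbd_request_wants_fua(uint32_t type)
{
    return (type & NBD_CMD_FLAG_FUA) != 0;
}

/* Client side: build the type field for a write. FUA is only requested
 * when the write cache is off and the server advertised FUA support. */
static uint32_t nbd_write_type(bool write_cache_enabled, bool server_has_fua)
{
    uint32_t type = NBD_CMD_WRITE;

    if (!write_cache_enabled && server_has_fua) {
        type |= NBD_CMD_FLAG_FUA;
    }
    return type;
}
```

Without the mask, a write carrying the FUA bit would fall through to the "invalid request type" default case of the switch, which is why the patch changes the switch expression in the same commit.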
[Qemu-devel] [PATCH 03/12] nbd: support NBD_SET_FLAGS ioctl
The nbd kernel module cannot enable DISCARD requests unless it is
informed about it. The flags field in the header is used for this,
and this patch adds support for it.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 nbd.c |    8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/nbd.c b/nbd.c
index 9ed2239..30cd78f 100644
--- a/nbd.c
+++ b/nbd.c
@@ -377,6 +377,14 @@ int nbd_init(int fd, int csock, uint32_t flags, off_t size, size_t blocksize)
         }
     }
 
+    if (ioctl(fd, NBD_SET_FLAGS, flags) < 0
+        && errno != ENOTTY) {
+        int serrno = errno;
+        LOG("Failed setting flags");
+        errno = serrno;
+        return -1;
+    }
+
     TRACE("Clearing NBD socket");
 
     if (ioctl(fd, NBD_CLEAR_SOCK) == -1) {
-- 
1.7.6
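
Two small idioms carry the patch above: an optional ioctl is allowed to fail with ENOTTY (presumably because a kernel that does not know the request reports it that way), and errno is saved around the logging call so the caller still sees the original error. A minimal sketch of both, with hypothetical helper names that are not part of QEMU:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Decide whether a failed *optional* ioctl should abort setup.
 * ret is the ioctl() return value, err the errno captured right after it.
 * ENOTTY is treated as "request not supported here", not as a failure. */
static bool optional_ioctl_failed(int ret, int err)
{
    return ret < 0 && err != ENOTTY;
}

/* Log a failure without losing errno: the logging call itself may
 * overwrite errno, so save it first and restore it afterwards. */
static int fail_with_errno(const char *msg)
{
    int saved = errno;

    fprintf(stderr, "%s\n", msg);
    errno = saved;
    return -1;
}
```

A caller would use them as `if (optional_ioctl_failed(ret, errno)) return fail_with_errno("Failed setting flags");`, which has the same shape as the hunk in the patch.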
[Qemu-devel] [PATCH 01/12] nbd: support feature negotiation
nbd supports writing flags in bytes 24...27 of the header,
and uses that for the read-only flag. Add support for it
in qemu-nbd.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 block/nbd.c |    4 ++--
 nbd.c       |   32 +---
 nbd.h       |    9 ++---
 qemu-nbd.c  |   13 ++---
 4 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 55cb2fd..ffc57a9 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -47,6 +47,7 @@
 
 typedef struct BDRVNBDState {
     int sock;
+    uint32_t nbdflags;
     off_t size;
     size_t blocksize;
     char *export_name; /* An NBD server may export several devices */
@@ -110,7 +111,6 @@ static int nbd_establish_connection(BlockDriverState *bs)
     int ret;
     off_t size;
     size_t blocksize;
-    uint32_t nbdflags;
 
     if (s->host_spec[0] == '/') {
         sock = unix_socket_outgoing(s->host_spec);
@@ -125,7 +125,7 @@ static int nbd_establish_connection(BlockDriverState *bs)
     }
 
     /* NBD handshake */
-    ret = nbd_receive_negotiate(sock, s->export_name, &nbdflags, &size,
+    ret = nbd_receive_negotiate(sock, s->export_name, &s->nbdflags, &size,
                                 &blocksize);
     if (ret == -1) {
         logout("Failed to negotiate with the NBD server\n");
diff --git a/nbd.c b/nbd.c
index e7a585d..07a8e53 100644
--- a/nbd.c
+++ b/nbd.c
@@ -29,6 +29,10 @@
 #include <ctype.h>
 #include <inttypes.h>
 
+#ifdef __linux__
+#include <linux/fs.h>
+#endif
+
 #include "qemu_socket.h"
 
 //#define DEBUG_NBD
@@ -171,7 +175,7 @@ int unix_socket_outgoing(const char *path)
                   Request (type == 2)
 */
 
-int nbd_negotiate(int csock, off_t size)
+int nbd_negotiate(int csock, off_t size, uint32_t flags)
 {
     char buf[8 + 8 + 8 + 128];
 
@@ -179,14 +183,16 @@ int nbd_negotiate(int csock, off_t size)
         [ 0 ..   7]   passwd   ("NBDMAGIC")
         [ 8 ..  15]   magic    (0x00420281861253)
         [16 ..  23]   size
-        [24 .. 151]   reserved (0)
+        [24 ..  27]   flags
+        [28 .. 151]   reserved (0)
      */
 
     TRACE("Beginning negotiation.");
     memcpy(buf, "NBDMAGIC", 8);
     cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL);
     cpu_to_be64w((uint64_t*)(buf + 16), size);
-    memset(buf + 24, 0, 128);
+    cpu_to_be32w((uint32_t*)(buf + 24), flags | NBD_FLAG_HAS_FLAGS);
+    memset(buf + 28, 0, 124);
 
     if (write_sync(csock, buf, sizeof(buf)) != sizeof(buf)) {
         LOG("write failed");
@@ -336,8 +342,8 @@ int nbd_receive_negotiate(int csock, const char *name, uint32_t *flags,
     return 0;
 }
 
-#ifndef _WIN32
-int nbd_init(int fd, int csock, off_t size, size_t blocksize)
+#ifdef __linux__
+int nbd_init(int fd, int csock, uint32_t flags, off_t size, size_t blocksize)
 {
     TRACE("Setting block size to %lu", (unsigned long)blocksize);
 
@@ -357,6 +363,18 @@ int nbd_init(int fd, int csock, off_t size, size_t blocksize)
         return -1;
     }
 
+    if (flags & NBD_FLAG_READ_ONLY) {
+        int read_only = 1;
+        TRACE("Setting readonly attribute");
+
+        if (ioctl(fd, BLKROSET, (unsigned long) &read_only) < 0) {
+            int serrno = errno;
+            LOG("Failed setting read-only attribute");
+            errno = serrno;
+            return -1;
+        }
+    }
+
     TRACE("Clearing NBD socket");
 
     if (ioctl(fd, NBD_CLEAR_SOCK) == -1) {
@@ -547,7 +565,7 @@ static int nbd_send_reply(int csock, struct nbd_reply *reply)
 }
 
 int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
-             off_t *offset, bool readonly, uint8_t *data, int data_size)
+             off_t *offset, uint32_t nbdflags, uint8_t *data, int data_size)
 {
     struct nbd_request request;
     struct nbd_reply reply;
@@ -631,7 +649,7 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
             return -1;
         }
 
-        if (readonly) {
+        if (nbdflags & NBD_FLAG_READ_ONLY) {
             TRACE("Server is read-only, return error");
             reply.error = 1;
         } else {
diff --git a/nbd.h b/nbd.h
index 96f77fe..938a021 100644
--- a/nbd.h
+++ b/nbd.h
@@ -39,6 +39,9 @@ struct nbd_reply {
     uint64_t handle;
 } QEMU_PACKED;
 
+#define NBD_FLAG_HAS_FLAGS      (1 << 0)        /* Flags are there */
+#define NBD_FLAG_READ_ONLY      (1 << 1)        /* Device is read-only */
+
 enum {
     NBD_CMD_READ = 0,
     NBD_CMD_WRITE = 1,
@@ -55,14 +58,14 @@ int tcp_socket_incoming_spec(const char *address_and_port);
 int unix_socket_outgoing(const char *path);
 int unix_socket_incoming(const char *path);
 
-int nbd_negotiate(int csock, off_t size);
+int nbd_negotiate(int csock, off_t size, uint32_t flags);
 int nbd_receive_negotiate(int csock, const char *name, uint32_t *flags,
                           off_t *size, size_t *blocksize);
-int nbd_init(int fd, int csock, off_t size, size_t blocksize);
+int nbd_init(int fd, int csock, uint32_t flags, off_t size, size_t blocksize);
 int
[Qemu-devel] [PATCH 08/12] add socket_set_block
Cc: MORITA Kazutaka <morita.kazut...@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 oslib-posix.c |    7 +++
 oslib-win32.c |    6 ++
 qemu_socket.h |    1 +
 3 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/oslib-posix.c b/oslib-posix.c
index 196099c..e13e6d4 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -91,6 +91,13 @@ void qemu_vfree(void *ptr)
     free(ptr);
 }
 
+void socket_set_block(int fd)
+{
+    int f;
+    f = fcntl(fd, F_GETFL);
+    fcntl(fd, F_SETFL, f & ~O_NONBLOCK);
+}
+
 void socket_set_nonblock(int fd)
 {
     int f;
diff --git a/oslib-win32.c b/oslib-win32.c
index 5f0759f..5e3de7d 100644
--- a/oslib-win32.c
+++ b/oslib-win32.c
@@ -73,6 +73,12 @@ void qemu_vfree(void *ptr)
     VirtualFree(ptr, 0, MEM_RELEASE);
 }
 
+void socket_set_block(int fd)
+{
+    unsigned long opt = 0;
+    ioctlsocket(fd, FIONBIO, &opt);
+}
+
 void socket_set_nonblock(int fd)
 {
     unsigned long opt = 1;
diff --git a/qemu_socket.h b/qemu_socket.h
index 180e4db..9e32fac 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -35,6 +35,7 @@ int inet_aton(const char *cp, struct in_addr *ia);
 /* misc helpers */
 int qemu_socket(int domain, int type, int protocol);
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
+void socket_set_block(int fd);
 void socket_set_nonblock(int fd);
 int send_all(int fd, const void *buf, int len1);
 
-- 
1.7.6
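
The POSIX half of the patch above toggles O_NONBLOCK with a read-modify-write of the file status flags and ignores fcntl() failures. For readers who want to experiment with the same idiom outside QEMU, here is a minimal sketch with error checking added. The fd_* names are hypothetical and chosen for this example; QEMU's helpers in the patch return void.

```c
#include <fcntl.h>

/* Clear O_NONBLOCK on fd. Returns 0 on success, -1 if fcntl() fails. */
static int fd_set_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1 ? -1 : 0;
}

/* Set O_NONBLOCK on fd. Returns 0 on success, -1 if fcntl() fails. */
static int fd_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Returns 1 if fd is non-blocking, 0 if blocking, -1 if fcntl() fails. */
static int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    return flags == -1 ? -1 : (flags & O_NONBLOCK) != 0;
}
```

Reading the flags first matters because F_SETFL replaces the whole set of settable status flags; writing only O_NONBLOCK or 0 would silently drop others such as O_APPEND.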
[Qemu-devel] [PATCH] Fix include statements for qemu-common.h
* qemu-common.h is not a system include file, so it should be included
  with "" instead of <>. Otherwise incremental builds might fail
  because only local include files are checked for changes.

* linux-user/syscall.c included the file twice.

Cc: Kevin Wolf <kw...@redhat.com>
Cc: Riku Voipio <riku.voi...@iki.fi>
Cc: Jan Kiszka <jan.kis...@siemens.com>
Signed-off-by: Stefan Weil <w...@mail.berlios.de>
---
 hw/virtio-blk.c      |    2 +-
 linux-user/syscall.c |    3 +--
 nbd.h                |    2 +-
 qemu-nbd.c           |    2 +-
 slirp/libslirp.h     |    2 +-
 5 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 4df23f4..d5d4757 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -11,7 +11,7 @@
  *
  */
 
-#include <qemu-common.h>
+#include "qemu-common.h"
 #include "qemu-error.h"
 #include "trace.h"
 #include "blockdev.h"
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 6bdf4e6..e87e174 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -60,7 +60,7 @@ int __clone2(int (*fn)(void *), void *child_stack_base,
 #include <netinet/ip.h>
 #include <netinet/tcp.h>
 #include <linux/wireless.h>
-#include <qemu-common.h>
+#include "qemu-common.h"
 #ifdef TARGET_GPROF
 #include <sys/gmon.h>
 #endif
@@ -96,7 +96,6 @@ int __clone2(int (*fn)(void *), void *child_stack_base,
 #include "cpu-uname.h"
 
 #include "qemu.h"
-#include "qemu-common.h"
 
 #if defined(CONFIG_USE_NPTL)
 #define CLONE_NPTL_FLAGS2 (CLONE_SETTLS | \
diff --git a/nbd.h b/nbd.h
index 96f77fe..273cfa1 100644
--- a/nbd.h
+++ b/nbd.h
@@ -21,7 +21,7 @@
 
 #include <sys/types.h>
 
-#include <qemu-common.h>
+#include "qemu-common.h"
 #include "block_int.h"
 
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 0b25a4d..3a39145 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -16,7 +16,7 @@
  * along with this program; if not, see <http://www.gnu.org/licenses/>.
  */
 
-#include <qemu-common.h>
+#include "qemu-common.h"
 #include "block_int.h"
 #include "nbd.h"
 
diff --git a/slirp/libslirp.h b/slirp/libslirp.h
index 67c70e3..a755123 100644
--- a/slirp/libslirp.h
+++ b/slirp/libslirp.h
@@ -1,7 +1,7 @@
 #ifndef _LIBSLIRP_H
 #define _LIBSLIRP_H
 
-#include <qemu-common.h>
+#include "qemu-common.h"
 
 #ifdef CONFIG_SLIRP
 
-- 
1.7.2.5
Re: [Qemu-devel] [PATCH] virtio-9p: Fix syntax error in debug code
On 20.07.2011 11:44, Aneesh Kumar K.V wrote:
On Wed, 20 Jul 2011 08:27:28 +0200, Stefan Weil <w...@mail.berlios.de> wrote:

This error was reported by cppcheck:

qemu/hw/9pfs/virtio-9p-debug.c:342: error: Invalid number of character ({) when these macros are defined: 'DEBUG_DATA'.

Cc: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
Signed-off-by: Stefan Weil <w...@mail.berlios.de>
---
 hw/9pfs/virtio-9p-debug.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/9pfs/virtio-9p-debug.c b/hw/9pfs/virtio-9p-debug.c
index 4636ad5..96925f0 100644
--- a/hw/9pfs/virtio-9p-debug.c
+++ b/hw/9pfs/virtio-9p-debug.c
@@ -295,7 +295,7 @@ static void pprint_data(V9fsPDU *pdu, int rx, size_t *offsetp, const char *name)
     if (rx) {
         count = pdu->elem.in_num;
-    } else
+    } else {
         count = pdu->elem.out_num;
     }

Applied. We also need to update virtio-9p-debug w.r.t. the new co-routine series. With co-routines we can have multiple 9p handlers started simultaneously.

-aneesh

Maybe this patch can be applied via qemu-trivial. I simply would like to get it out of my list of open patches.

Thanks,
Stefan W.
Re: [Qemu-devel] [PATCH 0/2] improve qemu-img conversion performance
On Thu, 8 Sep 2011, Kevin Wolf wrote:
On 08.09.2011 01:06, Yehuda Sadeh wrote:

The following set of patches improves the performance of the qemu-img conversion process. When using a higher latency backend, small writes have a severe impact on the time it takes to do image conversion. We switch to using async writes, and we avoid splitting writes due to holes when the holes are small enough.

Yehuda Sadeh (2):
  qemu-img: async write to block device when converting image
  qemu-img: don't skip writing small holes

 qemu-img.c |   34 +++---
 1 files changed, 27 insertions(+), 7 deletions(-)

This doesn't seem to be against git master or the block tree. Please rebase.

I think that commit a22f123c may obsolete your patch 2/2.

With git.kernel.org down, where should I be looking for the latest upstream?

Thanks!
sage
Re: [Qemu-devel] [PATCH v4 24/39] ppc: convert to memory API
[dropping kvm@vger because my mail server refuses to send mails there]

On 08.08.2011, at 15:09, Avi Kivity wrote:

Reviewed-by: Richard Henderson <r...@twiddle.net>
Reviewed-by: Anthony Liguori <aligu...@us.ibm.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 hw/cuda.c         |    6 ++-
 hw/escc.c         |   42 +--
 hw/escc.h         |    2 +-
 hw/heathrow_pic.c |   29 --
 hw/ide.h          |    2 +-
 hw/ide/macio.c    |   36 ---
 hw/mac_dbdma.c    |   32 ++--
 hw/mac_dbdma.h    |    4 ++-
 hw/mac_nvram.c    |   39 ++---
 hw/macio.c        |   74 +++-
 hw/openpic.c      |   81 +
 hw/openpic.h      |    2 +-
 hw/ppc_mac.h      |   16 ++
 hw/ppc_newworld.c |   30 +--
 hw/ppc_oldworld.c |   23 +++
 15 files changed, 201 insertions(+), 217 deletions(-)

diff --git a/hw/cuda.c b/hw/cuda.c
index 065c362..5c92d81 100644
--- a/hw/cuda.c
+++ b/hw/cuda.c
@@ -117,6 +117,7 @@ typedef struct CUDATimer {
 } CUDATimer;
 
 typedef struct CUDAState {
+    MemoryRegion mem;
     /* cuda registers */
     uint8_t b;      /* B-side data */
     uint8_t a;      /* A-side data */
@@ -722,7 +723,7 @@ static void cuda_reset(void *opaque)
     set_counter(s, &s->timers[1], 0x);
 }
 
-void cuda_init (int *cuda_mem_index, qemu_irq irq)
+void cuda_init (MemoryRegion **cuda_mem, qemu_irq irq)
 {
     struct tm tm;
     CUDAState *s = &cuda_state;
@@ -738,8 +739,9 @@ void cuda_init (int *cuda_mem_index, qemu_irq irq)
     s->tick_offset = (uint32_t)mktimegm(&tm) + RTC_OFFSET;
 
     s->adb_poll_timer = qemu_new_timer_ns(vm_clock, cuda_adb_poll, s);
-    *cuda_mem_index = cpu_register_io_memory(cuda_read, cuda_write, s,
+    cpu_register_io_memory(cuda_read, cuda_write, s,
                                              DEVICE_NATIVE_ENDIAN);
+    *cuda_mem = &s->mem;

Just stumbled over this while debugging why the Mac machines don't boot anymore. Are you sure this part is correct? We're not registering the region (and its callbacks) anymore now, right?

Alex
Re: [Qemu-devel] [PATCH v4 24/39] ppc: convert to memory API
On 08.08.2011, at 15:09, Avi Kivity wrote:

Reviewed-by: Richard Henderson <r...@twiddle.net>
Reviewed-by: Anthony Liguori <aligu...@us.ibm.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 hw/cuda.c         |    6 ++-
 hw/escc.c         |   42 +--
 hw/escc.h         |    2 +-
 hw/heathrow_pic.c |   29 --
 hw/ide.h          |    2 +-
 hw/ide/macio.c    |   36 ---
 hw/mac_dbdma.c    |   32 ++--
 hw/mac_dbdma.h    |    4 ++-
 hw/mac_nvram.c    |   39 ++---
 hw/macio.c        |   74 +++-
 hw/openpic.c      |   81 +
 hw/openpic.h      |    2 +-
 hw/ppc_mac.h      |   16 ++
 hw/ppc_newworld.c |   30 +--
 hw/ppc_oldworld.c |   23 +++
 15 files changed, 201 insertions(+), 217 deletions(-)

[...]

@@ -89,7 +91,8 @@ static void pic_writel (void *opaque, target_phys_addr_t addr, uint32_t value)
     }
 }
 
-static uint32_t pic_readl (void *opaque, target_phys_addr_t addr)
+static uint64_t pic_read(void *opaque, target_phys_addr_t addr,
+                         unsigned size)
 {
     HeathrowPICS *s = opaque;
     HeathrowPIC *pic;
@@ -120,19 +123,12 @@ static uint32_t pic_readl (void *opaque, target_phys_addr_t addr)
     return value;
 }
 
-static CPUWriteMemoryFunc * const pic_write[] = {
-    pic_writel,
-    pic_writel,
-    pic_writel,
+static const MemoryRegionOps heathrow_pic_ops = {
+    .read = pic_read,
+    .write = pic_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,

native endian

 };
 
-static CPUReadMemoryFunc * const pic_read[] = {
-    pic_readl,
-    pic_readl,
-    pic_readl,
-};
-
-
 static void heathrow_pic_set_irq(void *opaque, int num, int level)
 {
     HeathrowPICS *s = opaque;
@@ -201,7 +197,7 @@ static void heathrow_pic_reset(void *opaque)
     s->pics[1].level_triggered = 0x1ff0;
 }
 
-qemu_irq *heathrow_pic_init(int *pmem_index,
+qemu_irq *heathrow_pic_init(MemoryRegion **pmem,
                             int nb_cpus, qemu_irq **irqs)
 {
     HeathrowPICS *s;
@@ -209,8 +205,9 @@ qemu_irq *heathrow_pic_init(int *pmem_index,
     s = qemu_mallocz(sizeof(HeathrowPICS));
     /* only 1 CPU */
     s->irqs = irqs[0];
-    *pmem_index = cpu_register_io_memory(pic_read, pic_write, s,
-                                         DEVICE_LITTLE_ENDIAN);

little endian. So you're changing the endianness of the calls? Not nice.

Alex
[Qemu-devel] [PATCH] PPC: Fix via-cuda memory registration
Commit 23c5e4ca (convert to memory API) broke the VIA Cuda emulation layer
by not registering the IO structs.

This patch registers them properly and thus makes -M g3beige and -M mac99
work again.

Signed-off-by: Alexander Graf <ag...@suse.de>

---

PS: Please test your patches. This one could have been found with an
    invocation as simple as "qemu-system-ppc". We boot into the OpenBIOS
    prompt by default, so you wouldn't even have required a guest image
    or kernel.
---
 hw/cuda.c |   28 
 1 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/hw/cuda.c b/hw/cuda.c
index 6f05975..4077436 100644
--- a/hw/cuda.c
+++ b/hw/cuda.c
@@ -634,16 +634,20 @@ static uint32_t cuda_readl (void *opaque, target_phys_addr_t addr)
     return 0;
 }
 
-static CPUWriteMemoryFunc * const cuda_write[] = {
-    cuda_writeb,
-    cuda_writew,
-    cuda_writel,
-};
-
-static CPUReadMemoryFunc * const cuda_read[] = {
-    cuda_readb,
-    cuda_readw,
-    cuda_readl,
+static MemoryRegionOps cuda_ops = {
+    .old_mmio = {
+        .write = {
+            cuda_writeb,
+            cuda_writew,
+            cuda_writel,
+        },
+        .read = {
+            cuda_readb,
+            cuda_readw,
+            cuda_readl,
+        },
+    },
+    .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static bool cuda_timer_exist(void *opaque, int version_id)
@@ -740,8 +744,8 @@ void cuda_init (MemoryRegion **cuda_mem, qemu_irq irq)
     s->tick_offset = (uint32_t)mktimegm(&tm) + RTC_OFFSET;
 
     s->adb_poll_timer = qemu_new_timer_ns(vm_clock, cuda_adb_poll, s);
-    cpu_register_io_memory(cuda_read, cuda_write, s,
-                                             DEVICE_NATIVE_ENDIAN);
+    memory_region_init_io(&s->mem, &cuda_ops, s, "cuda", 0x2000);
+
     *cuda_mem = &s->mem;
     vmstate_register(NULL, -1, &vmstate_cuda, s);
     qemu_register_reset(cuda_reset, s);
-- 
1.6.0.2
Re: [Qemu-devel] [PATCH v2] Fix X86 CPU topology in KVM mode
On 2011-09-08 07:33, bharata@gmail.com wrote:
> From: Bharata B Rao <bharata@gmail.com>
>
> The apic id returned to the guest kernel in ebx for cpuid(function=1)
> depends on CPUX86State->cpuid_apic_id, which gets populated after the
> cpuid information is cached in the host kernel. This results in broken
> CPU topology in the guest.
>
> Fix this by setting cpuid_apic_id before the cpuid information is passed
> to the host kernel. This is done by moving the setting of cpuid_apic_id
> to cpu_x86_init(), where it will work for both KVM and TCG modes.
>
> Signed-off-by: Bharata B Rao <bharata@gmail.com>
> ---
> This is the next post of the fix that addresses Jan's comment about
> bringing back the (smp_cpus > 1) check. The previous version was posted here:
> http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00892.html
>
> I couldn't boot a 486 kernel successfully with qemu and hence am not sure
> if and how this fix breaks i486. Any help from Jan or others who might
> have easy means to boot 486 would be good.

At least it preserves the current logic, just moves it up in the initialization path.

>  hw/pc.c              |    1 -
>  target-i386/helper.c |    5 +
>  2 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/hw/pc.c b/hw/pc.c
> index 5bc845a..f9cca4d 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -933,7 +933,6 @@ static CPUState *pc_new_cpu(const char *cpu_model)
>          exit(1);
>      }
>      if ((env->cpuid_features & CPUID_APIC) || smp_cpus > 1) {
> -        env->cpuid_apic_id = env->cpu_index;
>          env->apic_state = apic_init(env, env->cpuid_apic_id);
>      }
>      qemu_register_reset(pc_cpu_reset, env);
> diff --git a/target-i386/helper.c b/target-i386/helper.c
> index 5df40d4..139a193 100644
> --- a/target-i386/helper.c
> +++ b/target-i386/helper.c
> @@ -1256,6 +1256,11 @@ CPUX86State *cpu_x86_init(const char *cpu_model)
>          cpu_x86_close(env);
>          return NULL;
>      }
> +
> +    if ((env->cpuid_features & CPUID_APIC) || smp_cpus > 1) {
> +        env->cpuid_apic_id = env->cpu_index;
> +    }
> +
>      mce_init(env);
>      qemu_init_vcpu(env);
>  

Tested-and-acked-by: Jan Kiszka <jan.kis...@siemens.com>

Just in time, we happened to hit this bug today too. It confused libvirt in the guest quite a bit...

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
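For context on why the ordering matters: CPUID leaf 1 reports the initial APIC ID in EBX bits 31:24, so whatever value cpuid_apic_id holds when the CPUID data is built is what the guest will read. A minimal sketch of packing and unpacking that field follows; the helper names are hypothetical and are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Build a CPUID.1:EBX value. Only the APIC ID field (bits 31:24) is
 * modelled here; the remaining 24 bits are passed through unchanged. */
static uint32_t cpuid1_ebx_pack(uint8_t apic_id, uint32_t low24)
{
    assert((low24 & 0xff000000u) == 0);
    return ((uint32_t)apic_id << 24) | low24;
}

/* Extract the initial APIC ID as a guest kernel would. */
static uint8_t cpuid1_ebx_apic_id(uint32_t ebx)
{
    return (uint8_t)(ebx >> 24);
}
```

If the field is still 0 for every vCPU at the time the table is cached, each vCPU reports APIC ID 0 to the guest, which would be consistent with the broken topology described above.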
[Qemu-devel] unable to access the serial port on the Vm
Hi,

I'm running one VM on the QEMU hypervisor. I tried to access /dev/ttyS0 from inside the VM, but I can't: it reports an input/output error. For example, running

$ cat /dev/ttyS0

fails with an input/output error. What might be the problem? Please help.
Re: [Qemu-devel] [PATCH] KVM: emulate lapic tsc deadline timer for hvm
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -229,6 +229,8 @@
>>  #define MSR_IA32_APICBASE_ENABLE (1<<11)
>>  #define MSR_IA32_APICBASE_BASE (0xfffff<<12)
>>  
>> +#define MSR_IA32_TSCDEADLINE 0x06e0
>> +
>>  #define MSR_IA32_UCODE_WRITE 0x0079
>>  #define MSR_IA32_UCODE_REV 0x008b
>
> Need to add to msrs_to_save so live migration works. MSR must be
> explicitly listed in qemu, also.

Marcelo, it seems the MSR does not need to be explicitly listed in qemu? Adding MSR_IA32_TSCDEADLINE to msrs_to_save on the KVM side is enough; qemu will get it through KVM_GET_MSR_INDEX_LIST. Am I missing something?

Thanks,
Jinsong
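As a sketch of the userspace side of this question, the following shows how a program could query KVM_GET_MSR_INDEX_LIST and check whether a given MSR is reported. It assumes Linux with <linux/kvm.h> available and an open /dev/kvm file descriptor; kvm_query_msr_list() and msr_list_contains() are hypothetical helper names, not qemu functions, and the MSR value is the one from the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define MSR_IA32_TSCDEADLINE 0x6e0 /* value from the patch quoted above */

/* Ask KVM which MSR indices it wants saved and restored.
 * kvm_fd is an open /dev/kvm descriptor. Returns a malloc'd list that the
 * caller frees, or NULL on failure. */
static struct kvm_msr_list *kvm_query_msr_list(int kvm_fd)
{
    struct kvm_msr_list probe = { .nmsrs = 0 };
    struct kvm_msr_list *list;

    /* With nmsrs == 0 the call is expected to fail with E2BIG while
     * filling in the number of entries that are needed. */
    if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, &probe) < 0 && errno != E2BIG) {
        return NULL;
    }

    list = malloc(sizeof(*list) + probe.nmsrs * sizeof(list->indices[0]));
    if (!list) {
        return NULL;
    }
    list->nmsrs = probe.nmsrs;
    if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, list) < 0) {
        free(list);
        return NULL;
    }
    return list;
}

/* Linear search of the returned index list. */
static bool msr_list_contains(const struct kvm_msr_list *list, uint32_t msr)
{
    assert(list != NULL);
    for (uint32_t i = 0; i < list->nmsrs; i++) {
        if (list->indices[i] == msr) {
            return true;
        }
    }
    return false;
}
```

Whether userspace can rely on this list alone or must also name the MSR explicitly is exactly the open question in this thread; the sketch only shows how the list is obtained.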
[Qemu-devel] [PATCH] ahci: Remove unused struct member
Member variable is_read is written, but never read (contrary to its name). Remove it.

Cc: Kevin Wolf <kw...@redhat.com>
Signed-off-by: Stefan Weil <w...@mail.berlios.de>
---
 hw/ide/ahci.c |    2 --
 hw/ide/ahci.h |    1 -
 2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index f4fa154..a8659cf 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -754,7 +754,6 @@ static void process_ncq_command(AHCIState *s, int port, uint8_t *cmd_fis,
         case READ_FPDMA_QUEUED:
             DPRINTF(port, "NCQ reading %d sectors from LBA %ld, tag %d\n",
                     ncq_tfs->sector_count-1, ncq_tfs->lba, ncq_tfs->tag);
-            ncq_tfs->is_read = 1;
 
             DPRINTF(port, "tag %d aio read %ld\n", ncq_tfs->tag, ncq_tfs->lba);
 
@@ -768,7 +767,6 @@ static void process_ncq_command(AHCIState *s, int port, uint8_t *cmd_fis,
         case WRITE_FPDMA_QUEUED:
             DPRINTF(port, "NCQ writing %d sectors to LBA %ld, tag %d\n",
                     ncq_tfs->sector_count-1, ncq_tfs->lba, ncq_tfs->tag);
-            ncq_tfs->is_read = 0;
 
             DPRINTF(port, "tag %d aio write %ld\n", ncq_tfs->tag, ncq_tfs->lba);
 
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index 3c29d93..5de986c 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -259,7 +259,6 @@ typedef struct NCQTransferState {
     BlockDriverAIOCB *aiocb;
     QEMUSGList sglist;
     BlockAcctCookie acct;
-    int is_read;
     uint16_t sector_count;
     uint64_t lba;
     uint8_t tag;
-- 
1.7.2.5
Re: [Qemu-devel] [RFC PATCH 4/5] VFIO: Add PCI device support
On Thu, 2011-09-08 at 10:52 +0300, Avi Kivity wrote:
> On 09/07/2011 09:55 PM, Konrad Rzeszutek Wilk wrote:
> > >  If you don't know what to do here, say N.
> > > +
> > > +menuconfig VFIO_PCI
> > > +	bool "VFIO support for PCI devices"
> > > +	depends on VFIO && PCI
> > > +	default y if X86
> >
> > Hahah.. And Linus is going to tear your behind for that.
> >
> > Default should be 'n'
>
> It depends on VFIO, which presumably defaults to n.

Yes, exactly.
Re: [Qemu-devel] [RFC PATCH 0/5] VFIO-NG group/device/iommu framework
On Wed, 2011-09-07 at 13:58 +0200, Alexander Graf wrote:
> On 01.09.2011, at 21:50, Alex Williamson wrote:
>
> > Trying to move beyond talking about how VFIO should work to
> > re-writing the code. This is pre-alpha, known broken, will
> > probably crash your system but it illustrates some of how
> > I see groups, devices, and iommus interacting. This is just
> > the framework, no code to actually support user space drivers
> > or device assignment yet.
> >
> > The iommu portions are still using the FIXME PCI specific
> > hooks. Once Joerg gets some buy-in on his bus specific iommu
> > patches, we can move to that.
> >
> > The group management is more complicated than I'd like and
> > you can get groups into a bad state by killing the test program
> > with devices/iommus open. The locking is overly simplistic.
> > But, it's a start. Please make constructive comments and
> > suggestions. Patches based on v3.0. Thanks,
>
> Looks pretty reasonable to me so far, but I guess we only know for
> sure once we have non-PCI implemented and working with this scheme
> as well.
>
> Btw I couldn't find the PCI BAR regions mmaps and general config
> space exposure. Where has that gone?

I ripped it out for now just to work on the group/device/iommu framework. I didn't see a need to make a functional RFC just to get some buy-in on the framework. Thanks,

Alex
[Qemu-devel] [PATCH] ARM7TDMI: Enable ARMv4T features
Signed-off-by: Marek Vasut <marek.va...@gmail.com>
---
 target-arm/helper.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 58cd99f..2f3e937 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -53,6 +53,7 @@ static void cpu_reset_model_id(CPUARMState *env, uint32_t id)
     env->cp15.c0_cpuid = id;
     switch (id) {
     case ARM_CPUID_ARM7TDMI:
+        set_feature(env, ARM_FEATURE_V4T);
 //        set_feature(env, ARM_FEATURE_ABORT_BU);
 //        set_feature(env, ARM_FEATURE_NO_CP15);
         break;
-- 
1.7.5.4
[Qemu-devel] [Bug 824650] Re: Latest GIT assert error in arp_table.c
No - that's not relevant.

The latest git (07ff2c4475df77e38a31d50ee7f3932631806c15) still crashes after just a couple of minutes with just about any guest on a Linux host.

These are the args for my FreeBSD guest:

qemu-system-i386 -drive file=freebsd8.1-i386,index=0,media=disk,cache=unsafe -drive file=/dev/cdrom,index=1,media=cdrom -boot c -enable-kvm -m 128

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/824650

Title:
  Latest GIT assert error in arp_table.c

Status in QEMU:
  New

Bug description:
  The latest git version of qemu (commit 8cc7c3952d4d0a681d8d4c3ac89a206a5bfd7f00) crashes after a few minutes. All was fine up to a few days ago. This is with both x86 and sparc emulation, on an x86_64 host.

  e.g.
  qemu-system-sparc -drive file=netbsd5.0.2-sparc,index=0,media=disk,cache=unsafe -m 256 -boot c -nographic -redir tcp:2232::22:

  qemu-system-sparc: slirp/arp_table.c:75: arp_table_search: Assertion `(ip_addr & (__extension__ ({ register unsigned int __v, __x = (~(0xf << 28)); if (__builtin_constant_p (__x)) __v = ((((__x) & 0xff000000) >> 24) | (((__x) & 0x00ff0000) >> 8) | (((__x) & 0x0000ff00) << 8) | (((__x) & 0x000000ff) << 24)); else __asm__ ("bswap %0" : "=r" (__v) : "0" (__x)); __v; }))) != 0' failed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/824650/+subscriptions
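The macro expansion in the assertion message is the byte swap done by htonl(), so the condition appears to reduce to (ip_addr & htonl(~(0xf << 28))) != 0. A minimal sketch of that predicate follows, to show which addresses would trip it; arp_addr_ok() is a hypothetical name, and this says nothing about which address actually triggered the reported crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/* ip_addr_net is an IPv4 address in network byte order. The check passes
 * only if at least one of the low 28 bits (in host order) is set. */
static bool arp_addr_ok(uint32_t ip_addr_net)
{
    const uint32_t mask = htonl(~(0xfU << 28)); /* 0x0fffffff in host order */

    assert(ntohl(mask) == 0x0fffffffU);
    return (ip_addr_net & mask) != 0;
}
```

Under this reading, 0.0.0.0 and any address whose low 28 bits are zero (for example 224.0.0.0) fail the check, while an ordinary address such as 10.0.2.2 passes.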
[Qemu-devel] [PATCH] SPARC: Trivial patch to clean up npc monitor output
This patch fixes the spacing of the PC and NPC output from 'info cpus' for SPARC.

Signed-off-by: Nathan Kunkee <nkunke...@hotmail.com>

diff --git a/monitor.c b/monitor.c
index 1b8ba2c..16cd4c5 100644
--- a/monitor.c
+++ b/monitor.c
@@ -884,9 +884,9 @@ static void print_cpu_iter(QObject *obj, void *opaque)
     monitor_printf(mon, nip=0x TARGET_FMT_lx,
                    (target_long) qdict_get_int(cpu, nip));
 #elif defined(TARGET_SPARC)
-    monitor_printf(mon, pc=0x TARGET_FMT_lx,
+    monitor_printf(mon, pc=0x TARGET_FMT_lx,
                    (target_long) qdict_get_int(cpu, pc));
-    monitor_printf(mon, npc=0x TARGET_FMT_lx,
+    monitor_printf(mon, npc=0x TARGET_FMT_lx,
                    (target_long) qdict_get_int(cpu, npc));
 #elif defined(TARGET_MIPS)
     monitor_printf(mon, PC=0x TARGET_FMT_lx,
Re: [Qemu-devel] [PATCH 0/2] improve qemu-img conversion performance
On Thu, 8 Sep 2011, Stefan Hajnoczi wrote:
> On Wed, Sep 07, 2011 at 04:06:51PM -0700, Yehuda Sadeh wrote:
> > The following set of patches improve the qemu-img conversion process
> > performance. When using a higher latency backend, small writes have a
> > severe impact on the time it takes to do image conversion.
> > We switch to using async writes, and we avoid splitting writes due to
> > holes when the holes are small enough.
> >
> > Yehuda Sadeh (2):
> >   qemu-img: async write to block device when converting image
> >   qemu-img: don't skip writing small holes
> >
> >  qemu-img.c |   34 +++---
> >  1 files changed, 27 insertions(+), 7 deletions(-)
> >
> > -- 
> > 2.7.5.1
>
> This has nothing to do with the patch itself, but I've been curious
> about the existence of both a QEMU and a Linux kernel rbd block driver.
>
> The I/O latency with qemu-img has been an issue for rbd users. But they
> have the option of using the Linux kernel rbd block driver, where
> qemu-img can take advantage of the page cache instead of performing
> direct I/O.
>
> Does this mean you intend to support both QEMU block/rbd.c and Linux
> drivers/block/rbd.c? As a user I would go with the Linux kernel driver
> instead of the QEMU block driver because it offers page cache and host
> block device features. On the other hand a userspace driver is nice
> because it does not require privileges.

We intend to support both drivers, yes. The native qemu driver is generally more convenient because there is no kernel dependency, so we want to make qemu-img perform reasonably one way or another.

There are plans to implement some limited buffering (and flush) in librbd to make the device behave a bit more like a disk with a cache. That will mask the sync write latency, but I suspect that doing these writes using the aio interface (and ignoring small holes) will help everyone...

sage
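The "ignore small holes" idea can be sketched as follows. This is not the code from the patch series (which is not shown in this thread); write_extent_len(), sector_is_zero() and the min_hole threshold are hypothetical names chosen for illustration, assuming 512-byte sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

static bool sector_is_zero(const uint8_t *sector)
{
    for (size_t i = 0; i < SECTOR_SIZE; i++) {
        if (sector[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Return how many sectors, counted from the start of buf, should go into a
 * single write. Runs of zero sectors shorter than min_hole are absorbed into
 * the write so it is not split; a zero run of at least min_hole sectors, or
 * zeros that reach the end of the buffer, end the extent and are excluded.
 */
static size_t write_extent_len(const uint8_t *buf, size_t nsectors,
                               size_t min_hole)
{
    size_t len = 0; /* sectors committed to this write */
    size_t i = 0;   /* next sector to examine */

    assert(buf != NULL && min_hole >= 1);
    while (i < nsectors) {
        size_t zeros = 0;

        while (i + zeros < nsectors &&
               sector_is_zero(buf + (i + zeros) * SECTOR_SIZE)) {
            zeros++;
        }
        if (zeros >= min_hole || i + zeros == nsectors) {
            break; /* a real hole, or nothing but zeros left */
        }
        i += zeros + 1; /* absorb the small hole plus the data sector */
        len = i;
    }
    return len;
}
```

With a layout of data, zero, data, three zeros, data, data, a threshold of 2 yields one 3-sector write followed by a skipped hole, while a threshold of 4 yields a single 8-sector write. On a high-latency backend the second choice trades a few kilobytes of zeros for fewer round trips.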
Re: [Qemu-devel] [PATCH 09/12] sheepdog: move coroutine send/recv function to generic code
At Thu, 8 Sep 2011 17:25:02 +0200, Paolo Bonzini wrote:
> Outside coroutines, avoid busy waiting on EAGAIN by temporarily
> making the socket blocking.
>
> The API of qemu_recvv/qemu_sendv is slightly different from
> do_readv/do_writev because they do not handle coroutines. It
> returns the number of bytes written before encountering an
> EAGAIN. The specificity of yielding on EAGAIN is entirely in
> qemu-coroutine.c.
>
> Cc: MORITA Kazutaka <morita.kazut...@lab.ntt.co.jp>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  block/sheepdog.c |  221 +
>  cutils.c         |  108 ++
>  qemu-common.h    |    3 +
>  qemu-coroutine.c |   71 +
>  qemu-coroutine.h |   26 +++
>  5 files changed, 229 insertions(+), 200 deletions(-)
>
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index af696a5..188a8d8 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -443,129 +443,6 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,
>      return acb;
>  }
>  
> -#ifdef _WIN32
> -
> -struct msghdr {
> -    struct iovec *msg_iov;
> -    size_t        msg_iovlen;
> -};
> -
> -static ssize_t sendmsg(int s, const struct msghdr *msg, int flags)
> -{
> -    size_t size = 0;
> -    char *buf, *p;
> -    int i, ret;
> -
> -    /* count the msg size */
> -    for (i = 0; i < msg->msg_iovlen; i++) {
> -        size += msg->msg_iov[i].iov_len;
> -    }
> -    buf = g_malloc(size);
> -
> -    p = buf;
> -    for (i = 0; i < msg->msg_iovlen; i++) {
> -        memcpy(p, msg->msg_iov[i].iov_base, msg->msg_iov[i].iov_len);
> -        p += msg->msg_iov[i].iov_len;
> -    }
> -
> -    ret = send(s, buf, size, flags);
> -
> -    g_free(buf);
> -    return ret;
> -}
> -
> -static ssize_t recvmsg(int s, struct msghdr *msg, int flags)
> -{
> -    size_t size = 0;
> -    char *buf, *p;
> -    int i, ret;
> -
> -    /* count the msg size */
> -    for (i = 0; i < msg->msg_iovlen; i++) {
> -        size += msg->msg_iov[i].iov_len;
> -    }
> -    buf = g_malloc(size);
> -
> -    ret = qemu_recv(s, buf, size, flags);
> -    if (ret < 0) {
> -        goto out;
> -    }
> -
> -    p = buf;
> -    for (i = 0; i < msg->msg_iovlen; i++) {
> -        memcpy(msg->msg_iov[i].iov_base, p, msg->msg_iov[i].iov_len);
> -        p += msg->msg_iov[i].iov_len;
> -    }
> -out:
> -    g_free(buf);
> -    return ret;
> -}
> -
> -#endif
> -
> -/*
> - * Send/recv data with iovec buffers
> - *
> - * This function send/recv data from/to the iovec buffer directly.
> - * The first `offset' bytes in the iovec buffer are skipped and next
> - * `len' bytes are used.
> - *
> - * For example,
> - *
> - *   do_send_recv(sockfd, iov, len, offset, 1);
> - *
> - * is equals to
> - *
> - *   char *buf = malloc(size);
> - *   iov_to_buf(iov, iovcnt, buf, offset, size);
> - *   send(sockfd, buf, size, 0);
> - *   free(buf);
> - */
> -static int do_send_recv(int sockfd, struct iovec *iov, int len, int offset,
> -                        int write)
> -{
> -    struct msghdr msg;
> -    int ret, diff;
> -
> -    memset(&msg, 0, sizeof(msg));
> -    msg.msg_iov = iov;
> -    msg.msg_iovlen = 1;
> -
> -    len += offset;
> -
> -    while (iov->iov_len < len) {
> -        len -= iov->iov_len;
> -
> -        iov++;
> -        msg.msg_iovlen++;
> -    }
> -
> -    diff = iov->iov_len - len;
> -    iov->iov_len -= diff;
> -
> -    while (msg.msg_iov->iov_len <= offset) {
> -        offset -= msg.msg_iov->iov_len;
> -
> -        msg.msg_iov++;
> -        msg.msg_iovlen--;
> -    }
> -
> -    msg.msg_iov->iov_base = (char *) msg.msg_iov->iov_base + offset;
> -    msg.msg_iov->iov_len -= offset;
> -
> -    if (write) {
> -        ret = sendmsg(sockfd, &msg, 0);
> -    } else {
> -        ret = recvmsg(sockfd, &msg, 0);
> -    }
> -
> -    msg.msg_iov->iov_base = (char *) msg.msg_iov->iov_base - offset;
> -    msg.msg_iov->iov_len += offset;
> -
> -    iov->iov_len += diff;
> -    return ret;
> -}
> -
>  static int connect_to_sdog(const char *addr, const char *port)
>  {
>      char hbuf[NI_MAXHOST], sbuf[NI_MAXSERV];
> @@ -618,65 +495,6 @@ success:
>      return fd;
>  }
>  
> -static int do_readv_writev(int sockfd, struct iovec *iov, int len,
> -                           int iov_offset, int write)
> -{
> -    int ret;
> -again:
> -    ret = do_send_recv(sockfd, iov, len, iov_offset, write);
> -    if (ret < 0) {
> -        if (errno == EINTR) {
> -            goto again;
> -        }
> -        if (errno == EAGAIN) {
> -            if (qemu_in_coroutine()) {
> -                qemu_coroutine_yield();
> -            }
> -            goto again;
> -        }
> -        error_report("failed to recv a rsp, %s", strerror(errno));
> -        return 1;
> -    }
> -
> -    iov_offset += ret;
> -    len -= ret;
> -    if (len) {
> -        goto again;
> -    }
> -
> -    return 0;
> -}
> -
> -static int do_readv(int sockfd, struct iovec *iov, int len, int iov_offset)
> -{
> -    return do_readv_writev(sockfd, iov, len, iov_offset, 0);
> -}
> -
> -static int do_writev(int sockfd, struct iovec *iov, int len, int iov_offset)
> -{
> -    return do_readv_writev(sockfd, iov, len,
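The comment in the removed code describes do_send_recv() as equivalent to first gathering `size' bytes, starting `offset' bytes into the iovec, into a flat buffer and then sending that buffer. A minimal sketch of the gather step is shown below; iov_gather() is a hypothetical stand-in for illustration and is not QEMU's iov_to_buf().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Copy up to `size' bytes into buf, starting `offset' bytes into the
 * scattered buffer described by iov[0..iovcnt-1]. Returns the number of
 * bytes copied, which is short if the iovec ends first. */
static size_t iov_gather(const struct iovec *iov, int iovcnt, void *buf,
                         size_t offset, size_t size)
{
    char *out = buf;
    size_t done = 0;

    assert(iov != NULL && buf != NULL);
    for (int i = 0; i < iovcnt && done < size; i++) {
        size_t n;

        if (offset >= iov[i].iov_len) {
            offset -= iov[i].iov_len; /* skip this element entirely */
            continue;
        }
        n = iov[i].iov_len - offset;
        if (n > size - done) {
            n = size - done;
        }
        memcpy(out + done, (const char *)iov[i].iov_base + offset, n);
        done += n;
        offset = 0; /* later elements are read from their start */
    }
    return done;
}
```

The removed do_send_recv() avoids this copy by temporarily adjusting iov_base and iov_len in place and handing the iovec to sendmsg()/recvmsg() directly, then restoring the original values.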
Re: [Qemu-devel] [PATCH] pci: Remove unused pci_reserve_capability
On Thu, Sep 08, 2011 at 12:44:47PM +0200, Jan Kiszka wrote:
> eepro100 was the last user. Now pci_add_capability is powerful enough.
>
> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>

Applied, thanks.

> ---
>  hw/pci.c |    6 --
>  hw/pci.h |    2 --
>  2 files changed, 0 insertions(+), 8 deletions(-)
>
> diff --git a/hw/pci.c b/hw/pci.c
> index 57ff7b1..63c346d 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -2028,12 +2028,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
>          pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
>  }
>  
> -/* Reserve space for capability at a known offset (to call after load). */
> -void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
> -{
> -    memset(pdev->used + offset, 0xff, size);
> -}
> -
>  uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
>  {
>      return pci_find_capability_list(pdev, cap_id, NULL);
> diff --git a/hw/pci.h b/hw/pci.h
> index 391217e..f2dae63 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -209,8 +209,6 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>  
>  void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
>  
> -void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t size);
> -
>  uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
>  
>  
> -- 
> 1.7.3.4
[Qemu-devel] [PATCH] support add-cow file format
The raw file format supports neither backing_file nor copy-on-write, so this patch adds an add-cow format that provides the backing_file option for raw images. The dirty bitmap is stored in an add-cow file. Usage looks like this:

qemu-img create -f add-cow -o backing_file=ubuntu.img,image_file=test.img test.add-cow
qemu -drive if=virtio,file=test.add-cow -m 1024

(test.img is a raw format file; test.add-cow stores the bitmap)

Signed-off-by: Dong Xu Wang <wdon...@linux.vnet.ibm.com>
---
 Makefile.objs   |    1 +
 block.c         |   83 ++-
 block.h         |    2 +
 block/add-cow.c |  456 +++
 block_int.h     |    6 +
 qemu-img.c      |   10 ++
 6 files changed, 555 insertions(+), 3 deletions(-)
 create mode 100644 block/add-cow.c

diff --git a/Makefile.objs b/Makefile.objs
index 26b885b..1402f9f 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -31,6 +31,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 
 block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-nested-y += add-cow.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
diff --git a/block.c b/block.c
index a8c789a..c797cfc 100644
--- a/block.c
+++ b/block.c
@@ -369,7 +369,7 @@ static int find_image_format(const char *filename, BlockDriver **pdrv)
 {
     int ret, score, score_max;
     BlockDriver *drv1, *drv;
-    uint8_t buf[2048];
+    uint8_t buf[4096];
     BlockDriverState *bs;
 
     ret = bdrv_file_open(&bs, filename, 0);
@@ -657,6 +657,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
         int back_flags;
         BlockDriver *back_drv = NULL;
 
+        char imaging_filename[PATH_MAX];
+        int cow_flags;
+        BlockDriver *cow_drv = NULL;
+
         bs->backing_hd = bdrv_new();
 
         if (path_has_protocol(bs->backing_file)) {
@@ -686,6 +690,30 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
             /* base image inherits from parent */
             bs->backing_hd->keep_read_only = bs->keep_read_only;
         }
+
+        /* If there is a image_file, must be together with backing_file */
+        if (bs->image_file[0] != '\0') {
+            bs->image_hd = bdrv_new();
+            if (path_has_protocol(bs->image_file)) {
+                pstrcpy(imaging_filename, sizeof(imaging_filename),
+                        bs->image_file);
+            } else {
+                path_combine(imaging_filename, sizeof(imaging_filename),
+                             filename, bs->image_file);
+            }
+
+            cow_drv = bdrv_find_format("add-cow");
+
+            cow_flags =
+                (flags & (~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING))) | BDRV_O_RDWR;
+            bs->image_hd->keep_read_only = 0;
+
+            ret = bdrv_open(bs->image_hd, imaging_filename, cow_flags, back_drv);
+            if (ret < 0) {
+                bdrv_close(bs);
+                return ret;
+            }
+        }
     }
 
     if (!bdrv_key_required(bs)) {
@@ -711,6 +739,10 @@ void bdrv_close(BlockDriverState *bs)
             bdrv_delete(bs->backing_hd);
             bs->backing_hd = NULL;
         }
+        if (bs->image_hd) {
+            bdrv_delete(bs->image_hd);
+            bs->image_hd = NULL;
+        }
         bs->drv->bdrv_close(bs);
         g_free(bs->opaque);
 #ifdef _WIN32
@@ -851,7 +883,7 @@ int bdrv_commit(BlockDriverState *bs)
 
     if (!drv)
         return -ENOMEDIUM;
-
+
     if (!bs->backing_hd) {
         return -ENOTSUP;
     }
@@ -2024,6 +2056,16 @@ void bdrv_get_backing_filename(BlockDriverState *bs,
     }
 }
 
+void bdrv_get_image_filename(BlockDriverState *bs,
+                             char *filename, int filename_size)
+{
+    if (!bs->image_file) {
+        pstrcpy(filename, filename_size, "");
+    } else {
+        pstrcpy(filename, filename_size, bs->image_file);
+    }
+}
+
 int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
                           const uint8_t *buf, int nb_sectors)
 {
@@ -3201,8 +3243,10 @@ int bdrv_img_create(const char *filename, const char *fmt,
     QEMUOptionParameter *param = NULL, *create_options = NULL;
     QEMUOptionParameter *backing_fmt, *backing_file, *size;
     BlockDriverState *bs = NULL;
-    BlockDriver *drv, *proto_drv;
+    BlockDriver *drv, *proto_drv, *cow_drv;;
     BlockDriver *backing_drv = NULL;
+    QEMUOptionParameter *cow_create_options = NULL;
+    QEMUOptionParameter *image_file;
     int ret = 0;
 
     /* Find driver and parse its options */
@@ -3225,10 +3269,16 @@ int bdrv_img_create(const char *filename, const char *fmt,
     create_options = append_option_parameters(create_options,
                                               proto_drv->create_options);
 
+    /* Just support raw format now*/
[Qemu-devel] [PATCH] tcg/ppc64: Fix zero extension code generation bug for ppc64 host
From: Thomas Huth <th...@de.ibm.com>

The ppc64 code generation backend uses an rldicr (Rotate Left Double Immediate and Clear Right) instruction to implement zero extension of a 32 bit quantity to a 64 bit quantity (INDEX_op_ext32u_i64). However this is wrong - this instruction clears specified low bits of the value, instead of high bits as we require for a zero extension. It should instead use an rldicl (Rotate Left Double Immediate and Clear Left) instruction.

Presumably amongst other things, this causes the SLOF firmware image used with -M pseries to not boot on a ppc64 host.

It appears this bug was exposed by commit 0bf1dbdcc935dfc220a93cd990e947e90706aec6 (tcg/ppc64: fix 16/32 mixup) which enabled the use of the op_ext32u_i64 operation on the ppc64 backend.

Signed-off-by: Thomas Huth <th...@de.ibm.com>
Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
---
 tcg/ppc64/tcg-target.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index d831684..e3c63ad 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -1560,7 +1560,7 @@ static void tcg_out_op (TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     case INDEX_op_ext32u_i64:
-        tcg_out_rld (s, RLDICR, args[0], args[1], 0, 32);
+        tcg_out_rld (s, RLDICL, args[0], args[1], 0, 32);
         break;
 
     case INDEX_op_setcond_i32:
-- 
1.7.5.4
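The difference between the two instructions can be modelled in plain C. This is a sketch of the instruction semantics under IBM bit numbering (bit 0 is the most significant bit), not TCG code; the function names simply mirror the mnemonics.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t rotl64(uint64_t x, unsigned sh)
{
    assert(sh < 64);
    return sh ? (x << sh) | (x >> (64 - sh)) : x;
}

/* rldicl: rotate left by sh, then clear the mb most-significant bits
 * (keep IBM bits mb..63). */
static uint64_t rldicl(uint64_t x, unsigned sh, unsigned mb)
{
    assert(mb < 64);
    return rotl64(x, sh) & (UINT64_MAX >> mb);
}

/* rldicr: rotate left by sh, then clear the 63 - me least-significant bits
 * (keep IBM bits 0..me). */
static uint64_t rldicr(uint64_t x, unsigned sh, unsigned me)
{
    assert(me < 64);
    return rotl64(x, sh) & (UINT64_MAX << (63 - me));
}
```

With sh = 0 and a mask argument of 32, as in the patched call, rldicl keeps only the low 32 bits, which is the zero extension wanted, while rldicr keeps the high 33 bits and discards almost all of the 32-bit value.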