date:20110908

Re: [Qemu-devel] [PATCH] pci: add standard bridge device

2011-09-08 Thread Wen Congyang

At 09/07/2011 07:52 PM, Michael S. Tsirkin Write:
 On Wed, Sep 07, 2011 at 12:39:09PM +0800, Wen Congyang wrote:
 At 09/06/2011 03:45 PM, Avi Kivity Write:
 On 09/06/2011 06:06 AM, Wen Congyang wrote:
  Use the uio driver -
  http://docs.blackfin.uclinux.org/kernel/generated/uio-howto/.  You
 just
  mmap() the BAR from userspace and play with it.

 When I try to bind ivshmem to uio_pci_generic, I get the following
 messages:
 uio_pci_generic :01:01.0: No IRQ assigned to device: no support
 for interrupts?


 No idea what this means.

 PCI 3.0 6.2.4
 For x86 based PCs, the values in this register correspond to IRQ numbers 
 (0-15) of the standard dual
 8259 configuration. The value 255 is defined as meaning unknown or no 
 connection to the interrupt
 controller. Values between 15 and 254 are reserved.

 The register is interrupt line.

 I read the config of this device, the interrupt line is 0. It means that it 
 uses the IRQ0.

 The following is the uio_pci_generic's code:
 static int __devinit probe(struct pci_dev *pdev,
                            const struct pci_device_id *id)
 {
     struct uio_pci_generic_dev *gdev;
     int err;

     err = pci_enable_device(pdev);
     if (err) {
         dev_err(&pdev->dev, "%s: pci_enable_device failed: %d\n",
                 __func__, err);
         return err;
     }

     if (!pdev->irq) {
         dev_warn(&pdev->dev, "No IRQ assigned to device: "
                  "no support for interrupts?\n");
         pci_disable_device(pdev);
         return -ENODEV;
     }
     ...
 }

 This function is called when we write 'domain:bus:slot.function' to
 /sys/bus/pci/drivers/uio_pci_generic/bind.
 pdev->irq is 0, which would mean the device uses IRQ0, but the driver
 refuses it. I do not know why.

 To Michael S. Tsirkin:
 This code was written by you. Do you know why you check whether pdev->irq is 0?

 Thanks
 Wen Congyang


 
 Well I see this in linux:
 
 /*
  * Read interrupt line and base address registers.
  * The architecture-dependent code can tweak these, of course.
  */
 static void pci_read_irq(struct pci_dev *dev)
 {
     unsigned char irq;
 
     pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &irq);
     dev->pin = irq;
     if (irq)
         pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
     dev->irq = irq;
 }
 
 Thus a device without an interrupt pin will get irq set to 0,
 and this seems the right way to detect such devices.
 I don't think PCI devices really use IRQ0 in practice,
 it's probably used for PC things. More likely the system is
 misconfigured.  Try lspci -vv to see what went wrong.

Yes, a PCI device should not use IRQ0. I debugged qemu's code and found that
the PCI_INTERRUPT_LINE register is not set by qemu itself (the write comes from the guest):
=
Hardware watchpoint 6: ((uint8_t *) 0x164e410)[0x3c]

Old value = 0 '\000'
New value = 10 '\n'
pci_default_write_config (d=0x1653ed0, addr=60, val=10, l=1) at 
/home/wency/source/qemu/hw/pci.c:1115
1115        d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
Missing separate debuginfos, use: debuginfo-install 
cyrus-sasl-gssapi-2.1.23-8.el6.x86_64 cyrus-sasl-md5-2.1.23-8.el6.x86_64 
cyrus-sasl-plain-2.1.23-8.el6.x86_64 db4-4.7.25-16.el6.x86_64
(gdb) bt
#0  pci_default_write_config (d=0x1653ed0, addr=60, val=10, l=1) at 
/home/wency/source/qemu/hw/pci.c:1115
#1  0x004d5827 in pci_host_config_write_common (pci_dev=0x1653ed0, 
addr=60, limit=256, val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:54
#2  0x004d5939 in pci_data_write (s=0x15f95a0, addr=2147502140, val=10, 
len=1) at /home/wency/source/qemu/hw/pci_host.c:75
#3  0x004d5b19 in pci_host_data_write (handler=0x15f9570, addr=3324, 
val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:125
#4  0x0063ee06 in ioport_simple_writeb (opaque=0x15f9570, addr=3324, 
value=10) at /home/wency/source/qemu/rwhandler.c:48
#5  0x00470db9 in ioport_write (index=0, address=3324, data=10) at 
ioport.c:81
#6  0x004717bc in cpu_outb (addr=3324, val=10 '\n') at ioport.c:273
#7  0x005ef25d in kvm_handle_io (port=3324, data=0x77ff8000, 
direction=1, size=1, count=1) at /home/wency/source/qemu/kvm-all.c:834
#8  0x005ef7e6 in kvm_cpu_exec (env=0x13da0d0) at 
/home/wency/source/qemu/kvm-all.c:976
#9  0x005c1a7b in qemu_kvm_cpu_thread_fn (arg=0x13da0d0) at 
/home/wency/source/qemu/cpus.c:661
#10 0x0032864077e1 in start_thread () from /lib64/libpthread.so.0
#11 0x0032858e68ed in clone () from /lib64/libc.so.6
=

If I put ivshmem on bus 0, the PCI_INTERRUPT_LINE register is set. So I
guess this register is set by the BIOS.
I use the newest seabios, and the PCI_INTERRUPT_LINE register is not set if the
device is not on bus 0.
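
To make the check under discussion concrete, here is a minimal user-space sketch that applies the same logic as pci_read_irq() and the uio_pci_generic probe test to a raw copy of a device's configuration space. The helper names cfg_irq() and uio_would_bind() are made up for illustration and are not kernel or QEMU functions; 0x3c (Interrupt Line) and 0x3d (Interrupt Pin) are the standard PCI header offsets.

```c
#include <stdint.h>

#define CFG_INTERRUPT_LINE 0x3c /* standard PCI header offset */
#define CFG_INTERRUPT_PIN  0x3d /* standard PCI header offset */

/* Mirror pci_read_irq(): the line register is only consulted when the
 * device reports an interrupt pin; otherwise the irq is 0. */
static unsigned cfg_irq(const uint8_t *cfg)
{
    uint8_t pin = cfg[CFG_INTERRUPT_PIN];
    return pin ? cfg[CFG_INTERRUPT_LINE] : 0;
}

/* Mirror the uio_pci_generic probe check: irq 0 is treated as
 * "no interrupt assigned" and binding is refused. */
static int uio_would_bind(const uint8_t *cfg)
{
    return cfg_irq(cfg) != 0;
}
```

For a device that reports a pin but whose line register was left at 0, uio_would_bind() returns 0; once the firmware has written a nonzero value such as 10 to the line register, it returns 1.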

# lspci -vv
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
Subsystem: Red Hat, Inc Qemu virtual machine
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-

Re: [Qemu-devel] [PATCH v3 02/27] ide: Use a table to declare which drive kinds accept each command

2011-09-08 Thread Markus Armbruster

Kevin Wolf kw...@redhat.com writes:

 On 06.09.2011 18:58, Markus Armbruster wrote:
 No functional change.
 
 It would be nice to have handler functions in the table, like commit
 e1a064f9 did for ATAPI.  Left for another day.
 
 Signed-off-by: Markus Armbruster arm...@redhat.com
 ---
  hw/ide/core.c |  105 
 +++-
  1 files changed, 80 insertions(+), 25 deletions(-)

 +[IBM_SENSE_CONDITION]   = CFA_OK,
 +[CFA_WEAR_LEVEL]= CFA_OK,
 +[WIN_READ_NATIVE_MAX]   = ALL_OK,
 +};
 +
 +static bool ide_cmd_permitted(IDEState *s, uint32_t cmd)
 +{
 +return cmd <= ARRAY_SIZE(ide_cmd_table)

 Shouldn't it be < instead of <= ?

I plead temporary insanity.  Want a v4, or want to fix it up yourself?
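
For anyone reading along, a minimal sketch of why the comparison has to be strict. The table contents and the names cmd_table and cmd_permitted() are invented for illustration; this is not the code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical permission table: one bitmask of drive kinds per command. */
static const uint8_t cmd_table[4] = { 0x1, 0x3, 0x0, 0x2 };

static bool cmd_permitted(uint32_t cmd, unsigned kind)
{
    /* cmd == ARRAY_SIZE(cmd_table) is already one past the last valid
     * index, so the comparison must be strict. */
    return cmd < ARRAY_SIZE(cmd_table)
        && (cmd_table[cmd] & (1u << kind));
}
```

With a non-strict comparison, cmd == 4 would pass the range check and read one element past the end of the table.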

Re: [Qemu-devel] [PATCH v3 14/27] block: Rename bdrv_set_locked() to bdrv_lock_medium()

2011-09-08 Thread Markus Armbruster

Kevin Wolf kw...@redhat.com writes:

 On 06.09.2011 18:58, Markus Armbruster wrote:
 While there, make the locked parameter bool.
 
 Signed-off-by: Markus Armbruster arm...@redhat.com
 ---
  block.c   |8 
  block.h   |2 +-
  block/raw-posix.c |8 
  block/raw.c   |6 +++---
  block_int.h   |2 +-
  hw/ide/atapi.c|2 +-
  hw/scsi-disk.c|2 +-
  trace-events  |1 +
  8 files changed, 16 insertions(+), 15 deletions(-)
 
 diff --git a/block.c b/block.c
 index 1e4be73..7225b15 100644
 --- a/block.c
 +++ b/block.c
 @@ -3072,14 +3072,14 @@ void bdrv_eject(BlockDriverState *bs, int eject_flag)
   * Lock or unlock the media (if it is locked, the user won't be able
   * to eject it manually).
   */
 -void bdrv_set_locked(BlockDriverState *bs, int locked)
 +void bdrv_lock_medium(BlockDriverState *bs, bool locked)
  {
  BlockDriver *drv = bs->drv;
  
 -trace_bdrv_set_locked(bs, locked);
 +trace_bdrv_lock_medium(bs, locked);
  
 -if (drv && drv->bdrv_set_locked) {
 -drv->bdrv_set_locked(bs, locked);
 +if (drv && drv->bdrv_lock_medium) {
 +drv->bdrv_lock_medium(bs, locked);
  }
  }
  
 diff --git a/block.h b/block.h
 index 396ca0e..4691090 100644
 --- a/block.h
 +++ b/block.h
 @@ -212,7 +212,7 @@ int bdrv_is_sg(BlockDriverState *bs);
  int bdrv_enable_write_cache(BlockDriverState *bs);
  int bdrv_is_inserted(BlockDriverState *bs);
  int bdrv_media_changed(BlockDriverState *bs);
 -void bdrv_set_locked(BlockDriverState *bs, int locked);
 +void bdrv_lock_medium(BlockDriverState *bs, bool locked);
  void bdrv_eject(BlockDriverState *bs, int eject_flag);
  void bdrv_get_format(BlockDriverState *bs, char *buf, int buf_size);
  BlockDriverState *bdrv_find(const char *name);
 diff --git a/block/raw-posix.c b/block/raw-posix.c
 index bcf50b2..a624f56 100644
 --- a/block/raw-posix.c
 +++ b/block/raw-posix.c
 @@ -1362,7 +1362,7 @@ static void cdrom_eject(BlockDriverState *bs, int 
 eject_flag)
  }
  }
  
 -static void cdrom_set_locked(BlockDriverState *bs, int locked)
 +static void cdrom_lock_medium(BlockDriverState *bs, bool locked)
  {
  BDRVRawState *s = bs->opaque;
  
 @@ -1400,7 +1400,7 @@ static BlockDriver bdrv_host_cdrom = {
  /* removable device support */
  .bdrv_is_inserted   = cdrom_is_inserted,
  .bdrv_eject = cdrom_eject,
 -.bdrv_set_locked= cdrom_set_locked,
 +.bdrv_lock_medium   = cdrom_lock_medium,
  
  /* generic scsi device */
  .bdrv_ioctl = hdev_ioctl,
 @@ -1481,7 +1481,7 @@ static void cdrom_eject(BlockDriverState *bs, int 
 eject_flag)
  cdrom_reopen(bs);
  }
  
 -static void cdrom_set_locked(BlockDriverState *bs, int locked)
 +static void cdrom_lock_medium(BlockDriverState *bs, bool locked)
  {
  BDRVRawState *s = bs->opaque;
  
 @@ -1521,7 +1521,7 @@ static BlockDriver bdrv_host_cdrom = {
  /* removable device support */
  .bdrv_is_inserted   = cdrom_is_inserted,
  .bdrv_eject = cdrom_eject,
 -.bdrv_set_locked= cdrom_set_locked,
 +.bdrv_lock_medium   = cdrom_lock_medium,
  };
  #endif /* __FreeBSD__ */
  
 diff --git a/block/raw.c b/block/raw.c
 index f197479..63cf2d3 100644
 --- a/block/raw.c
 +++ b/block/raw.c
 @@ -85,9 +85,9 @@ static void raw_eject(BlockDriverState *bs, int eject_flag)
  bdrv_eject(bs->file, eject_flag);
  }
  
 -static void raw_set_locked(BlockDriverState *bs, int locked)
 +static void raw_lock_medium(BlockDriverState *bs, bool locked)
  {
 -bdrv_set_locked(bs->file, locked);
 +bdrv_lock_medium(bs->file, locked);
  }
  
  static int raw_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
 @@ -144,7 +144,7 @@ static BlockDriver bdrv_raw = {
  .bdrv_is_inserted   = raw_is_inserted,
  .bdrv_media_changed = raw_media_changed,
  .bdrv_eject = raw_eject,
 -.bdrv_set_locked= raw_set_locked,
 +.bdrv_lock_medium   = raw_lock_medium,
  
  .bdrv_ioctl = raw_ioctl,
  .bdrv_aio_ioctl = raw_aio_ioctl,
 diff --git a/block_int.h b/block_int.h
 index 4f7ff3b..f42af2c 100644
 --- a/block_int.h
 +++ b/block_int.h
 @@ -120,7 +120,7 @@ struct BlockDriver {
  int (*bdrv_is_inserted)(BlockDriverState *bs);
  int (*bdrv_media_changed)(BlockDriverState *bs);
  void (*bdrv_eject)(BlockDriverState *bs, int eject_flag);
 -void (*bdrv_set_locked)(BlockDriverState *bs, int locked);
 +void (*bdrv_lock_medium)(BlockDriverState *bs, bool locked);
  
  /* to control generic scsi devices */
  int (*bdrv_ioctl)(BlockDriverState *bs, unsigned long int req, void 
 *buf);
 diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
 index afb27c6..06778f3 100644
 --- a/hw/ide/atapi.c
 +++ b/hw/ide/atapi.c
 @@ -833,7 +833,7 @@ static void cmd_test_unit_ready(IDEState *s, uint8_t 
 *buf)
  static void cmd_prevent_allow_medium_removal(IDEState *s, uint8_t* buf)
  {
  s->tray_locked = buf[4] & 1;
 -bdrv_set_locked(s->bs, buf[4] & 1);
 +

Re: [Qemu-devel] [PATCH] linux-user: Implement new ARM 64 bit cmpxchg kernel helper

2011-09-08 Thread Peter Maydell

On 31 August 2011 17:24, Dr. David Alan Gilbert
david.gilb...@linaro.org wrote:
 linux-user: Implement new ARM 64 bit cmpxchg kernel helper

 Linux 3.1 will have a new kernel-page helper for ARM implementing
 64 bit cmpxchg. Implement this helper in QEMU linux-user mode:
  * Provide kernel helper emulation for 64bit cmpxchg
  * Allow guest to object to guest offset to ensure it can map a page
  * Populate page with kernel helper version

 Signed-off-by: Dr. David Alan Gilbert david.gilb...@linaro.org

Reviewed-by: Peter Maydell peter.mayd...@linaro.org

Re: [Qemu-devel] [PATCH] pci: add standard bridge device

2011-09-08 Thread Wen Congyang

At 09/08/2011 02:15 PM, Wen Congyang Write:
 At 09/07/2011 07:52 PM, Michael S. Tsirkin Write:
 On Wed, Sep 07, 2011 at 12:39:09PM +0800, Wen Congyang wrote:
 At 09/06/2011 03:45 PM, Avi Kivity Write:
 On 09/06/2011 06:06 AM, Wen Congyang wrote:
  Use the uio driver -
  http://docs.blackfin.uclinux.org/kernel/generated/uio-howto/.  You
 just
  mmap() the BAR from userspace and play with it.

 When I try to bind ivshmem to uio_pci_generic, I get the following
 messages:
 uio_pci_generic :01:01.0: No IRQ assigned to device: no support
 for interrupts?


 No idea what this means.

 PCI 3.0 6.2.4
 For x86 based PCs, the values in this register correspond to IRQ numbers 
 (0-15) of the standard dual
 8259 configuration. The value 255 is defined as meaning unknown or no 
 connection to the interrupt
 controller. Values between 15 and 254 are reserved.

 The register is interrupt line.

 I read the config of this device, the interrupt line is 0. It means that it 
 uses the IRQ0.

 The following is the uio_pci_generic's code:
 static int __devinit probe(struct pci_dev *pdev,
                            const struct pci_device_id *id)
 {
     struct uio_pci_generic_dev *gdev;
     int err;

     err = pci_enable_device(pdev);
     if (err) {
         dev_err(&pdev->dev, "%s: pci_enable_device failed: %d\n",
                 __func__, err);
         return err;
     }

     if (!pdev->irq) {
         dev_warn(&pdev->dev, "No IRQ assigned to device: "
                  "no support for interrupts?\n");
         pci_disable_device(pdev);
         return -ENODEV;
     }
     ...
 }

 This function is called when we write 'domain:bus:slot.function' to
 /sys/bus/pci/drivers/uio_pci_generic/bind.
 pdev->irq is 0, which would mean the device uses IRQ0, but the driver
 refuses it. I do not know why.

 To Michael S. Tsirkin:
 This code was written by you. Do you know why you check whether pdev->irq is 0?

 Thanks
 Wen Congyang



 Well I see this in linux:

 /*
  * Read interrupt line and base address registers.
  * The architecture-dependent code can tweak these, of course.
  */
 static void pci_read_irq(struct pci_dev *dev)
 {
     unsigned char irq;

     pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &irq);
     dev->pin = irq;
     if (irq)
         pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
     dev->irq = irq;
 }

 Thus a device without an interrupt pin will get irq set to 0,
 and this seems the right way to detect such devices.
 I don't think PCI devices really use IRQ0 in practice,
 it's probably used for PC things. More likely the system is
 misconfigured.  Try lspci -vv to see what went wrong.
 
 Yes, a PCI device should not use IRQ0. I debugged qemu's code and found that
 the PCI_INTERRUPT_LINE register is not set by qemu itself (the write comes from the guest):
 =
 Hardware watchpoint 6: ((uint8_t *) 0x164e410)[0x3c]
 
 Old value = 0 '\000'
 New value = 10 '\n'
 pci_default_write_config (d=0x1653ed0, addr=60, val=10, l=1) at 
 /home/wency/source/qemu/hw/pci.c:1115
 1115        d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
 Missing separate debuginfos, use: debuginfo-install 
 cyrus-sasl-gssapi-2.1.23-8.el6.x86_64 cyrus-sasl-md5-2.1.23-8.el6.x86_64 
 cyrus-sasl-plain-2.1.23-8.el6.x86_64 db4-4.7.25-16.el6.x86_64
 (gdb) bt
 #0  pci_default_write_config (d=0x1653ed0, addr=60, val=10, l=1) at 
 /home/wency/source/qemu/hw/pci.c:1115
 #1  0x004d5827 in pci_host_config_write_common (pci_dev=0x1653ed0, 
 addr=60, limit=256, val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:54
 #2  0x004d5939 in pci_data_write (s=0x15f95a0, addr=2147502140, 
 val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:75
 #3  0x004d5b19 in pci_host_data_write (handler=0x15f9570, addr=3324, 
 val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:125
 #4  0x0063ee06 in ioport_simple_writeb (opaque=0x15f9570, addr=3324, 
 value=10) at /home/wency/source/qemu/rwhandler.c:48
 #5  0x00470db9 in ioport_write (index=0, address=3324, data=10) at 
 ioport.c:81
 #6  0x004717bc in cpu_outb (addr=3324, val=10 '\n') at ioport.c:273
 #7  0x005ef25d in kvm_handle_io (port=3324, data=0x77ff8000, 
 direction=1, size=1, count=1) at /home/wency/source/qemu/kvm-all.c:834
 #8  0x005ef7e6 in kvm_cpu_exec (env=0x13da0d0) at 
 /home/wency/source/qemu/kvm-all.c:976
 #9  0x005c1a7b in qemu_kvm_cpu_thread_fn (arg=0x13da0d0) at 
 /home/wency/source/qemu/cpus.c:661
 #10 0x0032864077e1 in start_thread () from /lib64/libpthread.so.0
 #11 0x0032858e68ed in clone () from /lib64/libc.so.6
 =
 
 If I put ivshmem on bus 0, the PCI_INTERRUPT_LINE register is set. So I
 guess this register is set by the BIOS.
 I use the newest seabios, and the PCI_INTERRUPT_LINE register is not set if the
 device is not on bus 0.

Here is the seabios's code:
==
static void pci_bios_init_device(struct pci_device *pci)
{
    u16 bdf = pci->bdf;
    int pin, pic_irq;

Re: [Qemu-devel] [PATCH] ahci: add port I/O index-data pair

2011-09-08 Thread Daniel Verkamp

(Sorry for the slow response, was on vacation)

On Thu, Sep 1, 2011 at 7:58 AM, Alexander Graf ag...@suse.de wrote:
 On 08/30/2011 05:07 AM, Daniel Verkamp wrote:

 On Sun, Aug 28, 2011 at 11:48 AM, Alexander Grafag...@suse.de  wrote:

 On 27.08.2011, at 04:12, Daniel Verkamp wrote:

 Implement an I/O space index-data register pair as defined by the AHCI
 spec, including the corresponding SATA PCI capability and BAR.

 This allows real-mode code to access the AHCI registers; real-mode
 code cannot address the memory-mapped register space because it is
 beyond the first megabyte.

 Very nice patch! I'll check and compare with a real ICH-9 when I get
 back to .de, but I'd assume you also did that already ;). Once I checked
 that the IO region is set up similarly, I'll give you my ack.

 Please do double check against real hardware if you get the chance - I
 don't have a real ICH-9 handy to test against.  This is all written
 based on my reading of the spec and testing with an internal DOS
 developer tool from work.

 I am mainly curious how the real thing handles writes to the index
 register that aren't divisible by 4 or are beyond the end of the
 register set (and how big that really is on ICH-9).  Judging by the
 bits marked RO in the spec, I would guess writing 0x13 to the index
 and then reading it back should give 0x10, but I haven't tested it on
 real hw.

 Phew. So I finally got at least an ICH-9 system booting. This is what lspci
 -vvv tells me:

 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6
 port SATA AHCI Controller (rev 02) (prog-if 01 [AHCI 1.0])
    Subsystem: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI
 Controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
 Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- TAbort-
 MAbort- SERR- PERR- INTx-
    Latency: 0
    Interrupt: pin B routed to IRQ 26
    Region 0: I/O ports at d000 [size=8]
    Region 1: I/O ports at cc00 [size=4]
    Region 2: I/O ports at c880 [size=8]
    Region 3: I/O ports at c800 [size=4]
    Region 4: I/O ports at c480 [size=32]
    Region 5: Memory at ffaf9000 (32-bit, non-prefetchable) [size=2K]
    Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit-
        Address: fee0f00c  Data: 4169
    Capabilities: [70] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
 PME(D0-,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=0004
    Capabilities: [b0] Vendor Specific Information: Len=06 ?
    Kernel driver in use: ahci

 So BAR4 is where the IDP info should be. Offset is 4 into that IO space and
 the space is 32 bytes long. Do you have the ICH-9 implementation spec? I can
 try to dig something up if you don't have it around.


I'm not sure I understand what you mean, but I think everything is in
the right spot - compare with the real ICH-9 dump you provide
(relevant parts quoted below; full lspci dump from QEMU device at
end):

Real: Region 4: I/O ports at c480 [size=32]
QEMU: Region 4: I/O ports at c040 [size=32] (I/O address is different,
but that is ok)

Real: Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=0004
QEMU: Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=0004 (identical)

 Please send me a small test program I can run on the machine to find out
 what happens for unaligned I/O accesses. That would be very helpful!


I will try to put something together in the next few days and send it along;
is a DOS test app suitable?

Thanks,
-- Daniel Verkamp


Here is the lspci -vvv -nn -x dump of the QEMU-emulated AHCI
controller with the patch applied:

00:04.0 SATA controller [0106]: Intel Corporation 82801IR/IO/IH
(ICH9R/DO/DH) 6 port SATA AHCI Controller [8086:2922] (rev 02)
(prog-if 01 [AHCI 1.0])
Subsystem: Red Hat, Inc Device [1af4:1100]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort-
TAbort- MAbort- SERR- PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 24
Region 4: I/O ports at c040 [size=32]
Region 5: Memory at febf1000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=0004
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: fee0100c  Data: 4149
Kernel driver in use: ahci
00: 86 80 22 29 07 04 10 00 02 01 06 01 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 41 c0 00 00 00 10 bf fe 00 00 00 00 f4 1a 00 11
30: 00 00 00 00 a8 00 00 00 00 00 00 00 0b 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 00 81 00 0c 10 e0 fe 00 00 00 00 49 41 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00
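
The guess made earlier in this thread (writing 0x13 to the index register and reading back 0x10) can be sketched as below. This only models the assumption that the low two bits of the index are read-only zero; it has not been checked against real ICH-9 hardware, and the IdpState type and function names are hypothetical, not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical model of the index register of an index-data pair:
 * the low two bits are assumed read-only zero, so the stored index
 * is always dword-aligned. */
typedef struct {
    uint32_t index;
} IdpState;

static void idp_write_index(IdpState *s, uint32_t val)
{
    s->index = val & ~UINT32_C(3);
}

static uint32_t idp_read_index(const IdpState *s)
{
    return s->index;
}
```

A real device would presumably also clamp or ignore indices beyond the end of the register set; that part is left out because the thread does not establish what the hardware does.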

Re: [Qemu-devel] [PATCH 11/31] block/raw: Fix to forward method bdrv_media_changed()

2011-09-08 Thread Kevin Wolf

On 07.09.2011 21:25, Blue Swirl wrote:
 On Tue, Sep 6, 2011 at 3:39 PM, Kevin Wolf kw...@redhat.com wrote:
 From: Markus Armbruster arm...@redhat.com

 Block driver raw forwards most methods to the underlying block
 driver.  However, it doesn't implement method bdrv_media_changed().
 Makes bdrv_media_changed() always return -ENOTSUP.

 I believe -fda /dev/fd0 gives you raw over host_floppy, and disk
 change detection (fdc register 7 bit 7) is broken.  Testing my theory
 requires a computer museum, though.
 
 Or software to emulate ancient computers? Maybe such software could be
 already available to you? ;-)

In general, such software is buggy. ;-)

Kevin

Re: [Qemu-devel] [RFC PATCH 4/5] VFIO: Add PCI device support

2011-09-08 Thread Avi Kivity


On 09/07/2011 09:55 PM, Konrad Rzeszutek Wilk wrote:

If you don't know what to do here, say N.
  +
  +menuconfig VFIO_PCI
  + bool "VFIO support for PCI devices"
  + depends on VFIO && PCI
  + default y if X86

Hahah.. And Linus is going to tear your behind for that.

Default should be 'n'


It depends on VFIO, which presumably defaults to n.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

Re: [Qemu-devel] [PATCH 0/2] improve qemu-img conversion performance

2011-09-08 Thread Stefan Hajnoczi

On Wed, Sep 07, 2011 at 04:06:51PM -0700, Yehuda Sadeh wrote:
 The following set of patches improve the qemu-img conversion process
 performance. When using a higher latency backend, small writes have a
 severe impact on the time it takes to do image conversion. 
 We switch to using async writes, and we avoid splitting writes due to
 holes when the holes are small enough.
 
 Yehuda Sadeh (2):
   qemu-img: async write to block device when converting image
   qemu-img: don't skip writing small holes
 
  qemu-img.c |   34 +++---
  1 files changed, 27 insertions(+), 7 deletions(-)
 
 -- 
 2.7.5.1

This has nothing to do with the patch itself, but I've been curious
about the existence of both a QEMU and a Linux kernel rbd block driver.

The I/O latency with qemu-img has been an issue for rbd users.  But they
have the option of using the Linux kernel rbd block driver, where
qemu-img can take advantage of the page cache instead of performing
direct I/O.

Does this mean you intend to support both QEMU block/rbd.c and Linux
drivers/block/rbd.c?  As a user I would go with the Linux kernel driver
instead of the QEMU block driver because it offers page cache and host
block device features.  On the other hand a userspace driver is nice
because it does not require privileges.

Stefan
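
For readers following the cover letter quoted above, the "don't skip writing small holes" idea can be sketched roughly as follows. This is not the code from the patch: the helper write_run_length(), the per-sector is_data map and the min_hole threshold are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: is_data[i] is nonzero if sector i holds data.
 * Return how many sectors to write in one request when runs of fewer
 * than min_hole zero sectors are written through instead of splitting
 * the request. */
static size_t write_run_length(const unsigned char *is_data, size_t n,
                               size_t min_hole)
{
    size_t end = 0;   /* one past the last data sector included so far */
    size_t i = 0;

    while (i < n) {
        size_t hole = 0;

        if (is_data[i]) {
            end = ++i;
            continue;
        }
        /* Measure the hole that starts at sector i. */
        while (i + hole < n && !is_data[i + hole]) {
            hole++;
        }
        if (hole >= min_hole || i + hole == n) {
            break;      /* large or trailing hole: end the write here */
        }
        i += hole;      /* small hole: absorb it and keep going */
    }
    return end;
}
```

A larger threshold produces fewer, larger writes at the cost of writing some zeros, which is the trade-off a high-latency backend would favour.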

Re: [Qemu-devel] Suspicious code in qcow2.

2011-09-08 Thread Kevin Wolf

On 07.09.2011 18:42, Frediano Ziglio wrote:
 Actually it does not cause problems but this code order seems a bit
 wrong to me (block/qcow2-cluster.c)
 
 
 QLIST_INSERT_HEAD(&s->cluster_allocs, m, next_in_flight);
 
 /* allocate a new cluster */
 
 cluster_offset = qcow2_alloc_clusters(bs, nb_clusters * s->cluster_size);
 if (cluster_offset < 0) {
     ret = cluster_offset;
     goto fail;
 }
 
 /* save info needed for meta data update */
 m->offset = offset;
 m->n_start = n_start;
 m->nb_clusters = nb_clusters;
 
 
 The current metadata (m) gets inserted into the cluster allocation list with
 nb_clusters set to 0. The loop over cluster_allocs ignores this metadata
 (it waits for this allocation or just skips it, depending on dirty data
 in the offset field). Currently all of this happens under a CoMutex, so it
 does not cause problems, but if qcow2_alloc_clusters ever unlocks the mutex,
 two overlapping updates could be inserted into cluster_allocs. Perhaps a
 better order would be
 
 
 /* save info needed for meta data update */
 m->offset = offset;
 m->n_start = n_start;
 m->nb_clusters = nb_clusters;
 
 QLIST_INSERT_HEAD(&s->cluster_allocs, m, next_in_flight);
 
 /* allocate a new cluster */
 
 cluster_offset = qcow2_alloc_clusters(bs, nb_clusters * s->cluster_size);
 if (cluster_offset < 0) {
     ret = cluster_offset;
     goto fail;
 }
 
 
 (tested successfully with iotests suite)

Yes, that makes sense. Once we run this code without holding the
CoMutex, this becomes a real problem. Care to send a patch?

Kevin
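
The ordering concern in this thread can be illustrated outside qcow2 with a minimal sketch. The struct and function names below are hypothetical stand-ins and not the qcow2 data structures; offsets are counted in cluster units for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the in-flight allocation metadata. */
struct alloc_meta {
    uint64_t offset;        /* first cluster covered */
    int nb_clusters;        /* number of clusters covered */
    struct alloc_meta *next;
};

/* Fill in every field first, then link the entry into the list, so a
 * reader walking the list never sees nb_clusters == 0 for a live entry. */
static void publish_alloc(struct alloc_meta **head, struct alloc_meta *m,
                          uint64_t offset, int nb_clusters)
{
    m->offset = offset;
    m->nb_clusters = nb_clusters;
    m->next = *head;
    *head = m;
}

/* Return nonzero if [offset, offset + n) overlaps any in-flight entry. */
static int overlaps_in_flight(const struct alloc_meta *head,
                              uint64_t offset, int n)
{
    for (; head; head = head->next) {
        if (offset < head->offset + head->nb_clusters &&
            head->offset < offset + n) {
            return 1;
        }
    }
    return 0;
}
```

If an entry were linked in while nb_clusters was still 0, overlaps_in_flight() would treat it as an empty range and miss the conflict, which is the situation described above once the code no longer runs entirely under the lock.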

Re: [Qemu-devel] [PATCH 3/5] tcg/s390: Only one call output register needed for 64 bit hosts

2011-09-08 Thread Richard Henderson

On 09/07/2011 12:32 PM, Alexander Graf wrote:
 
 On 05.09.2011, at 11:07, Stefan Weil wrote:
 
 The second register is only needed for 32 bit hosts.
 
 Looks sane to me. Richard, mind to ack?
 
 
 Alex
 

 Cc: Alexander Graf ag...@suse.de
 Signed-off-by: Stefan Weil w...@mail.berlios.de

Acked-by: Richard Henderson r...@twiddle.net


r~

[Qemu-devel] [PATCH] target-i386: Compute all flag data inside %cl != 0 test.

2011-09-08 Thread Richard Henderson

The (x << (cl - 1)) quantity is only used if CL != 0.  Move the
computation of that quantity nearer its use.

This avoids the creation of undefined TCG operations when the
constant propagation optimization proves that CL == 0, and thus
CL-1 is outside the range [0-wordsize).

Signed-off-by: Richard Henderson r...@twiddle.net
---
 target-i386/translate.c |   72 ---
 1 files changed, 43 insertions(+), 29 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index ccef381..b966762 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -1406,70 +1406,84 @@ static void gen_shift_rm_T1(DisasContext *s, int ot, 
int op1,
 {
 target_ulong mask;
 int shift_label;
-TCGv t0, t1;
+TCGv t0, t1, t2;
 
-if (ot == OT_QUAD)
+if (ot == OT_QUAD) {
 mask = 0x3f;
-else
+} else {
 mask = 0x1f;
+}
 
 /* load */
-if (op1 == OR_TMP0)
+if (op1 == OR_TMP0) {
 gen_op_ld_T0_A0(ot + s->mem_index);
-else
+} else {
 gen_op_mov_TN_reg(ot, 0, op1);
+}
 
-tcg_gen_andi_tl(cpu_T[1], cpu_T[1], mask);
+t0 = tcg_temp_local_new();
+t1 = tcg_temp_local_new();
+t2 = tcg_temp_local_new();
 
-tcg_gen_addi_tl(cpu_tmp5, cpu_T[1], -1);
+tcg_gen_andi_tl(t2, cpu_T[1], mask);
 
 if (is_right) {
 if (is_arith) {
 gen_exts(ot, cpu_T[0]);
-tcg_gen_sar_tl(cpu_T3, cpu_T[0], cpu_tmp5);
-tcg_gen_sar_tl(cpu_T[0], cpu_T[0], cpu_T[1]);
+tcg_gen_mov_tl(t0, cpu_T[0]);
+tcg_gen_sar_tl(cpu_T[0], cpu_T[0], t2);
 } else {
 gen_extu(ot, cpu_T[0]);
-tcg_gen_shr_tl(cpu_T3, cpu_T[0], cpu_tmp5);
-tcg_gen_shr_tl(cpu_T[0], cpu_T[0], cpu_T[1]);
+tcg_gen_mov_tl(t0, cpu_T[0]);
+tcg_gen_shr_tl(cpu_T[0], cpu_T[0], t2);
 }
 } else {
-tcg_gen_shl_tl(cpu_T3, cpu_T[0], cpu_tmp5);
-tcg_gen_shl_tl(cpu_T[0], cpu_T[0], cpu_T[1]);
+tcg_gen_mov_tl(t0, cpu_T[0]);
+tcg_gen_shl_tl(cpu_T[0], cpu_T[0], t2);
 }
 
 /* store */
-if (op1 == OR_TMP0)
+if (op1 == OR_TMP0) {
 gen_op_st_T0_A0(ot + s->mem_index);
-else
+} else {
 gen_op_mov_reg_T0(ot, op1);
-
+}
+
 /* update eflags if non zero shift */
-if (s->cc_op != CC_OP_DYNAMIC)
+if (s->cc_op != CC_OP_DYNAMIC) {
 gen_op_set_cc_op(s->cc_op);
+}
 
-/* XXX: inefficient */
-t0 = tcg_temp_local_new();
-t1 = tcg_temp_local_new();
-
-tcg_gen_mov_tl(t0, cpu_T[0]);
-tcg_gen_mov_tl(t1, cpu_T3);
+tcg_gen_mov_tl(t1, cpu_T[0]);
 
 shift_label = gen_new_label();
-tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_T[1], 0, shift_label);
+tcg_gen_brcondi_tl(TCG_COND_EQ, t2, 0, shift_label);
 
-tcg_gen_mov_tl(cpu_cc_src, t1);
-tcg_gen_mov_tl(cpu_cc_dst, t0);
-if (is_right)
+tcg_gen_addi_tl(t2, t2, -1);
+tcg_gen_mov_tl(cpu_cc_dst, t1);
+
+if (is_right) {
+if (is_arith) {
+tcg_gen_sar_tl(cpu_cc_src, t0, t2);
+} else {
+tcg_gen_shr_tl(cpu_cc_src, t0, t2);
+}
+} else {
+tcg_gen_shl_tl(cpu_cc_src, t0, t2);
+}
+
+if (is_right) {
 tcg_gen_movi_i32(cpu_cc_op, CC_OP_SARB + ot);
-else
+} else {
 tcg_gen_movi_i32(cpu_cc_op, CC_OP_SHLB + ot);
-
+}
+
 gen_set_label(shift_label);
 s->cc_op = CC_OP_DYNAMIC; /* cannot predict flags after */
 
 tcg_temp_free(t0);
 tcg_temp_free(t1);
+tcg_temp_free(t2);
 }
 
 static void gen_shift_rm_im(DisasContext *s, int ot, int op1, int op2,
-- 
1.7.4.4
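
As a plain-C illustration of the commit message above (not of the TCG code itself), the sketch below computes the shifted-by-(count - 1) value only inside the count != 0 branch, so a shift by -1 is never formed. The struct shl_flags type and the function shl32() are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Hypothetical flag state, loosely modelled on a cc_src/cc_dst pair. */
struct shl_flags {
    uint32_t cc_src; /* x << (count - 1), valid only if updated */
    uint32_t cc_dst; /* shifted result */
    int updated;     /* nonzero if the flags were recomputed */
};

static uint32_t shl32(uint32_t x, unsigned cl, struct shl_flags *f)
{
    unsigned count = cl & 0x1f;   /* same masking as the 32-bit case */
    uint32_t result = x << count;

    f->updated = 0;
    if (count != 0) {
        /* count - 1 is in [0, 31] here, so the shift is well defined. */
        f->cc_src = x << (count - 1);
        f->cc_dst = result;
        f->updated = 1;
    }
    return result;
}
```

When the count is known to be zero, the flag computation disappears entirely instead of leaving behind a shift by an out-of-range amount.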

Re: [Qemu-devel] [PATCH] [SPARC] Gdbstub: Fix back-trace on SPARC32

2011-09-08 Thread Fabien Chouteau

On 07/09/2011 21:02, Blue Swirl wrote:
 On Tue, Sep 6, 2011 at 10:38 AM, Fabien Chouteau chout...@adacore.com wrote:
 On 05/09/2011 21:22, Blue Swirl wrote:
 On Mon, Sep 5, 2011 at 9:33 AM, Fabien Chouteau chout...@adacore.com 
 wrote:
 On 03/09/2011 11:25, Blue Swirl wrote:
 On Thu, Sep 1, 2011 at 2:17 PM, Fabien Chouteau chout...@adacore.com 
 wrote:
 Gdb expects all register windows to be flushed to RAM, which is not the case
 in Qemu. Therefore the back-trace generation doesn't work. This patch adds a
 function to handle reads/writes in stack frames as if windows were flushed.

 Signed-off-by: Fabien Chouteau chout...@adacore.com
 ---
  gdbstub.c |   10 --
  target-sparc/cpu.h|7 
  target-sparc/helper.c |   85 
 +
  3 files changed, 99 insertions(+), 3 deletions(-)

 diff --git a/gdbstub.c b/gdbstub.c
 index 3b87c27..85d5ad7 100644
 --- a/gdbstub.c
 +++ b/gdbstub.c
 @@ -41,6 +41,9 @@
  #include "qemu_socket.h"
  #include "kvm.h"

 +#ifndef TARGET_CPU_MEMORY_RW_DEBUG
 +#define TARGET_CPU_MEMORY_RW_DEBUG cpu_memory_rw_debug

 These days, inline functions are preferred over macros.


 This is to allow target-specific implementation of the function.

 That can be done with inline functions too.

 OK, how do you do that?
 
 #ifndef TARGET_CPU_MEMORY_RW_DEBUG
 int target_memory_rw_debug(CPUState *env, target_ulong addr,
   uint8_t *buf, int len, int is_write)
 {
 return cpu_memory_rw_debug(env, addr, buf, len, is_write);
 }
 #else
 /* target_memory_rw_debug() defined in cpu.h */
 #endif
 

OK, understood.


 +#endif

  enum {
 GDB_SIGNAL_0 = 0,
 @@ -2013,7 +2016,7 @@ static int gdb_handle_packet(GDBState *s, const 
 char *line_buf)
 if (*p == ',')
 p++;
 len = strtoull(p, NULL, 16);
 -if (cpu_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 0) != 0) {
 +if (TARGET_CPU_MEMORY_RW_DEBUG(s->g_cpu, addr, mem_buf, len, 0) != 0) {

 cpu_memory_rw_debug() could remain unwrapped with a generic function
 like cpu_gdb_sync_memory() which gdbstub should explicitly call.

 Maybe the lazy condition codes etc. could be handled in similar way,
 cpu_gdb_sync_registers().


 Excuse me, I don't understand here.

 cpu_gdb_{read,write}_register needs to force calculation of lazy
 condition codes. On Sparc this is handled by cpu_get_psr(), so it is
 not explicit.

 I still don't understand your point. Do you suggest a cpu_gdb_sync_memory()
 that will flush register windows?

 Not really, but never mind.

 +
 +/* Gdb expects all registers windows to be flushed in ram. This 
 function handles
 + * reads/writes in stack frames as if windows were flushed. We assume 
 that the
 + * sparc ABI is followed.
 + */

 We can't assume that, it depends on what we are executing (BIOS, OS,
 even application).

 Well, maybe the statement is too strong. The ABI is required to get a valid
 result. Gdb cannot build back-traces if the ABI is not followed anyway.

 But if the ABI assumption happens to be wrong (for example registers
 contain random values), memory may be corrupted because this would
 happily use whatever the registers contain.

 This cannot corrupt memory, the point is to read/write in registers instead 
 of
 memory.

 Sorry, I misread a part of the patch, guest memory is not written
 unlike I mistakenly assumed (simple register to memory flush).
 However, wrong ABI assumption may instead corrupt the registers.

 Another way to fix this would be that GDB would tell QEMU what ABI to
 use for flushing. But how would one tell GDB about a non-standard ABI?

 For user emulators we can make ABI assumptions, there similar patch
 could make sense. But system emulators can't assume anything about the
 guest OS, it could be Linux, *BSD, a commercial OS or even a toy OS.

 I think all of these kernels follow the SPARC32 ABI, and if they don't Gdb
 cannot handle them anyway.

 This solution covers 99% of the problem.

 As is, it's not 100% correct and the failure case is destructive. But
 would it make sense if the registers were not touched on write? Then
 to GDB the windows would appear as if flushed to memory, but like real
 hardware the registers would not automatically get updated from memory
 if it's changed by GDB. I don't think corruption would be possible in
 that case, though GDB (or the user) could get temporarily confused if
 a read from memory location would not return its true value.


I think this might be the best compromise. So I'll just handle reads in
register windows.

 BTW, cpu_cwp_inc() is called but there is no effort to restore CWP afterward.


The CWP in CPUState is never modified by cpu_cwp_inc().

Version 2 is on its way...

Regards,

-- 
Fabien Chouteau

[Qemu-devel] CFQ I/O starvation problem triggered by RHEL6.0 KVM guests

2011-09-08 Thread Takuya Yoshikawa

This is a report of strange cfq behaviour which seems to be triggered by
QEMU posix aio threads.

Host environment:
  OS: RHEL6.0 KVM/qemu-kvm (with no patch applied)
  IO scheduler: cfq (with the default parameters)

On the host, we were running 3 linux guests to see if I/O from these guests
would be handled fairly by host; each guest did dd write with oflag=direct.

Guest virtual disk:
  We used a host local disk which had 3 partitions, and each guest was
  allocated one of these as dd write target.

So our test was for checking if cfq could keep fairness for the 3 guests
who shared the same disk.

The result (strange starvation):
  Sometimes, one guest dominated cfq for more than 10sec and requests from
  other guests were not handled at all during that time.

Below is the blktrace log, which shows that a request to (8,27) in cfq2068S (*1)
is not handled at all while cfq2095S and cfq2067S, which hold requests to
(8,26), are being handled alternately.

*1) WS 104920578 + 64

Question:
  I guess that cfq_close_cooperator() was being called in an unusual manner.
  If so, do you think that cfq is responsible for keeping fairness for this
  kind of unusual write requests?

Note:
  With RHEL6.1, this problem could not be triggered. But I guess that was due to
  QEMU's block layer updates.

Thanks,
Takuya

--- blktrace log ---
  8,16   0     2010     0.275081840  2068  A  WS 104920578 + 64 <- (8,27) 0
  8,16   0     2011     0.275082180  2068  Q  WS 104920578 + 64 [qemu-kvm]
  8,16   0        0     0.275091369     0  m   N cfq2068S / alloced
  8,16   0     2012     0.275091909  2068  G  WS 104920578 + 64 [qemu-kvm]
  8,16   0     2013     0.275093352  2068  P   N [qemu-kvm]
  8,16   0     2014     0.275094059  2068  I   W 104920578 + 64 [qemu-kvm]
  8,16   0        0     0.275094887     0  m   N cfq2068S / insert_request
  8,16   0        0     0.275095742     0  m   N cfq2068S / add_to_rr
  8,16   0     2015     0.275097194  2068  U   N [qemu-kvm] 1
  8,16   2     2073     0.275189462  2095  A  WS 83979688 + 64 <- (8,26) 4
  8,16   2     2074     0.275189989  2095  Q  WS 83979688 + 64 [qemu-kvm]
  8,16   2     2075     0.275192534  2095  G  WS 83979688 + 64 [qemu-kvm]
  8,16   2     2076     0.275193909  2095  I   W 83979688 + 64 [qemu-kvm]
  8,16   2        0     0.275195609     0  m   N cfq2095S / insert_request
  8,16   2        0     0.275196404     0  m   N cfq2095S / add_to_rr
  8,16   2        0     0.275198004     0  m   N cfq2095S / preempt
  8,16   2        0     0.275198688     0  m   N cfq2067S / slice expired t=1
  8,16   2        0     0.275199631     0  m   N cfq2067S / resid=100
  8,16   2        0     0.275200413     0  m   N cfq2067S / sl_used=1
  8,16   2        0     0.275201521     0  m   N / served: vt=1671968768 min_vt=1671966720
  8,16   2        0     0.275202323     0  m   N cfq2067S / del_from_rr
  8,16   2        0     0.275204263     0  m   N cfq2095S / set_active wl_prio:0 wl_type:2
  8,16   2        0     0.275205131     0  m   N cfq2095S / fifo=(null)
  8,16   2        0     0.275205851     0  m   N cfq2095S / dispatch_insert
  8,16   2        0     0.275207121     0  m   N cfq2095S / dispatched a request
  8,16   2        0     0.275207873     0  m   N cfq2095S / activate rq, drv=1
  8,16   2     2077     0.275208198  2095  D   W 83979688 + 64 [qemu-kvm]
  8,16   2     2078     0.275269567  2095  U   N [qemu-kvm] 2
  8,16   4      836     0.275483550     0  C   W 83979688 + 64 [0]
  8,16   4        0     0.275496745     0  m   N cfq2095S / complete rqnoidle 0
  8,16   4        0     0.275497825     0  m   N cfq2095S / set_slice=100
  8,16   4        0     0.275499512     0  m   N cfq2095S / arm_idle: 8
  8,16   4        0     0.275499862     0  m   N cfq schedule dispatch
  8,16   6       85     0.275626195  2067  A  WS 83979752 + 64 <- (8,26) 40064
  8,16   6       86     0.275626598  2067  Q  WS 83979752 + 64 [qemu-kvm]
  8,16   6       87     0.275628580  2067  G  WS 83979752 + 64 [qemu-kvm]
  8,16   6       88     0.275629630  2067  I   W 83979752 + 64 [qemu-kvm]
  8,16   6        0     0.275631047     0  m   N cfq2067S / insert_request
  8,16   6        0     0.275631730     0  m   N cfq2067S / add_to_rr
  8,16   6        0     0.275633567     0  m   N cfq2067S / preempt
  8,16   6        0     0.275634275     0  m   N cfq2095S / slice expired t=1
  8,16   6        0     0.275635285     0  m   N cfq2095S / resid=100
  8,16   6        0     0.275635985     0  m   N cfq2095S / sl_used=1
  8,16   6        0     0.275636882     0  m   N / served: vt=1671970816 min_vt=1671968768
  8,16   6        0     0.275637585     0  m   N cfq2095S / del_from_rr
  8,16   6        0     0.275639382     0  m   N cfq2067S / set_active wl_prio:0 wl_type:2
  8,16   6        0     0.275640222     0  m   N cfq2067S / fifo=(null)
  8,16   6        0     0.275640809     0  m   N cfq2067S / dispatch_insert
  8,16   6        0     0.275641929     0  m   N cfq2067S / dispatched a request

Re: [Qemu-devel] [PATCH 1/2] build: fix missing trace dep on GENERATED_HEADERS

2011-09-08 Thread Stefan Hajnoczi

On Thu, Sep 8, 2011 at 12:40 AM, Michael Roth mdr...@linux.vnet.ibm.com wrote:
 fc764105 added an include for qemu-common.h to trace/control.h, which
 made all users of this header file dependent on GENERATED_HEADERS. Since
 it's used by pretty much all the trace backends now, make trace-obj-y
 dependent on GENERATED_HEADERS.

 Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
 ---
  Makefile.objs |    2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Re: [Qemu-devel] [PATCH] pci: add standard bridge device

2011-09-08 Thread Gerd Hoffmann


  Hi,


I modify the code like this, and the PCI_INTERRUPT_LINE register is set, and I 
can bind
it to uio_pci_generic:



--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -575,6 +575,8 @@ static int pci_bios_init_root_regions(u32 start, u32 end)



  pci_bios_init_bus_bases(&busses[0]);
-pci_bios_map_device_in_bus(0 /* host bus */);
+for (bus = 0; bus <= MaxPCIBus; bus++) {
+pci_bios_map_device_in_bus(bus /* host bus */);


No.  pci_bios_map_device_in_bus goes down recursively when it finds a 
bridge, so it should cover all devices already.



-pci_bios_init_device_in_bus(0 /* host bus */);
+pci_bios_init_device_in_bus(bus /* host bus */);
+}


That is correct.  Can be done easier though by just not limiting device 
initialization to a specific bus like in the attached patch.  Does that 
one work for you?
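
To illustrate the difference between the two approaches, here is a minimal sketch of per-bus versus unrestricted device initialization. The names (demo_pci_device, demo_init_in_bus, demo_init_all) are hypothetical stand-ins and not SeaBIOS code; the sketch only assumes, as the bus filter being removed does, that the device list is sorted by bus number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device entry; only the fields this
 * sketch needs. Not the SeaBIOS struct pci_device. */
struct demo_pci_device {
    unsigned bus;       /* bus number the device sits on */
    int initialized;    /* set once the device has been set up */
};

/* Per-bus behaviour: only devices on one bus are initialized.
 * Assumes devs[] is sorted by bus number. */
static size_t demo_init_in_bus(struct demo_pci_device *devs, size_t n,
                               unsigned bus)
{
    size_t count = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].bus < bus)
            continue;           /* not there yet */
        if (devs[i].bus > bus)
            break;              /* sorted list: nothing left on this bus */
        devs[i].initialized = 1;
        count++;
    }
    return count;
}

/* Unrestricted behaviour: every device is initialized, including those
 * behind a bridge on a secondary bus. */
static size_t demo_init_all(struct demo_pci_device *devs, size_t n)
{
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        devs[i].initialized = 1;
    return n;
}
```

With two devices on bus 0 and one on bus 1, demo_init_in_bus(devs, 3, 0) leaves the bus 1 device untouched, while demo_init_all(devs, 3) reaches it as well.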


cheers,
  Gerd
diff --git a/src/pciinit.c b/src/pciinit.c
index 597c8ea..676e35e 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -45,7 +45,7 @@ static struct pci_bus {
 } *busses;
 static int busses_count;
 
-static void pci_bios_init_device_in_bus(int bus);
+static void pci_bios_init_device_all(void);
 static void pci_bios_check_device_in_bus(int bus);
 static void pci_bios_init_bus_bases(struct pci_bus *bus);
 static void pci_bios_map_device_in_bus(int bus);
@@ -254,15 +254,10 @@ static void pci_bios_init_device(struct pci_device *pci)
 pci_init_device(pci_device_tbl, pci, NULL);
 }
 
-static void pci_bios_init_device_in_bus(int bus)
+static void pci_bios_init_device_all(void)
 {
 struct pci_device *pci;
 foreachpci(pci) {
-u8 pci_bus = pci_bdf_to_bus(pci->bdf);
-if (pci_bus < bus)
-continue;
-if (pci_bus > bus)
-break;
 pci_bios_init_device(pci);
 }
 }
@@ -605,7 +600,7 @@ pci_setup(void)
 pci_bios_init_bus_bases(&busses[0]);
 pci_bios_map_device_in_bus(0 /* host bus */);
 
-pci_bios_init_device_in_bus(0 /* host bus */);
+pci_bios_init_device_all();
 
 struct pci_device *pci;
 foreachpci(pci) {

Re: [Qemu-devel] [PATCH 1/3] rbd: allow client id to be specified in config string

2011-09-08 Thread Stefan Hajnoczi

On Wed, Sep 7, 2011 at 5:28 PM, Sage Weil s...@newdream.net wrote:
 Allow the client id to be specified in the config string via 'id=' so that
 users can control who they authenticate as.  Currently they are stuck with
 the default ('admin').  This is necessary for anyone using authentication
 in their environment.

 Signed-off-by: Sage Weil s...@newdream.net
 ---
  block/rbd.c |   52 
  1 files changed, 44 insertions(+), 8 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Re: [Qemu-devel] [PATCH 2/3] rbd: clean up, fix style

2011-09-08 Thread Stefan Hajnoczi

On Wed, Sep 7, 2011 at 5:28 PM, Sage Weil s...@newdream.net wrote:
 No assignment in condition.  Remove duplicate ret < 0 check.

 Signed-off-by: Sage Weil s...@newdream.net
 ---
  block/rbd.c |   17 -
  1 files changed, 8 insertions(+), 9 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Re: [Qemu-devel] [PATCH 3/3] rbd: fix leak in qemu_rbd_open failure paths

2011-09-08 Thread Stefan Hajnoczi

On Wed, Sep 7, 2011 at 5:28 PM, Sage Weil s...@newdream.net wrote:
 Fix leak of s->snap in failure path.  Simplify error paths for the whole
 function.

 Reported-by: Stefan Hajnoczi stefa...@gmail.com
 Signed-off-by: Sage Weil s...@newdream.net
 ---
  block/rbd.c |   28 +---
  1 files changed, 13 insertions(+), 15 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

[Qemu-devel] Consistent crasher on reboot / shutdown

2011-09-08 Thread Avi Kivity


#0  0x7f0f5da502c4 in malloc_consolidate.part.3 () from /lib64/libc.so.6
#1  0x7f0f5da51274 in _int_malloc () from /lib64/libc.so.6
#2  0x7f0f5da53b00 in malloc () from /lib64/libc.so.6
#3  0x0066cfec in malloc_and_trace (n_bytes=4120) at 
/build/home/tlv/akivity/qemu/vl.c:2154

#4  0x7f0f5fbdc1de in ?? () from /lib64/libglib-2.0.so.0
#5  0x7f0f5fbdc668 in g_malloc0 () from /lib64/libglib-2.0.so.0
#6  0x004f7e67 in qdict_new () at qdict.c:38
#7  0x005e31e8 in handle_user_command (mon=0x2fccc30, 
cmdline=0x2fcd0b0 help) at /build/home/tlv/akivity/qemu/monitor.c:4532
#8  0x005e4ed1 in monitor_command_cb (mon=0x2fccc30, 
cmdline=0x2fcd0b0 help, opaque=0x0) at 
/build/home/tlv/akivity/qemu/monitor.c:5190
#9  0x0050b04c in readline_handle_byte (rs=0x2fcd0b0, ch=10) at 
readline.c:370
#10 0x005e4e15 in monitor_read (opaque=0x2fccc30, 
buf=0x7fff0a383860 \n, size=1) at 
/build/home/tlv/akivity/qemu/monitor.c:5176
#11 0x004f8ff9 in qemu_chr_be_write (s=0x2e53ae0, 
buf=0x7fff0a383860 \n, len=1) at qemu-char.c:163
#12 0x004fcb57 in tcp_chr_read (opaque=0x2e53ae0) at 
qemu-char.c:2106
#13 0x0046d87d in qemu_iohandler_poll (readfds=0x7fff0a384920, 
writefds=0x7fff0a3849a0, xfds=0x7fff0a384a20, ret=1) at iohandler.c:175
#14 0x0066b4cc in main_loop_wait (nonblocking=0) at 
/build/home/tlv/akivity/qemu/vl.c:1438
#15 0x0066b59c in main_loop () at 
/build/home/tlv/akivity/qemu/vl.c:1469
#16 0x006701e5 in main (argc=23, argv=0x7fff0a384ee8, 
envp=0x7fff0a384fa8) at /build/home/tlv/akivity/qemu/vl.c:3491


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] pci: add standard bridge device

2011-09-08 Thread Wen Congyang

At 09/08/2011 05:43 PM, Gerd Hoffmann Write:
   Hi,
 
 I modify the code like this, and the PCI_INTERRUPT_LINE register is
 set, and I can bind
 it to uio_pci_generic:
 
 --- a/src/pciinit.c
 +++ b/src/pciinit.c
 @@ -575,6 +575,8 @@ static int pci_bios_init_root_regions(u32 start,
 u32 end)
 
   pci_bios_init_bus_bases(&busses[0]);
 -pci_bios_map_device_in_bus(0 /* host bus */);
 +for (bus = 0; bus <= MaxPCIBus; bus++) {
 +pci_bios_map_device_in_bus(bus /* host bus */);
 
 No.  pci_bios_map_device_in_bus goes down recursively when it finds a
 bridge, so it should cover all devices already.

Yes, pci_bios_map_device() goes down recursively.

 
 -pci_bios_init_device_in_bus(0 /* host bus */);
 +pci_bios_init_device_in_bus(bus /* host bus */);
 +}
 
 That is correct.  Can be done easier though by just not limiting device
 initialization to a specific bus like in the attached patch.  Does that
 one work for you?

I tested it, and it works for me.

Thanks
Wen Congyang

 
 cheers,
   Gerd

Re: [Qemu-devel] [PATCH 1/3] rbd: allow client id to be specified in config string

2011-09-08 Thread Kevin Wolf

Am 07.09.2011 18:28, schrieb Sage Weil:
 Allow the client id to be specified in the config string via 'id=' so that
 users can control who they authenticate as.  Currently they are stuck with
 the default ('admin').  This is necessary for anyone using authentication
 in their environment.
 
 Signed-off-by: Sage Weil s...@newdream.net

Thanks, applied all to the block branch.

Kevin

Re: [Qemu-devel] [PATCH -V2] iohandler: update qemu_fd_set_handler to work with null call back arg

2011-09-08 Thread Avi Kivity


On 09/07/2011 09:44 PM, Anthony Liguori wrote:


I think this is a bit more complicated than is really needed.  Here's 
what I came up with which also fixes another bug where the io channel 
could be freed twice.  I stumbled across this via a very strange 
failure scenario.  Avi, it might be worth trying this patch to see if 
it fixes your problem too.


Right now, I've got more than just one problem.



One thing that I found challenging debugging this, coroutines make 
valgrind very unhappy.  Is it possible that we could have a command 
line switch to fall back to the thread based coroutines so to make 
things more valgrind friendly?


How is valgrind even aware of coroutines?  Unless it doesn't implement 
makecontext correctly, it shouldn't even be aware of them.
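
For context on the question above, here is a minimal sketch of the kind of stack switch a ucontext-based coroutine performs. It is an illustration using the POSIX getcontext/makecontext/swapcontext calls only, with hypothetical names (demo_coroutine, demo_run_coroutine); it is not QEMU's coroutine implementation and makes no claim about how valgrind handles it.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t demo_main_ctx;   /* saved context of the caller */
static ucontext_t demo_co_ctx;     /* saved context of the coroutine */
static int demo_steps;             /* how far the coroutine has run */

/* Coroutine body: runs on its own stack, yields once, then finishes. */
static void demo_coroutine(void)
{
    demo_steps = 1;
    swapcontext(&demo_co_ctx, &demo_main_ctx);   /* yield to the caller */
    demo_steps = 2;
    /* Returning here resumes uc_link, i.e. demo_main_ctx. */
}

/* Enters the coroutine, resumes it after its yield, and returns the
 * final step count. */
static int demo_run_coroutine(void)
{
    static char stack[64 * 1024];   /* separate stack for the coroutine */
    int after_first;

    demo_steps = 0;
    getcontext(&demo_co_ctx);
    demo_co_ctx.uc_stack.ss_sp = stack;
    demo_co_ctx.uc_stack.ss_size = sizeof(stack);
    demo_co_ctx.uc_link = &demo_main_ctx;
    makecontext(&demo_co_ctx, demo_coroutine, 0);

    swapcontext(&demo_main_ctx, &demo_co_ctx);   /* enter the coroutine */
    after_first = demo_steps;                    /* coroutine has yielded */
    swapcontext(&demo_main_ctx, &demo_co_ctx);   /* resume after the yield */

    assert(after_first == 1);
    return demo_steps;
}
```

The point of the sketch is that the stack pointer moves to a buffer the program allocated itself, which is the part a tool that tracks stacks has to cope with.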



--
error compiling committee.c: too many arguments to function

[Qemu-devel] [PATCH v8 0/4] The intro of QEMU block I/O throttling

2011-09-08 Thread Zhi Yong Wu

The main goal of the patch is to effectively cap the disk I/O speed or counts 
of one single VM. It is only a draft, so it unavoidably has some drawbacks; if 
you catch them, please let me know.

The patch mainly introduces one block I/O throttling algorithm, one timer 
and one block queue for each drive with I/O limits enabled.

When a block request comes in, the throttling algorithm checks whether its 
I/O rate or count exceeds the limits; if so, the request is enqueued on the 
block queue, and the timer later dispatches the queued requests.

The available features are as follows:
(1) global bps limit
   -drive bps=xxx        in bytes/s
(2) bps limit for reads only
   -drive bps_rd=xxx     in bytes/s
(3) bps limit for writes only
   -drive bps_wr=xxx     in bytes/s
(4) global iops limit
   -drive iops=xxx       in ios/s
(5) iops limit for reads only
   -drive iops_rd=xxx    in ios/s
(6) iops limit for writes only
   -drive iops_wr=xxx    in ios/s
(7) a combination of limits
   -drive bps=xxx,iops=xxx

Known Limitations:
(1) #1 cannot coexist with #2 or #3
(2) #4 cannot coexist with #5 or #6
(3) When bps/iops limits are set to a small value such as 511 bytes/s, 
the VM will hang. We are considering how to handle this scenario.

Changes since code V7:
  fix the build per patch based on stefan's comments.

Zhi Yong Wu (4):
  block: add the command line support
  block: add the block queue support
  block: add block timer and throttling algorithm
  qmp/hmp: add block_set_io_throttle

 v7: Mainly simplify the block queue.
 Adjust codes based on stefan's comments.

 v6: Mainly fix the aio callback issue for block queue.
 Adjust codes based on Ram Pai's comments.

 v5: add qmp/hmp support.
 Adjust the codes based on stefan's comments
 qmp/hmp: add block_set_io_throttle

 v4: fix memory leaking based on ryan's feedback.

 v3: Added the code for extending slice time, and modified the method to 
compute wait time for the timer.

 v2: The codes V2 for QEMU disk I/O limits.
 Modified the codes mainly based on stefan's comments.

 v1: Submit the codes for QEMU disk I/O limits.
 Only a code draft.


 Makefile.objs     |    2 +-
 block.c           |  344 +++--
 block.h           |    6 +-
 block/blk-queue.c |  201 +++
 block/blk-queue.h |   59 +
 block_int.h       |   30 +
 blockdev.c        |   98 +++
 blockdev.h        |    2 +
 hmp-commands.hx   |   15 +++
 qemu-config.c     |   24 
 qemu-options.hx   |    1 +
 qerror.c          |    4 +
 qerror.h          |    3 +
 qmp-commands.hx   |   52 -
 14 files changed, 825 insertions(+), 16 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

-- 
1.7.6

[Qemu-devel] [PATCH v8 2/4] block: add the command line support

2011-09-08 Thread Zhi Yong Wu

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 block.c         |   59 +++
 block.h         |    5 
 block_int.h     |    3 ++
 blockdev.c      |   29 +++
 qemu-config.c   |   24 ++
 qemu-options.hx |    1 +
 6 files changed, 121 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 43742b7..cd75183 100644
--- a/block.c
+++ b/block.c
@@ -104,6 +104,57 @@ int is_windows_drive(const char *filename)
 }
 #endif
 
+/* throttling disk I/O limits */
+void bdrv_io_limits_disable(BlockDriverState *bs)
+{
+    bs->io_limits_enabled = false;
+
+    if (bs->block_queue) {
+        qemu_block_queue_flush(bs->block_queue);
+        qemu_del_block_queue(bs->block_queue);
+        bs->block_queue = NULL;
+    }
+
+    if (bs->block_timer) {
+        qemu_del_timer(bs->block_timer);
+        qemu_free_timer(bs->block_timer);
+        bs->block_timer = NULL;
+    }
+
+    bs->slice_start = 0;
+
+    bs->slice_end   = 0;
+}
+
+static void bdrv_block_timer(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BlockQueue *queue    = bs->block_queue;
+
+    qemu_block_queue_flush(queue);
+}
+
+void bdrv_io_limits_enable(BlockDriverState *bs)
+{
+    bs->block_queue = qemu_new_block_queue();
+    bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+
+    bs->slice_start = qemu_get_clock_ns(vm_clock);
+
+    bs->slice_end   = bs->slice_start + BLOCK_IO_SLICE_TIME;
+}
+
+bool bdrv_io_limits_enabled(BlockDriverState *bs)
+{
+    BlockIOLimit *io_limits = &bs->io_limits;
+    return io_limits->bps[BLOCK_IO_LIMIT_READ]
+         || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
+         || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
+         || io_limits->iops[BLOCK_IO_LIMIT_READ]
+         || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
+         || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
+}
+
 /* check if the path starts with protocol: */
 static int path_has_protocol(const char *path)
 {
@@ -1453,6 +1504,14 @@ void bdrv_get_geometry_hint(BlockDriverState *bs,
     *psecs = bs->secs;
 }
 
+/* throttling disk io limits */
+void bdrv_set_io_limits(BlockDriverState *bs,
+                        BlockIOLimit *io_limits)
+{
+    bs->io_limits = *io_limits;
+    bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
+}
+
 /* Recognize floppy formats */
 typedef struct FDFormat {
 FDriveType drive;
diff --git a/block.h b/block.h
index 3ac0b94..a3e69db 100644
--- a/block.h
+++ b/block.h
@@ -58,6 +58,11 @@ void bdrv_info(Monitor *mon, QObject **ret_data);
 void bdrv_stats_print(Monitor *mon, const QObject *data);
 void bdrv_info_stats(Monitor *mon, QObject **ret_data);
 
+/* disk I/O throttling */
+void bdrv_io_limits_enable(BlockDriverState *bs);
+void bdrv_io_limits_disable(BlockDriverState *bs);
+bool bdrv_io_limits_enabled(BlockDriverState *bs);
+
 void bdrv_init(void);
 void bdrv_init_with_whitelist(void);
 BlockDriver *bdrv_find_protocol(const char *filename);
diff --git a/block_int.h b/block_int.h
index 201e635..368c776 100644
--- a/block_int.h
+++ b/block_int.h
@@ -257,6 +257,9 @@ void qemu_aio_release(void *p);
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
+void bdrv_set_io_limits(BlockDriverState *bs,
+BlockIOLimit *io_limits);
+
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
diff --git a/blockdev.c b/blockdev.c
index 2602591..619ae9f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -236,6 +236,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 int on_read_error, on_write_error;
 const char *devaddr;
 DriveInfo *dinfo;
+BlockIOLimit io_limits;
 int snapshot = 0;
 int ret;
 
@@ -354,6 +355,31 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 }
 }
 
+    /* disk I/O throttling */
+    io_limits.bps[BLOCK_IO_LIMIT_TOTAL]  =
+                           qemu_opt_get_number(opts, "bps", 0);
+    io_limits.bps[BLOCK_IO_LIMIT_READ]   =
+                           qemu_opt_get_number(opts, "bps_rd", 0);
+    io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
+                           qemu_opt_get_number(opts, "bps_wr", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
+                           qemu_opt_get_number(opts, "iops", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_READ]  =
+                           qemu_opt_get_number(opts, "iops_rd", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
+                           qemu_opt_get_number(opts, "iops_wr", 0);
+
+    if (((io_limits.bps[BLOCK_IO_LIMIT_TOTAL] != 0)
+            && ((io_limits.bps[BLOCK_IO_LIMIT_READ] != 0)
+            || (io_limits.bps[BLOCK_IO_LIMIT_WRITE] != 0)))
+            || ((io_limits.iops[BLOCK_IO_LIMIT_TOTAL] != 0)
+            && ((io_limits.iops[BLOCK_IO_LIMIT_READ] != 0)
+            || (io_limits.iops[BLOCK_IO_LIMIT_WRITE] != 0)))) {
+        error_report("bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr)"
+                     "cannot be used at the same time");
+return

[Qemu-devel] [PATCH v8 1/4] block: add the block queue support

2011-09-08 Thread Zhi Yong Wu

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 Makefile.objs     |    2 +-
 block/blk-queue.c |  201 +
 block/blk-queue.h |   59 
 block_int.h       |   27 +++
 4 files changed, 288 insertions(+), 1 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

diff --git a/Makefile.objs b/Makefile.objs
index 26b885b..5dcf456 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -33,7 +33,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o 
dmg.o bochs.o vpc.o vv
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
qcow2-cache.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
-block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
+block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o 
blk-queue.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block/blk-queue.c b/block/blk-queue.c
new file mode 100644
index 000..adef497
--- /dev/null
+++ b/block/blk-queue.c
@@ -0,0 +1,201 @@
+/*
+ * QEMU System Emulator queue definition for block layer
+ *
+ * Copyright (c) IBM, Corp. 2011
+ *
+ * Authors:
+ *  Zhi Yong Wu  wu...@linux.vnet.ibm.com
+ *  Stefan Hajnoczi stefa...@linux.vnet.ibm.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block_int.h"
+#include "block/blk-queue.h"
+#include "qemu-common.h"
+
+/* The APIs for block request queue on qemu block layer.
+ */
+
+struct BlockQueueAIOCB {
+    BlockDriverAIOCB common;
+    QTAILQ_ENTRY(BlockQueueAIOCB) entry;
+    BlockRequestHandler *handler;
+    BlockDriverAIOCB *real_acb;
+
+    int64_t sector_num;
+    QEMUIOVector *qiov;
+    int nb_sectors;
+};
+
+typedef struct BlockQueueAIOCB BlockQueueAIOCB;
+
+struct BlockQueue {
+    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
+    bool req_failed;
+    bool flushing;
+};
+
+static void qemu_block_queue_dequeue(BlockQueue *queue,
+                                     BlockQueueAIOCB *request)
+{
+    BlockQueueAIOCB *req;
+
+    assert(queue);
+    while (!QTAILQ_EMPTY(&queue->requests)) {
+        req = QTAILQ_FIRST(&queue->requests);
+        if (req == request) {
+            QTAILQ_REMOVE(&queue->requests, req, entry);
+            break;
+        }
+    }
+}
+
+static void qemu_block_queue_cancel(BlockDriverAIOCB *acb)
+{
+    BlockQueueAIOCB *request = container_of(acb, BlockQueueAIOCB, common);
+    if (request->real_acb) {
+        bdrv_aio_cancel(request->real_acb);
+    } else {
+        assert(request->common.bs->block_queue);
+        qemu_block_queue_dequeue(request->common.bs->block_queue,
+                                 request);
+    }
+
+    qemu_aio_release(request);
+}
+
+static AIOPool block_queue_pool = {
+    .aiocb_size = sizeof(struct BlockQueueAIOCB),
+    .cancel     = qemu_block_queue_cancel,
+};
+
+static void qemu_block_queue_callback(void *opaque, int ret)
+{
+    BlockQueueAIOCB *acb = opaque;
+
+    if (acb->common.cb) {
+        acb->common.cb(acb->common.opaque, ret);
+    }
+
+    qemu_aio_release(acb);
+}
+
+BlockQueue *qemu_new_block_queue(void)
+{
+    BlockQueue *queue;
+
+    queue = g_malloc0(sizeof(BlockQueue));
+
+    QTAILQ_INIT(&queue->requests);
+
+    queue->req_failed = true;
+    queue->flushing   = false;
+
+    return queue;
+}
+
+void qemu_del_block_queue(BlockQueue *queue)
+{
+    BlockQueueAIOCB *request, *next;
+
+    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
+        QTAILQ_REMOVE(&queue->requests, request, entry);
+        qemu_aio_release(request);
+    }
+
+    g_free(queue);
+}
+
+BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
+BlockDriverState *bs,
+BlockRequestHandler *handler,
+int64_t

[Qemu-devel] [PATCH v8 3/4] block: add block timer and throttling algorithm

2011-09-08 Thread Zhi Yong Wu

Note:
 1.) When bps/iops limits are set to a small value such as 511 
bytes/s, the VM will hang. We are considering how to handle this scenario.
 2.) When the dd command is issued in the guest with its bs option set to a 
large value such as bs=1024K, the resulting speed will be slightly higher than 
the limits.

For these problems, if you have any good ideas, please let us know. :)

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 block.c |  259 ---
 block.h |    1 -
 2 files changed, 248 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index cd75183..c08fde8 100644
--- a/block.c
+++ b/block.c
@@ -30,6 +30,9 @@
 #include "qemu-objects.h"
 #include "qemu-coroutine.h"
 
+#include "qemu-timer.h"
+#include "block/blk-queue.h"
+
 #ifdef CONFIG_BSD
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -72,6 +75,13 @@ static int coroutine_fn bdrv_co_writev_em(BlockDriverState 
*bs,
  QEMUIOVector *iov);
 static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);
 
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+bool is_write, double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+bool is_write, int64_t *wait);
+
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
 QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -745,6 +755,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, 
int flags,
         bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
 
+    /* throttling disk I/O limits */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_enable(bs);
+    }
+
 return 0;
 
 unlink_and_fail:
@@ -783,6 +798,18 @@ void bdrv_close(BlockDriverState *bs)
         if (bs->change_cb)
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
+
+    /* throttling disk I/O limits */
+    if (bs->block_queue) {
+        qemu_del_block_queue(bs->block_queue);
+        bs->block_queue = NULL;
+    }
+
+    if (bs->block_timer) {
+        qemu_del_timer(bs->block_timer);
+        qemu_free_timer(bs->block_timer);
+        bs->block_timer = NULL;
+    }
 }
 
 void bdrv_close_all(void)
@@ -2341,16 +2368,48 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, 
int64_t sector_num,
  BlockDriverCompletionFunc *cb, void *opaque)
 {
     BlockDriver *drv = bs->drv;
-
+    BlockDriverAIOCB *ret;
+    int64_t wait_time = -1;
+    printf("sector_num=%ld, nb_sectors=%d\n", sector_num, nb_sectors);
     trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
 
-    if (!drv)
-        return NULL;
-    if (bdrv_check_request(bs, sector_num, nb_sectors))
+    if (!drv || bdrv_check_request(bs, sector_num, nb_sectors)) {
         return NULL;
+    }
+
+    /* throttling disk read I/O */
+    if (bs->io_limits_enabled) {
+        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
+            ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
+                           sector_num, qiov, nb_sectors, cb, opaque);
+            printf("wait_time=%ld\n", wait_time);
+            if (wait_time != -1) {
+                printf("reset block timer\n");
+                qemu_mod_timer(bs->block_timer,
+                               wait_time + qemu_get_clock_ns(vm_clock));
+            }
+
+            if (ret) {
+                printf("ori ret is not null\n");
+            } else {
+                printf("ori ret is null\n");
+            }
+
+            return ret;
+        }
+    }
 
-    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
+    ret =  drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
                                cb, opaque);
+    if (ret) {
+        if (bs->io_limits_enabled) {
+            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
+                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
+        }
+    }
+
+    return ret;
 }
 
 typedef struct BlockCompleteData {
@@ -2396,15 +2455,14 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, 
int64_t sector_num,
     BlockDriver *drv = bs->drv;
     BlockDriverAIOCB *ret;
     BlockCompleteData *blk_cb_data;
+    int64_t wait_time = -1;
 
     trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
 
-    if (!drv)
-        return NULL;
-    if (bs->read_only)
-        return NULL;
-    if (bdrv_check_request(bs, sector_num, nb_sectors))
+    if (!drv || bs->read_only
+        || bdrv_check_request(bs, sector_num, nb_sectors)) {
         return NULL;
+    }
 
     if (bs->dirty_bitmap) {
         blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
@@ -2413,13 +2471,32 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, 
int64_t sector_num,
 opaque = blk_cb_data;
 }
 
+

[Qemu-devel] [PATCH v8 4/4] qmp/hmp: add block_set_io_throttle

2011-09-08 Thread Zhi Yong Wu

The patch introduces one new command, block_set_io_throttle. If you have a 
better idea for its usage syntax, please let me know.

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 block.c         |   26 +++-
 blockdev.c      |   69 +++
 blockdev.h      |    2 +
 hmp-commands.hx |   15 
 qerror.c        |    4 +++
 qerror.h        |    3 ++
 qmp-commands.hx |   52 -
 7 files changed, 168 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index c08fde8..1d3f067 100644
--- a/block.c
+++ b/block.c
@@ -1938,6 +1938,16 @@ static void bdrv_print_dict(QObject *obj, void *opaque)
                         qdict_get_bool(qdict, "ro"),
                         qdict_get_str(qdict, "drv"),
                         qdict_get_bool(qdict, "encrypted"));
+
+            monitor_printf(mon, " bps=%" PRId64 " bps_rd=%" PRId64
+                                " bps_wr=%" PRId64 " iops=%" PRId64
+                                " iops_rd=%" PRId64 " iops_wr=%" PRId64,
+                                qdict_get_int(qdict, "bps"),
+                                qdict_get_int(qdict, "bps_rd"),
+                                qdict_get_int(qdict, "bps_wr"),
+                                qdict_get_int(qdict, "iops"),
+                                qdict_get_int(qdict, "iops_rd"),
+                                qdict_get_int(qdict, "iops_wr"));
         } else {
             monitor_printf(mon, " [not inserted]");
         }
@@ -1970,10 +1980,22 @@ void bdrv_info(Monitor *mon, QObject **ret_data)
             QDict *bs_dict = qobject_to_qdict(bs_obj);
 
             obj = qobject_from_jsonf("{ 'file': %s, 'ro': %i, 'drv': %s, "
-                                     "'encrypted': %i }",
+                                     "'encrypted': %i, "
+                                     "'bps': %" PRId64 ","
+                                     "'bps_rd': %" PRId64 ","
+                                     "'bps_wr': %" PRId64 ","
+                                     "'iops': %" PRId64 ","
+                                     "'iops_rd': %" PRId64 ","
+                                     "'iops_wr': %" PRId64 "}",
                                      bs->filename, bs->read_only,
                                      bs->drv->format_name,
-                                     bdrv_is_encrypted(bs));
+                                     bdrv_is_encrypted(bs),
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL],
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_READ],
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_READ],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE]);
             if (bs->backing_file[0] != '\0') {
                 QDict *qdict = qobject_to_qdict(obj);
                 qdict_put(qdict, "backing_file",
diff --git a/blockdev.c b/blockdev.c
index 619ae9f..7f5c4df 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -747,6 +747,75 @@ int do_change_block(Monitor *mon, const char *device,
     return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
 
+/* throttling disk I/O limits */
+int do_block_set_io_throttle(Monitor *mon,
+                       const QDict *qdict, QObject **ret_data)
+{
+    const char *devname = qdict_get_str(qdict, "device");
+    uint64_t bps        = qdict_get_try_int(qdict, "bps", -1);
+    uint64_t bps_rd     = qdict_get_try_int(qdict, "bps_rd", -1);
+    uint64_t bps_wr     = qdict_get_try_int(qdict, "bps_wr", -1);
+    uint64_t iops       = qdict_get_try_int(qdict, "iops", -1);
+    uint64_t iops_rd    = qdict_get_try_int(qdict, "iops_rd", -1);
+    uint64_t iops_wr    = qdict_get_try_int(qdict, "iops_wr", -1);
+    BlockDriverState *bs;
+
+    bs = bdrv_find(devname);
+    if (!bs) {
+        qerror_report(QERR_DEVICE_NOT_FOUND, devname);
+        return -1;
+    }
+
+    if ((bps == -1) && (bps_rd == -1) && (bps_wr == -1)
+        && (iops == -1) && (iops_rd == -1) && (iops_wr == -1)) {
+        qerror_report(QERR_MISSING_PARAMETER,
+                      "bps/bps_rd/bps_wr/iops/iops_rd/iops_wr");
+        return -1;
+    }
+
+    if (((bps != -1) && ((bps_rd != -1) || (bps_wr != -1)))
+        || ((iops != -1) && ((iops_rd != -1) || (iops_wr != -1)))) {
+        qerror_report(QERR_INVALID_PARAMETER_COMBINATION);
+        return -1;
+    }
+
+    if (bps != -1) {
+        bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL] = bps;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_READ]  = 0;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE] = 0;
+    }
+
+    if ((bps_rd != -1) || (bps_wr != -1)) {
+        bs->io_limits.bps[BLOCK_IO_LIMIT_READ]   =
+           (bps_rd == -1) ? bs->io_limits.bps[BLOCK_IO_LIMIT_READ] : bps_rd;
+        bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
+           (bps_wr == -1) ? bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE] : bps_wr;
+

Re: [Qemu-devel] [PATCH -V2] iohandler: update qemu_fd_set_handler to work with null call back arg

2011-09-08 Thread Kevin Wolf

Am 08.09.2011 12:07, schrieb Avi Kivity:
 On 09/07/2011 09:44 PM, Anthony Liguori wrote:

 I think this is a bit more complicated than is really needed.  Here's 
 what I came up with which also fixes another bug where the io channel 
 could be freed twice.  I stumbled across this via a very strange 
 failure scenario.  Avi, it might be worth trying this patch to see if 
 it fixes your problem too.
 
 Right now, I've got more than just one problem.
 

 One thing that I found challenging debugging this, coroutines make 
 valgrind very unhappy.  Is it possible that we could have a command 
 line switch to fall back to the thread based coroutines so to make 
 things more valgrind friendly?
 
 How is valgrind even aware of coroutines?  Unless it doesn't implement 
 makecontext correctly, it shouldn't even be aware of them.

The F15 valgrind complains three times that the program is switching
stacks, but then it shuts up and just works as normal.

Kevin

Re: [Qemu-devel] [PATCH V8 10/14] Encrypt state blobs using AES CBC encryption

2011-09-08 Thread Michael S. Tsirkin

On Wed, Sep 07, 2011 at 08:16:27PM -0400, Stefan Berger wrote:
 On 09/07/2011 02:55 PM, Michael S. Tsirkin wrote:
 On Thu, Sep 01, 2011 at 10:23:51PM -0400, Stefan Berger wrote:
 An additional 'layer' for reading and writing the blobs to the underlying
 block storage is added. This layer encrypts the blobs for writing if a 
 key is
 available. Similarly it decrypts the blobs after reading.
 So a couple of further thoughts:
 1. Raw storage should work too, and with e.g. NFS migration will be fine, 
 right?
 So I'd say it's worth supporting.
 NFS via shared storage, yes, but not migration via Qemu's block
 migration mechanism. If snapshotting was supposed to be a feature to
 support then that's only possible via block storage (QCoW2 in
 particular).

As disk has the same limitation, that sounds fine.
Let the user decide whether snapshotting is needed,
same as disk.

 Adding plain file support to the TPM code so it can store its 3
 blobs into adds quite a bit of complexity to the code. The command
 line parameter that previously pointed to QCoW2 image file would
 probably have to point to a directory where files for the 3 blobs
 can be written into. Besides that, snapshotting would actually have
 to be prevented maybe through registering a (fake) file of other
 than QCoW2 type since the plain TPM files won't handle snapshotting
 correctly, either, and QEMU pretty much would have to be prevented
 from doing snapshotting at all. Maybe there's an API for this, but I
 don't know. Though why create this additional complexity? I don't
 mind relaxing the requirement of using a QCoW2 image and allowing
 for example RAW images (that then automatically prevent the
 snapshotting from happening) but the same code I now have would work
 for writing the blobs into it the single file.

Right. Write all blobs into a single file at different
offsets, or something.

 2. File backed nvram is interesting outside tpm.
 For example,vpd and chassis number for pci, eeprom emulation for network 
  cards.
 Using a file per device might be inconvenient though.
 So please think of a format and API that will allow sections
 for use by different devices.
 Also here 'snapshotting' is the most 'demanding' feature of QEMU I
 would say. Snapshotting isn't easily supported outside of the block
 layer from what I understand. Once you are tied to the block layer
 you end up having to use images and those don't grow quite well. So
 other devices wanting to use those type of devices would need to
 know what the worst case sizes are for writing their state into --
 unless an image format is created that can grow.
 
 As for the format: Ideally all devices could write into one file,
 right? That would at least prevent too many files besides the VM's
 image file from floating around which presumably makes image
 management easier. Following the above, you add up all the worst
 case sizes the individual devices may need for their blobs and
 create an image with that capacity. Then you need some form of a
 (primitive?) directory that lets you write blobs into that storage.
 Assuming there were well defined names for those devices one could
 say for example store this blobs under the name
 'tpm-permanent-state' and later on load it under that name. The
 possible size of the directory would have to be considered as
 well... I do something like that for the TPM where I have up to 3
 such blobs that I store.
 
 The bad thing about the above is of course the need to know what the
 sum of all the worst case sizes is.

A typical usecase I know about has prepared vpd/eeprom content.
We'll typically need a tool to get binary blobs and put that into the
file image.  That tool can do the necessary math.
We could also integrate this into qemu-img if we like.
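
The math such a tool would do can be sketched as below. This is only an
illustration under stated assumptions: blob_entry, layout_blobs and
SECTOR_SIZE are hypothetical names, sector 0 is assumed to hold the
directory, and each blob is assumed to start on a sector boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u   /* assumed sector size */

/* Hypothetical directory entry: one named blob with a worst-case size. */
struct blob_entry {
    const char *name;      /* e.g. "tpm-permanent-state" */
    uint32_t max_size;     /* worst-case size in bytes */
    uint64_t offset;       /* byte offset in the image, filled in below */
};

/*
 * Assign each blob a sector-aligned offset after the directory sector
 * and return the total image size needed, in bytes.
 */
static uint64_t layout_blobs(struct blob_entry *entries, size_t count)
{
    uint64_t next = SECTOR_SIZE;    /* sector 0 is reserved for the directory */
    size_t i;

    for (i = 0; i < count; i++) {
        uint64_t sectors =
            ((uint64_t)entries[i].max_size + SECTOR_SIZE - 1) / SECTOR_SIZE;

        entries[i].offset = next;
        next += sectors * SECTOR_SIZE;
    }
    return next;
}
```

The return value is the capacity the image has to be created with; the
per-entry offsets are what the directory would record.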

 So a growable image format would
 be quite good to have. I haven't followed the conversations much,
 but is that something QCoW3 would support?

I don't follow - does TPM need a growable image format? Why?
Hardware typically has fixed amount of memory :)

 Crazy idea: Is there a filesystem that one could use and mount a
 filesystem onto (some) sectors of an image? Again, the best format
 right now is QCoW2 for this (due to snapshotting support) where one
 would have to be able to mount a filesystem onto the current
 snapshot's available sectors. Then at least the handling of blobs
 would become a lot easier. Though I doubt this would be possible
 without custom code and lots of development.

Hmm, libguestfs can do all kind of smart stuff.
But we don't want qemu to depend on that.

 3. Home-grown file formats give us enough trouble in migration.
 Could this use one of the variants of ASN.1?
 There are portable libraries to read/write that, even.
 
 I am not sure what 'this' refers to. What I am doing with the TPM is
 writing 3 independent blobs at certain offset into the QCoW2 block
 file. A directory in the first sector holds the offsets, sizes and
 crc32's of these (unencrypted)

Re: [Qemu-devel] [PATCH V8 08/14] Introduce file lock for the block layer

2011-09-08 Thread Michael S. Tsirkin

On Wed, Sep 07, 2011 at 08:31:45PM -0400, Stefan Berger wrote:
 On 09/07/2011 02:49 PM, Michael S. Tsirkin wrote:
 On Wed, Sep 07, 2011 at 12:08:22PM -0400, Stefan Berger wrote:
 On 09/07/2011 11:16 AM, Michael S. Tsirkin wrote:
 
 So it's a bug in the code then?
 
  From what I saw, yes. Migration is not complete until the passwords
 had been entered. Though the requirement for a correct password
 wasn't there before because Qemu just couldn't know which password
 is correct since it doesn't know what content in a VM image is
 correct -- just using the wrong key gives you content but it's of
 course not understandable.
 OK, we covered that on irc - the issue is that monitor
 on destination is inactive until migration is complete.
 Yes we need to fix that but no, it's not a tpm only
 problem.
 I think the TPM is the first device that needs that password before
 the migration switch-over happens.

Yes. But we want the monitor on dest for other reasons,
for example to be able to check migration status.

 The reason is that the TPM
 emulation layer needs the password/key to read the data from the
 QCoW2 to be able to initialize a device BEFORE the Qemu on the
 source side terminates thinking that the migration went ok.
 Previously an OS image that was 'opened' with the wrong key/password
 would probably cause the OS to not be able to read the data and
 hopefully not destroy it by writing wrongly encrypted data into it
 -- QEMU wasn't able to detect whether the QCoW2 encryption key was
 correct or not since it has no knowledge of the organization of the
 data inside the image.
 [[You'd need some form of reference point, like a sector that when
 written to a hash is calculated over its data and that hash is
 written into a location in the image. If a wrong key is given and
 the sector's hash ends up being != the reference hash you could say
 the key is wrong.]]
 Similar problems occur when you start a
 VM with an encrypted QCoW2 image. The monitor will prompt you for
 the password and then you start the VM and if the password was wrong
 the OS just won't be able to access the image.
 
 Stefan
 Why can't you verify the password?
 
 I do verify the key/password in the TPM driver. If the driver cannot
 make sense of the contents of the QCoW2 due to wrong key I simply
 put the driver into failure mode. That's all I can do with encrypted
 QCoW2.
 You can return error from init script which will make qemu exit.
 
 I can return an error code when the front- and backend interfaces
 are initialized, but that happens really early and the encryption
 key entered through the monitor is not available at this point.
 
 I also don't get a notification about when the key was entered. In
 case of QCoW2 encryption (and migration) I delay initialization
 until very late, basically when the VM accesses the tpm tis hardware
 emulation layer again (needs to be done this way I think to support
 block migration where I cannot even access the block device early on
 at all).
 So it in the loadvm callback. This happens when guest is
 stopped on source, so no need for locks.
 Two bigger cases here:
 
 1) Encryption key passed via command line:
  - Migration with shared storage: When Qemu is initializing on
 the destination side I try to access the QCoW2 file. I do some basic
 tests to check whether a key was needed but none was given or
 whether some of the content could be read to confirm a valid key.
 This is done really early on during startup of Qemu on the
 destination side while or before actually the memory pages were
 transferred. Graceful termination was easily possible here.
  - Migration using block migration: During initialization I only
 see an empty QCoW2 file (created by libvirt). I terminate at this
 point and do another initialization later on which basically comes
 down to initializing upon access of the TPM TIS interface. At this
 point graceful termination wasn't possible anymore. There may be a
 possibility to do this in the loadvm callback, assuming block
 migration at that point has already finished, which I am not quite
 sure. Though along with case 2) below this would then end up in 3
 different times for initialization of the emulation layer.
 
 2) QCoW2 encryption:
  - This maps to the last case above. Also here graceful
 termination wasn't possible.
 
 As for the loadvm callback: I have a note in my code that in case of
 QCoW2 encryption the key is not available, yet. So I even have to
 defer initialization further. In this case Qemu on the source
 machine will have terminated.
 
 Stefan
 The point is to decrypt when you start running on dest.
 When the monitor gets the key for the QCoW2 it would have to invoke
 the TPM driver code (callback) and return an error code if the key
 was found to be wrong and display an error message that libvirt
 could react to.
 Afaik none of the callback and error display logic
 is in place.
 Is that something we could add later as an improvement?

What we

Re: [Qemu-devel] [PATCH] pci: add standard bridge device

2011-09-08 Thread Michael S. Tsirkin

On Thu, Sep 08, 2011 at 05:58:12PM +0800, Wen Congyang wrote:
 At 09/08/2011 05:43 PM, Gerd Hoffmann Write:
Hi,
  
  I modify the code like this, and the PCI_INTERRUPT_LINE register is
  set, and I can bind
  it to uio_pci_generic:
  
  --- a/src/pciinit.c
  +++ b/src/pciinit.c
  @@ -575,6 +575,8 @@ static int pci_bios_init_root_regions(u32 start,
  u32 end)
  
pci_bios_init_bus_bases(busses[0]);
  -pci_bios_map_device_in_bus(0 /* host bus */);
  +for (bus = 0; bus <= MaxPCIBus; bus++) {
  +pci_bios_map_device_in_bus(bus /* host bus */);
  
  No.  pci_bios_map_device_in_bus goes down recursively when it finds a
  bridge, so it should cover all devices already.
 
 Yes, pci_bios_map_device() goes down recursively.

The value seems to be wrong though, I think.
It seems to simply use the interrupt pin as array index.
Instead, each bridge should map interrupts as follows:

/* Mapping mandated by PCI-to-PCI Bridge architecture specification,
 * revision 1.2 */
/* Table 9-1: Interrupt Binding for Devices Behind a Bridge */
static int pci_bridge_dev_map_irq_fn(PCIDevice *dev, int irq_num)
{
    return (irq_num + PCI_SLOT(dev->devfn)) % PCI_NUM_PINS;
}

until we get to the host bridge.
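
A minimal sketch of that walk, applied once per bridge until the host bus
is reached. The struct and the function names here are hypothetical and
only model what the walk needs; pins are numbered 0..3 for INTA..INTD, and
PCI_SLOT is defined the usual way (bits 7:3 of devfn).

```c
#include <assert.h>
#include <stddef.h>

#define PCI_NUM_PINS 4
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)

/* Hypothetical minimal device model, not a BIOS or QEMU structure. */
struct pci_dev_sketch {
    unsigned devfn;                       /* slot << 3 | function */
    const struct pci_dev_sketch *bridge;  /* parent bridge, NULL on host bus */
};

/* One step of the bridge mapping quoted above. */
static int bridge_map_irq(const struct pci_dev_sketch *dev, int pin)
{
    return (pin + PCI_SLOT(dev->devfn)) % PCI_NUM_PINS;
}

/*
 * Swizzle the pin once for every bridge between the device and the host
 * bus. The result is the pin as seen on the host bus, to be combined with
 * the slot of the topmost bridge for the final host-bus lookup.
 */
static int swizzle_to_host_bus(const struct pci_dev_sketch *dev, int pin)
{
    while (dev->bridge != NULL) {
        pin = bridge_map_irq(dev, pin);
        dev = dev->bridge;
    }
    return pin;
}
```

For a device in slot 1 directly behind one bridge, INTA (0) becomes pin 1
on the host bus; a device that already sits on the host bus keeps its pin.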


  
  -pci_bios_init_device_in_bus(0 /* host bus */);
  +pci_bios_init_device_in_bus(bus /* host bus */);
  +}
  
  That is correct.  Can be done easier though by just not limiting device
  initialization to a specific bus like in the attached patch.  Does that
  one work for you?
 
 I test it, and it works for me.
 
 Thanks
 Wen Congyang
 
  
  cheers,
Gerd

[Qemu-devel] [PATCH] pci: Remove unused pci_reserve_capability

2011-09-08 Thread Jan Kiszka

eepro100 was the last user. Now pci_add_capability is powerful enough.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/pci.c |6 --
 hw/pci.h |2 --
 2 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 57ff7b1..63c346d 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -2028,12 +2028,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
         pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
 }
 
-/* Reserve space for capability at a known offset (to call after load). */
-void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
-{
-    memset(pdev->used + offset, 0xff, size);
-}
-
 uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
 {
     return pci_find_capability_list(pdev, cap_id, NULL);
diff --git a/hw/pci.h b/hw/pci.h
index 391217e..f2dae63 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -209,8 +209,6 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
 
 void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
 
-void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t size);
-
 uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
 
 
-- 
1.7.3.4

[Qemu-devel] [PATCH V2] [SPARC] Gdbstub: Fix back-trace on SPARC32

2011-09-08 Thread Fabien Chouteau

GDB expects all register windows to be flushed to RAM, which is not the case
in QEMU, so back-trace generation doesn't work. This patch adds a function
that handles reads (and only reads) in stack frames as if the windows had
been flushed.

Signed-off-by: Fabien Chouteau chout...@adacore.com
---

V2:
  * only handle reads in stack frames
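
For readers of the archive, the byte-wise register access in the helper
below can be sketched in isolation. This is a hedged illustration, not the
patch's code: window_byte is a hypothetical name, and a shift replaces the
cpu_to_be32()/union pair used in the patch (both pick the same byte of a
big-endian word).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the byte at offset 'off' of a register save area as the guest
 * would see it in memory, i.e. with each 32-bit register stored
 * big-endian. 'regs' points at the first saved register.
 */
static uint8_t window_byte(const uint32_t *regs, unsigned off)
{
    uint32_t value = regs[off >> 2];        /* which register */
    unsigned shift = 8 * (3 - (off & 3));   /* which byte, MSB first */

    return (uint8_t)(value >> shift);
}
```

Offset 0 therefore yields the most significant byte of the first register,
which is what a big-endian SPARC guest would find at the frame pointer.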

 gdbstub.c |   16 +++--
 target-sparc/cpu.h|7 
 target-sparc/helper.c |   84 +
 3 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 3b87c27..7802c5f 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -41,6 +41,15 @@
 #include "qemu_socket.h"
 #include "kvm.h"
 
+#ifndef TARGET_CPU_MEMORY_RW_DEBUG
+static inline int target_memory_rw_debug(CPUState *env, target_ulong addr,
+                                         uint8_t *buf, int len, int is_write)
+{
+    return cpu_memory_rw_debug(env, addr, buf, len, is_write);
+}
+#else
+/* target_memory_rw_debug() defined in cpu.h */
+#endif
 
 enum {
     GDB_SIGNAL_0 = 0,
@@ -2013,7 +2022,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
         if (*p == ',')
             p++;
         len = strtoull(p, NULL, 16);
-        if (cpu_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 0) != 0) {
+        if (target_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 0) != 0) {
             put_packet (s, "E14");
         } else {
             memtohex(buf, mem_buf, len);
@@ -2028,10 +2037,11 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
         if (*p == ':')
             p++;
         hextomem(mem_buf, p, len);
-        if (cpu_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 1) != 0)
+        if (target_memory_rw_debug(s->g_cpu, addr, mem_buf, len, 1) != 0) {
             put_packet(s, "E14");
-        else
+        } else {
             put_packet(s, "OK");
+        }
         break;
     case 'p':
 /* Older gdb are really dumb, and don't use 'g' if 'p' is avaialable.
diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
index 8654f26..19de5ba 100644
--- a/target-sparc/cpu.h
+++ b/target-sparc/cpu.h
@@ -495,6 +495,13 @@ int cpu_sparc_handle_mmu_fault(CPUSPARCState *env1, target_ulong address, int rw
 target_ulong mmu_probe(CPUSPARCState *env, target_ulong address, int mmulev);
 void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUState *env);
 
+#if !defined(TARGET_SPARC64) && !defined(CONFIG_USER_ONLY)
+int target_memory_rw_debug(CPUState *env, target_ulong addr,
+                           uint8_t *buf, int len, int is_write);
+#define TARGET_CPU_MEMORY_RW_DEBUG
+#endif
+
+
 /* translate.c */
 void gen_intermediate_code_init(CPUSPARCState *env);
 
diff --git a/target-sparc/helper.c b/target-sparc/helper.c
index 1fe1f07..c80531a 100644
--- a/target-sparc/helper.c
+++ b/target-sparc/helper.c
@@ -358,6 +358,90 @@ void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUState *env)
     }
 }
 
+#if !defined(CONFIG_USER_ONLY)
+
+/* Gdb expects all registers windows to be flushed in ram. This function handles
+ * reads (and only reads) in stack frames as if windows were flushed. We assume
+ * that the sparc ABI is followed.
+ */
+int target_memory_rw_debug(CPUState *env, target_ulong addr,
+                           uint8_t *buf, int len, int is_write)
+{
+    int i;
+    int len1;
+    int cwp = env->cwp;
+
+    if (!is_write) {
+        for (i = 0; i < env->nwindows; i++) {
+            int off;
+            target_ulong fp = env->regbase[cwp * 16 + 22];
+
+            /* Assume fp == 0 means end of frame.  */
+            if (fp == 0) {
+                break;
+            }
+
+            cwp = cpu_cwp_inc(env, cwp + 1);
+
+            /* Invalid window ? */
+            if (env->wim & (1 << cwp)) {
+                break;
+            }
+
+            /* According to the ABI, the stack is growing downward.  */
+            if (addr + len < fp) {
+                break;
+            }
+
+            /* Not in this frame.  */
+            if (addr > fp + 64) {
+                continue;
+            }
+
+            /* Handle access before this window.  */
+            if (addr < fp) {
+                len1 = fp - addr;
+                if (cpu_memory_rw_debug(env, addr, buf, len1, is_write) != 0) {
+                    return -1;
+                }
+                addr += len1;
+                len -= len1;
+                buf += len1;
+            }
+
+            /* Access byte per byte to registers. Not very efficient but speed
+             * is not critical.
+             */
+            off = addr - fp;
+            len1 = 64 - off;
+
+            if (len1 > len) {
+                len1 = len;
+            }
+
+            for (; len1; len1--) {
+                int reg = cwp * 16 + 8 + (off >> 2);
+                union {
+                    uint32_t v;
+                    uint8_t c[4];
+                } u;
+                u.v = cpu_to_be32(env->regbase[reg]);
+                *buf++ = u.c[off & 3];
+

Re: [Qemu-devel] [PATCH] pci: add standard bridge device

2011-09-08 Thread Wen Congyang

At 09/08/2011 06:42 PM, Michael S. Tsirkin Write:
 On Thu, Sep 08, 2011 at 05:58:12PM +0800, Wen Congyang wrote:
 At 09/08/2011 05:43 PM, Gerd Hoffmann Write:
   Hi,

 I modify the code like this, and the PCI_INTERRUPT_LINE register is
 set, and I can bind
 it to uio_pci_generic:

 --- a/src/pciinit.c
 +++ b/src/pciinit.c
 @@ -575,6 +575,8 @@ static int pci_bios_init_root_regions(u32 start,
 u32 end)

   pci_bios_init_bus_bases(busses[0]);
 -pci_bios_map_device_in_bus(0 /* host bus */);
 +for (bus = 0; bus <= MaxPCIBus; bus++) {
 +pci_bios_map_device_in_bus(bus /* host bus */);

 No.  pci_bios_map_device_in_bus goes down recursively when it finds a
 bridge, so it should cover all devices already.

 Yes, pci_bios_map_device() goes down recursively.
 
 The value seems to be wrong though, I think.
 It seems to simply use the interrupt pin as array index.
 Instead, each bridge should map interrupts as follows:
 
 /* Mapping mandated by PCI-to-PCI Bridge architecture specification,
  * revision 1.2 */
 /* Table 9-1: Interrupt Binding for Devices Behind a Bridge */
 static int pci_bridge_dev_map_irq_fn(PCIDevice *dev, int irq_num)
 {
     return (irq_num + PCI_SLOT(dev->devfn)) % PCI_NUM_PINS;
 }
 
 until we get to the host bridge.

I use gdb to debug, and find that this function is never called.

Thanks
Wen Congyang

 
 

 -pci_bios_init_device_in_bus(0 /* host bus */);
 +pci_bios_init_device_in_bus(bus /* host bus */);
 +}

 That is correct.  Can be done easier though by just not limiting device
 initialization to a specific bus like in the attached patch.  Does that
 one work for you?

 I test it, and it works for me.

 Thanks
 Wen Congyang


 cheers,
   Gerd

Re: [Qemu-devel] [PATCH] pci: add standard bridge device

2011-09-08 Thread Michael S. Tsirkin

On Thu, Sep 08, 2011 at 07:03:10PM +0800, Wen Congyang wrote:
 At 09/08/2011 06:42 PM, Michael S. Tsirkin Write:
  On Thu, Sep 08, 2011 at 05:58:12PM +0800, Wen Congyang wrote:
  At 09/08/2011 05:43 PM, Gerd Hoffmann Write:
Hi,
 
  I modify the code like this, and the PCI_INTERRUPT_LINE register is
  set, and I can bind
  it to uio_pci_generic:
 
  --- a/src/pciinit.c
  +++ b/src/pciinit.c
  @@ -575,6 +575,8 @@ static int pci_bios_init_root_regions(u32 start,
  u32 end)
 
pci_bios_init_bus_bases(busses[0]);
  -pci_bios_map_device_in_bus(0 /* host bus */);
  +for (bus = 0; bus <= MaxPCIBus; bus++) {
  +pci_bios_map_device_in_bus(bus /* host bus */);
 
  No.  pci_bios_map_device_in_bus goes down recursively when it finds a
  bridge, so it should cover all devices already.
 
  Yes, pci_bios_map_device() goes down recursively.
  
  The value seems to be wrong though, I think.
  It seems to simply use the interrupt pin as array index.
  Instead, each bridge should map interrupts as follows:
  
  /* Mapping mandated by PCI-to-PCI Bridge architecture specification,
   * revision 1.2 */
  /* Table 9-1: Interrupt Binding for Devices Behind a Bridge */
  static int pci_bridge_dev_map_irq_fn(PCIDevice *dev, int irq_num)
  {
      return (irq_num + PCI_SLOT(dev->devfn)) % PCI_NUM_PINS;
  }
  
  until we get to the host bridge.
 
 I use gdb to debug, and find that this function is never called.
 
 Thanks
 Wen Congyang

No, I mean that bios must implement this logic.
You don't see it called probably because ivshmem
does not cause interrupts for you.

  
  
 
  -pci_bios_init_device_in_bus(0 /* host bus */);
  +pci_bios_init_device_in_bus(bus /* host bus */);
  +}
 
  That is correct.  Can be done easier though by just not limiting device
  initialization to a specific bus like in the attached patch.  Does that
  one work for you?
 
  I test it, and it works for me.
 
  Thanks
  Wen Congyang
 
 
  cheers,
Gerd

Re: [Qemu-devel] [PATCH v3 00/27] Block layer cleanup fixes

2011-09-08 Thread Kevin Wolf

Am 06.09.2011 18:58, schrieb Markus Armbruster:
 This patch series looks bigger than it is.  All the patches are small
 and hopefully easy to review.
 
 Objectives:
 
 * Push BlockDriverState members locked, tray_open, media_changed into
   device models, where they belong.  Kevin picked the patches pushing
   media_changed from v2, so that part's gone already.
 
 * BlockDriverState member removable is a confusing mess, replace it.
 
 * Improve eject -f.
 
 Also clean up minor messes as they get in the way.
 
 It is based on Kevin's block branch.
 
 Part I: Move tray state to device models
 PATCH 01-05 IDE tray open/closed
 PATCH 06-07 SCSI tray open/closed
 PATCH 08-09 Kill BlockDriverState tray_open
 PATCH 10-11 IDE  SCSI track tray lock
 PATCH 12-14 Kill BlockDriverState locked
 PATCH 15-16 IDE  SCSI tray bug fixes
 PATCH 17IDE migrate tray state
 
 Part II: Miscellaneous
 PATCH 18-19 Replace BlockDriverState removable
 PATCH 20Cover tray open/closed in info block
 PATCH 21-25 Reduce unclean use of block_int.h
 PATCH 26-27 Improve eject -f
 
 Naturally, I want all parts applied.  But I did my best to make
 applying only a prefix workable, too.
 
 Review invited from:
 
 * Kevin, Christoph and Amit reviewed previous versions.
 
 * Hannes ACKed the SCSI stuff in v2.
 
 * Luiz reviewed the patches that affect QMP's query-block.  I renamed
   response member ejected to tray-open since then.
 
 * Paolo commented PATCH 17 `ide/atapi: Preserve tray state on
   migration'.
 
 * Stefano reviewed v1 of PATCH 18 (affects -drive if=xen).
 
 Testing
 
 * Linux installs from CD to empty disk, then boots fine from disk.
 
 * For both IDE and SCSI:
 
   - info block reports tray state correctly
 
   - Guest locking the tray stops eject (without -f) and change
 
   - eject -f; change works even while tray is locked by guest
 
   - Reading /dev/sr0 with tray open behaves as before: IDE closes the
 tray and reads (matches bare metal), SCSI reports no medium
 
   - Tray state is migrated correctly (tested with savevm/loadvm)
 
 * Guest still notices CD media change (IDE only, SCSI doesn't work
   before or after my patches because GESN isn't implemented)
 
 * Migrating ide-cd to older version works when tray is closed and
   unlocked, else fails (tested with savevm/loadvm)
 
 
 v3:
 
 * Rebased to block branch cfc606da
   - Old PATCH 01-05,25,28-34,40 already there, drop
   - a couple of simple conflicts in hw/scsi-disk.c
 
 * Drop old PATCH v2 27 scsi-disk: Preserve tray state on migration,
   because it doesn't make much sense without working SCSI migration.
 
 * Drop old PATCH v2 22 ide/atapi: Avoid physical/virtual tray state
   mismatch, because it has a bug, how to best fix it isn't obvious,
   and it's not essential to this series.  Drop related PATCH v2 20,24,
   too.  I plan to revisit them later.
 
 * Clean up `ide: Use a table to declare which drive kinds accept each
   command' a bit (Blue  Kevin)
 
 * Hannes's advice:
   - Rename some SCSISense variables
 
 * Kevin's advice:
   - Add comments to explain MMC-5 jargon loej
   - Make bdrv_lock_medium() parameter locked bool.
 
 v2:
 
 * Rebased to block branch; non-trivial conflicts:
   - Old PATCH 01-02,06-09 already there, drop
   - `block: Attach non-qdev devices as well':
 - cover new pci_piix3_xen_ide_unplug()
 - hw/nand has been qdefivied, drop hunk
 - onenand_init() changed, rewrite hunk
   - pci_piix3_xen_ide_unplug() needs new PATCH 33.
 
 * Drop old PATCH 18 `scsi-disk: Reject CD-specific SCSI commands to
   disks' because Hannes wants to do it differently, and it's not
   essential to this series.
 
 * Christoph's advice:
   - Rework `ide: Update command code definitions as per ACS-2'
   - Add comment to `ide: Fix ATA command READ to set ATAPI signature
 for CD-ROM'
   - Squash `ide/atapi: Track tray open/close state' and `ide/atapi:
 Switch from BlockDriverState's tray_open to own'
   - Squash `ide/atapi: Track tray locked state' and `ide/atapi: Switch
 from BlockDriverState's locked to own tray_locked'
   - Squash `scsi-disk: Track tray locked state' and `scsi-disk: Switch
 from BlockDriverState's locked to own tray_locked'
   - Drop `block: Move BlockDriverAIOCB  friends from block_int.h to
 block.h'
 
 * Luiz's advice:
   - Change query-block to always include ejected for removable
 devices.  Requires moving `block: Show whether the guest ejected
 the medium in info block', which causes a bunch of conflicts.
 
 * A few cosmetic improvements
 
 
 Markus Armbruster (27):
   ide: Fix ATA command READ to set ATAPI signature for CD-ROM
   ide: Use a table to declare which drive kinds accept each command
   ide: Reject ATA commands specific to drive kinds
   ide/atapi: Clean up misleading name in cmd_start_stop_unit()
   ide/atapi: Track tray open/close state
   scsi-disk: Factor out scsi_disk_emulate_start_stop()
   scsi-disk: Track tray open/close state
   block: Revert entanglement of

[Qemu-devel] [PATCH] qcow2: initialize metadata before inserting in cluster_allocs

2011-09-08 Thread Frediano Ziglio

The QCow2Meta structure was inserted into the list before many of its
fields were initialized. Currently this is not a problem because everything
happens under a lock, but if qcow2_alloc_clusters were to release this lock
in the future, issues could arise.
Initializing the fields before inserting fixes the problem.

Signed-off-by: Frediano Ziglio fredd...@gmail.com
---
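For readers of the archive, the ordering this patch establishes can be
sketched outside qcow2. This is a minimal illustration under stated
assumptions: meta_sketch, alloc_list and publish_meta are hypothetical
names, and a plain singly linked list stands in for QLIST.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the in-flight allocation metadata. */
struct meta_sketch {
    uint64_t offset;
    int n_start;
    int nb_clusters;
    struct meta_sketch *next;
};

/* Hypothetical stand-in for the shared list of in-flight allocations. */
struct alloc_list {
    struct meta_sketch *head;
};

/*
 * Fill in every field first and link the entry into the shared list
 * last, so that anything walking the list never sees a half-initialized
 * entry, even if a lock is dropped right after the insert.
 */
static void publish_meta(struct alloc_list *list, struct meta_sketch *m,
                         uint64_t offset, int n_start, int nb_clusters)
{
    m->offset = offset;
    m->n_start = n_start;
    m->nb_clusters = nb_clusters;

    m->next = list->head;   /* the insert is the final step */
    list->head = m;
}
```

The point of the sketch is only the order of the statements: once the
entry is reachable from the list head, all of its fields are already valid.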
 block/qcow2-cluster.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 113db8b..428b5ad 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -806,6 +806,11 @@ again:
         abort();
     }
 
+    /* save info needed for meta data update */
+    m->offset = offset;
+    m->n_start = n_start;
+    m->nb_clusters = nb_clusters;
+
     QLIST_INSERT_HEAD(&s->cluster_allocs, m, next_in_flight);
 
     /* allocate a new cluster */
@@ -816,11 +821,6 @@ again:
         goto fail;
     }
 
-    /* save info needed for meta data update */
-    m->offset = offset;
-    m->n_start = n_start;
-    m->nb_clusters = nb_clusters;
-
 out:
     ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) l2_table);
     if (ret < 0) {
-- 
1.7.1

Re: [Qemu-devel] [PATCH] qcow2: initialize metadata before inserting in cluster_allocs

2011-09-08 Thread Kevin Wolf

Am 08.09.2011 13:38, schrieb Frediano Ziglio:
 The QCow2Meta structure was inserted into the list before many of its
 fields were initialized. Currently this is not a problem because everything
 happens under a lock, but if qcow2_alloc_clusters were to release this lock
 in the future, issues could arise.
 Initializing the fields before inserting fixes the problem.
 
 Signed-off-by: Frediano Ziglio fredd...@gmail.com

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH V8 10/14] Encrypt state blobs using AES CBC encryption

2011-09-08 Thread Stefan Berger


On 09/08/2011 06:32 AM, Michael S. Tsirkin wrote:

On Wed, Sep 07, 2011 at 08:16:27PM -0400, Stefan Berger wrote:

On 09/07/2011 02:55 PM, Michael S. Tsirkin wrote:

On Thu, Sep 01, 2011 at 10:23:51PM -0400, Stefan Berger wrote:

An additional 'layer' for reading and writing the blobs to the underlying
block storage is added. This layer encrypts the blobs for writing if a key is
available. Similarly it decrypts the blobs after reading.

So a couple of further thoughts:
1. Raw storage should work too, and with e.g. NFS migration will be fine, right?
So I'd say it's worth supporting.

NFS via shared storage, yes, but not migration via Qemu's block
migration mechanism. If snapshotting was supposed to be a feature to
support then that's only possible via block storage (QCoW2 in
particular).

As disk has the same limitation, that sounds fine.
Let the user decide whether snapshotting is needed,
same as disk.


Adding plain file support to the TPM code so it can store its 3
blobs into adds quite a bit of complexity to the code. The command
line parameter that previously pointed to QCoW2 image file would
probably have to point to a directory where files for the 3 blobs
can be written into. Besides that, snapshotting would actually have
to be prevented maybe through registering a (fake) file of other
than QCoW2 type since the plain TPM files won't handle snapshotting
correctly, either, and QEMU pretty much would have to be prevented
from doing snapshotting at all. Maybe there's an API for this, but I
don't know. Though why create this additional complexity? I don't
mind relaxing the requirement of using a QCoW2 image and allowing
for example RAW images (that then automatically prevent the
snapshotting from happening) but the same code I now have would work
for writing the blobs into the single file.

Right. Write all blobs into a single file at different
offsets, or something.


That's exactly what I am doing already. Just that I am doing this with 
Qemu's BlockStorage (bdrv) and writing to sectors rather than seek()ing in 
files. To avoid more complexity I'd rather not introduce more code 
handling plain files but rely on all the image formats that qemu already 
supports and that give features like encryption (QCoW2 only), 
snapshotting (QCoW2 only) and block migration (presumably all of them). 
Plain files offer none of that. Devices that need to write their state 
to persistent storage really have to aim for doing this through Qemu's 
bdrv since they will otherwise be the ones killing the snapshot feature. 
TPM certainly doesn't want to be one of them. If the user doesn't want 
snapshotting to be supported since his VM image files are not QCoW2 type 
of files, just create a raw image file for the TPM's persistent state 
and bdrv will automatically prevent snapshotting. The point is that the 
TPM code now using the bdrv layer works with any image format already.



2. File backed nvram is interesting outside tpm.
For example,vpd and chassis number for pci, eeprom emulation for network 
cards.
Using a file per device might be inconvenient though.
So please think of a format and API that will allow sections
for use by different devices.

Also here 'snapshotting' is the most 'demanding' feature of QEMU I
would say. Snapshotting isn't easily supported outside of the block
layer from what I understand. Once you are tied to the block layer
you end up having to use images and those don't grow quite well. So
other devices wanting to use those type of devices would need to
know what the worst case sizes are for writing their state into --
unless an image format is created that can grow.

As for the format: Ideally all devices could write into one file,
right? That would at least prevent too many files besides the VM's
image file from floating around which presumably makes image
management easier. Following the above, you add up all the worst
case sizes the individual devices may need for their blobs and
create an image with that capacity. Then you need some form of a
(primitive?) directory that lets you write blobs into that storage.
Assuming there were well defined names for those devices one could
say for example store this blob under the name
'tpm-permanent-state' and later on load it under that name. The
possible size of the directory would have to be considered as
well... I do something like that for the TPM where I have up to 3
such blobs that I store.

The bad thing about the above is of course the need to know what the
sum of all the worst case sizes is.
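
A rough sketch of the directory idea described above, in C. Everything named here is an assumption made for illustration (struct blob_entry, blob_dir_layout, blob_dir_find, the 32-byte name field, the 512-byte alignment); it is not the layout the TPM patches use. It only shows how the worst-case sizes add up to an image capacity and how a blob could be looked up by a well-defined name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOB_NAME_MAX 32   /* hypothetical limit on a blob name */
#define SECTOR_SIZE   512  /* assumed sector granularity of the image */

/* One directory entry: where a named blob lives and how big it may get. */
struct blob_entry {
    char     name[BLOB_NAME_MAX];
    uint64_t offset;    /* byte offset of the blob inside the image */
    uint32_t max_size;  /* worst-case size reserved for this blob */
    uint32_t cur_size;  /* bytes currently stored */
};

/* Round a byte count up to a whole number of sectors. */
uint64_t round_up_sector(uint64_t n)
{
    return (n + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}

/*
 * Place n blobs behind a directory of n entries and return the image
 * capacity needed: the directory plus the sum of all worst-case sizes,
 * each rounded up to a sector boundary.
 */
uint64_t blob_dir_layout(struct blob_entry *dir, size_t n)
{
    uint64_t pos;
    size_t i;

    assert(dir != NULL || n == 0);
    pos = round_up_sector(n * sizeof(struct blob_entry));
    for (i = 0; i < n; i++) {
        dir[i].offset = pos;
        pos += round_up_sector(dir[i].max_size);
    }
    return pos;
}

/* Find a blob by name; returns NULL if no entry matches. */
struct blob_entry *blob_dir_find(struct blob_entry *dir, size_t n,
                                 const char *name)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strncmp(dir[i].name, name, BLOB_NAME_MAX) == 0) {
            return &dir[i];
        }
    }
    return NULL;
}
```

With three blobs whose worst-case sizes are 1000, 512 and 100 bytes, the directory fits in one sector and the blobs take two, one and one sectors, so the image would need 2560 bytes. A tool that prepares the image could run the same arithmetic.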

A typical usecase I know about has prepared vpd/eeprom content.
We'll typically need a tool to get binary blobs and put that into the
file image.  That tool can do the necessary math.
We could also integrate this into qemu-img if we like.


So a growable image format would
be quite good to have. I haven't followed the conversations much,
but is that something QCoW3 would support?

I don't follow - does TPM need a growable image format? Why?

Re: [Qemu-devel] [PATCH -V2] iohandler: update qemu_fd_set_handler to work with null call back arg

2011-09-08 Thread Anthony Liguori


On 09/08/2011 05:07 AM, Avi Kivity wrote:

On 09/07/2011 09:44 PM, Anthony Liguori wrote:


I think this is a bit more complicated than is really needed. Here's
what I came up with which also fixes another bug where the io channel
could be freed twice. I stumbled across this via a very strange
failure scenario. Avi, it might be worth trying this patch to see if
it fixes your problem too.


Right now, I've got more than just one problem.



One thing that I found challenging debugging this, coroutines make
valgrind very unhappy. Is it possible that we could have a command
line switch to fall back to the thread based coroutines so to make
things more valgrind friendly?


How is valgrind even aware of coroutines? Unless it doesn't implement
makecontext correctly, it shouldn't even be aware of them.


It detects stack switching and has trouble differentiating between a 
legitimate stack switch and something more nefarious.  I believe the 
heuristic it currently uses is the distance that RSP moves.  If it moves 
more than a certain threshold, it assumes that's a stack switch.
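
The heuristic described above can be sketched roughly as follows. This is not Valgrind's code; the function name and the way the threshold is passed in are assumptions for illustration. As far as I know, the threshold corresponds to Valgrind's --max-stackframe option, which defaults to 2000000 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Guess whether a stack-pointer change is a stack switch: if SP moved
 * further than the threshold in either direction, assume the program
 * switched stacks rather than grew or shrank the current one.
 */
bool looks_like_stack_switch(uintptr_t old_sp, uintptr_t new_sp,
                             uintptr_t threshold)
{
    uintptr_t delta;

    assert(threshold > 0);
    delta = old_sp > new_sp ? old_sp - new_sp : new_sp - old_sp;
    return delta > threshold;
}
```

A coroutine stack allocated close to the caller's stack would fall under the threshold and look like an ordinary, very large frame, which is one way such a heuristic can misfire.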


Regards,

Anthony Liguori

Re: [Qemu-devel] [PULL 0/3] Trivial patches for Auguest 25 to September 2 2011

2011-09-08 Thread Stefan Hajnoczi

On Fri, Sep 2, 2011 at 11:12 AM, Stefan Hajnoczi
stefa...@linux.vnet.ibm.com wrote:
 The following changes since commit 625f9e1f54cd78ee98ac22030da527c9a1cc9d2b:

  Merge remote-tracking branch 'stefanha/trivial-patches' into staging 
 (2011-09-01 13:57:19 -0500)

 are available in the git repository at:

  ssh://repo.or.cz/srv/git/qemu/stefanha.git trivial-patches

 Boris Figovsky (1):
      x86: fix daa opcode for al register values higher than 0xf9

 Brad Smith (1):
      libcacard: use INSTALL_DATA for data

 Stefan Weil (1):
      sh4: Fix potential crash in debug code

  hw/sh_intc.c            |    9 +
  libcacard/Makefile      |    2 +-
  target-i386/op_helper.c |    6 +++---
  3 files changed, 9 insertions(+), 8 deletions(-)

Ping?

Stefan

Re: [Qemu-devel] [PATCH V8 10/14] Encrypt state blobs using AES CBC encryption

2011-09-08 Thread Michael S. Tsirkin

On Thu, Sep 08, 2011 at 08:11:00AM -0400, Stefan Berger wrote:
 On 09/08/2011 06:32 AM, Michael S. Tsirkin wrote:
 On Wed, Sep 07, 2011 at 08:16:27PM -0400, Stefan Berger wrote:
 On 09/07/2011 02:55 PM, Michael S. Tsirkin wrote:
 On Thu, Sep 01, 2011 at 10:23:51PM -0400, Stefan Berger wrote:
 An additional 'layer' for reading and writing the blobs to the 
 underlying
 block storage is added. This layer encrypts the blobs for writing if a 
 key is
 available. Similarly it decrypts the blobs after reading.
 So a couple of further thoughts:
 1. Raw storage should work too, and with e.g. NFS migration will be fine, 
 right?
 So I'd say it's worth supporting.
 NFS via shared storage, yes, but not migration via Qemu's block
 migration mechanism. If snapshotting was supposed to be a feature to
 support then that's only possible via block storage (QCoW2 in
 particular).
 As disk has the same limitation, that sounds fine.
 Let the user decide whether snapshotting is needed,
 same as disk.
 
 Adding plain file support to the TPM code so it can store its 3
 blobs into adds quite a bit of complexity to the code. The command
 line parameter that previously pointed to QCoW2 image file would
 probably have to point to a directory where files for the 3 blobs
 can be written into. Besides that, snapshotting would actually have
 to be prevented maybe through registering a (fake) file of other
 than QCoW2 type since the plain TPM files won't handle snapshotting
 correctly, either, and QEMU pretty much would have to be prevented
 from doing snapshotting at all. Maybe there's an API for this, but I
 don't know. Though why create this additional complexity? I don't
 mind relaxing the requirement of using a QCoW2 image and allowing
 for example RAW images (that then automatically prevent the
 snapshotting from happening) but the same code I now have would work
 for writing the blobs into the single file.
 Right. Write all blobs into a single file at different
 offsets, or something.
 
 That's exactly what I am doing already. Just that I am doing this
 with Qemu's BlockStorage (bdrv) and writing to sectors rather than
 seek()ing in files. To avoid more complexity I'd rather not
 introduce more code handling plain files but rely on all the image
 formats that qemu already supports and that give features like
 encryption (QCoW2 only), snapshotting (QCoW2 only) and block
 migration (presumably all of them). Plain files offer none of that.
 Devices that need to write their state to persistent storage really
 have to aim for doing this through Qemu's bdrv since they will
 otherwise be the ones killing the snapshot feature. TPM certainly
 doesn't want to be one of them. If the user doesn't want
 snapshotting to be supported since his VM image files are not QCoW2
 type of files, just create a raw image file for the TPM's persistent
 state and bdrv will automatically prevent snapshotting. The point is
 that the TPM code now using the bdrv layer works with any image
 format already.

Ah, that's fine then. I had an impression there was a qcow only
limitation, not sure what in code gave me that impression.

 2. File backed nvram is interesting outside tpm.
 For example,vpd and chassis number for pci, eeprom emulation for 
  network cards.
 Using a file per device might be inconvenient though.
 So please think of a format and API that will allow sections
 for use by different devices.
 Also here 'snapshotting' is the most 'demanding' feature of QEMU I
 would say. Snapshotting isn't easily supported outside of the block
 layer from what I understand. Once you are tied to the block layer
 you end up having to use images and those don't grow quite well. So
 other devices wanting to use those type of devices would need to
 know what the worst case sizes are for writing their state into --
 unless an image format is created that can grow.
 
 As for the format: Ideally all devices could write into one file,
 right? That would at least prevent too many files besides the VM's
 image file from floating around which presumably makes image
 management easier. Following the above, you add up all the worst
 case sizes the individual devices may need for their blobs and
 create an image with that capacity. Then you need some form of a
 (primitive?) directory that lets you write blobs into that storage.
 Assuming there were well defined names for those devices one could
 say for example store this blob under the name
 'tpm-permanent-state' and later on load it under that name. The
 possible size of the directory would have to be considered as
 well... I do something like that for the TPM where I have up to 3
 such blobs that I store.
 
 The bad thing about the above is of course the need to know what the
 sum of all the worst case sizes is.
 A typical usecase I know about has prepared vpd/eeprom content.
 We'll typically need a tool to get binary blobs and put that into the
 file image.  That tool can do the necessary math.
 We could

Re: [Qemu-devel] CFQ I/O starvation problem triggered by RHEL6.0 KVM guests

2011-09-08 Thread Vivek Goyal

On Thu, Sep 08, 2011 at 06:13:53PM +0900, Takuya Yoshikawa wrote:
 This is a report of strange cfq behaviour which seems to be triggered by
 QEMU posix aio threads.
 
 Host environment:
   OS: RHEL6.0 KVM/qemu-kvm (with no patch applied)
   IO scheduler: cfq (with the default parameters)

So you are using RHEL 6.0 for both the host and the guest kernel? Can you
reproduce the same issue with upstream kernels? How easily/frequently
can you reproduce this with a RHEL 6.0 host?

 
 On the host, we were running 3 linux guests to see if I/O from these guests
 would be handled fairly by host; each guest did dd write with oflag=direct.
 
 Guest virtual disk:
   We used a host local disk which had 3 partitions, and each guest was
   allocated one of these as dd write target.
 
 So our test was for checking if cfq could keep fairness for the 3 guests
 who shared the same disk.
 
 The result (strange starvation):
   Sometimes, one guest dominated cfq for more than 10sec and requests from
   other guests were not handled at all during that time.
 
 Below is the blktrace log which shows that a request to (8,27) in cfq2068S 
 (*1)
 is not handled at all during cfq2095S and cfq2067S which hold requests to
 (8,26) are being handled alternately.
 
 *1) WS 104920578 + 64
 
 Question:
   I guess that cfq_close_cooperator() was being called in an unusual manner.
   If so, do you think that cfq is responsible for keeping fairness for this
   kind of unusual write requests?

- If two guests are doing IO to separate partitions, they should really
  not be very close (until and unless partitions are really small).

- Even if there are close cooperators, these queues are merged and they
  are treated as single queue from slice point of view. So cooperating
  queues should be merged and get a single slice instead of starving
  other queues in the system.

Can you upload the blktrace logs somewhere, showing what happened
during those 10 seconds?

 
 Note:
   With RHEL6.1, this problem could not be triggered. But I guess that was due to
   QEMU's block layer updates.

You can try reproducing this with fio.

Thanks
Vivek

Re: [Qemu-devel] [PATCH 0/2] improve qemu-img conversion performance

2011-09-08 Thread Kevin Wolf

Am 08.09.2011 01:06, schrieb Yehuda Sadeh:
 The following set of patches improve the qemu-img conversion process
 performance. When using a higher latency backend, small writes have a
 severe impact on the time it takes to do image conversion. 
 We switch to using async writes, and we avoid splitting writes due to
 holes when the holes are small enough.
 
 Yehuda Sadeh (2):
   qemu-img: async write to block device when converting image
   qemu-img: don't skip writing small holes
 
  qemu-img.c |   34 +++---
  1 files changed, 27 insertions(+), 7 deletions(-)
 

This doesn't seem to be against git master or the block tree. Please rebase.

I think that commit a22f123c may obsolete your patch 2/2.

Kevin
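
For readers following along, the "don't skip writing small holes" idea from the cover letter can be illustrated like this. The sketch is not taken from the patch: count_writes, the per-sector is_zero map and the min_hole parameter are all assumptions chosen to show the trade-off between the number of write requests and skipped zero sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count the write requests needed for n sectors, where is_zero[i] says
 * whether sector i is all zeroes. A run of zero sectors shorter than
 * min_hole that sits inside a write is written along with the data
 * instead of splitting the write; longer runs are skipped as holes.
 */
size_t count_writes(const bool *is_zero, size_t n, size_t min_hole)
{
    size_t writes = 0;
    size_t i = 0;
    bool in_write = false;

    assert(is_zero != NULL || n == 0);
    while (i < n) {
        size_t run = 0;

        if (!is_zero[i]) {
            if (!in_write) {
                writes++;          /* a new request starts here */
                in_write = true;
            }
            i++;
            continue;
        }
        /* Measure the run of zero sectors starting at i. */
        while (i + run < n && is_zero[i + run]) {
            run++;
        }
        /* Only a long enough hole ends the current request. */
        if (run >= min_hole) {
            in_write = false;
        }
        i += run;
    }
    return writes;
}
```

For a map of data, one zero sector, data, four zero sectors, data, a min_hole of 1 gives three requests, while a min_hole of 2 merges the first two into one and gives two. On a high-latency backend the saved round trip can outweigh writing one extra zero sector.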

Re: [Qemu-devel] [PATCH] target-i386: Compute all flag data inside %cl != 0 test.

2011-09-08 Thread malc

On Thu, 8 Sep 2011, Richard Henderson wrote:

 The (x << (cl - 1)) quantity is only used if CL != 0.  Move the
 computation of that quantity nearer its use.
 
 This avoids the creation of undefined TCG operations when the
 constant propagation optimization proves that CL == 0, and thus
 CL-1 is outside the range [0-wordsize).
 
 Signed-off-by: Richard Henderson r...@twiddle.net

Applied, thanks.
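
As a plain-C analogy of what the commit message describes (this is not the TCG code from the patch, and shl32 is a name made up here): the intermediate value shifted by CL - 1 is computed only inside the CL != 0 branch, so a shift count outside [0, wordsize) is never formed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 32-bit SHL by a count taken from CL. Returns the shifted value and,
 * only when the masked count is nonzero, stores the last bit shifted
 * out in *cf. With a zero count the flag is left untouched.
 */
uint32_t shl32(uint32_t x, uint8_t cl, bool *cf)
{
    unsigned count = cl & 31;   /* x86 masks the count to 5 bits here */

    assert(cf != NULL);
    if (count != 0) {
        /* count - 1 is in [0, 31), so this shift is well defined. */
        uint32_t pre = x << (count - 1);

        *cf = (pre >> 31) & 1;  /* the bit the final step shifts out */
        return pre << 1;
    }
    /* count == 0: (count - 1) is never evaluated on this path. */
    return x;
}
```

Had the shift by count - 1 been hoisted above the test, a zero count would mean shifting by an out-of-range amount, which is undefined in C much as it is for a TCG shift op.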

[..snip..]

-- 
mailto:av1...@comtv.ru

Re: [Qemu-devel] [PULL] File descriptor reclaim patchset for VirtFS

2011-09-08 Thread Anthony Liguori


On 09/01/2011 12:25 PM, Aneesh Kumar K.V wrote:


The following changes since commit 56a7a874e962e28522857fbf72eaefb1a07e2001:

   Merge remote-tracking branch 'stefanha/trivial-patches' into staging 
(2011-08-25 07:50:07 -0500)

are available in the git repository at:

   git://repo.or.cz/qemu/v9fs.git for-upstream-3


Pulled.  Thanks.

Regards,

Anthony Liguori



Aneesh Kumar K.V (6):
   hw/9pfs: Add reference counting for fid
   hw/9pfs: Add file descriptor reclaim support
   hw/9pfs: init fid list properly
   hw/9pfs: Use v9fs_do_close instead of close
   hw/9pfs: Add directory reclaim support
   hw/9pfs: mark directories also as un-reclaimable on unlink

  hw/9pfs/codir.c|   13 +-
  hw/9pfs/cofile.c   |   19 ++-
  hw/9pfs/virtio-9p-coth.h   |4 +-
  hw/9pfs/virtio-9p-device.c |2 +
  hw/9pfs/virtio-9p.c|  486 +++-
  hw/9pfs/virtio-9p.h|   24 ++-
  6 files changed, 445 insertions(+), 103 deletions(-)

Re: [Qemu-devel] [PULL] usb patch queue

2011-09-08 Thread Anthony Liguori


On 09/02/2011 04:56 AM, Gerd Hoffmann wrote:

   Hi,

This is the current usb patch queue with the following changes:

   * musb improvements (qdev windup)
   * fix ehci emulation for FreeBSD guests.
   * a bunch of usb-host fixes.
   * misc minor tweaks.

please pull,
   Gerd



Pulled.  Thanks.

Regards,

Anthony Liguori



Gerd Hoffmann (15):
   usb-host: start tracing support
   usb-host: reapurb error report fix
   usb-host: fix halted endpoints
   usb-host: limit open retries
   usb-host: fix configuration tracking.
   usb-host: claim port
   usb-host: endpoint table fixup
   usb-ehci: handle siTDs
   usb-host: constify port
   usb-host: parse port in /proc/bus/usb/devices scan
   usb: fix use after free
   usb-ccid: switch to USBDesc*
   usb-ccid: remote wakeup support
   usb: claim port at device initialization time.
   usb-host: tag as unmigratable

Juha Riihimäki (1):
   usb-musb: Add reset function

Peter Maydell (2):
   usb: Remove leading underscores from __musb_irq_max
   usb-musb: Take a DeviceState* in init function

  hw/tusb6010.c |   11 +-
  hw/usb-bus.c  |  110 --
  hw/usb-ccid.c |  248 +++-
  hw/usb-desc.h |2 +-
  hw/usb-ehci.c |   65 +++--
  hw/usb-hub.c  |   12 +--
  hw/usb-musb.c |   26 +++-
  hw/usb-ohci.c |4 +-
  hw/usb-uhci.c |   11 +-
  hw/usb.c  |   37 +++---
  hw/usb.h  |   11 +-
  trace-events  |   32 
  usb-linux.c   |  448 ++---
  13 files changed, 561 insertions(+), 456 deletions(-)

The following changes since commit 625f9e1f54cd78ee98ac22030da527c9a1cc9d2b:

   Merge remote-tracking branch 'stefanha/trivial-patches' into staging 
(2011-09-01 13:57:19 -0500)

are available in the git repository at:

   git://git.kraxel.org/qemu usb.25


Re: [Qemu-devel] [PULL] Memory API batch 5, v2

2011-09-08 Thread Anthony Liguori


On 09/04/2011 10:28 AM, Avi Kivity wrote:

Please pull from

git://github.com/avikivity/qemu.git memory/batch

v2: just a rebase to make sure bisects see the rom_device fix.



Pulled.  Thanks.

Regards,

Anthony Liguori



Avi Kivity (22):
mips_fulong2e: convert to memory API
stellaris_enet: convert to memory API
sysbus: add helpers to add and delete memory regions to the system bus
pci_host: convert conf index and data ports to memory API
ReadWriteHandler: remove
an5206: convert to memory API
armv7m: convert to memory API
axis_dev88: convert to memory API (RAM only)
sysbus: add sysbus_add_memory_overlap()
integratorcp: convert to memory API (RAM/flash only)
leon3: convert to memory API
cirrus: wrap memory update in a transaction
piix_pci: wrap memory update in a transaction
Makefile.hw: allow hw/ files to include glib headers
pflash_cfi01/pflash_cfi02: convert to memory API
dummy_m68k: convert to memory API
lm32_boards: convert to memory API
mainstone: convert to memory API
mcf5208: convert to memory API
milkymist-minimac2: convert to memory API
milkymist-softusb: convert to memory API
milkymist: convert to memory API

Makefile.hw | 1 +
Makefile.target | 1 -
hw/an5206.c | 12 +++--
hw/arm-misc.h | 5 ++-
hw/armv7m.c | 22 +
hw/axis_dev88.c | 16 +++---
hw/cirrus_vga.c | 2 +
hw/collie.c | 7 +--
hw/dec_pci.c | 13 +++---
hw/dummy_m68k.c | 7 ++-
hw/flash.h | 13 +-
hw/grackle_pci.c | 13 +++---
hw/gumstix.c | 6 +--
hw/integratorcp.c | 28 +---
hw/leon3.c | 15 ---
hw/lm32_boards.c | 23 +-
hw/mainstone.c | 20 +
hw/mcf5208.c | 72 ++-
hw/milkymist-minimac2.c | 43 +-
hw/milkymist-softusb.c | 48 ++--
hw/milkymist.c | 13 +++---
hw/mips_fulong2e.c | 17 ---
hw/mips_malta.c | 54 +++
hw/mips_r4k.c | 13 +++---
hw/musicpal.c | 8 ++--
hw/omap_sx1.c | 8 ++--
hw/pci_host.c | 86 -
hw/pci_host.h | 16 +++
hw/petalogix_ml605_mmu.c | 5 +-
hw/petalogix_s3adsp1800_mmu.c | 5 +-
hw/pflash_cfi01.c | 78 ++---
hw/pflash_cfi02.c | 95 +
hw/piix_pci.c | 13 +-
hw/ppc405_boards.c | 49 -
hw/ppc4xx_pci.c | 10 +++--
hw/ppce500_pci.c | 21 -
hw/prep_pci.c | 12 -
hw/r2d.c | 2 +-
hw/stellaris.c | 5 ++-
hw/stellaris_enet.c | 29 +---
hw/sysbus.c | 29 
hw/sysbus.h | 8 +++
hw/unin_pci.c | 82 ++--
hw/virtex_ml507.c | 4 +-
hw/z2.c | 2 +-
rwhandler.c | 87 -
rwhandler.h | 27 
47 files changed, 551 insertions(+), 594 deletions(-)
delete mode 100644 rwhandler.c
delete mode 100644 rwhandler.h

Re: [Qemu-devel] [PULL 00/31] Block patches

2011-09-08 Thread Anthony Liguori


On 09/06/2011 10:39 AM, Kevin Wolf wrote:

The following changes since commit f69539b14bdba7a5cd22e1f4bed439b476b17286:

   apb_pci: convert PCI space to memory API (2011-09-04 09:28:04 +)

are available in the git repository at:
   git://repo.or.cz/qemu/kevin.git for-anthony



Pulled.  Thanks.

Regards,

Anthony Liguori



Fam Zheng (8):
   VMDK: enable twoGbMaxExtentFlat
   VMDK: add twoGbMaxExtentSparse support
   VMDK: separate vmdk_read_extent/vmdk_write_extent
   VMDK: Opening compressed extent.
   VMDK: read/write compressed extent
   VMDK: creating streamOptimized subformat
   VMDK: bugfix, open Haiku vmdk image
   VMDK: bugfix, opening vSphere 4 exported image

Frediano Ziglio (1):
   linux aio: some comments

Kevin Wolf (3):
   qcow2: Properly initialise QcowL2Meta
   qcow2: Fix error cases to run depedent requests
   async: Allow nested qemu_bh_poll calls

Markus Armbruster (14):
   block: Attach non-qdev devices as well
   block: Generalize change_cb() to BlockDevOps
   block: Split change_cb() into change_media_cb(), resize_cb()
   ide: Update command code definitions as per ACS-2 Table B.2
   ide: Clean up case label indentation in ide_exec_cmd()
   ide: Give vmstate structs internal linkage where possible
   block/raw: Fix to forward method bdrv_media_changed()
   block: Leave tracking media change to device models
   fdc: Make media change detection more robust
   block: Clean up bdrv_flush_all()
   savevm: Include writable devices with removable media
   xen: Clean up pci_piix3_xen_ide_unplug()'s test for not a CD
   spitz tosa: Simplify drive is suitable for microdrive test
   block: Declare qemu_blockalign() in block.h, not block_int.h

Paolo Bonzini (5):
   scsi: execute SYNCHRONIZE_CACHE asynchronously
   scsi: fix accounting of writes
   scsi: refine constants for READ CAPACITY 16
   scsi: fill in additional sense length correctly
   scsi: improve MODE SENSE emulation

  async.c  |   24 +++-
  block.c  |  104 ---
  block.h  |   28 +++-
  block/qcow2.c|   12 +-
  block/raw-posix.c|4 +
  block/raw.c  |7 +
  block/vmdk.c |  346 +++---
  block_int.h  |   14 +--
  blockdev.c   |5 +-
  hw/fdc.c |   46 
  hw/ide/core.c|   35 +++---
  hw/ide/internal.h|  171 +
  hw/ide/piix.c|7 +-
  hw/pflash_cfi01.c|1 +
  hw/pflash_cfi02.c|1 +
  hw/qdev-properties.c |6 +-
  hw/scsi-bus.c|6 +-
  hw/scsi-defs.h   |8 +-
  hw/scsi-disk.c   |  157 +--
  hw/sd.c  |   14 +-
  hw/spitz.c   |   10 +-
  hw/tosa.c|   10 +-
  hw/usb-msd.c |2 +-
  hw/virtio-blk.c  |   12 +-
  hw/xen_disk.c|1 +
  linux-aio.c  |1 +
  savevm.c |4 +-
  27 files changed, 652 insertions(+), 384 deletions(-)

Re: [Qemu-devel] [PULL] spice patch queue

2011-09-08 Thread Anthony Liguori


On 09/07/2011 02:38 AM, Gerd Hoffmann wrote:

   Hi,

Here is the spice patch queue with a collection of bugfixes.

A workaround for the much discussed spice-calls-us-from-wrong-thread
issue is included because it turned out to be not *that* easily fixable
in spice so it will probably take some time.  Also a spice server fix
wouldn't cover already released spice versions.

cheers,
   Gerd



Pulled.  Thanks.

Regards,

Anthony Liguori



The following changes since commit 344eecf6995f4a0ad1d887cec922f6806f91a3f8:

   mips: Support the MT TCStatus IXMT irq disable flag (2011-09-06 11:09:39 
+0200)

are available in the git repository at:
   git://anongit.freedesktop.org/spice/qemu spice.v42

Gerd Hoffmann (1):
   spice: workaround a spice server bug.

Peter Maydell (2):
   spice-qemu-char.c: Use correct printf format char for ssize_t
   hw/qxl: Fix format string errors

Yonit Halperin (3):
   qxl: send interrupt after migration in case ram-int_pending != 0, RHBZ 
#732949
   qxl: s/qxl_set_irq/qxl_update_irq/
   spice: set qxl-ssd.running=true before telling spice to start, RHBZ 
#733993

  hw/qxl-logger.c|2 +-
  hw/qxl.c   |   26 --
  spice-qemu-char.c  |2 +-
  ui/spice-core.c|   25 -
  ui/spice-display.c |3 ++-
  5 files changed, 44 insertions(+), 14 deletions(-)

Re: [Qemu-devel] [PULL 0/2]: QMP queue

2011-09-08 Thread Anthony Liguori


On 09/06/2011 11:44 AM, Luiz Capitulino wrote:

Anthony,

The following patches have been sent to the list and look good to me. I've
also tested them.



Pulled.  Thanks.

Regards,

Anthony Liguori



The changes (since 344eecf6995f4a0ad1d887cec922f6806f91a3f8) are available
in the following repository:

 git://repo.or.cz/qemu/qmp-unstable.git queue/qmp

Jan Kiszka (1):
   Fix qjson test of solidus encoding

Luiz Capitulino (1):
   configure: Copy test data to build directory

  check-qjson.c |3 ++-
  configure |2 +-
  2 files changed, 3 insertions(+), 2 deletions(-)

Re: [Qemu-devel] [PULL 0/3] Trivial patches for Auguest 25 to September 2 2011

2011-09-08 Thread Anthony Liguori


On 09/02/2011 05:12 AM, Stefan Hajnoczi wrote:

The following changes since commit 625f9e1f54cd78ee98ac22030da527c9a1cc9d2b:

   Merge remote-tracking branch 'stefanha/trivial-patches' into staging 
(2011-09-01 13:57:19 -0500)

are available in the git repository at:

   ssh://repo.or.cz/srv/git/qemu/stefanha.git trivial-patches



Pulled.  Thanks.

Regards,

Anthony Liguori



Boris Figovsky (1):
   x86: fix daa opcode for al register values higher than 0xf9

Brad Smith (1):
   libcacard: use INSTALL_DATA for data

Stefan Weil (1):
   sh4: Fix potential crash in debug code

  hw/sh_intc.c|9 +
  libcacard/Makefile  |2 +-
  target-i386/op_helper.c |6 +++---
  3 files changed, 9 insertions(+), 8 deletions(-)

Re: [Qemu-devel] [PATCH] s390: remove boot image detection to fix boot with newer kernels

2011-09-08 Thread Christian Borntraeger

On 07/09/11 14:34, Alexander Graf wrote:
 No, in theory it could change arbitrarily. The vmlinux case is unfortunate
 but in the end its shoot yourself in the foot, we just have to make sure
 that we allow a graceful exit from a looping qemu guest.
 
 That's not the answer I'd like to hear. Can't we put a magic constant 
 somewhere
for newer kernel versions that would identify those and keep the basr 13,0 
hack 
 around for older ones?

I will wire up the elf loader for s390, to make vmlinux simply work. That should
make the test no longer needed. 
There are some small problems left, e.g. the elf loader loads the kernel as 
bios 
afterwards and therefore overwrites the kernel parameter line. Will fix this in 
the
next days.

Christian

[Qemu-devel] [PATCH] Flexible array should be last in struct mbuf

2011-09-08 Thread Elie Richa

The flexible array member should remain the last member in the structure
as the code relies on this assumption.

Signed-off-by: Elie Richa ri...@adacore.com
---
 slirp/mbuf.h |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/slirp/mbuf.h b/slirp/mbuf.h
index 55170e5..e13ff71 100644
--- a/slirp/mbuf.h
+++ b/slirp/mbuf.h
@@ -82,12 +82,13 @@ struct m_hdr {
 struct mbuf {
struct  m_hdr m_hdr;
Slirp *slirp;
+   bool arp_requested;
+   uint64_t expiration_date;
union M_dat {
 char m_dat_[1]; /* ANSI don't like 0 sized arrays */
 char *m_ext_;
-   } M_dat;
-bool arp_requested;
-uint64_t expiration_date;
+   } M_dat; /* This is a flexible array member. It should always remain
+the last member of the structure */
 };
 
 #define m_next m_hdr.mh_next
-- 
1.7.4.1
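
To illustrate why the trailing array has to stay last, here is a simplified stand-in, not the real slirp definition. struct buf and buf_new are hypothetical names, and the sketch uses a C99 flexible array member where slirp uses the older one-element array inside a union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct mbuf. */
struct buf {
    int  flags;    /* fixed-size members come first */
    char data[];   /* payload grows past the end of the struct */
};

/* Allocate a buf whose data[] can hold len bytes. */
struct buf *buf_new(size_t len)
{
    /* Over-allocate: the extra len bytes are the storage behind data[]. */
    struct buf *b = malloc(sizeof(struct buf) + len);

    if (b != NULL) {
        b->flags = 0;
        memset(b->data, 0, len);
    }
    return b;
}
```

With a C99 flexible array member the compiler rejects any member declared after data. With the one-element form used in mbuf.h there is no such check: members placed after the array would occupy the bytes the payload is written into, which is what moving arp_requested and expiration_date ahead of M_dat avoids.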

[Qemu-devel] [PATCH 00/12] nbd improvements

2011-09-08 Thread Paolo Bonzini

I find nbd quite useful to test migration, but it is limited:
it can only do synchronous operation, it is not safe because it
does not support flush, and it has no discard either.  qemu-nbd
is also limited to 1MB requests, and the nbd block driver does
not take this into account.
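
The request splitting that patch 12/12 adds to the block driver boils down to a loop like the one below. This is a standalone sketch, not the patch itself: split_request, chunk_fn, struct chunk_stats and record_chunk are names invented for illustration. Only the 1024-sector (512K) cap is taken from the patch's NBD_MAX_SECTORS.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE     512
#define NBD_MAX_SECTORS 1024   /* at most 512K per request */

/* Transfers one chunk; returns 0 on success, negative on error. */
typedef int (*chunk_fn)(int64_t sector_num, int nb_sectors,
                        int byte_offset, void *opaque);

/*
 * Split [sector_num, sector_num + nb_sectors) into chunks of at most
 * NBD_MAX_SECTORS and pass each to fn together with the byte offset
 * into the caller's buffer. Stops at the first error.
 */
int split_request(int64_t sector_num, int nb_sectors, chunk_fn fn,
                  void *opaque)
{
    int offset = 0;

    assert(nb_sectors >= 0);
    while (nb_sectors > 0) {
        int n = nb_sectors > NBD_MAX_SECTORS ? NBD_MAX_SECTORS : nb_sectors;
        int ret = fn(sector_num, n, offset, opaque);

        if (ret < 0) {
            return ret;
        }
        sector_num += n;
        nb_sectors -= n;
        offset += n * SECTOR_SIZE;
    }
    return 0;
}

/* Example callback: counts chunks and remembers the last one. */
struct chunk_stats {
    int chunks;
    int last_len;
    int last_offset;
};

int record_chunk(int64_t sector_num, int nb_sectors, int byte_offset,
                 void *opaque)
{
    struct chunk_stats *st = opaque;

    (void)sector_num;
    st->chunks++;
    st->last_len = nb_sectors;
    st->last_offset = byte_offset;
    return 0;
}
```

A 2500-sector request would be issued as chunks of 1024, 1024 and 452 sectors, the last one starting 1 MB into the buffer.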

Luckily, flush/FUA support is being worked out by upstream,
and discard can also be added with the same framework (patches
1 to 6).

Asynchronous support is also very similar to what sheepdog is
already doing (patches 7 to 12).

Paolo Bonzini (12):
  nbd: support feature negotiation
  nbd: sync API definitions with upstream
  nbd: support NBD_SET_FLAGS ioctl
  nbd: add support for NBD_CMD_FLUSH
  nbd: add support for NBD_CMD_FLAG_FUA
  nbd: support NBD_CMD_TRIM in the server
  sheepdog: add coroutine_fn markers
  add socket_set_block
  sheepdog: move coroutine send/recv function to generic code
  block: add bdrv_co_flush support
  nbd: switch to asynchronous operation
  nbd: split requests

 block.c  |   53 ++---
 block/nbd.c  |  225 
 block/sheepdog.c |  235 +++---
 block_int.h  |1 +
 cutils.c |  108 +
 nbd.c|   80 +--
 nbd.h|   20 -
 oslib-posix.c|7 ++
 oslib-win32.c|6 ++
 qemu-common.h|3 +
 qemu-coroutine.c |   71 
 qemu-coroutine.h |   26 ++
 qemu-nbd.c   |   13 ++--
 qemu_socket.h|1 +
 14 files changed, 580 insertions(+), 269 deletions(-)

-- 
1.7.6

[Qemu-devel] [PATCH 06/12] nbd: support NBD_CMD_TRIM in the server

2011-09-08 Thread Paolo Bonzini

Map it to bdrv_discard.  The server can now expose NBD_FLAG_SEND_TRIM.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/nbd.c |   31 +++
 nbd.c   |9 -
 2 files changed, 39 insertions(+), 1 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 5a7812c..964caa8 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -275,6 +275,36 @@ static int nbd_flush(BlockDriverState *bs)
 return 0;
 }
 
+static int nbd_discard(BlockDriverState *bs, int64_t sector_num,
+   int nb_sectors)
+{
+BDRVNBDState *s = bs->opaque;
+struct nbd_request request;
+struct nbd_reply reply;
+
+if (!(s->nbdflags & NBD_FLAG_SEND_TRIM)) {
+return 0;
+}
+request.type = NBD_CMD_TRIM;
+request.handle = (uint64_t)(intptr_t)bs;
+request.from = sector_num * 512;
+request.len = nb_sectors * 512;
+
+if (nbd_send_request(s->sock, &request) == -1)
+return -errno;
+
+if (nbd_receive_reply(s->sock, &reply) == -1)
+return -errno;
+
+if (reply.error !=0)
+return -reply.error;
+
+if (reply.handle != request.handle)
+return -EIO;
+
+return 0;
+}
+
 static void nbd_close(BlockDriverState *bs)
 {
 BDRVNBDState *s = bs->opaque;
@@ -299,6 +329,7 @@ static BlockDriver bdrv_nbd = {
 .bdrv_write= nbd_write,
 .bdrv_close= nbd_close,
 .bdrv_flush= nbd_flush,
+.bdrv_discard  = nbd_discard,
 .bdrv_getlength= nbd_getlength,
 .protocol_name = "nbd",
 };
diff --git a/nbd.c b/nbd.c
index b65fb4a..f089904 100644
--- a/nbd.c
+++ b/nbd.c
@@ -194,7 +194,7 @@ int nbd_negotiate(int csock, off_t size, uint32_t flags)
 cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL);
 cpu_to_be64w((uint64_t*)(buf + 16), size);
 cpu_to_be32w((uint32_t*)(buf + 24),
- flags | NBD_FLAG_HAS_FLAGS |
+ flags | NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
  NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
 memset(buf + 28, 0, 124);
 
@@ -703,6 +703,13 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
 if (nbd_send_reply(csock, &reply) == -1)
 return -1;
 break;
+case NBD_CMD_TRIM:
+TRACE("Request type is TRIM");
+bdrv_discard(bs, (request.from + dev_offset) / 512,
+ request.len / 512);
+if (nbd_send_reply(csock, &reply) == -1)
+return -1;
+break;
 default:
 LOG("invalid request type (%u) received", request.type);
 errno = EINVAL;
-- 
1.7.6

[Qemu-devel] [PATCH 12/12] nbd: split requests

2011-09-08 Thread Paolo Bonzini

qemu-nbd has a limit of slightly less than 1M per request.  Work
around this in the nbd block driver.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/nbd.c |   52 ++--
 1 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 5a75263..468a517 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -213,8 +213,9 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
     return result;
 }
 
-static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
-                        int nb_sectors, QEMUIOVector *qiov)
+static int nbd_co_readv_1(BlockDriverState *bs, int64_t sector_num,
+                          int nb_sectors, QEMUIOVector *qiov,
+                          int offset)
 {
     BDRVNBDState *s = bs->opaque;
     struct nbd_request request;
@@ -241,7 +242,7 @@ static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
         reply.error = EIO;
         goto done;
     }
-    if (qemu_co_recvv(s->sock, qiov->iov, request.len, 0) != request.len) {
+    if (qemu_co_recvv(s->sock, qiov->iov, request.len, offset) != request.len) {
         reply.error = EIO;
     }
 
@@ -251,8 +252,9 @@ done:
 
 }
 
-static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
-                         int nb_sectors, QEMUIOVector *qiov)
+static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,
+                           int nb_sectors, QEMUIOVector *qiov,
+                           int offset)
 {
     BDRVNBDState *s = bs->opaque;
     struct nbd_request request;
@@ -273,7 +275,7 @@ static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
         reply.error = errno;
         goto done;
     }
-    ret = qemu_co_sendv(s->sock, qiov->iov, request.len, 0);
+    ret = qemu_co_sendv(s->sock, qiov->iov, request.len, offset);
     if (ret != request.len) {
         reply.error = EIO;
         goto done;
@@ -291,6 +293,44 @@ done:
 return -reply.error;
 }
 
+/* qemu-nbd has a limit of slightly less than 1M per request.  For safety,
+ * transfer at most 512K per request. */
+#define NBD_MAX_SECTORS 1024
+
+static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
+                        int nb_sectors, QEMUIOVector *qiov)
+{
+    int offset = 0;
+    int ret;
+    while (nb_sectors > NBD_MAX_SECTORS) {
+        ret = nbd_co_readv_1(bs, sector_num, NBD_MAX_SECTORS, qiov, offset);
+        if (ret < 0) {
+            return ret;
+        }
+        offset += NBD_MAX_SECTORS * 512;
+        sector_num += NBD_MAX_SECTORS;
+        nb_sectors -= NBD_MAX_SECTORS;
+    }
+    return nbd_co_readv_1(bs, sector_num, nb_sectors, qiov, offset);
+}
+
+static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
+                         int nb_sectors, QEMUIOVector *qiov)
+{
+    int offset = 0;
+    int ret;
+    while (nb_sectors > NBD_MAX_SECTORS) {
+        ret = nbd_co_writev_1(bs, sector_num, NBD_MAX_SECTORS, qiov, offset);
+        if (ret < 0) {
+            return ret;
+        }
+        offset += NBD_MAX_SECTORS * 512;
+        sector_num += NBD_MAX_SECTORS;
+        nb_sectors -= NBD_MAX_SECTORS;
+    }
+    return nbd_co_writev_1(bs, sector_num, nb_sectors, qiov, offset);
+}
+
 static int nbd_co_flush(BlockDriverState *bs)
 {
 BDRVNBDState *s = bs->opaque;
-- 
1.7.6
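
The splitting loop in the patch above can be tried in isolation. What follows is only a minimal sketch, not the QEMU code: `transfer_chunk` is a hypothetical stand-in for nbd_co_readv_1()/nbd_co_writev_1() that merely records what it was asked to transfer, and `MAX_SECTORS`, `SECTOR_SIZE`, `chunk_log` and `split_transfer` are names invented for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512
#define MAX_SECTORS 1024   /* mirrors NBD_MAX_SECTORS: 512K per request */
#define MAX_CHUNKS  64

/* Hypothetical log of issued chunks, so the split can be inspected. */
struct chunk {
    int64_t sector_num;
    int nb_sectors;
    int offset;            /* byte offset into the caller's I/O vector */
};

static struct chunk chunk_log[MAX_CHUNKS];
static size_t chunk_count;

/* Hypothetical stand-in for nbd_co_readv_1()/nbd_co_writev_1(). */
static int transfer_chunk(int64_t sector_num, int nb_sectors, int offset)
{
    if (chunk_count == MAX_CHUNKS) {
        return -1;
    }
    chunk_log[chunk_count].sector_num = sector_num;
    chunk_log[chunk_count].nb_sectors = nb_sectors;
    chunk_log[chunk_count].offset = offset;
    chunk_count++;
    return 0;
}

/* Same loop shape as the patch: full-size chunks first, then the rest. */
static int split_transfer(int64_t sector_num, int nb_sectors)
{
    int offset = 0;
    int ret;

    while (nb_sectors > MAX_SECTORS) {
        ret = transfer_chunk(sector_num, MAX_SECTORS, offset);
        if (ret < 0) {
            return ret;
        }
        offset += MAX_SECTORS * SECTOR_SIZE;
        sector_num += MAX_SECTORS;
        nb_sectors -= MAX_SECTORS;
    }
    return transfer_chunk(sector_num, nb_sectors, offset);
}
```

Because the loop condition is strict, a request of exactly 1024 sectors is sent as a single chunk; only larger requests are split.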

[Qemu-devel] [PATCH 11/12] nbd: switch to asynchronous operation

2011-09-08 Thread Paolo Bonzini

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/nbd.c |  167 ++
 nbd.c   |8 +++
 2 files changed, 117 insertions(+), 58 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 964caa8..5a75263 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -52,6 +52,9 @@ typedef struct BDRVNBDState {
 size_t blocksize;
 char *export_name; /* An NBD server may export several devices */
 
+CoMutex mutex;
+Coroutine *coroutine;
+
 /* If it begins with  '/', this is a UNIX domain socket. Otherwise,
  * it's a string of the form hostname|ip4|\[ip6\]:port
  */
@@ -104,6 +107,37 @@ out:
 return err;
 }
 
+static void nbd_coroutine_start(BDRVNBDState *s)
+{
+    qemu_co_mutex_lock(&s->mutex);
+    s->coroutine = qemu_coroutine_self();
+}
+
+static void nbd_coroutine_enter(void *opaque)
+{
+    BDRVNBDState *s = opaque;
+    qemu_coroutine_enter(s->coroutine, NULL);
+}
+
+static int nbd_co_send_request(BDRVNBDState *s, struct nbd_request *request)
+{
+    qemu_aio_set_fd_handler(s->sock, NULL, nbd_coroutine_enter, NULL, NULL, s);
+    return nbd_send_request(s->sock, request);
+}
+
+static int nbd_co_receive_reply(BDRVNBDState *s, struct nbd_reply *reply)
+{
+    qemu_aio_set_fd_handler(s->sock, nbd_coroutine_enter, NULL, NULL, NULL, s);
+    return nbd_receive_reply(s->sock, reply);
+}
+
+static void nbd_coroutine_end(BDRVNBDState *s)
+{
+    qemu_aio_set_fd_handler(s->sock, NULL, NULL, NULL, NULL, s);
+    s->coroutine = NULL;
+    qemu_co_mutex_unlock(&s->mutex);
+}
+
 static int nbd_establish_connection(BlockDriverState *bs)
 {
     BDRVNBDState *s = bs->opaque;
@@ -163,6 +197,8 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
     BDRVNBDState *s = bs->opaque;
     int result;
 
+    qemu_co_mutex_init(&s->mutex);
+
     /* Pop the config into our state object. Exit if invalid. */
     result = nbd_config(s, filename, flags);
     if (result != 0) {
@@ -177,8 +213,8 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
     return result;
 }
 
-static int nbd_read(BlockDriverState *bs, int64_t sector_num,
-                    uint8_t *buf, int nb_sectors)
+static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
+                        int nb_sectors, QEMUIOVector *qiov)
 {
     BDRVNBDState *s = bs->opaque;
     struct nbd_request request;
@@ -189,30 +225,39 @@ static int nbd_read(BlockDriverState *bs, int64_t sector_num,
     request.from = sector_num * 512;;
     request.len = nb_sectors * 512;
 
-    if (nbd_send_request(s->sock, &request) == -1)
-        return -errno;
-
-    if (nbd_receive_reply(s->sock, &reply) == -1)
-        return -errno;
-
-    if (reply.error !=0)
-        return -reply.error;
-
-    if (reply.handle != request.handle)
-        return -EIO;
+    nbd_coroutine_start(s);
+    if (nbd_co_send_request(s, &request) == -1) {
+        reply.error = errno;
+        goto done;
+    }
+    if (nbd_co_receive_reply(s, &reply) == -1) {
+        reply.error = errno;
+        goto done;
+    }
+    if (reply.error != 0) {
+        goto done;
+    }
+    if (reply.handle != request.handle) {
+        reply.error = EIO;
+        goto done;
+    }
+    if (qemu_co_recvv(s->sock, qiov->iov, request.len, 0) != request.len) {
+        reply.error = EIO;
+    }
 
-    if (nbd_wr_sync(s->sock, buf, request.len, 1) != request.len)
-        return -EIO;
+done:
+    nbd_coroutine_end(s);
+    return -reply.error;
 
-    return 0;
 }
 
-static int nbd_write(BlockDriverState *bs, int64_t sector_num,
-                     const uint8_t *buf, int nb_sectors)
+static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
+                         int nb_sectors, QEMUIOVector *qiov)
 {
     BDRVNBDState *s = bs->opaque;
     struct nbd_request request;
     struct nbd_reply reply;
+    int ret;
 
     request.type = NBD_CMD_WRITE;
     if (!bdrv_enable_write_cache(bs) && (s->nbdflags & NBD_FLAG_SEND_FUA)) {
@@ -223,25 +268,30 @@ static int nbd_write(BlockDriverState *bs, int64_t sector_num,
     request.from = sector_num * 512;;
     request.len = nb_sectors * 512;
 
-    if (nbd_send_request(s->sock, &request) == -1)
-        return -errno;
-
-    if (nbd_wr_sync(s->sock, (uint8_t*)buf, request.len, 0) != request.len)
-        return -EIO;
-
-    if (nbd_receive_reply(s->sock, &reply) == -1)
-        return -errno;
-
-    if (reply.error !=0)
-        return -reply.error;
-
-    if (reply.handle != request.handle)
-        return -EIO;
+    nbd_coroutine_start(s);
+    if (nbd_co_send_request(s, &request) == -1) {
+        reply.error = errno;
+        goto done;
+    }
+    ret = qemu_co_sendv(s->sock, qiov->iov, request.len, 0);
+    if (ret != request.len) {
+        reply.error = EIO;
+        goto done;
+    }
+    if (nbd_co_receive_reply(s, &reply) == -1) {
+        reply.error = errno;
+        goto done;
+    }
+    if (reply.handle != request.handle) {
+        reply.error =

Re: [Qemu-devel] [PATCH V8 10/14] Encrypt state blobs using AES CBC encryption

2011-09-08 Thread Stefan Berger


On 09/08/2011 09:16 AM, Michael S. Tsirkin wrote:

On Thu, Sep 08, 2011 at 08:11:00AM -0400, Stefan Berger wrote:

On 09/08/2011 06:32 AM, Michael S. Tsirkin wrote:

On Wed, Sep 07, 2011 at 08:16:27PM -0400, Stefan Berger wrote:

On 09/07/2011 02:55 PM, Michael S. Tsirkin wrote:

On Thu, Sep 01, 2011 at 10:23:51PM -0400, Stefan Berger wrote:

An additional 'layer' for reading and writing the blobs to the underlying
block storage is added. This layer encrypts the blobs for writing if a key is
available. Similarly it decrypts the blobs after reading.

So a couple of further thoughts:
1. Raw storage should work too, and with e.g. NFS migration will be fine, right?
So I'd say it's worth supporting.

NFS via shared storage, yes, but not migration via Qemu's block
migration mechanism. If snapshotting was supposed to be a feature to
support then that's only possible via block storage (QCoW2 in
particular).

As disk has the same limitation, that sounds fine.
Let the user decide whether snapshotting is needed,
same as disk.


Adding plain file support to the TPM code so it can store its 3
blobs adds quite a bit of complexity to the code. The command
line parameter that previously pointed to a QCoW2 image file would
probably have to point to a directory into which files for the 3 blobs
can be written. Besides that, snapshotting would actually have
to be prevented, maybe through registering a (fake) file of a type other
than QCoW2, since the plain TPM files won't handle snapshotting
correctly either, and QEMU pretty much would have to be prevented
from doing snapshotting at all. Maybe there's an API for this, but I
don't know. Though why create this additional complexity? I don't
mind relaxing the requirement of using a QCoW2 image and allowing,
for example, RAW images (which then automatically prevent
snapshotting from happening), but the same code I now have would work
for writing the blobs into the single file.

Right. Write all blobs into a single file at different
offsets, or something.

That's exactly what I am doing already. Just that I am doing this
with Qemu's BlockStorage (bdrv) and writing to sectors rather than
seek()ing in files. To avoid more complexity I'd rather not
introduce more code handling plain files but rely on all the image
formats that qemu already supports and that give features like
encryption (QCoW2 only), snapshotting (QCoW2 only) and block
migration (presumably all of them). Plain files offer none of that.
Devices that need to write their state to persistent storage really
have to aim for doing this through Qemu's bdrv since they will
otherwise be the ones killing the snapshot feature. TPM certainly
doesn't want to be one of them. If the user doesn't want
snapshotting to be supported since his VM image files are not QCoW2
type of files, just create a raw image file for the TPM's persistent
state and bdrv will automatically prevent snapshotting. The point is
that the TPM code now using the bdrv layer works with any image
format already.

Ah, that's fine then. I had an impression there was a qcow only
limitation, not sure what in code gave me that impression.

Hm, currently I force the image to be a QCoW2.

bdrv_get_format(bs, buf, sizeof(buf));
if (strcmp(buf, "qcow2")) {
    fprintf(stderr, "vTPM backing store must be of type qcow2\n");
    goto err_exit;
}

I can remove this and we should be fine.

2. File backed nvram is interesting outside tpm.
For example, VPD and chassis number for PCI, EEPROM emulation for network
cards.
Using a file per device might be inconvenient though.
So please think of a format and API that will allow sections
for use by different devices.

Also here 'snapshotting' is the most 'demanding' feature of QEMU I
would say. Snapshotting isn't easily supported outside of the block
layer from what I understand. Once you are tied to the block layer
you end up having to use images and those don't grow quite well. So
other devices wanting to use those type of devices would need to
know what the worst case sizes are for writing their state into --
unless an image format is created that can grow.

As for the format: Ideally all devices could write into one file,
right? That would at least prevent too many files besides the VM's
image file from floating around which presumably makes image
management easier. Following the above, you add up all the worst
case sizes the individual devices may need for their blobs and
create an image with that capacity. Then you need some form of a
(primitive?) directory that lets you write blobs into that storage.
Assuming there were well defined names for those devices one could
say for example store this blob under the name
'tpm-permanent-state' and later on load it under that name. The
possible size of the directory would have to be considered as
well... I do something like that for the TPM where I have up to 3
such blobs that I store.

The bad thing about the above is of course the need to know

[Qemu-devel] [PATCH 09/12] sheepdog: move coroutine send/recv function to generic code

2011-09-08 Thread Paolo Bonzini

Outside coroutines, avoid busy waiting on EAGAIN by temporarily
making the socket blocking.

The API of qemu_recvv/qemu_sendv is slightly different from
do_readv/do_writev because they do not handle coroutines.  They
return the number of bytes transferred before encountering an
EAGAIN.  Yielding on EAGAIN is handled entirely in
qemu-coroutine.c.

Cc: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/sheepdog.c |  221 +
 cutils.c |  108 ++
 qemu-common.h|3 +
 qemu-coroutine.c |   71 +
 qemu-coroutine.h |   26 +++
 5 files changed, 229 insertions(+), 200 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index af696a5..188a8d8 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -443,129 +443,6 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,
 return acb;
 }
 
-#ifdef _WIN32
-
-struct msghdr {
-    struct iovec *msg_iov;
-    size_t        msg_iovlen;
-};
-
-static ssize_t sendmsg(int s, const struct msghdr *msg, int flags)
-{
-    size_t size = 0;
-    char *buf, *p;
-    int i, ret;
-
-    /* count the msg size */
-    for (i = 0; i < msg->msg_iovlen; i++) {
-        size += msg->msg_iov[i].iov_len;
-    }
-    buf = g_malloc(size);
-
-    p = buf;
-    for (i = 0; i < msg->msg_iovlen; i++) {
-        memcpy(p, msg->msg_iov[i].iov_base, msg->msg_iov[i].iov_len);
-        p += msg->msg_iov[i].iov_len;
-    }
-
-    ret = send(s, buf, size, flags);
-
-    g_free(buf);
-    return ret;
-}
-
-static ssize_t recvmsg(int s, struct msghdr *msg, int flags)
-{
-    size_t size = 0;
-    char *buf, *p;
-    int i, ret;
-
-    /* count the msg size */
-    for (i = 0; i < msg->msg_iovlen; i++) {
-        size += msg->msg_iov[i].iov_len;
-    }
-    buf = g_malloc(size);
-
-    ret = qemu_recv(s, buf, size, flags);
-    if (ret < 0) {
-        goto out;
-    }
-
-    p = buf;
-    for (i = 0; i < msg->msg_iovlen; i++) {
-        memcpy(msg->msg_iov[i].iov_base, p, msg->msg_iov[i].iov_len);
-        p += msg->msg_iov[i].iov_len;
-    }
-out:
-    g_free(buf);
-    return ret;
-}
-
-#endif
-
-/*
- * Send/recv data with iovec buffers
- *
- * This function send/recv data from/to the iovec buffer directly.
- * The first `offset' bytes in the iovec buffer are skipped and next
- * `len' bytes are used.
- *
- * For example,
- *
- *   do_send_recv(sockfd, iov, len, offset, 1);
- *
- * is equals to
- *
- *   char *buf = malloc(size);
- *   iov_to_buf(iov, iovcnt, buf, offset, size);
- *   send(sockfd, buf, size, 0);
- *   free(buf);
- */
-static int do_send_recv(int sockfd, struct iovec *iov, int len, int offset,
-                        int write)
-{
-    struct msghdr msg;
-    int ret, diff;
-
-    memset(&msg, 0, sizeof(msg));
-    msg.msg_iov = iov;
-    msg.msg_iovlen = 1;
-
-    len += offset;
-
-    while (iov->iov_len < len) {
-        len -= iov->iov_len;
-
-        iov++;
-        msg.msg_iovlen++;
-    }
-
-    diff = iov->iov_len - len;
-    iov->iov_len -= diff;
-
-    while (msg.msg_iov->iov_len <= offset) {
-        offset -= msg.msg_iov->iov_len;
-
-        msg.msg_iov++;
-        msg.msg_iovlen--;
-    }
-
-    msg.msg_iov->iov_base = (char *) msg.msg_iov->iov_base + offset;
-    msg.msg_iov->iov_len -= offset;
-
-    if (write) {
-        ret = sendmsg(sockfd, &msg, 0);
-    } else {
-        ret = recvmsg(sockfd, &msg, 0);
-    }
-
-    msg.msg_iov->iov_base = (char *) msg.msg_iov->iov_base - offset;
-    msg.msg_iov->iov_len += offset;
-
-    iov->iov_len += diff;
-    return ret;
-}
-
 static int connect_to_sdog(const char *addr, const char *port)
 {
 char hbuf[NI_MAXHOST], sbuf[NI_MAXSERV];
@@ -618,65 +495,6 @@ success:
 return fd;
 }
 
-static int do_readv_writev(int sockfd, struct iovec *iov, int len,
-                           int iov_offset, int write)
-{
-    int ret;
-again:
-    ret = do_send_recv(sockfd, iov, len, iov_offset, write);
-    if (ret < 0) {
-        if (errno == EINTR) {
-            goto again;
-        }
-        if (errno == EAGAIN) {
-            if (qemu_in_coroutine()) {
-                qemu_coroutine_yield();
-            }
-            goto again;
-        }
-        error_report("failed to recv a rsp, %s", strerror(errno));
-        return 1;
-    }
-
-    iov_offset += ret;
-    len -= ret;
-    if (len) {
-        goto again;
-    }
-
-    return 0;
-}
-
-static int do_readv(int sockfd, struct iovec *iov, int len, int iov_offset)
-{
-    return do_readv_writev(sockfd, iov, len, iov_offset, 0);
-}
-
-static int do_writev(int sockfd, struct iovec *iov, int len, int iov_offset)
-{
-    return do_readv_writev(sockfd, iov, len, iov_offset, 1);
-}
-
-static int do_read_write(int sockfd, void *buf, int len, int write)
-{
-    struct iovec iov;
-
-    iov.iov_base = buf;
-    iov.iov_len = len;
-
-    return do_readv_writev(sockfd, &iov, len, 0, write);
-}
-
-static int do_read(int

[Qemu-devel] [PATCH 10/12] block: add bdrv_co_flush support

2011-09-08 Thread Paolo Bonzini

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block.c |   53 ++---
 block_int.h |1 +
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index 43742b7..3f32f75 100644
--- a/block.c
+++ b/block.c
@@ -64,6 +64,9 @@ static BlockDriverAIOCB *bdrv_co_aio_readv_em(BlockDriverState *bs,
 static BlockDriverAIOCB *bdrv_co_aio_writev_em(BlockDriverState *bs,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockDriverCompletionFunc *cb, void *opaque);
+static BlockDriverAIOCB *bdrv_co_aio_flush_em(BlockDriverState *bs,
+  BlockDriverCompletionFunc *cb,
+  void *opaque);
 static int coroutine_fn bdrv_co_readv_em(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors,
  QEMUIOVector *iov);
@@ -204,8 +207,18 @@ void bdrv_register(BlockDriver *bdrv)
 }
 }
 
-    if (!bdrv->bdrv_aio_flush)
-        bdrv->bdrv_aio_flush = bdrv_aio_flush_em;
+    if (bdrv->bdrv_aio_flush && !bdrv->bdrv_co_flush) {
+        /* Emulate coroutines by AIO */
+        bdrv->bdrv_co_flush = bdrv_co_flush_em;
+    }
+    if (!bdrv->bdrv_aio_flush) {
+        /* Emulate AIO by either coroutines or sync */
+        if (bdrv->bdrv_co_flush) {
+            bdrv->bdrv_aio_flush = bdrv_co_aio_flush_em;
+        } else {
+            bdrv->bdrv_aio_flush = bdrv_aio_flush_em;
+        }
+    }
 
     QLIST_INSERT_HEAD(&bdrv_drivers, bdrv, list);
 }
@@ -980,11 +993,6 @@ static inline bool bdrv_has_async_rw(BlockDriver *drv)
         || drv->bdrv_aio_readv != bdrv_aio_readv_em;
 }
 
-static inline bool bdrv_has_async_flush(BlockDriver *drv)
-{
-    return drv->bdrv_aio_flush != bdrv_aio_flush_em;
-}
-
 /* return < 0 if error. See bdrv_write() for the return codes */
 int bdrv_read(BlockDriverState *bs, int64_t sector_num,
   uint8_t *buf, int nb_sectors)
@@ -1713,8 +1721,8 @@ int bdrv_flush(BlockDriverState *bs)
 return 0;
 }
 
-    if (bs->drv && bdrv_has_async_flush(bs->drv) && qemu_in_coroutine()) {
-        return bdrv_co_flush_em(bs);
+    if (bs->drv && bs->drv->bdrv_co_flush && qemu_in_coroutine()) {
+        return bs->drv->bdrv_co_flush(bs);
     }
 
     if (bs->drv && bs->drv->bdrv_flush) {
@@ -2729,7 +2737,7 @@ static AIOPool bdrv_em_co_aio_pool = {
 .cancel = bdrv_aio_co_cancel_em,
 };
 
-static void bdrv_co_rw_bh(void *opaque)
+static void bdrv_co_em_bh(void *opaque)
 {
 BlockDriverAIOCBCoroutine *acb = opaque;
 
@@ -2751,7 +2759,7 @@ static void coroutine_fn bdrv_co_rw(void *opaque)
             acb->req.nb_sectors, acb->req.qiov);
     }
 
-    acb->bh = qemu_bh_new(bdrv_co_rw_bh, acb);
+    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
     qemu_bh_schedule(acb->bh);
 }
 
@@ -2794,6 +2802,29 @@ static BlockDriverAIOCB *bdrv_co_aio_writev_em(BlockDriverState *bs,
                                  true);
 }
 
+static void coroutine_fn bdrv_co_flush(void *opaque)
+{
+    BlockDriverAIOCBCoroutine *acb = opaque;
+    BlockDriverState *bs = acb->common.bs;
+
+    acb->req.error = bs->drv->bdrv_co_flush(bs);
+    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
+    qemu_bh_schedule(acb->bh);
+}
+
+static BlockDriverAIOCB *bdrv_co_aio_flush_em(BlockDriverState *bs,
+                                              BlockDriverCompletionFunc *cb,
+                                              void *opaque)
+{
+    Coroutine *co;
+    BlockDriverAIOCBCoroutine *acb;
+
+    acb = qemu_aio_get(&bdrv_em_co_aio_pool, bs, cb, opaque);
+    co = qemu_coroutine_create(bdrv_co_flush);
+    qemu_coroutine_enter(co, acb);
+
+    return &acb->common;
+}
 static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
 BlockDriverCompletionFunc *cb, void *opaque)
 {
diff --git a/block_int.h b/block_int.h
index 8a72b80..b0cd5ea 100644
--- a/block_int.h
+++ b/block_int.h
@@ -83,6 +83,7 @@ struct BlockDriver {
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+int coroutine_fn (*bdrv_co_flush)(BlockDriverState *bs);
 
 int (*bdrv_aio_multiwrite)(BlockDriverState *bs, BlockRequest *reqs,
 int num_reqs);
-- 
1.7.6
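
The fallback selection that this patch adds to bdrv_register() can be modelled with a much smaller table. The sketch below is only an illustration under stated assumptions: `struct driver`, `register_flush_hooks`, `native_hook` and the three `*_em` stand-ins are hypothetical names, and the hooks take no arguments so that the selection order is the only thing being shown.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down driver table: only the three flush hooks. */
struct driver {
    int (*flush)(void);       /* synchronous hook */
    int (*aio_flush)(void);   /* callback-based AIO hook */
    int (*co_flush)(void);    /* coroutine hook */
};

/* Stands for any implementation supplied by the driver itself. */
static int native_hook(void)     { return 0; }

/* Hypothetical stand-ins for the emulation helpers named in the patch. */
static int co_flush_em(void)     { return 1; }  /* coroutine emulated by AIO */
static int co_aio_flush_em(void) { return 2; }  /* AIO emulated by coroutine */
static int aio_flush_em(void)    { return 3; }  /* AIO emulated by sync flush */

/* Fill in missing hooks in the same order as the patched bdrv_register(). */
static void register_flush_hooks(struct driver *d)
{
    if (d->aio_flush && !d->co_flush) {
        d->co_flush = co_flush_em;
    }
    if (!d->aio_flush) {
        d->aio_flush = d->co_flush ? co_aio_flush_em : aio_flush_em;
    }
}
```

Note that a driver providing only the synchronous hook ends up without a coroutine hook, which is why bdrv_flush() in the patch checks bdrv_co_flush for NULL before using it.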

[Qemu-devel] [PATCH 07/12] sheepdog: add coroutine_fn markers

2011-09-08 Thread Paolo Bonzini

This makes the following patch easier to review.

Cc: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/sheepdog.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index c1f6e07..af696a5 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -396,7 +396,7 @@ static inline int free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
     return !QLIST_EMPTY(&acb->aioreq_head);
 }
 
-static void sd_finish_aiocb(SheepdogAIOCB *acb)
+static void coroutine_fn sd_finish_aiocb(SheepdogAIOCB *acb)
 {
     if (!acb->canceled) {
         qemu_coroutine_enter(acb->coroutine, NULL);
@@ -735,7 +735,7 @@ out:
 return ret;
 }
 
-static int add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, int create,
enum AIOCBState aiocb_type);
 
@@ -743,7 +743,7 @@ static int add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
  * This function searchs pending requests to the object `oid', and
  * sends them.
  */
-static void send_pending_req(BDRVSheepdogState *s, uint64_t oid, uint32_t id)
+static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid, uint32_t id)
 {
 AIOReq *aio_req, *next;
 SheepdogAIOCB *acb;
@@ -777,7 +777,7 @@ static void send_pending_req(BDRVSheepdogState *s, uint64_t oid, uint32_t id)
  * This function is registered as a fd handler, and called from the
  * main loop when s->fd is ready for reading responses.
  */
-static void aio_read_response(void *opaque)
+static void coroutine_fn aio_read_response(void *opaque)
 {
 SheepdogObjRsp rsp;
 BDRVSheepdogState *s = opaque;
@@ -1064,7 +1064,7 @@ out:
 return ret;
 }
 
-static int add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, int create,
enum AIOCBState aiocb_type)
 {
@@ -1517,7 +1517,7 @@ static int sd_truncate(BlockDriverState *bs, int64_t offset)
  * update metadata, this sends a write request to the vdi object.
  * Otherwise, this switches back to sd_co_readv/writev.
  */
-static void sd_write_done(SheepdogAIOCB *acb)
+static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
 {
 int ret;
 BDRVSheepdogState *s = acb->common.bs->opaque;
@@ -1615,7 +1615,7 @@ out:
  * Returns 1 when we need to wait a response, 0 when there is no sent
  * request and -errno in error cases.
  */
-static int sd_co_rw_vector(void *p)
+static int coroutine_fn sd_co_rw_vector(void *p)
 {
 SheepdogAIOCB *acb = p;
 int ret = 0;
-- 
1.7.6

[Qemu-devel] [PATCH 04/12] nbd: add support for NBD_CMD_FLUSH

2011-09-08 Thread Paolo Bonzini

Note for the brace police: the style in this commit and the following
is consistent with the rest of the file.  It is then fixed together with
the introduction of coroutines.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/nbd.c |   31 +++
 nbd.c   |   14 +-
 2 files changed, 44 insertions(+), 1 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index ffc57a9..4a195dc 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -237,6 +237,36 @@ static int nbd_write(BlockDriverState *bs, int64_t sector_num,
     return 0;
 }
 
+static int nbd_flush(BlockDriverState *bs)
+{
+    BDRVNBDState *s = bs->opaque;
+    struct nbd_request request;
+    struct nbd_reply reply;
+
+    if (!(s->nbdflags & NBD_FLAG_SEND_FLUSH)) {
+        return 0;
+    }
+
+    request.type = NBD_CMD_FLUSH;
+    request.handle = (uint64_t)(intptr_t)bs;
+    request.from = 0;
+    request.len = 0;
+
+    if (nbd_send_request(s->sock, &request) == -1)
+        return -errno;
+
+    if (nbd_receive_reply(s->sock, &reply) == -1)
+        return -errno;
+
+    if (reply.error !=0)
+        return -reply.error;
+
+    if (reply.handle != request.handle)
+        return -EIO;
+
+    return 0;
+}
+
 static void nbd_close(BlockDriverState *bs)
 {
 BDRVNBDState *s = bs->opaque;
@@ -260,6 +290,7 @@ static BlockDriver bdrv_nbd = {
 .bdrv_read = nbd_read,
 .bdrv_write= nbd_write,
 .bdrv_close= nbd_close,
+.bdrv_flush= nbd_flush,
 .bdrv_getlength= nbd_getlength,
 .protocol_name = "nbd",
 };
diff --git a/nbd.c b/nbd.c
index 30cd78f..4dbbc62 100644
--- a/nbd.c
+++ b/nbd.c
@@ -193,7 +193,8 @@ int nbd_negotiate(int csock, off_t size, uint32_t flags)
 memcpy(buf, "NBDMAGIC", 8);
 cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL);
 cpu_to_be64w((uint64_t*)(buf + 16), size);
-cpu_to_be32w((uint32_t*)(buf + 24), flags | NBD_FLAG_HAS_FLAGS);
+cpu_to_be32w((uint32_t*)(buf + 24),
+ flags | NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH);
 memset(buf + 28, 0, 124);
 
 if (write_sync(csock, buf, sizeof(buf)) != sizeof(buf)) {
@@ -682,6 +683,18 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
         TRACE("Request type is DISCONNECT");
         errno = 0;
         return 1;
+    case NBD_CMD_FLUSH:
+        TRACE("Request type is FLUSH");
+
+        if (bdrv_flush(bs) == -1) {
+            LOG("flush failed");
+            errno = EINVAL;
+            return -1;
+        }
+
+        if (nbd_send_reply(csock, &reply) == -1)
+            return -1;
+        break;
     default:
         LOG("invalid request type (%u) received", request.type);
         errno = EINVAL;
-- 
1.7.6

[Qemu-devel] [PATCH 05/12] nbd: add support for NBD_CMD_FLAG_FUA

2011-09-08 Thread Paolo Bonzini

The server can use it to issue a flush automatically after a
write.  The client can also use it to mimic a write-through
cache.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/nbd.c |8 
 nbd.c   |   13 +++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 4a195dc..5a7812c 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -215,6 +215,10 @@ static int nbd_write(BlockDriverState *bs, int64_t sector_num,
     struct nbd_reply reply;
 
     request.type = NBD_CMD_WRITE;
+    if (!bdrv_enable_write_cache(bs) && (s->nbdflags & NBD_FLAG_SEND_FUA)) {
+        request.type |= NBD_CMD_FLAG_FUA;
+    }
+
     request.handle = (uint64_t)(intptr_t)bs;
     request.from = sector_num * 512;;
     request.len = nb_sectors * 512;
@@ -248,6 +252,10 @@ static int nbd_flush(BlockDriverState *bs)
     }
 
     request.type = NBD_CMD_FLUSH;
+    if (s->nbdflags & NBD_FLAG_SEND_FUA) {
+        request.type |= NBD_CMD_FLAG_FUA;
+    }
+
     request.handle = (uint64_t)(intptr_t)bs;
     request.from = 0;
     request.len = 0;
diff --git a/nbd.c b/nbd.c
index 4dbbc62..b65fb4a 100644
--- a/nbd.c
+++ b/nbd.c
@@ -194,7 +194,8 @@ int nbd_negotiate(int csock, off_t size, uint32_t flags)
 cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL);
 cpu_to_be64w((uint64_t*)(buf + 16), size);
 cpu_to_be32w((uint32_t*)(buf + 24),
- flags | NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH);
+ flags | NBD_FLAG_HAS_FLAGS |
+ NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
 memset(buf + 28, 0, 124);
 
 if (write_sync(csock, buf, sizeof(buf)) != sizeof(buf)) {
@@ -614,7 +615,7 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
     reply.handle = request.handle;
     reply.error = 0;
 
-    switch (request.type) {
+    switch (request.type & NBD_CMD_MASK_COMMAND) {
     case NBD_CMD_READ:
         TRACE("Request type is READ");
 
@@ -674,6 +675,14 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
             }
 
             *offset += request.len;
+
+            if (request.type & NBD_CMD_FLAG_FUA) {
+                if (bdrv_flush(bs) == -1) {
+                    LOG("flush failed");
+                    errno = EINVAL;
+                    return -1;
+                }
+            }
         }
 
         if (nbd_send_reply(csock, &reply) == -1)
-- 
1.7.6
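
The patch above packs a flag into the upper bits of the request type and masks it off again on the server side. A minimal sketch of that encoding follows. The numeric values of `CMD_MASK_COMMAND` and `CMD_FLAG_FUA` are assumptions made for the illustration (the hunks shown here do not include their definitions, so check nbd.h), and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only; see nbd.h for the real ones. */
#define CMD_MASK_COMMAND 0x0000ffffu
#define CMD_FLAG_FUA     (1u << 16)

enum { CMD_READ = 0, CMD_WRITE = 1 };

/* Server side: strip the flags before switching on the command. */
static uint32_t request_command(uint32_t type)
{
    return type & CMD_MASK_COMMAND;
}

/* Server side: does this request ask for a flush after the write? */
static bool request_wants_fua(uint32_t type)
{
    return (type & CMD_FLAG_FUA) != 0;
}

/*
 * Client side: same condition as nbd_write() in the patch.  FUA is set
 * only when the write cache is disabled and the server advertised FUA.
 */
static uint32_t make_write_type(bool write_cache_enabled, bool server_has_fua)
{
    uint32_t type = CMD_WRITE;

    if (!write_cache_enabled && server_has_fua) {
        type |= CMD_FLAG_FUA;
    }
    return type;
}
```

Keeping the command in the low bits means a server that masks the type still dispatches a flagged write as an ordinary write.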

[Qemu-devel] [PATCH 03/12] nbd: support NBD_SET_FLAGS ioctl

2011-09-08 Thread Paolo Bonzini

The nbd kernel module cannot enable DISCARD requests unless it is
informed about it.  The flags field in the header is used for this,
and this patch adds support for it.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 nbd.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/nbd.c b/nbd.c
index 9ed2239..30cd78f 100644
--- a/nbd.c
+++ b/nbd.c
@@ -377,6 +377,14 @@ int nbd_init(int fd, int csock, uint32_t flags, off_t size, size_t blocksize)
         }
     }
 
+    if (ioctl(fd, NBD_SET_FLAGS, flags) < 0
+        && errno != ENOTTY) {
+        int serrno = errno;
+        LOG("Failed setting flags");
+        errno = serrno;
+        return -1;
+    }
+
     TRACE("Clearing NBD socket");
 
     if (ioctl(fd, NBD_CLEAR_SOCK) == -1) {
-- 
1.7.6

[Qemu-devel] [PATCH 01/12] nbd: support feature negotiation

2011-09-08 Thread Paolo Bonzini

nbd supports writing flags in bytes 24...27 of the header,
and uses that for the read-only flag.  Add support for it
in qemu-nbd.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/nbd.c |4 ++--
 nbd.c   |   32 +---
 nbd.h   |9 ++---
 qemu-nbd.c  |   13 ++---
 4 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 55cb2fd..ffc57a9 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -47,6 +47,7 @@
 
 typedef struct BDRVNBDState {
 int sock;
+uint32_t nbdflags;
 off_t size;
 size_t blocksize;
 char *export_name; /* An NBD server may export several devices */
@@ -110,7 +111,6 @@ static int nbd_establish_connection(BlockDriverState *bs)
 int ret;
 off_t size;
 size_t blocksize;
-uint32_t nbdflags;
 
     if (s->host_spec[0] == '/') {
         sock = unix_socket_outgoing(s->host_spec);
@@ -125,7 +125,7 @@ static int nbd_establish_connection(BlockDriverState *bs)
     }
 
     /* NBD handshake */
-    ret = nbd_receive_negotiate(sock, s->export_name, &nbdflags, &size,
+    ret = nbd_receive_negotiate(sock, s->export_name, &s->nbdflags, &size,
                                 &blocksize);
     if (ret == -1) {
         logout("Failed to negotiate with the NBD server\n");
diff --git a/nbd.c b/nbd.c
index e7a585d..07a8e53 100644
--- a/nbd.c
+++ b/nbd.c
@@ -29,6 +29,10 @@
 #include <ctype.h>
 #include <inttypes.h>
 
+#ifdef __linux__
+#include <linux/fs.h>
+#endif
+
 #include "qemu_socket.h"
 
 //#define DEBUG_NBD
@@ -171,7 +175,7 @@ int unix_socket_outgoing(const char *path)
   Request (type == 2)
 */
 
-int nbd_negotiate(int csock, off_t size)
+int nbd_negotiate(int csock, off_t size, uint32_t flags)
 {
 char buf[8 + 8 + 8 + 128];
 
@@ -179,14 +183,16 @@ int nbd_negotiate(int csock, off_t size)
         [ 0 ..   7]   passwd   ("NBDMAGIC")
         [ 8 ..  15]   magic    (0x00420281861253)
         [16 ..  23]   size
-        [24 .. 151]   reserved (0)
+        [24 ..  27]   flags
+        [28 .. 151]   reserved (0)
      */
 
     TRACE("Beginning negotiation.");
     memcpy(buf, "NBDMAGIC", 8);
     cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL);
     cpu_to_be64w((uint64_t*)(buf + 16), size);
-    memset(buf + 24, 0, 128);
+    cpu_to_be32w((uint32_t*)(buf + 24), flags | NBD_FLAG_HAS_FLAGS);
+    memset(buf + 28, 0, 124);
 
     if (write_sync(csock, buf, sizeof(buf)) != sizeof(buf)) {
         LOG("write failed");
@@ -336,8 +342,8 @@ int nbd_receive_negotiate(int csock, const char *name, uint32_t *flags,
     return 0;
 }
 
-#ifndef _WIN32
-int nbd_init(int fd, int csock, off_t size, size_t blocksize)
+#ifdef __linux__
+int nbd_init(int fd, int csock, uint32_t flags, off_t size, size_t blocksize)
 {
     TRACE("Setting block size to %lu", (unsigned long)blocksize);
 
@@ -357,6 +363,18 @@ int nbd_init(int fd, int csock, off_t size, size_t blocksize)
         return -1;
     }
 
+    if (flags & NBD_FLAG_READ_ONLY) {
+        int read_only = 1;
+        TRACE("Setting readonly attribute");
+
+        if (ioctl(fd, BLKROSET, (unsigned long) &read_only) < 0) {
+            int serrno = errno;
+            LOG("Failed setting read-only attribute");
+            errno = serrno;
+            return -1;
+        }
+    }
+
     TRACE("Clearing NBD socket");
 
     if (ioctl(fd, NBD_CLEAR_SOCK) == -1) {
@@ -547,7 +565,7 @@ static int nbd_send_reply(int csock, struct nbd_reply 
*reply)
 }
 
 int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
- off_t *offset, bool readonly, uint8_t *data, int data_size)
+ off_t *offset, uint32_t nbdflags, uint8_t *data, int data_size)
 {
 struct nbd_request request;
 struct nbd_reply reply;
@@ -631,7 +649,7 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, 
uint64_t dev_offset,
 return -1;
 }
 
-if (readonly) {
+if (nbdflags & NBD_FLAG_READ_ONLY) {
 TRACE("Server is read-only, return error");
 reply.error = 1;
 } else {
diff --git a/nbd.h b/nbd.h
index 96f77fe..938a021 100644
--- a/nbd.h
+++ b/nbd.h
@@ -39,6 +39,9 @@ struct nbd_reply {
 uint64_t handle;
 } QEMU_PACKED;
 
+#define NBD_FLAG_HAS_FLAGS  (1 << 0)    /* Flags are there */
+#define NBD_FLAG_READ_ONLY  (1 << 1)    /* Device is read-only */
+
 enum {
 NBD_CMD_READ = 0,
 NBD_CMD_WRITE = 1,
@@ -55,14 +58,14 @@ int tcp_socket_incoming_spec(const char *address_and_port);
 int unix_socket_outgoing(const char *path);
 int unix_socket_incoming(const char *path);
 
-int nbd_negotiate(int csock, off_t size);
+int nbd_negotiate(int csock, off_t size, uint32_t flags);
 int nbd_receive_negotiate(int csock, const char *name, uint32_t *flags,
   off_t *size, size_t *blocksize);
-int nbd_init(int fd, int csock, off_t size, size_t blocksize);
+int nbd_init(int fd, int csock, uint32_t flags, off_t size, size_t blocksize);
 int
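
To make the new header layout concrete, here is a minimal standalone sketch
of packing the 152-byte negotiation block described in the comment of the
patch above. This is not QEMU code: pack_nbd_header, put_be32 and put_be64
are hypothetical helpers standing in for cpu_to_be32w/cpu_to_be64w, and only
the offsets, the magic value and the two flag bits are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBD_FLAG_HAS_FLAGS (1u << 0)   /* flags field is valid */
#define NBD_FLAG_READ_ONLY (1u << 1)   /* device is read-only */
#define NBD_HEADER_LEN     (8 + 8 + 8 + 128)

/* Hypothetical helper: store a 32-bit value in big-endian byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (24 - 8 * i));
    }
}

/* Hypothetical helper: store a 64-bit value in big-endian byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Fill buf with: passwd [0..7], magic [8..15], size [16..23],
 * flags [24..27], reserved zeros [28..151]. */
static void pack_nbd_header(uint8_t buf[NBD_HEADER_LEN],
                            uint64_t size, uint32_t flags)
{
    assert(buf != NULL);
    memcpy(buf, "NBDMAGIC", 8);
    put_be64(buf + 8, 0x00420281861253ULL);
    put_be64(buf + 16, size);
    put_be32(buf + 24, flags | NBD_FLAG_HAS_FLAGS);
    memset(buf + 28, 0, 124);
}
```

A client that predates the flags field would see bytes 24..27 as part of the
reserved area, which is presumably why NBD_FLAG_HAS_FLAGS is always set by
the sender.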

[Qemu-devel] [PATCH 08/12] add socket_set_block

2011-09-08 Thread Paolo Bonzini

Cc: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 oslib-posix.c |7 +++
 oslib-win32.c |6 ++
 qemu_socket.h |1 +
 3 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/oslib-posix.c b/oslib-posix.c
index 196099c..e13e6d4 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -91,6 +91,13 @@ void qemu_vfree(void *ptr)
 free(ptr);
 }
 
+void socket_set_block(int fd)
+{
+int f;
+f = fcntl(fd, F_GETFL);
+fcntl(fd, F_SETFL, f & ~O_NONBLOCK);
+}
+
 void socket_set_nonblock(int fd)
 {
 int f;
diff --git a/oslib-win32.c b/oslib-win32.c
index 5f0759f..5e3de7d 100644
--- a/oslib-win32.c
+++ b/oslib-win32.c
@@ -73,6 +73,12 @@ void qemu_vfree(void *ptr)
 VirtualFree(ptr, 0, MEM_RELEASE);
 }
 
+void socket_set_block(int fd)
+{
+unsigned long opt = 0;
+ioctlsocket(fd, FIONBIO, &opt);
+}
+
 void socket_set_nonblock(int fd)
 {
 unsigned long opt = 1;
diff --git a/qemu_socket.h b/qemu_socket.h
index 180e4db..9e32fac 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -35,6 +35,7 @@ int inet_aton(const char *cp, struct in_addr *ia);
 /* misc helpers */
 int qemu_socket(int domain, int type, int protocol);
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
+void socket_set_block(int fd);
 void socket_set_nonblock(int fd);
 int send_all(int fd, const void *buf, int len1);
 
-- 
1.7.6
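
For readers who want to try the POSIX half of this outside QEMU, a minimal
sketch follows. set_blocking is a hypothetical name, not part of the patch;
unlike the patch it reports fcntl() failures to the caller, which is a design
choice of the sketch rather than something the patch does.

```c
#include <fcntl.h>

/* Hypothetical helper: switch fd between blocking and non-blocking mode.
 * Returns 0 on success, -1 if either fcntl() call fails. */
static int set_blocking(int fd, int blocking)
{
    int f = fcntl(fd, F_GETFL);
    if (f == -1) {
        return -1;
    }
    if (blocking) {
        f &= ~O_NONBLOCK;   /* same operation as socket_set_block() */
    } else {
        f |= O_NONBLOCK;    /* same operation as socket_set_nonblock() */
    }
    return fcntl(fd, F_SETFL, f) == -1 ? -1 : 0;
}
```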

[Qemu-devel] [PATCH] Fix include statements for qemu-common.h

2011-09-08 Thread Stefan Weil

* qemu-common.h is not a system include file, so it should be included
  with "" instead of <>. Otherwise incremental builds might fail
  because only local include files are checked for changes.

* linux-user/syscall.c included the file twice.

Cc: Kevin Wolf kw...@redhat.com
Cc: Riku Voipio riku.voi...@iki.fi
Cc: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Stefan Weil w...@mail.berlios.de
---
 hw/virtio-blk.c  |2 +-
 linux-user/syscall.c |3 +--
 nbd.h|2 +-
 qemu-nbd.c   |2 +-
 slirp/libslirp.h |2 +-
 5 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 4df23f4..d5d4757 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -11,7 +11,7 @@
  *
  */
 
-#include <qemu-common.h>
+#include "qemu-common.h"
 #include "qemu-error.h"
 #include "trace.h"
 #include "blockdev.h"
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 6bdf4e6..e87e174 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -60,7 +60,7 @@ int __clone2(int (*fn)(void *), void *child_stack_base,
 #include <netinet/ip.h>
 #include <netinet/tcp.h>
 #include <linux/wireless.h>
-#include <qemu-common.h>
+#include "qemu-common.h"
 #ifdef TARGET_GPROF
 #include <sys/gmon.h>
 #endif
@@ -96,7 +96,6 @@ int __clone2(int (*fn)(void *), void *child_stack_base,
 #include "cpu-uname.h"
 
 #include "qemu.h"
-#include "qemu-common.h"
 
 #if defined(CONFIG_USE_NPTL)
 #define CLONE_NPTL_FLAGS2 (CLONE_SETTLS | \
diff --git a/nbd.h b/nbd.h
index 96f77fe..273cfa1 100644
--- a/nbd.h
+++ b/nbd.h
@@ -21,7 +21,7 @@
 
 #include <sys/types.h>
 
-#include <qemu-common.h>
+#include "qemu-common.h"
 
 #include "block_int.h"
 
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 0b25a4d..3a39145 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -16,7 +16,7 @@
  *  along with this program; if not, see http://www.gnu.org/licenses/.
  */
 
-#include <qemu-common.h>
+#include "qemu-common.h"
 #include "block_int.h"
 #include "nbd.h"
 
diff --git a/slirp/libslirp.h b/slirp/libslirp.h
index 67c70e3..a755123 100644
--- a/slirp/libslirp.h
+++ b/slirp/libslirp.h
@@ -1,7 +1,7 @@
 #ifndef _LIBSLIRP_H
 #define _LIBSLIRP_H
 
-#include <qemu-common.h>
+#include "qemu-common.h"
 
 #ifdef CONFIG_SLIRP
 
-- 
1.7.2.5

Re: [Qemu-devel] [PATCH] virtio-9p: Fix syntax error in debug code

2011-09-08 Thread Stefan Weil


On 20.07.2011 11:44, Aneesh Kumar K.V wrote:
On Wed, 20 Jul 2011 08:27:28 +0200, Stefan Weil w...@mail.berlios.de 
wrote:

This error was reported by cppcheck:

qemu/hw/9pfs/virtio-9p-debug.c:342:
error: Invalid number of character ({) when these macros are defined:
'DEBUG_DATA'.

Cc: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Stefan Weil w...@mail.berlios.de
---
hw/9pfs/virtio-9p-debug.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/9pfs/virtio-9p-debug.c b/hw/9pfs/virtio-9p-debug.c
index 4636ad5..96925f0 100644
--- a/hw/9pfs/virtio-9p-debug.c
+++ b/hw/9pfs/virtio-9p-debug.c
@@ -295,7 +295,7 @@ static void pprint_data(V9fsPDU *pdu, int rx, 
size_t *offsetp, const char *name)


if (rx) {
count = pdu->elem.in_num;
- } else
+ } else {
count = pdu->elem.out_num;
}



Applied.

We also need to update virtio-9p-debug w.r.t the new co-routine
series. With co-routine we can have multiple 9p handler started
simultaneously.

-aneesh


Maybe this patch can be applied via qemu-trivial.
I simply would like to get it out of my list of open patches.

Thanks,
Stefan W.

Re: [Qemu-devel] [PATCH 0/2] improve qemu-img conversion performance

2011-09-08 Thread Sage Weil

On Thu, 8 Sep 2011, Kevin Wolf wrote:
> On 08.09.2011 01:06, Yehuda Sadeh wrote:
> > The following set of patches improve the qemu-img conversion process
> > performance. When using a higher latency backend, small writes have a
> > severe impact on the time it takes to do image conversion.
> > We switch to using async writes, and we avoid splitting writes due to
> > holes when the holes are small enough.
> >
> > Yehuda Sadeh (2):
> >   qemu-img: async write to block device when converting image
> >   qemu-img: don't skip writing small holes
> >
> >  qemu-img.c |   34 +++---
> >  1 files changed, 27 insertions(+), 7 deletions(-)
> >
>
> This doesn't seem to be against git master or the block tree. Please rebase.
>
> I think that commit a22f123c may obsolete your patch 2/2.

With git.kernel.org down, where should I be looking for the latest 
upstream?

Thanks!
sage

Re: [Qemu-devel] [PATCH v4 24/39] ppc: convert to memory API

2011-09-08 Thread Alexander Graf

[dropping kvm@vger because my mail server refuses to send mails there]

On 08.08.2011, at 15:09, Avi Kivity wrote:

 Reviewed-by: Richard Henderson r...@twiddle.net
 Reviewed-by: Anthony Liguori aligu...@us.ibm.com
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
 hw/cuda.c |6 ++-
 hw/escc.c |   42 +--
 hw/escc.h |2 +-
 hw/heathrow_pic.c |   29 --
 hw/ide.h  |2 +-
 hw/ide/macio.c|   36 ---
 hw/mac_dbdma.c|   32 ++--
 hw/mac_dbdma.h|4 ++-
 hw/mac_nvram.c|   39 ++---
 hw/macio.c|   74 +++-
 hw/openpic.c  |   81 +
 hw/openpic.h  |2 +-
 hw/ppc_mac.h  |   16 ++
 hw/ppc_newworld.c |   30 +--
 hw/ppc_oldworld.c |   23 +++
 15 files changed, 201 insertions(+), 217 deletions(-)
 
 diff --git a/hw/cuda.c b/hw/cuda.c
 index 065c362..5c92d81 100644
 --- a/hw/cuda.c
 +++ b/hw/cuda.c
 @@ -117,6 +117,7 @@ typedef struct CUDATimer {
 } CUDATimer;
 
 typedef struct CUDAState {
 +MemoryRegion mem;
  /* cuda registers */
  uint8_t b;  /* B-side data */
  uint8_t a;  /* A-side data */
 @@ -722,7 +723,7 @@ static void cuda_reset(void *opaque)
  set_counter(s, s-timers[1], 0x);
 }
 
 -void cuda_init (int *cuda_mem_index, qemu_irq irq)
 +void cuda_init (MemoryRegion **cuda_mem, qemu_irq irq)
 {
  struct tm tm;
  CUDAState *s = cuda_state;
 @@ -738,8 +739,9 @@ void cuda_init (int *cuda_mem_index, qemu_irq irq)
  s->tick_offset = (uint32_t)mktimegm(&tm) + RTC_OFFSET;
 
  s->adb_poll_timer = qemu_new_timer_ns(vm_clock, cuda_adb_poll, s);
 -*cuda_mem_index = cpu_register_io_memory(cuda_read, cuda_write, s,
 +cpu_register_io_memory(cuda_read, cuda_write, s,
   DEVICE_NATIVE_ENDIAN);
 +*cuda_mem = &s->mem;

Just stumbled over this while debugging why the Mac machines don't boot 
anymore. Are you sure this part is correct? We're not registering the region 
(and its callbacks) anymore now, right?


Alex

Re: [Qemu-devel] [PATCH v4 24/39] ppc: convert to memory API

2011-09-08 Thread Alexander Graf


On 08.08.2011, at 15:09, Avi Kivity wrote:

 Reviewed-by: Richard Henderson r...@twiddle.net
 Reviewed-by: Anthony Liguori aligu...@us.ibm.com
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
 hw/cuda.c |6 ++-
 hw/escc.c |   42 +--
 hw/escc.h |2 +-
 hw/heathrow_pic.c |   29 --
 hw/ide.h  |2 +-
 hw/ide/macio.c|   36 ---
 hw/mac_dbdma.c|   32 ++--
 hw/mac_dbdma.h|4 ++-
 hw/mac_nvram.c|   39 ++---
 hw/macio.c|   74 +++-
 hw/openpic.c  |   81 +
 hw/openpic.h  |2 +-
 hw/ppc_mac.h  |   16 ++
 hw/ppc_newworld.c |   30 +--
 hw/ppc_oldworld.c |   23 +++
 15 files changed, 201 insertions(+), 217 deletions(-)
 
 

[...]

 @@ -89,7 +91,8 @@ static void pic_writel (void *opaque, target_phys_addr_t 
 addr, uint32_t value)
 }
 }
 
 -static uint32_t pic_readl (void *opaque, target_phys_addr_t addr)
 +static uint64_t pic_read(void *opaque, target_phys_addr_t addr,
 + unsigned size)
 {
 HeathrowPICS *s = opaque;
 HeathrowPIC *pic;
 @@ -120,19 +123,12 @@ static uint32_t pic_readl (void *opaque, 
 target_phys_addr_t addr)
 return value;
 }
 
 -static CPUWriteMemoryFunc * const pic_write[] = {
 -pic_writel,
 -pic_writel,
 -pic_writel,
 +static const MemoryRegionOps heathrow_pic_ops = {
 +.read = pic_read,
 +.write = pic_write,
 +.endianness = DEVICE_NATIVE_ENDIAN,

native endian

 };
 
 -static CPUReadMemoryFunc * const pic_read[] = {
 -pic_readl,
 -pic_readl,
 -pic_readl,
 -};
 -
 -
 static void heathrow_pic_set_irq(void *opaque, int num, int level)
 {
 HeathrowPICS *s = opaque;
 @@ -201,7 +197,7 @@ static void heathrow_pic_reset(void *opaque)
 s->pics[1].level_triggered = 0x1ff0;
 }
 
 -qemu_irq *heathrow_pic_init(int *pmem_index,
 +qemu_irq *heathrow_pic_init(MemoryRegion **pmem,
 int nb_cpus, qemu_irq **irqs)
 {
 HeathrowPICS *s;
 @@ -209,8 +205,9 @@ qemu_irq *heathrow_pic_init(int *pmem_index,
 s = qemu_mallocz(sizeof(HeathrowPICS));
 /* only 1 CPU */
 s->irqs = irqs[0];
 -*pmem_index = cpu_register_io_memory(pic_read, pic_write, s,
 - DEVICE_LITTLE_ENDIAN);

little endian. So you're changing the endianness of the calls? Not nice.


Alex
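
The concern can be illustrated with a small standalone sketch. It assumes,
as a simplification, that an access is byte-swapped exactly when the
device's declared endianness differs from the target's; the enum and
function names below are hypothetical and are not the QEMU API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DEVICE_*_ENDIAN declarations. */
enum dev_endian { DEV_NATIVE, DEV_LITTLE, DEV_BIG };

static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Value the guest sees for a 32-bit register holding v, under the
 * assumption stated above. */
static uint32_t adjust_for_endianness(uint32_t v, enum dev_endian dev,
                                      bool target_big_endian)
{
    bool swap = (dev == DEV_LITTLE && target_big_endian) ||
                (dev == DEV_BIG && !target_big_endian);
    return swap ? bswap32(v) : v;
}
```

On a big-endian target such as ppc, the same register contents read back
differently depending on whether the device is declared native or little
endian, so switching between the two changes guest-visible behaviour.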

[Qemu-devel] [PATCH] PPC: Fix via-cuda memory registration

2011-09-08 Thread Alexander Graf

Commit 23c5e4ca (convert to memory API) broke the VIA Cuda emulation layer
by not registering the IO structs.

This patch registers them properly and thus makes -M g3beige and -M mac99
work again.

Signed-off-by: Alexander Graf ag...@suse.de

---

PS: Please test your patches. This one could have been found with an invocation
as simple as qemu-system-ppc. We boot into the OpenBIOS prompt by default,
so you wouldn't even have required a guest image or kernel.
---
 hw/cuda.c |   28 
 1 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/hw/cuda.c b/hw/cuda.c
index 6f05975..4077436 100644
--- a/hw/cuda.c
+++ b/hw/cuda.c
@@ -634,16 +634,20 @@ static uint32_t cuda_readl (void *opaque, 
target_phys_addr_t addr)
 return 0;
 }
 
-static CPUWriteMemoryFunc * const cuda_write[] = {
-cuda_writeb,
-cuda_writew,
-cuda_writel,
-};
-
-static CPUReadMemoryFunc * const cuda_read[] = {
-cuda_readb,
-cuda_readw,
-cuda_readl,
+static MemoryRegionOps cuda_ops = {
+.old_mmio = {
+.write = {
+cuda_writeb,
+cuda_writew,
+cuda_writel,
+},
+.read = {
+cuda_readb,
+cuda_readw,
+cuda_readl,
+},
+},
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static bool cuda_timer_exist(void *opaque, int version_id)
@@ -740,8 +744,8 @@ void cuda_init (MemoryRegion **cuda_mem, qemu_irq irq)
 s->tick_offset = (uint32_t)mktimegm(&tm) + RTC_OFFSET;
 
 s->adb_poll_timer = qemu_new_timer_ns(vm_clock, cuda_adb_poll, s);
-cpu_register_io_memory(cuda_read, cuda_write, s,
- DEVICE_NATIVE_ENDIAN);
+memory_region_init_io(&s->mem, &cuda_ops, s, "cuda", 0x2000);
+
 *cuda_mem = &s->mem;
 vmstate_register(NULL, -1, &vmstate_cuda, s);
 qemu_register_reset(cuda_reset, s);
-- 
1.6.0.2

Re: [Qemu-devel] [PATCH v2] Fix X86 CPU topology in KVM mode

2011-09-08 Thread Jan Kiszka

On 2011-09-08 07:33, bharata@gmail.com wrote:
 From: Bharata B Rao bharata@gmail.com
 
 apic id returned to guest kernel in ebx for cpuid(function=1) depends on
 CPUX86State->cpuid_apic_id which gets populated after the cpuid information
 is cached in the host kernel. This results in broken CPU topology in guest.
 
 Fix this by setting cpuid_apic_id before cpuid information is passed to
 the host kernel. This is done by moving the setting of cpuid_apic_id
 to cpu_x86_init() where it will work for both KVM as well as TCG modes.
 
 Signed-off-by: Bharata B Rao bharata@gmail.com
 ---
 This is the next post of the fix that addresses Jan's comment about
 bringing back (smp_cpus > 1) check.
 
 The previous version was posted here:
 http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00892.html
 
 I couldn't boot a 486 kernel successfully with qemu and hence not
 sure if and how this fix breaks i486. Any help from Jan or others
 who might have easy means to boot 486 would be good.

At least it preserves the current logic, just moves it up in the
initialization path.

 
  hw/pc.c  |1 -
  target-i386/helper.c |5 +
  2 files changed, 5 insertions(+), 1 deletions(-)
 
 diff --git a/hw/pc.c b/hw/pc.c
 index 5bc845a..f9cca4d 100644
 --- a/hw/pc.c
 +++ b/hw/pc.c
 @@ -933,7 +933,6 @@ static CPUState *pc_new_cpu(const char *cpu_model)
  exit(1);
  }
  if ((env->cpuid_features & CPUID_APIC) || smp_cpus > 1) {
 -env->cpuid_apic_id = env->cpu_index;
  env->apic_state = apic_init(env, env->cpuid_apic_id);
  }
  qemu_register_reset(pc_cpu_reset, env);
 diff --git a/target-i386/helper.c b/target-i386/helper.c
 index 5df40d4..139a193 100644
 --- a/target-i386/helper.c
 +++ b/target-i386/helper.c
 @@ -1256,6 +1256,11 @@ CPUX86State *cpu_x86_init(const char *cpu_model)
  cpu_x86_close(env);
  return NULL;
  }
 +
 +if ((env->cpuid_features & CPUID_APIC) || smp_cpus > 1) {
 +env->cpuid_apic_id = env->cpu_index;
 +}
 +
  mce_init(env);
  
  qemu_init_vcpu(env);

Tested-and-acked-by: Jan Kiszka jan.kis...@siemens.com

Just in time, we happen to hit this bug today too. It confused libvirt
in the guest quite a bit...

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
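
For context on what the guest actually reads: CPUID leaf 1 reports the
initial APIC ID in bits 31..24 of EBX. A minimal sketch of packing and
unpacking that field follows; the helper names are hypothetical and this is
not the QEMU code that builds the cpuid tables.

```c
#include <stdint.h>

/* Extract the initial APIC ID from CPUID.1:EBX (bits 31..24). */
static uint8_t cpuid1_ebx_apic_id(uint32_t ebx)
{
    return (uint8_t)(ebx >> 24);
}

/* Return ebx with its APIC ID field replaced by apic_id. */
static uint32_t cpuid1_ebx_set_apic_id(uint32_t ebx, uint8_t apic_id)
{
    return (ebx & 0x00ffffffu) | ((uint32_t)apic_id << 24);
}
```

If the field is filled in only after the table has been handed to the host
kernel, every vcpu reports the same stale value, which matches the broken
topology described in the patch.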

[Qemu-devel] unable to access the serial port on the Vm

2011-09-08 Thread bala suru

Hi,
I'm running a VM on the QEMU hypervisor.
I tried to access /dev/ttyS0 from the VM, but I can't access it;
it shows an input/output error.

That is, when I run $ cat /dev/ttyS0 it gives an input/output error.
What may be the problem? Please help.

Re: [Qemu-devel] [PATCH] KVM: emulate lapic tsc deadline timer for hvm

2011-09-08 Thread Liu, Jinsong

 --- a/arch/x86/include/asm/msr-index.h
 +++ b/arch/x86/include/asm/msr-index.h
 @@ -229,6 +229,8 @@
  #define MSR_IA32_APICBASE_ENABLE   (1<<11)
  #define MSR_IA32_APICBASE_BASE     (0xfffff<<12)
 
 +#define MSR_IA32_TSCDEADLINE   0x06e0
 +
  #define MSR_IA32_UCODE_WRITE   0x0079
  #define MSR_IA32_UCODE_REV 0x008b
 
 Need to add to msrs_to_save so live migration works.
 
 MSR must be explicitly listed in qemu, also.
 

Marcelo, it seems the MSR doesn't need to be explicitly listed in qemu?
Adding MSR_IA32_TSCDEADLINE to msrs_to_save on the KVM side is enough. Qemu
will get it through KVM_GET_MSR_INDEX_LIST.
Am I missing something?

Thanks,
Jinsong

[Qemu-devel] [PATCH] ahci: Remove unused struct member

2011-09-08 Thread Stefan Weil

Member variable is_read is written, but never read
(contrary to its name). Remove it.

Kevin Wolf kw...@redhat.com
Signed-off-by: Stefan Weil w...@mail.berlios.de
---
 hw/ide/ahci.c |2 --
 hw/ide/ahci.h |1 -
 2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index f4fa154..a8659cf 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -754,7 +754,6 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 case READ_FPDMA_QUEUED:
 DPRINTF(port, "NCQ reading %d sectors from LBA %ld, tag %d\n",
 ncq_tfs->sector_count-1, ncq_tfs->lba, ncq_tfs->tag);
-ncq_tfs->is_read = 1;
 
 DPRINTF(port, "tag %d aio read %ld\n", ncq_tfs->tag, ncq_tfs->lba);
 
@@ -768,7 +767,6 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 case WRITE_FPDMA_QUEUED:
 DPRINTF(port, "NCQ writing %d sectors to LBA %ld, tag %d\n",
 ncq_tfs->sector_count-1, ncq_tfs->lba, ncq_tfs->tag);
-ncq_tfs->is_read = 0;
 
 DPRINTF(port, "tag %d aio write %ld\n", ncq_tfs->tag, ncq_tfs->lba);
 
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index 3c29d93..5de986c 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -259,7 +259,6 @@ typedef struct NCQTransferState {
 BlockDriverAIOCB *aiocb;
 QEMUSGList sglist;
 BlockAcctCookie acct;
-int is_read;
 uint16_t sector_count;
 uint64_t lba;
 uint8_t tag;
-- 
1.7.2.5

Re: [Qemu-devel] [RFC PATCH 4/5] VFIO: Add PCI device support

2011-09-08 Thread Alex Williamson

On Thu, 2011-09-08 at 10:52 +0300, Avi Kivity wrote:
> On 09/07/2011 09:55 PM, Konrad Rzeszutek Wilk wrote:
> > >  If you don't know what to do here, say N.
> > > +
> > > +menuconfig VFIO_PCI
> > > +bool "VFIO support for PCI devices"
> > > +depends on VFIO && PCI
> > > +default y if X86
> >
> > Hahah.. And Linus is going to tear your behind for that.
> >
> > Default should be 'n'
>
> It depends on VFIO, which presumably defaults to n.

Yes, exactly.

Re: [Qemu-devel] [RFC PATCH 0/5] VFIO-NG group/device/iommu framework

2011-09-08 Thread Alex Williamson

On Wed, 2011-09-07 at 13:58 +0200, Alexander Graf wrote:
 On 01.09.2011, at 21:50, Alex Williamson wrote:
 
  Trying to move beyond talking about how VFIO should work to
  re-writing the code.  This is pre-alpha, known broken, will
  probably crash your system but it illustrates some of how
  I see groups, devices, and iommus interacting.  This is just
  the framework, no code to actually support user space drivers
  or device assignment yet.
  
  The iommu portions are still using the FIXME PCI specific
  hooks.  Once Joerg gets some buy-in on his bus specific iommu
  patches, we can move to that.
  
  The group management is more complicated than I'd like and
  you can get groups into a bad state by killing the test program
  with devices/iommus open.  The locking is overly simplistic.
  But, it's a start.  Please make constructive comments and
  suggestions.  Patches based on v3.0.  Thanks,
 
 Looks pretty reasonable to me so far, but I guess we only know for sure once 
 we have non-PCI implemented and working with this scheme as well.
 Btw I couldn't find the PCI BAR regions mmaps and general config space 
 exposure. Where has that gone?

I ripped it out for now just to work on the group/device/iommu
framework.  I didn't see a need to make a functional RFC just to get
some buy-in on the framework.  Thanks,

Alex

[Qemu-devel] [PATCH] ARM7TDMI: Enable ARMv4T features

2011-09-08 Thread Marek Vasut

Signed-off-by: Marek Vasut marek.va...@gmail.com
---
 target-arm/helper.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 58cd99f..2f3e937 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -53,6 +53,7 @@ static void cpu_reset_model_id(CPUARMState *env, uint32_t id)
 env-cp15.c0_cpuid = id;
 switch (id) {
 case ARM_CPUID_ARM7TDMI:
+set_feature(env, ARM_FEATURE_V4T);
 // set_feature(env, ARM_FEATURE_ABORT_BU);
// set_feature(env, ARM_FEATURE_NO_CP15);
 break;
-- 
1.7.5.4

[Qemu-devel] [Bug 824650] Re: Latest GIT assert error in arp_table.c

2011-09-08 Thread Nigel Horne

No - that's not relevant.  The latest git
(07ff2c4475df77e38a31d50ee7f3932631806c15) still crashes after just a
couple of minutes with just about any guest on a Linux host.

These are the args for my FreeBSD guest:

qemu-system-i386 -drive
file=freebsd8.1-i386,index=0,media=disk,cache=unsafe -drive
file=/dev/cdrom,index=1,media=cdrom -boot c -enable-kvm -m 128

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/824650

Title:
  Latest GIT assert error in arp_table.c

Status in QEMU:
  New

Bug description:
  The latest git version of qemu (commit
  8cc7c3952d4d0a681d8d4c3ac89a206a5bfd7f00) crashes after a few minutes.
  All was fine up to a few days ago.  This is with both x86 and sparc
  emulation, on an x86_64 host.

  e.g. qemu-system-sparc -drive
  file=netbsd5.0.2-sparc,index=0,media=disk,cache=unsafe -m 256 -boot c
  -nographic -redir tcp:2232::22:

   qemu-system-sparc: slirp/arp_table.c:75: arp_table_search: Assertion
  `(ip_addr  (__extension__ ({ register unsigned int __v, __x = (~(0xf
   28)); if (__builtin_constant_p (__x)) __v = __x)  0xff00)
   24) | (((__x)  0x00ff)  8) | (((__x)  0xff00)  8) |
  (((__x)  0x00ff)  24)); else __asm__ (bswap %0 : =r (__v) :
  0 (__x)); __v; }))) != 0' failed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/824650/+subscriptions

[Qemu-devel] [PATCH] SPARC: Trivial patch to clean up npc monitor output

2011-09-08 Thread Nathan Kunkee



 This patch fixes the spacing of the PC and NPC output from 'info cpus' 
for SPARC.


Signed-off-by: Nathan Kunkee nkunke...@hotmail.com

diff --git a/monitor.c b/monitor.c
index 1b8ba2c..16cd4c5 100644
--- a/monitor.c
+++ b/monitor.c
@@ -884,9 +884,9 @@ static void print_cpu_iter(QObject *obj, void *opaque)
 monitor_printf(mon, "nip=0x" TARGET_FMT_lx,
 (target_long) qdict_get_int(cpu, "nip"));
 #elif defined(TARGET_SPARC)
-monitor_printf(mon, "pc=0x " TARGET_FMT_lx,
+monitor_printf(mon, "pc=0x" TARGET_FMT_lx,
 (target_long) qdict_get_int(cpu, "pc"));
-monitor_printf(mon, "npc=0x" TARGET_FMT_lx,
+monitor_printf(mon, " npc=0x" TARGET_FMT_lx,
 (target_long) qdict_get_int(cpu, "npc"));
 #elif defined(TARGET_MIPS)
 monitor_printf(mon, "PC=0x" TARGET_FMT_lx,

Re: [Qemu-devel] [PATCH 0/2] improve qemu-img conversion performance

2011-09-08 Thread Sage Weil

On Thu, 8 Sep 2011, Stefan Hajnoczi wrote:
> On Wed, Sep 07, 2011 at 04:06:51PM -0700, Yehuda Sadeh wrote:
> > The following set of patches improve the qemu-img conversion process
> > performance. When using a higher latency backend, small writes have a
> > severe impact on the time it takes to do image conversion.
> > We switch to using async writes, and we avoid splitting writes due to
> > holes when the holes are small enough.
> >
> > Yehuda Sadeh (2):
> >   qemu-img: async write to block device when converting image
> >   qemu-img: don't skip writing small holes
> >
> >  qemu-img.c |   34 +++---
> >  1 files changed, 27 insertions(+), 7 deletions(-)
> >
> > -- 
> > 2.7.5.1
>
> This has nothing to do with the patch itself, but I've been curious
> about the existence of both a QEMU and a Linux kernel rbd block driver.
>
> The I/O latency with qemu-img has been an issue for rbd users.  But they
> have the option of using the Linux kernel rbd block driver, where
> qemu-img can take advantage of the page cache instead of performing
> direct I/O.
>
> Does this mean you intend to support both QEMU block/rbd.c and Linux
> drivers/block/rbd.c?  As a user I would go with the Linux kernel driver
> instead of the QEMU block driver because it offers page cache and host
> block device features.  On the other hand a userspace driver is nice
> because it does not require privileges.

We intend to support both drivers, yes.  The native qemu driver is 
generally more convenient because there is no kernel dependency, so we 
want to make qemu-img perform reasonably one way or another.

There are plans to implement some limited buffering (and flush) in librbd 
to make the device behave a bit more like a disk with a cache.  That will 
mask the sync write latency, but I suspect that doing these writes using 
the aio interface (and ignoring small holes) will help everyone...

sage
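
As a rough illustration of the "ignore small holes" idea from the quoted
cover letter, the sketch below decides how many sectors one write should
cover, absorbing runs of zero sectors shorter than a threshold. It is not
the qemu-img code: write_run_length, sector_is_zero and the min_hole
parameter are hypothetical, and the sector size of 512 is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* True if every byte of the sector is zero. */
static bool sector_is_zero(const uint8_t *sector)
{
    for (size_t i = 0; i < SECTOR_SIZE; i++) {
        if (sector[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Number of sectors, counted from the start of buf, that a single write
 * should cover. Zero runs shorter than min_hole sectors are written along
 * with the surrounding data; a longer run, or one that reaches the end of
 * the buffer, ends the write just after the last data sector. */
static size_t write_run_length(const uint8_t *buf, size_t n, size_t min_hole)
{
    size_t end = 0;   /* one past the last data sector included so far */
    size_t i = 0;

    while (i < n) {
        if (!sector_is_zero(buf + i * SECTOR_SIZE)) {
            end = ++i;
            continue;
        }
        size_t hole = 0;
        while (i + hole < n &&
               sector_is_zero(buf + (i + hole) * SECTOR_SIZE)) {
            hole++;
        }
        if (hole >= min_hole || i + hole == n) {
            break;    /* large or trailing hole: stop the write here */
        }
        i += hole;    /* small hole: absorb it into the write */
    }
    return end;
}
```

With a high-latency backend, a larger min_hole trades a few extra zero
sectors on the wire for fewer round trips.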

Re: [Qemu-devel] [PATCH 09/12] sheepdog: move coroutine send/recv function to generic code

2011-09-08 Thread MORITA Kazutaka

At Thu,  8 Sep 2011 17:25:02 +0200,
Paolo Bonzini wrote:
 
 Outside coroutines, avoid busy waiting on EAGAIN by temporarily
 making the socket blocking.
 
 The API of qemu_recvv/qemu_sendv is slightly different from
 do_readv/do_writev because they do not handle coroutines.  It
 returns the number of bytes written before encountering an
 EAGAIN.  The specificity of yielding on EAGAIN is entirely in
 qemu-coroutine.c.
 
 Cc: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  block/sheepdog.c |  221 +
  cutils.c |  108 ++
  qemu-common.h|3 +
  qemu-coroutine.c |   71 +
  qemu-coroutine.h |   26 +++
  5 files changed, 229 insertions(+), 200 deletions(-)
 
 diff --git a/block/sheepdog.c b/block/sheepdog.c
 index af696a5..188a8d8 100644
 --- a/block/sheepdog.c
 +++ b/block/sheepdog.c
 @@ -443,129 +443,6 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState 
 *bs, QEMUIOVector *qiov,
  return acb;
  }
  
 -#ifdef _WIN32
 -
 -struct msghdr {
 -struct iovec *msg_iov;
 -size_tmsg_iovlen;
 -};
 -
 -static ssize_t sendmsg(int s, const struct msghdr *msg, int flags)
 -{
 -size_t size = 0;
 -char *buf, *p;
 -int i, ret;
 -
 -/* count the msg size */
 -for (i = 0; i < msg->msg_iovlen; i++) {
 -size += msg->msg_iov[i].iov_len;
 -}
 -buf = g_malloc(size);
 -
 -p = buf;
 -for (i = 0; i < msg->msg_iovlen; i++) {
 -memcpy(p, msg->msg_iov[i].iov_base, msg->msg_iov[i].iov_len);
 -p += msg->msg_iov[i].iov_len;
 -}
 -
 -ret = send(s, buf, size, flags);
 -
 -g_free(buf);
 -return ret;
 -}
 -
 -static ssize_t recvmsg(int s, struct msghdr *msg, int flags)
 -{
 -size_t size = 0;
 -char *buf, *p;
 -int i, ret;
 -
 -/* count the msg size */
 -for (i = 0; i < msg->msg_iovlen; i++) {
 -size += msg->msg_iov[i].iov_len;
 -}
 -buf = g_malloc(size);
 -
 -ret = qemu_recv(s, buf, size, flags);
 -if (ret < 0) {
 -goto out;
 -}
 -
 -p = buf;
 -for (i = 0; i < msg->msg_iovlen; i++) {
 -memcpy(msg->msg_iov[i].iov_base, p, msg->msg_iov[i].iov_len);
 -p += msg->msg_iov[i].iov_len;
 -}
 -out:
 -g_free(buf);
 -return ret;
 -}
 -
 -#endif
 -
 -/*
 - * Send/recv data with iovec buffers
 - *
 - * This function send/recv data from/to the iovec buffer directly.
 - * The first `offset' bytes in the iovec buffer are skipped and next
 - * `len' bytes are used.
 - *
 - * For example,
 - *
 - *   do_send_recv(sockfd, iov, len, offset, 1);
 - *
 - * is equals to
 - *
 - *   char *buf = malloc(size);
 - *   iov_to_buf(iov, iovcnt, buf, offset, size);
 - *   send(sockfd, buf, size, 0);
 - *   free(buf);
 - */
 -static int do_send_recv(int sockfd, struct iovec *iov, int len, int offset,
 -int write)
 -{
 -struct msghdr msg;
 -int ret, diff;
 -
 -memset(&msg, 0, sizeof(msg));
 -msg.msg_iov = iov;
 -msg.msg_iovlen = 1;
 -
 -len += offset;
 -
 -while (iov->iov_len < len) {
 -len -= iov->iov_len;
 -
 -iov++;
 -msg.msg_iovlen++;
 -}
 -
 -diff = iov->iov_len - len;
 -iov->iov_len -= diff;
 -
 -while (msg.msg_iov->iov_len <= offset) {
 -offset -= msg.msg_iov->iov_len;
 -
 -msg.msg_iov++;
 -msg.msg_iovlen--;
 -}
 -
 -msg.msg_iov->iov_base = (char *) msg.msg_iov->iov_base + offset;
 -msg.msg_iov->iov_len -= offset;
 -
 -if (write) {
 -ret = sendmsg(sockfd, &msg, 0);
 -} else {
 -ret = recvmsg(sockfd, &msg, 0);
 -}
 -
 -msg.msg_iov->iov_base = (char *) msg.msg_iov->iov_base - offset;
 -msg.msg_iov->iov_len += offset;
 -
 -iov->iov_len += diff;
 -return ret;
 -}
 -
  static int connect_to_sdog(const char *addr, const char *port)
  {
  char hbuf[NI_MAXHOST], sbuf[NI_MAXSERV];
 @@ -618,65 +495,6 @@ success:
  return fd;
  }
  
 -static int do_readv_writev(int sockfd, struct iovec *iov, int len,
 -   int iov_offset, int write)
 -{
 -int ret;
 -again:
 -ret = do_send_recv(sockfd, iov, len, iov_offset, write);
 -if (ret < 0) {
 -if (errno == EINTR) {
 -goto again;
 -}
 -if (errno == EAGAIN) {
 -if (qemu_in_coroutine()) {
 -qemu_coroutine_yield();
 -}
 -goto again;
 -}
 -error_report("failed to recv a rsp, %s", strerror(errno));
 -return 1;
 -}
 -
 -iov_offset += ret;
 -len -= ret;
 -if (len) {
 -goto again;
 -}
 -
 -return 0;
 -}
 -
 -static int do_readv(int sockfd, struct iovec *iov, int len, int iov_offset)
 -{
 -return do_readv_writev(sockfd, iov, len, iov_offset, 0);
 -}
 -
 -static int do_writev(int sockfd, struct iovec *iov, int len, int iov_offset)
 -{
 -return do_readv_writev(sockfd, iov, len,

Re: [Qemu-devel] [PATCH] pci: Remove unused pci_reserve_capability

2011-09-08 Thread Michael S. Tsirkin

On Thu, Sep 08, 2011 at 12:44:47PM +0200, Jan Kiszka wrote:
 eepro100 was the last user. Now pci_add_capability is powerful enough.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Applied, thanks.

 ---
  hw/pci.c |6 --
  hw/pci.h |2 --
  2 files changed, 0 insertions(+), 8 deletions(-)
 
 diff --git a/hw/pci.c b/hw/pci.c
 index 57ff7b1..63c346d 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -2028,12 +2028,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t 
 cap_id, uint8_t size)
  pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
  }
  
 -/* Reserve space for capability at a known offset (to call after load). */
 -void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
 -{
 -memset(pdev->used + offset, 0xff, size);
 -}
 -
  uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
  {
  return pci_find_capability_list(pdev, cap_id, NULL);
 diff --git a/hw/pci.h b/hw/pci.h
 index 391217e..f2dae63 100644
 --- a/hw/pci.h
 +++ b/hw/pci.h
 @@ -209,8 +209,6 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
  
  void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t 
 cap_size);
  
 -void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t 
 size);
 -
  uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
  
  
 -- 
 1.7.3.4

[Qemu-devel] [PATCH] support add-cow file format

2011-09-08 Thread Dong Xu Wang


The raw file format does not support backing files or copy-on-write, so this
patch adds an add-cow format that provides the backing_file option for raw
images. The dirty bitmap is stored in a separate add-cow file. In use, it
looks like this:
qemu-img create -f add-cow -o backing_file=ubuntu.img,image_file=test.img test.add-cow
qemu -drive if=virtio,file=test.add-cow -m 1024

(test.img is a raw format file; test.add-cow stores the bitmap)

Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
---
 Makefile.objs   |1 +
 block.c |   83 ++-
 block.h |2 +
 block/add-cow.c |  456 +++
 block_int.h |6 +
 qemu-img.c  |   10 ++
 6 files changed, 555 insertions(+), 3 deletions(-)
 create mode 100644 block/add-cow.c

diff --git a/Makefile.objs b/Makefile.objs
index 26b885b..1402f9f 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -31,6 +31,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 
 block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o 
vvfat.o
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
qcow2-cache.o
+block-nested-y += add-cow.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
diff --git a/block.c b/block.c
index a8c789a..c797cfc 100644
--- a/block.c
+++ b/block.c
@@ -369,7 +369,7 @@ static int find_image_format(const char *filename, 
BlockDriver **pdrv)
 {
 int ret, score, score_max;
 BlockDriver *drv1, *drv;
-uint8_t buf[2048];
+uint8_t buf[4096];
 BlockDriverState *bs;
 
 ret = bdrv_file_open(&bs, filename, 0);
@@ -657,6 +657,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, 
int flags,
 int back_flags;
 BlockDriver *back_drv = NULL;
 
+char imaging_filename[PATH_MAX];
+int cow_flags;
+BlockDriver *cow_drv = NULL;
+
 bs->backing_hd = bdrv_new("");
 
 if (path_has_protocol(bs->backing_file)) {
@@ -686,6 +690,30 @@ int bdrv_open(BlockDriverState *bs, const char *filename, 
int flags,
 /* base image inherits from parent */
 bs->backing_hd->keep_read_only = bs->keep_read_only;
 }
+
+/* If there is a image_file, must be together with backing_file */
+if (bs->image_file[0] != '\0') {
+bs->image_hd = bdrv_new("");
+if (path_has_protocol(bs->image_file)) {
+pstrcpy(imaging_filename, sizeof(imaging_filename),
+bs->image_file);
+} else {
+path_combine(imaging_filename, sizeof(imaging_filename),
+ filename, bs->image_file);
+}
+
+cow_drv = bdrv_find_format("add-cow");
+
+cow_flags =
+ (flags & (~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING))) | 
BDRV_O_RDWR;
+bs->image_hd->keep_read_only = 0;
+
+ret = bdrv_open(bs->image_hd, imaging_filename, cow_flags, 
back_drv);
+if (ret < 0) {
+bdrv_close(bs);
+return ret;
+}
+}
 }
 
 if (!bdrv_key_required(bs)) {
@@ -711,6 +739,10 @@ void bdrv_close(BlockDriverState *bs)
 bdrv_delete(bs->backing_hd);
 bs->backing_hd = NULL;
 }
+if (bs->image_hd) {
+bdrv_delete(bs->image_hd);
+bs->image_hd = NULL;
+}
 bs->drv->bdrv_close(bs);
 g_free(bs->opaque);
 #ifdef _WIN32
@@ -851,7 +883,7 @@ int bdrv_commit(BlockDriverState *bs)
 
 if (!drv)
 return -ENOMEDIUM;
-
+
 if (!bs->backing_hd) {
 return -ENOTSUP;
 }
@@ -2024,6 +2056,16 @@ void bdrv_get_backing_filename(BlockDriverState *bs,
 }
 }
 
+void bdrv_get_image_filename(BlockDriverState *bs,
+   char *filename, int filename_size)
+{
+if (!bs->image_file) {
+pstrcpy(filename, filename_size, "");
+} else {
+pstrcpy(filename, filename_size, bs->image_file);
+}
+}
+
 int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
   const uint8_t *buf, int nb_sectors)
 {
@@ -3201,8 +3243,10 @@ int bdrv_img_create(const char *filename, const char 
*fmt,
 QEMUOptionParameter *param = NULL, *create_options = NULL;
 QEMUOptionParameter *backing_fmt, *backing_file, *size;
 BlockDriverState *bs = NULL;
-BlockDriver *drv, *proto_drv;
+BlockDriver *drv, *proto_drv, *cow_drv;;
 BlockDriver *backing_drv = NULL;
+QEMUOptionParameter *cow_create_options = NULL;
+QEMUOptionParameter *image_file;
 int ret = 0;
 
 /* Find driver and parse its options */
@@ -3225,10 +3269,16 @@ int bdrv_img_create(const char *filename, const char 
*fmt,
 create_options = append_option_parameters(create_options,
   proto_drv->create_options);
 
+/* Just support raw format now*/
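
The patch text is cut off here, and block/add-cow.c is not part of this excerpt. As a rough illustration of the idea in the description (a raw image plus a separate dirty bitmap in front of a backing file), the following is a minimal in-memory sketch. Every name in it (cow_sketch, cow_read, cow_write, the per-sector granularity, the fixed sizes) is hypothetical and is not taken from the patch or from QEMU; the real on-disk layout may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NB_SECTORS  8

/* Hypothetical in-memory stand-ins for the three files involved. */
struct cow_sketch {
    uint8_t backing[NB_SECTORS][SECTOR_SIZE]; /* read-only base (like ubuntu.img) */
    uint8_t image[NB_SECTORS][SECTOR_SIZE];   /* raw image taking writes (like test.img) */
    uint8_t bitmap[(NB_SECTORS + 7) / 8];     /* one dirty bit per sector (like test.add-cow) */
};

/* Return 1 if the sector has been written since the image was created. */
static int sector_is_dirty(const struct cow_sketch *s, unsigned sector)
{
    return (s->bitmap[sector / 8] >> (sector % 8)) & 1;
}

/* Write one whole sector to the raw image and mark it dirty. */
void cow_write(struct cow_sketch *s, unsigned sector, const uint8_t *buf)
{
    assert(sector < NB_SECTORS);
    memcpy(s->image[sector], buf, SECTOR_SIZE);
    s->bitmap[sector / 8] |= (uint8_t)(1u << (sector % 8));
}

/* Read one sector: dirty sectors come from the raw image,
 * clean sectors fall through to the backing file. */
void cow_read(const struct cow_sketch *s, unsigned sector, uint8_t *buf)
{
    assert(sector < NB_SECTORS);
    const uint8_t *src = sector_is_dirty(s, sector) ? s->image[sector]
                                                    : s->backing[sector];
    memcpy(buf, src, SECTOR_SIZE);
}
```

The sketch only handles whole-sector writes, so no data has to be copied from the backing file before a write; a coarser bitmap granularity would need that extra copy step.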

[Qemu-devel] [PATCH] tcg/ppc64: Fix zero extension code generation bug for ppc64 host

2011-09-08 Thread David Gibson

From: Thomas Huth th...@de.ibm.com

The ppc64 code generation backend uses an rldicr (Rotate Left Double
Immediate and Clear Right) instruction to implement zero extension of
a 32 bit quantity to a 64 bit quantity (INDEX_op_ext32u_i64).  However
this is wrong - this instruction clears specified low bits of the
value, instead of high bits as we require for a zero extension.  It
should instead use an rldicl (Rotate Left Double Immediate and Clear
Left) instruction.

Presumably amongst other things, this causes the SLOF firmware image
used with -M pseries to not boot on a ppc64 host.

It appears this bug was exposed by commit
0bf1dbdcc935dfc220a93cd990e947e90706aec6 (tcg/ppc64: fix 16/32 mixup)
which enabled the use of the op_ext32u_i64 operation on the ppc64
backend.

Signed-off-by: Thomas Huth th...@de.ibm.com
Signed-off-by: David Gibson da...@gibson.dropbear.id.au
---
 tcg/ppc64/tcg-target.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index d831684..e3c63ad 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -1560,7 +1560,7 @@ static void tcg_out_op (TCGContext *s, TCGOpcode opc, 
const TCGArg *args,
 break;
 
 case INDEX_op_ext32u_i64:
-tcg_out_rld (s, RLDICR, args[0], args[1], 0, 32);
+tcg_out_rld (s, RLDICL, args[0], args[1], 0, 32);
 break;
 
 case INDEX_op_setcond_i32:
-- 
1.7.5.4
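
The difference between the two instructions named in the commit message can be modelled in portable C. This is only a sketch of the documented mask behaviour (rldicl keeps IBM bits mb..63, rldicr keeps IBM bits 0..me, with bit 0 the most significant); the helper names below are illustrative and are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by sh bits. */
static uint64_t rotl64(uint64_t v, unsigned sh)
{
    sh &= 63;
    return sh ? (v << sh) | (v >> (64 - sh)) : v;
}

/* Model of rldicl: rotate left, then clear the mb most significant bits. */
uint64_t rldicl(uint64_t rs, unsigned sh, unsigned mb)
{
    assert(mb < 64);
    return rotl64(rs, sh) & (UINT64_MAX >> mb);
}

/* Model of rldicr: rotate left, then clear the (63 - me) least significant bits. */
uint64_t rldicr(uint64_t rs, unsigned sh, unsigned me)
{
    assert(me < 64);
    return rotl64(rs, sh) & (UINT64_MAX << (63 - me));
}
```

With sh = 0 and a mask argument of 32, as in the patched call, rldicl(0xFFFFFFFF80000001, 0, 32) yields 0x0000000080000001, which is the zero-extended low word. The same arguments passed to rldicr yield 0xFFFFFFFF80000000, which keeps the high bits and drops the low ones.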
