date:20120718

Re: [Qemu-devel] [PATCH] Fix for qemu crash on assertion error when adding PCI passthru device.

2012-07-18 Thread Jan Kiszka

On 2012-07-18 22:42, Ma, Stephen B. wrote:
> Sorry for taking so long to reply.  I am new to this.  Should this patch be 
> committed or just dropped?

This bug was fixed by 266ca11a0433643a3cc3146a9837d9f2b0bfbe3b in the
meantime.

Jan

> 
> 
> -----Original Message-----
> From: Jan Kiszka [mailto:jan.kis...@web.de] 
> Sent: Sunday, June 17, 2012 11:25 PM
> To: Anthony Liguori
> Cc: Michael S. Tsirkin; 'qemu-devel@nongnu.org'; Ma, Stephen B.
> Subject: Re: [PATCH] Fix for qemu crash on assertion error when adding PCI 
> passthru device.
> 
> On 2012-06-17 16:28, Anthony Liguori wrote:
>> On 06/17/2012 03:34 AM, Michael S. Tsirkin wrote:
>>> On Sun, Jun 17, 2012 at 06:26:33AM +, Ma, Stephen B. wrote:

>>>> Michael,
>>>>
>>>> Thanks for the review.  I added the unparent to the qdev_free.
>>>>
>>>>
>>>> ---
>>>>   hw/qdev.c |1 +
>>>>   1 files changed, 1 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/hw/qdev.c b/hw/qdev.c
>>>> index d2dc28b..ed1328d 100644
>>>> --- a/hw/qdev.c
>>>> +++ b/hw/qdev.c
>>>> @@ -264,6 +264,7 @@ void qdev_init_nofail(DeviceState *dev)
>>>>   /* Unlink device from bus and free the structure.  */
>>>>   void qdev_free(DeviceState *dev)
>>>>   {
>>>> +    object_unparent(OBJECT(dev));
>>>>       object_delete(OBJECT(dev));
>>>>   }
>>>>
>>>> --
>>>> 1.7.1
>>>
>>> Anthony, any feedback?
>>
>> Yes, this is wrong.
>>
>> PCI passthrough isn't in qemu.git so it's not clear to me where this 
>> is happening.  Why would qdev_free be called when adding a PCI 
>> passthru device?
> 
> The bug is reproducible with any in-tree device (at least PCI) that happens 
> to return != 0 from its init handler.
> 
> Jan
> 
> 
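
The failure mode discussed in this thread can be illustrated with a toy model. The sketch below is not QEMU code: ToyObject and every toy_* function are hypothetical names, and the invariant "an object may only be deleted once it has no parent and only one reference left" is an assumption about what the assertion checks, since the thread does not quote the failing assertion itself. It only shows why an error path that frees a device still linked to its parent would trip such a check, and why unparenting first avoids it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted object; NOT QEMU's Object type. */
typedef struct ToyObject {
    struct ToyObject *parent;   /* non-NULL while linked into a tree */
    int refcount;               /* 1 for the creator, +1 for a parent link */
} ToyObject;

ToyObject *toy_new(void)
{
    ToyObject *obj = calloc(1, sizeof(*obj));
    assert(obj != NULL);
    obj->refcount = 1;
    return obj;
}

/* Linking a child makes the parent link hold an extra reference. */
void toy_add_child(ToyObject *parent, ToyObject *child)
{
    assert(child->parent == NULL);
    child->parent = parent;
    child->refcount++;
}

/* Drop the parent link and the reference it held (no-op if unlinked). */
void toy_unparent(ToyObject *obj)
{
    if (obj->parent != NULL) {
        obj->parent = NULL;
        obj->refcount--;
    }
}

/* Assumed invariant: deletion is legal only for an unlinked object
 * whose sole remaining reference belongs to the caller. */
bool toy_can_delete(const ToyObject *obj)
{
    return obj->parent == NULL && obj->refcount == 1;
}

void toy_delete(ToyObject *obj)
{
    assert(toy_can_delete(obj));   /* models the crash on assertion */
    free(obj);
}

/* Error-path helper in the spirit of the proposed patch:
 * unlink first, then delete. */
void toy_free_device(ToyObject *dev)
{
    toy_unparent(dev);
    toy_delete(dev);
}
```

In this model, calling toy_delete() directly on a device that was attached with toy_add_child() aborts, while toy_free_device() succeeds. Whether the unparent call belongs in qdev_free() or elsewhere is exactly what the reviewers dispute above; the model takes no position on that.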




Re: [Qemu-devel] [help]: how to use HMP command dump-guest-memory

2012-07-18 Thread Wen Congyang

At 07/19/2012 12:47 AM, Sheldon Wrote:
> I want to dump all of the guest's memory to the file ./guestcore.
> I execute this command as follows:
> (qemu) dump-guest-memory -p protocol file:./guestcore
> invalid char in expression

Please try this command:
dump-guest-memory -p file:./guestcore

Thanks
Wen Congyang


[Qemu-devel] [PATCH] vfio-powerpc: added VFIO support (v5)

2012-07-18 Thread Alexey Kardashevskiy

It literally does the following:

1. POWERPC IOMMU support (the kernel counterpart is required)

2. The patch assumes that IOAPIC calls are going to be replaced
with something generic.

3. Added sPAPRVFIOData (hw/spapr_iommu_vfio.h) which describes
the interface between VFIO and sPAPR IOMMU.

4. Change sPAPR PHB to scan the PCI bus which is used for
the IOMMU-VFIO group. Now it is enough to add the following to
the QEMU command line to get VFIO up with all the devices from
IOMMU group with id=3:
-device spapr-pci-host-bridge,busname=E1000E,buid=0x3,iommu=3,\
 mem_win_addr=0x2300,io_win_addr=0x2400,msi_win_addr=0x2500

With the patches posted earlier, this patch fully supports
VFIO, including MSI-X.
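
The dispatch that items 1 and 3 describe (look up an IOMMU table by its LIOBN, then forward a TCE update to map/unmap callbacks) can be sketched standalone. This is a simplified model under stated assumptions, not the patch's code: all Toy* and toy_* names and the TOY_* return codes are hypothetical, the callbacks take plain integers instead of struct tce_iommu_dma_map, and a singly linked list replaces QLIST.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes; the real patch uses H_SUCCESS, H_PARAMETER
 * and a positive "try the next backend" value. */
enum { TOY_SUCCESS = 0, TOY_PARAMETER = -4, TOY_NOT_MINE = 1 };

/* Callback interface between the TCE hypercall handler and a backend. */
typedef struct ToyIommuOps {
    int (*map)(void *opaque, uint64_t ioba, uint64_t tce);
    int (*unmap)(void *opaque, uint64_t ioba);
    void *opaque;
} ToyIommuOps;

/* One registered table, identified by its logical I/O bus number. */
typedef struct ToyTable {
    uint32_t liobn;
    ToyIommuOps *ops;
    struct ToyTable *next;
} ToyTable;

ToyTable *toy_tables;   /* head of the registration list */

void toy_register(ToyTable *t)
{
    t->next = toy_tables;
    toy_tables = t;
}

/* A non-zero TCE maps the page, a zero TCE unmaps it. Returns
 * TOY_NOT_MINE when no table matches, so the caller can try
 * another backend. */
int toy_put_tce(uint32_t liobn, uint64_t ioba, uint64_t tce)
{
    for (ToyTable *t = toy_tables; t != NULL; t = t->next) {
        if (t->liobn != liobn) {
            continue;
        }
        int rc = tce ? t->ops->map(t->ops->opaque, ioba, tce)
                     : t->ops->unmap(t->ops->opaque, ioba);
        return rc ? TOY_PARAMETER : TOY_SUCCESS;
    }
    return TOY_NOT_MINE;
}

/* Trivial backend that only counts calls, for demonstration. */
typedef struct ToyCounter {
    int maps;
    int unmaps;
} ToyCounter;

int toy_count_map(void *opaque, uint64_t ioba, uint64_t tce)
{
    (void)ioba;
    (void)tce;
    ((ToyCounter *)opaque)->maps++;
    return 0;
}

int toy_count_unmap(void *opaque, uint64_t ioba)
{
    (void)ioba;
    ((ToyCounter *)opaque)->unmaps++;
    return 0;
}
```

The "not mine" return value is what lets a caller chain backends: try the emulated tables first and fall through to the VFIO tables only when the LIOBN was not found.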

Cc: Alex Williamson 
Cc: David Gibson 
Cc: Benjamin Herrenschmidt 
Cc: Alexander Graf 
Signed-off-by: Alexey Kardashevskiy 
---
 hw/ppc/Makefile.objs   |3 ++
 hw/spapr.h |3 ++
 hw/spapr_iommu.c   |   58 -
 hw/spapr_iommu_vfio.h  |   34 
 hw/spapr_pci.c |  124 ++--
 hw/spapr_pci.h |6 +++
 hw/vfio_pci.c  |   63 ++
 hw/vfio_pci.h  |2 +
 linux-headers/linux/vfio.h |   26 ++
 trace-events   |1 +
 10 files changed, 314 insertions(+), 6 deletions(-)
 create mode 100644 hw/spapr_iommu_vfio.h

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index f573a95..c46a049 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -25,4 +25,7 @@ obj-$(CONFIG_FDT) += ../device_tree.o
 # Xilinx PPC peripherals
 obj-y += xilinx_ethlite.o
 
+# VFIO PCI device assignment
+obj-$(CONFIG_VFIO_PCI) += vfio_pci.o
+
 obj-y := $(addprefix ../,$(obj-y))
diff --git a/hw/spapr.h b/hw/spapr.h
index b37f337..aae6aee 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -340,4 +340,7 @@ int spapr_dma_dt(void *fdt, int node_off, const char 
*propname,
 int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
   DMAContext *dma);
 
+#include "hw/spapr_iommu_vfio.h"
+void spapr_vfio_init_dma(uint32_t liobn, sPAPRVFIOData *data);
+
 #endif /* !defined (__HW_SPAPR_H__) */
diff --git a/hw/spapr_iommu.c b/hw/spapr_iommu.c
index 50c288d..86a37d2 100644
--- a/hw/spapr_iommu.c
+++ b/hw/spapr_iommu.c
@@ -23,6 +23,8 @@
 #include "dma.h"
 
 #include "hw/spapr.h"
+#include "hw/spapr_iommu_vfio.h"
+#include "hw/vfio_pci.h"
 
 #include 
 
@@ -183,6 +185,56 @@ static int put_tce_emu(target_ulong liobn, target_ulong ioba, target_ulong tce)
     return 0;
 }
 
+typedef struct sPAPRVFIOTable {
+    sPAPRVFIOData *data;
+    uint32_t liobn;
+    QLIST_ENTRY(sPAPRVFIOTable) list;
+} sPAPRVFIOTable;
+
+QLIST_HEAD(vfio_tce_tables, sPAPRVFIOTable) vfio_tce_tables;
+
+void spapr_vfio_init_dma(uint32_t liobn, sPAPRVFIOData *data)
+{
+    sPAPRVFIOTable *t;
+
+    t = g_malloc0(sizeof(*t));
+    t->data = data;
+    t->liobn = liobn;
+
+    QLIST_INSERT_HEAD(&vfio_tce_tables, t, list);
+}
+
+static int put_tce_vfio(uint32_t liobn, target_ulong ioba, target_ulong tce)
+{
+    sPAPRVFIOTable *t;
+    struct tce_iommu_dma_map map = {
+        .argsz = sizeof(map),
+        .va = 0,
+        .dmaaddr = ioba,
+    };
+
+    QLIST_FOREACH(t, &vfio_tce_tables, list) {
+        if (t->liobn != liobn) {
+            continue;
+        }
+        if (tce) {
+            map.va = (uintptr_t)qemu_get_ram_ptr(tce & ~SPAPR_TCE_PAGE_MASK);
+
+            if (t->data->map(t->data, &map)) {
+                perror("TCE_MAP_DMA");
+                return H_PARAMETER;
+            }
+        } else {
+            if (t->data->unmap(t->data, &map)) {
+                perror("TCE_UNMAP_DMA");
+                return H_PARAMETER;
+            }
+        }
+        return H_SUCCESS;
+    }
+    return H_CONTINUE; /* positive non-zero value */
+}
+
 static target_ulong h_put_tce(CPUPPCState *env, sPAPREnvironment *spapr,
                               target_ulong opcode, target_ulong *args)
 {
@@ -200,7 +252,11 @@ static target_ulong h_put_tce(CPUPPCState *env, sPAPREnvironment *spapr,
     ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
 
     ret = put_tce_emu(liobn, ioba, tce);
-    if (0 >= ret) {
+    if (ret <= 0) {
+        return ret ? H_PARAMETER : H_SUCCESS;
+    }
+    ret = put_tce_vfio(liobn, ioba, tce);
+    if (ret <= 0) {
         return ret ? H_PARAMETER : H_SUCCESS;
     }
 #ifdef DEBUG_TCE
diff --git a/hw/spapr_iommu_vfio.h b/hw/spapr_iommu_vfio.h
new file mode 100644
index 000..cc2d368
--- /dev/null
+++ b/hw/spapr_iommu_vfio.h
@@ -0,0 +1,34 @@
+/*
+ * Definitions for VFIO IOMMU implementation for SPAPR TCE.
+ *
+ * Copyright (c) 2012 Alexey Kardashevskiy 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any late

Re: [Qemu-devel] [PATCH] vfio-powerpc: added VFIO support (v4)

2012-07-18 Thread Alexey Kardashevskiy

On 19/07/12 00:14, Alex Williamson wrote:
> On Wed, 2012-07-18 at 21:09 +1000, Alexey Kardashevskiy wrote:
>> It literally does the following:
>>
>> 1. POWERPC IOMMU support (the kernel counterpart is required)
>>
>> 2. The patch assumes that IOAPIC calls are going to be replaced
>> with something generic.
>>
>> 3. Added sPAPRVFIOData (hw/spapr_iommu_vfio.h) which describes
>> the interface between VFIO and sPAPR IOMMU.
>>
>> 4. Change sPAPR PHB to scan the PCI bus which is used for
>> the IOMMU-VFIO group. Now it is enough to add the following to
>> the QEMU command line to get VFIO up with all the devices from
>> IOMMU group with id=3:
>> -device spapr-pci-host-bridge,busname=E1000E,buid=0x3,iommu=3,\
>>  mem_win_addr=0x2300,io_win_addr=0x2400,msi_win_addr=0x2500
>>
>> With the patches posted a bit earlier today, this patch fully supports
>> VFIO, including MSI-X.
>>
>> ps. yes, I know that linux_vfio.h has moved, will fix it later :)
>>
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>  hw/linux-vfio.h   |   26 +++
>>  hw/ppc/Makefile.objs  |3 ++
>>  hw/spapr.h|4 ++
>>  hw/spapr_iommu.c  |   62 -
>>  hw/spapr_iommu_vfio.h |   34 ++
>>  hw/spapr_pci.c|  124 
>> +++--
>>  hw/spapr_pci.h|6 +++
>>  hw/vfio_pci.c |   64 +
>>  hw/vfio_pci.h |2 +
>>  trace-events  |1 +
>>  10 files changed, 320 insertions(+), 6 deletions(-)
>>  create mode 100644 hw/spapr_iommu_vfio.h
>>
>> diff --git a/hw/linux-vfio.h b/hw/linux-vfio.h
>> index 300d49b..27a0501 100644
>> --- a/hw/linux-vfio.h
>> +++ b/hw/linux-vfio.h
>> @@ -442,4 +442,30 @@ struct vfio_iommu_type1_dma_unmap {
>>  
>>  #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
>>  
>> +/*
>> + * Interface to SPAPR TCE (POWERPC Book3S)
>> + */
>> +#define SPAPR_TCE_IOMMU 2
>> +
>> +struct tce_iommu_info {
>> +    __u32 argsz;
>> +    __u32 flags;
>> +    __u32 dma32_window_start;
>> +    __u32 dma32_window_size;
>> +    __u64 dma64_window_start;
>> +    __u64 dma64_window_size;
>> +};
>> +
>> +#define SPAPR_TCE_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>> +
>> +struct tce_iommu_dma_map {
>> +    __u32 argsz;
>> +    __u32 flags;
>> +    __u64 va;
>> +    __u64 dmaaddr;
>> +};
>> +
>> +#define SPAPR_TCE_IOMMU_MAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 13)
>> +#define SPAPR_TCE_IOMMU_UNMAP_DMA   _IO(VFIO_TYPE, VFIO_BASE + 14)
>> +
>>  #endif /* VFIO_H */
>> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
>> index f573a95..c46a049 100644
>> --- a/hw/ppc/Makefile.objs
>> +++ b/hw/ppc/Makefile.objs
>> @@ -25,4 +25,7 @@ obj-$(CONFIG_FDT) += ../device_tree.o
>>  # Xilinx PPC peripherals
>>  obj-y += xilinx_ethlite.o
>>  
>> +# VFIO PCI device assignment
>> +obj-$(CONFIG_VFIO_PCI) += vfio_pci.o
>> +
>>  obj-y := $(addprefix ../,$(obj-y))
>> diff --git a/hw/spapr.h b/hw/spapr.h
>> index b37f337..0c15c88 100644
>> --- a/hw/spapr.h
>> +++ b/hw/spapr.h
>> @@ -340,4 +340,8 @@ int spapr_dma_dt(void *fdt, int node_off, const char 
>> *propname,
>>  int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
>>DMAContext *dma);
>>  
>> +struct sPAPRVFIOData;
>> +void spapr_vfio_init_dma(int group_id, uint32_t liobn,
>> + struct sPAPRVFIOData *data);
>> +
>>  #endif /* !defined (__HW_SPAPR_H__) */
>> diff --git a/hw/spapr_iommu.c b/hw/spapr_iommu.c
>> index 50c288d..0a82842 100644
>> --- a/hw/spapr_iommu.c
>> +++ b/hw/spapr_iommu.c
>> @@ -23,6 +23,8 @@
>>  #include "dma.h"
>>  
>>  #include "hw/spapr.h"
>> +#include "hw/spapr_iommu_vfio.h"
>> +#include "hw/vfio_pci.h"
>>  
>>  #include 
>>  
>> @@ -183,6 +185,60 @@ static int put_tce_emu(target_ulong liobn, target_ulong 
>> ioba, target_ulong tce)
>>  return 0;
>>  }
>>  
>> +typedef struct sPAPRVFIOTable {
>> +    struct sPAPRVFIOData *data;
>> +    uint32_t liobn;
>> +    QLIST_ENTRY(sPAPRVFIOTable) list;
>> +} sPAPRVFIOTable;
>> +
>> +QLIST_HEAD(vfio_tce_tables, sPAPRVFIOTable) vfio_tce_tables;
>> +
>> +void spapr_vfio_init_dma(int group_id, uint32_t liobn,
>> +                         struct sPAPRVFIOData *data)
>> +{
>> +    sPAPRVFIOTable *t;
>> +
>> +    t = g_malloc0(sizeof(*t));
>> +    t->data = data;
>> +    t->liobn = liobn;
>> +
>> +    QLIST_INSERT_HEAD(&vfio_tce_tables, t, list);
>> +}
>> +
>> +static int put_tce_vfio(uint32_t liobn, target_ulong ioba, target_ulong tce)
>> +{
>> +    sPAPRVFIOTable *t;
>> +    struct tce_iommu_dma_map map = {
>> +        .argsz = sizeof(map),
>> +        .va = 0,
>> +        .dmaaddr = ioba,
>> +    };
>> +
>> +    QLIST_FOREACH(t, &vfio_tce_tables, list) {
>> +        if (t->liobn != liobn) {
>> +            continue;
>> +        }
>> +        if (!t->data) {
>> +            return H_NO_MEM;
>> +        }
> 
> Why would this ever happen?
> 
>> +if (tce)

Re: [Qemu-devel] [RFC v9 00/27] virtio: virtio-blk data plane

2012-07-18 Thread Khoa Huynh


Michael S. Tsirkin wrote on 07/18/2012 10:43:23 AM:

> From: "Michael S. Tsirkin" 
> To: Stefan Hajnoczi ,
> Cc: Kevin Wolf , Anthony Liguori/Austin/IBM@IBMUS,
> k...@vger.kernel.org, qemu-devel@nongnu.org, Khoa Huynh/Austin/
> IBM@IBMUS, Paolo Bonzini , Asias He

> Date: 07/18/2012 10:45 AM
> Subject: Re: [Qemu-devel] [RFC v9 00/27] virtio: virtio-blk data plane
> Sent by: qemu-devel-bounces+khoa=us.ibm@nongnu.org
>
> On Wed, Jul 18, 2012 at 04:07:27PM +0100, Stefan Hajnoczi wrote:
> > This series implements a dedicated thread for virtio-blk processing using
> > Linux AIO for raw image files only.  It is based on qemu-kvm.git a0bc8c3 and
> > somewhat old but I wanted to share it on the list since it has been mentioned
> > on mailing lists and IRC recently.
> >
> > These patches can be used for benchmarking and discussion about how to
> > improve block performance.  Paolo Bonzini has also worked in this area and
> > might want to share his patches.
> >
> > The basic approach is:
> > 1. Each virtio-blk device has a thread dedicated to handling ioeventfd
> >    signalling when the guest kicks the virtqueue.
> > 2. Requests are processed without going through the QEMU block layer using
> >    Linux AIO directly.
> > 3. Completion interrupts are injected via ioctl from the dedicated thread.
> >
> > The series also contains request merging as a bdrv_aio_multiwrite()
> > equivalent.  This was only to get a comparison against the QEMU block layer
> > and I would drop it for other types of analysis.
> >
> > The effect of this series is that O_DIRECT Linux AIO on raw files can bypass
> > the QEMU global mutex and block layer.  This means higher performance.
>
> Do you have any numbers at all?

Yes, we do have a lot of data for this data-plane patch set.  I can send you
detailed charts if you like, but generally, we run into a performance
bottleneck with the existing qemu due to the qemu global mutex, and thus, could
only get to about 140,000 IOPS for a single guest (at least on my setup).  With
this data-plane patch set, we bypass this bottleneck and have been able to
achieve more than 600,000 IOPS for a single guest, and an aggregate 1.33
million IOPS with 4 guests on a single host.

Just for reference, VMware has claimed that they could get 300,000 IOPS for a
single VM and 1 million IOPS with 6 VMs on a single VSphere 5.0 host.  So we
definitely need something like this for KVM to be competitive with VMware and
other hypervisors.  Of course, this would also help satisfy the high I/O rate
requirements for BigData and other data-intensive applications or benchmarks
running on KVM.
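
For readers who want the ratios behind these figures, a minimal sketch follows. The helper names are hypothetical, and the inputs are simply the round numbers quoted above; since the single-guest figure is stated as "more than 600,000", the resulting speedup is a lower bound.

```c
#include <assert.h>

/* Ratio of two IOPS figures, e.g. patched vs. unpatched. */
double iops_ratio(double a, double b)
{
    assert(b > 0.0);
    return a / b;
}

/* Average IOPS per guest for an aggregate figure. */
double iops_per_guest(double aggregate, int guests)
{
    assert(guests > 0);
    return aggregate / guests;
}
```

With the quoted numbers, iops_ratio(600000, 140000) is roughly 4.3, iops_per_guest(1330000, 4) is 332,500, and the single-guest result is twice the 300,000 IOPS claimed for a single VMware VM.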

Thanks,
-Khoa
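
Step 1 of the approach quoted in this message relies on eventfd-based signalling (an ioeventfd is an eventfd that KVM signals when the guest writes to a given address). A useful property is that several kicks arriving before the thread runs collapse into a single wakeup. The Linux-only sketch below shows just that counter behaviour with plain eventfd(2); the toy_* names are hypothetical, and no KVM, virtqueue, or thread is involved.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a non-blocking notifier; returns the fd, or -1 on error. */
int toy_notifier_create(void)
{
    return eventfd(0, EFD_NONBLOCK);
}

/* Producer side: signal one kick. Returns 0 on success, -1 on error. */
int toy_notifier_kick(int fd)
{
    uint64_t one = 1;
    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Consumer side: fetch and reset the number of pending kicks.
 * Returns 0 when nothing is pending (read() fails with EAGAIN). */
uint64_t toy_notifier_drain(int fd)
{
    uint64_t pending = 0;
    if (read(fd, &pending, sizeof(pending)) != (ssize_t)sizeof(pending)) {
        return 0;
    }
    return pending;
}
```

Three kicks followed by one drain return 3 and leave the counter at zero, so a consumer can handle everything queued so far after a single wakeup.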

>
> > A cleaned up version of this approach could be added to QEMU as a raw
> > O_DIRECT Linux AIO fast path.  Image file formats, protocols, and other
> > block layer features are not supported by virtio-blk-data-plane.
> >
> > Git repo:
> > http://repo.or.cz/w/qemu-kvm/stefanha.git/shortlog/refs/heads/virtio-blk-data-plane
> >
> > Stefan Hajnoczi (27):
> >   virtio-blk: Remove virtqueue request handling code
> >   virtio-blk: Set up host notifier for data plane
> >   virtio-blk: Data plane thread event loop
> >   virtio-blk: Map vring
> >   virtio-blk: Do cheapest possible memory mapping
> >   virtio-blk: Take PCI memory range into account
> >   virtio-blk: Put dataplane code into its own directory
> >   virtio-blk: Read requests from the vring
> >   virtio-blk: Add Linux AIO queue
> >   virtio-blk: Stop data plane thread cleanly
> >   virtio-blk: Indirect vring and flush support
> >   virtio-blk: Add workaround for BUG_ON() dependency in virtio_ring.h
> >   virtio-blk: Increase max requests for indirect vring
> >   virtio-blk: Use pthreads instead of qemu-thread
> >   notifier: Add a function to set the notifier
> >   virtio-blk: Kick data plane thread using event notifier set
> >   virtio-blk: Use guest notifier to raise interrupts
> >   virtio-blk: Call ioctl() directly instead of irqfd
> >   virtio-blk: Disable guest->host notifies while processing vring
> >   virtio-blk: Add ioscheduler to detect mergable requests
> >   virtio-blk: Add basic request merging
> >   virtio-blk: Fix request merging
> >   virtio-blk: Stub out SCSI commands
> >   virtio-blk: fix incorrect length
> >   msix: fix irqchip breakage in msix_try_notify_from_thread()
> >   msix: use upstream kvm_irqchip_set_irq()
> >   virtio-blk: add EVENT_IDX support to dataplane
> >
> >  event_notifier.c  |7 +
> >  event_notifier.h  |1 +
> >  hw/dataplane/event-poll.h |  116 +++
> >  hw/dataplane/ioq.h|  128 
> >  hw/dataplane/iosched.h|   97 ++
> >  hw/dataplane/vring.h  |  334 
> >  hw/msix.c |   15 +
> >  hw/msix.h |1 +
> >  hw/virtio-blk.c   |  753 
> +
> >  hw/virtio-pci.c   |8 +
> >  hw/virtio.c   |9 +
> >  hw/virtio.h   |3 +
>

[Qemu-devel] Mapping Hugepage to Guest

2012-07-18 Thread Haines, Brent R

Hello,

I have a process that allocates a number of hugepages in a /mnt/huge directory 
on the Linux host.  I would like to map those same hugepages when starting the 
guest in the same /mnt/huge directory.  Is there an existing way to do this?

Thanks,

Brent

Re: [Qemu-devel] [PATCH 5/6] add-cow: support snapshot_blkdev

2012-07-18 Thread Dong Xu Wang

On Thu, Jun 14, 2012 at 7:33 PM, Kevin Wolf  wrote:
> Am 14.06.2012 13:18, schrieb Paolo Bonzini:
>> Il 14/06/2012 12:59, Kevin Wolf ha scritto:
>>> Am 13.06.2012 16:36, schrieb Dong Xu Wang:
>>>> add-cow will let raw file support snapshot_blkdev indirectly.
>>>>
>>>> Signed-off-by: Dong Xu Wang
>>>
>>> Paolo, what do you think about this magic?
>>
>> Besides the obvious layering violation (it would be better to add a new
>> method bdrv_ext_snapshot_create perhaps) I don't see very much a problem
>> in it.  Passing image creation options sounds like a good idea that we
>> can add in the future as an extension.
>>
>> But honestly, I don't really see the point of add-cow in general...  The
>> raw image is anyway not usable without a pass through qemu-io convert,
>> and mirroring will also allow collapsing an image to raw (with the
>> persistent dirty bitmap playing the role of the add-cow metadata).
>
> Well, the idea was that you stream into add-cow and once the streaming
> has completed, you can just drop the bitmap.
>
> There might be some overlap with mirroring, though when we discussed
> introducing add-cow, mirrors were not yet on the table. One difference
> that remains is that with streaming the guest only writes to the target
> image and can't have any problem with convergence.
>
>>> I think I can see its use especially for HMP because it's quite
>>> convenient, but on the other hand it assumes a fixed image path for the
>>> new raw image. This isn't going to work for block devices, for example.
>>
>> True, but then probably you would use mode='existing', because you need
>> to pre-create the logical volume.
>
> I think it might be convenient to have the raw volume precreated (you
> need to do that anyway), but create the COW bitmap only during the
> snapshot command. But yeah, not really important.
>
>>> If we don't do it this way, we need to allow passing image creation
>>> options to the snapshotting command so that you can pass a precreated
>>> image file.
>>
>> This sounds like a useful extension anyway, except that passing an
>> unstructured string for image creation options is ugly...  Perhaps we
>> can base a better implementation of options on Laszlo's QemuOpts visitor.
>
> Yes, I wouldn't want to use a string here, we should use something
> structured. Image creation still uses the old-style options instead of
> QemuOpts, but maybe this is the opportunity to convert it.

Kevin, do you mean I need to replace QEMUOptionParameter with QemuOpts?

If so, the other image formats would also need to be changed. I noticed that
Stefan has an unfinished patch:
http://repo.or.cz/w/qemu/stefanha.git/commitdiff/b49babb2c8b476a36357cfd7276ca45a11039ca5

so I could base my work on Stefan's patch.


>
> Kevin
>

[Qemu-devel] [Bug 1004050] Re: qemu-system-ppc64 by default has non-working keyboard

2012-07-18 Thread Truman Boyes

I have also experienced the same issue with qemu-system-ppc64. It
appears that ppc64 is not able to communicate with the USB controller.
This issue is not seen with qemu-system-ppc.

tboyes@tboyes-dev:~/qemu$ qemu-system-ppc64 -serial stdio -m 1024 -net nic -net user debian-ppc.qcow2 -cdrom debian-6.0.5-powerpc-netinst.iso -boot d
VNC server running on `127.0.0.1:5901'
>> Cannot manage 'OHCI USB controller' PCI device type 'usb':
>>  106b 3f (c 3 10)

>> =
>> OpenBIOS 1.0 [May 30 2012 16:55]
>> Configuration device id QEMU version 1 machine id 3
>> CPUs: 1
>> Memory: 1024M
>> UUID: ----
>> CPU type PowerPC,970FX
usb-kbd: warning: key event queue full
usb-kbd: warning: key event queue full
usb-kbd: warning: key event queue full
usb-kbd: warning: key event queue full

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1004050

Title:
  qemu-system-ppc64 by default has non-working keyboard

Status in QEMU:
  New

Bug description:
  Compile qemu from git and do:

./ppc64-softmmu/qemu-system-ppc64

  (ie. no parameters).  It boots to an OpenBIOS prompt.  However the
  keyboard doesn't work.  After ~10 keypresses, qemu just says:

  usb-kbd: warning: key event queue full
  usb-kbd: warning: key event queue full
  usb-kbd: warning: key event queue full
  usb-kbd: warning: key event queue full

  There is no indication inside the guest that OpenBIOS is seeing
  keyboard events.

  Also there's no indication of what type of keyboard devices are
  available, nor what we should use.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1004050/+subscriptions

[Qemu-devel] [help]: how to use HMP command dump-guest-memory

2012-07-18 Thread Sheldon


I want to dump all of the guest's memory to the file ./guestcore.
I execute this command as follows:
(qemu) dump-guest-memory -p protocol file:./guestcore
invalid char in expression

Re: [Qemu-devel] qemu in full emulation on win32

2012-07-18 Thread Alexey Kardashevskiy

On 19/07/12 02:35, Stefan Weil wrote:
> Am 18.07.2012 08:30, schrieb Alexey Kardashevskiy:
>> Hi!
>>
>> Found 2 problems while I was debugging 
>> qemu/ppc64-softmmu/qemu-system-ppc64.exe
>> WindowsXP SP3 Pro, 32bit, i686-pc-mingw32-gcc (GCC) 4.5.2.
>>
>>
>> 1. The size of the following is 7 bytes on linux and 8 bytes on Windows:
>> struct {
>>  uint32_t hi;
>>  uint64_t child;
>>  uint64_t parent;
>>  uint64_t size;
>> } __attribute__((packed)) ranges[];
>>
>> The structure is used between QEMU and Open Firmware (powerpc bios) so it is 
>> important.
>>
>> The Feature is described here:
>> http://stackoverflow.com/questions/7789668/why-would-the-size-of-a-packed-structure-be-different-on-linux-and-windows-when
>> In short, there is GCC packing and MS packing, and they are different :)
>>
>> The solutions are:
>> 1. Add MS-specific #pragma pack(push,1) and #pragma pack(pop).
>> 2. Add -mno-ms-bitfields (gcc >= 4.7.0)
>> 3. Change the structure above to use only uint32_t.
>>
>> What is the common way of solving such problems in QEMU?
> 
> Problem 1 is solved with solution 4 (your own patch) although
> that patch does not change the structure size to 7 bytes :-)


The weblink here is just for explanation :)
My struct should be seven 32-bit values (28 bytes), but on Windows it was
eight 32-bit values (32 bytes).
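
The size mismatch and two of the fixes listed in the quoted message can be checked at compile time. The sketch below is an illustration under assumptions: TOY_PACKED, struct toy_range, and struct toy_range_cells are hypothetical names, and the Windows branch uses GCC's gcc_struct attribute (an x86-only attribute that requests GCC's own layout rules instead of the MS-compatible ones) as an alternative to the #pragma pack and -mno-ms-bitfields options mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On MinGW, plain "packed" may follow MS layout rules; gcc_struct asks
 * for GCC's native layout. Elsewhere plain "packed" is enough. */
#if defined(_WIN32) && defined(__GNUC__)
# define TOY_PACKED __attribute__((gcc_struct, packed))
#else
# define TOY_PACKED __attribute__((packed))
#endif

/* Same field layout as the structure quoted in the thread. */
struct toy_range {
    uint32_t hi;
    uint64_t child;
    uint64_t parent;
    uint64_t size;
} TOY_PACKED;

/* Alternative (solution 3): only 32-bit cells, so no attribute is
 * needed and no padding can appear. */
struct toy_range_cells {
    uint32_t hi;
    uint32_t child[2];
    uint32_t parent[2];
    uint32_t size[2];
};

/* Fail the build, rather than the guest firmware, if the layout drifts. */
_Static_assert(sizeof(struct toy_range) == 28, "packed layout changed");
_Static_assert(sizeof(struct toy_range_cells) == 28, "cell layout changed");
```

A compile-time assertion like this is cheap insurance for any structure shared with firmware, because a padding change otherwise only shows up as a misbehaving guest.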



>> 2. QEMU cannot allocate 1024MB for the guest RAM. Literally, VirtualAlloc() 
>> fails on 1024MB BUT it does not if I allocate 1023MB and 64MB by 2 
>> subsequent calls. We allocate RAM via memory_region_init_ram(). I am pretty 
>> sure this is not happening on 64bit Windows and I suspect that it is 
>> happening with qemu-system-x86.exe, is not it?
>>
>> Do we care that there is actually enough RAM and we could allocate it in 
>> several chunks?
> 
> 
> Please try the patch which I'm going to send.
> 
> On w64, VirtualAlloc() _can_ allocate large quantities of contiguous
> virtual memory.
>
> On w32, it is normally restricted to the lower 2 GiB which are already
> fragmented by the code (executable, shared libraries) and data. Larger
> quantities are available when the executable is allowed to use the upper
> 2 GiB, too. That's what my patch does.


Looking forward, thanks. I am surprised nobody hit it before.



-- 
Alexey

[Qemu-devel] [PATCH] msi/msix: added API to set MSI message address and data

2012-07-18 Thread Alexey Kardashevskiy

Added (msi|msix)_set_message() functions for whoever might
want to use them.

Currently msi_notify()/msix_notify() write to these vectors to
signal the guest about an interrupt so the correct values have to
be written there by the guest or QEMU.

For example, POWER guest never initializes MSI/MSIX vectors, instead
it uses RTAS hypercalls. So in order to support MSIX for virtio-pci on
POWER we have to initialize MSI/MSIX message from QEMU.

Signed-off-by: Alexey Kardashevskiy 
---
 hw/msi.c  |   17 +
 hw/msi.h  |1 +
 hw/msix.c |   13 +
 hw/msix.h |1 +
 4 files changed, 32 insertions(+)

diff --git a/hw/msi.c b/hw/msi.c
index 5233204..e2273a0 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -105,6 +105,23 @@ static inline uint8_t msi_pending_off(const PCIDevice* dev, bool msi64bit)
     return dev->msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : PCI_MSI_PENDING_32);
 }
 
+/*
+ * Special API for POWER to configure the vectors through
+ * a side channel. Should never be used by devices.
+ */
+void msi_set_message(PCIDevice *dev, MSIMessage msg)
+{
+    uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
+    bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
+
+    if (msi64bit) {
+        pci_set_quad(dev->config + msi_address_lo_off(dev), msg.address);
+    } else {
+        pci_set_long(dev->config + msi_address_lo_off(dev), msg.address);
+    }
+    pci_set_word(dev->config + msi_data_off(dev, msi64bit), msg.data);
+}
+
 bool msi_enabled(const PCIDevice *dev)
 {
     return msi_present(dev) &&
diff --git a/hw/msi.h b/hw/msi.h
index 75747ab..6ec1f99 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -31,6 +31,7 @@ struct MSIMessage {
 
 extern bool msi_supported;
 
+void msi_set_message(PCIDevice *dev, MSIMessage msg);
 bool msi_enabled(const PCIDevice *dev);
 int msi_init(struct PCIDevice *dev, uint8_t offset,
  unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
diff --git a/hw/msix.c b/hw/msix.c
index fd9ea95..800fc32 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -37,6 +37,19 @@ static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
     return msg;
 }
 
+/*
+ * Special API for POWER to configure the vectors through
+ * a side channel. Should never be used by devices.
+ */
+void msix_set_message(PCIDevice *dev, int vector, struct MSIMessage msg)
+{
+    uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
+
+    pci_set_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR, msg.address);
+    pci_set_long(table_entry + PCI_MSIX_ENTRY_DATA, msg.data);
+    table_entry[PCI_MSIX_ENTRY_VECTOR_CTRL] &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
+}
+
 static uint8_t msix_pending_mask(int vector)
 {
     return 1 << (vector % 8);
diff --git a/hw/msix.h b/hw/msix.h
index 1786e27..15211cb 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -4,6 +4,7 @@
 #include "qemu-common.h"
 #include "pci.h"
 
+void msix_set_message(PCIDevice *dev, int vector, MSIMessage msg);
 int msix_init(PCIDevice *dev, unsigned short nentries,
   MemoryRegion *table_bar, uint8_t table_bar_nr,
   unsigned table_offset, MemoryRegion *pba_bar,
-- 
1.7.10.4
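
The register layout that msi_set_message() in the patch above depends on can be modelled on a plain byte array. This is a sketch under assumptions, not QEMU code: the Toy*/toy_* names are hypothetical, the little-endian helpers stand in for pci_set_word()/pci_set_long()/pci_set_quad(), and the offsets are the standard MSI capability offsets (flags at +2, address at +4, data at +8 for 32-bit or +12 for 64-bit capabilities).

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets relative to the start of the MSI capability. */
#define TOY_MSI_FLAGS        2
#define TOY_MSI_FLAGS_64BIT  0x0080
#define TOY_MSI_ADDRESS_LO   4
#define TOY_MSI_DATA_32      8
#define TOY_MSI_DATA_64      12

typedef struct ToyMsg {
    uint64_t address;
    uint32_t data;
} ToyMsg;

/* Store the low `bytes` bytes of value in little-endian order. */
void toy_store_le(uint8_t *p, uint64_t value, unsigned bytes)
{
    for (unsigned i = 0; i < bytes; i++) {
        p[i] = (uint8_t)(value >> (8 * i));
    }
}

/* Load `bytes` bytes stored in little-endian order. */
uint64_t toy_load_le(const uint8_t *p, unsigned bytes)
{
    uint64_t value = 0;
    for (unsigned i = 0; i < bytes; i++) {
        value |= (uint64_t)p[i] << (8 * i);
    }
    return value;
}

/* Write address and data where a device with the capability at offset
 * `cap` expects them; the data register moves when the capability is
 * 64-bit capable. */
void toy_msi_set_message(uint8_t *config, unsigned cap, ToyMsg msg)
{
    uint64_t flags = toy_load_le(config + cap + TOY_MSI_FLAGS, 2);
    bool is64 = (flags & TOY_MSI_FLAGS_64BIT) != 0;

    toy_store_le(config + cap + TOY_MSI_ADDRESS_LO, msg.address,
                 is64 ? 8 : 4);
    toy_store_le(config + cap + (is64 ? TOY_MSI_DATA_64 : TOY_MSI_DATA_32),
                 msg.data, 2);
}
```

For a 32-bit capability only the low four address bytes are stored, which mirrors the pci_set_long() branch of the patch.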

Re: [Qemu-devel] [PATCH] msi/msix: added API to set MSI message address and data

2012-07-18 Thread Alexey Kardashevskiy

On 19/07/12 01:23, Michael S. Tsirkin wrote:
> On Wed, Jul 18, 2012 at 11:17:12PM +1000, Alexey Kardashevskiy wrote:
>> On 18/07/12 22:43, Michael S. Tsirkin wrote:
>>> On Thu, Jun 21, 2012 at 09:39:10PM +1000, Alexey Kardashevskiy wrote:
>>>> Added (msi|msix)_set_message() functions.
>>>>
>>>> Currently msi_notify()/msix_notify() write to these vectors to
>>>> signal the guest about an interrupt so the correct values have to
>>>> be written there by the guest or QEMU.
>>>>
>>>> For example, a POWER guest never initializes MSI/MSIX vectors, instead
>>>> it uses RTAS hypercalls. So in order to support MSIX for virtio-pci on
>>>> POWER we have to initialize the MSI/MSIX message from QEMU.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy
>>>
>>> So guests do enable MSI through config space, but do
>>> not fill in vectors? 
>>
>> Yes. msix_capability_init() calls arch_setup_msi_irqs() which does 
>> everything it needs to do (i.e. calls hypervisor) before 
>> msix_capability_init() writes PCI_MSIX_FLAGS_ENABLE to the PCI_MSIX_FLAGS 
>> register.
>>
>> These vectors are the PCI bus addresses, the way they are set is specific 
>> for a PCI host controller, I do not see why the current scheme is a bug.
> 
> It won't work with any real PCI device, will it? Real PCI devices expect
> vectors to be written into their memory.


Yes. And the hypervisor does this. On POWER (at least book3s, i.e. server
powerpc), the whole config space kitchen is hidden behind RTAS (a kind of
BIOS). For the guest, this RTAS is implemented in the hypervisor; for the
host, in the system firmware. So powerpc Linux does not have to have PHB
drivers. Kinda cool.

A usual powerpc server runs without a host Linux at all; it runs a
hypervisor called pHyp. Every guest knows that it is a guest: there is no
full machine emulation, it is para-virtualization. In power-kvm, we replace
pHyp with the host Linux, and QEMU now plays the hypervisor role. Some day we
will move the hypervisor to the host kernel completely (?), but for now it is
in QEMU.


>>> Very strange. Are you sure it's not
>>> just a guest bug? How does it work for other PCI devices?
>>
>> Did not get the question. It works the same for every PCI device under POWER 
>> guest.
> 
> I mean for real PCI devices.
> 
>>> Can't we just fix guest drivers to program the vectors properly?
>>>
>>> Also pls address the comment below.
>>
>> Comment below.
>>
>>> Thanks!
>>>
>>>> ---
>>>>  hw/msi.c  |   13 +
>>>>  hw/msi.h  |1 +
>>>>  hw/msix.c |9 +
>>>>  hw/msix.h |2 ++
>>>>  4 files changed, 25 insertions(+)
>>>>
>>>> diff --git a/hw/msi.c b/hw/msi.c
>>>> index 5233204..cc6102f 100644
>>>> --- a/hw/msi.c
>>>> +++ b/hw/msi.c
>>>> @@ -105,6 +105,19 @@ static inline uint8_t msi_pending_off(const 
>>>> PCIDevice* dev, bool msi64bit)
>>>>  return dev->msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : 
>>>> PCI_MSI_PENDING_32);
>>>>  }
>>>>  
>>>> +void msi_set_message(PCIDevice *dev, MSIMessage msg)
>>>> +{
>>>> +uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
>>>> +bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
>>>> +
>>>> +if (msi64bit) {
>>>> +pci_set_quad(dev->config + msi_address_lo_off(dev), msg.address);
>>>> +} else {
>>>> +pci_set_long(dev->config + msi_address_lo_off(dev), msg.address);
>>>> +}
>>>> +pci_set_word(dev->config + msi_data_off(dev, msi64bit), msg.data);
>>>> +}
>>>> +
>>>
>>> Please add documentation. Something like
>>>
>>> /*
>>>  * Special API for POWER to configure the vectors through
>>>  * a side channel. Should never be used by devices.
>>>  */
>>
>>
>> It is useful for any para-virtualized environment I believe, isn't it?
>> For s390 as well, for example. If it supports PCI, of course, which I am
>> not sure it does :)
> 
> I expect the normal guest to program the address into MSI register using
> config accesses, same way that it enables MSI/MSIX.
> Why POWER does it differently I did not yet figure out but I hope
> this weirdness is not so widespread.


In para-virt I would expect the guest not to touch config space at all. At 
least it should use one interface rather than two but this is how it is.


>>>>  bool msi_enabled(const PCIDevice *dev)
>>>>  {
>>>>  return msi_present(dev) &&
>>>> diff --git a/hw/msi.h b/hw/msi.h
>>>> index 75747ab..6ec1f99 100644
>>>> --- a/hw/msi.h
>>>> +++ b/hw/msi.h
>>>> @@ -31,6 +31,7 @@ struct MSIMessage {
>>>>  
>>>>  extern bool msi_supported;
>>>>  
>>>> +void msi_set_message(PCIDevice *dev, MSIMessage msg);
>>>>  bool msi_enabled(const PCIDevice *dev);
>>>>  int msi_init(struct PCIDevice *dev, uint8_t offset,
>>>>   unsigned int nr_vectors, bool msi64bit, bool 
>>>> msi_per_vector_mask);
>>>> diff --git a/hw/msix.c b/hw/msix.c
>>>> index ded3c55..5f7d6d3 100644
>>>> --- a/hw/msix.c
>>>> +++ b/hw/msix.c
>>>> @@ -45,6 +45,15 @@ static MSIMessage msix_get_message(PCIDevice *dev, 
>>>> unsigned vector)
>>>>  return msg;
>>>>  }
>
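
For readers following the msi_set_message() hunk quoted in this thread, the 32-bit versus 64-bit handling can be modelled in a few lines of Python. This is a sketch under assumptions: the register offsets are the standard MSI capability layout (as in Linux's pci_regs.h), while the function name, the bytearray standing in for config space, the capability offset 0x50 and the sample address/data values are invented for the example.

```python
import struct

# MSI capability register offsets, relative to the capability start.
PCI_MSI_FLAGS = 2
PCI_MSI_FLAGS_64BIT = 0x80
PCI_MSI_ADDRESS_LO = 4
PCI_MSI_DATA_32 = 8
PCI_MSI_DATA_64 = 12


def msi_set_message_model(config, cap, address, data):
    """Hypothetical model of msi_set_message() on a config-space bytearray."""
    (flags,) = struct.unpack_from("<H", config, cap + PCI_MSI_FLAGS)
    msi64bit = bool(flags & PCI_MSI_FLAGS_64BIT)
    if msi64bit:
        # 64-bit capable: the address spans ADDRESS_LO and ADDRESS_HI.
        struct.pack_into("<Q", config, cap + PCI_MSI_ADDRESS_LO, address)
        data_off = PCI_MSI_DATA_64
    else:
        # 32-bit only: the data register directly follows ADDRESS_LO.
        struct.pack_into("<I", config, cap + PCI_MSI_ADDRESS_LO,
                         address & 0xFFFFFFFF)
        data_off = PCI_MSI_DATA_32
    struct.pack_into("<H", config, cap + data_off, data & 0xFFFF)
    return msi64bit


cap = 0x50  # arbitrary capability offset chosen for the example

# A device advertising 64-bit MSI addresses.
config64 = bytearray(256)
struct.pack_into("<H", config64, cap + PCI_MSI_FLAGS, PCI_MSI_FLAGS_64BIT)
msi_set_message_model(config64, cap, 0xFEE000001000, 0x4041)

# A device with 32-bit MSI addresses only.
config32 = bytearray(256)
msi_set_message_model(config32, cap, 0xFEE01000, 0x4041)
```

The point of the model is that the data register moves depending on the 64-bit flag, so the flag has to be read from the capability before anything is written.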

Re: [Qemu-devel] [PATCH 2/2] vexpress: Add NOR1 Flash support

2012-07-18 Thread Peter Crosthwaite

On Thu, Jul 19, 2012 at 5:03 AM,  <402ja...@gmail.com> wrote:
> From: Jagan <402ja...@gmail.com>
>
> This patch adds support for NOR1 flash (Bank #2) on
> vexpress-a9 platform. It is 64MB CFI01 compliant flash.
>
> Tested on stable u-boot version through Linux.
>
> Signed-off-by: Jagan <402ja...@gmail.com>
> ---
>  hw/vexpress.c |   10 +-
>  1 files changed, 9 insertions(+), 1 deletions(-)
>
> diff --git a/hw/vexpress.c b/hw/vexpress.c
> index 2e889a8..b4262ed 100644
> --- a/hw/vexpress.c
> +++ b/hw/vexpress.c
> @@ -422,7 +422,15 @@ static void vexpress_common_init(const VEDBoardInfo 
> *daughterboard,
>  }
>
>  /* VE_NORFLASH0ALIAS: not modelled */
> -/* VE_NORFLASH1: not modelled */
> +/* VE_NORFLASH1: */
> +dinfo = drive_get(IF_PFLASH, 0, 0);

Both flashes use drive_get(IF_PFLASH, 0, 0). Doesn't this mean they
are both going to be backed by the same file (one -pflash argument) and
share storage? Should this use drive_get_next(), with two -pflash args
specified, one for each device?

Regards
Peter

> +if (!pflash_cfi01_register(map[VE_NORFLASH1], NULL, "vexpress.flash1",
> +VEXPRESS_FLASH_SIZE, dinfo ? dinfo->bdrv : NULL,
> +VEXPRESS_FLASH_SECT_SIZE,
> +VEXPRESS_FLASH_SIZE / VEXPRESS_FLASH_SECT_SIZE,
> +4, 0x0089, 0x0018, 0x, 0x1, 0)) {
> +fprintf(stderr, "qemu: Error registering flash1 memory.\n");
> +}
>
>  sram_size = 0x200;
>  memory_region_init_ram(sram, "vexpress.sram", sram_size);
> --
> 1.7.0.4
>
>
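
The concern raised in this reply can be pictured with a toy model. This is not QEMU code: DriveTable and its methods are hypothetical stand-ins, assuming only that a fixed-unit lookup always returns the same drive while a "next unit" lookup advances on every call.

```python
class DriveTable:
    """Toy stand-in for the list of -pflash drives; not QEMU code."""

    def __init__(self, files):
        self.files = list(files)  # one entry per -pflash argument
        self.next_unit = 0

    def get(self, unit):
        """Fixed-unit lookup: the same unit always yields the same drive."""
        return self.files[unit] if unit < len(self.files) else None

    def get_next(self):
        """'Next unused unit' lookup: every call moves to the next unit."""
        drive = self.get(self.next_unit)
        self.next_unit += 1
        return drive


drives = DriveTable(["flash0.img", "flash1.img"])

# Fixed lookup used for both banks: both end up on the same backing file.
fixed = (drives.get(0), drives.get(0))

# Advancing lookup: each bank gets its own backing file.
advancing = (drives.get_next(), drives.get_next())
```

In the model, fixed is the pair (flash0.img, flash0.img), while advancing is (flash0.img, flash1.img), which is the behaviour the question above is asking for.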

[Qemu-devel] [Bug 1007269] Re: Can’t install or boot up 32bit win8 guest.

2012-07-18 Thread Marcelo Tosatti

PAE/NX/SSE2 Support Requirement Guide for Windows 8

http://answers.microsoft.com/en-us/windows/forum/windows_8-windows_install/error-when-installing-windows-8-release-preview/a2c11f2c-d43b-44fc-9bc0-61805a2d95ef

Perhaps adding 3DNow allows qemu64 to boot?

In any case, this is not a KVM bug.

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1007269

Title:
  Can’t install or boot up 32bit win8 guest.

Status in QEMU:
  New

Bug description:
  Environment:
  
  Host OS (ia32/ia32e/IA64):ia32e
  Guest OS (ia32/ia32e/IA64):ia32e
  Guest OS Type (Linux/Windows):Linux(rhel6u1)
  kvm.git Commit:51bfd2998113e1f8ce8dcf853407b76a04b5f2a0
  qemu-kvm Commit:3fd9fedb9fae4517d93d76e93a2924559cacf2f6
  Host Kernel Version:3.4.0-rc7
  Hardware:WSM-EP,Romley-EP

  
  Bug detailed description:
  --
  A 32bit Win8 (Consumer Preview Update 2) guest can't be installed or
  booted. The guest crashes with the following error, while 64bit Win8 and
  32bit Win7 guests work fine.
  -Win8 Error---
  Your computer needs to restart.
  Please hold down the power button.
  Error code: 0x005D
  Parameters:
  0x03060D03
  0x756E6547
  0x49656E69
  0x6C65746E
  -Win8 Error---

  "0x005D" code means "UNSUPPORTED_PROCESSOR" in Windows.

  BTW, you can get 32bit Win8 ISO from the following website. 
  http://windows.microsoft.com/en-US/windows-8/iso
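
As a side note on the error screen above: the last three parameters look like little-endian ASCII. The short Python sketch below decodes them; reading them as a CPUID vendor string is an interpretation, not something the bug report states.

```python
import struct

# Last three parameters from the Win8 error screen quoted above.
params = [0x756E6547, 0x49656E69, 0x6C65746E]

# Interpret each 32-bit value as four little-endian ASCII bytes.
vendor = b"".join(struct.pack("<I", p) for p in params).decode("ascii")
print(vendor)
```

The decoded text reads "GenuineIntel", which suggests the guest is reporting the CPU identification it saw when it rejected the processor.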

  
  Reproduce steps:
  
  1. get 32 bit win8 ISO
  2. prepare a disk image: dd if=/dev/zero of=/root/win8-32.img bs=1M count=16384
  3. start a guest to install 32bit win8: qemu-system -m 4096 -smp 4 -hda /root/win8-32.img -cdrom /media/32bit-Win8.iso
  4. if you have a 32bit win8 image, try to boot it up

  
  Current result:
  
  32bit Win8 guest crash

  Expected result:
  
  32bit win8 guest boot up correctly

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1007269/+subscriptions

Re: [Qemu-devel] [PATCH] qemu kvm: Recognize PCID feature

2012-07-18 Thread Marcelo Tosatti

On Wed, Jul 18, 2012 at 11:29:36AM +0200, Jan Kiszka wrote:
> On 2012-07-18 10:44, Mao, Junjie wrote:
> > Hi, Avi
> > 
> > Any comments on this patch? :)
> 
> Always include qemu-devel when you are changing QEMU; qemu-kvm is just
> staging for the latter. This patch can actually go into upstream
> directly, maybe even via qemu-trivial as it just makes that flag selectable.
> 
> Jan

Agreed, CCing Stefan.

Re: [Qemu-devel] [PATCH] Fix for qemu crash on assertion error when adding PCI passthru device.

2012-07-18 Thread Ma, Stephen B.

Sorry for taking so long to reply.  I am new to this.  Should this patch be 
committed or just dropped?


-----Original Message-----
From: Jan Kiszka [mailto:jan.kis...@web.de] 
Sent: Sunday, June 17, 2012 11:25 PM
To: Anthony Liguori
Cc: Michael S. Tsirkin; 'qemu-devel@nongnu.org'; Ma, Stephen B.
Subject: Re: [PATCH] Fix for qemu crash on assertion error when adding PCI 
passthru device.

On 2012-06-17 16:28, Anthony Liguori wrote:
> On 06/17/2012 03:34 AM, Michael S. Tsirkin wrote:
>> On Sun, Jun 17, 2012 at 06:26:33AM +, Ma, Stephen B. wrote:
>>>
>>> Michael,
>>>
>>> Thanks for the review.  I added the unparent to the qdev_free.
>>>
>>>
>>> ---
>>>   hw/qdev.c |1 +
>>>   1 files changed, 1 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/hw/qdev.c b/hw/qdev.c
>>> index d2dc28b..ed1328d 100644
>>> --- a/hw/qdev.c
>>> +++ b/hw/qdev.c
>>> @@ -264,6 +264,7 @@ void qdev_init_nofail(DeviceState *dev)
>>>   /* Unlink device from bus and free the structure.  */
>>>   void qdev_free(DeviceState *dev)
>>>   {
>>> +object_unparent(OBJECT(dev));
>>>   object_delete(OBJECT(dev));
>>>   }
>>>
>>> --
>>> 1.7.1
>>
>> Anthony, any feedback?
> 
> Yes, this is wrong.
> 
> PCI passthrough isn't in qemu.git so it's not clear to me where this 
> is happening.  Why would qdev_free be called when adding a PCI 
> passthru device?

The bug is reproducible with any in-tree device (at least PCI) that happens to 
return != 0 from its init handler.

Jan

[Qemu-devel] [PATCH 2/2] vexpress: Add NOR1 Flash support

2012-07-18 Thread 402jagan

From: Jagan <402ja...@gmail.com>

This patch adds support for NOR1 flash (Bank #2) on
vexpress-a9 platform. It is 64MB CFI01 compliant flash.

Tested on stable u-boot version through Linux.

Signed-off-by: Jagan <402ja...@gmail.com>
---
 hw/vexpress.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/hw/vexpress.c b/hw/vexpress.c
index 2e889a8..b4262ed 100644
--- a/hw/vexpress.c
+++ b/hw/vexpress.c
@@ -422,7 +422,15 @@ static void vexpress_common_init(const VEDBoardInfo 
*daughterboard,
 }
 
 /* VE_NORFLASH0ALIAS: not modelled */
-/* VE_NORFLASH1: not modelled */
+/* VE_NORFLASH1: */
+dinfo = drive_get(IF_PFLASH, 0, 0);
+if (!pflash_cfi01_register(map[VE_NORFLASH1], NULL, "vexpress.flash1",
+VEXPRESS_FLASH_SIZE, dinfo ? dinfo->bdrv : NULL,
+VEXPRESS_FLASH_SECT_SIZE,
+VEXPRESS_FLASH_SIZE / VEXPRESS_FLASH_SECT_SIZE,
+4, 0x0089, 0x0018, 0x, 0x1, 0)) {
+fprintf(stderr, "qemu: Error registering flash1 memory.\n");
+}
 
 sram_size = 0x200;
 memory_region_init_ram(sram, "vexpress.sram", sram_size);
-- 
1.7.0.4

[Qemu-devel] [PATCH 1/2] vexpress: Add NOR0 Flash support

2012-07-18 Thread 402jagan

From: Jagan <402ja...@gmail.com>

This patch adds support for NOR0 flash (Bank #1) on
vexpress-a9 platform. It is 64MB CFI01 compliant flash.

Tested on stable u-boot version through Linux.

Signed-off-by: Jagan <402ja...@gmail.com>
---
 hw/vexpress.c |   17 -
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/hw/vexpress.c b/hw/vexpress.c
index 8072c5a..2e889a8 100644
--- a/hw/vexpress.c
+++ b/hw/vexpress.c
@@ -29,6 +29,11 @@
 #include "sysemu.h"
 #include "boards.h"
 #include "exec-memory.h"
+#include "blockdev.h"
+#include "flash.h"
+
+#define VEXPRESS_FLASH_SIZE (64 * 1024 * 1024)
+#define VEXPRESS_FLASH_SECT_SIZE (256 * 1024)
 
 #define VEXPRESS_BOARD_ID 0x8e0
 
@@ -355,6 +360,7 @@ static void vexpress_common_init(const VEDBoardInfo 
*daughterboard,
 MemoryRegion *vram = g_new(MemoryRegion, 1);
 MemoryRegion *sram = g_new(MemoryRegion, 1);
 const target_phys_addr_t *map = daughterboard->motherboard_map;
+DriveInfo *dinfo;
 
 daughterboard->init(daughterboard, ram_size, cpu_model, pic, &proc_id);
 
@@ -405,7 +411,16 @@ static void vexpress_common_init(const VEDBoardInfo 
*daughterboard,
 
 sysbus_create_simple("pl111", map[VE_CLCD], pic[14]);
 
-/* VE_NORFLASH0: not modelled */
+/* VE_NORFLASH0: */
+dinfo = drive_get(IF_PFLASH, 0, 0);
+if (!pflash_cfi01_register(map[VE_NORFLASH0], NULL, "vexpress.flash0",
+VEXPRESS_FLASH_SIZE, dinfo ? dinfo->bdrv : NULL,
+VEXPRESS_FLASH_SECT_SIZE,
+VEXPRESS_FLASH_SIZE / VEXPRESS_FLASH_SECT_SIZE,
+4, 0x0089, 0x0018, 0x, 0x0, 0)) {
+fprintf(stderr, "qemu: Error registering flash0 memory.\n");
+}
+
 /* VE_NORFLASH0ALIAS: not modelled */
 /* VE_NORFLASH1: not modelled */
 
-- 
1.7.0.4

[Qemu-devel] [PATCH 0/2] vexpress-a9: NOR flash support

2012-07-18 Thread 402jagan

From: Jagan <402ja...@gmail.com>

These patches add support for NOR flash on vexpress-a9
in both Bank #1 and Bank #2.
Tested on the u-boot stable tree with Linux.

Jagan (2):
  vexpress: Add NOR0 Flash support
  vexpress: Add NOR1 Flash support

 hw/vexpress.c |   27 +--
 1 files changed, 25 insertions(+), 2 deletions(-)

Re: [Qemu-devel] [RFC v9 23/27] virtio-blk: Stub out SCSI commands

2012-07-18 Thread Michael S. Tsirkin

On Wed, Jul 18, 2012 at 04:07:50PM +0100, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi 

Why?

> ---
>  hw/virtio-blk.c |   25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
> index 51807b5..8734029 100644
> --- a/hw/virtio-blk.c
> +++ b/hw/virtio-blk.c
> @@ -215,14 +215,8 @@ static void process_request(IOQueue *ioq, struct iovec 
> iov[], unsigned int out_n
>  
>  /* TODO Linux sets the barrier bit even when not advertised! */
>  uint32_t type = outhdr->type & ~VIRTIO_BLK_T_BARRIER;
> -
> -if (unlikely(type & ~(VIRTIO_BLK_T_OUT | VIRTIO_BLK_T_FLUSH))) {
> -fprintf(stderr, "virtio-blk unsupported request type %#x\n", 
> outhdr->type);
> -exit(1);
> -}
> -
>  struct iocb *iocb;
> -switch (type & (VIRTIO_BLK_T_OUT | VIRTIO_BLK_T_FLUSH)) {
> +switch (type & (VIRTIO_BLK_T_OUT | VIRTIO_BLK_T_SCSI_CMD | 
> VIRTIO_BLK_T_FLUSH)) {
>  case VIRTIO_BLK_T_IN:
>  if (unlikely(out_num != 1)) {
>  fprintf(stderr, "virtio-blk invalid read request\n");
> @@ -239,6 +233,21 @@ static void process_request(IOQueue *ioq, struct iovec 
> iov[], unsigned int out_n
>  iocb = ioq_rdwr(ioq, false, &iov[1], out_num - 1, outhdr->sector * 
> 512UL); /* TODO is it always 512? */
>  break;
>  
> +case VIRTIO_BLK_T_SCSI_CMD:
> +if (unlikely(in_num == 0)) {
> +fprintf(stderr, "virtio-blk invalid SCSI command request\n");
> +exit(1);
> +}
> +
> +/* TODO support SCSI commands */
> +{
> +VirtIOBlock *s = container_of(ioq, VirtIOBlock, ioqueue);
> +inhdr->status = VIRTIO_BLK_S_UNSUPP;
> +vring_push(&s->vring, head, sizeof *inhdr);
> +virtio_blk_notify_guest(s);
> +}
> +return;
> +
>  case VIRTIO_BLK_T_FLUSH:
>  if (unlikely(in_num != 1 || out_num != 1)) {
>  fprintf(stderr, "virtio-blk invalid flush request\n");
> @@ -256,7 +265,7 @@ static void process_request(IOQueue *ioq, struct iovec 
> iov[], unsigned int out_n
>  return;
>  
>  default:
> -fprintf(stderr, "virtio-blk multiple request type bits set\n");
> +fprintf(stderr, "virtio-blk unsupported request type %#x\n", 
> outhdr->type);
>  exit(1);
>  }
>  
> -- 
> 1.7.10.4
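
To make the effect of the quoted hunk easier to follow, here is a small Python model of the request-type dispatch after the patch. The constants are the virtio-blk request types and status codes from the virtio specification; classify_request() and its return strings are hypothetical and exist only for this sketch.

```python
VIRTIO_BLK_T_IN = 0
VIRTIO_BLK_T_OUT = 1
VIRTIO_BLK_T_SCSI_CMD = 2
VIRTIO_BLK_T_FLUSH = 4
VIRTIO_BLK_T_BARRIER = 0x80000000

VIRTIO_BLK_S_UNSUPP = 2


def classify_request(raw_type):
    """Return a label for the action the patched switch would take."""
    req_type = raw_type & ~VIRTIO_BLK_T_BARRIER  # ignore the barrier bit
    masked = req_type & (VIRTIO_BLK_T_OUT | VIRTIO_BLK_T_SCSI_CMD |
                         VIRTIO_BLK_T_FLUSH)
    if masked == VIRTIO_BLK_T_IN:
        return "read"
    if masked == VIRTIO_BLK_T_OUT:
        return "write"
    if masked == VIRTIO_BLK_T_SCSI_CMD:
        # Stubbed out: complete at once with an "unsupported" status.
        return "complete with status %d" % VIRTIO_BLK_S_UNSUPP
    if masked == VIRTIO_BLK_T_FLUSH:
        return "flush"
    # Any other combination of the three bits is rejected.
    return "fatal: unsupported request type %#x" % raw_type
```

With this model, classify_request(VIRTIO_BLK_T_SCSI_CMD) reports an immediate completion with VIRTIO_BLK_S_UNSUPP instead of terminating, which is the behaviour change the patch title describes.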

Re: [Qemu-devel] [RFC v9 22/27] virtio-blk: Fix request merging

2012-07-18 Thread Michael S. Tsirkin

On Wed, Jul 18, 2012 at 04:07:49PM +0100, Stefan Hajnoczi wrote:
> Khoa Huynh  discovered that request merging is broken.
> The merged iocb is not updated to reflect the total number of iovecs and
> the offset is also outdated.
> 
> This patch fixes request merging.
> 
> Signed-off-by: Stefan Hajnoczi 

So all these fixups need to be folded in, making it correct the first time.

> ---
>  hw/virtio-blk.c |   10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
> index 9131a7a..51807b5 100644
> --- a/hw/virtio-blk.c
> +++ b/hw/virtio-blk.c
> @@ -178,13 +178,17 @@ static void merge_request(struct iocb *iocb_a, struct 
> iocb *iocb_b)
>  req_a->len = iocb_nbytes(iocb_a);
>  }
>  
> -iocb_b->u.v.vec = iovec;
> -req_b->len = iocb_nbytes(iocb_b);
> -req_b->next_merged = req_a;
>  /*
>  fprintf(stderr, "merged %p (%u) and %p (%u), %u iovecs in total\n",
>  req_a, iocb_a->u.v.nr, req_b, iocb_b->u.v.nr, iocb_a->u.v.nr + 
> iocb_b->u.v.nr);
>  */
> +
> +iocb_b->u.v.vec = iovec;
> +iocb_b->u.v.nr += iocb_a->u.v.nr;
> +iocb_b->u.v.offset = iocb_a->u.v.offset;
> +
> +req_b->len = iocb_nbytes(iocb_b);
> +req_b->next_merged = req_a;
>  }
>  
>  static void process_request(IOQueue *ioq, struct iovec iov[], unsigned int 
> out_num, unsigned int in_num, unsigned int head)
> -- 
> 1.7.10.4
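
The fix in the quoted hunk amounts to keeping three fields of the merged request in sync: the segment list, the segment count and the start offset. A minimal Python sketch follows; the Request class is a hypothetical stand-in for an iocb plus its bookkeeping, and the adjacency check is an assumption about when merging is attempted, not something shown in the patch.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Toy stand-in for an iocb plus its request bookkeeping."""
    offset: int
    iov: list = field(default_factory=list)  # list of bytes-like segments
    next_merged: object = None

    @property
    def nr(self):
        return len(self.iov)

    @property
    def nbytes(self):
        return sum(len(seg) for seg in self.iov)


def merge_request(a, b):
    """Fold request a into b, where b starts right where a ends."""
    assert a.offset + a.nbytes == b.offset, "requests must be adjacent"
    b.iov = a.iov + b.iov   # combined segment list, a's data first
    b.offset = a.offset     # merged request starts at a's offset
    b.next_merged = a       # remember a so it can be completed too
    return b


a = Request(offset=0, iov=[b"x" * 512])
b = Request(offset=512, iov=[b"y" * 512, b"z" * 512])
merged = merge_request(a, b)
```

After the merge, the segment count is 3 and the offset is 0. The bug described in the commit message corresponds to updating the segment list while leaving the count and offset at b's old values.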

Re: [Qemu-devel] [RFC v9 12/27] virtio-blk: Add workaround for BUG_ON() dependency in virtio_ring.h

2012-07-18 Thread Michael S. Tsirkin

On Wed, Jul 18, 2012 at 04:07:39PM +0100, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/dataplane/vring.h |5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/hw/dataplane/vring.h b/hw/dataplane/vring.h
> index 3eab4b4..44ef4a9 100644
> --- a/hw/dataplane/vring.h
> +++ b/hw/dataplane/vring.h
> @@ -1,6 +1,11 @@
>  #ifndef VRING_H
>  #define VRING_H
>  
> +/* Some virtio_ring.h files use BUG_ON() */

It's a bug then. Do we really need to work around broken systems?
If yes let's just ship our own headers ...

> +#ifndef BUG_ON
> +#define BUG_ON(x)
> +#endif
> +
>  #include 
>  #include "qemu-common.h"
>  
> -- 
> 1.7.10.4

Re: [Qemu-devel] [RFC v9 11/27] virtio-blk: Indirect vring and flush support

2012-07-18 Thread Michael S. Tsirkin

On Wed, Jul 18, 2012 at 04:07:38PM +0100, Stefan Hajnoczi wrote:
> RHEL6 and other new guest kernels use indirect vring descriptors to
> increase the number of requests that can be batched.  This fundamentally
> changes vring from a scheme that requires fixed resources to something
> more dynamic (although there is still an absolute maximum number of
> descriptors).  Cope with indirect vrings by taking on as many requests
> as we can in one go and then postponing the remaining requests until the
> first batch completes.
> 
> It would be possible to switch to dynamic resource management so iovec
> and iocb structs are malloced.  This would allow the entire ring to be
> processed even with indirect descriptors, but would probably hit a
> bottleneck when io_submit refuses to queue more requests.  Therefore,
> stick with the simpler scheme for now.
> 
> Unfortunately Linux AIO does not support asynchronous fsync/fdatasync on
> all files.  In particular, an O_DIRECT opened file on ext4 does not
> support Linux AIO fdsync.  Work around this by performing fdatasync()
> synchronously for now.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/dataplane/ioq.h   |   18 -
>  hw/dataplane/vring.h |  103 
> +++---
>  hw/virtio-blk.c  |   75 ++--
>  3 files changed, 144 insertions(+), 52 deletions(-)
> 
> diff --git a/hw/dataplane/ioq.h b/hw/dataplane/ioq.h
> index 7200e87..d1545d6 100644
> --- a/hw/dataplane/ioq.h
> +++ b/hw/dataplane/ioq.h
> @@ -3,7 +3,7 @@
>  
>  typedef struct {
>  int fd; /* file descriptor */
> -unsigned int max_reqs;   /* max length of freelist and queue */
> +unsigned int max_reqs;  /* max length of freelist and queue */
>  
>  io_context_t io_ctx;/* Linux AIO context */
>  EventNotifier io_notifier;  /* Linux AIO eventfd */
> @@ -91,18 +91,16 @@ static struct iocb *ioq_rdwr(IOQueue *ioq, bool read, 
> struct iovec *iov, unsigne
>  return iocb;
>  }
>  
> -static struct iocb *ioq_fdsync(IOQueue *ioq)
> -{
> -struct iocb *iocb = ioq_get_iocb(ioq);
> -
> -io_prep_fdsync(iocb, ioq->fd);
> -io_set_eventfd(iocb, event_notifier_get_fd(&ioq->io_notifier));
> -return iocb;
> -}
> -
>  static int ioq_submit(IOQueue *ioq)
>  {
>  int rc = io_submit(ioq->io_ctx, ioq->queue_idx, ioq->queue);
> +if (unlikely(rc < 0)) {
> +unsigned int i;
> +fprintf(stderr, "io_submit io_ctx=%#lx nr=%d iovecs=%p\n", 
> (uint64_t)ioq->io_ctx, ioq->queue_idx, ioq->queue);
> +for (i = 0; i < ioq->queue_idx; i++) {
> +fprintf(stderr, "[%u] type=%#x fd=%d\n", i, 
> ioq->queue[i]->aio_lio_opcode, ioq->queue[i]->aio_fildes);
> +}
> +}
>  ioq->queue_idx = 0; /* reset */
>  return rc;
>  }
> diff --git a/hw/dataplane/vring.h b/hw/dataplane/vring.h
> index 70675e5..3eab4b4 100644
> --- a/hw/dataplane/vring.h
> +++ b/hw/dataplane/vring.h
> @@ -64,6 +64,86 @@ static void vring_setup(Vring *vring, VirtIODevice *vdev, 
> int n)
>  vring->vr.desc, vring->vr.avail, vring->vr.used);
>  }
>  
> +static bool vring_more_avail(Vring *vring)
> +{
> + return vring->vr.avail->idx != vring->last_avail_idx;
> +}
> +
> +/* This is stolen from linux-2.6/drivers/vhost/vhost.c. */

So add a Red Hat copyright pls.

> +static bool get_indirect(Vring *vring,
> + struct iovec iov[], struct iovec *iov_end,
> + unsigned int *out_num, unsigned int *in_num,
> + struct vring_desc *indirect)
> +{
> + struct vring_desc desc;
> + unsigned int i = 0, count, found = 0;
> +
> + /* Sanity check */
> + if (unlikely(indirect->len % sizeof desc)) {
> + fprintf(stderr, "Invalid length in indirect descriptor: "
> +"len 0x%llx not multiple of 0x%zx\n",
> +(unsigned long long)indirect->len,
> +sizeof desc);
> + exit(1);
> + }
> +
> + count = indirect->len / sizeof desc;
> + /* Buffers are chained via a 16 bit next field, so
> +  * we can have at most 2^16 of these. */
> + if (unlikely(count > USHRT_MAX + 1)) {
> + fprintf(stderr, "Indirect buffer length too big: %d\n",
> +indirect->len);
> +exit(1);
> + }
> +
> +/* Point to translate indirect desc chain */
> +indirect = phys_to_host(vring, indirect->addr);
> +
> + /* We will use the result as an address to read from, so most
> +  * architectures only need a compiler barrier here. */
> + __sync_synchronize(); /* read_barrier_depends(); */
> +
> + do {
> + if (unlikely(++found > count)) {
> + fprintf(stderr, "Loop detected: last one at %u "
> +"indirect size %u\n",
> +i, count);
> + exit(1);
> + }
> +
> +
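
The sanity checks in get_indirect() above (length multiple of the descriptor size, at most 2^16 entries, loop detection by counting steps) can be sketched in Python as follows. The descriptor layout (64-bit addr, 32-bit len, 16-bit flags, 16-bit next) and the flag values come from the virtio ring definition; walk_indirect() and the range check on the next index are additions for this sketch and raise exceptions where the C code exits.

```python
import struct

VRING_DESC_FMT = "<QIHH"  # addr, len, flags, next
VRING_DESC_SIZE = struct.calcsize(VRING_DESC_FMT)  # 16 bytes
VRING_DESC_F_NEXT = 1
VRING_DESC_F_WRITE = 2


def walk_indirect(table):
    """Return (out_descs, in_descs) from a packed indirect descriptor table."""
    if len(table) % VRING_DESC_SIZE:
        raise ValueError("indirect table length is not a multiple of 16")
    count = len(table) // VRING_DESC_SIZE
    if count > 0xFFFF + 1:
        raise ValueError("indirect table too big")

    out_descs, in_descs = [], []
    i, found = 0, 0
    while True:
        found += 1
        if found > count:
            # More steps than entries means the chain revisits an entry.
            raise ValueError("loop detected in descriptor chain")
        if i >= count:
            raise ValueError("next index out of range")
        addr, length, flags, nxt = struct.unpack_from(
            VRING_DESC_FMT, table, i * VRING_DESC_SIZE)
        if flags & VRING_DESC_F_WRITE:
            in_descs.append((addr, length))   # buffer the device writes to
        else:
            out_descs.append((addr, length))  # buffer the device reads from
        if not flags & VRING_DESC_F_NEXT:
            return out_descs, in_descs
        i = nxt


# A two-entry chain: one readable header, one writable status byte.
chain = (struct.pack(VRING_DESC_FMT, 0x1000, 16, VRING_DESC_F_NEXT, 1) +
         struct.pack(VRING_DESC_FMT, 0x2000, 1, VRING_DESC_F_WRITE, 0))
out_descs, in_descs = walk_indirect(chain)
```

Counting steps rather than tracking visited entries keeps the check cheap: a well-formed chain can never take more steps than the table has entries.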

[Qemu-devel] [PATCH 6/9] qapi: add qapi-schema-errors.json

2012-07-18 Thread Luiz Capitulino

This is the main error file, where all errors are defined and from
where error macros and whatnot will be automatically generated.

It contains all error classes currently defined in qerror.[ch].

Signed-off-by: Luiz Capitulino 
---
 qapi-schema-errors.json | 616 
 1 file changed, 616 insertions(+)
 create mode 100644 qapi-schema-errors.json

diff --git a/qapi-schema-errors.json b/qapi-schema-errors.json
new file mode 100644
index 000..d99e55f
--- /dev/null
+++ b/qapi-schema-errors.json
@@ -0,0 +1,616 @@
+##
+# @AddClientFailed
+#
+# Since: 0.15
+##
+{ 'error': 'AddClientFailed',
+  'description': 'Could not add client' }
+
+##
+# @AmbiguousPath
+#
+# Since: 1.1.0
+##
+{ 'error': 'AmbiguousPath',
+  'description': 'Path \'%(path)\' does not uniquely identify a %(object)',
+  'data': {'path': 'str'} }
+
+##
+# @BadBusForDevice
+#
+# Since: 0.14
+##
+{ 'error': 'BadBusForDevice',
+  'description': 'Device \'%(device)\' can\'t go on a %(bad_bus_type) bus',
+  'data': {'device': 'str', 'bad_bus_type': 'str'} }
+
+##
+# @BaseNotFound
+#
+# Since: 1.1.0
+##
+{ 'error': 'BaseNotFound',
+  'description': 'Base \'%(base)\' not found',
+  'data': {'base': 'str'} }
+
+##
+# @BlockFormatFeatureNotSupported
+#
+# Since: 1.0
+##
+{ 'error': 'BlockFormatFeatureNotSupported',
+  'description': 'Block format \'%(format)\' used by device \'%(name)\' does 
not support feature \'%(feature)\'',
+  'data': {'format': 'str', 'name': 'str', 'feature': 'str'} }
+
+##
+# @BufferOverrun
+#
+# Since: 0.15
+##
+{ 'error': 'BufferOverrun',
+  'description': 'An internal buffer overran' }
+
+##
+# @BusNoHotplug
+#
+# Since: 0.14
+##
+{ 'error': 'BusNoHotplug',
+  'description': 'Bus \'%(bus)\' does not support hotplugging',
+  'data': {'bus': 'str'} }
+
+##
+# @BusNotFound
+#
+# Since: 0.14
+##
+{ 'error': 'BusNotFound',
+  'description': 'Bus \'%(bus)\' not found',
+  'data': {'bus': 'str'} }
+
+##
+# @CommandDisabled
+#
+# Since: 1.1.0
+##
+{ 'error': 'CommandDisabled',
+  'description': 'The command %(name) has been disabled for this instance',
+  'data': {'name': 'str'} }
+
+##
+# @CommandNotFound
+#
+# Since: 0.14
+##
+{ 'error': 'CommandNotFound',
+  'description': 'The command %(name) has not been found',
+  'data': {'name': 'str'} }
+
+##
+# @DeviceEncrypted
+#
+# Since: 0.14
+##
+{ 'error': 'DeviceEncrypted',
+  'description': 'Device \'%(device)\' is encrypted',
+  'data': {'device': 'str', 'filename': 'str'} }
+
+##
+# @DeviceFeatureBlocksMigration
+#
+# Since: 1.0
+##
+{ 'error': 'DeviceFeatureBlocksMigration',
+  'description': 'Migration is disabled when using feature \'%(feature)\' in 
device \'%(device)\'',
+  'data': {'device': 'str', 'feature': 'str'} }
+
+##
+# @DeviceHasNoMedium
+#
+# Since: 1.1.0
+##
+{ 'error': 'DeviceHasNoMedium',
+  'description': 'Device \'%(device)\' has no medium',
+  'data': {'device': 'str'} }
+
+##
+# @DeviceInitFailed
+#
+# Since: 0.14
+##
+{ 'error': 'DeviceInitFailed',
+  'description': 'Device \'%(device)\' could not be initialized',
+  'data': {'device': 'str'} }
+
+##
+# @DeviceInUse
+#
+# Since: 0.14
+##
+{ 'error': 'DeviceInUse',
+  'description': 'Device \'%(device)\' is in use',
+  'data': {'device': 'str'} }
+
+##
+# @DeviceIsReadOnly
+#
+# Since: 1.1.0
+##
+{ 'error': 'DeviceIsReadOnly',
+  'description': 'Device \'%(device)\' is read only',
+  'data': {'device': 'str'} }
+
+##
+# @DeviceLocked
+#
+# Since: 0.14
+##
+{ 'error': 'DeviceLocked',
+  'description': 'Device \'%(device)\' is locked',
+  'data': {'device': 'str'} }
+
+##
+# @DeviceMultipleBusses
+#
+# Since: 0.14
+##
+{ 'error': 'DeviceMultipleBusses',
+  'description': 'Device \'%(device)\' has multiple child busses',
+  'data': {'device': 'str'} }
+
+##
+# @DeviceNoBus
+#
+# Since: 0.14
+##
+{ 'error': 'DeviceNoBus',
+  'description': 'Device \'%(device)\' has no child bus',
+  'data': {'device': 'str'} }
+
+##
+# @DeviceNoHotplug
+#
+# Since: 0.14
+##
+{ 'error': 'DeviceNoHotplug',
+  'description': 'Device \'%(device)\' does not support hotplugging',
+  'data': {'device': 'str'} }
+
+##
+# @DeviceNotActive
+#
+# Since: 0.14
+##
+{ 'error': 'DeviceNotActive',
+  'description': 'Device \'%(device)\' has not been activated',
+  'data': {'device': 'str'} }
+
+##
+# @DeviceNotEncrypted
+#
+# Since: 0.14
+##
+{ 'error': 'DeviceNotEncrypted',
+  'description': 'Device \'%(device)\' is not encrypted',
+  'data': {'device': 'str'} }
+
+##
+# @DeviceNotFound
+#
+# Since: 0.14
+##
+{ 'error': 'DeviceNotFound',
+  'description': 'Device \'%(device)\' not found',
+  'data': {'device': 'str'} }
+
+##
+# @DeviceNotRemovable
+#
+# Since: 0.14
+##
+{ 'error': 'DeviceNotRemovable',
+  'description': 'Device \'%(device)\' is not removable',
+  'data': {'device': 'str'} }
+
+##
+# @DuplicateId
+#
+# Since: 0.14
+##
+{ 'error': 'DuplicateId',
+  'description': 'Duplicate ID \'%(id)\' for %(object)',
+  'data': {'id': 'str', 'object': 'str'} }
+
+##
+# @FdNotFound
+#
+# Since: 0.14
+#
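
Since each schema entry pairs a 'description' template with a 'data' dictionary, a quick consistency check is easy to sketch. The helper names below are hypothetical and not part of the qapi scripts; the two sample entries are copied from the schema above as Python dictionaries.

```python
import re

# Matches %(name) placeholders as used in the 'description' strings.
PLACEHOLDER = re.compile(r"%\((\w+)\)")


def undeclared_placeholders(entry):
    """Names used as %(name) in 'description' but missing from 'data'."""
    used = set(PLACEHOLDER.findall(entry.get("description", "")))
    return used - set(entry.get("data", {}))


def unused_data(entry):
    """Keys declared in 'data' that the description never references."""
    used = set(PLACEHOLDER.findall(entry.get("description", "")))
    return set(entry.get("data", {})) - used


ambiguous_path = {
    "error": "AmbiguousPath",
    "description": "Path '%(path)' does not uniquely identify a %(object)",
    "data": {"path": "str"},
}
device_encrypted = {
    "error": "DeviceEncrypted",
    "description": "Device '%(device)' is encrypted",
    "data": {"device": "str", "filename": "str"},
}

missing = undeclared_placeholders(ambiguous_path)
extra = unused_data(device_encrypted)
```

Run on these two entries, the check flags 'object' as used but undeclared in AmbiguousPath and 'filename' as declared but unused in DeviceEncrypted. Whether either is intentional is for the patch author to say.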

Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option

2012-07-18 Thread Vinod, Chegu

Thanks Eduardo !

Hi Anthony, if you are OK with this patch, could you please pull these changes
into upstream, or suggest who I should talk to in order to get them in?

Thanks!
Vinod

-----Original Message-----
From: Eduardo Habkost [mailto:ehabk...@redhat.com] 
Sent: Wednesday, July 18, 2012 10:15 AM
To: Vinod, Chegu
Cc: qemu-devel@nongnu.org; aligu...@us.ibm.com; k...@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's 
-numa option

On Mon, Jul 16, 2012 at 09:31:30PM -0700, Chegu Vinod wrote:
> Changes since v3:
>- using bitmap_set() instead of set_bit() in numa_add() routine.
>    - removed call to bitmap_zero() since bitmap_new() also zeroes the bitmap.
>- Rebased to the latest qemu.

Tested-by: Eduardo Habkost 
Reviewed-by: Eduardo Habkost 


> 
> Changes since v2:
>- Using "unsigned long *" for the node_cpumask[].
>- Use bitmap_new() instead of g_malloc0() for allocation.
>- Don't rely on "max_cpus" since it may not be initialized
>  before the numa related qemu options are parsed & processed.
> 
> Note: Continuing to use a new constant for allocation of
>   the mask (This constant is currently set to 255 since
>   with an 8bit APIC ID VCPUs can range from 0-254 in a
>   guest. The APIC ID 255 (0xFF) is reserved for broadcast).
> 
> Changes since v1:
> 
>    - Use bitmap functions that are already in qemu (instead
>      of cpu_set_t macros from sched.h)
>    - Added a check for endvalue >= max_cpus.
>    - Fix to address the round-robin assignment when
>      cpus are not explicitly specified.
> ---
> 
> v1:
> --
> 
> The -numa option to qemu is used to create [fake] numa nodes and 
> expose them to the guest OS instance.
> 
> There are a couple of issues with the -numa option:
> 
> a) Max VCPU's that can be specified for a guest while using
>the qemu's -numa option is 64. Due to a typecasting issue
>when the number of VCPUs is > 32 the VCPUs don't show up
>under the specified [fake] numa nodes.
> 
> b) KVM currently has support for 160 VCPUs per guest. The
>    qemu's -numa option only has support for up to 64 VCPUs
>    per guest.
> This patch addresses these two issues.
> 
> Below are examples of (a) and (b)
> 
> a) >32 VCPUs are specified with the -numa option:
> 
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> 71:01:01 \
> -net tap,ifname=tap0,script=no,downscript=no \ -vnc :4
> 
> ...
> Upstream qemu :
> --
> 
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59
> node 2 size: 131072 MB
> node 3 cpus: 30
> node 3 size: 131072 MB
> node 4 cpus:
> node 4 size: 131072 MB
> node 5 cpus: 31
> node 5 size: 131072 MB
> 
> With the patch applied :
> ---
> 
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 131072 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 131072 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 131072 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 131072 MB
> 
> b) >64 VCPUs specified with -numa option:
> 
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> -cpu 
> Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl
> ,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc :4
> 
> ...
> 
> Upstream qemu :
> --
> 
> only 63 CPUs in NUMA mode supported.
> only 64 CPUs in NUMA mode supported.
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 75 76 77 78 79
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61
> node 2 size: 65536 MB
> node 3 cpus: 30 62
> node 3 size: 65536 MB
> node 4 cpus:
> node 4 size: 65536 MB
> node 5 cpus:
> node 5 size: 65536 MB
> node 6 cpus: 31 63
> node 6 size: 65536 MB
> node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69
> node 7 size: 65536 MB
> 
> With the patch applied :
> ---
> 
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 65536 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 65536 MB
> node 4 cpus: 40 41 42 43 44 45

[Qemu-devel] [PATCH 5/9] qapi: qapi.py: allow the "'" character be escaped

2012-07-18 Thread Luiz Capitulino

A future commit will add a new qapi script which escapes that character.

Signed-off-by: Luiz Capitulino 
---
 scripts/qapi.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/scripts/qapi.py b/scripts/qapi.py
index e062336..9aa518f 100644
--- a/scripts/qapi.py
+++ b/scripts/qapi.py
@@ -21,7 +21,9 @@ def tokenize(data):
 elif data[0] == "'":
 data = data[1:]
 string = ''
-while data[0] != "'":
+while True:
+if data[0] == "'" and string[len(string)-1] != "\\":
+break
 string += data[0]
 data = data[1:]
 data = data[1:]
-- 
1.7.11.2.249.g31c7954.dirty
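
The escape rule in the hunk above can be tried outside of qapi.py. What follows is a minimal sketch, not the real tokenize(): `read_quoted` is a hypothetical stand-alone helper that consumes one single-quoted string. It uses `endswith()` where the patch uses `string[len(string)-1]`, because indexing an empty `string` (as for the input `''`) would raise IndexError. Like the patch, it keeps the backslash in the result and does not handle an escaped backslash directly before the closing quote.

```python
def read_quoted(data):
    """Consume one single-quoted string from the front of data.

    Hypothetical stand-alone helper, not part of scripts/qapi.py.
    Returns (string, rest_of_data).
    """
    assert data[0] == "'"
    data = data[1:]
    string = ''
    while True:
        # A quote ends the string unless the previous character is a
        # backslash. endswith() is safe when string is still empty.
        if data[0] == "'" and not string.endswith("\\"):
            break
        string += data[0]
        data = data[1:]
    # Skip the closing quote.
    return string, data[1:]
```

With this helper, `read_quoted("'can\\'t'")` yields the string `can\'t` (backslash included) and an empty remainder.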

[Qemu-devel] [PATCH 8/9] qerror: switch to qapi generated error macros and table

2012-07-18 Thread Luiz Capitulino

Previous commits added qapi infrastructure to automatically generate
qerror macros and the qerror table from qapi-schema-errors.json.

This commit drops the current error macros from qerror.h and the error
table from qerror.c and uses the generated ones instead.

Please note that qapi-errors.c is actually _included_ by qerror.c.
This is hacky, but the alternative is to make the table private to
qapi-errors.c and generate functions to return table entries. I don't think
that would pay off.

Signed-off-by: Luiz Capitulino 
---
 qerror.c | 310 +--
 qerror.h | 220 +
 2 files changed, 2 insertions(+), 528 deletions(-)

diff --git a/qerror.c b/qerror.c
index e09c410..ec4ceb8 100644
--- a/qerror.c
+++ b/qerror.c
@@ -14,6 +14,7 @@
 #include "qjson.h"
 #include "qerror.h"
 #include "qemu-common.h"
+#include "qapi-errors.c"
 
 static void qerror_destroy_obj(QObject *obj);
 
@@ -23,315 +24,6 @@ static const QType qerror_type = {
 };
 
 /**
- * The 'desc' parameter is a printf-like string, the format of the format
- * string is:
- *
- * %(KEY)
- *
- * Where KEY is a QDict key, which has to be passed to qerror_from_info().
- *
- * Example:
- *
- * "foo error on device: %(device) slot: %(slot_nr)"
- *
- * A single percent sign can be printed if followed by a second one,
- * for example:
- *
- * "running out of foo: %(foo)%%"
- *
- * Please keep the entries in alphabetical order.
- * Use scripts/check-qerror.sh to check.
- */
-static const QErrorStringTable qerror_table[] = {
-{
-.error_fmt = QERR_ADD_CLIENT_FAILED,
-.desc  = "Could not add client",
-},
-{
-.error_fmt = QERR_AMBIGUOUS_PATH,
-.desc  = "Path '%(path)' does not uniquely identify a %(object)"
-},
-{
-.error_fmt = QERR_BAD_BUS_FOR_DEVICE,
-.desc  = "Device '%(device)' can't go on a %(bad_bus_type) bus",
-},
-{
-.error_fmt = QERR_BASE_NOT_FOUND,
-.desc  = "Base '%(base)' not found",
-},
-{
-.error_fmt = QERR_BLOCK_FORMAT_FEATURE_NOT_SUPPORTED,
-.desc  = "Block format '%(format)' used by device '%(name)' does not support feature '%(feature)'",
-},
-{
-.error_fmt = QERR_BUS_NO_HOTPLUG,
-.desc  = "Bus '%(bus)' does not support hotplugging",
-},
-{
-.error_fmt = QERR_BUS_NOT_FOUND,
-.desc  = "Bus '%(bus)' not found",
-},
-{
-.error_fmt = QERR_COMMAND_DISABLED,
-.desc  = "The command %(name) has been disabled for this instance",
-},
-{
-.error_fmt = QERR_COMMAND_NOT_FOUND,
-.desc  = "The command %(name) has not been found",
-},
-{
-.error_fmt = QERR_DEVICE_ENCRYPTED,
-.desc  = "Device '%(device)' is encrypted",
-},
-{
-.error_fmt = QERR_DEVICE_FEATURE_BLOCKS_MIGRATION,
-.desc  = "Migration is disabled when using feature '%(feature)' in device '%(device)'",
-},
-{
-.error_fmt = QERR_DEVICE_HAS_NO_MEDIUM,
-.desc  = "Device '%(device)' has no medium",
-},
-{
-.error_fmt = QERR_DEVICE_INIT_FAILED,
-.desc  = "Device '%(device)' could not be initialized",
-},
-{
-.error_fmt = QERR_DEVICE_IN_USE,
-.desc  = "Device '%(device)' is in use",
-},
-{
-.error_fmt = QERR_DEVICE_IS_READ_ONLY,
-.desc  = "Device '%(device)' is read only",
-},
-{
-.error_fmt = QERR_DEVICE_LOCKED,
-.desc  = "Device '%(device)' is locked",
-},
-{
-.error_fmt = QERR_DEVICE_MULTIPLE_BUSSES,
-.desc  = "Device '%(device)' has multiple child busses",
-},
-{
-.error_fmt = QERR_DEVICE_NO_BUS,
-.desc  = "Device '%(device)' has no child bus",
-},
-{
-.error_fmt = QERR_DEVICE_NO_HOTPLUG,
-.desc  = "Device '%(device)' does not support hotplugging",
-},
-{
-.error_fmt = QERR_DEVICE_NOT_ACTIVE,
-.desc  = "Device '%(device)' has not been activated",
-},
-{
-.error_fmt = QERR_DEVICE_NOT_ENCRYPTED,
-.desc  = "Device '%(device)' is not encrypted",
-},
-{
-.error_fmt = QERR_DEVICE_NOT_FOUND,
-.desc  = "Device '%(device)' not found",
-},
-{
-.error_fmt = QERR_DEVICE_NOT_REMOVABLE,
-.desc  = "Device '%(device)' is not removable",
-},
-{
-.error_fmt = QERR_DUPLICATE_ID,
-.desc  = "Duplicate ID '%(id)' for %(object)",
-},
-{
-.error_fmt = QERR_FD_NOT_FOUND,
-.desc  = "File descriptor named '%(name)' not found",
-},
-{
-.error_fmt = QERR_FD_NOT_SUPPLIED,
-.desc  = "No file descriptor supplied via SCM_RIGHTS",
-},
-{
-.error_fmt = QERR_FEATURE_DISABLED,
-.desc  = "The feature '%(name)'

Re: [Qemu-devel] [RFC v9 06/27] virtio-blk: Take PCI memory range into account

2012-07-18 Thread Michael S. Tsirkin

On Wed, Jul 18, 2012 at 04:07:33PM +0100, Stefan Hajnoczi wrote:
> Support >4 GB physical memory accesses.
> 
> Signed-off-by: Stefan Hajnoczi 

Need some sane APIs, this is just too scary.

> ---
>  hw/virtio-blk.c |7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
> index abd9386..99654f1 100644
> --- a/hw/virtio-blk.c
> +++ b/hw/virtio-blk.c
> @@ -64,6 +64,13 @@ static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
>   */
>  static inline void *phys_to_host(VirtIOBlock *s, target_phys_addr_t phys)
>  {
> +/* Adjust for 3.6-4 GB PCI memory range */
> +if (phys >= 0x1) {
> +phys -= 0x1 - 0xe000;
> +} else if (phys >= 0xe000) {
> +fprintf(stderr, "phys_to_host bad physical address in PCI range %#lx\n", phys);
> +exit(1);
> +}
>  return s->phys_mem_zero_host_ptr + phys;
>  }
>  
> -- 
> 1.7.10.4

Re: [Qemu-devel] [RFC v9 11/27] virtio-blk: Indirect vring and flush support

2012-07-18 Thread Michael S. Tsirkin

On Wed, Jul 18, 2012 at 04:07:38PM +0100, Stefan Hajnoczi wrote:
> RHEL6 and other new guest kernels use indirect vring descriptors to
> increase the number of requests that can be batched.  This fundamentally
> changes vring from a scheme that requires fixed resources to something
> more dynamic (although there is still an absolute maximum number of
> descriptors).  Cope with indirect vrings by taking on as many requests
> as we can in one go and then postponing the remaining requests until the
> first batch completes.
> 
> It would be possible to switch to dynamic resource management so iovec
> and iocb structs are malloced.  This would allow the entire ring to be
> processed even with indirect descriptors, but would probably hit a
> bottleneck when io_submit refuses to queue more requests.  Therefore,
> stick with the simpler scheme for now.
> 
> Unfortunately Linux AIO does not support asynchronous fsync/fdatasync on
> all files.  In particular, an O_DIRECT opened file on ext4 does not
> support Linux AIO fdsync.  Work around this by performing fdatasync()
> synchronously for now.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/dataplane/ioq.h   |   18 -
>  hw/dataplane/vring.h |  103 +++---
>  hw/virtio-blk.c  |   75 ++--
>  3 files changed, 144 insertions(+), 52 deletions(-)
> 
> diff --git a/hw/dataplane/ioq.h b/hw/dataplane/ioq.h
> index 7200e87..d1545d6 100644
> --- a/hw/dataplane/ioq.h
> +++ b/hw/dataplane/ioq.h
> @@ -3,7 +3,7 @@
>  
>  typedef struct {
>  int fd; /* file descriptor */
> -unsigned int max_reqs;   /* max length of freelist and queue */
> +unsigned int max_reqs;  /* max length of freelist and queue */
>  
>  io_context_t io_ctx;/* Linux AIO context */
>  EventNotifier io_notifier;  /* Linux AIO eventfd */
> @@ -91,18 +91,16 @@ static struct iocb *ioq_rdwr(IOQueue *ioq, bool read, struct iovec *iov, unsigne
>  return iocb;
>  }
>  
> -static struct iocb *ioq_fdsync(IOQueue *ioq)
> -{
> -struct iocb *iocb = ioq_get_iocb(ioq);
> -
> -io_prep_fdsync(iocb, ioq->fd);
> -io_set_eventfd(iocb, event_notifier_get_fd(&ioq->io_notifier));
> -return iocb;
> -}
> -
>  static int ioq_submit(IOQueue *ioq)
>  {
>  int rc = io_submit(ioq->io_ctx, ioq->queue_idx, ioq->queue);
> +if (unlikely(rc < 0)) {
> +unsigned int i;
> +fprintf(stderr, "io_submit io_ctx=%#lx nr=%d iovecs=%p\n", (uint64_t)ioq->io_ctx, ioq->queue_idx, ioq->queue);
> +for (i = 0; i < ioq->queue_idx; i++) {
> +fprintf(stderr, "[%u] type=%#x fd=%d\n", i, ioq->queue[i]->aio_lio_opcode, ioq->queue[i]->aio_fildes);
> +}
> +}
>  ioq->queue_idx = 0; /* reset */
>  return rc;
>  }
> diff --git a/hw/dataplane/vring.h b/hw/dataplane/vring.h
> index 70675e5..3eab4b4 100644
> --- a/hw/dataplane/vring.h
> +++ b/hw/dataplane/vring.h
> @@ -64,6 +64,86 @@ static void vring_setup(Vring *vring, VirtIODevice *vdev, int n)
>  vring->vr.desc, vring->vr.avail, vring->vr.used);
>  }
>  
> +static bool vring_more_avail(Vring *vring)
> +{
> + return vring->vr.avail->idx != vring->last_avail_idx;
> +}
> +
> +/* This is stolen from linux-2.6/drivers/vhost/vhost.c. */
> +static bool get_indirect(Vring *vring,
> + struct iovec iov[], struct iovec *iov_end,
> + unsigned int *out_num, unsigned int *in_num,
> + struct vring_desc *indirect)
> +{
> + struct vring_desc desc;
> + unsigned int i = 0, count, found = 0;
> +
> + /* Sanity check */
> + if (unlikely(indirect->len % sizeof desc)) {
> + fprintf(stderr, "Invalid length in indirect descriptor: "
> +"len 0x%llx not multiple of 0x%zx\n",
> +(unsigned long long)indirect->len,
> +sizeof desc);
> + exit(1);
> + }
> +
> + count = indirect->len / sizeof desc;
> + /* Buffers are chained via a 16 bit next field, so
> +  * we can have at most 2^16 of these. */
> + if (unlikely(count > USHRT_MAX + 1)) {
> + fprintf(stderr, "Indirect buffer length too big: %d\n",
> +indirect->len);
> +exit(1);
> + }
> +
> +/* Point to translate indirect desc chain */
> +indirect = phys_to_host(vring, indirect->addr);
> +
> + /* We will use the result as an address to read from, so most
> +  * architectures only need a compiler barrier here. */
> + __sync_synchronize(); /* read_barrier_depends(); */


qemu has its own barriers now, pls use them.

> +
> + do {
> + if (unlikely(++found > count)) {
> + fprintf(stderr, "Loop detected: last one at %u "
> +"indirect size %u\n",
> +i, count);
> + exit(1);
> + }

Re: [Qemu-devel] [PATCH v4 5/6] qapi: convert sendkey

2012-07-18 Thread Amos Kong

- Original Message -
> On Wed, 18 Jul 2012 20:56:54 +0800
> Amos Kong  wrote:
> 
> > >> +} KeyDef;
> > >> +
> > >> +static const KeyDef key_defs[] = {
> > >
> > > We can't have an array defined in a header file because it will be
> > > defined in each .c file that includes it.
> > >
> > > Please, define it in input.c (along with qmp_send_key())
> > 
> > Ok.
> > 
> > > and write the following public functions:
> > >
> > >   o KeyCode keycode_from_key(const char *key);
> > >   o KeyCode keycode_from_code(int code);
> > 
> > 
> > void qmp_send_key(KeyCodesList *keys, bool has_hold_time, int64_t hold_time, ...)
> >
> > When we use qmp, a key list will be passed; the values are the index
> > in enum KeyCodes, not the real KeyCode.
> 
> Right.
> 
> > 
> >  { 'enum': 'KeyCodes',
> >'data': [ 'shift', 'shift_r', 'al...
> > 
> > So we need to get this kind of 'index' in hmp_send_key() and pass it to
> > qmp_send_key().
> 
> Yes, that's what keycode_from_key() would do, something like this:
> 
> KeyCode keycode_from_key(const char *key)
> {
> int i;
> 
> for (i = 0; i < KEY_CODES_MAX; i++) {
> if (!strcmp(key, KeyCodes_lookup[i])) {
> return i;
> }
> }
> 
> return KEY_CODES_MAX;
> }
> 
> Note that it returns the KeyCode index, and should be defined in input.c.

Sure :)
 
> > then convert this 'index' to keycode in qmp_send_key()
> 
> Exactly, qmp_send_key() can access key_defs[] to get the keycode from the
> index.
> 
> > 
> > I didn't find a way to define a non-serial enum.
> 
> I'm not sure I follow you here, I think that what I suggested above will
> work.

So I will continue to pass the 'index' to qmp_send_key().
I have already implemented these locally, and they all work.
I will fix the other issues and post v5 later, thanks.


> > eg: (then int qmp_marshal_input_send_key() would pass the real keycode
> > to qmp_send_key())
> > { 'enum': 'KeyCodes',
> >'data': [ 'shift' = 0x2a, 'shift_r' = 0x36, 'alt' = 0x38, ...
> > 
> > 
> > If we still pass 'index' to qmp_send_key as patch V4.
> > 
> > extern int index_from_key(const char *key);   -> it's used in hmp_send_key()
> > extern int index_from_keycode(int code);      -> it's used in hmp_send_key()
> > extern char *key_from_keycode(int idx);       -> it's used in monitor_find_completion()
> > extern int keycode_from_key(const char *key); -> it's used in qmp_send_key()
> > 
> > 
> > > and then use these functions where using key_defs would be necessary.
> > > Also, note that keycode_from_key() can use KeyCodes_lookup[] instead of
> > > key_defs (this way we can drop 'name' from KeyDef).
> > 
> > 
> > 
> > >> +#endif
> > >> +#endif
> > >> +[KEY_CODES_MAX] = { 0, NULL },
> > >> +};
> > >> +
> > >>   #endif
> > >> diff --git a/hmp-commands.hx b/hmp-commands.hx
> > >> index e336251..865eea9 100644
> > >> --- a/hmp-commands.hx
> > >> +++ b/hmp-commands.hx
> > >> @@ -505,7 +505,7 @@ ETEXI
> > >>   .args_type  = "keys:s,hold-time:i?",
> > >>   .params = "keys [hold_ms]",
> > >>   .help   = "send keys to the VM (e.g. 'sendkey ctrl-alt-f1', default hold time=100 ms)",
> > >> -.mhandler.cmd = do_sendkey,
> > >> +.mhandler.cmd = hmp_send_key,
> > >>   },
> > >>
> > >>   STEXI
> > >> diff --git a/hmp.c b/hmp.c
> > >> index b9cec1d..cfdc106 100644
> > >> --- a/hmp.c
> > >> +++ b/hmp.c
> > >> @@ -19,6 +19,7 @@
> > >>   #include "qemu-timer.h"
> > >>   #include "qmp-commands.h"
> > >>   #include "monitor.h"
> > >> +#include "console.h"
> > >>
> > >>   static void hmp_handle_error(Monitor *mon, Error **errp)
> > >>   {
> > >> @@ -1000,3 +1001,66 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict)
> > >>   qmp_netdev_del(id, &err);
> > >>   hmp_handle_error(mon, &err);
> > >>   }
> > >> +
> > >> +static int get_key_index(const char *key)
> > >> +{
> > >> +int i, keycode;
> > >> +char *endp;
> > >> +
> > >> +for (i = 0; i < KEY_CODES_MAX; i++)
> > >> +if (key_defs[i].keycode && !strcmp(key, key_defs[i].name))
> > >> +return i;
> > >
> > > Here you can do:
> > >
> > >keycode = keycode_from_key(key);
> > >if (keycode != KEY_CODES_MAX) {
> > >   return keycode;
> > >}
> > >
> > >> +
> > >> +if (strstart(key, "0x", NULL)) {
> > >> +keycode = strtoul(key, &endp, 0);
> > >> +if (*endp == '\0' && keycode >= 0x01 && keycode <= 0xff)
> > >> +for (i = 0; i < KEY_CODES_MAX; i++)
> > >> +if (keycode == key_defs[i].keycode)
> > >> +return i;
> > >
> > > You can drop that for loop and do instead:
> > >
> > >keycode = keycode_from_code(keycode);
> > >
> > >
> > >> +}
> > >> +
> > >> +return -1;
> > >> +}
> > >> +
> > >> +void hmp_send_key(Moni

[Qemu-devel] [PATCH 3/9] qerror: rename QERR_QMP_EXTRA_MEMBER

2012-07-18 Thread Luiz Capitulino

The error class name is QMPExtraInputObjectMember, not QMPExtraMember.

Rename the QERR_QMP_EXTRA_MEMBER macro to QERR_QMP_EXTRA_INPUT_OBJECT_MEMBER
to reflect that, so that future error macro generation generates the
expected macro name.

Signed-off-by: Luiz Capitulino 
---
 monitor.c| 2 +-
 qapi/qmp-dispatch.c  | 2 +-
 qapi/qmp-input-visitor.c | 2 +-
 qerror.c | 2 +-
 qerror.h | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/monitor.c b/monitor.c
index 188c03d..8427c6c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4434,7 +4434,7 @@ static QDict *qmp_check_input_obj(QObject *input_obj)
 } else if (!strcmp(arg_name, "id")) {
 /* FIXME: check duplicated IDs for async commands */
 } else {
-qerror_report(QERR_QMP_EXTRA_MEMBER, arg_name);
+qerror_report(QERR_QMP_EXTRA_INPUT_OBJECT_MEMBER, arg_name);
 return NULL;
 }
 }
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 122c1a2..29d6f30 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -47,7 +47,7 @@ static QDict *qmp_dispatch_check_obj(const QObject *request, Error **errp)
 }
 has_exec_key = true;
 } else if (strcmp(arg_name, "arguments")) {
-error_set(errp, QERR_QMP_EXTRA_MEMBER, arg_name);
+error_set(errp, QERR_QMP_EXTRA_INPUT_OBJECT_MEMBER, arg_name);
 return NULL;
 }
 }
diff --git a/qapi/qmp-input-visitor.c b/qapi/qmp-input-visitor.c
index 107d8d3..a59d4f6 100644
--- a/qapi/qmp-input-visitor.c
+++ b/qapi/qmp-input-visitor.c
@@ -104,7 +104,7 @@ static void qmp_input_pop(QmpInputVisitor *qiv, Error **errp)
 if (g_hash_table_size(top_ht)) {
 const char *key;
 g_hash_table_find(top_ht, always_true, &key);
-error_set(errp, QERR_QMP_EXTRA_MEMBER, key);
+error_set(errp, QERR_QMP_EXTRA_INPUT_OBJECT_MEMBER, key);
 }
 g_hash_table_unref(top_ht);
 }
diff --git a/qerror.c b/qerror.c
index a9d771b..132ab2d 100644
--- a/qerror.c
+++ b/qerror.c
@@ -271,7 +271,7 @@ static const QErrorStringTable qerror_table[] = {
 .desc  = "QMP input object member '%(member)' expects '%(expected)'",
 },
 {
-.error_fmt = QERR_QMP_EXTRA_MEMBER,
+.error_fmt = QERR_QMP_EXTRA_INPUT_OBJECT_MEMBER,
 .desc  = "QMP input object member '%(member)' is unexpected",
 },
 {
diff --git a/qerror.h b/qerror.h
index 69d417d..27ce395 100644
--- a/qerror.h
+++ b/qerror.h
@@ -224,7 +224,7 @@ QError *qobject_to_qerror(const QObject *obj);
 #define QERR_QMP_BAD_INPUT_OBJECT_MEMBER \
 "{ 'class': 'QMPBadInputObjectMember', 'data': { 'member': %s, 'expected': 
%s } }"
 
-#define QERR_QMP_EXTRA_MEMBER \
+#define QERR_QMP_EXTRA_INPUT_OBJECT_MEMBER \
 "{ 'class': 'QMPExtraInputObjectMember', 'data': { 'member': %s } }"
 
 #define QERR_RESET_REQUIRED \
-- 
1.7.11.2.249.g31c7954.dirty

[Qemu-devel] [PATCH V4] exynos4210: add Exynos4210 i2c implementation

2012-07-18 Thread Igor Mitsyanko

Create 9 exynos4210 i2c interfaces.

Signed-off-by: Igor Mitsyanko 
Reviewed-by: Andreas Färber 
---
Previous versions of this patch were sent to the mailing list within the
"Exynos: i2c, gpio and touchscreen support for NURI board" patchset some time
ago. V4 is merely a rebased version of V3.

Change list:

v3->v4
- rebased to current master.
- "exynos4210_i2c_info" renamed to "exynos4210_i2c_type_info".

v2->v3
- static TypeInfos are made const.
- added spaces after "do {".
- "All rights reserved" sentence is dropped from license.
- names are fixed according to conventions.
- introduced scl_free member of exynos4210 i2c state. As it turned out, real
  hardware generates a level-triggered interrupt while the scl line is kept
  low; the scl_free member models this behaviour. This fixes a bug where i2c
  received an extra data byte.
- exynos4210 i2c slave device dropped.
- added missing i2caddr in vmstate.
- timers are dropped.
- debug array containing i2c register names replaced with a function.

v1->v2
- Added indentation for second and subsequent lines of multi-line macro
  definitions;
- Weird big spaces after .field members of VMStateDescriptions are replaced
  with a single space;

 hw/arm/Makefile.objs |2 +-
 hw/exynos4210.c  |   27 
 hw/exynos4210.h  |3 +
 hw/exynos4210_i2c.c  |  334 ++
 4 files changed, 365 insertions(+), 1 deletions(-)
 create mode 100644 hw/exynos4210_i2c.c

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 236786e..c413780 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -11,7 +11,7 @@ obj-y += realview_gic.o realview.o arm_sysctl.o arm11mpcore.o a9mpcore.o
 obj-y += exynos4210_gic.o exynos4210_combiner.o exynos4210.o
 obj-y += exynos4_boards.o exynos4210_uart.o exynos4210_pwm.o
 obj-y += exynos4210_pmu.o exynos4210_mct.o exynos4210_fimd.o
-obj-y += exynos4210_rtc.o
+obj-y += exynos4210_rtc.o exynos4210_i2c.o
 obj-y += arm_l2x0.o
 obj-y += arm_mptimer.o a15mpcore.o
 obj-y += armv7m.o armv7m_nvic.o stellaris.o pl022.o stellaris_enet.o
diff --git a/hw/exynos4210.c b/hw/exynos4210.c
index 7c58c90..00d4db8 100644
--- a/hw/exynos4210.c
+++ b/hw/exynos4210.c
@@ -39,6 +39,13 @@
 /* MCT */
 #define EXYNOS4210_MCT_BASE_ADDR   0x1005
 
+/* I2C */
+#define EXYNOS4210_I2C_SHIFT   0x0001
+#define EXYNOS4210_I2C_BASE_ADDR   0x1386
+/* Interrupt Group of External Interrupt Combiner for I2C */
+#define EXYNOS4210_I2C_INTG    27
+#define EXYNOS4210_HDMI_INTG   16
+
 /* UART's definitions */
 #define EXYNOS4210_UART0_BASE_ADDR 0x1380
 #define EXYNOS4210_UART1_BASE_ADDR 0x1381
@@ -283,6 +290,26 @@ Exynos4210State *exynos4210_init(MemoryRegion *system_mem,
 s->irq_table[exynos4210_get_irq(35, 3)]);
 sysbus_mmio_map(busdev, 0, EXYNOS4210_MCT_BASE_ADDR);
 
+/*** I2C ***/
+for (n = 0; n < EXYNOS4210_I2C_NUMBER; n++) {
+uint32_t addr = EXYNOS4210_I2C_BASE_ADDR + EXYNOS4210_I2C_SHIFT * n;
+qemu_irq i2c_irq;
+
+if (n < 8) {
+i2c_irq = s->irq_table[exynos4210_get_irq(EXYNOS4210_I2C_INTG, n)];
+} else {
+i2c_irq = s->irq_table[exynos4210_get_irq(EXYNOS4210_HDMI_INTG, 1)];
+}
+
+dev = qdev_create(NULL, "exynos4210.i2c");
+qdev_init_nofail(dev);
+busdev = sysbus_from_qdev(dev);
+sysbus_connect_irq(busdev, 0, i2c_irq);
+sysbus_mmio_map(busdev, 0, addr);
+s->i2c_if[n] = (i2c_bus *)qdev_get_child_bus(dev, "i2c");
+}
+
+
 /*** UARTs ***/
 exynos4210_uart_create(EXYNOS4210_UART0_BASE_ADDR,
EXYNOS4210_UART0_FIFO_SIZE, 0, NULL,
diff --git a/hw/exynos4210.h b/hw/exynos4210.h
index 9b1ae4c..a43ba3a 100644
--- a/hw/exynos4210.h
+++ b/hw/exynos4210.h
@@ -74,6 +74,8 @@
 #define EXYNOS4210_EXT_GIC_NIRQ (160-32)
 #define EXYNOS4210_INT_GIC_NIRQ 64
 
+#define EXYNOS4210_I2C_NUMBER   9
+
 typedef struct Exynos4210Irq {
 qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
@@ -95,6 +97,7 @@ typedef struct Exynos4210State {
 MemoryRegion dram1_mem;
 MemoryRegion boot_secondary;
 MemoryRegion bootreg_mem;
+i2c_bus *i2c_if[EXYNOS4210_I2C_NUMBER];
 } Exynos4210State;
 
 void exynos4210_write_secondary(ARMCPU *cpu,
diff --git a/hw/exynos4210_i2c.c b/hw/exynos4210_i2c.c
new file mode 100644
index 000..3f72a5c
--- /dev/null
+++ b/hw/exynos4210_i2c.c
@@ -0,0 +1,334 @@
+/*
+ *  Exynos4210 I2C Bus Serial Interface Emulation
+ *
+ *  Copyright (C) 2012 Samsung Electronics Co Ltd.
+ *Maksim Kozlov, 
+ *Igor Mitsyanko, 
+ *
+ *  This program is free software; you can redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the
+ *  Free Software Foundation; either version 2 of the L

[Qemu-devel] [PATCH 9/9] scripts: update check-qerror.sh

2012-07-18 Thread Luiz Capitulino

The qerror.h file doesn't contain the macros anymore, so the script should
check qapi-schema-errors.json instead.

Signed-off-by: Luiz Capitulino 
---
 scripts/check-qerror.sh | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/scripts/check-qerror.sh b/scripts/check-qerror.sh
index af7fbd5..e397b4f 100755
--- a/scripts/check-qerror.sh
+++ b/scripts/check-qerror.sh
@@ -16,7 +16,5 @@ check_order() {
   return 0
 }
 
-check_order 'Definitions in qerror.h must be in alphabetical order:' \
-grep '^#define QERR_' qerror.h
-check_order 'Entries in qerror.c:qerror_table must be in alphabetical order:' \
-sed -n '/^static.*qerror_table\[\]/,/^};/s/QERR_/&/gp' qerror.c
+check_order 'Definitions must be in alphabetical order:' \
+grep '^# @' qapi-schema-errors.json
-- 
1.7.11.2.249.g31c7954.dirty
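
The ordering check that the updated script applies to the `# @` lines can be approximated in a few lines. This is a rough sketch under stated assumptions: the body of check_order() is not shown in the diff, so it assumes the function flags entries that are out of alphabetical order, and it compares lines with plain Python string ordering, which may differ from the shell's locale-aware sort. `out_of_order` is a hypothetical name.

```python
def out_of_order(schema_text):
    """Return adjacent pairs of '# @' lines that are not sorted.

    Hypothetical helper; mirrors the grep '^# @' selection from the
    patch, but the comparison rule is an assumption.
    """
    entries = [line for line in schema_text.splitlines()
               if line.startswith('# @')]
    # Compare every entry with its successor.
    return [(a, b) for a, b in zip(entries, entries[1:]) if a > b]
```

An empty result means every selected line is in order; any returned pair points at the first line that should move down.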

[Qemu-devel] [PATCH 7/9] qapi: add qapi-errors.py

2012-07-18 Thread Luiz Capitulino

This script generates two files from qapi-schema-errors.json:

 o qapi-errors.h: contains error macro definitions, eg. QERR_BASE_NOT_FOUND,
  corresponds to most of today's qerror.h

 o qapi-errors.c: contains the error table that currently exists in qerror.c

The script is not used yet though, that's going to be done by the next
commit.

Signed-off-by: Luiz Capitulino 
---
 Makefile   |   8 ++-
 scripts/qapi-errors.py | 180 +
 2 files changed, 186 insertions(+), 2 deletions(-)
 create mode 100644 scripts/qapi-errors.py

diff --git a/Makefile b/Makefile
index ab82ef3..2cdc732 100644
--- a/Makefile
+++ b/Makefile
@@ -22,8 +22,9 @@ GENERATED_HEADERS = config-host.h trace.h qemu-options.def
 ifeq ($(TRACE_BACKEND),dtrace)
 GENERATED_HEADERS += trace-dtrace.h
 endif
-GENERATED_HEADERS += qmp-commands.h qapi-types.h qapi-visit.h
-GENERATED_SOURCES += qmp-marshal.c qapi-types.c qapi-visit.c trace.c
+GENERATED_HEADERS += qmp-commands.h qapi-types.h qapi-visit.h qapi-errors.h
+GENERATED_SOURCES += qmp-marshal.c qapi-types.c qapi-visit.c qapi-errors.c \
+ trace.c
 
 # Don't try to regenerate Makefile or configure
 # We don't generate any of them
@@ -200,6 +201,9 @@ $(SRC_PATH)/qapi-schema.json $(SRC_PATH)/scripts/qapi-visit.py
 qmp-commands.h qmp-marshal.c :\
 $(SRC_PATH)/qapi-schema.json $(SRC_PATH)/scripts/qapi-commands.py
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-commands.py 
$(gen-out-type) -m -o "." < $<, "  GEN   $@")
+qapi-errors.h qapi-errors.c :\
+$(SRC_PATH)/qapi-schema-errors.json $(SRC_PATH)/scripts/qapi-errors.py
+   $(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-errors.py -o 
"." < $<, "  GEN   $@")
 
 QGALIB_OBJ=$(addprefix qapi-generated/, qga-qapi-types.o qga-qapi-visit.o qga-qmp-marshal.o)
 QGALIB_GEN=$(addprefix qapi-generated/, qga-qapi-types.h qga-qapi-visit.h qga-qmp-commands.h)
diff --git a/scripts/qapi-errors.py b/scripts/qapi-errors.py
new file mode 100644
index 000..eb86b8e
--- /dev/null
+++ b/scripts/qapi-errors.py
@@ -0,0 +1,180 @@
+#
+# QAPI errors generator
+#
+# Copyright (C) 2012 Red Hat, Inc.
+#
+# This work is licensed under the terms of the GNU GPLv2.
+# See the COPYING.LIB file in the top-level directory.
+
+from ordereddict import OrderedDict
+import getopt, sys, os, errno
+from qapi import *
+
+def gen_error_decl_prologue(header, guard, prefix=""):
+ret = mcgen('''
+/* THIS FILE IS AUTOMATICALLY GENERATED, DO NOT MODIFY */
+
+/*
+ * schema-defined QAPI Errors
+ *
+ * Copyright (C) 2012 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef %(guard)s
+#define %(guard)s
+
+''',
+ header=basename(header), guard=guardname(header), prefix=prefix)
+return ret
+
+def gen_error_def_prologue(error_header, prefix=""):
+ret = mcgen('''
+/* THIS FILE IS AUTOMATICALLY GENERATED, DO NOT MODIFY */
+
+/*
+ * schema-defined QMP Error table
+ *
+ * Copyright (C) 2012 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "%(prefix)s%(error_header)s"
+
+''',
+prefix=prefix, error_header=error_header)
+return ret
+
+def gen_error_def_table_init():
+ret = mcgen('''
+static const QErrorStringTable qerror_table[] = {
+''')
+return ret
+
+def gen_error_macro(err_domain):
+string = ''
+cur = err_domain[0]
+for nxt in err_domain[1:]:
+if string and cur.isupper() and nxt.islower():
+string += '_'
+string += cur
+cur = nxt
+string += cur
+return 'QERR_' + string.upper()
+
+def gen_error_def_table_entry(name, desc):
+err_macro = gen_error_macro(name)
+ret = mcgen('''
+{
+.error_fmt = %(error_macro)s,
+.desc  = "%(error_desc)s",
+},
+''',
+error_macro=err_macro, error_desc=desc)
+return ret
+
+def gen_error_def_table_close():
+ret = mcgen('''
+{}
+};
+''')
+return ret
+
+def maybe_open(really, name, opt):
+if really:
+return open(name, opt)
+else:
+import StringIO
+return StringIO.StringIO()
+
+def gen_error_decl_macro(name, data):
+colon = ''
+data_str = ''
+for k, v in data.items():
+data_str += colon + "'%s': " % k
+if v == 'str':
+data_str += "%s"
+elif v == 'int':
+data_str += '%"PRId64"'
+else:
+sys.exit("unknown data type '%s' for error '%s'" % (v, name))
+colon = ', '
+
+err_macro = gen_error_macro(name)
+
+ret = mcgen('''
+#define %(error_macro)s \\
+"{ 'class': '%(error_class)s', 'data': { %(error_data)s } }"
+
+''',
+error_macro=err_macro, error_class=name, error_data=data_str)
+return ret
+
+if
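
The name mangling done by gen_error_macro() is what drives the macro renames in the other patches of this series. The block below is a self-contained copy of that function as it appears in the diff, followed by a few class names taken from the series; only the comments are added.

```python
def gen_error_macro(err_domain):
    # Walk the class name one character at a time, looking one
    # character ahead.
    string = ''
    cur = err_domain[0]
    for nxt in err_domain[1:]:
        # Start a new word before an upper-case letter that is
        # followed by a lower-case one (but not at the very start).
        if string and cur.isupper() and nxt.islower():
            string += '_'
        string += cur
        cur = nxt
    string += cur
    return 'QERR_' + string.upper()


for name in ('BaseNotFound', 'QMPExtraInputObjectMember',
             'PropertyValueNotPowerOf2', 'SockConnectInprogress'):
    print(gen_error_macro(name))
# prints:
# QERR_BASE_NOT_FOUND
# QERR_QMP_EXTRA_INPUT_OBJECT_MEMBER
# QERR_PROPERTY_VALUE_NOT_POWER_OF2
# QERR_SOCK_CONNECT_INPROGRESS
```

A digit never starts a new word and a run of capitals such as `QMP` stays together, which is why `Of2` becomes `OF2` rather than `OF_2`.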

[Qemu-devel] [PATCH v6 2/3] Force driftfix=none on previous machines

2012-07-18 Thread Crístian Viana

The current default value for the -rtc driftfix option is 'none'. This patch
makes sure that the old machine configurations will work the same way
even after that option changes its default value.

Signed-off-by: Crístian Viana 
---
 hw/pc_piix.c |4 
 1 file changed, 4 insertions(+)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 0c0096f..cac7b36 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -410,6 +410,10 @@ static QEMUMachine pc_machine_v1_1 = {
 .driver   = TYPE_USB_DEVICE,\
 .property = "full-path",\
 .value= "no",\
+},{\
+.driver   = "mc146818rtc",\
+.property = "lost_tick_policy",\
+.value= "discard",\
 }
 
 static QEMUMachine pc_machine_v1_0 = {
-- 
1.7.9.5

[Qemu-devel] [PATCH v6 3/3] Change driftfix default value to slew

2012-07-18 Thread Crístian Viana

Windows 2008+ is very sensitive to missed ticks. The RTC is used by default as
the time source. If driftfix is not enabled, Windows is prone to
blue screening.

Signed-off-by: Crístian Viana 
---
 hw/mc146818rtc.c |2 +-
 vl.c |   11 ++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index 3777f85..dfd7ee6 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -686,7 +686,7 @@ ISADevice *rtc_init(ISABus *bus, int base_year, qemu_irq intercept_irq)
 static Property mc146818rtc_properties[] = {
 DEFINE_PROP_INT32("base_year", RTCState, base_year, 1980),
 DEFINE_PROP_LOSTTICKPOLICY("lost_tick_policy", RTCState,
-   lost_tick_policy, LOST_TICK_DISCARD),
+   lost_tick_policy, LOST_TICK_SLEW),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/vl.c b/vl.c
index 46248b9..c09ed5e 100644
--- a/vl.c
+++ b/vl.c
@@ -559,7 +559,16 @@ static void configure_rtc(QemuOpts *opts)
 
 qdev_prop_register_global_list(slew_lost_ticks);
 } else if (!strcmp(value, "none")) {
-/* discard is default */
+static GlobalProperty discard_lost_ticks[] = {
+{
+.driver   = "mc146818rtc",
+.property = "lost_tick_policy",
+.value= "discard",
+},
+{ /* end of list */ }
+};
+
+qdev_prop_register_global_list(discard_lost_ticks);
 } else {
 fprintf(stderr, "qemu: invalid option value '%s'\n", value);
 exit(1);
-- 
1.7.9.5

[Qemu-devel] [PATCH v6 1/3] Check if GlobalProperty exists before registering

2012-07-18 Thread Crístian Viana

If a GlobalProperty has already been registered, it won't have its value
overwritten. This is done to enforce that the properties specified on the
command line will "win" over the ones specified by the machine properties, if
set with the parameter "-M".

Signed-off-by: Crístian Viana 
---
Changes since v5:
- Updated commit message of PATCH 1/3
- Rebased on master

 hw/qdev-properties.c |8 
 1 file changed, 8 insertions(+)

diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index 3571cf3..8356879 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -1212,6 +1212,14 @@ static QTAILQ_HEAD(, GlobalProperty) global_props = QTAILQ_HEAD_INITIALIZER(glob
 
 static void qdev_prop_register_global(GlobalProperty *prop)
 {
+GlobalProperty *p;
+
+QTAILQ_FOREACH(p, &global_props, next) {
+if (strcmp(prop->driver, p->driver) == 0) {
+return;
+}
+}
+
 QTAILQ_INSERT_TAIL(&global_props, prop, next);
 }
 
-- 
1.7.9.5
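
The "first registration wins" rule from the hunk above can be modelled in a few lines. This is a Python model for illustration only, not QEMU code: the list stands in for global_props and `register_global` is a hypothetical name. As in the hunk, the default comparison looks only at the driver name; the `match_property` flag is an added assumption that shows what changes when the property name is compared as well.

```python
def register_global(global_props, prop, match_property=False):
    """Append prop unless an earlier entry already covers it.

    Model of the check in the patch: by default only 'driver' is
    compared. match_property=True is a hypothetical variant.
    Returns True if prop was registered.
    """
    for p in global_props:
        same = p['driver'] == prop['driver']
        if match_property:
            same = same and p['property'] == prop['property']
        if same:
            return False  # earlier registration wins
    global_props.append(prop)
    return True
```

In this model, once any property of `mc146818rtc` is registered, a later entry for the same driver is skipped even if it names a different property, unless `match_property=True` is used.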

[Qemu-devel] [PATCH 4/9] qerror: rename QERR_PROPERTY_VALUE_NOT_POWER_OF_2

2012-07-18 Thread Luiz Capitulino

The error class name is PropertyValueNotPowerOf2, not PropertyValueNotPowerOf_2.

Rename the QERR_PROPERTY_VALUE_NOT_POWER_OF_2 macro to
QERR_PROPERTY_VALUE_NOT_POWER_OF2 to reflect that, so that future error macro
generation generates the expected macro name.

Signed-off-by: Luiz Capitulino 
---
 hw/qdev-properties.c | 2 +-
 qerror.c | 2 +-
 qerror.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index 3571cf3..38f78b3 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -885,7 +885,7 @@ static void set_blocksize(Object *obj, Visitor *v, void 
*opaque,
 
 /* We rely on power-of-2 blocksizes for bitmasks */
 if ((value & (value - 1)) != 0) {
-error_set(errp, QERR_PROPERTY_VALUE_NOT_POWER_OF_2,
+error_set(errp, QERR_PROPERTY_VALUE_NOT_POWER_OF2,
   dev->id?:"", name, (int64_t)value);
 return;
 }
diff --git a/qerror.c b/qerror.c
index 132ab2d..e09c410 100644
--- a/qerror.c
+++ b/qerror.c
@@ -245,7 +245,7 @@ static const QErrorStringTable qerror_table[] = {
 .desc  = "Property '%(device).%(property)' can't find value 
'%(value)'",
 },
 {
-.error_fmt = QERR_PROPERTY_VALUE_NOT_POWER_OF_2,
+.error_fmt = QERR_PROPERTY_VALUE_NOT_POWER_OF2,
 .desc  = "Property '%(device).%(property)' doesn't take "
  "value '%(value)', it's not a power of 2",
 },
diff --git a/qerror.h b/qerror.h
index 27ce395..58d0295 100644
--- a/qerror.h
+++ b/qerror.h
@@ -205,7 +205,7 @@ QError *qobject_to_qerror(const QObject *obj);
 #define QERR_PROPERTY_VALUE_NOT_FOUND \
 "{ 'class': 'PropertyValueNotFound', 'data': { 'device': %s, 'property': 
%s, 'value': %s } }"
 
-#define QERR_PROPERTY_VALUE_NOT_POWER_OF_2 \
+#define QERR_PROPERTY_VALUE_NOT_POWER_OF2 \
 "{ 'class': 'PropertyValueNotPowerOf2', 'data': { " \
 "'device': %s, 'property': %s, 'value': %"PRId64" } }"
 
-- 
1.7.11.2.249.g31c7954.dirty

[Qemu-devel] [PATCH 2/9] qerror: rename QERR_SOCK_CONNECT_IN_PROGRESS

2012-07-18 Thread Luiz Capitulino

The error class name is SockConnectInprogress, not SockConnectInProgress.

Rename the QERR_SOCK_CONNECT_IN_PROGRESS macro to QERR_SOCK_CONNECT_INPROGRESS
to reflect that, so that future error macro generation generates the
expected macro name.

Signed-off-by: Luiz Capitulino 
---
 migration-tcp.c | 2 +-
 qemu-sockets.c  | 2 +-
 qerror.c| 2 +-
 qerror.h| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 587fc70..3cffa94 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -90,7 +90,7 @@ int tcp_start_outgoing_migration(MigrationState *s, const 
char *host_port,
 
 if (!error_is_set(errp)) {
 migrate_fd_connect(s);
-} else if (error_is_type(*errp, QERR_SOCK_CONNECT_IN_PROGRESS)) {
+} else if (error_is_type(*errp, QERR_SOCK_CONNECT_INPROGRESS)) {
 DPRINTF("connect in progress\n");
 qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
 } else if (error_is_type(*errp, QERR_SOCK_CREATE_FAILED)) {
diff --git a/qemu-sockets.c b/qemu-sockets.c
index 1357ec0..20def3e 100644
--- a/qemu-sockets.c
+++ b/qemu-sockets.c
@@ -274,7 +274,7 @@ int inet_connect_opts(QemuOpts *opts, Error **errp)
   #else
 if (!block && (rc == -EINPROGRESS)) {
   #endif
-error_set(errp, QERR_SOCK_CONNECT_IN_PROGRESS);
+error_set(errp, QERR_SOCK_CONNECT_INPROGRESS);
 } else if (rc < 0) {
 if (NULL == e->ai_next)
 fprintf(stderr, "%s: connect(%s,%s,%s,%s): %s\n", __FUNCTION__,
diff --git a/qerror.c b/qerror.c
index e988e36..a9d771b 100644
--- a/qerror.c
+++ b/qerror.c
@@ -309,7 +309,7 @@ static const QErrorStringTable qerror_table[] = {
 .desc  = "Could not start VNC server on %(target)",
 },
 {
-.error_fmt = QERR_SOCK_CONNECT_IN_PROGRESS,
+.error_fmt = QERR_SOCK_CONNECT_INPROGRESS,
 .desc  = "Connection can not be completed immediately",
 },
 {
diff --git a/qerror.h b/qerror.h
index 71b0496..69d417d 100644
--- a/qerror.h
+++ b/qerror.h
@@ -251,7 +251,7 @@ QError *qobject_to_qerror(const QObject *obj);
 #define QERR_VNC_SERVER_FAILED \
 "{ 'class': 'VNCServerFailed', 'data': { 'target': %s } }"
 
-#define QERR_SOCK_CONNECT_IN_PROGRESS \
+#define QERR_SOCK_CONNECT_INPROGRESS \
 "{ 'class': 'SockConnectInprogress', 'data': {} }"
 
 #define QERR_SOCK_CONNECT_FAILED \
-- 
1.7.11.2.249.g31c7954.dirty

[Qemu-devel] [PATCH 1/9] qerror: rename QERR_SOCKET_* macros

2012-07-18 Thread Luiz Capitulino

The socket error classes call a socket 'Sock', like in SockConnectFailed,
but the error macros call a socket SOCKET, like in QERR_SOCKET_CONNECT_FAILED.

This will cause problems when error macro creation gets automated,
because the macro name will be derived from the error class name.

Avoid that by renaming all QERR_SOCKET_* macros to QERR_SOCK_*.

Signed-off-by: Luiz Capitulino 
---
 migration-tcp.c |  6 +++---
 qemu-sockets.c  | 22 +++---
 qerror.c| 10 +-
 qerror.h| 10 +-
 4 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 440804d..587fc70 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -90,13 +90,13 @@ int tcp_start_outgoing_migration(MigrationState *s, const 
char *host_port,
 
 if (!error_is_set(errp)) {
 migrate_fd_connect(s);
-} else if (error_is_type(*errp, QERR_SOCKET_CONNECT_IN_PROGRESS)) {
+} else if (error_is_type(*errp, QERR_SOCK_CONNECT_IN_PROGRESS)) {
 DPRINTF("connect in progress\n");
 qemu_set_fd_handler2(s->fd, NULL, NULL, tcp_wait_for_connect, s);
-} else if (error_is_type(*errp, QERR_SOCKET_CREATE_FAILED)) {
+} else if (error_is_type(*errp, QERR_SOCK_CREATE_FAILED)) {
 DPRINTF("connect failed\n");
 return -1;
-} else if (error_is_type(*errp, QERR_SOCKET_CONNECT_FAILED)) {
+} else if (error_is_type(*errp, QERR_SOCK_CONNECT_FAILED)) {
 DPRINTF("connect failed\n");
 migrate_fd_error(s);
 return -1;
diff --git a/qemu-sockets.c b/qemu-sockets.c
index 2ae715d..1357ec0 100644
--- a/qemu-sockets.c
+++ b/qemu-sockets.c
@@ -120,7 +120,7 @@ int inet_listen_opts(QemuOpts *opts, int port_offset, Error 
**errp)
 if ((qemu_opt_get(opts, "host") == NULL) ||
 (qemu_opt_get(opts, "port") == NULL)) {
 fprintf(stderr, "%s: host and/or port not specified\n", __FUNCTION__);
-error_set(errp, QERR_SOCKET_CREATE_FAILED);
+error_set(errp, QERR_SOCK_CREATE_FAILED);
 return -1;
 }
 pstrcpy(port, sizeof(port), qemu_opt_get(opts, "port"));
@@ -139,7 +139,7 @@ int inet_listen_opts(QemuOpts *opts, int port_offset, Error 
**errp)
 if (rc != 0) {
 fprintf(stderr,"getaddrinfo(%s,%s): %s\n", addr, port,
 gai_strerror(rc));
-error_set(errp, QERR_SOCKET_CREATE_FAILED);
+error_set(errp, QERR_SOCK_CREATE_FAILED);
 return -1;
 }
 
@@ -153,7 +153,7 @@ int inet_listen_opts(QemuOpts *opts, int port_offset, Error 
**errp)
 fprintf(stderr,"%s: socket(%s): %s\n", __FUNCTION__,
 inet_strfamily(e->ai_family), strerror(errno));
 if (!e->ai_next) {
-error_set(errp, QERR_SOCKET_CREATE_FAILED);
+error_set(errp, QERR_SOCK_CREATE_FAILED);
 }
 continue;
 }
@@ -179,7 +179,7 @@ int inet_listen_opts(QemuOpts *opts, int port_offset, Error 
**errp)
 inet_strfamily(e->ai_family), uaddr, inet_getport(e),
 strerror(errno));
 if (!e->ai_next) {
-error_set(errp, QERR_SOCKET_BIND_FAILED);
+error_set(errp, QERR_SOCK_BIND_FAILED);
 }
 }
 }
@@ -191,7 +191,7 @@ int inet_listen_opts(QemuOpts *opts, int port_offset, Error 
**errp)
 
 listen:
 if (listen(slisten,1) != 0) {
-error_set(errp, QERR_SOCKET_LISTEN_FAILED);
+error_set(errp, QERR_SOCK_LISTEN_FAILED);
 perror("listen");
 closesocket(slisten);
 freeaddrinfo(res);
@@ -226,7 +226,7 @@ int inet_connect_opts(QemuOpts *opts, Error **errp)
 block = qemu_opt_get_bool(opts, "block", 0);
 if (addr == NULL || port == NULL) {
 fprintf(stderr, "inet_connect: host and/or port not specified\n");
-error_set(errp, QERR_SOCKET_CREATE_FAILED);
+error_set(errp, QERR_SOCK_CREATE_FAILED);
 return -1;
 }
 
@@ -239,7 +239,7 @@ int inet_connect_opts(QemuOpts *opts, Error **errp)
 if (0 != (rc = getaddrinfo(addr, port, &ai, &res))) {
 fprintf(stderr,"getaddrinfo(%s,%s): %s\n", addr, port,
 gai_strerror(rc));
-error_set(errp, QERR_SOCKET_CREATE_FAILED);
+error_set(errp, QERR_SOCK_CREATE_FAILED);
return -1;
 }
 
@@ -274,7 +274,7 @@ int inet_connect_opts(QemuOpts *opts, Error **errp)
   #else
 if (!block && (rc == -EINPROGRESS)) {
   #endif
-error_set(errp, QERR_SOCKET_CONNECT_IN_PROGRESS);
+error_set(errp, QERR_SOCK_CONNECT_IN_PROGRESS);
 } else if (rc < 0) {
 if (NULL == e->ai_next)
 fprintf(stderr, "%s: connect(%s,%s,%s,%s): %s\n", __FUNCTION__,
@@ -287,7 +287,7 @@ int inet_connect_opts(QemuOpts *opts, Error **errp)
 freeaddrinfo(res);
 return sock;
 }
-error_set(errp, QERR_SOCKET_CONNECT_FAILED);
+error_set(errp, QERR_SOCK_CONNECT_FAILED);

[Qemu-devel] [PATCH 0/9]: qapi: generate qerrors from qapi-schema-errors.json

2012-07-18 Thread Luiz Capitulino

This series moves all qerrors we have today to qapi-schema-errors.json and
generates the error macros and table from it.

With this series, one doesn't have to manually add an error macro and the
matching table entry anymore. He or she just has to add the new error to
qapi-schema-errors.json.

There's only one small problem: the matching between error class name and
the (generated) error macro may not be clear for those not familiar with
qerrors. There are two possible solutions to this:

  1. Add the generated macro name along with the error class name in
     qapi-schema-errors.json; and/or

  2. add docs/qapi-errors.txt to explain this in detail

This series is my first step toward improving our error API.
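
To make the class-name-to-macro matching a bit more concrete, here is a
small sketch in C of one possible derivation rule. It is only an
illustration and an assumption on my side: it is not the logic of
scripts/qapi-errors.py, and example_macro_name() is a hypothetical helper.
The rule used here is: prefix with QERR_, upper-case everything, and put an
underscore before a capital letter that follows a lower-case letter or a
digit, or that ends a run of capitals.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: derive a macro name from a CamelCase class name.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int example_macro_name(const char *cls, char *out, size_t out_len)
{
    static const char prefix[] = "QERR_";
    size_t n, i;

    assert(cls && out);

    if (out_len < sizeof(prefix)) {
        return -1;
    }
    memcpy(out, prefix, sizeof(prefix) - 1);
    n = sizeof(prefix) - 1;

    for (i = 0; cls[i] != '\0'; i++) {
        unsigned char c = (unsigned char)cls[i];

        if (i > 0 && isupper(c)) {
            unsigned char prev = (unsigned char)cls[i - 1];
            unsigned char next = (unsigned char)cls[i + 1];

            /* Word boundary: "fooBar", "foo2Bar" or the "S" in "VNCServer". */
            if (islower(prev) || isdigit(prev) ||
                (isupper(prev) && islower(next))) {
                if (n + 1 >= out_len) {
                    return -1;
                }
                out[n++] = '_';
            }
        }
        if (n + 1 >= out_len) {
            return -1;
        }
        out[n++] = (char)toupper(c);
    }
    out[n] = '\0';
    return 0;
}
```

Under this rule "SockConnectInprogress" maps to QERR_SOCK_CONNECT_INPROGRESS
and "PropertyValueNotPowerOf2" maps to QERR_PROPERTY_VALUE_NOT_POWER_OF2,
which is the direction the renames in this series take.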

 Makefile |   8 +-
 hw/qdev-properties.c |   2 +-
 migration-tcp.c  |   6 +-
 monitor.c|   2 +-
 qapi-schema-errors.json  | 616 +++
 qapi/qmp-dispatch.c  |   2 +-
 qapi/qmp-input-visitor.c |   2 +-
 qemu-sockets.c   |  22 +-
 qerror.c | 310 +---
 qerror.h | 220 +
 scripts/check-qerror.sh  |   6 +-
 scripts/qapi-errors.py   | 180 ++
 scripts/qapi.py  |   4 +-
 13 files changed, 827 insertions(+), 553 deletions(-)

Re: [Qemu-devel] [PATCH v4 0/5] refactor PC machine, i440fx and piix3 to take advantage of QOM

2012-07-18 Thread Paolo Bonzini

On 18/07/2012 15:19, Wanpeng Li wrote:
> [CCing ML]
> 
> This series aggressively refactors the PC machine initialization to be more
> modelled and less ad-hoc.  The highlights of this series are:
> 
> 1) Things like -m and -bios-name are now device model properties
> 
> 2) The i440fx and piix3 are now modelled in a thorough fashion
> 
> 3) Most of the chipset features of the piix3 are modelled through composition
> 
> 4) i440fx_init is trivialized to creating devices and setting properties
> 
> 5) convert MemoryRegion to QOM
> 
> 6) convert PCI host bridge to QOM
> 
> The point (4) is the most important one.  As we refactor in this fashion,
> we should quickly get to the point where machine->init disappears completely
> in favor of just creating a handful of devices.
> 
> The two stage initialization of QOM is important here.  instance_init() is
> when composed devices are created which means that after you've created a
> device, all of its children are visible in the device model.  This lets you
> set properties of the parent and its children.
> 
> realize() (which is still called DeviceState::init today) will be called right
> before the guest starts up for the first time.
> 
> Signed-off-by: Anthony Liguori 
> Signed-off-by: Wanpeng Li 

Why should we include this?  I assume it conflicts uselessly with the
work Jason is doing on q35.

Paolo

[Qemu-devel] [PATCH resend] tests: Makefile: include dependency files

2012-07-18 Thread Eduardo Habkost

Otherwise 'make check' won't recompile files that need to be recompiled
because of header changes.

To reproduce the bug, run:

 $ make check  # succeeds
 $ echo B0RKED > hw/mc146818rtc_regs.h
 $ make check  # is supposed to try to rebuild tests/rtc-test.o and fail

Signed-off-by: Eduardo Habkost 
---
 tests/Makefile |2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/Makefile b/tests/Makefile
index d66ab19..b3f7494 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -143,3 +143,5 @@ check-qtest: $(patsubst %,check-qtest-%, $(QTEST_TARGETS))
 check-unit: $(patsubst %,check-%, $(check-unit-y))
 check-block: $(patsubst %,check-%, $(check-block-y))
 check: check-unit check-qtest
+
+-include $(wildcard tests/*.d)
-- 
1.7.10.4

Re: [Qemu-devel] qemu in full emulation on win32

2012-07-18 Thread Peter Maydell

On 18 July 2012 17:29, Stefan Weil  wrote:
> On 18.07.2012 10:01, Peter Maydell wrote:
>> I think this struct should use QEMU_PACKED, which will
>> ensure that it is packed to GCC rules rather than MS
>> rules.
>>
>> We also seem to have let a pile of new uses of attribute((packed))
>> slip in in hw/mfi.h. Those are probably bugs too.

> They are bugs (for w32 / w64 hosts). I just sent a patch to fix them.
>
> Some more which I did not fix are in the TCG debugger interface.
> Maybe those also need to be fixed for w32 / w64, but that needs
> more tests and reading of the debugger interface documentation.
> Maybe Richard Henderson knows whether they should use QEMU_PACKED,
> too.

So, I think none of these structs will get actually used on
Windows, because they're ELF structure layouts which only
get used on hosts with ELF support. However I think it would
be nice to use QEMU_PACKED for consistency.

We can clearly straightforwardly switch the DebugInfo struct
in tcg/tcg.c to use QEMU_PACKED.

The remaining cases are all the same, in tcg/*/tcg-target.c:

typedef struct {
    uint32_t len __attribute__((aligned((sizeof(void *)))));
    uint32_t cie_offset;
    tcg_target_long func_start __attribute__((packed));
    tcg_target_long func_len __attribute__((packed));
    uint8_t def_cfa[4];
    uint8_t reg_ofs[14];
} DebugFrameFDE;

I think we can just remove the packed attributes from the
struct member fields and apply QEMU_PACKED to the whole struct:
I don't think this will change any of the alignment or packing
in the not-windows case.

(It's not clear to me why the alignment attribute is applied
to the len field rather than to the whole struct, but we
don't need to change that I guess.)
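
A quick way to check the "no layout change" expectation is to compare
member offsets of the two variants. The following is a minimal sketch, not
code from the tree: the Example* types and example_same_offsets() are
hypothetical, intptr_t is an assumed stand-in for tcg_target_long, and a
plain __attribute__((packed)) stands in for QEMU_PACKED as it would expand
on a non-Windows host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for tcg_target_long (pointer-sized signed integer). */
typedef intptr_t example_long;

/* Variant A: packed applied to individual members, as quoted above. */
typedef struct {
    uint32_t len __attribute__((aligned((sizeof(void *)))));
    uint32_t cie_offset;
    example_long func_start __attribute__((packed));
    example_long func_len __attribute__((packed));
    uint8_t def_cfa[4];
    uint8_t reg_ofs[14];
} ExampleFDEMemberPacked;

/* Variant B: packed applied to the whole struct instead. */
typedef struct {
    uint32_t len __attribute__((aligned((sizeof(void *)))));
    uint32_t cie_offset;
    example_long func_start;
    example_long func_len;
    uint8_t def_cfa[4];
    uint8_t reg_ofs[14];
} __attribute__((packed)) ExampleFDEStructPacked;

/* Returns 1 if every member sits at the same offset in both variants. */
static int example_same_offsets(void)
{
    return offsetof(ExampleFDEMemberPacked, len) ==
               offsetof(ExampleFDEStructPacked, len) &&
           offsetof(ExampleFDEMemberPacked, cie_offset) ==
               offsetof(ExampleFDEStructPacked, cie_offset) &&
           offsetof(ExampleFDEMemberPacked, func_start) ==
               offsetof(ExampleFDEStructPacked, func_start) &&
           offsetof(ExampleFDEMemberPacked, func_len) ==
               offsetof(ExampleFDEStructPacked, func_len) &&
           offsetof(ExampleFDEMemberPacked, def_cfa) ==
               offsetof(ExampleFDEStructPacked, def_cfa) &&
           offsetof(ExampleFDEMemberPacked, reg_ofs) ==
               offsetof(ExampleFDEStructPacked, reg_ofs);
}
```

This only compares member offsets. The overall sizeof and alignment of the
two variants would still be worth checking with the real compilers in
question before switching.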

-- PMM

Re: [Qemu-devel] [PATCH v4] Fixes related to processing of qemu's -numa option

2012-07-18 Thread Eduardo Habkost

On Mon, Jul 16, 2012 at 09:31:30PM -0700, Chegu Vinod wrote:
> Changes since v3:
>- using bitmap_set() instead of set_bit() in numa_add() routine.
>    - removed call to bitmap_zero() since bitmap_new() also zeroes the bitmap.
>- Rebased to the latest qemu.

Tested-by: Eduardo Habkost 
Reviewed-by: Eduardo Habkost 


> 
> Changes since v2:
>- Using "unsigned long *" for the node_cpumask[].
>- Use bitmap_new() instead of g_malloc0() for allocation.
>- Don't rely on "max_cpus" since it may not be initialized
>  before the numa related qemu options are parsed & processed.
> 
> Note: Continuing to use a new constant for allocation of
>   the mask (This constant is currently set to 255 since
>   with an 8bit APIC ID VCPUs can range from 0-254 in a
>   guest. The APIC ID 255 (0xFF) is reserved for broadcast).
> 
> Changes since v1:
> 
>- Use bitmap functions that are already in qemu (instead
>  of cpu_set_t macro's from sched.h)
>- Added a check for endvalue >= max_cpus.
>    - Fix to address the round-robin assignment when
>  cpu's are not explicitly specified.
> ---
> 
> v1:
> --
> 
> The -numa option to qemu is used to create [fake] numa nodes
> and expose them to the guest OS instance.
> 
> There are a couple of issues with the -numa option:
> 
> a) Max VCPU's that can be specified for a guest while using
>the qemu's -numa option is 64. Due to a typecasting issue
>when the number of VCPUs is > 32 the VCPUs don't show up
>under the specified [fake] numa nodes.
> 
> b) KVM currently has support for 160VCPUs per guest. The
>qemu's -numa option has only support for upto 64VCPUs
>per guest.
> This patch addresses these two issues.
> 
> Below are examples of (a) and (b)
> 
> a) >32 VCPUs are specified with the -numa option:
> 
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> 71:01:01 \
> -net tap,ifname=tap0,script=no,downscript=no \
> -vnc :4
> 
> ...
> Upstream qemu :
> --
> 
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59
> node 2 size: 131072 MB
> node 3 cpus: 30
> node 3 size: 131072 MB
> node 4 cpus:
> node 4 size: 131072 MB
> node 5 cpus: 31
> node 5 size: 131072 MB
> 
> With the patch applied :
> ---
> 
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 6 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 131072 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 131072 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 131072 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 131072 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 131072 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 131072 MB
> 
> b) >64 VCPUs specified with -numa option:
> 
> /usr/local/bin/qemu-system-x86_64 \
> -enable-kvm \
> -cpu 
> Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+d-vnc
>  :4
> 
> ...
> 
> Upstream qemu :
> --
> 
> only 63 CPUs in NUMA mode supported.
> only 64 CPUs in NUMA mode supported.
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 
> 75 76 77 78 79
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61
> node 2 size: 65536 MB
> node 3 cpus: 30 62
> node 3 size: 65536 MB
> node 4 cpus:
> node 4 size: 65536 MB
> node 5 cpus:
> node 5 size: 65536 MB
> node 6 cpus: 31 63
> node 6 size: 65536 MB
> node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69
> node 7 size: 65536 MB
> 
> With the patch applied :
> ---
> 
> QEMU 1.1.50 monitor - type 'help' for more information
> (qemu) info numa
> 8 nodes
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9
> node 0 size: 65536 MB
> node 1 cpus: 10 11 12 13 14 15 16 17 18 19
> node 1 size: 65536 MB
> node 2 cpus: 20 21 22 23 24 25 26 27 28 29
> node 2 size: 65536 MB
> node 3 cpus: 30 31 32 33 34 35 36 37 38 39
> node 3 size: 65536 MB
> node 4 cpus: 40 41 42 43 44 45 46 47 48 49
> node 4 size: 65536 MB
> node 5 cpus: 50 51 52 53 54 55 56 57 58 59
> node 5 size: 65536 MB
> node 6 cpus: 60 61 62 63 64 65 66 67 68 69
> node 6 size: 65536 MB
> node 7 cpus: 70 71 72 73 74 75 76 77 78 79
> 
> Signed-off-by: Chegu Vinod , Jim Hull , 
> Craig Hada 
> ---
>  cpus.c   |3 ++-
>  hw/pc.c  |3 ++-
>  sysemu.h |3 ++-
>  vl.c |   43 +--
>  4 files changed, 27 insertions(+), 25

Re: [Qemu-devel] [PATCH] Use macro QEMU_PACKED for new packed structures

2012-07-18 Thread Peter Maydell

On 18 July 2012 17:42, Stefan Weil  wrote:
> On 18.07.2012 18:33, Peter Maydell wrote:
>> On 18 July 2012 17:12, Stefan Weil  wrote:
>>> Since commit 541dc0d47f10973c241e9955afc2aefc96adec51,
>>> some new packed structures were added without using QEMU_PACKED.
>>>
>>> QEMU_PACKED is needed for compilations with MinGW.
>>> For other platforms nothing changes.
>>>
>>> The code was fixed using this command:
>>>
>>>  git grep -la '__attribute__ ((packed))'|xargs perl -pi -e
>>> 's/__attribute__ \(\(packed\)\)/QEMU_PACKED/'
>>
>> There's one in tcg/tcg.c as well, which you missed because
>> it doesn't have a space between the "__attribute__" and
>> "((packed))".
>>
>> (There are also some in some of the tcg-target.c files,
>> but those are cases of the packed attribute being applied
>> to individual struct members rather than a whole struct, so
>> I'm not entirely sure what we should do with those.)

> I noticed those, too, but did not change them because
> I think that they need more examination (see my comment
> in http://lists.nongnu.org/archive/html/qemu-devel/2012-07/msg02373.html).

Fair enough.
Reviewed-by: Peter Maydell 

-- PMM

Re: [Qemu-devel] [PATCH] Use macro QEMU_PACKED for new packed structures

2012-07-18 Thread Stefan Weil


On 18.07.2012 18:33, Peter Maydell wrote:
> On 18 July 2012 17:12, Stefan Weil  wrote:
>> Since commit 541dc0d47f10973c241e9955afc2aefc96adec51,
>> some new packed structures were added without using QEMU_PACKED.
>>
>> QEMU_PACKED is needed for compilations with MinGW.
>> For other platforms nothing changes.
>>
>> The code was fixed using this command:
>>
>>  git grep -la '__attribute__ ((packed))'|xargs perl -pi -e 's/__attribute__ \(\(packed\)\)/QEMU_PACKED/'
>
> There's one in tcg/tcg.c as well, which you missed because
> it doesn't have a space between the "__attribute__" and
> "((packed))".
>
> (There are also some in some of the tcg-target.c files,
> but those are cases of the packed attribute being applied
> to individual struct members rather than a whole struct, so
> I'm not entirely sure what we should do with those.)
>
> -- PMM



I noticed those, too, but did not change them because
I think that they need more examination (see my comment
in http://lists.nongnu.org/archive/html/qemu-devel/2012-07/msg02373.html).

Thanks,

Stefan W.

[Qemu-devel] [PATCH] configure: Support system emulation with large memory on w32 hosts

2012-07-18 Thread Stefan Weil

32-bit applications on Windows normally only get virtual memory in
the lower 2 GiB address space.

Because of memory fragmentation, VirtualAlloc() usually won't get 1 GiB
of contiguous virtual memory in that address space. Therefore running
system emulations with 1 GiB or more RAM will abort with a failure.

The linker flag --large-address-aware allows addresses in the upper 2 GiB.
With this flag, it is possible to run emulated machines with up to
2047 MiB RAM.

Signed-off-by: Stefan Weil 
---

I tested the executables with large address awareness on a 64 bit Windows
(works) and with Wine on Debian 32 bit Linux (no longer aborts, but hangs
when 1024 or more MiB are requested).

Maybe the support for large addresses is broken in my Wine version.
Please report any different test results.

Regards,

Stefan Weil


 configure |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index df5c99a..b681f9a 100755
--- a/configure
+++ b/configure
@@ -2937,9 +2937,9 @@ else
 POD2MAN="pod2man"
 fi
 
-# Use ASLR, no-SEH and DEP if available
+# Use ASLR, large addresses, no-SEH and DEP if available
 if test "$mingw32" = "yes" ; then
-for flag in --dynamicbase --no-seh --nxcompat; do
+for flag in --dynamicbase --large-address-aware --no-seh --nxcompat; do
 if $ld --help 2>/dev/null | grep ".$flag" >/dev/null 2>/dev/null ; then
 LDFLAGS="-Wl,$flag $LDFLAGS"
 fi
-- 
1.7.0.4

Re: [Qemu-devel] qemu in full emulation on win32

2012-07-18 Thread Stefan Weil


On 18.07.2012 08:30, Alexey Kardashevskiy wrote:
> Hi!
>
> Found 2 problems while I was debugging qemu/ppc64-softmmu/qemu-system-ppc64.exe
> WindowsXP SP3 Pro, 32bit, i686-pc-mingw32-gcc (GCC) 4.5.2.
>
> 1. The size of the following is 7 bytes on linux and 8 bytes on Windows:
> struct {
>  uint32_t hi;
>  uint64_t child;
>  uint64_t parent;
>  uint64_t size;
> } __attribute__((packed)) ranges[];
>
> The structure is used between QEMU and Open Firmware (powerpc bios) so it is
> important.
>
> The Feature is described here:
> http://stackoverflow.com/questions/7789668/why-would-the-size-of-a-packed-structure-be-different-on-linux-and-windows-when
> Shortly there is packing and ms-packing and they are different :)
>
> The solutions are:
> 1. Add MS-specific #pragma pack(push,1) and #pragma pack(pop).
> 2. Add -mno-ms-bitfields (gcc >= 4.7.0)
> 3. Change the structure above to use only uint32_t.
>
> What is the common way of solving such problems in QEMU?

Problem 1 is solved with solution 4 (your own patch) although
that patch does not change the structure size to 7 bytes :-)

> 2. QEMU cannot allocate 1024MB for the guest RAM. Literally, VirtualAlloc()
> fails on 1024MB BUT it does not if I allocate 1023MB and 64MB by 2 subsequent
> calls. We allocate RAM via memory_region_init_ram(). I am pretty sure this is
> not happening on 64bit Windows and I suspect that it is happening with
> qemu-system-x86.exe, is not it?
>
> Do we care that there is actually enough RAM and we could allocate it in
> several chunks?

Please try the patch which I'm going to send.

On w64, VirtualAlloc() _can_ allocate large quantities of contiguous
virtual memory.

On w32, it is normally restricted to the lower 2 GiB which are already
fragmented by the code (executable, shared libraries) and data. Larger
quantities are available when the executable is allowed to use the upper
2 GiB, too. That's what my patch does.


Regards,

Stefan W.

Re: [Qemu-devel] [PATCH] Use macro QEMU_PACKED for new packed structures

2012-07-18 Thread Peter Maydell

On 18 July 2012 17:12, Stefan Weil  wrote:
> Since commit 541dc0d47f10973c241e9955afc2aefc96adec51,
> some new packed structures were added without using QEMU_PACKED.
>
> QEMU_PACKED is needed for compilations with MinGW.
> For other platforms nothing changes.
>
> The code was fixed using this command:
>
> git grep -la '__attribute__ ((packed))'|xargs perl -pi -e 's/__attribute__ \(\(packed\)\)/QEMU_PACKED/'

There's one in tcg/tcg.c as well, which you missed because
it doesn't have a space between the "__attribute__" and
"((packed))".

(There are also some in some of the tcg-target.c files,
but those are cases of the packed attribute being applied
to individual struct members rather than a whole struct, so
I'm not entirely sure what we should do with those.)

-- PMM

Re: [Qemu-devel] qemu in full emulation on win32

2012-07-18 Thread Stefan Weil


On 18.07.2012 10:01, Peter Maydell wrote:
> On 18 July 2012 07:30, Alexey Kardashevskiy  wrote:
>> 1. The size of the following is 7 bytes on linux and 8 bytes on Windows:
>> struct {
>>  uint32_t hi;
>>  uint64_t child;
>>  uint64_t parent;
>>  uint64_t size;
>> } __attribute__((packed)) ranges[];
>>
>> The structure is used between QEMU and Open Firmware (powerpc bios) so it is
>> important.
>
> I think this struct should use QEMU_PACKED, which will
> ensure that it is packed to GCC rules rather than MS
> rules.
>
> We also seem to have let a pile of new uses of attribute((packed))
> slip in in hw/mfi.h. Those are probably bugs too.
>
> -- PMM
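
For reference, the struct quoted above can be checked with a small
standalone sketch like the following. This is an illustration under
assumptions, not code from the tree: EXAMPLE_PACKED is a hypothetical macro
modelled on the idea behind QEMU_PACKED (ask for GCC-style packing on
Windows hosts, plain packed elsewhere), and the exact definition in QEMU
may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical macro: request GCC layout rules explicitly on Windows
 * hosts built with a GCC-compatible compiler, plain packed elsewhere.
 */
#if defined(_WIN32) && defined(__GNUC__)
# define EXAMPLE_PACKED __attribute__((gcc_struct, packed))
#else
# define EXAMPLE_PACKED __attribute__((packed))
#endif

/* Same members as the "ranges" element in the report. */
struct example_range {
    uint32_t hi;
    uint64_t child;
    uint64_t parent;
    uint64_t size;
} EXAMPLE_PACKED;

/* Number of 32-bit cells one element occupies. */
static size_t example_range_cells(void)
{
    return sizeof(struct example_range) / sizeof(uint32_t);
}
```

With GCC packing rules there is no padding after hi, so one element takes
4 + 3 * 8 = 28 bytes. A compile-time check on sizeof next to such a struct
would catch a host where the layout comes out differently.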


They are bugs (for w32 / w64 hosts). I just sent a patch to fix them.

Some more which I did not fix are in the TCG debugger interface.
Maybe those also need to be fixed for w32 / w64, but that needs
more tests and reading of the debugger interface documentation.
Maybe Richard Henderson knows whether they should use QEMU_PACKED,
too.

Regards,

Stefan W.

Re: [Qemu-devel] [PATCH v4 5/5] merge pc_piix.c to pc.c

2012-07-18 Thread Michael S. Tsirkin

On Wed, Jul 18, 2012 at 09:19:05PM +0800, Wanpeng Li wrote:
> [CCing ML]
> 
> From: Anthony Liguori 
> 
> Signed-off-by: Anthony Liguori 
> Signed-off-by: Wanpeng Li 

Yes, it was a bad idea to split to begin with, but the
machine compatibility code IMO is better kept
somewhere separate, if for no other reason that
we keep touching it with each revision.
Would be nice to leave pc alone.

> ---
>  hw/i386/Makefile.objs |1 -
>  hw/pc.c   |  753 
> +++--
>  hw/pc.h   |   46 +---
>  hw/pc_piix.c  |  661 ---
>  4 files changed, 667 insertions(+), 794 deletions(-)
>  delete mode 100644 hw/pc_piix.c
> 
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index 49b32d0..868020c 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -4,7 +4,6 @@ obj-y += sga.o ioapic_common.o ioapic.o i440fx.o piix3.o
>  obj-y += vmport.o
>  obj-y += pci-hotplug.o smbios.o wdt_ib700.o
>  obj-y += debugcon.o multiboot.o
> -obj-y += pc_piix.o
>  obj-y += pc_sysfw.o
>  obj-$(CONFIG_XEN) += xen_platform.o xen_apic.o
>  obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
> diff --git a/hw/pc.c b/hw/pc.c
> index c7e9ab3..7c04339 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -27,6 +27,7 @@
>  #include "fdc.h"
>  #include "ide.h"
>  #include "pci.h"
> +#include "usb.h"
>  #include "vmware_vga.h"
>  #include "monitor.h"
>  #include "fw_cfg.h"
> @@ -47,7 +48,10 @@
>  #include "ui/qemu-spice.h"
>  #include "memory.h"
>  #include "exec-memory.h"
> +#include "kvm/clock.h"
>  #include "arch_init.h"
> +#include "smbus.h"
> +#include "boards.h"
>  
>  /* output Bochs bios info messages */
>  //#define DEBUG_BIOS
> @@ -75,6 +79,8 @@
>  
>  #define E820_NR_ENTRIES  16
>  
> +#define MAX_IDE_BUS 2
> +
>  struct e820_entry {
>  uint64_t address;
>  uint64_t length;
> @@ -86,10 +92,14 @@ struct e820_table {
>  struct e820_entry entry[E820_NR_ENTRIES];
>  } QEMU_PACKED __attribute((__aligned__(4)));
>  
> +static const int ide_iobase[MAX_IDE_BUS] = { 0x1f0, 0x170 };
> +static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 };
> +static const int ide_irq[MAX_IDE_BUS] = { 14, 15 };
> +
>  static struct e820_table e820_table;
>  struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
>  
> -void gsi_handler(void *opaque, int n, int level)
> +static void gsi_handler(void *opaque, int n, int level)
>  {
>  GSIState *s = opaque;
>  
> @@ -107,7 +117,7 @@ static void ioport80_write(void *opaque, uint32_t addr, 
> uint32_t data)
>  /* MSDOS compatibility mode FPU exception support */
>  static qemu_irq ferr_irq;
>  
> -void pc_register_ferr_irq(qemu_irq irq)
> +static void pc_register_ferr_irq(qemu_irq irq)
>  {
>  ferr_irq = irq;
>  }
> @@ -330,7 +340,7 @@ static void pc_cmos_init_late(void *opaque)
>  qemu_unregister_reset(pc_cmos_init_late, opaque);
>  }
>  
> -void pc_cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
> +static void pc_cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
>const char *boot_device,
>ISADevice *floppy, BusState *idebus0, BusState *idebus1,
>ISADevice *s)
> @@ -860,7 +870,7 @@ static const int ne2000_irq[NE2000_NB_MAX] = { 9, 10, 11, 
> 3, 4, 5 };
>  static const int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc };
>  static const int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 };
>  
> -void pc_init_ne2k_isa(ISABus *bus, NICInfo *nd)
> +static void pc_init_ne2k_isa(ISABus *bus, NICInfo *nd)
>  {
>  static int nb_ne2k = 0;
>  
> @@ -915,7 +925,7 @@ static DeviceState *apic_init(void *env, uint8_t apic_id)
>  return dev;
>  }
>  
> -void pc_acpi_smi_interrupt(void *opaque, int irq, int level)
> +static void pc_acpi_smi_interrupt(void *opaque, int irq, int level)
>  {
>  CPUX86State *s = opaque;
>  
> @@ -952,7 +962,7 @@ static X86CPU *pc_new_cpu(const char *cpu_model)
>  return cpu;
>  }
>  
> -void pc_cpus_init(const char *cpu_model)
> +static void pc_cpus_init(const char *cpu_model)
>  {
>  int i;
>  
> @@ -970,55 +980,18 @@ void pc_cpus_init(const char *cpu_model)
>  }
>  }
>  
> -void *pc_memory_init(MemoryRegion *system_memory,
> +static void *pc_memory_init(MemoryRegion *system_memory,
>  const char *kernel_filename,
>  const char *kernel_cmdline,
>  const char *initrd_filename,
>  ram_addr_t below_4g_mem_size,
> -ram_addr_t above_4g_mem_size,
> -MemoryRegion *rom_memory,
> -MemoryRegion **ram_memory)
> +ram_addr_t above_4g_mem_size)
>  {
>  int linux_boot, i;
> -MemoryRegion *ram, *option_rom_mr;
> -MemoryRegion *ram_below_4g, *ram_above_4g;
>  void *fw_cfg;
>  
>  linux_boot = (kernel_filename != NULL);
>  
> -/* Allocate RAM.  We allocate it

[Qemu-devel] [PATCH] Use macro QEMU_PACKED for new packed structures

2012-07-18 Thread Stefan Weil

Since commit 541dc0d47f10973c241e9955afc2aefc96adec51,
some new packed structures were added without using QEMU_PACKED.

QEMU_PACKED is needed for compilations with MinGW.
For other platforms nothing changes.

The code was fixed using this command:

git grep -la '__attribute__ ((packed))'|xargs perl -pi -e 's/__attribute__ \(\(packed\)\)/QEMU_PACKED/'

Signed-off-by: Stefan Weil 
---
 hw/mfi.h  |   92 ++---
 hw/ppce500_spin.c |2 +-
 2 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/hw/mfi.h b/hw/mfi.h
index 8a82162..3045d4e 100644
--- a/hw/mfi.h
+++ b/hw/mfi.h
@@ -435,24 +435,24 @@ typedef enum {
 struct mfi_sg32 {
 uint32_t addr;
 uint32_t len;
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 struct mfi_sg64 {
 uint64_t addr;
 uint32_t len;
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 struct mfi_sg_skinny {
 uint64_t addr;
 uint32_t len;
 uint32_t flag;
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 union mfi_sgl {
 struct mfi_sg32 sg32[1];
 struct mfi_sg64 sg64[1];
 struct mfi_sg_skinny sg_skinny[1];
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 /* Message frames.  All messages have a common header */
 struct mfi_frame_header {
@@ -468,7 +468,7 @@ struct mfi_frame_header {
 uint16_t flags;
 uint16_t timeout;
 uint32_t data_len;
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 struct mfi_init_frame {
 struct mfi_frame_header header;
@@ -487,7 +487,7 @@ struct mfi_io_frame {
 uint32_t lba_lo;
 uint32_t lba_hi;
 union mfi_sgl sgl;
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 #define MFI_PASS_FRAME_SIZE 48
 struct mfi_pass_frame {
@@ -496,7 +496,7 @@ struct mfi_pass_frame {
 uint32_t sense_addr_hi;
 uint8_t cdb[16];
 union mfi_sgl sgl;
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 #define MFI_DCMD_FRAME_SIZE 40
 struct mfi_dcmd_frame {
@@ -504,7 +504,7 @@ struct mfi_dcmd_frame {
 uint32_t opcode;
 uint8_t mbox[MFI_MBOX_SIZE];
 union mfi_sgl sgl;
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 struct mfi_abort_frame {
 struct mfi_frame_header header;
@@ -512,7 +512,7 @@ struct mfi_abort_frame {
 uint32_t abort_mfi_addr_lo;
 uint32_t abort_mfi_addr_hi;
 uint32_t reserved1[6];
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 struct mfi_smp_frame {
 struct mfi_frame_header header;
@@ -521,7 +521,7 @@ struct mfi_smp_frame {
 struct mfi_sg32 sg32[2];
 struct mfi_sg64 sg64[2];
 } sgl;
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 struct mfi_stp_frame {
 struct mfi_frame_header header;
@@ -531,7 +531,7 @@ struct mfi_stp_frame {
 struct mfi_sg32 sg32[2];
 struct mfi_sg64 sg64[2];
 } sgl;
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 union mfi_frame {
 struct mfi_frame_header header;
@@ -563,7 +563,7 @@ struct mfi_init_qinfo {
 uint32_t pi_addr_hi;
 uint32_t ci_addr_lo;
 uint32_t ci_addr_hi;
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 /* Controller properties */
 struct mfi_ctrl_props {
@@ -626,7 +626,7 @@ struct mfi_ctrl_props {
* is spun down (0=use FW defaults)
*/
 uint8_t reserved[24];
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 /* PCI information about the card. */
 struct mfi_info_pci {
@@ -635,7 +635,7 @@ struct mfi_info_pci {
 uint16_t subvendor;
 uint16_t subdevice;
 uint8_t reserved[24];
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 /* Host (front end) interface information */
 struct mfi_info_host {
@@ -647,7 +647,7 @@ struct mfi_info_host {
 uint8_t reserved[6];
 uint8_t port_count;
 uint64_t port_addr[8];
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 /* Device (back end) interface information */
 struct mfi_info_device {
@@ -659,7 +659,7 @@ struct mfi_info_device {
 uint8_t reserved[6];
 uint8_t port_count;
 uint64_t port_addr[8];
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 /* Firmware component information */
 struct mfi_info_component {
@@ -667,7 +667,7 @@ struct mfi_info_component {
 char version[32];
 char build_date[16];
 char build_time[16];
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 /* Controller default settings */
 struct mfi_defaults {
@@ -710,7 +710,7 @@ struct mfi_defaults {
 uint8_t fde_only;
 uint8_t delay_during_post;
 uint8_t resv[19];
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 /* Controller default settings */
 struct mfi_bios_data {
@@ -722,7 +722,7 @@ struct mfi_bios_data {
 uint8_t expose_all_drives;
 uint8_t reserved[56];
 uint8_t check_sum;
-} __attribute__ ((packed));
+} QEMU_PACKED;
 
 /* SAS (?) controller info, returned from MFI_DCMD_CTRL_GETINFO. */
 struct mfi_ctrl_info {
@@ -807,7 +807,7 @@ struct mfi_ctrl_info {
 uint8_t min;
 uint8_t max;
 uint8_t reserved[2];
-} __attribute__ ((packed)) stripe_sz_ops;
+} QEMU_PACKED stri

[Qemu-devel] [RFC v9 01/27] virtio-blk: Remove virtqueue request handling code

2012-07-18 Thread Stefan Hajnoczi

Start with a clean slate, a virtio-blk device that supports virtio
lifecycle operations and configuration but doesn't do any actual I/O.
The I/O is going to happen in a separate optimized data plane thread.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-blk.c |  496 +--
 1 file changed, 3 insertions(+), 493 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 49990f8..a627427 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -16,18 +16,12 @@
 #include "trace.h"
 #include "blockdev.h"
 #include "virtio-blk.h"
-#include "scsi-defs.h"
-#ifdef __linux__
-# include 
-#endif
 
 typedef struct VirtIOBlock
 {
 VirtIODevice vdev;
 BlockDriverState *bs;
 VirtQueue *vq;
-void *rq;
-QEMUBH *bh;
 BlockConf *conf;
 char *serial;
 unsigned short sector_mask;
@@ -39,439 +33,11 @@ static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
 return (VirtIOBlock *)vdev;
 }
 
-typedef struct VirtIOBlockReq
-{
-VirtIOBlock *dev;
-VirtQueueElement elem;
-struct virtio_blk_inhdr *in;
-struct virtio_blk_outhdr *out;
-struct virtio_scsi_inhdr *scsi;
-QEMUIOVector qiov;
-struct VirtIOBlockReq *next;
-BlockAcctCookie acct;
-} VirtIOBlockReq;
-
-static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
-{
-VirtIOBlock *s = req->dev;
-
-trace_virtio_blk_req_complete(req, status);
-
-stb_p(&req->in->status, status);
-virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in));
-virtio_notify(&s->vdev, s->vq);
-}
-
-static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
-int is_read)
-{
-BlockErrorAction action = bdrv_get_on_error(req->dev->bs, is_read);
-VirtIOBlock *s = req->dev;
-
-if (action == BLOCK_ERR_IGNORE) {
-bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_IGNORE, is_read);
-return 0;
-}
-
-if ((error == ENOSPC && action == BLOCK_ERR_STOP_ENOSPC)
-|| action == BLOCK_ERR_STOP_ANY) {
-req->next = s->rq;
-s->rq = req;
-bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_STOP, is_read);
-vm_stop(RUN_STATE_IO_ERROR);
-bdrv_iostatus_set_err(s->bs, error);
-} else {
-virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
-bdrv_acct_done(s->bs, &req->acct);
-g_free(req);
-bdrv_emit_qmp_error_event(s->bs, BDRV_ACTION_REPORT, is_read);
-}
-
-return 1;
-}
-
-static void virtio_blk_rw_complete(void *opaque, int ret)
-{
-VirtIOBlockReq *req = opaque;
-
-trace_virtio_blk_rw_complete(req, ret);
-
-if (ret) {
-int is_read = !(ldl_p(&req->out->type) & VIRTIO_BLK_T_OUT);
-if (virtio_blk_handle_rw_error(req, -ret, is_read))
-return;
-}
-
-virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
-bdrv_acct_done(req->dev->bs, &req->acct);
-g_free(req);
-}
-
-static void virtio_blk_flush_complete(void *opaque, int ret)
-{
-VirtIOBlockReq *req = opaque;
-
-if (ret) {
-if (virtio_blk_handle_rw_error(req, -ret, 0)) {
-return;
-}
-}
-
-virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
-bdrv_acct_done(req->dev->bs, &req->acct);
-g_free(req);
-}
-
-static VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
-{
-VirtIOBlockReq *req = g_malloc(sizeof(*req));
-req->dev = s;
-req->qiov.size = 0;
-req->next = NULL;
-return req;
-}
-
-static VirtIOBlockReq *virtio_blk_get_request(VirtIOBlock *s)
-{
-VirtIOBlockReq *req = virtio_blk_alloc_request(s);
-
-if (req != NULL) {
-if (!virtqueue_pop(s->vq, &req->elem)) {
-g_free(req);
-return NULL;
-}
-}
-
-return req;
-}
-
-#ifdef __linux__
-static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
-{
-struct sg_io_hdr hdr;
-int ret;
-int status;
-int i;
-
-if ((req->dev->vdev.guest_features & (1 << VIRTIO_BLK_F_SCSI)) == 0) {
-virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
-g_free(req);
-return;
-}
-
-/*
- * We require at least one output segment each for the virtio_blk_outhdr
- * and the SCSI command block.
- *
- * We also at least require the virtio_blk_inhdr, the virtio_scsi_inhdr
- * and the sense buffer pointer in the input segments.
- */
-if (req->elem.out_num < 2 || req->elem.in_num < 3) {
-virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
-g_free(req);
-return;
-}
-
-/*
- * No support for bidirection commands yet.
- */
-if (req->elem.out_num > 2 && req->elem.in_num > 3) {
-virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
-g_free(req);
-return;
-}
-
-/*
- * The scsi inhdr is placed in the second-to-last input segment, just
- * before the regular inhdr.
- */
-req->scsi = (void *)req->elem.in_sg[req->elem.in_num - 2].iov_base;
-
-memset(&hdr, 0, sizeof(struct s

[Qemu-devel] [RFC v9 10/27] virtio-blk: Stop data plane thread cleanly

2012-07-18 Thread Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi 
---
 hw/dataplane/event-poll.h |   79 ---
 hw/dataplane/ioq.h|   65 +--
 hw/dataplane/vring.h  |6 +-
 hw/virtio-blk.c   |  154 +++--
 4 files changed, 243 insertions(+), 61 deletions(-)

diff --git a/hw/dataplane/event-poll.h b/hw/dataplane/event-poll.h
index f38e969..acd85e1 100644
--- a/hw/dataplane/event-poll.h
+++ b/hw/dataplane/event-poll.h
@@ -5,17 +5,40 @@
 #include "event_notifier.h"
 
 typedef struct EventHandler EventHandler;
-typedef void EventCallback(EventHandler *handler);
+typedef bool EventCallback(EventHandler *handler);
 struct EventHandler
 {
-EventNotifier *notifier;/* eventfd */
-EventCallback *callback;/* callback function */
+EventNotifier *notifier;/* eventfd */
+EventCallback *callback;/* callback function */
 };
 
 typedef struct {
-int epoll_fd;   /* epoll(2) file descriptor */
+int epoll_fd;   /* epoll(2) file descriptor */
+EventNotifier stop_notifier;/* stop poll notifier */
+EventHandler stop_handler;  /* stop poll handler */
 } EventPoll;
 
+/* Add an event notifier and its callback for polling */
+static void event_poll_add(EventPoll *poll, EventHandler *handler, EventNotifier *notifier, EventCallback *callback)
+{
+struct epoll_event event = {
+.events = EPOLLIN,
+.data.ptr = handler,
+};
+handler->notifier = notifier;
+handler->callback = callback;
+if (epoll_ctl(poll->epoll_fd, EPOLL_CTL_ADD, event_notifier_get_fd(notifier), &event) != 0) {
+fprintf(stderr, "failed to add event handler to epoll: %m\n");
+exit(1);
+}
+}
+
+/* Event callback for stopping the event_poll_run() loop */
+static bool handle_stop(EventHandler *handler)
+{
+return false; /* stop event loop */
+}
+
 static void event_poll_init(EventPoll *poll)
 {
 /* Create epoll file descriptor */
@@ -24,35 +47,29 @@ static void event_poll_init(EventPoll *poll)
 fprintf(stderr, "epoll_create1 failed: %m\n");
 exit(1);
 }
+
+/* Set up stop notifier */
+if (event_notifier_init(&poll->stop_notifier, 0) < 0) {
+fprintf(stderr, "failed to init stop notifier\n");
+exit(1);
+}
+event_poll_add(poll, &poll->stop_handler,
+   &poll->stop_notifier, handle_stop);
 }
 
 static void event_poll_cleanup(EventPoll *poll)
 {
+event_notifier_cleanup(&poll->stop_notifier);
 close(poll->epoll_fd);
 poll->epoll_fd = -1;
 }
 
-/* Add an event notifier and its callback for polling */
-static void event_poll_add(EventPoll *poll, EventHandler *handler, EventNotifier *notifier, EventCallback *callback)
-{
-struct epoll_event event = {
-.events = EPOLLIN,
-.data.ptr = handler,
-};
-handler->notifier = notifier;
-handler->callback = callback;
-if (epoll_ctl(poll->epoll_fd, EPOLL_CTL_ADD, event_notifier_get_fd(notifier), &event) != 0) {
-fprintf(stderr, "failed to add event handler to epoll: %m\n");
-exit(1);
-}
-}
-
 /* Block until the next event and invoke its callback
  *
  * Signals must be masked, EINTR should never happen.  This is true for QEMU
  * threads.
  */
-static void event_poll(EventPoll *poll)
+static bool event_poll(EventPoll *poll)
 {
 EventHandler *handler;
 struct epoll_event event;
@@ -73,7 +90,27 @@ static void event_poll(EventPoll *poll)
 event_notifier_test_and_clear(handler->notifier);
 
 /* Handle the event */
-handler->callback(handler);
+return handler->callback(handler);
+}
+
+static void event_poll_run(EventPoll *poll)
+{
+while (event_poll(poll)) {
+/* do nothing */
+}
+}
+
+/* Stop the event_poll_run() loop
+ *
+ * This function can be used from another thread.
+ */
+static void event_poll_stop(EventPoll *poll)
+{
+uint64_t dummy = 1;
+int eventfd = event_notifier_get_fd(&poll->stop_notifier);
+ssize_t unused __attribute__((unused));
+
+unused = write(eventfd, &dummy, sizeof dummy);
 }
 
 #endif /* EVENT_POLL_H */
diff --git a/hw/dataplane/ioq.h b/hw/dataplane/ioq.h
index 26ca307..7200e87 100644
--- a/hw/dataplane/ioq.h
+++ b/hw/dataplane/ioq.h
@@ -3,10 +3,10 @@
 
 typedef struct {
 int fd; /* file descriptor */
-unsigned int maxreqs;   /* max length of freelist and queue */
+unsigned int max_reqs;   /* max length of freelist and queue */
 
 io_context_t io_ctx;/* Linux AIO context */
-EventNotifier notifier; /* Linux AIO eventfd */
+EventNotifier io_notifier;  /* Linux AIO eventfd */
 
 /* Requests can complete in any order so a free list is necessary to manage
  * available iocbs.
@@ -19,25 +19,28 @@ typedef struct {
 unsigned int queue_idx;
 } IOQueue;
 
-static void ioq_init(IOQueue *ioq, int fd, unsigned int maxreqs)
+static void ioq

[Qemu-devel] [RFC v9 04/27] virtio-blk: Map vring

2012-07-18 Thread Stefan Hajnoczi

Map the vring to host memory so it can be accessed without the overhead
of the QEMU memory functions.
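
For readers unfamiliar with the ring layout, here is a minimal sketch of where the three parts that get mapped (descriptor table, avail ring, used ring) sit relative to the ring base. It assumes the legacy layout arithmetic used by vring_size() in Linux's virtio_ring.h; the type and function names are hypothetical and not part of this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: byte offsets and lengths of the vring parts. */
typedef struct {
    size_t desc_off, desc_len;    /* descriptor table */
    size_t avail_off, avail_len;  /* avail ring (driver -> device) */
    size_t used_off, used_len;    /* used ring (device -> driver) */
    size_t total;                 /* bytes covered by the whole vring */
} ExampleVringLayout;

/* Compute the legacy vring layout for num descriptors.
 * align must be a power of two (4096 for legacy virtio-pci). */
static ExampleVringLayout example_vring_layout(unsigned num, size_t align)
{
    ExampleVringLayout l;

    assert(align != 0 && (align & (align - 1)) == 0);

    l.desc_off = 0;
    l.desc_len = 16 * (size_t)num;             /* 16-byte descriptors */
    l.avail_off = l.desc_len;
    l.avail_len = 2 * (3 + (size_t)num);       /* flags, idx, ring[], used_event */
    /* The used ring starts at the next align boundary after the avail ring */
    l.used_off = (l.avail_off + l.avail_len + align - 1) & ~(align - 1);
    l.used_len = 2 * 3 + 8 * (size_t)num;      /* flags, idx, avail_event, ring[] */
    l.total = l.used_off + l.used_len;
    return l;
}
```

With num = 128 and align = 4096 the used ring lands exactly one page after the descriptor table, which is why mapping each part once at startup is enough as long as the ring does not move.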

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-blk.c |   44 
 1 file changed, 44 insertions(+)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index f6043bc..4c790a3 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "qemu-common.h"
 #include "qemu-thread.h"
 #include "qemu-error.h"
@@ -43,6 +44,8 @@ typedef struct VirtIOBlock
 bool data_plane_started;
 QemuThread data_plane_thread;
 
+struct vring vring;
+
 int epoll_fd;   /* epoll(2) file descriptor */
 io_context_t io_ctx;/* Linux AIO context */
 EventNotifier io_notifier;  /* Linux AIO eventfd */
@@ -55,6 +58,43 @@ static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
 return (VirtIOBlock *)vdev;
 }
 
+/* Map the guest's vring to host memory
+ *
+ * This is not allowed but we know the ring won't move.
+ */
+static void map_vring(struct vring *vring, VirtIODevice *vdev, int n)
+{
+target_phys_addr_t physaddr, len;
+
+vring->num = virtio_queue_get_num(vdev, n);
+
+physaddr = virtio_queue_get_desc_addr(vdev, n);
+len = virtio_queue_get_desc_size(vdev, n);
+vring->desc = cpu_physical_memory_map(physaddr, &len, 0);
+
+physaddr = virtio_queue_get_avail_addr(vdev, n);
+len = virtio_queue_get_avail_size(vdev, n);
+vring->avail = cpu_physical_memory_map(physaddr, &len, 0);
+
+physaddr = virtio_queue_get_used_addr(vdev, n);
+len = virtio_queue_get_used_size(vdev, n);
+vring->used = cpu_physical_memory_map(physaddr, &len, 0);
+
+if (!vring->desc || !vring->avail || !vring->used) {
+fprintf(stderr, "virtio-blk failed to map vring\n");
+exit(1);
+}
+
+fprintf(stderr, "virtio-blk vring physical=%#lx desc=%p avail=%p used=%p\n",
+virtio_queue_get_ring_addr(vdev, n),
+vring->desc, vring->avail, vring->used);
+}
+
+static void unmap_vring(struct vring *vring, VirtIODevice *vdev, int n)
+{
+cpu_physical_memory_unmap(vring->desc, virtio_queue_get_ring_size(vdev, n), 0, 0);
+}
+
 static void handle_io(void)
 {
 fprintf(stderr, "io completion happened\n");
@@ -109,6 +149,8 @@ static void add_event_handler(int epoll_fd, EventHandler *event_handler)
 
 static void data_plane_start(VirtIOBlock *s)
 {
+map_vring(&s->vring, &s->vdev, 0);
+
 /* Create epoll file descriptor */
 s->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
 if (s->epoll_fd < 0) {
@@ -157,6 +199,8 @@ static void data_plane_stop(VirtIOBlock *s)
 s->vdev.binding->set_host_notifier(s->vdev.binding_opaque, 0, false);
 
 close(s->epoll_fd);
+
+unmap_vring(&s->vring, &s->vdev, 0);
 }
 
 static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t val)
-- 
1.7.10.4

[Qemu-devel] [RFC v9 07/27] virtio-blk: Put dataplane code into its own directory

2012-07-18 Thread Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi 
---
 hw/dataplane/event-poll.h |   79 +++
 hw/dataplane/vring.h  |  191 +
 hw/virtio-blk.c   |  149 ---
 3 files changed, 304 insertions(+), 115 deletions(-)
 create mode 100644 hw/dataplane/event-poll.h
 create mode 100644 hw/dataplane/vring.h

diff --git a/hw/dataplane/event-poll.h b/hw/dataplane/event-poll.h
new file mode 100644
index 0000000..f38e969
--- /dev/null
+++ b/hw/dataplane/event-poll.h
@@ -0,0 +1,79 @@
+#ifndef EVENT_POLL_H
+#define EVENT_POLL_H
+
+#include 
+#include "event_notifier.h"
+
+typedef struct EventHandler EventHandler;
+typedef void EventCallback(EventHandler *handler);
+struct EventHandler
+{
+EventNotifier *notifier;/* eventfd */
+EventCallback *callback;/* callback function */
+};
+
+typedef struct {
+int epoll_fd;   /* epoll(2) file descriptor */
+} EventPoll;
+
+static void event_poll_init(EventPoll *poll)
+{
+/* Create epoll file descriptor */
+poll->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
+if (poll->epoll_fd < 0) {
+fprintf(stderr, "epoll_create1 failed: %m\n");
+exit(1);
+}
+}
+
+static void event_poll_cleanup(EventPoll *poll)
+{
+close(poll->epoll_fd);
+poll->epoll_fd = -1;
+}
+
+/* Add an event notifier and its callback for polling */
+static void event_poll_add(EventPoll *poll, EventHandler *handler, EventNotifier *notifier, EventCallback *callback)
+{
+struct epoll_event event = {
+.events = EPOLLIN,
+.data.ptr = handler,
+};
+handler->notifier = notifier;
+handler->callback = callback;
+if (epoll_ctl(poll->epoll_fd, EPOLL_CTL_ADD, event_notifier_get_fd(notifier), &event) != 0) {
+fprintf(stderr, "failed to add event handler to epoll: %m\n");
+exit(1);
+}
+}
+
+/* Block until the next event and invoke its callback
+ *
+ * Signals must be masked, EINTR should never happen.  This is true for QEMU
+ * threads.
+ */
+static void event_poll(EventPoll *poll)
+{
+EventHandler *handler;
+struct epoll_event event;
+int nevents;
+
+/* Wait for the next event.  Only do one event per call to keep the
+ * function simple, this could be changed later. */
+nevents = epoll_wait(poll->epoll_fd, &event, 1, -1);
+if (unlikely(nevents != 1)) {
+fprintf(stderr, "epoll_wait failed: %m\n");
+exit(1); /* should never happen */
+}
+
+/* Find out which event handler has become active */
+handler = event.data.ptr;
+
+/* Clear the eventfd */
+event_notifier_test_and_clear(handler->notifier);
+
+/* Handle the event */
+handler->callback(handler);
+}
+
+#endif /* EVENT_POLL_H */
diff --git a/hw/dataplane/vring.h b/hw/dataplane/vring.h
new file mode 100644
index 0000000..7099a99
--- /dev/null
+++ b/hw/dataplane/vring.h
@@ -0,0 +1,191 @@
+#ifndef VRING_H
+#define VRING_H
+
+#include 
+#include "qemu-common.h"
+
+typedef struct {
+void *phys_mem_zero_host_ptr;   /* host pointer to guest RAM */
+struct vring vr;/* virtqueue vring mapped to host memory */
+__u16 last_avail_idx;   /* last processed avail ring index */
+__u16 last_used_idx;/* last processed used ring index */
+} Vring;
+
+static inline unsigned int vring_get_num(Vring *vring)
+{
+return vring->vr.num;
+}
+
+/* Map target physical address to host address
+ */
+static inline void *phys_to_host(Vring *vring, target_phys_addr_t phys)
+{
+/* Adjust for 3.6-4 GB PCI memory range */
+if (phys >= 0x100000000) {
+phys -= 0x100000000 - 0xe0000000;
+} else if (phys >= 0xe0000000) {
+fprintf(stderr, "phys_to_host bad physical address in PCI range %#lx\n", phys);
+exit(1);
+}
+return vring->phys_mem_zero_host_ptr + phys;
+}
+
+/* Setup for cheap target physical to host address conversion
+ *
+ * This is a hack for direct access to guest memory, we're not really allowed
+ * to do this.
+ */
+static void setup_phys_to_host(Vring *vring)
+{
+target_phys_addr_t len = 4096; /* RAM is really much larger but we cheat */
+vring->phys_mem_zero_host_ptr = cpu_physical_memory_map(0, &len, 0);
+if (!vring->phys_mem_zero_host_ptr) {
+fprintf(stderr, "setup_phys_to_host failed\n");
+exit(1);
+}
+}
+
+/* Map the guest's vring to host memory
+ *
+ * This is not allowed but we know the ring won't move.
+ */
+static void vring_setup(Vring *vring, VirtIODevice *vdev, int n)
+{
+setup_phys_to_host(vring);
+
+vring_init(&vring->vr, virtio_queue_get_num(vdev, n),
+   phys_to_host(vring, virtio_queue_get_ring_addr(vdev, n)), 4096);
+
+vring->last_avail_idx = vring->vr.avail->idx;
+vring->last_used_idx = vring->vr.used->idx;
+
+fprintf(stderr, "vring physical=%#lx desc=%p avail=%p used=%p\n",
+virtio_queue_get_ring_addr(vdev, n),
+vring->vr.desc, vring->vr.

[Qemu-devel] [RFC v9 08/27] virtio-blk: Read requests from the vring

2012-07-18 Thread Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi 
---
 hw/dataplane/vring.h |8 +--
 hw/virtio-blk.c  |   62 ++
 2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/hw/dataplane/vring.h b/hw/dataplane/vring.h
index 7099a99..b07d4f6 100644
--- a/hw/dataplane/vring.h
+++ b/hw/dataplane/vring.h
@@ -76,7 +76,7 @@ static void vring_setup(Vring *vring, VirtIODevice *vdev, int n)
  * Stolen from linux-2.6/drivers/vhost/vhost.c.
  */
 static unsigned int vring_pop(Vring *vring,
- struct iovec iov[], unsigned int iov_size,
+ struct iovec iov[], struct iovec *iov_end,
  unsigned int *out_num, unsigned int *in_num)
 {
struct vring_desc desc;
@@ -138,10 +138,14 @@ static unsigned int vring_pop(Vring *vring,
return ret;
}
continue; */
-fprintf(stderr, "virtio-blk indirect vring not supported\n");
+fprintf(stderr, "Indirect vring not supported\n");
 exit(1);
}
 
+if (iov >= iov_end) {
+fprintf(stderr, "Not enough vring iovecs\n");
+exit(1);
+}
 iov->iov_base = phys_to_host(vring, desc.addr);
 iov->iov_len  = desc.len;
 iov++;
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 2c1cce8..91f1bab 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -24,6 +24,7 @@
 enum {
 SEG_MAX = 126,  /* maximum number of I/O segments */
 VRING_MAX = SEG_MAX + 2,/* maximum number of vring descriptors */
+REQ_MAX = VRING_MAX / 2,/* maximum number of requests in the vring */
 };
 
 typedef struct VirtIOBlock
@@ -58,20 +59,63 @@ static void handle_io(EventHandler *handler)
 fprintf(stderr, "io completion happened\n");
 }
 
+static void process_request(struct iovec iov[], unsigned int out_num, unsigned int in_num)
+{
+/* Virtio block requests look like this: */
+struct virtio_blk_outhdr *outhdr; /* iov[0] */
+/* data[]... */
+struct virtio_blk_inhdr *inhdr;   /* iov[out_num + in_num - 1] */
+
+if (unlikely(out_num == 0 || in_num == 0 ||
+iov[0].iov_len != sizeof *outhdr ||
+iov[out_num + in_num - 1].iov_len != sizeof *inhdr)) {
+fprintf(stderr, "virtio-blk invalid request\n");
+exit(1);
+}
+
+outhdr = iov[0].iov_base;
+inhdr = iov[out_num + in_num - 1].iov_base;
+
+fprintf(stderr, "virtio-blk request type=%#x sector=%#lx\n",
+outhdr->type, outhdr->sector);
+}
+
 static void handle_notify(EventHandler *handler)
 {
 VirtIOBlock *s = container_of(handler, VirtIOBlock, notify_handler);
-struct iovec iov[VRING_MAX];
-unsigned int out_num, in_num;
-int head;
 
-head = vring_pop(&s->vring, iov, ARRAY_SIZE(iov), &out_num, &in_num);
-if (unlikely(head >= vring_get_num(&s->vring))) {
-fprintf(stderr, "false alarm, nothing on vring\n");
-return;
-}
+/* There is one array of iovecs into which all new requests are extracted
+ * from the vring.  Requests are read from the vring and the translated
+ * descriptors are written to the iovecs array.  The iovecs do not have to
+ * persist across handle_notify() calls because the kernel copies the
+ * iovecs on io_submit().
+ *
+ * Handling io_submit() EAGAIN may require storing the requests across
+ * handle_notify() calls until the kernel has sufficient resources to
+ * accept more I/O.  This is not implemented yet.
+ */
+struct iovec iovec[VRING_MAX];
+struct iovec *iov, *end = &iovec[VRING_MAX];
+
+/* When a request is read from the vring, the index of the first descriptor
+ * (aka head) is returned so that the completed request can be pushed onto
+ * the vring later.
+ *
+ * The number of hypervisor read-only iovecs is out_num.  The number of
+ * hypervisor write-only iovecs is in_num.
+ */
+unsigned int head, out_num = 0, in_num = 0;
+
+for (iov = iovec; ; iov += out_num + in_num) {
+head = vring_pop(&s->vring, iov, end, &out_num, &in_num);
+if (head >= vring_get_num(&s->vring)) {
+break; /* no more requests */
+}
+
+fprintf(stderr, "head=%u out_num=%u in_num=%u\n", head, out_num, in_num);
 
-fprintf(stderr, "head=%u out_num=%u in_num=%u\n", head, out_num, in_num);
+process_request(iov, out_num, in_num);
+}
 }
 
 static void *data_plane_thread(void *opaque)
-- 
1.7.10.4

[Qemu-devel] [RFC v9 12/27] virtio-blk: Add workaround for BUG_ON() dependency in virtio_ring.h

2012-07-18 Thread Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi 
---
 hw/dataplane/vring.h |5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/dataplane/vring.h b/hw/dataplane/vring.h
index 3eab4b4..44ef4a9 100644
--- a/hw/dataplane/vring.h
+++ b/hw/dataplane/vring.h
@@ -1,6 +1,11 @@
 #ifndef VRING_H
 #define VRING_H
 
+/* Some virtio_ring.h files use BUG_ON() */
+#ifndef BUG_ON
+#define BUG_ON(x)
+#endif
+
 #include 
 #include "qemu-common.h"
 
-- 
1.7.10.4

[Qemu-devel] [RFC v9 23/27] virtio-blk: Stub out SCSI commands

2012-07-18 Thread Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-blk.c |   25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 51807b5..8734029 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -215,14 +215,8 @@ static void process_request(IOQueue *ioq, struct iovec iov[], unsigned int out_n
 
 /* TODO Linux sets the barrier bit even when not advertised! */
 uint32_t type = outhdr->type & ~VIRTIO_BLK_T_BARRIER;
-
-if (unlikely(type & ~(VIRTIO_BLK_T_OUT | VIRTIO_BLK_T_FLUSH))) {
-fprintf(stderr, "virtio-blk unsupported request type %#x\n", outhdr->type);
-exit(1);
-}
-
 struct iocb *iocb;
-switch (type & (VIRTIO_BLK_T_OUT | VIRTIO_BLK_T_FLUSH)) {
+switch (type & (VIRTIO_BLK_T_OUT | VIRTIO_BLK_T_SCSI_CMD | VIRTIO_BLK_T_FLUSH)) {
 case VIRTIO_BLK_T_IN:
 if (unlikely(out_num != 1)) {
 fprintf(stderr, "virtio-blk invalid read request\n");
@@ -239,6 +233,21 @@ static void process_request(IOQueue *ioq, struct iovec iov[], unsigned int out_n
 iocb = ioq_rdwr(ioq, false, &iov[1], out_num - 1, outhdr->sector * 512UL); /* TODO is it always 512? */
 break;
 
+case VIRTIO_BLK_T_SCSI_CMD:
+if (unlikely(in_num == 0)) {
+fprintf(stderr, "virtio-blk invalid SCSI command request\n");
+exit(1);
+}
+
+/* TODO support SCSI commands */
+{
+VirtIOBlock *s = container_of(ioq, VirtIOBlock, ioqueue);
+inhdr->status = VIRTIO_BLK_S_UNSUPP;
+vring_push(&s->vring, head, sizeof *inhdr);
+virtio_blk_notify_guest(s);
+}
+return;
+
 case VIRTIO_BLK_T_FLUSH:
 if (unlikely(in_num != 1 || out_num != 1)) {
 fprintf(stderr, "virtio-blk invalid flush request\n");
@@ -256,7 +265,7 @@ static void process_request(IOQueue *ioq, struct iovec iov[], unsigned int out_n
 return;
 
 default:
-fprintf(stderr, "virtio-blk multiple request type bits set\n");
+fprintf(stderr, "virtio-blk unsupported request type %#x\n", outhdr->type);
 exit(1);
 }
 
-- 
1.7.10.4

[Qemu-devel] [RFC v9 18/27] virtio-blk: Call ioctl() directly instead of irqfd

2012-07-18 Thread Stefan Hajnoczi

Optimize for the MSI-X enabled and vector unmasked case where it is
possible to issue the KVM ioctl() directly instead of using irqfd.

This patch introduces a new virtio binding function which tries to
notify in a thread-safe way.  If this is not possible, the function
returns false.  Virtio block then knows to use irqfd as a fallback.
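
The control flow described above can be sketched roughly as follows. This is a standalone illustration with hypothetical types and names (ExampleBindings, ExampleTransport and friends), not the code from this patch: the optional fast-path hook is tried first, and the fallback runs only when the hook is absent or reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical binding table, loosely modelled on a transport binding. */
typedef struct {
    bool (*try_notify_from_thread)(void *opaque, unsigned vector); /* may be NULL */
    void (*notify_fallback)(void *opaque);                          /* always set */
} ExampleBindings;

/* Hypothetical transport state; the counters only exist for illustration. */
typedef struct {
    bool msix_enabled;
    bool vector_masked;
    int fast_count;
    int fallback_count;
} ExampleTransport;

/* Fast path: succeeds only when MSI-X is enabled and the vector is unmasked. */
static bool example_try_notify(void *opaque, unsigned vector)
{
    ExampleTransport *t = opaque;

    (void)vector;
    if (!t->msix_enabled || t->vector_masked) {
        return false;
    }
    t->fast_count++; /* stands in for issuing the interrupt directly */
    return true;
}

/* Slow path: stands in for signalling the irqfd. */
static void example_fallback(void *opaque)
{
    ExampleTransport *t = opaque;

    t->fallback_count++;
}

/* Try the fast path first and fall back if it is unavailable or fails. */
static void example_notify_guest(const ExampleBindings *b, void *opaque,
                                 unsigned vector)
{
    assert(b->notify_fallback != NULL);
    if (b->try_notify_from_thread &&
        b->try_notify_from_thread(opaque, vector)) {
        return;
    }
    b->notify_fallback(opaque);
}
```

Keeping the hook optional means transports that cannot notify from another thread simply leave it unset and always take the fallback.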
---
 hw/msix.c   |   17 +
 hw/msix.h   |1 +
 hw/virtio-blk.c |   10 --
 hw/virtio-pci.c |8 
 hw/virtio.c |9 +
 hw/virtio.h |3 +++
 6 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 7955221..3308604 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -503,6 +503,23 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 stl_le_phys(address, data);
 }
 
+bool msix_try_notify_from_thread(PCIDevice *dev, unsigned vector)
+{
+if (unlikely(vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])) {
+return false;
+}
+if (unlikely(msix_is_masked(dev, vector))) {
+return false;
+}
+#ifdef KVM_CAP_IRQCHIP
+if (likely(kvm_enabled() && kvm_irqchip_in_kernel())) {
+kvm_set_irq(dev->msix_irq_entries[vector].gsi, 1, NULL);
+return true;
+}
+#endif
+return false;
+}
+
 void msix_reset(PCIDevice *dev)
 {
 if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
diff --git a/hw/msix.h b/hw/msix.h
index a8661e1..99fb08f 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -26,6 +26,7 @@ void msix_vector_unuse(PCIDevice *dev, unsigned vector);
 void msix_unuse_all_vectors(PCIDevice *dev);
 
 void msix_notify(PCIDevice *dev, unsigned vector);
+bool msix_try_notify_from_thread(PCIDevice *dev, unsigned vector);
 
 void msix_reset(PCIDevice *dev);
 
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index bdff68a..efeffa0 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -82,6 +82,12 @@ static void virtio_blk_notify_guest(VirtIOBlock *s)
 !(s->vdev.guest_features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY
return;
 
+/* Try to issue the ioctl() directly for speed */
+if (likely(virtio_queue_try_notify_from_thread(s->vq))) {
+return;
+}
+
+/* If the fast path didn't work, use irqfd */
 event_notifier_set(virtio_queue_get_guest_notifier(s->vq));
 }
 
@@ -263,7 +269,7 @@ static void data_plane_start(VirtIOBlock *s)
 vring_setup(&s->vring, &s->vdev, 0);
 
 /* Set up guest notifier (irq) */
-if (s->vdev.binding->set_guest_notifier(s->vdev.binding_opaque, 0, true) != 0) {
+if (s->vdev.binding->set_guest_notifiers(s->vdev.binding_opaque, true) != 0) {
 fprintf(stderr, "virtio-blk failed to set guest notifier, ensure -enable-kvm is set\n");
 exit(1);
 }
@@ -315,7 +321,7 @@ static void data_plane_stop(VirtIOBlock *s)
 event_poll_cleanup(&s->event_poll);
 
 /* Clean up guest notifier (irq) */
-s->vdev.binding->set_guest_notifier(s->vdev.binding_opaque, 0, false);
+s->vdev.binding->set_guest_notifiers(s->vdev.binding_opaque, false);
 }
 
 static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t val)
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index f1e13af..03512b3 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -106,6 +106,13 @@ static void virtio_pci_notify(void *opaque, uint16_t vector)
 qemu_set_irq(proxy->pci_dev.irq[0], proxy->vdev->isr & 1);
 }
 
+static bool virtio_pci_try_notify_from_thread(void *opaque, uint16_t vector)
+{
+VirtIOPCIProxy *proxy = opaque;
+return msix_enabled(&proxy->pci_dev) &&
+   msix_try_notify_from_thread(&proxy->pci_dev, vector);
+}
+
 static void virtio_pci_save_config(void * opaque, QEMUFile *f)
 {
 VirtIOPCIProxy *proxy = opaque;
@@ -707,6 +714,7 @@ static void virtio_pci_vmstate_change(void *opaque, bool running)
 
 static const VirtIOBindings virtio_pci_bindings = {
 .notify = virtio_pci_notify,
+.try_notify_from_thread = virtio_pci_try_notify_from_thread,
 .save_config = virtio_pci_save_config,
 .load_config = virtio_pci_load_config,
 .save_queue = virtio_pci_save_queue,
diff --git a/hw/virtio.c b/hw/virtio.c
index 064aecf..a1d1a8a 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -689,6 +689,15 @@ static inline int vring_need_event(uint16_t event, uint16_t new, uint16_t old)
return (uint16_t)(new - event - 1) < (uint16_t)(new - old);
 }
 
+bool virtio_queue_try_notify_from_thread(VirtQueue *vq)
+{
+VirtIODevice *vdev = vq->vdev;
+if (likely(vdev->binding->try_notify_from_thread)) {
+return vdev->binding->try_notify_from_thread(vdev->binding_opaque, vq->vector);
+}
+return false;
+}
+
 static bool vring_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
 uint16_t old, new;
diff --git a/hw/virtio.h b/hw/virtio.h
index 400c092..2cdf2be 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -93,6 +93,7 @@ typedef struct VirtQueueElement
 
 typedef struct {
 void (*notify)(void * opaque, uint16_t vector);
+bool (*try_notify_from_thr

Re: [Qemu-devel] [RFC v9 00/27] virtio: virtio-blk data plane

2012-07-18 Thread Michael S. Tsirkin

On Wed, Jul 18, 2012 at 04:07:27PM +0100, Stefan Hajnoczi wrote:
> This series implements a dedicated thread for virtio-blk processing using
> Linux AIO for raw image files only.  It is based on qemu-kvm.git a0bc8c3 and
> somewhat old but I wanted to share it on the list since it has been mentioned
> on mailing lists and IRC recently.

BTW, are there any bugfixes here that upstream needs?
I could not tell.

-- 
MST

Re: [Qemu-devel] [PATCH 09/11] configure: Fix compile warning in utimensat/futimens test

2012-07-18 Thread Stefan Weil


Am 18.07.2012 16:10, schrieb Peter Maydell:

Fix compile warning in the utimensat/futimens test ("implicit
declaration of function 'utimensat'", ditto futimens) by
adding a missing include.

Signed-off-by: Peter Maydell 
---
  configure |1 +
  1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 9c2a84d..638e486 100755
--- a/configure
+++ b/configure
@@ -2341,6 +2341,7 @@ cat > $TMPC << EOF
  #define _ATFILE_SOURCE
  #include 
  #include 
+#include <sys/stat.h>
  
  int main(void)

  {


Reviewed-by: Stefan Weil

[Qemu-devel] [RFC v9 11/27] virtio-blk: Indirect vring and flush support

2012-07-18 Thread Stefan Hajnoczi

RHEL6 and other new guest kernels use indirect vring descriptors to
increase the number of requests that can be batched.  This fundamentally
changes vring from a scheme that requires fixed resources to something
more dynamic (although there is still an absolute maximum number of
descriptors).  Cope with indirect vrings by taking on as many requests
as we can in one go and then postponing the remaining requests until the
first batch completes.

It would be possible to switch to dynamic resource management so iovec
and iocb structs are malloced.  This would allow the entire ring to be
processed even with indirect descriptors, but would probably hit a
bottleneck when io_submit refuses to queue more requests.  Therefore,
stick with the simpler scheme for now.

Unfortunately Linux AIO does not support asynchronous fsync/fdatasync on
all files.  In particular, an O_DIRECT opened file on ext4 does not
support Linux AIO fdsync.  Work around this by performing fdatasync()
synchronously for now.
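
The synchronous workaround described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: flush_sync() is a hypothetical helper name, and the sketch assumes the caller already holds the raw file descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: flush file data synchronously with fdatasync()
 * instead of queueing an asynchronous fdsync request.
 * Returns 0 on success or a negative errno value on failure. */
static int flush_sync(int fd)
{
    if (fdatasync(fd) < 0) {
        return -errno;
    }
    return 0;
}
```

The call blocks the calling thread until the data has reached stable storage, which is the trade-off the commit message accepts "for now".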

Signed-off-by: Stefan Hajnoczi 
---
 hw/dataplane/ioq.h   |   18 -
 hw/dataplane/vring.h |  103 +++---
 hw/virtio-blk.c  |   75 ++--
 3 files changed, 144 insertions(+), 52 deletions(-)

diff --git a/hw/dataplane/ioq.h b/hw/dataplane/ioq.h
index 7200e87..d1545d6 100644
--- a/hw/dataplane/ioq.h
+++ b/hw/dataplane/ioq.h
@@ -3,7 +3,7 @@
 
 typedef struct {
 int fd; /* file descriptor */
-unsigned int max_reqs;   /* max length of freelist and queue */
+unsigned int max_reqs;  /* max length of freelist and queue */
 
 io_context_t io_ctx;/* Linux AIO context */
 EventNotifier io_notifier;  /* Linux AIO eventfd */
@@ -91,18 +91,16 @@ static struct iocb *ioq_rdwr(IOQueue *ioq, bool read, struct iovec *iov, unsigne
 return iocb;
 }
 
-static struct iocb *ioq_fdsync(IOQueue *ioq)
-{
-struct iocb *iocb = ioq_get_iocb(ioq);
-
-io_prep_fdsync(iocb, ioq->fd);
-io_set_eventfd(iocb, event_notifier_get_fd(&ioq->io_notifier));
-return iocb;
-}
-
 static int ioq_submit(IOQueue *ioq)
 {
 int rc = io_submit(ioq->io_ctx, ioq->queue_idx, ioq->queue);
+if (unlikely(rc < 0)) {
+unsigned int i;
+fprintf(stderr, "io_submit io_ctx=%#lx nr=%d iovecs=%p\n", (uint64_t)ioq->io_ctx, ioq->queue_idx, ioq->queue);
+for (i = 0; i < ioq->queue_idx; i++) {
+fprintf(stderr, "[%u] type=%#x fd=%d\n", i, ioq->queue[i]->aio_lio_opcode, ioq->queue[i]->aio_fildes);
+}
+}
 ioq->queue_idx = 0; /* reset */
 return rc;
 }
diff --git a/hw/dataplane/vring.h b/hw/dataplane/vring.h
index 70675e5..3eab4b4 100644
--- a/hw/dataplane/vring.h
+++ b/hw/dataplane/vring.h
@@ -64,6 +64,86 @@ static void vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 vring->vr.desc, vring->vr.avail, vring->vr.used);
 }
 
+static bool vring_more_avail(Vring *vring)
+{
+   return vring->vr.avail->idx != vring->last_avail_idx;
+}
+
+/* This is stolen from linux-2.6/drivers/vhost/vhost.c. */
+static bool get_indirect(Vring *vring,
+   struct iovec iov[], struct iovec *iov_end,
+   unsigned int *out_num, unsigned int *in_num,
+   struct vring_desc *indirect)
+{
+   struct vring_desc desc;
+   unsigned int i = 0, count, found = 0;
+
+   /* Sanity check */
+   if (unlikely(indirect->len % sizeof desc)) {
+   fprintf(stderr, "Invalid length in indirect descriptor: "
+  "len 0x%llx not multiple of 0x%zx\n",
+  (unsigned long long)indirect->len,
+  sizeof desc);
+   exit(1);
+   }
+
+   count = indirect->len / sizeof desc;
+   /* Buffers are chained via a 16 bit next field, so
+* we can have at most 2^16 of these. */
+   if (unlikely(count > USHRT_MAX + 1)) {
+   fprintf(stderr, "Indirect buffer length too big: %d\n",
+  indirect->len);
+exit(1);
+   }
+
+/* Point to translate indirect desc chain */
+indirect = phys_to_host(vring, indirect->addr);
+
+   /* We will use the result as an address to read from, so most
+* architectures only need a compiler barrier here. */
+   __sync_synchronize(); /* read_barrier_depends(); */
+
+   do {
+   if (unlikely(++found > count)) {
+   fprintf(stderr, "Loop detected: last one at %u "
+  "indirect size %u\n",
+  i, count);
+   exit(1);
+   }
+
+desc = *indirect++;
+   if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
+   fprintf(stderr, "Nested indirect descriptor\n");
+exit(1);
+   }
+
+/* Stop for now if there are not enough iovecs available. */
+if (iov

Re: [Qemu-devel] [PATCH 07/11] configure: Fix compile warning in PNG test

2012-07-18 Thread Stefan Weil


Am 18.07.2012 16:10, schrieb Peter Maydell:

Fix compile warning (variable 'png_ptr' set but not used) in the
PNG detection test code.

Signed-off-by: Peter Maydell 
---
  configure |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/configure b/configure
index aced52e..784325a 100755
--- a/configure
+++ b/configure
@@ -1727,7 +1727,7 @@ cat > $TMPC &1; then


Reviewed-by: Stefan Weil

[Qemu-devel] [RFC v9 21/27] virtio-blk: Add basic request merging

2012-07-18 Thread Stefan Hajnoczi

This commit adds an I/O scheduler that sorts requests and merges
adjacent requests if they have the same operation type (read/write).
The code is ugly and not very well factored but it does merge
successfully.
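
As a rough illustration of the sort-then-merge idea (not the code in this patch, which works on struct iocb and a caller-supplied merge function), here is a minimal sketch. The Req type and merge_requests() are hypothetical names introduced only for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified request; the patch operates on struct iocb. */
typedef struct {
    bool is_write;      /* operation type */
    long long offset;   /* byte offset into the image file */
    size_t nbytes;      /* request length in bytes */
} Req;

/* Order by operation type first, then by offset. */
static int req_cmp(const void *a, const void *b)
{
    const Req *ra = a;
    const Req *rb = b;

    if (ra->is_write != rb->is_write) {
        return ra->is_write - rb->is_write;
    }
    if (ra->offset < rb->offset) {
        return -1;
    }
    if (ra->offset > rb->offset) {
        return 1;
    }
    return 0;
}

/* Sort reqs in place, fold each request into its predecessor when both
 * have the same type and are exactly adjacent, and return the new count. */
static size_t merge_requests(Req *reqs, size_t count)
{
    size_t i, j = 0;

    if (count == 0) {
        return 0;
    }
    qsort(reqs, count, sizeof reqs[0], req_cmp);

    for (i = 1; i < count; i++) {
        Req *last = &reqs[j];

        if (last->is_write == reqs[i].is_write &&
            last->offset + (long long)last->nbytes == reqs[i].offset) {
            last->nbytes += reqs[i].nbytes;   /* merge into predecessor */
        } else {
            reqs[++j] = reqs[i];              /* keep as a separate request */
        }
    }
    return j + 1;
}
```

Two 4 KB reads at offsets 0 and 4096 would collapse into one 8 KB read, while a write at offset 0 stays separate because its type differs.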
---
 hw/dataplane/ioq.h |3 +-
 hw/dataplane/iosched.h |   51 +-
 hw/dataplane/vring.h   |4 +--
 hw/virtio-blk.c|   93 +++-
 4 files changed, 122 insertions(+), 29 deletions(-)

diff --git a/hw/dataplane/ioq.h b/hw/dataplane/ioq.h
index d1545d6..72e5fd6 100644
--- a/hw/dataplane/ioq.h
+++ b/hw/dataplane/ioq.h
@@ -96,7 +96,7 @@ static int ioq_submit(IOQueue *ioq)
 int rc = io_submit(ioq->io_ctx, ioq->queue_idx, ioq->queue);
 if (unlikely(rc < 0)) {
 unsigned int i;
-fprintf(stderr, "io_submit io_ctx=%#lx nr=%d iovecs=%p\n", (uint64_t)ioq->io_ctx, ioq->queue_idx, ioq->queue);
+fprintf(stderr, "io_submit failed io_ctx=%#lx nr=%d iovecs=%p rc=%d\n", (uint64_t)ioq->io_ctx, ioq->queue_idx, ioq->queue, rc);
 for (i = 0; i < ioq->queue_idx; i++) {
 fprintf(stderr, "[%u] type=%#x fd=%d\n", i, ioq->queue[i]->aio_lio_opcode, ioq->queue[i]->aio_fildes);
 }
@@ -121,7 +121,6 @@ static int ioq_run_completion(IOQueue *ioq, IOQueueCompletion *completion, void
 ssize_t ret = ((uint64_t)events[i].res2 << 32) | events[i].res;
 
 completion(events[i].obj, ret, opaque);
-ioq_put_iocb(ioq, events[i].obj);
 }
 return nevents;
 }
diff --git a/hw/dataplane/iosched.h b/hw/dataplane/iosched.h
index 12ebccc..39da73c 100644
--- a/hw/dataplane/iosched.h
+++ b/hw/dataplane/iosched.h
@@ -9,6 +9,8 @@ typedef struct {
 unsigned long sched_calls;
 } IOSched;
 
+typedef void MergeFunc(struct iocb *a, struct iocb *b);
+
 static int iocb_cmp(const void *a, const void *b)
 {
 const struct iocb *iocb_a = a;
@@ -29,10 +31,10 @@ static int iocb_cmp(const void *a, const void *b)
 
 static size_t iocb_nbytes(struct iocb *iocb)
 {
-struct iovec *iov = iocb->u.c.buf;
+const struct iovec *iov = iocb->u.v.vec;
 size_t nbytes = 0;
 size_t i;
-for (i = 0; i < iocb->u.c.nbytes; i++) {
+for (i = 0; i < iocb->u.v.nr; i++) {
 nbytes += iov->iov_len;
 iov++;
 }
@@ -44,35 +46,52 @@ static void iosched_init(IOSched *iosched)
 memset(iosched, 0, sizeof *iosched);
 }
 
-static void iosched_print_stats(IOSched *iosched)
+static __attribute__((unused)) void iosched_print_stats(IOSched *iosched)
 {
 fprintf(stderr, "iocbs = %lu merges = %lu sched_calls = %lu\n",
 iosched->iocbs, iosched->merges, iosched->sched_calls);
 memset(iosched, 0, sizeof *iosched);
 }
 
-static void iosched(IOSched *iosched, struct iocb *unsorted[], unsigned int count)
+static void iosched(IOSched *iosched, struct iocb *unsorted[], unsigned int *count, MergeFunc merge_func)
 {
-struct iocb *sorted[count];
-struct iocb *last;
-unsigned int i;
+struct iocb *sorted[*count];
+unsigned int merges = 0;
+unsigned int i, j;
 
+/*
 if ((++iosched->sched_calls % 1000) == 0) {
 iosched_print_stats(iosched);
 }
+*/
+
+if (!*count) {
+return;
+}
 
 memcpy(sorted, unsorted, sizeof sorted);
-qsort(sorted, count, sizeof sorted[0], iocb_cmp);
-
-iosched->iocbs += count;
-last = sorted[0];
-for (i = 1; i < count; i++) {
-if (last->aio_lio_opcode == sorted[i]->aio_lio_opcode &&
-last->u.c.offset + iocb_nbytes(last) == sorted[i]->u.c.offset) {
-iosched->merges++;
+qsort(sorted, *count, sizeof sorted[0], iocb_cmp);
+
+unsorted[0] = sorted[0];
+j = 1;
+for (i = 1; i < *count; i++) {
+struct iocb *last = sorted[i - 1];
+struct iocb *cur = sorted[i];
+
+if (last->aio_lio_opcode == cur->aio_lio_opcode &&
+last->u.c.offset + iocb_nbytes(last) == cur->u.c.offset) {
+merge_func(last, cur);
+merges++;
+
+unsorted[j - 1] = cur;
+} else {
+unsorted[j++] = cur;
 }
-last = sorted[i];
 }
+
+iosched->merges += merges;
+iosched->iocbs += *count;
+*count = j;
 }
 
 #endif /* IOSCHED_H */
diff --git a/hw/dataplane/vring.h b/hw/dataplane/vring.h
index cdd4d4a..bbf8c86 100644
--- a/hw/dataplane/vring.h
+++ b/hw/dataplane/vring.h
@@ -29,7 +29,7 @@ static inline void *phys_to_host(Vring *vring, target_phys_addr_t phys)
 if (phys >= 0x100000000) {
 phys -= 0x100000000 - 0xe0000000;
 } else if (phys >= 0xe0000000) {
-fprintf(stderr, "phys_to_host bad physical address in PCI range %#lx\n", phys);
+fprintf(stderr, "phys_to_host bad physical address in PCI range %#lx\n", (unsigned long)phys);
 exit(1);
 }
 return vring->phys_mem_zero_host_ptr + phys;
@@ -65,7 +65,7 @@ static void vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 vring->last_used_idx = 0;
 
 fprintf

Re: [Qemu-devel] [RFC v9 00/27] virtio: virtio-blk data plane

2012-07-18 Thread Michael S. Tsirkin

On Wed, Jul 18, 2012 at 04:07:27PM +0100, Stefan Hajnoczi wrote:
> This series implements a dedicated thread for virtio-blk processing using Linux
> AIO for raw image files only.  It is based on qemu-kvm.git a0bc8c3 and somewhat
> old but I wanted to share it on the list since it has been mentioned on mailing
> lists and IRC recently.
> 
> These patches can be used for benchmarking and discussion about how to improve
> block performance.  Paolo Bonzini has also worked in this area and might want
> to share his patches.
> 
> The basic approach is:
> 1. Each virtio-blk device has a thread dedicated to handling ioeventfd
>signalling when the guest kicks the virtqueue.
> 2. Requests are processed without going through the QEMU block layer using
>Linux AIO directly.
> 3. Completion interrupts are injected via ioctl from the dedicated thread.
> 
> The series also contains request merging as a bdrv_aio_multiwrite() equivalent.
> This was only to get a comparison against the QEMU block layer and I would drop
> it for other types of analysis.
> 
> The effect of this series is that O_DIRECT Linux AIO on raw files can bypass
> the QEMU global mutex and block layer.  This means higher performance.

Do you have any numbers at all?

> A cleaned up version of this approach could be added to QEMU as a raw O_DIRECT
> Linux AIO fast path.  Image file formats, protocols, and other block layer
> features are not supported by virtio-blk-data-plane.
> 
> Git repo:
> http://repo.or.cz/w/qemu-kvm/stefanha.git/shortlog/refs/heads/virtio-blk-data-plane
> 
> Stefan Hajnoczi (27):
>   virtio-blk: Remove virtqueue request handling code
>   virtio-blk: Set up host notifier for data plane
>   virtio-blk: Data plane thread event loop
>   virtio-blk: Map vring
>   virtio-blk: Do cheapest possible memory mapping
>   virtio-blk: Take PCI memory range into account
>   virtio-blk: Put dataplane code into its own directory
>   virtio-blk: Read requests from the vring
>   virtio-blk: Add Linux AIO queue
>   virtio-blk: Stop data plane thread cleanly
>   virtio-blk: Indirect vring and flush support
>   virtio-blk: Add workaround for BUG_ON() dependency in virtio_ring.h
>   virtio-blk: Increase max requests for indirect vring
>   virtio-blk: Use pthreads instead of qemu-thread
>   notifier: Add a function to set the notifier
>   virtio-blk: Kick data plane thread using event notifier set
>   virtio-blk: Use guest notifier to raise interrupts
>   virtio-blk: Call ioctl() directly instead of irqfd
>   virtio-blk: Disable guest->host notifies while processing vring
>   virtio-blk: Add ioscheduler to detect mergable requests
>   virtio-blk: Add basic request merging
>   virtio-blk: Fix request merging
>   virtio-blk: Stub out SCSI commands
>   virtio-blk: fix incorrect length
>   msix: fix irqchip breakage in msix_try_notify_from_thread()
>   msix: use upstream kvm_irqchip_set_irq()
>   virtio-blk: add EVENT_IDX support to dataplane
> 
>  event_notifier.c  |7 +
>  event_notifier.h  |1 +
>  hw/dataplane/event-poll.h |  116 +++
>  hw/dataplane/ioq.h|  128 
>  hw/dataplane/iosched.h|   97 ++
>  hw/dataplane/vring.h  |  334 
>  hw/msix.c |   15 +
>  hw/msix.h |1 +
>  hw/virtio-blk.c   |  753 +
>  hw/virtio-pci.c   |8 +
>  hw/virtio.c   |9 +
>  hw/virtio.h   |3 +
>  12 files changed, 1074 insertions(+), 398 deletions(-)
>  create mode 100644 hw/dataplane/event-poll.h
>  create mode 100644 hw/dataplane/ioq.h
>  create mode 100644 hw/dataplane/iosched.h
>  create mode 100644 hw/dataplane/vring.h
> 
> -- 
> 1.7.10.4

[Qemu-devel] [RFC v9 09/27] virtio-blk: Add Linux AIO queue

2012-07-18 Thread Stefan Hajnoczi

Requests read from the vring will be placed in a queue where they can be
merged as necessary.  Once all requests have been read from the vring,
the queue can be submitted.
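
The free list plus batch queue arrangement can be pictured with a small stand-alone sketch. It is an assumption-laden simplification rather than the patch itself: MiniQueue and the mq_* functions are hypothetical names, plain ints stand in for struct iocb, and mq_submit() only reports how many entries would be handed to io_submit().

```c
#include <stddef.h>

enum { MAX_REQS = 4 };

/* Hypothetical miniature of the queue: ints stand in for struct iocb. */
typedef struct {
    int slots[MAX_REQS];          /* backing storage for request slots */
    int *freelist[MAX_REQS];      /* slots that are currently unused */
    unsigned int freelist_idx;    /* number of entries on the free list */
    int *queue[MAX_REQS];         /* slots waiting for the next submit */
    unsigned int queue_idx;       /* number of queued entries */
} MiniQueue;

static void mq_init(MiniQueue *q)
{
    unsigned int i;

    for (i = 0; i < MAX_REQS; i++) {
        q->slots[i] = 0;
        q->freelist[i] = &q->slots[i];
    }
    q->freelist_idx = MAX_REQS;
    q->queue_idx = 0;
}

/* Take a free slot and append it to the pending queue.
 * Returns NULL when no slot is free or the queue is full. */
static int *mq_get(MiniQueue *q)
{
    int *slot;

    if (q->freelist_idx == 0 || q->queue_idx == MAX_REQS) {
        return NULL;
    }
    slot = q->freelist[--q->freelist_idx];
    q->queue[q->queue_idx++] = slot;
    return slot;
}

/* Return a completed slot to the free list. */
static void mq_put(MiniQueue *q, int *slot)
{
    if (q->freelist_idx < MAX_REQS) {
        q->freelist[q->freelist_idx++] = slot;
    }
}

/* Hand over everything queued so far in one go; returns the batch size. */
static unsigned int mq_submit(MiniQueue *q)
{
    unsigned int n = q->queue_idx;

    q->queue_idx = 0;   /* reset for the next batch */
    return n;
}
```

Because completions can arrive in any order, slots go back through mq_put() individually, while submission always drains the whole pending queue at once.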

Signed-off-by: Stefan Hajnoczi 
---
 hw/dataplane/ioq.h |  104 
 hw/virtio-blk.c|   33 -
 2 files changed, 120 insertions(+), 17 deletions(-)
 create mode 100644 hw/dataplane/ioq.h

diff --git a/hw/dataplane/ioq.h b/hw/dataplane/ioq.h
new file mode 100644
index 0000000..26ca307
--- /dev/null
+++ b/hw/dataplane/ioq.h
@@ -0,0 +1,104 @@
+#ifndef IO_QUEUE_H
+#define IO_QUEUE_H
+
+typedef struct {
+int fd; /* file descriptor */
+unsigned int maxreqs;   /* max length of freelist and queue */
+
+io_context_t io_ctx;/* Linux AIO context */
+EventNotifier notifier; /* Linux AIO eventfd */
+
+/* Requests can complete in any order so a free list is necessary to manage
+ * available iocbs.
+ */
+struct iocb **freelist; /* free iocbs */
+unsigned int freelist_idx;
+
+/* Multiple requests are queued up before submitting them all in one go */
+struct iocb **queue;/* queued iocbs */
+unsigned int queue_idx;
+} IOQueue;
+
+static void ioq_init(IOQueue *ioq, int fd, unsigned int maxreqs)
+{
+ioq->fd = fd;
+ioq->maxreqs = maxreqs;
+
+if (io_setup(maxreqs, &ioq->io_ctx) != 0) {
+fprintf(stderr, "ioq io_setup failed\n");
+exit(1);
+}
+
+if (event_notifier_init(&ioq->notifier, 0) != 0) {
+fprintf(stderr, "ioq io event notifier creation failed\n");
+exit(1);
+}
+
+ioq->freelist = g_malloc0(sizeof ioq->freelist[0] * maxreqs);
+ioq->freelist_idx = 0;
+
+ioq->queue = g_malloc0(sizeof ioq->queue[0] * maxreqs);
+ioq->queue_idx = 0;
+}
+
+static void ioq_cleanup(IOQueue *ioq)
+{
+g_free(ioq->freelist);
+g_free(ioq->queue);
+
+event_notifier_cleanup(&ioq->notifier);
+io_destroy(ioq->io_ctx);
+}
+
+static EventNotifier *ioq_get_notifier(IOQueue *ioq)
+{
+return &ioq->notifier;
+}
+
+static struct iocb *ioq_get_iocb(IOQueue *ioq)
+{
+if (unlikely(ioq->freelist_idx == 0)) {
+fprintf(stderr, "ioq underflow\n");
+exit(1);
+}
+struct iocb *iocb = ioq->freelist[--ioq->freelist_idx];
+ioq->queue[ioq->queue_idx++] = iocb;
+}
+
+static __attribute__((unused)) void ioq_put_iocb(IOQueue *ioq, struct iocb *iocb)
+{
+if (unlikely(ioq->freelist_idx == ioq->maxreqs)) {
+fprintf(stderr, "ioq overflow\n");
+exit(1);
+}
+ioq->freelist[ioq->freelist_idx++] = iocb;
+}
+
+static __attribute__((unused)) void ioq_rdwr(IOQueue *ioq, bool read, struct iovec *iov, unsigned int count, long long offset)
+{
+struct iocb *iocb = ioq_get_iocb(ioq);
+
+if (read) {
+io_prep_preadv(iocb, ioq->fd, iov, count, offset);
+} else {
+io_prep_pwritev(iocb, ioq->fd, iov, count, offset);
+}
+io_set_eventfd(iocb, event_notifier_get_fd(&ioq->notifier));
+}
+
+static __attribute__((unused)) void ioq_fdsync(IOQueue *ioq)
+{
+struct iocb *iocb = ioq_get_iocb(ioq);
+
+io_prep_fdsync(iocb, ioq->fd);
+io_set_eventfd(iocb, event_notifier_get_fd(&ioq->notifier));
+}
+
+static __attribute__((unused)) int ioq_submit(IOQueue *ioq)
+{
+int rc = io_submit(ioq->io_ctx, ioq->queue_idx, ioq->queue);
+ioq->queue_idx = 0; /* reset */
+return rc;
+}
+
+#endif /* IO_QUEUE_H */
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 91f1bab..5e1ed79 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -13,12 +13,14 @@
 
 #include 
 #include "qemu-common.h"
+#include "block_int.h"
 #include "qemu-thread.h"
 #include "qemu-error.h"
 #include "blockdev.h"
 #include "virtio-blk.h"
 #include "hw/dataplane/event-poll.h"
 #include "hw/dataplane/vring.h"
+#include "hw/dataplane/ioq.h"
 #include "kvm.h"
 
 enum {
@@ -42,9 +44,9 @@ typedef struct VirtIOBlock
 
 Vring vring;/* virtqueue vring */
 
+IOQueue ioqueue;/* Linux AIO queue (should really be per dataplane thread) */
+
 EventPoll event_poll;   /* event poller */
-io_context_t io_ctx;/* Linux AIO context */
-EventNotifier io_notifier;  /* Linux AIO eventfd */
 EventHandler io_handler;/* Linux AIO completion handler */
 EventHandler notify_handler;/* virtqueue notify handler */
 } VirtIOBlock;
@@ -128,6 +130,14 @@ static void *data_plane_thread(void *opaque)
 return NULL;
 }
 
+/* Normally the block driver passes down the fd, there's no way to get it from
+ * above.
+ */
+static int get_raw_posix_fd_hack(VirtIOBlock *s)
+{
+return *(int*)s->bs->file->opaque;
+}
+
 static void data_plane_start(VirtIOBlock *s)
 {
 vring_setup(&s->vring, &s->vdev, 0);
@@ -138,23 +148,13 @@ static void data_plane_start(VirtIOBlock *s)
 fprintf(stderr, "virtio-blk failed to set host notifier,

Re: [Qemu-devel] [PATCH] powerpc pci: fixed packing of ranges[]

2012-07-18 Thread Alexander Graf


On 07/18/2012 05:40 PM, Stefan Weil wrote:

Am 18.07.2012 10:22, schrieb Alexey Kardashevskiy:

By default, mingw-gcc packs structures in a way that preserves
binary compatibility with MS Visual C, which leads to
incorrect and unexpected padding in the PCI bus ranges property of
the sPAPR PHB.

The patch replaces __attribute__((packed)) with the stricter QEMU_PACKED,
which is actually __attribute__((gcc_struct, packed)) on Windows.

Signed-off-by: Alexey Kardashevskiy 
---
  hw/spapr_pci.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/spapr_pci.c b/hw/spapr_pci.c
index b3032d2..0261d2e 100644
--- a/hw/spapr_pci.c
+++ b/hw/spapr_pci.c
@@ -418,7 +418,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
  uint64_t child;
  uint64_t parent;
  uint64_t size;
-} __attribute__((packed)) ranges[] = {
+} QEMU_PACKED ranges[] = {
  {
  cpu_to_be32(b_ss(1)), cpu_to_be64(0),
  cpu_to_be64(phb->io_win_addr),



The patch changes sizeof(ranges[0]) from 32 to 28 bytes
and can be applied as a trivial patch.

Tested-by: Stefan Weil 
Reviewed-by: Stefan Weil 



So do you want to take it through the trivial queue? I'm fine either way.

Acked-by: Alexander Graf 


Alex

[Qemu-devel] [RFC v9 06/27] virtio-blk: Take PCI memory range into account

2012-07-18 Thread Stefan Hajnoczi

Support >4 GB physical memory accesses.
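
To make the address adjustment concrete, here is a minimal sketch of the translation. The function name phys_to_ram_offset() is hypothetical, and the two boundary constants are assumptions about the usual PC layout (a PCI window ending at 4 GB); they are not confirmed by this message.

```c
#include <stdint.h>

/* Assumed layout: guest RAM is contiguous on the host, but guest physical
 * addresses skip a PCI window that ends at 4 GB. */
#define PCI_HOLE_START 0xe0000000ULL   /* assumed start of the PCI window */
#define PCI_HOLE_END   0x100000000ULL  /* 4 GB */

/* Translate a guest physical address into an offset into the contiguous
 * RAM block. Returns UINT64_MAX for addresses inside the PCI window,
 * which are not backed by RAM. */
static uint64_t phys_to_ram_offset(uint64_t phys)
{
    if (phys >= PCI_HOLE_END) {
        /* RAM above 4 GB continues where the low RAM region stopped. */
        return phys - (PCI_HOLE_END - PCI_HOLE_START);
    }
    if (phys >= PCI_HOLE_START) {
        return UINT64_MAX;
    }
    return phys;
}
```

The sketch reports an error value for addresses in the window, whereas the patch prints a message and exits.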

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-blk.c |7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index abd9386..99654f1 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -64,6 +64,13 @@ static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
  */
 static inline void *phys_to_host(VirtIOBlock *s, target_phys_addr_t phys)
 {
+/* Adjust for 3.6-4 GB PCI memory range */
+if (phys >= 0x100000000) {
+phys -= 0x100000000 - 0xe0000000;
+} else if (phys >= 0xe0000000) {
+fprintf(stderr, "phys_to_host bad physical address in PCI range %#lx\n", phys);
+exit(1);
+}
 return s->phys_mem_zero_host_ptr + phys;
 }
 
-- 
1.7.10.4

Re: [Qemu-devel] [PATCH] powerpc pci: fixed packing of ranges[]

2012-07-18 Thread Stefan Weil


Am 18.07.2012 10:22, schrieb Alexey Kardashevskiy:

By default, mingw-gcc packs structures in a way that preserves
binary compatibility with MS Visual C, which leads to
incorrect and unexpected padding in the PCI bus ranges property of
the sPAPR PHB.

The patch replaces __attribute__((packed)) with the stricter QEMU_PACKED,
which is actually __attribute__((gcc_struct, packed)) on Windows.

Signed-off-by: Alexey Kardashevskiy 
---
  hw/spapr_pci.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/spapr_pci.c b/hw/spapr_pci.c
index b3032d2..0261d2e 100644
--- a/hw/spapr_pci.c
+++ b/hw/spapr_pci.c
@@ -418,7 +418,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
  uint64_t child;
  uint64_t parent;
  uint64_t size;
-} __attribute__((packed)) ranges[] = {
+} QEMU_PACKED ranges[] = {
  {
  cpu_to_be32(b_ss(1)), cpu_to_be64(0),
  cpu_to_be64(phb->io_win_addr),



The patch changes sizeof(ranges[0]) from 32 to 28 bytes
and can be applied as a trivial patch.
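
The 32 versus 28 byte difference comes from alignment padding after the leading 32-bit field. A small sketch, assuming GCC or Clang on a typical 64-bit ABI; the struct names and the first field name "hi" are hypothetical, chosen only to mirror the layout quoted above:

```c
#include <stddef.h>
#include <stdint.h>

/* Natural alignment: on a typical 64-bit ABI the compiler inserts 4 bytes
 * of padding after "hi" so that "child" is 8-byte aligned. */
struct ranges_natural {
    uint32_t hi;
    uint64_t child;
    uint64_t parent;
    uint64_t size;
};

/* Packed layout: no padding, so the members occupy 4 + 8 + 8 + 8 bytes. */
struct ranges_packed {
    uint32_t hi;
    uint64_t child;
    uint64_t parent;
    uint64_t size;
} __attribute__((packed));
```

With the packed attribute honoured, sizeof(struct ranges_packed) is 28 and "child" sits at offset 4, which is the layout the device tree property needs.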

Tested-by: Stefan Weil 
Reviewed-by: Stefan Weil

Re: [Qemu-devel] [RFC v9 18/27] virtio-blk: Call ioctl() directly instead of irqfd

2012-07-18 Thread Michael S. Tsirkin

On Wed, Jul 18, 2012 at 04:07:45PM +0100, Stefan Hajnoczi wrote:
> Optimize for the MSI-X enabled and vector unmasked case where it is
> possible to issue the KVM ioctl() directly instead of using irqfd.

Why? Is an ioctl faster?

> This patch introduces a new virtio binding function which tries to
> notify in a thread-safe way.  If this is not possible, the function
> returns false.  Virtio block then knows to use irqfd as a fallback.
> ---
>  hw/msix.c   |   17 +
>  hw/msix.h   |1 +
>  hw/virtio-blk.c |   10 --
>  hw/virtio-pci.c |8 
>  hw/virtio.c |9 +
>  hw/virtio.h |3 +++
>  6 files changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/msix.c b/hw/msix.c
> index 7955221..3308604 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -503,6 +503,23 @@ void msix_notify(PCIDevice *dev, unsigned vector)
>  stl_le_phys(address, data);
>  }
>  
> +bool msix_try_notify_from_thread(PCIDevice *dev, unsigned vector)
> +{
> +if (unlikely(vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])) {
> +return false;
> +}
> +if (unlikely(msix_is_masked(dev, vector))) {
> +return false;
> +}
> +#ifdef KVM_CAP_IRQCHIP
> +if (likely(kvm_enabled() && kvm_irqchip_in_kernel())) {
> +kvm_set_irq(dev->msix_irq_entries[vector].gsi, 1, NULL);
> +return true;
> +}
> +#endif
> +return false;
> +}
> +
>  void msix_reset(PCIDevice *dev)
>  {
>  if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
> diff --git a/hw/msix.h b/hw/msix.h
> index a8661e1..99fb08f 100644
> --- a/hw/msix.h
> +++ b/hw/msix.h
> @@ -26,6 +26,7 @@ void msix_vector_unuse(PCIDevice *dev, unsigned vector);
>  void msix_unuse_all_vectors(PCIDevice *dev);
>  
>  void msix_notify(PCIDevice *dev, unsigned vector);
> +bool msix_try_notify_from_thread(PCIDevice *dev, unsigned vector);
>  
>  void msix_reset(PCIDevice *dev);
>  
> diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
> index bdff68a..efeffa0 100644
> --- a/hw/virtio-blk.c
> +++ b/hw/virtio-blk.c
> @@ -82,6 +82,12 @@ static void virtio_blk_notify_guest(VirtIOBlock *s)
>  !(s->vdev.guest_features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY
>   return;
>  
> +/* Try to issue the ioctl() directly for speed */
> +if (likely(virtio_queue_try_notify_from_thread(s->vq))) {
> +return;
> +}
> +
> +/* If the fast path didn't work, use irqfd */
>  event_notifier_set(virtio_queue_get_guest_notifier(s->vq));
>  }
>  
> @@ -263,7 +269,7 @@ static void data_plane_start(VirtIOBlock *s)
>  vring_setup(&s->vring, &s->vdev, 0);
>  
>  /* Set up guest notifier (irq) */
> -if (s->vdev.binding->set_guest_notifier(s->vdev.binding_opaque, 0, true) != 0) {
> +if (s->vdev.binding->set_guest_notifiers(s->vdev.binding_opaque, true) != 0) {
>  fprintf(stderr, "virtio-blk failed to set guest notifier, ensure -enable-kvm is set\n");
>  exit(1);
>  }
> @@ -315,7 +321,7 @@ static void data_plane_stop(VirtIOBlock *s)
>  event_poll_cleanup(&s->event_poll);
>  
>  /* Clean up guest notifier (irq) */
> -s->vdev.binding->set_guest_notifier(s->vdev.binding_opaque, 0, false);
> +s->vdev.binding->set_guest_notifiers(s->vdev.binding_opaque, false);
>  }
>  
>  static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t val)
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index f1e13af..03512b3 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -106,6 +106,13 @@ static void virtio_pci_notify(void *opaque, uint16_t vector)
>  qemu_set_irq(proxy->pci_dev.irq[0], proxy->vdev->isr & 1);
>  }
>  
> +static bool virtio_pci_try_notify_from_thread(void *opaque, uint16_t vector)
> +{
> +VirtIOPCIProxy *proxy = opaque;
> +return msix_enabled(&proxy->pci_dev) &&
> +   msix_try_notify_from_thread(&proxy->pci_dev, vector);
> +}
> +
>  static void virtio_pci_save_config(void * opaque, QEMUFile *f)
>  {
>  VirtIOPCIProxy *proxy = opaque;
> @@ -707,6 +714,7 @@ static void virtio_pci_vmstate_change(void *opaque, bool running)
>  
>  static const VirtIOBindings virtio_pci_bindings = {
>  .notify = virtio_pci_notify,
> +.try_notify_from_thread = virtio_pci_try_notify_from_thread,
>  .save_config = virtio_pci_save_config,
>  .load_config = virtio_pci_load_config,
>  .save_queue = virtio_pci_save_queue,
> diff --git a/hw/virtio.c b/hw/virtio.c
> index 064aecf..a1d1a8a 100644
> --- a/hw/virtio.c
> +++ b/hw/virtio.c
> @@ -689,6 +689,15 @@ static inline int vring_need_event(uint16_t event, uint16_t new, uint16_t old)
>   return (uint16_t)(new - event - 1) < (uint16_t)(new - old);
>  }
>  
> +bool virtio_queue_try_notify_from_thread(VirtQueue *vq)
> +{
> +VirtIODevice *vdev = vq->vdev;
> +if (likely(vdev->binding->try_notify_from_thread)) {
> +return vdev->binding->try_notify_from_thread(vdev->binding_opaque, vq->vector);
> +}
> +ret

Re: [Qemu-devel] [RFC] introduce a dynamic library to expose qemu block API

2012-07-18 Thread Daniel P. Berrange

On Wed, Jul 18, 2012 at 02:58:46PM +0100, Daniel P. Berrange wrote:
> On Wed, Jul 18, 2012 at 04:51:03PM +0800, Wenchao Xia wrote:
> >   Hi, the following is an API draft. The prototypes were taken from
> > qemu/block.h, and the API prefix is changed from bdrv to qbdrvs, to indicate
> > that the related object is BlockDriverState, not BlockDriver. One issue here
> > is that it may require including block_int.h, which is not LGPL2 licensed yet.
> >   The API format is kept mostly the same as the qemu generic block layer, to
> > make it easier to implement, and easier to migrate qemu to it if
> > possible.
> 
> 
> How is error reporting dealt with, and what is the intent around
> thread safety of the APIs ?  I'd like to see a fully thread safe
> API - multiple threads can use the same 'BlockDriverState *'
> concurrently, and thread-local error reporting.

Oh, and will this library depend on glib, and will it have the
abort-on-oom behaviour QEMU has ? From a libvirt POV, we won't
use any library that aborts-on-oom.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

[Qemu-devel] [Bug 1026176] [NEW] unable to boot squashfs through mtd device

2012-07-18 Thread Vincent de RIBOU

Public bug reported:

Hi,

I have built the latest qemu archive, qemu-1.1.1, to be sure the source code is
up to date.
I then built a buildroot squashfs image, which works correctly with a command
line like:

qemu-system-i386 -m 64 -k fr -boot c -kernel images/bzImage -drive
if=ide,file=images/rootfs.squashfs  -append "root=/dev/sda"

Then I modified the command line to use a real MTD device:

qemu-system-i386 -m 64 -k fr -boot c -kernel images/bzImage -drive
if=mtd,file=images/rootfs.squashfs  -append "root=/dev/mtdblock0"

But this did not work under the kernel.
Even though mtd0 is reported in the qemu monitor (Ctrl+Alt+2), no device can be
found by the kernel, even though all the drivers needed to use it are built.

Is this feature working in qemu-1.1.1?
Did I make a mistake in my command line?

Thank you for your help.
Regards,

** Affects: qemu
 Importance: Undecided
 Status: New


** Tags: 1.1.1

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1026176

Title:
  unable to boot squashfs through mtd device

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1026176/+subscriptions

[Qemu-devel] [RFC v9 05/27] virtio-blk: Do cheapest possible memory mapping

2012-07-18 Thread Stefan Hajnoczi

Instead of using QEMU memory access functions, grab the host address of
guest physical address zero and simply add to this base address.

This not only simplifies vring mapping but will also make virtqueue
element access cheap by avoiding QEMU memory access functions in the I/O
code path.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-blk.c |   58 ---
 1 file changed, 30 insertions(+), 28 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 4c790a3..abd9386 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -51,6 +51,8 @@ typedef struct VirtIOBlock
 EventNotifier io_notifier;  /* Linux AIO eventfd */
 EventHandler io_handler;/* Linux AIO completion handler */
 EventHandler notify_handler;/* virtqueue notify handler */
+
+void *phys_mem_zero_host_ptr;   /* host pointer to guest RAM */
 } VirtIOBlock;
 
 static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
@@ -58,43 +60,44 @@ static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
 return (VirtIOBlock *)vdev;
 }
 
+/* Map target physical address to host address
+ */
+static inline void *phys_to_host(VirtIOBlock *s, target_phys_addr_t phys)
+{
+return s->phys_mem_zero_host_ptr + phys;
+}
+
+/* Setup for cheap target physical to host address conversion
+ *
+ * This is a hack for direct access to guest memory, we're not really allowed
+ * to do this.
+ */
+static void setup_phys_to_host(VirtIOBlock *s)
+{
+target_phys_addr_t len = 4096; /* RAM is really much larger but we cheat */
+s->phys_mem_zero_host_ptr = cpu_physical_memory_map(0, &len, 0);
+if (!s->phys_mem_zero_host_ptr) {
+fprintf(stderr, "setup_phys_to_host failed\n");
+exit(1);
+}
+}
+
 /* Map the guest's vring to host memory
  *
  * This is not allowed but we know the ring won't move.
  */
-static void map_vring(struct vring *vring, VirtIODevice *vdev, int n)
+static void map_vring(struct vring *vring, VirtIOBlock *s, VirtIODevice *vdev, int n)
 {
-target_phys_addr_t physaddr, len;
-
 vring->num = virtio_queue_get_num(vdev, n);
-
-physaddr = virtio_queue_get_desc_addr(vdev, n);
-len = virtio_queue_get_desc_size(vdev, n);
-vring->desc = cpu_physical_memory_map(physaddr, &len, 0);
-
-physaddr = virtio_queue_get_avail_addr(vdev, n);
-len = virtio_queue_get_avail_size(vdev, n);
-vring->avail = cpu_physical_memory_map(physaddr, &len, 0);
-
-physaddr = virtio_queue_get_used_addr(vdev, n);
-len = virtio_queue_get_used_size(vdev, n);
-vring->used = cpu_physical_memory_map(physaddr, &len, 0);
-
-if (!vring->desc || !vring->avail || !vring->used) {
-fprintf(stderr, "virtio-blk failed to map vring\n");
-exit(1);
-}
+vring->desc = phys_to_host(s, virtio_queue_get_desc_addr(vdev, n));
+vring->avail = phys_to_host(s, virtio_queue_get_avail_addr(vdev, n));
+vring->used = phys_to_host(s, virtio_queue_get_used_addr(vdev, n));
 
 fprintf(stderr, "virtio-blk vring physical=%#lx desc=%p avail=%p used=%p\n",
 virtio_queue_get_ring_addr(vdev, n),
 vring->desc, vring->avail, vring->used);
 }
 
-static void unmap_vring(struct vring *vring, VirtIODevice *vdev, int n)
-{
-cpu_physical_memory_unmap(vring->desc, virtio_queue_get_ring_size(vdev, n), 0, 0);
-}
-
 static void handle_io(void)
 {
 fprintf(stderr, "io completion happened\n");
@@ -149,7 +152,8 @@ static void add_event_handler(int epoll_fd, EventHandler *event_handler)
 
 static void data_plane_start(VirtIOBlock *s)
 {
-map_vring(&s->vring, &s->vdev, 0);
+setup_phys_to_host(s);
+map_vring(&s->vring, s, &s->vdev, 0);
 
 /* Create epoll file descriptor */
 s->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
@@ -199,8 +203,6 @@ static void data_plane_stop(VirtIOBlock *s)
 s->vdev.binding->set_host_notifier(s->vdev.binding_opaque, 0, false);
 
 close(s->epoll_fd);
-
-unmap_vring(&s->vring, &s->vdev, 0);
 }
 
 static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t val)
-- 
1.7.10.4
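
The address translation in this patch boils down to "base pointer plus guest physical address". Below is a minimal standalone sketch of that idea, assuming guest RAM is one contiguous host region. SketchDev and sketch_phys_to_host() are hypothetical names, not QEMU identifiers, and the bounds check is an addition that the patch itself does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; only the base pointer and
 * the region size matter for this sketch. */
typedef struct {
    uint8_t *ram_base;   /* host pointer for guest physical address 0 */
    size_t ram_size;     /* size of the contiguous region, in bytes */
} SketchDev;

/* Translate a guest physical address to a host pointer, or return NULL
 * if the address falls outside the single contiguous region. */
static void *sketch_phys_to_host(const SketchDev *s, uint64_t phys)
{
    if (phys >= s->ram_size) {
        return NULL;
    }
    return s->ram_base + phys;
}
```

The translation is a single add, which is why it is cheap in the I/O path. It is only valid as long as the contiguity assumption holds.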

[Qemu-devel] [RFC v9 15/27] notifier: Add a function to set the notifier

2012-07-18 Thread Stefan Hajnoczi

Although past users only needed to test and clear event notifiers, it is
useful to be able to set them too.

Signed-off-by: Stefan Hajnoczi 
---
 event_notifier.c |7 +++
 event_notifier.h |1 +
 2 files changed, 8 insertions(+)

diff --git a/event_notifier.c b/event_notifier.c
index 0b82981..006adc5 100644
--- a/event_notifier.c
+++ b/event_notifier.c
@@ -59,3 +59,10 @@ int event_notifier_test(EventNotifier *e)
 }
 return r == sizeof(value);
 }
+
+int event_notifier_set(EventNotifier *e)
+{
+uint64_t value = 1;
+int r = write(e->fd, &value, sizeof(value));
+return r == sizeof(value);
+}
diff --git a/event_notifier.h b/event_notifier.h
index 886222c..46a22f8 100644
--- a/event_notifier.h
+++ b/event_notifier.h
@@ -24,5 +24,6 @@ void event_notifier_cleanup(EventNotifier *);
 int event_notifier_get_fd(EventNotifier *);
 int event_notifier_test_and_clear(EventNotifier *);
 int event_notifier_test(EventNotifier *);
+int event_notifier_set(EventNotifier *);
 
 #endif
-- 
1.7.10.4
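
For readers unfamiliar with eventfd(2), here is a small standalone sketch of the set and test-and-clear operations on a raw file descriptor. It assumes Linux and a non-blocking eventfd. The sketch_ names are hypothetical and operate on a plain fd rather than on QEMU's EventNotifier.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Add 1 to the eventfd counter; returns 1 on success, 0 on failure. */
static int sketch_notifier_set(int fd)
{
    uint64_t value = 1;
    ssize_t r = write(fd, &value, sizeof(value));
    return r == (ssize_t)sizeof(value);
}

/* Read and reset the counter; returns 1 if it was non-zero, else 0.
 * Relies on the fd being non-blocking, so that reading an empty counter
 * fails with EAGAIN instead of blocking. */
static int sketch_notifier_test_and_clear(int fd)
{
    uint64_t value;
    ssize_t r = read(fd, &value, sizeof(value));
    return r == (ssize_t)sizeof(value);
}
```

Several sets before a clear collapse into one pending event, because the read returns the accumulated counter and resets it to zero.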

Re: [Qemu-devel] [RFC] introduce a dynamic library to expose qemu block API

2012-07-18 Thread Kevin Wolf

Am 18.07.2012 11:03, schrieb Paolo Bonzini:
> Let's get things right, and only have open/close:
> 
> int qbdrvs_open(BlockDriverState **bs, const char *filename, int flags,
> const char *format_name);
> void qbdrvs_close(BlockDriverState *bs);

What is flags?

Are we really happy with a function that can't provide the features that
-blockdev will give us?

>> int qbdrvs_img_create(const char *filename, const char *fmt,
>> const char *base_filename, const char *base_fmt,
>> char *options, uint64_t img_size, int flags);

This prototype is totally wrong. It's already not nice that the
signature of some block layer internal function looks like this.
Basically all of the options need to be replaced by something like a
single QDict.

>> /* sync access */
>> int qbdrvs_read(BlockDriverState *bs, int64_t sector_num,
>>   uint8_t *buf, int nb_sectors);
>> int qbdrvs_write(BlockDriverState *bs, int64_t sector_num,
>>const uint8_t *buf, int nb_sectors);
> 
> I would like to have also a scatter gather API (qbdrvs_readv and
> qbdrvs_writev) taking a "struct iovec *iov, int niov" instead of
> "uint8_t *buf, int nb_sectors".
> 
> flush is missing.

Yes, both very important.

Kevin

[Qemu-devel] [RFC v9 25/27] msix: fix irqchip breakage in msix_try_notify_from_thread()

2012-07-18 Thread Stefan Hajnoczi

Commit bd8b215bce453706c3951460cc7e6627ccb90314 removed #ifdef
KVM_CAP_IRQCHIP from hw/msix.c after it turned out  is not
included since msix.o is built in libhw64/.  Do the same for
msix_try_notify_from_thread() since we do not have access to
 here and hence KVM_CAP_IRQCHIP is not defined.

Signed-off-by: Stefan Hajnoczi 
---
 hw/msix.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 3308604..0ed1013 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -511,12 +511,10 @@ bool msix_try_notify_from_thread(PCIDevice *dev, unsigned vector)
 if (unlikely(msix_is_masked(dev, vector))) {
 return false;
 }
-#ifdef KVM_CAP_IRQCHIP
 if (likely(kvm_enabled() && kvm_irqchip_in_kernel())) {
 kvm_set_irq(dev->msix_irq_entries[vector].gsi, 1, NULL);
 return true;
 }
-#endif
 return false;
 }
 
-- 
1.7.10.4

[Qemu-devel] [RFC v9 14/27] virtio-blk: Use pthreads instead of qemu-thread

2012-07-18 Thread Stefan Hajnoczi

Using qemu-thread.h seemed like a nice idea but it has two limitations:

1. QEMU needs to be built with --enable-io-thread
2. qemu-kvm doesn't build with --enable-io-thread

For now just copy the pthread_create() code straight into virtio-blk.c.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-blk.c |   16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 7ae3c56..1616be5 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -11,6 +11,7 @@
  *
  */
 
+#include 
 #include 
 #include "qemu-common.h"
 #include "block_int.h"
@@ -47,7 +48,7 @@ typedef struct {
 DeviceState *qdev;
 
 bool data_plane_started;
-QemuThread data_plane_thread;
+pthread_t data_plane_thread;
 
 Vring vring;/* virtqueue vring */
 
@@ -268,7 +269,16 @@ static void data_plane_start(VirtIOBlock *s)
 }
 event_poll_add(&s->event_poll, &s->io_handler, ioq_get_notifier(&s->ioqueue), handle_io);
 
-qemu_thread_create(&s->data_plane_thread, data_plane_thread, s, QEMU_THREAD_JOINABLE);
+/* Create data plane thread */
+sigset_t set, oldset;
+sigfillset(&set);
+pthread_sigmask(SIG_SETMASK, &set, &oldset);
+if (pthread_create(&s->data_plane_thread, NULL, data_plane_thread, s) != 0)
+{
+fprintf(stderr, "pthread create failed: %m\n");
+exit(1);
+}
+pthread_sigmask(SIG_SETMASK, &oldset, NULL);
 
 s->data_plane_started = true;
 }
@@ -279,7 +289,7 @@ static void data_plane_stop(VirtIOBlock *s)
 
 /* Tell data plane thread to stop and then wait for it to return */
 event_poll_stop(&s->event_poll);
-pthread_join(s->data_plane_thread.thread, NULL);
+pthread_join(s->data_plane_thread, NULL);
 
 ioq_cleanup(&s->ioqueue);
 
-- 
1.7.10.4

[Qemu-devel] [RFC v9 03/27] virtio-blk: Data plane thread event loop

2012-07-18 Thread Stefan Hajnoczi

Add a simple event handling loop based on epoll(2).  The data plane
thread now receives virtqueue notify and Linux AIO completion events.

The data plane thread currently does not shut down.  Either it needs to
be a detached thread or have clean shutdown support.

Most of the data plane start/stop code can be done once on virtio-blk
init/cleanup instead of each time the virtio device is brought up/down
by the driver.  Only the vring address and the notify pio address
change.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-blk.c |  125 +++
 1 file changed, 116 insertions(+), 9 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 0389294..f6043bc 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -11,12 +11,25 @@
  *
  */
 
+#include 
+#include 
+#include 
 #include "qemu-common.h"
+#include "qemu-thread.h"
 #include "qemu-error.h"
-#include "trace.h"
 #include "blockdev.h"
 #include "virtio-blk.h"
 
+enum {
+SEG_MAX = 126, /* maximum number of I/O segments */
+};
+
+typedef struct
+{
+EventNotifier *notifier;/* eventfd */
+void (*handler)(void);  /* handler function */
+} EventHandler;
+
 typedef struct VirtIOBlock
 {
 VirtIODevice vdev;
@@ -28,6 +41,13 @@ typedef struct VirtIOBlock
 DeviceState *qdev;
 
 bool data_plane_started;
+QemuThread data_plane_thread;
+
+int epoll_fd;   /* epoll(2) file descriptor */
+io_context_t io_ctx;/* Linux AIO context */
+EventNotifier io_notifier;  /* Linux AIO eventfd */
+EventHandler io_handler;/* Linux AIO completion handler */
+EventHandler notify_handler;/* virtqueue notify handler */
 } VirtIOBlock;
 
 static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
@@ -35,21 +55,108 @@ static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
 return (VirtIOBlock *)vdev;
 }
 
-static void virtio_blk_data_plane_start(VirtIOBlock *s)
+static void handle_io(void)
+{
+fprintf(stderr, "io completion happened\n");
+}
+
+static void handle_notify(void)
+{
+fprintf(stderr, "virtqueue notify happened\n");
+}
+
+static void *data_plane_thread(void *opaque)
 {
+VirtIOBlock *s = opaque;
+struct epoll_event event;
+int nevents;
+EventHandler *event_handler;
+
+/* Signals are masked, EINTR should never happen */
+
+for (;;) {
+/* Wait for the next event.  Only do one event per call to keep the
+ * function simple, this could be changed later. */
+nevents = epoll_wait(s->epoll_fd, &event, 1, -1);
+if (unlikely(nevents != 1)) {
+fprintf(stderr, "epoll_wait failed: %m\n");
+continue; /* should never happen */
+}
+
+/* Find out which event handler has become active */
+event_handler = event.data.ptr;
+
+/* Clear the eventfd */
+event_notifier_test_and_clear(event_handler->notifier);
+
+/* Handle the event */
+event_handler->handler();
+}
+return NULL;
+}
+
+static void add_event_handler(int epoll_fd, EventHandler *event_handler)
+{
+struct epoll_event event = {
+.events = EPOLLIN,
+.data.ptr = event_handler,
+};
+if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, event_notifier_get_fd(event_handler->notifier), &event) != 0) {
+fprintf(stderr, "virtio-blk failed to add event handler to epoll: %m\n");
+exit(1);
+}
+}
+
+static void data_plane_start(VirtIOBlock *s)
+{
+/* Create epoll file descriptor */
+s->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
+if (s->epoll_fd < 0) {
+fprintf(stderr, "epoll_create1 failed: %m\n");
+return; /* TODO error handling */
+}
+
 if (s->vdev.binding->set_host_notifier(s->vdev.binding_opaque, 0, true) != 0) {
 fprintf(stderr, "virtio-blk failed to set host notifier\n");
-return;
+return; /* TODO error handling */
+}
+
+s->notify_handler.notifier = virtio_queue_get_host_notifier(s->vq);
+s->notify_handler.handler = handle_notify;
+add_event_handler(s->epoll_fd, &s->notify_handler);
+
+/* Create aio context */
+if (io_setup(SEG_MAX, &s->io_ctx) != 0) {
+fprintf(stderr, "virtio-blk io_setup failed\n");
+return; /* TODO error handling */
 }
 
+if (event_notifier_init(&s->io_notifier, 0) != 0) {
+fprintf(stderr, "virtio-blk io event notifier creation failed\n");
+return; /* TODO error handling */
+}
+
+s->io_handler.notifier = &s->io_notifier;
+s->io_handler.handler = handle_io;
+add_event_handler(s->epoll_fd, &s->io_handler);
+
+qemu_thread_create(&s->data_plane_thread, data_plane_thread, s, QEMU_THREAD_JOINABLE);
+
 s->data_plane_started = true;
 }
 
-static void virtio_blk_data_plane_stop(VirtIOBlock *s)
+static void data_plane_stop(VirtIOBlock *s)
 {
 s->data_plane_started = false;
 
+/* TODO stop data plane thread */
+
+event_notifier_cleanup(&s->io_no
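
The event loop described in this patch can be sketched in isolation: each handler owns an eventfd, epoll hands back the handler pointer through event.data.ptr, and the loop clears the eventfd before dispatching. The following is a minimal Linux-only sketch. SketchHandler and the sketch_ functions are hypothetical names, and unlike the patch the handler here receives a pointer to itself so that it can keep per-handler state.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

typedef struct SketchHandler {
    int fd;                                   /* eventfd to watch */
    void (*handler)(struct SketchHandler *);  /* called when fd is readable */
    int fired;                                /* number of times handler ran */
} SketchHandler;

static void sketch_count(SketchHandler *h)
{
    h->fired++;
}

/* Register the handler; epoll hands the pointer back in event.data.ptr. */
static int sketch_add_handler(int epoll_fd, SketchHandler *h)
{
    struct epoll_event event = {
        .events = EPOLLIN,
        .data.ptr = h,
    };
    return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, h->fd, &event);
}

/* Wait up to timeout_ms for one event, clear its eventfd, dispatch it.
 * Returns 1 if a handler ran, 0 on timeout, -1 on error. */
static int sketch_poll_once(int epoll_fd, int timeout_ms)
{
    struct epoll_event event;
    uint64_t value;
    int n = epoll_wait(epoll_fd, &event, 1, timeout_ms);

    if (n != 1) {
        return n;
    }
    SketchHandler *h = event.data.ptr;
    if (read(h->fd, &value, sizeof(value)) != (ssize_t)sizeof(value)) {
        return -1;
    }
    h->handler(h);
    return 1;
}
```

Handling one event per epoll_wait() call keeps the loop simple, at the cost of one system call per event.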

Re: [Qemu-devel] [RFC] introduce a dynamic library to expose qemu block API

2012-07-18 Thread Kevin Wolf

On 18.07.2012 16:12, Daniel P. Berrange wrote:
> On Wed, Jul 18, 2012 at 04:02:15PM +0200, Paolo Bonzini wrote:
>> On 18/07/2012 15:58, Daniel P. Berrange wrote:
>>> How is error reporting dealt with
>>
>> These APIs just return errno values.
> 
> Which has led to somewhat unhelpful error reporting in the past. If we're
> designing a library API it'd be nice to improve on this.

But in most cases, errno is what we get from the OS, so we can't do much
more than passing it on. Maybe we can do a bit better with bdrv_open(),
which is relatively likely to fail in qemu rather than in the kernel
because something's wrong with the content of the image.

>>> , and what is the intent around
>>> thread safety of the APIs ?  I'd like to see a fully thread safe
>>> API - multiple threads can use the same 'BlockDriverState *'
>>> concurrently, and thread-local error reporting.
>>
>> This is a bit difficult to provide, since the QEMU block layer itself is
>> not thread-safe.
> 
> Yep, I'd expect that this is something we'd need to fix when turning the
> code into a library.  NB, I don't mean to say QEMU should protect against
> an app doing stupid things like letting 2 threads write to the same area
> of the file at once. That's upto the application. I simply mean that the
> BlockDriverState shouldn't corrupt itself if 2 separate APIs are called
> concurrently on the same instance.

I think it makes sense to make the library thread-safe, even if it only
means taking the global mutex like we do in qemu. I think threading is a
good interface to allow clients to do AIO (even though possibly not the
only one we want to provide), and eventually it will match what qemu is
doing internally.

Kevin

[Qemu-devel] [RFC v9 27/27] virtio-blk: add EVENT_IDX support to dataplane

2012-07-18 Thread Stefan Hajnoczi

This patch adds support for the VIRTIO_RING_F_EVENT_IDX feature for
interrupt mitigation.  virtio-blk doesn't do anything fancy with it so
we may not see a performance improvement.  This patch will allow newer
guest kernels to run successfully.

Signed-off-by: Stefan Hajnoczi 
---
 hw/dataplane/vring.h |   65 --
 hw/virtio-blk.c  |   16 ++---
 2 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/hw/dataplane/vring.h b/hw/dataplane/vring.h
index bbf8c86..d939a22 100644
--- a/hw/dataplane/vring.h
+++ b/hw/dataplane/vring.h
@@ -14,6 +14,8 @@ typedef struct {
 struct vring vr;/* virtqueue vring mapped to host memory */
 __u16 last_avail_idx;   /* last processed avail ring index */
 __u16 last_used_idx;/* last processed used ring index */
+uint16_t signalled_used;/* EVENT_IDX state */
+bool signalled_used_valid;
 } Vring;
 
 static inline unsigned int vring_get_num(Vring *vring)
@@ -63,6 +65,8 @@ static void vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 
 vring->last_avail_idx = 0;
 vring->last_used_idx = 0;
+vring->signalled_used = 0;
+vring->signalled_used_valid = false;
 
 fprintf(stderr, "vring physical=%#lx desc=%p avail=%p used=%p\n",
 (unsigned long)virtio_queue_get_ring_addr(vdev, n),
@@ -75,21 +79,48 @@ static bool vring_more_avail(Vring *vring)
return vring->vr.avail->idx != vring->last_avail_idx;
 }
 
-/* Hint to disable guest->host notifies */
-static void vring_disable_cb(Vring *vring)
+/* Toggle guest->host notifies */
+static void vring_set_notification(VirtIODevice *vdev, Vring *vring, bool enable)
 {
-vring->vr.used->flags |= VRING_USED_F_NO_NOTIFY;
+if (vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) {
+if (enable) {
+vring_avail_event(&vring->vr) = vring->vr.avail->idx;
+}
+} else if (enable) {
+vring->vr.used->flags &= ~VRING_USED_F_NO_NOTIFY;
+} else {
+vring->vr.used->flags |= VRING_USED_F_NO_NOTIFY;
+}
 }
 
-/* Re-enable guest->host notifies
- *
- * Returns false if there are more descriptors in the ring.
- */
-static bool vring_enable_cb(Vring *vring)
+/* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */
+static bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
 {
-vring->vr.used->flags &= ~VRING_USED_F_NO_NOTIFY;
-__sync_synchronize(); /* mb() */
-return !vring_more_avail(vring);
+uint16_t old, new;
+bool v;
+/* Flush out used index updates. This is paired
+ * with the barrier that the Guest executes when enabling
+ * interrupts. */
+__sync_synchronize(); /* smp_mb() */
+
+if ((vdev->guest_features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY)) &&
+unlikely(vring->vr.avail->idx == vring->last_avail_idx)) {
+return true;
+}
+
+if (!(vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX))) {
+return !(vring->vr.avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
+}
+old = vring->signalled_used;
+v = vring->signalled_used_valid;
+new = vring->signalled_used = vring->last_used_idx;
+vring->signalled_used_valid = true;
+
+if (unlikely(!v)) {
+return true;
+}
+
+return vring_need_event(vring_used_event(&vring->vr), new, old);
 }
 
 /* This is stolen from linux-2.6/drivers/vhost/vhost.c. */
@@ -178,7 +209,7 @@ static bool get_indirect(Vring *vring,
  *
  * Stolen from linux-2.6/drivers/vhost/vhost.c.
  */
-static int vring_pop(Vring *vring,
+static int vring_pop(VirtIODevice *vdev, Vring *vring,
  struct iovec iov[], struct iovec *iov_end,
  unsigned int *out_num, unsigned int *in_num)
 {
@@ -214,6 +245,10 @@ static int vring_pop(Vring *vring,
exit(1);
}
 
+   if (vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) {
+   vring_avail_event(&vring->vr) = vring->vr.avail->idx;
+   }
+
/* When we start there are none of either input nor output. */
*out_num = *in_num = 0;
 
@@ -279,6 +314,7 @@ static int vring_pop(Vring *vring,
 static void vring_push(Vring *vring, unsigned int head, int len)
 {
struct vring_used_elem *used;
+   uint16_t new;
 
/* The virtqueue contains a ring of used buffers.  Get a pointer to the
 * next entry in that used ring. */
@@ -289,7 +325,10 @@ static void vring_push(Vring *vring, unsigned int head, int len)
/* Make sure buffer is written before we update index. */
__sync_synchronize(); /* smp_wmb() */
 
-vring->vr.used->idx = ++vring->last_used_idx;
+   new = vring->vr.used->idx = ++vring->last_used_idx;
+   if (unlikely((int16_t)(new - vring->signalled_used) < (uint16_t)1)) {
+   vring->signalled_used_valid = false;
+   }
 }
 
 #endif /* VRING_H */
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index cff2298..a3e3d8c 100644
--- a/hw/virtio-blk.c

Re: [Qemu-devel] [PATCH] msi/msix: added API to set MSI message address and data

2012-07-18 Thread Michael S. Tsirkin

On Wed, Jul 18, 2012 at 11:17:12PM +1000, Alexey Kardashevskiy wrote:
> On 18/07/12 22:43, Michael S. Tsirkin wrote:
> > On Thu, Jun 21, 2012 at 09:39:10PM +1000, Alexey Kardashevskiy wrote:
> >> Added (msi|msix)_set_message() functions.
> >>
> >> Currently msi_notify()/msix_notify() write to these vectors to
> >> signal the guest about an interrupt so the correct values have to
> >> written there by the guest or QEMU.
> >>
> >> For example, POWER guest never initializes MSI/MSIX vectors, instead
> >> it uses RTAS hypercalls. So in order to support MSIX for virtio-pci on
> >> POWER we have to initialize MSI/MSIX message from QEMU.
> >>
> >> Signed-off-by: Alexey Kardashevskiy 
> > 
> > So guests do enable MSI through config space, but do
> > not fill in vectors? 
> 
> Yes. msix_capability_init() calls arch_setup_msi_irqs() which does everything 
> it needs to do (i.e. calls hypervisor) before msix_capability_init() writes 
> PCI_MSIX_FLAGS_ENABLE to the PCI_MSIX_FLAGS register.
> 
> These vectors are the PCI bus addresses, the way they are set is specific for 
> a PCI host controller, I do not see why the current scheme is a bug.

It won't work with any real PCI device, will it? Real PCI devices expect
vectors to be written into their memory.

> > Very strange. Are you sure it's not
> > just a guest bug? How does it work for other PCI devices?
> 
> Did not get the question. It works the same for every PCI device under POWER 
> guest.

I mean for real PCI devices.

> > Can't we just fix guest drivers to program the vectors properly?
> > 
> > Also pls address the comment below.
> 
> Comment below.
> 
> > Thanks!
> > 
> >> ---
> >>  hw/msi.c  |   13 +
> >>  hw/msi.h  |1 +
> >>  hw/msix.c |9 +
> >>  hw/msix.h |2 ++
> >>  4 files changed, 25 insertions(+)
> >>
> >> diff --git a/hw/msi.c b/hw/msi.c
> >> index 5233204..cc6102f 100644
> >> --- a/hw/msi.c
> >> +++ b/hw/msi.c
> >> @@ -105,6 +105,19 @@ static inline uint8_t msi_pending_off(const 
> >> PCIDevice* dev, bool msi64bit)
> >>  return dev->msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : 
> >> PCI_MSI_PENDING_32);
> >>  }
> >>  
> >> +void msi_set_message(PCIDevice *dev, MSIMessage msg)
> >> +{
> >> +uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
> >> +bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
> >> +
> >> +if (msi64bit) {
> >> +pci_set_quad(dev->config + msi_address_lo_off(dev), msg.address);
> >> +} else {
> >> +pci_set_long(dev->config + msi_address_lo_off(dev), msg.address);
> >> +}
> >> +pci_set_word(dev->config + msi_data_off(dev, msi64bit), msg.data);
> >> +}
> >> +
> > 
> > Please add documentation. Something like
> > 
> > /*
> >  * Special API for POWER to configure the vectors through
> >  * a side channel. Should never be used by devices.
> >  */
> 
> 
> It is useful for any para-virtualized environment I believe, is not it?
> For s390 as well. Of course, if it supports PCI, for example, what I am not 
> sure it does though :)

I expect the normal guest to program the address into MSI register using
config accesses, same way that it enables MSI/MSIX.
Why POWER does it differently I did not yet figure out but I hope
this weirdness is not so widespread.

> >>  bool msi_enabled(const PCIDevice *dev)
> >>  {
> >>  return msi_present(dev) &&
> >> diff --git a/hw/msi.h b/hw/msi.h
> >> index 75747ab..6ec1f99 100644
> >> --- a/hw/msi.h
> >> +++ b/hw/msi.h
> >> @@ -31,6 +31,7 @@ struct MSIMessage {
> >>  
> >>  extern bool msi_supported;
> >>  
> >> +void msi_set_message(PCIDevice *dev, MSIMessage msg);
> >>  bool msi_enabled(const PCIDevice *dev);
> >>  int msi_init(struct PCIDevice *dev, uint8_t offset,
> >>   unsigned int nr_vectors, bool msi64bit, bool 
> >> msi_per_vector_mask);
> >> diff --git a/hw/msix.c b/hw/msix.c
> >> index ded3c55..5f7d6d3 100644
> >> --- a/hw/msix.c
> >> +++ b/hw/msix.c
> >> @@ -45,6 +45,15 @@ static MSIMessage msix_get_message(PCIDevice *dev, 
> >> unsigned vector)
> >>  return msg;
> >>  }
> >>  
> >> +void msix_set_message(PCIDevice *dev, int vector, struct MSIMessage msg)
> >> +{
> >> +uint8_t *table_entry = dev->msix_table_page + vector * 
> >> PCI_MSIX_ENTRY_SIZE;
> >> +
> >> +pci_set_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR, msg.address);
> >> +pci_set_long(table_entry + PCI_MSIX_ENTRY_DATA, msg.data);
> >> +table_entry[PCI_MSIX_ENTRY_VECTOR_CTRL] &= 
> >> ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
> >> +}
> >> +
> >>  /* Add MSI-X capability to the config space for the device. */
> >>  /* Given a bar and its size, add MSI-X table on top of it
> >>   * and fill MSI-X capability in the config space.
> >> diff --git a/hw/msix.h b/hw/msix.h
> >> index 50aee82..26a437e 100644
> >> --- a/hw/msix.h
> >> +++ b/hw/msix.h
> >> @@ -4,6 +4,8 @@
> >>  #include "qemu-common.h"
> >>  #include "pci.h"
> >>  
> >> +void msix_set_message(PCIDevice *dev, int vector, MSIMessage msg);
> >> +
> >>  int msix_init(PCIDevice *pde

[Qemu-devel] [RFC v9 13/27] virtio-blk: Increase max requests for indirect vring

2012-07-18 Thread Stefan Hajnoczi

With indirect vring descriptors, one can no longer assume that the
maximum number of requests is VRING_MAX / 2 (outhdr and inhdr).  Now a
single indirect descriptor can contain the outhdr and inhdr so max
requests becomes VRING_MAX.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-blk.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 591eace..7ae3c56 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -26,7 +26,9 @@
 enum {
 SEG_MAX = 126,  /* maximum number of I/O segments */
 VRING_MAX = SEG_MAX + 2,/* maximum number of vring descriptors */
-REQ_MAX = VRING_MAX / 2,/* maximum number of requests in the vring */
+REQ_MAX = VRING_MAX,/* maximum number of requests in the vring,
+ * is VRING_MAX / 2 with traditional and
+ * VRING_MAX with indirect descriptors */
 };
 
 typedef struct {
-- 
1.7.10.4

[Qemu-devel] [RFC v9 16/27] virtio-blk: Kick data plane thread using event notifier set

2012-07-18 Thread Stefan Hajnoczi

---
 hw/virtio-blk.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 1616be5..d75c187 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -339,8 +339,7 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 virtio_blk_set_status(vdev, VIRTIO_CONFIG_S_DRIVER_OK); /* start the thread */
 
 /* Now kick the thread */
-uint64_t dummy = 1;
-ssize_t unused __attribute__((unused)) = write(event_notifier_get_fd(virtio_queue_get_host_notifier(s->vq)), &dummy, sizeof dummy);
+event_notifier_set(virtio_queue_get_host_notifier(s->vq));
 }
 
 /* coalesce internal state, copy to pci i/o region 0
-- 
1.7.10.4

[Qemu-devel] [PATCH 07/11] configure: Fix compile warning in PNG test

2012-07-18 Thread Peter Maydell

Fix compile warning (variable 'png_ptr' set but not used) in the
PNG detection test code.

Signed-off-by: Peter Maydell 
---
 configure |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/configure b/configure
index aced52e..784325a 100755
--- a/configure
+++ b/configure
@@ -1727,7 +1727,7 @@ cat > $TMPC &1; then
-- 
1.7.5.4

[Qemu-devel] [RFC v9 19/27] virtio-blk: Disable guest->host notifies while processing vring

2012-07-18 Thread Stefan Hajnoczi

---
 hw/dataplane/vring.h |   28 +++-
 hw/virtio-blk.c  |   47 +++
 2 files changed, 58 insertions(+), 17 deletions(-)

diff --git a/hw/dataplane/vring.h b/hw/dataplane/vring.h
index 44ef4a9..cdd4d4a 100644
--- a/hw/dataplane/vring.h
+++ b/hw/dataplane/vring.h
@@ -69,11 +69,29 @@ static void vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 vring->vr.desc, vring->vr.avail, vring->vr.used);
 }
 
+/* Are there more descriptors available? */
 static bool vring_more_avail(Vring *vring)
 {
return vring->vr.avail->idx != vring->last_avail_idx;
 }
 
+/* Hint to disable guest->host notifies */
+static void vring_disable_cb(Vring *vring)
+{
+vring->vr.used->flags |= VRING_USED_F_NO_NOTIFY;
+}
+
+/* Re-enable guest->host notifies
+ *
+ * Returns false if there are more descriptors in the ring.
+ */
+static bool vring_enable_cb(Vring *vring)
+{
+vring->vr.used->flags &= ~VRING_USED_F_NO_NOTIFY;
+__sync_synchronize(); /* mb() */
+return !vring_more_avail(vring);
+}
+
 /* This is stolen from linux-2.6/drivers/vhost/vhost.c. */
 static bool get_indirect(Vring *vring,
struct iovec iov[], struct iovec *iov_end,
@@ -160,7 +178,7 @@ static bool get_indirect(Vring *vring,
  *
  * Stolen from linux-2.6/drivers/vhost/vhost.c.
  */
-static unsigned int vring_pop(Vring *vring,
+static int vring_pop(Vring *vring,
  struct iovec iov[], struct iovec *iov_end,
  unsigned int *out_num, unsigned int *in_num)
 {
@@ -178,9 +196,9 @@ static unsigned int vring_pop(Vring *vring,
exit(1);
}
 
-   /* If there's nothing new since last we looked, return invalid. */
+   /* If there's nothing new since last we looked. */
if (avail_idx == last_avail_idx)
-   return num;
+   return -EAGAIN;
 
/* Only get avail ring entries after they have been exposed by guest. */
__sync_synchronize(); /* smp_rmb() */
@@ -215,7 +233,7 @@ static unsigned int vring_pop(Vring *vring,
 desc = vring->vr.desc[i];
if (desc.flags & VRING_DESC_F_INDIRECT) {
if (!get_indirect(vring, iov, iov_end, out_num, in_num, &desc)) {
-return num; /* not enough iovecs, stop for now */
+return -ENOBUFS; /* not enough iovecs, stop for now */
 }
 continue;
}
@@ -225,7 +243,7 @@ static unsigned int vring_pop(Vring *vring,
  * with the current set.
  */
 if (iov >= iov_end) {
-return num;
+return -ENOBUFS;
 }
 
 iov->iov_base = phys_to_host(vring, desc.addr);
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index efeffa0..f67fdb7 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -202,7 +202,8 @@ static bool handle_notify(EventHandler *handler)
  * accept more I/O.  This is not implemented yet.
  */
 struct iovec iovec[VRING_MAX];
-struct iovec *iov, *end = &iovec[VRING_MAX];
+struct iovec *end = &iovec[VRING_MAX];
+struct iovec *iov = iovec;
 
 /* When a request is read from the vring, the index of the first descriptor
  * (aka head) is returned so that the completed request can be pushed onto
@@ -211,19 +212,41 @@ static bool handle_notify(EventHandler *handler)
  * The number of hypervisor read-only iovecs is out_num.  The number of
  * hypervisor write-only iovecs is in_num.
  */
-unsigned int head, out_num = 0, in_num = 0;
+int head;
+unsigned int out_num = 0, in_num = 0;
 
-for (iov = iovec; ; iov += out_num + in_num) {
-head = vring_pop(&s->vring, iov, end, &out_num, &in_num);
-if (head >= vring_get_num(&s->vring)) {
-break; /* no more requests */
-}
+for (;;) {
+/* Disable guest->host notifies to avoid unnecessary vmexits */
+vring_disable_cb(&s->vring);
+
+for (;;) {
+head = vring_pop(&s->vring, iov, end, &out_num, &in_num);
+if (head < 0) {
+break; /* no more requests */
+}
 
-/*
-fprintf(stderr, "out_num=%u in_num=%u head=%u\n", out_num, in_num, head);
-*/
+/*
+fprintf(stderr, "out_num=%u in_num=%u head=%d\n", out_num, in_num, head);
+*/
 
-process_request(&s->ioqueue, iov, out_num, in_num, head);
+process_request(&s->ioqueue, iov, out_num, in_num, head);
+iov += out_num + in_num;
+}
+
+if (likely(head == -EAGAIN)) { /* vring emptied */
+/* Re-enable guest->host notifies and stop processing the vring.
+ * But if the guest has snuck in more descriptors, keep processing.
+ */
+if (likely(vring_enable_cb(&s->vring))) {
+break;
+}
+} else { /* head == -ENOBUFS, cannot continue since iov
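
The disable / drain / re-enable-and-recheck pattern used in handle_notify() can be shown without any virtio headers. The sketch below assumes a single-consumer ring whose producer only advances avail_idx. SketchRing, SKETCH_NO_NOTIFY and the sketch_ functions are hypothetical stand-ins for the real vring structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { SKETCH_NO_NOTIFY = 1 };   /* stands in for VRING_USED_F_NO_NOTIFY */

typedef struct {
    volatile uint16_t avail_idx;  /* advanced by the producer (guest) */
    uint16_t last_avail_idx;      /* consumer's position */
    volatile uint16_t used_flags; /* consumer's hint to the producer */
} SketchRing;

static bool sketch_more_avail(const SketchRing *r)
{
    return r->avail_idx != r->last_avail_idx;
}

/* Re-enable notifications; returns false if work arrived in the meantime. */
static bool sketch_enable_notify(SketchRing *r)
{
    r->used_flags &= ~SKETCH_NO_NOTIFY;
    __sync_synchronize();   /* publish the flag before re-reading avail_idx */
    return !sketch_more_avail(r);
}

/* Drain the ring; returns how many entries were consumed. */
static unsigned sketch_drain(SketchRing *r)
{
    unsigned done = 0;

    for (;;) {
        r->used_flags |= SKETCH_NO_NOTIFY;  /* suppress notifications */
        while (sketch_more_avail(r)) {
            r->last_avail_idx++;            /* "process" one entry */
            done++;
        }
        if (sketch_enable_notify(r)) {
            break;                          /* still empty, safe to sleep */
        }
    }
    return done;
}
```

The recheck after re-enabling closes the window in which the producer adds an entry while notifications are still suppressed. Without it that entry could sit in the ring unnoticed.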

[Qemu-devel] [PATCH 10/11] configure: -I\$(SRC_PATH) goes in QEMU_INCLUDES not QEMU_CFLAGS

2012-07-18 Thread Peter Maydell

If the smartcard configure check passes, add '-I\$(SRC_PATH)/libcacard'
to QEMU_INCLUDES, not QEMU_CFLAGS. Otherwise the unexpanded SRC_PATH
will cause a warning in every following configure test.

Signed-off-by: Peter Maydell 
---
 configure |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 638e486..8140464 100755
--- a/configure
+++ b/configure
@@ -2656,7 +2656,7 @@ if test "$smartcard" != "no" ; then
 #include 
 int main(void) { PK11_FreeSlot(0); return 0; }
 EOF
-smartcard_cflags="-I\$(SRC_PATH)/libcacard"
+smartcard_includes="-I\$(SRC_PATH)/libcacard"
 libcacard_libs="$($pkg_config --libs nss 2>/dev/null) $glib_libs"
 libcacard_cflags="$($pkg_config --cflags nss 2>/dev/null) $glib_cflags"
 test_cflags="$libcacard_cflags"
@@ -2670,7 +2670,8 @@ EOF
 if $pkg_config --atleast-version=3.12.8 nss >/dev/null 2>&1 && \
   compile_prog "$test_cflags" "$libcacard_libs"; then
 smartcard_nss="yes"
-QEMU_CFLAGS="$QEMU_CFLAGS $smartcard_cflags $libcacard_cflags"
+QEMU_CFLAGS="$QEMU_CFLAGS $libcacard_cflags"
+QEMU_INCLUDES="$QEMU_INCLUDES $smartcard_includes"
 libs_softmmu="$libcacard_libs $libs_softmmu"
 else
 if test "$smartcard_nss" = "yes"; then
-- 
1.7.5.4

[Qemu-devel] [RFC v9 22/27] virtio-blk: Fix request merging

2012-07-18 Thread Stefan Hajnoczi

Khoa Huynh discovered that request merging is broken.
The merged iocb is not updated to reflect the total number of iovecs and
the offset is also outdated.

This patch fixes request merging.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-blk.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 9131a7a..51807b5 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -178,13 +178,17 @@ static void merge_request(struct iocb *iocb_a, struct iocb *iocb_b)
 req_a->len = iocb_nbytes(iocb_a);
 }
 
-iocb_b->u.v.vec = iovec;
-req_b->len = iocb_nbytes(iocb_b);
-req_b->next_merged = req_a;
 /*
 fprintf(stderr, "merged %p (%u) and %p (%u), %u iovecs in total\n",
 req_a, iocb_a->u.v.nr, req_b, iocb_b->u.v.nr, iocb_a->u.v.nr + iocb_b->u.v.nr);
 */
+
+iocb_b->u.v.vec = iovec;
+iocb_b->u.v.nr += iocb_a->u.v.nr;
+iocb_b->u.v.offset = iocb_a->u.v.offset;
+
+req_b->len = iocb_nbytes(iocb_b);
+req_b->next_merged = req_a;
 }
 
 static void process_request(IOQueue *ioq, struct iovec iov[], unsigned int out_num, unsigned int in_num, unsigned int head)
-- 
1.7.10.4
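
As a rough illustration of what the fix above has to keep consistent, here is a minimal standalone sketch of merging two contiguous vectored requests. The `vec_req` struct and both function names are hypothetical stand-ins for the `u.v` fields of a Linux AIO iocb; this is not the code from the patch.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical stand-in for the vectored-I/O fields of an iocb. */
struct vec_req {
    struct iovec *vec;   /* iovec array */
    unsigned int nr;     /* number of iovecs in vec */
    off_t offset;        /* file offset of the first byte */
};

/* Total number of bytes described by a request. */
static size_t vec_req_nbytes(const struct vec_req *r)
{
    size_t n = 0;
    for (unsigned int i = 0; i < r->nr; i++) {
        n += r->vec[i].iov_len;
    }
    return n;
}

/*
 * Merge request a (lower offset) into request b.  The caller provides
 * merged[] with room for a->nr + b->nr entries.  Returns 0 on success,
 * -1 if the two requests are not contiguous on disk.
 */
static int vec_req_merge(const struct vec_req *a, struct vec_req *b,
                         struct iovec *merged)
{
    if (a->offset + (off_t)vec_req_nbytes(a) != b->offset) {
        return -1;
    }
    for (unsigned int i = 0; i < a->nr; i++) {
        merged[i] = a->vec[i];
    }
    for (unsigned int i = 0; i < b->nr; i++) {
        merged[a->nr + i] = b->vec[i];
    }
    b->vec = merged;        /* point at the combined array */
    b->nr += a->nr;         /* total iovec count, the field the bug left stale */
    b->offset = a->offset;  /* merged request starts where a started */
    return 0;
}
```

All three fields of the surviving request have to change together; updating only the vector pointer leaves the count and offset describing the old, smaller request.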

[Qemu-devel] [RFC v9 17/27] virtio-blk: Use guest notifier to raise interrupts

2012-07-18 Thread Stefan Hajnoczi

The data plane thread isn't allowed to call virtio_irq() directly
because that function is not thread-safe.  Use the guest notifier just
like virtio-net to handle IRQs.

When MSI-X is in use and the vector is unmasked, the guest notifier
directly sets the IRQ inside the host kernel.  If the vector is masked,
then QEMU's iothread needs to take note of the IRQ.  If MSI-X is not in
use, then QEMU's iothread handles the IRQ and this will be slower than
synchronously calling notify_irq() from the data plane thread.
---
 hw/virtio-blk.c |   28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index d75c187..bdff68a 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -73,6 +73,18 @@ static int get_raw_posix_fd_hack(VirtIOBlock *s)
 return *(int*)s->bs->file->opaque;
 }
 
+/* Raise an interrupt to signal guest, if necessary */
+static void virtio_blk_notify_guest(VirtIOBlock *s)
+{
+/* Always notify when queue is empty (when feature acknowledged) */
+   if ((s->vring.vr.avail->flags & VRING_AVAIL_F_NO_INTERRUPT) &&
+   (s->vring.vr.avail->idx != s->vring.last_avail_idx ||
+!(s->vdev.guest_features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY))))
+   return;
+
+event_notifier_set(virtio_queue_get_guest_notifier(s->vq));
+}
+
 static void complete_request(struct iocb *iocb, ssize_t ret, void *opaque)
 {
 VirtIOBlock *s = opaque;
@@ -154,7 +166,7 @@ static void process_request(IOQueue *ioq, struct iovec iov[], unsigned int out_n
 fdatasync(get_raw_posix_fd_hack(s));
 inhdr->status = VIRTIO_BLK_S_OK;
 vring_push(&s->vring, head, sizeof *inhdr);
-virtio_irq(s->vq);
+virtio_blk_notify_guest(s);
 }
 return;
 
@@ -222,8 +234,7 @@ static bool handle_io(EventHandler *handler)
 VirtIOBlock *s = container_of(handler, VirtIOBlock, io_handler);
 
 if (ioq_run_completion(&s->ioqueue, complete_request, s) > 0) {
-/* TODO is this thread-safe and can it be done faster? */
-virtio_irq(s->vq);
+virtio_blk_notify_guest(s);
 }
 
 /* If there were more requests than iovecs, the vring will not be empty yet
@@ -251,11 +262,17 @@ static void data_plane_start(VirtIOBlock *s)
 
 vring_setup(&s->vring, &s->vdev, 0);
 
+/* Set up guest notifier (irq) */
+if (s->vdev.binding->set_guest_notifier(s->vdev.binding_opaque, 0, true) != 0) {
+fprintf(stderr, "virtio-blk failed to set guest notifier, ensure -enable-kvm is set\n");
+exit(1);
+}
+
 event_poll_init(&s->event_poll);
 
 /* Set up virtqueue notify */
 if (s->vdev.binding->set_host_notifier(s->vdev.binding_opaque, 0, true) != 0) {
-fprintf(stderr, "virtio-blk failed to set host notifier, ensure -enable-kvm is set\n");
+fprintf(stderr, "virtio-blk failed to set host notifier\n");
 exit(1);
 }
 event_poll_add(&s->event_poll, &s->notify_handler,
@@ -296,6 +313,9 @@ static void data_plane_stop(VirtIOBlock *s)
 s->vdev.binding->set_host_notifier(s->vdev.binding_opaque, 0, false);
 
 event_poll_cleanup(&s->event_poll);
+
+/* Clean up guest notifier (irq) */
+s->vdev.binding->set_guest_notifier(s->vdev.binding_opaque, 0, false);
 }
 
 static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t val)
-- 
1.7.10.4
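
The suppression check in virtio_blk_notify_guest() can be read as a pure predicate. The sketch below restates it in standalone form; the function name and the two macros are hypothetical and only mirror VRING_AVAIL_F_NO_INTERRUPT and VIRTIO_F_NOTIFY_ON_EMPTY, so treat it as an illustration of the logic rather than the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define AVAIL_F_NO_INTERRUPT   1u   /* mirrors VRING_AVAIL_F_NO_INTERRUPT */
#define F_NOTIFY_ON_EMPTY_BIT  24   /* mirrors VIRTIO_F_NOTIFY_ON_EMPTY */

/*
 * Decide whether the guest should be interrupted.  The guest may ask for
 * interrupts to be suppressed, but if it negotiated notify-on-empty it is
 * still notified once the available ring has been fully consumed.
 */
static bool should_notify_guest(uint16_t avail_flags, uint16_t avail_idx,
                                uint16_t last_avail_idx,
                                uint32_t guest_features)
{
    bool suppressed = avail_flags & AVAIL_F_NO_INTERRUPT;
    bool queue_empty = (avail_idx == last_avail_idx);
    bool notify_on_empty = guest_features & (1u << F_NOTIFY_ON_EMPTY_BIT);

    if (suppressed && !(queue_empty && notify_on_empty)) {
        return false;
    }
    return true;
}
```

This is the same condition as in the patch with De Morgan's law applied: "not empty, or feature not negotiated" becomes "not (empty and feature negotiated)".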

[Qemu-devel] [RFC v9 00/27] virtio: virtio-blk data plane

2012-07-18 Thread Stefan Hajnoczi

This series implements a dedicated thread for virtio-blk processing using Linux
AIO for raw image files only.  It is based on qemu-kvm.git a0bc8c3 and somewhat
old but I wanted to share it on the list since it has been mentioned on mailing
lists and IRC recently.

These patches can be used for benchmarking and discussion about how to improve
block performance.  Paolo Bonzini has also worked in this area and might want
to share his patches.

The basic approach is:
1. Each virtio-blk device has a thread dedicated to handling ioeventfd
   signalling when the guest kicks the virtqueue.
2. Requests are processed without going through the QEMU block layer using
   Linux AIO directly.
3. Completion interrupts are injected via ioctl from the dedicated thread.
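
To make step 1 concrete: an ioeventfd is an eventfd that the kernel signals when the guest kicks the virtqueue, and the dedicated thread blocks reading it. A minimal Linux-only sketch of that wait/kick pair follows; the function names are hypothetical, and here the kick comes from userspace rather than from KVM.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal the worker once; the kernel adds 1 to the eventfd counter. */
static int kick(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/*
 * Block until at least one kick is pending, then return how many kicks
 * were coalesced into this wakeup (0 on error).  In a data plane thread
 * this would sit at the top of the event loop.
 */
static uint64_t wait_for_kicks(int efd)
{
    uint64_t count = 0;
    if (read(efd, &count, sizeof count) != (ssize_t)sizeof count) {
        return 0;
    }
    return count;
}
```

Because the eventfd counter accumulates, several kicks that arrive while the thread is busy result in a single wakeup, after which the thread drains the whole vring.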

The series also contains request merging as a bdrv_aio_multiwrite() equivalent.
This was only to get a comparison against the QEMU block layer and I would drop
it for other types of analysis.

The effect of this series is that O_DIRECT Linux AIO on raw files can bypass
the QEMU global mutex and block layer.  This means higher performance.

A cleaned up version of this approach could be added to QEMU as a raw O_DIRECT
Linux AIO fast path.  Image file formats, protocols, and other block layer
features are not supported by virtio-blk-data-plane.

Git repo:
http://repo.or.cz/w/qemu-kvm/stefanha.git/shortlog/refs/heads/virtio-blk-data-plane

Stefan Hajnoczi (27):
  virtio-blk: Remove virtqueue request handling code
  virtio-blk: Set up host notifier for data plane
  virtio-blk: Data plane thread event loop
  virtio-blk: Map vring
  virtio-blk: Do cheapest possible memory mapping
  virtio-blk: Take PCI memory range into account
  virtio-blk: Put dataplane code into its own directory
  virtio-blk: Read requests from the vring
  virtio-blk: Add Linux AIO queue
  virtio-blk: Stop data plane thread cleanly
  virtio-blk: Indirect vring and flush support
  virtio-blk: Add workaround for BUG_ON() dependency in virtio_ring.h
  virtio-blk: Increase max requests for indirect vring
  virtio-blk: Use pthreads instead of qemu-thread
  notifier: Add a function to set the notifier
  virtio-blk: Kick data plane thread using event notifier set
  virtio-blk: Use guest notifier to raise interrupts
  virtio-blk: Call ioctl() directly instead of irqfd
  virtio-blk: Disable guest->host notifies while processing vring
  virtio-blk: Add ioscheduler to detect mergable requests
  virtio-blk: Add basic request merging
  virtio-blk: Fix request merging
  virtio-blk: Stub out SCSI commands
  virtio-blk: fix incorrect length
  msix: fix irqchip breakage in msix_try_notify_from_thread()
  msix: use upstream kvm_irqchip_set_irq()
  virtio-blk: add EVENT_IDX support to dataplane

 event_notifier.c  |7 +
 event_notifier.h  |1 +
 hw/dataplane/event-poll.h |  116 +++
 hw/dataplane/ioq.h|  128 
 hw/dataplane/iosched.h|   97 ++
 hw/dataplane/vring.h  |  334 
 hw/msix.c |   15 +
 hw/msix.h |1 +
 hw/virtio-blk.c   |  753 +
 hw/virtio-pci.c   |8 +
 hw/virtio.c   |9 +
 hw/virtio.h   |3 +
 12 files changed, 1074 insertions(+), 398 deletions(-)
 create mode 100644 hw/dataplane/event-poll.h
 create mode 100644 hw/dataplane/ioq.h
 create mode 100644 hw/dataplane/iosched.h
 create mode 100644 hw/dataplane/vring.h

-- 
1.7.10.4

[Qemu-devel] [RFC v9 24/27] virtio-blk: fix incorrect length

2012-07-18 Thread Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-blk.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 8734029..cff2298 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -131,7 +131,7 @@ static void complete_one_request(VirtIOBlockRequest *req, VirtIOBlock *s, ssize_
  * written to, but for virtio-blk it seems to be the number of bytes
  * transferred plus the status bytes.
  */
-vring_push(&s->vring, req->head, len + sizeof req->status);
+vring_push(&s->vring, req->head, len + sizeof(*req->status));
 }
 
 static bool is_request_merged(VirtIOBlockRequest *req)
-- 
1.7.10.4
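
The one-line change above is the classic pointer-versus-pointee sizeof mix-up. A minimal sketch, assuming (from the dereference in the fix) that the status field is a pointer to a single status byte; the struct and function names here are hypothetical:

```c
#include <stddef.h>

/* Hypothetical request: status points at the guest-visible status byte. */
struct request {
    unsigned char *status;
};

/* Buggy: adds the size of the pointer itself (typically 4 or 8). */
static size_t used_len_buggy(const struct request *req, size_t len)
{
    return len + sizeof req->status;
}

/* Fixed: adds the size of the byte the pointer refers to (1). */
static size_t used_len_fixed(const struct request *req, size_t len)
{
    return len + sizeof(*req->status);
}
```

sizeof does not evaluate its operand, so both forms are safe to compute even when the pointer is not yet valid; only the resulting length differs.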

[Qemu-devel] [RFC v9 26/27] msix: use upstream kvm_irqchip_set_irq()

2012-07-18 Thread Stefan Hajnoczi

Commit 9507e305ec54062fccc88fcf6fccf1898a7e7141 changed the
kvm_set_irq() function to kvm_irqchip_set_irq().

Signed-off-by: Stefan Hajnoczi 
---
 hw/msix.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/msix.c b/hw/msix.c
index 0ed1013..373017a 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -512,7 +512,7 @@ bool msix_try_notify_from_thread(PCIDevice *dev, unsigned vector)
 return false;
 }
 if (likely(kvm_enabled() && kvm_irqchip_in_kernel())) {
-kvm_set_irq(dev->msix_irq_entries[vector].gsi, 1, NULL);
+kvm_irqchip_set_irq(kvm_state, dev->msix_irq_entries[vector].gsi, 1);
 return true;
 }
 return false;
-- 
1.7.10.4

[Qemu-devel] [RFC v9 20/27] virtio-blk: Add ioscheduler to detect mergable requests

2012-07-18 Thread Stefan Hajnoczi

---
 hw/dataplane/iosched.h |   78 
 hw/virtio-blk.c|5 
 2 files changed, 83 insertions(+)
 create mode 100644 hw/dataplane/iosched.h

diff --git a/hw/dataplane/iosched.h b/hw/dataplane/iosched.h
new file mode 100644
index 0000000..12ebccc
--- /dev/null
+++ b/hw/dataplane/iosched.h
@@ -0,0 +1,78 @@
+#ifndef IOSCHED_H
+#define IOSCHED_H
+
+#include "hw/dataplane/ioq.h"
+
+typedef struct {
+unsigned long iocbs;
+unsigned long merges;
+unsigned long sched_calls;
+} IOSched;
+
+static int iocb_cmp(const void *a, const void *b)
+{
+const struct iocb *iocb_a = *(struct iocb * const *)a;
+const struct iocb *iocb_b = *(struct iocb * const *)b;
+
+/*
+ * Note that we can't simply subtract iocb_b's offset from iocb_a's
+ * here as that could overflow the return value.
+ */
+if (iocb_a->u.c.offset > iocb_b->u.c.offset) {
+return 1;
+} else if (iocb_a->u.c.offset < iocb_b->u.c.offset) {
+return -1;
+} else {
+return 0;
+}
+}
+
+static size_t iocb_nbytes(struct iocb *iocb)
+{
+struct iovec *iov = iocb->u.c.buf;
+size_t nbytes = 0;
+size_t i;
+for (i = 0; i < iocb->u.c.nbytes; i++) {
+nbytes += iov->iov_len;
+iov++;
+}
+return nbytes;
+}
+
+static void iosched_init(IOSched *iosched)
+{
+memset(iosched, 0, sizeof *iosched);
+}
+
+static void iosched_print_stats(IOSched *iosched)
+{
+fprintf(stderr, "iocbs = %lu merges = %lu sched_calls = %lu\n",
+iosched->iocbs, iosched->merges, iosched->sched_calls);
+memset(iosched, 0, sizeof *iosched);
+}
+
+static void iosched(IOSched *iosched, struct iocb *unsorted[], unsigned int count)
+{
+struct iocb *sorted[count];
+struct iocb *last;
+unsigned int i;
+
+if ((++iosched->sched_calls % 1000) == 0) {
+iosched_print_stats(iosched);
+}
+
+memcpy(sorted, unsorted, sizeof sorted);
+qsort(sorted, count, sizeof sorted[0], iocb_cmp);
+
+iosched->iocbs += count;
+last = sorted[0];
+for (i = 1; i < count; i++) {
+if (last->aio_lio_opcode == sorted[i]->aio_lio_opcode &&
+last->u.c.offset + iocb_nbytes(last) == sorted[i]->u.c.offset) {
+iosched->merges++;
+}
+last = sorted[i];
+}
+}
+
+#endif /* IOSCHED_H */
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index f67fdb7..75cb0f2 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -22,6 +22,7 @@
 #include "hw/dataplane/event-poll.h"
 #include "hw/dataplane/vring.h"
 #include "hw/dataplane/ioq.h"
+#include "hw/dataplane/iosched.h"
 #include "kvm.h"
 
 enum {
@@ -57,6 +58,7 @@ typedef struct {
 EventHandler notify_handler;/* virtqueue notify handler */
 
 IOQueue ioqueue;/* Linux AIO queue (should really be per dataplane thread) */
+IOSched iosched;/* I/O scheduler */
 VirtIOBlockRequest requests[REQ_MAX]; /* pool of requests, managed by the queue */
 } VirtIOBlock;
 
@@ -249,6 +251,8 @@ static bool handle_notify(EventHandler *handler)
 }
 }
 
+iosched(&s->iosched, s->ioqueue.queue, s->ioqueue.queue_idx);
+
 /* Submit requests, if any */
 int rc = ioq_submit(&s->ioqueue);
 if (unlikely(rc < 0)) {
@@ -289,6 +293,7 @@ static void data_plane_start(VirtIOBlock *s)
 {
 int i;
 
+iosched_init(&s->iosched);
 vring_setup(&s->vring, &s->vdev, 0);
 
 /* Set up guest notifier (irq) */
-- 
1.7.10.4
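
The detection pass in iosched.h boils down to "sort by offset, then count neighbours that touch". Below is a standalone sketch of that idea with a hypothetical `io_req` struct in place of struct iocb. Unlike the patch it sorts the caller's array in place and guards against an empty batch; both are choices made for this sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request descriptor standing in for struct iocb. */
struct io_req {
    long long offset;   /* byte offset on disk */
    size_t nbytes;      /* request length in bytes */
    int opcode;         /* read or write */
};

/* qsort() hands the comparator pointers to the array elements, and the
 * elements are themselves pointers, hence the double indirection. */
static int io_req_cmp(const void *a, const void *b)
{
    const struct io_req *ra = *(const struct io_req *const *)a;
    const struct io_req *rb = *(const struct io_req *const *)b;

    /* Compare instead of subtracting so the result cannot overflow int. */
    if (ra->offset > rb->offset) {
        return 1;
    } else if (ra->offset < rb->offset) {
        return -1;
    }
    return 0;
}

/* Sort reqs[] by offset and count adjacent pairs that could be merged. */
static unsigned int count_mergeable(struct io_req *reqs[], unsigned int count)
{
    unsigned int merges = 0;

    if (count == 0) {
        return 0;
    }
    qsort(reqs, count, sizeof reqs[0], io_req_cmp);
    for (unsigned int i = 1; i < count; i++) {
        const struct io_req *prev = reqs[i - 1];
        if (prev->opcode == reqs[i]->opcode &&
            prev->offset + (long long)prev->nbytes == reqs[i]->offset) {
            merges++;
        }
    }
    return merges;
}
```

Two requests count as mergeable only when they have the same opcode and the first one ends exactly where the second begins.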

[Qemu-devel] [RFC v9 02/27] virtio-blk: Set up host notifier for data plane

2012-07-18 Thread Stefan Hajnoczi

Set up the virtqueue notify ioeventfd that the data plane will monitor.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio-blk.c |   37 +
 1 file changed, 37 insertions(+)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index a627427..0389294 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -26,6 +26,8 @@ typedef struct VirtIOBlock
 char *serial;
 unsigned short sector_mask;
 DeviceState *qdev;
+
+bool data_plane_started;
 } VirtIOBlock;
 
 static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
@@ -33,6 +35,39 @@ static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
 return (VirtIOBlock *)vdev;
 }
 
+static void virtio_blk_data_plane_start(VirtIOBlock *s)
+{
+if (s->vdev.binding->set_host_notifier(s->vdev.binding_opaque, 0, true) != 0) {
+fprintf(stderr, "virtio-blk failed to set host notifier\n");
+return;
+}
+
+s->data_plane_started = true;
+}
+
+static void virtio_blk_data_plane_stop(VirtIOBlock *s)
+{
+s->data_plane_started = false;
+
+s->vdev.binding->set_host_notifier(s->vdev.binding_opaque, 0, false);
+}
+
+static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t val)
+{
+VirtIOBlock *s = to_virtio_blk(vdev);
+
+/* Toggle host notifier only on status change */
+if (s->data_plane_started == !!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
+return;
+}
+
+if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
+virtio_blk_data_plane_start(s);
+} else {
+virtio_blk_data_plane_stop(s);
+}
+}
+
 static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 fprintf(stderr, "virtio_blk_handle_output: should never get here,"
@@ -115,6 +150,7 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf *conf,
 
 s->vdev.get_config = virtio_blk_update_config;
 s->vdev.get_features = virtio_blk_get_features;
+s->vdev.set_status = virtio_blk_set_status;
 s->bs = conf->bs;
 s->conf = conf;
 s->serial = *serial;
@@ -122,6 +158,7 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf *conf,
 bdrv_guess_geometry(s->bs, &cylinders, &heads, &secs);
 
 s->vq = virtio_add_queue(&s->vdev, 128, virtio_blk_handle_output);
+s->data_plane_started = false;
 
 s->qdev = dev;
 bdrv_set_buffer_alignment(s->bs, conf->logical_block_size);
-- 
1.7.10.4
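
The status handler above is edge-triggered: it acts only when the DRIVER_OK bit disagrees with the current data plane state. A small standalone sketch of that toggle follows; the `plane` struct and its counters are hypothetical, the macro only mirrors VIRTIO_CONFIG_S_DRIVER_OK, and the failure path of the real start function is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONFIG_S_DRIVER_OK 4   /* mirrors VIRTIO_CONFIG_S_DRIVER_OK */

/* Hypothetical data plane state with counters to observe transitions. */
struct plane {
    bool started;
    unsigned int starts;
    unsigned int stops;
};

/* Start or stop the data plane only when DRIVER_OK actually changes. */
static void plane_set_status(struct plane *p, uint8_t val)
{
    bool want = !!(val & CONFIG_S_DRIVER_OK);

    if (p->started == want) {
        return;             /* no transition, nothing to do */
    }
    if (want) {
        p->starts++;
    } else {
        p->stops++;
    }
    p->started = want;
}
```

The `!!` normalises the masked bit to 0 or 1 so it can be compared directly with the bool; without it, the value 4 would never equal `true`.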

Re: [Qemu-devel] Can't Build i386-bsd-user on Freebsd

2012-07-18 Thread Wei-Ren Chen

  CC'ing Blue, since MAINTAINERS lists him as the bsd-user maintainer.

On Mon, Jul 16, 2012 at 07:07:38PM -0700, Paramjot Oberoi wrote:
> Hey all,
> 
> I'm having trouble building user mode BSD emulation on FreeBSD. I've tried
> 1.0.1, 1.1.1, and stable from GIT. I build by doing a: "./configure
> --target-list=i386-bsd-user", and then make with "gmake". The first error I
> get is in regards to CTLTYPE_QUAD.
> 
> /usr/home/qemu-1.0.1/bsd-user/syscall.c: In function 'sysctl_oldcvt':
> /usr/home/qemu-1.0.1/bsd-user/syscall.c:214: error: 'CTLTYPE_QUAD' undeclared (first use in this function)
> /usr/home/qemu-1.0.1/bsd-user/syscall.c:214: error: (Each undeclared identifier is reported only once
> /usr/home/qemu-1.0.1/bsd-user/syscall.c:214: error: for each function it appears in.)
> gmake[1]: *** [syscall.o] Error 1
> gmake: *** [subdir-i386-bsd-user] Error 2
> 
> To fix this error I added an #ifdef/#endif around the switch statement as
> specified in this thread:
> http://comments.gmane.org/gmane.comp.emulators.qemu/104657
> Now I'm stuck with the following error:
> 
>   CC    i386-bsd-user/helper.o
>   CC    i386-bsd-user/cpu.o
>   CC    i386-bsd-user/disas.o
>   CC    i386-bsd-user/ioport-user.o
>   LINK  i386-bsd-user/qemu-i386
> /usr/local/lib/libgthread-2.0.so: could not read symbols: File in wrong format
> gmake[1]: *** [qemu-i386] Error 1
> gmake: *** [subdir-i386-bsd-user] Error 2
> 
> Any advice on how to get it to build? I'm running 32-bit FreeBSD. Thanks in
> advance, I appreciate it.

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

[Qemu-devel] [PATCH 02/11] configure: Fix build with ALSA audio driver

2012-07-18 Thread Peter Maydell

From: Stefan Weil 

Since commit 417c9d72d48275d19c60861896efd4962d21aca2,
all configure tests normally run with -Werror.

Some of these tests now fail because they raised a compiler warning.

Here a build breakage for ALSA (configure --audio-drv-list=alsa) is fixed.

Signed-off-by: Stefan Weil 
Reviewed-by: Peter Maydell 
---
 configure |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/configure b/configure
index 383fa3d..63156a7 100755
--- a/configure
+++ b/configure
@@ -1889,7 +1889,7 @@ for drv in $audio_drv_list; do
 case $drv in
 alsa)
 audio_drv_probe $drv alsa/asoundlib.h -lasound \
-"snd_pcm_t **handle; return snd_pcm_close(*handle);"
+"return snd_pcm_close((snd_pcm_t *)0);"
 libs_softmmu="-lasound $libs_softmmu"
 ;;
 
-- 
1.7.5.4

[Qemu-devel] [PATCH 01/11] configure: Don't run configure tests with -Werror enabled

2012-07-18 Thread Peter Maydell

Don't run configure tests with -Werror in the compiler flags. The idea
of -Werror is that it makes problems very obvious to developers, so
they get fixed quickly. However, when running configure tests, failures
due to -Werror are far from obvious -- they simply result in the test
quietly failing when it should have passed. Not using -Werror is in
line with recommended practice in the Autoconf world.

This commit is essentially backing out the changes in commit 417c9d72.
Instead we fix the problem that commit was trying to address in a
different way: we add -Werror only for the test of the nss headers,
with a comment that this is specifically intended to detect a bug
in some releases of nss.

We also have to clean up a bug in the smartcard test where it was
trying to include smartcard_cflags in the test compile flags: this
would always result in a failure with -Werror, because they include
an escaped "$(SRC_PATH)" which is only valid when used in the final
makefile.

Signed-off-by: Peter Maydell 
Reviewed-by: Stefan Weil 
---
 configure |   22 ++
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index 0a3896e..383fa3d 100755
--- a/configure
+++ b/configure
@@ -1156,9 +1156,10 @@ gcc_flags="-Wold-style-declaration -Wold-style-definition -Wtype-limits"
 gcc_flags="-Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers $gcc_flags"
 gcc_flags="-Wmissing-include-dirs -Wempty-body -Wnested-externs $gcc_flags"
 gcc_flags="-fstack-protector-all -Wendif-labels $gcc_flags"
-if test "$werror" = "yes" ; then
-gcc_flags="-Werror $gcc_flags"
-fi
+# Note that we do not add -Werror to gcc_flags here, because that would
+# enable it for all configure tests. If a configure test failed due
+# to -Werror this would just silently disable some features,
+# so it's too error prone.
 cat > $TMPC << EOF
 int main(void) { return 0; }
 EOF
@@ -2656,8 +2657,16 @@ EOF
 smartcard_cflags="-I\$(SRC_PATH)/libcacard"
 libcacard_libs="$($pkg_config --libs nss 2>/dev/null) $glib_libs"
 libcacard_cflags="$($pkg_config --cflags nss 2>/dev/null) $glib_cflags"
+test_cflags="$libcacard_cflags"
+# The header files in nss < 3.13.3 have a bug which causes them to
+# emit a warning. If we're going to compile QEMU with -Werror, then
+# test that the headers don't have this bug. Otherwise we would pass
+# the configure test but fail to compile QEMU later.
+if test "$werror" = "yes"; then
+test_cflags="-Werror $test_cflags"
+fi
 if $pkg_config --atleast-version=3.12.8 nss >/dev/null 2>&1 && \
-  compile_prog "$smartcard_cflags $libcacard_cflags" "$libcacard_libs"; then
+  compile_prog "$test_cflags" "$libcacard_libs"; then
 smartcard_nss="yes"
 QEMU_CFLAGS="$QEMU_CFLAGS $smartcard_cflags $libcacard_cflags"
 libs_softmmu="$libcacard_libs $libs_softmmu"
@@ -2903,6 +2912,11 @@ if test -z "$zero_malloc" ; then
 fi
 fi
 
+# Now we've finished running tests it's OK to add -Werror to the compiler flags
+if test "$werror" = "yes"; then
+QEMU_CFLAGS="-Werror $QEMU_CFLAGS"
+fi
+
 if test "$solaris" = "no" ; then
 if $ld --version 2>/dev/null | grep "GNU ld" >/dev/null 2>/dev/null ; then
 LDFLAGS="-Wl,--warn-common $LDFLAGS"
-- 
1.7.5.4

Re: [Qemu-devel] [PATCH] improve scripts/make-release

2012-07-18 Thread Eric Blake

On 07/18/2012 08:31 AM, Gerd Hoffmann wrote:
> 'make dist' creates a tarball for the current checkout.
> 'make qemu-${version}.tar.bz2' creates a tarball for git tag v${version}.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  Makefile |5 ++---
>  scripts/make-release |8 +++-
>  2 files changed, 9 insertions(+), 4 deletions(-)

> +++ b/scripts/make-release
> @@ -12,11 +12,17 @@
>  
>  src="$1"
>  version="$2"
> +if test "$version" = ""; then

test ! "$version"

is less typing, and still POSIX compliant.

> + commit=$(git describe --long)
> + version="${commit#v}"

Sticking with my earlier theme started against Anthony's original
implementation of this script, in complaining about inconsistent shell
quoting styles, it might look nicer to favor minimal quoting:

commit=$(git describe --long)
version=${commit#v}

or maximal quoting:

commit="$(git describe --long)"
version="${commit#v}"

rather than an ad hoc mix.  But as that is cosmetic, and does not impact
functionality, you have my:

Reviewed-by: Eric Blake 

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



