date:20130809

Re: [Qemu-devel] [SeaBIOS] [PATCH] acpi: hide 64-bit PCI hole for Windows XP

2013-08-09 Thread Gerd Hoffmann

   Hi,

 pmbase is a compile-time constant (aka #define) in both seabios and
 coreboot, and making this runtime-configurable is non-trivial.  See
 src/smm.c in seabios for one reason why.

 Converting src/smm.c to use a runtime value isn't hard - just change
 the assembler from: mov $ __stringify(PORT_ACPI_PM_BASE)  + 0x04,
 %dx\n to: mov 4(my_acpi_base), %dx\n and make sure to define the
 global variable my_acpi_base as VARFSEG.

Ah, good, I give that a try.  Need to check how that works out for
coreboot though.

That leaves the mmconf xbar location.  We can continue to have everybody
agree this should be mapped @ 0xb000 and be done with it.  Making
this configurable via fw_cfg is no problem for seabios.  coreboot can't
deal with it though, it sets up the xbar _very_ early because it does
the complete pci setup via mmconf.

 In seabios we have fixed 32bit / 64bit width today, from acpi.c:

 // store pci io windows
 *(u32*)ssdt_ptr[acpi_pci32_start[0]] = cpu_to_le32(pcimem_start);
 *(u32*)ssdt_ptr[acpi_pci32_end[0]] = cpu_to_le32(pcimem_end - 1);
 if (pcimem64_start) {
 ssdt_ptr[acpi_pci64_valid[0]] = 1;
 *(u64*)ssdt_ptr[acpi_pci64_start[0]] = cpu_to_le64(pcimem64_start);
 *(u64*)ssdt_ptr[acpi_pci64_end[0]] = cpu_to_le64(pcimem64_end - 1);
 *(u64*)ssdt_ptr[acpi_pci64_length[0]] = cpu_to_le64(
 pcimem64_end - pcimem64_start);
 } else {
 ssdt_ptr[acpi_pci64_valid[0]] = 0;
 }

 Storing fixup instructions for these fields in the linker script
 shouldn't be hard I think.
 
 I don't think SeaBIOS should continue to do the above once the tables
 are moved to QEMU.  QEMU has all the info SeaBIOS has, so it can
 generate the tables correctly on its own.

The loader script provided by qemu has fixup instructions, which is
needed to fixup pointers to other acpi tables.  The idea is to use that
mechanism to also allow th firmware to fixup addresses like pmbase in
the qemu-generated tables.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access

2013-08-09 Thread Rusty Russell

Anthony Liguori anth...@codemonkey.ws writes:
 I suspect this is a premature optimization.  With a weak function called
 directly in the accessors below, I suspect you would see no measurable
 performance overhead compared to this approach.

 It's all very predictable so the CPU should do a decent job optimizing
 the if () away.

Perhaps.  I was leery of introducing performance regressions, but the
actual I/O tends to dominate anyway.

So I tested this, by adding the patch (below) and benchmarking
qemu-system-i386 on my laptop before and after.

Setup: Intel(R) Core(TM) i5 CPU   M 560  @ 2.67GHz
(Performance cpu governer enabled)
Guest: virtio user net, virtio block on raw file, 1 CPU, 512MB RAM.
(Qemu run under eatmydata to eliminate syncs)

First test: ping -f -c 1 -q 10.0.2.0 (100 times)
(Ping chosen since packets stay in qemu's user net code)

BEFORE:
MIN: 824ms
MAX: 914ms
AVG: 876.95ms
STDDEV: 16ms

AFTER:
MIN: 872ms
MAX: 933ms
AVG: 904.35ms
STDDEV: 15ms

Second test: dd if=/dev/vda iflag=direct count=1 of=/dev/null (100 times)

BEFORE:
MIN: 0.927994sec
MAX: 1.051640sec
AVG: 0.99733sec
STDDEV: 0.028sec

AFTER:
MIN: 0.941706sec
MAX: 1.034810sec
AVG: 0.988692sec
STDDEV: 0.021sec

So, we can notice performance on ping, but anything which does actual IO
is a wash.

Cheers,
Rusty.

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 2887f17..df8733b 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -85,20 +85,6 @@ struct VirtQueue
 EventNotifier host_notifier;
 };
 
-#ifdef TARGET_VIRTIO_SWAPENDIAN
-bool virtio_byteswap;
-
-/* Ask target code if we should swap endian for all vring and config access. */
-static void mark_endian(void)
-{
-virtio_byteswap = virtio_swap_endian();
-}
-#else
-static void mark_endian(void)
-{
-}
-#endif
-
 /* virt queue functions */
 static void virtqueue_init(VirtQueue *vq)
 {
@@ -540,9 +526,6 @@ void virtio_set_status(VirtIODevice *vdev, uint8_t val)
 VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
 trace_virtio_set_status(vdev, val);
 
-/* If guest virtio endian is uncertain, set it now. */
-mark_endian();
-
 if (k-set_status) {
 k-set_status(vdev, val);
 }
diff --git a/include/hw/virtio/virtio-access.h 
b/include/hw/virtio/virtio-access.h
index b1d531e..ea4166a 100644
--- a/include/hw/virtio/virtio-access.h
+++ b/include/hw/virtio/virtio-access.h
@@ -13,18 +13,9 @@
 #ifndef _QEMU_VIRTIO_ACCESS_H
 #define _QEMU_VIRTIO_ACCESS_H
 
-#ifdef TARGET_VIRTIO_SWAPENDIAN
-/* Architectures which need biendian define this function. */
-extern bool virtio_swap_endian(void);
-
-extern bool virtio_byteswap;
-#else
-#define virtio_byteswap false
-#endif
-
 static inline uint16_t virtio_lduw_phys(hwaddr pa)
 {
-if (virtio_byteswap) {
+if (cpu_get_byteswap()) {
 return bswap16(lduw_phys(pa));
 }
 return lduw_phys(pa);
@@ -33,7 +24,7 @@ static inline uint16_t virtio_lduw_phys(hwaddr pa)
 
 static inline uint32_t virtio_ldl_phys(hwaddr pa)
 {
-if (virtio_byteswap) {
+if (cpu_get_byteswap()) {
 return bswap32(ldl_phys(pa));
 }
 return ldl_phys(pa);
@@ -41,7 +32,7 @@ static inline uint32_t virtio_ldl_phys(hwaddr pa)
 
 static inline uint64_t virtio_ldq_phys(hwaddr pa)
 {
-if (virtio_byteswap) {
+if (cpu_get_byteswap()) {
 return bswap64(ldq_phys(pa));
 }
 return ldq_phys(pa);
@@ -49,7 +40,7 @@ static inline uint64_t virtio_ldq_phys(hwaddr pa)
 
 static inline void virtio_stw_phys(hwaddr pa, uint16_t value)
 {
-if (virtio_byteswap) {
+if (cpu_get_byteswap()) {
 stw_phys(pa, bswap16(value));
 } else {
 stw_phys(pa, value);
@@ -58,7 +49,7 @@ static inline void virtio_stw_phys(hwaddr pa, uint16_t value)
 
 static inline void virtio_stl_phys(hwaddr pa, uint32_t value)
 {
-if (virtio_byteswap) {
+if (cpu_get_byteswap()) {
 stl_phys(pa, bswap32(value));
 } else {
 stl_phys(pa, value);
@@ -67,7 +58,7 @@ static inline void virtio_stl_phys(hwaddr pa, uint32_t value)
 
 static inline void virtio_stw_p(void *ptr, uint16_t v)
 {
-if (virtio_byteswap) {
+if (cpu_get_byteswap()) {
stw_p(ptr, bswap16(v));
 } else {
stw_p(ptr, v);
@@ -76,7 +67,7 @@ static inline void virtio_stw_p(void *ptr, uint16_t v)
 
 static inline void virtio_stl_p(void *ptr, uint32_t v)
 {
-if (virtio_byteswap) {
+if (cpu_get_byteswap()) {
stl_p(ptr, bswap32(v));
 } else {
stl_p(ptr, v);
@@ -85,7 +76,7 @@ static inline void virtio_stl_p(void *ptr, uint32_t v)
 
 static inline void virtio_stq_p(void *ptr, uint64_t v)
 {
-if (virtio_byteswap) {
+if (cpu_get_byteswap()) {
stq_p(ptr, bswap64(v));
 } else {
stq_p(ptr, v);
@@ -94,7 +85,7 @@ static inline void virtio_stq_p(void *ptr, uint64_t v)
 
 static inline int virtio_lduw_p(const void

Re: [Qemu-devel] [PATCH] pc: drop external DSDT loading

2013-08-09 Thread Gerd Hoffmann

On 08/08/13 18:38, Anthony Liguori wrote:
 This breaks migration and is unneeded with modern SeaBIOS.

No.  Dropping for piix is fine.  It will break q35 though.

Given that q35 can't be migrated anyway due to ahci being tagged as
unmigratable keeping it for q35 (until the new acpi table loading is
sorted) shouldn't hurt though.

cheers,
  Gerd

[Qemu-devel] Question about life cycle of QEMU's stable branches

2013-08-09 Thread Hitoshi Mitake


Hi QEMU list,

I have a question about life cycle of QEMU's stable branches. Is there
an explicit support period for the stable branches?

We are maintaining stable branches of sheepdog, distributed block
storage system for QEMU. The stable branches  of sheepdog corresponds
to the stable branches of QEMU. e.g. stable-0.6 of sheepdog
coressponds to stable-1.5 of QEMU. We will try to keep the branches
updated until corresponding branchs of QEMU become outdated.

So I'd like to know the maintenance policy of QEMU's stable branch.

Thanks,
Hitoshi

Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access

2013-08-09 Thread Rusty Russell

Andreas Färber afaer...@suse.de writes:
 Am 08.08.2013 15:31, schrieb Anthony Liguori:
 Rusty Russell ru...@rustcorp.com.au writes:
 We have a mechanism to do weak functions via stubs/.  I think it would
 be better to do cpu_get_byteswap() as a stub function and then overload
 it in the ppc64 code.

 If this as your name indicates is a per-CPU function then it should go
 into CPUClass. Interesting question is, what is virtio supposed to do if
 we have two ppc CPUs, one is Big Endian, the other is Little Endian.
 We'd need to check current_cpu then, which for Xen is always NULL.

Below is the minimal solution, which is sufficient for virtio.

If Anton wants per-cpu endianness for gdb, he'll need something more
sophisticated.

Feedback welcome!
Rusty.

Subject: cpu_get_byteswap: function for endian-ambivalent targets.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index a5bb515..ed84267 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -357,4 +357,13 @@ void cpu_reset_interrupt(CPUState *cpu, int mask);
  */
 void cpu_resume(CPUState *cpu);
 
+/**
+ * cpu_get_byteswap:
+ *
+ * Is (any) CPU running in byteswapped mode: normally false.  This
+ * doesn't take a cpu argument, because we don't support heterogeneous
+ * endianness.
+ */
+bool cpu_get_byteswap(void);
+
 #endif
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 9b701b4..d4af94a 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -25,3 +25,4 @@ stub-obj-y += vm-stop.o
 stub-obj-y += vmstate.o
 stub-obj-$(CONFIG_WIN32) += fd-register.o
 stub-obj-y += cpus.o
+stub-obj-y += cpu_byteswap.o
diff --git a/stubs/cpu_byteswap.c b/stubs/cpu_byteswap.c
new file mode 100644
index 000..b3b669f
--- /dev/null
+++ b/stubs/cpu_byteswap.c
@@ -0,0 +1,6 @@
+#include qom/cpu.h
+
+bool cpu_get_byteswap(void)
+{
+return false;
+}

Subject: target-ppc: ppc64 targets can be either endian.

In this case, we just query the first cpu.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/target-ppc/misc_helper.c b/target-ppc/misc_helper.c
index 616aab6..0a508eb 100644
--- a/target-ppc/misc_helper.c
+++ b/target-ppc/misc_helper.c
@@ -116,3 +116,8 @@ void ppc_store_msr(CPUPPCState *env, target_ulong value)
 {
 hreg_store_msr(env, value, 0);
 }
+
+bool cpu_get_byteswap(void)
+{
+return first_cpu-hflags  (1  MSR_LE);
+}

Re: [Qemu-devel] [PATCH V3 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line.

2013-08-09 Thread Kevin Wolf

Am 08.08.2013 um 15:37 hat Benoît Canet geschrieben:
  Kevin's series renamed these to have a dash in the name, and also moved
  all the throttling parameters into a sub-struct.  Does it make more
  sense to have just '*throttling' with that sub-struct containing 12
  parameters, 6 for limits and 6 for thresholds, or would it be better to
  have '*throttling' with 6 members for limits, as well as
  '*throttling-threshold' with the other 6 members?  Naming-wise,
  throttling.bps-total and throttling-threshold.bps-total convey as much
  information as throttling.bps-total and throttling.bps-total-threshold.
 
 Eric  Kevin:
 
 Should compatible old style values be added for the new throttling options in
 order to avoid a mixed style mess in qemu-options.hx ?

No, I don't think new options should get an old-style alias. We should
simply convert the documentation in qemu-options.hx consistently to the
new style.

Kevin

Re: [Qemu-devel] [PATCH v1 1/5] util: introduce gsource event abstraction

2013-08-09 Thread liu ping fan

On Fri, Aug 9, 2013 at 12:29 AM, Michael Roth mdr...@linux.vnet.ibm.com wrote:
 Quoting Liu Ping Fan (2013-08-08 01:26:07)
 Introduce struct EventsGSource. It will ease the usage of GSource
 associated with a group of files, which are dynamically allocated
 and release, ex, slirp.

 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  util/Makefile.objs   |  1 +
  util/event_gsource.c | 94 
 
  util/event_gsource.h | 37 +
  3 files changed, 132 insertions(+)
  create mode 100644 util/event_gsource.c
  create mode 100644 util/event_gsource.h

 diff --git a/util/Makefile.objs b/util/Makefile.objs
 index dc72ab0..eec55bd 100644
 --- a/util/Makefile.objs
 +++ b/util/Makefile.objs
 @@ -11,3 +11,4 @@ util-obj-y += iov.o aes.o qemu-config.o qemu-sockets.o 
 uri.o notify.o
  util-obj-y += qemu-option.o qemu-progress.o
  util-obj-y += hexdump.o
  util-obj-y += crc32c.o
 +util-obj-y += event_gsource.o
 diff --git a/util/event_gsource.c b/util/event_gsource.c
 new file mode 100644
 index 000..4b9fa89
 --- /dev/null
 +++ b/util/event_gsource.c
 @@ -0,0 +1,94 @@
 +/*
 + *  Copyright (C) 2013 IBM
 + *
 + *  This program is free software; you can redistribute it and/or modify
 + *  it under the terms of the GNU General Public License as published by
 + *  the Free Software Foundation; under version 2 of the License.
 + *
 + *  This program is distributed in the hope that it will be useful,
 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + *  GNU General Public License for more details.
 + *
 + *  You should have received a copy of the GNU General Public License
 + *  along with this program; if not, see http://www.gnu.org/licenses/.
 + */
 +
 +#include event_gsource.h
 +#include qemu/bitops.h
 +
 +GPollFD *events_source_add_pollfd(EventsGSource *src, int fd)

 Small nit, but if the class is EventsGSource, the methods should
 use the events_gsource_* prefix. Or we can just call it EventsSource.

Will fix.
[...]

 +
 +EventsGSource *events_source_new(GPrepare prepare, GSourceFunc dispatch_cb,
 +void *opaque)
 +{
 +EventsGSource *src;
 +GSourceFuncs *gfuncs = g_new0(GSourceFuncs, 1);
 +gfuncs-prepare = prepare;

 I'm not sure how useful this EventsGSource abstraction is if it requires users
 to implement a custom prepare() function that accesses EventsGSource fields
 directly.

 Either we should just make this SlirpGSource until another user comes
 along where we can look at pulling out common bits, or we should pass in a
 prepare() function that operates on the opaque data instead of the
 underlying EventsGSource.

Maybe SlirpGSource, since the prepare of slirp is too complicated.

 If you take the latter approach, you might consider having
 events_source_add_pollfd take a GPollFD* arg instead of an fd, then storing
 a pointer to the same GPollFD* in your opaque/Slirp object so you can do 
 things

The GPollFD* is stored in slirp's socket (each slirp-socket has a sock-fd ).
 like set the event masks for all the GPollFDs in the prepare cb prior to
 completing the GSource's prepare function (which could then do something 
 generic
 like return true if any GPollFDs have a non-zero event mask)

What is the aim of masks for all the GPollFDs?  We just poll the
GPollFD when so-state ask us to do that. Otherwise they are kept
clean.

Thanks and regards,
Pingfan
 +gfuncs-check = events_source_check,
 +gfuncs-dispatch = events_source_dispatch,
 +
 +src = (EventsGSource *)g_source_new(gfuncs, sizeof(EventsGSource));
 +src-gfuncs = gfuncs;
 +src-pollfds_list = NULL;
 +src-opaque = opaque;
 +g_source_set_callback(src-source, dispatch_cb, src, NULL);
 +
 +return src;
 +}
 +
 +void events_source_release(EventsGSource *src)
 +{
 +assert(!src-pollfds_list);
 +g_free(src-gfuncs);
 +g_source_destroy(src-source);
 +}
 diff --git a/util/event_gsource.h b/util/event_gsource.h
 new file mode 100644
 index 000..8755952
 --- /dev/null
 +++ b/util/event_gsource.h
 @@ -0,0 +1,37 @@
 +/*
 + *  Copyright (C) 2013 IBM
 + *
 + *  This program is free software; you can redistribute it and/or modify
 + *  it under the terms of the GNU General Public License as published by
 + *  the Free Software Foundation; under version 2 of the License.
 + *
 + *  This program is distributed in the hope that it will be useful,
 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + *  GNU General Public License for more details.
 + *
 + *  You should have received a copy of the GNU General Public License
 + *  along with this program; if not, see http://www.gnu.org/licenses/.
 + */
 +
 +#ifndef EVENT_GSOURCE_H
 +#define EVENT_GSOURCE_H
 +#include qemu-common.h
 +
 +typedef gboolean (*GPrepare)(GSource *source, gint *timeout_);
 +
 +/* multi fd drive

Re: [Qemu-devel] [PATCH v6 00/18] aio: drop io_flush()

2013-08-09 Thread Wenchao Xia


于 2013-8-6 23:07, Stefan Hajnoczi 写道:

On Thu, Jul 25, 2013 at 05:18:07PM +0200, Stefan Hajnoczi wrote:

v6:
  * Fix block/stream.c:close_unused_images() dangling pointer in Patch 2
  * Rebase onto qemu.git/master

v5:
  * Split out bdrv_delete() drain fix [bonzini]
  * Fix commit message  [bonzini]

v4:
  * Ensure pending BHs are processed in bdrv_drain_all() [bonzini]

v3:
  * I forgot about this series, time to push it again!
  * Rebase onto qemu.git/master
  * Drop now-unused AioFlushHandler typedef [bonzini]

This series gets rid of aio's .io_flush() callback.  It's based on Paolo's
insight that bdrv_drain_all() can be implemented using the bs-tracked_requests
list.  io_flush() is redundant since the block layer already knows if requests
are pending.

The point of this effort is to simplify our event loop(s).  If we can reduce
custom features like io_flush() it becomes possible to unify AioContext and
main-loop.c, maybe even to replace it with glib's main loop.

This is also important to me for dataplane, since bdrv_drain_all() is one of
the synchronization points between threads.  QEMU monitor commands invoke
bdrv_drain_all() while the block device is accessed from a dataplane thread.

Background on io_flush() semantics:

The io_flush() handler must return 1 if this aio fd handler is active.  That
is, requests are pending and we'd like to make progress by monitoring the fd.

If io_flush() returns 0, the aio event loop skips monitoring this fd.  This is
critical for block drivers like iscsi where we have an idle TCP socket which we
want to block on *only* when there are pending requests.

The series works as follows:

Part 1 - stop relying on .io_flush()

The first patches change aio_poll() callers to check their termination
condition themselves instead of relying on .io_flush():

   bcfa688 block: ensure bdrv_drain_all() works during bdrv_delete()
   c1351f5 block: stop relying on io_flush() in bdrv_drain_all()
   b75d9e5 dataplane/virtio-blk: check exit conditions before aio_poll()
   18fc4d6 tests: adjust test-aio to new aio_poll() semantics
   4c52a7c tests: adjust test-thread-pool to new aio_poll() semantics

Part 2 - stop calling .io_flush() from aio_poll()

The semantic change to aio_poll() is made here:

   bacae7a aio: stop using .io_flush()

Part 3 - drop io_flush() handlers, just pass NULL for the io_flush argument

Remove the now dead code from all .io_flush() users:

   720e0ad block/curl: drop curl_aio_flush()
   2830cf3 block/gluster: drop qemu_gluster_aio_flush_cb()
   c683e8e block/iscsi: drop iscsi_process_flush()
   5019686 block/linux-aio: drop qemu_laio_completion_cb()
   b862bcf block/nbd: drop nbd_have_request()
   516e517 block/rbd: drop qemu_rbd_aio_flush_cb()
   177090b block/sheepdog: drop have_co_req() and aio_flush_request()
   e2e9e85 block/ssh: drop return_true()
   7cc7bac dataplane/virtio-blk: drop flush_true() and flush_io()
   db3cb18 thread-pool: drop thread_pool_active()
   10a783b tests: drop event_active_cb()

Part 4 - remove io_flush arguments from aio functions

The big, mechanical patch that drops the io_flush argument:

   4a8c36b aio: drop io_flush argument

Note that I split Part 3 from Part 4 so it's easy to review individual block
drivers.

Stefan Hajnoczi (18):
   block: ensure bdrv_drain_all() works during bdrv_delete()
   block: stop relying on io_flush() in bdrv_drain_all()
   dataplane/virtio-blk: check exit conditions before aio_poll()
   tests: adjust test-aio to new aio_poll() semantics
   tests: adjust test-thread-pool to new aio_poll() semantics
   aio: stop using .io_flush()
   block/curl: drop curl_aio_flush()
   block/gluster: drop qemu_gluster_aio_flush_cb()
   block/iscsi: drop iscsi_process_flush()
   block/linux-aio: drop qemu_laio_completion_cb()
   block/nbd: drop nbd_have_request()
   block/rbd: drop qemu_rbd_aio_flush_cb()
   block/sheepdog: drop have_co_req() and aio_flush_request()
   block/ssh: drop return_true()
   dataplane/virtio-blk: drop flush_true() and flush_io()
   thread-pool: drop thread_pool_active()
   tests: drop event_active_cb()
   aio: drop io_flush argument

  aio-posix.c | 36 ++
  aio-win32.c | 37 ---
  async.c |  4 +-
  block.c | 50 ++---
  block/curl.c| 25 ++---
  block/gluster.c | 21 ++-
  block/iscsi.c   | 10 +
  block/linux-aio.c   | 18 +
  block/nbd.c | 18 ++---
  block/rbd.c | 16 +---
  block/sheepdog.c| 33 -
  block/ssh.c | 12 +-
  block/stream.c  |  6 ++-
  hw/block/dataplane/virtio-blk.c | 25 +++--
  include/block/aio.h | 14 +--
  main-loop.c |  9 ++---
  tests/test-aio.c| 82

Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access

2013-08-09 Thread Rusty Russell

Andreas Färber afaer...@suse.de writes:
 Am 08.08.2013 17:40, schrieb Anthony Liguori:
 Andreas Färber afaer...@suse.de writes:
 Am 08.08.2013 15:31, schrieb Anthony Liguori:
 We have a mechanism to do weak functions via stubs/.  I think it would
 be better to do cpu_get_byteswap() as a stub function and then overload
 it in the ppc64 code.

 If this as your name indicates is a per-CPU function then it should go
 into CPUClass. Interesting question is, what is virtio supposed to do if
 we have two ppc CPUs, one is Big Endian, the other is Little Endian.
 
 PPC64 is big endian.  AFAIK, there is no such thing as a little endian
 PPC64 processor.
 
 This is just a processor mode where loads/stores are byte lane swapped.
 Hence the name 'cpu_get_byteswap'.  It's just asking whether the
 load/stores are being swapped or not.

 Exactly, just read it as is in ... Endian mode. On the CPUs I am more
 familiar with (e.g., 970), this used to be controlled via an MSR bit,
 which as CPUPPCState::msr exists per CPUState. I did not check on real
 hardware, but from the QEMU code this would allow for the mixed-endian
 scenario described above.

 At least for PPC64, it's not possible to enable/disable byte lane
 swapping for individual CPUs.  It's done through a system-wide hcall.

 What is offending me is only the following: If we name it
 cpu_get_byteswap() as proposed by you, then its first argument should be
 a CPUState *cpu. Its value would be read from the derived type's state,
 such as the MSR bit in the code path that you wanted duplicated. The
 function implementing that register-reading would be a hook in CPUClass,
 with a default implementation in qom/cpu.c rather than a fallback in
 stubs/. To access CPUClass, CPUState cannot be NULL, as brought up by
 Stefano for cpu_do_unassigned_access(); not following that pattern
 prevents mixing CPU architectures, which my large refactorings have
 partially been about. Cf. my guest-memory-dump refactoring.

 If it is just some random global value, then please don't call it
 cpu_*(). Since sPAPR is not a target of its own, I don't see how/where
 you want to implement that hcall query as per-target function either,
 that might rather call for a QEMUMachine hook?

 I don't care or argue about byte lanes here, I am just trying to keep
 API design consistent and not regressing on the way to heterogeneous
 emulation.

That's a lot of replumbing and indirect function calls for a fairly
obscure case.  We certainly don't have a nice CPUState lying around in
virtio at the moment, for example.

I can try to plumb this in if there's consensus, but I suspect it's
making the job 10x harder.

(The next logical step would be for st* and ld* to take the cpu to query
its endianness, Anthony's weird ideas notwithstanding).

Cheers,
Rusty.

Re: [Qemu-devel] [PATCH 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-09 Thread Paolo Bonzini

Il 08/08/2013 17:43, Jan Kiszka ha scritto:
 On 2013-08-08 17:33, Peter Maydell wrote:
 On 3 August 2013 09:31, Jan Kiszka jan.kis...@web.de wrote:
 --- a/ioport.c
 +++ b/ioport.c
 @@ -44,6 +44,22 @@ typedef struct MemoryRegionPortioList {
  MemoryRegionPortio ports[];
  } MemoryRegionPortioList;

 +static uint64_t unassigned_io_read(void *opaque, hwaddr addr, unsigned 
 size)
 +{
 +return -1UL;

 This should probably be -1ULL, otherwise we'll return
 different values on 32 bit and 64 bit hosts. (Actually
 managing a 64 bit read of the i/o space is pretty
 unlikely, though possibly alpha memory-mapped via the
 PCI space might let you do it.)
 
 No problem with changing this - but wouldn't 64-bit i/o accesses be a
 bug? It's not allowed according to PCI, no device can handle it
 (officially), so no arch should forward such requests from mmio, rather
 break them up first.

Yes, the impl.max_access_size should never be 8.  Though 1ULL would be
clearer perhaps.

Paolo

Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access

2013-08-09 Thread Peter Maydell

On 9 August 2013 08:35, Rusty Russell ru...@rustcorp.com.au wrote:
 That's a lot of replumbing and indirect function calls for a fairly
 obscure case.  We certainly don't have a nice CPUState lying around in
 virtio at the moment, for example.

Actually if you're in an IO routine you do: it will be in the
global variable 'current_cpu'. Most IO functions don't need this,
but it's what we use for things like this device behaviour varies
depending on which CPU accesses it.

-- PMM

Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access

2013-08-09 Thread Benjamin Herrenschmidt

On Fri, 2013-08-09 at 17:05 +0930, Rusty Russell wrote:

  Exactly, just read it as is in ... Endian mode. On the CPUs I am more
  familiar with (e.g., 970), this used to be controlled via an MSR bit,

970 didn't have an LE mode :-)

  which as CPUPPCState::msr exists per CPUState. I did not check on real
  hardware, but from the QEMU code this would allow for the mixed-endian
  scenario described above.

This whole exercise should have nothing to do with the current endian
mode of the CPU. If for example you are running lx86 (the x86 emulator
IBM provides) which exploits MSR:LE on POWER7 to run x86 binaries in
userspace, you don't want virtio to suddenly change endian !

The information we care about is the endianness of the operating system.

The most logical way to infer it is a different bit, which used to be
MSR:ILE and is now in LPCR for guests and controlled via a hypercall on
pseries, which indicates what is the endianness of interrupt vectors.

IE. It indicates how the cpu should set MSR:LE when taking an interrupt,
regardless of what the current MSR:LE value is at any given point in
time.

So what should be done in fact is whenever *that* bit is changed
(currently via hcall, maybe via MSR:ILE if we emulate that on older
models or LPCR when we emulate that), then the qemu cpu model can call
out to change the OS endianness which we can propagate to virtio.

Anything trying to do stuff based on the current endianness in the MSR
sounds like a cesspit to me.

 (The next logical step would be for st* and ld* to take the cpu to query
 its endianness, Anthony's weird ideas notwithstanding).

Why would we even consider something like that ?

Ben.

Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access

2013-08-09 Thread Peter Maydell

On 9 August 2013 03:58, Rusty Russell ru...@rustcorp.com.au wrote:
 Anthony Liguori anth...@codemonkey.ws writes:
 The distinction is important in QEMU.  ppc64 is still
 TARGET_WORDS_BIGENDIAN.  We still want most stl_phys to treat integers
 as big endian.  There's just this extra concept that CPU loads/stores
 are sometimes byte swapped.  That affects virtio but not a lot else.

 You've redefined endian here; please don't do that.  Endian is the order
 in memory which a CPU does loads and stores.

Agreed (subject to the complicating factor that it's possible for
a CPU to have the order to be different for data, instruction and
TLB walk loads, so it's not a single setting for a CPU).

 From any reasonable definition, PPC is bi-endian.

 It's actually a weird thing for the qemu core to know at all:

It's a TCG performance optimisation and/or simplification hack,
basically -- by hardcoding the endianness we think the core is at
compile time we can either always-byteswap or never-byteswap in the
fast paths.

 almost
 everything which cares is in target-specific code.  The exceptions are
 gdb stubs and virtio, both of which are native endian (and that weird
 code in exec.c: what is notdirty_mem_write?).

The code for actually doing TCG CPU loads and stores cares too.

notdirty_mem_write is (I think) part of how we handle self-modifying
code: if you write to a page in memory that we have translated some
code from, then we need to intercept that write so that we can throw
away the translated code. notdirty_mem_write() is the function that
does that interception, throws away the code, figures out if we still
need to intercept next time around, and actually does the access
the guest asked for. (It might also be used for spotting when the
guest touches memory during migration and thus that page needs to be
retransmitted -- I haven't checked.)

-- PMM

Re: [Qemu-devel] [PATCH v1 2/5] slirp: make timeout local

2013-08-09 Thread Paolo Bonzini

Il 08/08/2013 08:26, Liu Ping Fan ha scritto:
 Each slirp has its own time to caculate timeout.
 
 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  slirp/slirp.c | 22 ++
  slirp/slirp.h |  3 +++
  2 files changed, 13 insertions(+), 12 deletions(-)
 
 diff --git a/slirp/slirp.c b/slirp/slirp.c
 index 80b28ea..55654d5 100644
 --- a/slirp/slirp.c
 +++ b/slirp/slirp.c
 @@ -40,8 +40,6 @@ static const uint8_t special_ethaddr[ETH_ALEN] = {
  static const uint8_t zero_ethaddr[ETH_ALEN] = { 0, 0, 0, 0, 0, 0 };
  
  u_int curtime;
 -static u_int time_fasttimo, last_slowtimo;
 -static int do_slowtimo;
  
  static QTAILQ_HEAD(slirp_instances, Slirp) slirp_instances =
  QTAILQ_HEAD_INITIALIZER(slirp_instances);
 @@ -278,14 +276,13 @@ void slirp_pollfds_fill(GArray *pollfds)
  /*
   * First, TCP sockets
   */
 -do_slowtimo = 0;
  
  QTAILQ_FOREACH(slirp, slirp_instances, entry) {
  /*
   * *_slowtimo needs calling if there are IP fragments
   * in the fragment queue, or there are TCP connections active
   */
 -do_slowtimo |= ((slirp-tcb.so_next != slirp-tcb) ||
 +slirp-do_slowtimo = ((slirp-tcb.so_next != slirp-tcb) ||
  (slirp-ipq.ip_link != slirp-ipq.ip_link.next));
  
  for (so = slirp-tcb.so_next; so != slirp-tcb;
 @@ -299,8 +296,9 @@ void slirp_pollfds_fill(GArray *pollfds)
  /*
   * See if we need a tcp_fasttimo
   */
 -if (time_fasttimo == 0  so-so_tcpcb-t_flags  TF_DELACK) {
 -time_fasttimo = curtime; /* Flag when we want a fasttimo */
 +if (slirp-time_fasttimo == 0 
 +so-so_tcpcb-t_flags  TF_DELACK) {
 +slirp-time_fasttimo = curtime; /* Flag when want a fasttimo 
 */
  }
  
  /*
 @@ -381,7 +379,7 @@ void slirp_pollfds_fill(GArray *pollfds)
  udp_detach(so);
  continue;
  } else {
 -do_slowtimo = 1; /* Let socket expire */
 +slirp-do_slowtimo = 1; /* Let socket expire */
  }
  }
  
 @@ -422,7 +420,7 @@ void slirp_pollfds_fill(GArray *pollfds)
  icmp_detach(so);
  continue;
  } else {
 -do_slowtimo = 1; /* Let socket expire */
 +slirp-do_slowtimo = 1; /* Let socket expire */
  }
  }
  
 @@ -454,14 +452,14 @@ void slirp_pollfds_poll(GArray *pollfds, int 
 select_error)
  /*
   * See if anything has timed out
   */
 -if (time_fasttimo  ((curtime - time_fasttimo) = 2)) {
 +if (slirp-time_fasttimo  ((curtime - slirp-time_fasttimo) = 2)) 
 {
  tcp_fasttimo(slirp);
 -time_fasttimo = 0;
 +slirp-time_fasttimo = 0;
  }
 -if (do_slowtimo  ((curtime - last_slowtimo) = 499)) {
 +if (slirp-do_slowtimo  ((curtime - slirp-last_slowtimo) = 499)) 
 {
  ip_slowtimo(slirp);
  tcp_slowtimo(slirp);
 -last_slowtimo = curtime;
 +slirp-last_slowtimo = curtime;
  }
  
  /*
 diff --git a/slirp/slirp.h b/slirp/slirp.h
 index fe0e65d..008360e 100644
 --- a/slirp/slirp.h
 +++ b/slirp/slirp.h
 @@ -203,6 +203,9 @@ bool arp_table_search(Slirp *slirp, uint32_t ip_addr,
  
  struct Slirp {
  QTAILQ_ENTRY(Slirp) entry;
 +u_int time_fasttimo;
 +u_int last_slowtimo;
 +int do_slowtimo;
  
  /* virtual network configuration */
  struct in_addr vnetwork_addr;
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

In addition, you could modify slirp_update_timeout to run after
slirp_pollfds_fill, and actually look at the timeout fields to set the
right timeout (2 if time_fasttimo is nonzero, 499 if do_slowtimo is true).

Paolo

Re: [Qemu-devel] [RFC] [PATCHv8 00/30] aio / timers: Add AioContext timers and use ppoll

2013-08-09 Thread Jan Kiszka

On 2013-08-08 23:41, Alex Bligh wrote:
 This patch series adds support for timers attached to an AioContext clock
 which get called within aio_poll.
 
 In doing so it removes alarm timers and moves to use ppoll where possible.
 
 This patch set 'sort of' passes make check (see below for caveat)
 including a new test harness for the aio timers, but has not been
 tested much beyond that. In particular, the win32 changes have not
 even been compile tested. Equally, alterations to use_icount
 are untested.
 
 Caveat: I have had to alter tests/test-aio.c so the following error
 no longer occurs.
 
 ERROR:tests/test-aio.c:346:test_wait_event_notifier_noflush: assertion 
 failed: (aio_poll(ctx, false))
 
 As gar as I can tell, this check was incorrect, in that it checking
 aio_poll makes progress when in fact it should not make progress. I
 fixed an issue where aio_poll was (as far as I can tell) wrongly
 returning true on a timeout, and that generated this error.
 
 Note also the comment on patch 18 in relation to a possible bug
 in cpus.c.
 
 The penultimate patch is patch which is created in an automated manner
 using scripts/switch-timer-api, added in this patch set. It violates some
 coding standards (e.g. line length = 80 characters), but this is preferable
 in terms of giving a provably correct conversion.
 
 This patch set has been compile tested  make check tested on a
 'christmas-tree' configuration, meaning a configuration with every
 --enable- value tested that can be easily configured on Ubuntu Precise,
 after application of each patch.
 
 Changes since v7:
 * Rebase to master 6fdf98f281f85ae6e2883bed2f691bcfe33b1f9f
 * Add qemu_clock_get_ms and qemu_clock_get_ms
 * Rename qemu_get_clock to qemu_clock_ptr
 * Reorder qemu-timer.h to utilise the legacy API
 * Hide qemu_clock_new  qemu_clock_free
 * Rename default_timerlist to main_loop_timerlist
 * Remove main_loop_timerlist once main_loop_tlg is in
 * Add script to convert to new API
 * Make rtc_clock use new API
 * Convert tests/test-aio to use new API
 * Run script on entire source code
 * Remove legacy API functions
 
 Changes since v6:
 * Fix build failure in vnc-auth-sasl.c
 * Split first patch into 3
 * Add assert on timerlist_free
 * Fix ==/= error on qemu_clock_use_for_deadline
 * Remove unnecessary cast in aio_timerlist_notify
 * Fix bad deadline comparison in aio_ctx_check
 * Add assert to timerlist_new_from_clock to check init_clocks
 * Use timer_list not tl
 * Change default_timerlistgroup to main_loop_timerlistgroup
 * Add comment on commit for qemu_clock_use_for_deadline
 * Fixed various include file issues
 * Convert *_has_timers and *_has_expired to return bool
 * Make loop variable consistent when looping through clock types
 * Add documentation to existing qemu_timer calls
 * Remove qemu_clock_deadline and move to qemu_clock_deadline_ns
 
 Changes since v5:
 * Rebase onto master (b9ac5d9)
 * Fix spacing in typedef QEMUTimerList
 * Rename 'QEMUClocks' extern to 'qemu_clocks'
 
 Changes since v4:
 * Rename qemu_timerlist_ functions to timer_list (per Paolo Bonzini)
 * Rename qemu_timer_.*timerlist.* to timer_ (per Paolo Bonzini)
 * Use enum for QEMUClockType
 * Put clocks into an array; remove global variables
 * Introduce QEMUTimerListGroup - a timeliest of each type
 * Add a QEMUTimerListGroup to AioContext
 * Use a callback on timer modification, rather than binding in
   AioContext into the timeliest
 * Make cpus.c iterate over all timerlists when it does a notify
 * Make cpus.c icount timeout use soonest timeout
   across all timerlists
 
 Changes since v3:
 * Split up QEMUClock and QEMUClock list
 * Improve commenting
 * Fix comment in vl.c
 * Change test/test-aio.c to reflect correct behaviour in aio_poll.
 
 Changes since v2:
 * Reordered to remove alarm timers last
 * Added prctl(PR_SET_TIMERSLACK, 1, ...)
 * Renamed qemu_g_poll_ns to qemu_poll_ns
 * Moved declaration of above  drop glib types
 * Do not use a global list of qemu clocks
 * Add AioContext * to QEMUClock
 * Split up conversion to use ppoll and timers
 * Indentation fix
 * Fix aio_win32.c aio_poll to return progress
 * aio_notify / qemu_notify when timers are modified
 * change comment in deprecation of clock options
 
 Alex Bligh (30):
   aio / timers: Rename qemu_new_clock and expose clock types
   aio / timers: add qemu-timer.c utility functions
   aio / timers: Consistent treatment of disabled clocks for deadlines
   aio / timers: add ppoll support with qemu_poll_ns
   aio / timers: Add prctl(PR_SET_TIMERSLACK, 1, ...) to reduce timer
 slack
   aio / timers: Make qemu_run_timers and qemu_run_all_timers return
 progress
   aio / timers: Split QEMUClock into QEMUClock and QEMUTimerList
   aio / timers: Untangle include files
   aio / timers: Add QEMUTimerListGroup and helper functions
   aio / timers: Add QEMUTimerListGroup to AioContext
   aio / timers: Add a notify callback to QEMUTimerList
   aio / timers: aio_ctx_prepare sets timeout from AioContext

[Qemu-devel] [PATCH v7 00/18] aio: drop io_flush()

2013-08-09 Thread Stefan Hajnoczi

v7:
 * Rebase onto Kevin's block-next branch to resolve conflicts [Wenchao]

v6:
 * Fix block/stream.c:close_unused_images() dangling pointer in Patch 2
 * Rebase onto qemu.git/master

v5:
 * Split out bdrv_delete() drain fix [bonzini]
 * Fix commit message  [bonzini]

v4:
 * Ensure pending BHs are processed in bdrv_drain_all() [bonzini]

v3:
 * I forgot about this series, time to push it again!
 * Rebase onto qemu.git/master
 * Drop now-unused AioFlushHandler typedef [bonzini]

This series gets rid of aio's .io_flush() callback.  It's based on Paolo's
insight that bdrv_drain_all() can be implemented using the bs-tracked_requests
list.  io_flush() is redundant since the block layer already knows if requests
are pending.

The point of this effort is to simplify our event loop(s).  If we can reduce
custom features like io_flush() it becomes possible to unify AioContext and
main-loop.c, maybe even to replace it with glib's main loop.

This is also important to me for dataplane, since bdrv_drain_all() is one of
the synchronization points between threads.  QEMU monitor commands invoke
bdrv_drain_all() while the block device is accessed from a dataplane thread.

Background on io_flush() semantics:

The io_flush() handler must return 1 if this aio fd handler is active.  That
is, requests are pending and we'd like to make progress by monitoring the fd.

If io_flush() returns 0, the aio event loop skips monitoring this fd.  This is
critical for block drivers like iscsi where we have an idle TCP socket which we
want to block on *only* when there are pending requests.

The series works as follows:

Part 1 - stop relying on .io_flush()

The first patches change aio_poll() callers to check their termination
condition themselves instead of relying on .io_flush():

  bcfa688 block: ensure bdrv_drain_all() works during bdrv_delete()
  c1351f5 block: stop relying on io_flush() in bdrv_drain_all()
  b75d9e5 dataplane/virtio-blk: check exit conditions before aio_poll()
  18fc4d6 tests: adjust test-aio to new aio_poll() semantics
  4c52a7c tests: adjust test-thread-pool to new aio_poll() semantics

Part 2 - stop calling .io_flush() from aio_poll()

The semantic change to aio_poll() is made here:

  bacae7a aio: stop using .io_flush()

Part 3 - drop io_flush() handlers, just pass NULL for the io_flush argument

Remove the now dead code from all .io_flush() users:

  720e0ad block/curl: drop curl_aio_flush()
  2830cf3 block/gluster: drop qemu_gluster_aio_flush_cb()
  c683e8e block/iscsi: drop iscsi_process_flush()
  5019686 block/linux-aio: drop qemu_laio_completion_cb()
  b862bcf block/nbd: drop nbd_have_request()
  516e517 block/rbd: drop qemu_rbd_aio_flush_cb()
  177090b block/sheepdog: drop have_co_req() and aio_flush_request()
  e2e9e85 block/ssh: drop return_true()
  7cc7bac dataplane/virtio-blk: drop flush_true() and flush_io()
  db3cb18 thread-pool: drop thread_pool_active()
  10a783b tests: drop event_active_cb()

Part 4 - remove io_flush arguments from aio functions

The big, mechanical patch that drops the io_flush argument:

  4a8c36b aio: drop io_flush argument

Note that I split Part 3 from Part 4 so it's easy to review individual block
drivers.

Stefan Hajnoczi (18):
  block: ensure bdrv_drain_all() works during bdrv_delete()
  block: stop relying on io_flush() in bdrv_drain_all()
  dataplane/virtio-blk: check exit conditions before aio_poll()
  tests: adjust test-aio to new aio_poll() semantics
  tests: adjust test-thread-pool to new aio_poll() semantics
  aio: stop using .io_flush()
  block/curl: drop curl_aio_flush()
  block/gluster: drop qemu_gluster_aio_flush_cb()
  block/iscsi: drop iscsi_process_flush()
  block/linux-aio: drop qemu_laio_completion_cb()
  block/nbd: drop nbd_have_request()
  block/rbd: drop qemu_rbd_aio_flush_cb()
  block/sheepdog: drop have_co_req() and aio_flush_request()
  block/ssh: drop return_true()
  dataplane/virtio-blk: drop flush_true() and flush_io()
  thread-pool: drop thread_pool_active()
  tests: drop event_active_cb()
  aio: drop io_flush argument

 aio-posix.c | 36 ++
 aio-win32.c | 37 ---
 async.c |  4 +-
 block.c | 49 ++--
 block/curl.c| 25 ++---
 block/gluster.c | 21 ++-
 block/iscsi.c   | 10 +
 block/linux-aio.c   | 18 +
 block/nbd.c | 18 ++---
 block/rbd.c | 16 +---
 block/sheepdog.c| 33 -
 block/ssh.c | 12 +-
 block/stream.c  |  6 ++-
 hw/block/dataplane/virtio-blk.c | 25 +++--
 include/block/aio.h | 14 +--
 main-loop.c |  9 ++---
 tests/test-aio.c| 82 +
 tests/test-thread-pool.c| 24 ++--
 thread-pool.c

[Qemu-devel] [PATCH v7 02/18] block: stop relying on io_flush() in bdrv_drain_all()

2013-08-09 Thread Stefan Hajnoczi

If a block driver has no file descriptors to monitor but there are still
active requests, it can return 1 from .io_flush().  This is used to spin
during synchronous I/O.

Stop relying on .io_flush() and instead check
QLIST_EMPTY(bs-tracked_requests) to decide whether there are active
requests.

This is the first step in removing .io_flush() so that event loops no
longer need to have the concept of synchronous I/O.  Eventually we may
be able to kill synchronous I/O completely by running everything in a
coroutine, but that is future work.

Note this patch moves bs-throttled_reqs initialization to bdrv_new() so
that bdrv_requests_pending(bs) can safely access it.  In practice bs is
g_malloc0() so the memory is already zeroed but it's safer to initialize
the queue properly.

We also need to fix up block/stream.c:close_unused_images() to prevent
traversing a dangling pointer while it rearranges the backing file
chain.  This is necessary since the new bdrv_drain_all() traverses the
backing file chain.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block.c| 45 +++--
 block/stream.c |  6 +-
 2 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index a12eca0..d40857e 100644
--- a/block.c
+++ b/block.c
@@ -148,7 +148,6 @@ static void bdrv_block_timer(void *opaque)
 
 void bdrv_io_limits_enable(BlockDriverState *bs)
 {
-qemu_co_queue_init(bs-throttled_reqs);
 bs-block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
 bs-io_limits_enabled = true;
 }
@@ -306,6 +305,7 @@ BlockDriverState *bdrv_new(const char *device_name)
 bdrv_iostatus_disable(bs);
 notifier_list_init(bs-close_notifiers);
 notifier_with_return_list_init(bs-before_write_notifiers);
+qemu_co_queue_init(bs-throttled_reqs);
 
 return bs;
 }
@@ -1433,6 +1433,35 @@ void bdrv_close_all(void)
 }
 }
 
+/* Check if any requests are in-flight (including throttled requests) */
+static bool bdrv_requests_pending(BlockDriverState *bs)
+{
+if (!QLIST_EMPTY(bs-tracked_requests)) {
+return true;
+}
+if (!qemu_co_queue_empty(bs-throttled_reqs)) {
+return true;
+}
+if (bs-file  bdrv_requests_pending(bs-file)) {
+return true;
+}
+if (bs-backing_hd  bdrv_requests_pending(bs-backing_hd)) {
+return true;
+}
+return false;
+}
+
+static bool bdrv_requests_pending_all(void)
+{
+BlockDriverState *bs;
+QTAILQ_FOREACH(bs, bdrv_states, list) {
+if (bdrv_requests_pending(bs)) {
+return true;
+}
+}
+return false;
+}
+
 /*
  * Wait for pending requests to complete across all BlockDriverStates
  *
@@ -1447,12 +1476,11 @@ void bdrv_close_all(void)
  */
 void bdrv_drain_all(void)
 {
+/* Always run first iteration so any pending completion BHs run */
+bool busy = true;
 BlockDriverState *bs;
-bool busy;
-
-do {
-busy = qemu_aio_wait();
 
+while (busy) {
 /* FIXME: We do not have timer support here, so this is effectively
  * a busy wait.
  */
@@ -1461,12 +1489,9 @@ void bdrv_drain_all(void)
 busy = true;
 }
 }
-} while (busy);
 
-/* If requests are still pending there is a bug somewhere */
-QTAILQ_FOREACH(bs, bdrv_states, list) {
-assert(QLIST_EMPTY(bs-tracked_requests));
-assert(qemu_co_queue_empty(bs-throttled_reqs));
+busy = bdrv_requests_pending_all();
+busy |= aio_poll(qemu_get_aio_context(), busy);
 }
 }
 
diff --git a/block/stream.c b/block/stream.c
index 7fe9e48..db49b4d 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -57,6 +57,11 @@ static void close_unused_images(BlockDriverState *top, 
BlockDriverState *base,
 BlockDriverState *intermediate;
 intermediate = top-backing_hd;
 
+/* Must assign before bdrv_delete() to prevent traversing dangling pointer
+ * while we delete backing image instances.
+ */
+top-backing_hd = base;
+
 while (intermediate) {
 BlockDriverState *unused;
 
@@ -70,7 +75,6 @@ static void close_unused_images(BlockDriverState *top, 
BlockDriverState *base,
 unused-backing_hd = NULL;
 bdrv_delete(unused);
 }
-top-backing_hd = base;
 }
 
 static void coroutine_fn stream_run(void *opaque)
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 01/18] block: ensure bdrv_drain_all() works during bdrv_delete()

2013-08-09 Thread Stefan Hajnoczi

In bdrv_delete() make sure to call bdrv_make_anon() *after* bdrv_close()
so that the device is still seen by bdrv_drain_all() when iterating
bdrv_states.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 812f372..a12eca0 100644
--- a/block.c
+++ b/block.c
@@ -1611,11 +1611,11 @@ void bdrv_delete(BlockDriverState *bs)
 assert(!bs-job);
 assert(!bs-in_use);
 
+bdrv_close(bs);
+
 /* remove from list, if necessary */
 bdrv_make_anon(bs);
 
-bdrv_close(bs);
-
 g_free(bs);
 }
 
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 03/18] dataplane/virtio-blk: check exit conditions before aio_poll()

2013-08-09 Thread Stefan Hajnoczi

Check exit conditions before entering blocking aio_poll().  This is
mainly for consistency since it's unlikely that we are stopping in the
first event loop iteration.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/block/dataplane/virtio-blk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 411becc..5bd5eed 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -376,9 +376,9 @@ static void *data_plane_thread(void *opaque)
 {
 VirtIOBlockDataPlane *s = opaque;
 
-do {
+while (!s-stopping || s-num_reqs  0) {
 aio_poll(s-ctx, true);
-} while (!s-stopping || s-num_reqs  0);
+}
 return NULL;
 }
 
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 04/18] tests: adjust test-aio to new aio_poll() semantics

2013-08-09 Thread Stefan Hajnoczi

aio_poll(ctx, true) will soon block if any fd handlers have been set.
Previously it would only block when .io_flush() returned true.

This means that callers must check their wait condition *before*
aio_poll() to avoid deadlock.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 tests/test-aio.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/tests/test-aio.c b/tests/test-aio.c
index c173870..20bf5e6 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -15,6 +15,13 @@
 
 AioContext *ctx;
 
+typedef struct {
+EventNotifier e;
+int n;
+int active;
+bool auto_set;
+} EventNotifierTestData;
+
 /* Wait until there are no more BHs or AIO requests */
 static void wait_for_aio(void)
 {
@@ -23,6 +30,14 @@ static void wait_for_aio(void)
 }
 }
 
+/* Wait until event notifier becomes inactive */
+static void wait_until_inactive(EventNotifierTestData *data)
+{
+while (data-active  0) {
+aio_poll(ctx, true);
+}
+}
+
 /* Simple callbacks for testing.  */
 
 typedef struct {
@@ -50,13 +65,6 @@ static void bh_delete_cb(void *opaque)
 }
 }
 
-typedef struct {
-EventNotifier e;
-int n;
-int active;
-bool auto_set;
-} EventNotifierTestData;
-
 static int event_active_cb(EventNotifier *e)
 {
 EventNotifierTestData *data = container_of(e, EventNotifierTestData, e);
@@ -281,7 +289,7 @@ static void test_flush_event_notifier(void)
 g_assert_cmpint(data.active, ==, 9);
 g_assert(aio_poll(ctx, false));
 
-wait_for_aio();
+wait_until_inactive(data);
 g_assert_cmpint(data.n, ==, 10);
 g_assert_cmpint(data.active, ==, 0);
 g_assert(!aio_poll(ctx, false));
@@ -325,7 +333,7 @@ static void test_wait_event_notifier_noflush(void)
 g_assert_cmpint(data.n, ==, 2);
 
 event_notifier_set(dummy.e);
-wait_for_aio();
+wait_until_inactive(dummy);
 g_assert_cmpint(data.n, ==, 2);
 g_assert_cmpint(dummy.n, ==, 1);
 g_assert_cmpint(dummy.active, ==, 0);
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 05/18] tests: adjust test-thread-pool to new aio_poll() semantics

2013-08-09 Thread Stefan Hajnoczi

aio_poll(ctx, true) will soon block when fd handlers have been set.
Previously aio_poll() would return early if all .io_flush() returned
false.  This means we need to check the equivalent of the .io_flush()
condition *before* calling aio_poll(ctx, true) to avoid deadlock.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 tests/test-thread-pool.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/tests/test-thread-pool.c b/tests/test-thread-pool.c
index b62338f..8188d1a 100644
--- a/tests/test-thread-pool.c
+++ b/tests/test-thread-pool.c
@@ -40,19 +40,13 @@ static void done_cb(void *opaque, int ret)
 active--;
 }
 
-/* Wait until all aio and bh activity has finished */
-static void qemu_aio_wait_all(void)
-{
-while (aio_poll(ctx, true)) {
-/* Do nothing */
-}
-}
-
 static void test_submit(void)
 {
 WorkerTestData data = { .n = 0 };
 thread_pool_submit(pool, worker_cb, data);
-qemu_aio_wait_all();
+while (data.n == 0) {
+aio_poll(ctx, true);
+}
 g_assert_cmpint(data.n, ==, 1);
 }
 
@@ -65,7 +59,9 @@ static void test_submit_aio(void)
 /* The callbacks are not called until after the first wait.  */
 active = 1;
 g_assert_cmpint(data.ret, ==, -EINPROGRESS);
-qemu_aio_wait_all();
+while (data.ret == -EINPROGRESS) {
+aio_poll(ctx, true);
+}
 g_assert_cmpint(active, ==, 0);
 g_assert_cmpint(data.n, ==, 1);
 g_assert_cmpint(data.ret, ==, 0);
@@ -103,7 +99,9 @@ static void test_submit_co(void)
 
 /* qemu_aio_wait_all will execute the rest of the coroutine.  */
 
-qemu_aio_wait_all();
+while (data.ret == -EINPROGRESS) {
+aio_poll(ctx, true);
+}
 
 /* Back here after the coroutine has finished.  */
 
@@ -187,7 +185,9 @@ static void test_cancel(void)
 }
 
 /* Finish execution and execute any remaining callbacks.  */
-qemu_aio_wait_all();
+while (active  0) {
+aio_poll(ctx, true);
+}
 g_assert_cmpint(active, ==, 0);
 for (i = 0; i  100; i++) {
 if (data[i].n == 3) {
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 08/18] block/gluster: drop qemu_gluster_aio_flush_cb()

2013-08-09 Thread Stefan Hajnoczi

Since .io_flush() is no longer called we do not need
qemu_gluster_aio_flush_cb() anymore.  It turns out that qemu_aio_count
is unused now and can be dropped.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/gluster.c | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 645b7f1..fa0b9e0 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -32,7 +32,6 @@ typedef struct BDRVGlusterState {
 struct glfs *glfs;
 int fds[2];
 struct glfs_fd *fd;
-int qemu_aio_count;
 int event_reader_pos;
 GlusterAIOCB *event_acb;
 } BDRVGlusterState;
@@ -247,7 +246,6 @@ static void qemu_gluster_complete_aio(GlusterAIOCB *acb, 
BDRVGlusterState *s)
 ret = -EIO; /* Partial read/write - fail it */
 }
 
-s-qemu_aio_count--;
 qemu_aio_release(acb);
 cb(opaque, ret);
 if (finished) {
@@ -275,13 +273,6 @@ static void qemu_gluster_aio_event_reader(void *opaque)
 } while (ret  0  errno == EINTR);
 }
 
-static int qemu_gluster_aio_flush_cb(void *opaque)
-{
-BDRVGlusterState *s = opaque;
-
-return (s-qemu_aio_count  0);
-}
-
 /* TODO Convert to fine grained options */
 static QemuOptsList runtime_opts = {
 .name = gluster,
@@ -348,7 +339,7 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict 
*options,
 }
 fcntl(s-fds[GLUSTER_FD_READ], F_SETFL, O_NONBLOCK);
 qemu_aio_set_fd_handler(s-fds[GLUSTER_FD_READ],
-qemu_gluster_aio_event_reader, NULL, qemu_gluster_aio_flush_cb, s);
+qemu_gluster_aio_event_reader, NULL, NULL, s);
 
 out:
 qemu_opts_del(opts);
@@ -445,7 +436,6 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, 
ssize_t ret, void *arg)
 qemu_mutex_lock_iothread(); /* We are in gluster thread context */
 acb-common.cb(acb-common.opaque, -EIO);
 qemu_aio_release(acb);
-s-qemu_aio_count--;
 close(s-fds[GLUSTER_FD_READ]);
 close(s-fds[GLUSTER_FD_WRITE]);
 qemu_aio_set_fd_handler(s-fds[GLUSTER_FD_READ], NULL, NULL, NULL,
@@ -467,7 +457,6 @@ static BlockDriverAIOCB 
*qemu_gluster_aio_rw(BlockDriverState *bs,
 
 offset = sector_num * BDRV_SECTOR_SIZE;
 size = nb_sectors * BDRV_SECTOR_SIZE;
-s-qemu_aio_count++;
 
 acb = qemu_aio_get(gluster_aiocb_info, bs, cb, opaque);
 acb-size = size;
@@ -488,7 +477,6 @@ static BlockDriverAIOCB 
*qemu_gluster_aio_rw(BlockDriverState *bs,
 return acb-common;
 
 out:
-s-qemu_aio_count--;
 qemu_aio_release(acb);
 return NULL;
 }
@@ -531,7 +519,6 @@ static BlockDriverAIOCB 
*qemu_gluster_aio_flush(BlockDriverState *bs,
 acb-size = 0;
 acb-ret = 0;
 acb-finished = NULL;
-s-qemu_aio_count++;
 
 ret = glfs_fsync_async(s-fd, gluster_finish_aiocb, acb);
 if (ret  0) {
@@ -540,7 +527,6 @@ static BlockDriverAIOCB 
*qemu_gluster_aio_flush(BlockDriverState *bs,
 return acb-common;
 
 out:
-s-qemu_aio_count--;
 qemu_aio_release(acb);
 return NULL;
 }
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 06/18] aio: stop using .io_flush()

2013-08-09 Thread Stefan Hajnoczi

Now that aio_poll() users check their termination condition themselves,
it is no longer necessary to call .io_flush() handlers.

The behavior of aio_poll() changes as follows:

1. .io_flush() is no longer invoked and file descriptors are *always*
monitored.  Previously returning 0 from .io_flush() would skip this file
descriptor.

Due to this change it is essential to check that requests are pending
before calling qemu_aio_wait().  Failure to do so means we block, for
example, waiting for an idle iSCSI socket to become readable when there
are no requests.  Currently all qemu_aio_wait()/aio_poll() callers check
before calling.

2. aio_poll() now returns true if progress was made (BH or fd handlers
executed) and false otherwise.  Previously it would return true whenever
'busy', which means that .io_flush() returned true.  The 'busy' concept
no longer exists so just progress is returned.

Due to this change we need to update tests/test-aio.c which asserts
aio_poll() return values.  Note that QEMU doesn't actually rely on these
return values so only tests/test-aio.c cares.

Note that ctx-notifier, the EventNotifier fd used for aio_notify(), is
now handled as a special case.  This is a little ugly but maintains
aio_poll() semantics, i.e. aio_notify() does not count as 'progress' and
aio_poll() avoids blocking when the user has not set any fd handlers yet.

Patches after this remove .io_flush() handler code until we can finally
drop the io_flush arguments to aio_set_fd_handler() and friends.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 aio-posix.c  | 29 +
 aio-win32.c  | 34 ++
 tests/test-aio.c | 10 +-
 3 files changed, 28 insertions(+), 45 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index b68eccd..7d66048 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -23,7 +23,6 @@ struct AioHandler
 GPollFD pfd;
 IOHandler *io_read;
 IOHandler *io_write;
-AioFlushHandler *io_flush;
 int deleted;
 int pollfds_idx;
 void *opaque;
@@ -84,7 +83,6 @@ void aio_set_fd_handler(AioContext *ctx,
 /* Update handler with latest information */
 node-io_read = io_read;
 node-io_write = io_write;
-node-io_flush = io_flush;
 node-opaque = opaque;
 node-pollfds_idx = -1;
 
@@ -147,7 +145,11 @@ static bool aio_dispatch(AioContext *ctx)
 (revents  (G_IO_IN | G_IO_HUP | G_IO_ERR)) 
 node-io_read) {
 node-io_read(node-opaque);
-progress = true;
+
+/* aio_notify() does not count as progress */
+if (node-opaque != ctx-notifier) {
+progress = true;
+}
 }
 if (!node-deleted 
 (revents  (G_IO_OUT | G_IO_ERR)) 
@@ -173,7 +175,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 {
 AioHandler *node;
 int ret;
-bool busy, progress;
+bool progress;
 
 progress = false;
 
@@ -200,20 +202,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
 g_array_set_size(ctx-pollfds, 0);
 
 /* fill pollfds */
-busy = false;
 QLIST_FOREACH(node, ctx-aio_handlers, node) {
 node-pollfds_idx = -1;
-
-/* If there aren't pending AIO operations, don't invoke callbacks.
- * Otherwise, if there are no AIO requests, qemu_aio_wait() would
- * wait indefinitely.
- */
-if (!node-deleted  node-io_flush) {
-if (node-io_flush(node-opaque) == 0) {
-continue;
-}
-busy = true;
-}
 if (!node-deleted  node-pfd.events) {
 GPollFD pfd = {
 .fd = node-pfd.fd,
@@ -226,8 +216,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 ctx-walking_handlers--;
 
-/* No AIO operations?  Get us out of here */
-if (!busy) {
+/* early return if we only have the aio_notify() fd */
+if (ctx-pollfds-len == 1) {
 return progress;
 }
 
@@ -250,6 +240,5 @@ bool aio_poll(AioContext *ctx, bool blocking)
 }
 }
 
-assert(progress || busy);
-return true;
+return progress;
 }
diff --git a/aio-win32.c b/aio-win32.c
index 38723bf..4309c16 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -23,7 +23,6 @@
 struct AioHandler {
 EventNotifier *e;
 EventNotifierHandler *io_notify;
-AioFlushEventNotifierHandler *io_flush;
 GPollFD pfd;
 int deleted;
 QLIST_ENTRY(AioHandler) node;
@@ -73,7 +72,6 @@ void aio_set_event_notifier(AioContext *ctx,
 }
 /* Update handler with latest information */
 node-io_notify = io_notify;
-node-io_flush = io_flush;
 }
 
 aio_notify(ctx);
@@ -96,7 +94,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 {
 AioHandler *node;
 HANDLE events[MAXIMUM_WAIT_OBJECTS + 1];
-bool busy, progress;
+bool progress;
 int count;
 
 progress = false;
@@ -126,7

[Qemu-devel] [PATCH v7 13/18] block/sheepdog: drop have_co_req() and aio_flush_request()

2013-08-09 Thread Stefan Hajnoczi

.io_flush() is no longer called so drop have_co_req() and
aio_flush_request().

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/sheepdog.c | 25 +
 1 file changed, 5 insertions(+), 20 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 7699aad..9252b1d 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -509,13 +509,6 @@ static void restart_co_req(void *opaque)
 qemu_coroutine_enter(co, NULL);
 }
 
-static int have_co_req(void *opaque)
-{
-/* this handler is set only when there is a pending request, so
- * always returns 1. */
-return 1;
-}
-
 typedef struct SheepdogReqCo {
 int sockfd;
 SheepdogReq *hdr;
@@ -538,14 +531,14 @@ static coroutine_fn void do_co_req(void *opaque)
 unsigned int *rlen = srco-rlen;
 
 co = qemu_coroutine_self();
-qemu_aio_set_fd_handler(sockfd, NULL, restart_co_req, have_co_req, co);
+qemu_aio_set_fd_handler(sockfd, NULL, restart_co_req, NULL, co);
 
 ret = send_co_req(sockfd, hdr, data, wlen);
 if (ret  0) {
 goto out;
 }
 
-qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, have_co_req, co);
+qemu_aio_set_fd_handler(sockfd, restart_co_req, NULL, NULL, co);
 
 ret = qemu_co_recv(sockfd, hdr, sizeof(*hdr));
 if (ret  sizeof(*hdr)) {
@@ -796,14 +789,6 @@ static void co_write_request(void *opaque)
 qemu_coroutine_enter(s-co_send, NULL);
 }
 
-static int aio_flush_request(void *opaque)
-{
-BDRVSheepdogState *s = opaque;
-
-return !QLIST_EMPTY(s-inflight_aio_head) ||
-!QLIST_EMPTY(s-pending_aio_head);
-}
-
 /*
  * Return a socket discriptor to read/write objects.
  *
@@ -819,7 +804,7 @@ static int get_sheep_fd(BDRVSheepdogState *s)
 return fd;
 }
 
-qemu_aio_set_fd_handler(fd, co_read_response, NULL, aio_flush_request, s);
+qemu_aio_set_fd_handler(fd, co_read_response, NULL, NULL, s);
 return fd;
 }
 
@@ -1070,7 +1055,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
 qemu_co_mutex_lock(s-lock);
 s-co_send = qemu_coroutine_self();
 qemu_aio_set_fd_handler(s-fd, co_read_response, co_write_request,
-aio_flush_request, s);
+NULL, s);
 socket_set_cork(s-fd, 1);
 
 /* send a header */
@@ -1092,7 +1077,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
 
 socket_set_cork(s-fd, 0);
 qemu_aio_set_fd_handler(s-fd, co_read_response, NULL,
-aio_flush_request, s);
+NULL, s);
 qemu_co_mutex_unlock(s-lock);
 
 return 0;
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 15/18] dataplane/virtio-blk: drop flush_true() and flush_io()

2013-08-09 Thread Stefan Hajnoczi

.io_flush() is no longer called so drop flush_true() and flush_io().

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/block/dataplane/virtio-blk.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 5bd5eed..82131b1 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -261,11 +261,6 @@ static int process_request(IOQueue *ioq, struct iovec 
iov[],
 }
 }
 
-static int flush_true(EventNotifier *e)
-{
-return true;
-}
-
 static void handle_notify(EventNotifier *e)
 {
 VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
@@ -345,14 +340,6 @@ static void handle_notify(EventNotifier *e)
 }
 }
 
-static int flush_io(EventNotifier *e)
-{
-VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
-   io_notifier);
-
-return s-num_reqs  0;
-}
-
 static void handle_io(EventNotifier *e)
 {
 VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
@@ -485,7 +472,7 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
 exit(1);
 }
 s-host_notifier = *virtio_queue_get_host_notifier(vq);
-aio_set_event_notifier(s-ctx, s-host_notifier, handle_notify, 
flush_true);
+aio_set_event_notifier(s-ctx, s-host_notifier, handle_notify, NULL);
 
 /* Set up ioqueue */
 ioq_init(s-ioqueue, s-fd, REQ_MAX);
@@ -493,7 +480,7 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
 ioq_put_iocb(s-ioqueue, s-requests[i].iocb);
 }
 s-io_notifier = *ioq_get_notifier(s-ioqueue);
-aio_set_event_notifier(s-ctx, s-io_notifier, handle_io, flush_io);
+aio_set_event_notifier(s-ctx, s-io_notifier, handle_io, NULL);
 
 s-started = true;
 trace_virtio_blk_data_plane_start(s);
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 10/18] block/linux-aio: drop qemu_laio_completion_cb()

2013-08-09 Thread Stefan Hajnoczi

.io_flush() is no longer called so drop qemu_laio_completion_cb().  It
turns out that count is now unused so drop that too.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/linux-aio.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index ee0f8d1..d9128f3 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -39,7 +39,6 @@ struct qemu_laiocb {
 struct qemu_laio_state {
 io_context_t ctx;
 EventNotifier e;
-int count;
 };
 
 static inline ssize_t io_event_ret(struct io_event *ev)
@@ -55,8 +54,6 @@ static void qemu_laio_process_completion(struct 
qemu_laio_state *s,
 {
 int ret;
 
-s-count--;
-
 ret = laiocb-ret;
 if (ret != -ECANCELED) {
 if (ret == laiocb-nbytes) {
@@ -101,13 +98,6 @@ static void qemu_laio_completion_cb(EventNotifier *e)
 }
 }
 
-static int qemu_laio_flush_cb(EventNotifier *e)
-{
-struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, e);
-
-return (s-count  0) ? 1 : 0;
-}
-
 static void laio_cancel(BlockDriverAIOCB *blockacb)
 {
 struct qemu_laiocb *laiocb = (struct qemu_laiocb *)blockacb;
@@ -177,14 +167,11 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void 
*aio_ctx, int fd,
 goto out_free_aiocb;
 }
 io_set_eventfd(laiocb-iocb, event_notifier_get_fd(s-e));
-s-count++;
 
 if (io_submit(s-ctx, 1, iocbs)  0)
-goto out_dec_count;
+goto out_free_aiocb;
 return laiocb-common;
 
-out_dec_count:
-s-count--;
 out_free_aiocb:
 qemu_aio_release(laiocb);
 return NULL;
@@ -204,7 +191,7 @@ void *laio_init(void)
 }
 
 qemu_aio_set_event_notifier(s-e, qemu_laio_completion_cb,
-qemu_laio_flush_cb);
+NULL);
 
 return s;
 
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 09/18] block/iscsi: drop iscsi_process_flush()

2013-08-09 Thread Stefan Hajnoczi

.io_flush() is no longer called so drop iscsi_process_flush().

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/iscsi.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index e7c1c2b..180b827 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -146,13 +146,6 @@ static const AIOCBInfo iscsi_aiocb_info = {
 static void iscsi_process_read(void *arg);
 static void iscsi_process_write(void *arg);
 
-static int iscsi_process_flush(void *arg)
-{
-IscsiLun *iscsilun = arg;
-
-return iscsi_queue_length(iscsilun-iscsi)  0;
-}
-
 static void
 iscsi_set_events(IscsiLun *iscsilun)
 {
@@ -166,7 +159,7 @@ iscsi_set_events(IscsiLun *iscsilun)
 qemu_aio_set_fd_handler(iscsi_get_fd(iscsi),
   iscsi_process_read,
   (ev  POLLOUT) ? iscsi_process_write : NULL,
-  iscsi_process_flush,
+  NULL,
   iscsilun);
 
 }
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 14/18] block/ssh: drop return_true()

2013-08-09 Thread Stefan Hajnoczi

.io_flush() is no longer called so drop return_true().

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/ssh.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/block/ssh.c b/block/ssh.c
index d7e7bf8..e149da9 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -740,14 +740,6 @@ static void restart_coroutine(void *opaque)
 qemu_coroutine_enter(co, NULL);
 }
 
-/* Always true because when we have called set_fd_handler there is
- * always a request being processed.
- */
-static int return_true(void *opaque)
-{
-return 1;
-}
-
 static coroutine_fn void set_fd_handler(BDRVSSHState *s)
 {
 int r;
@@ -766,7 +758,7 @@ static coroutine_fn void set_fd_handler(BDRVSSHState *s)
 DPRINTF(s-sock=%d rd_handler=%p wr_handler=%p, s-sock,
 rd_handler, wr_handler);
 
-qemu_aio_set_fd_handler(s-sock, rd_handler, wr_handler, return_true, co);
+qemu_aio_set_fd_handler(s-sock, rd_handler, wr_handler, NULL, co);
 }
 
 static coroutine_fn void clear_fd_handler(BDRVSSHState *s)
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 17/18] tests: drop event_active_cb()

2013-08-09 Thread Stefan Hajnoczi

Drop the io_flush argument to aio_set_event_notifier().

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 tests/test-aio.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/tests/test-aio.c b/tests/test-aio.c
index 1251952..7b2892a 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -65,12 +65,6 @@ static void bh_delete_cb(void *opaque)
 }
 }
 
-static int event_active_cb(EventNotifier *e)
-{
-EventNotifierTestData *data = container_of(e, EventNotifierTestData, e);
-return data-active  0;
-}
-
 static void event_ready_cb(EventNotifier *e)
 {
 EventNotifierTestData *data = container_of(e, EventNotifierTestData, e);
@@ -239,7 +233,7 @@ static void test_set_event_notifier(void)
 {
 EventNotifierTestData data = { .n = 0, .active = 0 };
 event_notifier_init(data.e, false);
-aio_set_event_notifier(ctx, data.e, event_ready_cb, event_active_cb);
+aio_set_event_notifier(ctx, data.e, event_ready_cb, NULL);
 g_assert(!aio_poll(ctx, false));
 g_assert_cmpint(data.n, ==, 0);
 
@@ -253,7 +247,7 @@ static void test_wait_event_notifier(void)
 {
 EventNotifierTestData data = { .n = 0, .active = 1 };
 event_notifier_init(data.e, false);
-aio_set_event_notifier(ctx, data.e, event_ready_cb, event_active_cb);
+aio_set_event_notifier(ctx, data.e, event_ready_cb, NULL);
 g_assert(!aio_poll(ctx, false));
 g_assert_cmpint(data.n, ==, 0);
 g_assert_cmpint(data.active, ==, 1);
@@ -278,7 +272,7 @@ static void test_flush_event_notifier(void)
 {
 EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true };
 event_notifier_init(data.e, false);
-aio_set_event_notifier(ctx, data.e, event_ready_cb, event_active_cb);
+aio_set_event_notifier(ctx, data.e, event_ready_cb, NULL);
 g_assert(!aio_poll(ctx, false));
 g_assert_cmpint(data.n, ==, 0);
 g_assert_cmpint(data.active, ==, 10);
@@ -318,7 +312,7 @@ static void test_wait_event_notifier_noflush(void)
 
 /* An active event notifier forces aio_poll to look at EventNotifiers.  */
 event_notifier_init(dummy.e, false);
-aio_set_event_notifier(ctx, dummy.e, event_ready_cb, event_active_cb);
+aio_set_event_notifier(ctx, dummy.e, event_ready_cb, NULL);
 
 event_notifier_set(data.e);
 g_assert(aio_poll(ctx, false));
@@ -521,7 +515,7 @@ static void test_source_set_event_notifier(void)
 {
 EventNotifierTestData data = { .n = 0, .active = 0 };
 event_notifier_init(data.e, false);
-aio_set_event_notifier(ctx, data.e, event_ready_cb, event_active_cb);
+aio_set_event_notifier(ctx, data.e, event_ready_cb, NULL);
 while (g_main_context_iteration(NULL, false));
 g_assert_cmpint(data.n, ==, 0);
 
@@ -535,7 +529,7 @@ static void test_source_wait_event_notifier(void)
 {
 EventNotifierTestData data = { .n = 0, .active = 1 };
 event_notifier_init(data.e, false);
-aio_set_event_notifier(ctx, data.e, event_ready_cb, event_active_cb);
+aio_set_event_notifier(ctx, data.e, event_ready_cb, NULL);
 g_assert(g_main_context_iteration(NULL, false));
 g_assert_cmpint(data.n, ==, 0);
 g_assert_cmpint(data.active, ==, 1);
@@ -560,7 +554,7 @@ static void test_source_flush_event_notifier(void)
 {
 EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true };
 event_notifier_init(data.e, false);
-aio_set_event_notifier(ctx, data.e, event_ready_cb, event_active_cb);
+aio_set_event_notifier(ctx, data.e, event_ready_cb, NULL);
 g_assert(g_main_context_iteration(NULL, false));
 g_assert_cmpint(data.n, ==, 0);
 g_assert_cmpint(data.active, ==, 10);
@@ -600,7 +594,7 @@ static void test_source_wait_event_notifier_noflush(void)
 
 /* An active event notifier forces aio_poll to look at EventNotifiers.  */
 event_notifier_init(dummy.e, false);
-aio_set_event_notifier(ctx, dummy.e, event_ready_cb, event_active_cb);
+aio_set_event_notifier(ctx, dummy.e, event_ready_cb, NULL);
 
 event_notifier_set(data.e);
 g_assert(g_main_context_iteration(NULL, false));
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 16/18] thread-pool: drop thread_pool_active()

2013-08-09 Thread Stefan Hajnoczi

.io_flush() is no longer called so drop thread_pool_active().  The block
layer is the only thread-pool.c user and it already tracks in-flight
requests, therefore we do not need thread_pool_active().

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 thread-pool.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/thread-pool.c b/thread-pool.c
index 0ebd4c2..096f007 100644
--- a/thread-pool.c
+++ b/thread-pool.c
@@ -197,12 +197,6 @@ restart:
 }
 }
 
-static int thread_pool_active(EventNotifier *notifier)
-{
-ThreadPool *pool = container_of(notifier, ThreadPool, notifier);
-return !QLIST_EMPTY(pool-head);
-}
-
 static void thread_pool_cancel(BlockDriverAIOCB *acb)
 {
 ThreadPoolElement *elem = (ThreadPoolElement *)acb;
@@ -310,7 +304,7 @@ static void thread_pool_init_one(ThreadPool *pool, 
AioContext *ctx)
 QTAILQ_INIT(pool-request_list);
 
 aio_set_event_notifier(ctx, pool-notifier, event_notifier_ready,
-   thread_pool_active);
+   NULL);
 }
 
 ThreadPool *thread_pool_new(AioContext *ctx)
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 12/18] block/rbd: drop qemu_rbd_aio_flush_cb()

2013-08-09 Thread Stefan Hajnoczi

.io_flush() is no longer called so drop qemu_rbd_aio_flush_cb().
qemu_aio_count is unused now so drop it too.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/rbd.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index cb71751..71b4a0c 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -100,7 +100,6 @@ typedef struct BDRVRBDState {
 rados_ioctx_t io_ctx;
 rbd_image_t image;
 char name[RBD_MAX_IMAGE_NAME_SIZE];
-int qemu_aio_count;
 char *snap;
 int event_reader_pos;
 RADOSCB *event_rcb;
@@ -428,19 +427,11 @@ static void qemu_rbd_aio_event_reader(void *opaque)
 if (s-event_reader_pos == sizeof(s-event_rcb)) {
 s-event_reader_pos = 0;
 qemu_rbd_complete_aio(s-event_rcb);
-s-qemu_aio_count--;
 }
 }
 } while (ret  0  errno == EINTR);
 }
 
-static int qemu_rbd_aio_flush_cb(void *opaque)
-{
-BDRVRBDState *s = opaque;
-
-return (s-qemu_aio_count  0);
-}
-
 /* TODO Convert to fine grained options */
 static QemuOptsList runtime_opts = {
 .name = rbd,
@@ -554,7 +545,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags)
 fcntl(s-fds[0], F_SETFL, O_NONBLOCK);
 fcntl(s-fds[1], F_SETFL, O_NONBLOCK);
 qemu_aio_set_fd_handler(s-fds[RBD_FD_READ], qemu_rbd_aio_event_reader,
-NULL, qemu_rbd_aio_flush_cb, s);
+NULL, NULL, s);
 
 
 qemu_opts_del(opts);
@@ -741,8 +732,6 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
 off = sector_num * BDRV_SECTOR_SIZE;
 size = nb_sectors * BDRV_SECTOR_SIZE;
 
-s-qemu_aio_count++; /* All the RADOSCB */
-
 rcb = g_malloc(sizeof(RADOSCB));
 rcb-done = 0;
 rcb-acb = acb;
@@ -779,7 +768,6 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
 
 failed:
 g_free(rcb);
-s-qemu_aio_count--;
 qemu_aio_release(acb);
 return NULL;
 }
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 11/18] block/nbd: drop nbd_have_request()

2013-08-09 Thread Stefan Hajnoczi

.io_flush() is no longer called so drop nbd_have_request().  We cannot
drop in_flight since it is still used by other block/nbd.c code.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/nbd.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 9c480b8..f1619f9 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -279,13 +279,6 @@ static void nbd_coroutine_start(BDRVNBDState *s, struct 
nbd_request *request)
 request-handle = INDEX_TO_HANDLE(s, i);
 }
 
-static int nbd_have_request(void *opaque)
-{
-BDRVNBDState *s = opaque;
-
-return s-in_flight  0;
-}
-
 static void nbd_reply_ready(void *opaque)
 {
 BDRVNBDState *s = opaque;
@@ -342,7 +335,7 @@ static int nbd_co_send_request(BDRVNBDState *s, struct 
nbd_request *request,
 qemu_co_mutex_lock(s-send_mutex);
 s-send_coroutine = qemu_coroutine_self();
 qemu_aio_set_fd_handler(s-sock, nbd_reply_ready, nbd_restart_write,
-nbd_have_request, s);
+NULL, s);
 if (qiov) {
 if (!s-is_unix) {
 socket_set_cork(s-sock, 1);
@@ -362,7 +355,7 @@ static int nbd_co_send_request(BDRVNBDState *s, struct 
nbd_request *request,
 rc = nbd_send_request(s-sock, request);
 }
 qemu_aio_set_fd_handler(s-sock, nbd_reply_ready, NULL,
-nbd_have_request, s);
+NULL, s);
 s-send_coroutine = NULL;
 qemu_co_mutex_unlock(s-send_mutex);
 return rc;
@@ -439,7 +432,7 @@ static int nbd_establish_connection(BlockDriverState *bs)
  * kick the reply mechanism.  */
 qemu_set_nonblock(sock);
 qemu_aio_set_fd_handler(sock, nbd_reply_ready, NULL,
-nbd_have_request, s);
+NULL, s);
 
 s-sock = sock;
 s-size = size;
-- 
1.8.1.4

[Qemu-devel] [PATCH v7 18/18] aio: drop io_flush argument

2013-08-09 Thread Stefan Hajnoczi

The .io_flush() handler no longer exists and has no users.  Drop the
io_flush argument to aio_set_fd_handler() and related functions.

The AioFlushEventNotifierHandler and AioFlushHandler typedefs are no
longer used and are dropped too.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 aio-posix.c |  7 ++-
 aio-win32.c |  3 +--
 async.c |  4 ++--
 block/curl.c|  9 -
 block/gluster.c |  7 +++
 block/iscsi.c   |  3 +--
 block/linux-aio.c   |  3 +--
 block/nbd.c | 11 ---
 block/rbd.c |  4 ++--
 block/sheepdog.c| 18 --
 block/ssh.c |  4 ++--
 hw/block/dataplane/virtio-blk.c |  8 
 include/block/aio.h | 14 ++
 main-loop.c |  9 +++--
 tests/test-aio.c| 40 
 thread-pool.c   |  5 ++---
 16 files changed, 61 insertions(+), 88 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index 7d66048..2440eb9 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -46,7 +46,6 @@ void aio_set_fd_handler(AioContext *ctx,
 int fd,
 IOHandler *io_read,
 IOHandler *io_write,
-AioFlushHandler *io_flush,
 void *opaque)
 {
 AioHandler *node;
@@ -95,12 +94,10 @@ void aio_set_fd_handler(AioContext *ctx,
 
 void aio_set_event_notifier(AioContext *ctx,
 EventNotifier *notifier,
-EventNotifierHandler *io_read,
-AioFlushEventNotifierHandler *io_flush)
+EventNotifierHandler *io_read)
 {
 aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
-   (IOHandler *)io_read, NULL,
-   (AioFlushHandler *)io_flush, notifier);
+   (IOHandler *)io_read, NULL, notifier);
 }
 
 bool aio_pending(AioContext *ctx)
diff --git a/aio-win32.c b/aio-win32.c
index 4309c16..78b2801 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -30,8 +30,7 @@ struct AioHandler {
 
 void aio_set_event_notifier(AioContext *ctx,
 EventNotifier *e,
-EventNotifierHandler *io_notify,
-AioFlushEventNotifierHandler *io_flush)
+EventNotifierHandler *io_notify)
 {
 AioHandler *node;
 
diff --git a/async.c b/async.c
index 5ce3633..9791d8e 100644
--- a/async.c
+++ b/async.c
@@ -201,7 +201,7 @@ aio_ctx_finalize(GSource *source)
 AioContext *ctx = (AioContext *) source;
 
 thread_pool_free(ctx-thread_pool);
-aio_set_event_notifier(ctx, ctx-notifier, NULL, NULL);
+aio_set_event_notifier(ctx, ctx-notifier, NULL);
 event_notifier_cleanup(ctx-notifier);
 qemu_mutex_destroy(ctx-bh_lock);
 g_array_free(ctx-pollfds, TRUE);
@@ -243,7 +243,7 @@ AioContext *aio_context_new(void)
 event_notifier_init(ctx-notifier, false);
 aio_set_event_notifier(ctx, ctx-notifier, 
(EventNotifierHandler *)
-   event_notifier_test_and_clear, NULL);
+   event_notifier_test_and_clear);
 
 return ctx;
 }
diff --git a/block/curl.c b/block/curl.c
index 524..e566855 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -93,17 +93,16 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int 
action,
 DPRINTF(CURL (AIO): Sock action %d on fd %d\n, action, fd);
 switch (action) {
 case CURL_POLL_IN:
-qemu_aio_set_fd_handler(fd, curl_multi_do, NULL, NULL, s);
+qemu_aio_set_fd_handler(fd, curl_multi_do, NULL, s);
 break;
 case CURL_POLL_OUT:
-qemu_aio_set_fd_handler(fd, NULL, curl_multi_do, NULL, s);
+qemu_aio_set_fd_handler(fd, NULL, curl_multi_do, s);
 break;
 case CURL_POLL_INOUT:
-qemu_aio_set_fd_handler(fd, curl_multi_do, curl_multi_do,
-NULL, s);
+qemu_aio_set_fd_handler(fd, curl_multi_do, curl_multi_do, s);
 break;
 case CURL_POLL_REMOVE:
-qemu_aio_set_fd_handler(fd, NULL, NULL, NULL, NULL);
+qemu_aio_set_fd_handler(fd, NULL, NULL, NULL);
 break;
 }
 
diff --git a/block/gluster.c b/block/gluster.c
index fa0b9e0..fbdbe97 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -339,7 +339,7 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict 
*options,
 }
 fcntl(s-fds[GLUSTER_FD_READ], F_SETFL, O_NONBLOCK);
 qemu_aio_set_fd_handler(s-fds[GLUSTER_FD_READ],
-qemu_gluster_aio_event_reader, NULL, NULL, s);
+qemu_gluster_aio_event_reader,

Re: [Qemu-devel] [PATCH v6 00/18] aio: drop io_flush()

2013-08-09 Thread Stefan Hajnoczi

On Fri, Aug 09, 2013 at 03:14:11PM +0800, Wenchao Xia wrote:
 于 2013-8-6 23:07, Stefan Hajnoczi 写道:
 On Thu, Jul 25, 2013 at 05:18:07PM +0200, Stefan Hajnoczi wrote:
 Ping?
 
   I tried to apply this series to do more work above it, but seems
 code conflict with upstream.

I sent a new revision of this series based on Kevin's block-next tree:
git://github.com/stefanha/qemu.git bdrv_drain

Stefan

Re: [Qemu-devel] [RFC] [PATCHv8 00/30] aio / timers: Add AioContext timers and use ppoll

2013-08-09 Thread liu ping fan

On Fri, Aug 9, 2013 at 4:12 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2013-08-08 23:41, Alex Bligh wrote:
 This patch series adds support for timers attached to an AioContext clock
 which get called within aio_poll.

 In doing so it removes alarm timers and moves to use ppoll where possible.

 This patch set 'sort of' passes make check (see below for caveat)
 including a new test harness for the aio timers, but has not been
 tested much beyond that. In particular, the win32 changes have not
 even been compile tested. Equally, alterations to use_icount
 are untested.

 Caveat: I have had to alter tests/test-aio.c so the following error
 no longer occurs.

 ERROR:tests/test-aio.c:346:test_wait_event_notifier_noflush: assertion 
 failed: (aio_poll(ctx, false))

 As gar as I can tell, this check was incorrect, in that it checking
 aio_poll makes progress when in fact it should not make progress. I
 fixed an issue where aio_poll was (as far as I can tell) wrongly
 returning true on a timeout, and that generated this error.

 Note also the comment on patch 18 in relation to a possible bug
 in cpus.c.

 The penultimate patch is patch which is created in an automated manner
 using scripts/switch-timer-api, added in this patch set. It violates some
 coding standards (e.g. line length = 80 characters), but this is preferable
 in terms of giving a provably correct conversion.

 This patch set has been compile tested  make check tested on a
 'christmas-tree' configuration, meaning a configuration with every
 --enable- value tested that can be easily configured on Ubuntu Precise,
 after application of each patch.

 Changes since v7:
 * Rebase to master 6fdf98f281f85ae6e2883bed2f691bcfe33b1f9f
 * Add qemu_clock_get_ms and qemu_clock_get_ms
 * Rename qemu_get_clock to qemu_clock_ptr
 * Reorder qemu-timer.h to utilise the legacy API
 * Hide qemu_clock_new  qemu_clock_free
 * Rename default_timerlist to main_loop_timerlist
 * Remove main_loop_timerlist once main_loop_tlg is in
 * Add script to convert to new API
 * Make rtc_clock use new API
 * Convert tests/test-aio to use new API
 * Run script on entire source code
 * Remove legacy API functions

 Changes since v6:
 * Fix build failure in vnc-auth-sasl.c
 * Split first patch into 3
 * Add assert on timerlist_free
 * Fix ==/= error on qemu_clock_use_for_deadline
 * Remove unnecessary cast in aio_timerlist_notify
 * Fix bad deadline comparison in aio_ctx_check
 * Add assert to timerlist_new_from_clock to check init_clocks
 * Use timer_list not tl
 * Change default_timerlistgroup to main_loop_timerlistgroup
 * Add comment on commit for qemu_clock_use_for_deadline
 * Fixed various include file issues
 * Convert *_has_timers and *_has_expired to return bool
 * Make loop variable consistent when looping through clock types
 * Add documentation to existing qemu_timer calls
 * Remove qemu_clock_deadline and move to qemu_clock_deadline_ns

 Changes since v5:
 * Rebase onto master (b9ac5d9)
 * Fix spacing in typedef QEMUTimerList
 * Rename 'QEMUClocks' extern to 'qemu_clocks'

 Changes since v4:
 * Rename qemu_timerlist_ functions to timer_list (per Paolo Bonzini)
 * Rename qemu_timer_.*timerlist.* to timer_ (per Paolo Bonzini)
 * Use enum for QEMUClockType
 * Put clocks into an array; remove global variables
 * Introduce QEMUTimerListGroup - a timeliest of each type
 * Add a QEMUTimerListGroup to AioContext
 * Use a callback on timer modification, rather than binding in
   AioContext into the timeliest
 * Make cpus.c iterate over all timerlists when it does a notify
 * Make cpus.c icount timeout use soonest timeout
   across all timerlists

 Changes since v3:
 * Split up QEMUClock and QEMUClock list
 * Improve commenting
 * Fix comment in vl.c
 * Change test/test-aio.c to reflect correct behaviour in aio_poll.

 Changes since v2:
 * Reordered to remove alarm timers last
 * Added prctl(PR_SET_TIMERSLACK, 1, ...)
 * Renamed qemu_g_poll_ns to qemu_poll_ns
 * Moved declaration of above  drop glib types
 * Do not use a global list of qemu clocks
 * Add AioContext * to QEMUClock
 * Split up conversion to use ppoll and timers
 * Indentation fix
 * Fix aio_win32.c aio_poll to return progress
 * aio_notify / qemu_notify when timers are modified
 * change comment in deprecation of clock options

 Alex Bligh (30):
   aio / timers: Rename qemu_new_clock and expose clock types
   aio / timers: add qemu-timer.c utility functions
   aio / timers: Consistent treatment of disabled clocks for deadlines
   aio / timers: add ppoll support with qemu_poll_ns
   aio / timers: Add prctl(PR_SET_TIMERSLACK, 1, ...) to reduce timer
 slack
   aio / timers: Make qemu_run_timers and qemu_run_all_timers return
 progress
   aio / timers: Split QEMUClock into QEMUClock and QEMUTimerList
   aio / timers: Untangle include files
   aio / timers: Add QEMUTimerListGroup and helper functions
   aio / timers: Add QEMUTimerListGroup to AioContext
   aio / timers: Add a notify callback to QEMUTimerList

Re: [Qemu-devel] [RFC] [PATCHv8 00/30] aio / timers: Add AioContext timers and use ppoll

2013-08-09 Thread Jan Kiszka

On 2013-08-09 10:24, liu ping fan wrote:
 Hi Jan, do you work on pushing hpet onto dedicated thread? If you do,
 I will cease my current work.

No, the HPET is not needed for our current use case, just the MC146818.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

[Qemu-devel] [PATCH v7 07/18] block/curl: drop curl_aio_flush()

2013-08-09 Thread Stefan Hajnoczi

.io_flush() is no longer called so drop curl_aio_flush().  The acb[]
array that the function checks is still used in other parts of
block/curl.c.  Therefore we cannot remove acb[], it is needed.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/curl.c | 22 +++---
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 82d39ff..524 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -86,7 +86,6 @@ typedef struct BDRVCURLState {
 
 static void curl_clean_state(CURLState *s);
 static void curl_multi_do(void *arg);
-static int curl_aio_flush(void *opaque);
 
 static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
 void *s, void *sp)
@@ -94,14 +93,14 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int 
action,
 DPRINTF(CURL (AIO): Sock action %d on fd %d\n, action, fd);
 switch (action) {
 case CURL_POLL_IN:
-qemu_aio_set_fd_handler(fd, curl_multi_do, NULL, curl_aio_flush, 
s);
+qemu_aio_set_fd_handler(fd, curl_multi_do, NULL, NULL, s);
 break;
 case CURL_POLL_OUT:
-qemu_aio_set_fd_handler(fd, NULL, curl_multi_do, curl_aio_flush, 
s);
+qemu_aio_set_fd_handler(fd, NULL, curl_multi_do, NULL, s);
 break;
 case CURL_POLL_INOUT:
 qemu_aio_set_fd_handler(fd, curl_multi_do, curl_multi_do,
-curl_aio_flush, s);
+NULL, s);
 break;
 case CURL_POLL_REMOVE:
 qemu_aio_set_fd_handler(fd, NULL, NULL, NULL, NULL);
@@ -495,21 +494,6 @@ out_noclean:
 return -EINVAL;
 }
 
-static int curl_aio_flush(void *opaque)
-{
-BDRVCURLState *s = opaque;
-int i, j;
-
-for (i=0; i  CURL_NUM_STATES; i++) {
-for(j=0; j  CURL_NUM_ACB; j++) {
-if (s-states[i].acb[j]) {
-return 1;
-}
-}
-}
-return 0;
-}
-
 static void curl_aio_cancel(BlockDriverAIOCB *blockacb)
 {
 // Do we have to implement canceling? Seems to work without...
-- 
1.8.1.4

Re: [Qemu-devel] [PATCH v1 2/5] slirp: make timeout local

2013-08-09 Thread liu ping fan

On Fri, Aug 9, 2013 at 4:06 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 08/08/2013 08:26, Liu Ping Fan ha scritto:
 Each slirp has its own time to caculate timeout.

 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  slirp/slirp.c | 22 ++
  slirp/slirp.h |  3 +++
  2 files changed, 13 insertions(+), 12 deletions(-)

 diff --git a/slirp/slirp.c b/slirp/slirp.c
 index 80b28ea..55654d5 100644
 --- a/slirp/slirp.c
 +++ b/slirp/slirp.c
 @@ -40,8 +40,6 @@ static const uint8_t special_ethaddr[ETH_ALEN] = {
  static const uint8_t zero_ethaddr[ETH_ALEN] = { 0, 0, 0, 0, 0, 0 };

  u_int curtime;
 -static u_int time_fasttimo, last_slowtimo;
 -static int do_slowtimo;

  static QTAILQ_HEAD(slirp_instances, Slirp) slirp_instances =
  QTAILQ_HEAD_INITIALIZER(slirp_instances);
 @@ -278,14 +276,13 @@ void slirp_pollfds_fill(GArray *pollfds)
  /*
   * First, TCP sockets
   */
 -do_slowtimo = 0;

  QTAILQ_FOREACH(slirp, slirp_instances, entry) {
  /*
   * *_slowtimo needs calling if there are IP fragments
   * in the fragment queue, or there are TCP connections active
   */
 -do_slowtimo |= ((slirp-tcb.so_next != slirp-tcb) ||
 +slirp-do_slowtimo = ((slirp-tcb.so_next != slirp-tcb) ||
  (slirp-ipq.ip_link != slirp-ipq.ip_link.next));

  for (so = slirp-tcb.so_next; so != slirp-tcb;
 @@ -299,8 +296,9 @@ void slirp_pollfds_fill(GArray *pollfds)
  /*
   * See if we need a tcp_fasttimo
   */
 -if (time_fasttimo == 0  so-so_tcpcb-t_flags  TF_DELACK) {
 -time_fasttimo = curtime; /* Flag when we want a fasttimo */
 +if (slirp-time_fasttimo == 0 
 +so-so_tcpcb-t_flags  TF_DELACK) {
 +slirp-time_fasttimo = curtime; /* Flag when want a 
 fasttimo */
  }

  /*
 @@ -381,7 +379,7 @@ void slirp_pollfds_fill(GArray *pollfds)
  udp_detach(so);
  continue;
  } else {
 -do_slowtimo = 1; /* Let socket expire */
 +slirp-do_slowtimo = 1; /* Let socket expire */
  }
  }

 @@ -422,7 +420,7 @@ void slirp_pollfds_fill(GArray *pollfds)
  icmp_detach(so);
  continue;
  } else {
 -do_slowtimo = 1; /* Let socket expire */
 +slirp-do_slowtimo = 1; /* Let socket expire */
  }
  }

 @@ -454,14 +452,14 @@ void slirp_pollfds_poll(GArray *pollfds, int 
 select_error)
  /*
   * See if anything has timed out
   */
 -if (time_fasttimo  ((curtime - time_fasttimo) = 2)) {
 +if (slirp-time_fasttimo  ((curtime - slirp-time_fasttimo) = 
 2)) {
  tcp_fasttimo(slirp);
 -time_fasttimo = 0;
 +slirp-time_fasttimo = 0;
  }
 -if (do_slowtimo  ((curtime - last_slowtimo) = 499)) {
 +if (slirp-do_slowtimo  ((curtime - slirp-last_slowtimo) = 
 499)) {
  ip_slowtimo(slirp);
  tcp_slowtimo(slirp);
 -last_slowtimo = curtime;
 +slirp-last_slowtimo = curtime;
  }

  /*
 diff --git a/slirp/slirp.h b/slirp/slirp.h
 index fe0e65d..008360e 100644
 --- a/slirp/slirp.h
 +++ b/slirp/slirp.h
 @@ -203,6 +203,9 @@ bool arp_table_search(Slirp *slirp, uint32_t ip_addr,

  struct Slirp {
  QTAILQ_ENTRY(Slirp) entry;
 +u_int time_fasttimo;
 +u_int last_slowtimo;
 +int do_slowtimo;

  /* virtual network configuration */
  struct in_addr vnetwork_addr;


 Reviewed-by: Paolo Bonzini pbonz...@redhat.com

 In addition, you could modify slirp_update_timeout to run after
 slirp_pollfds_fill, and actually look at the timeout fields to set the
 right timeout (2 if time_fasttimo is nonzero, 499 if do_slowtimo is true).

After patch 3,4, the timeout will be an param for GSource's prepare.
So I think I will set the right timeout value in patch4?

Thanks,
Pingfan
 Paolo

[Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-09 Thread Alexey Kardashevskiy

A PCI device's DMA address space (possibly an IOMMU) is returned by a
method on the PCIBus.  At the moment that only has one caller, so the
method is simply open coded.  We'll need another caller for VFIO, so
this patch introduces a helper/wrapper function.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
[aik: added inheritance from parent if iommu is not set for the current bus]
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

---
Changes:
v2:
* added inheritance, needed for a pci-bridge on spapr-ppc64
* pci_iommu_as renamed to pci_device_iommu_address_space
---
 hw/pci/pci.c | 22 --
 include/hw/pci/pci.h |  1 +
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 4c004f5..0072b54 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 }
 
 pci_dev-bus = bus;
-if (bus-iommu_fn) {
-dma_as = bus-iommu_fn(bus, bus-iommu_opaque, devfn);
-} else {
-/* FIXME: inherit memory region from bus creator */
-dma_as = address_space_memory;
-}
+dma_as = pci_device_iommu_address_space(pci_dev);
 
 memory_region_init_alias(pci_dev-bus_master_enable_region,
  OBJECT(pci_dev), bus master,
@@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass *klass, 
void *data)
 k-props = pci_props;
 }
 
+AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
+{
+PCIBus *bus = PCI_BUS(dev-bus);
+
+if (bus-iommu_fn) {
+return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
+}
+
+if (bus-parent_dev) {
+return pci_device_iommu_address_space(bus-parent_dev);
+}
+
+return address_space_memory;
+}
+
 void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
 {
 bus-iommu_fn = fn;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index ccec2ba..2374aa9 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -405,6 +405,7 @@ void pci_device_deassert_intx(PCIDevice *dev);
 
 typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
 
+AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
 void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
 
 static inline void
-- 
1.8.3.2

[Qemu-devel] [PATCH] spapr-pci: fix config space access to support bridges

2013-08-09 Thread Alexey Kardashevskiy

spapr-pci config space accessors use find_dev() to find a PCI device.
However find_dev() only searched on a primary bus and did not do
recursive search through secondary buses so config space access was not
possible for devices other that on a primary bus.

This fixed find_dev() by using the PCI API pci_find_device() function.
This effectively enabled pci bridges on spapr.

While we are here, let's add some config space access traces.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 hw/ppc/spapr_pci.c | 14 +++---
 trace-events   |  2 ++
 2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 1ca35a0..2c8e55d 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -65,22 +65,13 @@ static PCIDevice *find_dev(sPAPREnvironment *spapr, 
uint64_t buid,
 {
 sPAPRPHBState *sphb = find_phb(spapr, buid);
 PCIHostState *phb = PCI_HOST_BRIDGE(sphb);
-BusState *bus = BUS(phb-bus);
-BusChild *kid;
 int devfn = (config_addr  8)  0xFF;
 
 if (!phb) {
 return NULL;
 }
 
-QTAILQ_FOREACH(kid, bus-children, sibling) {
-PCIDevice *dev = (PCIDevice *)kid-child;
-if (dev-devfn == devfn) {
-return dev;
-}
-}
-
-return NULL;
+return pci_find_device(phb-bus, (config_addr16)  0xff, devfn);
 }
 
 static uint32_t rtas_pci_cfgaddr(uint32_t arg)
@@ -114,9 +105,9 @@ static void finish_read_pci_config(sPAPREnvironment *spapr, 
uint64_t buid,
 
 val = pci_host_config_read_common(pci_dev, addr,
   pci_config_size(pci_dev), size);
-
 rtas_st(rets, 0, 0);
 rtas_st(rets, 1, val);
+trace_spapr_pci_cfg_read(pci_dev-name, addr, val);
 }
 
 static void rtas_ibm_read_pci_config(PowerPCCPU *cpu, sPAPREnvironment *spapr,
@@ -183,6 +174,7 @@ static void finish_write_pci_config(sPAPREnvironment 
*spapr, uint64_t buid,
  val, size);
 
 rtas_st(rets, 0, 0);
+trace_spapr_pci_cfg_write(pci_dev-name, addr, val);
 }
 
 static void rtas_ibm_write_pci_config(PowerPCCPU *cpu, sPAPREnvironment *spapr,
diff --git a/trace-events b/trace-events
index 3856b5c..f99a8aa 100644
--- a/trace-events
+++ b/trace-events
@@ -1119,6 +1119,8 @@ spapr_pci_rtas_ibm_change_msi(unsigned func, unsigned 
req) func %u, requested %
 spapr_pci_rtas_ibm_query_interrupt_source_number(unsigned ioa, unsigned intr) 
queries for #%u, IRQ%u
 spapr_pci_msi_write(uint64_t addr, uint64_t data, uint32_t dt_irq) 
@%PRIx64=%PRIx64 IRQ %u
 spapr_pci_lsi_set(const char *busname, int pin, uint32_t irq) %s PIN%d IRQ %u
+spapr_pci_cfg_read(const char *dev, unsigned offs, unsigned val) %s @0x%x 
0x%x
+spapr_pci_cfg_write(const char *dev, unsigned offs, unsigned val) %s @0x%x 
0x%x
 
 # hw/ppc/xics.c
 xics_icp_check_ipi(int server, uint8_t mfrr) CPU %d can take IPI mfrr=%#x
-- 
1.8.3.2

Re: [Qemu-devel] [RFC] [PATCHv8 05/30] aio / timers: Add prctl(PR_SET_TIMERSLACK, 1, ...) to reduce timer slack

2013-08-09 Thread Stefan Hajnoczi

On Thu, Aug 08, 2013 at 10:42:02PM +0100, Alex Bligh wrote:
 Where supported, called prctl(PR_SET_TIMERSLACK, 1, ...) to
 set one nanosecond timer slack to increase precision of timer
 calls.
 
 Signed-off-by: Alex Bligh a...@alex.org.uk
 ---
  qemu-timer.c |7 +++
  1 file changed, 7 insertions(+)
 
 diff --git a/qemu-timer.c b/qemu-timer.c
 index 5e81935..9eb6db8 100644
 --- a/qemu-timer.c
 +++ b/qemu-timer.c
 @@ -41,6 +41,10 @@
  #include poll.h
  #endif
  
 +#ifdef CONFIG_PRCTL_PR_SET_TIMERSLACK

The ./configure change should also be in this patch.  I think it crept
into another patch by mistake.

Re: [Qemu-devel] [RFC] [PATCHv8 11/30] aio / timers: Add a notify callback to QEMUTimerList

2013-08-09 Thread Stefan Hajnoczi

On Thu, Aug 08, 2013 at 10:42:08PM +0100, Alex Bligh wrote:
 @@ -213,13 +214,41 @@ QEMUClock *timerlist_get_clock(QEMUTimerList 
 *timer_list);
  bool timerlist_run_timers(QEMUTimerList *timer_list);
  
  /**
 + * timerlist_set_notify_cb:
 + * @timer_list: the timer list to use
 + * @cb: the callback to call on notification
 + * @opaque: the opaque pointer to pass to the callback
 + *
 + * Set the notify callback for a timer list. The notifier
 + * callback is called when the clock is reenabled or a timer
 + * on the list is modified.
 + */
 +void timerlist_set_notify_cb(QEMUTimerList *timer_list,
 + QEMUTimerListNotifyCB *cb, void *opaque);

When looking at thread-safety I had to think about set_notify_cb() for a
while.  The issue is that we add the timerlist to the clock source's
-timerlists *before* notify_cb has been assigned.

This could be a problem is another thread re-enables the clock source
while we are still in timerlist_new().

In practice it is not an issue when AioContexts are created under the
global mutex (that's the case today).

Still, it would be a little safer to drop set_notify_cb() and instead
pass in cb/opaque in timerlist_new().

Here is a patch that does this (against the previous revision of this
patch):

From 75096b8fcafbac598ec0a5eab7a10cfb2e571fdf Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi stefa...@redhat.com
Date: Wed, 7 Aug 2013 15:44:02 +0200
Subject: [PATCH] qemu-timer: set notify_cb in timerlist_new()

Eliminate the race condition between creating a QEMUTimerList and
setting its notify_cb field.  This is important for multi-threading
scenarios where a timerlist can be notified before timerlist_new() has
returned.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 include/qemu/timer.h | 22 --
 qemu-timer.c | 25 -
 2 files changed, 20 insertions(+), 27 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 9989d0e..935c259 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -207,13 +207,20 @@ void qemu_clock_notify(QEMUClock *clock);
 /**
  * timerlist_new:
  * @type: the clock type to associate with the timerlist
+ * @cb: the callback to call on notification
+ * @opaque: the opaque pointer to pass to the callback
  *
  * Create a new timerlist associated with the clock of
  * type @type.
  *
+ * The notifier callback is called when the clock is reenabled or a timer on
+ * the list is modified.
+ *
  * Returns: a pointer to the QEMUTimerList created
  */
-QEMUTimerList *timerlist_new(QEMUClockType type);
+QEMUTimerList *timerlist_new(QEMUClockType type,
+ QEMUTimerListNotifyCB *cb,
+ void *opaque);
 
 /**
  * timerlist_free:
@@ -282,19 +289,6 @@ QEMUClock *timerlist_get_clock(QEMUTimerList *timer_list);
 bool timerlist_run_timers(QEMUTimerList *timer_list);
 
 /**
- * timerlist_set_notify_cb:
- * @timer_list: the timer list to use
- * @cb: the callback to call on notification
- * @opaque: the opaque pointer to pass to the callback
- *
- * Set the notify callback for a timer list. The notifier
- * callback is called when the clock is reenabled or a timer
- * on the list is modified.
- */
-void timerlist_set_notify_cb(QEMUTimerList *timer_list,
- QEMUTimerListNotifyCB *cb, void *opaque);
-
-/**
  * timerlist_notify:
  * @timer_list: the timer list to use
  *
diff --git a/qemu-timer.c b/qemu-timer.c
index a39c4d6..8cb4fe7 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -87,7 +87,9 @@ static bool qemu_timer_expired_ns(QEMUTimer *timer_head, 
int64_t current_time)
 return timer_head  (timer_head-expire_time = current_time);
 }
 
-static QEMUTimerList *timerlist_new_from_clock(QEMUClock *clock)
+static QEMUTimerList *timerlist_new_from_clock(QEMUClock *clock,
+   QEMUTimerListNotifyCB *cb,
+   void *opaque)
 {
 QEMUTimerList *timer_list;
 
@@ -101,13 +103,17 @@ static QEMUTimerList *timerlist_new_from_clock(QEMUClock 
*clock)
 
 timer_list = g_malloc0(sizeof(QEMUTimerList));
 timer_list-clock = clock;
+timer_list-notify_cb = cb;
+timer_list-notify_opaque = opaque;
 QLIST_INSERT_HEAD(clock-timerlists, timer_list, list);
 return timer_list;
 }
 
-QEMUTimerList *timerlist_new(QEMUClockType type)
+QEMUTimerList *timerlist_new(QEMUClockType type,
+ QEMUTimerListNotifyCB *cb,
+ void *opaque)
 {
-return timerlist_new_from_clock(qemu_get_clock(type));
+return timerlist_new_from_clock(qemu_get_clock(type), cb, opaque);
 }
 
 void timerlist_free(QEMUTimerList *timer_list)
@@ -131,7 +137,8 @@ QEMUClock *qemu_clock_new(QEMUClockType type)
 clock-enabled = true;
 clock-last = INT64_MIN;
 notifier_list_init(clock-reset_notifiers);
-clock-default_timerlist = timerlist_new_from_clock(clock);
+

Re: [Qemu-devel] [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Bug 1207686]

2013-08-09 Thread Oliver Francke


Hi Josh,

just opened

http://tracker.ceph.com/issues/5919

with all collected information incl. debug-log.

Hope it helps,

Oliver.

On 08/08/2013 07:01 PM, Josh Durgin wrote:

On 08/08/2013 05:40 AM, Oliver Francke wrote:

Hi Josh,

I have a session logged with:

 debug_ms=1:debug_rbd=20:debug_objectcacher=30

as you requested from Mike, even if I think, we do have another story
here, anyway.

Host-kernel is: 3.10.0-rc7, qemu-client 1.6.0-rc2, client-kernel is
3.2.0-51-amd...

Do you want me to open a ticket for that stuff? I have about 5MB
compressed logfile waiting for you ;)


Yes, that'd be great. If you could include the time when you saw the 
guest hang that'd be ideal. I'm not sure if this is one or two bugs,

but it seems likely it's a bug in rbd and not qemu.

Thanks!
Josh


Thnx in advance,

Oliver.

On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote:

On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:

Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com:

We can un-wedge the guest by opening a NoVNC session or running a
'virsh screenshot' command. After that, the guest resumes and runs
as expected. At that point we can examine the guest. Each time we'll
see:

If virsh screenshot works then this confirms that QEMU itself is still
responding.  Its main loop cannot be blocked since it was able to
process the screendump command.

This supports Josh's theory that a callback is not being invoked.  The
virtio-blk I/O request would be left in a pending state.

Now here is where the behavior varies between configurations:

On a Windows guest with 1 vCPU, you may see the symptom that the 
guest no

longer responds to ping.

On a Linux guest with multiple vCPUs, you may see the hung task message
from the guest kernel because other vCPUs are still making progress.
Just the vCPU that issued the I/O request and whose task is in
UNINTERRUPTIBLE state would really be stuck.

Basically, the symptoms depend not just on how QEMU is behaving but 
also

on the guest kernel and how many vCPUs you have configured.

I think this can explain how both problems you are observing, Oliver 
and

Mike, are a result of the same bug.  At least I hope they are :).

Stefan








--

Oliver Francke

filoo GmbH
Moltkestraße 25a
0 Gütersloh
HRB4355 AG Gütersloh

Geschäftsführer: J.Rehpöhler | C.Kunz

Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh

Re: [Qemu-devel] [RFC] [PATCHv8 23/30] aio / timers: Rearrange timer.h make legacy functions call non-legacy

2013-08-09 Thread Stefan Hajnoczi

On Thu, Aug 08, 2013 at 10:42:20PM +0100, Alex Bligh wrote:
 @@ -269,17 +299,17 @@ bool timerlist_expired(QEMUTimerList *timer_list);
  int64_t timerlist_deadline_ns(QEMUTimerList *timer_list);
  
  /**
 - * timerlist_getclock:
 + * timerlist_get_clock:
   * @timer_list: the timer list to operate on
   *
   * Read the clock value associated with a timer list.
   * The clock value is normally in nanoseconds, but may
   * not be in some instances (e.g. vm_clock with use_icount).

The documentation is wrong.  This function does not get the clock value
in nanoseconds.

 +/**
 + * timer_put:
 + * @f: the file
 + * @ts: the timer
 + */
 +void timer_put(QEMUFile *f, QEMUTimer *ts);

The struct is still called QEMUTimer, should that also be renamed?

Re: [Qemu-devel] [PATCH v1 2/5] slirp: make timeout local

2013-08-09 Thread Paolo Bonzini

Il 09/08/2013 10:48, liu ping fan ha scritto:
 After patch 3,4, the timeout will be an param for GSource's prepare.
 So I think I will set the right timeout value in patch4?

No, please do it before (in fact you can extract this patch and the
timeout change to a seprate series).  Then patch 4 can simply move the
timeout logic to the GSource's prepare function.

Paolo

Re: [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-09 Thread Paolo Bonzini

Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
 A PCI device's DMA address space (possibly an IOMMU) is returned by a
 method on the PCIBus.  At the moment that only has one caller, so the
 method is simply open coded.  We'll need another caller for VFIO, so
 this patch introduces a helper/wrapper function.
 
 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 [aik: added inheritance from parent if iommu is not set for the current bus]
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 ---
 Changes:
 v2:
 * added inheritance, needed for a pci-bridge on spapr-ppc64
 * pci_iommu_as renamed to pci_device_iommu_address_space
 ---
  hw/pci/pci.c | 22 --
  include/hw/pci/pci.h |  1 +
  2 files changed, 17 insertions(+), 6 deletions(-)
 
 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 4c004f5..0072b54 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
 *pci_dev, PCIBus *bus,
  }
  
  pci_dev-bus = bus;
 -if (bus-iommu_fn) {
 -dma_as = bus-iommu_fn(bus, bus-iommu_opaque, devfn);
 -} else {
 -/* FIXME: inherit memory region from bus creator */
 -dma_as = address_space_memory;
 -}
 +dma_as = pci_device_iommu_address_space(pci_dev);
  
  memory_region_init_alias(pci_dev-bus_master_enable_region,
   OBJECT(pci_dev), bus master,
 @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass *klass, 
 void *data)
  k-props = pci_props;
  }
  
 +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 +{
 +PCIBus *bus = PCI_BUS(dev-bus);
 +
 +if (bus-iommu_fn) {
 +return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
 +}
 +
 +if (bus-parent_dev) {
 +return pci_device_iommu_address_space(bus-parent_dev);
 +}

No, this would fail if bus-parent_dev is not NULL but not a PCI device
either.  You can use object_dynamic_cast to convert the parent_dev to
PCIDevice, and if the cast succeeds you call the new function.

Perhaps you could make the new function take a PCIBus instead.
Accessing the PCIDevice's IOMMU address space (as opposed to the
bus-master address space) doesn't make much sense, VFIO is really a
special case.  Putting the new API on the bus side instead looks better.

(BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
devices sitting on the secondary bus?)

Paolo

 +return address_space_memory;
 +}
 +
  void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
  {
  bus-iommu_fn = fn;
 diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
 index ccec2ba..2374aa9 100644
 --- a/include/hw/pci/pci.h
 +++ b/include/hw/pci/pci.h
 @@ -405,6 +405,7 @@ void pci_device_deassert_intx(PCIDevice *dev);
  
  typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
  
 +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
  void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
  
  static inline void

Re: [Qemu-devel] [RFC] [PATCHv8 00/30] aio / timers: Add AioContext timers and use ppoll

2013-08-09 Thread Stefan Hajnoczi

On Thu, Aug 08, 2013 at 10:41:57PM +0100, Alex Bligh wrote:
 This patch series adds support for timers attached to an AioContext clock
 which get called within aio_poll.

Patch 29/30 didn't make it to my inbox.  It may have been bounced due to
size.  Using your git repo instead.

Stefan

Re: [Qemu-devel] [SeaBIOS] [PATCH] acpi: hide 64-bit PCI hole for Windows XP

2013-08-09 Thread Gerd Hoffmann

  Hi,

 Converting src/smm.c to use a runtime value isn't hard - just change
 the assembler from: mov $ __stringify(PORT_ACPI_PM_BASE)  + 0x04,
 %dx\n to: mov 4(my_acpi_base), %dx\n and make sure to define the
 global variable my_acpi_base as VARFSEG.

The apm fix brought a ctl register variable we can use directly, so I
tried the attached patch, then got this:

  Linking out/rom.o
out/code32flat.o: In function `smm_relocation_end':
(.text.asm./home/kraxel/projects/seabios/src/smm.c.72+0x37): relocation
truncated to fit: R_386_16 against symbol `acpi_pm1a_cnt' defined in
.data.varfseg./home/kraxel/projects/seabios/src/acpi.c.21 section in
out/code32flat.o
out/code32flat.o: In function `smm_relocation_end':
(.text.asm./home/kraxel/projects/seabios/src/smm.c.72+0x46): relocation
truncated to fit: R_386_16 against symbol `acpi_pm1a_cnt' defined in
.data.varfseg./home/kraxel/projects/seabios/src/acpi.c.21 section in
out/code32flat.o
make: *** [out/rom.o] Error 1

cheers,
  Gerd

From 2d3cf0af70727664c0ab5f17dae99b9f3043b631 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann kra...@redhat.com
Date: Fri, 9 Aug 2013 11:43:51 +0200
Subject: [PATCH] [wip] make pmbase runtime

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 src/acpi.c |2 +-
 src/acpi.h |2 +-
 src/smm.c  |4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 8db1ed4..db33595 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -18,7 +18,7 @@
 
 #include acpi-dsdt.hex
 
-u32 acpi_pm1a_cnt VARFSEG;
+u16 acpi_pm1a_cnt VARFSEG;
 
 static void
 build_header(struct acpi_table_header *h, u32 sig, int len, u8 rev)
diff --git a/src/acpi.h b/src/acpi.h
index f0d24d4..5c478a1 100644
--- a/src/acpi.h
+++ b/src/acpi.h
@@ -36,7 +36,7 @@ struct rsdp_descriptor {/* Root System Descriptor 
Pointer */
 };
 
 extern struct rsdp_descriptor *RsdpAddr;
-extern u32 acpi_pm1a_cnt;
+extern u16 acpi_pm1a_cnt;
 
 /* Table structure from Linux kernel (the ACPI tables are under the
BSD license) */
diff --git a/src/smm.c b/src/smm.c
index a424a29..a788a82 100644
--- a/src/smm.c
+++ b/src/smm.c
@@ -48,7 +48,7 @@ ASM32FLAT(
   jne 1f\n
 
 /* ACPI disable */
-  mov $ __stringify(PORT_ACPI_PM_BASE)  + 0x04, %dx\n /* PMCNTRL */
+  mov (acpi_pm1a_cnt), %dx\n /* PMCNTRL */
   inw %dx, %ax\n
   andw $~1, %ax\n
   outw %ax, %dx\n
@@ -60,7 +60,7 @@ ASM32FLAT(
   jne 2f\n
 
 /* ACPI enable */
-  mov $ __stringify(PORT_ACPI_PM_BASE)  + 0x04, %dx\n /* PMCNTRL */
+  mov (acpi_pm1a_cnt), %dx\n /* PMCNTRL */
   inw %dx, %ax\n
   orw $1, %ax\n
   outw %ax, %dx\n
-- 
1.7.9.7

Re: [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-09 Thread Alexey Kardashevskiy

On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
 Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
 A PCI device's DMA address space (possibly an IOMMU) is returned by a
 method on the PCIBus.  At the moment that only has one caller, so the
 method is simply open coded.  We'll need another caller for VFIO, so
 this patch introduces a helper/wrapper function.

 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 [aik: added inheritance from parent if iommu is not set for the current bus]
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---
 Changes:
 v2:
 * added inheritance, needed for a pci-bridge on spapr-ppc64
 * pci_iommu_as renamed to pci_device_iommu_address_space
 ---
  hw/pci/pci.c | 22 --
  include/hw/pci/pci.h |  1 +
  2 files changed, 17 insertions(+), 6 deletions(-)

 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 4c004f5..0072b54 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
 *pci_dev, PCIBus *bus,
  }
  
  pci_dev-bus = bus;
 -if (bus-iommu_fn) {
 -dma_as = bus-iommu_fn(bus, bus-iommu_opaque, devfn);
 -} else {
 -/* FIXME: inherit memory region from bus creator */
 -dma_as = address_space_memory;
 -}
 +dma_as = pci_device_iommu_address_space(pci_dev);
  
  memory_region_init_alias(pci_dev-bus_master_enable_region,
   OBJECT(pci_dev), bus master,
 @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass *klass, 
 void *data)
  k-props = pci_props;
  }
  
 +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 +{
 +PCIBus *bus = PCI_BUS(dev-bus);
 +
 +if (bus-iommu_fn) {
 +return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
 +}
 +
 +if (bus-parent_dev) {
 +return pci_device_iommu_address_space(bus-parent_dev);
 +}
 
 No, this would fail if bus-parent_dev is not NULL but not a PCI device
 either.

parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/


 You can use object_dynamic_cast to convert the parent_dev to
 PCIDevice, and if the cast succeeds you call the new function.
 
 Perhaps you could make the new function take a PCIBus instead.
 Accessing the PCIDevice's IOMMU address space (as opposed to the
 bus-master address space) doesn't make much sense, VFIO is really a
 special case.  Putting the new API on the bus side instead looks better.
 
 (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
 devices sitting on the secondary bus?)


It happens naturally I guess when linux enables devices.


-- 
Alexey

[Qemu-devel] mac -vnc has bug

2013-08-09 Thread Peter Cheung

Hi Allqemu 1.6.1 has bug when starting it with -vnc, when i connect it 
using vncviewer, the vncviewer will close automatically.
$uname -aDarwin Peters-MacBook-Air.local 11.4.2 Darwin Kernel Version 11.4.2: 
Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64

compile by:./configure --enable-cocoa --target-list=x86_64-softmmu 
--audio-drv-list=coreaudio --prefix=/Users/peter/qemu --enable-debug 
--disable-werror --extra-cflags=-O2 --enable-vnc
Start qemu by:~/qemu/bin/qemu-system-x86_64 -hda hd10meg.img -k en-us  -m 256m 
-vnc :1

Thanksfrom Peter

Re: [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-09 Thread Paolo Bonzini

Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
 Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
 A PCI device's DMA address space (possibly an IOMMU) is returned by a
 method on the PCIBus.  At the moment that only has one caller, so the
 method is simply open coded.  We'll need another caller for VFIO, so
 this patch introduces a helper/wrapper function.

 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 [aik: added inheritance from parent if iommu is not set for the current bus]
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---
 Changes:
 v2:
 * added inheritance, needed for a pci-bridge on spapr-ppc64
 * pci_iommu_as renamed to pci_device_iommu_address_space
 ---
  hw/pci/pci.c | 22 --
  include/hw/pci/pci.h |  1 +
  2 files changed, 17 insertions(+), 6 deletions(-)

 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 4c004f5..0072b54 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
 *pci_dev, PCIBus *bus,
  }
  
  pci_dev-bus = bus;
 -if (bus-iommu_fn) {
 -dma_as = bus-iommu_fn(bus, bus-iommu_opaque, devfn);
 -} else {
 -/* FIXME: inherit memory region from bus creator */
 -dma_as = address_space_memory;
 -}
 +dma_as = pci_device_iommu_address_space(pci_dev);
  
  memory_region_init_alias(pci_dev-bus_master_enable_region,
   OBJECT(pci_dev), bus master,
 @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass 
 *klass, void *data)
  k-props = pci_props;
  }
  
 +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 +{
 +PCIBus *bus = PCI_BUS(dev-bus);
 +
 +if (bus-iommu_fn) {
 +return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
 +}
 +
 +if (bus-parent_dev) {
 +return pci_device_iommu_address_space(bus-parent_dev);
 +}

 No, this would fail if bus-parent_dev is not NULL but not a PCI device
 either.
 
 parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/

Doh, I misread the code, I thought it was the parent field in
BusState.  Why do we have parent_dev at all?

 You can use object_dynamic_cast to convert the parent_dev to
 PCIDevice, and if the cast succeeds you call the new function.

 Perhaps you could make the new function take a PCIBus instead.
 Accessing the PCIDevice's IOMMU address space (as opposed to the
 bus-master address space) doesn't make much sense, VFIO is really a
 special case.  Putting the new API on the bus side instead looks better.

 (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
 devices sitting on the secondary bus?)
 
 It happens naturally I guess when linux enables devices.

Yes, but then using the IOMMU address space would be wrong; you would
have to use the bus-master address space as a base for the child's
bus-master address space.

Also, we would have to fix the x86 firmware.

Paolo

[Qemu-devel] [PATCH v5 1/8] vvfat: use bdrv_new() to allocate BlockDriverState

2013-08-09 Thread Fam Zheng

we need bdrv_new() to properly initialize BDS, don't allocate memory
manually.

Signed-off-by: Fam Zheng f...@redhat.com
---
 block/vvfat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index cd3b8ed..a827d91 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2943,7 +2943,7 @@ static int enable_write_target(BDRVVVFATState *s)
 unlink(s-qcow_filename);
 #endif
 
-s-bs-backing_hd = calloc(sizeof(BlockDriverState), 1);
+s-bs-backing_hd = bdrv_new();
 s-bs-backing_hd-drv = vvfat_write_target;
 s-bs-backing_hd-opaque = g_malloc(sizeof(void*));
 *(void**)s-bs-backing_hd-opaque = s;
-- 
1.8.3.1

[Qemu-devel] [PATCH v5 0/8] Implement reference count for BlockDriverState [resend]

2013-08-09 Thread Fam Zheng

[resend to the correct list]

BlockDriverState lifecycle management is needed by future features such as
image fleecing and blockdev-add. This series adds reference count to
BlockDriverState.

The first two patches clean up two odd BlockDriverState use cases, so all code
uses bdrv_new() to create BlockDriverState instance.

Then implemented bdrv_ref() and bdrv_unref() to operate on refcnt: Initially,
refcnt is 1, which means bdrv_unref is effectively a bdrv_delete() here. So
patch 04 has a search and replace to convert bdrv_delete to bdrv_unref, before
bdrv_ref is used anywhere. 05~08 patches calls bdrv_ref for device attach,
block-migration and nbd.

The rule is: Either bdrv_ref() or bdrv_new() must have a matching
bdrv_unref() call, and the last matching bdrv_unref deletes the bs.

v4:
08: Added, let block job use BDS reference.
02: Fix leak of bs.opaque

v3:
03: Removed unnecessary bdrv_close() call.

v2:
05: Removed: block: use BlockDriverState refcnt for device attach/detach
07: Fix xen_disk blk_disconnect() as it depended on device attach refcnt.


Fam Zheng (8):
  vvfat: use bdrv_new() to allocate BlockDriverState
  iscsi: use bdrv_new() instead of stack structure
  block: implement reference count for BlockDriverState
  block: make bdrv_delete() static
  migration: omit drive ref as we have bdrv_ref now
  xen_disk: simplify blk_disconnect with refcnt
  nbd: use BlockDriverState refcnt
  block: use BDS ref for block jobs

 block-migration.c |  4 +--
 block.c   | 44 +++
 block/backup.c|  2 +-
 block/blkverify.c |  4 +--
 block/cow.c   |  2 +-
 block/iscsi.c | 16 ++-
 block/mirror.c|  2 +-
 block/qcow.c  |  2 +-
 block/qcow2.c |  2 +-
 block/qed.c   |  2 +-
 block/sheepdog.c  |  6 ++---
 block/snapshot.c  |  2 +-
 block/stream.c|  2 +-
 block/vmdk.c  | 10 +++
 block/vvfat.c |  6 ++---
 blockdev-nbd.c| 10 +--
 blockdev.c| 67 +--
 blockjob.c|  3 +++
 hw/block/xen_disk.c   | 13 +
 include/block/block.h |  3 ++-
 include/block/block_int.h |  1 +
 nbd.c |  5 
 qemu-img.c| 26 +-
 qemu-io.c |  6 ++---
 24 files changed, 106 insertions(+), 134 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH v5 3/8] block: implement reference count for BlockDriverState

2013-08-09 Thread Fam Zheng

Introduce bdrv_ref/bdrv_unref to manage the lifecycle of
BlockDriverState. They are unused for now but will used to replace
bdrv_delete() later.

Signed-off-by: Fam Zheng f...@redhat.com
---
 block.c   | 21 +
 include/block/block.h |  2 ++
 include/block/block_int.h |  1 +
 3 files changed, 24 insertions(+)

diff --git a/block.c b/block.c
index 01b66d8..fea6904 100644
--- a/block.c
+++ b/block.c
@@ -306,6 +306,7 @@ BlockDriverState *bdrv_new(const char *device_name)
 bdrv_iostatus_disable(bs);
 notifier_list_init(bs-close_notifiers);
 notifier_with_return_list_init(bs-before_write_notifiers);
+bs-refcnt = 1;
 
 return bs;
 }
@@ -1517,6 +1518,9 @@ static void bdrv_move_feature_fields(BlockDriverState 
*bs_dest,
 /* dirty bitmap */
 bs_dest-dirty_bitmap   = bs_src-dirty_bitmap;
 
+/* reference count */
+bs_dest-refcnt = bs_src-refcnt;
+
 /* job */
 bs_dest-in_use = bs_src-in_use;
 bs_dest-job= bs_src-job;
@@ -4391,6 +4395,23 @@ int64_t bdrv_get_dirty_count(BlockDriverState *bs)
 }
 }
 
+/* Get a reference to bs */
+void bdrv_ref(BlockDriverState *bs)
+{
+bs-refcnt++;
+}
+
+/* Release a previously grabbed reference to bs.
+ * If after releasing, reference count is zero, the BlockDriverState is
+ * deleted. */
+void bdrv_unref(BlockDriverState *bs)
+{
+assert(bs-refcnt  0);
+if (--bs-refcnt == 0) {
+bdrv_delete(bs);
+}
+}
+
 void bdrv_set_in_use(BlockDriverState *bs, int in_use)
 {
 assert(bs-in_use != in_use);
diff --git a/include/block/block.h b/include/block/block.h
index 742fce5..b33ef62 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -356,6 +356,8 @@ int64_t bdrv_get_dirty_count(BlockDriverState *bs);
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
 
+void bdrv_ref(BlockDriverState *bs);
+void bdrv_unref(BlockDriverState *bs);
 void bdrv_set_in_use(BlockDriverState *bs, int in_use);
 int bdrv_in_use(BlockDriverState *bs);
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e45f2a0..1f85cfb 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -294,6 +294,7 @@ struct BlockDriverState {
 BlockDeviceIoStatus iostatus;
 char device_name[32];
 HBitmap *dirty_bitmap;
+int refcnt;
 int in_use; /* users other than guest access, eg. block migration */
 QTAILQ_ENTRY(BlockDriverState) list;
 
-- 
1.8.3.1

[Qemu-devel] [PATCH v5 6/8] xen_disk: simplify blk_disconnect with refcnt

2013-08-09 Thread Fam Zheng

We call bdrv_attach_dev when initializing whether or not bs is created
locally, so call bdrv_detach_dev and let the refcnt handle the
lifecycle.

Signed-off-by: Fam Zheng f...@redhat.com
---
 hw/block/xen_disk.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 8bfa04e..668cc06 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -824,6 +824,9 @@ static int blk_connect(struct XenDevice *xendev)
 /* setup via qemu cmdline - already setup for us */
 xen_be_printf(blkdev-xendev, 2, get configured bdrv (cmdline 
setup)\n);
 blkdev-bs = blkdev-dinfo-bdrv;
+/* blkdev-bs is not create by us, we get a reference
+ * so we can bdrv_unref() unconditionally */
+bdrv_ref(blkdev-bs);
 }
 bdrv_attach_dev_nofail(blkdev-bs, blkdev);
 blkdev-file_size = bdrv_getlength(blkdev-bs);
@@ -922,12 +925,8 @@ static void blk_disconnect(struct XenDevice *xendev)
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
 
 if (blkdev-bs) {
-if (!blkdev-dinfo) {
-/* close/delete only if we created it ourself */
-bdrv_close(blkdev-bs);
-bdrv_detach_dev(blkdev-bs, blkdev);
-bdrv_unref(blkdev-bs);
-}
+bdrv_detach_dev(blkdev-bs, blkdev);
+bdrv_unref(blkdev-bs);
 blkdev-bs = NULL;
 }
 xen_be_unbind_evtchn(blkdev-xendev);
-- 
1.8.3.1

[Qemu-devel] [PATCH v5 2/8] iscsi: use bdrv_new() instead of stack structure

2013-08-09 Thread Fam Zheng

BlockDriverState structure needs bdrv_new() to initialize refcnt, don't
allocate a local structure variable and memset to 0, becasue with coming
refcnt implementation, bdrv_unref will crash if bs-refcnt not
initialized to 1.

Signed-off-by: Fam Zheng f...@redhat.com
---
 block/iscsi.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index e7c1c2b..123f058 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1249,11 +1249,11 @@ static int iscsi_create(const char *filename, 
QEMUOptionParameter *options)
 {
 int ret = 0;
 int64_t total_size = 0;
-BlockDriverState bs;
+BlockDriverState *bs;
 IscsiLun *iscsilun = NULL;
 QDict *bs_options;
 
-memset(bs, 0, sizeof(BlockDriverState));
+bs = bdrv_new();
 
 /* Read out options */
 while (options  options-name) {
@@ -1263,12 +1263,12 @@ static int iscsi_create(const char *filename, 
QEMUOptionParameter *options)
 options++;
 }
 
-bs.opaque = g_malloc0(sizeof(struct IscsiLun));
-iscsilun = bs.opaque;
+bs-opaque = g_malloc0(sizeof(struct IscsiLun));
+iscsilun = bs-opaque;
 
 bs_options = qdict_new();
 qdict_put(bs_options, filename, qstring_from_str(filename));
-ret = iscsi_open(bs, bs_options, 0);
+ret = iscsi_open(bs, bs_options, 0);
 QDECREF(bs_options);
 
 if (ret != 0) {
@@ -1282,7 +1282,7 @@ static int iscsi_create(const char *filename, 
QEMUOptionParameter *options)
 ret = -ENODEV;
 goto out;
 }
-if (bs.total_sectors  total_size) {
+if (bs-total_sectors  total_size) {
 ret = -ENOSPC;
 goto out;
 }
@@ -1292,7 +1292,9 @@ out:
 if (iscsilun-iscsi != NULL) {
 iscsi_destroy_context(iscsilun-iscsi);
 }
-g_free(bs.opaque);
+g_free(bs-opaque);
+bs-opaque = NULL;
+bdrv_delete(bs);
 return ret;
 }
 
-- 
1.8.3.1

Re: [Qemu-devel] [RFC] [PATCHv8 09/30] aio / timers: Add QEMUTimerListGroup and helper functions

2013-08-09 Thread Paolo Bonzini

Il 08/08/2013 23:42, Alex Bligh ha scritto:
 Add QEMUTimerListGroup and helper functions, to represent
 a QEMUTimerList associated with each clock. Add a default
 QEMUTimerListGroup representing the default timer lists
 which are not associated with any other object (e.g.
 an AioContext as added by future patches).
 
 Signed-off-by: Alex Bligh a...@alex.org.uk
 ---
  include/qemu/timer.h |   45 +
  qemu-timer.c |   41 +
  2 files changed, 86 insertions(+)
 
 diff --git a/include/qemu/timer.h b/include/qemu/timer.h
 index 1baa0e2..3b46f60 100644
 --- a/include/qemu/timer.h
 +++ b/include/qemu/timer.h
 @@ -52,8 +52,10 @@ typedef enum {
  
  typedef struct QEMUClock QEMUClock;
  typedef struct QEMUTimerList QEMUTimerList;
 +typedef QEMUTimerList *QEMUTimerListGroup[QEMU_CLOCK_MAX];

Please wrap this in a struct for easier future extensibility.

I'm not a big fan of the TimerListGroup name, but I cannot think of
anything better...

... except if we get rid of TimerListGroup completely and put it in
AioContext.  To do so, we can have _two_ AioContexts in main-loop.c.
One is for the block layer, the other is only used for timers.  Later we
could move bottom halves from the first to the second, too.

Sorry for not thinking of this before.

Paolo

  typedef void QEMUTimerCB(void *opaque);
  
 +extern QEMUTimerListGroup main_loop_tlg;
  extern QEMUClock *qemu_clocks[QEMU_CLOCK_MAX];
  
  /**
 @@ -211,6 +213,49 @@ QEMUClock *timerlist_get_clock(QEMUTimerList 
 *timer_list);
  bool timerlist_run_timers(QEMUTimerList *timer_list);
  
  /**
 + * timerlistgroup_init:
 + * @tlg: the timer list group
 + *
 + * Initialise a timer list group. This must already be
 + * allocated in memory and zeroed.
 + */
 +void timerlistgroup_init(QEMUTimerListGroup tlg);
 +
 +/**
 + * timerlistgroup_deinit:
 + * @tlg: the timer list group
 + *
 + * Deinitialise a timer list group. This must already be
 + * initialised. Note the memory is not freed.
 + */
 +void timerlistgroup_deinit(QEMUTimerListGroup tlg);
 +
 +/**
 + * timerlistgroup_run_timers:
 + * @tlg: the timer list group
 + *
 + * Run the timers associated with a timer list group.
 + * This will run timers on multiple clocks.
 + *
 + * Returns: true if any timer callback ran
 + */
 +bool timerlistgroup_run_timers(QEMUTimerListGroup tlg);
 +
 +/**
 + * timerlistgroup_deadline_ns
 + * @tlg: the timer list group
 + *
 + * Determine the deadline of the soonest timer to
 + * expire associated with any timer list linked to
 + * the timer list group. Only clocks suitable for
 + * deadline calculation are included.
 + *
 + * Returns: the deadline in nanoseconds or -1 if no
 + * timers are to expire.
 + */
 +int64_t timerlistgroup_deadline_ns(QEMUTimerListGroup tlg);
 +
 +/**
   * qemu_timeout_ns_to_ms:
   * @ns: nanosecond timeout value
   *
 diff --git a/qemu-timer.c b/qemu-timer.c
 index 1a0a4b1..e151d24 100644
 --- a/qemu-timer.c
 +++ b/qemu-timer.c
 @@ -59,6 +59,7 @@ struct QEMUClock {
  bool enabled;
  };
  
 +QEMUTimerListGroup main_loop_tlg;
  QEMUClock *qemu_clocks[QEMU_CLOCK_MAX];
  
  /* A QEMUTimerList is a list of timers attached to a clock. More
 @@ -575,6 +576,45 @@ bool qemu_run_timers(QEMUClock *clock)
  return timerlist_run_timers(clock-main_loop_timerlist);
  }
  
 +void timerlistgroup_init(QEMUTimerListGroup tlg)
 +{
 +QEMUClockType type;
 +for (type = 0; type  QEMU_CLOCK_MAX; type++) {
 +tlg[type] = timerlist_new(type);
 +}
 +}
 +
 +void timerlistgroup_deinit(QEMUTimerListGroup tlg)
 +{
 +QEMUClockType type;
 +for (type = 0; type  QEMU_CLOCK_MAX; type++) {
 +timerlist_free(tlg[type]);
 +}
 +}
 +
 +bool timerlistgroup_run_timers(QEMUTimerListGroup tlg)
 +{
 +QEMUClockType type;
 +bool progress = false;
 +for (type = 0; type  QEMU_CLOCK_MAX; type++) {
 +progress |= timerlist_run_timers(tlg[type]);
 +}
 +return progress;
 +}
 +
 +int64_t timerlistgroup_deadline_ns(QEMUTimerListGroup tlg)
 +{
 +int64_t deadline = -1;
 +QEMUClockType type;
 +for (type = 0; type  QEMU_CLOCK_MAX; type++) {
 +if (qemu_clock_use_for_deadline(tlg[type]-clock)) {
 +deadline = qemu_soonest_timeout(deadline,
 +
 timerlist_deadline_ns(tlg[type]));
 +}
 +}
 +return deadline;
 +}
 +
  int64_t qemu_get_clock_ns(QEMUClock *clock)
  {
  int64_t now, last;
 @@ -616,6 +656,7 @@ void init_clocks(void)
  for (type = 0; type  QEMU_CLOCK_MAX; type++) {
  if (!qemu_clocks[type]) {
  qemu_clocks[type] = qemu_clock_new(type);
 +main_loop_tlg[type] = qemu_clocks[type]-main_loop_timerlist;
  }
  }

[Qemu-devel] [PATCH v5 4/8] block: make bdrv_delete() static

2013-08-09 Thread Fam Zheng

Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no
longer public and should be called by bdrv_unref() if refcnt is
decreased to 0.

This is an identical change because effectively, there's no multiple
reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets
bs-refcnt to 1, so all bdrv_unref() now actually delete the BDS.

Signed-off-by: Fam Zheng f...@redhat.com
---
 block.c   | 23 ---
 block/backup.c|  2 +-
 block/blkverify.c |  4 ++--
 block/cow.c   |  2 +-
 block/iscsi.c |  2 +-
 block/mirror.c|  2 +-
 block/qcow.c  |  2 +-
 block/qcow2.c |  2 +-
 block/qed.c   |  2 +-
 block/sheepdog.c  |  6 +++---
 block/snapshot.c  |  2 +-
 block/stream.c|  2 +-
 block/vmdk.c  | 10 +-
 block/vvfat.c |  4 ++--
 blockdev.c| 14 +++---
 hw/block/xen_disk.c   |  4 ++--
 include/block/block.h |  1 -
 qemu-img.c| 26 +-
 qemu-io.c |  6 +++---
 19 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/block.c b/block.c
index fea6904..73e9abd 100644
--- a/block.c
+++ b/block.c
@@ -877,7 +877,7 @@ fail:
 if (!bs-drv) {
 QDECREF(bs-options);
 }
-bdrv_delete(bs);
+bdrv_unref(bs);
 return ret;
 }
 
@@ -928,7 +928,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*options)
 *backing_filename ? backing_filename : NULL, options,
 back_flags, back_drv);
 if (ret  0) {
-bdrv_delete(bs-backing_hd);
+bdrv_unref(bs-backing_hd);
 bs-backing_hd = NULL;
 bs-open_flags |= BDRV_O_NO_BACKING;
 return ret;
@@ -1003,12 +1003,12 @@ int bdrv_open(BlockDriverState *bs, const char 
*filename, QDict *options,
 bs1 = bdrv_new();
 ret = bdrv_open(bs1, filename, NULL, 0, drv);
 if (ret  0) {
-bdrv_delete(bs1);
+bdrv_unref(bs1);
 goto fail;
 }
 total_size = bdrv_getlength(bs1)  BDRV_SECTOR_MASK;
 
-bdrv_delete(bs1);
+bdrv_unref(bs1);
 
 ret = get_tmp_filename(tmp_filename, sizeof(tmp_filename));
 if (ret  0) {
@@ -1082,7 +1082,7 @@ int bdrv_open(BlockDriverState *bs, const char *filename, 
QDict *options,
 }
 
 if (bs-file != file) {
-bdrv_delete(file);
+bdrv_unref(file);
 file = NULL;
 }
 
@@ -1122,7 +1122,7 @@ int bdrv_open(BlockDriverState *bs, const char *filename, 
QDict *options,
 
 unlink_and_fail:
 if (file != NULL) {
-bdrv_delete(file);
+bdrv_unref(file);
 }
 if (bs-is_temporary) {
 unlink(filename);
@@ -1383,7 +1383,7 @@ void bdrv_close(BlockDriverState *bs)
 
 if (bs-drv) {
 if (bs-backing_hd) {
-bdrv_delete(bs-backing_hd);
+bdrv_unref(bs-backing_hd);
 bs-backing_hd = NULL;
 }
 bs-drv-bdrv_close(bs);
@@ -1407,7 +1407,7 @@ void bdrv_close(BlockDriverState *bs)
 bs-options = NULL;
 
 if (bs-file != NULL) {
-bdrv_delete(bs-file);
+bdrv_unref(bs-file);
 bs-file = NULL;
 }
 }
@@ -1604,11 +1604,12 @@ void bdrv_append(BlockDriverState *bs_new, 
BlockDriverState *bs_top)
 bs_new-drv ? bs_new-drv-format_name : );
 }
 
-void bdrv_delete(BlockDriverState *bs)
+static void bdrv_delete(BlockDriverState *bs)
 {
 assert(!bs-dev);
 assert(!bs-job);
 assert(!bs-in_use);
+assert(!bs-refcnt);
 
 /* remove from list, if necessary */
 bdrv_make_anon(bs);
@@ -2124,7 +2125,7 @@ int bdrv_drop_intermediate(BlockDriverState *active, 
BlockDriverState *top,
 QSIMPLEQ_FOREACH_SAFE(intermediate_state, states_to_delete, entry, next) {
 /* so that bdrv_close() does not recursively close the chain */
 intermediate_state-bs-backing_hd = NULL;
-bdrv_delete(intermediate_state-bs);
+bdrv_unref(intermediate_state-bs);
 }
 ret = 0;
 
@@ -4625,7 +4626,7 @@ out:
 free_option_parameters(param);
 
 if (bs) {
-bdrv_delete(bs);
+bdrv_unref(bs);
 }
 }
 
diff --git a/block/backup.c b/block/backup.c
index 6ae8a05..586d941 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -338,7 +338,7 @@ static void coroutine_fn backup_run(void *opaque)
 hbitmap_free(job-bitmap);
 
 bdrv_iostatus_disable(target);
-bdrv_delete(target);
+bdrv_unref(target);
 
 block_job_completed(job-common, ret);
 }
diff --git a/block/blkverify.c b/block/blkverify.c
index 1d58cc3..c4e961e 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -155,7 +155,7 @@ static int blkverify_open(BlockDriverState *bs, QDict 
*options, int flags)
 s-test_file = bdrv_new();
 ret = bdrv_open(s-test_file, filename, NULL, flags, NULL);
 if (ret  0) {
-bdrv_delete(s-test_file);
+bdrv_unref(s-test_file);
 s-test_file

Re: [Qemu-devel] [RFC] [PATCHv8 07/30] aio / timers: Split QEMUClock into QEMUClock and QEMUTimerList

2013-08-09 Thread Paolo Bonzini

Il 08/08/2013 23:42, Alex Bligh ha scritto:
 Split QEMUClock into QEMUClock and QEMUTimerList so that we can
 have more than one QEMUTimerList associated with the same clock.
 
 Introduce a main_loop_timerlist concept and make existing
 qemu_clock_* calls that actually should operate on a QEMUTimerList
 call the relevant QEMUTimerList implementations, using the clock's
 default timerlist. This vastly reduces the invasiveness of this
 change and means the API stays constant for existing users.
 
 Introduce a list of QEMUTimerLists associated with each clock
 so that reenabling the clock can cause all the notifiers
 to be called. Note the code to do the notifications is added
 in a later patch.
 
 Switch QEMUClockType to an enum. Remove global variables vm_clock,
 host_clock and rt_clock and add compatibility defines. Do not
 fix qemu_next_alarm_deadline as it's going to be deleted.
 
 Add qemu_clock_use_for_deadline to indicate whether a particular
 clock should be used for deadline calculations. When use_icount
 is true, vm_clock should not be used for deadline calculations
 as it does not contain a nanosecond count. Instead, icount
 timeouts come from the execution thread doing aio_notify or
 qemu_notify as appropriate. This function is used in the next
 patch.
 
 Signed-off-by: Alex Bligh a...@alex.org.uk
 ---
  include/qemu/timer.h |  406 
 +++---
  qemu-timer.c |  200 +++--
  2 files changed, 536 insertions(+), 70 deletions(-)
 
 diff --git a/include/qemu/timer.h b/include/qemu/timer.h
 index c270144..6c2bf6c 100644
 --- a/include/qemu/timer.h
 +++ b/include/qemu/timer.h
 @@ -11,34 +11,75 @@
  #define SCALE_US 1000
  #define SCALE_NS 1
  
 -#define QEMU_CLOCK_REALTIME 0
 -#define QEMU_CLOCK_VIRTUAL  1
 -#define QEMU_CLOCK_HOST 2
 +/**
 + * QEMUClockType:
 + *
 + * The following clock types are available:
 + *
 + * @QEMU_CLOCK_REALTIME: Real time clock
 + *
 + * The real time clock should be used only for stuff which does not
 + * change the virtual machine state, as it is run even if the virtual
 + * machine is stopped. The real time clock has a frequency of 1000
 + * Hz.
 + *
 + * Formerly rt_clock
 + *
 + * @QEMU_CLOCK_VIRTUAL: virtual clock
 + *
 + * The virtual clock is only run during the emulation. It is stopped
 + * when the virtual machine is stopped. Virtual timers use a high
 + * precision clock, usually cpu cycles (use ticks_per_sec).
 + *
 + * Formerly vm_clock
 + *
 + * @QEMU_CLOCK_HOST: host clock
 + *
 + * The host clock should be use for device models that emulate accurate
 + * real time sources. It will continue to run when the virtual machine
 + * is suspended, and it will reflect system time changes the host may
 + * undergo (e.g. due to NTP). The host clock has the same precision as
 + * the virtual clock.
 + *
 + * Formerly host_clock
 + */
 +
 +typedef enum {
 +QEMU_CLOCK_REALTIME = 0,
 +QEMU_CLOCK_VIRTUAL = 1,
 +QEMU_CLOCK_HOST = 2,
 +QEMU_CLOCK_MAX
 +} QEMUClockType;
  
  typedef struct QEMUClock QEMUClock;
 +typedef struct QEMUTimerList QEMUTimerList;
  typedef void QEMUTimerCB(void *opaque);
  
 -/* The real time clock should be used only for stuff which does not
 -   change the virtual machine state, as it is run even if the virtual
 -   machine is stopped. The real time clock has a frequency of 1000
 -   Hz. */
 -extern QEMUClock *rt_clock;
 +extern QEMUClock *qemu_clocks[QEMU_CLOCK_MAX];
  
 -/* The virtual clock is only run during the emulation. It is stopped
 -   when the virtual machine is stopped. Virtual timers use a high
 -   precision clock, usually cpu cycles (use ticks_per_sec). */
 -extern QEMUClock *vm_clock;
 +/**
 + * qemu_clock_ptr:
 + * @type: type of clock
 + *
 + * Translate a clock type into a pointer to QEMUClock object.
 + *
 + * Returns: a pointer to the QEMUClock object
 + */
 +static inline QEMUClock *qemu_clock_ptr(QEMUClockType type)
 +{
 +return qemu_clocks[type];
 +}
  
 -/* The host clock should be use for device models that emulate accurate
 -   real time sources. It will continue to run when the virtual machine
 -   is suspended, and it will reflect system time changes the host may
 -   undergo (e.g. due to NTP). The host clock has the same precision as
 -   the virtual clock. */
 -extern QEMUClock *host_clock;
 +/* These three clocks are maintained here with separate variable
 + * names for compatibility only.
 + */
 +#define rt_clock (qemu_clock_ptr(QEMU_CLOCK_REALTIME))
 +#define vm_clock (qemu_clock_ptr(QEMU_CLOCK_VIRTUAL))
 +#define host_clock (qemu_clock_ptr(QEMU_CLOCK_HOST))
  
  int64_t qemu_get_clock_ns(QEMUClock *clock);
 -int64_t qemu_clock_has_timers(QEMUClock *clock);
 -int64_t qemu_clock_expired(QEMUClock *clock);
 +bool qemu_clock_has_timers(QEMUClock *clock);
 +bool qemu_clock_expired(QEMUClock *clock);
  int64_t qemu_clock_deadline(QEMUClock *clock);
  
  /**
 @@ -53,6 +94,124 @@ int64_t qemu_clock_deadline(QEMUClock *clock);

[Qemu-devel] [PATCH v5 7/8] nbd: use BlockDriverState refcnt

2013-08-09 Thread Fam Zheng

Previously, nbd calls drive_get_ref() on the drive of bs. A BDS doesn't
always have associated dinfo, which nbd doesn't care either. We already
have BDS ref count, so use it to make it safe for a BDS w/o blockdev.

Signed-off-by: Fam Zheng f...@redhat.com
---
 blockdev-nbd.c | 10 +-
 nbd.c  |  5 +
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 95f10c8..922cf56 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -69,12 +69,6 @@ static void nbd_close_notifier(Notifier *n, void *data)
 g_free(cn);
 }
 
-static void nbd_server_put_ref(NBDExport *exp)
-{
-BlockDriverState *bs = nbd_export_get_blockdev(exp);
-drive_put_ref(drive_get_by_blockdev(bs));
-}
-
 void qmp_nbd_server_add(const char *device, bool has_writable, bool writable,
 Error **errp)
 {
@@ -105,11 +99,9 @@ void qmp_nbd_server_add(const char *device, bool 
has_writable, bool writable,
 writable = false;
 }
 
-exp = nbd_export_new(bs, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY,
- nbd_server_put_ref);
+exp = nbd_export_new(bs, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY, NULL);
 
 nbd_export_set_name(exp, device);
-drive_get_ref(drive_get_by_blockdev(bs));
 
 n = g_malloc0(sizeof(NBDCloseNotifier));
 n-n.notify = nbd_close_notifier;
diff --git a/nbd.c b/nbd.c
index 2606403..f258cdd 100644
--- a/nbd.c
+++ b/nbd.c
@@ -881,6 +881,7 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t 
dev_offset,
 exp-nbdflags = nbdflags;
 exp-size = size == -1 ? bdrv_getlength(bs) : size;
 exp-close = close;
+bdrv_ref(bs);
 return exp;
 }
 
@@ -927,6 +928,10 @@ void nbd_export_close(NBDExport *exp)
 }
 nbd_export_set_name(exp, NULL);
 nbd_export_put(exp);
+if (exp-bs) {
+bdrv_unref(exp-bs);
+exp-bs = NULL;
+}
 }
 
 void nbd_export_get(NBDExport *exp)
-- 
1.8.3.1

Re: [Qemu-devel] [RFC] [PATCHv8 15/30] aio / timers: Convert mainloop to use timeout

2013-08-09 Thread Paolo Bonzini

Il 08/08/2013 23:42, Alex Bligh ha scritto:
 Convert mainloop to use timeout from default timerlist group
 (i.e. the current 3 static timers)

And with two AioContexts in the main loop this patch disappears
completely, since the deadline is computed by the second AioContext.

Paolo

 Signed-off-by: Alex Bligh a...@alex.org.uk
 ---
  main-loop.c |   45 ++---
  1 file changed, 34 insertions(+), 11 deletions(-)
 
 diff --git a/main-loop.c b/main-loop.c
 index a44fff6..00e54bd 100644
 --- a/main-loop.c
 +++ b/main-loop.c
 @@ -155,10 +155,11 @@ static int max_priority;
  static int glib_pollfds_idx;
  static int glib_n_poll_fds;
  
 -static void glib_pollfds_fill(uint32_t *cur_timeout)
 +static void glib_pollfds_fill(int64_t *cur_timeout)
  {
  GMainContext *context = g_main_context_default();
  int timeout = 0;
 +int64_t timeout_ns;
  int n;
  
  g_main_context_prepare(context, max_priority);
 @@ -174,9 +175,13 @@ static void glib_pollfds_fill(uint32_t *cur_timeout)
   glib_n_poll_fds);
  } while (n != glib_n_poll_fds);
  
 -if (timeout = 0  timeout  *cur_timeout) {
 -*cur_timeout = timeout;
 +if (timeout  0) {
 +timeout_ns = -1;
 +} else {
 +timeout_ns = (int64_t)timeout * (int64_t)SCALE_MS;
  }
 +
 +*cur_timeout = qemu_soonest_timeout(timeout_ns, *cur_timeout);
  }
  
  static void glib_pollfds_poll(void)
 @@ -191,7 +196,7 @@ static void glib_pollfds_poll(void)
  
  #define MAX_MAIN_LOOP_SPIN (1000)
  
 -static int os_host_main_loop_wait(uint32_t timeout)
 +static int os_host_main_loop_wait(int64_t timeout)
  {
  int ret;
  static int spin_counter;
 @@ -214,7 +219,7 @@ static int os_host_main_loop_wait(uint32_t timeout)
  notified = true;
  }
  
 -timeout = 1;
 +timeout = SCALE_MS;
  }
  
  if (timeout  0) {
 @@ -224,7 +229,7 @@ static int os_host_main_loop_wait(uint32_t timeout)
  spin_counter++;
  }
  
 -ret = g_poll((GPollFD *)gpollfds-data, gpollfds-len, timeout);
 +ret = qemu_poll_ns((GPollFD *)gpollfds-data, gpollfds-len, timeout);
  
  if (timeout  0) {
  qemu_mutex_lock_iothread();
 @@ -373,7 +378,7 @@ static void pollfds_poll(GArray *pollfds, int nfds, 
 fd_set *rfds,
  }
  }
  
 -static int os_host_main_loop_wait(uint32_t timeout)
 +static int os_host_main_loop_wait(int64_t timeout)
  {
  GMainContext *context = g_main_context_default();
  GPollFD poll_fds[1024 * 2]; /* this is probably overkill */
 @@ -382,6 +387,7 @@ static int os_host_main_loop_wait(uint32_t timeout)
  PollingEntry *pe;
  WaitObjects *w = wait_objects;
  gint poll_timeout;
 +int64_t poll_timeout_ns;
  static struct timeval tv0;
  fd_set rfds, wfds, xfds;
  int nfds;
 @@ -419,12 +425,17 @@ static int os_host_main_loop_wait(uint32_t timeout)
  poll_fds[n_poll_fds + i].events = G_IO_IN;
  }
  
 -if (poll_timeout  0 || timeout  poll_timeout) {
 -poll_timeout = timeout;
 +if (poll_timeout  0) {
 +poll_timeout_ns = -1;
 +} else {
 +poll_timeout_ns = (int64_t)poll_timeout * (int64_t)SCALE_MS;
  }
  
 +poll_timeout_ns = qemu_soonest_timeout(poll_timeout_ns, timeout);
 +
  qemu_mutex_unlock_iothread();
 -g_poll_ret = g_poll(poll_fds, n_poll_fds + w-num, poll_timeout);
 +g_poll_ret = qemu_poll_ns(poll_fds, n_poll_fds + w-num, 
 poll_timeout_ns);
 +
  qemu_mutex_lock_iothread();
  if (g_poll_ret  0) {
  for (i = 0; i  w-num; i++) {
 @@ -449,6 +460,7 @@ int main_loop_wait(int nonblocking)
  {
  int ret;
  uint32_t timeout = UINT32_MAX;
 +int64_t timeout_ns;
  
  if (nonblocking) {
  timeout = 0;
 @@ -462,7 +474,18 @@ int main_loop_wait(int nonblocking)
  slirp_pollfds_fill(gpollfds);
  #endif
  qemu_iohandler_fill(gpollfds);
 -ret = os_host_main_loop_wait(timeout);
 +
 +if (timeout == UINT32_MAX) {
 +timeout_ns = -1;
 +} else {
 +timeout_ns = (uint64_t)timeout * (int64_t)(SCALE_MS);
 +}
 +
 +timeout_ns = qemu_soonest_timeout(timeout_ns,
 +  timerlistgroup_deadline_ns(
 +  main_loop_tlg));
 +
 +ret = os_host_main_loop_wait(timeout_ns);
  qemu_iohandler_poll(gpollfds, ret);
  #ifdef CONFIG_SLIRP
  slirp_pollfds_poll(gpollfds, (ret  0));

[Qemu-devel] [PATCH v5 8/8] block: use BDS ref for block jobs

2013-08-09 Thread Fam Zheng

Block jobs used drive_get_ref(drive_get_by_blockdev(bs)) to avoid BDS
being deleted. Now we have BDS reference count, and block jobs don't
care about dinfo, so replace them to get cleaner code. It is also the
safe way when BDS has no drive info.

Signed-off-by: Fam Zheng f...@redhat.com
---
 blockdev.c | 53 -
 blockjob.c |  3 +++
 2 files changed, 3 insertions(+), 53 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 5b86c9d..f6961e1 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -233,37 +233,6 @@ void drive_get_ref(DriveInfo *dinfo)
 dinfo-refcount++;
 }
 
-typedef struct {
-QEMUBH *bh;
-DriveInfo *dinfo;
-} DrivePutRefBH;
-
-static void drive_put_ref_bh(void *opaque)
-{
-DrivePutRefBH *s = opaque;
-
-drive_put_ref(s-dinfo);
-qemu_bh_delete(s-bh);
-g_free(s);
-}
-
-/*
- * Release a drive reference in a BH
- *
- * It is not possible to use drive_put_ref() from a callback function when the
- * callers still need the drive.  In such cases we schedule a BH to release the
- * reference.
- */
-static void drive_put_ref_bh_schedule(DriveInfo *dinfo)
-{
-DrivePutRefBH *s;
-
-s = g_new(DrivePutRefBH, 1);
-s-bh = qemu_bh_new(drive_put_ref_bh, s);
-s-dinfo = dinfo;
-qemu_bh_schedule(s-bh);
-}
-
 static int parse_block_error_action(const char *buf, bool is_read)
 {
 if (!strcmp(buf, ignore)) {
@@ -1394,8 +1363,6 @@ static void block_job_cb(void *opaque, int ret)
 monitor_protocol_event(QEVENT_BLOCK_JOB_COMPLETED, obj);
 }
 qobject_decref(obj);
-
-drive_put_ref_bh_schedule(drive_get_by_blockdev(bs));
 }
 
 void qmp_block_stream(const char *device, bool has_base,
@@ -1431,12 +1398,6 @@ void qmp_block_stream(const char *device, bool has_base,
 error_propagate(errp, local_err);
 return;
 }
-
-/* Grab a reference so hotplug does not delete the BlockDriverState from
- * underneath us.
- */
-drive_get_ref(drive_get_by_blockdev(bs));
-
 trace_qmp_block_stream(bs, bs-job);
 }
 
@@ -1493,10 +1454,6 @@ void qmp_block_commit(const char *device,
 error_propagate(errp, local_err);
 return;
 }
-/* Grab a reference so hotplug does not delete the BlockDriverState from
- * underneath us.
- */
-drive_get_ref(drive_get_by_blockdev(bs));
 }
 
 void qmp_drive_backup(const char *device, const char *target,
@@ -1609,11 +1566,6 @@ void qmp_drive_backup(const char *device, const char 
*target,
 error_propagate(errp, local_err);
 return;
 }
-
-/* Grab a reference so hotplug does not delete the BlockDriverState from
- * underneath us.
- */
-drive_get_ref(drive_get_by_blockdev(bs));
 }
 
 #define DEFAULT_MIRROR_BUF_SIZE   (10  20)
@@ -1750,11 +1702,6 @@ void qmp_drive_mirror(const char *device, const char 
*target,
 error_propagate(errp, local_err);
 return;
 }
-
-/* Grab a reference so hotplug does not delete the BlockDriverState from
- * underneath us.
- */
-drive_get_ref(drive_get_by_blockdev(bs));
 }
 
 static BlockJob *find_block_job(const char *device)
diff --git a/blockjob.c b/blockjob.c
index ca80df1..832d558 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -46,6 +46,7 @@ void *block_job_create(const BlockJobType *job_type, 
BlockDriverState *bs,
 return NULL;
 }
 bdrv_set_in_use(bs, 1);
+bdrv_ref(bs);
 
 job = g_malloc0(job_type-instance_size);
 job-job_type  = job_type;
@@ -80,6 +81,8 @@ void block_job_completed(BlockJob *job, int ret)
 bs-job = NULL;
 g_free(job);
 bdrv_set_in_use(bs, 0);
+bdrv_unref(bs);
+job-bs = NULL;
 }
 
 void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
-- 
1.8.3.1

Re: [Qemu-devel] [RFC] [PATCHv8 11/30] aio / timers: Add a notify callback to QEMUTimerList

2013-08-09 Thread Paolo Bonzini

Il 08/08/2013 23:42, Alex Bligh ha scritto:
 Add a notify pointer to QEMUTimerList so it knows what to notify
 on a timer change.

If we do the two AioContexts trick, this can simply be a back-pointer
to the AioContext.

Paolo

 Signed-off-by: Alex Bligh a...@alex.org.uk
 ---
  async.c  |7 ++-
  include/qemu/timer.h |   35 ---
  qemu-timer.c |   24 ++--
  3 files changed, 60 insertions(+), 6 deletions(-)
 
 diff --git a/async.c b/async.c
 index 99fb5a8..2051921 100644
 --- a/async.c
 +++ b/async.c
 @@ -234,6 +234,11 @@ void aio_notify(AioContext *ctx)
  event_notifier_set(ctx-notifier);
  }
  
 +static void aio_timerlist_notify(void *opaque)
 +{
 +aio_notify(opaque);
 +}
 +
  AioContext *aio_context_new(void)
  {
  AioContext *ctx;
 @@ -245,7 +250,7 @@ AioContext *aio_context_new(void)
  aio_set_event_notifier(ctx, ctx-notifier, 
 (EventNotifierHandler *)
 event_notifier_test_and_clear, NULL);
 -timerlistgroup_init(ctx-tlg);
 +timerlistgroup_init(ctx-tlg, aio_timerlist_notify, ctx);
  
  return ctx;
  }
 diff --git a/include/qemu/timer.h b/include/qemu/timer.h
 index 3b46f60..07e6d9e 100644
 --- a/include/qemu/timer.h
 +++ b/include/qemu/timer.h
 @@ -54,6 +54,7 @@ typedef struct QEMUClock QEMUClock;
  typedef struct QEMUTimerList QEMUTimerList;
  typedef QEMUTimerList *QEMUTimerListGroup[QEMU_CLOCK_MAX];
  typedef void QEMUTimerCB(void *opaque);
 +typedef void QEMUTimerListNotifyCB(void *opaque);
  
  extern QEMUTimerListGroup main_loop_tlg;
  extern QEMUClock *qemu_clocks[QEMU_CLOCK_MAX];
 @@ -213,13 +214,41 @@ QEMUClock *timerlist_get_clock(QEMUTimerList 
 *timer_list);
  bool timerlist_run_timers(QEMUTimerList *timer_list);
  
  /**
 + * timerlist_set_notify_cb:
 + * @timer_list: the timer list to use
 + * @cb: the callback to call on notification
 + * @opaque: the opaque pointer to pass to the callback
 + *
 + * Set the notify callback for a timer list. The notifier
 + * callback is called when the clock is reenabled or a timer
 + * on the list is modified.
 + */
 +void timerlist_set_notify_cb(QEMUTimerList *timer_list,
 + QEMUTimerListNotifyCB *cb, void *opaque);
 +
 +/**
 + * timerlist_notify:
 + * @timer_list: the timer list to use
 + *
 + * call the notifier callback associated with the timer list.
 + */
 +void timerlist_notify(QEMUTimerList *timer_list);
 +
 +/**
   * timerlistgroup_init:
   * @tlg: the timer list group
 + * @cb: the callback to call when a notify is required
 + * @opaque: the opaque pointer to be passed to the callback.
   *
   * Initialise a timer list group. This must already be
 - * allocated in memory and zeroed.
 - */
 -void timerlistgroup_init(QEMUTimerListGroup tlg);
 + * allocated in memory and zeroed. The notifier callback is
 + * called whenever a clock in the timer list group is
 + * reenabled or whenever a timer associated with any timer
 + * list is modified. If @cb is specified as null, qemu_notify()
 + * is used instead.
 + */
 +void timerlistgroup_init(QEMUTimerListGroup tlg,
 + QEMUTimerListNotifyCB *cb, void *opaque);
  
  /**
   * timerlistgroup_deinit:
 diff --git a/qemu-timer.c b/qemu-timer.c
 index e151d24..14cb433 100644
 --- a/qemu-timer.c
 +++ b/qemu-timer.c
 @@ -73,6 +73,8 @@ struct QEMUTimerList {
  QEMUClock *clock;
  QEMUTimer *active_timers;
  QLIST_ENTRY(QEMUTimerList) list;
 +QEMUTimerListNotifyCB *notify_cb;
 +void *notify_opaque;
  };
  
  struct QEMUTimer {
 @@ -394,6 +396,22 @@ QEMUTimerList 
 *qemu_clock_get_main_loop_timerlist(QEMUClock *clock)
  return clock-main_loop_timerlist;
  }
  
 +void timerlist_set_notify_cb(QEMUTimerList *timer_list,
 + QEMUTimerListNotifyCB *cb, void *opaque)
 +{
 +timer_list-notify_cb = cb;
 +timer_list-notify_opaque = opaque;
 +}
 +
 +void timerlist_notify(QEMUTimerList *timer_list)
 +{
 +if (timer_list-notify_cb) {
 +timer_list-notify_cb(timer_list-notify_opaque);
 +} else {
 +qemu_notify_event();
 +}
 +}
 +
  /* Transition function to convert a nanosecond timeout to ms
   * This is used where a system does not support ppoll
   */
 @@ -518,7 +536,7 @@ void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time)
  /* Interrupt execution to force deadline recalculation.  */
  qemu_clock_warp(ts-timer_list-clock);
  if (use_icount) {
 -qemu_notify_event();
 +timerlist_notify(ts-timer_list);
  }
  }
  }
 @@ -576,11 +594,13 @@ bool qemu_run_timers(QEMUClock *clock)
  return timerlist_run_timers(clock-main_loop_timerlist);
  }
  
 -void timerlistgroup_init(QEMUTimerListGroup tlg)
 +void timerlistgroup_init(QEMUTimerListGroup tlg,
 + QEMUTimerListNotifyCB *cb, void *opaque)
  {
  QEMUClockType type;
  for (type = 0; type

Re: [Qemu-devel] [RFC] [PATCHv8 16/30] aio / timers: On timer modification, qemu_notify or aio_notify

2013-08-09 Thread Paolo Bonzini

Il 08/08/2013 23:42, Alex Bligh ha scritto:
 On qemu_mod_timer_ns, ensure qemu_notify or aio_notify is called to
 end the appropriate poll(), irrespective of use_icount value.
 
 On qemu_clock_enable, ensure qemu_notify or aio_notify is called for
 all QEMUTimerLists attached to the QEMUClock.
 
 Signed-off-by: Alex Bligh a...@alex.org.uk

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

 ---
  include/qemu/timer.h |9 +
  qemu-timer.c |   13 ++---
  2 files changed, 19 insertions(+), 3 deletions(-)
 
 diff --git a/include/qemu/timer.h b/include/qemu/timer.h
 index 07e6d9e..68f76df 100644
 --- a/include/qemu/timer.h
 +++ b/include/qemu/timer.h
 @@ -122,6 +122,15 @@ bool qemu_clock_use_for_deadline(QEMUClock *clock);
  QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClock *clock);
  
  /**
 + * qemu_clock_nofify:
 + * @clock: the clock to operate on
 + *
 + * Call the notifier callback connected with the default timer
 + * list linked to the clock, or qemu_notify() if none.
 + */
 +void qemu_clock_notify(QEMUClock *clock);
 +
 +/**
   * timerlist_new:
   * @type: the clock type to associate with the timerlist
   *
 diff --git a/qemu-timer.c b/qemu-timer.c
 index 14cb433..4b162e1 100644
 --- a/qemu-timer.c
 +++ b/qemu-timer.c
 @@ -307,11 +307,20 @@ bool qemu_clock_use_for_deadline(QEMUClock *clock)
  return !(use_icount  (clock-type == QEMU_CLOCK_VIRTUAL));
  }
  
 +void qemu_clock_notify(QEMUClock *clock)
 +{
 +QEMUTimerList *timer_list;
 +QLIST_FOREACH(timer_list, clock-timerlists, list) {
 +timerlist_notify(timer_list);
 +}
 +}
 +
  void qemu_clock_enable(QEMUClock *clock, bool enabled)
  {
  bool old = clock-enabled;
  clock-enabled = enabled;
  if (enabled  !old) {
 +qemu_clock_notify(clock);
  qemu_rearm_alarm_timer(alarm_timer);
  }
  }
 @@ -535,9 +544,7 @@ void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time)
  }
  /* Interrupt execution to force deadline recalculation.  */
  qemu_clock_warp(ts-timer_list-clock);
 -if (use_icount) {
 -timerlist_notify(ts-timer_list);
 -}
 +timerlist_notify(ts-timer_list);
  }
  }

Re: [Qemu-devel] [RFC] [PATCHv8 12/30] aio / timers: aio_ctx_prepare sets timeout from AioContext timers

2013-08-09 Thread Paolo Bonzini

Il 08/08/2013 23:42, Alex Bligh ha scritto:
 Calculate the timeout in aio_ctx_prepare taking into account
 the timers attached to the AioContext.
 
 Alter aio_ctx_check similarly.
 
 Signed-off-by: Alex Bligh a...@alex.org.uk
 ---
  async.c |   13 +++--
  1 file changed, 11 insertions(+), 2 deletions(-)
 
 diff --git a/async.c b/async.c
 index 2051921..dd27459 100644
 --- a/async.c
 +++ b/async.c
 @@ -150,13 +150,14 @@ aio_ctx_prepare(GSource *source, gint*timeout)
  {
  AioContext *ctx = (AioContext *) source;
  QEMUBH *bh;
 +int deadline;
  
  for (bh = ctx-first_bh; bh; bh = bh-next) {
  if (!bh-deleted  bh-scheduled) {
  if (bh-idle) {
  /* idle bottom halves will be polled at least
   * every 10ms */
 -*timeout = 10;
 +*timeout = qemu_soonest_timeout(*timeout, 10);
  } else {
  /* non-idle bottom halves will be executed
   * immediately */
 @@ -166,6 +167,14 @@ aio_ctx_prepare(GSource *source, gint*timeout)
  }
  }
  
 +deadline = qemu_timeout_ns_to_ms(timerlistgroup_deadline_ns(ctx-tlg));
 +if (deadline == 0) {
 +*timeout = 0;
 +return true;
 +} else {
 +*timeout = qemu_soonest_timeout(*timeout, deadline);
 +}
 +
  return false;
  }
  
 @@ -180,7 +189,7 @@ aio_ctx_check(GSource *source)
  return true;
   }
  }
 -return aio_pending(ctx);
 +return aio_pending(ctx) || (timerlistgroup_deadline_ns(ctx-tlg) == 0);

Again, if we do the two AioContext tricks, the
timerlistgroup_deadline_ns function would be internal to AioContext.

Paolo

  }
  
  static gboolean

Re: [Qemu-devel] [RFC] [PATCHv8 13/30] aio / timers: Add aio_timer_new wrapper

2013-08-09 Thread Paolo Bonzini

Il 08/08/2013 23:42, Alex Bligh ha scritto:
 Add aio_timer_new wrapper function.
 
 Signed-off-by: Alex Bligh a...@alex.org.uk
 ---
  include/block/aio.h |   19 +++
  1 file changed, 19 insertions(+)
 
 diff --git a/include/block/aio.h b/include/block/aio.h
 index a13f6e8..bd6f17c 100644
 --- a/include/block/aio.h
 +++ b/include/block/aio.h
 @@ -255,4 +255,23 @@ void qemu_aio_set_fd_handler(int fd,
   void *opaque);
  #endif
  
 +/**
 + * aio_timer_new:
 + * @ctx: the aio context
 + * @type: the clock type
 + * @scale: the scale
 + * @cb: the callback to call on timer expiry
 + * @opaque: the opaque pointer to pass to the callback
 + *
 + * Generate a new timer attached to the context @ctx.
 + *
 + * Returns: a pointer to the new timer
 + */
 +static inline QEMUTimer *aio_timer_new(AioContext *ctx, QEMUClockType type,
 +   int scale,
 +   QEMUTimerCB *cb, void *opaque)
 +{
 +return timer_new(ctx-tlg[type], scale, cb, opaque);
 +}

Since we're doing a new API, I would prefer to have it as timer_init and
aio_timer_init.  We can remove the allocation completely, it is a
useless indirection and we misuse it since we hardly ever call
qemu_free_timer.

Paolo

  #endif

Re: [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-09 Thread Alexey Kardashevskiy

On 08/09/2013 07:53 PM, Paolo Bonzini wrote:
 Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
 Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
 A PCI device's DMA address space (possibly an IOMMU) is returned by a
 method on the PCIBus.  At the moment that only has one caller, so the
 method is simply open coded.  We'll need another caller for VFIO, so
 this patch introduces a helper/wrapper function.

 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 [aik: added inheritance from parent if iommu is not set for the current 
 bus]
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---
 Changes:
 v2:
 * added inheritance, needed for a pci-bridge on spapr-ppc64
 * pci_iommu_as renamed to pci_device_iommu_address_space
 ---
  hw/pci/pci.c | 22 --
  include/hw/pci/pci.h |  1 +
  2 files changed, 17 insertions(+), 6 deletions(-)

 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 4c004f5..0072b54 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
 *pci_dev, PCIBus *bus,
  }
  
  pci_dev-bus = bus;
 -if (bus-iommu_fn) {
 -dma_as = bus-iommu_fn(bus, bus-iommu_opaque, devfn);
 -} else {
 -/* FIXME: inherit memory region from bus creator */
 -dma_as = address_space_memory;
 -}
 +dma_as = pci_device_iommu_address_space(pci_dev);
  
  memory_region_init_alias(pci_dev-bus_master_enable_region,
   OBJECT(pci_dev), bus master,
 @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass 
 *klass, void *data)
  k-props = pci_props;
  }
  
 +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 +{
 +PCIBus *bus = PCI_BUS(dev-bus);
 +
 +if (bus-iommu_fn) {
 +return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
 +}
 +
 +if (bus-parent_dev) {
 +return pci_device_iommu_address_space(bus-parent_dev);
 +}

 No, this would fail if bus-parent_dev is not NULL but not a PCI device
 either.

 parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/
 
 Doh, I misread the code, I thought it was the parent field in
 BusState.  Why do we have parent_dev at all?

The code is too old? Don't know.


 You can use object_dynamic_cast to convert the parent_dev to
 PCIDevice, and if the cast succeeds you call the new function.

 Perhaps you could make the new function take a PCIBus instead.
 Accessing the PCIDevice's IOMMU address space (as opposed to the
 bus-master address space) doesn't make much sense, VFIO is really a
 special case.  Putting the new API on the bus side instead looks better.

 (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
 devices sitting on the secondary bus?)

 It happens naturally I guess when linux enables devices.
 
 Yes, but then using the IOMMU address space would be wrong; you would
 have to use the bus-master address space as a base for the child's
 bus-master address space.


Like this? Works too.

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8bdcedc..a4c70e6 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2247,23 +2247,23 @@ AddressSpace
*pci_device_iommu_address_space(PCIDevice *dev)
 {
 PCIBus *bus = PCI_BUS(dev-bus);

 if (bus-iommu_fn) {
 return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
 }

 if (bus-parent_dev) {
 return pci_device_iommu_address_space(bus-parent_dev);
 }

-return address_space_memory;
+return dev-bus_master_as;
 }



 Also, we would have to fix the x86 firmware.
 
 Paolo
 


-- 
Alexey

Re: [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-09 Thread Paolo Bonzini

Il 09/08/2013 12:13, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 07:53 PM, Paolo Bonzini wrote:
 Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
 Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
 A PCI device's DMA address space (possibly an IOMMU) is returned by a
 method on the PCIBus.  At the moment that only has one caller, so the
 method is simply open coded.  We'll need another caller for VFIO, so
 this patch introduces a helper/wrapper function.

 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 [aik: added inheritance from parent if iommu is not set for the current 
 bus]
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---
 Changes:
 v2:
 * added inheritance, needed for a pci-bridge on spapr-ppc64
 * pci_iommu_as renamed to pci_device_iommu_address_space
 ---
  hw/pci/pci.c | 22 --
  include/hw/pci/pci.h |  1 +
  2 files changed, 17 insertions(+), 6 deletions(-)

 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 4c004f5..0072b54 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
 *pci_dev, PCIBus *bus,
  }
  
  pci_dev-bus = bus;
 -if (bus-iommu_fn) {
 -dma_as = bus-iommu_fn(bus, bus-iommu_opaque, devfn);
 -} else {
 -/* FIXME: inherit memory region from bus creator */
 -dma_as = address_space_memory;
 -}
 +dma_as = pci_device_iommu_address_space(pci_dev);
  
  memory_region_init_alias(pci_dev-bus_master_enable_region,
   OBJECT(pci_dev), bus master,
 @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass 
 *klass, void *data)
  k-props = pci_props;
  }
  
 +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 +{
 +PCIBus *bus = PCI_BUS(dev-bus);
 +
 +if (bus-iommu_fn) {
 +return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
 +}
 +
 +if (bus-parent_dev) {
 +return pci_device_iommu_address_space(bus-parent_dev);
 +}

 No, this would fail if bus-parent_dev is not NULL but not a PCI device
 either.

 parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/

 Doh, I misread the code, I thought it was the parent field in
 BusState.  Why do we have parent_dev at all?
 
 The code is too old? Don't know.
 
 
 You can use object_dynamic_cast to convert the parent_dev to
 PCIDevice, and if the cast succeeds you call the new function.

 Perhaps you could make the new function take a PCIBus instead.
 Accessing the PCIDevice's IOMMU address space (as opposed to the
 bus-master address space) doesn't make much sense, VFIO is really a
 special case.  Putting the new API on the bus side instead looks better.

 (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
 devices sitting on the secondary bus?)

 It happens naturally I guess when linux enables devices.

 Yes, but then using the IOMMU address space would be wrong; you would
 have to use the bus-master address space as a base for the child's
 bus-master address space.
 
 
 Like this? Works too.
 
 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 8bdcedc..a4c70e6 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -2247,23 +2247,23 @@ AddressSpace
 *pci_device_iommu_address_space(PCIDevice *dev)
  {
  PCIBus *bus = PCI_BUS(dev-bus);
 
  if (bus-iommu_fn) {
  return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
  }
 
  if (bus-parent_dev) {
  return pci_device_iommu_address_space(bus-parent_dev);
  }
 
 -return address_space_memory;
 +return dev-bus_master_as;
  }

I was thinking more like this:

 if (bus-parent_dev) {
-return pci_device_iommu_address_space(bus-parent_dev);
+/* Take parent device's bus-master enable bit into account.  */
+return pci_get_address_space(bus-parent_dev);
 }

+/* Not a secondary bus and no IOMMU.  Use system memory.  */
 return address_space_memory;

Paolo

 
 
 Also, we would have to fix the x86 firmware.

 Paolo

Re: [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-09 Thread Benjamin Herrenschmidt

On Fri, 2013-08-09 at 11:40 +0200, Paolo Bonzini wrote:
 (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA
 for devices sitting on the secondary bus?)

In theory yes though I have seen bridges who ignore it...

Cheers,
Ben.

Re: [Qemu-devel] [RFC] [PATCHv8 28/30] aio / timers: Add scripts/switch-timer-api

2013-08-09 Thread Paolo Bonzini

Il 08/08/2013 23:42, Alex Bligh ha scritto:
 Add scripts/switch-timer-api to programatically rewrite source
 files to use the new timer system.
 
 Signed-off-by: Alex Bligh a...@alex.org.uk
 ---
  scripts/switch-timer-api |  178 
 ++
  1 file changed, 178 insertions(+)
 
 diff --git a/scripts/switch-timer-api b/scripts/switch-timer-api
 new file mode 100755
 index 000..5a06069
 --- /dev/null
 +++ b/scripts/switch-timer-api
 @@ -0,0 +1,178 @@
 +#!/usr/bin/perl
 +
 +use strict;
 +use warnings;
 +use Getopt::Long;
 +use FindBin;
 +
 +my @legacy = qw(qemu_clock_ptr qemu_get_clock_ns qemu_get_clock_ms 
 qemu_register_clock_reset_notifier qemu_unregister_clock_reset_notifier 
 qemu_new_timer qemu_free_timer qemu_del_timer qemu_mod_timer_ns 
 qemu_mod_timer qemu_run_timers qemu_new_timer_ns qemu_new_timer_us 
 qemu_new_timer_ms);
 +my $legacyre = '\b('.join('|', @legacy).')\b';
 +my $option_git;
 +my $option_dryrun;
 +my $option_quiet;
 +my $option_rtc;
 +my $suffix=.tmp.$$;
 +my @files;
 +my $getfiles = 'git grep -l -E 
 \'\b((host|rt|vm|rtc)_clock\b|qemu_\w*timer)\' | egrep \'\.[ch]$\' | egrep -v 
 \'qemu-timer\.c$|include/qemu/timer\.h$\'';
 +
 +sub Syntax
 +{
 +print STDERR STOP;
 +Usage: $FindBin::Script [options] FILE ...
 +
 +Translate each FILE to the new Qemu timer API. If no files
 +are passed, a reasonable guess is taken.
 +
 +Options:
 +  -q, --quiet Do not show warnings etc
 +  -d, --dry-run   Do a dry run
 +  -g, --git   Generate a git commit for each change
 +  -r, --rtc   Only fix up rtc usage

What is this option useful for?

Paolo

 +  -h, --help  Print this message
 +
 +STOP
 +return;
 +}
 +
 +sub ParseOptions
 +{
 +if (!GetOptions (
 +  dry-run|d = \$option_dryrun,
 + git|g = \$option_git,
 +  quiet|q = \$option_quiet,
 +  rtc|r = \$option_rtc,
 + help|h = sub { Syntax(); exit(0); }
 +))
 +{
 +Syntax();
 +die Bad options;
 +}
 +
 +if ($#ARGV =0)
 +{
 + @files = @ARGV;
 +}
 +else
 +{
 + @files = split(/\s+/, `$getfiles`);
 +}
 +
 +foreach my $file (@files)
 +{
 + die Cannot find $file unless (-f $file  -r $file);
 +}
 +}
 +
 +sub DoWarn
 +{
 +my $text = shift @_;
 +my $line = shift @_;
 +return if ($option_quiet);
 +chomp ($line);
 +print STDERR $text\n;
 +print STDERR $line\n\n;
 +}
 +
 +sub Process
 +{
 +my $ifn = shift @_;
 +my $ofn = $ifn.$suffix;
 +
 +my $intext;
 +my $outtext;
 +my $linenum = 0;
 +
 +open my $input, , $ifn || die Cannot open $ifn for read: $!;
 +
 +while ($input)
 +{
 + my $line = $_;
 + $intext .= $line;
 + $linenum++;
 +
 + # fix the specific uses
 + unless ($option_rtc)
 + {
 + $line =~ 
 s/\bqemu_new_timer(_[num]s)\s*\((vm_|rt_|host_)clock\b/qemu_timer_new$1(XXX_$2clock/g;
 + $line =~ 
 s/\bqemu_new_timer\s*\((vm_|rt_|host_)clock\b/qemu_timer_new(XXX_$1clock/g;
 + $line =~ 
 s/\bqemu_get_clock(_[num]s)\s*\((vm_|rt_|host_)clock\b/qemu_clock_get$1(XXX_$2clock/g;
 + }
 +
 + # rtc is different
 + $line =~ 
 s/\bqemu_new_timer(_[num]s)\s*\(rtc_clock\b/qemu_timer_new$1(rtc_clock/g;
 + $line =~ s/\bqemu_new_timer\s*\(rtc_clock\b/qemu_timer_new(rtc_clock/g;
 + $line =~ 
 s/\bqemu_get_clock(_[num]s)\s*\(rtc_clock\b/qemu_clock_get$1(rtc_clock/g;
 + $line =~ 
 s/\bqemu_register_clock_reset_notifier\s*\(rtc_clock\b/qemu_register_clock_reset_notifier(qemu_clock_ptr(rtc_clock)/g;
 +
 + unless ($option_rtc)
 + {
 + # fix up comments
 + $line =~ s/\b(vm_|rt_|host_)clock\b/XXX_$1clock/g if ($line =~ 
 m,^[/ ]+\*,);
 +
 + # spurious fprintf error reporting
 + $line =~ s/: qemu_new_timer_ns failed/: qemu_timer_new_ns failed/g;
 +
 + # these have just changed name
 + $line =~ s/\bqemu_mod_timer\b/qemu_timer_mod/g;
 + $line =~ s/\bqemu_mod_timer_(ns|us|ms)\b/qemu_timer_mod_$1/g;
 + $line =~ s/\bqemu_free_timer\b/qemu_timer_free/g;
 + $line =~ s/\bqemu_del_timer\b/qemu_timer_del/g;
 + }
 +
 + # fix up rtc_clock
 + $line =~ s/QEMUClock \*rtc_clock;/QEMUClockType rtc_clock;/g;
 + $line =~ s/\brtc_clock = (vm_|rt_|host_)clock\b/rtc_clock = 
 XXX_$1clock/g;
 +
 + unless ($option_rtc)
 + {
 + # replace any more general uses
 + $line =~ s/\b(vm_|rt_|host_)clock\b/qemu_clock_ptr(XXX_$1clock)/g;
 + }
 +
 + # fix up the place holders
 + $line =~ s/\bXXX_vm_clock\b/QEMU_CLOCK_VIRTUAL/g;
 + $line =~ s/\bXXX_rt_clock\b/QEMU_CLOCK_REALTIME/g;
 + $line =~ s/\bXXX_host_clock\b/QEMU_CLOCK_HOST/g;
 +
 + unless ($option_rtc)
 + {
 + DoWarn($ifn:$linenum WARNING: timer $1 not fixed up, $line) if 
 ($line =~ /\b((vm_|rt_|host_)clock)\b/);
 + DoWarn($ifn:$linenum WARNING: function $1 not fixed up, $line) if 
 ($line =~

Re: [Qemu-devel] [RFC] [PATCHv8 28/30] aio / timers: Add scripts/switch-timer-api

2013-08-09 Thread Paolo Bonzini

Il 08/08/2013 23:42, Alex Bligh ha scritto:
 + # these have just changed name
 + $line =~ s/\bqemu_mod_timer\b/qemu_timer_mod/g;
 + $line =~ s/\bqemu_mod_timer_(ns|us|ms)\b/qemu_timer_mod_$1/g;
 + $line =~ s/\bqemu_free_timer\b/qemu_timer_free/g;
 + $line =~ s/\bqemu_del_timer\b/qemu_timer_del/g;

I couldn't quite track which patch introduced this change.

I would either go all the way and drop the qemu_ prefix, or leave the
old name in place.

Paolo

[Qemu-devel] [PATCH v5 5/8] migration: omit drive ref as we have bdrv_ref now

2013-08-09 Thread Fam Zheng

block-migration.c does not actually use DriveInfo anywhere.  Hence it's
safe to drive ref code, we really only care about referencing BDS.

Signed-off-by: Fam Zheng f...@redhat.com
---
 block-migration.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index f803f20..daf9ec1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -336,8 +336,8 @@ static void init_blk_migration_it(void *opaque, 
BlockDriverState *bs)
 bmds-completed_sectors = 0;
 bmds-shared_base = block_mig_state.shared_base;
 alloc_aio_bitmap(bmds);
-drive_get_ref(drive_get_by_blockdev(bs));
 bdrv_set_in_use(bs, 1);
+bdrv_ref(bs);
 
 block_mig_state.total_sector_sum += sectors;
 
@@ -575,7 +575,7 @@ static void blk_mig_cleanup(void)
 while ((bmds = QSIMPLEQ_FIRST(block_mig_state.bmds_list)) != NULL) {
 QSIMPLEQ_REMOVE_HEAD(block_mig_state.bmds_list, entry);
 bdrv_set_in_use(bmds-bs, 0);
-drive_put_ref(drive_get_by_blockdev(bmds-bs));
+bdrv_unref(bmds-bs);
 g_free(bmds-aio_bitmap);
 g_free(bmds);
 }
-- 
1.8.3.1

Re: [Qemu-devel] [RFC] [PATCHv8 00/30] aio / timers: Add AioContext timers and use ppoll

2013-08-09 Thread Paolo Bonzini

Il 08/08/2013 23:41, Alex Bligh ha scritto:
 This patch series adds support for timers attached to an AioContext clock
 which get called within aio_poll.
 
 In doing so it removes alarm timers and moves to use ppoll where possible.
 
 This patch set 'sort of' passes make check (see below for caveat)
 including a new test harness for the aio timers, but has not been
 tested much beyond that. In particular, the win32 changes have not
 even been compile tested. Equally, alterations to use_icount
 are untested.
 
 Caveat: I have had to alter tests/test-aio.c so the following error
 no longer occurs.
 
 ERROR:tests/test-aio.c:346:test_wait_event_notifier_noflush: assertion 
 failed: (aio_poll(ctx, false))
 
 As gar as I can tell, this check was incorrect, in that it checking
 aio_poll makes progress when in fact it should not make progress. I
 fixed an issue where aio_poll was (as far as I can tell) wrongly
 returning true on a timeout, and that generated this error.
 
 Note also the comment on patch 18 in relation to a possible bug
 in cpus.c.
 
 The penultimate patch is patch which is created in an automated manner
 using scripts/switch-timer-api, added in this patch set. It violates some
 coding standards (e.g. line length = 80 characters), but this is preferable
 in terms of giving a provably correct conversion.
 
 This patch set has been compile tested  make check tested on a
 'christmas-tree' configuration, meaning a configuration with every
 --enable- value tested that can be easily configured on Ubuntu Precise,
 after application of each patch.

Awesome work, really.  I think we can still simplify it a bit, but many
of the changes I asked should be doable with a global search-and-replace
on the patch files.

Thanks!

Paolo

Re: [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-09 Thread Paolo Bonzini

Il 09/08/2013 12:20, Benjamin Herrenschmidt ha scritto:
 On Fri, 2013-08-09 at 11:40 +0200, Paolo Bonzini wrote:
 (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA
 for devices sitting on the secondary bus?)
 
 In theory yes though I have seen bridges who ignore it...

And we would need a compatibility property anyway.  So let's choose to
be among the ones who ignore it (i.e. the same as this patch).  But it
needs a comment.

Paolo

Re: [Qemu-devel] [RFC] [PATCHv8 00/30] aio / timers: Add AioContext timers and use ppoll

2013-08-09 Thread Alex Bligh


Jan,

--On 9 August 2013 10:12:45 +0200 Jan Kiszka jan.kis...@siemens.com wrote:


Do you have this in git somewhere? I'd like to port my (almost) BQL-free
threaded RTC device model on top to check how well the new API works.
But it looks promising.


Per my subsequent message:

For ease of review, the finished result (and for that matter all the
intermediate steps) are available at:
https://github.com/abligh/qemu/tree/timer-api

If you are pulling this tree, note there is an intermediate commit

* b1eb241 2013-07-19 | Disable expensive tests [Alex Bligh]

which should not be included in the final pull, which merely disables
tests that take many minutes to run on my puny laptop.

This commit does NOT exist in the patch series posted here.

[stefanha: the above commit no longer breaks the build]


--
Alex Bligh

Re: [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-09 Thread David Gibson

On Fri, Aug 09, 2013 at 07:48:16PM +1000, Alexey Kardashevskiy wrote:
 On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
  Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
  A PCI device's DMA address space (possibly an IOMMU) is returned by a
  method on the PCIBus.  At the moment that only has one caller, so the
  method is simply open coded.  We'll need another caller for VFIO, so
  this patch introduces a helper/wrapper function.
 
  Signed-off-by: David Gibson da...@gibson.dropbear.id.au
  [aik: added inheritance from parent if iommu is not set for the current 
  bus]
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
  ---
  Changes:
  v2:
  * added inheritance, needed for a pci-bridge on spapr-ppc64
  * pci_iommu_as renamed to pci_device_iommu_address_space
  ---
   hw/pci/pci.c | 22 --
   include/hw/pci/pci.h |  1 +
   2 files changed, 17 insertions(+), 6 deletions(-)
 
  diff --git a/hw/pci/pci.c b/hw/pci/pci.c
  index 4c004f5..0072b54 100644
  --- a/hw/pci/pci.c
  +++ b/hw/pci/pci.c
  @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
  *pci_dev, PCIBus *bus,
   }
   
   pci_dev-bus = bus;
  -if (bus-iommu_fn) {
  -dma_as = bus-iommu_fn(bus, bus-iommu_opaque, devfn);
  -} else {
  -/* FIXME: inherit memory region from bus creator */
  -dma_as = address_space_memory;
  -}
  +dma_as = pci_device_iommu_address_space(pci_dev);
   
   memory_region_init_alias(pci_dev-bus_master_enable_region,
OBJECT(pci_dev), bus master,
  @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass 
  *klass, void *data)
   k-props = pci_props;
   }
   
  +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
  +{
  +PCIBus *bus = PCI_BUS(dev-bus);
  +
  +if (bus-iommu_fn) {
  +return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
  +}
  +
  +if (bus-parent_dev) {
  +return pci_device_iommu_address_space(bus-parent_dev);
  +}
  
  No, this would fail if bus-parent_dev is not NULL but not a PCI device
  either.
 
 parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/

It will be a PCI device, though the confusion is understandable.
There is a parent_dev in both PCIBus, which is always a PCI device if
non-NULL, as well as a parent_dev in BusState, which can be any sort
of device.  If both are non-NULL they will be equal apart from type.

So bus-parent_dev is safe here, bus-qbus.parent_dev would not be.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgp9uBlKbLlIZ.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-09 Thread Alexey Kardashevskiy

On 08/09/2013 08:19 PM, Paolo Bonzini wrote:
 Il 09/08/2013 12:13, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 07:53 PM, Paolo Bonzini wrote:
 Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
 Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
 A PCI device's DMA address space (possibly an IOMMU) is returned by a
 method on the PCIBus.  At the moment that only has one caller, so the
 method is simply open coded.  We'll need another caller for VFIO, so
 this patch introduces a helper/wrapper function.

 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 [aik: added inheritance from parent if iommu is not set for the current 
 bus]
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---
 Changes:
 v2:
 * added inheritance, needed for a pci-bridge on spapr-ppc64
 * pci_iommu_as renamed to pci_device_iommu_address_space
 ---
  hw/pci/pci.c | 22 --
  include/hw/pci/pci.h |  1 +
  2 files changed, 17 insertions(+), 6 deletions(-)

 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 4c004f5..0072b54 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
 *pci_dev, PCIBus *bus,
  }
  
  pci_dev-bus = bus;
 -if (bus-iommu_fn) {
 -dma_as = bus-iommu_fn(bus, bus-iommu_opaque, devfn);
 -} else {
 -/* FIXME: inherit memory region from bus creator */
 -dma_as = address_space_memory;
 -}
 +dma_as = pci_device_iommu_address_space(pci_dev);
  
  memory_region_init_alias(pci_dev-bus_master_enable_region,
   OBJECT(pci_dev), bus master,
 @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass 
 *klass, void *data)
  k-props = pci_props;
  }
  
 +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 +{
 +PCIBus *bus = PCI_BUS(dev-bus);
 +
 +if (bus-iommu_fn) {
 +return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
 +}
 +
 +if (bus-parent_dev) {
 +return pci_device_iommu_address_space(bus-parent_dev);
 +}

 No, this would fail if bus-parent_dev is not NULL but not a PCI device
 either.

 parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/

 Doh, I misread the code, I thought it was the parent field in
 BusState.  Why do we have parent_dev at all?

 The code is too old? Don't know.


 You can use object_dynamic_cast to convert the parent_dev to
 PCIDevice, and if the cast succeeds you call the new function.

 Perhaps you could make the new function take a PCIBus instead.
 Accessing the PCIDevice's IOMMU address space (as opposed to the
 bus-master address space) doesn't make much sense, VFIO is really a
 special case.  Putting the new API on the bus side instead looks better.

 (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
 devices sitting on the secondary bus?)

 It happens naturally I guess when linux enables devices.

 Yes, but then using the IOMMU address space would be wrong; you would
 have to use the bus-master address space as a base for the child's
 bus-master address space.


 Like this? Works too.

 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 8bdcedc..a4c70e6 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -2247,23 +2247,23 @@ AddressSpace
 *pci_device_iommu_address_space(PCIDevice *dev)
  {
  PCIBus *bus = PCI_BUS(dev-bus);

  if (bus-iommu_fn) {
  return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
  }

  if (bus-parent_dev) {
  return pci_device_iommu_address_space(bus-parent_dev);
  }

 -return address_space_memory;
 +return dev-bus_master_as;
  }
 
 I was thinking more like this:
 
  if (bus-parent_dev) {
 -return pci_device_iommu_address_space(bus-parent_dev);
 +/* Take parent device's bus-master enable bit into account.  */
 +return pci_get_address_space(bus-parent_dev);
  }
 
 +/* Not a secondary bus and no IOMMU.  Use system memory.  */
  return address_space_memory;

Ok.

Oh. BTW. This bus master thing now breaks VFIO's check for all devices
being in the same address space as every single device has its own bus
master AddressSpace.


 
 Paolo
 


 Also, we would have to fix the x86 firmware.

btw what would it mean? :)



 Paolo



 


-- 
Alexey

Re: [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-09 Thread Paolo Bonzini

Il 09/08/2013 12:58, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 08:19 PM, Paolo Bonzini wrote:
 Il 09/08/2013 12:13, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 07:53 PM, Paolo Bonzini wrote:
 Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
 Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
 A PCI device's DMA address space (possibly an IOMMU) is returned by a
 method on the PCIBus.  At the moment that only has one caller, so the
 method is simply open coded.  We'll need another caller for VFIO, so
 this patch introduces a helper/wrapper function.

 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 [aik: added inheritance from parent if iommu is not set for the current 
 bus]
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---
 Changes:
 v2:
 * added inheritance, needed for a pci-bridge on spapr-ppc64
 * pci_iommu_as renamed to pci_device_iommu_address_space
 ---
  hw/pci/pci.c | 22 --
  include/hw/pci/pci.h |  1 +
  2 files changed, 17 insertions(+), 6 deletions(-)

 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 4c004f5..0072b54 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
 *pci_dev, PCIBus *bus,
  }
  
  pci_dev-bus = bus;
 -if (bus-iommu_fn) {
 -dma_as = bus-iommu_fn(bus, bus-iommu_opaque, devfn);
 -} else {
 -/* FIXME: inherit memory region from bus creator */
 -dma_as = address_space_memory;
 -}
 +dma_as = pci_device_iommu_address_space(pci_dev);
  
  memory_region_init_alias(pci_dev-bus_master_enable_region,
   OBJECT(pci_dev), bus master,
 @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass 
 *klass, void *data)
  k-props = pci_props;
  }
  
 +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 +{
 +PCIBus *bus = PCI_BUS(dev-bus);
 +
 +if (bus-iommu_fn) {
 +return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
 +}
 +
 +if (bus-parent_dev) {
 +return pci_device_iommu_address_space(bus-parent_dev);
 +}

 No, this would fail if bus-parent_dev is not NULL but not a PCI device
 either.

 parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/

 Doh, I misread the code, I thought it was the parent field in
 BusState.  Why do we have parent_dev at all?

 The code is too old? Don't know.


 You can use object_dynamic_cast to convert the parent_dev to
 PCIDevice, and if the cast succeeds you call the new function.

 Perhaps you could make the new function take a PCIBus instead.
 Accessing the PCIDevice's IOMMU address space (as opposed to the
 bus-master address space) doesn't make much sense, VFIO is really a
 special case.  Putting the new API on the bus side instead looks better.

 (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
 devices sitting on the secondary bus?)

 It happens naturally I guess when linux enables devices.

 Yes, but then using the IOMMU address space would be wrong; you would
 have to use the bus-master address space as a base for the child's
 bus-master address space.


 Like this? Works too.

 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 8bdcedc..a4c70e6 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -2247,23 +2247,23 @@ AddressSpace
 *pci_device_iommu_address_space(PCIDevice *dev)
  {
  PCIBus *bus = PCI_BUS(dev-bus);

  if (bus-iommu_fn) {
  return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
  }

  if (bus-parent_dev) {
  return pci_device_iommu_address_space(bus-parent_dev);
  }

 -return address_space_memory;
 +return dev-bus_master_as;
  }

 I was thinking more like this:

  if (bus-parent_dev) {
 -return pci_device_iommu_address_space(bus-parent_dev);
 +/* Take parent device's bus-master enable bit into account.  */
 +return pci_get_address_space(bus-parent_dev);
  }

 +/* Not a secondary bus and no IOMMU.  Use system memory.  */
  return address_space_memory;
 
 Ok.
 
 Oh. BTW. This bus master thing now breaks VFIO's check for all devices
 being in the same address space as every single device has its own bus
 master AddressSpace.

Yes, fixing that check is another good use of the new API you introduced.

But after Ben's answer, I guess the above change is not really needed.
It would add complication for VFIO, too.  Proper emulation would require
QEMU to trap writes to the device's bus-master bit.  QEMU would have to
take of the value that the guest writes, AND it with the bus-master
bit(s) of all bridges between the host bridge and the VFIO device, and
write the result to the passed-through device.  This is because the
bridges are emulated, and do not exist in real hardware.

 Also, we would have to fix the x86 firmware.
 
 btw what would it mean? :)

Firmware would have to look for bridges and enable bus master

Re: [Qemu-devel] [RFC] [PATCHv8 00/30] aio / timers: Add AioContext timers and use ppoll

2013-08-09 Thread Stefan Hajnoczi

On Thu, Aug 08, 2013 at 10:41:57PM +0100, Alex Bligh wrote:
 This patch series adds support for timers attached to an AioContext clock
 which get called within aio_poll.
 
 In doing so it removes alarm timers and moves to use ppoll where possible.
 
 This patch set 'sort of' passes make check (see below for caveat)
 including a new test harness for the aio timers, but has not been
 tested much beyond that. In particular, the win32 changes have not
 even been compile tested. Equally, alterations to use_icount
 are untested.
 
 Caveat: I have had to alter tests/test-aio.c so the following error
 no longer occurs.
 
 ERROR:tests/test-aio.c:346:test_wait_event_notifier_noflush: assertion 
 failed: (aio_poll(ctx, false))
 
 As gar as I can tell, this check was incorrect, in that it checking
 aio_poll makes progress when in fact it should not make progress. I
 fixed an issue where aio_poll was (as far as I can tell) wrongly
 returning true on a timeout, and that generated this error.
 
 Note also the comment on patch 18 in relation to a possible bug
 in cpus.c.
 
 The penultimate patch is patch which is created in an automated manner
 using scripts/switch-timer-api, added in this patch set. It violates some
 coding standards (e.g. line length = 80 characters), but this is preferable
 in terms of giving a provably correct conversion.
 
 This patch set has been compile tested  make check tested on a
 'christmas-tree' configuration, meaning a configuration with every
 --enable- value tested that can be easily configured on Ubuntu Precise,
 after application of each patch.
 
 Changes since v7:
 * Rebase to master 6fdf98f281f85ae6e2883bed2f691bcfe33b1f9f
 * Add qemu_clock_get_ms and qemu_clock_get_ms
 * Rename qemu_get_clock to qemu_clock_ptr
 * Reorder qemu-timer.h to utilise the legacy API
 * Hide qemu_clock_new  qemu_clock_free
 * Rename default_timerlist to main_loop_timerlist
 * Remove main_loop_timerlist once main_loop_tlg is in
 * Add script to convert to new API
 * Make rtc_clock use new API
 * Convert tests/test-aio to use new API
 * Run script on entire source code
 * Remove legacy API functions
 
 Changes since v6:
 * Fix build failure in vnc-auth-sasl.c
 * Split first patch into 3
 * Add assert on timerlist_free
 * Fix ==/= error on qemu_clock_use_for_deadline
 * Remove unnecessary cast in aio_timerlist_notify
 * Fix bad deadline comparison in aio_ctx_check
 * Add assert to timerlist_new_from_clock to check init_clocks
 * Use timer_list not tl
 * Change default_timerlistgroup to main_loop_timerlistgroup
 * Add comment on commit for qemu_clock_use_for_deadline
 * Fixed various include file issues
 * Convert *_has_timers and *_has_expired to return bool
 * Make loop variable consistent when looping through clock types
 * Add documentation to existing qemu_timer calls
 * Remove qemu_clock_deadline and move to qemu_clock_deadline_ns
 
 Changes since v5:
 * Rebase onto master (b9ac5d9)
 * Fix spacing in typedef QEMUTimerList
 * Rename 'QEMUClocks' extern to 'qemu_clocks'
 
 Changes since v4:
 * Rename qemu_timerlist_ functions to timer_list (per Paolo Bonzini)
 * Rename qemu_timer_.*timerlist.* to timer_ (per Paolo Bonzini)
 * Use enum for QEMUClockType
 * Put clocks into an array; remove global variables
 * Introduce QEMUTimerListGroup - a timeliest of each type
 * Add a QEMUTimerListGroup to AioContext
 * Use a callback on timer modification, rather than binding in
   AioContext into the timeliest
 * Make cpus.c iterate over all timerlists when it does a notify
 * Make cpus.c icount timeout use soonest timeout
   across all timerlists
 
 Changes since v3:
 * Split up QEMUClock and QEMUClock list
 * Improve commenting
 * Fix comment in vl.c
 * Change test/test-aio.c to reflect correct behaviour in aio_poll.
 
 Changes since v2:
 * Reordered to remove alarm timers last
 * Added prctl(PR_SET_TIMERSLACK, 1, ...)
 * Renamed qemu_g_poll_ns to qemu_poll_ns
 * Moved declaration of above  drop glib types
 * Do not use a global list of qemu clocks
 * Add AioContext * to QEMUClock
 * Split up conversion to use ppoll and timers
 * Indentation fix
 * Fix aio_win32.c aio_poll to return progress
 * aio_notify / qemu_notify when timers are modified
 * change comment in deprecation of clock options
 
 Alex Bligh (30):
   aio / timers: Rename qemu_new_clock and expose clock types
   aio / timers: add qemu-timer.c utility functions
   aio / timers: Consistent treatment of disabled clocks for deadlines
   aio / timers: add ppoll support with qemu_poll_ns
   aio / timers: Add prctl(PR_SET_TIMERSLACK, 1, ...) to reduce timer
 slack
   aio / timers: Make qemu_run_timers and qemu_run_all_timers return
 progress
   aio / timers: Split QEMUClock into QEMUClock and QEMUTimerList
   aio / timers: Untangle include files
   aio / timers: Add QEMUTimerListGroup and helper functions
   aio / timers: Add QEMUTimerListGroup to AioContext
   aio / timers: Add a notify callback to QEMUTimerList
   aio / timers: aio_ctx_prepare sets

Re: [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-09 Thread Alexey Kardashevskiy

On 08/09/2013 09:07 PM, Paolo Bonzini wrote:
 Il 09/08/2013 12:58, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 08:19 PM, Paolo Bonzini wrote:
 Il 09/08/2013 12:13, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 07:53 PM, Paolo Bonzini wrote:
 Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
 Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
 A PCI device's DMA address space (possibly an IOMMU) is returned by a
 method on the PCIBus.  At the moment that only has one caller, so the
 method is simply open coded.  We'll need another caller for VFIO, so
 this patch introduces a helper/wrapper function.

 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 [aik: added inheritance from parent if iommu is not set for the 
 current bus]
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---
 Changes:
 v2:
 * added inheritance, needed for a pci-bridge on spapr-ppc64
 * pci_iommu_as renamed to pci_device_iommu_address_space
 ---
  hw/pci/pci.c | 22 --
  include/hw/pci/pci.h |  1 +
  2 files changed, 17 insertions(+), 6 deletions(-)

 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 4c004f5..0072b54 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -812,12 +812,7 @@ static PCIDevice 
 *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
  }
  
  pci_dev-bus = bus;
 -if (bus-iommu_fn) {
 -dma_as = bus-iommu_fn(bus, bus-iommu_opaque, devfn);
 -} else {
 -/* FIXME: inherit memory region from bus creator */
 -dma_as = address_space_memory;
 -}
 +dma_as = pci_device_iommu_address_space(pci_dev);
  
  memory_region_init_alias(pci_dev-bus_master_enable_region,
   OBJECT(pci_dev), bus master,
 @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass 
 *klass, void *data)
  k-props = pci_props;
  }
  
 +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 +{
 +PCIBus *bus = PCI_BUS(dev-bus);
 +
 +if (bus-iommu_fn) {
 +return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
 +}
 +
 +if (bus-parent_dev) {
 +return pci_device_iommu_address_space(bus-parent_dev);
 +}

 No, this would fail if bus-parent_dev is not NULL but not a PCI device
 either.

 parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/

 Doh, I misread the code, I thought it was the parent field in
 BusState.  Why do we have parent_dev at all?

 The code is too old? Don't know.


 You can use object_dynamic_cast to convert the parent_dev to
 PCIDevice, and if the cast succeeds you call the new function.

 Perhaps you could make the new function take a PCIBus instead.
 Accessing the PCIDevice's IOMMU address space (as opposed to the
 bus-master address space) doesn't make much sense, VFIO is really a
 special case.  Putting the new API on the bus side instead looks better.

 (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
 devices sitting on the secondary bus?)

 It happens naturally I guess when linux enables devices.

 Yes, but then using the IOMMU address space would be wrong; you would
 have to use the bus-master address space as a base for the child's
 bus-master address space.


 Like this? Works too.

 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 8bdcedc..a4c70e6 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -2247,23 +2247,23 @@ AddressSpace
 *pci_device_iommu_address_space(PCIDevice *dev)
  {
  PCIBus *bus = PCI_BUS(dev-bus);

  if (bus-iommu_fn) {
  return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
  }

  if (bus-parent_dev) {
  return pci_device_iommu_address_space(bus-parent_dev);
  }

 -return address_space_memory;
 +return dev-bus_master_as;
  }

 I was thinking more like this:

  if (bus-parent_dev) {
 -return pci_device_iommu_address_space(bus-parent_dev);
 +/* Take parent device's bus-master enable bit into account.  */
 +return pci_get_address_space(bus-parent_dev);
  }

 +/* Not a secondary bus and no IOMMU.  Use system memory.  */
  return address_space_memory;

 Ok.

 Oh. BTW. This bus master thing now breaks VFIO's check for all devices
 being in the same address space as every single device has its own bus
 master AddressSpace.
 
 Yes, fixing that check is another good use of the new API you introduced.
 
 But after Ben's answer, I guess the above change is not really needed.
 It would add complication for VFIO, too.  Proper emulation would require
 QEMU to trap writes to the device's bus-master bit.  QEMU would have to
 take of the value that the guest writes, AND it with the bus-master
 bit(s) of all bridges between the host bridge and the VFIO device, and
 write the result to the passed-through device.  This is because the
 bridges are emulated, and do not exist in real hardware.


So does this mean that we go with the original patch and ignore bus master
address

Re: [Qemu-devel] [PATCH v2] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-09 Thread Paolo Bonzini

Il 09/08/2013 13:21, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 09:07 PM, Paolo Bonzini wrote:
 Il 09/08/2013 12:58, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 08:19 PM, Paolo Bonzini wrote:
 Il 09/08/2013 12:13, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 07:53 PM, Paolo Bonzini wrote:
 Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
 On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
 Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
 A PCI device's DMA address space (possibly an IOMMU) is returned by a
 method on the PCIBus.  At the moment that only has one caller, so the
 method is simply open coded.  We'll need another caller for VFIO, so
 this patch introduces a helper/wrapper function.

 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 [aik: added inheritance from parent if iommu is not set for the 
 current bus]
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---
 Changes:
 v2:
 * added inheritance, needed for a pci-bridge on spapr-ppc64
 * pci_iommu_as renamed to pci_device_iommu_address_space
 ---
  hw/pci/pci.c | 22 --
  include/hw/pci/pci.h |  1 +
  2 files changed, 17 insertions(+), 6 deletions(-)

 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 4c004f5..0072b54 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -812,12 +812,7 @@ static PCIDevice 
 *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
  }
  
  pci_dev-bus = bus;
 -if (bus-iommu_fn) {
 -dma_as = bus-iommu_fn(bus, bus-iommu_opaque, devfn);
 -} else {
 -/* FIXME: inherit memory region from bus creator */
 -dma_as = address_space_memory;
 -}
 +dma_as = pci_device_iommu_address_space(pci_dev);
  
  memory_region_init_alias(pci_dev-bus_master_enable_region,
   OBJECT(pci_dev), bus master,
 @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass 
 *klass, void *data)
  k-props = pci_props;
  }
  
 +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 +{
 +PCIBus *bus = PCI_BUS(dev-bus);
 +
 +if (bus-iommu_fn) {
 +return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
 +}
 +
 +if (bus-parent_dev) {
 +return pci_device_iommu_address_space(bus-parent_dev);
 +}

 No, this would fail if bus-parent_dev is not NULL but not a PCI device
 either.

 parent_dev is of the PCIDevice* type, how can it be not a PCI device? 
 :-/

 Doh, I misread the code, I thought it was the parent field in
 BusState.  Why do we have parent_dev at all?

 The code is too old? Don't know.


 You can use object_dynamic_cast to convert the parent_dev to
 PCIDevice, and if the cast succeeds you call the new function.

 Perhaps you could make the new function take a PCIBus instead.
 Accessing the PCIDevice's IOMMU address space (as opposed to the
 bus-master address space) doesn't make much sense, VFIO is really a
 special case.  Putting the new API on the bus side instead looks 
 better.

 (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA 
 for
 devices sitting on the secondary bus?)

 It happens naturally I guess when linux enables devices.

 Yes, but then using the IOMMU address space would be wrong; you would
 have to use the bus-master address space as a base for the child's
 bus-master address space.


 Like this? Works too.

 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 8bdcedc..a4c70e6 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -2247,23 +2247,23 @@ AddressSpace
 *pci_device_iommu_address_space(PCIDevice *dev)
  {
  PCIBus *bus = PCI_BUS(dev-bus);

  if (bus-iommu_fn) {
  return bus-iommu_fn(bus, bus-iommu_opaque, dev-devfn);
  }

  if (bus-parent_dev) {
  return pci_device_iommu_address_space(bus-parent_dev);
  }

 -return address_space_memory;
 +return dev-bus_master_as;
  }

 I was thinking more like this:

  if (bus-parent_dev) {
 -return pci_device_iommu_address_space(bus-parent_dev);
 +/* Take parent device's bus-master enable bit into account.  */
 +return pci_get_address_space(bus-parent_dev);
  }

 +/* Not a secondary bus and no IOMMU.  Use system memory.  */
  return address_space_memory;

 Ok.

 Oh. BTW. This bus master thing now breaks VFIO's check for all devices
 being in the same address space as every single device has its own bus
 master AddressSpace.

 Yes, fixing that check is another good use of the new API you introduced.

 But after Ben's answer, I guess the above change is not really needed.
 It would add complication for VFIO, too.  Proper emulation would require
 QEMU to trap writes to the device's bus-master bit.  QEMU would have to
 take of the value that the guest writes, AND it with the bus-master
 bit(s) of all bridges between the host bridge and the VFIO device, and
 write the result to the passed-through device.  This is because the
 bridges are emulated, and do not exist in real hardware.
 
 So does this mean that

Re: [Qemu-devel] [PATCH for-next] spapr: Avoid HOST@0 CPU node name in SLOF device tree for -cpu host

2013-08-09 Thread Andreas Färber

Am 08.08.2013 07:15, schrieb Prerna Saxena:
 On 08/01/2013 06:32 AM, Andreas Färber wrote:
 By default on KVM or when user asks for it via -cpu host, cpu_model will
 be host and sPAPR merely upper-cases it for the SLOF device tree.

 Change it so that we get the underlying CPU type, e.g., POWER7_V2.3@0.

 Reported-by: Prerna Saxena pre...@linux.vnet.ibm.com
 Signed-off-by: Andreas Färber afaer...@suse.de
 ---
 
 ACK.
 Reviewed and tested --Works as expected.

Thanks. Looking deeper into the issue I have come up with a simpler
solution that will obsolete this patch.

Andreas

 
 I'll send out an updated follow-up patch later in the day which ensures
 PAPR compliance for nomenclature.
 
 Regards,
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

[Qemu-devel] [PATCH for 1.6 0/2] Fix unassigned memory and I/O access handling

2013-08-09 Thread Jan Kiszka

This just binds both patches properly together and fixes up the return
value size of unassigned_io_read as suggested. Please merge before the
1.6 release.

Jan Kiszka (2):
  memory: Provide separate handling of unassigned io ports accesses
  Revert memory: Return -1 again on reads from unsigned regions

 exec.c|3 ++-
 include/exec/ioport.h |2 ++
 ioport.c  |   16 
 memory.c  |2 +-
 4 files changed, 21 insertions(+), 2 deletions(-)

-- 
1.7.3.4

[Qemu-devel] [PATCH for 1.6 1/2] memory: Provide separate handling of unassigned io ports accesses

2013-08-09 Thread Jan Kiszka

Accesses to unassigned io ports shall return -1 on read and be ignored
on write. Ensure these properties via dedicated ops, decoupling us from
the memory core's handling of unassigned accesses.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 exec.c|3 ++-
 include/exec/ioport.h |2 ++
 ioport.c  |   16 
 3 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/exec.c b/exec.c
index 3ca9381..9ed598f 100644
--- a/exec.c
+++ b/exec.c
@@ -1820,7 +1820,8 @@ static void memory_map_init(void)
 address_space_init(address_space_memory, system_memory, memory);
 
 system_io = g_malloc(sizeof(*system_io));
-memory_region_init(system_io, NULL, io, 65536);
+memory_region_init_io(system_io, NULL, unassigned_io_ops, NULL, io,
+  65536);
 address_space_init(address_space_io, system_io, I/O);
 
 memory_listener_register(core_memory_listener, address_space_memory);
diff --git a/include/exec/ioport.h b/include/exec/ioport.h
index bdd4e96..84f7f85 100644
--- a/include/exec/ioport.h
+++ b/include/exec/ioport.h
@@ -45,6 +45,8 @@ typedef struct MemoryRegionPortio {
 
 #define PORTIO_END_OF_LIST() { }
 
+extern const MemoryRegionOps unassigned_io_ops;
+
 void cpu_outb(pio_addr_t addr, uint8_t val);
 void cpu_outw(pio_addr_t addr, uint16_t val);
 void cpu_outl(pio_addr_t addr, uint32_t val);
diff --git a/ioport.c b/ioport.c
index 79b7f1a..707cce8 100644
--- a/ioport.c
+++ b/ioport.c
@@ -44,6 +44,22 @@ typedef struct MemoryRegionPortioList {
 MemoryRegionPortio ports[];
 } MemoryRegionPortioList;
 
+static uint64_t unassigned_io_read(void *opaque, hwaddr addr, unsigned size)
+{
+return -1ULL;
+}
+
+static void unassigned_io_write(void *opaque, hwaddr addr, uint64_t val,
+unsigned size)
+{
+}
+
+const MemoryRegionOps unassigned_io_ops = {
+.read = unassigned_io_read,
+.write = unassigned_io_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
 void cpu_outb(pio_addr_t addr, uint8_t val)
 {
 LOG_IOPORT(outb: %04FMT_pioaddr %02PRIx8\n, addr, val);
-- 
1.7.3.4

[Qemu-devel] [PATCH for 1.6 2/2] Revert memory: Return -1 again on reads from unsigned regions

2013-08-09 Thread Jan Kiszka

This reverts commit 9b8c69243585a32d14b9bb9fcd52c37b0b5a1b71.

The commit was wrong: We only return -1 on invalid accesses, not on
valid but unbacked ones. This broke various corner cases.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 memory.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/memory.c b/memory.c
index 886f838..5a10fd0 100644
--- a/memory.c
+++ b/memory.c
@@ -872,7 +872,7 @@ static uint64_t unassigned_mem_read(void *opaque, hwaddr 
addr,
 if (current_cpu != NULL) {
 cpu_unassigned_access(current_cpu, addr, false, false, 0, size);
 }
-return -1ULL;
+return 0;
 }
 
 static void unassigned_mem_write(void *opaque, hwaddr addr,
-- 
1.7.3.4

Re: [Qemu-devel] [PATCH V4 0/5] Continuous Leaky Bucket Throttling*

2013-08-09 Thread Benoît Canet

 It fail with the following error message at exit and I don't know why yet.
 qemu-system-x86_64: block.c:1489: bdrv_drain_all: Assertion
 `((bs-tracked_requests)-lh_first == ((void *)0))' failed.

I solved this issue: bdrv_drain_all was bogus.

Best regards

Benoît

 
  block.c   |  351 ++--
  block/qapi.c  |   50 +++--
  blockdev.c|  207 ++-
  hmp.c |   36 +++-
  include/block/block.h |1 -
  include/block/block_int.h |   32 +--
  include/qemu/throttle.h   |  105 ++
  qapi-schema.json  |   40 +++-
  qemu-options.hx   |4 +-
  qmp-commands.hx   |   34 +++-
  tests/Makefile|2 +
  tests/test-throttle.c |  494 
 +
  util/Makefile.objs|1 +
  util/throttle.c   |  391 +++
  14 files changed, 1405 insertions(+), 343 deletions(-)
  create mode 100644 include/qemu/throttle.h
  create mode 100644 tests/test-throttle.c
  create mode 100644 util/throttle.c
 
 -- 
 1.7.10.4

Re: [Qemu-devel] [PATCH] pc: drop external DSDT loading

2013-08-09 Thread Anthony Liguori

Gerd Hoffmann kra...@redhat.com writes:

 On 08/08/13 18:38, Anthony Liguori wrote:
 This breaks migration and is unneeded with modern SeaBIOS.

 No.  Dropping for piix is fine.  It will break q35 though.

Can you elaborate?  When Michael and I discussed this I was under the
impression that latest SeaBIOS had full support for q35.

Regards,

Anthony Liguori


 Given that q35 can't be migrated anyway due to ahci being tagged as
 unmigratable keeping it for q35 (until the new acpi table loading is
 sorted) shouldn't hurt though.

 cheers,
   Gerd

Re: [Qemu-devel] -cpu host (was Re: KVM call minutes for 2013-08-06)

2013-08-09 Thread Peter Maydell

On 8 August 2013 13:51, Peter Maydell peter.mayd...@linaro.org wrote:
 For ARM you can't get at feature info of the host from userspace
 (unless you want to get into parsing /proc/cpuinfo), so my current
 idea is to have KVM_ARM_VCPU_INIT support a target-cpu-type
 which means whatever host CPU is.

To expand on this for the 64 bit situation:
 * although in theory we could support a 32-bit-compiled QEMU
   binary on a 64-bit host kernel, I think there's not much
   need for it
 * if you run a 64-bit QEMU on a 64-bit host and ask VCPU_INIT
   for a 'host' CPU, you get a 64 bit CPU
 * you can add the feature flag '32 bit VM please' when
   making the VCPU_INIT call, which gets you the same
   host CPU but forced into 32 bit mode (this flag  behaviour
   exist in the kernel today) -- in QEMU I guess we have a
   '-cpu host32' which drives this, or possibly add support
   for -cpu host,+32bitvm style syntax.

NB that the API for reading and writing registers isn't the
same for 64 bit CPU in 32 bit mode as for a native 32 bit
CPU -- the view of the guest that QEMU sees in the former
case is the same view that a 64 bit hypervisor sees of a
32 bit guest. I think that to avoid huge ifdefs it will
be cleaner to have
 target-arm/kvm.c [common functions]
 target-arm/kvm32.c [init_vcpu, get_registers, etc for 32 bit]
 target-arm/kvm64.c [ditto, 64 bit]

and configure only sets CONFIG_KVM for aarch64-on-aarch64
and arm-on-arm.

-- PMM

Re: [Qemu-devel] [PATCH v2 3/7] AARCH64: Add aarch64 CPU initialization, get and put registers support

2013-08-09 Thread Peter Maydell

On 23 July 2013 10:33, Mian M. Hamayun m.hama...@virtualopensystems.com wrote:
 From: Mian M. Hamayun m.hama...@virtualopensystems.com

 The cpu init function tries to initialize with all possible cpu types, as
 KVM does not provide a means to detect the real cpu type and simply refuses
 to initialize on cpu type mis-match. By using the loop based init function,
 we avoid the need to modify code if the underlying platform is different,
 such as Fast Models instead of Foundation Models.

 Get and Put Registers deal with the basic state of Aarch64 CPUs for the 
 moment.

 Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com

 Conflicts:

 target-arm/kvm.c

 Conflicts:

 target-arm/cpu.c

This sort of Conflicts note shouldn't be in a commit message.

 ---
  linux-headers/linux/kvm.h |1 +
  target-arm/kvm.c  |  125 
 +
  2 files changed, 126 insertions(+)

 diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
 index c614070..4df5292 100644
 --- a/linux-headers/linux/kvm.h
 +++ b/linux-headers/linux/kvm.h
 @@ -783,6 +783,7 @@ struct kvm_dirty_tlb {
  #define KVM_REG_IA64   0x3000ULL
  #define KVM_REG_ARM0x4000ULL
  #define KVM_REG_S390   0x5000ULL
 +#define KVM_REG_ARM64  0x6000ULL

  #define KVM_REG_SIZE_SHIFT 52
  #define KVM_REG_SIZE_MASK  0x00f0ULL

Updates to the linux-headers/ files need to:
 * be a separate patch
 * be the result of running scripts/update-linux-headers.sh
   on an upstream mainline kernel or kvm-next kernel tree
 * include the kernel tree/commit hash in the commit message

 diff --git a/target-arm/kvm.c b/target-arm/kvm.c
 index b92e00d..c96b871 100644
 --- a/target-arm/kvm.c
 +++ b/target-arm/kvm.c
 @@ -32,6 +32,11 @@
  #error mismatch between cpu.h and KVM header definitions
  #endif

 +#ifdef TARGET_AARCH64
 +#define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
 + KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
 +#endif
 +
  const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
  KVM_CAP_LAST_INFO
  };
 @@ -50,6 +55,33 @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
  return cpu-cpu_index;
  }

 +#ifdef TARGET_AARCH64

This set of #ifdefs is pretty messy. I suggest splitting kvm.c
into three:
  target-arm/kvm.c   -- anything common to both 32 and 64 bit
  target-arm/kvm32.c -- non-TARGET_AARCH64 functions
  target-arm/kvm64.c -- TARGET_AARCH64 functions

and have target-arm/Makefile.objs build only one of kvm32.c
or kvm64.c depending on whether TARGET_AARCH64 is set.

 +/* Find an appropriate target CPU type.
 + * KVM does not provide means to detect the host CPU type on aarch64,
 + * and simply refuses to initialize, if the CPU type mis-matches;
 + * so we try each possible CPU type on aarch64 before giving up! */
 +for (i = 0; i  KVM_ARM_NUM_TARGETS; ++i) {
 +init.target = kvm_arm_targets[i];
 +ret = kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_INIT, init);
 +if (!ret)
 +break;
 +}

We definitely don't want to do this -- see my notes on
'-cpu host' in another email thread.  (We should make sure
we coordinate who of you or Linaro is going to do the
-cpu host support, incidentally.)

-- PMM

Re: [Qemu-devel] [PATCH] pc: drop external DSDT loading

2013-08-09 Thread Gerd Hoffmann

On 08/09/13 14:38, Anthony Liguori wrote:
 Gerd Hoffmann kra...@redhat.com writes:
 
 On 08/08/13 18:38, Anthony Liguori wrote:
 This breaks migration and is unneeded with modern SeaBIOS.

 No.  Dropping for piix is fine.  It will break q35 though.
 
 Can you elaborate?  When Michael and I discussed this I was under the
 impression that latest SeaBIOS had full support for q35.

SeaBIOS has the piix acpi tables compiled in, for the snake of backward
compatibility with old qemu versions which don't provide the acpi tables.

With the q35 merge seabios started to provide the apci tables via
fw_cfg, for both q35 and piix, with the long-term goal to drop the
internal tables some day even for piix.

Later on we figured table loading has live migration issues (due to bios
binary being migrated but acpi tables are not, so you can end up with
mismatches).

The idea to deal with that was to simply turn off acpi table loading for
piix4, then wait for mst's acpi table patches which fix this for real
(including migration).  Continue loading the tables on q35, ignoring the
live migration issue as ahci renders q35 unmigratable anyway.

I assumed that happend already, but looks like it slipped though,
otherwise you would not have posted that patch  ...

cheers,
  Gerd

Re: [Qemu-devel] [libvirt] [PATCH] qemu: Drop qemuDomainMemoryLimit

2013-08-09 Thread Michal Privoznik

[CC'ing qemu-devel list]
On 09.08.2013 15:17, Daniel P. Berrange wrote:
 On Fri, Aug 09, 2013 at 07:13:58AM -0600, Eric Blake wrote:
 On 08/09/2013 06:56 AM, Michal Privoznik wrote:
 This function is to guess the correct limit for maximal memory
 usage by qemu for given domain. This can never be guessed
 correctly, not to mention all the pains and sleepless nights this
 code has caused. Once somebody discovers algorithm to solve the
 Halting Problem, we can compute the limit algorithmically. But
 till then, this code should never see the light of the release
 again.
 ---
  src/qemu/qemu_cgroup.c  |  3 +--
  src/qemu/qemu_command.c |  2 +-
  src/qemu/qemu_domain.c  | 49 
 -
  src/qemu/qemu_domain.h  |  2 --
  src/qemu/qemu_hotplug.c |  2 +-
  5 files changed, 3 insertions(+), 55 deletions(-)

 ACK.  Users that put an explicit limit in their XML are taking on their
 own risk at guessing correctly; all other users should not be forced to
 suffer from a bad guess on our part killing their domain.
 
 If we don't understand how to calculate a default limit that works,
 how are users with even less knowledge than us, suppose to calculate
 an explicit level of their own ?
 
 This limit was designed so that the hosts are not vulnerable to DOS
 attack from a compromised QEMU, so removing this is arguably introducing
 a security weakness in our default deployment.
 
 I think I'd like to see some feedback / agreement from QEMU developers
 that this problem really can't be solved, before we remove it.
 
 Daniel
 

In libvirt I've introduced a heuristic to guess the maximum limit for a
memory for a given VM definition. The rationale was it's better to be
safe by default and not let leaking qemu trash the host. The heuristic
is only used if user has not configured any limit himself. However, over
the time the number of users reporting OOM kills due to my heuristic has
grown. Finally, I've full nose of this problem so I've made a patch [1]
that removes this 'functionality' completely (I'd say it's bug after
all). In the patch you can see the heuristic we've converged to. But Dan
has his point. If libvirt  qemu devels aren't able to come up with
proper heuristic, how can an ordinary user (who doesn't have any
knowledge of code) do so? So before I apply my patch, I want to ask you
guys, what do you think about it.

Michal

1: https://www.redhat.com/archives/libvir-list/2013-August/msg00437.html

[Qemu-devel] Are there plans to achieve ram live Snapshot feature?

2013-08-09 Thread Chijianchun

Now in KVM, when RAM snapshot, vcpus needs stopped, it is Unfriendly 
restrictions to users.

Are there plans to achieve ram live Snapshot feature?

in my mind, Snapshots can not occupy additional too much memory, So when the 
memory needs to be changed, the old memory page is needed to flush to the file 
first.  But flushing to file is too slower than memory,  and when flushing, the 
vcpu or VM is need to be paused until finished flushing,  so 
pause...resume...pause...resume., more and more slower.

Is this idea feasible? Are there any other thoughts?

[Qemu-devel] [PATCH] default-configs: Fix A9MP and A15MP config names

2013-08-09 Thread Peter Maydell

When individual CONFIG_ switches for the A9MPcore and A15MPcore
devices were created, they were inadvertently given incorrect names
(CONFIG_ARM9MPCORE and CONFIG_ARM15MPCORE). These CPUs are
Cortex-A9MP and Cortex-A15MP, and in particular the ARM9 is
a different (rather older) CPU than the Cortex-A9. Rename the
CONFIG_ switches to bring them into line with the source file
names and CPU names.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 default-configs/arm-softmmu.mak |4 ++--
 hw/cpu/Makefile.objs|4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 27cbe3d..ac0815d 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -34,9 +34,9 @@ CONFIG_PFLASH_CFI02=y
 CONFIG_MICRODRIVE=y
 CONFIG_USB_MUSB=y
 
-CONFIG_ARM9MPCORE=y
 CONFIG_ARM11MPCORE=y
-CONFIG_ARM15MPCORE=y
+CONFIG_A9MPCORE=y
+CONFIG_A15MPCORE=y
 
 CONFIG_ARM_GIC=y
 CONFIG_ARM_GIC_KVM=$(CONFIG_KVM)
diff --git a/hw/cpu/Makefile.objs b/hw/cpu/Makefile.objs
index 4461ece..df287c1 100644
--- a/hw/cpu/Makefile.objs
+++ b/hw/cpu/Makefile.objs
@@ -1,5 +1,5 @@
 obj-$(CONFIG_ARM11MPCORE) += arm11mpcore.o
-obj-$(CONFIG_ARM9MPCORE) += a9mpcore.o
-obj-$(CONFIG_ARM15MPCORE) += a15mpcore.o
+obj-$(CONFIG_A9MPCORE) += a9mpcore.o
+obj-$(CONFIG_A15MPCORE) += a15mpcore.o
 obj-$(CONFIG_ICC_BUS) += icc_bus.o
 
-- 
1.7.9.5

Re: [Qemu-devel] [RFC] [PATCHv8 05/30] aio / timers: Add prctl(PR_SET_TIMERSLACK, 1, ...) to reduce timer slack

2013-08-09 Thread Alex Bligh


On 9 Aug 2013, at 09:53, Stefan Hajnoczi wrote:

 The ./configure change should also be in this patch.  I think it crept
 into another patch by mistake.

Oops - I will fix.

-- 
Alex Bligh

Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access

2013-08-09 Thread Anthony Liguori

Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori anth...@codemonkey.ws writes:
 I suspect this is a premature optimization.  With a weak function called
 directly in the accessors below, I suspect you would see no measurable
 performance overhead compared to this approach.

 It's all very predictable so the CPU should do a decent job optimizing
 the if () away.

 Perhaps.  I was leery of introducing performance regressions, but the
 actual I/O tends to dominate anyway.

 So I tested this, by adding the patch (below) and benchmarking
 qemu-system-i386 on my laptop before and after.

 Setup: Intel(R) Core(TM) i5 CPU   M 560  @ 2.67GHz
 (Performance cpu governer enabled)
 Guest: virtio user net, virtio block on raw file, 1 CPU, 512MB RAM.
 (Qemu run under eatmydata to eliminate syncs)

FYI, cache=unsafe is equivalent to using eatmydata.

 First test: ping -f -c 1 -q 10.0.2.0 (100 times)
 (Ping chosen since packets stay in qemu's user net code)

 BEFORE:
 MIN: 824ms
 MAX: 914ms
 AVG: 876.95ms
 STDDEV: 16ms

 AFTER:
 MIN: 872ms
 MAX: 933ms
 AVG: 904.35ms
 STDDEV: 15ms

I can reproduce this although I also see a larger standard deviation.

BEFORE:
MIN: 496
MAX: 1055
AVG: 873.22
STDEV: 136.88

AFTER:
MIN: 494
MAX: 1456
AVG: 947.77
STDEV: 150.89

In my datasets, the stdev is higher in the after case implying that
there is more variation.  Indeed, the MIN is pretty much the same.

GCC is inlining the functions, I'm still surprised that it's measurable
at all.

At any rate, I think the advantage of not increasing the amount of
target specific code outweighs the performance difference here.  As you
said, if there is real I/O, the differences isn't noticable.

Regards,

Anthony Liguori

[Qemu-devel] [PATCH] pc: compat: remove PCLMULQDQ from Westmere on pc-*-1.4 and older

2013-08-09 Thread Eduardo Habkost

commit 41cb383f42d0cb51d8e3e25e3ecebc954dd4196f made a guest-visible
change by adding the PCLMULQDQ bit to Westmere without adding
compatibility code to keep the ABI older machine-types. This patch fixes
it by adding the missing compat code.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Bug detected by the virt-test CPUID-dump comparison test case, available at:
  https://github.com/autotest/virt-test/pull/714
---
 hw/i386/pc_piix.c | 1 +
 hw/i386/pc_q35.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index ab25458..2817092 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -259,6 +259,7 @@ static void pc_init_pci_1_4(QEMUMachineInitArgs *args)
 {
 has_pvpanic = false;
 x86_cpu_compat_set_features(n270, FEAT_1_ECX, 0, CPUID_EXT_MOVBE);
+x86_cpu_compat_set_features(Westmere, FEAT_1_ECX, 0, 
CPUID_EXT_PCLMULQDQ);
 pc_init_pci_1_5(args);
 }
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 2f35d12..25c6b33 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -227,6 +227,7 @@ static void pc_q35_init_1_4(QEMUMachineInitArgs *args)
 {
 has_pvpanic = false;
 x86_cpu_compat_set_features(n270, FEAT_1_ECX, 0, CPUID_EXT_MOVBE);
+x86_cpu_compat_set_features(Westmere, FEAT_1_ECX, 0, 
CPUID_EXT_PCLMULQDQ);
 pc_q35_init_1_5(args);
 }
 
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access

2013-08-09 Thread Anthony Liguori

Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori anth...@codemonkey.ws writes:
 Daniel P. Berrange berra...@redhat.com writes:

 The distinction is important in QEMU.  ppc64 is still
 TARGET_WORDS_BIGENDIAN.  We still want most stl_phys to treat integers
 as big endian.  There's just this extra concept that CPU loads/stores
 are sometimes byte swapped.  That affects virtio but not a lot else.

 You've redefined endian here; please don't do that.  Endian is the order
 in memory which a CPU does loads and stores.  From any reasonable
 definition, PPC is bi-endian.

 It's actually a weird thing for the qemu core to know at all: almost
 everything which cares is in target-specific code.  The exceptions are
 gdb stubs and virtio, both of which are native endian (and that weird
 code in exec.c: what is notdirty_mem_write?).

 Your argument that we shouldn't fix stl_* might be justifiable (ie. just
 hack virtio and gdb as one-offs), but it's neither clear nor least
 surprise.

That's not what I'm suggesting.

I'm suggesting that we should introduce multiple variants of {ld,st}*
for different types of memory access.

These are bad names, but I'm thinking along the lines of:

/* Read a word as the load/store instructions would */
cpu_ldst_ldw()

/* Read a word as the instruction fetch unit would */
cpu_fetch_ldw()

/* Read a word as the hardware MMU would */
cpu_mmu_ldw()

Peter was suggesting that instead of having separate functions, we
should use a context:

ldw(cpu-ldst, ..)
ldw(cpu-fetch, ..)
...

I think I prefer functions though over a context.  But this is really
about TCG, not virtio.  As Ben pointed out, virtio endianness needs to
be independent of CPUs.  We process the ring outside of a specific CPU
context and it's possible that if we pick an arbitrary one, it will be
in the wrong context (if running BE userspace).

The only real problem I have with your original patch is putting virtio
knowledge in the target code.  I think your adjusted version is fine.

Regards,

Anthony Liguori


 Chers,
 Rusty.

Re: [Qemu-devel] [RFC] [PATCHv8 11/30] aio / timers: Add a notify callback to QEMUTimerList

2013-08-09 Thread Alex Bligh


On 9 Aug 2013, at 10:02, Stefan Hajnoczi wrote:

 When looking at thread-safety I had to think about set_notify_cb() for a
 while.  The issue is that we add the timerlist to the clock source's
 -timerlists *before* notify_cb has been assigned.
 
 This could be a problem is another thread re-enables the clock source
 while we are still in timerlist_new().
 
 In practice it is not an issue when AioContexts are created under the
 global mutex (that's the case today).
 
 Still, it would be a little safer to drop set_notify_cb() and instead
 pass in cb/opaque in timerlist_new().

This is good idea. I will do that.

-- 
Alex Bligh

Re: [Qemu-devel] [RFC] [PATCHv8 23/30] aio / timers: Rearrange timer.h make legacy functions call non-legacy

2013-08-09 Thread Alex Bligh


On 9 Aug 2013, at 10:23, Stefan Hajnoczi wrote:

 On Thu, Aug 08, 2013 at 10:42:20PM +0100, Alex Bligh wrote:
 @@ -269,17 +299,17 @@ bool timerlist_expired(QEMUTimerList *timer_list);
 int64_t timerlist_deadline_ns(QEMUTimerList *timer_list);
 
 /**
 - * timerlist_getclock:
 + * timerlist_get_clock:
  * @timer_list: the timer list to operate on
  *
  * Read the clock value associated with a timer list.
  * The clock value is normally in nanoseconds, but may
  * not be in some instances (e.g. vm_clock with use_icount).
 
 The documentation is wrong.  This function does not get the clock value
 in nanoseconds.

I will fix.

 +/**
 + * timer_put:
 + * @f: the file
 + * @ts: the timer
 + */
 +void timer_put(QEMUFile *f, QEMUTimer *ts);
 
 The struct is still called QEMUTimer, should that also be renamed?

I could do but it might produce another patch too large for the
mailing list. Suggest we leave it as it is for now.

-- 
Alex Bligh

Re: [Qemu-devel] [RFC] [PATCHv8 00/30] aio / timers: Add AioContext timers and use ppoll

2013-08-09 Thread Alex Bligh

Stefan,

On 9 Aug 2013, at 10:41, Stefan Hajnoczi wrote:

 Patch 29/30 didn't make it to my inbox.  It may have been bounced due to
 size.  Using your git repo instead.

I can break this up into 1 patch per file if you prefer, which is
what I was originally going to do. The automated program does that
and produces git comments. That's about 130 commits I think. Would
you prefer that?

-- 
Alex Bligh

1 2 3 >

1 - 100 of 208 matches

Mail list logo