Re: [Qemu-devel] [RFC] [PATCHv8 09/30] aio / timers: Add QEMUTimerListGroup and helper functions

2013-08-11 Thread Paolo Bonzini
Il 10/08/2013 13:05, Alex Bligh ha scritto:
> Despite the fact that we both dislike the name TimerListGroup, I
> think the way to go here is (1). (2) does not really save lines
> of code (certainly not compiled instructions) - its main saving
> is removing a pile of commenting from include/qemu/timer.h,
> which makes things more opaque.
> 
> I also think there may well be a use for something that wants
> to use timers but not AioContext (I am thinking for instance
> of a thread that does not do block IO). This permits that,
> but does not require it.

There is actually a disadvantage to moving TimerListGroup into AioContext:
GSources can only work with millisecond resolution.  Thus you would
still need some special casing of the "timer AioContext" to get the
deadline in nanoseconds.

So let's keep the TimerListGroup for now.

Paolo

> WDYT?





Re: [Qemu-devel] [RFC] [PATCHv8 09/30] aio / timers: Add QEMUTimerListGroup and helper functions

2013-08-11 Thread Alex Bligh

Paolo,

--On 11 August 2013 09:53:38 +0200 Paolo Bonzini wrote:



> There is actually a disadvantage to moving TimerListGroup into AioContext:
> GSources can only work with millisecond resolution.  Thus you would
> still need some special casing of the "timer AioContext" to get the
> deadline in nanoseconds.


We also need to special-case the notifier, as it needs to call qemu_notify()
rather than aio_notify().
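
(Roughly what that means - the notify path needs a fallback; the field
and function names in this sketch are only illustrative, it is not the
actual code from the series:)

    /* Sketch, assuming the series' QEMUTimerList type: timer lists owned
     * by an AioContext carry a notify callback (which does aio_notify()),
     * while the default/main-loop lists have none and must fall back to
     * qemu_notify_event(). */
    static void timerlist_notify_sketch(QEMUTimerList *timer_list)
    {
        if (timer_list->notify_cb) {
            timer_list->notify_cb(timer_list->notify_opaque);   /* e.g. aio_notify(ctx) */
        } else {
            qemu_notify_event();                                /* wake the main loop */
        }
    }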


> So let's keep the TimerListGroup for now.


OK - do you want me to wrap it in a struct? Other than that I think I've
addressed all the comments on v8. Happy to do that in v10 if there are
other comments on v9.

I note no one has yet commented on the changes to the icount stuff
where a timeout is apparently arbitrarily capped at 2^31 ns (about
2.1 seconds) in PATCHv9 19/31 - attached below. That's the area
I'm worried about as I'm not sure I understood the code.
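
(For reference, the "about 2.1 seconds" is simply INT32_MAX expressed in
nanoseconds - a standalone check, nothing QEMU-specific:)

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* The cap in the hunks below is INT32_MAX nanoseconds. */
        printf("%.3f s\n", INT32_MAX / 1e9);    /* prints 2.147 */
        return 0;
    }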

--
Alex Bligh



-- Forwarded Message --

Subject: [RFC] [PATCHv9 19/31] aio / timers: Use all timerlists in icount 
warp calculations


...

For compatibility, maintain an apparent bug whereby, when using
icount, if no vm_clock timer was set, qemu_clock_deadline
would return INT32_MAX and always set an icount clock expiry
about 2 seconds ahead.

...

diff --git a/cpus.c b/cpus.c
index 0f65e76..673d506 100644
--- a/cpus.c
+++ b/cpus.c

...

@@ -314,7 +314,18 @@ void qemu_clock_warp(QEMUClock *clock)
}

vm_clock_warp_start = qemu_get_clock_ns(rt_clock);
-deadline = qemu_clock_deadline(vm_clock);
+/* We want to use the earliest deadline from ALL vm_clocks */
+deadline = qemu_clock_deadline_ns_all(vm_clock);
+
+/* Maintain prior (possibly buggy) behaviour where if no deadline
+ * was set (as there is no vm_clock timer) or it is more than
+ * INT32_MAX nanoseconds ahead, we still use INT32_MAX
+ * nanoseconds.
+ */
+if ((deadline < 0) || (deadline > INT32_MAX)) {
+deadline = INT32_MAX;    <-- THIS
+}
+
if (deadline > 0) {
/*
 * Ensure the vm_clock proceeds even when the virtual CPU goes to
@@ -333,8 +344,8 @@ void qemu_clock_warp(QEMUClock *clock)
 * packets continuously instead of every 100ms.
 */
qemu_mod_timer(icount_warp_timer, vm_clock_warp_start + deadline);
-} else {
-qemu_notify_event();
+} else if (deadline == 0) {    <-- AND THIS
+qemu_clock_notify(vm_clock);
}
}

...

@@ -1145,11 +1161,23 @@ static int tcg_cpu_exec(CPUArchState *env)
#endif
if (use_icount) {
int64_t count;
+int64_t deadline;
int decr;
qemu_icount -= (env->icount_decr.u16.low + env->icount_extra);
env->icount_decr.u16.low = 0;
env->icount_extra = 0;
-count = qemu_icount_round(qemu_clock_deadline(vm_clock));
+deadline = qemu_clock_deadline_ns_all(vm_clock);
+
+/* Maintain prior (possibly buggy) behaviour where if no deadline
+ * was set (as there is no vm_clock timer) or it is more than
+ * INT32_MAX nanoseconds ahead, we still use INT32_MAX
+ * nanoseconds.
+ */
+if ((deadline < 0) || (deadline > INT32_MAX)) {
+deadline = INT32_MAX;    <-- AND THIS
+}
+
+count = qemu_icount_round(deadline);
qemu_icount += count;
decr = (count > 0xffff) ? 0xffff : count;
count -= decr;

--
Alex Bligh



Re: [Qemu-devel] [PATCH 0/2] Disassembly with external objdump

2013-08-11 Thread Blue Swirl
On Fri, Aug 9, 2013 at 7:19 PM, Richard Henderson  wrote:
> We have one host platform (aarch64), and three target platforms
> (openrisc, unicore32, xtensa) with no built-in disassembly support,
> thanks largely to gplv3 silliness.
>
> Here's a first-cut at handling these cases with an external tool.
> The qemu-produced dump file contains just a hex dump of bytes, and
> a perl script is provided to pass those bytes through objdump.
>
> I've lightly tested this with aarch64 host running on Foundation.
> Feedback appreciated.

Nice idea, now that QEMU is more easily portable to new host platforms.

>
>
> r~
>
>
> Richard Henderson (2):
>   disas: Implement fallback to dump object code as hex
>   disas: Add disas-objdump.pl
>
>  disas.c  | 46 +++--
>  scripts/disas-objdump.pl | 87 
> 
>  2 files changed, 123 insertions(+), 10 deletions(-)
>  create mode 100755 scripts/disas-objdump.pl
>
> --
> 1.8.3.1
>
>



[Qemu-devel] [VAC] mjt is at vacation

2013-08-11 Thread Michael Tokarev

We're going on a (hopefully) 3-week vacation starting
today, and I don't expect any internet access there.
So I won't be able to handle trivial-patches during
this time (till Aug-2013).

Thanks,

/mjt



Re: [Qemu-devel] [PATCH for-1.6 1/2] don't create pvpanic device by default.

2013-08-11 Thread Michael S. Tsirkin
On Fri, Aug 02, 2013 at 10:27:31AM +0200, Paolo Bonzini wrote:
> On 08/02/2013 09:04 AM, Hu Tao wrote:
> >The problem with pvpanic being an internal device is that VMs running
> >operating systems without a driver for this device will have problems
> >when qemu will be upgraded (from qemu without this pvpanic).
> >
> >The outcome may be, for example: in Windows(let's say XP) the Device
> >manager will open a "new device" wizard and the device will appear as
> >an unrecognized device. On a cluster with hundreds of such VMs, If
> >that cluster has a health monitoring service it may show all the VMs
> >in a "not healthy" state.
> >
> >Reported-by: Marcel Apfelbaum 
> >Signed-off-by: Hu Tao 
> 
> NACK,
> 
> this is premature.  It is fundamentally a firmware problem.
> 
> We have time to apply an even smaller patch that doesn't set
> has_pvpanic to true, and delay the whole feature to 1.7, if we do
> not fix the firmware in the next two weeks.
> 
> Paolo

I think this is not just a firmware problem.  Adding the device by default
was too rushed; the assumption was that the risk of guest bugs was zero.

We are now seeing problems with bios guest code and with linux guest
drivers as well.  Yes they all can be fixed, but we simply shouldn't
force this risk of broken guests on everyone.

libvirt is the main user, and libvirt people
indicated their preference for creating the device with
-device pvpanic rather than a built-in one that
can't be removed.

So please reconsider, and here's an ack from me.

Acked-by: Michael S. Tsirkin 





[Qemu-devel] [Bug 1087114] Re: assertion "QLIST_EMPTY(&bs->tracked_requests)" failed

2013-08-11 Thread Rainer Müller
I was unable to reproduce the original issue on Mac OS X 10.8.4 using
the current master. However, I was also unable to reproduce the original
issue on the stable-1.5 branch which does not have the fix by Izumi
Tsutsui linked above. As this second fix is only for a problem that
appears in certain load situations, of course I might not be able to
reproduce it.

I also reviewed the code on master and I am confident that the solution
is correct now.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1087114

Title:
  assertion "QLIST_EMPTY(&bs->tracked_requests)" failed

Status in QEMU:
  New

Bug description:
  QEMU 1.3.0 on OpenBSD now crashes with an error as shown below and the
  command line params do not seem to matter.

  assertion "QLIST_EMPTY(&bs->tracked_requests)" failed: file "block.c",
  line 1220, function "bdrv_drain_all"

  #1  0x030d1bce24aa in abort () at /usr/src/lib/libc/stdlib/abort.c:70
  p = (struct atexit *) 0x30d11897000
  mask = 4294967263
  cleanup_called = 1
  #2  0x030d1bc5ff44 in __assert2 (file=Variable "file" is not available.
  ) at /usr/src/lib/libc/gen/assert.c:52
  No locals.
  #3  0x030b0d383a03 in bdrv_drain_all () at block.c:1220
  bs = (BlockDriverState *) 0x30d13f3b630
  busy = false
  __func__ = "bdrv_drain_all"
  #4  0x030b0d43acfc in bmdma_cmd_writeb (bm=0x30d0f5f56a8, val=8) at 
hw/ide/pci.c:312
  __func__ = "bmdma_cmd_writeb"
  #5  0x030b0d43b450 in bmdma_write (opaque=0x30d0f5f56a8, addr=0, val=8, 
size=1) at hw/ide/piix.c:76
  bm = (BMDMAState *) 0x30d0f5f56a8
  #6  0x030b0d5c2ce6 in memory_region_write_accessor (opaque=0x30d0f5f57d0, 
addr=0, value=0x30d18c288f0, size=1, shift=0, mask=255)
  at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/memory.c:334
  mr = (MemoryRegion *) 0x30d0f5f57d0
  tmp = 8
  #7  0x030b0d5c2dc5 in access_with_adjusted_size (addr=0, 
value=0x30d18c288f0, size=1, access_size_min=1, access_size_max=4, 
  access=0x30b0d5c2c6b , 
opaque=0x30d0f5f57d0) at 
/home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/memory.c:364
  access_mask = 255
  access_size = 1
  i = 0
  #8  0x030b0d5c3222 in memory_region_iorange_write (iorange=0x30d1d5e7400, 
offset=0, width=1, data=8)
  at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/memory.c:439
  mrio = (MemoryRegionIORange *) 0x30d1d5e7400
  mr = (MemoryRegion *) 0x30d0f5f57d0
  __func__ = "memory_region_iorange_write"
  #9  0x030b0d5c019a in ioport_writeb_thunk (opaque=0x30d1d5e7400, 
addr=49216, data=8) at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/ioport.c:212
  ioport = (IORange *) 0x30d1d5e7400
  #10 0x030b0d5bfb65 in ioport_write (index=0, address=49216, data=8) at 
/home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/ioport.c:83
  func = (IOPortWriteFunc *) 0x30b0d5c0148 
  default_func = {0x30b0d5bfbbc , 0x30b0d5bfc61 
, 0x30b0d5bfd0c }
  #11 0x030b0d5c0704 in cpu_outb (addr=49216, val=8 '\b') at 
/home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/ioport.c:289
  No locals.
  #12 0x030b0d6067dd in helper_outb (port=49216, data=8) at 
/home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/target-i386/misc_helper.c:72
  No locals.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1087114/+subscriptions



[Qemu-devel] [PATCH v2] virtio-serial: Do not notify virtqueue if no element was pushed back.

2013-08-11 Thread Gal Hammer
The redundant notification caused the Windows driver to duplicate the
pending write request's buffer. The driver was fixed, but I think this
change is still required.

Signed-off-by: Gal Hammer 
---
 hw/char/virtio-serial-bus.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index da417c7..0d38b4b 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -105,6 +105,7 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
VirtQueue *vq,
  VirtIODevice *vdev)
 {
 VirtIOSerialPortClass *vsc;
+bool elem_pushed = false;
 
 assert(port);
 assert(virtio_queue_ready(vq));
@@ -145,9 +146,12 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
VirtQueue *vq,
 break;
 }
 virtqueue_push(vq, &port->elem, 0);
+elem_pushed = true;
 port->elem.out_num = 0;
 }
-virtio_notify(vdev, vq);
+if (elem_pushed) {
+virtio_notify(vdev, vq);
+}
 }
 
 static void flush_queued_data(VirtIOSerialPort *port)
-- 
1.8.1.4




Re: [Qemu-devel] [PATCH v2 for-1.6 1/2] hw/virtio/virtio: Don't allow guests to add/remove queues

2013-08-11 Thread Michael S. Tsirkin
On Fri, Jul 26, 2013 at 04:41:27PM +0100, Peter Maydell wrote:
> A queue size of 0 is used to indicate a nonexistent queue, so
> don't allow the guest to flip a queue between zero-size and
> non-zero-size. Don't permit setting of negative queue sizes
> either.
> 
> Signed-off-by: Peter Maydell 


Reviewed-by: Michael S. Tsirkin 

> ---
>  hw/virtio/virtio.c |   12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 09f62c6..60653f7 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -673,10 +673,16 @@ hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
>  
>  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
>  {
> -if (num <= VIRTQUEUE_MAX_SIZE) {
> -vdev->vq[n].vring.num = num;
> -virtqueue_init(&vdev->vq[n]);
> +/* Don't allow guest to flip queue between existent and
> + * nonexistent states, or to set it to an invalid size.
> + */
> +if (!!num != !!vdev->vq[n].vring.num ||
> +num > VIRTQUEUE_MAX_SIZE ||
> +num < 0) {
> +return;
>  }
> +vdev->vq[n].vring.num = num;
> +virtqueue_init(&vdev->vq[n]);
>  }
>  
>  int virtio_queue_get_num(VirtIODevice *vdev, int n)
> -- 
> 1.7.9.5



Re: [Qemu-devel] [PATCH v2 for-1.6 2/2] hw/virtio/virtio-mmio: Make QueueNumMax read 0 for unavailable queues

2013-08-11 Thread Michael S. Tsirkin
On Fri, Jul 26, 2013 at 04:41:28PM +0100, Peter Maydell wrote:
> The virtio-mmio spec says that QueueNumMax must read zero for queues
> which are unavailable; implement this, rather than always returning
> VIRTQUEUE_MAX_SIZE.
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Michael S. Tsirkin 


> ---
>  hw/virtio/virtio-mmio.c |3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
> index 54d6679..9cf79ce 100644
> --- a/hw/virtio/virtio-mmio.c
> +++ b/hw/virtio/virtio-mmio.c
> @@ -151,6 +151,9 @@ static uint64_t virtio_mmio_read(void *opaque, hwaddr 
> offset, unsigned size)
>  }
>  return proxy->host_features;
>  case VIRTIO_MMIO_QUEUENUMMAX:
> +if (!virtio_queue_get_num(vdev, vdev->queue_sel)) {
> +return 0;
> +}
>  return VIRTQUEUE_MAX_SIZE;
>  case VIRTIO_MMIO_QUEUEPFN:
>  return virtio_queue_get_addr(vdev, vdev->queue_sel)
> -- 
> 1.7.9.5



Re: [Qemu-devel] [PATCH v2 for-1.6 0/2] virtio-mmio: fixes to QueueNum, QueueNumMax

2013-08-11 Thread Michael S. Tsirkin
On Fri, Aug 09, 2013 at 04:47:40PM +0100, Peter Maydell wrote:
> On 26 July 2013 16:41, Peter Maydell  wrote:
> > These patches fix a couple of bugs in virtio-mmio's
> > handling of the registers that deal with the queue size:
> >
> >  * as mst points out, letting the guest flip a queue between
> >"exists" and "doesn't exist" is a bad idea
> >  * QueueNumMax wasn't reading the correct value for nonexistent
> >queues
> >
> > This doesn't include any change to the behaviour of queuesize
> > on reset (discussed in other thread); the current behaviour is
> > not a problem for well-behaved guests, and safe in the face
> > of badly-behaved guests, and currently improving the reset
> > behaviour is blocked by an unrelated bug.
> >
> > v1->v2: changes as per mst review:
> >  * avoid explicit "== 0" comparisons
> >  * avoid unnecessary parens round comparison ops
> >  * do the "don't flip between existent and nonexistent" check
> >with "!!num != !!oldnum" (and add a comment noting why we're
> >doing this check)
> >
> > Peter Maydell (2):
> >   hw/virtio/virtio: Don't allow guests to add/remove queues
> >   hw/virtio/virtio-mmio: Make QueueNumMax read 0 for unavailable queues
> 
> These didn't make it into 1.6, but in the absence of any
> review comments I'm putting them into arm-devs for post-1.6.
> 
> thanks
> -- PMM

I'd say these are important bugfixes, should be OK for 1.6 still.




Re: [Qemu-devel] [PATCH v3] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-11 Thread Michael S. Tsirkin
On Sat, Aug 10, 2013 at 01:09:08AM +1000, Alexey Kardashevskiy wrote:
> A PCI device's DMA address space (possibly an IOMMU) is returned by a
> method on the PCIBus.  At the moment that only has one caller, so the
> method is simply open coded.  We'll need another caller for VFIO, so
> this patch introduces a helper/wrapper function.
> 
> If IOMMU is not set, the pci_device_iommu_address_space() function
> returns the parent's IOMMU skipping the "bus master" address space as
> otherwise proper emulation would require more effort for no benefit.
> 
> Signed-off-by: David Gibson 
> [aik: added inheritance from parent if iommu is not set for the current bus]
> Signed-off-by: Alexey Kardashevskiy 
> 
> ---
> Changes:
> v3:
> * added comment about ignoring bus master address space
> 
> v2:
> * added inheritance, needed for a pci-bridge on spapr-ppc64
> * pci_iommu_as renamed to pci_device_iommu_address_space
> ---
>  hw/pci/pci.c | 24 ++--
>  include/hw/pci/pci.h |  1 +
>  2 files changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 4c004f5..dbfa395 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
> *pci_dev, PCIBus *bus,
>  }
>  
>  pci_dev->bus = bus;
> -if (bus->iommu_fn) {
> -dma_as = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
> -} else {
> -/* FIXME: inherit memory region from bus creator */
> -dma_as = &address_space_memory;
> -}
> +dma_as = pci_device_iommu_address_space(pci_dev);
>  
>  memory_region_init_alias(&pci_dev->bus_master_enable_region,
>   OBJECT(pci_dev), "bus master",
> @@ -2239,6 +2234,23 @@ static void pci_device_class_init(ObjectClass *klass, 
> void *data)
>  k->props = pci_props;
>  }
>  
> +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
> +{
> +PCIBus *bus = PCI_BUS(dev->bus);
> +
> +if (bus->iommu_fn) {
> +return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
> +}
> +
> +if (bus->parent_dev) {
> +/** We are ignoring the bus master DMA bit of the bridge
> + *  as it would complicate things such as VFIO for no good reason */

/*
 * Always
 * like
 * this
 */

/** Never
 * like this */

The comment should be improved I think.
I would put it like this:
/*
 * Note: this does not check bus master enable bit on device or
 * any of the pci to pci bridges above it, it's up to the caller to
 * check that before initiating the transaction.
 *
 * TODO: design a mechanism for callers to do this without
 * doing bus scans on data path.
 */

Would you like me to queue this on the pci tree? If yes I can
tweak the comment myself, no need to repost.

> +return pci_device_iommu_address_space(bus->parent_dev);
> +}
> +
> +return &address_space_memory;
> +}
> +
>  void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
>  {
>  bus->iommu_fn = fn;
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index ccec2ba..2374aa9 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -405,6 +405,7 @@ void pci_device_deassert_intx(PCIDevice *dev);
>  
>  typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
>  
> +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
>  void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
>  
>  static inline void
> -- 
> 1.8.3.2



Re: [Qemu-devel] peculiar make check problems: tests/libqtest.c

2013-08-11 Thread Andreas Färber
Hi,

Am 10.08.2013 13:20, schrieb Alex Bligh:
> Occasionally running make check I am seeing the following opaque error:
> ERROR:tests/libqtest.c:69:init_socket: assertion failed (ret != -1): (-1
> != -1)
> 
> Rerunning it it runs clean. Any ideas?

Most likely a previous qtest will have failed, leaving temporary files
with that pid behind and your current test happens to get the same pid
again. Check /tmp/qtest*.

Cheers,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH v3] pci: Introduce helper to retrieve a PCI device's DMA address space

2013-08-11 Thread Alexey Kardashevskiy
On 08/11/2013 11:58 PM, Michael S. Tsirkin wrote:
> On Sat, Aug 10, 2013 at 01:09:08AM +1000, Alexey Kardashevskiy wrote:
>> A PCI device's DMA address space (possibly an IOMMU) is returned by a
>> method on the PCIBus.  At the moment that only has one caller, so the
>> method is simply open coded.  We'll need another caller for VFIO, so
>> this patch introduces a helper/wrapper function.
>>
>> If IOMMU is not set, the pci_device_iommu_address_space() function
>> returns the parent's IOMMU skipping the "bus master" address space as
>> otherwise proper emulation would require more effort for no benefit.
>>
>> Signed-off-by: David Gibson 
>> [aik: added inheritance from parent if iommu is not set for the current bus]
>> Signed-off-by: Alexey Kardashevskiy 
>>
>> ---
>> Changes:
>> v3:
>> * added comment about ignoring bus master address space
>>
>> v2:
>> * added inheritance, needed for a pci-bridge on spapr-ppc64
>> * pci_iommu_as renamed to pci_device_iommu_address_space
>> ---
>>  hw/pci/pci.c | 24 ++--
>>  include/hw/pci/pci.h |  1 +
>>  2 files changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index 4c004f5..dbfa395 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
>> *pci_dev, PCIBus *bus,
>>  }
>>  
>>  pci_dev->bus = bus;
>> -if (bus->iommu_fn) {
>> -dma_as = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
>> -} else {
>> -/* FIXME: inherit memory region from bus creator */
>> -dma_as = &address_space_memory;
>> -}
>> +dma_as = pci_device_iommu_address_space(pci_dev);
>>  
>>  memory_region_init_alias(&pci_dev->bus_master_enable_region,
>>   OBJECT(pci_dev), "bus master",
>> @@ -2239,6 +2234,23 @@ static void pci_device_class_init(ObjectClass *klass, 
>> void *data)
>>  k->props = pci_props;
>>  }
>>  
>> +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>> +{
>> +PCIBus *bus = PCI_BUS(dev->bus);
>> +
>> +if (bus->iommu_fn) {
>> +return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>> +}
>> +
>> +if (bus->parent_dev) {
>> +/** We are ignoring the bus master DMA bit of the bridge
>> + *  as it would complicate things such as VFIO for no good reason */
> 
> /*
>  * Always
>  * like
>  * this
>  */
> 
> /** Never
>  * like this */


Hm. I thought I saw a lot of those, but that was the kernel :)
btw many comments start with "/**" (with no text on that line, but still) -
what is the difference from "/*"?


> The comment should be improved I think.
> I would put it like this:
> /*
>  * Note: this does not check bus master enable bit on device or
>  * any of the pci to pci bridges above it, it's up to the caller to
>  * check that before initiating the transaction.
>  *
>  * TODO: design a mechanism for callers to do this without
>  * doing bus scans on data path.
>  */

What exactly do you mean by "bus scans" here?


> Would you like me to queue this on the pci tree? If yes I can
> tweak the comment myself, no need to repost.

Yes, please. Your tree is fine. Thanks!


>> +return pci_device_iommu_address_space(bus->parent_dev);
>> +}
>> +
>> +return &address_space_memory;
>> +}
>> +
>>  void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
>>  {
>>  bus->iommu_fn = fn;
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index ccec2ba..2374aa9 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -405,6 +405,7 @@ void pci_device_deassert_intx(PCIDevice *dev);
>>  
>>  typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
>>  
>> +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
>>  void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
>>  
>>  static inline void



-- 
Alexey



Re: [Qemu-devel] [PATCH for-1.6 1/2] don't create pvpanic device by default.

2013-08-11 Thread Andreas Färber
Am 11.08.2013 12:33, schrieb Michael S. Tsirkin:
> On Fri, Aug 02, 2013 at 10:27:31AM +0200, Paolo Bonzini wrote:
>> On 08/02/2013 09:04 AM, Hu Tao wrote:
>>> The problem with pvpanic being an internal device is that VMs running
>>> operating systems without a driver for this device will have problems
>>> when qemu will be upgraded (from qemu without this pvpanic).
>>>
>>> The outcome may be, for example: in Windows(let's say XP) the Device
>>> manager will open a "new device" wizard and the device will appear as
>>> an unrecognized device. On a cluster with hundreds of such VMs, If
>>> that cluster has a health monitoring service it may show all the VMs
>>> in a "not healthy" state.
>>>
>>> Reported-by: Marcel Apfelbaum 
>>> Signed-off-by: Hu Tao 
>>
>> NACK,
>>
>> this is premature.  It is fundamentally a firmware problem.
>>
>> We have time to apply an even smaller patch that doesn't set
>> has_pvpanic to true, and delay the whole feature to 1.7, if we do
>> not fix the firmware in the next two weeks.
>>
>> Paolo
> 
> I think this is not just a firmware problem.  Adding device by default
> was too rush, assumption was risk of guest bugs was 0.
> 
> We are now seeing problems with bios guest code and with linux guest
> drivers as well.  Yes they all can be fixed, but we simply shouldn't
> force this risk of broken guests on everyone.
> 
> libvirt is the main user and libvirt people
> indicated their preference to creating device with
> -device pvpanic rather than a built-in one that
> can't be removed.
> 
> So please reconsider, and here's an ack from me.
> 
> Acked-by: Michael S. Tsirkin 

NACK for this v1: As pointed out on the KVM call, we still need to keep
the pvpanic device around by default for pc-*-1.5. Removing has_pvpanic
completely therefore seems wrong. Can you submit a v2 for rc3 tomorrow?

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



[Qemu-devel] [Bug 1210212] Re: qemu core dumps with -serial mon:vc

2013-08-11 Thread Lei Li
Hi,

This problem has been solved by

commit 7b7ab18d0b9769b5f39e663fa55caed461b1202e:
Author: Michael Roth 
Date:   Tue Jul 30 13:04:22 2013 -0500

chardev: fix CHR_EVENT_OPENED events for mux chardevs

Patch link:
http://patchwork.ozlabs.org/patch/263458/


** Changed in: qemu
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1210212

Title:
  qemu core dumps with -serial mon:vc

Status in QEMU:
  Fix Committed

Bug description:
  qemu 1.5.2-1 dumps core when asked to put the monitor on a virtual
  console.  For example, suppose you want to monitor the second serial
  port, you might try something like:

  qemu-system-x86_64 -serial null -serial mon:vc

  But that creates a core dump.  In fact, even re-creating what should
  be the default dumps core:

  $ qemu-system-x86_64 -serial mon:vc:80Cx25C
  Segmentation fault (core dumped)

  I'm not including a backtrace because the bug is so easy to reproduce,
  but I can provide more info if necessary.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1210212/+subscriptions



Re: [Qemu-devel] peculiar make check problems: tests/libqtest.c

2013-08-11 Thread Alex Bligh

On 11 Aug 2013, at 15:22, Andreas Färber wrote:

> Am 10.08.2013 13:20, schrieb Alex Bligh:
>> Occasionally running make check I am seeing the following opaque error:
>> ERROR:tests/libqtest.c:69:init_socket: assertion failed (ret != -1): (-1
>> != -1)
>> 
>> Rerunning it it runs clean. Any ideas?
> 
> Most likely a previous qtest will have failed, leaving temporary files
> with that pid behind and your current test happens to get the same pid
> again. Check /tmp/qtest*.

That was it.

-- 
Alex Bligh







[Qemu-devel] [PATCH for-1.6 V2 1/2] hw/misc: don't create pvpanic device by default

2013-08-11 Thread Marcel Apfelbaum
This patch is based on Hu Tao's:
http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00124.html

There is no need to hard-code pvpanic as part of the machine.
It can be added with "-device pvpanic" from the command line (see the next patch).
However, for backward compatibility it is still part of the 1.5
machine types.

Signed-off-by: Marcel Apfelbaum 
---
Changes from v1:
 - Keep the pvpanic device enabled by default for 1.5
   for backward compatibility

 hw/i386/pc_piix.c | 9 -
 hw/i386/pc_q35.c  | 7 ---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index ab25458..679d2e5 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -56,7 +56,7 @@ static const int ide_iobase[MAX_IDE_BUS] = { 0x1f0, 0x170 };
 static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 };
 static const int ide_irq[MAX_IDE_BUS] = { 14, 15 };
 
-static bool has_pvpanic = true;
+static bool has_pvpanic;
 static bool has_pci_info = true;
 
 /* PC hardware initialisation */
@@ -252,14 +252,15 @@ static void pc_init_pci(QEMUMachineInitArgs *args)
 static void pc_init_pci_1_5(QEMUMachineInitArgs *args)
 {
 has_pci_info = false;
+has_pvpanic = true;
 pc_init_pci(args);
 }
 
 static void pc_init_pci_1_4(QEMUMachineInitArgs *args)
 {
-has_pvpanic = false;
 x86_cpu_compat_set_features("n270", FEAT_1_ECX, 0, CPUID_EXT_MOVBE);
-pc_init_pci_1_5(args);
+has_pci_info = false;
+pc_init_pci(args);
 }
 
 static void pc_init_pci_1_3(QEMUMachineInitArgs *args)
@@ -290,7 +291,6 @@ static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs 
*args)
 const char *kernel_cmdline = args->kernel_cmdline;
 const char *initrd_filename = args->initrd_filename;
 const char *boot_device = args->boot_device;
-has_pvpanic = false;
 has_pci_info = false;
 disable_kvm_pv_eoi();
 enable_compat_apic_id_mode();
@@ -309,7 +309,6 @@ static void pc_init_isa(QEMUMachineInitArgs *args)
 const char *kernel_cmdline = args->kernel_cmdline;
 const char *initrd_filename = args->initrd_filename;
 const char *boot_device = args->boot_device;
-has_pvpanic = false;
 has_pci_info = false;
 if (cpu_model == NULL)
 cpu_model = "486";
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 2f35d12..d2bb248 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -46,7 +46,7 @@
 /* ICH9 AHCI has 6 ports */
 #define MAX_SATA_PORTS 6
 
-static bool has_pvpanic = true;
+static bool has_pvpanic;
 static bool has_pci_info = true;
 
 /* PC hardware initialisation */
@@ -220,14 +220,15 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 static void pc_q35_init_1_5(QEMUMachineInitArgs *args)
 {
 has_pci_info = false;
+has_pvpanic = true;
 pc_q35_init(args);
 }
 
 static void pc_q35_init_1_4(QEMUMachineInitArgs *args)
 {
-has_pvpanic = false;
 x86_cpu_compat_set_features("n270", FEAT_1_ECX, 0, CPUID_EXT_MOVBE);
-pc_q35_init_1_5(args);
+has_pci_info = false;
+pc_q35_init(args);
 }
 
 static QEMUMachine pc_q35_machine_v1_6 = {
-- 
1.8.3.1




[Qemu-devel] [PATCH for-1.6 V2 2/2] hw/misc: make pvpanic known to user

2013-08-11 Thread Marcel Apfelbaum
This patch is based on Hu Tao's:
http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00125.html

The pvpanic device may now be enabled with "-device pvpanic"
from the command line.

Signed-off-by: Marcel Apfelbaum 
---
Changes from V1:
 - Addressed Andreas Färber's review (removed bus type)
 - Small changes to make it possible to enable pvpanic
   both from the command line and from machine_init
 - Added pvpanic to MISC category

 hw/misc/pvpanic.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
index 7bb49a5..b64e3bb 100644
--- a/hw/misc/pvpanic.c
+++ b/hw/misc/pvpanic.c
@@ -97,29 +97,24 @@ static void pvpanic_isa_realizefn(DeviceState *dev, Error 
**errp)
 {
 ISADevice *d = ISA_DEVICE(dev);
 PVPanicState *s = ISA_PVPANIC_DEVICE(dev);
+FWCfgState *fw_cfg = fw_cfg_find();
+uint16_t *pvpanic_port;
 
-isa_register_ioport(d, &s->io, s->ioport);
-}
+if (!fw_cfg) {
+return;
+}
 
-static void pvpanic_fw_cfg(ISADevice *dev, FWCfgState *fw_cfg)
-{
-PVPanicState *s = ISA_PVPANIC_DEVICE(dev);
-uint16_t *pvpanic_port = g_malloc(sizeof(*pvpanic_port));
+pvpanic_port = g_malloc(sizeof(*pvpanic_port));
 *pvpanic_port = cpu_to_le16(s->ioport);
-
 fw_cfg_add_file(fw_cfg, "etc/pvpanic-port", pvpanic_port,
 sizeof(*pvpanic_port));
+
+isa_register_ioport(d, &s->io, s->ioport);
 }
 
 void pvpanic_init(ISABus *bus)
 {
-ISADevice *dev;
-FWCfgState *fw_cfg = fw_cfg_find();
-if (!fw_cfg) {
-return;
-}
-dev = isa_create_simple (bus, TYPE_ISA_PVPANIC_DEVICE);
-pvpanic_fw_cfg(dev, fw_cfg);
+isa_create_simple(bus, TYPE_ISA_PVPANIC_DEVICE);
 }
 
 static Property pvpanic_isa_properties[] = {
@@ -132,8 +127,8 @@ static void pvpanic_isa_class_init(ObjectClass *klass, void 
*data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 
 dc->realize = pvpanic_isa_realizefn;
-dc->no_user = 1;
 dc->props = pvpanic_isa_properties;
+set_bit(DEVICE_CATEGORY_MISC, dc->categories);
 }
 
 static TypeInfo pvpanic_isa_info = {
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH for-1.6 1/2] don't create pvpanic device by default.

2013-08-11 Thread Michael S. Tsirkin
On Sun, Aug 11, 2013 at 04:45:03PM +0200, Andreas Färber wrote:
> Am 11.08.2013 12:33, schrieb Michael S. Tsirkin:
> > On Fri, Aug 02, 2013 at 10:27:31AM +0200, Paolo Bonzini wrote:
> >> On 08/02/2013 09:04 AM, Hu Tao wrote:
> >>> The problem with pvpanic being an internal device is that VMs running
> >>> operating systems without a driver for this device will have problems
> >>> when qemu will be upgraded (from qemu without this pvpanic).
> >>>
> >>> The outcome may be, for example: in Windows(let's say XP) the Device
> >>> manager will open a "new device" wizard and the device will appear as
> >>> an unrecognized device. On a cluster with hundreds of such VMs, If
> >>> that cluster has a health monitoring service it may show all the VMs
> >>> in a "not healthy" state.
> >>>
> >>> Reported-by: Marcel Apfelbaum 
> >>> Signed-off-by: Hu Tao 
> >>
> >> NACK,
> >>
> >> this is premature.  It is fundamentally a firmware problem.
> >>
> >> We have time to apply an even smaller patch that doesn't set
> >> has_pvpanic to true, and delay the whole feature to 1.7, if we do
> >> not fix the firmware in the next two weeks.
> >>
> >> Paolo
> > 
> > I think this is not just a firmware problem.  Adding device by default
> > was too rush, assumption was risk of guest bugs was 0.
> > 
> > We are now seeing problems with bios guest code and with linux guest
> > drivers as well.  Yes they all can be fixed, but we simply shouldn't
> > force this risk of broken guests on everyone.
> > 
> > libvirt is the main user and libvirt people
> > indicated their preference to creating device with
> > -device pvpanic rather than a built-in one that
> > can't be removed.
> > 
> > So please reconsider, and here's an ack from me.
> > 
> > Acked-by: Michael S. Tsirkin 
> 
> NACK for this v1: As pointed out on the KVM call, we still need to keep
> the pvpanic device around by default for pc-*-1.5. Removing has_pvpanic
> completely therefore seems wrong.

We also mentioned an option to patch 1.5 stable to change it there,
but I'm fine with not doing it.

> Can you submit a v2 for rc3 tomorrow?
> 
> Andreas


> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



[Qemu-devel] [PATCH for-1.6 V2 0/2] pvpanic: Separate pvpanic from machine type

2013-08-11 Thread Marcel Apfelbaum
Creating the pvpanic device as part of the machine type has the
potential to trigger guest OS, guest firmware and driver bugs.
The potential for such bugs was originally viewed as minimal.
However, since releasing 1.5 with pvpanic as part
of the builtin machine type, several issues were observed
in the field:
 - Some Windows versions triggered 'New Hardware Wizard' and
   an unidentified device appeared in Device Manager.
 - Issue reported off list: on Linux >= 3.10
   the pvpanic driver breaks the reset on crash option:
   VM stops instead of being reset.

The pvpanic device also changes monitor command behaviour in some cases;
such silent incompatible changes aren't expected by management tools:
 - Monitor command requires 'cont' before 'system_reset'
   in order to restart the VM after kernel panic/BSOD 

Note that libvirt is the main user, and libvirt people indicated their
preference for creating the device with -device pvpanic rather than a
built-in one that can't be removed.

These issues were raised on the last KVM call. The agreement reached
there was that we were a bit too rash to make the device
a builtin, and that for 1.6 we should drop the pvpanic device from the
default machine type and instead teach management tools to add it by
default using -device pvpanic.
It's not clear whether changing 1.5 behaviour at this point
is a sane thing, so this patchset doesn't touch the 1.5 machine type.

This patch series reworks the patchset from Hu Tao
(don't create pvpanic device by default)
addressing comments and modifying behaviour according
to what was discussed on the call.
Please review and consider for 1.6.

A related discussion can be followed at 
http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00036.html. 

This is a continuation of patches sent by Hu Tao:
http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00124.html
http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00125.html

Changes from v1 (by Hu Tao):
 - Keep the pvpanic device enabled by default for 1.5
   for backward compatibility
 - Addressed Andreas Färber's review (removed bus type)
 - Small changes to make it possible to enable pvpanic
   both from the command line and from machine_init
 - Added pvpanic to MISC category

Marcel Apfelbaum (2):
  hw/misc: don't create pvpanic device by default
  hw/misc: make pvpanic known to user

 hw/i386/pc_piix.c |  9 -
 hw/i386/pc_q35.c  |  7 ---
 hw/misc/pvpanic.c | 25 ++---
 3 files changed, 18 insertions(+), 23 deletions(-)

-- 
1.8.3.1




Re: [Qemu-devel] [PATCH for-1.6 1/2] don't create pvpanic device by default.

2013-08-11 Thread Marcel Apfelbaum
On Sun, 2013-08-11 at 16:45 +0200, Andreas Färber wrote:
> Am 11.08.2013 12:33, schrieb Michael S. Tsirkin:
> > On Fri, Aug 02, 2013 at 10:27:31AM +0200, Paolo Bonzini wrote:
> >> On 08/02/2013 09:04 AM, Hu Tao wrote:
> >>> The problem with pvpanic being an internal device is that VMs running
> >>> operating systems without a driver for this device will have problems
> >>> when qemu will be upgraded (from qemu without this pvpanic).
> >>>
> >>> The outcome may be, for example: in Windows(let's say XP) the Device
> >>> manager will open a "new device" wizard and the device will appear as
> >>> an unrecognized device. On a cluster with hundreds of such VMs, If
> >>> that cluster has a health monitoring service it may show all the VMs
> >>> in a "not healthy" state.
> >>>
> >>> Reported-by: Marcel Apfelbaum 
> >>> Signed-off-by: Hu Tao 
> >>
> >> NACK,
> >>
> >> this is premature.  It is fundamentally a firmware problem.
> >>
> >> We have time to apply an even smaller patch that doesn't set
> >> has_pvpanic to true, and delay the whole feature to 1.7, if we do
> >> not fix the firmware in the next two weeks.
> >>
> >> Paolo
> > 
> > I think this is not just a firmware problem.  Adding device by default
> > was too rush, assumption was risk of guest bugs was 0.
> > 
> > We are now seeing problems with bios guest code and with linux guest
> > drivers as well.  Yes they all can be fixed, but we simply shouldn't
> > force this risk of broken guests on everyone.
> > 
> > libvirt is the main user and libvirt people
> > indicated their preference to creating device with
> > -device pvpanic rather than a built-in one that
> > can't be removed.
> > 
> > So please reconsider, and here's an ack from me.
> > 
> > Acked-by: Michael S. Tsirkin 
> 
> NACK for this v1: As pointed out on the KVM call, we still need to keep
> the pvpanic device around by default for pc-*-1.5. Removing has_pvpanic
> completely therefore seems wrong. Can you submit a v2 for rc3 tomorrow?

I just sent a patchset with V2. Can you please review it?
Thanks,
Marcel

> 
> Andreas
> 





[Qemu-devel] [PATCH 1/2] vmdk: support vmfsSparse files

2013-08-11 Thread Paolo Bonzini
VMware ESX hosts use a variant of the VMDK3 format, identified by the
vmfsSparse create type and the VMFSSPARSE extent type.

It has 16 KB grain tables (L2) and a variable-size grain directory (L1).
In addition, the grain size is always 512, but that is not a problem
because it is included in the header.
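
(A quick sanity check on those numbers, assuming the usual 4-byte
(32-bit) grain table entries: 16 KB / 4 bytes gives 4096 entries, which
is the l2_size passed to vmdk_add_extent below, and with 512-byte grains
each grain table then covers 2 MB:)

    #include <stdio.h>

    int main(void)
    {
        const int gt_bytes    = 16 * 1024;  /* one grain table (L2) */
        const int gte_bytes   = 4;          /* assumed 32-bit grain table entry */
        const int grain_bytes = 512;        /* grain size, always 512 */

        printf("entries per grain table: %d\n", gt_bytes / gte_bytes);                  /* 4096 */
        printf("bytes covered per table: %d\n", (gt_bytes / gte_bytes) * grain_bytes);  /* 2097152 */
        return 0;
    }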

The format of the extents is documented in the VMDK spec.  The format
of the descriptor file is not documented precisely, but it can be
found at http://kb.vmware.com/kb/10026353 (Recreating a missing virtual
machine disk (VMDK) descriptor file for delta disks).

With these patches, vmfsSparse files only work if opened through the
descriptor file.  Data files without descriptor files, as far as I
could understand, are not supported by ESX.

Signed-off-by: Paolo Bonzini 
---
 block/vmdk.c | 51 ++-
 1 file changed, 46 insertions(+), 5 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index b16d509..eaf484a 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -505,6 +505,34 @@ static int vmdk_open_vmdk3(BlockDriverState *bs,
 return ret;
 }
 
+static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
+ BlockDriverState *file,
+ int flags)
+{
+int ret;
+uint32_t magic;
+VMDK3Header header;
+VmdkExtent *extent;
+
+ret = bdrv_pread(file, sizeof(magic), &header, sizeof(header));
+if (ret < 0) {
+return ret;
+}
+extent = vmdk_add_extent(bs, file, false,
+  le64_to_cpu(header.disk_sectors),
+  le64_to_cpu(header.l1dir_offset) << 9,
+  0,
+  le64_to_cpu(header.l1dir_size) * 4,
+  4096,
+  le64_to_cpu(header.granularity)); /* always 512 */
+ret = vmdk_init_tables(bs, extent);
+if (ret) {
+/* free extent allocated by vmdk_add_extent */
+vmdk_free_last_extent(bs);
+}
+return ret;
+}
+
 static int vmdk_open_desc_file(BlockDriverState *bs, int flags,
uint64_t desc_offset);
 
@@ -663,7 +691,7 @@ static int vmdk_parse_description(const char *desc, const 
char *opt_name,
 /* Open an extent file and append to bs array */
 static int vmdk_open_sparse(BlockDriverState *bs,
 BlockDriverState *file,
-int flags)
+int flags, bool vmfs_sparse)
 {
 uint32_t magic;
 
@@ -674,7 +702,11 @@ static int vmdk_open_sparse(BlockDriverState *bs,
 magic = be32_to_cpu(magic);
 switch (magic) {
 case VMDK3_MAGIC:
-return vmdk_open_vmdk3(bs, file, flags);
+if (vmfs_sparse) {
+return vmdk_open_vmfs_sparse(bs, file, flags);
+} else {
+return vmdk_open_vmdk3(bs, file, flags);
+}
 break;
 case VMDK4_MAGIC:
 return vmdk_open_vmdk4(bs, file, flags);
@@ -718,7 +750,8 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 }
 
 if (sectors <= 0 ||
-(strcmp(type, "FLAT") && strcmp(type, "SPARSE")) ||
+(strcmp(type, "FLAT") && strcmp(type, "SPARSE") &&
+ strcmp(type, "VMFSSPARSE")) ||
 (strcmp(access, "RW"))) {
 goto next_line;
 }
@@ -743,7 +776,14 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 extent->flat_start_offset = flat_offset << 9;
 } else if (!strcmp(type, "SPARSE")) {
 /* SPARSE extent */
-ret = vmdk_open_sparse(bs, extent_file, bs->open_flags);
+ret = vmdk_open_sparse(bs, extent_file, bs->open_flags, false);
+if (ret) {
+bdrv_delete(extent_file);
+return ret;
+}
+} else if (!strcmp(type, "VMFSSPARSE")) {
+/* VMFSSPARSE extent */
+ret = vmdk_open_sparse(bs, extent_file, bs->open_flags, true);
 if (ret) {
 bdrv_delete(extent_file);
 return ret;
@@ -789,6 +829,7 @@ static int vmdk_open_desc_file(BlockDriverState *bs, int 
flags,
 goto exit;
 }
 if (strcmp(ct, "monolithicFlat") &&
+strcmp(ct, "vmfsSparse") &&
 strcmp(ct, "twoGbMaxExtentSparse") &&
 strcmp(ct, "twoGbMaxExtentFlat")) {
 fprintf(stderr,
@@ -808,7 +849,7 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, 
int flags)
 int ret;
 BDRVVmdkState *s = bs->opaque;
 
-if (vmdk_open_sparse(bs, bs->file, flags) == 0) {
+if (vmdk_open_sparse(bs, bs->file, flags, false) == 0) {
 s->desc_offset = 0x200;
 } else {
 ret = vmdk_open_desc_file(bs, flags, 0);
-- 
1.8.3.1





[Qemu-devel] [PATCH 2/2] vmdk: support vmfs files

2013-08-11 Thread Paolo Bonzini
VMware ESX hosts also use different create and extent types for flat
files, respectively "vmfs" and "VMFS".  This is not documented, but it
can be found at http://kb.vmware.com/kb/10002511 (Recreating a missing
virtual machine disk (VMDK) descriptor file).

Signed-off-by: Paolo Bonzini 
---
 block/vmdk.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index eaf484a..2a5b63d 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -751,7 +751,7 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 
 if (sectors <= 0 ||
 (strcmp(type, "FLAT") && strcmp(type, "SPARSE") &&
- strcmp(type, "VMFSSPARSE")) ||
+ strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE")) ||
 (strcmp(access, "RW"))) {
 goto next_line;
 }
@@ -764,7 +764,7 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 }
 
 /* save to extents array */
-if (!strcmp(type, "FLAT")) {
+if (!strcmp(type, "FLAT") || !strcmp(type, "VMFS")) {
 /* FLAT extent */
 VmdkExtent *extent;
 
@@ -829,6 +829,7 @@ static int vmdk_open_desc_file(BlockDriverState *bs, int 
flags,
 goto exit;
 }
 if (strcmp(ct, "monolithicFlat") &&
+strcmp(ct, "vmfs") &&
 strcmp(ct, "vmfsSparse") &&
 strcmp(ct, "twoGbMaxExtentSparse") &&
 strcmp(ct, "twoGbMaxExtentFlat")) {
-- 
1.8.3.1




[Qemu-devel] [PATCH 0/2] vmdk: Support ESX files

2013-08-11 Thread Paolo Bonzini
So I thought I was on vacation, but my neighbor (who has a small data
recovery company) asked me if I knew how to extract metadata from
ESX images.

Sure, said I, thinking there would be no better test case for my metadata
dump patches.  However, ESX files are slightly different from "hosted"
files (as VMware calls them), so we needed a small patch to open them.
This series is based on that, and adds support for these files.

Paolo Bonzini (2):
  vmdk: support vmfsSparse files
  vmdk: support vmfs files

 block/vmdk.c | 54 --
 1 file changed, 48 insertions(+), 6 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [RFC] [PATCHv10 02/31] aio / timers: Rename qemu_new_clock and expose clock types

2013-08-11 Thread Alex Bligh
Rename qemu_new_clock to qemu_clock_new.

Expose clock types.

Signed-off-by: Alex Bligh 
---
 include/qemu/timer.h |4 
 qemu-timer.c |   12 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index a9afdb3..da43cbe 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -11,6 +11,10 @@
 #define SCALE_US 1000
 #define SCALE_NS 1
 
+#define QEMU_CLOCK_REALTIME 0
+#define QEMU_CLOCK_VIRTUAL  1
+#define QEMU_CLOCK_HOST 2
+
 typedef struct QEMUClock QEMUClock;
 typedef void QEMUTimerCB(void *opaque);
 
diff --git a/qemu-timer.c b/qemu-timer.c
index 682c50f..4117add 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -40,10 +40,6 @@
 /***/
 /* timers */
 
-#define QEMU_CLOCK_REALTIME 0
-#define QEMU_CLOCK_VIRTUAL  1
-#define QEMU_CLOCK_HOST 2
-
 struct QEMUClock {
 QEMUTimer *active_timers;
 
@@ -231,7 +227,7 @@ QEMUClock *rt_clock;
 QEMUClock *vm_clock;
 QEMUClock *host_clock;
 
-static QEMUClock *qemu_new_clock(int type)
+static QEMUClock *qemu_clock_new(int type)
 {
 QEMUClock *clock;
 
@@ -433,9 +429,9 @@ void qemu_unregister_clock_reset_notifier(QEMUClock *clock, 
Notifier *notifier)
 void init_clocks(void)
 {
 if (!rt_clock) {
-rt_clock = qemu_new_clock(QEMU_CLOCK_REALTIME);
-vm_clock = qemu_new_clock(QEMU_CLOCK_VIRTUAL);
-host_clock = qemu_new_clock(QEMU_CLOCK_HOST);
+rt_clock = qemu_clock_new(QEMU_CLOCK_REALTIME);
+vm_clock = qemu_clock_new(QEMU_CLOCK_VIRTUAL);
+host_clock = qemu_clock_new(QEMU_CLOCK_HOST);
 }
 }
 
-- 
1.7.9.5




[Qemu-devel] [RFC] [PATCHv10 03/31] aio / timers: add qemu-timer.c utility functions

2013-08-11 Thread Alex Bligh
Add utility functions to qemu-timer.c for nanosecond timing.

Add qemu_clock_deadline_ns to calculate deadlines to
nanosecond accuracy.

Add utility function qemu_soonest_timeout to calculate soonest deadline.

Add qemu_timeout_ns_to_ms to convert a timeout in nanoseconds back to
milliseconds for when ppoll is not used.
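
(Schematically, a caller without ppoll would combine these as below;
this fragment is illustrative only and not part of the patch - fds/nfds
stand for whatever the caller already polls:)

    /* Pick the nearest deadline across clocks, then convert it to a
     * millisecond timeout for plain poll(). */
    int64_t timeout_ns = -1;    /* -1 means wait forever */
    timeout_ns = qemu_soonest_timeout(timeout_ns, qemu_clock_deadline_ns(vm_clock));
    timeout_ns = qemu_soonest_timeout(timeout_ns, qemu_clock_deadline_ns(host_clock));
    poll(fds, nfds, qemu_timeout_ns_to_ms(timeout_ns));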

Signed-off-by: Alex Bligh 
---
 include/qemu/timer.h |   42 ++
 qemu-timer.c |   50 ++
 2 files changed, 92 insertions(+)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index da43cbe..e0a51a1 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -40,6 +40,29 @@ int64_t qemu_get_clock_ns(QEMUClock *clock);
 int64_t qemu_clock_has_timers(QEMUClock *clock);
 int64_t qemu_clock_expired(QEMUClock *clock);
 int64_t qemu_clock_deadline(QEMUClock *clock);
+
+/**
+ * qemu_clock_deadline_ns:
+ * @clock: the clock to operate on
+ *
+ * Calculate the timeout of the earliest expiring timer
+ * in nanoseconds, or -1 if no timer is set to expire.
+ *
+ * Returns: time until expiry in nanoseconds or -1
+ */
+int64_t qemu_clock_deadline_ns(QEMUClock *clock);
+
+/**
+ * qemu_timeout_ns_to_ms:
+ * @ns: nanosecond timeout value
+ *
+ * Convert a nanosecond timeout value (or -1) to
+ * a millisecond value (or -1), always rounding up.
+ *
+ * Returns: millisecond timeout value
+ */
+int qemu_timeout_ns_to_ms(int64_t ns);
+
 void qemu_clock_enable(QEMUClock *clock, bool enabled);
 void qemu_clock_warp(QEMUClock *clock);
 
@@ -67,6 +90,25 @@ int64_t cpu_get_ticks(void);
 void cpu_enable_ticks(void);
 void cpu_disable_ticks(void);
 
+/**
+ * qemu_soonest_timeout:
+ * @timeout1: first timeout in nanoseconds (or -1 for infinite)
+ * @timeout2: second timeout in nanoseconds (or -1 for infinite)
+ *
+ * Calculates the soonest of two timeout values. -1 means infinite, which
+ * is later than any other value.
+ *
+ * Returns: soonest timeout value in nanoseconds (or -1 for infinite)
+ */
+static inline int64_t qemu_soonest_timeout(int64_t timeout1, int64_t timeout2)
+{
+/* we can abuse the fact that -1 (which means infinite) is a maximal
+ * value when cast to unsigned. As this is disgusting, it's kept in
+ * one inline function.
+ */
+return ((uint64_t) timeout1 < (uint64_t) timeout2) ? timeout1 : timeout2;
+}
+
 static inline QEMUTimer *qemu_new_timer_ns(QEMUClock *clock, QEMUTimerCB *cb,
void *opaque)
 {
diff --git a/qemu-timer.c b/qemu-timer.c
index 4117add..df8f12b 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -273,6 +273,56 @@ int64_t qemu_clock_deadline(QEMUClock *clock)
 return delta;
 }
 
+/*
+ * As above, but return -1 for no deadline, and do not cap to 2^32
+ * as we know the result is always positive.
+ */
+
+int64_t qemu_clock_deadline_ns(QEMUClock *clock)
+{
+int64_t delta;
+
+if (!clock->enabled || !clock->active_timers) {
+return -1;
+}
+
+delta = clock->active_timers->expire_time - qemu_get_clock_ns(clock);
+
+if (delta <= 0) {
+return 0;
+}
+
+return delta;
+}
+
+/* Transition function to convert a nanosecond timeout to ms
+ * This is used where a system does not support ppoll
+ */
+int qemu_timeout_ns_to_ms(int64_t ns)
+{
+int64_t ms;
+if (ns < 0) {
+return -1;
+}
+
+if (!ns) {
+return 0;
+}
+
+/* Always round up, because it's better to wait too long than to wait too
+ * little and effectively busy-wait
+ */
+ms = (ns + SCALE_MS - 1) / SCALE_MS;
+
+/* To avoid overflow problems, limit this to 2^31, i.e. approx 25 days */
+if (ms > (int64_t) INT32_MAX) {
+ms = INT32_MAX;
+}
+
+return (int) ms;
+}
+
+
 QEMUTimer *qemu_new_timer(QEMUClock *clock, int scale,
   QEMUTimerCB *cb, void *opaque)
 {
-- 
1.7.9.5




[Qemu-devel] [RFC] [PATCHv10 06/31] aio / timers: Add prctl(PR_SET_TIMERSLACK, 1, ...) to reduce timer slack

2013-08-11 Thread Alex Bligh
Where supported, call prctl(PR_SET_TIMERSLACK, 1, ...) to
set a one-nanosecond timer slack and increase the precision of
timer calls.
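
(Background: the timer slack is how far the kernel may defer a timer
expiry in order to coalesce wakeups; the per-task default is typically
50000 ns (50 us), so requesting 1 ns trades wakeup batching for
precision. A minimal standalone illustration, not the patch itself:)

    #include <stdio.h>
    #include <sys/prctl.h>

    int main(void)
    {
        long before = prctl(PR_GET_TIMERSLACK);    /* usually 50000 ns by default */
        prctl(PR_SET_TIMERSLACK, 1, 0, 0, 0);      /* ask for 1 ns slack */
        printf("slack: %ld -> %ld ns\n", before, (long)prctl(PR_GET_TIMERSLACK));
        return 0;
    }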

Signed-off-by: Alex Bligh 
---
 configure|   18 ++
 qemu-timer.c |7 +++
 2 files changed, 25 insertions(+)

diff --git a/configure b/configure
index 5659412..0a55c20 100755
--- a/configure
+++ b/configure
@@ -2834,6 +2834,21 @@ if compile_prog "" "" ; then
   ppoll=yes
 fi
 
+# check for prctl(PR_SET_TIMERSLACK , ... ) support
+prctl_pr_set_timerslack=no
+cat > $TMPC << EOF
+#include <sys/prctl.h>
+
+int main(void)
+{
+prctl(PR_SET_TIMERSLACK, 1, 0, 0, 0);
+return 0;
+}
+EOF
+if compile_prog "" "" ; then
+  prctl_pr_set_timerslack=yes
+fi
+
 # check for epoll support
 epoll=no
 cat > $TMPC << EOF
@@ -3833,6 +3848,9 @@ fi
 if test "$ppoll" = "yes" ; then
   echo "CONFIG_PPOLL=y" >> $config_host_mak
 fi
+if test "$prctl_pr_set_timerslack" = "yes" ; then
+  echo "CONFIG_PRCTL_PR_SET_TIMERSLACK=y" >> $config_host_mak
+fi
 if test "$epoll" = "yes" ; then
   echo "CONFIG_EPOLL=y" >> $config_host_mak
 fi
diff --git a/qemu-timer.c b/qemu-timer.c
index 4bf05d4..f224b62 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -41,6 +41,10 @@
 #include 
 #endif
 
+#ifdef CONFIG_PRCTL_PR_SET_TIMERSLACK
+#include <sys/prctl.h>
+#endif
+
 /***/
 /* timers */
 
@@ -507,6 +511,9 @@ void init_clocks(void)
 vm_clock = qemu_clock_new(QEMU_CLOCK_VIRTUAL);
 host_clock = qemu_clock_new(QEMU_CLOCK_HOST);
 }
+#ifdef CONFIG_PRCTL_PR_SET_TIMERSLACK
+prctl(PR_SET_TIMERSLACK, 1, 0, 0, 0);
+#endif
 }
 
 uint64_t timer_expire_time_ns(QEMUTimer *ts)
-- 
1.7.9.5




[Qemu-devel] [RFC] [PATCHv10 11/31] aio / timers: Add QEMUTimerListGroup to AioContext

2013-08-11 Thread Alex Bligh
Add a QEMUTimerListGroup to each AioContext (meaning a QEMUTimerList
associated with each clock type is added) and delete it when the
AioContext is freed.

Signed-off-by: Alex Bligh 
---
 async.c  |2 ++
 include/block/aio.h  |4 
 tests/test-aio.c |3 +++
 tests/test-thread-pool.c |3 +++
 4 files changed, 12 insertions(+)

diff --git a/async.c b/async.c
index 5ce3633..ae2c700 100644
--- a/async.c
+++ b/async.c
@@ -205,6 +205,7 @@ aio_ctx_finalize(GSource *source)
 event_notifier_cleanup(&ctx->notifier);
 qemu_mutex_destroy(&ctx->bh_lock);
 g_array_free(ctx->pollfds, TRUE);
+timerlistgroup_deinit(&ctx->tlg);
 }
 
 static GSourceFuncs aio_source_funcs = {
@@ -244,6 +245,7 @@ AioContext *aio_context_new(void)
 aio_set_event_notifier(ctx, &ctx->notifier, 
(EventNotifierHandler *)
event_notifier_test_and_clear, NULL);
+timerlistgroup_init(&ctx->tlg);
 
 return ctx;
 }
diff --git a/include/block/aio.h b/include/block/aio.h
index cc1..a13f6e8 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -18,6 +18,7 @@
 #include "qemu/queue.h"
 #include "qemu/event_notifier.h"
 #include "qemu/thread.h"
+#include "qemu/timer.h"
 
 typedef struct BlockDriverAIOCB BlockDriverAIOCB;
 typedef void BlockDriverCompletionFunc(void *opaque, int ret);
@@ -72,6 +73,9 @@ typedef struct AioContext {
 
 /* Thread pool for performing work and receiving completion callbacks */
 struct ThreadPool *thread_pool;
+
+/* TimerLists for calling timers - one per clock type */
+QEMUTimerListGroup tlg;
 } AioContext;
 
 /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
diff --git a/tests/test-aio.c b/tests/test-aio.c
index c173870..2d7ec4c 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -12,6 +12,7 @@
 
 #include 
 #include "block/aio.h"
+#include "qemu/timer.h"
 
 AioContext *ctx;
 
@@ -628,6 +629,8 @@ int main(int argc, char **argv)
 {
 GSource *src;
 
+init_clocks();
+
 ctx = aio_context_new();
 src = aio_get_g_source(ctx);
 g_source_attach(src, NULL);
diff --git a/tests/test-thread-pool.c b/tests/test-thread-pool.c
index b62338f..27d6190 100644
--- a/tests/test-thread-pool.c
+++ b/tests/test-thread-pool.c
@@ -3,6 +3,7 @@
 #include "block/aio.h"
 #include "block/thread-pool.h"
 #include "block/block.h"
+#include "qemu/timer.h"
 
 static AioContext *ctx;
 static ThreadPool *pool;
@@ -205,6 +206,8 @@ int main(int argc, char **argv)
 {
 int ret;
 
+init_clocks();
+
 ctx = aio_context_new();
 pool = aio_get_thread_pool(ctx);
 
-- 
1.7.9.5




[Qemu-devel] [RFC] [PATCHv10 14/31] aio / timers: Add aio_timer_init & aio_timer_new wrappers

2013-08-11 Thread Alex Bligh
Add aio_timer_init and aio_timer_new wrapper functions.

Signed-off-by: Alex Bligh 
---
 include/block/aio.h |   43 +++
 1 file changed, 43 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index a13f6e8..07d5053 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -255,4 +255,47 @@ void qemu_aio_set_fd_handler(int fd,
  void *opaque);
 #endif
 
+/**
+ * aio_timer_new:
+ * @ctx: the aio context
+ * @type: the clock type
+ * @scale: the scale
+ * @cb: the callback to call on timer expiry
+ * @opaque: the opaque pointer to pass to the callback
+ *
+ * Allocate a new timer attached to the context @ctx.
+ * The function is responsible for memory allocation.
+ *
+ * The preferred interface is aio_timer_init. Use that
+ * unless you really need dynamic memory allocation.
+ *
+ * Returns: a pointer to the new timer
+ */
+static inline QEMUTimer *aio_timer_new(AioContext *ctx, QEMUClockType type,
+   int scale,
+   QEMUTimerCB *cb, void *opaque)
+{
+return timer_new_tl(ctx->tlg.tl[type], scale, cb, opaque);
+}
+
+/**
+ * aio_timer_init:
+ * @ctx: the aio context
+ * @ts: the timer
+ * @type: the clock type
+ * @scale: the scale
+ * @cb: the callback to call on timer expiry
+ * @opaque: the opaque pointer to pass to the callback
+ *
+ * Initialise a new timer attached to the context @ctx.
+ * The caller is responsible for memory allocation.
+ */
+static inline void aio_timer_init(AioContext *ctx,
+  QEMUTimer *ts, QEMUClockType type,
+  int scale,
+  QEMUTimerCB *cb, void *opaque)
+{
+timer_init(ts, ctx->tlg.tl[type], scale, cb, opaque);
+}
+
 #endif
-- 
1.7.9.5
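
For reference, a minimal usage sketch of the wrappers above. The callback
name, the 100ms period and the way the AioContext is obtained are
illustrative only; timer_mod() and qemu_clock_get_ms() come from other
patches in this series, not from this one:

#include "block/aio.h"
#include "qemu/timer.h"

/* Illustrative callback: runs from aio_poll() on the context the timer
 * was attached to. */
static void example_cb(void *opaque)
{
}

static QEMUTimer *arm_example_timer(AioContext *ctx)
{
    /* Allocate a millisecond-scale realtime timer on @ctx ... */
    QEMUTimer *t = aio_timer_new(ctx, QEMU_CLOCK_REALTIME, SCALE_MS,
                                 example_cb, NULL);
    /* ... and arm it to fire roughly 100ms from now. */
    timer_mod(t, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 100);
    return t;
}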




[Qemu-devel] [RFC] [PATCHv10 17/31] aio / timers: On timer modification, qemu_notify or aio_notify

2013-08-11 Thread Alex Bligh
On qemu_mod_timer_ns, ensure qemu_notify or aio_notify is called to
end the appropriate poll(), irrespective of use_icount value.

On qemu_clock_enable, ensure qemu_notify or aio_notify is called for
all QEMUTimerLists attached to the QEMUClock.

Signed-off-by: Alex Bligh 
---
 include/qemu/timer.h |9 +
 qemu-timer.c |   13 ++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 354ee88..11a6c61 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -135,6 +135,15 @@ bool qemu_clock_use_for_deadline(QEMUClock *clock);
 QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClock *clock);
 
 /**
+ * qemu_clock_notify:
+ * @clock: the clock to operate on
+ *
+ * Call the notifier callback connected with the default timer
+ * list linked to the clock, or qemu_notify() if none.
+ */
+void qemu_clock_notify(QEMUClock *clock);
+
+/**
  * timerlist_new:
  * @type: the clock type to associate with the timerlist
  * @cb: the callback to call on notification
diff --git a/qemu-timer.c b/qemu-timer.c
index c1de3d3..ec25bcc 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -304,11 +304,20 @@ bool qemu_clock_use_for_deadline(QEMUClock *clock)
 return !(use_icount && (clock->type == QEMU_CLOCK_VIRTUAL));
 }
 
+void qemu_clock_notify(QEMUClock *clock)
+{
+QEMUTimerList *timer_list;
+QLIST_FOREACH(timer_list, &clock->timerlists, list) {
+timerlist_notify(timer_list);
+}
+}
+
 void qemu_clock_enable(QEMUClock *clock, bool enabled)
 {
 bool old = clock->enabled;
 clock->enabled = enabled;
 if (enabled && !old) {
+qemu_clock_notify(clock);
 qemu_rearm_alarm_timer(alarm_timer);
 }
 }
@@ -522,9 +531,7 @@ void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time)
 }
 /* Interrupt execution to force deadline recalculation.  */
 qemu_clock_warp(ts->timer_list->clock);
-if (use_icount) {
-timerlist_notify(ts->timer_list);
-}
+timerlist_notify(ts->timer_list);
 }
 }
 
-- 
1.7.9.5
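
For context, the notification path relied on here looks roughly like the
sketch below. The notify_cb/notify_opaque field names are assumptions about
the QEMUTimerList introduced earlier in the series; the real function lives
in qemu-timer.c:

/* Sketch only: call the per-timerlist notifier if one was registered
 * (the AioContext case, which wants aio_notify()), otherwise wake the
 * main loop. */
void timerlist_notify(QEMUTimerList *timer_list)
{
    if (timer_list->notify_cb) {
        timer_list->notify_cb(timer_list->notify_opaque);
    } else {
        qemu_notify_event();
    }
}

This is why the unconditional timerlist_notify() above is safe whether the
timer belongs to an AioContext or to the main loop.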




[Qemu-devel] [RFC] [PATCHv10 16/31] aio / timers: Convert mainloop to use timeout

2013-08-11 Thread Alex Bligh
Convert mainloop to use timeout from default timerlist group
(i.e. the current 3 static timers)

Signed-off-by: Alex Bligh 
---
 main-loop.c |   45 ++---
 1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/main-loop.c b/main-loop.c
index a44fff6..afc3e31 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -155,10 +155,11 @@ static int max_priority;
 static int glib_pollfds_idx;
 static int glib_n_poll_fds;
 
-static void glib_pollfds_fill(uint32_t *cur_timeout)
+static void glib_pollfds_fill(int64_t *cur_timeout)
 {
 GMainContext *context = g_main_context_default();
 int timeout = 0;
+int64_t timeout_ns;
 int n;
 
 g_main_context_prepare(context, &max_priority);
@@ -174,9 +175,13 @@ static void glib_pollfds_fill(uint32_t *cur_timeout)
  glib_n_poll_fds);
 } while (n != glib_n_poll_fds);
 
-if (timeout >= 0 && timeout < *cur_timeout) {
-*cur_timeout = timeout;
+if (timeout < 0) {
+timeout_ns = -1;
+} else {
+timeout_ns = (int64_t)timeout * (int64_t)SCALE_MS;
 }
+
+*cur_timeout = qemu_soonest_timeout(timeout_ns, *cur_timeout);
 }
 
 static void glib_pollfds_poll(void)
@@ -191,7 +196,7 @@ static void glib_pollfds_poll(void)
 
 #define MAX_MAIN_LOOP_SPIN (1000)
 
-static int os_host_main_loop_wait(uint32_t timeout)
+static int os_host_main_loop_wait(int64_t timeout)
 {
 int ret;
 static int spin_counter;
@@ -214,7 +219,7 @@ static int os_host_main_loop_wait(uint32_t timeout)
 notified = true;
 }
 
-timeout = 1;
+timeout = SCALE_MS;
 }
 
 if (timeout > 0) {
@@ -224,7 +229,7 @@ static int os_host_main_loop_wait(uint32_t timeout)
 spin_counter++;
 }
 
-ret = g_poll((GPollFD *)gpollfds->data, gpollfds->len, timeout);
+ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, timeout);
 
 if (timeout > 0) {
 qemu_mutex_lock_iothread();
@@ -373,7 +378,7 @@ static void pollfds_poll(GArray *pollfds, int nfds, fd_set 
*rfds,
 }
 }
 
-static int os_host_main_loop_wait(uint32_t timeout)
+static int os_host_main_loop_wait(int64_t timeout)
 {
 GMainContext *context = g_main_context_default();
 GPollFD poll_fds[1024 * 2]; /* this is probably overkill */
@@ -382,6 +387,7 @@ static int os_host_main_loop_wait(uint32_t timeout)
 PollingEntry *pe;
 WaitObjects *w = &wait_objects;
 gint poll_timeout;
+int64_t poll_timeout_ns;
 static struct timeval tv0;
 fd_set rfds, wfds, xfds;
 int nfds;
@@ -419,12 +425,17 @@ static int os_host_main_loop_wait(uint32_t timeout)
 poll_fds[n_poll_fds + i].events = G_IO_IN;
 }
 
-if (poll_timeout < 0 || timeout < poll_timeout) {
-poll_timeout = timeout;
+if (poll_timeout < 0) {
+poll_timeout_ns = -1;
+} else {
+poll_timeout_ns = (int64_t)poll_timeout * (int64_t)SCALE_MS;
 }
 
+poll_timeout_ns = qemu_soonest_timeout(poll_timeout_ns, timeout);
+
 qemu_mutex_unlock_iothread();
-g_poll_ret = g_poll(poll_fds, n_poll_fds + w->num, poll_timeout);
+g_poll_ret = qemu_poll_ns(poll_fds, n_poll_fds + w->num, poll_timeout_ns);
+
 qemu_mutex_lock_iothread();
 if (g_poll_ret > 0) {
 for (i = 0; i < w->num; i++) {
@@ -449,6 +460,7 @@ int main_loop_wait(int nonblocking)
 {
 int ret;
 uint32_t timeout = UINT32_MAX;
+int64_t timeout_ns;
 
 if (nonblocking) {
 timeout = 0;
@@ -462,7 +474,18 @@ int main_loop_wait(int nonblocking)
 slirp_pollfds_fill(gpollfds);
 #endif
 qemu_iohandler_fill(gpollfds);
-ret = os_host_main_loop_wait(timeout);
+
+if (timeout == UINT32_MAX) {
+timeout_ns = -1;
+} else {
+timeout_ns = (uint64_t)timeout * (int64_t)(SCALE_MS);
+}
+
+timeout_ns = qemu_soonest_timeout(timeout_ns,
+  timerlistgroup_deadline_ns(
+  &main_loop_tlg));
+
+ret = os_host_main_loop_wait(timeout_ns);
 qemu_iohandler_poll(gpollfds, ret);
 #ifdef CONFIG_SLIRP
 slirp_pollfds_poll(gpollfds, (ret < 0));
-- 
1.7.9.5
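
The conversions above lean on qemu_soonest_timeout(), added earlier in the
series and not quoted here. Its assumed semantics: -1 means "no timeout",
anything else is a timeout in nanoseconds, and the result is the sooner of
the two. A sketch:

#include <stdint.h>

/* Sketch only: casting to uint64_t makes -1 compare as the largest
 * possible value, so a plain minimum handles the "wait forever" case. */
static inline int64_t qemu_soonest_timeout(int64_t timeout1, int64_t timeout2)
{
    return ((uint64_t) timeout1 < (uint64_t) timeout2) ? timeout1 : timeout2;
}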




[Qemu-devel] [RFC] [PATCHv10 18/31] aio / timers: Introduce new API timer_new and friends

2013-08-11 Thread Alex Bligh
Introduce new API for creating timers - timer_new and
_ns, _ms, _us derivatives.

Signed-off-by: Alex Bligh 
---
 include/qemu/timer.h |   69 ++
 1 file changed, 69 insertions(+)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 11a6c61..4161ec7 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -369,6 +369,24 @@ static inline QEMUTimer *timer_new_tl(QEMUTimerList 
*timer_list,
 return ts;
 }
 
+/**
+ * timer_new:
+ * @type: the clock type to use
+ * @scale: the scale value for the timer
+ * @cb: the callback to be called when the timer expires
+ * @opaque: the opaque pointer to be passed to the callback
+ *
+ * Create a new timer and associate it with the default
+ * timer list for the clock type @type.
+ *
+ * Returns: a pointer to the timer
+ */
+static inline QEMUTimer *timer_new(QEMUClockType type, int scale,
+   QEMUTimerCB *cb, void *opaque)
+{
+return timer_new_tl(main_loop_tlg.tl[type], scale, cb, opaque);
+}
+
 void qemu_free_timer(QEMUTimer *ts);
 void qemu_del_timer(QEMUTimer *ts);
 void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time);
@@ -492,6 +510,23 @@ static inline QEMUTimer *qemu_new_timer_ns(QEMUClock 
*clock, QEMUTimerCB *cb,
 }
 
 /**
+ * timer_new_ns:
+ * @type: the clock type to associate with the timer
+ * @callback: the callback to call when the timer expires
+ * @opaque: the opaque pointer to pass to the callback
+ *
+ * Create a new timer with nanosecond scale on the default timer list
+ * associated with the clock.
+ *
+ * Returns: a pointer to the newly created timer
+ */
+static inline QEMUTimer *timer_new_ns(QEMUClockType type, QEMUTimerCB *cb,
+  void *opaque)
+{
+return timer_new(type, SCALE_NS, cb, opaque);
+}
+
+/**
  * qemu_new_timer_us:
  * @clock: the clock to associate with the timer
  * @callback: the callback to call when the timer expires
@@ -510,6 +545,23 @@ static inline QEMUTimer *qemu_new_timer_us(QEMUClock 
*clock,
 }
 
 /**
+ * timer_new_us:
+ * @type: the clock type to associate with the timer
+ * @callback: the callback to call when the timer expires
+ * @opaque: the opaque pointer to pass to the callback
+ *
+ * Create a new timer with microsecond scale on the default timer list
+ * associated with the clock.
+ *
+ * Returns: a pointer to the newly created timer
+ */
+static inline QEMUTimer *timer_new_us(QEMUClockType type, QEMUTimerCB *cb,
+  void *opaque)
+{
+return timer_new(type, SCALE_US, cb, opaque);
+}
+
+/**
  * qemu_new_timer_ms:
  * @clock: the clock to associate with the timer
  * @callback: the callback to call when the timer expires
@@ -527,6 +579,23 @@ static inline QEMUTimer *qemu_new_timer_ms(QEMUClock 
*clock,
 return qemu_new_timer(clock, SCALE_MS, cb, opaque);
 }
 
+/**
+ * timer_new_ms:
+ * @type: the clock type to associate with the timer
+ * @callback: the callback to call when the timer expires
+ * @opaque: the opaque pointer to pass to the callback
+ *
+ * Create a new timer with millisecond scale on the default timer list
+ * associated with the clock.
+ *
+ * Returns: a pointer to the newly created timer
+ */
+static inline QEMUTimer *timer_new_ms(QEMUClockType type, QEMUTimerCB *cb,
+  void *opaque)
+{
+return timer_new(type, SCALE_MS, cb, opaque);
+}
+
 static inline int64_t qemu_get_clock_ms(QEMUClock *clock)
 {
 return qemu_get_clock_ns(clock) / SCALE_MS;
-- 
1.7.9.5
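
To illustrate the intended conversion from the legacy creation calls (the
function and callback names below are made up for the example):

#include "qemu/timer.h"

static void my_timer_cb(void *opaque)
{
    /* illustrative callback body */
}

/* Legacy form, as device code looks before the automated conversion: */
static QEMUTimer *old_style(void *opaque)
{
    return qemu_new_timer_ns(vm_clock, my_timer_cb, opaque);
}

/* New form: same behaviour, attached to the main loop's default
 * timer list for QEMU_CLOCK_VIRTUAL. */
static QEMUTimer *new_style(void *opaque)
{
    return timer_new_ns(QEMU_CLOCK_VIRTUAL, my_timer_cb, opaque);
}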




[Qemu-devel] [RFC] [PATCHv10 01/31] aio / timers: Rename qemu_timer_* functions

2013-08-11 Thread Alex Bligh
Rename four functions in preparation for new API.

Rename qemu_timer_expired to timer_expired
Rename qemu_timer_expire_time_ns to timer_expire_time_ns
Rename qemu_timer_pending to timer_pending
Rename qemu_timer_expired_ns to timer_expired_ns

Signed-off-by: Alex Bligh 
---
 backends/baum.c|6 +++---
 hw/input/tsc2005.c |2 +-
 hw/input/tsc210x.c |2 +-
 hw/mips/cputimer.c |4 ++--
 hw/openrisc/cputimer.c |2 +-
 hw/timer/mc146818rtc.c |6 +++---
 hw/usb/redirect.c  |4 ++--
 include/qemu/timer.h   |6 +++---
 qemu-timer.c   |   20 ++--
 savevm.c   |2 +-
 10 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/backends/baum.c b/backends/baum.c
index 62aa784..b08e1d5 100644
--- a/backends/baum.c
+++ b/backends/baum.c
@@ -314,9 +314,9 @@ static int baum_eat_packet(BaumDriverState *baum, const 
uint8_t *buf, int len)
 return 0; \
 if (*cur++ != ESC) { \
 DPRINTF("Broken packet %#2x, tossing\n", req); \
-   if (qemu_timer_pending(baum->cellCount_timer)) { \
-qemu_del_timer(baum->cellCount_timer); \
-baum_cellCount_timer_cb(baum); \
+if (timer_pending(baum->cellCount_timer)) {\
+qemu_del_timer(baum->cellCount_timer); \
+baum_cellCount_timer_cb(baum); \
 } \
 return (cur - 2 - buf); \
 } \
diff --git a/hw/input/tsc2005.c b/hw/input/tsc2005.c
index a771cd5..ebd1b7e 100644
--- a/hw/input/tsc2005.c
+++ b/hw/input/tsc2005.c
@@ -513,7 +513,7 @@ static int tsc2005_load(QEMUFile *f, void *opaque, int 
version_id)
 for (i = 0; i < 8; i ++)
 s->tr[i] = qemu_get_be32(f);
 
-s->busy = qemu_timer_pending(s->timer);
+s->busy = timer_pending(s->timer);
 tsc2005_pin_update(s);
 
 return 0;
diff --git a/hw/input/tsc210x.c b/hw/input/tsc210x.c
index 9b854e7..0067f98 100644
--- a/hw/input/tsc210x.c
+++ b/hw/input/tsc210x.c
@@ -1093,7 +1093,7 @@ static int tsc210x_load(QEMUFile *f, void *opaque, int 
version_id)
 for (i = 0; i < 0x14; i ++)
 qemu_get_be16s(f, &s->filter_data[i]);
 
-s->busy = qemu_timer_pending(s->timer);
+s->busy = timer_pending(s->timer);
 qemu_set_irq(s->pint, !s->irq);
 qemu_set_irq(s->davint, !s->dav);
 
diff --git a/hw/mips/cputimer.c b/hw/mips/cputimer.c
index e0266bf..739bbac 100644
--- a/hw/mips/cputimer.c
+++ b/hw/mips/cputimer.c
@@ -72,8 +72,8 @@ uint32_t cpu_mips_get_count (CPUMIPSState *env)
 uint64_t now;
 
 now = qemu_get_clock_ns(vm_clock);
-if (qemu_timer_pending(env->timer)
-&& qemu_timer_expired(env->timer, now)) {
+if (timer_pending(env->timer)
+&& timer_expired(env->timer, now)) {
 /* The timer has already expired.  */
 cpu_mips_timer_expire(env);
 }
diff --git a/hw/openrisc/cputimer.c b/hw/openrisc/cputimer.c
index 4144b34..9a09f5c 100644
--- a/hw/openrisc/cputimer.c
+++ b/hw/openrisc/cputimer.c
@@ -72,7 +72,7 @@ static void openrisc_timer_cb(void *opaque)
 OpenRISCCPU *cpu = opaque;
 
 if ((cpu->env.ttmr & TTMR_IE) &&
- qemu_timer_expired(cpu->env.timer, qemu_get_clock_ns(vm_clock))) {
+ timer_expired(cpu->env.timer, qemu_get_clock_ns(vm_clock))) {
 CPUState *cs = CPU(cpu);
 
 cpu->env.ttmr |= TTMR_IP;
diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c
index 3c3baac..d12f6e7 100644
--- a/hw/timer/mc146818rtc.c
+++ b/hw/timer/mc146818rtc.c
@@ -252,7 +252,7 @@ static void check_update_timer(RTCState *s)
  * the alarm time.  */
 next_update_time = s->next_alarm_time;
 }
-if (next_update_time != qemu_timer_expire_time_ns(s->update_timer)) {
+if (next_update_time != timer_expire_time_ns(s->update_timer)) {
 qemu_mod_timer(s->update_timer, next_update_time);
 }
 }
@@ -587,8 +587,8 @@ static int update_in_progress(RTCState *s)
 if (!rtc_running(s)) {
 return 0;
 }
-if (qemu_timer_pending(s->update_timer)) {
-int64_t next_update_time = qemu_timer_expire_time_ns(s->update_timer);
+if (timer_pending(s->update_timer)) {
+int64_t next_update_time = timer_expire_time_ns(s->update_timer);
 /* Latch UIP until the timer expires.  */
 if (qemu_get_clock_ns(rtc_clock) >= (next_update_time - 
UIP_HOLD_LENGTH)) {
 s->cmos_data[RTC_REG_A] |= REG_A_UIP;
diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index e3b9f32..8fee3d3 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -1493,7 +1493,7 @@ static void usbredir_device_connect(void *priv,
 USBRedirDevice *dev = priv;
 const char *speed;
 
-if (qemu_timer_pending(dev->attach_timer) || dev->dev.attached) {
+if (timer_pending(dev->attach_timer) || dev->dev.attached) {
 ERROR("Received device connect while already connected\n");
 return;
 }
@@ -1588,7 +1588,7 @

[Qemu-devel] [RFC] [PATCHv10 28/31] aio / timers: Add test harness for AioContext timers

2013-08-11 Thread Alex Bligh
Add a test harness for AioContext timers. The g_source equivalent is
unsatisfactory as it suffers from false wakeups.

Signed-off-by: Alex Bligh 
---
 tests/test-aio.c |  136 ++
 1 file changed, 136 insertions(+)

diff --git a/tests/test-aio.c b/tests/test-aio.c
index eedf7f8..f751543 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -32,6 +32,15 @@ typedef struct {
 int max;
 } BHTestData;
 
+typedef struct {
+QEMUTimer timer;
+QEMUClockType clock_type;
+int n;
+int max;
+int64_t ns;
+AioContext *ctx;
+} TimerTestData;
+
 static void bh_test_cb(void *opaque)
 {
 BHTestData *data = opaque;
@@ -40,6 +49,24 @@ static void bh_test_cb(void *opaque)
 }
 }
 
+static void timer_test_cb(void *opaque)
+{
+TimerTestData *data = opaque;
+if (++data->n < data->max) {
+timer_mod(&data->timer,
+  qemu_clock_get_ns(data->clock_type) + data->ns);
+}
+}
+
+static void dummy_io_handler_read(void *opaque)
+{
+}
+
+static int dummy_io_handler_flush(void *opaque)
+{
+return 1;
+}
+
 static void bh_delete_cb(void *opaque)
 {
 BHTestData *data = opaque;
@@ -341,6 +368,65 @@ static void test_wait_event_notifier_noflush(void)
 event_notifier_cleanup(&data.e);
 }
 
+static void test_timer_schedule(void)
+{
+TimerTestData data = { .n = 0, .ctx = ctx, .ns = SCALE_MS * 750LL,
+   .max = 2,
+   .clock_type = QEMU_CLOCK_VIRTUAL };
+int pipefd[2];
+
+/* aio_poll will not block to wait for timers to complete unless it has
+ * an fd to wait on. Fixing this breaks other tests. So create a dummy one.
+ */
+g_assert(!pipe2(pipefd, O_NONBLOCK));
+aio_set_fd_handler(ctx, pipefd[0],
+   dummy_io_handler_read, NULL, dummy_io_handler_flush,
+   NULL);
+aio_poll(ctx, false);
+
+aio_timer_init(ctx, &data.timer, data.clock_type,
+   SCALE_NS, timer_test_cb, &data);
+timer_mod(&data.timer,
+  qemu_clock_get_ns(data.clock_type) +
+  data.ns);
+
+g_assert_cmpint(data.n, ==, 0);
+
+/* timer_mod may well cause an event notifier to have gone off,
+ * so clear that
+ */
+do {} while (aio_poll(ctx, false));
+
+g_assert(!aio_poll(ctx, false));
+g_assert_cmpint(data.n, ==, 0);
+
+sleep(1);
+g_assert_cmpint(data.n, ==, 0);
+
+g_assert(aio_poll(ctx, false));
+g_assert_cmpint(data.n, ==, 1);
+
+/* timer_mod called by our callback */
+do {} while (aio_poll(ctx, false));
+
+g_assert(!aio_poll(ctx, false));
+g_assert_cmpint(data.n, ==, 1);
+
+g_assert(aio_poll(ctx, true));
+g_assert_cmpint(data.n, ==, 2);
+
+/* As max is now 2, an event notifier should not have gone off */
+
+g_assert(!aio_poll(ctx, false));
+g_assert_cmpint(data.n, ==, 2);
+
+aio_set_fd_handler(ctx, pipefd[0], NULL, NULL, NULL, NULL);
+close(pipefd[0]);
+close(pipefd[1]);
+
+timer_del(&data.timer);
+}
+
 /* Now the same tests, using the context as a GSource.  They are
  * very similar to the ones above, with g_main_context_iteration
  * replacing aio_poll.  However:
@@ -623,6 +709,54 @@ static void test_source_wait_event_notifier_noflush(void)
 event_notifier_cleanup(&data.e);
 }
 
+static void test_source_timer_schedule(void)
+{
+TimerTestData data = { .n = 0, .ctx = ctx, .ns = SCALE_MS * 750LL,
+   .max = 2,
+   .clock_type = QEMU_CLOCK_VIRTUAL };
+int pipefd[2];
+int64_t expiry;
+
+/* aio_poll will not block to wait for timers to complete unless it has
+ * an fd to wait on. Fixing this breaks other tests. So create a dummy one.
+ */
+g_assert(!pipe2(pipefd, O_NONBLOCK));
+aio_set_fd_handler(ctx, pipefd[0],
+   dummy_io_handler_read, NULL, dummy_io_handler_flush,
+   NULL);
+do {} while (g_main_context_iteration(NULL, false));
+
+aio_timer_init(ctx, &data.timer, data.clock_type,
+   SCALE_NS, timer_test_cb, &data);
+expiry = qemu_clock_get_ns(data.clock_type) +
+data.ns;
+timer_mod(&data.timer, expiry);
+
+g_assert_cmpint(data.n, ==, 0);
+
+sleep(1);
+g_assert_cmpint(data.n, ==, 0);
+
+g_assert(g_main_context_iteration(NULL, false));
+g_assert_cmpint(data.n, ==, 1);
+
+/* The comment above was not kidding when it said this wakes up itself */
+do {
+g_assert(g_main_context_iteration(NULL, true));
+} while (qemu_clock_get_ns(data.clock_type) <= expiry);
+sleep(1);
+g_main_context_iteration(NULL, false);
+
+g_assert_cmpint(data.n, ==, 2);
+
+aio_set_fd_handler(ctx, pipefd[0], NULL, NULL, NULL, NULL);
+close(pipefd[0]);
+close(pipefd[1]);
+
+timer_del(&data.timer);
+}
+
+
 /* End of tests.  */
 
 int main(int argc, char **argv)
@@ -651,6 +785,7 @@ int main(int argc, char **ar

[Qemu-devel] [RFC] [PATCHv10 15/31] aio / timers: Convert aio_poll to use AioContext timers' deadline

2013-08-11 Thread Alex Bligh
Convert aio_poll to use deadline based on AioContext's timers.

aio_poll has been changed to return accurately whether progress
has occurred. Prior to this commit, aio_poll always returned
true if g_poll was entered, whether or not any progress was
made. This required a change to tests/test-aio.c where an
assert was backwards.

Signed-off-by: Alex Bligh 
---
 aio-posix.c  |   20 +---
 aio-win32.c  |   22 +++---
 tests/test-aio.c |4 ++--
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index b68eccd..e5b89ab 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -166,6 +166,10 @@ static bool aio_dispatch(AioContext *ctx)
 g_free(tmp);
 }
 }
+
+/* Run our timers */
+progress |= timerlistgroup_run_timers(&ctx->tlg);
+
 return progress;
 }
 
@@ -232,9 +236,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
 }
 
 /* wait until next event */
-ret = g_poll((GPollFD *)ctx->pollfds->data,
- ctx->pollfds->len,
- blocking ? -1 : 0);
+ret = qemu_poll_ns((GPollFD *)ctx->pollfds->data,
+ ctx->pollfds->len,
+ blocking ? timerlistgroup_deadline_ns(&ctx->tlg) : 0);
 
 /* if we have any readable fds, dispatch event */
 if (ret > 0) {
@@ -245,11 +249,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
 node->pfd.revents = pfd->revents;
 }
 }
-if (aio_dispatch(ctx)) {
-progress = true;
-}
+}
+
+/* Run dispatch even if there were no readable fds to run timers */
+if (aio_dispatch(ctx)) {
+progress = true;
 }
 
 assert(progress || busy);
-return true;
+return progress;
 }
diff --git a/aio-win32.c b/aio-win32.c
index 38723bf..479b871 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -98,6 +98,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 HANDLE events[MAXIMUM_WAIT_OBJECTS + 1];
 bool busy, progress;
 int count;
+int timeout;
 
 progress = false;
 
@@ -111,6 +112,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
 progress = true;
 }
 
+/* Run timers */
+progress |= timerlistgroup_run_timers(&ctx->tlg);
+
 /*
  * Then dispatch any pending callbacks from the GSource.
  *
@@ -174,8 +178,11 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 /* wait until next event */
 while (count > 0) {
-int timeout = blocking ? INFINITE : 0;
-int ret = WaitForMultipleObjects(count, events, FALSE, timeout);
+int ret;
+
+timeout = blocking ?
+qemu_timeout_ns_to_ms(timerlistgroup_deadline_ns(&ctx->tlg)) : 0;
+ret = WaitForMultipleObjects(count, events, FALSE, timeout);
 
 /* if we have any signaled events, dispatch event */
 if ((DWORD) (ret - WAIT_OBJECT_0) >= count) {
@@ -214,6 +221,15 @@ bool aio_poll(AioContext *ctx, bool blocking)
 events[ret - WAIT_OBJECT_0] = events[--count];
 }
 
+if (blocking) {
+/* Run the timers a second time. We do this because otherwise aio_wait
+ * will not note progress - and will stop a drain early - if we have
+ * a timer that was not ready to run entering g_poll but is ready
+ * after g_poll. This will only do anything if a timer has expired.
+ */
+progress |= timerlistgroup_run_timers(&ctx->tlg);
+}
+
 assert(progress || busy);
-return true;
+return progress;
 }
diff --git a/tests/test-aio.c b/tests/test-aio.c
index 2d7ec4c..eedf7f8 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -316,13 +316,13 @@ static void test_wait_event_notifier_noflush(void)
 event_notifier_set(&data.e);
 g_assert(aio_poll(ctx, false));
 g_assert_cmpint(data.n, ==, 1);
-g_assert(aio_poll(ctx, false));
+g_assert(!aio_poll(ctx, false));
 g_assert_cmpint(data.n, ==, 1);
 
 event_notifier_set(&data.e);
 g_assert(aio_poll(ctx, false));
 g_assert_cmpint(data.n, ==, 2);
-g_assert(aio_poll(ctx, false));
+g_assert(!aio_poll(ctx, false));
 g_assert_cmpint(data.n, ==, 2);
 
 event_notifier_set(&dummy.e);
-- 
1.7.9.5
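
qemu_poll_ns() is introduced earlier in the series and not quoted here. For
readers checking the deadline handling above, its assumed behaviour is: a
negative timeout blocks indefinitely, otherwise the value is a timeout in
nanoseconds handed to ppoll() where available. A rough sketch (the real
implementation also has a g_poll() fallback via qemu_timeout_ns_to_ms()
when ppoll() is not available):

#define _GNU_SOURCE
#include <glib.h>
#include <poll.h>
#include <sys/types.h>
#include <time.h>

int qemu_poll_ns(GPollFD *fds, uint nfds, int64_t timeout)
{
    if (timeout < 0) {
        /* negative timeout: block until an fd is ready */
        return ppoll((struct pollfd *)fds, nfds, NULL, NULL);
    } else {
        struct timespec ts = {
            .tv_sec  = timeout / 1000000000LL,
            .tv_nsec = timeout % 1000000000LL,
        };
        return ppoll((struct pollfd *)fds, nfds, &ts, NULL);
    }
}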




[Qemu-devel] [RFC] [PATCHv10 20/31] aio / timers: Add documentation and new format calls

2013-08-11 Thread Alex Bligh
Add documentation for existing qemu timer calls. Add new-format
calls of the form timer_XXX rather than qemu_XXX_timer
for consistency.

Signed-off-by: Alex Bligh 
---
 include/qemu/timer.h |  206 --
 1 file changed, 184 insertions(+), 22 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index a2d77be..bdee09b 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -92,8 +92,52 @@ static inline QEMUClock *qemu_clock_ptr(QEMUClockType type)
 #define vm_clock (qemu_clock_ptr(QEMU_CLOCK_VIRTUAL))
 #define host_clock (qemu_clock_ptr(QEMU_CLOCK_HOST))
 
+/**
+ * qemu_get_clock_ns:
+ * @clock: the clock to operate on
+ *
+ * Get the nanosecond value of a clock
+ *
+ * Returns: the clock value in nanoseconds
+ */
 int64_t qemu_get_clock_ns(QEMUClock *clock);
+
+/**
+ * qemu_clock_get_ns:
+ * @type: the clock type
+ *
+ * Get the nanosecond value of a clock with
+ * type @type
+ *
+ * Returns: the clock value in nanoseconds
+ */
+static inline int64_t qemu_clock_get_ns(QEMUClockType type)
+{
+return qemu_get_clock_ns(qemu_clock_ptr(type));
+}
+
+/**
+ * qemu_clock_has_timers:
+ * @clock: the clock to operate on
+ *
+ * Determines whether a clock's default timer list
+ * has timers attached
+ *
+ * Returns: true if the clock's default timer list
+ * has timers attached
+ */
 bool qemu_clock_has_timers(QEMUClock *clock);
+
+/**
+ * qemu_clock_expired:
+ * @clock: the clock to operate on
+ *
+ * Determines whether a clock's default timer list
+ * has an expired timer.
+ *
+ * Returns: true if the clock's default timer list has
+ * an expired timer
+ */
 bool qemu_clock_expired(QEMUClock *clock);
 int64_t qemu_clock_deadline(QEMUClock *clock);
 
@@ -293,7 +337,7 @@ void timerlistgroup_deinit(QEMUTimerListGroup *tlg);
 bool timerlistgroup_run_timers(QEMUTimerListGroup *tlg);
 
 /**
- * timerlistgroup_deadline_ns
+ * timerlistgroup_deadline_ns:
  * @tlg: the timer list group
  *
  * Determine the deadline of the soonest timer to
@@ -329,13 +373,57 @@ int qemu_timeout_ns_to_ms(int64_t ns);
  * Returns: number of fds ready
  */
 int qemu_poll_ns(GPollFD *fds, uint nfds, int64_t timeout);
+
+/**
+ * qemu_clock_enable:
+ * @clock: the clock to operate on
+ * @enabled: true to enable, false to disable
+ *
+ * Enable or disable a clock
+ */
 void qemu_clock_enable(QEMUClock *clock, bool enabled);
+
+/**
+ * qemu_clock_warp:
+ * @clock: the clock to operate on
+ *
+ * Warp a clock to a new value
+ */
 void qemu_clock_warp(QEMUClock *clock);
 
+/**
+ * qemu_register_clock_reset_notifier:
+ * @clock: the clock to operate on
+ * @notifier: the notifier function
+ *
+ * Register a notifier function to call when the clock
+ * concerned is reset.
+ */
 void qemu_register_clock_reset_notifier(QEMUClock *clock, Notifier *notifier);
+
+/**
+ * qemu_unregister_clock_reset_notifier:
+ * @clock: the clock to operate on
+ * @notifier: the notifier function
+ *
+ * Unregister a notifier function to call when the clock
+ * concerned is reset.
+ */
 void qemu_unregister_clock_reset_notifier(QEMUClock *clock,
   Notifier *notifier);
 
+/**
+ * qemu_new_timer:
+ * @clock: the clock to operate on
+ * @scale: the scale of the clock
+ * @cb: the callback function to call when the timer expires
+ * @opaque: an opaque pointer to pass to the callback
+ *
+ * Produce a new timer attached to clock @clock. This is a legacy
+ * function. Use timer_new instead.
+ *
+ * Returns: a pointer to the new timer allocated.
+ */
 QEMUTimer *qemu_new_timer(QEMUClock *clock, int scale,
   QEMUTimerCB *cb, void *opaque);
 
@@ -400,21 +488,21 @@ static inline QEMUTimer *timer_new(QEMUClockType type, 
int scale,
 return timer_new_tl(main_loop_tlg.tl[type], scale, cb, opaque);
 }
 
+/**
+ * qemu_free_timer:
+ * @ts: the timer to operate on
+ *
+ * free the timer @ts. @ts must not be active.
+ *
+ * This is a legacy function. Use timer_free instead.
+ */
 void qemu_free_timer(QEMUTimer *ts);
-void qemu_del_timer(QEMUTimer *ts);
-void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time);
-void qemu_mod_timer(QEMUTimer *ts, int64_t expire_time);
-bool timer_pending(QEMUTimer *ts);
-bool timer_expired(QEMUTimer *timer_head, int64_t current_time);
-uint64_t timer_expire_time_ns(QEMUTimer *ts);
-
-/* New format calling conventions for timers */
 
 /**
  * timer_free:
- * @ts: the timer
+ * @ts: the timer to operate on
  *
- * Free a timer (it must not be on the active list)
+ * free the timer @ts. @ts must not be active.
  */
 static inline void timer_free(QEMUTimer *ts)
 {
@@ -422,10 +510,22 @@ static inline void timer_free(QEMUTimer *ts)
 }
 
 /**
+ * qemu_del_timer:
+ * @ts: the timer to operate on
+ *
+ * Delete a timer. This makes it inactive. It does not free
+ * memory.
+ *
+ * This is a legacy function. Use timer_del instead.
+ */
+void qemu_del_timer(QEMUTimer *ts);
+
+/**
  * timer_del:
- * @ts: the timer
+ * @ts: th

[Qemu-devel] [RFC] [PATCHv10 31/31] aio / timers: Remove legacy interface

2013-08-11 Thread Alex Bligh
Remove the legacy interface from include/qemu/timers.h.

Ensure struct QEMUClock is not exposed at all.

Signed-off-by: Alex Bligh 
---
 include/qemu/timer.h |  214 +-
 qemu-timer.c |   35 +
 2 files changed, 5 insertions(+), 244 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 168a16d..829c005 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -44,7 +44,6 @@ typedef enum {
 QEMU_CLOCK_MAX
 } QEMUClockType;
 
-typedef struct QEMUClock QEMUClock;
 typedef struct QEMUTimerList QEMUTimerList;
 
 typedef struct QEMUTimerListGroup {
@@ -66,20 +65,10 @@ typedef struct QEMUTimer {
 extern QEMUTimerListGroup main_loop_tlg;
 
 /*
- * QEMUClock & QEMUClockType
+ * QEMUClockType
  */
 
-/**
- * qemu_clock_ptr:
- * @type: type of clock
- *
- * Translate a clock type into a pointer to QEMUClock object.
- *
- * Returns: a pointer to the QEMUClock object
- */
-QEMUClock *qemu_clock_ptr(QEMUClockType type);
-
-/**
+/*
  * qemu_clock_get_ns:
  * @type: the clock type
  *
@@ -654,205 +643,6 @@ static inline int64_t get_ticks_per_sec(void)
 return 1000000000LL;
 }
 
-/**
- * LEGACY API SECTION
- *
- * All these calls will be deleted in due course
- */
-
-/* These three clocks are maintained here with separate variable
- * names for compatibility only.
- */
-#define rt_clock (qemu_clock_ptr(QEMU_CLOCK_REALTIME))
-#define vm_clock (qemu_clock_ptr(QEMU_CLOCK_VIRTUAL))
-#define host_clock (qemu_clock_ptr(QEMU_CLOCK_HOST))
-
-/** LEGACY
- * qemu_get_clock_ns:
- * @clock: the clock to operate on
- *
- * Get the nanosecond value of a clock
- *
- * Returns: the clock value in nanoseconds
- */
-int64_t qemu_get_clock_ns(QEMUClock *clock);
-
-/** LEGACY
- * qemu_get_clock_ms:
- * @clock: the clock to operate on
- *
- * Get the millisecond value of a clock
- *
- * Returns: the clock value in milliseconds
- */
-static inline int64_t qemu_get_clock_ms(QEMUClock *clock)
-{
-return qemu_get_clock_ns(clock) / SCALE_MS;
-}
-
-/** LEGACY
- * qemu_register_clock_reset_notifier:
- * @clock: the clock to operate on
- * @notifier: the notifier function
- *
- * Register a notifier function to call when the clock
- * concerned is reset.
- */
-void qemu_register_clock_reset_notifier(QEMUClock *clock,
-Notifier *notifier);
-
-/** LEGACY
- * qemu_unregister_clock_reset_notifier:
- * @clock: the clock to operate on
- * @notifier: the notifier function
- *
- * Unregister a notifier function to call when the clock
- * concerned is reset.
- */
-void qemu_unregister_clock_reset_notifier(QEMUClock *clock,
-  Notifier *notifier);
-
-/** LEGACY
- * qemu_new_timer:
- * @clock: the clock to operate on
- * @scale: the scale of the clock
- * @cb: the callback function to call when the timer expires
- * @opaque: an opaque pointer to pass to the callback
- *
- * Produce a new timer attached to clock @clock. This is a legacy
- * function. Use timer_new instead.
- *
- * Returns: a pointer to the new timer allocated.
- */
-QEMUTimer *qemu_new_timer(QEMUClock *clock, int scale,
-  QEMUTimerCB *cb, void *opaque);
-
-/** LEGACY
- * qemu_free_timer:
- * @ts: the timer to operate on
- *
- * free the timer @ts. @ts must not be active.
- *
- * This is a legacy function. Use timer_free instead.
- */
-static inline void qemu_free_timer(QEMUTimer *ts)
-{
-timer_free(ts);
-}
-
-/** LEGACY
- * qemu_del_timer:
- * @ts: the timer to operate on
- *
- * Delete a timer. This makes it inactive. It does not free
- * memory.
- *
- * This is a legacy function. Use timer_del instead.
- */
-static inline void qemu_del_timer(QEMUTimer *ts)
-{
-timer_del(ts);
-}
-
-/** LEGACY
- * qemu_mod_timer_ns:
- * @ts: the timer to operate on
- * @expire_time: the expiry time in nanoseconds
- *
- * Modify a timer such that the expiry time is @expire_time
- * as measured in nanoseconds
- *
- * This is a legacy function. Use timer_mod_ns.
- */
-static inline void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time)
-{
-timer_mod_ns(ts, expire_time);
-}
-
-/** LEGACY
- * qemu_mod_timer:
- * @ts: the timer to operate on
- * @expire_time: the expiry time
- *
- * Modify a timer such that the expiry time is @expire_time
- * as measured in the timer's scale
- *
- * This is a legacy function. Use timer_mod.
- */
-static inline void qemu_mod_timer(QEMUTimer *ts, int64_t expire_time)
-{
-timer_mod(ts, expire_time);
-}
-
-/** LEGACY
- * qemu_run_timers:
- * @clock: clock on which to operate
- *
- * Run all the timers associated with the default timer list
- * of a clock.
- *
- * Returns: true if any timer ran.
- */
-bool qemu_run_timers(QEMUClock *clock);
-
-/** LEGACY
- * qemu_new_timer_ns:
- * @clock: the clock to associate with the timer
- * @callback: the callback to call when the timer expires
- * @opaque: the opaque pointer to pa

[Qemu-devel] [RFC] [PATCHv10 13/31] aio / timers: aio_ctx_prepare sets timeout from AioContext timers

2013-08-11 Thread Alex Bligh
Calculate the timeout in aio_ctx_prepare taking into account
the timers attached to the AioContext.

Alter aio_ctx_check similarly.

Signed-off-by: Alex Bligh 
---
 async.c |   13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/async.c b/async.c
index 2b9ba9b..d8656cc 100644
--- a/async.c
+++ b/async.c
@@ -150,13 +150,14 @@ aio_ctx_prepare(GSource *source, gint*timeout)
 {
 AioContext *ctx = (AioContext *) source;
 QEMUBH *bh;
+int deadline;
 
 for (bh = ctx->first_bh; bh; bh = bh->next) {
 if (!bh->deleted && bh->scheduled) {
 if (bh->idle) {
 /* idle bottom halves will be polled at least
  * every 10ms */
-*timeout = 10;
+*timeout = qemu_soonest_timeout(*timeout, 10);
 } else {
 /* non-idle bottom halves will be executed
  * immediately */
@@ -166,6 +167,14 @@ aio_ctx_prepare(GSource *source, gint*timeout)
 }
 }
 
+deadline = qemu_timeout_ns_to_ms(timerlistgroup_deadline_ns(&ctx->tlg));
+if (deadline == 0) {
+*timeout = 0;
+return true;
+} else {
+*timeout = qemu_soonest_timeout(*timeout, deadline);
+}
+
 return false;
 }
 
@@ -180,7 +189,7 @@ aio_ctx_check(GSource *source)
 return true;
}
 }
-return aio_pending(ctx);
+return aio_pending(ctx) || (timerlistgroup_deadline_ns(&ctx->tlg) == 0);
 }
 
 static gboolean
-- 
1.7.9.5
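
The millisecond conversion above relies on qemu_timeout_ns_to_ms(), added
earlier in the series. The property aio_ctx_prepare depends on is that -1
and 0 pass through unchanged and every other value is rounded *up*, so the
GSource never wakes before the nanosecond deadline. A sketch of those
assumed semantics:

#include <stdint.h>
#include "qemu/timer.h"

/* Sketch only: convert a nanosecond timeout to milliseconds for glib,
 * preserving -1 ("no timeout") and 0 ("do not block"). */
int qemu_timeout_ns_to_ms(int64_t ns)
{
    int64_t ms;

    if (ns < 0) {
        return -1;
    }
    if (!ns) {
        return 0;
    }
    /* Round up, so we never return to glib early and spin. */
    ms = (ns + SCALE_MS - 1) / SCALE_MS;
    if (ms > (int64_t) INT32_MAX) {
        ms = INT32_MAX;   /* keep the result within a gint */
    }
    return (int) ms;
}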




[Qemu-devel] [RFC] [PATCHv10 26/31] aio / timers: Convert rtc_clock to be a QEMUClockType

2013-08-11 Thread Alex Bligh
Convert rtc_clock to be a QEMUClockType

Move rtc_clock users to use the new API

Signed-off-by: Alex Bligh 
---
 hw/arm/omap1.c|4 ++--
 hw/arm/pxa2xx.c   |   35 +++
 hw/arm/strongarm.c|   10 +-
 hw/timer/m48t59.c |4 ++--
 hw/timer/mc146818rtc.c|   28 +++-
 hw/timer/pl031.c  |   13 +++--
 hw/timer/twl92230.c   |8 
 include/sysemu/sysemu.h   |2 +-
 target-alpha/sys_helper.c |2 +-
 vl.c  |   10 +-
 10 files changed, 61 insertions(+), 55 deletions(-)

diff --git a/hw/arm/omap1.c b/hw/arm/omap1.c
index 19be5fc..9dc5abd 100644
--- a/hw/arm/omap1.c
+++ b/hw/arm/omap1.c
@@ -2894,7 +2894,7 @@ static void omap_rtc_reset(struct omap_rtc_s *s)
 s->pm_am = 0;
 s->auto_comp = 0;
 s->round = 0;
-s->tick = qemu_get_clock_ms(rtc_clock);
+s->tick = qemu_clock_get_ms(rtc_clock);
 memset(&s->alarm_tm, 0, sizeof(s->alarm_tm));
 s->alarm_tm.tm_mday = 0x01;
 s->status = 1 << 7;
@@ -2915,7 +2915,7 @@ static struct omap_rtc_s *omap_rtc_init(MemoryRegion 
*system_memory,
 
 s->irq = timerirq;
 s->alarm = alarmirq;
-s->clk = qemu_new_timer_ms(rtc_clock, omap_rtc_tick, s);
+s->clk = timer_new_ms(rtc_clock, omap_rtc_tick, s);
 
 omap_rtc_reset(s);
 
diff --git a/hw/arm/pxa2xx.c b/hw/arm/pxa2xx.c
index 17ddd3f..331bc72 100644
--- a/hw/arm/pxa2xx.c
+++ b/hw/arm/pxa2xx.c
@@ -842,7 +842,7 @@ static inline void pxa2xx_rtc_int_update(PXA2xxRTCState *s)
 
 static void pxa2xx_rtc_hzupdate(PXA2xxRTCState *s)
 {
-int64_t rt = qemu_get_clock_ms(rtc_clock);
+int64_t rt = qemu_clock_get_ms(rtc_clock);
 s->last_rcnr += ((rt - s->last_hz) << 15) /
 (1000 * ((s->rttr & 0x) + 1));
 s->last_rdcr += ((rt - s->last_hz) << 15) /
@@ -852,7 +852,7 @@ static void pxa2xx_rtc_hzupdate(PXA2xxRTCState *s)
 
 static void pxa2xx_rtc_swupdate(PXA2xxRTCState *s)
 {
-int64_t rt = qemu_get_clock_ms(rtc_clock);
+int64_t rt = qemu_clock_get_ms(rtc_clock);
 if (s->rtsr & (1 << 12))
 s->last_swcr += (rt - s->last_sw) / 10;
 s->last_sw = rt;
@@ -860,7 +860,7 @@ static void pxa2xx_rtc_swupdate(PXA2xxRTCState *s)
 
 static void pxa2xx_rtc_piupdate(PXA2xxRTCState *s)
 {
-int64_t rt = qemu_get_clock_ms(rtc_clock);
+int64_t rt = qemu_clock_get_ms(rtc_clock);
 if (s->rtsr & (1 << 15))
 s->last_swcr += rt - s->last_pi;
 s->last_pi = rt;
@@ -986,16 +986,19 @@ static uint64_t pxa2xx_rtc_read(void *opaque, hwaddr addr,
 case PIAR:
 return s->piar;
 case RCNR:
-return s->last_rcnr + ((qemu_get_clock_ms(rtc_clock) - s->last_hz) << 
15) /
-(1000 * ((s->rttr & 0x) + 1));
+return s->last_rcnr +
+((qemu_clock_get_ms(rtc_clock) - s->last_hz) << 15) /
+(1000 * ((s->rttr & 0x) + 1));
 case RDCR:
-return s->last_rdcr + ((qemu_get_clock_ms(rtc_clock) - s->last_hz) << 
15) /
-(1000 * ((s->rttr & 0x) + 1));
+return s->last_rdcr +
+((qemu_clock_get_ms(rtc_clock) - s->last_hz) << 15) /
+(1000 * ((s->rttr & 0x) + 1));
 case RYCR:
 return s->last_rycr;
 case SWCR:
 if (s->rtsr & (1 << 12))
-return s->last_swcr + (qemu_get_clock_ms(rtc_clock) - s->last_sw) 
/ 10;
+return s->last_swcr +
+(qemu_clock_get_ms(rtc_clock) - s->last_sw) / 10;
 else
 return s->last_swcr;
 default:
@@ -1135,14 +1138,14 @@ static int pxa2xx_rtc_init(SysBusDevice *dev)
 s->last_swcr = (tm.tm_hour << 19) |
 (tm.tm_min << 13) | (tm.tm_sec << 7);
 s->last_rtcpicr = 0;
-s->last_hz = s->last_sw = s->last_pi = qemu_get_clock_ms(rtc_clock);
-
-s->rtc_hz= qemu_new_timer_ms(rtc_clock, pxa2xx_rtc_hz_tick,s);
-s->rtc_rdal1 = qemu_new_timer_ms(rtc_clock, pxa2xx_rtc_rdal1_tick, s);
-s->rtc_rdal2 = qemu_new_timer_ms(rtc_clock, pxa2xx_rtc_rdal2_tick, s);
-s->rtc_swal1 = qemu_new_timer_ms(rtc_clock, pxa2xx_rtc_swal1_tick, s);
-s->rtc_swal2 = qemu_new_timer_ms(rtc_clock, pxa2xx_rtc_swal2_tick, s);
-s->rtc_pi= qemu_new_timer_ms(rtc_clock, pxa2xx_rtc_pi_tick,s);
+s->last_hz = s->last_sw = s->last_pi = qemu_clock_get_ms(rtc_clock);
+
+s->rtc_hz= timer_new_ms(rtc_clock, pxa2xx_rtc_hz_tick,s);
+s->rtc_rdal1 = timer_new_ms(rtc_clock, pxa2xx_rtc_rdal1_tick, s);
+s->rtc_rdal2 = timer_new_ms(rtc_clock, pxa2xx_rtc_rdal2_tick, s);
+s->rtc_swal1 = timer_new_ms(rtc_clock, pxa2xx_rtc_swal1_tick, s);
+s->rtc_swal2 = timer_new_ms(rtc_clock, pxa2xx_rtc_swal2_tick, s);
+s->rtc_pi= timer_new_ms(rtc_clock, pxa2xx_rtc_pi_tick,s);
 
 sysbus_init_irq(dev, &s->rtc_irq);
 
diff --git a/hw/arm/strongarm.c b/hw/arm/strongarm.c
index 82a9492..a7f8113 100644
--- a/hw/arm/strongarm.c
+++ b/hw/arm/strongarm.c
@@ -269,7 +269,7 

[Qemu-devel] [RFC] [PATCHv10 00/31] aio / timers: Add AioContext timers and use ppoll

2013-08-11 Thread Alex Bligh
[ This patch set is available from git at:
   https://github.com/abligh/qemu/tree/aio-timers10
As autogenerated patch 30 of the series is too large for the mailing list. ]

This patch series adds support for timers attached to an AioContext clock
which get called within aio_poll.

In doing so it removes alarm timers and moves to use ppoll where possible.

This patch set 'sort of' passes make check (see below for caveat)
including a new test harness for the aio timers, but has not been
tested much beyond that. In particular, the win32 changes have not
even been compile tested. Equally, alterations to use_icount
are untested.

Caveat: I have had to alter tests/test-aio.c so the following error
no longer occurs.

ERROR:tests/test-aio.c:346:test_wait_event_notifier_noflush: assertion failed: 
(aio_poll(ctx, false))

As far as I can tell, this check was incorrect, in that it checked that
aio_poll makes progress when in fact it should not make progress. I
fixed an issue where aio_poll was (as far as I can tell) wrongly
returning true on a timeout, and that generated this error.

Note also the comment on patch 19 in relation to a possible bug
in cpus.c.

The penultimate patch is a patch created in an automated manner
using scripts/switch-timer-api, added in this patch set. It violates some
coding standards (e.g. line length >= 80 characters), but this is preferable
in terms of giving a provably correct conversion. This patch is too
large for the mailing list, so

EITHER: get it from git at the URL at the top of this message.

OR: Do the following:
 1. Apply patches -0029 inclusive
 2. Run scripts/switch-timer-api
 3. git commit -a (+ suitable commit message)
 4. Apply patch 0031

If there is demand I can split it one commit per file.

This patch set has been compile tested & make check tested on a
'christmas-tree' configuration, meaning a configuration with every --enable-*
value tested that can be easily configured on Ubuntu Precise,
after application of each patch.

Changes done since v9:
* Rebase to master 2e985fe
* Wrap QEMUTimerListGroup in a struct as we're keeping it
  for the time being

Changes since v8:
* PR_SET_TIMERSLACK commit should have relevant configure patch within
* Delete timerlist_set_notify_cb, put into timerlist_new
* Add missing QLIST_INIT of clock->timerlists
* Fix documentation for timerlist_get_clock
* Rename qemu_timer_xxx to timer_xxx
* Remove unintentional change to pc-bios/slof.bin
* Introduce timer_init and aio_timer_init

Changes since v7:
* Rebase to master 6fdf98f281f85ae6e2883bed2f691bcfe33b1f9f
* Add qemu_clock_get_ns and qemu_clock_get_ms
* Rename qemu_get_clock to qemu_clock_ptr
* Reorder qemu-timer.h to utilise the legacy API
* Hide qemu_clock_new & qemu_clock_free
* Rename default_timerlist to main_loop_timerlist
* Remove main_loop_timerlist once main_loop_tlg is in
* Add script to convert to new API
* Make rtc_clock use new API
* Convert tests/test-aio to use new API
* Run script on entire source code
* Remove legacy API functions

Changes since v6:
* Fix build failure in vnc-auth-sasl.c
* Split first patch into 3
* Add assert on timerlist_free
* Fix ==/= error on qemu_clock_use_for_deadline
* Remove unnecessary cast in aio_timerlist_notify
* Fix bad deadline comparison in aio_ctx_check
* Add assert to timerlist_new_from_clock to check init_clocks
* Use timer_list not tl
* Change default_timerlistgroup to main_loop_timerlistgroup
* Add comment on commit for qemu_clock_use_for_deadline
* Fixed various include file issues
* Convert *_has_timers and *_has_expired to return bool
* Make loop variable consistent when looping through clock types
* Add documentation to existing qemu_timer calls
* Remove qemu_clock_deadline and move to qemu_clock_deadline_ns

Changes since v5:
* Rebase onto master (b9ac5d9)
* Fix spacing in typedef QEMUTimerList
* Rename 'QEMUClocks' extern to 'qemu_clocks'

Changes since v4:
* Rename qemu_timerlist_ functions to timer_list (per Paolo Bonzini)
* Rename qemu_timer_.*timerlist.* to timer_ (per Paolo Bonzini)
* Use enum for QEMUClockType
* Put clocks into an array; remove global variables
* Introduce QEMUTimerListGroup - a timerlist of each type
* Add a QEMUTimerListGroup to AioContext
* Use a callback on timer modification, rather than binding in
  AioContext into the timerlist
* Make cpus.c iterate over all timerlists when it does a notify
* Make cpus.c icount timeout use soonest timeout
  across all timerlists

Changes since v3:
* Split up QEMUClock and QEMUClock list
* Improve commenting
* Fix comment in vl.c
* Change test/test-aio.c to reflect correct behaviour in aio_poll.

Changes since v2:
* Reordered to remove alarm timers last
* Added prctl(PR_SET_TIMERSLACK, 1, ...)
* Renamed qemu_g_poll_ns to qemu_poll_ns
* Moved declaration of above & drop glib types
* Do not use a global list of qemu clocks
* Add AioContext * to QEMUClock
* Split up conversion to use ppoll and timers
* Indentation fix
* Fix aio_win32.c aio_poll to return progress

[Qemu-devel] [RFC] [PATCHv10 09/31] aio / timers: Untangle include files

2013-08-11 Thread Alex Bligh
include/qemu/timer.h has no need to include main-loop.h and
doing so causes an issue for the next patch. Unfortunately
various files assume including timers.h will pull in main-loop.h.
Untangle this mess.

Signed-off-by: Alex Bligh 
---
 dma-helpers.c |1 +
 hw/dma/xilinx_axidma.c|1 +
 hw/timer/arm_timer.c  |1 +
 hw/timer/exynos4210_mct.c |1 +
 hw/timer/exynos4210_pwm.c |1 +
 hw/timer/grlib_gptimer.c  |2 ++
 hw/timer/imx_epit.c   |1 +
 hw/timer/imx_gpt.c|1 +
 hw/timer/lm32_timer.c |1 +
 hw/timer/puv3_ost.c   |1 +
 hw/timer/sh_timer.c   |1 +
 hw/timer/slavio_timer.c   |1 +
 hw/timer/xilinx_timer.c   |1 +
 hw/tpm/tpm_tis.c  |1 +
 hw/usb/hcd-uhci.c |1 +
 include/block/block_int.h |1 +
 include/block/coroutine.h |2 ++
 include/qemu/timer.h  |1 -
 migration-exec.c  |1 +
 migration-fd.c|1 +
 migration-tcp.c   |1 +
 migration-unix.c  |1 +
 migration.c   |1 +
 nbd.c |1 +
 net/net.c |1 +
 net/socket.c  |1 +
 qemu-coroutine-io.c   |1 +
 qemu-io-cmds.c|1 +
 qemu-nbd.c|1 +
 slirp/misc.c  |1 +
 thread-pool.c |1 +
 ui/vnc-auth-sasl.h|1 +
 ui/vnc-auth-vencrypt.c|2 +-
 ui/vnc-ws.c   |1 +
 34 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index 499550f..c9620a5 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -11,6 +11,7 @@
 #include "trace.h"
 #include "qemu/range.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 
 /* #define DEBUG_IOMMU */
 
diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
index a48e3ba..59e8e35 100644
--- a/hw/dma/xilinx_axidma.c
+++ b/hw/dma/xilinx_axidma.c
@@ -27,6 +27,7 @@
 #include "hw/ptimer.h"
 #include "qemu/log.h"
 #include "qapi/qmp/qerror.h"
+#include "qemu/main-loop.h"
 
 #include "hw/stream.h"
 
diff --git a/hw/timer/arm_timer.c b/hw/timer/arm_timer.c
index acfea59..a47afde 100644
--- a/hw/timer/arm_timer.c
+++ b/hw/timer/arm_timer.c
@@ -12,6 +12,7 @@
 #include "qemu-common.h"
 #include "hw/qdev.h"
 #include "hw/ptimer.h"
+#include "qemu/main-loop.h"
 
 /* Common timer implementation.  */
 
diff --git a/hw/timer/exynos4210_mct.c b/hw/timer/exynos4210_mct.c
index a8009a4..13b1889 100644
--- a/hw/timer/exynos4210_mct.c
+++ b/hw/timer/exynos4210_mct.c
@@ -54,6 +54,7 @@
 
 #include "hw/sysbus.h"
 #include "qemu/timer.h"
+#include "qemu/main-loop.h"
 #include "qemu-common.h"
 #include "hw/ptimer.h"
 
diff --git a/hw/timer/exynos4210_pwm.c b/hw/timer/exynos4210_pwm.c
index a52f0f6..1aa8f4d 100644
--- a/hw/timer/exynos4210_pwm.c
+++ b/hw/timer/exynos4210_pwm.c
@@ -23,6 +23,7 @@
 #include "hw/sysbus.h"
 #include "qemu/timer.h"
 #include "qemu-common.h"
+#include "qemu/main-loop.h"
 #include "hw/ptimer.h"
 
 #include "hw/arm/exynos4210.h"
diff --git a/hw/timer/grlib_gptimer.c b/hw/timer/grlib_gptimer.c
index 7c1055a..74c16d6 100644
--- a/hw/timer/grlib_gptimer.c
+++ b/hw/timer/grlib_gptimer.c
@@ -25,6 +25,8 @@
 #include "hw/sysbus.h"
 #include "qemu/timer.h"
 #include "hw/ptimer.h"
+#include "qemu/timer.h"
+#include "qemu/main-loop.h"
 
 #include "trace.h"
 
diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c
index 117dc7b..efe2ff9 100644
--- a/hw/timer/imx_epit.c
+++ b/hw/timer/imx_epit.c
@@ -18,6 +18,7 @@
 #include "hw/ptimer.h"
 #include "hw/sysbus.h"
 #include "hw/arm/imx.h"
+#include "qemu/main-loop.h"
 
 #define TYPE_IMX_EPIT "imx.epit"
 
diff --git a/hw/timer/imx_gpt.c b/hw/timer/imx_gpt.c
index 87db0e1..f2d1975 100644
--- a/hw/timer/imx_gpt.c
+++ b/hw/timer/imx_gpt.c
@@ -18,6 +18,7 @@
 #include "hw/ptimer.h"
 #include "hw/sysbus.h"
 #include "hw/arm/imx.h"
+#include "qemu/main-loop.h"
 
 #define TYPE_IMX_GPT "imx.gpt"
 
diff --git a/hw/timer/lm32_timer.c b/hw/timer/lm32_timer.c
index 986e6a1..8ed138c 100644
--- a/hw/timer/lm32_timer.c
+++ b/hw/timer/lm32_timer.c
@@ -27,6 +27,7 @@
 #include "qemu/timer.h"
 #include "hw/ptimer.h"
 #include "qemu/error-report.h"
+#include "qemu/main-loop.h"
 
 #define DEFAULT_FREQUENCY (50*1000000)
 
diff --git a/hw/timer/puv3_ost.c b/hw/timer/puv3_ost.c
index 4bd2b76..fa9eefd 100644
--- a/hw/timer/puv3_ost.c
+++ b/hw/timer/puv3_ost.c
@@ -10,6 +10,7 @@
  */
 #include "hw/sysbus.h"
 #include "hw/ptimer.h"
+#include "qemu/main-loop.h"
 
 #undef DEBUG_PUV3
 #include "hw/unicore32/puv3.h"
diff --git a/hw/timer/sh_timer.c b/hw/timer/sh_timer.c
index 251a10d..07f0670 100644
--- a/hw/timer/sh_timer.c
+++ b/hw/timer/sh_timer.c
@@ -11,6 +11,7 @@
 #include "hw/hw.h"
 #include "hw/sh4/sh.h"
 #include "qemu/timer.h"
+#include "qemu/main-loop.h"
 #include "exec/address-spaces.h"
 #include "hw/ptimer.h"
 
diff --git a/hw/timer/slavio_timer.c b/hw/timer/slavio_timer.c
index 33e8f6c..f75b914 100644
--- a/hw/timer/slavio_timer.c
+++ b/hw/time

[Qemu-devel] [RFC] [PATCHv10 24/31] aio / timers: Rearrange timer.h & make legacy functions call non-legacy

2013-08-11 Thread Alex Bligh
Rearrange timer.h so it is in order by function type.

Make legacy functions call non-legacy functions rather than vice-versa.

Convert cpus.c to use new API.

Signed-off-by: Alex Bligh 
---
 cpus.c   |  112 -
 hw/acpi/piix4.c  |2 +-
 hw/input/tsc2005.c   |4 +-
 hw/input/tsc210x.c   |4 +-
 hw/sparc64/sun4u.c   |4 +-
 include/qemu/timer.h |  614 --
 main-loop.c  |2 +-
 qemu-timer.c |  100 +---
 qtest.c  |2 +-
 savevm.c |8 +-
 stubs/clock-warp.c   |2 +-
 11 files changed, 477 insertions(+), 377 deletions(-)

diff --git a/cpus.c b/cpus.c
index 673d506..a6d7833 100644
--- a/cpus.c
+++ b/cpus.c
@@ -202,7 +202,7 @@ static void icount_adjust(void)
 return;
 }
 cur_time = cpu_get_clock();
-cur_icount = qemu_get_clock_ns(vm_clock);
+cur_icount = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 delta = cur_icount - cur_time;
 /* FIXME: This is a very crude algorithm, somewhat prone to oscillation.  
*/
 if (delta > 0
@@ -223,15 +223,16 @@ static void icount_adjust(void)
 
 static void icount_adjust_rt(void *opaque)
 {
-qemu_mod_timer(icount_rt_timer,
-   qemu_get_clock_ms(rt_clock) + 1000);
+timer_mod(icount_rt_timer,
+   qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 1000);
 icount_adjust();
 }
 
 static void icount_adjust_vm(void *opaque)
 {
-qemu_mod_timer(icount_vm_timer,
-   qemu_get_clock_ns(vm_clock) + get_ticks_per_sec() / 10);
+timer_mod(icount_vm_timer,
+   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+   get_ticks_per_sec() / 10);
 icount_adjust();
 }
 
@@ -247,22 +248,22 @@ static void icount_warp_rt(void *opaque)
 }
 
 if (runstate_is_running()) {
-int64_t clock = qemu_get_clock_ns(rt_clock);
+int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 int64_t warp_delta = clock - vm_clock_warp_start;
 if (use_icount == 1) {
 qemu_icount_bias += warp_delta;
 } else {
 /*
- * In adaptive mode, do not let the vm_clock run too
+ * In adaptive mode, do not let QEMU_CLOCK_VIRTUAL run too
  * far ahead of real time.
  */
 int64_t cur_time = cpu_get_clock();
-int64_t cur_icount = qemu_get_clock_ns(vm_clock);
+int64_t cur_icount = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 int64_t delta = cur_time - cur_icount;
 qemu_icount_bias += MIN(warp_delta, delta);
 }
-if (qemu_clock_expired(vm_clock)) {
-qemu_clock_notify(vm_clock);
+if (qemu_clock_expired(QEMU_CLOCK_VIRTUAL)) {
+qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
 }
 }
 vm_clock_warp_start = -1;
@@ -270,19 +271,19 @@ static void icount_warp_rt(void *opaque)
 
 void qtest_clock_warp(int64_t dest)
 {
-int64_t clock = qemu_get_clock_ns(vm_clock);
+int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 assert(qtest_enabled());
 while (clock < dest) {
-int64_t deadline = qemu_clock_deadline_ns_all(vm_clock);
+int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
 int64_t warp = MIN(dest - clock, deadline);
 qemu_icount_bias += warp;
-qemu_run_timers(vm_clock);
-clock = qemu_get_clock_ns(vm_clock);
+qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
+clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 }
-qemu_clock_notify(vm_clock);
+qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
 }
 
-void qemu_clock_warp(QEMUClock *clock)
+void qemu_clock_warp(QEMUClockType type)
 {
 int64_t deadline;
 
@@ -291,20 +292,20 @@ void qemu_clock_warp(QEMUClock *clock)
  * applicable to other clocks.  But a clock argument removes the
  * need for if statements all over the place.
  */
-if (clock != vm_clock || !use_icount) {
+if (type != QEMU_CLOCK_VIRTUAL || !use_icount) {
 return;
 }
 
 /*
- * If the CPUs have been sleeping, advance the vm_clock timer now.  This
- * ensures that the deadline for the timer is computed correctly below.
+ * If the CPUs have been sleeping, advance QEMU_CLOCK_VIRTUAL timer now.
+ * This ensures that the deadline for the timer is computed correctly 
below.
  * This also makes sure that the insn counter is synchronized before the
  * CPU starts running, in case the CPU is woken by an event other than
- * the earliest vm_clock timer.
+ * the earliest QEMU_CLOCK_VIRTUAL timer.
  */
 icount_warp_rt(NULL);
-if (!all_cpu_threads_idle() || !qemu_clock_has_timers(vm_clock)) {
-qemu_del_timer(icount_warp_timer);
+if (!all_cpu_threads_idle() || !qemu_clock_has_timers(QEMU_CLOCK_VIRTUAL)) 
{
+timer_del(icount_warp_timer);
 return;
 }
 
@@ -313,12 +314,12 @@ void qemu_clock_warp(

[Qemu-devel] [RFC] [PATCHv10 22/31] aio / timers: Remove legacy qemu_clock_deadline & qemu_timerlist_deadline

2013-08-11 Thread Alex Bligh
Remove qemu_clock_deadline and qemu_timerlist_deadline now we are using
the ns functions throughout.

Signed-off-by: Alex Bligh 
---
 include/qemu/timer.h |   16 
 qemu-timer.c |   20 
 2 files changed, 36 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 1684e77..899a11a 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -139,7 +139,6 @@ bool qemu_clock_has_timers(QEMUClock *clock);
  * an expired timer
  */
 bool qemu_clock_expired(QEMUClock *clock);
-int64_t qemu_clock_deadline(QEMUClock *clock);
 
 /**
  * qemu_clock_deadline_ns:
@@ -245,21 +244,6 @@ bool timerlist_has_timers(QEMUTimerList *timer_list);
 bool timerlist_expired(QEMUTimerList *timer_list);
 
 /**
- * timerlist_deadline:
- * @timer_list: the timer list to operate on
- *
- * Determine the deadline for a timer_list. This is
- * a legacy function which returns INT32_MAX if the
- * timer list has no timers or if the earliest timer
- * expires later than INT32_MAX nanoseconds away.
- *
- * Returns: the number of nanoseconds until the earliest
- * timer expires or INT32_MAX in the situations listed
- * above
- */
-int64_t timerlist_deadline(QEMUTimerList *timer_list);
-
-/**
  * timerlist_deadline_ns:
  * @timer_list: the timer list to operate on
  *
diff --git a/qemu-timer.c b/qemu-timer.c
index acc3bcf..2f27c8d 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -176,26 +176,6 @@ bool qemu_clock_expired(QEMUClock *clock)
 return timerlist_expired(clock->main_loop_timerlist);
 }
 
-int64_t timerlist_deadline(QEMUTimerList *timer_list)
-{
-/* To avoid problems with overflow limit this to 2^32.  */
-int64_t delta = INT32_MAX;
-
-if (timer_list->clock->enabled && timer_list->active_timers) {
-delta = timer_list->active_timers->expire_time -
-qemu_get_clock_ns(timer_list->clock);
-}
-if (delta < 0) {
-delta = 0;
-}
-return delta;
-}
-
-int64_t qemu_clock_deadline(QEMUClock *clock)
-{
-return timerlist_deadline(clock->main_loop_timerlist);
-}
-
 /*
  * As above, but return -1 for no deadline, and do not cap to 2^32
  * as we know the result is always positive.
-- 
1.7.9.5
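
For completeness, the nanosecond replacement referred to above behaves
roughly as sketched below (field and helper names are taken from the
deleted code in this hunk; the real timerlist_deadline_ns lives in
qemu-timer.c):

/* Sketch only: time until the earliest timer on the list in nanoseconds,
 * 0 if it has already expired, -1 if the list is empty or its clock is
 * disabled. No INT32_MAX capping, unlike the removed legacy call. */
int64_t timerlist_deadline_ns(QEMUTimerList *timer_list)
{
    int64_t delta;

    if (!timer_list->clock->enabled || !timer_list->active_timers) {
        return -1;
    }
    delta = timer_list->active_timers->expire_time -
            qemu_get_clock_ns(timer_list->clock);
    return delta < 0 ? 0 : delta;
}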




[Qemu-devel] [RFC] [PATCHv10 19/31] aio / timers: Use all timerlists in icount warp calculations

2013-08-11 Thread Alex Bligh
Notify all timerlists derived from vm_clock in icount warp
calculations.

When calculating timer delay based on vm_clock deadline, use
all timerlists.

For compatibility, maintain an apparent bug where when using
icount, if no vm_clock timer was set, qemu_clock_deadline
would return INT32_MAX and always set an icount clock expiry
about 2 seconds ahead.

NB: thread safety - when different timerlists sit on different
threads, this will need some locking.

Signed-off-by: Alex Bligh 
---
 cpus.c   |   46 +-
 include/qemu/timer.h |   13 +
 qemu-timer.c |   16 
 qtest.c  |2 +-
 4 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/cpus.c b/cpus.c
index 0f65e76..673d506 100644
--- a/cpus.c
+++ b/cpus.c
@@ -262,7 +262,7 @@ static void icount_warp_rt(void *opaque)
 qemu_icount_bias += MIN(warp_delta, delta);
 }
 if (qemu_clock_expired(vm_clock)) {
-qemu_notify_event();
+qemu_clock_notify(vm_clock);
 }
 }
 vm_clock_warp_start = -1;
@@ -273,13 +273,13 @@ void qtest_clock_warp(int64_t dest)
 int64_t clock = qemu_get_clock_ns(vm_clock);
 assert(qtest_enabled());
 while (clock < dest) {
-int64_t deadline = qemu_clock_deadline(vm_clock);
+int64_t deadline = qemu_clock_deadline_ns_all(vm_clock);
 int64_t warp = MIN(dest - clock, deadline);
 qemu_icount_bias += warp;
 qemu_run_timers(vm_clock);
 clock = qemu_get_clock_ns(vm_clock);
 }
-qemu_notify_event();
+qemu_clock_notify(vm_clock);
 }
 
 void qemu_clock_warp(QEMUClock *clock)
@@ -314,7 +314,18 @@ void qemu_clock_warp(QEMUClock *clock)
 }
 
 vm_clock_warp_start = qemu_get_clock_ns(rt_clock);
-deadline = qemu_clock_deadline(vm_clock);
+/* We want to use the earliest deadline from ALL vm_clocks */
+deadline = qemu_clock_deadline_ns_all(vm_clock);
+
+/* Maintain prior (possibly buggy) behaviour where if no deadline
+ * was set (as there is no vm_clock timer) or it is more than
+ * INT32_MAX nanoseconds ahead, we still use INT32_MAX
+ * nanoseconds.
+ */
+if ((deadline < 0) || (deadline > INT32_MAX)) {
+deadline = INT32_MAX;
+}
+
 if (deadline > 0) {
 /*
  * Ensure the vm_clock proceeds even when the virtual CPU goes to
@@ -333,8 +344,8 @@ void qemu_clock_warp(QEMUClock *clock)
  * packets continuously instead of every 100ms.
  */
 qemu_mod_timer(icount_warp_timer, vm_clock_warp_start + deadline);
-} else {
-qemu_notify_event();
+} else if (deadline == 0) {
+qemu_clock_notify(vm_clock);
 }
 }
 
@@ -866,8 +877,13 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
 while (1) {
 tcg_exec_all();
-if (use_icount && qemu_clock_deadline(vm_clock) <= 0) {
-qemu_notify_event();
+
+if (use_icount) {
+int64_t deadline = qemu_clock_deadline_ns_all(vm_clock);
+
+if (deadline == 0) {
+qemu_clock_notify(vm_clock);
+}
 }
 qemu_tcg_wait_io_event();
 }
@@ -1145,11 +1161,23 @@ static int tcg_cpu_exec(CPUArchState *env)
 #endif
 if (use_icount) {
 int64_t count;
+int64_t deadline;
 int decr;
 qemu_icount -= (env->icount_decr.u16.low + env->icount_extra);
 env->icount_decr.u16.low = 0;
 env->icount_extra = 0;
-count = qemu_icount_round(qemu_clock_deadline(vm_clock));
+deadline = qemu_clock_deadline_ns_all(vm_clock);
+
+/* Maintain prior (possibly buggy) behaviour where if no deadline
+ * was set (as there is no vm_clock timer) or it is more than
+ * INT32_MAX nanoseconds ahead, we still use INT32_MAX
+ * nanoseconds.
+ */
+if ((deadline < 0) || (deadline > INT32_MAX)) {
+deadline = INT32_MAX;
+}
+
+count = qemu_icount_round(deadline);
 qemu_icount += count;
decr = (count > 0xffff) ? 0xffff : count;
 count -= decr;
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 4161ec7..a2d77be 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -102,6 +102,7 @@ int64_t qemu_clock_deadline(QEMUClock *clock);
  * @clock: the clock to operate on
  *
  * Calculate the timeout of the earliest expiring timer
+ * on the default timer list associated with the clock
  * in nanoseconds, or -1 if no timer is set to expire.
  *
  * Returns: time until expiry in nanoseconds or -1
@@ -125,6 +126,18 @@ int64_t qemu_clock_deadline_ns(QEMUClock *clock);
 bool qemu_clock_use_for_deadline(QEMUClock *clock);
 
 /**
+ * qemu_clock_deadline_ns_all:
+ * @clock: the clock to operate on
+ *
+ * Calculate the deadline across all timer lists associated
+ * with a clock (as opposed to just the default one)
+ * in nanoseconds, or -1 if no timer is set to expire.

[Qemu-devel] [RFC] [PATCHv10 08/31] aio / timers: Split QEMUClock into QEMUClock and QEMUTimerList

2013-08-11 Thread Alex Bligh
Split QEMUClock into QEMUClock and QEMUTimerList so that we can
have more than one QEMUTimerList associated with the same clock.

Introduce a main_loop_timerlist concept and make existing
qemu_clock_* calls that actually should operate on a QEMUTimerList
call the relevant QEMUTimerList implementations, using the clock's
default timerlist. This vastly reduces the invasiveness of this
change and means the API stays constant for existing users.

Introduce a list of QEMUTimerLists associated with each clock
so that reenabling the clock can cause all the notifiers
to be called. Note the code to do the notifications is added
in a later patch.

Switch QEMUClockType to an enum. Remove global variables vm_clock,
host_clock and rt_clock and add compatibility defines. Do not
fix qemu_next_alarm_deadline as it's going to be deleted.

Add qemu_clock_use_for_deadline to indicate whether a particular
clock should be used for deadline calculations. When use_icount
is true, vm_clock should not be used for deadline calculations
as it does not contain a nanosecond count. Instead, icount
timeouts come from the execution thread doing aio_notify or
qemu_notify as appropriate. This function is used in the next
patch.
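
The compatibility approach is visible in wrappers like this one (a sketch of the
pattern used throughout the patch):

    /* the legacy entry point keeps its signature, but forwards to the
     * clock's default (main loop) timer list */
    bool qemu_clock_expired(QEMUClock *clock)
    {
        return timerlist_expired(clock->main_loop_timerlist);
    }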

Signed-off-by: Alex Bligh 
---
 include/qemu/timer.h |  347 ++
 qemu-timer.c |  207 ++
 2 files changed, 475 insertions(+), 79 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index fcb6a42..a217a81 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -11,34 +11,84 @@
 #define SCALE_US 1000
 #define SCALE_NS 1
 
-#define QEMU_CLOCK_REALTIME 0
-#define QEMU_CLOCK_VIRTUAL  1
-#define QEMU_CLOCK_HOST 2
+/**
+ * QEMUClockType:
+ *
+ * The following clock types are available:
+ *
+ * @QEMU_CLOCK_REALTIME: Real time clock
+ *
+ * The real time clock should be used only for stuff which does not
+ * change the virtual machine state, as it is run even if the virtual
+ * machine is stopped. The real time clock has a frequency of 1000
+ * Hz.
+ *
+ * Formerly rt_clock
+ *
+ * @QEMU_CLOCK_VIRTUAL: virtual clock
+ *
+ * The virtual clock is only run during the emulation. It is stopped
+ * when the virtual machine is stopped. Virtual timers use a high
+ * precision clock, usually cpu cycles (use ticks_per_sec).
+ *
+ * Formerly vm_clock
+ *
+ * @QEMU_CLOCK_HOST: host clock
+ *
+ * The host clock should be use for device models that emulate accurate
+ * real time sources. It will continue to run when the virtual machine
+ * is suspended, and it will reflect system time changes the host may
+ * undergo (e.g. due to NTP). The host clock has the same precision as
+ * the virtual clock.
+ *
+ * Formerly host_clock
+ */
+
+typedef enum {
+QEMU_CLOCK_REALTIME = 0,
+QEMU_CLOCK_VIRTUAL = 1,
+QEMU_CLOCK_HOST = 2,
+QEMU_CLOCK_MAX
+} QEMUClockType;
 
 typedef struct QEMUClock QEMUClock;
+typedef struct QEMUTimerList QEMUTimerList;
 typedef void QEMUTimerCB(void *opaque);
 
-/* The real time clock should be used only for stuff which does not
-   change the virtual machine state, as it is run even if the virtual
-   machine is stopped. The real time clock has a frequency of 1000
-   Hz. */
-extern QEMUClock *rt_clock;
+typedef struct QEMUTimer {
+int64_t expire_time;/* in nanoseconds */
+QEMUTimerList *timer_list;
+QEMUTimerCB *cb;
+void *opaque;
+QEMUTimer *next;
+int scale;
+} QEMUTimer;
+
+extern QEMUClock *qemu_clocks[QEMU_CLOCK_MAX];
 
-/* The virtual clock is only run during the emulation. It is stopped
-   when the virtual machine is stopped. Virtual timers use a high
-   precision clock, usually cpu cycles (use ticks_per_sec). */
-extern QEMUClock *vm_clock;
+/**
+ * qemu_clock_ptr:
+ * @type: type of clock
+ *
+ * Translate a clock type into a pointer to QEMUClock object.
+ *
+ * Returns: a pointer to the QEMUClock object
+ */
+static inline QEMUClock *qemu_clock_ptr(QEMUClockType type)
+{
+return qemu_clocks[type];
+}
 
-/* The host clock should be use for device models that emulate accurate
-   real time sources. It will continue to run when the virtual machine
-   is suspended, and it will reflect system time changes the host may
-   undergo (e.g. due to NTP). The host clock has the same precision as
-   the virtual clock. */
-extern QEMUClock *host_clock;
+/* These three clocks are maintained here with separate variable
+ * names for compatibility only.
+ */
+#define rt_clock (qemu_clock_ptr(QEMU_CLOCK_REALTIME))
+#define vm_clock (qemu_clock_ptr(QEMU_CLOCK_VIRTUAL))
+#define host_clock (qemu_clock_ptr(QEMU_CLOCK_HOST))
 
 int64_t qemu_get_clock_ns(QEMUClock *clock);
-int64_t qemu_clock_has_timers(QEMUClock *clock);
-int64_t qemu_clock_expired(QEMUClock *clock);
+bool qemu_clock_has_timers(QEMUClock *clock);
+bool qemu_clock_expired(QEMUClock *clock);
 int64_t qemu_clock_deadline(QEMUClock *clock);
 
 /**
@@ -53,6 +103,121 @@ int64_t qemu_clock_deadline(QEMUCl

[Qemu-devel] [RFC] [PATCHv10 27/31] aio / timers: convert block_job_sleep_ns and co_sleep_ns to new API

2013-08-11 Thread Alex Bligh
Convert block_job_sleep_ns and co_sleep_ns to use the new timer
API.
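
The call-site change is mechanical, e.g. (taken from the hunks below):

    /* before */
    block_job_sleep_ns(&job->common, rt_clock, delay_ns);
    /* after */
    block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);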

Signed-off-by: Alex Bligh 
---
 block/backup.c|4 ++--
 block/commit.c|2 +-
 block/mirror.c|4 ++--
 block/stream.c|2 +-
 blockjob.c|4 ++--
 include/block/blockjob.h  |2 +-
 include/block/coroutine.h |2 +-
 qemu-coroutine-sleep.c|   10 +-
 8 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 6ae8a05..e12b3b1 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -272,9 +272,9 @@ static void coroutine_fn backup_run(void *opaque)
 uint64_t delay_ns = ratelimit_calculate_delay(
 &job->limit, job->sectors_read);
 job->sectors_read = 0;
-block_job_sleep_ns(&job->common, rt_clock, delay_ns);
+block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 
delay_ns);
 } else {
-block_job_sleep_ns(&job->common, rt_clock, 0);
+block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
 }
 
 if (block_job_is_cancelled(&job->common)) {
diff --git a/block/commit.c b/block/commit.c
index 2227fc2..51a1ab3 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -103,7 +103,7 @@ wait:
 /* Note that even when no rate limit is applied we need to yield
  * with no pending I/O here so that bdrv_drain_all() returns.
  */
-block_job_sleep_ns(&s->common, rt_clock, delay_ns);
+block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
 if (block_job_is_cancelled(&s->common)) {
 break;
 }
diff --git a/block/mirror.c b/block/mirror.c
index bed4a7e..ead567e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -439,13 +439,13 @@ static void coroutine_fn mirror_run(void *opaque)
 delay_ns = 0;
 }
 
-block_job_sleep_ns(&s->common, rt_clock, delay_ns);
+block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
 if (block_job_is_cancelled(&s->common)) {
 break;
 }
 } else if (!should_complete) {
 delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
-block_job_sleep_ns(&s->common, rt_clock, delay_ns);
+block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
 } else if (cnt == 0) {
 /* The two disks are in sync.  Exit and report successful
  * completion.
diff --git a/block/stream.c b/block/stream.c
index 7fe9e48..0ef1b9d 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -110,7 +110,7 @@ wait:
 /* Note that even when no rate limit is applied we need to yield
  * with no pending I/O here so that bdrv_drain_all() returns.
  */
-block_job_sleep_ns(&s->common, rt_clock, delay_ns);
+block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
 if (block_job_is_cancelled(&s->common)) {
 break;
 }
diff --git a/blockjob.c b/blockjob.c
index ca80df1..7edc945 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -187,7 +187,7 @@ int block_job_cancel_sync(BlockJob *job)
 return (data.cancelled && data.ret == 0) ? -ECANCELED : data.ret;
 }
 
-void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
+void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
 {
 assert(job->busy);
 
@@ -200,7 +200,7 @@ void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns)
 if (block_job_is_paused(job)) {
 qemu_coroutine_yield();
 } else {
-co_sleep_ns(clock, ns);
+co_sleep_ns(type, ns);
 }
 job->busy = true;
 }
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index c290d07..d530409 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -141,7 +141,7 @@ void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs,
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
  * nanoseconds.  Canceling the job will interrupt the wait immediately.
  */
-void block_job_sleep_ns(BlockJob *job, QEMUClock *clock, int64_t ns);
+void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
 
 /**
  * block_job_completed:
diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index 9197bfb..a4d4055 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -214,7 +214,7 @@ void qemu_co_rwlock_unlock(CoRwlock *lock);
  * Note this function uses timers and hence only works when a main loop is in
  * use.  See main-loop.h and do not use from qemu-tool programs.
  */
-void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns);
+void coroutine_fn co_sleep_ns(QEMUClockType type, int64_t ns);
 
 /**
  * Yield until a file descriptor becomes readable
diff --git a/qemu-coroutine-sleep.c b/qemu-coroutine-sleep.c
index 169ce5c..f6db978 100644
-

[Qemu-devel] [RFC] [PATCHv10 25/31] aio / timers: Remove main_loop_timerlist

2013-08-11 Thread Alex Bligh
Now we have timerlistgroups implemented and main_loop_tlg, we
no longer need the concept of a default timer list associated
with each clock. Remove it and simplify initialisation of
clocks and timer lists.
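
After this change the default list for a clock type is looked up through
main_loop_tlg, as in the hunk below (sketch):

    QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClockType type)
    {
        return main_loop_tlg.tl[type];
    }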

Signed-off-by: Alex Bligh 
---
 include/qemu/timer.h |6 +
 qemu-timer.c |   63 ++
 2 files changed, 28 insertions(+), 41 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 49c31d2..168a16d 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -64,7 +64,6 @@ typedef struct QEMUTimer {
 } QEMUTimer;
 
 extern QEMUTimerListGroup main_loop_tlg;
-extern QEMUClock *qemu_clocks[QEMU_CLOCK_MAX];
 
 /*
  * QEMUClock & QEMUClockType
@@ -78,10 +77,7 @@ extern QEMUClock *qemu_clocks[QEMU_CLOCK_MAX];
  *
  * Returns: a pointer to the QEMUClock object
  */
-static inline QEMUClock *qemu_clock_ptr(QEMUClockType type)
-{
-return qemu_clocks[type];
-}
+QEMUClock *qemu_clock_ptr(QEMUClockType type);
 
 /**
  * qemu_clock_get_ns;
diff --git a/qemu-timer.c b/qemu-timer.c
index e81e651..8498651 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -45,7 +45,6 @@
 /* timers */
 
 struct QEMUClock {
-QEMUTimerList *main_loop_timerlist;
 QLIST_HEAD(, QEMUTimerList) timerlists;
 
 NotifierList reset_notifiers;
@@ -56,7 +55,7 @@ struct QEMUClock {
 };
 
 QEMUTimerListGroup main_loop_tlg;
-QEMUClock *qemu_clocks[QEMU_CLOCK_MAX];
+QEMUClock qemu_clocks[QEMU_CLOCK_MAX];
 
 /* A QEMUTimerList is a list of timers attached to a clock. More
  * than one QEMUTimerList can be attached to each clock, for instance
@@ -73,24 +72,30 @@ struct QEMUTimerList {
 void *notify_opaque;
 };
 
+/**
+ * qemu_clock_ptr:
+ * @type: type of clock
+ *
+ * Translate a clock type into a pointer to QEMUClock object.
+ *
+ * Returns: a pointer to the QEMUClock object
+ */
+QEMUClock *qemu_clock_ptr(QEMUClockType type)
+{
+return &qemu_clocks[type];
+}
+
 static bool timer_expired_ns(QEMUTimer *timer_head, int64_t current_time)
 {
 return timer_head && (timer_head->expire_time <= current_time);
 }
 
-static QEMUTimerList *timerlist_new_from_clock(QEMUClock *clock,
-   QEMUTimerListNotifyCB *cb,
-   void *opaque)
+QEMUTimerList *timerlist_new(QEMUClockType type,
+ QEMUTimerListNotifyCB *cb,
+ void *opaque)
 {
 QEMUTimerList *timer_list;
-
-/* Assert if we do not have a clock. If you see this
- * assertion in means that the clocks have not been
- * initialised before a timerlist is needed. This
- * normally happens if an AioContext is used before
- * init_clocks() is called within main().
- */
-assert(clock);
+QEMUClock *clock = qemu_clock_ptr(type);
 
 timer_list = g_malloc0(sizeof(QEMUTimerList));
 timer_list->clock = clock;
@@ -100,36 +105,25 @@ static QEMUTimerList *timerlist_new_from_clock(QEMUClock *clock,
 return timer_list;
 }
 
-QEMUTimerList *timerlist_new(QEMUClockType type,
- QEMUTimerListNotifyCB *cb, void *opaque)
-{
-return timerlist_new_from_clock(qemu_clock_ptr(type), cb, opaque);
-}
-
 void timerlist_free(QEMUTimerList *timer_list)
 {
 assert(!timerlist_has_timers(timer_list));
 if (timer_list->clock) {
 QLIST_REMOVE(timer_list, list);
-if (timer_list->clock->main_loop_timerlist == timer_list) {
-timer_list->clock->main_loop_timerlist = NULL;
-}
 }
 g_free(timer_list);
 }
 
-static QEMUClock *qemu_clock_new(QEMUClockType type)
+static void qemu_clock_init(QEMUClockType type)
 {
-QEMUClock *clock;
+QEMUClock *clock = qemu_clock_ptr(type);
 
-clock = g_malloc0(sizeof(QEMUClock));
 clock->type = type;
 clock->enabled = true;
 clock->last = INT64_MIN;
 QLIST_INIT(&clock->timerlists);
 notifier_list_init(&clock->reset_notifiers);
-clock->main_loop_timerlist = timerlist_new_from_clock(clock, NULL, NULL);
-return clock;
+main_loop_tlg.tl[type] = timerlist_new(type, NULL, NULL);
 }
 
 bool qemu_clock_use_for_deadline(QEMUClockType type)
@@ -164,7 +158,7 @@ bool timerlist_has_timers(QEMUTimerList *timer_list)
 bool qemu_clock_has_timers(QEMUClockType type)
 {
 return timerlist_has_timers(
-qemu_clock_ptr(type)->main_loop_timerlist);
+main_loop_tlg.tl[type]);
 }
 
 bool timerlist_expired(QEMUTimerList *timer_list)
@@ -177,7 +171,7 @@ bool timerlist_expired(QEMUTimerList *timer_list)
 bool qemu_clock_expired(QEMUClockType type)
 {
 return timerlist_expired(
-qemu_clock_ptr(type)->main_loop_timerlist);
+main_loop_tlg.tl[type]);
 }
 
 /*
@@ -227,7 +221,7 @@ QEMUClockType timerlist_get_clock(QEMUTimerList *timer_list)
 
 QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClockType type)
 {
-return qemu_clock_ptr(type)->main_loop_timerlist;
+return main_loop_tlg.tl[type];
 

[Qemu-devel] [RFC] [PATCHv10 21/31] aio / timers: Remove alarm timers

2013-08-11 Thread Alex Bligh
Remove alarm timers from qemu-timer.c now that we use g_poll / ppoll
instead.

Signed-off-by: Alex Bligh 
---
 include/qemu/timer.h |3 -
 main-loop.c  |4 -
 qemu-timer.c |  500 +-
 vl.c |4 +-
 4 files changed, 4 insertions(+), 507 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index bdee09b..1684e77 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -634,9 +634,6 @@ bool qemu_run_timers(QEMUClock *clock);
  */
 bool qemu_run_all_timers(void);
 
-void configure_alarms(char const *opt);
-int init_timer_alarm(void);
-
 /**
  * initclocks:
  *
diff --git a/main-loop.c b/main-loop.c
index afc3e31..1d0e030 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -131,10 +131,6 @@ int qemu_init_main_loop(void)
 GSource *src;
 
 init_clocks();
-if (init_timer_alarm() < 0) {
-fprintf(stderr, "could not initialize alarm timer\n");
-exit(1);
-}
 
 ret = qemu_signal_init();
 if (ret) {
diff --git a/qemu-timer.c b/qemu-timer.c
index c56ae9e..acc3bcf 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -33,10 +33,6 @@
 #include 
 #endif
 
-#ifdef _WIN32
-#include <mmsystem.h>
-#endif
-
 #ifdef CONFIG_PPOLL
#include <poll.h>
 #endif
@@ -77,174 +73,11 @@ struct QEMUTimerList {
 void *notify_opaque;
 };
 
-struct qemu_alarm_timer {
-char const *name;
-int (*start)(struct qemu_alarm_timer *t);
-void (*stop)(struct qemu_alarm_timer *t);
-void (*rearm)(struct qemu_alarm_timer *t, int64_t nearest_delta_ns);
-#if defined(__linux__)
-timer_t timer;
-int fd;
-#elif defined(_WIN32)
-HANDLE timer;
-#endif
-bool expired;
-bool pending;
-};
-
-static struct qemu_alarm_timer *alarm_timer;
-
 static bool timer_expired_ns(QEMUTimer *timer_head, int64_t current_time)
 {
 return timer_head && (timer_head->expire_time <= current_time);
 }
 
-static int64_t qemu_next_alarm_deadline(void)
-{
-int64_t delta = INT64_MAX;
-int64_t rtdelta;
-int64_t hdelta;
-
-if (!use_icount && vm_clock->enabled &&
-vm_clock->main_loop_timerlist->active_timers) {
-delta = vm_clock->main_loop_timerlist->active_timers->expire_time -
-qemu_get_clock_ns(vm_clock);
-}
-if (host_clock->enabled &&
-host_clock->main_loop_timerlist->active_timers) {
-hdelta = host_clock->main_loop_timerlist->active_timers->expire_time -
-qemu_get_clock_ns(host_clock);
-if (hdelta < delta) {
-delta = hdelta;
-}
-}
-if (rt_clock->enabled &&
-rt_clock->main_loop_timerlist->active_timers) {
-rtdelta = (rt_clock->main_loop_timerlist->active_timers->expire_time -
-   qemu_get_clock_ns(rt_clock));
-if (rtdelta < delta) {
-delta = rtdelta;
-}
-}
-
-return delta;
-}
-
-static void qemu_rearm_alarm_timer(struct qemu_alarm_timer *t)
-{
-int64_t nearest_delta_ns = qemu_next_alarm_deadline();
-if (nearest_delta_ns < INT64_MAX) {
-t->rearm(t, nearest_delta_ns);
-}
-}
-
-/* TODO: MIN_TIMER_REARM_NS should be optimized */
-#define MIN_TIMER_REARM_NS 25
-
-#ifdef _WIN32
-
-static int mm_start_timer(struct qemu_alarm_timer *t);
-static void mm_stop_timer(struct qemu_alarm_timer *t);
-static void mm_rearm_timer(struct qemu_alarm_timer *t, int64_t delta);
-
-static int win32_start_timer(struct qemu_alarm_timer *t);
-static void win32_stop_timer(struct qemu_alarm_timer *t);
-static void win32_rearm_timer(struct qemu_alarm_timer *t, int64_t delta);
-
-#else
-
-static int unix_start_timer(struct qemu_alarm_timer *t);
-static void unix_stop_timer(struct qemu_alarm_timer *t);
-static void unix_rearm_timer(struct qemu_alarm_timer *t, int64_t delta);
-
-#ifdef __linux__
-
-static int dynticks_start_timer(struct qemu_alarm_timer *t);
-static void dynticks_stop_timer(struct qemu_alarm_timer *t);
-static void dynticks_rearm_timer(struct qemu_alarm_timer *t, int64_t delta);
-
-#endif /* __linux__ */
-
-#endif /* _WIN32 */
-
-static struct qemu_alarm_timer alarm_timers[] = {
-#ifndef _WIN32
-#ifdef __linux__
-{"dynticks", dynticks_start_timer,
- dynticks_stop_timer, dynticks_rearm_timer},
-#endif
-{"unix", unix_start_timer, unix_stop_timer, unix_rearm_timer},
-#else
-{"mmtimer", mm_start_timer, mm_stop_timer, mm_rearm_timer},
-{"dynticks", win32_start_timer, win32_stop_timer, win32_rearm_timer},
-#endif
-{NULL, }
-};
-
-static void show_available_alarms(void)
-{
-int i;
-
-printf("Available alarm timers, in order of precedence:\n");
-for (i = 0; alarm_timers[i].name; i++)
-printf("%s\n", alarm_timers[i].name);
-}
-
-void configure_alarms(char const *opt)
-{
-int i;
-int cur = 0;
-int count = ARRAY_SIZE(alarm_timers) - 1;
-char *arg;
-char *name;
-struct qemu_alarm_timer tmp;
-
-if (is_help_option(opt)) {
-show_available_alarms();
-exit(0);
-}
-
-arg

[Qemu-devel] [RFC] [PATCHv10 07/31] aio / timers: Make qemu_run_timers and qemu_run_all_timers return progress

2013-08-11 Thread Alex Bligh
Make qemu_run_timers and qemu_run_all_timers return progress
so that aio_poll etc. can determine whether a timer has been
run.
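
A caller such as aio_poll can then fold the result into its own progress flag,
roughly (a sketch, not an exact excerpt):

    bool progress = false;

    /* true if any timer callback ran on this iteration */
    progress |= qemu_run_all_timers();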

Signed-off-by: Alex Bligh 
---
 include/qemu/timer.h |   21 +++--
 qemu-timer.c |   18 --
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index fcc3ca0..fcb6a42 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -92,8 +92,25 @@ bool timer_pending(QEMUTimer *ts);
 bool timer_expired(QEMUTimer *timer_head, int64_t current_time);
 uint64_t timer_expire_time_ns(QEMUTimer *ts);
 
-void qemu_run_timers(QEMUClock *clock);
-void qemu_run_all_timers(void);
+/**
+ * qemu_run_timers:
+ * @clock: clock on which to operate
+ *
+ * Run all the timers associated with a clock.
+ *
+ * Returns: true if any timer ran.
+ */
+bool qemu_run_timers(QEMUClock *clock);
+
+/**
+ * qemu_run_all_timers:
+ *
+ * Run all the timers associated with every clock.
+ *
+ * Returns: true if any timer ran.
+ */
+bool qemu_run_all_timers(void);
+
 void configure_alarms(char const *opt);
 void init_clocks(void);
 int init_timer_alarm(void);
diff --git a/qemu-timer.c b/qemu-timer.c
index f224b62..4a10315 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -446,13 +446,14 @@ bool timer_expired(QEMUTimer *timer_head, int64_t current_time)
 return timer_expired_ns(timer_head, current_time * timer_head->scale);
 }
 
-void qemu_run_timers(QEMUClock *clock)
+bool qemu_run_timers(QEMUClock *clock)
 {
 QEMUTimer *ts;
 int64_t current_time;
+bool progress = false;

 if (!clock->enabled)
-return;
+return progress;
 
 current_time = qemu_get_clock_ns(clock);
 for(;;) {
@@ -466,7 +467,9 @@ void qemu_run_timers(QEMUClock *clock)
 
 /* run the callback (the timer list can be modified) */
 ts->cb(ts->opaque);
+progress = true;
 }
+return progress;
 }
 
 int64_t qemu_get_clock_ns(QEMUClock *clock)
@@ -521,20 +524,23 @@ uint64_t timer_expire_time_ns(QEMUTimer *ts)
 return timer_pending(ts) ? ts->expire_time : -1;
 }
 
-void qemu_run_all_timers(void)
+bool qemu_run_all_timers(void)
 {
+bool progress = false;
 alarm_timer->pending = false;
 
 /* vm time timers */
-qemu_run_timers(vm_clock);
-qemu_run_timers(rt_clock);
-qemu_run_timers(host_clock);
+progress |= qemu_run_timers(vm_clock);
+progress |= qemu_run_timers(rt_clock);
+progress |= qemu_run_timers(host_clock);
 
 /* rearm timer, if not periodic */
 if (alarm_timer->expired) {
 alarm_timer->expired = false;
 qemu_rearm_alarm_timer(alarm_timer);
 }
+
+return progress;
 }
 
 #ifdef _WIN32
-- 
1.7.9.5




[Qemu-devel] [RFC] [PATCHv10 23/31] aio / timers: Add qemu_clock_get_ms and qemu_clock_get_us

2013-08-11 Thread Alex Bligh
Add utility functions qemu_clock_get_ms and qemu_clock_get_us
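
Usage is the obvious one (sketch):

    int64_t now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
    int64_t now_us = qemu_clock_get_us(QEMU_CLOCK_VIRTUAL);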

Signed-off-by: Alex Bligh 
---
 include/qemu/timer.h |   28 
 1 file changed, 28 insertions(+)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 899a11a..8a154c2 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -117,6 +117,34 @@ static inline int64_t qemu_clock_get_ns(QEMUClockType type)
 }
 
 /**
+ * qemu_clock_get_ms;
+ * @type: the clock type
+ *
+ * Get the millisecond value of a clock with
+ * type @type
+ *
+ * Returns: the clock value in milliseconds
+ */
+static inline int64_t qemu_clock_get_ms(QEMUClockType type)
+{
+return qemu_clock_get_ns(type) / SCALE_MS;
+}
+
+/**
+ * qemu_clock_get_us;
+ * @type: the clock type
+ *
+ * Get the microsecond value of a clock with
+ * type @type
+ *
+ * Returns: the clock value in microseconds
+ */
+static inline int64_t qemu_clock_get_us(QEMUClockType type)
+{
+return qemu_clock_get_ns(type) / SCALE_US;
+}
+
+/**
  * qemu_clock_has_timers:
  * @clock: the clock to operate on
  *
-- 
1.7.9.5




[Qemu-devel] [RFC] [PATCHv10 29/31] aio / timers: Add scripts/switch-timer-api

2013-08-11 Thread Alex Bligh
Add scripts/switch-timer-api to programmatically rewrite source
files to use the new timer system.
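
A few representative rewrites the script performs (derived from the substitution
rules below; illustrative only):

    qemu_mod_timer(t, expires)               ->  timer_mod(t, expires)
    qemu_free_timer(t)                       ->  timer_free(t)
    qemu_get_clock_ns(vm_clock)              ->  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)
    qemu_new_timer_ns(rt_clock, cb, opaque)  ->  timer_new_ns(QEMU_CLOCK_REALTIME, cb, opaque)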

Signed-off-by: Alex Bligh 
---
 scripts/switch-timer-api |  178 ++
 1 file changed, 178 insertions(+)

diff --git a/scripts/switch-timer-api b/scripts/switch-timer-api
new file mode 100755
index 000..a369a08
--- /dev/null
+++ b/scripts/switch-timer-api
@@ -0,0 +1,178 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+use Getopt::Long;
+use FindBin;
+
+my @legacy = qw(qemu_clock_ptr qemu_get_clock_ns qemu_get_clock_ms 
qemu_register_clock_reset_notifier qemu_unregister_clock_reset_notifier 
qemu_new_timer qemu_free_timer qemu_del_timer qemu_mod_timer_ns qemu_mod_timer 
qemu_run_timers qemu_new_timer_ns qemu_new_timer_us qemu_new_timer_ms);
+my $legacyre = '\b('.join('|', @legacy).')\b';
+my $option_git;
+my $option_dryrun;
+my $option_quiet;
+my $option_rtc;
+my $suffix=".tmp.$$";
+my @files;
+my $getfiles = 'git grep -l -E \'\b((host|rt|vm|rtc)_clock\b|qemu_\w*timer)\' 
| egrep \'\.[ch]$\' | egrep -v \'qemu-timer\.c$|include/qemu/timer\.h$\'';
+
+sub Syntax
+{
+print STDERR < \$option_dryrun,
+ "git|g" => \$option_git,
+"quiet|q" => \$option_quiet,
+"rtc|r" => \$option_rtc,
+ "help|h" => sub { Syntax(); exit(0); }
+))
+{
+Syntax();
+die "Bad options";
+}
+
+if ($#ARGV >=0)
+{
+   @files = @ARGV;
+}
+else
+{
+   @files = split(/\s+/, `$getfiles`);
+}
+
+foreach my $file (@files)
+{
+   die "Cannot find $file" unless (-f $file && -r $file);
+}
+}
+
+sub DoWarn
+{
+my $text = shift @_;
+my $line = shift @_;
+return if ($option_quiet);
+chomp ($line);
+print STDERR "$text\n";
+print STDERR "$line\n\n";
+}
+
+sub Process
+{
+my $ifn = shift @_;
+my $ofn = $ifn.$suffix;
+
+my $intext;
+my $outtext;
+my $linenum = 0;
+
+open my $input, "<", $ifn || die "Cannot open $ifn for read: $!";
+
+while (<$input>)
+{
+   my $line = $_;
+   $intext .= $line;
+   $linenum++;
+
+   # fix the specific uses
+   unless ($option_rtc)
+   {
+   $line =~ 
s/\bqemu_new_timer(_[num]s)\s*\((vm_|rt_|host_)clock\b/timer_new$1(XXX_$2clock/g;
+   $line =~ 
s/\bqemu_new_timer\s*\((vm_|rt_|host_)clock\b/timer_new(XXX_$1clock/g;
+   $line =~ 
s/\bqemu_get_clock(_[num]s)\s*\((vm_|rt_|host_)clock\b/qemu_clock_get$1(XXX_$2clock/g;
+   }
+
+   # rtc is different
+   $line =~ 
s/\bqemu_new_timer(_[num]s)\s*\(rtc_clock\b/timer_new$1(rtc_clock/g;
+   $line =~ s/\bqemu_new_timer\s*\(rtc_clock\b/timer_new(rtc_clock/g;
+   $line =~ 
s/\bqemu_get_clock(_[num]s)\s*\(rtc_clock\b/qemu_clock_get$1(rtc_clock/g;
+   $line =~ 
s/\bqemu_register_clock_reset_notifier\s*\(rtc_clock\b/qemu_register_clock_reset_notifier(qemu_clock_ptr(rtc_clock)/g;
+
+   unless ($option_rtc)
+   {
+   # fix up comments
+   $line =~ s/\b(vm_|rt_|host_)clock\b/XXX_$1clock/g if ($line =~ 
m,^[/ ]+\*,);
+
+   # spurious fprintf error reporting
+   $line =~ s/: qemu_new_timer_ns failed/: timer_new_ns failed/g;
+
+   # these have just changed name
+   $line =~ s/\bqemu_mod_timer\b/timer_mod/g;
+   $line =~ s/\bqemu_mod_timer_(ns|us|ms)\b/timer_mod_$1/g;
+   $line =~ s/\bqemu_free_timer\b/timer_free/g;
+   $line =~ s/\bqemu_del_timer\b/timer_del/g;
+   }
+
+   # fix up rtc_clock
+   $line =~ s/QEMUClock \*rtc_clock;/QEMUClockType rtc_clock;/g;
+   $line =~ s/\brtc_clock = (vm_|rt_|host_)clock\b/rtc_clock = 
XXX_$1clock/g;
+
+   unless ($option_rtc)
+   {
+   # replace any more general uses
+   $line =~ s/\b(vm_|rt_|host_)clock\b/qemu_clock_ptr(XXX_$1clock)/g;
+   }
+
+   # fix up the place holders
+   $line =~ s/\bXXX_vm_clock\b/QEMU_CLOCK_VIRTUAL/g;
+   $line =~ s/\bXXX_rt_clock\b/QEMU_CLOCK_REALTIME/g;
+   $line =~ s/\bXXX_host_clock\b/QEMU_CLOCK_HOST/g;
+
+   unless ($option_rtc)
+   {
+   DoWarn("$ifn:$linenum WARNING: timer $1 not fixed up", $line) if 
($line =~ /\b((vm_|rt_|host_)clock)\b/);
+   DoWarn("$ifn:$linenum WARNING: function $1 not fixed up", $line) if 
($line =~ /\b(qemu_new_timer\w+)\b/);
+   DoWarn("$ifn:$linenum WARNING: legacy function $1 remains", $line) 
if ($line =~ /$legacyre/o);
+   }
+
+   $outtext .= $line;
+}
+
+close $input;
+
+if ($intext ne $outtext)
+{
+   print STDERR "Patching $ifn\n" unless ($option_quiet);
+   unless ($option_dryrun)
+   {
+   open my $output, ">", $ofn || die "Cannot open $ofn for write: $!";
+   print $output $outtext;
+   close $output;
+   rename ($ofn, $ifn) || die "Cannot rename temp file to $ifn: $!";
+   return 1;
+   }
+}
+return 0;
+}
+
+sub DoCommit
+{
+my $file = s

[Qemu-devel] [PATCH] add qemu-img convert -C option (skip target volume creation)

2013-08-11 Thread Alex Bligh
Add a -C option to skip volume creation on qemu-img convert.
This is useful for targets such as rbd / ceph, where the
target volume may already exist; we cannot always rely on
qemu-img convert to create the image, as, depending on the
output format, there may be parameters which cannot be
specified through the qemu-img convert command line.
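
For example (illustrative only; the pool and volume names are made up),
converting into a pre-created rbd volume would look like:

    qemu-img convert -C -O raw disk.qcow2 rbd:mypool/myvolume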

Code:

Author: Alexandre Derumier 
Signed-off-by: Alexandre Derumier 
Signed-off-by: Alex Bligh 

Documentaton:

Author: Alex Bligh 
Signed-off-by: Alex Bligh 
---
 qemu-img-cmds.hx |4 ++--
 qemu-img.c   |   37 ++---
 qemu-img.texi|   15 ++-
 3 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 4ca7e95..74ced81 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -34,9 +34,9 @@ STEXI
 ETEXI
 
 DEF("convert", img_convert,
-"convert [-c] [-p] [-q] [-f fmt] [-t cache] [-O output_fmt] [-o options] 
[-s snapshot_name] [-S sparse_size] filename [filename2 [...]] output_filename")
+"convert [-c] [-p] [-q] [-C] [-f fmt] [-t cache] [-O output_fmt] [-o 
options] [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] 
output_filename")
 STEXI
-@item convert [-c] [-p] [-q] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [-c] [-p] [-q] [-C] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index b9a848d..5b6ae15 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -103,6 +103,7 @@ static void help(void)
"  '-S' indicates the consecutive number of bytes that must contain 
only zeros\n"
"   for qemu-img to create a sparse image during conversion\n"
"  '--output' takes the format in which the output must be done 
(human or json)\n"
+   "  '-C' skips the target volume creation (useful for rbd)\n"
"\n"
"Parameters to check subcommand:\n"
"  '-r' tries to repair any inconsistencies that are found during 
the check.\n"
@@ -1116,7 +1117,7 @@ out3:
 
 static int img_convert(int argc, char **argv)
 {
-int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, cluster_sectors;
+int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, 
cluster_sectors, skipcreate;
 int progress = 0, flags;
 const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
 BlockDriver *drv, *proto_drv;
@@ -1139,8 +1140,9 @@ static int img_convert(int argc, char **argv)
 cache = "unsafe";
 out_baseimg = NULL;
 compress = 0;
+skipcreate = 0;
 for(;;) {
-c = getopt(argc, argv, "f:O:B:s:hce6o:pS:t:q");
+c = getopt(argc, argv, "f:O:B:s:hce6o:pS:t:qC");
 if (c == -1) {
 break;
 }
@@ -1161,6 +1163,9 @@ static int img_convert(int argc, char **argv)
 case 'c':
 compress = 1;
 break;
+case 'C':
+skipcreate = 1;
+break;
 case 'e':
 error_report("option -e is deprecated, please use \'-o "
   "encryption\' instead!");
@@ -1329,20 +1334,22 @@ static int img_convert(int argc, char **argv)
 }
 }
 
-/* Create the new image */
-ret = bdrv_create(drv, out_filename, param);
-if (ret < 0) {
-if (ret == -ENOTSUP) {
-error_report("Formatting not supported for file format '%s'",
- out_fmt);
-} else if (ret == -EFBIG) {
-error_report("The image size is too large for file format '%s'",
- out_fmt);
-} else {
-error_report("%s: error while converting %s: %s",
- out_filename, out_fmt, strerror(-ret));
+if (!skipcreate) {
+/* Create the new image */
+ret = bdrv_create(drv, out_filename, param);
+if (ret < 0) {
+if (ret == -ENOTSUP) {
+ error_report("Formatting not supported for file format '%s'",
+  out_fmt);
+} else if (ret == -EFBIG) {
+ error_report("The image size is too large for file format 
'%s'",
+  out_fmt);
+} else {
+error_report("%s: error while converting %s: %s",
+ out_filename, out_fmt, strerror(-ret));
+}
+goto out;
 }
-goto out;
 }
 
 flags = BDRV_O_RDWR;
diff --git a/qemu-img.texi b/qemu-img.texi
index 69f1bda..9e5ba36 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -96,6 +96,14 @@ Second image format
 Strict mode - fail on on different image size or sector allocation
 @end table
 
+Parameters

[Qemu-devel] [RFC] [PATCHv10 05/31] aio / timers: add ppoll support with qemu_poll_ns

2013-08-11 Thread Alex Bligh
Add qemu_poll_ns which works like g_poll but takes a nanosecond
timeout.
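
Callers use it exactly like g_poll, only with a nanosecond timeout, e.g.
(sketch; the fd and timeout are chosen arbitrarily):

    GPollFD fds[] = { { .fd = 0, .events = G_IO_IN } };
    /* wait at most 500 ms, expressed in nanoseconds */
    int ready = qemu_poll_ns(fds, 1, 500 * 1000 * 1000LL);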

Signed-off-by: Alex Bligh 
---
 configure|   19 +++
 include/qemu/timer.h |   12 
 qemu-timer.c |   24 
 3 files changed, 55 insertions(+)

diff --git a/configure b/configure
index 18fa608..5659412 100755
--- a/configure
+++ b/configure
@@ -2818,6 +2818,22 @@ if compile_prog "" "" ; then
   dup3=yes
 fi
 
+# check for ppoll support
+ppoll=no
+cat > $TMPC << EOF
+#include <poll.h>
+
+int main(void)
+{
+struct pollfd pfd = { .fd = 0, .events = 0, .revents = 0 };
+ppoll(&pfd, 1, 0, 0);
+return 0;
+}
+EOF
+if compile_prog "" "" ; then
+  ppoll=yes
+fi
+
 # check for epoll support
 epoll=no
 cat > $TMPC << EOF
@@ -3814,6 +3830,9 @@ fi
 if test "$dup3" = "yes" ; then
   echo "CONFIG_DUP3=y" >> $config_host_mak
 fi
+if test "$ppoll" = "yes" ; then
+  echo "CONFIG_PPOLL=y" >> $config_host_mak
+fi
 if test "$epoll" = "yes" ; then
   echo "CONFIG_EPOLL=y" >> $config_host_mak
 fi
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index e0a51a1..fcc3ca0 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -63,6 +63,18 @@ int64_t qemu_clock_deadline_ns(QEMUClock *clock);
  */
 int qemu_timeout_ns_to_ms(int64_t ns);
 
+/**
+ * qemu_poll_ns:
+ * @fds: Array of file descriptors
+ * @nfds: number of file descriptors
+ * @timeout: timeout in nanoseconds
+ *
+ * Perform a poll like g_poll but with a timeout in nanoseconds.
+ * See g_poll documentation for further details.
+ *
+ * Returns: number of fds ready
+ */
+int qemu_poll_ns(GPollFD *fds, uint nfds, int64_t timeout);
 void qemu_clock_enable(QEMUClock *clock, bool enabled);
 void qemu_clock_warp(QEMUClock *clock);
 
diff --git a/qemu-timer.c b/qemu-timer.c
index be29adf..4bf05d4 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -37,6 +37,10 @@
 #include 
 #endif
 
+#ifdef CONFIG_PPOLL
+#include <poll.h>
+#endif
+
 /***/
 /* timers */
 
@@ -323,6 +327,26 @@ int qemu_timeout_ns_to_ms(int64_t ns)
 }
 
 
+/* qemu implementation of g_poll which uses a nanosecond timeout but is
+ * otherwise identical to g_poll
+ */
+int qemu_poll_ns(GPollFD *fds, uint nfds, int64_t timeout)
+{
+#ifdef CONFIG_PPOLL
+if (timeout < 0) {
+return ppoll((struct pollfd *)fds, nfds, NULL, NULL);
+} else {
+struct timespec ts;
+ts.tv_sec = timeout / 1000000000LL;
+ts.tv_nsec = timeout % 1000000000LL;
+return ppoll((struct pollfd *)fds, nfds, &ts, NULL);
+}
+#else
+return g_poll(fds, nfds, qemu_timeout_ns_to_ms(timeout));
+#endif
+}
+
+
 QEMUTimer *qemu_new_timer(QEMUClock *clock, int scale,
   QEMUTimerCB *cb, void *opaque)
 {
-- 
1.7.9.5




[Qemu-devel] [RFC] [PATCHv10 04/31] aio / timers: Consistent treatment of disabled clocks for deadlines

2013-08-11 Thread Alex Bligh
Make treatment of disabled clocks consistent in deadline calculation

Signed-off-by: Alex Bligh 
---
 qemu-timer.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-timer.c b/qemu-timer.c
index df8f12b..be29adf 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -264,7 +264,7 @@ int64_t qemu_clock_deadline(QEMUClock *clock)
 /* To avoid problems with overflow limit this to 2^32.  */
 int64_t delta = INT32_MAX;
 
-if (clock->active_timers) {
+if (clock->enabled && clock->active_timers) {
 delta = clock->active_timers->expire_time - qemu_get_clock_ns(clock);
 }
 if (delta < 0) {
-- 
1.7.9.5




[Qemu-devel] [RFC] [PATCHv10 10/31] aio / timers: Add QEMUTimerListGroup and helper functions

2013-08-11 Thread Alex Bligh
Add QEMUTimerListGroup and helper functions, to represent
a QEMUTimerList associated with each clock. Add a default
QEMUTimerListGroup representing the default timer lists
which are not associated with any other object (e.g.
an AioContext as added by future patches).
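
An owner of a group (the main loop here, an AioContext in later patches) uses it
roughly like this (sketch, following the declarations below):

    QEMUTimerListGroup tlg;

    timerlistgroup_init(&tlg);        /* creates one timer list per clock type */
    /* ... arm timers on tlg.tl[QEMU_CLOCK_VIRTUAL] etc. ... */
    int64_t deadline = timerlistgroup_deadline_ns(&tlg);  /* earliest expiry, or -1 */
    bool progress = timerlistgroup_run_timers(&tlg);      /* true if any callback ran */
    timerlistgroup_deinit(&tlg);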

Signed-off-by: Alex Bligh 
---
 include/qemu/timer.h |   49 +
 qemu-timer.c |   42 ++
 2 files changed, 91 insertions(+)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 70b5ccd..151c35e 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -52,6 +52,11 @@ typedef enum {
 
 typedef struct QEMUClock QEMUClock;
 typedef struct QEMUTimerList QEMUTimerList;
+
+typedef struct QEMUTimerListGroup {
+QEMUTimerList *tl[QEMU_CLOCK_MAX];
+} QEMUTimerListGroup;
+
 typedef void QEMUTimerCB(void *opaque);
 
 typedef struct QEMUTimer {
@@ -63,6 +68,7 @@ typedef struct QEMUTimer {
 int scale;
 } QEMUTimer;
 
+extern QEMUTimerListGroup main_loop_tlg;
 extern QEMUClock *qemu_clocks[QEMU_CLOCK_MAX];
 
 /**
@@ -217,6 +223,49 @@ QEMUClock *timerlist_get_clock(QEMUTimerList *timer_list);
 bool timerlist_run_timers(QEMUTimerList *timer_list);
 
 /**
+ * timerlistgroup_init:
+ * @tlg: the timer list group
+ *
+ * Initialise a timer list group. This must already be
+ * allocated in memory and zeroed.
+ */
+void timerlistgroup_init(QEMUTimerListGroup *tlg);
+
+/**
+ * timerlistgroup_deinit:
+ * @tlg: the timer list group
+ *
+ * Deinitialise a timer list group. This must already be
+ * initialised. Note the memory is not freed.
+ */
+void timerlistgroup_deinit(QEMUTimerListGroup *tlg);
+
+/**
+ * timerlistgroup_run_timers:
+ * @tlg: the timer list group
+ *
+ * Run the timers associated with a timer list group.
+ * This will run timers on multiple clocks.
+ *
+ * Returns: true if any timer callback ran
+ */
+bool timerlistgroup_run_timers(QEMUTimerListGroup *tlg);
+
+/**
+ * timerlistgroup_deadline_ns
+ * @tlg: the timer list group
+ *
+ * Determine the deadline of the soonest timer to
+ * expire associated with any timer list linked to
+ * the timer list group. Only clocks suitable for
+ * deadline calculation are included.
+ *
+ * Returns: the deadline in nanoseconds or -1 if no
+ * timers are to expire.
+ */
+int64_t timerlistgroup_deadline_ns(QEMUTimerListGroup *tlg);
+
+/**
  * qemu_timeout_ns_to_ms:
  * @ns: nanosecond timeout value
  *
diff --git a/qemu-timer.c b/qemu-timer.c
index 2a83928..2f346c9 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -59,6 +59,7 @@ struct QEMUClock {
 bool enabled;
 };
 
+QEMUTimerListGroup main_loop_tlg;
 QEMUClock *qemu_clocks[QEMU_CLOCK_MAX];
 
 /* A QEMUTimerList is a list of timers attached to a clock. More
@@ -564,6 +565,46 @@ bool qemu_run_timers(QEMUClock *clock)
 return timerlist_run_timers(clock->main_loop_timerlist);
 }
 
+void timerlistgroup_init(QEMUTimerListGroup *tlg)
+{
+QEMUClockType type;
+for (type = 0; type < QEMU_CLOCK_MAX; type++) {
+tlg->tl[type] = timerlist_new(type);
+}
+}
+
+void timerlistgroup_deinit(QEMUTimerListGroup *tlg)
+{
+QEMUClockType type;
+for (type = 0; type < QEMU_CLOCK_MAX; type++) {
+timerlist_free(tlg->tl[type]);
+}
+}
+
+bool timerlistgroup_run_timers(QEMUTimerListGroup *tlg)
+{
+QEMUClockType type;
+bool progress = false;
+for (type = 0; type < QEMU_CLOCK_MAX; type++) {
+progress |= timerlist_run_timers(tlg->tl[type]);
+}
+return progress;
+}
+
+int64_t timerlistgroup_deadline_ns(QEMUTimerListGroup *tlg)
+{
+int64_t deadline = -1;
+QEMUClockType type;
+for (type = 0; type < QEMU_CLOCK_MAX; type++) {
+if (qemu_clock_use_for_deadline(tlg->tl[type]->clock)) {
+deadline = qemu_soonest_timeout(deadline,
+timerlist_deadline_ns(
+tlg->tl[type]));
+}
+}
+return deadline;
+}
+
 int64_t qemu_get_clock_ns(QEMUClock *clock)
 {
 int64_t now, last;
@@ -605,6 +646,7 @@ void init_clocks(void)
 for (type = 0; type < QEMU_CLOCK_MAX; type++) {
 if (!qemu_clocks[type]) {
 qemu_clocks[type] = qemu_clock_new(type);
+main_loop_tlg.tl[type] = qemu_clocks[type]->main_loop_timerlist;
 }
 }
 
-- 
1.7.9.5




[Qemu-devel] [RFC] [PATCHv10 12/31] aio / timers: Add a notify callback to QEMUTimerList

2013-08-11 Thread Alex Bligh
Add a notify pointer to QEMUTimerList so it knows what to notify
on a timer change.
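
For an AioContext the wiring added below amounts to (sketch, lifted from the
async.c hunk):

    static void aio_timerlist_notify(void *opaque)
    {
        aio_notify(opaque);
    }

    /* at AioContext creation time: */
    timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);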

Signed-off-by: Alex Bligh 
---
 async.c  |7 ++-
 include/qemu/timer.h |   27 +++
 qemu-timer.c |   31 ---
 3 files changed, 53 insertions(+), 12 deletions(-)

diff --git a/async.c b/async.c
index ae2c700..2b9ba9b 100644
--- a/async.c
+++ b/async.c
@@ -234,6 +234,11 @@ void aio_notify(AioContext *ctx)
 event_notifier_set(&ctx->notifier);
 }
 
+static void aio_timerlist_notify(void *opaque)
+{
+aio_notify(opaque);
+}
+
 AioContext *aio_context_new(void)
 {
 AioContext *ctx;
@@ -245,7 +250,7 @@ AioContext *aio_context_new(void)
 aio_set_event_notifier(ctx, &ctx->notifier, 
(EventNotifierHandler *)
event_notifier_test_and_clear, NULL);
-timerlistgroup_init(&ctx->tlg);
+timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
 
 return ctx;
 }
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 151c35e..354ee88 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -58,6 +58,7 @@ typedef struct QEMUTimerListGroup {
 } QEMUTimerListGroup;
 
 typedef void QEMUTimerCB(void *opaque);
+typedef void QEMUTimerListNotifyCB(void *opaque);
 
 typedef struct QEMUTimer {
 int64_t expire_time;/* in nanoseconds */
@@ -136,13 +137,16 @@ QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClock *clock);
 /**
  * timerlist_new:
  * @type: the clock type to associate with the timerlist
+ * @cb: the callback to call on notification
+ * @opaque: the opaque pointer to pass to the callback
  *
  * Create a new timerlist associated with the clock of
  * type @type.
  *
  * Returns: a pointer to the QEMUTimerList created
  */
-QEMUTimerList *timerlist_new(QEMUClockType type);
+QEMUTimerList *timerlist_new(QEMUClockType type,
+ QEMUTimerListNotifyCB *cb, void *opaque);
 
 /**
  * timerlist_free:
@@ -223,13 +227,28 @@ QEMUClock *timerlist_get_clock(QEMUTimerList *timer_list);
 bool timerlist_run_timers(QEMUTimerList *timer_list);
 
 /**
+ * timerlist_notify:
+ * @timer_list: the timer list to use
+ *
+ * call the notifier callback associated with the timer list.
+ */
+void timerlist_notify(QEMUTimerList *timer_list);
+
+/**
  * timerlistgroup_init:
  * @tlg: the timer list group
+ * @cb: the callback to call when a notify is required
+ * @opaque: the opaque pointer to be passed to the callback.
  *
  * Initialise a timer list group. This must already be
- * allocated in memory and zeroed.
- */
-void timerlistgroup_init(QEMUTimerListGroup *tlg);
+ * allocated in memory and zeroed. The notifier callback is
+ * called whenever a clock in the timer list group is
+ * reenabled or whenever a timer associated with any timer
+ * list is modified. If @cb is specified as null, qemu_notify()
+ * is used instead.
+ */
+void timerlistgroup_init(QEMUTimerListGroup *tlg,
+ QEMUTimerListNotifyCB *cb, void *opaque);
 
 /**
  * timerlistgroup_deinit:
diff --git a/qemu-timer.c b/qemu-timer.c
index 2f346c9..c1de3d3 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -73,6 +73,8 @@ struct QEMUTimerList {
 QEMUClock *clock;
 QEMUTimer *active_timers;
 QLIST_ENTRY(QEMUTimerList) list;
+QEMUTimerListNotifyCB *notify_cb;
+void *notify_opaque;
 };
 
 struct qemu_alarm_timer {
@@ -243,7 +245,9 @@ next:
 }
 }
 
-static QEMUTimerList *timerlist_new_from_clock(QEMUClock *clock)
+static QEMUTimerList *timerlist_new_from_clock(QEMUClock *clock,
+   QEMUTimerListNotifyCB *cb,
+   void *opaque)
 {
 QEMUTimerList *timer_list;
 
@@ -257,13 +261,16 @@ static QEMUTimerList *timerlist_new_from_clock(QEMUClock *clock)
 
 timer_list = g_malloc0(sizeof(QEMUTimerList));
 timer_list->clock = clock;
+timer_list->notify_cb = cb;
+timer_list->notify_opaque = opaque;
 QLIST_INSERT_HEAD(&clock->timerlists, timer_list, list);
 return timer_list;
 }
 
-QEMUTimerList *timerlist_new(QEMUClockType type)
+QEMUTimerList *timerlist_new(QEMUClockType type,
+ QEMUTimerListNotifyCB *cb, void *opaque)
 {
-return timerlist_new_from_clock(qemu_clock_ptr(type));
+return timerlist_new_from_clock(qemu_clock_ptr(type), cb, opaque);
 }
 
 void timerlist_free(QEMUTimerList *timer_list)
@@ -288,7 +295,7 @@ static QEMUClock *qemu_clock_new(QEMUClockType type)
 clock->last = INT64_MIN;
 QLIST_INIT(&clock->timerlists);
 notifier_list_init(&clock->reset_notifiers);
-clock->main_loop_timerlist = timerlist_new_from_clock(clock);
+clock->main_loop_timerlist = timerlist_new_from_clock(clock, NULL, NULL);
 return clock;
 }
 
@@ -386,6 +393,15 @@ QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClock *clock)
 return clock->main_loop_timerlist;
 }
 
+void timerlist_notify(QEMUTimer

[Qemu-devel] [PATCH] gdb: Fix gdb error

2013-08-11 Thread Aneesh Kumar K.V
From: "Aneesh Kumar K.V" 

Don't update the global register count if not requested.
Without this patch a remote gdb session gives

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
Remote 'g' packet reply is too long:
2884c0ccba50c0c ...

...
(gdb)

This is a regression introduced by a0e372f0c49ac01faeaeb73a6e8f50e8ac615f34

Signed-off-by: Aneesh Kumar K.V 
---
 gdbstub.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 1af25a6..4b58a1e 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -598,6 +598,12 @@ void gdb_register_coprocessor(CPUState *cpu,
 {
 GDBRegisterState *s;
 GDBRegisterState **p;
+static int last_reg;
+CPUClass *cc = CPU_GET_CLASS(cpu);
+
+if (!last_reg) {
+last_reg = cc->gdb_num_core_regs;
+}
 
 p = &cpu->gdb_regs;
 while (*p) {
@@ -608,19 +614,21 @@ void gdb_register_coprocessor(CPUState *cpu,
 }
 
 s = g_new0(GDBRegisterState, 1);
-s->base_reg = cpu->gdb_num_regs;
+s->base_reg = last_reg;
 s->num_regs = num_regs;
 s->get_reg = get_reg;
 s->set_reg = set_reg;
 s->xml = xml;
 
 /* Add to end of list.  */
-cpu->gdb_num_regs += num_regs;
+last_reg += num_regs;
 *p = s;
 if (g_pos) {
 if (g_pos != s->base_reg) {
 fprintf(stderr, "Error: Bad gdb register numbering for '%s'\n"
 "Expected %d got %d\n", xml, g_pos, s->base_reg);
+} else {
+cpu->gdb_num_regs = last_reg;
 }
 }
 }
-- 
1.8.1.2




Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access

2013-08-11 Thread Rusty Russell
Anthony Liguori  writes:
> Rusty Russell  writes:
>> (Qemu run under eatmydata to eliminate syncs)
>
> FYI, cache=unsafe is equivalent to using eatmydata.

Ah, thanks!

> I can reproduce this although I also see a larger standard deviation.
>
> BEFORE:
>   MIN: 496
>   MAX: 1055
> AVG: 873.22
> STDEV: 136.88
>
> AFTER:
> MIN: 494
> MAX: 1456
> AVG: 947.77
> STDEV: 150.89

BTW, how did you generate these stats?  Consider this my plug for my
little stats filter:
https://github.com/rustyrussell/stats

>
> In my datasets, the stdev is higher in the after case implying that
> there is more variation.  Indeed, the MIN is pretty much the same.
>
> GCC is inlining the functions, I'm still surprised that it's measurable
> at all.

GCC won't inline across compilation units without -flto though, so the
stub call won't be inlined, right?

Cheers,
Rusty.



Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access

2013-08-11 Thread Rusty Russell
Benjamin Herrenschmidt  writes:
> This whole exercise should have nothing to do with the current endian
> mode of the CPU. If for example you are running lx86 (the x86 emulator
> IBM provides) which exploits MSR:LE on POWER7 to run x86 binaries in
> userspace, you don't want virtio to suddenly change endian !
>
> The information we care about is the endianness of the operating system.

Which is why my original patches nabbed the endianness when the target
updated the virtio device status.

You're making an assumption about the nature of the guest, that they
don't pass the virtio device directly through to userspace.

I don't care, though.  The point is to make something which works, until
the Real Fix (LE virtio).

> The most logical way to infer it is a different bit, which used to be
> MSR:ILE and is now in LPCR for guests and controlled via a hypercall on
> pseries, which indicates what is the endianness of interrupt vectors.
>
> IE. It indicates how the cpu should set MSR:LE when taking an interrupt,
> regardless of what the current MSR:LE value is at any given point in
> time.
>
> So what should be done in fact is whenever *that* bit is changed
> (currently via hcall, maybe via MSR:ILE if we emulate that on older
> models or LPCR when we emulate that), then the qemu cpu model can "call
> out" to change the "OS endianness" which we can propagate to virtio.
>
> Anything trying to do stuff based on the "current" endianness in the MSR
> sounds like a cesspit to me.

OK.  What should Anton's gdb stub do then?

Cheers,
Rusty.



Re: [Qemu-devel] [PATCH 1/7] virtio: allow byte swapping for vring and config access

2013-08-11 Thread Benjamin Herrenschmidt
On Mon, 2013-08-12 at 09:58 +0930, Rusty Russell wrote:
> Benjamin Herrenschmidt  writes:
> > This whole exercise should have nothing to do with the current endian
> > mode of the CPU. If for example you are running lx86 (the x86 emulator
> > IBM provides) which exploits MSR:LE on POWER7 to run x86 binaries in
> > userspace, you don't want virtio to suddenly change endian !
> >
> > The information we care about is the endianness of the operating system.
> 
> Which is why my original patches nabbed the endianness when the target
> updated the virtio device status.
> 
> You're making an assumption about the nature of the guest, that they
> don't pass the virtio device directly through to userspace.

Two points here:

 - Userspace is VERY likely to have the same endianness as the operating
system.

 - The case where we might support "foreign endian" userspace *and* pass
virtio directly to it *and* give a shit about virtio v1.0 doesn't exist
anywhere but your imagination right now :-)

> I don't care, though.  The point is to make something which works, until
> the Real Fix (LE virtio).

Exactly.

> > The most logical way to infer it is a different bit, which used to be
> > MSR:ILE and is now in LPCR for guests and controlled via a hypercall on
> > pseries, which indicates what is the endianness of interrupt vectors.
> >
> > IE. It indicates how the cpu should set MSR:LE when taking an interrupt,
> > regardless of what the current MSR:LE value is at any given point in
> > time.
> >
> > So what should be done in fact is whenever *that* bit is changed
> > (currently via hcall, maybe via MSR:ILE if we emulate that on older
> > models or LPCR when we emulate that), then the qemu cpu model can "call
> > out" to change the "OS endianness" which we can propagate to virtio.
> >
> > Anything trying to do stuff based on the "current" endianness in the MSR
> > sounds like a cesspit to me.
> 
> OK.  What should Anton's gdb stub do then?

Something else. It's a different problem and needs a different solution.

For one, I think, we should first fix the root problem with gdb (tagging
endianness in the protocol etc...) and once that's done, look at what
band-aid can be applied for old stuff if we care at all (it's not like
LE ppc64 is going to not require a new gdb anyway).

Cheers,
Ben.





Re: [Qemu-devel] [PATCH 1/2] vmdk: support vmfsSparse files

2013-08-11 Thread Fam Zheng
On Sun, 08/11 18:13, Paolo Bonzini wrote:
> VMware ESX hosts use a variant of the VMDK3 format, identified by the
> vmfsSparse create type and the VMFSSPARSE extent type.
> 
> It has 16 KB grain tables (L2) and a variable-size grain directory (L1).
> In addition, the grain size is always 512, but that is not a problem
> because it is included in the header.
> 
> The format of the extents is documented in the VMDK spec.  The format
> of the descriptor file is not documented precisely, but it can be
> found at http://kb.vmware.com/kb/10026353 (Recreating a missing virtual
> machine disk (VMDK) descriptor file for delta disks).
> 
I don't have access to this link; could you include some documentation of
this descriptor format in a comment or the commit message? IIRC, the only
difference is that the type is "VMFSSPARSE", right?

What version of ESX has this format?

> With these patches, vmfsSparse files only work if opened through the
> descriptor file.  Data files without descriptor files, as far as I
> could understand, are not supported by ESX.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/vmdk.c | 51 ++-
>  1 file changed, 46 insertions(+), 5 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index b16d509..eaf484a 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -505,6 +505,34 @@ static int vmdk_open_vmdk3(BlockDriverState *bs,
>  return ret;
>  }
>  
> +static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
> + BlockDriverState *file,
> + int flags)
> +{
> +int ret;
> +uint32_t magic;
> +VMDK3Header header;
> +VmdkExtent *extent;
> +
> +ret = bdrv_pread(file, sizeof(magic), &header, sizeof(header));
> +if (ret < 0) {
> +return ret;
> +}
> +extent = vmdk_add_extent(bs, file, false,
> +  le64_to_cpu(header.disk_sectors),
> +  le64_to_cpu(header.l1dir_offset) << 9,
> +  0,
> +  le64_to_cpu(header.l1dir_size) * 4,
> +  4096,
> +  le64_to_cpu(header.granularity)); /* always 512 */

This needs to be rebased, vmdk_add_extent() signature has been changed
in:

commit 8aa1331c09a9b899f48d97f097bb49b7d458be1c
Author: Fam Zheng 
Date:   Tue Aug 6 15:44:51 2013 +0800

vmdk: check granularity field in opening

Granularity is used to calculate the cluster size and allocate r/w
buffer. Check the value from image before using it, so we don't abort()
for unbounded memory allocation.

Signed-off-by: Fam Zheng 
Signed-off-by: Kevin Wolf 

Since the new function is a variant of vmdk_open_vmdk3(), would you
consider doing a tiny refactor to reduce duplication? And l1dir_size
and granularity need to be checked, as in vmdk_open_vmdk4().

> +ret = vmdk_init_tables(bs, extent);
> +if (ret) {
> +/* free extent allocated by vmdk_add_extent */
> +vmdk_free_last_extent(bs);
> +}
> +return ret;
> +}
> +
>  static int vmdk_open_desc_file(BlockDriverState *bs, int flags,
> uint64_t desc_offset);
>  
> @@ -663,7 +691,7 @@ static int vmdk_parse_description(const char *desc, const 
> char *opt_name,
>  /* Open an extent file and append to bs array */
>  static int vmdk_open_sparse(BlockDriverState *bs,
>  BlockDriverState *file,
> -int flags)
> +int flags, bool vmfs_sparse)
>  {
>  uint32_t magic;
>  
> @@ -674,7 +702,11 @@ static int vmdk_open_sparse(BlockDriverState *bs,
>  magic = be32_to_cpu(magic);
>  switch (magic) {
>  case VMDK3_MAGIC:
> -return vmdk_open_vmdk3(bs, file, flags);
> +if (vmfs_sparse) {
> +return vmdk_open_vmfs_sparse(bs, file, flags);
> +} else {
> +return vmdk_open_vmdk3(bs, file, flags);
> +}
>  break;
>  case VMDK4_MAGIC:
>  return vmdk_open_vmdk4(bs, file, flags);
> @@ -718,7 +750,8 @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
>  }
>  
>  if (sectors <= 0 ||
> -(strcmp(type, "FLAT") && strcmp(type, "SPARSE")) ||
> +(strcmp(type, "FLAT") && strcmp(type, "SPARSE") &&
> + strcmp(type, "VMFSSPARSE")) ||
>  (strcmp(access, "RW"))) {
>  goto next_line;
>  }
> @@ -743,7 +776,14 @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
>  extent->flat_start_offset = flat_offset << 9;
>  } else if (!strcmp(type, "SPARSE")) {
>  /* SPARSE extent */
> -ret = vmdk_open_sparse(bs, extent_file, bs->open_flags);
> +ret = vmdk_open_sparse(bs, extent_file, bs->open_flags, false);
> +if (ret) {
> +

Re: [Qemu-devel] [PATCH for-1.6 V2 1/2] hw/misc: don't create pvpanic device by default

2013-08-11 Thread Hu Tao
On Sun, Aug 11, 2013 at 06:10:42PM +0300, Marcel Apfelbaum wrote:
> This patch is based on Hu Tao's:
> http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00124.html
> 
> No need to hard-code pvpanic as part of the machine.
> It can be added with "-device pvpanic" from the command line (see the next patch).
> Anyway, for backport compatibility it is still part of the 1.5
> machine.
> 
> Signed-off-by: Marcel Apfelbaum 

Reviewed-by: Hu Tao 



Re: [Qemu-devel] [PATCH for-1.6 V2 2/2] hw/misc: make pvpanic known to user

2013-08-11 Thread Hu Tao
On Sun, Aug 11, 2013 at 06:10:43PM +0300, Marcel Apfelbaum wrote:
> This patch is based on Hu Tao's:
> http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg00125.html
> 
> The pvpanic device may now be enabled with "-device pvpanic"
> from the command line.
> 
> Signed-off-by: Marcel Apfelbaum 

Reviewed-by: Hu Tao 



Re: [Qemu-devel] [PATCH 2/2] [v3] target-ppc: Enhance CPU nodes of device tree to be PAPR compliant.

2013-08-11 Thread Prerna Saxena
On 08/08/2013 04:04 PM, Andreas Färber wrote:
> Am 08.08.2013 09:26, schrieb Prerna Saxena:
>>
>> From: Prerna Saxena 
>> Date: Thu, 8 Aug 2013 06:38:03 +0530
>> Subject: [PATCH 2/2] Enhance CPU nodes of device tree to be PAPR compliant.
>>
>> This is based on a patch from Andreas which enables the default CPU with KVM
>> to show up as "-cpu <type>", such as "POWER7_V2.3@0".
>>
>> While this is definitely more descriptive, PAPR mandates the device tree CPU
>> node names to be of the form "PowerPC,<name>", where <name> should not have
>> underscores.
>> Hence replacing the CPU model (which has underscores) with CPU alias.
>>
>> With this patch, the CPU nodes of device tree show up as :
>> /proc/device-tree/cpus/PowerPC,POWER7@0/...
>> /proc/device-tree/cpus/PowerPC,POWER7@4/...
>>
>> Signed-off-by: Prerna Saxena 
> 
> Not yet happy...

:(

> 
>> ---
>>  hw/ppc/spapr.c | 22 --
>>  1 file changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 59e2fea..8efd84e 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -43,6 +43,7 @@
>>  #include "hw/pci-host/spapr.h"
>>  #include "hw/ppc/xics.h"
>>  #include "hw/pci/msi.h"
>> +#include "cpu-models.h"
>>  
>>  #include "hw/pci/pci.h"
>>  
>> @@ -80,6 +81,8 @@
>>  
>>  #define HTAB_SIZE(spapr)(1ULL << ((spapr)->htab_shift))
>>  
>> +#define PPC_DEVTREE_STR "PowerPC,"
>> +
>>  sPAPREnvironment *spapr;
>>  
>>  int spapr_allocate_irq(int hint, bool lsi)
>> @@ -322,9 +325,16 @@ static void *spapr_create_fdt_skel(const char 
>> *cpu_model,
>>  _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
>>  _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
>>  
>> -modelname = g_strdup(cpu_model);
>> +/*
>> + * PAPR convention mandates that
>> + * Device tree nodes must be named as:
>> + * PowerPC,CPU-NAME@...
>> + * Also, CPU-NAME must not have underscores.(hence use of CPU-ALIAS)
>> + */
>> +
>> +modelname = g_strdup_printf(PPC_DEVTREE_STR "%s", cpu_model);
>>  
>> -for (i = 0; i < strlen(modelname); i++) {
>> +for (i = strlen(PPC_DEVTREE_STR); i < strlen(modelname); i++) {
>>  modelname[i] = toupper(modelname[i]);
>>  }
>>  
> 
> One of your colleagues had brought up that "PowerPC," prefix were not
> mandatory - is it *required* by the PAPR spec now, or is it just that
> the IBM CPUs used with PAPR happen to have such a name?

I don't know what context led to this observation.
However, PAPR mentions the following nomenclature guideline:

"The value of this property shall be of the form: “PowerPC,”,
where  is the name of the processor chip which may be displayed to
the user.  shall not contain underscores."

I think this naming guideline will hold for all PAPR-compliant
processors.

> 
>> @@ -1315,6 +1325,14 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>>  
>>  cpu_model = g_strndup(parent_name,
>>  strlen(parent_name) - strlen("-" TYPE_POWERPC_CPU));
>> +
>> +for (i = 0; ppc_cpu_aliases[i].model != NULL; i++) {
>> +if (strcmp(ppc_cpu_aliases[i].model, cpu_model) == 0) {
>> +g_free(cpu_model);
>> +cpu_model = g_strndup(ppc_cpu_aliases[i].alias,
>> +strlen(ppc_cpu_aliases[i].alias));
>> +}
>> +}
>>  }
>>  
>>  /* Prepare the device tree */
> 
> This is still fixing up the name in the wrong place: -cpu POWER7_v2.3
> will not get fixed, only -cpu host or KVM's default.
> 
> The solution I had discussed with Alex is the following: When devices
> need to expose their name to firmware in a special way, we have the
> DeviceClass::fw_name field. All we have to do is assign it and use it
> instead of cpu_model if non-NULL, just like we assign DeviceClass::desc.
> The way to do it would be to extend the family of POWERPC_DEF* macros to
> specify the additional field on the relevant CPU models.
> 

Would this be the same use case as reflected by ppc_cpu_aliases.alias?
If so, do we really need a separate field to convey the same information?

> Therefore my above question: Would it be sufficient to explicitly name
> POWER7_v2.3 PowerPC,POWER7 etc. and to drop the upper-casing?
> Or would we also need to name a CPU such as MPC8572E (random Freescale
> CPU where I don't know the expected fw_name and that is unlikely to
> occur/work in sPAPR) "PowerPC,MPC8572E" if someone specified it with
> -cpu MPC8572E?
> 

If this is not a PAPR-compliant CPU, I don't think the PAPR naming
convention does any good.
I haven't worked with non-PAPR CPUs. Is the device tree for such CPUs
generated by routines in hw/ppc/spapr.c? Or do they have custom
routines to generate appropriate device tree nodes?

Regards,
-- 
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India




Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/2] [v3] target-ppc: Enhance CPU nodes of device tree to be PAPR compliant.

2013-08-11 Thread Benjamin Herrenschmidt
On Mon, 2013-08-12 at 10:07 +0530, Prerna Saxena wrote:

 .../...

> I dont know what context lead to this observation.
> However, PAPR mentions the following nomenclature guideline:
> 
> "The value of this property shall be of the form: “PowerPC,”,
> where  is the name of the processor chip which may be displayed to
> the user.  shall not contain underscores."

This actually comes from the original Open Firmware binding for PowerPC
processors, which PAPR inherits largely from. Thus this naming scheme
should apply to all PowerPC processors when a device-tree is involved.

> I think this name guideline will hold good for all PAPR compliant
> processors.

Also PAPR is not a processor architecture, it's a platform and firmware
architecture, so "PAPR-compliant CPU" has little meaning :-)

Cheers,
Ben.

> > 
> >> @@ -1315,6 +1325,14 @@ static void ppc_spapr_init(QEMUMachineInitArgs 
> >> *args)
> >>  
> >>  cpu_model = g_strndup(parent_name,
> >>  strlen(parent_name) - strlen("-" TYPE_POWERPC_CPU));
> >> +
> >> +for (i = 0; ppc_cpu_aliases[i].model != NULL; i++) {
> >> +if (strcmp(ppc_cpu_aliases[i].model, cpu_model) == 0) {
> >> +g_free(cpu_model);
> >> +cpu_model = g_strndup(ppc_cpu_aliases[i].alias,
> >> +strlen(ppc_cpu_aliases[i].alias));
> >> +}
> >> +}
> >>  }
> >>  
> >>  /* Prepare the device tree */
> > 
> > This is still fixing up the name in the wrong place: -cpu POWER7_v2.3
> > will not get fixed, only -cpu host or KVM's default.
> > 
> > The solution I had discussed with Alex is the following: When devices
> > need to expose their name to firmware in a special way, we have the
> > DeviceClass::fw_name field. All we have to do is assign it and use it
> > instead of cpu_model if non-NULL, just like we assign DeviceClass::desc.
> > The way to do it would be to extend the family of POWERPC_DEF* macros to
> > specify the additional field on the relevant CPU models.
> > 
> 
> Would this be the same use-case as reflected by: ppc_cpu_aliases.alias ?
> If so, do we really need a separate field to convey the same information ?
> 
> > Therefore my above question: Would it be sufficient to explicitly name
> > POWER7_v2.3 PowerPC,POWER7 etc. and to drop the upper-casing?
> > Or would we also need to name a CPU such as MPC8572E (random Freescale
> > CPU where I don't know the expected fw_name and that is unlikely to
> > occur/work in sPAPR) "PowerPC,MPC8572E" if someone specified it with
> > -cpu MPC8572E?
> > 
> 
> If this is not a PAPR-compliant CPU, I dont think the PAPR naming
> convention is of any good.
> I havent worked with non-PAPR cpus. Is the device tree for such CPUs
> generated by routines in hw/ppc/spapr.c ? Or do they have custom
> routines to generate appropriate device tree nodes ?
> 
> Regards,





Re: [Qemu-devel] [SeaBIOS] [PATCH] acpi: hide 64-bit PCI hole for Windows XP

2013-08-11 Thread Gerd Hoffmann
On 08/10/13 05:10, Kevin O'Connor wrote:
> On Fri, Aug 09, 2013 at 08:25:00AM +0200, Gerd Hoffmann wrote:
>>> I don't think SeaBIOS should continue to do the above once the tables
>>> are moved to QEMU.  QEMU has all the info SeaBIOS has, so it can
>>> generate the tables correctly on its own.
>>
>> The loader script provided by qemu has fixup instructions, which are
>> needed to fix up pointers to other acpi tables.  The idea is to use that
>> mechanism to also allow the firmware to fix up addresses like pmbase in
>> the qemu-generated tables.
> 
> Yes, but why should QEMU tell SeaBIOS to modify the table for pmbase
> when it can just modify the table itself?

We'll need some way to make sure the pmbase (also mmconf xbar) set by
the firmware matches the pmbase address filled into the acpi tables by
qemu ...

So the options we have are:

  (1) Hardcode the address everywhere.  This is pretty close to the
  current state: 0xb000 is hard-coded pretty much everywhere,
  basically because older qemu versions had the pmbase register
  read-only at 0xb000.  I'd like to move the pmbase somewhere else
  long-term, to free the 0xb000-0xbfff window, so I'd like to avoid
  that.

  (2) Have qemu pick pmbase/xbar addr.  Doesn't work due to
  initialization order issues (especially xbar for coreboot).

  (3) Have the firmware pick pmbase/xbar, and have fixup instructions for
  the addresses in the loader script, similar to the fixup
  instructions for table-to-table pointers.

  (4) [ new idea by mst ]  Have the firmware pick pmbase/xbar, then have
  qemu look at the hardware registers programmed by the firmware and
  use the pmbase/xbar addresses found there when generating the
  tables.
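
FWIW, a minimal sketch of what (4) could look like on the QEMU side,
assuming the PIIX4 PM function keeps the firmware-programmed I/O base in
config register 0x40 (PMBA); the helper below is made up for the
illustration:

  /* Illustrative only: read back the PM I/O base the firmware programmed,
   * so the generated tables can use the same value. */
  static uint32_t probe_pmbase(PCIDevice *piix4_pm)
  {
      uint32_t pmba = pci_default_read_config(piix4_pm, 0x40, 4);

      return pmba & ~0x3f;  /* 64-byte I/O range; low bits are not address */
  }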

cheers,
  Gerd




Re: [Qemu-devel] [PATCH] add qemu-img convert -C option (skip target volume creation)

2013-08-11 Thread Fam Zheng
On Sun, 08/11 18:31, Alex Bligh wrote:
> Add a -C option to skip volume creation on qemu-img convert.
> This is useful for targets such as rbd / ceph, where the
> target volume may already exist; we cannot always rely on
> qemu-img convert to create the image, as, depending on the
> output format, there may be parameters which are not possible
> to specify through the qemu-img convert command line.
> 
> Code:
> 
> Author: Alexandre Derumier 
> Signed-off-by: Alexandre Derumier 
> Signed-off-by: Alex Bligh 
> 
> Documentaton:
> 
> Author: Alex Bligh 
> Signed-off-by: Alex Bligh 
> ---
>  qemu-img-cmds.hx |4 ++--
>  qemu-img.c   |   37 ++---
>  qemu-img.texi|   15 ++-
>  3 files changed, 38 insertions(+), 18 deletions(-)
> 
> diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
> index 4ca7e95..74ced81 100644
> --- a/qemu-img-cmds.hx
> +++ b/qemu-img-cmds.hx
> @@ -34,9 +34,9 @@ STEXI
>  ETEXI
>  
>  DEF("convert", img_convert,
> -"convert [-c] [-p] [-q] [-f fmt] [-t cache] [-O output_fmt] [-o options] 
> [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] 
> output_filename")
> +"convert [-c] [-p] [-q] [-C] [-f fmt] [-t cache] [-O output_fmt] [-o 
> options] [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] 
> output_filename")
>  STEXI
> -@item convert [-c] [-p] [-q] [-f @var{fmt}] [-t @var{cache}] [-O 
> @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S 
> @var{sparse_size}] @var{filename} [@var{filename2} [...]] 
> @var{output_filename}
> +@item convert [-c] [-p] [-q] [-C] [-f @var{fmt}] [-t @var{cache}] [-O 
> @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S 
> @var{sparse_size}] @var{filename} [@var{filename2} [...]] 
> @var{output_filename}
>  ETEXI
>  
>  DEF("info", img_info,
> diff --git a/qemu-img.c b/qemu-img.c
> index b9a848d..5b6ae15 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -103,6 +103,7 @@ static void help(void)
> "  '-S' indicates the consecutive number of bytes that must 
> contain only zeros\n"
> "   for qemu-img to create a sparse image during conversion\n"
> "  '--output' takes the format in which the output must be done 
> (human or json)\n"
> +   "  '-C' skips the target volume creation (useful for rbd)\n"

The statement "useful for rbd" is not very informative, maybe you could
use "'-C' skips the target volume creation (assuming it already exists)".

> "\n"
> "Parameters to check subcommand:\n"
> "  '-r' tries to repair any inconsistencies that are found during 
> the check.\n"
> @@ -1116,7 +1117,7 @@ out3:
>  
>  static int img_convert(int argc, char **argv)
>  {
> -int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, 
> cluster_sectors;
> +int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, 
> cluster_sectors, skipcreate;

This line is too long, please break it.
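
For instance (purely illustrative), it could be wrapped as:

  int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size;
  int cluster_sectors, skipcreate;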

>  int progress = 0, flags;
>  const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
>  BlockDriver *drv, *proto_drv;
> @@ -1139,8 +1140,9 @@ static int img_convert(int argc, char **argv)
>  cache = "unsafe";
>  out_baseimg = NULL;
>  compress = 0;
> +skipcreate = 0;
>  for(;;) {
> -c = getopt(argc, argv, "f:O:B:s:hce6o:pS:t:q");
> +c = getopt(argc, argv, "f:O:B:s:hce6o:pS:t:qC");
>  if (c == -1) {
>  break;
>  }
> @@ -1161,6 +1163,9 @@ static int img_convert(int argc, char **argv)
>  case 'c':
>  compress = 1;
>  break;
> +case 'C':
> +skipcreate = 1;
> +break;
>  case 'e':
>  error_report("option -e is deprecated, please use \'-o "
>"encryption\' instead!");
> @@ -1329,20 +1334,22 @@ static int img_convert(int argc, char **argv)
>  }
>  }
>  
> -/* Create the new image */
> -ret = bdrv_create(drv, out_filename, param);
> -if (ret < 0) {
> -if (ret == -ENOTSUP) {
> -error_report("Formatting not supported for file format '%s'",
> - out_fmt);
> -} else if (ret == -EFBIG) {
> -error_report("The image size is too large for file format '%s'",
> - out_fmt);
> -} else {
> -error_report("%s: error while converting %s: %s",
> - out_filename, out_fmt, strerror(-ret));
> +if (!skipcreate) {
> +/* Create the new image */
> +ret = bdrv_create(drv, out_filename, param);
> +if (ret < 0) {
> +if (ret == -ENOTSUP) {
> + error_report("Formatting not supported for file format 
> '%s'",
> +  out_fmt);
> +} else if (ret == -EFBIG) {
> + error_report("The image size is too large for file format 
> '%s'",
> +  out_fmt);
> +} else {
>

Re: [Qemu-devel] [SeaBIOS] [PATCH] acpi: hide 64-bit PCI hole for Windows XP

2013-08-11 Thread Gerd Hoffmann
  Hi,

> If we make it a rule that PCI is set up before the ACPI tables
> are read, then QEMU can do the patching itself when
> it detects the BIOS reading the tables.

The approach makes sense to me.  The ordering constraint shouldn't be a big
burden: hardware detection+bringup (including pci setup) is the first
thing done by the firmware, while loading/generating the acpi tables is one
of the last things.  And it avoids the need to communicate the addresses
(or patch locations) between qemu and the firmware.

What do you want to use this for?  pmbase and xbar are simple; each is
just a single register read.  The pci io windows need a root bus scan, but
that should be doable too.

> Gerd, Laszlo,others,  does this rule work for alternative firmwares?

It surely works for coreboot, and I would be very surprised if this
causes trouble for ovmf.

cheers,
  Gerd




Re: [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes

2013-08-11 Thread Wenchao Xia
> On 10/08/2013 05:24, Wenchao Xia wrote:
>> Hi folks,
>>I'd like to form a series which removes the AioContext concept and
>> binds to glib's main loop more closely. Since the changes will be
>> fairly extensive, I want to get your opinion before the real coding:
> 
> I'm not sure I understand...  What does it buy you to split AioContext
> this way?  First, BhSource and FdSource are needed by block drivers, and
> BhSource needs the notifier to interrupt the main loop.  Second,
> AioContext is _already_ a GSource exactly to integrate closely with
> GLib's main loop.  Look at the series that introduced AioContext back in
> October 2012.  The main AioContext is already added as a GSource to the
> iothread's main loop; aio_wait is used in dataplane for simplicity, but
> it could also use a separate GMainLoop and add AioContext there as a
> GSource.
> 
> Paolo
> 
  It has two parts:
1) rename AioContext to AioSource.
  This is my main purpose: it makes clear that this is not a "context"
concept, and that GMainContext is the entity representing the thread's
activity. That prevents people from adding wrapper APIs such as
g_main_context_acquire() or g_main_context_prepare() (see
qcontext_prepare() in the QContext patch). As a result, the standard
glib main loop API can be exposed, so I can pass a GMainContext into the
block layer or any other API layer instead of a custom encapsulation.
For example:
int bdrv_read(GMainContext *ctx,
  BlockDriverState *bs,
  int64_t sector_num,
  uint8_t *buf,
  int nb_sectors)
  In short, it avoids wrappers around GMainContext. I agree that
AioContext is _already_ a GSource subclass now, and there is no
difficulty in adding a GMainContext *ctx parameter for AioContext's sake.
But I am afraid that stops being true as more patches come, since the
name is misleading.

2) Break AioSource into FdSource and BhSource.
  This makes the custom code smaller and simpler: one GSource for one
kind of job. It is not strictly necessary, but IMHO it keeps things clear
when more things are added to the main loop: add a new GSource subclass
instead of always tying everything to AioContext; a rough sketch follows
below.
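
A minimal sketch of such a subclass (names and fields are illustrative and
not taken from any posted patch):

  #include <glib.h>

  typedef struct BhSource {
      GSource source;            /* must be the first member */
      int pending;               /* is a bottom half scheduled? */
  } BhSource;

  static gboolean bh_source_prepare(GSource *source, gint *timeout)
  {
      *timeout = -1;                         /* no timeout of our own */
      return ((BhSource *)source)->pending;  /* ready to dispatch? */
  }

  static gboolean bh_source_check(GSource *source)
  {
      return ((BhSource *)source)->pending;
  }

  static gboolean bh_source_dispatch(GSource *source, GSourceFunc cb,
                                     gpointer user_data)
  {
      ((BhSource *)source)->pending = 0;
      /* ... run the scheduled bottom halves here ... */
      return TRUE;                           /* keep the source attached */
  }

  static GSourceFuncs bh_source_funcs = {
      bh_source_prepare,
      bh_source_check,
      bh_source_dispatch,
      NULL,
  };

  /* a caller then only needs plain GLib calls to drive it: */
  static void example(void)
  {
      GMainContext *ctx = g_main_context_new();
      BhSource *bhs = (BhSource *)g_source_new(&bh_source_funcs,
                                               sizeof(BhSource));

      bhs->pending = 1;                      /* schedule one "bottom half" */
      g_source_attach(&bhs->source, ctx);
      g_main_context_iteration(ctx, TRUE);   /* dispatches the pending BH */
  }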

>> changes:
>> **before patch:
>> typedef struct AioContext {
>>  GSource source;
>>  int walking_handlers;
>>  QemuMutex bh_lock;
>>  struct QEMUBH *first_bh;
>>  int walking_bh;
>>  EventNotifier notifier;
>>  GArray *pollfds;
>>  struct ThreadPool *thread_pool;
>> } AioContext;
>>
>> **After patch:
>> typedef struct BhSource {
>>  GSource source;
>>  QemuMutex bh_lock;
>>  struct QEMUBH *first_bh;
>>  int walking_bh;
>> } BhSource;
>>
>> typedef struct FdSource {
>>  GSource source;
>>  int walking_handlers;
>>  EventNotifier notifier;
>>  GArray *pollfds;
>>  struct ThreadPool *thread_pool;
>> } FdSource;
>>
>> Benefits:
>>The original code mixes the GSource and GMainContext concepts; we
>> may want to add wrapper functions around GMainContext's functions, such
>> as g_main_context_acquire() and g_main_context_prepare(), which means
>> extra effort if you want to form a good and clear API layer. With
>> this change, all of qemu's custom code is attached under a GSource, we
>> have a clear GMainContext API layer for the event loop, no wrapper is
>> needed, and the event loop API is glib's API, a clear layer that lets
>> me form a library or add more things.
>>
>> before:
>> qemu's mainloop caller,  BH user, fd user
>>|
>> AioContext
>>|
>>  GMainContext
>>
>>
>> after:
>> qemu's mainloop caller
>>  |   BH userfd user
>> GmainContext   | |
>>  ||--BhSource |
>>   |-FdSource
>>
>>
>> Note:
>>FdSource could be split more into ThreadSource and FdSource, which
>> distinguish more. It can be done easily if the change of this series
>> is online, when found necessary.
>>
>>More reasons:
>>When I think about how to bind library code to a thread context, it may
>> be necessary to add a context concept to the API of block.c. If I use
>> AioContext, a wrapper API is needed to run the event loop. But if I use
>> glib's GMainContext, things become simple.
> 


-- 
Best Regards

Wenchao Xia




Re: [Qemu-devel] [RFC] [PATCHv8 09/30] aio / timers: Add QEMUTimerListGroup and helper functions

2013-08-11 Thread Wenchao Xia

On 2013-8-10 19:05, Alex Bligh wrote:

Paolo,

On 9 Aug 2013, at 15:59, Paolo Bonzini wrote:


It's not papering over anything.

Timers right now are provided by the event loop.  If you make
AioContexts have timers, you can have a new AioContext for the timers
that the event loop handles before your patches.

It's not related to having two nested event loops.  The nested event
loops have the problem that timers do not run in them, but it's also a
feature---because you know exactly what code runs in the nested event
loop and what doesn't.  Using an entirely distinct event loop preserves
the feature.


I've submitted a first cut as a separate patch just to see whether
it works:
   http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg01412.html

It passes 'make check' anyway.

I didn't wrap TimerListGroup into a separate struct for reasons
I'll set out below.

I think there are two ways to go here:

1. Wrap TimerListGroup into a separate struct, leave all the
TimerListGroup functions in. This probably makes it easier if
(for instance) we decided to get rid of AioContexts entirely
and make them g_source subclasses (per Wenchao Xia).


  I have had a quick look at this series, though not very carefully since
it is quite long. :) I have replied to Paolo's comments on my
RFC thread. Personally I like the GSource subclass approach, which makes
the code clear, but let's first discuss in that mail whether the
direction is right.



2. Remove the TimerListGroup thing entirely and just have
an array of TimerLists of size QEMU_CLOCK_MAX. I can
leave the things that iterate that array inside qemu_timer.c
(which I'd rather do for encapsulation). Part of the
problem with the old timer code was that there were (in
my opinion) far too many files that were working out what
to iterate, and I really don't want to reintroduce that.

Despite the fact we both dislike the name TimerListGroup, I
think the way to go here is (1). (2) does not really save lines
of code (certainly not compiled instructions) - it's main saving
is removing a pile of commenting from include/qemu/timer.h,
which makes things more opaque.

I also think there may well be a use for something that wants
to use timers but not AioContext (I am thinking for instance
of a thread that does not do block IO). This permits that,
but does not require it.

WDYT?
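
FWIW, a minimal sketch of what option (1) might amount to (illustrative
only, not the actual patch):

  typedef struct QEMUTimerListGroup {
      QEMUTimerList *tl[QEMU_CLOCK_MAX];   /* one timer list per clock type */
  } QEMUTimerListGroup;

  /* helpers like these (names assumed) would then take the wrapper struct: */
  void timerlistgroup_init(QEMUTimerListGroup *tlg);
  int64_t timerlistgroup_deadline_ns(QEMUTimerListGroup *tlg);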




--
Best Regards

Wenchao Xia