Re: [Qemu-devel] [PATCH v2] Fix guest OS hang when 64bit PCI bar present

2013-02-22 Thread Peter Maydell
On 22 February 2013 03:58, Alexey Korolev akoro...@gmail.com wrote:
 This patch addresses the issue fully described here:
 http://lists.nongnu.org/archive/html/qemu-devel/2013-02/msg01804.html

 Linux kernels prior to 2.6.36 do not disable the PCI device during
 enumeration process. Since lower and higher parts of a 64bit BAR
 are programmed separately this leads to qemu receiving a request to occupy
 a completely wrong address region for a short period of time.
 We have found that the boot process screws up completely if kvm-apic range
 is overlapped even for a short period of time (it is fine for other
 regions though).

 This patch raises the priority of the kvm-apic memory region, so it is
 never pushed out by PCI devices. The patch is quite safe as it does not
 touch memory manager.

 Signed-off-by: Alexey Korolev akoro...@gmail.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Reviewed-by: Peter Maydell peter.mayd...@linaro.org
as far as the sysbus changes go. I can't comment on whether the
change to the PC emulation is correct or not but it looks
vaguely plausible.

thanks
-- PMM



Re: [Qemu-devel] [PATCH v4 00/10] main-loop: switch to g_poll(3) on POSIX hosts

2013-02-22 Thread Stefan Hajnoczi
On Thu, Feb 21, 2013 at 05:29:25PM +0100, Jan Kiszka wrote:
 On 2013-02-21 16:33, Stefan Hajnoczi wrote:
  On Thu, Feb 21, 2013 at 02:17:14PM +0100, Christian Borntraeger wrote:
  On 20/02/13 11:28, Stefan Hajnoczi wrote:
   Amos Kong ak...@redhat.com reported that file descriptors numbered
   higher than 1024 could crash QEMU.  This is due to the fixed size of
   the fd_set type used for select(2) event polling.
 
 
  Good to see somebody working on that.
  We have just faced this problem on s390 after we have experimentally
  enabled ioeventfd for virtio-ccw. We have several scenarios where we
  want > 600 devices for the guest, and the select loop then fails
  horribly because we exceed the 1024 fd barrier.
 
  So this looks like it could become a bugfix for that problem.
 
  In terms of scalability it is probably better to have multiple threads
  that poll on their fds, instead of having one i/o thread.
  
  I think we'll get there, I'm currently working on adding multiple event
  loop support to QEMU.
 
 In this context, I'd like to recall a detail: Real-time prioritization
 of those I/O threads will most probably require locking with
 prio-inversion avoidance (*). In that case external libs without a
 chance to tune or replace their internal locking (like glib) will be a
 no-go for those kind of I/O threads.

What is stopping glib from becoming RT-friendly?

Stefan



Re: [Qemu-devel] [PATCH v2] doc: document -netdev hubport

2013-02-22 Thread Stefan Hajnoczi
On Thu, Feb 21, 2013 at 03:17:51PM +0100, Markus Armbruster wrote:
 Stefan Hajnoczi stefa...@redhat.com writes:
 
  Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
  ---
   qemu-options.hx | 11 ++-
   1 file changed, 10 insertions(+), 1 deletion(-)
 
  diff --git a/qemu-options.hx b/qemu-options.hx
  index 4bc9c85..cd18ad1 100644
  --- a/qemu-options.hx
  +++ b/qemu-options.hx
  @@ -1404,7 +1404,8 @@ DEF(netdev, HAS_ARG, QEMU_OPTION_netdev,
   #ifdef CONFIG_VDE
   vde|
   #endif
   -"socket],id=str[,option][,option][,...]\n", QEMU_ARCH_ALL)
   +"socket|"
   +"hubport],id=str[,option][,option][,...]\n", QEMU_ARCH_ALL)
   STEXI
   @item -net nic[,vlan=@var{n}][,macaddr=@var{mac}][,model=@var{type}][,name=@var{name}][,addr=@var{addr}][,vectors=@var{v}]
   @findex -net
  @@ -1726,6 +1727,14 @@ vde_switch -F -sock /tmp/myswitch
   qemu-system-i386 linux.img -net nic -net vde,sock=/tmp/myswitch
   @end example
   
  +@item -netdev hubport,id=@var{id},hubid=@var{hubid}
  +
   +Create a hub port on QEMU vlan @var{hubid}.  This syntax is an alternative to
   +the @code{-net @option{vlan}} argument and can be used to connect a NIC
   +specified with @code{-device ...,netdev=@var{id}} to a QEMU vlan.
 
 A simpler way to use a vlan with -device is -device e1000,vlan=0.
 How is that related to hubport?

That is yet another shortcut syntax.  It does the same thing as -netdev
hubport,id=tmphubport0,hubid=0 -device e1000,netdev=tmphubport0.
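The equivalence can be spelled out side by side (illustrative command lines only; the disk image name is a placeholder):

```shell
# Explicit hub port plus -device:
qemu-system-i386 disk.img \
    -netdev hubport,id=tmphubport0,hubid=0 \
    -device e1000,netdev=tmphubport0

# Shortcut: -device creates the hub port implicitly:
qemu-system-i386 disk.img -device e1000,vlan=0

# Old-style -net syntax:
qemu-system-i386 disk.img -net nic,model=e1000,vlan=0
```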

  +
  +Note that only NICs can be connected to a hubport, other -netdevs cannot.
  +
 
 Well, netdevs can't be connected to a netdev in general, and hubport is
 one.  You can connect up to one device model to a netdev.  I figure you
 can additionally connect any number of old-style -net thingies to it,
 whether they are NICs or not.  Correct?

net.c doesn't stop NetClients from connecting to each other.  It's just
our -netdev option parsing code which doesn't allow -netdevs to connect
with each other.

-net instances can only connect to hub ports.

   @item -net dump[,vlan=@var{n}][,file=@var{file}][,len=@var{len}]
   Dump network traffic on VLAN @var{n} to file @var{file} (@file{qemu-vlan0.pcap} by default).
   At most @var{len} bytes (64k by default) per packet are stored. The file format is
 
 I'm afraid you missed netdev_add in hmp-command.hx.

Thanks, will fix.



[Qemu-devel] [PATCH v3] doc: document -netdev hubport

2013-02-22 Thread Stefan Hajnoczi
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hmp-commands.hx |  2 +-
 qemu-options.hx | 11 ++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 64008a9..c204d31 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1169,7 +1169,7 @@ ETEXI
 {
 .name   = "netdev_add",
 .args_type  = "netdev:O",
-.params = "[user|tap|socket],id=str[,prop=value][,...]",
+.params = "[user|tap|socket|hubport],id=str[,prop=value][,...]",
 .help   = "add host network device",
 .mhandler.cmd = hmp_netdev_add,
 },
diff --git a/qemu-options.hx b/qemu-options.hx
index 4bc9c85..cd18ad1 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1404,7 +1404,8 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 #ifdef CONFIG_VDE
 "vde|"
 #endif
-"socket],id=str[,option][,option][,...]\n", QEMU_ARCH_ALL)
+"socket|"
+"hubport],id=str[,option][,option][,...]\n", QEMU_ARCH_ALL)
 STEXI
 @item -net nic[,vlan=@var{n}][,macaddr=@var{mac}][,model=@var{type}][,name=@var{name}][,addr=@var{addr}][,vectors=@var{v}]
 @findex -net
@@ -1726,6 +1727,14 @@ vde_switch -F -sock /tmp/myswitch
 qemu-system-i386 linux.img -net nic -net vde,sock=/tmp/myswitch
 @end example
 
+@item -netdev hubport,id=@var{id},hubid=@var{hubid}
+
+Create a hub port on QEMU vlan @var{hubid}.  This syntax is an alternative to
+the @code{-net @option{vlan}} argument and can be used to connect a NIC
+specified with @code{-device ...,netdev=@var{id}} to a QEMU vlan.
+
+Note that only NICs can be connected to a hubport, other -netdevs cannot.
+
 @item -net dump[,vlan=@var{n}][,file=@var{file}][,len=@var{len}]
 Dump network traffic on VLAN @var{n} to file @var{file} (@file{qemu-vlan0.pcap} by default).
 At most @var{len} bytes (64k by default) per packet are stored. The file format is
-- 
1.8.1.2




Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-22 Thread Paolo Bonzini
On 22/02/2013 00:38, mdroth wrote:
 On Thu, Feb 21, 2013 at 11:12:01PM +0100, Paolo Bonzini wrote:
 On 21/02/2013 22:07, mdroth wrote:

 100% agree.  In particular hw/dataplane/event-poll.c should be the first
 to go away, but AioContext provides the functionality that Ping Fan
 needs.  But hw/dataplane/vring.c will probably be here for a longer
 Has there been any discussion around introducing something similar to
 AioContexts for fd handlers? This would avoid the dataplane-specific hooks
 needed for NetClients in this series.

 AioContext can include file descriptors on POSIX systems (used for NBD
 and other network backends), see aio_set_fd_handler.
 
 Sorry, was using fd handlers too generally. I mean specifically for
 the qemu_set_fd_handler interfaces, where we currently rely on a single list
 of IOHandlerRecords for registration and a single loop to drive them. Would
 be nice if we could drive subsets of those via mini main loops, similar to the
 way dataplane threads would with a particular AioContext via aio_poll (or
 perhaps the exact same way)

Yes, that's what I meant actually.  You can already do it for POSIX,
unfortunately Windows poses extra complication because sockets are not
handles.

Moving more of the os_host_main_loop_wait to AioContext would be
possible (timers are on the todo list, in fact), but we should only do
it as need arises.

Paolo

 Currently, Ping Fan's patches basically do this already by accessing a
 global reference to the vnet worker thread and attaching events/handlers
 to its event loop via a new set of registration functions (PATCH 7).
 
 I think generalizing this by basing qemu_set_fd_handler() around
 AioContext, or something similar, would help to extend support to other
 NetClient implementations without requiring dataplane-specific hooks
 throughout.



Re: [Qemu-devel] [PATCH] net: reduce the unnecessary memory allocation of multiqueue

2013-02-22 Thread Stefan Hajnoczi
On Thu, Feb 21, 2013 at 05:45:57PM +0800, Jason Wang wrote:
 @@ -1397,6 +1398,7 @@ void virtio_net_exit(VirtIODevice *vdev)
  
      g_free(n->mac_table.macs);
      g_free(n->vlans);
 +    g_free(n->vqs);
  
      for (i = 0; i < n->max_queues; i++) {
          VirtIONetQueue *q = &n->vqs[i];

Use after free!



Re: [Qemu-devel] [Xen-devel] [PATCH] qemu: define a TOM register to report the base of PCI

2013-02-22 Thread Jan Beulich
 On 22.02.13 at 04:23, Xudong Hao xudong@intel.com wrote:
 @@ -203,6 +251,16 @@ static int i440fx_initfn(PCIDevice *dev)
  
      d->dev.config[I440FX_SMRAM] = 0x02;
  
 +    /* Emulate top of memory, here use 0xe000 as default val*/
 +    if (xen_enabled()) {
 +        d->dev.config[I440FX_PCI_HOLE_BASE] = 0xf0;
 +    } else {
 +        d->dev.config[I440FX_PCI_HOLE_BASE] = 0xe0;
 +    }
 +    d->dev.config[I440FX_PCI_HOLE_BASE + 1] = 0x00;
 +    d->dev.config[I440FX_PCI_HOLE_BASE + 2] = 0x00;
 +    d->dev.config[I440FX_PCI_HOLE_BASE + 3] = 0x00;
 +
      cpu_smm_register(i440fx_set_smm, d);
      return 0;
  }

Isn't this the wrong way round (big endian, when it needs to be
little)?

Or else, this read and calculation

 +    pci_hole_start = pci_default_read_config(&f->dev, I440FX_PCI_HOLE_BASE, 4);
 +    pci_hole_size = 0x100000000ULL - pci_hole_start;

would seem wrong (e.g. if the granularity is meant to be 16M).

And then again, wasting 4 bytes of config space on something that
one ought to be allowed to expect to be at least 64k-aligned seems
questionable too. hvmloader surely could align the value up to the
next 64k boundary before writing the - then only word size - field.
That would come closer to how real chipsets normally implement
things like this.

Jan




Re: [Qemu-devel] [PATCH v4 00/10] main-loop: switch to g_poll(3) on POSIX hosts

2013-02-22 Thread Jan Kiszka
On 2013-02-22 09:40, Stefan Hajnoczi wrote:
 On Thu, Feb 21, 2013 at 05:29:25PM +0100, Jan Kiszka wrote:
 On 2013-02-21 16:33, Stefan Hajnoczi wrote:
 On Thu, Feb 21, 2013 at 02:17:14PM +0100, Christian Borntraeger wrote:
 On 20/02/13 11:28, Stefan Hajnoczi wrote:
 Amos Kong ak...@redhat.com reported that file descriptors numbered
 higher than 1024 could crash QEMU.  This is due to the fixed size of
 the fd_set type used for select(2) event polling.


 Good to see somebody working on that.
 We have just faced this problem on s390 after we have experimentally
 enabled ioeventfd for virtio-ccw. We have several scenarios where we
 want > 600 devices for the guest, and the select loop then fails
 horribly because we exceed the 1024 fd barrier.

 So this looks like it could become a bugfix for that problem.

 In terms of scalability it is probably better to have multiple threads
 that poll on their fds, instead of having one i/o thread.

 I think we'll get there, I'm currently working on adding multiple event
 loop support to QEMU.

 In this context, I'd like to recall a detail: Real-time prioritization
 of those I/O threads will most probably require locking with
 prio-inversion avoidance (*). In that case external libs without a
 chance to tune or replace their internal locking (like glib) will be a
 no-go for those kind of I/O threads.
 
 What is stopping glib from becoming RT-friendly?

It's not the target audience of glib.

RT means you care a lot about what happens underneath you, and glib is
much more about avoiding this detail knowledge. Also, RT is not as easily
portable as the simple internal and external locking primitives that
glib currently uses/supports. Even without RT use cases, we are already
unable to use glib's synchronization primitives for QEMU purposes.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH] dataplane: remove EventPoll in favor of AioContext

2013-02-22 Thread Stefan Hajnoczi
On Thu, Feb 21, 2013 at 05:29:55PM +0100, Paolo Bonzini wrote:
 The only interesting note is the value-copy of EventNotifiers.  At least
 in my opinion this is part of the EventNotifier API and is even portable
 to Windows.  Of course, in this case you should not close the notifier's
 underlying file descriptors or handle with event_notifier_cleanup.

This is worth a comment in the code.

 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  hw/dataplane/Makefile.objs |   2 +-
 hw/dataplane/event-poll.c  | 100 -
  hw/dataplane/event-poll.h  |  40 --
  hw/dataplane/virtio-blk.c  |  41 ++-
  4 files changed, 22 insertions(+), 161 deletions(-)
  delete mode 100644 hw/dataplane/event-poll.c
  delete mode 100644 hw/dataplane/event-poll.h

Looks good aside from what mdroth pointed out.

Stefan



Re: [Qemu-devel] [PATCH v2] Fix guest OS hang when 64bit PCI bar present

2013-02-22 Thread Jan Kiszka
On 2013-02-22 04:58, Alexey Korolev wrote:
 This patch addresses the issue fully described here:
 http://lists.nongnu.org/archive/html/qemu-devel/2013-02/msg01804.html
 
 Linux kernels prior to 2.6.36 do not disable the PCI device during
 enumeration process. Since lower and higher parts of a 64bit BAR
 are programmed separately this leads to qemu receiving a request to occupy
 a completely wrong address region for a short period of time.
 We have found that the boot process screws up completely if kvm-apic range
 is overlapped even for a short period of time (it is fine for other
 regions though).

So the issue is that we hide the MSI target region from the device
models for a short periods of time? How short is this? It would be
good to fully understand what goes completely wrong to avoid that we
miss to fix something else (IO-APIC, HPET) or even paper over a problem
of the memory subsystem.

 
 This patch raises the priority of the kvm-apic memory region, so it is
 never pushed out by PCI devices. The patch is quite safe as it does not
 touch memory manager.
 
 Signed-off-by: Alexey Korolev akoro...@gmail.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/sysbus.c   |   27 +++
  hw/sysbus.h   |2 ++
  target-i386/cpu.c |3 ++-
  3 files changed, 27 insertions(+), 5 deletions(-)
 
 diff --git a/hw/sysbus.c b/hw/sysbus.c
 index 6d9d1df..50c7232 100644
 --- a/hw/sysbus.c
 +++ b/hw/sysbus.c
 @@ -48,7 +48,8 @@ void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq)
  }
  }
  
 -void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
 +static void sysbus_mmio_map_common(SysBusDevice *dev, int n, hwaddr addr,
 +   bool may_overlap, unsigned priority)
  {
  assert(n >= 0 && n < dev->num_mmio);
  
 @@ -61,11 +62,29 @@ void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
  memory_region_del_subregion(get_system_memory(), dev->mmio[n].memory);
  }
  dev->mmio[n].addr = addr;
 -memory_region_add_subregion(get_system_memory(),
 -addr,
 -dev->mmio[n].memory);
 +if (may_overlap) {
 +memory_region_add_subregion_overlap(get_system_memory(),
 +addr,
 +dev->mmio[n].memory,
 +priority);
 +}
 +else {

Coding style: } else {

 +memory_region_add_subregion(get_system_memory(),
 +addr,
 +dev->mmio[n].memory);
 +}
  }
  
 +void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
 +{
 +sysbus_mmio_map_common(dev, n, addr, false, 0);
 +}
 +
 +void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr,
 + unsigned priority)
 +{
 +sysbus_mmio_map_common(dev, n, addr, true, priority);
 +}
  
  /* Request an IRQ source.  The actual IRQ object may be populated later.  */
  void sysbus_init_irq(SysBusDevice *dev, qemu_irq *p)
 diff --git a/hw/sysbus.h b/hw/sysbus.h
 index a7fcded..2100bd7 100644
 --- a/hw/sysbus.h
 +++ b/hw/sysbus.h
 @@ -56,6 +56,8 @@ void sysbus_init_ioports(SysBusDevice *dev, pio_addr_t ioport, pio_addr_t size);
  
  void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq);
  void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr);
 +void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr,
 + unsigned priority);
  void sysbus_add_memory(SysBusDevice *dev, hwaddr addr,
 MemoryRegion *mem);
  void sysbus_add_memory_overlap(SysBusDevice *dev, hwaddr addr,
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index dfcf86e..4cf27eb 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -2078,7 +2078,8 @@ static void x86_cpu_apic_init(X86CPU *cpu, Error **errp)
  /* NOTE: the APIC is directly connected to the CPU - it is not
 on the global memory bus. */
  /* XXX: what if the base changes? */
 -sysbus_mmio_map(SYS_BUS_DEVICE(env->apic_state), 0, MSI_ADDR_BASE);
 +sysbus_mmio_map_overlap(SYS_BUS_DEVICE(env->apic_state), 0,
 +                        MSI_ADDR_BASE, 0x1000);
  ^^
Minor, but 0x1000 makes this look like an address.

  apic_mapped = 1;
  }
  }
 

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] using -net dump with tap networking

2013-02-22 Thread Markus Armbruster
Paolo Bonzini pbonz...@redhat.com writes:

 On 21/02/2013 15:41, Markus Armbruster wrote:
 Trouble is the user interface is as confusing as ever.  Worse, in a way,
 because we now have to explain how vlan and the hubport netdev relate.
 
 Permit me a brief rant.  Have you read the manual page on -netdev and
 -net recently?  Have you tried to explain setting up QEMU networking to
 a neophyte?  Were you able to keep a straight face?  If yes, congrats,
 you got a great future ahead of you.
 
 We have so many ways to do the same thing, complicating our docs
 enormously.  Just one of them makes sense for almost all uses, but I
 doubt mortal users could guess which one just from the docs.
 
 Can we do for the docs what Stefan did for the code, i.e. get vlans
 out of the way?
 
 Specifically, what's missing to let us deprecate -net completely?

 Less important: the fact that -net dump is quite useful and requires
 vlans.

Let me rephrase: dumping network traffic is quite useful, and our
current way to do that from within QEMU requires vlans.

The dump vlan client is neither a net frontend nor a net backend, it's
a passive listener.  Could also be viewed as special case of a filter
that happens not to change anything.

If it's the only useful listener or filter, we could just special-case
it and be done.

If not, we need a way to attach listeners to the 1:1 netdev connections,
or turn it into a chain with filters as interior nodes.

(-net socket has some usecases too: -net bridge helps placing
 a VM's network on a bridge, but adding a bridge still requires root
 privileges).

-netdev socket and -netdev bridge don't cut it?

 More important: the fact that qdev and -device really only works for ISA
 and PCI devices.  Boards whose NICs are sysbus devices can only add
 networking with -net nic.

To be more precise: -device currently works only for devices on
pluggable buses.

Anything you could plug on a physical machine should be on a pluggable
bus (whether we implemented plugging is another question).

Anything you couldn't plug on a physical machine must be soldered onto
the board.  Boards often come in a few flavours with different onboard
devices.  Virtual boards tend to sprout more flavours.  Because of that,
thinking in board options you can combine is often more convenient than
having a finite number of board variants.

QOM will hopefully enable us to stitch devices together without the need
for buses.  But we're not there, yet.  When we are, we still have to
bridge the gap from interface suitable for building machines from
components to nice command line interface for customizing machines.

So, I quite agree that we need a way to configure board options.  Our
current way to do that for NICs is multiplexed into -net as -net nic.

-net nic is quite unlike the other -net: it doesn't create a vlan
client, it asks the board to create a NIC device.  Parameter model
lets you ask for a specific one.

Boards can do what they want with these requests.  Some boards don't
support NICs and simply ignore these requests (generic code attempts to
detect that and warn).  Others treat models they don't know as fatal
error.  Some boards create exactly the NICs you asked for.  Others
create exactly their onboard NICs, whether you asked for them or not
(asking is merely a way to supply configuration parameters).

Most (all?) boards can tell you what models they support
(-net nic,model=help), but some lie, e.g. PCs neglect to mention model
ne2k_isa.

Boards supporting PCI generally recognize a few common PCI NIC models.
But you're better off using -device there, because it gives you access
to all device models and options at no extra charge.
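The practical difference can be shown side by side (illustrative command lines only; the disk image name and MAC address are placeholders):

```shell
# Board-created NIC via the legacy option; supported models can be
# listed with -net nic,model=help, with the caveats above:
qemu-system-i386 disk.img -net nic,model=rtl8139 -net user

# The same NIC as an ordinary PCI device, with the full set of
# device properties available:
qemu-system-i386 disk.img \
    -netdev user,id=n1 \
    -device rtl8139,netdev=n1,mac=52:54:00:12:34:56
```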

Well, here's another nice mess you've gotten me into, Stanley!


TL;DR: I quite agree that we can't just throw away -net without a
replacement.  I just think it's a mess, and should be replaced.



[Qemu-devel] [PATCH v2] dataplane: remove EventPoll in favor of AioContext

2013-02-22 Thread Paolo Bonzini
During the review of the dataplane code, the EventPoll API morphed itself
(not coincidentally) into something very very similar to an AioContext.
Thus, it is trivial to convert virtio-blk-dataplane to use AioContext,
and a first baby step towards letting dataplane talk directly to the
QEMU block layer.

The only interesting note is the value-copy of EventNotifiers.  At least
in my opinion this is part of the EventNotifier API and is even portable
to Windows.  Of course, in this case you should not close the notifier's
underlying file descriptors or handle with event_notifier_cleanup.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/dataplane/Makefile.objs |   2 +-
 hw/dataplane/event-poll.c  | 100 -
 hw/dataplane/event-poll.h  |  40 --
 hw/dataplane/virtio-blk.c  |  48 +-
 4 files changed, 29 insertions(+), 161 deletions(-)
 delete mode 100644 hw/dataplane/event-poll.c
 delete mode 100644 hw/dataplane/event-poll.h

diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
index 3e47d05..70c 100644
--- a/hw/dataplane/Makefile.objs
+++ b/hw/dataplane/Makefile.objs
@@ -1 +1 @@
-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o ioq.o 
virtio-blk.o
+obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o ioq.o virtio-blk.o
diff --git a/hw/dataplane/event-poll.c b/hw/dataplane/event-poll.c
deleted file mode 100644
index 2b55c6e..000
--- a/hw/dataplane/event-poll.c
+++ /dev/null
@@ -1,100 +0,0 @@
-/*
- * Event loop with file descriptor polling
- *
- * Copyright 2012 IBM, Corp.
- * Copyright 2012 Red Hat, Inc. and/or its affiliates
- *
- * Authors:
- *   Stefan Hajnoczi stefa...@redhat.com
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#include <sys/epoll.h>
-#include "hw/dataplane/event-poll.h"
-
-/* Add an event notifier and its callback for polling */
-void event_poll_add(EventPoll *poll, EventHandler *handler,
-                    EventNotifier *notifier, EventCallback *callback)
-{
-    struct epoll_event event = {
-        .events = EPOLLIN,
-        .data.ptr = handler,
-    };
-    handler->notifier = notifier;
-    handler->callback = callback;
-    if (epoll_ctl(poll->epoll_fd, EPOLL_CTL_ADD,
-                  event_notifier_get_fd(notifier), &event) != 0) {
-        fprintf(stderr, "failed to add event handler to epoll: %m\n");
-        exit(1);
-    }
-}
-
-/* Event callback for stopping event_poll() */
-static void handle_stop(EventHandler *handler)
-{
-    /* Do nothing */
-}
-
-void event_poll_init(EventPoll *poll)
-{
-    /* Create epoll file descriptor */
-    poll->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
-    if (poll->epoll_fd < 0) {
-        fprintf(stderr, "epoll_create1 failed: %m\n");
-        exit(1);
-    }
-
-    /* Set up stop notifier */
-    if (event_notifier_init(&poll->stop_notifier, 0) < 0) {
-        fprintf(stderr, "failed to init stop notifier\n");
-        exit(1);
-    }
-    event_poll_add(poll, &poll->stop_handler,
-                   &poll->stop_notifier, handle_stop);
-}
-
-void event_poll_cleanup(EventPoll *poll)
-{
-    event_notifier_cleanup(&poll->stop_notifier);
-    close(poll->epoll_fd);
-    poll->epoll_fd = -1;
-}
-
-/* Block until the next event and invoke its callback */
-void event_poll(EventPoll *poll)
-{
-    EventHandler *handler;
-    struct epoll_event event;
-    int nevents;
-
-    /* Wait for the next event.  Only do one event per call to keep the
-     * function simple, this could be changed later. */
-    do {
-        nevents = epoll_wait(poll->epoll_fd, &event, 1, -1);
-    } while (nevents < 0 && errno == EINTR);
-    if (unlikely(nevents != 1)) {
-        fprintf(stderr, "epoll_wait failed: %m\n");
-        exit(1); /* should never happen */
-    }
-
-    /* Find out which event handler has become active */
-    handler = event.data.ptr;
-
-    /* Clear the eventfd */
-    event_notifier_test_and_clear(handler->notifier);
-
-    /* Handle the event */
-    handler->callback(handler);
-}
-
-/* Stop event_poll()
- *
- * This function can be used from another thread.
- */
-void event_poll_notify(EventPoll *poll)
-{
-    event_notifier_set(&poll->stop_notifier);
-}
diff --git a/hw/dataplane/event-poll.h b/hw/dataplane/event-poll.h
deleted file mode 100644
index 3e8d3ec..000
--- a/hw/dataplane/event-poll.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/*
- * Event loop with file descriptor polling
- *
- * Copyright 2012 IBM, Corp.
- * Copyright 2012 Red Hat, Inc. and/or its affiliates
- *
- * Authors:
- *   Stefan Hajnoczi stefa...@redhat.com
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef EVENT_POLL_H
-#define EVENT_POLL_H
-
-#include "qemu/event_notifier.h"
-
-typedef struct EventHandler EventHandler;
-typedef void EventCallback(EventHandler *handler);
-struct 

Re: [Qemu-devel] [PATCH v2 2/2] migration: add migrate_set_state(), add trace_migrate_set_state()

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 On 20/02/2013 07:32, Kazuya Saito wrote:
 Signed-off-by: Kazuya Saito saito.kaz...@jp.fujitsu.com

 Unfortunately, this conflicts with my series to simplify migration.c
 (branch migration-thread-20130115 in git://github.com/bonzini/qemu.git).

 I'm not sure how to proceed here, because migrate_set_state doesn't
 exist anymore.  Probably you need many different tracepoints.
 +
 +void migrate_set_state(MigrationState *s, int new_state)
 +{
 +    s->state = new_state;
 +    trace_migrate_set_state(new_state);
 +}

It was introduced with this very patch O:-)

I take them, and will integrate them on top of your series.

Later, Juan.



Re: [Qemu-devel] [PATCH v2 2/2] migration: add migrate_set_state(), add trace_migrate_set_state()

2013-02-22 Thread Paolo Bonzini
On 22/02/2013 10:50, Juan Quintela wrote:
 Paolo Bonzini pbonz...@redhat.com wrote:
  On 20/02/2013 07:32, Kazuya Saito wrote:
 Signed-off-by: Kazuya Saito saito.kaz...@jp.fujitsu.com

 Unfortunately, this conflicts with my series to simplify migration.c
 (branch migration-thread-20130115 in git://github.com/bonzini/qemu.git).

 I'm not sure how to proceed here, because migrate_set_state doesn't
 exist anymore.  Probably you need many different tracepoints.
 +
 +void migrate_set_state(MigrationState *s, int new_state)
 +{
 +    s->state = new_state;
 +    trace_migrate_set_state(new_state);
 +}
 
 It was introduced with this very patch O:-)
 
 I take them, and will integrate them on top of your series.

Kazuya already implemented this on top of my series, so I'll post his
patch together with my v2.

Paolo




Re: [Qemu-devel] [PATCH v2] doc: document -netdev hubport

2013-02-22 Thread Markus Armbruster
Stefan Hajnoczi stefa...@redhat.com writes:

 On Thu, Feb 21, 2013 at 03:17:51PM +0100, Markus Armbruster wrote:
 Stefan Hajnoczi stefa...@redhat.com writes:
 
  Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
  ---
   qemu-options.hx | 11 ++-
   1 file changed, 10 insertions(+), 1 deletion(-)
 
  diff --git a/qemu-options.hx b/qemu-options.hx
  index 4bc9c85..cd18ad1 100644
  --- a/qemu-options.hx
  +++ b/qemu-options.hx
  @@ -1404,7 +1404,8 @@ DEF(netdev, HAS_ARG, QEMU_OPTION_netdev,
   #ifdef CONFIG_VDE
   vde|
   #endif
   -"socket],id=str[,option][,option][,...]\n", QEMU_ARCH_ALL)
   +"socket|"
   +"hubport],id=str[,option][,option][,...]\n", QEMU_ARCH_ALL)
   STEXI
   @item -net nic[,vlan=@var{n}][,macaddr=@var{mac}][,model=@var{type}][,name=@var{name}][,addr=@var{addr}][,vectors=@var{v}]
   @findex -net
  @@ -1726,6 +1727,14 @@ vde_switch -F -sock /tmp/myswitch
   qemu-system-i386 linux.img -net nic -net vde,sock=/tmp/myswitch
   @end example
   
  +@item -netdev hubport,id=@var{id},hubid=@var{hubid}
  +
   +Create a hub port on QEMU vlan @var{hubid}.  This syntax is an alternative to
   +the @code{-net @option{vlan}} argument and can be used to connect a NIC
   +specified with @code{-device ...,netdev=@var{id}} to a QEMU vlan.
 
 A simpler way to use a vlan with -device is -device e1000,vlan=0.
 How is that related to hubport?

 That is yet another shortcut syntax.  It does the same thing as -netdev
 hubport,id=tmphubport0,hubid=0 -device e1000,netdev=tmphubport0.

Perhaps documentation should point to the shorthand syntax.  Let me try:

Create a hub port on QEMU vlan @var{hubid}.

The hubport netdev lets you connect a NIC to a QEMU vlan instead
of a single netdev.  @code{-net} and @code{-device} with parameter
@option{vlan} create the required hub automatically.

  +
  +Note that only NICs can be connected to a hubport, other -netdevs cannot.
  +
 
 Well, netdevs can't be connected to a netdev in general, and hubport is
 one.  You can connect up to one device model to a netdev.  I figure you
 can additionally connect any number of old-style -net thingies to it,
 whether they are NICs or not.  Correct?

 net.c doesn't stop NetClients from connecting to each other.  It's just
 our -netdev option parsing code which doesn't allow -netdevs to connect
 with each other.

 -net instances can only connect to hub ports.

I find the note confusing.  "Only NICs" suggests only NICs can connect,
which isn't true; one NIC and any number of vlan clients (defined with
-net) can connect.  The fact that the code would happily connect
arbitrary netdevs is of no interest to the user.  Drop the note?

[...]



Re: [Qemu-devel] using -net dump with tap networking

2013-02-22 Thread Jan Kiszka
On 2013-02-22 10:35, Markus Armbruster wrote:
(-net socket has some usecases too: -net bridge helps placing
 a VM's network on a bridge, but adding a bridge still requires root
 privileges).
 
 -netdev socket and -netdev bridge don't cut it?

Reminds me: Last time I tried -netdev socket, it was broken. There
should be nothing preventing this setup conceptually.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH v4 3/6] add backup related monitor commands

2013-02-22 Thread Stefan Hajnoczi
On Thu, Feb 21, 2013 at 03:48:57PM +, Dietmar Maurer wrote:
   In future, we can allow to pass multiple config files - the vma
   archive format can already handle that.
  
  My point is that QEMU has no business dealing with the management tool's VM
  configuration file.  And I think the management tool-specific archive
  format also shouldn't be in QEMU.
  
  How will a proxmox user restore a VMA file generated with oVirt since the
  configuration file comes from the management layer?  They can't because the
  VMA is specific to the management layer and should be handled there, not
  inside QEMU.
 
 The management tool just needs to convert the config - looks quite easy to me.

It's not an easy problem.  This is why there is no central vm-images.com
where everyone can share/sell virtual appliances.  You cannot trivially
convert between VMware, oVirt, proxmox, Xen, EC2, etc.

  QEMU must provide the mechanism for point-in-time backups of block devices -
  your backup block job implements this.
  
  Where I disagree with this patch series is that you put the management tool-
  specific archive format writer into QEMU.  That is outside the scope of 
  QEMU.
  The management tool should tell QEMU to take the backups of block devices
  and do a live migration to file.
  
  The management tool can use a NBD server if it wants to capture all the 
  block
  device backups into a single archive.  And the management tool can bundle 
  the
  VM configuration into that archive too.  But those steps are beyond the 
  scope of
  QEMU.
  
  The approach I'm suggesting is more modular.  For example, the backup block
  job can also be used to copy out the state of a disk into a new
  qcow2 file.
 
 OK, I can try to split the code. But I think I will simply define a 
 'protocol' instead
 of using an NBD server (what for?)

block/nbd.c already exists so it saves you from writing the QEMU-side
code to export data to another process.

The archive writer program opens a listen socket for each block device
that is being backed up.  Then it handles NBD WRITE requests from QEMU.

 diff --git a/qmp-commands.hx b/qmp-commands.hx index
 799adea..17e381b
 100644
 --- a/qmp-commands.hx
 +++ b/qmp-commands.hx
 @@ -889,6 +889,18 @@ EQMP
  },

  {
  +.name   = "backup",
  +.args_type  = "backup-file:s,format:s?,config-file:F?,speed:o?,devlist:s?",
  +.mhandler.cmd_new = qmp_marshal_input_backup,
  +},
  +
  +{
  +.name   = "backup-cancel",
  +.args_type  = "",
  +.mhandler.cmd_new = qmp_marshal_input_backup_cancel,
  +},
   
We might want to more than one backup if the guest has multiple disks.
For example, the database disk is backed up independently of the main OS
  disk.
   
This API only allows one backup to run at a time.
  
   I do not want multiple backup jobs. You can easily run 2 jobs in sequence
   to solve the above use case.
  
  Why do you not want multiple backup jobs?  It makes perfect sense to 
  separate
  database disks from main OS disks.  They have different backup 
  characteristics
  (how often to back up, how to restore) so it's likely that users will ask 
  for
  multiple backup jobs at the same time.  Let's get the QMP interfaces right 
  so that
  it can be supported in the future, if not right away.
 
 So you want to pass the 'uuid' to backup-cancel?

Yes, that works.  Or let the user specify the ID for this backup job.
That way the management tool can use its own IDs.  QEMU doesn't care, it
treats the IDs as opaque strings.

Stefan



Re: [Qemu-devel] [PATCH v4 00/10] main-loop: switch to g_poll(3) on POSIX hosts

2013-02-22 Thread Laurent Desnogues
On Wed, Feb 20, 2013 at 11:28 AM, Stefan Hajnoczi stefa...@redhat.com wrote:
 Amos Kong ak...@redhat.com reported that file descriptors numbered higher
 than 1024 could crash QEMU.  This is due to the fixed size of the fd_set type
 used for select(2) event polling.

 This series converts the main-loop.c and aio-posix.c select(2) calls to
 g_poll(3).  This eliminates the fd_set type and allows QEMU to scale to high
 numbers of file descriptors.

 The g_poll(3) interface is a portable version of the poll(2) system call.  The
 difference to select(2) is that fine-grained events (G_IO_IN, G_IO_OUT,
 G_IO_HUP, G_IO_ERR, G_IO_PRI) can be monitored instead of just
 read/write/exception.  Also, there is no limit to the file descriptor numbers
 that may be used, allowing applications to scale to many file descriptors.  
 See
 the documentation for details:

   http://developer.gnome.org/glib/2.28/glib-The-Main-Event-Loop.html#g-poll

g_poll isn't available in older glib versions used by CentOS (2.12
on CentOS 5.6).

Thanks,

Laurent

 The QEMU main loop works as follows today:

 1. Call out to slirp, iohandlers, and glib sources to fill rfds/wfds/xfds with
the file descriptors to select(2).
 2. Perform the select(2) call.
 3. Call out to slirp, iohandlers, and glib sources to handle events polled in
rfds/wfds/xfds.

 The plan of attack is as follows:

 1. Replace select(2) with g_poll(3).  Use glue that converts between
rfds/wfds/xfds and GPollFD so that the unconverted QEMU components still
work.

 2. Convert slirp, iohandlers, and glib source fill/poll functions to use
GPollFD directly instead of rfds/wfds/xfds.

 3. Drop the glue since all components now natively use GPollFD.

 4. Convert aio-posix.c to g_poll(3) by reusing GPollFD.

 I have tested that the series builds and is bisectable on Linux and Windows
 hosts.  But I have not done extensive testing on other host platforms or with
 long-term guests to check for performance regressions.

 v4:
  * Assign revents instead of bitwise OR to make code clearer [Laszlo]
  * Invoke pollfds_poll() only when select(2) succeeds [Laszlo]
  * Fix gpollfds_to_select(select_ret || g_poll_ret) call [Laszlo]

 v3:
  * Convert slirp/slirp.c to tabs to spaces [Blue Swirl]

 v2:
  * Replace custom Poller type with GArray [aliguori]

 Stefan Hajnoczi (10):
   main-loop: fix select_ret uninitialized variable warning
   main-loop: switch to g_poll() on POSIX hosts
   main-loop: switch POSIX glib integration to GPollFD
   slirp: slirp/slirp.c coding style cleanup
   slirp: switch to GPollFD
   iohandler: switch to GPollFD
   main-loop: drop rfds/wfds/xfds for good
   aio: extract aio_dispatch() from aio_poll()
   aio: convert aio_poll() to g_poll(3)
   aio: support G_IO_HUP and G_IO_ERR

  aio-posix.c  | 130 -
  async.c  |   2 +
  include/block/aio.h  |   3 +
  include/qemu/main-loop.h |   4 +-
  iohandler.c  |  40 ++-
  main-loop.c  | 158 ++-
  slirp/libslirp.h |   6 +-
  slirp/main.h |   1 -
  slirp/slirp.c| 673 
 +--
  slirp/socket.c   |   9 -
  slirp/socket.h   |   2 +
  stubs/slirp.c|   6 +-
  12 files changed, 547 insertions(+), 487 deletions(-)

 --
 1.8.1.2





Re: [Qemu-devel] [PATCH v5 0/6] Efficient VM backup for qemu

2013-02-22 Thread Markus Armbruster
I intend to review the new QMP commands as soon as I can find the time.



[Qemu-devel] [Bug 1130533] Re: Documentation cannot be build since commit c70a01e449536c616c85ab820c6fbad7d7e9cf39

2013-02-22 Thread FredBezies
Patch is OK, with just this little warning when applied. Nothing really bad.

patching file qemu-options.hx
Hunk #1 succeeded at 2097 (offset 2 lines).

git head is:
http://git.qemu.org/?p=qemu.git;a=commit;h=73d4dc71f3a41131541c73b3ac2a8b160a51842b

Besides this, it builds perfectly.

Any hope to get it upstream?

Thanks !

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1130533

Title:
  Documentation cannot be build since commit
  c70a01e449536c616c85ab820c6fbad7d7e9cf39

Status in QEMU:
  New

Bug description:
  I tried to build git-based qemu, and when documentation is processed I
  got this error:

  ./qemu-options.texi:1526: unknown command `list'
  ./qemu-options.texi:1526: table requires an argument: the formatter for @item
  ./qemu-options.texi:1526: warning: @table has text but no @item

  Looks like commit c70a01e449536c616c85ab820c6fbad7d7e9cf39 is guilty
  ?!

  Or any modification related to documentation, I think.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1130533/+subscriptions



Re: [Qemu-devel] GTK UI is now the default

2013-02-22 Thread Laurent Desnogues
On Fri, Feb 22, 2013 at 12:04 AM, Anthony Liguori aligu...@us.ibm.com wrote:

 Since this is a pretty visible change for a lot of people, I thought I'd
 send a top level note.  The GTK UI is now committed and is the default
 UI provided it's available.

 For anyone counting, it's been a little more than 7 years in the making:

 http://article.gmane.org/gmane.comp.emulators.qemu/9726

 If you want to try it, make sure you have the gtk-2.0 development
 packages installed and the VTE development packages.

It looks like gtk_widget_get_window isn't available in all gtk 2 versions,
at least it's not in the gtk2 2.10 on my CentOS 5.6 system.

Thanks,

Laurent



Re: [Qemu-devel] [PATCH v4 3/6] add backup related monitor commands

2013-02-22 Thread Dietmar Maurer
  The management tool just needs to convert the config - looks quite easy to
 me.
 
 It's not an easy problem.  This is why there is no central vm-images.com where
 everyone can share/sell virtual appliances.  You cannot trivially convert 
 between
 VMware, oVirt, proxmox, Xen, EC2, etc.

For that case, I added the ability to write several different config files into 
a VMA archive.

So if someone wants to distribute an appliance, they can simply add a config file
for each manager.
 
   QEMU must provide the mechanism for point-in-time backups of block
   devices - your backup block job implements this.
  
   Where I disagree with this patch series is that you put the
   management tool- specific archive format writer into QEMU.  That is 
   outside
 the scope of QEMU.
   The management tool should tell QEMU to take the backups of block
   devices and do a live migration to file.
  
   The management tool can use a NBD server if it wants to capture all
   the block device backups into a single archive.  And the management
   tool can bundle the VM configuration into that archive too.  But
   those steps are beyond the scope of QEMU.
  
   The approach I'm suggesting is more modular.  For example, the
   backup block job can also be used to copy out the state of a disk
   into a new
   qcow2 file.
 
  OK, I can try to split the code. But I think I will simply define a
  'protocol' instead of using an NBD server (what for?)
 
 block/nbd.c already exists so it saves you from writing the QEMU-side code to
 export data to another process.

Is there some documentation about that NBD protocol?

 The archive writer program opens a listen socket for each block device that is
 being backed up.  Then it handles NBD WRITE requests from QEMU.

I still think using a specific protocol is wrong. You don't want to use NBD if
you can talk directly to some backup server?




[Qemu-devel] [PATCH 3/4] migration: don't account sleep time for calculating bandwidth

2013-02-22 Thread Juan Quintela
While we are sleeping we are not sending, so we should not use that
time to estimate our bandwidth.

Signed-off-by: Juan Quintela quint...@redhat.com


Reviewed-by: Orit Wasserman owass...@redhat.com
---
 migration.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/migration.c b/migration.c
index b8e412f..6649e3a 100644
--- a/migration.c
+++ b/migration.c
@@ -658,6 +658,7 @@ static void *buffered_file_thread(void *opaque)
 {
 MigrationState *s = opaque;
 int64_t initial_time = qemu_get_clock_ms(rt_clock);
+int64_t sleep_time = 0;
 int64_t max_size = 0;
 bool last_round = false;
 int ret;
@@ -730,7 +731,7 @@ static void *buffered_file_thread(void *opaque)
 current_time = qemu_get_clock_ms(rt_clock);
 if (current_time >= initial_time + BUFFER_DELAY) {
 uint64_t transferred_bytes = s->bytes_xfer;
-uint64_t time_spent = current_time - initial_time;
+uint64_t time_spent = current_time - initial_time - sleep_time;
 double bandwidth = transferred_bytes / time_spent;
 max_size = bandwidth * migrate_max_downtime() / 1000000;

@@ -739,11 +740,13 @@ static void *buffered_file_thread(void *opaque)
 transferred_bytes, time_spent, bandwidth, max_size);

 s->bytes_xfer = 0;
+sleep_time = 0;
 initial_time = current_time;
 }
 if (!last_round && (s->bytes_xfer >= s->xfer_limit)) {
 /* usleep expects microseconds */
 g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
+sleep_time += qemu_get_clock_ms(rt_clock) - current_time;
 }
 ret = buffered_flush(s);
 if (ret  0) {
-- 
1.8.1.2




[Qemu-devel] [PATCH 4/4] migration: calculate expected_downtime

2013-02-22 Thread Juan Quintela
We removed the calculation in commit e4ed1541ac9413eac494a03532e34beaf8a7d1c5

Now we add it back.  We need to create dirty_bytes_rate because we
can't include cpu-all.h from migration.c, and there is no other way to
include TARGET_PAGE_SIZE.

Signed-off-by: Juan Quintela quint...@redhat.com


Reviewed-by: Orit Wasserman owass...@redhat.com
---
 arch_init.c   | 1 +
 include/migration/migration.h | 1 +
 migration.c   | 5 +
 3 files changed, 7 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index 8da868b..8daeafa 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -414,6 +414,7 @@ static void migration_bitmap_sync(void)
 if (end_time > start_time + 1000) {
 s->dirty_pages_rate = num_dirty_pages_period * 1000
 / (end_time - start_time);
+s->dirty_bytes_rate = s->dirty_pages_rate * TARGET_PAGE_SIZE;
 start_time = end_time;
 num_dirty_pages_period = 0;
 }
diff --git a/include/migration/migration.h b/include/migration/migration.h
index a8c9639..d121409 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -51,6 +51,7 @@ struct MigrationState
 int64_t downtime;
 int64_t expected_downtime;
 int64_t dirty_pages_rate;
+int64_t dirty_bytes_rate;
 bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
 int64_t xbzrle_cache_size;
 bool complete;
diff --git a/migration.c b/migration.c
index 6649e3a..11725ae 100644
--- a/migration.c
+++ b/migration.c
@@ -738,6 +738,11 @@ static void *buffered_file_thread(void *opaque)
 DPRINTF("transferred %" PRIu64 " time_spent %" PRIu64
 " bandwidth %g max_size %" PRId64 "\n",
 transferred_bytes, time_spent, bandwidth, max_size);
+/* if we haven't sent anything, we don't want to recalculate
+   1 is a small enough number for our purposes */
+if (s->dirty_bytes_rate && transferred_bytes > 1) {
+s->expected_downtime = s->dirty_bytes_rate / bandwidth;
+}

 s-bytes_xfer = 0;
 sleep_time = 0;
-- 
1.8.1.2




Re: [Qemu-devel] [PATCH v4 4/6] introduce new vma archive format

2013-02-22 Thread Dietmar Maurer
  Sure, that is no problem. But so far there is no indication that this
  code will be added to qemu.
 
 FWIW the backup block job looks like a good feature and there's enough time to
 get it merged for QEMU 1.5.
 
 I'm not convinced by the backup writer part of this series, but don't let the
 discussions about that discourage you.  The fact that we are discussing it 
 means
 that it's worth discussing and we just need to keep communicating until we
 arrive at something that makes sense for everyone.

Sure. I just want to concentrate on the first parts of the series before I 
start 
writing perfect documentation.




Re: [Qemu-devel] [PATCH] vl.c: Support multiple CPU ranges on -numa option

2013-02-22 Thread Markus Armbruster
Markus Armbruster arm...@redhat.com writes:

 Anthony Liguori anth...@codemonkey.ws writes:

 Markus Armbruster arm...@redhat.com writes:

 Eduardo Habkost ehabk...@redhat.com writes:

  This allows ',' to be used as a separator between each CPU range.  Note
 that commas inside key=value command-line options have to be escaped
 using ,,, so the command-line will look like:

   -numa node,cpus=A,,B,,C,,D

 This is really, really ugly, and an embarrassment to document.  Which
 you didn't ;)

 What about

 -numa node,cpus=A,cpus=B,cpus=C,cpus=D

 Yes, QemuOpts lets you do that.  Getting all the values isn't as easy as
 it could be (unless you use Laszlo's opt-visitor), but that could be
 improved.

 No more of this.

  -numa node,cpus=A:B:C:D 

 if you want to express a list.

 Okay for command line and human monitor, just don't let it bleed into
 QMP.

Footnotes:

1. Using colons for lists works only as long as the list elements don't
contain colons.  Fine for numbers.  No good for filenames, network
addresses, ...

2. QemuOpts helped us reduce the number of ad hoc option parsers,
improving consistency and error messages quite a bit.  Having every user
of colon lists roll their own ad hoc parser slides back into the hole
that motivated QemuOpts.  Let's try to avoid that, please.

3. The existing QemuOpts syntax for list-valued options (repeating the
option) doesn't have either of these problems.



Re: [Qemu-devel] [PATCH v4 0/6] Efficient VM backup for qemu

2013-02-22 Thread Stefan Hajnoczi
On Thu, Feb 21, 2013 at 03:56:15PM +, Dietmar Maurer wrote:
  On Wed, Feb 20, 2013 at 04:04:49PM +, Dietmar Maurer wrote:
The backup writer abstraction is a special case interface somewhere
between an image format and live migration.  I'd prefer it if the
backup block job stuck to BlockDriverState as the destination for backup
  data.
Then you could save a point-in-time copy of a disk to a raw or even
qcow2 image.
  
   backup needs to work on non-seekable pipes.
  
  In _your_ use case. That means that we should support using non-seekable
  pipes, but it doesn't mean that any other use case is irrelevant. In fact I 
  think
  backing up to a normal raw or qcow2 image file is the interesting case for 
  the
  majority of people.
 
 VMs can have more than one disk. How do you back them up into a single raw or 
 qcow2 image?
 And where do you store the configuration inside a qcow2 file?

Here is a draft showing how your backup block job can be used for your
VMA use case, as well as other use cases:

The QMP 'transaction' command takes a list of actions and executes
them atomically.  That means all actions either complete successfully or
they are all rolled back/cancelled.  Today it only supports taking
external snapshots of block devices, so you can say something like:

  transaction actions=[{'type': 'blockdev-snapshot-sync',
'device': 'virtio-drive0',
'snapshot-file': 'vm-snap1.qcow2'},
   {'type': 'blockdev-snapshot-sync',
'device': 'virtio-drive1',
'snapshot-file': 'data-snap1.qcow2'}]

Now let's add your backup block job to QMP 'transaction'.  It will allow
the user to start your point-in-time copy block job on one or more block
devices atomically:

  transaction actions=[{'type': 'point-in-time-copy-async',
'device': 'virtio-drive0',
'destination-device': 'vm-snap1.qcow2'}]

This kicks off the backup block job.  The point of using transaction is
so we can start multiple block jobs atomically, guaranteeing a
consistent state across multiple block devices.

Now let's add live migrate to file (RAM + device state) to QMP
'transaction'.  This provides the functionality of your 'backup' command
minus the management tool's guest configuration file:

  transaction actions=[{'type': 'migrate-async',
'destination': 'file:vmstate.bin',
'id': 'migration'},
   {'type': 'point-in-time-copy-async',
'wait-for': 'migration',
'device': 'virtio-drive0',
'destination-device': 'vm-snap1.qcow2'}]

I added the 'wait-for' attribute which tells QMP transaction to let the
migration action complete first before doing the backup block job.  This
way we first live migration VM state to vmstate.bin and then atomically
kick off the backup block job - with consistent disk state.

Finally, here is how VMA backup works.  There is an external vma-writer
program that works like this:

  $ vma-writer --output=backup-20130201.vma \
   --add-file=path/to/vm-config.xml \
   --vmstate=tcp:127.0.0.1:1234 \
   --blockdev=virtio-drive0,tcp:127.0.0.1:1235

It creates backup-20130201.vma and appends an arbitrary file with the VM
configuration.  It listens for vmstate (guest RAM + device state) on
TCP port 1234 and listens for virtio-drive0's backup over NBD on port 1235.

vma-writer starts by streaming the vmstate into the .vma file.  Next it
services zero or more NBD connections and puts all the WRITE data into
the .vma file.

Once the last NBD connection closes, vma-writer terminates and backup is
complete.

The QEMU command to initiate this backup looks something like:

  transaction actions=[{'type': 'migrate-async',
'destination': 'tcp:127.0.0.1:1234',
'id': 'migration'},
   {'type': 'point-in-time-copy-async',
'wait-for': 'migration',
'device': 'virtio-drive0',
'destination-device': 'nbd:127.0.0.1:1235'}]

It's also possible to serve other use cases.  For example, I previously
showed how to take point-in-time copies of the disk without saving
vmstate.  And I also showed how to save vmstate to a file and each block
device to its own qcow2 file (i.e. the backup into a directory approach
that Paolo mentioned).

This approach is much more flexible than the monolithic 'backup'
command.  This is why I think it's the right approach to take - it can
implement use cases we haven't even encountered yet.

I also think this approach brings together your live backup work with
Wenchao's snapshot work.  It's how I imagined his backup use cases to
work at the QMP level:
http://wiki.qemu.org/Features/VMSnapshotEnchancement

Stefan



Re: [Qemu-devel] GTK UI is now the default

2013-02-22 Thread Jan Kiszka
On 2013-02-22 00:04, Anthony Liguori wrote:
 
 Since this is a pretty visible change for a lot of people, I thought I'd
 send a top level note.  The GTK UI is now committed and is the default
 UI provided it's available.
 
 For anyone counting, it's been a little more than 7 years in the making:
 
 http://article.gmane.org/gmane.comp.emulators.qemu/9726
 
 If you want to try it, make sure you have the gtk-2.0 development
 packages installed and the VTE development packages.

What's your plan now regarding persistent configuration? I'd like to
make "Grab on hover" the default here, as it is how I'm used to working with
SDL, but also other tools like rdesktop. Clicking on some menu item each
time I start a guest is obviously no solution.

Then, any concerns about adding a control menu? Something to issue
standard tasks without monitor interaction (power down, reset, stop)?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH V2] help: add docs for multiqueue tap options

2013-02-22 Thread Markus Armbruster
Jason Wang jasow...@redhat.com writes:

 Cc: Markus Armbruster arm...@redhat.com
 Cc: qemu-sta...@nongnu.org
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
 Changes from V1:
 - Add the missing docs for 'queues' (Markus)

V1 got committed already.  Please respin the difference as a new patch.



Re: [Qemu-devel] GTK UI is now the default

2013-02-22 Thread Daniel P. Berrange
On Fri, Feb 22, 2013 at 11:12:35AM +0100, Laurent Desnogues wrote:
 On Fri, Feb 22, 2013 at 12:04 AM, Anthony Liguori aligu...@us.ibm.com wrote:
 
  Since this is a pretty visible change for a lot of people, I thought I'd
  send a top level note.  The GTK UI is now committed and is the default
  UI provided it's available.
 
  For anyone counting, it's been a little more than 7 years in the making:
 
  http://article.gmane.org/gmane.comp.emulators.qemu/9726
 
  If you want to try it, make sure you have the gtk-2.0 development
  packages installed and the VTE development packages.
 
 It looks like gtk_widget_get_window isn't available in all gtk 2 versions,
 at least it's not in the gtk2 2.10 on my CentOS 5.6 system.

The docs say "since 2.14":

http://developer.gnome.org/gtk3/stable/GtkWidget.html#gtk-widget-get-window

IIRC before that you could simply reference widget->window directly.
So you could probably try adding something like this:

  #if ! GTK_CHECK_VERSION(2, 14, 0)
  #define gtk_widget_get_window(w) (w)->window
  #endif

and see if that fixes your build.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



[Qemu-devel] [PATCH] migration: Improve QMP documentation

2013-02-22 Thread Juan Quintela
Signed-off-by: Juan Quintela quint...@redhat.com
---
 qmp-commands.hx | 50 ++
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index 799adea..6d037bd 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -644,7 +644,7 @@ EQMP

 SQMP
 migrate-set-cache-size
--
+--

 Set cache size to be used by XBZRLE migration, the cache size will be rounded
 down to the nearest power of 2
@@ -667,7 +667,7 @@ EQMP

 SQMP
 query-migrate-cache-size
--
+

 Show cache size to be used by XBZRLE migration

@@ -2419,6 +2419,7 @@ SQMP
 query-migrate
 -

+
 Migration status.

 Return a json-object. If migration is active there will be another json-object
@@ -2438,25 +2439,34 @@ The main json-object contains the following:
 total amount in ms for downtime that was calculated on
the last bitmap round (json-int)
 - ram: only present if status is active, it is a json-object with the
-  following RAM information (in bytes):
- - transferred: amount transferred (json-int)
- - remaining: amount remaining (json-int)
- - total: total (json-int)
- - duplicate: number of duplicated pages (json-int)
- - normal : number of normal pages transferred (json-int)
- - normal-bytes : number of normal bytes transferred (json-int)
+  following RAM information:
+ - transferred: amount transferred in bytes (json-int)
+ - remaining: amount remaining to transfer in bytes (json-int)
+ - total: total amount of memory in bytes (json-int)
+ - duplicate: number of pages containing the same byte. They
+are sent over the wire as a single byte (json-int)
 + - normal : number of whole pages transferred.  I.e. they
+were not sent as duplicate or xbzrle pages (json-int)
+ - normal-bytes : number of bytes transferred in whole
+pages. This is just normal pages times size of one page,
+but this way upper levels don't need to care about page
+size (json-int)
 - disk: only present if status is active and it is a block migration,
-  it is a json-object with the following disk information (in bytes):
- - transferred: amount transferred (json-int)
- - remaining: amount remaining (json-int)
- - total: total (json-int)
+  it is a json-object with the following disk information:
+ - transferred: amount transferred in bytes (json-int)
 + - remaining: amount remaining to transfer in bytes (json-int)
+ - total: total disk amount in bytes (json-int)
 - xbzrle-cache: only present if XBZRLE is active.
   It is a json-object with the following XBZRLE information:
- - cache-size: XBZRLE cache size
- - bytes: total XBZRLE bytes transferred
+ - cache-size: XBZRLE cache size in bytes
+ - bytes: total XBZRLE bytes transferred as xbzrle pages
  - pages: number of XBZRLE compressed pages
- - cache-miss: number of cache misses
- - overflow: number of XBZRLE overflows
 + - cache-miss: number of XBZRLE page cache misses
 + - overflow: number of times XBZRLE overflows.  This means
 +   that the XBZRLE encoding was bigger than just sending the
 +   whole page, and then we sent the whole page instead (as a
 +   normal page).
+
 Examples:

 1. Before the first migration
@@ -2567,11 +2577,11 @@ EQMP

 SQMP
 migrate-set-capabilities

+

 Enable/Disable migration capabilities

-- xbzrle: xbzrle support
+- xbzrle: XBZRLE support

 Arguments:

@@ -2590,7 +2600,7 @@ EQMP
 },
 SQMP
 query-migrate-capabilities

+--

 Query current migration capabilities

-- 
1.8.1.2




Re: [Qemu-devel] [PATCH v4 0/6] Efficient VM backup for qemu

2013-02-22 Thread Kevin Wolf
Am 22.02.2013 um 11:31 hat Stefan Hajnoczi geschrieben:
 On Thu, Feb 21, 2013 at 03:56:15PM +, Dietmar Maurer wrote:
   On Wed, Feb 20, 2013 at 04:04:49PM +, Dietmar Maurer wrote:
 The backup writer abstraction is a special case interface somewhere
 between an image format and live migration.  I'd prefer it if the
 backup block job stuck to BlockDriverState as the destination for 
 backup
   data.
 Then you could save a point-in-time copy of a disk to a raw or even
 qcow2 image.
   
backup needs to work on non-seekable pipes.
   
   In _your_ use case. That means that we should support using non-seekable
   pipes, but it doesn't mean that any other use case is irrelevant. In fact 
   I think
   backing up to a normal raw or qcow2 image file is the interesting case 
   for the
   majority of people.
  
  VMs can have more than one disk. How do you back them up into a single raw 
  or qcow2 image?
  And where do you store the configuration inside a qcow2 file?
 
 Here is a draft showing how your backup block job can be used for your
 VMA use case, as well as other use cases:

I agree with most of what you say, with one exception:

 I added the 'wait-for' attribute which tells QMP transaction to let the
 migration action complete first before doing the backup block job.  This
 way we first live migration VM state to vmstate.bin and then atomically
 kick off the backup block job - with consistent disk state.

I'm not sure if this really belongs in qemu. The management tool can
wait for the migration completed event, after which the VM will be
automatically stopped, do the snapshot and then continue the VM. This
is, as far as I know, the approach that libvirt takes today.

Kevin



Re: [Qemu-devel] [PATCH 1/9] build: disable Wstrict-prototypes

2013-02-22 Thread Markus Armbruster
Markus Armbruster arm...@redhat.com writes:

 Kevin Wolf kw...@redhat.com writes:

 On Mon, Feb 18, 2013 at 05:56:57PM -0600, Anthony Liguori wrote:
 GTK won't build with strict-prototypes due to gtkitemfactory.h:
 
 /* We use () here to mean unspecified arguments. This is deprecated
  * as of C99, but we can't change it without breaking compatibility.
  * (Note that if we are included from a C++ program () will mean
  * (void) so an explicit cast will be needed.)
  */
 typedef void(*GtkItemFactoryCallback)  ();
 
 Signed-off-by: Anthony Liguori aligu...@us.ibm.com
 ---
  configure | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/configure b/configure
 index bf5970f..74d5878 100755
 --- a/configure
 +++ b/configure
@@ -283,7 +283,7 @@ sdl_config="${SDL_CONFIG-${cross_prefix}sdl-config}"
 # default flags for all hosts
 QEMU_CFLAGS="-fno-strict-aliasing $QEMU_CFLAGS"
 QEMU_CFLAGS="-Wall -Wundef -Wwrite-strings -Wmissing-prototypes $QEMU_CFLAGS"
-QEMU_CFLAGS="-Wstrict-prototypes -Wredundant-decls $QEMU_CFLAGS"
+QEMU_CFLAGS="-Wredundant-decls $QEMU_CFLAGS"
 QEMU_CFLAGS="-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE $QEMU_CFLAGS"
 QEMU_INCLUDES="-I. -I\$(SRC_PATH) -I\$(SRC_PATH)/include"
 if test "$debug_info" = "yes" ; then

 Other places wrap the inclusion of problematic headers in '#pragma GCC
 diagnostic ...' instead of globally disabling warnings.

 I'd hate to lose -Wstrict-prototypes globally.

We just lost it: commit 22bc9a46 :(



Re: [Qemu-devel] [PATCH 06/41] qemu-file: pass errno from qemu_fflush via f-last_error

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 This is done by almost all callers of qemu_fflush, move the code
 directly to qemu_fflush.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 07/41] migration: use qemu_file_set_error to pass error codes back to qemu_savevm_state

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 09/41] migration: flush all data to fd when buffered_flush is called

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Including data that resided in the QEMUFile's own buffer.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 11/41] migration: simplify error handling

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Always use qemu_file_get_error to detect errors, since that is how
 QEMUFile itself drops I/O after an error occurs.  There is no need
 to propagate and check return values all the time.

 Also remove the complete member, since we know that it is set (via
 migrate_fd_cleanup) only when the state changes.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-22 Thread Stefan Hajnoczi
On Thu, Feb 21, 2013 at 06:05:44PM +0100, Paolo Bonzini wrote:
 Il 21/02/2013 17:33, Stefan Hajnoczi ha scritto:
  This especially means things like adding live
  migration support or solving other limitations.  The limitations are
  there to motivate core QEMU refactoring, so that core QEMU code can be
  used directly instead of reimplementing it.
  
  Here's my TODO list:
   * Thread pool <-> AioContext
  
 The thread pool uses an EventNotifier to invoke callbacks.  In a
 multiple event loop world we need to invoke callbacks in the event
 loop that they are associated with.
  
 Worker threads are also started using a BH so that the thread
 inherits the iothread CPU affinity and other characteristics.  I
 think this can stay.
 
 It can and should stay. :)  BHs are tied to an AioContext.  Running
 thread-creation BHs from the AioContext's main thread ensures that the
 worker threads inherit the right attributes for e.g. real-time.

When I say this can stay I mean that worker threads will be spawned
from the QEMU iothread, not from the caller's thread.  Why?

Because if we decide to change this then we can no longer have a global
set of worker threads and a single work queue.  It would make worker
threads scoped to just one AioContext.

Do we really want to go there?  It kind of defeats the purpose of a
thread-pool to have one thread-pool per AioContext (i.e. per virtio-blk
device).

   * BlockDriverState <-> AioContext
  
 It probably makes sense to bind a BlockDriverState to an AioContext
 in which its file descriptors and aio activity happens.
 
 ... and have the full chain of BlockDriverStates (bs->file,
 bs->backing_file and any others as in the vmdk driver) bound to a single
 AioContext.  We will use two BlockDriverStates if the same backing file
 happens to be shared by two disks (BTW, anybody ever checked what kind
 of havoc happens if you commit to such a BDS? :)).

AFAIK we don't share backing file BlockDriverStates today.

   * Thread-friendly interrupt API
 
 What is this?

Raising interrupts, I think we're already okay using irqfd today but
maybe it's nicer to make the regular QEMU irq APIs support this instead
of manually using irqfd.

 What's still missing here is TCG support.  Perhaps you can emulate
 ioeventfd at the hw/virtio.c level and always go through a host
 notifier.  In retrospect, perhaps it was a mistake to make ioeventfd a
 non-experimental option.

That's true.  We need to emulate ioeventfd.  For virtio-pci we can cheat
by simply notifying the EventNotifier from the virtqueue kick handler.
But really we should implement an ioeventfd equivalent at the pio/mmio
emulation level in QEMU.

Stefan



Re: [Qemu-devel] [PATCH 12/41] migration: do not nest flushing of device data

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Completion of migration is currently done with a nested loop that
 invokes buffered_flush: migrate_fd_completed is called by
 buffered_file_thread, which calls migrate_fd_cleanup, which calls
 buffered_close (via qemu_fclose), which flushes the buffer.

 Simplify this, by reusing the buffered_flush call of buffered_file_thread.
 Then if qemu_savevm_state_complete was called, and the buffer is empty
 (including the QEMUFile buffer, for which we need the previous patch), we
 are done.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



[Qemu-devel] [PULL 0/4] migration stats queue

2013-02-22 Thread Juan Quintela
Hi

We add the calculation back.  Before doing the calculation we do:

- expected_downtime initial value is max_downtime.  Much, much better
  initial value than 0.

- we move the point where we measure the time.  We used to measure how much
  it took before we really sent the data.

- we introduce the sleep_time concept.  While we are sleeping because we
  have sent all the allowed data for this second, we shouldn't be
  accounting that time as sending.

- last patch just introduces the re-calculation of expected_downtime.

It just changes the stats values.  Well, patches 2 & 3 change the
bandwidth calculation for migration, but I think that we were
undercalculating it badly enough that it was a bug.

Without patches 2 & 3, the expected_downtime for an idle guest
was calculated as 80ms (with the 30ms default target value), and we ended up
having a downtime of around 15ms.

With these patches applied, we calculate an expected downtime of around
15ms or so, and then we spend around 18ms on downtime.  Notice that
we only calculate how much it takes to send the rest of the RAM; it
just happens that there is some more data to send than what we are calculating.

Review, please.

Later, Juan.



The following changes since commit 73d4dc71f3a41131541c73b3ac2a8b160a51842b:

  gtk: suppress accelerators from the File menu when grab is active (2013-02-21 
16:34:49 -0600)

are available in the git repository at:

  git://repo.or.cz/qemu/quintela.git stats.next

for you to fetch changes up to 90f8ae724a575861f093fbdbfd49a925bcfec327:

  migration: calculate expected_downtime (2013-02-22 10:12:52 +0100)


Juan Quintela (4):
  migration: change initial value of expected_downtime
  migration: calculate end time after we have sent the data
  migration: don't account sleep time for calculating bandwidth
  migration: calculate expected_downtime

 arch_init.c   |  1 +
 include/migration/migration.h |  1 +
 migration.c   | 15 +--
 3 files changed, 15 insertions(+), 2 deletions(-)



Re: [Qemu-devel] (iSCSI-)Readcache in Qemu?

2013-02-22 Thread Kevin Wolf
Am 22.02.2013 um 08:51 hat Peter Lieven geschrieben:
 is there any project or plans to integrate a readcache in qemu either for 
 iscsi or for the whole block layer?
 
 I was thinking of 2 options:
 a) assign (up to) a fixed amount of ram for readcaching of a VMs disk I/O. 
 Without SG_IO this seems
 to be quite easy as there are only a handful of operations to intercept.
 b) add SSDs to a Node and allow a VM to use up to X MB of that SSD as 
 read-cache. This should still be
 a benefit if the storage for the VMs is on a NAS.
 
 I would like to do that independent of the OS for 2 reasons. First with iSCSI 
 the OS
 is not involved. And second I would like to be able to add a memory limit for 
 the cache.

I'm not aware of any such plans, but if anyone wanted to write this, I
think the right way to approach it would be as a new block driver that
acts as a filter - for which we still don't have proper support, but if
blkdebug/blkverify work, this could work as well.

Kevin



Re: [Qemu-devel] [PATCH v4 4/6] introduce new vma archive format

2013-02-22 Thread Stefan Hajnoczi
On Fri, Feb 22, 2013 at 05:48:33AM +, Dietmar Maurer wrote:
  Not in qemu.  There's a reason that we ask for clean specs, independent of
  source code.  That is the only way that we can later change the source code
  to do something more efficient or to define an extension, and still have
  clean documentation of what the extensions are, vs. how an older version
  will behave.  The initial implementation source code might be easy to read,
  but that condition is not guaranteed to last.
  
  If reading the source code to determine the header format is as easy as you
  say, then you should have no problem writing the spec as detailed as we
  have asked.
 
 Sure, that is no problem. But so far there is no indication that this code 
 will be added
 to qemu.

FWIW the backup block job looks like a good feature and there's enough
time to get it merged for QEMU 1.5.

I'm not convinced by the backup writer part of this series, but don't
let the discussions about that discourage you.  The fact that we are
discussing it means that it's worth discussing and we just need to keep
communicating until we arrive at something that makes sense for everyone.

Stefan



[Qemu-devel] [PATCH 2/4] migration: calculate end time after we have sent the data

2013-02-22 Thread Juan Quintela
Signed-off-by: Juan Quintela quint...@redhat.com


Reviewed-by: Orit Wasserman owass...@redhat.com
---
 migration.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration.c b/migration.c
index b3f5ba4..b8e412f 100644
--- a/migration.c
+++ b/migration.c
@@ -673,7 +673,7 @@ static void *buffered_file_thread(void *opaque)
     qemu_mutex_unlock_iothread();
 
     while (true) {
-        int64_t current_time = qemu_get_clock_ms(rt_clock);
+        int64_t current_time;
         uint64_t pending_size;
 
         qemu_mutex_lock_iothread();
@@ -727,6 +727,7 @@ static void *buffered_file_thread(void *opaque)
             }
         }
         qemu_mutex_unlock_iothread();
+        current_time = qemu_get_clock_ms(rt_clock);
         if (current_time >= initial_time + BUFFER_DELAY) {
             uint64_t transferred_bytes = s->bytes_xfer;
             uint64_t time_spent = current_time - initial_time;
-- 
1.8.1.2




Re: [Qemu-devel] [PATCH 14/41] migration: cleanup migration (including thread) in the iothread

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Perform final cleanup in a bottom half, and add joining the thread to
 the series of cleanup actions.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  include/migration/migration.h |1 +
  migration.c   |   37 +++--
  2 files changed, 20 insertions(+), 18 deletions(-)


Reviewed-by: Juan Quintela quint...@redhat.com

Previously, I have had trouble with bottom halves here because they added
a lot of latency between when we made them runnable and when they decided to run.
This was the reason that I didn't use them in my previous tries.

But I haven't hit the problem with this series.



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-22 Thread Stefan Hajnoczi
On Thu, Feb 21, 2013 at 05:38:04PM -0600, mdroth wrote:
 On Thu, Feb 21, 2013 at 11:12:01PM +0100, Paolo Bonzini wrote:
  Il 21/02/2013 22:07, mdroth ha scritto:

100% agree.  In particular hw/dataplane/event-poll.c should be the 
first
to go away, but AioContext provides the functionality that Ping Fan
needs.  But hw/dataplane/vring.c will probably be here for a longer
   Has there been any discussion around introducing something similar to
   AioContexts for fd handlers? This would avoid the dataplane-specific hooks
   needed for NetClients in this series.
  
  AioContext can include file descriptors on POSIX systems (used for NBD
  and other network backends), see aio_set_fd_handler.
 
 Sorry, was using fd handlers too generally. I mean specifically for
 the qemu_set_fd_handler interfaces, where we currently rely on a single list
 of IOHandlerRecords for registration and a single loop to drive them. Would
 be nice if we could drive subsets of those via mini main loops, similar to the
 way dataplane threads would with a particular AioContext via aio_poll (or 
 perhaps
 the exact same way)
 
 Currently, Ping Fan's patches basically do this already by accessing a
 global reference to the vnet worker thread and attaching events/handlers to
 its event loop via a new set of registration functions (PATCH 7).
 
 I think generalizing this by basing qemu_set_fd_handler() around
 AioContext, or something similar, would help to extend support to other
 NetClient implementations without requiring dataplane-specific hooks
 throughout.

We used to have a nesting feature that we got rid of.  I don't remember
the details but basically a nesting level counter:

  aio_set_fd_handler(...); /* fd at level N */

  aio_nesting_inc();
  aio_set_fd_handler(...); /* fd at level N+1 */
  aio_poll(); /* just this fd */
  aio_nesting_dec();

  aio_poll(); /* both fds */

Then aio_poll() only considers fds which are greater than or equal to
the current nesting level.

I remember disliking this feature but maybe it was just being used badly
rather than the mechanism itself being bad.

Stefan



Re: [Qemu-devel] [PATCH 17/41] block-migration: document usage of state across threads

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Acked-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 10/41] migration: use qemu_file_set_error

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Remove the return value of buffered_flush, pass it via the error code
 of s->file.  Once this is done, the error can be retrieved simply
 via migrate_fd_close's call to qemu_fclose.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 18/41] block-migration: add lock

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Some state is shared between the block migration code and its AIO
 callbacks.  Once block migration will run outside the iothread,
 the block migration code and the AIO callbacks will be able to
 run concurrently.  Protect the critical sections with a separate
 lock.  Do the same for completed_sectors, which can be used from
 the monitor.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com


For this idiom:

  for (sector = bmds->cur_dirty; sector < bmds->total_sectors;) {
 +    blk_mig_lock();
      if (bmds_aio_inflight(bmds, sector)) {
 +        blk_mig_unlock();
          bdrv_drain_all();
 +    } else {
 +        blk_mig_unlock();
      }
      if (bdrv_get_dirty(bmds->bs, sector)) {
  

I find easier to understand to do:

 for (sector = bmds->cur_dirty; sector < bmds->total_sectors;) {
     bool drain;
     blk_mig_lock();
     drain = bmds_aio_inflight(bmds, sector);
     blk_mig_unlock();
     if (drain) {
         bdrv_drain_all();
     }
     if (bdrv_get_dirty(bmds->bs, sector)) {


But it is up-to-you if you have to resend the series.
 diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
 index 96a194b..10becb6 100644
 --- a/include/qemu/atomic.h
 +++ b/include/qemu/atomic.h
 @@ -16,6 +16,7 @@
   */
  #define smp_wmb()   barrier()
  #define smp_rmb()   barrier()
 +
  /*
   * We use GCC builtin if it's available, as that can use
   * mfence on 32 bit as well, e.g. if built with -march=pentium-m.

This hunk obviously doesn't belong to this commit O:-)



Re: [Qemu-devel] [PATCH 18/41] block-migration: add lock

2013-02-22 Thread Paolo Bonzini
Il 22/02/2013 11:59, Juan Quintela ha scritto:
 Paolo Bonzini pbonz...@redhat.com wrote:
 Some state is shared between the block migration code and its AIO
 callbacks.  Once block migration will run outside the iothread,
 the block migration code and the AIO callbacks will be able to
 run concurrently.  Protect the critical sections with a separate
 lock.  Do the same for completed_sectors, which can be used from
 the monitor.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 
 Reviewed-by: Juan Quintela quint...@redhat.com
 
 
 For this idiom:
 
   for (sector = bmds->cur_dirty; sector < bmds->total_sectors;) {
  +    blk_mig_lock();
       if (bmds_aio_inflight(bmds, sector)) {
  +        blk_mig_unlock();
           bdrv_drain_all();
  +    } else {
  +        blk_mig_unlock();
       }
       if (bdrv_get_dirty(bmds->bs, sector)) {
  
 
 I find easier to understand to do:
 
  for (sector = bmds->cur_dirty; sector < bmds->total_sectors;) {
      bool drain;
      blk_mig_lock();
      drain = bmds_aio_inflight(bmds, sector);
      blk_mig_unlock();
      if (drain) {
          bdrv_drain_all();
      }
      if (bdrv_get_dirty(bmds->bs, sector)) {
 
 
 But it is up-to-you if you have to resend the series.

Sure, I can change that.  I don't like it either way, in truth. :)

Paolo

 diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
 index 96a194b..10becb6 100644
 --- a/include/qemu/atomic.h
 +++ b/include/qemu/atomic.h
 @@ -16,6 +16,7 @@
   */
  #define smp_wmb()   barrier()
  #define smp_rmb()   barrier()
 +
  /*
   * We use GCC builtin if it's available, as that can use
   * mfence on 32 bit as well, e.g. if built with -march=pentium-m.
 
 This hunk obviously doesn't belong to this commit O:-)
 
 




Re: [Qemu-devel] [PATCH 08/41] qemu-file: temporarily expose qemu_file_set_error and qemu_fflush

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Right now, migration cannot entirely rely on QEMUFile's automatic
 drop of I/O after an error, because it does its real I/O outside
 the put_buffer callback.  To fix this until buffering is gone, expose
 qemu_file_set_error which we will use in buffered_flush.

 Similarly, buffered_flush is not a complete flush because some data may
 still reside in the QEMUFile's own buffer.  This somewhat complicates the
 process of closing the migration thread.  Again, when buffering is gone
 buffered_flush will disappear and calling qemu_fflush will not be needed;
 in the meanwhile, we expose the function for use in migration.c.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 19/41] migration: reorder SaveVMHandlers members

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 This groups together the callbacks that later will have similar
 locking rules.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] (iSCSI-)Readcache in Qemu?

2013-02-22 Thread Stefan Hajnoczi
On Fri, Feb 22, 2013 at 08:51:48AM +0100, Peter Lieven wrote:
 is there any project or plans to integrate a readcache in qemu either for 
 iscsi or for the whole block layer?
 
 I was thinking of 2 options:
 a) assign (up to) a fixed amount of ram for readcaching of a VMs disk I/O. 
 Without SG_IO this seems
 to be quite easy as there are only a handful of operations to intercept.
 b) add SSDs to a Node and allow a VM to use up to X MB of that SSD as 
 read-cache. This should still be
 a benefit if the storage for the VMs is on a NAS.
 
 I would like to do that independent of the OS for 2 reasons. First with iSCSI 
 the OS
 is not involved. And second I would like to be able to add a memory limit for 
 the cache.

I'm not aware of such an effort.  In general QEMU tries to avoid caching
data because the host OS can already do it (if desired).

The Linux cgroups memory controller can limit file cache size for a
specific process or group of processes.

  http://www.kernel.org/doc/Documentation/cgroups/memory.txt

I suggest using the Linux iSCSI initiator together with cgroups to
achieve what you want.  Adding this functionality to QEMU will cost us
in complexity, data integrity worries, and code maintenance.

Stefan



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-22 Thread Kevin Wolf
Am 21.02.2013 um 18:05 hat Paolo Bonzini geschrieben:
 Il 21/02/2013 17:33, Stefan Hajnoczi ha scritto:
   * BlockDriverState <-> AioContext
  
 It probably makes sense to bind a BlockDriverState to an AioContext
 in which its file descriptors and aio activity happens.
 
 ... and have the full chain of BlockDriverStates (bs->file,
 bs->backing_file and any others as in the vmdk driver) bound to a single
 AioContext.  We will use two BlockDriverStates if the same backing file
 happens to be shared by two disks (BTW, anybody ever checked what kind
 of havoc happens if you commit to such a BDS? :)).

Backing files aren't that interesting indeed, but I think there could be
cases where two BDSes refer to the same parent BDS. For example think of
the read-only qcow2 snapshot BDS that we've been talking about before.

So it would have to be more like one AioContext per connected component
in the BDS graph, if one per BDS doesn't work (why doesn't it?)

Kevin



[Qemu-devel] [PATCH] gtk: Replace bogus term VGA with display

2013-02-22 Thread Jan Kiszka
VGA is a misnomer, only some of our targets have it. Use the more
generic term display instead.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

No translation updates yet as I do not know how you like to handle them.

 ui/gtk.c |   45 +
 1 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/ui/gtk.c b/ui/gtk.c
index 29156be..e9faf15 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -93,7 +93,7 @@ typedef struct GtkDisplayState
     GtkWidget *zoom_fit_item;
     GtkWidget *grab_item;
     GtkWidget *grab_on_hover_item;
-    GtkWidget *vga_item;
+    GtkWidget *display_item;
 
     int nb_vcs;
     VirtualConsole vc[MAX_VCS];
@@ -133,7 +133,7 @@ static bool gd_grab_on_hover(GtkDisplayState *s)
     return gtk_check_menu_item_get_active(GTK_CHECK_MENU_ITEM(s->grab_on_hover_item));
 }
 
-static bool gd_on_vga(GtkDisplayState *s)
+static bool gd_on_display(GtkDisplayState *s)
 {
     return gtk_notebook_get_current_page(GTK_NOTEBOOK(s->notebook)) == 0;
 }
@@ -141,13 +141,13 @@ static bool gd_on_vga(GtkDisplayState *s)
 static void gd_update_cursor(GtkDisplayState *s, gboolean override)
 {
     GdkWindow *window;
-    bool on_vga;
+    bool on_display;
 
     window = gtk_widget_get_window(GTK_WIDGET(s->drawing_area));
 
-    on_vga = gd_on_vga(s);
+    on_display = gd_on_display(s);
 
-    if ((override || on_vga) &&
+    if ((override || on_display) &&
         (s->full_screen || kbd_mouse_is_absolute() || gd_is_grab_active(s))) {
         gdk_window_set_cursor(window, s->null_cursor);
     } else {
@@ -593,7 +593,7 @@ static void gd_menu_switch_vc(GtkMenuItem *item, void *opaque)
 {
     GtkDisplayState *s = opaque;
 
-    if (gtk_check_menu_item_get_active(GTK_CHECK_MENU_ITEM(s->vga_item))) {
+    if (gtk_check_menu_item_get_active(GTK_CHECK_MENU_ITEM(s->display_item))) {
         gtk_notebook_set_current_page(GTK_NOTEBOOK(s->notebook), 0);
     } else {
         int i;
@@ -627,7 +627,7 @@ static void gd_menu_full_screen(GtkMenuItem *item, void *opaque)
         gtk_widget_set_size_request(s->menu_bar, 0, 0);
         gtk_widget_set_size_request(s->drawing_area, -1, -1);
         gtk_window_fullscreen(GTK_WINDOW(s->window));
-        if (gd_on_vga(s)) {
+        if (gd_on_display(s)) {
             gtk_check_menu_item_set_active(GTK_CHECK_MENU_ITEM(s->grab_item), TRUE);
         }
         s->full_screen = TRUE;
@@ -744,7 +744,7 @@ static void gd_change_page(GtkNotebook *nb, gpointer arg1, guint arg2,
 {
     GtkDisplayState *s = data;
     guint last_page;
-    gboolean on_vga;
+    gboolean on_display;
 
     if (!gtk_widget_get_realized(s->notebook)) {
         return;
@@ -756,9 +756,9 @@ static void gd_change_page(GtkNotebook *nb, gpointer arg1, guint arg2,
         gtk_widget_set_size_request(s->vc[last_page - 1].terminal, -1, -1);
     }
 
-    on_vga = arg2 == 0;
+    on_display = arg2 == 0;
 
-    if (!on_vga) {
+    if (!on_display) {
         gtk_check_menu_item_set_active(GTK_CHECK_MENU_ITEM(s->grab_item),
                                        FALSE);
     } else if (s->full_screen) {
@@ -767,7 +767,8 @@ static void gd_change_page(GtkNotebook *nb, gpointer arg1, guint arg2,
     }
 
     if (arg2 == 0) {
-        gtk_check_menu_item_set_active(GTK_CHECK_MENU_ITEM(s->vga_item), TRUE);
+        gtk_check_menu_item_set_active(GTK_CHECK_MENU_ITEM(s->display_item),
+                                       TRUE);
     } else {
         VirtualConsole *vc = &s->vc[arg2 - 1];
         VteTerminal *term = VTE_TERMINAL(vc->terminal);
@@ -780,7 +781,7 @@ static void gd_change_page(GtkNotebook *nb, gpointer arg1, guint arg2,
         gtk_widget_set_size_request(vc->terminal, width, height);
     }
 
-    gtk_widget_set_sensitive(s->grab_item, on_vga);
+    gtk_widget_set_sensitive(s->grab_item, on_display);
 
     gd_update_cursor(s, TRUE);
 }
@@ -964,7 +965,7 @@ static void gd_connect_signals(GtkDisplayState *s)
                      G_CALLBACK(gd_menu_zoom_fixed), s);
     g_signal_connect(s->zoom_fit_item, "activate",
                      G_CALLBACK(gd_menu_zoom_fit), s);
-    g_signal_connect(s->vga_item, "activate",
+    g_signal_connect(s->display_item, "activate",
                      G_CALLBACK(gd_menu_switch_vc), s);
     g_signal_connect(s->grab_item, "activate",
                      G_CALLBACK(gd_menu_grab_input), s);
@@ -1044,12 +1045,15 @@ static void gd_create_menus(GtkDisplayState *s)
     separator = gtk_separator_menu_item_new();
     gtk_menu_append(GTK_MENU(s->view_menu), separator);
 
-    s->vga_item = gtk_radio_menu_item_new_with_mnemonic(group, "_VGA");
-    group = gtk_radio_menu_item_get_group(GTK_RADIO_MENU_ITEM(s->vga_item));
-    gtk_menu_item_set_accel_path(GTK_MENU_ITEM(s->vga_item),
-                                 "<QEMU>/View/VGA");
-    gtk_accel_map_add_entry("<QEMU>/View/VGA", GDK_KEY_1, GDK_CONTROL_MASK | GDK_MOD1_MASK);
-    gtk_menu_append(GTK_MENU(s->view_menu), s->vga_item);
+    s->display_item = gtk_radio_menu_item_new_with_mnemonic(group,

Re: [Qemu-devel] [PATCH 20/41] migration: run pending/iterate callbacks out of big lock

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 This makes it possible to do blocking writes directly to the socket,
 with no buffer in the middle.  For RAM, only the migration_bitmap_sync()
 call needs the iothread lock.  For block migration, it is needed by
 the block layer (including bdrv_drain_all and dirty bitmap access),
 but because some code is shared between iterate and complete, all of
 mig_save_device_dirty is run with the lock taken.

 In the savevm case, the iterate callback runs within the big lock.
 This is annoying because it complicates the rules.  Luckily we do not
 need to do anything about it: the RAM iterate callback does not need
 the iothread lock, and block migration never runs during savevm.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 21/41] migration: run setup callbacks out of big lock

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Only the migration_bitmap_sync() call needs the iothread lock.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH] slirp: fixed potential use-after-free of a socket

2013-02-22 Thread Vitaly Chipounov
Hi,

On 21.02.2013 15:33, Jan Kiszka wrote:
 On 2013-02-15 12:00, Vitaly Chipounov wrote:
 A socket may still have references to it in various queues
 at the time it is freed, causing memory corruptions.
 Did you see it in practice? Or is this patch based on code review? What
 will happen if those queued mbufs find their ifq_so NULL?

I have a packet trace that triggers this problem when it is injected
into the guest's NIC.
I am still not quite sure why this happens, but suspect that it could be
caused by malformed/partial TCP/IP streams.

Setting ifq_so to NULL shouldn't be an issue because there seems to be
checks for that (e.g., in if.c).
It doesn't seem to affect normal web browsing/downloading in the guest.

Vitaly


 Signed-off-by: Vitaly Chipounov vitaly.chipou...@epfl.ch
 ---
  slirp/socket.c |   29 +
  1 file changed, 29 insertions(+)

diff --git a/slirp/socket.c b/slirp/socket.c
index 77b0c98..8a7adc8 100644
--- a/slirp/socket.c
+++ b/slirp/socket.c
@@ -55,6 +55,33 @@ socreate(Slirp *slirp)
    return(so);
 }
 
+/**
+ * It may happen that a socket is still referenced in various
+ * mbufs at the time it is freed. Clear all references to the
+ * socket here.
+ */
+static void soremovefromqueues(struct socket *so)
+{
+    if (!so->so_queued) {
+        return;
+    }
+
+    Slirp *slirp = so->slirp;
+    struct mbuf *ifm;
+
+    for (ifm = slirp->if_fastq.ifq_next; ifm != &slirp->if_fastq; ifm = ifm->ifq_next) {

This line and the one below smell like checkpatch.pl wasn't run against
it. Granted, slirp patches easily make checkpatch upset as the input is
not properly formatted, but please don't add more violations.

+        if (ifm->ifq_so == so) {
+            ifm->ifq_so = NULL;
+        }
+    }
+
+    for (ifm = slirp->if_batchq.ifq_next; ifm != &slirp->if_batchq; ifm = ifm->ifq_next) {
+        if (ifm->ifq_so == so) {
+            ifm->ifq_so = NULL;
+        }
+    }
+}
+
 /*
  * remque and free a socket, clobber cache
  */
@@ -79,6 +106,8 @@ sofree(struct socket *so)
   if (so->so_next && so->so_prev)
       remque(so);  /* crashes if so is not in a queue */
 
+  soremovefromqueues(so);
+
   free(so);
 }

 Sorry for the late feedback,
 Jan





Re: [Qemu-devel] [PATCH 15/41] block-migration: remove variables that are never read

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 16/41] block-migration: small preparatory changes for locking

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Some small changes that will simplify the positioning of lock/unlock
 primitives
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 22/41] migration: yay, buffering is gone

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Buffering was needed because blocking writes could take a long time
 and starve other threads seeking to grab the big QEMU mutex.

 Now that all writes (except within _complete callbacks) are done
 outside the big QEMU mutex, we do not need buffering at all.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com

DEBUG error change was already found by Orit.



Re: [Qemu-devel] [PATCH 23/41] Rename buffered_ to migration_

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 From: Juan Quintela quint...@redhat.com

 This is consistent now that we have moved everything to migration.c

 Signed-off-by: Juan Quintela quint...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com




Re: [Qemu-devel] [PATCH 24/41] qemu-file: make qemu_fflush and qemu_file_set_error private again

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



[Qemu-devel] [PATCH 1/2] require gtk 2.20+

2013-02-22 Thread Gerd Hoffmann
The gtk code uses gtk_widget_get_realized which is available in 2.20+
only, so make this the minimum accepted version.  Fixes build failures
on RHEL-6 (which ships 2.18) by not building gtk support there.

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 configure |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 0dadd31..a6b0c02 100755
--- a/configure
+++ b/configure
@@ -1644,7 +1644,7 @@ fi
 # GTK probe
 
 if test "$gtk" != "no"; then
-    if $pkg_config gtk+-2.0 --modversion >/dev/null 2>/dev/null && \
+    if $pkg_config gtk+-2.0 --atleast-version=2.20 >/dev/null 2>/dev/null && \
        $pkg_config vte --modversion >/dev/null 2>/dev/null; then
        gtk_cflags=`$pkg_config --cflags gtk+-2.0 2>/dev/null`
        gtk_libs=`$pkg_config --libs gtk+-2.0 2>/dev/null`
-- 
1.7.9.7




[Qemu-devel] [PULL 0/2] fix build

2013-02-22 Thread Gerd Hoffmann
  Hi,

Two little build fixes.  One in the gtk ui pull, one in the usb queue.

please pull,
  Gerd

The following changes since commit 73d4dc71f3a41131541c73b3ac2a8b160a51842b:

  gtk: suppress accelerators from the File menu when grab is active (2013-02-21 
16:34:49 -0600)

are available in the git repository at:

  git://git.kraxel.org/qemu build.2

for you to fetch changes up to c489173a2c4a886fddad39c9981d1ab9ea9c7b85:

  unbreak hw/usb/redirect.c build (2013-02-22 12:07:49 +0100)


Gerd Hoffmann (2):
  require gtk 2.20+
  unbreak hw/usb/redirect.c build

 configure |2 +-
 hw/usb/redirect.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)



[Qemu-devel] [PATCH 2/2] unbreak hw/usb/redirect.c build

2013-02-22 Thread Gerd Hoffmann
Commit 8550a02d1239415342959f6a32d178bc05c557cc added a streams
parameter to usb_wakeup and didn't update redirect.c.  Fix it.

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/usb/redirect.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index 7078403..c519b9b 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -1897,7 +1897,7 @@ static void usbredir_interrupt_packet(void *priv, uint64_t id,
     }
 
     if (QTAILQ_EMPTY(&dev->endpoint[EP2I(ep)].bufpq)) {
-        usb_wakeup(usb_ep_get(&dev->dev, USB_TOKEN_IN, ep & 0x0f));
+        usb_wakeup(usb_ep_get(&dev->dev, USB_TOKEN_IN, ep & 0x0f), 0);
     }
 }
 
 /* bufp_alloc also adds the packet to the ep queue */
-- 
1.7.9.7




Re: [Qemu-devel] [PATCH v4 0/6] Efficient VM backup for qemu

2013-02-22 Thread Dietmar Maurer
Opening several NBD sockets like 'nbd:127.0.0.1:1235' seems dangerous to me. One 
backup task can easily write to the wrong port - there is no protection? Or how can we 
guarantee that only one kvm process can write to that NBD device? 

   transaction actions=[{'type': 'migrate-async',
 'destination': 'tcp:127.0.0.1:1234',
 'id': 'migration'},
{'type': 'point-in-time-copy-async',
 'wait-for': 'migration',
 'device': 'virtio-drive0',
 'destination-device': 'nbd:127.0.0.1:1235'}]
 




[Qemu-devel] [PATCH 1/4] migration: change initial value of expected_downtime

2013-02-22 Thread Juan Quintela
0 is a very bad initial value; what we are trying to get is
max_downtime, so that is a much better estimate.

Signed-off-by: Juan Quintela quint...@redhat.com


Reviewed-by: Orit Wasserman owass...@redhat.com
---
 migration.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration.c b/migration.c
index b1ebb01..b3f5ba4 100644
--- a/migration.c
+++ b/migration.c
@@ -774,6 +774,8 @@ void migrate_fd_connect(MigrationState *s)
 s->buffer = NULL;
 s->buffer_size = 0;
 s->buffer_capacity = 0;
+/* This is a best 1st approximation. ns to ms */
+s->expected_downtime = max_downtime/1000000;

 s->xfer_limit = s->bandwidth_limit / XFER_LIMIT_RATIO;
 s->complete = false;
-- 
1.8.1.2




Re: [Qemu-devel] [PATCH 25/41] migration: eliminate last_round

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 We will go around the loop exactly once after setting last_round.
 Eliminate the variable altogether.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 26/41] migration: detect error before sleeping

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 27/41] migration: remove useless qemu_file_get_error check

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 migration_put_buffer is never called if there has been an error.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH] slirp: fixed potential use-after-free of a socket

2013-02-22 Thread Jan Kiszka
On 2013-02-22 10:57, Vitaly Chipounov wrote:
 Hi,
 
 On 21.02.2013 15:33, Jan Kiszka wrote:
 On 2013-02-15 12:00, Vitaly Chipounov wrote:
 A socket may still have references to it in various queues
 at the time it is freed, causing memory corruptions.
 Did you see it in practice? Or is this patch based on code review? What
 will happen if those queued mbufs find their ifq_so NULL?
 
 I have a packet trace that triggers this problem when it is injected
 into the guest's NIC.
 I am still not quite sure why this happens, but suspect that it could be
 caused by malformed/partial TCP/IP streams.

OK. Is it shareable?

We need to understand why these bufs still hang around - and exclude
they leak or cause more problems elsewhere.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 1/2] require gtk 2.20+

2013-02-22 Thread Daniel P. Berrange
On Fri, Feb 22, 2013 at 12:11:58PM +0100, Gerd Hoffmann wrote:
 The gtk code uses gtk_widget_get_realized which is available in 2.20+
 only, so make this the minimum accepted version.  Fixes build failures
 on RHEL-6 (which ships 2.18) by not building gtk support there.

IMHO it'd be nicer to add the compat code to let it build with
something like this:

 #if !GTK_CHECK_VERSION(2, 20, 0)
 #define gtk_widget_get_realized(w) GTK_WIDGET_REALIZED(w)
 #endif

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCH 1/2] require gtk 2.20+

2013-02-22 Thread Paolo Bonzini
Il 22/02/2013 12:11, Gerd Hoffmann ha scritto:
 The gtk code uses gtk_widget_get_realized which is available in 2.20+
 only, so make this the minimum accepted version.  Fixes build failures
 on RHEL-6 (which ships 2.18) by not building gtk support there.

We should either require GTK+ 3, or just use GTK_WIDGET_REALIZED().
Either is fine, I think.

Paolo



Re: [Qemu-devel] [PATCH 28/41] migration: use qemu_file_rate_limit consistently

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 29/41] migration: merge qemu_popen_cmd with qemu_popen

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 There is no reason for outgoing exec migration to do popen manually
 anymore (the reason used to be that we needed the FILE* to make it
 non-blocking).  Use qemu_popen_cmd.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 30/41] qemu-file: fsync a writable stdio QEMUFile

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 This is what fd_close does.  Prepare for switching to a QEMUFile.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 31/41] qemu-file: check exit status when closing a pipe QEMUFile

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 This is what exec_close does.  Move this to the underlying QEMUFile.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH v4 4/6] introduce new vma archive format

2013-02-22 Thread Dietmar Maurer
  For now I have no plans to backup such large VMs.
 
 640k ought to be enough for anybody?
 
 Code is easy enough to change that works for now can be good enough.
 Changing file formats and external interfaces is hard, though, so better get 
 it
 right the first time. I don't want to get patches for a new format in a year 
 or two
 just because 128T was enough for the first few users.

The limits are not really arbitrary - I chose them carefully to keep the 
overhead small.

We currently only need 8 bytes per cluster, which results in small files.

So I guess it is better to use a different format (version) if you want to 
store such big files.






Re: [Qemu-devel] [PATCH 33/41] qemu-file: simplify and export qemu_ftell

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Force a flush when qemu_ftell is called.  This simplifies the buffer magic
 (it also breaks qemu_ftell for input QEMUFiles, but we never use it).

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Acked-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH v4 0/6] Efficient VM backup for qemu

2013-02-22 Thread Paolo Bonzini
Il 22/02/2013 12:13, Dietmar Maurer ha scritto:
 Opening several NBD sockets 'nbd:127.0.0.1:1235' seems dangerous to me. One 
 backup task can easily write to the wrong port - there is no protection?

You can also use the named-export support.  Something like

   --nbd=127.0.0.1:1235
   --blockdev=virtio-drive0 --blockdev=virtio-drive1 --blockdev=ide0-hd0

etc.

Then QEMU can migrate to nbd://127.0.0.1:1235/virtio-drive0.

 Or how can we guarantee that only one kvm process can write to that NBD 
 device?

Why is this needed?

BTW, with Kevin's proposal of doing migrate+backup+continue at the
management level, you may not even need extensions to the transaction
command.  It's much easier to implement.

Paolo

 
transaction actions=[{'type': 'migrate-async',
  'destination': 'tcp:127.0.0.1:1234',
  'id': 'migration'},
 {'type': 'point-in-time-copy-async',
  'wait-for': 'migration',
  'device': 'virtio-drive0',
  'destination-device': 'nbd:127.0.0.1:1235'}]
  
 
 




Re: [Qemu-devel] [PATCH] xhci: fix bad print specifier

2013-02-22 Thread Gerd Hoffmann
On 02/21/13 22:58, Hervé Poussineau wrote:
 This fixes the following compilation error:
 hw/usb/hcd-xhci.c:1156:17: error: format ‘%llx’ expects argument of type
 ‘long long unsigned int’, but argument 4 has type ‘unsigned int’
 
 Signed-off-by: Hervé Poussineau hpous...@reactos.org

Acked-by: Gerd Hoffmann kra...@redhat.com

cheers,
  Gerd



Re: [Qemu-devel] [PATCH 34/41] migration: use QEMUFile for migration channel lifetime

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 As a start, use QEMUFile to store the destination and close it.
 qemu_get_fd gets a file descriptor that will be used by the write
 callbacks.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 35/41] migration: use QEMUFile for writing outgoing migration data

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Second, drop the file descriptor indirection, and write directly to the
 QEMUFile.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

 +qemu_put_buffer(s->migration_file, buf, size);
 +if (qemu_file_get_error(s->migration_file)) {
 +return qemu_file_get_error(s->migration_file);

Rest of patch is really, really nice.

But here, please, use a local variable.

qemu_put_buffer(s->migration_file, buf, size);
ret = qemu_file_get_error(s->migration_file);
if (ret) {
   return ret;
}



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-22 Thread Paolo Bonzini
Il 22/02/2013 11:43, Stefan Hajnoczi ha scritto:
  
  It can and should stay. :)  BHs are tied to an AioContext.  Running
  thread-creation BHs from the AioContext's main thread ensures that the
  worker threads inherit the right attributes for e.g. real-time.
 When I say this can stay I mean that worker threads will be spawned
 from the QEMU iothread, not from the caller's thread.

Ah :)

 Why?
 
 Because if we decide to change this then we can no longer have a global
 set of worker threads and a single work queue.  It would make worker
 threads scoped to just one AioContext.
 
 Do we really want to go there?  It kind of defeats the purpose of a
 thread-pool to have one thread-pool per AioContext (i.e. per virtio-blk
 device).

I think scoped worker threads are simpler to implement and have little
or no lock contention (I have patches for lockless thread-pool, but I'm
not 100% sure they're correct---and they're a bit ugly).  Also, with
many virtio-blk threads you could hit the 64-thread limit that
thread-pool has, so it may require some tuning.

More importantly, scoped worker thread do not require an external
iothread to exist at all.

We can switch from one solution to the other easily (benchmarking takes
more time than coding, in all likelihood), so it's not urgent to choose
one over the other now.

 We will use two BlockDriverStates if the same backing file
 happens to be shared by two disks (BTW, anybody ever checked what kind
 of havoc happens if you commit to such a BDS? :)).
 
 AFAIK we don't share backing file BlockDriverStates today.

Indeed, we don't share BDSs but commit operations could cause the same
file to be backed by two BDSs, one read-only and one read-write.

 What's still missing here is TCG support.  Perhaps you can emulate
 ioeventfd at the hw/virtio.c level and always going through a host
 notifier.  In retrospect, perhaps it was a mistake to make ioeventfd a
 non-experimental option.
 
 That's true.  We need to emulate ioeventfd.  For virtio-pci we can cheat
 by simply notifying the EventNotifier from the virtqueue kick handler.
 But really we should implement an ioeventfd equivalent at the pio/mmio
 emulation level in QEMU.

Not a big deal anyway (like the irqfd).

Paolo



Re: [Qemu-devel] [PATCH 36/41] migration: use qemu_ftell to compute bandwidth

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Prepare for when s-bytes_xfer will be removed.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 37/41] migration: small changes around rate-limiting

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 This patch extracts a few small changes from the next patch, which
 are unrelated to adding generic rate-limiting functionality to
 QEMUFile.  Make migration_set_rate_limit a simple accessor, and
 use qemu_file_set_rate_limit consistently.  Also fix a typo where
 INT_MAX should have been SIZE_MAX.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 0/9] introduce virtio net dataplane

2013-02-22 Thread Paolo Bonzini
Il 22/02/2013 12:06, Kevin Wolf ha scritto:
 or example think of
 the read-only qcow2 snapshot BDS that we've been talking about before.
 
 So it would have to be more like one AioContext per connected component
 in the BDS graph

Yes.  For now it means one AioContext per top-level BDS, but that can
change in the future.

 , if one per BDS doesn't work (why doesn't it?)

One AioContext per BDS means one event loop and hence one thread per
BDS, with extra costs in terms of locking, context switching, etc.

Paolo



Re: [Qemu-devel] [PATCH 38/41] migration: move rate limiting to QEMUFile

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Rate limiting is now simply a byte counter; client call
 qemu_file_rate_limit() manually to determine if they have to exit.
 So it is possible and simple to move the functionality to QEMUFile.

 This makes the remaining functionality of s-file redundant;
 in the next patch we can remove it and write directly to s-migration_file.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com


Nice!!!

  qemu_put_buffer(s->migration_file, buf, size);
 -if (qemu_file_get_error(s->migration_file)) {
 -return qemu_file_get_error(s->migration_file);
 -}
 -
 -s->bytes_xfer += size;
 -return size;
 +return qemu_file_get_error(s->migration_file);
  }

You fix here the problem that I pointed in a previous patch, so feel
free not to fix it in the previous one.

If anyone reading this, just thinking aloud, this function is called as

if (f->is_write && f->buf_index > 0) {
    ret = f->ops->put_buffer(f->opaque, f->buf, f->buf_offset,
                             f->buf_index);
    if (ret >= 0) {
        f->buf_offset += f->buf_index;
    }
    f->buf_index = 0;

So the change from size to 0 on the non-error case don't matter.





Re: [Qemu-devel] [PATCH 39/41] migration: move contents of migration_close to migrate_fd_cleanup

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 With this patch, the migration_file is not needed anymore.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 40/41] migration: eliminate s-migration_file

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 The indirection is useless now.  Backends can open s-file directly.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH 41/41] migration: inline migrate_fd_close

2013-02-22 Thread Juan Quintela
Paolo Bonzini pbonz...@redhat.com wrote:
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH v4 00/10] main-loop: switch to g_poll(3) on POSIX hosts

2013-02-22 Thread Paolo Bonzini
Il 22/02/2013 11:07, Laurent Desnogues ha scritto:
 The g_poll(3) interface is a portable version of the poll(2) system call.  
 The
  difference to select(2) is that fine-grained events (G_IO_IN, G_IO_OUT,
  G_IO_HUP, G_IO_ERR, G_IO_PRI) can be monitored instead of just
  read/write/exception.  Also, there is no limit to the file descriptor 
  numbers
  that may be used, allowing applications to scale to many file descriptors. 
   See
  the documentation for details:
 
http://developer.gnome.org/glib/2.28/glib-The-Main-Event-Loop.html#g-poll
 g_poll isn't available in older glib versions used by CentOS (2.12
 on CentOS 5.6).

Hmm, right.  But we can just use poll.  We require a newer glib that has
g_poll on Windows, but we know that on POSIX systems g_poll is really poll.

Paolo



Re: [Qemu-devel] [PATCH v4 0/6] Efficient VM backup for qemu

2013-02-22 Thread Dietmar Maurer
 You can also use the named-export support.  Something like
 
--nbd=127.0.0.1:1235
--blockdev=virtio-drive0 --blockdev=virtio-drive1 --blockdev=ide0-hd0
 
 etc.
 
 Then QEMU can migrate to nbd://127.0.0.1:1235/virtio-drive0.
 
  Or how can we guarantee that only one kvm process can write to that NBD
 device?
 
 Why is this needed?

Security? I don't want another process to write nonsense into my backup.






Re: [Qemu-devel] using -net dump with tap networking

2013-02-22 Thread Paolo Bonzini
Il 22/02/2013 10:35, Markus Armbruster ha scritto:
 Paolo Bonzini pbonz...@redhat.com writes:
 
 Il 21/02/2013 15:41, Markus Armbruster ha scritto:
 Trouble is the user interface is as confusing as ever.  Worse, in a way,
 because we now have to explain how vlan and the hubport netdev relate.

 Permit me a brief rant.  Have you read the manual page on -netdev and
 -net recently?  Have you tried to explain setting up QEMU networking to
 a neophyte?  Were you able to keep a straight face?  If yes, congrats,
 you got a great future ahead of you.

 We have so many ways to do the same thing, complicating our docs
 enormously.  Just one of them makes sense for almost all uses, but I
 doubt mortal users could guess which one just from the docs.

 Can we do for the docs what Stefan did for the code, i.e. get vlans
 out of the way?

 Specifically, what's missing to let us deprecate -net completely?

 Less important: the fact that -net dump is quite useful and requires
 vlans.
 
 Let me rephrase: dumping network traffic is quite useful, and our
 current way to do that from within QEMU requires vlans.
 
 The dump vlan client is neither a net frontend nor a net backend, it's
 a passive listener.  Could also be viewed as special case of a filter
 that happens not to change anything.
 
 If it's the only useful listener or filter, we could just special-case
 it and be done.
 
 If not, we need a way to attach listeners to the 1:1 netdev connections,
 or turn it in a chain with filters as interior nodes.
 
(-net socket has some usecases too: -net bridge helps placing
 a VM's network on a bridge, but adding a bridge still requires root
 privileges).
 
 -netdev socket and -netdev bridge don't cut it?

-netdev bridge works great.

If you want to bridge multiple guest NICs on the same socket, however,
you need the VLAN mechanism (though actually you can also use -netdev
socket,mcast if that works).

 To be more precise: -device currently works only for devices on
 pluggable buses.
 
 Anything you could plug on a physical machine should be on a pluggable
 bus (whether we implemented plugging is another question).
 
 Anything you couldn't plug on a physical machine must be soldered onto
 the board.  Boards often come in a few flavours with different onboard
 devices.  Virtual boards tend to sprout more flavours.  Because of that,
 thinking in board options you can combine is often more convenient than
 having a finite number of board variants.
 
 QOM will hopefully enable us to stitch devices together without the need
 for buses.

Still, you have to tell the board this is NIC 0, because it is the
board logic that knows the MMIO base address of the NICs---not the NIC
itself.

At least, that's how the authors of embedded ports structure their code
(see recent disappearance of sysbus_add_memory; enabling pervasive use
of -device would instead require sysbus_mmio_map to disappear!).

Paolo

 So, I quite agree that we need a way to configure board options.  Our
 current way to do that for NICs is multiplexed into -net as -net nic.
 
 -net nic is quite unlike the other -net: it doesn't create a vlan
 client, it asks the board to create a NIC device.  Parameter model
 lets you ask for a specific one.
 
 Boards can do what they want with these requests.  Some boards don't
 support NICs and simply ignore these requests (generic code attempts to
 detect that and warn).  Others treat models they don't know as fatal
 error.  Some boards create exactly the NICs you asked for.  Others
 create exactly their onboard NICs, whether you asked for them or not
 (asking is merely a way to supply configuration parameters).
 
 Most (all?) boards can tell you what models they support (-net
 nic,model=help), but some lie, e.g. PCs neglect to mention model
 ne2k_isa.
 
 Boards supporting PCI generally recognize a few common PCI NIC models.
 But you're better off using -device there, because it gives you access
 to all device models and options at no extra charge.
 
 Well, here's another nice mess you've gotten me into, Stanley!
 
 
 TL;DR: I quite agree that we can't just throw away -net without a
 replacement.  I just think it's a mess, and should be replaced.
 




Re: [Qemu-devel] [PATCH 35/41] migration: use QEMUFile for writing outgoing migration data

2013-02-22 Thread Juan Quintela
Juan Quintela quint...@redhat.com wrote:
 Paolo Bonzini pbonz...@redhat.com wrote:
 Second, drop the file descriptor indirection, and write directly to the
 QEMUFile.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

 +qemu_put_buffer(s->migration_file, buf, size);
 +if (qemu_file_get_error(s->migration_file)) {
 +return qemu_file_get_error(s->migration_file);

 Rest of patch is really, really nice.

 But here, please, use a local variable.

 qemu_put_buffer(s->migration_file, buf, size);
 ret = qemu_file_get_error(s->migration_file);
 if (ret) {
return ret;
 }

This chunk is removed in a later patch, so don't care about changing it.

Reviewed-by: Juan Quintela quint...@redhat.com



Re: [Qemu-devel] [PATCH v4 0/6] Efficient VM backup for qemu

2013-02-22 Thread Paolo Bonzini
Il 22/02/2013 12:51, Dietmar Maurer ha scritto:
 You can also use the named-export support.  Something like
  
 --nbd=127.0.0.1:1235
 --blockdev=virtio-drive0 --blockdev=virtio-drive1 --blockdev=ide0-hd0
  
  etc.
  
  Then QEMU can migrate to nbd://127.0.0.1:1235/virtio-drive0.
  
   Or how can we guarantee that only one kvm process can write to that NBD
  device?
  
  Why is this needed?
 Security? I don't want another process to write nonsense into my backup.

They can already write nonsense to your iSCSI target, can't they?

But you can always sandbox using SELinux, if you care about that, or use
a Unix socket + SCM_CREDENTIALS.

Paolo




Re: [Qemu-devel] (iSCSI-)Readcache in Qemu?

2013-02-22 Thread Paolo Bonzini
Il 22/02/2013 12:06, Stefan Hajnoczi ha scritto:
 On Fri, Feb 22, 2013 at 08:51:48AM +0100, Peter Lieven wrote:
 is there any project or plans to integrate a readcache in qemu either for 
 iscsi or for the whole block layer?

 I was thinking of 2 options:
 a) assign (up to) a fixed amount of ram for readcaching of a VMs disk I/O. 
 Without SG_IO this seems
 to be quite easy as there are only a handful of operations to intercept.
 b) add SSDs to a Node and allow a VM to use up to X MB of that SSD as 
 read-cache. This should still be
 a benefit if the storage for the VMs is on a NAS.

 I would like to do that independent of the OS for 2 reasons. First with 
 iSCSI the OS
 is not involved. And second I would like to be able to add a memory limit 
 for the cache.
 
 I'm not aware of such an effort.  In general QEMU tries to avoid caching
 data because the host OS can already do it (if desired).
 
 The Linux cgroups memory controller can limit file cache size for a
 specific process or group of processes.
 
   http://www.kernel.org/doc/Documentation/cgroups/memory.txt
 
 I suggest using the Linux iSCSI initiator together with cgroups to
 achieve what you want.  Adding this functionality to QEMU will cost us
 in complexity, data integrity worries, and code maintenance.

libiscsi has some nice advantages over the Linux iSCSI initiator.  One
of them is that you don't need cache=none, but that mustn't count for
Peter. :)  Others are that no privileges are needed to configure it,
that it bypasses the mandatory partition scanning in the kernel, and
that iscsi URIs are easier to use than udev stable paths.

Paolo




Re: [Qemu-devel] [PATCH v4 4/6] introduce new vma archive format

2013-02-22 Thread Kevin Wolf
Am 22.02.2013 um 12:21 hat Dietmar Maurer geschrieben:
   For now I have no plans to backup such large VMs.
  
  640k ought to be enough for anybody?
  
  Code is easy enough to change that works for now can be good enough.
  Changing file formats and external interfaces is hard, though, so better 
  get it
  right the first time. I don't want to get patches for a new format in a 
  year or two
  just because 128T was enough for the first few users.
 
 The limits are not really arbitrary - I chose them carefully to keep the 
 overhead small.
 
 We currently only need 8 bytes per cluster, which results in small files.

So how big is the metadata overhead? If you always copy 64k at once
(which is a very conservative assumption - at least for the background
copy you'll have much larger blocks), then an 8 byte header for each is
0.01%. Increasing that to 0.02% doesn't sound like a huge problem to me.

And I think some other reasons were already suggested why using 16 bytes
would be better, so maybe we should just do it.

Kevin



Re: [Qemu-devel] [PATCH 37/41] migration: small changes around rate-limiting

2013-02-22 Thread Orit Wasserman
On 02/15/2013 07:47 PM, Paolo Bonzini wrote:
 This patch extracts a few small changes from the next patch, which
 are unrelated to adding generic rate-limiting functionality to
 QEMUFile.  Make migration_set_rate_limit a simple accessor, and
 use qemu_file_set_rate_limit consistently.  Also fix a typo where
 INT_MAX should have been SIZE_MAX.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  migration.c |   19 +++
  1 files changed, 11 insertions(+), 8 deletions(-)
 
 diff --git a/migration.c b/migration.c
 index d636c59..308214f 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -451,10 +451,15 @@ void qmp_migrate_set_speed(int64_t value, Error **errp)
  if (value < 0) {
  value = 0;
  }
 +if (value > SIZE_MAX) {
 +value = SIZE_MAX;
 +}
  
  s = migrate_get_current();
  s->bandwidth_limit = value;
 -qemu_file_set_rate_limit(s->file, s->bandwidth_limit);
 +if (s->file) {
 +qemu_file_set_rate_limit(s->file, s->bandwidth_limit / XFER_LIMIT_RATIO);
 +}
  }
  
  void qmp_migrate_set_downtime(double value, Error **errp)
 @@ -554,11 +559,8 @@ static int64_t migration_set_rate_limit(void *opaque, int64_t new_rate)
  if (qemu_file_get_error(s->file)) {
  goto out;
  }
 -if (new_rate > SIZE_MAX) {
 -new_rate = SIZE_MAX;
 -}
  
 -s->xfer_limit = new_rate / XFER_LIMIT_RATIO;
 +s->xfer_limit = new_rate;
  
  out:
  return s->xfer_limit;
 @@ -600,7 +602,7 @@ static void *migration_thread(void *opaque)
  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
  old_vm_running = runstate_is_running();
  vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
 -s->xfer_limit = INT_MAX;
 +qemu_file_set_rate_limit(s->file, INT_MAX);
  qemu_savevm_state_complete(s->file);
  qemu_mutex_unlock_iothread();
  if (!qemu_file_get_error(s->file)) {
 @@ -663,11 +665,12 @@ void migrate_fd_connect(MigrationState *s)
  {
  s->state = MIG_STATE_ACTIVE;
  s->bytes_xfer = 0;
 -s->xfer_limit = s->bandwidth_limit / XFER_LIMIT_RATIO;
 -
  s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
  s->file = qemu_fopen_ops(s, migration_file_ops);
  
 +qemu_file_set_rate_limit(s->file,
 + s->bandwidth_limit / XFER_LIMIT_RATIO);
  
  qemu_thread_create(&s->thread, migration_thread, s,
 QEMU_THREAD_JOINABLE);
  notifier_list_notify(migration_state_notifiers, s);
 
Reviewed-by: Orit Wasserman owass...@redhat.com



[Qemu-devel] [RFC PATCH v2 0/2] qemu: ioeventfd for virtio-ccw.

2013-02-22 Thread Cornelia Huck
Here's the next version of ioeventfd support for virtio-ccw, again
against master.

v1 -> v2:
- Adapt to changed ioeventfd interface.

Cornelia Huck (2):
  linux-headers: Update with ioeventfd changes.
  virtio-ccw: Wire up ioeventfd.

 hw/s390x/css.c|   2 +-
 hw/s390x/css.h|   1 +
 hw/s390x/virtio-ccw.c | 114 ++
 hw/s390x/virtio-ccw.h |   7 +++
 linux-headers/linux/kvm.h |   2 +
 target-s390x/cpu.h|  16 +++
 target-s390x/kvm.c|  18 
 7 files changed, 159 insertions(+), 1 deletion(-)

-- 
1.7.12.4




  1   2   3   4   5   >