Re: [Qemu-devel] [PATCH] qemu-ga: Extend guest-network-get-interfaces

2013-01-02 Thread Michal Privoznik
On 21.12.2012 19:43, Eric Blake wrote:
 On 12/21/2012 05:59 AM, Michal Privoznik wrote:
 Nowadays only basic information is reported. However, with the
 current implementation much more can be exposed to users, like the
 broadcast/destination address (the former in case of a standard
 Ethernet device, the latter in case of a PPP interface), whether the
 interface is up, is of loopback type, is in promiscuous mode or is
 capable of sending multicast.
 ---

 
 +++ b/qga/qapi-schema.json
 @@ -480,26 +480,57 @@
  #
  # @prefix: Network prefix length of @ip-address
  #
 -# Since: 1.1
 +# @dest-address: The broadcast or peer address.
 +#
 +# Since: 1.1, @dest-address since 1.3
 
 Actually, since 1.4 now (1.3 is already out).
 
  ##
  { 'type': 'GuestIpAddress',
'data': {'ip-address': 'str',
 'ip-address-type': 'GuestIpAddressType',
 -   'prefix': 'int'} }
 +   'prefix': 'int',
 +   '*dest-address': 'str'} }
 
 Is this field always going to be present in 1.4?  If so, then it doesn't
 need to be marked optional (even though it wasn't present in 1.3).

Not really. This field is going to be there iff the guest agent is able to
dig the info out. For instance, for PPP interfaces I was unable to get the
peer's address via getifaddrs(); other utilities use netlink for that.
However, if an interface has a broadcast address, it can easily be
obtained via getifaddrs(). That's why I am making this optional for now.
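A minimal sketch of that rule follows. It assumes a hypothetical C mirror of GuestIpAddress in which the optional member carries a has_ flag; the struct and function names are illustrative, not the code generated from qapi-schema.json. dest-address is filled from a getifaddrs() entry only when the entry actually carries a broadcast or peer address:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical C mirror of GuestIpAddress (IPv4 only): the optional
 * member carries a has_ flag, so an absent dest-address is not emitted. */
typedef struct {
    char ip_address[INET_ADDRSTRLEN];
    bool has_dest_address;
    char dest_address[INET_ADDRSTRLEN];
} GuestIpAddressSketch;

/* Fill @out from one getifaddrs() entry; returns false for non-IPv4 entries. */
static bool fill_ipv4_address(const struct ifaddrs *ifa, GuestIpAddressSketch *out)
{
    const struct sockaddr *dest = NULL;

    assert(ifa != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET) {
        return false;
    }

    const struct sockaddr_in *sin = (const struct sockaddr_in *)ifa->ifa_addr;
    if (inet_ntop(AF_INET, &sin->sin_addr, out->ip_address,
                  sizeof(out->ip_address)) == NULL) {
        return false;
    }

    /* Broadcast and peer address share storage in struct ifaddrs; the
     * flags say which one it is. The pointer may still be NULL, e.g. when
     * a PPP peer address is not reported through getifaddrs(). */
    if (ifa->ifa_flags & IFF_BROADCAST) {
        dest = ifa->ifa_broadaddr;
    } else if (ifa->ifa_flags & IFF_POINTOPOINT) {
        dest = ifa->ifa_dstaddr;
    }

    if (dest != NULL && dest->sa_family == AF_INET) {
        const struct sockaddr_in *dsin = (const struct sockaddr_in *)dest;
        out->has_dest_address =
            inet_ntop(AF_INET, &dsin->sin_addr, out->dest_address,
                      sizeof(out->dest_address)) != NULL;
    }
    return true;
}
```

To use it, call the helper for each entry of the list returned by getifaddrs() and skip entries for which it returns false. IPv6 is left out to keep the sketch short.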

 
  ##
 +# @GuestNetworkInterfaceType:
 +#
 +# @broadcast: Interface has a broadcast address. In which case it is
 +# contained in @dest-address in @GuestIpAddress.
 +#
 +# @ppp: Interface is of point-to-point type. The peer address is then in
 +#   @dest-address in @GuestIpAddress.
 +#
 +# Since: 1.3
 
 1.4
 
 +##
 +{ 'enum': 'GuestNetworkInterfaceType',
 +  'data': ['broadcast', 'ppp'] }
 +##
  # @GuestNetworkInterface:
  #
  # @name: The name of interface for which info are being delivered
  #
 +# @up: If the interface is up
 +#
 +# @loopback: If the interface is of loopback type
 +#
 +# @promisc: If the interface is in promiscuous mode
 +#
 +# @multicast: If the interface is cappable of multicast
 
 s/cappable/capable/
 
 +#
 +# @type: If the interface has a broadcast address(-es) assigned, or is a PPP.
 +#
  # @hardware-address: Hardware address of @name
  #
  # @ip-addresses: List of addresses assigned to @name
  #
 -# Since: 1.1
 +# Since: 1.1, @up, @loopback, @promisc, @multicast and @type since 1.3
 
 1.4
 
  ##
  { 'type': 'GuestNetworkInterface',
'data': {'name': 'str',
 +   'up': 'bool',
 +   'loopback': 'bool',
 +   'promisc': 'bool',
 +   'multicast': 'bool',
 +   '*type': 'GuestNetworkInterfaceType',
 
 Again, is this field optional?

Yes. Because this field actually tells the type of the 'dest-address' field,
which is optional, I think this one should be optional as well.
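The sketch below illustrates that coupling. It uses hypothetical C names that mirror the proposed schema, and it decodes the getifaddrs() ifa_flags word into the four booleans plus the optional type. The booleans are always present. type is set only when IFF_BROADCAST or IFF_POINTOPOINT says what dest-address would mean:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <net/if.h>

/* Hypothetical mirror of the proposed GuestNetworkInterfaceType enum. */
typedef enum {
    GUEST_NETWORK_INTERFACE_TYPE_BROADCAST,
    GUEST_NETWORK_INTERFACE_TYPE_PPP,
} GuestNetworkInterfaceTypeSketch;

typedef struct {
    bool up, loopback, promisc, multicast;   /* always reported */
    bool has_type;                           /* optional '*type' member */
    GuestNetworkInterfaceTypeSketch type;
} GuestNetworkInterfaceFlagsSketch;

static GuestNetworkInterfaceFlagsSketch decode_ifa_flags(unsigned int flags)
{
    GuestNetworkInterfaceFlagsSketch f = { 0 };

    f.up = (flags & IFF_UP) != 0;
    f.loopback = (flags & IFF_LOOPBACK) != 0;
    f.promisc = (flags & IFF_PROMISC) != 0;
    f.multicast = (flags & IFF_MULTICAST) != 0;

    /* type only describes dest-address, so it is optional for the same
     * reason: with neither flag set there is nothing to describe. */
    if (flags & IFF_BROADCAST) {
        f.has_type = true;
        f.type = GUEST_NETWORK_INTERFACE_TYPE_BROADCAST;
    } else if (flags & IFF_POINTOPOINT) {
        f.has_type = true;
        f.type = GUEST_NETWORK_INTERFACE_TYPE_PPP;
    }
    return f;
}
```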

 
 '*hardware-address': 'str',
 '*ip-addresses': ['GuestIpAddress'] } }
  

 

Michal



Re: [Qemu-devel] Fwd: Problem booting 32 bit guest on 64 bit host using kvm

2013-01-02 Thread Gleb Natapov
On Wed, Jan 02, 2013 at 05:35:44PM +1000, Mark Blakeney wrote:
 Ubuntu 12.04 (precise) kernel is 3.2.0.35.40. Too hard to downgrade kernel.
 
So, as far as I understand, you moved from a 3.2.0 32-bit kernel to a
3.5.0/3.7.0 64-bit kernel and things stopped working. It is hard to
conclude from that that this is a 32 vs 64 bit problem. Can you compile
a 3.2 64-bit kernel and try it?

 I should mention that I installed the current qemu 1.3.0 from source, but
 it made no difference. I also had installed 1.3.0 from source on my older
 ubuntu system (trying to fix a problem which I later solved another
 way) and it worked fine. So I don't think this is an issue with
 older/later versions of qemu.
 
 Another very odd thing is that about one in 40 attempts the image will
 boot in kvm. Completely random though it seems. Always boots if I add
 -no-kvm.
 
 --
 Mark Blakeney.

--
Gleb.



Re: [Qemu-devel] [PATCH] Change to correct PowerPC on a 64bit host

2013-01-02 Thread Andreas Färber
Hello,

On 02.01.2013 05:58, Samuel Seay wrote:
 Attached is a patch for fixing bug #1052857. My local tests show it
 working properly on 32 and 64bit.
 
 Signed-off-by: Samuel Seay lightnin...@gmail.com

Please submit patches using git-send-email tool so that they arrive
text-only and inline and can be commented on by reviewers.

The commit message should describe what you are changing and why, not
that you're changing something and it happens to fix one bug. ;-)

Also you should cc the linux-user maintainer, the ppc maintainer and
qemu-...@nongnu.org (see the MAINTAINERS file), and make the commit message
subject start with "linux-user: " to clarify what the patch is about and
who needs to look at it.

http://wiki.qemu.org/Contribute/SubmitAPatch

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH] Change to correct PowerPC on a 64bit host

2013-01-02 Thread Peter Maydell
On 2 January 2013 04:58, Samuel Seay lightnin...@gmail.com wrote:
 Attached is a patch for fixing bug #1052857. My local tests show it working
 properly on 32 and 64bit.

--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -4584,7 +4584,7 @@ static void setup_frame(int sig, struct target_sigaction *ka,
 
     signal = current_exec_domain_sig(sig);
 
-    err |= __put_user(h2g(ka->_sa_handler), &sc->handler);
+    err |= __put_user(ka->_sa_handler, &sc->handler);
     err |= __put_user(set->sig[0], &sc->oldmask);
 #if defined(TARGET_PPC64)
     err |= __put_user(set->sig[0] >> 32, &sc->_unused[3]);

This looks OK...


@@ -4606,8 +4606,6 @@ static void setup_frame(int sig, struct target_sigaction *ka,
 
     /* Create a stack frame for the caller of the handler.  */
     newsp = frame_addr - SIGNAL_FRAMESIZE;
-    err |= __put_user(env->gpr[1], (target_ulong *)(uintptr_t) newsp);
-
     if (err)
         goto sigsegv;

...but this bit doesn't. We need to save the old SP to the stack frame,
and your patch just skips this step. You're right that the line in question
is broken though; it has two problems:
 * it's using newsp (a guest address) as an argument to __put_user(),
   which wants a host address
 * it's using __put_user() which works on locked addresses, but newsp
   is below the area we locked with lock_user_struct earlier
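Below is a self-contained sketch of the intended behaviour. The old SP is saved through an accessor that takes a guest address and can fail, instead of dereferencing newsp as if it were a host pointer. Guest memory is simulated with a byte array. guest_range(), put_guest_u32(), PPCStateSketch and the SIGNAL_FRAMESIZE value are hypothetical stand-ins, not QEMU's linux-user helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_RAM_BASE   0x10000u  /* hypothetical guest address of guest_ram[0] */
#define GUEST_RAM_SIZE   4096u
#define SIGNAL_FRAMESIZE 64u       /* illustrative value, not the real ABI constant */

static uint8_t guest_ram[GUEST_RAM_SIZE];

typedef struct {
    uint32_t gpr[32];
} PPCStateSketch;

/* Host pointer for [gaddr, gaddr + len), or NULL if that is not guest RAM. */
static uint8_t *guest_range(uint32_t gaddr, size_t len)
{
    if (len > GUEST_RAM_SIZE || gaddr < GUEST_RAM_BASE ||
        gaddr - GUEST_RAM_BASE > GUEST_RAM_SIZE - len) {
        return NULL;
    }
    return &guest_ram[gaddr - GUEST_RAM_BASE];
}

/* Store big-endian (PPC byte order) at a guest address; -1 means "fault". */
static int put_guest_u32(uint32_t val, uint32_t gaddr)
{
    uint8_t *p = guest_range(gaddr, 4);

    if (p == NULL) {
        return -1;
    }
    p[0] = (uint8_t)(val >> 24);
    p[1] = (uint8_t)(val >> 16);
    p[2] = (uint8_t)(val >> 8);
    p[3] = (uint8_t)val;
    return 0;
}

static uint32_t get_guest_u32(uint32_t gaddr)
{
    const uint8_t *p = guest_range(gaddr, 4);

    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Save the old SP as the back chain of the new frame, then switch to it.
 * Register state is left untouched if the store faults. */
static int setup_caller_frame(PPCStateSketch *env, uint32_t frame_addr)
{
    uint32_t newsp = frame_addr - SIGNAL_FRAMESIZE;

    if (put_guest_u32(env->gpr[1], newsp) != 0) {
        return -1;               /* real code would goto sigsegv */
    }
    env->gpr[1] = newsp;
    return 0;
}
```

The design point is ordering and fault handling. The store happens before gpr[1] changes, and a bad address leaves the registers alone so that the caller can deliver SIGSEGV.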

Another dodgy line in this function:
env->gpr[4] = (target_ulong) h2g(sc);
Since sc is an offset into the struct returned by lock_user_struct(),
if DEBUG_REMAP is defined then we're passing the guest a pointer
to memory that is free()d by unlock_user_struct(). This should probably
be setting gpr[4] to frame_addr + offsetof(something) instead.
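The suggested computation can be sketched as follows. Hypothetical, simplified layouts stand in for the real PPC frame structures, which are not reproduced here. The guest address of the sigcontext is derived from frame_addr with offsetof(), so it stays valid whether or not lock_user_struct() handed back a bounce buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layouts; not the real PPC definitions. */
struct sigcontext_sketch {
    uint32_t unused[4];
    int32_t  signal;
    uint32_t handler;
    uint32_t oldmask;
    uint32_t regs;
};

struct sigframe_sketch {
    uint32_t header[8];               /* placeholder leading member */
    struct sigcontext_sketch sctx;
};

/* Guest address of the sigcontext, computed from the guest frame address
 * rather than by translating a host pointer into the locked copy. */
static uint32_t sigcontext_guest_addr(uint32_t frame_addr)
{
    return frame_addr + (uint32_t)offsetof(struct sigframe_sketch, sctx);
}
```

In setup_frame() this would correspond to something like env->gpr[4] = frame_addr + offsetof(<frame struct>, <sigcontext member>). This assumes that sc points at the sigcontext member of the locked frame.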

-- PMM



Re: [Qemu-devel] Fwd: Problem booting 32 bit guest on 64 bit host using kvm

2013-01-02 Thread Mark Blakeney
FYI, I just discovered that I can make my Solaris guest boot every
time in kvm by specifying an interactive boot at the boot prompt and
then just hand stepping through the default prompts. Presumably there
is a timing issue in the guest boot sequence which kvm is exposing
when running natively on my new current gen cpu + ssd box. It seems
slowing down the boot artificially by hand stepping, or by running
with -no-kvm, or by running on my older hardware + hdd, avoids this.

So sorry but this seems likely a bug in the old Solaris 2.5.1 guest OS
when running on modern fast hardware(?). Now I know about this
interactive boot option I can just use it to boot each time. There are
only 3 quick prompts so it is not really a bother.

The guest runs fine in kvm after booting so this speed related bug is
only during the initial boot sequence.

Thanks for your help.



Re: [Qemu-devel] [PATCH 3/8] libqemustub: vmstate register/unregister stubs

2013-01-02 Thread Andreas Färber
On 05.12.2012 17:49, Eduardo Habkost wrote:
 diff --git a/stubs/vmstate.c b/stubs/vmstate.c
 new file mode 100644
 index 000..badf79e
 --- /dev/null
 +++ b/stubs/vmstate.c
 @@ -0,0 +1,17 @@
 +#include "qemu-common.h"
 +#include "vmstate.h"

Needed to update this to migration/vmstate.h since Paolo's header file
reorganization.

Andreas




Re: [Qemu-devel] Fwd: Problem booting 32 bit guest on 64 bit host using kvm

2013-01-02 Thread Gleb Natapov
On Wed, Jan 02, 2013 at 10:26:55PM +1000, Mark Blakeney wrote:
 FYI, I just discovered that I can make my Solaris guest boot every
 time in kvm by specifying an interactive boot at the boot prompt and
 then just hand stepping through the default prompts. Presumably there
 is a timing issue in the guest boot sequence which kvm is exposing
 when running natively on my new current gen cpu + ssd box. It seems
 slowing down the boot artificially by hand stepping, or by running
 with -no-kvm, or by running on my older hardware + hdd, avoids this.
 
Interesting. Thanks for the update, and please report back if you find
something new.

 So sorry but this seems likely a bug in the old Solaris 2.5.1 guest OS
 when running on modern fast hardware(?). Now I know about this
 interactive boot option I can just use it to boot each time. There are
 only 3 quick prompts so it is not really a bother.
 
 The guest runs fine in kvm after booting so this speed related bug is
 only during the initial boot sequence.
 
 Thanks for your help.

--
Gleb.



Re: [Qemu-devel] [PATCH] Change to correct PowerPC on a 64bit host

2013-01-02 Thread Samuel Seay
The VM I did the work in doesn't have internet access and I was unsure how
to send a text-only email with Gmail. With that said, the line that I removed
(the store of env->gpr[1]) is redundant, as a few lines below in the original
source gpr[1] is set to newsp. The removed line would segfault because it
tries to write the value of env->gpr[1] to newsp, which is not a valid host
address.

I cannot speak to the h2g(sc) line a bit further down.

Samuel




[Qemu-devel] [PATCH 3/3] vnc: fix possible uninitialized removals

2013-01-02 Thread Tim Hardeck
Some VncState values are not initialized before the Websocket handshake.
If the handshake fails, QEMU segfaults during cleanup. To prevent this,
initialization checks are added.

Signed-off-by: Tim Hardeck thard...@suse.de
---
 ui/vnc.c |   11 ---
 ui/vnc.h |1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index ee08894..ff4e2ae 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -1053,20 +1053,24 @@ void vnc_disconnect_finish(VncState *vs)
     audio_del(vs);
     vnc_release_modifiers(vs);
 
-    QTAILQ_REMOVE(&vs->vd->clients, vs, next);
+    if (vs->initialized) {
+        QTAILQ_REMOVE(&vs->vd->clients, vs, next);
+        qemu_remove_mouse_mode_change_notifier(&vs->mouse_mode_notifier);
+    }
 
     if (QTAILQ_EMPTY(&vs->vd->clients)) {
         dcl->idle = 1;
     }
 
-    qemu_remove_mouse_mode_change_notifier(&vs->mouse_mode_notifier);
     vnc_remove_timer(vs->vd);
     if (vs->vd->lock_key_sync)
         qemu_remove_led_event_handler(vs->led);
     vnc_unlock_output(vs);
 
     qemu_mutex_destroy(&vs->output_mutex);
-    qemu_bh_delete(vs->bh);
+    if (vs->bh != NULL) {
+        qemu_bh_delete(vs->bh);
+    }
     buffer_free(&vs->jobs_buffer);
 
     for (i = 0; i < VNC_STAT_ROWS; ++i) {
@@ -2749,6 +2753,7 @@ static void vnc_connect(VncDisplay *vd, int csock, int skipauth, bool websocket)
 
 void vnc_init_state(VncState *vs)
 {
+    vs->initialized = true;
     VncDisplay *vd = vs->vd;
 
     vs->ds = vd->ds;
diff --git a/ui/vnc.h b/ui/vnc.h
index f93c89a..45d7686 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -306,6 +306,7 @@ struct VncState
 QEMUPutLEDEntry *led;
 
 bool abort;
+bool initialized;
 QemuMutex output_mutex;
 QEMUBH *bh;
 Buffer jobs_buffer;
-- 
1.7.10.4
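Here is the idea behind the patch, reduced to a standalone sketch. A client is unlinked from the display's list only if it was ever linked, and the bottom half is deleted only if it was created. The types are hypothetical, and the <sys/queue.h> TAILQ macros stand in for QEMU's QTAILQ:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/queue.h>

typedef struct ClientSketch {
    bool initialized;                  /* set only once fully registered */
    void *bh;                          /* stand-in for QEMUBH *, may be NULL */
    TAILQ_ENTRY(ClientSketch) next;
} ClientSketch;

typedef struct {
    TAILQ_HEAD(, ClientSketch) clients;
    int idle;
} DisplaySketch;

/* Counterpart of vnc_init_state(): only reached after a good handshake. */
static void client_init_state(DisplaySketch *vd, ClientSketch *vs)
{
    vs->bh = malloc(1);                /* pretend bottom half */
    TAILQ_INSERT_TAIL(&vd->clients, vs, next);
    vs->initialized = true;
}

/* Counterpart of vnc_disconnect_finish(): safe for half-set-up clients. */
static void client_disconnect_finish(DisplaySketch *vd, ClientSketch *vs)
{
    if (vs->initialized) {
        TAILQ_REMOVE(&vd->clients, vs, next);
    }
    if (TAILQ_EMPTY(&vd->clients)) {
        vd->idle = 1;
    }
    /* Mirrors the patch's NULL check; free(NULL) alone would be harmless,
     * but a delete helper that dereferences its argument is not. */
    if (vs->bh != NULL) {
        free(vs->bh);
    }
    free(vs);
}
```

Without the initialized check, a client whose handshake failed would hit TAILQ_REMOVE on an element whose link pointers were never set.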




[Qemu-devel] [PATCH 1/3] vnc: added buffer_advance function

2013-01-02 Thread Tim Hardeck
Following Anthony Liguori's Websocket implementation I have added the
buffer_advance function to VNC and replaced all related buffer memmove
operations with it.

Signed-off-by: Tim Hardeck thard...@suse.de
---
 ui/vnc.c |   13 +
 ui/vnc.h |1 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index 8912b78..ddf01f1 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -510,6 +510,13 @@ void buffer_append(Buffer *buffer, const void *data, size_t len)
     buffer->offset += len;
 }
 
+void buffer_advance(Buffer *buf, size_t len)
+{
+    memmove(buf->buffer, buf->buffer + len,
+            (buf->offset - len));
+    buf->offset -= len;
+}
+
 static void vnc_desktop_resize(VncState *vs)
 {
 DisplayState *ds = vs-ds;
@@ -1166,8 +1173,7 @@ static long vnc_client_write_plain(VncState *vs)
     if (!ret)
         return 0;
 
-    memmove(vs->output.buffer, vs->output.buffer + ret, (vs->output.offset - ret));
-    vs->output.offset -= ret;
+    buffer_advance(&vs->output, ret);
 
     if (vs->output.offset == 0) {
         qemu_set_fd_handler2(vs->csock, NULL, vnc_client_read, NULL, vs);
@@ -1313,8 +1319,7 @@ void vnc_client_read(void *opaque)
 }
 
         if (!ret) {
-            memmove(vs->input.buffer, vs->input.buffer + len, (vs->input.offset - len));
-            vs->input.offset -= len;
+            buffer_advance(&vs->input, len);
         } else {
             vs->read_handler_expect = ret;
         }
diff --git a/ui/vnc.h b/ui/vnc.h
index 8b40f09..5059cbe 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -510,6 +510,7 @@ void buffer_reserve(Buffer *buffer, size_t len);
 void buffer_reset(Buffer *buffer);
 void buffer_free(Buffer *buffer);
 void buffer_append(Buffer *buffer, const void *data, size_t len);
+void buffer_advance(Buffer *buf, size_t len);
 
 
 /* Misc helpers */
-- 
1.7.10.4
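Here is a standalone sketch of what buffer_advance() does, using a hypothetical BufferSketch in place of the VNC Buffer type. Unlike the function in the patch, it asserts that len does not exceed the queued data, because an oversized len would make offset - len wrap around:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t capacity;
    size_t offset;            /* number of bytes currently queued */
    unsigned char *buffer;
} BufferSketch;

static void buffer_append_sketch(BufferSketch *b, const void *data, size_t len)
{
    if (b->capacity - b->offset < len) {
        size_t cap = b->capacity ? b->capacity : 64;
        while (cap - b->offset < len) {
            cap *= 2;
        }
        unsigned char *p = realloc(b->buffer, cap);
        assert(p != NULL);
        b->buffer = p;
        b->capacity = cap;
    }
    memcpy(b->buffer + b->offset, data, len);
    b->offset += len;
}

/* Drop @len consumed bytes from the front and shift the rest down. */
static void buffer_advance_sketch(BufferSketch *b, size_t len)
{
    assert(len <= b->offset);
    memmove(b->buffer, b->buffer + len, b->offset - len);
    b->offset -= len;
}
```

Both call sites in the patch follow the same pattern: consume ret or len bytes from the front, then keep whatever remains for the next read or write.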




Re: [Qemu-devel] [PATCH 1/8] Move -I$(SRC_PATH)/include compiler flag to Makefile.objs

2013-01-02 Thread Andreas Färber
On 14.12.2012 18:21, Eduardo Habkost wrote:
 On Fri, Dec 14, 2012 at 04:34:29PM +0100, Andreas Färber wrote:
 Waiting for ack or nack from Paolo here. I am expecting some overlap
 with his header file reorganization series.

 My previous (unanswered?) question was why you are moving vl.o lines in
 addition to the QEMU_CFLAGS lines that you mention in the commit message.
 
 
 I thought this note in the commit message would answer the question:
 
 This also moves the existing CFLAGS lines from Makefile.objs at the
 beginning of the file, to keep them all in the same place.
 
 In other words: it's cosmetic, just to keep all the QEMU_CFLAGS lines
 inside Makefile.objs grouped in a visible place at the beginning of the
 file.
 
 (You noticed that I am moving the vl.o lines _inside_ Makefile.objs,
 right? They are not being moved between different files.)

Nah, you caught me there, must've misread that on a previous submission
(or it changed or whatever).

Anyway, Paolo's header reorganization was pulled by now, so this patch
no longer seems necessary, series compiles without. Please shout if I'm
misreading this!

Andreas




[Qemu-devel] [PATCH 0/3 v5] vnc: added initial websocket protocol support

2013-01-02 Thread Tim Hardeck
This patch set adds basic support for the Websocket Protocol version 13
(RFC 6455) to QEMU VNC. Binary encoding support on the client side is
mandatory.

Because of the GnuTLS requirement, the Websockets implementation is
optional (--enable-vnc-ws).

To activate Websocket support, use the VNC option websocket, for
example -vnc :0,websocket.
The listen port for Websocket connections is (5700 + display), so if
QEMU VNC is started with :0 the Websocket port is 5700.
Alternatively, the Websocket port can be specified manually by
using ,websocket=port.
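The port rule above fits in a tiny hypothetical helper. The function names, and the use of a negative value to mean "no explicit port", are assumptions made for illustration. The 5900 + display default for plain VNC is the usual convention:

```c
#include <assert.h>

#define VNC_BASE_PORT    5900  /* usual VNC convention: 5900 + display */
#define VNC_WS_BASE_PORT 5700  /* per this series: 5700 + display */

/* explicit_port < 0 means plain ",websocket" without "=port". */
static int vnc_ws_listen_port(int display, int explicit_port)
{
    assert(display >= 0);
    return explicit_port >= 0 ? explicit_port : VNC_WS_BASE_PORT + display;
}

static int vnc_listen_port(int display)
{
    return VNC_BASE_PORT + display;
}
```

With -vnc :0,websocket, a client would therefore use port 5900 for plain VNC and 5700 for Websocket.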

Changes v2
* removed automatic websocket recognition
* added new lwebsock socket on port 5700 + display when the vnc option
  websocket is passed
* adapted vnc_connect and vnc_listen_read to distinguish websocket
  connections
* added separate event handler to read the Websocket handshake

Changes v3
* added manual port specification by using ,websocket=port
* switched from memmem() to g_strstr_len()
* removed masked_size from vncws_decode_frame()
* reset the vnc_tls variable to its default in the configure script

Changes v4
* incorporated suggestions from Stefan Hajnoczi
* moved websockets encoding from vnc_write to its own client_write function
* moved websockets decoding to its own client_read function
* added initialization checks to vnc_disconnect to prevent crashes if a
  regular client connects to the websocket port

Changes v5
* added initialized variable to VncState to prevent crashes during
  vnc_disconnect; the previously added initialization checks didn't
  prevent segfaults when a websocket client was connected
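For readers unfamiliar with the wire format, here is a sketch of client frame decoding per RFC 6455 section 5.2 (base framing). It is an independent illustration, not the series' vncws_decode_frame(), and the names are hypothetical. Client-to-server frames must be masked, and the 7-bit length field escapes to a 16-bit (126) or 64-bit (127) extended length:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int fin;               /* last fragment of a message */
    int opcode;            /* 0x1 text, 0x2 binary, 0x8 close, ... */
    uint64_t payload_len;
    size_t header_len;     /* bytes preceding the payload */
    uint8_t mask[4];       /* client-to-server masking key */
} WsFrameHeader;

/* Parse a client frame header.
 * Returns 1 when complete, 0 when more bytes are needed, -1 on error. */
static int ws_parse_header(const uint8_t *buf, size_t len, WsFrameHeader *h)
{
    size_t pos = 2;

    if (len < 2) {
        return 0;
    }
    h->fin = (buf[0] & 0x80) != 0;
    h->opcode = buf[0] & 0x0f;
    if (!(buf[1] & 0x80)) {
        return -1;                         /* client frames must be masked */
    }
    h->payload_len = buf[1] & 0x7f;
    if (h->payload_len == 126) {           /* 16-bit extended length */
        if (len < pos + 2) {
            return 0;
        }
        h->payload_len = ((uint64_t)buf[2] << 8) | buf[3];
        pos += 2;
    } else if (h->payload_len == 127) {    /* 64-bit extended length */
        if (len < pos + 8) {
            return 0;
        }
        h->payload_len = 0;
        for (int i = 0; i < 8; i++) {
            h->payload_len = (h->payload_len << 8) | buf[pos + i];
        }
        if (h->payload_len >> 63) {
            return -1;                     /* most significant bit must be 0 */
        }
        pos += 8;
    }
    if (len < pos + 4) {
        return 0;
    }
    memcpy(h->mask, buf + pos, 4);
    h->header_len = pos + 4;
    return 1;
}

/* XOR the payload in place with the 4-byte masking key. */
static void ws_unmask(uint8_t *payload, uint64_t len, const uint8_t mask[4])
{
    for (uint64_t i = 0; i < len; i++) {
        payload[i] ^= mask[i % 4];
    }
}
```

As a check, the masked "Hello" example from RFC 6455 section 5.7 (bytes 0x81 0x85 0x37 0xfa 0x21 0x3d 0x7f 0x9f 0x4d 0x51 0x58) parses to a 6-byte header with a 5-byte text payload, and the payload unmasks to "Hello".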

Tim Hardeck (3):
  vnc: added buffer_advance function
  vnc: added initial websocket protocol support
  vnc: fix possible uninitialized removals

 configure|   27 +-
 qemu-options.hx  |8 ++
 ui/Makefile.objs |1 +
 ui/vnc-ws.c  |  282 ++
 ui/vnc-ws.h  |   92 ++
 ui/vnc.c |  211 +++-
 ui/vnc.h |   21 
 7 files changed, 614 insertions(+), 28 deletions(-)
 create mode 100644 ui/vnc-ws.c
 create mode 100644 ui/vnc-ws.h

-- 
1.7.10.4




Re: [Qemu-devel] [PATCH] Change to correct PowerPC on a 64bit host

2013-01-02 Thread Peter Maydell
On 2 January 2013 13:01, Samuel Seay lightnin...@gmail.com wrote:
 The VM I did the work in doesn't have internet access and I was unsure how
 to send a text-only email with Gmail. With that said, the line that I removed
 (the store of env->gpr[1]) is redundant, as a few lines below in the original
 source gpr[1] is set to newsp. The removed line would segfault because it
 tries to write the value of env->gpr[1] to newsp, which is not a valid host
 address.

No, it's not redundant -- we must save the old value of gpr[1], exactly
because we are about to change it (set it to newsp). The code is trying
to do the right thing (copy the old env-gpr[1] value into the guest
stack frame it is setting up) but in a broken way, so it must be fixed,
not just removed.

-- PMM



Re: [Qemu-devel] [RFC PATCH V8 01/15] qdev : add a maximum device allowed field for the bus.

2013-01-02 Thread Anthony Liguori
fred.kon...@greensocs.com writes:

 From: KONRAD Frederic fred.kon...@greensocs.com

 Add a max_dev field to BusState to specify the maximum number of devices
 allowed on the bus (has no effect if max_dev=0).

 Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
 ---
  hw/qdev-core.h|  2 ++
  hw/qdev-monitor.c | 11 +++
  2 files changed, 13 insertions(+)

 diff --git a/hw/qdev-core.h b/hw/qdev-core.h
 index d672cca..af909b9 100644
 --- a/hw/qdev-core.h
 +++ b/hw/qdev-core.h
 @@ -104,6 +104,8 @@ struct BusState {
  const char *name;
  int allow_hotplug;
  int max_index;
 +/* maximum devices allowed on the bus, 0 : no limit. */
 +int max_dev;
  QTAILQ_HEAD(ChildrenHead, BusChild) children;
  QLIST_ENTRY(BusState) sibling;
  };
 diff --git a/hw/qdev-monitor.c b/hw/qdev-monitor.c
 index a1b4d6a..7a9d275 100644
 --- a/hw/qdev-monitor.c
 +++ b/hw/qdev-monitor.c
 @@ -292,6 +292,17 @@ static BusState *qbus_find_recursive(BusState *bus, const char *name,
      if (bus_typename && !object_dynamic_cast(OBJECT(bus), bus_typename)) {
          match = 0;
      }
 +    if ((bus->max_dev != 0) && (bus->max_dev <= bus->max_index)) {
 +        if (name != NULL) {
 +            /* bus was explicitly specified : return an error. */
 +            qerror_report(ERROR_CLASS_GENERIC_ERROR, "Bus '%s' is full",
 +                          bus->name);
 +            return NULL;
 +        } else {
 +            /* bus was not specified : try to find another one. */
 +            match = 0;
 +        }
 +    }
      if (match) {
          return bus;
      }

Nice change, but I wonder if this should be a class property instead of
an object property?  Would different objects of the same class ever set
this differently?
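One way to read the suggestion is sketched below with hypothetical types. max_dev lives in the class, so every bus of a type shares it. The lookup mirrors the patch's logic: a full bus that was named explicitly is an error, and otherwise the search moves on. This is a simplification of qbus_find_recursive(), not a drop-in replacement:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *type_name;
    int max_dev;                    /* 0: no limit; shared by the whole type */
} BusClassSketch;

typedef struct {
    const BusClassSketch *klass;
    const char *name;
    int max_index;                  /* children added so far */
} BusSketch;

static bool bus_is_full(const BusSketch *bus)
{
    int max_dev = bus->klass->max_dev;

    return max_dev != 0 && max_dev <= bus->max_index;
}

static const BusSketch *pick_bus(const BusSketch *buses, size_t n,
                                 const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (name != NULL && strcmp(buses[i].name, name) != 0) {
            continue;
        }
        if (bus_is_full(&buses[i])) {
            if (name != NULL) {
                return NULL;        /* caller reports "Bus '%s' is full" */
            }
            continue;               /* not specified: try another bus */
        }
        return &buses[i];
    }
    return NULL;
}
```

If no object of a given bus type ever needs a different limit, the class field avoids setting qbus->max_dev by hand in each constructor such as virtio_pci_bus_new().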

Regards,

Anthony Liguori

 -- 
 1.7.11.7




Re: [Qemu-devel] [RFC PATCH V8 02/15] virtio-bus : Introduce virtio-bus

2013-01-02 Thread Anthony Liguori
fred.kon...@greensocs.com writes:

 From: KONRAD Frederic fred.kon...@greensocs.com

 Introduce virtio-bus. Refactored transport device will create a bus which
 extends virtio-bus.

 Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
 ---
  hw/Makefile.objs |   1 +
  hw/virtio-bus.c  | 169 +++
  hw/virtio-bus.h  |  98 
  3 files changed, 268 insertions(+)
  create mode 100644 hw/virtio-bus.c
  create mode 100644 hw/virtio-bus.h

 diff --git a/hw/Makefile.objs b/hw/Makefile.objs
 index d581d8d..6fa4de4 100644
 --- a/hw/Makefile.objs
 +++ b/hw/Makefile.objs
 @@ -3,6 +3,7 @@ common-obj-y += loader.o
  common-obj-$(CONFIG_VIRTIO) += virtio-console.o
  common-obj-$(CONFIG_VIRTIO) += virtio-rng.o
  common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 +common-obj-$(CONFIG_VIRTIO) += virtio-bus.o
  common-obj-y += fw_cfg.o
  common-obj-$(CONFIG_PCI) += pci.o pci_bridge.o pci_bridge_dev.o
  common-obj-$(CONFIG_PCI) += msix.o msi.o
 diff --git a/hw/virtio-bus.c b/hw/virtio-bus.c
 new file mode 100644
 index 000..7a3d06e
 --- /dev/null
 +++ b/hw/virtio-bus.c
 @@ -0,0 +1,169 @@
 +/*
 + * VirtioBus
 + *
 + *  Copyright (C) 2012 : GreenSocs Ltd
 + *  http://www.greensocs.com/ , email: i...@greensocs.com
 + *
 + *  Developed by :
 + *  Frederic Konrad   fred.kon...@greensocs.com
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License as published by
 + * the Free Software Foundation, either version 2 of the License, or
 + * (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License along
 + * with this program; if not, see http://www.gnu.org/licenses/.
 + *
 + */
 +
 +#include "hw.h"
 +#include "qemu-error.h"
 +#include "qdev.h"
 +#include "virtio-bus.h"
 +#include "virtio.h"
 +
 +/* #define DEBUG_VIRTIO_BUS */
 +
 +#ifdef DEBUG_VIRTIO_BUS
 +#define DPRINTF(fmt, ...) \
 +do { printf("virtio_bus: " fmt , ## __VA_ARGS__); } while (0)
 +#else
 +#define DPRINTF(fmt, ...) do { } while (0)
 +#endif
 +
 +/* Plug the VirtIODevice */
 +int virtio_bus_plug_device(VirtIODevice *vdev)
 +{
 +    DeviceState *qdev = DEVICE(vdev);
 +    BusState *qbus = BUS(qdev_get_parent_bus(qdev));
 +    VirtioBusState *bus = VIRTIO_BUS(qbus);
 +    VirtioBusClass *klass = VIRTIO_BUS_GET_CLASS(bus);
 +    DPRINTF("%s : plug device.\n", qbus->name);
 +
 +    bus->vdev = vdev;
 +
 +    if (klass->device_plugged != NULL) {
 +        klass->device_plugged(qbus->parent);
 +    }
 +
 +    /*
 +     * The lines below will disappear when we drop VirtIOBindings, at the end
 +     * of the serie.

s/serie/series/g

 +     */
 +    bus->bindings.notify = klass->notify;
 +    bus->bindings.save_config = klass->save_config;
 +    bus->bindings.save_queue = klass->save_queue;
 +    bus->bindings.load_config = klass->load_config;
 +    bus->bindings.load_queue = klass->load_queue;
 +    bus->bindings.load_done = klass->load_done;
 +    bus->bindings.get_features = klass->get_features;
 +    bus->bindings.query_guest_notifiers = klass->query_guest_notifiers;
 +    bus->bindings.set_guest_notifiers = klass->set_guest_notifiers;
 +    bus->bindings.set_host_notifier = klass->set_host_notifier;
 +    bus->bindings.vmstate_change = klass->vmstate_change;
 +    virtio_bind_device(bus->vdev, &(bus->bindings), qbus->parent);
 +    /*
 +     */

No need for the empty comment or the parens around bus->bindings.

 +
 +    return 0;
 +}
 +
 +/* Reset the virtio_bus */
 +void virtio_bus_reset(VirtioBusState *bus)
 +{
 +    DPRINTF("%s : reset device.\n", qbus->name);
 +    if (bus->vdev != NULL) {
 +        virtio_reset(bus->vdev);
 +    }
 +}
 +
 +/* Destroy the VirtIODevice */
 +void virtio_bus_destroy_device(VirtioBusState *bus)
 +{
 +    DeviceState *qdev;
 +    BusState *qbus = BUS(bus);
 +    VirtioBusClass *klass = VIRTIO_BUS_GET_CLASS(bus);
 +    DPRINTF("%s : remove device.\n", qbus->name);
 +
 +    if (bus->vdev != NULL) {
 +        if (klass->device_unplug != NULL) {
 +            klass->device_unplug(qbus->parent);
 +        }
 +        qdev = DEVICE(bus->vdev);
 +        qdev_free(qdev);
 +        bus->vdev = NULL;
 +    }
 +}
 +
 +/* Get the device id of the plugged device. */
 +uint16_t get_virtio_device_id(VirtioBusState *bus)
 +{
 +    assert(bus->vdev != NULL);
 +    return bus->vdev->device_id;
 +}
 +
 +/* Get the nvectors field of the plugged device. */
 +int get_virtio_device_nvectors(VirtioBusState *bus)
 +{
 +    assert(bus->vdev != NULL);
 +    return bus->vdev->nvectors;
 +}
 +
 +/* Set the nvectors field of the plugged device. */
 +void set_virtio_device_nvectors(VirtioBusState *bus, int nvectors)
 +{
 +    assert(bus->vdev != NULL);
 +

Re: [Qemu-devel] [RFC PATCH V8 03/15] virtio-pci-bus : Introduce virtio-pci-bus.

2013-01-02 Thread Anthony Liguori
fred.kon...@greensocs.com writes:

 From: KONRAD Frederic fred.kon...@greensocs.com

 Introduce virtio-pci-bus, which extends virtio-bus. It is used with virtio-pci
 transport device.

 Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com

Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori
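This patch relies on a common pattern, shown here in a reduced and hypothetical form. A generic bus class exposes hook pointers, the transport-specific bus type fills them in its class_init, and the generic plug path calls back into the transport. The names below are illustrative stand-ins for VirtioBusClass, virtio_pci_bus_class_init() and virtio_bus_plug_device():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for VirtioBusClass: generic hooks that each
 * transport bus type fills in. */
typedef struct {
    void (*device_plugged)(void *transport);
    void (*notify)(void *transport, unsigned vector);
} VirtioBusClassSketch;

typedef struct {
    const VirtioBusClassSketch *klass;
    void *transport;   /* proxy owning the bus, like qbus->parent */
    void *vdev;        /* the single plugged device, or NULL */
} VirtioBusSketch;

/* Hypothetical transport state standing in for VirtIOPCIProxy. */
typedef struct {
    int plugged_calls;
    unsigned last_vector;
} PciProxySketch;

static void pci_device_plugged(void *transport)
{
    ((PciProxySketch *)transport)->plugged_calls++;
}

static void pci_notify(void *transport, unsigned vector)
{
    ((PciProxySketch *)transport)->last_vector = vector;
}

/* What a virtio-pci-bus class_init would set up. */
static const VirtioBusClassSketch virtio_pci_bus_class_sketch = {
    .device_plugged = pci_device_plugged,
    .notify         = pci_notify,
};

/* Generic plug path: record the device, then let the transport react.
 * The one-device limit mirrors max_dev = 1 for virtio-pci. */
static int virtio_bus_plug_sketch(VirtioBusSketch *bus, void *vdev)
{
    if (bus->vdev != NULL) {
        return -1;
    }
    bus->vdev = vdev;
    if (bus->klass->device_plugged != NULL) {
        bus->klass->device_plugged(bus->transport);
    }
    return 0;
}
```

Keeping the hooks in one const class table means every virtio-pci bus shares the same callbacks. The generic code never needs to know which transport it is talking to.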

 ---
  hw/virtio-pci.c | 37 +
  hw/virtio-pci.h | 19 +--
  2 files changed, 54 insertions(+), 2 deletions(-)

 diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
 index 7684ac9..859a1ed 100644
 --- a/hw/virtio-pci.c
 +++ b/hw/virtio-pci.c
 @@ -32,6 +32,7 @@
  #include "blockdev.h"
  #include "virtio-pci.h"
  #include "range.h"
 +#include "virtio-bus.h"
  
  /* from Linux's linux/virtio_pci.h */
  
 @@ -1117,6 +1118,41 @@ static TypeInfo virtio_scsi_info = {
      .class_init    = virtio_scsi_class_init,
  };
  
 +/* virtio-pci-bus */
 +
 +VirtioBusState *virtio_pci_bus_new(VirtIOPCIProxy *dev)
 +{
 +    DeviceState *qdev = DEVICE(dev);
 +    BusState *qbus = qbus_create(TYPE_VIRTIO_PCI_BUS, qdev, NULL);
 +    VirtioBusState *bus = VIRTIO_BUS(qbus);
 +    qbus->allow_hotplug = 0;
 +    /* Only one virtio-device allowed for virtio-pci. */
 +    qbus->max_dev = 1;
 +    return bus;
 +}
 +
 +static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
 +{
 +    VirtioBusClass *k = VIRTIO_BUS_CLASS(klass);
 +    k->notify = virtio_pci_notify;
 +    k->save_config = virtio_pci_save_config;
 +    k->load_config = virtio_pci_load_config;
 +    k->save_queue = virtio_pci_save_queue;
 +    k->load_queue = virtio_pci_load_queue;
 +    k->get_features = virtio_pci_get_features;
 +    k->query_guest_notifiers = virtio_pci_query_guest_notifiers;
 +    k->set_host_notifier = virtio_pci_set_host_notifier;
 +    k->set_guest_notifiers = virtio_pci_set_guest_notifiers;
 +    k->vmstate_change = virtio_pci_vmstate_change;
 +}
 +
 +static const TypeInfo virtio_pci_bus_info = {
 +    .name          = TYPE_VIRTIO_PCI_BUS,
 +    .parent        = TYPE_VIRTIO_BUS,
 +    .instance_size = sizeof(VirtioBusState),
 +    .class_init    = virtio_pci_bus_class_init,
 +};
 +
  static void virtio_pci_register_types(void)
  {
      type_register_static(&virtio_blk_info);
 @@ -1125,6 +1161,7 @@ static void virtio_pci_register_types(void)
      type_register_static(&virtio_balloon_info);
      type_register_static(&virtio_scsi_info);
      type_register_static(&virtio_rng_info);
 +    type_register_static(&virtio_pci_bus_info);
  }
  
  type_init(virtio_pci_register_types)
 diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h
 index b58d9a2..0e3288e 100644
 --- a/hw/virtio-pci.h
 +++ b/hw/virtio-pci.h
 @@ -20,6 +20,21 @@
  #include "virtio-rng.h"
  #include "virtio-serial.h"
  #include "virtio-scsi.h"
 +#include "virtio-bus.h"
 +
 +/* VirtIOPCIProxy will be renammed VirtioPCIState at the end. */
 +typedef struct VirtIOPCIProxy VirtIOPCIProxy;
 +
 +/* virtio-pci-bus */
 +#define TYPE_VIRTIO_PCI_BUS "virtio-pci-bus"
 +#define VIRTIO_PCI_BUS_GET_CLASS(obj) \
 +        OBJECT_GET_CLASS(VirtioBusClass, obj, TYPE_VIRTIO_PCI_BUS)
 +#define VIRTIO_PCI_BUS_CLASS(klass) \
 +        OBJECT_CLASS_CHECK(VirtioBusClass, klass, TYPE_VIRTIO_PCI_BUS)
 +#define VIRTIO_PCI_BUS(obj) \
 +        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_PCI_BUS)
 +
 +VirtioBusState *virtio_pci_bus_new(VirtIOPCIProxy *dev);
  
  /* Performance improves when virtqueue kick processing is decoupled from the
   * vcpu thread using ioeventfd for some devices. */
 @@ -31,7 +46,7 @@ typedef struct {
  unsigned int users;
  } VirtIOIRQFD;
  
 -typedef struct {
 +struct VirtIOPCIProxy {
  PCIDevice pci_dev;
  VirtIODevice *vdev;
  MemoryRegion bar;
 @@ -51,7 +66,7 @@ typedef struct {
  bool ioeventfd_disabled;
  bool ioeventfd_started;
  VirtIOIRQFD *vector_irqfd;
 -} VirtIOPCIProxy;
 +};
  
  void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev);
  void virtio_pci_reset(DeviceState *d);
 -- 
 1.7.11.7




Re: [Qemu-devel] [RFC PATCH V8 04/15] virtio-pci : Refactor virtio-pci device.

2013-01-02 Thread Anthony Liguori
fred.kon...@greensocs.com writes:

 From: KONRAD Frederic fred.kon...@greensocs.com

 Create the virtio-pci device. This transport device will create a
 virtio-pci-bus, so one VirtIODevice can be connected.

 Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
 ---
  hw/virtio-pci.c | 130 
  hw/virtio-pci.h |  19 +
  2 files changed, 149 insertions(+)

 diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
 index 859a1ed..916ed7c 100644
 --- a/hw/virtio-pci.c
 +++ b/hw/virtio-pci.c
 @@ -1118,6 +1118,133 @@ static TypeInfo virtio_scsi_info = {
      .class_init    = virtio_scsi_class_init,
  };
  
 +/*
 + * virtio-pci : This is the PCIDevice which have a virtio-pci-bus.
 + */
 +
 +/* This is called by virtio-bus just after the device is plugged. */
 +static void virtio_pci_device_plugged(void *opaque)
 +{
 +    VirtIOPCIProxy *proxy = VIRTIO_PCI(opaque);
 +    VirtioBusState *bus = proxy->bus;
 +    uint8_t *config;
 +    uint32_t size;
 +
 +    /* Put the PCI IDs */
 +    switch (get_virtio_device_id(proxy->bus)) {
 +
 +
 +    default:
 +        error_report("unknown device id\n");
 +        break;
 +
 +    }
 +
 +    /*
 +     * vdev shouldn't be accessed directly by virtio-pci.
 +     * We will remove that at the end of the series to keep virtio-x-pci
 +     * working.
 +     */
 +    proxy->vdev = proxy->bus->vdev;
 +    /*
 +     */
 +
 +    config = proxy->pci_dev.config;
 +    if (proxy->class_code) {
 +        pci_config_set_class(config, proxy->class_code);
 +    }
 +    pci_set_word(config + PCI_SUBSYSTEM_VENDOR_ID,
 +                 pci_get_word(config + PCI_VENDOR_ID));
 +    pci_set_word(config + PCI_SUBSYSTEM_ID, get_virtio_device_id(proxy->bus));
 +    config[PCI_INTERRUPT_PIN] = 1;
 +
 +    if (get_virtio_device_nvectors(bus) &&
 +        msix_init_exclusive_bar(&proxy->pci_dev,
 +                                get_virtio_device_nvectors(bus), 1)) {
 +        set_virtio_device_nvectors(bus, 0);
 +    }
 +
 +    proxy->pci_dev.config_write = virtio_write_config;
 +
 +    size = VIRTIO_PCI_REGION_SIZE(&proxy->pci_dev)
 +         + get_virtio_device_config_len(bus);
 +    if (size & (size-1)) {
 +        size = 1 << qemu_fls(size);
 +    }
 +
 +    memory_region_init_io(&proxy->bar, &virtio_pci_config_ops, proxy,
 +                          "virtio-pci", size);
 +    pci_register_bar(&proxy->pci_dev, 0, PCI_BASE_ADDRESS_SPACE_IO,
 +                     &proxy->bar);
 +
 +    if (!kvm_has_many_ioeventfds()) {
 +        proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
 +    }
 +
 +    proxy->host_features |= 0x1 << VIRTIO_F_NOTIFY_ON_EMPTY;
 +    proxy->host_features |= 0x1 << VIRTIO_F_BAD_FEATURE;
 +    proxy->host_features = get_virtio_device_features(bus,
 +                                                      proxy->host_features);
 +}
 +
 +/* This is called by virtio-bus just before the device is unplugged. */
 +static void virtio_pci_device_unplug(void *opaque)
 +{
 +VirtIOPCIProxy *dev = VIRTIO_PCI(opaque);
 +virtio_pci_stop_ioeventfd(dev);
 +}
 +
 +static int virtio_pci_init(PCIDevice *pci_dev)
 +{
 +VirtIOPCIProxy *dev = VIRTIO_PCI(pci_dev);
 +VirtioPCIClass *k = VIRTIO_PCI_GET_CLASS(pci_dev);
 +dev-bus = virtio_pci_bus_new(dev);
 +if (k-init != NULL) {
 +return k-init(dev);
 +}
 +return 0;
 +}
 +
 +static void virtio_pci_exit(PCIDevice *pci_dev)
 +{
 +VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);
 +VirtioBusState *bus = VIRTIO_BUS(proxy-bus);
 +BusState *qbus = BUS(proxy-bus);
 +virtio_bus_destroy_device(bus);
 +qbus_free(qbus);
 +}
 +
 +static void virtio_pci_rst(DeviceState *qdev)

s/rst/reset/

Regards,

Anthony Liguori

 +{
 +    VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
 +    VirtioBusState *bus = VIRTIO_BUS(proxy->bus);
 +    virtio_pci_stop_ioeventfd(proxy);
 +    virtio_bus_reset(bus);
 +    msix_unuse_all_vectors(&proxy->pci_dev);
 +    proxy->flags &= ~VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
 +}
 +
 +static void virtio_pci_class_init(ObjectClass *klass, void *data)
 +{
 +    DeviceClass *dc = DEVICE_CLASS(klass);
 +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 +
 +    k->init = virtio_pci_init;
 +    k->exit = virtio_pci_exit;
 +    k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
 +    k->revision = VIRTIO_PCI_ABI_VERSION;
 +    k->class_id = PCI_CLASS_OTHERS;
 +    dc->reset = virtio_pci_rst;
 +}
 +
 +static const TypeInfo virtio_pci_info = {
 +    .name          = TYPE_VIRTIO_PCI,
 +    .parent        = TYPE_PCI_DEVICE,
 +    .instance_size = sizeof(VirtIOPCIProxy),
 +    .class_init    = virtio_pci_class_init,
 +    .class_size    = sizeof(VirtioPCIClass),
 +};
 +
  /* virtio-pci-bus */
  
  VirtioBusState *virtio_pci_bus_new(VirtIOPCIProxy *dev)
 @@ -1144,6 +1271,8 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
      k->set_host_notifier = virtio_pci_set_host_notifier;
      k->set_guest_notifiers = virtio_pci_set_guest_notifiers;
      k->vmstate_change = virtio_pci_vmstate_change;
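A note on the BAR setup in virtio_pci_device_plugged(): PCI BAR sizes must be powers of two. When the common register area plus the device's config space is not already a power of two, the patch rounds it up with `1 << qemu_fls(size)`. The standalone sketch below reproduces that arithmetic. `fls32()` is a hypothetical stand-in, assumed to behave like qemu_fls(): it returns the 1-based index of the most significant set bit, or 0 for 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for qemu_fls(): 1-based position of the most
 * significant set bit, or 0 when v == 0. */
static int fls32(uint32_t v)
{
    int pos = 0;

    while (v) {
        pos++;
        v >>= 1;
    }
    return pos;
}

/* Round a BAR size up to the next power of two, mirroring
 * "if (size & (size-1)) size = 1 << qemu_fls(size);". */
static uint32_t bar_size_round_up(uint32_t size)
{
    /* Keep the shift below 32 bits for this toy version. */
    assert(size != 0 && size <= 0x80000000u);
    if (size & (size - 1)) {        /* more than one bit set: not a power of two */
        size = 1u << fls32(size);   /* first power of two above the top bit */
    }
    return size;
}
```

For example, a 28-byte region becomes a 32-byte BAR, while a 32-byte region is left unchanged.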
 

Re: [Qemu-devel] [RFC PATCH V8 05/15] virtio-device : Refactor virtio-device.

2013-01-02 Thread Anthony Liguori
fred.kon...@greensocs.com writes:

 From: KONRAD Frederic fred.kon...@greensocs.com

 Create the abstract virtio-device class. All virtio devices can extend
 this class.

 Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com

Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori

 ---
  hw/virtio.c | 70 
 ++---
  hw/virtio.h | 31 +++
  2 files changed, 89 insertions(+), 12 deletions(-)

 diff --git a/hw/virtio.c b/hw/virtio.c
 index f40a8c5..e40fa12 100644
 --- a/hw/virtio.c
 +++ b/hw/virtio.c
 @@ -16,6 +16,7 @@
  #include "trace.h"
  #include "qemu-error.h"
  #include "virtio.h"
 +#include "virtio-bus.h"
  #include "qemu-barrier.h"
  
  /* The alignment to use between consumer and producer parts of vring.
 @@ -875,11 +876,16 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f)
      return 0;
  }
  
 -void virtio_cleanup(VirtIODevice *vdev)
 +void virtio_common_cleanup(VirtIODevice *vdev)
  {
      qemu_del_vm_change_state_handler(vdev->vmstate);
      g_free(vdev->config);
      g_free(vdev->vq);
 +}
 +
 +void virtio_cleanup(VirtIODevice *vdev)
 +{
 +    virtio_common_cleanup(vdev);
      g_free(vdev);
  }
  
 @@ -902,14 +908,10 @@ static void virtio_vmstate_change(void *opaque, int running, RunState state)
      }
  }
  
 -VirtIODevice *virtio_common_init(const char *name, uint16_t device_id,
 -                                 size_t config_size, size_t struct_size)
 +void virtio_init(VirtIODevice *vdev, const char *name,
 +                 uint16_t device_id, size_t config_size)
  {
 -    VirtIODevice *vdev;
      int i;
 -
 -    vdev = g_malloc0(struct_size);
 -
      vdev->device_id = device_id;
      vdev->status = 0;
      vdev->isr = 0;
 @@ -917,20 +919,28 @@ VirtIODevice *virtio_common_init(const char *name, uint16_t device_id,
      vdev->config_vector = VIRTIO_NO_VECTOR;
      vdev->vq = g_malloc0(sizeof(VirtQueue) * VIRTIO_PCI_QUEUE_MAX);
      vdev->vm_running = runstate_is_running();
 -    for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
 +    for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
          vdev->vq[i].vector = VIRTIO_NO_VECTOR;
          vdev->vq[i].vdev = vdev;
      }
  
      vdev->name = name;
      vdev->config_len = config_size;
 -    if (vdev->config_len)
 +    if (vdev->config_len) {
          vdev->config = g_malloc0(config_size);
 -    else
 +    } else {
          vdev->config = NULL;
 +    }
 +    vdev->vmstate = qemu_add_vm_change_state_handler(virtio_vmstate_change,
 +                                                     vdev);
 +}
  
 -    vdev->vmstate = qemu_add_vm_change_state_handler(virtio_vmstate_change, vdev);
 -
 +VirtIODevice *virtio_common_init(const char *name, uint16_t device_id,
 +                                 size_t config_size, size_t struct_size)
 +{
 +    VirtIODevice *vdev;
 +    vdev = g_malloc0(struct_size);
 +    virtio_init(vdev, name, device_id, config_size);
      return vdev;
  }
  
 @@ -1056,3 +1066,39 @@ EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq)
  {
      return &vq->host_notifier;
  }
 +
 +static int virtio_device_init(DeviceState *qdev)
 +{
 +    VirtIODevice *vdev = VIRTIO_DEVICE(qdev);
 +    VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(qdev);
 +    assert(k->init != NULL);
 +    if (k->init(vdev) < 0) {
 +        return -1;
 +    }
 +    virtio_bus_plug_device(vdev);
 +    return 0;
 +}
 +
 +static void virtio_device_class_init(ObjectClass *klass, void *data)
 +{
 +    /* Set the default value here. */
 +    DeviceClass *dc = DEVICE_CLASS(klass);
 +    dc->init = virtio_device_init;
 +    dc->bus_type = TYPE_VIRTIO_BUS;
 +}
 +
 +static const TypeInfo virtio_device_info = {
 +    .name = TYPE_VIRTIO_DEVICE,
 +    .parent = TYPE_DEVICE,
 +    .instance_size = sizeof(VirtIODevice),
 +    .class_init = virtio_device_class_init,
 +    .abstract = true,
 +    .class_size = sizeof(VirtioDeviceClass),
 +};
 +
 +static void virtio_register_types(void)
 +{
 +    type_register_static(&virtio_device_info);
 +}
 +
 +type_init(virtio_register_types)
 diff --git a/hw/virtio.h b/hw/virtio.h
 index 7c17f7b..98596a9 100644
 --- a/hw/virtio.h
 +++ b/hw/virtio.h
 @@ -108,8 +108,17 @@ typedef struct {
  
  #define VIRTIO_NO_VECTOR 0xffff
  
 +#define TYPE_VIRTIO_DEVICE "virtio-device"
 +#define VIRTIO_DEVICE_GET_CLASS(obj) \
 +        OBJECT_GET_CLASS(VirtioDeviceClass, obj, TYPE_VIRTIO_DEVICE)
 +#define VIRTIO_DEVICE_CLASS(klass) \
 +        OBJECT_CLASS_CHECK(VirtioDeviceClass, klass, TYPE_VIRTIO_DEVICE)
 +#define VIRTIO_DEVICE(obj) \
 +        OBJECT_CHECK(VirtIODevice, (obj), TYPE_VIRTIO_DEVICE)
 +
  struct VirtIODevice
  {
 +    DeviceState parent_obj;
      const char *name;
      uint8_t status;
      uint8_t isr;
 @@ -119,6 +128,10 @@ struct VirtIODevice
      void *config;
      uint16_t config_vector;
      int nvectors;
 +    /*
 +     * Will be removed ( at the end of the series ) as we have it in
 +     * VirtioDeviceClass.
 +     */
  uint32_t 
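The virtio.c hunks in this patch split allocation from initialization. virtio_init() fills in a VirtIODevice that the caller already owns, so the struct can be embedded in a QOM object. virtio_common_init() keeps its old contract by allocating first and then delegating to virtio_init().

The toy model below shows that split. All `Toy*` names are hypothetical, and calloc()/free() stand in for g_malloc0()/g_free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for VirtIODevice; fields are illustrative only. */
typedef struct ToyVirtIODevice {
    const char *name;
    uint16_t device_id;
    size_t config_len;
    void *config;
} ToyVirtIODevice;

/* Like virtio_init(): initializes storage owned by the caller, so the
 * struct can live inside a larger, separately allocated object. */
static void toy_virtio_init(ToyVirtIODevice *vdev, const char *name,
                            uint16_t device_id, size_t config_size)
{
    vdev->name = name;
    vdev->device_id = device_id;
    vdev->config_len = config_size;
    vdev->config = config_size ? calloc(1, config_size) : NULL;
}

/* Like virtio_common_init(): the legacy entry point allocates struct_size
 * bytes and then delegates to the init function. */
static ToyVirtIODevice *toy_virtio_common_init(const char *name,
                                               uint16_t device_id,
                                               size_t config_size,
                                               size_t struct_size)
{
    assert(struct_size >= sizeof(ToyVirtIODevice));
    ToyVirtIODevice *vdev = calloc(1, struct_size);
    if (vdev != NULL) {
        toy_virtio_init(vdev, name, device_id, config_size);
    }
    return vdev;
}

/* Mirrors virtio_common_cleanup(): frees members, not the struct itself. */
static void toy_virtio_common_cleanup(ToyVirtIODevice *vdev)
{
    free(vdev->config);
    vdev->config = NULL;
}

/* Mirrors virtio_cleanup(): also frees the struct it allocated. */
static void toy_virtio_cleanup(ToyVirtIODevice *vdev)
{
    toy_virtio_common_cleanup(vdev);
    free(vdev);
}

/* An embedding "subclass": the device lives inside a bigger struct. */
typedef struct ToyVirtIOBlk {
    ToyVirtIODevice parent_obj;
    int queue_size;
} ToyVirtIOBlk;
```

An embedded device is initialized with toy_virtio_init() and torn down with toy_virtio_common_cleanup(). Only the legacy, heap-allocated path uses toy_virtio_cleanup().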

Re: [Qemu-devel] [RFC PATCH V8 04/15] virtio-pci : Refactor virtio-pci device.

2013-01-02 Thread KONRAD Frédéric

On 02/01/2013 15:14, Anthony Liguori wrote:

fred.kon...@greensocs.com writes:


+static void virtio_pci_rst(DeviceState *qdev)

s/rst/reset/

Regards,

Anthony Liguori

virtio_pci_reset conflicts with an existing function.
Can I add a step at the end of the series to rename it, once the old virtio_pci_reset is no longer used?




  

Re: [Qemu-devel] [RFC PATCH V8 09/15] virtio-blk-pci : Switch to new API.

2013-01-02 Thread Anthony Liguori
fred.kon...@greensocs.com writes:

 From: KONRAD Frederic fred.kon...@greensocs.com

 Here the virtio-blk-pci is modified for the new API. The device virtio-blk-pci
 extends virtio-pci. It creates and connects a virtio-blk during the init.

 Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
 ---
  hw/virtio-pci.c | 106 
 +---
  hw/virtio-pci.h |  14 +++-
  2 files changed, 53 insertions(+), 67 deletions(-)

 diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
 index 877bf38..e3a8276 100644
 --- a/hw/virtio-pci.c
 +++ b/hw/virtio-pci.c
 @@ -734,26 +734,6 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
      proxy->host_features = vdev->get_features(vdev, proxy->host_features);
  }
  
 -static int virtio_blk_init_pci(PCIDevice *pci_dev)
 -{
 -    VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
 -    VirtIODevice *vdev;
 -
 -    if (proxy->class_code != PCI_CLASS_STORAGE_SCSI &&
 -        proxy->class_code != PCI_CLASS_STORAGE_OTHER)
 -        proxy->class_code = PCI_CLASS_STORAGE_SCSI;
 -
 -    vdev = virtio_blk_init(&pci_dev->qdev, &proxy->blk);
 -    if (!vdev) {
 -        return -1;
 -    }
 -    vdev->nvectors = proxy->nvectors;
 -    virtio_init_pci(proxy, vdev);
 -    /* make the actual value visible */
 -    proxy->nvectors = vdev->nvectors;
 -    return 0;
 -}
 -
  static void virtio_exit_pci(PCIDevice *pci_dev)
  {
      VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
 @@ -762,15 +742,6 @@ static void virtio_exit_pci(PCIDevice *pci_dev)
      msix_uninit_exclusive_bar(pci_dev);
  }
  
 -static void virtio_blk_exit_pci(PCIDevice *pci_dev)
 -{
 -    VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
 -
 -    virtio_pci_stop_ioeventfd(proxy);
 -    virtio_blk_exit(proxy->vdev);
 -    virtio_exit_pci(pci_dev);
 -}
 -
  static int virtio_serial_init_pci(PCIDevice *pci_dev)
  {
      VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
 @@ -888,42 +859,6 @@ static void virtio_rng_exit_pci(PCIDevice *pci_dev)
      virtio_exit_pci(pci_dev);
  }
  
 -static Property virtio_blk_properties[] = {
 -    DEFINE_PROP_HEX32("class", VirtIOPCIProxy, class_code, 0),
 -    DEFINE_BLOCK_PROPERTIES(VirtIOPCIProxy, blk.conf),
 -    DEFINE_BLOCK_CHS_PROPERTIES(VirtIOPCIProxy, blk.conf),
 -    DEFINE_PROP_STRING("serial", VirtIOPCIProxy, blk.serial),
 -#ifdef __linux__
 -    DEFINE_PROP_BIT("scsi", VirtIOPCIProxy, blk.scsi, 0, true),
 -#endif
 -    DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
 -    DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
 -    DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features),
 -    DEFINE_PROP_END_OF_LIST(),
 -};
 -
 -static void virtio_blk_class_init(ObjectClass *klass, void *data)
 -{
 -    DeviceClass *dc = DEVICE_CLASS(klass);
 -    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 -
 -    k->init = virtio_blk_init_pci;
 -    k->exit = virtio_blk_exit_pci;
 -    k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
 -    k->device_id = PCI_DEVICE_ID_VIRTIO_BLOCK;
 -    k->revision = VIRTIO_PCI_ABI_VERSION;
 -    k->class_id = PCI_CLASS_STORAGE_SCSI;
 -    dc->reset = virtio_pci_reset;
 -    dc->props = virtio_blk_properties;
 -}
 -
 -static TypeInfo virtio_blk_info = {
 -    .name          = "virtio-blk-pci",
 -    .parent        = TYPE_PCI_DEVICE,
 -    .instance_size = sizeof(VirtIOPCIProxy),
 -    .class_init    = virtio_blk_class_init,
 -};
 -
  static Property virtio_net_properties[] = {
      DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, false),
      DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3),
 @@ -1248,6 +1183,45 @@ static const TypeInfo virtio_pci_info = {
      .class_size    = sizeof(VirtioPCIClass),
  };
  
 +/* virtio-blk-pci */
 +
 +static Property virtio_blk_pci_properties[] = {
 +    DEFINE_PROP_HEX32("class", VirtIOBlkPCI, parent_obj.class_code, 0),
 +    DEFINE_PROP_BIT("ioeventfd", VirtIOBlkPCI, parent_obj.flags,
 +                    VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
 +    DEFINE_PROP_UINT32("vectors", VirtIOBlkPCI, parent_obj.nvectors, 2),
 +    DEFINE_VIRTIO_BLK_FEATURES(VirtIOBlkPCI, parent_obj.host_features),
 +    DEFINE_VIRTIO_BLK_PROPERTIES(VirtIOBlkPCI, blk),
 +    DEFINE_PROP_END_OF_LIST(),
 +};
 +
 +static int virtio_blk_pci_init(VirtIOPCIProxy *vpci_dev)
 +{
 +    DeviceState *vdev;
 +    VirtIOBlkPCI *dev = VIRTIO_BLK_PCI(vpci_dev);
 +    vdev = qdev_create(BUS(vpci_dev->bus), "virtio-blk");

This is the wrong way to do this.

You should do object_initialize() and object_property_add_child() in an
initialization function for this type.

You can qdev_set_parent_bus() in the initialization function, but defer
the qdev_init() until here for the child.

Regards,

Anthony Liguori
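The two-phase pattern Anthony describes can be modelled in plain C so that it stands alone. The `Toy*` types below are hypothetical. The comments only mark which QEMU call each step stands in for: object_initialize(), object_property_add_child(), qdev_set_parent_bus() and qdev_init(). None of those APIs is actually called here.

The child is embedded by value and linked to its parent during instance initialization. It is only initialized later, from the parent's init.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyBus ToyBus;

typedef struct ToyDevice {
    const char *type;
    ToyBus *parent_bus;    /* set early, like qdev_set_parent_bus() */
    bool initialized;      /* set late, like qdev_init() */
} ToyDevice;

struct ToyBus {
    ToyDevice *child;
};

typedef struct ToyBlkPCI {
    ToyBus bus;            /* the virtio bus owned by the PCI proxy */
    ToyDevice vdev;        /* child embedded by value, not allocated */
    ToyDevice *child_prop; /* stands in for the QOM child<> property */
} ToyBlkPCI;

/* Phase 1: instance initialization, with only in-place setup and linking. */
static void toy_blk_pci_instance_init(ToyBlkPCI *dev)
{
    assert(dev != NULL);
    dev->vdev = (ToyDevice){ .type = "virtio-blk" }; /* ~ object_initialize() */
    dev->child_prop = &dev->vdev;                     /* ~ object_property_add_child() */
    dev->vdev.parent_bus = &dev->bus;                 /* ~ qdev_set_parent_bus() */
    dev->bus.child = &dev->vdev;
}

/* Phase 2: the parent's init; only now is the child brought up. */
static int toy_blk_pci_init(ToyBlkPCI *dev)
{
    if (dev->vdev.parent_bus != &dev->bus) {
        return -1;                                    /* instance_init did not run */
    }
    dev->vdev.initialized = true;                     /* ~ qdev_init(child) */
    return 0;
}
```

One plausible benefit of this ordering is that the child already exists, and is reachable from its parent, before the parent's init runs. That means it can be configured in between, rather than being created and configured in one step inside init.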

 +    virtio_blk_set_conf(vdev, &(dev->blk));
 +    if (qdev_init(vdev) < 0) {
 +        return -1;
 +    }
 +    return 0;
 +}
 +
 +static void 

Re: [Qemu-devel] [RFC PATCH V8 01/15] qdev : add a maximum device allowed field for the bus.

2013-01-02 Thread KONRAD Frédéric

On 02/01/2013 15:16, Andreas Färber wrote:

Am 02.01.2013 15:08, schrieb Anthony Liguori:

fred.kon...@greensocs.com writes:


From: KONRAD Frederic fred.kon...@greensocs.com

Add a max_dev field to BusState to specify the maximum amount of devices allowed
on the bus ( have no effect if max_dev=0 )

Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
---
  hw/qdev-core.h|  2 ++
  hw/qdev-monitor.c | 11 +++
  2 files changed, 13 insertions(+)

diff --git a/hw/qdev-core.h b/hw/qdev-core.h
index d672cca..af909b9 100644
--- a/hw/qdev-core.h
+++ b/hw/qdev-core.h
@@ -104,6 +104,8 @@ struct BusState {
  const char *name;
  int allow_hotplug;
  int max_index;
+/* maximum devices allowed on the bus, 0 : no limit. */
+int max_dev;

Can't for the virtio-bus case (which this is for AFAIU) the same effect
be achieved by setting max_index? If not, this could use some more
documentation - btw using gtk-doc style comments (above struct) would be
a bonus.
No, max_index is just a variable that counts the bus's children, I think.

max_index is incremented each time bus_add_child() is called.

Maybe the name max_index is not a good choice?



Regards,
Andreas

P.S. Please remember to use English punctuation rules, i.e. no spaces
before colons or inside parentheses. ;)

:s sorry for that.
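This hypothetical sketch illustrates the distinction Fred draws. As he describes it, max_index hands out child indices in bus_add_child(), and it is assumed here never to be decremented. If so, it cannot enforce a limit on live devices once some have been unplugged.

A separate max_dev check is sketched below, with 0 meaning "no limit" as the patch describes. The num_children counter and the toy_bus_* helpers are assumptions. They are not the qdev-monitor.c code from the patch, which is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyBusState {
    int max_index;     /* next child index to hand out; only ever grows */
    int num_children;  /* devices currently on the bus */
    int max_dev;       /* maximum devices allowed; 0 means no limit */
} ToyBusState;

static bool toy_bus_can_add(const ToyBusState *bus)
{
    return bus->max_dev == 0 || bus->num_children < bus->max_dev;
}

/* Returns the new child's index, or -1 if the bus is full. */
static int toy_bus_add_child(ToyBusState *bus)
{
    if (!toy_bus_can_add(bus)) {
        return -1;
    }
    bus->num_children++;
    return bus->max_index++;
}

static void toy_bus_remove_child(ToyBusState *bus)
{
    assert(bus->num_children > 0);
    bus->num_children--;   /* max_index is deliberately left unchanged */
}
```

After one plug/unplug cycle on a bus with max_dev = 1, max_index is already past the limit, yet a new device must still be accepted. That is why the two fields cannot easily be merged.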








Re: [Qemu-devel] [PATCH 1/8] Move -I$(SRC_PATH)/include compiler flag to Makefile.objs

2013-01-02 Thread Eduardo Habkost
On Wed, Jan 02, 2013 at 02:48:00PM +0100, Andreas Färber wrote:
 Am 14.12.2012 18:21, schrieb Eduardo Habkost:
  On Fri, Dec 14, 2012 at 04:34:29PM +0100, Andreas Färber wrote:
  Waiting for ack or nack from Paolo here. I am expecting some overlap
  with his header file reorganization series.
 
  My previous (unanswered?) question was why you are moving vl.o lines in
  addition to the QEMU_CFLAGS lines that you mention in the commit message.
  
  
  I thought this note in the commit message would answer the question:
  
  This also moves the existing CFLAGS lines from Makefile.objs at the
  beginning of the file, to keep them all in the same place.
  
  In other words: it's cosmetic, just to keep all the QEMU_CFLAGS lines
  inside Makefile.objs grouped in a visible place at the beginning of the
  file.
  
  (You noticed that I am moving the vl.o lines _inside_ Makefile.objs,
  right? They are not being moved between different files.)
 
 Nah, you caught me there, must've misread that on a previous submission
 (or it changed or whatever).
 
 Anyway, Paolo's header reorganization was pulled by now, so this patch
 no longer seems necessary, series compiles without. Please shout if I'm
 misreading this!

Correct, commit 9d9199a003 from Paolo makes this patch unnecessary.

-- 
Eduardo



Re: [Qemu-devel] [PATCH 1/2] target-i386: kvm: -cpu host: use GET_SUPPORTED_CPUID for SVM features

2013-01-02 Thread Igor Mammedov
On Fri, 28 Dec 2012 16:37:33 -0200
Eduardo Habkost ehabk...@redhat.com wrote:

 The existing -cpu host code simply set every bit inside svm_features
 (initializing it to -1), and that makes it impossible to make the
 enforce/check options work properly when the user asks for SVM features
 explicitly in the command-line.
 
 So, instead of initializing svm_features to -1, use GET_SUPPORTED_CPUID
 to fill only the bits that are supported by the host (just like we do
 for all other CPUID feature words inside kvm_cpu_fill_host()).
 
 This will keep the existing behavior (as filter_features_for_kvm()
 already uses GET_SUPPORTED_CPUID to filter svm_features), but will allow
 us to properly check for KVM features inside
 kvm_check_features_against_host() later.
 
 For example, we will be able to make this:
 
   $ qemu-system-x86_64 -cpu ...,+pfthreshold,enforce
 
 refuse to start if the SVM pfthreshold feature is not supported by the
 host (after we fix kvm_check_features_against_host() to check SVM flags
 as well).
 
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Reviewed-By: Igor Mammedov imamm...@redhat.com

 ---
  target-i386/cpu.c | 11 ---
  1 file changed, 4 insertions(+), 7 deletions(-)
 
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 3cd1cee..6e2d32d 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -897,13 +897,10 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
  }
  }
  
 -/*
 - * Every SVM feature requires emulation support in KVM - so we can't just
 - * read the host features here. KVM might even support SVM features not
 - * available on the host hardware. Just set all bits and mask out the
 - * unsupported ones later.
 - */
 -x86_cpu_def->svm_features = -1;
 +/* Other KVM-specific feature fields: */
 +x86_cpu_def->svm_features =
 +kvm_arch_get_supported_cpuid(s, 0x8000000A, 0, R_EDX);
 +
  #endif /* CONFIG_KVM */
  }
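The motivation in the commit message can be sketched as a mask check. An enforce-style check refuses to start when `requested & ~supported` is non-zero. With svm_features initialized to -1, every bit looks supported and the check can never fail. With a mask obtained from GET_SUPPORTED_CPUID, it can.

The helper names below are hypothetical. The bit position used for pfthreshold (bit 12 of CPUID 0x8000000A EDX) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit for illustration: "pfthreshold" as bit 12 of CPUID
 * 0x8000000A EDX. */
#define TOY_SVM_PFTHRESHOLD (1u << 12)

/* Feature bits the user asked for that the host cannot provide. */
static uint32_t missing_features(uint32_t requested, uint32_t host_supported)
{
    return requested & ~host_supported;
}

/* enforce-style semantics: -1 (refuse to start) if anything is missing. */
static int enforce_check(uint32_t requested, uint32_t host_supported)
{
    return missing_features(requested, host_supported) ? -1 : 0;
}
```

With the old all-ones mask, `enforce_check(TOY_SVM_PFTHRESHOLD, 0xffffffffu)` always passes. With a real supported mask that lacks the bit, it reports the feature as missing.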
  




Re: [Qemu-devel] [PATCH] Change to correct PowerPC on a 64bit host

2013-01-02 Thread Samuel Seay
I did not catch that, somehow I managed to invert the logic when looking at
it. Maybe a g2h() (does such a macro exist? what would be the proper method?)
around the newsp value would do it. I'll redo that this evening and attempt
to submit a newer patch. Considering I don't have direct internet access in
the VM, any suggestions to make everyone happy on the patch submission?

I might be able to redo the git setup on my mac and do it from there.

Samuel

On Wed, Jan 2, 2013 at 9:02 AM, Peter Maydell peter.mayd...@linaro.org wrote:

 On 2 January 2013 13:01, Samuel Seay lightnin...@gmail.com wrote:
  The VM I did the work in doesn't have internet access and I was unsure how
  to do a text only email with gmail. With that said, the line that removed
  the env->gpr[1] is redundant as a few lines below in the original source it
  is set with newsp. The removed line would seg fault due to trying to write
  the value of env->gpr[1] into newsp, which is not valid in host.

 No, it's not redundant -- we must save the old value of gpr[1], exactly
 because we are about to change it (set it to newsp). The code is trying
 to do the right thing (copy the old env->gpr[1] value into the guest
 stack frame it is setting up) but in a broken way, so it must be fixed,
 not just removed.

 -- PMM
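Peter's point can be sketched with a toy guest. The old r1 must be stored into the new guest frame before r1 is switched to newsp. Under the usual PowerPC ABI convention, that stored word is the frame's back chain. Because newsp is a guest address, it must be translated before the host can write through it.

`toy_g2h()` and `toy_stl_be()` are hypothetical helpers standing in for QEMU's own guest-memory accessors. The flat 4 KiB, 32-bit guest RAM is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy guest: guest address N maps to guest_ram[N]. */
static uint8_t guest_ram[4096];

typedef struct ToyPPCState {
    uint32_t gpr[32];
} ToyPPCState;

/* Hypothetical g2h(): translate a guest address into a host pointer. */
static void *toy_g2h(uint32_t guest_addr)
{
    assert((size_t)guest_addr + 4 <= sizeof(guest_ram));
    return &guest_ram[guest_addr];
}

/* Store a 32-bit value big-endian, as a PowerPC guest expects. */
static void toy_stl_be(void *host_ptr, uint32_t val)
{
    uint8_t *p = host_ptr;

    p[0] = (uint8_t)(val >> 24);
    p[1] = (uint8_t)(val >> 16);
    p[2] = (uint8_t)(val >> 8);
    p[3] = (uint8_t)val;
}

static void toy_setup_frame(ToyPPCState *env, uint32_t newsp)
{
    /* Save the old stack pointer into the new frame. newsp is a guest
     * address, so translate it instead of dereferencing it directly. */
    toy_stl_be(toy_g2h(newsp), env->gpr[1]);
    env->gpr[1] = newsp;  /* only now switch r1 to the new frame */
}
```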



Re: [Qemu-devel] [RFC PATCH V8 01/15] qdev : add a maximum device allowed field for the bus.

2013-01-02 Thread Andreas Färber
Am 02.01.2013 15:08, schrieb Anthony Liguori:
 fred.kon...@greensocs.com writes:
 
 From: KONRAD Frederic fred.kon...@greensocs.com

 Add a max_dev field to BusState to specify the maximum amount of devices allowed
 on the bus ( have no effect if max_dev=0 )

 Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
 ---
  hw/qdev-core.h|  2 ++
  hw/qdev-monitor.c | 11 +++
  2 files changed, 13 insertions(+)

 diff --git a/hw/qdev-core.h b/hw/qdev-core.h
 index d672cca..af909b9 100644
 --- a/hw/qdev-core.h
 +++ b/hw/qdev-core.h
 @@ -104,6 +104,8 @@ struct BusState {
  const char *name;
  int allow_hotplug;
  int max_index;
 +/* maximum devices allowed on the bus, 0 : no limit. */
 +int max_dev;

Can't for the virtio-bus case (which this is for AFAIU) the same effect
be achieved by setting max_index? If not, this could use some more
documentation - btw using gtk-doc style comments (above struct) would be
a bonus.

Regards,
Andreas

P.S. Please remember to use English punctuation rules, i.e. no spaces
before colons or inside parentheses. ;)

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH 1/2] target-i386: kvm: -cpu host: use GET_SUPPORTED_CPUID for SVM features

2013-01-02 Thread Andreas Färber
Am 28.12.2012 19:37, schrieb Eduardo Habkost:
 The existing -cpu host code simply set every bit inside svm_features
 (initializing it to -1), and that makes it impossible to make the
 enforce/check options work properly when the user asks for SVM features
 explicitly in the command-line.
 
 So, instead of initializing svm_features to -1, use GET_SUPPORTED_CPUID
 to fill only the bits that are supported by the host (just like we do
 for all other CPUID feature words inside kvm_cpu_fill_host()).
 
 This will keep the existing behavior (as filter_features_for_kvm()
 already uses GET_SUPPORTED_CPUID to filter svm_features), but will allow
 us to properly check for KVM features inside
 kvm_check_features_against_host() later.
 
 For example, we will be able to make this:
 
   $ qemu-system-x86_64 -cpu ...,+pfthreshold,enforce
 
 refuse to start if the SVM pfthreshold feature is not supported by the
 host (after we fix kvm_check_features_against_host() to check SVM flags
 as well).
 
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
 ---
  target-i386/cpu.c | 11 ---
  1 file changed, 4 insertions(+), 7 deletions(-)
 
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 3cd1cee..6e2d32d 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -897,13 +897,10 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
  }
  }
  
 -/*
 - * Every SVM feature requires emulation support in KVM - so we can't just
 - * read the host features here. KVM might even support SVM features not
 - * available on the host hardware. Just set all bits and mask out the
 - * unsupported ones later.
 - */
 -x86_cpu_def->svm_features = -1;
 +/* Other KVM-specific feature fields: */
 +x86_cpu_def->svm_features =
 +kvm_arch_get_supported_cpuid(s, 0x8000000A, 0, R_EDX);

Is there no #define for this, similar to KVM_CPUID_FEATURES in 2/2?
FWIW indentation looks odd.

Andreas

 +
  #endif /* CONFIG_KVM */
  }
  
 





Re: [Qemu-devel] [PATCH 0/2] Fixes for -cpu host KVM/SVM feature initialization

2013-01-02 Thread Andreas Färber
Am 28.12.2012 19:37, schrieb Eduardo Habkost:
 This series has two very similar fixes for feature initialization for -cpu
 host. This should allow us to make the check/enforce code check for host
 support of KVM and SVM features, later.

I am out of my field here to verify whether this is semantically correct
and whether any fallback code may be needed. However, this will conflict
with X86CPU subclasses, so I'd be interested in taking this through my
qom-cpu queue if there are acks from the KVM folks.

Regards,
Andreas

 Eduardo Habkost (2):
   target-i386: kvm: -cpu host: use GET_SUPPORTED_CPUID for SVM features
   target-i386: kvm: enable all supported KVM features for -cpu host
 
  target-i386/cpu.c | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)




Re: [Qemu-devel] [Qemu-ppc] [PATCH] Change to correct PowerPC on a 64bit host

2013-01-02 Thread Alexander Graf
Please don't top post.

Am 02.01.2013 um 15:34 schrieb Samuel Seay lightnin...@gmail.com:

 I did not catch that, somehow I managed to invert the logic when looking at 
 it. Maybe a g2h() (does such a macro exist? what would be the proper method?) 
 around the newsp value would do it.

Sounds reasonable OTOH ;).

 I'll redo that this evening and attempt to submit a newer patch. Considering 
 I don't have direct internet access in the VM, any suggestions to make 
 everyone happy on the patch submission?

Sure. Just export your patch using git format-patch, copy it to your host and 
run git send-email from there ;).

 
 I might be able to redo the git setup on my mac and do it from there.

There are easy to install git packages for osx readily available, yeah.

Alex

 
 Samuel
 
 On Wed, Jan 2, 2013 at 9:02 AM, Peter Maydell peter.mayd...@linaro.org 
 wrote:
 On 2 January 2013 13:01, Samuel Seay lightnin...@gmail.com wrote:
  The VM I did the work in doesn't have internet access and I was unsure how
  to do a text only email with gmail. With that said, the line that removed
  the env->gpr[1] is redundant as a few lines below in the original source it
  is set with newsp. The removed line would seg fault due to trying to write
  the value of env->gpr[1] into newsp, which is not valid in host.
 
 No, it's not redundant -- we must save the old value of gpr[1], exactly
 because we are about to change it (set it to newsp). The code is trying
 to do the right thing (copy the old env->gpr[1] value into the guest
 stack frame it is setting up) but in a broken way, so it must be fixed,
 not just removed.
 
 -- PMM
 


Re: [Qemu-devel] [RFC 00/34] QOM realize, device-only plus ISA conversion

2013-01-02 Thread Anthony Liguori
So there are 3-4 different series in here all rolled into one.  This
makes review a bit tedious.

I'd suggest:

1) Pull out all of the QOMification stuff into a separate series.
2) Pull out the style cleanups into another series
3) Pull out the introduction of realize/unrealize into a series
4) Pull out the ISA use of realize into another series.

Individually, each series looks very easy to merge.  The general
approach is right along what I was hoping for.

Regards,

Anthony Liguori

Andreas Färber afaer...@suse.de writes:

 Hello Anthony and Paolo,

 As announced at KVM Forum, I have been preparing a new approach to
 incrementally get us Anthony's QOM realizefn concept. A previous attempt by
 Paolo and me had been turned down for making this available at Object level
 and over questions whether BlockDriverState may need its own three-stage
 realization model.

 So here's an all-new patchset doing it at DeviceState level only, adapting the
 signature to void realize(DeviceState *, Error **).
 CPUState is well on its way to being derived from DeviceState, so it will
 benefit from this approach in the future as well.

 I've picked ISADevice as an example to showcase what semantic effects the
 switch to QOM realizefn has (in the hope that the number of devices would be
 small):
 As requested by Anthony for QOM CPUState reset and as seen with virtual methods
 in object-oriented programming languages, it becomes the derived method's
 responsibility to decide when and whether to call the parent class' method. In
 the absence of real vtables, this requires saving the parent's method in case
 we want to call it; classes and matching macros may need to be added for that.
 Another point to note is that we should carefully distinguish what goes into
 the qdev initfn / QOM realizefn and what can already go into a QOM initfn.
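A minimal sketch of the "save the parent's method" pattern described above. It uses hypothetical `Toy*` class structs, and a `const char **` stands in for QEMU's `Error **`.

The derived class_init stores the inherited realize pointer in its own class struct before overriding it. The derived realize then chains up explicitly.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyDevice ToyDevice;
typedef void (*ToyRealizeFn)(ToyDevice *dev, const char **errp);

/* Base class with a single virtual method. */
typedef struct ToyDeviceClass {
    ToyRealizeFn realize;
} ToyDeviceClass;

/* Derived class: embeds the base class first and keeps the parent's method. */
typedef struct ToyISADeviceClass {
    ToyDeviceClass parent_class;
    ToyRealizeFn parent_realize;
} ToyISADeviceClass;

struct ToyDevice {
    const ToyDeviceClass *klass;
    int trace;  /* records call order: parent appends 1, derived appends 2 */
};

static void toy_device_realize(ToyDevice *dev, const char **errp)
{
    (void)errp;
    dev->trace = dev->trace * 10 + 1;
}

static void toy_isa_device_realize(ToyDevice *dev, const char **errp)
{
    /* Valid because parent_class is the first member of the derived class. */
    const ToyISADeviceClass *ic = (const ToyISADeviceClass *)dev->klass;
    const char *err = NULL;

    assert(ic->parent_realize != NULL);
    /* The derived method decides when (and whether) to chain up. */
    ic->parent_realize(dev, &err);
    if (err != NULL) {
        *errp = err;
        return;
    }
    dev->trace = dev->trace * 10 + 2;
}

static void toy_isa_device_class_init(ToyISADeviceClass *ic)
{
    /* In QOM the parent's class data would already be copied in;
     * that step is simulated here. */
    ic->parent_class.realize = toy_device_realize;
    ic->parent_realize = ic->parent_class.realize;      /* save before override */
    ic->parent_class.realize = toy_isa_device_realize;  /* override */
}
```

Calling realize through the base class pointer then runs the parent step first and the derived step second.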

 This series is rebased onto Julien's ioport cleanups (touching on ISA devices).

 It starts by preparing the realized property, wrapping qdev's initfn (04).
 This means setting realized to true will not yet affect its children, as seen
 in Paolo's previous patches. That can be implemented later when the realizefns
 have been reviewed not to create new devices that would mess with recursive
 child realization. (In the previous series we had recursive realization but
 on my request dropped the hook-up of qdev due to the aforementioned quirks.)
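A toy model of a realized property whose default realize wraps a legacy initfn, as described above. It deliberately does not recurse into children, matching the note that setting realized does not yet affect them. Unrealize is not modelled.

All names are hypothetical, and a `const char **` stands in for `Error **`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyDev ToyDev;

typedef struct ToyDevClass {
    int (*init)(ToyDev *dev);                        /* legacy qdev-style initfn */
    void (*realize)(ToyDev *dev, const char **errp); /* QOM-style realizefn */
} ToyDevClass;

struct ToyDev {
    const ToyDevClass *klass;
    bool realized;
};

/* Default realizefn: runs the legacy initfn and maps a negative return
 * value onto an error. */
static void toy_default_realize(ToyDev *dev, const char **errp)
{
    if (dev->klass->init != NULL && dev->klass->init(dev) < 0) {
        *errp = "device initialization failed";
    }
}

/* Setter for the "realized" property; child devices are not touched. */
static void toy_set_realized(ToyDev *dev, bool value, const char **errp)
{
    const char *err = NULL;

    assert(dev->klass != NULL);
    if (value && !dev->realized && dev->klass->realize != NULL) {
        dev->klass->realize(dev, &err);
        if (err != NULL) {
            *errp = err;
            return;  /* stay unrealized on failure */
        }
    }
    dev->realized = value;
}

/* Two example legacy initfns for exercising the model. */
static int toy_init_ok(ToyDev *dev)   { (void)dev; return 0; }
static int toy_init_fail(ToyDev *dev) { (void)dev; return -1; }
```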

 At that point there is a coexistence of QOM device realizefn and qdev initfn.
 For the first time now I have set out to actually eliminate some qdev initfns;
 that's what I chose ISA for. This consists of three parts, introducing
 realizefns for ISADevices (28) and recursively for PITs (31) and for PICs (34).
 As seen for the PCI host bridge series, I've extracted general QOM cleanups
 from the main conversion patch to arrive at a clean QOM'ish state while
 hopefully keeping the main patches readable.

 This series also highlights an interesting find: Beyond the to-be-solved
 CPUState, there is also a realize function for BusState (02), which is not
 derived from DeviceState. :-)
 With the device-centric approach taken here it would still be possible to add
 realized properties to other types using their own infrastructure (e.g., a
 hardcoded setter rather than a realizefn hook).

 Posted as an RFC to encourage bikeshedding, in particular about the type names
 and macros introduced. Adding new header files to move them out of the source
 files (e.g., for vl.c) is left for a follow-up. For instance, I was unsure
 about TYPE_ISA_FDC (should this be TYPE_ISA_FLOPPY_DRIVE_CONTROLLER, as with
 PCI_HOST_BRIDGE rather than PHB?), and the naming of types and functions is
 highly inconsistent (e.g., isa_vga vs. vga_isa, or pic vs. i8259).

 Available for viewing/testing at:
 https://github.com/afaerber/qemu-cpu/commits/realize-qdev
 git://github.com/afaerber/qemu-cpu.git realize-qdev

 Regards,
 Andreas

 Cc: Anthony Liguori anth...@codemonkey.ws
 Cc: Paolo Bonzini pbonz...@redhat.com

 Cc: Julien Grall julien.gr...@citrix.com
 Cc: Frederic Konrad fred.kon...@greensocs.com
 Cc: Peter Maydell peter.mayd...@linaro.org
 Cc: Cornelia Huck cornelia.h...@de.ibm.com
 Cc: Kevin Wolf kw...@redhat.com
 Cc: Stefan Hajnoczi stefa...@redhat.com
 Cc: Markus Armbruster arm...@redhat.com
 Cc: Eduardo Habkost ehabk...@redhat.com
 Cc: Igor Mammedov imamm...@redhat.com

 Andreas Färber (34):
   qdev: Eliminate qdev_free() in favor of QOM
   qbus: QOM'ify qbus_realize()
   qdev: Fold state enum into bool realized
   qdev: Prepare realized property
   isa: Split off instance_init for ISADevice
   applesmc: QOM'ify
   cirrus_vga: QOM'ify ISA Cirrus VGA
   debugcon: QOM'ify ISA debug console
   fdc: QOM'ify ISA floppy controller
   i82374: QOM'ify
   i8259: Fix PIC_COMMON() macro
   i8259: QOM cleanups
   ide: QOM'ify ISA IDE
   m48t59: QOM'ify ISA M48T59 NVRAM
   mc146818rtc: QOM'ify
   ne2000-isa: QOM'ify
   parallel: QOM'ify
   pc: QOM'ify port 92
   pckbd: QOM'ify
   pcspk: QOM'ify
   

Re: [Qemu-devel] [PATCH 5/6] snapshot: qmp interface

2013-01-02 Thread Eric Blake
On 12/16/2012 11:25 PM, Wenchao Xia wrote:
   This patch changes the implemtion of external block snapshot

s/implemtion/implementation/

 to use internal unified interface, now qmp handler just do

s/do/does/

 a translation of request and submit.
   Also internal block snapshot qmp interface was added.
   Now add external snapshot, add/delete internal snapshot
 can be started in their own qmp interface or a group of
 BlockAction in qmp transaction interface.
 
 Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
 ---

 +++ b/qapi-schema.json

I didn't look at the code, because I want to make sure we get the
interface right, first.

 @@ -1458,17 +1458,36 @@
  { 'command': 'block_resize', 'data': { 'device': 'str', 'size': 'int' }}
  
  ##
 +# @SnapshotType
 +#
 +# An enumeration that tells QEMU what type of snapshot to access.
 +#
 +# @internal: QEMU should use internal snapshot in format such as qcow2.
 +#
 +# @external: QEMU should use backing file chain.
 +#
 +# Since: 1.4.
 +##
 +{ 'enum': 'SnapshotType'
 +  'data': [ 'internal', 'external' ] }
 +
 +##
  # @NewImageMode
  #
  # An enumeration that tells QEMU how to set the backing file path in
 -# a new image file.
 +# a new image file, or how to use internal snapshot record.
  #
 -# @existing: QEMU should look for an existing image file.
 +# @existing: QEMU should look for an existing image file or internal snapshot
 +#record. In external snapshot case, qemu will skip create new 
 image
 +#file, In internal snapshot case qemu will try use the existing

s/In/in/

 +#one. if not found operation would fail.

s/. if/. If/; s/would/will/

  #
 -# @absolute-paths: QEMU should create a new image with absolute paths
 -# for the backing file.
 +# @absolute-paths: QEMU should create a new image with absolute paths for
 +#  the backing file in external snapshot case, or create a 
 new
 +#  snapshot record in internal snapshot case which will
 +#  overwrite internal snapshot record if it already exist.

Doesn't quite make sense - internal snapshots don't record a path, so
why is absolute-paths the right mode for requesting the creation of a
new snapshot?   I think it would make more sense if you add a new mode,
and then declare that absolute-paths is invalid for internal snapshots,
and that the new mode is invalid for external snapshots.
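To make the suggestion concrete, a sketch of the validation a handler might
perform if a separate mode were added. The C enum names, including
MODE_NEW_RECORD for the proposed new mode, are hypothetical and not part of
any schema:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C mirrors of the QAPI enums under discussion. */
typedef enum { SNAPSHOT_INTERNAL, SNAPSHOT_EXTERNAL } SnapshotType;
typedef enum {
    MODE_EXISTING,        /* 'existing' */
    MODE_ABSOLUTE_PATHS,  /* 'absolute-paths' */
    MODE_NEW_RECORD,      /* proposed new mode (name is hypothetical) */
} NewImageMode;

/* Return true if @mode makes sense for a snapshot of @type. */
static bool snapshot_mode_valid(SnapshotType type, NewImageMode mode)
{
    assert(type == SNAPSHOT_INTERNAL || type == SNAPSHOT_EXTERNAL);

    switch (mode) {
    case MODE_EXISTING:
        return true;                       /* meaningful for both types */
    case MODE_ABSOLUTE_PATHS:
        return type == SNAPSHOT_EXTERNAL;  /* internal snapshots record no path */
    case MODE_NEW_RECORD:
        return type == SNAPSHOT_INTERNAL;  /* only creates an internal record */
    }
    return false;                          /* unknown mode value */
}
```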

  #
 -# Since: 1.1
 +# Since: 1.1, internal support since 1.4.
  ##
  { 'enum': 'NewImageMode'
'data': [ 'existing', 'absolute-paths' ] }
 @@ -1478,16 +1497,39 @@
  #
  # @device:  the name of the device to generate the snapshot from.
  #
 -# @snapshot-file: the target of the new image. A new file will be created.
 +# @snapshot-file: the target name of the snapshot. In external case, it is
 +# the new file's name, A new file will be created. In 
 internal

s/A/a/

and a new file is only created according to mode.

 +# case, it is the internal snapshot record's name and if it 
 is
 +# 'blank' name will be generated according to time.

Ugg.  Passing an empty string for snapshot-file as a special case seems
awkward; it might be better to make it an optional argument via
'*snapshot-file', where the argument is mandatory for external, but
omitting the argument on internal allows the fallback naming.  Or why do
you even need to worry about fallback naming?  Requiring the user to
always provide a record name may be easier to support (certainly fewer
corner cases to worry about).
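A sketch of the optional-argument variant, assuming the has_<field>/value
pair that QAPI-generated C code uses for optional members; the enum and
function names here are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C mirror of the SnapshotType enum under discussion. */
typedef enum { SNAPSHOT_INTERNAL, SNAPSHOT_EXTERNAL } SnapshotType;

/*
 * @has_name/@name model an optional '*snapshot-file' argument.
 * Returns true if the combination would be accepted under the proposal.
 */
static bool snapshot_name_valid(SnapshotType type, bool has_name,
                                const char *name)
{
    assert(type == SNAPSHOT_INTERNAL || type == SNAPSHOT_EXTERNAL);

    if (!has_name) {
        /* Omitting the name is allowed only for internal snapshots,
         * where a fallback name could be generated. */
        return type == SNAPSHOT_INTERNAL;
    }
    /* A supplied name must be non-empty: no "blank" special case. */
    return name != NULL && name[0] != '\0';
}
```

The simpler alternative of always requiring a record name amounts to
rejecting the !has_name case for both types.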

  #
  # @format: #optional the format of the snapshot image, default is 'qcow2'.
  #
 -# @mode: #optional whether and how QEMU should create a new image, default is
 -#'absolute-paths'.
 +# @mode: #optional whether QEMU should create a new snapshot or use existing
 +#one, default is 'absolute-paths'.

Does this default still make sense for internal snapshots, or do you
need to document that the default mode differs depending on the type of
snapshot being taken?

 +#
 +# @type: #optional internal snapshot or external, default is 'external'.

Mention that this field is new since 1.4.

 +#
  ##
  { 'type': 'BlockdevSnapshot',
'data': { 'device': 'str', 'snapshot-file': 'str', '*format': 'str',
 -'*mode': 'NewImageMode' } }
 +'*mode': 'NewImageMode', '*type': 'SnapshotType'} }
 +
 +##
 +# @BlockdevSnapshotDelete
 +#
 +# @device:  the name of the device to delete the snapshot from.
 +#
 +# @snapshot-file: the target name of the snapshot. In external case, it is
 +# the file's name to be merged, In internal case, it is the
 +# internal snapshot record's name.

What happens if there is no record name (since the qcow2 file does not
require one)?

 +#
 +# @type: #optional internal snapshot or external, default is
 +#'external', note that delete 'external' snapshot is not supported
 +#now for that it is the same to commit it.

If external is 

Re: [Qemu-devel] [PATCH 2/2] target-i386: kvm: enable all supported KVM features for -cpu host

2013-01-02 Thread Igor Mammedov
On Fri, 28 Dec 2012 16:37:34 -0200
Eduardo Habkost ehabk...@redhat.com wrote:

 When using -cpu host, we don't need to use the kvm_default_features
 variable, as the user is explicitly asking QEMU to enable all features
 supported by the host.
 
 This changes the kvm_cpu_fill_host() code to use GET_SUPPORTED_CPUID to
 initialize the kvm_features field, so we get all host KVM features
 enabled.

1_2 and 1_3 compat machines differ on the pv_eoi flag; with this patch 1_2 might
have it set.
Is that OK from the compat machines' point of view?

 
 This will also allow us to properly check/enforce KVM features inside
 kvm_check_features_against_host() later. For example, we will be able to
 make this:
 
   $ qemu-system-x86_64 -cpu ...,+kvm_pv_eoi,enforce
 
 refuse to start if kvm_pv_eoi is not supported by the host (after we fix
 kvm_check_features_against_host() to check KVM flags as well).
It would be nice to have the kvm_check_features_against_host() patch in this
series to verify that this patch and the previous one work as expected.

 
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
 ---
  target-i386/cpu.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 6e2d32d..76f19f0 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -900,6 +900,8 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
  /* Other KVM-specific feature fields: */
  x86_cpu_def->svm_features =
  kvm_arch_get_supported_cpuid(s, 0x8000000A, 0, R_EDX);
 +x86_cpu_def->kvm_features =
 +kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
  #endif /* CONFIG_KVM */
  }




Re: [Qemu-devel] [PATCH] Change to correct PowerPC on a 64bit host

2013-01-02 Thread Peter Maydell
On 2 January 2013 14:34, Samuel Seay lightnin...@gmail.com wrote:
 I did not catch that, somehow I managed to invert the logic when looking at
 it. Maybe a g2h() (such a macro exist? what would be the proper method?)
 around the newsp value would do it.

You want to use put_user() (without the __) -- this (a) takes a guest address
and (b) does the error checking for unwritable addresses, which __put_user
does not. [Don't be fooled by all the err |= __put_user code; it is bogus
because __put_user never fails.]

 I'll redo that this evening and attempt
 to submit a newer patch. Considering I don't have direct internet access in
 the VM, any suggestions to make everyone happy on the patch submission?

Use git format-patch in the VM to write the patch to file, copy the file out of
the VM and use git send-email to actually send it.

-- PMM



Re: [Qemu-devel] [PATCH 8/8] qom: Make CPU a child of DeviceState

2013-01-02 Thread Andreas Färber
Am 05.12.2012 17:49, schrieb Eduardo Habkost:
 This finally makes the CPU class a child of DeviceState, allowing us to
 start using DeviceState properties on CPU subclasses.

To avoid confusion with child properties and DeviceState vs.
DeviceClass, I have reworded this to "subclass of Device" in my
qom-cpu-dev queue.

 
 It has no_user=1, as creating CPUs using -device doesn't work yet.
 

 (based on a previous patch from Igor Mammedov)

Can this comment be turned into or amended by the usual Signed-off-by?

 
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
 ---
 Changes v1 (imammedo) - v2 (ehabkost):
  - Change CPU type declaration to have TYPE_DEVICE as parent
 
 Changes v2 - v3 (ehabkost):
  - Set no_user=1 on the CPU class
 ---
  include/qemu/cpu.h | 6 +++---
  qom/cpu.c  | 5 -
  2 files changed, 7 insertions(+), 4 deletions(-)
 
 diff --git a/include/qemu/cpu.h b/include/qemu/cpu.h
 index 61b7698..bc004fd 100644
 --- a/include/qemu/cpu.h
 +++ b/include/qemu/cpu.h
 @@ -20,7 +20,7 @@
  #ifndef QEMU_CPU_H
  #define QEMU_CPU_H
  
 -#include "qemu/object.h"
 +#include "hw/qdev-core.h"
  #include "qemu-thread.h"
  
  /**
[...]
 diff --git a/qom/cpu.c b/qom/cpu.c
 index 5b36046..d301f72 100644
 --- a/qom/cpu.c
 +++ b/qom/cpu.c
 @@ -20,6 +20,7 @@
  
  #include "qemu/cpu.h"
  #include "qemu-common.h"
 +#include "hw/qdev-core.h"

Already included via qom/cpu.h (formerly qemu/cpu.h) above, dropping.

  
  void cpu_reset(CPUState *cpu)
  {
 @@ -36,14 +37,16 @@ static void cpu_common_reset(CPUState *cpu)
  
  static void cpu_class_init(ObjectClass *klass, void *data)
  {
 +DeviceClass *dc = DEVICE_CLASS(klass);
  CPUClass *k = CPU_CLASS(klass);
  
  k->reset = cpu_common_reset;
 +dc->no_user = 1;
  }

I wonder if we should add a comment that we are intentionally not
hooking up dc->reset (yet)?

  
  static TypeInfo cpu_type_info = {

I would like to add the missing const while touching this.

  .name = TYPE_CPU,
 -.parent = TYPE_OBJECT,
 +.parent = TYPE_DEVICE,
  .instance_size = sizeof(CPUState),
  .abstract = true,
  .class_size = sizeof(CPUClass),

My testing so far confirms that the combination of object_new() without
qdev_init[_nofail]() is working fine.

Using qdev_create() in the current state of stubs would lead to a silly
if-bus-is-NULL-set-it-to-NULL sequence on top of object_new(). I do not
expect qdev_create() to grow in functionality, so continuing to use
object_new() should be okay - SoCs like my Tegra model may want to use
object_initialize() so we cannot prescribe using qdev_create() anyway.

qdev_init_nofail() would call the qdev initfn (to be replaced by
realizefn, not used for CPU in this patch), then, if there is no parent, add
it to /machine/unassigned, register the VMSD if not NULL, update the internal
state (blocking static property changes) and, if hotplugged, reset it (unused
due to dc->no_user and lack of dc->reset). The /machine/unassigned part
may be interesting, e.g., for APIC modelling (so that we can model the
former ptr property / now pointer-setting as a link property).

With these considerations I am leaning towards accepting this patch if
nobody objects, so that we can move on to the next refactorings...

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



[Qemu-devel] [PATCH 02/18] configure: add CONFIG_VIRTIO_BLK_DATA_PLANE

2013-01-02 Thread Stefan Hajnoczi
The virtio-blk-data-plane feature only works with Linux AIO.  Therefore
add a ./configure option and necessary checks to implement this
dependency.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 configure | 21 +
 1 file changed, 21 insertions(+)

diff --git a/configure b/configure
index b0c7e54..cc1e20a 100755
--- a/configure
+++ b/configure
@@ -223,6 +223,7 @@ libiscsi=
 coroutine=
 seccomp=
 glusterfs=
+virtio_blk_data_plane=
 
 # parse CC options first
 for opt do
@@ -882,6 +883,10 @@ for opt do
   ;;
   --enable-glusterfs) glusterfs=yes
   ;;
+  --disable-virtio-blk-data-plane) virtio_blk_data_plane=no
+  ;;
+  --enable-virtio-blk-data-plane) virtio_blk_data_plane=yes
+  ;;
   *) echo ERROR: unknown option $opt; show_help=yes
   ;;
   esac
@@ -2274,6 +2279,17 @@ EOF
 fi
 
 ##
+# adjust virtio-blk-data-plane based on linux-aio
+
+if test "$virtio_blk_data_plane" = "yes" -a \
+   "$linux_aio" != "yes" ; then
+  echo "Error: virtio-blk-data-plane requires Linux AIO, please try --enable-linux-aio"
+  exit 1
+elif test -z "$virtio_blk_data_plane" ; then
+  virtio_blk_data_plane=$linux_aio
+fi
+
+##
 # attr probe
 
 if test $attr != no ; then
@@ -3289,6 +3305,7 @@ echo build guest agent $guest_agent
 echo seccomp support   $seccomp
 echo coroutine backend $coroutine_backend
 echo GlusterFS support $glusterfs
+echo virtio-blk-data-plane $virtio_blk_data_plane
 
 if test $sdl_too_old = yes; then
 echo - Your SDL version is too old - please upgrade to have SDL support
@@ -3634,6 +3651,10 @@ if test $glusterfs = yes ; then
   echo "CONFIG_GLUSTERFS=y" >> $config_host_mak
 fi
 
+if test "$virtio_blk_data_plane" = "yes" ; then
+  echo "CONFIG_VIRTIO_BLK_DATA_PLANE=y" >> $config_host_mak
+fi
+
 # USB host support
 case $usb in
 linux)
-- 
1.8.0.2




[Qemu-devel] [PULL v2 00/18] Block patches

2013-01-02 Thread Stefan Hajnoczi
Paolo's include/ reorganization was merged and this pull request had conflicts,
which are resolved in v2.

The following changes since commit 5928023cef87847a295035487397b9ec701fdd6b:

  pflash_cfi01: Suppress warning when Linux probes for AMD flash (2013-01-01 13:05:57 +0100)

are available in the git repository at:

  git://github.com/stefanha/qemu.git block

for you to fetch changes up to d6b1ef89a1ede41334e4d0fa27e600e0b4d4f209:

  sheepdog: pass oid directly to send_pending_req() (2013-01-02 16:09:00 +0100)


Alexey Zaytsev (1):
  virtio-blk: Return UNSUPP for unknown request types

Liu Yuan (2):
  sheepdog: don't update inode when create_and_write fails
  sheepdog: pass oid directly to send_pending_req()

Stefan Hajnoczi (12):
  raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane
  configure: add CONFIG_VIRTIO_BLK_DATA_PLANE
  dataplane: add host memory mapping code
  dataplane: add virtqueue vring code
  dataplane: add event loop
  dataplane: add Linux AIO request queue
  iov: add iov_discard_front/back() to remove data
  test-iov: add iov_discard_front/back() testcases
  iov: add qemu_iovec_concat_iov()
  virtio-blk: restore VirtIOBlkConf->config_wce flag
  dataplane: add virtio-blk data plane code
  virtio-blk: add x-data-plane=on|off performance feature

Stefan Weil (1):
  block/raw-win32: Fix compiler warnings (wrong format specifiers)

liguang (2):
  cutils: change strtosz_suffix_unit function
  qemu-img: report size overflow error message

 block/raw-posix.c  |  34 
 block/raw-win32.c  |   4 +-
 block/sheepdog.c   |  11 +-
 configure  |  21 ++
 cutils.c   |   6 +-
 hw/Makefile.objs   |   2 +-
 hw/dataplane/Makefile.objs |   3 +
 hw/dataplane/event-poll.c  | 100 ++
 hw/dataplane/event-poll.h  |  40 
 hw/dataplane/hostmem.c | 176 +
 hw/dataplane/hostmem.h |  57 ++
 hw/dataplane/ioq.c | 117 
 hw/dataplane/ioq.h |  57 ++
 hw/dataplane/virtio-blk.c  | 465 +
 hw/dataplane/virtio-blk.h  |  29 +++
 hw/dataplane/vring.c   | 362 +++
 hw/dataplane/vring.h   |  62 ++
 hw/virtio-blk.c|  53 +-
 hw/virtio-blk.h|   5 +-
 hw/virtio-pci.c|   4 +
 include/block/block.h  |   9 +
 include/qemu-common.h  |   3 +
 include/qemu/iov.h |  13 ++
 iov.c  |  90 +++--
 qemu-img.c |  10 +-
 tests/test-iov.c   | 150 +++
 trace-events   |   9 +
 27 files changed, 1863 insertions(+), 29 deletions(-)
 create mode 100644 hw/dataplane/Makefile.objs
 create mode 100644 hw/dataplane/event-poll.c
 create mode 100644 hw/dataplane/event-poll.h
 create mode 100644 hw/dataplane/hostmem.c
 create mode 100644 hw/dataplane/hostmem.h
 create mode 100644 hw/dataplane/ioq.c
 create mode 100644 hw/dataplane/ioq.h
 create mode 100644 hw/dataplane/virtio-blk.c
 create mode 100644 hw/dataplane/virtio-blk.h
 create mode 100644 hw/dataplane/vring.c
 create mode 100644 hw/dataplane/vring.h

-- 
1.8.0.2



[Qemu-devel] [PATCH 09/18] iov: add qemu_iovec_concat_iov()

2013-01-02 Thread Stefan Hajnoczi
The qemu_iovec_concat() function copies a subset of a QEMUIOVector.  The
new qemu_iovec_concat_iov() function does the same for an iov/cnt pair.

It is easy to define qemu_iovec_concat() in terms of
qemu_iovec_concat_iov().  The existing code is mostly unchanged, except
for the assertion src->size >= soffset, which cannot be efficiently
checked upfront on an iov/cnt pair.  Instead we assert upon hitting the
end of src with an unsatisfied soffset.
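For checking the offset arithmetic outside QEMU, a self-contained sketch of
the same loop. SimpleIOVector, simple_iovec_add() and
simple_iovec_concat_iov() are hypothetical stand-ins for QEMUIOVector,
qemu_iovec_add() and qemu_iovec_concat_iov(), with a fixed capacity and
MIN() open-coded:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>   /* struct iovec */

#define MAX_IOV 16

/* Hypothetical fixed-capacity stand-in for QEMUIOVector. */
typedef struct {
    struct iovec iov[MAX_IOV];
    int niov;
    size_t size;
} SimpleIOVector;

/* Append one vector entry (pointer and length only; no data is copied). */
static void simple_iovec_add(SimpleIOVector *q, void *base, size_t len)
{
    assert(q->niov < MAX_IOV);
    q->iov[q->niov].iov_base = base;
    q->iov[q->niov].iov_len = len;
    q->niov++;
    q->size += len;
}

/*
 * Append up to @sbytes bytes from src_iov[0..src_cnt) to @dst, skipping the
 * first @soffset bytes. An out-of-range @soffset is only detected after the
 * loop, since an iov/cnt pair carries no precomputed total size.
 */
static void simple_iovec_concat_iov(SimpleIOVector *dst,
                                    const struct iovec *src_iov,
                                    unsigned int src_cnt,
                                    size_t soffset, size_t sbytes)
{
    unsigned int i;
    size_t done;

    for (i = 0, done = 0; done < sbytes && i < src_cnt; i++) {
        if (soffset < src_iov[i].iov_len) {
            size_t len = src_iov[i].iov_len - soffset;

            if (len > sbytes - done) {
                len = sbytes - done;    /* open-coded MIN() */
            }
            simple_iovec_add(dst, (char *)src_iov[i].iov_base + soffset, len);
            done += len;
            soffset = 0;
        } else {
            soffset -= src_iov[i].iov_len;
        }
    }
    assert(soffset == 0); /* offset beyond end of src */
}
```

Passing (size_t)-1 as sbytes takes everything after soffset, matching the
"very large value" convention in the comment. In this sketch the final
assertion also fires if sbytes is 0 while soffset is non-zero, because the
loop body never runs.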

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 include/qemu-common.h |  3 +++
 iov.c | 39 +++
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index 6871cab..2b83de3 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -329,6 +329,9 @@ void qemu_iovec_init_external(QEMUIOVector *qiov, struct iovec *iov, int niov);
 void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len);
 void qemu_iovec_concat(QEMUIOVector *dst,
QEMUIOVector *src, size_t soffset, size_t sbytes);
+void qemu_iovec_concat_iov(QEMUIOVector *dst,
+   struct iovec *src_iov, unsigned int src_cnt,
+   size_t soffset, size_t sbytes);
 void qemu_iovec_destroy(QEMUIOVector *qiov);
 void qemu_iovec_reset(QEMUIOVector *qiov);
 size_t qemu_iovec_to_buf(QEMUIOVector *qiov, size_t offset,
diff --git a/iov.c b/iov.c
index 92ad77b..c0f5c56 100644
--- a/iov.c
+++ b/iov.c
@@ -289,34 +289,49 @@ void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len)
 }
 
 /*
- * Concatenates (partial) iovecs from src to the end of dst.
+ * Concatenates (partial) iovecs from src_iov to the end of dst.
  * It starts copying after skipping `soffset' bytes at the
  * beginning of src and adds individual vectors from src to
  * dst copies up to `sbytes' bytes total, or up to the end
- * of src if it comes first.  This way, it is okay to specify
+ * of src_iov if it comes first.  This way, it is okay to specify
  * very large value for `sbytes' to indicate up to the end
  * of src.
  * Only vector pointers are processed, not the actual data buffers.
  */
-void qemu_iovec_concat(QEMUIOVector *dst,
-   QEMUIOVector *src, size_t soffset, size_t sbytes)
+void qemu_iovec_concat_iov(QEMUIOVector *dst,
+   struct iovec *src_iov, unsigned int src_cnt,
+   size_t soffset, size_t sbytes)
 {
 int i;
 size_t done;
-struct iovec *siov = src->iov;
 assert(dst->nalloc != -1);
-assert(src->size >= soffset);
-for (i = 0, done = 0; done < sbytes && i < src->niov; i++) {
-if (soffset < siov[i].iov_len) {
-size_t len = MIN(siov[i].iov_len - soffset, sbytes - done);
-qemu_iovec_add(dst, siov[i].iov_base + soffset, len);
+for (i = 0, done = 0; done < sbytes && i < src_cnt; i++) {
+if (soffset < src_iov[i].iov_len) {
+size_t len = MIN(src_iov[i].iov_len - soffset, sbytes - done);
+qemu_iovec_add(dst, src_iov[i].iov_base + soffset, len);
 done += len;
 soffset = 0;
 } else {
-soffset -= siov[i].iov_len;
+soffset -= src_iov[i].iov_len;
 }
 }
-/* return done; */
+assert(soffset == 0); /* offset beyond end of src */
+}
+
+/*
+ * Concatenates (partial) iovecs from src to the end of dst.
+ * It starts copying after skipping `soffset' bytes at the
+ * beginning of src and adds individual vectors from src to
+ * dst copies up to `sbytes' bytes total, or up to the end
+ * of src if it comes first.  This way, it is okay to specify
+ * very large value for `sbytes' to indicate up to the end
+ * of src.
+ * Only vector pointers are processed, not the actual data buffers.
+ */
+void qemu_iovec_concat(QEMUIOVector *dst,
+   QEMUIOVector *src, size_t soffset, size_t sbytes)
+{
+qemu_iovec_concat_iov(dst, src->iov, src->niov, soffset, sbytes);
 }
 
 void qemu_iovec_destroy(QEMUIOVector *qiov)
-- 
1.8.0.2




[Qemu-devel] [PATCH 10/18] virtio-blk: restore VirtIOBlkConf->config_wce flag

2013-01-02 Thread Stefan Hajnoczi
Two slightly different versions of a patch to conditionally set
VIRTIO_BLK_F_CONFIG_WCE through the config-wce qdev property have been
applied (ea776abca and eec7f96c2).  David Gibson
da...@gibson.dropbear.id.au noticed that the config-wce
property is broken as a result and fixed it recently.

The fix sets the host_features VIRTIO_BLK_F_CONFIG_WCE bit from a qdev
property.  Unfortunately, the virtio device then has no chance to test
for the presence of the feature bit during virtio_blk_init().

Therefore, reinstate the VirtIOBlkConf->config_wce flag.  Drop the
duplicate qdev property to set the host_features bit.  The
VirtIOBlkConf->config_wce flag will be used by virtio-blk-data-plane in
a later patch.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/virtio-blk.c | 3 +++
 hw/virtio-blk.h | 4 ++--
 hw/virtio-pci.c | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 90cfa24..f004148 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -524,6 +524,9 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features)
 features |= (1 << VIRTIO_BLK_F_BLK_SIZE);
 features |= (1 << VIRTIO_BLK_F_SCSI);
 
+if (s->blk->config_wce) {
+features |= (1 << VIRTIO_BLK_F_CONFIG_WCE);
+}
 if (bdrv_enable_write_cache(s->bs))
 features |= (1 << VIRTIO_BLK_F_WCE);
 
diff --git a/hw/virtio-blk.h b/hw/virtio-blk.h
index 651a000..454f445 100644
--- a/hw/virtio-blk.h
+++ b/hw/virtio-blk.h
@@ -104,10 +104,10 @@ struct VirtIOBlkConf
 BlockConf conf;
 char *serial;
 uint32_t scsi;
+uint32_t config_wce;
 };
 
 #define DEFINE_VIRTIO_BLK_FEATURES(_state, _field) \
-DEFINE_VIRTIO_COMMON_FEATURES(_state, _field), \
-DEFINE_PROP_BIT("config-wce", _state, _field, VIRTIO_BLK_F_CONFIG_WCE, true)
+DEFINE_VIRTIO_COMMON_FEATURES(_state, _field)
 
 #endif
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index d2d2454..3cab783 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -894,6 +894,7 @@ static Property virtio_blk_properties[] = {
 #ifdef __linux__
 DEFINE_PROP_BIT("scsi", VirtIOPCIProxy, blk.scsi, 0, true),
 #endif
+DEFINE_PROP_BIT("config-wce", VirtIOPCIProxy, blk.config_wce, 0, true),
 DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
 DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
 DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features),
-- 
1.8.0.2




[Qemu-devel] [PATCH 11/18] dataplane: add virtio-blk data plane code

2013-01-02 Thread Stefan Hajnoczi
virtio-blk-data-plane is a subset implementation of virtio-blk.  It only
handles read, write, and flush requests.  It does this using a dedicated
thread that executes an epoll(2)-based event loop and processes I/O
using Linux AIO.

This approach performs very well but can be used for raw image files
only.  The number of IOPS achieved has been reported to be several times
higher than the existing virtio-blk implementation.

Eventually it should be possible to unify virtio-blk-data-plane with the
main body of QEMU code once the block layer and hardware emulation are
able to run outside the global mutex.
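For readers unfamiliar with the pattern, a minimal single-threaded sketch of
one epoll(2) dispatch iteration, with an eventfd standing in for the
virtqueue notifier (Linux only). Handler, add_handler(), poll_once() and
on_notify() are hypothetical names, only loosely analogous to the
EventPoll/EventHandler pair in the patch, which additionally runs in its own
thread and handles Linux AIO completions:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Callback invoked when its fd becomes readable. */
typedef void EventCallback(int fd, void *opaque);

typedef struct {
    int fd;
    EventCallback *cb;
    void *opaque;
} Handler;

/* Register @h on epoll instance @epfd for readability. */
static int add_handler(int epfd, Handler *h)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = h };

    assert(h->cb != NULL);
    return epoll_ctl(epfd, EPOLL_CTL_ADD, h->fd, &ev);
}

/* One loop iteration: wait, then dispatch each ready fd to its handler.
 * Returns the number of handlers run, 0 on timeout, or -1 on error. */
static int poll_once(int epfd, int timeout_ms)
{
    struct epoll_event events[8];
    int n = epoll_wait(epfd, events, 8, timeout_ms);

    for (int i = 0; i < n; i++) {
        Handler *h = events[i].data.ptr;
        h->cb(h->fd, h->opaque);
    }
    return n;
}

/* Example callback: drain the eventfd counter and accumulate it. */
static void on_notify(int fd, void *opaque)
{
    uint64_t value;

    if (read(fd, &value, sizeof(value)) == (ssize_t)sizeof(value)) {
        *(uint64_t *)opaque += value;
    }
}
```

A dedicated I/O thread would simply call poll_once() in a loop until asked
to stop.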

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/dataplane/Makefile.objs |   2 +-
 hw/dataplane/virtio-blk.c  | 465 +
 hw/dataplane/virtio-blk.h  |  29 +++
 hw/virtio-blk.h|   1 +
 trace-events   |   6 +
 5 files changed, 502 insertions(+), 1 deletion(-)
 create mode 100644 hw/dataplane/virtio-blk.c
 create mode 100644 hw/dataplane/virtio-blk.h

diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
index abd408f..682aa9e 100644
--- a/hw/dataplane/Makefile.objs
+++ b/hw/dataplane/Makefile.objs
@@ -1,3 +1,3 @@
 ifeq ($(CONFIG_VIRTIO), y)
-common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o ioq.o
+common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o ioq.o virtio-blk.o
 endif
diff --git a/hw/dataplane/virtio-blk.c b/hw/dataplane/virtio-blk.c
new file mode 100644
index 000..4c4ad84
--- /dev/null
+++ b/hw/dataplane/virtio-blk.c
@@ -0,0 +1,465 @@
+/*
+ * Dedicated thread for virtio-blk I/O processing
+ *
+ * Copyright 2012 IBM, Corp.
+ * Copyright 2012 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Stefan Hajnoczi stefa...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "trace.h"
+#include "qemu/iov.h"
+#include "event-poll.h"
+#include "qemu/thread.h"
+#include "vring.h"
+#include "ioq.h"
+#include "migration/migration.h"
+#include "hw/virtio-blk.h"
+#include "hw/dataplane/virtio-blk.h"
+
+enum {
+SEG_MAX = 126,  /* maximum number of I/O segments */
+VRING_MAX = SEG_MAX + 2,/* maximum number of vring descriptors */
+REQ_MAX = VRING_MAX,/* maximum number of requests in the vring,
+ * is VRING_MAX / 2 with traditional and
+ * VRING_MAX with indirect descriptors */
+};
+
+typedef struct {
+struct iocb iocb;   /* Linux AIO control block */
+QEMUIOVector *inhdr;/* iovecs for virtio_blk_inhdr */
+unsigned int head;  /* vring descriptor index */
+} VirtIOBlockRequest;
+
+struct VirtIOBlockDataPlane {
+bool started;
+QEMUBH *start_bh;
+QemuThread thread;
+
+VirtIOBlkConf *blk;
+int fd; /* image file descriptor */
+
+VirtIODevice *vdev;
+Vring vring;/* virtqueue vring */
+EventNotifier *guest_notifier;  /* irq */
+
+EventPoll event_poll;   /* event poller */
+EventHandler io_handler;/* Linux AIO completion handler */
+EventHandler notify_handler;/* virtqueue notify handler */
+
+IOQueue ioqueue;/* Linux AIO queue (should really be per
+   dataplane thread) */
+VirtIOBlockRequest requests[REQ_MAX]; /* pool of requests, managed by the
+ queue */
+
+unsigned int num_reqs;
+
+Error *migration_blocker;
+};
+
+/* Raise an interrupt to signal guest, if necessary */
+static void notify_guest(VirtIOBlockDataPlane *s)
+{
+if (!vring_should_notify(s->vdev, &s->vring)) {
+return;
+}
+
+event_notifier_set(s->guest_notifier);
+}
+
+static void complete_request(struct iocb *iocb, ssize_t ret, void *opaque)
+{
+VirtIOBlockDataPlane *s = opaque;
+VirtIOBlockRequest *req = container_of(iocb, VirtIOBlockRequest, iocb);
+struct virtio_blk_inhdr hdr;
+int len;
+
+if (likely(ret >= 0)) {
+hdr.status = VIRTIO_BLK_S_OK;
+len = ret;
+} else {
+hdr.status = VIRTIO_BLK_S_IOERR;
+len = 0;
+}
+
+trace_virtio_blk_data_plane_complete_request(s, req->head, ret);
+
+qemu_iovec_from_buf(req->inhdr, 0, &hdr, sizeof(hdr));
+qemu_iovec_destroy(req->inhdr);
+g_slice_free(QEMUIOVector, req->inhdr);
+
+/* According to the virtio specification len should be the number of bytes
+ * written to, but for virtio-blk it seems to be the number of bytes
+ * transferred plus the status bytes.
+ */
+vring_push(&s->vring, req->head, len + sizeof(hdr));
+
+s->num_reqs--;
+}
+
+static void complete_request_early(VirtIOBlockDataPlane *s, unsigned int head,
+   QEMUIOVector *inhdr, unsigned char status)
+{

Re: [Qemu-devel] [PATCH 1/2] target-i386: kvm: -cpu host: use GET_SUPPORTED_CPUID for SVM features

2013-01-02 Thread Eduardo Habkost
On Wed, Jan 02, 2013 at 03:39:03PM +0100, Andreas Färber wrote:
 Am 28.12.2012 19:37, schrieb Eduardo Habkost:
  The existing -cpu host code simply set every bit inside svm_features
  (initializing it to -1), and that makes it impossible to make the
  enforce/check options work properly when the user asks for SVM features
  explicitly in the command-line.
  
  So, instead of initializing svm_features to -1, use GET_SUPPORTED_CPUID
  to fill only the bits that are supported by the host (just like we do
  for all other CPUID feature words inside kvm_cpu_fill_host()).
  
  This will keep the existing behavior (as filter_features_for_kvm()
  already uses GET_SUPPORTED_CPUID to filter svm_features), but will allow
  us to properly check for KVM features inside
  kvm_check_features_against_host() later.
  
  For example, we will be able to make this:
  
$ qemu-system-x86_64 -cpu ...,+pfthreshold,enforce
  
  refuse to start if the SVM pfthreshold feature is not supported by the
  host (after we fix kvm_check_features_against_host() to check SVM flags
  as well).
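Why an all-ones svm_features defeats such a check can be shown with a tiny
sketch: an "enforce"-style test reports requested bits missing from the host
mask, so a mask of -1 never reports anything. unsupported_features() and
filter_features() are hypothetical helpers, not the functions in
target-i386/cpu.c:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the requested feature bits that @host_supported lacks.
 * A non-zero result is what an "enforce" option would refuse to start on.
 */
static uint32_t unsupported_features(uint32_t requested, uint32_t host_supported)
{
    return requested & ~host_supported;
}

/* Keep only the requested bits the host supports (filtering, not enforcing). */
static uint32_t filter_features(uint32_t requested, uint32_t host_supported)
{
    return requested & host_supported;
}
```

With the mask filled from GET_SUPPORTED_CPUID, host_supported would hold only
the bits KVM reports, so a requested bit the host lacks would show up as a
non-zero result.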
  
  Signed-off-by: Eduardo Habkost ehabk...@redhat.com
  ---
   target-i386/cpu.c | 11 ---
   1 file changed, 4 insertions(+), 7 deletions(-)
  
  diff --git a/target-i386/cpu.c b/target-i386/cpu.c
  index 3cd1cee..6e2d32d 100644
  --- a/target-i386/cpu.c
  +++ b/target-i386/cpu.c
  @@ -897,13 +897,10 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
   }
   }
   
  -/*
  - * Every SVM feature requires emulation support in KVM - so we can't just
  - * read the host features here. KVM might even support SVM features not
  - * available on the host hardware. Just set all bits and mask out the
  - * unsupported ones later.
  - */
  -x86_cpu_def->svm_features = -1;
  +/* Other KVM-specific feature fields: */
  +x86_cpu_def->svm_features =
  +kvm_arch_get_supported_cpuid(s, 0x8000000A, 0, R_EDX);
 
 Is there no #define for this, similar to KVM_CPUID_FEATURES in 2/2?

I believe KVM_CPUID_FEATURES is the exception; all other leaves have
their own numbers hardcoded in the code.

(The way I plan to fix this is to introduce the feature-array and a
CPUID leaf/register table for the feature-array, so kvm_cpu_fill_host(),
filter_features_for_kvm(), kvm_check_features_against_host() and similar
functions would always handle exactly the same set of CPUID leaves, by
simply looking at the table).
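A rough sketch of what such a table-driven approach could look like.
FeatureWord, FeatureWordInfo, fill_host_features() and the test stub are
hypothetical names, not code from any series; the leaf/register pairs are the
standard CPUID locations, with 0x40000001 being KVM_CPUID_FEATURES:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { REG_EAX, REG_EBX, REG_ECX, REG_EDX } CpuidReg;

typedef enum {
    FEAT_1_EDX,          /* CPUID[1].EDX */
    FEAT_1_ECX,          /* CPUID[1].ECX */
    FEAT_8000_0001_EDX,  /* CPUID[0x80000001].EDX */
    FEAT_8000_0001_ECX,  /* CPUID[0x80000001].ECX */
    FEAT_SVM,            /* CPUID[0x8000000A].EDX */
    FEAT_KVM,            /* CPUID[KVM_CPUID_FEATURES].EAX */
    FEATURE_WORDS
} FeatureWord;

typedef struct {
    uint32_t leaf;
    CpuidReg reg;
} FeatureWordInfo;

/* One table describing where every feature word lives. */
static const FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
    [FEAT_1_EDX]         = { 0x00000001, REG_EDX },
    [FEAT_1_ECX]         = { 0x00000001, REG_ECX },
    [FEAT_8000_0001_EDX] = { 0x80000001, REG_EDX },
    [FEAT_8000_0001_ECX] = { 0x80000001, REG_ECX },
    [FEAT_SVM]           = { 0x8000000A, REG_EDX },
    [FEAT_KVM]           = { 0x40000001, REG_EAX },
};

/* Stand-in type for a GET_SUPPORTED_CPUID query (hypothetical signature). */
typedef uint32_t (*SupportedCpuidFn)(uint32_t leaf, CpuidReg reg);

/* Fill every feature word from the table in one loop, so that fill,
 * filter and check code would all cover the same set of leaves. */
static void fill_host_features(uint32_t words[FEATURE_WORDS],
                               SupportedCpuidFn get_supported)
{
    assert(get_supported != NULL);
    for (int w = 0; w < FEATURE_WORDS; w++) {
        words[w] = get_supported(feature_word_info[w].leaf,
                                 feature_word_info[w].reg);
    }
}

/* Test stub standing in for a real GET_SUPPORTED_CPUID query. */
static uint32_t stub_supported(uint32_t leaf, CpuidReg reg)
{
    return leaf + (uint32_t)reg;
}
```

Filtering and checking functions could then iterate over the same table with
the same loop shape, instead of each listing the leaves by hand.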


 FWIW indentation looks odd.

Oops, I intended to follow the existing style used for ext2_features and
ext3_features and use 8 spaces instead of 12. I will resubmit.

-- 
Eduardo



Re: [Qemu-devel] [PATCH 2/2] target-i386: kvm: enable all supported KVM features for -cpu host

2013-01-02 Thread Eduardo Habkost
On Wed, Jan 02, 2013 at 03:52:45PM +0100, Igor Mammedov wrote:
 On Fri, 28 Dec 2012 16:37:34 -0200
 Eduardo Habkost ehabk...@redhat.com wrote:
 
  When using -cpu host, we don't need to use the kvm_default_features
  variable, as the user is explicitly asking QEMU to enable all features
  supported by the host.
  
  This changes the kvm_cpu_fill_host() code to use GET_SUPPORTED_CPUID to
  initialize the kvm_features field, so we get all host KVM features
  enabled.
 
 1_2 and 1_3 compat machines differ on the pv_eoi flag; with this patch 1_2 might
 have it set.
 Is that OK from the compat machines' point of view?

-cpu host is completely dependent on host hardware and kernel version,
there are no compatibility expectations.

 
  
  This will also allow us to properly check/enforce KVM features inside
  kvm_check_features_against_host() later. For example, we will be able to
  make this:
  
$ qemu-system-x86_64 -cpu ...,+kvm_pv_eoi,enforce
  
  refuse to start if kvm_pv_eoi is not supported by the host (after we fix
  kvm_check_features_against_host() to check KVM flags as well).
 It would be nice to have kvm_check_features_against_host() patch in this
 series to verify that this patch and previous patch works as expected.

The kvm_check_features_against_host() change would be a user-visible
behavior change, and I wanted to keep the changes minimal for now (the
main reason I submitted this earlier is to make it easier to clean up
the init code for CPU subclasses).

I was planning to introduce those behavior changes only after
introducing the feature-word array, so the kvm_check_features_against_host()
code can become simpler and easier to review (instead of adding 4
additional items to the messy struct model_features_t array). But if you
think we can introduce those changes now, I will be happy to send a
series that changes that code as well.
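The enforce behavior discussed in this thread (refusing to start when, for example, `+kvm_pv_eoi,enforce` asks for something the host lacks) comes down to a per-word mask comparison. A minimal sketch, assuming a hypothetical `check_feature_word()` helper rather than the real `kvm_check_features_against_host()`; bit 6 is used for PV EOI because that is where Linux defines `KVM_FEATURE_PV_EOI`:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bits the user asked for that the host cannot provide. */
uint32_t missing_bits(uint32_t requested, uint32_t supported)
{
    return requested & ~supported;
}

/*
 * Hypothetical check for one feature word: warn about every missing bit and
 * return false only when "enforce" was given and something is missing.
 */
bool check_feature_word(const char *word_name, uint32_t requested,
                        uint32_t supported, bool enforce)
{
    uint32_t missing = missing_bits(requested, supported);

    for (int bit = 0; bit < 32; bit++) {
        if (missing & (UINT32_C(1) << bit)) {
            fprintf(stderr, "warning: host lacks requested %s bit %d\n",
                    word_name, bit);
        }
    }
    return !(enforce && missing != 0);
}
```

Without `enforce` the same mismatch only produces warnings, which matches the distinction the thread draws between the current behavior and the planned check.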

 
  
  Signed-off-by: Eduardo Habkost ehabk...@redhat.com
  ---
   target-i386/cpu.c | 2 ++
   1 file changed, 2 insertions(+)
  
  diff --git a/target-i386/cpu.c b/target-i386/cpu.c
  index 6e2d32d..76f19f0 100644
  --- a/target-i386/cpu.c
  +++ b/target-i386/cpu.c
  @@ -900,6 +900,8 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
   /* Other KVM-specific feature fields: */
   x86_cpu_def->svm_features =
   kvm_arch_get_supported_cpuid(s, 0x800A, 0, R_EDX);
  +x86_cpu_def->kvm_features =
  +kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
   #endif /* CONFIG_KVM */
   }
 

-- 
Eduardo



[Qemu-devel] [PATCH 08/18] test-iov: add iov_discard_front/back() testcases

2013-01-02 Thread Stefan Hajnoczi
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 tests/test-iov.c | 150 +++
 1 file changed, 150 insertions(+)
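The testcases in this patch pin down the contract of `iov_discard_front()`: it advances the caller's iovec pointer and count past fully discarded elements, trims a partially discarded element in place, and returns the number of bytes actually discarded. A minimal sketch of that contract, written from the assertions rather than copied from QEMU, under the hypothetical name `sketch_iov_discard_front()`:

```c
#include <stddef.h>
#include <sys/uio.h>

size_t sketch_iov_discard_front(struct iovec **iov, unsigned int *iov_cnt,
                                size_t bytes)
{
    size_t total = 0;
    struct iovec *cur = *iov;

    while (*iov_cnt > 0) {
        if (cur->iov_len > bytes) {
            /* Partial discard: trim this element in place and stop. */
            cur->iov_base = (char *)cur->iov_base + bytes;
            cur->iov_len -= bytes;
            total += bytes;
            break;
        }
        /* Whole element discarded: skip past it. */
        bytes -= cur->iov_len;
        total += cur->iov_len;
        cur++;
        (*iov_cnt)--;
    }
    *iov = cur;
    return total;
}
```

Because the first element's `iov_base` is modified in place, the tests restore `old_base` before freeing, as in the "undo before g_free()" lines.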

diff --git a/tests/test-iov.c b/tests/test-iov.c
index a480bc8..46e4ddd 100644
--- a/tests/test-iov.c
+++ b/tests/test-iov.c
@@ -250,11 +250,161 @@ static void test_io(void)
 #endif
 }
 
+static void test_discard_front(void)
+{
+struct iovec *iov;
+struct iovec *iov_tmp;
+unsigned int iov_cnt;
+unsigned int iov_cnt_tmp;
+void *old_base;
+size_t size;
+size_t ret;
+
+/* Discard zero bytes */
+iov_random(&iov, &iov_cnt);
+iov_tmp = iov;
+iov_cnt_tmp = iov_cnt;
+ret = iov_discard_front(&iov_tmp, &iov_cnt_tmp, 0);
+g_assert(ret == 0);
+g_assert(iov_tmp == iov);
+g_assert(iov_cnt_tmp == iov_cnt);
+iov_free(iov, iov_cnt);
+
+/* Discard more bytes than vector size */
+iov_random(&iov, &iov_cnt);
+iov_tmp = iov;
+iov_cnt_tmp = iov_cnt;
+size = iov_size(iov, iov_cnt);
+ret = iov_discard_front(&iov_tmp, &iov_cnt_tmp, size + 1);
+g_assert(ret == size);
+g_assert(iov_cnt_tmp == 0);
+iov_free(iov, iov_cnt);
+
+/* Discard entire vector */
+iov_random(&iov, &iov_cnt);
+iov_tmp = iov;
+iov_cnt_tmp = iov_cnt;
+size = iov_size(iov, iov_cnt);
+ret = iov_discard_front(&iov_tmp, &iov_cnt_tmp, size);
+g_assert(ret == size);
+g_assert(iov_cnt_tmp == 0);
+iov_free(iov, iov_cnt);
+
+/* Discard within first element */
+iov_random(&iov, &iov_cnt);
+iov_tmp = iov;
+iov_cnt_tmp = iov_cnt;
+old_base = iov->iov_base;
+size = g_test_rand_int_range(1, iov->iov_len);
+ret = iov_discard_front(&iov_tmp, &iov_cnt_tmp, size);
+g_assert(ret == size);
+g_assert(iov_tmp == iov);
+g_assert(iov_cnt_tmp == iov_cnt);
+g_assert(iov_tmp->iov_base == old_base + size);
+iov_tmp->iov_base = old_base; /* undo before g_free() */
+iov_free(iov, iov_cnt);
+
+/* Discard entire first element */
+iov_random(&iov, &iov_cnt);
+iov_tmp = iov;
+iov_cnt_tmp = iov_cnt;
+ret = iov_discard_front(&iov_tmp, &iov_cnt_tmp, iov->iov_len);
+g_assert(ret == iov->iov_len);
+g_assert(iov_tmp == iov + 1);
+g_assert(iov_cnt_tmp == iov_cnt - 1);
+iov_free(iov, iov_cnt);
+
+/* Discard within second element */
+iov_random(&iov, &iov_cnt);
+iov_tmp = iov;
+iov_cnt_tmp = iov_cnt;
+old_base = iov[1].iov_base;
+size = iov->iov_len + g_test_rand_int_range(1, iov[1].iov_len);
+ret = iov_discard_front(&iov_tmp, &iov_cnt_tmp, size);
+g_assert(ret == size);
+g_assert(iov_tmp == iov + 1);
+g_assert(iov_cnt_tmp == iov_cnt - 1);
+g_assert(iov_tmp->iov_base == old_base + (size - iov->iov_len));
+iov_tmp->iov_base = old_base; /* undo before g_free() */
+iov_free(iov, iov_cnt);
+}
+
+static void test_discard_back(void)
+{
+struct iovec *iov;
+unsigned int iov_cnt;
+unsigned int iov_cnt_tmp;
+void *old_base;
+size_t size;
+size_t ret;
+
+/* Discard zero bytes */
+iov_random(&iov, &iov_cnt);
+iov_cnt_tmp = iov_cnt;
+ret = iov_discard_back(iov, &iov_cnt_tmp, 0);
+g_assert(ret == 0);
+g_assert(iov_cnt_tmp == iov_cnt);
+iov_free(iov, iov_cnt);
+
+/* Discard more bytes than vector size */
+iov_random(&iov, &iov_cnt);
+iov_cnt_tmp = iov_cnt;
+size = iov_size(iov, iov_cnt);
+ret = iov_discard_back(iov, &iov_cnt_tmp, size + 1);
+g_assert(ret == size);
+g_assert(iov_cnt_tmp == 0);
+iov_free(iov, iov_cnt);
+
+/* Discard entire vector */
+iov_random(&iov, &iov_cnt);
+iov_cnt_tmp = iov_cnt;
+size = iov_size(iov, iov_cnt);
+ret = iov_discard_back(iov, &iov_cnt_tmp, size);
+g_assert(ret == size);
+g_assert(iov_cnt_tmp == 0);
+iov_free(iov, iov_cnt);
+
+/* Discard within last element */
+iov_random(&iov, &iov_cnt);
+iov_cnt_tmp = iov_cnt;
+old_base = iov[iov_cnt - 1].iov_base;
+size = g_test_rand_int_range(1, iov[iov_cnt - 1].iov_len);
+ret = iov_discard_back(iov, &iov_cnt_tmp, size);
+g_assert(ret == size);
+g_assert(iov_cnt_tmp == iov_cnt);
+g_assert(iov[iov_cnt - 1].iov_base == old_base);
+iov_free(iov, iov_cnt);
+
+/* Discard entire last element */
+iov_random(&iov, &iov_cnt);
+iov_cnt_tmp = iov_cnt;
+old_base = iov[iov_cnt - 1].iov_base;
+size = iov[iov_cnt - 1].iov_len;
+ret = iov_discard_back(iov, &iov_cnt_tmp, size);
+g_assert(ret == size);
+g_assert(iov_cnt_tmp == iov_cnt - 1);
+iov_free(iov, iov_cnt);
+
+/* Discard within second-to-last element */
+iov_random(&iov, &iov_cnt);
+iov_cnt_tmp = iov_cnt;
+old_base = iov[iov_cnt - 2].iov_base;
+size = iov[iov_cnt - 1].iov_len +
+   g_test_rand_int_range(1, iov[iov_cnt - 2].iov_len);
+ret = iov_discard_back(iov, &iov_cnt_tmp, size);
+g_assert(ret == size);
+g_assert(iov_cnt_tmp == iov_cnt - 1);
+g_assert(iov[iov_cnt - 2].iov_base == 

Re: [Qemu-devel] [PATCH] pci-assign: Enable MSIX on device to match guest

2013-01-02 Thread Alex Williamson
On Fri, 2012-12-21 at 08:46 -0700, Alex Williamson wrote:
 On Fri, 2012-12-21 at 14:17 +0200, Michael S. Tsirkin wrote:
  On Thu, Dec 20, 2012 at 03:15:38PM -0700, Alex Williamson wrote:
   On Thu, 2012-12-20 at 18:38 +0200, Michael S. Tsirkin wrote:
On Thu, Dec 20, 2012 at 09:05:50AM -0700, Alex Williamson wrote:
 When a guest enables MSIX on a device we evaluate the MSIX vector
 table, typically find no unmasked vectors and don't switch the device
 to MSIX mode.  This generally works fine and the device will be
 switched once the guest enables and therefore unmasks a vector.
 Unfortunately some drivers enable MSIX, then use interfaces to send
 commands between VF <-> PF or PF <-> firmware that act based on the host
 state of the device.  These therefore break when MSIX is managed
 lazily.  This change re-enables the previous test used to enable MSIX
 (see qemu-kvm a6b402c9), which basically guesses whether a vector
 will be used based on the data field of the vector table.
 
 Cc: qemu-sta...@nongnu.org
 Signed-off-by: Alex Williamson alex.william...@redhat.com

Same question: can't we enable and mask MSIX through config sysfs?
In this case it can be done in userspace ...
   
   In this case userspace could do this, but I think it's still incredibly
   dangerous.  Kernel space drivers can also directly enable MSI-X on a
   device, but you might get shot for writing one that did.
  
  What would be the reason for the kernel driver to do this?
 
 Maybe they don't know how many vectors to use until they enable MSI-X
 and query some firmware interface.  It's a hypothetical situation, I'm
 just trying to illustrate that if a kernel driver did want to do this,
 they'd have to develop interfaces to allow it, not just manually poke
 their MSI-X enable bit.
 
   We should follow the rules, play by the existing kernel interfaces, and
   work to eventually improve those interfaces.  Thanks,
   
   Alex
  
  I'm not against adding an interface for this long term but we have
  existing kernels to support too.  IMHO it would be nicer than
  the data hack which relies on non-documented guest behaviour
  that might change without warning in the future.
 
 We've unwittingly used the data hack for years and only ripped it out
 because it was undocumented.  The patch below adds documentation for it,
 so at least we have a more clear understanding of why it was there if we
 want to try to rip it out again.  This fully supports existing kernels
 and as I mention below, we might be able to do better with limiting how
 many vectors we enable, but I think this is the right initial fix and the
 right fix for stable, and we can continue to experiment from here.

Happy new year.  I'd like to close on this as we do currently have a
regression for devices that cannot handle MSI-X being lazily enabled.
The option here is to document and revert to the old style
initialization behavior where we look at the data field of the vector to
get a hint whether the guest intends to make use of the vector.  This
gives us the same behavior as we had previously, but still allows
vectors to be added, so we maintain the current FreeBSD support.  This
much needs to go to stable.

For the development tree, I think we can do better.  Using the data
field is not 100% reliable in giving us the number of vectors the guest
actually intends to use.  Instead we'd like to enable MSI-X with no
vectors and add vectors as the guest unmasks them.  The host Linux MSI
API currently doesn't allow this, so I think the next best thing is to
enable MSI-X with a single vector in the case where MSI-X is enabled but
no vectors are unmasked.  This conserves vectors on the host though we
do potentially allow spurious interrupts through the enabled vector
(though we previously enabled multiple vectors using the above data
method without problems).
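Both initialization policies in this thread can be sketched over a raw MSI-X table. The 16-byte entry layout (address low/high, data, vector control with the per-vector mask in bit 0) follows the PCI specification; the function names are hypothetical, and field values are assumed to be in host byte order already. `count_vectors_by_data()` is the old data-field guess, and `vectors_to_enable()` is the proposed development-tree policy of enabling the unmasked vectors but never fewer than one:

```c
#include <stdint.h>

/* One MSI-X table entry as laid out by the PCI specification (16 bytes). */
typedef struct {
    uint32_t addr_lo;
    uint32_t addr_hi;
    uint32_t data;
    uint32_t ctrl;            /* vector control; bit 0 is the per-vector mask */
} MsixTableEntry;

#define MSIX_CTRL_MASKBIT 0x1u

/* Old heuristic: assume a vector will be used if its data field is nonzero. */
unsigned int count_vectors_by_data(const MsixTableEntry *table, unsigned int n)
{
    unsigned int count = 0;

    for (unsigned int i = 0; i < n; i++) {
        if (table[i].data != 0) {
            count++;
        }
    }
    return count;
}

unsigned int count_unmasked_vectors(const MsixTableEntry *table,
                                    unsigned int n)
{
    unsigned int count = 0;

    for (unsigned int i = 0; i < n; i++) {
        if ((table[i].ctrl & MSIX_CTRL_MASKBIT) == 0) {
            count++;
        }
    }
    return count;
}

/* Proposed policy: follow the guest's unmasked vectors, minimum of one. */
unsigned int vectors_to_enable(const MsixTableEntry *table, unsigned int n)
{
    unsigned int unmasked = count_unmasked_vectors(table, n);

    return unmasked > 0 ? unmasked : 1;
}
```

The single-vector floor is what keeps the physical device in MSI-X mode while the guest has everything masked, at the cost of possibly letting a spurious interrupt through that vector.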

The alternative that you're proposing to this longer term solution is to
manually mask all vectors in the physical MSI-X vector table from
userspace then manually enable MSI-X on the physical device (through
pci-sysfs resource and config access respectively).  This puts the
physical device in a state that better matches the guest view of the
device, but I'm doubtful that the risk is worth the reward.  This adds
a new state to the qemu MSI-X model where we have entirely host kernel
managed physical IRQ state, except for this.  It also creates a
synchronization problem that the physical device moves to a new
interrupt state outside of the control of the host kernel, possibly
bypassing any quirks for the host platform.

Another option is to modify the host MSI API to allow the interface we
want, splitting enabling MSI-X from vector allocation.  That of course
has a much longer lead time. 

We can certainly continue the discussion on this, but we need a fix for
stable and I don't think either of these longer term methods are known
to have the reliability or simplicity of 

Re: [Qemu-devel] [PATCH] qemu-jeos: Update .gitmodules

2013-01-02 Thread Stefan Hajnoczi
On Wed, Dec 19, 2012 at 12:54:25AM +0100, Andreas Färber wrote:
 sources.redhat.com is timing out, use sourceware.org URL instead.
 
 Signed-off-by: Andreas Färber afaer...@suse.de
 ---
  .gitmodules |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com



Re: [Qemu-devel] [Bug 1025244] Re: qcow2 image increasing disk size above the virtual limit

2013-01-02 Thread Stefan Hajnoczi
On Tue, Dec 18, 2012 at 10:18:20AM -, Andy Menzel wrote:
 Any solution right now? I have a similar problem like Todor Andreev;
 Our daily backup of some virtual machines (qcow2) looks like that:
 
 1. shutdown the VM
 2. create a snapshot via: qemu-img snapshot -c nameofsnapshot...
 3. boot the VM
 4. backup the snapshot to another virtual disk via: qemu-img convert  -f 
 qcow2 -O qcow2 -s nameofsnapshot...
 5. DELETE the snapshot from VM via: qemu-img snapshot -d nameofsnapshot...

It's not safe to modify the qcow2 file while the guest is running.  This
means Step 5 is not really safe and could result in an inconsistent
image.

This may also be causing the problem: the QEMU process has a variable
with the next free cluster index.  Since Step 5 runs as a separate
process it does not update the QEMU process' next free cluster index
variable.  QEMU doesn't know that there are now free clusters within the
image file because you updated the file behind QEMU's back - the result
is that it grows the file.
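The stale-hint effect can be reproduced with a simplified model: one shared refcount array, and one cached next-free-cluster index per process. This is not qcow2's code; `Image`, `Allocator`, `alloc_cluster()` and `free_cluster()` are hypothetical, and only the allocator that frees a cluster learns about the hole:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CLUSTERS 64

/* Simplified image: a refcount per cluster plus the file length. */
typedef struct {
    uint16_t refcount[MAX_CLUSTERS];
    size_t nb_clusters;           /* current file size in clusters */
} Image;

/* Per-process state: a cached "next free cluster index". */
typedef struct {
    Image *img;
    size_t free_index;
} Allocator;

/* Returns the allocated cluster, or (size_t)-1 if the model is full. */
size_t alloc_cluster(Allocator *a)
{
    Image *img = a->img;
    size_t i = a->free_index;

    /* Search only from the cached hint upward. */
    while (i < img->nb_clusters && img->refcount[i] != 0) {
        i++;
    }
    if (i == img->nb_clusters) {
        if (img->nb_clusters == MAX_CLUSTERS) {
            return (size_t)-1;
        }
        img->nb_clusters++;       /* nothing free past the hint: grow */
    }
    img->refcount[i] = 1;
    a->free_index = i + 1;
    return i;
}

/* Frees a cluster; only the allocator passed in lowers its hint. */
void free_cluster(Allocator *a, size_t i)
{
    a->img->refcount[i] = 0;
    if (i < a->free_index) {
        a->free_index = i;
    }
}
```

In this model a process that starts fresh has `free_index` at 0, so freeing clusters while the VM is shut down lets the next QEMU run find and reuse them.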

Please try deleting the last backup snapshot between Step 1 and Step 2.
This way you'll free the space while QEMU isn't accessing the image
file.  When you boot up the image file again QEMU should reuse the freed
clusters.

Stefan

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1025244

Title:
  qcow2 image increasing disk size above the virtual limit

Status in QEMU:
  New
Status in “qemu-kvm” package in Ubuntu:
  Triaged

Bug description:
  Using qemu/kvm, qcow2 images, ext4 file systems on both guest and host
   Host and Guest: Ubuntu server 12.04 64bit
  To create an image I did this:

  qemu-img create -f qcow2 -o preallocation=metadata ubuntu-pdc-vda.img 10737418240
  (not sure about the exact bytes, but around this)
  ls -l ubuntu-pdc-vda.img
  fallocate -l theSizeInBytesFromAbove ubuntu-pdc-vda.img

  The problem is that the image is growing progressively and has
  obviously no limit, although I gave it one. The root filesystem's
  image is the same case:

  qemu-img info ubuntu-pdc-vda.img
   image: ubuntu-pdc-vda.img
   file format: qcow2
   virtual size: 10G (10737418240 bytes)
   disk size: 14G
   cluster_size: 65536

  and for confirmation:
   du -sh ubuntu-pdc-vda.img
   15G ubuntu-pdc-vda.img

  I made a test and saw that when I delete something from the guest, the real
  size of the image is not decreasing (I read it is normal). OK, but when I write
  something again, it doesn't use the freed space, but instead grows the image.
  So for example:
   1. The initial physical size of the image is 1GB.
   2. I copy 1GB of data in the guest. Its physical size becomes 2GB.
   3. I delete this data (1GB). The physical size of the image remains 2GB.
   4. I copy another 1GB of data to the guest.
   5. The physical size of the image becomes 3GB.
   6. And so on with no limit. It doesn't care if the virtual size is less.

  Is this normal - the real/physical size of the image to be larger than
  the virtual limit???

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1025244/+subscriptions



Re: [Qemu-devel] [PATCH] spice: drop incorrect vm_change_state_handler() opaque

2013-01-02 Thread Stefan Hajnoczi
On Wed, Dec 19, 2012 at 02:07:16PM +0100, Stefan Hajnoczi wrote:
 The spice_server pointer is a global variable and
 vm_change_state_handler() therefore does not use its opaque parameter.
 
 The vm change state handler is added with a pointer to the spice_server
 pointer.  This is useless and we probably would not want 2 levels of
 pointers.
 
 Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
 ---
  ui/spice-core.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)
 
 diff --git a/ui/spice-core.c b/ui/spice-core.c
 index ac46deb..c128c0b 100644
 --- a/ui/spice-core.c
 +++ b/ui/spice-core.c
 @@ -709,7 +709,7 @@ void qemu_spice_init(void)
  qemu_spice_input_init();
  qemu_spice_audio_init();
  
 -qemu_add_vm_change_state_handler(vm_change_state_handler, &spice_server);
 +qemu_add_vm_change_state_handler(vm_change_state_handler, NULL);
  
  g_free(x509_key_file);
  g_free(x509_cert_file);
 @@ -736,8 +736,7 @@ int qemu_spice_add_interface(SpiceBaseInstance *sin)
   */
  spice_server = spice_server_new();
   spice_server_init(spice_server, &core_interface);
 -qemu_add_vm_change_state_handler(vm_change_state_handler,
 - &spice_server);
 +qemu_add_vm_change_state_handler(vm_change_state_handler, NULL);
  }
  
  return spice_server_add_interface(spice_server, sin);
 -- 
 1.8.0.2

Gerd, would you like to take this through the spice queue or should I
put it in trivial-patches?

Stefan



[Qemu-devel] [PATCH 16/18] block/raw-win32: Fix compiler warnings (wrong format specifiers)

2013-01-02 Thread Stefan Hajnoczi
From: Stefan Weil s...@weilnetz.de

Commit fbcad04d6bfdff937536eb23088a01a280a1a3af added fprintf statements
with wrong format specifiers.

GetLastError() returns a DWORD which is unsigned long, so %lu must be used.
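A small illustration of the fix. It assumes a stand-in typedef (`DWORD_SKETCH`) and a hypothetical `format_win32_error()` helper so it builds off Windows; on Windows the real `DWORD` comes from `<windows.h>`. The point is only that the conversion specifier has to match `unsigned long`:

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the Win32 DWORD typedef (unsigned long). */
typedef unsigned long DWORD_SKETCH;

/* Format an error line the way the patched fprintf() calls do. */
int format_win32_error(char *buf, size_t len, const char *what,
                       DWORD_SKETCH err)
{
    /* %lu matches unsigned long; %d would be a format/type mismatch. */
    return snprintf(buf, len, "%s error: %lu\n", what, err);
}
```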

Signed-off-by: Stefan Weil s...@weilnetz.de
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/raw-win32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/raw-win32.c b/block/raw-win32.c
index f58334b..b89ac19 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -314,11 +314,11 @@ static int raw_truncate(BlockDriverState *bs, int64_t offset)
  */
 dwPtrLow = SetFilePointer(s->hfile, low, &high, FILE_BEGIN);
 if (dwPtrLow == INVALID_SET_FILE_POINTER && GetLastError() != NO_ERROR) {
-fprintf(stderr, "SetFilePointer error: %d\n", GetLastError());
+fprintf(stderr, "SetFilePointer error: %lu\n", GetLastError());
 return -EIO;
 }
 if (SetEndOfFile(s->hfile) == 0) {
-fprintf(stderr, "SetEndOfFile error: %d\n", GetLastError());
+fprintf(stderr, "SetEndOfFile error: %lu\n", GetLastError());
 return -EIO;
 }
 return 0;
-- 
1.8.0.2




Re: [Qemu-devel] How to make TCP/IP applications run on guest OS?

2013-01-02 Thread Stefan Hajnoczi
On Wed, Dec 19, 2012 at 09:54:32PM +0800, GaoYi wrote:
 Hi all,
 
I have bridged the network of the host. There was one br0 and several
 taps on it. When I started up a guest using:
 
  #kvm -hda ubuntu.img -localtime -m 1G  -net nic, -net
 tap,ifname=tap0,script=no
 
The guest can ping to other VMs or physical PCs within the same LAN.
 However, when I tried to communicate with other VMs/PCs using TCP/IP,
 the incoming IP at the receiver side is the same as the host IP instead of
 the VM's IP. How can I configure the network so that
 TCP/IP applications work just as they would on a physical PC?

libvirt/virt-manager can set up the network for you.  I suggest using
them if you're having issues configuring bridging.

There is some basic information here but you'll find specifics if you do
a web search for qemu bridging or similar:
http://wiki.qemu.org/Documentation/Networking

Stefan



[Qemu-devel] [PATCH 13/18] virtio-blk: Return UNSUPP for unknown request types

2013-01-02 Thread Stefan Hajnoczi
From: Alexey Zaytsev alexey.zayt...@gmail.com

Currently, all unknown requests are treated as VIRTIO_BLK_T_IN.
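Why the read case needs an equality test: `VIRTIO_BLK_T_IN` is 0, so a bitwise test such as `type & VIRTIO_BLK_T_IN` can never match, and a plain `else` silently turns every unknown type into a read. A minimal sketch of the dispatch, using the request-type and status values from the virtio-blk specification; `ReqKind`, `classify_request()` and `request_status()` are hypothetical, and the order of the bit tests is an assumption rather than a copy of `hw/virtio-blk.c`:

```c
#include <stdint.h>

/* Request types and status codes from the virtio-blk specification. */
#define VIRTIO_BLK_T_IN       0u
#define VIRTIO_BLK_T_OUT      1u
#define VIRTIO_BLK_T_SCSI_CMD 2u
#define VIRTIO_BLK_T_FLUSH    4u
#define VIRTIO_BLK_T_GET_ID   8u
#define VIRTIO_BLK_T_BARRIER  0x80000000u

#define VIRTIO_BLK_S_OK       0
#define VIRTIO_BLK_S_UNSUPP   2

typedef enum {
    REQ_READ, REQ_WRITE, REQ_FLUSH, REQ_SCSI, REQ_GET_ID, REQ_UNSUPP
} ReqKind;

ReqKind classify_request(uint32_t type)
{
    if (type & VIRTIO_BLK_T_FLUSH) {
        return REQ_FLUSH;
    }
    if (type & VIRTIO_BLK_T_SCSI_CMD) {
        return REQ_SCSI;
    }
    if (type & VIRTIO_BLK_T_GET_ID) {
        return REQ_GET_ID;
    }
    if (type & VIRTIO_BLK_T_OUT) {
        return REQ_WRITE;
    }
    /* T_IN is 0, so it can only be matched by equality (plus barrier). */
    if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
        return REQ_READ;
    }
    return REQ_UNSUPP;
}

/* Dispatch-level status only; real I/O errors are not modeled here. */
int request_status(uint32_t type)
{
    return classify_request(type) == REQ_UNSUPP ? VIRTIO_BLK_S_UNSUPP
                                                : VIRTIO_BLK_S_OK;
}
```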

Signed-off-by: Alexey Zaytsev alexey.zayt...@gmail.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/virtio-blk.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 92c745a..df57b35 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -398,10 +398,14 @@ static void virtio_blk_handle_request(VirtIOBlockReq *req,
 qemu_iovec_init_external(&req->qiov, &req->elem.out_sg[1],
  req->elem.out_num - 1);
 virtio_blk_handle_write(req, mrb);
-} else {
+} else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
+/* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
 qemu_iovec_init_external(&req->qiov, &req->elem.in_sg[0],
  req->elem.in_num - 1);
 virtio_blk_handle_read(req);
+} else {
+virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
+g_free(req);
 }
 }
 
-- 
1.8.0.2




[Qemu-devel] [PATCH 12/18] virtio-blk: add x-data-plane=on|off performance feature

2013-01-02 Thread Stefan Hajnoczi
The virtio-blk-data-plane feature is easy to integrate into
hw/virtio-blk.c.  The data plane can be started and stopped similar to
vhost-net.

Users can take advantage of the virtio-blk-data-plane feature using the
new -device virtio-blk-pci,x-data-plane=on property.

The x-data-plane name was chosen because at this stage the feature is
experimental and likely to see changes in the future.

If the VM configuration does not support virtio-blk-data-plane an error
message is printed.  Although we could fall back to regular virtio-blk,
I prefer the explicit approach since it prompts the user to fix their
configuration if they want the performance benefit of
virtio-blk-data-plane.

Limitations:
 * Only format=raw is supported
 * Live migration is not supported
 * Block jobs, hot unplug, and other operations fail with -EBUSY
 * I/O throttling limits are ignored
 * Only Linux hosts are supported due to Linux AIO usage
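The "fail instead of falling back" choice can be sketched as a validation step that runs before the device is set up. `DataPlaneConfig` and `data_plane_check()` are hypothetical, and only the format and Linux AIO limitations from the list above are modeled:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the settings the limitations depend on. */
typedef struct {
    const char *format;   /* image format, e.g. "raw" or "qcow2" */
    bool linux_aio;       /* host provides Linux AIO */
} DataPlaneConfig;

/*
 * Returns NULL when x-data-plane=on can be honored, else an error message.
 * The caller reports the error rather than quietly using regular
 * virtio-blk, so the user knows to fix the configuration.
 */
const char *data_plane_check(const DataPlaneConfig *conf)
{
    if (conf->format == NULL || strcmp(conf->format, "raw") != 0) {
        return "x-data-plane requires format=raw";
    }
    if (!conf->linux_aio) {
        return "x-data-plane requires Linux AIO";
    }
    return NULL;
}
```

With the real property this would correspond to `-device virtio-blk-pci,x-data-plane=on` failing at startup on a qcow2 drive instead of running without the data plane.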

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/virtio-blk.c | 44 +++-
 hw/virtio-pci.c |  3 +++
 2 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index f004148..92c745a 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -17,6 +17,9 @@
 #include "hw/block-common.h"
 #include "sysemu/blockdev.h"
 #include "virtio-blk.h"
+#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
+#include "hw/dataplane/virtio-blk.h"
+#endif
 #include "scsi-defs.h"
 #ifdef __linux__
 # include <scsi/sg.h>
@@ -33,6 +36,9 @@ typedef struct VirtIOBlock
 VirtIOBlkConf *blk;
 unsigned short sector_mask;
 DeviceState *qdev;
+#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
+VirtIOBlockDataPlane *dataplane;
+#endif
 } VirtIOBlock;
 
 static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
@@ -407,6 +413,16 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 .num_writes = 0,
 };
 
+#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
+/* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
+ * dataplane here instead of waiting for .set_status().
+ */
+if (s->dataplane) {
+virtio_blk_data_plane_start(s->dataplane);
+return;
+}
+#endif
+
 while ((req = virtio_blk_get_request(s))) {
 virtio_blk_handle_request(req, mrb);
 }
@@ -446,8 +462,9 @@ static void virtio_blk_dma_restart_cb(void *opaque, int running,
 {
 VirtIOBlock *s = opaque;
 
-if (!running)
+if (!running) {
 return;
+}
 
 if (!s->bh) {
 s->bh = qemu_bh_new(virtio_blk_dma_restart_bh, s);
@@ -457,6 +474,14 @@ static void virtio_blk_dma_restart_cb(void *opaque, int running,
 
 static void virtio_blk_reset(VirtIODevice *vdev)
 {
+#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
+VirtIOBlock *s = to_virtio_blk(vdev);
+
+if (s->dataplane) {
+virtio_blk_data_plane_stop(s->dataplane);
+}
+#endif
+
 /*
  * This should cancel pending requests, but can't do nicely until there
  * are per-device request lists.
@@ -541,6 +566,12 @@ static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status)
 VirtIOBlock *s = to_virtio_blk(vdev);
 uint32_t features;
 
+#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
+if (s->dataplane && !(status & VIRTIO_CONFIG_S_DRIVER)) {
+virtio_blk_data_plane_stop(s->dataplane);
+}
+#endif
+
 if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
 return;
 }
@@ -638,6 +669,12 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, VirtIOBlkConf *blk)
 s->sector_mask = (s->conf->logical_block_size / BDRV_SECTOR_SIZE) - 1;
 
 s->vq = virtio_add_queue(&s->vdev, 128, virtio_blk_handle_output);
+#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
+if (!virtio_blk_data_plane_create(&s->vdev, blk, &s->dataplane)) {
+virtio_cleanup(&s->vdev);
+return NULL;
+}
+#endif
 
 qemu_add_vm_change_state_handler(virtio_blk_dma_restart_cb, s);
 s->qdev = dev;
@@ -655,6 +692,11 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, VirtIOBlkConf *blk)
 void virtio_blk_exit(VirtIODevice *vdev)
 {
 VirtIOBlock *s = to_virtio_blk(vdev);
+
+#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
+virtio_blk_data_plane_destroy(s->dataplane);
+s->dataplane = NULL;
+#endif
 unregister_savevm(s->qdev, "virtio-blk", s);
 blockdev_mark_auto_del(s->bs);
 virtio_cleanup(vdev);
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 3cab783..82761cf 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -896,6 +896,9 @@ static Property virtio_blk_properties[] = {
 #endif
 DEFINE_PROP_BIT("config-wce", VirtIOPCIProxy, blk.config_wce, 0, true),
 DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
+#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
+DEFINE_PROP_BIT("x-data-plane", VirtIOPCIProxy, blk.data_plane, 0, false),
+#endif
 DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
 DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features),
 DEFINE_PROP_END_OF_LIST(),
-- 
1.8.0.2




Re: [Qemu-devel] [Bug 1025244] Re: qcow2 image increasing disk size above the virtual limit

2013-01-02 Thread Eric Blake
On 01/02/2013 08:50 AM, Stefan Hajnoczi wrote:
 On Tue, Dec 18, 2012 at 10:18:20AM -, Andy Menzel wrote:
 Any solution right now? I have a similar problem like Todor Andreev;
 Our daily backup of some virtual machines (qcow2) looks like that:

 1. shutdown the VM
 2. create a snapshot via: qemu-img snapshot -c nameofsnapshot...
 3. boot the VM
 4. backup the snapshot to another virtual disk via: qemu-img convert  -f 
 qcow2 -O qcow2 -s nameofsnapshot...
 5. DELETE the snapshot from VM via: qemu-img snapshot -d nameofsnapshot...
 
 It's not safe to modify the qcow2 file while the guest is running.  This
 means Step 5 is not really safe and could result in an inconsistent
 image.
 
 This may also be causing the problem: the QEMU process has a variable
 with the next free cluster index.  Since Step 5 runs as a separate
 process it does not update the QEMU process' next free cluster index
 variable.  QEMU doesn't know that there are now free clusters within the
 image file because you updated the file behind QEMU's back - the result
 is that it grows the file.
 
 Please try deleting the last backup snapshot between Step 1 and Step 2.
 This way you'll free the space while QEMU isn't accessing the image
 file.  When you boot up the image file again QEMU should reuse the freed
 clusters.

You might also want to try modifying step 5 to use the HMP delvm monitor
command from within the running qemu rather than going behind qemu's
back with a qemu-img invocation.  That's how libvirt deletes internal
snapshots from a running qemu.

Also, there are patches currently under review that are talking about
creating a QMP counterpart to the delvm monitor command.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [PATCH 04/18] dataplane: add virtqueue vring code

2013-01-02 Thread Stefan Hajnoczi
The virtio-blk-data-plane cannot access memory using the usual QEMU
functions since it executes outside the global mutex and the memory APIs
are not thread-safe at this time.

This patch introduces a virtqueue module based on the kernel's vhost
vring code.  The trick is that we map guest memory ahead of time and
access it cheaply outside the global mutex.

Once the hardware emulation code can execute outside the global mutex it
will be possible to drop this code.
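The "map guest memory ahead of time" idea amounts to a translation table that is built once and then consulted without locks. A minimal sketch, assuming hypothetical `MemRegionSketch`/`GuestMemMapSketch` types and a `guest_mem_lookup()` function; in the patch, `hostmem_lookup()` plays this role, and this is not its implementation:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t guest_addr;   /* guest-physical start of the region */
    uint64_t size;         /* length in bytes */
    void *host_addr;       /* where the region is mapped in the host process */
} MemRegionSketch;

typedef struct {
    const MemRegionSketch *regions;
    size_t nregions;
} GuestMemMapSketch;

/*
 * Translate [addr, addr + len) to a host pointer, or NULL if the range is
 * not fully inside one region.  No locks are taken: the table is assumed
 * to be built up front and left unchanged while the data plane runs.
 */
void *guest_mem_lookup(const GuestMemMapSketch *map, uint64_t addr,
                       uint64_t len)
{
    for (size_t i = 0; i < map->nregions; i++) {
        const MemRegionSketch *r = &map->regions[i];

        /* Overflow-safe containment check. */
        if (addr >= r->guest_addr && len <= r->size &&
            addr - r->guest_addr <= r->size - len) {
            return (char *)r->host_addr + (addr - r->guest_addr);
        }
    }
    return NULL;
}
```

Rejecting ranges that straddle a region boundary keeps the returned pointer safe to use for the whole length, which is what a vring mapping needs.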

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/dataplane/Makefile.objs |   2 +-
 hw/dataplane/vring.c   | 362 +
 hw/dataplane/vring.h   |  62 
 trace-events   |   3 +
 4 files changed, 428 insertions(+), 1 deletion(-)
 create mode 100644 hw/dataplane/vring.c
 create mode 100644 hw/dataplane/vring.h

diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
index 8c8dea1..34e6d57 100644
--- a/hw/dataplane/Makefile.objs
+++ b/hw/dataplane/Makefile.objs
@@ -1,3 +1,3 @@
 ifeq ($(CONFIG_VIRTIO), y)
-common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o
+common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o
 endif
diff --git a/hw/dataplane/vring.c b/hw/dataplane/vring.c
new file mode 100644
index 000..d5d4ef4
--- /dev/null
+++ b/hw/dataplane/vring.c
@@ -0,0 +1,362 @@
+/* Copyright 2012 Red Hat, Inc.
+ * Copyright IBM, Corp. 2012
+ *
+ * Based on Linux 2.6.39 vhost code:
+ * Copyright (C) 2009 Red Hat, Inc.
+ * Copyright (C) 2006 Rusty Russell IBM Corporation
+ *
+ * Author: Michael S. Tsirkin m...@redhat.com
+ * Stefan Hajnoczi stefa...@redhat.com
+ *
+ * Inspiration, some code, and most witty comments come from
+ * Documentation/virtual/lguest/lguest.c, by Rusty Russell
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+
+#include "trace.h"
+#include "hw/dataplane/vring.h"
+
+/* Map the guest's vring to host memory */
+bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
+{
+hwaddr vring_addr = virtio_queue_get_ring_addr(vdev, n);
+hwaddr vring_size = virtio_queue_get_ring_size(vdev, n);
+void *vring_ptr;
+
+vring->broken = false;
+
+hostmem_init(&vring->hostmem);
+vring_ptr = hostmem_lookup(&vring->hostmem, vring_addr, vring_size, true);
+if (!vring_ptr) {
+error_report("Failed to map vring "
+ "addr %#" HWADDR_PRIx " size %" HWADDR_PRIu,
+ vring_addr, vring_size);
+vring->broken = true;
+return false;
+}
+
+vring_init(&vring->vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096);
+
+vring->last_avail_idx = 0;
+vring->last_used_idx = 0;
+vring->signalled_used = 0;
+vring->signalled_used_valid = false;
+
+trace_vring_setup(virtio_queue_get_ring_addr(vdev, n),
+  vring->vr.desc, vring->vr.avail, vring->vr.used);
+return true;
+}
+
+void vring_teardown(Vring *vring)
+{
+hostmem_finalize(&vring->hostmem);
+}
+
+/* Disable guest-host notifies */
+void vring_disable_notification(VirtIODevice *vdev, Vring *vring)
+{
+if (!(vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX))) {
+vring->vr.used->flags |= VRING_USED_F_NO_NOTIFY;
+}
+}
+
+/* Enable guest-host notifies
+ *
+ * Return true if the vring is empty, false if there are more requests.
+ */
+bool vring_enable_notification(VirtIODevice *vdev, Vring *vring)
+{
+if (vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) {
+vring_avail_event(&vring->vr) = vring->vr.avail->idx;
+} else {
+vring->vr.used->flags &= ~VRING_USED_F_NO_NOTIFY;
+}
+smp_mb(); /* ensure update is seen before reading avail_idx */
+return !vring_more_avail(vring);
+}
+
+/* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */
+bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
+{
+uint16_t old, new;
+bool v;
+/* Flush out used index updates. This is paired
+ * with the barrier that the Guest executes when enabling
+ * interrupts. */
+smp_mb();
+
+if ((vdev->guest_features & VIRTIO_F_NOTIFY_ON_EMPTY) &&
+unlikely(vring->vr.avail->idx == vring->last_avail_idx)) {
+return true;
+}
+
+if (!(vdev->guest_features & VIRTIO_RING_F_EVENT_IDX)) {
+return !(vring->vr.avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
+}
+old = vring->signalled_used;
+v = vring->signalled_used_valid;
+new = vring->signalled_used = vring->last_used_idx;
+vring->signalled_used_valid = true;
+
+if (unlikely(!v)) {
+return true;
+}
+
+return vring_need_event(vring_used_event(&vring->vr), new, old);
+}
+
+/* This is stolen from linux/drivers/vhost/vhost.c. */
+static int get_indirect(Vring *vring,
+struct iovec iov[], struct iovec *iov_end,
+unsigned int *out_num, unsigned int *in_num,
+struct vring_desc *indirect)
+{
+struct vring_desc desc;
+unsigned int i = 0, count, found = 0;
+
+/* 

[Qemu-devel] [PATCH] qga: add missing commas in json docs

2013-01-02 Thread Eric Blake
* qga/qapi-schema.json: Use valid JSON.

Signed-off-by: Eric Blake ebl...@redhat.com
---
 qga/qapi-schema.json | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index ed0eb69..d91d903 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -31,7 +31,7 @@
 #
 # Since: 1.1
 # ##
-{ 'command': 'guest-sync-delimited'
+{ 'command': 'guest-sync-delimited',
   'data':{ 'id': 'int' },
   'returns': 'int' }

@@ -69,7 +69,7 @@
 #
 # Since: 0.15.0
 ##
-{ 'command': 'guest-sync'
+{ 'command': 'guest-sync',
   'data':{ 'id': 'int' },
   'returns': 'int' }

-- 
1.8.0.2




[Qemu-devel] [RFC V4 00/30] QCOW2 deduplication

2013-01-02 Thread Benoît Canet
This patchset is a cleanup of the previous QCOW2 deduplication rfc.

One can compile and install https://github.com/wernerd/Skein3Fish and use the
--enable-skein-dedup configure option in order to use the faster Skein hash.

Images must be created with -o dedup=[skein|sha256] in order to activate the
deduplication in the image.

Deduplication is now fast enough to be usable.

v4: Fix and complete qcow2 spec [Stefan]
Hash the hash_algo field in the header extension [Stefan]
Fix qcow2 spec [Eric]
Remove pointer to hash and simplify hash memory management [Stefan]
Rename and move qcow2_read_cluster_data to qcow2.c [Stefan]
Document lock dropping behaviour of the previous function [Stefan]
cleanup qcow2_dedup_read_missing_cluster_data [Stefan]
rename *_offset to *_sect [Stefan]
add a ./configure check for ssl [Stefan]
Replace openssl by gnutls [Stefan]
Implement Skein hashes
Rewrite pretty much every qcow2-dedup.c commit after "Add
   qcow2_dedup_read_missing_and_concatenate" to simplify the code
Use 64KB deduplication hash block to reduce allocation flushes
Use 64KB l2 tables to reduce allocation flushes [breaks compatibility]
Use lazy refcounts to avoid qcow2_cache_set_dependency loops resulting
   in frequent cache flushes
Do not create and load dedup RAM structures when bs->read_only is true

v3: make it work barely
replace kernel red-black trees with GTree.

Benoît Canet (30):
  qcow2: Add deduplication to the qcow2 specification.
  qcow2: Add deduplication structures and fields.
  qcow2: Add qcow2_dedup_read_missing_and_concatenate
  qcow2: Make update_refcount public.
  qcow2: Create a way to link to l2 tables when deduplicating.
  qcow2: Add qcow2_dedup and related functions
  qcow2: Add qcow2_dedup_store_new_hashes.
  qcow2: Implement qcow2_compute_cluster_hash.
  qcow2: Extract qcow2_dedup_grow_table
  qcow2: Add qcow2_dedup_grow_table and use it.
  qcow2: create function to load deduplication hashes at startup.
  qcow2: Load and save deduplication table header extension.
  qcow2: Extract qcow2_do_table_init.
  qcow2-cache: Allow to choose table size at creation.
  qcow2: Add qcow2_dedup_init and qcow2_dedup_close.
  qcow2: Extract qcow2_add_feature and qcow2_remove_feature.
  block: Add qemu-img dedup create option.
  qcow2: Behave correctly when refcount reach 0 or 2^16.
  qcow2: Integrate deduplication in qcow2_co_writev loop.
  qcow2: Serialize write requests when deduplication is activated.
  qcow2: Add verification of dedup table.
  qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup.
  qcow2: Add check_dedup_l2 in order to check l2 of dedup table.
  qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED.
  qcow2: Integrate SKEIN hash algorithm in deduplication.
  qcow2: Add lazy refcounts to deduplication to prevent
qcow2_cache_set_dependency loops
  qcow2: Use large L2 table for deduplication.
  qcow: Set dedup cluster block size to 64KB.
  qcow2: init and cleanup deduplication.
  qemu-iotests: Filter dedup=on/off so existing tests don't break.

 block/Makefile.objs          |    1 +
 block/qcow2-cache.c          |   12 +-
 block/qcow2-cluster.c        |  116 +++--
 block/qcow2-dedup.c          | 1157 ++
 block/qcow2-refcount.c       |  157 --
 block/qcow2.c                |  357 +++--
 block/qcow2.h                |  120 -
 configure                    |   55 ++
 docs/specs/qcow2.txt         |  100 +++-
 include/block/block_int.h    |    1 +
 tests/qemu-iotests/common.rc |    3 +-
 11 files changed, 1955 insertions(+), 124 deletions(-)
 create mode 100644 block/qcow2-dedup.c

-- 
1.7.10.4




[Qemu-devel] [RFC V4 01/30] qcow2: Add deduplication to the qcow2 specification.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 docs/specs/qcow2.txt |  100 +-
 1 file changed, 99 insertions(+), 1 deletion(-)

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 36a559d..c9c0d47 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -80,7 +80,12 @@ in the description of a field.
 tables to repair refcounts before accessing the
 image.
 
-Bits 1-63:  Reserved (set to 0)
+Bit 1:  Deduplication bit.  If this bit is set then
+deduplication is used on this image, and the
+L2 table size (64 KB) differs from the
+cluster size (4 KB).
+
+Bits 2-63:  Reserved (set to 0)
 
  80 -  87:  compatible_features
 Bitmask of compatible features. An implementation can
@@ -116,6 +121,7 @@ be stored. Each extension has a structure like the following:
 0x00000000 - End of the header extension area
 0xE2792ACA - Backing file format name
 0x6803f857 - Feature name table
+0xCD8E819B - Deduplication
 other  - Unknown header extension, can be safely
  ignored
 
@@ -159,6 +165,98 @@ the header extension data. Each entry look like this:
 terminated if it has full length)
 
 
+== Deduplication ==
+
+The deduplication extension contains the information concerning
+deduplication.
+
+Byte   0 -  7:   Offset of the RAM deduplication table
+
+       8 - 11:   Size of the RAM deduplication table (number of L1
+                 64-bit pointers)
+
+           12:   Hash algorithm enum field
+                     0: SHA-256
+                     1: SHA3
+                     2: SKEIN-256
+
+           13:   Deduplication strategies bitmap
+                     0: RAM-based hash lookup
+                     1: Disk-based hash lookup
+
+The disk-based lookup structure will be described in a future revision of
+the QCOW2 specification.
+
+== Deduplication table (RAM method) ==
+
+The deduplication table maps a physical offset to a data hash and
+logical offset. It permanently stores the information required for
+deduplication, and is loaded at startup into a RAM-based representation
+used to do the lookups.
+
+The deduplication table contains 64-bit offsets to the level 2 deduplication
+table blocks.
+Each entry of these blocks contains a 32-byte SHA256 hash followed by the
+64-bit logical offset of the first encountered cluster having this hash.
+
+== Deduplication table schematic (RAM method) ==
+
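+0                 l1_dedup_index                   Size
+|------------------------|----------------------------|
+|                                                     |
+|                L1 deduplication table               |
+|                                                     |
+|------------------------|----------------------------|
+                         |
+                         v
+0                                l2_dedup_block_entries
+|-----------------------------------------------------|
+|                                                     |
+|                L2 deduplication block               |
+|                                                     |
+|                 l2_dedup_index                      |
+|------------------------|----------------------------|
+                         |
+                         v
+          0                            40
+          |-----------------------------|
+          |  Deduplication table entry  |
+          |-----------------------------|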
+
+
+== Deduplication table entry description (RAM method) ==
+
+Each L2 deduplication table entry has the following structure:
+
+Byte  0 - 31:   hash of data cluster
+
+ 32 - 39:   Logical offset of first encountered block having
+this hash
+
+== Deduplication table arithmetic (RAM method) ==
+
+Entries in the deduplication table are ordered by physical cluster index.
+
+The number of entries in an L2 deduplication table block is:
+l2_dedup_block_entries = dedup_block_size / (32 + 8)
+
+The index in the level 1 deduplication table is:
+l1_dedup_index = physical_cluster_index / l2_dedup_block_entries
+
+The index in the level 2 deduplication table is:
+l2_dedup_index = physical_cluster_index % l2_dedup_block_entries
+
+cluster_size = 4096
+dedup_block_size = 65536
+l2_size = 65536
+
+The 16 remaining bytes in each L2 deduplication block (65536 - 1638 * 40)
+are set to zero and reserved for future use.
+
 == Host cluster management ==
 
 qcow2 manages the allocation of host clusters by maintaining a reference count
-- 

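The arithmetic in the specification above can be checked with a small C sketch. It takes the sizes stated in the text (4 KB clusters, 64 KB dedup blocks, 40-byte entries made of a 32-byte hash and a 64-bit offset); deriving the physical cluster index as host offset divided by cluster size, and the DedupLocation/dedup_locate() names, are illustrative assumptions rather than part of the proposed format.

```c
#include <stdint.h>

/* Sizes quoted in the draft specification text. */
#define DEDUP_CLUSTER_SIZE   4096u
#define DEDUP_BLOCK_SIZE     65536u
#define DEDUP_HASH_LEN       32u    /* bytes of hash per entry */
#define DEDUP_OFFSET_LEN     8u     /* 64-bit logical offset   */
#define DEDUP_ENTRY_SIZE     (DEDUP_HASH_LEN + DEDUP_OFFSET_LEN)

/* Entries per L2 dedup block: 65536 / 40 = 1638, leaving 16 spare bytes. */
#define L2_DEDUP_BLOCK_ENTRIES (DEDUP_BLOCK_SIZE / DEDUP_ENTRY_SIZE)

/* Hypothetical helper type: where the entry for one cluster lives. */
typedef struct {
    uint64_t l1_index;     /* l1_dedup_index: slot in the L1 table  */
    uint64_t l2_index;     /* l2_dedup_index: entry in the L2 block */
    uint64_t byte_offset;  /* offset of that entry inside the block */
} DedupLocation;

/* Assumes physical_cluster_index = host offset / cluster size. */
static DedupLocation dedup_locate(uint64_t physical_offset)
{
    uint64_t physical_cluster_index = physical_offset / DEDUP_CLUSTER_SIZE;
    DedupLocation loc;

    loc.l1_index = physical_cluster_index / L2_DEDUP_BLOCK_ENTRIES;
    loc.l2_index = physical_cluster_index % L2_DEDUP_BLOCK_ENTRIES;
    loc.byte_offset = loc.l2_index * DEDUP_ENTRY_SIZE;
    return loc;
}
```

With these sizes one L2 dedup block covers 1638 clusters, about 6.4 MiB of host data.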
[Qemu-devel] [RFC V4 04/30] qcow2: Make update_refcount public.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-refcount.c |    6 +-
 block/qcow2.h          |    2 ++
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6a95aa6..e014b0e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -27,10 +27,6 @@
 #include "block/qcow2.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
-static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
-int64_t offset, int64_t length,
-int addend);
-
 
 /*/
 /* refcount handling */
@@ -413,7 +409,7 @@ fail_block:
 }
 
 /* XXX: cache several refcount block clusters ? */
-static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
+int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
 int64_t offset, int64_t length, int addend)
 {
     BDRVQcowState *s = bs->opaque;
diff --git a/block/qcow2.h b/block/qcow2.h
index 730c9be..3307481 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -384,6 +384,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 
 int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
   BdrvCheckMode fix);
+int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
+int64_t offset, int64_t length, int addend);
 
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size);
-- 
1.7.10.4




[Qemu-devel] [RFC V4 10/30] qcow2: Add qcow2_dedup_grow_table and use it.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-dedup.c |   44 +++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 7adaaba..b998a2d 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -38,6 +38,44 @@ static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
bool write);
 
 /*
+ * Save the dedup table information into the header extensions
+ *
+ * @table_offset: the dedup table offset in the QCOW2 file
+ * @size: the size of the dedup table
+ * @ret:  0 on success, -errno on error
+ */
+static int qcow2_dedup_save_table_info(BlockDriverState *bs,
+                                       int64_t table_offset, int size)
+{
+    BDRVQcowState *s = bs->opaque;
+    s->dedup_table_offset = table_offset;
+    s->dedup_table_size = size;
+    return qcow2_update_header(bs);
+}
+
+/*
+ * Grow the deduplication table
+ *
+ * @min_size:   minimal size
+ * @exact_size: if true force to grow to the exact size
+ * @ret:        0 on success, -errno on error
+ */
+static int qcow2_dedup_grow_table(BlockDriverState *bs,
+                                  int min_size,
+                                  bool exact_size)
+{
+    BDRVQcowState *s = bs->opaque;
+    return qcow2_do_grow_table(bs,
+                               min_size,
+                               exact_size,
+                               &s->dedup_table,
+                               &s->dedup_table_offset,
+                               &s->dedup_table_size,
+                               qcow2_dedup_save_table_info,
+                               "dedup");
+}
+
+/*
  * Prepare a buffer containing all the required data required to compute cluster
  * sized deduplication hashes.
  * If sector_num or nb_sectors are not cluster-aligned, missing data
@@ -712,7 +750,11 @@ static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
 index_in_dedup_table = cluster_number / nb_hash_in_block;
 
     if (s->dedup_table_size <= index_in_dedup_table) {
-        return -ENOSPC;
+        ret = qcow2_dedup_grow_table(bs, index_in_dedup_table + 1, false);
+    }
+
+    if (ret < 0) {
+        return ret;
 }
 
 /* if we must read and there is nothing to read return a null hash */
-- 
1.7.10.4




[Qemu-devel] [RFC V4 15/30] qcow2: Add qcow2_dedup_init and qcow2_dedup_close.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-dedup.c |   16 
 block/qcow2.h       |    2 ++
 2 files changed, 18 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 4c391e5..12a2dad 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -986,3 +986,19 @@ void coroutine_fn qcow2_co_load_dedup_hashes(void *opaque)
         qemu_co_mutex_unlock(&s->lock);
 }
 }
+
+int qcow2_dedup_init(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    return qcow2_do_table_init(bs,
+                               &s->dedup_table,
+                               s->dedup_table_offset,
+                               s->dedup_table_size,
+                               false);
+}
+
+void qcow2_dedup_close(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    g_free(s->dedup_table);
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 4932750..43586f2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -474,5 +474,7 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
  uint64_t logical_sect,
  uint64_t physical_sect);
 void coroutine_fn qcow2_co_load_dedup_hashes(void *opaque);
+int qcow2_dedup_init(BlockDriverState *bs);
+void qcow2_dedup_close(BlockDriverState *bs);
 
 #endif
-- 
1.7.10.4




[Qemu-devel] [RFC V4 17/30] block: Add qemu-img dedup create option.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2.c             |  113 +++--
 block/qcow2.h             |    2 +
 include/block/block_int.h |    1 +
 3 files changed, 103 insertions(+), 13 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index ad399c8..9130638 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -274,6 +274,11 @@ int qcow2_mark_dirty(BlockDriverState *bs)
 return qcow2_add_feature(bs, QCOW2_INCOMPAT_DIRTY);
 }
 
+static int qcow2_activate_dedup(BlockDriverState *bs)
+{
+return qcow2_add_feature(bs, QCOW2_INCOMPAT_DEDUP);
+}
+
 /*
  * Clears an incompatible feature bit and flushes before if necessary.
  * Only call this function when there are no pending requests, it does not
@@ -905,6 +910,11 @@ static void qcow2_close(BlockDriverState *bs)
     BDRVQcowState *s = bs->opaque;
     g_free(s->l1_table);
 
+    if (s->has_dedup) {
+        qcow2_cache_flush(bs, s->dedup_cluster_cache);
+        qcow2_cache_destroy(bs, s->dedup_cluster_cache);
+    }
+
     qcow2_cache_flush(bs, s->l2_table_cache);
     qcow2_cache_flush(bs, s->refcount_block_cache);
 
@@ -1261,7 +1271,8 @@ static int preallocate(BlockDriverState *bs)
 static int qcow2_create2(const char *filename, int64_t total_size,
  const char *backing_file, const char *backing_format,
  int flags, size_t cluster_size, int prealloc,
- QEMUOptionParameter *options, int version)
+ QEMUOptionParameter *options, int version,
+ bool dedup, uint8_t hash_algo)
 {
 /* Calculate cluster_bits */
 int cluster_bits;
@@ -1288,8 +1299,10 @@ static int qcow2_create2(const char *filename, int64_t total_size,
  * size for any qcow2 image.
  */
 BlockDriverState* bs;
+BDRVQcowState *s;
 QCowHeader header;
-uint8_t* refcount_table;
+uint8_t *tables;
+int size;
 int ret;
 
 ret = bdrv_create_file(filename, options);
@@ -1331,10 +1344,11 @@ static int qcow2_create2(const char *filename, int64_t total_size,
 goto out;
 }
 
-    /* Write an empty refcount table */
-    refcount_table = g_malloc0(cluster_size);
-    ret = bdrv_pwrite(bs, cluster_size, refcount_table, cluster_size);
-    g_free(refcount_table);
+    /* Write an empty refcount table + extra space for dedup table if needed */
+    size = dedup ? 2 : 1;
+    tables = g_malloc0(size * cluster_size);
+    ret = bdrv_pwrite(bs, cluster_size, tables, size * cluster_size);
+    g_free(tables);
 
     if (ret < 0) {
 goto out;
@@ -1345,7 +1359,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
 /*
  * And now open the image and make it consistent first (i.e. increase the
  * refcount of the cluster that is occupied by the header and the refcount
- * table)
+     * table and the optional dedup table)
      */
     BlockDriver* drv = bdrv_find_format("qcow2");
 assert(drv != NULL);
@@ -1355,7 +1369,8 @@ static int qcow2_create2(const char *filename, int64_t total_size,
 goto out;
 }
 
-    ret = qcow2_alloc_clusters(bs, 2 * cluster_size);
+    size++; /* Add a cluster for the header */
+    ret = qcow2_alloc_clusters(bs, size * cluster_size);
     if (ret < 0) {
 goto out;
 
@@ -1365,11 +1380,33 @@ static int qcow2_create2(const char *filename, int64_t total_size,
 }
 
 /* Okay, now that we have a valid image, let's give it the right size */
+    s = bs->opaque;
     ret = bdrv_truncate(bs, total_size * BDRV_SECTOR_SIZE);
     if (ret < 0) {
         goto out;
     }
 
+    if (dedup) {
+        s->has_dedup = true;
+        s->dedup_table_offset = cluster_size * 2;
+        s->dedup_table_size = cluster_size / sizeof(uint64_t);
+        s->dedup_hash_algo = hash_algo;
+
+        ret = qcow2_activate_dedup(bs);
+        if (ret < 0) {
+            goto out;
+        }
+
+        ret = qcow2_update_header(bs);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* minimal init */
+        s->dedup_cluster_cache = qcow2_cache_create(bs, DEDUP_CACHE_SIZE,
+                                                    s->hash_block_size);
+    }
+
 /* Want a backing file? There you go.*/
 if (backing_file) {
 ret = bdrv_change_backing_file(bs, backing_file, backing_format);
@@ -1395,15 +1432,41 @@ out:
 return ret;
 }
 
+static int qcow2_warn_if_version_3_is_needed(int version,
+                                             bool has_feature,
+                                             const char *feature)
+{
+    if (version < 3 && has_feature) {
+        fprintf(stderr, "%s only supported with compatibility "
+                "level 1.1 and above (use compat=1.1 or greater)\n",
+                feature);
+        return -EINVAL;
+    }
+    return 0;
+}
+
+static int8_t qcow2_get_dedup_hash_algo(char *value)
+{
+    if (!strcmp(value, "sha256")) {
+        return QCOW_HASH_SHA256;
+    }
+

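As read from the qcow2_create2() hunks above, a dedup-enabled image starts with the header at cluster 0, the refcount table at cluster 1 and the L1 dedup table at cluster 2, and the dedup table initially holds one cluster's worth of 64-bit pointers. A minimal sketch of that arithmetic follows; CreateLayout and create_layout() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the layout chosen at image creation time. */
typedef struct {
    int      nb_clusters;            /* clusters allocated up front        */
    uint64_t refcount_table_offset;
    uint64_t dedup_table_offset;     /* 0 when dedup is disabled           */
    uint64_t dedup_table_entries;    /* number of 64-bit L1 dedup pointers */
} CreateLayout;

static CreateLayout create_layout(uint64_t cluster_size, bool dedup)
{
    CreateLayout l = {0, 0, 0, 0};
    int tables = dedup ? 2 : 1;      /* refcount table (+ dedup L1 table) */

    l.nb_clusters = tables + 1;      /* plus one cluster for the header */
    l.refcount_table_offset = cluster_size;
    if (dedup) {
        l.dedup_table_offset = cluster_size * 2;
        l.dedup_table_entries = cluster_size / sizeof(uint64_t);
    }
    return l;
}
```

With the default 64 KB cluster size this gives an initial L1 dedup table of 8192 entries, which qcow2_dedup_grow_table() can enlarge later.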
[Qemu-devel] [RFC V4 21/30] qcow2: Add verification of dedup table.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-refcount.c |    8 
 1 file changed, 8 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index aef280d..7e6d02f 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1156,6 +1156,14 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
 goto fail;
 }
 
+    if (s->has_dedup) {
+        ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
+                                 s->dedup_table_offset, s->dedup_table_size, 0);
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
     /* snapshots */
     for(i = 0; i < s->nb_snapshots; i++) {
         sn = s->snapshots + i;
-- 
1.7.10.4




[Qemu-devel] [PATCH 14/18] cutils: change strtosz_suffix_unit function

2013-01-02 Thread Stefan Hajnoczi
From: liguang lig.f...@cn.fujitsu.com

If the value to be converted is larger than INT64_MAX, the caller
currently has no convenient way to tell this apart from other errors,
so return -ERANGE on overflow and -EINVAL on other errors.

Signed-off-by: liguang lig.f...@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 cutils.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/cutils.c b/cutils.c
index d06590b..80bb1dc 100644
--- a/cutils.c
+++ b/cutils.c
@@ -214,12 +214,13 @@ static int64_t suffix_mul(char suffix, int64_t unit)
 /*
  * Convert string to bytes, allowing either B/b for bytes, K/k for KB,
  * M/m for MB, G/g for GB or T/t for TB. End pointer will be returned
- * in *end, if not NULL. Return -1 on error.
+ * in *end, if not NULL. Return -ERANGE on overflow and -EINVAL on
+ * any other error.
  */
 int64_t strtosz_suffix_unit(const char *nptr, char **end,
 const char default_suffix, int64_t unit)
 {
-int64_t retval = -1;
+int64_t retval = -EINVAL;
 char *endptr;
 unsigned char c;
 int mul_required = 0;
@@ -246,6 +247,7 @@ int64_t strtosz_suffix_unit(const char *nptr, char **end,
 goto fail;
 }
     if ((val * mul >= INT64_MAX) || val < 0) {
+retval = -ERANGE;
 goto fail;
 }
 retval = val * mul;
-- 
1.8.0.2

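With this change a caller can tell overflow apart from malformed input. The sketch below is a hypothetical, simplified stand-in for strtosz_suffix_unit() (parse_size() is not a QEMU function; it treats a missing suffix as bytes and always uses a unit of 1024) that follows the same return convention: -ERANGE when the value is negative or does not fit in int64_t, -EINVAL otherwise.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical size parser mirroring the patched error contract:
 * bytes on success, -ERANGE if negative or too large for int64_t,
 * -EINVAL on any other error. A missing suffix means bytes here. */
static int64_t parse_size(const char *str)
{
    char *end;
    double val = strtod(str, &end);
    double mul = 1;

    if (end == str) {
        return -EINVAL;                 /* no number at all */
    }
    switch (toupper((unsigned char)*end)) {
    case 'T': mul *= 1024;              /* fall through */
    case 'G': mul *= 1024;              /* fall through */
    case 'M': mul *= 1024;              /* fall through */
    case 'K': mul *= 1024;              /* fall through */
    case 'B':
        end++;                          /* consume the suffix */
        break;
    case '\0':
        break;                          /* no suffix: bytes */
    default:
        return -EINVAL;                 /* unknown suffix */
    }
    if (*end != '\0') {
        return -EINVAL;                 /* trailing garbage */
    }
    /* Same as the patch's "val * mul >= INT64_MAX || val < 0",
     * written with ! so that NaN is rejected as well. */
    if (!(val * mul < (double)INT64_MAX) || val < 0) {
        return -ERANGE;
    }
    return (int64_t)(val * mul);
}
```

A caller can then report "value too large" for -ERANGE and "invalid size" for -EINVAL instead of a single generic error.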



[Qemu-devel] [RFC V4 19/30] qcow2: Integrate deduplication in qcow2_co_writev loop.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2.c |   85 +++--
 1 file changed, 83 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 9130638..54c8847 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -328,6 +328,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
 QCowHeader header;
 uint64_t ext_end;
 
+    s->has_dedup = false;
     ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
     if (ret < 0) {
 goto fail;
@@ -790,13 +791,17 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
     BDRVQcowState *s = bs->opaque;
 int index_in_cluster;
 int n_end;
-int ret;
+int ret = 0;
 int cur_nr_sectors; /* number of sectors in current iteration */
 uint64_t cluster_offset;
 QEMUIOVector hd_qiov;
 uint64_t bytes_done = 0;
 uint8_t *cluster_data = NULL;
 QCowL2Meta *l2meta;
+uint8_t *dedup_cluster_data = NULL;
+int dedup_cluster_data_nr;
+int deduped_sectors_nr;
+QCowDedupState ds;
 
 trace_qcow2_writev_start_req(qemu_coroutine_self(), sector_num,
  remaining_sectors);
@@ -807,13 +812,69 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
 
     qemu_co_mutex_lock(&s->lock);
 
+    if (s->has_dedup) {
+        QTAILQ_INIT(&ds.undedupables);
+        ds.phash.reuse = false;
+        ds.nb_undedupable_sectors = 0;
+        ds.nb_clusters_processed = 0;
+
+        /* if deduplication is on we make sure dedup_cluster_data
+         * contains a multiple of cluster size of data in order
+         * to compute the hashes
+         */
+        ret = qcow2_dedup_read_missing_and_concatenate(bs,
+                                                       qiov,
+                                                       sector_num,
+                                                       remaining_sectors,
+                                                       &dedup_cluster_data,
+                                                       &dedup_cluster_data_nr);
+
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
 while (remaining_sectors != 0) {
 
 l2meta = NULL;
 
 trace_qcow2_writev_start_part(qemu_coroutine_self());
+
+        if (s->has_dedup && ds.nb_undedupable_sectors == 0) {
+            /* Try to deduplicate as many clusters as possible */
+            deduped_sectors_nr = qcow2_dedup(bs,
+                                             &ds,
+                                             sector_num,
+                                             dedup_cluster_data,
+                                             dedup_cluster_data_nr);
+
+            if (deduped_sectors_nr < 0) {
+                goto fail;
+            }
+
+            remaining_sectors -= deduped_sectors_nr;
+            sector_num += deduped_sectors_nr;
+            bytes_done += deduped_sectors_nr * 512;
+
+            /* no more data to write - exit */
+            if (remaining_sectors <= 0) {
+                goto fail;
+            }
+
+            /* if we deduped something trace it */
+            if (deduped_sectors_nr) {
+                trace_qcow2_writev_done_part(qemu_coroutine_self(),
+                                             deduped_sectors_nr);
+                trace_qcow2_writev_start_part(qemu_coroutine_self());
+            }
+        }
+
         index_in_cluster = sector_num & (s->cluster_sectors - 1);
-        n_end = index_in_cluster + remaining_sectors;
+        n_end = s->has_dedup &&
+                ds.nb_undedupable_sectors < remaining_sectors ?
+                index_in_cluster + ds.nb_undedupable_sectors :
+                index_in_cluster + remaining_sectors;
+
         if (s->crypt_method &&
             n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors) {
             n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
@@ -849,6 +910,24 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
 cur_nr_sectors * 512);
 }
 
+        /* Write the non duplicated clusters hashes to disk */
+        if (s->has_dedup) {
+            int count = cur_nr_sectors / s->cluster_sectors;
+            int has_ending = ((cluster_offset >> 9) + index_in_cluster +
+                              cur_nr_sectors) & (s->cluster_sectors - 1);
+            count = index_in_cluster ? count + 1 : count;
+            count = has_ending ? count + 1 : count;
+            ret = qcow2_dedup_store_new_hashes(bs,
+                                               &ds,
+                                               count,
+                                               sector_num,
+                                               (cluster_offset >> 9));
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+
+        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
         qemu_co_mutex_unlock(&s->lock);
         BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
 

[Qemu-devel] [PATCH 05/18] dataplane: add event loop

2013-01-02 Thread Stefan Hajnoczi
Outside the safety of the global mutex we need to poll on file
descriptors.  I found epoll(2) is a convenient way to do that, although
other options could replace this module in the future (such as an
AioContext-based loop or glib's GMainLoop).

One important feature of this small event loop implementation is that
the loop can be terminated in a thread-safe way.  This allows QEMU to
stop the data plane thread cleanly.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/dataplane/Makefile.objs |   2 +-
 hw/dataplane/event-poll.c  | 100 +
 hw/dataplane/event-poll.h  |  40 ++
 3 files changed, 141 insertions(+), 1 deletion(-)
 create mode 100644 hw/dataplane/event-poll.c
 create mode 100644 hw/dataplane/event-poll.h

diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
index 34e6d57..e26bd7d 100644
--- a/hw/dataplane/Makefile.objs
+++ b/hw/dataplane/Makefile.objs
@@ -1,3 +1,3 @@
 ifeq ($(CONFIG_VIRTIO), y)
-common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o
+common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o
 endif
diff --git a/hw/dataplane/event-poll.c b/hw/dataplane/event-poll.c
new file mode 100644
index 000..2b55c6e
--- /dev/null
+++ b/hw/dataplane/event-poll.c
@@ -0,0 +1,100 @@
+/*
+ * Event loop with file descriptor polling
+ *
+ * Copyright 2012 IBM, Corp.
+ * Copyright 2012 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Stefan Hajnoczi stefa...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <sys/epoll.h>
+#include "hw/dataplane/event-poll.h"
+
+/* Add an event notifier and its callback for polling */
+void event_poll_add(EventPoll *poll, EventHandler *handler,
+                    EventNotifier *notifier, EventCallback *callback)
+{
+    struct epoll_event event = {
+        .events = EPOLLIN,
+        .data.ptr = handler,
+    };
+    handler->notifier = notifier;
+    handler->callback = callback;
+    if (epoll_ctl(poll->epoll_fd, EPOLL_CTL_ADD,
+                  event_notifier_get_fd(notifier), &event) != 0) {
+        fprintf(stderr, "failed to add event handler to epoll: %m\n");
+        exit(1);
+    }
+}
+
+/* Event callback for stopping event_poll() */
+static void handle_stop(EventHandler *handler)
+{
+    /* Do nothing */
+}
+
+void event_poll_init(EventPoll *poll)
+{
+    /* Create epoll file descriptor */
+    poll->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
+    if (poll->epoll_fd < 0) {
+        fprintf(stderr, "epoll_create1 failed: %m\n");
+        exit(1);
+    }
+
+    /* Set up stop notifier */
+    if (event_notifier_init(&poll->stop_notifier, 0) < 0) {
+        fprintf(stderr, "failed to init stop notifier\n");
+        exit(1);
+    }
+    event_poll_add(poll, &poll->stop_handler,
+                   &poll->stop_notifier, handle_stop);
+}
+
+void event_poll_cleanup(EventPoll *poll)
+{
+    event_notifier_cleanup(&poll->stop_notifier);
+    close(poll->epoll_fd);
+    poll->epoll_fd = -1;
+}
+
+/* Block until the next event and invoke its callback */
+void event_poll(EventPoll *poll)
+{
+    EventHandler *handler;
+    struct epoll_event event;
+    int nevents;
+
+    /* Wait for the next event.  Only do one event per call to keep the
+     * function simple, this could be changed later. */
+    do {
+        nevents = epoll_wait(poll->epoll_fd, &event, 1, -1);
+    } while (nevents < 0 && errno == EINTR);
+    if (unlikely(nevents != 1)) {
+        fprintf(stderr, "epoll_wait failed: %m\n");
+        exit(1); /* should never happen */
+    }
+
+    /* Find out which event handler has become active */
+    handler = event.data.ptr;
+
+    /* Clear the eventfd */
+    event_notifier_test_and_clear(handler->notifier);
+
+    /* Handle the event */
+    handler->callback(handler);
+}
+
+/* Stop event_poll()
+ *
+ * This function can be used from another thread.
+ */
+void event_poll_notify(EventPoll *poll)
+{
+    event_notifier_set(&poll->stop_notifier);
+}
diff --git a/hw/dataplane/event-poll.h b/hw/dataplane/event-poll.h
new file mode 100644
index 000..3e8d3ec
--- /dev/null
+++ b/hw/dataplane/event-poll.h
@@ -0,0 +1,40 @@
+/*
+ * Event loop with file descriptor polling
+ *
+ * Copyright 2012 IBM, Corp.
+ * Copyright 2012 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Stefan Hajnoczi stefa...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef EVENT_POLL_H
+#define EVENT_POLL_H
+
+#include "qemu/event_notifier.h"
+
+typedef struct EventHandler EventHandler;
+typedef void EventCallback(EventHandler *handler);
+struct EventHandler {
+    EventNotifier *notifier;        /* eventfd */
+    EventCallback *callback;        /* callback function */
+};
+
+typedef struct {
+    int epoll_fd;                   /* epoll(2) file descriptor */
+

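The stop mechanism relies on plain eventfd semantics: writing to the eventfd makes epoll_wait() return, and reading it clears the counter. The standalone sketch below uses raw Linux system calls rather than QEMU's EventNotifier wrappers (add_fd() and wait_one() are hypothetical helpers, and the fd is stored in the epoll cookie instead of a handler pointer). The notification is issued from the same thread here for simplicity, but a write(2) to the eventfd could equally come from another thread.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Register fd for input readiness, storing the fd itself as the cookie. */
static int add_fd(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait for exactly one event, clear the eventfd that fired (as
 * event_notifier_test_and_clear() does) and return its fd, or -1. */
static int wait_one(int epfd)
{
    struct epoll_event ev;
    uint64_t count;
    int n;

    do {
        n = epoll_wait(epfd, &ev, 1, -1);
    } while (n < 0 && errno == EINTR);
    if (n != 1) {
        return -1;
    }
    if (read(ev.data.fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return -1;
    }
    return ev.data.fd;
}
```

Handling one event per call, as event_poll() does, keeps the dispatch logic trivial at the cost of one epoll_wait() per event.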
[Qemu-devel] [RFC V4 25/30] qcow2: Integrate SKEIN hash algorithm in deduplication.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-dedup.c |   14 ++
 block/qcow2.c       |    5 +
 configure           |   33 +
 3 files changed, 52 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 28001c6..bd8397e 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -30,6 +30,9 @@
 #include "block/block_int.h"
 #include "qemu-common.h"
 #include "qcow2.h"
+#ifdef CONFIG_SKEIN_DEDUP
+#include <skeinApi.h>
+#endif
 
 static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
QCowHash *hash,
@@ -202,6 +205,17 @@ static int qcow2_compute_cluster_hash(BlockDriverState *bs,
 case QCOW_HASH_SHA256:
         return gnutls_hash_fast(GNUTLS_DIG_SHA256, data,
                                 s->cluster_size, hash->data);
+#if defined(CONFIG_SKEIN_DEDUP)
+    case QCOW_HASH_SKEIN:
+        {
+            SkeinCtx_t ctx;
+            skeinCtxPrepare(&ctx, Skein256);
+            skeinInit(&ctx, Skein256);
+            skeinUpdate(&ctx, data, s->cluster_size);
+            skeinFinal(&ctx, hash->data);
+        }
+        return 0;
+#endif
     default:
         error_report("Invalid deduplication hash algorithm %i",
                      s->dedup_hash_algo);
diff --git a/block/qcow2.c b/block/qcow2.c
index 13f6a5c..0154d50 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1540,6 +1540,11 @@ static int8_t qcow2_get_dedup_hash_algo(char *value)
     if (!strcmp(value, "sha256")) {
         return QCOW_HASH_SHA256;
     }
+#if defined(CONFIG_SKEIN_DEDUP)
+    if (!strcmp(value, "skein")) {
+        return QCOW_HASH_SKEIN;
+    }
+#endif
 
     error_printf("Unsupported deduplication hash algorithm.\n");
     return -EINVAL;
diff --git a/configure b/configure
index 390326e..97497af 100755
--- a/configure
+++ b/configure
@@ -223,6 +223,7 @@ libiscsi=
 coroutine=
 seccomp=
 glusterfs=
+skein_dedup=no
 
 # parse CC options first
 for opt do
@@ -882,6 +883,8 @@ for opt do
   ;;
   --enable-glusterfs) glusterfs="yes"
   ;;
+  --enable-skein-dedup) skein_dedup="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1130,6 +1133,7 @@ echo "  --with-coroutine=BACKEND coroutine backend. Supported options:"
 echo "                           gthread, ucontext, sigaltstack, windows"
 echo "  --enable-glusterfs       enable GlusterFS backend"
 echo "  --disable-glusterfs      disable GlusterFS backend"
+echo "  --enable-skein-dedup     enable computing dedup hashes with SKEIN"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2412,6 +2416,30 @@ EOF
   fi
 fi
 
+##
+# SKEIN dedup hash function probe
+if test "$skein_dedup" != "no" ; then
+  cat > $TMPC << EOF
+#include <skeinApi.h>
+int main(void) {
+    SkeinCtx_t ctx;
+    skeinCtxPrepare(&ctx, 512);
+    return 0;
+}
+EOF
+  skein_libs="-lskein3fish"
+  if compile_prog "" "$skein_libs" ; then
+    skein_dedup="yes"
+    libs_tools="$skein_libs $libs_tools"
+    libs_softmmu="$skein_libs $libs_softmmu"
+  else
+    if test "$skein_dedup" = "yes" ; then
+      feature_not_found "libskein3fish not found"
+    fi
+    skein_dedup="no"
+  fi
+fi
+
 #
 # Check for xxxat() functions when we are building linux-user
 # emulator.  This is done because older glibc versions don't
@@ -3296,6 +3324,7 @@ echo "build guest agent $guest_agent"
 echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine_backend"
 echo "GlusterFS support $glusterfs"
+echo "SKEIN support     $skein_dedup"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3637,6 +3666,10 @@ if test "$glusterfs" = "yes" ; then
   echo "CONFIG_GLUSTERFS=y" >> $config_host_mak
 fi
 
+if test "$skein_dedup" = "yes" ; then
+  echo "CONFIG_SKEIN_DEDUP=y" >> $config_host_mak
+fi
+
 # USB host support
 case "$usb" in
 linux)
-- 
1.7.10.4




[Qemu-devel] [RFC V4 20/30] qcow2: Serialize write requests when deduplication is activated.

2013-01-02 Thread Benoît Canet
This fixes the race conditions on sub-cluster-sized writes while waiting
for a faster solution.

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2.c |    9 +
 block/qcow2.h |    1 +
 2 files changed, 10 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 54c8847..13f6a5c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -521,6 +521,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
 
 /* Initialise locks */
     qemu_co_mutex_init(&s->lock);
+    qemu_co_mutex_init(&s->dedup_lock);
 
     /* Repair image if dirty */
     if (!(flags & BDRV_O_CHECK) && !bs->read_only &&
@@ -810,6 +811,10 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
 
     s->cluster_cache_offset = -1; /* disable compressed cache */
 
+    if (s->has_dedup) {
+        qemu_co_mutex_lock(&s->dedup_lock);
+    }
+
     qemu_co_mutex_lock(&s->lock);
 
     if (s->has_dedup) {
@@ -978,6 +983,10 @@ fail:
 g_free(l2meta);
 }
 
+    if (s->has_dedup) {
+        qemu_co_mutex_unlock(&s->dedup_lock);
+    }
+
     qemu_iovec_destroy(&hd_qiov);
 qemu_vfree(cluster_data);
 qemu_vfree(dedup_cluster_data);
diff --git a/block/qcow2.h b/block/qcow2.h
index f5576be..fd31f4f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -224,6 +224,7 @@ typedef struct BDRVQcowState {
 GTree *dedup_tree_by_hash;
 GTree *dedup_tree_by_sect;
 CoMutex lock;
+CoMutex dedup_lock;
 
 uint32_t crypt_method; /* current crypt method, 0 if no key yet */
 uint32_t crypt_method_header;
-- 
1.7.10.4




[Qemu-devel] [RFC V4 29/30] qcow2: init and cleanup deduplication.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-dedup.c |   78 +++
 block/qcow2.c       |   17 ---
 2 files changed, 86 insertions(+), 9 deletions(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index bd8397e..da1a668 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -1014,20 +1014,88 @@ void coroutine_fn qcow2_co_load_dedup_hashes(void *opaque)
 }
 }
 
+static gint qcow2_dedup_compare_by_hash(gconstpointer a,
+                                        gconstpointer b,
+                                        gpointer data)
+{
+    QCowHash *hash_a = (QCowHash *) a;
+    QCowHash *hash_b = (QCowHash *) b;
+    return memcmp(hash_a->data, hash_b->data, HASH_LENGTH);
+}
+
+static void qcow2_dedup_destroy_qcow_hash_node(gpointer p)
+{
+    QCowHashNode *hash_node = (QCowHashNode *) p;
+    g_free(hash_node);
+}
+
+static gint qcow2_dedup_compare_by_offset(gconstpointer a,
+                                          gconstpointer b,
+                                          gpointer data)
+{
+    uint64_t offset_a = *((uint64_t *) a);
+    uint64_t offset_b = *((uint64_t *) b);
+
+    if (offset_a > offset_b) {
+        return 1;
+    }
+    if (offset_a < offset_b) {
+        return -1;
+    }
+    return 0;
+}
+
 int qcow2_dedup_init(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
-return qcow2_do_table_init(bs,
-   s-dedup_table,
-   s-dedup_table_offset,
-   s-dedup_table_size,
-   false);
+    Coroutine *co;
+    int ret;
+
+    s->has_dedup = true;
+
+    ret = qcow2_do_table_init(bs,
+  s-dedup_table,
+  s-dedup_table_offset,
+  s-dedup_table_size,
+  false);
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* if we are read-only we don't deduplicate anything */
+    if (bs->read_only) {
+        return 0;
+    }
+
+    s->dedup_tree_by_hash = g_tree_new_full(qcow2_dedup_compare_by_hash, NULL,
+                                            NULL,
+                                            qcow2_dedup_destroy_qcow_hash_node);
+    s->dedup_tree_by_sect = g_tree_new_full(qcow2_dedup_compare_by_offset,
+                                            NULL, NULL, NULL);
+
+    s->dedup_cluster_cache = qcow2_cache_create(bs, DEDUP_CACHE_SIZE,
+                                                s->hash_block_size);
+
+    /* load asynchronously the hashes */
+    co = qemu_coroutine_create(qcow2_co_load_dedup_hashes);
+    qemu_coroutine_enter(co, bs);
+    return 0;
 }
 
 void qcow2_dedup_close(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
     g_free(s->dedup_table);
+
+    if (bs->read_only) {
+        return;
+    }
+
+    qcow2_cache_flush(bs, s->dedup_cluster_cache);
+    qcow2_cache_destroy(bs, s->dedup_cluster_cache);
+    g_tree_destroy(s->dedup_tree_by_sect);
+    g_tree_destroy(s->dedup_tree_by_hash);
 }
 
 /* Clean the last reference to a given cluster when it's refcount is zero
diff --git a/block/qcow2.c b/block/qcow2.c
index f1e0f5f..d534077 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -539,6 +539,13 @@ static int qcow2_open(BlockDriverState *bs, int flags)
         }
     }
 
+    if (s->incompatible_features & QCOW2_INCOMPAT_DEDUP) {
+        ret = qcow2_dedup_init(bs);
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
 #ifdef DEBUG_ALLOC
     {
         BdrvCheckResult result = {0};
@@ -1003,11 +1010,11 @@ fail:
 static void qcow2_close(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
+
     g_free(s->l1_table);
 
     if (s->has_dedup) {
-        qcow2_cache_flush(bs, s->dedup_cluster_cache);
-        qcow2_cache_destroy(bs, s->dedup_cluster_cache);
+        qcow2_dedup_close(bs);
     }
 
     qcow2_cache_flush(bs, s->l2_table_cache);
@@ -1498,8 +1505,10 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         }
 
         /* minimal init */
-        s->dedup_cluster_cache = qcow2_cache_create(bs, DEDUP_CACHE_SIZE,
-                                                    s->hash_block_size);
+        ret = qcow2_dedup_init(bs);
+        if (ret < 0) {
+            goto out;
+        }
     }
 
 /* Want a backing file? There you go.*/
-- 
1.7.10.4




[Qemu-devel] [RFC V4 22/30] qcow2: Adapt checking of QCOW_OFLAG_COPIED for dedup.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-refcount.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7e6d02f..9aef608 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1001,7 +1001,14 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                         PRIx64 ": %s\n", l2_entry, strerror(-refcount));
                     goto fail;
                 }
-                if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
+                if (!s->has_dedup &&
+                    (refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
+                    fprintf(stderr, "ERROR OFLAG_COPIED: offset=%"
+                        PRIx64 " refcount=%d\n", l2_entry, refcount);
+                    res->corruptions++;
+                }
+                if (s->has_dedup && refcount > 1 &&
+                    ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
                     fprintf(stderr, "ERROR OFLAG_COPIED: offset=%"
                         PRIx64 " refcount=%d\n", l2_entry, refcount);
                     res->corruptions++;
1.7.10.4




[Qemu-devel] [Bug 1091766] Re: Physical host crash with Mellanox IB PCI passthrough

2013-01-02 Thread Vlastimil Holer
Just a silly question, because I don't know the development process of the
QEMU project: can your patches be committed to the project's VCS so that
the next stable release contains them and doesn't fail again? Thank you!

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1091766

Title:
  Physical host crash with Mellanox IB PCI passthrough

Status in QEMU:
  New

Bug description:
  (from
  http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/100736)

  We have been using PCI passthrough with the Mellanox IB interface
  (MT27500 Family [ConnectX-3]) on Debian 6.0.6, kernel 3.2.23 and
  qemu-kvm-1.0 (both from backports). It worked fine until the latest
  update in backports to qemu-kvm-1.1.2. With newer qemu-kvm versions,
  the IB device probe in the guest fails, leaving the firmware to kill
  the whole physical machine.

  I then compiled qemu-kvm from source: 1.0.1 was OK, 1.1.2 fails, and
  even 1.2.0 fails as well. Our setup is based on the IBM System X
  iDataPlex dx360 M4 Server.

  Note: I have now also tested the latest qemu-1.3.0 with Linux 3.7.1 and
  the new VFIO mechanism, and it behaves the same way.

  On the guest, mlx4_core fails to probe the device:
  | mlx4_core :00:08.0: irq 74 for MSI/MSI-X
  | mlx4_core :00:08.0: irq 75 for MSI/MSI-X
  | mlx4_core :00:08.0: irq 76 for MSI/MSI-X
  | mlx4_core :00:08.0: irq 77 for MSI/MSI-X
  | mlx4_core :00:08.0: NOP command failed to generate MSI-X interrupt IRQ 51).
  | mlx4_core :00:08.0: Trying again without MSI-X.
  | mlx4_core :00:08.0: NOP command failed to generate interrupt (IRQ 51), aborting.
  | mlx4_core :00:08.0: BIOS or ACPI interrupt routing problem?
  | mlx4_core :00:08.0: PCI INT A disabled
  | mlx4_core: probe of :00:08.0 failed with error -16

  This immediately results in a reset of the whole physical machine:
  | Uhhuh. NMI received for unknown reason 3d on CPU 0.
  | Do you have a strange power saving mode enabled?
  | Dazed and confused, but trying to continue

  This is followed by these events in the hardware management module:
  | A software NMI has occurred on system SN# xxx
  | Fault in slot All PCI Err on system SN# xxx
  | Fault in slot PCI 1 on system SN# xxx
  | A Uncorrectable Bus Error has occurred on system SN# xxx
  | Host Power has been Power Cycled
  | System SN# xxx has recovered from an NMI

  Kernel logs for both host/guest machines and different qemu-kvm
  versions are attached. PCI passthrough for e.g. Intel e1000 works
  fine with all tested qemu-kvm versions.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1091766/+subscriptions



[Qemu-devel] [RFC V4 30/30] qemu-iotests: Filter dedup=on/off so existing tests don't break.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 tests/qemu-iotests/common.rc |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index aef5f52..72e746d 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -124,7 +124,8 @@ _make_test_img()
             -e "s# compat='[^']*'##g" \
             -e "s# compat6=\\(on\\|off\\)##g" \
             -e "s# static=\\(on\\|off\\)##g" \
-            -e "s# lazy_refcounts=\\(on\\|off\\)##g"
+            -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
+            -e "s# dedup=\\('sha256'\\|'skein'\\|'sha3'\\)##g"
 
     # Start an NBD server on the image file, which is what we'll be talking to
     if [ $IMGPROTO = "nbd" ]; then
-- 
1.7.10.4




[Qemu-devel] [RFC V4 28/30] qcow: Set dedup cluster block size to 64KB.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-refcount.c |    4 ++--
 block/qcow2.c          |    1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 092546d..3f3efd8 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1061,7 +1061,7 @@ static int check_dedup_l2(BlockDriverState *bs, BdrvCheckResult *res,
     int i, l2_size;
 
     /* Read L2 table from disk */
-    l2_size = s->cluster_size;
+    l2_size = s->hash_block_size;
     l2_table = g_malloc(l2_size);
 
     if (bdrv_pread(bs->file, l2_offset, l2_table, l2_size) != l2_size) {
@@ -1151,7 +1151,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
             /* Mark L2 table as used */
             l2_offset &= L1E_OFFSET_MASK;
             inc_refcounts(bs, res, refcount_table, refcount_table_size,
-                l2_offset, s->l2_size << 3);
+                l2_offset, dedup ? s->hash_block_size : s->l2_size << 3);
 
             /* L2 tables are cluster aligned */
             if (l2_offset & (s->cluster_size - 1)) {
diff --git a/block/qcow2.c b/block/qcow2.c
index 16038db..f1e0f5f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -432,6 +432,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     s->cluster_sectors = 1 << (s->cluster_bits - 9);
     if (s->incompatible_features & QCOW2_INCOMPAT_DEDUP) {
         s->l2_bits = 16 - 3; /* 64 KB L2 */
+        s->hash_block_size = DEFAULT_CLUSTER_SIZE;
     } else {
         s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
     }
-- 
1.7.10.4




[Qemu-devel] [RFC V4 14/30] qcow2-cache: Allow to choose table size at creation.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-cache.c |   12 +++++++-----
 block/qcow2.c       |    5 +++--
 block/qcow2.h       |    3 ++-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 2f3114e..83f2814 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -40,20 +40,22 @@ struct Qcow2Cache {
     struct Qcow2Cache*      depends;
     int                     size;
     bool                    depends_on_flush;
+    int                     table_size;
 };
 
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
+Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
+                               int table_size)
 {
-    BDRVQcowState *s = bs->opaque;
     Qcow2Cache *c;
     int i;
 
     c = g_malloc0(sizeof(*c));
     c->size = num_tables;
     c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
+    c->table_size = table_size;
 
     for (i = 0; i < c->size; i++) {
-        c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
+        c->entries[i].table = qemu_blockalign(bs, c->table_size);
     }
 
     return c;
@@ -121,7 +123,7 @@ static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
     }
 
     ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
-        s->cluster_size);
+        c->table_size);
     if (ret < 0) {
         return ret;
     }
@@ -253,7 +255,7 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
             BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
         }
 
-        ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
+        ret = bdrv_pread(bs->file, offset, c->entries[i].table, c->table_size);
         if (ret < 0) {
             return ret;
         }
diff --git a/block/qcow2.c b/block/qcow2.c
index 9a7177b..499e939 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -450,8 +450,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     }
 
     /* alloc L2 table/refcount block cache */
-    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
-    s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
+    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
+    s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE,
+                                                 s->cluster_size);
 
     s->cluster_cache = g_malloc(s->cluster_size);
     /* one more sector for decompressed data alignment */
diff --git a/block/qcow2.h b/block/qcow2.h
index 9add0f1..4932750 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -440,7 +440,8 @@ void qcow2_free_snapshots(BlockDriverState *bs);
 int qcow2_read_snapshots(BlockDriverState *bs);
 
 /* qcow2-cache.c functions */
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
+Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables,
+                               int table_size);
 int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
 
 void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
-- 
1.7.10.4




[Qemu-devel] [RFC V4 27/30] qcow2: Use large L2 table for deduplication.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-cluster.c  |    2 +-
 block/qcow2-refcount.c |   22 +++++++++++++++-------
 block/qcow2.c          |    8 ++++++--
 3 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 07037a0..d69af17 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -236,7 +236,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
             goto fail;
         }
 
-        memcpy(l2_table, old_table, s->cluster_size);
+        memcpy(l2_table, old_table, s->l2_size << 3);
 
         ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
         if (ret < 0) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 0c6e75a..092546d 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -535,12 +535,15 @@ fail:
  */
 static int update_cluster_refcount(BlockDriverState *bs,
                                    int64_t cluster_index,
-                                   int addend)
+                                   int addend,
+                                   bool is_l2)
 {
     BDRVQcowState *s = bs->opaque;
     int ret;
 
-    ret = update_refcount(bs, cluster_index << s->cluster_bits, 1, addend);
+    int size = is_l2 ? s->l2_size << 3 : 1;
+
+    ret = update_refcount(bs, cluster_index << s->cluster_bits, size, addend);
     if (ret < 0) {
         return ret;
     }
@@ -664,7 +667,7 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         if (free_in_cluster == 0)
             s->free_byte_offset = 0;
         if ((offset & (s->cluster_size - 1)) != 0)
-            update_cluster_refcount(bs, offset >> s->cluster_bits, 1);
+            update_cluster_refcount(bs, offset >> s->cluster_bits, 1, false);
     } else {
         offset = qcow2_alloc_clusters(bs, s->cluster_size);
         if (offset < 0) {
@@ -674,7 +677,7 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         if ((cluster_offset + s->cluster_size) == offset) {
             /* we are lucky: contiguous data */
             offset = s->free_byte_offset;
-            update_cluster_refcount(bs, offset >> s->cluster_bits, 1);
+            update_cluster_refcount(bs, offset >> s->cluster_bits, 1, false);
             s->free_byte_offset += size;
         } else {
             s->free_byte_offset = offset;
@@ -815,7 +818,10 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     } else {
                         uint64_t cluster_index = (offset & L2E_OFFSET_MASK) >> s->cluster_bits;
                         if (addend != 0) {
-                            refcount = update_cluster_refcount(bs, cluster_index, addend);
+                            refcount = update_cluster_refcount(bs,
+                                                               cluster_index,
+                                                               addend,
+                                                               false);
                         } else {
                             refcount = get_refcount(bs, cluster_index);
                         }
@@ -847,7 +853,9 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 
 
             if (addend != 0) {
-                refcount = update_cluster_refcount(bs, l2_offset >> s->cluster_bits, addend);
+                refcount = update_cluster_refcount(bs,
+                                                   l2_offset >> s->cluster_bits,
+                                                   addend, true);
             } else {
                 refcount = get_refcount(bs, l2_offset >> s->cluster_bits);
             }
@@ -1143,7 +1151,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
             /* Mark L2 table as used */
             l2_offset &= L1E_OFFSET_MASK;
             inc_refcounts(bs, res, refcount_table, refcount_table_size,
-                l2_offset, s->cluster_size);
+                l2_offset, s->l2_size << 3);
 
             /* L2 tables are cluster aligned */
             if (l2_offset & (s->cluster_size - 1)) {
diff --git a/block/qcow2.c b/block/qcow2.c
index f66e67d..16038db 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -430,7 +430,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     s->cluster_bits = header.cluster_bits;
     s->cluster_size = 1 << s->cluster_bits;
     s->cluster_sectors = 1 << (s->cluster_bits - 9);
-    s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
+    if (s->incompatible_features & QCOW2_INCOMPAT_DEDUP) {
+        s->l2_bits = 16 - 3; /* 64 KB L2 */
+    } else {
+        s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
+    }
     s->l2_size = 1 << s->l2_bits;
     bs->total_sectors = header.size / 512;
     s->csize_shift = (62 - (s->cluster_bits - 8));
@@ -467,7 +471,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     }
 
     /* alloc L2 table/refcount block cache */
-    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE, 

[Qemu-devel] [PATCH 06/18] dataplane: add Linux AIO request queue

2013-01-02 Thread Stefan Hajnoczi
The IOQueue has a pool of iocb structs and a function to add new
read/write requests.  Multiple requests can be added before calling the
submit function to actually tell the host kernel to begin I/O.  This
allows callers to batch requests and submit them in one go.

The actual I/O is performed using Linux AIO.
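The batching pattern described above can be sketched without libaio as a fixed-size freelist plus a pending queue. This is an illustration under assumptions, not the patch's code. `DemoReq` is a hypothetical stand-in for `struct iocb`, and `demo_submit()` only reports and resets the batch, where the real `ioq_submit()` passes it to `io_submit()`. As in `ioq_init()`, the freelist starts empty, so this sketch's caller primes it with `demo_put()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct iocb. */
typedef struct {
    bool read;
    long long offset;
} DemoReq;

typedef struct {
    unsigned int max_reqs;     /* capacity of freelist and queue */
    DemoReq **freelist;        /* stack of idle requests */
    unsigned int freelist_idx;
    DemoReq **queue;           /* prepared but not yet submitted */
    unsigned int queue_idx;
} DemoQueue;

static void demo_init(DemoQueue *q, unsigned int max_reqs)
{
    q->max_reqs = max_reqs;
    q->freelist = calloc(max_reqs, sizeof q->freelist[0]);
    q->freelist_idx = 0;       /* empty until the caller calls demo_put() */
    q->queue = calloc(max_reqs, sizeof q->queue[0]);
    q->queue_idx = 0;
}

static void demo_cleanup(DemoQueue *q)
{
    free(q->freelist);
    free(q->queue);
}

/* Return a request to the pool (on completion, or to prime the pool). */
static void demo_put(DemoQueue *q, DemoReq *req)
{
    assert(q->freelist_idx != q->max_reqs);
    q->freelist[q->freelist_idx++] = req;
}

/* Pop an idle request, fill it in and append it to the pending batch. */
static DemoReq *demo_prep(DemoQueue *q, bool read, long long offset)
{
    assert(q->freelist_idx != 0);
    DemoReq *req = q->freelist[--q->freelist_idx];
    req->read = read;
    req->offset = offset;
    q->queue[q->queue_idx++] = req;
    return req;
}

/* ioq_submit() hands the whole batch to io_submit(); this sketch only
 * reports the batch size and empties the pending queue. */
static unsigned int demo_submit(DemoQueue *q)
{
    unsigned int n = q->queue_idx;
    q->queue_idx = 0;
    return n;
}
```

Several `demo_prep()` calls followed by one `demo_submit()` correspond to the "add several requests, then submit in one go" usage. Requests return to the freelist only when the caller puts them back, mirroring `ioq_put_iocb()` in the completion path.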

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/dataplane/Makefile.objs |   2 +-
 hw/dataplane/ioq.c | 117 +
 hw/dataplane/ioq.h |  57 ++
 3 files changed, 175 insertions(+), 1 deletion(-)
 create mode 100644 hw/dataplane/ioq.c
 create mode 100644 hw/dataplane/ioq.h

diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
index e26bd7d..abd408f 100644
--- a/hw/dataplane/Makefile.objs
+++ b/hw/dataplane/Makefile.objs
@@ -1,3 +1,3 @@
 ifeq ($(CONFIG_VIRTIO), y)
-common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o
+common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o ioq.o
 endif
diff --git a/hw/dataplane/ioq.c b/hw/dataplane/ioq.c
new file mode 100644
index 000..0c9f5c4
--- /dev/null
+++ b/hw/dataplane/ioq.c
@@ -0,0 +1,117 @@
+/*
+ * Linux AIO request queue
+ *
+ * Copyright 2012 IBM, Corp.
+ * Copyright 2012 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Stefan Hajnoczi stefa...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/dataplane/ioq.h"
+
+void ioq_init(IOQueue *ioq, int fd, unsigned int max_reqs)
+{
+    int rc;
+
+    ioq->fd = fd;
+    ioq->max_reqs = max_reqs;
+
+    memset(&ioq->io_ctx, 0, sizeof ioq->io_ctx);
+    rc = io_setup(max_reqs, &ioq->io_ctx);
+    if (rc != 0) {
+        fprintf(stderr, "ioq io_setup failed %d\n", rc);
+        exit(1);
+    }
+
+    rc = event_notifier_init(&ioq->io_notifier, 0);
+    if (rc != 0) {
+        fprintf(stderr, "ioq io event notifier creation failed %d\n", rc);
+        exit(1);
+    }
+
+    ioq->freelist = g_malloc0(sizeof ioq->freelist[0] * max_reqs);
+    ioq->freelist_idx = 0;
+
+    ioq->queue = g_malloc0(sizeof ioq->queue[0] * max_reqs);
+    ioq->queue_idx = 0;
+}
+
+void ioq_cleanup(IOQueue *ioq)
+{
+    g_free(ioq->freelist);
+    g_free(ioq->queue);
+
+    event_notifier_cleanup(&ioq->io_notifier);
+    io_destroy(ioq->io_ctx);
+}
+
+EventNotifier *ioq_get_notifier(IOQueue *ioq)
+{
+    return &ioq->io_notifier;
+}
+
+struct iocb *ioq_get_iocb(IOQueue *ioq)
+{
+    /* Underflow cannot happen since ioq is sized for max_reqs */
+    assert(ioq->freelist_idx != 0);
+
+    struct iocb *iocb = ioq->freelist[--ioq->freelist_idx];
+    ioq->queue[ioq->queue_idx++] = iocb;
+    return iocb;
+}
+
+void ioq_put_iocb(IOQueue *ioq, struct iocb *iocb)
+{
+    /* Overflow cannot happen since ioq is sized for max_reqs */
+    assert(ioq->freelist_idx != ioq->max_reqs);
+
+    ioq->freelist[ioq->freelist_idx++] = iocb;
+}
+
+struct iocb *ioq_rdwr(IOQueue *ioq, bool read, struct iovec *iov,
+                      unsigned int count, long long offset)
+{
+    struct iocb *iocb = ioq_get_iocb(ioq);
+
+    if (read) {
+        io_prep_preadv(iocb, ioq->fd, iov, count, offset);
+    } else {
+        io_prep_pwritev(iocb, ioq->fd, iov, count, offset);
+    }
+    io_set_eventfd(iocb, event_notifier_get_fd(&ioq->io_notifier));
+    return iocb;
+}
+
+int ioq_submit(IOQueue *ioq)
+{
+    int rc = io_submit(ioq->io_ctx, ioq->queue_idx, ioq->queue);
+    ioq->queue_idx = 0; /* reset */
+    return rc;
+}
+
+int ioq_run_completion(IOQueue *ioq, IOQueueCompletion *completion,
+                       void *opaque)
+{
+    struct io_event events[ioq->max_reqs];
+    int nevents, i;
+
+    do {
+        nevents = io_getevents(ioq->io_ctx, 0, ioq->max_reqs, events, NULL);
+    } while (nevents < 0 && errno == EINTR);
+    if (nevents < 0) {
+        return nevents;
+    }
+
+    for (i = 0; i < nevents; i++) {
+        ssize_t ret = ((uint64_t)events[i].res2 << 32) | events[i].res;
+
+        completion(events[i].obj, ret, opaque);
+        ioq_put_iocb(ioq, events[i].obj);
+    }
+    return nevents;
+}
diff --git a/hw/dataplane/ioq.h b/hw/dataplane/ioq.h
new file mode 100644
index 000..b49b5de
--- /dev/null
+++ b/hw/dataplane/ioq.h
@@ -0,0 +1,57 @@
+/*
+ * Linux AIO request queue
+ *
+ * Copyright 2012 IBM, Corp.
+ * Copyright 2012 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Stefan Hajnoczi stefa...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef IOQ_H
+#define IOQ_H
+
+#include <libaio.h>
+#include "qemu/event_notifier.h"
+
+typedef struct {
+    int fd;                         /* file descriptor */
+    unsigned int max_reqs;          /* max length of freelist and queue */
+
+    io_context_t io_ctx;            /* Linux AIO context */
+    EventNotifier io_notifier;      /* Linux AIO eventfd */
+
+  

[Qemu-devel] [RFC V4 09/30] qcow2: Extract qcow2_dedup_grow_table

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-cluster.c |  102 +++--
 block/qcow2-dedup.c   |3 +-
 block/qcow2.h |6 +++
 3 files changed, 71 insertions(+), 40 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 63a7241..dbcb6d2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -29,44 +29,48 @@
 #include "block/qcow2.h"
 #include "trace.h"
 
-int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
+int qcow2_do_grow_table(BlockDriverState *bs, int min_size, bool exact_size,
+                        uint64_t **table, uint64_t *table_offset,
+                        int *table_size, qcow2_save_table save_table,
+                        const char *table_name)
 {
     BDRVQcowState *s = bs->opaque;
-    int new_l1_size, new_l1_size2, ret, i;
-    uint64_t *new_l1_table;
-    int64_t new_l1_table_offset;
-    uint8_t data[12];
+    int new_size, new_size2, ret, i;
+    uint64_t *new_table;
+    int64_t new_table_offset;
 
-    if (min_size <= s->l1_size)
+    if (min_size <= *table_size) {
         return 0;
+    }
 
     if (exact_size) {
-        new_l1_size = min_size;
+        new_size = min_size;
     } else {
         /* Bump size up to reduce the number of times we have to grow */
-        new_l1_size = s->l1_size;
-        if (new_l1_size == 0) {
-            new_l1_size = 1;
+        new_size = *table_size;
+        if (new_size == 0) {
+            new_size = 1;
         }
-        while (min_size > new_l1_size) {
-            new_l1_size = (new_l1_size * 3 + 1) / 2;
+        while (min_size > new_size) {
+            new_size = (new_size * 3 + 1) / 2;
         }
     }
 
 #ifdef DEBUG_ALLOC2
-    fprintf(stderr, "grow l1_table from %d to %d\n", s->l1_size, new_l1_size);
+    fprintf(stderr, "grow %s_table from %d to %d\n",
+            table_name, *table_size, new_size);
 #endif
 
-    new_l1_size2 = sizeof(uint64_t) * new_l1_size;
-    new_l1_table = g_malloc0(align_offset(new_l1_size2, 512));
-    memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
+    new_size2 = sizeof(uint64_t) * new_size;
+    new_table = g_malloc0(align_offset(new_size2, 512));
+    memcpy(new_table, *table, *table_size * sizeof(uint64_t));
 
     /* write new table (align to cluster) */
     BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ALLOC_TABLE);
-    new_l1_table_offset = qcow2_alloc_clusters(bs, new_l1_size2);
-    if (new_l1_table_offset < 0) {
-        g_free(new_l1_table);
-        return new_l1_table_offset;
+    new_table_offset = qcow2_alloc_clusters(bs, new_size2);
+    if (new_table_offset < 0) {
+        g_free(new_table);
+        return new_table_offset;
     }
 
     ret = qcow2_cache_flush(bs, s->refcount_block_cache);
@@ -75,34 +79,56 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
     }
 
     BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_WRITE_TABLE);
-    for(i = 0; i < s->l1_size; i++)
-        new_l1_table[i] = cpu_to_be64(new_l1_table[i]);
-    ret = bdrv_pwrite_sync(bs->file, new_l1_table_offset, new_l1_table, new_l1_size2);
+    for (i = 0; i < *table_size; i++) {
+        new_table[i] = cpu_to_be64(new_table[i]);
+    }
+    ret = bdrv_pwrite_sync(bs->file, new_table_offset, new_table, new_size2);
     if (ret < 0)
         goto fail;
-    for(i = 0; i < s->l1_size; i++)
-        new_l1_table[i] = be64_to_cpu(new_l1_table[i]);
+    for (i = 0; i < *table_size; i++) {
+        new_table[i] = be64_to_cpu(new_table[i]);
+    }
+
+    g_free(*table);
+    qcow2_free_clusters(bs, *table_offset, *table_size * sizeof(uint64_t));
+    *table_offset = new_table_offset;
+    *table = new_table;
+    *table_size = new_size;
 
     /* set new table */
     BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ACTIVATE_TABLE);
-    cpu_to_be32w((uint32_t*)data, new_l1_size);
-    cpu_to_be64wu((uint64_t*)(data + 4), new_l1_table_offset);
-    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size), data,sizeof(data));
-    if (ret < 0) {
-        goto fail;
-    }
-    g_free(s->l1_table);
-    qcow2_free_clusters(bs, s->l1_table_offset, s->l1_size * sizeof(uint64_t));
-    s->l1_table_offset = new_l1_table_offset;
-    s->l1_table = new_l1_table;
-    s->l1_size = new_l1_size;
+    save_table(bs, *table_offset, *table_size);
+
     return 0;
  fail:
-    g_free(new_l1_table);
-    qcow2_free_clusters(bs, new_l1_table_offset, new_l1_size2);
+    g_free(new_table);
+    qcow2_free_clusters(bs, new_table_offset, new_size2);
     return ret;
 }
 
+static int qcow2_l1_save_table(BlockDriverState *bs,
+                               int64_t table_offset, int size)
+{
+    uint8_t data[12];
+    cpu_to_be32w((uint32_t *)data, size);
+    cpu_to_be64wu((uint64_t *)(data + 4), table_offset);
+    return bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size),
+                            data, sizeof(data));
+}
+
+int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
+{

[Qemu-devel] [Bug 1091766] Re: Physical host crash with Mellanox IB PCI passthrough

2013-01-02 Thread Alex Williamson
I'm currently trying to make that happen, starting with the patches in
comments 2 & 3 and moving to something like the patch in comment 8 in
the development branch.




[Qemu-devel] [RFC V4 12/30] qcow2: Load and save deduplication table header extension.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2.c |   38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 410d3c1..9a7177b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -53,9 +53,16 @@ typedef struct {
     uint32_t len;
 } QCowExtension;
 
+typedef struct {
+    uint64_t offset;
+    int32_t  size;
+    uint8_t  hash_algo;
+} QCowDedupTableExtension;
+
 #define  QCOW2_EXT_MAGIC_END 0
 #define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
 #define  QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
+#define  QCOW2_EXT_MAGIC_DEDUP_TABLE 0xCD8E819B
 
 static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
@@ -83,6 +90,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
     QCowExtension ext;
     uint64_t offset;
     int ret;
+    QCowDedupTableExtension dedup_table_extension;
 
 #ifdef DEBUG_EXT
     printf("qcow2_read_extensions: start=%ld end=%ld\n", start_offset, end_offset);
@@ -147,6 +155,19 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
             }
             break;
 
+        case QCOW2_EXT_MAGIC_DEDUP_TABLE:
+            ret = bdrv_pread(bs->file, offset,
+                             &dedup_table_extension, ext.len);
+            if (ret < 0) {
+                return ret;
+            }
+            s->dedup_table_offset =
+                be64_to_cpu(dedup_table_extension.offset);
+            s->dedup_table_size =
+                be32_to_cpu(dedup_table_extension.size);
+            s->dedup_hash_algo = dedup_table_extension.hash_algo;
+            break;
+
         default:
             /* unknown magic - save it in case we need to rewrite the header */
             {
@@ -958,6 +979,7 @@ int qcow2_update_header(BlockDriverState *bs)
     uint32_t refcount_table_clusters;
     size_t header_length;
     Qcow2UnknownHeaderExtension *uext;
+    QCowDedupTableExtension dedup_table_extension;
 
     buf = qemu_blockalign(bs, buflen);
 
@@ -1061,6 +1083,22 @@ int qcow2_update_header(BlockDriverState *bs)
     buf += ret;
     buflen -= ret;
 
+    if (s->has_dedup) {
+        dedup_table_extension.offset = cpu_to_be64(s->dedup_table_offset);
+        dedup_table_extension.size = cpu_to_be32(s->dedup_table_size);
+        dedup_table_extension.hash_algo = s->dedup_hash_algo;
+        ret = header_ext_add(buf,
+                             QCOW2_EXT_MAGIC_DEDUP_TABLE,
+                             &dedup_table_extension,
+                             sizeof(dedup_table_extension),
+                             buflen);
+        if (ret < 0) {
+            goto fail;
+        }
+        buf += ret;
+        buflen -= ret;
+    }
+
     /* Keep unknown header extensions */
     QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
         ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
-- 
1.7.10.4




[Qemu-devel] [RFC V4 05/30] qcow2: Create a way to link to l2 tables when deduplicating.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-cluster.c |    8 ++++++--
 block/qcow2.h         |    9 +++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 56fccf9..63a7241 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -693,7 +693,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
         old_cluster[j++] = l2_table[l2_index + i];
 
         l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
+                    (i << s->cluster_bits)) |
+                    (m->oflag_copied ? QCOW_OFLAG_COPIED : 0));
  }
 
 
@@ -706,7 +707,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      * If this was a COW, we need to decrease the refcount of the old cluster.
      * Also flush bs->file to get the right order for L2 and refcount update.
      */
-    if (j != 0) {
+    if (!m->overwrite && j != 0) {
         for (i = 0; i < j; i++) {
             qcow2_free_any_clusters(bs, be64_to_cpu(old_cluster[i]), 1);
         }
@@ -1006,6 +1007,9 @@ again:
                 .offset     = nb_sectors * BDRV_SECTOR_SIZE,
                 .nb_sectors = avail_sectors - nb_sectors,
             },
+
+            .oflag_copied   = true,
+            .overwrite      = false,
         };
         qemu_co_queue_init(&(*m)->dependent_requests);
         QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
diff --git a/block/qcow2.h b/block/qcow2.h
index 3307481..9403431 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -59,6 +59,10 @@
 #define DEFAULT_CLUSTER_SIZE 65536
 
 #define HASH_LENGTH 32
+/* indicate that the hash structure is empty and miss offset */
+#define QCOW_FLAG_EMPTY   (1LL << 62)
+/* indicate that the cluster for this hash has QCOW_OFLAG_COPIED on disk */
+#define QCOW_FLAG_FIRST   (1LL << 63)
 
 typedef enum {
     QCOW_HASH_SHA256 = 0,
@@ -289,6 +293,11 @@ typedef struct QCowL2Meta
      */
     CoQueue dependent_requests;
 
+    /* set to true if QCOW_OFLAG_COPIED must be set in the L2 table entry */
+    bool oflag_copied;
+    /* set to true if we are overwriting an L2 table entry */
+    bool overwrite;
+
     /**
      * The COW Region between the start of the first allocated cluster and the
      * area the guest actually writes to.
-- 
1.7.10.4




[Qemu-devel] [RFC V4 16/30] qcow2: Extract qcow2_add_feature and qcow2_remove_feature.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2.c |   49 ++---
 block/qcow2.h |4 ++--
 2 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 499e939..ad399c8 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -236,61 +236,72 @@ static void report_unsupported_feature(BlockDriverState *bs,
 }
 
 /*
- * Sets the dirty bit and flushes afterwards if necessary.
+ * Sets an incompatible feature bit and flushes afterwards if necessary.
  *
  * The incompatible_features bit is only set if the image file header was
  * updated successfully.  Therefore it is not required to check the return
  * value of this function.
  */
-int qcow2_mark_dirty(BlockDriverState *bs)
+static int qcow2_add_feature(BlockDriverState *bs,
+ QCow2IncompatibleFeature feature)
 {
 BDRVQcowState *s = bs->opaque;
 uint64_t val;
-int ret;
+int ret = 0;
 
 assert(s->qcow_version >= 3);
 
-if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
-return 0; /* already dirty */
+if (s->incompatible_features & feature) {
+return 0; /* already added */
 }
 
-val = cpu_to_be64(s->incompatible_features | QCOW2_INCOMPAT_DIRTY);
+val = cpu_to_be64(s->incompatible_features | feature);
 ret = bdrv_pwrite(bs->file, offsetof(QCowHeader, incompatible_features),
   &val, sizeof(val));
 if (ret < 0) {
 return ret;
 }
-ret = bdrv_flush(bs->file);
-if (ret < 0) {
-return ret;
-}
 
-/* Only treat image as dirty if the header was updated successfully */
-s->incompatible_features |= QCOW2_INCOMPAT_DIRTY;
+/* Only treat image as having the feature if the header was updated
+ * successfully
+ */
+s->incompatible_features |= feature;
 return 0;
 }
 
+int qcow2_mark_dirty(BlockDriverState *bs)
+{
+return qcow2_add_feature(bs, QCOW2_INCOMPAT_DIRTY);
+}
+
 /*
- * Clears the dirty bit and flushes before if necessary.  Only call this
- * function when there are no pending requests, it does not guard against
- * concurrent requests dirtying the image.
+ * Clears an incompatible feature bit and flushes before if necessary.
+ * Only call this function when there are no pending requests, it does not
+ * guard against concurrent requests adding a feature to the image.
  */
-static int qcow2_mark_clean(BlockDriverState *bs)
+static int qcow2_remove_feature(BlockDriverState *bs,
+ QCow2IncompatibleFeature feature)
 {
 BDRVQcowState *s = bs->opaque;
+int ret = 0;
 
-if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
-int ret = bdrv_flush(bs);
+if (s->incompatible_features & feature) {
+ret = bdrv_flush(bs);
 if (ret < 0) {
 return ret;
 }
 
-s->incompatible_features &= ~QCOW2_INCOMPAT_DIRTY;
+s->incompatible_features &= ~feature;
 return qcow2_update_header(bs);
 }
 return 0;
 }
 
+static int qcow2_mark_clean(BlockDriverState *bs)
+{
+return qcow2_remove_feature(bs, QCOW2_INCOMPAT_DIRTY);
+}
+
 static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
BdrvCheckMode fix)
 {
diff --git a/block/qcow2.h b/block/qcow2.h
index 43586f2..7813c4c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -159,14 +159,14 @@ enum {
 };
 
 /* Incompatible feature bits */
-enum {
+typedef enum {
 QCOW2_INCOMPAT_DIRTY_BITNR   = 0,
 QCOW2_INCOMPAT_DIRTY = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
 QCOW2_INCOMPAT_DEDUP_BITNR   = 1,
 QCOW2_INCOMPAT_DEDUP = 1 << QCOW2_INCOMPAT_DEDUP_BITNR,
 
 QCOW2_INCOMPAT_MASK  = QCOW2_INCOMPAT_DIRTY | QCOW2_INCOMPAT_DEDUP,
-};
+} QCow2IncompatibleFeature;
 
 /* Compatible feature bits */
 enum {
-- 
1.7.10.4




[Qemu-devel] [RFC V4 08/30] qcow2: Implement qcow2_compute_cluster_hash.

2013-01-02 Thread Benoît Canet
Add detection of libgnutls used to compute SHA256 hashes

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-dedup.c |   13 -
 configure   |   22 ++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 2a444f5..0914267 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -25,6 +25,8 @@
  * THE SOFTWARE.
  */
 
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
 #include "block/block_int.h"
 #include "qemu-common.h"
 #include "qcow2.h"
@@ -157,7 +159,16 @@ static int qcow2_compute_cluster_hash(BlockDriverState *bs,
QCowHash *hash,
uint8_t *data)
 {
-return 0;
+BDRVQcowState *s = bs->opaque;
+switch (s->dedup_hash_algo) {
+case QCOW_HASH_SHA256:
+return gnutls_hash_fast(GNUTLS_DIG_SHA256, data,
+s->cluster_size, hash->data);
+default:
+error_report("Invalid deduplication hash algorithm %i",
+ s->dedup_hash_algo);
+abort();
+}
 }
 
 /*
diff --git a/configure b/configure
index 99c1ec3..390326e 100755
--- a/configure
+++ b/configure
@@ -1724,6 +1724,28 @@ EOF
 fi
 
 ##
+# QCOW Deduplication gnutls detection
+cat > $TMPC << EOF
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
+int main(void) {char data[4096], digest[32];
+gnutls_hash_fast(GNUTLS_DIG_SHA256, data, 4096, digest);
+return 0;
+}
+EOF
+qcow_tls_cflags=`$pkg_config --cflags gnutls 2> /dev/null`
+qcow_tls_libs=`$pkg_config --libs gnutls 2> /dev/null`
+if compile_prog "$qcow_tls_cflags" "$qcow_tls_libs" ; then
+  qcow_tls=yes
+  libs_softmmu="$qcow_tls_libs $libs_softmmu"
+  libs_tools="$qcow_tls_libs $libs_tools"
+  QEMU_CFLAGS="$QEMU_CFLAGS $qcow_tls_cflags"
+else
+  echo "gnutls >= 2.10.0 required to compile QEMU"
+  exit 1
+fi
+
+##
 # VNC SASL detection
 if test "$vnc" = "yes" -a "$vnc_sasl" != "no" ; then
   cat > $TMPC << EOF
-- 
1.7.10.4




Re: [Qemu-devel] [PATCH 8/8] qom: Make CPU a child of DeviceState

2013-01-02 Thread Igor Mammedov
On Wed, 02 Jan 2013 16:08:42 +0100
Andreas Färber afaer...@suse.de wrote:

 Am 05.12.2012 17:49, schrieb Eduardo Habkost:
  This finally makes the CPU class a child of DeviceState, allowing us to
  start using DeviceState properties on CPU subclasses.
 
 To avoid confusion with child properties and DeviceState vs.
 DeviceClass I have reworded this to "subclass of Device" in my
 qom-cpu-dev queue.
 
  
  It has no_user=1, as creating CPUs using -device doesn't work yet.
  
 
  (based on a previous patch from Igor Mammedov)
 
 Can this comment be turned into or amended by the usual Signed-off-by?
Signed-off-by should be ok.

 
  
  Signed-off-by: Eduardo Habkost ehabk...@redhat.com
  ---
  Changes v1 (imammedo) -> v2 (ehabkost):
   - Change CPU type declaration to have TYPE_DEVICE as parent
  
  Changes v2 -> v3 (ehabkost):
   - Set no_user=1 on the CPU class
  ---
   include/qemu/cpu.h | 6 +++---
   qom/cpu.c  | 5 -
   2 files changed, 7 insertions(+), 4 deletions(-)
  
  diff --git a/include/qemu/cpu.h b/include/qemu/cpu.h
  index 61b7698..bc004fd 100644
  --- a/include/qemu/cpu.h
  +++ b/include/qemu/cpu.h
  @@ -20,7 +20,7 @@
   #ifndef QEMU_CPU_H
   #define QEMU_CPU_H
   
  -#include "qemu/object.h"
  +#include "hw/qdev-core.h"
   #include "qemu-thread.h"
   
   /**
 [...]
  diff --git a/qom/cpu.c b/qom/cpu.c
  index 5b36046..d301f72 100644
  --- a/qom/cpu.c
  +++ b/qom/cpu.c
  @@ -20,6 +20,7 @@
   
   #include "qemu/cpu.h"
   #include "qemu-common.h"
  +#include "hw/qdev-core.h"
 
 Already included via qom/cpu.h (formerly qemu/cpu.h) above, dropping.
 
   
   void cpu_reset(CPUState *cpu)
   {
  @@ -36,14 +37,16 @@ static void cpu_common_reset(CPUState *cpu)
   
   static void cpu_class_init(ObjectClass *klass, void *data)
   {
  +DeviceClass *dc = DEVICE_CLASS(klass);
   CPUClass *k = CPU_CLASS(klass);
   
    k->reset = cpu_common_reset;
   +dc->no_user = 1;
   }
 
 I wonder if we should add a comment that we are intentionally not
 hooking up dc->reset (yet)?
not relevant to this patch, could be separate patch though.

 
   
   static TypeInfo cpu_type_info = {
 
 Would like to add the missing const while touching this.
 
   .name = TYPE_CPU,
  -.parent = TYPE_OBJECT,
  +.parent = TYPE_DEVICE,
   .instance_size = sizeof(CPUState),
   .abstract = true,
   .class_size = sizeof(CPUClass),
 
 My testing so far confirms that the combination of object_new() without
 qdev_init[_nofail]() is working fine.
+1, I tested this combo for (x86)-(softmmu|linux-user) targets, no issues were
found so far.

 
 Using qdev_create() in the current state of stubs would lead to a silly
 if-bus-is-NULL-set-it-to-NULL sequence on top of object_new(). I do not
 expect qdev_create() to grow in functionality, so continuing to use
 object_new() should be okay - SoCs like my Tegra model may want to use
 object_initialize() so we cannot prescribe using qdev_create() anyway.
 
 qdev_init_nofail() would call the qdev initfn (to be replaced by
 realizefn, not used for CPU in this patch), then if no parent add it to
 /machine/unassigned, register VMSD if not NULL, update the internal
 state (blocking static property changes) and if hotplugged reset (unused
 due to dc->no_user and lack of dc->reset). The /machine/unassigned part
 may be interesting, e.g., for APIC modelling (so that we can model the
 former ptr property / now pointer-setting as a link property).
 
 With these considerations I am leaning towards accepting this patch if
 nobody objects, so that we can move on to the next refactorings...
+1

 
 Regards,
 Andreas
 




Re: [Qemu-devel] [RFC V4 30/30] qemu-iotests: Filter dedup=on/off so existing tests don't break.

2013-01-02 Thread Eric Blake
On 01/02/2013 09:16 AM, Benoît Canet wrote:
 Signed-off-by: Benoit Canet ben...@irqsave.net
 ---
  tests/qemu-iotests/common.rc |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
 index aef5f52..72e746d 100644
 --- a/tests/qemu-iotests/common.rc
 +++ b/tests/qemu-iotests/common.rc
 @@ -124,7 +124,8 @@ _make_test_img()
  -e "s# compat='[^']*'##g" \
  -e "s# compat6=\\(on\\|off\\)##g" \
  -e "s# static=\\(on\\|off\\)##g" \
 --e "s# lazy_refcounts=\\(on\\|off\\)##g"
 +-e "s# lazy_refcounts=\\(on\\|off\\)##g" \
 +-e "s# dedup=\\('sha256'\\|'skein'\\|'sha3'\\)##g"

Shouldn't this patch be hoisted earlier into the series, or even
squashed in with the patch that introduced the temporary test failures?
 That is, you want 'git bisect' to pass on every patch in the series,
rather than introducing problems in one patch that only get cleaned up
in a later patch.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [PATCH 03/18] dataplane: add host memory mapping code

2013-01-02 Thread Stefan Hajnoczi
The data plane thread needs to map guest physical addresses to host
pointers.  Normally this is done with cpu_physical_memory_map() but the
function assumes the global mutex is held.  The data plane thread does
not touch the global mutex and therefore needs a thread-safe memory
mapping mechanism.

Hostmem registers a MemoryListener similar to how vhost collects and
pushes memory region information into the kernel.  There is a
fine-grained lock on the regions list which is held during lookup and
when installing a new regions list.
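
The lookup side of this scheme can be illustrated with a minimal C sketch.
It assumes a region array kept sorted by guest address with no overlaps,
and uses a pthread mutex in place of QemuMutex; Region, RegionMap,
region_cmp() and region_lookup() are hypothetical names that only loosely
mirror hostmem_lookup() in the patch below.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t hwaddr;        /* stand-in for QEMU's hwaddr */

typedef struct {
    void *host_addr;            /* host pointer for guest_addr */
    hwaddr guest_addr;          /* start of the region in guest memory */
    uint64_t size;              /* region length in bytes */
    bool readonly;
} Region;

typedef struct {
    pthread_mutex_t lock;       /* protects regions and num_regions */
    Region *regions;            /* sorted by guest_addr, non-overlapping */
    size_t num_regions;
} RegionMap;

/* bsearch() comparator: the key is a guest address and it "matches" the
 * region that contains it. */
static int region_cmp(const void *key, const void *elem)
{
    hwaddr phys = *(const hwaddr *)key;
    const Region *r = elem;

    if (phys < r->guest_addr) {
        return -1;
    } else if (phys >= r->guest_addr + r->size) {
        return 1;
    }
    return 0;
}

/* Return a host pointer for [phys, phys + len), or NULL if phys is
 * unmapped, the region is read-only and is_write is set, or the range
 * runs past the end of the region. */
static void *region_lookup(RegionMap *map, hwaddr phys, hwaddr len,
                           bool is_write)
{
    void *host = NULL;
    Region *r;

    assert(len > 0);            /* zero-length lookups are assumed not to occur */

    pthread_mutex_lock(&map->lock);
    r = bsearch(&phys, map->regions, map->num_regions,
                sizeof(map->regions[0]), region_cmp);
    if (r && !(is_write && r->readonly)) {
        hwaddr off = phys - r->guest_addr;
        if (len <= r->size - off) {
            host = (char *)r->host_addr + off;
        }
    }
    pthread_mutex_unlock(&map->lock);
    return host;
}
```

bsearch() works here only because the comparator treats any address
inside [guest_addr, guest_addr + size) as equal to that region, which is
valid as long as the regions never overlap.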

When the physical memory map changes the MemoryListener callbacks are
invoked.  They build up a new list of memory regions which is finally
installed when the list has been completed.
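
The build-then-install scheme described above can be sketched in C as
follows, assuming all updates come from a single thread so that only the
final swap needs the lock. RangeList, range_list_append() and
range_list_commit() are hypothetical names; the patch itself uses
g_realloc()/g_free() and QemuMutex rather than realloc()/free() and
pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t guest_addr;
    uint64_t size;
} Range;

typedef struct {
    pthread_mutex_t lock;  /* protects cur and num_cur */
    Range *cur;            /* list visible to lookups */
    size_t num_cur;
    Range *pending;        /* list under construction, updater thread only */
    size_t num_pending;
} RangeList;

/* Append one section to the pending list. No lock is taken because
 * lookups never read the pending list. Returns 0, or -1 on allocation
 * failure (the pending list is then left unchanged). */
static int range_list_append(RangeList *l, uint64_t guest_addr, uint64_t size)
{
    Range *n;

    assert(size > 0);  /* empty sections are assumed to be filtered out */
    n = realloc(l->pending, (l->num_pending + 1) * sizeof(*n));
    if (!n) {
        return -1;
    }
    n[l->num_pending] = (Range){ guest_addr, size };
    l->pending = n;
    l->num_pending++;
    return 0;
}

/* Publish the pending list: lookups see either the old list or the
 * complete new one, never a partially built list. */
static void range_list_commit(RangeList *l)
{
    pthread_mutex_lock(&l->lock);
    free(l->cur);
    l->cur = l->pending;
    l->num_cur = l->num_pending;
    pthread_mutex_unlock(&l->lock);

    /* start the next update from an empty pending list */
    l->pending = NULL;
    l->num_pending = 0;
}
```

Because the old array is freed while the lock is held, any lookup that
was in progress has already released the lock and finished with it.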

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/Makefile.objs   |   2 +-
 hw/dataplane/Makefile.objs |   3 +
 hw/dataplane/hostmem.c | 176 +
 hw/dataplane/hostmem.h |  57 +++
 4 files changed, 237 insertions(+), 1 deletion(-)
 create mode 100644 hw/dataplane/Makefile.objs
 create mode 100644 hw/dataplane/hostmem.c
 create mode 100644 hw/dataplane/hostmem.h

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index d75f2f0..5ac4913 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y = usb/ ide/ pci/
+common-obj-y = usb/ ide/ pci/ dataplane/
 common-obj-y += loader.o
 common-obj-$(CONFIG_VIRTIO) += virtio-console.o
 common-obj-$(CONFIG_VIRTIO) += virtio-rng.o
diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
new file mode 100644
index 000..8c8dea1
--- /dev/null
+++ b/hw/dataplane/Makefile.objs
@@ -0,0 +1,3 @@
+ifeq ($(CONFIG_VIRTIO), y)
+common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o
+endif
diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c
new file mode 100644
index 000..380537e
--- /dev/null
+++ b/hw/dataplane/hostmem.c
@@ -0,0 +1,176 @@
+/*
+ * Thread-safe guest to host memory mapping
+ *
+ * Copyright 2012 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Stefan Hajnoczi stefa...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "exec/address-spaces.h"
+#include "hostmem.h"
+
+static int hostmem_lookup_cmp(const void *phys_, const void *region_)
+{
+hwaddr phys = *(const hwaddr *)phys_;
+const HostMemRegion *region = region_;
+
+if (phys < region->guest_addr) {
+return -1;
+} else if (phys >= region->guest_addr + region->size) {
+return 1;
+} else {
+return 0;
+}
+}
+
+/**
+ * Map guest physical address to host pointer
+ */
+void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
+{
+HostMemRegion *region;
+void *host_addr = NULL;
+hwaddr offset_within_region;
+
+qemu_mutex_lock(&hostmem->current_regions_lock);
+region = bsearch(&phys, hostmem->current_regions,
+ hostmem->num_current_regions,
+ sizeof(hostmem->current_regions[0]),
+ hostmem_lookup_cmp);
+if (!region) {
+goto out;
+}
+if (is_write && region->readonly) {
+goto out;
+}
+offset_within_region = phys - region->guest_addr;
+if (len <= region->size - offset_within_region) {
+host_addr = region->host_addr + offset_within_region;
+}
+out:
+qemu_mutex_unlock(&hostmem->current_regions_lock);
+
+return host_addr;
+}
+
+/**
+ * Install new regions list
+ */
+static void hostmem_listener_commit(MemoryListener *listener)
+{
+HostMem *hostmem = container_of(listener, HostMem, listener);
+
+qemu_mutex_lock(&hostmem->current_regions_lock);
+g_free(hostmem->current_regions);
+hostmem->current_regions = hostmem->new_regions;
+hostmem->num_current_regions = hostmem->num_new_regions;
+qemu_mutex_unlock(&hostmem->current_regions_lock);
+
+/* Reset new regions list */
+hostmem->new_regions = NULL;
+hostmem->num_new_regions = 0;
+}
+
+/**
+ * Add a MemoryRegionSection to the new regions list
+ */
+static void hostmem_append_new_region(HostMem *hostmem,
+  MemoryRegionSection *section)
+{
+void *ram_ptr = memory_region_get_ram_ptr(section->mr);
+size_t num = hostmem->num_new_regions;
+size_t new_size = (num + 1) * sizeof(hostmem->new_regions[0]);
+
+hostmem->new_regions = g_realloc(hostmem->new_regions, new_size);
+hostmem->new_regions[num] = (HostMemRegion){
+.host_addr = ram_ptr + section->offset_within_region,
+.guest_addr = section->offset_within_address_space,
+.size = section->size,
+.readonly = section->readonly,
+};
+hostmem->num_new_regions++;
+}
+
+static void hostmem_listener_append_region(MemoryListener *listener,
+   MemoryRegionSection *section)
+{
+HostMem *hostmem = container_of(listener, HostMem, listener);
+
+/* Ignore 

[Qemu-devel] [RFC V4 06/30] qcow2: Add qcow2_dedup and related functions

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-dedup.c |  436 +++
 block/qcow2.h   |5 +
 2 files changed, 441 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 4e99eb1..5901749 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -117,3 +117,439 @@ fail:
 *data = NULL;
 return ret;
 }
+
+/*
+ * Build a QCowHashNode structure
+ *
+ * @hash:   the given hash
+ * @physical_sect:  the cluster offset in the QCOW2 file
+ * @first_logical_sect: the first logical cluster offset written
+ * @ret:the built QCowHashNode
+ */
+static QCowHashNode *qcow2_dedup_build_qcow_hash_node(QCowHash *hash,
+  uint64_t physical_sect,
+  uint64_t first_logical_sect)
+{
+QCowHashNode *hash_node;
+
+hash_node = g_new0(QCowHashNode, 1);
+memcpy(hash_node->hash.data, hash->data, HASH_LENGTH);
+hash_node->physical_sect = physical_sect;
+hash_node->first_logical_sect = first_logical_sect;
+
+return hash_node;
+}
+
+/*
+ * Compute the hash of a given cluster
+ *
+ * @data: a buffer containing the cluster data
+ * @hash: a QCowHash where to store the computed hash
+ * @ret:  0 on success, negative on error
+ */
+static int qcow2_compute_cluster_hash(BlockDriverState *bs,
+   QCowHash *hash,
+   uint8_t *data)
+{
+return 0;
+}
+
+/*
+ * Get a QCowHashNode corresponding to a cluster data
+ *
+ * @phash:   if phash can be used no hash is computed
+ * @data:a buffer containing the cluster
+ * @nb_clusters_processed: the number of clusters to skip in the buffer
+ * @err: Error code if any
+ * @ret: QCowHashNode of the duplicated cluster or NULL if not found
+ */
+static QCowHashNode *qcow2_get_hash_node_for_cluster(BlockDriverState *bs,
+ QcowPersistantHash *phash,
+ uint8_t *data,
+ int nb_clusters_processed,
+ int *err)
+{
+BDRVQcowState *s = bs->opaque;
+int ret = 0;
+*err = 0;
+
+/* no hash has been provided compute it and store it for later usage */
+if (!phash->reuse) {
+ret = qcow2_compute_cluster_hash(bs,
+ &phash->hash,
+ data +
+ nb_clusters_processed *
+ s->cluster_size);
+}
+
+/* do not reuse the hash anymore if it was precomputed */
+phash->reuse = false;
+
+if (ret < 0) {
+*err = ret;
+return NULL;
+}
+
+return g_tree_lookup(s->dedup_tree_by_hash, &phash->hash);
+}
+
+/*
+ * Build a QCowHashNode from a given QCowHash and insert it into the tree
+ *
+ * @hash: the given QCowHash
+ */
+static void qcow2_build_and_insert_hash_node(BlockDriverState *bs,
+ QCowHash *hash)
+{
+BDRVQcowState *s = bs->opaque;
+QCowHashNode *hash_node;
+
+/* build the hash node with QCOW_FLAG_EMPTY as offsets so we will remember
+ * to fill these fields later with real values.
+ */
+hash_node = qcow2_dedup_build_qcow_hash_node(hash,
+ QCOW_FLAG_EMPTY,
+ QCOW_FLAG_EMPTY);
+g_tree_insert(s->dedup_tree_by_hash, &hash_node->hash, hash_node);
+}
+
+/*
+ * Helper used to build a QCowHashElement
+ *
+ * @hash: the QCowHash to use
+ * @ret:  a newly allocated QCowHashElement containing the given hash
+ */
+static QCowHashElement *qcow2_build_dedup_hash(QCowHash *hash)
+{
+QCowHashElement *dedup_hash;
+dedup_hash = g_new0(QCowHashElement, 1);
+memcpy(dedup_hash->hash.data, hash->data, HASH_LENGTH);
+return dedup_hash;
+}
+
+/*
+ * Helper used to link a deduplicated cluster in the l2
+ *
+ * @logical_sect:  the cluster sector seen by the guest
+ * @physical_sect: the cluster sector in the QCOW2 file
+ * @overwrite: true if we must overwrite the L2 table entry
+ * @ret:
+ */
+static int qcow2_dedup_link_l2(BlockDriverState *bs,
+   uint64_t logical_sect,
+   uint64_t physical_sect,
+   bool overwrite)
+{
+QCowL2Meta m = {
+.alloc_offset   = physical_sect << 9,
+.offset = logical_sect << 9,
+.nb_clusters= 1,
+.nb_available   = 0,
+.cow_start = {
+.offset = 0,
+.nb_sectors = 0,
+},
+.cow_end = {
+.offset = 0,
+.nb_sectors = 0,
+},
+.oflag_copied   = false,
+.overwrite 

[Qemu-devel] [RFC V4 03/30] qcow2: Add qcow2_dedup_read_missing_and_concatenate

2013-01-02 Thread Benoît Canet
This function is used to read missing data when unaligned writes are
done. It also concatenates the missing data with the given qiov data
in order to prepare a buffer used to look for duplicated clusters.
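
As an illustration of the alignment arithmetic involved, the sketch below
computes how many sectors must be read before and after a write to widen
it to whole clusters. It assumes cluster_sectors is a power of two, as
qcow2 cluster sizes are; Padding and compute_padding() are hypothetical
names that mirror the computation at the top of
qcow2_dedup_read_missing_and_concatenate() below.

```c
#include <assert.h>
#include <stdint.h>

/* How a guest write must be widened to whole clusters (all in sectors). */
typedef struct {
    uint64_t first_sector;  /* cluster-aligned start of the combined buffer */
    int head;               /* sectors to read before the guest data */
    int tail;               /* sectors to read after the guest data */
    int total;              /* head + nb_sectors + tail */
} Padding;

static Padding compute_padding(uint64_t sector_num, int nb_sectors,
                               int cluster_sectors)
{
    Padding p;
    uint64_t mask = (uint64_t)cluster_sectors - 1;
    uint64_t end = sector_num + nb_sectors;  /* first sector after the write */
    int unaligned_end;

    /* the masking below is only valid for power-of-two cluster sizes */
    assert(cluster_sectors > 0 && (cluster_sectors & (cluster_sectors - 1)) == 0);

    p.head = (int)(sector_num & mask);
    p.first_sector = sector_num - p.head;
    unaligned_end = (int)(end & mask);
    p.tail = unaligned_end ? cluster_sectors - unaligned_end : 0;
    p.total = p.head + nb_sectors + p.tail;
    return p;
}
```

For example, with 64 KiB clusters (128 sectors), a 10-sector write at
sector 130 needs 2 sectors read before it and 116 after, giving a single
128-sector buffer that starts at sector 128.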

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/Makefile.objs |1 +
 block/qcow2-dedup.c |  119 +++
 block/qcow2.c   |   36 +++-
 block/qcow2.h   |   12 ++
 4 files changed, 167 insertions(+), 1 deletion(-)
 create mode 100644 block/qcow2-dedup.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index c067f38..21afc85 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,5 +1,6 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-obj-y += qcow2-dedup.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
new file mode 100644
index 000..4e99eb1
--- /dev/null
+++ b/block/qcow2-dedup.c
@@ -0,0 +1,119 @@
+/*
+ * Deduplication for the QCOW2 format
+ *
+ * Copyright (C) Nodalink, SARL. 2012-2013
+ *
+ * Author:
+ *   Benoît Canet benoit.ca...@irqsave.net
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block/block_int.h"
+#include "qemu-common.h"
+#include "qcow2.h"
+
+/*
+ * Prepare a buffer containing all the required data required to compute cluster
+ * sized deduplication hashes.
+ * If sector_num or nb_sectors are not cluster-aligned, missing data
+ * before/after the qiov will be read.
+ *
+ * @qiov:   the qiov for which missing data must be read
+ * @sector_num: the first sectors that must be read into the qiov
+ * @nb_sectors: the number of sectors to read into the qiov
+ * @data:   the place where the data will be concatenated and stored
+ * @nb_data_sectors:the resulting size of the concatenated data (in sectors)
+ * @ret:negative on error
+ */
+int qcow2_dedup_read_missing_and_concatenate(BlockDriverState *bs,
+ QEMUIOVector *qiov,
+ uint64_t sector_num,
+ int nb_sectors,
+ uint8_t **data,
+ int *nb_data_sectors)
+{
+BDRVQcowState *s = bs->opaque;
+int ret = 0;
+uint64_t cluster_beginning_sector;
+uint64_t first_sector_after_qiov;
+int cluster_beginning_nr;
+int cluster_ending_nr;
+int unaligned_ending_nr;
+uint64_t max_cluster_ending_nr;
+
+/* compute how much and where to read at the beginning */
+cluster_beginning_nr = sector_num & (s->cluster_sectors - 1);
+cluster_beginning_sector = sector_num - cluster_beginning_nr;
+
+/* for the ending */
+first_sector_after_qiov = sector_num + nb_sectors;
+unaligned_ending_nr = first_sector_after_qiov & (s->cluster_sectors - 1);
+cluster_ending_nr = unaligned_ending_nr ?
+s->cluster_sectors - unaligned_ending_nr : 0;
+
+/* compute total size in sectors and allocate memory */
+*nb_data_sectors = cluster_beginning_nr + nb_sectors + cluster_ending_nr;
+*data = qemu_blockalign(bs, *nb_data_sectors * BDRV_SECTOR_SIZE);
+
+/* read beginning */
+if (cluster_beginning_nr) {
+ret = qcow2_read_cluster_data(bs,
+  *data,
+  cluster_beginning_sector,
+  cluster_beginning_nr);
+}
+
+if (ret < 0) {
+goto fail;
+}
+
+/* append qiov content */
+qemu_iovec_to_buf(qiov, 0, *data + cluster_beginning_nr * BDRV_SECTOR_SIZE,
+   

[Qemu-devel] [RFC V4 07/30] qcow2: Add qcow2_dedup_store_new_hashes.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-dedup.c |  315 ++-
 block/qcow2.h   |5 +
 2 files changed, 319 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 5901749..2a444f5 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -29,6 +29,12 @@
 #include qemu-common.h
 #include qcow2.h
 
+static int qcow2_dedup_read_write_hash(BlockDriverState *bs,
+   QCowHash *hash,
+   uint64_t *first_logical_sect,
+   uint64_t physical_sect,
+   bool write);
+
 /*
  * Prepare a buffer containing all the required data required to compute cluster
  * sized deduplication hashes.
@@ -291,7 +297,11 @@ static int qcow2_clear_l2_copied_flag_if_needed(BlockDriverState *bs,
 /* remember that we don't need to clear QCOW_OFLAG_COPIED again */
 hash_node->first_logical_sect = first_logical_sect;
 
-return 0;
+/* clear the QCOW_FLAG_FIRST flag from disk */
+return qcow2_dedup_read_write_hash(bs, &hash_node->hash,
+   &hash_node->first_logical_sect,
+   hash_node->physical_sect,
+   true);
 }
 
 /* This function deduplicates a cluster
@@ -553,3 +563,306 @@ exit:
 
 return deduped_clusters_nr * s->cluster_sectors - begining_index;
 }
+
+
+/* Create a deduplication table hash block, write its offset to disk and
+ * reference it in the RAM deduplication table
+ *
+ * sync this to disk and get the dedup cluster cache entry
+ *
+ * @index: index in the RAM deduplication table
+ * @ret:   offset on success, negative on error
+ */
+static int64_t qcow2_create_block(BlockDriverState *bs,
+   int32_t index)
+{
+BDRVQcowState *s = bs->opaque;
+int64_t offset;
+uint64_t data64;
+int ret = 0;
+
+/* allocate a new dedup table hash block */
+offset = qcow2_alloc_clusters(bs, s->hash_block_size);
+
+if (offset < 0) {
+return offset;
+}
+
+ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+if (ret < 0) {
+goto free_fail;
+}
+
+/* write the new block offset in the dedup table L1 */
+data64 = cpu_to_be64(offset);
+ret = bdrv_pwrite_sync(bs->file,
+   s->dedup_table_offset +
+   index * sizeof(uint64_t),
+   &data64, sizeof(data64));
+
+if (ret < 0) {
+goto free_fail;
+}
+
+s->dedup_table[index] = offset;
+
+return offset;
+
+free_fail:
+qcow2_free_clusters(bs, offset, s->hash_block_size);
+return ret;
+}
+
+static int qcow2_create_and_get_block(BlockDriverState *bs,
+  uint32_t index,
+  uint8_t **block)
+{
+BDRVQcowState *s = bs->opaque;
+int ret = 0;
+int64_t offset;
+
+offset = qcow2_create_block(bs, index);
+
+if (offset < 0) {
+return offset;
+}
+
+
+/* get an empty cluster from the dedup cache */
+ret = qcow2_cache_get_empty(bs, s->dedup_cluster_cache,
+offset,
+(void **) block);
+
+if (ret < 0) {
+return ret;
+}
+
+/* clear it */
+memset(*block, 0, s->hash_block_size);
+
+return 0;
+}
+
+static inline bool qcow2_has_dedup_block(BlockDriverState *bs,
+ uint32_t index)
+{
+BDRVQcowState *s = bs->opaque;
+return s->dedup_table[index] != 0;
+}
+
+static inline void qcow2_write_hash_to_block_and_dirty(BlockDriverState *bs,
+   uint8_t *block,
+   QCowHash *hash,
+   int offset,
+   uint64_t *logical_sect)
+{
+BDRVQcowState *s = bs->opaque;
+uint64_t first;
+first = cpu_to_be64(*logical_sect);
+memcpy(block + offset, hash->data, HASH_LENGTH);
+memcpy(block + offset + HASH_LENGTH, &first, 8);
+qcow2_cache_entry_mark_dirty(s->dedup_cluster_cache, block);
+}
+
+static inline uint64_t qcow2_read_hash_from_block(uint8_t *block,
+  QCowHash *hash,
+  int offset)
+{
+uint64_t first;
+memcpy(hash->data, block + offset, HASH_LENGTH);
+memcpy(&first, block + offset + HASH_LENGTH, 8);
+return be64_to_cpu(first);
+}
+
+/* Read/write a given hash and cluster_sect from/to the dedup table
+ *
+ * This function doesn't flush the dedup cache to disk
+ *
+ * @hash: the hash to read or store
+ * @first_logical_sect:   logical sector of the QCOW_FLAG_OCOPIED 

Re: [Qemu-devel] [PATCH 8/8] qom: Make CPU a child of DeviceState

2013-01-02 Thread Eduardo Habkost
On Wed, Jan 02, 2013 at 05:40:58PM +0100, Igor Mammedov wrote:
 On Wed, 02 Jan 2013 16:08:42 +0100
 Andreas Färber afaer...@suse.de wrote:
 
  Am 05.12.2012 17:49, schrieb Eduardo Habkost:
   This finally makes the CPU class a child of DeviceState, allowing us to
   start using DeviceState properties on CPU subclasses.
  
  To avoid confusion with child properties and DeviceState vs.
  DeviceClass I have reworded this to "subclass of Device" in my
  qom-cpu-dev queue.
  
   
   It has no_user=1, as creating CPUs using -device doesn't work yet.
   
  
   (based on a previous patch from Igor Mammedov)
  
  Can this comment be turned into or amended by the usual Signed-off-by?
 Signed-off-by should be ok.

OK to me, as well. Should I resubmit, or can Andreas edit it when
committing the patch?

 
  
   
   Signed-off-by: Eduardo Habkost ehabk...@redhat.com
   ---
   Changes v1 (imammedo) -> v2 (ehabkost):
 - Change CPU type declaration to have TYPE_DEVICE as parent
   
   Changes v2 -> v3 (ehabkost):
- Set no_user=1 on the CPU class
   ---
include/qemu/cpu.h | 6 +++---
qom/cpu.c  | 5 -
2 files changed, 7 insertions(+), 4 deletions(-)
   
   diff --git a/include/qemu/cpu.h b/include/qemu/cpu.h
   index 61b7698..bc004fd 100644
   --- a/include/qemu/cpu.h
   +++ b/include/qemu/cpu.h
   @@ -20,7 +20,7 @@
#ifndef QEMU_CPU_H
#define QEMU_CPU_H

   -#include "qemu/object.h"
   +#include "hw/qdev-core.h"
 #include "qemu-thread.h"

/**
  [...]
   diff --git a/qom/cpu.c b/qom/cpu.c
   index 5b36046..d301f72 100644
   --- a/qom/cpu.c
   +++ b/qom/cpu.c
   @@ -20,6 +20,7 @@

#include "qemu/cpu.h"
 #include "qemu-common.h"
   +#include "hw/qdev-core.h"
  
  Already included via qom/cpu.h (formerly qemu/cpu.h) above, dropping.
  

void cpu_reset(CPUState *cpu)
{
   @@ -36,14 +37,16 @@ static void cpu_common_reset(CPUState *cpu)

static void cpu_class_init(ObjectClass *klass, void *data)
{
   +DeviceClass *dc = DEVICE_CLASS(klass);
CPUClass *k = CPU_CLASS(klass);

 k->reset = cpu_common_reset;
   +dc->no_user = 1;
}
  
  I wonder if we should add a comment that we are intentionally not
  hooking up dc->reset (yet)?
 not relevant to this patch, could be separate patch though.
 
  

static TypeInfo cpu_type_info = {
  
  Would like to add the missing const while touching this.
  
.name = TYPE_CPU,
   -.parent = TYPE_OBJECT,
   +.parent = TYPE_DEVICE,
.instance_size = sizeof(CPUState),
.abstract = true,
.class_size = sizeof(CPUClass),
  
  My testing so far confirms that the combination of object_new() without
  qdev_init[_nofail]() is working fine.
 +1, I tested this combo for (x86)-(softmmu|linux-user) targets, no issues were
 found so far.
 
  
  Using qdev_create() in the current state of stubs would lead to a silly
  if-bus-is-NULL-set-it-to-NULL sequence on top of object_new(). I do not
  expect qdev_create() to grow in functionality, so continuing to use
  object_new() should be okay - SoCs like my Tegra model may want to use
  object_initialize() so we cannot prescribe using qdev_create() anyway.
  
  qdev_init_nofail() would call the qdev initfn (to be replaced by
  realizefn, not used for CPU in this patch), then if no parent add it to
  /machine/unassigned, register VMSD if not NULL, update the internal
  state (blocking static property changes) and if hotplugged reset (unused
  due to dc->no_user and lack of dc->reset). The /machine/unassigned part
  may be interesting, e.g., for APIC modelling (so that we can model the
  former ptr property / now pointer-setting as a link property).
  
  With these considerations I am leaning towards accepting this patch if
  nobody objects, so that we can move on to the next refactorings...
 +1
 
  
  Regards,
  Andreas
  
 

-- 
Eduardo



[Qemu-devel] [RFC V4 24/30] qcow2: Do not overwrite existing entries with QCOW_OFLAG_COPIED.

2013-01-02 Thread Benoît Canet
In the case of a race condition between two writes, an L2 entry can be
written without QCOW_OFLAG_COPIED before the first write fills it.
This patch simply checks whether the L2 entry already holds the expected
offset without QCOW_OFLAG_COPIED and, if so, leaves it untouched.
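
A minimal sketch of the check, assuming the L2 entry has already been
converted to host byte order with be64_to_cpu(). OFLAG_COPIED stands in
for QCOW_OFLAG_COPIED (bit 63), and l2_make_entry() and
l2_entry_already_linked() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OFLAG_COPIED (1ULL << 63)  /* same bit as qcow2's QCOW_OFLAG_COPIED */

/* Build an L2 entry for a host cluster offset, optionally with COPIED set. */
static uint64_t l2_make_entry(uint64_t host_offset, bool copied)
{
    return host_offset | (copied ? OFLAG_COPIED : 0);
}

/* True when the i-th entry of an allocation already points at the expected
 * cluster without the COPIED flag, i.e. a concurrent request linked it
 * first and the entry should be skipped. */
static bool l2_entry_already_linked(uint64_t entry, uint64_t cluster_offset,
                                    int i, int cluster_bits)
{
    /* qcow2 cluster sizes range from 512 bytes to 2 MiB */
    assert(cluster_bits >= 9 && cluster_bits <= 21);
    return entry == cluster_offset + ((uint64_t)i << cluster_bits);
}
```

An entry that already carries the COPIED flag does not compare equal, so
it still goes through the normal update path.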

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-cluster.c |4 
 1 file changed, 4 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index dbcb6d2..07037a0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -709,6 +709,10 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
 
 for (i = 0; i < m->nb_clusters; i++) {
+if (be64_to_cpu(l2_table[l2_index + i]) ==
+(cluster_offset + (i << s->cluster_bits))) {
+continue;
+}
 /* if two concurrent writes happen to the same unallocated cluster
 * each write allocates separate cluster and writes data concurrently.
 * The first one to complete updates l2 table with pointer to its
-- 
1.7.10.4




[Qemu-devel] [RFC V4 26/30] qcow2: Add lazy refcounts to deduplication to prevent qcow2_cache_set_dependency loops

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 0154d50..f66e67d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1606,6 +1606,7 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
 return hash_algo;
 }
 dedup = true;
+flags |= BLOCK_FLAG_LAZY_REFCOUNTS;
 }
 options++;
 }
-- 
1.7.10.4




Re: [Qemu-devel] [RFC V4 30/30] qemu-iotests: Filter dedup=on/off so existing tests don't break.

2013-01-02 Thread Benoît Canet

Ack.
There is more than one patch to move.
I'll do it for the next RFC.

Regards

Benoît

On Wednesday 02 Jan 2013 at 09:42:06 (-0700), Eric Blake wrote:
 On 01/02/2013 09:16 AM, Benoît Canet wrote:
  Signed-off-by: Benoit Canet ben...@irqsave.net
  ---
   tests/qemu-iotests/common.rc |3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)
  
  diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
  index aef5f52..72e746d 100644
  --- a/tests/qemu-iotests/common.rc
  +++ b/tests/qemu-iotests/common.rc
  @@ -124,7 +124,8 @@ _make_test_img()
   -e "s# compat='[^']*'##g" \
   -e "s# compat6=\\(on\\|off\\)##g" \
   -e "s# static=\\(on\\|off\\)##g" \
  --e "s# lazy_refcounts=\\(on\\|off\\)##g"
  +-e "s# lazy_refcounts=\\(on\\|off\\)##g" \
  +-e "s# dedup=\\('sha256'\\|'skein'\\|'sha3'\\)##g"
 
 Shouldn't this patch be hoisted earlier into the series, or even
 squashed in with the patch that introduced the temporary test failures?
  That is, you want 'git bisect' to pass on every patch in the series,
 rather than introducing problems in one patch that only get cleaned up
 in a later patch.
 
 -- 
 Eric Blake   eblake redhat com    +1-919-301-3266
 Libvirt virtualization library http://libvirt.org
 





[Qemu-devel] [PATCH 15/18] qemu-img: report size overflow error message

2013-01-02 Thread Stefan Hajnoczi
From: liguang lig.f...@cn.fujitsu.com

When a qcow or qcow2 image size overflows 64 bits, qemu-img complains
with a misleading message. Report the right message in this case.

$./qemu-img create -f qcow2 /tmp/foo 0x1
before change:
qemu-img: Invalid image size specified! You may use k, M, G or T suffixes for
qemu-img: kilobytes, megabytes, gigabytes and terabytes.

after change:
qemu-img: Image size must be less than 8 EiB!

[Resolved conflict with a9300911 goto removal -- Stefan]

Signed-off-by: liguang lig.f...@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 qemu-img.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 69cc028..85d3740 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -348,9 +348,13 @@ static int img_create(int argc, char **argv)
 char *end;
 sval = strtosz_suffix(argv[optind++], &end, STRTOSZ_DEFSUFFIX_B);
 if (sval < 0 || *end) {
-error_report("Invalid image size specified! You may use k, M, G or "
-  "T suffixes for ");
-error_report("kilobytes, megabytes, gigabytes and terabytes.");
+if (sval == -ERANGE) {
+error_report("Image size must be less than 8 EiB!");
+} else {
+error_report("Invalid image size specified! You may use k, M, "
+  "G or T suffixes for ");
+error_report("kilobytes, megabytes, gigabytes and terabytes.");
+}
 return 1;
 }
 img_size = (uint64_t)sval;
-- 
1.8.0.2
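
The patch relies on strtosz_suffix() returning -ERANGE when the value does not fit. A minimal standalone sketch of the same error split, assuming a hypothetical parse_size() (not strtosz_suffix()) that only understands the k/K, M, G and T suffixes and treats anything at or above 2^63 bytes (8 EiB) as out of range:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical size parser: returns the size in bytes, -ERANGE when the
 * value is 8 EiB (2^63 bytes) or more, and -EINVAL for malformed input.
 */
static int64_t parse_size(const char *str)
{
    char *end;
    unsigned long long val;
    int shift;

    assert(str != NULL);
    if (*str == '-') {
        return -EINVAL;                 /* strtoull() would wrap negatives */
    }
    errno = 0;
    val = strtoull(str, &end, 0);
    if (end == str) {
        return -EINVAL;
    }
    if (errno == ERANGE) {
        return -ERANGE;                 /* does not even fit in 64 bits */
    }
    switch (*end) {
    case '\0': shift = 0; break;
    case 'k': case 'K': shift = 10; end++; break;
    case 'M': shift = 20; end++; break;
    case 'G': shift = 30; end++; break;
    case 'T': shift = 40; end++; break;
    default: return -EINVAL;
    }
    if (*end != '\0') {
        return -EINVAL;
    }
    /* Anything above INT64_MAX after scaling cannot be represented. */
    if (val > (unsigned long long)(INT64_MAX >> shift)) {
        return -ERANGE;
    }
    return (int64_t)(val << shift);
}
```

A caller shaped like img_create() would then print "Image size must be less than 8 EiB!" for -ERANGE and the generic suffix hint for any other negative value.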




[Qemu-devel] [PATCH 17/18] sheepdog: don't update inode when create_and_write fails

2013-01-02 Thread Stefan Hajnoczi
From: Liu Yuan tailai...@taobao.com

In error cases such as SD_RES_NO_SPACE, we shouldn't update the inode bitmap;
otherwise the object is marked as allocated even though it was never created
on the server side, which results in I/O errors in the VM on the failed object.

Cc: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
Cc: Kevin Wolf kw...@redhat.com
Signed-off-by: Liu Yuan tailai...@taobao.com
Reviewed-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/sheepdog.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 13dc023..b9186fb 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -714,10 +714,11 @@ static void coroutine_fn aio_read_response(void *opaque)
  * and max_dirty_data_idx are changed to include updated
  * index between them.
  */
-s->inode.data_vdi_id[idx] = s->inode.vdi_id;
-s->max_dirty_data_idx = MAX(idx, s->max_dirty_data_idx);
-s->min_dirty_data_idx = MIN(idx, s->min_dirty_data_idx);
-
+if (rsp.result == SD_RES_SUCCESS) {
+s->inode.data_vdi_id[idx] = s->inode.vdi_id;
+s->max_dirty_data_idx = MAX(idx, s->max_dirty_data_idx);
+s->min_dirty_data_idx = MIN(idx, s->min_dirty_data_idx);
+}
 /*
  * Some requests may be blocked because simultaneous
  * create requests are not allowed, so we search the
-- 
1.8.0.2
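
The fix makes the in-memory inode update conditional on the server's reply. A minimal sketch of that rule with hypothetical stand-ins: SKETCH_RES_SUCCESS mirrors SD_RES_SUCCESS (0), SKETCH_RES_ERROR is an arbitrary non-zero error code, the array size is made up, and the struct only models the fields touched by the hunk above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RES_SUCCESS   0x00  /* mirrors SD_RES_SUCCESS */
#define SKETCH_RES_ERROR     0x01  /* hypothetical failure code */
#define SKETCH_MAX_DATA_OBJS 16    /* hypothetical inode size */

/* Hypothetical subset of the driver state touched by the patch. */
struct sketch_inode_state {
    uint32_t vdi_id;
    uint32_t data_vdi_id[SKETCH_MAX_DATA_OBJS];
    uint32_t min_dirty_data_idx;
    uint32_t max_dirty_data_idx;
};

/* Record object idx as allocated only if the server created it. */
static void on_create_and_write_response(struct sketch_inode_state *s,
                                         uint32_t idx, uint32_t result)
{
    assert(idx < SKETCH_MAX_DATA_OBJS);
    if (result != SKETCH_RES_SUCCESS) {
        return;     /* leave the bitmap alone: the object does not exist */
    }
    s->data_vdi_id[idx] = s->vdi_id;
    if (idx > s->max_dirty_data_idx) {
        s->max_dirty_data_idx = idx;
    }
    if (idx < s->min_dirty_data_idx) {
        s->min_dirty_data_idx = idx;
    }
}
```

Starting from an empty dirty range (min = UINT32_MAX, max = 0), a failed reply leaves both the bitmap and the range untouched.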




Re: [Qemu-devel] [PATCH 6/6] usb-tablet: Allow connecting to ehci

2013-01-02 Thread Hans de Goede

Hi,

On 12/30/2012 12:55 PM, Erik Rull wrote:

Hi Hans,

Hans de Goede wrote:

Hi,

On 12/26/2012 01:07 PM, Erik Rull wrote:

Hi Gerd, hi Hans,

is my assumption correct that if I check out and compile this version
from GIT master that the usb-tablet device is automatically routed to
ehci without changing anything else in the qemu call arguments? (And the
performance enhancement takes place automatically)
If not - what has to be changed to get it working?


That depends. If you specify a machine model, you need to change it to
pc-1.4; if you don't specify a machine model, you will get the change
automatically, as 1.4 is the new default machine model.

Regards,

Hans


Thanks.

QEMU shows version 1.3.50 at the moment (from git). Is the 1.4 model internally
already active there?


Probably; that depends on the exact git version you're using (1.3.50 just
indicates you're using a git build, not which version exactly). If you've built
from a recent git checkout, then yes, the default model is already 1.4.

Regards,

Hans



[Qemu-devel] [PATCH 07/18] iov: add iov_discard_front/back() to remove data

2013-01-02 Thread Stefan Hajnoczi
The iov_discard_front/back() functions remove data from the front or
back of the vector.  This is useful when peeling off header/footer
structs.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 include/qemu/iov.h | 13 +
 iov.c  | 51 +++
 2 files changed, 64 insertions(+)

diff --git a/include/qemu/iov.h b/include/qemu/iov.h
index d06f8b9..68d25f2 100644
--- a/include/qemu/iov.h
+++ b/include/qemu/iov.h
@@ -99,4 +99,17 @@ unsigned iov_copy(struct iovec *dst_iov, unsigned int dst_iov_cnt,
  const struct iovec *iov, unsigned int iov_cnt,
  size_t offset, size_t bytes);
 
+/*
+ * Remove a given number of bytes from the front or back of a vector.
+ * This may update iov and/or iov_cnt to exclude iovec elements that are
+ * no longer required.
+ *
+ * The number of bytes actually discarded is returned.  This number may be
+ * smaller than requested if the vector is too small.
+ */
+size_t iov_discard_front(struct iovec **iov, unsigned int *iov_cnt,
+ size_t bytes);
+size_t iov_discard_back(struct iovec *iov, unsigned int *iov_cnt,
+size_t bytes);
+
 #endif
diff --git a/iov.c b/iov.c
index 419e419..92ad77b 100644
--- a/iov.c
+++ b/iov.c
@@ -354,3 +354,54 @@ size_t qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
 {
 return iov_memset(qiov->iov, qiov->niov, offset, fillc, bytes);
 }
+
+size_t iov_discard_front(struct iovec **iov, unsigned int *iov_cnt,
+ size_t bytes)
+{
+size_t total = 0;
+struct iovec *cur;
+
+for (cur = *iov; *iov_cnt > 0; cur++) {
+if (cur->iov_len > bytes) {
+cur->iov_base += bytes;
+cur->iov_len -= bytes;
+total += bytes;
+break;
+}
+
+bytes -= cur->iov_len;
+total += cur->iov_len;
+*iov_cnt -= 1;
+}
+
+*iov = cur;
+return total;
+}
+
+size_t iov_discard_back(struct iovec *iov, unsigned int *iov_cnt,
+size_t bytes)
+{
+size_t total = 0;
+struct iovec *cur;
+
+if (*iov_cnt == 0) {
+return 0;
+}
+
+cur = iov + (*iov_cnt - 1);
+
+while (*iov_cnt > 0) {
+if (cur->iov_len > bytes) {
+cur->iov_len -= bytes;
+total += bytes;
+break;
+}
+
+bytes -= cur->iov_len;
+total += cur->iov_len;
+cur--;
+*iov_cnt -= 1;
+}
+
+return total;
+}
-- 
1.8.0.2
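
The two helpers are easiest to follow with a concrete vector. The following is a minimal, self-contained sketch that re-implements them under hypothetical names (sketch_iov_discard_front/back) so it compiles without QEMU headers. Two deliberate deviations, both choices of this sketch rather than the patch's code: iov_base is advanced through a char * cast instead of GCC's void * arithmetic, and the back variant indexes iov[*iov_cnt - 1] instead of decrementing a pointer, which avoids forming a pointer before the start of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>   /* struct iovec */

/* Drop `bytes` from the front; may advance *iov and shrink *iov_cnt. */
static size_t sketch_iov_discard_front(struct iovec **iov,
                                       unsigned int *iov_cnt, size_t bytes)
{
    size_t total = 0;
    struct iovec *cur;

    for (cur = *iov; *iov_cnt > 0; cur++) {
        if (cur->iov_len > bytes) {
            /* partial element: trim its start and stop */
            cur->iov_base = (char *)cur->iov_base + bytes;
            cur->iov_len -= bytes;
            total += bytes;
            break;
        }
        /* whole element consumed: drop it from the vector */
        bytes -= cur->iov_len;
        total += cur->iov_len;
        *iov_cnt -= 1;
    }
    *iov = cur;
    return total;
}

/* Drop `bytes` from the back; may shrink *iov_cnt. */
static size_t sketch_iov_discard_back(struct iovec *iov,
                                      unsigned int *iov_cnt, size_t bytes)
{
    size_t total = 0;

    while (*iov_cnt > 0) {
        struct iovec *cur = &iov[*iov_cnt - 1];

        if (cur->iov_len > bytes) {
            cur->iov_len -= bytes;      /* partial element: trim its end */
            total += bytes;
            break;
        }
        bytes -= cur->iov_len;          /* whole element consumed */
        total += cur->iov_len;
        *iov_cnt -= 1;
    }
    return total;
}
```

For example, discarding 6 bytes from the front of a {4, 10, 2}-byte vector removes the first element and trims 2 bytes from the second; discarding 3 more from the back then removes the last element and trims 1 byte from the middle one. If the vector is too small, the return value is the number of bytes actually discarded.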




[Qemu-devel] [RFC V4 18/30] qcow2: Behave correctly when refcount reaches 0 or 2^16.

2013-01-02 Thread Benoît Canet
When the refcount reaches zero, we destroy the hash on disk and remove it from
the GTree. When the refcount is at its maximum value, we mark the hash so it
won't be loaded at the next startup and remove it from the GTree.

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-dedup.c|   79 +---
 block/qcow2-refcount.c |6 
 block/qcow2.h  |6 
 3 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index 12a2dad..28001c6 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -804,11 +804,19 @@ static inline bool is_hash_node_empty(QCowHashNode *hash_node)
 return hash_node->physical_sect & QCOW_FLAG_EMPTY;
 }
 
+static void qcow2_remove_hash_node(BlockDriverState *bs,
+   QCowHashNode *hash_node)
+{
+BDRVQcowState *s = bs->opaque;
+g_tree_remove(s->dedup_tree_by_sect, &hash_node->physical_sect);
+g_tree_remove(s->dedup_tree_by_hash, &hash_node->hash);
+}
+
 /* This function removes a hash_node from the trees given a physical sector
  *
  * @physical_sect: The physical sector of the cluster corresponding to the hash
  */
-static void qcow_remove_hash_node_by_sector(BlockDriverState *bs,
+static void qcow2_remove_hash_node_by_sector(BlockDriverState *bs,
 uint64_t physical_sect)
 {
 BDRVQcowState *s = bs->opaque;
@@ -820,8 +828,7 @@ static void qcow_remove_hash_node_by_sector(BlockDriverState *bs,
 return;
 }
 
-g_tree_remove(s->dedup_tree_by_sect, &hash_node->physical_sect);
-g_tree_remove(s->dedup_tree_by_hash, &hash_node->hash);
+qcow2_remove_hash_node(bs, hash_node);
 }
 
 /* This function store a dedup hash information to disk and RAM
@@ -858,7 +865,7 @@ static int qcow2_store_dedup_hash(BlockDriverState *bs,
 logical_sect = logical_sect | QCOW_FLAG_FIRST;
 
 /* remove stale hash node pointing to this physical sector from the trees */
-qcow_remove_hash_node_by_sector(bs, physical_sect);
+qcow2_remove_hash_node_by_sector(bs, physical_sect);
 
 /* fill the missing fields of the hash node */
 hash_node->physical_sect = physical_sect;
@@ -979,6 +986,12 @@ void coroutine_fn qcow2_co_load_dedup_hashes(void *opaque)
 continue;
 }
 
+/* if this cluster has reached max refcount don't load it */
+if (first_logical_sect & QCOW_FLAG_MAX_REFCOUNT) {
+qemu_co_mutex_unlock(&s->lock);
+continue;
+}
+
 hash_node = qcow2_dedup_build_qcow_hash_node(&hash,
  i * s->cluster_sectors,
  first_logical_sect);
@@ -1002,3 +1015,61 @@ void qcow2_dedup_close(BlockDriverState *bs)
 BDRVQcowState *s = bs->opaque;
 g_free(s->dedup_table);
 }
+
+/* Clean the last reference to a given cluster when its refcount is zero
+ *
+ * @cluster_index: the index of the physical cluster
+ */
+void qcow2_dedup_refcount_zero_reached(BlockDriverState *bs,
+  uint64_t cluster_index)
+{
+BDRVQcowState *s = bs->opaque;
+QCowHash null_hash;
+uint64_t logical_sect = 0;
+uint64_t physical_sect = cluster_index * s->cluster_sectors;
+
+/* prepare null hash */
+memset(&null_hash, 0, sizeof(null_hash));
+
+/* clear from disk */
+qcow2_dedup_read_write_hash(bs,
+&null_hash,
+&logical_sect,
+physical_sect,
+true);
+
+/* remove from ram if present so we won't dedup with it anymore */
+qcow2_remove_hash_node_by_sector(bs, physical_sect);
+}
+
+/* Force to use a new physical cluster and QCowHashNode when the refcount limit
+ * of 2^16 is about to break.
+ *
+ * @cluster_index: the index of the physical cluster
+ */
+void qcow2_dedup_refcount_max_reached(BlockDriverState *bs,
+  uint64_t cluster_index)
+{
+BDRVQcowState *s = bs->opaque;
+QCowHashNode *hash_node;
+uint64_t physical_sect = cluster_index * s->cluster_sectors;
+
+hash_node = g_tree_lookup(s->dedup_tree_by_sect, &physical_sect);
+
+if (!hash_node) {
+return;
+}
+
+/* mark this hash so we won't load it anymore at startup after writing it */
+hash_node->first_logical_sect |= QCOW_FLAG_MAX_REFCOUNT;
+
+/* write to disk */
+qcow2_dedup_read_write_hash(bs,
+&hash_node->hash,
+&hash_node->first_logical_sect,
+hash_node->physical_sect,
+true);
+
+/* remove the QCowHashNode from ram so we won't use it anymore for dedup */
+qcow2_remove_hash_node(bs, hash_node);
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 75c2bde..aef280d 100644
--- a/block/qcow2-refcount.c
+++ 
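
The two hooks added above implement a simple policy on each physical cluster's dedup entry. A minimal in-memory sketch of that policy, assuming 16-bit refcounts (maximum 65535) and a hypothetical flag bit SKETCH_FLAG_MAX_REFCOUNT standing in for QCOW_FLAG_MAX_REFCOUNT, whose real value is not shown here. The in_ram_tree field stands in for membership in the two GTrees, and no disk I/O is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HASH_LENGTH 32                   /* e.g. a SHA-256 digest */
#define SKETCH_FLAG_MAX_REFCOUNT (1ULL << 62)   /* hypothetical flag bit */
#define SKETCH_REFCOUNT_MAX UINT16_MAX          /* 16-bit refcounts assumed */

/* Hypothetical dedup entry for one physical cluster. */
struct sketch_hash_entry {
    uint8_t hash[SKETCH_HASH_LENGTH];
    uint64_t first_logical_sect;
    bool in_ram_tree;       /* stands in for membership in the GTrees */
};

/* Apply the refcount-zero / refcount-max policy to one entry. */
static void sketch_on_refcount_change(struct sketch_hash_entry *e,
                                      uint32_t refcount)
{
    assert(refcount <= SKETCH_REFCOUNT_MAX);
    if (refcount == 0) {
        /* last reference gone: store a null hash and forget it in RAM */
        memset(e->hash, 0, sizeof(e->hash));
        e->first_logical_sect = 0;
        e->in_ram_tree = false;
    } else if (refcount == SKETCH_REFCOUNT_MAX) {
        /* no room for more references: flag it so it is not reloaded */
        e->first_logical_sect |= SKETCH_FLAG_MAX_REFCOUNT;
        e->in_ram_tree = false;
    }
    /* any other refcount: the entry stays usable for deduplication */
}
```

Retiring a saturated entry keeps its hash on disk but excludes it from future deduplication, so the next identical cluster gets a fresh physical cluster instead of overflowing the refcount.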

[Qemu-devel] [RFC V4 11/30] qcow2: create function to load deduplication hashes at startup.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-dedup.c |   68 +++
 block/qcow2.h   |1 +
 2 files changed, 69 insertions(+)

diff --git a/block/qcow2-dedup.c b/block/qcow2-dedup.c
index b998a2d..4c391e5 100644
--- a/block/qcow2-dedup.c
+++ b/block/qcow2-dedup.c
@@ -918,3 +918,71 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
 
 return ret;
 }
+
+static void qcow2_dedup_insert_hash_and_preserve_newer(BlockDriverState *bs,
+   QCowHashNode *hash_node)
+{
+BDRVQcowState *s = bs->opaque;
+QCowHashNode *newer_hash_node;
+
+newer_hash_node = g_tree_lookup(s->dedup_tree_by_sect,
+&hash_node->physical_sect);
+
+if (!newer_hash_node) {
+g_tree_insert(s->dedup_tree_by_hash, &hash_node->hash, hash_node);
+g_tree_insert(s->dedup_tree_by_sect, &hash_node->physical_sect,
+  hash_node);
+} else {
+g_free(hash_node);
+}
+}
+
+/*
+ * This coroutine loads the deduplication hashes into the trees
+ *
+ * @data: the given BlockDriverState
+ * @ret:  NULL
+ */
+void coroutine_fn qcow2_co_load_dedup_hashes(void *opaque)
+{
+BlockDriverState *bs = opaque;
+BDRVQcowState *s = bs->opaque;
+int ret;
+QCowHash hash, null_hash;
+uint64_t max_clusters, i;
+uint64_t first_logical_sect;
+int nb_hash_in_hash_block = s->hash_block_size / (HASH_LENGTH + 8);
+QCowHashNode *hash_node;
+
+/* prepare the null hash */
+memset(&null_hash, 0, sizeof(null_hash));
+
+max_clusters = s->dedup_table_size * nb_hash_in_hash_block;
+
+for (i = 0; i < max_clusters; i++) {
+/* get the hash */
+qemu_co_mutex_lock(&s->lock);
+ret = qcow2_dedup_read_write_hash(bs, &hash,
+  &first_logical_sect,
+  i * s->cluster_sectors,
+  false);
+
+if (ret < 0) {
+qemu_co_mutex_unlock(&s->lock);
+error_report("Failed to load deduplication hash.");
+continue;
+}
+
+/* if the hash is null don't load it */
+if (!memcmp(hash.data, null_hash.data, HASH_LENGTH)) {
+qemu_co_mutex_unlock(&s->lock);
+continue;
+}
+
+hash_node = qcow2_dedup_build_qcow_hash_node(&hash,
+ i * s->cluster_sectors,
+ first_logical_sect);
+qcow2_dedup_insert_hash_and_preserve_newer(bs, hash_node);
+qemu_co_mutex_unlock(&s->lock);
+}
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index afa730e..5cbfc82 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -467,5 +467,6 @@ int qcow2_dedup_store_new_hashes(BlockDriverState *bs,
  int count,
  uint64_t logical_sect,
  uint64_t physical_sect);
+void coroutine_fn qcow2_co_load_dedup_hashes(void *opaque);
 
 #endif
-- 
1.7.10.4
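
The load loop above skips null hashes and never replaces a node that is already in the trees for the same physical sector. A minimal sketch of just that filtering, with the GTrees replaced by a hypothetical per-cluster loaded[] array and a 32-byte hash length assumed; none of these names come from qcow2-dedup.c, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HASH_LENGTH 32
#define SKETCH_MAX_CLUSTERS 64

/* Hypothetical on-disk dedup entry. */
struct sketch_disk_entry {
    uint8_t hash[SKETCH_HASH_LENGTH];
    uint64_t first_logical_sect;
};

/* Hypothetical RAM index: loaded[i] is true once cluster i has a node. */
struct sketch_index {
    bool loaded[SKETCH_MAX_CLUSTERS];
    size_t count;
};

static bool sketch_hash_is_null(const uint8_t *hash)
{
    static const uint8_t null_hash[SKETCH_HASH_LENGTH];   /* all zeroes */
    return memcmp(hash, null_hash, SKETCH_HASH_LENGTH) == 0;
}

/* Load every non-null hash, keeping any node already present (for
 * example one inserted by a write that raced with the loader). */
static void sketch_load_hashes(struct sketch_index *idx,
                               const struct sketch_disk_entry *disk,
                               size_t nb_clusters)
{
    assert(nb_clusters <= SKETCH_MAX_CLUSTERS);
    for (size_t i = 0; i < nb_clusters; i++) {
        if (sketch_hash_is_null(disk[i].hash)) {
            continue;           /* unused or cleared slot */
        }
        if (idx->loaded[i]) {
            continue;           /* preserve the newer in-RAM node */
        }
        idx->loaded[i] = true;
        idx->count++;
    }
}
```

Checking for an existing node before inserting is what lets the loader run as a background coroutine while guest writes keep adding fresher entries.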




Re: [Qemu-devel] [PATCH 0/2] [PULL] qemu-kvm.git uq/master queue

2013-01-02 Thread Anthony Liguori
Gleb Natapov g...@redhat.com writes:

 The following changes since commit e376a788ae130454ad5e797f60cb70d0308babb6:

   Merge remote-tracking branch 'kwolf/for-anthony' into staging (2012-12-13 14:32:28 -0600)

 are available in the git repository at:


   git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

 for you to fetch changes up to 0a2a59d35cbabf63c91340a1c62038e3e60538c1:

   qemu-kvm/pci-assign: 64 bits bar emulation (2012-12-25 14:37:52 +0200)


Pulled. Thanks.

Regards,

Anthony Liguori

 
 Will Auld (1):
   target-i386: Enabling IA32_TSC_ADJUST for QEMU KVM guest VMs

 Xudong Hao (1):
   qemu-kvm/pci-assign: 64 bits bar emulation

  hw/kvm/pci-assign.c   |   14 ++
  target-i386/cpu.h |2 ++
  target-i386/kvm.c |   14 ++
  target-i386/machine.c |   21 +
  4 files changed, 47 insertions(+), 4 deletions(-)




Re: [Qemu-devel] [PULL] pci,virtio

2013-01-02 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 Included here is v3 of the virtio typesafety change. No comments
 were made on v3; I did my best to address all comments
 and got no response to v3, so I assume it's OK now.

 There are more optimizations in my tree but
 they are a bit more scary - I'll let them
 stay there as I'll be away for a week.

 The following changes since commit 27dd7730582be85c7d4f680f5f71146629809c86:

   Merge remote-tracking branch 'bonzini/header-dirs' into staging (2012-12-19 17:15:39 -0600)

 are available in the git repository at:


   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_anthony

 for you to fetch changes up to 89d62be9f4fb538db7f919a2be7df2544ffc02c5:

   virtio-pci: don't poll masked vectors (2012-12-26 11:49:29 +0200)


Pulled. Thanks.

Regards,

Anthony Liguori

 
 pci,virtio

 This optimizes MSIX handling in virtio-pci.
 Also included is pci express capability bugfix.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com

 
 Knut Omang (1):
   pcie: Fix bug in pcie_ext_cap_set_next

 Michael S. Tsirkin (4):
   virtio: make bindings typesafe
   msi: add API to get notified about pending bit poll
   msix: expose access to masked/pending state
   virtio-pci: don't poll masked vectors

  hw/pci/msix.c|  19 +++--
  hw/pci/msix.h|   6 ++-
  hw/pci/pci.h |   4 ++
  hw/pci/pcie.c|   2 +-
  hw/s390-virtio-bus.c |  24 ---
  hw/vfio_pci.c|   2 +-
  hw/virtio-pci.c  | 112 
 +++
  hw/virtio.c  |   2 +-
  hw/virtio.h  |  26 ++--
  9 files changed, 136 insertions(+), 61 deletions(-)



Re: [Qemu-devel] [PULL 0/1] update seabios

2013-01-02 Thread Anthony Liguori
Gerd Hoffmann kra...@redhat.com writes:

   Hi,

 One more seabios update, fixing the FreeBSD build failure.

 please pull,
   Gerd


Pulled. Thanks.

Regards,

Anthony Liguori

 The following changes since commit 914606d26e654d4c01bd5186f4d05e3fd445e219:

   Merge remote-tracking branch 'stefanha/trivial-patches' into staging (2012-12-18 15:41:43 -0600)

 are available in the git repository at:

   git://git.kraxel.org/qemu seabios-a810e4e

 Gerd Hoffmann (1):
   Update seabios to a810e4e72a0d42c7bc04eda57382f8e019add901

  pc-bios/acpi-dsdt.aml |  Bin 4438 -> 4521 bytes
  pc-bios/bios.bin  |  Bin 262144 -> 131072 bytes
  pc-bios/q35-acpi-dsdt.aml |  Bin 7458 -> 7458 bytes
  roms/seabios  |2 +-
  4 files changed, 1 insertions(+), 1 deletions(-)



[Qemu-devel] [PATCH 01/18] raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane

2013-01-02 Thread Stefan Hajnoczi
The raw_get_aio_fd() function allows virtio-blk-data-plane to get the
file descriptor of a raw image file with Linux AIO enabled.  This
interface is really a layering violation that can be resolved once the
block layer is able to run outside the global mutex - at that point
virtio-blk-data-plane will switch from custom Linux AIO code to using
the block layer.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/raw-posix.c | 34 ++
 include/block/block.h |  9 +
 2 files changed, 43 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 91159c7..87d888e 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1776,6 +1776,40 @@ static BlockDriver bdrv_host_cdrom = {
 };
 #endif /* __FreeBSD__ */
 
+#ifdef CONFIG_LINUX_AIO
+/**
+ * Return the file descriptor for Linux AIO
+ *
+ * This function is a layering violation and should be removed when it becomes
+ * possible to call the block layer outside the global mutex.  It allows the
+ * caller to hijack the file descriptor so I/O can be performed outside the
+ * block layer.
+ */
+int raw_get_aio_fd(BlockDriverState *bs)
+{
+BDRVRawState *s;
+
+if (!bs->drv) {
+return -ENOMEDIUM;
+}
+
+if (bs->drv == bdrv_find_format("raw")) {
+bs = bs->file;
+}
+
+/* raw-posix has several protocols so just check for raw_aio_readv */
+if (bs->drv->bdrv_aio_readv != raw_aio_readv) {
+return -ENOTSUP;
+}
+
+s = bs->opaque;
+if (!s->use_aio) {
+return -ENOTSUP;
+}
+return s->fd;
+}
+#endif /* CONFIG_LINUX_AIO */
+
 static void bdrv_file_init(void)
 {
 /*
diff --git a/include/block/block.h b/include/block/block.h
index b81d200..0719339 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -365,6 +365,15 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs);
 void bdrv_set_in_use(BlockDriverState *bs, int in_use);
 int bdrv_in_use(BlockDriverState *bs);
 
+#ifdef CONFIG_LINUX_AIO
+int raw_get_aio_fd(BlockDriverState *bs);
+#else
+static inline int raw_get_aio_fd(BlockDriverState *bs)
+{
+return -ENOTSUP;
+}
+#endif
+
 enum BlockAcctType {
 BDRV_ACCT_READ,
 BDRV_ACCT_WRITE,
-- 
1.8.0.2
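
include/block/block.h pairs the real declaration with an inline stub that returns -ENOTSUP when CONFIG_LINUX_AIO is not set, so callers compile either way and decide at run time. A minimal sketch of that pattern and of a caller checking the result; SKETCH_CONFIG_LINUX_AIO, sketch_get_aio_fd() and sketch_can_use_dataplane() are hypothetical, and unlike this sketch the real function can also return -ENOMEDIUM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for raw_get_aio_fd(). SKETCH_CONFIG_LINUX_AIO
 * plays the role of CONFIG_LINUX_AIO; without it, an inline stub that
 * always fails with -ENOTSUP is compiled instead.
 */
#ifdef SKETCH_CONFIG_LINUX_AIO
static int sketch_get_aio_fd(int fd, bool use_aio)
{
    return use_aio ? fd : -ENOTSUP;
}
#else
static inline int sketch_get_aio_fd(int fd, bool use_aio)
{
    (void)fd;
    (void)use_aio;
    return -ENOTSUP;    /* built without Linux AIO support */
}
#endif

/* Caller side: only take the fast path when a usable fd comes back. */
static bool sketch_can_use_dataplane(int fd, bool use_aio)
{
    int ret = sketch_get_aio_fd(fd, use_aio);

    assert(ret >= 0 || ret == -ENOTSUP);
    return ret >= 0;
}
```

Keeping the stub in the header means the caller needs no #ifdef of its own; a negative errno simply routes it back to the regular block layer path.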




Re: [Qemu-devel] [PULL 0/1] update seabios

2013-01-02 Thread Luigi Rizzo
Are you going to distribute a 1.3.x snapshot with the updated BIOS that
lets FreeBSD boot?

thanks
luigi

On Wed, Jan 2, 2013 at 5:57 PM, Anthony Liguori anth...@codemonkey.ws wrote:

 Gerd Hoffmann kra...@redhat.com writes:

Hi,
 
  One more seabios update, fixing the FreeBSD build failure.
 
  please pull,
Gerd




[Qemu-devel] [RFC V4 13/30] qcow2: Extract qcow2_do_table_init.

2013-01-02 Thread Benoît Canet
Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qcow2-refcount.c |   43 ++-
 block/qcow2.h  |5 +
 2 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index e014b0e..75c2bde 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -31,27 +31,44 @@ static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
 /*/
 /* refcount handling */
 
-int qcow2_refcount_init(BlockDriverState *bs)
+int qcow2_do_table_init(BlockDriverState *bs,
+uint64_t **table,
+int64_t offset,
+int size,
+bool is_refcount)
 {
-BDRVQcowState *s = bs->opaque;
-int ret, refcount_table_size2, i;
-
-refcount_table_size2 = s->refcount_table_size * sizeof(uint64_t);
-s->refcount_table = g_malloc(refcount_table_size2);
-if (s->refcount_table_size > 0) {
-BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_LOAD);
-ret = bdrv_pread(bs->file, s->refcount_table_offset,
- s->refcount_table, refcount_table_size2);
-if (ret != refcount_table_size2)
+int ret, size2, i;
+
+size2 = size * sizeof(uint64_t);
+*table = g_malloc(size2);
+if (size > 0) {
+if (is_refcount) {
+BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_LOAD);
+}
+ret = bdrv_pread(bs->file, offset,
+ *table, size2);
+if (ret != size2) {
 goto fail;
-for(i = 0; i < s->refcount_table_size; i++)
-be64_to_cpus(&s->refcount_table[i]);
+}
+for (i = 0; i < size; i++) {
+be64_to_cpus(&(*table)[i]);
+}
 }
 return 0;
  fail:
 return -ENOMEM;
 }
 
+int qcow2_refcount_init(BlockDriverState *bs)
+{
+BDRVQcowState *s = bs->opaque;
+return qcow2_do_table_init(bs,
+   &s->refcount_table,
+   s->refcount_table_offset,
+   s->refcount_table_size,
+   true);
+}
+
 void qcow2_refcount_close(BlockDriverState *bs)
 {
 BDRVQcowState *s = bs-opaque;
diff --git a/block/qcow2.h b/block/qcow2.h
index 5cbfc82..9add0f1 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -376,6 +376,11 @@ int qcow2_read_cluster_data(BlockDriverState *bs,
 int nb_sectors);
 
 /* qcow2-refcount.c functions */
+int qcow2_do_table_init(BlockDriverState *bs,
+uint64_t **table,
+int64_t offset,
+int size,
+bool is_refcount);
 int qcow2_refcount_init(BlockDriverState *bs);
 void qcow2_refcount_close(BlockDriverState *bs);
 
-- 
1.7.10.4
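
qcow2_do_table_init() reads an array of big-endian 64-bit entries from the image and converts each one to host order with be64_to_cpus(). A portable sketch of the same conversion, assuming a hypothetical sketch_table_from_be() that takes an in-memory buffer instead of calling bdrv_pread(); the byte-wise decode works on any host byte order, and an empty table yields NULL the way g_malloc(0) does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Decode one big-endian 64-bit value, independent of host byte order. */
static uint64_t sketch_be64_decode(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Allocate a host-order table of `size` entries decoded from `buf`.
 * Returns 0 on success (with *table == NULL for size 0) or -1 if the
 * allocation fails.
 */
static int sketch_table_from_be(uint64_t **table, const uint8_t *buf,
                                size_t size)
{
    *table = NULL;
    if (size == 0) {
        return 0;
    }
    *table = calloc(size, sizeof(uint64_t));
    if (*table == NULL) {
        return -1;
    }
    for (size_t i = 0; i < size; i++) {
        (*table)[i] = sketch_be64_decode(buf + i * 8);
    }
    return 0;
}
```

Taking a uint64_t ** out-parameter, as qcow2_do_table_init() does, lets the same helper fill the refcount table or any other table the caller owns.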



