date:20140630

[Qemu-devel] [PATCH V4] qemu-img create: add 'nocow' option

2014-06-30 Thread Chunyan Liu

Add 'nocow' option so that users could have a chance to set NOCOW flag to
newly created files. It's useful on btrfs file system to enhance performance.

Btrfs has low performance when hosting VM images, even more when the guest
in those VM are also using btrfs as file system. One way to mitigate this bad
performance is to turn off COW attributes on VM files. Generally, there are
two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then
all newly created files will be NOCOW. b) per file. Add the NOCOW file
attribute. It could only be done to empty or new files.

This patch tries the second way, according to the option, it could add NOCOW
per file.

For most block drivers, since the create file step is in raw-posix.c, so we
can do setting NOCOW flag ioctl in raw-posix.c only.

But there are some exceptions, like block/vpc.c and block/vdi.c, they are
creating file by calling qemu_open directly. For them, do the same setting
NOCOW flag ioctl work in them separately.

Signed-off-by: Chunyan Liu cy...@suse.com
---
  Changes to v3:
* remove NOCOW option from .create_opts of those drivers calling
  bdrv_create_file to create file. Adding NOCOW to raw-posix.c is
  enough. No difference to users when using 'qemu-img create' interface.
  Make the patch cleaner. 

 block/qed.c   |  6 +++---
 block/raw-posix.c | 25 +
 block/vdi.c   | 29 +
 block/vmdk.c  |  6 +++---
 block/vpc.c   | 29 +
 include/block/block_int.h |  1 +
 qemu-doc.texi | 16 
 qemu-img.texi | 16 
 8 files changed, 122 insertions(+), 6 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index eddae92..b69374b 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -567,7 +567,7 @@ static void bdrv_qed_close(BlockDriverState *bs)
 static int qed_create(const char *filename, uint32_t cluster_size,
   uint64_t image_size, uint32_t table_size,
   const char *backing_file, const char *backing_fmt,
-  Error **errp)
+  QemuOpts *opts, Error **errp)
 {
 QEDHeader header = {
 .magic = QED_MAGIC,
@@ -586,7 +586,7 @@ static int qed_create(const char *filename, uint32_t 
cluster_size,
 int ret = 0;
 BlockDriverState *bs;
 
-ret = bdrv_create_file(filename, NULL, local_err);
+ret = bdrv_create_file(filename, opts, local_err);
 if (ret  0) {
 error_propagate(errp, local_err);
 return ret;
@@ -682,7 +682,7 @@ static int bdrv_qed_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 
 ret = qed_create(filename, cluster_size, image_size, table_size,
- backing_file, backing_fmt, errp);
+ backing_file, backing_fmt, opts, errp);
 
 finish:
 g_free(backing_file);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index dacf4fb..825a0c8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -55,6 +55,9 @@
 #include linux/cdrom.h
 #include linux/fd.h
 #include linux/fs.h
+#ifndef FS_NOCOW_FL
+#define FS_NOCOW_FL 0x0080 /* Do not cow file */
+#endif
 #endif
 #ifdef CONFIG_FIEMAP
 #include linux/fiemap.h
@@ -1278,12 +1281,14 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 int fd;
 int result = 0;
 int64_t total_size = 0;
+bool nocow = false;
 
 strstart(filename, file:, filename);
 
 /* Read out options */
 total_size =
 qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) / BDRV_SECTOR_SIZE;
+nocow = qemu_opt_get_bool(opts, BLOCK_OPT_NOCOW, false);
 
 fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
0644);
@@ -1291,6 +1296,21 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 result = -errno;
 error_setg_errno(errp, -result, Could not create file);
 } else {
+if (nocow) {
+#ifdef __linux__
+/* Set NOCOW flag to solve performance issue on fs like btrfs.
+ * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value
+ * will be ignored since any failure of this operation should not
+ * block the left work.
+ */
+int attr;
+if (ioctl(fd, FS_IOC_GETFLAGS, attr) == 0) {
+attr |= FS_NOCOW_FL;
+ioctl(fd, FS_IOC_SETFLAGS, attr);
+}
+#endif
+}
+
 if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) {
 result = -errno;
 error_setg_errno(errp, -result, Could not resize file);
@@ -1477,6 +1497,11 @@ static QemuOptsList raw_create_opts = {
 .type = QEMU_OPT_SIZE,
 .help = Virtual disk size
 },
+{
+.name = BLOCK_OPT_NOCOW,
+.type = QEMU_OPT_BOOL,
+.help = Turn

Re: [Qemu-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-06-30 Thread Michael S. Tsirkin

On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote:
 On 2014/6/26 18:03, Paolo Bonzini wrote:
 Il 26/06/2014 11:18, Chen, Tiejun ha scritto:
 
 
 - offsets 0x..0x0fff map to configuration space of the host MCH
 
 
 Are you saying the config space in the video device?
 
 No, I am saying in a new BAR, or at some magic offset of an existing
 MMIO BAR.
 
 
 As I mentioned previously, the IGD guy told me we have no any unused a
 offset or BAR in the config space.
 
 And guy who are responsible for the native driver seems not be accept to
 extend some magic offset of an existing MMIO BAR.
 
 In addition I think in a short time its not possible to migrate i440fx to
 q35 as a PCIe machine of xen.

That seems like a weak motivation.  I don't see a need to get something
merged upstream in a short time: this seems sure to miss 2.1,
so you have the time to make it architecturally sound.
Making existing guests work would be a better motivation.
Isn't this possible with an mch chipset?


 So could we do this step by step:
 
 #1 phase: We just cover current qemu-xen implementation based on i44fx, so
 still provide that pseudo ISA bridge at 00:1f.0 as we already did.

By the way there is no reason to put it at 00:1f.0 specifically I think.
So it seems simple: create a dummy device that gets device and
vendor id as properties. If you really like, add an option to get values
from sysfs: device and vendor id are world readable, so just get them
directly and not through xen wrappers, this way you can open the files
RO and not RW.
You seem to poke at revision as well, I don't see
driver looking at that - strictly necessary?
If yes please patch host kernel to expose that info in sysfs,
though we can fall back on pci config if not there.

MCH (bridge_dev) hacks in i915 are nastier.
To clean them up, we really have to have an explicit driver for this
bridge, not a pass-through device.  Long term, the right thing to do is
likely to extend host driver and expose the necessary information in
sysfs on host kernel.




 #2 phase: Now, we will choose a capability ID that won't be conflicting with
 others. To do this properly, we need to get one from PCI SIG group. To have
 this workable and consistently validated, this method shouldn't be virt
 specific. Then native driver should use the same method.

You mean you will be able to talk sense into hardware guys?
I doubt that. If they could be convinced to make e.g. i915 base a
proper BAR, why didn't they?


 So when xen work on
 q35 PCIe machine, we can walk this way.

If you are emulating MCH anyway, pick one that is close
to what i915 driver expects. It would then work with existing
devices, without new capability IDs.


 Anthony,
 
 Any comments to address this in xen case?
 
 Thanks
 Tiejun

Re: [Qemu-devel] [PATCH v9 00/22] legacy virtio support for cross-endian targets

2014-06-30 Thread Greg Kurz

On Sun, 29 Jun 2014 18:13:53 +0300
Michael S. Tsirkin m...@redhat.com wrote:
 On Tue, Jun 24, 2014 at 07:06:58PM +0200, Greg Kurz wrote:
  The current legacy virtio devices have a fundamental flaw: they all share
  data between host and guest with guest endianness ordering. This is ok for
  nearly all architectures that have fixed endianness. Unfortunately, it 
  breaks
  for recent PPC64 and ARM targets that can change endianness at runtime.
  The virtio-1.0 specification fixes the issue by enforcing little-endian
  ordering. It may take some time though until the code for 1.0 gets available
  and supported, and all the users can migrate. There have been discussions
  for some monthes about supporting such oddity: now we have little-endian
  PPC64 distros available, it is worth to propose something.
  
  This patch set brings legacy virtio support for cross-endian targets. The
  rationale is that we add a new device_endianness property to VirtIODevice.
  This property is used as a runtime indicator to decide wether we should
  do little-endian or big-endian conversion, as opposed to the compile time
  choice we have now with TARGTE_WORDS_BIGENDIAN. The choice was made to
  sample the device endianness out of the endianness mode of the guest
  CPU that does the reset. It is an evil but logical consequence of the
  initial flaw in the virtio specification, and it was agreed that the concept
  would be a good common base for ARM and PPC64 enablement at least. Please
  note also that this new property is state and must be preserved across
  migrations.
  
  There are several parts in the serie:
  - patches 1 and 2 are simple fixes
  - patches 3 to 9 introduce VMState based subsections in the virtio
migration code. This is needed because we introduce a new property
in VirtIODevice that we want to migrate without ruining compatibility
efforts
  - patches 10 to 13 bring virtio device endianness and memory accessors
to be used by the virtio code
  - patches 14 to 20 wire the new memory accessors everywhere accross the
virtio code
  - patch 21 is the PPC64 enablement
  - patch 22 is a follow-up workaround to disable vhost-net acceleration
in the case the host and guest have different endianness, because
it is not supported for the moment
  
  Changes since v8 are provided in each patch.
  
  Cheers.
 
 Applied, thanks everyone.
 

\O/

Thanks Michael !

--
Greg

  ---
  
  Alexander Graf (1):
virtio-serial: don't migrate the config space
  
  Cédric Le Goater (1):
virtio-net: byteswap virtio-net header
  
  Greg Kurz (14):
virtio: introduce device specific migration calls
virtio-net: implement per-device migration calls
virtio-blk: implement per-device migration calls
virtio-serial: implement per-device migration calls
virtio-balloon: implement per-device migration calls
virtio-rng: implement per-device migration calls
virtio: add subsections to the migration stream
exec: introduce target_words_bigendian() helper
cpu: introduce CPUClass::virtio_is_big_endian()
virtio: add endian-ambivalent support to VirtIODevice
virtio: memory accessors for endian-ambivalent targets
virtio-9p: use virtio wrappers to access headers
target-ppc: enable virtio endian ambivalent support
vhost-net: disable when cross-endian
  
  Rusty Russell (6):
virtio: allow byte swapping for vring
virtio-net: use virtio wrappers to access headers
virtio-balloon: use virtio wrappers to access page frame numbers
virtio-blk: use virtio wrappers to access headers
virtio-scsi: use virtio wrappers to access headers
virtio-serial-bus: use virtio wrappers to access headers
  
  
   exec.c|8 -
   hw/9pfs/virtio-9p-device.c|3 -
   hw/block/virtio-blk.c |   62 ++-
   hw/char/virtio-serial-bus.c   |   94 ++--
   hw/net/vhost_net.c|   19 +++
   hw/net/virtio-net.c   |   56 +++---
   hw/scsi/virtio-scsi.c |   40 ---
   hw/virtio/virtio-balloon.c|   33 +++---
   hw/virtio/virtio-pci.c|   11 +-
   hw/virtio/virtio-rng.c|   12 +-
   hw/virtio/virtio.c|  216 
  -
   include/hw/virtio/virtio-access.h |  170 +
   include/hw/virtio/virtio.h|   17 +++
   include/qom/cpu.h |1 
   qom/cpu.c |6 +
   target-ppc/cpu.h  |2 
   target-ppc/translate_init.c   |   15 +++
   17 files changed, 583 insertions(+), 182 deletions(-)
   create mode 100644 include/hw/virtio/virtio-access.h
  
  --
  Greg

Re: [Qemu-devel] [PATCH] numa: check for busy memory backend

2014-06-30 Thread Michael S. Tsirkin

On Mon, Jun 30, 2014 at 01:33:42PM +0800, Hu Tao wrote:
 On Sun, Jun 29, 2014 at 06:20:22PM +0300, Michael S. Tsirkin wrote:
  On Wed, Jun 25, 2014 at 05:04:14PM +0800, Hu Tao wrote:
   ..to prevent one memory backend from being used by more than one numa
   node.
  
  Thanks, but please always make the msg content self-contained
  so it can be understood without the subject.
  E.g. here, just drop ..to.
  
  Are you sure we want this? Is there a chance sharing a backend
  can be useful?
 
 This patch is actually a bug fix.

It is?  What is the bug and how to reproduce it?
I am not sure we should write a ton of code to validate qemu
configuration, as long as qemu does not assert.

 Even if we will want backend sharing, we
 can do it after.

By reverting this patch? So why merge it?

  
  Igor, what's your take?
  
   
   Signed-off-by: Hu Tao hu...@cn.fujitsu.com
   ---
numa.c | 7 +++
1 file changed, 7 insertions(+)
   
   diff --git a/numa.c b/numa.c
   index e471afe..6c1c554 100644
   --- a/numa.c
   +++ b/numa.c
   @@ -279,6 +279,13 @@ void 
   memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
exit(1);
}

   +if (memory_region_is_mapped(seg)) {
   +char *path = 
   object_get_canonical_path_component(OBJECT(backend));
   +error_report(memory backend %s is busy, path);
   +g_free(path);
   +exit(1);
   +}
   +
memory_region_add_subregion(mr, addr, seg);
vmstate_register_ram_global(seg);
addr += size;
   -- 
   1.9.3

Re: [Qemu-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-06-30 Thread Chen, Tiejun


On 2014/6/30 14:48, Michael S. Tsirkin wrote:

On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote:

On 2014/6/26 18:03, Paolo Bonzini wrote:

Il 26/06/2014 11:18, Chen, Tiejun ha scritto:




- offsets 0x..0x0fff map to configuration space of the host MCH



Are you saying the config space in the video device?


No, I am saying in a new BAR, or at some magic offset of an existing
MMIO BAR.



As I mentioned previously, the IGD guy told me we have no any unused a
offset or BAR in the config space.

And guy who are responsible for the native driver seems not be accept to
extend some magic offset of an existing MMIO BAR.

In addition I think in a short time its not possible to migrate i440fx to
q35 as a PCIe machine of xen.


That seems like a weak motivation.  I don't see a need to get something
merged upstream in a short time: this seems sure to miss 2.1,
so you have the time to make it architecturally sound.
Making existing guests work would be a better motivation.


Yes.


Isn't this possible with an mch chipset?


If you're saying q35, I mean AFAIK we have no any plan to migrate to 
this MCH in xen case. Additionally, I think I should follow this feature 
after q35 can work for xen scenario.






So could we do this step by step:

#1 phase: We just cover current qemu-xen implementation based on i44fx, so
still provide that pseudo ISA bridge at 00:1f.0 as we already did.


By the way there is no reason to put it at 00:1f.0 specifically I think.
So it seems simple: create a dummy device that gets device and
vendor id as properties. If you really like, add an option to get values


Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx 
passthrough: create pseudo intel isa bridge. There, we fake this device 
just at 00:1f.0.


But you guys don't like this, and shouldn't this be just this point we 
discussing now?


If you guys agree that , everything is fine.


from sysfs: device and vendor id are world readable, so just get them
directly and not through xen wrappers, this way you can open the files
RO and not RW.
You seem to poke at revision as well, I don't see
driver looking at that - strictly necessary?
If yes please patch host kernel to expose that info in sysfs,
though we can fall back on pci config if not there.

MCH (bridge_dev) hacks in i915 are nastier.
To clean them up, we really have to have an explicit driver for this
bridge, not a pass-through device.  Long term, the right thing to do is
likely to extend host driver and expose the necessary information in
sysfs on host kernel.




I'm a bit confused. Any sysfs should be filled by the associated PCIe 
device, shouldn't it? So qemu still need to emulate this PCIe device 
firstly, then set properties into sysfs.






#2 phase: Now, we will choose a capability ID that won't be conflicting with
others. To do this properly, we need to get one from PCI SIG group. To have
this workable and consistently validated, this method shouldn't be virt
specific. Then native driver should use the same method.


You mean you will be able to talk sense into hardware guys?
I doubt that. If they could be convinced to make e.g. i915 base a


We're negotiating this, so this is just our long term solution we figure 
out currently.



proper BAR, why didn't they?


We already have no any free BAR as we mentioned previously.





So when xen work on
q35 PCIe machine, we can walk this way.


If you are emulating MCH anyway, pick one that is close
to what i915 driver expects. It would then work with existing


Looks you guys prefer we create a new MCH anyway, right? But is it 
necessary to create a new based on i440fx just for a little change?


Thanks
Tiejun


devices, without new capability IDs.



Anthony,

Any comments to address this in xen case?

Thanks
Tiejun

[Qemu-devel] [PATCH] ui/vnc: fix potential memory corruption issues

2014-06-30 Thread Peter Lieven

this patch addresses 2 memory corruption issues.

The first was actually discovered during playing
around with a Windows 7 vServer. During resolution
change in Windows 7 it happens sometimes that Windows
changes to an intermediate resolution where
server_stride % cmp_bytes != 0 (in vnc_refresh_server_surface).
This happens only if width % VNC_DIRTY_PIXELS_PER_BIT != 0.
The patch fixes the issue by clamping cmp_bytes in that case
and it finally makes those resolutions work correctly.
This can be easily tested by setting VNC_DIRTY_PIXELS_PER_BIT
to a bigger power of 2 value different than 16.

The second is a theoretical issue, but is maybe exploitable
by the guest. If for some reason the surface size is bigger
than VNC_MAX_WIDTH x VNC_MAX_HEIGHT we end up in severe corruption.
This can be easily reproduced by playing around with VNC_MAX_WIDTH
and VNC_MAX_HEIGHT. This patch modifies the VNC server to only
track and copy the area up to the maximum possible size.

Signed-off-by: Peter Lieven p...@kamp.de
---
 ui/vnc.c |  149 +-
 ui/vnc.h |   14 +++---
 2 files changed, 77 insertions(+), 86 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index 14a86c3..a6d748b 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -432,14 +432,10 @@ static void framebuffer_update_request(VncState *vs, int 
incremental,
 static void vnc_refresh(DisplayChangeListener *dcl);
 static int vnc_refresh_server_surface(VncDisplay *vd);
 
-static void vnc_dpy_update(DisplayChangeListener *dcl,
-   int x, int y, int w, int h)
-{
-VncDisplay *vd = container_of(dcl, VncDisplay, dcl);
-struct VncSurface *s = vd-guest;
-int width = surface_width(vd-ds);
-int height = surface_height(vd-ds);
-
+static void vnc_set_area_dirty(DECLARE_BITMAP(dirty[VNC_MAX_HEIGHT],
+   VNC_MAX_WIDTH / VNC_DIRTY_PIXELS_PER_BIT),
+   int width, int height,
+   int x, int y, int w, int h) {
 /* this is needed this to ensure we updated all affected
  * blocks if x % VNC_DIRTY_PIXELS_PER_BIT != 0 */
 w += (x % VNC_DIRTY_PIXELS_PER_BIT);
@@ -451,11 +447,22 @@ static void vnc_dpy_update(DisplayChangeListener *dcl,
 h = MIN(y + h, height);
 
 for (; y  h; y++) {
-bitmap_set(s-dirty[y], x / VNC_DIRTY_PIXELS_PER_BIT,
+bitmap_set(dirty[y], x / VNC_DIRTY_PIXELS_PER_BIT,
DIV_ROUND_UP(w, VNC_DIRTY_PIXELS_PER_BIT));
 }
 }
 
+static void vnc_dpy_update(DisplayChangeListener *dcl,
+   int x, int y, int w, int h)
+{
+VncDisplay *vd = container_of(dcl, VncDisplay, dcl);
+struct VncSurface *s = vd-guest;
+int width = pixman_image_get_width(vd-server);
+int height = pixman_image_get_height(vd-server);
+
+vnc_set_area_dirty(s-dirty, width, height, x, y, w, h);
+}
+
 void vnc_framebuffer_update(VncState *vs, int x, int y, int w, int h,
 int32_t encoding)
 {
@@ -517,17 +524,15 @@ void buffer_advance(Buffer *buf, size_t len)
 
 static void vnc_desktop_resize(VncState *vs)
 {
-DisplaySurface *ds = vs-vd-ds;
-
 if (vs-csock == -1 || !vnc_has_feature(vs, VNC_FEATURE_RESIZE)) {
 return;
 }
-if (vs-client_width == surface_width(ds) 
-vs-client_height == surface_height(ds)) {
+if (vs-client_width == pixman_image_get_width(vs-vd-server) 
+vs-client_height == pixman_image_get_height(vs-vd-server)) {
 return;
 }
-vs-client_width = surface_width(ds);
-vs-client_height = surface_height(ds);
+vs-client_width = pixman_image_get_width(vs-vd-server);
+vs-client_height = pixman_image_get_height(vs-vd-server);
 vnc_lock_output(vs);
 vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE);
 vnc_write_u8(vs, 0);
@@ -571,31 +576,24 @@ void *vnc_server_fb_ptr(VncDisplay *vd, int x, int y)
 ptr += x * VNC_SERVER_FB_BYTES;
 return ptr;
 }
-/* this sets only the visible pixels of a dirty bitmap */
-#define VNC_SET_VISIBLE_PIXELS_DIRTY(bitmap, w, h) {\
-int y;\
-memset(bitmap, 0x00, sizeof(bitmap));\
-for (y = 0; y  h; y++) {\
-bitmap_set(bitmap[y], 0,\
-   DIV_ROUND_UP(w, VNC_DIRTY_PIXELS_PER_BIT));\
-} \
-}
 
 static void vnc_dpy_switch(DisplayChangeListener *dcl,
DisplaySurface *surface)
 {
 VncDisplay *vd = container_of(dcl, VncDisplay, dcl);
 VncState *vs;
+int width, height;
 
 vnc_abort_display_jobs(vd);
 
 /* server surface */
 qemu_pixman_image_unref(vd-server);
 vd-ds = surface;
+width = MIN(VNC_MAX_WIDTH, ROUND_UP(surface_width(vd-ds),
+VNC_DIRTY_PIXELS_PER_BIT));
+height = MIN(VNC_MAX_HEIGHT, surface_height(vd-ds));
 vd-server = pixman_image_create_bits(VNC_SERVER_FB_FORMAT,
-  surface_width(vd-ds),
-

Re: [Qemu-devel] [PATCH 0/4] ui/cocoa: Fix absolute positioning and other bugs

2014-06-30 Thread Gerd Hoffmann

On Mo, 2014-06-23 at 10:35 +0100, Peter Maydell wrote:
 This set of cocoa UI patches:
  * fixes the completely broken handling of absolute positioning
(tablet-style) input devices
  * fixes a bug where if the first surface created was the same 640x480
as the initial window we'd never actually draw it
  * implements support for the -show-cursor command line option
 
 The GTK and SDL UI frontends don't seem to be consistent about how they
 handle mousegrab for absolute-position devices; I followed SDL on the
 basis that it was the older and more established UI. (GTK doesn't
 implement -show-cursor at all, incidentally.)

Series looks sane to me, although I can't claim I fully understand patch
#3, especially the macos x mouse handling internals involved there ...

cheers,
  Gerd

Re: [Qemu-devel] possible denial of service via VNC

2014-06-30 Thread Gerd Hoffmann

On So, 2014-06-29 at 14:16 +0200, Peter Lieven wrote:
 Hi,
 
 while debugging a VNC issue I found this:
 
 case VNC_MSG_CLIENT_CUT_TEXT:
 if (len == 1)
 return 8;
 
 if (len == 8) {
 uint32_t dlen = read_u32(data, 4);
 if (dlen  0)
 return 8 + dlen;
 }
 
 client_cut_text(vs, read_u32(data, 4), data + 8);
 break;
 
 in protocol_client_msg().
 
 Is this really a good idea? This allows for letting the vs-input buffer to 
 grow
 up to 2^32 + 8 byte which will possibly result in an out of memory condition.

Applying a limit there looks reasonable to me.  Patches welcome.
As this is text only a megabyte should be more than enough for all
practical purposes.  Question is what to do when the limit is exceeded?
Disconnect?  Read  throw away?

cheers,
  Gerd

Re: [Qemu-devel] possible denial of service via VNC

2014-06-30 Thread Peter Lieven


On 30.06.2014 09:33, Gerd Hoffmann wrote:

On So, 2014-06-29 at 14:16 +0200, Peter Lieven wrote:

Hi,

while debugging a VNC issue I found this:

 case VNC_MSG_CLIENT_CUT_TEXT:
 if (len == 1)
 return 8;

 if (len == 8) {
 uint32_t dlen = read_u32(data, 4);
 if (dlen  0)
 return 8 + dlen;
 }

 client_cut_text(vs, read_u32(data, 4), data + 8);
 break;

in protocol_client_msg().

Is this really a good idea? This allows for letting the vs-input buffer to grow
up to 2^32 + 8 byte which will possibly result in an out of memory condition.

Applying a limit there looks reasonable to me.  Patches welcome.
As this is text only a megabyte should be more than enough for all
practical purposes.  Question is what to do when the limit is exceeded?
Disconnect?  Read  throw away?


I would also think something in the order of megabytes should be fine.
I would vote for disconnect as soon as the limit specified is too big. Otherwise
we had to rewrite the whole receive logic which could introduce additional
bugs.

Peter

--

Mit freundlichen Grüßen

Peter Lieven

...

  KAMP Netzwerkdienste GmbH
  Vestische Str. 89-91 | 46117 Oberhausen
  Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
  p...@kamp.de | http://www.kamp.de

  Geschäftsführer: Heiner Lante | Michael Lante
  Amtsgericht Duisburg | HRB Nr. 12154
  USt-Id-Nr.: DE 120607556

...

Re: [Qemu-devel] possible denial of service via VNC

2014-06-30 Thread Gerd Hoffmann

  Hi,

 I would vote for disconnect as soon as the limit specified is too big. 
 Otherwise
 we had to rewrite the whole receive logic which could introduce additional
 bugs.

Sounds sensible.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH] numa: check for busy memory backend

2014-06-30 Thread Hu Tao

On Mon, Jun 30, 2014 at 09:53:20AM +0300, Michael S. Tsirkin wrote:
 On Mon, Jun 30, 2014 at 01:33:42PM +0800, Hu Tao wrote:
  On Sun, Jun 29, 2014 at 06:20:22PM +0300, Michael S. Tsirkin wrote:
   On Wed, Jun 25, 2014 at 05:04:14PM +0800, Hu Tao wrote:
..to prevent one memory backend from being used by more than one numa
node.
   
   Thanks, but please always make the msg content self-contained
   so it can be understood without the subject.
   E.g. here, just drop ..to.
   
   Are you sure we want this? Is there a chance sharing a backend
   can be useful?
  
  This patch is actually a bug fix.
 
 It is?  What is the bug and how to reproduce it?

If user specifies the same memory backend for two numa nodes:

./x86_64-softmmu/qemu-system-x86_64 -hda /home/data/libvirt-images/f18.img  -m 
512M \
-qmp unix:/tmp/m,server,nowait -monitor stdio -enable-kvm \
-object memory-backend-ram,size=256M,id=ram0 \
-numa node,nodeid=0,memdev=ram0 \
-numa node,nodeid=1,memdev=ram0

 I am not sure we should write a ton of code to validate qemu
 configuration, as long as qemu does not assert.

It seems qemu does not provide a way to disable assert currently.
Even if I removed asserts on the code path in my test, there is another
problem that it hits an infinite in render_memory_region().

 
  Even if we will want backend sharing, we
  can do it after.
 
 By reverting this patch? So why merge it?

The point is qemu doesn't fire a bug no matter what user inputs.

 
   
   Igor, what's your take?
   

Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 numa.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/numa.c b/numa.c
index e471afe..6c1c554 100644
--- a/numa.c
+++ b/numa.c
@@ -279,6 +279,13 @@ void 
memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 exit(1);
 }
 
+if (memory_region_is_mapped(seg)) {
+char *path = 
object_get_canonical_path_component(OBJECT(backend));
+error_report(memory backend %s is busy, path);
+g_free(path);
+exit(1);
+}
+
 memory_region_add_subregion(mr, addr, seg);
 vmstate_register_ram_global(seg);
 addr += size;
-- 
1.9.3

Re: [Qemu-devel] [PATCH] Functions bus_foreach and device_find from libqos virtio API

2014-06-30 Thread Stefan Hajnoczi

On Thu, Jun 26, 2014 at 04:34:40PM +0200, Marc Marí wrote:
 +static QVirtioPCIDevice *qpcidevice_to_qvirtiodevice(QPCIDevice *pdev)
 +{
 +QVirtioPCIDevice *vpcidev;
 +vpcidev = g_malloc0(sizeof(*vpcidev));
 +
 +if (pdev) {
 +vpcidev-pdev = pdev;
 +vpcidev-vdev.device_type =
 +qpci_config_readw(vpcidev-pdev, 
 PCI_SUBSYSTEM_ID);
 +/* TODO: When QVirtQueue is defined, change for
 +g_malloc0(sizeof(QVirtQueue)); */
 +vpcidev-vdev.vq = NULL;

This doesn't quite work because devices can have multiple virtqueues.

Please drop the vq field from this patch.  Add it later when you
actually implement virtqueues.

 +}
 +
 +return vpcidev;
 +}
 +
 +static void qvirtio_pci_foreach_callback(
 +QPCIDevice *dev, int devfn, void *data)
 +{
 +QVirtioPCIForeachData *d = data;
 +QVirtioPCIDevice *vpcidev = qpcidevice_to_qvirtiodevice(dev);
 +
 +if (vpcidev-vdev.device_type == d-device_type) {
 +d-func(vpcidev-vdev, d-user_data);
 +}
 +}
 +
 +static void qvirtio_pci_assign_device(QVirtioDevice *d, void *data)
 +{
 +QVirtioPCIDevice *vpcidev = data;
 +vpcidev-pdev   = ((QVirtioPCIDevice *)d)-pdev;
 +vpcidev-vdev.device_type   = ((QVirtioPCIDevice *)d)-vdev.device_type;
 +vpcidev-vdev.vq= ((QVirtioPCIDevice *)d)-vdev.vq;
 +}
 +
 +static void qvirtio_pci_notify(QVirtioDevice *d, uint16_t vector)
 +{
 +
 +}
 +
 +static void qvirtio_pci_get_config(QVirtioDevice *d, void *config)
 +{
 +
 +}
 +
 +static void qvirtio_pci_set_config(QVirtioDevice *d, void *config)
 +{
 +
 +}
 +
 +static uint32_t qvirtio_pci_get_features(QVirtioDevice *d)
 +{
 +return 0;
 +}
 +
 +static uint8_t qvirtio_pci_get_status(QVirtioDevice *d)
 +{
 +return 0;
 +}
 +
 +static void qvirtio_pci_set_status(QVirtioDevice *d, uint8_t val)
 +{
 +
 +}
 +
 +static void qvirtio_pci_reset(QVirtioDevice *d)
 +{
 +
 +}
 +
 +static uint8_t qvirtio_pci_query_isr(QVirtioDevice *d)
 +{
 +return 0;
 +}

Patches should only include useful, working code.  Please drop all these
unimplemented functions.

 +
 +static void qvirtio_pci_foreach(uint16_t device_type,
 +void (*func)(QVirtioDevice *d, void *data), void *data)
 +{
 +QVirtioPCIForeachData d = { .func = func,
 +.device_type = device_type,
 +.user_data = data };
 +
 +if (!bus) {
 +bus = qpci_init_pc();
 +}
 +
 +qpci_device_foreach(bus, QVIRTIO_VENDOR_ID, -1,
 +qvirtio_pci_foreach_callback, d);
 +}
 +
 +static QVirtioDevice *qvirtio_pci_device_find(uint16_t device_type)
 +{
 +QVirtioPCIDevice *dev;
 +
 +dev = g_malloc0(sizeof(*dev));
 +qvirtio_pci_foreach(device_type, qvirtio_pci_assign_device, dev);
 +
 +return (QVirtioDevice *)dev;
 +}
 +
 +const QVirtioBus qvirtio_pci = {
 +.notify = qvirtio_pci_notify,
 +.get_config = qvirtio_pci_get_config,
 +.set_config = qvirtio_pci_set_config,
 +.get_features = qvirtio_pci_get_features,
 +.get_status = qvirtio_pci_get_status,
 +.set_status = qvirtio_pci_set_status,
 +.reset = qvirtio_pci_reset,
 +.query_isr = qvirtio_pci_query_isr,
 +.bus_foreach = qvirtio_pci_foreach,
 +.device_find = qvirtio_pci_device_find,
 +};
 +
 diff --git a/tests/libqos/virtio-pci.h b/tests/libqos/virtio-pci.h
 new file mode 100644
 index 000..e5812af
 --- /dev/null
 +++ b/tests/libqos/virtio-pci.h
 @@ -0,0 +1,29 @@
 +/*
 + * libqos virtio PCI definitions
 + *
 + * Copyright (c) 2014 Marc Marí
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
 +#ifndef LIBQOS_VIRTIO_PCI_H
 +#define LIBQOS_VIRTIO_PCI_H
 +
 +#include libqos/virtio.h
 +#include libqos/pci.h
 +
 +typedef struct QVirtioPCIDevice {
 +QVirtioDevice vdev;
 +QPCIDevice *pdev;
 +} QVirtioPCIDevice;
 +
 +typedef struct QVirtioPCIForeachData {
 +void (*func)(QVirtioDevice *d, void *data);
 +uint16_t device_type;
 +void *user_data;
 +} QVirtioPCIForeachData;
 +
 +extern const QVirtioBus qvirtio_pci;
 +
 +#endif
 diff --git a/tests/libqos/virtio.h b/tests/libqos/virtio.h
 new file mode 100644
 index 000..43a0e73
 --- /dev/null
 +++ b/tests/libqos/virtio.h
 @@ -0,0 +1,64 @@
 +/*
 + * libqos virtio definitions
 + *
 + * Copyright (c) 2014 Marc Marí
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
 +#ifndef LIBQOS_VIRTIO_H
 +#define LIBQOS_VIRTIO_H
 +
 +#define QVIRTQUEUE_MAX_SIZE 1024
 +#define QVIRTIO_VENDOR_ID   0x1AF4
 +
 +#define QVIRTIO_NET_DEVICE_ID   0x1
 +#define QVIRTIO_BLK_DEVICE_ID   0x2
 +
 +typedef struct QVirtioDevice QVirtioDevice;
 +typedef struct QVirtQueue QVirtQueue;
 +typedef struct QVRing QVRing;
 +
 +typedef struct QVirtioDevice {
 +

Re: [Qemu-devel] possible denial of service via VNC

2014-06-30 Thread Peter Lieven


On 30.06.2014 09:46, Gerd Hoffmann wrote:

   Hi,


I would vote for disconnect as soon as the limit specified is too big. Otherwise
we had to rewrite the whole receive logic which could introduce additional
bugs.

Sounds sensible.


Especially since client_cut_text is currently a NOP.

Peter

Re: [Qemu-devel] [PATCH] ui/vnc: fix potential memory corruption issues

2014-06-30 Thread Gerd Hoffmann

On Mo, 2014-06-30 at 09:24 +0200, Peter Lieven wrote:
 this patch addresses 2 memory corruption issues.
 
 The first was actually discovered during playing
 around with a Windows 7 vServer. During resolution
 change in Windows 7 it happens sometimes that Windows
 changes to an intermediate resolution where
 server_stride % cmp_bytes != 0 (in vnc_refresh_server_surface).
 This happens only if width % VNC_DIRTY_PIXELS_PER_BIT != 0.
 The patch fixes the issue by clamping cmp_bytes in that case
 and it finally makes those resolutions work correctly.
 This can be easily tested by setting VNC_DIRTY_PIXELS_PER_BIT
 to a bigger power of 2 value different than 16.
 
 The second is a theoretical issue, but is maybe exploitable
 by the guest. If for some reason the surface size is bigger
 than VNC_MAX_WIDTH x VNC_MAX_HEIGHT we end up in severe corruption.
 This can be easily reproduced by playing around with VNC_MAX_WIDTH
 and VNC_MAX_HEIGHT. This patch modifies the VNC server to only
 track and copy the area up to the maximum possible size.

So this basically makes vnc work correctly in case guest surface and
server surface have different sizes, then fixes the two bugs on top of
that.  And it obsoletes the other corruption patch send Friday.
Correct?

cheers,
  Gerd

Re: [Qemu-devel] [PATCH] ui/vnc: fix potential memory corruption issues

2014-06-30 Thread Peter Lieven


On 30.06.2014 09:52, Gerd Hoffmann wrote:

On Mo, 2014-06-30 at 09:24 +0200, Peter Lieven wrote:

this patch addresses 2 memory corruption issues.

The first was actually discovered during playing
around with a Windows 7 vServer. During resolution
change in Windows 7 it happens sometimes that Windows
changes to an intermediate resolution where
server_stride % cmp_bytes != 0 (in vnc_refresh_server_surface).
This happens only if width % VNC_DIRTY_PIXELS_PER_BIT != 0.
The patch fixes the issue by clamping cmp_bytes in that case
and it finally makes those resolutions work correctly.
This can be easily tested by setting VNC_DIRTY_PIXELS_PER_BIT
to a bigger power of 2 value different than 16.

The second is a theoretical issue, but is maybe exploitable
by the guest. If for some reason the surface size is bigger
than VNC_MAX_WIDTH x VNC_MAX_HEIGHT we end up in severe corruption.
This can be easily reproduced by playing around with VNC_MAX_WIDTH
and VNC_MAX_HEIGHT. This patch modifies the VNC server to only
track and copy the area up to the maximum possible size.

So this basically makes vnc work correctly in case guest surface and
server surface have different sizes, then fixes the two bugs on top of
that.  And it obsoletes the other corruption patch send Friday.
Correct?


Yes and yes.

Basically the server surface is adjusted to not exceed VNC_MAX_WIDTH x 
VNC_MAX_HEIGHT
and on top the width is rounded up to multiple of VNC_DIRTY_PIXELS_PER_BIT.

If you have a resolution whose width is not dividable by 
VNC_DIRTY_PIXELS_PER_BIT you get
a small black bar on the right of the screen. First I wanted to fix only that 
and then
I noticed that VNC_MAX_WIDTH and VNC_MAX_HEIGHT are nowhere enforced. I do not
know if this is actually exploitable by the guest.

If the surface is too big to fit the limits only the upper left area is shown.

I ran several sessions with altered values of VNC_MAX_WIDTH, VNC_MAX_HEIGHT and 
VNC_DIRTY_PIXELS_PER_BIT
in valgrind, but additional testing is welcome.

Peter

[Qemu-devel] [PATCH] ui/vnc: limit client_cut_text msg payload size

2014-06-30 Thread Peter Lieven

currently a malicious client could define a payload
size of 2^32 - 1 bytes and send up to that size of
data to the vnc server. The server would allocated
that amount of memory which could easily create an
out of memory condition.

This patch limits the payload size to 1MB max.

Please note that client_cut_text messages are currently
silently ignored.

Signed-off-by: Peter Lieven p...@kamp.de
---
 ui/vnc.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index 14a86c3..096278f 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -2165,13 +2165,20 @@ static int protocol_client_msg(VncState *vs, uint8_t 
*data, size_t len)
 pointer_event(vs, read_u8(data, 1), read_u16(data, 2), read_u16(data, 
4));
 break;
 case VNC_MSG_CLIENT_CUT_TEXT:
-if (len == 1)
+if (len == 1) {
 return 8;
-
+}
 if (len == 8) {
 uint32_t dlen = read_u32(data, 4);
-if (dlen  0)
+if (dlen  (1  20)) {
+error_report(vnc: client_cut_text msg payload has %u bytes
+  which exceeds our limit of 1MB., dlen);
+vnc_client_error(vs);
+break;
+}
+if (dlen  0) {
 return 8 + dlen;
+}
 }
 
 client_cut_text(vs, read_u32(data, 4), data + 8);
-- 
1.7.9.5

Re: [Qemu-devel] [regression] dataplane: throughout -40% by commit 580b6b2aa2

2014-06-30 Thread Stefan Hajnoczi

On Sat, Jun 28, 2014 at 05:58:58PM +0800, Ming Lei wrote:
 On Sat, Jun 28, 2014 at 5:51 AM, Paolo Bonzini pbonz...@redhat.com wrote:
  Il 27/06/2014 20:01, Ming Lei ha scritto:
 
  I just implemented plugunplug based batching, and it is working now.
  But throughout still has no obvious improvement.
 
  Looks loading in IOthread is a bit low, so I am wondering if there is
  block point caused by Qemu QEMU block layer.
 
 
  What does perf say?  Also, you can try using the QEMU trace subsystem and
  see where the latency goes.
 
 Follows some test result against 8589744aaf07b62 of
 upstream qemu, and the test is done on my 2core(4thread)
 laptop:
 
 1, with my draft batch patches[1](only linux-aio supported now)
 - throughput: +16% compared qemu upstream
 - average time spent by handle_notify(): 310us
 - average time between two handle_notify(): 1591us
 (this time reflects latency of handling host_notifier)

16% is still a worthwhile improvement.  I guess batching only benefits
aio=native since the threadpool ought to do better when it receives
requests as soon as possible.

Patch or an RFC would be welcome.

 2, same tests on 2.0.0 release(use custom Linux AIO)
 - average time spent by handle_notify(): 68us
 - average time between calling two handle_notify(): 269us
 (this time reflects latency of handling host_notifier)
 
 From above tests, looks root cause is late handling notify, and
 qemu block layer becomes 4times slower than previous custom
 linux aio taken by dataplane.

Try:
$ perf record -e syscalls:* --tid iothread-tid
^C
$ perf script # shows the trace log

The difference between syscalls in QEMU 2.0 and qemu.git/master could
reveal the problem.

Using perf you can also trace ioeventfd signalling in the host kernel
and compare against the QEMU handle_notify entry/return.  It may be
easiest to use the ftrace_marker tracing backing in QEMU so the trace is
unified with the host kernel trace (./configure
--enable-trace-backend=ftrace and see the ftrace section in QEMU
docs/tracing.txt).

This way you can see whether the ioeventfd signal - handle_notify()
entry increased or something else is going on.

Stefan


pgp0tBWdsV4pj.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH] numa: check for busy memory backend

2014-06-30 Thread Michael S. Tsirkin

On Mon, Jun 30, 2014 at 03:46:56PM +0800, Hu Tao wrote:
 On Mon, Jun 30, 2014 at 09:53:20AM +0300, Michael S. Tsirkin wrote:
  On Mon, Jun 30, 2014 at 01:33:42PM +0800, Hu Tao wrote:
   On Sun, Jun 29, 2014 at 06:20:22PM +0300, Michael S. Tsirkin wrote:
On Wed, Jun 25, 2014 at 05:04:14PM +0800, Hu Tao wrote:
 ..to prevent one memory backend from being used by more than one numa
 node.

Thanks, but please always make the msg content self-contained
so it can be understood without the subject.
E.g. here, just drop ..to.

Are you sure we want this? Is there a chance sharing a backend
can be useful?
   
   This patch is actually a bug fix.
  
  It is?  What is the bug and how to reproduce it?
 
 If user specifies the same memory backend for two numa nodes:
 
 ./x86_64-softmmu/qemu-system-x86_64 -hda /home/data/libvirt-images/f18.img  
 -m 512M \
 -qmp unix:/tmp/m,server,nowait -monitor stdio -enable-kvm \
 -object memory-backend-ram,size=256M,id=ram0 \
 -numa node,nodeid=0,memdev=ram0 \
 -numa node,nodeid=1,memdev=ram0
 
  I am not sure we should write a ton of code to validate qemu
  configuration, as long as qemu does not assert.
 
 It seems qemu does not provide a way to disable assert currently.
 Even if I removed asserts on the code path in my test, there is another
 problem that it hits an infinite in render_memory_region().

OK so this is what commit log should say:
---
Specifying the same memory region twice leads to an assert:

./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object
memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0
-numa node,nodeid=1,memdev=ram0
qemu-system-x86_64: /scm/qemu/memory.c:1506:
memory_region_add_subregion_common: Assertion `!subregion-container'
failed.
Aborted (core dumped)

Detect and exit with an error message instead.
---

See? Explain why your patch makes sense, don't just repeat what it does.

  
   Even if we will want backend sharing, we
   can do it after.
  
  By reverting this patch? So why merge it?
 
 The point is qemu doesn't fire a bug no matter what user inputs.
 
  

Igor, what's your take?

 
 Signed-off-by: Hu Tao hu...@cn.fujitsu.com
 ---
  numa.c | 7 +++
  1 file changed, 7 insertions(+)
 
 diff --git a/numa.c b/numa.c
 index e471afe..6c1c554 100644
 --- a/numa.c
 +++ b/numa.c
 @@ -279,6 +279,13 @@ void 
 memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
  exit(1);
  }
  
 +if (memory_region_is_mapped(seg)) {
 +char *path = 
 object_get_canonical_path_component(OBJECT(backend));
 +error_report(memory backend %s is busy, path);

That's not very clear. How about:
memory backend %s is used multiple times. Each -numa option must use a 
different memdev value.

 +g_free(path);

As we are going to exit anyway, it does not make sense to bother with this.

 +exit(1);
 +}
 +
  memory_region_add_subregion(mr, addr, seg);
  vmstate_register_ram_global(seg);
  addr += size;
 -- 
 1.9.3

Re: [Qemu-devel] [regression] dataplane: throughout -40% by commit 580b6b2aa2

2014-06-30 Thread Ming Lei

On Mon, Jun 30, 2014 at 4:08 PM, Stefan Hajnoczi stefa...@redhat.com wrote:
 On Sat, Jun 28, 2014 at 05:58:58PM +0800, Ming Lei wrote:
 On Sat, Jun 28, 2014 at 5:51 AM, Paolo Bonzini pbonz...@redhat.com wrote:
  Il 27/06/2014 20:01, Ming Lei ha scritto:
 
  I just implemented plugunplug based batching, and it is working now.
  But throughout still has no obvious improvement.
 
  Looks loading in IOthread is a bit low, so I am wondering if there is
  block point caused by Qemu QEMU block layer.
 
 
  What does perf say?  Also, you can try using the QEMU trace subsystem and
  see where the latency goes.

 Follows some test result against 8589744aaf07b62 of
 upstream qemu, and the test is done on my 2core(4thread)
 laptop:

 1, with my draft batch patches[1](only linux-aio supported now)
 - throughput: +16% compared qemu upstream
 - average time spent by handle_notify(): 310us
 - average time between two handle_notify(): 1591us
 (this time reflects latency of handling host_notifier)

 16% is still a worthwhile improvement.  I guess batching only benefits
 aio=native since the threadpool ought to do better when it receives
 requests as soon as possible.

16% is obtained with 'simple' trace-backend enabled, and looks the
actual data with 'nop' trace is quite better than 16%, but it is still
not good as 2.0.0 release.


 Patch or an RFC would be welcome.

Yes, I will post it soon.

 2, same tests on 2.0.0 release(use custom Linux AIO)
 - average time spent by handle_notify(): 68us
 - average time between calling two handle_notify(): 269us
 (this time reflects latency of handling host_notifier)

 From above tests, looks root cause is late handling notify, and
 qemu block layer becomes 4times slower than previous custom
 linux aio taken by dataplane.

The above data is still obtained with 'simple' trace backend enabled,
I need to find other ways to test again without extra trace io.

 Try:
 $ perf record -e syscalls:* --tid iothread-tid
 ^C
 $ perf script # shows the trace log

 The difference between syscalls in QEMU 2.0 and qemu.git/master could
 reveal the problem.
 Using perf you can also trace ioeventfd signalling in the host kernel
 and compare against the QEMU handle_notify entry/return.  It may be
 easiest to use the ftrace_marker tracing backing in QEMU so the trace is
 unified with the host kernel trace (./configure
 --enable-trace-backend=ftrace and see the ftrace section in QEMU
 docs/tracing.txt).

 This way you can see whether the ioeventfd signal - handle_notify()
 entry increased or something else is going on.

Looks good ideas, I will try it.

I have tried ftrace, but looks some traces may be dropped and my
current script can't handle that well.


Thanks,
-- 
Ming Lei

[Qemu-devel] [PATCH v5] ppc: spapr-rtas - implement os-term rtas call

2014-06-30 Thread Nikunj A Dadhania

PAPR compliant guest calls this in absence of kdump. This finally
reaches the guest and can be handled according to the policies set by
higher level tools(like taking dump) for further analysis by tools like
crash.

Linux kernel calls ibm,os-term when extended property of os-term is set.
This makes sure that a return to the linux kernel is gauranteed.

CC: Benjamin Herrenschmidt b...@au1.ibm.com
CC: Anton Blanchard an...@samba.org
CC: Alexander Graf ag...@suse.de
CC: Tyrel Datwyler turtle.in.the.ker...@gmail.com
Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com

---

v2: rebase to ppcnext
v3: Do not stop the VM, and update comments
v4: update spapr_register_rtas and qapi_event changes
v5: set ibm,extended-os-term as null encoded property
---
 hw/ppc/spapr.c |  9 +
 hw/ppc/spapr_rtas.c| 15 +++
 include/hw/ppc/spapr.h |  1 -
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 307c58d..e6c9014 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -520,6 +520,15 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
 
 _FDT((fdt_property_cell(fdt, rtas-error-log-max, RTAS_ERROR_LOG_MAX)));
 
+/*
+ * According to PAPR, rtas ibm,os-term, does not gaurantee a return
+ * back to the guest cpu.
+ *
+ * While an additional ibm,extended-os-term property indicates that
+ * rtas call return will always occur. Set this property.
+ */
+_FDT((fdt_property(fdt, ibm,extended-os-term, NULL, 0)));
+
 _FDT((fdt_end_node(fdt)));
 
 /* interrupt controller */
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 9ba1ba6..2ec2a8e 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -277,6 +277,19 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
 rtas_st(rets, 0, ret);
 }
 
+static void rtas_ibm_os_term(PowerPCCPU *cpu,
+sPAPREnvironment *spapr,
+uint32_t token, uint32_t nargs,
+target_ulong args,
+uint32_t nret, target_ulong rets)
+{
+target_ulong ret = 0;
+
+qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, error_abort);
+
+rtas_st(rets, 0, ret);
+}
+
 static struct rtas_call {
 const char *name;
 spapr_rtas_fn fn;
@@ -404,6 +417,8 @@ static void core_rtas_register_types(void)
 spapr_rtas_register(RTAS_IBM_SET_SYSTEM_PARAMETER,
 ibm,set-system-parameter,
 rtas_ibm_set_system_parameter);
+spapr_rtas_register(RTAS_IBM_OS_TERM, ibm,os-term,
+rtas_ibm_os_term);
 }
 
 type_init(core_rtas_register_types)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index bbba51a..4e96381 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -382,7 +382,6 @@ int spapr_allocate_irq_block(int num, bool lsi, bool msi);
 #define RTAS_GET_SENSOR_STATE   (RTAS_TOKEN_BASE + 0x1D)
 #define RTAS_IBM_CONFIGURE_CONNECTOR(RTAS_TOKEN_BASE + 0x1E)
 #define RTAS_IBM_OS_TERM(RTAS_TOKEN_BASE + 0x1F)
-#define RTAS_IBM_EXTENDED_OS_TERM   (RTAS_TOKEN_BASE + 0x20)
 
 #define RTAS_TOKEN_MAX  (RTAS_TOKEN_BASE + 0x21)
 
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] ui/vnc: fix potential memory corruption issues

2014-06-30 Thread Gerd Hoffmann

On Mo, 2014-06-30 at 10:01 +0200, Peter Lieven wrote:
 On 30.06.2014 09:52, Gerd Hoffmann wrote:
  So this basically makes vnc work correctly in case guest surface and
  server surface have different sizes, then fixes the two bugs on top of
  that.  And it obsoletes the other corruption patch send Friday.
  Correct?
 
 Yes and yes.

Good, so I'm reading the code correctly ;)
Can you put that information into the patch description please?
Otherwise the patch is fine.

thanks,
  Gerd

Re: [Qemu-devel] [PATCH v4 0/9] virtio-blk: use alias properties in transport devices

2014-06-30 Thread Stefan Hajnoczi

On Wed, Jun 18, 2014 at 05:58:27PM +0800, Stefan Hajnoczi wrote:
 v4:
  * Coding style: typedef struct { on a single line [Andreas]
  * Add dataplane: bail out on unsupported transport for s390-virtio 
 [Cornelia]
 
 v3:
  * Split qdev_alias_all_properties() into its own patch [Peter Crosthwaite]
  * Do not dereference DEVICE_CLASS(class) inline [Peter Crosthwaite]
 
 v2:
  * Add qdev_alias_all_properties() instead of virtio-blk-specific function 
 [Paolo]
  * Explain refcount handling in doc comment [Paolo]
  * Fix property duplicate typo [Peter Crosthwaite]
  * Add the same object or to clarify commit description [Igor]
 
 Thanks for the feedback on the RFC.  This time around the alias property is
 implemented at the QOM property level instead of at the qdev property level.
 
 Note that this series only addresses virtio-blk.  In later series we can
 convert virtio net, scsi, rng, and serial.
 
 The virtio transport/device split is broken as follows:
 
 1. The virtio-blk device is never finalized because the transport devices
(virtio-blk-pci and friends) leak the refcount.
 
 2. If we fix the refcount leak then we double-free the 'serial' string 
 property
upon hot unplug since its char* is copied into the virtio-blk device which
has an identical 'serial' qdev property.
 
 This series solves both of these problems as follows:
 
 1. Introduce a QOM alias property that lets the transport device forward
property accesses into the virtio device (the child).
 
 2. Use alias properties in transport devices, instead of keeping a duplicate
copy of the VirtIOBlkConf struct.
 
 3. Fix the virtio-blk device refcount leak.  It's now safe to do this since 
 the
double-free has been resolved.
 
 Tested that hotplug/hotunplug of virtio-blk-pci still works.
 
 Cornelia Huck (1):
   dataplane: bail out on unsupported transport
 
 Stefan Hajnoczi (8):
   qom: add object_property_add_alias()
   virtio-blk: avoid qdev property definition duplication
   virtio-blk: move x-data-plane qdev property to virtio-blk.h
   qdev: add qdev_alias_all_properties()
   virtio-blk: use aliases instead of duplicate qdev properties
   virtio-blk: drop virtio_blk_set_conf()
   virtio: fix virtio-blk child refcount in transports
   virtio-blk: move qdev properties into virtio-blk.c
 
  hw/block/dataplane/virtio-blk.c | 10 
  hw/block/virtio-blk.c   | 18 +--
  hw/core/qdev.c  | 21 +
  hw/s390x/s390-virtio-bus.c  | 10 ++--
  hw/s390x/s390-virtio-bus.h  |  1 -
  hw/s390x/virtio-ccw.c   |  7 ++
  hw/s390x/virtio-ccw.h   |  1 -
  hw/virtio/virtio-pci.c  |  7 ++
  hw/virtio/virtio-pci.h  |  1 -
  include/hw/qdev-properties.h|  2 ++
  include/hw/virtio/virtio-blk.h  | 19 ---
  include/qom/object.h| 20 
  qom/object.c| 51 
 +
  13 files changed, 121 insertions(+), 47 deletions(-)

Applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan


pgpcyIUD2l5RB.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH] numa: check for busy memory backend

2014-06-30 Thread Igor Mammedov

On Mon, 30 Jun 2014 11:28:07 +0300
Michael S. Tsirkin m...@redhat.com wrote:

 On Mon, Jun 30, 2014 at 03:46:56PM +0800, Hu Tao wrote:
  On Mon, Jun 30, 2014 at 09:53:20AM +0300, Michael S. Tsirkin wrote:
   On Mon, Jun 30, 2014 at 01:33:42PM +0800, Hu Tao wrote:
On Sun, Jun 29, 2014 at 06:20:22PM +0300, Michael S. Tsirkin wrote:
 On Wed, Jun 25, 2014 at 05:04:14PM +0800, Hu Tao wrote:
  ..to prevent one memory backend from being used by more than one 
  numa
  node.
 
 Thanks, but please always make the msg content self-contained
 so it can be understood without the subject.
 E.g. here, just drop ..to.
 
 Are you sure we want this? Is there a chance sharing a backend
 can be useful?

This patch is actually a bug fix.
   
   It is?  What is the bug and how to reproduce it?
  
  If user specifies the same memory backend for two numa nodes:
  
  ./x86_64-softmmu/qemu-system-x86_64 -hda /home/data/libvirt-images/f18.img  
  -m 512M \
  -qmp unix:/tmp/m,server,nowait -monitor stdio -enable-kvm \
  -object memory-backend-ram,size=256M,id=ram0 \
  -numa node,nodeid=0,memdev=ram0 \
  -numa node,nodeid=1,memdev=ram0
  
   I am not sure we should write a ton of code to validate qemu
   configuration, as long as qemu does not assert.
  
  It seems qemu does not provide a way to disable assert currently.
  Even if I removed asserts on the code path in my test, there is another
  problem that it hits an infinite in render_memory_region().
 
 OK so this is what commit log should say:
 ---
 Specifying the same memory region twice leads to an assert:
 
 ./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object
 memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0
 -numa node,nodeid=1,memdev=ram0
 qemu-system-x86_64: /scm/qemu/memory.c:1506:
 memory_region_add_subregion_common: Assertion `!subregion-container'
 failed.
 Aborted (core dumped)
 
 Detect and exit with an error message instead.
 ---
with  fixed-up commit message:
Reviewed-by: Igor Mammedov imamm...@redhat.com

 
 See? Explain why your patch makes sense, don't just repeat what it does.
 
   
Even if we will want backend sharing, we
can do it after.
   
   By reverting this patch? So why merge it?
  
  The point is qemu doesn't fire a bug no matter what user inputs.
  
   
 
 Igor, what's your take?
 
  
  Signed-off-by: Hu Tao hu...@cn.fujitsu.com
  ---
   numa.c | 7 +++
   1 file changed, 7 insertions(+)
  
  diff --git a/numa.c b/numa.c
  index e471afe..6c1c554 100644
  --- a/numa.c
  +++ b/numa.c
  @@ -279,6 +279,13 @@ void 
  memory_region_allocate_system_memory(MemoryRegion *mr, Object 
  *owner,
   exit(1);
   }
   
  +if (memory_region_is_mapped(seg)) {
  +char *path = 
  object_get_canonical_path_component(OBJECT(backend));
  +error_report(memory backend %s is busy, path);
 
 That's not very clear. How about:
   memory backend %s is used multiple times. Each -numa option must use a 
 different memdev value.
 
  +g_free(path);
 
 As we are going to exit anyway, it does not make sense to bother with this.
 
  +exit(1);
  +}
  +
   memory_region_add_subregion(mr, addr, seg);
   vmstate_register_ram_global(seg);
   addr += size;
  -- 
  1.9.3

[Qemu-devel] [PATCHv2] ui/vnc: fix potential memory corruption issues

2014-06-30 Thread Peter Lieven

this patch makes the VNC server work correctly if the
server surface and the guest surface have different sizes.

Basically the server surface is adjusted to not exceed VNC_MAX_WIDTH
x VNC_MAX_HEIGHT and additionally the width is rounded up to multiple of
VNC_DIRTY_PIXELS_PER_BIT. 

If we have a resolution whose width is not dividable by VNC_DIRTY_PIXELS_PER_BIT
we now get a small black bar on the right of the screen. 

If the surface is too big to fit the limits only the upper left area is shown. 

On top of that this fixes 2 memory corruption issues:

The first was actually discovered during playing
around with a Windows 7 vServer. During resolution
change in Windows 7 it happens sometimes that Windows
changes to an intermediate resolution where
server_stride % cmp_bytes != 0 (in vnc_refresh_server_surface).
This happens only if width % VNC_DIRTY_PIXELS_PER_BIT != 0.

The second is a theoretical issue, but is maybe exploitable
by the guest. If for some reason the guest surface size is bigger
than VNC_MAX_WIDTH x VNC_MAX_HEIGHT we end up in severe corruption since
this limit is nowhere enforced.

Signed-off-by: Peter Lieven p...@kamp.de
---
v1-v2: updated commit msg [Gerd]

 ui/vnc.c |  149 +-
 ui/vnc.h |   14 +++---
 2 files changed, 77 insertions(+), 86 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index 14a86c3..a6d748b 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -432,14 +432,10 @@ static void framebuffer_update_request(VncState *vs, int 
incremental,
 static void vnc_refresh(DisplayChangeListener *dcl);
 static int vnc_refresh_server_surface(VncDisplay *vd);
 
-static void vnc_dpy_update(DisplayChangeListener *dcl,
-   int x, int y, int w, int h)
-{
-VncDisplay *vd = container_of(dcl, VncDisplay, dcl);
-struct VncSurface *s = vd-guest;
-int width = surface_width(vd-ds);
-int height = surface_height(vd-ds);
-
+static void vnc_set_area_dirty(DECLARE_BITMAP(dirty[VNC_MAX_HEIGHT],
+   VNC_MAX_WIDTH / VNC_DIRTY_PIXELS_PER_BIT),
+   int width, int height,
+   int x, int y, int w, int h) {
 /* this is needed this to ensure we updated all affected
  * blocks if x % VNC_DIRTY_PIXELS_PER_BIT != 0 */
 w += (x % VNC_DIRTY_PIXELS_PER_BIT);
@@ -451,11 +447,22 @@ static void vnc_dpy_update(DisplayChangeListener *dcl,
 h = MIN(y + h, height);
 
 for (; y  h; y++) {
-bitmap_set(s-dirty[y], x / VNC_DIRTY_PIXELS_PER_BIT,
+bitmap_set(dirty[y], x / VNC_DIRTY_PIXELS_PER_BIT,
DIV_ROUND_UP(w, VNC_DIRTY_PIXELS_PER_BIT));
 }
 }
 
+static void vnc_dpy_update(DisplayChangeListener *dcl,
+   int x, int y, int w, int h)
+{
+VncDisplay *vd = container_of(dcl, VncDisplay, dcl);
+struct VncSurface *s = vd-guest;
+int width = pixman_image_get_width(vd-server);
+int height = pixman_image_get_height(vd-server);
+
+vnc_set_area_dirty(s-dirty, width, height, x, y, w, h);
+}
+
 void vnc_framebuffer_update(VncState *vs, int x, int y, int w, int h,
 int32_t encoding)
 {
@@ -517,17 +524,15 @@ void buffer_advance(Buffer *buf, size_t len)
 
 static void vnc_desktop_resize(VncState *vs)
 {
-DisplaySurface *ds = vs-vd-ds;
-
 if (vs-csock == -1 || !vnc_has_feature(vs, VNC_FEATURE_RESIZE)) {
 return;
 }
-if (vs-client_width == surface_width(ds) 
-vs-client_height == surface_height(ds)) {
+if (vs-client_width == pixman_image_get_width(vs-vd-server) 
+vs-client_height == pixman_image_get_height(vs-vd-server)) {
 return;
 }
-vs-client_width = surface_width(ds);
-vs-client_height = surface_height(ds);
+vs-client_width = pixman_image_get_width(vs-vd-server);
+vs-client_height = pixman_image_get_height(vs-vd-server);
 vnc_lock_output(vs);
 vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE);
 vnc_write_u8(vs, 0);
@@ -571,31 +576,24 @@ void *vnc_server_fb_ptr(VncDisplay *vd, int x, int y)
 ptr += x * VNC_SERVER_FB_BYTES;
 return ptr;
 }
-/* this sets only the visible pixels of a dirty bitmap */
-#define VNC_SET_VISIBLE_PIXELS_DIRTY(bitmap, w, h) {\
-int y;\
-memset(bitmap, 0x00, sizeof(bitmap));\
-for (y = 0; y  h; y++) {\
-bitmap_set(bitmap[y], 0,\
-   DIV_ROUND_UP(w, VNC_DIRTY_PIXELS_PER_BIT));\
-} \
-}
 
 static void vnc_dpy_switch(DisplayChangeListener *dcl,
DisplaySurface *surface)
 {
 VncDisplay *vd = container_of(dcl, VncDisplay, dcl);
 VncState *vs;
+int width, height;
 
 vnc_abort_display_jobs(vd);
 
 /* server surface */
 qemu_pixman_image_unref(vd-server);
 vd-ds = surface;
+width = MIN(VNC_MAX_WIDTH, ROUND_UP(surface_width(vd-ds),
+VNC_DIRTY_PIXELS_PER_BIT));
+height =

Re: [Qemu-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-06-30 Thread Michael S. Tsirkin

On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote:
 On 2014/6/30 14:48, Michael S. Tsirkin wrote:
 On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote:
 On 2014/6/26 18:03, Paolo Bonzini wrote:
 Il 26/06/2014 11:18, Chen, Tiejun ha scritto:
 
 
 - offsets 0x..0x0fff map to configuration space of the host MCH
 
 
 Are you saying the config space in the video device?
 
 No, I am saying in a new BAR, or at some magic offset of an existing
 MMIO BAR.
 
 
 As I mentioned previously, the IGD guy told me we have no any unused a
 offset or BAR in the config space.
 
 And guy who are responsible for the native driver seems not be accept to
 extend some magic offset of an existing MMIO BAR.
 
 In addition I think in a short time its not possible to migrate i440fx to
 q35 as a PCIe machine of xen.
 
 That seems like a weak motivation.  I don't see a need to get something
 merged upstream in a short time: this seems sure to miss 2.1,
 so you have the time to make it architecturally sound.
 Making existing guests work would be a better motivation.
 
 Yes.

So focus on this then. Existing guests will probably work
fine on a newer chipset - likely better than on i440fx.
xen management tools need to do some work to support this?
That will just give everyone more long term benefits.

If instead we create a hack that does not resemble
any existing hardware even remotely, what's the
chance that it will not break with any future
guest modification?


 Isn't this possible with an mch chipset?
 
 If you're saying q35, I mean AFAIK we have no any plan to migrate to this
 MCH in xen case.

q35 or a newer chipset that's closer to what guests expect.

 Additionally, I think I should follow this feature after
 q35 can work for xen scenario.

What's stopping you?

 
 
 So could we do this step by step:
 
 #1 phase: We just cover current qemu-xen implementation based on i44fx, so
 still provide that pseudo ISA bridge at 00:1f.0 as we already did.
 
 By the way there is no reason to put it at 00:1f.0 specifically I think.
 So it seems simple: create a dummy device that gets device and
 vendor id as properties. If you really like, add an option to get values
 
 Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx
 passthrough: create pseudo intel isa bridge. There, we fake this device just
 at 00:1f.0.
 But you guys don't like this, and shouldn't this be just this point we
 discussing now?
 
 If you guys agree that , everything is fine.

Actually, this isn't what you did.
Don't tie it to xen, and don't tie it to 1f.
Just make it a simple stub pci device.
Whoever wants it, creates it.

The thing I worry about, is the chance this will break going forward.
So you created a system with 2 isa bridges.
This is already not something that exists on real hardware.
So it might break some guests that will get confused.
Maybe we are lucky and most guests see an unfamiliar device
and ignore it. It seems believable.

But your MCH hack overrides registers in the pci host.
Are we lucky and there's nothing in these registers
of interest to guests? This seems much more fragile.
So please poke at the spec, and compile the list
of registers you want to touch, figure out why they are
safe to override, and put this all in code comments.

And the same thing that applies to the isa bridge
applies here too. It should work without QEMU touching
hosts' hardware.



 from sysfs: device and vendor id are world readable, so just get them
 directly and not through xen wrappers, this way you can open the files
 RO and not RW.
 You seem to poke at revision as well, I don't see
 driver looking at that - strictly necessary?
 If yes please patch host kernel to expose that info in sysfs,
 though we can fall back on pci config if not there.
 
 MCH (bridge_dev) hacks in i915 are nastier.
 To clean them up, we really have to have an explicit driver for this
 bridge, not a pass-through device.  Long term, the right thing to do is
 likely to extend host driver and expose the necessary information in
 sysfs on host kernel.
 
 
 
 I'm a bit confused. Any sysfs should be filled by the associated PCIe
 device, shouldn't it? So qemu still need to emulate this PCIe device
 firstly, then set properties into sysfs.

I am talking about getting host properties into qemu.
You don't want to give qemu R/W root access to host sysfs system
of the root bridge, that's not secure.
Avoiding read only access to filesystem is a good idea too, so it
should be possible to pass all parameters in as
device properties, and let whoever starts qemu
figure out what are reasonable values.

 
 
 #2 phase: Now, we will choose a capability ID that won't be conflicting with
 others. To do this properly, we need to get one from PCI SIG group. To have
 this workable and consistently validated, this method shouldn't be virt
 specific. Then native driver should use the same method.
 
 You mean you will be able to talk sense into hardware guys?
 I doubt that. If they could

Re: [Qemu-devel] [PATCH v5] ppc: spapr-rtas - implement os-term rtas call

2014-06-30 Thread Alexander Graf



 Am 30.06.2014 um 10:35 schrieb Nikunj A Dadhania nik...@linux.vnet.ibm.com:
 
 PAPR compliant guest calls this in absence of kdump. This finally
 reaches the guest and can be handled according to the policies set by
 higher level tools(like taking dump) for further analysis by tools like
 crash.
 
 Linux kernel calls ibm,os-term when extended property of os-term is set.
 This makes sure that a return to the linux kernel is gauranteed.
 
 CC: Benjamin Herrenschmidt b...@au1.ibm.com
 CC: Anton Blanchard an...@samba.org
 CC: Alexander Graf ag...@suse.de
 CC: Tyrel Datwyler turtle.in.the.ker...@gmail.com
 Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
 
 ---
 
 v2: rebase to ppcnext
 v3: Do not stop the VM, and update comments
 v4: update spapr_register_rtas and qapi_event changes
 v5: set ibm,extended-os-term as null encoded property
 ---
 hw/ppc/spapr.c |  9 +
 hw/ppc/spapr_rtas.c| 15 +++
 include/hw/ppc/spapr.h |  1 -
 3 files changed, 24 insertions(+), 1 deletion(-)
 
 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
 index 307c58d..e6c9014 100644
 --- a/hw/ppc/spapr.c
 +++ b/hw/ppc/spapr.c
 @@ -520,6 +520,15 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
 
 _FDT((fdt_property_cell(fdt, rtas-error-log-max, RTAS_ERROR_LOG_MAX)));
 
 +/*
 + * According to PAPR, rtas ibm,os-term, does not gaurantee a return
 + * back to the guest cpu.
 + *
 + * While an additional ibm,extended-os-term property indicates that
 + * rtas call return will always occur. Set this property.
 + */
 +_FDT((fdt_property(fdt, ibm,extended-os-term, NULL, 0)));
 +
 _FDT((fdt_end_node(fdt)));
 
 /* interrupt controller */
 diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
 index 9ba1ba6..2ec2a8e 100644
 --- a/hw/ppc/spapr_rtas.c
 +++ b/hw/ppc/spapr_rtas.c
 @@ -277,6 +277,19 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU 
 *cpu,
 rtas_st(rets, 0, ret);
 }
 
 +static void rtas_ibm_os_term(PowerPCCPU *cpu,
 +sPAPREnvironment *spapr,
 +uint32_t token, uint32_t nargs,
 +target_ulong args,
 +uint32_t nret, target_ulong rets)
 +{
 +target_ulong ret = 0;
 +
 +qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, error_abort);

The guest doesn't pause. Since the guest will call os-term in a loop, this will 
also flood the event listener with lots and lots of panic messages.

Alex

Re: [Qemu-devel] [PATCH] numa: check for busy memory backend

2014-06-30 Thread Michael S. Tsirkin

On Mon, Jun 30, 2014 at 10:48:22AM +0200, Igor Mammedov wrote:
 On Mon, 30 Jun 2014 11:28:07 +0300
 Michael S. Tsirkin m...@redhat.com wrote:
 
  On Mon, Jun 30, 2014 at 03:46:56PM +0800, Hu Tao wrote:
   On Mon, Jun 30, 2014 at 09:53:20AM +0300, Michael S. Tsirkin wrote:
On Mon, Jun 30, 2014 at 01:33:42PM +0800, Hu Tao wrote:
 On Sun, Jun 29, 2014 at 06:20:22PM +0300, Michael S. Tsirkin wrote:
  On Wed, Jun 25, 2014 at 05:04:14PM +0800, Hu Tao wrote:
   ..to prevent one memory backend from being used by more than one 
   numa
   node.
  
  Thanks, but please always make the msg content self-contained
  so it can be understood without the subject.
  E.g. here, just drop ..to.
  
  Are you sure we want this? Is there a chance sharing a backend
  can be useful?
 
 This patch is actually a bug fix.

It is?  What is the bug and how to reproduce it?
   
   If user specifies the same memory backend for two numa nodes:
   
   ./x86_64-softmmu/qemu-system-x86_64 -hda 
   /home/data/libvirt-images/f18.img  -m 512M \
   -qmp unix:/tmp/m,server,nowait -monitor stdio -enable-kvm \
   -object memory-backend-ram,size=256M,id=ram0 \
   -numa node,nodeid=0,memdev=ram0 \
   -numa node,nodeid=1,memdev=ram0
   
I am not sure we should write a ton of code to validate qemu
configuration, as long as qemu does not assert.
   
   It seems qemu does not provide a way to disable assert currently.
   Even if I removed asserts on the code path in my test, there is another
   problem that it hits an infinite in render_memory_region().
  
  OK so this is what commit log should say:
  ---
  Specifying the same memory region twice leads to an assert:
  
  ./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object
  memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0
  -numa node,nodeid=1,memdev=ram0
  qemu-system-x86_64: /scm/qemu/memory.c:1506:
  memory_region_add_subregion_common: Assertion `!subregion-container'
  failed.
  Aborted (core dumped)
  
  Detect and exit with an error message instead.
  ---
 with  fixed-up commit message:
 Reviewed-by: Igor Mammedov imamm...@redhat.com

Sorry I want the error message fixed up too.

  
  See? Explain why your patch makes sense, don't just repeat what it does.
  

 Even if we will want backend sharing, we
 can do it after.

By reverting this patch? So why merge it?
   
   The point is qemu doesn't fire a bug no matter what user inputs.
   

  
  Igor, what's your take?
  
   
   Signed-off-by: Hu Tao hu...@cn.fujitsu.com
   ---
numa.c | 7 +++
1 file changed, 7 insertions(+)
   
   diff --git a/numa.c b/numa.c
   index e471afe..6c1c554 100644
   --- a/numa.c
   +++ b/numa.c
   @@ -279,6 +279,13 @@ void 
   memory_region_allocate_system_memory(MemoryRegion *mr, Object 
   *owner,
exit(1);
}

   +if (memory_region_is_mapped(seg)) {
   +char *path = 
   object_get_canonical_path_component(OBJECT(backend));
   +error_report(memory backend %s is busy, path);
  
  That's not very clear. How about:
  memory backend %s is used multiple times. Each -numa option must use a 
  different memdev value.
  
   +g_free(path);
  
  As we are going to exit anyway, it does not make sense to bother with this.
  
   +exit(1);
   +}
   +
memory_region_add_subregion(mr, addr, seg);
vmstate_register_ram_global(seg);
addr += size;
   -- 
   1.9.3

Re: [Qemu-devel] [PATCH v5] ppc: spapr-rtas - implement os-term rtas call

2014-06-30 Thread Nikunj A Dadhania

Alexander Graf ag...@suse.de writes:

 Am 30.06.2014 um 10:35 schrieb Nikunj A Dadhania nik...@linux.vnet.ibm.com:
 
 +static void rtas_ibm_os_term(PowerPCCPU *cpu,
 +sPAPREnvironment *spapr,
 +uint32_t token, uint32_t nargs,
 +target_ulong args,
 +uint32_t nret, target_ulong rets)
 +{
 +target_ulong ret = 0;
 +
 +qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, error_abort);

 The guest doesn't pause. 

I see the event reaching libvirt and a dump taken and the guest
restarts.

 Since the guest will call os-term in a loop, this will also flood the
 event listener with lots and lots of panic messages.

do {
status = rtas_call(rtas_token(ibm,os-term), 1, 1, NULL,
   __pa(rtas_os_term_buf));
} while (rtas_busy_delay(status));

So when status from the rtas call is success, the loop should exit.

Am I missing something?

Regards
Nikunj

Re: [Qemu-devel] [v2][PATCH 4/8] xen, gfx passthrough: reserve 00:02.0 for INTEL IGD

2014-06-30 Thread Gerd Hoffmann

  Hi,

/* Make cirrues VGA S3 suspend/resume work in Windows
  XP/2003 */
Device (VGA)
{
  -   Name (_ADR, 0x0002)
  +   // Address of the VGA (device F function 0)
  +   Name (_ADR, 0x000F)
 
 
  With this change I still didn't see anything.
 
 
 This does not match with what I see.
 
 Looks like linux does not care about the acpi data.  Using

The acpi data doesn't matter at all for the boot screen, the vgabios
doesn't look at it.

 (d12) Scan for VGA option rom
 (d12) Running option rom at c000:0003

seabios loaded+initialized the vgabios (from the pci rom bar of the vga
device).  Which slot the vga is installed at should not matter at all.
seabios scans the pci bus and should find the vga with no problems no
matter where it is.

 (XEN) stdvga.c:147:d12v0 entering stdvga and caching modes

Seems to have worked fine ;)

 (d12) pmm call arg1=0
 (d12) Turning on vga text mode console
 (d12) SeaBIOS (version rel-1.7.5-0-ge51488c-20140626_113926-dcs-xen-54)

At this point you should see the seabios banner at the vga screen.

 And the VGA screen has the SeaBIOS messages:
 
 SeaBIOS (version rel-1.7.5-0-ge51488c-20140626_113926-dcs-xen-54)
 Machine UUID 18cf5fd0-e564-49c4-b63f-e6c3c23b1294

As it should be.

Now the questions is why it doesn't work for Tiejun ...
Anything in the logs?

cheers,
  Gerd

Re: [Qemu-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-06-30 Thread Chen, Tiejun


On 2014/6/30 17:05, Michael S. Tsirkin wrote:

On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote:

On 2014/6/30 14:48, Michael S. Tsirkin wrote:

On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote:

On 2014/6/26 18:03, Paolo Bonzini wrote:

Il 26/06/2014 11:18, Chen, Tiejun ha scritto:




- offsets 0x..0x0fff map to configuration space of the host MCH



Are you saying the config space in the video device?


No, I am saying in a new BAR, or at some magic offset of an existing
MMIO BAR.



As I mentioned previously, the IGD guy told me we have no any unused a
offset or BAR in the config space.

And guy who are responsible for the native driver seems not be accept to
extend some magic offset of an existing MMIO BAR.

In addition I think in a short time its not possible to migrate i440fx to
q35 as a PCIe machine of xen.


That seems like a weak motivation.  I don't see a need to get something
merged upstream in a short time: this seems sure to miss 2.1,
so you have the time to make it architecturally sound.
Making existing guests work would be a better motivation.


Yes.


So focus on this then. Existing guests will probably work
fine on a newer chipset - likely better than on i440fx.
xen management tools need to do some work to support this?
That will just give everyone more long term benefits.

If instead we create a hack that does not resemble
any existing hardware even remotely, what's the
chance that it will not break with any future
guest modification?



Isn't this possible with an mch chipset?


If you're saying q35, I mean AFAIK we have no any plan to migrate to this
MCH in xen case.


q35 or a newer chipset that's closer to what guests expect.


Additionally, I think I should follow this feature after
q35 can work for xen scenario.


What's stopping you?


I mean if you want create an new machine based on q35, actually this is 
equal to start making xen to migrate to q35 now. Right? I can't image 
how much effort should be done.


So this is a reason why I'm saying I'd like to follow this feature after 
q35 can work with xen completely.








So could we do this step by step:

#1 phase: We just cover current qemu-xen implementation based on i44fx, so
still provide that pseudo ISA bridge at 00:1f.0 as we already did.


By the way there is no reason to put it at 00:1f.0 specifically I think.
So it seems simple: create a dummy device that gets device and
vendor id as properties. If you really like, add an option to get values


Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx
passthrough: create pseudo intel isa bridge. There, we fake this device just
at 00:1f.0.
But you guys don't like this, and shouldn't this be just this point we
discussing now?

If you guys agree that , everything is fine.


Actually, this isn't what you did.
Don't tie it to xen, and don't tie it to 1f.
Just make it a simple stub pci device.
Whoever wants it, creates it.

The thing I worry about, is the chance this will break going forward.
So you created a system with 2 isa bridges.


I don't set class type to claim this is an ISA bridge, just a normal PCI 
device.


+static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice 
*hdev)

+{
+struct PCIDevice *dev;
+
+char rid;
+
+/* We havt to use a simple PCI device to fake this ISA bridge
+ * to avoid making some confusion to BIOS and ACPI.
+ */
+dev = pci_create(bus, PCI_DEVFN(0x1f, 0), 
pseudo-intel-pch-isa-bridge);

+
+qdev_init_nofail(dev-qdev);
+
+pci_config_set_vendor_id(dev-config, hdev-vendor_id);
+pci_config_set_device_id(dev-config, hdev-device_id);
+
+xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)rid, 1);
+
+pci_config_set_revision(dev-config, rid);
+
+XEN_PT_LOG(dev, The pseudo Intel PCH ISA bridge created.\n);
+return 0;
+}


This is already not something that exists on real hardware.
So it might break some guests that will get confused.
Maybe we are lucky and most guests see an unfamiliar device
and ignore it. It seems believable.

But your MCH hack overrides registers in the pci host.


We just try to write *one* register we already confirm this is safe 
enough. Other register are read-only.



Are we lucky and there's nothing in these registers
of interest to guests? This seems much more fragile.
So please poke at the spec, and compile the list
of registers you want to touch, figure out why they are
safe to override, and put this all in code comments.

And the same thing that applies to the isa bridge


We just expose its own vendor/device ids here. We don't access to change 
anything in the isa bridge.



applies here too. It should work without QEMU touching
hosts' hardware.




from sysfs: device and vendor id are world readable, so just get them

directly and not through xen wrappers, this way you can open the files
RO and not RW.
You seem to poke at revision as well, I don't see
driver looking at that - strictly necessary?
If yes please

Re: [Qemu-devel] [PULL 01/15] block/iscsi: handle BUSY condition

2014-06-30 Thread Alexey Kardashevskiy

On 06/19/2014 02:03 AM, Paolo Bonzini wrote:
 From: Peter Lieven p...@kamp.de
 
 this patch adds handling of BUSY status reponse from an iSCSI target.
 Currently, we fail with -EIO in case of SCSI_STATUS_BUSY while the
 obvious reaction would be to retry the operation after some time.
 The retry time is randomly choosen from a range with exponential
 growth increasing with each retry.
 
 This patch includes most of the changes by a an upcoming patch
 from Stefan Hajnoczi:
 
  iscsi: implement .bdrv_detach/attach_aio_context()
 
 because I also need the reference to the aio_context for
 the retry timer to work. I included the changes to maintain
 better mergeability.
 
 Signed-off-by: Peter Lieven p...@kamp.de
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  block/iscsi.c | 55 +++
  1 file changed, 43 insertions(+), 12 deletions(-)
 
 diff --git a/block/iscsi.c b/block/iscsi.c
 index 6f87605..e64659f 100644
 --- a/block/iscsi.c
 +++ b/block/iscsi.c
 @@ -26,6 +26,7 @@
  #include config-host.h
  
  #include poll.h
 +#include math.h
  #include arpa/inet.h
  #include qemu-common.h
  #include qemu/config-file.h
 @@ -75,6 +76,7 @@ typedef struct IscsiTask {
  Coroutine *co;
  QEMUBH *bh;
  IscsiLun *iscsilun;
 +QEMUTimer retry_timer;
  } IscsiTask;
  
  typedef struct IscsiAIOCB {
 @@ -86,7 +88,6 @@ typedef struct IscsiAIOCB {
  uint8_t *buf;
  int status;
  int canceled;
 -int retries;
  int64_t sector_num;
  int nb_sectors;
  #ifdef __linux__
 @@ -96,7 +97,8 @@ typedef struct IscsiAIOCB {
  
  #define NOP_INTERVAL 5000
  #define MAX_NOP_FAILURES 3
 -#define ISCSI_CMD_RETRIES 5
 +#define ISCSI_CMD_RETRIES ARRAY_SIZE(iscsi_retry_times)
 +static const unsigned iscsi_retry_times[] = {8, 32, 128, 512, 2048};
  
  /* this threshhold is a trade-off knob to choose between
   * the potential additional overhead of an extra GET_LBA_STATUS request
 @@ -146,6 +148,19 @@ static void iscsi_co_generic_bh_cb(void *opaque)
  qemu_coroutine_enter(iTask-co, NULL);
  }
  
 +static void iscsi_retry_timer_expired(void *opaque)
 +{
 +struct IscsiTask *iTask = opaque;
 +if (iTask-co) {
 +qemu_coroutine_enter(iTask-co, NULL);
 +}
 +}
 +
 +static inline unsigned exp_random(double mean)
 +{
 +return -mean * log((double)rand() / RAND_MAX);


This new use of log() breaks linkage on Fedora20/ppc64, qemu-nbd fails (but
qemu-system-ppc64 still links find). Adding --extra-cflags=-lm to
./configure invocation helps.

How to fix? Seems like fc20 packages do not have some dependency which
qemu-nbd relies on. Thanks.


[aik@vpl2 qemu]$ make  V=1
cc -DHAS_LIBSSH2_SFTP_FSYNC -m64 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64
-D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef
-Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common
-Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs
-Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers
-Wold-style-declaration -Wold-style-definition -Wtype-limits
-fstack-protector-strong -I/usr/include/p11-kit-1  -I/usr/include/p11-kit-1
   -I/usr/include/libpng16  -I/usr/include/libusb-1.0
-I/usr/include/pixman-1   -I/home/aik/p/qemu/tests -pthread
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include  -g
-Wl,--warn-common -m64 -g  -o qemu-nbd qemu-nbd.o async.o thread-pool.o
nbd.o block.o blockjob.o main-loop.o iohandler.o qemu-timer.o aio-posix.o
qapi-types.o qapi-visit.o qemu-io-cmds.o qemu-coroutine.o
qemu-coroutine-lock.o qemu-coroutine-io.o qemu-coroutine-sleep.o
coroutine-ucontext.o block/raw_bsd.o block/cow.o block/qcow.o block/vdi.o
block/vmdk.o block/cloop.o block/dmg.o block/bochs.o block/vpc.o
block/vvfat.o block/qcow2.o block/qcow2-refcount.o block/qcow2-cluster.o
block/qcow2-snapshot.o block/qcow2-cache.o block/qed.o block/qed-gencb.o
block/qed-l2-cache.o block/qed-table.o block/qed-cluster.o
block/qed-check.o block/vhdx.o block/vhdx-endian.o block/vhdx-log.o
block/quorum.o block/parallels.o block/blkdebug.o block/blkverify.o
block/snapshot.o block/qapi.o block/raw-posix.o block/linux-aio.o
block/nbd.o block/nbd-client.o block/sheepdog.o block/iscsi.o block/curl.o
block/rbd.o block/gluster.o block/ssh.o libqemuutil.a libqemustub.a   -lz
-laio -L/usr/lib64/iscsi -liscsi -lcurl -lrbd -lrados -lgfapi -lglusterfs
-lgfrpc -lgfxdr -lssh2 -lgthread-2.0 -pthread -lglib-2.0   -lz -lrt -lz
-luuid -lgnutls  -lgnutls  -lgnutls  -lSDL -lpthread  -lX11 -lvte2_90
-lgtk-3 -lgdk-3 -lpangocairo-1.0 -lpango-1.0 -latk-1.0 -lcairo-gobject
-lgdk_pixbuf-2.0 -lgio-2.0 -lgobject-2.0 -lglib-2.0 -lcairo -lX11 -lXext
-lgtk-3 -lgdk-3 -lpangocairo-1.0 -lpango-1.0 -latk-1.0 -lcairo-gobject
-lcairo -lgdk_pixbuf-2.0 -lgio-2.0 -lgobject-2.0 -lglib-2.0  -lX11
-lrdmacm -libverbs
/usr/bin/ld: block/iscsi.o: undefined reference to symbol 'log@@GLIBC_2.3'
/usr/bin/ld: note: 'log@@GLIBC_2.3' is defined in DSO /lib64/libm.so.6 so
try adding it to the linker command

Re: [Qemu-devel] [PATCH v2] qtest: enable vhost-user-test

2014-06-30 Thread Michael S. Tsirkin

On Thu, Jun 19, 2014 at 09:24:09PM +0300, Michael S. Tsirkin wrote:
 On Thu, Jun 19, 2014 at 08:35:42PM +0300, Nikolay Nikolaev wrote:
  Use qtest-obj-y to get the right library order. CONFIG_POSIX ensures
  mingw compilation won't break.
  
  Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
 
 okay but why does non posix work without -lutil and posix doesn't?

Ping.


  ---
   0 files changed
  
  diff --git a/tests/Makefile b/tests/Makefile
  index 4caf7de..5661d52 100644
  --- a/tests/Makefile
  +++ b/tests/Makefile
  @@ -156,7 +156,7 @@ gcov-files-i386-y += hw/usb/hcd-ehci.c
   gcov-files-i386-y += hw/usb/hcd-uhci.c
   gcov-files-i386-y += hw/usb/dev-hid.c
   gcov-files-i386-y += hw/usb/dev-storage.c
  -#check-qtest-i386-y += tests/vhost-user-test$(EXESUF)
  +check-qtest-i386-$(CONFIG_POSIX) += tests/vhost-user-test$(EXESUF)
   check-qtest-x86_64-y = $(check-qtest-i386-y)
   gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c
   gcov-files-x86_64-y = $(subst 
  i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y))
  @@ -323,11 +323,14 @@ tests/es1370-test$(EXESUF): tests/es1370-test.o
   tests/intel-hda-test$(EXESUF): tests/intel-hda-test.o
   tests/ioh3420-test$(EXESUF): tests/ioh3420-test.o
   tests/usb-hcd-ehci-test$(EXESUF): tests/usb-hcd-ehci-test.o 
  $(libqos-pc-obj-y)
  -tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o 
  qemu-timer.o libqemuutil.a libqemustub.a
  +tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o 
  qemu-timer.o $(qtest-obj-y)
   tests/qemu-iotests/socket_scm_helper$(EXESUF): 
  tests/qemu-iotests/socket_scm_helper.o
   tests/test-qemu-opts$(EXESUF): tests/test-qemu-opts.o libqemuutil.a 
  libqemustub.a
   
  -#LIBS+= -lutil
  +
  +ifeq ($(CONFIG_POSIX),y)
  +LIBS+= -lutil
  +endif
   
   # QTest rules

Re: [Qemu-devel] [PATCH v3 1/2] block: Do not prematurely remove filename

2014-06-30 Thread Kevin Wolf

Am 26.06.2014 um 23:38 hat Max Reitz geschrieben:
 If filename is removed from the options QDict before entering
 bdrv_open_common(), it cannot be stored in the BDS.  Therefore, wait
 until it has been copied there and remove it from the options only
 afterwards.
 
 This fixes filename in the BDS being empty for block drivers which do
 not need the filename because they have parsed it already (e.g. NBD).
 
 Signed-off-by: Max Reitz mre...@redhat.com

I can't say I like this approach. It looks a bit odd to pass a boolean
variable to bdrv_open(), and in some other function called from there
the cleanup is done that logically really belong to bdrv_fill_options().

More importantly, the goal was to get rid of the filename and handle
everything through the options so that we get a uniform state again.
This would involve replacing bs-filename by a new callback function in
BlockDriver that constructs a filename that describes the BDS. This way
we would get useful output not only for nbd:localhost:10809, but also
for driver=nbd,host=localhost.

In hard cases, the callback might just use json:{...} syntax. This
suggests that maybe in the end we'll want to have two different
callbacks, one giving a short human-readable description
('localhost:10809') and another one giving something that can be used on
the command line ('json:{driver: nbd, host: localhost, ipv6:
true}').

Kevin

[Qemu-devel] [PATCH 0/3] block: introduce submit I/O at batch

2014-06-30 Thread Ming Lei

Hi,

The commit 580b6b2aa2(dataplane: use the QEMU block layer for I/O)
introduces ~40% throughput regression on virtio-blk dataplane, and
one of causes is that submitting I/O at batch is removed.

This patchset trys to introduce this mechanism on block, at least,
linux-aio can benefit from that.

With these patches, it is observed that thoughout on virtio-blk
dataplane can be improved a lot, see data in commit log of patch
3/3.

It should be possible to apply the batch mechanism to other devices
(such as virtio-scsi) too.

Thanks,
--
Ming Lei

[Qemu-devel] [PATCH 1/3] block: introduce IO queue APIs

2014-06-30 Thread Ming Lei

This patch introduces IO queue related APIs so that following
patches can support queuing I/O requests and submitting them
at batch for improving I/O performance.

Signed-off-by: Ming Lei ming@canonical.com
---
 aio-posix.c   |   13 
 block.c   |   78 +
 include/block/aio.h   |   12 +++
 include/block/block.h |6 
 include/block/block_int.h |3 ++
 5 files changed, 112 insertions(+)

diff --git a/aio-posix.c b/aio-posix.c
index f921d4f..3e065c1 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -240,3 +240,16 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 return progress;
 }
+
+/* IO queue support */
+void aio_init_io_queue(AioContext *ctx, unsigned int size)
+{
+ctx-io_queue.iocbs = g_new0(void *, size);
+ctx-io_queue.size = size;
+ctx-io_queue.idx = 0;
+}
+
+void aio_exit_io_queue(AioContext *ctx)
+{
+g_free(ctx-io_queue.iocbs);
+}
diff --git a/block.c b/block.c
index 217f523..c4a6f8b 100644
--- a/block.c
+++ b/block.c
@@ -1910,6 +1910,7 @@ void bdrv_drain_all(void)
 bool bs_busy;
 
 aio_context_acquire(aio_context);
+bdrv_io_unplug(bs);
 bdrv_start_throttled_reqs(bs);
 bs_busy = bdrv_requests_pending(bs);
 bs_busy |= aio_poll(aio_context, bs_busy);
@@ -5774,3 +5775,80 @@ bool bdrv_is_first_non_filter(BlockDriverState 
*candidate)
 
 return false;
 }
+
+/* IO queue APIs */
+void bdrv_init_io_queue(BlockDriverState *bs, unsigned int size)
+{
+aio_init_io_queue(bs-aio_context, size);
+}
+
+void bdrv_exit_io_queue(BlockDriverState *bs)
+{
+aio_exit_io_queue(bs-aio_context);
+}
+
+/* We think one block driver supports feature of io queue if
+ * they implement callback of .bdrv_submit_io_queue
+ */
+static bool __bdrv_support_io_queue(BlockDriverState *bs)
+{
+if (bs  bs-drv  bs-drv-bdrv_submit_io_queue)
+return true;
+return false;
+}
+bool bdrv_support_io_queue(BlockDriverState *bs)
+{
+if (__bdrv_support_io_queue(bs-file))
+return true;
+if (__bdrv_support_io_queue(bs-backing_hd))
+return true;
+return false;
+}
+
+static void __bdrv_io_unplug(BlockDriverState *bs)
+{
+if (!bdrv_support_io_queue(bs))
+return;
+
+if (!bs-aio_context-io_queue.idx)
+return;
+
+if (__bdrv_support_io_queue(bs-file))
+bs-file-drv-bdrv_submit_io_queue(bs-file);
+if (__bdrv_support_io_queue(bs-backing_hd))
+bs-backing_hd-drv-bdrv_submit_io_queue(bs-backing_hd);
+bs-aio_context-io_queue.idx = 0;
+}
+
+/* BlockDriver may call this function for queuing current IO request
+ * represented by iocb to io_queue, and will submit them at batch.
+ */
+void bdrv_queue_io(BlockDriverState *bs, void *iocb)
+{
+unsigned int idx = bs-aio_context-io_queue.idx;
+
+bs-aio_context-io_queue.iocbs[idx++] = iocb;
+bs-aio_context-io_queue.idx = idx;
+
+/* unplug immediately if queue is full */
+if (idx == bs-aio_context-io_queue.size)
+__bdrv_io_unplug(bs);
+}
+
+/* Block device calls this function to let driver queue IO request
+ * and submit them at batch later, but it is just a hint because
+ * block driver may not support the feature.
+ */
+void bdrv_io_plug(BlockDriverState *bs)
+{
+bs-aio_context-io_plugged = true;
+}
+
+/* Block device calls this function to ask driver to submit
+ * all requests in io queue immediatelly.
+ */
+void bdrv_io_unplug(BlockDriverState *bs)
+{
+__bdrv_io_unplug(bs);
+bs-aio_context-io_plugged = false;
+}
diff --git a/include/block/aio.h b/include/block/aio.h
index a92511b..b7fdcfb 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -45,6 +45,12 @@ typedef struct AioHandler AioHandler;
 typedef void QEMUBHFunc(void *opaque);
 typedef void IOHandler(void *opaque);
 
+typedef struct AioIOQueue {
+void **iocbs;
+unsigned int size;
+unsigned int idx;
+} AioIOQueue;
+
 struct AioContext {
 GSource source;
 
@@ -81,6 +87,10 @@ struct AioContext {
 
 /* TimerLists for calling timers - one per clock type */
 QEMUTimerListGroup tlg;
+
+/* IOQueue support */
+AioIOQueue io_queue;
+bool io_plugged;
 };
 
 /**
@@ -307,4 +317,6 @@ static inline void aio_timer_init(AioContext *ctx,
 timer_init(ts, ctx-tlg.tl[type], scale, cb, opaque);
 }
 
+void aio_init_io_queue(AioContext *ctx, unsigned int size);
+void aio_exit_io_queue(AioContext *ctx);
 #endif
diff --git a/include/block/block.h b/include/block/block.h
index d0baf4f..702b79a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -578,4 +578,10 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs);
  */
 void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);
 
+void bdrv_init_io_queue(BlockDriverState *bs, unsigned int size);
+void bdrv_exit_io_queue(BlockDriverState *bs);
+bool bdrv_support_io_queue(BlockDriverState *bs);
+void bdrv_queue_io(BlockDriverState

[Qemu-devel] [PATCH 2/3] block: linux-aio: support submit io_queue

2014-06-30 Thread Ming Lei

This patch implements .bdrv_submit_io_queue callback
for linux-aio Block Drivers, so that submitting I/O
at batch can be supported on linux-aio.

Signed-off-by: Ming Lei ming@canonical.com
---
 block/linux-aio.c |   32 ++--
 block/raw-aio.h   |1 +
 block/raw-posix.c |   16 
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index f0a2c08..12f56d8 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -11,6 +11,7 @@
 #include block/aio.h
 #include qemu/queue.h
 #include block/raw-aio.h
+#include block/block_int.h
 #include qemu/event_notifier.h
 
 #include libaio.h
@@ -135,6 +136,29 @@ static const AIOCBInfo laio_aiocb_info = {
 .cancel = laio_cancel,
 };
 
+int laio_submit_io_queue(BlockDriverState *bs, void *aio_ctx)
+{
+struct qemu_laio_state *s = aio_ctx;
+int ret, i;
+
+while (1) {
+ret = io_submit(s-ctx, bs-aio_context-io_queue.idx,
+ (struct iocb **)bs-aio_context-io_queue.iocbs);
+if (ret != -EAGAIN)
+break;
+}
+
+if (ret = 0)
+  return 0;
+
+for (i = 0; i  bs-aio_context-io_queue.idx; i++) {
+struct qemu_laiocb *laiocb =
+container_of(bs-aio_context-io_queue.iocbs[i], struct 
qemu_laiocb, iocb);
+qemu_aio_release(laiocb);
+}
+return ret;
+}
+
 BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockDriverCompletionFunc *cb, void *opaque, int type)
@@ -168,8 +192,12 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void 
*aio_ctx, int fd,
 }
 io_set_eventfd(laiocb-iocb, event_notifier_get_fd(s-e));
 
-if (io_submit(s-ctx, 1, iocbs)  0)
-goto out_free_aiocb;
+if (!bs-aio_context-io_plugged) {
+if (io_submit(s-ctx, 1, iocbs)  0)
+goto out_free_aiocb;
+} else {
+bdrv_queue_io(bs, iocbs);
+}
 return laiocb-common;
 
 out_free_aiocb:
diff --git a/block/raw-aio.h b/block/raw-aio.h
index 8cf084e..89fa775 100644
--- a/block/raw-aio.h
+++ b/block/raw-aio.h
@@ -40,6 +40,7 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void 
*aio_ctx, int fd,
 BlockDriverCompletionFunc *cb, void *opaque, int type);
 void laio_detach_aio_context(void *s, AioContext *old_context);
 void laio_attach_aio_context(void *s, AioContext *new_context);
+int laio_submit_io_queue(BlockDriverState *bs, void *aio_ctx);
 #endif
 
 #ifdef _WIN32
diff --git a/block/raw-posix.c b/block/raw-posix.c
index dacf4fb..5869e8e 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1054,6 +1054,17 @@ static BlockDriverAIOCB *raw_aio_submit(BlockDriverState 
*bs,
cb, opaque, type);
 }
 
+static int raw_aio_submit_io_queue(BlockDriverState *bs)
+{
+BDRVRawState *s = bs-opaque;
+
+#ifdef CONFIG_LINUX_AIO
+if (s-use_aio)
+return laio_submit_io_queue(bs, s-aio_ctx);
+#endif
+return 0;
+}
+
 static BlockDriverAIOCB *raw_aio_readv(BlockDriverState *bs,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockDriverCompletionFunc *cb, void *opaque)
@@ -1503,6 +1514,7 @@ static BlockDriver bdrv_file = {
 .bdrv_aio_flush = raw_aio_flush,
 .bdrv_aio_discard = raw_aio_discard,
 .bdrv_refresh_limits = raw_refresh_limits,
+.bdrv_submit_io_queue = raw_aio_submit_io_queue,
 
 .bdrv_truncate = raw_truncate,
 .bdrv_getlength = raw_getlength,
@@ -1902,6 +1914,7 @@ static BlockDriver bdrv_host_device = {
 .bdrv_aio_flush= raw_aio_flush,
 .bdrv_aio_discard   = hdev_aio_discard,
 .bdrv_refresh_limits = raw_refresh_limits,
+.bdrv_submit_io_queue = raw_aio_submit_io_queue,
 
 .bdrv_truncate  = raw_truncate,
 .bdrv_getlength= raw_getlength,
@@ -2047,6 +2060,7 @@ static BlockDriver bdrv_host_floppy = {
 .bdrv_aio_writev= raw_aio_writev,
 .bdrv_aio_flush= raw_aio_flush,
 .bdrv_refresh_limits = raw_refresh_limits,
+.bdrv_submit_io_queue = raw_aio_submit_io_queue,
 
 .bdrv_truncate  = raw_truncate,
 .bdrv_getlength  = raw_getlength,
@@ -2175,6 +2189,7 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_aio_writev= raw_aio_writev,
 .bdrv_aio_flush= raw_aio_flush,
 .bdrv_refresh_limits = raw_refresh_limits,
+.bdrv_submit_io_queue = raw_aio_submit_io_queue,
 
 .bdrv_truncate  = raw_truncate,
 .bdrv_getlength  = raw_getlength,
@@ -2309,6 +2324,7 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_aio_writev= raw_aio_writev,
 .bdrv_aio_flush= raw_aio_flush,
 .bdrv_refresh_limits = raw_refresh_limits,
+.bdrv_submit_io_queue = raw_aio_submit_io_queue,
 
 .bdrv_truncate  = raw_truncate,
 .bdrv_getlength  = raw_getlength,
-- 
1.7.9.5

[Qemu-devel] [PATCH 3/3] dataplane: submit I/O at batch

2014-06-30 Thread Ming Lei

Before commit 580b6b2aa2(dataplane: use the Qemu block
layer for I/O), dataplane for virtio-blk submits block
I/O at batch.

This commit 580b6b2aa2 replaces the custom linux AIO
implementation(including I/O batch) with Qemu block
layer, but this commit causes ~40% throughput regression
on virtio-blk performance, and removing submitting I/O
at batch is one of the cause.

This patch applys the new introduced bdrv_io_plug() and
bdrv_io_unplug() interfaces to support submitting I/O
at batch for Qemu block layer, and in my test, the change
can improve thoughput by ~30% with 'aio=native'.

Following my fio test script:

[global]
direct=1
size=4G
bsrange=4k-4k
timeout=40
numjobs=4
ioengine=libaio
iodepth=64
filename=/dev/vdc
group_reporting=1

[f]
rw=randread

Result on one of my small machine(host: x86_64, 2cores, 4thread, guest: 4cores):
- qemu master: 62K IOPS
- qemu master with these patches: 84K IOPS
- 2.0.0 release(dataplane using custom linux aio): 97K IOPS

Signed-off-by: Ming Lei ming@canonical.com
---
 hw/block/dataplane/virtio-blk.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index c10b7b7..5deaddc 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -289,6 +289,7 @@ static void handle_notify(EventNotifier *e)
 int ret;
 
 event_notifier_test_and_clear(s-host_notifier);
+bdrv_io_plug(s-blk-conf.bs);
 for (;;) {
 /* Disable guest-host notifies to avoid unnecessary vmexits */
 vring_disable_notification(s-vdev, s-vring);
@@ -322,6 +323,7 @@ static void handle_notify(EventNotifier *e)
 break;
 }
 }
+bdrv_io_unplug(s-blk-conf.bs);
 }
 
 /* Context: QEMU global mutex held */
@@ -431,6 +433,7 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
 trace_virtio_blk_data_plane_start(s);
 
 bdrv_set_aio_context(s-blk-conf.bs, s-ctx);
+bdrv_init_io_queue(s-blk-conf.bs, 128);
 
 /* Kick right away to begin processing requests already in vring */
 event_notifier_set(virtio_queue_get_host_notifier(vq));
@@ -462,6 +465,8 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 
 aio_context_release(s-ctx);
 
+bdrv_exit_io_queue(s-blk-conf.bs);
+
 /* Sync vring state back to virtqueue so that non-dataplane request
  * processing can continue when we disable the host notifier below.
  */
-- 
1.7.9.5

Re: [Qemu-devel] [PATCH v8 00/14] qemu-img: Implement commit like QMP

2014-06-30 Thread Kevin Wolf

Am 28.06.2014 um 00:07 hat Max Reitz geschrieben:
 On 07.06.2014 20:51, Max Reitz wrote:
 qemu-img should use QMP commands whenever possible in order to ensure
 feature completeness of both online and offline image operations. For
 the commit command, this is relatively easy, so implement it first
 (in the hope that indeed others will follow).
 
 As qemu-img does not have access to QMP (due to QMP being intertwined
 with basically everything in qemu), we cannot directly use QMP, but at
 least use the functions the corresponding QMP commands are using (which
 would be block-commit, in this case).
 
 
 With Stefan's pull request for his dataplane series now out, I thought
 this a good opportunity to send a rebase of this series.
 
 Ping; Hu Tao will need minimal_blob_size() from patch 3 for the
 next iteration of his qemu-img: add preallocation=full series.
 Sending an own patch just for that function seems infeasible, as it
 is a static function which would be unused in the meantime (which
 throws a compiler warning and an error thanks to -Werror). Using
 __attribute__((unused)) just for this seems like a hack; especially
 considering that all patches of this series have been reviewed and
 it should therefore be ready to merge.
 
 In case there are some objections because you want to test it more,
 it would be fine to merge the first three patches (which suffice for
 the preallocation series and should only introduce unused codepaths)
 now and the rest later on.

This series needs rebasing from patch 5 on. I'll review patches 1-3 so
that the dependency for the other series is there.

Kevin

Re: [Qemu-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-06-30 Thread Michael S. Tsirkin

On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote:
 On 2014/6/30 17:05, Michael S. Tsirkin wrote:
 On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote:
 On 2014/6/30 14:48, Michael S. Tsirkin wrote:
 On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote:
 On 2014/6/26 18:03, Paolo Bonzini wrote:
 Il 26/06/2014 11:18, Chen, Tiejun ha scritto:
 
 
 - offsets 0x..0x0fff map to configuration space of the host MCH
 
 
 Are you saying the config space in the video device?
 
 No, I am saying in a new BAR, or at some magic offset of an existing
 MMIO BAR.
 
 
 As I mentioned previously, the IGD guy told me we have no any unused a
 offset or BAR in the config space.
 
 And guy who are responsible for the native driver seems not be accept to
 extend some magic offset of an existing MMIO BAR.
 
 In addition I think in a short time its not possible to migrate i440fx to
 q35 as a PCIe machine of xen.
 
 That seems like a weak motivation.  I don't see a need to get something
 merged upstream in a short time: this seems sure to miss 2.1,
 so you have the time to make it architecturally sound.
 Making existing guests work would be a better motivation.
 
 Yes.
 
 So focus on this then. Existing guests will probably work
 fine on a newer chipset - likely better than on i440fx.
 xen management tools need to do some work to support this?
 That will just give everyone more long term benefits.
 
 If instead we create a hack that does not resemble
 any existing hardware even remotely, what's the
 chance that it will not break with any future
 guest modification?
 
 
 Isn't this possible with an mch chipset?
 
 If you're saying q35, I mean AFAIK we have no any plan to migrate to this
 MCH in xen case.
 
 q35 or a newer chipset that's closer to what guests expect.
 
 Additionally, I think I should follow this feature after
 q35 can work for xen scenario.
 
 What's stopping you?
 
 I mean if you want create an new machine based on q35, actually this is
 equal to start making xen to migrate to q35 now. Right? I can't image how
 much effort should be done.

I don't see why you don't try. Sounds like a more robust solution to me.

 So this is a reason why I'm saying I'd like to follow this feature after q35
 can work with xen completely.

Then we'll end up with more configurations to support, and to what end?

 
 
 
 So could we do this step by step:
 
 #1 phase: We just cover current qemu-xen implementation based on i44fx, so
 still provide that pseudo ISA bridge at 00:1f.0 as we already did.
 
 By the way there is no reason to put it at 00:1f.0 specifically I think.
 So it seems simple: create a dummy device that gets device and
 vendor id as properties. If you really like, add an option to get values
 
 Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx
 passthrough: create pseudo intel isa bridge. There, we fake this device just
 at 00:1f.0.
 But you guys don't like this, and shouldn't this be just this point we
 discussing now?
 
 If you guys agree that , everything is fine.
 
 Actually, this isn't what you did.
 Don't tie it to xen, and don't tie it to 1f.
 Just make it a simple stub pci device.
 Whoever wants it, creates it.
 
 The thing I worry about, is the chance this will break going forward.
 So you created a system with 2 isa bridges.
 
 I don't set class type to claim this is an ISA bridge, just a normal PCI
 device.
 +static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice
 *hdev)
 +{
 +struct PCIDevice *dev;
 +
 +char rid;
 +
 +/* We havt to use a simple PCI device to fake this ISA bridge
 + * to avoid making some confusion to BIOS and ACPI.
 + */
 +dev = pci_create(bus, PCI_DEVFN(0x1f, 0),
 pseudo-intel-pch-isa-bridge);
 +
 +qdev_init_nofail(dev-qdev);
 +
 +pci_config_set_vendor_id(dev-config, hdev-vendor_id);
 +pci_config_set_device_id(dev-config, hdev-device_id);
 +
 +xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)rid, 1);
 +
 +pci_config_set_revision(dev-config, rid);
 +
 +XEN_PT_LOG(dev, The pseudo Intel PCH ISA bridge created.\n);
 +return 0;
 +}

Then I don't see how it's supposed to work.
Doesn't i915 look for an isa bridge?

/*
 * The reason to probe ISA bridge instead of Dev31:Fun0 is to
 * make graphics device passthrough work easy for VMM, that only
 * need to expose ISA bridge to let driver know the real hardware
 * underneath. This is a requirement from virtualization team.
 *
 * In some virtualized environments (e.g. XEN), there is irrelevant
 * ISA bridge in the system. To work reliably, we should scan trhough
 * all the ISA bridge devices and check for the first match, instead
 * of only checking the first one.
 */
while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA  8, pch))) {
if (pch-vendor == PCI_VENDOR_ID_INTEL) {
unsigned short id =

Re: [Qemu-devel] [SeaBIOS] [PATCH v3] hw/pci: reserve IO and mem for pci express downstream ports with no devices attached

2014-06-30 Thread Gerd Hoffmann

On Mo, 2014-06-23 at 18:55 +0300, Michael S. Tsirkin wrote:
 On Mon, Jun 23, 2014 at 06:29:51PM +0300, Marcel Apfelbaum wrote:
  Commit c6e298e1f12e0f4ca02b6da5e42919ae055f6830
  hw/pci: reserve IO and mem for pci-2-pci bridges with no devices 
  attached
  
  introduced support for hot-plugging devices behind pci-2-pci bridges.
  Extend hotplug support also for pci express downstream ports.
  
  Signed-off-by: Marcel Apfelbaum marce...@redhat.com
 
 
 Reviewed-by: Michael S. Tsirkin m...@redhat.com
 
 Gerd, is there a plan to do a release for QEMU 2.1?
 It would be nice to have this patch there.

It's in already (seabios 1.7.5 release  qemu git master).

cheers,
  Gerd

[Qemu-devel] [PATCH v3] tests: Functions bus_foreach and device_find from libqos virtio API

2014-06-30 Thread Marc Marí

Virtio header has been changed to compile and work with a real device.
Functions bus_foreach and device_find have been implemented for PCI.
Virtio-blk test case now opens a fake device.

Signed-off-by: Marc Marí marc.mari.barc...@gmail.com
---
 tests/Makefile|3 +-
 tests/libqos/virtio-pci.c |   72 +
 tests/libqos/virtio-pci.h |   31 +++
 tests/libqos/virtio.h |   28 ++
 tests/virtio-blk-test.c   |   65 +++-
 5 files changed, 190 insertions(+), 9 deletions(-)
 create mode 100644 tests/libqos/virtio-pci.c
 create mode 100644 tests/libqos/virtio-pci.h
 create mode 100644 tests/libqos/virtio.h

diff --git a/tests/Makefile b/tests/Makefile
index 7e53d0d..028c462 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -292,6 +292,7 @@ libqos-obj-y += tests/libqos/i2c.o
 libqos-pc-obj-y = $(libqos-obj-y) tests/libqos/pci-pc.o
 libqos-pc-obj-y += tests/libqos/malloc-pc.o
 libqos-omap-obj-y = $(libqos-obj-y) tests/libqos/i2c-omap.o
+libqos-virtio-obj-y = $(libqos-obj-y) $(libqos-pc-obj-y) 
tests/libqos/virtio-pci.o
 
 tests/rtc-test$(EXESUF): tests/rtc-test.o
 tests/m48t59-test$(EXESUF): tests/m48t59-test.o
@@ -312,7 +313,7 @@ tests/eepro100-test$(EXESUF): tests/eepro100-test.o
 tests/vmxnet3-test$(EXESUF): tests/vmxnet3-test.o
 tests/ne2000-test$(EXESUF): tests/ne2000-test.o
 tests/virtio-balloon-test$(EXESUF): tests/virtio-balloon-test.o
-tests/virtio-blk-test$(EXESUF): tests/virtio-blk-test.o
+tests/virtio-blk-test$(EXESUF): tests/virtio-blk-test.o $(libqos-virtio-obj-y)
 tests/virtio-net-test$(EXESUF): tests/virtio-net-test.o
 tests/virtio-rng-test$(EXESUF): tests/virtio-rng-test.o
 tests/virtio-scsi-test$(EXESUF): tests/virtio-scsi-test.o
diff --git a/tests/libqos/virtio-pci.c b/tests/libqos/virtio-pci.c
new file mode 100644
index 000..9fd9735
--- /dev/null
+++ b/tests/libqos/virtio-pci.c
@@ -0,0 +1,72 @@
+/*
+ * libqos virtio PCI driver
+ *
+ * Copyright (c) 2014 Marc Marí
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include glib.h
+#include stdio.h
+#include libqtest.h
+#include libqos/virtio.h
+#include libqos/virtio-pci.h
+#include libqos/pci.h
+#include libqos/pci-pc.h
+
+#include hw/pci/pci_regs.h
+
+static QVirtioPCIDevice *qpcidevice_to_qvirtiodevice(QPCIDevice *pdev)
+{
+QVirtioPCIDevice *vpcidev;
+vpcidev = g_malloc0(sizeof(*vpcidev));
+
+if (pdev) {
+vpcidev-pdev = pdev;
+vpcidev-vdev.device_type =
+qpci_config_readw(vpcidev-pdev, PCI_SUBSYSTEM_ID);
+}
+
+return vpcidev;
+}
+
+static void qvirtio_pci_foreach_callback(
+QPCIDevice *dev, int devfn, void *data)
+{
+QVirtioPCIForeachData *d = data;
+QVirtioPCIDevice *vpcidev = qpcidevice_to_qvirtiodevice(dev);
+
+if (vpcidev-vdev.device_type == d-device_type) {
+d-func(vpcidev-vdev, d-user_data);
+}
+}
+
+static void qvirtio_pci_assign_device(QVirtioDevice *d, void *data)
+{
+QVirtioPCIDevice *vpcidev = data;
+vpcidev-pdev   = ((QVirtioPCIDevice *)d)-pdev;
+vpcidev-vdev.device_type   = ((QVirtioPCIDevice *)d)-vdev.device_type;
+}
+
+void qvirtio_pci_foreach(QPCIBus *bus, uint16_t device_type,
+void (*func)(QVirtioDevice *d, void *data), void *data)
+{
+QVirtioPCIForeachData d = { .func = func,
+.device_type = device_type,
+.user_data = data };
+
+qpci_device_foreach(bus, QVIRTIO_VENDOR_ID, -1,
+qvirtio_pci_foreach_callback, d);
+}
+
+QVirtioPCIDevice *qvirtio_pci_device_find(QPCIBus *bus, uint16_t device_type)
+{
+QVirtioPCIDevice *dev;
+
+dev = g_malloc0(sizeof(*dev));
+qvirtio_pci_foreach(bus, device_type, qvirtio_pci_assign_device, dev);
+
+return dev;
+}
+
diff --git a/tests/libqos/virtio-pci.h b/tests/libqos/virtio-pci.h
new file mode 100644
index 000..f00a982
--- /dev/null
+++ b/tests/libqos/virtio-pci.h
@@ -0,0 +1,31 @@
+/*
+ * libqos virtio PCI definitions
+ *
+ * Copyright (c) 2014 Marc Marí
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef LIBQOS_VIRTIO_PCI_H
+#define LIBQOS_VIRTIO_PCI_H
+
+#include libqos/virtio.h
+#include libqos/pci.h
+
+typedef struct QVirtioPCIDevice {
+QVirtioDevice vdev;
+QPCIDevice *pdev;
+} QVirtioPCIDevice;
+
+typedef struct QVirtioPCIForeachData {
+void (*func)(QVirtioDevice *d, void *data);
+uint16_t device_type;
+void *user_data;
+} QVirtioPCIForeachData;
+
+void qvirtio_pci_foreach(QPCIBus *bus, uint16_t device_type,
+void (*func)(QVirtioDevice *d, void *data), void *data);
+
+QVirtioPCIDevice *qvirtio_pci_device_find(QPCIBus *bus, uint16_t device_type);
+#endif
diff --git

Re: [Qemu-devel] [PATCH] hw/virtio: enable common virtio feature for mmio device

2014-06-30 Thread Ming Lei

Hi Guys,

On Wed, Jun 4, 2014 at 10:05 AM, Ming Lei ming@canonical.com wrote:
 Both 'indirect_desc' and 'event_idx' are bus independent features,
 and they should be enabled for mmio devices too.

 On arm64 quad core VM(qemu-kvm), the patch can increase block I/O
 performance a lot with latest linux tree:
 - without the patch: 14K IOPS
 - with the patch: 34K IOPS

 fio script:
 [global]
 direct=1
 bsrange=4k-4k
 timeout=10
 numjobs=4
 ioengine=libaio
 iodepth=64

 filename=/dev/vdc
 group_reporting=1

 [f1]
 rw=randread

 Cc: Peter Maydell peter.mayd...@linaro.org
 Signed-off-by: Ming Lei ming@canonical.com
 ---
  hw/virtio/virtio-mmio.c |6 ++
  1 file changed, 6 insertions(+)

 diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
 index 8829eb0..18c6e5b 100644
 --- a/hw/virtio/virtio-mmio.c
 +++ b/hw/virtio/virtio-mmio.c
 @@ -369,10 +369,16 @@ static void virtio_mmio_realizefn(DeviceState *d, Error 
 **errp)
  sysbus_init_mmio(sbd, proxy-iomem);
  }

 +static Property virtio_mmio_properties[] = {
 +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOMMIOProxy, host_features),
 +DEFINE_PROP_END_OF_LIST(),
 +};
 +
  static void virtio_mmio_class_init(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);

 +dc-props = virtio_mmio_properties;
  dc-realize = virtio_mmio_realizefn;
  dc-reset = virtio_mmio_reset;
  set_bit(DEVICE_CATEGORY_MISC, dc-categories);
 --

Gentle ping...

Thanks,
--
Ming Lei

Re: [Qemu-devel] [PATCH v8 01/14] qcow2: Allow full discard

2014-06-30 Thread Kevin Wolf

Am 07.06.2014 um 20:51 hat Max Reitz geschrieben:
 Normally, discarded sectors should read back as zero. However, there are
 cases in which a sector (or rather cluster) should be discarded as if
 they were never written in the first place, that is, reading them should
 fall through to the backing file again.
 
 Signed-off-by: Max Reitz mre...@redhat.com
 Reviewed-by: Eric Blake ebl...@redhat.com

Reviewed-by: Kevin Wolf kw...@redhat.com

Re: [Qemu-devel] [PATCH v8 02/14] qcow2: Implement bdrv_make_empty()

2014-06-30 Thread Kevin Wolf

Am 07.06.2014 um 20:51 hat Max Reitz geschrieben:
 Implement this function by making all clusters in the image file fall
 through to the backing file (by using the recently extended discard).
 
 Signed-off-by: Max Reitz mre...@redhat.com
 Reviewed-by: Eric Blake ebl...@redhat.com

Reviewed-by: Kevin Wolf kw...@redhat.com

Re: [Qemu-devel] [questions] about KVM as a Microsoft-compatible hypervisor

2014-06-30 Thread Vadim Rozenfeld

On Mon, 2014-06-30 at 09:39 +0800, Zhang Haoyu wrote:
 Hi, Vadim
 I read the kvm-2012-forum paper  KVM as a Microsoft-compatible hypervisor, 
 Any update and other references, please?
 
 Thanks,
 Zhang Haoyu
 
 

Unfortunately, not too much.
From the the most recent, we have lazy eoi implemented by 
MST and reference time counter.

Best regards,
Vadim.

Re: [Qemu-devel] [PATCH] hw/virtio: enable common virtio feature for mmio device

2014-06-30 Thread Michael S. Tsirkin

On Wed, Jun 04, 2014 at 10:05:55AM +0800, Ming Lei wrote:
 Both 'indirect_desc' and 'event_idx' are bus independent features,
 and they should be enabled for mmio devices too.
 
 On arm64 quad core VM(qemu-kvm), the patch can increase block I/O
 performance a lot with latest linux tree:
 - without the patch: 14K IOPS
 - with the patch: 34K IOPS
 
 fio script:
 [global]
 direct=1
 bsrange=4k-4k
 timeout=10
 numjobs=4
 ioengine=libaio
 iodepth=64
 
 filename=/dev/vdc
 group_reporting=1
 
 [f1]
 rw=randread
 
 Cc: Peter Maydell peter.mayd...@linaro.org
 Signed-off-by: Ming Lei ming@canonical.com

Applied, thanks!

 ---
  hw/virtio/virtio-mmio.c |6 ++
  1 file changed, 6 insertions(+)
 
 diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
 index 8829eb0..18c6e5b 100644
 --- a/hw/virtio/virtio-mmio.c
 +++ b/hw/virtio/virtio-mmio.c
 @@ -369,10 +369,16 @@ static void virtio_mmio_realizefn(DeviceState *d, Error 
 **errp)
  sysbus_init_mmio(sbd, proxy-iomem);
  }
  
 +static Property virtio_mmio_properties[] = {
 +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOMMIOProxy, host_features),
 +DEFINE_PROP_END_OF_LIST(),
 +};
 +
  static void virtio_mmio_class_init(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);
  
 +dc-props = virtio_mmio_properties;
  dc-realize = virtio_mmio_realizefn;
  dc-reset = virtio_mmio_reset;
  set_bit(DEVICE_CATEGORY_MISC, dc-categories);
 -- 
 1.7.9.5

Re: [Qemu-devel] [PATCH] hw/virtio: enable common virtio feature for mmio device

2014-06-30 Thread Michael S. Tsirkin

On Mon, Jun 16, 2014 at 05:26:33PM +0800, Ming Lei wrote:
 On Mon, Jun 16, 2014 at 3:54 PM, Paolo Bonzini pbonz...@redhat.com wrote:
  Il 09/06/2014 10:00, Ming Lei ha scritto:
 
  On Wed, Jun 4, 2014 at 10:05 AM, Ming Lei ming@canonical.com wrote:
 
  Both 'indirect_desc' and 'event_idx' are bus independent features,
  and they should be enabled for mmio devices too.
 
  On arm64 quad core VM(qemu-kvm), the patch can increase block I/O
  performance a lot with latest linux tree:
  - without the patch: 14K IOPS
  - with the patch: 34K IOPS
 
  fio script:
  [global]
  direct=1
  bsrange=4k-4k
  timeout=10
  numjobs=4
  ioengine=libaio
  iodepth=64
 
  filename=/dev/vdc
  group_reporting=1
 
  [f1]
  rw=randread
 
  Cc: Peter Maydell peter.mayd...@linaro.org
  Signed-off-by: Ming Lei ming@canonical.com
  ---
   hw/virtio/virtio-mmio.c |6 ++
   1 file changed, 6 insertions(+)
 
  diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
  index 8829eb0..18c6e5b 100644
  --- a/hw/virtio/virtio-mmio.c
  +++ b/hw/virtio/virtio-mmio.c
  @@ -369,10 +369,16 @@ static void virtio_mmio_realizefn(DeviceState *d,
  Error **errp)
   sysbus_init_mmio(sbd, proxy-iomem);
   }
 
  +static Property virtio_mmio_properties[] = {
  +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOMMIOProxy, host_features),
  +DEFINE_PROP_END_OF_LIST(),
  +};
  +
   static void virtio_mmio_class_init(ObjectClass *klass, void *data)
   {
   DeviceClass *dc = DEVICE_CLASS(klass);
 
  +dc-props = virtio_mmio_properties;
   dc-realize = virtio_mmio_realizefn;
   dc-reset = virtio_mmio_reset;
   set_bit(DEVICE_CATEGORY_MISC, dc-categories);
  --
  1.7.9.5
 
 
  Looks good.
 
 Paolo, thanks for your review.
 
  Can you look into moving DEFINE_VIRTIO_COMMON_FEATURES
  from all virtio pci devices to TYPE_VIRTIO_PCI, too?
 
 OK, that looks a good cleanup, how about the attached patch?
 If it is OK, I will prepare a formal one for submitting.
 
 
 Thanks,
 --
 Ming Lei

I applied the original patch for now.
Pls address Paolo's comments and resubmit this one.


 diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
 index b437f19..af2e1c3 100644
 --- a/hw/virtio/virtio-pci.c
 +++ b/hw/virtio/virtio-pci.c
 @@ -917,7 +917,6 @@ static Property virtio_9p_pci_properties[] = {
  DEFINE_PROP_BIT(ioeventfd, VirtIOPCIProxy, flags,
  VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
  DEFINE_PROP_UINT32(vectors, VirtIOPCIProxy, nvectors, 2),
 -DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
  DEFINE_VIRTIO_9P_PROPERTIES(V9fsPCIState, vdev.fsconf),
  DEFINE_PROP_END_OF_LIST(),
  };
 @@ -1038,11 +1037,17 @@ static void virtio_pci_reset(DeviceState *qdev)
  proxy-flags = ~VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
  }
  
 +static Property virtio_pci_properties[] = {
 +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
 +DEFINE_PROP_END_OF_LIST(),
 +};
 +
  static void virtio_pci_class_init(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);
  PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
  
 +dc-props = virtio_pci_properties;
  k-init = virtio_pci_init;
  k-exit = virtio_pci_exit;
  k-vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
 @@ -1071,7 +1076,6 @@ static Property virtio_blk_pci_properties[] = {
  DEFINE_PROP_BIT(x-data-plane, VirtIOBlkPCI, blk.data_plane, 0, false),
  DEFINE_PROP_UINT32(num_queues, VirtIOBlkPCI, blk.num_queues, 1),
  #endif
 -DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features),
  DEFINE_VIRTIO_BLK_PROPERTIES(VirtIOBlkPCI, blk),
  DEFINE_PROP_END_OF_LIST(),
  };
 @@ -1195,7 +1199,6 @@ static const TypeInfo virtio_scsi_pci_info = {
  static Property vhost_scsi_pci_properties[] = {
  DEFINE_PROP_UINT32(vectors, VirtIOPCIProxy, nvectors,
 DEV_NVECTORS_UNSPECIFIED),
 -DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
  DEFINE_VHOST_SCSI_PROPERTIES(VHostSCSIPCI, vdev.parent_obj.conf),
  DEFINE_PROP_END_OF_LIST(),
  };
 @@ -1276,7 +1279,6 @@ static void balloon_pci_stats_set_poll_interval(Object 
 *obj, struct Visitor *v,
  }
  
  static Property virtio_balloon_pci_properties[] = {
 -DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
  DEFINE_PROP_UINT32(class, VirtIOPCIProxy, class_code, 0),
  DEFINE_PROP_END_OF_LIST(),
  };
 @@ -1379,7 +1381,6 @@ static Property virtio_serial_pci_properties[] = {
  VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
  DEFINE_PROP_UINT32(vectors, VirtIOPCIProxy, nvectors, 2),
  DEFINE_PROP_UINT32(class, VirtIOPCIProxy, class_code, 0),
 -DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
  DEFINE_VIRTIO_SERIAL_PROPERTIES(VirtIOSerialPCI, vdev.serial),
  DEFINE_PROP_END_OF_LIST(),
  };
 @@ -1475,7 +1476,6 @@ static const TypeInfo

Re: [Qemu-devel] [PATCH] hw/virtio: enable common virtio feature for mmio device

2014-06-30 Thread Ming Lei

On Mon, Jun 30, 2014 at 6:09 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Mon, Jun 16, 2014 at 05:26:33PM +0800, Ming Lei wrote:
 On Mon, Jun 16, 2014 at 3:54 PM, Paolo Bonzini pbonz...@redhat.com wrote:
  Il 09/06/2014 10:00, Ming Lei ha scritto:
 
  On Wed, Jun 4, 2014 at 10:05 AM, Ming Lei ming@canonical.com wrote:
 
  Both 'indirect_desc' and 'event_idx' are bus independent features,
  and they should be enabled for mmio devices too.
 
  On arm64 quad core VM(qemu-kvm), the patch can increase block I/O
  performance a lot with latest linux tree:
  - without the patch: 14K IOPS
  - with the patch: 34K IOPS
 
  fio script:
  [global]
  direct=1
  bsrange=4k-4k
  timeout=10
  numjobs=4
  ioengine=libaio
  iodepth=64
 
  filename=/dev/vdc
  group_reporting=1
 
  [f1]
  rw=randread
 
  Cc: Peter Maydell peter.mayd...@linaro.org
  Signed-off-by: Ming Lei ming@canonical.com
  ---
   hw/virtio/virtio-mmio.c |6 ++
   1 file changed, 6 insertions(+)
 
  diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
  index 8829eb0..18c6e5b 100644
  --- a/hw/virtio/virtio-mmio.c
  +++ b/hw/virtio/virtio-mmio.c
  @@ -369,10 +369,16 @@ static void virtio_mmio_realizefn(DeviceState *d,
  Error **errp)
   sysbus_init_mmio(sbd, proxy-iomem);
   }
 
  +static Property virtio_mmio_properties[] = {
  +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOMMIOProxy, host_features),
  +DEFINE_PROP_END_OF_LIST(),
  +};
  +
   static void virtio_mmio_class_init(ObjectClass *klass, void *data)
   {
   DeviceClass *dc = DEVICE_CLASS(klass);
 
  +dc-props = virtio_mmio_properties;
   dc-realize = virtio_mmio_realizefn;
   dc-reset = virtio_mmio_reset;
   set_bit(DEVICE_CATEGORY_MISC, dc-categories);
  --
  1.7.9.5
 
 
  Looks good.

 Paolo, thanks for your review.

  Can you look into moving DEFINE_VIRTIO_COMMON_FEATURES
  from all virtio pci devices to TYPE_VIRTIO_PCI, too?

 OK, that looks a good cleanup, how about the attached patch?
 If it is OK, I will prepare a formal one for submitting.


 Thanks,
 --
 Ming Lei

 I applied the original patch for now.
 Pls address Paolo's comments and resubmit this one.

I have addresses all comments for the virtio-pci changes, and
will reply you on that thread.

Thanks,
--
Ming Lei

Re: [Qemu-devel] [PATCH] virtio: move common virtio properties to bus class device

2014-06-30 Thread Ming Lei

Hi Michael,


On Wed, Jun 18, 2014 at 3:13 PM, Ming Lei ming@canonical.com wrote:
 The two common virtio features can be defined per bus, so move all
 into bus class device to make code more clean.

 As discussed with cornelia, s390-virtio-blk doesn't support
 the two features at all, so keep s390-virtio as it.

 Acked-by: Cornelia Huck cornelia.h...@de.ibm.com #for s390 ccw
 Suggested-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Ming Lei ming@canonical.com
 ---
  hw/s390x/s390-virtio-bus.c  |2 ++
  hw/s390x/virtio-ccw.c   |   11 ++-
  hw/virtio/virtio-pci.c  |   12 ++--
  include/hw/virtio/virtio-blk.h  |3 ---
  include/hw/virtio/virtio-net.h  |1 -
  include/hw/virtio/virtio-scsi.h |1 -
  6 files changed, 14 insertions(+), 16 deletions(-)

 diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
 index 9c71afa..921cdc2 100644
 --- a/hw/s390x/s390-virtio-bus.c
 +++ b/hw/s390x/s390-virtio-bus.c
 @@ -504,6 +504,7 @@ static unsigned virtio_s390_get_features(DeviceState *d)

  static Property s390_virtio_net_properties[] = {
  DEFINE_NIC_PROPERTIES(VirtIONetS390, vdev.nic_conf),
 +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features),
  DEFINE_VIRTIO_NET_FEATURES(VirtIOS390Device, host_features),
  DEFINE_VIRTIO_NET_PROPERTIES(VirtIONetS390, vdev.net_conf),
  DEFINE_PROP_END_OF_LIST(),
 @@ -631,6 +632,7 @@ static const TypeInfo virtio_s390_device_info = {

  static Property s390_virtio_scsi_properties[] = {
  DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOSCSIS390, vdev.parent_obj.conf),
 +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features),
  DEFINE_VIRTIO_SCSI_FEATURES(VirtIOS390Device, host_features),
  DEFINE_PROP_END_OF_LIST(),
  };
 diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
 index 2bf0af8..c0124e1 100644
 --- a/hw/s390x/virtio-ccw.c
 +++ b/hw/s390x/virtio-ccw.c
 @@ -1126,7 +1126,6 @@ static const TypeInfo virtio_ccw_net = {

  static Property virtio_ccw_blk_properties[] = {
  DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id),
 -DEFINE_VIRTIO_BLK_FEATURES(VirtioCcwDevice, host_features[0]),
  DEFINE_VIRTIO_BLK_PROPERTIES(VirtIOBlkCcw, blk),
  DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags,
  VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true),
 @@ -1158,7 +1157,6 @@ static const TypeInfo virtio_ccw_blk = {
  static Property virtio_ccw_serial_properties[] = {
  DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id),
  DEFINE_VIRTIO_SERIAL_PROPERTIES(VirtioSerialCcw, vdev.serial),
 -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]),
  DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags,
  VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true),
  DEFINE_PROP_END_OF_LIST(),
 @@ -1185,7 +1183,6 @@ static const TypeInfo virtio_ccw_serial = {

  static Property virtio_ccw_balloon_properties[] = {
  DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id),
 -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]),
  DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags,
  VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true),
  DEFINE_PROP_END_OF_LIST(),
 @@ -1242,7 +1239,6 @@ static const TypeInfo virtio_ccw_scsi = {
  static Property vhost_ccw_scsi_properties[] = {
  DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id),
  DEFINE_VHOST_SCSI_PROPERTIES(VirtIOSCSICcw, vdev.parent_obj.conf),
 -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]),
  DEFINE_PROP_END_OF_LIST(),
  };

 @@ -1279,7 +1275,6 @@ static void virtio_ccw_rng_instance_init(Object *obj)

  static Property virtio_ccw_rng_properties[] = {
  DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id),
 -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]),
  DEFINE_VIRTIO_RNG_PROPERTIES(VirtIORNGCcw, vdev.conf),
  DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags,
  VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true),
 @@ -1345,10 +1340,16 @@ static int virtio_ccw_busdev_unplug(DeviceState *dev)
  return 0;
  }

 +static Property virtio_ccw_properties[] = {
 +DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]),
 +DEFINE_PROP_END_OF_LIST(),
 +};
 +
  static void virtio_ccw_device_class_init(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);

 +dc-props = virtio_ccw_properties;
  dc-init = virtio_ccw_busdev_init;
  dc-exit = virtio_ccw_busdev_exit;
  dc-unplug = virtio_ccw_busdev_unplug;
 diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
 index b437f19..af2e1c3 100644
 --- a/hw/virtio/virtio-pci.c
 +++ b/hw/virtio/virtio-pci.c
 @@ -917,7 +917,6 @@ static Property virtio_9p_pci_properties[] = {
  DEFINE_PROP_BIT(ioeventfd, VirtIOPCIProxy, flags,
  VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
  DEFINE_PROP_UINT32(vectors, VirtIOPCIProxy, nvectors, 2),
 -

Re: [Qemu-devel] [PATCH] numa: check for busy memory backend

2014-06-30 Thread Hu Tao

On Mon, Jun 30, 2014 at 12:12:20PM +0300, Michael S. Tsirkin wrote:
 On Mon, Jun 30, 2014 at 10:48:22AM +0200, Igor Mammedov wrote:
  On Mon, 30 Jun 2014 11:28:07 +0300
  Michael S. Tsirkin m...@redhat.com wrote:
  
   On Mon, Jun 30, 2014 at 03:46:56PM +0800, Hu Tao wrote:
On Mon, Jun 30, 2014 at 09:53:20AM +0300, Michael S. Tsirkin wrote:
 On Mon, Jun 30, 2014 at 01:33:42PM +0800, Hu Tao wrote:
  On Sun, Jun 29, 2014 at 06:20:22PM +0300, Michael S. Tsirkin wrote:
   On Wed, Jun 25, 2014 at 05:04:14PM +0800, Hu Tao wrote:
..to prevent one memory backend from being used by more than 
one numa
node.
   
   Thanks, but please always make the msg content self-contained
   so it can be understood without the subject.
   E.g. here, just drop ..to.
   
   Are you sure we want this? Is there a chance sharing a backend
   can be useful?
  
  This patch is actually a bug fix.
 
 It is?  What is the bug and how to reproduce it?

If user specifies the same memory backend for two numa nodes:

./x86_64-softmmu/qemu-system-x86_64 -hda 
/home/data/libvirt-images/f18.img  -m 512M \
-qmp unix:/tmp/m,server,nowait -monitor stdio -enable-kvm \
-object memory-backend-ram,size=256M,id=ram0 \
-numa node,nodeid=0,memdev=ram0 \
-numa node,nodeid=1,memdev=ram0

 I am not sure we should write a ton of code to validate qemu
 configuration, as long as qemu does not assert.

It seems qemu does not provide a way to disable assert currently.
Even if I removed asserts on the code path in my test, there is another
problem that it hits an infinite in render_memory_region().
   
   OK so this is what commit log should say:
   ---
   Specifying the same memory region twice leads to an assert:
   
   ./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object
   memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0
   -numa node,nodeid=1,memdev=ram0
   qemu-system-x86_64: /scm/qemu/memory.c:1506:
   memory_region_add_subregion_common: Assertion `!subregion-container'
   failed.
   Aborted (core dumped)
   
   Detect and exit with an error message instead.
   ---
  with  fixed-up commit message:
  Reviewed-by: Igor Mammedov imamm...@redhat.com
 
 Sorry I want the error message fixed up too.

Yes your error message is more clear. I'll send v2. Thanks for review.

Regards,
Hu

 
   
   See? Explain why your patch makes sense, don't just repeat what it does.
   
 
  Even if we will want backend sharing, we
  can do it after.
 
 By reverting this patch? So why merge it?

The point is qemu doesn't fire a bug no matter what user inputs.

 
   
   Igor, what's your take?
   

Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 numa.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/numa.c b/numa.c
index e471afe..6c1c554 100644
--- a/numa.c
+++ b/numa.c
@@ -279,6 +279,13 @@ void 
memory_region_allocate_system_memory(MemoryRegion *mr, Object 
*owner,
 exit(1);
 }
 
+if (memory_region_is_mapped(seg)) {
+char *path = 
object_get_canonical_path_component(OBJECT(backend));
+error_report(memory backend %s is busy, path);
   
   That's not very clear. How about:
 memory backend %s is used multiple times. Each -numa option must use a 
   different memdev value.
   
+g_free(path);
   
   As we are going to exit anyway, it does not make sense to bother with 
   this.
   
+exit(1);
+}
+
 memory_region_add_subregion(mr, addr, seg);
 vmstate_register_ram_global(seg);
 addr += size;
-- 
1.9.3

Re: [Qemu-devel] [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-06-30 Thread Chen, Tiejun


On 2014/6/30 17:55, Michael S. Tsirkin wrote:

On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote:

On 2014/6/30 17:05, Michael S. Tsirkin wrote:

On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote:

On 2014/6/30 14:48, Michael S. Tsirkin wrote:

On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote:

On 2014/6/26 18:03, Paolo Bonzini wrote:

Il 26/06/2014 11:18, Chen, Tiejun ha scritto:




- offsets 0x..0x0fff map to configuration space of the host MCH



Are you saying the config space in the video device?


No, I am saying in a new BAR, or at some magic offset of an existing
MMIO BAR.



As I mentioned previously, the IGD guy told me we have no any unused a
offset or BAR in the config space.

And guy who are responsible for the native driver seems not be accept to
extend some magic offset of an existing MMIO BAR.

In addition I think in a short time its not possible to migrate i440fx to
q35 as a PCIe machine of xen.


That seems like a weak motivation.  I don't see a need to get something
merged upstream in a short time: this seems sure to miss 2.1,
so you have the time to make it architecturally sound.
Making existing guests work would be a better motivation.


Yes.


So focus on this then. Existing guests will probably work
fine on a newer chipset - likely better than on i440fx.
xen management tools need to do some work to support this?
That will just give everyone more long term benefits.

If instead we create a hack that does not resemble
any existing hardware even remotely, what's the
chance that it will not break with any future
guest modification?



Isn't this possible with an mch chipset?


If you're saying q35, I mean AFAIK we have no any plan to migrate to this
MCH in xen case.


q35 or a newer chipset that's closer to what guests expect.


Additionally, I think I should follow this feature after
q35 can work for xen scenario.


What's stopping you?


I mean if you want create an new machine based on q35, actually this is
equal to start making xen to migrate to q35 now. Right? I can't image how
much effort should be done.


I don't see why you don't try. Sounds like a more robust solution to me.


As I think this is another requirement to us. I'm not sure if I have 
enough time to touch this currently.





So this is a reason why I'm saying I'd like to follow this feature after q35
can work with xen completely.


Then we'll end up with more configurations to support, and to what end?







So could we do this step by step:

#1 phase: We just cover current qemu-xen implementation based on i44fx, so
still provide that pseudo ISA bridge at 00:1f.0 as we already did.


By the way there is no reason to put it at 00:1f.0 specifically I think.
So it seems simple: create a dummy device that gets device and
vendor id as properties. If you really like, add an option to get values


Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx
passthrough: create pseudo intel isa bridge. There, we fake this device just
at 00:1f.0.
But you guys don't like this, and shouldn't this be just this point we
discussing now?

If you guys agree that , everything is fine.


Actually, this isn't what you did.
Don't tie it to xen, and don't tie it to 1f.
Just make it a simple stub pci device.
Whoever wants it, creates it.

The thing I worry about, is the chance this will break going forward.
So you created a system with 2 isa bridges.


I don't set class type to claim this is an ISA bridge, just a normal PCI
device.
+static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice
*hdev)
+{
+struct PCIDevice *dev;
+
+char rid;
+
+/* We havt to use a simple PCI device to fake this ISA bridge
+ * to avoid making some confusion to BIOS and ACPI.
+ */
+dev = pci_create(bus, PCI_DEVFN(0x1f, 0),
pseudo-intel-pch-isa-bridge);
+
+qdev_init_nofail(dev-qdev);
+
+pci_config_set_vendor_id(dev-config, hdev-vendor_id);
+pci_config_set_device_id(dev-config, hdev-device_id);
+
+xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)rid, 1);
+
+pci_config_set_revision(dev-config, rid);
+
+XEN_PT_LOG(dev, The pseudo Intel PCH ISA bridge created.\n);
+return 0;
+}


Then I don't see how it's supposed to work.
Doesn't i915 look for an isa bridge?

 /*
  * The reason to probe ISA bridge instead of Dev31:Fun0 is to
  * make graphics device passthrough work easy for VMM, that only
  * need to expose ISA bridge to let driver know the real hardware
  * underneath. This is a requirement from virtualization team.
  *
  * In some virtualized environments (e.g. XEN), there is irrelevant
  * ISA bridge in the system. To work reliably, we should scan trhough
  * all the ISA bridge devices and check for the first match, instead
  * of only checking the first one.
  */
 while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA  8, pch))) {
 if

Re: [Qemu-devel] [v2][PATCH 4/8] xen, gfx passthrough: reserve 00:02.0 for INTEL IGD

2014-06-30 Thread Chen, Tiejun


On 2014/6/30 17:29, Gerd Hoffmann wrote:

   Hi,


   /* Make cirrues VGA S3 suspend/resume work in Windows
XP/2003 */
   Device (VGA)
   {
-   Name (_ADR, 0x0002)
+   // Address of the VGA (device F function 0)
+   Name (_ADR, 0x000F)



With this change I still didn't see anything.



This does not match with what I see.

Looks like linux does not care about the acpi data.  Using


The acpi data doesn't matter at all for the boot screen, the vgabios
doesn't look at it.


(d12) Scan for VGA option rom
(d12) Running option rom at c000:0003


seabios loaded+initialized the vgabios (from the pci rom bar of the vga
device).  Which slot the vga is installed at should not matter at all.
seabios scans the pci bus and should find the vga with no problems no
matter where it is.


(XEN) stdvga.c:147:d12v0 entering stdvga and caching modes


Seems to have worked fine ;)


(d12) pmm call arg1=0
(d12) Turning on vga text mode console
(d12) SeaBIOS (version rel-1.7.5-0-ge51488c-20140626_113926-dcs-xen-54)


At this point you should see the seabios banner at the vga screen.


And the VGA screen has the SeaBIOS messages:

SeaBIOS (version rel-1.7.5-0-ge51488c-20140626_113926-dcs-xen-54)
Machine UUID 18cf5fd0-e564-49c4-b63f-e6c3c23b1294


As it should be.

Now the questions is why it doesn't work for Tiejun ...


Gerd and Don,

Thanks for your information.

But I have no time to validate this configuration now, I will update 
this as soon as I try Don's config file.


Thanks again.

Tiejun


Anything in the logs?

cheers,
   Gerd

Re: [Qemu-devel] [PATCH] qemu-char: Convert socket char backend to parse/kind

2014-06-30 Thread Gerd Hoffmann

  Hi,

   * The old qemu_chr_open_socket() has an
 if (!is_waitconnect)
 qemu_set_nonblock(fd);
 which the QMP char-open codepath has never had. Does this matter?
 Which of the two code paths was correct?
 
 Gerd?

IIRC the socket is put into non-blocking mode anyway by the qemu socket
helper functions.

cheers,
  Gerd

[Qemu-devel] [PATCH v2] numa: check for busy memory backend

2014-06-30 Thread Hu Tao

Specifying the same memory backend twice leads to an assert:

./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object
memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0
-numa node,nodeid=1,memdev=ram0
qemu-system-x86_64: /scm/qemu/memory.c:1506:
memory_region_add_subregion_common: Assertion `!subregion-container'
failed.
Aborted (core dumped)

Detect and exit with an error message instead.

Reviewed-by: Igor Mammedov imamm...@redhat.com
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 numa.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/numa.c b/numa.c
index 2fde740..7bf7834 100644
--- a/numa.c
+++ b/numa.c
@@ -301,6 +301,14 @@ void memory_region_allocate_system_memory(MemoryRegion 
*mr, Object *owner,
 exit(1);
 }
 
+if (memory_region_is_mapped(seg)) {
+char *path = object_get_canonical_path_component(OBJECT(backend));
+error_report(memory backend %s is used multiple times. Each 
+ -numa option must use a different memdev value.,
+ path);
+exit(1);
+}
+
 memory_region_add_subregion(mr, addr, seg);
 vmstate_register_ram_global(seg);
 addr += size;
-- 
1.9.3

Re: [Qemu-devel] [PATCH] qemu-char: Convert socket char backend to parse/kind

2014-06-30 Thread Peter Maydell

On 30 June 2014 11:23, Gerd Hoffmann kra...@redhat.com wrote:
   Hi,

   * The old qemu_chr_open_socket() has an
 if (!is_waitconnect)
 qemu_set_nonblock(fd);
 which the QMP char-open codepath has never had. Does this matter?
 Which of the two code paths was correct?

 Gerd?

 IIRC the socket is put into non-blocking mode anyway by the qemu socket
 helper functions.

In that case is qemu_chr_open_socket_fd() incorrect
in marking the socket as nonblocking in the
is_listen  is_waitconnect case?

-- PMM

[Qemu-devel] [PATCH for-2.1 v2] pc-dimm: error out if memory hotplug is not enabled

2014-06-30 Thread Igor Mammedov

fixes QEMU abort in case it's started without memory
hotplug enabled.

as result of fix it will print following messages:

-device pc-dimm,id=d1,memdev=m1: memory hotplug is not enabled, enable it on 
startup
-device pc-dimm,id=d1,memdev=m1: Device 'pc-dimm' could not be initialized


Also fixup assert condition to detect hotplug address
space overflow.

Signed-off-by: Igor Mammedov imamm...@redhat.com
Reported-by:  Hu Tao hu...@cn.fujitsu.com
---
v2:
 - amend error message
---
 hw/mem/pc-dimm.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index ad176b7..08f49ed 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -146,7 +146,13 @@ uint64_t pc_dimm_get_free_addr(uint64_t 
address_space_start,
 uint64_t new_addr, ret = 0;
 uint64_t address_space_end = address_space_start + address_space_size;
 
-assert(address_space_end  address_space_size);
+if (!address_space_size) {
+error_setg(errp, memory hotplug is not enabled, 
+ please add maxmem option);
+goto out;
+}
+
+assert(address_space_end  address_space_start);
 object_child_foreach(qdev_get_machine(), pc_dimm_built_list, list);
 
 if (hint) {
-- 
1.7.1

Re: [Qemu-devel] [PATCH for-2.1 v2] pc-dimm: error out if memory hotplug is not enabled

2014-06-30 Thread Michael S. Tsirkin

On Mon, Jun 30, 2014 at 12:43:29PM +0200, Igor Mammedov wrote:
 fixes QEMU abort in case it's started without memory
 hotplug enabled.
 
 as result of fix it will print following messages:
 
 -device pc-dimm,id=d1,memdev=m1: memory hotplug is not enabled, enable it on 
 startup
 -device pc-dimm,id=d1,memdev=m1: Device 'pc-dimm' could not be initialized
 
 
 Also fixup assert condition to detect hotplug address
 space overflow.
 
 Signed-off-by: Igor Mammedov imamm...@redhat.com
 Reported-by:  Hu Tao hu...@cn.fujitsu.com

Applied, thanks.

 ---
 v2:
  - amend error message
 ---
  hw/mem/pc-dimm.c |8 +++-
  1 files changed, 7 insertions(+), 1 deletions(-)
 
 diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
 index ad176b7..08f49ed 100644
 --- a/hw/mem/pc-dimm.c
 +++ b/hw/mem/pc-dimm.c
 @@ -146,7 +146,13 @@ uint64_t pc_dimm_get_free_addr(uint64_t 
 address_space_start,
  uint64_t new_addr, ret = 0;
  uint64_t address_space_end = address_space_start + address_space_size;
  
 -assert(address_space_end  address_space_size);
 +if (!address_space_size) {
 +error_setg(errp, memory hotplug is not enabled, 
 + please add maxmem option);
 +goto out;
 +}
 +
 +assert(address_space_end  address_space_start);
  object_child_foreach(qdev_get_machine(), pc_dimm_built_list, list);
  
  if (hint) {
 -- 
 1.7.1

Re: [Qemu-devel] [PATCH v2] numa: check for busy memory backend

2014-06-30 Thread Michael S. Tsirkin

On Mon, Jun 30, 2014 at 06:28:15PM +0800, Hu Tao wrote:
 Specifying the same memory backend twice leads to an assert:
 
 ./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object
 memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0
 -numa node,nodeid=1,memdev=ram0
 qemu-system-x86_64: /scm/qemu/memory.c:1506:
 memory_region_add_subregion_common: Assertion `!subregion-container'
 failed.
 Aborted (core dumped)
 
 Detect and exit with an error message instead.
 
 Reviewed-by: Igor Mammedov imamm...@redhat.com
 Signed-off-by: Hu Tao hu...@cn.fujitsu.com

Applied, thanks.

 ---
  numa.c | 8 
  1 file changed, 8 insertions(+)
 
 diff --git a/numa.c b/numa.c
 index 2fde740..7bf7834 100644
 --- a/numa.c
 +++ b/numa.c
 @@ -301,6 +301,14 @@ void memory_region_allocate_system_memory(MemoryRegion 
 *mr, Object *owner,
  exit(1);
  }
  
 +if (memory_region_is_mapped(seg)) {
 +char *path = 
 object_get_canonical_path_component(OBJECT(backend));
 +error_report(memory backend %s is used multiple times. Each 
 + -numa option must use a different memdev value.,
 + path);
 +exit(1);
 +}
 +
  memory_region_add_subregion(mr, addr, seg);
  vmstate_register_ram_global(seg);
  addr += size;
 -- 
 1.9.3

Re: [Qemu-devel] The first function called after migration for a block device

2014-06-30 Thread Paolo Bonzini


Il 28/06/2014 00:54, Xiongzi Ge ha scritto:

Hi Paolo,

Thanks. I found a function called bdrv_invalidate_cache() in qcow2.c.
After migration, it will be called to invalidate the cache of the block
device?


Let me quote again:


This function has *nothing* to do with the guest OS's
cache.  That cache is managed by the guest OS and, as I have already
told you multiple times, there is *nothing* that QEMU can do about it.

All you can do is use O_DIRECT in the guest.


Did you even *try* to understand this?!?

Paolo

Re: [Qemu-devel] [PATCH FOR 2.1 1/5] tests/test-qmp-event: fix for GLib 2.31

2014-06-30 Thread Paolo Bonzini


Il 29/06/2014 22:31, Peter Maydell ha scritto:

On 27 June 2014 19:28, Luiz Capitulino lcapitul...@redhat.com wrote:

On Wed, 25 Jun 2014 15:15:35 +0200
Paolo Bonzini pbonz...@redhat.com wrote:


Il 25/06/2014 15:13, Luiz Capitulino ha scritto:

On Tue, 24 Jun 2014 16:33:56 -0700
Wenchao Xia wenchaoq...@gmail.com wrote:


From: Paolo Bonzini pbonz...@redhat.com

On old GLib, the test needs a g_thread_init call.

Reported-by: Wenchao Xia wenchaoq...@gmail.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Tested-by: Wenchao Xia wenchaoq...@gmail.com
Signed-off-by: Wenchao Xia wenchaoq...@gmail.com
---
 tests/test-qmp-event.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tests/test-qmp-event.c b/tests/test-qmp-event.c
index cb1e441..17c6444 100644
--- a/tests/test-qmp-event.c
+++ b/tests/test-qmp-event.c
@@ -251,6 +251,7 @@ static void test_event_d(TestEventData *data,

 int main(int argc, char **argv)
 {
+g_thread_init(NULL);
 qmp_event_set_func_emit(event_test_emit);

 g_test_init(argc, argv, NULL);


This breaks make check on F20:


/home/lcapitulino/work/src/upstream/qmp-unstable/tests/test-qmp-event.c: In 
function ‘main’:
/home/lcapitulino/work/src/upstream/qmp-unstable/tests/test-qmp-event.c:254:5: 
error: ‘g_thread_init’ is deprecated (declared at 
/usr/include/glib-2.0/glib/deprecated/gthread.h:260) 
[-Werror=deprecated-declarations]
 g_thread_init(NULL);
 ^
cc1: all warnings being treated as errors
make: *** [tests/test-qmp-event.o] Error 1


I think the best way to fix this is to make util/osdep.c:thread_init()
public (maybe by moving it to include/glib-compat.h) and use that instead.
Also, note that thread_init()'s body is duplicated in a few other places,
so maybe those places should call it too.

You may want to do this in a different series, then I can skip this patch
and apply the rest of the series.



Thanks Luiz, it's a good suggestion.


Paolo, Wenchao, are one of one going to work on this?


Ping! Can we have at least a local fix using glib version #ifdefs before
Tuesday please? Otherwise we need to do something like this
to avoid shipping an rc0 which doesn't pass make check on some
systems.


I'll send the patch today.  thread_init() is a bit tricky because it is 
a __constructor__ but it is not included in the binary because no other 
function is included from the same file.


BTW, the make check limitation is currently mentioned in the changelog.

Paolo


diff --git a/tests/Makefile b/tests/Makefile
index 7e53d0d..a1a0dae 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -27,8 +27,6 @@ check-unit-y += tests/test-string-input-visitor$(EXESUF)
 gcov-files-test-string-input-visitor-y = qapi/string-input-visitor.c
 check-unit-y += tests/test-string-output-visitor$(EXESUF)
 gcov-files-test-string-output-visitor-y = qapi/string-output-visitor.c
-check-unit-y += tests/test-qmp-event$(EXESUF)
-gcov-files-test-qmp-event-y += qapi/qmp-event.c
 check-unit-y += tests/test-opts-visitor$(EXESUF)
 gcov-files-test-opts-visitor-y = qapi/opts-visitor.c
 check-unit-y += tests/test-coroutine$(EXESUF)
@@ -213,7 +211,7 @@ test-obj-y = tests/check-qint.o
tests/check-qstring.o tests/check-qdict.o \
tests/test-qmp-input-visitor.o tests/test-qmp-input-strict.o \
tests/test-qmp-commands.o tests/test-visitor-serialization.o \
tests/test-x86-cpuid.o tests/test-mul64.o tests/test-int128.o \
-   tests/test-opts-visitor.o tests/test-qmp-event.o
+   tests/test-opts-visitor.o

 test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \
   tests/test-qapi-event.o

thanks
-- PMM

Re: [Qemu-devel] [RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class type

2014-06-30 Thread Paolo Bonzini


Il 30/06/2014 05:13, Chen, Tiejun ha scritto:

After I discuss internal, we think even we just set the real
vendor/device ids to this ISA bridge at 00:1f.0, guest firmware should
still work well with these pair of real vendor/device ids.

So if you think something would conflict or be broken, could you tell us
what's exactly that? Then we will double check.


The Xen hvmloader doesn't break since it only supports one chipset.  But 
SeaBIOS checks for the exact vendor/device ids since Q35 support was added.


If you want to add this feature, try to implement it in a way that is a 
bit more forward-looking.  I'm sure that Xen sooner or later will want a 
PCIe chipset, otherwise things such as AER forwarding are impossible.


Paolo

Re: [Qemu-devel] [PATCH] qemu-char: Convert socket char backend to parse/kind

2014-06-30 Thread Gerd Hoffmann

On Mo, 2014-06-30 at 11:33 +0100, Peter Maydell wrote:
 On 30 June 2014 11:23, Gerd Hoffmann kra...@redhat.com wrote:
Hi,
 
* The old qemu_chr_open_socket() has an
  if (!is_waitconnect)
  qemu_set_nonblock(fd);
  which the QMP char-open codepath has never had. Does this matter?
  Which of the two code paths was correct?
 
  Gerd?
 
  IIRC the socket is put into non-blocking mode anyway by the qemu socket
  helper functions.
 
 In that case is qemu_chr_open_socket_fd() incorrect
 in marking the socket as nonblocking in the
 is_listen  is_waitconnect case?

Honestly I think the whole waitconnect stuff should be ripped out.
Obvious problem is backward compatibility.  But maybe not because nobody
uses it anyway.  Anyone out there using chardev server sockets without
'nowait' option?

waitconnect means go into server mode, wait for a client to connect,
then unpause the guest.  Which handles one special case of the generic
start qemu with -S, connect everything, then sent 'cont' to monitor
libvirt is doing.

IIRC wait for client to connect is a blocking accept() call.  Which
you certainly don't want do in a qmp command handler.  So if we switch
over chardevs created via -chardev to use the qmp init code path too we
need some hackery to make '-chardev socket,wait,...' work without
introducing a blocking qmp command I suspect ...

cheers,
  Gerd

Re: [Qemu-devel] [PATCH v2 3/3] serial: poll the serial console with G_IO_HUP

2014-06-30 Thread Roger Pau Monné

Do I need to resend this? it's been more than a month without review.

Roger.

On 13/06/14 17:35, Roger Pau Monné wrote:
 Ping?
 
 On 23/05/14 17:57, Roger Pau Monne wrote:
 On FreeBSD polling a master pty while the other end is not connected
 with G_IO_OUT only results in an endless wait. This is different from
 the Linux behaviour, that returns immediately. In order to demonstrate
 this, I have the following example code:

 http://xenbits.xen.org/people/royger/test_poll.c

 When executed on Linux:

 $ ./test_poll
 In callback

 On FreeBSD instead, the callback never gets called:

 $ ./test_poll

 So, in order to workaround this, poll the source with G_IO_HUP (which
 makes the code behave the same way on both Linux and FreeBSD).

 Signed-off-by: Roger Pau Monné roger@citrix.com
 Cc: Peter Crosthwaite peter.crosthwa...@xilinx.com
 Cc: Michael Tokarev m...@tls.msk.ru
 Cc: Andreas Färber afaer...@suse.de
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: xen-de...@lists.xenproject.org
 ---
 Changes since v1:
  - Fix other users of qemu_chr_fe_add_watch to use G_IO_HUP.
 ---
  hw/char/serial.c |2 +-
  hw/char/virtio-console.c |3 ++-
  hw/usb/redirect.c|2 +-
  monitor.c|2 +-
  4 files changed, 5 insertions(+), 4 deletions(-)

 diff --git a/hw/char/serial.c b/hw/char/serial.c
 index f4d167f..2a2c9e5 100644
 --- a/hw/char/serial.c
 +++ b/hw/char/serial.c
 @@ -246,7 +246,7 @@ static gboolean serial_xmit(GIOChannel *chan, 
 GIOCondition cond, void *opaque)
  serial_receive1(s, s-tsr, 1);
  } else if (qemu_chr_fe_write(s-chr, s-tsr, 1) != 1) {
  if (s-tsr_retry = 0  s-tsr_retry  MAX_XMIT_RETRY 
 -qemu_chr_fe_add_watch(s-chr, G_IO_OUT, serial_xmit, s)  0) {
 +qemu_chr_fe_add_watch(s-chr, G_IO_OUT|G_IO_HUP, serial_xmit, 
 s)  0) {
  s-tsr_retry++;
  return FALSE;
  }
 diff --git a/hw/char/virtio-console.c b/hw/char/virtio-console.c
 index 6c8be0f..38e290a 100644
 --- a/hw/char/virtio-console.c
 +++ b/hw/char/virtio-console.c
 @@ -69,7 +69,8 @@ static ssize_t flush_buf(VirtIOSerialPort *port,
  if (!k-is_console) {
  virtio_serial_throttle_port(port, true);
  if (!vcon-watch) {
 -vcon-watch = qemu_chr_fe_add_watch(vcon-chr, G_IO_OUT,
 +vcon-watch = qemu_chr_fe_add_watch(vcon-chr,
 +G_IO_OUT|G_IO_HUP,
  chr_write_unblocked, 
 vcon);
  }
  }
 diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
 index 287a505..06e757d 100644
 --- a/hw/usb/redirect.c
 +++ b/hw/usb/redirect.c
 @@ -284,7 +284,7 @@ static int usbredir_write(void *priv, uint8_t *data, int 
 count)
  r = qemu_chr_fe_write(dev-cs, data, count);
  if (r  count) {
  if (!dev-watch) {
 -dev-watch = qemu_chr_fe_add_watch(dev-cs, G_IO_OUT,
 +dev-watch = qemu_chr_fe_add_watch(dev-cs, G_IO_OUT|G_IO_HUP,
 usbredir_write_unblocked, 
 dev);
  }
  if (r  0) {
 diff --git a/monitor.c b/monitor.c
 index 593679a..ae1c539 100644
 --- a/monitor.c
 +++ b/monitor.c
 @@ -304,7 +304,7 @@ void monitor_flush(Monitor *mon)
  mon-outbuf = tmp;
  }
  if (mon-watch == 0) {
 -mon-watch = qemu_chr_fe_add_watch(mon-chr, G_IO_OUT,
 +mon-watch = qemu_chr_fe_add_watch(mon-chr, G_IO_OUT|G_IO_HUP,
 monitor_unblocked, mon);
  }
  }

Re: [Qemu-devel] [PATCH 0/3] block: introduce submit I/O at batch

2014-06-30 Thread Paolo Bonzini


Il 30/06/2014 11:49, Ming Lei ha scritto:

Hi,

The commit 580b6b2aa2(dataplane: use the QEMU block layer for I/O)
introduces ~40% throughput regression on virtio-blk dataplane, and
one of causes is that submitting I/O at batch is removed.

This patchset trys to introduce this mechanism on block, at least,
linux-aio can benefit from that.

With these patches, it is observed that thoughout on virtio-blk
dataplane can be improved a lot, see data in commit log of patch
3/3.

It should be possible to apply the batch mechanism to other devices
(such as virtio-scsi) too.


The basic idea of the code is great, however I do not think it is 
necessary to add the logic in AioContext.  You are right that io_submit 
supports multiple I/O, but right now we have one io_context_t per 
BlockDriverState so it is somewhat premature.  Note that AioContext in 
QEMU means something else than io_context_t.


I don't think it's necessary to walk backing_hd too (especially because 
we're close to release and this cannot be a regression---dataplane in 
2.0 didn't support backing files at all!).  We need to keep the patches 
as simple as possible.


You can just add bdrv_plug/bdrv_unplug callbacks to block/linux-aio.c 
and, in block.c, something like


BlockDriver *drv = bs-drv;
if (drv  drv-bdrv_plug) {
drv-bdrv_plug(bdrv);
} else if (bs-file) {
bdrv_plug(bs-file);
}

and place all the queuing logic in linux-aio.c.  It should be obvious to 
the reviewer that the patch only affects dataplane.


(Also, note that QEMU doesn't use __ as the prefix for internal function).

Thanks for the patch!

Paolo

Re: [Qemu-devel] [questions] about KVM as a Microsoft-compatible hypervisor

2014-06-30 Thread Jidong Xiao

On Mon, Jun 30, 2014 at 6:02 AM, Vadim Rozenfeld vroze...@redhat.com wrote:
 On Mon, 2014-06-30 at 09:39 +0800, Zhang Haoyu wrote:
 Hi, Vadim
 I read the kvm-2012-forum paper  KVM as a Microsoft-compatible hypervisor,
 Any update and other references, please?

 Thanks,
 Zhang Haoyu



 Unfortunately, not too much.
 From the the most recent, we have lazy eoi implemented by
 MST and reference time counter.

 Best regards,
 Vadim.

It looks like that Mircosoft has defined a large number of synthetic
registers in their Hyper-v spec, so ultimately KVM should virtualize
most of these registers, so as to support the Mircosoft Enlightment,
right?

-Jidong

Re: [Qemu-devel] Why I advise against using ivshmem

2014-06-30 Thread Markus Armbruster

Stefan Hajnoczi stefa...@gmail.com writes:

 On Tue, Jun 17, 2014 at 11:44:11AM +0200, Paolo Bonzini wrote:
 Il 17/06/2014 11:03, David Marchand ha scritto:
 Unless someone steps up and maintains ivshmem, I think it should be
 deprecated and dropped from QEMU.
 
 Then I can maintain ivshmem for QEMU.
 If this is ok, I will send a patch for MAINTAINERS file.
 
 Typically, adding yourself to maintainers is done only after having proved
 your ability to be a maintainer. :)
 
 So, let's stop talking and go back to code!  You can start doing what was
 suggested elsewhere in the thread: get the server and uio driver merged into
 the QEMU tree, document the protocol in docs/specs/ivshmem_device_spec.txt,
 and start fixing bugs such as the ones that Markus reported.

 One more thing to add to the list:

 static void ivshmem_read(void *opaque, const uint8_t * buf, int flags)

 The flags argument should be size.  Size should be checked before
 accessing buf.

 Please also see the bug fixes in the following unapplied patch:
 [PATCH] ivshmem: fix potential OOB r/w access (#2) by Sebastian Krahmer
 https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg03538.html

Another one: most devices can be controlled via a dedicated
CONFIG_DEVNAME, but not ivshmem: it uses CONFIG_KVM and CONFIG_PCI.
Giving it its own CONFIG_IVSHMEM would be nice.

Re: [Qemu-devel] [RFC] qemu VGA endian swap low level drawing changes

2014-06-30 Thread Gerd Hoffmann

  Hi,

 From what I can tell, we only ever call the cursor drawing callback on
 non-shared surfaces. Should I deduce that the HW cursor emulation simply
 doesn't work when using shared surfaces ? Or is there another path I
 have missed to handle it ?

Hmm.  Looks like hw-cursor-on-shared-surface broken indeed.  Need to dig
out a guest which actually uses it  go figure when testing your patch
series ...

 It makes sense in a way since we never want to draw the cursor in the
 shared framebuffer, but we could probably handle it by having a small
 separate pixman surface which we paint on top of the final render or
 something like that (or link to a host side HW cursor if any) but I
 can't quite see anything in the code.

There is infrastructure to inform the ui code how the cursor should look
like (grep for dpy_cursor_define), so we actually can use the hosts
hardware cursor support.  cirrus doesn't use it though.

cheers,
  Gerd

Re: [Qemu-devel] [RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class type

2014-06-30 Thread Michael S. Tsirkin

On Thu, Jun 19, 2014 at 05:53:51PM +0800, Tiejun Chen wrote:
 Originally the reason to probe ISA bridge instead of Dev31:Fun0
 is to make graphics device passthrough work easy for VMM, that
 only need to expose ISA bridge to let driver know the real
 hardware underneath. This is a requirement from virtualization
 team. Especially in that virtualized environments, XEN, there
 is irrelevant ISA bridge in the system with that legacy qemu
 version specific to xen, qemu-xen-traditional. So to work
 reliably, we should scan through all the ISA bridge devices
 and check for the first match, instead of only checking the
 first one.
 
 But actually, qemu-xen-traditional, is always enumerated with
 Dev31:Fun0, 00:1f.0 as follows:
 
 hw/pt-graphics.c:
 
 intel_pch_init()
 |
 + pci_isa_bridge_init(bus, PCI_DEVFN(0x1f, 0), ...);
 
 so this mean that isa bridge is still represented with Dev31:Func0
 like the native OS. Furthermore, currently we're pushing VGA
 passthrough support into qemu upstream, and with some discussion,
 we wouldn't set the bridge class type and just expose this devfn.
 
 So we just go back to check devfn to make life normal.
 
 Signed-off-by: Tiejun Chen tiejun.c...@intel.com
 ---
  drivers/gpu/drm/i915/i915_drv.c | 19 +++
  1 file changed, 3 insertions(+), 16 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
 index 651e65e..cb2526e 100644
 --- a/drivers/gpu/drm/i915/i915_drv.c
 +++ b/drivers/gpu/drm/i915/i915_drv.c
 @@ -417,18 +417,8 @@ void intel_detect_pch(struct drm_device *dev)
   return;
   }
  
 - /*
 -  * The reason to probe ISA bridge instead of Dev31:Fun0 is to
 -  * make graphics device passthrough work easy for VMM, that only
 -  * need to expose ISA bridge to let driver know the real hardware
 -  * underneath. This is a requirement from virtualization team.
 -  *
 -  * In some virtualized environments (e.g. XEN), there is irrelevant
 -  * ISA bridge in the system. To work reliably, we should scan trhough
 -  * all the ISA bridge devices and check for the first match, instead
 -  * of only checking the first one.
 -  */
 - while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA  8, pch))) {
 + pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0));
 + if (pch) {

Then if you want to use this slot for something else, what happens?
If you want to relax the PCI_CLASS_BRIDGE_ISA requirement when
running on top of a hypervisor, just scan all devices.

   if (pch-vendor == PCI_VENDOR_ID_INTEL) {
   unsigned short id = pch-device  
 INTEL_PCH_DEVICE_ID_MASK;
   dev_priv-pch_id = id;
 @@ -462,10 +452,7 @@ void intel_detect_pch(struct drm_device *dev)
   DRM_DEBUG_KMS(Found LynxPoint LP PCH\n);
   WARN_ON(!IS_HASWELL(dev));
   WARN_ON(!IS_ULT(dev));
 - } else
 - continue;
 -
 - break;
 + }
   }
   }
   if (!pch)
 -- 
 1.9.1

Re: [Qemu-devel] [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-06-30 Thread Paolo Bonzini


Il 30/06/2014 12:20, Chen, Tiejun ha scritto:


I already post this to mainline to change as follows:

-while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA  8, pch))) {
+pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0));
+if (pch) {

Please refer to this,

[RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead
of check class type

Linux Native guys would like to accept this. And actually Windows always
use devfn to detect this.


Fair enough, but that means that, when using IGD, Q35 will have to move 
the ISA bridge off 1f.0.


To me it seems fairly clear that as things stand IGD is not 
virtualizable without PV support.  We're beating a dead horse.


Paolo

Re: [Qemu-devel] [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-06-30 Thread Michael S. Tsirkin

On Mon, Jun 30, 2014 at 06:20:22PM +0800, Chen, Tiejun wrote:
 On 2014/6/30 17:55, Michael S. Tsirkin wrote:
 On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote:
 On 2014/6/30 17:05, Michael S. Tsirkin wrote:
 On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote:
 On 2014/6/30 14:48, Michael S. Tsirkin wrote:
 On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote:
 On 2014/6/26 18:03, Paolo Bonzini wrote:
 Il 26/06/2014 11:18, Chen, Tiejun ha scritto:
 
 
 - offsets 0x..0x0fff map to configuration space of the host MCH
 
 
 Are you saying the config space in the video device?
 
 No, I am saying in a new BAR, or at some magic offset of an existing
 MMIO BAR.
 
 
 As I mentioned previously, the IGD guy told me we have no any unused a
 offset or BAR in the config space.
 
 And guy who are responsible for the native driver seems not be accept to
 extend some magic offset of an existing MMIO BAR.
 
 In addition I think in a short time its not possible to migrate i440fx 
 to
 q35 as a PCIe machine of xen.
 
 That seems like a weak motivation.  I don't see a need to get something
 merged upstream in a short time: this seems sure to miss 2.1,
 so you have the time to make it architecturally sound.
 Making existing guests work would be a better motivation.
 
 Yes.
 
 So focus on this then. Existing guests will probably work
 fine on a newer chipset - likely better than on i440fx.
 xen management tools need to do some work to support this?
 That will just give everyone more long term benefits.
 
 If instead we create a hack that does not resemble
 any existing hardware even remotely, what's the
 chance that it will not break with any future
 guest modification?
 
 
 Isn't this possible with an mch chipset?
 
 If you're saying q35, I mean AFAIK we have no any plan to migrate to this
 MCH in xen case.
 
 q35 or a newer chipset that's closer to what guests expect.
 
 Additionally, I think I should follow this feature after
 q35 can work for xen scenario.
 
 What's stopping you?
 
 I mean if you want create an new machine based on q35, actually this is
 equal to start making xen to migrate to q35 now. Right? I can't image how
 much effort should be done.
 
 I don't see why you don't try. Sounds like a more robust solution to me.
 
 As I think this is another requirement to us. I'm not sure if I have enough
 time to touch this currently.
 
 
 So this is a reason why I'm saying I'd like to follow this feature after q35
 can work with xen completely.
 
 Then we'll end up with more configurations to support, and to what end?
 
 
 
 
 So could we do this step by step:
 
 #1 phase: We just cover current qemu-xen implementation based on i44fx, 
 so
 still provide that pseudo ISA bridge at 00:1f.0 as we already did.
 
 By the way there is no reason to put it at 00:1f.0 specifically I think.
 So it seems simple: create a dummy device that gets device and
 vendor id as properties. If you really like, add an option to get values
 
 Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx
 passthrough: create pseudo intel isa bridge. There, we fake this device 
 just
 at 00:1f.0.
 But you guys don't like this, and shouldn't this be just this point we
 discussing now?
 
 If you guys agree that , everything is fine.
 
 Actually, this isn't what you did.
 Don't tie it to xen, and don't tie it to 1f.
 Just make it a simple stub pci device.
 Whoever wants it, creates it.
 
 The thing I worry about, is the chance this will break going forward.
 So you created a system with 2 isa bridges.
 
 I don't set class type to claim this is an ISA bridge, just a normal PCI
 device.
 +static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice
 *hdev)
 +{
 +struct PCIDevice *dev;
 +
 +char rid;
 +
 +/* We havt to use a simple PCI device to fake this ISA bridge
 + * to avoid making some confusion to BIOS and ACPI.
 + */
 +dev = pci_create(bus, PCI_DEVFN(0x1f, 0),
 pseudo-intel-pch-isa-bridge);
 +
 +qdev_init_nofail(dev-qdev);
 +
 +pci_config_set_vendor_id(dev-config, hdev-vendor_id);
 +pci_config_set_device_id(dev-config, hdev-device_id);
 +
 +xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)rid, 1);
 +
 +pci_config_set_revision(dev-config, rid);
 +
 +XEN_PT_LOG(dev, The pseudo Intel PCH ISA bridge created.\n);
 +return 0;
 +}
 
 Then I don't see how it's supposed to work.
 Doesn't i915 look for an isa bridge?
 
  /*
   * The reason to probe ISA bridge instead of Dev31:Fun0 is to
   * make graphics device passthrough work easy for VMM, that only
   * need to expose ISA bridge to let driver know the real hardware
   * underneath. This is a requirement from virtualization team.
   *
   * In some virtualized environments (e.g. XEN), there is irrelevant
   * ISA bridge in the system. To work reliably, we should scan 
  trhough
   * all the ISA bridge devices and check for

Re: [Qemu-devel] [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-06-30 Thread Michael S. Tsirkin

On Mon, Jun 30, 2014 at 01:18:51PM +0200, Paolo Bonzini wrote:
 Il 30/06/2014 12:20, Chen, Tiejun ha scritto:
 
 I already post this to mainline to change as follows:
 
 -while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA  8, pch))) {
 +pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0));
 +if (pch) {
 
 Please refer to this,
 
 [RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead
 of check class type
 
 Linux Native guys would like to accept this. And actually Windows always
 use devfn to detect this.
 
 Fair enough, but that means that, when using IGD, Q35 will have to move the
 ISA bridge off 1f.0.
 
 To me it seems fairly clear that as things stand IGD is not virtualizable
 without PV support.  We're beating a dead horse.
 
 Paolo

It seems virtualizeable without PV.  Virtualizing it just requires
emulating a chipset that's closer to what the driver expects,
which seems to be more effort than Tiejun is prepared to put in.

-- 
MST

Re: [Qemu-devel] [PATCH] qemu-char: Convert socket char backend to parse/kind

2014-06-30 Thread Peter Maydell

On 30 June 2014 11:57, Gerd Hoffmann kra...@redhat.com wrote:
 IIRC wait for client to connect is a blocking accept() call.  Which
 you certainly don't want do in a qmp command handler.  So if we switch
 over chardevs created via -chardev to use the qmp init code path too we
 need some hackery to make '-chardev socket,wait,...' work without
 introducing a blocking qmp command I suspect ...

We already have blocking code paths in qmp_chardev_open_socket():
for the non-listening case we call socket_connect() with a NULL callback
argument, which will result in our calling connect() in blocking mode.

thanks
-- PMM

Re: [Qemu-devel] [RFC 10/14] vga: Remove some should be done in BIOS comments

2014-06-30 Thread Gerd Hoffmann

On Di, 2014-06-24 at 09:11 +1000, Benjamin Herrenschmidt wrote:
 Not all platforms have a VGA BIOS, powerpc typically relies on
 using the DISPI interface to initialize the card.

s/bios/guest/ and it would hold ;)

But the dispi interface is defined the way it is.  We are not going to
change it, even if it isn't perfect, as it is guest/host abi.  So the
comments are not very useful indeed ...

cheers,
  Gerd

Re: [Qemu-devel] [PATCH v8 03/14] qcow2: Optimize bdrv_make_empty()

2014-06-30 Thread Kevin Wolf

Am 07.06.2014 um 20:51 hat Max Reitz geschrieben:
 bdrv_make_empty() is currently only called if the current image
 represents an external snapshot that has been committed to its base
 image; it is therefore unlikely to have internal snapshots. In this
 case, bdrv_make_empty() can be greatly sped up by creating an empty L1
 table and dropping all data clusters at once by recreating the refcount
 structure accordingly instead of normally discarding all clusters.
 
 If there are snapshots, fall back to the simple implementation (discard
 all clusters).
 
 Signed-off-by: Max Reitz mre...@redhat.com
 Reviewed-by: Eric Blake ebl...@redhat.com

This approach looks a bit too complicated to me, and calulating the
required metadata size seems error-prone.

How about this:

1. Set the dirty flag in the header so we can mess with the L1 table
   without keeping the refcounts consistent

2. Overwrite the L1 table with zeros

3. Overwrite the first n clusters after the header with zeros
   (n = 2 + l1_clusters).

4. Update the header:
   refcount_table_offset = cluster_size
   refcount_table_clusters = 1
   l1_table_offset = 3 * cluster_size

6. bdrv_truncate to n + 1 clusters

7. Now update the first 8 bytes at cluster_size (the first new refcount
   table entry) to point to 2 * cluster_size (new refcount block)

8. Reset refcount block and L2 cache

9. Allocate n + 1 clusters (the header, too) and make sure you get
   offset 0

10. Remove the dirty flag

Surprisingly (or not) this is much like an ordinary image creation. The
main difference is that we keep the full size of the L1 table so the
image stays always valid (the spec would even allow us to temporarily
set l1_size = 0, but qcow2_open() doesn't seem to like that) and all
areas where the L1 table could be are zeroed (this includes the new
refcount table/block until the header is updated).


I wanted to check whether this would still give the preallocation=full
series what it needs, but a v11 doesn't seem to be on the list yet and
v10 doesn't have the dependency on this series yet.

Kevin

Re: [Qemu-devel] [questions] about KVM as a Microsoft-compatible hypervisor

2014-06-30 Thread Vadim Rozenfeld

On Mon, 2014-06-30 at 06:19 -0400, Jidong Xiao wrote:
 On Mon, Jun 30, 2014 at 6:02 AM, Vadim Rozenfeld vroze...@redhat.com wrote:
  On Mon, 2014-06-30 at 09:39 +0800, Zhang Haoyu wrote:
  Hi, Vadim
  I read the kvm-2012-forum paper  KVM as a Microsoft-compatible 
  hypervisor,
  Any update and other references, please?
 
  Thanks,
  Zhang Haoyu
 
 
 
  Unfortunately, not too much.
  From the the most recent, we have lazy eoi implemented by
  MST and reference time counter.
 
  Best regards,
  Vadim.
 
 It looks like that Mircosoft has defined a large number of synthetic
 registers in their Hyper-v spec, so ultimately KVM should virtualize
 most of these registers, so as to support the Mircosoft Enlightment,
 right?
 
 -Jidong

Yes, but you don't have to support all the Hyper-V features at once. 
Hypervisor declares supported feature by specifying appropriate flags
in Feature identification (0x4003) and Implementation
recommendations (0x4004)CPUID leaves.

Best regards,
Vadim.

 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [RFC 13/14] vga: Add endian control register

2014-06-30 Thread Gerd Hoffmann

  Hi,

 diff --git a/hw/display/vga_int.h b/hw/display/vga_int.h
 index ae64321..894c6ab 100644
 --- a/hw/display/vga_int.h
 +++ b/hw/display/vga_int.h
 @@ -47,6 +47,8 @@
  #define VBE_DISPI_INDEX_Y_OFFSET0x9
  #define VBE_DISPI_INDEX_NB  0xa /* size of vbe_regs[] */
  #define VBE_DISPI_INDEX_VIDEO_MEMORY_64K 0xa /* read-only, not in vbe_regs */
 +#define VBE_DISPI_INDEX_EXTENDED_CAPS   0xb /* read-only, not in vbe_regs */
 +#define VBE_DISPI_INDEX_ENDIAN_CTRL 0xc /* not in vbe_regs */
  
  #define VBE_DISPI_ID0   0xB0C0
  #define VBE_DISPI_ID1   0xB0C1
 @@ -55,13 +57,22 @@
  #define VBE_DISPI_ID4   0xB0C4
  #define VBE_DISPI_ID5   0xB0C5

I was more thinking to add ID6 to indicate the new interface revision
with the additional VBE_DISPI_INDEX_ENDIAN_CTRL register.

I'm a bit worried that there is no response from the bochs guys yet, I
don't want have two incompatible rev6 interfaces.  At least nobody seems
to have defined one so far, google finds nothing for bochs dispi
0xB0C6.

cheers,
  Gerd

Re: [Qemu-devel] [questions] about KVM as a Microsoft-compatiblehypervisor

2014-06-30 Thread Zhang Haoyu

 Hi, Vadim
 I read the kvm-2012-forum paper  KVM as a Microsoft-compatible hypervisor, 
 Any update and other references, please?
 
 Thanks,
 Zhang Haoyu
 
 

Unfortunately, not too much.
From the the most recent, we have lazy eoi implemented by 
MST and reference time counter.

How to get the source of windows pv-eoi?
And what is reference time counter, could you provide some references or 
code, please?

Thanks,
Zhang Haoyu

Best regards,
Vadim.

[Qemu-devel] [PATCH 0/5] tests: Add the image fuzzer with qcow2 support

2014-06-30 Thread Maria Kustova

This patch series introduces the image fuzzer, a tool for stability and
reliability testing.
Its approach is to run large amount of tests in background. During every test a
program (e.g. qemu-img) is called to read or modify an invalid test image.
A test image has valid inner structure defined by its format specification with
some fields having random invalid values.

Patch 1 contains documentation for the image fuzzer, patch 2 is the test runner
and remaining ones relate to the image generator for qcow2 format.

Maria Kustova (5):
  docs: Specification for the image fuzzer
  runner: Tool for fuzz tests execution
  fuzz: Fuzzing functions for qcow2 images
  layout: Generator of fuzzed qcow2 images
  package: Public API for image-fuzzer/runner/runner.py

 tests/image-fuzzer/docs/image-fuzzer.txt | 176 +
 tests/image-fuzzer/qcow2/__init__.py |   1 +
 tests/image-fuzzer/qcow2/fuzz.py | 329 +++
 tests/image-fuzzer/qcow2/layout.py   | 250 +++
 tests/image-fuzzer/runner/runner.py  | 270 +
 5 files changed, 1026 insertions(+)
 create mode 100644 tests/image-fuzzer/docs/image-fuzzer.txt
 create mode 100644 tests/image-fuzzer/qcow2/__init__.py
 create mode 100644 tests/image-fuzzer/qcow2/fuzz.py
 create mode 100644 tests/image-fuzzer/qcow2/layout.py
 create mode 100755 tests/image-fuzzer/runner/runner.py

-- 
1.9.3

[Qemu-devel] [PATCH 1/5] docs: Specification for the image fuzzer

2014-06-30 Thread Maria Kustova

'Overall fuzzer requirements' chapter contains the current product vision and
features done and to be done. This chapter is still in progress.

Signed-off-by: Maria Kustova mari...@catit.be
---
 tests/image-fuzzer/docs/image-fuzzer.txt | 176 +++
 1 file changed, 176 insertions(+)
 create mode 100644 tests/image-fuzzer/docs/image-fuzzer.txt

diff --git a/tests/image-fuzzer/docs/image-fuzzer.txt 
b/tests/image-fuzzer/docs/image-fuzzer.txt
new file mode 100644
index 000..a9ed4b6
--- /dev/null
+++ b/tests/image-fuzzer/docs/image-fuzzer.txt
@@ -0,0 +1,176 @@
+Image fuzzer
+
+
+Description
+---
+
+The goal of the image fuzzer is to catch crashes of qemu-io/qemu-img providing
+to them randomly corrupted images.
+Test images are generated from scratch and have valid inner structure with some
+elements, e.g. L1/L2 tables, having random invalid values.
+
+
+Test runner
+---
+
+The test runner generates test images, executes tests utilizing generated
+images, indicates their results and collect all test related artifacts (logs,
+core dumps, test images).
+The test means one start of a system under test (SUT), e.g. qemu-io, with
+specified arguments and one test image.
+By default, the test runner generates new tests and executes them till
+keyboard interruption. But if a test seed is specified via '-s' runner
+parameter, then only one test with this seed will be executed, after its finish
+the runner will exit.
+
+The runner uses an external image fuzzer to generate test images. An image
+generator should be specified as a mandatory parameter of the test runner.
+Details about interactions between the runner and fuzzers see Module
+interfaces.
+
+The runner activates generation of core dumps during test executions, but it
+assumes that core dumps will be generated in the current working directory.
+For comprehensive test results, please, set up your test environment
+properly.
+
+Path to a SUT can be specified via environment variables (for now only
+qemu-img) or via '--binary' parameter of the test runner. For details about
+environment variables see qemu-iotests/check.
+
+
+Qcow2 image generator
+-
+
+The 'qcow2' generator is a Python package providing 'create_image' method as
+a single public API. See details in 'Test runner/image fuzzer' in 'Module
+interfaces'.
+
+Qcow2 contains two submodules: fuzz.py and layout.py.
+
+'fuzz.py' contains all fuzzing functions one per image field. It's assumed that
+after code analysis every field will have own constraints for its value. For
+now only universal potentially dangerous values are used, e.g. type limits for
+integers or unsafe symbols as '%s' for strings. For bitmasks random amount of
+bits are set to ones. All fuzzed values are checked on non-equality to the
+current valid value of the field. In case of equality the value will be
+regenerated.
+
+'layout.py' creates a random valid image, fuzzes a random subset of the image
+fields by 'fuzz.py' module and writes a fuzzed image to the file specified.
+
+For now only header fields and header extensions are generated, the remaining
+file is filled with zeros.
+
+
+Module interfaces
+-
+
+* Test runner/image fuzzer
+
+The runner calls an image generator specifying path to a test image file.
+An image generator is expected to provide 'create_image(test_img_path)' method
+that creates a test image and writes it to the specified file. The file should
+be created if it doesn't exist or overwritten otherwise.
+Random seed is set by the runner at every test execution for the regression
+purpose, so an image generator is not recommended to modify it internally.
+
+* Test runner/SUT
+
+A full test command is composed from the SUT, its arguments specified
+via '-c' runner parameter and the name of generated image. To indicate where
+a test image name should be placed TEST_IMG placeholder can be used.
+For example, for the next test command
+
+ qemu-img convert -O vmdk fuzzed_image test.vmdk
+
+a call of the image fuzzer will be following:
+
+ ./runner.py -b qemu-img -c 'convert -O vmdk TEST_IMG test.vmdk' work_dir qcow2
+
+
+Overall fuzzer requirements
+===
+
+Input data:
+--
+
+ - image template (generator)
+ - work directory
+ - test run duration (optional)
+ - action vector (optional)
+ - seed (optional)
+ - SUT and its arguments (optional)
+
+
+Fuzzer requirements:
+---
+
+1.  Should be able to inject random data
+2.  Should be able to select a random value from the manually pregenerated
+vector (boundary values, e.g. max/min cluster size)
+3.  Image template should describe a general structure invariant for all
+test images (image format description)
+4.  Image template should be autonomous and other fuzzer parts should not
+relate on it
+5.  Image template should contain reference rules (not only block+size
+description)
+6.  Should generate the test image with the

[Qemu-devel] [PATCH 4/5] layout: Generator of fuzzed qcow2 images

2014-06-30 Thread Maria Kustova

Layout submodule of qcow2 package creates a random valid image, randomly
selects some amount of its fields, fuzzes them and write the fuzzed image to
the file.
Now only header and header extensions are generated, remaining file is filled
by zeroes.

Signed-off-by: Maria Kustova mari...@catit.be
---
 tests/image-fuzzer/qcow2/layout.py | 250 +
 1 file changed, 250 insertions(+)
 create mode 100644 tests/image-fuzzer/qcow2/layout.py

diff --git a/tests/image-fuzzer/qcow2/layout.py 
b/tests/image-fuzzer/qcow2/layout.py
new file mode 100644
index 000..b296979
--- /dev/null
+++ b/tests/image-fuzzer/qcow2/layout.py
@@ -0,0 +1,250 @@
+# Generator of fuzzed qcow2 images
+#
+# Copyright (C) 2014 Maria Kustova mari...@catit.be
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+#
+
+import random
+import struct
+import fuzz
+from itertools import repeat
+import copy
+
+MAX_IMAGE_SIZE = 10*2**20
+# Standard sizes
+UINT32_S = 4
+UINT64_S = 8
+
+# Percentage of fields will be fuzzed
+BIAS = random.uniform(0.1, 0.4)
+
+
+class Field(object):
+Describes an image field as a triplet of a data format necessary for
+packing, an offset to the beginning of the image and value of the field.
+
+Can be iterated as a list [format, offset, value]
+
+__slots__ = ('fmt', 'offset', 'value')
+
+def __init__(self, fmt, offset, val):
+self.fmt = fmt
+self.offset = offset
+self.value = val
+
+def __iter__(self):
+return (x for x in [self.fmt, self.offset, self.value])
+
+def __repr__(self):
+return Field(fmt='%s', offset=%d, value=%s) % (self.fmt, self.offset,
+ str(self.value))
+
+
+def walk(v_struct, func):
+Walk via structure and apply the specified function to all Field()
+elements
+
+if isinstance(v_struct, list):
+for item in v_struct:
+walk(item, func)
+else:
+for k, v in v_struct.items():
+if isinstance(v, list):
+walk(v, func)
+else:
+func(v, k)
+
+
+def fuzz_struct(structure):
+Select part of fields in the specified structure and assign them invalid
+values
+
+
+def coin():
+Return boolean value proportional to a portion of fields to be
+fuzzed
+
+return random.random()  BIAS
+
+def iter_fuzz(field, name):
+Fuzz field value if it's selected 
+if coin():
+field.value = getattr(fuzz, name)(field.value)
+
+tmp = copy.deepcopy(structure)
+walk(tmp, iter_fuzz)
+
+return tmp
+
+
+def image_size():
+Generate a random file size aligned to a random correct cluster size
+cluster_bits = random.randrange(9, 21)
+cluster_size = 1  cluster_bits
+file_size = random.randrange(5*cluster_size, MAX_IMAGE_SIZE + 1,
+ cluster_size)
+return [cluster_bits, file_size]
+
+
+def header(cluster_bits, img_size):
+Generate a random valid header
+meta_header = [
+['4s', 0, 'magic'],
+['I', 4, 'version'],
+['Q', 8, 'backing_file_offset'],
+['I', 16, 'backing_file_size'],
+['I', 20, 'cluster_bits'],
+['Q', 24, 'size'],
+['I', 32, 'crypt_method'],
+['I', 36, 'l1_size'],
+['Q', 40, 'l1_table_offset'],
+['Q', 48, 'refcount_table_offset'],
+['I', 56, 'refcount_table_clusters'],
+['I', 60, 'nb_snapshots'],
+['Q', 64, 'snapshots_offset'],
+['Q', 72, 'incompatible_features'],
+['Q', 80, 'compatible_features'],
+['Q', 88, 'autoclear_features'],
+['I', 96, 'refcount_order'],
+['I', 100, 'header_length']
+]
+values = repeat(0)
+v_header = dict((f[2], Field(f[0], f[1], values.next()))
+for f in meta_header)
+
+# Setup of valid values
+v_header['magic'].value = QFI\xfb
+v_header['version'].value = random.randint(2, 3)
+v_header['cluster_bits'].value = cluster_bits
+v_header['size'].value = img_size
+if v_header['version'].value == 2:
+v_header['header_length'].value = 72
+else:
+v_header['incompatible_features'].value = random.getrandbits(2)
+v_header['compatible_features'].value = random.getrandbits(1)
+v_header['header_length'].value =

[Qemu-devel] [PATCH 2/5] runner: Tool for fuzz tests execution

2014-06-30 Thread Maria Kustova

The purpose of the test runner is to prepare test environment (e.g. create a
work directory, a test image, etc), execute the program under test with
parameters, indicate a test failure if the program was killed during test
execution and collect core dumps, logs and other test artifacts.

The test runner doesn't depend on image format or a program will be tested, so
it can be used with any external image generator and program under test.

Signed-off-by: Maria Kustova mari...@catit.be
---
 tests/image-fuzzer/runner/runner.py | 270 
 1 file changed, 270 insertions(+)
 create mode 100755 tests/image-fuzzer/runner/runner.py

diff --git a/tests/image-fuzzer/runner/runner.py 
b/tests/image-fuzzer/runner/runner.py
new file mode 100755
index 000..21de78e
--- /dev/null
+++ b/tests/image-fuzzer/runner/runner.py
@@ -0,0 +1,270 @@
+#!/usr/bin/env python
+
+# Tool for running fuzz tests
+#
+# Copyright (C) 2014 Maria Kustova mari...@catit.be
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+#
+
+import sys, os, signal
+from time import time
+import subprocess
+import random
+from itertools import count
+from shutil import rmtree
+import getopt
+import resource
+resource.setrlimit(resource.RLIMIT_CORE, (-1, -1))
+
+
+def multilog(msg, *output):
+ Write an object to all of specified file descriptors
+
+
+for fd in output:
+fd.write(msg)
+fd.flush()
+
+
+def str_signal(sig):
+ Convert a numeric value of a system signal to the string one
+defined by the current operational system
+
+
+for k, v in signal.__dict__.items():
+if v == sig:
+return k
+
+
+class TestException(Exception):
+Exception for errors risen by TestEnv objects
+pass
+
+
+class TestEnv(object):
+ Trivial test object
+
+The class sets up test environment, generates a test image and executes
+application under tests with specified arguments and a test image provided.
+All logs are collected.
+Summary log will contain short descriptions and statuses of tests in
+a run.
+Test log will include application (e.g. 'qemu-img') logs besides info sent
+to the summary log.
+
+
+def __init__(self, test_id, seed, work_dir, run_log, exec_bin=None,
+ cleanup=True, log_all=False):
+Set test environment in a specified work directory.
+
+Path to qemu_img will be retrieved from 'QEMU_IMG' environment
+variable, if a test binary is not specified.
+
+
+if seed is not None:
+self.seed = seed
+else:
+self.seed = hash(time())
+
+self.init_path = os.getcwd()
+self.work_dir = work_dir
+self.current_dir = os.path.join(work_dir, 'test-' + test_id)
+if exec_bin is not None:
+self.exec_bin = exec_bin.strip().split(' ')
+else:
+self.exec_bin = \
+os.environ.get('QEMU_IMG', 'qemu-img').strip().split(' ')
+
+try:
+os.makedirs(self.current_dir)
+except OSError:
+e = sys.exc_info()[1]
+print sys.stderr, \
+Error: The working directory '%s' cannot be used. Reason: %s\
+% (self.work_dir, e[1])
+raise TestException
+self.log = open(os.path.join(self.current_dir, test.log), w)
+self.parent_log = open(run_log, a)
+self.result = False
+self.cleanup = cleanup
+self.log_all = log_all
+
+def _test_app(self, q_args):
+ Start application under test with specified arguments and return
+an exit code or a kill signal depending on result of an execution.
+
+devnull = open('/dev/null', 'r+')
+return subprocess.call(self.exec_bin + q_args,
+   stdin=devnull, stdout=self.log, stderr=self.log)
+
+def execute(self, q_args):
+ Execute a test.
+
+The method creates a test image, runs test app and analyzes its exit
+status. If the application was killed by a signal, the test is marked
+as failed.
+
+os.chdir(self.current_dir)
+# Seed initialization should be as close to image generation call
+# as posssible to avoid a corruption of random sequence
+random.seed(self.seed)
+image_generator.create_image('test_image')
+

Re: [Qemu-devel] [RFC 14/14] ppc/spapr/vga: Switch VGA endian on H_SET_MODE

2014-06-30 Thread Gerd Hoffmann

On Di, 2014-06-24 at 09:11 +1000, Benjamin Herrenschmidt wrote:
 When the guest switches the interrupt endian mode, which essentially
 means a global machine endian switch, we want to change the VGA
 framebuffer endian mode as well in order to be backward compatible
 with existing guests who don't know about the new endian control
 register.

I'd tend to add a notifier (see include/qemu/notify.h) for endian
switches instead, so anybody interested could register himself.

cheers,
  Gerd

[Qemu-devel] [PATCH 3/5] fuzz: Fuzzing functions for qcow2 images

2014-06-30 Thread Maria Kustova

Fuzz submodule of qcow2 image generator contains fuzzing functions for image
fields.
Each fuzzing function contains list of constraints and call of a helper
function that randomly selects a fuzzed value satisfied to one of constraints.
For now constraints are only known as invalid or potentially dangerous values.
But after investigation of code coverage by fuzz tests they will be expanded
by heuristic values based on inner checks and flows of a program under test.

Now fuzzing of header and header extensions is only supported.

Signed-off-by: Maria Kustova mari...@catit.be
---
 tests/image-fuzzer/qcow2/fuzz.py | 329 +++
 1 file changed, 329 insertions(+)
 create mode 100644 tests/image-fuzzer/qcow2/fuzz.py

diff --git a/tests/image-fuzzer/qcow2/fuzz.py b/tests/image-fuzzer/qcow2/fuzz.py
new file mode 100644
index 000..a81296e
--- /dev/null
+++ b/tests/image-fuzzer/qcow2/fuzz.py
@@ -0,0 +1,329 @@
+# Fuzzing functions for qcow2 fields
+#
+# Copyright (C) 2014 Maria Kustova mari...@catit.be
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+#
+
+import random
+
+
+UINT32 = 0x
+UINT64 = 0x
+# Most significant bit orders
+UINT32_M = 31
+UINT64_M = 63
+# Fuzz vectors
+UINT32_V = [0, 0x100, 0x1000, 0x1, 0x10, UINT32/4, UINT32/2 - 1,
+UINT32/2, UINT32/2 + 1, UINT32 - 1, UINT32]
+UINT64_V = UINT32_V + [0x100, 0x1000, 0x1, UINT64/4,
+   UINT64/2 - 1, UINT64/2, UINT64/2 + 1, UINT64 - 1,
+   UINT64]
+STRING_V = ['%s%p%x%d', '.1024d', '%.2049d', '%p%p%p%p', '%x%x%x%x',
+'%d%d%d%d', '%s%s%s%s', '%999s', '%08x', '%%20d', '%%20n',
+'%%20x', '%%20s', '%s%s%s%s%s%s%s%s%s%s', '%p%p%p%p%p%p%p%p%p%p',
+'%#0123456x%08x%x%s%p%d%n%o%u%c%h%l%q%j%z%Z%t%i%e%g%f%a%C%S%08x%%',
+'%s x 129', '%x x 257']
+
+
+def random_from_intervals(intervals):
+Select a random integer number from the list of specified intervals
+
+Each interval is a tuple of lower and upper limits of the interval. The
+limits are included. Intervals in a list should not overlap.
+
+total = reduce(lambda x, y: x + y[1] - y[0] + 1, intervals, 0)
+r = random.randint(0, total-1) + intervals[0][0]
+temp = zip(intervals, intervals[1:])
+for x in temp:
+r = r + (r  x[0][1])*(x[1][0] - x[0][1] - 1)
+return r
+
+
+def random_bits(bit_ranges):
+Generate random binary mask with ones in the specified bit ranges
+
+Each bit_ranges is a list of tuples of lower and upper limits of bit
+positions will be fuzzed. The limits are included. Random amount of bits
+in range limits will be set to ones. The mask is returned in decimal
+integer format.
+
+bit_numbers = []
+# Select random amount of random positions in bit_ranges
+for rng in bit_ranges:
+bit_numbers += random.sample(range(rng[0], rng[1] + 1),
+ random.randint(0, rng[1] - rng[0] + 1))
+val = 0
+# Set bits on selected possitions to ones
+for bit in bit_numbers:
+val |= 1  bit
+return val
+
+
+def truncate_string(strings, length):
+Return strings truncated to specified length
+if type(strings) == list:
+return [s[length:] for s in strings]
+else:
+return strings[length:]
+
+
+def int_validator(current, intervals):
+Return a random value from intervals not equal to the current.
+
+This function is useful for selection from valid values except current one.
+
+val = random_from_intervals(intervals)
+if val == current:
+return int_validator(current, intervals)
+else:
+return val
+
+
+def bit_validator(current, bit_ranges):
+Return a random bit mask not equal to the current.
+
+This function is useful for selection from valid values except current one.
+
+
+val = random_bits(bit_ranges)
+if val == current:
+return bit_validator(current, bit_ranges)
+else:
+return val
+
+
+def string_validator(current, strings):
+Return a random string value from the list not equal to the current.
+
+This function is useful for selection from valid values except current one.
+
+
+val = random.choice(strings)
+if val == current:
+return string_validator(current, strings)
+else:
+

Re: [Qemu-devel] [PATCH] virtio: move common virtio properties to bus class device

2014-06-30 Thread Michael S. Tsirkin

On Mon, Jun 30, 2014 at 06:15:35PM +0800, Ming Lei wrote:
 Hi Michael,
 
 
 On Wed, Jun 18, 2014 at 3:13 PM, Ming Lei ming@canonical.com wrote:
  The two common virtio features can be defined per bus, so move all
  into bus class device to make code more clean.
 
  As discussed with cornelia, s390-virtio-blk doesn't support
  the two features at all, so keep s390-virtio as it.
 
  Acked-by: Cornelia Huck cornelia.h...@de.ibm.com #for s390 ccw
  Suggested-by: Paolo Bonzini pbonz...@redhat.com
  Signed-off-by: Ming Lei ming@canonical.com
  ---
   hw/s390x/s390-virtio-bus.c  |2 ++
   hw/s390x/virtio-ccw.c   |   11 ++-
   hw/virtio/virtio-pci.c  |   12 ++--
   include/hw/virtio/virtio-blk.h  |3 ---
   include/hw/virtio/virtio-net.h  |1 -
   include/hw/virtio/virtio-scsi.h |1 -
   6 files changed, 14 insertions(+), 16 deletions(-)
 
  diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
  index 9c71afa..921cdc2 100644
  --- a/hw/s390x/s390-virtio-bus.c
  +++ b/hw/s390x/s390-virtio-bus.c
  @@ -504,6 +504,7 @@ static unsigned virtio_s390_get_features(DeviceState *d)
 
   static Property s390_virtio_net_properties[] = {
   DEFINE_NIC_PROPERTIES(VirtIONetS390, vdev.nic_conf),
  +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features),
   DEFINE_VIRTIO_NET_FEATURES(VirtIOS390Device, host_features),
   DEFINE_VIRTIO_NET_PROPERTIES(VirtIONetS390, vdev.net_conf),
   DEFINE_PROP_END_OF_LIST(),
  @@ -631,6 +632,7 @@ static const TypeInfo virtio_s390_device_info = {
 
   static Property s390_virtio_scsi_properties[] = {
   DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOSCSIS390, vdev.parent_obj.conf),
  +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features),
   DEFINE_VIRTIO_SCSI_FEATURES(VirtIOS390Device, host_features),
   DEFINE_PROP_END_OF_LIST(),
   };
  diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
  index 2bf0af8..c0124e1 100644
  --- a/hw/s390x/virtio-ccw.c
  +++ b/hw/s390x/virtio-ccw.c
  @@ -1126,7 +1126,6 @@ static const TypeInfo virtio_ccw_net = {
 
   static Property virtio_ccw_blk_properties[] = {
   DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id),
  -DEFINE_VIRTIO_BLK_FEATURES(VirtioCcwDevice, host_features[0]),
   DEFINE_VIRTIO_BLK_PROPERTIES(VirtIOBlkCcw, blk),
   DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags,
   VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true),
  @@ -1158,7 +1157,6 @@ static const TypeInfo virtio_ccw_blk = {
   static Property virtio_ccw_serial_properties[] = {
   DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id),
   DEFINE_VIRTIO_SERIAL_PROPERTIES(VirtioSerialCcw, vdev.serial),
  -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]),
   DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags,
   VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true),
   DEFINE_PROP_END_OF_LIST(),
  @@ -1185,7 +1183,6 @@ static const TypeInfo virtio_ccw_serial = {
 
   static Property virtio_ccw_balloon_properties[] = {
   DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id),
  -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]),
   DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags,
   VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true),
   DEFINE_PROP_END_OF_LIST(),
  @@ -1242,7 +1239,6 @@ static const TypeInfo virtio_ccw_scsi = {
   static Property vhost_ccw_scsi_properties[] = {
   DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id),
   DEFINE_VHOST_SCSI_PROPERTIES(VirtIOSCSICcw, vdev.parent_obj.conf),
  -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]),
   DEFINE_PROP_END_OF_LIST(),
   };
 
  @@ -1279,7 +1275,6 @@ static void virtio_ccw_rng_instance_init(Object *obj)
 
   static Property virtio_ccw_rng_properties[] = {
   DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id),
  -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]),
   DEFINE_VIRTIO_RNG_PROPERTIES(VirtIORNGCcw, vdev.conf),
   DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags,
   VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true),
  @@ -1345,10 +1340,16 @@ static int virtio_ccw_busdev_unplug(DeviceState 
  *dev)
   return 0;
   }
 
  +static Property virtio_ccw_properties[] = {
  +DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]),
  +DEFINE_PROP_END_OF_LIST(),
  +};
  +
   static void virtio_ccw_device_class_init(ObjectClass *klass, void *data)
   {
   DeviceClass *dc = DEVICE_CLASS(klass);
 
  +dc-props = virtio_ccw_properties;
   dc-init = virtio_ccw_busdev_init;
   dc-exit = virtio_ccw_busdev_exit;
   dc-unplug = virtio_ccw_busdev_unplug;
  diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
  index b437f19..af2e1c3 100644
  --- a/hw/virtio/virtio-pci.c
  +++ b/hw/virtio/virtio-pci.c
  @@ -917,7 +917,6 @@ static Property virtio_9p_pci_properties[] = {

[Qemu-devel] [PATCH 5/5] package: Public API for image-fuzzer/runner/runner.py

2014-06-30 Thread Maria Kustova

__init__.py provides the public API required by the test runner

Signed-off-by: Maria Kustova mari...@catit.be
---
 tests/image-fuzzer/qcow2/__init__.py | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 tests/image-fuzzer/qcow2/__init__.py

diff --git a/tests/image-fuzzer/qcow2/__init__.py 
b/tests/image-fuzzer/qcow2/__init__.py
new file mode 100644
index 000..e2ebe19
--- /dev/null
+++ b/tests/image-fuzzer/qcow2/__init__.py
@@ -0,0 +1 @@
+from layout import create_image
-- 
1.9.3

Re: [Qemu-devel] [RFC 00/14] VGA cleanups and endian control

2014-06-30 Thread Gerd Hoffmann

On Di, 2014-06-24 at 09:10 +1000, Benjamin Herrenschmidt wrote:
 This series cleans up VGA and a bit of cirrus to remove all
 the now unused conversions to non-32bpp surfaces. Then the last
 two patches add a proposed variant of the endian control register
 and the (still somewhat controversial) trick to auto-switch VGA
 endian on powerpc SPAPR platforms.
 
 They apply on top of Gerd pixel format series

Looks good overall.  I'll go test this when I find time, after flushing
my email backlog ...

cheers,
  Gerd

Re: [Qemu-devel] [PATCH v5] ppc: spapr-rtas - implement os-term rtas call

2014-06-30 Thread Alexander Graf



On 30.06.14 11:25, Nikunj A Dadhania wrote:

Alexander Graf ag...@suse.de writes:


Am 30.06.2014 um 10:35 schrieb Nikunj A Dadhania nik...@linux.vnet.ibm.com:

+static void rtas_ibm_os_term(PowerPCCPU *cpu,
+sPAPREnvironment *spapr,
+uint32_t token, uint32_t nargs,
+target_ulong args,
+uint32_t nret, target_ulong rets)
+{
+target_ulong ret = 0;
+
+qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, error_abort);

The guest doesn't pause.

I see the event reaching libvirt and a dump taken and the guest
restarts.


Since the guest will call os-term in a loop, this will also flood the
event listener with lots and lots of panic messages.

do {
status = rtas_call(rtas_token(ibm,os-term), 1, 1, NULL,
   __pa(rtas_os_term_buf));
} while (rtas_busy_delay(status));

So when status from the rtas call is success, the loop should exit.

Am I missing something?


No, I think you're right. I'll queue it for 2.2.


Alex

Re: [Qemu-devel] [PATCH 1/2] target-ppc: Change default cpu for ppc64le-linux-user

2014-06-30 Thread Tom Musta

On 6/28/2014 1:42 PM, Richard Henderson wrote:
 On 06/28/2014 09:50 AM, Alexander Graf wrote:
 How about we switch to p7 for BE top?
 
 Not ideal until we implement all of p7's insns in TCG:
 
 Warning: Disabling some instructions which are not emulated by TCG (0x0, 0x4)
 
 
 r~
 

That was dealt with in this patch: 
http://lists.nongnu.org/archive/html/qemu-ppc/2014-06/msg00479.html and has now 
made its way into HEAD.

Re: [Qemu-devel] [PATCH] Makefile: Don't build generated headers before Makefile is reread

2014-06-30 Thread Paolo Bonzini


Il 28/06/2014 18:59, Peter Maydell ha scritto:

Having a direct dependency
   Makefile: $(GENERATED_HEADERS)
can result in not-from-clean builds failing sometimes, because it means
that when Make does its is any makefile or include out of date and
needing a rebuild? check, as well as possibly running configure (to
update config-host.mak) it will also rebuild GENERATED_HEADERS under
the old config-host.mak regime. If the makefile rules for rebuilding
the generated headers have changed in a way that means they're not
compatible with the old config-host.mak variable names, the build will
fail. Even if it does work, it's wasted effort since we'll need to
rebuild them with the right config-host.mak settings immediately.

Instead, make the generated headers depend on config-host.mak
and config-target.mak. This means we'll definitely rebuild them
if the configuration changes, but it will be done after Make
reloads its makefiles and includes so will happen with the
correct set of config-host.mak settings.

We rely on the various .o files having correct autogenerated
dependency rules to cause the generated headers to be generated
as and when they are needed.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
The specific example of this is the change from CONFIG_TRACE_BACKEND
to CONFIG_TRACE_BACKENDS, which I am still hitting on a pretty
regular basis. This patch fixes that problem and I don't think it
has any unpleasant side effects; Paolo, have I missed anything?

 Makefile| 8 +++-
 Makefile.target | 2 +-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 145adb6..dca33b4 100644
--- a/Makefile
+++ b/Makefile
@@ -546,11 +546,9 @@ ifdef SIGNCODE
 endif # SIGNCODE
 endif # CONFIG_WIN

-# Add a dependency on the generated files, so that they are always
-# rebuilt before other object files
-ifneq ($(filter-out %clean,$(MAKECMDGOALS)),$(if $(MAKECMDGOALS),,fail))
-Makefile: $(GENERATED_HEADERS)
-endif
+# Make the generated files depend on config-host.mak so that if
+# the configuration settings change we will rebuild them
+$(GENERATED_HEADERS): config-host.mak

 # Include automatically generated dependency files
 # Dependencies in Makefile.objs files come from our recursive subdir rules
diff --git a/Makefile.target b/Makefile.target
index 6089d29..ea8614a 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -198,4 +198,4 @@ ifdef CONFIG_TRACE_SYSTEMTAP
 endif

 GENERATED_HEADERS += config-target.h
-Makefile: $(GENERATED_HEADERS)
+$(GENERATED_HEADERS): config-target.mak config-devices.mak


config-devices.mak is not reflected in any C header file.  Apart from this,

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Paolo

Re: [Qemu-devel] [PATCH] ahci.c: mask unused flags when reading size PRDT DBC

2014-06-30 Thread Alexander Graf



On 29.06.14 20:21, reza.jel...@gmail.com wrote:

This requires a custom ovmf image with sata controller for testing [0]

[0]: http://reza.jelveh.me/assets/OVMF.fd.bz2



I guess this is supposed to be a cover letter? A few rules for cover 
letters:


  1) Cover letters only make sense for patch sets. This is a single 
patch, so you don't need one
  2) Because they come with patch sets and you number patch sets, cover 
letters are [PATCH 0/n].
  3) Usually cover letters contain git statistics. Try git format-patch 
--cover-letter. It will give you a nice template.


The usual way to add the comment you have here to a patch you're trying 
to submit it is to put volatile information (things that shouldn't end 
up in the git log) behind a --- line in the commit message. Everything 
after that line gets ignored by git when the patch gets applied.



Alex

Re: [Qemu-devel] [PATCH] ahci.c: mask unused flags when reading size PRDT DBC

2014-06-30 Thread Alexander Graf



On 29.06.14 20:21, reza.jel...@gmail.com wrote:

From: Reza Jelveh reza.jel...@tuhh.de


This is a hint that your git configuration isn't fully correct. If the 
email address git thinks you want to use is the same as the From: email 
address, it will not print this line. I suppose the problem is with the 
difference in @gmail.com and @tuhh.de?



The data byte count(DBC) read from the description information is defined for


bits


21:00.


Bits


30:22 are reserved and


bit


31 is the Interrupt on Completion (I) flag.

Interrupt is not implemented in QEMU.


They are implemented in QEMU, but incorrectly. We trigger a completion 
interrupt after every transaction, not every time we see the I bit in 
the PRDT.



tbl_entry_size is a signed integer and
improperly reading the DBC leads to a negative offset that causes sglist
allocation to fail.

Signed-off-by: Reza Jelveh reza.jel...@tuhh.de
---
  hw/ide/ahci.c | 12 +---
  hw/ide/ahci.h |  2 ++
  2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 9bae22e..93aa981 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -639,6 +639,12 @@ static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t 
*cmd_fis)
  }
  }
  
+static int prdt_tbl_entry_size(const AHCI_SG tbl)


The argument should still be a pointer.


+{
+return (le32_to_cpu(tbl.flags_size)  AHCI_PRDT_SIZE_MASK) + 1;
+}
+
+


There is still one whitespace line too much :).


Alex


  static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist, int 
offset)
  {
  AHCICmdHdr *cmd = ad-cur_cmd;
@@ -681,7 +687,7 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList 
*sglist, int offset)
  sum = 0;
  for (i = 0; i  sglist_alloc_hint; i++) {
  /* flags_size is zero-based */
-tbl_entry_size = (le32_to_cpu(tbl[i].flags_size) + 1);
+tbl_entry_size = prdt_tbl_entry_size(tbl[i]);
  if (offset = (sum + tbl_entry_size)) {
  off_idx = i;
  off_pos = offset - sum;
@@ -700,12 +706,12 @@ static int ahci_populate_sglist(AHCIDevice *ad, 
QEMUSGList *sglist, int offset)
  qemu_sglist_init(sglist, qbus-parent, (sglist_alloc_hint - off_idx),
   ad-hba-as);
  qemu_sglist_add(sglist, le64_to_cpu(tbl[off_idx].addr + off_pos),
-le32_to_cpu(tbl[off_idx].flags_size) + 1 - off_pos);
+prdt_tbl_entry_size(tbl[off_idx]) - off_pos);
  
  for (i = off_idx + 1; i  sglist_alloc_hint; i++) {

  /* flags_size is zero-based */
  qemu_sglist_add(sglist, le64_to_cpu(tbl[i].addr),
-le32_to_cpu(tbl[i].flags_size) + 1);
+prdt_tbl_entry_size(tbl[i]));
  }
  }
  
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h

index 9a4064f..f418b30 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -201,6 +201,8 @@
  
  #define AHCI_COMMAND_TABLE_ACMD0x40
  
+#define AHCI_PRDT_SIZE_MASK0x3f

+
  #define IDE_FEATURE_DMA1
  
  #define READ_FPDMA_QUEUED  0x60

Re: [Qemu-devel] [PATCH] Makefile: Don't build generated headers before Makefile is reread

2014-06-30 Thread Peter Maydell

On 30 June 2014 13:09, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 28/06/2014 18:59, Peter Maydell ha scritto:

 Having a direct dependency
Makefile: $(GENERATED_HEADERS)
 can result in not-from-clean builds failing sometimes, because it means
 that when Make does its is any makefile or include out of date and
 needing a rebuild? check, as well as possibly running configure (to
 update config-host.mak) it will also rebuild GENERATED_HEADERS under
 the old config-host.mak regime. If the makefile rules for rebuilding
 the generated headers have changed in a way that means they're not
 compatible with the old config-host.mak variable names, the build will
 fail. Even if it does work, it's wasted effort since we'll need to
 rebuild them with the right config-host.mak settings immediately.

 Instead, make the generated headers depend on config-host.mak
 and config-target.mak. This means we'll definitely rebuild them
 if the configuration changes, but it will be done after Make
 reloads its makefiles and includes so will happen with the
 correct set of config-host.mak settings.

 We rely on the various .o files having correct autogenerated
 dependency rules to cause the generated headers to be generated
 as and when they are needed.

 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
 The specific example of this is the change from CONFIG_TRACE_BACKEND
 to CONFIG_TRACE_BACKENDS, which I am still hitting on a pretty
 regular basis. This patch fixes that problem and I don't think it
 has any unpleasant side effects; Paolo, have I missed anything?

  Makefile| 8 +++-
  Makefile.target | 2 +-
  2 files changed, 4 insertions(+), 6 deletions(-)

 diff --git a/Makefile b/Makefile
 index 145adb6..dca33b4 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -546,11 +546,9 @@ ifdef SIGNCODE
  endif # SIGNCODE
  endif # CONFIG_WIN

 -# Add a dependency on the generated files, so that they are always
 -# rebuilt before other object files
 -ifneq ($(filter-out %clean,$(MAKECMDGOALS)),$(if $(MAKECMDGOALS),,fail))
 -Makefile: $(GENERATED_HEADERS)
 -endif
 +# Make the generated files depend on config-host.mak so that if
 +# the configuration settings change we will rebuild them
 +$(GENERATED_HEADERS): config-host.mak

  # Include automatically generated dependency files
  # Dependencies in Makefile.objs files come from our recursive subdir
 rules
 diff --git a/Makefile.target b/Makefile.target
 index 6089d29..ea8614a 100644
 --- a/Makefile.target
 +++ b/Makefile.target
 @@ -198,4 +198,4 @@ ifdef CONFIG_TRACE_SYSTEMTAP
  endif

  GENERATED_HEADERS += config-target.h
 -Makefile: $(GENERATED_HEADERS)
 +$(GENERATED_HEADERS): config-target.mak config-devices.mak


 config-devices.mak is not reflected in any C header file.  Apart from this,

 Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Do you mean ...and therefore should not be listed on the
RHS of this dependency ?

thanks
-- PMM

Re: [Qemu-devel] [PATCH 0/3] block: introduce submit I/O at batch

2014-06-30 Thread Ming Lei

Hi Paolo,

On Mon, Jun 30, 2014 at 7:10 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 30/06/2014 11:49, Ming Lei ha scritto:

 Hi,

 The commit 580b6b2aa2(dataplane: use the QEMU block layer for I/O)
 introduces ~40% throughput regression on virtio-blk dataplane, and
 one of causes is that submitting I/O at batch is removed.

 This patchset trys to introduce this mechanism on block, at least,
 linux-aio can benefit from that.

 With these patches, it is observed that thoughout on virtio-blk
 dataplane can be improved a lot, see data in commit log of patch
 3/3.

 It should be possible to apply the batch mechanism to other devices
 (such as virtio-scsi) too.


 The basic idea of the code is great, however I do not think it is necessary
 to add the logic in AioContext.  You are right that io_submit supports
 multiple I/O, but right now we have one io_context_t per BlockDriverState so
 it is somewhat premature.  Note that AioContext in QEMU means something else
 than io_context_t.

I added the io queue into AioContext because the io queue can only
be used in the attached context(or thread), that said the io queue has to
be put into per context instance.

Or could you suggest where we put the io queue for submitting at
batch?  At the beginning, I put it into bs, but looks like bdrv_io_plug
and bdrv_io_unplug have to handle bs-file and bs itself.


 I don't think it's necessary to walk backing_hd too (especially because
 we're close to release and this cannot be a regression---dataplane in 2.0
 didn't support backing files at all!).  We need to keep the patches as
 simple as possible.

OK, I will not consider backing_hd case in v1, also the patchset will
not only improve dataplane, and other devices should benefit from
it too.

These patches themselves should be simple, and most of code are
add-only, if bdrv_io_plug()/bdrv_io_unplug() isn't used, they are very close
to noop.

I just wrote a draft patch to apply the mechanism on virtio-scsi, and
the improvement is obvious.


 You can just add bdrv_plug/bdrv_unplug callbacks to block/linux-aio.c and,
 in block.c, something like

 BlockDriver *drv = bs-drv;
 if (drv  drv-bdrv_plug) {
 drv-bdrv_plug(bdrv);
 } else if (bs-file) {
 bdrv_plug(bs-file);
 }

 and place all the queuing logic in linux-aio.c.  It should be obvious to the
 reviewer that the patch only affects dataplane.

As I explained above, the io queue has to be per context(thread), so
I am wondering if something like above is simpler since linux-aio still
need to get context information from bs-aio_context.


 (Also, note that QEMU doesn't use __ as the prefix for internal function).

OK, I will follow the QEMU code style.

Thanks,
--
Ming Lei

Re: [Qemu-devel] [PATCH 1/2] target-ppc: Change default cpu for ppc64le-linux-user

2014-06-30 Thread Alexander Graf



On 30.06.14 14:08, Tom Musta wrote:

On 6/28/2014 1:42 PM, Richard Henderson wrote:

On 06/28/2014 09:50 AM, Alexander Graf wrote:

How about we switch to p7 for BE top?

Not ideal until we implement all of p7's insns in TCG:

Warning: Disabling some instructions which are not emulated by TCG (0x0, 0x4)


r~


That was dealt with in this patch: 
http://lists.nongnu.org/archive/html/qemu-ppc/2014-06/msg00479.html and has now 
made its way into HEAD.



Yup, I'll change the default for BE to POWER7 too while applying.


Alex

Re: [Qemu-devel] [questions] about KVM as a Microsoft-compatiblehypervisor

2014-06-30 Thread Vadim Rozenfeld

On Mon, 2014-06-30 at 19:45 +0800, Zhang Haoyu wrote:
  Hi, Vadim
  I read the kvm-2012-forum paper  KVM as a Microsoft-compatible 
  hypervisor, 
  Any update and other references, please?
  
  Thanks,
  Zhang Haoyu
  
  
 
 Unfortunately, not too much.
 From the the most recent, we have lazy eoi implemented by 
 MST and reference time counter.
 
 How to get the source of windows pv-eoi?
I'll be referencing to git://git.kernel.org/pub/scm/virt/kvm/kvm.git

for lazy eoi please take a look at commit:
b63cf42fd1d8c18fab71222321aaf356f63089c9

 And what is reference time counter, could you provide some references or 
 code, please?
Take a look at commit:
e984097b553ed2d6551c805223e4057421370f00

I also suggest reading Hypervisor Functional Specification 3.0a provided
by Microsoft and available for downloading
from  http://www.microsoft.com/en-au/download/details.aspx?id=39289

Best regards,
Vadim.

 
 Thanks,
 Zhang Haoyu
 
 Best regards,
 Vadim.

Re: [Qemu-devel] [PATCH 0/3] iotests: Fix qemu-iotests-quick.sh

2014-06-30 Thread Kevin Wolf

Am 27.06.2014 um 22:47 hat Max Reitz geschrieben:
 My previous series iotests: Allow out-of-tree run broke
 qemu-iotests-quick.sh. Fixing it means simplifying it and allowing more
 tests to be added to the quick group, which is what this series does. It
 also adds some unaffected tests to the quick group because I can't see a
 reason why not to.
 
 This series (or at least patch 2) depends on v3 of my series
 block: Fix unset \filename\ for certain drivers.

Reviewed-by: Kevin Wolf kw...@redhat.com

The conflict without 'block: Fix unset filename for certain drivers'
is trivial to resolve (drop the 097 line in tests/qemu-iotests/group),
so this can be merged even without the other series.

Kevin

Re: [Qemu-devel] [PATCH 2/2] target-ppc: Fix gdbstub for ppc64le-linux-user

2014-06-30 Thread Alexander Graf



On 28.06.14 18:45, Richard Henderson wrote:

The bswap that's needed for system mode isn't required for
user mode, and in fact breaks debugging.

Cc: Aldy Hernandez al...@redhat.com
Signed-off-by: Richard Henderson r...@twiddle.net


This breaks the Apple gdbstub backend we recently got in target-ppc. 
I'll fix it up while applying.



Alex

Re: [Qemu-devel] [PATCH 0/2] ppc64le-linux-user fixes

2014-06-30 Thread Alexander Graf



On 28.06.14 18:45, Richard Henderson wrote:

Two fixes needed to run and debug hello world.


Thanks, fixed up both patches and applied the to ppc-next (2.1 branch).


Alex

Re: [Qemu-devel] [PATCH v2 3/3] serial: poll the serial console with G_IO_HUP

2014-06-30 Thread Paolo Bonzini


Il 30/06/2014 13:00, Roger Pau Monné ha scritto:

Do I need to resend this? it's been more than a month without review.


I'll send a pull request for it.

Sorry for the delay.

Paolo


On 13/06/14 17:35, Roger Pau Monné wrote:

Ping?

On 23/05/14 17:57, Roger Pau Monne wrote:

On FreeBSD polling a master pty while the other end is not connected
with G_IO_OUT only results in an endless wait. This is different from
the Linux behaviour, that returns immediately. In order to demonstrate
this, I have the following example code:

http://xenbits.xen.org/people/royger/test_poll.c

When executed on Linux:

$ ./test_poll
In callback

On FreeBSD instead, the callback never gets called:

$ ./test_poll

So, in order to workaround this, poll the source with G_IO_HUP (which
makes the code behave the same way on both Linux and FreeBSD).

Signed-off-by: Roger Pau Monné roger@citrix.com
Cc: Peter Crosthwaite peter.crosthwa...@xilinx.com
Cc: Michael Tokarev m...@tls.msk.ru
Cc: Andreas Färber afaer...@suse.de
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: xen-de...@lists.xenproject.org
---
Changes since v1:
 - Fix other users of qemu_chr_fe_add_watch to use G_IO_HUP.
---
 hw/char/serial.c |2 +-
 hw/char/virtio-console.c |3 ++-
 hw/usb/redirect.c|2 +-
 monitor.c|2 +-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/char/serial.c b/hw/char/serial.c
index f4d167f..2a2c9e5 100644
--- a/hw/char/serial.c
+++ b/hw/char/serial.c
@@ -246,7 +246,7 @@ static gboolean serial_xmit(GIOChannel *chan, GIOCondition 
cond, void *opaque)
 serial_receive1(s, s-tsr, 1);
 } else if (qemu_chr_fe_write(s-chr, s-tsr, 1) != 1) {
 if (s-tsr_retry = 0  s-tsr_retry  MAX_XMIT_RETRY 
-qemu_chr_fe_add_watch(s-chr, G_IO_OUT, serial_xmit, s)  0) {
+qemu_chr_fe_add_watch(s-chr, G_IO_OUT|G_IO_HUP, serial_xmit, s)  
0) {
 s-tsr_retry++;
 return FALSE;
 }
diff --git a/hw/char/virtio-console.c b/hw/char/virtio-console.c
index 6c8be0f..38e290a 100644
--- a/hw/char/virtio-console.c
+++ b/hw/char/virtio-console.c
@@ -69,7 +69,8 @@ static ssize_t flush_buf(VirtIOSerialPort *port,
 if (!k-is_console) {
 virtio_serial_throttle_port(port, true);
 if (!vcon-watch) {
-vcon-watch = qemu_chr_fe_add_watch(vcon-chr, G_IO_OUT,
+vcon-watch = qemu_chr_fe_add_watch(vcon-chr,
+G_IO_OUT|G_IO_HUP,
 chr_write_unblocked, vcon);
 }
 }
diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index 287a505..06e757d 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -284,7 +284,7 @@ static int usbredir_write(void *priv, uint8_t *data, int 
count)
 r = qemu_chr_fe_write(dev-cs, data, count);
 if (r  count) {
 if (!dev-watch) {
-dev-watch = qemu_chr_fe_add_watch(dev-cs, G_IO_OUT,
+dev-watch = qemu_chr_fe_add_watch(dev-cs, G_IO_OUT|G_IO_HUP,
usbredir_write_unblocked, dev);
 }
 if (r  0) {
diff --git a/monitor.c b/monitor.c
index 593679a..ae1c539 100644
--- a/monitor.c
+++ b/monitor.c
@@ -304,7 +304,7 @@ void monitor_flush(Monitor *mon)
 mon-outbuf = tmp;
 }
 if (mon-watch == 0) {
-mon-watch = qemu_chr_fe_add_watch(mon-chr, G_IO_OUT,
+mon-watch = qemu_chr_fe_add_watch(mon-chr, G_IO_OUT|G_IO_HUP,
monitor_unblocked, mon);
 }
 }

Re: [Qemu-devel] [PATCH] qemu-char: Convert socket char backend to parse/kind

2014-06-30 Thread Paolo Bonzini


Il 30/06/2014 12:57, Gerd Hoffmann ha scritto:

On Mo, 2014-06-30 at 11:33 +0100, Peter Maydell wrote:

On 30 June 2014 11:23, Gerd Hoffmann kra...@redhat.com wrote:

  Hi,


 * The old qemu_chr_open_socket() has an
   if (!is_waitconnect)
   qemu_set_nonblock(fd);
   which the QMP char-open codepath has never had. Does this matter?
   Which of the two code paths was correct?


Gerd?


IIRC the socket is put into non-blocking mode anyway by the qemu socket
helper functions.


In that case is qemu_chr_open_socket_fd() incorrect
in marking the socket as nonblocking in the
is_listen  is_waitconnect case?


It's unnecessary.  tcp_chr_accept calls tcp_chr_add_client, which takes 
care of that.  But it doesn't hurt either.



Honestly I think the whole waitconnect stuff should be ripped out.
Obvious problem is backward compatibility.  But maybe not because nobody
uses it anyway.  Anyone out there using chardev server sockets without
'nowait' option?

waitconnect means go into server mode, wait for a client to connect,
then unpause the guest.  Which handles one special case of the generic
start qemu with -S, connect everything, then sent 'cont' to monitor
libvirt is doing.

IIRC wait for client to connect is a blocking accept() call.  Which
you certainly don't want do in a qmp command handler.


I agree you don't want to do it, but then... just don't do it. :)  We 
can leave in waitconnect, and leave it blocking.  QMP clients simply 
shouldn't use it, but if they do it's not our fault.


In fact, ChardevSocket.wait defaults to false (unlike the command-line 
counterpart).


Paolo

1 2 3 4 >

1 - 100 of 321 matches

Mail list logo