[Qemu-devel] [PATCH V4] qemu-img create: add 'nocow' option
Add 'nocow' option so that users could have a chance to set NOCOW flag to newly created files. It's useful on btrfs file system to enhance performance. Btrfs has low performance when hosting VM images, even more when the guest in those VM are also using btrfs as file system. One way to mitigate this bad performance is to turn off COW attributes on VM files. Generally, there are two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then all newly created files will be NOCOW. b) per file. Add the NOCOW file attribute. It could only be done to empty or new files. This patch tries the second way, according to the option, it could add NOCOW per file. For most block drivers, since the create file step is in raw-posix.c, so we can do setting NOCOW flag ioctl in raw-posix.c only. But there are some exceptions, like block/vpc.c and block/vdi.c, they are creating file by calling qemu_open directly. For them, do the same setting NOCOW flag ioctl work in them separately. Signed-off-by: Chunyan Liu cy...@suse.com --- Changes to v3: * remove NOCOW option from .create_opts of those drivers calling bdrv_create_file to create file. Adding NOCOW to raw-posix.c is enough. No difference to users when using 'qemu-img create' interface. Make the patch cleaner. block/qed.c | 6 +++--- block/raw-posix.c | 25 + block/vdi.c | 29 + block/vmdk.c | 6 +++--- block/vpc.c | 29 + include/block/block_int.h | 1 + qemu-doc.texi | 16 qemu-img.texi | 16 8 files changed, 122 insertions(+), 6 deletions(-) diff --git a/block/qed.c b/block/qed.c index eddae92..b69374b 100644 --- a/block/qed.c +++ b/block/qed.c @@ -567,7 +567,7 @@ static void bdrv_qed_close(BlockDriverState *bs) static int qed_create(const char *filename, uint32_t cluster_size, uint64_t image_size, uint32_t table_size, const char *backing_file, const char *backing_fmt, - Error **errp) + QemuOpts *opts, Error **errp) { QEDHeader header = { .magic = QED_MAGIC, @@ -586,7 +586,7 @@ static int qed_create(const char *filename, uint32_t cluster_size, int ret = 0; BlockDriverState *bs; -ret = bdrv_create_file(filename, NULL, local_err); +ret = bdrv_create_file(filename, opts, local_err); if (ret 0) { error_propagate(errp, local_err); return ret; @@ -682,7 +682,7 @@ static int bdrv_qed_create(const char *filename, QemuOpts *opts, Error **errp) } ret = qed_create(filename, cluster_size, image_size, table_size, - backing_file, backing_fmt, errp); + backing_file, backing_fmt, opts, errp); finish: g_free(backing_file); diff --git a/block/raw-posix.c b/block/raw-posix.c index dacf4fb..825a0c8 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -55,6 +55,9 @@ #include linux/cdrom.h #include linux/fd.h #include linux/fs.h +#ifndef FS_NOCOW_FL +#define FS_NOCOW_FL 0x0080 /* Do not cow file */ +#endif #endif #ifdef CONFIG_FIEMAP #include linux/fiemap.h @@ -1278,12 +1281,14 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp) int fd; int result = 0; int64_t total_size = 0; +bool nocow = false; strstart(filename, file:, filename); /* Read out options */ total_size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) / BDRV_SECTOR_SIZE; +nocow = qemu_opt_get_bool(opts, BLOCK_OPT_NOCOW, false); fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644); @@ -1291,6 +1296,21 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp) result = -errno; error_setg_errno(errp, -result, Could not create file); } else { +if (nocow) { +#ifdef __linux__ +/* Set NOCOW flag to solve performance issue on fs like btrfs. + * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value + * will be ignored since any failure of this operation should not + * block the left work. + */ +int attr; +if (ioctl(fd, FS_IOC_GETFLAGS, attr) == 0) { +attr |= FS_NOCOW_FL; +ioctl(fd, FS_IOC_SETFLAGS, attr); +} +#endif +} + if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) { result = -errno; error_setg_errno(errp, -result, Could not resize file); @@ -1477,6 +1497,11 @@ static QemuOptsList raw_create_opts = { .type = QEMU_OPT_SIZE, .help = Virtual disk size }, +{ +.name = BLOCK_OPT_NOCOW, +.type = QEMU_OPT_BOOL, +.help = Turn
Re: [Qemu-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote: On 2014/6/26 18:03, Paolo Bonzini wrote: Il 26/06/2014 11:18, Chen, Tiejun ha scritto: - offsets 0x..0x0fff map to configuration space of the host MCH Are you saying the config space in the video device? No, I am saying in a new BAR, or at some magic offset of an existing MMIO BAR. As I mentioned previously, the IGD guy told me we have no any unused a offset or BAR in the config space. And guy who are responsible for the native driver seems not be accept to extend some magic offset of an existing MMIO BAR. In addition I think in a short time its not possible to migrate i440fx to q35 as a PCIe machine of xen. That seems like a weak motivation. I don't see a need to get something merged upstream in a short time: this seems sure to miss 2.1, so you have the time to make it architecturally sound. Making existing guests work would be a better motivation. Isn't this possible with an mch chipset? So could we do this step by step: #1 phase: We just cover current qemu-xen implementation based on i44fx, so still provide that pseudo ISA bridge at 00:1f.0 as we already did. By the way there is no reason to put it at 00:1f.0 specifically I think. So it seems simple: create a dummy device that gets device and vendor id as properties. If you really like, add an option to get values from sysfs: device and vendor id are world readable, so just get them directly and not through xen wrappers, this way you can open the files RO and not RW. You seem to poke at revision as well, I don't see driver looking at that - strictly necessary? If yes please patch host kernel to expose that info in sysfs, though we can fall back on pci config if not there. MCH (bridge_dev) hacks in i915 are nastier. To clean them up, we really have to have an explicit driver for this bridge, not a pass-through device. Long term, the right thing to do is likely to extend host driver and expose the necessary information in sysfs on host kernel. #2 phase: Now, we will choose a capability ID that won't be conflicting with others. To do this properly, we need to get one from PCI SIG group. To have this workable and consistently validated, this method shouldn't be virt specific. Then native driver should use the same method. You mean you will be able to talk sense into hardware guys? I doubt that. If they could be convinced to make e.g. i915 base a proper BAR, why didn't they? So when xen work on q35 PCIe machine, we can walk this way. If you are emulating MCH anyway, pick one that is close to what i915 driver expects. It would then work with existing devices, without new capability IDs. Anthony, Any comments to address this in xen case? Thanks Tiejun
Re: [Qemu-devel] [PATCH v9 00/22] legacy virtio support for cross-endian targets
On Sun, 29 Jun 2014 18:13:53 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Tue, Jun 24, 2014 at 07:06:58PM +0200, Greg Kurz wrote: The current legacy virtio devices have a fundamental flaw: they all share data between host and guest with guest endianness ordering. This is ok for nearly all architectures that have fixed endianness. Unfortunately, it breaks for recent PPC64 and ARM targets that can change endianness at runtime. The virtio-1.0 specification fixes the issue by enforcing little-endian ordering. It may take some time though until the code for 1.0 gets available and supported, and all the users can migrate. There have been discussions for some monthes about supporting such oddity: now we have little-endian PPC64 distros available, it is worth to propose something. This patch set brings legacy virtio support for cross-endian targets. The rationale is that we add a new device_endianness property to VirtIODevice. This property is used as a runtime indicator to decide wether we should do little-endian or big-endian conversion, as opposed to the compile time choice we have now with TARGTE_WORDS_BIGENDIAN. The choice was made to sample the device endianness out of the endianness mode of the guest CPU that does the reset. It is an evil but logical consequence of the initial flaw in the virtio specification, and it was agreed that the concept would be a good common base for ARM and PPC64 enablement at least. Please note also that this new property is state and must be preserved across migrations. There are several parts in the serie: - patches 1 and 2 are simple fixes - patches 3 to 9 introduce VMState based subsections in the virtio migration code. This is needed because we introduce a new property in VirtIODevice that we want to migrate without ruining compatibility efforts - patches 10 to 13 bring virtio device endianness and memory accessors to be used by the virtio code - patches 14 to 20 wire the new memory accessors everywhere accross the virtio code - patch 21 is the PPC64 enablement - patch 22 is a follow-up workaround to disable vhost-net acceleration in the case the host and guest have different endianness, because it is not supported for the moment Changes since v8 are provided in each patch. Cheers. Applied, thanks everyone. \O/ Thanks Michael ! -- Greg --- Alexander Graf (1): virtio-serial: don't migrate the config space Cédric Le Goater (1): virtio-net: byteswap virtio-net header Greg Kurz (14): virtio: introduce device specific migration calls virtio-net: implement per-device migration calls virtio-blk: implement per-device migration calls virtio-serial: implement per-device migration calls virtio-balloon: implement per-device migration calls virtio-rng: implement per-device migration calls virtio: add subsections to the migration stream exec: introduce target_words_bigendian() helper cpu: introduce CPUClass::virtio_is_big_endian() virtio: add endian-ambivalent support to VirtIODevice virtio: memory accessors for endian-ambivalent targets virtio-9p: use virtio wrappers to access headers target-ppc: enable virtio endian ambivalent support vhost-net: disable when cross-endian Rusty Russell (6): virtio: allow byte swapping for vring virtio-net: use virtio wrappers to access headers virtio-balloon: use virtio wrappers to access page frame numbers virtio-blk: use virtio wrappers to access headers virtio-scsi: use virtio wrappers to access headers virtio-serial-bus: use virtio wrappers to access headers exec.c|8 - hw/9pfs/virtio-9p-device.c|3 - hw/block/virtio-blk.c | 62 ++- hw/char/virtio-serial-bus.c | 94 ++-- hw/net/vhost_net.c| 19 +++ hw/net/virtio-net.c | 56 +++--- hw/scsi/virtio-scsi.c | 40 --- hw/virtio/virtio-balloon.c| 33 +++--- hw/virtio/virtio-pci.c| 11 +- hw/virtio/virtio-rng.c| 12 +- hw/virtio/virtio.c| 216 - include/hw/virtio/virtio-access.h | 170 + include/hw/virtio/virtio.h| 17 +++ include/qom/cpu.h |1 qom/cpu.c |6 + target-ppc/cpu.h |2 target-ppc/translate_init.c | 15 +++ 17 files changed, 583 insertions(+), 182 deletions(-) create mode 100644 include/hw/virtio/virtio-access.h -- Greg
Re: [Qemu-devel] [PATCH] numa: check for busy memory backend
On Mon, Jun 30, 2014 at 01:33:42PM +0800, Hu Tao wrote: On Sun, Jun 29, 2014 at 06:20:22PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 25, 2014 at 05:04:14PM +0800, Hu Tao wrote: ..to prevent one memory backend from being used by more than one numa node. Thanks, but please always make the msg content self-contained so it can be understood without the subject. E.g. here, just drop ..to. Are you sure we want this? Is there a chance sharing a backend can be useful? This patch is actually a bug fix. It is? What is the bug and how to reproduce it? I am not sure we should write a ton of code to validate qemu configuration, as long as qemu does not assert. Even if we will want backend sharing, we can do it after. By reverting this patch? So why merge it? Igor, what's your take? Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- numa.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/numa.c b/numa.c index e471afe..6c1c554 100644 --- a/numa.c +++ b/numa.c @@ -279,6 +279,13 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner, exit(1); } +if (memory_region_is_mapped(seg)) { +char *path = object_get_canonical_path_component(OBJECT(backend)); +error_report(memory backend %s is busy, path); +g_free(path); +exit(1); +} + memory_region_add_subregion(mr, addr, seg); vmstate_register_ram_global(seg); addr += size; -- 1.9.3
Re: [Qemu-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On 2014/6/30 14:48, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote: On 2014/6/26 18:03, Paolo Bonzini wrote: Il 26/06/2014 11:18, Chen, Tiejun ha scritto: - offsets 0x..0x0fff map to configuration space of the host MCH Are you saying the config space in the video device? No, I am saying in a new BAR, or at some magic offset of an existing MMIO BAR. As I mentioned previously, the IGD guy told me we have no any unused a offset or BAR in the config space. And guy who are responsible for the native driver seems not be accept to extend some magic offset of an existing MMIO BAR. In addition I think in a short time its not possible to migrate i440fx to q35 as a PCIe machine of xen. That seems like a weak motivation. I don't see a need to get something merged upstream in a short time: this seems sure to miss 2.1, so you have the time to make it architecturally sound. Making existing guests work would be a better motivation. Yes. Isn't this possible with an mch chipset? If you're saying q35, I mean AFAIK we have no any plan to migrate to this MCH in xen case. Additionally, I think I should follow this feature after q35 can work for xen scenario. So could we do this step by step: #1 phase: We just cover current qemu-xen implementation based on i44fx, so still provide that pseudo ISA bridge at 00:1f.0 as we already did. By the way there is no reason to put it at 00:1f.0 specifically I think. So it seems simple: create a dummy device that gets device and vendor id as properties. If you really like, add an option to get values Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx passthrough: create pseudo intel isa bridge. There, we fake this device just at 00:1f.0. But you guys don't like this, and shouldn't this be just this point we discussing now? If you guys agree that , everything is fine. from sysfs: device and vendor id are world readable, so just get them directly and not through xen wrappers, this way you can open the files RO and not RW. You seem to poke at revision as well, I don't see driver looking at that - strictly necessary? If yes please patch host kernel to expose that info in sysfs, though we can fall back on pci config if not there. MCH (bridge_dev) hacks in i915 are nastier. To clean them up, we really have to have an explicit driver for this bridge, not a pass-through device. Long term, the right thing to do is likely to extend host driver and expose the necessary information in sysfs on host kernel. I'm a bit confused. Any sysfs should be filled by the associated PCIe device, shouldn't it? So qemu still need to emulate this PCIe device firstly, then set properties into sysfs. #2 phase: Now, we will choose a capability ID that won't be conflicting with others. To do this properly, we need to get one from PCI SIG group. To have this workable and consistently validated, this method shouldn't be virt specific. Then native driver should use the same method. You mean you will be able to talk sense into hardware guys? I doubt that. If they could be convinced to make e.g. i915 base a We're negotiating this, so this is just our long term solution we figure out currently. proper BAR, why didn't they? We already have no any free BAR as we mentioned previously. So when xen work on q35 PCIe machine, we can walk this way. If you are emulating MCH anyway, pick one that is close to what i915 driver expects. It would then work with existing Looks you guys prefer we create a new MCH anyway, right? But is it necessary to create a new based on i440fx just for a little change? Thanks Tiejun devices, without new capability IDs. Anthony, Any comments to address this in xen case? Thanks Tiejun
[Qemu-devel] [PATCH] ui/vnc: fix potential memory corruption issues
this patch addresses 2 memory corruption issues. The first was actually discovered during playing around with a Windows 7 vServer. During resolution change in Windows 7 it happens sometimes that Windows changes to an intermediate resolution where server_stride % cmp_bytes != 0 (in vnc_refresh_server_surface). This happens only if width % VNC_DIRTY_PIXELS_PER_BIT != 0. The patch fixes the issue by clamping cmp_bytes in that case and it finally makes those resolutions work correctly. This can be easily tested by setting VNC_DIRTY_PIXELS_PER_BIT to a bigger power of 2 value different than 16. The second is a theoretical issue, but is maybe exploitable by the guest. If for some reason the surface size is bigger than VNC_MAX_WIDTH x VNC_MAX_HEIGHT we end up in severe corruption. This can be easily reproduced by playing around with VNC_MAX_WIDTH and VNC_MAX_HEIGHT. This patch modifies the VNC server to only track and copy the area up to the maximum possible size. Signed-off-by: Peter Lieven p...@kamp.de --- ui/vnc.c | 149 +- ui/vnc.h | 14 +++--- 2 files changed, 77 insertions(+), 86 deletions(-) diff --git a/ui/vnc.c b/ui/vnc.c index 14a86c3..a6d748b 100644 --- a/ui/vnc.c +++ b/ui/vnc.c @@ -432,14 +432,10 @@ static void framebuffer_update_request(VncState *vs, int incremental, static void vnc_refresh(DisplayChangeListener *dcl); static int vnc_refresh_server_surface(VncDisplay *vd); -static void vnc_dpy_update(DisplayChangeListener *dcl, - int x, int y, int w, int h) -{ -VncDisplay *vd = container_of(dcl, VncDisplay, dcl); -struct VncSurface *s = vd-guest; -int width = surface_width(vd-ds); -int height = surface_height(vd-ds); - +static void vnc_set_area_dirty(DECLARE_BITMAP(dirty[VNC_MAX_HEIGHT], + VNC_MAX_WIDTH / VNC_DIRTY_PIXELS_PER_BIT), + int width, int height, + int x, int y, int w, int h) { /* this is needed this to ensure we updated all affected * blocks if x % VNC_DIRTY_PIXELS_PER_BIT != 0 */ w += (x % VNC_DIRTY_PIXELS_PER_BIT); @@ -451,11 +447,22 @@ static void vnc_dpy_update(DisplayChangeListener *dcl, h = MIN(y + h, height); for (; y h; y++) { -bitmap_set(s-dirty[y], x / VNC_DIRTY_PIXELS_PER_BIT, +bitmap_set(dirty[y], x / VNC_DIRTY_PIXELS_PER_BIT, DIV_ROUND_UP(w, VNC_DIRTY_PIXELS_PER_BIT)); } } +static void vnc_dpy_update(DisplayChangeListener *dcl, + int x, int y, int w, int h) +{ +VncDisplay *vd = container_of(dcl, VncDisplay, dcl); +struct VncSurface *s = vd-guest; +int width = pixman_image_get_width(vd-server); +int height = pixman_image_get_height(vd-server); + +vnc_set_area_dirty(s-dirty, width, height, x, y, w, h); +} + void vnc_framebuffer_update(VncState *vs, int x, int y, int w, int h, int32_t encoding) { @@ -517,17 +524,15 @@ void buffer_advance(Buffer *buf, size_t len) static void vnc_desktop_resize(VncState *vs) { -DisplaySurface *ds = vs-vd-ds; - if (vs-csock == -1 || !vnc_has_feature(vs, VNC_FEATURE_RESIZE)) { return; } -if (vs-client_width == surface_width(ds) -vs-client_height == surface_height(ds)) { +if (vs-client_width == pixman_image_get_width(vs-vd-server) +vs-client_height == pixman_image_get_height(vs-vd-server)) { return; } -vs-client_width = surface_width(ds); -vs-client_height = surface_height(ds); +vs-client_width = pixman_image_get_width(vs-vd-server); +vs-client_height = pixman_image_get_height(vs-vd-server); vnc_lock_output(vs); vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE); vnc_write_u8(vs, 0); @@ -571,31 +576,24 @@ void *vnc_server_fb_ptr(VncDisplay *vd, int x, int y) ptr += x * VNC_SERVER_FB_BYTES; return ptr; } -/* this sets only the visible pixels of a dirty bitmap */ -#define VNC_SET_VISIBLE_PIXELS_DIRTY(bitmap, w, h) {\ -int y;\ -memset(bitmap, 0x00, sizeof(bitmap));\ -for (y = 0; y h; y++) {\ -bitmap_set(bitmap[y], 0,\ - DIV_ROUND_UP(w, VNC_DIRTY_PIXELS_PER_BIT));\ -} \ -} static void vnc_dpy_switch(DisplayChangeListener *dcl, DisplaySurface *surface) { VncDisplay *vd = container_of(dcl, VncDisplay, dcl); VncState *vs; +int width, height; vnc_abort_display_jobs(vd); /* server surface */ qemu_pixman_image_unref(vd-server); vd-ds = surface; +width = MIN(VNC_MAX_WIDTH, ROUND_UP(surface_width(vd-ds), +VNC_DIRTY_PIXELS_PER_BIT)); +height = MIN(VNC_MAX_HEIGHT, surface_height(vd-ds)); vd-server = pixman_image_create_bits(VNC_SERVER_FB_FORMAT, - surface_width(vd-ds), -
Re: [Qemu-devel] [PATCH 0/4] ui/cocoa: Fix absolute positioning and other bugs
On Mo, 2014-06-23 at 10:35 +0100, Peter Maydell wrote: This set of cocoa UI patches: * fixes the completely broken handling of absolute positioning (tablet-style) input devices * fixes a bug where if the first surface created was the same 640x480 as the initial window we'd never actually draw it * implements support for the -show-cursor command line option The GTK and SDL UI frontends don't seem to be consistent about how they handle mousegrab for absolute-position devices; I followed SDL on the basis that it was the older and more established UI. (GTK doesn't implement -show-cursor at all, incidentally.) Series looks sane to me, although I can't claim I fully understand patch #3, especially the macos x mouse handling internals involved there ... cheers, Gerd
Re: [Qemu-devel] possible denial of service via VNC
On So, 2014-06-29 at 14:16 +0200, Peter Lieven wrote: Hi, while debugging a VNC issue I found this: case VNC_MSG_CLIENT_CUT_TEXT: if (len == 1) return 8; if (len == 8) { uint32_t dlen = read_u32(data, 4); if (dlen 0) return 8 + dlen; } client_cut_text(vs, read_u32(data, 4), data + 8); break; in protocol_client_msg(). Is this really a good idea? This allows for letting the vs-input buffer to grow up to 2^32 + 8 byte which will possibly result in an out of memory condition. Applying a limit there looks reasonable to me. Patches welcome. As this is text only a megabyte should be more than enough for all practical purposes. Question is what to do when the limit is exceeded? Disconnect? Read throw away? cheers, Gerd
Re: [Qemu-devel] possible denial of service via VNC
On 30.06.2014 09:33, Gerd Hoffmann wrote: On So, 2014-06-29 at 14:16 +0200, Peter Lieven wrote: Hi, while debugging a VNC issue I found this: case VNC_MSG_CLIENT_CUT_TEXT: if (len == 1) return 8; if (len == 8) { uint32_t dlen = read_u32(data, 4); if (dlen 0) return 8 + dlen; } client_cut_text(vs, read_u32(data, 4), data + 8); break; in protocol_client_msg(). Is this really a good idea? This allows for letting the vs-input buffer to grow up to 2^32 + 8 byte which will possibly result in an out of memory condition. Applying a limit there looks reasonable to me. Patches welcome. As this is text only a megabyte should be more than enough for all practical purposes. Question is what to do when the limit is exceeded? Disconnect? Read throw away? I would also think something in the order of megabytes should be fine. I would vote for disconnect as soon as the limit specified is too big. Otherwise we had to rewrite the whole receive logic which could introduce additional bugs. Peter -- Mit freundlichen Grüßen Peter Lieven ... KAMP Netzwerkdienste GmbH Vestische Str. 89-91 | 46117 Oberhausen Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40 p...@kamp.de | http://www.kamp.de Geschäftsführer: Heiner Lante | Michael Lante Amtsgericht Duisburg | HRB Nr. 12154 USt-Id-Nr.: DE 120607556 ...
Re: [Qemu-devel] possible denial of service via VNC
Hi, I would vote for disconnect as soon as the limit specified is too big. Otherwise we had to rewrite the whole receive logic which could introduce additional bugs. Sounds sensible. cheers, Gerd
Re: [Qemu-devel] [PATCH] numa: check for busy memory backend
On Mon, Jun 30, 2014 at 09:53:20AM +0300, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 01:33:42PM +0800, Hu Tao wrote: On Sun, Jun 29, 2014 at 06:20:22PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 25, 2014 at 05:04:14PM +0800, Hu Tao wrote: ..to prevent one memory backend from being used by more than one numa node. Thanks, but please always make the msg content self-contained so it can be understood without the subject. E.g. here, just drop ..to. Are you sure we want this? Is there a chance sharing a backend can be useful? This patch is actually a bug fix. It is? What is the bug and how to reproduce it? If user specifies the same memory backend for two numa nodes: ./x86_64-softmmu/qemu-system-x86_64 -hda /home/data/libvirt-images/f18.img -m 512M \ -qmp unix:/tmp/m,server,nowait -monitor stdio -enable-kvm \ -object memory-backend-ram,size=256M,id=ram0 \ -numa node,nodeid=0,memdev=ram0 \ -numa node,nodeid=1,memdev=ram0 I am not sure we should write a ton of code to validate qemu configuration, as long as qemu does not assert. It seems qemu does not provide a way to disable assert currently. Even if I removed asserts on the code path in my test, there is another problem that it hits an infinite in render_memory_region(). Even if we will want backend sharing, we can do it after. By reverting this patch? So why merge it? The point is qemu doesn't fire a bug no matter what user inputs. Igor, what's your take? Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- numa.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/numa.c b/numa.c index e471afe..6c1c554 100644 --- a/numa.c +++ b/numa.c @@ -279,6 +279,13 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner, exit(1); } +if (memory_region_is_mapped(seg)) { +char *path = object_get_canonical_path_component(OBJECT(backend)); +error_report(memory backend %s is busy, path); +g_free(path); +exit(1); +} + memory_region_add_subregion(mr, addr, seg); vmstate_register_ram_global(seg); addr += size; -- 1.9.3
Re: [Qemu-devel] [PATCH] Functions bus_foreach and device_find from libqos virtio API
On Thu, Jun 26, 2014 at 04:34:40PM +0200, Marc Marí wrote: +static QVirtioPCIDevice *qpcidevice_to_qvirtiodevice(QPCIDevice *pdev) +{ +QVirtioPCIDevice *vpcidev; +vpcidev = g_malloc0(sizeof(*vpcidev)); + +if (pdev) { +vpcidev-pdev = pdev; +vpcidev-vdev.device_type = +qpci_config_readw(vpcidev-pdev, PCI_SUBSYSTEM_ID); +/* TODO: When QVirtQueue is defined, change for +g_malloc0(sizeof(QVirtQueue)); */ +vpcidev-vdev.vq = NULL; This doesn't quite work because devices can have multiple virtqueues. Please drop the vq field from this patch. Add it later when you actually implement virtqueues. +} + +return vpcidev; +} + +static void qvirtio_pci_foreach_callback( +QPCIDevice *dev, int devfn, void *data) +{ +QVirtioPCIForeachData *d = data; +QVirtioPCIDevice *vpcidev = qpcidevice_to_qvirtiodevice(dev); + +if (vpcidev-vdev.device_type == d-device_type) { +d-func(vpcidev-vdev, d-user_data); +} +} + +static void qvirtio_pci_assign_device(QVirtioDevice *d, void *data) +{ +QVirtioPCIDevice *vpcidev = data; +vpcidev-pdev = ((QVirtioPCIDevice *)d)-pdev; +vpcidev-vdev.device_type = ((QVirtioPCIDevice *)d)-vdev.device_type; +vpcidev-vdev.vq= ((QVirtioPCIDevice *)d)-vdev.vq; +} + +static void qvirtio_pci_notify(QVirtioDevice *d, uint16_t vector) +{ + +} + +static void qvirtio_pci_get_config(QVirtioDevice *d, void *config) +{ + +} + +static void qvirtio_pci_set_config(QVirtioDevice *d, void *config) +{ + +} + +static uint32_t qvirtio_pci_get_features(QVirtioDevice *d) +{ +return 0; +} + +static uint8_t qvirtio_pci_get_status(QVirtioDevice *d) +{ +return 0; +} + +static void qvirtio_pci_set_status(QVirtioDevice *d, uint8_t val) +{ + +} + +static void qvirtio_pci_reset(QVirtioDevice *d) +{ + +} + +static uint8_t qvirtio_pci_query_isr(QVirtioDevice *d) +{ +return 0; +} Patches should only include useful, working code. Please drop all these unimplemented functions. + +static void qvirtio_pci_foreach(uint16_t device_type, +void (*func)(QVirtioDevice *d, void *data), void *data) +{ +QVirtioPCIForeachData d = { .func = func, +.device_type = device_type, +.user_data = data }; + +if (!bus) { +bus = qpci_init_pc(); +} + +qpci_device_foreach(bus, QVIRTIO_VENDOR_ID, -1, +qvirtio_pci_foreach_callback, d); +} + +static QVirtioDevice *qvirtio_pci_device_find(uint16_t device_type) +{ +QVirtioPCIDevice *dev; + +dev = g_malloc0(sizeof(*dev)); +qvirtio_pci_foreach(device_type, qvirtio_pci_assign_device, dev); + +return (QVirtioDevice *)dev; +} + +const QVirtioBus qvirtio_pci = { +.notify = qvirtio_pci_notify, +.get_config = qvirtio_pci_get_config, +.set_config = qvirtio_pci_set_config, +.get_features = qvirtio_pci_get_features, +.get_status = qvirtio_pci_get_status, +.set_status = qvirtio_pci_set_status, +.reset = qvirtio_pci_reset, +.query_isr = qvirtio_pci_query_isr, +.bus_foreach = qvirtio_pci_foreach, +.device_find = qvirtio_pci_device_find, +}; + diff --git a/tests/libqos/virtio-pci.h b/tests/libqos/virtio-pci.h new file mode 100644 index 000..e5812af --- /dev/null +++ b/tests/libqos/virtio-pci.h @@ -0,0 +1,29 @@ +/* + * libqos virtio PCI definitions + * + * Copyright (c) 2014 Marc Marí + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#ifndef LIBQOS_VIRTIO_PCI_H +#define LIBQOS_VIRTIO_PCI_H + +#include libqos/virtio.h +#include libqos/pci.h + +typedef struct QVirtioPCIDevice { +QVirtioDevice vdev; +QPCIDevice *pdev; +} QVirtioPCIDevice; + +typedef struct QVirtioPCIForeachData { +void (*func)(QVirtioDevice *d, void *data); +uint16_t device_type; +void *user_data; +} QVirtioPCIForeachData; + +extern const QVirtioBus qvirtio_pci; + +#endif diff --git a/tests/libqos/virtio.h b/tests/libqos/virtio.h new file mode 100644 index 000..43a0e73 --- /dev/null +++ b/tests/libqos/virtio.h @@ -0,0 +1,64 @@ +/* + * libqos virtio definitions + * + * Copyright (c) 2014 Marc Marí + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#ifndef LIBQOS_VIRTIO_H +#define LIBQOS_VIRTIO_H + +#define QVIRTQUEUE_MAX_SIZE 1024 +#define QVIRTIO_VENDOR_ID 0x1AF4 + +#define QVIRTIO_NET_DEVICE_ID 0x1 +#define QVIRTIO_BLK_DEVICE_ID 0x2 + +typedef struct QVirtioDevice QVirtioDevice; +typedef struct QVirtQueue QVirtQueue; +typedef struct QVRing QVRing; + +typedef struct QVirtioDevice { +
Re: [Qemu-devel] possible denial of service via VNC
On 30.06.2014 09:46, Gerd Hoffmann wrote: Hi, I would vote for disconnect as soon as the limit specified is too big. Otherwise we had to rewrite the whole receive logic which could introduce additional bugs. Sounds sensible. Especially since client_cut_text is currently a NOP. Peter
Re: [Qemu-devel] [PATCH] ui/vnc: fix potential memory corruption issues
On Mo, 2014-06-30 at 09:24 +0200, Peter Lieven wrote: this patch addresses 2 memory corruption issues. The first was actually discovered during playing around with a Windows 7 vServer. During resolution change in Windows 7 it happens sometimes that Windows changes to an intermediate resolution where server_stride % cmp_bytes != 0 (in vnc_refresh_server_surface). This happens only if width % VNC_DIRTY_PIXELS_PER_BIT != 0. The patch fixes the issue by clamping cmp_bytes in that case and it finally makes those resolutions work correctly. This can be easily tested by setting VNC_DIRTY_PIXELS_PER_BIT to a bigger power of 2 value different than 16. The second is a theoretical issue, but is maybe exploitable by the guest. If for some reason the surface size is bigger than VNC_MAX_WIDTH x VNC_MAX_HEIGHT we end up in severe corruption. This can be easily reproduced by playing around with VNC_MAX_WIDTH and VNC_MAX_HEIGHT. This patch modifies the VNC server to only track and copy the area up to the maximum possible size. So this basically makes vnc work correctly in case guest surface and server surface have different sizes, then fixes the two bugs on top of that. And it obsoletes the other corruption patch send Friday. Correct? cheers, Gerd
Re: [Qemu-devel] [PATCH] ui/vnc: fix potential memory corruption issues
On 30.06.2014 09:52, Gerd Hoffmann wrote: On Mo, 2014-06-30 at 09:24 +0200, Peter Lieven wrote: this patch addresses 2 memory corruption issues. The first was actually discovered during playing around with a Windows 7 vServer. During resolution change in Windows 7 it happens sometimes that Windows changes to an intermediate resolution where server_stride % cmp_bytes != 0 (in vnc_refresh_server_surface). This happens only if width % VNC_DIRTY_PIXELS_PER_BIT != 0. The patch fixes the issue by clamping cmp_bytes in that case and it finally makes those resolutions work correctly. This can be easily tested by setting VNC_DIRTY_PIXELS_PER_BIT to a bigger power of 2 value different than 16. The second is a theoretical issue, but is maybe exploitable by the guest. If for some reason the surface size is bigger than VNC_MAX_WIDTH x VNC_MAX_HEIGHT we end up in severe corruption. This can be easily reproduced by playing around with VNC_MAX_WIDTH and VNC_MAX_HEIGHT. This patch modifies the VNC server to only track and copy the area up to the maximum possible size. So this basically makes vnc work correctly in case guest surface and server surface have different sizes, then fixes the two bugs on top of that. And it obsoletes the other corruption patch send Friday. Correct? Yes and yes. Basically the server surface is adjusted to not exceed VNC_MAX_WIDTH x VNC_MAX_HEIGHT and on top the width is rounded up to multiple of VNC_DIRTY_PIXELS_PER_BIT. If you have a resolution whose width is not dividable by VNC_DIRTY_PIXELS_PER_BIT you get a small black bar on the right of the screen. First I wanted to fix only that and then I noticed that VNC_MAX_WIDTH and VNC_MAX_HEIGHT are nowhere enforced. I do not know if this is actually exploitable by the guest. If the surface is too big to fit the limits only the upper left area is shown. I ran several sessions with altered values of VNC_MAX_WIDTH, VNC_MAX_HEIGHT and VNC_DIRTY_PIXELS_PER_BIT in valgrind, but additional testing is welcome. Peter
[Qemu-devel] [PATCH] ui/vnc: limit client_cut_text msg payload size
currently a malicious client could define a payload size of 2^32 - 1 bytes and send up to that size of data to the vnc server. The server would allocated that amount of memory which could easily create an out of memory condition. This patch limits the payload size to 1MB max. Please note that client_cut_text messages are currently silently ignored. Signed-off-by: Peter Lieven p...@kamp.de --- ui/vnc.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/ui/vnc.c b/ui/vnc.c index 14a86c3..096278f 100644 --- a/ui/vnc.c +++ b/ui/vnc.c @@ -2165,13 +2165,20 @@ static int protocol_client_msg(VncState *vs, uint8_t *data, size_t len) pointer_event(vs, read_u8(data, 1), read_u16(data, 2), read_u16(data, 4)); break; case VNC_MSG_CLIENT_CUT_TEXT: -if (len == 1) +if (len == 1) { return 8; - +} if (len == 8) { uint32_t dlen = read_u32(data, 4); -if (dlen 0) +if (dlen (1 20)) { +error_report(vnc: client_cut_text msg payload has %u bytes + which exceeds our limit of 1MB., dlen); +vnc_client_error(vs); +break; +} +if (dlen 0) { return 8 + dlen; +} } client_cut_text(vs, read_u32(data, 4), data + 8); -- 1.7.9.5
Re: [Qemu-devel] [regression] dataplane: throughout -40% by commit 580b6b2aa2
On Sat, Jun 28, 2014 at 05:58:58PM +0800, Ming Lei wrote: On Sat, Jun 28, 2014 at 5:51 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 27/06/2014 20:01, Ming Lei ha scritto: I just implemented plugunplug based batching, and it is working now. But throughout still has no obvious improvement. Looks loading in IOthread is a bit low, so I am wondering if there is block point caused by Qemu QEMU block layer. What does perf say? Also, you can try using the QEMU trace subsystem and see where the latency goes. Follows some test result against 8589744aaf07b62 of upstream qemu, and the test is done on my 2core(4thread) laptop: 1, with my draft batch patches[1](only linux-aio supported now) - throughput: +16% compared qemu upstream - average time spent by handle_notify(): 310us - average time between two handle_notify(): 1591us (this time reflects latency of handling host_notifier) 16% is still a worthwhile improvement. I guess batching only benefits aio=native since the threadpool ought to do better when it receives requests as soon as possible. Patch or an RFC would be welcome. 2, same tests on 2.0.0 release(use custom Linux AIO) - average time spent by handle_notify(): 68us - average time between calling two handle_notify(): 269us (this time reflects latency of handling host_notifier) From above tests, looks root cause is late handling notify, and qemu block layer becomes 4times slower than previous custom linux aio taken by dataplane. Try: $ perf record -e syscalls:* --tid iothread-tid ^C $ perf script # shows the trace log The difference between syscalls in QEMU 2.0 and qemu.git/master could reveal the problem. Using perf you can also trace ioeventfd signalling in the host kernel and compare against the QEMU handle_notify entry/return. It may be easiest to use the ftrace_marker tracing backing in QEMU so the trace is unified with the host kernel trace (./configure --enable-trace-backend=ftrace and see the ftrace section in QEMU docs/tracing.txt). This way you can see whether the ioeventfd signal - handle_notify() entry increased or something else is going on. Stefan pgp0tBWdsV4pj.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH] numa: check for busy memory backend
On Mon, Jun 30, 2014 at 03:46:56PM +0800, Hu Tao wrote: On Mon, Jun 30, 2014 at 09:53:20AM +0300, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 01:33:42PM +0800, Hu Tao wrote: On Sun, Jun 29, 2014 at 06:20:22PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 25, 2014 at 05:04:14PM +0800, Hu Tao wrote: ..to prevent one memory backend from being used by more than one numa node. Thanks, but please always make the msg content self-contained so it can be understood without the subject. E.g. here, just drop ..to. Are you sure we want this? Is there a chance sharing a backend can be useful? This patch is actually a bug fix. It is? What is the bug and how to reproduce it? If user specifies the same memory backend for two numa nodes: ./x86_64-softmmu/qemu-system-x86_64 -hda /home/data/libvirt-images/f18.img -m 512M \ -qmp unix:/tmp/m,server,nowait -monitor stdio -enable-kvm \ -object memory-backend-ram,size=256M,id=ram0 \ -numa node,nodeid=0,memdev=ram0 \ -numa node,nodeid=1,memdev=ram0 I am not sure we should write a ton of code to validate qemu configuration, as long as qemu does not assert. It seems qemu does not provide a way to disable assert currently. Even if I removed asserts on the code path in my test, there is another problem that it hits an infinite in render_memory_region(). OK so this is what commit log should say: --- Specifying the same memory region twice leads to an assert: ./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0 -numa node,nodeid=1,memdev=ram0 qemu-system-x86_64: /scm/qemu/memory.c:1506: memory_region_add_subregion_common: Assertion `!subregion-container' failed. Aborted (core dumped) Detect and exit with an error message instead. --- See? Explain why your patch makes sense, don't just repeat what it does. Even if we will want backend sharing, we can do it after. By reverting this patch? So why merge it? The point is qemu doesn't fire a bug no matter what user inputs. Igor, what's your take? Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- numa.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/numa.c b/numa.c index e471afe..6c1c554 100644 --- a/numa.c +++ b/numa.c @@ -279,6 +279,13 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner, exit(1); } +if (memory_region_is_mapped(seg)) { +char *path = object_get_canonical_path_component(OBJECT(backend)); +error_report(memory backend %s is busy, path); That's not very clear. How about: memory backend %s is used multiple times. Each -numa option must use a different memdev value. +g_free(path); As we are going to exit anyway, it does not make sense to bother with this. +exit(1); +} + memory_region_add_subregion(mr, addr, seg); vmstate_register_ram_global(seg); addr += size; -- 1.9.3
Re: [Qemu-devel] [regression] dataplane: throughout -40% by commit 580b6b2aa2
On Mon, Jun 30, 2014 at 4:08 PM, Stefan Hajnoczi stefa...@redhat.com wrote: On Sat, Jun 28, 2014 at 05:58:58PM +0800, Ming Lei wrote: On Sat, Jun 28, 2014 at 5:51 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 27/06/2014 20:01, Ming Lei ha scritto: I just implemented plugunplug based batching, and it is working now. But throughout still has no obvious improvement. Looks loading in IOthread is a bit low, so I am wondering if there is block point caused by Qemu QEMU block layer. What does perf say? Also, you can try using the QEMU trace subsystem and see where the latency goes. Follows some test result against 8589744aaf07b62 of upstream qemu, and the test is done on my 2core(4thread) laptop: 1, with my draft batch patches[1](only linux-aio supported now) - throughput: +16% compared qemu upstream - average time spent by handle_notify(): 310us - average time between two handle_notify(): 1591us (this time reflects latency of handling host_notifier) 16% is still a worthwhile improvement. I guess batching only benefits aio=native since the threadpool ought to do better when it receives requests as soon as possible. 16% is obtained with 'simple' trace-backend enabled, and looks the actual data with 'nop' trace is quite better than 16%, but it is still not good as 2.0.0 release. Patch or an RFC would be welcome. Yes, I will post it soon. 2, same tests on 2.0.0 release(use custom Linux AIO) - average time spent by handle_notify(): 68us - average time between calling two handle_notify(): 269us (this time reflects latency of handling host_notifier) From above tests, looks root cause is late handling notify, and qemu block layer becomes 4times slower than previous custom linux aio taken by dataplane. The above data is still obtained with 'simple' trace backend enabled, I need to find other ways to test again without extra trace io. Try: $ perf record -e syscalls:* --tid iothread-tid ^C $ perf script # shows the trace log The difference between syscalls in QEMU 2.0 and qemu.git/master could reveal the problem. Using perf you can also trace ioeventfd signalling in the host kernel and compare against the QEMU handle_notify entry/return. It may be easiest to use the ftrace_marker tracing backing in QEMU so the trace is unified with the host kernel trace (./configure --enable-trace-backend=ftrace and see the ftrace section in QEMU docs/tracing.txt). This way you can see whether the ioeventfd signal - handle_notify() entry increased or something else is going on. Looks good ideas, I will try it. I have tried ftrace, but looks some traces may be dropped and my current script can't handle that well. Thanks, -- Ming Lei
[Qemu-devel] [PATCH v5] ppc: spapr-rtas - implement os-term rtas call
PAPR compliant guest calls this in absence of kdump. This finally reaches the guest and can be handled according to the policies set by higher level tools(like taking dump) for further analysis by tools like crash. Linux kernel calls ibm,os-term when extended property of os-term is set. This makes sure that a return to the linux kernel is gauranteed. CC: Benjamin Herrenschmidt b...@au1.ibm.com CC: Anton Blanchard an...@samba.org CC: Alexander Graf ag...@suse.de CC: Tyrel Datwyler turtle.in.the.ker...@gmail.com Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com --- v2: rebase to ppcnext v3: Do not stop the VM, and update comments v4: update spapr_register_rtas and qapi_event changes v5: set ibm,extended-os-term as null encoded property --- hw/ppc/spapr.c | 9 + hw/ppc/spapr_rtas.c| 15 +++ include/hw/ppc/spapr.h | 1 - 3 files changed, 24 insertions(+), 1 deletion(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 307c58d..e6c9014 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -520,6 +520,15 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base, _FDT((fdt_property_cell(fdt, rtas-error-log-max, RTAS_ERROR_LOG_MAX))); +/* + * According to PAPR, rtas ibm,os-term, does not gaurantee a return + * back to the guest cpu. + * + * While an additional ibm,extended-os-term property indicates that + * rtas call return will always occur. Set this property. + */ +_FDT((fdt_property(fdt, ibm,extended-os-term, NULL, 0))); + _FDT((fdt_end_node(fdt))); /* interrupt controller */ diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c index 9ba1ba6..2ec2a8e 100644 --- a/hw/ppc/spapr_rtas.c +++ b/hw/ppc/spapr_rtas.c @@ -277,6 +277,19 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu, rtas_st(rets, 0, ret); } +static void rtas_ibm_os_term(PowerPCCPU *cpu, +sPAPREnvironment *spapr, +uint32_t token, uint32_t nargs, +target_ulong args, +uint32_t nret, target_ulong rets) +{ +target_ulong ret = 0; + +qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, error_abort); + +rtas_st(rets, 0, ret); +} + static struct rtas_call { const char *name; spapr_rtas_fn fn; @@ -404,6 +417,8 @@ static void core_rtas_register_types(void) spapr_rtas_register(RTAS_IBM_SET_SYSTEM_PARAMETER, ibm,set-system-parameter, rtas_ibm_set_system_parameter); +spapr_rtas_register(RTAS_IBM_OS_TERM, ibm,os-term, +rtas_ibm_os_term); } type_init(core_rtas_register_types) diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h index bbba51a..4e96381 100644 --- a/include/hw/ppc/spapr.h +++ b/include/hw/ppc/spapr.h @@ -382,7 +382,6 @@ int spapr_allocate_irq_block(int num, bool lsi, bool msi); #define RTAS_GET_SENSOR_STATE (RTAS_TOKEN_BASE + 0x1D) #define RTAS_IBM_CONFIGURE_CONNECTOR(RTAS_TOKEN_BASE + 0x1E) #define RTAS_IBM_OS_TERM(RTAS_TOKEN_BASE + 0x1F) -#define RTAS_IBM_EXTENDED_OS_TERM (RTAS_TOKEN_BASE + 0x20) #define RTAS_TOKEN_MAX (RTAS_TOKEN_BASE + 0x21) -- 1.8.3.1
Re: [Qemu-devel] [PATCH] ui/vnc: fix potential memory corruption issues
On Mo, 2014-06-30 at 10:01 +0200, Peter Lieven wrote: On 30.06.2014 09:52, Gerd Hoffmann wrote: So this basically makes vnc work correctly in case guest surface and server surface have different sizes, then fixes the two bugs on top of that. And it obsoletes the other corruption patch send Friday. Correct? Yes and yes. Good, so I'm reading the code correctly ;) Can you put that information into the patch description please? Otherwise the patch is fine. thanks, Gerd
Re: [Qemu-devel] [PATCH v4 0/9] virtio-blk: use alias properties in transport devices
On Wed, Jun 18, 2014 at 05:58:27PM +0800, Stefan Hajnoczi wrote: v4: * Coding style: typedef struct { on a single line [Andreas] * Add dataplane: bail out on unsupported transport for s390-virtio [Cornelia] v3: * Split qdev_alias_all_properties() into its own patch [Peter Crosthwaite] * Do not dereference DEVICE_CLASS(class) inline [Peter Crosthwaite] v2: * Add qdev_alias_all_properties() instead of virtio-blk-specific function [Paolo] * Explain refcount handling in doc comment [Paolo] * Fix property duplicate typo [Peter Crosthwaite] * Add the same object or to clarify commit description [Igor] Thanks for the feedback on the RFC. This time around the alias property is implemented at the QOM property level instead of at the qdev property level. Note that this series only addresses virtio-blk. In later series we can convert virtio net, scsi, rng, and serial. The virtio transport/device split is broken as follows: 1. The virtio-blk device is never finalized because the transport devices (virtio-blk-pci and friends) leak the refcount. 2. If we fix the refcount leak then we double-free the 'serial' string property upon hot unplug since its char* is copied into the virtio-blk device which has an identical 'serial' qdev property. This series solves both of these problems as follows: 1. Introduce a QOM alias property that lets the transport device forward property accesses into the virtio device (the child). 2. Use alias properties in transport devices, instead of keeping a duplicate copy of the VirtIOBlkConf struct. 3. Fix the virtio-blk device refcount leak. It's now safe to do this since the double-free has been resolved. Tested that hotplug/hotunplug of virtio-blk-pci still works. Cornelia Huck (1): dataplane: bail out on unsupported transport Stefan Hajnoczi (8): qom: add object_property_add_alias() virtio-blk: avoid qdev property definition duplication virtio-blk: move x-data-plane qdev property to virtio-blk.h qdev: add qdev_alias_all_properties() virtio-blk: use aliases instead of duplicate qdev properties virtio-blk: drop virtio_blk_set_conf() virtio: fix virtio-blk child refcount in transports virtio-blk: move qdev properties into virtio-blk.c hw/block/dataplane/virtio-blk.c | 10 hw/block/virtio-blk.c | 18 +-- hw/core/qdev.c | 21 + hw/s390x/s390-virtio-bus.c | 10 ++-- hw/s390x/s390-virtio-bus.h | 1 - hw/s390x/virtio-ccw.c | 7 ++ hw/s390x/virtio-ccw.h | 1 - hw/virtio/virtio-pci.c | 7 ++ hw/virtio/virtio-pci.h | 1 - include/hw/qdev-properties.h| 2 ++ include/hw/virtio/virtio-blk.h | 19 --- include/qom/object.h| 20 qom/object.c| 51 + 13 files changed, 121 insertions(+), 47 deletions(-) Applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan pgpcyIUD2l5RB.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH] numa: check for busy memory backend
On Mon, 30 Jun 2014 11:28:07 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Mon, Jun 30, 2014 at 03:46:56PM +0800, Hu Tao wrote: On Mon, Jun 30, 2014 at 09:53:20AM +0300, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 01:33:42PM +0800, Hu Tao wrote: On Sun, Jun 29, 2014 at 06:20:22PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 25, 2014 at 05:04:14PM +0800, Hu Tao wrote: ..to prevent one memory backend from being used by more than one numa node. Thanks, but please always make the msg content self-contained so it can be understood without the subject. E.g. here, just drop ..to. Are you sure we want this? Is there a chance sharing a backend can be useful? This patch is actually a bug fix. It is? What is the bug and how to reproduce it? If user specifies the same memory backend for two numa nodes: ./x86_64-softmmu/qemu-system-x86_64 -hda /home/data/libvirt-images/f18.img -m 512M \ -qmp unix:/tmp/m,server,nowait -monitor stdio -enable-kvm \ -object memory-backend-ram,size=256M,id=ram0 \ -numa node,nodeid=0,memdev=ram0 \ -numa node,nodeid=1,memdev=ram0 I am not sure we should write a ton of code to validate qemu configuration, as long as qemu does not assert. It seems qemu does not provide a way to disable assert currently. Even if I removed asserts on the code path in my test, there is another problem that it hits an infinite in render_memory_region(). OK so this is what commit log should say: --- Specifying the same memory region twice leads to an assert: ./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0 -numa node,nodeid=1,memdev=ram0 qemu-system-x86_64: /scm/qemu/memory.c:1506: memory_region_add_subregion_common: Assertion `!subregion-container' failed. Aborted (core dumped) Detect and exit with an error message instead. --- with fixed-up commit message: Reviewed-by: Igor Mammedov imamm...@redhat.com See? Explain why your patch makes sense, don't just repeat what it does. Even if we will want backend sharing, we can do it after. By reverting this patch? So why merge it? The point is qemu doesn't fire a bug no matter what user inputs. Igor, what's your take? Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- numa.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/numa.c b/numa.c index e471afe..6c1c554 100644 --- a/numa.c +++ b/numa.c @@ -279,6 +279,13 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner, exit(1); } +if (memory_region_is_mapped(seg)) { +char *path = object_get_canonical_path_component(OBJECT(backend)); +error_report(memory backend %s is busy, path); That's not very clear. How about: memory backend %s is used multiple times. Each -numa option must use a different memdev value. +g_free(path); As we are going to exit anyway, it does not make sense to bother with this. +exit(1); +} + memory_region_add_subregion(mr, addr, seg); vmstate_register_ram_global(seg); addr += size; -- 1.9.3
[Qemu-devel] [PATCHv2] ui/vnc: fix potential memory corruption issues
this patch makes the VNC server work correctly if the server surface and the guest surface have different sizes. Basically the server surface is adjusted to not exceed VNC_MAX_WIDTH x VNC_MAX_HEIGHT and additionally the width is rounded up to multiple of VNC_DIRTY_PIXELS_PER_BIT. If we have a resolution whose width is not dividable by VNC_DIRTY_PIXELS_PER_BIT we now get a small black bar on the right of the screen. If the surface is too big to fit the limits only the upper left area is shown. On top of that this fixes 2 memory corruption issues: The first was actually discovered during playing around with a Windows 7 vServer. During resolution change in Windows 7 it happens sometimes that Windows changes to an intermediate resolution where server_stride % cmp_bytes != 0 (in vnc_refresh_server_surface). This happens only if width % VNC_DIRTY_PIXELS_PER_BIT != 0. The second is a theoretical issue, but is maybe exploitable by the guest. If for some reason the guest surface size is bigger than VNC_MAX_WIDTH x VNC_MAX_HEIGHT we end up in severe corruption since this limit is nowhere enforced. Signed-off-by: Peter Lieven p...@kamp.de --- v1-v2: updated commit msg [Gerd] ui/vnc.c | 149 +- ui/vnc.h | 14 +++--- 2 files changed, 77 insertions(+), 86 deletions(-) diff --git a/ui/vnc.c b/ui/vnc.c index 14a86c3..a6d748b 100644 --- a/ui/vnc.c +++ b/ui/vnc.c @@ -432,14 +432,10 @@ static void framebuffer_update_request(VncState *vs, int incremental, static void vnc_refresh(DisplayChangeListener *dcl); static int vnc_refresh_server_surface(VncDisplay *vd); -static void vnc_dpy_update(DisplayChangeListener *dcl, - int x, int y, int w, int h) -{ -VncDisplay *vd = container_of(dcl, VncDisplay, dcl); -struct VncSurface *s = vd-guest; -int width = surface_width(vd-ds); -int height = surface_height(vd-ds); - +static void vnc_set_area_dirty(DECLARE_BITMAP(dirty[VNC_MAX_HEIGHT], + VNC_MAX_WIDTH / VNC_DIRTY_PIXELS_PER_BIT), + int width, int height, + int x, int y, int w, int h) { /* this is needed this to ensure we updated all affected * blocks if x % VNC_DIRTY_PIXELS_PER_BIT != 0 */ w += (x % VNC_DIRTY_PIXELS_PER_BIT); @@ -451,11 +447,22 @@ static void vnc_dpy_update(DisplayChangeListener *dcl, h = MIN(y + h, height); for (; y h; y++) { -bitmap_set(s-dirty[y], x / VNC_DIRTY_PIXELS_PER_BIT, +bitmap_set(dirty[y], x / VNC_DIRTY_PIXELS_PER_BIT, DIV_ROUND_UP(w, VNC_DIRTY_PIXELS_PER_BIT)); } } +static void vnc_dpy_update(DisplayChangeListener *dcl, + int x, int y, int w, int h) +{ +VncDisplay *vd = container_of(dcl, VncDisplay, dcl); +struct VncSurface *s = vd-guest; +int width = pixman_image_get_width(vd-server); +int height = pixman_image_get_height(vd-server); + +vnc_set_area_dirty(s-dirty, width, height, x, y, w, h); +} + void vnc_framebuffer_update(VncState *vs, int x, int y, int w, int h, int32_t encoding) { @@ -517,17 +524,15 @@ void buffer_advance(Buffer *buf, size_t len) static void vnc_desktop_resize(VncState *vs) { -DisplaySurface *ds = vs-vd-ds; - if (vs-csock == -1 || !vnc_has_feature(vs, VNC_FEATURE_RESIZE)) { return; } -if (vs-client_width == surface_width(ds) -vs-client_height == surface_height(ds)) { +if (vs-client_width == pixman_image_get_width(vs-vd-server) +vs-client_height == pixman_image_get_height(vs-vd-server)) { return; } -vs-client_width = surface_width(ds); -vs-client_height = surface_height(ds); +vs-client_width = pixman_image_get_width(vs-vd-server); +vs-client_height = pixman_image_get_height(vs-vd-server); vnc_lock_output(vs); vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE); vnc_write_u8(vs, 0); @@ -571,31 +576,24 @@ void *vnc_server_fb_ptr(VncDisplay *vd, int x, int y) ptr += x * VNC_SERVER_FB_BYTES; return ptr; } -/* this sets only the visible pixels of a dirty bitmap */ -#define VNC_SET_VISIBLE_PIXELS_DIRTY(bitmap, w, h) {\ -int y;\ -memset(bitmap, 0x00, sizeof(bitmap));\ -for (y = 0; y h; y++) {\ -bitmap_set(bitmap[y], 0,\ - DIV_ROUND_UP(w, VNC_DIRTY_PIXELS_PER_BIT));\ -} \ -} static void vnc_dpy_switch(DisplayChangeListener *dcl, DisplaySurface *surface) { VncDisplay *vd = container_of(dcl, VncDisplay, dcl); VncState *vs; +int width, height; vnc_abort_display_jobs(vd); /* server surface */ qemu_pixman_image_unref(vd-server); vd-ds = surface; +width = MIN(VNC_MAX_WIDTH, ROUND_UP(surface_width(vd-ds), +VNC_DIRTY_PIXELS_PER_BIT)); +height =
Re: [Qemu-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote: On 2014/6/30 14:48, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote: On 2014/6/26 18:03, Paolo Bonzini wrote: Il 26/06/2014 11:18, Chen, Tiejun ha scritto: - offsets 0x..0x0fff map to configuration space of the host MCH Are you saying the config space in the video device? No, I am saying in a new BAR, or at some magic offset of an existing MMIO BAR. As I mentioned previously, the IGD guy told me we have no any unused a offset or BAR in the config space. And guy who are responsible for the native driver seems not be accept to extend some magic offset of an existing MMIO BAR. In addition I think in a short time its not possible to migrate i440fx to q35 as a PCIe machine of xen. That seems like a weak motivation. I don't see a need to get something merged upstream in a short time: this seems sure to miss 2.1, so you have the time to make it architecturally sound. Making existing guests work would be a better motivation. Yes. So focus on this then. Existing guests will probably work fine on a newer chipset - likely better than on i440fx. xen management tools need to do some work to support this? That will just give everyone more long term benefits. If instead we create a hack that does not resemble any existing hardware even remotely, what's the chance that it will not break with any future guest modification? Isn't this possible with an mch chipset? If you're saying q35, I mean AFAIK we have no any plan to migrate to this MCH in xen case. q35 or a newer chipset that's closer to what guests expect. Additionally, I think I should follow this feature after q35 can work for xen scenario. What's stopping you? So could we do this step by step: #1 phase: We just cover current qemu-xen implementation based on i44fx, so still provide that pseudo ISA bridge at 00:1f.0 as we already did. By the way there is no reason to put it at 00:1f.0 specifically I think. So it seems simple: create a dummy device that gets device and vendor id as properties. If you really like, add an option to get values Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx passthrough: create pseudo intel isa bridge. There, we fake this device just at 00:1f.0. But you guys don't like this, and shouldn't this be just this point we discussing now? If you guys agree that , everything is fine. Actually, this isn't what you did. Don't tie it to xen, and don't tie it to 1f. Just make it a simple stub pci device. Whoever wants it, creates it. The thing I worry about, is the chance this will break going forward. So you created a system with 2 isa bridges. This is already not something that exists on real hardware. So it might break some guests that will get confused. Maybe we are lucky and most guests see an unfamiliar device and ignore it. It seems believable. But your MCH hack overrides registers in the pci host. Are we lucky and there's nothing in these registers of interest to guests? This seems much more fragile. So please poke at the spec, and compile the list of registers you want to touch, figure out why they are safe to override, and put this all in code comments. And the same thing that applies to the isa bridge applies here too. It should work without QEMU touching hosts' hardware. from sysfs: device and vendor id are world readable, so just get them directly and not through xen wrappers, this way you can open the files RO and not RW. You seem to poke at revision as well, I don't see driver looking at that - strictly necessary? If yes please patch host kernel to expose that info in sysfs, though we can fall back on pci config if not there. MCH (bridge_dev) hacks in i915 are nastier. To clean them up, we really have to have an explicit driver for this bridge, not a pass-through device. Long term, the right thing to do is likely to extend host driver and expose the necessary information in sysfs on host kernel. I'm a bit confused. Any sysfs should be filled by the associated PCIe device, shouldn't it? So qemu still need to emulate this PCIe device firstly, then set properties into sysfs. I am talking about getting host properties into qemu. You don't want to give qemu R/W root access to host sysfs system of the root bridge, that's not secure. Avoiding read only access to filesystem is a good idea too, so it should be possible to pass all parameters in as device properties, and let whoever starts qemu figure out what are reasonable values. #2 phase: Now, we will choose a capability ID that won't be conflicting with others. To do this properly, we need to get one from PCI SIG group. To have this workable and consistently validated, this method shouldn't be virt specific. Then native driver should use the same method. You mean you will be able to talk sense into hardware guys? I doubt that. If they could
Re: [Qemu-devel] [PATCH v5] ppc: spapr-rtas - implement os-term rtas call
Am 30.06.2014 um 10:35 schrieb Nikunj A Dadhania nik...@linux.vnet.ibm.com: PAPR compliant guest calls this in absence of kdump. This finally reaches the guest and can be handled according to the policies set by higher level tools(like taking dump) for further analysis by tools like crash. Linux kernel calls ibm,os-term when extended property of os-term is set. This makes sure that a return to the linux kernel is gauranteed. CC: Benjamin Herrenschmidt b...@au1.ibm.com CC: Anton Blanchard an...@samba.org CC: Alexander Graf ag...@suse.de CC: Tyrel Datwyler turtle.in.the.ker...@gmail.com Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com --- v2: rebase to ppcnext v3: Do not stop the VM, and update comments v4: update spapr_register_rtas and qapi_event changes v5: set ibm,extended-os-term as null encoded property --- hw/ppc/spapr.c | 9 + hw/ppc/spapr_rtas.c| 15 +++ include/hw/ppc/spapr.h | 1 - 3 files changed, 24 insertions(+), 1 deletion(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 307c58d..e6c9014 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -520,6 +520,15 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base, _FDT((fdt_property_cell(fdt, rtas-error-log-max, RTAS_ERROR_LOG_MAX))); +/* + * According to PAPR, rtas ibm,os-term, does not gaurantee a return + * back to the guest cpu. + * + * While an additional ibm,extended-os-term property indicates that + * rtas call return will always occur. Set this property. + */ +_FDT((fdt_property(fdt, ibm,extended-os-term, NULL, 0))); + _FDT((fdt_end_node(fdt))); /* interrupt controller */ diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c index 9ba1ba6..2ec2a8e 100644 --- a/hw/ppc/spapr_rtas.c +++ b/hw/ppc/spapr_rtas.c @@ -277,6 +277,19 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu, rtas_st(rets, 0, ret); } +static void rtas_ibm_os_term(PowerPCCPU *cpu, +sPAPREnvironment *spapr, +uint32_t token, uint32_t nargs, +target_ulong args, +uint32_t nret, target_ulong rets) +{ +target_ulong ret = 0; + +qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, error_abort); The guest doesn't pause. Since the guest will call os-term in a loop, this will also flood the event listener with lots and lots of panic messages. Alex
Re: [Qemu-devel] [PATCH] numa: check for busy memory backend
On Mon, Jun 30, 2014 at 10:48:22AM +0200, Igor Mammedov wrote: On Mon, 30 Jun 2014 11:28:07 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Mon, Jun 30, 2014 at 03:46:56PM +0800, Hu Tao wrote: On Mon, Jun 30, 2014 at 09:53:20AM +0300, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 01:33:42PM +0800, Hu Tao wrote: On Sun, Jun 29, 2014 at 06:20:22PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 25, 2014 at 05:04:14PM +0800, Hu Tao wrote: ..to prevent one memory backend from being used by more than one numa node. Thanks, but please always make the msg content self-contained so it can be understood without the subject. E.g. here, just drop ..to. Are you sure we want this? Is there a chance sharing a backend can be useful? This patch is actually a bug fix. It is? What is the bug and how to reproduce it? If user specifies the same memory backend for two numa nodes: ./x86_64-softmmu/qemu-system-x86_64 -hda /home/data/libvirt-images/f18.img -m 512M \ -qmp unix:/tmp/m,server,nowait -monitor stdio -enable-kvm \ -object memory-backend-ram,size=256M,id=ram0 \ -numa node,nodeid=0,memdev=ram0 \ -numa node,nodeid=1,memdev=ram0 I am not sure we should write a ton of code to validate qemu configuration, as long as qemu does not assert. It seems qemu does not provide a way to disable assert currently. Even if I removed asserts on the code path in my test, there is another problem that it hits an infinite in render_memory_region(). OK so this is what commit log should say: --- Specifying the same memory region twice leads to an assert: ./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0 -numa node,nodeid=1,memdev=ram0 qemu-system-x86_64: /scm/qemu/memory.c:1506: memory_region_add_subregion_common: Assertion `!subregion-container' failed. Aborted (core dumped) Detect and exit with an error message instead. --- with fixed-up commit message: Reviewed-by: Igor Mammedov imamm...@redhat.com Sorry I want the error message fixed up too. See? Explain why your patch makes sense, don't just repeat what it does. Even if we will want backend sharing, we can do it after. By reverting this patch? So why merge it? The point is qemu doesn't fire a bug no matter what user inputs. Igor, what's your take? Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- numa.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/numa.c b/numa.c index e471afe..6c1c554 100644 --- a/numa.c +++ b/numa.c @@ -279,6 +279,13 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner, exit(1); } +if (memory_region_is_mapped(seg)) { +char *path = object_get_canonical_path_component(OBJECT(backend)); +error_report(memory backend %s is busy, path); That's not very clear. How about: memory backend %s is used multiple times. Each -numa option must use a different memdev value. +g_free(path); As we are going to exit anyway, it does not make sense to bother with this. +exit(1); +} + memory_region_add_subregion(mr, addr, seg); vmstate_register_ram_global(seg); addr += size; -- 1.9.3
Re: [Qemu-devel] [PATCH v5] ppc: spapr-rtas - implement os-term rtas call
Alexander Graf ag...@suse.de writes: Am 30.06.2014 um 10:35 schrieb Nikunj A Dadhania nik...@linux.vnet.ibm.com: +static void rtas_ibm_os_term(PowerPCCPU *cpu, +sPAPREnvironment *spapr, +uint32_t token, uint32_t nargs, +target_ulong args, +uint32_t nret, target_ulong rets) +{ +target_ulong ret = 0; + +qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, error_abort); The guest doesn't pause. I see the event reaching libvirt and a dump taken and the guest restarts. Since the guest will call os-term in a loop, this will also flood the event listener with lots and lots of panic messages. do { status = rtas_call(rtas_token(ibm,os-term), 1, 1, NULL, __pa(rtas_os_term_buf)); } while (rtas_busy_delay(status)); So when status from the rtas call is success, the loop should exit. Am I missing something? Regards Nikunj
Re: [Qemu-devel] [v2][PATCH 4/8] xen, gfx passthrough: reserve 00:02.0 for INTEL IGD
Hi, /* Make cirrues VGA S3 suspend/resume work in Windows XP/2003 */ Device (VGA) { - Name (_ADR, 0x0002) + // Address of the VGA (device F function 0) + Name (_ADR, 0x000F) With this change I still didn't see anything. This does not match with what I see. Looks like linux does not care about the acpi data. Using The acpi data doesn't matter at all for the boot screen, the vgabios doesn't look at it. (d12) Scan for VGA option rom (d12) Running option rom at c000:0003 seabios loaded+initialized the vgabios (from the pci rom bar of the vga device). Which slot the vga is installed at should not matter at all. seabios scans the pci bus and should find the vga with no problems no matter where it is. (XEN) stdvga.c:147:d12v0 entering stdvga and caching modes Seems to have worked fine ;) (d12) pmm call arg1=0 (d12) Turning on vga text mode console (d12) SeaBIOS (version rel-1.7.5-0-ge51488c-20140626_113926-dcs-xen-54) At this point you should see the seabios banner at the vga screen. And the VGA screen has the SeaBIOS messages: SeaBIOS (version rel-1.7.5-0-ge51488c-20140626_113926-dcs-xen-54) Machine UUID 18cf5fd0-e564-49c4-b63f-e6c3c23b1294 As it should be. Now the questions is why it doesn't work for Tiejun ... Anything in the logs? cheers, Gerd
Re: [Qemu-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On 2014/6/30 17:05, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote: On 2014/6/30 14:48, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote: On 2014/6/26 18:03, Paolo Bonzini wrote: Il 26/06/2014 11:18, Chen, Tiejun ha scritto: - offsets 0x..0x0fff map to configuration space of the host MCH Are you saying the config space in the video device? No, I am saying in a new BAR, or at some magic offset of an existing MMIO BAR. As I mentioned previously, the IGD guy told me we have no any unused a offset or BAR in the config space. And guy who are responsible for the native driver seems not be accept to extend some magic offset of an existing MMIO BAR. In addition I think in a short time its not possible to migrate i440fx to q35 as a PCIe machine of xen. That seems like a weak motivation. I don't see a need to get something merged upstream in a short time: this seems sure to miss 2.1, so you have the time to make it architecturally sound. Making existing guests work would be a better motivation. Yes. So focus on this then. Existing guests will probably work fine on a newer chipset - likely better than on i440fx. xen management tools need to do some work to support this? That will just give everyone more long term benefits. If instead we create a hack that does not resemble any existing hardware even remotely, what's the chance that it will not break with any future guest modification? Isn't this possible with an mch chipset? If you're saying q35, I mean AFAIK we have no any plan to migrate to this MCH in xen case. q35 or a newer chipset that's closer to what guests expect. Additionally, I think I should follow this feature after q35 can work for xen scenario. What's stopping you? I mean if you want create an new machine based on q35, actually this is equal to start making xen to migrate to q35 now. Right? I can't image how much effort should be done. So this is a reason why I'm saying I'd like to follow this feature after q35 can work with xen completely. So could we do this step by step: #1 phase: We just cover current qemu-xen implementation based on i44fx, so still provide that pseudo ISA bridge at 00:1f.0 as we already did. By the way there is no reason to put it at 00:1f.0 specifically I think. So it seems simple: create a dummy device that gets device and vendor id as properties. If you really like, add an option to get values Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx passthrough: create pseudo intel isa bridge. There, we fake this device just at 00:1f.0. But you guys don't like this, and shouldn't this be just this point we discussing now? If you guys agree that , everything is fine. Actually, this isn't what you did. Don't tie it to xen, and don't tie it to 1f. Just make it a simple stub pci device. Whoever wants it, creates it. The thing I worry about, is the chance this will break going forward. So you created a system with 2 isa bridges. I don't set class type to claim this is an ISA bridge, just a normal PCI device. +static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice *hdev) +{ +struct PCIDevice *dev; + +char rid; + +/* We havt to use a simple PCI device to fake this ISA bridge + * to avoid making some confusion to BIOS and ACPI. + */ +dev = pci_create(bus, PCI_DEVFN(0x1f, 0), pseudo-intel-pch-isa-bridge); + +qdev_init_nofail(dev-qdev); + +pci_config_set_vendor_id(dev-config, hdev-vendor_id); +pci_config_set_device_id(dev-config, hdev-device_id); + +xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)rid, 1); + +pci_config_set_revision(dev-config, rid); + +XEN_PT_LOG(dev, The pseudo Intel PCH ISA bridge created.\n); +return 0; +} This is already not something that exists on real hardware. So it might break some guests that will get confused. Maybe we are lucky and most guests see an unfamiliar device and ignore it. It seems believable. But your MCH hack overrides registers in the pci host. We just try to write *one* register we already confirm this is safe enough. Other register are read-only. Are we lucky and there's nothing in these registers of interest to guests? This seems much more fragile. So please poke at the spec, and compile the list of registers you want to touch, figure out why they are safe to override, and put this all in code comments. And the same thing that applies to the isa bridge We just expose its own vendor/device ids here. We don't access to change anything in the isa bridge. applies here too. It should work without QEMU touching hosts' hardware. from sysfs: device and vendor id are world readable, so just get them directly and not through xen wrappers, this way you can open the files RO and not RW. You seem to poke at revision as well, I don't see driver looking at that - strictly necessary? If yes please
Re: [Qemu-devel] [PULL 01/15] block/iscsi: handle BUSY condition
On 06/19/2014 02:03 AM, Paolo Bonzini wrote: From: Peter Lieven p...@kamp.de this patch adds handling of BUSY status reponse from an iSCSI target. Currently, we fail with -EIO in case of SCSI_STATUS_BUSY while the obvious reaction would be to retry the operation after some time. The retry time is randomly choosen from a range with exponential growth increasing with each retry. This patch includes most of the changes by a an upcoming patch from Stefan Hajnoczi: iscsi: implement .bdrv_detach/attach_aio_context() because I also need the reference to the aio_context for the retry timer to work. I included the changes to maintain better mergeability. Signed-off-by: Peter Lieven p...@kamp.de Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- block/iscsi.c | 55 +++ 1 file changed, 43 insertions(+), 12 deletions(-) diff --git a/block/iscsi.c b/block/iscsi.c index 6f87605..e64659f 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -26,6 +26,7 @@ #include config-host.h #include poll.h +#include math.h #include arpa/inet.h #include qemu-common.h #include qemu/config-file.h @@ -75,6 +76,7 @@ typedef struct IscsiTask { Coroutine *co; QEMUBH *bh; IscsiLun *iscsilun; +QEMUTimer retry_timer; } IscsiTask; typedef struct IscsiAIOCB { @@ -86,7 +88,6 @@ typedef struct IscsiAIOCB { uint8_t *buf; int status; int canceled; -int retries; int64_t sector_num; int nb_sectors; #ifdef __linux__ @@ -96,7 +97,8 @@ typedef struct IscsiAIOCB { #define NOP_INTERVAL 5000 #define MAX_NOP_FAILURES 3 -#define ISCSI_CMD_RETRIES 5 +#define ISCSI_CMD_RETRIES ARRAY_SIZE(iscsi_retry_times) +static const unsigned iscsi_retry_times[] = {8, 32, 128, 512, 2048}; /* this threshhold is a trade-off knob to choose between * the potential additional overhead of an extra GET_LBA_STATUS request @@ -146,6 +148,19 @@ static void iscsi_co_generic_bh_cb(void *opaque) qemu_coroutine_enter(iTask-co, NULL); } +static void iscsi_retry_timer_expired(void *opaque) +{ +struct IscsiTask *iTask = opaque; +if (iTask-co) { +qemu_coroutine_enter(iTask-co, NULL); +} +} + +static inline unsigned exp_random(double mean) +{ +return -mean * log((double)rand() / RAND_MAX); This new use of log() breaks linkage on Fedora20/ppc64, qemu-nbd fails (but qemu-system-ppc64 still links find). Adding --extra-cflags=-lm to ./configure invocation helps. How to fix? Seems like fc20 packages do not have some dependency which qemu-nbd relies on. Thanks. [aik@vpl2 qemu]$ make V=1 cc -DHAS_LIBSSH2_SFTP_FSYNC -m64 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/include/p11-kit-1 -I/usr/include/p11-kit-1 -I/usr/include/libpng16 -I/usr/include/libusb-1.0 -I/usr/include/pixman-1 -I/home/aik/p/qemu/tests -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -g -Wl,--warn-common -m64 -g -o qemu-nbd qemu-nbd.o async.o thread-pool.o nbd.o block.o blockjob.o main-loop.o iohandler.o qemu-timer.o aio-posix.o qapi-types.o qapi-visit.o qemu-io-cmds.o qemu-coroutine.o qemu-coroutine-lock.o qemu-coroutine-io.o qemu-coroutine-sleep.o coroutine-ucontext.o block/raw_bsd.o block/cow.o block/qcow.o block/vdi.o block/vmdk.o block/cloop.o block/dmg.o block/bochs.o block/vpc.o block/vvfat.o block/qcow2.o block/qcow2-refcount.o block/qcow2-cluster.o block/qcow2-snapshot.o block/qcow2-cache.o block/qed.o block/qed-gencb.o block/qed-l2-cache.o block/qed-table.o block/qed-cluster.o block/qed-check.o block/vhdx.o block/vhdx-endian.o block/vhdx-log.o block/quorum.o block/parallels.o block/blkdebug.o block/blkverify.o block/snapshot.o block/qapi.o block/raw-posix.o block/linux-aio.o block/nbd.o block/nbd-client.o block/sheepdog.o block/iscsi.o block/curl.o block/rbd.o block/gluster.o block/ssh.o libqemuutil.a libqemustub.a -lz -laio -L/usr/lib64/iscsi -liscsi -lcurl -lrbd -lrados -lgfapi -lglusterfs -lgfrpc -lgfxdr -lssh2 -lgthread-2.0 -pthread -lglib-2.0 -lz -lrt -lz -luuid -lgnutls -lgnutls -lgnutls -lSDL -lpthread -lX11 -lvte2_90 -lgtk-3 -lgdk-3 -lpangocairo-1.0 -lpango-1.0 -latk-1.0 -lcairo-gobject -lgdk_pixbuf-2.0 -lgio-2.0 -lgobject-2.0 -lglib-2.0 -lcairo -lX11 -lXext -lgtk-3 -lgdk-3 -lpangocairo-1.0 -lpango-1.0 -latk-1.0 -lcairo-gobject -lcairo -lgdk_pixbuf-2.0 -lgio-2.0 -lgobject-2.0 -lglib-2.0 -lX11 -lrdmacm -libverbs /usr/bin/ld: block/iscsi.o: undefined reference to symbol 'log@@GLIBC_2.3' /usr/bin/ld: note: 'log@@GLIBC_2.3' is defined in DSO /lib64/libm.so.6 so try adding it to the linker command
Re: [Qemu-devel] [PATCH v2] qtest: enable vhost-user-test
On Thu, Jun 19, 2014 at 09:24:09PM +0300, Michael S. Tsirkin wrote: On Thu, Jun 19, 2014 at 08:35:42PM +0300, Nikolay Nikolaev wrote: Use qtest-obj-y to get the right library order. CONFIG_POSIX ensures mingw compilation won't break. Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com okay but why does non posix work without -lutil and posix doesn't? Ping. --- 0 files changed diff --git a/tests/Makefile b/tests/Makefile index 4caf7de..5661d52 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -156,7 +156,7 @@ gcov-files-i386-y += hw/usb/hcd-ehci.c gcov-files-i386-y += hw/usb/hcd-uhci.c gcov-files-i386-y += hw/usb/dev-hid.c gcov-files-i386-y += hw/usb/dev-storage.c -#check-qtest-i386-y += tests/vhost-user-test$(EXESUF) +check-qtest-i386-$(CONFIG_POSIX) += tests/vhost-user-test$(EXESUF) check-qtest-x86_64-y = $(check-qtest-i386-y) gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c gcov-files-x86_64-y = $(subst i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y)) @@ -323,11 +323,14 @@ tests/es1370-test$(EXESUF): tests/es1370-test.o tests/intel-hda-test$(EXESUF): tests/intel-hda-test.o tests/ioh3420-test$(EXESUF): tests/ioh3420-test.o tests/usb-hcd-ehci-test$(EXESUF): tests/usb-hcd-ehci-test.o $(libqos-pc-obj-y) -tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o qemu-timer.o libqemuutil.a libqemustub.a +tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o qemu-timer.o $(qtest-obj-y) tests/qemu-iotests/socket_scm_helper$(EXESUF): tests/qemu-iotests/socket_scm_helper.o tests/test-qemu-opts$(EXESUF): tests/test-qemu-opts.o libqemuutil.a libqemustub.a -#LIBS+= -lutil + +ifeq ($(CONFIG_POSIX),y) +LIBS+= -lutil +endif # QTest rules
Re: [Qemu-devel] [PATCH v3 1/2] block: Do not prematurely remove filename
Am 26.06.2014 um 23:38 hat Max Reitz geschrieben: If filename is removed from the options QDict before entering bdrv_open_common(), it cannot be stored in the BDS. Therefore, wait until it has been copied there and remove it from the options only afterwards. This fixes filename in the BDS being empty for block drivers which do not need the filename because they have parsed it already (e.g. NBD). Signed-off-by: Max Reitz mre...@redhat.com I can't say I like this approach. It looks a bit odd to pass a boolean variable to bdrv_open(), and in some other function called from there the cleanup is done that logically really belong to bdrv_fill_options(). More importantly, the goal was to get rid of the filename and handle everything through the options so that we get a uniform state again. This would involve replacing bs-filename by a new callback function in BlockDriver that constructs a filename that describes the BDS. This way we would get useful output not only for nbd:localhost:10809, but also for driver=nbd,host=localhost. In hard cases, the callback might just use json:{...} syntax. This suggests that maybe in the end we'll want to have two different callbacks, one giving a short human-readable description ('localhost:10809') and another one giving something that can be used on the command line ('json:{driver: nbd, host: localhost, ipv6: true}'). Kevin
[Qemu-devel] [PATCH 0/3] block: introduce submit I/O at batch
Hi, The commit 580b6b2aa2(dataplane: use the QEMU block layer for I/O) introduces ~40% throughput regression on virtio-blk dataplane, and one of causes is that submitting I/O at batch is removed. This patchset trys to introduce this mechanism on block, at least, linux-aio can benefit from that. With these patches, it is observed that thoughout on virtio-blk dataplane can be improved a lot, see data in commit log of patch 3/3. It should be possible to apply the batch mechanism to other devices (such as virtio-scsi) too. Thanks, -- Ming Lei
[Qemu-devel] [PATCH 1/3] block: introduce IO queue APIs
This patch introduces IO queue related APIs so that following patches can support queuing I/O requests and submitting them at batch for improving I/O performance. Signed-off-by: Ming Lei ming@canonical.com --- aio-posix.c | 13 block.c | 78 + include/block/aio.h | 12 +++ include/block/block.h |6 include/block/block_int.h |3 ++ 5 files changed, 112 insertions(+) diff --git a/aio-posix.c b/aio-posix.c index f921d4f..3e065c1 100644 --- a/aio-posix.c +++ b/aio-posix.c @@ -240,3 +240,16 @@ bool aio_poll(AioContext *ctx, bool blocking) return progress; } + +/* IO queue support */ +void aio_init_io_queue(AioContext *ctx, unsigned int size) +{ +ctx-io_queue.iocbs = g_new0(void *, size); +ctx-io_queue.size = size; +ctx-io_queue.idx = 0; +} + +void aio_exit_io_queue(AioContext *ctx) +{ +g_free(ctx-io_queue.iocbs); +} diff --git a/block.c b/block.c index 217f523..c4a6f8b 100644 --- a/block.c +++ b/block.c @@ -1910,6 +1910,7 @@ void bdrv_drain_all(void) bool bs_busy; aio_context_acquire(aio_context); +bdrv_io_unplug(bs); bdrv_start_throttled_reqs(bs); bs_busy = bdrv_requests_pending(bs); bs_busy |= aio_poll(aio_context, bs_busy); @@ -5774,3 +5775,80 @@ bool bdrv_is_first_non_filter(BlockDriverState *candidate) return false; } + +/* IO queue APIs */ +void bdrv_init_io_queue(BlockDriverState *bs, unsigned int size) +{ +aio_init_io_queue(bs-aio_context, size); +} + +void bdrv_exit_io_queue(BlockDriverState *bs) +{ +aio_exit_io_queue(bs-aio_context); +} + +/* We think one block driver supports feature of io queue if + * they implement callback of .bdrv_submit_io_queue + */ +static bool __bdrv_support_io_queue(BlockDriverState *bs) +{ +if (bs bs-drv bs-drv-bdrv_submit_io_queue) +return true; +return false; +} +bool bdrv_support_io_queue(BlockDriverState *bs) +{ +if (__bdrv_support_io_queue(bs-file)) +return true; +if (__bdrv_support_io_queue(bs-backing_hd)) +return true; +return false; +} + +static void __bdrv_io_unplug(BlockDriverState *bs) +{ +if (!bdrv_support_io_queue(bs)) +return; + +if (!bs-aio_context-io_queue.idx) +return; + +if (__bdrv_support_io_queue(bs-file)) +bs-file-drv-bdrv_submit_io_queue(bs-file); +if (__bdrv_support_io_queue(bs-backing_hd)) +bs-backing_hd-drv-bdrv_submit_io_queue(bs-backing_hd); +bs-aio_context-io_queue.idx = 0; +} + +/* BlockDriver may call this function for queuing current IO request + * represented by iocb to io_queue, and will submit them at batch. + */ +void bdrv_queue_io(BlockDriverState *bs, void *iocb) +{ +unsigned int idx = bs-aio_context-io_queue.idx; + +bs-aio_context-io_queue.iocbs[idx++] = iocb; +bs-aio_context-io_queue.idx = idx; + +/* unplug immediately if queue is full */ +if (idx == bs-aio_context-io_queue.size) +__bdrv_io_unplug(bs); +} + +/* Block device calls this function to let driver queue IO request + * and submit them at batch later, but it is just a hint because + * block driver may not support the feature. + */ +void bdrv_io_plug(BlockDriverState *bs) +{ +bs-aio_context-io_plugged = true; +} + +/* Block device calls this function to ask driver to submit + * all requests in io queue immediatelly. + */ +void bdrv_io_unplug(BlockDriverState *bs) +{ +__bdrv_io_unplug(bs); +bs-aio_context-io_plugged = false; +} diff --git a/include/block/aio.h b/include/block/aio.h index a92511b..b7fdcfb 100644 --- a/include/block/aio.h +++ b/include/block/aio.h @@ -45,6 +45,12 @@ typedef struct AioHandler AioHandler; typedef void QEMUBHFunc(void *opaque); typedef void IOHandler(void *opaque); +typedef struct AioIOQueue { +void **iocbs; +unsigned int size; +unsigned int idx; +} AioIOQueue; + struct AioContext { GSource source; @@ -81,6 +87,10 @@ struct AioContext { /* TimerLists for calling timers - one per clock type */ QEMUTimerListGroup tlg; + +/* IOQueue support */ +AioIOQueue io_queue; +bool io_plugged; }; /** @@ -307,4 +317,6 @@ static inline void aio_timer_init(AioContext *ctx, timer_init(ts, ctx-tlg.tl[type], scale, cb, opaque); } +void aio_init_io_queue(AioContext *ctx, unsigned int size); +void aio_exit_io_queue(AioContext *ctx); #endif diff --git a/include/block/block.h b/include/block/block.h index d0baf4f..702b79a 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -578,4 +578,10 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs); */ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context); +void bdrv_init_io_queue(BlockDriverState *bs, unsigned int size); +void bdrv_exit_io_queue(BlockDriverState *bs); +bool bdrv_support_io_queue(BlockDriverState *bs); +void bdrv_queue_io(BlockDriverState
[Qemu-devel] [PATCH 2/3] block: linux-aio: support submit io_queue
This patch implements .bdrv_submit_io_queue callback for linux-aio Block Drivers, so that submitting I/O at batch can be supported on linux-aio. Signed-off-by: Ming Lei ming@canonical.com --- block/linux-aio.c | 32 ++-- block/raw-aio.h |1 + block/raw-posix.c | 16 3 files changed, 47 insertions(+), 2 deletions(-) diff --git a/block/linux-aio.c b/block/linux-aio.c index f0a2c08..12f56d8 100644 --- a/block/linux-aio.c +++ b/block/linux-aio.c @@ -11,6 +11,7 @@ #include block/aio.h #include qemu/queue.h #include block/raw-aio.h +#include block/block_int.h #include qemu/event_notifier.h #include libaio.h @@ -135,6 +136,29 @@ static const AIOCBInfo laio_aiocb_info = { .cancel = laio_cancel, }; +int laio_submit_io_queue(BlockDriverState *bs, void *aio_ctx) +{ +struct qemu_laio_state *s = aio_ctx; +int ret, i; + +while (1) { +ret = io_submit(s-ctx, bs-aio_context-io_queue.idx, + (struct iocb **)bs-aio_context-io_queue.iocbs); +if (ret != -EAGAIN) +break; +} + +if (ret = 0) + return 0; + +for (i = 0; i bs-aio_context-io_queue.idx; i++) { +struct qemu_laiocb *laiocb = +container_of(bs-aio_context-io_queue.iocbs[i], struct qemu_laiocb, iocb); +qemu_aio_release(laiocb); +} +return ret; +} + BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd, int64_t sector_num, QEMUIOVector *qiov, int nb_sectors, BlockDriverCompletionFunc *cb, void *opaque, int type) @@ -168,8 +192,12 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd, } io_set_eventfd(laiocb-iocb, event_notifier_get_fd(s-e)); -if (io_submit(s-ctx, 1, iocbs) 0) -goto out_free_aiocb; +if (!bs-aio_context-io_plugged) { +if (io_submit(s-ctx, 1, iocbs) 0) +goto out_free_aiocb; +} else { +bdrv_queue_io(bs, iocbs); +} return laiocb-common; out_free_aiocb: diff --git a/block/raw-aio.h b/block/raw-aio.h index 8cf084e..89fa775 100644 --- a/block/raw-aio.h +++ b/block/raw-aio.h @@ -40,6 +40,7 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd, BlockDriverCompletionFunc *cb, void *opaque, int type); void laio_detach_aio_context(void *s, AioContext *old_context); void laio_attach_aio_context(void *s, AioContext *new_context); +int laio_submit_io_queue(BlockDriverState *bs, void *aio_ctx); #endif #ifdef _WIN32 diff --git a/block/raw-posix.c b/block/raw-posix.c index dacf4fb..5869e8e 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -1054,6 +1054,17 @@ static BlockDriverAIOCB *raw_aio_submit(BlockDriverState *bs, cb, opaque, type); } +static int raw_aio_submit_io_queue(BlockDriverState *bs) +{ +BDRVRawState *s = bs-opaque; + +#ifdef CONFIG_LINUX_AIO +if (s-use_aio) +return laio_submit_io_queue(bs, s-aio_ctx); +#endif +return 0; +} + static BlockDriverAIOCB *raw_aio_readv(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov, int nb_sectors, BlockDriverCompletionFunc *cb, void *opaque) @@ -1503,6 +1514,7 @@ static BlockDriver bdrv_file = { .bdrv_aio_flush = raw_aio_flush, .bdrv_aio_discard = raw_aio_discard, .bdrv_refresh_limits = raw_refresh_limits, +.bdrv_submit_io_queue = raw_aio_submit_io_queue, .bdrv_truncate = raw_truncate, .bdrv_getlength = raw_getlength, @@ -1902,6 +1914,7 @@ static BlockDriver bdrv_host_device = { .bdrv_aio_flush= raw_aio_flush, .bdrv_aio_discard = hdev_aio_discard, .bdrv_refresh_limits = raw_refresh_limits, +.bdrv_submit_io_queue = raw_aio_submit_io_queue, .bdrv_truncate = raw_truncate, .bdrv_getlength= raw_getlength, @@ -2047,6 +2060,7 @@ static BlockDriver bdrv_host_floppy = { .bdrv_aio_writev= raw_aio_writev, .bdrv_aio_flush= raw_aio_flush, .bdrv_refresh_limits = raw_refresh_limits, +.bdrv_submit_io_queue = raw_aio_submit_io_queue, .bdrv_truncate = raw_truncate, .bdrv_getlength = raw_getlength, @@ -2175,6 +2189,7 @@ static BlockDriver bdrv_host_cdrom = { .bdrv_aio_writev= raw_aio_writev, .bdrv_aio_flush= raw_aio_flush, .bdrv_refresh_limits = raw_refresh_limits, +.bdrv_submit_io_queue = raw_aio_submit_io_queue, .bdrv_truncate = raw_truncate, .bdrv_getlength = raw_getlength, @@ -2309,6 +2324,7 @@ static BlockDriver bdrv_host_cdrom = { .bdrv_aio_writev= raw_aio_writev, .bdrv_aio_flush= raw_aio_flush, .bdrv_refresh_limits = raw_refresh_limits, +.bdrv_submit_io_queue = raw_aio_submit_io_queue, .bdrv_truncate = raw_truncate, .bdrv_getlength = raw_getlength, -- 1.7.9.5
[Qemu-devel] [PATCH 3/3] dataplane: submit I/O at batch
Before commit 580b6b2aa2(dataplane: use the Qemu block layer for I/O), dataplane for virtio-blk submits block I/O at batch. This commit 580b6b2aa2 replaces the custom linux AIO implementation(including I/O batch) with Qemu block layer, but this commit causes ~40% throughput regression on virtio-blk performance, and removing submitting I/O at batch is one of the cause. This patch applys the new introduced bdrv_io_plug() and bdrv_io_unplug() interfaces to support submitting I/O at batch for Qemu block layer, and in my test, the change can improve thoughput by ~30% with 'aio=native'. Following my fio test script: [global] direct=1 size=4G bsrange=4k-4k timeout=40 numjobs=4 ioengine=libaio iodepth=64 filename=/dev/vdc group_reporting=1 [f] rw=randread Result on one of my small machine(host: x86_64, 2cores, 4thread, guest: 4cores): - qemu master: 62K IOPS - qemu master with these patches: 84K IOPS - 2.0.0 release(dataplane using custom linux aio): 97K IOPS Signed-off-by: Ming Lei ming@canonical.com --- hw/block/dataplane/virtio-blk.c |5 + 1 file changed, 5 insertions(+) diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c index c10b7b7..5deaddc 100644 --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -289,6 +289,7 @@ static void handle_notify(EventNotifier *e) int ret; event_notifier_test_and_clear(s-host_notifier); +bdrv_io_plug(s-blk-conf.bs); for (;;) { /* Disable guest-host notifies to avoid unnecessary vmexits */ vring_disable_notification(s-vdev, s-vring); @@ -322,6 +323,7 @@ static void handle_notify(EventNotifier *e) break; } } +bdrv_io_unplug(s-blk-conf.bs); } /* Context: QEMU global mutex held */ @@ -431,6 +433,7 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s) trace_virtio_blk_data_plane_start(s); bdrv_set_aio_context(s-blk-conf.bs, s-ctx); +bdrv_init_io_queue(s-blk-conf.bs, 128); /* Kick right away to begin processing requests already in vring */ event_notifier_set(virtio_queue_get_host_notifier(vq)); @@ -462,6 +465,8 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s) aio_context_release(s-ctx); +bdrv_exit_io_queue(s-blk-conf.bs); + /* Sync vring state back to virtqueue so that non-dataplane request * processing can continue when we disable the host notifier below. */ -- 1.7.9.5
Re: [Qemu-devel] [PATCH v8 00/14] qemu-img: Implement commit like QMP
Am 28.06.2014 um 00:07 hat Max Reitz geschrieben: On 07.06.2014 20:51, Max Reitz wrote: qemu-img should use QMP commands whenever possible in order to ensure feature completeness of both online and offline image operations. For the commit command, this is relatively easy, so implement it first (in the hope that indeed others will follow). As qemu-img does not have access to QMP (due to QMP being intertwined with basically everything in qemu), we cannot directly use QMP, but at least use the functions the corresponding QMP commands are using (which would be block-commit, in this case). With Stefan's pull request for his dataplane series now out, I thought this a good opportunity to send a rebase of this series. Ping; Hu Tao will need minimal_blob_size() from patch 3 for the next iteration of his qemu-img: add preallocation=full series. Sending an own patch just for that function seems infeasible, as it is a static function which would be unused in the meantime (which throws a compiler warning and an error thanks to -Werror). Using __attribute__((unused)) just for this seems like a hack; especially considering that all patches of this series have been reviewed and it should therefore be ready to merge. In case there are some objections because you want to test it more, it would be fine to merge the first three patches (which suffice for the preallocation series and should only introduce unused codepaths) now and the rest later on. This series needs rebasing from patch 5 on. I'll review patches 1-3 so that the dependency for the other series is there. Kevin
Re: [Qemu-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote: On 2014/6/30 17:05, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote: On 2014/6/30 14:48, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote: On 2014/6/26 18:03, Paolo Bonzini wrote: Il 26/06/2014 11:18, Chen, Tiejun ha scritto: - offsets 0x..0x0fff map to configuration space of the host MCH Are you saying the config space in the video device? No, I am saying in a new BAR, or at some magic offset of an existing MMIO BAR. As I mentioned previously, the IGD guy told me we have no any unused a offset or BAR in the config space. And guy who are responsible for the native driver seems not be accept to extend some magic offset of an existing MMIO BAR. In addition I think in a short time its not possible to migrate i440fx to q35 as a PCIe machine of xen. That seems like a weak motivation. I don't see a need to get something merged upstream in a short time: this seems sure to miss 2.1, so you have the time to make it architecturally sound. Making existing guests work would be a better motivation. Yes. So focus on this then. Existing guests will probably work fine on a newer chipset - likely better than on i440fx. xen management tools need to do some work to support this? That will just give everyone more long term benefits. If instead we create a hack that does not resemble any existing hardware even remotely, what's the chance that it will not break with any future guest modification? Isn't this possible with an mch chipset? If you're saying q35, I mean AFAIK we have no any plan to migrate to this MCH in xen case. q35 or a newer chipset that's closer to what guests expect. Additionally, I think I should follow this feature after q35 can work for xen scenario. What's stopping you? I mean if you want create an new machine based on q35, actually this is equal to start making xen to migrate to q35 now. Right? I can't image how much effort should be done. I don't see why you don't try. Sounds like a more robust solution to me. So this is a reason why I'm saying I'd like to follow this feature after q35 can work with xen completely. Then we'll end up with more configurations to support, and to what end? So could we do this step by step: #1 phase: We just cover current qemu-xen implementation based on i44fx, so still provide that pseudo ISA bridge at 00:1f.0 as we already did. By the way there is no reason to put it at 00:1f.0 specifically I think. So it seems simple: create a dummy device that gets device and vendor id as properties. If you really like, add an option to get values Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx passthrough: create pseudo intel isa bridge. There, we fake this device just at 00:1f.0. But you guys don't like this, and shouldn't this be just this point we discussing now? If you guys agree that , everything is fine. Actually, this isn't what you did. Don't tie it to xen, and don't tie it to 1f. Just make it a simple stub pci device. Whoever wants it, creates it. The thing I worry about, is the chance this will break going forward. So you created a system with 2 isa bridges. I don't set class type to claim this is an ISA bridge, just a normal PCI device. +static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice *hdev) +{ +struct PCIDevice *dev; + +char rid; + +/* We havt to use a simple PCI device to fake this ISA bridge + * to avoid making some confusion to BIOS and ACPI. + */ +dev = pci_create(bus, PCI_DEVFN(0x1f, 0), pseudo-intel-pch-isa-bridge); + +qdev_init_nofail(dev-qdev); + +pci_config_set_vendor_id(dev-config, hdev-vendor_id); +pci_config_set_device_id(dev-config, hdev-device_id); + +xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)rid, 1); + +pci_config_set_revision(dev-config, rid); + +XEN_PT_LOG(dev, The pseudo Intel PCH ISA bridge created.\n); +return 0; +} Then I don't see how it's supposed to work. Doesn't i915 look for an isa bridge? /* * The reason to probe ISA bridge instead of Dev31:Fun0 is to * make graphics device passthrough work easy for VMM, that only * need to expose ISA bridge to let driver know the real hardware * underneath. This is a requirement from virtualization team. * * In some virtualized environments (e.g. XEN), there is irrelevant * ISA bridge in the system. To work reliably, we should scan trhough * all the ISA bridge devices and check for the first match, instead * of only checking the first one. */ while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA 8, pch))) { if (pch-vendor == PCI_VENDOR_ID_INTEL) { unsigned short id =
Re: [Qemu-devel] [SeaBIOS] [PATCH v3] hw/pci: reserve IO and mem for pci express downstream ports with no devices attached
On Mo, 2014-06-23 at 18:55 +0300, Michael S. Tsirkin wrote: On Mon, Jun 23, 2014 at 06:29:51PM +0300, Marcel Apfelbaum wrote: Commit c6e298e1f12e0f4ca02b6da5e42919ae055f6830 hw/pci: reserve IO and mem for pci-2-pci bridges with no devices attached introduced support for hot-plugging devices behind pci-2-pci bridges. Extend hotplug support also for pci express downstream ports. Signed-off-by: Marcel Apfelbaum marce...@redhat.com Reviewed-by: Michael S. Tsirkin m...@redhat.com Gerd, is there a plan to do a release for QEMU 2.1? It would be nice to have this patch there. It's in already (seabios 1.7.5 release qemu git master). cheers, Gerd
[Qemu-devel] [PATCH v3] tests: Functions bus_foreach and device_find from libqos virtio API
Virtio header has been changed to compile and work with a real device. Functions bus_foreach and device_find have been implemented for PCI. Virtio-blk test case now opens a fake device. Signed-off-by: Marc Marí marc.mari.barc...@gmail.com --- tests/Makefile|3 +- tests/libqos/virtio-pci.c | 72 + tests/libqos/virtio-pci.h | 31 +++ tests/libqos/virtio.h | 28 ++ tests/virtio-blk-test.c | 65 +++- 5 files changed, 190 insertions(+), 9 deletions(-) create mode 100644 tests/libqos/virtio-pci.c create mode 100644 tests/libqos/virtio-pci.h create mode 100644 tests/libqos/virtio.h diff --git a/tests/Makefile b/tests/Makefile index 7e53d0d..028c462 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -292,6 +292,7 @@ libqos-obj-y += tests/libqos/i2c.o libqos-pc-obj-y = $(libqos-obj-y) tests/libqos/pci-pc.o libqos-pc-obj-y += tests/libqos/malloc-pc.o libqos-omap-obj-y = $(libqos-obj-y) tests/libqos/i2c-omap.o +libqos-virtio-obj-y = $(libqos-obj-y) $(libqos-pc-obj-y) tests/libqos/virtio-pci.o tests/rtc-test$(EXESUF): tests/rtc-test.o tests/m48t59-test$(EXESUF): tests/m48t59-test.o @@ -312,7 +313,7 @@ tests/eepro100-test$(EXESUF): tests/eepro100-test.o tests/vmxnet3-test$(EXESUF): tests/vmxnet3-test.o tests/ne2000-test$(EXESUF): tests/ne2000-test.o tests/virtio-balloon-test$(EXESUF): tests/virtio-balloon-test.o -tests/virtio-blk-test$(EXESUF): tests/virtio-blk-test.o +tests/virtio-blk-test$(EXESUF): tests/virtio-blk-test.o $(libqos-virtio-obj-y) tests/virtio-net-test$(EXESUF): tests/virtio-net-test.o tests/virtio-rng-test$(EXESUF): tests/virtio-rng-test.o tests/virtio-scsi-test$(EXESUF): tests/virtio-scsi-test.o diff --git a/tests/libqos/virtio-pci.c b/tests/libqos/virtio-pci.c new file mode 100644 index 000..9fd9735 --- /dev/null +++ b/tests/libqos/virtio-pci.c @@ -0,0 +1,72 @@ +/* + * libqos virtio PCI driver + * + * Copyright (c) 2014 Marc Marí + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include glib.h +#include stdio.h +#include libqtest.h +#include libqos/virtio.h +#include libqos/virtio-pci.h +#include libqos/pci.h +#include libqos/pci-pc.h + +#include hw/pci/pci_regs.h + +static QVirtioPCIDevice *qpcidevice_to_qvirtiodevice(QPCIDevice *pdev) +{ +QVirtioPCIDevice *vpcidev; +vpcidev = g_malloc0(sizeof(*vpcidev)); + +if (pdev) { +vpcidev-pdev = pdev; +vpcidev-vdev.device_type = +qpci_config_readw(vpcidev-pdev, PCI_SUBSYSTEM_ID); +} + +return vpcidev; +} + +static void qvirtio_pci_foreach_callback( +QPCIDevice *dev, int devfn, void *data) +{ +QVirtioPCIForeachData *d = data; +QVirtioPCIDevice *vpcidev = qpcidevice_to_qvirtiodevice(dev); + +if (vpcidev-vdev.device_type == d-device_type) { +d-func(vpcidev-vdev, d-user_data); +} +} + +static void qvirtio_pci_assign_device(QVirtioDevice *d, void *data) +{ +QVirtioPCIDevice *vpcidev = data; +vpcidev-pdev = ((QVirtioPCIDevice *)d)-pdev; +vpcidev-vdev.device_type = ((QVirtioPCIDevice *)d)-vdev.device_type; +} + +void qvirtio_pci_foreach(QPCIBus *bus, uint16_t device_type, +void (*func)(QVirtioDevice *d, void *data), void *data) +{ +QVirtioPCIForeachData d = { .func = func, +.device_type = device_type, +.user_data = data }; + +qpci_device_foreach(bus, QVIRTIO_VENDOR_ID, -1, +qvirtio_pci_foreach_callback, d); +} + +QVirtioPCIDevice *qvirtio_pci_device_find(QPCIBus *bus, uint16_t device_type) +{ +QVirtioPCIDevice *dev; + +dev = g_malloc0(sizeof(*dev)); +qvirtio_pci_foreach(bus, device_type, qvirtio_pci_assign_device, dev); + +return dev; +} + diff --git a/tests/libqos/virtio-pci.h b/tests/libqos/virtio-pci.h new file mode 100644 index 000..f00a982 --- /dev/null +++ b/tests/libqos/virtio-pci.h @@ -0,0 +1,31 @@ +/* + * libqos virtio PCI definitions + * + * Copyright (c) 2014 Marc Marí + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#ifndef LIBQOS_VIRTIO_PCI_H +#define LIBQOS_VIRTIO_PCI_H + +#include libqos/virtio.h +#include libqos/pci.h + +typedef struct QVirtioPCIDevice { +QVirtioDevice vdev; +QPCIDevice *pdev; +} QVirtioPCIDevice; + +typedef struct QVirtioPCIForeachData { +void (*func)(QVirtioDevice *d, void *data); +uint16_t device_type; +void *user_data; +} QVirtioPCIForeachData; + +void qvirtio_pci_foreach(QPCIBus *bus, uint16_t device_type, +void (*func)(QVirtioDevice *d, void *data), void *data); + +QVirtioPCIDevice *qvirtio_pci_device_find(QPCIBus *bus, uint16_t device_type); +#endif diff --git
Re: [Qemu-devel] [PATCH] hw/virtio: enable common virtio feature for mmio device
Hi Guys, On Wed, Jun 4, 2014 at 10:05 AM, Ming Lei ming@canonical.com wrote: Both 'indirect_desc' and 'event_idx' are bus independent features, and they should be enabled for mmio devices too. On arm64 quad core VM(qemu-kvm), the patch can increase block I/O performance a lot with latest linux tree: - without the patch: 14K IOPS - with the patch: 34K IOPS fio script: [global] direct=1 bsrange=4k-4k timeout=10 numjobs=4 ioengine=libaio iodepth=64 filename=/dev/vdc group_reporting=1 [f1] rw=randread Cc: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Ming Lei ming@canonical.com --- hw/virtio/virtio-mmio.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c index 8829eb0..18c6e5b 100644 --- a/hw/virtio/virtio-mmio.c +++ b/hw/virtio/virtio-mmio.c @@ -369,10 +369,16 @@ static void virtio_mmio_realizefn(DeviceState *d, Error **errp) sysbus_init_mmio(sbd, proxy-iomem); } +static Property virtio_mmio_properties[] = { +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOMMIOProxy, host_features), +DEFINE_PROP_END_OF_LIST(), +}; + static void virtio_mmio_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); +dc-props = virtio_mmio_properties; dc-realize = virtio_mmio_realizefn; dc-reset = virtio_mmio_reset; set_bit(DEVICE_CATEGORY_MISC, dc-categories); -- Gentle ping... Thanks, -- Ming Lei
Re: [Qemu-devel] [PATCH v8 01/14] qcow2: Allow full discard
Am 07.06.2014 um 20:51 hat Max Reitz geschrieben: Normally, discarded sectors should read back as zero. However, there are cases in which a sector (or rather cluster) should be discarded as if they were never written in the first place, that is, reading them should fall through to the backing file again. Signed-off-by: Max Reitz mre...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com Reviewed-by: Kevin Wolf kw...@redhat.com
Re: [Qemu-devel] [PATCH v8 02/14] qcow2: Implement bdrv_make_empty()
Am 07.06.2014 um 20:51 hat Max Reitz geschrieben: Implement this function by making all clusters in the image file fall through to the backing file (by using the recently extended discard). Signed-off-by: Max Reitz mre...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com Reviewed-by: Kevin Wolf kw...@redhat.com
Re: [Qemu-devel] [questions] about KVM as a Microsoft-compatible hypervisor
On Mon, 2014-06-30 at 09:39 +0800, Zhang Haoyu wrote: Hi, Vadim I read the kvm-2012-forum paper KVM as a Microsoft-compatible hypervisor, Any update and other references, please? Thanks, Zhang Haoyu Unfortunately, not too much. From the the most recent, we have lazy eoi implemented by MST and reference time counter. Best regards, Vadim.
Re: [Qemu-devel] [PATCH] hw/virtio: enable common virtio feature for mmio device
On Wed, Jun 04, 2014 at 10:05:55AM +0800, Ming Lei wrote: Both 'indirect_desc' and 'event_idx' are bus independent features, and they should be enabled for mmio devices too. On arm64 quad core VM(qemu-kvm), the patch can increase block I/O performance a lot with latest linux tree: - without the patch: 14K IOPS - with the patch: 34K IOPS fio script: [global] direct=1 bsrange=4k-4k timeout=10 numjobs=4 ioengine=libaio iodepth=64 filename=/dev/vdc group_reporting=1 [f1] rw=randread Cc: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Ming Lei ming@canonical.com Applied, thanks! --- hw/virtio/virtio-mmio.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c index 8829eb0..18c6e5b 100644 --- a/hw/virtio/virtio-mmio.c +++ b/hw/virtio/virtio-mmio.c @@ -369,10 +369,16 @@ static void virtio_mmio_realizefn(DeviceState *d, Error **errp) sysbus_init_mmio(sbd, proxy-iomem); } +static Property virtio_mmio_properties[] = { +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOMMIOProxy, host_features), +DEFINE_PROP_END_OF_LIST(), +}; + static void virtio_mmio_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); +dc-props = virtio_mmio_properties; dc-realize = virtio_mmio_realizefn; dc-reset = virtio_mmio_reset; set_bit(DEVICE_CATEGORY_MISC, dc-categories); -- 1.7.9.5
Re: [Qemu-devel] [PATCH] hw/virtio: enable common virtio feature for mmio device
On Mon, Jun 16, 2014 at 05:26:33PM +0800, Ming Lei wrote: On Mon, Jun 16, 2014 at 3:54 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 09/06/2014 10:00, Ming Lei ha scritto: On Wed, Jun 4, 2014 at 10:05 AM, Ming Lei ming@canonical.com wrote: Both 'indirect_desc' and 'event_idx' are bus independent features, and they should be enabled for mmio devices too. On arm64 quad core VM(qemu-kvm), the patch can increase block I/O performance a lot with latest linux tree: - without the patch: 14K IOPS - with the patch: 34K IOPS fio script: [global] direct=1 bsrange=4k-4k timeout=10 numjobs=4 ioengine=libaio iodepth=64 filename=/dev/vdc group_reporting=1 [f1] rw=randread Cc: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Ming Lei ming@canonical.com --- hw/virtio/virtio-mmio.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c index 8829eb0..18c6e5b 100644 --- a/hw/virtio/virtio-mmio.c +++ b/hw/virtio/virtio-mmio.c @@ -369,10 +369,16 @@ static void virtio_mmio_realizefn(DeviceState *d, Error **errp) sysbus_init_mmio(sbd, proxy-iomem); } +static Property virtio_mmio_properties[] = { +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOMMIOProxy, host_features), +DEFINE_PROP_END_OF_LIST(), +}; + static void virtio_mmio_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); +dc-props = virtio_mmio_properties; dc-realize = virtio_mmio_realizefn; dc-reset = virtio_mmio_reset; set_bit(DEVICE_CATEGORY_MISC, dc-categories); -- 1.7.9.5 Looks good. Paolo, thanks for your review. Can you look into moving DEFINE_VIRTIO_COMMON_FEATURES from all virtio pci devices to TYPE_VIRTIO_PCI, too? OK, that looks a good cleanup, how about the attached patch? If it is OK, I will prepare a formal one for submitting. Thanks, -- Ming Lei I applied the original patch for now. Pls address Paolo's comments and resubmit this one. diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index b437f19..af2e1c3 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -917,7 +917,6 @@ static Property virtio_9p_pci_properties[] = { DEFINE_PROP_BIT(ioeventfd, VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true), DEFINE_PROP_UINT32(vectors, VirtIOPCIProxy, nvectors, 2), -DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features), DEFINE_VIRTIO_9P_PROPERTIES(V9fsPCIState, vdev.fsconf), DEFINE_PROP_END_OF_LIST(), }; @@ -1038,11 +1037,17 @@ static void virtio_pci_reset(DeviceState *qdev) proxy-flags = ~VIRTIO_PCI_FLAG_BUS_MASTER_BUG; } +static Property virtio_pci_properties[] = { +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features), +DEFINE_PROP_END_OF_LIST(), +}; + static void virtio_pci_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); +dc-props = virtio_pci_properties; k-init = virtio_pci_init; k-exit = virtio_pci_exit; k-vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET; @@ -1071,7 +1076,6 @@ static Property virtio_blk_pci_properties[] = { DEFINE_PROP_BIT(x-data-plane, VirtIOBlkPCI, blk.data_plane, 0, false), DEFINE_PROP_UINT32(num_queues, VirtIOBlkPCI, blk.num_queues, 1), #endif -DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features), DEFINE_VIRTIO_BLK_PROPERTIES(VirtIOBlkPCI, blk), DEFINE_PROP_END_OF_LIST(), }; @@ -1195,7 +1199,6 @@ static const TypeInfo virtio_scsi_pci_info = { static Property vhost_scsi_pci_properties[] = { DEFINE_PROP_UINT32(vectors, VirtIOPCIProxy, nvectors, DEV_NVECTORS_UNSPECIFIED), -DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features), DEFINE_VHOST_SCSI_PROPERTIES(VHostSCSIPCI, vdev.parent_obj.conf), DEFINE_PROP_END_OF_LIST(), }; @@ -1276,7 +1279,6 @@ static void balloon_pci_stats_set_poll_interval(Object *obj, struct Visitor *v, } static Property virtio_balloon_pci_properties[] = { -DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features), DEFINE_PROP_UINT32(class, VirtIOPCIProxy, class_code, 0), DEFINE_PROP_END_OF_LIST(), }; @@ -1379,7 +1381,6 @@ static Property virtio_serial_pci_properties[] = { VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true), DEFINE_PROP_UINT32(vectors, VirtIOPCIProxy, nvectors, 2), DEFINE_PROP_UINT32(class, VirtIOPCIProxy, class_code, 0), -DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features), DEFINE_VIRTIO_SERIAL_PROPERTIES(VirtIOSerialPCI, vdev.serial), DEFINE_PROP_END_OF_LIST(), }; @@ -1475,7 +1476,6 @@ static const TypeInfo
Re: [Qemu-devel] [PATCH] hw/virtio: enable common virtio feature for mmio device
On Mon, Jun 30, 2014 at 6:09 PM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Jun 16, 2014 at 05:26:33PM +0800, Ming Lei wrote: On Mon, Jun 16, 2014 at 3:54 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 09/06/2014 10:00, Ming Lei ha scritto: On Wed, Jun 4, 2014 at 10:05 AM, Ming Lei ming@canonical.com wrote: Both 'indirect_desc' and 'event_idx' are bus independent features, and they should be enabled for mmio devices too. On arm64 quad core VM(qemu-kvm), the patch can increase block I/O performance a lot with latest linux tree: - without the patch: 14K IOPS - with the patch: 34K IOPS fio script: [global] direct=1 bsrange=4k-4k timeout=10 numjobs=4 ioengine=libaio iodepth=64 filename=/dev/vdc group_reporting=1 [f1] rw=randread Cc: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Ming Lei ming@canonical.com --- hw/virtio/virtio-mmio.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c index 8829eb0..18c6e5b 100644 --- a/hw/virtio/virtio-mmio.c +++ b/hw/virtio/virtio-mmio.c @@ -369,10 +369,16 @@ static void virtio_mmio_realizefn(DeviceState *d, Error **errp) sysbus_init_mmio(sbd, proxy-iomem); } +static Property virtio_mmio_properties[] = { +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOMMIOProxy, host_features), +DEFINE_PROP_END_OF_LIST(), +}; + static void virtio_mmio_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); +dc-props = virtio_mmio_properties; dc-realize = virtio_mmio_realizefn; dc-reset = virtio_mmio_reset; set_bit(DEVICE_CATEGORY_MISC, dc-categories); -- 1.7.9.5 Looks good. Paolo, thanks for your review. Can you look into moving DEFINE_VIRTIO_COMMON_FEATURES from all virtio pci devices to TYPE_VIRTIO_PCI, too? OK, that looks a good cleanup, how about the attached patch? If it is OK, I will prepare a formal one for submitting. Thanks, -- Ming Lei I applied the original patch for now. Pls address Paolo's comments and resubmit this one. I have addresses all comments for the virtio-pci changes, and will reply you on that thread. Thanks, -- Ming Lei
Re: [Qemu-devel] [PATCH] virtio: move common virtio properties to bus class device
Hi Michael, On Wed, Jun 18, 2014 at 3:13 PM, Ming Lei ming@canonical.com wrote: The two common virtio features can be defined per bus, so move all into bus class device to make code more clean. As discussed with cornelia, s390-virtio-blk doesn't support the two features at all, so keep s390-virtio as it. Acked-by: Cornelia Huck cornelia.h...@de.ibm.com #for s390 ccw Suggested-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Ming Lei ming@canonical.com --- hw/s390x/s390-virtio-bus.c |2 ++ hw/s390x/virtio-ccw.c | 11 ++- hw/virtio/virtio-pci.c | 12 ++-- include/hw/virtio/virtio-blk.h |3 --- include/hw/virtio/virtio-net.h |1 - include/hw/virtio/virtio-scsi.h |1 - 6 files changed, 14 insertions(+), 16 deletions(-) diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c index 9c71afa..921cdc2 100644 --- a/hw/s390x/s390-virtio-bus.c +++ b/hw/s390x/s390-virtio-bus.c @@ -504,6 +504,7 @@ static unsigned virtio_s390_get_features(DeviceState *d) static Property s390_virtio_net_properties[] = { DEFINE_NIC_PROPERTIES(VirtIONetS390, vdev.nic_conf), +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features), DEFINE_VIRTIO_NET_FEATURES(VirtIOS390Device, host_features), DEFINE_VIRTIO_NET_PROPERTIES(VirtIONetS390, vdev.net_conf), DEFINE_PROP_END_OF_LIST(), @@ -631,6 +632,7 @@ static const TypeInfo virtio_s390_device_info = { static Property s390_virtio_scsi_properties[] = { DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOSCSIS390, vdev.parent_obj.conf), +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features), DEFINE_VIRTIO_SCSI_FEATURES(VirtIOS390Device, host_features), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index 2bf0af8..c0124e1 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -1126,7 +1126,6 @@ static const TypeInfo virtio_ccw_net = { static Property virtio_ccw_blk_properties[] = { DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id), -DEFINE_VIRTIO_BLK_FEATURES(VirtioCcwDevice, host_features[0]), DEFINE_VIRTIO_BLK_PROPERTIES(VirtIOBlkCcw, blk), DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags, VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true), @@ -1158,7 +1157,6 @@ static const TypeInfo virtio_ccw_blk = { static Property virtio_ccw_serial_properties[] = { DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id), DEFINE_VIRTIO_SERIAL_PROPERTIES(VirtioSerialCcw, vdev.serial), -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]), DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags, VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true), DEFINE_PROP_END_OF_LIST(), @@ -1185,7 +1183,6 @@ static const TypeInfo virtio_ccw_serial = { static Property virtio_ccw_balloon_properties[] = { DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id), -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]), DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags, VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true), DEFINE_PROP_END_OF_LIST(), @@ -1242,7 +1239,6 @@ static const TypeInfo virtio_ccw_scsi = { static Property vhost_ccw_scsi_properties[] = { DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id), DEFINE_VHOST_SCSI_PROPERTIES(VirtIOSCSICcw, vdev.parent_obj.conf), -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]), DEFINE_PROP_END_OF_LIST(), }; @@ -1279,7 +1275,6 @@ static void virtio_ccw_rng_instance_init(Object *obj) static Property virtio_ccw_rng_properties[] = { DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id), -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]), DEFINE_VIRTIO_RNG_PROPERTIES(VirtIORNGCcw, vdev.conf), DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags, VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true), @@ -1345,10 +1340,16 @@ static int virtio_ccw_busdev_unplug(DeviceState *dev) return 0; } +static Property virtio_ccw_properties[] = { +DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]), +DEFINE_PROP_END_OF_LIST(), +}; + static void virtio_ccw_device_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); +dc-props = virtio_ccw_properties; dc-init = virtio_ccw_busdev_init; dc-exit = virtio_ccw_busdev_exit; dc-unplug = virtio_ccw_busdev_unplug; diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index b437f19..af2e1c3 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -917,7 +917,6 @@ static Property virtio_9p_pci_properties[] = { DEFINE_PROP_BIT(ioeventfd, VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true), DEFINE_PROP_UINT32(vectors, VirtIOPCIProxy, nvectors, 2), -
Re: [Qemu-devel] [PATCH] numa: check for busy memory backend
On Mon, Jun 30, 2014 at 12:12:20PM +0300, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 10:48:22AM +0200, Igor Mammedov wrote: On Mon, 30 Jun 2014 11:28:07 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Mon, Jun 30, 2014 at 03:46:56PM +0800, Hu Tao wrote: On Mon, Jun 30, 2014 at 09:53:20AM +0300, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 01:33:42PM +0800, Hu Tao wrote: On Sun, Jun 29, 2014 at 06:20:22PM +0300, Michael S. Tsirkin wrote: On Wed, Jun 25, 2014 at 05:04:14PM +0800, Hu Tao wrote: ..to prevent one memory backend from being used by more than one numa node. Thanks, but please always make the msg content self-contained so it can be understood without the subject. E.g. here, just drop ..to. Are you sure we want this? Is there a chance sharing a backend can be useful? This patch is actually a bug fix. It is? What is the bug and how to reproduce it? If user specifies the same memory backend for two numa nodes: ./x86_64-softmmu/qemu-system-x86_64 -hda /home/data/libvirt-images/f18.img -m 512M \ -qmp unix:/tmp/m,server,nowait -monitor stdio -enable-kvm \ -object memory-backend-ram,size=256M,id=ram0 \ -numa node,nodeid=0,memdev=ram0 \ -numa node,nodeid=1,memdev=ram0 I am not sure we should write a ton of code to validate qemu configuration, as long as qemu does not assert. It seems qemu does not provide a way to disable assert currently. Even if I removed asserts on the code path in my test, there is another problem that it hits an infinite in render_memory_region(). OK so this is what commit log should say: --- Specifying the same memory region twice leads to an assert: ./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0 -numa node,nodeid=1,memdev=ram0 qemu-system-x86_64: /scm/qemu/memory.c:1506: memory_region_add_subregion_common: Assertion `!subregion-container' failed. Aborted (core dumped) Detect and exit with an error message instead. --- with fixed-up commit message: Reviewed-by: Igor Mammedov imamm...@redhat.com Sorry I want the error message fixed up too. Yes your error message is more clear. I'll send v2. Thanks for review. Regards, Hu See? Explain why your patch makes sense, don't just repeat what it does. Even if we will want backend sharing, we can do it after. By reverting this patch? So why merge it? The point is qemu doesn't fire a bug no matter what user inputs. Igor, what's your take? Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- numa.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/numa.c b/numa.c index e471afe..6c1c554 100644 --- a/numa.c +++ b/numa.c @@ -279,6 +279,13 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner, exit(1); } +if (memory_region_is_mapped(seg)) { +char *path = object_get_canonical_path_component(OBJECT(backend)); +error_report(memory backend %s is busy, path); That's not very clear. How about: memory backend %s is used multiple times. Each -numa option must use a different memdev value. +g_free(path); As we are going to exit anyway, it does not make sense to bother with this. +exit(1); +} + memory_region_add_subregion(mr, addr, seg); vmstate_register_ram_global(seg); addr += size; -- 1.9.3
Re: [Qemu-devel] [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On 2014/6/30 17:55, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote: On 2014/6/30 17:05, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote: On 2014/6/30 14:48, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote: On 2014/6/26 18:03, Paolo Bonzini wrote: Il 26/06/2014 11:18, Chen, Tiejun ha scritto: - offsets 0x..0x0fff map to configuration space of the host MCH Are you saying the config space in the video device? No, I am saying in a new BAR, or at some magic offset of an existing MMIO BAR. As I mentioned previously, the IGD guy told me we have no any unused a offset or BAR in the config space. And guy who are responsible for the native driver seems not be accept to extend some magic offset of an existing MMIO BAR. In addition I think in a short time its not possible to migrate i440fx to q35 as a PCIe machine of xen. That seems like a weak motivation. I don't see a need to get something merged upstream in a short time: this seems sure to miss 2.1, so you have the time to make it architecturally sound. Making existing guests work would be a better motivation. Yes. So focus on this then. Existing guests will probably work fine on a newer chipset - likely better than on i440fx. xen management tools need to do some work to support this? That will just give everyone more long term benefits. If instead we create a hack that does not resemble any existing hardware even remotely, what's the chance that it will not break with any future guest modification? Isn't this possible with an mch chipset? If you're saying q35, I mean AFAIK we have no any plan to migrate to this MCH in xen case. q35 or a newer chipset that's closer to what guests expect. Additionally, I think I should follow this feature after q35 can work for xen scenario. What's stopping you? I mean if you want create an new machine based on q35, actually this is equal to start making xen to migrate to q35 now. Right? I can't image how much effort should be done. I don't see why you don't try. Sounds like a more robust solution to me. As I think this is another requirement to us. I'm not sure if I have enough time to touch this currently. So this is a reason why I'm saying I'd like to follow this feature after q35 can work with xen completely. Then we'll end up with more configurations to support, and to what end? So could we do this step by step: #1 phase: We just cover current qemu-xen implementation based on i44fx, so still provide that pseudo ISA bridge at 00:1f.0 as we already did. By the way there is no reason to put it at 00:1f.0 specifically I think. So it seems simple: create a dummy device that gets device and vendor id as properties. If you really like, add an option to get values Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx passthrough: create pseudo intel isa bridge. There, we fake this device just at 00:1f.0. But you guys don't like this, and shouldn't this be just this point we discussing now? If you guys agree that , everything is fine. Actually, this isn't what you did. Don't tie it to xen, and don't tie it to 1f. Just make it a simple stub pci device. Whoever wants it, creates it. The thing I worry about, is the chance this will break going forward. So you created a system with 2 isa bridges. I don't set class type to claim this is an ISA bridge, just a normal PCI device. +static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice *hdev) +{ +struct PCIDevice *dev; + +char rid; + +/* We havt to use a simple PCI device to fake this ISA bridge + * to avoid making some confusion to BIOS and ACPI. + */ +dev = pci_create(bus, PCI_DEVFN(0x1f, 0), pseudo-intel-pch-isa-bridge); + +qdev_init_nofail(dev-qdev); + +pci_config_set_vendor_id(dev-config, hdev-vendor_id); +pci_config_set_device_id(dev-config, hdev-device_id); + +xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)rid, 1); + +pci_config_set_revision(dev-config, rid); + +XEN_PT_LOG(dev, The pseudo Intel PCH ISA bridge created.\n); +return 0; +} Then I don't see how it's supposed to work. Doesn't i915 look for an isa bridge? /* * The reason to probe ISA bridge instead of Dev31:Fun0 is to * make graphics device passthrough work easy for VMM, that only * need to expose ISA bridge to let driver know the real hardware * underneath. This is a requirement from virtualization team. * * In some virtualized environments (e.g. XEN), there is irrelevant * ISA bridge in the system. To work reliably, we should scan trhough * all the ISA bridge devices and check for the first match, instead * of only checking the first one. */ while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA 8, pch))) { if
Re: [Qemu-devel] [v2][PATCH 4/8] xen, gfx passthrough: reserve 00:02.0 for INTEL IGD
On 2014/6/30 17:29, Gerd Hoffmann wrote: Hi, /* Make cirrues VGA S3 suspend/resume work in Windows XP/2003 */ Device (VGA) { - Name (_ADR, 0x0002) + // Address of the VGA (device F function 0) + Name (_ADR, 0x000F) With this change I still didn't see anything. This does not match with what I see. Looks like linux does not care about the acpi data. Using The acpi data doesn't matter at all for the boot screen, the vgabios doesn't look at it. (d12) Scan for VGA option rom (d12) Running option rom at c000:0003 seabios loaded+initialized the vgabios (from the pci rom bar of the vga device). Which slot the vga is installed at should not matter at all. seabios scans the pci bus and should find the vga with no problems no matter where it is. (XEN) stdvga.c:147:d12v0 entering stdvga and caching modes Seems to have worked fine ;) (d12) pmm call arg1=0 (d12) Turning on vga text mode console (d12) SeaBIOS (version rel-1.7.5-0-ge51488c-20140626_113926-dcs-xen-54) At this point you should see the seabios banner at the vga screen. And the VGA screen has the SeaBIOS messages: SeaBIOS (version rel-1.7.5-0-ge51488c-20140626_113926-dcs-xen-54) Machine UUID 18cf5fd0-e564-49c4-b63f-e6c3c23b1294 As it should be. Now the questions is why it doesn't work for Tiejun ... Gerd and Don, Thanks for your information. But I have no time to validate this configuration now, I will update this as soon as I try Don's config file. Thanks again. Tiejun Anything in the logs? cheers, Gerd
Re: [Qemu-devel] [PATCH] qemu-char: Convert socket char backend to parse/kind
Hi, * The old qemu_chr_open_socket() has an if (!is_waitconnect) qemu_set_nonblock(fd); which the QMP char-open codepath has never had. Does this matter? Which of the two code paths was correct? Gerd? IIRC the socket is put into non-blocking mode anyway by the qemu socket helper functions. cheers, Gerd
[Qemu-devel] [PATCH v2] numa: check for busy memory backend
Specifying the same memory backend twice leads to an assert: ./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0 -numa node,nodeid=1,memdev=ram0 qemu-system-x86_64: /scm/qemu/memory.c:1506: memory_region_add_subregion_common: Assertion `!subregion-container' failed. Aborted (core dumped) Detect and exit with an error message instead. Reviewed-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- numa.c | 8 1 file changed, 8 insertions(+) diff --git a/numa.c b/numa.c index 2fde740..7bf7834 100644 --- a/numa.c +++ b/numa.c @@ -301,6 +301,14 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner, exit(1); } +if (memory_region_is_mapped(seg)) { +char *path = object_get_canonical_path_component(OBJECT(backend)); +error_report(memory backend %s is used multiple times. Each + -numa option must use a different memdev value., + path); +exit(1); +} + memory_region_add_subregion(mr, addr, seg); vmstate_register_ram_global(seg); addr += size; -- 1.9.3
Re: [Qemu-devel] [PATCH] qemu-char: Convert socket char backend to parse/kind
On 30 June 2014 11:23, Gerd Hoffmann kra...@redhat.com wrote: Hi, * The old qemu_chr_open_socket() has an if (!is_waitconnect) qemu_set_nonblock(fd); which the QMP char-open codepath has never had. Does this matter? Which of the two code paths was correct? Gerd? IIRC the socket is put into non-blocking mode anyway by the qemu socket helper functions. In that case is qemu_chr_open_socket_fd() incorrect in marking the socket as nonblocking in the is_listen is_waitconnect case? -- PMM
[Qemu-devel] [PATCH for-2.1 v2] pc-dimm: error out if memory hotplug is not enabled
fixes QEMU abort in case it's started without memory hotplug enabled. as result of fix it will print following messages: -device pc-dimm,id=d1,memdev=m1: memory hotplug is not enabled, enable it on startup -device pc-dimm,id=d1,memdev=m1: Device 'pc-dimm' could not be initialized Also fixup assert condition to detect hotplug address space overflow. Signed-off-by: Igor Mammedov imamm...@redhat.com Reported-by: Hu Tao hu...@cn.fujitsu.com --- v2: - amend error message --- hw/mem/pc-dimm.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c index ad176b7..08f49ed 100644 --- a/hw/mem/pc-dimm.c +++ b/hw/mem/pc-dimm.c @@ -146,7 +146,13 @@ uint64_t pc_dimm_get_free_addr(uint64_t address_space_start, uint64_t new_addr, ret = 0; uint64_t address_space_end = address_space_start + address_space_size; -assert(address_space_end address_space_size); +if (!address_space_size) { +error_setg(errp, memory hotplug is not enabled, + please add maxmem option); +goto out; +} + +assert(address_space_end address_space_start); object_child_foreach(qdev_get_machine(), pc_dimm_built_list, list); if (hint) { -- 1.7.1
Re: [Qemu-devel] [PATCH for-2.1 v2] pc-dimm: error out if memory hotplug is not enabled
On Mon, Jun 30, 2014 at 12:43:29PM +0200, Igor Mammedov wrote: fixes QEMU abort in case it's started without memory hotplug enabled. as result of fix it will print following messages: -device pc-dimm,id=d1,memdev=m1: memory hotplug is not enabled, enable it on startup -device pc-dimm,id=d1,memdev=m1: Device 'pc-dimm' could not be initialized Also fixup assert condition to detect hotplug address space overflow. Signed-off-by: Igor Mammedov imamm...@redhat.com Reported-by: Hu Tao hu...@cn.fujitsu.com Applied, thanks. --- v2: - amend error message --- hw/mem/pc-dimm.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c index ad176b7..08f49ed 100644 --- a/hw/mem/pc-dimm.c +++ b/hw/mem/pc-dimm.c @@ -146,7 +146,13 @@ uint64_t pc_dimm_get_free_addr(uint64_t address_space_start, uint64_t new_addr, ret = 0; uint64_t address_space_end = address_space_start + address_space_size; -assert(address_space_end address_space_size); +if (!address_space_size) { +error_setg(errp, memory hotplug is not enabled, + please add maxmem option); +goto out; +} + +assert(address_space_end address_space_start); object_child_foreach(qdev_get_machine(), pc_dimm_built_list, list); if (hint) { -- 1.7.1
Re: [Qemu-devel] [PATCH v2] numa: check for busy memory backend
On Mon, Jun 30, 2014 at 06:28:15PM +0800, Hu Tao wrote: Specifying the same memory backend twice leads to an assert: ./x86_64-softmmu/qemu-system-x86_64 -m 512M -enable-kvm -object memory-backend-ram,size=256M,id=ram0 -numa node,nodeid=0,memdev=ram0 -numa node,nodeid=1,memdev=ram0 qemu-system-x86_64: /scm/qemu/memory.c:1506: memory_region_add_subregion_common: Assertion `!subregion-container' failed. Aborted (core dumped) Detect and exit with an error message instead. Reviewed-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Hu Tao hu...@cn.fujitsu.com Applied, thanks. --- numa.c | 8 1 file changed, 8 insertions(+) diff --git a/numa.c b/numa.c index 2fde740..7bf7834 100644 --- a/numa.c +++ b/numa.c @@ -301,6 +301,14 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner, exit(1); } +if (memory_region_is_mapped(seg)) { +char *path = object_get_canonical_path_component(OBJECT(backend)); +error_report(memory backend %s is used multiple times. Each + -numa option must use a different memdev value., + path); +exit(1); +} + memory_region_add_subregion(mr, addr, seg); vmstate_register_ram_global(seg); addr += size; -- 1.9.3
Re: [Qemu-devel] The first function called after migration for a block device
Il 28/06/2014 00:54, Xiongzi Ge ha scritto: Hi Paolo, Thanks. I found a function called bdrv_invalidate_cache() in qcow2.c. After migration, it will be called to invalidate the cache of the block device? Let me quote again: This function has *nothing* to do with the guest OS's cache. That cache is managed by the guest OS and, as I have already told you multiple times, there is *nothing* that QEMU can do about it. All you can do is use O_DIRECT in the guest. Did you even *try* to understand this?!? Paolo
Re: [Qemu-devel] [PATCH FOR 2.1 1/5] tests/test-qmp-event: fix for GLib 2.31
Il 29/06/2014 22:31, Peter Maydell ha scritto: On 27 June 2014 19:28, Luiz Capitulino lcapitul...@redhat.com wrote: On Wed, 25 Jun 2014 15:15:35 +0200 Paolo Bonzini pbonz...@redhat.com wrote: Il 25/06/2014 15:13, Luiz Capitulino ha scritto: On Tue, 24 Jun 2014 16:33:56 -0700 Wenchao Xia wenchaoq...@gmail.com wrote: From: Paolo Bonzini pbonz...@redhat.com On old GLib, the test needs a g_thread_init call. Reported-by: Wenchao Xia wenchaoq...@gmail.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com Tested-by: Wenchao Xia wenchaoq...@gmail.com Signed-off-by: Wenchao Xia wenchaoq...@gmail.com --- tests/test-qmp-event.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/tests/test-qmp-event.c b/tests/test-qmp-event.c index cb1e441..17c6444 100644 --- a/tests/test-qmp-event.c +++ b/tests/test-qmp-event.c @@ -251,6 +251,7 @@ static void test_event_d(TestEventData *data, int main(int argc, char **argv) { +g_thread_init(NULL); qmp_event_set_func_emit(event_test_emit); g_test_init(argc, argv, NULL); This breaks make check on F20: /home/lcapitulino/work/src/upstream/qmp-unstable/tests/test-qmp-event.c: In function ‘main’: /home/lcapitulino/work/src/upstream/qmp-unstable/tests/test-qmp-event.c:254:5: error: ‘g_thread_init’ is deprecated (declared at /usr/include/glib-2.0/glib/deprecated/gthread.h:260) [-Werror=deprecated-declarations] g_thread_init(NULL); ^ cc1: all warnings being treated as errors make: *** [tests/test-qmp-event.o] Error 1 I think the best way to fix this is to make util/osdep.c:thread_init() public (maybe by moving it to include/glib-compat.h) and use that instead. Also, note that thread_init()'s body is duplicated in a few other places, so maybe those places should call it too. You may want to do this in a different series, then I can skip this patch and apply the rest of the series. Thanks Luiz, it's a good suggestion. Paolo, Wenchao, are one of one going to work on this? Ping! Can we have at least a local fix using glib version #ifdefs before Tuesday please? Otherwise we need to do something like this to avoid shipping an rc0 which doesn't pass make check on some systems. I'll send the patch today. thread_init() is a bit tricky because it is a __constructor__ but it is not included in the binary because no other function is included from the same file. BTW, the make check limitation is currently mentioned in the changelog. Paolo diff --git a/tests/Makefile b/tests/Makefile index 7e53d0d..a1a0dae 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -27,8 +27,6 @@ check-unit-y += tests/test-string-input-visitor$(EXESUF) gcov-files-test-string-input-visitor-y = qapi/string-input-visitor.c check-unit-y += tests/test-string-output-visitor$(EXESUF) gcov-files-test-string-output-visitor-y = qapi/string-output-visitor.c -check-unit-y += tests/test-qmp-event$(EXESUF) -gcov-files-test-qmp-event-y += qapi/qmp-event.c check-unit-y += tests/test-opts-visitor$(EXESUF) gcov-files-test-opts-visitor-y = qapi/opts-visitor.c check-unit-y += tests/test-coroutine$(EXESUF) @@ -213,7 +211,7 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \ tests/test-qmp-input-visitor.o tests/test-qmp-input-strict.o \ tests/test-qmp-commands.o tests/test-visitor-serialization.o \ tests/test-x86-cpuid.o tests/test-mul64.o tests/test-int128.o \ - tests/test-opts-visitor.o tests/test-qmp-event.o + tests/test-opts-visitor.o test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \ tests/test-qapi-event.o thanks -- PMM
Re: [Qemu-devel] [RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class type
Il 30/06/2014 05:13, Chen, Tiejun ha scritto: After I discuss internal, we think even we just set the real vendor/device ids to this ISA bridge at 00:1f.0, guest firmware should still work well with these pair of real vendor/device ids. So if you think something would conflict or be broken, could you tell us what's exactly that? Then we will double check. The Xen hvmloader doesn't break since it only supports one chipset. But SeaBIOS checks for the exact vendor/device ids since Q35 support was added. If you want to add this feature, try to implement it in a way that is a bit more forward-looking. I'm sure that Xen sooner or later will want a PCIe chipset, otherwise things such as AER forwarding are impossible. Paolo
Re: [Qemu-devel] [PATCH] qemu-char: Convert socket char backend to parse/kind
On Mo, 2014-06-30 at 11:33 +0100, Peter Maydell wrote: On 30 June 2014 11:23, Gerd Hoffmann kra...@redhat.com wrote: Hi, * The old qemu_chr_open_socket() has an if (!is_waitconnect) qemu_set_nonblock(fd); which the QMP char-open codepath has never had. Does this matter? Which of the two code paths was correct? Gerd? IIRC the socket is put into non-blocking mode anyway by the qemu socket helper functions. In that case is qemu_chr_open_socket_fd() incorrect in marking the socket as nonblocking in the is_listen is_waitconnect case? Honestly I think the whole waitconnect stuff should be ripped out. Obvious problem is backward compatibility. But maybe not because nobody uses it anyway. Anyone out there using chardev server sockets without 'nowait' option? waitconnect means go into server mode, wait for a client to connect, then unpause the guest. Which handles one special case of the generic start qemu with -S, connect everything, then sent 'cont' to monitor libvirt is doing. IIRC wait for client to connect is a blocking accept() call. Which you certainly don't want do in a qmp command handler. So if we switch over chardevs created via -chardev to use the qmp init code path too we need some hackery to make '-chardev socket,wait,...' work without introducing a blocking qmp command I suspect ... cheers, Gerd
Re: [Qemu-devel] [PATCH v2 3/3] serial: poll the serial console with G_IO_HUP
Do I need to resend this? it's been more than a month without review. Roger. On 13/06/14 17:35, Roger Pau Monné wrote: Ping? On 23/05/14 17:57, Roger Pau Monne wrote: On FreeBSD polling a master pty while the other end is not connected with G_IO_OUT only results in an endless wait. This is different from the Linux behaviour, that returns immediately. In order to demonstrate this, I have the following example code: http://xenbits.xen.org/people/royger/test_poll.c When executed on Linux: $ ./test_poll In callback On FreeBSD instead, the callback never gets called: $ ./test_poll So, in order to workaround this, poll the source with G_IO_HUP (which makes the code behave the same way on both Linux and FreeBSD). Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Peter Crosthwaite peter.crosthwa...@xilinx.com Cc: Michael Tokarev m...@tls.msk.ru Cc: Andreas Färber afaer...@suse.de Cc: Paolo Bonzini pbonz...@redhat.com Cc: xen-de...@lists.xenproject.org --- Changes since v1: - Fix other users of qemu_chr_fe_add_watch to use G_IO_HUP. --- hw/char/serial.c |2 +- hw/char/virtio-console.c |3 ++- hw/usb/redirect.c|2 +- monitor.c|2 +- 4 files changed, 5 insertions(+), 4 deletions(-) diff --git a/hw/char/serial.c b/hw/char/serial.c index f4d167f..2a2c9e5 100644 --- a/hw/char/serial.c +++ b/hw/char/serial.c @@ -246,7 +246,7 @@ static gboolean serial_xmit(GIOChannel *chan, GIOCondition cond, void *opaque) serial_receive1(s, s-tsr, 1); } else if (qemu_chr_fe_write(s-chr, s-tsr, 1) != 1) { if (s-tsr_retry = 0 s-tsr_retry MAX_XMIT_RETRY -qemu_chr_fe_add_watch(s-chr, G_IO_OUT, serial_xmit, s) 0) { +qemu_chr_fe_add_watch(s-chr, G_IO_OUT|G_IO_HUP, serial_xmit, s) 0) { s-tsr_retry++; return FALSE; } diff --git a/hw/char/virtio-console.c b/hw/char/virtio-console.c index 6c8be0f..38e290a 100644 --- a/hw/char/virtio-console.c +++ b/hw/char/virtio-console.c @@ -69,7 +69,8 @@ static ssize_t flush_buf(VirtIOSerialPort *port, if (!k-is_console) { virtio_serial_throttle_port(port, true); if (!vcon-watch) { -vcon-watch = qemu_chr_fe_add_watch(vcon-chr, G_IO_OUT, +vcon-watch = qemu_chr_fe_add_watch(vcon-chr, +G_IO_OUT|G_IO_HUP, chr_write_unblocked, vcon); } } diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c index 287a505..06e757d 100644 --- a/hw/usb/redirect.c +++ b/hw/usb/redirect.c @@ -284,7 +284,7 @@ static int usbredir_write(void *priv, uint8_t *data, int count) r = qemu_chr_fe_write(dev-cs, data, count); if (r count) { if (!dev-watch) { -dev-watch = qemu_chr_fe_add_watch(dev-cs, G_IO_OUT, +dev-watch = qemu_chr_fe_add_watch(dev-cs, G_IO_OUT|G_IO_HUP, usbredir_write_unblocked, dev); } if (r 0) { diff --git a/monitor.c b/monitor.c index 593679a..ae1c539 100644 --- a/monitor.c +++ b/monitor.c @@ -304,7 +304,7 @@ void monitor_flush(Monitor *mon) mon-outbuf = tmp; } if (mon-watch == 0) { -mon-watch = qemu_chr_fe_add_watch(mon-chr, G_IO_OUT, +mon-watch = qemu_chr_fe_add_watch(mon-chr, G_IO_OUT|G_IO_HUP, monitor_unblocked, mon); } }
Re: [Qemu-devel] [PATCH 0/3] block: introduce submit I/O at batch
Il 30/06/2014 11:49, Ming Lei ha scritto: Hi, The commit 580b6b2aa2(dataplane: use the QEMU block layer for I/O) introduces ~40% throughput regression on virtio-blk dataplane, and one of causes is that submitting I/O at batch is removed. This patchset trys to introduce this mechanism on block, at least, linux-aio can benefit from that. With these patches, it is observed that thoughout on virtio-blk dataplane can be improved a lot, see data in commit log of patch 3/3. It should be possible to apply the batch mechanism to other devices (such as virtio-scsi) too. The basic idea of the code is great, however I do not think it is necessary to add the logic in AioContext. You are right that io_submit supports multiple I/O, but right now we have one io_context_t per BlockDriverState so it is somewhat premature. Note that AioContext in QEMU means something else than io_context_t. I don't think it's necessary to walk backing_hd too (especially because we're close to release and this cannot be a regression---dataplane in 2.0 didn't support backing files at all!). We need to keep the patches as simple as possible. You can just add bdrv_plug/bdrv_unplug callbacks to block/linux-aio.c and, in block.c, something like BlockDriver *drv = bs-drv; if (drv drv-bdrv_plug) { drv-bdrv_plug(bdrv); } else if (bs-file) { bdrv_plug(bs-file); } and place all the queuing logic in linux-aio.c. It should be obvious to the reviewer that the patch only affects dataplane. (Also, note that QEMU doesn't use __ as the prefix for internal function). Thanks for the patch! Paolo
Re: [Qemu-devel] [questions] about KVM as a Microsoft-compatible hypervisor
On Mon, Jun 30, 2014 at 6:02 AM, Vadim Rozenfeld vroze...@redhat.com wrote: On Mon, 2014-06-30 at 09:39 +0800, Zhang Haoyu wrote: Hi, Vadim I read the kvm-2012-forum paper KVM as a Microsoft-compatible hypervisor, Any update and other references, please? Thanks, Zhang Haoyu Unfortunately, not too much. From the the most recent, we have lazy eoi implemented by MST and reference time counter. Best regards, Vadim. It looks like that Mircosoft has defined a large number of synthetic registers in their Hyper-v spec, so ultimately KVM should virtualize most of these registers, so as to support the Mircosoft Enlightment, right? -Jidong
Re: [Qemu-devel] Why I advise against using ivshmem
Stefan Hajnoczi stefa...@gmail.com writes: On Tue, Jun 17, 2014 at 11:44:11AM +0200, Paolo Bonzini wrote: Il 17/06/2014 11:03, David Marchand ha scritto: Unless someone steps up and maintains ivshmem, I think it should be deprecated and dropped from QEMU. Then I can maintain ivshmem for QEMU. If this is ok, I will send a patch for MAINTAINERS file. Typically, adding yourself to maintainers is done only after having proved your ability to be a maintainer. :) So, let's stop talking and go back to code! You can start doing what was suggested elsewhere in the thread: get the server and uio driver merged into the QEMU tree, document the protocol in docs/specs/ivshmem_device_spec.txt, and start fixing bugs such as the ones that Markus reported. One more thing to add to the list: static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) The flags argument should be size. Size should be checked before accessing buf. Please also see the bug fixes in the following unapplied patch: [PATCH] ivshmem: fix potential OOB r/w access (#2) by Sebastian Krahmer https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg03538.html Another one: most devices can be controlled via a dedicated CONFIG_DEVNAME, but not ivshmem: it uses CONFIG_KVM and CONFIG_PCI. Giving it its own CONFIG_IVSHMEM would be nice.
Re: [Qemu-devel] [RFC] qemu VGA endian swap low level drawing changes
Hi, From what I can tell, we only ever call the cursor drawing callback on non-shared surfaces. Should I deduce that the HW cursor emulation simply doesn't work when using shared surfaces ? Or is there another path I have missed to handle it ? Hmm. Looks like hw-cursor-on-shared-surface broken indeed. Need to dig out a guest which actually uses it go figure when testing your patch series ... It makes sense in a way since we never want to draw the cursor in the shared framebuffer, but we could probably handle it by having a small separate pixman surface which we paint on top of the final render or something like that (or link to a host side HW cursor if any) but I can't quite see anything in the code. There is infrastructure to inform the ui code how the cursor should look like (grep for dpy_cursor_define), so we actually can use the hosts hardware cursor support. cirrus doesn't use it though. cheers, Gerd
Re: [Qemu-devel] [RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class type
On Thu, Jun 19, 2014 at 05:53:51PM +0800, Tiejun Chen wrote: Originally the reason to probe ISA bridge instead of Dev31:Fun0 is to make graphics device passthrough work easy for VMM, that only need to expose ISA bridge to let driver know the real hardware underneath. This is a requirement from virtualization team. Especially in that virtualized environments, XEN, there is irrelevant ISA bridge in the system with that legacy qemu version specific to xen, qemu-xen-traditional. So to work reliably, we should scan through all the ISA bridge devices and check for the first match, instead of only checking the first one. But actually, qemu-xen-traditional, is always enumerated with Dev31:Fun0, 00:1f.0 as follows: hw/pt-graphics.c: intel_pch_init() | + pci_isa_bridge_init(bus, PCI_DEVFN(0x1f, 0), ...); so this mean that isa bridge is still represented with Dev31:Func0 like the native OS. Furthermore, currently we're pushing VGA passthrough support into qemu upstream, and with some discussion, we wouldn't set the bridge class type and just expose this devfn. So we just go back to check devfn to make life normal. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- drivers/gpu/drm/i915/i915_drv.c | 19 +++ 1 file changed, 3 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 651e65e..cb2526e 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -417,18 +417,8 @@ void intel_detect_pch(struct drm_device *dev) return; } - /* - * The reason to probe ISA bridge instead of Dev31:Fun0 is to - * make graphics device passthrough work easy for VMM, that only - * need to expose ISA bridge to let driver know the real hardware - * underneath. This is a requirement from virtualization team. - * - * In some virtualized environments (e.g. XEN), there is irrelevant - * ISA bridge in the system. To work reliably, we should scan trhough - * all the ISA bridge devices and check for the first match, instead - * of only checking the first one. - */ - while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA 8, pch))) { + pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0)); + if (pch) { Then if you want to use this slot for something else, what happens? If you want to relax the PCI_CLASS_BRIDGE_ISA requirement when running on top of a hypervisor, just scan all devices. if (pch-vendor == PCI_VENDOR_ID_INTEL) { unsigned short id = pch-device INTEL_PCH_DEVICE_ID_MASK; dev_priv-pch_id = id; @@ -462,10 +452,7 @@ void intel_detect_pch(struct drm_device *dev) DRM_DEBUG_KMS(Found LynxPoint LP PCH\n); WARN_ON(!IS_HASWELL(dev)); WARN_ON(!IS_ULT(dev)); - } else - continue; - - break; + } } } if (!pch) -- 1.9.1
Re: [Qemu-devel] [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
Il 30/06/2014 12:20, Chen, Tiejun ha scritto: I already post this to mainline to change as follows: -while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA 8, pch))) { +pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0)); +if (pch) { Please refer to this, [RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class type Linux Native guys would like to accept this. And actually Windows always use devfn to detect this. Fair enough, but that means that, when using IGD, Q35 will have to move the ISA bridge off 1f.0. To me it seems fairly clear that as things stand IGD is not virtualizable without PV support. We're beating a dead horse. Paolo
Re: [Qemu-devel] [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On Mon, Jun 30, 2014 at 06:20:22PM +0800, Chen, Tiejun wrote: On 2014/6/30 17:55, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote: On 2014/6/30 17:05, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote: On 2014/6/30 14:48, Michael S. Tsirkin wrote: On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote: On 2014/6/26 18:03, Paolo Bonzini wrote: Il 26/06/2014 11:18, Chen, Tiejun ha scritto: - offsets 0x..0x0fff map to configuration space of the host MCH Are you saying the config space in the video device? No, I am saying in a new BAR, or at some magic offset of an existing MMIO BAR. As I mentioned previously, the IGD guy told me we have no any unused a offset or BAR in the config space. And guy who are responsible for the native driver seems not be accept to extend some magic offset of an existing MMIO BAR. In addition I think in a short time its not possible to migrate i440fx to q35 as a PCIe machine of xen. That seems like a weak motivation. I don't see a need to get something merged upstream in a short time: this seems sure to miss 2.1, so you have the time to make it architecturally sound. Making existing guests work would be a better motivation. Yes. So focus on this then. Existing guests will probably work fine on a newer chipset - likely better than on i440fx. xen management tools need to do some work to support this? That will just give everyone more long term benefits. If instead we create a hack that does not resemble any existing hardware even remotely, what's the chance that it will not break with any future guest modification? Isn't this possible with an mch chipset? If you're saying q35, I mean AFAIK we have no any plan to migrate to this MCH in xen case. q35 or a newer chipset that's closer to what guests expect. Additionally, I think I should follow this feature after q35 can work for xen scenario. What's stopping you? I mean if you want create an new machine based on q35, actually this is equal to start making xen to migrate to q35 now. Right? I can't image how much effort should be done. I don't see why you don't try. Sounds like a more robust solution to me. As I think this is another requirement to us. I'm not sure if I have enough time to touch this currently. So this is a reason why I'm saying I'd like to follow this feature after q35 can work with xen completely. Then we'll end up with more configurations to support, and to what end? So could we do this step by step: #1 phase: We just cover current qemu-xen implementation based on i44fx, so still provide that pseudo ISA bridge at 00:1f.0 as we already did. By the way there is no reason to put it at 00:1f.0 specifically I think. So it seems simple: create a dummy device that gets device and vendor id as properties. If you really like, add an option to get values Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx passthrough: create pseudo intel isa bridge. There, we fake this device just at 00:1f.0. But you guys don't like this, and shouldn't this be just this point we discussing now? If you guys agree that , everything is fine. Actually, this isn't what you did. Don't tie it to xen, and don't tie it to 1f. Just make it a simple stub pci device. Whoever wants it, creates it. The thing I worry about, is the chance this will break going forward. So you created a system with 2 isa bridges. I don't set class type to claim this is an ISA bridge, just a normal PCI device. +static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice *hdev) +{ +struct PCIDevice *dev; + +char rid; + +/* We havt to use a simple PCI device to fake this ISA bridge + * to avoid making some confusion to BIOS and ACPI. + */ +dev = pci_create(bus, PCI_DEVFN(0x1f, 0), pseudo-intel-pch-isa-bridge); + +qdev_init_nofail(dev-qdev); + +pci_config_set_vendor_id(dev-config, hdev-vendor_id); +pci_config_set_device_id(dev-config, hdev-device_id); + +xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)rid, 1); + +pci_config_set_revision(dev-config, rid); + +XEN_PT_LOG(dev, The pseudo Intel PCH ISA bridge created.\n); +return 0; +} Then I don't see how it's supposed to work. Doesn't i915 look for an isa bridge? /* * The reason to probe ISA bridge instead of Dev31:Fun0 is to * make graphics device passthrough work easy for VMM, that only * need to expose ISA bridge to let driver know the real hardware * underneath. This is a requirement from virtualization team. * * In some virtualized environments (e.g. XEN), there is irrelevant * ISA bridge in the system. To work reliably, we should scan trhough * all the ISA bridge devices and check for
Re: [Qemu-devel] [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On Mon, Jun 30, 2014 at 01:18:51PM +0200, Paolo Bonzini wrote: Il 30/06/2014 12:20, Chen, Tiejun ha scritto: I already post this to mainline to change as follows: -while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA 8, pch))) { +pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0)); +if (pch) { Please refer to this, [RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class type Linux Native guys would like to accept this. And actually Windows always use devfn to detect this. Fair enough, but that means that, when using IGD, Q35 will have to move the ISA bridge off 1f.0. To me it seems fairly clear that as things stand IGD is not virtualizable without PV support. We're beating a dead horse. Paolo It seems virtualizeable without PV. Virtualizing it just requires emulating a chipset that's closer to what the driver expects, which seems to be more effort than Tiejun is prepared to put in. -- MST
Re: [Qemu-devel] [PATCH] qemu-char: Convert socket char backend to parse/kind
On 30 June 2014 11:57, Gerd Hoffmann kra...@redhat.com wrote: IIRC wait for client to connect is a blocking accept() call. Which you certainly don't want do in a qmp command handler. So if we switch over chardevs created via -chardev to use the qmp init code path too we need some hackery to make '-chardev socket,wait,...' work without introducing a blocking qmp command I suspect ... We already have blocking code paths in qmp_chardev_open_socket(): for the non-listening case we call socket_connect() with a NULL callback argument, which will result in our calling connect() in blocking mode. thanks -- PMM
Re: [Qemu-devel] [RFC 10/14] vga: Remove some should be done in BIOS comments
On Di, 2014-06-24 at 09:11 +1000, Benjamin Herrenschmidt wrote: Not all platforms have a VGA BIOS, powerpc typically relies on using the DISPI interface to initialize the card. s/bios/guest/ and it would hold ;) But the dispi interface is defined the way it is. We are not going to change it, even if it isn't perfect, as it is guest/host abi. So the comments are not very useful indeed ... cheers, Gerd
Re: [Qemu-devel] [PATCH v8 03/14] qcow2: Optimize bdrv_make_empty()
Am 07.06.2014 um 20:51 hat Max Reitz geschrieben: bdrv_make_empty() is currently only called if the current image represents an external snapshot that has been committed to its base image; it is therefore unlikely to have internal snapshots. In this case, bdrv_make_empty() can be greatly sped up by creating an empty L1 table and dropping all data clusters at once by recreating the refcount structure accordingly instead of normally discarding all clusters. If there are snapshots, fall back to the simple implementation (discard all clusters). Signed-off-by: Max Reitz mre...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com This approach looks a bit too complicated to me, and calulating the required metadata size seems error-prone. How about this: 1. Set the dirty flag in the header so we can mess with the L1 table without keeping the refcounts consistent 2. Overwrite the L1 table with zeros 3. Overwrite the first n clusters after the header with zeros (n = 2 + l1_clusters). 4. Update the header: refcount_table_offset = cluster_size refcount_table_clusters = 1 l1_table_offset = 3 * cluster_size 6. bdrv_truncate to n + 1 clusters 7. Now update the first 8 bytes at cluster_size (the first new refcount table entry) to point to 2 * cluster_size (new refcount block) 8. Reset refcount block and L2 cache 9. Allocate n + 1 clusters (the header, too) and make sure you get offset 0 10. Remove the dirty flag Surprisingly (or not) this is much like an ordinary image creation. The main difference is that we keep the full size of the L1 table so the image stays always valid (the spec would even allow us to temporarily set l1_size = 0, but qcow2_open() doesn't seem to like that) and all areas where the L1 table could be are zeroed (this includes the new refcount table/block until the header is updated). I wanted to check whether this would still give the preallocation=full series what it needs, but a v11 doesn't seem to be on the list yet and v10 doesn't have the dependency on this series yet. Kevin
Re: [Qemu-devel] [questions] about KVM as a Microsoft-compatible hypervisor
On Mon, 2014-06-30 at 06:19 -0400, Jidong Xiao wrote: On Mon, Jun 30, 2014 at 6:02 AM, Vadim Rozenfeld vroze...@redhat.com wrote: On Mon, 2014-06-30 at 09:39 +0800, Zhang Haoyu wrote: Hi, Vadim I read the kvm-2012-forum paper KVM as a Microsoft-compatible hypervisor, Any update and other references, please? Thanks, Zhang Haoyu Unfortunately, not too much. From the the most recent, we have lazy eoi implemented by MST and reference time counter. Best regards, Vadim. It looks like that Mircosoft has defined a large number of synthetic registers in their Hyper-v spec, so ultimately KVM should virtualize most of these registers, so as to support the Mircosoft Enlightment, right? -Jidong Yes, but you don't have to support all the Hyper-V features at once. Hypervisor declares supported feature by specifying appropriate flags in Feature identification (0x4003) and Implementation recommendations (0x4004)CPUID leaves. Best regards, Vadim. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [RFC 13/14] vga: Add endian control register
Hi, diff --git a/hw/display/vga_int.h b/hw/display/vga_int.h index ae64321..894c6ab 100644 --- a/hw/display/vga_int.h +++ b/hw/display/vga_int.h @@ -47,6 +47,8 @@ #define VBE_DISPI_INDEX_Y_OFFSET0x9 #define VBE_DISPI_INDEX_NB 0xa /* size of vbe_regs[] */ #define VBE_DISPI_INDEX_VIDEO_MEMORY_64K 0xa /* read-only, not in vbe_regs */ +#define VBE_DISPI_INDEX_EXTENDED_CAPS 0xb /* read-only, not in vbe_regs */ +#define VBE_DISPI_INDEX_ENDIAN_CTRL 0xc /* not in vbe_regs */ #define VBE_DISPI_ID0 0xB0C0 #define VBE_DISPI_ID1 0xB0C1 @@ -55,13 +57,22 @@ #define VBE_DISPI_ID4 0xB0C4 #define VBE_DISPI_ID5 0xB0C5 I was more thinking to add ID6 to indicate the new interface revision with the additional VBE_DISPI_INDEX_ENDIAN_CTRL register. I'm a bit worried that there is no response from the bochs guys yet, I don't want have two incompatible rev6 interfaces. At least nobody seems to have defined one so far, google finds nothing for bochs dispi 0xB0C6. cheers, Gerd
Re: [Qemu-devel] [questions] about KVM as a Microsoft-compatiblehypervisor
Hi, Vadim I read the kvm-2012-forum paper KVM as a Microsoft-compatible hypervisor, Any update and other references, please? Thanks, Zhang Haoyu Unfortunately, not too much. From the the most recent, we have lazy eoi implemented by MST and reference time counter. How to get the source of windows pv-eoi? And what is reference time counter, could you provide some references or code, please? Thanks, Zhang Haoyu Best regards, Vadim.
[Qemu-devel] [PATCH 0/5] tests: Add the image fuzzer with qcow2 support
This patch series introduces the image fuzzer, a tool for stability and reliability testing. Its approach is to run large amount of tests in background. During every test a program (e.g. qemu-img) is called to read or modify an invalid test image. A test image has valid inner structure defined by its format specification with some fields having random invalid values. Patch 1 contains documentation for the image fuzzer, patch 2 is the test runner and remaining ones relate to the image generator for qcow2 format. Maria Kustova (5): docs: Specification for the image fuzzer runner: Tool for fuzz tests execution fuzz: Fuzzing functions for qcow2 images layout: Generator of fuzzed qcow2 images package: Public API for image-fuzzer/runner/runner.py tests/image-fuzzer/docs/image-fuzzer.txt | 176 + tests/image-fuzzer/qcow2/__init__.py | 1 + tests/image-fuzzer/qcow2/fuzz.py | 329 +++ tests/image-fuzzer/qcow2/layout.py | 250 +++ tests/image-fuzzer/runner/runner.py | 270 + 5 files changed, 1026 insertions(+) create mode 100644 tests/image-fuzzer/docs/image-fuzzer.txt create mode 100644 tests/image-fuzzer/qcow2/__init__.py create mode 100644 tests/image-fuzzer/qcow2/fuzz.py create mode 100644 tests/image-fuzzer/qcow2/layout.py create mode 100755 tests/image-fuzzer/runner/runner.py -- 1.9.3
[Qemu-devel] [PATCH 1/5] docs: Specification for the image fuzzer
'Overall fuzzer requirements' chapter contains the current product vision and features done and to be done. This chapter is still in progress. Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/docs/image-fuzzer.txt | 176 +++ 1 file changed, 176 insertions(+) create mode 100644 tests/image-fuzzer/docs/image-fuzzer.txt diff --git a/tests/image-fuzzer/docs/image-fuzzer.txt b/tests/image-fuzzer/docs/image-fuzzer.txt new file mode 100644 index 000..a9ed4b6 --- /dev/null +++ b/tests/image-fuzzer/docs/image-fuzzer.txt @@ -0,0 +1,176 @@ +Image fuzzer + + +Description +--- + +The goal of the image fuzzer is to catch crashes of qemu-io/qemu-img providing +to them randomly corrupted images. +Test images are generated from scratch and have valid inner structure with some +elements, e.g. L1/L2 tables, having random invalid values. + + +Test runner +--- + +The test runner generates test images, executes tests utilizing generated +images, indicates their results and collect all test related artifacts (logs, +core dumps, test images). +The test means one start of a system under test (SUT), e.g. qemu-io, with +specified arguments and one test image. +By default, the test runner generates new tests and executes them till +keyboard interruption. But if a test seed is specified via '-s' runner +parameter, then only one test with this seed will be executed, after its finish +the runner will exit. + +The runner uses an external image fuzzer to generate test images. An image +generator should be specified as a mandatory parameter of the test runner. +Details about interactions between the runner and fuzzers see Module +interfaces. + +The runner activates generation of core dumps during test executions, but it +assumes that core dumps will be generated in the current working directory. +For comprehensive test results, please, set up your test environment +properly. + +Path to a SUT can be specified via environment variables (for now only +qemu-img) or via '--binary' parameter of the test runner. For details about +environment variables see qemu-iotests/check. + + +Qcow2 image generator +- + +The 'qcow2' generator is a Python package providing 'create_image' method as +a single public API. See details in 'Test runner/image fuzzer' in 'Module +interfaces'. + +Qcow2 contains two submodules: fuzz.py and layout.py. + +'fuzz.py' contains all fuzzing functions one per image field. It's assumed that +after code analysis every field will have own constraints for its value. For +now only universal potentially dangerous values are used, e.g. type limits for +integers or unsafe symbols as '%s' for strings. For bitmasks random amount of +bits are set to ones. All fuzzed values are checked on non-equality to the +current valid value of the field. In case of equality the value will be +regenerated. + +'layout.py' creates a random valid image, fuzzes a random subset of the image +fields by 'fuzz.py' module and writes a fuzzed image to the file specified. + +For now only header fields and header extensions are generated, the remaining +file is filled with zeros. + + +Module interfaces +- + +* Test runner/image fuzzer + +The runner calls an image generator specifying path to a test image file. +An image generator is expected to provide 'create_image(test_img_path)' method +that creates a test image and writes it to the specified file. The file should +be created if it doesn't exist or overwritten otherwise. +Random seed is set by the runner at every test execution for the regression +purpose, so an image generator is not recommended to modify it internally. + +* Test runner/SUT + +A full test command is composed from the SUT, its arguments specified +via '-c' runner parameter and the name of generated image. To indicate where +a test image name should be placed TEST_IMG placeholder can be used. +For example, for the next test command + + qemu-img convert -O vmdk fuzzed_image test.vmdk + +a call of the image fuzzer will be following: + + ./runner.py -b qemu-img -c 'convert -O vmdk TEST_IMG test.vmdk' work_dir qcow2 + + +Overall fuzzer requirements +=== + +Input data: +-- + + - image template (generator) + - work directory + - test run duration (optional) + - action vector (optional) + - seed (optional) + - SUT and its arguments (optional) + + +Fuzzer requirements: +--- + +1. Should be able to inject random data +2. Should be able to select a random value from the manually pregenerated +vector (boundary values, e.g. max/min cluster size) +3. Image template should describe a general structure invariant for all +test images (image format description) +4. Image template should be autonomous and other fuzzer parts should not +relate on it +5. Image template should contain reference rules (not only block+size +description) +6. Should generate the test image with the
[Qemu-devel] [PATCH 4/5] layout: Generator of fuzzed qcow2 images
Layout submodule of qcow2 package creates a random valid image, randomly selects some amount of its fields, fuzzes them and write the fuzzed image to the file. Now only header and header extensions are generated, remaining file is filled by zeroes. Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/qcow2/layout.py | 250 + 1 file changed, 250 insertions(+) create mode 100644 tests/image-fuzzer/qcow2/layout.py diff --git a/tests/image-fuzzer/qcow2/layout.py b/tests/image-fuzzer/qcow2/layout.py new file mode 100644 index 000..b296979 --- /dev/null +++ b/tests/image-fuzzer/qcow2/layout.py @@ -0,0 +1,250 @@ +# Generator of fuzzed qcow2 images +# +# Copyright (C) 2014 Maria Kustova mari...@catit.be +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +import random +import struct +import fuzz +from itertools import repeat +import copy + +MAX_IMAGE_SIZE = 10*2**20 +# Standard sizes +UINT32_S = 4 +UINT64_S = 8 + +# Percentage of fields will be fuzzed +BIAS = random.uniform(0.1, 0.4) + + +class Field(object): +Describes an image field as a triplet of a data format necessary for +packing, an offset to the beginning of the image and value of the field. + +Can be iterated as a list [format, offset, value] + +__slots__ = ('fmt', 'offset', 'value') + +def __init__(self, fmt, offset, val): +self.fmt = fmt +self.offset = offset +self.value = val + +def __iter__(self): +return (x for x in [self.fmt, self.offset, self.value]) + +def __repr__(self): +return Field(fmt='%s', offset=%d, value=%s) % (self.fmt, self.offset, + str(self.value)) + + +def walk(v_struct, func): +Walk via structure and apply the specified function to all Field() +elements + +if isinstance(v_struct, list): +for item in v_struct: +walk(item, func) +else: +for k, v in v_struct.items(): +if isinstance(v, list): +walk(v, func) +else: +func(v, k) + + +def fuzz_struct(structure): +Select part of fields in the specified structure and assign them invalid +values + + +def coin(): +Return boolean value proportional to a portion of fields to be +fuzzed + +return random.random() BIAS + +def iter_fuzz(field, name): +Fuzz field value if it's selected +if coin(): +field.value = getattr(fuzz, name)(field.value) + +tmp = copy.deepcopy(structure) +walk(tmp, iter_fuzz) + +return tmp + + +def image_size(): +Generate a random file size aligned to a random correct cluster size +cluster_bits = random.randrange(9, 21) +cluster_size = 1 cluster_bits +file_size = random.randrange(5*cluster_size, MAX_IMAGE_SIZE + 1, + cluster_size) +return [cluster_bits, file_size] + + +def header(cluster_bits, img_size): +Generate a random valid header +meta_header = [ +['4s', 0, 'magic'], +['I', 4, 'version'], +['Q', 8, 'backing_file_offset'], +['I', 16, 'backing_file_size'], +['I', 20, 'cluster_bits'], +['Q', 24, 'size'], +['I', 32, 'crypt_method'], +['I', 36, 'l1_size'], +['Q', 40, 'l1_table_offset'], +['Q', 48, 'refcount_table_offset'], +['I', 56, 'refcount_table_clusters'], +['I', 60, 'nb_snapshots'], +['Q', 64, 'snapshots_offset'], +['Q', 72, 'incompatible_features'], +['Q', 80, 'compatible_features'], +['Q', 88, 'autoclear_features'], +['I', 96, 'refcount_order'], +['I', 100, 'header_length'] +] +values = repeat(0) +v_header = dict((f[2], Field(f[0], f[1], values.next())) +for f in meta_header) + +# Setup of valid values +v_header['magic'].value = QFI\xfb +v_header['version'].value = random.randint(2, 3) +v_header['cluster_bits'].value = cluster_bits +v_header['size'].value = img_size +if v_header['version'].value == 2: +v_header['header_length'].value = 72 +else: +v_header['incompatible_features'].value = random.getrandbits(2) +v_header['compatible_features'].value = random.getrandbits(1) +v_header['header_length'].value =
[Qemu-devel] [PATCH 2/5] runner: Tool for fuzz tests execution
The purpose of the test runner is to prepare test environment (e.g. create a work directory, a test image, etc), execute the program under test with parameters, indicate a test failure if the program was killed during test execution and collect core dumps, logs and other test artifacts. The test runner doesn't depend on image format or a program will be tested, so it can be used with any external image generator and program under test. Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/runner/runner.py | 270 1 file changed, 270 insertions(+) create mode 100755 tests/image-fuzzer/runner/runner.py diff --git a/tests/image-fuzzer/runner/runner.py b/tests/image-fuzzer/runner/runner.py new file mode 100755 index 000..21de78e --- /dev/null +++ b/tests/image-fuzzer/runner/runner.py @@ -0,0 +1,270 @@ +#!/usr/bin/env python + +# Tool for running fuzz tests +# +# Copyright (C) 2014 Maria Kustova mari...@catit.be +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +import sys, os, signal +from time import time +import subprocess +import random +from itertools import count +from shutil import rmtree +import getopt +import resource +resource.setrlimit(resource.RLIMIT_CORE, (-1, -1)) + + +def multilog(msg, *output): + Write an object to all of specified file descriptors + + +for fd in output: +fd.write(msg) +fd.flush() + + +def str_signal(sig): + Convert a numeric value of a system signal to the string one +defined by the current operational system + + +for k, v in signal.__dict__.items(): +if v == sig: +return k + + +class TestException(Exception): +Exception for errors risen by TestEnv objects +pass + + +class TestEnv(object): + Trivial test object + +The class sets up test environment, generates a test image and executes +application under tests with specified arguments and a test image provided. +All logs are collected. +Summary log will contain short descriptions and statuses of tests in +a run. +Test log will include application (e.g. 'qemu-img') logs besides info sent +to the summary log. + + +def __init__(self, test_id, seed, work_dir, run_log, exec_bin=None, + cleanup=True, log_all=False): +Set test environment in a specified work directory. + +Path to qemu_img will be retrieved from 'QEMU_IMG' environment +variable, if a test binary is not specified. + + +if seed is not None: +self.seed = seed +else: +self.seed = hash(time()) + +self.init_path = os.getcwd() +self.work_dir = work_dir +self.current_dir = os.path.join(work_dir, 'test-' + test_id) +if exec_bin is not None: +self.exec_bin = exec_bin.strip().split(' ') +else: +self.exec_bin = \ +os.environ.get('QEMU_IMG', 'qemu-img').strip().split(' ') + +try: +os.makedirs(self.current_dir) +except OSError: +e = sys.exc_info()[1] +print sys.stderr, \ +Error: The working directory '%s' cannot be used. Reason: %s\ +% (self.work_dir, e[1]) +raise TestException +self.log = open(os.path.join(self.current_dir, test.log), w) +self.parent_log = open(run_log, a) +self.result = False +self.cleanup = cleanup +self.log_all = log_all + +def _test_app(self, q_args): + Start application under test with specified arguments and return +an exit code or a kill signal depending on result of an execution. + +devnull = open('/dev/null', 'r+') +return subprocess.call(self.exec_bin + q_args, + stdin=devnull, stdout=self.log, stderr=self.log) + +def execute(self, q_args): + Execute a test. + +The method creates a test image, runs test app and analyzes its exit +status. If the application was killed by a signal, the test is marked +as failed. + +os.chdir(self.current_dir) +# Seed initialization should be as close to image generation call +# as posssible to avoid a corruption of random sequence +random.seed(self.seed) +image_generator.create_image('test_image') +
Re: [Qemu-devel] [RFC 14/14] ppc/spapr/vga: Switch VGA endian on H_SET_MODE
On Di, 2014-06-24 at 09:11 +1000, Benjamin Herrenschmidt wrote: When the guest switches the interrupt endian mode, which essentially means a global machine endian switch, we want to change the VGA framebuffer endian mode as well in order to be backward compatible with existing guests who don't know about the new endian control register. I'd tend to add a notifier (see include/qemu/notify.h) for endian switches instead, so anybody interested could register himself. cheers, Gerd
[Qemu-devel] [PATCH 3/5] fuzz: Fuzzing functions for qcow2 images
Fuzz submodule of qcow2 image generator contains fuzzing functions for image fields. Each fuzzing function contains list of constraints and call of a helper function that randomly selects a fuzzed value satisfied to one of constraints. For now constraints are only known as invalid or potentially dangerous values. But after investigation of code coverage by fuzz tests they will be expanded by heuristic values based on inner checks and flows of a program under test. Now fuzzing of header and header extensions is only supported. Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/qcow2/fuzz.py | 329 +++ 1 file changed, 329 insertions(+) create mode 100644 tests/image-fuzzer/qcow2/fuzz.py diff --git a/tests/image-fuzzer/qcow2/fuzz.py b/tests/image-fuzzer/qcow2/fuzz.py new file mode 100644 index 000..a81296e --- /dev/null +++ b/tests/image-fuzzer/qcow2/fuzz.py @@ -0,0 +1,329 @@ +# Fuzzing functions for qcow2 fields +# +# Copyright (C) 2014 Maria Kustova mari...@catit.be +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +import random + + +UINT32 = 0x +UINT64 = 0x +# Most significant bit orders +UINT32_M = 31 +UINT64_M = 63 +# Fuzz vectors +UINT32_V = [0, 0x100, 0x1000, 0x1, 0x10, UINT32/4, UINT32/2 - 1, +UINT32/2, UINT32/2 + 1, UINT32 - 1, UINT32] +UINT64_V = UINT32_V + [0x100, 0x1000, 0x1, UINT64/4, + UINT64/2 - 1, UINT64/2, UINT64/2 + 1, UINT64 - 1, + UINT64] +STRING_V = ['%s%p%x%d', '.1024d', '%.2049d', '%p%p%p%p', '%x%x%x%x', +'%d%d%d%d', '%s%s%s%s', '%999s', '%08x', '%%20d', '%%20n', +'%%20x', '%%20s', '%s%s%s%s%s%s%s%s%s%s', '%p%p%p%p%p%p%p%p%p%p', +'%#0123456x%08x%x%s%p%d%n%o%u%c%h%l%q%j%z%Z%t%i%e%g%f%a%C%S%08x%%', +'%s x 129', '%x x 257'] + + +def random_from_intervals(intervals): +Select a random integer number from the list of specified intervals + +Each interval is a tuple of lower and upper limits of the interval. The +limits are included. Intervals in a list should not overlap. + +total = reduce(lambda x, y: x + y[1] - y[0] + 1, intervals, 0) +r = random.randint(0, total-1) + intervals[0][0] +temp = zip(intervals, intervals[1:]) +for x in temp: +r = r + (r x[0][1])*(x[1][0] - x[0][1] - 1) +return r + + +def random_bits(bit_ranges): +Generate random binary mask with ones in the specified bit ranges + +Each bit_ranges is a list of tuples of lower and upper limits of bit +positions will be fuzzed. The limits are included. Random amount of bits +in range limits will be set to ones. The mask is returned in decimal +integer format. + +bit_numbers = [] +# Select random amount of random positions in bit_ranges +for rng in bit_ranges: +bit_numbers += random.sample(range(rng[0], rng[1] + 1), + random.randint(0, rng[1] - rng[0] + 1)) +val = 0 +# Set bits on selected possitions to ones +for bit in bit_numbers: +val |= 1 bit +return val + + +def truncate_string(strings, length): +Return strings truncated to specified length +if type(strings) == list: +return [s[length:] for s in strings] +else: +return strings[length:] + + +def int_validator(current, intervals): +Return a random value from intervals not equal to the current. + +This function is useful for selection from valid values except current one. + +val = random_from_intervals(intervals) +if val == current: +return int_validator(current, intervals) +else: +return val + + +def bit_validator(current, bit_ranges): +Return a random bit mask not equal to the current. + +This function is useful for selection from valid values except current one. + + +val = random_bits(bit_ranges) +if val == current: +return bit_validator(current, bit_ranges) +else: +return val + + +def string_validator(current, strings): +Return a random string value from the list not equal to the current. + +This function is useful for selection from valid values except current one. + + +val = random.choice(strings) +if val == current: +return string_validator(current, strings) +else: +
Re: [Qemu-devel] [PATCH] virtio: move common virtio properties to bus class device
On Mon, Jun 30, 2014 at 06:15:35PM +0800, Ming Lei wrote: Hi Michael, On Wed, Jun 18, 2014 at 3:13 PM, Ming Lei ming@canonical.com wrote: The two common virtio features can be defined per bus, so move all into bus class device to make code more clean. As discussed with cornelia, s390-virtio-blk doesn't support the two features at all, so keep s390-virtio as it. Acked-by: Cornelia Huck cornelia.h...@de.ibm.com #for s390 ccw Suggested-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Ming Lei ming@canonical.com --- hw/s390x/s390-virtio-bus.c |2 ++ hw/s390x/virtio-ccw.c | 11 ++- hw/virtio/virtio-pci.c | 12 ++-- include/hw/virtio/virtio-blk.h |3 --- include/hw/virtio/virtio-net.h |1 - include/hw/virtio/virtio-scsi.h |1 - 6 files changed, 14 insertions(+), 16 deletions(-) diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c index 9c71afa..921cdc2 100644 --- a/hw/s390x/s390-virtio-bus.c +++ b/hw/s390x/s390-virtio-bus.c @@ -504,6 +504,7 @@ static unsigned virtio_s390_get_features(DeviceState *d) static Property s390_virtio_net_properties[] = { DEFINE_NIC_PROPERTIES(VirtIONetS390, vdev.nic_conf), +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features), DEFINE_VIRTIO_NET_FEATURES(VirtIOS390Device, host_features), DEFINE_VIRTIO_NET_PROPERTIES(VirtIONetS390, vdev.net_conf), DEFINE_PROP_END_OF_LIST(), @@ -631,6 +632,7 @@ static const TypeInfo virtio_s390_device_info = { static Property s390_virtio_scsi_properties[] = { DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOSCSIS390, vdev.parent_obj.conf), +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features), DEFINE_VIRTIO_SCSI_FEATURES(VirtIOS390Device, host_features), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index 2bf0af8..c0124e1 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -1126,7 +1126,6 @@ static const TypeInfo virtio_ccw_net = { static Property virtio_ccw_blk_properties[] = { DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id), -DEFINE_VIRTIO_BLK_FEATURES(VirtioCcwDevice, host_features[0]), DEFINE_VIRTIO_BLK_PROPERTIES(VirtIOBlkCcw, blk), DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags, VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true), @@ -1158,7 +1157,6 @@ static const TypeInfo virtio_ccw_blk = { static Property virtio_ccw_serial_properties[] = { DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id), DEFINE_VIRTIO_SERIAL_PROPERTIES(VirtioSerialCcw, vdev.serial), -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]), DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags, VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true), DEFINE_PROP_END_OF_LIST(), @@ -1185,7 +1183,6 @@ static const TypeInfo virtio_ccw_serial = { static Property virtio_ccw_balloon_properties[] = { DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id), -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]), DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags, VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true), DEFINE_PROP_END_OF_LIST(), @@ -1242,7 +1239,6 @@ static const TypeInfo virtio_ccw_scsi = { static Property vhost_ccw_scsi_properties[] = { DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id), DEFINE_VHOST_SCSI_PROPERTIES(VirtIOSCSICcw, vdev.parent_obj.conf), -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]), DEFINE_PROP_END_OF_LIST(), }; @@ -1279,7 +1275,6 @@ static void virtio_ccw_rng_instance_init(Object *obj) static Property virtio_ccw_rng_properties[] = { DEFINE_PROP_STRING(devno, VirtioCcwDevice, bus_id), -DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]), DEFINE_VIRTIO_RNG_PROPERTIES(VirtIORNGCcw, vdev.conf), DEFINE_PROP_BIT(ioeventfd, VirtioCcwDevice, flags, VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true), @@ -1345,10 +1340,16 @@ static int virtio_ccw_busdev_unplug(DeviceState *dev) return 0; } +static Property virtio_ccw_properties[] = { +DEFINE_VIRTIO_COMMON_FEATURES(VirtioCcwDevice, host_features[0]), +DEFINE_PROP_END_OF_LIST(), +}; + static void virtio_ccw_device_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); +dc-props = virtio_ccw_properties; dc-init = virtio_ccw_busdev_init; dc-exit = virtio_ccw_busdev_exit; dc-unplug = virtio_ccw_busdev_unplug; diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index b437f19..af2e1c3 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -917,7 +917,6 @@ static Property virtio_9p_pci_properties[] = {
[Qemu-devel] [PATCH 5/5] package: Public API for image-fuzzer/runner/runner.py
__init__.py provides the public API required by the test runner Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/qcow2/__init__.py | 1 + 1 file changed, 1 insertion(+) create mode 100644 tests/image-fuzzer/qcow2/__init__.py diff --git a/tests/image-fuzzer/qcow2/__init__.py b/tests/image-fuzzer/qcow2/__init__.py new file mode 100644 index 000..e2ebe19 --- /dev/null +++ b/tests/image-fuzzer/qcow2/__init__.py @@ -0,0 +1 @@ +from layout import create_image -- 1.9.3
Re: [Qemu-devel] [RFC 00/14] VGA cleanups and endian control
On Di, 2014-06-24 at 09:10 +1000, Benjamin Herrenschmidt wrote: This series cleans up VGA and a bit of cirrus to remove all the now unused conversions to non-32bpp surfaces. Then the last two patches add a proposed variant of the endian control register and the (still somewhat controversial) trick to auto-switch VGA endian on powerpc SPAPR platforms. They apply on top of Gerd pixel format series Looks good overall. I'll go test this when I find time, after flushing my email backlog ... cheers, Gerd
Re: [Qemu-devel] [PATCH v5] ppc: spapr-rtas - implement os-term rtas call
On 30.06.14 11:25, Nikunj A Dadhania wrote: Alexander Graf ag...@suse.de writes: Am 30.06.2014 um 10:35 schrieb Nikunj A Dadhania nik...@linux.vnet.ibm.com: +static void rtas_ibm_os_term(PowerPCCPU *cpu, +sPAPREnvironment *spapr, +uint32_t token, uint32_t nargs, +target_ulong args, +uint32_t nret, target_ulong rets) +{ +target_ulong ret = 0; + +qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, error_abort); The guest doesn't pause. I see the event reaching libvirt and a dump taken and the guest restarts. Since the guest will call os-term in a loop, this will also flood the event listener with lots and lots of panic messages. do { status = rtas_call(rtas_token(ibm,os-term), 1, 1, NULL, __pa(rtas_os_term_buf)); } while (rtas_busy_delay(status)); So when status from the rtas call is success, the loop should exit. Am I missing something? No, I think you're right. I'll queue it for 2.2. Alex
Re: [Qemu-devel] [PATCH 1/2] target-ppc: Change default cpu for ppc64le-linux-user
On 6/28/2014 1:42 PM, Richard Henderson wrote: On 06/28/2014 09:50 AM, Alexander Graf wrote: How about we switch to p7 for BE top? Not ideal until we implement all of p7's insns in TCG: Warning: Disabling some instructions which are not emulated by TCG (0x0, 0x4) r~ That was dealt with in this patch: http://lists.nongnu.org/archive/html/qemu-ppc/2014-06/msg00479.html and has now made its way into HEAD.
Re: [Qemu-devel] [PATCH] Makefile: Don't build generated headers before Makefile is reread
Il 28/06/2014 18:59, Peter Maydell ha scritto: Having a direct dependency Makefile: $(GENERATED_HEADERS) can result in not-from-clean builds failing sometimes, because it means that when Make does its is any makefile or include out of date and needing a rebuild? check, as well as possibly running configure (to update config-host.mak) it will also rebuild GENERATED_HEADERS under the old config-host.mak regime. If the makefile rules for rebuilding the generated headers have changed in a way that means they're not compatible with the old config-host.mak variable names, the build will fail. Even if it does work, it's wasted effort since we'll need to rebuild them with the right config-host.mak settings immediately. Instead, make the generated headers depend on config-host.mak and config-target.mak. This means we'll definitely rebuild them if the configuration changes, but it will be done after Make reloads its makefiles and includes so will happen with the correct set of config-host.mak settings. We rely on the various .o files having correct autogenerated dependency rules to cause the generated headers to be generated as and when they are needed. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- The specific example of this is the change from CONFIG_TRACE_BACKEND to CONFIG_TRACE_BACKENDS, which I am still hitting on a pretty regular basis. This patch fixes that problem and I don't think it has any unpleasant side effects; Paolo, have I missed anything? Makefile| 8 +++- Makefile.target | 2 +- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/Makefile b/Makefile index 145adb6..dca33b4 100644 --- a/Makefile +++ b/Makefile @@ -546,11 +546,9 @@ ifdef SIGNCODE endif # SIGNCODE endif # CONFIG_WIN -# Add a dependency on the generated files, so that they are always -# rebuilt before other object files -ifneq ($(filter-out %clean,$(MAKECMDGOALS)),$(if $(MAKECMDGOALS),,fail)) -Makefile: $(GENERATED_HEADERS) -endif +# Make the generated files depend on config-host.mak so that if +# the configuration settings change we will rebuild them +$(GENERATED_HEADERS): config-host.mak # Include automatically generated dependency files # Dependencies in Makefile.objs files come from our recursive subdir rules diff --git a/Makefile.target b/Makefile.target index 6089d29..ea8614a 100644 --- a/Makefile.target +++ b/Makefile.target @@ -198,4 +198,4 @@ ifdef CONFIG_TRACE_SYSTEMTAP endif GENERATED_HEADERS += config-target.h -Makefile: $(GENERATED_HEADERS) +$(GENERATED_HEADERS): config-target.mak config-devices.mak config-devices.mak is not reflected in any C header file. Apart from this, Reviewed-by: Paolo Bonzini pbonz...@redhat.com Paolo
Re: [Qemu-devel] [PATCH] ahci.c: mask unused flags when reading size PRDT DBC
On 29.06.14 20:21, reza.jel...@gmail.com wrote: This requires a custom ovmf image with sata controller for testing [0] [0]: http://reza.jelveh.me/assets/OVMF.fd.bz2 I guess this is supposed to be a cover letter? A few rules for cover letters: 1) Cover letters only make sense for patch sets. This is a single patch, so you don't need one 2) Because they come with patch sets and you number patch sets, cover letters are [PATCH 0/n]. 3) Usually cover letters contain git statistics. Try git format-patch --cover-letter. It will give you a nice template. The usual way to add the comment you have here to a patch you're trying to submit it is to put volatile information (things that shouldn't end up in the git log) behind a --- line in the commit message. Everything after that line gets ignored by git when the patch gets applied. Alex
Re: [Qemu-devel] [PATCH] ahci.c: mask unused flags when reading size PRDT DBC
On 29.06.14 20:21, reza.jel...@gmail.com wrote: From: Reza Jelveh reza.jel...@tuhh.de This is a hint that your git configuration isn't fully correct. If the email address git thinks you want to use is the same as the From: email address, it will not print this line. I suppose the problem is with the difference in @gmail.com and @tuhh.de? The data byte count(DBC) read from the description information is defined for bits 21:00. Bits 30:22 are reserved and bit 31 is the Interrupt on Completion (I) flag. Interrupt is not implemented in QEMU. They are implemented in QEMU, but incorrectly. We trigger a completion interrupt after every transaction, not every time we see the I bit in the PRDT. tbl_entry_size is a signed integer and improperly reading the DBC leads to a negative offset that causes sglist allocation to fail. Signed-off-by: Reza Jelveh reza.jel...@tuhh.de --- hw/ide/ahci.c | 12 +--- hw/ide/ahci.h | 2 ++ 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c index 9bae22e..93aa981 100644 --- a/hw/ide/ahci.c +++ b/hw/ide/ahci.c @@ -639,6 +639,12 @@ static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t *cmd_fis) } } +static int prdt_tbl_entry_size(const AHCI_SG tbl) The argument should still be a pointer. +{ +return (le32_to_cpu(tbl.flags_size) AHCI_PRDT_SIZE_MASK) + 1; +} + + There is still one whitespace line too much :). Alex static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist, int offset) { AHCICmdHdr *cmd = ad-cur_cmd; @@ -681,7 +687,7 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist, int offset) sum = 0; for (i = 0; i sglist_alloc_hint; i++) { /* flags_size is zero-based */ -tbl_entry_size = (le32_to_cpu(tbl[i].flags_size) + 1); +tbl_entry_size = prdt_tbl_entry_size(tbl[i]); if (offset = (sum + tbl_entry_size)) { off_idx = i; off_pos = offset - sum; @@ -700,12 +706,12 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist, int offset) qemu_sglist_init(sglist, qbus-parent, (sglist_alloc_hint - off_idx), ad-hba-as); qemu_sglist_add(sglist, le64_to_cpu(tbl[off_idx].addr + off_pos), -le32_to_cpu(tbl[off_idx].flags_size) + 1 - off_pos); +prdt_tbl_entry_size(tbl[off_idx]) - off_pos); for (i = off_idx + 1; i sglist_alloc_hint; i++) { /* flags_size is zero-based */ qemu_sglist_add(sglist, le64_to_cpu(tbl[i].addr), -le32_to_cpu(tbl[i].flags_size) + 1); +prdt_tbl_entry_size(tbl[i])); } } diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h index 9a4064f..f418b30 100644 --- a/hw/ide/ahci.h +++ b/hw/ide/ahci.h @@ -201,6 +201,8 @@ #define AHCI_COMMAND_TABLE_ACMD0x40 +#define AHCI_PRDT_SIZE_MASK0x3f + #define IDE_FEATURE_DMA1 #define READ_FPDMA_QUEUED 0x60
Re: [Qemu-devel] [PATCH] Makefile: Don't build generated headers before Makefile is reread
On 30 June 2014 13:09, Paolo Bonzini pbonz...@redhat.com wrote: Il 28/06/2014 18:59, Peter Maydell ha scritto: Having a direct dependency Makefile: $(GENERATED_HEADERS) can result in not-from-clean builds failing sometimes, because it means that when Make does its is any makefile or include out of date and needing a rebuild? check, as well as possibly running configure (to update config-host.mak) it will also rebuild GENERATED_HEADERS under the old config-host.mak regime. If the makefile rules for rebuilding the generated headers have changed in a way that means they're not compatible with the old config-host.mak variable names, the build will fail. Even if it does work, it's wasted effort since we'll need to rebuild them with the right config-host.mak settings immediately. Instead, make the generated headers depend on config-host.mak and config-target.mak. This means we'll definitely rebuild them if the configuration changes, but it will be done after Make reloads its makefiles and includes so will happen with the correct set of config-host.mak settings. We rely on the various .o files having correct autogenerated dependency rules to cause the generated headers to be generated as and when they are needed. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- The specific example of this is the change from CONFIG_TRACE_BACKEND to CONFIG_TRACE_BACKENDS, which I am still hitting on a pretty regular basis. This patch fixes that problem and I don't think it has any unpleasant side effects; Paolo, have I missed anything? Makefile| 8 +++- Makefile.target | 2 +- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/Makefile b/Makefile index 145adb6..dca33b4 100644 --- a/Makefile +++ b/Makefile @@ -546,11 +546,9 @@ ifdef SIGNCODE endif # SIGNCODE endif # CONFIG_WIN -# Add a dependency on the generated files, so that they are always -# rebuilt before other object files -ifneq ($(filter-out %clean,$(MAKECMDGOALS)),$(if $(MAKECMDGOALS),,fail)) -Makefile: $(GENERATED_HEADERS) -endif +# Make the generated files depend on config-host.mak so that if +# the configuration settings change we will rebuild them +$(GENERATED_HEADERS): config-host.mak # Include automatically generated dependency files # Dependencies in Makefile.objs files come from our recursive subdir rules diff --git a/Makefile.target b/Makefile.target index 6089d29..ea8614a 100644 --- a/Makefile.target +++ b/Makefile.target @@ -198,4 +198,4 @@ ifdef CONFIG_TRACE_SYSTEMTAP endif GENERATED_HEADERS += config-target.h -Makefile: $(GENERATED_HEADERS) +$(GENERATED_HEADERS): config-target.mak config-devices.mak config-devices.mak is not reflected in any C header file. Apart from this, Reviewed-by: Paolo Bonzini pbonz...@redhat.com Do you mean ...and therefore should not be listed on the RHS of this dependency ? thanks -- PMM
Re: [Qemu-devel] [PATCH 0/3] block: introduce submit I/O at batch
Hi Paolo, On Mon, Jun 30, 2014 at 7:10 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 30/06/2014 11:49, Ming Lei ha scritto: Hi, The commit 580b6b2aa2(dataplane: use the QEMU block layer for I/O) introduces ~40% throughput regression on virtio-blk dataplane, and one of causes is that submitting I/O at batch is removed. This patchset trys to introduce this mechanism on block, at least, linux-aio can benefit from that. With these patches, it is observed that thoughout on virtio-blk dataplane can be improved a lot, see data in commit log of patch 3/3. It should be possible to apply the batch mechanism to other devices (such as virtio-scsi) too. The basic idea of the code is great, however I do not think it is necessary to add the logic in AioContext. You are right that io_submit supports multiple I/O, but right now we have one io_context_t per BlockDriverState so it is somewhat premature. Note that AioContext in QEMU means something else than io_context_t. I added the io queue into AioContext because the io queue can only be used in the attached context(or thread), that said the io queue has to be put into per context instance. Or could you suggest where we put the io queue for submitting at batch? At the beginning, I put it into bs, but looks like bdrv_io_plug and bdrv_io_unplug have to handle bs-file and bs itself. I don't think it's necessary to walk backing_hd too (especially because we're close to release and this cannot be a regression---dataplane in 2.0 didn't support backing files at all!). We need to keep the patches as simple as possible. OK, I will not consider backing_hd case in v1, also the patchset will not only improve dataplane, and other devices should benefit from it too. These patches themselves should be simple, and most of code are add-only, if bdrv_io_plug()/bdrv_io_unplug() isn't used, they are very close to noop. I just wrote a draft patch to apply the mechanism on virtio-scsi, and the improvement is obvious. You can just add bdrv_plug/bdrv_unplug callbacks to block/linux-aio.c and, in block.c, something like BlockDriver *drv = bs-drv; if (drv drv-bdrv_plug) { drv-bdrv_plug(bdrv); } else if (bs-file) { bdrv_plug(bs-file); } and place all the queuing logic in linux-aio.c. It should be obvious to the reviewer that the patch only affects dataplane. As I explained above, the io queue has to be per context(thread), so I am wondering if something like above is simpler since linux-aio still need to get context information from bs-aio_context. (Also, note that QEMU doesn't use __ as the prefix for internal function). OK, I will follow the QEMU code style. Thanks, -- Ming Lei
Re: [Qemu-devel] [PATCH 1/2] target-ppc: Change default cpu for ppc64le-linux-user
On 30.06.14 14:08, Tom Musta wrote: On 6/28/2014 1:42 PM, Richard Henderson wrote: On 06/28/2014 09:50 AM, Alexander Graf wrote: How about we switch to p7 for BE top? Not ideal until we implement all of p7's insns in TCG: Warning: Disabling some instructions which are not emulated by TCG (0x0, 0x4) r~ That was dealt with in this patch: http://lists.nongnu.org/archive/html/qemu-ppc/2014-06/msg00479.html and has now made its way into HEAD. Yup, I'll change the default for BE to POWER7 too while applying. Alex
Re: [Qemu-devel] [questions] about KVM as a Microsoft-compatiblehypervisor
On Mon, 2014-06-30 at 19:45 +0800, Zhang Haoyu wrote: Hi, Vadim I read the kvm-2012-forum paper KVM as a Microsoft-compatible hypervisor, Any update and other references, please? Thanks, Zhang Haoyu Unfortunately, not too much. From the the most recent, we have lazy eoi implemented by MST and reference time counter. How to get the source of windows pv-eoi? I'll be referencing to git://git.kernel.org/pub/scm/virt/kvm/kvm.git for lazy eoi please take a look at commit: b63cf42fd1d8c18fab71222321aaf356f63089c9 And what is reference time counter, could you provide some references or code, please? Take a look at commit: e984097b553ed2d6551c805223e4057421370f00 I also suggest reading Hypervisor Functional Specification 3.0a provided by Microsoft and available for downloading from http://www.microsoft.com/en-au/download/details.aspx?id=39289 Best regards, Vadim. Thanks, Zhang Haoyu Best regards, Vadim.
Re: [Qemu-devel] [PATCH 0/3] iotests: Fix qemu-iotests-quick.sh
Am 27.06.2014 um 22:47 hat Max Reitz geschrieben: My previous series iotests: Allow out-of-tree run broke qemu-iotests-quick.sh. Fixing it means simplifying it and allowing more tests to be added to the quick group, which is what this series does. It also adds some unaffected tests to the quick group because I can't see a reason why not to. This series (or at least patch 2) depends on v3 of my series block: Fix unset \filename\ for certain drivers. Reviewed-by: Kevin Wolf kw...@redhat.com The conflict without 'block: Fix unset filename for certain drivers' is trivial to resolve (drop the 097 line in tests/qemu-iotests/group), so this can be merged even without the other series. Kevin
Re: [Qemu-devel] [PATCH 2/2] target-ppc: Fix gdbstub for ppc64le-linux-user
On 28.06.14 18:45, Richard Henderson wrote: The bswap that's needed for system mode isn't required for user mode, and in fact breaks debugging. Cc: Aldy Hernandez al...@redhat.com Signed-off-by: Richard Henderson r...@twiddle.net This breaks the Apple gdbstub backend we recently got in target-ppc. I'll fix it up while applying. Alex
Re: [Qemu-devel] [PATCH 0/2] ppc64le-linux-user fixes
On 28.06.14 18:45, Richard Henderson wrote: Two fixes needed to run and debug hello world. Thanks, fixed up both patches and applied the to ppc-next (2.1 branch). Alex
Re: [Qemu-devel] [PATCH v2 3/3] serial: poll the serial console with G_IO_HUP
Il 30/06/2014 13:00, Roger Pau Monné ha scritto: Do I need to resend this? it's been more than a month without review. I'll send a pull request for it. Sorry for the delay. Paolo On 13/06/14 17:35, Roger Pau Monné wrote: Ping? On 23/05/14 17:57, Roger Pau Monne wrote: On FreeBSD polling a master pty while the other end is not connected with G_IO_OUT only results in an endless wait. This is different from the Linux behaviour, that returns immediately. In order to demonstrate this, I have the following example code: http://xenbits.xen.org/people/royger/test_poll.c When executed on Linux: $ ./test_poll In callback On FreeBSD instead, the callback never gets called: $ ./test_poll So, in order to workaround this, poll the source with G_IO_HUP (which makes the code behave the same way on both Linux and FreeBSD). Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Peter Crosthwaite peter.crosthwa...@xilinx.com Cc: Michael Tokarev m...@tls.msk.ru Cc: Andreas Färber afaer...@suse.de Cc: Paolo Bonzini pbonz...@redhat.com Cc: xen-de...@lists.xenproject.org --- Changes since v1: - Fix other users of qemu_chr_fe_add_watch to use G_IO_HUP. --- hw/char/serial.c |2 +- hw/char/virtio-console.c |3 ++- hw/usb/redirect.c|2 +- monitor.c|2 +- 4 files changed, 5 insertions(+), 4 deletions(-) diff --git a/hw/char/serial.c b/hw/char/serial.c index f4d167f..2a2c9e5 100644 --- a/hw/char/serial.c +++ b/hw/char/serial.c @@ -246,7 +246,7 @@ static gboolean serial_xmit(GIOChannel *chan, GIOCondition cond, void *opaque) serial_receive1(s, s-tsr, 1); } else if (qemu_chr_fe_write(s-chr, s-tsr, 1) != 1) { if (s-tsr_retry = 0 s-tsr_retry MAX_XMIT_RETRY -qemu_chr_fe_add_watch(s-chr, G_IO_OUT, serial_xmit, s) 0) { +qemu_chr_fe_add_watch(s-chr, G_IO_OUT|G_IO_HUP, serial_xmit, s) 0) { s-tsr_retry++; return FALSE; } diff --git a/hw/char/virtio-console.c b/hw/char/virtio-console.c index 6c8be0f..38e290a 100644 --- a/hw/char/virtio-console.c +++ b/hw/char/virtio-console.c @@ -69,7 +69,8 @@ static ssize_t flush_buf(VirtIOSerialPort *port, if (!k-is_console) { virtio_serial_throttle_port(port, true); if (!vcon-watch) { -vcon-watch = qemu_chr_fe_add_watch(vcon-chr, G_IO_OUT, +vcon-watch = qemu_chr_fe_add_watch(vcon-chr, +G_IO_OUT|G_IO_HUP, chr_write_unblocked, vcon); } } diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c index 287a505..06e757d 100644 --- a/hw/usb/redirect.c +++ b/hw/usb/redirect.c @@ -284,7 +284,7 @@ static int usbredir_write(void *priv, uint8_t *data, int count) r = qemu_chr_fe_write(dev-cs, data, count); if (r count) { if (!dev-watch) { -dev-watch = qemu_chr_fe_add_watch(dev-cs, G_IO_OUT, +dev-watch = qemu_chr_fe_add_watch(dev-cs, G_IO_OUT|G_IO_HUP, usbredir_write_unblocked, dev); } if (r 0) { diff --git a/monitor.c b/monitor.c index 593679a..ae1c539 100644 --- a/monitor.c +++ b/monitor.c @@ -304,7 +304,7 @@ void monitor_flush(Monitor *mon) mon-outbuf = tmp; } if (mon-watch == 0) { -mon-watch = qemu_chr_fe_add_watch(mon-chr, G_IO_OUT, +mon-watch = qemu_chr_fe_add_watch(mon-chr, G_IO_OUT|G_IO_HUP, monitor_unblocked, mon); } }
Re: [Qemu-devel] [PATCH] qemu-char: Convert socket char backend to parse/kind
Il 30/06/2014 12:57, Gerd Hoffmann ha scritto: On Mo, 2014-06-30 at 11:33 +0100, Peter Maydell wrote: On 30 June 2014 11:23, Gerd Hoffmann kra...@redhat.com wrote: Hi, * The old qemu_chr_open_socket() has an if (!is_waitconnect) qemu_set_nonblock(fd); which the QMP char-open codepath has never had. Does this matter? Which of the two code paths was correct? Gerd? IIRC the socket is put into non-blocking mode anyway by the qemu socket helper functions. In that case is qemu_chr_open_socket_fd() incorrect in marking the socket as nonblocking in the is_listen is_waitconnect case? It's unnecessary. tcp_chr_accept calls tcp_chr_add_client, which takes care of that. But it doesn't hurt either. Honestly I think the whole waitconnect stuff should be ripped out. Obvious problem is backward compatibility. But maybe not because nobody uses it anyway. Anyone out there using chardev server sockets without 'nowait' option? waitconnect means go into server mode, wait for a client to connect, then unpause the guest. Which handles one special case of the generic start qemu with -S, connect everything, then sent 'cont' to monitor libvirt is doing. IIRC wait for client to connect is a blocking accept() call. Which you certainly don't want do in a qmp command handler. I agree you don't want to do it, but then... just don't do it. :) We can leave in waitconnect, and leave it blocking. QMP clients simply shouldn't use it, but if they do it's not our fault. In fact, ChardevSocket.wait defaults to false (unlike the command-line counterpart). Paolo