Re: [Qemu-devel] [PATCH] Fixes key mapping so all keys work
On 14 January 2015 at 00:12, Programmingkid programmingk...@gmail.com wrote: This patch allows for all the keys on an Apple extended keyboard to work in QEMU. Signed-off-by: John Arbuckle programmingk...@gmail.com --- ui/cocoa.m | 29 - 1 files changed, 20 insertions(+), 9 deletions(-) diff --git a/ui/cocoa.m b/ui/cocoa.m index c8535a3..afac987 100644 --- a/ui/cocoa.m +++ b/ui/cocoa.m @@ -128,18 +128,26 @@ int keymap[] = 14, // 51 0x330x0eBKSPQZ_BACKSPACE 0, // 52 0x34Undefined 1, // 53 0x350x01ESC QZ_ESCAPE -220, // 54 0x360xdcE0,5C R GUI QZ_RMETA -219, // 55 0x370xdbE0,5B L GUI QZ_LMETA +219,// 54 0x36QZ_RMETA +219,// 55 0x37QZ_LMETA This looks wrong: we shouldn't be mapping right meta and left meta to the same thing, they're different keys. 42, // 56 0x380x2aL SHFT QZ_LSHIFT 58, // 57 0x390x3aCAPSQZ_CAPSLOCK 56, // 58 0x3A0x38L ALT QZ_LALT 29, // 59 0x3B0x1dL CTRL QZ_LCTRL -54, // 60 0x3C0x36R SHFT QZ_RSHIFT -184,// 61 0x3D0xb8E0,38 R ALT QZ_RALT -157,// 62 0x3E0x9dE0,1D R CTRL QZ_RCTRL + +/* + Map right shift, control, option, and command keys to left counterpart + to improve compatibility with Mac OS. +*/ + +42, // 60 0x3C0x36R SHFT QZ_RSHIFT +56, // 61 0x3D0xb8E0,38 R ALT QZ_RALT +29, // 62 0x3E0x9dE0,1D R CTRL QZ_RCTRL Compatibility with Mac OS as a host or as a guest? These are different keys, and if we do this then we won't be able to work correctly with guests that really do treat them as different. + + 0, // 63 0x3FUndefined 0, // 64 0x40Undefined -0, // 65 0x41Undefined +0x53,// 65 0x41 Keypad . Can we be consistent with the other entries, please? SDLi entry in decimal, and appropriate items in the other columns including whatever the QZ_ constant is if we can find it. 0, // 66 0x42Undefined 55, // 67 0x430x37KP *QZ_KP_MULTIPLY 0, // 68 0x44Undefined @@ -150,12 +158,12 @@ int keymap[] = 0, // 73 0x49Undefined 0, // 74 0x4AUndefined 181,// 75 0x4B0xb5E0,35 KP /QZ_KP_DIVIDE -152,// 76 0x4C0x9cE0,1C KP EN QZ_KP_ENTER +0xe01c, //152,// 76 0x4C0x9cE0,1C KP EN QZ_KP_ENTER Only entry in the entire table that's not a one-byte value? 0, // 77 0x4Dundefined 74, // 78 0x4E0x4aKP -QZ_KP_MINUS 0, // 79 0x4FUndefined 0, // 80 0x50Undefined -0, // 81 0x51QZ_KP_EQUALS +13, // 81 0x51QZ_KP_EQUALS The keypad = key and the normal = key ought to have different values here. Otherwise the guest can't tell them apart. 82, // 82 0x520x52KP 0QZ_KP0 79, // 83 0x530x4fKP 1QZ_KP1 80, // 84 0x540x50KP 2QZ_KP2 @@ -201,10 +209,13 @@ int keymap[] = 205,// 124 0x7C0xcde0,4D R ARROW QZ_RIGHT 208,// 125 0x7D0xd0E0,50 D ARROW QZ_DOWN 200,// 126 0x7E0xc8E0,48 U ARROW QZ_UP + /* completed according to http://www.libsdl.org/cgi/cvsweb.cgi/SDL12/src/video/quartz/SDL_QuartzKeys.h?rev=1.6content-type=text/x-cvsweb-markup */ -/* Additional 104 Key XP-Keyboard Scancodes from http://www.computer-engineering.org/ps2keyboard/scancodes1.html */ +/* Aditional 104 Key XP-Keyboard Scancodes from http://www.computer-engineering.org/ps2keyboard/scancodes1.html */ This is reintroducing a typo that was fixed back in 2013 (in commit 49b9bd4dc). /* +219 // 0xdbe0,5b L GUI +220 // 0xdce0,5c R GUI 221 // 0xdde0,5d APPS // E0,2A,E0,37 PRNT SCRN // E1,1D,45,E1,9D,C5 PAUSE -- PMM
[Qemu-devel] [PATCH] AIO: Reduce number of threads for 32bit hosts
On hosts with limited virtual address space (32bit pointers), we can very easily run out of virtual memory with big thread pools. Instead, we should limit ourselves to small pools to keep memory footprint low on those systems. This patch fixes random VM stalls like (process:25114): GLib-ERROR **: gmem.c:103: failed to allocate 1048576 bytes on 32bit ARM systems for me. Signed-off-by: Alexander Graf ag...@suse.de --- thread-pool.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/thread-pool.c b/thread-pool.c index e2cac8e..87a3ea9 100644 --- a/thread-pool.c +++ b/thread-pool.c @@ -299,7 +299,12 @@ static void thread_pool_init_one(ThreadPool *pool, AioContext *ctx) qemu_mutex_init(pool-lock); qemu_cond_init(pool-worker_stopped); qemu_sem_init(pool-sem, 0); -pool-max_threads = 64; +if (sizeof(pool) == 4) { +/* 32bit systems run out of virtual memory quickly */ +pool-max_threads = 4; +} else { +pool-max_threads = 64; +} pool-new_thread_bh = aio_bh_new(ctx, spawn_thread_bh_fn, pool); QLIST_INIT(pool-head); -- 1.7.12.4
Re: [Qemu-devel] [RESEND PATCH 0/2] QEMUSizedBuffer/QEMUFile fixes
Ping... 在 12/19/2014 11:38 AM, Yang Hongyang 写道: Rebased to the latest master. Yang Hongyang (2): QEMUSizedBuffer: only free qsb that qemu_bufopen allocated Tests: QEMUSizedBuffer/QEMUBuffer migration/qemu-file-buf.c | 10 ++ tests/test-vmstate.c | 20 2 files changed, 14 insertions(+), 16 deletions(-) -- Thanks, Yang.
Re: [Qemu-devel] [PATCH v11 09/13] qmp: Add support of dirty-bitmap sync mode for drive-backup
On Tue, 01/13 12:50, John Snow wrote: On 01/13/2015 04:37 AM, Fam Zheng wrote: On Mon, 01/12 11:31, John Snow wrote: For dirty-bitmap sync mode, the block job will iterate through the given dirty bitmap to decide if a sector needs backup (backup all the dirty clusters and skip clean ones), just as allocation conditions of top sync mode. Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: John Snow js...@redhat.com --- block.c | 5 ++ block/backup.c| 120 ++ block/mirror.c| 4 ++ blockdev.c| 14 +- hmp.c | 3 +- include/block/block.h | 1 + include/block/block_int.h | 2 + qapi/block-core.json | 11 +++-- qmp-commands.hx | 7 +-- 9 files changed, 137 insertions(+), 30 deletions(-) diff --git a/block.c b/block.c index 3f33b9d..5eb6788 100644 --- a/block.c +++ b/block.c @@ -5633,6 +5633,11 @@ static void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, } } +void bdrv_set_dirty_iter(HBitmapIter *hbi, int64_t offset) +{ +hbitmap_iter_init(hbi, hbi-hb, offset); What's the reason for this indirection? Can caller just use hbitmap_iter_init? Essentially it is just a rename of hbitmap_iter_init to make its usage here clear, that we are manually advancing the pointer. How we accomplish that (hbitmap_iter_init) is an implementation detail. Yes, I can just call this directly from block/backup, but at the time I was less sure of how I would advance the pointer, so I created a wrapper where I could change details as needed, if I needed to. OK, either is fine for me. +} + int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap) { return hbitmap_count(bitmap-bitmap); diff --git a/block/backup.c b/block/backup.c index 792e655..0626a3e 100644 --- a/block/backup.c +++ b/block/backup.c @@ -37,6 +37,8 @@ typedef struct CowRequest { typedef struct BackupBlockJob { BlockJob common; BlockDriverState *target; +/* bitmap for sync=dirty-bitmap */ +BdrvDirtyBitmap *sync_bitmap; MirrorSyncMode sync_mode; RateLimit limit; BlockdevOnError on_source_error; @@ -242,6 +244,31 @@ static void backup_complete(BlockJob *job, void *opaque) g_free(data); } +static bool coroutine_fn yield_and_check(BackupBlockJob *job) +{ +if (block_job_is_cancelled(job-common)) { +return true; +} + +/* we need to yield so that qemu_aio_flush() returns. + * (without, VM does not reboot) + */ +if (job-common.speed) { +uint64_t delay_ns = ratelimit_calculate_delay(job-limit, + job-sectors_read); +job-sectors_read = 0; +block_job_sleep_ns(job-common, QEMU_CLOCK_REALTIME, delay_ns); +} else { +block_job_sleep_ns(job-common, QEMU_CLOCK_REALTIME, 0); +} + +if (block_job_is_cancelled(job-common)) { +return true; +} + +return false; +} + static void coroutine_fn backup_run(void *opaque) { BackupBlockJob *job = opaque; @@ -254,13 +281,13 @@ static void coroutine_fn backup_run(void *opaque) }; int64_t start, end; int ret = 0; +bool error_is_read; QLIST_INIT(job-inflight_reqs); qemu_co_rwlock_init(job-flush_rwlock); start = 0; -end = DIV_ROUND_UP(job-common.len / BDRV_SECTOR_SIZE, - BACKUP_SECTORS_PER_CLUSTER); +end = DIV_ROUND_UP(job-common.len, BACKUP_CLUSTER_SIZE); job-bitmap = hbitmap_alloc(end, 0); @@ -278,28 +305,45 @@ static void coroutine_fn backup_run(void *opaque) qemu_coroutine_yield(); job-common.busy = true; } +} else if (job-sync_mode == MIRROR_SYNC_MODE_DIRTY_BITMAP) { +/* Dirty Bitmap sync has a slightly different iteration method */ +HBitmapIter hbi; +int64_t sector; +int64_t cluster; +bool polyrhythmic; + +bdrv_dirty_iter_init(bs, job-sync_bitmap, hbi); +/* Does the granularity happen to match our backup cluster size? */ +polyrhythmic = (bdrv_dirty_bitmap_granularity(bs, job-sync_bitmap) != +BACKUP_CLUSTER_SIZE); + +/* Find the next dirty /sector/ and copy that /cluster/ */ +while ((sector = hbitmap_iter_next(hbi)) != -1) { +if (yield_and_check(job)) { +goto leave; +} +cluster = sector / BACKUP_SECTORS_PER_CLUSTER; + +do { +ret = backup_do_cow(bs, cluster * BACKUP_SECTORS_PER_CLUSTER, +BACKUP_SECTORS_PER_CLUSTER, error_is_read); +if ((ret 0) +backup_error_action(job, error_is_read, -ret) == +
[Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
Persistent dirty bitmaps will be saved into qcow2 files. It may be used as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for other drives (there may be qcow2 file with zero disk size but with several dirty bitmaps for other drives). Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- docs/specs/qcow2.txt | 59 1 file changed, 59 insertions(+) diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt index 121dfc8..b29de40 100644 --- a/docs/specs/qcow2.txt +++ b/docs/specs/qcow2.txt @@ -116,6 +116,13 @@ in the description of a field. Length of the header structure in bytes. For version 2 images, the length is always assumed to be 72 bytes. +104 - 107: nb_dirty_bitmaps +Number of dirty bitmaps contained in the image + +108 - 115: dirty_bitmaps_offset +Offset into the image file at which the dirty bitmaps table +starts. Must be aligned to a cluster boundary. + Directly after the image header, optional sections called header extensions can be stored. Each extension has a structure like the following: @@ -360,3 +367,55 @@ Snapshot table entry: variable: Padding to round up the snapshot table entry size to the next multiple of 8. + + +== Dirty bitmaps == + +The feature supports storing several dirty bitmaps in the qcow2 file. + +=== Cluster mapping === + +Dirty bitmaps are stored using a ONE-level structure for the mapping of +bitmaps to host clusters. There only an L1 table. + +The L1 table has a variable size (stored in the Bitmap table entry) and may +use multiple clusters, however it must be contiguous in the image file. + +Given an offset into the bitmap, the offset into the image file can be +obtained as follows: + +offset = l1_table[offset / cluster_size] + (offset % cluster_size) + +L1 table entry: + +Bit 0 - 61: Standard cluster descriptor + +62 - 63: Reserved + +=== Bitmap table === + +A directory of all bitmaps is stored in the bitmap table, a contiguous area in +the image file, whose starting offset and length are given by the header fields +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap table have +variable length, depending on the length of name and extra data. + +Bitmap table entry: + +Byte 0 - 7:Offset into the image file at which the L1 table for the +bitmap starts. Must be aligned to a cluster boundary. + + 8 - 11:Number of entries in the L1 table of the bitmap + +12 - 15:Bitmap granularity in bytes + +16 - 23:Bitmap size in sectors + +24 - 25:Size of the bitmap name + +variable: The name of the bitmap (not null terminated) + +variable: Padding to round up the bitmap table entry size to the +next multiple of 8. + +The fields size, granularity and name are corresponding with the fields +in struct BdrvDirtyBitmap. -- 1.9.1
Re: [Qemu-devel] Patch Round-up for stable 2.1.3, freeze on 2015-01-14
Hello, On Jan09 23:42, Paolo Bonzini wrote: That's commit 364c3e6b8dd7912e01d19122d791b8c8f6df4f6c on branch uq/master of git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git. maybe the one for: fix regression when reading memory size from config file is also a patch to think about. Thanks, -- William signature.asc Description: Digital signature
Re: [Qemu-devel] [PATCH for 2.3 v2 1/1] xen-hvm: increase maxmem before calling xc_domain_populate_physmap
On Mon, 12 Jan 2015, Stefano Stabellini wrote: On Wed, 3 Dec 2014, Don Slutz wrote: From: Stefano Stabellini stefano.stabell...@eu.citrix.com Increase maxmem before calling xc_domain_populate_physmap_exact to avoid the risk of running out of guest memory. This way we can also avoid complex memory calculations in libxl at domain construction time. This patch fixes an abort() when assigning more than 4 NICs to a VM. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Signed-off-by: Don Slutz dsl...@verizon.com --- v2: Changes by Don Slutz Switch from xc_domain_getinfo to xc_domain_getinfolist Fix error check for xc_domain_getinfolist Limit increase of maxmem to only do when needed: Add QEMU_SPARE_PAGES (How many pages to leave free) Add free_pages calculation xen-hvm.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/xen-hvm.c b/xen-hvm.c index 7548794..d30e77e 100644 --- a/xen-hvm.c +++ b/xen-hvm.c @@ -90,6 +90,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu) #endif #define BUFFER_IO_MAX_DELAY 100 +#define QEMU_SPARE_PAGES 16 We need a big comment here to explain why we have this parameter and when we'll be able to get rid of it. Other than that the patch is fine. Thanks! Actually I'll just go ahead and add the comment and commit, if for you is OK. Cheers, Stefano
Re: [Qemu-devel] [PATCH v5] block/raw-posix.c: Fixes raw_getlength() on Mac OS X so that it reports the correct length of a real CD
On 13 January 2015 at 18:26, Programmingkid programmingk...@gmail.com wrote: Allows for using real cdrom disc in QEMU under Mac OS X. This command was used to see if this patch worked: ./qemu-system-i386 -drive if=none,id=drive0,file=/dev/null,format=raw Signed-off-by: John Arbuckle programmingk...@gmail.com --- Replaced -errno with 0 for ioctl() failure return value. make check now passes with this patch. block/raw-posix.c | 18 +- configure |2 +- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/block/raw-posix.c b/block/raw-posix.c index e51293a..16fa0a4 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -1312,7 +1312,23 @@ again: if (size == 0) #endif #if defined(__APPLE__) defined(__MACH__) -size = LLONG_MAX; +{ +uint64_t sectors = 0; +uint32_t sector_size = 0; + +/* Query the number of sectors on the disk */ +ret = ioctl(fd, DKIOCGETBLOCKCOUNT, sectors); +if (ret == -1) { +return 0; We should be falling back to lseek, not returning 0... thanks -- PMM
[Qemu-devel] [PATCH 6/8] qemu: command line option for dirty bitmaps
The patch adds the following command line option: -dirty-bitmap [option1=val1][,option2=val2]... Available options are: name The name for the bitmap (necessary). file The file to load the bitmap from. file_id When specified with 'file' option, then this file will be available through this id for other -dirty-bitmap options when specified without 'file' option, then it is a reference to 'file', specified with another -dirty-bitmap option, and it will be used to load the bitmap from. driveThe drive to bind the bitmap to. It should be specified as 'id' suboption of one of -drive options. If nor 'file' neither 'file_id' are specified, then the bitmap will be loaded from that drive (internal dirty bitmap). granularity The granularity for the bitmap. Not necessary, the default value may be used. enabled on|off. Default is 'on'. Disabled bitmaps are not changing regardless of writes to corresponding drive. Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- blockdev.c| 38 ++ include/sysemu/blockdev.h | 1 + include/sysemu/sysemu.h | 1 + qemu-options.hx | 37 + vl.c | 100 ++ 5 files changed, 177 insertions(+) diff --git a/blockdev.c b/blockdev.c index 209fedd..8a9be08 100644 --- a/blockdev.c +++ b/blockdev.c @@ -176,6 +176,11 @@ QemuOpts *drive_def(const char *optstr) return qemu_opts_parse(qemu_find_opts(drive), optstr, 0); } +QemuOpts *dirty_bitmap_def(const char *optstr) +{ +return qemu_opts_parse(qemu_find_opts(dirty-bitmap), optstr, 0); +} + QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file, const char *optstr) { @@ -3044,6 +3049,39 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp) return head; } +QemuOptsList qemu_dirty_bitmap_opts = { +.name = dirty-bitmap, +.head = QTAILQ_HEAD_INITIALIZER(qemu_dirty_bitmap_opts.head), +.desc = { +{ +.name = name, +.type = QEMU_OPT_STRING, +.help = Name of the dirty bitmap, +},{ +.name = file, +.type = QEMU_OPT_STRING, +.help = file name to load the bitmap from, +},{ +.name = file_id, +.type = QEMU_OPT_STRING, +.help = node name to load the bitmap from (or to set id for + for file, opened by previous option), +},{ +.name = drive, +.type = QEMU_OPT_STRING, +.help = drive id to bind the bitmap to, +},{ +.name = granularity, +.type = QEMU_OPT_NUMBER, +.help = granularity, +},{ +.name = enabled, +.type = QEMU_OPT_BOOL, +.help = enabled flag (default is 'on'), +} +} +}; + QemuOptsList qemu_common_drive_opts = { .name = drive, .head = QTAILQ_HEAD_INITIALIZER(qemu_common_drive_opts.head), diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h index 09d1e30..48cce5c 100644 --- a/include/sysemu/blockdev.h +++ b/include/sysemu/blockdev.h @@ -57,6 +57,7 @@ int drive_get_max_devs(BlockInterfaceType type); DriveInfo *drive_get_next(BlockInterfaceType type); QemuOpts *drive_def(const char *optstr); +QemuOpts *dirty_bitmap_def(const char *optstr); QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file, const char *optstr); DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type); diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 503e5a4..c984a52 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -231,6 +231,7 @@ bool usb_enabled(bool default_usb); extern QemuOptsList qemu_legacy_drive_opts; extern QemuOptsList qemu_common_drive_opts; +extern QemuOptsList qemu_dirty_bitmap_opts; extern QemuOptsList qemu_drive_opts; extern QemuOptsList qemu_chardev_opts; extern QemuOptsList qemu_device_opts; diff --git a/qemu-options.hx b/qemu-options.hx index 10b9568..3a5bfde 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -592,6 +592,43 @@ qemu-system-i386 -hda a -hdb b @end example ETEXI +DEF(dirty-bitmap, HAS_ARG, QEMU_OPTION_dirty_bitmap, +-dirty-bitmap name=name[,file=file][,file_id=file_id][,drive=@var{id}]\n + [,granularity=granularity][,enabled=on|off]\n, +QEMU_ARCH_ALL) +STEXI +@item -dirty-bitmap @var{option}[,@var{option}[,@var{option}[,...]]] +@findex -dirty-bitmap + +Define a dirty-bitmap. Valid options are: + +@table @option +@item name=@var{name} +The name of the bitmap. Should be unique per @var{file}/@var{drive} and per +@var{for_drive}. +@item file=@var{file}
[Qemu-devel] [PATCH 4/8] block: store persistent dirty bitmaps
Persistent dirty bitmaps are the bitmaps, for which the new field BdrvDirtyBitmap.file is not NULL. We save all persistent dirty bitmaps owned by BlockDriverState in corresponding bdrv_close(). BdrvDirtyBitmap.file is a BlockDriverState, where we want to save the bitmap. It may be set in bdrv_dirty_bitmap_set_file() only once. bdrv_ref/bdrv_unref are used for BdrvDirtyBitmap.file to be sure that files will be closed and resources will be freed. Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- block.c | 43 +++ include/block/block.h | 3 +++ 2 files changed, 46 insertions(+) diff --git a/block.c b/block.c index 2466ba8..7237b95 100644 --- a/block.c +++ b/block.c @@ -54,6 +54,7 @@ struct BdrvDirtyBitmap { HBitmap *bitmap; BdrvDirtyBitmap *originator; +BlockDriverState *file; int64_t size; int64_t granularity; char *name; @@ -1840,6 +1841,7 @@ void bdrv_reopen_abort(BDRVReopenState *reopen_state) void bdrv_close(BlockDriverState *bs) { BdrvAioNotifier *ban, *ban_next; +BdrvDirtyBitmap *bm, *bm_next; if (bs-job) { block_job_cancel_sync(bs-job); @@ -1849,6 +1851,15 @@ void bdrv_close(BlockDriverState *bs) bdrv_drain_all(); /* in case flush left pending I/O */ notifier_list_notify(bs-close_notifiers, bs); +/* save and release persistent dirty bitmaps */ +QLIST_FOREACH_SAFE(bm, bs-dirty_bitmaps, list, bm_next) { +if (bm-file) { +bdrv_store_dirty_bitmap(bm); +bdrv_unref(bm-file); +bdrv_release_dirty_bitmap(bs, bm); +} +} + if (bs-drv) { if (bs-backing_hd) { BlockDriverState *backing_hd = bs-backing_hd; @@ -5373,6 +5384,29 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs, return originator; } +int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap) +{ +BlockDriverState *bs = bitmap-file; +uint8_t *buf; +uint64_t size; +assert(bs); +assert(bs-drv); +assert(bs-drv-bdrv_dirty_bitmap_store); + +size = hbitmap_data_size(bitmap-bitmap, bitmap-size); +size = (size + 3) ~3; +buf = g_malloc(size); + +hbitmap_store_data(bitmap-bitmap, buf, 0, bitmap-size); + +int res = bs-drv-bdrv_dirty_bitmap_store(bs, buf, + bitmap-name, + bitmap-size, + bitmap-granularity); + +g_free(buf); +return res; +} BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, int granularity, @@ -5421,6 +5455,15 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap) } } +void bdrv_dirty_bitmap_set_file(BdrvDirtyBitmap *bitmap, BlockDriverState *file) +{ +assert(bitmap-file == NULL); +bitmap-file = file; +if (file != NULL) { +bdrv_ref(file); +} +} + void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap) { bitmap-enabled = false; diff --git a/include/block/block.h b/include/block/block.h index cb1f28d..0dfefe3 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -442,6 +442,8 @@ BdrvDirtyBitmap *bdrv_copy_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *failed); void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap); +void bdrv_dirty_bitmap_set_file(BdrvDirtyBitmap *bitmap, +BlockDriverState *file); void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap); void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap); BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs); @@ -458,6 +460,7 @@ void bdrv_dirty_iter_init(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, struct HBitmapIter *hbi); void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t offset); int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap); +int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap); void bdrv_enable_copy_on_read(BlockDriverState *bs); void bdrv_disable_copy_on_read(BlockDriverState *bs); -- 1.9.1
Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
Stefan Hajnoczi stefa...@redhat.com writes: On Tue, Jan 13, 2015 at 11:51:51AM +0100, Markus Armbruster wrote: Christian Borntraeger borntrae...@de.ibm.com writes: Am 02.01.2015 um 13:57 schrieb Stefan Hajnoczi: On Thu, Dec 18, 2014 at 04:59:50PM +0100, Christian Borntraeger wrote: Are you ok with the patches? If yes, can you take care of these patches in the block tree? This series looks close, I've left comments on the patches. OK, so you would take care of the series in your tree if the next spin addresses your comments, right? The series is fine for command-line QEMU users where probing makes the command-line more convenient, so we can merge it. But the approach is fundamentally wrong for stacks where libvirt is in use. Libvirt is unaware of the guest geometry and block sizes that are probed in QEMU by this patch series. This breaks non-shared storage migration and also means libvirt-based tools that manipulate drives on a guest may inadvertently change the guest-visible geometry and cause disk problems. For example, what happens when you copy the disk image off a host DASD and onto NFS? QEMU no longer probes the geometry and the disk geometry has changed. The right place to tackle guest-visible geometry is in libvirt, not in QEMU, because it is guest state the needs to be captured in domain XML so that migration and tooling can preserve it when manipulating guests. Stefan I agree that this is not perfect and has obvious holes, but it works just fine with libvirt stacks that are typical on s390 (*). scsi disks (emulated and real) are not affected and work just fine. DASD disks are special anyway - please consider this as some kind of real HW pass-through even when we use virtio-blk. We can assume that admins can provide shared access between source and target. If you look at the real HW (LPAR and z/VM) DASDs are always attached via fibre channel (FICON protocol) (often with SAN switches) so shared access is quite common even over longer distances. I wouldn't quite call it pass-through, but I agree that DASDs are special. And yes, using NFS or image file will break unless the user specifies the geometry in libvirt. Setting these values is possible as of today in libvirts XML. But what programmatic way should libvirt use to detect that information itself? On an image file libvirt doesnt know either. This would be somewhat possible on an upper level like open stack and that upper level would then need to talk with the DS8000 storage subsystems and the system z hardware but even then it would fail when creating new DASD like images. (This would boil down to provide an interface like create an image file that looks like as DASD of type 3390/xx formatted with dasdfmt and the following parameters). Disk geometry is really a device model property. For historical reasons, we conjure up default geometries in the block layer. For convenience, we special-case DASDs (this series). Very few guests care about device geometry. If your guest does, relying on default geometry has always exposed you to the risk of an unwanted geometry change. So far largely theoretical, as the default geometry guessing algorithm has been unlikely to change incompatibily. To eliminate the risk, specify the geometry explicitly. With libvirt, that means putting it into domain XML. This series changes *does* change the guessing algorithm, but for DASDs only. Thus, the risk becomes real, but for users of DASDs only. Since said users of DASDs ask for it... let them have rope, I say. Libvirt could default the geometry to whatever QEMU comes up with, then fix it forever in domain XML. Doubtful whether it's worth the bother, as very few guests care, and their users are probably (painfully) aware of geometry pitfalls anyway. If we talk about cloud/openstack etc we do not have pass through devices anyway and only image files. If somebody asks me, I would create all image files as SCSI images anyway, all the DASD stuff makes sense if you have DASD hardware (or if you really know what you are going to do). What is your opinion on a) migrating disk properties like geometry b) comparing the detected disk properties and reject migration if they dont match? Both changes should provide a safety net and could be added at a later point in time. If I remember correctly I suggested to investigate in that direction, but then we concluded it wasn't worth the bother. Migrating disk geometry violates how live migration works: 1. Run-time device state like device registers and memory contents are live-migrated by QEMU. 2. Machine configuration like the list of emulated devices and read-only properties on those devices are not live migrated by QEMU. Correct. Frontend configuration could be migrated, but we don't. Disk geometry is a property of the
[Qemu-devel] [PATCH 4/8] rcu: allow nesting of rcu_read_lock/rcu_read_unlock
Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- include/qemu/rcu.h | 15 ++- tests/rcutorture.c | 2 ++ 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h index 91b3a5c..e3d9b63 100644 --- a/include/qemu/rcu.h +++ b/include/qemu/rcu.h @@ -76,6 +76,9 @@ struct rcu_reader_data { unsigned long ctr; bool waiting; +/* Data used by reader only */ +unsigned depth; + /* Data used for registry, protected by rcu_gp_lock */ QLIST_ENTRY(rcu_reader_data) node; }; @@ -85,8 +88,13 @@ extern __thread struct rcu_reader_data rcu_reader; static inline void rcu_read_lock(void) { struct rcu_reader_data *p_rcu_reader = rcu_reader; +unsigned ctr; + +if (p_rcu_reader-depth++ 0) { +return; +} -unsigned ctr = atomic_read(rcu_gp_ctr); +ctr = atomic_read(rcu_gp_ctr); atomic_xchg(p_rcu_reader-ctr, ctr); if (atomic_read(p_rcu_reader-waiting)) { atomic_set(p_rcu_reader-waiting, false); @@ -98,6 +106,11 @@ static inline void rcu_read_unlock(void) { struct rcu_reader_data *p_rcu_reader = rcu_reader; +assert(p_rcu_reader-depth != 0); +if (--p_rcu_reader-depth 0) { +return; +} + atomic_xchg(p_rcu_reader-ctr, 0); if (atomic_read(p_rcu_reader-waiting)) { atomic_set(p_rcu_reader-waiting, false); diff --git a/tests/rcutorture.c b/tests/rcutorture.c index cda7458..a6ac2a9 100644 --- a/tests/rcutorture.c +++ b/tests/rcutorture.c @@ -257,9 +257,11 @@ static void *rcu_read_stress_test(void *arg) if (p-mbtest == 0) { n_mberror++; } +rcu_read_lock(); for (i = 0; i 100; i++) { garbage++; } +rcu_read_unlock(); pc = p-pipe_count; rcu_read_unlock(); if ((pc RCU_STRESS_PIPE_LEN) || (pc 0)) { -- 1.8.3.1
[Qemu-devel] [PATCH 7/8] memory: protect current_map by RCU
Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- include/exec/memory.h | 5 + memory.c | 54 ++- 2 files changed, 28 insertions(+), 31 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index 0cd96b1..06ffa1d 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -33,6 +33,7 @@ #include qemu/notify.h #include qapi/error.h #include qom/object.h +#include qemu/rcu.h #define MAX_PHYS_ADDR_SPACE_BITS 62 #define MAX_PHYS_ADDR(((hwaddr)1 MAX_PHYS_ADDR_SPACE_BITS) - 1) @@ -207,9 +208,13 @@ struct MemoryListener { */ struct AddressSpace { /* All fields are private. */ +struct rcu_head rcu; char *name; MemoryRegion *root; + +/* Accessed via RCU. */ struct FlatView *current_map; + int ioeventfd_nb; struct MemoryRegionIoeventfd *ioeventfds; struct AddressSpaceDispatch *dispatch; diff --git a/memory.c b/memory.c index 8c3d8c0..a844ced 100644 --- a/memory.c +++ b/memory.c @@ -33,26 +33,12 @@ static bool memory_region_update_pending; static bool ioeventfd_update_pending; static bool global_dirty_log = false; -/* flat_view_mutex is taken around reading as-current_map; the critical - * section is extremely short, so I'm using a single mutex for every AS. - * We could also RCU for the read-side. - * - * The BQL is taken around transaction commits, hence both locks are taken - * while writing to as-current_map (with the BQL taken outside). - */ -static QemuMutex flat_view_mutex; - static QTAILQ_HEAD(memory_listeners, MemoryListener) memory_listeners = QTAILQ_HEAD_INITIALIZER(memory_listeners); static QTAILQ_HEAD(, AddressSpace) address_spaces = QTAILQ_HEAD_INITIALIZER(address_spaces); -static void memory_init(void) -{ -qemu_mutex_init(flat_view_mutex); -} - typedef struct AddrRange AddrRange; /* @@ -242,6 +228,7 @@ struct FlatRange { * order. */ struct FlatView { +struct rcu_head rcu; unsigned ref; FlatRange *ranges; unsigned nr; @@ -654,10 +641,10 @@ static FlatView *address_space_get_flatview(AddressSpace *as) { FlatView *view; -qemu_mutex_lock(flat_view_mutex); -view = as-current_map; +rcu_read_lock(); +view = atomic_rcu_read(as-current_map); flatview_ref(view); -qemu_mutex_unlock(flat_view_mutex); +rcu_read_unlock(); return view; } @@ -766,10 +753,9 @@ static void address_space_update_topology(AddressSpace *as) address_space_update_topology_pass(as, old_view, new_view, false); address_space_update_topology_pass(as, old_view, new_view, true); -qemu_mutex_lock(flat_view_mutex); -flatview_unref(as-current_map); -as-current_map = new_view; -qemu_mutex_unlock(flat_view_mutex); +/* Writes are protected by the BQL. */ +atomic_rcu_set(as-current_map, new_view); +call_rcu(old_view, flatview_unref, rcu); /* Note that all the old MemoryRegions are still alive up to this * point. This relieves most MemoryListeners from the need to @@ -1957,10 +1943,6 @@ void memory_listener_unregister(MemoryListener *listener) void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name) { -if (QTAILQ_EMPTY(address_spaces)) { -memory_init(); -} - memory_region_transaction_begin(); as-root = root; as-current_map = g_new(FlatView, 1); @@ -1974,15 +1956,10 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name) memory_region_transaction_commit(); } -void address_space_destroy(AddressSpace *as) +static void do_address_space_destroy(AddressSpace *as) { MemoryListener *listener; -/* Flush out anything from MemoryListeners listening in on this */ -memory_region_transaction_begin(); -as-root = NULL; -memory_region_transaction_commit(); -QTAILQ_REMOVE(address_spaces, as, address_spaces_link); address_space_destroy_dispatch(as); QTAILQ_FOREACH(listener, memory_listeners, link) { @@ -1994,6 +1971,21 @@ void address_space_destroy(AddressSpace *as) g_free(as-ioeventfds); } +void address_space_destroy(AddressSpace *as) +{ +/* Flush out anything from MemoryListeners listening in on this */ +memory_region_transaction_begin(); +as-root = NULL; +memory_region_transaction_commit(); +QTAILQ_REMOVE(address_spaces, as, address_spaces_link); + +/* At this point, as-dispatch and as-current_map are dummy + * entries that the guest should never use. Wait for the old + * values to expire before freeing the data. + */ +call_rcu(as, do_address_space_destroy, rcu); +} + bool io_mem_read(MemoryRegion *mr, hwaddr addr, uint64_t *pval, unsigned size) { return memory_region_dispatch_read(mr, addr, pval, size); -- 1.8.3.1
Re: [Qemu-devel] Question regarding two variables in qemu migration code
* Jidong Xiao (jidong.x...@gmail.com) wrote: On Tue, Jan 13, 2015 at 1:38 AM, Dr. David Alan Gilbert dgilb...@redhat.com wrote: * Jidong Xiao (jidong.x...@gmail.com) wrote: Hi, Hi, I am looking at the qemu source code, and trying to understand the migration part. In arch_init.c, there are two variables which seems quite confusing to me, They are: static uint64_t migration_dirty_pages; 'migration_dirty_pages' is the number of pages that are currently known that need to be sent to the destination; it goes down whenever we send a page, but goes up when we sync the dirty bitmap that tells us that something changed the data in the page (see migration_bitmap_sync_range ) static int64_t num_dirty_pages_period; // defined in function migration_bitmap_sync() This is looking how many pages we've noticed are now dirty within a particular time - to try and get an estimate of how fast memory is changing If you see migration_bitmap_sync has an: if (end_time start_time + 1000) { and inside there it uses num_dirty_pages_period to update dirty_pages_rate. Can anyone kindly explain that what does these two variables mean? Thanks. -Jidong Dave -- Thanks Dave, your explanation is really really helpful. But in function migration_bitmap_sync(), I see this: num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init; If as you said, num_dirty_pages_period refers to the pages get dirty within a particular time, then why it is +=???, instead of =? i.e., something like this: num_dirty_pages_period = migration_dirty_pages - num_dirty_pages_init; I just don't see why num_dirty_pages_period has to be accumulated with its previous value. It's because there are two periods: migration_dirty_pages - num_dirty_pages_init is the number of pages found dirty in this call to migration_bitmap_sync num_dirty_pages_period is the number of pages found dirty in a ~1 second periods, as tested by the code in migration_bitmap_sync after the: if (end_time start_time + 1000) { migration_bitmap_sync is called multiple times within that ~1 second period, and num_dirty_pages_period accumulates across the calls. Dave -Jidong -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] [PATCH 0/8] block: persistent dirty bitmaps (RFC)
The bitmaps are saved into qcow2 file format. It provides both 'internal' and 'external' dirty bitmaps feature: - for qcow2 drives we can store bitmaps in the same file - for other formats we can store bitmaps in the separate qcow2 file QCow2 header is extended by fields 'nb_dirty_bitmaps' and 'dirty_bitmaps_offset' like with snapshots. Proposed command line syntax is the following: -dirty-bitmap [option1=val1][,option2=val2]... Available options are: name The name for the bitmap (necessary). file The file to load the bitmap from. file_id When specified with 'file' option, then this file will be available through this id for other -dirty-bitmap options when specified without 'file' option, then it is a reference to 'file', specified with another -dirty-bitmap option, and it will be used to load the bitmap from. driveThe drive to bind the bitmap to. It should be specified as 'id' suboption of one of -drive options. If nor 'file' neither 'file_id' are specified, then the bitmap will be loaded from that drive (internal dirty bitmap). granularity The granularity for the bitmap. Not necessary, the default value may be used. enabled on|off. Default is 'on'. Disabled bitmaps are not changing regardless of writes to corresponding drive. Examples: qemu -drive file=a.qcow2,id=disk -dirty-bitmap name=b,drive=disk qemu -drive file=a.raw,id=disk \ -dirty-bitmap name=b,drive=disk,file=b.qcow2,enabled=off Vladimir Sementsov-Ogievskiy (8): spec: add qcow2-dirty-bitmaps specification hbitmap: store / restore qcow2: add dirty-bitmaps feature block: store persistent dirty bitmaps block: add bdrv_load_dirty_bitmap qemu: command line option for dirty bitmaps qmp: print dirty bitmap iotests: test internal persistent dirty bitmap block.c| 113 ++ block/Makefile.objs| 2 +- block/qcow2-dirty-bitmap.c | 514 + block/qcow2.c | 26 +++ block/qcow2.h | 48 + blockdev.c | 51 + docs/specs/qcow2.txt | 59 ++ hmp-commands.hx| 15 ++ hmp.c | 8 + hmp.h | 1 + include/block/block.h | 9 + include/block/block_int.h | 10 + include/qemu/hbitmap.h | 49 + include/sysemu/blockdev.h | 1 + include/sysemu/sysemu.h| 1 + qapi-schema.json | 3 +- qapi/block-core.json | 3 + qemu-options.hx| 37 qmp-commands.hx| 5 + tests/qemu-iotests/115 | 96 + tests/qemu-iotests/115.out | 64 ++ tests/qemu-iotests/group | 1 + util/hbitmap.c | 87 vl.c | 100 + 24 files changed, 1301 insertions(+), 2 deletions(-) create mode 100644 block/qcow2-dirty-bitmap.c create mode 100755 tests/qemu-iotests/115 create mode 100644 tests/qemu-iotests/115.out -- 1.9.1
Re: [Qemu-devel] [PATCH 0/1] pci: allow 0 address for PCI IO/MEM regions
On 01/13/15 17:17, Michael S. Tsirkin wrote: On Tue, Jan 13, 2015 at 05:54:46PM +0200, Michael S. Tsirkin wrote: I think we already do this for PC: commit 83d08f2673504a299194dcac1657a13754b5932a Author: Michael S. Tsirkin m...@redhat.com Date: Tue Oct 29 13:57:34 2013 +0100 pc: map PCI address space as catchall region for not mapped addresses but we need to find and fix all other targets. BTW this is very easy to test. Add an unused device (like ivshmem) enable BAR, and set it to 0. System should survive, as opposed to hanging. But the big question is whether this is the right thing to do for each platform. For PIIX whatever is not system memory, is PCI. But other boxes might have a different view of the matter. Only few platforms have PCI mapped at 0, no? So in most cases you get windows that simply don't have overlapping at all. Alex
[Qemu-devel] [PATCH 1/1] Print PID and time in stderr traces
From: Dr. David Alan Gilbert dgilb...@redhat.com When debugging migration it's useful to know the PID of each trace message so you can figure out if it came from the source or the destination. Printing the time makes it easy to do latency measurements or timings between trace points. Signed-off-by: Dr. David Alan Gilbert dgilb...@redhat.com --- scripts/tracetool/backend/stderr.py | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/scripts/tracetool/backend/stderr.py b/scripts/tracetool/backend/stderr.py index 2a1e906..16878a3 100644 --- a/scripts/tracetool/backend/stderr.py +++ b/scripts/tracetool/backend/stderr.py @@ -21,6 +21,9 @@ PUBLIC = True def generate_h_begin(events): out('#include stdio.h', +'#include sys/time.h', +'#include sys/types.h', +'#include unistd.h', '#include trace/control.h', '') @@ -31,7 +34,12 @@ def generate_h(event): argnames = , + argnames out('if (trace_event_get_state(%(event_id)s)) {', -'fprintf(stderr, %(name)s %(fmt)s \\n %(argnames)s);', +'struct timeval _now;', +'gettimeofday(_now, NULL);', +'fprintf(stderr, %%d@%%zd.%%zd:%(name)s %(fmt)s \\n,', +'getpid(),', +'(size_t)_now.tv_sec, (size_t)_now.tv_usec', +'%(argnames)s);', '}', event_id=TRACE_ + event.name.upper(), name=event.name, -- 2.1.0
[Qemu-devel] [PATCH 8/9] balloon: Factor out common is balloon active test
Signed-off-by: Markus Armbruster arm...@redhat.com --- balloon.c | 29 +++-- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/balloon.c b/balloon.c index 2884c2d..aa30617 100644 --- a/balloon.c +++ b/balloon.c @@ -36,6 +36,19 @@ static QEMUBalloonEvent *balloon_event_fn; static QEMUBalloonStatus *balloon_stat_fn; static void *balloon_opaque; +static int have_ballon(Error **errp) +{ +if (kvm_enabled() !kvm_has_sync_mmu()) { +error_set(errp, QERR_KVM_MISSING_CAP, synchronous MMU, balloon); +return 0; +} +if (!balloon_event_fn) { +error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon); +return 0; +} +return 1; +} + int qemu_add_balloon_handler(QEMUBalloonEvent *event_func, QEMUBalloonStatus *stat_func, void *opaque) { @@ -66,13 +79,7 @@ BalloonInfo *qmp_query_balloon(Error **errp) { BalloonInfo *info; -if (kvm_enabled() !kvm_has_sync_mmu()) { -error_set(errp, QERR_KVM_MISSING_CAP, synchronous MMU, balloon); -return NULL; -} - -if (!balloon_stat_fn) { -error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon); +if (!have_ballon(errp)) { return NULL; } @@ -83,8 +90,7 @@ BalloonInfo *qmp_query_balloon(Error **errp) void qmp_balloon(int64_t target, Error **errp) { -if (kvm_enabled() !kvm_has_sync_mmu()) { -error_set(errp, QERR_KVM_MISSING_CAP, synchronous MMU, balloon); +if (!have_ballon(errp)) { return; } @@ -92,11 +98,6 @@ void qmp_balloon(int64_t target, Error **errp) error_set(errp, QERR_INVALID_PARAMETER_VALUE, target, a size); return; } - -if (!balloon_event_fn) { -error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon); -return; -} trace_balloon_event(balloon_opaque, target); balloon_event_fn(balloon_opaque, target); -- 1.9.3
[Qemu-devel] [PATCH 3/9] hmp: Compile hmp_info_spice() only with CONFIG_SPICE
It's dead code when CONFIG_SPICE is off. If it wasn't, it would crash dereferencing the null pointer returned by the qmp_query_spice() dummy in qmp.c. Signed-off-by: Markus Armbruster arm...@redhat.com --- hmp.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hmp.c b/hmp.c index 481be80..a42c5c0 100644 --- a/hmp.c +++ b/hmp.c @@ -535,6 +535,7 @@ out: qapi_free_VncInfo(info); } +#ifdef CONFIG_SPICE void hmp_info_spice(Monitor *mon, const QDict *qdict) { SpiceChannelList *chan; @@ -581,6 +582,7 @@ void hmp_info_spice(Monitor *mon, const QDict *qdict) out: qapi_free_SpiceInfo(info); } +#endif void hmp_info_balloon(Monitor *mon, const QDict *qdict) { -- 1.9.3
[Qemu-devel] [PATCH 1/9] qmp hmp: Factor out common using spice test
Into qemu_using_spice(). For want of a better place, put it next the existing monitor command handler dummies in qemu-spice.h. Signed-off-by: Markus Armbruster arm...@redhat.com --- include/ui/qemu-spice.h | 10 ++ monitor.c | 5 +++-- qmp.c | 11 +++ 3 files changed, 16 insertions(+), 10 deletions(-) diff --git a/include/ui/qemu-spice.h b/include/ui/qemu-spice.h index a93b4b2..b3f2679 100644 --- a/include/ui/qemu-spice.h +++ b/include/ui/qemu-spice.h @@ -88,4 +88,14 @@ static inline int qemu_spice_display_add_client(int csock, int skipauth, #endif /* CONFIG_SPICE */ +static inline int qemu_using_spice(Error **errp) +{ +if (!using_spice) { +/* correct one? spice isn't a device ,,, */ +error_set(errp, QERR_DEVICE_NOT_ACTIVE, spice); +return 0; +} +return 1; +} + #endif /* QEMU_SPICE_H */ diff --git a/monitor.c b/monitor.c index 1808e41..cd81289 100644 --- a/monitor.c +++ b/monitor.c @@ -1095,11 +1095,12 @@ static int client_migrate_info(Monitor *mon, const QDict *qdict, const char *subject = qdict_get_try_str(qdict, cert-subject); int port = qdict_get_try_int(qdict, port, -1); int tls_port = qdict_get_try_int(qdict, tls-port, -1); +Error *err; int ret; if (strcmp(protocol, spice) == 0) { -if (!using_spice) { -qerror_report(QERR_DEVICE_NOT_ACTIVE, spice); +if (!qemu_using_spice(err)) { +qerror_report_err(err); return -1; } diff --git a/qmp.c b/qmp.c index 0b4f131..f01ae7d 100644 --- a/qmp.c +++ b/qmp.c @@ -287,9 +287,7 @@ void qmp_set_password(const char *protocol, const char *password, } if (strcmp(protocol, spice) == 0) { -if (!using_spice) { -/* correct one? spice isn't a device ,,, */ -error_set(errp, QERR_DEVICE_NOT_ACTIVE, spice); +if (!qemu_using_spice(errp)) { return; } rc = qemu_spice_set_passwd(password, fail_if_connected, @@ -335,9 +333,7 @@ void qmp_expire_password(const char *protocol, const char *whenstr, } if (strcmp(protocol, spice) == 0) { -if (!using_spice) { -/* correct one? spice isn't a device ,,, */ -error_set(errp, QERR_DEVICE_NOT_ACTIVE, spice); +if (!qemu_using_spice(errp)) { return; } rc = qemu_spice_set_pw_expire(when); @@ -562,8 +558,7 @@ void qmp_add_client(const char *protocol, const char *fdname, } if (strcmp(protocol, spice) == 0) { -if (!using_spice) { -error_set(errp, QERR_DEVICE_NOT_ACTIVE, spice); +if (!qemu_using_spice(errp)) { close(fd); return; } -- 1.9.3
[Qemu-devel] [PATCH 4/9] qmp: Clean up qmp_query_spice() #ifndef !CONFIG_SPICE dummy
QMP command query-spice exists only #ifdef CONFIG_SPICE. Due to QAPI limitations, we need a dummy function anyway, but it's unreachable. Our current dummy function goes out of its way to produce the exact same error as the QMP core does for unknown commands. Cute, but both unclean and unnecessary. Replace by straight abort(). Signed-off-by: Markus Armbruster arm...@redhat.com --- qmp.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/qmp.c b/qmp.c index f01ae7d..54624ef 100644 --- a/qmp.c +++ b/qmp.c @@ -137,14 +137,18 @@ VncInfo *qmp_query_vnc(Error **errp) #endif #ifndef CONFIG_SPICE -/* If SPICE support is enabled, the true query-spice command is - defined in the SPICE subsystem. Also note that we use a small - trick to maintain query-spice's original behavior, which is not - to be available in the namespace if SPICE is not compiled in */ +/* + * qmp-commands.hx ensures that QMP command query-spice exists only + * #ifdef CONFIG_SPICE. Necessary for an accurate query-commands + * result. However, the QAPI schema is blissfully unaware of that, + * and the QAPI code generator happily generates a dead + * qmp_marshal_input_query_spice() that calls qmp_query_spice(). + * Provide it one, or else linking fails. + * FIXME Educate the QAPI schema on CONFIG_SPICE. + */ SpiceInfo *qmp_query_spice(Error **errp) { -error_set(errp, QERR_COMMAND_NOT_FOUND, query-spice); -return NULL; +abort(); }; #endif -- 1.9.3
[Qemu-devel] [PATCH 0/9] qmp hmp balloon: Cleanups around error reporting
I'm including balloon patches in the hope that they too can go through Luiz's tree. Markus Armbruster (9): qmp hmp: Factor out common using spice test qmp hmp: Improve error messages when SPICE is not in use hmp: Compile hmp_info_spice() only with CONFIG_SPICE qmp: Clean up qmp_query_spice() #ifndef !CONFIG_SPICE dummy qmp: Simplify recognition of capability negotiation command qmp: Eliminate silly QERR_COMMAND_NOT_FOUND macro balloon: Inline qemu_balloon(), qemu_balloon_status() balloon: Factor out common is balloon active test balloon: Eliminate silly QERR_ macros balloon.c | 59 ++- hmp.c | 2 ++ include/qapi/qmp/qerror.h | 9 include/ui/qemu-spice.h | 10 monitor.c | 19 +++ qapi/qmp-dispatch.c | 3 ++- qmp.c | 27 +++--- 7 files changed, 58 insertions(+), 71 deletions(-) -- 1.9.3
[Qemu-devel] [PATCH 5/9] qmp: Simplify recognition of capability negotiation command
Signed-off-by: Markus Armbruster arm...@redhat.com --- monitor.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/monitor.c b/monitor.c index cd81289..f8497b6 100644 --- a/monitor.c +++ b/monitor.c @@ -4783,9 +4783,9 @@ static int monitor_can_read(void *opaque) return (mon-suspend_cnt == 0) ? 1 : 0; } -static int invalid_qmp_mode(const Monitor *mon, const char *cmd_name) +static int invalid_qmp_mode(const Monitor *mon, const mon_cmd_t *cmd) { -int is_cap = compare_cmd(cmd_name, qmp_capabilities); +int is_cap = cmd-mhandler.cmd_new == do_qmp_capabilities; return (qmp_cmd_mode(mon) ? is_cap : !is_cap); } @@ -5079,13 +5079,8 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens) cmd_name = qdict_get_str(input, execute); trace_handle_qmp_command(mon, cmd_name); -if (invalid_qmp_mode(mon, cmd_name)) { -qerror_report(QERR_COMMAND_NOT_FOUND, cmd_name); -goto err_out; -} - cmd = qmp_find_cmd(cmd_name); -if (!cmd) { +if (!cmd || invalid_qmp_mode(mon, cmd)) { qerror_report(QERR_COMMAND_NOT_FOUND, cmd_name); goto err_out; } -- 1.9.3
[Qemu-devel] [PATCH 3/8] rcu: add rcutorture
Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- tests/Makefile | 6 +- tests/rcutorture.c | 447 + util/rcu.c | 1 - 3 files changed, 452 insertions(+), 2 deletions(-) create mode 100644 tests/rcutorture.c diff --git a/tests/Makefile b/tests/Makefile index 47ba143..980c0ec 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -63,6 +63,8 @@ gcov-files-test-tls-y = check-unit-y += tests/test-int128$(EXESUF) # all code tested by test-int128 is inside int128.h gcov-files-test-int128-y = +check-unit-y += tests/rcutorture$(EXESUF) +gcov-files-rcutorture-y = util/rcu.c check-unit-y += tests/test-bitops$(EXESUF) check-unit-$(CONFIG_HAS_GLIB_SUBPROCESS_TESTS) += tests/test-qdev-global-props$(EXESUF) check-unit-y += tests/check-qom-interface$(EXESUF) @@ -226,7 +228,8 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \ tests/test-qmp-input-visitor.o tests/test-qmp-input-strict.o \ tests/test-qmp-commands.o tests/test-visitor-serialization.o \ tests/test-x86-cpuid.o tests/test-mul64.o tests/test-int128.o \ - tests/test-opts-visitor.o tests/test-qmp-event.o tests/test-tls.o + tests/test-opts-visitor.o tests/test-qmp-event.o tests/test-tls.o \ + tests/rcutorture.o test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \ tests/test-qapi-event.o @@ -256,6 +259,7 @@ tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o page_cache.o tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o tests/test-int128$(EXESUF): tests/test-int128.o tests/test-tls$(EXESUF): tests/test-tls.o libqemuutil.a +tests/rcutorture$(EXESUF): tests/rcutorture.o libqemuutil.a tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \ hw/core/qdev.o hw/core/qdev-properties.o hw/core/hotplug.o\ diff --git a/tests/rcutorture.c b/tests/rcutorture.c new file mode 100644 index 000..cda7458 --- /dev/null +++ b/tests/rcutorture.c @@ -0,0 +1,447 @@ +/* + * rcutorture.c: simple user-level performance/stress test of RCU. + * + * Usage: + * ./rcu nreaders rperf [ seconds ] + * Run a read-side performance test with the specified + * number of readers for seconds seconds. + * ./rcu nupdaters uperf [ seconds ] + * Run an update-side performance test with the specified + * number of updaters and specified duration. + * ./rcu nreaders perf [ seconds ] + * Run a combined read/update performance test with the specified + * number of readers and one updater and specified duration. + * + * The above tests produce output as follows: + * + * n_reads: 46008000 n_updates: 146026 nreaders: 2 nupdaters: 1 duration: 1 + * ns/read: 43.4707 ns/update: 6848.1 + * + * The first line lists the total number of RCU reads and updates executed + * during the test, the number of reader threads, the number of updater + * threads, and the duration of the test in seconds. The second line + * lists the average duration of each type of operation in nanoseconds, + * or nan if the corresponding type of operation was not performed. + * + * ./rcu nreaders stress [ seconds ] + * Run a stress test with the specified number of readers and + * one updater. + * + * This test produces output as follows: + * + * n_reads: 114633217 n_updates: 3903415 n_mberror: 0 + * rcu_stress_count: 114618391 14826 0 0 0 0 0 0 0 0 0 + * + * The first line lists the number of RCU read and update operations + * executed, followed by the number of memory-ordering violations + * (which will be zero in a correct RCU implementation). The second + * line lists the number of readers observing progressively more stale + * data. A correct RCU implementation will have all but the first two + * numbers non-zero. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (c) 2008 Paul E. McKenney, IBM Corporation. + */ + +/* + * Test variables. + */ + +#include glib.h +#include stdlib.h +#include stdio.h +#include string.h +#include qemu/atomic.h +#include qemu/rcu.h +#include qemu/compiler.h +#include qemu/thread.h + +long long n_reads = 0LL; +long n_updates = 0L; +int nthreadsrunning; + +char argsbuf[64]; + +#define GOFLAG_INIT 0
[Qemu-devel] [PATCH 2/8] rcu: add rcu library
This includes a (mangled) copy of the liburcu code. The main changes are: 1) removing dependencies on many other header files in liburcu; 2) removing for simplicity the tentative busy waiting in synchronize_rcu, which has limited performance effects; 3) replacing futexes in synchronize_rcu with QemuEvents for Win32 portability. The API is the same as liburcu, so it should be possible in the future to require liburcu on POSIX systems for example and use our copy only on Windows. Among the various versions available I chose urcu-mb, which is the least invasive implementation even though it does not have the fastest rcu_read_{lock,unlock} implementation. The urcu flavor can be changed later, after benchmarking. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- docs/rcu.txt | 286 ++ hw/9pfs/virtio-9p-synth.c | 1 + include/qemu/atomic.h | 61 ++ include/qemu/queue.h | 13 +++ include/qemu/rcu.h| 120 +++ include/qemu/thread.h | 3 - util/Makefile.objs| 1 + util/rcu.c| 173 8 files changed, 655 insertions(+), 3 deletions(-) create mode 100644 docs/rcu.txt create mode 100644 include/qemu/rcu.h create mode 100644 util/rcu.c diff --git a/docs/rcu.txt b/docs/rcu.txt new file mode 100644 index 000..b839642 --- /dev/null +++ b/docs/rcu.txt @@ -0,0 +1,286 @@ +Using RCU (Read-Copy-Update) for synchronization + + +Read-copy update (RCU) is a synchronization mechanism that is used to +protect read-mostly data structures. RCU is very efficient and scalable +on the read side (it is wait-free), and thus can make the read paths +extremely fast. + +RCU supports concurrency between a single writer and multiple readers, +thus it is not used alone. Typically, the write-side will use a lock to +serialize multiple updates, but other approaches are possible (e.g., +restricting updates to a single task). In QEMU, when a lock is used, +this will often be the iothread mutex, also known as the big QEMU +lock (BQL). Also, restricting updates to a single task is done in +QEMU using the bottom half API. + +RCU is fundamentally a wait-to-finish mechanism. The read side marks +sections of code with critical sections, and the update side will wait +for the execution of all *currently running* critical sections before +proceeding, or before asynchronously executing a callback. + +The key point here is that only the currently running critical sections +are waited for; critical sections that are started _after_ the beginning +of the wait do not extend the wait, despite running concurrently with +the updater. This is the reason why RCU is more scalable than, +for example, reader-writer locks. It is so much more scalable that +the system will have a single instance of the RCU mechanism; a single +mechanism can be used for an arbitrary number of things, without +having to worry about things such as contention or deadlocks. + +How is this possible? The basic idea is to split updates in two phases, +removal and reclamation. During removal, we ensure that subsequent +readers will not be able to get a reference to the old data. After +removal has completed, a critical section will not be able to access +the old data. Therefore, critical sections that begin after removal +do not matter; as soon as all previous critical sections have finished, +there cannot be any readers who hold references to the data structure, +which may not be safely reclaimed (e.g., freed or unref'ed). + +Here is a picutre: + +thread 1 thread 2 thread 3 +------ +enter RCU crit.sec. + |finish removal phase + |begin wait + | |enter RCU crit.sec. +exit RCU crit.sec | | +complete wait | +begin reclamation phase | + exit RCU crit.sec. + + +Note how thread 3 is still executing its critical section when thread 2 +starts reclaiming data. This is possible, because the old version of the +data structure was not accessible at the time thread 3 began executing +that critical section. + + +RCU API +=== + +The core RCU API is small: + + void rcu_read_lock(void); + +Used by a reader to inform the reclaimer that the reader is +entering an RCU read-side critical section. + + void rcu_read_unlock(void); + +Used by a reader to inform the reclaimer that the reader is +exiting an RCU read-side critical section. Note that RCU +read-side critical sections may be nested and/or overlapping. + + void
[Qemu-devel] [PULL 0/2] Xen tree 2015-01-13
The following changes since commit 7d5ad15d17f26dd4f9ff5f3491828bc34e74f28c: Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into staging (2015-01-12 11:13:24 +) are available in the git repository at: git://xenbits.xen.org/people/sstabellini/qemu-dm.git xen-2015-01-13 for you to fetch changes up to c1d322e6048796296555dd36fdd102d7fa2f50bf: xen-hvm: increase maxmem before calling xc_domain_populate_physmap (2015-01-13 18:05:52 +) Liang Li (1): xen-pt: Fix PCI devices re-attach failed Stefano Stabellini (1): xen-hvm: increase maxmem before calling xc_domain_populate_physmap hw/xen/xen_pt.c |2 +- xen-hvm.c | 24 2 files changed, 25 insertions(+), 1 deletion(-)
[Qemu-devel] [PATCH 0/8] RCUification of the memory API, part 1
These are the minimal changes to adopt RCU and use it in memory_region_find (and hence in virtio-blk-dataplane). Looks big, but two thirds of it is documentation and tests. Please review! :) Paolo Jan Kiszka (1): memory: remove assertion on memory_region_destroy Paolo Bonzini (7): tls: require compiler support for __thread rcu: add rcu library rcu: add rcutorture rcu: allow nesting of rcu_read_lock/rcu_read_unlock rcu: add call_rcu memory: protect current_map by RCU memory: avoid ref/unref in memory_region_find configure | 9 +- docs/rcu.txt | 388 +++ exec.c| 2 +- hw/9pfs/virtio-9p-synth.c | 1 + include/exec/memory.h | 5 + include/qemu/atomic.h | 61 +++ include/qemu/queue.h | 13 ++ include/qemu/rcu.h| 155 include/qemu/thread.h | 3 - include/qemu/tls.h| 52 -- include/qom/cpu.h | 4 +- memory.c | 60 +++ tests/Makefile| 11 +- tests/rcutorture.c| 449 ++ tests/test-tls.c | 83 + util/Makefile.objs| 1 + util/rcu.c| 290 ++ 17 files changed, 1485 insertions(+), 102 deletions(-) create mode 100644 docs/rcu.txt create mode 100644 include/qemu/rcu.h delete mode 100644 include/qemu/tls.h create mode 100644 tests/rcutorture.c create mode 100644 tests/test-tls.c create mode 100644 util/rcu.c -- 1.8.3.1
[Qemu-devel] [PATCH 2/2] cpu_ldst.h: Collapse laddr() and saddr() into ldst_get_host_addr()
The macros laddr() and saddr() are always defined to be identical to each other. Replace them with a single macro, and give it a longer name so it's easier to grep the codebase and confirm that it's only used in this one place: ldst_get_host_addr(). Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- include/exec/cpu_ldst.h | 37 +++-- 1 file changed, 19 insertions(+), 18 deletions(-) diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h index 4700831..0aae9e4 100644 --- a/include/exec/cpu_ldst.h +++ b/include/exec/cpu_ldst.h @@ -53,30 +53,31 @@ h2g_nocheck(x); \ }) -#define saddr(x) g2h(x) -#define laddr(x) g2h(x) +#define ldst_get_host_addr(x) g2h(x) #else /* !CONFIG_USER_ONLY */ /* NOTE: we use double casts if pointers and target_ulong have different sizes */ -#define saddr(x) (uint8_t *)(intptr_t)(x) -#define laddr(x) (uint8_t *)(intptr_t)(x) +#define ldst_get_host_addr(x) (uint8_t *)(intptr_t)(x) #endif -#define ldub_raw(p) ldub_p(laddr((p))) -#define ldsb_raw(p) ldsb_p(laddr((p))) -#define lduw_raw(p) lduw_p(laddr((p))) -#define ldsw_raw(p) ldsw_p(laddr((p))) -#define ldl_raw(p) ldl_p(laddr((p))) -#define ldq_raw(p) ldq_p(laddr((p))) -#define ldfl_raw(p) ldfl_p(laddr((p))) -#define ldfq_raw(p) ldfq_p(laddr((p))) -#define stb_raw(p, v) stb_p(saddr((p)), v) -#define stw_raw(p, v) stw_p(saddr((p)), v) -#define stl_raw(p, v) stl_p(saddr((p)), v) -#define stq_raw(p, v) stq_p(saddr((p)), v) -#define stfl_raw(p, v) stfl_p(saddr((p)), v) -#define stfq_raw(p, v) stfq_p(saddr((p)), v) +/* Note that ldst_get_host_addr() should not be used anywhere outside + * these macros + */ +#define ldub_raw(p) ldub_p(ldst_get_host_addr(p)) +#define ldsb_raw(p) ldsb_p(ldst_get_host_addr(p)) +#define lduw_raw(p) lduw_p(ldst_get_host_addr(p)) +#define ldsw_raw(p) ldsw_p(ldst_get_host_addr(p)) +#define ldl_raw(p) ldl_p(ldst_get_host_addr(p)) +#define ldq_raw(p) ldq_p(ldst_get_host_addr(p)) +#define ldfl_raw(p) ldfl_p(ldst_get_host_addr(p)) +#define ldfq_raw(p) ldfq_p(ldst_get_host_addr(p)) +#define stb_raw(p, v) stb_p(ldst_get_host_addr(p), v) +#define stw_raw(p, v) stw_p(ldst_get_host_addr(p), v) +#define stl_raw(p, v) stl_p(ldst_get_host_addr(p), v) +#define stq_raw(p, v) stq_p(ldst_get_host_addr(p), v) +#define stfl_raw(p, v) stfl_p(ldst_get_host_addr(p), v) +#define stfq_raw(p, v) stfq_p(ldst_get_host_addr(p), v) #if defined(CONFIG_USER_ONLY) -- 1.9.1
[Qemu-devel] [PATCH 1/2] cpu_ldst.h: Remove unused ldul_ macros
The five ldul_ macros are not used anywhere and are marked up with an XXX comment. ldul is a non-standard prefix for our family of load instructions: we don't mark 32-bit accesses for signedness because they return a 32 bit quantity. So just delete them. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- include/exec/cpu_ldst.h | 9 - 1 file changed, 9 deletions(-) diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h index e5550e7..4700831 100644 --- a/include/exec/cpu_ldst.h +++ b/include/exec/cpu_ldst.h @@ -151,15 +151,6 @@ #else -/* XXX: find something cleaner. - * Furthermore, this is false for 64 bits targets - */ -#define ldul_user ldl_user -#define ldul_kernel ldl_kernel -#define ldul_hypv ldl_hypv -#define ldul_executive ldl_executive -#define ldul_supervisor ldl_supervisor - /* The memory helpers for tcg-generated code need tcg_target_long etc. */ #include tcg.h -- 1.9.1
Re: [Qemu-devel] [PATCH 2/4] target-i386: do not memcpy in and out of xmm_regs
On Wed, Jan 07, 2015 at 06:39:13PM +0100, Paolo Bonzini wrote: After the next patch, we will move the high parts of AVX and AVX512 registers in the same array as the SSE registers. This will make it impossible to memcpy an array of 128-bit values in and out of xmm_regs in one swoop. Use a for loop instead. Signed-off-by: Paolo Bonzini pbonz...@redhat.com Reviewed-by: Eduardo Habkost ehabk...@redhat.com -- Eduardo
Re: [Qemu-devel] [PATCH 4/9] hbitmap: store / restore
On 01/13/2015 07:59 AM, Vladimir Sementsov-Ogievskiy wrote: Best regards, Vladimir On 09.01.2015 00:21, John Snow wrote: On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote: Functions to store / restore HBitmap. HBitmap should be saved to linear bitmap format independently of endianess. Because of restoring in several steps, every step writes only the last level of the bitmap. All other levels are restored by hbitmap_restore_finish as a last step of restoring. So, HBitmap is inconsistent while restoring. . I guess by nature of how bitmap migration has been implemented, we cannot help but restore parts of the bitmap piece by piece, which requires us to force the bitmap into an inconsistent state. Yuck. It might be nice to turn on a disable bit inside the hbitmap that disallows its use until it is made consistent again, but I don't know what the performance hit of the extra conditionals everywhere would be like. Another approach is to restore the bitmap after each piece set. It is a bit slower of course but the bitmap will be consistent after hbitmap_restore_data() Yeah, that's not great either. The approach you have already taken is probably the best. +/** + * hbitmap_restore_finish + * @hb: HBitmap to operate on. + * + * Repair HBitmap after calling hbitmap_restore_data. Actuall all HBitmap + * layers are restore here. + */ +void hbitmap_restore_finish(HBitmap *hb); + +/** * hbitmap_free: * @hb: HBitmap to operate on. * These are just biased opinions: - It might be nice to name the store/restore functions serialize and deserialize, to describe exactly what they are doing. - I might refer to restore_finish as post_load or post_restore or something similar to mimic how device migration functions are named. I think hbitmap_restore_data_finalize would also be fine; the key part here is clearly naming it relative to restore_data. Hmm. Ok, what about the following set: hbitmap_serialize() hbitmap_deserialize_part() hbitmap_deserialize_finish() Looks good to me! diff --git a/util/hbitmap.c b/util/hbitmap.c index 8aa7406..ac0323f 100644 --- a/util/hbitmap.c +++ b/util/hbitmap.c @@ -362,6 +362,90 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item) return (hb-levels[HBITMAP_LEVELS - 1][pos BITS_PER_LEVEL] bit) != 0; } +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count) +{ +uint64_t size; + +if (count == 0) { +return 0; +} + +size = (((count - 1) hb-granularity) BITS_PER_LEVEL) + 1; + +return size * sizeof(unsigned long); +} + This seems flawed to me: number of bits without an offset can't be mapped to a number of real bytes, because two bits may or may not cross a granularity boundary, depending on *WHERE* you start counting. e.g. granularity = 1 (i.e. 2^1 = 2 virtual bits per 1 real bit) virtual: 001100 real: 0 1 0 The amount of space required to hold two bits here could be as little as one bit, if the offset is k={0,2,4} but it could be as much as two bits if the offset is k={1,3}. You may never use the function in this way, but mapping virtual bits to an implementation byte-size seems like it is inviting an inconsistency. I dislike this function too.. But unfortunately we need the size in bytes used for serialization. Hmm. Ok, without loss of generality, let offset be less than granularity. The precise formula should look like: size = (((offset+count-1) hb-granularity) BITS_PER_LEVEL); So, size = 1 hb-granularity) + count - 2) hb-granularity) BITS_PER_LEVEL) + 1; would be enough in any case. Ok? I think so, as long as when you deserialize the object it does so correctly, regardless of whether or not you start on an even multiple of the granularity. +void hbitmap_store_data(const HBitmap *hb, uint8_t *buf, +uint64_t start, uint64_t count) +{ +uint64_t last = start + count - 1; +unsigned long *out = (unsigned long *)buf; + +if (count == 0) { +return; +} + +start = (start hb-granularity) BITS_PER_LEVEL; +last = (last hb-granularity) BITS_PER_LEVEL; +count = last - start + 1; + +#ifdef __BIG_ENDIAN_BITFIELD +for (i = start; i = last; ++i) { +unsigned long el = hb-levels[HBITMAP_LEVELS - 1][i]; +out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) : cpu_to_le64(el)); +} +#else +memcpy(out, hb-levels[HBITMAP_LEVELS - 1][start], + count * sizeof(unsigned long)); +#endif +} + I suppose the ifdefs are trying to take an optimization by using memcpy if at all possible, without a conversion. Why are the conversions to little endian, though? Shouldn't we be serializing to a Big Endian format? As Paolo already explained it is for portability. LE bitmap representation is independent of size of bitmap array elements, which may be 32bit/64bit, or whatever else with LE format. Yes, that makes sense. :) +void hbitmap_restore_data(HBitmap *hb, uint8_t *buf, +
[Qemu-devel] [PATCH 5/8] block: add bdrv_load_dirty_bitmap
The funcion loads dirty bitmap from file, using underlying driver function. Note: the function doesn't change BdrvDirtyBitmap.file field. This field is only used by bdrv_store_dirty_bitmap() function and is ONLY written by bdrv_dirty_bitmap_set_file() function. Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- block.c | 37 + include/block/block.h | 5 + 2 files changed, 42 insertions(+) diff --git a/block.c b/block.c index 7237b95..77419e9 100644 --- a/block.c +++ b/block.c @@ -5384,6 +5384,43 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs, return originator; } +BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs, +BlockDriverState *file, +int granularity, +const char *name, +Error **errp) +{ +BlockDriver *drv = file-drv; +if (!drv) { +return NULL; +} +if (drv-bdrv_dirty_bitmap_load) { +BdrvDirtyBitmap *bitmap; +uint64_t bitmap_size = bdrv_nb_sectors(bs); +uint8_t *buf = drv-bdrv_dirty_bitmap_load(file, name, bitmap_size, + granularity); +if (buf == NULL) { +return NULL; +} + +bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp); +if (bitmap == NULL) { +g_free(buf); +return NULL; +} + +hbitmap_restore_data(bitmap-bitmap, buf, 0, bitmap_size); +hbitmap_restore_finish(bitmap-bitmap); + +return bitmap; +} +if (file-file) { +return bdrv_load_dirty_bitmap(bs, file-file, granularity, name, + errp); +} +return NULL; +} + int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap) { BlockDriverState *bs = bitmap-file; diff --git a/include/block/block.h b/include/block/block.h index 0dfefe3..f36557f 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -460,6 +460,11 @@ void bdrv_dirty_iter_init(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, struct HBitmapIter *hbi); void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t offset); int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap); +BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs, +BlockDriverState *file, +int granularity, +const char *name, +Error **errp); int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap); void bdrv_enable_copy_on_read(BlockDriverState *bs); -- 1.9.1
[Qemu-devel] [PATCH] block: update string sizes for filename, backing_file, exact_filename
The string field entries 'filename', 'backing_file', and 'exact_filename' in the BlockDriverState struct are defined as 1024 bytes. However, most places that use these values accept a maximum of PATH_MAX bytes. This patch makes the BlockDriverStruct field string sizes match the most common usage. This patch also updates two block drivers that still use 1024-byte sized arrays for 'backing_file'. Signed-off-by: Jeff Cody jc...@redhat.com --- block/mirror.c| 2 +- block/qapi.c | 2 +- include/block/block_int.h | 8 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index 9019d1b..57154eb 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -378,7 +378,7 @@ static void coroutine_fn mirror_run(void *opaque) int64_t sector_num, end, sectors_per_chunk, length; uint64_t last_pause_ns; BlockDriverInfo bdi; -char backing_filename[1024]; +char backing_filename[PATH_MAX]; int ret = 0; int n; diff --git a/block/qapi.c b/block/qapi.c index a6fd6f7..c097238 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -175,7 +175,7 @@ void bdrv_query_image_info(BlockDriverState *bs, { int64_t size; const char *backing_filename; -char backing_filename2[1024]; +char backing_filename2[PATH_MAX]; BlockDriverInfo bdi; int ret; Error *err = NULL; diff --git a/include/block/block_int.h b/include/block/block_int.h index 06a21dd..e264be9 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -339,13 +339,13 @@ struct BlockDriverState { * regarding this BDS's context */ QLIST_HEAD(, BdrvAioNotifier) aio_notifiers; -char filename[1024]; -char backing_file[1024]; /* if non zero, the image is a diff of -this file image */ +char filename[PATH_MAX]; +char backing_file[PATH_MAX]; /* if non zero, the image is a diff of +this file image */ char backing_format[16]; /* if non-zero and backing_file exists */ QDict *full_open_options; -char exact_filename[1024]; +char exact_filename[PATH_MAX]; BlockDriverState *backing_hd; BlockDriverState *file; -- 1.9.3
[Qemu-devel] [PATCH 6/9] qmp: Eliminate silly QERR_COMMAND_NOT_FOUND macro
The QERR_ macros are leftovers from the days of rich error objects. They're used with error_set() and qerror_report(), and expand into the first *two* arguments. This trickiness has become pointless. Clean this one up. Signed-off-by: Markus Armbruster arm...@redhat.com --- include/qapi/qmp/qerror.h | 3 --- monitor.c | 3 ++- qapi/qmp-dispatch.c | 3 ++- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h index 0ca6cbd..28f980e 100644 --- a/include/qapi/qmp/qerror.h +++ b/include/qapi/qmp/qerror.h @@ -52,9 +52,6 @@ void qerror_report_err(Error *err); #define QERR_BUS_NOT_FOUND \ ERROR_CLASS_GENERIC_ERROR, Bus '%s' not found -#define QERR_COMMAND_NOT_FOUND \ -ERROR_CLASS_COMMAND_NOT_FOUND, The command %s has not been found - #define QERR_DEVICE_ENCRYPTED \ ERROR_CLASS_DEVICE_ENCRYPTED, '%s' (%s) is encrypted diff --git a/monitor.c b/monitor.c index f8497b6..0293769 100644 --- a/monitor.c +++ b/monitor.c @@ -5081,7 +5081,8 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens) trace_handle_qmp_command(mon, cmd_name); cmd = qmp_find_cmd(cmd_name); if (!cmd || invalid_qmp_mode(mon, cmd)) { -qerror_report(QERR_COMMAND_NOT_FOUND, cmd_name); +qerror_report(ERROR_CLASS_COMMAND_NOT_FOUND, + The command %s has not been found, cmd_name); goto err_out; } diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c index 168b083..2227420 100644 --- a/qapi/qmp-dispatch.c +++ b/qapi/qmp-dispatch.c @@ -76,7 +76,8 @@ static QObject *do_qmp_dispatch(QObject *request, Error **errp) command = qdict_get_str(dict, execute); cmd = qmp_find_command(command); if (cmd == NULL) { -error_set(errp, QERR_COMMAND_NOT_FOUND, command); +error_set(errp, ERROR_CLASS_COMMAND_NOT_FOUND, + The command %s has not been found, command); return NULL; } if (!cmd-enabled) { -- 1.9.3
[Qemu-devel] [PATCH 7/9] balloon: Inline qemu_balloon(), qemu_balloon_status()
... and simplify a bit. Permits factoring out common error checks in the next commit. Signed-off-by: Markus Armbruster arm...@redhat.com --- balloon.c | 42 +- 1 file changed, 13 insertions(+), 29 deletions(-) diff --git a/balloon.c b/balloon.c index b70da4f..2884c2d 100644 --- a/balloon.c +++ b/balloon.c @@ -62,25 +62,6 @@ void qemu_remove_balloon_handler(void *opaque) balloon_opaque = NULL; } -static int qemu_balloon(ram_addr_t target) -{ -if (!balloon_event_fn) { -return 0; -} -trace_balloon_event(balloon_opaque, target); -balloon_event_fn(balloon_opaque, target); -return 1; -} - -static int qemu_balloon_status(BalloonInfo *info) -{ -if (!balloon_stat_fn) { -return 0; -} -balloon_stat_fn(balloon_opaque, info); -return 1; -} - BalloonInfo *qmp_query_balloon(Error **errp) { BalloonInfo *info; @@ -90,30 +71,33 @@ BalloonInfo *qmp_query_balloon(Error **errp) return NULL; } +if (!balloon_stat_fn) { +error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon); +return NULL; +} + info = g_malloc0(sizeof(*info)); - -if (qemu_balloon_status(info) == 0) { -error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon); -qapi_free_BalloonInfo(info); -return NULL; -} - +balloon_stat_fn(balloon_opaque, info); return info; } -void qmp_balloon(int64_t value, Error **errp) +void qmp_balloon(int64_t target, Error **errp) { if (kvm_enabled() !kvm_has_sync_mmu()) { error_set(errp, QERR_KVM_MISSING_CAP, synchronous MMU, balloon); return; } -if (value = 0) { +if (target = 0) { error_set(errp, QERR_INVALID_PARAMETER_VALUE, target, a size); return; } -if (qemu_balloon(value) == 0) { +if (!balloon_event_fn) { error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon); +return; } + +trace_balloon_event(balloon_opaque, target); +balloon_event_fn(balloon_opaque, target); } -- 1.9.3
Re: [Qemu-devel] [PATCH v11 09/13] qmp: Add support of dirty-bitmap sync mode for drive-backup
On 01/13/2015 04:37 AM, Fam Zheng wrote: On Mon, 01/12 11:31, John Snow wrote: For dirty-bitmap sync mode, the block job will iterate through the given dirty bitmap to decide if a sector needs backup (backup all the dirty clusters and skip clean ones), just as allocation conditions of top sync mode. Signed-off-by: Fam Zheng f...@redhat.com Signed-off-by: John Snow js...@redhat.com --- block.c | 5 ++ block/backup.c| 120 ++ block/mirror.c| 4 ++ blockdev.c| 14 +- hmp.c | 3 +- include/block/block.h | 1 + include/block/block_int.h | 2 + qapi/block-core.json | 11 +++-- qmp-commands.hx | 7 +-- 9 files changed, 137 insertions(+), 30 deletions(-) diff --git a/block.c b/block.c index 3f33b9d..5eb6788 100644 --- a/block.c +++ b/block.c @@ -5633,6 +5633,11 @@ static void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, } } +void bdrv_set_dirty_iter(HBitmapIter *hbi, int64_t offset) +{ +hbitmap_iter_init(hbi, hbi-hb, offset); What's the reason for this indirection? Can caller just use hbitmap_iter_init? Essentially it is just a rename of hbitmap_iter_init to make its usage here clear, that we are manually advancing the pointer. How we accomplish that (hbitmap_iter_init) is an implementation detail. Yes, I can just call this directly from block/backup, but at the time I was less sure of how I would advance the pointer, so I created a wrapper where I could change details as needed, if I needed to. +} + int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap) { return hbitmap_count(bitmap-bitmap); diff --git a/block/backup.c b/block/backup.c index 792e655..0626a3e 100644 --- a/block/backup.c +++ b/block/backup.c @@ -37,6 +37,8 @@ typedef struct CowRequest { typedef struct BackupBlockJob { BlockJob common; BlockDriverState *target; +/* bitmap for sync=dirty-bitmap */ +BdrvDirtyBitmap *sync_bitmap; MirrorSyncMode sync_mode; RateLimit limit; BlockdevOnError on_source_error; @@ -242,6 +244,31 @@ static void backup_complete(BlockJob *job, void *opaque) g_free(data); } +static bool coroutine_fn yield_and_check(BackupBlockJob *job) +{ +if (block_job_is_cancelled(job-common)) { +return true; +} + +/* we need to yield so that qemu_aio_flush() returns. + * (without, VM does not reboot) + */ +if (job-common.speed) { +uint64_t delay_ns = ratelimit_calculate_delay(job-limit, + job-sectors_read); +job-sectors_read = 0; +block_job_sleep_ns(job-common, QEMU_CLOCK_REALTIME, delay_ns); +} else { +block_job_sleep_ns(job-common, QEMU_CLOCK_REALTIME, 0); +} + +if (block_job_is_cancelled(job-common)) { +return true; +} + +return false; +} + static void coroutine_fn backup_run(void *opaque) { BackupBlockJob *job = opaque; @@ -254,13 +281,13 @@ static void coroutine_fn backup_run(void *opaque) }; int64_t start, end; int ret = 0; +bool error_is_read; QLIST_INIT(job-inflight_reqs); qemu_co_rwlock_init(job-flush_rwlock); start = 0; -end = DIV_ROUND_UP(job-common.len / BDRV_SECTOR_SIZE, - BACKUP_SECTORS_PER_CLUSTER); +end = DIV_ROUND_UP(job-common.len, BACKUP_CLUSTER_SIZE); job-bitmap = hbitmap_alloc(end, 0); @@ -278,28 +305,45 @@ static void coroutine_fn backup_run(void *opaque) qemu_coroutine_yield(); job-common.busy = true; } +} else if (job-sync_mode == MIRROR_SYNC_MODE_DIRTY_BITMAP) { +/* Dirty Bitmap sync has a slightly different iteration method */ +HBitmapIter hbi; +int64_t sector; +int64_t cluster; +bool polyrhythmic; + +bdrv_dirty_iter_init(bs, job-sync_bitmap, hbi); +/* Does the granularity happen to match our backup cluster size? */ +polyrhythmic = (bdrv_dirty_bitmap_granularity(bs, job-sync_bitmap) != +BACKUP_CLUSTER_SIZE); + +/* Find the next dirty /sector/ and copy that /cluster/ */ +while ((sector = hbitmap_iter_next(hbi)) != -1) { +if (yield_and_check(job)) { +goto leave; +} +cluster = sector / BACKUP_SECTORS_PER_CLUSTER; + +do { +ret = backup_do_cow(bs, cluster * BACKUP_SECTORS_PER_CLUSTER, +BACKUP_SECTORS_PER_CLUSTER, error_is_read); +if ((ret 0) +backup_error_action(job, error_is_read, -ret) == +BLOCK_ERROR_ACTION_REPORT) { +goto leave; +} +} while (ret 0); + +/* Advance (or rewind) our iterator if we need to. */ +
[Qemu-devel] [PATCH 5/8] rcu: add call_rcu
Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- docs/rcu.txt | 110 +++-- include/qemu/rcu.h | 22 ++ util/rcu.c | 118 + 3 files changed, 246 insertions(+), 4 deletions(-) diff --git a/docs/rcu.txt b/docs/rcu.txt index b839642..96ed241 100644 --- a/docs/rcu.txt +++ b/docs/rcu.txt @@ -82,7 +82,50 @@ The core RCU API is small: Note that it would be valid for another update to come while synchronize_rcu is running. Because of this, it is better that the updater releases any locks it may hold before calling -synchronize_rcu. +synchronize_rcu. If this is not possible (for example, because +the updater is protected by the BQL), you can use call_rcu. + + void call_rcu1(struct rcu_head * head, +void (*func)(struct rcu_head *head)); + +This function invokes func(head) after all pre-existing RCU +read-side critical sections on all threads have completed. This +marks the end of the removal phase, with func taking care +asynchronously of the reclamation phase. + +The foo struct needs to have an rcu_head structure added, +perhaps as follows: + +struct foo { +struct rcu_head rcu; +int a; +char b; +long c; +}; + +so that the reclaimer function can fetch the struct foo address +and free it: + +call_rcu1(foo.rcu, foo_reclaim); + +void foo_reclaim(struct rcu_head *rp) +{ +struct foo *fp = container_of(rp, struct foo, rcu); +g_free(fp); +} + +For the common case where the rcu_head member is the first of the +struct, you can use the following macro. + + void call_rcu(T *p, + void (*func)(T *p), + field-name); + +call_rcu1 is typically used through this macro, in the common case +where the struct rcu_head is the first field in the struct. In +the above case, one could have written simply: + +call_rcu(foo_reclaim, g_free, rcu); typeof(*p) atomic_rcu_read(p); @@ -154,6 +197,11 @@ DIFFERENCES WITH LINUX - atomic_rcu_read and atomic_rcu_set replace rcu_dereference and rcu_assign_pointer. They take a _pointer_ to the variable being accessed. +- call_rcu is a macro that has an extra argument (the name of the first + field in the struct, which must be a struct rcu_head), and expects the + type of the callback's argument to be the type of the first argument. + call_rcu1 is the same as Linux's call_rcu. + RCU PATTERNS @@ -207,7 +255,47 @@ The write side looks simply like this (with appropriate locking): synchronize_rcu(); free(old); -Note that the same idiom would be possible with reader/writer +If the processing cannot be done purely within the critical section, it +is possible to combine this idiom with a real reference count: + +rcu_read_lock(); +p = atomic_rcu_read(foo); +foo_ref(p); +rcu_read_unlock(); +/* do something with p. */ +foo_unref(p); + +The write side can be like this: + +qemu_mutex_lock(foo_mutex); +old = foo; +atomic_rcu_set(foo, new); +qemu_mutex_unlock(foo_mutex); +synchronize_rcu(); +foo_unref(old); + +or with call_rcu: + +qemu_mutex_lock(foo_mutex); +old = foo; +atomic_rcu_set(foo, new); +qemu_mutex_unlock(foo_mutex); +call_rcu(foo_unref, old, rcu); + +In both cases, the write side only performs removal. Reclamation +happens when the last reference to a foo object is dropped. +Using synchronize_rcu() is undesirably expensive, because the +last reference may be dropped on the read side. Hence you can +use call_rcu() instead: + + foo_unref(struct foo *p) { +if (atomic_fetch_dec(p-refcount) == 1) { +call_rcu(foo_destroy, p, rcu); +} +} + + +Note that the same idioms would be possible with reader/writer locks: read_lock(foo_rwlock); write_mutex_lock(foo_rwlock); @@ -217,13 +305,27 @@ locks: write_mutex_unlock(foo_rwlock); free(p); +-- + +read_lock(foo_rwlock); write_mutex_lock(foo_rwlock); +p = foo;old = foo; +foo_ref(p); foo = new; +read_unlock(foo_rwlock); foo_unref(old); +/* do something with p. */ write_mutex_unlock(foo_rwlock); +read_lock(foo_rwlock); +foo_unref(p); +read_unlock(foo_rwlock); + +foo_unref could use a mechanism such as bottom halves to move deallocation +out of the write-side critical section. + RCU resizable arrays Resizable arrays can be used
Re: [Qemu-devel] [PATCH v4] block/raw-posix.c: Fixes raw_getlength() on Mac OS X so that it reports the correct length of a real CD
On Jan 13, 2015, at 9:05 AM, Stefan Hajnoczi wrote: On Tue, Jan 6, 2015 at 3:04 PM, Programmingkid programmingk...@gmail.com wrote: On Jan 6, 2015, at 9:02 AM, Stefan Hajnoczi wrote: On Fri, Jan 02, 2015 at 04:44:38PM -0500, Programmingkid wrote: Removes redundant ret variable and renames sectorSize variable to meet QEMU coding standards. This is a changelog item for v4 of this patch. Changelogs should go below the '---' line so they are not merged into git history. The rationale is that when a patch is merged into git, the changelog describing patch revisions that were posted on the mailing list is not relevant (we only see the final patch in git, not the revisions from the mailing list). Patches usually look like this: Subject: block/raw-posix: brief summary A longer description of the problem, maybe a command-line to reproduce a bug, and some rationale for this code change. Signed-off-by: Me m...@email.com --- v2: * Fix int - size_t for memory lengths [Requested by Bob] The changelog at the bottom is useful to code reviewers but won't get merged in the git history. Anyway, thanks for this patch. I have dropped this changelog line and merged it! Signed-off-by: John Arbuckle programmingk...@gmail.com --- block/raw-posix.c | 18 +- configure |2 +- 2 files changed, 18 insertions(+), 2 deletions(-) Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan Thank you very much for accepting my patch. Hi, Unfortunately I had to drop this patch because it breaks -drive file=/dev/null,... /dev/null is a character device and we should not return -ENOTSUP when the CD-ROM ioctls fail. Please let it fail gracefully when the device is not a CD-ROM. Stefan What is the exact command you use with QEMU involving the /dev/null device? What value is suppose to be returned when using a device like /dev/null?
Re: [Qemu-devel] [PATCH v4] block/raw-posix.c: Fixes raw_getlength() on Mac OS X so that it reports the correct length of a real CD
On 13 January 2015 at 17:52, Programmingkid programmingk...@gmail.com wrote: What is the exact command you use with QEMU involving the /dev/null device? make check includes some tests which do this... What value is suppose to be returned when using a device like /dev/null? I think we should fall back to the try lseek code path, as we do on Linux. (That will return zero, as it happens.) -- PMM
Re: [Qemu-devel] Question regarding two variables in qemu migration code
On Tue, Jan 13, 2015 at 1:38 AM, Dr. David Alan Gilbert dgilb...@redhat.com wrote: * Jidong Xiao (jidong.x...@gmail.com) wrote: Hi, Hi, I am looking at the qemu source code, and trying to understand the migration part. In arch_init.c, there are two variables which seems quite confusing to me, They are: static uint64_t migration_dirty_pages; 'migration_dirty_pages' is the number of pages that are currently known that need to be sent to the destination; it goes down whenever we send a page, but goes up when we sync the dirty bitmap that tells us that something changed the data in the page (see migration_bitmap_sync_range ) static int64_t num_dirty_pages_period; // defined in function migration_bitmap_sync() This is looking how many pages we've noticed are now dirty within a particular time - to try and get an estimate of how fast memory is changing If you see migration_bitmap_sync has an: if (end_time start_time + 1000) { and inside there it uses num_dirty_pages_period to update dirty_pages_rate. Can anyone kindly explain that what does these two variables mean? Thanks. -Jidong Dave -- Thanks Dave, your explanation is really really helpful. But in function migration_bitmap_sync(), I see this: num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init; If as you said, num_dirty_pages_period refers to the pages get dirty within a particular time, then why it is +=“, instead of =? i.e., something like this: num_dirty_pages_period = migration_dirty_pages - num_dirty_pages_init; I just don't see why num_dirty_pages_period has to be accumulated with its previous value. -Jidong
[Qemu-devel] [PULL 1/2] xen-pt: Fix PCI devices re-attach failed
From: Liang Li liang.z...@intel.com Use the 'xl pci-attach $DomU $BDF' command to attach more than one PCI devices to the guest, then detach the devices with 'xl pci-detach $DomU $BDF', after that, re-attach these PCI devices again, an error message will be reported like following: libxl: error: libxl_qmp.c:287:qmp_handle_error_response: receive an error message from QMP server: Duplicate ID 'pci-pt-03_10.1' for device. If using the 'address_space_memory' as the parameter of 'memory_listener_register', 'xen_pt_region_del' will not be called if the memory region's name is not 'xen-pci-pt-*' when the devices is detached. This will cause the device's related QemuOpts object not be released properly. Using the device's address space can avoid such issue, because the calling count of 'xen_pt_region_add' when attaching and the calling count of 'xen_pt_region_del' when detaching is the same, so all the memory region ref and unref by the 'xen_pt_region_add' and 'xen_pt_region_del' can be released properly. Signed-off-by: Liang Li liang.z...@intel.com Reviewed-by: Paolo Bonzini pbonz...@redhat.com Reported-by: Longtao Pang longtaox.p...@intel.com --- hw/xen/xen_pt.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c index c1bf357..f2893b2 100644 --- a/hw/xen/xen_pt.c +++ b/hw/xen/xen_pt.c @@ -736,7 +736,7 @@ static int xen_pt_initfn(PCIDevice *d) } out: -memory_listener_register(s-memory_listener, address_space_memory); +memory_listener_register(s-memory_listener, s-dev.bus_master_as); memory_listener_register(s-io_listener, address_space_io); XEN_PT_LOG(d, Real physical device %02x:%02x.%d registered successfully!\n, -- 1.7.10.4
[Qemu-devel] [PULL 2/2] xen-hvm: increase maxmem before calling xc_domain_populate_physmap
Increase maxmem before calling xc_domain_populate_physmap_exact to avoid the risk of running out of guest memory. This way we can also avoid complex memory calculations in libxl at domain construction time. This patch fixes an abort() when assigning more than 4 NICs to a VM. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Signed-off-by: Don Slutz dsl...@verizon.com --- xen-hvm.c | 24 1 file changed, 24 insertions(+) diff --git a/xen-hvm.c b/xen-hvm.c index 7548794..e2e575b 100644 --- a/xen-hvm.c +++ b/xen-hvm.c @@ -90,6 +90,12 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu) #endif #define BUFFER_IO_MAX_DELAY 100 +/* Leave some slack so that hvmloader does not complain about lack of + * memory at boot time (Could not allocate order=0 extent). + * Once hvmloader is modified to cope with that situation without + * printing warning messages, QEMU_SPARE_PAGES can be removed. + */ +#define QEMU_SPARE_PAGES 16 typedef struct XenPhysmap { hwaddr start_addr; @@ -244,6 +250,8 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr) unsigned long nr_pfn; xen_pfn_t *pfn_list; int i; +xc_domaininfo_t info; +unsigned long free_pages; if (runstate_check(RUN_STATE_INMIGRATE)) { /* RAM already populated in Xen */ @@ -266,6 +274,22 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr) pfn_list[i] = (ram_addr TARGET_PAGE_BITS) + i; } +if ((xc_domain_getinfolist(xen_xc, xen_domid, 1, info) != 1) || +(info.domain != xen_domid)) { +hw_error(xc_domain_getinfolist failed); +} +free_pages = info.max_pages - info.tot_pages; +if (free_pages QEMU_SPARE_PAGES) { +free_pages -= QEMU_SPARE_PAGES; +} else { +free_pages = 0; +} +if ((free_pages nr_pfn) +(xc_domain_setmaxmem(xen_xc, xen_domid, + ((info.max_pages + nr_pfn - free_pages) + (XC_PAGE_SHIFT - 10))) 0)) { +hw_error(xc_domain_setmaxmem failed); +} if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0, 0, pfn_list)) { hw_error(xen: failed to populate ram at RAM_ADDR_FMT, ram_addr); } -- 1.7.10.4
[Qemu-devel] [PATCH 8/8] iotests: test internal persistent dirty bitmap
The test performs several vm reloads with checking and updating dirty bitmap. Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- tests/qemu-iotests/115 | 96 ++ tests/qemu-iotests/115.out | 64 +++ tests/qemu-iotests/group | 1 + 3 files changed, 161 insertions(+) create mode 100755 tests/qemu-iotests/115 create mode 100644 tests/qemu-iotests/115.out diff --git a/tests/qemu-iotests/115 b/tests/qemu-iotests/115 new file mode 100755 index 000..ef853b1 --- /dev/null +++ b/tests/qemu-iotests/115 @@ -0,0 +1,96 @@ +#!/bin/bash +# +# Persistent dirty bitmap test +# +# Performs several vm reloads with checking and updating dirty bitmap. +# +# Copyright (C) 2014 Red Hat, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +# creator +owner=vsement...@parallels.com + +seq=`basename $0` +echo QA output created by $seq + +here=`pwd` +status=1# failure is the default! + +_cleanup() +{ +_cleanup_qemu +_cleanup_test_img +} +trap _cleanup; exit \$status 0 1 2 3 15 + +# get standard environment, filters and checks +. ./common.rc +. ./common.filter +. ./common.qemu + +_supported_fmt qcow2 +_supported_os Linux + +size=1G +_make_test_img $size + +qemu_comm_method=monitor + +launch() { +_launch_qemu -drive file=${TEST_IMG},id=disk\ +-dirty-bitmap name=bitmap,drive=disk +echo +echo START VM +} + +print_bitmap() { +_send_qemu_cmd $QEMU_HANDLE 'print_dirty_bitmap disk bitmap'\ +dirty regions end | sed 's/^//' +} + +write() { +_send_qemu_cmd $QEMU_HANDLE 'qemu-io disk write ' $@ '' (qemu) +echo +echo write $@ +} + +quit() { +_send_qemu_cmd $QEMU_HANDLE 'qemu-io disk flush' (qemu) +_send_qemu_cmd $QEMU_HANDLE 'quit' +echo QUIT +wait ${QEMU_PID[$QEMU_HANDLE]} 2/dev/null +} + +{ +launch +print_bitmap +write 50m 1m +print_bitmap +quit + +launch +print_bitmap +write 0m 10m +write 700m 200m +print_bitmap +quit + +launch +print_bitmap +quit +} | sed '/(qemu)/d' + +status=0 diff --git a/tests/qemu-iotests/115.out b/tests/qemu-iotests/115.out new file mode 100644 index 000..d570c9f --- /dev/null +++ b/tests/qemu-iotests/115.out @@ -0,0 +1,64 @@ +QA output created by 115 +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 + +START VM +QEMU X.Y.Z monitor - type 'help' for more information +bitmap 'bitmap' +enabled: true +size: 2097152 +granularity: 65536 +dirty regions begin: +dirty regions end + +write 50m 1m +wrote 1048576/1048576 bytes at offset 52428800 +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +bitmap 'bitmap' +enabled: true +size: 2097152 +granularity: 65536 +dirty regions begin: +102400 - 104447 +dirty regions end +QUIT + +START VM +QEMU X.Y.Z monitor - type 'help' for more information +bitmap 'bitmap' +enabled: true +size: 2097152 +granularity: 65536 +dirty regions begin: +102400 - 104447 +dirty regions end + +write 0m 10m +wrote 10485760/10485760 bytes at offset 0 +10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) + +write 700m 200m +wrote 209715200/209715200 bytes at offset 734003200 +200 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +bitmap 'bitmap' +enabled: true +size: 2097152 +granularity: 65536 +dirty regions begin: +0 - 20479 +102400 - 104447 +1433600 - 1843199 +dirty regions end +QUIT + +START VM +QEMU X.Y.Z monitor - type 'help' for more information +bitmap 'bitmap' +enabled: true +size: 2097152 +granularity: 65536 +dirty regions begin: +0 - 20479 +102400 - 104447 +1433600 - 1843199 +dirty regions end +QUIT diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group index a4742c6..77377b3 100644 --- a/tests/qemu-iotests/group +++ b/tests/qemu-iotests/group @@ -115,3 +115,4 @@ 111 rw auto quick 113 rw auto quick 114 rw auto quick +115 rw auto quick -- 1.9.1
[Qemu-devel] [PATCH 2/8] hbitmap: store / restore
Functions to store / restore HBitmap. HBitmap should be saved to linear bitmap format independently of endianess. These functions are appropriate for dirty bitmap migration, retoring the bitmap in several steps is available. To save performance, every step writes only the last level of the bitmap. All other levels are restored by hbitmap_restore_finish as a last step of restoring. So, HBitmap is inconsistent while restoring. Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- include/qemu/hbitmap.h | 49 util/hbitmap.c | 87 ++ 2 files changed, 136 insertions(+) diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h index c48c50a..f432f7f 100644 --- a/include/qemu/hbitmap.h +++ b/include/qemu/hbitmap.h @@ -137,6 +137,55 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count); bool hbitmap_get(const HBitmap *hb, uint64_t item); /** + * hbitmap_data_size: + * @hb: HBitmap to operate on. + * @count: Number of bits + * + * Return amount of bytes hbitmap_store_data needs + */ +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count); + +/** + * hbitmap_store_data + * @hb: HBitmap to oprate on. + * @buf: Buffer to store bitmap data. + * @start: First bit to store. + * @count: Number of bits to store. + * + * Stores HBitmap data corresponding to given region. The format of saved data + * is linear sequence of bits, so it can be used by hbitmap_restore_data + * independently of endianess + */ +void hbitmap_store_data(const HBitmap *hb, uint8_t *buf, +uint64_t start, uint64_t count); + +/** + * hbitmap_restore_data + * @hb: HBitmap to oprate on. + * @buf: Buffer to restore bitmap data from. + * @start: First bit to restore. + * @count: Number of bits to restore. + * + * Retores HBitmap data corresponding to given region. The format is the same + * as for hbitmap_store_data. + * + * ! The bitmap becomes inconsistent after this operation. + * hbitmap_restore_finish should be called before using the bitmap after + * data restoring. + */ +void hbitmap_restore_data(HBitmap *hb, uint8_t *buf, + uint64_t start, uint64_t count); + +/** + * hbitmap_restore_finish + * @hb: HBitmap to operate on. + * + * Repair HBitmap after calling hbitmap_restore_data. Actuall all HBitmap + * layers are restore here. + */ +void hbitmap_restore_finish(HBitmap *hb); + +/** * hbitmap_free: * @hb: HBitmap to operate on. * diff --git a/util/hbitmap.c b/util/hbitmap.c index f400dcb..5d1a776 100644 --- a/util/hbitmap.c +++ b/util/hbitmap.c @@ -366,6 +366,93 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item) return (hb-levels[HBITMAP_LEVELS - 1][pos BITS_PER_LEVEL] bit) != 0; } +uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count) +{ +uint64_t size; + +if (count == 0) { +return 0; +} + +size = (((count - 1) hb-granularity) BITS_PER_LEVEL) + 1; + +return size * sizeof(unsigned long); +} + +void hbitmap_store_data(const HBitmap *hb, uint8_t *buf, +uint64_t start, uint64_t count) +{ +uint64_t last = start + count - 1; +unsigned long *out = (unsigned long *)buf; + +if (count == 0) { +return; +} + +start = (start hb-granularity) BITS_PER_LEVEL; +last = (last hb-granularity) BITS_PER_LEVEL; +count = last - start + 1; + +#ifdef __BIG_ENDIAN_BITFIELD +for (i = start; i = last; ++i) { +unsigned long el = hb-levels[HBITMAP_LEVELS - 1][i]; +out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) : cpu_to_le64(el)); +} +#else +memcpy(out, hb-levels[HBITMAP_LEVELS - 1][start], + count * sizeof(unsigned long)); +#endif +} + +void hbitmap_restore_data(HBitmap *hb, uint8_t *buf, + uint64_t start, uint64_t count) +{ +uint64_t last = start + count - 1; +unsigned long *in = (unsigned long *)buf; + +if (count == 0) { +return; +} + +start = (start hb-granularity) BITS_PER_LEVEL; +last = (last hb-granularity) BITS_PER_LEVEL; +count = last - start + 1; + +#ifdef __BIG_ENDIAN_BITFIELD +for (i = start; i = last; ++i) { +hb-levels[HBITMAP_LEVELS - 1][i] = +(BITS_PER_LONG == 32 ? be32_to_cpu(in[i]) : be64_to_cpu(in[i])); +} +#else +memcpy(hb-levels[HBITMAP_LEVELS - 1][start], in, + count * sizeof(unsigned long)); +#endif +} + +void hbitmap_restore_finish(HBitmap *bitmap) +{ +int64_t i, size, prev_size; +int lev; + +/* restore levels starting from penultimate to zero level, assuming + * that the last level is ok */ +size = MAX((bitmap-size + BITS_PER_LONG - 1) BITS_PER_LEVEL, 1); +for (lev = HBITMAP_LEVELS - 1; lev-- 0; ) { +prev_size = size; +size = MAX((size + BITS_PER_LONG - 1) BITS_PER_LEVEL, 1); +memset(bitmap-levels[lev], 0, size * sizeof(unsigned long)); + +for (i = 0; i
[Qemu-devel] [PATCH 1/8] tls: require compiler support for __thread
The block layer is now using __thread unconditionally. Remove the fake TLS wrappers (that actually aren't TLS on !Linux) in include/qemu/tls.h, and add a testcase. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- configure | 9 +- exec.c | 2 +- include/qemu/tls.h | 52 -- include/qom/cpu.h | 4 +-- tests/Makefile | 7 - tests/test-tls.c | 83 ++ 6 files changed, 92 insertions(+), 65 deletions(-) delete mode 100644 include/qemu/tls.h create mode 100644 tests/test-tls.c diff --git a/configure b/configure index 7539645..0580606 100755 --- a/configure +++ b/configure @@ -1554,14 +1554,7 @@ fi if test $pie != no ; then cat $TMPC EOF - -#ifdef __linux__ -# define THREAD __thread -#else -# define THREAD -#endif - -static THREAD int tls_var; +static __thread int tls_var; int main(void) { return tls_var; } diff --git a/exec.c b/exec.c index 081818e..9046203 100644 --- a/exec.c +++ b/exec.c @@ -85,7 +85,7 @@ static MemoryRegion io_mem_unassigned; struct CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus); /* current CPU in the current thread. It is only valid inside cpu_exec() */ -DEFINE_TLS(CPUState *, current_cpu); +__thread CPUState *current_cpu; /* 0 = Do not count executed instructions. 1 = Precise instruction counting. 2 = Adaptive rate instruction counting. */ diff --git a/include/qemu/tls.h b/include/qemu/tls.h deleted file mode 100644 index b92ea9d..000 --- a/include/qemu/tls.h +++ /dev/null @@ -1,52 +0,0 @@ -/* - * Abstraction layer for defining and using TLS variables - * - * Copyright (c) 2011 Red Hat, Inc - * Copyright (c) 2011 Linaro Limited - * - * Authors: - * Paolo Bonzini pbonz...@redhat.com - * Peter Maydell peter.mayd...@linaro.org - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation; either version 2 of - * the License, or (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License along - * with this program; if not, see http://www.gnu.org/licenses/. - */ - -#ifndef QEMU_TLS_H -#define QEMU_TLS_H - -/* Per-thread variables. Note that we only have implementations - * which are really thread-local on Linux; the dummy implementations - * define plain global variables. - * - * This means that for the moment use should be restricted to - * per-VCPU variables, which are OK because: - * - the only -user mode supporting multiple VCPU threads is linux-user - * - TCG system mode is single-threaded regarding VCPUs - * - KVM system mode is multi-threaded but limited to Linux - * - * TODO: proper implementations via Win32 .tls sections and - * POSIX pthread_getspecific. - */ -#ifdef __linux__ -#define DECLARE_TLS(type, x) extern DEFINE_TLS(type, x) -#define DEFINE_TLS(type, x) __thread __typeof__(type) tls__##x -#define tls_var(x) tls__##x -#else -/* Dummy implementations which define plain global variables */ -#define DECLARE_TLS(type, x) extern DEFINE_TLS(type, x) -#define DEFINE_TLS(type, x) __typeof__(type) tls__##x -#define tls_var(x) tls__##x -#endif - -#endif diff --git a/include/qom/cpu.h b/include/qom/cpu.h index 2098f1c..a93466d 100644 --- a/include/qom/cpu.h +++ b/include/qom/cpu.h @@ -26,7 +26,6 @@ #include exec/hwaddr.h #include qemu/queue.h #include qemu/thread.h -#include qemu/tls.h #include qemu/typedefs.h typedef int (*WriteCoreDumpFunction)(const void *buf, size_t size, @@ -310,8 +309,7 @@ extern struct CPUTailQ cpus; QTAILQ_FOREACH_SAFE(cpu, cpus, node, next_cpu) #define first_cpu QTAILQ_FIRST(cpus) -DECLARE_TLS(CPUState *, current_cpu); -#define current_cpu tls_var(current_cpu) +extern __thread CPUState *current_cpu; /** * cpu_paging_enabled: diff --git a/tests/Makefile b/tests/Makefile index e4ddb6a..47ba143 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -57,6 +57,9 @@ check-unit-y += tests/test-cutils$(EXESUF) gcov-files-test-cutils-y += util/cutils.c check-unit-y += tests/test-mul64$(EXESUF) gcov-files-test-mul64-y = util/host-utils.c +check-unit-y += tests/test-tls$(EXESUF) +# all code tested by test-tls is inside test-tls.c +gcov-files-test-tls-y = check-unit-y += tests/test-int128$(EXESUF) # all code tested by test-int128 is inside int128.h gcov-files-test-int128-y = @@ -223,7 +226,7 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \ tests/test-qmp-input-visitor.o tests/test-qmp-input-strict.o \ tests/test-qmp-commands.o tests/test-visitor-serialization.o \ tests/test-x86-cpuid.o
[Qemu-devel] [PATCH 2/9] qmp hmp: Improve error messages when SPICE is not in use
Commit 7572150 adopted QERR_DEVICE_NOT_ACTIVE for the purpose, probably because adding another error seemed cumbersome overkill. Produces No spice device has been activated, which is awkward. We've since abandoned our quest for rich error objects. Time to undo the damage to this error message. Replace it by SPICE is not in use. Keep the stupid DeviceNotActive ErrorClass for compatibility, even though Libvirt doesn't use it. Signed-off-by: Markus Armbruster arm...@redhat.com --- include/ui/qemu-spice.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/ui/qemu-spice.h b/include/ui/qemu-spice.h index b3f2679..99bb622 100644 --- a/include/ui/qemu-spice.h +++ b/include/ui/qemu-spice.h @@ -91,8 +91,8 @@ static inline int qemu_spice_display_add_client(int csock, int skipauth, static inline int qemu_using_spice(Error **errp) { if (!using_spice) { -/* correct one? spice isn't a device ,,, */ -error_set(errp, QERR_DEVICE_NOT_ACTIVE, spice); +error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE, + SPICE is not in use); return 0; } return 1; -- 1.9.3
[Qemu-devel] [PATCH 6/8] memory: remove assertion on memory_region_destroy
From: Jan Kiszka jan.kis...@siemens.com Now that memory_region_destroy can be called from an RCU callback, checking the BQL-protected global memory_region_transaction_depth does not make much sense. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- memory.c | 1 - 1 file changed, 1 deletion(-) diff --git a/memory.c b/memory.c index c343bf3..8c3d8c0 100644 --- a/memory.c +++ b/memory.c @@ -1263,7 +1263,6 @@ static void memory_region_finalize(Object *obj) MemoryRegion *mr = MEMORY_REGION(obj); assert(QTAILQ_EMPTY(mr-subregions)); -assert(memory_region_transaction_depth == 0); mr-destructor(mr); memory_region_clear_coalescing(mr); g_free((char *)mr-name); -- 1.8.3.1
[Qemu-devel] [PATCH 9/9] balloon: Eliminate silly QERR_ macros
The QERR_ macros are leftovers from the days of rich error objects. They're used with error_set() and qerror_report(), and expand into the first *two* arguments. This trickiness has become pointless. Clean up. Signed-off-by: Markus Armbruster arm...@redhat.com --- balloon.c | 6 -- include/qapi/qmp/qerror.h | 6 -- 2 files changed, 4 insertions(+), 8 deletions(-) diff --git a/balloon.c b/balloon.c index aa30617..dcf5b9c 100644 --- a/balloon.c +++ b/balloon.c @@ -39,11 +39,13 @@ static void *balloon_opaque; static int have_ballon(Error **errp) { if (kvm_enabled() !kvm_has_sync_mmu()) { -error_set(errp, QERR_KVM_MISSING_CAP, synchronous MMU, balloon); +error_set(errp, ERROR_CLASS_KVM_MISSING_CAP, + Using KVM without synchronous MMU, balloon unavailable); return 0; } if (!balloon_event_fn) { -error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon); +error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE, + No balloon device has been activated); return 0; } return 1; diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h index 28f980e..eeaf0cb 100644 --- a/include/qapi/qmp/qerror.h +++ b/include/qapi/qmp/qerror.h @@ -70,9 +70,6 @@ void qerror_report_err(Error *err); #define QERR_DEVICE_NO_HOTPLUG \ ERROR_CLASS_GENERIC_ERROR, Device '%s' does not support hotplugging -#define QERR_DEVICE_NOT_ACTIVE \ -ERROR_CLASS_DEVICE_NOT_ACTIVE, No %s device has been activated - #define QERR_DEVICE_NOT_ENCRYPTED \ ERROR_CLASS_GENERIC_ERROR, Device '%s' is not encrypted @@ -109,9 +106,6 @@ void qerror_report_err(Error *err); #define QERR_JSON_PARSING \ ERROR_CLASS_GENERIC_ERROR, Invalid JSON syntax -#define QERR_KVM_MISSING_CAP \ -ERROR_CLASS_KVM_MISSING_CAP, Using KVM without %s, %s unavailable - #define QERR_MIGRATION_ACTIVE \ ERROR_CLASS_GENERIC_ERROR, There's a migration process in progress -- 1.9.3
[Qemu-devel] [PATCH 8/8] memory: avoid ref/unref in memory_region_find
Do the entire lookup under RCU. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- memory.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/memory.c b/memory.c index a844ced..577e87c 100644 --- a/memory.c +++ b/memory.c @@ -1828,7 +1828,8 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr, } range = addrrange_make(int128_make64(addr), int128_make64(size)); -view = address_space_get_flatview(as); +rcu_read_lock(); +view = atomic_rcu_read(as-current_map); fr = flatview_lookup(view, range); if (!fr) { flatview_unref(view); @@ -1850,7 +1851,7 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr, ret.readonly = fr-readonly; memory_region_ref(ret.mr); -flatview_unref(view); +rcu_read_unlock(); return ret; } -- 1.8.3.1
[Qemu-devel] [PATCH v5] block/raw-posix.c: Fixes raw_getlength() on Mac OS X so that it reports the correct length of a real CD
Allows for using real cdrom disc in QEMU under Mac OS X. This command was used to see if this patch worked: ./qemu-system-i386 -drive if=none,id=drive0,file=/dev/null,format=raw Signed-off-by: John Arbuckle programmingk...@gmail.com --- Replaced -errno with 0 for ioctl() failure return value. make check now passes with this patch. block/raw-posix.c | 18 +- configure |2 +- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/block/raw-posix.c b/block/raw-posix.c index e51293a..16fa0a4 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -1312,7 +1312,23 @@ again: if (size == 0) #endif #if defined(__APPLE__) defined(__MACH__) -size = LLONG_MAX; +{ +uint64_t sectors = 0; +uint32_t sector_size = 0; + +/* Query the number of sectors on the disk */ +ret = ioctl(fd, DKIOCGETBLOCKCOUNT, sectors); +if (ret == -1) { +return 0; +} + +/* Query the size of each sector */ +ret = ioctl(fd, DKIOCGETBLOCKSIZE, sector_size); +if (ret == -1) { +return 0; +} +size = sectors * sector_size; +} #else size = lseek(fd, 0LL, SEEK_END); if (size 0) { diff --git a/configure b/configure index cae588c..32d3d3f 100755 --- a/configure +++ b/configure @@ -611,7 +611,7 @@ Darwin) cocoa=yes audio_drv_list=coreaudio audio_possible_drivers=coreaudio sdl fmod - LDFLAGS=-framework CoreFoundation -framework IOKit $LDFLAGS + LDFLAGS=-framework CoreFoundation -framework IOKit -framework ApplicationServices $LDFLAGS libs_softmmu=-F/System/Library/Frameworks -framework Cocoa -framework IOKit $libs_softmmu # Disable attempts to use ObjectiveC features in os/object.h since they # won't work when we're compiling with gcc as a C compiler. -- 1.7.5.4
Re: [Qemu-devel] [PATCH v11 13/13] block: BdrvDirtyBitmap miscellaneous fixup
On 01/13/2015 11:50 AM, Vladimir Sementsov-Ogievskiy wrote: Hmm. May be, update the size field to be uint64_t too? Negative size is not meaningful.. Best regards, Vladimir No, it is not meaningful. However, it does match the return type from bdrv_nb_sectors(bs), which uses -1 to indicate an error. So we also don't really need to change it in this circumstance, since we'll not be exceeding what int64_t/bdrv_nb_sectors can provide anyway. (Plus, INT64_MAX sectors is an absurdly large amount of bytes!) I'll see how many changes this requires in other helper functions when I prepare a V12. If it's easy I will do it. Thanks! --js On 12.01.2015 19:31, John Snow wrote: (1) granularity type consistency: Update the granularity to be uint64_t in all places. This value is always in bytes. (2) Some documentation for the fields within BdrvDirtyBitmap. Signed-off-by: John Snow js...@redhat.com --- block.c | 16 include/block/block.h | 2 +- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/block.c b/block.c index 9e2b8c0..13f9cc0 100644 --- a/block.c +++ b/block.c @@ -52,13 +52,13 @@ #endif struct BdrvDirtyBitmap { -HBitmap *bitmap; -BdrvDirtyBitmap *successor; -int64_t size; -int64_t granularity; -char *name; -bool enabled; -bool frozen; +HBitmap *bitmap;/* Dirty sector bitmap */ +BdrvDirtyBitmap *successor; /* Anonymous child */ +int64_t size; /* Number of sectors */ +uint64_t granularity; /* Granularity in bytes */ +char *name; /* Optional non-empty unique ID */ +bool enabled; /* Writes are being tracked */ +bool frozen;/* Protected; see bdrv_freeze_dirty_bitmap() */ QLIST_ENTRY(BdrvDirtyBitmap) list; }; @@ -5350,7 +5350,7 @@ void bdrv_dirty_bitmap_make_anon(BlockDriverState *bs, BdrvDirtyBitmap *bitmap) } BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, - int granularity, + uint64_t granularity, const char *name, Error **errp) { diff --git a/include/block/block.h b/include/block/block.h index 99ed679..749429e 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -430,7 +430,7 @@ bool bdrv_qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov); struct HBitmapIter; typedef struct BdrvDirtyBitmap BdrvDirtyBitmap; BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, - int granularity, + uint64_t granularity, const char *name, Error **errp); int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs, -- —js
[Qemu-devel] [PATCH 3/8] qcow2: add dirty-bitmaps feature
Adds dirty-bitmaps feature to qcow2 format as specified in docs/specs/qcow2.txt Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- block/Makefile.objs| 2 +- block/qcow2-dirty-bitmap.c | 514 + block/qcow2.c | 26 +++ block/qcow2.h | 48 + include/block/block_int.h | 10 + 5 files changed, 599 insertions(+), 1 deletion(-) create mode 100644 block/qcow2-dirty-bitmap.c diff --git a/block/Makefile.objs b/block/Makefile.objs index 04b0e43..eebd1c9 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -1,5 +1,5 @@ block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o -block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o +block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o qcow2-dirty-bitmap.o block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-obj-y += qed-check.o block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c new file mode 100644 index 000..b3d114f --- /dev/null +++ b/block/qcow2-dirty-bitmap.c @@ -0,0 +1,514 @@ +/* + * Dirty bitmpas for the QCOW version 2 format + * + * Copyright (c) 2014-2015 Vladimir Sementsov-Ogievskiy + * + * This file is derived from qcow2-snapshot.c, original copyright: + * Copyright (c) 2004-2006 Fabrice Bellard + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include qemu-common.h +#include block/block_int.h +#include block/qcow2.h + +void qcow2_free_dirty_bitmaps(BlockDriverState *bs) +{ +BDRVQcowState *s = bs-opaque; +int i; + +for (i = 0; i s-nb_dirty_bitmaps; i++) { +g_free(s-dirty_bitmaps[i].name); +} +g_free(s-dirty_bitmaps); +s-dirty_bitmaps = NULL; +s-nb_dirty_bitmaps = 0; +} + +int qcow2_read_dirty_bitmaps(BlockDriverState *bs) +{ +BDRVQcowState *s = bs-opaque; +QCowDirtyBitmapHeader h; +QCowDirtyBitmap *bm; +int i, name_size; +int64_t offset; +int ret; + +if (!s-nb_dirty_bitmaps) { +s-dirty_bitmaps = NULL; +s-dirty_bitmaps_size = 0; +return 0; +} + +offset = s-dirty_bitmaps_offset; +s-dirty_bitmaps = g_new0(QCowDirtyBitmap, s-nb_dirty_bitmaps); + +for (i = 0; i s-nb_dirty_bitmaps; i++) { +/* Read statically sized part of the dirty_bitmap header */ +offset = align_offset(offset, 8); +ret = bdrv_pread(bs-file, offset, h, sizeof(h)); +if (ret 0) { +goto fail; +} + +offset += sizeof(h); +bm = s-dirty_bitmaps + i; +bm-l1_table_offset = be64_to_cpu(h.l1_table_offset); +bm-l1_size = be32_to_cpu(h.l1_size); +bm-bitmap_granularity = be32_to_cpu(h.bitmap_granularity); +bm-bitmap_size = be64_to_cpu(h.bitmap_size); + +name_size = be16_to_cpu(h.name_size); + +/* Read dirty_bitmap name */ +bm-name = g_malloc(name_size + 1); +ret = bdrv_pread(bs-file, offset, bm-name, name_size); +if (ret 0) { +goto fail; +} +offset += name_size; +bm-name[name_size] = '\0'; + +if (offset - s-dirty_bitmaps_offset QCOW_MAX_DIRTY_BITMAPS_SIZE) { +ret = -EFBIG; +goto fail; +} +} + +assert(offset - s-dirty_bitmaps_offset = INT_MAX); +s-dirty_bitmaps_size = offset - s-dirty_bitmaps_offset; +return 0; + +fail: +qcow2_free_dirty_bitmaps(bs); +return ret; +} + +/* add at the end of the file a new list of dirty bitmaps */ +static int qcow2_write_dirty_bitmaps(BlockDriverState *bs) +{ +BDRVQcowState *s = bs-opaque; +QCowDirtyBitmap *bm; +QCowDirtyBitmapHeader h; +int i, name_size, dirty_bitmaps_size; +struct { +uint32_t nb_dirty_bitmaps; +uint64_t dirty_bitmaps_offset; +}
[Qemu-devel] [PATCH 7/8] qmp: print dirty bitmap
Adds qmp and hmp commands to print dirty bitmap. This is needed only for testing persistent dirty bitmap feature. Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com --- block.c | 33 + blockdev.c| 13 + hmp-commands.hx | 15 +++ hmp.c | 8 hmp.h | 1 + include/block/block.h | 1 + qapi-schema.json | 3 ++- qapi/block-core.json | 3 +++ qmp-commands.hx | 5 + 9 files changed, 81 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index 77419e9..3e6dedf 100644 --- a/block.c +++ b/block.c @@ -5445,6 +5445,39 @@ int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap) return res; } +void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap) +{ +unsigned long a = 0, b = 0; + +printf(bitmap '%s'\n, bitmap-name ? bitmap-name : no name); +printf(enabled: %s\n, bitmap-enabled ? true : false); +printf(size: % PRId64 \n, bitmap-size); +printf(granularity: % PRId64 \n, bitmap-granularity); +printf(dirty regions begin:\n); + +while (true) { +for (a = b; a bitmap-size !hbitmap_get(bitmap-bitmap, a); ++a) { +; +} +if (a = bitmap-size) { +break; +} + +for (b = a + 1; + b bitmap-size hbitmap_get(bitmap-bitmap, b); + ++b) { +; +} + +printf(%ld - %ld\n, a, b - 1); +if (b = bitmap-size) { +break; +} +} + +printf(dirty regions end\n); +} + BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, int granularity, const char *name, diff --git a/blockdev.c b/blockdev.c index 8a9be08..8b58c2e 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2079,6 +2079,19 @@ void qmp_block_dirty_bitmap_add(const char *node_ref, const char *name, aio_context_release(aio_context); } +void qmp_block_dirty_bitmap_print(const char *node_ref, const char *name, + Error **errp) +{ +BdrvDirtyBitmap *bitmap; + +bitmap = block_dirty_bitmap_lookup(node_ref, name, NULL, errp); +if (!bitmap) { +return; +} + +bdrv_print_dirty_bitmap(bitmap); +} + void qmp_block_dirty_bitmap_remove(const char *node_ref, const char *name, Error **errp) { diff --git a/hmp-commands.hx b/hmp-commands.hx index e37bc8b..a9be506 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -58,6 +58,21 @@ Quit the emulator. ETEXI { +.name = print_dirty_bitmap, +.args_type = device:B,bitmap:s, +.params = device bitmap, +.help = print dirty bitmap, +.user_print = monitor_user_noop, +.mhandler.cmd = hmp_print_dirty_bitmap, +}, + +STEXI +@item print_dirty_bitmap device_id bitmap_name +@findex print_dirty_bitmap +Print dirty bitmap meta information and dirty regions. +ETEXI + +{ .name = block_resize, .args_type = device:B,size:o, .params = device size, diff --git a/hmp.c b/hmp.c index 63b19c7..a269145 100644 --- a/hmp.c +++ b/hmp.c @@ -782,6 +782,14 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict) qapi_free_TPMInfoList(info_list); } +void hmp_print_dirty_bitmap(Monitor *mon, const QDict *qdict) +{ +const char *device = qdict_get_str(qdict, device); +const char *name = qdict_get_str(qdict, bitmap); + +qmp_block_dirty_bitmap_print(device, name, NULL); +} + void hmp_quit(Monitor *mon, const QDict *qdict) { monitor_suspend(mon); diff --git a/hmp.h b/hmp.h index 4bb5dca..6bbbc33 100644 --- a/hmp.h +++ b/hmp.h @@ -19,6 +19,7 @@ #include qapi-types.h #include qapi/qmp/qdict.h +void hmp_print_dirty_bitmap(Monitor *mon, const QDict *qdict); void hmp_info_name(Monitor *mon, const QDict *qdict); void hmp_info_version(Monitor *mon, const QDict *qdict); void hmp_info_kvm(Monitor *mon, const QDict *qdict); diff --git a/include/block/block.h b/include/block/block.h index f36557f..7188791 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -466,6 +466,7 @@ BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs, const char *name, Error **errp); int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap); +void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap); void bdrv_enable_copy_on_read(BlockDriverState *bs); void bdrv_disable_copy_on_read(BlockDriverState *bs); diff --git a/qapi-schema.json b/qapi-schema.json index 85f55d9..1475f69 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1263,7 +1263,8 @@ 'blockdev-snapshot-internal-sync': 'BlockdevSnapshotInternal', 'block-dirty-bitmap-add': 'BlockDirtyBitmapAdd', 'block-dirty-bitmap-enable': 'BlockDirtyBitmap', -
Re: [Qemu-devel] [PATCH v11 08/13] block: Add bitmap successors
On 01/13/2015 04:24 AM, Fam Zheng wrote: On Mon, 01/12 11:31, John Snow wrote: A bitmap successor is an anonymous BdrvDirtyBitmap that is intended to be created just prior to a sensitive operation (e.g. Incremental Backup) that can either succeed or fail, but during the course of which we still want a bitmap tracking writes. On creating a successor, we freeze the parent bitmap which prevents its deletion, enabling, anonymization, or creating a bitmap with the same name. On success, the parent bitmap can abdicate responsibility to the successor, which will inherit its name. The successor will have been tracking writes during the course of the backup operation. The parent will be safely deleted. On failure, we can reclaim the successor from the parent, unifying them such that the resulting bitmap describes all writes occurring since the last successful backup, for instance. Reclamation will thaw the parent, but not explicitly re-enable it. If we compare this with image snapshot and overlay, it fits the current backing chain model very well. Given that we're on the way to persistent dirty bitmap, which will probably be read/written with general block.c code, I'm wondering if it will be any better to reuse the block layer backing file code to handle successor logic? Also with the frozen feature built in dirty bitmaps, is it okay to drop enabled state? I think there are three states for a bitmap: 1) Active, no successor (similar to an read-write top image) 2) Frozen, no successor (similar to an read-only top image) 3) Frozen, with successor (similar to an read-only backing file, with an overlap) Admittedly, more code is needed than this patch, in order to glue hbitmap and block layer together, but it would probably make live migration of dirty bitmap very easy, but I'm not sure without a closer look. Hmm -- I suppose so. I was just treating frozen as a stricter disabled. Disabled just stops tracking writes, but frozen was intended to stop deletion, renaming, etc. If there is no use case for the softer disabled, then it can really be just a single parameter. Signed-off-by: John Snow js...@redhat.com --- block.c | 123 -- blockdev.c| 14 ++ include/block/block.h | 10 3 files changed, 144 insertions(+), 3 deletions(-) diff --git a/block.c b/block.c index bd4b449..3f33b9d 100644 --- a/block.c +++ b/block.c @@ -53,10 +53,12 @@ struct BdrvDirtyBitmap { HBitmap *bitmap; +BdrvDirtyBitmap *successor; int64_t size; int64_t granularity; char *name; bool enabled; +bool frozen; QLIST_ENTRY(BdrvDirtyBitmap) list; }; @@ -5342,6 +5344,7 @@ BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name) void bdrv_dirty_bitmap_make_anon(BlockDriverState *bs, BdrvDirtyBitmap *bitmap) { +assert(!bitmap-frozen); g_free(bitmap-name); bitmap-name = NULL; } @@ -5379,9 +5382,114 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, return bitmap; } +/** + * A frozen bitmap cannot be renamed, deleted, cleared, + * or set. A frozen bitmap can only abdicate, reclaim, + * or thaw. + */ +static void bdrv_freeze_dirty_bitmap(BdrvDirtyBitmap *bitmap) +{ +bitmap-frozen = true; +} + +static void bdrv_thaw_dirty_bitmap(BdrvDirtyBitmap *bitmap) +{ +bitmap-frozen = false; +} + +bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap) +{ +return bitmap-frozen; +} + bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap) { -return bitmap-enabled; +return bitmap-enabled !bitmap-frozen; An indication that enabled could be replaced by frozen. Otherwise it looks confusing to me. Perhaps. Here I mean to say that Frozen implies disabled. The stronger condition implies the weaker. +} + +/** + * Create a successor bitmap destined to replace this bitmap after an operation. + * Requires that the bitmap is not frozen and has no successor. + */ +int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs, + BdrvDirtyBitmap *bitmap, Error **errp) +{ +uint64_t granularity; + +if (bdrv_dirty_bitmap_frozen(bitmap)) { +error_setg(errp, + Cannot create a successor for a bitmap currently in-use.); +return -1; +} else if (bitmap-successor) { +error_setg(errp, Cannot create a successor for a bitmap that + already has one.); +return -1; +} + +/* Create an anonymous successor */ +granularity = bdrv_dirty_bitmap_granularity(bs, bitmap); +bitmap-successor = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp); +if (!bitmap-successor) { +return -1; +} + +/* Successor will be on or off based on our current state. + * Parent will be disabled and frozen. */ +bitmap-successor-enabled = bitmap-enabled; +bdrv_disable_dirty_bitmap(bitmap); +
[Qemu-devel] [PATCH 0/2] cpu_ldst.h: Minor cleanups in ld/st macros
These two patches make some minor cleanups to cpu_ldst.h which I noticed while I was trying to get my head around our confusing mess of load and store related functions. Peter Maydell (2): cpu_ldst.h: Remove unused ldul_ macros cpu_ldst.h: Collapse laddr() and saddr() into ldst_get_host_addr() include/exec/cpu_ldst.h | 46 +++--- 1 file changed, 19 insertions(+), 27 deletions(-) -- 1.9.1
Re: [Qemu-devel] [PATCH 0/9] qmp hmp balloon: Cleanups around error reporting
On Di, 2015-01-13 at 18:50 +0100, Markus Armbruster wrote: qmp hmp: Factor out common using spice test qmp hmp: Improve error messages when SPICE is not in use hmp: Compile hmp_info_spice() only with CONFIG_SPICE qmp: Clean up qmp_query_spice() #ifndef !CONFIG_SPICE dummy Reviewed-by: Gerd Hoffmann kra...@redhat.com
[Qemu-devel] [PATCH v3 4/7] monitor: use cc-get_arch_id as the cpu index
From: Gu Zheng guz.f...@cn.fujitsu.com Use cc-get_arch_id as the cpu index to avoid the cpu index duplicated issue in the QMP/HMP command output. Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- cpus.c| 4 +++- monitor.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/cpus.c b/cpus.c index 2edb5cd..d5e35c0 100644 --- a/cpus.c +++ b/cpus.c @@ -1420,6 +1420,7 @@ CpuInfoList *qmp_query_cpus(Error **errp) CPU_FOREACH(cpu) { CpuInfoList *info; +CPUClass *cc; #if defined(TARGET_I386) X86CPU *x86_cpu = X86_CPU(cpu); CPUX86State *env = x86_cpu-env; @@ -1437,11 +1438,12 @@ CpuInfoList *qmp_query_cpus(Error **errp) CPUTriCoreState *env = tricore_cpu-env; #endif +cc = CPU_GET_CLASS(cpu); cpu_synchronize_state(cpu); info = g_malloc0(sizeof(*info)); info-value = g_malloc0(sizeof(*info-value)); -info-value-CPU = cpu-cpu_index; +info-value-CPU = cc-get_arch_id(cpu); info-value-current = (cpu == first_cpu); info-value-halted = cpu-halted; info-value-thread_id = cpu-thread_id; diff --git a/monitor.c b/monitor.c index 1808e41..2283461 100644 --- a/monitor.c +++ b/monitor.c @@ -1024,7 +1024,9 @@ static CPUArchState *mon_get_cpu(void) int monitor_get_cpu_index(void) { CPUState *cpu = ENV_GET_CPU(mon_get_cpu()); -return cpu-cpu_index; +CPUClass *cc = CPU_GET_CLASS(cpu); + +return cc-get_arch_id(cpu); } static void do_info_registers(Monitor *mon, const QDict *qdict) -- 1.9.3
[Qemu-devel] [PATCH v3 5/7] acpi:cpu hotplug: set pcmachine as icc bus' hotplug handler
From: Gu Zheng guz.f...@cn.fujitsu.com As the pre-check in the qdev_device_add(): if (qdev_hotplug bus !qbus_is_hotpluggable(bus)) { qerror_report(QERR_BUS_NO_HOTPLUG, bus-name); return NULL; } if device has parent bus, the bus must have valid hotplug_handler, otherwise can not hot plug. Currently cpu hotplug is based on the PCMachine's hotplug handler, so when hot add cpu, the hotpluggable check of icc bus will be rejected. So we set pcmachine as icc bus' hotplug handler to avoid the rejetion. Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- hw/cpu/icc_bus.c | 5 + hw/i386/pc_piix.c| 5 + hw/i386/pc_q35.c | 5 + include/hw/cpu/icc_bus.h | 2 ++ 4 files changed, 17 insertions(+) diff --git a/hw/cpu/icc_bus.c b/hw/cpu/icc_bus.c index 6646ea2..f455a20 100644 --- a/hw/cpu/icc_bus.c +++ b/hw/cpu/icc_bus.c @@ -92,6 +92,11 @@ static void icc_bridge_init(Object *obj) s-icc_bus.apic_address_space = s-apic_container; } +ICCBus *get_icc_bus(DeviceState *dev) +{ +return ICC_BRIDGE(dev)-icc_bus; +} + static void icc_bridge_class_init(ObjectClass *oc, void *data) { DeviceClass *dc = DEVICE_CLASS(oc); diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index f0a3201..363f796 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -281,6 +281,7 @@ static void pc_init1(MachineState *machine, if (pci_enabled acpi_enabled) { DeviceState *piix4_pm; I2CBus *smbus; +ICCBus *iccbus; smi_irq = qemu_allocate_irqs(pc_acpi_smi_interrupt, first_cpu, 1); /* TODO: Populate SPD eeprom data. */ @@ -296,6 +297,10 @@ static void pc_init1(MachineState *machine, OBJ_PROP_LINK_UNREF_ON_RELEASE, error_abort); object_property_set_link(OBJECT(machine), OBJECT(piix4_pm), PC_MACHINE_ACPI_DEVICE_PROP, error_abort); + +iccbus = get_icc_bus(icc_bridge); +object_property_set_link(OBJECT(iccbus), OBJECT(pc_machine), + QDEV_HOTPLUG_HANDLER_PROPERTY, error_abort); } if (pci_enabled) { diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index a432944..2ce8b70 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -88,6 +88,7 @@ static void pc_q35_init(MachineState *machine) PcGuestInfo *guest_info; ram_addr_t lowmem; DriveInfo *hd[MAX_SATA_PORTS]; +ICCBus *iccbus; /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping @@ -212,6 +213,10 @@ static void pc_q35_init(MachineState *machine) object_property_set_link(OBJECT(machine), OBJECT(lpc), PC_MACHINE_ACPI_DEVICE_PROP, error_abort); +iccbus = get_icc_bus(icc_bridge); +object_property_set_link(OBJECT(iccbus), OBJECT(pc_machine), + QDEV_HOTPLUG_HANDLER_PROPERTY, error_abort); + ich9_lpc = ICH9_LPC_DEVICE(lpc); ich9_lpc-pic = gsi; ich9_lpc-ioapic = gsi_state-ioapic_irq; diff --git a/include/hw/cpu/icc_bus.h b/include/hw/cpu/icc_bus.h index 98a979f..7e53afb 100644 --- a/include/hw/cpu/icc_bus.h +++ b/include/hw/cpu/icc_bus.h @@ -44,6 +44,8 @@ typedef struct ICCBus { #define ICC_BUS(obj) OBJECT_CHECK(ICCBus, (obj), TYPE_ICC_BUS) +ICCBus *get_icc_bus(DeviceState *dev); + /** * ICCDevice: * -- 1.9.3
[Qemu-devel] [PATCH v3 2/7] qom/cpu: move register_vmstate to common CPUClass.realizefn
From: Gu Zheng guz.f...@cn.fujitsu.com Move cpu vmstate register from cpu_exec_init into cpu_common_realizefn, and use cc-get_arch_id as the instance id that suggested by Igor to fix the migration issue. Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- exec.c| 32 +++- include/qom/cpu.h | 2 ++ qom/cpu.c | 2 ++ 3 files changed, 23 insertions(+), 13 deletions(-) diff --git a/exec.c b/exec.c index 081818e..c9ffda6 100644 --- a/exec.c +++ b/exec.c @@ -513,10 +513,28 @@ void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace *as) } #endif +void cpu_vmstate_register(CPUState *cpu) +{ +CPUClass *cc = CPU_GET_CLASS(cpu); +int cpu_index = cc-get_arch_id(cpu); + +if (qdev_get_vmsd(DEVICE(cpu)) == NULL) { +vmstate_register(NULL, cpu_index, vmstate_cpu_common, cpu); +} +#if defined(CPU_SAVE_VERSION) !defined(CONFIG_USER_ONLY) +register_savevm(NULL, cpu, cpu_index, CPU_SAVE_VERSION, +cpu_save, cpu_load, cpu-env_ptr); +assert(cc-vmsd == NULL); +assert(qdev_get_vmsd(DEVICE(cpu)) == NULL); +#endif +if (cc-vmsd != NULL) { +vmstate_register(NULL, cpu_index, cc-vmsd, cpu); +} +} + void cpu_exec_init(CPUArchState *env) { CPUState *cpu = ENV_GET_CPU(env); -CPUClass *cc = CPU_GET_CLASS(cpu); CPUState *some_cpu; int cpu_index; @@ -539,18 +557,6 @@ void cpu_exec_init(CPUArchState *env) #if defined(CONFIG_USER_ONLY) cpu_list_unlock(); #endif -if (qdev_get_vmsd(DEVICE(cpu)) == NULL) { -vmstate_register(NULL, cpu_index, vmstate_cpu_common, cpu); -} -#if defined(CPU_SAVE_VERSION) !defined(CONFIG_USER_ONLY) -register_savevm(NULL, cpu, cpu_index, CPU_SAVE_VERSION, -cpu_save, cpu_load, env); -assert(cc-vmsd == NULL); -assert(qdev_get_vmsd(DEVICE(cpu)) == NULL); -#endif -if (cc-vmsd != NULL) { -vmstate_register(NULL, cpu_index, cc-vmsd, cpu); -} } #if defined(TARGET_HAS_ICE) diff --git a/include/qom/cpu.h b/include/qom/cpu.h index 2098f1c..936afcd 100644 --- a/include/qom/cpu.h +++ b/include/qom/cpu.h @@ -562,6 +562,8 @@ void cpu_interrupt(CPUState *cpu, int mask); #endif /* USER_ONLY */ +void cpu_vmstate_register(CPUState *cpu); + #ifdef CONFIG_SOFTMMU static inline void cpu_unassigned_access(CPUState *cpu, hwaddr addr, bool is_write, bool is_exec, diff --git a/qom/cpu.c b/qom/cpu.c index 9c68fa4..a639822 100644 --- a/qom/cpu.c +++ b/qom/cpu.c @@ -302,6 +302,8 @@ static void cpu_common_realizefn(DeviceState *dev, Error **errp) { CPUState *cpu = CPU(dev); +cpu_vmstate_register(cpu); + if (dev-hotplugged) { cpu_synchronize_post_init(cpu); cpu_resume(cpu); -- 1.9.3
[Qemu-devel] [PATCH v2 02/11] acpi/cpu: add cpu hot unplug request callback function
From: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- hw/acpi/cpu_hotplug.c | 38 +- include/hw/acpi/cpu_hotplug.h | 4 2 files changed, 37 insertions(+), 5 deletions(-) diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c index 4047294..f8a10d2 100644 --- a/hw/acpi/cpu_hotplug.c +++ b/hw/acpi/cpu_hotplug.c @@ -12,6 +12,11 @@ #include hw/hw.h #include hw/acpi/cpu_hotplug.h +typedef enum STS_OPT { +SET, +CLEAR, +} STS_OPT; + static uint64_t cpu_status_read(void *opaque, hwaddr addr, unsigned int size) { AcpiCpuHotplug *cpus = opaque; @@ -36,8 +41,8 @@ static const MemoryRegionOps AcpiCpuHotplug_ops = { }, }; -static void acpi_set_cpu_present_bit(AcpiCpuHotplug *g, CPUState *cpu, - Error **errp) +static void acpi_update_cpu_present_bit(AcpiCpuHotplug *g, CPUState *cpu, +STS_OPT opt, Error **errp) { CPUClass *k = CPU_GET_CLASS(cpu); int64_t cpu_id; @@ -48,13 +53,23 @@ static void acpi_set_cpu_present_bit(AcpiCpuHotplug *g, CPUState *cpu, return; } -g-sts[cpu_id / 8] |= (1 (cpu_id % 8)); +switch (opt) { +case SET: +g-sts[cpu_id / 8] |= (1 (cpu_id % 8)); +break; +case CLEAR: +g-sts[cpu_id / 8] = ~(1 (cpu_id % 8)); +break; +default: +g_assert(0); +break; +} } void acpi_cpu_plug_cb(ACPIREGS *ar, qemu_irq irq, AcpiCpuHotplug *g, DeviceState *dev, Error **errp) { -acpi_set_cpu_present_bit(g, CPU(dev), errp); +acpi_update_cpu_present_bit(g, CPU(dev), SET, errp); if (*errp != NULL) { return; } @@ -66,13 +81,26 @@ void acpi_cpu_plug_cb(ACPIREGS *ar, qemu_irq irq, } } +void acpi_cpu_unplug_request_cb(ACPIREGS *ar, qemu_irq irq, +AcpiCpuHotplug *g, DeviceState *dev, +Error **errp) +{ +acpi_update_cpu_present_bit(g, CPU(dev), CLEAR, errp); +if (*errp != NULL) { +return; +} + +ar-gpe.sts[0] |= ACPI_CPU_HOTPLUG_STATUS; +acpi_update_sci(ar, irq); +} + void acpi_cpu_hotplug_init(MemoryRegion *parent, Object *owner, AcpiCpuHotplug *gpe_cpu, uint16_t base) { CPUState *cpu; CPU_FOREACH(cpu) { -acpi_set_cpu_present_bit(gpe_cpu, cpu, error_abort); +acpi_update_cpu_present_bit(gpe_cpu, cpu, SET, error_abort); } memory_region_init_io(gpe_cpu-io, owner, AcpiCpuHotplug_ops, gpe_cpu, acpi-cpu-hotplug, ACPI_GPE_PROC_LEN); diff --git a/include/hw/acpi/cpu_hotplug.h b/include/hw/acpi/cpu_hotplug.h index f6d358d..8b15a3d 100644 --- a/include/hw/acpi/cpu_hotplug.h +++ b/include/hw/acpi/cpu_hotplug.h @@ -23,6 +23,10 @@ typedef struct AcpiCpuHotplug { void acpi_cpu_plug_cb(ACPIREGS *ar, qemu_irq irq, AcpiCpuHotplug *g, DeviceState *dev, Error **errp); +void acpi_cpu_unplug_request_cb(ACPIREGS *ar, qemu_irq irq, +AcpiCpuHotplug *g, DeviceState *dev, +Error **errp); + void acpi_cpu_hotplug_init(MemoryRegion *parent, Object *owner, AcpiCpuHotplug *gpe_cpu, uint16_t base); #endif -- 1.9.3
[Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support
This series is based on chen fan's previous i386 cpu hot remove patchset: https://lists.nongnu.org/archive/html/qemu-devel/2013-12/msg04266.html Via implementing ACPI standard methods _EJ0 in ACPI table, after Guest OS remove one vCPU online, the fireware will store removed bitmap to QEMU, then QEMU could know to notify the assigned vCPU of exiting. Meanwhile, intruduce the QOM command 'device_del' to remove vCPU from QEMU itself. The whole work is based on the new hot plug/unplug framework, ,the unplug request callback does the pre-check and send the request, unplug callback does the removal handling. This series depends on tangchen's common hot plug/unplug enhance patchset. [RESEND PATCH v1 0/5] Common unplug and unplug request cb for memory and CPU hot-unplug https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg00429.html The is the second half of the previous series: [RFC V2 00/10] cpu: add device_add foo-x86_64-cpu and i386 cpu hot remove support https://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04779.html If you want to test the series, you need to apply the 'device_add foo-x86_64-cpu' patchset first: [PATCH v3 0/7] cpu: add device_add foo-x86_64-cpu support https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg01552.html --- Changelog since v1: -rebase on the latest version. -delete patch i386/cpu: add instance finalize callback, and put it into patchset [PATCH v3 0/6] cpu: add device_add foo-x86_64-cpu support. Changelog since RFC: -splited the i386 cpu hot remove into single thread. -replaced apic_no with apic_id, so does the related stuff to make it work with arbitrary CPU hotadd. -add the icc_device_unrealize callback to handle apic unrealize. -rework on the new hot plug/unplug platform. --- Chen Fan (2): x86: add x86_cpu_unrealizefn() for cpu apic remove cpu hotplug: implement function cpu_status_write() for vcpu ejection Gu Zheng (5): acpi/cpu: add cpu hot unplug request callback function acpi/piix4: add cpu hot unplug callback support acpi/ich9: add cpu hot unplug support pc: add cpu hot unplug callback support cpus: reclaim allocated vCPU objects Zhu Guihua (4): acpi/piix4: add cpu hot unplug request callback support acpi/ich9: add cpu hot unplug request callback support pc: add cpu hot unplug request callback support acpi/cpu: add cpu hot unplug callback function cpus.c| 44 hw/acpi/cpu_hotplug.c | 88 --- hw/acpi/ich9.c| 17 ++-- hw/acpi/piix4.c | 12 +- hw/core/qdev.c| 2 +- hw/cpu/icc_bus.c | 11 + hw/i386/acpi-dsdt-cpu-hotplug.dsl | 6 ++- hw/i386/kvm/apic.c| 8 hw/i386/pc.c | 62 +-- hw/intc/apic.c| 10 + hw/intc/apic_common.c | 21 ++ include/hw/acpi/cpu_hotplug.h | 8 include/hw/cpu/icc_bus.h | 1 + include/hw/i386/apic_internal.h | 1 + include/hw/qdev-core.h| 1 + include/qom/cpu.h | 9 include/sysemu/kvm.h | 1 + kvm-all.c | 57 - target-i386/cpu.c | 46 19 files changed, 378 insertions(+), 27 deletions(-) -- 1.9.3
[Qemu-devel] [PATCH v2 05/11] pc: add cpu hot unplug request callback support
Add cpu hot unplug request callback support, and a pre-check about the active cpus is added to avoid removing the last cpu. Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com --- hw/i386/pc.c | 36 ++-- 1 file changed, 34 insertions(+), 2 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 9bdec16..dd6e8da 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1694,6 +1694,34 @@ out: error_propagate(errp, local_err); } +static void pc_cpu_unplug_request(HotplugHandler *hotplug_dev, + DeviceState *dev, Error **errp) +{ +HotplugHandlerClass *hhc; +Error *local_err = NULL; +PCMachineState *pcms; + +if (smp_cpus == 1) { +error_setg(local_err, + This is the last cpu, should not be removed!); +goto out; +} + +pcms = PC_MACHINE(hotplug_dev); +if (!pcms-acpi_dev) { +if (dev-hotplugged) { +error_setg(local_err, + cpu hot unplug is not enabled: missing acpi device); +} +goto out; +} + +hhc = HOTPLUG_HANDLER_GET_CLASS(pcms-acpi_dev); +hhc-unplug_request(HOTPLUG_HANDLER(pcms-acpi_dev), dev, local_err); +out: +error_propagate(errp, local_err); +} + static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp) { @@ -1707,8 +1735,12 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev, static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp) { -error_setg(errp, acpi: device unplug request for not supported device -type: %s, object_get_typename(OBJECT(dev))); +if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) { +pc_cpu_unplug_request(hotplug_dev, dev, errp); +} else { +error_setg(errp, acpi: device unplug request for not supported device +type: %s, object_get_typename(OBJECT(dev))); +} } static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev, -- 1.9.3
[Qemu-devel] [PATCH v2 04/11] acpi/ich9: add cpu hot unplug request callback support
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com --- hw/acpi/ich9.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c index c48d176..b795f8f 100644 --- a/hw/acpi/ich9.c +++ b/hw/acpi/ich9.c @@ -304,8 +304,13 @@ void ich9_pm_device_plug_cb(ICH9LPCPMRegs *pm, DeviceState *dev, Error **errp) void ich9_pm_device_unplug_request_cb(ICH9LPCPMRegs *pm, DeviceState *dev, Error **errp) { -error_setg(errp, acpi: device unplug request for not supported device -type: %s, object_get_typename(OBJECT(dev))); +if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) { +acpi_cpu_unplug_request_cb(pm-acpi_regs, pm-irq, + pm-gpe_cpu, dev, errp); +} else { +error_setg(errp, acpi: device unplug request for not supported device +type: %s, object_get_typename(OBJECT(dev))); +} } void ich9_pm_device_unplug_cb(ICH9LPCPMRegs *pm, DeviceState *dev, -- 1.9.3
[Qemu-devel] [PATCH v2 03/11] acpi/piix4: add cpu hot unplug request callback support
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com --- hw/acpi/piix4.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c index 4407388..9b75780 100644 --- a/hw/acpi/piix4.c +++ b/hw/acpi/piix4.c @@ -364,6 +364,8 @@ static void piix4_device_unplug_request_cb(HotplugHandler *hotplug_dev, if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) { acpi_pcihp_device_unplug_cb(s-ar, s-irq, s-acpi_pci_hotplug, dev, errp); +} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) { +acpi_cpu_unplug_request_cb(s-ar, s-irq, s-gpe_cpu, dev, errp); } else { error_setg(errp, acpi: device unplug request for not supported device type: %s, object_get_typename(OBJECT(dev))); -- 1.9.3
[Qemu-devel] [PATCH v3 1/7] cpu: introduce CpuTopoInfo structure for argument simplification
From: Chen Fan chen.fan.f...@cn.fujitsu.com Reviewed-by: Eduardo Habkost ehabk...@redhat.com Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- target-i386/topology.h | 33 + 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/target-i386/topology.h b/target-i386/topology.h index 07a6c5f..e9ff89c 100644 --- a/target-i386/topology.h +++ b/target-i386/topology.h @@ -47,6 +47,12 @@ */ typedef uint32_t apic_id_t; +typedef struct X86CPUTopoInfo { +unsigned pkg_id; +unsigned core_id; +unsigned smt_id; +} X86CPUTopoInfo; + /* Return the bit width needed for 'count' IDs */ static unsigned apicid_bitwidth_for_count(unsigned count) @@ -92,13 +98,11 @@ static inline unsigned apicid_pkg_offset(unsigned nr_cores, unsigned nr_threads) */ static inline apic_id_t apicid_from_topo_ids(unsigned nr_cores, unsigned nr_threads, - unsigned pkg_id, - unsigned core_id, - unsigned smt_id) + const X86CPUTopoInfo *topo) { -return (pkg_id apicid_pkg_offset(nr_cores, nr_threads)) | - (core_id apicid_core_offset(nr_cores, nr_threads)) | - smt_id; +return (topo-pkg_id apicid_pkg_offset(nr_cores, nr_threads)) | + (topo-core_id apicid_core_offset(nr_cores, nr_threads)) | + topo-smt_id; } /* Calculate thread/core/package IDs for a specific topology, @@ -107,14 +111,12 @@ static inline apic_id_t apicid_from_topo_ids(unsigned nr_cores, static inline void x86_topo_ids_from_idx(unsigned nr_cores, unsigned nr_threads, unsigned cpu_index, - unsigned *pkg_id, - unsigned *core_id, - unsigned *smt_id) + X86CPUTopoInfo *topo) { unsigned core_index = cpu_index / nr_threads; -*smt_id = cpu_index % nr_threads; -*core_id = core_index % nr_cores; -*pkg_id = core_index / nr_cores; +topo-smt_id = cpu_index % nr_threads; +topo-core_id = core_index % nr_cores; +topo-pkg_id = core_index / nr_cores; } /* Make APIC ID for the CPU 'cpu_index' @@ -125,10 +127,9 @@ static inline apic_id_t x86_apicid_from_cpu_idx(unsigned nr_cores, unsigned nr_threads, unsigned cpu_index) { -unsigned pkg_id, core_id, smt_id; -x86_topo_ids_from_idx(nr_cores, nr_threads, cpu_index, - pkg_id, core_id, smt_id); -return apicid_from_topo_ids(nr_cores, nr_threads, pkg_id, core_id, smt_id); +X86CPUTopoInfo topo; +x86_topo_ids_from_idx(nr_cores, nr_threads, cpu_index, topo); +return apicid_from_topo_ids(nr_cores, nr_threads, topo); } #endif /* TARGET_I386_TOPOLOGY_H */ -- 1.9.3
[Qemu-devel] [PATCH v3 3/7] qom/cpu: move apic vmstate register into x86_cpu_apic_realize
From: Gu Zheng guz.f...@cn.fujitsu.com move apic vmstate register into x86_cpu_apic_realize, and use cc-get_arch_id as the instance id to avoid using the auto-id which will break the migration if we add device not in order. Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- hw/intc/apic_common.c | 3 +-- include/hw/i386/apic_internal.h | 3 +++ target-i386/cpu.c | 8 +++- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c index d9bb188..719f74d 100644 --- a/hw/intc/apic_common.c +++ b/hw/intc/apic_common.c @@ -380,7 +380,7 @@ static const VMStateDescription vmstate_apic_common_sipi = { } }; -static const VMStateDescription vmstate_apic_common = { +const VMStateDescription vmstate_apic_common = { .name = apic, .version_id = 3, .minimum_version_id = 3, @@ -434,7 +434,6 @@ static void apic_common_class_init(ObjectClass *klass, void *data) ICCDeviceClass *idc = ICC_DEVICE_CLASS(klass); DeviceClass *dc = DEVICE_CLASS(klass); -dc-vmsd = vmstate_apic_common; dc-reset = apic_reset_common; dc-props = apic_properties_common; idc-realize = apic_common_realize; diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h index dc7a89d..00d29e2 100644 --- a/include/hw/i386/apic_internal.h +++ b/include/hw/i386/apic_internal.h @@ -23,6 +23,7 @@ #include exec/memory.h #include hw/cpu/icc_bus.h #include qemu/timer.h +#include migration/vmstate.h /* APIC Local Vector Table */ #define APIC_LVT_TIMER 0 @@ -138,6 +139,8 @@ typedef struct VAPICState { extern bool apic_report_tpr_access; +extern const VMStateDescription vmstate_apic_common; + void apic_report_irq_delivered(int delivered); bool apic_next_timer(APICCommonState *s, int64_t current_time); void apic_enable_tpr_access_reporting(DeviceState *d, bool enable); diff --git a/target-i386/cpu.c b/target-i386/cpu.c index b81ac5c..3406fe8 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -2750,10 +2750,16 @@ static void x86_cpu_apic_create(X86CPU *cpu, Error **errp) static void x86_cpu_apic_realize(X86CPU *cpu, Error **errp) { -if (cpu-apic_state == NULL) { +DeviceState *apic_state = cpu-apic_state; +CPUClass *cc = CPU_GET_CLASS(CPU(cpu)); + +if (apic_state == NULL) { return; } +vmstate_register(0, cc-get_arch_id(CPU(cpu)), + vmstate_apic_common, apic_state); + if (qdev_init(cpu-apic_state)) { error_setg(errp, APIC device '%s' could not be initialized, object_get_typename(OBJECT(cpu-apic_state))); -- 1.9.3
[Qemu-devel] [PATCH v3 6/7] cpu: add device_add foo-x86_64-cpu support
From: Chen Fan chen.fan.f...@cn.fujitsu.com Add support to device_add foo-x86_64-cpu, and additional checks of apic id are added into x86_cpuid_set_apic_id() to avoid duplicate. Besides, in order to support device/device_add foo-x86_64-cpu which without specified apic id, we assign cpuid_apic_id with a default broadcast value (0x) in initfn, and a new function get_free_apic_id() to provide a free apid id to cpuid_apic_id if it still has the default at realize time (e.g. hot add foo-cpu without a specified apic id) to avoid apic id duplicates. Thanks very much for Igor's suggestion. Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- hw/acpi/cpu_hotplug.c | 7 +-- hw/i386/pc.c | 6 -- target-i386/cpu.c | 47 --- target-i386/topology.h | 18 ++ 4 files changed, 67 insertions(+), 11 deletions(-) diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c index b8ebfad..4047294 100644 --- a/hw/acpi/cpu_hotplug.c +++ b/hw/acpi/cpu_hotplug.c @@ -59,8 +59,11 @@ void acpi_cpu_plug_cb(ACPIREGS *ar, qemu_irq irq, return; } -ar-gpe.sts[0] |= ACPI_CPU_HOTPLUG_STATUS; -acpi_update_sci(ar, irq); +/* Only trigger sci if cpu is hotplugged */ +if (dev-hotplugged) { +ar-gpe.sts[0] |= ACPI_CPU_HOTPLUG_STATUS; +acpi_update_sci(ar, irq); +} } void acpi_cpu_hotplug_init(MemoryRegion *parent, Object *owner, diff --git a/hw/i386/pc.c b/hw/i386/pc.c index e07f1fa..006f355 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1678,13 +1678,7 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev, Error *local_err = NULL; PCMachineState *pcms = PC_MACHINE(hotplug_dev); -if (!dev-hotplugged) { -goto out; -} - if (!pcms-acpi_dev) { -error_setg(local_err, - cpu hotplug is not enabled: missing acpi device); goto out; } diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 3406fe8..4347948 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -1705,6 +1705,7 @@ static void x86_cpuid_set_apic_id(Object *obj, Visitor *v, void *opaque, const int64_t max = UINT32_MAX; Error *error = NULL; int64_t value; +X86CPUTopoInfo topo; if (dev-realized) { error_setg(errp, Attempt to set property '%s' on '%s' after @@ -1724,6 +1725,19 @@ static void x86_cpuid_set_apic_id(Object *obj, Visitor *v, void *opaque, return; } +if (value x86_cpu_apic_id_from_index(max_cpus - 1)) { +error_setg(errp, CPU with APIC ID % PRIi64 +is more than MAX APIC ID limits, value); +return; +} + +x86_topo_ids_from_apic_id(smp_cores, smp_threads, value, topo); +if (topo.smt_id = smp_threads || topo.core_id = smp_cores) { +error_setg(errp, CPU with APIC ID % PRIi64 does not match + topology configuration., value); +return; +} + if ((value != cpu-env.cpuid_apic_id) cpu_exists(value)) { error_setg(errp, CPU with APIC ID % PRIi64 exists, value); return; @@ -2178,8 +2192,10 @@ static void x86_cpu_cpudef_class_init(ObjectClass *oc, void *data) { X86CPUDefinition *cpudef = data; X86CPUClass *xcc = X86_CPU_CLASS(oc); +DeviceClass *dc = DEVICE_CLASS(oc); xcc-cpu_def = cpudef; +dc-cannot_instantiate_with_device_add_yet = false; } static void x86_register_cpudef_type(X86CPUDefinition *def) @@ -2188,6 +2204,7 @@ static void x86_register_cpudef_type(X86CPUDefinition *def) TypeInfo ti = { .name = typename, .parent = TYPE_X86_CPU, +.instance_size = sizeof(X86CPU), .class_init = x86_cpu_cpudef_class_init, .class_data = def, }; @@ -2721,12 +2738,29 @@ static void mce_init(X86CPU *cpu) } #ifndef CONFIG_USER_ONLY +static uint32_t get_free_apic_id(void) +{ +int i; + +for (i = 0; i max_cpus; i++) { +uint32_t id = x86_cpu_apic_id_from_index(i); + +if (!cpu_exists(id)) { +return id; +} +} + +return x86_cpu_apic_id_from_index(max_cpus); +} + +#define APIC_ID_NOT_SET (~0U) + static void x86_cpu_apic_create(X86CPU *cpu, Error **errp) { -CPUX86State *env = cpu-env; DeviceState *dev = DEVICE(cpu); APICCommonState *apic; const char *apic_type = apic; +uint32_t apic_id; if (kvm_irqchip_in_kernel()) { apic_type = kvm-apic; @@ -2742,7 +2776,14 @@ static void x86_cpu_apic_create(X86CPU *cpu, Error **errp) object_property_add_child(OBJECT(cpu), apic, OBJECT(cpu-apic_state), NULL); -qdev_prop_set_uint8(cpu-apic_state, id, env-cpuid_apic_id); + +apic_id = object_property_get_int(OBJECT(cpu), apic-id, NULL); +if (apic_id == APIC_ID_NOT_SET) { +apic_id = get_free_apic_id(); +
[Qemu-devel] [PATCH v3 0/7] cpu: add device_add foo-x86_64-cpu support
This series is based on the previous patchset from Chen Fan: https://lists.nongnu.org/archive/html/qemu-devel/2014-05/msg02360.html We try to make cpu hotplug with device_add, and make -device foo-x86_64-cpu available???also we can set apic-id property with command line, if without setting apic-id property, we offer the first unoccupied apic id as the default new apic id. When hotplug cpu with device_add, additional check of APIC ID will be done after cpu object initialization which was different from 'cpu_add' command that check 'ids' at the beginning. The is the first half of the previous series: [RFC V2 00/10] cpu: add device_add foo-x86_64-cpu and i386 cpu hot remove support https://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04779.html --- Changelog since v2: -rebase on latest upstream. -add cpu instance finalize. Changelog since v1: -rebased on latest upstream. -introduce a help function to hide the access to icc_bus. -use a macro ACPI_ID_NOT_SET to replace the magic number(0x). Changelog since RFC: -split out APIC vmstate/QMP-monitor changes into separate patches. -add the handle of the startup cpus(-device foo). -remove duplicated checking about env-cpuid_apic_id. -do actual APIC ID allocation at realize time if it is not set before. -remove the unneeded x86_cpu_cpudef_instance_init(). -split off device_del support out here. --- Chen Fan (2): cpu: introduce CpuTopoInfo structure for argument simplification cpu: add device_add foo-x86_64-cpu support Gu Zheng (5): qom/cpu: move register_vmstate to common CPUClass.realizefn qom/cpu: move apic vmstate register into x86_cpu_apic_realize monitor: use cc-get_arch_id as the cpu index acpi:cpu hotplug: set pcmachine as icc bus' hotplug handler i386/cpu: add instance finalize callback cpus.c | 4 ++- exec.c | 32 - hw/acpi/cpu_hotplug.c | 7 +++-- hw/cpu/icc_bus.c| 5 hw/i386/pc.c| 6 hw/i386/pc_piix.c | 5 hw/i386/pc_q35.c| 5 hw/intc/apic_common.c | 3 +- include/hw/cpu/icc_bus.h| 2 ++ include/hw/i386/apic_internal.h | 3 ++ include/qom/cpu.h | 3 ++ monitor.c | 4 ++- qom/cpu.c | 2 ++ target-i386/cpu.c | 63 ++--- target-i386/topology.h | 51 ++--- 15 files changed, 150 insertions(+), 45 deletions(-) -- 1.9.3
[Qemu-devel] [PATCH v3 7/7] i386/cpu: add instance finalize callback
From: Gu Zheng guz.f...@cn.fujitsu.com Add a func to finalize a cpu's instance. When cpu's device_add failed, and cpu's device_del executed, this func would be invoked. Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- include/qom/cpu.h | 1 + target-i386/cpu.c | 8 2 files changed, 9 insertions(+) diff --git a/include/qom/cpu.h b/include/qom/cpu.h index 936afcd..f663199 100644 --- a/include/qom/cpu.h +++ b/include/qom/cpu.h @@ -305,6 +305,7 @@ struct CPUState { QTAILQ_HEAD(CPUTailQ, CPUState); extern struct CPUTailQ cpus; #define CPU_NEXT(cpu) QTAILQ_NEXT(cpu, node) +#define CPU_REMOVE(cpu) QTAILQ_REMOVE(cpus, cpu, node) #define CPU_FOREACH(cpu) QTAILQ_FOREACH(cpu, cpus, node) #define CPU_FOREACH_SAFE(cpu, next_cpu) \ QTAILQ_FOREACH_SAFE(cpu, cpus, node, next_cpu) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 4347948..4746814 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -2983,6 +2983,13 @@ static void x86_cpu_initfn(Object *obj) } } +static void x86_cpu_finalizefn(Object *obj) +{ +CPUState *cs = CPU(obj); + +CPU_REMOVE(cs); +} + static int64_t x86_cpu_get_arch_id(CPUState *cs) { X86CPU *cpu = X86_CPU(cs); @@ -3095,6 +3102,7 @@ static const TypeInfo x86_cpu_type_info = { .parent = TYPE_CPU, .instance_size = sizeof(X86CPU), .instance_init = x86_cpu_initfn, +.instance_finalize = x86_cpu_finalizefn, .abstract = true, .class_size = sizeof(X86CPUClass), .class_init = x86_cpu_common_class_init, -- 1.9.3
Re: [Qemu-devel] question about live migration with storage
On 14/01/2015 03:41, Zhang Haoyu wrote: Hi, Paolo, what's advantages of drive_mirror over traditional mechanism implemented in block-migration.c ? Why libvirt use drive_mirror instead of traditional iterative mechanism as the default way of live migration with non-shared storage? 1) Being able to choose which block devices are migrated, and whether they are migrated incrementally or not. 2) Finer-grain control the parameters of block migration (dirty bitmap granularity). 3) Block and RAM migration do not share the same socket and thus can more easily be parallelized. Note that 1-2 are not yet supported by libvirt as far as I remember. Paolo
Re: [Qemu-devel] [PATCH] AIO: Reduce number of threads for 32bit hosts
On 14/01/2015 01:56, Alexander Graf wrote: +if (sizeof(pool) == 4) { +/* 32bit systems run out of virtual memory quickly */ +pool-max_threads = 4; +} else { +pool-max_threads = 64; +} Reviewed-by: Paolo Bonzini pbonz...@redhat.com The same problem applies to coroutine stacks, and those cannot be throttled down as easily. But I guess if you limit the number of threads, the guest gets slowed down and doesn't create as many coroutines. It would be nice to have a way to measure coroutine stack usage, similar to what the kernel does. Paolo
[Qemu-devel] [PATCH v2 01/11] x86: add x86_cpu_unrealizefn() for cpu apic remove
From: Chen Fan chen.fan.f...@cn.fujitsu.com Implement x86_cpu_unrealizefn() for corresponding x86_cpu_realizefn(), which is mostly used to clean the apic related allocation and vmstates at here. Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- hw/cpu/icc_bus.c| 11 ++ hw/i386/kvm/apic.c | 8 +++ hw/intc/apic.c | 10 + hw/intc/apic_common.c | 21 --- include/hw/cpu/icc_bus.h| 1 + include/hw/i386/apic_internal.h | 1 + target-i386/cpu.c | 46 + 7 files changed, 90 insertions(+), 8 deletions(-) diff --git a/hw/cpu/icc_bus.c b/hw/cpu/icc_bus.c index f455a20..3e7f62f 100644 --- a/hw/cpu/icc_bus.c +++ b/hw/cpu/icc_bus.c @@ -44,11 +44,22 @@ static void icc_device_realize(DeviceState *dev, Error **errp) } +static void icc_device_unrealize(DeviceState *dev, Error **errp) +{ +ICCDeviceClass *idc = ICC_DEVICE_GET_CLASS(dev); + +/* convert to QOM */ +if (idc-unrealize) { +idc-unrealize(dev, errp); +} +} + static void icc_device_class_init(ObjectClass *oc, void *data) { DeviceClass *dc = DEVICE_CLASS(oc); dc-realize = icc_device_realize; +dc-unrealize = icc_device_unrealize; dc-bus_type = TYPE_ICC_BUS; } diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c index 5b47056..781c296 100644 --- a/hw/i386/kvm/apic.c +++ b/hw/i386/kvm/apic.c @@ -189,11 +189,19 @@ static void kvm_apic_realize(DeviceState *dev, Error **errp) } } +static void kvm_apic_unrealize(DeviceState *dev, Error **errp) +{ +APICCommonState *s = APIC_COMMON(dev); + +object_unparent(OBJECT(s-io_memory)); +} + static void kvm_apic_class_init(ObjectClass *klass, void *data) { APICCommonClass *k = APIC_COMMON_CLASS(klass); k-realize = kvm_apic_realize; +k-unrealize = kvm_apic_unrealize; k-reset = kvm_apic_reset; k-set_base = kvm_apic_set_base; k-set_tpr = kvm_apic_set_tpr; diff --git a/hw/intc/apic.c b/hw/intc/apic.c index 0f97b47..2f3a7a3 100644 --- a/hw/intc/apic.c +++ b/hw/intc/apic.c @@ -889,11 +889,21 @@ static void apic_realize(DeviceState *dev, Error **errp) msi_supported = true; } +static void apic_unrealize(DeviceState *dev, Error **errp) +{ +APICCommonState *s = APIC_COMMON(dev); + +object_unparent(OBJECT(s-io_memory)); +timer_free(s-timer); +local_apics[s-idx] = NULL; +} + static void apic_class_init(ObjectClass *klass, void *data) { APICCommonClass *k = APIC_COMMON_CLASS(klass); k-realize = apic_realize; +k-unrealize = apic_unrealize; k-set_base = apic_set_base; k-set_tpr = apic_set_tpr; k-get_tpr = apic_get_tpr; diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c index 719f74d..912ba64 100644 --- a/hw/intc/apic_common.c +++ b/hw/intc/apic_common.c @@ -299,16 +299,8 @@ static void apic_common_realize(DeviceState *dev, Error **errp) APICCommonState *s = APIC_COMMON(dev); APICCommonClass *info; static DeviceState *vapic; -static int apic_no; static bool mmio_registered; -if (apic_no = MAX_APICS) { -error_setg(errp, %s initialization failed., - object_get_typename(OBJECT(dev))); -return; -} -s-idx = apic_no++; - info = APIC_COMMON_GET_CLASS(s); info-realize(dev, errp); if (!mmio_registered) { @@ -329,6 +321,18 @@ static void apic_common_realize(DeviceState *dev, Error **errp) } +static void apic_common_unrealize(DeviceState *dev, Error **errp) +{ +APICCommonState *s = APIC_COMMON(dev); +APICCommonClass *info = APIC_COMMON_GET_CLASS(s); + +info-unrealize(dev, errp); + +if (apic_report_tpr_access info-enable_tpr_reporting) { +info-enable_tpr_reporting(s, false); +} +} + static int apic_pre_load(void *opaque) { APICCommonState *s = APIC_COMMON(opaque); @@ -437,6 +441,7 @@ static void apic_common_class_init(ObjectClass *klass, void *data) dc-reset = apic_reset_common; dc-props = apic_properties_common; idc-realize = apic_common_realize; +idc-unrealize = apic_common_unrealize; /* * Reason: APIC and CPU need to be wired up by * x86_cpu_apic_create() diff --git a/include/hw/cpu/icc_bus.h b/include/hw/cpu/icc_bus.h index 7e53afb..0b2ddd5 100644 --- a/include/hw/cpu/icc_bus.h +++ b/include/hw/cpu/icc_bus.h @@ -69,6 +69,7 @@ typedef struct ICCDeviceClass { /* public */ DeviceRealize realize; +DeviceUnrealize unrealize; } ICCDeviceClass; #define TYPE_ICC_DEVICE icc-device diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h index 00d29e2..b46a3af 100644 --- a/include/hw/i386/apic_internal.h +++ b/include/hw/i386/apic_internal.h @@ -82,6 +82,7 @@ typedef struct APICCommonClass ICCDeviceClass parent_class;
[Qemu-devel] [PATCH v2 09/11] pc: add cpu hot unplug callback support
From: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- hw/i386/pc.c | 26 -- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index dd6e8da..b7d5712 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1722,6 +1722,24 @@ out: error_propagate(errp, local_err); } +static void pc_cpu_unplug(HotplugHandler *hotplug_dev, + DeviceState *dev, Error **errp) +{ +HotplugHandlerClass *hhc; +Error *local_err = NULL; +PCMachineState *pcms = PC_MACHINE(hotplug_dev); + +hhc = HOTPLUG_HANDLER_GET_CLASS(pcms-acpi_dev); +hhc-unplug(HOTPLUG_HANDLER(pcms-acpi_dev), dev, local_err); +if (local_err) { +error_propagate(errp, local_err); +return; +} + +/* decrement the number of CPUs */ +rtc_set_memory(pcms-rtc, 0x5f, rtc_get_memory(pcms-rtc, 0x5f) - 1); +} + static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp) { @@ -1746,8 +1764,12 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev, static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp) { -error_setg(errp, acpi: device unplug for not supported device -type: %s, object_get_typename(OBJECT(dev))); +if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) { +pc_cpu_unplug(hotplug_dev, dev, errp); +} else { +error_setg(errp, acpi: device unplug for not supported device +type: %s, object_get_typename(OBJECT(dev))); +} } static HotplugHandler *pc_get_hotpug_handler(MachineState *machine, -- 1.9.3
[Qemu-devel] [PATCH v2 11/11] cpus: reclaim allocated vCPU objects
From: Gu Zheng guz.f...@cn.fujitsu.com After ACPI get a signal to eject a vCPU, the vCPU must be removed from CPU list,before the vCPU really removed, then release the all related vCPU objects. In order to deal well with the kvm vcpus (which can not be removed without any protection), we do not close KVM vcpu fd, just record and mark it as stopped into a list, so that we can reuse it for the appending cpu hot-add request if possible. It is also the approach that kvm guys suggested: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- cpus.c | 37 ++ include/sysemu/kvm.h | 1 + kvm-all.c| 57 +++- 3 files changed, 94 insertions(+), 1 deletion(-) diff --git a/cpus.c b/cpus.c index 1c25054..b5b0c91 100644 --- a/cpus.c +++ b/cpus.c @@ -871,6 +871,24 @@ void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data) qemu_cpu_kick(cpu); } +static void qemu_kvm_destroy_vcpu(CPUState *cpu) +{ +CPU_REMOVE(cpu); + +if (kvm_destroy_vcpu(cpu) 0) { +error_report(kvm_destroy_vcpu failed.\n); +exit(EXIT_FAILURE); +} + +object_unparent(OBJECT(cpu)); +} + +static void qemu_tcg_destroy_vcpu(CPUState *cpu) +{ +CPU_REMOVE(cpu); +object_unparent(OBJECT(cpu)); +} + static void flush_queued_work(CPUState *cpu) { struct qemu_work_item *wi; @@ -963,6 +981,11 @@ static void *qemu_kvm_cpu_thread_fn(void *arg) } } qemu_kvm_wait_io_event(cpu); +if (cpu-exit !cpu_can_run(cpu)) { +qemu_kvm_destroy_vcpu(cpu); +qemu_mutex_unlock(qemu_global_mutex); +return NULL; +} } return NULL; @@ -1016,6 +1039,7 @@ static void tcg_exec_all(void); static void *qemu_tcg_cpu_thread_fn(void *arg) { CPUState *cpu = arg; +CPUState *remove_cpu = NULL; qemu_tcg_init_cpu_signals(); qemu_thread_get_self(cpu-thread); @@ -1049,6 +1073,16 @@ static void *qemu_tcg_cpu_thread_fn(void *arg) } } qemu_tcg_wait_io_event(); +CPU_FOREACH(cpu) { +if (cpu-exit !cpu_can_run(cpu)) { +remove_cpu = cpu; +break; +} +} +if (remove_cpu) { +qemu_tcg_destroy_vcpu(remove_cpu); +remove_cpu = NULL; +} } return NULL; @@ -1406,6 +1440,9 @@ static void tcg_exec_all(void) break; } } else if (cpu-stop || cpu-stopped) { +if (cpu-exit) { +next_cpu = CPU_NEXT(cpu); +} break; } } diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index 30cb84d..560caef 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -188,6 +188,7 @@ int kvm_has_intx_set_mask(void); int kvm_init_vcpu(CPUState *cpu); int kvm_cpu_exec(CPUState *cpu); +int kvm_destroy_vcpu(CPUState *cpu); #ifdef NEED_CPU_H diff --git a/kvm-all.c b/kvm-all.c index 2f21a4e..6ea3d70 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -71,6 +71,12 @@ typedef struct KVMSlot typedef struct kvm_dirty_log KVMDirtyLog; +struct KVMParkedVcpu { +unsigned long vcpu_id; +int kvm_fd; +QLIST_ENTRY(KVMParkedVcpu) node; +}; + struct KVMState { AccelState parent_obj; @@ -107,6 +113,7 @@ struct KVMState QTAILQ_HEAD(msi_hashtab, KVMMSIRoute) msi_hashtab[KVM_MSI_HASHTAB_SIZE]; bool direct_msi; #endif +QLIST_HEAD(, KVMParkedVcpu) kvm_parked_vcpus; }; #define TYPE_KVM_ACCEL ACCEL_CLASS_NAME(kvm) @@ -247,6 +254,53 @@ static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot) return kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, mem); } +int kvm_destroy_vcpu(CPUState *cpu) +{ +KVMState *s = kvm_state; +long mmap_size; +struct KVMParkedVcpu *vcpu = NULL; +int ret = 0; + +DPRINTF(kvm_destroy_vcpu\n); + +mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0); +if (mmap_size 0) { +ret = mmap_size; +DPRINTF(KVM_GET_VCPU_MMAP_SIZE failed\n); +goto err; +} + +ret = munmap(cpu-kvm_run, mmap_size); +if (ret 0) { +goto err; +} + +vcpu = g_malloc0(sizeof(*vcpu)); +vcpu-vcpu_id = kvm_arch_vcpu_id(cpu); +vcpu-kvm_fd = cpu-kvm_fd; +QLIST_INSERT_HEAD(kvm_state-kvm_parked_vcpus, vcpu, node); +err: +return ret; +} + +static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id) +{ +struct KVMParkedVcpu *cpu; + +QLIST_FOREACH(cpu, s-kvm_parked_vcpus, node) { +if (cpu-vcpu_id == vcpu_id) { +int kvm_fd; + +QLIST_REMOVE(cpu, node); +kvm_fd = cpu-kvm_fd; +g_free(cpu); +return kvm_fd; +} +} + +return kvm_vm_ioctl(s,
[Qemu-devel] [PATCH v2 07/11] acpi/piix4: add cpu hot unplug callback support
From: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- hw/acpi/piix4.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c index 9b75780..2a923fc 100644 --- a/hw/acpi/piix4.c +++ b/hw/acpi/piix4.c @@ -375,8 +375,14 @@ static void piix4_device_unplug_request_cb(HotplugHandler *hotplug_dev, static void piix4_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp) { -error_setg(errp, acpi: device unplug for not supported device -type: %s, object_get_typename(OBJECT(dev))); +PIIX4PMState *s = PIIX4_PM(hotplug_dev); + +if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) { +acpi_cpu_unplug_cb(s-ar, s-irq, s-gpe_cpu, dev, errp); +} else { +error_setg(errp, acpi: device unplug for not supported device +type: %s, object_get_typename(OBJECT(dev))); +} } static void piix4_update_bus_hotplug(PCIBus *pci_bus, void *opaque) -- 1.9.3
[Qemu-devel] [PATCH v2 06/11] acpi/cpu: add cpu hot unplug callback function
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com --- cpus.c| 7 +++ hw/acpi/cpu_hotplug.c | 8 include/hw/acpi/cpu_hotplug.h | 3 +++ include/qom/cpu.h | 9 + 4 files changed, 27 insertions(+) diff --git a/cpus.c b/cpus.c index d5e35c0..1c25054 100644 --- a/cpus.c +++ b/cpus.c @@ -1205,6 +1205,13 @@ void resume_all_vcpus(void) } } +void cpu_remove(CPUState *cpu) +{ +cpu-stop = true; +cpu-exit = true; +qemu_cpu_kick(cpu); +} + /* For temporary buffers for forming a name */ #define VCPU_THREAD_NAME_SIZE 16 diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c index f8a10d2..36feb6a 100644 --- a/hw/acpi/cpu_hotplug.c +++ b/hw/acpi/cpu_hotplug.c @@ -94,6 +94,14 @@ void acpi_cpu_unplug_request_cb(ACPIREGS *ar, qemu_irq irq, acpi_update_sci(ar, irq); } +void acpi_cpu_unplug_cb(ACPIREGS *ar, qemu_irq irq, AcpiCpuHotplug *g, +DeviceState *dev, Error **errp) +{ +CPUState *cpu = CPU(dev); + +cpu_remove(cpu); +} + void acpi_cpu_hotplug_init(MemoryRegion *parent, Object *owner, AcpiCpuHotplug *gpe_cpu, uint16_t base) { diff --git a/include/hw/acpi/cpu_hotplug.h b/include/hw/acpi/cpu_hotplug.h index 8b15a3d..0f84adb 100644 --- a/include/hw/acpi/cpu_hotplug.h +++ b/include/hw/acpi/cpu_hotplug.h @@ -27,6 +27,9 @@ void acpi_cpu_unplug_request_cb(ACPIREGS *ar, qemu_irq irq, AcpiCpuHotplug *g, DeviceState *dev, Error **errp); +void acpi_cpu_unplug_cb(ACPIREGS *ar, qemu_irq irq, +AcpiCpuHotplug *g, DeviceState *dev, Error **errp); + void acpi_cpu_hotplug_init(MemoryRegion *parent, Object *owner, AcpiCpuHotplug *gpe_cpu, uint16_t base); #endif diff --git a/include/qom/cpu.h b/include/qom/cpu.h index f663199..0592b4d 100644 --- a/include/qom/cpu.h +++ b/include/qom/cpu.h @@ -249,6 +249,7 @@ struct CPUState { bool created; bool stop; bool stopped; +bool exit; volatile sig_atomic_t exit_request; uint32_t interrupt_request; int singlestep_enabled; @@ -613,6 +614,14 @@ void cpu_exit(CPUState *cpu); void cpu_resume(CPUState *cpu); /** + * cpu_remove: + * @cpu: The vCPU to remove. + * + * Requests the CPU @cpu to be removed. + */ +void cpu_remove(CPUState *cpu); + +/** * qemu_init_vcpu: * @cpu: The vCPU to initialize. * -- 1.9.3
[Qemu-devel] [PATCH v2 08/11] acpi/ich9: add cpu hot unplug support
From: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- hw/acpi/ich9.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c index b795f8f..a892e5d 100644 --- a/hw/acpi/ich9.c +++ b/hw/acpi/ich9.c @@ -316,8 +316,12 @@ void ich9_pm_device_unplug_request_cb(ICH9LPCPMRegs *pm, DeviceState *dev, void ich9_pm_device_unplug_cb(ICH9LPCPMRegs *pm, DeviceState *dev, Error **errp) { -error_setg(errp, acpi: device unplug for not supported device -type: %s, object_get_typename(OBJECT(dev))); +if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) { +acpi_cpu_unplug_cb(pm-acpi_regs, pm-irq, pm-gpe_cpu, dev, errp); +} else { +error_setg(errp, acpi: device unplug for not supported device +type: %s, object_get_typename(OBJECT(dev))); +} } void ich9_pm_ospm_status(AcpiDeviceIf *adev, ACPIOSTInfoList ***list) -- 1.9.3
[Qemu-devel] [PATCH v2 10/11] cpu hotplug: implement function cpu_status_write() for vcpu ejection
From: Chen Fan chen.fan.f...@cn.fujitsu.com When OS ejected a vcpu (like: echo 1 /sys/bus/acpi/devices/LNXCPUXX/eject), it would call acpi EJ0 method, the firmware need to write the new cpumap, QEMU would know which vcpu need to be ejected. Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com --- hw/acpi/cpu_hotplug.c | 42 ++- hw/core/qdev.c| 2 +- hw/i386/acpi-dsdt-cpu-hotplug.dsl | 6 +- include/hw/acpi/cpu_hotplug.h | 1 + include/hw/qdev-core.h| 1 + 5 files changed, 49 insertions(+), 3 deletions(-) diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c index 36feb6a..2a56df0 100644 --- a/hw/acpi/cpu_hotplug.c +++ b/hw/acpi/cpu_hotplug.c @@ -17,6 +17,26 @@ typedef enum STS_OPT { CLEAR, } STS_OPT; +static void acpi_eject_vcpu(AcpiCpuHotplug *cpus_status, int64_t cpu_id) +{ +CPUState *cpu; + +CPU_FOREACH(cpu) { +CPUClass *cc = CPU_GET_CLASS(cpu); +int64_t id = cc-get_arch_id(cpu); +HotplugHandler *hotplug_ctrl; + +if (cpu_id == id) { +cpus_status-old_sts[cpu_id / 8] = ~(1 (cpu_id % 8)); +cpus_status-sts[cpu_id / 8] = ~(1 (cpu_id % 8)); + +hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(cpu)); +hotplug_handler_unplug(hotplug_ctrl, DEVICE(cpu), NULL); +break; +} +} +} + static uint64_t cpu_status_read(void *opaque, hwaddr addr, unsigned int size) { AcpiCpuHotplug *cpus = opaque; @@ -28,7 +48,26 @@ static uint64_t cpu_status_read(void *opaque, hwaddr addr, unsigned int size) static void cpu_status_write(void *opaque, hwaddr addr, uint64_t data, unsigned int size) { -/* TODO: implement VCPU removal on guest signal that CPU can be removed */ +AcpiCpuHotplug *cpus = opaque; +uint8_t val; +int i; +int64_t cpu_id = -1; + +val = cpus-old_sts[addr] ^ data; + +if (val == 0) { +return; +} + +for (i = 0; i 8; i++) { +if (val 1 i) { +cpu_id = 8 * addr + i; +} +} + +if (cpu_id != -1) { +acpi_eject_vcpu(cpus, cpu_id); +} } static const MemoryRegionOps AcpiCpuHotplug_ops = { @@ -56,6 +95,7 @@ static void acpi_update_cpu_present_bit(AcpiCpuHotplug *g, CPUState *cpu, switch (opt) { case SET: g-sts[cpu_id / 8] |= (1 (cpu_id % 8)); +g-old_sts[cpu_id / 8] |= (1 (cpu_id % 8)); break; case CLEAR: g-sts[cpu_id / 8] = ~(1 (cpu_id % 8)); diff --git a/hw/core/qdev.c b/hw/core/qdev.c index 901f289..9f08fe6 100644 --- a/hw/core/qdev.c +++ b/hw/core/qdev.c @@ -223,7 +223,7 @@ void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id, dev-alias_required_for_version = required_for_version; } -static HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev) +HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev) { HotplugHandler *hotplug_ctrl = NULL; diff --git a/hw/i386/acpi-dsdt-cpu-hotplug.dsl b/hw/i386/acpi-dsdt-cpu-hotplug.dsl index 34aab5a..9485f12 100644 --- a/hw/i386/acpi-dsdt-cpu-hotplug.dsl +++ b/hw/i386/acpi-dsdt-cpu-hotplug.dsl @@ -50,7 +50,11 @@ Scope(\_SB) { } Method(CPEJ, 2, NotSerialized) { // _EJ0 method - eject callback -Sleep(200) +Store(Zero, Index(CPON, ToInteger(Arg0))) +Store(One, Local0) +ShiftLeft(Local0, Arg0, Local0) +Not(Local0, Local0) +And(PRS, Local0, PRS) } #define CPU_STATUS_LEN ACPI_GPE_PROC_LEN diff --git a/include/hw/acpi/cpu_hotplug.h b/include/hw/acpi/cpu_hotplug.h index 0f84adb..abbb29e 100644 --- a/include/hw/acpi/cpu_hotplug.h +++ b/include/hw/acpi/cpu_hotplug.h @@ -18,6 +18,7 @@ typedef struct AcpiCpuHotplug { MemoryRegion io; uint8_t sts[ACPI_GPE_PROC_LEN]; +uint8_t old_sts[ACPI_GPE_PROC_LEN]; } AcpiCpuHotplug; void acpi_cpu_plug_cb(ACPIREGS *ar, qemu_irq irq, diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index 589bbe7..5b28a4f 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -263,6 +263,7 @@ void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id, void qdev_unplug(DeviceState *dev, Error **errp); void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp); +HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev); void qdev_machine_creation_done(void); bool qdev_machine_modified(void); -- 1.9.3
Re: [Qemu-devel] [PATCH 1/4] target-i386: fix movntsd on big-endian hosts
On 13/01/2015 19:48, Eduardo Habkost wrote: if (b1 1) { -gen_stq_env_A0(s, offsetof(CPUX86State, xmm_regs[reg])); +gen_stq_env_A0(s, offsetof(CPUX86State, + xmm_regs[reg].XMM_Q(0))); Do we have (or will patch 4/4 introduce) the same bug on the tcg_gen_addi_ptr() calls that don't use the XMM_Q macro? No, they all call into helpers that use the XMM_Q macro themselves. Paolo
Re: [Qemu-devel] Patch Round-up for stable 2.1.3, freeze on 2015-01-14
Quoting Marcel Apfelbaum (2015-01-13 12:48:50) On 01/13/2015 07:49 PM, William Dauchy wrote: Hello, On Jan09 23:42, Paolo Bonzini wrote: That's commit 364c3e6b8dd7912e01d19122d791b8c8f6df4f6c on branch uq/master of git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git. maybe the one for: fix regression when reading memory size from config file is also a patch to think about. Definitely, the patch was: [PATCH] vl.c: fix regression when reading memory size from config file + Paolo's fix The thread is: http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg01000.html Be aware that we need both the patch and Paolo's fix. By Paolo's fix you mean: 364c3e6, vl.c: fix regression when reading machine type from config file? I've gone ahead and applied from uq/uq/master, but still waiting on vl.c: fix regression when reading memory size from config file to be picked up by a maintainer. Thanks, Marcel Thanks,
Re: [Qemu-devel] [PATCH v11 00/13] block: Incremental backup series
Minor update to the github version of this patchset that brings it inline with the recent block pull into master. It also drops a duplicate patch that has since made it upstream. Not re-posting yet to allow time for review and critique. --js On 01/12/2015 11:30 AM, John Snow wrote: Welcome to version 11. I hope you are enjoying our regular newsletter. This patchset enables the in-memory part of the incremental backup feature. A patchset by Vladimir Sementsov-Ogievskiy enables the migration of in-memory dirty bitmaps, and a future patchset will enable the storage and retrieval of dirty bitmaps to and from permanent storage. Enough changes have been made that most Reviewed-By lines from previous iterations have been removed. (Sorry!) This series was originally authored by Fam Zheng; his cover letter is included below. ~John Snow = This is the in memory part of the incremental backup feature. With the added commands, we can create a bitmap on a block backend, from which point of time all the writes are tracked by the bitmap, marking sectors as dirty. Later, we call drive-backup and pass the bitmap to it, to do an incremental backup. See the last patch which adds some tests for this use case. Fam = For convenience, this patchset is available on github: https://github.com/jnsnow/qemu/commits/dbm-backup v11: - Instead of copying BdrvDirtyBitmaps and keeping a pointer to the object we were copied from, we instead freeze a bitmap in-place without copying it. On success, we thaw and delete the bitmap. On failure, we merge the bitmap with a successor, which is an anonymous bitmap attached as a child that records writes for us for the duration of the backup operation. This means that incremental backups can NEVER BE RETRIED in a deterministic fashion. If an incremental backup fails on a set of dirty sectors {x}, and a new set of dirty sectors {y} are introduced during the backup, then any possible retry action on an incremental backup can only operate on {x,y}. There is no way to get an incremental backup as it would have been. So, the failure mode for incremental backup is to try again, and the resulting image will simply be a differential from the last successful dirty bitmap backup. - Removed hbitmap_copy and bdrv_dirty_bitmap_copy. - Added a small fixup patch: - Update all granularity fields to be uint64_t. - Update documentation around BdrvDirtyBitmap structure. - Modified transactions to obey frozen attribute of BdrvDirtyBitmaps. - Added frozen attribute to the info query.
Re: [Qemu-devel] [PATCH 1/8] tls: require compiler support for __thread
On 13 January 2015 at 17:52, Paolo Bonzini pbonz...@redhat.com wrote: The block layer is now using __thread unconditionally. Where? I did a quick grep for __thread and (other than stuff in the *-user code and some Win32 specific files) there's no use of __thread outside the DEFINE_TLS wrappers. Remove the fake TLS wrappers (that actually aren't TLS on !Linux) in include/qemu/tls.h, and add a testcase. What platforms have you tested on? We definitely shouldn't widen our use of __thread without testing it on all the platforms we support... thanks -- PMM
Re: [Qemu-devel] Patch Round-up for stable 2.1.3, freeze on 2015-01-14
On 01/13/2015 07:49 PM, William Dauchy wrote: Hello, On Jan09 23:42, Paolo Bonzini wrote: That's commit 364c3e6b8dd7912e01d19122d791b8c8f6df4f6c on branch uq/master of git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git. maybe the one for: fix regression when reading memory size from config file is also a patch to think about. Definitely, the patch was: [PATCH] vl.c: fix regression when reading memory size from config file + Paolo's fix The thread is: http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg01000.html Be aware that we need both the patch and Paolo's fix. Thanks, Marcel Thanks,
Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
Am 13.01.2015 um 17:04 schrieb Stefan Hajnoczi: [...] I'm really starting to get worried that you are going to break things. This DASD hack is a layering violation but okay, go ahead if you want. But now you are also thinking about breaking live migration. The proper thing to do is to introduce libvirt XML syntax for DASD. That way the geometry can be handled as part of the machine configuration. Then live migration and storage management tools can do the right thing. I've said this should be done in libvirt repeatedly but you keep wanting to hack QEMU instead of doing this cleanly :(. If you have plans to expand on this hack, please scrap this series and introduce libvirt XML syntax instead. We had no plans to expand on this band-aid. I just tried to come up with ideas beyond this hack to make this more acceptible to you (because I though that you want something on top). Seems that I misunderstood you, so if you prefer to just keep this hack and not do anything further in that direction, this is totally fine with me. So lets just fix up the small nits and go ahead, ok? Christian
Re: [Qemu-devel] [PATCH 2/4] target-i386: do not memcpy in and out of xmm_regs
On Wed, Jan 07, 2015 at 06:39:13PM +0100, Paolo Bonzini wrote: [...] diff --git a/target-i386/translate.c b/target-i386/translate.c index 5af4300..253009a 100644 --- a/target-i386/translate.c +++ b/target-i386/translate.c @@ -2621,10 +2621,10 @@ static inline void gen_sto_env_A0(DisasContext *s, int offset) static inline void gen_op_movo(int d_offset, int s_offset) { -tcg_gen_ld_i64(cpu_tmp1_i64, cpu_env, s_offset); -tcg_gen_st_i64(cpu_tmp1_i64, cpu_env, d_offset); -tcg_gen_ld_i64(cpu_tmp1_i64, cpu_env, s_offset + 8); -tcg_gen_st_i64(cpu_tmp1_i64, cpu_env, d_offset + 8); +tcg_gen_ld_i64(cpu_tmp1_i64, cpu_env, s_offset + offsetof(XMMReg, XMM_Q(0)); +tcg_gen_st_i64(cpu_tmp1_i64, cpu_env, d_offset + offsetof(XMMReg, XMM_Q(0)); +tcg_gen_ld_i64(cpu_tmp1_i64, cpu_env, s_offset + offsetof(XMMReg, XMM_Q(1)); +tcg_gen_st_i64(cpu_tmp1_i64, cpu_env, d_offset + offsetof(XMMReg, XMM_Q(1)); It looks good (I even sent my Reviewed-by line), but: target-i386/translate.c:2624:88: error: expected ‘)’ before ‘;’ token tcg_gen_ld_i64(cpu_tmp1_i64, cpu_env, s_offset + offsetof(XMMReg, XMM_Q(0)); ^ -- Eduardo
Re: [Qemu-devel] [PATCH v5] block/raw-posix.c: Fixes raw_getlength() on Mac OS X so that it reports the correct length of a real CD
On Jan 13, 2015, at 1:34 PM, Peter Maydell wrote: On 13 January 2015 at 18:26, Programmingkid programmingk...@gmail.com wrote: Allows for using real cdrom disc in QEMU under Mac OS X. This command was used to see if this patch worked: ./qemu-system-i386 -drive if=none,id=drive0,file=/dev/null,format=raw Signed-off-by: John Arbuckle programmingk...@gmail.com --- Replaced -errno with 0 for ioctl() failure return value. make check now passes with this patch. block/raw-posix.c | 18 +- configure |2 +- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/block/raw-posix.c b/block/raw-posix.c index e51293a..16fa0a4 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -1312,7 +1312,23 @@ again: if (size == 0) #endif #if defined(__APPLE__) defined(__MACH__) -size = LLONG_MAX; +{ +uint64_t sectors = 0; +uint32_t sector_size = 0; + +/* Query the number of sectors on the disk */ +ret = ioctl(fd, DKIOCGETBLOCKCOUNT, sectors); +if (ret == -1) { +return 0; We should be falling back to lseek, not returning 0... thanks -- PMM I did try using lseek, but it doesn't work with Mac OS X block devices. It would always fail. Here is some more information on the problem: http://stackoverflow.com/questions/9073614/open-raw-disk-and-get-size-os-x. I thought things would be as simple as using fopen() with a block device as the argument, but Apple decided to make things more difficult. It looks like they are forcing developers to use the IOKit for device communications. Simple posix functions like lseek don't work.
Re: [Qemu-devel] [PATCH 2/2] cpu_ldst.h: Collapse laddr() and saddr() into ldst_get_host_addr()
On 13 January 2015 at 18:32, Peter Maydell peter.mayd...@linaro.org wrote: The macros laddr() and saddr() are always defined to be identical to each other. Replace them with a single macro, and give it a longer name so it's easier to grep the codebase and confirm that it's only used in this one place: ldst_get_host_addr(). Hmm. Actually I think we should be able to drop the _raw() accessors entirely -- they're kind of nonsensical on softmmu and four unnecessary extra characters on linux-user. (The only place we use them in softmmu is in monitor.c where it should be using the ld*_p functions instead.) I'll put together a patchset... -- PMM
Re: [Qemu-devel] [PATCH 1/8] tls: require compiler support for __thread
On 13/01/2015 19:40, Peter Maydell wrote: On 13 January 2015 at 17:52, Paolo Bonzini pbonz...@redhat.com wrote: The block layer is now using __thread unconditionally. Where? I did a quick grep for __thread and (other than stuff in the *-user code and some Win32 specific files) there's no use of __thread outside the DEFINE_TLS wrappers. It's in the pull request. Remove the fake TLS wrappers (that actually aren't TLS on !Linux) in include/qemu/tls.h, and add a testcase. What platforms have you tested on? We definitely shouldn't widen our use of __thread without testing it on all the platforms we support... Native TLS is supported by all platforms except Windows and, I think, OpenBSD, which will have to use GCC's emulated TLS. For Windows we already do. OpenBSD ports will have to use a new-enough GCC (basically depend on the GPLv3 GCC port). Paolo
Re: [Qemu-devel] [PATCH 1/2] cpu_ldst.h: Remove unused ldul_ macros
On 13/01/2015 19:32, Peter Maydell wrote: The five ldul_ macros are not used anywhere and are marked up with an XXX comment. ldul is a non-standard prefix for our family of load instructions: we don't mark 32-bit accesses for signedness because they return a 32 bit quantity. So just delete them. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- include/exec/cpu_ldst.h | 9 - 1 file changed, 9 deletions(-) diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h index e5550e7..4700831 100644 --- a/include/exec/cpu_ldst.h +++ b/include/exec/cpu_ldst.h @@ -151,15 +151,6 @@ #else -/* XXX: find something cleaner. - * Furthermore, this is false for 64 bits targets - */ -#define ldul_user ldl_user -#define ldul_kernel ldl_kernel -#define ldul_hypv ldl_hypv -#define ldul_executive ldl_executive -#define ldul_supervisor ldl_supervisor - /* The memory helpers for tcg-generated code need tcg_target_long etc. */ #include tcg.h Trivial, even. Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Re: [Qemu-devel] [PATCH for 2.3 v2 1/1] xen-hvm: increase maxmem before calling xc_domain_populate_physmap
On 01/13/15 13:07, Stefano Stabellini wrote: On Mon, 12 Jan 2015, Stefano Stabellini wrote: On Wed, 3 Dec 2014, Don Slutz wrote: From: Stefano Stabellini stefano.stabell...@eu.citrix.com Increase maxmem before calling xc_domain_populate_physmap_exact to avoid the risk of running out of guest memory. This way we can also avoid complex memory calculations in libxl at domain construction time. This patch fixes an abort() when assigning more than 4 NICs to a VM. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Signed-off-by: Don Slutz dsl...@verizon.com --- v2: Changes by Don Slutz Switch from xc_domain_getinfo to xc_domain_getinfolist Fix error check for xc_domain_getinfolist Limit increase of maxmem to only do when needed: Add QEMU_SPARE_PAGES (How many pages to leave free) Add free_pages calculation xen-hvm.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/xen-hvm.c b/xen-hvm.c index 7548794..d30e77e 100644 --- a/xen-hvm.c +++ b/xen-hvm.c @@ -90,6 +90,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu) #endif #define BUFFER_IO_MAX_DELAY 100 +#define QEMU_SPARE_PAGES 16 We need a big comment here to explain why we have this parameter and when we'll be able to get rid of it. Other than that the patch is fine. Thanks! Actually I'll just go ahead and add the comment and commit, if for you is OK. That would be fine with me. I was still working on a good wording. -Don Slutz Cheers, Stefano
Re: [Qemu-devel] Fedora FC21 - Bug: 100% CPU and hangs in gettimeofday(tp, NULL); forever
On 12.01.2015 12:41, Gerhard Wiesinger wrote: On 08.01.2015 23:28, Gerhard Wiesinger wrote: I'll keep you up to date in the next days whether it happens again or not. With qemu-kvm 2.2.0 release from the above repository the 100% usage didn't happen so far (although I had to reboot after kernel update). It happens also with qemu-kvm 2.2.0 on another VM where also PostgreSQL is running: (gdb) bt #0 0x7fff9a1feff4 in gettimeofday () #1 0x006d425e in GetCurrentTimestamp () at timestamp.c:1274 What we know: OK : F20: 3.17.6-200.fc20.x86_64 on guest/host, qemu-kvm-1.6.2-10.fc20.x86_64 on host NOK: F21: 3.17.7-300.fc21.x86_64 on guest/host, qemu-kvm-2.1.2-7.fc21.x86_64 on host NOK: F21: 3.17.8-300.fc21.x86_64 on guest/host, qemu-kvm-2.2.0-1.fc21.x86_64 on host No one less can reproduce or has similar problems? Any further ideas? BTW: I'm running ntp in the following manner: internet = ntp server in VM = ntp client on KVM host (firewall runs in KVM) Ciao, Gerhard
Re: [Qemu-devel] [PATCH 0/1] pci: allow 0 address for PCI IO/MEM regions
On Tue, Jan 13, 2015 at 06:01:53PM +0100, Alexander Graf wrote: On 01/13/15 17:17, Michael S. Tsirkin wrote: On Tue, Jan 13, 2015 at 05:54:46PM +0200, Michael S. Tsirkin wrote: I think we already do this for PC: commit 83d08f2673504a299194dcac1657a13754b5932a Author: Michael S. Tsirkin m...@redhat.com Date: Tue Oct 29 13:57:34 2013 +0100 pc: map PCI address space as catchall region for not mapped addresses but we need to find and fix all other targets. BTW this is very easy to test. Add an unused device (like ivshmem) enable BAR, and set it to 0. System should survive, as opposed to hanging. But the big question is whether this is the right thing to do for each platform. For PIIX whatever is not system memory, is PCI. But other boxes might have a different view of the matter. Only few platforms have PCI mapped at 0, no? So in most cases you get windows that simply don't have overlapping at all. Alex What's missing is complete analysis of the problem. Ideally testing on several targets. -- MST
Re: [Qemu-devel] [PATCH 1/8] tls: require compiler support for __thread
On 13 January 2015 at 19:48, Paolo Bonzini pbonz...@redhat.com wrote: It's in the pull request. That was sneaky :-) Native TLS is supported by all platforms except Windows and, I think, OpenBSD, which will have to use GCC's emulated TLS. For Windows we already do. OpenBSD ports will have to use a new-enough GCC (basically depend on the GPLv3 GCC port). Hmm. That's a chunk of users who are now going to have to change the way they've been building QEMU. Does configure at least blow up on the old gcc, or will we just silently build a non-working QEMU? (I have an OpenBSD VM at home, I should test it...) Presumably we've also just effectively dropped support for some of the obscure stuff we have ifdefs for in configure? -- PMM
Re: [Qemu-devel] [PATCH v15 2/2] sPAPR: Implement sPAPRPHBClass::eeh_handler
On Mon, Jan 05, 2015 at 11:26:28AM +1100, Gavin Shan wrote: The patch implements sPAPRPHBClass::eeh_handler so that the EEH RTAS requests can be routed to VFIO for further handling. Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- hw/ppc/spapr_pci_vfio.c | 56 + hw/vfio/common.c| 1 + 2 files changed, 57 insertions(+) diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c index 144912b..73652a9 100644 --- a/hw/ppc/spapr_pci_vfio.c +++ b/hw/ppc/spapr_pci_vfio.c @@ -71,6 +71,61 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp) spapr_tce_get_iommu(tcet)); } +static int spapr_phb_vfio_eeh_handler(sPAPRPHBState *sphb, int req, int opt) +{ +sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); +struct vfio_eeh_pe_op op = { .argsz = sizeof(op) }; This is a local variable, which means it won't be initialized. You never memset() it and it's not obvious that all fields get initialized, which makes it dangerous to pass to an ioctl(). +int cmd; + +switch (req) { +case RTAS_EEH_REQ_SET_OPTION: +switch (opt) { +case RTAS_EEH_DISABLE: +cmd = VFIO_EEH_PE_DISABLE; +break; +case RTAS_EEH_ENABLE: +cmd = VFIO_EEH_PE_ENABLE; +break; +case RTAS_EEH_THAW_IO: +cmd = VFIO_EEH_PE_UNFREEZE_IO; +break; +case RTAS_EEH_THAW_DMA: +cmd = VFIO_EEH_PE_UNFREEZE_DMA; +break; +default: +return -EINVAL; +} +break; +case RTAS_EEH_REQ_GET_STATE: +cmd = VFIO_EEH_PE_GET_STATE; +break; +case RTAS_EEH_REQ_RESET: +switch (opt) { +case RTAS_SLOT_RESET_DEACTIVATE: +cmd = VFIO_EEH_PE_RESET_DEACTIVATE; +break; +case RTAS_SLOT_RESET_HOT: +cmd = VFIO_EEH_PE_RESET_HOT; +break; +case RTAS_SLOT_RESET_FUNDAMENTAL: +cmd = VFIO_EEH_PE_RESET_FUNDAMENTAL; +break; +default: +return -EINVAL; +} +break; +case RTAS_EEH_REQ_CONFIGURE: +cmd = VFIO_EEH_PE_CONFIGURE; +break; +default: + return -EINVAL; +} + +op.op = cmd; +return vfio_container_ioctl(svphb-phb.iommu_as, svphb-iommugroupid, +VFIO_EEH_PE_OP, op); Don't you need some sort of translation from the errnos the ioctl() returns into RTAS error codes? +} + static void spapr_phb_vfio_reset(DeviceState *qdev) { /* Do nothing */ @@ -84,6 +139,7 @@ static void spapr_phb_vfio_class_init(ObjectClass *klass, void *data) dc-props = spapr_phb_vfio_properties; dc-reset = spapr_phb_vfio_reset; spc-finish_realize = spapr_phb_vfio_finish_realize; +spc-eeh_handler = spapr_phb_vfio_eeh_handler; } static const TypeInfo spapr_phb_vfio_info = { diff --git a/hw/vfio/common.c b/hw/vfio/common.c index cf483ff..8a10c8b 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -948,6 +948,7 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid, switch (req) { case VFIO_CHECK_EXTENSION: case VFIO_IOMMU_SPAPR_TCE_GET_INFO: +case VFIO_EEH_PE_OP: break; default: /* Return an error on unknown requests */ -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson pgpjGi7d6REGx.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH v15 1/2] sPAPR: Implement EEH RTAS calls
On Mon, Jan 05, 2015 at 11:26:27AM +1100, Gavin Shan wrote: The emulation for EEH RTAS requests from guest isn't covered by QEMU yet and the patch implements them. The patch defines constants used by EEH RTAS calls and adds callback sPAPRPHBClass::eeh_handler, which is going to be used this way: * RTAS calls are received in spapr_pci.c, sanity check is done there. * RTAS handlers handle what they can. If there is something it cannot handle and sPAPRPHBClass::eeh_handler callback is defined, it is called. * sPAPRPHBClass::eeh_handler is only implemented for VFIO now. It does ioctl() to the IOMMU container fd to complete the call. Error codes from that ioctl() are transferred back to the guest. [aik: defined RTAS tokens for EEH RTAS calls] Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- hw/ppc/spapr_pci.c | 275 include/hw/pci-host/spapr.h | 7 ++ include/hw/ppc/spapr.h | 43 ++- 3 files changed, 323 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index 21b95b3..a150074 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -406,6 +406,262 @@ static void rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu, rtas_st(rets, 2, 1);/* 0 == level; 1 == edge */ } +static int rtas_handle_eeh_request(sPAPREnvironment *spapr, + uint64_t buid, uint32_t req, uint32_t opt) +{ +sPAPRPHBState *sphb = find_phb(spapr, buid); +sPAPRPHBClass *info; + +if (!sphb) { +return -ENODEV; I think it would make more sense to return RTAS error codes here, rather than errnos. At present all the callers seem to ignore the exact value of this return value. But it's not really correct to return RTAS_OUT_HW_ERROR for a bad BUID, which is what this will do now. Also several of the callers have already done a find_phb() by the time they call this. Perhaps it would make more sense for this function to take a sPAPRPHBState * instead of the buid. +} + +info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb); +if (!info-eeh_handler) { +return -ENOENT; +} + +return info-eeh_handler(sphb, req, opt); +} + +static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu, +sPAPREnvironment *spapr, +uint32_t token, uint32_t nargs, +target_ulong args, uint32_t nret, +target_ulong rets) +{ +uint32_t addr, option; +uint64_t buid; +int ret; + +if ((nargs != 4) || (nret != 1)) { +goto param_error_exit; +} + +buid = ((uint64_t)rtas_ld(args, 1) 32) | rtas_ld(args, 2); +addr = rtas_ld(args, 0); +option = rtas_ld(args, 3); +switch (option) { +case RTAS_EEH_ENABLE: +if (!find_dev(spapr, buid, addr)) { +goto param_error_exit; +} +break; +case RTAS_EEH_DISABLE: +case RTAS_EEH_THAW_IO: +case RTAS_EEH_THAW_DMA: So, currently these are all implemented as no-ops. Is that correct? +break; +default: +goto param_error_exit; +} + +ret = rtas_handle_eeh_request(spapr, buid, + RTAS_EEH_REQ_SET_OPTION, option); +if (ret = 0) { +rtas_st(rets, 0, RTAS_OUT_SUCCESS); +} else { +rtas_st(rets, 0, RTAS_OUT_HW_ERROR); +} + +return; + +param_error_exit: +rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR); +} + +static void rtas_ibm_get_config_addr_info2(PowerPCCPU *cpu, + sPAPREnvironment *spapr, + uint32_t token, uint32_t nargs, + target_ulong args, uint32_t nret, + target_ulong rets) +{ +uint32_t addr, option; +uint64_t buid; +sPAPRPHBState *sphb; +sPAPRPHBClass *info; +PCIDevice *pdev; + +if ((nargs != 4) || (nret != 2)) { +goto param_error_exit; +} + +buid = ((uint64_t)rtas_ld(args, 1) 32) | rtas_ld(args, 2); +sphb = find_phb(spapr, buid); +if (!sphb) { +goto param_error_exit; +} + +info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb); +if (!info-eeh_handler) { +goto param_error_exit; +} + +addr = rtas_ld(args, 0); +option = rtas_ld(args, 3); +if (option != RTAS_GET_PE_ADDR option != RTAS_GET_PE_MODE) { +goto param_error_exit; +} + +pdev = find_dev(spapr, buid, addr); +if (!pdev) { +goto param_error_exit; +} + +/* + * For now, we always have bus level PE whose address + * has format 00BBSS00. The guest OS might regard + * PE address 0 as invalid. We avoid that simply by + * extending it with one. +
Re: [Qemu-devel] Fedora FC21 - Bug: 100% CPU and hangs in gettimeofday(tp, NULL); forever
On 01/13/2015 05:01 PM, Gerhard Wiesinger wrote: On 13.01.2015 22:16, Paolo Bonzini wrote: On 13/01/2015 22:14, Gerhard Wiesinger wrote: I also had a look at the kernel code again: http://lxr.free-electrons.com/source/kernel/time/timekeeping.c?v=3.17#L493 499 do { 500 seq = read_seqcount_begin(tk_core.seq); 501 502 ts-tv_sec = tk-xtime_sec; 503 nsecs = timekeeping_get_ns(tk-tkr); 504 505 } while (read_seqcount_retry(tk_core.seq, seq)); So it looks like that the seqcount always changes and therefore loops forever here (as far as I digged it down this is the only loop here). Might be something wrong with the memory barriers in recent qemu-kvm releases? No, that's not possible. Unless you pause/resume or migrate the VM, all of the handling of kvmclock is entirely in the kernel. Any other possible explaination of the problem? Had a look at the diff (I guess the right file at least in qemu tree): # no critical changes IHMO here git diff -u v1.6.2..v2.1.2 ./hw/i386/kvm/clock.c Trying to reproduce with a loop: #include sys/time.h #include stdio.h int main(int argc, char* argv[]) { struct timeval tv; int i = 0; for (;;) { gettimeofday(tv, 0); ++i; if (i = 1000) { i = 0; printf(%i\n, (int)tv.tv_sec); } } return 0; } As I wrote this: First tests seem to run well, so no quick win , I could reproduce it with a stall in 318s :-) (gdb) bt #0 0x7fff6d9fefff in gettimeofday () #1 0x004005ad in main (argc=1, argv=0x7fff6d9b28b8) at gettimeofdayloop.c:10 So we have at least a testcase which is quickly to reproduce. So we are digging down my second findings about a major bug in qemu-kvm :-) Can someone try, too? Ciao, Gerhard Take a look at the following kernel bug. It specifically deals with a hang in gettimeofday() in a KVM guest: https://bugzilla.redhat.com/show_bug.cgi?id=1178975 There is a link to a patched kernel you can try; it fixed my problems (I was repeatedly getting hangs in python-urlgrabber during yum updates on F21).
[Qemu-devel] [PATCH] pseries: Limit PCI host bridge index value
pseries guests can have large numbers of PCI host bridges. To avoid the user having to specify a number of different configuration values for every one, the device supports an index property which is a shorthand setting the various window and configuration addresses from a predefined sensible set. There are some problems with the details at present: * The index propery is signed, but negative values will create PCI windows below where we expect, potentially colliding with other devices * No limit is imposed on the index property and large values can translate to extremely large window addresses. With PCI passthrough in particular this can mean we exceed various mapping and physical address limits causing the guest host bridge to not work in strange ways. This patch addresses this, by making index unsigned, and imposing a limit. Currently the limit allows indices from 0..255 which is probably enough host bridges for the time being. It's fairly easy to extend if we discover we need more. Signed-off-by: David Gibson da...@gibson.dropbear.id.au --- hw/ppc/spapr_pci.c | 8 +++- include/hw/pci-host/spapr.h | 4 +++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index 21b95b3..6deeb19 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -501,6 +501,12 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) return; } +if (sphb-index SPAPR_PCI_MAX_INDEX) { +error_setg(errp, \index\ for PAPR PHB is too large (max %u), + SPAPR_PCI_MAX_INDEX); +return; +} + sphb-buid = SPAPR_PCI_BASE_BUID + sphb-index; sphb-dma_liobn = SPAPR_PCI_BASE_LIOBN + sphb-index; @@ -669,7 +675,7 @@ static void spapr_phb_reset(DeviceState *qdev) } static Property spapr_phb_properties[] = { -DEFINE_PROP_INT32(index, sPAPRPHBState, index, -1), +DEFINE_PROP_UINT32(index, sPAPRPHBState, index, -1), DEFINE_PROP_UINT64(buid, sPAPRPHBState, buid, -1), DEFINE_PROP_UINT32(liobn, sPAPRPHBState, dma_liobn, -1), DEFINE_PROP_UINT64(mem_win_addr, sPAPRPHBState, mem_win_addr, -1), diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h index 4ea2a0d..876ecf0 100644 --- a/include/hw/pci-host/spapr.h +++ b/include/hw/pci-host/spapr.h @@ -64,7 +64,7 @@ typedef struct spapr_pci_msi_mig { struct sPAPRPHBState { PCIHostState parent_obj; -int32_t index; +uint32_t index; uint64_t buid; char *dtbusname; @@ -94,6 +94,8 @@ struct sPAPRPHBVFIOState { int32_t iommugroupid; }; +#define SPAPR_PCI_MAX_INDEX 255 + #define SPAPR_PCI_BASE_BUID 0x8002000ULL #define SPAPR_PCI_WINDOW_BASE0x100ULL -- 2.1.0
Re: [Qemu-devel] question about live migration with storage
On 2015-01-13 17:45:45, Paolo Bonzini wrote: On 13/01/2015 03:03, Zhang Haoyu wrote: I want to live migrate a vm with storage, with regard to the migration of storage, should I use drive_mirror or traditional mechanism implemented in block-migration.c ? Because I don't use libvirtd to manage vm, if I want to use drive_mirror to perform live migration with storage, how to organize the flow via script? The same as libvirt does. Hi, Paolo, what's advantages of drive_mirror over traditional mechanism implemented in block-migration.c ? Why libvirt use drive_mirror instead of traditional iterative mechanism as the default way of live migration with non-shared storage? Thanks, Zhang Haoyu Paolo
Re: [Qemu-devel] [PATCH v3] spapr-pci: Enable huge BARs
On Fri, Jan 09, 2015 at 12:11:14PM +1100, Alexey Kardashevskiy wrote: At the moment sPAPR only supports 512MB window for MMIO BARs. However modern devices might want bigger 64bit BARs. This extends MMIO window from 512MB to 62GB (aligned to SPAPR_PCI_WINDOW_SPACING) and advertises it in 2 records in the PHB ranges property. 32bit gets the space from SPAPR_PCI_MEM_WIN_BUS_OFFSET till the end of 4GB, 64bit gets the rest of the space. The approach changes the device tree which is a guest visible change, however it won't break migration as: 1. we do not support migration to older QEMU versions 2. migration to newer QEMU will migrate the device tree as well and since the new layout only extends the old one and does not change address mappigns, no breakage is expected here too. SLOF change is required to utilize this extension. Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- Changes: v3: * used SPAPR_PCI_WINDOW_SPACING in SPAPR_PCI_MMIO_WIN_SIZE * extended commit log v2: * do not change existing memory layout * do not create another mmio window --- hw/ppc/spapr_pci.c | 8 +++- include/hw/pci-host/spapr.h | 7 --- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index c0d8c1c..7b73106 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -862,6 +862,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, int bus_off, i, j; char nodename[256]; uint32_t bus_range[] = { cpu_to_be32(0), cpu_to_be32(0xff) }; +const uint64_t win32size = (1ULL 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET; struct { uint32_t hi; uint64_t child; @@ -876,7 +877,12 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, { cpu_to_be32(b_ss(2)), cpu_to_be64(SPAPR_PCI_MEM_WIN_BUS_OFFSET), cpu_to_be64(phb-mem_win_addr), -cpu_to_be64(memory_region_size(phb-memwindow)), +cpu_to_be64(win32size), +}, +{ +cpu_to_be32(b_ss(3)), cpu_to_be64(1ULL 32), +cpu_to_be64(phb-mem_win_addr + win32size), +cpu_to_be64(memory_region_size(phb-memwindow) - win32size) }, I think this needs to be a little fancier. It will work in the case of the normal PHB configuration. But the user can override the size of the memory window size, in which case you may need to omit the second range, or even limit the size of the 32-bit range. }; uint64_t bus_reg[] = { cpu_to_be64(phb-buid), 0 }; diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h index 4ea2a0d..92695b6 100644 --- a/include/hw/pci-host/spapr.h +++ b/include/hw/pci-host/spapr.h @@ -96,17 +96,18 @@ struct sPAPRPHBVFIOState { #define SPAPR_PCI_BASE_BUID 0x8002000ULL +#define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x8000ULL + #define SPAPR_PCI_WINDOW_BASE0x100ULL #define SPAPR_PCI_WINDOW_SPACING 0x10ULL #define SPAPR_PCI_MMIO_WIN_OFF 0xA000 -#define SPAPR_PCI_MMIO_WIN_SIZE 0x2000 +#define SPAPR_PCI_MMIO_WIN_SIZE (SPAPR_PCI_WINDOW_SPACING - \ + SPAPR_PCI_MEM_WIN_BUS_OFFSET) #define SPAPR_PCI_IO_WIN_OFF 0x8000 #define SPAPR_PCI_IO_WIN_SIZE0x1 #define SPAPR_PCI_MSI_WINDOW 0x400ULL -#define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x8000ULL - static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin) { return xics_get_qirq(spapr-icp, phb-lsi_table[pin].irq); I think you also should disable the extended window for older machine revisions. It would probably work anyway, but I think it's going to be safer for backwards compat if you keep the windows the same for a given machine revision. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson pgpJv40AShru7.pgp Description: PGP signature
Re: [Qemu-devel] [RFC PATCH] spapr_vio/spapr_iommu: Move VIO bypass where it belongs
On Fri, Jan 09, 2015 at 12:02:51PM +1100, Alexey Kardashevskiy wrote: Instead of tweaking a TCE table device by adding there a bypass flag, let's add an alias to RAM and IOMMU memory region, and enable/disable those according to the selected bypass mode. This way IOMMU memory region can have size of the actual window rather than ram_size which is essential for upcoming DDW support. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- The main reason for this patch is DDW and the fact that sPAPRTCETable used for DMA windows for VFIO. My latest approach removes all DMA windows on the guest reset (and creates a new 32bit one) which means than VFIO unmaps everything and this fails as normally sPAPRTCETable MemoryRegion is ram_size big (to support bypass) while it should be 1-2GB. Paolo already mentioned the only significant problem I see with this, which is migration backwards compat. Otherwise it looks good, sorry I messed it up in the first place :). [snip] @@ -456,14 +469,25 @@ static int spapr_vio_busdev_init(DeviceState *qdev) if (pc-rtce_window_size) { uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev-reg; + +memory_region_init(dev-mrroot, OBJECT(dev), iommu-spapr-root, + ram_size); +memory_region_init_alias(dev-mrbypass, OBJECT(dev), + iommu-spapr-bypass, get_system_memory(), + 0, ram_size); +memory_region_add_subregion_overlap(dev-mrroot, 0, dev-mrbypass, 1); +address_space_init(dev-as, dev-mrroot, qdev-id); + dev-tcet = spapr_tce_new_table(qdev, liobn, 0, SPAPR_TCE_PAGE_SHIFT, pc-rtce_window_size SPAPR_TCE_PAGE_SHIFT, false); -address_space_init(dev-as, spapr_tce_get_iommu(dev-tcet), qdev-id); +memory_region_add_subregion_overlap(dev-mrroot, 0, +spapr_tce_get_iommu(dev-tcet), 2); } + ^^ Tiny nit, extraneous whitespace change. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson pgpggLAYVI6v_.pgp Description: PGP signature