date:20150113

Re: [Qemu-devel] [PATCH] Fixes key mapping so all keys work

2015-01-13 Thread Peter Maydell

On 14 January 2015 at 00:12, Programmingkid programmingk...@gmail.com wrote:
 This patch allows for all the keys on an Apple extended keyboard to work in 
 QEMU.

 Signed-off-by: John Arbuckle programmingk...@gmail.com

 ---
  ui/cocoa.m |   29 -
  1 files changed, 20 insertions(+), 9 deletions(-)

 diff --git a/ui/cocoa.m b/ui/cocoa.m
 index c8535a3..afac987 100644
 --- a/ui/cocoa.m
 +++ b/ui/cocoa.m
 @@ -128,18 +128,26 @@ int keymap[] =
  14, //  51  0x330x0eBKSPQZ_BACKSPACE
  0,  //  52  0x34Undefined
  1,  //  53  0x350x01ESC QZ_ESCAPE
 -220, // 54  0x360xdcE0,5C   R GUI   QZ_RMETA
 -219, // 55  0x370xdbE0,5B   L GUI   QZ_LMETA
 +219,//  54  0x36QZ_RMETA
 +219,//  55  0x37QZ_LMETA

This looks wrong: we shouldn't be mapping right meta and
left meta to the same thing, they're different keys.

  42, //  56  0x380x2aL SHFT  QZ_LSHIFT
  58, //  57  0x390x3aCAPSQZ_CAPSLOCK
  56, //  58  0x3A0x38L ALT   QZ_LALT
  29, //  59  0x3B0x1dL CTRL  QZ_LCTRL
 -54, //  60  0x3C0x36R SHFT  QZ_RSHIFT
 -184,//  61  0x3D0xb8E0,38   R ALT   QZ_RALT
 -157,//  62  0x3E0x9dE0,1D   R CTRL  QZ_RCTRL
 +
 +/*
 + Map right shift, control, option, and command keys to left counterpart
 + to improve compatibility with Mac OS.
 +*/
 +
 +42, //  60  0x3C0x36R SHFT  QZ_RSHIFT
 +56, //  61  0x3D0xb8E0,38   R ALT   QZ_RALT
 +29, //  62  0x3E0x9dE0,1D   R CTRL  QZ_RCTRL

Compatibility with Mac OS as a host or as a guest?
These are different keys, and if we do this then
we won't be able to work correctly with guests that
really do treat them as different.

 +
 +
  0,  //  63  0x3FUndefined
  0,  //  64  0x40Undefined
 -0,  //  65  0x41Undefined
 +0x53,// 65  0x41 Keypad .

Can we be consistent with the other entries, please? SDLi entry
in decimal, and appropriate items in the other columns including
whatever the QZ_ constant is if we can find it.

  0,  //  66  0x42Undefined
  55, //  67  0x430x37KP *QZ_KP_MULTIPLY
  0,  //  68  0x44Undefined
 @@ -150,12 +158,12 @@ int keymap[] =
  0,  //  73  0x49Undefined
  0,  //  74  0x4AUndefined
  181,//  75  0x4B0xb5E0,35   KP /QZ_KP_DIVIDE
 -152,//  76  0x4C0x9cE0,1C   KP EN   QZ_KP_ENTER
 +0xe01c, //152,//  76  0x4C0x9cE0,1C   KP EN   QZ_KP_ENTER

Only entry in the entire table that's not a one-byte value?

  0,  //  77  0x4Dundefined
  74, //  78  0x4E0x4aKP -QZ_KP_MINUS
  0,  //  79  0x4FUndefined
  0,  //  80  0x50Undefined
 -0,  //  81  0x51QZ_KP_EQUALS
 +13, //  81  0x51QZ_KP_EQUALS

The keypad = key and the normal = key ought to have different
values here. Otherwise the guest can't tell them apart.

  82, //  82  0x520x52KP 0QZ_KP0
  79, //  83  0x530x4fKP 1QZ_KP1
  80, //  84  0x540x50KP 2QZ_KP2
 @@ -201,10 +209,13 @@ int keymap[] =
  205,//  124 0x7C0xcde0,4D   R ARROW QZ_RIGHT
  208,//  125 0x7D0xd0E0,50   D ARROW QZ_DOWN
  200,//  126 0x7E0xc8E0,48   U ARROW QZ_UP
 +
  /* completed according to 
 http://www.libsdl.org/cgi/cvsweb.cgi/SDL12/src/video/quartz/SDL_QuartzKeys.h?rev=1.6content-type=text/x-cvsweb-markup
  */

 -/* Additional 104 Key XP-Keyboard Scancodes from 
 http://www.computer-engineering.org/ps2keyboard/scancodes1.html */
 +/* Aditional 104 Key XP-Keyboard Scancodes from 
 http://www.computer-engineering.org/ps2keyboard/scancodes1.html */

This is reintroducing a typo that was fixed back in 2013
(in commit 49b9bd4dc).

  /*
 +219 //  0xdbe0,5b   L GUI
 +220 //  0xdce0,5c   R GUI
  221 //  0xdde0,5d   APPS
  //  E0,2A,E0,37 PRNT SCRN
  //  E1,1D,45,E1,9D,C5   PAUSE

-- PMM

[Qemu-devel] [PATCH] AIO: Reduce number of threads for 32bit hosts

2015-01-13 Thread Alexander Graf

On hosts with limited virtual address space (32bit pointers), we can very
easily run out of virtual memory with big thread pools.

Instead, we should limit ourselves to small pools to keep memory footprint
low on those systems.

This patch fixes random VM stalls like

  (process:25114): GLib-ERROR **: gmem.c:103: failed to allocate 1048576 bytes

on 32bit ARM systems for me.

Signed-off-by: Alexander Graf ag...@suse.de
---
 thread-pool.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/thread-pool.c b/thread-pool.c
index e2cac8e..87a3ea9 100644
--- a/thread-pool.c
+++ b/thread-pool.c
@@ -299,7 +299,12 @@ static void thread_pool_init_one(ThreadPool *pool, 
AioContext *ctx)
 qemu_mutex_init(pool-lock);
 qemu_cond_init(pool-worker_stopped);
 qemu_sem_init(pool-sem, 0);
-pool-max_threads = 64;
+if (sizeof(pool) == 4) {
+/* 32bit systems run out of virtual memory quickly */
+pool-max_threads = 4;
+} else {
+pool-max_threads = 64;
+}
 pool-new_thread_bh = aio_bh_new(ctx, spawn_thread_bh_fn, pool);
 
 QLIST_INIT(pool-head);
-- 
1.7.12.4

Re: [Qemu-devel] [RESEND PATCH 0/2] QEMUSizedBuffer/QEMUFile fixes

2015-01-13 Thread Hongyang Yang


Ping...

在 12/19/2014 11:38 AM, Yang Hongyang 写道:

Rebased to the latest master.

Yang Hongyang (2):
   QEMUSizedBuffer: only free qsb that qemu_bufopen allocated
   Tests: QEMUSizedBuffer/QEMUBuffer

  migration/qemu-file-buf.c | 10 ++
  tests/test-vmstate.c  | 20 
  2 files changed, 14 insertions(+), 16 deletions(-)



--
Thanks,
Yang.

Re: [Qemu-devel] [PATCH v11 09/13] qmp: Add support of dirty-bitmap sync mode for drive-backup

2015-01-13 Thread Fam Zheng

On Tue, 01/13 12:50, John Snow wrote:
 
 
 On 01/13/2015 04:37 AM, Fam Zheng wrote:
 On Mon, 01/12 11:31, John Snow wrote:
 For dirty-bitmap sync mode, the block job will iterate through the
 given dirty bitmap to decide if a sector needs backup (backup all the
 dirty clusters and skip clean ones), just as allocation conditions of
 top sync mode.
 
 Signed-off-by: Fam Zheng f...@redhat.com
 Signed-off-by: John Snow js...@redhat.com
 ---
   block.c   |   5 ++
   block/backup.c| 120 
  ++
   block/mirror.c|   4 ++
   blockdev.c|  14 +-
   hmp.c |   3 +-
   include/block/block.h |   1 +
   include/block/block_int.h |   2 +
   qapi/block-core.json  |  11 +++--
   qmp-commands.hx   |   7 +--
   9 files changed, 137 insertions(+), 30 deletions(-)
 
 diff --git a/block.c b/block.c
 index 3f33b9d..5eb6788 100644
 --- a/block.c
 +++ b/block.c
 @@ -5633,6 +5633,11 @@ static void bdrv_reset_dirty(BlockDriverState *bs, 
 int64_t cur_sector,
   }
   }
 
 +void bdrv_set_dirty_iter(HBitmapIter *hbi, int64_t offset)
 +{
 +hbitmap_iter_init(hbi, hbi-hb, offset);
 
 What's the reason for this indirection? Can caller just use 
 hbitmap_iter_init?
 
 
 Essentially it is just a rename of hbitmap_iter_init to make its usage here
 clear, that we are manually advancing the pointer. How we accomplish that
 (hbitmap_iter_init) is an implementation detail.
 
 Yes, I can just call this directly from block/backup, but at the time I was
 less sure of how I would advance the pointer, so I created a wrapper where I
 could change details as needed, if I needed to.

OK, either is fine for me.

 
 +}
 +
   int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap 
  *bitmap)
   {
   return hbitmap_count(bitmap-bitmap);
 diff --git a/block/backup.c b/block/backup.c
 index 792e655..0626a3e 100644
 --- a/block/backup.c
 +++ b/block/backup.c
 @@ -37,6 +37,8 @@ typedef struct CowRequest {
   typedef struct BackupBlockJob {
   BlockJob common;
   BlockDriverState *target;
 +/* bitmap for sync=dirty-bitmap */
 +BdrvDirtyBitmap *sync_bitmap;
   MirrorSyncMode sync_mode;
   RateLimit limit;
   BlockdevOnError on_source_error;
 @@ -242,6 +244,31 @@ static void backup_complete(BlockJob *job, void 
 *opaque)
   g_free(data);
   }
 
 +static bool coroutine_fn yield_and_check(BackupBlockJob *job)
 +{
 +if (block_job_is_cancelled(job-common)) {
 +return true;
 +}
 +
 +/* we need to yield so that qemu_aio_flush() returns.
 + * (without, VM does not reboot)
 + */
 +if (job-common.speed) {
 +uint64_t delay_ns = ratelimit_calculate_delay(job-limit,
 +  job-sectors_read);
 +job-sectors_read = 0;
 +block_job_sleep_ns(job-common, QEMU_CLOCK_REALTIME, delay_ns);
 +} else {
 +block_job_sleep_ns(job-common, QEMU_CLOCK_REALTIME, 0);
 +}
 +
 +if (block_job_is_cancelled(job-common)) {
 +return true;
 +}
 +
 +return false;
 +}
 +
   static void coroutine_fn backup_run(void *opaque)
   {
   BackupBlockJob *job = opaque;
 @@ -254,13 +281,13 @@ static void coroutine_fn backup_run(void *opaque)
   };
   int64_t start, end;
   int ret = 0;
 +bool error_is_read;
 
   QLIST_INIT(job-inflight_reqs);
   qemu_co_rwlock_init(job-flush_rwlock);
 
   start = 0;
 -end = DIV_ROUND_UP(job-common.len / BDRV_SECTOR_SIZE,
 -   BACKUP_SECTORS_PER_CLUSTER);
 +end = DIV_ROUND_UP(job-common.len, BACKUP_CLUSTER_SIZE);
 
   job-bitmap = hbitmap_alloc(end, 0);
 
 @@ -278,28 +305,45 @@ static void coroutine_fn backup_run(void *opaque)
   qemu_coroutine_yield();
   job-common.busy = true;
   }
 +} else if (job-sync_mode == MIRROR_SYNC_MODE_DIRTY_BITMAP) {
 +/* Dirty Bitmap sync has a slightly different iteration method */
 +HBitmapIter hbi;
 +int64_t sector;
 +int64_t cluster;
 +bool polyrhythmic;
 +
 +bdrv_dirty_iter_init(bs, job-sync_bitmap, hbi);
 +/* Does the granularity happen to match our backup cluster size? */
 +polyrhythmic = (bdrv_dirty_bitmap_granularity(bs, 
 job-sync_bitmap) !=
 +BACKUP_CLUSTER_SIZE);
 +
 +/* Find the next dirty /sector/ and copy that /cluster/ */
 +while ((sector = hbitmap_iter_next(hbi)) != -1) {
 +if (yield_and_check(job)) {
 +goto leave;
 +}
 +cluster = sector / BACKUP_SECTORS_PER_CLUSTER;
 +
 +do {
 +ret = backup_do_cow(bs, cluster * 
 BACKUP_SECTORS_PER_CLUSTER,
 +BACKUP_SECTORS_PER_CLUSTER, 
 error_is_read);
 +if ((ret  0) 
 +backup_error_action(job, error_is_read, -ret) ==
 +

[Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification

2015-01-13 Thread Vladimir Sementsov-Ogievskiy

Persistent dirty bitmaps will be saved into qcow2 files. It may be used
as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
other drives (there may be qcow2 file with zero disk size but with
several dirty bitmaps for other drives).

Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
---
 docs/specs/qcow2.txt | 59 
 1 file changed, 59 insertions(+)

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 121dfc8..b29de40 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -116,6 +116,13 @@ in the description of a field.
 Length of the header structure in bytes. For version 2
 images, the length is always assumed to be 72 bytes.
 
+104 - 107:  nb_dirty_bitmaps
+Number of dirty bitmaps contained in the image
+
+108 - 115:  dirty_bitmaps_offset
+Offset into the image file at which the dirty bitmaps table
+starts. Must be aligned to a cluster boundary.
+
 Directly after the image header, optional sections called header extensions can
 be stored. Each extension has a structure like the following:
 
@@ -360,3 +367,55 @@ Snapshot table entry:
 
 variable:   Padding to round up the snapshot table entry size to the
 next multiple of 8.
+
+
+== Dirty bitmaps ==
+
+The feature supports storing several dirty bitmaps in the qcow2 file.
+
+=== Cluster mapping ===
+
+Dirty bitmaps are stored using a ONE-level structure for the mapping of
+bitmaps to host clusters. There only an L1 table.
+
+The L1 table has a variable size (stored in the Bitmap table entry) and may
+use multiple clusters, however it must be contiguous in the image file.
+
+Given an offset into the bitmap, the offset into the image file can be
+obtained as follows:
+
+offset = l1_table[offset / cluster_size] + (offset % cluster_size)
+
+L1 table entry:
+
+Bit  0 -  61:   Standard cluster descriptor
+
+62 -  63:   Reserved
+
+=== Bitmap table ===
+
+A directory of all bitmaps is stored in the bitmap table, a contiguous area in
+the image file, whose starting offset and length are given by the header fields
+dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap table have
+variable length, depending on the length of name and extra data.
+
+Bitmap table entry:
+
+Byte 0 -  7:Offset into the image file at which the L1 table for the
+bitmap starts. Must be aligned to a cluster boundary.
+
+ 8 - 11:Number of entries in the L1 table of the bitmap
+
+12 - 15:Bitmap granularity in bytes
+
+16 - 23:Bitmap size in sectors
+
+24 - 25:Size of the bitmap name
+
+variable:   The name of the bitmap (not null terminated)
+
+variable:   Padding to round up the bitmap table entry size to the
+next multiple of 8.
+
+The fields size, granularity and name are corresponding with the fields
+in struct BdrvDirtyBitmap.
-- 
1.9.1

Re: [Qemu-devel] Patch Round-up for stable 2.1.3, freeze on 2015-01-14

2015-01-13 Thread William Dauchy

Hello,

On Jan09 23:42, Paolo Bonzini wrote:
 That's commit 364c3e6b8dd7912e01d19122d791b8c8f6df4f6c on branch
 uq/master of git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git.

maybe the one for:
fix regression when reading memory size from config file

is also a patch to think about.

Thanks,
-- 
William


signature.asc
Description: Digital signature

Re: [Qemu-devel] [PATCH for 2.3 v2 1/1] xen-hvm: increase maxmem before calling xc_domain_populate_physmap

2015-01-13 Thread Stefano Stabellini

On Mon, 12 Jan 2015, Stefano Stabellini wrote:
 On Wed, 3 Dec 2014, Don Slutz wrote:
  From: Stefano Stabellini stefano.stabell...@eu.citrix.com
  
  Increase maxmem before calling xc_domain_populate_physmap_exact to
  avoid the risk of running out of guest memory. This way we can also
  avoid complex memory calculations in libxl at domain construction
  time.
  
  This patch fixes an abort() when assigning more than 4 NICs to a VM.
  
  Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
  Signed-off-by: Don Slutz dsl...@verizon.com
  ---
  v2: Changes by Don Slutz
Switch from xc_domain_getinfo to xc_domain_getinfolist
Fix error check for xc_domain_getinfolist
Limit increase of maxmem to only do when needed:
  Add QEMU_SPARE_PAGES (How many pages to leave free)
  Add free_pages calculation
  
   xen-hvm.c | 19 +++
   1 file changed, 19 insertions(+)
  
  diff --git a/xen-hvm.c b/xen-hvm.c
  index 7548794..d30e77e 100644
  --- a/xen-hvm.c
  +++ b/xen-hvm.c
  @@ -90,6 +90,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t 
  *shared_page, int vcpu)
   #endif
   
   #define BUFFER_IO_MAX_DELAY  100
  +#define QEMU_SPARE_PAGES 16
 
 We need a big comment here to explain why we have this parameter and
 when we'll be able to get rid of it.
 
 Other than that the patch is fine.
 
 Thanks!
 

Actually I'll just go ahead and add the comment and commit, if for you
is OK.

Cheers,

Stefano

Re: [Qemu-devel] [PATCH v5] block/raw-posix.c: Fixes raw_getlength() on Mac OS X so that it reports the correct length of a real CD

2015-01-13 Thread Peter Maydell

On 13 January 2015 at 18:26, Programmingkid programmingk...@gmail.com wrote:
 Allows for using real cdrom disc in QEMU under Mac OS X.

 This command was used to see if this patch worked:
 ./qemu-system-i386 -drive if=none,id=drive0,file=/dev/null,format=raw

 Signed-off-by: John Arbuckle programmingk...@gmail.com

 ---
 Replaced -errno with 0 for ioctl() failure return value.
 make check now passes with this patch.


  block/raw-posix.c |   18 +-
  configure |2 +-
  2 files changed, 18 insertions(+), 2 deletions(-)

 diff --git a/block/raw-posix.c b/block/raw-posix.c
 index e51293a..16fa0a4 100644
 --- a/block/raw-posix.c
 +++ b/block/raw-posix.c
 @@ -1312,7 +1312,23 @@ again:
  if (size == 0)
  #endif
  #if defined(__APPLE__)  defined(__MACH__)
 -size = LLONG_MAX;
 +{
 +uint64_t sectors = 0;
 +uint32_t sector_size = 0;
 +
 +/* Query the number of sectors on the disk */
 +ret = ioctl(fd, DKIOCGETBLOCKCOUNT, sectors);
 +if (ret == -1) {
 +return 0;

We should be falling back to lseek, not returning 0...

thanks
-- PMM

[Qemu-devel] [PATCH 6/8] qemu: command line option for dirty bitmaps

2015-01-13 Thread Vladimir Sementsov-Ogievskiy

The patch adds the following command line option:

-dirty-bitmap [option1=val1][,option2=val2]...
Available options are:
name The name for the bitmap (necessary).

file The file to load the bitmap from.

file_id  When specified with 'file' option, then this file will
 be available through this id for other -dirty-bitmap
 options when specified without 'file' option, then it
 is a reference to 'file', specified with another
 -dirty-bitmap option, and it will be used to load the
 bitmap from.

driveThe drive to bind the bitmap to. It should be specified
 as 'id' suboption of one of -drive options. If nor
 'file' neither 'file_id' are specified, then the bitmap
 will be loaded from that drive (internal dirty bitmap).

granularity  The granularity for the bitmap. Not necessary, the
 default value may be used.

enabled  on|off. Default is 'on'. Disabled bitmaps are not
 changing regardless of writes to corresponding drive.

Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
---
 blockdev.c|  38 ++
 include/sysemu/blockdev.h |   1 +
 include/sysemu/sysemu.h   |   1 +
 qemu-options.hx   |  37 +
 vl.c  | 100 ++
 5 files changed, 177 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 209fedd..8a9be08 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -176,6 +176,11 @@ QemuOpts *drive_def(const char *optstr)
 return qemu_opts_parse(qemu_find_opts(drive), optstr, 0);
 }
 
+QemuOpts *dirty_bitmap_def(const char *optstr)
+{
+return qemu_opts_parse(qemu_find_opts(dirty-bitmap), optstr, 0);
+}
+
 QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
 const char *optstr)
 {
@@ -3044,6 +3049,39 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 return head;
 }
 
+QemuOptsList qemu_dirty_bitmap_opts = {
+.name = dirty-bitmap,
+.head = QTAILQ_HEAD_INITIALIZER(qemu_dirty_bitmap_opts.head),
+.desc = {
+{
+.name = name,
+.type = QEMU_OPT_STRING,
+.help = Name of the dirty bitmap,
+},{
+.name = file,
+.type = QEMU_OPT_STRING,
+.help = file name to load the bitmap from,
+},{
+.name = file_id,
+.type = QEMU_OPT_STRING,
+.help = node name to load the bitmap from (or to set id for
+ for file, opened by previous option),
+},{
+.name = drive,
+.type = QEMU_OPT_STRING,
+.help = drive id to bind the bitmap to,
+},{
+.name = granularity,
+.type = QEMU_OPT_NUMBER,
+.help = granularity,
+},{
+.name = enabled,
+.type = QEMU_OPT_BOOL,
+.help = enabled flag (default is 'on'),
+}
+}
+};
+
 QemuOptsList qemu_common_drive_opts = {
 .name = drive,
 .head = QTAILQ_HEAD_INITIALIZER(qemu_common_drive_opts.head),
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index 09d1e30..48cce5c 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -57,6 +57,7 @@ int drive_get_max_devs(BlockInterfaceType type);
 DriveInfo *drive_get_next(BlockInterfaceType type);
 
 QemuOpts *drive_def(const char *optstr);
+QemuOpts *dirty_bitmap_def(const char *optstr);
 QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
 const char *optstr);
 DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 503e5a4..c984a52 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -231,6 +231,7 @@ bool usb_enabled(bool default_usb);
 
 extern QemuOptsList qemu_legacy_drive_opts;
 extern QemuOptsList qemu_common_drive_opts;
+extern QemuOptsList qemu_dirty_bitmap_opts;
 extern QemuOptsList qemu_drive_opts;
 extern QemuOptsList qemu_chardev_opts;
 extern QemuOptsList qemu_device_opts;
diff --git a/qemu-options.hx b/qemu-options.hx
index 10b9568..3a5bfde 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -592,6 +592,43 @@ qemu-system-i386 -hda a -hdb b
 @end example
 ETEXI
 
+DEF(dirty-bitmap, HAS_ARG, QEMU_OPTION_dirty_bitmap,
+-dirty-bitmap name=name[,file=file][,file_id=file_id][,drive=@var{id}]\n
+  [,granularity=granularity][,enabled=on|off]\n,
+QEMU_ARCH_ALL)
+STEXI
+@item -dirty-bitmap @var{option}[,@var{option}[,@var{option}[,...]]]
+@findex -dirty-bitmap
+
+Define a dirty-bitmap. Valid options are:
+
+@table @option
+@item name=@var{name}
+The name of the bitmap. Should be unique per @var{file}/@var{drive} and per
+@var{for_drive}.
+@item file=@var{file}

[Qemu-devel] [PATCH 4/8] block: store persistent dirty bitmaps

2015-01-13 Thread Vladimir Sementsov-Ogievskiy

Persistent dirty bitmaps are the bitmaps, for which the new field
BdrvDirtyBitmap.file is not NULL. We save all persistent dirty bitmaps
owned by BlockDriverState in corresponding bdrv_close().
BdrvDirtyBitmap.file is a BlockDriverState, where we want to save the
bitmap. It may be set in bdrv_dirty_bitmap_set_file() only once.
bdrv_ref/bdrv_unref are used for BdrvDirtyBitmap.file to be sure that
files will be closed and resources will be freed.

Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
---
 block.c   | 43 +++
 include/block/block.h |  3 +++
 2 files changed, 46 insertions(+)

diff --git a/block.c b/block.c
index 2466ba8..7237b95 100644
--- a/block.c
+++ b/block.c
@@ -54,6 +54,7 @@
 struct BdrvDirtyBitmap {
 HBitmap *bitmap;
 BdrvDirtyBitmap *originator;
+BlockDriverState *file;
 int64_t size;
 int64_t granularity;
 char *name;
@@ -1840,6 +1841,7 @@ void bdrv_reopen_abort(BDRVReopenState *reopen_state)
 void bdrv_close(BlockDriverState *bs)
 {
 BdrvAioNotifier *ban, *ban_next;
+BdrvDirtyBitmap *bm, *bm_next;
 
 if (bs-job) {
 block_job_cancel_sync(bs-job);
@@ -1849,6 +1851,15 @@ void bdrv_close(BlockDriverState *bs)
 bdrv_drain_all(); /* in case flush left pending I/O */
 notifier_list_notify(bs-close_notifiers, bs);
 
+/* save and release persistent dirty bitmaps */
+QLIST_FOREACH_SAFE(bm, bs-dirty_bitmaps, list, bm_next) {
+if (bm-file) {
+bdrv_store_dirty_bitmap(bm);
+bdrv_unref(bm-file);
+bdrv_release_dirty_bitmap(bs, bm);
+}
+}
+
 if (bs-drv) {
 if (bs-backing_hd) {
 BlockDriverState *backing_hd = bs-backing_hd;
@@ -5373,6 +5384,29 @@ BdrvDirtyBitmap 
*bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
 return originator;
 }
 
+int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap)
+{
+BlockDriverState *bs = bitmap-file;
+uint8_t *buf;
+uint64_t size;
+assert(bs);
+assert(bs-drv);
+assert(bs-drv-bdrv_dirty_bitmap_store);
+
+size = hbitmap_data_size(bitmap-bitmap, bitmap-size);
+size = (size + 3)  ~3;
+buf = g_malloc(size);
+
+hbitmap_store_data(bitmap-bitmap, buf, 0, bitmap-size);
+
+int res = bs-drv-bdrv_dirty_bitmap_store(bs, buf,
+   bitmap-name,
+   bitmap-size,
+   bitmap-granularity);
+
+g_free(buf);
+return res;
+}
 
 BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
   int granularity,
@@ -5421,6 +5455,15 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, 
BdrvDirtyBitmap *bitmap)
 }
 }
 
+void bdrv_dirty_bitmap_set_file(BdrvDirtyBitmap *bitmap, BlockDriverState 
*file)
+{
+assert(bitmap-file == NULL);
+bitmap-file = file;
+if (file != NULL) {
+bdrv_ref(file);
+}
+}
+
 void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap)
 {
 bitmap-enabled = false;
diff --git a/include/block/block.h b/include/block/block.h
index cb1f28d..0dfefe3 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -442,6 +442,8 @@ BdrvDirtyBitmap *bdrv_copy_dirty_bitmap(BlockDriverState 
*bs,
 BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
BdrvDirtyBitmap *failed);
 void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
+void bdrv_dirty_bitmap_set_file(BdrvDirtyBitmap *bitmap,
+BlockDriverState *file);
 void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs);
@@ -458,6 +460,7 @@ void bdrv_dirty_iter_init(BlockDriverState *bs,
   BdrvDirtyBitmap *bitmap, struct HBitmapIter *hbi);
 void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t offset);
 int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
+int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
-- 
1.9.1

Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.

2015-01-13 Thread Markus Armbruster

Stefan Hajnoczi stefa...@redhat.com writes:

 On Tue, Jan 13, 2015 at 11:51:51AM +0100, Markus Armbruster wrote:
 Christian Borntraeger borntrae...@de.ibm.com writes:
 
  Am 02.01.2015 um 13:57 schrieb Stefan Hajnoczi:
  On Thu, Dec 18, 2014 at 04:59:50PM +0100, Christian Borntraeger wrote:
  Are you ok with the patches? If yes, can you take care of these
  patches in the block tree?
  
  This series looks close, I've left comments on the patches.
 
  OK, so you would take care of the series in your tree if the next spin
  addresses your comments, right?
  
  The series is fine for command-line QEMU users where probing makes the
  command-line more convenient, so we can merge it.  But the approach is
  fundamentally wrong for stacks where libvirt is in use.
  Libvirt is unaware of the guest geometry and block sizes that are probed
  in QEMU by this patch series.  This breaks non-shared storage migration
  and also means libvirt-based tools that manipulate drives on a guest may
  inadvertently change the guest-visible geometry and cause disk problems.
  
  For example, what happens when you copy the disk image off a host DASD
  and onto NFS?  QEMU no longer probes the geometry and the disk geometry
  has changed.
  
  The right place to tackle guest-visible geometry is in libvirt, not in
  QEMU, because it is guest state the needs to be captured in domain XML
  so that migration and tooling can preserve it when manipulating guests.
  
  Stefan
  
 
  I agree that this is not perfect and has obvious holes, but  it works 
  just fine with libvirt stacks that are typical on s390 (*). scsi disks
  (emulated and real) are not affected and work just fine. DASD disks are
  special anyway - please consider this as some kind of real HW pass-through
  even when we use virtio-blk. We can assume that admins can provide
  shared access between source and target. If you look at the real HW (LPAR
  and z/VM) DASDs are always attached via fibre channel (FICON protocol)
  (often with SAN switches) so shared access is quite common even over longer
  distances.
 
 I wouldn't quite call it pass-through, but I agree that DASDs are
 special.
 
  And yes, using NFS or image file will break unless the user specifies the
  geometry in libvirt. Setting these values is possible as of today
  in libvirts
  XML. But what programmatic way should libvirt use to detect that
  information
  itself? On an image file libvirt doesnt know either. This would be somewhat
  possible on an upper level like open stack and that upper level would then
  need to talk with the DS8000 storage subsystems and the system z
  hardware but
  even then it would fail when creating new DASD like images.  (This
  would boil
  down to provide an interface like create an image file that looks like as
  DASD of type 3390/xx formatted with dasdfmt and the following parameters).
 
 Disk geometry is really a device model property.  For historical
 reasons, we conjure up default geometries in the block layer.  For
 convenience, we special-case DASDs (this series).
 
 Very few guests care about device geometry.
 
 If your guest does, relying on default geometry has always exposed you
 to the risk of an unwanted geometry change.  So far largely theoretical,
 as the default geometry guessing algorithm has been unlikely to change
 incompatibily.  To eliminate the risk, specify the geometry explicitly.
 With libvirt, that means putting it into domain XML.
 
 This series changes *does* change the guessing algorithm, but for DASDs
 only.  Thus, the risk becomes real, but for users of DASDs only.  Since
 said users of DASDs ask for it...  let them have rope, I say.
 
 Libvirt could default the geometry to whatever QEMU comes up with, then
 fix it forever in domain XML.  Doubtful whether it's worth the bother,
 as very few guests care, and their users are probably (painfully) aware
 of geometry pitfalls anyway.
 
  If we talk about cloud/openstack etc we do not have pass through
  devices anyway
  and only image files. If somebody asks me, I would create all
  image files as
  SCSI images anyway, all the DASD stuff makes sense if you have
  DASD hardware
  (or if you really know what you are going to do).
 
  What is your opinion on 
  a) migrating disk properties like geometry
  b) comparing the detected disk properties and reject migration if they
 dont match?
  Both changes  should provide a safety net and could be added at a later
  point in time.
 
 If I remember correctly I suggested to investigate in that direction,
 but then we concluded it wasn't worth the bother.

 Migrating disk geometry violates how live migration works:

 1. Run-time device state like device registers and memory contents are
live-migrated by QEMU.

 2. Machine configuration like the list of emulated devices and
read-only properties on those devices are not live migrated by QEMU.

Correct.  Frontend configuration could be migrated, but we don't.

 Disk geometry is a property of the

[Qemu-devel] [PATCH 4/8] rcu: allow nesting of rcu_read_lock/rcu_read_unlock

2015-01-13 Thread Paolo Bonzini

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 include/qemu/rcu.h | 15 ++-
 tests/rcutorture.c |  2 ++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index 91b3a5c..e3d9b63 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -76,6 +76,9 @@ struct rcu_reader_data {
 unsigned long ctr;
 bool waiting;
 
+/* Data used by reader only */
+unsigned depth;
+
 /* Data used for registry, protected by rcu_gp_lock */
 QLIST_ENTRY(rcu_reader_data) node;
 };
@@ -85,8 +88,13 @@ extern __thread struct rcu_reader_data rcu_reader;
 static inline void rcu_read_lock(void)
 {
 struct rcu_reader_data *p_rcu_reader = rcu_reader;
+unsigned ctr;
+
+if (p_rcu_reader-depth++  0) {
+return;
+}
 
-unsigned ctr = atomic_read(rcu_gp_ctr);
+ctr = atomic_read(rcu_gp_ctr);
 atomic_xchg(p_rcu_reader-ctr, ctr);
 if (atomic_read(p_rcu_reader-waiting)) {
 atomic_set(p_rcu_reader-waiting, false);
@@ -98,6 +106,11 @@ static inline void rcu_read_unlock(void)
 {
 struct rcu_reader_data *p_rcu_reader = rcu_reader;
 
+assert(p_rcu_reader-depth != 0);
+if (--p_rcu_reader-depth  0) {
+return;
+}
+
 atomic_xchg(p_rcu_reader-ctr, 0);
 if (atomic_read(p_rcu_reader-waiting)) {
 atomic_set(p_rcu_reader-waiting, false);
diff --git a/tests/rcutorture.c b/tests/rcutorture.c
index cda7458..a6ac2a9 100644
--- a/tests/rcutorture.c
+++ b/tests/rcutorture.c
@@ -257,9 +257,11 @@ static void *rcu_read_stress_test(void *arg)
 if (p-mbtest == 0) {
 n_mberror++;
 }
+rcu_read_lock();
 for (i = 0; i  100; i++) {
 garbage++;
 }
+rcu_read_unlock();
 pc = p-pipe_count;
 rcu_read_unlock();
 if ((pc  RCU_STRESS_PIPE_LEN) || (pc  0)) {
-- 
1.8.3.1

[Qemu-devel] [PATCH 7/8] memory: protect current_map by RCU

2015-01-13 Thread Paolo Bonzini

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 include/exec/memory.h |  5 +
 memory.c  | 54 ++-
 2 files changed, 28 insertions(+), 31 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 0cd96b1..06ffa1d 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -33,6 +33,7 @@
 #include qemu/notify.h
 #include qapi/error.h
 #include qom/object.h
+#include qemu/rcu.h
 
 #define MAX_PHYS_ADDR_SPACE_BITS 62
 #define MAX_PHYS_ADDR(((hwaddr)1  MAX_PHYS_ADDR_SPACE_BITS) - 1)
@@ -207,9 +208,13 @@ struct MemoryListener {
  */
 struct AddressSpace {
 /* All fields are private. */
+struct rcu_head rcu;
 char *name;
 MemoryRegion *root;
+
+/* Accessed via RCU.  */
 struct FlatView *current_map;
+
 int ioeventfd_nb;
 struct MemoryRegionIoeventfd *ioeventfds;
 struct AddressSpaceDispatch *dispatch;
diff --git a/memory.c b/memory.c
index 8c3d8c0..a844ced 100644
--- a/memory.c
+++ b/memory.c
@@ -33,26 +33,12 @@ static bool memory_region_update_pending;
 static bool ioeventfd_update_pending;
 static bool global_dirty_log = false;
 
-/* flat_view_mutex is taken around reading as-current_map; the critical
- * section is extremely short, so I'm using a single mutex for every AS.
- * We could also RCU for the read-side.
- *
- * The BQL is taken around transaction commits, hence both locks are taken
- * while writing to as-current_map (with the BQL taken outside).
- */
-static QemuMutex flat_view_mutex;
-
 static QTAILQ_HEAD(memory_listeners, MemoryListener) memory_listeners
 = QTAILQ_HEAD_INITIALIZER(memory_listeners);
 
 static QTAILQ_HEAD(, AddressSpace) address_spaces
 = QTAILQ_HEAD_INITIALIZER(address_spaces);
 
-static void memory_init(void)
-{
-qemu_mutex_init(flat_view_mutex);
-}
-
 typedef struct AddrRange AddrRange;
 
 /*
@@ -242,6 +228,7 @@ struct FlatRange {
  * order.
  */
 struct FlatView {
+struct rcu_head rcu;
 unsigned ref;
 FlatRange *ranges;
 unsigned nr;
@@ -654,10 +641,10 @@ static FlatView *address_space_get_flatview(AddressSpace 
*as)
 {
 FlatView *view;
 
-qemu_mutex_lock(flat_view_mutex);
-view = as-current_map;
+rcu_read_lock();
+view = atomic_rcu_read(as-current_map);
 flatview_ref(view);
-qemu_mutex_unlock(flat_view_mutex);
+rcu_read_unlock();
 return view;
 }
 
@@ -766,10 +753,9 @@ static void address_space_update_topology(AddressSpace *as)
 address_space_update_topology_pass(as, old_view, new_view, false);
 address_space_update_topology_pass(as, old_view, new_view, true);
 
-qemu_mutex_lock(flat_view_mutex);
-flatview_unref(as-current_map);
-as-current_map = new_view;
-qemu_mutex_unlock(flat_view_mutex);
+/* Writes are protected by the BQL.  */
+atomic_rcu_set(as-current_map, new_view);
+call_rcu(old_view, flatview_unref, rcu);
 
 /* Note that all the old MemoryRegions are still alive up to this
  * point.  This relieves most MemoryListeners from the need to
@@ -1957,10 +1943,6 @@ void memory_listener_unregister(MemoryListener *listener)
 
 void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
 {
-if (QTAILQ_EMPTY(address_spaces)) {
-memory_init();
-}
-
 memory_region_transaction_begin();
 as-root = root;
 as-current_map = g_new(FlatView, 1);
@@ -1974,15 +1956,10 @@ void address_space_init(AddressSpace *as, MemoryRegion 
*root, const char *name)
 memory_region_transaction_commit();
 }
 
-void address_space_destroy(AddressSpace *as)
+static void do_address_space_destroy(AddressSpace *as)
 {
 MemoryListener *listener;
 
-/* Flush out anything from MemoryListeners listening in on this */
-memory_region_transaction_begin();
-as-root = NULL;
-memory_region_transaction_commit();
-QTAILQ_REMOVE(address_spaces, as, address_spaces_link);
 address_space_destroy_dispatch(as);
 
 QTAILQ_FOREACH(listener, memory_listeners, link) {
@@ -1994,6 +1971,21 @@ void address_space_destroy(AddressSpace *as)
 g_free(as-ioeventfds);
 }
 
+void address_space_destroy(AddressSpace *as)
+{
+/* Flush out anything from MemoryListeners listening in on this */
+memory_region_transaction_begin();
+as-root = NULL;
+memory_region_transaction_commit();
+QTAILQ_REMOVE(address_spaces, as, address_spaces_link);
+
+/* At this point, as-dispatch and as-current_map are dummy
+ * entries that the guest should never use.  Wait for the old
+ * values to expire before freeing the data.
+ */
+call_rcu(as, do_address_space_destroy, rcu);
+}
+
 bool io_mem_read(MemoryRegion *mr, hwaddr addr, uint64_t *pval, unsigned size)
 {
 return memory_region_dispatch_read(mr, addr, pval, size);
-- 
1.8.3.1

Re: [Qemu-devel] Question regarding two variables in qemu migration code

2015-01-13 Thread Dr. David Alan Gilbert

* Jidong Xiao (jidong.x...@gmail.com) wrote:
 On Tue, Jan 13, 2015 at 1:38 AM, Dr. David Alan Gilbert
 dgilb...@redhat.com wrote:
  * Jidong Xiao (jidong.x...@gmail.com) wrote:
  Hi,
 
  Hi,
 
  I am looking at the qemu source code, and trying to understand the
  migration part. In arch_init.c, there are two variables which seems
  quite confusing to me,
 
  They are:
 
  static uint64_t migration_dirty_pages;
 
  'migration_dirty_pages' is the number of pages that are currently known
  that need to be sent to the destination; it goes down whenever we send
  a page, but goes up when we sync the dirty bitmap that tells us that
  something changed the data in the page (see migration_bitmap_sync_range )
 
  static int64_t num_dirty_pages_period; // defined in function
  migration_bitmap_sync()
 
  This is looking how many pages we've noticed are now dirty within
  a particular time - to try and get an estimate of how fast memory is 
  changing
  If you see migration_bitmap_sync  has an:
 if (end_time  start_time + 1000) {
 
  and inside there it uses num_dirty_pages_period to update dirty_pages_rate.
 
 
  Can anyone kindly explain that what does these two variables mean? Thanks.
 
  -Jidong
 
  Dave
  --
 Thanks Dave, your explanation is really really helpful.
 
 But in function migration_bitmap_sync(), I see this:
 
 num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
 
 If as you said, num_dirty_pages_period refers to the pages get dirty
 within a particular time, then why it is +=???, instead of =? i.e.,
 something like this:
 
 num_dirty_pages_period = migration_dirty_pages - num_dirty_pages_init;
 
 I just don't see why num_dirty_pages_period has to be accumulated with
 its previous value.

It's because there are two periods:
   migration_dirty_pages - num_dirty_pages_init
 is the number of pages found dirty in this call to 
migration_bitmap_sync

   num_dirty_pages_period
 is the number of pages found dirty in a ~1 second periods, as 
tested by
 the code in migration_bitmap_sync after the:
 if (end_time  start_time + 1000) {

migration_bitmap_sync is called multiple times within that ~1 second period, and
num_dirty_pages_period accumulates across the calls.

Dave

 
 -Jidong
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [PATCH 0/8] block: persistent dirty bitmaps (RFC)

2015-01-13 Thread Vladimir Sementsov-Ogievskiy

The bitmaps are saved into qcow2 file format. It provides both
'internal' and 'external' dirty bitmaps feature:
 - for qcow2 drives we can store bitmaps in the same file
 - for other formats we can store bitmaps in the separate qcow2 file

QCow2 header is extended by fields 'nb_dirty_bitmaps' and
'dirty_bitmaps_offset' like with snapshots.

Proposed command line syntax is the following:

-dirty-bitmap [option1=val1][,option2=val2]...
Available options are:
name The name for the bitmap (necessary).

file The file to load the bitmap from.

file_id  When specified with 'file' option, then this file will
 be available through this id for other -dirty-bitmap
 options when specified without 'file' option, then it
 is a reference to 'file', specified with another
 -dirty-bitmap option, and it will be used to load the
 bitmap from.

driveThe drive to bind the bitmap to. It should be specified
 as 'id' suboption of one of -drive options. If nor
 'file' neither 'file_id' are specified, then the bitmap
 will be loaded from that drive (internal dirty bitmap).

granularity  The granularity for the bitmap. Not necessary, the
 default value may be used.

enabled  on|off. Default is 'on'. Disabled bitmaps are not
 changing regardless of writes to corresponding drive.

Examples:

qemu -drive file=a.qcow2,id=disk -dirty-bitmap name=b,drive=disk
qemu -drive file=a.raw,id=disk \
 -dirty-bitmap name=b,drive=disk,file=b.qcow2,enabled=off

Vladimir Sementsov-Ogievskiy (8):
  spec: add qcow2-dirty-bitmaps specification
  hbitmap: store / restore
  qcow2: add dirty-bitmaps feature
  block: store persistent dirty bitmaps
  block: add bdrv_load_dirty_bitmap
  qemu: command line option for dirty bitmaps
  qmp: print dirty bitmap
  iotests: test internal persistent dirty bitmap

 block.c| 113 ++
 block/Makefile.objs|   2 +-
 block/qcow2-dirty-bitmap.c | 514 +
 block/qcow2.c  |  26 +++
 block/qcow2.h  |  48 +
 blockdev.c |  51 +
 docs/specs/qcow2.txt   |  59 ++
 hmp-commands.hx|  15 ++
 hmp.c  |   8 +
 hmp.h  |   1 +
 include/block/block.h  |   9 +
 include/block/block_int.h  |  10 +
 include/qemu/hbitmap.h |  49 +
 include/sysemu/blockdev.h  |   1 +
 include/sysemu/sysemu.h|   1 +
 qapi-schema.json   |   3 +-
 qapi/block-core.json   |   3 +
 qemu-options.hx|  37 
 qmp-commands.hx|   5 +
 tests/qemu-iotests/115 |  96 +
 tests/qemu-iotests/115.out |  64 ++
 tests/qemu-iotests/group   |   1 +
 util/hbitmap.c |  87 
 vl.c   | 100 +
 24 files changed, 1301 insertions(+), 2 deletions(-)
 create mode 100644 block/qcow2-dirty-bitmap.c
 create mode 100755 tests/qemu-iotests/115
 create mode 100644 tests/qemu-iotests/115.out

-- 
1.9.1

Re: [Qemu-devel] [PATCH 0/1] pci: allow 0 address for PCI IO/MEM regions

2015-01-13 Thread Alexander Graf


On 01/13/15 17:17, Michael S. Tsirkin wrote:

On Tue, Jan 13, 2015 at 05:54:46PM +0200, Michael S. Tsirkin wrote:

I think we already do this for PC:
commit 83d08f2673504a299194dcac1657a13754b5932a
Author: Michael S. Tsirkin m...@redhat.com
Date:   Tue Oct 29 13:57:34 2013 +0100

pc: map PCI address space as catchall region for not mapped 
addresses

but we need to find and fix all other targets.

BTW this is very easy to test.
Add an unused device (like ivshmem) enable BAR, and set it to 0.
System should survive, as opposed to hanging.

But the big question is whether this is the right
thing to do for each platform.
For PIIX whatever is not system memory, is PCI.
But other boxes might have a different view of the matter.


Only few platforms have PCI mapped at 0, no? So in most cases you get 
windows that simply don't have overlapping at all.



Alex

[Qemu-devel] [PATCH 1/1] Print PID and time in stderr traces

2015-01-13 Thread Dr. David Alan Gilbert (git)

From: Dr. David Alan Gilbert dgilb...@redhat.com

When debugging migration it's useful to know the PID of
each trace message so you can figure out if it came from the source
or the destination.

Printing the time makes it easy to do latency measurements or timings
between trace points.

Signed-off-by: Dr. David Alan Gilbert dgilb...@redhat.com
---
 scripts/tracetool/backend/stderr.py | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/scripts/tracetool/backend/stderr.py 
b/scripts/tracetool/backend/stderr.py
index 2a1e906..16878a3 100644
--- a/scripts/tracetool/backend/stderr.py
+++ b/scripts/tracetool/backend/stderr.py
@@ -21,6 +21,9 @@ PUBLIC = True
 
 def generate_h_begin(events):
 out('#include stdio.h',
+'#include sys/time.h',
+'#include sys/types.h',
+'#include unistd.h',
 '#include trace/control.h',
 '')
 
@@ -31,7 +34,12 @@ def generate_h(event):
 argnames = ,  + argnames
 
 out('if (trace_event_get_state(%(event_id)s)) {',
-'fprintf(stderr, %(name)s  %(fmt)s \\n %(argnames)s);',
+'struct timeval _now;',
+'gettimeofday(_now, NULL);',
+'fprintf(stderr, %%d@%%zd.%%zd:%(name)s  %(fmt)s \\n,',
+'getpid(),',
+'(size_t)_now.tv_sec, (size_t)_now.tv_usec',
+'%(argnames)s);',
 '}',
 event_id=TRACE_ + event.name.upper(),
 name=event.name,
-- 
2.1.0

[Qemu-devel] [PATCH 8/9] balloon: Factor out common is balloon active test

2015-01-13 Thread Markus Armbruster

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 balloon.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/balloon.c b/balloon.c
index 2884c2d..aa30617 100644
--- a/balloon.c
+++ b/balloon.c
@@ -36,6 +36,19 @@ static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
 static void *balloon_opaque;
 
+static int have_ballon(Error **errp)
+{
+if (kvm_enabled()  !kvm_has_sync_mmu()) {
+error_set(errp, QERR_KVM_MISSING_CAP, synchronous MMU, balloon);
+return 0;
+}
+if (!balloon_event_fn) {
+error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon);
+return 0;
+}
+return 1;
+}
+
 int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
  QEMUBalloonStatus *stat_func, void *opaque)
 {
@@ -66,13 +79,7 @@ BalloonInfo *qmp_query_balloon(Error **errp)
 {
 BalloonInfo *info;
 
-if (kvm_enabled()  !kvm_has_sync_mmu()) {
-error_set(errp, QERR_KVM_MISSING_CAP, synchronous MMU, balloon);
-return NULL;
-}
-
-if (!balloon_stat_fn) {
-error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon);
+if (!have_ballon(errp)) {
 return NULL;
 }
 
@@ -83,8 +90,7 @@ BalloonInfo *qmp_query_balloon(Error **errp)
 
 void qmp_balloon(int64_t target, Error **errp)
 {
-if (kvm_enabled()  !kvm_has_sync_mmu()) {
-error_set(errp, QERR_KVM_MISSING_CAP, synchronous MMU, balloon);
+if (!have_ballon(errp)) {
 return;
 }
 
@@ -92,11 +98,6 @@ void qmp_balloon(int64_t target, Error **errp)
 error_set(errp, QERR_INVALID_PARAMETER_VALUE, target, a size);
 return;
 }
-
-if (!balloon_event_fn) {
-error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon);
-return;
-}
 
 trace_balloon_event(balloon_opaque, target);
 balloon_event_fn(balloon_opaque, target);
-- 
1.9.3

[Qemu-devel] [PATCH 3/9] hmp: Compile hmp_info_spice() only with CONFIG_SPICE

2015-01-13 Thread Markus Armbruster

It's dead code when CONFIG_SPICE is off.  If it wasn't, it would crash
dereferencing the null pointer returned by the qmp_query_spice()
dummy in qmp.c.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 hmp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hmp.c b/hmp.c
index 481be80..a42c5c0 100644
--- a/hmp.c
+++ b/hmp.c
@@ -535,6 +535,7 @@ out:
 qapi_free_VncInfo(info);
 }
 
+#ifdef CONFIG_SPICE
 void hmp_info_spice(Monitor *mon, const QDict *qdict)
 {
 SpiceChannelList *chan;
@@ -581,6 +582,7 @@ void hmp_info_spice(Monitor *mon, const QDict *qdict)
 out:
 qapi_free_SpiceInfo(info);
 }
+#endif
 
 void hmp_info_balloon(Monitor *mon, const QDict *qdict)
 {
-- 
1.9.3

[Qemu-devel] [PATCH 1/9] qmp hmp: Factor out common using spice test

2015-01-13 Thread Markus Armbruster

Into qemu_using_spice().  For want of a better place, put it next the
existing monitor command handler dummies in qemu-spice.h.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 include/ui/qemu-spice.h | 10 ++
 monitor.c   |  5 +++--
 qmp.c   | 11 +++
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/ui/qemu-spice.h b/include/ui/qemu-spice.h
index a93b4b2..b3f2679 100644
--- a/include/ui/qemu-spice.h
+++ b/include/ui/qemu-spice.h
@@ -88,4 +88,14 @@ static inline int qemu_spice_display_add_client(int csock, 
int skipauth,
 
 #endif /* CONFIG_SPICE */
 
+static inline int qemu_using_spice(Error **errp)
+{
+if (!using_spice) {
+/* correct one? spice isn't a device ,,, */
+error_set(errp, QERR_DEVICE_NOT_ACTIVE, spice);
+return 0;
+}
+return 1;
+}
+
 #endif /* QEMU_SPICE_H */
diff --git a/monitor.c b/monitor.c
index 1808e41..cd81289 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1095,11 +1095,12 @@ static int client_migrate_info(Monitor *mon, const 
QDict *qdict,
 const char *subject  = qdict_get_try_str(qdict, cert-subject);
 int port = qdict_get_try_int(qdict, port, -1);
 int tls_port = qdict_get_try_int(qdict, tls-port, -1);
+Error *err;
 int ret;
 
 if (strcmp(protocol, spice) == 0) {
-if (!using_spice) {
-qerror_report(QERR_DEVICE_NOT_ACTIVE, spice);
+if (!qemu_using_spice(err)) {
+qerror_report_err(err);
 return -1;
 }
 
diff --git a/qmp.c b/qmp.c
index 0b4f131..f01ae7d 100644
--- a/qmp.c
+++ b/qmp.c
@@ -287,9 +287,7 @@ void qmp_set_password(const char *protocol, const char 
*password,
 }
 
 if (strcmp(protocol, spice) == 0) {
-if (!using_spice) {
-/* correct one? spice isn't a device ,,, */
-error_set(errp, QERR_DEVICE_NOT_ACTIVE, spice);
+if (!qemu_using_spice(errp)) {
 return;
 }
 rc = qemu_spice_set_passwd(password, fail_if_connected,
@@ -335,9 +333,7 @@ void qmp_expire_password(const char *protocol, const char 
*whenstr,
 }
 
 if (strcmp(protocol, spice) == 0) {
-if (!using_spice) {
-/* correct one? spice isn't a device ,,, */
-error_set(errp, QERR_DEVICE_NOT_ACTIVE, spice);
+if (!qemu_using_spice(errp)) {
 return;
 }
 rc = qemu_spice_set_pw_expire(when);
@@ -562,8 +558,7 @@ void qmp_add_client(const char *protocol, const char 
*fdname,
 }
 
 if (strcmp(protocol, spice) == 0) {
-if (!using_spice) {
-error_set(errp, QERR_DEVICE_NOT_ACTIVE, spice);
+if (!qemu_using_spice(errp)) {
 close(fd);
 return;
 }
-- 
1.9.3

[Qemu-devel] [PATCH 4/9] qmp: Clean up qmp_query_spice() #ifndef !CONFIG_SPICE dummy

2015-01-13 Thread Markus Armbruster

QMP command query-spice exists only #ifdef CONFIG_SPICE.  Due to QAPI
limitations, we need a dummy function anyway, but it's unreachable.

Our current dummy function goes out of its way to produce the exact
same error as the QMP core does for unknown commands.  Cute, but both
unclean and unnecessary.  Replace by straight abort().

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 qmp.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/qmp.c b/qmp.c
index f01ae7d..54624ef 100644
--- a/qmp.c
+++ b/qmp.c
@@ -137,14 +137,18 @@ VncInfo *qmp_query_vnc(Error **errp)
 #endif
 
 #ifndef CONFIG_SPICE
-/* If SPICE support is enabled, the true query-spice command is
-   defined in the SPICE subsystem. Also note that we use a small
-   trick to maintain query-spice's original behavior, which is not
-   to be available in the namespace if SPICE is not compiled in */
+/*
+ * qmp-commands.hx ensures that QMP command query-spice exists only
+ * #ifdef CONFIG_SPICE.  Necessary for an accurate query-commands
+ * result.  However, the QAPI schema is blissfully unaware of that,
+ * and the QAPI code generator happily generates a dead
+ * qmp_marshal_input_query_spice() that calls qmp_query_spice().
+ * Provide it one, or else linking fails.
+ * FIXME Educate the QAPI schema on CONFIG_SPICE.
+ */
 SpiceInfo *qmp_query_spice(Error **errp)
 {
-error_set(errp, QERR_COMMAND_NOT_FOUND, query-spice);
-return NULL;
+abort();
 };
 #endif
 
-- 
1.9.3

[Qemu-devel] [PATCH 0/9] qmp hmp balloon: Cleanups around error reporting

2015-01-13 Thread Markus Armbruster

I'm including balloon patches in the hope that they too can go through
Luiz's tree.

Markus Armbruster (9):
  qmp hmp: Factor out common using spice test
  qmp hmp: Improve error messages when SPICE is not in use
  hmp: Compile hmp_info_spice() only with CONFIG_SPICE
  qmp: Clean up qmp_query_spice() #ifndef !CONFIG_SPICE dummy
  qmp: Simplify recognition of capability negotiation command
  qmp: Eliminate silly QERR_COMMAND_NOT_FOUND macro
  balloon: Inline qemu_balloon(), qemu_balloon_status()
  balloon: Factor out common is balloon active test
  balloon: Eliminate silly QERR_ macros

 balloon.c | 59 ++-
 hmp.c |  2 ++
 include/qapi/qmp/qerror.h |  9 
 include/ui/qemu-spice.h   | 10 
 monitor.c | 19 +++
 qapi/qmp-dispatch.c   |  3 ++-
 qmp.c | 27 +++---
 7 files changed, 58 insertions(+), 71 deletions(-)

-- 
1.9.3

[Qemu-devel] [PATCH 5/9] qmp: Simplify recognition of capability negotiation command

2015-01-13 Thread Markus Armbruster

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 monitor.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/monitor.c b/monitor.c
index cd81289..f8497b6 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4783,9 +4783,9 @@ static int monitor_can_read(void *opaque)
 return (mon-suspend_cnt == 0) ? 1 : 0;
 }
 
-static int invalid_qmp_mode(const Monitor *mon, const char *cmd_name)
+static int invalid_qmp_mode(const Monitor *mon, const mon_cmd_t *cmd)
 {
-int is_cap = compare_cmd(cmd_name, qmp_capabilities);
+int is_cap = cmd-mhandler.cmd_new == do_qmp_capabilities;
 return (qmp_cmd_mode(mon) ? is_cap : !is_cap);
 }
 
@@ -5079,13 +5079,8 @@ static void handle_qmp_command(JSONMessageParser 
*parser, QList *tokens)
 
 cmd_name = qdict_get_str(input, execute);
 trace_handle_qmp_command(mon, cmd_name);
-if (invalid_qmp_mode(mon, cmd_name)) {
-qerror_report(QERR_COMMAND_NOT_FOUND, cmd_name);
-goto err_out;
-}
-
 cmd = qmp_find_cmd(cmd_name);
-if (!cmd) {
+if (!cmd || invalid_qmp_mode(mon, cmd)) {
 qerror_report(QERR_COMMAND_NOT_FOUND, cmd_name);
 goto err_out;
 }
-- 
1.9.3

[Qemu-devel] [PATCH 3/8] rcu: add rcutorture

2015-01-13 Thread Paolo Bonzini

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 tests/Makefile |   6 +-
 tests/rcutorture.c | 447 +
 util/rcu.c |   1 -
 3 files changed, 452 insertions(+), 2 deletions(-)
 create mode 100644 tests/rcutorture.c

diff --git a/tests/Makefile b/tests/Makefile
index 47ba143..980c0ec 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -63,6 +63,8 @@ gcov-files-test-tls-y =
 check-unit-y += tests/test-int128$(EXESUF)
 # all code tested by test-int128 is inside int128.h
 gcov-files-test-int128-y =
+check-unit-y += tests/rcutorture$(EXESUF)
+gcov-files-rcutorture-y = util/rcu.c
 check-unit-y += tests/test-bitops$(EXESUF)
 check-unit-$(CONFIG_HAS_GLIB_SUBPROCESS_TESTS) += 
tests/test-qdev-global-props$(EXESUF)
 check-unit-y += tests/check-qom-interface$(EXESUF)
@@ -226,7 +228,8 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o 
tests/check-qdict.o \
tests/test-qmp-input-visitor.o tests/test-qmp-input-strict.o \
tests/test-qmp-commands.o tests/test-visitor-serialization.o \
tests/test-x86-cpuid.o tests/test-mul64.o tests/test-int128.o \
-   tests/test-opts-visitor.o tests/test-qmp-event.o tests/test-tls.o
+   tests/test-opts-visitor.o tests/test-qmp-event.o tests/test-tls.o \
+   tests/rcutorture.o
 
 test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \
  tests/test-qapi-event.o
@@ -256,6 +259,7 @@ tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o 
migration/xbzrle.o page_cache.o
 tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o
 tests/test-int128$(EXESUF): tests/test-int128.o
 tests/test-tls$(EXESUF): tests/test-tls.o libqemuutil.a
+tests/rcutorture$(EXESUF): tests/rcutorture.o libqemuutil.a
 
 tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
hw/core/qdev.o hw/core/qdev-properties.o hw/core/hotplug.o\
diff --git a/tests/rcutorture.c b/tests/rcutorture.c
new file mode 100644
index 000..cda7458
--- /dev/null
+++ b/tests/rcutorture.c
@@ -0,0 +1,447 @@
+/*
+ * rcutorture.c: simple user-level performance/stress test of RCU.
+ *
+ * Usage:
+ * ./rcu nreaders rperf [ seconds ]
+ * Run a read-side performance test with the specified
+ * number of readers for seconds seconds.
+ * ./rcu nupdaters uperf [ seconds ]
+ * Run an update-side performance test with the specified
+ * number of updaters and specified duration.
+ * ./rcu nreaders perf [ seconds ]
+ * Run a combined read/update performance test with the specified
+ * number of readers and one updater and specified duration.
+ *
+ * The above tests produce output as follows:
+ *
+ * n_reads: 46008000  n_updates: 146026  nreaders: 2  nupdaters: 1 duration: 1
+ * ns/read: 43.4707  ns/update: 6848.1
+ *
+ * The first line lists the total number of RCU reads and updates executed
+ * during the test, the number of reader threads, the number of updater
+ * threads, and the duration of the test in seconds.  The second line
+ * lists the average duration of each type of operation in nanoseconds,
+ * or nan if the corresponding type of operation was not performed.
+ *
+ * ./rcu nreaders stress [ seconds ]
+ * Run a stress test with the specified number of readers and
+ * one updater.
+ *
+ * This test produces output as follows:
+ *
+ * n_reads: 114633217  n_updates: 3903415  n_mberror: 0
+ * rcu_stress_count: 114618391 14826 0 0 0 0 0 0 0 0 0
+ *
+ * The first line lists the number of RCU read and update operations
+ * executed, followed by the number of memory-ordering violations
+ * (which will be zero in a correct RCU implementation).  The second
+ * line lists the number of readers observing progressively more stale
+ * data.  A correct RCU implementation will have all but the first two
+ * numbers non-zero.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (c) 2008 Paul E. McKenney, IBM Corporation.
+ */
+
+/*
+ * Test variables.
+ */
+
+#include glib.h
+#include stdlib.h
+#include stdio.h
+#include string.h
+#include qemu/atomic.h
+#include qemu/rcu.h
+#include qemu/compiler.h
+#include qemu/thread.h
+
+long long n_reads = 0LL;
+long n_updates = 0L;
+int nthreadsrunning;
+
+char argsbuf[64];
+
+#define GOFLAG_INIT 0

[Qemu-devel] [PATCH 2/8] rcu: add rcu library

2015-01-13 Thread Paolo Bonzini

This includes a (mangled) copy of the liburcu code.  The main changes
are: 1) removing dependencies on many other header files in liburcu; 2)
removing for simplicity the tentative busy waiting in synchronize_rcu,
which has limited performance effects; 3) replacing futexes in
synchronize_rcu with QemuEvents for Win32 portability.  The API is
the same as liburcu, so it should be possible in the future to require
liburcu on POSIX systems for example and use our copy only on Windows.

Among the various versions available I chose urcu-mb, which is the
least invasive implementation even though it does not have the
fastest rcu_read_{lock,unlock} implementation.  The urcu flavor can
be changed later, after benchmarking.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 docs/rcu.txt  | 286 ++
 hw/9pfs/virtio-9p-synth.c |   1 +
 include/qemu/atomic.h |  61 ++
 include/qemu/queue.h  |  13 +++
 include/qemu/rcu.h| 120 +++
 include/qemu/thread.h |   3 -
 util/Makefile.objs|   1 +
 util/rcu.c| 173 
 8 files changed, 655 insertions(+), 3 deletions(-)
 create mode 100644 docs/rcu.txt
 create mode 100644 include/qemu/rcu.h
 create mode 100644 util/rcu.c

diff --git a/docs/rcu.txt b/docs/rcu.txt
new file mode 100644
index 000..b839642
--- /dev/null
+++ b/docs/rcu.txt
@@ -0,0 +1,286 @@
+Using RCU (Read-Copy-Update) for synchronization
+
+
+Read-copy update (RCU) is a synchronization mechanism that is used to
+protect read-mostly data structures.  RCU is very efficient and scalable
+on the read side (it is wait-free), and thus can make the read paths
+extremely fast.
+
+RCU supports concurrency between a single writer and multiple readers,
+thus it is not used alone.  Typically, the write-side will use a lock to
+serialize multiple updates, but other approaches are possible (e.g.,
+restricting updates to a single task).  In QEMU, when a lock is used,
+this will often be the iothread mutex, also known as the big QEMU
+lock (BQL).  Also, restricting updates to a single task is done in
+QEMU using the bottom half API.
+
+RCU is fundamentally a wait-to-finish mechanism.  The read side marks
+sections of code with critical sections, and the update side will wait
+for the execution of all *currently running* critical sections before
+proceeding, or before asynchronously executing a callback.
+
+The key point here is that only the currently running critical sections
+are waited for; critical sections that are started _after_ the beginning
+of the wait do not extend the wait, despite running concurrently with
+the updater.  This is the reason why RCU is more scalable than,
+for example, reader-writer locks.  It is so much more scalable that
+the system will have a single instance of the RCU mechanism; a single
+mechanism can be used for an arbitrary number of things, without
+having to worry about things such as contention or deadlocks.
+
+How is this possible?  The basic idea is to split updates in two phases,
+removal and reclamation.  During removal, we ensure that subsequent
+readers will not be able to get a reference to the old data.  After
+removal has completed, a critical section will not be able to access
+the old data.  Therefore, critical sections that begin after removal
+do not matter; as soon as all previous critical sections have finished,
+there cannot be any readers who hold references to the data structure,
+which may not be safely reclaimed (e.g., freed or unref'ed).
+
+Here is a picutre:
+
+thread 1  thread 2  thread 3
+------
+enter RCU crit.sec.
+   |finish removal phase
+   |begin wait
+   |  |enter RCU crit.sec.
+exit RCU crit.sec |   |
+complete wait |
+begin reclamation phase   |
+   exit RCU crit.sec.
+
+
+Note how thread 3 is still executing its critical section when thread 2
+starts reclaiming data.  This is possible, because the old version of the
+data structure was not accessible at the time thread 3 began executing
+that critical section.
+
+
+RCU API
+===
+
+The core RCU API is small:
+
+ void rcu_read_lock(void);
+
+Used by a reader to inform the reclaimer that the reader is
+entering an RCU read-side critical section.
+
+ void rcu_read_unlock(void);
+
+Used by a reader to inform the reclaimer that the reader is
+exiting an RCU read-side critical section.  Note that RCU
+read-side critical sections may be nested and/or overlapping.
+
+ void

[Qemu-devel] [PULL 0/2] Xen tree 2015-01-13

2015-01-13 Thread Stefano Stabellini

The following changes since commit 7d5ad15d17f26dd4f9ff5f3491828bc34e74f28c:

  Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into 
staging (2015-01-12 11:13:24 +)

are available in the git repository at:


  git://xenbits.xen.org/people/sstabellini/qemu-dm.git xen-2015-01-13

for you to fetch changes up to c1d322e6048796296555dd36fdd102d7fa2f50bf:

  xen-hvm: increase maxmem before calling xc_domain_populate_physmap 
(2015-01-13 18:05:52 +)


Liang Li (1):
  xen-pt: Fix PCI devices re-attach failed

Stefano Stabellini (1):
  xen-hvm: increase maxmem before calling xc_domain_populate_physmap

 hw/xen/xen_pt.c |2 +-
 xen-hvm.c   |   24 
 2 files changed, 25 insertions(+), 1 deletion(-)

[Qemu-devel] [PATCH 0/8] RCUification of the memory API, part 1

2015-01-13 Thread Paolo Bonzini

These are the minimal changes to adopt RCU and use it in memory_region_find
(and hence in virtio-blk-dataplane).  Looks big, but two thirds of it
is documentation and tests.

Please review! :)

Paolo

Jan Kiszka (1):
  memory: remove assertion on memory_region_destroy

Paolo Bonzini (7):
  tls: require compiler support for __thread
  rcu: add rcu library
  rcu: add rcutorture
  rcu: allow nesting of rcu_read_lock/rcu_read_unlock
  rcu: add call_rcu
  memory: protect current_map by RCU
  memory: avoid ref/unref in memory_region_find

 configure |   9 +-
 docs/rcu.txt  | 388 +++
 exec.c|   2 +-
 hw/9pfs/virtio-9p-synth.c |   1 +
 include/exec/memory.h |   5 +
 include/qemu/atomic.h |  61 +++
 include/qemu/queue.h  |  13 ++
 include/qemu/rcu.h| 155 
 include/qemu/thread.h |   3 -
 include/qemu/tls.h|  52 --
 include/qom/cpu.h |   4 +-
 memory.c  |  60 +++
 tests/Makefile|  11 +-
 tests/rcutorture.c| 449 ++
 tests/test-tls.c  |  83 +
 util/Makefile.objs|   1 +
 util/rcu.c| 290 ++
 17 files changed, 1485 insertions(+), 102 deletions(-)
 create mode 100644 docs/rcu.txt
 create mode 100644 include/qemu/rcu.h
 delete mode 100644 include/qemu/tls.h
 create mode 100644 tests/rcutorture.c
 create mode 100644 tests/test-tls.c
 create mode 100644 util/rcu.c

-- 
1.8.3.1

[Qemu-devel] [PATCH 2/2] cpu_ldst.h: Collapse laddr() and saddr() into ldst_get_host_addr()

2015-01-13 Thread Peter Maydell

The macros laddr() and saddr() are always defined to be identical
to each other. Replace them with a single macro, and give it a
longer name so it's easier to grep the codebase and confirm that
it's only used in this one place: ldst_get_host_addr().

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 include/exec/cpu_ldst.h | 37 +++--
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index 4700831..0aae9e4 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -53,30 +53,31 @@
 h2g_nocheck(x); \
 })
 
-#define saddr(x) g2h(x)
-#define laddr(x) g2h(x)
+#define ldst_get_host_addr(x) g2h(x)
 
 #else /* !CONFIG_USER_ONLY */
 /* NOTE: we use double casts if pointers and target_ulong have
different sizes */
-#define saddr(x) (uint8_t *)(intptr_t)(x)
-#define laddr(x) (uint8_t *)(intptr_t)(x)
+#define ldst_get_host_addr(x) (uint8_t *)(intptr_t)(x)
 #endif
 
-#define ldub_raw(p) ldub_p(laddr((p)))
-#define ldsb_raw(p) ldsb_p(laddr((p)))
-#define lduw_raw(p) lduw_p(laddr((p)))
-#define ldsw_raw(p) ldsw_p(laddr((p)))
-#define ldl_raw(p) ldl_p(laddr((p)))
-#define ldq_raw(p) ldq_p(laddr((p)))
-#define ldfl_raw(p) ldfl_p(laddr((p)))
-#define ldfq_raw(p) ldfq_p(laddr((p)))
-#define stb_raw(p, v) stb_p(saddr((p)), v)
-#define stw_raw(p, v) stw_p(saddr((p)), v)
-#define stl_raw(p, v) stl_p(saddr((p)), v)
-#define stq_raw(p, v) stq_p(saddr((p)), v)
-#define stfl_raw(p, v) stfl_p(saddr((p)), v)
-#define stfq_raw(p, v) stfq_p(saddr((p)), v)
+/* Note that ldst_get_host_addr() should not be used anywhere outside
+ * these macros
+ */
+#define ldub_raw(p) ldub_p(ldst_get_host_addr(p))
+#define ldsb_raw(p) ldsb_p(ldst_get_host_addr(p))
+#define lduw_raw(p) lduw_p(ldst_get_host_addr(p))
+#define ldsw_raw(p) ldsw_p(ldst_get_host_addr(p))
+#define ldl_raw(p) ldl_p(ldst_get_host_addr(p))
+#define ldq_raw(p) ldq_p(ldst_get_host_addr(p))
+#define ldfl_raw(p) ldfl_p(ldst_get_host_addr(p))
+#define ldfq_raw(p) ldfq_p(ldst_get_host_addr(p))
+#define stb_raw(p, v) stb_p(ldst_get_host_addr(p), v)
+#define stw_raw(p, v) stw_p(ldst_get_host_addr(p), v)
+#define stl_raw(p, v) stl_p(ldst_get_host_addr(p), v)
+#define stq_raw(p, v) stq_p(ldst_get_host_addr(p), v)
+#define stfl_raw(p, v) stfl_p(ldst_get_host_addr(p), v)
+#define stfq_raw(p, v) stfq_p(ldst_get_host_addr(p), v)
 
 
 #if defined(CONFIG_USER_ONLY)
-- 
1.9.1

[Qemu-devel] [PATCH 1/2] cpu_ldst.h: Remove unused ldul_ macros

2015-01-13 Thread Peter Maydell

The five ldul_ macros are not used anywhere and are marked up with an XXX
comment. ldul is a non-standard prefix for our family of load instructions:
we don't mark 32-bit accesses for signedness because they return a 32 bit
quantity. So just delete them.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 include/exec/cpu_ldst.h | 9 -
 1 file changed, 9 deletions(-)

diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index e5550e7..4700831 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -151,15 +151,6 @@
 
 #else
 
-/* XXX: find something cleaner.
- * Furthermore, this is false for 64 bits targets
- */
-#define ldul_user   ldl_user
-#define ldul_kernel ldl_kernel
-#define ldul_hypv   ldl_hypv
-#define ldul_executive  ldl_executive
-#define ldul_supervisor ldl_supervisor
-
 /* The memory helpers for tcg-generated code need tcg_target_long etc.  */
 #include tcg.h
 
-- 
1.9.1

Re: [Qemu-devel] [PATCH 2/4] target-i386: do not memcpy in and out of xmm_regs

2015-01-13 Thread Eduardo Habkost

On Wed, Jan 07, 2015 at 06:39:13PM +0100, Paolo Bonzini wrote:
 After the next patch, we will move the high parts of AVX and AVX512 registers
 in the same array as the SSE registers.  This will make it impossible to
 memcpy an array of 128-bit values in and out of xmm_regs in one swoop.
 Use a for loop instead.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Eduardo Habkost ehabk...@redhat.com

-- 
Eduardo

Re: [Qemu-devel] [PATCH 4/9] hbitmap: store / restore

2015-01-13 Thread John Snow




On 01/13/2015 07:59 AM, Vladimir Sementsov-Ogievskiy wrote:


Best regards,
Vladimir

On 09.01.2015 00:21, John Snow wrote:



On 12/11/2014 09:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Functions to store / restore HBitmap. HBitmap should be saved to linear
bitmap format independently of endianess.

Because of restoring in several steps, every step writes only the last
level of the bitmap. All other levels are restored by
hbitmap_restore_finish as a last step of restoring. So, HBitmap is
inconsistent while restoring.

.


I guess by nature of how bitmap migration has been implemented, we
cannot help but restore parts of the bitmap piece by piece, which
requires us to force the bitmap into an inconsistent state.

Yuck.

It might be nice to turn on a disable bit inside the hbitmap that
disallows its use until it is made consistent again, but I don't know
what the performance hit of the extra conditionals everywhere would be
like.



Another approach is to restore the bitmap after each piece set. It is a
bit slower of course but the bitmap will be consistent after
hbitmap_restore_data()



Yeah, that's not great either. The approach you have already taken is 
probably the best.



+/**
+ * hbitmap_restore_finish
+ * @hb: HBitmap to operate on.
+ *
+ * Repair HBitmap after calling hbitmap_restore_data. Actuall all
HBitmap
+ * layers are restore here.
+ */
+void hbitmap_restore_finish(HBitmap *hb);
+
+/**
   * hbitmap_free:
   * @hb: HBitmap to operate on.
   *


These are just biased opinions:

- It might be nice to name the store/restore functions serialize and
deserialize, to describe exactly what they are doing.

- I might refer to restore_finish as post_load or post_restore
or something similar to mimic how device migration functions are
named. I think hbitmap_restore_data_finalize would also be fine; the
key part here is clearly naming it relative to restore_data.



Hmm. Ok, what about the following set:
 hbitmap_serialize()
 hbitmap_deserialize_part()
 hbitmap_deserialize_finish()



Looks good to me!


diff --git a/util/hbitmap.c b/util/hbitmap.c
index 8aa7406..ac0323f 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -362,6 +362,90 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
  return (hb-levels[HBITMAP_LEVELS - 1][pos  BITS_PER_LEVEL] 
bit) != 0;
  }

+uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
+{
+uint64_t size;
+
+if (count == 0) {
+return 0;
+}
+
+size = (((count - 1)  hb-granularity)  BITS_PER_LEVEL) + 1;
+
+return size * sizeof(unsigned long);
+}
+


This seems flawed to me: number of bits without an offset can't be
mapped to a number of real bytes, because two bits may or may not
cross a granularity boundary, depending on *WHERE* you start counting.

e.g.

granularity = 1 (i.e. 2^1 = 2 virtual bits per 1 real bit)
virtual: 001100
real: 0 1 0

The amount of space required to hold two bits here could be as
little as one bit, if the offset is k={0,2,4} but it could be as much
as two bits if the offset is k={1,3}.

You may never use the function in this way, but mapping virtual bits
to an implementation byte-size seems like it is inviting an
inconsistency.

I dislike this function too.. But unfortunately we need the size in
bytes used for serialization.

Hmm. Ok, without loss of generality, let offset be less than
granularity. The precise formula should look like:

size = (((offset+count-1)  hb-granularity)  BITS_PER_LEVEL);

So,
size = 1  hb-granularity) + count - 2)  hb-granularity) 
BITS_PER_LEVEL) + 1;
would be enough in any case. Ok?



I think so, as long as when you deserialize the object it does so 
correctly, regardless of whether or not you start on an even multiple of 
the granularity.








+void hbitmap_store_data(const HBitmap *hb, uint8_t *buf,
+uint64_t start, uint64_t count)
+{
+uint64_t last = start + count - 1;
+unsigned long *out = (unsigned long *)buf;
+
+if (count == 0) {
+return;
+}
+
+start = (start  hb-granularity)  BITS_PER_LEVEL;
+last = (last  hb-granularity)  BITS_PER_LEVEL;
+count = last - start + 1;
+
+#ifdef __BIG_ENDIAN_BITFIELD
+for (i = start; i = last; ++i) {
+unsigned long el = hb-levels[HBITMAP_LEVELS - 1][i];
+out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) :
cpu_to_le64(el));
+}
+#else
+memcpy(out, hb-levels[HBITMAP_LEVELS - 1][start],
+   count * sizeof(unsigned long));
+#endif
+}
+


I suppose the ifdefs are trying to take an optimization by using
memcpy if at all possible, without a conversion.

Why are the conversions to little endian, though? Shouldn't we be
serializing to a Big Endian format?

As Paolo already explained it is for portability. LE bitmap
representation is independent of size of bitmap array elements, which
may be 32bit/64bit, or whatever else with LE format.


Yes, that makes sense. :)




+void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
+

[Qemu-devel] [PATCH 5/8] block: add bdrv_load_dirty_bitmap

2015-01-13 Thread Vladimir Sementsov-Ogievskiy

The funcion loads dirty bitmap from file, using underlying driver
function.

Note: the function doesn't change BdrvDirtyBitmap.file field. This field
is only used by bdrv_store_dirty_bitmap() function and is ONLY written
by bdrv_dirty_bitmap_set_file() function.

Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
---
 block.c   | 37 +
 include/block/block.h |  5 +
 2 files changed, 42 insertions(+)

diff --git a/block.c b/block.c
index 7237b95..77419e9 100644
--- a/block.c
+++ b/block.c
@@ -5384,6 +5384,43 @@ BdrvDirtyBitmap 
*bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
 return originator;
 }
 
+BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs,
+BlockDriverState *file,
+int granularity,
+const char *name,
+Error **errp)
+{
+BlockDriver *drv = file-drv;
+if (!drv) {
+return NULL;
+}
+if (drv-bdrv_dirty_bitmap_load) {
+BdrvDirtyBitmap *bitmap;
+uint64_t bitmap_size = bdrv_nb_sectors(bs);
+uint8_t *buf = drv-bdrv_dirty_bitmap_load(file, name, bitmap_size,
+   granularity);
+if (buf == NULL) {
+return NULL;
+}
+
+bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
+if (bitmap == NULL) {
+g_free(buf);
+return NULL;
+}
+
+hbitmap_restore_data(bitmap-bitmap, buf, 0, bitmap_size);
+hbitmap_restore_finish(bitmap-bitmap);
+
+return bitmap;
+}
+if (file-file)  {
+return bdrv_load_dirty_bitmap(bs, file-file, granularity, name,
+  errp);
+}
+return NULL;
+}
+
 int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap)
 {
 BlockDriverState *bs = bitmap-file;
diff --git a/include/block/block.h b/include/block/block.h
index 0dfefe3..f36557f 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -460,6 +460,11 @@ void bdrv_dirty_iter_init(BlockDriverState *bs,
   BdrvDirtyBitmap *bitmap, struct HBitmapIter *hbi);
 void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t offset);
 int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
+BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs,
+BlockDriverState *file,
+int granularity,
+const char *name,
+Error **errp);
 int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
-- 
1.9.1

[Qemu-devel] [PATCH] block: update string sizes for filename, backing_file, exact_filename

2015-01-13 Thread Jeff Cody

The string field entries 'filename', 'backing_file', and
'exact_filename' in the BlockDriverState struct are defined as 1024
bytes.

However, most places that use these values accept a maximum of PATH_MAX
bytes.  This patch makes the BlockDriverStruct field string sizes match
the most common usage.

This patch also updates two block drivers that still use 1024-byte sized
arrays for 'backing_file'.

Signed-off-by: Jeff Cody jc...@redhat.com
---
 block/mirror.c| 2 +-
 block/qapi.c  | 2 +-
 include/block/block_int.h | 8 
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 9019d1b..57154eb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -378,7 +378,7 @@ static void coroutine_fn mirror_run(void *opaque)
 int64_t sector_num, end, sectors_per_chunk, length;
 uint64_t last_pause_ns;
 BlockDriverInfo bdi;
-char backing_filename[1024];
+char backing_filename[PATH_MAX];
 int ret = 0;
 int n;
 
diff --git a/block/qapi.c b/block/qapi.c
index a6fd6f7..c097238 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -175,7 +175,7 @@ void bdrv_query_image_info(BlockDriverState *bs,
 {
 int64_t size;
 const char *backing_filename;
-char backing_filename2[1024];
+char backing_filename2[PATH_MAX];
 BlockDriverInfo bdi;
 int ret;
 Error *err = NULL;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 06a21dd..e264be9 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -339,13 +339,13 @@ struct BlockDriverState {
  * regarding this BDS's context */
 QLIST_HEAD(, BdrvAioNotifier) aio_notifiers;
 
-char filename[1024];
-char backing_file[1024]; /* if non zero, the image is a diff of
-this file image */
+char filename[PATH_MAX];
+char backing_file[PATH_MAX]; /* if non zero, the image is a diff of
+this file image */
 char backing_format[16]; /* if non-zero and backing_file exists */
 
 QDict *full_open_options;
-char exact_filename[1024];
+char exact_filename[PATH_MAX];
 
 BlockDriverState *backing_hd;
 BlockDriverState *file;
-- 
1.9.3

[Qemu-devel] [PATCH 6/9] qmp: Eliminate silly QERR_COMMAND_NOT_FOUND macro

2015-01-13 Thread Markus Armbruster

The QERR_ macros are leftovers from the days of rich error objects.
They're used with error_set() and qerror_report(), and expand into the
first *two* arguments.  This trickiness has become pointless.  Clean
this one up.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 include/qapi/qmp/qerror.h | 3 ---
 monitor.c | 3 ++-
 qapi/qmp-dispatch.c   | 3 ++-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h
index 0ca6cbd..28f980e 100644
--- a/include/qapi/qmp/qerror.h
+++ b/include/qapi/qmp/qerror.h
@@ -52,9 +52,6 @@ void qerror_report_err(Error *err);
 #define QERR_BUS_NOT_FOUND \
 ERROR_CLASS_GENERIC_ERROR, Bus '%s' not found
 
-#define QERR_COMMAND_NOT_FOUND \
-ERROR_CLASS_COMMAND_NOT_FOUND, The command %s has not been found
-
 #define QERR_DEVICE_ENCRYPTED \
 ERROR_CLASS_DEVICE_ENCRYPTED, '%s' (%s) is encrypted
 
diff --git a/monitor.c b/monitor.c
index f8497b6..0293769 100644
--- a/monitor.c
+++ b/monitor.c
@@ -5081,7 +5081,8 @@ static void handle_qmp_command(JSONMessageParser *parser, 
QList *tokens)
 trace_handle_qmp_command(mon, cmd_name);
 cmd = qmp_find_cmd(cmd_name);
 if (!cmd || invalid_qmp_mode(mon, cmd)) {
-qerror_report(QERR_COMMAND_NOT_FOUND, cmd_name);
+qerror_report(ERROR_CLASS_COMMAND_NOT_FOUND,
+  The command %s has not been found, cmd_name);
 goto err_out;
 }
 
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 168b083..2227420 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -76,7 +76,8 @@ static QObject *do_qmp_dispatch(QObject *request, Error 
**errp)
 command = qdict_get_str(dict, execute);
 cmd = qmp_find_command(command);
 if (cmd == NULL) {
-error_set(errp, QERR_COMMAND_NOT_FOUND, command);
+error_set(errp, ERROR_CLASS_COMMAND_NOT_FOUND,
+  The command %s has not been found, command);
 return NULL;
 }
 if (!cmd-enabled) {
-- 
1.9.3

[Qemu-devel] [PATCH 7/9] balloon: Inline qemu_balloon(), qemu_balloon_status()

2015-01-13 Thread Markus Armbruster

... and simplify a bit.  Permits factoring out common error checks in
the next commit.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 balloon.c | 42 +-
 1 file changed, 13 insertions(+), 29 deletions(-)

diff --git a/balloon.c b/balloon.c
index b70da4f..2884c2d 100644
--- a/balloon.c
+++ b/balloon.c
@@ -62,25 +62,6 @@ void qemu_remove_balloon_handler(void *opaque)
 balloon_opaque = NULL;
 }
 
-static int qemu_balloon(ram_addr_t target)
-{
-if (!balloon_event_fn) {
-return 0;
-}
-trace_balloon_event(balloon_opaque, target);
-balloon_event_fn(balloon_opaque, target);
-return 1;
-}
-
-static int qemu_balloon_status(BalloonInfo *info)
-{
-if (!balloon_stat_fn) {
-return 0;
-}
-balloon_stat_fn(balloon_opaque, info);
-return 1;
-}
-
 BalloonInfo *qmp_query_balloon(Error **errp)
 {
 BalloonInfo *info;
@@ -90,30 +71,33 @@ BalloonInfo *qmp_query_balloon(Error **errp)
 return NULL;
 }
 
+if (!balloon_stat_fn) {
+error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon);
+return NULL;
+}
+
 info = g_malloc0(sizeof(*info));
-
-if (qemu_balloon_status(info) == 0) {
-error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon);
-qapi_free_BalloonInfo(info);
-return NULL;
-}
-
+balloon_stat_fn(balloon_opaque, info);
 return info;
 }
 
-void qmp_balloon(int64_t value, Error **errp)
+void qmp_balloon(int64_t target, Error **errp)
 {
 if (kvm_enabled()  !kvm_has_sync_mmu()) {
 error_set(errp, QERR_KVM_MISSING_CAP, synchronous MMU, balloon);
 return;
 }
 
-if (value = 0) {
+if (target = 0) {
 error_set(errp, QERR_INVALID_PARAMETER_VALUE, target, a size);
 return;
 }
 
-if (qemu_balloon(value) == 0) {
+if (!balloon_event_fn) {
 error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon);
+return;
 }
+
+trace_balloon_event(balloon_opaque, target);
+balloon_event_fn(balloon_opaque, target);
 }
-- 
1.9.3

Re: [Qemu-devel] [PATCH v11 09/13] qmp: Add support of dirty-bitmap sync mode for drive-backup

2015-01-13 Thread John Snow




On 01/13/2015 04:37 AM, Fam Zheng wrote:

On Mon, 01/12 11:31, John Snow wrote:

For dirty-bitmap sync mode, the block job will iterate through the
given dirty bitmap to decide if a sector needs backup (backup all the
dirty clusters and skip clean ones), just as allocation conditions of
top sync mode.

Signed-off-by: Fam Zheng f...@redhat.com
Signed-off-by: John Snow js...@redhat.com
---
  block.c   |   5 ++
  block/backup.c| 120 ++
  block/mirror.c|   4 ++
  blockdev.c|  14 +-
  hmp.c |   3 +-
  include/block/block.h |   1 +
  include/block/block_int.h |   2 +
  qapi/block-core.json  |  11 +++--
  qmp-commands.hx   |   7 +--
  9 files changed, 137 insertions(+), 30 deletions(-)

diff --git a/block.c b/block.c
index 3f33b9d..5eb6788 100644
--- a/block.c
+++ b/block.c
@@ -5633,6 +5633,11 @@ static void bdrv_reset_dirty(BlockDriverState *bs, 
int64_t cur_sector,
  }
  }

+void bdrv_set_dirty_iter(HBitmapIter *hbi, int64_t offset)
+{
+hbitmap_iter_init(hbi, hbi-hb, offset);


What's the reason for this indirection? Can caller just use hbitmap_iter_init?



Essentially it is just a rename of hbitmap_iter_init to make its usage 
here clear, that we are manually advancing the pointer. How we 
accomplish that (hbitmap_iter_init) is an implementation detail.


Yes, I can just call this directly from block/backup, but at the time I 
was less sure of how I would advance the pointer, so I created a wrapper 
where I could change details as needed, if I needed to.



+}
+
  int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
  {
  return hbitmap_count(bitmap-bitmap);
diff --git a/block/backup.c b/block/backup.c
index 792e655..0626a3e 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -37,6 +37,8 @@ typedef struct CowRequest {
  typedef struct BackupBlockJob {
  BlockJob common;
  BlockDriverState *target;
+/* bitmap for sync=dirty-bitmap */
+BdrvDirtyBitmap *sync_bitmap;
  MirrorSyncMode sync_mode;
  RateLimit limit;
  BlockdevOnError on_source_error;
@@ -242,6 +244,31 @@ static void backup_complete(BlockJob *job, void *opaque)
  g_free(data);
  }

+static bool coroutine_fn yield_and_check(BackupBlockJob *job)
+{
+if (block_job_is_cancelled(job-common)) {
+return true;
+}
+
+/* we need to yield so that qemu_aio_flush() returns.
+ * (without, VM does not reboot)
+ */
+if (job-common.speed) {
+uint64_t delay_ns = ratelimit_calculate_delay(job-limit,
+  job-sectors_read);
+job-sectors_read = 0;
+block_job_sleep_ns(job-common, QEMU_CLOCK_REALTIME, delay_ns);
+} else {
+block_job_sleep_ns(job-common, QEMU_CLOCK_REALTIME, 0);
+}
+
+if (block_job_is_cancelled(job-common)) {
+return true;
+}
+
+return false;
+}
+
  static void coroutine_fn backup_run(void *opaque)
  {
  BackupBlockJob *job = opaque;
@@ -254,13 +281,13 @@ static void coroutine_fn backup_run(void *opaque)
  };
  int64_t start, end;
  int ret = 0;
+bool error_is_read;

  QLIST_INIT(job-inflight_reqs);
  qemu_co_rwlock_init(job-flush_rwlock);

  start = 0;
-end = DIV_ROUND_UP(job-common.len / BDRV_SECTOR_SIZE,
-   BACKUP_SECTORS_PER_CLUSTER);
+end = DIV_ROUND_UP(job-common.len, BACKUP_CLUSTER_SIZE);

  job-bitmap = hbitmap_alloc(end, 0);

@@ -278,28 +305,45 @@ static void coroutine_fn backup_run(void *opaque)
  qemu_coroutine_yield();
  job-common.busy = true;
  }
+} else if (job-sync_mode == MIRROR_SYNC_MODE_DIRTY_BITMAP) {
+/* Dirty Bitmap sync has a slightly different iteration method */
+HBitmapIter hbi;
+int64_t sector;
+int64_t cluster;
+bool polyrhythmic;
+
+bdrv_dirty_iter_init(bs, job-sync_bitmap, hbi);
+/* Does the granularity happen to match our backup cluster size? */
+polyrhythmic = (bdrv_dirty_bitmap_granularity(bs, job-sync_bitmap) !=
+BACKUP_CLUSTER_SIZE);
+
+/* Find the next dirty /sector/ and copy that /cluster/ */
+while ((sector = hbitmap_iter_next(hbi)) != -1) {
+if (yield_and_check(job)) {
+goto leave;
+}
+cluster = sector / BACKUP_SECTORS_PER_CLUSTER;
+
+do {
+ret = backup_do_cow(bs, cluster * BACKUP_SECTORS_PER_CLUSTER,
+BACKUP_SECTORS_PER_CLUSTER, 
error_is_read);
+if ((ret  0) 
+backup_error_action(job, error_is_read, -ret) ==
+BLOCK_ERROR_ACTION_REPORT) {
+goto leave;
+}
+} while (ret  0);
+
+/* Advance (or rewind) our iterator if we need to. */
+

[Qemu-devel] [PATCH 5/8] rcu: add call_rcu

2015-01-13 Thread Paolo Bonzini

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 docs/rcu.txt   | 110 +++--
 include/qemu/rcu.h |  22 ++
 util/rcu.c | 118 +
 3 files changed, 246 insertions(+), 4 deletions(-)

diff --git a/docs/rcu.txt b/docs/rcu.txt
index b839642..96ed241 100644
--- a/docs/rcu.txt
+++ b/docs/rcu.txt
@@ -82,7 +82,50 @@ The core RCU API is small:
 Note that it would be valid for another update to come while
 synchronize_rcu is running.  Because of this, it is better that
 the updater releases any locks it may hold before calling
-synchronize_rcu.
+synchronize_rcu.  If this is not possible (for example, because
+the updater is protected by the BQL), you can use call_rcu.
+
+ void call_rcu1(struct rcu_head * head,
+void (*func)(struct rcu_head *head));
+
+This function invokes func(head) after all pre-existing RCU
+read-side critical sections on all threads have completed.  This
+marks the end of the removal phase, with func taking care
+asynchronously of the reclamation phase.
+
+The foo struct needs to have an rcu_head structure added,
+perhaps as follows:
+
+struct foo {
+struct rcu_head rcu;
+int a;
+char b;
+long c;
+};
+
+so that the reclaimer function can fetch the struct foo address
+and free it:
+
+call_rcu1(foo.rcu, foo_reclaim);
+
+void foo_reclaim(struct rcu_head *rp)
+{
+struct foo *fp = container_of(rp, struct foo, rcu);
+g_free(fp);
+}
+
+For the common case where the rcu_head member is the first of the
+struct, you can use the following macro.
+
+ void call_rcu(T *p,
+   void (*func)(T *p),
+   field-name);
+
+call_rcu1 is typically used through this macro, in the common case
+where the struct rcu_head is the first field in the struct.  In
+the above case, one could have written simply:
+
+call_rcu(foo_reclaim, g_free, rcu);
 
  typeof(*p) atomic_rcu_read(p);
 
@@ -154,6 +197,11 @@ DIFFERENCES WITH LINUX
 - atomic_rcu_read and atomic_rcu_set replace rcu_dereference and
   rcu_assign_pointer.  They take a _pointer_ to the variable being accessed.
 
+- call_rcu is a macro that has an extra argument (the name of the first
+  field in the struct, which must be a struct rcu_head), and expects the
+  type of the callback's argument to be the type of the first argument.
+  call_rcu1 is the same as Linux's call_rcu.
+
 
 RCU PATTERNS
 
@@ -207,7 +255,47 @@ The write side looks simply like this (with appropriate 
locking):
 synchronize_rcu();
 free(old);
 
-Note that the same idiom would be possible with reader/writer
+If the processing cannot be done purely within the critical section, it
+is possible to combine this idiom with a real reference count:
+
+rcu_read_lock();
+p = atomic_rcu_read(foo);
+foo_ref(p);
+rcu_read_unlock();
+/* do something with p. */
+foo_unref(p);
+
+The write side can be like this:
+
+qemu_mutex_lock(foo_mutex);
+old = foo;
+atomic_rcu_set(foo, new);
+qemu_mutex_unlock(foo_mutex);
+synchronize_rcu();
+foo_unref(old);
+
+or with call_rcu:
+
+qemu_mutex_lock(foo_mutex);
+old = foo;
+atomic_rcu_set(foo, new);
+qemu_mutex_unlock(foo_mutex);
+call_rcu(foo_unref, old, rcu);
+
+In both cases, the write side only performs removal.  Reclamation
+happens when the last reference to a foo object is dropped.
+Using synchronize_rcu() is undesirably expensive, because the
+last reference may be dropped on the read side.  Hence you can
+use call_rcu() instead:
+
+ foo_unref(struct foo *p) {
+if (atomic_fetch_dec(p-refcount) == 1) {
+call_rcu(foo_destroy, p, rcu);
+}
+}
+
+
+Note that the same idioms would be possible with reader/writer
 locks:
 
 read_lock(foo_rwlock); write_mutex_lock(foo_rwlock);
@@ -217,13 +305,27 @@ locks:
 write_mutex_unlock(foo_rwlock);
 free(p);
 
+--
+
+read_lock(foo_rwlock); write_mutex_lock(foo_rwlock);
+p = foo;old = foo;
+foo_ref(p); foo = new;
+read_unlock(foo_rwlock);   foo_unref(old);
+/* do something with p. */  write_mutex_unlock(foo_rwlock);
+read_lock(foo_rwlock);
+foo_unref(p);
+read_unlock(foo_rwlock);
+
+foo_unref could use a mechanism such as bottom halves to move deallocation
+out of the write-side critical section.
+
 
 RCU resizable arrays
 
 
 Resizable arrays can be used

Re: [Qemu-devel] [PATCH v4] block/raw-posix.c: Fixes raw_getlength() on Mac OS X so that it reports the correct length of a real CD

2015-01-13 Thread Programmingkid


On Jan 13, 2015, at 9:05 AM, Stefan Hajnoczi wrote:

 On Tue, Jan 6, 2015 at 3:04 PM, Programmingkid
 programmingk...@gmail.com wrote:
 
 On Jan 6, 2015, at 9:02 AM, Stefan Hajnoczi wrote:
 
 On Fri, Jan 02, 2015 at 04:44:38PM -0500, Programmingkid wrote:
 Removes redundant ret variable and renames sectorSize variable to meet 
 QEMU coding standards.
 
 This is a changelog item for v4 of this patch.  Changelogs should go
 below the '---' line so they are not merged into git history.
 
 The rationale is that when a patch is merged into git, the changelog
 describing patch revisions that were posted on the mailing list is not
 relevant (we only see the final patch in git, not the revisions from the
 mailing list).
 
 Patches usually look like this:
 
 Subject: block/raw-posix: brief summary
 
 A longer description of the problem, maybe a command-line to reproduce a
 bug, and some rationale for this code change.
 
 Signed-off-by: Me m...@email.com
 ---
 v2:
 * Fix int - size_t for memory lengths [Requested by Bob]
 
 The changelog at the bottom is useful to code reviewers but won't get
 merged in the git history.
 
 Anyway, thanks for this patch.  I have dropped this changelog line and
 merged it!
 
 
 Signed-off-by: John Arbuckle programmingk...@gmail.com
 
 ---
 block/raw-posix.c |   18 +-
 configure |2 +-
 2 files changed, 18 insertions(+), 2 deletions(-)
 
 Thanks, applied to my block tree:
 https://github.com/stefanha/qemu/commits/block
 
 Stefan
 
 Thank you very much for accepting my patch.
 
 Hi,
 Unfortunately I had to drop this patch because it breaks -drive
 file=/dev/null,...
 
 /dev/null is a character device and we should not return -ENOTSUP when
 the CD-ROM ioctls fail.
 
 Please let it fail gracefully when the device is not a CD-ROM.
 
 Stefan

What is the exact command you use with QEMU involving the /dev/null device?

What value is suppose to be returned when using a device like /dev/null?

Re: [Qemu-devel] [PATCH v4] block/raw-posix.c: Fixes raw_getlength() on Mac OS X so that it reports the correct length of a real CD

2015-01-13 Thread Peter Maydell

On 13 January 2015 at 17:52, Programmingkid programmingk...@gmail.com wrote:
 What is the exact command you use with QEMU involving the /dev/null device?

make check includes some tests which do this...

 What value is suppose to be returned when using a device like /dev/null?

I think we should fall back to the try lseek code path, as we do on
Linux. (That will return zero, as it happens.)

-- PMM

Re: [Qemu-devel] Question regarding two variables in qemu migration code

2015-01-13 Thread Jidong Xiao

On Tue, Jan 13, 2015 at 1:38 AM, Dr. David Alan Gilbert
dgilb...@redhat.com wrote:
 * Jidong Xiao (jidong.x...@gmail.com) wrote:
 Hi,

 Hi,

 I am looking at the qemu source code, and trying to understand the
 migration part. In arch_init.c, there are two variables which seems
 quite confusing to me,

 They are:

 static uint64_t migration_dirty_pages;

 'migration_dirty_pages' is the number of pages that are currently known
 that need to be sent to the destination; it goes down whenever we send
 a page, but goes up when we sync the dirty bitmap that tells us that
 something changed the data in the page (see migration_bitmap_sync_range )

 static int64_t num_dirty_pages_period; // defined in function
 migration_bitmap_sync()

 This is looking how many pages we've noticed are now dirty within
 a particular time - to try and get an estimate of how fast memory is changing
 If you see migration_bitmap_sync  has an:
if (end_time  start_time + 1000) {

 and inside there it uses num_dirty_pages_period to update dirty_pages_rate.


 Can anyone kindly explain that what does these two variables mean? Thanks.

 -Jidong

 Dave
 --
Thanks Dave, your explanation is really really helpful.

But in function migration_bitmap_sync(), I see this:

num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;

If as you said, num_dirty_pages_period refers to the pages get dirty
within a particular time, then why it is +=“, instead of =? i.e.,
something like this:

num_dirty_pages_period = migration_dirty_pages - num_dirty_pages_init;

I just don't see why num_dirty_pages_period has to be accumulated with
its previous value.

-Jidong

[Qemu-devel] [PULL 1/2] xen-pt: Fix PCI devices re-attach failed

2015-01-13 Thread Stefano Stabellini

From: Liang Li liang.z...@intel.com

Use the 'xl pci-attach $DomU $BDF' command to attach more than
one PCI devices to the guest, then detach the devices with
'xl pci-detach $DomU $BDF', after that, re-attach these PCI
devices again, an error message will be reported like following:

libxl: error: libxl_qmp.c:287:qmp_handle_error_response: receive
an error message from QMP server: Duplicate ID 'pci-pt-03_10.1'
for device.

If using the 'address_space_memory' as the parameter of
'memory_listener_register', 'xen_pt_region_del' will not be called
if the memory region's name is not 'xen-pci-pt-*' when the devices
is detached. This will cause the device's related QemuOpts object
not be released properly.

Using the device's address space can avoid such issue, because the
calling count of 'xen_pt_region_add' when attaching and the calling
count of 'xen_pt_region_del' when detaching is the same, so all the
memory region ref and unref by the 'xen_pt_region_add' and
'xen_pt_region_del' can be released properly.

Signed-off-by: Liang Li liang.z...@intel.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Reported-by: Longtao Pang longtaox.p...@intel.com
---
 hw/xen/xen_pt.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index c1bf357..f2893b2 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -736,7 +736,7 @@ static int xen_pt_initfn(PCIDevice *d)
 }
 
 out:
-memory_listener_register(s-memory_listener, address_space_memory);
+memory_listener_register(s-memory_listener, s-dev.bus_master_as);
 memory_listener_register(s-io_listener, address_space_io);
 XEN_PT_LOG(d,
Real physical device %02x:%02x.%d registered successfully!\n,
-- 
1.7.10.4

[Qemu-devel] [PULL 2/2] xen-hvm: increase maxmem before calling xc_domain_populate_physmap

2015-01-13 Thread Stefano Stabellini

Increase maxmem before calling xc_domain_populate_physmap_exact to
avoid the risk of running out of guest memory. This way we can also
avoid complex memory calculations in libxl at domain construction
time.

This patch fixes an abort() when assigning more than 4 NICs to a VM.

Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
Signed-off-by: Don Slutz dsl...@verizon.com
---
 xen-hvm.c |   24 
 1 file changed, 24 insertions(+)

diff --git a/xen-hvm.c b/xen-hvm.c
index 7548794..e2e575b 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -90,6 +90,12 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t 
*shared_page, int vcpu)
 #endif
 
 #define BUFFER_IO_MAX_DELAY  100
+/* Leave some slack so that hvmloader does not complain about lack of
+ * memory at boot time (Could not allocate order=0 extent).
+ * Once hvmloader is modified to cope with that situation without
+ * printing warning messages, QEMU_SPARE_PAGES can be removed.
+ */
+#define QEMU_SPARE_PAGES 16
 
 typedef struct XenPhysmap {
 hwaddr start_addr;
@@ -244,6 +250,8 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, 
MemoryRegion *mr)
 unsigned long nr_pfn;
 xen_pfn_t *pfn_list;
 int i;
+xc_domaininfo_t info;
+unsigned long free_pages;
 
 if (runstate_check(RUN_STATE_INMIGRATE)) {
 /* RAM already populated in Xen */
@@ -266,6 +274,22 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, 
MemoryRegion *mr)
 pfn_list[i] = (ram_addr  TARGET_PAGE_BITS) + i;
 }
 
+if ((xc_domain_getinfolist(xen_xc, xen_domid, 1, info) != 1) ||
+(info.domain != xen_domid)) {
+hw_error(xc_domain_getinfolist failed);
+}
+free_pages = info.max_pages - info.tot_pages;
+if (free_pages  QEMU_SPARE_PAGES) {
+free_pages -= QEMU_SPARE_PAGES;
+} else {
+free_pages = 0;
+}
+if ((free_pages  nr_pfn) 
+(xc_domain_setmaxmem(xen_xc, xen_domid,
+ ((info.max_pages + nr_pfn - free_pages)
+   (XC_PAGE_SHIFT - 10)))  0)) {
+hw_error(xc_domain_setmaxmem failed);
+}
 if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0, 0, 
pfn_list)) {
 hw_error(xen: failed to populate ram at  RAM_ADDR_FMT, ram_addr);
 }
-- 
1.7.10.4

[Qemu-devel] [PATCH 8/8] iotests: test internal persistent dirty bitmap

2015-01-13 Thread Vladimir Sementsov-Ogievskiy

The test performs several vm reloads with checking and updating dirty
bitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
---
 tests/qemu-iotests/115 | 96 ++
 tests/qemu-iotests/115.out | 64 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 161 insertions(+)
 create mode 100755 tests/qemu-iotests/115
 create mode 100644 tests/qemu-iotests/115.out

diff --git a/tests/qemu-iotests/115 b/tests/qemu-iotests/115
new file mode 100755
index 000..ef853b1
--- /dev/null
+++ b/tests/qemu-iotests/115
@@ -0,0 +1,96 @@
+#!/bin/bash
+#
+# Persistent dirty bitmap test
+#
+# Performs several vm reloads with checking and updating dirty bitmap.
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+#
+
+# creator
+owner=vsement...@parallels.com
+
+seq=`basename $0`
+echo QA output created by $seq
+
+here=`pwd`
+status=1# failure is the default!
+
+_cleanup()
+{
+_cleanup_qemu
+_cleanup_test_img
+}
+trap _cleanup; exit \$status 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt qcow2
+_supported_os Linux
+
+size=1G
+_make_test_img $size
+
+qemu_comm_method=monitor
+
+launch() {
+_launch_qemu -drive file=${TEST_IMG},id=disk\
+-dirty-bitmap name=bitmap,drive=disk
+echo
+echo START VM
+}
+
+print_bitmap() {
+_send_qemu_cmd $QEMU_HANDLE 'print_dirty_bitmap disk bitmap'\
+dirty regions end | sed 's/^//'
+}
+
+write() {
+_send_qemu_cmd $QEMU_HANDLE 'qemu-io disk write ' $@ '' (qemu)
+echo
+echo write $@
+}
+
+quit() {
+_send_qemu_cmd $QEMU_HANDLE 'qemu-io disk flush' (qemu)
+_send_qemu_cmd $QEMU_HANDLE 'quit' 
+echo QUIT
+wait ${QEMU_PID[$QEMU_HANDLE]} 2/dev/null
+}
+
+{
+launch
+print_bitmap
+write 50m 1m
+print_bitmap
+quit
+
+launch
+print_bitmap
+write 0m 10m
+write 700m 200m
+print_bitmap
+quit
+
+launch
+print_bitmap
+quit
+} | sed '/(qemu)/d'
+
+status=0
diff --git a/tests/qemu-iotests/115.out b/tests/qemu-iotests/115.out
new file mode 100644
index 000..d570c9f
--- /dev/null
+++ b/tests/qemu-iotests/115.out
@@ -0,0 +1,64 @@
+QA output created by 115
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
+
+START VM
+QEMU X.Y.Z monitor - type 'help' for more information
+bitmap 'bitmap'
+enabled: true
+size: 2097152
+granularity: 65536
+dirty regions begin:
+dirty regions end
+
+write 50m 1m
+wrote 1048576/1048576 bytes at offset 52428800
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+bitmap 'bitmap'
+enabled: true
+size: 2097152
+granularity: 65536
+dirty regions begin:
+102400 - 104447
+dirty regions end
+QUIT
+
+START VM
+QEMU X.Y.Z monitor - type 'help' for more information
+bitmap 'bitmap'
+enabled: true
+size: 2097152
+granularity: 65536
+dirty regions begin:
+102400 - 104447
+dirty regions end
+
+write 0m 10m
+wrote 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+write 700m 200m
+wrote 209715200/209715200 bytes at offset 734003200
+200 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+bitmap 'bitmap'
+enabled: true
+size: 2097152
+granularity: 65536
+dirty regions begin:
+0 - 20479
+102400 - 104447
+1433600 - 1843199
+dirty regions end
+QUIT
+
+START VM
+QEMU X.Y.Z monitor - type 'help' for more information
+bitmap 'bitmap'
+enabled: true
+size: 2097152
+granularity: 65536
+dirty regions begin:
+0 - 20479
+102400 - 104447
+1433600 - 1843199
+dirty regions end
+QUIT
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index a4742c6..77377b3 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -115,3 +115,4 @@
 111 rw auto quick
 113 rw auto quick
 114 rw auto quick
+115 rw auto quick
-- 
1.9.1

[Qemu-devel] [PATCH 2/8] hbitmap: store / restore

2015-01-13 Thread Vladimir Sementsov-Ogievskiy

Functions to store / restore HBitmap. HBitmap should be saved to linear
bitmap format independently of endianess.

These functions are appropriate for dirty bitmap migration, retoring the
bitmap in several steps is available.  To save performance, every step
writes only the last level of the bitmap. All other levels are restored
by hbitmap_restore_finish as a last step of restoring. So, HBitmap is
inconsistent while restoring.

Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
---
 include/qemu/hbitmap.h | 49 
 util/hbitmap.c | 87 ++
 2 files changed, 136 insertions(+)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index c48c50a..f432f7f 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -137,6 +137,55 @@ void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t 
count);
 bool hbitmap_get(const HBitmap *hb, uint64_t item);
 
 /**
+ * hbitmap_data_size:
+ * @hb: HBitmap to operate on.
+ * @count: Number of bits
+ *
+ * Return amount of bytes hbitmap_store_data needs
+ */
+uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count);
+
+/**
+ * hbitmap_store_data
+ * @hb: HBitmap to oprate on.
+ * @buf: Buffer to store bitmap data.
+ * @start: First bit to store.
+ * @count: Number of bits to store.
+ *
+ * Stores HBitmap data corresponding to given region. The format of saved data
+ * is linear sequence of bits, so it can be used by hbitmap_restore_data
+ * independently of endianess
+ */
+void hbitmap_store_data(const HBitmap *hb, uint8_t *buf,
+uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_restore_data
+ * @hb: HBitmap to oprate on.
+ * @buf: Buffer to restore bitmap data from.
+ * @start: First bit to restore.
+ * @count: Number of bits to restore.
+ *
+ * Retores HBitmap data corresponding to given region. The format is the same
+ * as for hbitmap_store_data.
+ *
+ * ! The bitmap becomes inconsistent after this operation.
+ * hbitmap_restore_finish should be called before using the bitmap after
+ * data restoring.
+ */
+void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
+  uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_restore_finish
+ * @hb: HBitmap to operate on.
+ *
+ * Repair HBitmap after calling hbitmap_restore_data. Actuall all HBitmap
+ * layers are restore here.
+ */
+void hbitmap_restore_finish(HBitmap *hb);
+
+/**
  * hbitmap_free:
  * @hb: HBitmap to operate on.
  *
diff --git a/util/hbitmap.c b/util/hbitmap.c
index f400dcb..5d1a776 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -366,6 +366,93 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
 return (hb-levels[HBITMAP_LEVELS - 1][pos  BITS_PER_LEVEL]  bit) != 0;
 }
 
+uint64_t hbitmap_data_size(const HBitmap *hb, uint64_t count)
+{
+uint64_t size;
+
+if (count == 0) {
+return 0;
+}
+
+size = (((count - 1)  hb-granularity)  BITS_PER_LEVEL) + 1;
+
+return size * sizeof(unsigned long);
+}
+
+void hbitmap_store_data(const HBitmap *hb, uint8_t *buf,
+uint64_t start, uint64_t count)
+{
+uint64_t last = start + count - 1;
+unsigned long *out = (unsigned long *)buf;
+
+if (count == 0) {
+return;
+}
+
+start = (start  hb-granularity)  BITS_PER_LEVEL;
+last = (last  hb-granularity)  BITS_PER_LEVEL;
+count = last - start + 1;
+
+#ifdef __BIG_ENDIAN_BITFIELD
+for (i = start; i = last; ++i) {
+unsigned long el = hb-levels[HBITMAP_LEVELS - 1][i];
+out[i] = (BITS_PER_LONG == 32 ? cpu_to_le32(el) : cpu_to_le64(el));
+}
+#else
+memcpy(out, hb-levels[HBITMAP_LEVELS - 1][start],
+   count * sizeof(unsigned long));
+#endif
+}
+
+void hbitmap_restore_data(HBitmap *hb, uint8_t *buf,
+  uint64_t start, uint64_t count)
+{
+uint64_t last = start + count - 1;
+unsigned long *in = (unsigned long *)buf;
+
+if (count == 0) {
+return;
+}
+
+start = (start  hb-granularity)  BITS_PER_LEVEL;
+last = (last  hb-granularity)  BITS_PER_LEVEL;
+count = last - start + 1;
+
+#ifdef __BIG_ENDIAN_BITFIELD
+for (i = start; i = last; ++i) {
+hb-levels[HBITMAP_LEVELS - 1][i] =
+(BITS_PER_LONG == 32 ? be32_to_cpu(in[i]) : be64_to_cpu(in[i]));
+}
+#else
+memcpy(hb-levels[HBITMAP_LEVELS - 1][start], in,
+   count * sizeof(unsigned long));
+#endif
+}
+
+void hbitmap_restore_finish(HBitmap *bitmap)
+{
+int64_t i, size, prev_size;
+int lev;
+
+/* restore levels starting from penultimate to zero level, assuming
+ * that the last level is ok */
+size = MAX((bitmap-size + BITS_PER_LONG - 1)  BITS_PER_LEVEL, 1);
+for (lev = HBITMAP_LEVELS - 1; lev--  0; ) {
+prev_size = size;
+size = MAX((size + BITS_PER_LONG - 1)  BITS_PER_LEVEL, 1);
+memset(bitmap-levels[lev], 0, size * sizeof(unsigned long));
+
+for (i = 0; i

[Qemu-devel] [PATCH 1/8] tls: require compiler support for __thread

2015-01-13 Thread Paolo Bonzini

The block layer is now using __thread unconditionally.  Remove the
fake TLS wrappers (that actually aren't TLS on !Linux) in
include/qemu/tls.h, and add a testcase.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 configure  |  9 +-
 exec.c |  2 +-
 include/qemu/tls.h | 52 --
 include/qom/cpu.h  |  4 +--
 tests/Makefile |  7 -
 tests/test-tls.c   | 83 ++
 6 files changed, 92 insertions(+), 65 deletions(-)
 delete mode 100644 include/qemu/tls.h
 create mode 100644 tests/test-tls.c

diff --git a/configure b/configure
index 7539645..0580606 100755
--- a/configure
+++ b/configure
@@ -1554,14 +1554,7 @@ fi
 
 if test $pie != no ; then
   cat  $TMPC  EOF
-
-#ifdef __linux__
-#  define THREAD __thread
-#else
-#  define THREAD
-#endif
-
-static THREAD int tls_var;
+static __thread int tls_var;
 
 int main(void) { return tls_var; }
 
diff --git a/exec.c b/exec.c
index 081818e..9046203 100644
--- a/exec.c
+++ b/exec.c
@@ -85,7 +85,7 @@ static MemoryRegion io_mem_unassigned;
 struct CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
 /* current CPU in the current thread. It is only valid inside
cpu_exec() */
-DEFINE_TLS(CPUState *, current_cpu);
+__thread CPUState *current_cpu;
 /* 0 = Do not count executed instructions.
1 = Precise instruction counting.
2 = Adaptive rate instruction counting.  */
diff --git a/include/qemu/tls.h b/include/qemu/tls.h
deleted file mode 100644
index b92ea9d..000
--- a/include/qemu/tls.h
+++ /dev/null
@@ -1,52 +0,0 @@
-/*
- * Abstraction layer for defining and using TLS variables
- *
- * Copyright (c) 2011 Red Hat, Inc
- * Copyright (c) 2011 Linaro Limited
- *
- * Authors:
- *  Paolo Bonzini pbonz...@redhat.com
- *  Peter Maydell peter.mayd...@linaro.org
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2 of
- * the License, or (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, see http://www.gnu.org/licenses/.
- */
-
-#ifndef QEMU_TLS_H
-#define QEMU_TLS_H
-
-/* Per-thread variables. Note that we only have implementations
- * which are really thread-local on Linux; the dummy implementations
- * define plain global variables.
- *
- * This means that for the moment use should be restricted to
- * per-VCPU variables, which are OK because:
- *  - the only -user mode supporting multiple VCPU threads is linux-user
- *  - TCG system mode is single-threaded regarding VCPUs
- *  - KVM system mode is multi-threaded but limited to Linux
- *
- * TODO: proper implementations via Win32 .tls sections and
- * POSIX pthread_getspecific.
- */
-#ifdef __linux__
-#define DECLARE_TLS(type, x) extern DEFINE_TLS(type, x)
-#define DEFINE_TLS(type, x)  __thread __typeof__(type) tls__##x
-#define tls_var(x)   tls__##x
-#else
-/* Dummy implementations which define plain global variables */
-#define DECLARE_TLS(type, x) extern DEFINE_TLS(type, x)
-#define DEFINE_TLS(type, x)  __typeof__(type) tls__##x
-#define tls_var(x)   tls__##x
-#endif
-
-#endif
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 2098f1c..a93466d 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -26,7 +26,6 @@
 #include exec/hwaddr.h
 #include qemu/queue.h
 #include qemu/thread.h
-#include qemu/tls.h
 #include qemu/typedefs.h
 
 typedef int (*WriteCoreDumpFunction)(const void *buf, size_t size,
@@ -310,8 +309,7 @@ extern struct CPUTailQ cpus;
 QTAILQ_FOREACH_SAFE(cpu, cpus, node, next_cpu)
 #define first_cpu QTAILQ_FIRST(cpus)
 
-DECLARE_TLS(CPUState *, current_cpu);
-#define current_cpu tls_var(current_cpu)
+extern __thread CPUState *current_cpu;
 
 /**
  * cpu_paging_enabled:
diff --git a/tests/Makefile b/tests/Makefile
index e4ddb6a..47ba143 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -57,6 +57,9 @@ check-unit-y += tests/test-cutils$(EXESUF)
 gcov-files-test-cutils-y += util/cutils.c
 check-unit-y += tests/test-mul64$(EXESUF)
 gcov-files-test-mul64-y = util/host-utils.c
+check-unit-y += tests/test-tls$(EXESUF)
+# all code tested by test-tls is inside test-tls.c
+gcov-files-test-tls-y =
 check-unit-y += tests/test-int128$(EXESUF)
 # all code tested by test-int128 is inside int128.h
 gcov-files-test-int128-y =
@@ -223,7 +226,7 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o 
tests/check-qdict.o \
tests/test-qmp-input-visitor.o tests/test-qmp-input-strict.o \
tests/test-qmp-commands.o tests/test-visitor-serialization.o \
tests/test-x86-cpuid.o

[Qemu-devel] [PATCH 2/9] qmp hmp: Improve error messages when SPICE is not in use

2015-01-13 Thread Markus Armbruster

Commit 7572150 adopted QERR_DEVICE_NOT_ACTIVE for the purpose,
probably because adding another error seemed cumbersome overkill.
Produces No spice device has been activated, which is awkward.

We've since abandoned our quest for rich error objects.  Time to
undo the damage to this error message.  Replace it by SPICE is not in
use.

Keep the stupid DeviceNotActive ErrorClass for compatibility, even
though Libvirt doesn't use it.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 include/ui/qemu-spice.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/ui/qemu-spice.h b/include/ui/qemu-spice.h
index b3f2679..99bb622 100644
--- a/include/ui/qemu-spice.h
+++ b/include/ui/qemu-spice.h
@@ -91,8 +91,8 @@ static inline int qemu_spice_display_add_client(int csock, 
int skipauth,
 static inline int qemu_using_spice(Error **errp)
 {
 if (!using_spice) {
-/* correct one? spice isn't a device ,,, */
-error_set(errp, QERR_DEVICE_NOT_ACTIVE, spice);
+error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
+  SPICE is not in use);
 return 0;
 }
 return 1;
-- 
1.9.3

[Qemu-devel] [PATCH 6/8] memory: remove assertion on memory_region_destroy

2015-01-13 Thread Paolo Bonzini

From: Jan Kiszka jan.kis...@siemens.com

Now that memory_region_destroy can be called from an RCU callback,
checking the BQL-protected global memory_region_transaction_depth
does not make much sense.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 memory.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/memory.c b/memory.c
index c343bf3..8c3d8c0 100644
--- a/memory.c
+++ b/memory.c
@@ -1263,7 +1263,6 @@ static void memory_region_finalize(Object *obj)
 MemoryRegion *mr = MEMORY_REGION(obj);
 
 assert(QTAILQ_EMPTY(mr-subregions));
-assert(memory_region_transaction_depth == 0);
 mr-destructor(mr);
 memory_region_clear_coalescing(mr);
 g_free((char *)mr-name);
-- 
1.8.3.1

[Qemu-devel] [PATCH 9/9] balloon: Eliminate silly QERR_ macros

2015-01-13 Thread Markus Armbruster

The QERR_ macros are leftovers from the days of rich error objects.
They're used with error_set() and qerror_report(), and expand into the
first *two* arguments.  This trickiness has become pointless.  Clean
up.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 balloon.c | 6 --
 include/qapi/qmp/qerror.h | 6 --
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/balloon.c b/balloon.c
index aa30617..dcf5b9c 100644
--- a/balloon.c
+++ b/balloon.c
@@ -39,11 +39,13 @@ static void *balloon_opaque;
 static int have_ballon(Error **errp)
 {
 if (kvm_enabled()  !kvm_has_sync_mmu()) {
-error_set(errp, QERR_KVM_MISSING_CAP, synchronous MMU, balloon);
+error_set(errp, ERROR_CLASS_KVM_MISSING_CAP,
+  Using KVM without synchronous MMU, balloon unavailable);
 return 0;
 }
 if (!balloon_event_fn) {
-error_set(errp, QERR_DEVICE_NOT_ACTIVE, balloon);
+error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
+  No balloon device has been activated);
 return 0;
 }
 return 1;
diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h
index 28f980e..eeaf0cb 100644
--- a/include/qapi/qmp/qerror.h
+++ b/include/qapi/qmp/qerror.h
@@ -70,9 +70,6 @@ void qerror_report_err(Error *err);
 #define QERR_DEVICE_NO_HOTPLUG \
 ERROR_CLASS_GENERIC_ERROR, Device '%s' does not support hotplugging
 
-#define QERR_DEVICE_NOT_ACTIVE \
-ERROR_CLASS_DEVICE_NOT_ACTIVE, No %s device has been activated
-
 #define QERR_DEVICE_NOT_ENCRYPTED \
 ERROR_CLASS_GENERIC_ERROR, Device '%s' is not encrypted
 
@@ -109,9 +106,6 @@ void qerror_report_err(Error *err);
 #define QERR_JSON_PARSING \
 ERROR_CLASS_GENERIC_ERROR, Invalid JSON syntax
 
-#define QERR_KVM_MISSING_CAP \
-ERROR_CLASS_KVM_MISSING_CAP, Using KVM without %s, %s unavailable
-
 #define QERR_MIGRATION_ACTIVE \
 ERROR_CLASS_GENERIC_ERROR, There's a migration process in progress
 
-- 
1.9.3

[Qemu-devel] [PATCH 8/8] memory: avoid ref/unref in memory_region_find

2015-01-13 Thread Paolo Bonzini

Do the entire lookup under RCU.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 memory.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/memory.c b/memory.c
index a844ced..577e87c 100644
--- a/memory.c
+++ b/memory.c
@@ -1828,7 +1828,8 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
 }
 range = addrrange_make(int128_make64(addr), int128_make64(size));
 
-view = address_space_get_flatview(as);
+rcu_read_lock();
+view = atomic_rcu_read(as-current_map);
 fr = flatview_lookup(view, range);
 if (!fr) {
 flatview_unref(view);
@@ -1850,7 +1851,7 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
 ret.readonly = fr-readonly;
 memory_region_ref(ret.mr);
 
-flatview_unref(view);
+rcu_read_unlock();
 return ret;
 }
 
-- 
1.8.3.1

[Qemu-devel] [PATCH v5] block/raw-posix.c: Fixes raw_getlength() on Mac OS X so that it reports the correct length of a real CD

2015-01-13 Thread Programmingkid

Allows for using real cdrom disc in QEMU under Mac OS X.

This command was used to see if this patch worked:
./qemu-system-i386 -drive if=none,id=drive0,file=/dev/null,format=raw

Signed-off-by: John Arbuckle programmingk...@gmail.com

---
Replaced -errno with 0 for ioctl() failure return value.
make check now passes with this patch.


 block/raw-posix.c |   18 +-
 configure |2 +-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e51293a..16fa0a4 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1312,7 +1312,23 @@ again:
 if (size == 0)
 #endif
 #if defined(__APPLE__)  defined(__MACH__)
-size = LLONG_MAX;
+{
+uint64_t sectors = 0;
+uint32_t sector_size = 0;
+
+/* Query the number of sectors on the disk */
+ret = ioctl(fd, DKIOCGETBLOCKCOUNT, sectors);
+if (ret == -1) {
+return 0;
+}
+
+/* Query the size of each sector */
+ret = ioctl(fd, DKIOCGETBLOCKSIZE, sector_size);
+if (ret == -1) {
+return 0;
+}
+size = sectors * sector_size;
+}
 #else
 size = lseek(fd, 0LL, SEEK_END);
 if (size  0) {
diff --git a/configure b/configure
index cae588c..32d3d3f 100755
--- a/configure
+++ b/configure
@@ -611,7 +611,7 @@ Darwin)
   cocoa=yes
   audio_drv_list=coreaudio
   audio_possible_drivers=coreaudio sdl fmod
-  LDFLAGS=-framework CoreFoundation -framework IOKit $LDFLAGS
+  LDFLAGS=-framework CoreFoundation -framework IOKit -framework 
ApplicationServices $LDFLAGS
   libs_softmmu=-F/System/Library/Frameworks -framework Cocoa -framework IOKit 
$libs_softmmu
   # Disable attempts to use ObjectiveC features in os/object.h since they
   # won't work when we're compiling with gcc as a C compiler.
-- 
1.7.5.4

Re: [Qemu-devel] [PATCH v11 13/13] block: BdrvDirtyBitmap miscellaneous fixup

2015-01-13 Thread John Snow




On 01/13/2015 11:50 AM, Vladimir Sementsov-Ogievskiy wrote:

Hmm. May be, update the size field to be uint64_t too? Negative size is
not meaningful..

Best regards,
Vladimir



No, it is not meaningful. However, it does match the return type from 
bdrv_nb_sectors(bs), which uses -1 to indicate an error. So we also 
don't really need to change it in this circumstance, since we'll not be 
exceeding what int64_t/bdrv_nb_sectors can provide anyway.


(Plus, INT64_MAX sectors is an absurdly large amount of bytes!)

I'll see how many changes this requires in other helper functions when I 
prepare a V12. If it's easy I will do it.


Thanks!
--js


On 12.01.2015 19:31, John Snow wrote:

(1) granularity type consistency: Update the granularity to be uint64_t
 in all places. This value is always in bytes.

(2) Some documentation for the fields within BdrvDirtyBitmap.

Signed-off-by: John Snow js...@redhat.com
---
  block.c   | 16 
  include/block/block.h |  2 +-
  2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index 9e2b8c0..13f9cc0 100644
--- a/block.c
+++ b/block.c
@@ -52,13 +52,13 @@
  #endif
  struct BdrvDirtyBitmap {
-HBitmap *bitmap;
-BdrvDirtyBitmap *successor;
-int64_t size;
-int64_t granularity;
-char *name;
-bool enabled;
-bool frozen;
+HBitmap *bitmap;/* Dirty sector bitmap */
+BdrvDirtyBitmap *successor; /* Anonymous child */
+int64_t size;   /* Number of sectors */
+uint64_t granularity;   /* Granularity in bytes */
+char *name; /* Optional non-empty unique ID */
+bool enabled;   /* Writes are being tracked */
+bool frozen;/* Protected; see
bdrv_freeze_dirty_bitmap() */
  QLIST_ENTRY(BdrvDirtyBitmap) list;
  };
@@ -5350,7 +5350,7 @@ void
bdrv_dirty_bitmap_make_anon(BlockDriverState *bs, BdrvDirtyBitmap
*bitmap)
  }
  BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
-  int granularity,
+  uint64_t granularity,
const char *name,
Error **errp)
  {
diff --git a/include/block/block.h b/include/block/block.h
index 99ed679..749429e 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -430,7 +430,7 @@ bool bdrv_qiov_is_aligned(BlockDriverState *bs,
QEMUIOVector *qiov);
  struct HBitmapIter;
  typedef struct BdrvDirtyBitmap BdrvDirtyBitmap;
  BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
-  int granularity,
+  uint64_t granularity,
const char *name,
Error **errp);
  int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,





--
—js

[Qemu-devel] [PATCH 3/8] qcow2: add dirty-bitmaps feature

2015-01-13 Thread Vladimir Sementsov-Ogievskiy

Adds dirty-bitmaps feature to qcow2 format as specified in
docs/specs/qcow2.txt

Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
---
 block/Makefile.objs|   2 +-
 block/qcow2-dirty-bitmap.c | 514 +
 block/qcow2.c  |  26 +++
 block/qcow2.h  |  48 +
 include/block/block_int.h  |  10 +
 5 files changed, 599 insertions(+), 1 deletion(-)
 create mode 100644 block/qcow2-dirty-bitmap.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 04b0e43..eebd1c9 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,5 +1,5 @@
 block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o 
vvfat.o
-block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
qcow2-cache.o
+block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
qcow2-cache.o qcow2-dirty-bitmap.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
new file mode 100644
index 000..b3d114f
--- /dev/null
+++ b/block/qcow2-dirty-bitmap.c
@@ -0,0 +1,514 @@
+/*
+ * Dirty bitmpas for the QCOW version 2 format
+ *
+ * Copyright (c) 2014-2015 Vladimir Sementsov-Ogievskiy
+ *
+ * This file is derived from qcow2-snapshot.c, original copyright:
+ * Copyright (c) 2004-2006 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the Software), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include qemu-common.h
+#include block/block_int.h
+#include block/qcow2.h
+
+void qcow2_free_dirty_bitmaps(BlockDriverState *bs)
+{
+BDRVQcowState *s = bs-opaque;
+int i;
+
+for (i = 0; i  s-nb_dirty_bitmaps; i++) {
+g_free(s-dirty_bitmaps[i].name);
+}
+g_free(s-dirty_bitmaps);
+s-dirty_bitmaps = NULL;
+s-nb_dirty_bitmaps = 0;
+}
+
+int qcow2_read_dirty_bitmaps(BlockDriverState *bs)
+{
+BDRVQcowState *s = bs-opaque;
+QCowDirtyBitmapHeader h;
+QCowDirtyBitmap *bm;
+int i, name_size;
+int64_t offset;
+int ret;
+
+if (!s-nb_dirty_bitmaps) {
+s-dirty_bitmaps = NULL;
+s-dirty_bitmaps_size = 0;
+return 0;
+}
+
+offset = s-dirty_bitmaps_offset;
+s-dirty_bitmaps = g_new0(QCowDirtyBitmap, s-nb_dirty_bitmaps);
+
+for (i = 0; i  s-nb_dirty_bitmaps; i++) {
+/* Read statically sized part of the dirty_bitmap header */
+offset = align_offset(offset, 8);
+ret = bdrv_pread(bs-file, offset, h, sizeof(h));
+if (ret  0) {
+goto fail;
+}
+
+offset += sizeof(h);
+bm = s-dirty_bitmaps + i;
+bm-l1_table_offset = be64_to_cpu(h.l1_table_offset);
+bm-l1_size = be32_to_cpu(h.l1_size);
+bm-bitmap_granularity = be32_to_cpu(h.bitmap_granularity);
+bm-bitmap_size = be64_to_cpu(h.bitmap_size);
+
+name_size = be16_to_cpu(h.name_size);
+
+/* Read dirty_bitmap name */
+bm-name = g_malloc(name_size + 1);
+ret = bdrv_pread(bs-file, offset, bm-name, name_size);
+if (ret  0) {
+goto fail;
+}
+offset += name_size;
+bm-name[name_size] = '\0';
+
+if (offset - s-dirty_bitmaps_offset  QCOW_MAX_DIRTY_BITMAPS_SIZE) {
+ret = -EFBIG;
+goto fail;
+}
+}
+
+assert(offset - s-dirty_bitmaps_offset = INT_MAX);
+s-dirty_bitmaps_size = offset - s-dirty_bitmaps_offset;
+return 0;
+
+fail:
+qcow2_free_dirty_bitmaps(bs);
+return ret;
+}
+
+/* add at the end of the file a new list of dirty bitmaps */
+static int qcow2_write_dirty_bitmaps(BlockDriverState *bs)
+{
+BDRVQcowState *s = bs-opaque;
+QCowDirtyBitmap *bm;
+QCowDirtyBitmapHeader h;
+int i, name_size, dirty_bitmaps_size;
+struct {
+uint32_t nb_dirty_bitmaps;
+uint64_t dirty_bitmaps_offset;
+}

[Qemu-devel] [PATCH 7/8] qmp: print dirty bitmap

2015-01-13 Thread Vladimir Sementsov-Ogievskiy

Adds qmp and hmp commands to print dirty bitmap. This is needed only for
testing persistent dirty bitmap feature.

Signed-off-by: Vladimir Sementsov-Ogievskiy vsement...@parallels.com
---
 block.c   | 33 +
 blockdev.c| 13 +
 hmp-commands.hx   | 15 +++
 hmp.c |  8 
 hmp.h |  1 +
 include/block/block.h |  1 +
 qapi-schema.json  |  3 ++-
 qapi/block-core.json  |  3 +++
 qmp-commands.hx   |  5 +
 9 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 77419e9..3e6dedf 100644
--- a/block.c
+++ b/block.c
@@ -5445,6 +5445,39 @@ int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap)
 return res;
 }
 
+void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap)
+{
+unsigned long a = 0, b = 0;
+
+printf(bitmap '%s'\n, bitmap-name ? bitmap-name : no name);
+printf(enabled: %s\n, bitmap-enabled ? true : false);
+printf(size: % PRId64 \n, bitmap-size);
+printf(granularity: % PRId64 \n, bitmap-granularity);
+printf(dirty regions begin:\n);
+
+while (true) {
+for (a = b; a  bitmap-size  !hbitmap_get(bitmap-bitmap, a); ++a) {
+;
+}
+if (a = bitmap-size) {
+break;
+}
+
+for (b = a + 1;
+ b  bitmap-size  hbitmap_get(bitmap-bitmap, b);
+ ++b) {
+;
+}
+
+printf(%ld - %ld\n, a, b - 1);
+if (b = bitmap-size) {
+break;
+}
+}
+
+printf(dirty regions end\n);
+}
+
 BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
   int granularity,
   const char *name,
diff --git a/blockdev.c b/blockdev.c
index 8a9be08..8b58c2e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2079,6 +2079,19 @@ void qmp_block_dirty_bitmap_add(const char *node_ref, 
const char *name,
 aio_context_release(aio_context);
 }
 
+void qmp_block_dirty_bitmap_print(const char *node_ref, const char *name,
+  Error **errp)
+{
+BdrvDirtyBitmap *bitmap;
+
+bitmap = block_dirty_bitmap_lookup(node_ref, name, NULL, errp);
+if (!bitmap) {
+return;
+}
+
+bdrv_print_dirty_bitmap(bitmap);
+}
+
 void qmp_block_dirty_bitmap_remove(const char *node_ref, const char *name,
Error **errp)
 {
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e37bc8b..a9be506 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -58,6 +58,21 @@ Quit the emulator.
 ETEXI
 
 {
+.name   = print_dirty_bitmap,
+.args_type  = device:B,bitmap:s,
+.params = device bitmap,
+.help   = print dirty bitmap,
+.user_print = monitor_user_noop,
+.mhandler.cmd = hmp_print_dirty_bitmap,
+},
+
+STEXI
+@item print_dirty_bitmap device_id bitmap_name
+@findex print_dirty_bitmap
+Print dirty bitmap meta information and dirty regions.
+ETEXI
+
+{
 .name   = block_resize,
 .args_type  = device:B,size:o,
 .params = device size,
diff --git a/hmp.c b/hmp.c
index 63b19c7..a269145 100644
--- a/hmp.c
+++ b/hmp.c
@@ -782,6 +782,14 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict)
 qapi_free_TPMInfoList(info_list);
 }
 
+void hmp_print_dirty_bitmap(Monitor *mon, const QDict *qdict)
+{
+const char *device = qdict_get_str(qdict, device);
+const char *name = qdict_get_str(qdict, bitmap);
+
+qmp_block_dirty_bitmap_print(device, name, NULL);
+}
+
 void hmp_quit(Monitor *mon, const QDict *qdict)
 {
 monitor_suspend(mon);
diff --git a/hmp.h b/hmp.h
index 4bb5dca..6bbbc33 100644
--- a/hmp.h
+++ b/hmp.h
@@ -19,6 +19,7 @@
 #include qapi-types.h
 #include qapi/qmp/qdict.h
 
+void hmp_print_dirty_bitmap(Monitor *mon, const QDict *qdict);
 void hmp_info_name(Monitor *mon, const QDict *qdict);
 void hmp_info_version(Monitor *mon, const QDict *qdict);
 void hmp_info_kvm(Monitor *mon, const QDict *qdict);
diff --git a/include/block/block.h b/include/block/block.h
index f36557f..7188791 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -466,6 +466,7 @@ BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState 
*bs,
 const char *name,
 Error **errp);
 int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap);
+void bdrv_print_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
diff --git a/qapi-schema.json b/qapi-schema.json
index 85f55d9..1475f69 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1263,7 +1263,8 @@
'blockdev-snapshot-internal-sync': 'BlockdevSnapshotInternal',
'block-dirty-bitmap-add': 'BlockDirtyBitmapAdd',
'block-dirty-bitmap-enable': 'BlockDirtyBitmap',
-

Re: [Qemu-devel] [PATCH v11 08/13] block: Add bitmap successors

2015-01-13 Thread John Snow




On 01/13/2015 04:24 AM, Fam Zheng wrote:

On Mon, 01/12 11:31, John Snow wrote:

A bitmap successor is an anonymous BdrvDirtyBitmap that is intended to
be created just prior to a sensitive operation (e.g. Incremental Backup)
that can either succeed or fail, but during the course of which we still
want a bitmap tracking writes.

On creating a successor, we freeze the parent bitmap which prevents
its deletion, enabling, anonymization, or creating a bitmap with the
same name.

On success, the parent bitmap can abdicate responsibility to the
successor, which will inherit its name. The successor will have been
tracking writes during the course of the backup operation. The parent
will be safely deleted.

On failure, we can reclaim the successor from the parent, unifying
them such that the resulting bitmap describes all writes occurring since
the last successful backup, for instance. Reclamation will thaw the
parent, but not explicitly re-enable it.


If we compare this with image snapshot and overlay, it fits the current backing
chain model very well.  Given that we're on the way to persistent dirty bitmap,
which will probably be read/written with general block.c code, I'm wondering if
it will be any better to reuse the block layer backing file code to handle
successor logic?

Also with the frozen feature built in dirty bitmaps, is it okay to drop
enabled state?

I think there are three states for a bitmap:

1) Active, no successor (similar to an read-write top image)
2) Frozen, no successor (similar to an read-only top image)
3) Frozen, with successor (similar to an read-only backing file, with an
overlap)

Admittedly, more code is needed than this patch, in order to glue hbitmap and
block layer together, but it would probably make live migration of dirty bitmap
very easy, but I'm not sure without a closer look.



Hmm -- I suppose so. I was just treating frozen as a stricter disabled.

Disabled just stops tracking writes, but frozen was intended to stop 
deletion, renaming, etc. If there is no use case for the softer 
disabled, then it can really be just a single parameter.




Signed-off-by: John Snow js...@redhat.com
---
  block.c   | 123 --
  blockdev.c|  14 ++
  include/block/block.h |  10 
  3 files changed, 144 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index bd4b449..3f33b9d 100644
--- a/block.c
+++ b/block.c
@@ -53,10 +53,12 @@

  struct BdrvDirtyBitmap {
  HBitmap *bitmap;
+BdrvDirtyBitmap *successor;
  int64_t size;
  int64_t granularity;
  char *name;
  bool enabled;
+bool frozen;
  QLIST_ENTRY(BdrvDirtyBitmap) list;
  };

@@ -5342,6 +5344,7 @@ BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState 
*bs, const char *name)

  void bdrv_dirty_bitmap_make_anon(BlockDriverState *bs, BdrvDirtyBitmap 
*bitmap)
  {
+assert(!bitmap-frozen);
  g_free(bitmap-name);
  bitmap-name = NULL;
  }
@@ -5379,9 +5382,114 @@ BdrvDirtyBitmap 
*bdrv_create_dirty_bitmap(BlockDriverState *bs,
  return bitmap;
  }

+/**
+  * A frozen bitmap cannot be renamed, deleted, cleared,
+  * or set. A frozen bitmap can only abdicate, reclaim,
+  * or thaw.
+  */
+static void bdrv_freeze_dirty_bitmap(BdrvDirtyBitmap *bitmap)
+{
+bitmap-frozen = true;
+}
+
+static void bdrv_thaw_dirty_bitmap(BdrvDirtyBitmap *bitmap)
+{
+bitmap-frozen = false;
+}
+
+bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap)
+{
+return bitmap-frozen;
+}
+
  bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap)
  {
-return bitmap-enabled;
+return bitmap-enabled  !bitmap-frozen;


An indication that enabled could be replaced by frozen. Otherwise it looks
confusing to me.



Perhaps. Here I mean to say that Frozen implies disabled. The stronger 
condition implies the weaker.



+}
+
+/**
+ * Create a successor bitmap destined to replace this bitmap after an 
operation.
+ * Requires that the bitmap is not frozen and has no successor.
+ */
+int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,
+   BdrvDirtyBitmap *bitmap, Error **errp)
+{
+uint64_t granularity;
+
+if (bdrv_dirty_bitmap_frozen(bitmap)) {
+error_setg(errp,
+   Cannot create a successor for a bitmap currently in-use.);
+return -1;
+} else if (bitmap-successor) {
+error_setg(errp, Cannot create a successor for a bitmap that 
+   already has one.);
+return -1;
+}
+
+/* Create an anonymous successor */
+granularity = bdrv_dirty_bitmap_granularity(bs, bitmap);
+bitmap-successor = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
+if (!bitmap-successor) {
+return -1;
+}
+
+/* Successor will be on or off based on our current state.
+ * Parent will be disabled and frozen. */
+bitmap-successor-enabled = bitmap-enabled;
+bdrv_disable_dirty_bitmap(bitmap);
+

[Qemu-devel] [PATCH 0/2] cpu_ldst.h: Minor cleanups in ld/st macros

2015-01-13 Thread Peter Maydell

These two patches make some minor cleanups to cpu_ldst.h which I
noticed while I was trying to get my head around our confusing
mess of load and store related functions.


Peter Maydell (2):
  cpu_ldst.h: Remove unused ldul_ macros
  cpu_ldst.h: Collapse laddr() and saddr() into ldst_get_host_addr()

 include/exec/cpu_ldst.h | 46 +++---
 1 file changed, 19 insertions(+), 27 deletions(-)

-- 
1.9.1

Re: [Qemu-devel] [PATCH 0/9] qmp hmp balloon: Cleanups around error reporting

2015-01-13 Thread Gerd Hoffmann

On Di, 2015-01-13 at 18:50 +0100, Markus Armbruster wrote:
   qmp hmp: Factor out common using spice test
   qmp hmp: Improve error messages when SPICE is not in use
   hmp: Compile hmp_info_spice() only with CONFIG_SPICE
   qmp: Clean up qmp_query_spice() #ifndef !CONFIG_SPICE dummy

Reviewed-by: Gerd Hoffmann kra...@redhat.com

[Qemu-devel] [PATCH v3 4/7] monitor: use cc-get_arch_id as the cpu index

2015-01-13 Thread Zhu Guihua

From: Gu Zheng guz.f...@cn.fujitsu.com

Use cc-get_arch_id as the cpu index to avoid the cpu index duplicated
issue in the QMP/HMP command output.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 cpus.c| 4 +++-
 monitor.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index 2edb5cd..d5e35c0 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1420,6 +1420,7 @@ CpuInfoList *qmp_query_cpus(Error **errp)
 
 CPU_FOREACH(cpu) {
 CpuInfoList *info;
+CPUClass *cc;
 #if defined(TARGET_I386)
 X86CPU *x86_cpu = X86_CPU(cpu);
 CPUX86State *env = x86_cpu-env;
@@ -1437,11 +1438,12 @@ CpuInfoList *qmp_query_cpus(Error **errp)
 CPUTriCoreState *env = tricore_cpu-env;
 #endif
 
+cc = CPU_GET_CLASS(cpu);
 cpu_synchronize_state(cpu);
 
 info = g_malloc0(sizeof(*info));
 info-value = g_malloc0(sizeof(*info-value));
-info-value-CPU = cpu-cpu_index;
+info-value-CPU = cc-get_arch_id(cpu);
 info-value-current = (cpu == first_cpu);
 info-value-halted = cpu-halted;
 info-value-thread_id = cpu-thread_id;
diff --git a/monitor.c b/monitor.c
index 1808e41..2283461 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1024,7 +1024,9 @@ static CPUArchState *mon_get_cpu(void)
 int monitor_get_cpu_index(void)
 {
 CPUState *cpu = ENV_GET_CPU(mon_get_cpu());
-return cpu-cpu_index;
+CPUClass *cc = CPU_GET_CLASS(cpu);
+
+return cc-get_arch_id(cpu);
 }
 
 static void do_info_registers(Monitor *mon, const QDict *qdict)
-- 
1.9.3

[Qemu-devel] [PATCH v3 5/7] acpi:cpu hotplug: set pcmachine as icc bus' hotplug handler

2015-01-13 Thread Zhu Guihua

From: Gu Zheng guz.f...@cn.fujitsu.com

As the pre-check in the qdev_device_add():
if (qdev_hotplug  bus  !qbus_is_hotpluggable(bus)) {
qerror_report(QERR_BUS_NO_HOTPLUG, bus-name);
return NULL;
}
if device has parent bus, the bus must have valid hotplug_handler,
otherwise can not hot plug.
Currently cpu hotplug is based on the PCMachine's hotplug handler,
so when hot add cpu, the hotpluggable check of icc bus will be
rejected.
So we set pcmachine as icc bus' hotplug handler to avoid the rejetion.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/cpu/icc_bus.c | 5 +
 hw/i386/pc_piix.c| 5 +
 hw/i386/pc_q35.c | 5 +
 include/hw/cpu/icc_bus.h | 2 ++
 4 files changed, 17 insertions(+)

diff --git a/hw/cpu/icc_bus.c b/hw/cpu/icc_bus.c
index 6646ea2..f455a20 100644
--- a/hw/cpu/icc_bus.c
+++ b/hw/cpu/icc_bus.c
@@ -92,6 +92,11 @@ static void icc_bridge_init(Object *obj)
 s-icc_bus.apic_address_space = s-apic_container;
 }
 
+ICCBus *get_icc_bus(DeviceState *dev)
+{
+return ICC_BRIDGE(dev)-icc_bus;
+}
+
 static void icc_bridge_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index f0a3201..363f796 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -281,6 +281,7 @@ static void pc_init1(MachineState *machine,
 if (pci_enabled  acpi_enabled) {
 DeviceState *piix4_pm;
 I2CBus *smbus;
+ICCBus *iccbus;
 
 smi_irq = qemu_allocate_irqs(pc_acpi_smi_interrupt, first_cpu, 1);
 /* TODO: Populate SPD eeprom data.  */
@@ -296,6 +297,10 @@ static void pc_init1(MachineState *machine,
  OBJ_PROP_LINK_UNREF_ON_RELEASE, error_abort);
 object_property_set_link(OBJECT(machine), OBJECT(piix4_pm),
  PC_MACHINE_ACPI_DEVICE_PROP, error_abort);
+
+iccbus = get_icc_bus(icc_bridge);
+object_property_set_link(OBJECT(iccbus), OBJECT(pc_machine),
+ QDEV_HOTPLUG_HANDLER_PROPERTY, error_abort);
 }
 
 if (pci_enabled) {
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index a432944..2ce8b70 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -88,6 +88,7 @@ static void pc_q35_init(MachineState *machine)
 PcGuestInfo *guest_info;
 ram_addr_t lowmem;
 DriveInfo *hd[MAX_SATA_PORTS];
+ICCBus *iccbus;
 
 /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
  * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
@@ -212,6 +213,10 @@ static void pc_q35_init(MachineState *machine)
 object_property_set_link(OBJECT(machine), OBJECT(lpc),
  PC_MACHINE_ACPI_DEVICE_PROP, error_abort);
 
+iccbus = get_icc_bus(icc_bridge);
+object_property_set_link(OBJECT(iccbus), OBJECT(pc_machine),
+ QDEV_HOTPLUG_HANDLER_PROPERTY, error_abort);
+
 ich9_lpc = ICH9_LPC_DEVICE(lpc);
 ich9_lpc-pic = gsi;
 ich9_lpc-ioapic = gsi_state-ioapic_irq;
diff --git a/include/hw/cpu/icc_bus.h b/include/hw/cpu/icc_bus.h
index 98a979f..7e53afb 100644
--- a/include/hw/cpu/icc_bus.h
+++ b/include/hw/cpu/icc_bus.h
@@ -44,6 +44,8 @@ typedef struct ICCBus {
 
 #define ICC_BUS(obj) OBJECT_CHECK(ICCBus, (obj), TYPE_ICC_BUS)
 
+ICCBus *get_icc_bus(DeviceState *dev);
+
 /**
  * ICCDevice:
  *
-- 
1.9.3

[Qemu-devel] [PATCH v3 2/7] qom/cpu: move register_vmstate to common CPUClass.realizefn

2015-01-13 Thread Zhu Guihua

From: Gu Zheng guz.f...@cn.fujitsu.com

Move cpu vmstate register from cpu_exec_init into cpu_common_realizefn,
and use cc-get_arch_id as the instance id that suggested by Igor to
fix the migration issue.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 exec.c| 32 +++-
 include/qom/cpu.h |  2 ++
 qom/cpu.c |  2 ++
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/exec.c b/exec.c
index 081818e..c9ffda6 100644
--- a/exec.c
+++ b/exec.c
@@ -513,10 +513,28 @@ void tcg_cpu_address_space_init(CPUState *cpu, 
AddressSpace *as)
 }
 #endif
 
+void cpu_vmstate_register(CPUState *cpu)
+{
+CPUClass *cc = CPU_GET_CLASS(cpu);
+int cpu_index = cc-get_arch_id(cpu);
+
+if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
+vmstate_register(NULL, cpu_index, vmstate_cpu_common, cpu);
+}
+#if defined(CPU_SAVE_VERSION)  !defined(CONFIG_USER_ONLY)
+register_savevm(NULL, cpu, cpu_index, CPU_SAVE_VERSION,
+cpu_save, cpu_load, cpu-env_ptr);
+assert(cc-vmsd == NULL);
+assert(qdev_get_vmsd(DEVICE(cpu)) == NULL);
+#endif
+if (cc-vmsd != NULL) {
+vmstate_register(NULL, cpu_index, cc-vmsd, cpu);
+}
+}
+
 void cpu_exec_init(CPUArchState *env)
 {
 CPUState *cpu = ENV_GET_CPU(env);
-CPUClass *cc = CPU_GET_CLASS(cpu);
 CPUState *some_cpu;
 int cpu_index;
 
@@ -539,18 +557,6 @@ void cpu_exec_init(CPUArchState *env)
 #if defined(CONFIG_USER_ONLY)
 cpu_list_unlock();
 #endif
-if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
-vmstate_register(NULL, cpu_index, vmstate_cpu_common, cpu);
-}
-#if defined(CPU_SAVE_VERSION)  !defined(CONFIG_USER_ONLY)
-register_savevm(NULL, cpu, cpu_index, CPU_SAVE_VERSION,
-cpu_save, cpu_load, env);
-assert(cc-vmsd == NULL);
-assert(qdev_get_vmsd(DEVICE(cpu)) == NULL);
-#endif
-if (cc-vmsd != NULL) {
-vmstate_register(NULL, cpu_index, cc-vmsd, cpu);
-}
 }
 
 #if defined(TARGET_HAS_ICE)
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 2098f1c..936afcd 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -562,6 +562,8 @@ void cpu_interrupt(CPUState *cpu, int mask);
 
 #endif /* USER_ONLY */
 
+void cpu_vmstate_register(CPUState *cpu);
+
 #ifdef CONFIG_SOFTMMU
 static inline void cpu_unassigned_access(CPUState *cpu, hwaddr addr,
  bool is_write, bool is_exec,
diff --git a/qom/cpu.c b/qom/cpu.c
index 9c68fa4..a639822 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -302,6 +302,8 @@ static void cpu_common_realizefn(DeviceState *dev, Error 
**errp)
 {
 CPUState *cpu = CPU(dev);
 
+cpu_vmstate_register(cpu);
+
 if (dev-hotplugged) {
 cpu_synchronize_post_init(cpu);
 cpu_resume(cpu);
-- 
1.9.3

[Qemu-devel] [PATCH v2 02/11] acpi/cpu: add cpu hot unplug request callback function

2015-01-13 Thread Zhu Guihua

From: Gu Zheng guz.f...@cn.fujitsu.com

Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/cpu_hotplug.c | 38 +-
 include/hw/acpi/cpu_hotplug.h |  4 
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c
index 4047294..f8a10d2 100644
--- a/hw/acpi/cpu_hotplug.c
+++ b/hw/acpi/cpu_hotplug.c
@@ -12,6 +12,11 @@
 #include hw/hw.h
 #include hw/acpi/cpu_hotplug.h
 
+typedef enum STS_OPT {
+SET,
+CLEAR,
+} STS_OPT;
+
 static uint64_t cpu_status_read(void *opaque, hwaddr addr, unsigned int size)
 {
 AcpiCpuHotplug *cpus = opaque;
@@ -36,8 +41,8 @@ static const MemoryRegionOps AcpiCpuHotplug_ops = {
 },
 };
 
-static void acpi_set_cpu_present_bit(AcpiCpuHotplug *g, CPUState *cpu,
- Error **errp)
+static void acpi_update_cpu_present_bit(AcpiCpuHotplug *g, CPUState *cpu,
+STS_OPT opt, Error **errp)
 {
 CPUClass *k = CPU_GET_CLASS(cpu);
 int64_t cpu_id;
@@ -48,13 +53,23 @@ static void acpi_set_cpu_present_bit(AcpiCpuHotplug *g, 
CPUState *cpu,
 return;
 }
 
-g-sts[cpu_id / 8] |= (1  (cpu_id % 8));
+switch (opt) {
+case SET:
+g-sts[cpu_id / 8] |= (1  (cpu_id % 8));
+break;
+case CLEAR:
+g-sts[cpu_id / 8] = ~(1  (cpu_id % 8));
+break;
+default:
+g_assert(0);
+break;
+}
 }
 
 void acpi_cpu_plug_cb(ACPIREGS *ar, qemu_irq irq,
   AcpiCpuHotplug *g, DeviceState *dev, Error **errp)
 {
-acpi_set_cpu_present_bit(g, CPU(dev), errp);
+acpi_update_cpu_present_bit(g, CPU(dev), SET, errp);
 if (*errp != NULL) {
 return;
 }
@@ -66,13 +81,26 @@ void acpi_cpu_plug_cb(ACPIREGS *ar, qemu_irq irq,
 }
 }
 
+void acpi_cpu_unplug_request_cb(ACPIREGS *ar, qemu_irq irq,
+AcpiCpuHotplug *g, DeviceState *dev,
+Error **errp)
+{
+acpi_update_cpu_present_bit(g, CPU(dev), CLEAR, errp);
+if (*errp != NULL) {
+return;
+}
+
+ar-gpe.sts[0] |= ACPI_CPU_HOTPLUG_STATUS;
+acpi_update_sci(ar, irq);
+}
+
 void acpi_cpu_hotplug_init(MemoryRegion *parent, Object *owner,
AcpiCpuHotplug *gpe_cpu, uint16_t base)
 {
 CPUState *cpu;
 
 CPU_FOREACH(cpu) {
-acpi_set_cpu_present_bit(gpe_cpu, cpu, error_abort);
+acpi_update_cpu_present_bit(gpe_cpu, cpu, SET, error_abort);
 }
 memory_region_init_io(gpe_cpu-io, owner, AcpiCpuHotplug_ops,
   gpe_cpu, acpi-cpu-hotplug, ACPI_GPE_PROC_LEN);
diff --git a/include/hw/acpi/cpu_hotplug.h b/include/hw/acpi/cpu_hotplug.h
index f6d358d..8b15a3d 100644
--- a/include/hw/acpi/cpu_hotplug.h
+++ b/include/hw/acpi/cpu_hotplug.h
@@ -23,6 +23,10 @@ typedef struct AcpiCpuHotplug {
 void acpi_cpu_plug_cb(ACPIREGS *ar, qemu_irq irq,
   AcpiCpuHotplug *g, DeviceState *dev, Error **errp);
 
+void acpi_cpu_unplug_request_cb(ACPIREGS *ar, qemu_irq irq,
+AcpiCpuHotplug *g, DeviceState *dev,
+Error **errp);
+
 void acpi_cpu_hotplug_init(MemoryRegion *parent, Object *owner,
AcpiCpuHotplug *gpe_cpu, uint16_t base);
 #endif
-- 
1.9.3

[Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support

2015-01-13 Thread Zhu Guihua

This series is based on chen fan's previous i386 cpu hot remove patchset:
https://lists.nongnu.org/archive/html/qemu-devel/2013-12/msg04266.html

Via implementing ACPI standard methods _EJ0 in ACPI table, after Guest
OS remove one vCPU online, the fireware will store removed bitmap to
QEMU, then QEMU could know to notify the assigned vCPU of exiting.
Meanwhile, intruduce the QOM command 'device_del' to remove vCPU from
QEMU itself.

The whole work is based on the new hot plug/unplug framework, ,the unplug 
request
callback does the pre-check and send the request, unplug callback does the
removal handling.

This series depends on tangchen's common hot plug/unplug enhance patchset.
[RESEND PATCH v1 0/5] Common unplug and unplug request cb for memory and CPU 
hot-unplug
https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg00429.html

The is the second half of the previous series:  

[RFC V2 00/10] cpu: add device_add foo-x86_64-cpu and i386 cpu hot remove 
support
https://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04779.html  
   

If you want to test the series, you need to apply the 'device_add 
foo-x86_64-cpu'
patchset first:
[PATCH v3 0/7] cpu: add device_add foo-x86_64-cpu support
https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg01552.html

---
Changelog since v1:
 -rebase on the latest version.
 -delete patch i386/cpu: add instance finalize callback, and put it into 
patchset
  [PATCH v3 0/6] cpu: add device_add foo-x86_64-cpu support.

Changelog since RFC:
 -splited the i386 cpu hot remove into single thread.
 -replaced apic_no with apic_id, so does the related stuff to make it
  work with arbitrary CPU hotadd.
 -add the icc_device_unrealize callback to handle apic unrealize.
 -rework on the new hot plug/unplug platform.
---

Chen Fan (2):
  x86: add x86_cpu_unrealizefn() for cpu apic remove
  cpu hotplug: implement function cpu_status_write() for vcpu ejection

Gu Zheng (5):
  acpi/cpu: add cpu hot unplug request callback function
  acpi/piix4: add cpu hot unplug callback support
  acpi/ich9: add cpu hot unplug support
  pc: add cpu hot unplug callback support
  cpus: reclaim allocated vCPU objects

Zhu Guihua (4):
  acpi/piix4: add cpu hot unplug request callback support
  acpi/ich9: add cpu hot unplug request callback support
  pc: add cpu hot unplug request callback support
  acpi/cpu: add cpu hot unplug callback function

 cpus.c| 44 
 hw/acpi/cpu_hotplug.c | 88 ---
 hw/acpi/ich9.c| 17 ++--
 hw/acpi/piix4.c   | 12 +-
 hw/core/qdev.c|  2 +-
 hw/cpu/icc_bus.c  | 11 +
 hw/i386/acpi-dsdt-cpu-hotplug.dsl |  6 ++-
 hw/i386/kvm/apic.c|  8 
 hw/i386/pc.c  | 62 +--
 hw/intc/apic.c| 10 +
 hw/intc/apic_common.c | 21 ++
 include/hw/acpi/cpu_hotplug.h |  8 
 include/hw/cpu/icc_bus.h  |  1 +
 include/hw/i386/apic_internal.h   |  1 +
 include/hw/qdev-core.h|  1 +
 include/qom/cpu.h |  9 
 include/sysemu/kvm.h  |  1 +
 kvm-all.c | 57 -
 target-i386/cpu.c | 46 
 19 files changed, 378 insertions(+), 27 deletions(-)

-- 
1.9.3

[Qemu-devel] [PATCH v2 05/11] pc: add cpu hot unplug request callback support

2015-01-13 Thread Zhu Guihua

Add cpu hot unplug request callback support, and a pre-check about the
active cpus is added to avoid removing the last cpu.

Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 hw/i386/pc.c | 36 ++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 9bdec16..dd6e8da 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1694,6 +1694,34 @@ out:
 error_propagate(errp, local_err);
 }
 
+static void pc_cpu_unplug_request(HotplugHandler *hotplug_dev,
+  DeviceState *dev, Error **errp)
+{
+HotplugHandlerClass *hhc;
+Error *local_err = NULL;
+PCMachineState *pcms;
+
+if (smp_cpus == 1) {
+error_setg(local_err,
+   This is the last cpu, should not be removed!);
+goto out;
+}
+
+pcms = PC_MACHINE(hotplug_dev);
+if (!pcms-acpi_dev) {
+if (dev-hotplugged) {
+error_setg(local_err,
+   cpu hot unplug is not enabled: missing acpi device);
+}
+goto out;
+}
+
+hhc = HOTPLUG_HANDLER_GET_CLASS(pcms-acpi_dev);
+hhc-unplug_request(HOTPLUG_HANDLER(pcms-acpi_dev), dev, local_err);
+out:
+error_propagate(errp, local_err);
+}
+
 static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp)
 {
@@ -1707,8 +1735,12 @@ static void pc_machine_device_plug_cb(HotplugHandler 
*hotplug_dev,
 static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 DeviceState *dev, Error **errp)
 {
-error_setg(errp, acpi: device unplug request for not supported device
-type: %s, object_get_typename(OBJECT(dev)));
+if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+pc_cpu_unplug_request(hotplug_dev, dev, errp);
+} else {
+error_setg(errp, acpi: device unplug request for not supported device
+type: %s, object_get_typename(OBJECT(dev)));
+}
 }
 
 static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
-- 
1.9.3

[Qemu-devel] [PATCH v2 04/11] acpi/ich9: add cpu hot unplug request callback support

2015-01-13 Thread Zhu Guihua

Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 hw/acpi/ich9.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index c48d176..b795f8f 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -304,8 +304,13 @@ void ich9_pm_device_plug_cb(ICH9LPCPMRegs *pm, DeviceState 
*dev, Error **errp)
 void ich9_pm_device_unplug_request_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
   Error **errp)
 {
-error_setg(errp, acpi: device unplug request for not supported device
-type: %s, object_get_typename(OBJECT(dev)));
+if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+acpi_cpu_unplug_request_cb(pm-acpi_regs, pm-irq,
+   pm-gpe_cpu, dev, errp);
+} else {
+error_setg(errp, acpi: device unplug request for not supported device
+type: %s, object_get_typename(OBJECT(dev)));
+}
 }
 
 void ich9_pm_device_unplug_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
-- 
1.9.3

[Qemu-devel] [PATCH v2 03/11] acpi/piix4: add cpu hot unplug request callback support

2015-01-13 Thread Zhu Guihua

Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 hw/acpi/piix4.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 4407388..9b75780 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -364,6 +364,8 @@ static void piix4_device_unplug_request_cb(HotplugHandler 
*hotplug_dev,
 if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
 acpi_pcihp_device_unplug_cb(s-ar, s-irq, s-acpi_pci_hotplug, dev,
 errp);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+acpi_cpu_unplug_request_cb(s-ar, s-irq, s-gpe_cpu, dev, errp);
 } else {
 error_setg(errp, acpi: device unplug request for not supported device
 type: %s, object_get_typename(OBJECT(dev)));
-- 
1.9.3

[Qemu-devel] [PATCH v3 1/7] cpu: introduce CpuTopoInfo structure for argument simplification

2015-01-13 Thread Zhu Guihua

From: Chen Fan chen.fan.f...@cn.fujitsu.com

Reviewed-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 target-i386/topology.h | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/target-i386/topology.h b/target-i386/topology.h
index 07a6c5f..e9ff89c 100644
--- a/target-i386/topology.h
+++ b/target-i386/topology.h
@@ -47,6 +47,12 @@
  */
 typedef uint32_t apic_id_t;
 
+typedef struct X86CPUTopoInfo {
+unsigned pkg_id;
+unsigned core_id;
+unsigned smt_id;
+} X86CPUTopoInfo;
+
 /* Return the bit width needed for 'count' IDs
  */
 static unsigned apicid_bitwidth_for_count(unsigned count)
@@ -92,13 +98,11 @@ static inline unsigned apicid_pkg_offset(unsigned nr_cores, 
unsigned nr_threads)
  */
 static inline apic_id_t apicid_from_topo_ids(unsigned nr_cores,
  unsigned nr_threads,
- unsigned pkg_id,
- unsigned core_id,
- unsigned smt_id)
+ const X86CPUTopoInfo *topo)
 {
-return (pkg_id   apicid_pkg_offset(nr_cores, nr_threads)) |
-   (core_id  apicid_core_offset(nr_cores, nr_threads)) |
-   smt_id;
+return (topo-pkg_id   apicid_pkg_offset(nr_cores, nr_threads)) |
+   (topo-core_id  apicid_core_offset(nr_cores, nr_threads)) |
+   topo-smt_id;
 }
 
 /* Calculate thread/core/package IDs for a specific topology,
@@ -107,14 +111,12 @@ static inline apic_id_t apicid_from_topo_ids(unsigned 
nr_cores,
 static inline void x86_topo_ids_from_idx(unsigned nr_cores,
  unsigned nr_threads,
  unsigned cpu_index,
- unsigned *pkg_id,
- unsigned *core_id,
- unsigned *smt_id)
+ X86CPUTopoInfo *topo)
 {
 unsigned core_index = cpu_index / nr_threads;
-*smt_id = cpu_index % nr_threads;
-*core_id = core_index % nr_cores;
-*pkg_id = core_index / nr_cores;
+topo-smt_id = cpu_index % nr_threads;
+topo-core_id = core_index % nr_cores;
+topo-pkg_id = core_index / nr_cores;
 }
 
 /* Make APIC ID for the CPU 'cpu_index'
@@ -125,10 +127,9 @@ static inline apic_id_t x86_apicid_from_cpu_idx(unsigned 
nr_cores,
 unsigned nr_threads,
 unsigned cpu_index)
 {
-unsigned pkg_id, core_id, smt_id;
-x86_topo_ids_from_idx(nr_cores, nr_threads, cpu_index,
-  pkg_id, core_id, smt_id);
-return apicid_from_topo_ids(nr_cores, nr_threads, pkg_id, core_id, smt_id);
+X86CPUTopoInfo topo;
+x86_topo_ids_from_idx(nr_cores, nr_threads, cpu_index, topo);
+return apicid_from_topo_ids(nr_cores, nr_threads, topo);
 }
 
 #endif /* TARGET_I386_TOPOLOGY_H */
-- 
1.9.3

[Qemu-devel] [PATCH v3 3/7] qom/cpu: move apic vmstate register into x86_cpu_apic_realize

2015-01-13 Thread Zhu Guihua

From: Gu Zheng guz.f...@cn.fujitsu.com

move apic vmstate register into x86_cpu_apic_realize, and use
cc-get_arch_id as the instance id to avoid using the auto-id which will
break the migration if we add device not in order.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/intc/apic_common.c   | 3 +--
 include/hw/i386/apic_internal.h | 3 +++
 target-i386/cpu.c   | 8 +++-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index d9bb188..719f74d 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -380,7 +380,7 @@ static const VMStateDescription vmstate_apic_common_sipi = {
 }
 };
 
-static const VMStateDescription vmstate_apic_common = {
+const VMStateDescription vmstate_apic_common = {
 .name = apic,
 .version_id = 3,
 .minimum_version_id = 3,
@@ -434,7 +434,6 @@ static void apic_common_class_init(ObjectClass *klass, void 
*data)
 ICCDeviceClass *idc = ICC_DEVICE_CLASS(klass);
 DeviceClass *dc = DEVICE_CLASS(klass);
 
-dc-vmsd = vmstate_apic_common;
 dc-reset = apic_reset_common;
 dc-props = apic_properties_common;
 idc-realize = apic_common_realize;
diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
index dc7a89d..00d29e2 100644
--- a/include/hw/i386/apic_internal.h
+++ b/include/hw/i386/apic_internal.h
@@ -23,6 +23,7 @@
 #include exec/memory.h
 #include hw/cpu/icc_bus.h
 #include qemu/timer.h
+#include migration/vmstate.h
 
 /* APIC Local Vector Table */
 #define APIC_LVT_TIMER  0
@@ -138,6 +139,8 @@ typedef struct VAPICState {
 
 extern bool apic_report_tpr_access;
 
+extern const VMStateDescription vmstate_apic_common;
+
 void apic_report_irq_delivered(int delivered);
 bool apic_next_timer(APICCommonState *s, int64_t current_time);
 void apic_enable_tpr_access_reporting(DeviceState *d, bool enable);
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index b81ac5c..3406fe8 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2750,10 +2750,16 @@ static void x86_cpu_apic_create(X86CPU *cpu, Error 
**errp)
 
 static void x86_cpu_apic_realize(X86CPU *cpu, Error **errp)
 {
-if (cpu-apic_state == NULL) {
+DeviceState *apic_state = cpu-apic_state;
+CPUClass *cc = CPU_GET_CLASS(CPU(cpu));
+
+if (apic_state == NULL) {
 return;
 }
 
+vmstate_register(0, cc-get_arch_id(CPU(cpu)),
+ vmstate_apic_common, apic_state);
+
 if (qdev_init(cpu-apic_state)) {
 error_setg(errp, APIC device '%s' could not be initialized,
object_get_typename(OBJECT(cpu-apic_state)));
-- 
1.9.3

[Qemu-devel] [PATCH v3 6/7] cpu: add device_add foo-x86_64-cpu support

2015-01-13 Thread Zhu Guihua

From: Chen Fan chen.fan.f...@cn.fujitsu.com

Add support to device_add foo-x86_64-cpu, and additional checks of
apic id are added into x86_cpuid_set_apic_id() to avoid duplicate.
Besides, in order to support device/device_add foo-x86_64-cpu
which without specified apic id, we assign cpuid_apic_id with a
default broadcast value (0x) in initfn, and a new function
get_free_apic_id() to provide a free apid id to cpuid_apic_id if
it still has the default at realize time (e.g. hot add foo-cpu without
a specified apic id) to avoid apic id duplicates.

Thanks very much for Igor's suggestion.

Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/cpu_hotplug.c  |  7 +--
 hw/i386/pc.c   |  6 --
 target-i386/cpu.c  | 47 ---
 target-i386/topology.h | 18 ++
 4 files changed, 67 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c
index b8ebfad..4047294 100644
--- a/hw/acpi/cpu_hotplug.c
+++ b/hw/acpi/cpu_hotplug.c
@@ -59,8 +59,11 @@ void acpi_cpu_plug_cb(ACPIREGS *ar, qemu_irq irq,
 return;
 }
 
-ar-gpe.sts[0] |= ACPI_CPU_HOTPLUG_STATUS;
-acpi_update_sci(ar, irq);
+/* Only trigger sci if cpu is hotplugged */
+if (dev-hotplugged) {
+ar-gpe.sts[0] |= ACPI_CPU_HOTPLUG_STATUS;
+acpi_update_sci(ar, irq);
+}
 }
 
 void acpi_cpu_hotplug_init(MemoryRegion *parent, Object *owner,
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e07f1fa..006f355 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1678,13 +1678,7 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
 Error *local_err = NULL;
 PCMachineState *pcms = PC_MACHINE(hotplug_dev);
 
-if (!dev-hotplugged) {
-goto out;
-}
-
 if (!pcms-acpi_dev) {
-error_setg(local_err,
-   cpu hotplug is not enabled: missing acpi device);
 goto out;
 }
 
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 3406fe8..4347948 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1705,6 +1705,7 @@ static void x86_cpuid_set_apic_id(Object *obj, Visitor 
*v, void *opaque,
 const int64_t max = UINT32_MAX;
 Error *error = NULL;
 int64_t value;
+X86CPUTopoInfo topo;
 
 if (dev-realized) {
 error_setg(errp, Attempt to set property '%s' on '%s' after 
@@ -1724,6 +1725,19 @@ static void x86_cpuid_set_apic_id(Object *obj, Visitor 
*v, void *opaque,
 return;
 }
 
+if (value  x86_cpu_apic_id_from_index(max_cpus - 1)) {
+error_setg(errp, CPU with APIC ID % PRIi64
+is more than MAX APIC ID limits, value);
+return;
+}
+
+x86_topo_ids_from_apic_id(smp_cores, smp_threads, value, topo);
+if (topo.smt_id = smp_threads || topo.core_id = smp_cores) {
+error_setg(errp, CPU with APIC ID % PRIi64  does not match 
+   topology configuration., value);
+return;
+}
+
 if ((value != cpu-env.cpuid_apic_id)  cpu_exists(value)) {
 error_setg(errp, CPU with APIC ID % PRIi64  exists, value);
 return;
@@ -2178,8 +2192,10 @@ static void x86_cpu_cpudef_class_init(ObjectClass *oc, 
void *data)
 {
 X86CPUDefinition *cpudef = data;
 X86CPUClass *xcc = X86_CPU_CLASS(oc);
+DeviceClass *dc = DEVICE_CLASS(oc);
 
 xcc-cpu_def = cpudef;
+dc-cannot_instantiate_with_device_add_yet = false;
 }
 
 static void x86_register_cpudef_type(X86CPUDefinition *def)
@@ -2188,6 +2204,7 @@ static void x86_register_cpudef_type(X86CPUDefinition 
*def)
 TypeInfo ti = {
 .name = typename,
 .parent = TYPE_X86_CPU,
+.instance_size = sizeof(X86CPU),
 .class_init = x86_cpu_cpudef_class_init,
 .class_data = def,
 };
@@ -2721,12 +2738,29 @@ static void mce_init(X86CPU *cpu)
 }
 
 #ifndef CONFIG_USER_ONLY
+static uint32_t get_free_apic_id(void)
+{
+int i;
+
+for (i = 0; i  max_cpus; i++) {
+uint32_t id = x86_cpu_apic_id_from_index(i);
+
+if (!cpu_exists(id)) {
+return id;
+}
+}
+
+return x86_cpu_apic_id_from_index(max_cpus);
+}
+
+#define APIC_ID_NOT_SET (~0U)
+
 static void x86_cpu_apic_create(X86CPU *cpu, Error **errp)
 {
-CPUX86State *env = cpu-env;
 DeviceState *dev = DEVICE(cpu);
 APICCommonState *apic;
 const char *apic_type = apic;
+uint32_t apic_id;
 
 if (kvm_irqchip_in_kernel()) {
 apic_type = kvm-apic;
@@ -2742,7 +2776,14 @@ static void x86_cpu_apic_create(X86CPU *cpu, Error 
**errp)
 
 object_property_add_child(OBJECT(cpu), apic,
   OBJECT(cpu-apic_state), NULL);
-qdev_prop_set_uint8(cpu-apic_state, id, env-cpuid_apic_id);
+
+apic_id = object_property_get_int(OBJECT(cpu), apic-id, NULL);
+if (apic_id == APIC_ID_NOT_SET) {
+apic_id = get_free_apic_id();
+

[Qemu-devel] [PATCH v3 0/7] cpu: add device_add foo-x86_64-cpu support

2015-01-13 Thread Zhu Guihua

This series is based on the previous patchset from Chen Fan:
https://lists.nongnu.org/archive/html/qemu-devel/2014-05/msg02360.html

We try to make cpu hotplug with device_add, and make
-device foo-x86_64-cpu available???also we can set apic-id
property with command line, if without setting apic-id property,
we offer the first unoccupied apic id as the default new apic id.
When hotplug cpu with device_add, additional check of APIC ID will be
done after cpu object initialization which was different from
'cpu_add' command that check 'ids' at the beginning.

The is the first half of the previous series:
[RFC V2 00/10] cpu: add device_add foo-x86_64-cpu and i386 cpu hot remove 
support
https://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04779.html

---
Changelog since v2:
 -rebase on latest upstream.
 -add cpu instance finalize.

Changelog since v1:
 -rebased on latest upstream.
 -introduce a help function to hide the access to icc_bus.
 -use a macro ACPI_ID_NOT_SET to replace the magic number(0x).

Changelog since RFC:
 -split out APIC vmstate/QMP-monitor changes into separate patches.
 -add the handle of the startup cpus(-device foo).
 -remove duplicated checking about env-cpuid_apic_id.
 -do actual APIC ID allocation at realize time if it is not set before.
 -remove the unneeded x86_cpu_cpudef_instance_init().
 -split off device_del support out here.
---

Chen Fan (2):
  cpu: introduce CpuTopoInfo structure for argument simplification
  cpu: add device_add foo-x86_64-cpu support

Gu Zheng (5):
  qom/cpu: move register_vmstate to common CPUClass.realizefn
  qom/cpu: move apic vmstate register into x86_cpu_apic_realize
  monitor: use cc-get_arch_id as the cpu index
  acpi:cpu hotplug: set pcmachine as icc bus' hotplug handler
  i386/cpu: add instance finalize callback

 cpus.c  |  4 ++-
 exec.c  | 32 -
 hw/acpi/cpu_hotplug.c   |  7 +++--
 hw/cpu/icc_bus.c|  5 
 hw/i386/pc.c|  6 
 hw/i386/pc_piix.c   |  5 
 hw/i386/pc_q35.c|  5 
 hw/intc/apic_common.c   |  3 +-
 include/hw/cpu/icc_bus.h|  2 ++
 include/hw/i386/apic_internal.h |  3 ++
 include/qom/cpu.h   |  3 ++
 monitor.c   |  4 ++-
 qom/cpu.c   |  2 ++
 target-i386/cpu.c   | 63 ++---
 target-i386/topology.h  | 51 ++---
 15 files changed, 150 insertions(+), 45 deletions(-)

-- 
1.9.3

[Qemu-devel] [PATCH v3 7/7] i386/cpu: add instance finalize callback

2015-01-13 Thread Zhu Guihua

From: Gu Zheng guz.f...@cn.fujitsu.com

Add a func to finalize a cpu's instance. When cpu's device_add failed,
and cpu's device_del executed, this func would be invoked.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 include/qom/cpu.h | 1 +
 target-i386/cpu.c | 8 
 2 files changed, 9 insertions(+)

diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 936afcd..f663199 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -305,6 +305,7 @@ struct CPUState {
 QTAILQ_HEAD(CPUTailQ, CPUState);
 extern struct CPUTailQ cpus;
 #define CPU_NEXT(cpu) QTAILQ_NEXT(cpu, node)
+#define CPU_REMOVE(cpu) QTAILQ_REMOVE(cpus, cpu, node)
 #define CPU_FOREACH(cpu) QTAILQ_FOREACH(cpu, cpus, node)
 #define CPU_FOREACH_SAFE(cpu, next_cpu) \
 QTAILQ_FOREACH_SAFE(cpu, cpus, node, next_cpu)
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 4347948..4746814 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2983,6 +2983,13 @@ static void x86_cpu_initfn(Object *obj)
 }
 }
 
+static void x86_cpu_finalizefn(Object *obj)
+{
+CPUState *cs = CPU(obj);
+
+CPU_REMOVE(cs);
+}
+
 static int64_t x86_cpu_get_arch_id(CPUState *cs)
 {
 X86CPU *cpu = X86_CPU(cs);
@@ -3095,6 +3102,7 @@ static const TypeInfo x86_cpu_type_info = {
 .parent = TYPE_CPU,
 .instance_size = sizeof(X86CPU),
 .instance_init = x86_cpu_initfn,
+.instance_finalize = x86_cpu_finalizefn,
 .abstract = true,
 .class_size = sizeof(X86CPUClass),
 .class_init = x86_cpu_common_class_init,
-- 
1.9.3

Re: [Qemu-devel] question about live migration with storage

2015-01-13 Thread Paolo Bonzini



On 14/01/2015 03:41, Zhang Haoyu wrote:
 Hi, Paolo,
 what's advantages of drive_mirror over traditional mechanism implemented in 
 block-migration.c ?
 Why libvirt use drive_mirror instead of traditional iterative mechanism as 
 the default way
 of live migration with non-shared storage?

1) Being able to choose which block devices are migrated, and whether
they are migrated incrementally or not.

2) Finer-grain control the parameters of block migration (dirty bitmap
granularity).

3) Block and RAM migration do not share the same socket and thus can
more easily be parallelized.

Note that 1-2 are not yet supported by libvirt as far as I remember.

Paolo

Re: [Qemu-devel] [PATCH] AIO: Reduce number of threads for 32bit hosts

2015-01-13 Thread Paolo Bonzini



On 14/01/2015 01:56, Alexander Graf wrote:
 +if (sizeof(pool) == 4) {
 +/* 32bit systems run out of virtual memory quickly */
 +pool-max_threads = 4;
 +} else {
 +pool-max_threads = 64;
 +}

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

The same problem applies to coroutine stacks, and those cannot be
throttled down as easily.  But I guess if you limit the number of
threads, the guest gets slowed down and doesn't create as many coroutines.

It would be nice to have a way to measure coroutine stack usage, similar
to what the kernel does.

Paolo

[Qemu-devel] [PATCH v2 01/11] x86: add x86_cpu_unrealizefn() for cpu apic remove

2015-01-13 Thread Zhu Guihua

From: Chen Fan chen.fan.f...@cn.fujitsu.com

Implement x86_cpu_unrealizefn() for corresponding x86_cpu_realizefn(),
which is mostly used to clean the apic related allocation and vmstates
at here.

Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/cpu/icc_bus.c| 11 ++
 hw/i386/kvm/apic.c  |  8 +++
 hw/intc/apic.c  | 10 +
 hw/intc/apic_common.c   | 21 ---
 include/hw/cpu/icc_bus.h|  1 +
 include/hw/i386/apic_internal.h |  1 +
 target-i386/cpu.c   | 46 +
 7 files changed, 90 insertions(+), 8 deletions(-)

diff --git a/hw/cpu/icc_bus.c b/hw/cpu/icc_bus.c
index f455a20..3e7f62f 100644
--- a/hw/cpu/icc_bus.c
+++ b/hw/cpu/icc_bus.c
@@ -44,11 +44,22 @@ static void icc_device_realize(DeviceState *dev, Error 
**errp)
 
 }
 
+static void icc_device_unrealize(DeviceState *dev, Error **errp)
+{
+ICCDeviceClass *idc = ICC_DEVICE_GET_CLASS(dev);
+
+/* convert to QOM */
+if (idc-unrealize) {
+idc-unrealize(dev, errp);
+}
+}
+
 static void icc_device_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 
 dc-realize = icc_device_realize;
+dc-unrealize = icc_device_unrealize;
 dc-bus_type = TYPE_ICC_BUS;
 }
 
diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 5b47056..781c296 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -189,11 +189,19 @@ static void kvm_apic_realize(DeviceState *dev, Error 
**errp)
 }
 }
 
+static void kvm_apic_unrealize(DeviceState *dev, Error **errp)
+{
+APICCommonState *s = APIC_COMMON(dev);
+
+object_unparent(OBJECT(s-io_memory));
+}
+
 static void kvm_apic_class_init(ObjectClass *klass, void *data)
 {
 APICCommonClass *k = APIC_COMMON_CLASS(klass);
 
 k-realize = kvm_apic_realize;
+k-unrealize = kvm_apic_unrealize;
 k-reset = kvm_apic_reset;
 k-set_base = kvm_apic_set_base;
 k-set_tpr = kvm_apic_set_tpr;
diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 0f97b47..2f3a7a3 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -889,11 +889,21 @@ static void apic_realize(DeviceState *dev, Error **errp)
 msi_supported = true;
 }
 
+static void apic_unrealize(DeviceState *dev, Error **errp)
+{
+APICCommonState *s = APIC_COMMON(dev);
+
+object_unparent(OBJECT(s-io_memory));
+timer_free(s-timer);
+local_apics[s-idx] = NULL;
+}
+
 static void apic_class_init(ObjectClass *klass, void *data)
 {
 APICCommonClass *k = APIC_COMMON_CLASS(klass);
 
 k-realize = apic_realize;
+k-unrealize = apic_unrealize;
 k-set_base = apic_set_base;
 k-set_tpr = apic_set_tpr;
 k-get_tpr = apic_get_tpr;
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 719f74d..912ba64 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -299,16 +299,8 @@ static void apic_common_realize(DeviceState *dev, Error 
**errp)
 APICCommonState *s = APIC_COMMON(dev);
 APICCommonClass *info;
 static DeviceState *vapic;
-static int apic_no;
 static bool mmio_registered;
 
-if (apic_no = MAX_APICS) {
-error_setg(errp, %s initialization failed.,
-   object_get_typename(OBJECT(dev)));
-return;
-}
-s-idx = apic_no++;
-
 info = APIC_COMMON_GET_CLASS(s);
 info-realize(dev, errp);
 if (!mmio_registered) {
@@ -329,6 +321,18 @@ static void apic_common_realize(DeviceState *dev, Error 
**errp)
 
 }
 
+static void apic_common_unrealize(DeviceState *dev, Error **errp)
+{
+APICCommonState *s = APIC_COMMON(dev);
+APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
+
+info-unrealize(dev, errp);
+
+if (apic_report_tpr_access  info-enable_tpr_reporting) {
+info-enable_tpr_reporting(s, false);
+}
+}
+
 static int apic_pre_load(void *opaque)
 {
 APICCommonState *s = APIC_COMMON(opaque);
@@ -437,6 +441,7 @@ static void apic_common_class_init(ObjectClass *klass, void 
*data)
 dc-reset = apic_reset_common;
 dc-props = apic_properties_common;
 idc-realize = apic_common_realize;
+idc-unrealize = apic_common_unrealize;
 /*
  * Reason: APIC and CPU need to be wired up by
  * x86_cpu_apic_create()
diff --git a/include/hw/cpu/icc_bus.h b/include/hw/cpu/icc_bus.h
index 7e53afb..0b2ddd5 100644
--- a/include/hw/cpu/icc_bus.h
+++ b/include/hw/cpu/icc_bus.h
@@ -69,6 +69,7 @@ typedef struct ICCDeviceClass {
 /* public */
 
 DeviceRealize realize;
+DeviceUnrealize unrealize;
 } ICCDeviceClass;
 
 #define TYPE_ICC_DEVICE icc-device
diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
index 00d29e2..b46a3af 100644
--- a/include/hw/i386/apic_internal.h
+++ b/include/hw/i386/apic_internal.h
@@ -82,6 +82,7 @@ typedef struct APICCommonClass
 ICCDeviceClass parent_class;

[Qemu-devel] [PATCH v2 09/11] pc: add cpu hot unplug callback support

2015-01-13 Thread Zhu Guihua

From: Gu Zheng guz.f...@cn.fujitsu.com

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/i386/pc.c | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index dd6e8da..b7d5712 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1722,6 +1722,24 @@ out:
 error_propagate(errp, local_err);
 }
 
+static void pc_cpu_unplug(HotplugHandler *hotplug_dev,
+  DeviceState *dev, Error **errp)
+{
+HotplugHandlerClass *hhc;
+Error *local_err = NULL;
+PCMachineState *pcms = PC_MACHINE(hotplug_dev);
+
+hhc = HOTPLUG_HANDLER_GET_CLASS(pcms-acpi_dev);
+hhc-unplug(HOTPLUG_HANDLER(pcms-acpi_dev), dev, local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+/* decrement the number of CPUs */
+rtc_set_memory(pcms-rtc, 0x5f, rtc_get_memory(pcms-rtc, 0x5f) - 1);
+}
+
 static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp)
 {
@@ -1746,8 +1764,12 @@ static void 
pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
 DeviceState *dev, Error **errp)
 {
-error_setg(errp, acpi: device unplug for not supported device
-type: %s, object_get_typename(OBJECT(dev)));
+if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+pc_cpu_unplug(hotplug_dev, dev, errp);
+} else {
+error_setg(errp, acpi: device unplug for not supported device
+type: %s, object_get_typename(OBJECT(dev)));
+}
 }
 
 static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
-- 
1.9.3

[Qemu-devel] [PATCH v2 11/11] cpus: reclaim allocated vCPU objects

2015-01-13 Thread Zhu Guihua

From: Gu Zheng guz.f...@cn.fujitsu.com

After ACPI get a signal to eject a vCPU, the vCPU must be
removed from CPU list，before the vCPU really removed,  then
release the all related vCPU objects.

In order to deal well with the kvm vcpus (which can not be removed without any
protection), we do not close KVM vcpu fd, just record and mark it as stopped
into a list, so that we can reuse it for the appending cpu hot-add request if
possible. It is also the approach that kvm guys suggested:
https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html

Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 cpus.c   | 37 ++
 include/sysemu/kvm.h |  1 +
 kvm-all.c| 57 +++-
 3 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 1c25054..b5b0c91 100644
--- a/cpus.c
+++ b/cpus.c
@@ -871,6 +871,24 @@ void async_run_on_cpu(CPUState *cpu, void (*func)(void 
*data), void *data)
 qemu_cpu_kick(cpu);
 }
 
+static void qemu_kvm_destroy_vcpu(CPUState *cpu)
+{
+CPU_REMOVE(cpu);
+
+if (kvm_destroy_vcpu(cpu)  0) {
+error_report(kvm_destroy_vcpu failed.\n);
+exit(EXIT_FAILURE);
+}
+
+object_unparent(OBJECT(cpu));
+}
+
+static void qemu_tcg_destroy_vcpu(CPUState *cpu)
+{
+CPU_REMOVE(cpu);
+object_unparent(OBJECT(cpu));
+}
+
 static void flush_queued_work(CPUState *cpu)
 {
 struct qemu_work_item *wi;
@@ -963,6 +981,11 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
 }
 }
 qemu_kvm_wait_io_event(cpu);
+if (cpu-exit  !cpu_can_run(cpu)) {
+qemu_kvm_destroy_vcpu(cpu);
+qemu_mutex_unlock(qemu_global_mutex);
+return NULL;
+}
 }
 
 return NULL;
@@ -1016,6 +1039,7 @@ static void tcg_exec_all(void);
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
 CPUState *cpu = arg;
+CPUState *remove_cpu = NULL;
 
 qemu_tcg_init_cpu_signals();
 qemu_thread_get_self(cpu-thread);
@@ -1049,6 +1073,16 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 }
 }
 qemu_tcg_wait_io_event();
+CPU_FOREACH(cpu) {
+if (cpu-exit  !cpu_can_run(cpu)) {
+remove_cpu = cpu;
+break;
+}
+}
+if (remove_cpu) {
+qemu_tcg_destroy_vcpu(remove_cpu);
+remove_cpu = NULL;
+}
 }
 
 return NULL;
@@ -1406,6 +1440,9 @@ static void tcg_exec_all(void)
 break;
 }
 } else if (cpu-stop || cpu-stopped) {
+if (cpu-exit) {
+next_cpu = CPU_NEXT(cpu);
+}
 break;
 }
 }
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 30cb84d..560caef 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -188,6 +188,7 @@ int kvm_has_intx_set_mask(void);
 
 int kvm_init_vcpu(CPUState *cpu);
 int kvm_cpu_exec(CPUState *cpu);
+int kvm_destroy_vcpu(CPUState *cpu);
 
 #ifdef NEED_CPU_H
 
diff --git a/kvm-all.c b/kvm-all.c
index 2f21a4e..6ea3d70 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -71,6 +71,12 @@ typedef struct KVMSlot
 
 typedef struct kvm_dirty_log KVMDirtyLog;
 
+struct KVMParkedVcpu {
+unsigned long vcpu_id;
+int kvm_fd;
+QLIST_ENTRY(KVMParkedVcpu) node;
+};
+
 struct KVMState
 {
 AccelState parent_obj;
@@ -107,6 +113,7 @@ struct KVMState
 QTAILQ_HEAD(msi_hashtab, KVMMSIRoute) msi_hashtab[KVM_MSI_HASHTAB_SIZE];
 bool direct_msi;
 #endif
+QLIST_HEAD(, KVMParkedVcpu) kvm_parked_vcpus;
 };
 
 #define TYPE_KVM_ACCEL ACCEL_CLASS_NAME(kvm)
@@ -247,6 +254,53 @@ static int kvm_set_user_memory_region(KVMState *s, KVMSlot 
*slot)
 return kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, mem);
 }
 
+int kvm_destroy_vcpu(CPUState *cpu)
+{
+KVMState *s = kvm_state;
+long mmap_size;
+struct KVMParkedVcpu *vcpu = NULL;
+int ret = 0;
+
+DPRINTF(kvm_destroy_vcpu\n);
+
+mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
+if (mmap_size  0) {
+ret = mmap_size;
+DPRINTF(KVM_GET_VCPU_MMAP_SIZE failed\n);
+goto err;
+}
+
+ret = munmap(cpu-kvm_run, mmap_size);
+if (ret  0) {
+goto err;
+}
+
+vcpu = g_malloc0(sizeof(*vcpu));
+vcpu-vcpu_id = kvm_arch_vcpu_id(cpu);
+vcpu-kvm_fd = cpu-kvm_fd;
+QLIST_INSERT_HEAD(kvm_state-kvm_parked_vcpus, vcpu, node);
+err:
+return ret;
+}
+
+static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
+{
+struct KVMParkedVcpu *cpu;
+
+QLIST_FOREACH(cpu, s-kvm_parked_vcpus, node) {
+if (cpu-vcpu_id == vcpu_id) {
+int kvm_fd;
+
+QLIST_REMOVE(cpu, node);
+kvm_fd = cpu-kvm_fd;
+g_free(cpu);
+return kvm_fd;
+}
+}
+
+return kvm_vm_ioctl(s,

[Qemu-devel] [PATCH v2 07/11] acpi/piix4: add cpu hot unplug callback support

2015-01-13 Thread Zhu Guihua

From: Gu Zheng guz.f...@cn.fujitsu.com

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/piix4.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 9b75780..2a923fc 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -375,8 +375,14 @@ static void piix4_device_unplug_request_cb(HotplugHandler 
*hotplug_dev,
 static void piix4_device_unplug_cb(HotplugHandler *hotplug_dev,
DeviceState *dev, Error **errp)
 {
-error_setg(errp, acpi: device unplug for not supported device
-type: %s, object_get_typename(OBJECT(dev)));
+PIIX4PMState *s = PIIX4_PM(hotplug_dev);
+
+if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+acpi_cpu_unplug_cb(s-ar, s-irq, s-gpe_cpu, dev, errp);
+} else {
+error_setg(errp, acpi: device unplug for not supported device
+type: %s, object_get_typename(OBJECT(dev)));
+}
 }
 
 static void piix4_update_bus_hotplug(PCIBus *pci_bus, void *opaque)
-- 
1.9.3

[Qemu-devel] [PATCH v2 06/11] acpi/cpu: add cpu hot unplug callback function

2015-01-13 Thread Zhu Guihua

Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 cpus.c| 7 +++
 hw/acpi/cpu_hotplug.c | 8 
 include/hw/acpi/cpu_hotplug.h | 3 +++
 include/qom/cpu.h | 9 +
 4 files changed, 27 insertions(+)

diff --git a/cpus.c b/cpus.c
index d5e35c0..1c25054 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1205,6 +1205,13 @@ void resume_all_vcpus(void)
 }
 }
 
+void cpu_remove(CPUState *cpu)
+{
+cpu-stop = true;
+cpu-exit = true;
+qemu_cpu_kick(cpu);
+}
+
 /* For temporary buffers for forming a name */
 #define VCPU_THREAD_NAME_SIZE 16
 
diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c
index f8a10d2..36feb6a 100644
--- a/hw/acpi/cpu_hotplug.c
+++ b/hw/acpi/cpu_hotplug.c
@@ -94,6 +94,14 @@ void acpi_cpu_unplug_request_cb(ACPIREGS *ar, qemu_irq irq,
 acpi_update_sci(ar, irq);
 }
 
+void acpi_cpu_unplug_cb(ACPIREGS *ar, qemu_irq irq, AcpiCpuHotplug *g,
+DeviceState *dev, Error **errp)
+{
+CPUState *cpu = CPU(dev);
+
+cpu_remove(cpu);
+}
+
 void acpi_cpu_hotplug_init(MemoryRegion *parent, Object *owner,
AcpiCpuHotplug *gpe_cpu, uint16_t base)
 {
diff --git a/include/hw/acpi/cpu_hotplug.h b/include/hw/acpi/cpu_hotplug.h
index 8b15a3d..0f84adb 100644
--- a/include/hw/acpi/cpu_hotplug.h
+++ b/include/hw/acpi/cpu_hotplug.h
@@ -27,6 +27,9 @@ void acpi_cpu_unplug_request_cb(ACPIREGS *ar, qemu_irq irq,
 AcpiCpuHotplug *g, DeviceState *dev,
 Error **errp);
 
+void acpi_cpu_unplug_cb(ACPIREGS *ar, qemu_irq irq,
+AcpiCpuHotplug *g, DeviceState *dev, Error **errp);
+
 void acpi_cpu_hotplug_init(MemoryRegion *parent, Object *owner,
AcpiCpuHotplug *gpe_cpu, uint16_t base);
 #endif
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index f663199..0592b4d 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -249,6 +249,7 @@ struct CPUState {
 bool created;
 bool stop;
 bool stopped;
+bool exit;
 volatile sig_atomic_t exit_request;
 uint32_t interrupt_request;
 int singlestep_enabled;
@@ -613,6 +614,14 @@ void cpu_exit(CPUState *cpu);
 void cpu_resume(CPUState *cpu);
 
 /**
+ * cpu_remove:
+ * @cpu: The vCPU to remove.
+ *
+ * Requests the CPU @cpu to be removed.
+ */
+void cpu_remove(CPUState *cpu);
+
+/**
  * qemu_init_vcpu:
  * @cpu: The vCPU to initialize.
  *
-- 
1.9.3

[Qemu-devel] [PATCH v2 08/11] acpi/ich9: add cpu hot unplug support

2015-01-13 Thread Zhu Guihua

From: Gu Zheng guz.f...@cn.fujitsu.com

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/ich9.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index b795f8f..a892e5d 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -316,8 +316,12 @@ void ich9_pm_device_unplug_request_cb(ICH9LPCPMRegs *pm, 
DeviceState *dev,
 void ich9_pm_device_unplug_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
   Error **errp)
 {
-error_setg(errp, acpi: device unplug for not supported device
-type: %s, object_get_typename(OBJECT(dev)));
+if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+acpi_cpu_unplug_cb(pm-acpi_regs, pm-irq, pm-gpe_cpu, dev, errp);
+} else {
+error_setg(errp, acpi: device unplug for not supported device
+type: %s, object_get_typename(OBJECT(dev)));
+}
 }
 
 void ich9_pm_ospm_status(AcpiDeviceIf *adev, ACPIOSTInfoList ***list)
-- 
1.9.3

[Qemu-devel] [PATCH v2 10/11] cpu hotplug: implement function cpu_status_write() for vcpu ejection

2015-01-13 Thread Zhu Guihua

From: Chen Fan chen.fan.f...@cn.fujitsu.com

When OS ejected a vcpu (like: echo 1  /sys/bus/acpi/devices/LNXCPUXX/eject),
it would call acpi EJ0 method, the firmware need to write the new cpumap, QEMU
would know which vcpu need to be ejected.

Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Zhu Guihua zhugh.f...@cn.fujitsu.com
---
 hw/acpi/cpu_hotplug.c | 42 ++-
 hw/core/qdev.c|  2 +-
 hw/i386/acpi-dsdt-cpu-hotplug.dsl |  6 +-
 include/hw/acpi/cpu_hotplug.h |  1 +
 include/hw/qdev-core.h|  1 +
 5 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c
index 36feb6a..2a56df0 100644
--- a/hw/acpi/cpu_hotplug.c
+++ b/hw/acpi/cpu_hotplug.c
@@ -17,6 +17,26 @@ typedef enum STS_OPT {
 CLEAR,
 } STS_OPT;
 
+static void acpi_eject_vcpu(AcpiCpuHotplug *cpus_status, int64_t cpu_id)
+{
+CPUState *cpu;
+
+CPU_FOREACH(cpu) {
+CPUClass *cc = CPU_GET_CLASS(cpu);
+int64_t id = cc-get_arch_id(cpu);
+HotplugHandler *hotplug_ctrl;
+
+if (cpu_id == id) {
+cpus_status-old_sts[cpu_id / 8] = ~(1  (cpu_id % 8));
+cpus_status-sts[cpu_id / 8] = ~(1  (cpu_id % 8));
+
+hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(cpu));
+hotplug_handler_unplug(hotplug_ctrl, DEVICE(cpu), NULL);
+break;
+}
+}
+}
+
 static uint64_t cpu_status_read(void *opaque, hwaddr addr, unsigned int size)
 {
 AcpiCpuHotplug *cpus = opaque;
@@ -28,7 +48,26 @@ static uint64_t cpu_status_read(void *opaque, hwaddr addr, 
unsigned int size)
 static void cpu_status_write(void *opaque, hwaddr addr, uint64_t data,
  unsigned int size)
 {
-/* TODO: implement VCPU removal on guest signal that CPU can be removed */
+AcpiCpuHotplug *cpus = opaque;
+uint8_t val;
+int i;
+int64_t cpu_id = -1;
+
+val = cpus-old_sts[addr] ^ data;
+
+if (val == 0) {
+return;
+}
+
+for (i = 0; i  8; i++) {
+if (val  1  i) {
+cpu_id = 8 * addr + i;
+}
+}
+
+if (cpu_id != -1) {
+acpi_eject_vcpu(cpus, cpu_id);
+}
 }
 
 static const MemoryRegionOps AcpiCpuHotplug_ops = {
@@ -56,6 +95,7 @@ static void acpi_update_cpu_present_bit(AcpiCpuHotplug *g, 
CPUState *cpu,
 switch (opt) {
 case SET:
 g-sts[cpu_id / 8] |= (1  (cpu_id % 8));
+g-old_sts[cpu_id / 8] |= (1  (cpu_id % 8));
 break;
 case CLEAR:
 g-sts[cpu_id / 8] = ~(1  (cpu_id % 8));
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 901f289..9f08fe6 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -223,7 +223,7 @@ void qdev_set_legacy_instance_id(DeviceState *dev, int 
alias_id,
 dev-alias_required_for_version = required_for_version;
 }
 
-static HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev)
+HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev)
 {
 HotplugHandler *hotplug_ctrl = NULL;
 
diff --git a/hw/i386/acpi-dsdt-cpu-hotplug.dsl 
b/hw/i386/acpi-dsdt-cpu-hotplug.dsl
index 34aab5a..9485f12 100644
--- a/hw/i386/acpi-dsdt-cpu-hotplug.dsl
+++ b/hw/i386/acpi-dsdt-cpu-hotplug.dsl
@@ -50,7 +50,11 @@ Scope(\_SB) {
 }
 Method(CPEJ, 2, NotSerialized) {
 // _EJ0 method - eject callback
-Sleep(200)
+Store(Zero, Index(CPON, ToInteger(Arg0)))
+Store(One, Local0)
+ShiftLeft(Local0, Arg0, Local0)
+Not(Local0, Local0)
+And(PRS, Local0, PRS)
 }
 
 #define CPU_STATUS_LEN ACPI_GPE_PROC_LEN
diff --git a/include/hw/acpi/cpu_hotplug.h b/include/hw/acpi/cpu_hotplug.h
index 0f84adb..abbb29e 100644
--- a/include/hw/acpi/cpu_hotplug.h
+++ b/include/hw/acpi/cpu_hotplug.h
@@ -18,6 +18,7 @@
 typedef struct AcpiCpuHotplug {
 MemoryRegion io;
 uint8_t sts[ACPI_GPE_PROC_LEN];
+uint8_t old_sts[ACPI_GPE_PROC_LEN];
 } AcpiCpuHotplug;
 
 void acpi_cpu_plug_cb(ACPIREGS *ar, qemu_irq irq,
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 589bbe7..5b28a4f 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -263,6 +263,7 @@ void qdev_set_legacy_instance_id(DeviceState *dev, int 
alias_id,
 void qdev_unplug(DeviceState *dev, Error **errp);
 void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp);
+HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev);
 void qdev_machine_creation_done(void);
 bool qdev_machine_modified(void);
 
-- 
1.9.3

Re: [Qemu-devel] [PATCH 1/4] target-i386: fix movntsd on big-endian hosts

2015-01-13 Thread Paolo Bonzini



On 13/01/2015 19:48, Eduardo Habkost wrote:
   if (b1  1) {
  -gen_stq_env_A0(s, offsetof(CPUX86State, xmm_regs[reg]));
  +gen_stq_env_A0(s, offsetof(CPUX86State,
  +   xmm_regs[reg].XMM_Q(0)));
 Do we have (or will patch 4/4 introduce) the same bug on the
 tcg_gen_addi_ptr() calls that don't use the XMM_Q macro?

No, they all call into helpers that use the XMM_Q macro themselves.

Paolo

Re: [Qemu-devel] Patch Round-up for stable 2.1.3, freeze on 2015-01-14

2015-01-13 Thread Michael Roth

Quoting Marcel Apfelbaum (2015-01-13 12:48:50)
 On 01/13/2015 07:49 PM, William Dauchy wrote:
  Hello,
 
  On Jan09 23:42, Paolo Bonzini wrote:
  That's commit 364c3e6b8dd7912e01d19122d791b8c8f6df4f6c on branch
  uq/master of git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git.
 
  maybe the one for:
  fix regression when reading memory size from config file
 
  is also a patch to think about.
 Definitely, the patch was:
   [PATCH] vl.c: fix regression when reading memory size from config file + 
 Paolo's fix
 
 The thread is:
 http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg01000.html
 
 Be aware that we need both the patch and Paolo's fix.

By Paolo's fix you mean:

364c3e6, vl.c: fix regression when reading machine type from config file?

I've gone ahead and applied from uq/uq/master, but still waiting on

vl.c: fix regression when reading memory size from config file to be picked
up by a maintainer.

 Thanks,
 Marcel
 
 
 
  Thanks,

Re: [Qemu-devel] [PATCH v11 00/13] block: Incremental backup series

2015-01-13 Thread John Snow

Minor update to the github version of this patchset that brings it 
inline with the recent block pull into master. It also drops a duplicate 
patch that has since made it upstream.


Not re-posting yet to allow time for review and critique.

--js

On 01/12/2015 11:30 AM, John Snow wrote:

Welcome to version 11. I hope you are enjoying our regular newsletter.

This patchset enables the in-memory part of the incremental backup
feature. A patchset by Vladimir Sementsov-Ogievskiy enables the
migration of in-memory dirty bitmaps, and a future patchset will
enable the storage and retrieval of dirty bitmaps to and from permanent
storage.

Enough changes have been made that most Reviewed-By lines from
previous iterations have been removed. (Sorry!)

This series was originally authored by Fam Zheng;
his cover letter is included below.

~John Snow

=

This is the in memory part of the incremental backup feature.

With the added commands, we can create a bitmap on a block
backend, from which point of time all the writes are tracked by
the bitmap, marking sectors as dirty. Later, we call drive-backup
and pass the bitmap to it, to do an incremental backup.

See the last patch which adds some tests for this use case.

Fam

=

For convenience, this patchset is available on github:
 https://github.com/jnsnow/qemu/commits/dbm-backup

v11:

  - Instead of copying BdrvDirtyBitmaps and keeping a pointer to the
object we were copied from, we instead freeze a bitmap in-place
without copying it. On success, we thaw and delete the bitmap.
On failure, we merge the bitmap with a successor, which is an
anonymous bitmap attached as a child that records writes for us
for the duration of the backup operation.

This means that incremental backups can NEVER BE RETRIED in a
deterministic fashion. If an incremental backup fails on a set
of dirty sectors {x}, and a new set of dirty sectors {y} are
introduced during the backup, then any possible retry action
on an incremental backup can only operate on {x,y}. There is no
way to get an incremental backup as it would have been.

So, the failure mode for incremental backup is to try again,
and the resulting image will simply be a differential from the
last successful dirty bitmap backup.

  - Removed hbitmap_copy and bdrv_dirty_bitmap_copy.

  - Added a small fixup patch:
- Update all granularity fields to be uint64_t.
- Update documentation around BdrvDirtyBitmap structure.

  - Modified transactions to obey frozen attribute of BdrvDirtyBitmaps.

  - Added frozen attribute to the info query.

Re: [Qemu-devel] [PATCH 1/8] tls: require compiler support for __thread

2015-01-13 Thread Peter Maydell

On 13 January 2015 at 17:52, Paolo Bonzini pbonz...@redhat.com wrote:
 The block layer is now using __thread unconditionally.

Where? I did a quick grep for __thread and (other than stuff in
the *-user code and some Win32 specific files) there's no use of
__thread outside the DEFINE_TLS wrappers.

  Remove the
 fake TLS wrappers (that actually aren't TLS on !Linux) in
 include/qemu/tls.h, and add a testcase.

What platforms have you tested on? We definitely shouldn't
widen our use of __thread without testing it on all the
platforms we support...

thanks
-- PMM

Re: [Qemu-devel] Patch Round-up for stable 2.1.3, freeze on 2015-01-14

2015-01-13 Thread Marcel Apfelbaum


On 01/13/2015 07:49 PM, William Dauchy wrote:

Hello,

On Jan09 23:42, Paolo Bonzini wrote:

That's commit 364c3e6b8dd7912e01d19122d791b8c8f6df4f6c on branch
uq/master of git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git.


maybe the one for:
fix regression when reading memory size from config file

is also a patch to think about.

Definitely, the patch was:
 [PATCH] vl.c: fix regression when reading memory size from config file + 
Paolo's fix

The thread is:
http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg01000.html

Be aware that we need both the patch and Paolo's fix.
Thanks,
Marcel




Thanks,

Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.

2015-01-13 Thread Christian Borntraeger

Am 13.01.2015 um 17:04 schrieb Stefan Hajnoczi:
[...]
 I'm really starting to get worried that you are going to break things.
 This DASD hack is a layering violation but okay, go ahead if you want.
 But now you are also thinking about breaking live migration.
 
 The proper thing to do is to introduce libvirt XML syntax for DASD.
 That way the geometry can be handled as part of the machine
 configuration.  Then live migration and storage management tools can do
 the right thing.
 
 I've said this should be done in libvirt repeatedly but you keep wanting
 to hack QEMU instead of doing this cleanly :(.
 
 If you have plans to expand on this hack, please scrap this series and
 introduce libvirt XML syntax instead.

We had no plans to expand on this band-aid. I just tried to come up with ideas 
beyond this hack to make this more acceptible to you (because I though that
you want something on top). Seems that I misunderstood you, so if you  prefer
to just keep this hack and not do anything further in that direction, this is
totally fine with me.

So lets just fix up the small nits and go ahead, ok?


Christian

Re: [Qemu-devel] [PATCH 2/4] target-i386: do not memcpy in and out of xmm_regs

2015-01-13 Thread Eduardo Habkost

On Wed, Jan 07, 2015 at 06:39:13PM +0100, Paolo Bonzini wrote:
[...]
 diff --git a/target-i386/translate.c b/target-i386/translate.c
 index 5af4300..253009a 100644
 --- a/target-i386/translate.c
 +++ b/target-i386/translate.c
 @@ -2621,10 +2621,10 @@ static inline void gen_sto_env_A0(DisasContext *s, 
 int offset)
  
  static inline void gen_op_movo(int d_offset, int s_offset)
  {
 -tcg_gen_ld_i64(cpu_tmp1_i64, cpu_env, s_offset);
 -tcg_gen_st_i64(cpu_tmp1_i64, cpu_env, d_offset);
 -tcg_gen_ld_i64(cpu_tmp1_i64, cpu_env, s_offset + 8);
 -tcg_gen_st_i64(cpu_tmp1_i64, cpu_env, d_offset + 8);
 +tcg_gen_ld_i64(cpu_tmp1_i64, cpu_env, s_offset + offsetof(XMMReg, 
 XMM_Q(0));
 +tcg_gen_st_i64(cpu_tmp1_i64, cpu_env, d_offset + offsetof(XMMReg, 
 XMM_Q(0));
 +tcg_gen_ld_i64(cpu_tmp1_i64, cpu_env, s_offset + offsetof(XMMReg, 
 XMM_Q(1));
 +tcg_gen_st_i64(cpu_tmp1_i64, cpu_env, d_offset + offsetof(XMMReg, 
 XMM_Q(1));

It looks good (I even sent my Reviewed-by line), but:

target-i386/translate.c:2624:88: error: expected ‘)’ before ‘;’ token
 tcg_gen_ld_i64(cpu_tmp1_i64, cpu_env, s_offset + offsetof(XMMReg, 
XMM_Q(0));

^

-- 
Eduardo

Re: [Qemu-devel] [PATCH v5] block/raw-posix.c: Fixes raw_getlength() on Mac OS X so that it reports the correct length of a real CD

2015-01-13 Thread Programmingkid


On Jan 13, 2015, at 1:34 PM, Peter Maydell wrote:

 On 13 January 2015 at 18:26, Programmingkid programmingk...@gmail.com wrote:
 Allows for using real cdrom disc in QEMU under Mac OS X.
 
 This command was used to see if this patch worked:
 ./qemu-system-i386 -drive if=none,id=drive0,file=/dev/null,format=raw
 
 Signed-off-by: John Arbuckle programmingk...@gmail.com
 
 ---
 Replaced -errno with 0 for ioctl() failure return value.
 make check now passes with this patch.
 
 
 block/raw-posix.c |   18 +-
 configure |2 +-
 2 files changed, 18 insertions(+), 2 deletions(-)
 
 diff --git a/block/raw-posix.c b/block/raw-posix.c
 index e51293a..16fa0a4 100644
 --- a/block/raw-posix.c
 +++ b/block/raw-posix.c
 @@ -1312,7 +1312,23 @@ again:
 if (size == 0)
 #endif
 #if defined(__APPLE__)  defined(__MACH__)
 -size = LLONG_MAX;
 +{
 +uint64_t sectors = 0;
 +uint32_t sector_size = 0;
 +
 +/* Query the number of sectors on the disk */
 +ret = ioctl(fd, DKIOCGETBLOCKCOUNT, sectors);
 +if (ret == -1) {
 +return 0;
 
 We should be falling back to lseek, not returning 0...
 
 thanks
 -- PMM

I did try using lseek, but it doesn't work with Mac OS X block devices. It 
would always fail. Here is some more information on the problem: 
http://stackoverflow.com/questions/9073614/open-raw-disk-and-get-size-os-x. 

I thought things would be as simple as using fopen() with a block device as the 
argument, but Apple decided to make things more difficult. It looks like they 
are forcing developers to use the IOKit for device communications. Simple posix 
functions like lseek don't work.

Re: [Qemu-devel] [PATCH 2/2] cpu_ldst.h: Collapse laddr() and saddr() into ldst_get_host_addr()

2015-01-13 Thread Peter Maydell

On 13 January 2015 at 18:32, Peter Maydell peter.mayd...@linaro.org wrote:
 The macros laddr() and saddr() are always defined to be identical
 to each other. Replace them with a single macro, and give it a
 longer name so it's easier to grep the codebase and confirm that
 it's only used in this one place: ldst_get_host_addr().

Hmm. Actually I think we should be able to drop the _raw() accessors
entirely -- they're kind of nonsensical on softmmu and four unnecessary
extra characters on linux-user. (The only place we use them in softmmu
is in monitor.c where it should be using the ld*_p functions instead.)
I'll put together a patchset...

-- PMM

Re: [Qemu-devel] [PATCH 1/8] tls: require compiler support for __thread

2015-01-13 Thread Paolo Bonzini



On 13/01/2015 19:40, Peter Maydell wrote:
 On 13 January 2015 at 17:52, Paolo Bonzini pbonz...@redhat.com wrote:
 The block layer is now using __thread unconditionally.
 
 Where? I did a quick grep for __thread and (other than stuff in
 the *-user code and some Win32 specific files) there's no use of
 __thread outside the DEFINE_TLS wrappers.

It's in the pull request.

  Remove the
 fake TLS wrappers (that actually aren't TLS on !Linux) in
 include/qemu/tls.h, and add a testcase.
 
 What platforms have you tested on? We definitely shouldn't
 widen our use of __thread without testing it on all the
 platforms we support...

Native TLS is supported by all platforms except Windows and, I think,
OpenBSD, which will have to use GCC's emulated TLS.  For Windows we
already do.  OpenBSD ports will have to use a new-enough GCC (basically
depend on the GPLv3 GCC port).

Paolo

Re: [Qemu-devel] [PATCH 1/2] cpu_ldst.h: Remove unused ldul_ macros

2015-01-13 Thread Paolo Bonzini



On 13/01/2015 19:32, Peter Maydell wrote:
 The five ldul_ macros are not used anywhere and are marked up with an XXX
 comment. ldul is a non-standard prefix for our family of load instructions:
 we don't mark 32-bit accesses for signedness because they return a 32 bit
 quantity. So just delete them.
 
 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
  include/exec/cpu_ldst.h | 9 -
  1 file changed, 9 deletions(-)
 
 diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
 index e5550e7..4700831 100644
 --- a/include/exec/cpu_ldst.h
 +++ b/include/exec/cpu_ldst.h
 @@ -151,15 +151,6 @@
  
  #else
  
 -/* XXX: find something cleaner.
 - * Furthermore, this is false for 64 bits targets
 - */
 -#define ldul_user   ldl_user
 -#define ldul_kernel ldl_kernel
 -#define ldul_hypv   ldl_hypv
 -#define ldul_executive  ldl_executive
 -#define ldul_supervisor ldl_supervisor
 -
  /* The memory helpers for tcg-generated code need tcg_target_long etc.  */
  #include tcg.h
  
 

Trivial, even.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Re: [Qemu-devel] [PATCH for 2.3 v2 1/1] xen-hvm: increase maxmem before calling xc_domain_populate_physmap

2015-01-13 Thread Don Slutz

On 01/13/15 13:07, Stefano Stabellini wrote:
 On Mon, 12 Jan 2015, Stefano Stabellini wrote:
 On Wed, 3 Dec 2014, Don Slutz wrote:
 From: Stefano Stabellini stefano.stabell...@eu.citrix.com

 Increase maxmem before calling xc_domain_populate_physmap_exact to
 avoid the risk of running out of guest memory. This way we can also
 avoid complex memory calculations in libxl at domain construction
 time.

 This patch fixes an abort() when assigning more than 4 NICs to a VM.

 Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
 Signed-off-by: Don Slutz dsl...@verizon.com
 ---
 v2: Changes by Don Slutz
   Switch from xc_domain_getinfo to xc_domain_getinfolist
   Fix error check for xc_domain_getinfolist
   Limit increase of maxmem to only do when needed:
 Add QEMU_SPARE_PAGES (How many pages to leave free)
 Add free_pages calculation

  xen-hvm.c | 19 +++
  1 file changed, 19 insertions(+)

 diff --git a/xen-hvm.c b/xen-hvm.c
 index 7548794..d30e77e 100644
 --- a/xen-hvm.c
 +++ b/xen-hvm.c
 @@ -90,6 +90,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t 
 *shared_page, int vcpu)
  #endif
  
  #define BUFFER_IO_MAX_DELAY  100
 +#define QEMU_SPARE_PAGES 16

 We need a big comment here to explain why we have this parameter and
 when we'll be able to get rid of it.

 Other than that the patch is fine.

 Thanks!

 
 Actually I'll just go ahead and add the comment and commit, if for you
 is OK.
 

That would be fine with me.  I was still working on a good wording.
   -Don Slutz

 Cheers,
 
 Stefano

Re: [Qemu-devel] Fedora FC21 - Bug: 100% CPU and hangs in gettimeofday(tp, NULL); forever

2015-01-13 Thread Gerhard Wiesinger


On 12.01.2015 12:41, Gerhard Wiesinger wrote:

On 08.01.2015 23:28, Gerhard Wiesinger wrote:


I'll keep you up to date in the next days whether it happens again or 
not.


With qemu-kvm 2.2.0 release from the above repository the 100% usage 
didn't happen so far (although I had to reboot after kernel update).


It happens also with qemu-kvm 2.2.0 on another VM where also PostgreSQL 
is running:

(gdb) bt
#0  0x7fff9a1feff4 in gettimeofday ()
#1  0x006d425e in GetCurrentTimestamp () at timestamp.c:1274

What we know:
OK : F20: 3.17.6-200.fc20.x86_64 on guest/host, 
qemu-kvm-1.6.2-10.fc20.x86_64 on host
NOK: F21: 3.17.7-300.fc21.x86_64 on guest/host, 
qemu-kvm-2.1.2-7.fc21.x86_64 on host
NOK: F21: 3.17.8-300.fc21.x86_64 on guest/host, 
qemu-kvm-2.2.0-1.fc21.x86_64 on host


No one less can reproduce or has similar problems?
Any further ideas?

BTW: I'm running ntp in the following manner: internet = ntp server in 
VM = ntp client on KVM host (firewall runs in KVM)


Ciao,
Gerhard

Re: [Qemu-devel] [PATCH 0/1] pci: allow 0 address for PCI IO/MEM regions

2015-01-13 Thread Michael S. Tsirkin

On Tue, Jan 13, 2015 at 06:01:53PM +0100, Alexander Graf wrote:
 On 01/13/15 17:17, Michael S. Tsirkin wrote:
 On Tue, Jan 13, 2015 at 05:54:46PM +0200, Michael S. Tsirkin wrote:
 I think we already do this for PC:
 commit 83d08f2673504a299194dcac1657a13754b5932a
 Author: Michael S. Tsirkin m...@redhat.com
 Date:   Tue Oct 29 13:57:34 2013 +0100
 
 pc: map PCI address space as catchall region for not mapped 
  addresses
 
 but we need to find and fix all other targets.
 BTW this is very easy to test.
 Add an unused device (like ivshmem) enable BAR, and set it to 0.
 System should survive, as opposed to hanging.
 
 But the big question is whether this is the right
 thing to do for each platform.
 For PIIX whatever is not system memory, is PCI.
 But other boxes might have a different view of the matter.
 
 Only few platforms have PCI mapped at 0, no? So in most cases you get
 windows that simply don't have overlapping at all.
 
 
 Alex

What's missing is complete analysis of the problem.
Ideally testing on several targets.

-- 
MST

Re: [Qemu-devel] [PATCH 1/8] tls: require compiler support for __thread

2015-01-13 Thread Peter Maydell

On 13 January 2015 at 19:48, Paolo Bonzini pbonz...@redhat.com wrote:
 It's in the pull request.

That was sneaky :-)

 Native TLS is supported by all platforms except Windows and, I think,
 OpenBSD, which will have to use GCC's emulated TLS.  For Windows we
 already do.  OpenBSD ports will have to use a new-enough GCC (basically
 depend on the GPLv3 GCC port).

Hmm. That's a chunk of users who are now going to have to
change the way they've been building QEMU. Does configure
at least blow up on the old gcc, or will we just silently
build a non-working QEMU?

(I have an OpenBSD VM at home, I should test it...)

Presumably we've also just effectively dropped support for
some of the obscure stuff we have ifdefs for in configure?

-- PMM

Re: [Qemu-devel] [PATCH v15 2/2] sPAPR: Implement sPAPRPHBClass::eeh_handler

2015-01-13 Thread David Gibson

On Mon, Jan 05, 2015 at 11:26:28AM +1100, Gavin Shan wrote:
 The patch implements sPAPRPHBClass::eeh_handler so that the
 EEH RTAS requests can be routed to VFIO for further handling.
 
 Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
 ---
  hw/ppc/spapr_pci_vfio.c | 56 
 +
  hw/vfio/common.c|  1 +
  2 files changed, 57 insertions(+)
 
 diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
 index 144912b..73652a9 100644
 --- a/hw/ppc/spapr_pci_vfio.c
 +++ b/hw/ppc/spapr_pci_vfio.c
 @@ -71,6 +71,61 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState 
 *sphb, Error **errp)
  spapr_tce_get_iommu(tcet));
  }
  
 +static int spapr_phb_vfio_eeh_handler(sPAPRPHBState *sphb, int req, int opt)
 +{
 +sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
 +struct vfio_eeh_pe_op op = { .argsz = sizeof(op) };

This is a local variable, which means it won't be initialized.  You
never memset() it and it's not obvious that all fields get
initialized, which makes it dangerous to pass to an ioctl().

 +int cmd;
 +
 +switch (req) {
 +case RTAS_EEH_REQ_SET_OPTION:
 +switch (opt) {
 +case RTAS_EEH_DISABLE:
 +cmd = VFIO_EEH_PE_DISABLE;
 +break;
 +case RTAS_EEH_ENABLE:
 +cmd = VFIO_EEH_PE_ENABLE;
 +break;
 +case RTAS_EEH_THAW_IO:
 +cmd = VFIO_EEH_PE_UNFREEZE_IO;
 +break;
 +case RTAS_EEH_THAW_DMA:
 +cmd = VFIO_EEH_PE_UNFREEZE_DMA;
 +break;
 +default:
 +return -EINVAL;
 +}
 +break;
 +case RTAS_EEH_REQ_GET_STATE:
 +cmd = VFIO_EEH_PE_GET_STATE;
 +break;
 +case RTAS_EEH_REQ_RESET:
 +switch (opt) {
 +case RTAS_SLOT_RESET_DEACTIVATE:
 +cmd = VFIO_EEH_PE_RESET_DEACTIVATE;
 +break;
 +case RTAS_SLOT_RESET_HOT:
 +cmd = VFIO_EEH_PE_RESET_HOT;
 +break;
 +case RTAS_SLOT_RESET_FUNDAMENTAL:
 +cmd = VFIO_EEH_PE_RESET_FUNDAMENTAL;
 +break;
 +default:
 +return -EINVAL;
 +}
 +break;
 +case RTAS_EEH_REQ_CONFIGURE:
 +cmd = VFIO_EEH_PE_CONFIGURE;
 +break;
 +default:
 + return -EINVAL;
 +}
 +
 +op.op = cmd;
 +return vfio_container_ioctl(svphb-phb.iommu_as, svphb-iommugroupid,
 +VFIO_EEH_PE_OP, op);

Don't you need some sort of translation from the errnos the ioctl()
returns into RTAS error codes?

 +}
 +
  static void spapr_phb_vfio_reset(DeviceState *qdev)
  {
  /* Do nothing */
 @@ -84,6 +139,7 @@ static void spapr_phb_vfio_class_init(ObjectClass *klass, 
 void *data)
  dc-props = spapr_phb_vfio_properties;
  dc-reset = spapr_phb_vfio_reset;
  spc-finish_realize = spapr_phb_vfio_finish_realize;
 +spc-eeh_handler = spapr_phb_vfio_eeh_handler;
  }
  
  static const TypeInfo spapr_phb_vfio_info = {
 diff --git a/hw/vfio/common.c b/hw/vfio/common.c
 index cf483ff..8a10c8b 100644
 --- a/hw/vfio/common.c
 +++ b/hw/vfio/common.c
 @@ -948,6 +948,7 @@ int vfio_container_ioctl(AddressSpace *as, int32_t 
 groupid,
  switch (req) {
  case VFIO_CHECK_EXTENSION:
  case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
 +case VFIO_EEH_PE_OP:
  break;
  default:
  /* Return an error on unknown requests */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpjGi7d6REGx.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v15 1/2] sPAPR: Implement EEH RTAS calls

2015-01-13 Thread David Gibson

On Mon, Jan 05, 2015 at 11:26:27AM +1100, Gavin Shan wrote:
 The emulation for EEH RTAS requests from guest isn't covered
 by QEMU yet and the patch implements them.
 
 The patch defines constants used by EEH RTAS calls and adds
 callback sPAPRPHBClass::eeh_handler, which is going to be used
 this way:
 
   * RTAS calls are received in spapr_pci.c, sanity check is done
 there.
   * RTAS handlers handle what they can. If there is something it
 cannot handle and sPAPRPHBClass::eeh_handler callback is defined,
 it is called.
   * sPAPRPHBClass::eeh_handler is only implemented for VFIO now. It
 does ioctl() to the IOMMU container fd to complete the call. Error
 codes from that ioctl() are transferred back to the guest.
 
 [aik: defined RTAS tokens for EEH RTAS calls]
 Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
 ---
  hw/ppc/spapr_pci.c  | 275 
 
  include/hw/pci-host/spapr.h |   7 ++
  include/hw/ppc/spapr.h  |  43 ++-
  3 files changed, 323 insertions(+), 2 deletions(-)
 
 diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
 index 21b95b3..a150074 100644
 --- a/hw/ppc/spapr_pci.c
 +++ b/hw/ppc/spapr_pci.c
 @@ -406,6 +406,262 @@ static void 
 rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu,
  rtas_st(rets, 2, 1);/* 0 == level; 1 == edge */
  }
  
 +static int rtas_handle_eeh_request(sPAPREnvironment *spapr,
 +   uint64_t buid, uint32_t req, uint32_t opt)
 +{
 +sPAPRPHBState *sphb = find_phb(spapr, buid);
 +sPAPRPHBClass *info;
 +
 +if (!sphb) {
 +return -ENODEV;

I think it would make more sense to return RTAS error codes here,
rather than errnos.  At present all the callers seem to ignore the
exact value of this return value.

But it's not really correct to return RTAS_OUT_HW_ERROR for a bad
BUID, which is what this will do now.

Also several of the callers have already done a find_phb() by the time
they call this.  Perhaps it would make more sense for this function to
take a sPAPRPHBState * instead of the buid.

 +}
 +
 +info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
 +if (!info-eeh_handler) {
 +return -ENOENT;
 +}
 +
 +return info-eeh_handler(sphb, req, opt);
 +}
 +
 +static void rtas_ibm_set_eeh_option(PowerPCCPU *cpu,
 +sPAPREnvironment *spapr,
 +uint32_t token, uint32_t nargs,
 +target_ulong args, uint32_t nret,
 +target_ulong rets)
 +{
 +uint32_t addr, option;
 +uint64_t buid;
 +int ret;
 +
 +if ((nargs != 4) || (nret != 1)) {
 +goto param_error_exit;
 +}
 +
 +buid = ((uint64_t)rtas_ld(args, 1)  32) | rtas_ld(args, 2);
 +addr = rtas_ld(args, 0);
 +option = rtas_ld(args, 3);
 +switch (option) {
 +case RTAS_EEH_ENABLE:
 +if (!find_dev(spapr, buid, addr)) {
 +goto param_error_exit;
 +}
 +break;
 +case RTAS_EEH_DISABLE:
 +case RTAS_EEH_THAW_IO:
 +case RTAS_EEH_THAW_DMA:

So, currently these are all implemented as no-ops.  Is that correct?

 +break;
 +default:
 +goto param_error_exit;
 +}
 +
 +ret = rtas_handle_eeh_request(spapr, buid,
 +  RTAS_EEH_REQ_SET_OPTION, option);
 +if (ret = 0) {
 +rtas_st(rets, 0, RTAS_OUT_SUCCESS);
 +} else {
 +rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
 +}
 +
 +return;
 +
 +param_error_exit:
 +rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
 +}
 +
 +static void rtas_ibm_get_config_addr_info2(PowerPCCPU *cpu,
 +   sPAPREnvironment *spapr,
 +   uint32_t token, uint32_t nargs,
 +   target_ulong args, uint32_t nret,
 +   target_ulong rets)
 +{
 +uint32_t addr, option;
 +uint64_t buid;
 +sPAPRPHBState *sphb;
 +sPAPRPHBClass *info;
 +PCIDevice *pdev;
 +
 +if ((nargs != 4) || (nret != 2)) {
 +goto param_error_exit;
 +}
 +
 +buid = ((uint64_t)rtas_ld(args, 1)  32) | rtas_ld(args, 2);
 +sphb = find_phb(spapr, buid);
 +if (!sphb) {
 +goto param_error_exit;
 +}
 +
 +info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(sphb);
 +if (!info-eeh_handler) {
 +goto param_error_exit;
 +}
 +
 +addr = rtas_ld(args, 0);
 +option = rtas_ld(args, 3);
 +if (option != RTAS_GET_PE_ADDR  option != RTAS_GET_PE_MODE) {
 +goto param_error_exit;
 +}
 +
 +pdev = find_dev(spapr, buid, addr);
 +if (!pdev) {
 +goto param_error_exit;
 +}
 +
 +/*
 + * For now, we always have bus level PE whose address
 + * has format 00BBSS00. The guest OS might regard
 + * PE address 0 as invalid. We avoid that simply by
 + * extending it with one.
 +

Re: [Qemu-devel] Fedora FC21 - Bug: 100% CPU and hangs in gettimeofday(tp, NULL); forever

2015-01-13 Thread Laine Stump

On 01/13/2015 05:01 PM, Gerhard Wiesinger wrote:
 On 13.01.2015 22:16, Paolo Bonzini wrote:

 On 13/01/2015 22:14, Gerhard Wiesinger wrote:
 I also had a look at the kernel code again:
 http://lxr.free-electrons.com/source/kernel/time/timekeeping.c?v=3.17#L493

 499 do {
 500 seq = read_seqcount_begin(tk_core.seq);
 501
 502 ts-tv_sec = tk-xtime_sec;
 503 nsecs = timekeeping_get_ns(tk-tkr);
 504
 505 } while (read_seqcount_retry(tk_core.seq, seq));

 So it looks like that the seqcount always changes and therefore loops
 forever here (as far as I digged it down this is the only loop here).

 Might be something wrong with the memory barriers in recent qemu-kvm
 releases?
 No, that's not possible.  Unless you pause/resume or migrate the VM, all
 of the handling of kvmclock is entirely in the kernel.

 Any other possible explaination of the problem?

 Had a look at the diff (I guess the right file at least in qemu tree):
 # no critical changes IHMO here
 git diff -u v1.6.2..v2.1.2 ./hw/i386/kvm/clock.c

 Trying to reproduce with a loop:
 #include sys/time.h
 #include stdio.h

 int main(int argc, char* argv[])
 {
   struct timeval tv;
   int i = 0;
   for (;;)
   {
 gettimeofday(tv, 0);
 ++i;
 if (i = 1000)
 {
   i = 0;
   printf(%i\n, (int)tv.tv_sec);
 }
   }
   return 0;
 }

 As I wrote this: First tests seem to run well, so no quick win ,
 I could reproduce it with a stall in 318s :-)
 (gdb) bt
 #0  0x7fff6d9fefff in gettimeofday ()
 #1  0x004005ad in main (argc=1, argv=0x7fff6d9b28b8) at
 gettimeofdayloop.c:10

 So we have at least a testcase which is quickly to reproduce.

 So we are digging down my second findings about a major bug in
 qemu-kvm  :-)

 Can someone try, too?

 Ciao,
 Gerhard



Take a look at the following kernel bug. It specifically deals with a
hang in gettimeofday() in a KVM guest:

https://bugzilla.redhat.com/show_bug.cgi?id=1178975

There is a link to a patched kernel you can try; it fixed my problems (I
was repeatedly getting hangs in python-urlgrabber during yum updates on
F21).

[Qemu-devel] [PATCH] pseries: Limit PCI host bridge index value

2015-01-13 Thread David Gibson

pseries guests can have large numbers of PCI host bridges.  To avoid the
user having to specify a number of different configuration values for every
one, the device supports an index property which is a shorthand setting
the various window and configuration addresses from a predefined sensible
set.

There are some problems with the details at present:
  * The index propery is signed, but negative values will create PCI
windows below where we expect, potentially colliding with other devices
  * No limit is imposed on the index property and large values can
translate to extremely large window addresses.  With PCI passthrough in
particular this can mean we exceed various mapping and physical address
limits causing the guest host bridge to not work in strange ways.

This patch addresses this, by making index unsigned, and imposing a
limit.  Currently the limit allows indices from 0..255 which is probably
enough host bridges for the time being.  It's fairly easy to extend if
we discover we need more.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
---
 hw/ppc/spapr_pci.c  | 8 +++-
 include/hw/pci-host/spapr.h | 4 +++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 21b95b3..6deeb19 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -501,6 +501,12 @@ static void spapr_phb_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
+if (sphb-index  SPAPR_PCI_MAX_INDEX) {
+error_setg(errp, \index\ for PAPR PHB is too large (max %u),
+   SPAPR_PCI_MAX_INDEX);
+return;
+}
+
 sphb-buid = SPAPR_PCI_BASE_BUID + sphb-index;
 sphb-dma_liobn = SPAPR_PCI_BASE_LIOBN + sphb-index;
 
@@ -669,7 +675,7 @@ static void spapr_phb_reset(DeviceState *qdev)
 }
 
 static Property spapr_phb_properties[] = {
-DEFINE_PROP_INT32(index, sPAPRPHBState, index, -1),
+DEFINE_PROP_UINT32(index, sPAPRPHBState, index, -1),
 DEFINE_PROP_UINT64(buid, sPAPRPHBState, buid, -1),
 DEFINE_PROP_UINT32(liobn, sPAPRPHBState, dma_liobn, -1),
 DEFINE_PROP_UINT64(mem_win_addr, sPAPRPHBState, mem_win_addr, -1),
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 4ea2a0d..876ecf0 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -64,7 +64,7 @@ typedef struct spapr_pci_msi_mig {
 struct sPAPRPHBState {
 PCIHostState parent_obj;
 
-int32_t index;
+uint32_t index;
 uint64_t buid;
 char *dtbusname;
 
@@ -94,6 +94,8 @@ struct sPAPRPHBVFIOState {
 int32_t iommugroupid;
 };
 
+#define SPAPR_PCI_MAX_INDEX  255
+
 #define SPAPR_PCI_BASE_BUID  0x8002000ULL
 
 #define SPAPR_PCI_WINDOW_BASE0x100ULL
-- 
2.1.0

Re: [Qemu-devel] question about live migration with storage

2015-01-13 Thread Zhang Haoyu


On 2015-01-13 17:45:45, Paolo Bonzini wrote:
 
 On 13/01/2015 03:03, Zhang Haoyu wrote:
  I want to live migrate a vm with storage, with regard to the migration of 
  storage,
  should I use drive_mirror or traditional mechanism implemented in 
  block-migration.c ?
  
  Because I don't use libvirtd to manage vm,
  if I want to use drive_mirror to perform live migration with storage,
  how to organize the flow via script?
 
 The same as libvirt does.
 
Hi, Paolo,
what's advantages of drive_mirror over traditional mechanism implemented in 
block-migration.c ?
Why libvirt use drive_mirror instead of traditional iterative mechanism as the 
default way
of live migration with non-shared storage?

Thanks,
Zhang Haoyu
 Paolo

Re: [Qemu-devel] [PATCH v3] spapr-pci: Enable huge BARs

2015-01-13 Thread David Gibson

On Fri, Jan 09, 2015 at 12:11:14PM +1100, Alexey Kardashevskiy wrote:
 At the moment sPAPR only supports 512MB window for MMIO BARs. However
 modern devices might want bigger 64bit BARs.
 
 This extends MMIO window from 512MB to 62GB (aligned to
 SPAPR_PCI_WINDOW_SPACING) and advertises it in 2 records in
 the PHB ranges property. 32bit gets the space from
 SPAPR_PCI_MEM_WIN_BUS_OFFSET till the end of 4GB, 64bit gets the rest
 of the space.
 
 The approach changes the device tree which is a guest visible change, however
 it won't break migration as:
 1. we do not support migration to older QEMU versions
 2. migration to newer QEMU will migrate the device tree as well and since
 the new layout only extends the old one and does not change address mappigns,
 no breakage is expected here too.
 
 SLOF change is required to utilize this extension.
 
 Suggested-by: Benjamin Herrenschmidt b...@kernel.crashing.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 Changes:
 v3:
 * used SPAPR_PCI_WINDOW_SPACING in SPAPR_PCI_MMIO_WIN_SIZE
 * extended commit log
 
 v2:
 * do not change existing memory layout
 * do not create another mmio window
 ---
  hw/ppc/spapr_pci.c  | 8 +++-
  include/hw/pci-host/spapr.h | 7 ---
  2 files changed, 11 insertions(+), 4 deletions(-)
 
 diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
 index c0d8c1c..7b73106 100644
 --- a/hw/ppc/spapr_pci.c
 +++ b/hw/ppc/spapr_pci.c
 @@ -862,6 +862,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
  int bus_off, i, j;
  char nodename[256];
  uint32_t bus_range[] = { cpu_to_be32(0), cpu_to_be32(0xff) };
 +const uint64_t win32size = (1ULL  32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET;
  struct {
  uint32_t hi;
  uint64_t child;
 @@ -876,7 +877,12 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
  {
  cpu_to_be32(b_ss(2)), cpu_to_be64(SPAPR_PCI_MEM_WIN_BUS_OFFSET),
  cpu_to_be64(phb-mem_win_addr),
 -cpu_to_be64(memory_region_size(phb-memwindow)),
 +cpu_to_be64(win32size),
 +},
 +{
 +cpu_to_be32(b_ss(3)), cpu_to_be64(1ULL  32),
 +cpu_to_be64(phb-mem_win_addr + win32size),
 +cpu_to_be64(memory_region_size(phb-memwindow) - win32size)
  },

I think this needs to be a little fancier.  It will work in the case
of the normal PHB configuration.  But the user can override the size
of the memory window size, in which case you may need to omit the
second range, or even limit the size of the 32-bit range.

  };
  uint64_t bus_reg[] = { cpu_to_be64(phb-buid), 0 };
 diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
 index 4ea2a0d..92695b6 100644
 --- a/include/hw/pci-host/spapr.h
 +++ b/include/hw/pci-host/spapr.h
 @@ -96,17 +96,18 @@ struct sPAPRPHBVFIOState {
  
  #define SPAPR_PCI_BASE_BUID  0x8002000ULL
  
 +#define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x8000ULL
 +
  #define SPAPR_PCI_WINDOW_BASE0x100ULL
  #define SPAPR_PCI_WINDOW_SPACING 0x10ULL
  #define SPAPR_PCI_MMIO_WIN_OFF   0xA000
 -#define SPAPR_PCI_MMIO_WIN_SIZE  0x2000
 +#define SPAPR_PCI_MMIO_WIN_SIZE  (SPAPR_PCI_WINDOW_SPACING - \
 + SPAPR_PCI_MEM_WIN_BUS_OFFSET)
  #define SPAPR_PCI_IO_WIN_OFF 0x8000
  #define SPAPR_PCI_IO_WIN_SIZE0x1
  
  #define SPAPR_PCI_MSI_WINDOW 0x400ULL
  
 -#define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x8000ULL
 -
  static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
  {
  return xics_get_qirq(spapr-icp, phb-lsi_table[pin].irq);

I think you also should disable the extended window for older machine
revisions.  It would probably work anyway, but I think it's going to
be safer for backwards compat if you keep the windows the same for a
given machine revision.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpJv40AShru7.pgp
Description: PGP signature

Re: [Qemu-devel] [RFC PATCH] spapr_vio/spapr_iommu: Move VIO bypass where it belongs

2015-01-13 Thread David Gibson

On Fri, Jan 09, 2015 at 12:02:51PM +1100, Alexey Kardashevskiy wrote:
 Instead of tweaking a TCE table device by adding there a bypass flag,
 let's add an alias to RAM and IOMMU memory region, and enable/disable
 those according to the selected bypass mode.
 This way IOMMU memory region can have size of the actual window rather
 than ram_size which is essential for upcoming DDW support.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 
 The main reason for this patch is DDW and the fact that sPAPRTCETable
 used for DMA windows for VFIO. My latest approach removes all DMA windows
 on the guest reset (and creates a new 32bit one) which means than VFIO
 unmaps everything and this fails as normally sPAPRTCETable MemoryRegion is
 ram_size big (to support bypass) while it should be 1-2GB.

Paolo already mentioned the only significant problem I see with this,
which is migration backwards compat.  Otherwise it looks good, sorry I
messed it up in the first place :).

[snip]
 @@ -456,14 +469,25 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
  
  if (pc-rtce_window_size) {
  uint32_t liobn = SPAPR_VIO_BASE_LIOBN | dev-reg;
 +
 +memory_region_init(dev-mrroot, OBJECT(dev), iommu-spapr-root,
 +   ram_size);
 +memory_region_init_alias(dev-mrbypass, OBJECT(dev),
 + iommu-spapr-bypass, get_system_memory(),
 + 0, ram_size);
 +memory_region_add_subregion_overlap(dev-mrroot, 0, dev-mrbypass, 
 1);
 +address_space_init(dev-as, dev-mrroot, qdev-id);
 +
  dev-tcet = spapr_tce_new_table(qdev, liobn,
  0,
  SPAPR_TCE_PAGE_SHIFT,
  pc-rtce_window_size 
  SPAPR_TCE_PAGE_SHIFT, false);
 -address_space_init(dev-as, spapr_tce_get_iommu(dev-tcet), 
 qdev-id);
 +memory_region_add_subregion_overlap(dev-mrroot, 0,
 +spapr_tce_get_iommu(dev-tcet), 
 2);
  }
  
 +

^^ Tiny nit, extraneous whitespace change.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpggLAYVI6v_.pgp
Description: PGP signature

1 2 3 >

1 - 100 of 237 matches

Mail list logo