Re: [Qemu-block] [PATCH] nvme: generate OpenFirmware device path in the "bootorder" fw_cfg file

2016-01-26 Thread Gonglei (Arei)
>
> Subject: [PATCH] nvme: generate OpenFirmware device path in the "bootorder"
> fw_cfg file
> 
> Background on QEMU boot indices
> ---
> 
> Normally, the "bootindex" property is configured for bootable devices
> with:
> 
>   DEVICE_instance_init()
> device_add_bootindex_property(..., "bootindex", ...)
>   object_property_add(..., device_get_bootindex,
>   device_set_bootindex, ...)
> 
> and when the bootindex is set on the QEMU command line, with
> 
>   -device DEVICE,...,bootindex=N
> 
> the setter that was configured above is invoked:
> 
>   device_set_bootindex()
> /* parse boot index */
> visit_type_int32()
> 
> /* verify unicity */
> check_boot_index()
> 
> /* store parsed boot index */
> ...
> 
> /* insert device path to boot order */
> add_boot_device_path()
> 
> In the last step, add_boot_device_path() ensures that an OpenFirmware
> device path will show up in the "bootorder" fw_cfg file, at a position
> corresponding to the device's boot index. Thus guest firmware (SeaBIOS and
> OVMF) can try to boot off the device with the right priority.
> 
> NVMe boot index
> ---
> 
> In QEMU commit 33739c712982,
> 
>   nvma: ide: add bootindex to qom property
> 
> the following generic setters / getters:
> - device_set_bootindex()
> - device_get_bootindex()
> 
> were open-coded for NVMe, under the names
> - nvme_set_bootindex()
> - nvme_get_bootindex()
> 
> Plus nvme_instance_init() was added to configure the "bootindex" property
> manually, designating the open-coded getter & setter, rather than calling
> device_add_bootindex_property().
> 
> Crucially, nvme_set_bootindex() avoided the final add_boot_device_path()
> call. This fact is spelled out in the message of commit 33739c712982, and
> it was presumably the entire reason for all of the code duplication.
> 
> Now, Vladislav filed an RFE for OVMF
> ; OVMF should boot off NVMe
> devices. It is simple to build edk2's existent NvmExpressDxe driver into
> OVMF, but the boot order matching logic in OVMF can only handle NVMe if
> the "bootorder" fw_cfg file includes such devices.
> 
> Therefore this patch converts the NVMe device model to
> device_set_bootindex() all the way.
> 
> Device paths
> 
> 
> device_set_bootindex() accepts an optional parameter called "suffix". When
> present, it is expected to take the form of an OpenFirmware device path
> node, and it gets appended as last node to the otherwise auto-generated
> OFW path.
> 
> For NVMe, the auto-generated part is
> 
>   /pci@i0cf8/pci8086,5845@6[,1]
>^ ^^  ^
>| |PCI slot and (present when nonzero)
>| |function of the NVMe controller, both hex
>| "driver name" component, built from PCI vendor & device IDs
>PCI root at system bus port, PIO
> 
> to which here we append the suffix
> 
>   /namespace@1,0
>  ^ ^
>  | big endian (MSB at lowest address) numeric interpretation
>  | of the 64-bit IEEE Extended Unique Identifier, aka EUI-64,
>  | hex
>  32-bit NVMe namespace identifier, aka NSID, hex
> 
> resulting in the OFW device path
> 
>   /pci@i0cf8/pci8086,5845@6[,1]/namespace@1,0
> 
> The reason for including the NSID and the EUI-64 is that an NVMe device
> can in theory produce several different namespaces (distinguished by
> NSID). Additionally, each of those may (optionally) have an EUI-64 value.
> 
> For now, QEMU only provides namespace 1.
> 
> Furthermore, QEMU doesn't even represent the EUI-64 as a standalone field;
> it is embedded (and left unused) inside the "NvmeIdNs.res30" array, at the
> last eight bytes. (Which is fine, since EUI-64 can be left zero-filled if
> unsupported by the device.)
> 
> Based on the above, we set the "unit address" part of the last
> ("namespace") node to fixed "1,0".
> 
> OVMF will then map the above OFW device path to the following UEFI device
> path fragment, for boot order processing:
> 
>   PciRoot(0x0)/Pci(0x6,0x1)/NVMe(0x1,00-00-00-00-00-00-00-00)
>   ^^   ^^^   ^
>   ||   |||   octets of the EUI-64 in address
> order
>   ||   ||NSID
>   ||   |NVMe namespace messaging device path
> node
>   |PCI slot and function
>   PCI root bridge
> 
> Cc: Keith Busch  (supporter:nvme)
> Cc: Kevin Wolf  (supporter:Block layer core)
> Cc: qemu-block@nongnu.org (open list:nvme)
> Cc: Gonglei 
> Cc: Vladislav Vovchenko 
> Cc: Feng Tian 
> Cc: Gerd Hoffmann 
> Cc: Kevin O'Connor 
> Signed-off-by: Laszlo Ersek 
> ---
>  hw/block/nvme.c | 42 +-
>  1 file changed, 5 insertions(+), 37 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index a5fedb2..c68b625 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nv

Re: [Qemu-block] [Qemu-devel] [PATCH v2 5/5] blockjob: add Job parameter to BlockCompletionFunc

2016-01-26 Thread Fam Zheng
On Tue, 01/26 18:54, John Snow wrote:
> It will no longer be sufficient to rely on the opaque parameter
> containing a BDS, and there's no way to reliably include a
> self-reference to the job we're creating, so always pass the Job
> object forward to any callbacks.
> 
> Signed-off-by: John Snow 
> ---
>  block/backup.c|  2 +-
>  block/commit.c|  2 +-
>  block/mirror.c|  6 +++---
>  block/stream.c|  2 +-
>  blockdev.c| 14 +-
>  blockjob.c| 13 +++--
>  include/block/block.h |  2 ++
>  include/block/block_int.h | 10 +-
>  include/block/blockjob.h  |  6 --
>  qemu-img.c|  4 ++--
>  tests/test-blockjob-txn.c |  4 ++--
>  11 files changed, 37 insertions(+), 28 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 735cbe6..dd7e532 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -492,7 +492,7 @@ BlockJob *backup_start(BlockDriverState *bs, 
> BlockDriverState *target,
> BdrvDirtyBitmap *sync_bitmap,
> BlockdevOnError on_source_error,
> BlockdevOnError on_target_error,
> -   BlockCompletionFunc *cb, void *opaque,
> +   BlockJobCompletionFunc *cb, void *opaque,
> BlockJobTxn *txn, Error **errp)
>  {
>  int64_t len;
> diff --git a/block/commit.c b/block/commit.c
> index 446a3ae..aca4b84 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -203,7 +203,7 @@ static const BlockJobDriver commit_job_driver = {
>  
>  void commit_start(BlockDriverState *bs, BlockDriverState *base,
>BlockDriverState *top, int64_t speed,
> -  BlockdevOnError on_error, BlockCompletionFunc *cb,
> +  BlockdevOnError on_error, BlockJobCompletionFunc *cb,
>void *opaque, const char *backing_file_str, Error **errp)
>  {
>  CommitBlockJob *s;
> diff --git a/block/mirror.c b/block/mirror.c
> index b193e36..8016834 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -715,7 +715,7 @@ static BlockJob *mirror_start_job(BlockDriverState *bs,
>BlockdevOnError on_source_error,
>BlockdevOnError on_target_error,
>bool unmap,
> -  BlockCompletionFunc *cb,
> +  BlockJobCompletionFunc *cb,
>void *opaque, Error **errp,
>const BlockJobDriver *driver,
>bool is_none_mode, BlockDriverState *base)
> @@ -802,7 +802,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState 
> *target,
>MirrorSyncMode mode, BlockdevOnError on_source_error,
>BlockdevOnError on_target_error,
>bool unmap,
> -  BlockCompletionFunc *cb,
> +  BlockJobCompletionFunc *cb,
>void *opaque, Error **errp)
>  {
>  bool is_none_mode;
> @@ -827,7 +827,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState 
> *target,
>  BlockJob *commit_active_start(BlockDriverState *bs, BlockDriverState *base,
>int64_t speed,
>BlockdevOnError on_error,
> -  BlockCompletionFunc *cb,
> +  BlockJobCompletionFunc *cb,
>void *opaque, Error **errp)
>  {
>  int64_t length, base_length;
> diff --git a/block/stream.c b/block/stream.c
> index 26a2990..0c8ae20 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -217,7 +217,7 @@ static const BlockJobDriver stream_job_driver = {
>  BlockJob *stream_start(BlockDriverState *bs, BlockDriverState *base,
>const char *backing_file_str, int64_t speed,
>BlockdevOnError on_error,
> -  BlockCompletionFunc *cb,
> +  BlockJobCompletionFunc *cb,
>void *opaque, Error **errp)
>  {
>  StreamBlockJob *s;
> diff --git a/blockdev.c b/blockdev.c
> index ad46aa8..9851405 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2854,28 +2854,24 @@ out:
>  aio_context_release(aio_context);
>  }
>  
> -static void block_job_cb(void *opaque, int ret)
> +static void block_job_cb(BlockJob *job, int ret)
>  {
>  /* Note that this function may be executed from another AioContext 
> besides
>   * the QEMU main loop.  If you need to access anything that assumes the
>   * QEMU global mutex, use a BH or introduce a mutex.
>   */
> -
> -BlockDriverState *bs = opaque;
>  const char *msg = NULL;
>  
> -trace_block_job_cb(bs, bs->job, ret);
> -
> -assert(bs->job);
> +trace_block_job_cb(job->bs, job, ret);
>  
>  if (ret < 0) {
>  msg = strerr

Re: [Qemu-block] [Qemu-devel] [PATCH v2 4/5] block/backup: Pack Notifier within BackupBlockJob

2016-01-26 Thread Fam Zheng
On Tue, 01/26 18:54, John Snow wrote:
> Instead of relying on peeking at bs->job, we want to explicitly get
> a reference to the job that was involved in this notifier callback.
> 
> Pack the Notifier inside of the BackupBlockJob so we can use
> container_of to get a reference back to the BackupBlockJob object.
> 
> This cuts out one more case where we rely unnecessarily on bs->job.
> 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: John Snow 
> ---
>  block/backup.c | 20 +++-
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index ce96b45..735cbe6 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -46,6 +46,7 @@ typedef struct BackupBlockJob {
>  CoRwlock flush_rwlock;
>  uint64_t sectors_read;
>  HBitmap *bitmap;
> +NotifierWithReturn before_write;
>  QLIST_HEAD(, CowRequest) inflight_reqs;
>  } BackupBlockJob;
>  
> @@ -87,11 +88,11 @@ static void cow_request_end(CowRequest *req)
>  }
>  
>  static int coroutine_fn backup_do_cow(BlockDriverState *bs,
> +  BackupBlockJob *job,
>int64_t sector_num, int nb_sectors,
>bool *error_is_read,
>bool is_write_notifier)
>  {
> -BackupBlockJob *job = (BackupBlockJob *)bs->job;
>  CowRequest cow_request;
>  struct iovec iov;
>  QEMUIOVector bounce_qiov;
> @@ -189,6 +190,7 @@ static int coroutine_fn backup_before_write_notify(
>  NotifierWithReturn *notifier,
>  void *opaque)
>  {
> +BackupBlockJob *job = container_of(notifier, BackupBlockJob, 
> before_write);
>  BdrvTrackedRequest *req = opaque;
>  int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
>  int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
> @@ -196,7 +198,8 @@ static int coroutine_fn backup_before_write_notify(
>  assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
>  assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
>  
> -return backup_do_cow(req->bs, sector_num, nb_sectors, NULL, true);
> +return backup_do_cow(req->bs, job, sector_num,
> + nb_sectors, NULL, true);
>  }
>  
>  static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
> @@ -344,7 +347,8 @@ static int coroutine_fn 
> backup_run_incremental(BackupBlockJob *job)
>  if (yield_and_check(job)) {
>  return ret;
>  }
> -ret = backup_do_cow(bs, cluster * BACKUP_SECTORS_PER_CLUSTER,
> +ret = backup_do_cow(bs, job,
> +cluster * BACKUP_SECTORS_PER_CLUSTER,
>  BACKUP_SECTORS_PER_CLUSTER, 
> &error_is_read,
>  false);
>  if ((ret < 0) &&
> @@ -380,9 +384,6 @@ static void coroutine_fn backup_run(void *opaque)
>  BlockDriverState *bs = job->common.bs;
>  BlockDriverState *target = job->target;
>  BlockdevOnError on_target_error = job->on_target_error;
> -NotifierWithReturn before_write = {
> -.notify = backup_before_write_notify,
> -};
>  int64_t start, end;
>  int ret = 0;
>  
> @@ -400,7 +401,8 @@ static void coroutine_fn backup_run(void *opaque)
>  blk_iostatus_enable(target->blk);
>  }
>  
> -bdrv_add_before_write_notifier(bs, &before_write);
> +job->before_write.notify = backup_before_write_notify;
> +bdrv_add_before_write_notifier(bs, &job->before_write);
>  
>  if (job->sync_mode == MIRROR_SYNC_MODE_NONE) {
>  while (!block_job_is_cancelled(&job->common)) {
> @@ -452,7 +454,7 @@ static void coroutine_fn backup_run(void *opaque)
>  }
>  }
>  /* FULL sync mode we copy the whole drive. */
> -ret = backup_do_cow(bs, start * BACKUP_SECTORS_PER_CLUSTER,
> +ret = backup_do_cow(bs, job, start * BACKUP_SECTORS_PER_CLUSTER,
>  BACKUP_SECTORS_PER_CLUSTER, &error_is_read, false);
>  if (ret < 0) {
>  /* Depending on error action, fail now or retry cluster */
> @@ -468,7 +470,7 @@ static void coroutine_fn backup_run(void *opaque)
>  }
>  }
>  
> -notifier_with_return_remove(&before_write);
> +notifier_with_return_remove(&job->before_write);
>  
>  /* wait until pending backup_do_cow() calls have completed */
>  qemu_co_rwlock_wrlock(&job->flush_rwlock);
> -- 
> 2.4.3
> 
> 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [Qemu-devel] [PATCH v5 00/12] Dirty bitmaps migration

2016-01-26 Thread Fam Zheng
On Tue, 01/26 17:57, John Snow wrote:
> Can we implement a function that, in the event of a disaster, compares
> the current state of the drive with the last known good incremental and
> populates a bitmap based on the difference?
> 
> Actually, we probably really want this feature around regardless. It'll
> allow us to start incremental backup chains from full backups made
> before we even knew we wanted to start making them.
> 
> Something like:
> 
> block-dirty-bitmap-diff node=drive0 name=bitmap0 target=/path/to/file
> 
> I like this idea, I think I'll prototype just this little piece, since
> it's useful even without postcopy bitmap migration.

Good idea!  It looks like another type of block job.

Fam



[Qemu-block] [PATCH v2 1/5] block: Allow mirror_start to return job references

2016-01-26 Thread John Snow
Pick up an extra reference in mirror_start_job to allow callers
of mirror_start and commit_active_start to get a reference to
the job they have created. Phase out callers from fishing the job
out of bs->job -- use the return value instead.

Callers of mirror_start_job and commit_active_start are now
responsible for putting down their reference to the job.

No callers of mirror_start yet seem to need the reference, so
that's left as a void return for now.

Ultimately, this patch fixes qemu-img's reliance on bs->job.

Reviewed-by: Kevin Wolf 
Signed-off-by: John Snow 
---
 block/mirror.c| 72 ++-
 blockdev.c|  8 --
 include/block/block_int.h | 10 +++
 qemu-img.c| 12 +---
 4 files changed, 59 insertions(+), 43 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index e9e151c..b193e36 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -707,17 +707,18 @@ static const BlockJobDriver commit_active_job_driver = {
 .complete  = mirror_complete,
 };
 
-static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
- const char *replaces,
- int64_t speed, uint32_t granularity,
- int64_t buf_size,
- BlockdevOnError on_source_error,
- BlockdevOnError on_target_error,
- bool unmap,
- BlockCompletionFunc *cb,
- void *opaque, Error **errp,
- const BlockJobDriver *driver,
- bool is_none_mode, BlockDriverState *base)
+static BlockJob *mirror_start_job(BlockDriverState *bs,
+  BlockDriverState *target,
+  const char *replaces,
+  int64_t speed, uint32_t granularity,
+  int64_t buf_size,
+  BlockdevOnError on_source_error,
+  BlockdevOnError on_target_error,
+  bool unmap,
+  BlockCompletionFunc *cb,
+  void *opaque, Error **errp,
+  const BlockJobDriver *driver,
+  bool is_none_mode, BlockDriverState *base)
 {
 MirrorBlockJob *s;
 BlockDriverState *replaced_bs;
@@ -732,12 +733,12 @@ static void mirror_start_job(BlockDriverState *bs, 
BlockDriverState *target,
  on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
 (!bs->blk || !blk_iostatus_is_enabled(bs->blk))) {
 error_setg(errp, QERR_INVALID_PARAMETER, "on-source-error");
-return;
+return NULL;
 }
 
 if (buf_size < 0) {
 error_setg(errp, "Invalid parameter 'buf-size'");
-return;
+return NULL;
 }
 
 if (buf_size == 0) {
@@ -749,19 +750,19 @@ static void mirror_start_job(BlockDriverState *bs, 
BlockDriverState *target,
 if (replaces) {
 replaced_bs = bdrv_lookup_bs(replaces, replaces, errp);
 if (replaced_bs == NULL) {
-return;
+return NULL;
 }
 } else {
 replaced_bs = bs;
 }
 if (replaced_bs->blk && target->blk) {
 error_setg(errp, "Can't create node with two BlockBackends");
-return;
+return NULL;
 }
 
 s = block_job_create(driver, bs, speed, cb, opaque, errp);
 if (!s) {
-return;
+return NULL;
 }
 
 s->replaces = g_strdup(replaces);
@@ -778,7 +779,7 @@ static void mirror_start_job(BlockDriverState *bs, 
BlockDriverState *target,
 if (!s->dirty_bitmap) {
 g_free(s->replaces);
 block_job_unref(&s->common);
-return;
+return NULL;
 }
 
 bdrv_op_block_all(s->target, s->common.blocker);
@@ -788,9 +789,11 @@ static void mirror_start_job(BlockDriverState *bs, 
BlockDriverState *target,
 blk_set_on_error(s->target->blk, on_target_error, on_target_error);
 blk_iostatus_enable(s->target->blk);
 }
+block_job_ref(&s->common);
 s->common.co = qemu_coroutine_create(mirror_run);
 trace_mirror_start(bs, s, s->common.co, opaque);
 qemu_coroutine_enter(s->common.co, s);
+return &s->common;
 }
 
 void mirror_start(BlockDriverState *bs, BlockDriverState *target,
@@ -804,6 +807,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState 
*target,
 {
 bool is_none_mode;
 BlockDriverState *base;
+BlockJob *job;
 
 if (mode == MIRROR_SYNC_MODE_INCREMENTAL) {
 error_setg(errp, "Sync mode 'incremental' not supported");
@@ -811,27 +815,31 @@ void mirror_start(BlockDriverState *bs, BlockDriverState 
*target,
 }
 is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
 base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
-   

[Qemu-block] [PATCH v2 5/5] blockjob: add Job parameter to BlockCompletionFunc

2016-01-26 Thread John Snow
It will no longer be sufficient to rely on the opaque parameter
containing a BDS, and there's no way to reliably include a
self-reference to the job we're creating, so always pass the Job
object forward to any callbacks.

Signed-off-by: John Snow 
---
 block/backup.c|  2 +-
 block/commit.c|  2 +-
 block/mirror.c|  6 +++---
 block/stream.c|  2 +-
 blockdev.c| 14 +-
 blockjob.c| 13 +++--
 include/block/block.h |  2 ++
 include/block/block_int.h | 10 +-
 include/block/blockjob.h  |  6 --
 qemu-img.c|  4 ++--
 tests/test-blockjob-txn.c |  4 ++--
 11 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 735cbe6..dd7e532 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -492,7 +492,7 @@ BlockJob *backup_start(BlockDriverState *bs, 
BlockDriverState *target,
BdrvDirtyBitmap *sync_bitmap,
BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
-   BlockCompletionFunc *cb, void *opaque,
+   BlockJobCompletionFunc *cb, void *opaque,
BlockJobTxn *txn, Error **errp)
 {
 int64_t len;
diff --git a/block/commit.c b/block/commit.c
index 446a3ae..aca4b84 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -203,7 +203,7 @@ static const BlockJobDriver commit_job_driver = {
 
 void commit_start(BlockDriverState *bs, BlockDriverState *base,
   BlockDriverState *top, int64_t speed,
-  BlockdevOnError on_error, BlockCompletionFunc *cb,
+  BlockdevOnError on_error, BlockJobCompletionFunc *cb,
   void *opaque, const char *backing_file_str, Error **errp)
 {
 CommitBlockJob *s;
diff --git a/block/mirror.c b/block/mirror.c
index b193e36..8016834 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -715,7 +715,7 @@ static BlockJob *mirror_start_job(BlockDriverState *bs,
   BlockdevOnError on_source_error,
   BlockdevOnError on_target_error,
   bool unmap,
-  BlockCompletionFunc *cb,
+  BlockJobCompletionFunc *cb,
   void *opaque, Error **errp,
   const BlockJobDriver *driver,
   bool is_none_mode, BlockDriverState *base)
@@ -802,7 +802,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState 
*target,
   MirrorSyncMode mode, BlockdevOnError on_source_error,
   BlockdevOnError on_target_error,
   bool unmap,
-  BlockCompletionFunc *cb,
+  BlockJobCompletionFunc *cb,
   void *opaque, Error **errp)
 {
 bool is_none_mode;
@@ -827,7 +827,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState 
*target,
 BlockJob *commit_active_start(BlockDriverState *bs, BlockDriverState *base,
   int64_t speed,
   BlockdevOnError on_error,
-  BlockCompletionFunc *cb,
+  BlockJobCompletionFunc *cb,
   void *opaque, Error **errp)
 {
 int64_t length, base_length;
diff --git a/block/stream.c b/block/stream.c
index 26a2990..0c8ae20 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -217,7 +217,7 @@ static const BlockJobDriver stream_job_driver = {
 BlockJob *stream_start(BlockDriverState *bs, BlockDriverState *base,
   const char *backing_file_str, int64_t speed,
   BlockdevOnError on_error,
-  BlockCompletionFunc *cb,
+  BlockJobCompletionFunc *cb,
   void *opaque, Error **errp)
 {
 StreamBlockJob *s;
diff --git a/blockdev.c b/blockdev.c
index ad46aa8..9851405 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2854,28 +2854,24 @@ out:
 aio_context_release(aio_context);
 }
 
-static void block_job_cb(void *opaque, int ret)
+static void block_job_cb(BlockJob *job, int ret)
 {
 /* Note that this function may be executed from another AioContext besides
  * the QEMU main loop.  If you need to access anything that assumes the
  * QEMU global mutex, use a BH or introduce a mutex.
  */
-
-BlockDriverState *bs = opaque;
 const char *msg = NULL;
 
-trace_block_job_cb(bs, bs->job, ret);
-
-assert(bs->job);
+trace_block_job_cb(job->bs, job, ret);
 
 if (ret < 0) {
 msg = strerror(-ret);
 }
 
-if (block_job_is_cancelled(bs->job)) {
-block_job_event_cancelled(bs->job);
+if (block_job_is_cancelled(job)) {
+block_job_event_cancelled(job);
 } else {
-block_job_event_completed(bs->job, msg);
+block_job_event_c

[Qemu-block] [PATCH v2 3/5] block: allow backup_start to return job references

2016-01-26 Thread John Snow
backup_start picks up a reference to return the job it created to a
caller. callers are updated to put down the reference when they are
finished.

This is particularly interesting for transactions where backup jobs
pick up an implicit reference to the job. Previously, we check to
see if the job still exists by seeing if (bs->job == state->job),
but now we can be assured that our job object is still valid.

The job of course may have been canceled already, though.

Reviewed-by: Kevin Wolf 
Signed-off-by: John Snow 
---
 block/backup.c|  38 +-
 blockdev.c| 180 +-
 include/block/block_int.h |   2 +-
 3 files changed, 120 insertions(+), 100 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 00cafdb..ce96b45 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -485,13 +485,13 @@ static void coroutine_fn backup_run(void *opaque)
 block_job_defer_to_main_loop(&job->common, backup_complete, data);
 }
 
-void backup_start(BlockDriverState *bs, BlockDriverState *target,
-  int64_t speed, MirrorSyncMode sync_mode,
-  BdrvDirtyBitmap *sync_bitmap,
-  BlockdevOnError on_source_error,
-  BlockdevOnError on_target_error,
-  BlockCompletionFunc *cb, void *opaque,
-  BlockJobTxn *txn, Error **errp)
+BlockJob *backup_start(BlockDriverState *bs, BlockDriverState *target,
+   int64_t speed, MirrorSyncMode sync_mode,
+   BdrvDirtyBitmap *sync_bitmap,
+   BlockdevOnError on_source_error,
+   BlockdevOnError on_target_error,
+   BlockCompletionFunc *cb, void *opaque,
+   BlockJobTxn *txn, Error **errp)
 {
 int64_t len;
 
@@ -501,53 +501,53 @@ void backup_start(BlockDriverState *bs, BlockDriverState 
*target,
 
 if (bs == target) {
 error_setg(errp, "Source and target cannot be the same");
-return;
+return NULL;
 }
 
 if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
  on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
 (!bs->blk || !blk_iostatus_is_enabled(bs->blk))) {
 error_setg(errp, QERR_INVALID_PARAMETER, "on-source-error");
-return;
+return NULL;
 }
 
 if (!bdrv_is_inserted(bs)) {
 error_setg(errp, "Device is not inserted: %s",
bdrv_get_device_name(bs));
-return;
+return NULL;
 }
 
 if (!bdrv_is_inserted(target)) {
 error_setg(errp, "Device is not inserted: %s",
bdrv_get_device_name(target));
-return;
+return NULL;
 }
 
 if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
-return;
+return NULL;
 }
 
 if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_BACKUP_TARGET, errp)) {
-return;
+return NULL;
 }
 
 if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
 if (!sync_bitmap) {
 error_setg(errp, "must provide a valid bitmap name for "
  "\"incremental\" sync mode");
-return;
+return NULL;
 }
 
 /* Create a new bitmap, and freeze/disable this one. */
 if (bdrv_dirty_bitmap_create_successor(bs, sync_bitmap, errp) < 0) {
-return;
+return NULL;
 }
 } else if (sync_bitmap) {
 error_setg(errp,
"a sync_bitmap was provided to backup_run, "
"but received an incompatible sync_mode (%s)",
MirrorSyncMode_lookup[sync_mode]);
-return;
+return NULL;
 }
 
 len = bdrv_getlength(bs);
@@ -572,13 +572,17 @@ void backup_start(BlockDriverState *bs, BlockDriverState 
*target,
 job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
sync_bitmap : NULL;
 job->common.len = len;
+
+block_job_ref(&job->common);
 job->common.co = qemu_coroutine_create(backup_run);
 block_job_txn_add_job(txn, &job->common);
 qemu_coroutine_enter(job->common.co, job);
-return;
+return &job->common;
 
  error:
 if (sync_bitmap) {
 bdrv_reclaim_dirty_bitmap(bs, sync_bitmap, NULL);
 }
+
+return NULL;
 }
diff --git a/blockdev.c b/blockdev.c
index 5a74c81..ad46aa8 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1805,17 +1805,17 @@ typedef struct DriveBackupState {
 BlockJob *job;
 } DriveBackupState;
 
-static void do_drive_backup(const char *device, const char *target,
-bool has_format, const char *format,
-enum MirrorSyncMode sync,
-bool has_mode, enum NewImageMode mode,
-bool has_speed, int64_t speed,
-bool has_bitmap, const char *bitmap,
-bool has_on_source_error,

[Qemu-block] [PATCH v2 2/5] block: Allow stream_start to return job references

2016-01-26 Thread John Snow
stream_start now picks up a reference for its return value, a copy of
the job started. callers are responsible for putting it down when they
are done with it.

This removes a minor reference to bs->job in qmp_block_stream, for
a simple tracing function.

Reviewed-by: Kevin Wolf 
Signed-off-by: John Snow 
---
 block/stream.c| 8 +---
 blockdev.c| 8 +---
 include/block/block_int.h | 9 +
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index cafaa07..26a2990 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -214,7 +214,7 @@ static const BlockJobDriver stream_job_driver = {
 .set_speed = stream_set_speed,
 };
 
-void stream_start(BlockDriverState *bs, BlockDriverState *base,
+BlockJob *stream_start(BlockDriverState *bs, BlockDriverState *base,
   const char *backing_file_str, int64_t speed,
   BlockdevOnError on_error,
   BlockCompletionFunc *cb,
@@ -226,19 +226,21 @@ void stream_start(BlockDriverState *bs, BlockDriverState 
*base,
  on_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
 (!bs->blk || !blk_iostatus_is_enabled(bs->blk))) {
 error_setg(errp, QERR_INVALID_PARAMETER, "on-error");
-return;
+return NULL;
 }
 
 s = block_job_create(&stream_job_driver, bs, speed, cb, opaque, errp);
 if (!s) {
-return;
+return NULL;
 }
 
 s->base = base;
 s->backing_file_str = g_strdup(backing_file_str);
 
 s->on_error = on_error;
+block_job_ref(&s->common);
 s->common.co = qemu_coroutine_create(stream_run);
 trace_stream_start(bs, base, s, s->common.co, opaque);
 qemu_coroutine_enter(s->common.co, s);
+return &s->common;
 }
diff --git a/blockdev.c b/blockdev.c
index c0b9b12..5a74c81 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2883,6 +2883,7 @@ void qmp_block_stream(const char *device,
 BlockBackend *blk;
 BlockDriverState *bs;
 BlockDriverState *base_bs = NULL;
+BlockJob *job;
 AioContext *aio_context;
 Error *local_err = NULL;
 const char *base_name = NULL;
@@ -2932,14 +2933,15 @@ void qmp_block_stream(const char *device,
 /* backing_file string overrides base bs filename */
 base_name = has_backing_file ? backing_file : base_name;
 
-stream_start(bs, base_bs, base_name, has_speed ? speed : 0,
- on_error, block_job_cb, bs, &local_err);
+job = stream_start(bs, base_bs, base_name, has_speed ? speed : 0,
+   on_error, block_job_cb, bs, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 goto out;
 }
 
-trace_qmp_block_stream(bs, bs->job);
+trace_qmp_block_stream(bs, job);
+block_job_unref(job);
 
 out:
 aio_context_release(aio_context);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index de4c3c6..7a86d7f 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -598,10 +598,11 @@ int is_windows_drive(const char *filename);
  * streaming job, the backing file of @bs will be changed to
  * @base_id in the written image and to @base in the live BlockDriverState.
  */
-void stream_start(BlockDriverState *bs, BlockDriverState *base,
-  const char *base_id, int64_t speed, BlockdevOnError on_error,
-  BlockCompletionFunc *cb,
-  void *opaque, Error **errp);
+BlockJob *stream_start(BlockDriverState *bs, BlockDriverState *base,
+   const char *base_id, int64_t speed,
+   BlockdevOnError on_error,
+   BlockCompletionFunc *cb,
+   void *opaque, Error **errp);
 
 /**
  * commit_start:
-- 
2.4.3




[Qemu-block] [PATCH v2 4/5] block/backup: Pack Notifier within BackupBlockJob

2016-01-26 Thread John Snow
Instead of relying on peeking at bs->job, we want to explicitly get
a reference to the job that was involved in this notifier callback.

Pack the Notifier inside of the BackupBlockJob so we can use
container_of to get a reference back to the BackupBlockJob object.

This cuts out one more case where we rely unnecessarily on bs->job.

Suggested-by: Paolo Bonzini 
Signed-off-by: John Snow 
---
 block/backup.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index ce96b45..735cbe6 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -46,6 +46,7 @@ typedef struct BackupBlockJob {
 CoRwlock flush_rwlock;
 uint64_t sectors_read;
 HBitmap *bitmap;
+NotifierWithReturn before_write;
 QLIST_HEAD(, CowRequest) inflight_reqs;
 } BackupBlockJob;
 
@@ -87,11 +88,11 @@ static void cow_request_end(CowRequest *req)
 }
 
 static int coroutine_fn backup_do_cow(BlockDriverState *bs,
+  BackupBlockJob *job,
   int64_t sector_num, int nb_sectors,
   bool *error_is_read,
   bool is_write_notifier)
 {
-BackupBlockJob *job = (BackupBlockJob *)bs->job;
 CowRequest cow_request;
 struct iovec iov;
 QEMUIOVector bounce_qiov;
@@ -189,6 +190,7 @@ static int coroutine_fn backup_before_write_notify(
 NotifierWithReturn *notifier,
 void *opaque)
 {
+BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
 BdrvTrackedRequest *req = opaque;
 int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
 int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
@@ -196,7 +198,8 @@ static int coroutine_fn backup_before_write_notify(
 assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
 assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 
-return backup_do_cow(req->bs, sector_num, nb_sectors, NULL, true);
+return backup_do_cow(req->bs, job, sector_num,
+ nb_sectors, NULL, true);
 }
 
 static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
@@ -344,7 +347,8 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 if (yield_and_check(job)) {
 return ret;
 }
-ret = backup_do_cow(bs, cluster * BACKUP_SECTORS_PER_CLUSTER,
+ret = backup_do_cow(bs, job,
+cluster * BACKUP_SECTORS_PER_CLUSTER,
 BACKUP_SECTORS_PER_CLUSTER, &error_is_read,
 false);
 if ((ret < 0) &&
@@ -380,9 +384,6 @@ static void coroutine_fn backup_run(void *opaque)
 BlockDriverState *bs = job->common.bs;
 BlockDriverState *target = job->target;
 BlockdevOnError on_target_error = job->on_target_error;
-NotifierWithReturn before_write = {
-.notify = backup_before_write_notify,
-};
 int64_t start, end;
 int ret = 0;
 
@@ -400,7 +401,8 @@ static void coroutine_fn backup_run(void *opaque)
 blk_iostatus_enable(target->blk);
 }
 
-bdrv_add_before_write_notifier(bs, &before_write);
+job->before_write.notify = backup_before_write_notify;
+bdrv_add_before_write_notifier(bs, &job->before_write);
 
 if (job->sync_mode == MIRROR_SYNC_MODE_NONE) {
 while (!block_job_is_cancelled(&job->common)) {
@@ -452,7 +454,7 @@ static void coroutine_fn backup_run(void *opaque)
 }
 }
 /* FULL sync mode we copy the whole drive. */
-ret = backup_do_cow(bs, start * BACKUP_SECTORS_PER_CLUSTER,
+ret = backup_do_cow(bs, job, start * BACKUP_SECTORS_PER_CLUSTER,
 BACKUP_SECTORS_PER_CLUSTER, &error_is_read, false);
 if (ret < 0) {
 /* Depending on error action, fail now or retry cluster */
@@ -468,7 +470,7 @@ static void coroutine_fn backup_run(void *opaque)
 }
 }
 
-notifier_with_return_remove(&before_write);
+notifier_with_return_remove(&job->before_write);
 
 /* wait until pending backup_do_cow() calls have completed */
 qemu_co_rwlock_wrlock(&job->flush_rwlock);
-- 
2.4.3




[Qemu-block] [PATCH v2 0/5] block: reduce reliance on bs->job pointer

2016-01-26 Thread John Snow
This is a small collection of patches to reduce our use of the bs->job
pointer where possible. There are still more usages in the code, but
this cuts down on a few.

The goal is to eventually eliminate all of them and allow multiple block
jobs to run concurrently, but design on what that will look like is
on-going.

In the meantime, eliminate a few obviously needless references to
bs->job by allowing more systems to carry pointers to jobs directly
instead of trying to fish the pointer out of the BDS all the time.

===
v2:
===

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/5:[] [--] 'block: Allow mirror_start to return job references'
002/5:[] [--] 'block: Allow stream_start to return job references'
003/5:[] [--] 'block: allow backup_start to return job references'
004/5:[down] 'block/backup: Pack Notifier within BackupBlockJob'
005/5:[0006] [FC] 'blockjob: add Job parameter to BlockCompletionFunc'

4: Rewritten to pack the notifier within the job, instead of the job within
   a subclassed notifier. [Paolo]
5: Remove junk assert and extraneous local BDS variable. [Kevin]



For convenience, this branch is available at:
https://github.com/jnsnow/qemu.git branch block-multijob2
https://github.com/jnsnow/qemu/tree/block-multijob2

This version is tagged block-multijob2-v2:
https://github.com/jnsnow/qemu/releases/tag/block-multijob2-v2

John Snow (5):
  block: Allow mirror_start to return job references
  block: Allow stream_start to return job references
  block: allow backup_start to return job references
  block/backup: Pack Notifier within BackupBlockJob
  blockjob: add Job parameter to BlockCompletionFunc

 block/backup.c|  58 +++--
 block/commit.c|   2 +-
 block/mirror.c|  74 
 block/stream.c|  10 ++-
 blockdev.c| 210 +-
 blockjob.c|  13 ++-
 include/block/block.h |   2 +
 include/block/block_int.h |  27 +++---
 include/block/blockjob.h  |   6 +-
 qemu-img.c|  16 ++--
 tests/test-blockjob-txn.c |   4 +-
 11 files changed, 237 insertions(+), 185 deletions(-)

-- 
2.4.3




[Qemu-block] [PATCH] nvme: generate OpenFirmware device path in the "bootorder" fw_cfg file

2016-01-26 Thread Laszlo Ersek
Background on QEMU boot indices
---

Normally, the "bootindex" property is configured for bootable devices
with:

  DEVICE_instance_init()
device_add_bootindex_property(..., "bootindex", ...)
  object_property_add(..., device_get_bootindex,
  device_set_bootindex, ...)

and when the bootindex is set on the QEMU command line, with

  -device DEVICE,...,bootindex=N

the setter that was configured above is invoked:

  device_set_bootindex()
/* parse boot index */
visit_type_int32()

/* verify unicity */
check_boot_index()

/* store parsed boot index */
...

/* insert device path to boot order */
add_boot_device_path()

In the last step, add_boot_device_path() ensures that an OpenFirmware
device path will show up in the "bootorder" fw_cfg file, at a position
corresponding to the device's boot index. Thus guest firmware (SeaBIOS and
OVMF) can try to boot off the device with the right priority.

NVMe boot index
---

In QEMU commit 33739c712982,

  nvma: ide: add bootindex to qom property

the following generic setters / getters:
- device_set_bootindex()
- device_get_bootindex()

were open-coded for NVMe, under the names
- nvme_set_bootindex()
- nvme_get_bootindex()

Plus nvme_instance_init() was added to configure the "bootindex" property
manually, designating the open-coded getter & setter, rather than calling
device_add_bootindex_property().

Crucially, nvme_set_bootindex() avoided the final add_boot_device_path()
call. This fact is spelled out in the message of commit 33739c712982, and
it was presumably the entire reason for all of the code duplication.

Now, Vladislav filed an RFE for OVMF
; OVMF should boot off NVMe
devices. It is simple to build edk2's existent NvmExpressDxe driver into
OVMF, but the boot order matching logic in OVMF can only handle NVMe if
the "bootorder" fw_cfg file includes such devices.

Therefore this patch converts the NVMe device model to
device_set_bootindex() all the way.

Device paths


device_set_bootindex() accepts an optional parameter called "suffix". When
present, it is expected to take the form of an OpenFirmware device path
node, and it gets appended as last node to the otherwise auto-generated
OFW path.

For NVMe, the auto-generated part is

  /pci@i0cf8/pci8086,5845@6[,1]
   ^ ^^  ^
   | |PCI slot and (present when nonzero)
   | |function of the NVMe controller, both hex
   | "driver name" component, built from PCI vendor & device IDs
   PCI root at system bus port, PIO

to which here we append the suffix

  /namespace@1,0
 ^ ^
 | big endian (MSB at lowest address) numeric interpretation
 | of the 64-bit IEEE Extended Unique Identifier, aka EUI-64,
 | hex
 32-bit NVMe namespace identifier, aka NSID, hex

resulting in the OFW device path

  /pci@i0cf8/pci8086,5845@6[,1]/namespace@1,0

The reason for including the NSID and the EUI-64 is that an NVMe device
can in theory produce several different namespaces (distinguished by
NSID). Additionally, each of those may (optionally) have an EUI-64 value.

For now, QEMU only provides namespace 1.

Furthermore, QEMU doesn't even represent the EUI-64 as a standalone field;
it is embedded (and left unused) inside the "NvmeIdNs.res30" array, at the
last eight bytes. (Which is fine, since EUI-64 can be left zero-filled if
unsupported by the device.)

Based on the above, we set the "unit address" part of the last
("namespace") node to fixed "1,0".

OVMF will then map the above OFW device path to the following UEFI device
path fragment, for boot order processing:

  PciRoot(0x0)/Pci(0x6,0x1)/NVMe(0x1,00-00-00-00-00-00-00-00)
  ^^   ^^^   ^
  ||   |||   octets of the EUI-64 in address order
  ||   ||NSID
  ||   |NVMe namespace messaging device path node
  |PCI slot and function
  PCI root bridge

Cc: Keith Busch  (supporter:nvme)
Cc: Kevin Wolf  (supporter:Block layer core)
Cc: qemu-block@nongnu.org (open list:nvme)
Cc: Gonglei 
Cc: Vladislav Vovchenko 
Cc: Feng Tian 
Cc: Gerd Hoffmann 
Cc: Kevin O'Connor 
Signed-off-by: Laszlo Ersek 
---
 hw/block/nvme.c | 42 +-
 1 file changed, 5 insertions(+), 37 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index a5fedb2..c68b625 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -916,45 +916,13 @@ static void nvme_class_init(ObjectClass *oc, void *data)
 dc->vmsd = &nvme_vmstate;
 }
 
-static void nvme_get_bootindex(Object *obj, Visitor *v, void *opaque,
-  const char *name, Error **errp)
-{
-NvmeCtrl *s = NVME(obj);
-
-visit_type_int32(v, &s->conf.bootindex, name, errp);
-}
-
-static void nvme_set_b

Re: [Qemu-block] [Qemu-devel] [PATCH v5 00/12] Dirty bitmaps migration

2016-01-26 Thread John Snow


On 01/26/2016 03:45 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 03.06.2015 01:17, John Snow wrote:
>>
>> On 05/28/2015 04:56 PM, Denis V. Lunev wrote:
>>> On 28/05/15 23:09, John Snow wrote:
 On 05/26/2015 10:51 AM, Denis V. Lunev wrote:
> On 26/05/15 17:48, Denis V. Lunev wrote:
>> On 21/05/15 19:44, John Snow wrote:
>>> On 05/21/2015 09:57 AM, Denis V. Lunev wrote:
 On 21/05/15 16:51, Vladimir Sementsov-Ogievskiy wrote:
> Hi all.
>
> Hmm. There is an interesting suggestion from Denis Lunev (in CC)
> about
> how to drop meta bitmaps and make things easer.
>
> method:
>
>> start migration
> disk and memory are migrated, but not dirty bitmaps.
>> stop vm
> create all necessary bitmaps in destination vm (empty, but with
> same
> names and granularities and enabled flag)
>> start destination vm
> empty bitmaps are tracking now
>> start migrating dirty bitmaps. merge them to corresponding
>> bitmaps
> in destination
> while bitmaps are migrating, they should be in some kind of
> 'inconsistent' state.
> so, we can't start backup or other migration while bitmaps are
> migrating, but vm is already _running_ on destination.
>
> what do you think about it?
>
 the description is a bit incorrect

 - start migration process, perform memory and disk migration
   as usual. VM is still executed at source
 - start VM on target. VM on source should be on pause as usual,
   do not finish migration process. Running VM on target
 "writes"
   normally setting dirty bits as usual
 - copy active dirty bitmaps from source to target. This is safe
   as VM on source is not running
 - "OR" copied bitmaps with ones running on target
 - finish migration process (stop source VM).

 Downtime will not be increased due to dirty bitmaps with this
 approach, migration process is very simple - plain data copy.

 Regards,
Den

>>> I was actually just discussing the live migration approach a little
>>> bit
>>> ago with Stefan, trying to decide on the "right" packet format (The
>>> only
>>> two patches I haven't ACKed yet are ones in which we need to
>>> choose a
>>> send size) and we decided that 1KiB chunk sends would be
>>> appropriate for
>>> live migration.
>>>
>>> I think I'm okay with that method, but obviously this approach
>>> outlined
>>> here would also work very well and would avoid meta bitmaps, chunk
>>> sizes, migration tuning, convergence questions, etc etc etc.
>>>
>>> You'd need to add a new status to the bitmap on the target (maybe
>>> "INCOMPLETE" or "MIGRATING") that prevents it from being used for a
>>> backup operation without preventing it from recording new writes.
>>>
>>> My only concern is how easy it will be to work this into the
>>> migration
>>> workflow.
>>>
>>> It would require some sort of "post-migration" ternary phase, I
>>> suppose,
>>> for devices/data that can be transferred after the VM starts --
>>> and I
>>> suspect we'll be the only use of that phase for now.
>>>
>>> David, what are your thoughts, here? Would you prefer Vladimir and I
>>> push forward on the live migration approach, or add a new post-hoc
>>> phase? This approach might be simpler on the block layer, but I
>>> would be
>>> rather upset if he scrapped his entire series for the second time
>>> for
>>> another approach that also didn't get accepted.
>>>
>>> --js
>> hmmm It looks like we should proceed with this to fit 2.4 dates.
>> There is not much interest at the moment. I think that we could
>> implement this later in 2.5 etc...
>>
>> Regards,
>>   Den
> oops. I have written something strange. Anyway, I think that for
> now we should proceed with this patchset to fit QEMU 2.4 dates.
> The implementation with additional stage (my proposal) could be
> added later, f.e. in 2.5 as I do not see much interest from migration
> gurus.
>
> In this case the review will take a ... lot of time.
>
> Regards,
>   Den
>
 That sounds good to me. I think this solution is workable for 2.4, and
 we can begin working on a post-migration phase for the future to help
 simplify our cases a lot.

 I have been out sick much of this week, so apologies in my lack of
 fervor getting this series upstream recently.

 --js
>>> no prob :)
>> Had a chat with Stefan about this approach and apparently that's what
>> the postcopy migration patches on-list are all about.
>>
>> Stefan brought up the point of post-

[Qemu-block] [PATCH v3 15/19] fdc: use IsaDma interface instead of global DMA_* functions

2016-01-26 Thread Hervé Poussineau
Signed-off-by: Hervé Poussineau 
---
 hw/block/fdc.c | 63 ++
 1 file changed, 46 insertions(+), 17 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index e45a7f4..ba77bea 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -627,6 +627,7 @@ struct FDCtrl {
 QEMUTimer *result_timer;
 int dma_chann;
 uint8_t phase;
+IsaDma *dma;
 /* Controller's identification */
 uint8_t version;
 /* HW */
@@ -1412,7 +1413,8 @@ static void fdctrl_stop_transfer(FDCtrl *fdctrl, uint8_t 
status0,
 fdctrl->fifo[6] = FD_SECTOR_SC;
 fdctrl->data_dir = FD_DIR_READ;
 if (!(fdctrl->msr & FD_MSR_NONDMA)) {
-DMA_release_DREQ(fdctrl->dma_chann);
+IsaDmaClass *k = ISADMA_GET_CLASS(fdctrl->dma);
+k->release_DREQ(fdctrl->dma, fdctrl->dma_chann);
 }
 fdctrl->msr |= FD_MSR_RQM | FD_MSR_DIO;
 fdctrl->msr &= ~FD_MSR_NONDMA;
@@ -1498,27 +1500,43 @@ static void fdctrl_start_transfer(FDCtrl *fdctrl, int 
direction)
 }
 fdctrl->eot = fdctrl->fifo[6];
 if (fdctrl->dor & FD_DOR_DMAEN) {
-int dma_mode;
+IsaDmaTransferMode dma_mode;
+IsaDmaClass *k = ISADMA_GET_CLASS(fdctrl->dma);
+bool dma_mode_ok;
 /* DMA transfer are enabled. Check if DMA channel is well programmed */
-dma_mode = DMA_get_channel_mode(fdctrl->dma_chann);
-dma_mode = (dma_mode >> 2) & 3;
+dma_mode = k->get_transfer_mode(fdctrl->dma, fdctrl->dma_chann);
 FLOPPY_DPRINTF("dma_mode=%d direction=%d (%d - %d)\n",
dma_mode, direction,
(128 << fdctrl->fifo[5]) *
(cur_drv->last_sect - ks + 1), fdctrl->data_len);
-if (((direction == FD_DIR_SCANE || direction == FD_DIR_SCANL ||
-  direction == FD_DIR_SCANH) && dma_mode == 0) ||
-(direction == FD_DIR_WRITE && dma_mode == 2) ||
-(direction == FD_DIR_READ && dma_mode == 1) ||
-(direction == FD_DIR_VERIFY)) {
+switch (direction) {
+case FD_DIR_SCANE:
+case FD_DIR_SCANL:
+case FD_DIR_SCANH:
+dma_mode_ok = (dma_mode == ISADMA_TRANSFER_VERIFY);
+break;
+case FD_DIR_WRITE:
+dma_mode_ok = (dma_mode == ISADMA_TRANSFER_WRITE);
+break;
+case FD_DIR_READ:
+dma_mode_ok = (dma_mode == ISADMA_TRANSFER_READ);
+break;
+case FD_DIR_VERIFY:
+dma_mode_ok = true;
+break;
+default:
+dma_mode_ok = false;
+break;
+}
+if (dma_mode_ok) {
 /* No access is allowed until DMA transfer has completed */
 fdctrl->msr &= ~FD_MSR_RQM;
 if (direction != FD_DIR_VERIFY) {
 /* Now, we just have to wait for the DMA controller to
  * recall us...
  */
-DMA_hold_DREQ(fdctrl->dma_chann);
-DMA_schedule();
+k->hold_DREQ(fdctrl->dma, fdctrl->dma_chann);
+k->schedule(fdctrl->dma);
 } else {
 /* Start transfer */
 fdctrl_transfer_handler(fdctrl, fdctrl->dma_chann, 0,
@@ -1557,12 +1575,14 @@ static int fdctrl_transfer_handler (void *opaque, int 
nchan,
 FDrive *cur_drv;
 int len, start_pos, rel_pos;
 uint8_t status0 = 0x00, status1 = 0x00, status2 = 0x00;
+IsaDmaClass *k;
 
 fdctrl = opaque;
 if (fdctrl->msr & FD_MSR_RQM) {
 FLOPPY_DPRINTF("Not in DMA transfer mode !\n");
 return 0;
 }
+k = ISADMA_GET_CLASS(fdctrl->dma);
 cur_drv = get_cur_drv(fdctrl);
 if (fdctrl->data_dir == FD_DIR_SCANE || fdctrl->data_dir == FD_DIR_SCANL ||
 fdctrl->data_dir == FD_DIR_SCANH)
@@ -1601,8 +1621,8 @@ static int fdctrl_transfer_handler (void *opaque, int 
nchan,
 switch (fdctrl->data_dir) {
 case FD_DIR_READ:
 /* READ commands */
-DMA_write_memory (nchan, fdctrl->fifo + rel_pos,
-  fdctrl->data_pos, len);
+k->write_memory(fdctrl->dma, nchan, fdctrl->fifo + rel_pos,
+fdctrl->data_pos, len);
 break;
 case FD_DIR_WRITE:
 /* WRITE commands */
@@ -1616,8 +1636,8 @@ static int fdctrl_transfer_handler (void *opaque, int 
nchan,
 goto transfer_error;
 }
 
-DMA_read_memory (nchan, fdctrl->fifo + rel_pos,
- fdctrl->data_pos, len);
+k->read_memory(fdctrl->dma, nchan, fdctrl->fifo + rel_pos,
+   fdctrl->data_pos, len);
 if (blk_write(cur_drv->blk, fd_sector(cur_drv),
   fdctrl->fifo, 1) < 0) {
 FLOPPY_DPRINTF("error writing sector %d\n",
@@ -1634,7 +1654,8 @@ static int fdctrl_transfer_handler (void *opaque, int 
nchan,
 {
   

[Qemu-block] [PATCH v3 13/19] sparc: disable floppy DMA

2016-01-26 Thread Hervé Poussineau
All functions relative to DMA (DMA_*() functions) are stubs on sparc platform.
Disable the DMA in the floppy controller, instead of calling these stubs.

Signed-off-by: Hervé Poussineau 
---
 hw/block/fdc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index e3b0e1e..e45a7f4 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -2485,6 +2485,8 @@ static void sun4m_fdc_initfn(Object *obj)
 FDCtrlSysBus *sys = SYSBUS_FDC(obj);
 FDCtrl *fdctrl = &sys->state;
 
+fdctrl->dma_chann = -1;
+
 memory_region_init_io(&fdctrl->iomem, obj, &fdctrl_mem_strict_ops,
   fdctrl, "fdctrl", 0x08);
 sysbus_init_mmio(sbd, &fdctrl->iomem);
-- 
2.1.4




Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/6] ide: fix atapi software reset

2016-01-26 Thread John Snow
Ping

(I know I could just send a PR, but in this case I'd like someone to
look over it. I've gone over pretty terrible detail to convince myself
this won't break anything IDE-wise, so if it looks sane block-wise, I'd
be satisfied with an ACK.)

On 01/19/2016 12:39 PM, John Snow wrote:
> The ATAPI software reset function is implemented somewhat lackadaisically.
> 
> Firstly, it is valid only for ATAPI drives - not HDs. If a HD should
> receive this command while BSY, it should be ignored like any other
> command instead of aborted. A non-BSY HD is free to abort the command
> in the usual fashion to indicate it doesn't understand or doesn't support
> that command.
> 
> Second, for drives that should accept a software reset, they should not
> "forget" about all pending AIO during the reset. Since a software reset
> resets the DRQ and BSY flags, it is possible to 'stack' multiple
> concurrent reads using DMA and alternately chaining software reset and
> DMA reads. We mustn't reset BSY/DRQ until we are confident that we
> have canceled existing AIO.
> 
> Third, the existing software reset routine does not perform a very
> rigorous reset.
> 
> This series corrects this by:
> 
> (1) Correcting ide_exec_cmd to correctly ignore, not abort, software
> reset commands for ide-hd devices that are busy executing a command.
> 
> (2) Improving the software reset routine to cancel buffered DMA, then
> fall back to synchronously waiting for any pending DMA to finish
> before returning, insuring that the reset completes sanely.
> 
> (3) Use existing reset routines to comprehensively reset the device.
> 
> Reported-by: Kevin Wolf 
> Signed-off-by: John Snow 
> 
> 
> 
> For convenience, this branch is available at:
> https://github.com/jnsnow/qemu.git branch ide-reset-fix
> https://github.com/jnsnow/qemu/tree/ide-reset-fix
> 
> This version is tagged ide-reset-fix-v2:
> https://github.com/jnsnow/qemu/releases/tag/ide-reset-fix-v2
> 
> John Snow (6):
>   ide: Prohibit RESET on IDE drives
>   ide: code motion
>   ide: move buffered DMA cancel to core
>   ide: replace blk_drain_all by blk_drain
>   ide: Add silent DRQ cancellation
>   ide: fix device_reset to not ignore pending AIO
> 
>  hw/ide/core.c | 215 
> --
>  hw/ide/internal.h |   1 +
>  hw/ide/pci.c  |  36 +
>  3 files changed, 144 insertions(+), 108 deletions(-)
> 



Re: [Qemu-block] [RFC PATCH 03/16] block: Allow .bdrv_close callback to release dirty bitmaps

2016-01-26 Thread Eric Blake
On 01/26/2016 03:38 AM, Fam Zheng wrote:
> If the driver owns some dirty bitmaps, this assertion will fail.
> 
> The correct place to release them is in bdrv_close, so move the
> assertion one line down.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [RFC PATCH 02/16] block: Set dirty before doing write

2016-01-26 Thread Eric Blake
On 01/26/2016 03:38 AM, Fam Zheng wrote:
> So that driver can write the dirty bits into persistent dirty bitmaps in
> the write callback.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/io.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 

> 
> diff --git a/block/io.c b/block/io.c
> index 343ff1f..b964e7e 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1164,6 +1164,8 @@ static int coroutine_fn 
> bdrv_aligned_pwritev(BlockDriverState *bs,
>  }
>  }
>  
> +bdrv_set_dirty(bs, sector_num, nb_sectors);
> +
>  if (ret < 0) {
>  /* Do nothing, write notifier decided to fail this request */

This sets the dirty bit even on failure, but I guess that doesn't hurt
(it's better to mark too much dirty than it is to not mark enough).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [RFC PATCH 01/16] doc: Add QBM format specification

2016-01-26 Thread Eric Blake
On 01/26/2016 03:38 AM, Fam Zheng wrote:
> Signed-off-by: Fam Zheng 
> ---
>  docs/specs/qbm.md | 118 
> ++
>  1 file changed, 118 insertions(+)
>  create mode 100644 docs/specs/qbm.md
> 
> diff --git a/docs/specs/qbm.md b/docs/specs/qbm.md
> new file mode 100644
> index 000..b91910b
> --- /dev/null
> +++ b/docs/specs/qbm.md
> @@ -0,0 +1,118 @@
> +QEMU Block Bitmap (QBM)
> +===

No explicit copyright mention means that this document is GPLv2+ by
default.  I don't know if any 3rd-party implementation trying to use
this spec would object to that, or if a looser license is desirable here
(I don't personally care, but just raising the point).

> +
> +QBM is a multi-file disk format to allow storing persistent block bitmaps 
> along
> +with the tracked data image.  A QBM image includes one json descriptor file,
> +one data image, one or more bitmap files that describe the block dirty status

s/one or more/and one or more/

Must it have block dirty status bitmaps, or can you have a QBM image
with just an allocation bitmap?

> +of the data image.
> +
> +The json file describes the structure of the image. The structure of the json
> +descriptor file is:

Please mention that the file must be valid JSON per RFC 7159.  Probably
also worth requiring that the file is a text file (ends in a newline).

> +
> +QBM-JSON-FILE := { "QBM": DESC-JSON }
> +
> +DESC-JSON := { "version": 1,
> +   "image": IMAGE,
> +   "BITMAPS": BITMAPS

s/"BITMAPS"/"bitmaps"/

See my thoughts below on whether this is the ideal top-level structure.

> + }
> +
> +Fields in the top level json dictionary are:
> +
> +@version: An integer which must be 1.
> +@image: A dictionary in IMAGE schema, as described later. It provides the
> +information of the data image where user data is stored. Its format 
> is
> +documented in the "IMAGE schema" section.
> +@bitmaps: A dictionary that describes one ore more bitmap files. The keys 
> into
> +  the dictionary are the names of bitmap, which must be strings, and
> +  each value is a dictionary describing the information of the 
> bitmap,
> +  as documented below in the "BITMAP schema" section.

Making 'bitmaps' be a dictionary means that we now have keys that might
not be an identifier.  Although this is valid JSON, it may trip up some
tools.  Would it be better to make 'bitmaps' be a list of dictionaries,
where each dictionary has a 'name':'value' member, so that bitmap names
that are not a (C, Python, whatever) identifier are still valid?

> +
> +=== IMAGE schema ===
> +
> +An IMAGE records the information of an image (such as a data image or a 
> backing
> +file). It has following fields:

I liked how you showed DESC-JSON := for the top level; should you do the
same here for IMAGE?

> +
> +@file: The file name string of the referenced image. If it's a relative path,
> +   the file should be found relative to the descriptor file's
> +   location.

Does that mean we'll have to use 'json:...' encoding for representing
network resources?  Should we instead reuse some of the
qapi/block-core.json representation of a block device?

> +@format: The format string of the file.

Nice that format is mandatory.  Do we want to call out a finite list of
supported formats, or leave it open-ended in this spec?

> +
> +=== BITMAP schema ===
> +
> +A BITMAP dictionary records the information of a bitmap (such as a dirty 
> bitmap
> +or a block allocation status bitmap). It has following mandatory fields:
> +
> +@file: The name of the bitmap file. The bitmap file is in little endian, both

s/in //

> +   byte-order-wise and bit-order-wise, which means the LSB in the byte 0
> +   corresponds to the first sectors.

s/the byte 0/the first byte/

Again, should we be reusing something from qapi/block-core.json, to
allow network devices with more structure than just 'json:...' naming?

> +@granularity-bytes: How many bytes of data does one bit in the bitmap track.
> +This value must be a power of 2 and no less than 512.
> +@type: The type of the bitmap.  Currently only "dirty" and "allocation" are
> +   supported.
> +   "dirty" indicates a block dirty bitmap; "allocation" indicates a
> +   allocation status bitmap. There must be at most one "allocation" 
> bitmap.
> +
> +If the type of the bitmap is "allocation", an extra field "backing" is also
> +accepted:
> +
> +@backing: a dictionary as specified in the IMAGE schema. It can be used to
> +  adding a backing file to raw image.

s/adding/add/

As promised above, would an alternative representation be any better?
That is, I'm trying to see if I could write qapi to describe the
structure you've presented here, and I fell short when it comes to
naming a particular bitmap.  Also, since there is at most one
'type':'allocation' member of 'bitmaps', I wonder if se

Re: [Qemu-block] [Qemu-devel] [PATCH v2 13/13] tests: Add test code for hbitmap serialization

2016-01-26 Thread John Snow


On 01/20/2016 01:11 AM, Fam Zheng wrote:
> Signed-off-by: Fam Zheng 

Looks OK at 10,000 feet.

Acked-by: John Snow 

> ---
>  tests/test-hbitmap.c | 139 
> +++
>  1 file changed, 139 insertions(+)
> 
> diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
> index a341803..9b89751 100644
> --- a/tests/test-hbitmap.c
> +++ b/tests/test-hbitmap.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include "qemu/hbitmap.h"
> +#include "qemu/bitmap.h"
>  #include "block/block.h"
>  
>  #define LOG_BITS_PER_LONG  (BITS_PER_LONG == 32 ? 5 : 6)
> @@ -740,6 +741,16 @@ static void test_hbitmap_meta_one(TestHBitmapData *data, 
> const void *unused)
>  }
>  }
>  
> +static void test_hbitmap_serialize_granularity(TestHBitmapData *data,
> +   const void *unused)
> +{
> +int r;
> +
> +hbitmap_test_init(data, L3 * 2, 3);
> +r = hbitmap_serialization_granularity(data->hb);
> +g_assert_cmpint(r, ==, BITS_PER_LONG << 3);
> +}
> +
>  static void test_hbitmap_meta_zero(TestHBitmapData *data, const void *unused)
>  {
>  hbitmap_test_init_meta(data, 0, 0, 1);
> @@ -747,6 +758,125 @@ static void test_hbitmap_meta_zero(TestHBitmapData 
> *data, const void *unused)
>  hbitmap_check_meta(data, 0, 0);
>  }
>  
> +static void hbitmap_test_serialize_range(TestHBitmapData *data,
> + uint8_t *buf, size_t buf_size,
> + uint64_t pos, uint64_t count)
> +{
> +size_t i;
> +
> +assert(hbitmap_granularity(data->hb) == 0);
> +hbitmap_reset_all(data->hb);
> +memset(buf, 0, buf_size);
> +if (count) {
> +hbitmap_set(data->hb, pos, count);
> +}
> +hbitmap_serialize_part(data->hb, buf, 0, data->size);
> +for (i = 0; i < data->size; i++) {
> +int is_set = test_bit(i, (unsigned long *)buf);
> +if (i >= pos && i < pos + count) {
> +g_assert(is_set);
> +} else {
> +g_assert(!is_set);
> +}
> +}
> +hbitmap_reset_all(data->hb);
> +hbitmap_deserialize_part(data->hb, buf, 0, data->size, true);
> +
> +for (i = 0; i < data->size; i++) {
> +int is_set = hbitmap_get(data->hb, i);
> +if (i >= pos && i < pos + count) {
> +g_assert(is_set);
> +} else {
> +g_assert(!is_set);
> +}
> +}
> +}
> +
> +static void test_hbitmap_serialize_basic(TestHBitmapData *data,
> + const void *unused)
> +{
> +int i, j;
> +size_t buf_size;
> +uint8_t *buf;
> +uint64_t positions[] = { 0, 1, L1 - 1, L1, L2 - 1, L2, L2 + 1, L3 - 1 };
> +int num_positions = sizeof(positions) / sizeof(positions[0]);
> +
> +hbitmap_test_init(data, L3, 0);
> +buf_size = hbitmap_serialization_size(data->hb, 0, data->size);
> +buf = g_malloc0(buf_size);
> +
> +for (i = 0; i < num_positions; i++) {
> +for (j = 0; j < num_positions; j++) {
> +hbitmap_test_serialize_range(data, buf, buf_size,
> + positions[i],
> + MIN(positions[j], L3 - 
> positions[i]));
> +}
> +}
> +
> +g_free(buf);
> +}
> +
> +static void test_hbitmap_serialize_part(TestHBitmapData *data,
> +const void *unused)
> +{
> +int i, j, k;
> +size_t buf_size;
> +uint8_t *buf;
> +uint64_t positions[] = { 0, 1, L1 - 1, L1, L2 - 1, L2, L2 + 1, L3 - 1 };
> +int num_positions = sizeof(positions) / sizeof(positions[0]);
> +
> +hbitmap_test_init(data, L3, 0);
> +buf_size = L2;
> +buf = g_malloc0(buf_size);
> +
> +for (i = 0; i < num_positions; i++) {
> +hbitmap_set(data->hb, positions[i], 1);
> +}
> +
> +for (i = 0; i < data->size; i += buf_size) {
> +hbitmap_serialize_part(data->hb, buf, i, buf_size);
> +for (j = 0; j < buf_size; j++) {
> +bool should_set = false;
> +for (k = 0; k < num_positions; k++) {
> +if (positions[k] == j + i) {
> +should_set = true;
> +break;
> +}
> +}
> +g_assert_cmpint(should_set, ==, test_bit(j, (unsigned long 
> *)buf));
> +}
> +}
> +
> +g_free(buf);
> +}
> +
> +static void test_hbitmap_serialize_zeroes(TestHBitmapData *data,
> +  const void *unused)
> +{
> +int i;
> +HBitmapIter iter;
> +int64_t next;
> +uint64_t positions[] = { 0, L1, L2, L3 - L1};
> +int num_positions = sizeof(positions) / sizeof(positions[0]);
> +
> +hbitmap_test_init(data, L3, 0);
> +
> +for (i = 0; i < num_positions; i++) {
> +hbitmap_set(data->hb, positions[i], L1);
> +}
> +
> +for (i = 0; i < num_positions; i++) {
> +hbitmap_deserialize_zeroes(data->hb, position

Re: [Qemu-block] [Qemu-devel] [PATCH v2 12/13] block: BdrvDirtyBitmap serialization interface

2016-01-26 Thread John Snow


On 01/20/2016 01:11 AM, Fam Zheng wrote:
> From: Vladimir Sementsov-Ogievskiy 
> 
> Several functions to provide necessary access to BdrvDirtyBitmap for
> block-migration.c
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> [Add the "finish" parameters. - Fam]
> Signed-off-by: Fam Zheng 
> ---
>  block/dirty-bitmap.c | 37 +
>  include/block/dirty-bitmap.h | 14 ++
>  2 files changed, 51 insertions(+)
> 
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index ec13d38..919ce10 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -429,6 +429,43 @@ void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap 
> *bitmap, HBitmap *in)
>  hbitmap_free(tmp);
>  }
>  
> +uint64_t bdrv_dirty_bitmap_serialization_size(const BdrvDirtyBitmap *bitmap,
> +  uint64_t start, uint64_t count)
> +{
> +return hbitmap_serialization_size(bitmap->bitmap, start, count);
> +}
> +
> +uint64_t bdrv_dirty_bitmap_serialization_align(const BdrvDirtyBitmap *bitmap)
> +{
> +return hbitmap_serialization_granularity(bitmap->bitmap);
> +}
> +
> +void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
> +  uint8_t *buf, uint64_t start,
> +  uint64_t count)
> +{
> +hbitmap_serialize_part(bitmap->bitmap, buf, start, count);
> +}
> +
> +void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
> +uint8_t *buf, uint64_t start,
> +uint64_t count, bool finish)
> +{
> +hbitmap_deserialize_part(bitmap->bitmap, buf, start, count, finish);
> +}
> +
> +void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
> +  uint64_t start, uint64_t count,
> +  bool finish)
> +{
> +hbitmap_deserialize_zeroes(bitmap->bitmap, start, count, finish);
> +}
> +
> +void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
> +{
> +hbitmap_deserialize_finish(bitmap->bitmap);
> +}
> +
>  void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>  int nr_sectors)
>  {
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index 8c29c3e..6ba5bec 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -54,4 +54,18 @@ void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t 
> sector_num);
>  int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
>  void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
>  
> +uint64_t bdrv_dirty_bitmap_serialization_size(const BdrvDirtyBitmap *bitmap,
> +  uint64_t start, uint64_t 
> count);
> +uint64_t bdrv_dirty_bitmap_serialization_align(const BdrvDirtyBitmap 
> *bitmap);
> +void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
> +  uint8_t *buf, uint64_t start,
> +  uint64_t count);
> +void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
> +uint8_t *buf, uint64_t start,
> +uint64_t count, bool finish);
> +void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
> +  uint64_t start, uint64_t count,
> +  bool finish);
> +void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap);
> +
>  #endif
> 

Reviewed-by: John Snow 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 11/13] hbitmap: serialization

2016-01-26 Thread John Snow


On 01/20/2016 01:11 AM, Fam Zheng wrote:
> From: Vladimir Sementsov-Ogievskiy 
> 
> Functions to serialize / deserialize(restore) HBitmap. HBitmap should be
> saved to linear sequence of bits independently of endianness and bitmap
> array element (unsigned long) size. Therefore Little Endian is chosen.
> 
> These functions are appropriate for dirty bitmap migration, restoring
> the bitmap in several steps is available. To save performance, every
> step writes only the last level of the bitmap. All other levels are
> restored by hbitmap_deserialize_finish() as a last step of restoring.
> So, HBitmap is inconsistent while restoring.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> [Fix left shift operand to 1UL; add "finish" parameter. - Fam]
> Signed-off-by: Fam Zheng 
> ---
>  include/qemu/hbitmap.h |  78 
>  util/hbitmap.c | 135 
> +
>  2 files changed, 213 insertions(+)
> 
> diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
> index 595ad65..00dbb60 100644
> --- a/include/qemu/hbitmap.h
> +++ b/include/qemu/hbitmap.h
> @@ -149,6 +149,84 @@ void hbitmap_reset_all(HBitmap *hb);
>  bool hbitmap_get(const HBitmap *hb, uint64_t item);
>  
>  /**
> + * hbitmap_serialization_granularity:
> + * @hb: HBitmap to operate on.
> + *
> + * Grunularity of serialization chunks, used by other serializetion 
> functions.

"Granularity," "serialization."

Perhaps we should be consistent with "hbitmap" vs "HBitmap" as well, too.

> + * For every chunk:
> + * 1. Chunk start should be aligned to this granularity.
> + * 2. Chunk size should be aligned too, except for last chunk (for which
> + *  start + count == hb->size)
> + */
> +uint64_t hbitmap_serialization_granularity(const HBitmap *hb);
> +
> +/**
> + * hbitmap_data_size:
> + * @hb: HBitmap to operate on.
> + * @count: Number of bits
> + *
> + * Return amount of bytes hbitmap_(de)serialize_part needs
> + */

"number of bytes" -- amount is for uncountable nouns (like "water.")

> +uint64_t hbitmap_serialization_size(const HBitmap *hb,
> +uint64_t start, uint64_t count);
> +
> +/**
> + * hbitmap_serialize_part
> + * @hb: HBitmap to operate on.
> + * @buf: Buffer to store serialized bitmap.
> + * @start: First bit to store.
> + * @count: Number of bits to store.
> + *
> + * Stores HBitmap data corresponding to given region. The format of saved 
> data
> + * is linear sequence of bits, so it can be used by hbitmap_deserialize_part
> + * independently of endianness and size of HBitmap level array elements
> + */
> +void hbitmap_serialize_part(const HBitmap *hb, uint8_t *buf,
> +uint64_t start, uint64_t count);
> +
> +/**
> + * hbitmap_deserialize_part
> + * @hb: HBitmap to operate on.
> + * @buf: Buffer to restore bitmap data from.
> + * @start: First bit to restore.
> + * @count: Number of bits to restore.
> + * @finish: Whether to call hbitmap_deserialize_finish automatically.
> + *
> + * Restores HBitmap data corresponding to given region. The format is the 
> same
> + * as for hbitmap_serialize_part.
> + *
> + * If @finish is false, caller must call hbitmap_serialize_finish before 
> using
> + * the bitmap.
> + */
> +void hbitmap_deserialize_part(HBitmap *hb, uint8_t *buf,
> +  uint64_t start, uint64_t count,
> +  bool finish);
> +
> +/**
> + * hbitmap_deserialize_zeroes
> + * @hb: HBitmap to operate on.
> + * @start: First bit to restore.
> + * @count: Number of bits to restore.
> + * @finish: Whether to call hbitmap_deserialize_finish automatically.
> + *
> + * Fills the bitmap with zeroes.
> + *
> + * If @finish is false, caller must call hbitmap_serialize_finish before 
> using
> + * the bitmap.
> + */
> +void hbitmap_deserialize_zeroes(HBitmap *hb, uint64_t start, uint64_t count,
> +bool finish);
> +
> +/**
> + * hbitmap_deserialize_finish
> + * @hb: HBitmap to operate on.
> + *
> + * Repair HBitmap after calling hbitmap_deserialize_data. Actually, all 
> HBitmap
> + * layers are restored here.
> + */
> +void hbitmap_deserialize_finish(HBitmap *hb);
> +
> +/**
>   * hbitmap_free:
>   * @hb: HBitmap to operate on.
>   *
> diff --git a/util/hbitmap.c b/util/hbitmap.c
> index 79f6236..1e49ab7 100644
> --- a/util/hbitmap.c
> +++ b/util/hbitmap.c
> @@ -397,6 +397,141 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
>  return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 
> 0;
>  }
>  
> +uint64_t hbitmap_serialization_granularity(const HBitmap *hb)
> +{
> +return 64 << hb->granularity;
> +}
> +

Sorry, about to be confused about what this function is meant to do,
please bear with me:

Normally the granularity specifies the mapping of "interface" bits to
"implementation" bits. I've never came up with good terminology for
this, but if the user asks for 128 bits and a granularity of 2, we
sec

Re: [Qemu-block] [PATCH v2 05/13] block: Hide HBitmap in block dirty bitmap interface

2016-01-26 Thread Vladimir Sementsov-Ogievskiy

On 20.01.2016 09:11, Fam Zheng wrote:

HBitmap is an implementation detail of block dirty bitmap that should be hidden
from users. Introduce a BdrvDirtyBitmapIter to encapsulate the underlying
HBitmapIter.

A small difference in the interface is, before, an HBitmapIter is initialized
in place, now the new BdrvDirtyBitmapIter must be dynamically allocated because
the structure definition is in block.c.

Two current users are converted too.

Signed-off-by: Fam Zheng 
Reviewed-by: John Snow 
---
  block/backup.c   | 14 --
  block/dirty-bitmap.c | 37 -
  block/mirror.c   | 14 --
  include/block/dirty-bitmap.h |  7 +--
  include/qemu/typedefs.h  |  1 +
  5 files changed, 54 insertions(+), 19 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 56ddec0..2388039 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -326,14 +326,14 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
  int64_t end;
  int64_t last_cluster = -1;
  BlockDriverState *bs = job->common.bs;
-HBitmapIter hbi;
+BdrvDirtyBitmapIter *dbi;
  
  granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);

  clusters_per_iter = MAX((granularity / BACKUP_CLUSTER_SIZE), 1);
-bdrv_dirty_iter_init(job->sync_bitmap, &hbi);
+dbi = bdrv_dirty_iter_new(job->sync_bitmap, 0);
  
  /* Find the next dirty sector(s) */

-while ((sector = hbitmap_iter_next(&hbi)) != -1) {
+while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
  cluster = sector / BACKUP_SECTORS_PER_CLUSTER;
  
  /* Fake progress updates for any clusters we skipped */

@@ -345,7 +345,7 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
  for (end = cluster + clusters_per_iter; cluster < end; cluster++) {
  do {
  if (yield_and_check(job)) {
-return ret;
+goto out;
  }
  ret = backup_do_cow(bs, cluster * BACKUP_SECTORS_PER_CLUSTER,
  BACKUP_SECTORS_PER_CLUSTER, 
&error_is_read,
@@ -353,7 +353,7 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
  if ((ret < 0) &&
  backup_error_action(job, error_is_read, -ret) ==
  BLOCK_ERROR_ACTION_REPORT) {
-return ret;
+goto out;
  }
  } while (ret < 0);
  }
@@ -361,7 +361,7 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
  /* If the bitmap granularity is smaller than the backup granularity,
   * we need to advance the iterator pointer to the next cluster. */
  if (granularity < BACKUP_CLUSTER_SIZE) {
-bdrv_set_dirty_iter(&hbi, cluster * BACKUP_SECTORS_PER_CLUSTER);
+bdrv_set_dirty_iter(dbi, cluster * BACKUP_SECTORS_PER_CLUSTER);
  }
  
  last_cluster = cluster - 1;

@@ -373,6 +373,8 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
  job->common.offset += ((end - last_cluster - 1) * 
BACKUP_CLUSTER_SIZE);
  }
  
+out:

+bdrv_dirty_iter_free(dbi);
  return ret;
  }
  
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c

index 7924c38..bd7758b 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -41,9 +41,15 @@ struct BdrvDirtyBitmap {
  char *name; /* Optional non-empty unique ID */
  int64_t size;   /* Size of the bitmap (Number of sectors) */
  bool disabled;  /* Bitmap is read-only */
+int active_iterators;   /* How many iterators are active */
  QLIST_ENTRY(BdrvDirtyBitmap) list;
  };
  
+struct BdrvDirtyBitmapIter {

+HBitmapIter hbi;
+BdrvDirtyBitmap *bitmap;
+};
+
  BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char 
*name)
  {
  BdrvDirtyBitmap *bm;
@@ -211,6 +217,7 @@ void bdrv_dirty_bitmap_truncate(BlockDriverState *bs)
  
  QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {

  assert(!bdrv_dirty_bitmap_frozen(bitmap));
+assert(!bitmap->active_iterators);
  hbitmap_truncate(bitmap->bitmap, size);
  bitmap->size = size;
  }
@@ -221,6 +228,7 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, 
BdrvDirtyBitmap *bitmap)
  BdrvDirtyBitmap *bm, *next;
  QLIST_FOREACH_SAFE(bm, &bs->dirty_bitmaps, list, next) {
  if (bm == bitmap) {
+assert(!bitmap->active_iterators);
  assert(!bdrv_dirty_bitmap_frozen(bm));
  QLIST_REMOVE(bitmap, list);
  hbitmap_free(bitmap->bitmap);
@@ -299,9 +307,29 @@ uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap 
*bitmap)
  return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
  }
  
-void bdrv_dirty_iter_init(BdrvDirtyBitmap *bitmap, HBitmapIter *hbi)

+BdrvDi

Re: [Qemu-block] [Patch v12 resend 08/10] Implement new driver for block replication

2016-01-26 Thread Stefan Hajnoczi
On Mon, Jan 04, 2016 at 01:50:40PM +0800, Wen Congyang wrote:
> On 12/23/2015 05:47 PM, Stefan Hajnoczi wrote:
> > On Wed, Dec 02, 2015 at 01:37:25PM +0800, Wen Congyang wrote:
> >> +/*
> >> + * Only write to active disk if the sectors have
> >> + * already been allocated in active disk/hidden disk.
> >> + */
> >> +qemu_iovec_init(&hd_qiov, qiov->niov);
> >> +while (remaining_sectors > 0) {
> >> +ret = bdrv_is_allocated_above(top, base, sector_num,
> >> +  remaining_sectors, &n);
> > 
> > There is a race condition here since multiple I/O requests can be in
> > flight at the same time.   If two requests touch the same cluster
> > between top->base then the result of these checks could be unreliable.
> 
> I don't think so. When we come here, primary qemu is gone, and failover is
> done. We only write to active disk if the sectors have already been allocated
> in active disk/hidden disk before failover. So it two requests touch the same
> cluster, it is OK, because the function bdrv_is_allocated_above()'s return
> value is not changed.

You are right.  I didn't realize that there will be no allocating writes
to the active disk.  There is no race condition.

> >> +if (ret < 0) {
> >> +return ret;
> >> +}
> >> +
> >> +qemu_iovec_reset(&hd_qiov);
> >> +qemu_iovec_concat(&hd_qiov, qiov, bytes_done, n * 512);
> >> +
> >> +target = ret ? top : base;
> >> +ret = bdrv_co_writev(target, sector_num, n, &hd_qiov);
> >> +if (ret < 0) {
> >> +return ret;
> >> +}
> >> +
> >> +remaining_sectors -= n;
> >> +sector_num += n;
> >> +bytes_done += n * BDRV_SECTOR_SIZE;
> >> +}
> > 
> > I think this can be replaced with an active commit block job that copies
> > data down from the hidden/active disk to the secondary disk.  It is okay
> > to keep writing to the secondary disk while the block job is running and
> > then switch over to the secondary disk once it completes.
> 
> Yes, active commit is another choice. IIRC, I don't use it because mirror job
> has some problem. It is fixed now(see bdrv_drained_begin()/bdrv_drained_end()
> in the mirror job).
> We will use mirror job in the next version.

I see, thanks.

> > 
> >> +
> >> +return 0;
> >> +}
> >> +
> >> +static coroutine_fn int replication_co_discard(BlockDriverState *bs,
> >> +   int64_t sector_num,
> >> +   int nb_sectors)
> >> +{
> >> +BDRVReplicationState *s = bs->opaque;
> >> +int ret;
> >> +
> >> +ret = replication_get_io_status(s);
> >> +if (ret < 0) {
> >> +return ret;
> >> +}
> >> +
> >> +if (ret == 1) {
> >> +/* It is secondary qemu and we are after failover */
> >> +ret = bdrv_co_discard(s->secondary_disk, sector_num, nb_sectors);
> > 
> > What if the clusters are still allocated in the hidden/active disk?
> > 
> 
> What does discard do? Drop the data that allocated in the disk?
> If so, I think I make a misunderstand. I will fix it in the next version.

If I understand correctly the chain is:

 (base) secondary_disk <- hidden disk <- active disk (top)

Clusters in hidden disk or active disk will remain if you discard in
secondary_disk.

I think you need to consider how discard works with a backing chain (I
have forgotten the details but you can check block.c and block/qcow2.c
to find out).

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [Patch v12 resend 05/10] docs: block replication's description

2016-01-26 Thread Stefan Hajnoczi
On Mon, Jan 04, 2016 at 02:03:16PM +0800, Wen Congyang wrote:
> On 12/23/2015 05:26 PM, Stefan Hajnoczi wrote:
> > On Wed, Dec 02, 2015 at 01:31:46PM +0800, Wen Congyang wrote:
> >> +== Failure Handling ==
> >> +There are 6 internal errors when block replication is running:
> >> +1. I/O error on primary disk
> >> +2. Forwarding primary write requests failed
> >> +3. Backup failed
> >> +4. I/O error on secondary disk
> >> +5. I/O error on active disk
> >> +6. Making active disk or hidden disk empty failed
> >> +In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
> >> +4 and 6, we just report block replication's error to FT/HA manager (which
> >> +decides when to do a new checkpoint, when to do failover).
> >> +There is no internal error when doing failover.
> > 
> > Not sure this is true.
> > 
> > Below it says the following for failover: "We will flush the Disk buffer
> > into Secondary Disk and stop block replication".  Flushing the disk
> > buffer can result in I/O errors.  This means that failover operations
> > are not guaranteed to succeed.
> 
> We don't use mirror job now. We may use it in the next version.
> Is there any way to know the I/O error when the mirror job is running?
> Get the job's status?

Block jobs have an error status which is exposed via QMP.  The block job
emits a QMP event notifying the client.  If the client issues
query-block-jobs it will also see the iostatus field.

I'm not aware of an internal API to monitor QMP events.  It would be
possible to add it but first I wonder why you want to use mirror?

> > In practice I think this is similar to a successful failover followed by
> > immediately getting I/O errors on the new Primary Disk.  It means that
> > right after failover there is another failure and the system may not be
> > able to continue.
> 
> Block replication is not designed for such case. For example, we don't do
> failover on primary disk's failure. In such case, we just report the error
> to the disk layer(It is the case 1 in the above Failure Handling).
> 
> Sorry for the late reply. Your mail is sent at 2015-12-23, but I receive
> it at 2016-01-04

What is supposed to happen when flushing the Disk Buffer into the
Secondary Disk fails?

Stefan


signature.asc
Description: PGP signature


[Qemu-block] [PATCH v4 10/10] qemu-io: use no_argument/required_argument constants

2016-01-26 Thread Daniel P. Berrange
When declaring the 'struct option' array, use the standard
constants no_argument/required_argument, instead of magic
values 0 and 1.

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrange 
---
 qemu-io.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 51d8272..0f759bb 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -413,21 +413,21 @@ int main(int argc, char **argv)
 int readonly = 0;
 const char *sopt = "hVc:d:f:rsnmgkt:T:";
 const struct option lopt[] = {
-{ "help", 0, NULL, 'h' },
-{ "version", 0, NULL, 'V' },
-{ "offset", 1, NULL, 'o' },
-{ "cmd", 1, NULL, 'c' },
-{ "format", 1, NULL, 'f' },
-{ "read-only", 0, NULL, 'r' },
-{ "snapshot", 0, NULL, 's' },
-{ "nocache", 0, NULL, 'n' },
-{ "misalign", 0, NULL, 'm' },
-{ "native-aio", 0, NULL, 'k' },
-{ "discard", 1, NULL, 'd' },
-{ "cache", 1, NULL, 't' },
-{ "trace", 1, NULL, 'T' },
-{ "object", 1, NULL, OPTION_OBJECT },
-{ "image-opts", 0, NULL, OPTION_IMAGE_OPTS },
+{ "help", no_argument, NULL, 'h' },
+{ "version", no_argument, NULL, 'V' },
+{ "offset", required_argument, NULL, 'o' },
+{ "cmd", required_argument, NULL, 'c' },
+{ "format", required_argument, NULL, 'f' },
+{ "read-only", no_argument, NULL, 'r' },
+{ "snapshot", no_argument, NULL, 's' },
+{ "nocache", no_argument, NULL, 'n' },
+{ "misalign", no_argument, NULL, 'm' },
+{ "native-aio", no_argument, NULL, 'k' },
+{ "discard", required_argument, NULL, 'd' },
+{ "cache", required_argument, NULL, 't' },
+{ "trace", required_argument, NULL, 'T' },
+{ "object", required_argument, NULL, OPTION_OBJECT },
+{ "image-opts", no_argument, NULL, OPTION_IMAGE_OPTS },
 { NULL, 0, NULL, 0 }
 };
 int c;
-- 
2.5.0




[Qemu-block] [PATCH v4 09/10] qemu-nbd: use no_argument/required_argument constants

2016-01-26 Thread Daniel P. Berrange
When declaring the 'struct option' array, use the standard
constants no_argument/required_argument, instead of magic
values 0 and 1.

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrange 
---
 qemu-nbd.c | 47 ---
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index bbc79f4..58e1610 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -437,29 +437,30 @@ int main(int argc, char **argv)
 const char *sn_id_or_name = NULL;
 const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:";
 struct option lopt[] = {
-{ "help", 0, NULL, 'h' },
-{ "version", 0, NULL, 'V' },
-{ "bind", 1, NULL, 'b' },
-{ "port", 1, NULL, 'p' },
-{ "socket", 1, NULL, 'k' },
-{ "offset", 1, NULL, 'o' },
-{ "read-only", 0, NULL, 'r' },
-{ "partition", 1, NULL, 'P' },
-{ "connect", 1, NULL, 'c' },
-{ "disconnect", 0, NULL, 'd' },
-{ "snapshot", 0, NULL, 's' },
-{ "load-snapshot", 1, NULL, 'l' },
-{ "nocache", 0, NULL, 'n' },
-{ "cache", 1, NULL, QEMU_NBD_OPT_CACHE },
-{ "aio", 1, NULL, QEMU_NBD_OPT_AIO },
-{ "discard", 1, NULL, QEMU_NBD_OPT_DISCARD },
-{ "detect-zeroes", 1, NULL, QEMU_NBD_OPT_DETECT_ZEROES },
-{ "shared", 1, NULL, 'e' },
-{ "format", 1, NULL, 'f' },
-{ "persistent", 0, NULL, 't' },
-{ "verbose", 0, NULL, 'v' },
-{ "object", 1, NULL, QEMU_NBD_OPT_OBJECT },
-{ "image-opts", 0, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
+{ "help", no_argument, NULL, 'h' },
+{ "version", no_argument, NULL, 'V' },
+{ "bind", required_argument, NULL, 'b' },
+{ "port", required_argument, NULL, 'p' },
+{ "socket", required_argument, NULL, 'k' },
+{ "offset", required_argument, NULL, 'o' },
+{ "read-only", no_argument, NULL, 'r' },
+{ "partition", required_argument, NULL, 'P' },
+{ "connect", required_argument, NULL, 'c' },
+{ "disconnect", no_argument, NULL, 'd' },
+{ "snapshot", no_argument, NULL, 's' },
+{ "load-snapshot", required_argument, NULL, 'l' },
+{ "nocache", no_argument, NULL, 'n' },
+{ "cache", required_argument, NULL, QEMU_NBD_OPT_CACHE },
+{ "aio", required_argument, NULL, QEMU_NBD_OPT_AIO },
+{ "discard", required_argument, NULL, QEMU_NBD_OPT_DISCARD },
+{ "detect-zeroes", required_argument, NULL,
+  QEMU_NBD_OPT_DETECT_ZEROES },
+{ "shared", required_argument, NULL, 'e' },
+{ "format", required_argument, NULL, 'f' },
+{ "persistent", no_argument, NULL, 't' },
+{ "verbose", no_argument, NULL, 'v' },
+{ "object", required_argument, NULL, QEMU_NBD_OPT_OBJECT },
+{ "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
 { NULL, 0, NULL, 0 }
 };
 int ch;
-- 
2.5.0




[Qemu-block] [PATCH v4 04/10] qemu-io: add support for --object command line arg

2016-01-26 Thread Daniel P. Berrange
Allow creation of user creatable object types with qemu-io
via a new --object command line arg. This will be used to supply
passwords and/or encryption keys to the various block driver
backends via the recently added 'secret' object type.

 # printf letmein > mypasswd.txt
 # qemu-io --object secret,id=sec0,file=mypasswd.txt \
  ...other args...

Signed-off-by: Daniel P. Berrange 
---
 qemu-io.c | 54 ++
 1 file changed, 54 insertions(+)

diff --git a/qemu-io.c b/qemu-io.c
index d593f19..d1432ea 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -18,6 +18,8 @@
 #include "qemu/config-file.h"
 #include "qemu/readline.h"
 #include "qapi/qmp/qstring.h"
+#include "qapi/opts-visitor.h"
+#include "qom/object_interfaces.h"
 #include "sysemu/block-backend.h"
 #include "block/block_int.h"
 #include "trace/control.h"
@@ -200,6 +202,8 @@ static void usage(const char *name)
 "Usage: %s [-h] [-V] [-rsnm] [-f FMT] [-c STRING] ... [file]\n"
 "QEMU Disk exerciser\n"
 "\n"
+"  --object OBJECTDEF   define an object such as 'secret' for\n"
+"   passwords and/or encryption keys\n"
 "  -c, --cmd STRING execute command with its arguments\n"
 "   from the given string\n"
 "  -f, --format FMT specifies the block driver to use\n"
@@ -361,6 +365,38 @@ static void reenable_tty_echo(void)
 qemu_set_tty_echo(STDIN_FILENO, true);
 }
 
+enum {
+OPTION_OBJECT = 256,
+};
+
+static QemuOptsList qemu_object_opts = {
+.name = "object",
+.implied_opt_name = "qom-type",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head),
+.desc = {
+{ }
+},
+};
+
+static int object_create(void *opaque, QemuOpts *opts, Error **errp)
+{
+Error *err = NULL;
+OptsVisitor *ov;
+QDict *pdict;
+
+ov = opts_visitor_new(opts);
+pdict = qemu_opts_to_qdict(opts, NULL);
+
+user_creatable_add(pdict, opts_get_visitor(ov), &err);
+opts_visitor_cleanup(ov);
+QDECREF(pdict);
+if (err) {
+error_propagate(errp, err);
+return -1;
+}
+return 0;
+}
+
 int main(int argc, char **argv)
 {
 int readonly = 0;
@@ -379,6 +415,7 @@ int main(int argc, char **argv)
 { "discard", 1, NULL, 'd' },
 { "cache", 1, NULL, 't' },
 { "trace", 1, NULL, 'T' },
+{ "object", 1, NULL, OPTION_OBJECT },
 { NULL, 0, NULL, 0 }
 };
 int c;
@@ -386,6 +423,7 @@ int main(int argc, char **argv)
 int flags = BDRV_O_UNMAP;
 Error *local_error = NULL;
 QDict *opts = NULL;
+QemuOpts *qopts = NULL;
 
 #ifdef CONFIG_POSIX
 signal(SIGPIPE, SIG_IGN);
@@ -394,6 +432,8 @@ int main(int argc, char **argv)
 progname = basename(argv[0]);
 qemu_init_exec_dir(argv[0]);
 
+module_call_init(MODULE_INIT_QOM);
+qemu_add_opts(&qemu_object_opts);
 bdrv_init();
 
 while ((c = getopt_long(argc, argv, sopt, lopt, &opt_index)) != -1) {
@@ -445,6 +485,13 @@ int main(int argc, char **argv)
 case 'h':
 usage(progname);
 exit(0);
+case OPTION_OBJECT:
+qopts = qemu_opts_parse_noisily(qemu_find_opts("object"),
+optarg, true);
+if (!qopts) {
+exit(1);
+}
+break;
 default:
 usage(progname);
 exit(1);
@@ -461,6 +508,13 @@ int main(int argc, char **argv)
 exit(1);
 }
 
+if (qemu_opts_foreach(qemu_find_opts("object"),
+  object_create,
+  NULL, &local_error)) {
+error_report_err(local_error);
+exit(1);
+}
+
 /* initialize commands */
 qemuio_add_command(&quit_cmd);
 qemuio_add_command(&open_cmd);
-- 
2.5.0




[Qemu-block] [PATCH v4 08/10] qemu-nbd: don't overlap long option values with short options

2016-01-26 Thread Daniel P. Berrange
When defining values for long options, the normal practice is
to start numbering from 256, to avoid overlap with the range
of valid values for short options.

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrange 
---
 qemu-nbd.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 764698f..bbc79f4 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -43,12 +43,12 @@
 #include 
 
 #define SOCKET_PATH"/var/lock/qemu-nbd-%s"
-#define QEMU_NBD_OPT_CACHE 1
-#define QEMU_NBD_OPT_AIO   2
-#define QEMU_NBD_OPT_DISCARD   3
-#define QEMU_NBD_OPT_DETECT_ZEROES 4
-#define QEMU_NBD_OPT_OBJECT5
-#define QEMU_NBD_OPT_IMAGE_OPTS6
+#define QEMU_NBD_OPT_CACHE 256
+#define QEMU_NBD_OPT_AIO   257
+#define QEMU_NBD_OPT_DISCARD   258
+#define QEMU_NBD_OPT_DETECT_ZEROES 259
+#define QEMU_NBD_OPT_OBJECT260
+#define QEMU_NBD_OPT_IMAGE_OPTS261
 
 static NBDExport *exp;
 static int verbose;
-- 
2.5.0




[Qemu-block] [PATCH v4 07/10] qemu-img: allow specifying image as a set of options args

2016-01-26 Thread Daniel P. Berrange
Currently qemu-img allows an image filename to be passed on the
command line, but unless using the JSON format, it does not have
a way to set any options except the format eg

   qemu-img info https://127.0.0.1/images/centos7.iso

This adds a --image-opts arg that indicates that the positional
filename should be interpreted as a full option string, not
just a filename.

   qemu-img info --source driver=http,url=https://127.0.0.1/images,sslverify=off

This flag is mutually exclusive with the '-f' / '-F' flags.

Signed-off-by: Daniel P. Berrange 
---
 qemu-img-cmds.hx |  44 
 qemu-img.c   | 304 +--
 qemu-img.texi|   6 ++
 3 files changed, 303 insertions(+), 51 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 5bb1de7..ee5c770 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -10,68 +10,68 @@ STEXI
 ETEXI
 
 DEF("check", img_check,
-"check [-q] [--object objectdef] [-f fmt] [--output=ofmt] [-r [leaks | 
all]] [-T src_cache] filename")
+"check [-q] [--object objectdef] [--image-opts] [-f fmt] [--output=ofmt] 
[-r [leaks | all]] [-T src_cache] filename")
 STEXI
-@item check [--object objectdef] [-q] [-f @var{fmt}] [--output=@var{ofmt}] [-r 
[leaks | all]] [-T @var{src_cache}] @var{filename}
+@item check [--object objectdef] [--image-opts] [-q] [-f @var{fmt}] 
[--output=@var{ofmt}] [-r [leaks | all]] [-T @var{src_cache}] @var{filename}
 ETEXI
 
 DEF("create", img_create,
-"create [-q] [--object objectdef] [-f fmt] [-o options] filename [size]")
+"create [-q] [--object objectdef] [--image-opts] [-f fmt] [-o options] 
filename [size]")
 STEXI
-@item create [--object objectdef] [-q] [-f @var{fmt}] [-o @var{options}] 
@var{filename} [@var{size}]
+@item create [--object objectdef] [--image-opts] [-q] [-f @var{fmt}] [-o 
@var{options}] @var{filename} [@var{size}]
 ETEXI
 
 DEF("commit", img_commit,
-"commit [-q] [--object objectdef] [-f fmt] [-t cache] [-b base] [-d] [-p] 
filename")
+"commit [-q] [--object objectdef] [--image-opts] [-f fmt] [-t cache] [-b 
base] [-d] [-p] filename")
 STEXI
-@item commit [--object objectdef] [-q] [-f @var{fmt}] [-t @var{cache}] [-b 
@var{base}] [-d] [-p] @var{filename}
+@item commit [--object objectdef] [--image-opts] [-q] [-f @var{fmt}] [-t 
@var{cache}] [-b @var{base}] [-d] [-p] @var{filename}
 ETEXI
 
 DEF("compare", img_compare,
-"compare [--object objectdef] [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] 
[-s] filename1 filename2")
+"compare [--object objectdef] [--image-opts] [-f fmt] [-F fmt] [-T 
src_cache] [-p] [-q] [-s] filename1 filename2")
 STEXI
-@item compare [--object objectdef] [-f @var{fmt}] [-F @var{fmt}] [-T 
@var{src_cache}] [-p] [-q] [-s] @var{filename1} @var{filename2}
+@item compare [--object objectdef] [--image-opts] [-f @var{fmt}] [-F 
@var{fmt}] [-T @var{src_cache}] [-p] [-q] [-s] @var{filename1} @var{filename2}
 ETEXI
 
 DEF("convert", img_convert,
-"convert [--object objectdef] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T 
src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] [-l 
snapshot_param] [-S sparse_size] filename [filename2 [...]] output_filename")
+"convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f fmt] 
[-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] 
[-l snapshot_param] [-S sparse_size] filename [filename2 [...]] 
output_filename")
 STEXI
-@item convert [--object objectdef] [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t 
@var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-o @var{options}] [-s 
@var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S @var{sparse_size}] 
@var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f 
@var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-o 
@var{options}] [-s @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S 
@var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("info", img_info,
-"info [--object objectdef] [-f fmt] [--output=ofmt] [--backing-chain] 
filename")
+"info [--object objectdef] [--image-opts] [-f fmt] [--output=ofmt] 
[--backing-chain] filename")
 STEXI
-@item info [--object objectdef] [-f @var{fmt}] [--output=@var{ofmt}] 
[--backing-chain] @var{filename}
+@item info [--object objectdef] [--image-opts] [-f @var{fmt}] 
[--output=@var{ofmt}] [--backing-chain] @var{filename}
 ETEXI
 
 DEF("map", img_map,
-"map [--object objectdef] [-f fmt] [--output=ofmt] filename")
+"map [--object objectdef] [--image-opts] [-f fmt] [--output=ofmt] 
filename")
 STEXI
-@item map [--object objectdef] [-f @var{fmt}] [--output=@var{ofmt}] 
@var{filename}
+@item map [--object objectdef] [--image-opts] [-f @var{fmt}] 
[--output=@var{ofmt}] @var{filename}
 ETEXI
 
 DEF("snapshot", img_snapshot,
-"snapshot [--object objectdef] [-q] [-l | -a snapshot | -c snapshot | -d 
snapsh

[Qemu-block] [PATCH v4 06/10] qemu-nbd: allow specifying image as a set of options args

2016-01-26 Thread Daniel P. Berrange
Currently qemu-nbd allows an image filename to be passed on the
command line, but unless using the JSON format, it does not have
a way to set any options except the format eg

   qemu-nbd https://127.0.0.1/images/centos7.iso
   qemu-nbd /home/berrange/demo.qcow2

This adds a --image-opts arg that indicates that the positional
filename should be interpreted as a full option string, not
just a filename.

   qemu-nbd --image-opts driver=http,url=https://127.0.0.1/images,sslverify=off
   qemu-nbd --image-opts file=/home/berrange/demo.qcow2

This flag is mutually exclusive with the '-f' flag.

Signed-off-by: Daniel P. Berrange 
---
 qemu-nbd.c | 45 -
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 8e5d36c..764698f 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -48,6 +48,7 @@
 #define QEMU_NBD_OPT_DISCARD   3
 #define QEMU_NBD_OPT_DETECT_ZEROES 4
 #define QEMU_NBD_OPT_OBJECT5
+#define QEMU_NBD_OPT_IMAGE_OPTS6
 
 static NBDExport *exp;
 static int verbose;
@@ -381,6 +382,16 @@ static SocketAddress *nbd_build_socket_address(const char 
*sockpath,
 }
 
 
+static QemuOptsList file_opts = {
+.name = "file",
+.implied_opt_name = "file",
+.head = QTAILQ_HEAD_INITIALIZER(file_opts.head),
+.desc = {
+/* no elements => accept any params */
+{ /* end of list */ }
+},
+};
+
 static QemuOptsList qemu_object_opts = {
 .name = "object",
 .implied_opt_name = "qom-type",
@@ -448,6 +459,7 @@ int main(int argc, char **argv)
 { "persistent", 0, NULL, 't' },
 { "verbose", 0, NULL, 'v' },
 { "object", 1, NULL, QEMU_NBD_OPT_OBJECT },
+{ "image-opts", 0, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
 { NULL, 0, NULL, 0 }
 };
 int ch;
@@ -466,6 +478,7 @@ int main(int argc, char **argv)
 BlockdevDetectZeroesOptions detect_zeroes = 
BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
 QDict *options = NULL;
 QemuOpts *opts;
+bool imageOpts = false;
 
 /* The client thread uses SIGTERM to interrupt the server.  A signal
  * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
@@ -635,6 +648,9 @@ int main(int argc, char **argv)
 exit(1);
 }
 break;
+case QEMU_NBD_OPT_IMAGE_OPTS:
+imageOpts = true;
+break;
 case '?':
 error_report("Try `%s --help' for more information.", argv[0]);
 exit(EXIT_FAILURE);
@@ -744,13 +760,32 @@ int main(int argc, char **argv)
 bdrv_init();
 atexit(bdrv_close_all);
 
-if (fmt) {
-options = qdict_new();
-qdict_put(options, "driver", qstring_from_str(fmt));
+srcpath = argv[optind];
+if (imageOpts) {
+char *file = NULL;
+if (fmt) {
+error_report("--image-opts and -f are mutually exclusive");
+exit(EXIT_FAILURE);
+}
+opts = qemu_opts_parse_noisily(&file_opts, srcpath, true);
+if (!opts) {
+qemu_opts_reset(&file_opts);
+exit(EXIT_FAILURE);
+}
+file = g_strdup(qemu_opt_get(opts, "file"));
+qemu_opt_unset(opts, "file");
+options = qemu_opts_to_qdict(opts, NULL);
+qemu_opts_reset(&file_opts);
+blk = blk_new_open("hda", file, NULL, options, flags, &local_err);
+g_free(file);
+} else {
+if (fmt) {
+options = qdict_new();
+qdict_put(options, "driver", qstring_from_str(fmt));
+}
+blk = blk_new_open("hda", srcpath, NULL, options, flags, &local_err);
 }
 
-srcpath = argv[optind];
-blk = blk_new_open("hda", srcpath, NULL, options, flags, &local_err);
 if (!blk) {
 error_reportf_err(local_err, "Failed to blk_new_open '%s': ",
   argv[optind]);
-- 
2.5.0




[Qemu-block] [PATCH v4 05/10] qemu-io: allow specifying image as a set of options args

2016-01-26 Thread Daniel P. Berrange
Currently qemu-io allows an image filename to be passed on the
command line, but unless using the JSON format, it does not have
a way to set any options except the format eg

 qemu-io https://127.0.0.1/images/centos7.iso
 qemu-io /home/berrange/demo.qcow2

This adds a --image-opts arg that indicates that the positional
filename should be interpreted as a full option string, not
just a filename.

 qemu-io --image-opts driver=http,url=https://127.0.0.1/images,sslverify=off
 qemu-io --image-opts file=/home/berrange/demo.qcow2

This flag is mutually exclusive with the '-f' flag.

Signed-off-by: Daniel P. Berrange 
---
 qemu-io.c | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/qemu-io.c b/qemu-io.c
index d1432ea..51d8272 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -367,6 +367,7 @@ static void reenable_tty_echo(void)
 
 enum {
 OPTION_OBJECT = 256,
+OPTION_IMAGE_OPTS = 257,
 };
 
 static QemuOptsList qemu_object_opts = {
@@ -397,6 +398,16 @@ static int object_create(void *opaque, QemuOpts *opts, 
Error **errp)
 return 0;
 }
 
+static QemuOptsList file_opts = {
+.name = "file",
+.implied_opt_name = "file",
+.head = QTAILQ_HEAD_INITIALIZER(file_opts.head),
+.desc = {
+/* no elements => accept any params */
+{ /* end of list */ }
+},
+};
+
 int main(int argc, char **argv)
 {
 int readonly = 0;
@@ -416,6 +427,7 @@ int main(int argc, char **argv)
 { "cache", 1, NULL, 't' },
 { "trace", 1, NULL, 'T' },
 { "object", 1, NULL, OPTION_OBJECT },
+{ "image-opts", 0, NULL, OPTION_IMAGE_OPTS },
 { NULL, 0, NULL, 0 }
 };
 int c;
@@ -424,6 +436,7 @@ int main(int argc, char **argv)
 Error *local_error = NULL;
 QDict *opts = NULL;
 QemuOpts *qopts = NULL;
+bool imageOpts = false;
 
 #ifdef CONFIG_POSIX
 signal(SIGPIPE, SIG_IGN);
@@ -492,6 +505,9 @@ int main(int argc, char **argv)
 exit(1);
 }
 break;
+case OPTION_IMAGE_OPTS:
+imageOpts = true;
+break;
 default:
 usage(progname);
 exit(1);
@@ -534,7 +550,23 @@ int main(int argc, char **argv)
 flags |= BDRV_O_RDWR;
 }
 
-if ((argc - optind) == 1) {
+if (imageOpts) {
+char *file;
+qopts = qemu_opts_parse_noisily(&file_opts, argv[optind], false);
+if (!qopts) {
+exit(1);
+}
+if (opts) {
+error_report("--image-opts and -f are mutually exclusive");
+exit(1);
+}
+file = g_strdup(qemu_opt_get(qopts, "file"));
+qemu_opt_unset(qopts, "file");
+opts = qemu_opts_to_qdict(qopts, NULL);
+qemu_opts_reset(&file_opts);
+openfile(file, flags, opts);
+g_free(file);
+} else if ((argc - optind) == 1) {
 openfile(argv[optind], flags, opts);
 }
 command_loop();
-- 
2.5.0




[Qemu-block] [PATCH v4 02/10] qemu-img: add support for --object command line arg

2016-01-26 Thread Daniel P. Berrange
Allow creation of user creatable object types with qemu-img
via a new --object command line arg. This will be used to supply
passwords and/or encryption keys to the various block driver
backends via the recently added 'secret' object type.

 # printf letmein > mypasswd.txt
 # qemu-img info --object secret,id=sec0,file=mypasswd.txt \
  ...other info args...

Signed-off-by: Daniel P. Berrange 
---
 qemu-img-cmds.hx |  44 -
 qemu-img.c   | 284 +--
 qemu-img.texi|   8 ++
 3 files changed, 306 insertions(+), 30 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 9567774..5bb1de7 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -10,68 +10,68 @@ STEXI
 ETEXI
 
 DEF("check", img_check,
-"check [-q] [-f fmt] [--output=ofmt] [-r [leaks | all]] [-T src_cache] 
filename")
+"check [-q] [--object objectdef] [-f fmt] [--output=ofmt] [-r [leaks | 
all]] [-T src_cache] filename")
 STEXI
-@item check [-q] [-f @var{fmt}] [--output=@var{ofmt}] [-r [leaks | all]] [-T 
@var{src_cache}] @var{filename}
+@item check [--object objectdef] [-q] [-f @var{fmt}] [--output=@var{ofmt}] [-r 
[leaks | all]] [-T @var{src_cache}] @var{filename}
 ETEXI
 
 DEF("create", img_create,
-"create [-q] [-f fmt] [-o options] filename [size]")
+"create [-q] [--object objectdef] [-f fmt] [-o options] filename [size]")
 STEXI
-@item create [-q] [-f @var{fmt}] [-o @var{options}] @var{filename} [@var{size}]
+@item create [--object objectdef] [-q] [-f @var{fmt}] [-o @var{options}] 
@var{filename} [@var{size}]
 ETEXI
 
 DEF("commit", img_commit,
-"commit [-q] [-f fmt] [-t cache] [-b base] [-d] [-p] filename")
+"commit [-q] [--object objectdef] [-f fmt] [-t cache] [-b base] [-d] [-p] 
filename")
 STEXI
-@item commit [-q] [-f @var{fmt}] [-t @var{cache}] [-b @var{base}] [-d] [-p] 
@var{filename}
+@item commit [--object objectdef] [-q] [-f @var{fmt}] [-t @var{cache}] [-b 
@var{base}] [-d] [-p] @var{filename}
 ETEXI
 
 DEF("compare", img_compare,
-"compare [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] [-s] filename1 
filename2")
+"compare [--object objectdef] [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] 
[-s] filename1 filename2")
 STEXI
-@item compare [-f @var{fmt}] [-F @var{fmt}] [-T @var{src_cache}] [-p] [-q] 
[-s] @var{filename1} @var{filename2}
+@item compare [--object objectdef] [-f @var{fmt}] [-F @var{fmt}] [-T 
@var{src_cache}] [-p] [-q] [-s] @var{filename1} @var{filename2}
 ETEXI
 
 DEF("convert", img_convert,
-"convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O 
output_fmt] [-o options] [-s snapshot_id_or_name] [-l snapshot_param] [-S 
sparse_size] filename [filename2 [...]] output_filename")
+"convert [--object objectdef] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T 
src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] [-l 
snapshot_param] [-S sparse_size] filename [filename2 [...]] output_filename")
 STEXI
-@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-T 
@var{src_cache}] [-O @var{output_fmt}] [-o @var{options}] [-s 
@var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S @var{sparse_size}] 
@var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [--object objectdef] [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t 
@var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-o @var{options}] [-s 
@var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S @var{sparse_size}] 
@var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("info", img_info,
-"info [-f fmt] [--output=ofmt] [--backing-chain] filename")
+"info [--object objectdef] [-f fmt] [--output=ofmt] [--backing-chain] 
filename")
 STEXI
-@item info [-f @var{fmt}] [--output=@var{ofmt}] [--backing-chain] 
@var{filename}
+@item info [--object objectdef] [-f @var{fmt}] [--output=@var{ofmt}] 
[--backing-chain] @var{filename}
 ETEXI
 
 DEF("map", img_map,
-"map [-f fmt] [--output=ofmt] filename")
+"map [--object objectdef] [-f fmt] [--output=ofmt] filename")
 STEXI
-@item map [-f @var{fmt}] [--output=@var{ofmt}] @var{filename}
+@item map [--object objectdef] [-f @var{fmt}] [--output=@var{ofmt}] 
@var{filename}
 ETEXI
 
 DEF("snapshot", img_snapshot,
-"snapshot [-q] [-l | -a snapshot | -c snapshot | -d snapshot] filename")
+"snapshot [--object objectdef] [-q] [-l | -a snapshot | -c snapshot | -d 
snapshot] filename")
 STEXI
-@item snapshot [-q] [-l | -a @var{snapshot} | -c @var{snapshot} | -d 
@var{snapshot}] @var{filename}
+@item snapshot [--object objectdef] [-q] [-l | -a @var{snapshot} | -c 
@var{snapshot} | -d @var{snapshot}] @var{filename}
 ETEXI
 
 DEF("rebase", img_rebase,
-"rebase [-q] [-f fmt] [-t cache] [-T src_cache] [-p] [-u] -b backing_file 
[-F backing_fmt] filename")
+"rebase [--object objectdef] [-q] [-f fmt] [-t cache] [-T src_cache] [-p] 
[-u] -b backing_file [-F backing_fmt] filename")
 STEXI
-@item rebase [-q] [-f @var{fmt}] [-t @var{cache}] [-T @v

[Qemu-block] [PATCH v4 01/10] qom: add helpers for UserCreatable object types

2016-01-26 Thread Daniel P. Berrange
The QMP monitor code has two helper methods object_add
and qmp_object_del that are called from several places
in the code (QMP, HMP and main emulator startup).

The HMP and main emulator startup code also share
further logic that extracts the qom-type & id
values from a qdict.

We soon need to use this logic from qemu-img, qemu-io
and qemu-nbd too, but don't want those to depend on
the monitor, nor do we want to duplicate the code.

To avoid this, move some code out of qmp.c and hmp.c
adding 3 new methods to qom/object_interfaces.c

 - user_creatable_add - takes a QDict holding a full
   object definition & instantiates it
 - user_creatable_add_type - takes an ID, type name,
   and QDict holding object properties & instantiates
   it
 - user_creatable_del - takes an ID and deletes the
   corresponding object

The existing code is updated to use these new methods.

Signed-off-by: Daniel P. Berrange 
---
 hmp.c   |  52 ---
 include/monitor/monitor.h   |   3 -
 include/qom/object_interfaces.h |  48 ++
 qmp.c   |  76 ++
 qom/object_interfaces.c | 139 
 vl.c|  48 --
 6 files changed, 216 insertions(+), 150 deletions(-)

diff --git a/hmp.c b/hmp.c
index 54f2620..95930b0 100644
--- a/hmp.c
+++ b/hmp.c
@@ -29,6 +29,7 @@
 #include "qapi/string-output-visitor.h"
 #include "qapi/util.h"
 #include "qapi-visit.h"
+#include "qom/object_interfaces.h"
 #include "ui/console.h"
 #include "block/qapi.h"
 #include "qemu-io.h"
@@ -1652,58 +1653,27 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict)
 void hmp_object_add(Monitor *mon, const QDict *qdict)
 {
 Error *err = NULL;
-Error *err_end = NULL;
 QemuOpts *opts;
-char *type = NULL;
-char *id = NULL;
-void *dummy = NULL;
 OptsVisitor *ov;
-QDict *pdict;
+Object *obj = NULL;
 
 opts = qemu_opts_from_qdict(qemu_find_opts("object"), qdict, &err);
 if (err) {
-goto out;
+hmp_handle_error(mon, &err);
+return;
 }
 
 ov = opts_visitor_new(opts);
-pdict = qdict_clone_shallow(qdict);
-
-visit_start_struct(opts_get_visitor(ov), &dummy, NULL, NULL, 0, &err);
-if (err) {
-goto out_clean;
-}
-
-qdict_del(pdict, "qom-type");
-visit_type_str(opts_get_visitor(ov), &type, "qom-type", &err);
-if (err) {
-goto out_end;
-}
+obj = user_creatable_add(qdict, opts_get_visitor(ov), &err);
+opts_visitor_cleanup(ov);
+qemu_opts_del(opts);
 
-qdict_del(pdict, "id");
-visit_type_str(opts_get_visitor(ov), &id, "id", &err);
 if (err) {
-goto out_end;
+hmp_handle_error(mon, &err);
 }
-
-object_add(type, id, pdict, opts_get_visitor(ov), &err);
-
-out_end:
-visit_end_struct(opts_get_visitor(ov), &err_end);
-if (!err && err_end) {
-qmp_object_del(id, NULL);
+if (obj) {
+object_unref(obj);
 }
-error_propagate(&err, err_end);
-out_clean:
-opts_visitor_cleanup(ov);
-
-QDECREF(pdict);
-qemu_opts_del(opts);
-g_free(id);
-g_free(type);
-g_free(dummy);
-
-out:
-hmp_handle_error(mon, &err);
 }
 
 void hmp_getfd(Monitor *mon, const QDict *qdict)
@@ -1933,7 +1903,7 @@ void hmp_object_del(Monitor *mon, const QDict *qdict)
 const char *id = qdict_get_str(qdict, "id");
 Error *err = NULL;
 
-qmp_object_del(id, &err);
+user_creatable_del(id, &err);
 hmp_handle_error(mon, &err);
 }
 
diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 91b95ae..aa0f373 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -43,9 +43,6 @@ void monitor_read_command(Monitor *mon, int show_prompt);
 int monitor_read_password(Monitor *mon, ReadLineFunc *readline_func,
   void *opaque);
 
-void object_add(const char *type, const char *id, const QDict *qdict,
-Visitor *v, Error **errp);
-
 AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
 bool has_opaque, const char *opaque,
 Error **errp);
diff --git a/include/qom/object_interfaces.h b/include/qom/object_interfaces.h
index 283ae0d..7bbaf2f 100644
--- a/include/qom/object_interfaces.h
+++ b/include/qom/object_interfaces.h
@@ -2,6 +2,8 @@
 #define OBJECT_INTERFACES_H
 
 #include "qom/object.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/visitor.h"
 
 #define TYPE_USER_CREATABLE "user-creatable"
 
@@ -72,4 +74,50 @@ void user_creatable_complete(Object *obj, Error **errp);
  * from implements USER_CREATABLE interface.
  */
 bool user_creatable_can_be_deleted(UserCreatable *uc, Error **errp);
+
+/**
+ * user_creatable_add:
+ * @qdict: the object definition
+ * @v: the visitor
+ * @errp: if an error occurs, a pointer to an area to store the error
+ *
+ * Create an instance of the user creatable ob

[Qemu-block] [PATCH v4 00/10] Make qemu-img/qemu-nbd/qemu-io CLI more flexible

2016-01-26 Thread Daniel P. Berrange
This series of patches expands the syntax of the qemu-img,
qemu-nbd and qemu-io commands to make them more flexible.

  v0: https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg04365.html
  v1: https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg04014.html
  v2: https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg04354.html
  v3: https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg03381.html

First all three gain a --object parameter, which allows
instantiation of user creatable object types. The immediate
use case is to allow for creation of the 'secret' object
type to pass passwords for curl, iscsi and rbd drivers.
For qemu-nbd this will also be needed to create TLS
certificates for encryption support.

Then all three gain a '--image-opts' parameter which causes
the positional filenames to be interepreted as option strings
rather tha nplain filenames. This avoids the need to use the
JSON syntax, or to add custom CLI args for each block backend
option that exists. The immediate use case is to allow the
user to specify the ID of the 'secret' object they just created.

Finally, there are a few small cleanup patches

The first 4 patches in this series are a pre-requisite for
3 other series

 - Support for TLS in NBD
 - Support for secrets for passwd auth in curl, rbd, iscsi
   (fixes a CVE issue in libvirt)
 - Support for LUKS encryption passwords

Hopefully the --object patches are fairly uncontroversial
and can be merged soon. The latter patches for --image-opts
are very nice to have, but not a hard blocker right now
since the 'json:{}' syntax can be used until they are
merged.

Changed in v4:

 - Fix error reporting when object_create fails

Changed in v3:

 - Rebase to resolve with conflicts against recently
   merged code
 - Remove use of errx()

Changed in v2:

 - Share more common code in qom/object_interfaces.c to
   avoid duplicating so much of 'object_create' in each
   command
 - Remove previously added '--source optstring' parameter
   which replaced the positional filenames, in favour of
   keeping the positional filenames but using a --image-opts
   boolean arg to change their interpretation
 - Added docs for --image-opts to qemu-img man page
 - Use printf instead of echo -n in examples
 - Line wrap help string based on user terminal width not
   source code width
 - Update qemu-nbd/qemu-io to use constants for options
 - Update qemu-nbd to avoid overlapping option values



Daniel P. Berrange (10):
  qom: add helpers for UserCreatable object types
  qemu-img: add support for --object command line arg
  qemu-nbd: add support for --object command line arg
  qemu-io: add support for --object command line arg
  qemu-io: allow specifying image as a set of options args
  qemu-nbd: allow specifying image as a set of options args
  qemu-img: allow specifying image as a set of options args
  qemu-nbd: don't overlap long option values with short options
  qemu-nbd: use no_argument/required_argument constants
  qemu-io: use no_argument/required_argument constants

 hmp.c   |  52 +---
 include/monitor/monitor.h   |   3 -
 include/qom/object_interfaces.h |  48 
 qemu-img-cmds.hx|  44 +--
 qemu-img.c  | 588 +---
 qemu-img.texi   |  14 +
 qemu-io.c   | 114 +++-
 qemu-nbd.c  | 150 --
 qemu-nbd.texi   |   6 +
 qmp.c   |  76 +-
 qom/object_interfaces.c | 139 ++
 vl.c|  48 +---
 12 files changed, 1029 insertions(+), 253 deletions(-)

-- 
2.5.0




[Qemu-block] [PATCH v4 03/10] qemu-nbd: add support for --object command line arg

2016-01-26 Thread Daniel P. Berrange
Allow creation of user creatable object types with qemu-nbd
via a new --object command line arg. This will be used to supply
passwords and/or encryption keys to the various block driver
backends via the recently added 'secret' object type.

 # printf letmein > mypasswd.txt
 # qemu-nbd --object secret,id=sec0,file=mypasswd.txt \
  ...other nbd args...

Signed-off-by: Daniel P. Berrange 
---
 qemu-nbd.c| 54 ++
 qemu-nbd.texi |  6 ++
 2 files changed, 60 insertions(+)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index ede4a54..8e5d36c 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -23,9 +23,12 @@
 #include "qemu/main-loop.h"
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
+#include "qemu/config-file.h"
 #include "block/snapshot.h"
 #include "qapi/util.h"
 #include "qapi/qmp/qstring.h"
+#include "qapi/opts-visitor.h"
+#include "qom/object_interfaces.h"
 
 #include 
 #include 
@@ -44,6 +47,7 @@
 #define QEMU_NBD_OPT_AIO   2
 #define QEMU_NBD_OPT_DISCARD   3
 #define QEMU_NBD_OPT_DETECT_ZEROES 4
+#define QEMU_NBD_OPT_OBJECT5
 
 static NBDExport *exp;
 static int verbose;
@@ -77,6 +81,9 @@ static void usage(const char *name)
 "  -o, --offset=OFFSET   offset into the image\n"
 "  -P, --partition=NUM   only expose partition NUM\n"
 "\n"
+"General purpose options:\n"
+"  --object type,id=ID,...   define an object such as 'secret' for providing\n"
+"passwords and/or encryption keys\n"
 #ifdef __linux__
 "Kernel NBD client support:\n"
 "  -c, --connect=DEV connect FILE to the local NBD device DEV\n"
@@ -374,6 +381,35 @@ static SocketAddress *nbd_build_socket_address(const char 
*sockpath,
 }
 
 
+static QemuOptsList qemu_object_opts = {
+.name = "object",
+.implied_opt_name = "qom-type",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head),
+.desc = {
+{ }
+},
+};
+
+static int object_create(void *opaque, QemuOpts *opts, Error **errp)
+{
+Error *err = NULL;
+OptsVisitor *ov;
+QDict *pdict;
+
+ov = opts_visitor_new(opts);
+pdict = qemu_opts_to_qdict(opts, NULL);
+
+user_creatable_add(pdict, opts_get_visitor(ov), &err);
+opts_visitor_cleanup(ov);
+QDECREF(pdict);
+
+if (err) {
+error_propagate(errp, err);
+return -1;
+}
+return 0;
+}
+
 int main(int argc, char **argv)
 {
 BlockBackend *blk;
@@ -411,6 +447,7 @@ int main(int argc, char **argv)
 { "format", 1, NULL, 'f' },
 { "persistent", 0, NULL, 't' },
 { "verbose", 0, NULL, 'v' },
+{ "object", 1, NULL, QEMU_NBD_OPT_OBJECT },
 { NULL, 0, NULL, 0 }
 };
 int ch;
@@ -428,6 +465,7 @@ int main(int argc, char **argv)
 Error *local_err = NULL;
 BlockdevDetectZeroesOptions detect_zeroes = 
BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
 QDict *options = NULL;
+QemuOpts *opts;
 
 /* The client thread uses SIGTERM to interrupt the server.  A signal
  * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
@@ -436,6 +474,8 @@ int main(int argc, char **argv)
 memset(&sa_sigterm, 0, sizeof(sa_sigterm));
 sa_sigterm.sa_handler = termsig_handler;
 sigaction(SIGTERM, &sa_sigterm, NULL);
+module_call_init(MODULE_INIT_QOM);
+qemu_add_opts(&qemu_object_opts);
 qemu_init_exec_dir(argv[0]);
 
 while ((ch = getopt_long(argc, argv, sopt, lopt, &opt_ind)) != -1) {
@@ -588,6 +628,13 @@ int main(int argc, char **argv)
 usage(argv[0]);
 exit(0);
 break;
+case QEMU_NBD_OPT_OBJECT:
+opts = qemu_opts_parse_noisily(qemu_find_opts("object"),
+   optarg, true);
+if (!opts) {
+exit(1);
+}
+break;
 case '?':
 error_report("Try `%s --help' for more information.", argv[0]);
 exit(EXIT_FAILURE);
@@ -600,6 +647,13 @@ int main(int argc, char **argv)
 exit(EXIT_FAILURE);
 }
 
+if (qemu_opts_foreach(qemu_find_opts("object"),
+  object_create,
+  NULL, &local_err)) {
+error_report_err(local_err);
+exit(1);
+}
+
 if (disconnect) {
 fd = open(argv[optind], O_RDWR);
 if (fd < 0) {
diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 46fd483..9f9daca 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -14,6 +14,12 @@ Export QEMU disk image using NBD protocol.
 @table @option
 @item @var{filename}
  is a disk image filename
+@item --object type,id=@var{id},...props...
+  define a new instance of the @var{type} object class identified by @var{id}.
+  See the @code{qemu(1)} manual page for full details of the properties
+  supported. The common object type that it makes sense to define is the
+  @code{secret} object, which is used to supply passwords and/or encryption
+  keys.
 @item -p, --port=@var{port}
   port to listen o

Re: [Qemu-block] [PATCH v4 00/14] Implement TLS support to QEMU NBD server & client

2016-01-26 Thread Paolo Bonzini


On 21/01/2016 17:37, Daniel P. Berrange wrote:
> This is an update of the series previously posted:
> 
>   v1: https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg06126.html
>   v2: https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg01580.html
>   v3: https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg03440.html
> 
> This series of patches implements support for TLS in the QEMU NBD
> server and client code.
> 
> It is implementing the NBD_OPT_STARTTLS option that was previously
> discussed here:
> 
>   https://www.redhat.com/archives/libvir-list/2014-October/msg00506.html
> 
> And is also described in the NBD spec here:
> 
>   https://github.com/yoe/nbd/blob/master/doc/proto.md
> 
> To ensure that clients always get a suitable error message from the
> NBD server when it is configured with TLS, a client speaking the
> new style protocol will always send NBD_OPT_LIST as the first thing
> it does, so that we can see the NBD_REP_ERR_TLS_REQD response. This
> should all be backwards & forwards compatible with previous QEMU
> impls of NBD
> 
> Usage of TLS is described in the commit messages for each patch,
> but for sake of people who don't want to explore the series, here's
> the summary
> 
> Starting QEMU system emulator with a disk backed by an TLS encrypted
> NBD export
> 
>   $ qemu-system-x86_64 \
>  -object 
> tls-creds-x509,id=tls0,endpoint=client,dir=/home/berrange/security/qemutls \
>  -drive driver=nbd,host=localhost,port=9000,tls-creds=tls0
> 
> Starting a standalone NBD server providing a TLS encrypted NBD export
> 
>   $ qemu-nbd \
>  --object 
> tls-creds-x509,id=tls0,endpoint=server,dir=/home/berrange/security/qemutls
>  --tls-creds tls0 \
>  --export-name default \
>  $IMAGEFILE
> 
> The --export-name is optional, if omitted, the default "" will
> be used.
> 
> Starting a QEMU system emulator built-in NBD server
> 
>   $ qemu-system-x86_64 \
>  -qmp unix:/tmp/qmp,server \
>  -hda /home/berrange/Fedora-Server-netinst-x86_64-23.iso \
>  -object 
> tls-creds-x509,id=tls0,dir=/home/berrange/security/qemutls,endpoint=server
> 
>   $ qmp-shell /tmp/qmp
> (qmp) nbd-server-start addr={"host":"localhost","port":"9000"} 
> tls-creds=tls0
> (qmp) nbd-server-add device=ide0-hd0
> 
> This series depends on this bug fix I recently sent:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg03406.html
> 
> And the qemu-nbd/etc command line options work
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg03381.html
> 
> The first 4 patches are the conversion to the I/O channels
> framework.
> 
> The next 6 patches are general tweaks to QEMU's impl of the
> NBD protocol for better compliance and/or future proofing.
> 
> The next patch provides the NBD protocol TLS implementation.
> 
> The final 3 patches allow TLS to be enabled in the QEMU NBD
> client and servers.
> 
> Changed in v4:
> 
>  - Don't pick the first export name in the list if no export
>name is provided (Paolo)
>  - Set client requested export name to "" if none is provided
>by the user (Paolo)
>  - Set server advertized export name to "" if TLS is enabled
>and none is provided by the user (Paolo)
>  - Rename qemu-nbd --exportname to --export-name (Paolo)
>  - Use iov_discard_front() to simplify iov handling (Paolo)
> 
> Changed in v3:
> 
>  - Rebase to resolve conflicts with recently merged NBD patches
> 
> Changed in v2:
> 
>  - Fix error codes used during NBD TLS option negotiate
>  - Update patch with helpers for UserCreatable object types
> 
> Daniel P. Berrange (14):
>   nbd: convert block client to use I/O channels for connection setup
>   nbd: convert qemu-nbd server to use I/O channels for connection setup
>   nbd: convert blockdev NBD server to use I/O channels for connection
> setup
>   nbd: convert to using I/O channels for actual socket I/O
>   nbd: invert client logic for negotiating protocol version
>   nbd: make server compliant with fixed newstyle spec
>   nbd: make client request fixed new style if advertized
>   nbd: allow setting of an export name for qemu-nbd server
>   nbd: always query export list in fixed new style protocol
>   nbd: use "" as a default export name if none provided
>   nbd: implement TLS support in the protocol negotiation
>   nbd: enable use of TLS with NBD block driver
>   nbd: enable use of TLS with qemu-nbd server
>   nbd: enable use of TLS with nbd-server-start command
> 
>  Makefile|   6 +-
>  block/nbd-client.c  |  91 +++
>  block/nbd-client.h  |  10 +-
>  block/nbd.c | 105 ++---
>  blockdev-nbd.c  | 131 +---
>  hmp.c   |   2 +-
>  include/block/nbd.h |  28 +++-
>  nbd/client.c| 440 
> +---
>  nbd/common.c|  83 ++
>  nbd/nbd-internal.h  |  32 ++--
>  nbd/server.c| 334 ---
>  qapi/block.json |   4 +-
>  qemu-nbd.c 

[Qemu-block] [RFC PATCH 16/16] iotests: Add persistent bitmap test case 141

2016-01-26 Thread Fam Zheng
For now it merely invokes block-dirty-bitmap-{add,set-persistent}.
Verification of the bitmap data and user data to be added in the future.

Signed-off-by: Fam Zheng 
---
 tests/qemu-iotests/141 | 62 ++
 tests/qemu-iotests/141.out |  5 
 tests/qemu-iotests/group   |  1 +
 3 files changed, 68 insertions(+)
 create mode 100644 tests/qemu-iotests/141
 create mode 100644 tests/qemu-iotests/141.out

diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
new file mode 100644
index 000..434c7ce
--- /dev/null
+++ b/tests/qemu-iotests/141
@@ -0,0 +1,62 @@
+#!/usr/bin/env python
+#
+# Tests for persistent dirty bitmap
+#
+# Copyright (C) 2016 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import iotests
+from iotests import qemu_img, qemu_io
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+
+class TestPersistentDirtyBitmap(iotests.QMPTestCase):
+image_len = 64 * 1024 * 1024 # MB
+def setUp(self):
+# Write data to the image so we can compare later
+qemu_img('create', '-f', iotests.imgfmt, test_img, str(self.image_len))
+self.vm = iotests.VM().add_drive(test_img)
+self.vm.launch()
+
+def tearDown(self):
+self.vm.shutdown()
+os.remove(test_img)
+
+def do_test_create(self, n):
+def make_range(k):
+return (k * 65536, 512)
+r = range(n)
+for i in r:
+result = self.vm.qmp('block-dirty-bitmap-add', node='drive0',
+name='bitmap-%d' % i,
+persistent=True)
+self.assert_qmp(result, 'return', {})
+self.vm.hmp_qemu_io('drive0', 'write -P %d %d %d' % ((i % 255,) + 
make_range(i)))
+for i in r:
+result = self.vm.qmp('block-dirty-bitmap-set-persistent',
+ node='drive0', name='bitmap-%d' % i,
+ persistent=False)
+self.assert_qmp(result, 'return', {})
+
+def test_simple_one(self):
+self.do_test_create(1)
+
+def test_simple_multiple(self):
+self.do_test_create(10)
+
+if __name__ == '__main__':
+iotests.main(supported_fmts=['qbm'])
diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
new file mode 100644
index 000..fbc63e6
--- /dev/null
+++ b/tests/qemu-iotests/141.out
@@ -0,0 +1,5 @@
+..
+--
+Ran 2 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index e220a00..877bdbb 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -142,4 +142,5 @@
 138 rw auto quick
 139 rw auto quick
 140 rw auto quick
+141 rw auto quick
 142 auto
-- 
2.4.3




[Qemu-block] [RFC PATCH 14/16] iotests: Add qbm to applicable test cases

2016-01-26 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 tests/qemu-iotests/004 | 2 +-
 tests/qemu-iotests/017 | 2 +-
 tests/qemu-iotests/018 | 2 +-
 tests/qemu-iotests/019 | 2 +-
 tests/qemu-iotests/020 | 2 +-
 tests/qemu-iotests/024 | 2 +-
 tests/qemu-iotests/025 | 2 +-
 tests/qemu-iotests/027 | 2 +-
 tests/qemu-iotests/028 | 2 +-
 tests/qemu-iotests/030 | 2 +-
 tests/qemu-iotests/034 | 2 +-
 tests/qemu-iotests/037 | 2 +-
 tests/qemu-iotests/038 | 2 +-
 tests/qemu-iotests/040 | 2 +-
 tests/qemu-iotests/050 | 2 +-
 tests/qemu-iotests/055 | 2 +-
 tests/qemu-iotests/056 | 2 +-
 tests/qemu-iotests/069 | 2 +-
 tests/qemu-iotests/072 | 2 +-
 tests/qemu-iotests/086 | 2 +-
 tests/qemu-iotests/096 | 2 +-
 tests/qemu-iotests/099 | 2 +-
 tests/qemu-iotests/110 | 2 +-
 tests/qemu-iotests/129 | 2 +-
 tests/qemu-iotests/132 | 2 +-
 tests/qemu-iotests/139 | 2 +-
 26 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/tests/qemu-iotests/004 b/tests/qemu-iotests/004
index 2ad77ed..a67882e 100755
--- a/tests/qemu-iotests/004
+++ b/tests/qemu-iotests/004
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt raw qcow qcow2 qed vdi vmdk vhdx
+_supported_fmt raw qcow qcow2 qed vdi vmdk vhdx qbm
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/017 b/tests/qemu-iotests/017
index 3af3cdf..220d26f 100755
--- a/tests/qemu-iotests/017
+++ b/tests/qemu-iotests/017
@@ -40,7 +40,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed qbm
 _supported_proto generic
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"
diff --git a/tests/qemu-iotests/018 b/tests/qemu-iotests/018
index 07b2de9..185c617 100755
--- a/tests/qemu-iotests/018
+++ b/tests/qemu-iotests/018
@@ -40,7 +40,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed qbm
 _supported_proto file
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"
diff --git a/tests/qemu-iotests/019 b/tests/qemu-iotests/019
index 0937b5c..6354a7d 100755
--- a/tests/qemu-iotests/019
+++ b/tests/qemu-iotests/019
@@ -44,7 +44,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed qbm
 _supported_proto file
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" \
diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index 6625b55..187739b 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -42,7 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed qbm
 _supported_proto generic
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" \
diff --git a/tests/qemu-iotests/024 b/tests/qemu-iotests/024
index 2c2d148..844bb11 100755
--- a/tests/qemu-iotests/024
+++ b/tests/qemu-iotests/024
@@ -42,7 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Currently only qcow2 and qed support rebasing
-_supported_fmt qcow2 qed
+_supported_fmt qcow2 qed qbm
 _supported_proto file
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/025 b/tests/qemu-iotests/025
index 467a4b7..6a7e592 100755
--- a/tests/qemu-iotests/025
+++ b/tests/qemu-iotests/025
@@ -39,7 +39,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.filter
 . ./common.pattern
 
-_supported_fmt raw qcow2 qed
+_supported_fmt raw qcow2 qed qbm
 _supported_proto file sheepdog rbd nfs archipelago
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/027 b/tests/qemu-iotests/027
index 3fa81b8..be97963 100755
--- a/tests/qemu-iotests/027
+++ b/tests/qemu-iotests/027
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt vmdk qcow qcow2 qed
+_supported_fmt vmdk qcow qcow2 qed qbm
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
index 4909b9b..43d3f0a 100755
--- a/tests/qemu-iotests/028
+++ b/tests/qemu-iotests/028
@@ -46,7 +46,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # Any format supporting backing files except vmdk and qcow which do not support
 # smaller backing files.
-_supported_fmt qcow2 qed
+_supported_fmt qcow2 qed qbm
 _supported_proto file
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 32469ef..fe5ad4a 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -467,4 +467,4 @@ class TestSetSpeed(iotests.QMPTestCase):
 self.cancel_and_wait()
 
 if __name__ == '__main__':
-iotests.main(supported_fmts=['qcow2', 'qed'])
+

[Qemu-block] [RFC PATCH 13/16] iotests: Add qbm to case 097

2016-01-26 Thread Fam Zheng
The output of "qemu-img map" will be slightly different for qbm because
the data image paths are not $TEST_IMG, but the pattern is predicatable
enough so we can just filter it out.

Signed-off-by: Fam Zheng 
---
 tests/qemu-iotests/095 | 2 +-
 tests/qemu-iotests/097 | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/095 b/tests/qemu-iotests/095
index dad04b9..2f68953 100755
--- a/tests/qemu-iotests/095
+++ b/tests/qemu-iotests/095
@@ -43,7 +43,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.filter
 . ./common.qemu
 
-_supported_fmt qcow2
+_supported_fmt qcow2 qbm
 _supported_proto file
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/097 b/tests/qemu-iotests/097
index c7a613b..2252d62 100755
--- a/tests/qemu-iotests/097
+++ b/tests/qemu-iotests/097
@@ -42,7 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files and bdrv_make_empty
-_supported_fmt qcow qcow2
+_supported_fmt qcow qcow2 qbm
 _supported_proto file
 _supported_os Linux
 
@@ -109,9 +109,11 @@ else
 # Both top and intermediate should be unchanged
 fi
 
+{
 $QEMU_IMG map "$TEST_IMG.base" | _filter_qemu_img_map
 $QEMU_IMG map "$TEST_IMG.itmd" | _filter_qemu_img_map
 $QEMU_IMG map "$TEST_IMG" | _filter_qemu_img_map
+} | sed -e 's/.data.img//'
 
 done
 
-- 
2.4.3




[Qemu-block] [RFC PATCH 09/16] qmp: Add block-dirty-bitmap-set-persistent

2016-01-26 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 blockdev.c   | 20 
 qapi/block-core.json | 22 ++
 qmp-commands.hx  | 31 +++
 3 files changed, 73 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 08236f2..a9d6617 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2699,6 +2699,9 @@ void qmp_block_dirty_bitmap_remove(const char *node, 
const char *name,
name);
 goto out;
 }
+if (bdrv_dirty_bitmap_set_persistent(bs, bitmap, false, false, errp)) {
+goto out;
+}
 bdrv_dirty_bitmap_make_anon(bitmap);
 bdrv_release_dirty_bitmap(bs, bitmap);
 
@@ -2740,6 +2743,23 @@ void qmp_block_dirty_bitmap_clear(const char *node, 
const char *name,
 aio_context_release(aio_context);
 }
 
+void qmp_block_dirty_bitmap_set_persistent(const char *node, const char *name,
+   bool persistent, Error **errp)
+{
+AioContext *aio_context;
+BdrvDirtyBitmap *bitmap;
+BlockDriverState *bs;
+
+bitmap = block_dirty_bitmap_lookup(node, name, &bs, &aio_context, errp);
+if (!bitmap || !bs) {
+return;
+}
+
+bdrv_dirty_bitmap_set_persistent(bs, bitmap, persistent, false, errp);
+
+aio_context_release(aio_context);
+}
+
 void hmp_drive_del(Monitor *mon, const QDict *qdict)
 {
 const char *id = qdict_get_str(qdict, "id");
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0ac107c..52689ed 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1263,6 +1263,28 @@
 '*on-target-error': 'BlockdevOnError' } }
 
 ##
+# @block-dirty-bitmap-set-persistent
+#
+# Update a dirty bitmap's persistent state on the device
+#
+# @node: name of device/node which the bitmap is tracking
+#
+# @name: name of the dirty bitmap
+#
+# @persistent: #optinal whether to make the bitmap persistent, default is false
+#
+# Returns: nothing on success
+#  If @node is not a valid block device, DeviceNotFound
+#  If @name is not found, GenericError with an explanation
+#  If an error happens when setting the persistent state, GenericError
+#  with an explanation
+#
+# Since 2.6
+##
+{ 'command': 'block-dirty-bitmap-set-persistent',
+  'data': { 'node': 'str', 'name': 'str', 'persistent': 'bool' } }
+
+##
 # @block_set_io_throttle:
 #
 # Change I/O throttle limits for a block drive.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index bd4428e..e37cf09 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1458,6 +1458,37 @@ Example:
 EQMP
 
 {
+.name   = "block-dirty-bitmap-set-persistent",
+.args_type  = "node:B,name:s,persistent:b",
+.mhandler.cmd_new = qmp_marshal_block_dirty_bitmap_set_persistent,
+},
+
+SQMP
+
+block-dirty-bitmap-set-persistent
+-
+Since 2.6
+
+Update the persistent state of a dirty bitmap. Format driver support is
+required.
+
+Arguments:
+
+- "node": device/node on which to update the dirty bitmap (json-string)
+- "name": name of the dirty bitmap to update (json-string)
+- "persistent": the state to update to. (json-bool)
+
+Example:
+
+-> { "execute": "block-dirty-bitmap-set-persistent",
+"arguments": { "node": "drive0",
+   "name": "bitmap0",
+   "persistent": true } }
+<- { "return": {} }
+
+EQMP
+
+{
 .name   = "blockdev-snapshot-sync",
 .args_type  = 
"device:s?,node-name:s?,snapshot-file:s,snapshot-node-name:s?,format:s?,mode:s?",
 .mhandler.cmd_new = qmp_marshal_blockdev_snapshot_sync,
-- 
2.4.3




[Qemu-block] [RFC PATCH 11/16] qapi: Add "qbm" as a generic cow format driver

2016-01-26 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 qapi/block-core.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 52689ed..97dc0cd 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1599,7 +1599,7 @@
 { 'enum': 'BlockdevDriver',
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
 'dmg', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
-'http', 'https', 'null-aio', 'null-co', 'parallels',
+'http', 'https', 'null-aio', 'null-co', 'parallels', 'qbm',
 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp', 'vdi', 'vhdx',
 'vmdk', 'vpc', 'vvfat' ] }
 
@@ -2058,6 +2058,7 @@
   'null-aio':   'BlockdevOptionsNull',
   'null-co':'BlockdevOptionsNull',
   'parallels':  'BlockdevOptionsGenericFormat',
+  'qbm':'BlockdevOptionsGenericCOWFormat',
   'qcow2':  'BlockdevOptionsQcow2',
   'qcow':   'BlockdevOptionsGenericCOWFormat',
   'qed':'BlockdevOptionsGenericCOWFormat',
-- 
2.4.3




[Qemu-block] [RFC PATCH 15/16] iotests: Add qbm specific test case 140

2016-01-26 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 tests/qemu-iotests/140 |  80 +
 tests/qemu-iotests/140.out | 145 +
 tests/qemu-iotests/common  |   6 ++
 tests/qemu-iotests/group   |   1 +
 4 files changed, 232 insertions(+)
 create mode 100755 tests/qemu-iotests/140
 create mode 100644 tests/qemu-iotests/140.out

diff --git a/tests/qemu-iotests/140 b/tests/qemu-iotests/140
new file mode 100755
index 000..e5c3c56
--- /dev/null
+++ b/tests/qemu-iotests/140
@@ -0,0 +1,80 @@
+#!/bin/bash
+#
+# General tests for QBM format
+#
+# Copyright (C) 2015 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=f...@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+return
+_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qbm
+_supported_proto file
+_supported_os Linux
+
+size=128M
+
+echo
+echo "=== Create a QBM file with no option ==="
+_make_test_img $size
+_img_info
+cat $TEST_IMG
+
+for n in 0 1 3; do
+echo
+echo "=== Create a QBM file with $n dirty bitmap(s) ==="
+echo
+_make_test_img -o dirty-bitmaps=$n $size
+_img_info
+cat $TEST_IMG
+done
+
+
+$QEMU_IMG map $TEST_IMG
+
+echo
+echo "=== Create a QBM file with raw backing image ==="
+IMGFMT=raw TEST_IMG=$TEST_IMG.base _make_test_img $size
+$QEMU_IO_PROG -f raw $TEST_IMG.base -c "write 0 $size" | _filter_qemu_io
+_make_test_img -o dirty-bitmaps=1 -b $TEST_IMG.base
+cat $TEST_IMG
+_img_info
+
+$QEMU_IO $TEST_IMG -c "write 130560 131072" | _filter_qemu_io
+$QEMU_IMG map $TEST_IMG
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/140.out b/tests/qemu-iotests/140.out
new file mode 100644
index 000..1d9cefb
--- /dev/null
+++ b/tests/qemu-iotests/140.out
@@ -0,0 +1,145 @@
+QA output created by 140
+
+=== Create a QBM file with no option ===
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
+image: TEST_DIR/t.IMGFMT
+file format: IMGFMT
+virtual size: 128M (134217728 bytes)
+cluster_size: 512
+{
+"QBM": {
+"bitmaps": {
+},
+"image": {
+"format": "raw",
+"file": "t-data.img.qbm"
+},
+"version": 1,
+"creator": "QEMU"
+}
+}
+
+=== Create a QBM file with 0 dirty bitmap(s) ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 dirty-bitmaps=0
+image: TEST_DIR/t.IMGFMT
+file format: IMGFMT
+virtual size: 128M (134217728 bytes)
+cluster_size: 512
+{
+"QBM": {
+"bitmaps": {
+},
+"image": {
+"format": "raw",
+"file": "t-data.img.qbm"
+},
+"version": 1,
+"creator": "QEMU"
+}
+}
+
+=== Create a QBM file with 1 dirty bitmap(s) ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 dirty-bitmaps=1
+image: TEST_DIR/t.IMGFMT
+file format: IMGFMT
+virtual size: 128M (134217728 bytes)
+cluster_size: 512
+{
+"QBM": {
+"bitmaps": {
+"dirty.0": {
+"granularity-bytes": 65536,
+"type": "dirty",
+"file": "t-dirty.0.bitmap.qbm"
+}
+},
+"image": {
+"format": "raw",
+"file": "t-data.img.qbm"
+},
+"version": 1,
+"creator": "QEMU"
+}
+}
+
+=== Create a QBM file with 3 dirty bitmap(s) ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 dirty-bitmaps=3
+image: TEST_DIR/t.IMGFMT
+file format: IMGFMT
+virtual size: 128M (134217728 bytes)
+cluster_size: 512
+{
+"QBM": {
+"bitmaps": {
+"dirty.1": {
+"granularity-bytes": 65536,
+"type": "dirty",
+"file": "t-dirty.1.bitmap.qbm"
+},
+"dirty.2": {
+"granularity-bytes": 65536,
+"type": "dirty",
+"file": "t-dirty.2.bitmap.qbm"
+},
+"dirty.0": {
+"granularity-bytes": 65536,
+"type": "dirty",
+"file": "t-dirty.0.bitmap.qbm"
+}
+},
+"image": {
+"format": "raw",
+"file": "t-da

[Qemu-block] [RFC PATCH 07/16] block: Only swap non-persistent dirty bitmaps

2016-01-26 Thread Fam Zheng
Persistent dirty bitmaps are special because they're tightly associated
with, or even belonging to the driver, swapping them doesn't make much
sense. Because this has nothing to do with backward compatibility, it's
okay to just let them stay with the old BDS.

Signed-off-by: Fam Zheng 
---
 block.c  | 11 +--
 block/dirty-bitmap.c | 25 +
 include/block/dirty-bitmap.h |  1 +
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 78db342..3a29de2 100644
--- a/block.c
+++ b/block.c
@@ -2274,9 +2274,6 @@ static void bdrv_move_feature_fields(BlockDriverState 
*bs_dest,
 bs_dest->copy_on_read   = bs_src->copy_on_read;
 
 bs_dest->enable_write_cache = bs_src->enable_write_cache;
-
-/* dirty bitmap */
-bs_dest->dirty_bitmaps  = bs_src->dirty_bitmaps;
 }
 
 static void change_parent_backing_link(BlockDriverState *from,
@@ -2302,10 +2299,12 @@ static void change_parent_backing_link(BlockDriverState 
*from,
 }
 
 static void swap_feature_fields(BlockDriverState *bs_top,
-BlockDriverState *bs_new)
+BlockDriverState *bs_new,
+Error **errp)
 {
 BlockDriverState tmp;
 
+bdrv_dirty_bitmap_swap(bs_top, bs_new);
 bdrv_move_feature_fields(&tmp, bs_top);
 bdrv_move_feature_fields(bs_top, bs_new);
 bdrv_move_feature_fields(bs_new, &tmp);
@@ -2343,7 +2342,7 @@ void bdrv_append(BlockDriverState *bs_new, 
BlockDriverState *bs_top)
 change_parent_backing_link(bs_top, bs_new);
 
 /* Some fields always stay on top of the backing file chain */
-swap_feature_fields(bs_top, bs_new);
+swap_feature_fields(bs_top, bs_new, NULL);
 
 bdrv_set_backing_hd(bs_new, bs_top);
 bdrv_unref(bs_top);
@@ -2368,7 +2367,7 @@ void bdrv_replace_in_backing_chain(BlockDriverState *old, 
BlockDriverState *new)
  * swap instead so that pointers aren't duplicated and cause trouble.
  * (Also, bdrv_swap() used to do the same.) */
 assert(!new->blk);
-swap_feature_fields(old, new);
+swap_feature_fields(old, new, NULL);
 }
 change_parent_backing_link(old, new);
 
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 882a0db..a3a401f 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -65,6 +65,31 @@ BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState 
*bs, const char *name)
 return NULL;
 }
 
+/* Swap non-persistent dirty bitmaps. */
+void bdrv_dirty_bitmap_swap(BlockDriverState *bs1, BlockDriverState *bs2)
+{
+BdrvDirtyBitmap *bm, *next;
+QLIST_HEAD(, BdrvDirtyBitmap) tmp = QLIST_HEAD_INITIALIZER(&tmp);
+
+QLIST_FOREACH_SAFE(bm, &bs1->dirty_bitmaps, list, next) {
+if (bm->persistent) {
+continue;
+}
+QLIST_REMOVE(bm, list);
+QLIST_INSERT_HEAD(&tmp, bm, list);
+}
+QLIST_FOREACH_SAFE(bm, &bs2->dirty_bitmaps, list, next) {
+if (bm->persistent) {
+continue;
+}
+QLIST_REMOVE(bm, list);
+QLIST_INSERT_HEAD(&bs1->dirty_bitmaps, bm, list);
+}
+QLIST_FOREACH_SAFE(bm, &tmp, list, next) {
+QLIST_INSERT_HEAD(&bs2->dirty_bitmaps, bm, list);
+}
+}
+
 void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
 {
 assert(!bdrv_dirty_bitmap_frozen(bitmap));
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 5885720..a4de9c7 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -23,6 +23,7 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState 
*bs,
Error **errp);
 BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs,
 const char *name);
+void bdrv_dirty_bitmap_swap(BlockDriverState *bs1, BlockDriverState *bs2);
 void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap);
 int bdrv_dirty_bitmap_set_persistent(BlockDriverState *bs,
  BdrvDirtyBitmap *bitmap,
-- 
2.4.3




[Qemu-block] [RFC PATCH 12/16] iotests: Add qbm format to 041

2016-01-26 Thread Fam Zheng
Though a number of test cases dosn't apply because of cluster size and
blkdebug limitation, mirroring qbm can be covered by all other cases.

Signed-off-by: Fam Zheng 
---
 tests/qemu-iotests/041 | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index c7da95d..3712aca 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -139,6 +139,8 @@ class TestSingleDrive(iotests.QMPTestCase):
 'target image does not match source after mirroring')
 
 def test_small_buffer2(self):
+if iotests.imgfmt == "qbm":
+return
 self.assert_no_active_block_jobs()
 
 qemu_img('create', '-f', iotests.imgfmt, '-o', 
'cluster_size=%d,size=%d'
@@ -155,6 +157,8 @@ class TestSingleDrive(iotests.QMPTestCase):
 'target image does not match source after mirroring')
 
 def test_large_cluster(self):
+if iotests.imgfmt == "qbm":
+return
 self.assert_no_active_block_jobs()
 
 qemu_img('create', '-f', iotests.imgfmt, '-o', 
'cluster_size=%d,backing_file=%s'
@@ -265,9 +269,9 @@ class TestMirrorNoBacking(iotests.QMPTestCase):
 os.remove(backing_img)
 try:
 os.remove(target_backing_img)
+os.remove(target_img)
 except:
 pass
-os.remove(target_img)
 
 def test_complete(self):
 self.assert_no_active_block_jobs()
@@ -300,6 +304,8 @@ class TestMirrorNoBacking(iotests.QMPTestCase):
 'target image does not match source after mirroring')
 
 def test_large_cluster(self):
+if iotests.imgfmt == "qbm":
+return
 self.assert_no_active_block_jobs()
 
 # qemu-img create fails if the image is not there
@@ -461,6 +467,8 @@ new_state = "1"
 self.vm.shutdown()
 
 def test_large_cluster(self):
+if iotests.imgfmt == "qbm":
+return
 self.assert_no_active_block_jobs()
 
 # Test COW into the target image.  The first half of the
@@ -568,6 +576,8 @@ new_state = "1"
 os.remove(self.blkdebug_file)
 
 def test_report_write(self):
+if iotests.imgfmt == "qbm":
+return
 self.assert_no_active_block_jobs()
 
 result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
@@ -595,6 +605,8 @@ new_state = "1"
 self.vm.shutdown()
 
 def test_ignore_write(self):
+if iotests.imgfmt == "qbm":
+return
 self.assert_no_active_block_jobs()
 
 result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
@@ -612,6 +624,8 @@ new_state = "1"
 self.vm.shutdown()
 
 def test_stop_write(self):
+if iotests.imgfmt == "qbm":
+return
 self.assert_no_active_block_jobs()
 
 result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
@@ -981,4 +995,4 @@ class TestRepairQuorum(iotests.QMPTestCase):
 self.vm.shutdown()
 
 if __name__ == '__main__':
-iotests.main(supported_fmts=['qcow2', 'qed'])
+iotests.main(supported_fmts=['qcow2', 'qed', 'qbm'])
-- 
2.4.3




[Qemu-block] [RFC PATCH 10/16] qbm: Implement format driver

2016-01-26 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 block/Makefile.objs |1 +
 block/qbm.c | 1315 +++
 2 files changed, 1316 insertions(+)
 create mode 100644 block/qbm.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index cdd8655..ba7 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -5,6 +5,7 @@ block-obj-y += qed-check.o
 block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
 block-obj-y += quorum.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
+block-obj-y += qbm.o
 block-obj-y += block-backend.o snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
 block-obj-$(CONFIG_POSIX) += raw-posix.o
diff --git a/block/qbm.c b/block/qbm.c
new file mode 100644
index 000..91e129f
--- /dev/null
+++ b/block/qbm.c
@@ -0,0 +1,1315 @@
+/*
+ * Block driver for the QBM format
+ *
+ * Copyright (c) 2016 Red Hat Inc.
+ *
+ * Authors:
+ * Fam Zheng 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
+#include "qemu/module.h"
+#include "migration/migration.h"
+#include "qapi/qmp/qint.h"
+#include "qapi/qmp/qjson.h"
+
+#define QBM_BUF_SIZE_MAX (32 << 20)
+
+typedef enum QBMBitmapType {
+QBM_TYPE_DIRTY,
+QBM_TYPE_ALLOC,
+} QBMBitmapType;
+
+typedef struct QBMBitmap {
+BdrvDirtyBitmap *bitmap;
+BdrvChild *file;
+char *name;
+QBMBitmapType type;
+} QBMBitmap;
+
+typedef struct BDRVQBMState {
+BdrvChild *image;
+BdrvDirtyBitmap *alloc_bitmap;
+QDict *desc;
+QDict *backing_dict;
+QBMBitmap *bitmaps;
+int num_bitmaps;
+} BDRVQBMState;
+
+static const char *qbm_token_consume(const char *p, const char *token)
+{
+size_t len = strlen(token);
+
+if (!p) {
+return NULL;
+}
+while (*p && (*p == ' ' ||
+  *p == '\t' ||
+  *p == '\n' ||
+  *p == '\r')) {
+p++;
+}
+if (strncmp(p, token, len)) {
+return p + len;
+}
+return NULL;
+}
+
+static int qbm_probe(const uint8_t *buf, int buf_size, const char *filename)
+{
+const char *p;
+p = strstr((const char *)buf, "\"QBM\"");
+if (!p) {
+p = strstr((const char *)buf, "'QBM'");
+}
+if (!p) {
+return 0;
+}
+p = qbm_token_consume(p, ":");
+p = qbm_token_consume(p, "{");
+if (p && *p) {
+return 100;
+}
+return 0;
+}
+
+static void qbm_load_bitmap(BlockDriverState *bs, QBMBitmap *bm, Error **errp)
+{
+int r;
+BDRVQBMState *s = bs->opaque;
+int64_t bitmap_file_size;
+int64_t bitmap_size;
+uint8_t *buf = NULL;
+BlockDriverState *file = bm->file->bs;
+int64_t image_size = bdrv_getlength(s->image->bs);
+
+if (image_size < 0) {
+error_setg(errp, "Cannot get image size: %s", s->image->bs->filename);
+return;
+}
+bitmap_size = bdrv_dirty_bitmap_serialization_size(bm->bitmap, 0,
+bdrv_dirty_bitmap_size(bm->bitmap));
+if (bitmap_size > QBM_BUF_SIZE_MAX) {
+error_setg(errp, "Bitmap too big");
+return;
+}
+bitmap_file_size = bdrv_getlength(file);
+if (bitmap_file_size < bitmap_size) {
+error_setg(errp,
+   "Bitmap \"%s\" file too small "
+   "(expecting at least %ld bytes but got %ld bytes): %s",
+   bm->name, bitmap_size, bitmap_file_size, file->filename);
+goto out;
+}
+buf = qemu_blockalign(file, bitmap_size);
+r = bdrv_pread(file, 0, buf, bitmap_size);
+if (r < 0) {
+error_setg(errp, "Failed to read bitmap file \"%s\"",
+   file->filename);
+goto out;
+}
+bdrv_dirty_bitmap_deserialize_part(bm->bitmap, buf, 0, bs->total_sectors,
+   true);
+
+out:
+g_free(buf);
+}
+
+static int qbm

[Qemu-block] [RFC PATCH 08/16] qmp: Add optional parameter "persistent" in block-dirty-bitmap-add

2016-01-26 Thread Fam Zheng
When omitted it defaults to false with unchanged behavior.

When set to true, the created dirty bitmap is made persistent if supported, it
requires support from the active image format. Otherwise an error is returned.

Signed-off-by: Fam Zheng 
---
 blockdev.c   | 8 +++-
 qapi/block-core.json | 6 +-
 qmp-commands.hx  | 3 ++-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 07cfe25..08236f2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1997,6 +1997,7 @@ static void block_dirty_bitmap_add_prepare(BlkActionState 
*common,
 /* AIO context taken and released within qmp_block_dirty_bitmap_add */
 qmp_block_dirty_bitmap_add(action->node, action->name,
action->has_granularity, action->granularity,
+   action->has_persistent, action->persistent,
&local_err);
 
 if (!local_err) {
@@ -2640,10 +2641,12 @@ out:
 
 void qmp_block_dirty_bitmap_add(const char *node, const char *name,
 bool has_granularity, uint32_t granularity,
+bool has_persistent, bool persistent,
 Error **errp)
 {
 AioContext *aio_context;
 BlockDriverState *bs;
+BdrvDirtyBitmap *bitmap;
 
 if (!name || name[0] == '\0') {
 error_setg(errp, "Bitmap name cannot be empty");
@@ -2669,7 +2672,10 @@ void qmp_block_dirty_bitmap_add(const char *node, const 
char *name,
 granularity = bdrv_get_default_bitmap_granularity(bs);
 }
 
-bdrv_create_dirty_bitmap(bs, granularity, name, errp);
+bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
+if (bitmap && has_persistent && persistent) {
+bdrv_dirty_bitmap_set_persistent(bs, bitmap, true, false, errp);
+}
 
  out:
 aio_context_release(aio_context);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 30c2e5f..0ac107c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1162,10 +1162,14 @@
 # @granularity: #optional the bitmap granularity, default is 64k for
 #   block-dirty-bitmap-add
 #
+# @persistent: #optinal whether to make the bitmap persistent, default is 
false.
+#  (Since 2.6)
+#
 # Since 2.4
 ##
 { 'struct': 'BlockDirtyBitmapAdd',
-  'data': { 'node': 'str', 'name': 'str', '*granularity': 'uint32' } }
+  'data': { 'node': 'str', 'name': 'str', '*granularity': 'uint32',
+'*persistent': 'bool' } }
 
 ##
 # @block-dirty-bitmap-add
diff --git a/qmp-commands.hx b/qmp-commands.hx
index db072a6..bd4428e 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1373,7 +1373,7 @@ EQMP
 
 {
 .name   = "block-dirty-bitmap-add",
-.args_type  = "node:B,name:s,granularity:i?",
+.args_type  = "node:B,name:s,granularity:i?,persistent:b?",
 .mhandler.cmd_new = qmp_marshal_block_dirty_bitmap_add,
 },
 
@@ -1390,6 +1390,7 @@ Arguments:
 - "node": device/node on which to create dirty bitmap (json-string)
 - "name": name of the new dirty bitmap (json-string)
 - "granularity": granularity to track writes with (int, optional)
+- "persistent": whether the bitmap is persistent (bool, optional, default to 
no)
 
 Example:
 
-- 
2.4.3




[Qemu-block] [RFC PATCH 06/16] block: Introduce bdrv_dirty_bitmap_set_persistent

2016-01-26 Thread Fam Zheng
By implementing bdrv_dirty_bitmap_set_persistent, a driver can support
the persistent dirty bitmap feature.

Once a dirty bitmap is made persistent, the driver is responsible for saving
the dirty bitmap when appropriate, for example before close; if a persistent
bitmap is removed or made non-persistent, .bdrv_dirty_bitmap_set_persistent
will be called, the driver should then remove the dirty bitmap from the disk.

This operation is not recursed in block layer, a filter such as blkdebug needs
to implement the callback and explicitly pass down to bs->file, etc.

Signed-off-by: Fam Zheng 
---
 block/dirty-bitmap.c | 38 ++
 include/block/block_int.h|  8 
 include/block/dirty-bitmap.h |  4 
 3 files changed, 50 insertions(+)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 1aa7f76..882a0db 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -43,6 +43,7 @@ struct BdrvDirtyBitmap {
 int64_t size;   /* Size of the bitmap (Number of sectors) */
 bool disabled;  /* Bitmap is read-only */
 int active_iterators;   /* How many iterators are active */
+bool persistent;/* Whether this bitmap is persistent. */
 QLIST_ENTRY(BdrvDirtyBitmap) list;
 };
 
@@ -71,6 +72,37 @@ void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
 bitmap->name = NULL;
 }
 
+int bdrv_dirty_bitmap_set_persistent(BlockDriverState *bs,
+ BdrvDirtyBitmap *bitmap,
+ bool persistent, bool flag_only,
+ Error **errp)
+{
+int ret = 0;
+
+if (!bitmap->name) {
+error_setg(errp, "Cannot change the persistent status of an anonymous"
+ "bitmap");
+return -EINVAL;
+}
+
+if (persistent == bitmap->persistent) {
+return 0;
+}
+
+if (!flag_only) {
+if (!bs->drv || !bs->drv->bdrv_dirty_bitmap_set_persistent) {
+error_setg(errp, "Not supported in this format.");
+return -ENOTSUP;
+}
+ret = bs->drv->bdrv_dirty_bitmap_set_persistent(bs, bitmap, persistent,
+errp);
+}
+if (!ret) {
+bitmap->persistent = persistent;
+}
+return ret;
+}
+
 BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
   uint32_t granularity,
   const char *name,
@@ -194,6 +226,12 @@ int bdrv_dirty_bitmap_create_successor(BlockDriverState 
*bs,
 uint64_t granularity;
 BdrvDirtyBitmap *child;
 
+if (bitmap->persistent) {
+error_setg(errp, "Cannot create a successor for a bitmap that is "
+   "persistent");
+return -1;
+}
+
 if (bdrv_dirty_bitmap_frozen(bitmap)) {
 error_setg(errp, "Cannot create a successor for a bitmap that is "
"currently frozen");
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 5fa58e8..fbc34af 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -305,6 +305,14 @@ struct BlockDriver {
  */
 void (*bdrv_drain)(BlockDriverState *bs);
 
+/**
+ * Make the dirty bitmap persistent if persistent=true or transient
+ * otherwise.
+ */
+int (*bdrv_dirty_bitmap_set_persistent)(BlockDriverState *bs,
+BdrvDirtyBitmap *bitmap,
+bool persistent, Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index d14d923..5885720 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -24,6 +24,10 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState 
*bs,
 BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs,
 const char *name);
 void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap);
+int bdrv_dirty_bitmap_set_persistent(BlockDriverState *bs,
+ BdrvDirtyBitmap *bitmap,
+ bool persistent, bool flag_only,
+ Error **errp);
 void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
 void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
-- 
2.4.3




[Qemu-block] [RFC PATCH 01/16] doc: Add QBM format specification

2016-01-26 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 docs/specs/qbm.md | 118 ++
 1 file changed, 118 insertions(+)
 create mode 100644 docs/specs/qbm.md

diff --git a/docs/specs/qbm.md b/docs/specs/qbm.md
new file mode 100644
index 000..b91910b
--- /dev/null
+++ b/docs/specs/qbm.md
@@ -0,0 +1,118 @@
+QEMU Block Bitmap (QBM)
+===
+
+QBM is a multi-file disk format to allow storing persistent block bitmaps along
+with the tracked data image.  A QBM image includes one json descriptor file,
+one data image, one or more bitmap files that describe the block dirty status
+of the data image.
+
+The json file describes the structure of the image. The structure of the json
+descriptor file is:
+
+QBM-JSON-FILE := { "QBM": DESC-JSON }
+
+DESC-JSON := { "version": 1,
+   "image": IMAGE,
+   "BITMAPS": BITMAPS
+ }
+
+Fields in the top level json dictionary are:
+
+@version: An integer which must be 1.
+@image: A dictionary in IMAGE schema, as described later. It provides the
+information of the data image where user data is stored. Its format is
+documented in the "IMAGE schema" section.
+@bitmaps: A dictionary that describes one ore more bitmap files. The keys into
+  the dictionary are the names of bitmap, which must be strings, and
+  each value is a dictionary describing the information of the bitmap,
+  as documented below in the "BITMAP schema" section.
+
+=== IMAGE schema ===
+
+An IMAGE records the information of an image (such as a data image or a backing
+file). It has following fields:
+
+@file: The file name string of the referenced image. If it's a relative path,
+   the file should be found relative to the descriptor file's
+   location.
+@format: The format string of the file.
+
+=== BITMAP schema ===
+
+A BITMAP dictionary records the information of a bitmap (such as a dirty bitmap
+or a block allocation status bitmap). It has following mandatory fields:
+
+@file: The name of the bitmap file. The bitmap file is in little endian, both
+   byte-order-wise and bit-order-wise, which means the LSB in the byte 0
+   corresponds to the first sectors.
+@granularity-bytes: How many bytes of data does one bit in the bitmap track.
+This value must be a power of 2 and no less than 512.
+@type: The type of the bitmap.  Currently only "dirty" and "allocation" are
+   supported.
+   "dirty" indicates a block dirty bitmap; "allocation" indicates a
+   allocation status bitmap. There must be at most one "allocation" bitmap.
+
+If the type of the bitmap is "allocation", an extra field "backing" is also
+accepted:
+
+@backing: a dictionary as specified in the IMAGE schema. It can be used to
+  adding a backing file to raw image.
+
+
+=== Extended fields ===
+
+Implementations are allowed to extend the format schema by inserting additinoal
+members into above dictionaries, with key names that starts with either
+an "ext-hard-" or an "ext-soft-" prefix.
+
+Extended fields prefixed with "ext-soft-" are optional and can be ignored by
+parsers if they do not support it; fields starting with "ext-hard-" are
+mandatory and cannot be ignored, a parser should not proceed parsing the image
+if it does not support it.
+
+It is strongly recommended that the application names are also included in the
+extention name string, such as "ext-hard-qemu-", if the effect or
+interpretation of the field is local to a specific application.
+
+For example, QEMU can implement a "checksum" feature to make sure no files
+referred to by the json descriptor are modified inconsistently, by adding
+"ext-soft-qemu-checksum" fields in "image" and "bitmaps" descriptions, like in
+the json text found below.
+
+=== QBM descriptor file example ===
+
+This is the content of a QBM image's json descriptor file, which contains a
+data image (data.img), and three bitmaps, out of which the "allocation" bitmap
+associates a backing file to this image (base.img).
+
+{ "QBM": {
+"version": 1,
+"image": {
+"file": "data.img",
+"format": "raw"
+"ext-soft-qemu-checksum": "9eff24b72bd693cc8aa3e887141b96f8",
+},
+"bitmaps": {
+"0": {
+"file": "bitmap0.bin",
+"granularity-bytes": 512,
+"type": "dirty"
+},
+"1": {
+"file": "bitmap1.bin",
+"granularity-bytes": 4096,
+"type": "dirty"
+},
+"2": {
+"file": "bitmap3.bin",
+"granularity-bytes": 4096,
+"type": "allocation"
+"backing": {
+"file": "base.img",
+"format": "raw"
+"ext-soft-qemu-checksum": "fcad1f672b2fb19948405e7a1a18c2a7",
+},
+}
+}
+} }
+
-- 
2.4.3




[Qemu-block] [RFC PATCH 05/16] block: Make bdrv_get_cluster_size public

2016-01-26 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 block/io.c| 2 +-
 include/block/block.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index b964e7e..15e461f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -425,7 +425,7 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
 }
 }
 
-static int bdrv_get_cluster_size(BlockDriverState *bs)
+int bdrv_get_cluster_size(BlockDriverState *bs)
 {
 BlockDriverInfo bdi;
 int ret;
diff --git a/include/block/block.h b/include/block/block.h
index b9b30cb..16b7845 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -435,7 +435,7 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors,
 int64_t *cluster_sector_num,
 int *cluster_nb_sectors);
-
+int bdrv_get_cluster_size(BlockDriverState *bs);
 const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
 void bdrv_get_backing_filename(BlockDriverState *bs,
char *filename, int filename_size);
-- 
2.4.3




[Qemu-block] [RFC PATCH 04/16] block: Move filename_decompose to block.c

2016-01-26 Thread Fam Zheng
With the return value decoupled from VMDK, it can be reused by other block
code.

Signed-off-by: Fam Zheng 
---
 block.c   | 40 
 block/vmdk.c  | 40 
 include/block/block.h |  2 ++
 3 files changed, 42 insertions(+), 40 deletions(-)

diff --git a/block.c b/block.c
index fa6ad1d..78db342 100644
--- a/block.c
+++ b/block.c
@@ -144,6 +144,46 @@ int path_is_absolute(const char *path)
 #endif
 }
 
+int filename_decompose(const char *filename, char *path, char *prefix,
+   char *postfix, size_t buf_len, Error **errp)
+{
+const char *p, *q;
+
+if (filename == NULL || !strlen(filename)) {
+error_setg(errp, "No filename provided");
+return -EINVAL;
+}
+p = strrchr(filename, '/');
+if (p == NULL) {
+p = strrchr(filename, '\\');
+}
+if (p == NULL) {
+p = strrchr(filename, ':');
+}
+if (p != NULL) {
+p++;
+if (p - filename >= buf_len) {
+return -EINVAL;
+}
+pstrcpy(path, p - filename + 1, filename);
+} else {
+p = filename;
+path[0] = '\0';
+}
+q = strrchr(p, '.');
+if (q == NULL) {
+pstrcpy(prefix, buf_len, p);
+postfix[0] = '\0';
+} else {
+if (q - p >= buf_len) {
+return -EINVAL;
+}
+pstrcpy(prefix, q - p + 1, p);
+pstrcpy(postfix, buf_len, q);
+}
+return 0;
+}
+
 /* if filename is absolute, just copy it to dest. Otherwise, build a
path to it by considering it is relative to base_path. URL are
supported. */
diff --git a/block/vmdk.c b/block/vmdk.c
index f8f7fcf..505e0c2 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1764,46 +1764,6 @@ exit:
 return ret;
 }
 
-static int filename_decompose(const char *filename, char *path, char *prefix,
-  char *postfix, size_t buf_len, Error **errp)
-{
-const char *p, *q;
-
-if (filename == NULL || !strlen(filename)) {
-error_setg(errp, "No filename provided");
-return VMDK_ERROR;
-}
-p = strrchr(filename, '/');
-if (p == NULL) {
-p = strrchr(filename, '\\');
-}
-if (p == NULL) {
-p = strrchr(filename, ':');
-}
-if (p != NULL) {
-p++;
-if (p - filename >= buf_len) {
-return VMDK_ERROR;
-}
-pstrcpy(path, p - filename + 1, filename);
-} else {
-p = filename;
-path[0] = '\0';
-}
-q = strrchr(p, '.');
-if (q == NULL) {
-pstrcpy(prefix, buf_len, p);
-postfix[0] = '\0';
-} else {
-if (q - p >= buf_len) {
-return VMDK_ERROR;
-}
-pstrcpy(prefix, q - p + 1, p);
-pstrcpy(postfix, buf_len, q);
-}
-return VMDK_OK;
-}
-
 static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 int idx = 0;
diff --git a/include/block/block.h b/include/block/block.h
index bfb76f8..b9b30cb 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -449,6 +449,8 @@ int bdrv_is_snapshot(BlockDriverState *bs);
 
 int path_has_protocol(const char *path);
 int path_is_absolute(const char *path);
+int filename_decompose(const char *filename, char *path, char *prefix,
+   char *postfix, size_t buf_len, Error **errp);
 void path_combine(char *dest, int dest_size,
   const char *base_path,
   const char *filename);
-- 
2.4.3




[Qemu-block] [RFC PATCH 02/16] block: Set dirty before doing write

2016-01-26 Thread Fam Zheng
So that driver can write the dirty bits into persistent dirty bitmaps in
the write callback.

Signed-off-by: Fam Zheng 
---
 block/io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 343ff1f..b964e7e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1164,6 +1164,8 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 }
 }
 
+bdrv_set_dirty(bs, sector_num, nb_sectors);
+
 if (ret < 0) {
 /* Do nothing, write notifier decided to fail this request */
 } else if (flags & BDRV_REQ_ZERO_WRITE) {
@@ -1179,8 +1181,6 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 ret = bdrv_co_flush(bs);
 }
 
-bdrv_set_dirty(bs, sector_num, nb_sectors);
-
 if (bs->wr_highest_offset < offset + bytes) {
 bs->wr_highest_offset = offset + bytes;
 }
-- 
2.4.3




[Qemu-block] [RFC PATCH 03/16] block: Allow .bdrv_close callback to release dirty bitmaps

2016-01-26 Thread Fam Zheng
If the driver owns some dirty bitmaps, this assertion will fail.

The correct place to release them is in bdrv_close, so move the
assertion one line down.

Signed-off-by: Fam Zheng 
---
 block.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index afb71c0..fa6ad1d 100644
--- a/block.c
+++ b/block.c
@@ -2348,10 +2348,11 @@ static void bdrv_delete(BlockDriverState *bs)
 assert(!bs->job);
 assert(bdrv_op_blocker_is_empty(bs));
 assert(!bs->refcnt);
-assert(QLIST_EMPTY(&bs->dirty_bitmaps));
 
 bdrv_close(bs);
 
+assert(QLIST_EMPTY(&bs->dirty_bitmaps));
+
 /* remove from list, if necessary */
 bdrv_make_anon(bs);
 
-- 
2.4.3




[Qemu-block] [RFC PATCH 00/16] Qemu Bit Map (QBM) - an overlay format for persistent dirty bitmap

2016-01-26 Thread Fam Zheng
Hi all,

This series introduces a simple format to enable support of persistence of
block dirty bitmaps. Block dirty bitmap is the tool to achieve incremental
backup, and persistence of block dirty bitmap makes incrememtal backup possible
across VM shutdowns, where existing in-memory dirty bitmaps cannot survive.

When user creates a "persisted" dirty bitmap, the QBM driver will create a
binary file and synchronize it with the existing in-memory block dirty bitmap
(BdrvDirtyBitmap). When the VM is powered down, the binary file has all the
bits saved on disk, which will be loaded and used to initialize the in-memory
block dirty bitmap next time the guest is started.

The idea of the format is to reuse as much existing infrastructure as possible
and avoid introducing complex data structures - it works with any image format,
by gluing it together plain bitmap files with a json descriptor file. The
advantage of this approach over extending existing formats, such as qcow2, is
that the new feature is implemented by an orthogonal driver, in a format
agnostic way. This way, even raw images can have their persistent dirty
bitmaps.  (And you will notice in this series, with a little forging to the
spec, raw images can also have backing files through a QBM overlay!)

Rather than superseding it, this intends to be coexistent in parallel with the
qcow2 bitmap extension that Vladimir is working on.  The block driver interface
changes in this series also try to be generic and compatible for both drivers.

The format's specification is added to docs/specs/, see patch 1.

Patches 2-7 are necessary block layer changes in order be friendly to
persistent dirty bitmap drivers.

Patches 8, 9 and 11 extends the QMP interface to expose the added feature.

Patch 10 implements the driver. (todo: checksum extension for image/bitmap
integrity check)

Patch 12 - 16 are the tests I have for QBM so far. I'm sure more can be added
as questions emerge. :)

The series applies on top of my "qemu-img map" series and "meta bitmap" series:

https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04866.html
https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg03656.html

If you feel like to play with it, git branch is also available at:

https://github.com/famz/qemu qbm

Comments are welcome!

Fam


Fam Zheng (16):
  doc: Add QBM format specification
  block: Set dirty before doing write
  block: Allow .bdrv_close callback to release dirty bitmaps
  block: Move filename_decompose to block.c
  block: Make bdrv_get_cluster_size public
  block: Introduce bdrv_dirty_bitmap_set_persistent
  block: Only swap non-persistent dirty bitmaps
  qmp: Add optional parameter "persistent" in block-dirty-bitmap-add
  qmp: Add block-dirty-bitmap-set-persistent
  qbm: Implement format driver
  qapi: Add "qbm" as a generic cow format driver
  iotests: Add qbm format to 041
  iotests: Add qbm to case 097
  iotests: Add qbm to applicable test cases
  iotests: Add qbm specific test case 140
  iotests: Add persistent bitmap test case 141

 block.c  |   54 +-
 block/Makefile.objs  |1 +
 block/dirty-bitmap.c |   63 ++
 block/io.c   |6 +-
 block/qbm.c  | 1315 ++
 block/vmdk.c |   40 --
 blockdev.c   |   28 +-
 docs/specs/qbm.md|  118 
 include/block/block.h|4 +-
 include/block/block_int.h|8 +
 include/block/dirty-bitmap.h |5 +
 qapi/block-core.json |   31 +-
 qmp-commands.hx  |   34 +-
 tests/qemu-iotests/004   |2 +-
 tests/qemu-iotests/017   |2 +-
 tests/qemu-iotests/018   |2 +-
 tests/qemu-iotests/019   |2 +-
 tests/qemu-iotests/020   |2 +-
 tests/qemu-iotests/024   |2 +-
 tests/qemu-iotests/025   |2 +-
 tests/qemu-iotests/027   |2 +-
 tests/qemu-iotests/028   |2 +-
 tests/qemu-iotests/030   |2 +-
 tests/qemu-iotests/034   |2 +-
 tests/qemu-iotests/037   |2 +-
 tests/qemu-iotests/038   |2 +-
 tests/qemu-iotests/040   |2 +-
 tests/qemu-iotests/041   |   18 +-
 tests/qemu-iotests/050   |2 +-
 tests/qemu-iotests/055   |2 +-
 tests/qemu-iotests/056   |2 +-
 tests/qemu-iotests/069   |2 +-
 tests/qemu-iotests/072   |2 +-
 tests/qemu-iotests/086   |2 +-
 tests/qemu-iotests/095   |2 +-
 tests/qemu-iotests/096   |2 +-
 tests/qemu-iotests/097   |4 +-
 tests/qemu-iotests/099   |2 +-
 tests/qemu-iotests/110   |2 +-
 tests/qemu-iotests/129   |2 +-
 tests/qemu-iotests/132   |2 +-
 tests/qemu-iotests/139   |2 +-
 tests/qemu-iotests/140   |   80 +++
 tests/qemu-iotests/140.out   |  145 +
 tests/qemu-iotests/141   |   62 ++
 tests/qemu-iotests/141.out   |5 +
 tests/qemu-iotests/common|6 +
 tests/qemu-iotests/group |2 

Re: [Qemu-block] [PATCH 0/8] nbd: Fix failed assertion on negotiation error

2016-01-26 Thread Kevin Wolf
Am 25.01.2016 um 19:41 hat Max Reitz geschrieben:
> An error during negotiation, e.g. by the client trying to open an export
> that does not exist, should not lead to a crash of the server process.
> 
> The middle six patches of this series are taken from my series
> "block: Rework bdrv_close_all()", so here is a git-backport-diff against
> v7 of that series:

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH 1/8] nbd: client_close on error in nbd_co_client_start

2016-01-26 Thread Paolo Bonzini


On 26/01/2016 10:32, Kevin Wolf wrote:
> Am 25.01.2016 um 19:41 hat Max Reitz geschrieben:
>> > Use client_close() if an error in nbd_co_client_start() occurs instead
>> > of manually inlining parts of it. This fixes an assertion error on the
>> > server side if nbd_negotiate() fails.
>> > 
>> > Signed-off-by: Max Reitz 
> Paolo, if you can Ack this one, I can take the series through my tree.

Of course.  I had the same patch queued from Daniel.

Acked-by: Paolo Bonzini 

Paolo



Re: [Qemu-block] [PATCH 1/8] nbd: client_close on error in nbd_co_client_start

2016-01-26 Thread Kevin Wolf
Am 25.01.2016 um 19:41 hat Max Reitz geschrieben:
> Use client_close() if an error in nbd_co_client_start() occurs instead
> of manually inlining parts of it. This fixes an assertion error on the
> server side if nbd_negotiate() fails.
> 
> Signed-off-by: Max Reitz 

Paolo, if you can Ack this one, I can take the series through my tree.

Kevin

>  nbd/server.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 2265cb0..5169b59 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1080,8 +1080,7 @@ static coroutine_fn void nbd_co_client_start(void 
> *opaque)
>  nbd_export_get(exp);
>  }
>  if (nbd_negotiate(data)) {
> -shutdown(client->sock, 2);
> -client->close(client);
> +client_close(client);
>  goto out;
>  }
>  qemu_co_mutex_init(&client->send_lock);
> -- 
> 2.7.0
> 



Re: [Qemu-block] [Qemu-devel] [PATCH] iotests: Limit supported formats for 118

2016-01-26 Thread Kevin Wolf
Am 26.01.2016 um 08:09 hat Markus Armbruster geschrieben:
> Max Reitz  writes:
> 
> > Image formats used in test 118 need to support image creation.
> >
> > Reported-by: Markus Armbruster 
> > Signed-off-by: Max Reitz 
> > ---
> >  tests/qemu-iotests/118 | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/qemu-iotests/118 b/tests/qemu-iotests/118
> > index 114d0e2..beb69d0 100755
> > --- a/tests/qemu-iotests/118
> > +++ b/tests/qemu-iotests/118
> > @@ -717,4 +717,6 @@ if __name__ == '__main__':
> >  # We need floppy and IDE CD-ROM
> >  iotests.notrun('not suitable for this machine type: %s' %
> > iotests.qemu_default_machine)
> > -iotests.main()
> > +# Need to support image creation
> > +iotests.main(supported_fmts=['vpc', 'parallels', 'qcow', 'vdi', 
> > 'qcow2',
> > + 'vmdk', 'raw', 'vhdx', 'qed'])
> 
> Reviewed-by: Markus Armbruster 

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH] vmdk: Fix converting to streamOptimized

2016-01-26 Thread Kevin Wolf
Am 26.01.2016 um 04:16 hat Fam Zheng geschrieben:
> On Mon, 01/25 12:16, Kevin Wolf wrote:
> > Am 25.01.2016 um 03:26 hat Fam Zheng geschrieben:
> > > Commit d62d9dc4b8 lifted streamOptimized images's version to 3, but we
> > > now refuse to open version 3 images read-write.  We need to make
> > > streamOptimized an exception to allow converting to it. This fixes the
> > > accidentally broken iotests case 059 for the same reason.
> > > 
> > > Signed-off-by: Fam Zheng 
> > 
> > How different are version 3 images for other subformats? Are we
> > arbitrarily restrictring their use or is it really that they don't work
> > with our driver? And if they don't work with our driver, are we sure
> > that streamOptimized images can't use the features we don't support?
> > 
> > Or is the version defined per subformat and doesn't necessarily exist
> > for other types?
> 
> Version 3 images are undocumented except in the VMware KB article mentioned in
> the comment around this line (http://kb.vmware.com/kb/2064959). A few years
> ago, when users complained that QEMU doesn't support version 3 images, we
> presumed from the article that reading is okay, as the new feature is
> "persistent changed block tracking" (although it didn't say it is the only
> feature enabled by version 3), and went ahead enabling it.
> 
> This time, it seems newer VMware products only accept version 3 if the
> subformat is streamOptimized.  Again, without any documentation/specification
> update. Then our users complains again, so we add another exception to
> mitigate. As this subformat doesn't allow overwrite, the only use case is
> qemu-img converting to it.  So this is pretty safe - it's always operating a
> new image - and the approach is tested by multiple users (both upstream and
> downstream).

I see. Then I guess we can't do much else.

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH v2 08/13] block: Support meta dirty bitmap

2016-01-26 Thread Fam Zheng
On Tue, 01/26 10:49, Vladimir Sementsov-Ogievskiy wrote:
> On 26.01.2016 09:25, Fam Zheng wrote:
> >On Fri, 01/22 15:05, Vladimir Sementsov-Ogievskiy wrote:
> >>>In my migration series I need iterators, get granularity, and
> >>>something like hbitmap_count  for meta bitmaps. You can add them
> >>>here if you want, or I can add them in my series.
> >Okay, I can add that.
> >
> >I have one more question on the interface: what dirty bitmaps are going to be
> >migrated? At this moment there are dirty bitmaps created by block jobs
> >(drive-mirror), and in-memory dirty bitmaps created for incremental backup;
> >later there will be persistent dirty bitmaps owned by block drivers (extended
> >version of qcow2, and in QBM driver I'm working on). Which of them are 
> >subject
> >of migration in your series?
> >
> >I'm asking because I want to know whether we need to implement multiple meta
> >bitmaps on one block dirty bitmap.
> >
> >Fam
> Only named bitmaps are migrated. For now, only qmp-created bitmaps
> are named. So, it can be in-memory dirty bitmaps or, in future,
> persistent dirty bitmaps.
> 
> Why multiple meta bitmaps?

The complication is from combining persistence and migration of a dirty bitmap.

To begin with, persistence drivers (qcow2 or QBM) would need meta dirty bitmaps
to avoid writing unchanged dirty bit ranges to disk, just like in migration.
This means if the persisted named dirty bitmap is being migrated, both the
block driver and migration code will then need their own meta dirty bitmaps,
that is where the question rose.

One step back, we haven't sorted out "migrating persistent dirty bitmap" at all
(see below). So it's probably okay to disallow that as the first step.

P.S. For discussion, I think we ultimately want users to be able to continue
their incremental backup even across the migration point.  If they're using
shared storage, I think it is possible even without dirty bitmap migration, by
flushing the dirty bitmap at src side then loading it at dest side. Otherwise
it is trickier as we will have to migrate the persisted dirty bitmap - the dest
side must use a capable format, and we need to check this capability when
migration starts.   By that time, the meta dirty bitmap interface can be
extended to allow at least two meta bitmaps on the same dirty bitmap.

Fam