[Qemu-block] [PATCH v2 10/11] blockdev: Block device IO during blockdev-backup transaction

2015-05-13 Thread Fam Zheng
Signed-off-by: Fam Zheng f...@redhat.com
---
 blockdev.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index ae52d27..bd28183 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1649,6 +1649,7 @@ typedef struct BlockdevBackupState {
     BlockDriverState *bs;
     BlockJob *job;
     AioContext *aio_context;
+    Error *blocker;
 } BlockdevBackupState;
 
 static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
@@ -1685,6 +1686,10 @@ static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
     }
     aio_context_acquire(state->aio_context);
 
+    state->bs = bs;
+    error_setg(&state->blocker, "blockdev-backup in progress");
+    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+
     qmp_blockdev_backup(backup->device, backup->target,
                         backup->sync,
                         backup->has_speed, backup->speed,
@@ -1696,7 +1701,6 @@ static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
         return;
     }
 
-    state->bs = bs;
     state->job = state->bs->job;
 }
 
@@ -1715,6 +1719,10 @@ static void blockdev_backup_clean(BlkTransactionState *common)
 {
     BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
 
+    if (state->bs) {
+        bdrv_op_unblock(state->bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+        error_free(state->blocker);
+    }
     if (state->aio_context) {
         aio_context_release(state->aio_context);
     }
-- 
2.4.0




Re: [Qemu-block] [PATCH 14/34] qcow2: Factor out qcow2_update_options()

2015-05-13 Thread Max Reitz

On 08.05.2015 19:21, Kevin Wolf wrote:

Eventually we want to be able to change options at runtime. As a first
step towards that goal, separate some option handling code from the
general initialisation code in qcow2_open().

Signed-off-by: Kevin Wolf kw...@redhat.com
---
  block/qcow2.c | 135 +-
  1 file changed, 76 insertions(+), 59 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index b9a72e3..db535d4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c


[snip]


@@ -549,8 +623,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     Error *local_err = NULL;
     uint64_t ext_end;
     uint64_t l1_vm_state_index;
-    const char *opt_overlap_check, *opt_overlap_check_template;
-    int overlap_check_template = 0;
     uint64_t l2_cache_size, refcount_cache_size;
 
     ret = bdrv_pread(bs->file, 0, &header, sizeof(header));

@@ -924,69 +996,14 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     /* Enable lazy_refcounts according to image and command line options */


[...]


+    ret = qcow2_update_options(bs, opts, flags, errp);
+    if (ret < 0) {


The comment looks a bit strange now, because qcow2_update_options() 
doesn't update just lazy_refcounts.


Max



[Qemu-block] [PATCH v2 00/11] Fix transactional snapshot with virtio-blk dataplane and NBD export

2015-05-13 Thread Fam Zheng
Changes from RFC:
  - Add op blocker listener in nbd server.
  - Add other transaction types.
  - Only notify listeners when changing from/to empty. (Paolo)

Reported by Paolo.

Unlike the iohandler in the main loop, iothreads currently process the event
notifier used by the virtio-blk ioeventfd in nested aio_poll. This is dangerous
without proper protection, because guest requests could sneak into the block
layer at points where they must not be processed.

For example, a QMP transaction may involve multiple bdrv_drain_all() calls
while handling the list of AioContexts it works on. If an aio_poll in one of
those bdrv_drain_all() calls happens to process a guest VQ kick and dispatches
the ioeventfd event to virtio-blk, a new guest write is submitted, and voila,
the transaction semantics are violated.

This series avoids this problem by disabling virtio-blk handlers during
bdrv_drain_all() and transactions.

- Patches 1~3 add the block layer op blocker change notifier code.
- Patches 4,5 secure virtio-blk dataplane.
- Patch 6 secures nbd export.
- Patches 7~10 protect each transaction type from being violated by new IO
  generated in nested aio_poll.
- Patch 11 protects bdrv_drain and bdrv_drain_all.

Notes:

virtio-scsi-dataplane will be a bit more complicated, but still doable.  It
would probably need one more interface abstraction between scsi-disk, scsi-bus
and virtio-scsi.

Although other devices don't have a pause/resume callback yet,
blk_check_request, which returns -EBUSY when the device IO op blocker is set,
should hopefully cover most cases already.

Timers and block jobs also generate IO, but it should be fine as long as they
don't change guest visible data, which is true AFAICT. Besides, bdrv_drain_all
already pauses block jobs.



Fam Zheng (11):
  block: Add op blocker type device IO
  block: Add op blocker notifier list
  block-backend: Add blk_op_blocker_add_notifier
  virtio-blk: Move complete_request to 'ops' structure
  virtio-blk: Don't handle output when there is device IO op blocker
  nbd-server: Clear can_read when device io blocker is set
  blockdev: Block device IO during internal snapshot transaction
  blockdev: Block device IO during external snapshot transaction
  blockdev: Block device IO during drive-backup transaction
  blockdev: Block device IO during blockdev-backup transaction
  block: Block device IO during bdrv_drain and bdrv_drain_all

 block.c | 28 +
 block/block-backend.c   | 10 ++
 block/io.c  | 12 +++
 blockdev.c  | 49 -
 hw/block/dataplane/virtio-blk.c | 36 ++---
 hw/block/virtio-blk.c   | 69 +++--
 include/block/block.h   |  9 ++
 include/block/block_int.h   |  3 ++
 include/hw/virtio/virtio-blk.h  | 17 --
 include/sysemu/block-backend.h  |  2 ++
 nbd.c   | 18 +++
 11 files changed, 235 insertions(+), 18 deletions(-)

-- 
2.4.0




[Qemu-block] [PATCH v2 04/11] virtio-blk: Move complete_request to 'ops' structure

2015-05-13 Thread Fam Zheng
Should more ops be added to differentiate code between dataplane and
non-dataplane, the new saved_ops approach will be cleaner than messing
with N pointers.

Signed-off-by: Fam Zheng f...@redhat.com
---
 hw/block/dataplane/virtio-blk.c | 13 ++++++++-----
 hw/block/virtio-blk.c           |  8 ++++++--
 include/hw/virtio/virtio-blk.h  |  9 +++++++--
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 3db139b..ec0c8f4 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -51,8 +51,7 @@ struct VirtIOBlockDataPlane {
 
     /* Operation blocker on BDS */
     Error *blocker;
-    void (*saved_complete_request)(struct VirtIOBlockReq *req,
-                                   unsigned char status);
+    const VirtIOBlockOps *saved_ops;
 };
 
 /* Raise an interrupt to signal guest, if necessary */
@@ -88,6 +87,10 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
     qemu_bh_schedule(s->bh);
 }
 
+static const VirtIOBlockOps virtio_blk_data_plane_ops = {
+    .complete_request = complete_request_vring,
+};
+
 static void handle_notify(EventNotifier *e)
 {
     VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
@@ -269,8 +272,8 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
     }
     s->host_notifier = *virtio_queue_get_host_notifier(vq);
 
-    s->saved_complete_request = vblk->complete_request;
-    vblk->complete_request = complete_request_vring;
+    s->saved_ops = vblk->ops;
+    vblk->ops = &virtio_blk_data_plane_ops;
 
     s->starting = false;
     s->started = true;
@@ -313,7 +316,7 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
         return;
     }
     s->stopping = true;
-    vblk->complete_request = s->saved_complete_request;
+    vblk->ops = s->saved_ops;
     trace_virtio_blk_data_plane_stop(s);
 
     aio_context_acquire(s->ctx);
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e6afe97..f4a9d19 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -59,9 +59,13 @@ static void virtio_blk_complete_request(VirtIOBlockReq *req,
     virtio_notify(vdev, s->vq);
 }
 
+static const VirtIOBlockOps virtio_blk_ops = (VirtIOBlockOps) {
+    .complete_request = virtio_blk_complete_request,
+};
+
 static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
 {
-    req->dev->complete_request(req, status);
+    req->dev->ops->complete_request(req, status);
 }
 
 static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
@@ -905,7 +909,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
     s->sector_mask = (s->conf.conf.logical_block_size / BDRV_SECTOR_SIZE) - 1;
 
     s->vq = virtio_add_queue(vdev, 128, virtio_blk_handle_output);
-    s->complete_request = virtio_blk_complete_request;
+    s->ops = &virtio_blk_ops;
     virtio_blk_data_plane_create(vdev, conf, &s->dataplane, &err);
     if (err != NULL) {
         error_propagate(errp, err);
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 6bf5905..28b3436 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -44,6 +44,12 @@ struct VirtIOBlkConf
 struct VirtIOBlockDataPlane;
 
 struct VirtIOBlockReq;
+
+typedef struct {
+    /* Function to push to vq and notify guest */
+    void (*complete_request)(struct VirtIOBlockReq *req, unsigned char status);
+} VirtIOBlockOps;
+
 typedef struct VirtIOBlock {
     VirtIODevice parent_obj;
     BlockBackend *blk;
@@ -54,8 +60,7 @@ typedef struct VirtIOBlock {
     unsigned short sector_mask;
     bool original_wce;
     VMChangeStateEntry *change;
-    /* Function to push to vq and notify guest */
-    void (*complete_request)(struct VirtIOBlockReq *req, unsigned char status);
+    const VirtIOBlockOps *ops;
     Notifier migration_state_notifier;
     struct VirtIOBlockDataPlane *dataplane;
 } VirtIOBlock;
-- 
2.4.0




Re: [Qemu-block] [PATCH v2 05/11] virtio-blk: Don't handle output when there is device IO op blocker

2015-05-13 Thread Fam Zheng
On Wed, 05/13 12:26, Paolo Bonzini wrote:
> On 13/05/2015 19:28, Fam Zheng wrote:
> > +static void virtio_blk_data_plane_pause(VirtIOBlock *vblk)
> > +{
> > +    VirtIOBlockDataPlane *s = vblk->dataplane;
> > +
> > +    event_notifier_test_and_clear(&s->host_notifier);
> > +    aio_set_event_notifier(s->ctx, &s->host_notifier, NULL);
> > +}
> > +
> > +static void handle_notify(EventNotifier *e);
> > +static void virtio_blk_data_plane_resume(VirtIOBlock *vblk)
> > +{
> > +    VirtIOBlockDataPlane *s = vblk->dataplane;
> > +
> > +    aio_set_event_notifier(s->ctx, &s->host_notifier, handle_notify);
> > +
> > +    event_notifier_set(&s->host_notifier);
> > +}
> 
> Perhaps add a note that these are called under aio_context_acquire?

OK, good idea.

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH 20/34] qcow2: Support updating driver-specific options in reopen

2015-05-13 Thread Kevin Wolf
Am 12.05.2015 um 23:47 hat Eric Blake geschrieben:
> On 05/08/2015 11:21 AM, Kevin Wolf wrote:
> > For updating the cache sizes or disabling lazy refcounts there is a bit
> > more to do than just changing the variables, but otherwise we're all set
> > for changing options during bdrv_reopen().
> > 
> > Just implement the missing pieces and hook the functions up in
> > bdrv_reopen().
> > 
> > Signed-off-by: Kevin Wolf kw...@redhat.com
> > ---
> >  block/qcow2.c | 70 ++-
> >  1 file changed, 64 insertions(+), 6 deletions(-)
> > 
> 
> > -/* We have no actual commit/abort logic for qcow2, but we need to write out any
> > - * unwritten data if we reopen read-only. */
> >  static int qcow2_reopen_prepare(BDRVReopenState *state,
> >                                  BlockReopenQueue *queue, Error **errp)
> >  {
> > +    Qcow2ReopenState *r;
> >      int ret;
> >  
> > +    r = g_new0(Qcow2ReopenState, 1);
> > +    state->opaque = r;
> > +
> > +    ret = qcow2_update_options_prepare(state->bs, r, state->options,
> > +                                       state->flags, errp);
> > +    if (ret < 0) {
> > +        goto fail;
> > +    }
> > +
> > +    /* We need to write out any unwritten data if we reopen read-only. */
> >      if ((state->flags & BDRV_O_RDWR) == 0) {
> >          ret = bdrv_flush(state->bs);
> >          if (ret < 0) {
> > -            return ret;
> > +            goto fail;
> >          }
> >  
> >          ret = qcow2_mark_clean(state->bs);
> >          if (ret < 0) {
> > -            return ret;
> > +            goto fail;
> >          }
> >      }
> >  
> >      return 0;
> > +
> > +fail:
> > +    qcow2_update_options_abort(state->bs, r);
> > +    return ret;
> 
> Doesn't this leak r?  That is, you only free r if _commit or _abort is
> reached, but my understanding of transaction semantics is that we only
> guarantee that one of those is reached if _prepare succeeded.

Yes, it does. Thanks for catching that.

Kevin




Re: [Qemu-block] [PATCH v2 05/11] virtio-blk: Don't handle output when there is device IO op blocker

2015-05-13 Thread Paolo Bonzini


On 13/05/2015 19:28, Fam Zheng wrote:
> +static void virtio_blk_data_plane_pause(VirtIOBlock *vblk)
> +{
> +    VirtIOBlockDataPlane *s = vblk->dataplane;
> +
> +    event_notifier_test_and_clear(&s->host_notifier);
> +    aio_set_event_notifier(s->ctx, &s->host_notifier, NULL);
> +}
> +
> +static void handle_notify(EventNotifier *e);
> +static void virtio_blk_data_plane_resume(VirtIOBlock *vblk)
> +{
> +    VirtIOBlockDataPlane *s = vblk->dataplane;
> +
> +    aio_set_event_notifier(s->ctx, &s->host_notifier, handle_notify);
> +
> +    event_notifier_set(&s->host_notifier);
> +}

Perhaps add a note that these are called under aio_context_acquire?

Paolo



Re: [Qemu-block] [PATCH v2 11/11] block: Block device IO during bdrv_drain and bdrv_drain_all

2015-05-13 Thread Paolo Bonzini


On 13/05/2015 19:28, Fam Zheng wrote:
> We don't want new requests from guest, so block the operation around the
> nested poll.
> 
> Signed-off-by: Fam Zheng f...@redhat.com
> ---
>  block/io.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/block/io.c b/block/io.c
> index 1ce62c4..d369de3 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -289,9 +289,15 @@ static bool bdrv_drain_one(BlockDriverState *bs)
>   */
>  void bdrv_drain(BlockDriverState *bs)
>  {
> +    Error *blocker = NULL;
> +
> +    error_setg(&blocker, "bdrv_drain in progress");
> +    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
>      while (bdrv_drain_one(bs)) {
>          /* Keep iterating */
>      }
> +    bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
> +    error_free(blocker);
>  }
>  
>  /*
> @@ -311,6 +317,9 @@ void bdrv_drain_all(void)
>      /* Always run first iteration so any pending completion BHs run */
>      bool busy = true;
>      BlockDriverState *bs = NULL;
> +    Error *blocker = NULL;
> +
> +    error_setg(&blocker, "bdrv_drain_all in progress");
>  
>      while ((bs = bdrv_next(bs))) {
>          AioContext *aio_context = bdrv_get_aio_context(bs);
> @@ -319,6 +328,7 @@ void bdrv_drain_all(void)
>          if (bs->job) {
>              block_job_pause(bs->job);
>          }
> +        bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
>          aio_context_release(aio_context);
>      }
>  
> @@ -343,8 +353,10 @@ void bdrv_drain_all(void)
>          if (bs->job) {
>              block_job_resume(bs->job);
>          }
> +        bdrv_op_unblock(bs, BLOCK_OP_TYPE_DEVICE_IO, blocker);
>          aio_context_release(aio_context);
>      }
> +    error_free(blocker);
>  }
>  
>  /**

I think this isn't enough.  It's the callers of bdrv_drain and
bdrv_drain_all that need to block before drain and unblock before
aio_context_release.

Paolo



Re: [Qemu-block] [PATCH 14/34] qcow2: Factor out qcow2_update_options()

2015-05-13 Thread Max Reitz

On 08.05.2015 19:21, Kevin Wolf wrote:

Eventually we want to be able to change options at runtime. As a first
step towards that goal, separate some option handling code from the
general initialisation code in qcow2_open().

Signed-off-by: Kevin Wolf kw...@redhat.com
---
  block/qcow2.c | 135 +-
  1 file changed, 76 insertions(+), 59 deletions(-)


Now seeing that the comment I was complaining about is removed in patch 16:

Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-block] [Qemu-devel] [PATCH v2 01/11] block: Add op blocker type device IO

2015-05-13 Thread Wen Congyang
On 05/14/2015 01:28 AM, Fam Zheng wrote:
> Preventing the device from submitting IO is useful around various nested
> aio_poll calls. The op blocker is a good place to put this flag.
> 
> Devices would submit IO requests through the blk_* block backend interface,
> which calls blk_check_request to check the validity. Return -EBUSY if
> the operation is blocked, which means device IO is temporarily disabled
> by another BDS user.
> 
> Signed-off-by: Fam Zheng f...@redhat.com
> ---
>  block/block-backend.c | 4 ++++
>  include/block/block.h | 1 +
>  2 files changed, 5 insertions(+)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 93e46f3..71fc695 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -478,6 +478,10 @@ static int blk_check_request(BlockBackend *blk, int64_t sector_num,
>          return -EIO;
>      }
>  
> +    if (bdrv_op_is_blocked(blk->bs, BLOCK_OP_TYPE_DEVICE_IO, NULL)) {
> +        return -EBUSY;
> +    }
> +

The guest doesn't know about this status, so how do we prevent it? If it
is a virtio disk, you make it wait, but if it is another kind of disk,
what is the behavior?

Thanks
Wen Congyang

>      return blk_check_byte_request(blk, sector_num * BDRV_SECTOR_SIZE,
>                                    nb_sectors * BDRV_SECTOR_SIZE);
>  }
> diff --git a/include/block/block.h b/include/block/block.h
> index 7d1a717..906fb31 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -159,6 +159,7 @@ typedef enum BlockOpType {
>      BLOCK_OP_TYPE_RESIZE,
>      BLOCK_OP_TYPE_STREAM,
>      BLOCK_OP_TYPE_REPLACE,
> +    BLOCK_OP_TYPE_DEVICE_IO,
>      BLOCK_OP_TYPE_MAX,
>  } BlockOpType;
  
 




[Qemu-block] [PATCH v2 08/11] blockdev: Block device IO during external snapshot transaction

2015-05-13 Thread Fam Zheng
Signed-off-by: Fam Zheng f...@redhat.com
---
 blockdev.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 7f763d9..923fc90 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1404,6 +1404,7 @@ typedef struct ExternalSnapshotState {
     BlockDriverState *old_bs;
     BlockDriverState *new_bs;
     AioContext *aio_context;
+    Error *blocker;
 } ExternalSnapshotState;
 
 static void external_snapshot_prepare(BlkTransactionState *common,
@@ -1525,6 +1526,9 @@ static void external_snapshot_prepare(BlkTransactionState *common,
     if (ret != 0) {
         error_propagate(errp, local_err);
     }
+
+    error_setg(&state->blocker, "snapshot in progress");
+    bdrv_op_block(state->old_bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
 }
 
 static void external_snapshot_commit(BlkTransactionState *common)
@@ -1541,8 +1545,6 @@ static void external_snapshot_commit(BlkTransactionState *common)
      * don't want to abort all of them if one of them fails the reopen */
     bdrv_reopen(state->new_bs, state->new_bs->open_flags & ~BDRV_O_RDWR,
                 NULL);
-
-    aio_context_release(state->aio_context);
 }
 
 static void external_snapshot_abort(BlkTransactionState *common)
@@ -1552,6 +1554,17 @@ static void external_snapshot_abort(BlkTransactionState *common)
     if (state->new_bs) {
         bdrv_unref(state->new_bs);
     }
+}
+
+static void external_snapshot_clean(BlkTransactionState *common)
+{
+    ExternalSnapshotState *state =
+                             DO_UPCAST(ExternalSnapshotState, common, common);
+
+    if (state->old_bs) {
+        bdrv_op_unblock(state->old_bs, BLOCK_OP_TYPE_DEVICE_IO,
+                        state->blocker);
+        error_free(state->blocker);
+    }
     if (state->aio_context) {
         aio_context_release(state->aio_context);
     }
@@ -1716,6 +1729,7 @@ static const BdrvActionOps actions[] = {
         .prepare  = external_snapshot_prepare,
         .commit   = external_snapshot_commit,
         .abort = external_snapshot_abort,
+        .clean = external_snapshot_clean,
     },
     [TRANSACTION_ACTION_KIND_DRIVE_BACKUP] = {
         .instance_size = sizeof(DriveBackupState),
-- 
2.4.0




[Qemu-block] [PATCH v2 06/11] nbd-server: Clear can_read when device io blocker is set

2015-05-13 Thread Fam Zheng
So that NBD export cannot submit IO during bdrv_drain_all().

Signed-off-by: Fam Zheng f...@redhat.com
---
 nbd.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/nbd.c b/nbd.c
index 06b501b..7d9d3e4 100644
--- a/nbd.c
+++ b/nbd.c
@@ -160,6 +160,8 @@ struct NBDExport {
     uint32_t nbdflags;
     QTAILQ_HEAD(, NBDClient) clients;
     QTAILQ_ENTRY(NBDExport) next;
+    Notifier blocker_notifier;
+    bool io_blocked;
 
     AioContext *ctx;
 };
@@ -1053,6 +1055,16 @@ static void blk_aio_detach(void *opaque)
     exp->ctx = NULL;
 }
 
+static void nbd_op_blocker_changed(Notifier *notifier, void *data)
+{
+    BlockOpEvent *event = data;
+    NBDExport *exp = container_of(notifier, NBDExport, blocker_notifier);
+    if (event->type != BLOCK_OP_TYPE_DEVICE_IO) {
+        return;
+    }
+    exp->io_blocked = event->blocking;
+}
+
 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
                           uint32_t nbdflags, void (*close)(NBDExport *),
                           Error **errp)
@@ -1081,6 +1093,8 @@ NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
      * access since the export could be available before migration handover.
      */
     blk_invalidate_cache(blk, NULL);
+    exp->blocker_notifier.notify = nbd_op_blocker_changed;
+    blk_op_blocker_add_notifier(blk, &exp->blocker_notifier);
     return exp;
 
 fail:
@@ -1132,6 +1146,7 @@ void nbd_export_close(NBDExport *exp)
     nbd_export_set_name(exp, NULL);
     nbd_export_put(exp);
     if (exp->blk) {
+        notifier_remove(&exp->blocker_notifier);
         blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
                                         blk_aio_detach, exp);
         blk_unref(exp->blk);
@@ -1455,6 +1470,9 @@ static void nbd_update_can_read(NBDClient *client)
     bool can_read = client->recv_coroutine ||
                     client->nb_requests < MAX_NBD_REQUESTS;
 
+    if (client->exp && client->exp->io_blocked) {
+        can_read = false;
+    }
     if (can_read != client->can_read) {
         client->can_read = can_read;
         nbd_set_handlers(client);
-- 
2.4.0




[Qemu-block] [PATCH v2 09/11] blockdev: Block device IO during drive-backup transaction

2015-05-13 Thread Fam Zheng
Signed-off-by: Fam Zheng f...@redhat.com
---
 blockdev.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 923fc90..ae52d27 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1575,6 +1575,7 @@ typedef struct DriveBackupState {
     BlockDriverState *bs;
     AioContext *aio_context;
     BlockJob *job;
+    Error *blocker;
 } DriveBackupState;
 
 static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
@@ -1599,6 +1600,9 @@ static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
     state->aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(state->aio_context);
 
+    state->bs = bs;
+    error_setg(&state->blocker, "drive-backup in progress");
+    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
     qmp_drive_backup(backup->device, backup->target,
                      backup->has_format, backup->format,
                      backup->sync,
@@ -1613,7 +1617,6 @@ static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
         return;
     }
 
-    state->bs = bs;
     state->job = state->bs->job;
 }
 
@@ -1632,6 +1635,10 @@ static void drive_backup_clean(BlkTransactionState *common)
 {
     DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
 
+    if (state->bs) {
+        bdrv_op_unblock(state->bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+        error_free(state->blocker);
+    }
     if (state->aio_context) {
         aio_context_release(state->aio_context);
     }
-- 
2.4.0




Re: [Qemu-block] [Qemu-devel] [PATCH 4/5] qemu-io: prompt for encryption keys when required

2015-05-13 Thread Markus Armbruster
Daniel P. Berrange berra...@redhat.com writes:

> On Tue, May 12, 2015 at 12:32:53PM -0600, Eric Blake wrote:
> > On 05/12/2015 10:09 AM, Daniel P. Berrange wrote:
> > > The qemu-io tool does not check if the image is encrypted, so
> > > historically it would silently corrupt the sectors by writing
> > > plain text data into them instead of cipher text. The earlier
> > > commit turns this mistake into a fatal abort, so check for
> > > encryption and prompt for the key when required.
> > 
> > Doesn't that mean that 'git bisect' gives a crashing qemu-io for 3
> > patches?  Should this be rearranged so that 1/5 comes after this to
> > avoid triggering the abort?
> 
> I'm ambivalent on that - previously qemu-io was data corrupting
> for this scenario, so crashing isn't really that much worse :-)

If it crashes before it can corrupt anything, I'd sell it as an
improvement ;)

[...]



Re: [Qemu-block] [Qemu-devel] [PATCH 14/34] qcow2: Factor out qcow2_update_options()

2015-05-13 Thread Kevin Wolf
Am 12.05.2015 um 22:04 hat Eric Blake geschrieben:
> On 05/08/2015 11:21 AM, Kevin Wolf wrote:
> > Eventually we want to be able to change options at runtime. As a first
> > step towards that goal, separate some option handling code from the
> > general initialisation code in qcow2_open().
> > 
> > Signed-off-by: Kevin Wolf kw...@redhat.com
> > ---
> >  block/qcow2.c | 135 +-
> >  1 file changed, 76 insertions(+), 59 deletions(-)
> > 
> > +    } else {
> > +        error_setg(errp, "Unsupported value '%s' for qcow2 option "
> > +                   "'overlap-check'. Allowed are either of the following: "
> > +                   "none, constant, cached, all", opt_overlap_check);
> 
> Pre-existing due to code motion, but I find s/either/any/ easier to read.

Isn't either only for a choice between two things anyway?

The series isn't long enough yet, I'll fix it. :-)

Kevin




[Qemu-block] [PATCH v2 01/11] block: Add op blocker type device IO

2015-05-13 Thread Fam Zheng
Preventing the device from submitting IO is useful around various nested
aio_poll calls. The op blocker is a good place to put this flag.

Devices would submit IO requests through the blk_* block backend interface,
which calls blk_check_request to check the validity. Return -EBUSY if
the operation is blocked, which means device IO is temporarily disabled
by another BDS user.

Signed-off-by: Fam Zheng f...@redhat.com
---
 block/block-backend.c | 4 ++++
 include/block/block.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 93e46f3..71fc695 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -478,6 +478,10 @@ static int blk_check_request(BlockBackend *blk, int64_t sector_num,
         return -EIO;
     }
 
+    if (bdrv_op_is_blocked(blk->bs, BLOCK_OP_TYPE_DEVICE_IO, NULL)) {
+        return -EBUSY;
+    }
+
     return blk_check_byte_request(blk, sector_num * BDRV_SECTOR_SIZE,
                                   nb_sectors * BDRV_SECTOR_SIZE);
 }
diff --git a/include/block/block.h b/include/block/block.h
index 7d1a717..906fb31 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -159,6 +159,7 @@ typedef enum BlockOpType {
     BLOCK_OP_TYPE_RESIZE,
     BLOCK_OP_TYPE_STREAM,
     BLOCK_OP_TYPE_REPLACE,
+    BLOCK_OP_TYPE_DEVICE_IO,
     BLOCK_OP_TYPE_MAX,
 } BlockOpType;
 
 
-- 
2.4.0




Re: [Qemu-block] [PATCH] block/mirror: Sleep periodically during bitmap scanning

2015-05-13 Thread Paolo Bonzini


On 13/05/2015 05:11, Fam Zheng wrote:
> Before, we only yield after initializing the dirty bitmap, where the QMP
> command would return. That may take very long, and guest IO will be
> blocked.
> 
> Add sleep points like the later mirror iterations.

You were also planning to let bdrv_co_is_allocated/get_block_status
return a larger *pnum than nb_sectors---which maybe could make
nb_sectors obsolete completely, I don't know.  But this is already an
improvement.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Thanks,

Paolo

> 
> Signed-off-by: Fam Zheng f...@redhat.com
> ---
>  block/mirror.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 1a1d997..baed225 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -467,11 +467,23 @@ static void coroutine_fn mirror_run(void *opaque)
>      sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
>      mirror_free_init(s);
>  
> +    last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>      if (!s->is_none_mode) {
>          /* First part, loop on the sectors and initialize the dirty bitmap.  */
>          BlockDriverState *base = s->base;
>          for (sector_num = 0; sector_num < end; ) {
>              int64_t next = (sector_num | (sectors_per_chunk - 1)) + 1;
> +            int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> +
> +            if (now - last_pause_ns > SLICE_TIME) {
> +                last_pause_ns = now;
> +                block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, 0);
> +            }
> +
> +            if (block_job_is_cancelled(&s->common)) {
> +                goto immediate_exit;
> +            }
> +
>              ret = bdrv_is_allocated_above(bs, base,
>                                            sector_num, next - sector_num, &n);
>  
> @@ -490,7 +502,6 @@ static void coroutine_fn mirror_run(void *opaque)
>          }
>  
>          bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
> -        last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>          for (;;) {
>              uint64_t delay_ns = 0;
>              int64_t cnt;



Re: [Qemu-block] [PATCH v2 11/11] block: Block device IO during bdrv_drain and bdrv_drain_all

2015-05-13 Thread Paolo Bonzini


On 13/05/2015 13:08, Fam Zheng wrote:
> > I think this isn't enough.  It's the callers of bdrv_drain and
> > bdrv_drain_all that need to block before drain and unblock before
> > aio_context_release.
> 
> Which callers do you mean? qmp_transaction is covered in this series.

All of them. :(

In some cases it may be unnecessary.  I'm thinking of bdrv_set_aio_context,
and mig_save_device_dirty, but that's probably the exception rather than
the rule.

In some cases, like bdrv_snapshot_delete, the change is trivial.

The only somewhat more complex case is probably block/mirror.c.  There,
the aio_context_release happens through block_job_sleep_ns or
qemu_coroutine_yield.  I guess the change would look something like
this sketch:

diff --git a/block/mirror.c b/block/mirror.c
index 58f391a..dcfede0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -513,16 +513,29 @@ static void coroutine_fn mirror_run(void *opaque)
 
         if (cnt == 0 && should_complete) {
             /* The dirty bitmap is not updated while operations are pending.
-             * If we're about to exit, wait for pending operations before
-             * calling bdrv_get_dirty_count(bs), or we may exit while the
+             * If we're about to exit, wait for pending operations and
+             * recheck bdrv_get_dirty_count(bs), or we may exit while the
              * source has dirty data to copy!
              *
              * Note that I/O can be submitted by the guest while
-             * mirror_populate runs.
+             * the rest of mirror_populate runs, but we lock it out here.
              */
             trace_mirror_before_drain(s, cnt);
+
+            // ... block ...
             bdrv_drain(bs);
             cnt = bdrv_get_dirty_count(s->dirty_bitmap);
+
+            if (cnt == 0) {
+                /* The two disks are in sync.  Exit and report successful
+                 * completion.
+                 */
+                assert(s->synced && QLIST_EMPTY(&bs->tracked_requests));
+                s->common.cancelled = false;
+                break; // ... and unblock somewhere else...
+            }
+
+            // ... unblock ...
         }
 
         ret = 0;
@@ -535,13 +549,6 @@ static void coroutine_fn mirror_run(void *opaque)
         } else if (!should_complete) {
             delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
             block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
-        } else if (cnt == 0) {
-            /* The two disks are in sync.  Exit and report successful
-             * completion.
-             */
-            assert(QLIST_EMPTY(&bs->tracked_requests));
-            s->common.cancelled = false;
-            break;
         }
         last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     }


It can be the topic of a separate series.  But this patch brings a
false sense of security (either the blocker is unnecessary, or it
needs to last after bdrv_drain returns), so I think it should be
dropped.

Paolo



[Qemu-block] [PATCH v2 07/11] blockdev: Block device IO during internal snapshot transaction

2015-05-13 Thread Fam Zheng
Signed-off-by: Fam Zheng f...@redhat.com
---
 blockdev.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 5eaf77e..7f763d9 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1262,6 +1262,7 @@ typedef struct InternalSnapshotState {
     BlockDriverState *bs;
     AioContext *aio_context;
     QEMUSnapshotInfo sn;
+    Error *blocker;
 } InternalSnapshotState;
 
 static void internal_snapshot_prepare(BlkTransactionState *common,
@@ -1300,6 +1301,10 @@ static void internal_snapshot_prepare(BlkTransactionState *common,
     state->aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(state->aio_context);
 
+    state->bs = bs;
+    error_setg(&state->blocker, "internal snapshot in progress");
+    bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+
     if (!bdrv_is_inserted(bs)) {
         error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
         return;
@@ -1354,9 +1359,6 @@ static void internal_snapshot_prepare(BlkTransactionState *common,
                   name, device);
         return;
     }
-
-    /* 4. succeed, mark a snapshot is created */
-    state->bs = bs;
 }
 
 static void internal_snapshot_abort(BlkTransactionState *common)
@@ -1387,6 +1389,10 @@ static void internal_snapshot_clean(BlkTransactionState *common)
     InternalSnapshotState *state = DO_UPCAST(InternalSnapshotState,
                                              common, common);
 
+    if (state->bs) {
+        bdrv_op_unblock(state->bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
+        error_free(state->blocker);
+    }
     if (state->aio_context) {
         aio_context_release(state->aio_context);
     }
-- 
2.4.0




Re: [Qemu-block] [PATCH] qemu-io: Use getopt() correctly

2015-05-13 Thread Kevin Wolf
On 12.05.2015 at 17:10, Eric Blake wrote:
 POSIX says getopt() returns -1 on completion.  While Linux happens
 to define EOF as -1, this definition is not required by POSIX, and
 there may be platforms where checking for EOF instead of -1 would
 lead to an infinite loop.
 
 Signed-off-by: Eric Blake ebl...@redhat.com

Thanks, applied to my block branch.

Kevin



Re: [Qemu-block] [PATCH 15/34] qcow2: Move qcow2_update_options() call up

2015-05-13 Thread Max Reitz

On 08.05.2015 19:21, Kevin Wolf wrote:

qcow2_update_options() only updates some variables in BDRVQcowState and
doesn't really depend on other parts of it being initialised yet, so it
can be moved so that it immediately follows the other half of option
handling code in qcow2_open().

Signed-off-by: Kevin Wolf kw...@redhat.com
---
  block/qcow2.c | 18 +-
  1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index db535d4..761ba30 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -911,6 +911,15 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
  goto fail;
  }
  
+/* Enable lazy_refcounts according to image and command line options */


Again, I find the comment not very fitting. But the motion itself is 
okay, so regardless of what comment you are moving here:


Reviewed-by: Max Reitz mre...@redhat.com


+ret = qcow2_update_options(bs, opts, flags, errp);
+if (ret < 0) {
+goto fail;
+}
+
+qemu_opts_del(opts);
+opts = NULL;
+
  s->cluster_cache = g_malloc(s->cluster_size);
  /* one more sector for decompressed data alignment */
  s->cluster_data = qemu_try_blockalign(bs->file, QCOW_MAX_CRYPT_CLUSTERS
@@ -995,15 +1004,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
  }
  }
  
-/* Enable lazy_refcounts according to image and command line options */
-ret = qcow2_update_options(bs, opts, flags, errp);
-if (ret < 0) {
-goto fail;
-}
-
-qemu_opts_del(opts);
-opts = NULL;
-
  #ifdef DEBUG_ALLOC
  {
  BdrvCheckResult result = {0};





Re: [Qemu-block] [PATCH 17/34] qcow2: Leave s unchanged on qcow2_update_options() failure

2015-05-13 Thread Max Reitz

On 08.05.2015 19:21, Kevin Wolf wrote:

On return, either all new options should be applied to BDRVQcowState (on
success), or all of the old setting should be preserved (on failure).

Signed-off-by: Kevin Wolf kw...@redhat.com
---
  block/qcow2.c | 52 
  1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index a4a267d..abe22f3 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -546,6 +546,9 @@ static int qcow2_update_options(BlockDriverState *bs, QDict *options,
  const char *opt_overlap_check, *opt_overlap_check_template;
  int overlap_check_template = 0;
  uint64_t l2_cache_size, refcount_cache_size;
+Qcow2Cache* l2_table_cache;
+Qcow2Cache* refcount_block_cache;


;-)

Because patch 18 exists:

Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-block] [PATCH 18/34] qcow2: Fix memory leak in qcow2_update_options() error path

2015-05-13 Thread Max Reitz

On 13.05.2015 14:02, Kevin Wolf wrote:

Am 13.05.2015 um 13:52 hat Max Reitz geschrieben:

On 08.05.2015 19:21, Kevin Wolf wrote:

Signed-off-by: Kevin Wolf kw...@redhat.com
---
  block/qcow2.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index abe22f3..84d6e0f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -546,8 +546,8 @@ static int qcow2_update_options(BlockDriverState *bs, QDict *options,
  const char *opt_overlap_check, *opt_overlap_check_template;
  int overlap_check_template = 0;
  uint64_t l2_cache_size, refcount_cache_size;
-Qcow2Cache* l2_table_cache;
-Qcow2Cache* refcount_block_cache;
+Qcow2Cache* l2_table_cache = NULL;
+Qcow2Cache* refcount_block_cache = NULL;
  bool use_lazy_refcounts;
  int i;
  Error *local_err = NULL;
@@ -670,6 +670,14 @@ static int qcow2_update_options(BlockDriverState *bs, QDict *options,
  ret = 0;
  fail:
+if (ret < 0) {
+if (l2_table_cache) {
+qcow2_cache_destroy(bs, l2_table_cache);
+}
+if (refcount_block_cache) {
+qcow2_cache_destroy(bs, refcount_block_cache);
+}
+}
  qemu_opts_del(opts);
  opts = NULL;

Why don't you squash this into patch 17 (I guess there is a reason,
but I fail to see it)?

It's a preexisting bug and its fix is unrelated to any of the
refactoring or preparation for reopen. So I think it deserves its own
commit, and if it doesn't, it's not clear to me why it should belong to
patch 17 of all patches.


Because patch 17 is introducing the {l2_table,refcount_block}_cache as 
local variables.



Also, I think it might make sense to either set
{l2_table,refcount_block}_cache to NULL once they have been moved to
s-{l2_table,refcount_block}_cache, or, even better, exchange them
with their respective field in *s. Thus, you could drop the if (ret
< 0), I think I'd find the code easier to read, and with the
transaction, we'd even be safe if the options had been set before and
are now to be updated.

The if (ret < 0) disappears shortly after this patch when this is moved
into the abort part of a transaction.


Right, I just saw this. :-)

Max




[Qemu-block] [PATCH 01/11] block: keep a list of block jobs

2015-05-13 Thread Alberto Garcia
The current way to obtain the list of existing block jobs is to
iterate over all root nodes and check which ones own a job.

Since we want to be able to support block jobs in other nodes as well,
this patch keeps a list of jobs that is updated every time one is
created or destroyed.

This also updates qmp_query_block_jobs() and bdrv_drain_all() to use
this new list.

Signed-off-by: Alberto Garcia be...@igalia.com
Cc: Max Reitz mre...@redhat.com
Cc: Eric Blake ebl...@redhat.com
---
 block/io.c   | 21 -
 blockdev.c   | 19 ---
 blockjob.c   | 13 +
 include/block/blockjob.h | 14 ++
 4 files changed, 43 insertions(+), 24 deletions(-)

diff --git a/block/io.c b/block/io.c
index 1ce62c4..c38d776 100644
--- a/block/io.c
+++ b/block/io.c
@@ -310,21 +310,19 @@ void bdrv_drain_all(void)
 {
 /* Always run first iteration so any pending completion BHs run */
 bool busy = true;
-BlockDriverState *bs = NULL;
+BlockJob *job = NULL;
 
-while ((bs = bdrv_next(bs))) {
-AioContext *aio_context = bdrv_get_aio_context(bs);
+while ((job = block_job_next(job))) {
+AioContext *aio_context = bdrv_get_aio_context(job->bs);
 
 aio_context_acquire(aio_context);
-if (bs->job) {
-block_job_pause(bs->job);
-}
+block_job_pause(job);
 aio_context_release(aio_context);
 }
 
 while (busy) {
+BlockDriverState *bs = NULL;
 busy = false;
-bs = NULL;
 
 while ((bs = bdrv_next(bs))) {
 AioContext *aio_context = bdrv_get_aio_context(bs);
@@ -335,14 +333,11 @@ void bdrv_drain_all(void)
 }
 }
 
-bs = NULL;
-while ((bs = bdrv_next(bs))) {
-AioContext *aio_context = bdrv_get_aio_context(bs);
+while ((job = block_job_next(job))) {
+AioContext *aio_context = bdrv_get_aio_context(job->bs);
 
 aio_context_acquire(aio_context);
-if (bs->job) {
-block_job_resume(bs->job);
-}
+block_job_resume(job);
 aio_context_release(aio_context);
 }
 }
diff --git a/blockdev.c b/blockdev.c
index 5eaf77e..bf36a0e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3073,21 +3073,18 @@ fail:
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
 BlockJobInfoList *head = NULL, **p_next = &head;
-BlockDriverState *bs;
+BlockJob *job;
 
-for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
-AioContext *aio_context = bdrv_get_aio_context(bs);
+for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1);
+AioContext *aio_context = bdrv_get_aio_context(job->bs);
 
 aio_context_acquire(aio_context);
-
-if (bs->job) {
-BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1);
-elem->value = block_job_query(bs->job);
-*p_next = elem;
-p_next = &elem->next;
-}
-
+elem->value = block_job_query(job);
 aio_context_release(aio_context);
+
+*p_next = elem;
+p_next = &elem->next;
 }
 
 return head;
diff --git a/blockjob.c b/blockjob.c
index 2755465..c46984d 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -35,6 +35,16 @@
 #include "qemu/timer.h"
 #include "qapi-event.h"
 
+static QLIST_HEAD(, BlockJob) block_jobs = QLIST_HEAD_INITIALIZER(block_jobs);
+
+BlockJob *block_job_next(BlockJob *job)
+{
+if (!job) {
+return QLIST_FIRST(block_jobs);
+}
+return QLIST_NEXT(job, job_list);
+}
+
 void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
int64_t speed, BlockCompletionFunc *cb,
void *opaque, Error **errp)
@@ -73,6 +83,8 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
 return NULL;
 }
 }
+
+QLIST_INSERT_HEAD(block_jobs, job, job_list);
 return job;
 }
 
@@ -85,6 +97,7 @@ void block_job_completed(BlockJob *job, int ret)
 bs->job = NULL;
 bdrv_op_unblock_all(bs, job->blocker);
 error_free(job->blocker);
+QLIST_REMOVE(job, job_list);
 g_free(job);
 }
 
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 57d8ef1..5431dd2 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -102,6 +102,9 @@ struct BlockJob {
  */
 bool ready;
 
+/** Element of the list of block jobs */
+QLIST_ENTRY(BlockJob) job_list;
+
 /** Status that is published by the query-block-jobs QMP API */
 BlockDeviceIoStatus iostatus;
 
@@ -125,6 +128,17 @@ struct BlockJob {
 };
 
 /**
+ * block_job_next:
+ * @job: A block job, or %NULL.
+ *
+ * Get the next element from the list of block jobs after @job, or the
+ * first one if @job is %NULL.
+ *
+ * Returns the requested job, or %NULL if there are no more jobs left.
+ */
+BlockJob *block_job_next(BlockJob *job);
+
+/**
  * block_job_create:
  * 

[Qemu-block] [PATCH 03/11] block: never cancel a streaming job without running stream_complete()

2015-05-13 Thread Alberto Garcia
We need to call stream_complete() in order to do all the necessary
clean-ups, even if there's an early failure. At the moment it's only
useful to make sure that s->backing_file_str is not leaked, but it
will become more important as we introduce support for streaming to
any intermediate node.

Signed-off-by: Alberto Garcia be...@igalia.com
Reviewed-by: Max Reitz mre...@redhat.com
---
 block/stream.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index a628901..37bfd8b 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -114,21 +114,21 @@ static void coroutine_fn stream_run(void *opaque)
 StreamCompleteData *data;
 BlockDriverState *bs = s->common.bs;
 BlockDriverState *base = s->base;
-int64_t sector_num, end;
+int64_t sector_num = 0;
+int64_t end = -1;
 int error = 0;
 int ret = 0;
 int n = 0;
 void *buf;
 
 if (!bs->backing_hd) {
-block_job_completed(&s->common, 0);
-return;
+goto out;
 }
 
 s->common.len = bdrv_getlength(bs);
 if (s->common.len < 0) {
-block_job_completed(&s->common, s->common.len);
-return;
+ret = s->common.len;
+goto out;
 }
 
 end = s->common.len >> BDRV_SECTOR_BITS;
@@ -215,6 +215,7 @@ wait:
 
 qemu_vfree(buf);
 
+out:
 /* Modify backing chain and close BDSes in main loop */
 data = g_malloc(sizeof(*data));
 data->ret = ret;
-- 
2.1.4




[Qemu-block] [PATCH 09/11] qemu-iotests: test streaming to an intermediate layer

2015-05-13 Thread Alberto Garcia
This adds test_stream_intermediate(), similar to test_stream() but
streams to the intermediate image instead.

Signed-off-by: Alberto Garcia be...@igalia.com
Reviewed-by: Max Reitz mre...@redhat.com
---
 tests/qemu-iotests/030 | 18 +-
 tests/qemu-iotests/030.out |  4 ++--
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index bc53885..0927457 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -36,7 +36,8 @@ class TestSingleDrive(iotests.QMPTestCase):
 qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % mid_img, test_img)
 qemu_io('-f', 'raw', '-c', 'write -P 0x1 0 512', backing_img)
 qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0x1 524288 512', mid_img)
-self.vm = iotests.VM().add_drive("blkdebug::" + test_img)
+opts = "backing.node-name=mid"
+self.vm = iotests.VM().add_drive("blkdebug::" + test_img, opts)
 self.vm.launch()
 
 def tearDown(self):
@@ -60,6 +61,21 @@ class TestSingleDrive(iotests.QMPTestCase):
  qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img),
 'image file map does not match backing file after streaming')
 
+def test_stream_intermediate(self):
+self.assert_no_active_block_jobs()
+
+result = self.vm.qmp('block-stream', device='mid')
+self.assert_qmp(result, 'return', {})
+
+self.wait_until_completed(drive='mid')
+
+self.assert_no_active_block_jobs()
+self.vm.shutdown()
+
+self.assertEqual(qemu_io('-f', 'raw', '-c', 'map', backing_img),
+ qemu_io('-f', iotests.imgfmt, '-c', 'map', mid_img),
+ 'image file map does not match backing file after streaming')
+
 def test_stream_pause(self):
 self.assert_no_active_block_jobs()
 
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index 6323079..96961ed 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-..
+...
 --
-Ran 14 tests
+Ran 15 tests
 
 OK
-- 
2.1.4




Re: [Qemu-block] [PATCH 19/34] qcow2: Make qcow2_update_options() suitable for transactions

2015-05-13 Thread Max Reitz

On 08.05.2015 19:21, Kevin Wolf wrote:

Before we can allow updating options at runtime with bdrv_reopen(), we
need to split the function into prepare/commit/abort parts.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
  block/qcow2.c | 101 ++
  1 file changed, 67 insertions(+), 34 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 84d6e0f..fc9375e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -538,17 +538,24 @@ static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
  }
  }
  
-static int qcow2_update_options(BlockDriverState *bs, QDict *options,
-int flags, Error **errp)
+typedef struct Qcow2ReopenState {
+Qcow2Cache* l2_table_cache;
+Qcow2Cache* refcount_block_cache;


*cough*


+bool use_lazy_refcounts;
+int overlap_check;
+bool discard_passthrough[QCOW2_DISCARD_MAX];
+} Qcow2ReopenState;
+
+static int qcow2_update_options_prepare(BlockDriverState *bs,
+Qcow2ReopenState *r,
+QDict *options, int flags,
+Error **errp)
  {
 BDRVQcowState *s = bs->opaque;
  QemuOpts *opts = NULL;
  const char *opt_overlap_check, *opt_overlap_check_template;
  int overlap_check_template = 0;
  uint64_t l2_cache_size, refcount_cache_size;
-Qcow2Cache* l2_table_cache = NULL;
-Qcow2Cache* refcount_block_cache = NULL;
-bool use_lazy_refcounts;
  int i;
  Error *local_err = NULL;
  int ret;
@@ -590,18 +597,18 @@ static int qcow2_update_options(BlockDriverState *bs, QDict *options,
  }
  
 /* alloc L2 table/refcount block cache */
-l2_table_cache = qcow2_cache_create(bs, l2_cache_size);
-refcount_block_cache = qcow2_cache_create(bs, refcount_cache_size);
-if (l2_table_cache == NULL || refcount_block_cache == NULL) {
+r->l2_table_cache = qcow2_cache_create(bs, l2_cache_size);
+r->refcount_block_cache = qcow2_cache_create(bs, refcount_cache_size);
+if (r->l2_table_cache == NULL || r->refcount_block_cache == NULL) {
 error_setg(errp, "Could not allocate metadata caches");
  ret = -ENOMEM;
  goto fail;
  }
  
 /* Enable lazy_refcounts according to image and command line options */
-use_lazy_refcounts = qemu_opt_get_bool(opts, QCOW2_OPT_LAZY_REFCOUNTS,
+r->use_lazy_refcounts = qemu_opt_get_bool(opts, QCOW2_OPT_LAZY_REFCOUNTS,
 (s->compatible_features & QCOW2_COMPAT_LAZY_REFCOUNTS));
-if (use_lazy_refcounts && s->qcow_version < 3) {
+if (r->use_lazy_refcounts && s->qcow_version < 3) {
 error_setg(errp, "Lazy refcounts require a qcow2 image with at least "
"qemu 1.1 compatibility level");
  ret = -EINVAL;
@@ -640,46 +647,72 @@ static int qcow2_update_options(BlockDriverState *bs, QDict *options,
  goto fail;
  }
  
-/*
- * Start updating fields in BDRVQcowState.
- * After this point no failure is allowed any more.
- */
-s->overlap_check = 0;
+r->overlap_check = 0;
 for (i = 0; i < QCOW2_OL_MAX_BITNR; i++) {
 /* overlap-check defines a template bitmask, but every flag may be
  * overwritten through the associated boolean option */
-s->overlap_check |=
+r->overlap_check |=
 qemu_opt_get_bool(opts, overlap_bool_option_names[i],
   overlap_check_template & (1 << i)) << i;
  }
  
-s->l2_table_cache = l2_table_cache;
-s->refcount_block_cache = refcount_block_cache;
-
-s->use_lazy_refcounts = use_lazy_refcounts;
-
-s->discard_passthrough[QCOW2_DISCARD_NEVER] = false;
-s->discard_passthrough[QCOW2_DISCARD_ALWAYS] = true;
-s->discard_passthrough[QCOW2_DISCARD_REQUEST] =
+r->discard_passthrough[QCOW2_DISCARD_NEVER] = false;
+r->discard_passthrough[QCOW2_DISCARD_ALWAYS] = true;
+r->discard_passthrough[QCOW2_DISCARD_REQUEST] =
 qemu_opt_get_bool(opts, QCOW2_OPT_DISCARD_REQUEST,
   flags & BDRV_O_UNMAP);
-s->discard_passthrough[QCOW2_DISCARD_SNAPSHOT] =
+r->discard_passthrough[QCOW2_DISCARD_SNAPSHOT] =
 qemu_opt_get_bool(opts, QCOW2_OPT_DISCARD_SNAPSHOT, true);
-s->discard_passthrough[QCOW2_DISCARD_OTHER] =
+r->discard_passthrough[QCOW2_DISCARD_OTHER] =
 qemu_opt_get_bool(opts, QCOW2_OPT_DISCARD_OTHER, false);
  
  ret = 0;

  fail:
-if (ret < 0) {
-if (l2_table_cache) {
-qcow2_cache_destroy(bs, l2_table_cache);
-}
-if (refcount_block_cache) {
-qcow2_cache_destroy(bs, refcount_block_cache);
-}
-}
  qemu_opts_del(opts);
  opts = NULL;
+return ret;
+}
+
+static void qcow2_update_options_commit(BlockDriverState *bs,
+Qcow2ReopenState *r)
+{
+BDRVQcowState *s = bs->opaque;
+int i;
+
+s->l2_table_cache = 

[Qemu-block] [PATCH 07/11] qemu-iotests: fix test_stream_partial()

2015-05-13 Thread Alberto Garcia
This test is streaming to the top layer using the intermediate image
as the base. This is a mistake since block-stream never copies data
from the base image and its backing chain, so this is effectively a
no-op.

In addition to fixing the base parameter, this patch also writes some
data to the intermediate image before the test, so there's something
to copy and the test is meaningful.

Signed-off-by: Alberto Garcia be...@igalia.com
Reviewed-by: Max Reitz mre...@redhat.com
---
 tests/qemu-iotests/030 | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 7ca9b25..6e6cb5a 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -35,6 +35,7 @@ class TestSingleDrive(iotests.QMPTestCase):
 qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, mid_img)
 qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % mid_img, test_img)
 qemu_io('-f', 'raw', '-c', 'write -P 0x1 0 512', backing_img)
+qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0x1 524288 512', mid_img)
 self.vm = iotests.VM().add_drive("blkdebug::" + test_img)
 self.vm.launch()
 
@@ -93,7 +94,7 @@ class TestSingleDrive(iotests.QMPTestCase):
 def test_stream_partial(self):
 self.assert_no_active_block_jobs()
 
-result = self.vm.qmp('block-stream', device='drive0', base=mid_img)
+result = self.vm.qmp('block-stream', device='drive0', base=backing_img)
 self.assert_qmp(result, 'return', {})
 
 self.wait_until_completed()
-- 
2.1.4




[Qemu-block] [PATCH v7 00/11] Support streaming to an intermediate layer

2015-05-13 Thread Alberto Garcia
v7:
- Rebased against the current master
- Updated bdrv_drain_all() to use the new block_job_next() API.

v6: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg03046.html
- fix the no-op test following Max's suggestions

v5: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg03006.html
- Fix a few typos
- Minor documentation updates
- Update test_stream_partial() to test no-ops
- New test case: test_stream_parallel()
- New test case: test_stream_overlapping()

v4: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01878.html
- Refactor find_block_job to use the error from bdrv_lookup_bs()
- Don't use QERR_DEVICE_IN_USE in block_job_create() since we can be
  dealing with nodes now.
- Fix @device comment in the BlockJobInfo documentation
- stream_start(): simplify the bdrv_reopen() call and use
  bdrv_get_device_or_node_name() for error messages.
- Use a different variable name for BlockDriverState *i
- Documentation fixes in docs/live-block-ops.txt
- Update iotest 30 since now test_device_not_found() returns
  GenericError
- Fix test case test_stream_partial()
- Add new test case test_stream_intermediate()
- Fix typos

v3: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg00806.html
- Keep a list of block jobs and make qmp_query_block_jobs() iterate
  over it.

v2: https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg04798.html
- The 'block-stream' command does not have a 'node-name' parameter
  anymore and reuses 'device' for that purpose.
- Block jobs can now be owned by any intermediate node, and not just
  by the ones at the root. query-block-jobs is updated to reflect that
  change.
- The 'device' parameter of all 'block-job-*' commands can now take a
  node name.
- The BlockJobInfo type and all BLOCK_JOB_* events report the node
  name in the 'device' field if the node does not have a device name.
- All intermediate nodes are blocked (and checked for blockers) during
  the streaming operation.

v1: https://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04116.html

Regards,

Berto

Alberto Garcia (11):
  block: keep a list of block jobs
  block: allow block jobs in any arbitrary node
  block: never cancel a streaming job without running stream_complete()
  block: Support streaming to an intermediate layer
  block: Add QMP support for streaming to an intermediate layer
  docs: Document how to stream to an intermediate layer
  qemu-iotests: fix test_stream_partial()
  qemu-iotests: add no-op streaming test
  qemu-iotests: test streaming to an intermediate layer
  qemu-iotests: test block-stream operations in parallel
  qemu-iotests: test overlapping block-stream operations

 block.c|   4 +-
 block/io.c |  21 +++
 block/mirror.c |   5 +-
 block/stream.c |  44 --
 blockdev.c |  55 -
 blockjob.c |  31 +++---
 docs/live-block-ops.txt|  31 ++
 docs/qmp/qmp-events.txt|   8 +--
 include/block/blockjob.h   |  14 +
 include/qapi/qmp/qerror.h  |   3 -
 qapi/block-core.json   |  30 +
 tests/qemu-iotests/030 | 148 -
 tests/qemu-iotests/030.out |   4 +-
 13 files changed, 304 insertions(+), 94 deletions(-)

-- 
2.1.4




Re: [Qemu-block] [PATCH 23/34] block: Pass driver-specific options to .bdrv_refresh_filename()

2015-05-13 Thread Max Reitz

On 08.05.2015 19:21, Kevin Wolf wrote:

In order to decide whether a blkdebug: filename can be produced or a
json: one is necessary, blkdebug checked whether bs->options had more
options than just "config", "x-image" or "image" (the latter including
nested options). That doesn't work well when generic block layer options
are present.

This patch passes an option QDict to the driver that contains only
driver-specific options, i.e. the options for the general block layer as
well as child nodes are already filtered out. Works much better this
way.


Indeed. :-)


Signed-off-by: Kevin Wolf kw...@redhat.com
---
  block.c   |  5 -
  block/blkdebug.c  | 17 ++---
  block/blkverify.c |  2 +-
  block/nbd.c   | 10 +-
  block/quorum.c|  2 +-
  include/block/block_int.h |  2 +-
  6 files changed, 18 insertions(+), 20 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



[Qemu-block] [PATCH 05/11] block: Add QMP support for streaming to an intermediate layer

2015-05-13 Thread Alberto Garcia
This patch makes the 'device' parameter of the 'block-stream' command
accept a node name as well as a device name.

In addition to that, operation blockers will be checked in all
intermediate nodes between the top and the base node.

Since qmp_block_stream() now uses the error from bdrv_lookup_bs() and
no longer returns DeviceNotFound, iotest 030 is updated to expect
GenericError instead.

Signed-off-by: Alberto Garcia be...@igalia.com
Reviewed-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 blockdev.c | 20 ++--
 qapi/block-core.json   | 10 +++---
 tests/qemu-iotests/030 |  2 +-
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 1fcf466..85d2d5e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2263,8 +2263,7 @@ void qmp_block_stream(const char *device,
   bool has_on_error, BlockdevOnError on_error,
   Error **errp)
 {
-BlockBackend *blk;
-BlockDriverState *bs;
+BlockDriverState *bs, *iter;
 BlockDriverState *base_bs = NULL;
 AioContext *aio_context;
 Error *local_err = NULL;
@@ -2274,20 +2273,14 @@ void qmp_block_stream(const char *device,
 on_error = BLOCKDEV_ON_ERROR_REPORT;
 }
 
-blk = blk_by_name(device);
-if (!blk) {
-error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+bs = bdrv_lookup_bs(device, device, errp);
+if (!bs) {
 return;
 }
-bs = blk_bs(blk);
 
 aio_context = bdrv_get_aio_context(bs);
 aio_context_acquire(aio_context);
 
-if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_STREAM, errp)) {
-goto out;
-}
-
 if (has_base) {
 base_bs = bdrv_find_backing_image(bs, base);
 if (base_bs == NULL) {
@@ -2298,6 +2291,13 @@ void qmp_block_stream(const char *device,
 base_name = base;
 }
 
+/* Check for op blockers in the whole chain between bs and base */
+for (iter = bs; iter && iter != base_bs; iter = iter->backing_hd) {
+if (bdrv_op_is_blocked(iter, BLOCK_OP_TYPE_STREAM, errp)) {
+goto out;
+}
+}
+
 /* if we are streaming the entire chain, the result will have no backing
  * file, and specifying one is therefore an error */
 if (base_bs == NULL  has_backing_file) {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9b5cddd..769b6c7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1095,6 +1095,10 @@
 # with query-block-jobs.  The operation can be stopped before it has completed
 # using the block-job-cancel command.
 #
+# The node that receives the data is called the top image, can be located
+# in any part of the whole chain and can be specified using its device
+# or node name.
+#
 # If a base file is specified then sectors are not copied from that base file and
 # its backing chain.  When streaming completes the image file will have the base
 # file as its backing file.  This can be used to stream a subset of the backing
@@ -1103,12 +1107,12 @@
 # On successful completion the image file is updated to drop the backing file
 # and the BLOCK_JOB_COMPLETED event is emitted.
 #
-# @device: the device name
+# @device: the device or node name of the top image
 #
 # @base:   #optional the common backing file name
 #
-# @backing-file: #optional The backing file string to write into the active
-#  layer. This filename is not validated.
+# @backing-file: #optional The backing file string to write into the top
+#  image. This filename is not validated.
 #
 #  If a pathname string is such that it cannot be
 #  resolved by QEMU, that means that subsequent QMP or
diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 952a524..7ca9b25 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -107,7 +107,7 @@ class TestSingleDrive(iotests.QMPTestCase):
 
 def test_device_not_found(self):
 result = self.vm.qmp('block-stream', device='nonexistent')
-self.assert_qmp(result, 'error/class', 'DeviceNotFound')
+self.assert_qmp(result, 'error/class', 'GenericError')
 
 
 class TestSmallerBackingFile(iotests.QMPTestCase):
-- 
2.1.4




[Qemu-block] [PATCH 10/11] qemu-iotests: test block-stream operations in parallel

2015-05-13 Thread Alberto Garcia
This test case checks that it's possible to launch several stream
operations in parallel in the same snapshot chain, each one involving
a different set of nodes.

Signed-off-by: Alberto Garcia be...@igalia.com
Reviewed-by: Max Reitz mre...@redhat.com
---
 tests/qemu-iotests/030 | 80 ++
 tests/qemu-iotests/030.out |  4 +--
 2 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 0927457..c199cef 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -145,6 +145,86 @@ class TestSingleDrive(iotests.QMPTestCase):
 self.assert_qmp(result, 'error/class', 'GenericError')
 
 
+class TestParallelOps(iotests.QMPTestCase):
+image_len = 2 * 1024 * 1024 # MB
+num_ops = 4 # Number of parallel block-stream operations
+num_imgs = num_ops * 2 + 1
+imgs = []
+
+def setUp(self):
+opts = []
+self.imgs = []
+
+# Initialize file names and command-line options
+for i in range(self.num_imgs):
+img_depth = self.num_imgs - i - 1
+opts.append("backing." * img_depth + "node-name=node%d" % i)
+self.imgs.append(os.path.join(iotests.test_dir, 'img-%d.img' % i))
+
+# Create all images
+iotests.create_image(self.imgs[0], self.image_len)
+for i in range(1, self.num_imgs):
+qemu_img('create', '-f', iotests.imgfmt,
+ '-o', 'backing_file=%s' % self.imgs[i-1], self.imgs[i])
+
+# Put data into the images we are copying data from
+for i in range(1, self.num_imgs, 2):
+qemu_io('-f', iotests.imgfmt,
+'-c', 'write -P %d %d 128K' % (i, i*128*1024), self.imgs[i])
+
+# Attach the drive to the VM
+self.vm = iotests.VM()
+self.vm.add_drive("blkdebug::" + self.imgs[-1], ','.join(opts))
+self.vm.launch()
+
+def tearDown(self):
+self.vm.shutdown()
+for img in self.imgs:
+os.remove(img)
+
+# Test that it's possible to run several block-stream operations
+# in parallel in the same snapshot chain
+def test_stream_parallel(self):
+self.assert_no_active_block_jobs()
+
+# Check that the maps don't match before the streaming operations
+for i in range(2, self.num_imgs, 2):
+self.assertNotEqual(qemu_io('-f', iotests.imgfmt, '-c', 'map', self.imgs[i]),
+qemu_io('-f', iotests.imgfmt, '-c', 'map', self.imgs[i-1]),
+'image file map matches backing file before streaming')
+
+# Create all streaming jobs
+pending_jobs = []
+for i in range(2, self.num_imgs, 2):
+node_name = 'node%d' % i
+pending_jobs.append(node_name)
+result = self.vm.qmp('block-stream', device=node_name, base=self.imgs[i-2], speed=32768)
+self.assert_qmp(result, 'return', {})
+
+# The block job on the active image is always referenced by
+# its device name. Therefore we have to replace the node name
+# with the device name in the list of pending jobs
+pending_jobs.pop()
+pending_jobs.append('drive0')
+
+# Wait for all jobs to be finished.
+while len(pending_jobs) > 0:
+for event in self.vm.get_qmp_events(wait=True):
+if event['event'] == 'BLOCK_JOB_COMPLETED':
+node_name = self.dictpath(event, 'data/device')
+self.assertTrue(node_name in pending_jobs)
+self.assert_qmp_absent(event, 'data/error')
+pending_jobs.remove(node_name)
+
+self.assert_no_active_block_jobs()
+self.vm.shutdown()
+
+# Check that all maps match now
+for i in range(2, self.num_imgs, 2):
+self.assertEqual(qemu_io('-f', iotests.imgfmt, '-c', 'map', self.imgs[i]),
+ qemu_io('-f', iotests.imgfmt, '-c', 'map', self.imgs[i-1]),
+ 'image file map does not match backing file after streaming')
+
 class TestSmallerBackingFile(iotests.QMPTestCase):
 backing_len = 1 * 1024 * 1024 # MB
 image_len = 2 * backing_len
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index 96961ed..b6f2576 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-...
+
 --
-Ran 15 tests
+Ran 16 tests
 
 OK
-- 
2.1.4




[Qemu-block] [PATCH 06/11] docs: Document how to stream to an intermediate layer

2015-05-13 Thread Alberto Garcia
Signed-off-by: Alberto Garcia be...@igalia.com
Reviewed-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 docs/live-block-ops.txt | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/docs/live-block-ops.txt b/docs/live-block-ops.txt
index a257087..a05d869 100644
--- a/docs/live-block-ops.txt
+++ b/docs/live-block-ops.txt
@@ -10,9 +10,9 @@ Snapshot live merge
 Given a snapshot chain, described in this document in the following
 format:
 
-[A] <- [B] <- [C] <- [D]
+[A] <- [B] <- [C] <- [D] <- [E]
 
-Where the rightmost object ([D] in the example) described is the current
+Where the rightmost object ([E] in the example) described is the current
 image which the guest OS has write access to. To the left of it is its base
 image, and so on accordingly until the leftmost image, which has no
 base.
@@ -21,11 +21,14 @@ The snapshot live merge operation transforms such a chain 
into a
 smaller one with fewer elements, such as this transformation relative
 to the first example:
 
-[A] <- [D]
+[A] <- [E]
 
-Currently only forward merge with target being the active image is
-supported, that is, data copy is performed in the right direction with
-destination being the rightmost image.
+Data is copied in the right direction with destination being the
+rightmost image, but any other intermediate image can be specified
+instead. In this example data is copied from [C] into [D], so [D] can
+be backed by [B]:
+
+[A] <- [B] <- [D] <- [E]
 
 The operation is implemented in QEMU through image streaming facilities.
 
@@ -35,14 +38,20 @@ streaming operation completes it raises a QMP event. 
'block_stream'
 copies data from the backing file(s) into the active image. When finished,
 it adjusts the backing file pointer.
 
-The 'base' parameter specifies an image which data need not be streamed from.
-This image will be used as the backing file for the active image when the
-operation is finished.
+The 'base' parameter specifies an image which data need not be
+streamed from. This image will be used as the backing file for the
+destination image when the operation is finished.
+
+In the first example above, the command would be:
+
+(qemu) block_stream virtio0 file-A.img
 
-In the example above, the command would be:
+In order to specify a destination image different from the active
+(rightmost) one we can use its (previously set) node name instead.
 
-(qemu) block_stream virtio0 A
+In the second example above, the command would be:
 
+(qemu) block_stream node-D file-B.img
 
 Live block copy
 ===
-- 
2.1.4
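The two HMP examples in the documentation patch above have straightforward QMP equivalents. The following is an editorial sketch of how the same requests would be built over QMP; the node and file names come from the document's examples, and the argument names assume the 'block-stream' QMP schema:

```python
# Sketch only: QMP equivalents of the HMP 'block_stream' examples above.
# The device/node and file names come from the documentation examples.

def stream_to_active(device, base):
    """Stream everything above 'base' into the active (rightmost) image."""
    return {"execute": "block-stream",
            "arguments": {"device": device, "base": base}}

def stream_to_intermediate(node_name, base):
    """Stream into an intermediate node, addressed by its node name."""
    return {"execute": "block-stream",
            "arguments": {"device": node_name, "base": base}}

cmd_active = stream_to_active("virtio0", "file-A.img")
cmd_intermediate = stream_to_intermediate("node-D", "file-B.img")
```

Both requests go over the same monitor socket; the only difference is whether the value passed as 'device' is a device name or a (previously set) node name.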




[Qemu-block] [PATCH 11/11] qemu-iotests: test overlapping block-stream operations

2015-05-13 Thread Alberto Garcia
This test case checks that it's not possible to perform two
block-stream operations if there are nodes involved in both.

Signed-off-by: Alberto Garcia be...@igalia.com
Reviewed-by: Max Reitz mre...@redhat.com
---
 tests/qemu-iotests/030 | 27 +++
 tests/qemu-iotests/030.out |  4 ++--
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index c199cef..f898c92 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -225,6 +225,33 @@ class TestParallelOps(iotests.QMPTestCase):
 qemu_io('-f', iotests.imgfmt, '-c', 'map', self.imgs[i-1]),
 'image file map does not match backing file after streaming')
 
+# Test that it's not possible to perform two block-stream
+# operations if there are nodes involved in both.
+def test_stream_overlapping(self):
+self.assert_no_active_block_jobs()
+
+# Set a speed limit to make sure that this job blocks the rest
+result = self.vm.qmp('block-stream', device='node4', base=self.imgs[0], speed=32768)
+self.assert_qmp(result, 'return', {})
+
+result = self.vm.qmp('block-stream', device='node5', base=self.imgs[1])
+self.assert_qmp(result, 'error/class', 'GenericError')
+
+result = self.vm.qmp('block-stream', device='node3', base=self.imgs[2])
+self.assert_qmp(result, 'error/class', 'GenericError')
+
+result = self.vm.qmp('block-stream', device='node4')
+self.assert_qmp(result, 'error/class', 'GenericError')
+
+# If node4 is the active node, the id of the block job is drive0
+if self.num_imgs == 5:
+self.wait_until_completed(drive='drive0')
+else:
+self.wait_until_completed(drive='node4')
+self.assert_no_active_block_jobs()
+
+self.vm.shutdown()
+
 class TestSmallerBackingFile(iotests.QMPTestCase):
 backing_len = 1 * 1024 * 1024 # MB
 image_len = 2 * backing_len
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index b6f2576..52d796e 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-................
+.................
 --
-Ran 16 tests
+Ran 17 tests
 
 OK
-- 
2.1.4
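The invariant the test above exercises — two block-stream operations must not share any node — can be modelled with a set-intersection check. This is an editorial sketch only, not QEMU's implementation (which walks the BDS graph and uses op blockers):

```python
# Model of the rule tested above: starting a second block-stream fails
# with GenericError when its node range overlaps an existing job.
active_jobs = []

def try_start_stream(nodes):
    """Start a stream over 'nodes' unless any node is already in use."""
    for job_nodes in active_jobs:
        if set(job_nodes) & set(nodes):
            return False   # would be a GenericError in the QMP reply
    active_jobs.append(nodes)
    return True
```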




Re: [Qemu-block] [PATCH 22/34] block: Exclude nested options only for children in append_open_options()

2015-05-13 Thread Max Reitz

On 08.05.2015 19:21, Kevin Wolf wrote:

Some drivers have nested options (e.g. blkdebug rule arrays), which
don't belong to a child node and shouldn't be removed. Don't remove all
options with . in their name, but check for the complete prefixes of
actually existing child nodes.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
  block.c   | 30 +++---
  include/block/block_int.h |  1 +
  2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index e329a47..e9a1d76 100644
--- a/block.c
+++ b/block.c
@@ -81,7 +81,7 @@ static QLIST_HEAD(, BlockDriver) bdrv_drivers =
  
  static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename,

   const char *reference, QDict *options, int flags,
- BlockDriverState* parent,
+ BlockDriverState* parent, const char *child_name,
   const BdrvChildRole *child_role,
   BlockDriver *drv, Error **errp);
  
@@ -1174,8 +1174,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,

  backing_hd = NULL;
   ret = bdrv_open_inherit(&backing_hd,
   *backing_filename ? backing_filename : NULL,
-reference, options, 0, bs, &child_backing,
-NULL, &local_err);
+reference, options, 0, bs, "backing",
+&child_backing, NULL, &local_err);
   if (ret < 0) {
   bs->open_flags |= BDRV_O_NO_BACKING;
   error_setg(errp, "Could not open backing file: %s",
@@ -1238,7 +1238,7 @@ int bdrv_open_image(BlockDriverState **pbs, const char 
*filename,
  }
  
  ret = bdrv_open_inherit(pbs, filename, reference, image_options, 0,

-parent, child_role, NULL, errp);
+parent, bdref_key, child_role, NULL, errp);
  
  done:

  qdict_del(options, bdref_key);
@@ -1312,11 +1312,13 @@ out:
  
  static void bdrv_attach_child(BlockDriverState *parent_bs,

BlockDriverState *child_bs,
+  const char *child_name,
const BdrvChildRole *child_role)
  {
  BdrvChild *child = g_new(BdrvChild, 1);
  *child = (BdrvChild) {
  .bs = child_bs,
+.name   = child_name,
  .role   = child_role,
  };
  
@@ -1340,7 +1342,7 @@ static void bdrv_attach_child(BlockDriverState *parent_bs,

   */
  static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename,
   const char *reference, QDict *options, int flags,
- BlockDriverState* parent,
+ BlockDriverState* parent, const char *child_name,
   const BdrvChildRole *child_role,
   BlockDriver *drv, Error **errp)
  {
@@ -1376,7 +1378,7 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
const char *filename,
  }
  bdrv_ref(bs);
  if (child_role) {
-bdrv_attach_child(parent, bs, child_role);
+bdrv_attach_child(parent, bs, child_name, child_role);
  }
  *pbs = bs;
  return 0;
@@ -1519,7 +1521,7 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
const char *filename,
  }
  
  if (child_role) {

-bdrv_attach_child(parent, bs, child_role);
+bdrv_attach_child(parent, bs, child_name, child_role);
  }
  
  QDECREF(options);

@@ -1563,7 +1565,7 @@ int bdrv_open(BlockDriverState **pbs, const char 
*filename,
BlockDriver *drv, Error **errp)
  {
  return bdrv_open_inherit(pbs, filename, reference, options, flags, NULL,
- NULL, drv, errp);
+ NULL, NULL, drv, errp);
  }
  
  typedef struct BlockReopenQueueEntry {

@@ -2093,7 +2095,7 @@ void bdrv_append(BlockDriverState *bs_new, 
BlockDriverState *bs_top)
  /* The contents of 'tmp' will become bs_top, as we are
   * swapping bs_new and bs_top contents. */
  bdrv_set_backing_hd(bs_top, bs_new);
-bdrv_attach_child(bs_top, bs_new, &child_backing);
+bdrv_attach_child(bs_top, bs_new, "backing", &child_backing);
  }
  
  static void bdrv_delete(BlockDriverState *bs)

@@ -4037,13 +4039,19 @@ static bool append_open_options(QDict *d, 
BlockDriverState *bs)
  {
  const QDictEntry *entry;
  QemuOptDesc *desc;
+BdrvChild *child;
  bool found_any = false;
  
   for (entry = qdict_first(bs->options); entry;
        entry = qdict_next(bs->options, entry))
   {
-/* Only take options for this level */
-if (strchr(qdict_entry_key(entry), '.')) {
+/* Exclude options for children */
+QLIST_FOREACH(child, bs->children, next) {
+if (strstart(qdict_entry_key(entry), child->name, NULL)) {


I 

Re: [Qemu-block] [PATCH 25/34] block: Allow specifying child options in reopen

2015-05-13 Thread Max Reitz

On 08.05.2015 19:21, Kevin Wolf wrote:

If the child was defined in the same context (-drive argument or
blockdev-add QMP command) as its parent, a reopen of the parent should
work the same and allow changing options of the child.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
  block.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-block] [PATCH v2 10/11] blockdev: Block device IO during blockdev-backup transaction

2015-05-13 Thread Paolo Bonzini


On 13/05/2015 19:28, Fam Zheng wrote:
 +state->bs = bs;
 +error_setg(&state->blocker, "blockdev-backup in progress");
 +bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
 +
  qmp_blockdev_backup(backup->device, backup->target,
  backup->sync,
  backup->has_speed, backup->speed,
 @@ -1696,7 +1701,6 @@ static void blockdev_backup_prepare(BlkTransactionState 
 *common, Error **errp)
  return;
  }
  
 -state->bs = bs;

I don't understand this.  Jobs could pause/resume themselves by adding a
DEVICE_IO notifier on the targets.

However, block backups is the one job that cannot do this, because I/O
on the source triggers I/O on the target.

So if we consider this idea worthwhile, and decide that pausing device
I/O on the target should pause the block job, the backup job actually
has to prevent *adding a DEVICE_IO blocker* on the target.  This
meta-block is not possible in your design, which is a pity because on
the surface it looked nicer than mine.

FWIW, my original idea was:

- bdrv_pause checks if there is an operation blocker for PAUSE.  if it
is there, it fails

- otherwise, bdrv_pause invokes a notifier list if this is the outermost
call. if not the outermost call, it does nothing

- bdrv_resume does the same, but does not need a blocker

- drive-backup should block PAUSE on its target


Also, should the blockers (either DEVICE_IO in your design, or PAUSE in
mine) be included in bdrv_op_block_all.  I would say no in your case;
here is the proof:

- block-backup doesn't like DEVICE_IO blockers on the target

- block-backup calls bdrv_op_block_all on the target

- hence, bdrv_op_block_all shouldn't block DEVICE_IO


block_job_create is another suspicious caller of bdrv_op_block_all.  It
probably shouldn't block neither PAUSE nor DEVICE_IO.

Paolo
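For illustration, the pause/resume scheme sketched in this mail can be modelled in a few lines (an editorial sketch only — the class, field and method names are invented, not QEMU code): a PAUSE op blocker makes the pause fail, a refcount makes nesting work, and the notifier list fires only on the outermost pause and the innermost resume.

```python
class BDS:
    """Toy model of the proposed bdrv_pause/bdrv_resume semantics."""
    def __init__(self):
        self.pause_count = 0
        self.pause_blockers = []   # reasons why PAUSE is blocked
        self.notifiers = []        # callbacks invoked with True/False

    def pause(self):
        if self.pause_blockers:
            return False           # e.g. blocked by a backup target
        self.pause_count += 1
        if self.pause_count == 1:  # outermost call fires notifiers
            for notify in self.notifiers:
                notify(True)
        return True

    def resume(self):
        assert self.pause_count > 0
        self.pause_count -= 1
        if self.pause_count == 0:  # innermost resume fires notifiers
            for notify in self.notifiers:
                notify(False)
```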



Re: [Qemu-block] [Qemu-devel] [PATCH v2 10/11] blockdev: Block device IO during blockdev-backup transaction

2015-05-13 Thread Fam Zheng
On Wed, 05/13 19:22, Wen Congyang wrote:
 On 05/14/2015 01:28 AM, Fam Zheng wrote:
  Signed-off-by: Fam Zheng f...@redhat.com
  ---
   blockdev.c | 10 +-
   1 file changed, 9 insertions(+), 1 deletion(-)
  
  diff --git a/blockdev.c b/blockdev.c
  index ae52d27..bd28183 100644
  --- a/blockdev.c
  +++ b/blockdev.c
  @@ -1649,6 +1649,7 @@ typedef struct BlockdevBackupState {
   BlockDriverState *bs;
   BlockJob *job;
   AioContext *aio_context;
  +Error *blocker;
   } BlockdevBackupState;
   
   static void blockdev_backup_prepare(BlkTransactionState *common, Error 
  **errp)
  @@ -1685,6 +1686,10 @@ static void 
  blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
   }
   aio_context_acquire(state->aio_context);
   
  +state->bs = bs;
  +error_setg(&state->blocker, "blockdev-backup in progress");
  +bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
  +
 
 Do you test this patch? You need to read from bs to do backup!!
 If the mode is none, you also need to write to bs!!

This blocker is only temporary and is removed in blockdev_backup_clean before
qmp_transaction returns.

Fam

 
 Thanks
 Wen Congyang
 
   qmp_blockdev_backup(backup->device, backup->target,
   backup->sync,
   backup->has_speed, backup->speed,
  @@ -1696,7 +1701,6 @@ static void 
  blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
   return;
   }
   
  -state->bs = bs;
   state->job = state->bs->job;
   }
   
  @@ -1715,6 +1719,10 @@ static void 
  blockdev_backup_clean(BlkTransactionState *common)
   {
   BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, 
  common);
   
  +if (state->bs) {
  +bdrv_op_unblock(state->bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
  +error_free(state->blocker);
  +}
   if (state->aio_context) {
   aio_context_release(state->aio_context);
   }
  
 



Re: [Qemu-block] [PATCH 21/34] block: Consider all block layer options in append_open_options

2015-05-13 Thread Max Reitz

On 08.05.2015 19:21, Kevin Wolf wrote:

The code already special-cased node-name, which is currently the only
option passed in the QDict that isn't driver-specific. Generalise the
code to take all general block layer options into consideration.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
  block.c | 26 ++
  1 file changed, 18 insertions(+), 8 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-block] [PATCH v8 0/2] block: enforce minimal 4096 alignment in qemu_blockalign

2015-05-13 Thread Kevin Wolf
On 12.05.2015 at 16:30, Denis V. Lunev wrote:
 I have used the following program to test
 [...]
 and the number of requests sent to disk (which can be calculated by counting
 the lines in the blktrace output) is reduced by about 2 times.
 [...]
 Signed-off-by: Denis V. Lunev d...@openvz.org
 CC: Paolo Bonzini pbonz...@redhat.com
 CC: Kevin Wolf kw...@redhat.com
 CC: Stefan Hajnoczi stefa...@redhat.com

Reviewed-by: Kevin Wolf kw...@redhat.com



Re: [Qemu-block] [PATCH 26/34] block: reopen: Document option precedence and refactor accordingly

2015-05-13 Thread Max Reitz

On 08.05.2015 19:21, Kevin Wolf wrote:

The interesting part of reopening an image is from which sources the
effective options should be taken, i.e. which options take precedence
over which other options. This patch documents the precedence that will
be implemented in the following patches.

It also refactors bdrv_reopen_queue(), so that the top-level reopened
node is handled the same way as children are. Option/flag inheritance
from the parent becomes just one item in the list and is done at the
beginning of the function, similar to how the other items are/will be
handled.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
  block.c | 39 +--
  1 file changed, 33 insertions(+), 6 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-block] [PATCH v2 01/11] block: Add op blocker type device IO

2015-05-13 Thread Fam Zheng
On Wed, 05/13 14:04, Paolo Bonzini wrote:
 
 
 On 13/05/2015 19:28, Fam Zheng wrote:
  @@ -478,6 +478,10 @@ static int blk_check_request(BlockBackend *blk, 
  int64_t sector_num,
   return -EIO;
   }
   
  +if (bdrv_op_is_blocked(blk->bs, BLOCK_OP_TYPE_DEVICE_IO, NULL)) {
  +return -EBUSY;
  +}
 
 I think this is incorrect.  It's fine for backends to generate more I/O
 after a blocker is submitted, as long as it's bounded.
 
 For example, SCSI requests can result in many consecutive I/Os:
 
 (1) FUA requests are split in write+flush
 
 (2) adapters that do not use QEMUSGList-based I/O only read 128K at a time
 
 (3) WRITE SAME operations are also split in chunks
 
 (4) UNMAP operations process one descriptor at a time

I don't understand the point of these examples. If we don't return -EBUSY here,
the request will sneak into block/io.c and perhaps break qmp transaction
semantics, if it lands between two backups.

Fam

 
 Paolo
 
   return blk_check_byte_request(blk, sector_num * BDRV_SECTOR_SIZE,
 nb_sectors * BDRV_SECTOR_SIZE);
   }



Re: [Qemu-block] [PATCH v2 01/11] block: Add op blocker type device IO

2015-05-13 Thread Paolo Bonzini


On 13/05/2015 17:02, Fam Zheng wrote:
  For example, SCSI requests can result in many consecutive I/Os:
  
  (1) FUA requests are split in write+flush
  
  (2) adapters that do not use QEMUSGList-based I/O only read 128K at a time
  
  (3) WRITE SAME operations are also split in chunks
  
  (4) UNMAP operations process one descriptor at a time
 I don't understand the point of these examples. If we don't return -EBUSY 
 here,
 the request will sneak into block/io.c and perhaps break qmp transaction
 semantics, if it lands between two backups.

It won't, because after blocking DEVICE_IO you will always drain I/O and
the bdrv_drain will loop until the above are all satisfied.

Paolo



Re: [Qemu-block] [PATCH v2 10/11] blockdev: Block device IO during blockdev-backup transaction

2015-05-13 Thread Paolo Bonzini
On 13/05/2015 16:55, Fam Zheng wrote:
  So if we consider this idea worthwhile, and decide that pausing device
  I/O on the target should pause the block job, the backup job actually
  has to prevent *adding a DEVICE_IO blocker* on the target.
 
 When do we need to pause device IO on the target? For drive-backup the target
 is anonymous and no device exists on it, so it's out of question. For
 blockdev-backup, users can have a device attached, but I'm not sure we should
 pause the job in this case.

Well, it depends on the meaning you give to the pause/device_io blocker.
 For me it was "I want exclusive ownership of the disk, no one else
should write".

For blockdev-backup for example you could mirror the backup destination
somewhere else, and mirror needs to drain the source.  For that to make
sense, the block job has to be paused.

Of course this is contrived, but it's better to make the design generic.

   This
   meta-block is not possible in your design, which is a pity because on
   the surface it looked nicer than mine.
   
   FWIW, my original idea was:
   
   - bdrv_pause checks if there is an operation blocker for PAUSE.  if it
   is there, it fails
 
 It's more complicated, because bdrv_drain_all can't fail now, if we want
 bdrv_pause there, then it could fail, and will become much harder to use.

I think we don't need bdrv_pause there.  See my reply to patch 11---we
want it in the callers.

 In your idea, who will block PAUSE, and why?

For example, blockdev-backup/drive-backup can block PAUSE on the target.
 But maybe they only need to add an op blocker notifier, and use it to
block device I/O on the source and drain it.  So your idea is safe.  Good!

Another idea was to block PAUSE in virtio-scsi until it's fixed.

Paolo

   - otherwise, bdrv_pause invokes a notifier list if this is the outermost
   call. if not the outermost call, it does nothing
   
   - bdrv_resume does the same, but does not need a blocker
   
   - drive-backup should block PAUSE on its target



Re: [Qemu-block] [PATCH 27/34] block: Add infrastructure for option inheritance

2015-05-13 Thread Kevin Wolf
On 13.05.2015 at 17:10, Max Reitz wrote:
 On 08.05.2015 19:21, Kevin Wolf wrote:
 Options are not actually inherited from the parent node yet, but this
 commit lays the grounds for doing so.
 
 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
   block.c   | 51 
  ++-
   include/block/block_int.h |  3 ++-
   2 files changed, 30 insertions(+), 24 deletions(-)
 
 diff --git a/block.c b/block.c
 index 1e5625f..9259b42 100644
 --- a/block.c
 +++ b/block.c
 @@ -678,11 +678,14 @@ static int bdrv_temp_snapshot_flags(int flags)
   }
   /*
 - * Returns the flags that bs->file should get if a protocol driver is expected,
 - * based on the given flags for the parent BDS
 + * Returns the options and flags that bs->file should get if a protocol driver
 + * is expected, based on the given flags for the parent BDS
*/
 -static int bdrv_inherited_flags(int flags)
 +static void bdrv_inherited_options(int *child_flags, QDict *child_options,
 +   int parent_flags, QDict *parent_options)
   {
 +int flags = parent_flags;
 +
   /* Enable protocol handling, disable format probing for bs->file */
   flags |= BDRV_O_PROTOCOL;
 @@ -693,45 +696,46 @@ static int bdrv_inherited_flags(int flags)
   /* Clear flags that only apply to the top layer */
   flags &= ~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING | BDRV_O_COPY_ON_READ);
 -return flags;
 +*child_flags = flags;
   }
   const BdrvChildRole child_file = {
 -.inherit_flags = bdrv_inherited_flags,
 +.inherit_options = bdrv_inherited_options,
   };
 -/*
 - * Returns the flags that bs->file should get if the use of formats (and not
 - * only protocols) is permitted for it, based on the given flags for the parent
 - * BDS
 - */
 
 Is removing this comment intentional?

Looks like a mismerge, thanks.

Kevin



[Qemu-block] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext

2015-05-13 Thread Alexander Yarygin
After the commit 9b536adc (block: acquire AioContext in
bdrv_drain_all()) the aio_poll() function got called for every
BlockDriverState, in assumption that every device may have its own
AioContext. The bdrv_drain_all() function is called in each
virtio_reset() call, which in turn is called for every virtio-blk
device on initialization, so we got aio_poll() called
'length(device_list)^2' times.

If we have thousands of disks attached, there are a lot of
BlockDriverStates but only a few AioContexts, leading to tons of
unnecessary aio_poll() calls. For example, startup times with 1000 disks
takes over 13 minutes.

This patch changes the bdrv_drain_all() function allowing it find shared
AioContexts and to call aio_poll() only for unique ones. This results in
much better startup times, e.g. 1000 disks do come up within 5 seconds.

Cc: Christian Borntraeger borntrae...@de.ibm.com
Cc: Cornelia Huck cornelia.h...@de.ibm.com
Cc: Kevin Wolf kw...@redhat.com
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Stefan Hajnoczi stefa...@redhat.com
Signed-off-by: Alexander Yarygin yary...@linux.vnet.ibm.com
---
 block.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index f2f8ae7..7414815 100644
--- a/block.c
+++ b/block.c
@@ -1994,7 +1994,6 @@ static bool bdrv_drain_one(BlockDriverState *bs)
 bdrv_flush_io_queue(bs);
 bdrv_start_throttled_reqs(bs);
 bs_busy = bdrv_requests_pending(bs);
-bs_busy |= aio_poll(bdrv_get_aio_context(bs), bs_busy);
 return bs_busy;
 }
 
@@ -2010,8 +2009,12 @@ static bool bdrv_drain_one(BlockDriverState *bs)
  */
 void bdrv_drain(BlockDriverState *bs)
 {
-while (bdrv_drain_one(bs)) {
+bool busy = true;
+
+while (busy) {
 /* Keep iterating */
+busy = bdrv_drain_one(bs);
+busy |= aio_poll(bdrv_get_aio_context(bs), busy);
 }
 }
 
@@ -2032,6 +2035,7 @@ void bdrv_drain_all(void)
 /* Always run first iteration so any pending completion BHs run */
 bool busy = true;
 BlockDriverState *bs;
+GList *aio_ctxs = NULL;
 
 while (busy) {
 busy = false;
@@ -2041,9 +2045,14 @@ void bdrv_drain_all(void)
 
 aio_context_acquire(aio_context);
 busy |= bdrv_drain_one(bs);
+if (!aio_ctxs || !g_list_find(aio_ctxs, aio_context)) {
+busy |= aio_poll(aio_context, busy);
+aio_ctxs = g_list_append(aio_ctxs, aio_context);
+}
 aio_context_release(aio_context);
 }
 }
+g_list_free(aio_ctxs);
 }
 
 /* make a BlockDriverState anonymous by removing from bdrv_state and
-- 
1.9.1
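The optimisation in the patch above — poll each AioContext once per iteration rather than once per BlockDriverState — can be sketched as follows (illustrative Python, not the C code; the callbacks stand in for bdrv_drain_one() and aio_poll()):

```python
# Sketch of the deduplication idea: with N disks sharing one context,
# one drain iteration issues a single poll instead of N.

def drain_all(states, drain_one, poll):
    """states: iterable of (bs, aio_context) pairs; contexts may repeat."""
    busy = True
    while busy:
        busy = False
        polled = set()
        for bs, ctx in states:
            busy |= drain_one(bs)
            if ctx not in polled:        # poll unique contexts only
                busy |= poll(ctx, busy)
                polled.add(ctx)
```

With 1000 disks on a single iothread this issues one poll per iteration instead of 1000, which matches the reported startup-time improvement.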




Re: [Qemu-block] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext

2015-05-13 Thread Alexander Yarygin
Paolo Bonzini pbonz...@redhat.com writes:

 On 13/05/2015 17:18, Alexander Yarygin wrote:
 After the commit 9b536adc (block: acquire AioContext in
 bdrv_drain_all()) the aio_poll() function got called for every
 BlockDriverState, in assumption that every device may have its own
 AioContext. The bdrv_drain_all() function is called in each
 virtio_reset() call,

 ... which should actually call bdrv_drain().  Can you fix that?


I thought about it, but couldn't come to the conclusion that it's safe. The
comment above bdrv_drain_all() states "... it is not possible to have a
function to drain a single device's I/O queue." Besides, what if we
have several virtual disks that share a host file?
Or am I wrong and it's OK to do?

 which in turn is called for every virtio-blk
 device on initialization, so we got aio_poll() called
 'length(device_list)^2' times.
 
 If we have thousands of disks attached, there are a lot of
 BlockDriverStates but only a few AioContexts, leading to tons of
 unnecessary aio_poll() calls. For example, startup times with 1000 disks
 takes over 13 minutes.
 
 This patch changes the bdrv_drain_all() function allowing it find shared
 AioContexts and to call aio_poll() only for unique ones. This results in
 much better startup times, e.g. 1000 disks do come up within 5 seconds.

 I'm not sure this patch is correct.  You may have to call aio_poll
 multiple times before a BlockDriverState is drained.

 Paolo



Ah, right. We need second loop, something like this:

@@ -2030,20 +2033,33 @@ void bdrv_drain(BlockDriverState *bs)
 void bdrv_drain_all(void)
 {
 /* Always run first iteration so any pending completion BHs run */
-bool busy = true;
+bool busy = true, pending = false;
 BlockDriverState *bs;
+GList *aio_ctxs = NULL, *ctx;
+AioContext *aio_context;

 while (busy) {
 busy = false;

 QTAILQ_FOREACH(bs, bdrv_states, device_list) {
-AioContext *aio_context = bdrv_get_aio_context(bs);
+aio_context = bdrv_get_aio_context(bs);

 aio_context_acquire(aio_context);
 busy |= bdrv_drain_one(bs);
 aio_context_release(aio_context);
+if (!aio_ctxs || !g_list_find(aio_ctxs, aio_context))
+aio_ctxs = g_list_append(aio_ctxs, aio_context);
+}
+pending = busy;
+
+for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
+aio_context = ctx->data;
+aio_context_acquire(aio_context);
+busy |= aio_poll(aio_context, pending);
+aio_context_release(aio_context);
 }
 }
+g_list_free(aio_ctxs);
 }

That looks quite ugly to me and breaks the consistency of bdrv_drain_one(),
since it doesn't call aio_poll() anymore...


 Cc: Christian Borntraeger borntrae...@de.ibm.com
 Cc: Cornelia Huck cornelia.h...@de.ibm.com
 Cc: Kevin Wolf kw...@redhat.com
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Stefan Hajnoczi stefa...@redhat.com
 Signed-off-by: Alexander Yarygin yary...@linux.vnet.ibm.com
 ---
  block.c | 13 +++--
  1 file changed, 11 insertions(+), 2 deletions(-)
 
 diff --git a/block.c b/block.c
 index f2f8ae7..7414815 100644
 --- a/block.c
 +++ b/block.c
 @@ -1994,7 +1994,6 @@ static bool bdrv_drain_one(BlockDriverState *bs)
  bdrv_flush_io_queue(bs);
  bdrv_start_throttled_reqs(bs);
  bs_busy = bdrv_requests_pending(bs);
 -bs_busy |= aio_poll(bdrv_get_aio_context(bs), bs_busy);
  return bs_busy;
  }
  
 @@ -2010,8 +2009,12 @@ static bool bdrv_drain_one(BlockDriverState *bs)
   */
  void bdrv_drain(BlockDriverState *bs)
  {
 -while (bdrv_drain_one(bs)) {
 +bool busy = true;
 +
 +while (busy) {
  /* Keep iterating */
 +busy = bdrv_drain_one(bs);
 +busy |= aio_poll(bdrv_get_aio_context(bs), busy);
  }
  }
  
 @@ -2032,6 +2035,7 @@ void bdrv_drain_all(void)
  /* Always run first iteration so any pending completion BHs run */
  bool busy = true;
  BlockDriverState *bs;
 +GList *aio_ctxs = NULL;
  
  while (busy) {
  busy = false;
 @@ -2041,9 +2045,14 @@ void bdrv_drain_all(void)
  
  aio_context_acquire(aio_context);
  busy |= bdrv_drain_one(bs);
 +if (!aio_ctxs || !g_list_find(aio_ctxs, aio_context)) {
 +busy |= aio_poll(aio_context, busy);
 +aio_ctxs = g_list_append(aio_ctxs, aio_context);
 +}
  aio_context_release(aio_context);
  }
  }
 +g_list_free(aio_ctxs);
  }
  
  /* make a BlockDriverState anonymous by removing from bdrv_state and
 




Re: [Qemu-block] [PATCH v2 11/11] block: Block device IO during bdrv_drain and bdrv_drain_all

2015-05-13 Thread Paolo Bonzini


On 13/05/2015 17:17, Fam Zheng wrote:
  
  It can be the topic of a separate series.  But this patch brings a
  false sense of security (either the blocker is unnecessary, or it
  needs to last after bdrv_drain returns), so I think it should be
  dropped.
 Doesn't this let bdrv_drain_all return when virtio-blk-dataplane is having 
 high
 workload, in places where you say the blocker is unnecessary?

Yes, you're right.  Please document it in the commit message and the
code, it's tricky.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH v2 10/11] blockdev: Block device IO during blockdev-backup transaction

2015-05-13 Thread Wen Congyang
On 05/13/2015 08:55 PM, Fam Zheng wrote:
 On Wed, 05/13 19:22, Wen Congyang wrote:
 On 05/14/2015 01:28 AM, Fam Zheng wrote:
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  blockdev.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

 diff --git a/blockdev.c b/blockdev.c
 index ae52d27..bd28183 100644
 --- a/blockdev.c
 +++ b/blockdev.c
 @@ -1649,6 +1649,7 @@ typedef struct BlockdevBackupState {
  BlockDriverState *bs;
  BlockJob *job;
  AioContext *aio_context;
 +Error *blocker;
  } BlockdevBackupState;
  
  static void blockdev_backup_prepare(BlkTransactionState *common, Error 
 **errp)
 @@ -1685,6 +1686,10 @@ static void 
 blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
  }
  aio_context_acquire(state->aio_context);
  
 +state->bs = bs;
 +error_setg(&state->blocker, "blockdev-backup in progress");
 +bdrv_op_block(bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
 +

 Do you test this patch? You need to read from bs to do backup!!
 If the mode is none, you also need to write to bs!!
 
 This blocker is only temporary and is removed in blockdev_backup_clean before
 qmp_transaction returns.

Yes. Another question:
We will use bdrv_op_block_all() in the job, and don't unblock 
BLOCK_OP_TYPE_DEVICE_IO.
Is it OK?

Thanks
Wen Congyang

 
 Fam
 

 Thanks
 Wen Congyang

  qmp_blockdev_backup(backup->device, backup->target,
  backup->sync,
  backup->has_speed, backup->speed,
 @@ -1696,7 +1701,6 @@ static void 
 blockdev_backup_prepare(BlkTransactionState *common, Error **errp)
  return;
  }
  
 -state->bs = bs;
  state->job = state->bs->job;
  }
  
 @@ -1715,6 +1719,10 @@ static void 
 blockdev_backup_clean(BlkTransactionState *common)
  {
  BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, 
 common);
  
 +if (state->bs) {
 +bdrv_op_unblock(state->bs, BLOCK_OP_TYPE_DEVICE_IO, state->blocker);
 +error_free(state->blocker);
 +}
  if (state->aio_context) {
  aio_context_release(state->aio_context);
  }


 .
 




Re: [Qemu-block] [Qemu-devel] [PATCH v2 10/11] blockdev: Block device IO during blockdev-backup transaction

2015-05-13 Thread Fam Zheng
On Thu, 05/14 09:12, Wen Congyang wrote:
 We will use bdrv_op_block_all() in the job, and don't unblock 
 BLOCK_OP_TYPE_DEVICE_IO.
 Is it OK?

Good question and you're right, it's broken in this series. I will fix it.

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH 14/34] qcow2: Factor out qcow2_update_options()

2015-05-13 Thread Eric Blake
On 05/13/2015 03:11 AM, Kevin Wolf wrote:
 On 12.05.2015 at 22:04, Eric Blake wrote:
 On 05/08/2015 11:21 AM, Kevin Wolf wrote:
 Eventually we want to be able to change options at runtime. As a first
 step towards that goal, separate some option handling code from the
 general initialisation code in qcow2_open().

 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  block/qcow2.c | 135 
 +-
  1 file changed, 76 insertions(+), 59 deletions(-)

 +} else {
 +error_setg(errp, "Unsupported value '%s' for qcow2 option "
 +   "'overlap-check'. Allowed are either of the following: "
 +   "none, constant, cached, all", opt_overlap_check);

 Pre-existing due to code motion, but I find s/either/any/ easier to read.
 
 Isn't either only for a choice between two things anyway?

Exactly. And since there are four things, that's why I found it easier
to read.

 
 The series isn't long enough yet, I'll fix it. :-)

If you want conciseness, this would also work:

Unsupported value '%s' for qcow2 option 'overlap-check'; expecting one
of: none, constant, cached, all

or even omitting the list of valid options altogether (which in the long
run is easier to maintain if we anticipate extending the list - as there
are fewer places where copies of the list need to be kept in sync)

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [PATCH v5 0/2] block: enforce minimal 4096 alignment in qemu_blockalign

2015-05-13 Thread Denis V. Lunev

On 13/05/15 18:43, Stefan Hajnoczi wrote:

On Tue, May 12, 2015 at 12:46:57PM +0200, Paolo Bonzini wrote:


On 12/05/2015 12:19, Denis V. Lunev wrote:


hades /vol $ strace -f -e pwrite -e raw=write,pwrite qemu-io -n -c
"write -P 0x11 0 64M" ./1.img
Process 19326 attached
[pid 19326] pwrite(0x6, 0x7fac07fff200, 0x400, 0x5) = 0x400
 1 GB Write from userspace

FWIW this is 64 MB (as expected).


wrote 67108864/67108864 bytes at offset 0
64 MiB, 1 ops; 0.2964 sec (215.863 MiB/sec and 3.3729 ops/sec)
[pid 19326] +++ exited with 0 +++
+++ exited with 0 +++
hades /vol $
   9,01  26674.030359772 19326  Q  WS 473095 + 1016 [(null)]
   9,01  26774.030361546 19326  Q  WS 474111 + 8 [(null)]
   9,01  26874.030395522 19326  Q  WS 474119 + 1016 [(null)]
   9,01  26974.030397509 19326  Q  WS 475135 + 8 [(null)]

This means that, yes, the kernel is INEFFECTIVE at performing direct IO
with unaligned addresses. For example, without direct IO the pattern is
much better.

I think this means that the kernel is DMAing at most 128 pages at a
time.  If the buffer is misaligned, you need 129 pages and the kernel
then splits the request into a 128 page and a 1 page part.

This looks like a hardware limit, and the kernel probably cannot really
do anything about it because we requested O_DIRECT.  So your patch makes
sense.

A 64 MB buffer was given in the pwrite() call.

The first and the last 128-page write requests may have partial pages,
but why should the rest not use fully aligned 1024 sector writes?

Maybe the buffer is split by the max sectors per request before the
alignment requirements are considered.  It would be more efficient to
first split off the unaligned parts.

So I think the kernel is still doing something suboptimal here.

Stefan

I agree with this. The kernel developers are aware and maybe we will have
the fix after a while... I have heard (not tested) that the performance
loss on multi-queue SSDs is around 30%.

Den



Re: [Qemu-block] [PATCH] block: Let bdrv_drain_all() to call aio_poll() for each AioContext

2015-05-13 Thread Alexander Yarygin
Alberto Garcia be...@igalia.com writes:

 On Wed 13 May 2015 05:18:31 PM CEST, Alexander Yarygin 
 yary...@linux.vnet.ibm.com wrote:

 +if (!aio_ctxs || !g_list_find(aio_ctxs, aio_context)) {
 +    busy |= aio_poll(aio_context, busy);
 +    aio_ctxs = g_list_append(aio_ctxs, aio_context);
 +}

 g_list_append() walks the whole list in order to append an element, I
 think you should use _prepend() instead.

 And since that's the only operation you're doing you can use a GSList
 instead.

 Berto

This seems reasonable, thanks.




Re: [Qemu-block] [Qemu-devel] [PATCH] block/mirror: Sleep periodically during bitmap scanning

2015-05-13 Thread Wen Congyang
On 05/13/2015 11:11 AM, Fam Zheng wrote:
 Before, we only yield after initializing dirty bitmap, where the QMP
 command would return. That may take very long, and guest IO will be
 blocked.
 
 Add sleep points like the later mirror iterations.
 
 Signed-off-by: Fam Zheng f...@redhat.com

Reviewed-by: Wen Congyang we...@cn.fujitsu.com

 ---
  block/mirror.c | 13 -
  1 file changed, 12 insertions(+), 1 deletion(-)
 
 diff --git a/block/mirror.c b/block/mirror.c
 index 1a1d997..baed225 100644
 --- a/block/mirror.c
 +++ b/block/mirror.c
 @@ -467,11 +467,23 @@ static void coroutine_fn mirror_run(void *opaque)
      sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
      mirror_free_init(s);
  
 +    last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
      if (!s->is_none_mode) {
          /* First part, loop on the sectors and initialize the dirty bitmap.  */
          BlockDriverState *base = s->base;
          for (sector_num = 0; sector_num < end; ) {
              int64_t next = (sector_num | (sectors_per_chunk - 1)) + 1;
 +            int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 +
 +            if (now - last_pause_ns > SLICE_TIME) {
 +                last_pause_ns = now;
 +                block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, 0);
 +            }
 +
 +            if (block_job_is_cancelled(&s->common)) {
 +                goto immediate_exit;
 +            }
 +
              ret = bdrv_is_allocated_above(bs, base,
                                            sector_num, next - sector_num, &n);
  
 @@ -490,7 +502,6 @@ static void coroutine_fn mirror_run(void *opaque)
      }
  
      bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
 -    last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
      for (;;) {
          uint64_t delay_ns = 0;
          int64_t cnt;
 




Re: [Qemu-block] [Qemu-devel] [PATCH] block/mirror: Sleep periodically during bitmap scanning

2015-05-13 Thread Fam Zheng
On Wed, 05/13 13:17, Wen Congyang wrote:
 On 05/13/2015 11:11 AM, Fam Zheng wrote:
  Before, we only yield after initializing dirty bitmap, where the QMP
  command would return. That may take very long, and guest IO will be
  blocked.
 
 Do you have such a case to reproduce it? If the disk image is too large,
 I think qemu doesn't cache all metadata in memory, so we will
 yield in bdrv_is_allocated_above() when we read the metadata from the
 disk.

True for qcow2, but raw-posix has no such yield points, because it uses
lseek(..., SEEK_HOLE). I do have a reproducer - just try a big raw image on
your ext4.

Fam

 
 Thanks
 Wen Congyang



Re: [Qemu-block] [Qemu-devel] [PATCH 4/5] qemu-io: prompt for encryption keys when required

2015-05-13 Thread Daniel P. Berrange
On Tue, May 12, 2015 at 12:32:53PM -0600, Eric Blake wrote:
 On 05/12/2015 10:09 AM, Daniel P. Berrange wrote:
  The qemu-io tool does not check if the image is encrypted so
  historically would silently corrupt the sectors by writing
  plain text data into them instead of cipher text. The earlier
  commit turns this mistake into a fatal abort, so check for
  encryption and prompt for key when required.
 
 Doesn't that mean that 'git bisect' gives a crashing qemu-io for 3
 patches?  Should this be rearranged so that 1/5 comes after this to
 avoid triggering the abort?

I'm ambivalent on that - previously qemu-io was data corrupting
for this scenario, so crashing isn't really that much worse :-)

It is easy enough to reorder these though if that's desired.
The latter patches have no build time dep on the 1st patch, so
its trivial for Kevin to re-order when applying if he thinks it
is worth it.

Regards,
Daniel
-- 
|: http://berrange.com  -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|