Re: [Qemu-block] [PATCH v20 00/10] Block replication for continuous checkpoints

2016-06-12 Thread Changlong Xie

On 06/08/2016 09:36 PM, Eric Blake wrote:

On 06/07/2016 07:11 PM, Changlong Xie wrote:

Block replication is a very important feature which is used for
continuous checkpoints(for example: COLO).



Side note: Including qemu-trivial in CC: on a patch series at v20 feels
wrong.  Obviously, a ten-patch series with this much churn is not trivial.



To avoid bothering the qemu-trivial maintainers, I'll hold off on resending
the patch set until you have some comments on [PATCH v20 07/10].


Thanks
-Xie





Re: [Qemu-block] [PATCH v20 07/10] Introduce new APIs to do replication operation

2016-06-12 Thread Changlong Xie

On 06/08/2016 09:41 PM, Eric Blake wrote:

On 06/07/2016 07:11 PM, Changlong Xie wrote:

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 


No mention of the API names in the commit message?  Grepping 'git log'
is easier if there is something useful to grep for.


I'll use a brief description, such as the one below:

This commit introduces six replication interfaces (for block, network,
etc.). First, we can use replication_(new/remove) to create/destroy
replication instances; then, during migration, we can use
replication_(start/stop/do_checkpoint/get_error)_all to handle all
replication operations. For more details, please refer to replication.h.
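For reference, these six interfaces map to prototypes roughly like the
following (only a sketch; the authoritative declarations are the ones
added in replication.h by this patch, and names such as ReplicationOps
and the failover parameter are my assumptions here):

    /* create/destroy one replication instance (e.g. for a block driver) */
    ReplicationState *replication_new(void *opaque, ReplicationOps *ops);
    void replication_remove(ReplicationState *rs);

    /* drive every registered instance during migration/checkpointing */
    void replication_start_all(ReplicationMode mode, Error **errp);
    void replication_do_checkpoint_all(Error **errp);
    void replication_get_error_all(Error **errp);
    void replication_stop_all(bool failover, Error **errp);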





---
  Makefile.objs|   1 +
  qapi/block-core.json |  13 
  replication.c| 105 ++
  replication.h| 176 +++
  4 files changed, 295 insertions(+)
  create mode 100644 replication.c
  create mode 100644 replication.h




+++ b/qapi/block-core.json
@@ -2032,6 +2032,19 @@
  '*read-pattern': 'QuorumReadPattern' } }

  ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.7
+##
+{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
+


This part is fine, from an interface point of view. However, I have not
closely reviewed the rest of the patch or series.  That said, here are
some quick things that caught my eye.


Appreciated.





+++ b/replication.c
@@ -0,0 +1,105 @@
+/*
+ * Replication filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Changlong Xie 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "replication.h"


All new .c files must include "qemu/osdep.h" first.



+++ b/replication.h
@@ -0,0 +1,176 @@
+/*
+ * Replication filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Changlong Xie 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef REPLICATION_H
+#define REPLICATION_H
+
+#include "qemu/osdep.h"


And .h files must NOT include osdep.h.


+#include "qapi/error.h"


Do you really need the full error.h, or is typedefs.h enough to get the
forward declaration of Error?
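(For illustration, a minimal sketch of why the forward declaration is
enough in a header, taking replication_start_all() from this series as
an example prototype:)

    /* qemu/typedefs.h already provides the forward declaration:
     *     typedef struct Error Error;
     * which is all that a prototype in replication.h needs, e.g.: */
    void replication_start_all(ReplicationMode mode, Error **errp);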


It's for error_propagate() in replication.c. To summarize your comments
on replication.h/replication.c, I'll:

1) remove all unrelated *.h includes from replication.h
2) use the following header files in replication.c:

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu/queue.h"
#include "replication.h"

Thanks



Re: [Qemu-block] [PATCH v20 00/10] Block replication for continuous checkpoints

2016-06-12 Thread Changlong Xie

On 06/10/2016 09:22 PM, Michael Tokarev wrote:

08.06.2016 04:11, Changlong Xie wrote:

Block replication is a very important feature which is used for
continuous checkpoints(for example: COLO).

...

I'm not sure I understand why this has been sent to qemu-trivial? :)



$HOME/.gitconfig misled me; I'll remove it.

Thanks


Thanks

/mjt




[Qemu-block] [PATCH] macio: Use blk_drain instead of blk_drain_all

2016-06-12 Thread Fam Zheng
We only care about the associated backend, so blk_drain is more
appropriate here.

Signed-off-by: Fam Zheng 
---
 hw/ide/macio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index 78c10a0..a8c7321 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -400,7 +400,7 @@ static void pmac_ide_flush(DBDMA_io *io)
IDEState *s = idebus_active_if(&m->bus);
 
 if (s->bus->dma->aiocb) {
-blk_drain_all();
+blk_drain(s->blk);
 }
 }
 
-- 
2.8.3




[Qemu-block] [PATCH v2] mirror: follow AioContext change gracefully

2016-06-12 Thread Fam Zheng
From: Stefan Hajnoczi 

When dataplane is enabled or disabled the drive switches to a new
AioContext.  The mirror block job must also move to the new AioContext
so that drive accesses are always made within its AioContext.

This patch partially achieves that by draining target and source
requests to reach a quiescent point.  The job is resumed in the new
AioContext after moving s->target into the new AioContext.

The quiesce_requested flag is added to deal with yield points in
block_job_sleep_ns(), bdrv_is_allocated_above(), and
bdrv_get_block_status_above(), which would otherwise continue executing
in the old AioContext.  The nested aio_poll() in
mirror_detach_aio_context() drives the mirror coroutine up to fixed
yield points, where mirror_check_for_quiesce() is called.

Cc: Fam Zheng 
Cc: Paolo Bonzini 
Cc: Jeff Cody 
Signed-off-by: Stefan Hajnoczi 
[Drain source as well, and add s->quiesce_requested flag. -- Fam]
Signed-off-by: Fam Zheng 

---

v2: Picked up Stefan's RFC patch and moved on towards a more complete
fix.  Please review!

Jason: it would be nice if you could test this version again. It differs
from the previous version.
---
 block/mirror.c | 45 -
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 80fd3c7..142199a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -63,6 +63,8 @@ typedef struct MirrorBlockJob {
 int ret;
 bool unmap;
 bool waiting_for_io;
+bool quiesce_requested; /* temporarily detached to move AioContext,
+   don't do more I/O */
 int target_cluster_sectors;
 int max_iov;
 } MirrorBlockJob;
@@ -119,7 +121,7 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
qemu_iovec_destroy(&op->qiov);
 g_free(op);
 
-if (s->waiting_for_io) {
+if (s->waiting_for_io && !s->quiesce_requested) {
 qemu_coroutine_enter(s->common.co, NULL);
 }
 }
@@ -307,6 +309,14 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
 }
 }
 
+static void coroutine_fn mirror_check_for_quiesce(MirrorBlockJob *s)
+{
+if (s->quiesce_requested) {
+s->quiesce_requested = false;
+qemu_coroutine_yield();
+}
+}
+
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
 BlockDriverState *source = blk_bs(s->common.blk);
@@ -331,6 +341,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 mirror_wait_for_io(s);
 }
 
+mirror_check_for_quiesce(s);
 /* Find the number of consective dirty chunks following the first dirty
  * one, and wait for in flight requests in them. */
 while (nb_chunks * sectors_per_chunk < (s->buf_size >> BDRV_SECTOR_BITS)) {
@@ -442,6 +453,31 @@ static void mirror_drain(MirrorBlockJob *s)
 }
 }
 
+static void mirror_attached_aio_context(AioContext *new_context, void *opaque)
+{
+MirrorBlockJob *s = opaque;
+
+blk_set_aio_context(s->target, new_context);
+
+/* Resume execution */
+assert(!s->quiesce_requested);
+if (s->waiting_for_io) {
+qemu_coroutine_enter(s->common.co, NULL);
+}
+}
+
+static void mirror_detach_aio_context(void *opaque)
+{
+MirrorBlockJob *s = opaque;
+
+/* Complete pending write requests */
+assert(!s->quiesce_requested);
+s->quiesce_requested = true;
+while (s->quiesce_requested || s->in_flight) {
+aio_poll(blk_get_aio_context(s->common.blk), true);
+}
+}
+
 typedef struct {
 int ret;
 } MirrorExitData;
@@ -491,6 +527,8 @@ static void mirror_exit(BlockJob *job, void *opaque)
 if (replace_aio_context) {
 aio_context_release(replace_aio_context);
 }
+blk_remove_aio_context_notifier(s->common.blk, mirror_attached_aio_context,
+mirror_detach_aio_context, s);
 g_free(s->replaces);
 bdrv_op_unblock_all(target_bs, s->common.blocker);
 blk_unref(s->target);
@@ -583,6 +621,7 @@ static void coroutine_fn mirror_run(void *opaque)
block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, 0);
 }
 
+mirror_check_for_quiesce(s);
if (block_job_is_cancelled(&s->common)) {
 goto immediate_exit;
 }
@@ -612,6 +651,7 @@ static void coroutine_fn mirror_run(void *opaque)
 goto immediate_exit;
 }
 
+mirror_check_for_quiesce(s);
 cnt = bdrv_get_dirty_count(s->dirty_bitmap);
 /* s->common.offset contains the number of bytes already processed so
  * far, cnt is the number of dirty sectors remaining and
@@ -851,6 +891,9 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
 
 bdrv_op_block_all(target, s->common.blocker);
 
+blk_add_aio_context_notifier(s->common.blk, mirror_attached_aio_context,
+