Re: [Qemu-block] [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication

2016-03-09 Thread Changlong Xie

On 03/05/2016 01:53 AM, Stefan Hajnoczi wrote:

On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:

+static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
+{
+Error *local_err = NULL;
+int ret;
+
+if (!s->secondary_disk->bs->job) {
+error_setg(errp, "Backup job was cancelled unexpectedly");
+return;
+}


Do you have test cases for the replication driver?

This error code path seems like something that should be exercised to
make sure it works.

Basic start/checkpoint/stop and checks that the active/secondary/hidden
disks contain the expected data are also important.

Probably the easiest way to do this would be a file called
tests/test-replication.c that calls start/checkpoint/stop and read/write
functions directly instead of running a guest.



Hi Stefan

I'm working on the testcase now, and encount some issues.

Replication is different from other block drivers, we just use it for 
COLO live migration what means "primary" and "secondary" *MUST*

co-work with each other.

(1)
There are two test "prototype":
1 "COLO prototype":
simulate COLO environment, run two guests at the same time and 
setup nbd on both 'primary'/'secondary' side.


2 "Separate prototype"
test relevant APIs in separate 'primary'/'secondary' mode, so we
don't need run two guests and setup nbd on both side

I prefer *"Separate prototype"*. Which one do you prefer?

(2)
Since we have replication_(start/stop/do_checkpoint/get_error)_all APIs, 
i know how to test start/checkpoint/stop/get_error callbacks.


But I'm confused about how to test read/write of replication, are there 
some ready-made APIs for qemu test system?


I know, for interpreted languages, we can use:
 $QEMU_IO -s -c "read 0 $size" "$TEST_IMG"
 $QEMU_IO -s -c "write -P 0xa 0 $size" "$TEST_IMG"

Also are some APIs to check disks contents(we need check 
active/secondary/hidden disk contents)?


Thanks
-Xie





Re: [Qemu-block] [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication

2016-03-08 Thread Changlong Xie

On 03/05/2016 01:39 AM, Stefan Hajnoczi wrote:

On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:

+static void replication_start(ReplicationState *rs, ReplicationMode mode,
+  Error **errp)
+{
+BlockDriverState *bs = rs->opaque;
+BDRVReplicationState *s = bs->opaque;
+int64_t active_length, hidden_length, disk_length;
+AioContext *aio_context;
+Error *local_err = NULL;
+
+if (s->replication_state != BLOCK_REPLICATION_NONE) {


Dereferencing bs and s is not thread-safe since the AioContext isn't
acquired here.  Unless you have other locking rules, the default is that
everything in BlockDriverState and bs->opaque must be accessed with the
AioContext acquired.

Please apply this comment to the rest of the patch series, too.
Functions that are not part of BlockDriver are probably called outside
the AioContext and must acquire it before they are allowed to access
anything else in bs.


+s->top_bs = get_top_bs(bs);
+if (!s->top_bs) {
+error_setg(errp, "Cannot get the top block driver state to do"
+   " internal backup");
+return;
+}


Traversing the BDS graph using the parent pointer is not safe - you
cannot stash the parent pointer because it changes if a parent node is
inserted/removed.

I suggest passing the drive name as an argument so that bdrv_lookup_bs()
can be used when installing the op blocker.  Then the BdrvChild.parent
pointer patch and get_top_bs() can be deleted.

Jeff Cody is currently working on a new op blocker system.  Hopefully
this system will allow you to install blockers on bs instead of on the
drive's top BDS.  But let's not worry about that for now and just use
the drive name as an argument.


+/*
+ * Must protect backup target if backup job was stopped/cancelled
+ * unexpectedly
+ */
+bdrv_ref(s->hidden_disk->bs);


Where is the matching bdrv_unref() call?


Two conditions
1. job is cancelled, so "local_err != 0"

449 if (local_err) {
450 error_propagate(errp, local_err);
451 backup_job_cleanup(s);
452 bdrv_unref(s->hidden_disk->bs);
453 return;
454 }

2. backup completed
backup_complete
bdrv_unref(s->target);

If i'm wrong, pls correct me.

Thanks
-Xie











Re: [Qemu-block] [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication

2016-03-07 Thread Changlong Xie

On 03/05/2016 01:53 AM, Stefan Hajnoczi wrote:

On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:

+static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
+{
+Error *local_err = NULL;
+int ret;
+
+if (!s->secondary_disk->bs->job) {
+error_setg(errp, "Backup job was cancelled unexpectedly");
+return;
+}


Do you have test cases for the replication driver?

This error code path seems like something that should be exercised to
make sure it works.

Basic start/checkpoint/stop and checks that the active/secondary/hidden
disks contain the expected data are also important.

Probably the easiest way to do this would be a file called
tests/test-replication.c that calls start/checkpoint/stop and read/write
functions directly instead of running a guest.


Will do that in next version.

Thanks
-Xie








Re: [Qemu-block] [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication

2016-03-07 Thread Changlong Xie

On 03/05/2016 01:39 AM, Stefan Hajnoczi wrote:

On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:

+static void replication_start(ReplicationState *rs, ReplicationMode mode,
+  Error **errp)
+{
+BlockDriverState *bs = rs->opaque;
+BDRVReplicationState *s = bs->opaque;
+int64_t active_length, hidden_length, disk_length;
+AioContext *aio_context;
+Error *local_err = NULL;
+
+if (s->replication_state != BLOCK_REPLICATION_NONE) {


Dereferencing bs and s is not thread-safe since the AioContext isn't
acquired here.  Unless you have other locking rules, the default is that
everything in BlockDriverState and bs->opaque must be accessed with the
AioContext acquired.

Please apply this comment to the rest of the patch series, too.
Functions that are not part of BlockDriver are probably called outside
the AioContext and must acquire it before they are allowed to access
anything else in bs.



Will do implement in replication_start/stop/checkpoint/get_error callbacks.


+s->top_bs = get_top_bs(bs);
+if (!s->top_bs) {
+error_setg(errp, "Cannot get the top block driver state to do"
+   " internal backup");
+return;
+}


Traversing the BDS graph using the parent pointer is not safe - you
cannot stash the parent pointer because it changes if a parent node is
inserted/removed.

I suggest passing the drive name as an argument so that bdrv_lookup_bs()
can be used when installing the op blocker.  Then the BdrvChild.parent
pointer patch and get_top_bs() can be deleted.



ok


Jeff Cody is currently working on a new op blocker system.  Hopefully
this system will allow you to install blockers on bs instead of on the
drive's top BDS.  But let's not worry about that for now and just use
the drive name as an argument.


Good news.

Thanks




+/*
+ * Must protect backup target if backup job was stopped/cancelled
+ * unexpectedly
+ */
+bdrv_ref(s->hidden_disk->bs);


Where is the matching bdrv_unref() call?







Re: [Qemu-block] [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication

2016-03-04 Thread Stefan Hajnoczi
On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:
> +static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
> +{
> +Error *local_err = NULL;
> +int ret;
> +
> +if (!s->secondary_disk->bs->job) {
> +error_setg(errp, "Backup job was cancelled unexpectedly");
> +return;
> +}

Do you have test cases for the replication driver?

This error code path seems like something that should be exercised to
make sure it works.

Basic start/checkpoint/stop and checks that the active/secondary/hidden
disks contain the expected data are also important.

Probably the easiest way to do this would be a file called
tests/test-replication.c that calls start/checkpoint/stop and read/write
functions directly instead of running a guest.


signature.asc
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication

2016-03-04 Thread Stefan Hajnoczi
On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:
> +static void replication_start(ReplicationState *rs, ReplicationMode mode,
> +  Error **errp)
> +{
> +BlockDriverState *bs = rs->opaque;
> +BDRVReplicationState *s = bs->opaque;
> +int64_t active_length, hidden_length, disk_length;
> +AioContext *aio_context;
> +Error *local_err = NULL;
> +
> +if (s->replication_state != BLOCK_REPLICATION_NONE) {

Dereferencing bs and s is not thread-safe since the AioContext isn't
acquired here.  Unless you have other locking rules, the default is that
everything in BlockDriverState and bs->opaque must be accessed with the
AioContext acquired.

Please apply this comment to the rest of the patch series, too.
Functions that are not part of BlockDriver are probably called outside
the AioContext and must acquire it before they are allowed to access
anything else in bs.

> +s->top_bs = get_top_bs(bs);
> +if (!s->top_bs) {
> +error_setg(errp, "Cannot get the top block driver state to do"
> +   " internal backup");
> +return;
> +}

Traversing the BDS graph using the parent pointer is not safe - you
cannot stash the parent pointer because it changes if a parent node is
inserted/removed.

I suggest passing the drive name as an argument so that bdrv_lookup_bs()
can be used when installing the op blocker.  Then the BdrvChild.parent
pointer patch and get_top_bs() can be deleted.

Jeff Cody is currently working on a new op blocker system.  Hopefully
this system will allow you to install blockers on bs instead of on the
drive's top BDS.  But let's not worry about that for now and just use
the drive name as an argument.

> +/*
> + * Must protect backup target if backup job was stopped/cancelled
> + * unexpectedly
> + */
> +bdrv_ref(s->hidden_disk->bs);

Where is the matching bdrv_unref() call?


signature.asc
Description: PGP signature