Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode

2013-10-15 Thread Paolo Bonzini
Il 14/10/2013 22:10, Wolfgang Richter ha scritto:
> Okay, I think my impression might be wrong, but I thought
> 'drive-mirror' would become deprecated with the new 'drive-backup'
> command and code.
> 
> If we look at what they do (current documentation and code),
> 'drive-backup' AFAIK behaves the same for all modes of 'drive-mirror'
> _except_ mode 'none' with _better_ consistency guarantees.  That is,
> 'drive-backup' clearly provides a point-in-time snapshot, whereas
> 'drive-mirror' may create a point-in-time snapshot, but it can not
> guarantee that.

They are different.

drive-backup provides a point-in-time snapshot at the time the job is
started.  Completing the job stops writing to the target.

drive-mirror provides a copy at the time the job is completed.
Completing the job stops writing to the source.

> In addition, 'drive-backup's code is cleaner, simpler, and easier to
> work with (in my opinion) than 'drive-mirror's code.  This is because
> of the new hooks in block.c for tracked requests etc. so that the job
> can insert code to be run on every write in a clean manner (I think).

The simpler code for drive-backup is because it doesn't have performance
requirements.  drive-mirror has to be optimized because otherwise too
many dirty sectors pile up and the job doesn't converge.  drive-backup
runs synchronously as the VM writes to the disk.

Using the hooks in block.c we can change drive-mirror to use an "active"
(but still asynchronous) approach as long as the in-flight I/O does not
exceed the size of the drive-mirror buffer.  This would not simplify the
code however, it would only guarantee that I/Os are performed in the
same order as the original operations issued by the VM.

Paolo



Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode

2013-10-14 Thread Eric Blake
On 10/14/2013 02:10 PM, Wolfgang Richter wrote:

>>
>> Add the designation '(since 1.7)' to make it obvious when this mode was
>> introduced.
> 
> Done.  Is it better to place the updated patch in this thread or start
> a new one?

http://wiki.qemu.org/Contribute/SubmitAPatch suggests submitting a new
top-level thread for each revision of a patch series, along with a
changelog after the --- or in the cover letter to help focus reviewers
on what changed from the earlier revision.

> 
>>
>>>  #
>>>  # Since: 1.3
>>>  ##
>>>  { 'enum': 'MirrorSyncMode',
>>> -  'data': ['top', 'full', 'none'] }
>>> +  'data': ['top', 'full', 'none', 'stream'] }
>>
>> MirrorSyncMode is used by multiple commands; your summary mentions how
>> it would affect 'drive-backup', but what happens to 'drive-mirror'?  For
>> that matter, why isn't 'drive-mirror' with mode 'none' doing what you
>> already want?
> 
> Okay, I think my impression might be wrong, but I thought
> 'drive-mirror' would become deprecated with the new 'drive-backup'
> command and code.

No - drive-mirror and drive-backup are independent, and both useful.
Each fills a niche that the other cannot.

> 
> If we look at what they do (current documentation and code),
> 'drive-backup' AFAIK behaves the same for all modes of 'drive-mirror'
> _except_ mode 'none' with _better_ consistency guarantees.  That is,
> 'drive-backup' clearly provides a point-in-time snapshot, whereas
> 'drive-mirror' may create a point-in-time snapshot, but it can not
> guarantee that.

'drive-backup' creates a point-in-time up front.
'drive-mirror' can be used to create a point-in-time at the tail end
(when you gracefully cancel the job once it is in mirroring phase).  But
it also does not have to be canceled - as long as it is still running,
you are still mirroring data.

> 
> In addition, 'drive-backup's code is cleaner, simpler, and easier to
> work with (in my opinion) than 'drive-mirror's code.  This is because
> of the new hooks in block.c for tracked requests etc. so that the job
> can insert code to be run on every write in a clean manner (I think).
> 
> I think that it would be less confusing to subsume 'drive-mirror' into
> 'drive-backup' so that we have a single command with clear consistency
> guarantees, and also it would prevent overloading (and more confusion)
> with the meaning of the 'MirrorSyncMode's.

You can't break the existing semantics, but if you think you can unify
the code base, be my guest.

> 
> Perhaps a better naming scheme for the modes would then be:
> 
> full - as before (same for both commands AFAIK)
> top - as before (same for both commands AFAIK)
> none - if we only have drive-backup, rename this to 'overlay' as it
> creates a low-overhead CoW overlay point-in-time snapshot
> stream - either keep my name 'stream' to do what 'none' does for
> drive-mirror, or leave this as the 'none' mode with the same
> drive-mirror semantics
> 
> Thus, I think, with a single extra mode, drive-backup can subsume
> drive-mirror.  This reduces the number of commands, the documentation,
> and the code (all duplicating each other in some manner).
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode

2013-10-14 Thread Wolfgang Richter
On Sat, Oct 12, 2013 at 1:47 AM, Fam Zheng  wrote:
> While mirroring write is a good idea, doing it with drive-backup is probably
> not. The function of this command is to 'backup' the image with existing data,
> instead of new data. With your 'stream' mode, this semantic is changed.

I'm not so sure.  I think that it would be better to switch between
semantics with 'modes' rather than 'commands' to reduce documentation,
duplicate code, and burden on users to remember different commands.

Thus, many of the _modes_ of 'drive-backup' might provide you with
point-in-time snapshots of a block device, but some of them might just
mirror writes for backup purposes other than a point-in-time snapshot.

> IMO this feature is best implemented as a block filter, which is currently
> being discussed and not ready yet. A second option may be doing with another
> command (e.g. block-mirror, or a new one?)

That may be true, as I haven't followed block filters very closely
yet, but it seemed simple enough with the nice drive-backup code to
easily implement.

Perhaps this 'mode' could be refactored in the future to use block filters.

-- 
Wolf



Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode

2013-10-14 Thread Wolfgang Richter
On Fri, Oct 11, 2013 at 11:38 AM, Eric Blake  wrote:
> On 10/11/2013 09:18 AM, Wolfgang Richter wrote:
>> Idea: Introduce a mode for drive-backup that duplicates writes to
>> another target, not CoW.  It is useful for introspecting (my use
>> case), and for keeping a remote block device in sync with writes
>> (helps with migration or backup).
>>
>>
>
>>
>> This is based off of v1.6.0 code.
>
> Best to rebase it against latest qemu.git.

Done.

>> +++ b/qapi-schema.json
>> @@ -1311,12 +1311,14 @@
>>  #
>>  # @full: copies data from all images to the destination
>>  #
>> -# @none: only copy data written from now on
>> +# @none: only copy on write data written from now on
>> +#
>> +# @stream: copy every new write to target
>
> Add the designation '(since 1.7)' to make it obvious when this mode was
> introduced.

Done.  Is it better to place the updated patch in this thread or start
a new one?

>
>>  #
>>  # Since: 1.3
>>  ##
>>  { 'enum': 'MirrorSyncMode',
>> -  'data': ['top', 'full', 'none'] }
>> +  'data': ['top', 'full', 'none', 'stream'] }
>
> MirrorSyncMode is used by multiple commands; your summary mentions how
> it would affect 'drive-backup', but what happens to 'drive-mirror'?  For
> that matter, why isn't 'drive-mirror' with mode 'none' doing what you
> already want?

Okay, I think my impression might be wrong, but I thought
'drive-mirror' would become deprecated with the new 'drive-backup'
command and code.

If we look at what they do (current documentation and code),
'drive-backup' AFAIK behaves the same for all modes of 'drive-mirror'
_except_ mode 'none' with _better_ consistency guarantees.  That is,
'drive-backup' clearly provides a point-in-time snapshot, whereas
'drive-mirror' may create a point-in-time snapshot, but it can not
guarantee that.

In addition, 'drive-backup's code is cleaner, simpler, and easier to
work with (in my opinion) than 'drive-mirror's code.  This is because
of the new hooks in block.c for tracked requests etc. so that the job
can insert code to be run on every write in a clean manner (I think).

I think that it would be less confusing to subsume 'drive-mirror' into
'drive-backup' so that we have a single command with clear consistency
guarantees, and also it would prevent overloading (and more confusion)
with the meaning of the 'MirrorSyncMode's.

Perhaps a better naming scheme for the modes would then be:

full - as before (same for both commands AFAIK)
top - as before (same for both commands AFAIK)
none - if we only have drive-backup, rename this to 'overlay' as it
creates a low-overhead CoW overlay point-in-time snapshot
stream - either keep my name 'stream' to do what 'none' does for
drive-mirror, or leave this as the 'none' mode with the same
drive-mirror semantics

Thus, I think, with a single extra mode, drive-backup can subsume
drive-mirror.  This reduces the number of commands, the documentation,
and the code (all duplicating each other in some manner).

-- 
Wolf



Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode

2013-10-11 Thread Fam Zheng
On Fri, 10/11 11:18, Wolfgang Richter wrote:
> Idea: Introduce a mode for drive-backup that duplicates writes to
> another target, not CoW.  It is useful for introspecting (my use
> case), and for keeping a remote block device in sync with writes
> (helps with migration or backup).
> 
> 
> 
> Issue with current modes:  All of the current modes are well-designed
> to support point-in-time snapshots, but none of them handle keeping
> another drive up-to-date as new writes continuously occur.  The 'None'
> mode documentation is a bit ambiguous in this regard, but what it
> actually implements is a very low overhead CoW snapshot.
> 
> 
> 
> Patch: Fixes ambiguity in the 'None' mode documentation, introduces a
> new mode 'stream' which duplicates writes without reading any data
> from the original disk.
> 
> I put the logic for copying the write into a new coroutine called
> 'backup_do_stream' as it needs almost nothing from the original
> 'backup_do_cow' function (no bit map, no reads from a block device,
> etc.).  The other major change is that tracked requests also contain a
> handle to the QIOV involved in the write (and it is passed along).
> 
> This is based off of v1.6.0 code.
> 

While mirroring write is a good idea, doing it with drive-backup is probably
not. The function of this command is to 'backup' the image with existing data,
instead of new data. With your 'stream' mode, this semantic is changed.

IMO this feature is best implemented as a block filter, which is currently
being discussed and not ready yet. A second option may be doing with another
command (e.g. block-mirror, or a new one?)

Thanks,
Fam



Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode

2013-10-11 Thread Eric Blake
On 10/11/2013 09:18 AM, Wolfgang Richter wrote:
> Idea: Introduce a mode for drive-backup that duplicates writes to
> another target, not CoW.  It is useful for introspecting (my use
> case), and for keeping a remote block device in sync with writes
> (helps with migration or backup).
> 
> 

> 
> This is based off of v1.6.0 code.

Best to rebase it against latest qemu.git.

> +++ b/qapi-schema.json
> @@ -1311,12 +1311,14 @@
>  #
>  # @full: copies data from all images to the destination
>  #
> -# @none: only copy data written from now on
> +# @none: only copy on write data written from now on
> +#
> +# @stream: copy every new write to target

Add the designation '(since 1.7)' to make it obvious when this mode was
introduced.

>  #
>  # Since: 1.3
>  ##
>  { 'enum': 'MirrorSyncMode',
> -  'data': ['top', 'full', 'none'] }
> +  'data': ['top', 'full', 'none', 'stream'] }

MirrorSyncMode is used by multiple commands; your summary mentions how
it would affect 'drive-backup', but what happens to 'drive-mirror'?  For
that matter, why isn't 'drive-mirror' with mode 'none' doing what you
already want?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [RFC PATCH] drive-backup 'stream' mode

2013-10-11 Thread Wolfgang Richter
Idea: Introduce a mode for drive-backup that duplicates writes to
another target, not CoW.  It is useful for introspecting (my use
case), and for keeping a remote block device in sync with writes
(helps with migration or backup).



Issue with current modes:  All of the current modes are well-designed
to support point-in-time snapshots, but none of them handle keeping
another drive up-to-date as new writes continuously occur.  The 'None'
mode documentation is a bit ambiguous in this regard, but what it
actually implements is a very low overhead CoW snapshot.



Patch: Fixes ambiguity in the 'None' mode documentation, introduces a
new mode 'stream' which duplicates writes without reading any data
from the original disk.

I put the logic for copying the write into a new coroutine called
'backup_do_stream' as it needs almost nothing from the original
'backup_do_cow' function (no bit map, no reads from a block device,
etc.).  The other major change is that tracked requests also contain a
handle to the QIOV involved in the write (and it is passed along).

This is based off of v1.6.0 code.







diff --git a/block.c b/block.c
index 01b66d8..159f825 100644
--- a/block.c
+++ b/block.c
@@ -1872,12 +1872,14 @@ static void tracked_request_end(BdrvTrackedRequest *req)
 static void tracked_request_begin(BdrvTrackedRequest *req,
   BlockDriverState *bs,
   int64_t sector_num,
-  int nb_sectors, bool is_write)
+  int nb_sectors, bool is_write,
+  QEMUIOVector *qiov)
 {
 *req = (BdrvTrackedRequest){
 .bs = bs,
 .sector_num = sector_num,
 .nb_sectors = nb_sectors,
+.qiov = qiov,
 .is_write = is_write,
 .co = qemu_coroutine_self(),
 };
@@ -2528,7 +2530,7 @@ static int coroutine_fn
bdrv_co_do_readv(BlockDriverState *bs,
 wait_for_overlapping_requests(bs, sector_num, nb_sectors);
 }

-tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
+tracked_request_begin(&req, bs, sector_num, nb_sectors, false, NULL);

 if (flags & BDRV_REQ_COPY_ON_READ) {
 int pnum;
@@ -2634,7 +2636,7 @@ static int coroutine_fn
bdrv_co_do_writev(BlockDriverState *bs,
 wait_for_overlapping_requests(bs, sector_num, nb_sectors);
 }

-tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
+tracked_request_begin(&req, bs, sector_num, nb_sectors, true, qiov);

 ret = notifier_with_return_list_notify(&bs->before_write_notifiers, &req);

diff --git a/block/backup.c b/block/backup.c
index 6ae8a05..686a53f 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -84,6 +84,37 @@ static void cow_request_end(CowRequest *req)
 qemu_co_queue_restart_all(&req->wait_queue);
 }

+static int coroutine_fn backup_do_stream(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors,
+ QEMUIOVector *qiov)
+{
+BackupBlockJob *job = (BackupBlockJob *)bs->job;
+CowRequest cow_request;
+int ret = 0;
+int64_t start = sector_num, end = sector_num + nb_sectors;
+
+qemu_co_rwlock_rdlock(&job->flush_rwlock);
+
+wait_for_overlapping_requests(job, start, end);
+cow_request_begin(&cow_request, job, start, end);
+
+ret = bdrv_co_writev(job->target,
+ sector_num, nb_sectors,
+ qiov);
+
+/* Publish progress, guest I/O counts as progress too.  Note that the
+ * offset field is an opaque progress value, it is not a disk offset.
+ */
+job->sectors_read += sector_num;
+job->common.offset += sector_num * BDRV_SECTOR_SIZE;
+
+cow_request_end(&cow_request);
+
+qemu_co_rwlock_unlock(&job->flush_rwlock);
+
+return ret;
+}
+
 static int coroutine_fn backup_do_cow(BlockDriverState *bs,
   int64_t sector_num, int nb_sectors,
   bool *error_is_read)
@@ -181,7 +212,12 @@ static int coroutine_fn backup_before_write_notify(
 {
 BdrvTrackedRequest *req = opaque;

-return backup_do_cow(req->bs, req->sector_num, req->nb_sectors, NULL);
+if (MIRROR_SYNC_MODE_STREAM) {
+return backup_do_stream(req->bs, req->sector_num, req->nb_sectors,
+req->qiov);
+} else {
+return backup_do_cow(req->bs, req->sector_num, req->nb_sectors, NULL);
+}
 }

 static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
@@ -248,7 +284,8 @@ static void coroutine_fn backup_run(void *opaque)

 bdrv_add_before_write_notifier(bs, &before_write);

-if (job->sync_mode == MIRROR_SYNC_MODE_NONE) {
+if (job->sync_mode == MIRROR_SYNC_MODE_NONE ||
+job->sync_mode == MIRROR_SYNC_MODE_STREAM) {
 while (!block_job_is_cancelled(&job->common)) {
 /* Yield until the job is cancelled.  We ju