Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode
Il 14/10/2013 22:10, Wolfgang Richter ha scritto: > Okay, I think my impression might be wrong, but I thought > 'drive-mirror' would become deprecated with the new 'drive-backup' > command and code. > > If we look at what they do (current documentation and code), > 'drive-backup' AFAIK behaves the same for all modes of 'drive-mirror' > _except_ mode 'none' with _better_ consistency guarantees. That is, > 'drive-backup' clearly provides a point-in-time snapshot, whereas > 'drive-mirror' may create a point-in-time snapshot, but it can not > guarantee that. They are different. drive-backup provides a point-in-time snapshot at the time the job is started. Completing the job stops writing to the target. drive-mirror provides a copy at the time the job is completed. Completing the job stops writing to the source. > In addition, 'drive-backup's code is cleaner, simpler, and easier to > work with (in my opinion) than 'drive-mirror's code. This is because > of the new hooks in block.c for tracked requests etc. so that the job > can insert code to be run on every write in a clean manner (I think). The simpler code for drive-backup is because it doesn't have performance requirements. drive-mirror has to be optimized because otherwise too many dirty sectors pile up and the job doesn't converge. drive-backup runs synchronously as the VM writes to the disk. Using the hooks in block.c we can change drive-mirror to use an "active" (but still asynchronous) approach as long as the in-flight I/O does not exceed the size of the drive-mirror buffer. This would not simplify the code however, it would only guarantee that I/Os are performed in the same order as the original operations issued by the VM. Paolo
Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode
On 10/14/2013 02:10 PM, Wolfgang Richter wrote: >> >> Add the designation '(since 1.7)' to make it obvious when this mode was >> introduced. > > Done. Is it better to place the updated patch in this thread or start > a new one? http://wiki.qemu.org/Contribute/SubmitAPatch suggests submitting a new top-level thread for each revision of a patch series, along with a changelog after the --- or in the cover letter to help focus reviewers on what changed from the earlier revision. > >> >>> # >>> # Since: 1.3 >>> ## >>> { 'enum': 'MirrorSyncMode', >>> - 'data': ['top', 'full', 'none'] } >>> + 'data': ['top', 'full', 'none', 'stream'] } >> >> MirrorSyncMode is used by multiple commands; your summary mentions how >> it would affect 'drive-backup', but what happens to 'drive-mirror'? For >> that matter, why isn't 'drive-mirror' with mode 'none' doing what you >> already want? > > Okay, I think my impression might be wrong, but I thought > 'drive-mirror' would become deprecated with the new 'drive-backup' > command and code. No - drive-mirror and drive-backup are independent, and both useful. Each fills a niche that the other cannot. > > If we look at what they do (current documentation and code), > 'drive-backup' AFAIK behaves the same for all modes of 'drive-mirror' > _except_ mode 'none' with _better_ consistency guarantees. That is, > 'drive-backup' clearly provides a point-in-time snapshot, whereas > 'drive-mirror' may create a point-in-time snapshot, but it can not > guarantee that. 'drive-backup' creates a point-in-time up front. 'drive-mirror' can be used to create a point-in-time at the tail end (when you gracefully cancel the job once it is in mirroring phase). But it also does not have to be canceled - as long as it is still running, you are still mirroring data. > > In addition, 'drive-backup's code is cleaner, simpler, and easier to > work with (in my opinion) than 'drive-mirror's code. This is because > of the new hooks in block.c for tracked requests etc. so that the job > can insert code to be run on every write in a clean manner (I think). > > I think that it would be less confusing to subsume 'drive-mirror' into > 'drive-backup' so that we have a single command with clear consistency > guarantees, and also it would prevent overloading (and more confusion) > with the meaning of the 'MirrorSyncMode's. You can't break the existing semantics, but if you think you can unify the code base, be my guest. > > Perhaps a better naming scheme for the modes would then be: > > full - as before (same for both commands AFAIK) > top - as before (same for both commands AFAIK) > none - if we only have drive-backup, rename this to 'overlay' as it > creates a low-overhead CoW overlay point-in-time snapshot > stream - either keep my name 'stream' to do what 'none' does for > drive-mirror, or leave this as the 'none' mode with the same > drive-mirror semantics > > Thus, I think, with a single extra mode, drive-backup can subsume > drive-mirror. This reduces the number of commands, the documentation, > and the code (all duplicating each other in some manner). > -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode
On Sat, Oct 12, 2013 at 1:47 AM, Fam Zheng wrote: > While mirroring write is a good idea, doing it with drive-backup is probably > not. The function of this command is to 'backup' the image with existing data, > instead of new data. With your 'stream' mode, this semantic is changed. I'm not so sure. I think that it would be better to switch between semantics with 'modes' rather than 'commands' to reduce documentation, duplicate code, and burden on users to remember different commands. Thus, many of the _modes_ of 'drive-backup' might provide you with point-in-time snapshots of a block device, but some of them might just mirror writes for backup purposes other than a point-in-time snapshot. > IMO this feature is best implemented as a block filter, which is currently > being discussed and not ready yet. A second option may be doing with another > command (e.g. block-mirror, or a new one?) That may be true, as I haven't followed block filters very closely yet, but it seemed simple enough with the nice drive-backup code to easily implement. Perhaps this 'mode' could be refactored in the future to use block filters. -- Wolf
Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode
On Fri, Oct 11, 2013 at 11:38 AM, Eric Blake wrote: > On 10/11/2013 09:18 AM, Wolfgang Richter wrote: >> Idea: Introduce a mode for drive-backup that duplicates writes to >> another target, not CoW. It is useful for introspecting (my use >> case), and for keeping a remote block device in sync with writes >> (helps with migration or backup). >> >> > >> >> This is based off of v1.6.0 code. > > Best to rebase it against latest qemu.git. Done. >> +++ b/qapi-schema.json >> @@ -1311,12 +1311,14 @@ >> # >> # @full: copies data from all images to the destination >> # >> -# @none: only copy data written from now on >> +# @none: only copy on write data written from now on >> +# >> +# @stream: copy every new write to target > > Add the designation '(since 1.7)' to make it obvious when this mode was > introduced. Done. Is it better to place the updated patch in this thread or start a new one? > >> # >> # Since: 1.3 >> ## >> { 'enum': 'MirrorSyncMode', >> - 'data': ['top', 'full', 'none'] } >> + 'data': ['top', 'full', 'none', 'stream'] } > > MirrorSyncMode is used by multiple commands; your summary mentions how > it would affect 'drive-backup', but what happens to 'drive-mirror'? For > that matter, why isn't 'drive-mirror' with mode 'none' doing what you > already want? Okay, I think my impression might be wrong, but I thought 'drive-mirror' would become deprecated with the new 'drive-backup' command and code. If we look at what they do (current documentation and code), 'drive-backup' AFAIK behaves the same for all modes of 'drive-mirror' _except_ mode 'none' with _better_ consistency guarantees. That is, 'drive-backup' clearly provides a point-in-time snapshot, whereas 'drive-mirror' may create a point-in-time snapshot, but it can not guarantee that. In addition, 'drive-backup's code is cleaner, simpler, and easier to work with (in my opinion) than 'drive-mirror's code. This is because of the new hooks in block.c for tracked requests etc. so that the job can insert code to be run on every write in a clean manner (I think). I think that it would be less confusing to subsume 'drive-mirror' into 'drive-backup' so that we have a single command with clear consistency guarantees, and also it would prevent overloading (and more confusion) with the meaning of the 'MirrorSyncMode's. Perhaps a better naming scheme for the modes would then be: full - as before (same for both commands AFAIK) top - as before (same for both commands AFAIK) none - if we only have drive-backup, rename this to 'overlay' as it creates a low-overhead CoW overlay point-in-time snapshot stream - either keep my name 'stream' to do what 'none' does for drive-mirror, or leave this as the 'none' mode with the same drive-mirror semantics Thus, I think, with a single extra mode, drive-backup can subsume drive-mirror. This reduces the number of commands, the documentation, and the code (all duplicating each other in some manner). -- Wolf
Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode
On Fri, 10/11 11:18, Wolfgang Richter wrote: > Idea: Introduce a mode for drive-backup that duplicates writes to > another target, not CoW. It is useful for introspecting (my use > case), and for keeping a remote block device in sync with writes > (helps with migration or backup). > > > > Issue with current modes: All of the current modes are well-designed > to support point-in-time snapshots, but none of them handle keeping > another drive up-to-date as new writes continuously occur. The 'None' > mode documentation is a bit ambiguous in this regard, but what it > actually implements is a very low overhead CoW snapshot. > > > > Patch: Fixes ambiguity in the 'None' mode documentation, introduces a > new mode 'stream' which duplicates writes without reading any data > from the original disk. > > I put the logic for copying the write into a new coroutine called > 'backup_do_stream' as it needs almost nothing from the original > 'backup_do_cow' function (no bit map, no reads from a block device, > etc.). The other major change is that tracked requests also contain a > handle to the QIOV involved in the write (and it is passed along). > > This is based off of v1.6.0 code. > While mirroring write is a good idea, doing it with drive-backup is probably not. The function of this command is to 'backup' the image with existing data, instead of new data. With your 'stream' mode, this semantic is changed. IMO this feature is best implemented as a block filter, which is currently being discussed and not ready yet. A second option may be doing with another command (e.g. block-mirror, or a new one?) Thanks, Fam
Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode
On 10/11/2013 09:18 AM, Wolfgang Richter wrote: > Idea: Introduce a mode for drive-backup that duplicates writes to > another target, not CoW. It is useful for introspecting (my use > case), and for keeping a remote block device in sync with writes > (helps with migration or backup). > > > > This is based off of v1.6.0 code. Best to rebase it against latest qemu.git. > +++ b/qapi-schema.json > @@ -1311,12 +1311,14 @@ > # > # @full: copies data from all images to the destination > # > -# @none: only copy data written from now on > +# @none: only copy on write data written from now on > +# > +# @stream: copy every new write to target Add the designation '(since 1.7)' to make it obvious when this mode was introduced. > # > # Since: 1.3 > ## > { 'enum': 'MirrorSyncMode', > - 'data': ['top', 'full', 'none'] } > + 'data': ['top', 'full', 'none', 'stream'] } MirrorSyncMode is used by multiple commands; your summary mentions how it would affect 'drive-backup', but what happens to 'drive-mirror'? For that matter, why isn't 'drive-mirror' with mode 'none' doing what you already want? -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
[Qemu-devel] [RFC PATCH] drive-backup 'stream' mode
Idea: Introduce a mode for drive-backup that duplicates writes to another target, not CoW. It is useful for introspecting (my use case), and for keeping a remote block device in sync with writes (helps with migration or backup). Issue with current modes: All of the current modes are well-designed to support point-in-time snapshots, but none of them handle keeping another drive up-to-date as new writes continuously occur. The 'None' mode documentation is a bit ambiguous in this regard, but what it actually implements is a very low overhead CoW snapshot. Patch: Fixes ambiguity in the 'None' mode documentation, introduces a new mode 'stream' which duplicates writes without reading any data from the original disk. I put the logic for copying the write into a new coroutine called 'backup_do_stream' as it needs almost nothing from the original 'backup_do_cow' function (no bit map, no reads from a block device, etc.). The other major change is that tracked requests also contain a handle to the QIOV involved in the write (and it is passed along). This is based off of v1.6.0 code. diff --git a/block.c b/block.c index 01b66d8..159f825 100644 --- a/block.c +++ b/block.c @@ -1872,12 +1872,14 @@ static void tracked_request_end(BdrvTrackedRequest *req) static void tracked_request_begin(BdrvTrackedRequest *req, BlockDriverState *bs, int64_t sector_num, - int nb_sectors, bool is_write) + int nb_sectors, bool is_write, + QEMUIOVector *qiov) { *req = (BdrvTrackedRequest){ .bs = bs, .sector_num = sector_num, .nb_sectors = nb_sectors, +.qiov = qiov, .is_write = is_write, .co = qemu_coroutine_self(), }; @@ -2528,7 +2530,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs, wait_for_overlapping_requests(bs, sector_num, nb_sectors); } -tracked_request_begin(&req, bs, sector_num, nb_sectors, false); +tracked_request_begin(&req, bs, sector_num, nb_sectors, false, NULL); if (flags & BDRV_REQ_COPY_ON_READ) { int pnum; @@ -2634,7 +2636,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs, wait_for_overlapping_requests(bs, sector_num, nb_sectors); } -tracked_request_begin(&req, bs, sector_num, nb_sectors, true); +tracked_request_begin(&req, bs, sector_num, nb_sectors, true, qiov); ret = notifier_with_return_list_notify(&bs->before_write_notifiers, &req); diff --git a/block/backup.c b/block/backup.c index 6ae8a05..686a53f 100644 --- a/block/backup.c +++ b/block/backup.c @@ -84,6 +84,37 @@ static void cow_request_end(CowRequest *req) qemu_co_queue_restart_all(&req->wait_queue); } +static int coroutine_fn backup_do_stream(BlockDriverState *bs, + int64_t sector_num, int nb_sectors, + QEMUIOVector *qiov) +{ +BackupBlockJob *job = (BackupBlockJob *)bs->job; +CowRequest cow_request; +int ret = 0; +int64_t start = sector_num, end = sector_num + nb_sectors; + +qemu_co_rwlock_rdlock(&job->flush_rwlock); + +wait_for_overlapping_requests(job, start, end); +cow_request_begin(&cow_request, job, start, end); + +ret = bdrv_co_writev(job->target, + sector_num, nb_sectors, + qiov); + +/* Publish progress, guest I/O counts as progress too. Note that the + * offset field is an opaque progress value, it is not a disk offset. + */ +job->sectors_read += sector_num; +job->common.offset += sector_num * BDRV_SECTOR_SIZE; + +cow_request_end(&cow_request); + +qemu_co_rwlock_unlock(&job->flush_rwlock); + +return ret; +} + static int coroutine_fn backup_do_cow(BlockDriverState *bs, int64_t sector_num, int nb_sectors, bool *error_is_read) @@ -181,7 +212,12 @@ static int coroutine_fn backup_before_write_notify( { BdrvTrackedRequest *req = opaque; -return backup_do_cow(req->bs, req->sector_num, req->nb_sectors, NULL); +if (MIRROR_SYNC_MODE_STREAM) { +return backup_do_stream(req->bs, req->sector_num, req->nb_sectors, +req->qiov); +} else { +return backup_do_cow(req->bs, req->sector_num, req->nb_sectors, NULL); +} } static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp) @@ -248,7 +284,8 @@ static void coroutine_fn backup_run(void *opaque) bdrv_add_before_write_notifier(bs, &before_write); -if (job->sync_mode == MIRROR_SYNC_MODE_NONE) { +if (job->sync_mode == MIRROR_SYNC_MODE_NONE || +job->sync_mode == MIRROR_SYNC_MODE_STREAM) { while (!block_job_is_cancelled(&job->common)) { /* Yield until the job is cancelled. We ju