backup: use backup-top instead of write notifiers

Vladimir Sementsov-Ogievskiy Thu, 29 Aug 2019 07:57:19 -0700

28.08.2019 22:50, Max Reitz wrote:
> On 26.08.19 18:13, Vladimir Sementsov-Ogievskiy wrote:
>> Drop write notifiers and use filter node instead.
>>
>> = Changes =
>>
>> 1. add filter-node-name argument for backup qmp api. We have to do it
>> in this commit, as 257 needs to be fixed.
> 
> I feel a bit bad about it not being an implicit node.  But I know you
> don’t like them, they’re probably just broken altogether and because
> libvirt doesn’t use backup (yet), probably nobody cares.
> 
>> 2. there no move write notifiers here, so is_write_notifier parameter
> 
> s/there/there are/, I suppose?
> 
>> is dropped from block-copy paths.
>>
>> 3. Intersecting requests handling changed, now synchronization between
>> backup-top, backup and guest writes are all done in block/block-copy.c
>> and works as follows:
>>
>> On copy operation, we work only with dirty areas. If bits are dirty it
>> means that there are no requests intersecting with this area. We clear
>> dirty bits and take bdrv range lock (bdrv_co_try_lock) on this area to
>> prevent further operations from interaction with guest (only with
>> guest, as neither backup nor backup-top will touch non-dirty area). If
>> copy-operation failed we set dirty bits back together with releasing
>> the lock.
>>
>> The actual difference with old scheme is that on guest writes we
>> don't lock the whole region but only dirty-parts, and to be more
>> precise: only dirty-part we are currently operate on. In old scheme
>> guest write to non-dirty area (which may be safely ignored by backup)
>> may wait for intersecting request, touching some other area which is
>> dirty.
>>
>> 4. To sync with in-flight requests at job finish we now have drained
>> removing of the filter, we don't need rw-lock.
>>
>> = Notes =
>>
>> Note the consequence of three objects appearing: backup-top, backup job
>> and block-copy-state:
>>
>> 1. We want to insert backup-top before job creation, to behave similar
>> with mirror and commit, where job is started upon filter.
>>
>> 2. We also have to create block-copy-state after filter injection, as
>> we don't want it's source child be replaced by fitler. Instead we want
> 
> s/it's/its/, s/filter/filter/ (although “fitler” does have an amusing
> ring to it)
> 
>> to keep BCS.source to be real source node, as we want to use
>> bdrv_co_try_lock in CBW operations and it can't be used on filter, as
>> on filter we already have in-flight (write) request from upper layer.
> 
> Reasonable, even more so as I suppose BCS.source should eventually be a
> BdrvChild of backup-top.
> 
> What looks wrong is that the sync_bitmap is created on the source (“bs”
> in backup_job_create()), but backup_cleanup_sync_bitmap() still assumes
> it is on blk_bs(job->common.blk) (which is no longer true).
> 
>> So, we firstly create inject backup-top, then create job and BCS. BCS
>> is the latest just to not create extra variable for it. Finally we set
>> bcs for backup-top filter.
>>
>> = Iotest changes =
>>
>> 56: op-blocker doesn't shot now, as we set it on source, but then check
> 
> s/shot/show/?
> 
>> on filter, when trying to start second backup, so error caught in
>> test_dismiss_collision is changed. It's OK anyway, as this test-case
>> seems to test that after some collision we can dismiss first job and
>> successfully start the second one. So, the change is that collision is
>> changed from op-blocker to file-posix locks.
> 
> Well, but the op blocker belongs to the source, which should be covered
> by internal permissions.  The fact that it now shows up as a file-posix
> error thus shows that the conflict also moves from the source to the
> target.  It’s OK because we actually don’t have a conflict on the source.
> 
> But I wonder whether it would be better for test_dismiss_collision() to
> do a blockdev-backup instead where we can see the collision on the target.
> 
> Hm.  On second thought, why do we even get a conflict on the target?
> block-copy does share the WRITE permission for it...


Not sure, but assume that this is because in file-posix.c in raw_co_create
we do want RESIZE perm.

I can instead move this test to use specified job-id, to move the collision
to "job-id already exists" error. Is it better?

I'm afraid that posix locks will not work if disable them in config.

> 
>> However, it's obvious now that we'd better drop this op-blocker at all
>> and add a test-case for two backups from one node (to different
>> destinations) actually works. But not in these series.
>>
>> 257: The test wants to emulate guest write during backup. They should
>> go to filter node, not to original source node, of course. Therefore we
>> need to specify filter node name and use it.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
>> ---
>>   qapi/block-core.json       |   8 +-
>>   include/block/block-copy.h |   2 +-
>>   include/block/block_int.h  |   1 +
>>   block/backup-top.c         |  14 +-
>>   block/backup.c             | 113 +++-----------
>>   block/block-copy.c         |  55 ++++---
>>   block/replication.c        |   2 +-
>>   blockdev.c                 |   1 +
>>   tests/qemu-iotests/056     |   2 +-
>>   tests/qemu-iotests/257     |   7 +-
>>   tests/qemu-iotests/257.out | 306 ++++++++++++++++++-------------------
>>   11 files changed, 230 insertions(+), 281 deletions(-)
> 
> Nice.
> 
> I don’t have much to object (for some reason, I feel a bit uneasy about
> dropping all the request waiting code, but I suppose that is indeed
> taken care of with the range locks now).
> 
> [...]
> 
>> diff --git a/block/backup.c b/block/backup.c
>> index d927c63e5a..259a165405 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
> 
> [...]
> 
>> @@ -552,6 +480,9 @@ BlockJob *backup_job_create(const char *job_id, 
>> BlockDriverState *bs,
>>           backup_clean(&job->common.job);
>>           job_early_fail(&job->common.job);
>>       }

actually, here should be "else if"

>> +    if (backup_top) {
>> +        bdrv_backup_top_drop(backup_top);
>> +    }
> 
> This order is weird because backup-top still has a pointer to the BCS,
> but we have already freed it at this point.
> 
>>   
>>       return NULL;
>>   }
>> diff --git a/block/block-copy.c b/block/block-copy.c
>> index 6828c46ba0..f3102ec3ff 100644
>> --- a/block/block-copy.c
>> +++ b/block/block-copy.c
>> @@ -98,27 +98,32 @@ fail:
>>    * error occurred, return a negative error number
>>    */
>>   static int coroutine_fn block_copy_with_bounce_buffer(
>> -        BlockCopyState *s, int64_t start, int64_t end, bool 
>> is_write_notifier,
>> +        BlockCopyState *s, int64_t start, int64_t end,
>>           bool *error_is_read, void **bounce_buffer)
>>   {
>>       int ret;
>> -    int nbytes;
>> -    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>> +    int nbytes = MIN(s->cluster_size, s->len - start);
>> +    void *lock = bdrv_co_try_lock(blk_bs(s->source), start, nbytes);
>> +
>> +    /*
>> +     * Function must be called only on full-dirty region, so nobody 
>> touching or
>> +     * touched these bytes. Therefore, we must successfully get lock.
> 
> Which is OK because backup-top does not share the WRITE permission.  But
> it is organized a bit weirdly, because it’s actually block-copy here
> that relies on the WRITE permission not to be shared; which it itself
> cannot do because backup-top needs it.
> 
> This of course results from the fact that backup-top and block-copy
> should really use the same object to access the source, namely the
> backing BdrvChild of backup-top.
> 
> Maybe there should be a comment somewhere that points this out?

But I wanted block-copy not to be directly connected to backup-top.

I think actually we rely on the fact that user of block-copy() guarantees that
nobody is touching dirty areas (except block_copy). And with backup, this is 
done by
backup-top filter. I'll add a comment on this.

> 
>> +     */
>> +    assert(lock);
> 
> [...]
> 
>> @@ -128,13 +133,16 @@ static int coroutine_fn block_copy_with_bounce_buffer(
>>           if (error_is_read) {
>>               *error_is_read = false;
>>           }
>> -        goto fail;
>> +        goto out;
>>       }
>>   
>> -    return nbytes;
>> -fail:
>> -    bdrv_set_dirty_bitmap(s->copy_bitmap, start, s->cluster_size);
>> -    return ret;
>> +out:
>> +    if (ret < 0) {
>> +        bdrv_set_dirty_bitmap(s->copy_bitmap, start, s->cluster_size);
>> +    }
>> +    bdrv_co_unlock(lock);
>> +
>> +    return ret < 0 ? ret : nbytes;
>>   
> 
> I feel like it would still be simpler to keep the separate fail path and
> just duplicate the bdrv_co_unlock(lock) call.
> 
> [...]
> 
>> diff --git a/blockdev.c b/blockdev.c
>> index fbef6845c8..f89e48fc79 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -3601,6 +3601,7 @@ static BlockJob *do_backup_common(BackupCommon *backup,
>>       job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
>>                               backup->sync, bmap, backup->bitmap_mode,
>>                               backup->compress,
>> +                            backup->filter_node_name,
> 
> Is this guaranteed to be NULL when not specified by the user?
> 

As far as I know - yes, it is.


-- 
Best regards,
Vladimir

Re: [Qemu-block] [PATCH v9 13/13] block/backup: use backup-top instead of write notifiers

Reply via email to