backup: use backup-top instead of write notifiers

Vladimir Sementsov-Ogievskiy Tue, 03 Sep 2019 01:07:24 -0700

02.09.2019 19:34, Max Reitz wrote:
> On 29.08.19 16:55, Vladimir Sementsov-Ogievskiy wrote:
>> 28.08.2019 22:50, Max Reitz wrote:
>>> On 26.08.19 18:13, Vladimir Sementsov-Ogievskiy wrote:
>>>> Drop write notifiers and use filter node instead.
>>>>
>>>> = Changes =
>>>>
>>>> 1. add filter-node-name argument for backup qmp api. We have to do it
>>>> in this commit, as 257 needs to be fixed.
>>>
>>> I feel a bit bad about it not being an implicit node.  But I know you
>>> don’t like them, they’re probably just broken altogether and because
>>> libvirt doesn’t use backup (yet), probably nobody cares.
>>>
>>>> 2. there no move write notifiers here, so is_write_notifier parameter
>>>
>>> s/there/there are/, I suppose?
>>>
>>>> is dropped from block-copy paths.
>>>>
>>>> 3. Intersecting requests handling changed, now synchronization between
>>>> backup-top, backup and guest writes are all done in block/block-copy.c
>>>> and works as follows:
>>>>
>>>> On copy operation, we work only with dirty areas. If bits are dirty it
>>>> means that there are no requests intersecting with this area. We clear
>>>> dirty bits and take bdrv range lock (bdrv_co_try_lock) on this area to
>>>> prevent further operations from interaction with guest (only with
>>>> guest, as neither backup nor backup-top will touch non-dirty area). If
>>>> copy-operation failed we set dirty bits back together with releasing
>>>> the lock.
>>>>
>>>> The actual difference with old scheme is that on guest writes we
>>>> don't lock the whole region but only dirty-parts, and to be more
>>>> precise: only dirty-part we are currently operate on. In old scheme
>>>> guest write to non-dirty area (which may be safely ignored by backup)
>>>> may wait for intersecting request, touching some other area which is
>>>> dirty.
>>>>
>>>> 4. To sync with in-flight requests at job finish we now have drained
>>>> removing of the filter, we don't need rw-lock.
>>>>
>>>> = Notes =
>>>>
>>>> Note the consequence of three objects appearing: backup-top, backup job
>>>> and block-copy-state:
>>>>
>>>> 1. We want to insert backup-top before job creation, to behave similar
>>>> with mirror and commit, where job is started upon filter.
>>>>
>>>> 2. We also have to create block-copy-state after filter injection, as
>>>> we don't want it's source child be replaced by fitler. Instead we want
>>>
>>> s/it's/its/, s/filter/filter/ (although “fitler” does have an amusing
>>> ring to it)
>>>
>>>> to keep BCS.source to be real source node, as we want to use
>>>> bdrv_co_try_lock in CBW operations and it can't be used on filter, as
>>>> on filter we already have in-flight (write) request from upper layer.
>>>
>>> Reasonable, even more so as I suppose BCS.source should eventually be a
>>> BdrvChild of backup-top.
>>>
>>> What looks wrong is that the sync_bitmap is created on the source (“bs”
>>> in backup_job_create()), but backup_cleanup_sync_bitmap() still assumes
>>> it is on blk_bs(job->common.blk) (which is no longer true).
>>>
>>>> So, we firstly create inject backup-top, then create job and BCS. BCS
>>>> is the latest just to not create extra variable for it. Finally we set
>>>> bcs for backup-top filter.
>>>>
>>>> = Iotest changes =
>>>>
>>>> 56: op-blocker doesn't shot now, as we set it on source, but then check
>>>
>>> s/shot/show/?
>>>
>>>> on filter, when trying to start second backup, so error caught in
>>>> test_dismiss_collision is changed. It's OK anyway, as this test-case
>>>> seems to test that after some collision we can dismiss first job and
>>>> successfully start the second one. So, the change is that collision is
>>>> changed from op-blocker to file-posix locks.
>>>
>>> Well, but the op blocker belongs to the source, which should be covered
>>> by internal permissions.  The fact that it now shows up as a file-posix
>>> error thus shows that the conflict also moves from the source to the
>>> target.  It’s OK because we actually don’t have a conflict on the source.
>>>
>>> But I wonder whether it would be better for test_dismiss_collision() to
>>> do a blockdev-backup instead where we can see the collision on the target.
>>>
>>> Hm.  On second thought, why do we even get a conflict on the target?
>>> block-copy does share the WRITE permission for it...
>>
>> Not sure, but assume that this is because in file-posix.c in raw_co_create
>> we do want RESIZE perm.
>>
>> I can instead move this test to use specified job-id, to move the collision
>> to "job-id already exists" error. Is it better?
> 
> It’s probably the only thing that actually makes sense.
> 
>> I'm afraid that posix locks will not work if disable them in config.
> 
> Yes (or if the environment doesn’t support them); which is why I
> suggested using blockdev-backup.  (But that would have the same problem
> of “Where does the permission conflict even come from and is it
> inevitable?”)
> 
> [...]
> 
>>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>>> index 6828c46ba0..f3102ec3ff 100644
>>>> --- a/block/block-copy.c
>>>> +++ b/block/block-copy.c
>>>> @@ -98,27 +98,32 @@ fail:
>>>>     * error occurred, return a negative error number
>>>>     */
>>>>    static int coroutine_fn block_copy_with_bounce_buffer(
>>>> -        BlockCopyState *s, int64_t start, int64_t end, bool 
>>>> is_write_notifier,
>>>> +        BlockCopyState *s, int64_t start, int64_t end,
>>>>            bool *error_is_read, void **bounce_buffer)
>>>>    {
>>>>        int ret;
>>>> -    int nbytes;
>>>> -    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>> +    int nbytes = MIN(s->cluster_size, s->len - start);
>>>> +    void *lock = bdrv_co_try_lock(blk_bs(s->source), start, nbytes);
>>>> +
>>>> +    /*
>>>> +     * Function must be called only on full-dirty region, so nobody 
>>>> touching or
>>>> +     * touched these bytes. Therefore, we must successfully get lock.
>>>
>>> Which is OK because backup-top does not share the WRITE permission.  But
>>> it is organized a bit weirdly, because it’s actually block-copy here
>>> that relies on the WRITE permission not to be shared; which it itself
>>> cannot do because backup-top needs it.
>>>
>>> This of course results from the fact that backup-top and block-copy
>>> should really use the same object to access the source, namely the
>>> backing BdrvChild of backup-top.
>>>
>>> Maybe there should be a comment somewhere that points this out?
>>
>> But I wanted block-copy not to be directly connected to backup-top.
> 
> But it doesn’t really work.  There’s an obscure contract that already
> binds block-copy to backup-top.
> 
>> I think actually we rely on the fact that user of block-copy() guarantees 
>> that
>> nobody is touching dirty areas (except block_copy). And with backup, this is 
>> done by
>> backup-top filter. I'll add a comment on this.
> 
> Yes, that’s the contract.  And it’s weird, because block-copy has its
> own BB, so if it doesn’t want anyone to write


doesn't want anyone to write to dirty areas, and we don't have such option in 
the permission
system...

> to the source outside of
> its control, it should just unshare the WRITE permission – but it cannot
> do that, because its user needs WRITE access, and so it implicitly
> relies on that user to then unshare WRITE.
> 
> If both used the same single BdrvChild, it would be clear
> 
> (1) why the BdrvChild belongs to block-copy’s user (because block-copy’s
> user would need to own a node that then has this BdrvChild),
> 
> (2) why block-copy does not unshare the WRITE permission (it cannot,
> because it doesn’t create the BdrvChild),
> 
> (3) how block-copy could explicitly ensure that WRITE is unshared (by
> asserting !(child->shared_perm & WRITE)).
> 
> 
> I’m not saying that we need to abandon having BBs right now, but I think
> there are a couple of cases which show why I say it’s uglier than using
> BdrvChildren instead.
> 

OK. I'd prefer to move block-copy to BdrvChildren as follow-up, if you don't 
mind.


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v9 13/13] block/backup: use backup-top instead of write notifiers

Reply via email to