02.09.2019 19:34, Max Reitz wrote: > On 29.08.19 16:55, Vladimir Sementsov-Ogievskiy wrote: >> 28.08.2019 22:50, Max Reitz wrote: >>> On 26.08.19 18:13, Vladimir Sementsov-Ogievskiy wrote: >>>> Drop write notifiers and use filter node instead. >>>> >>>> = Changes = >>>> >>>> 1. add filter-node-name argument for backup qmp api. We have to do it >>>> in this commit, as 257 needs to be fixed. >>> >>> I feel a bit bad about it not being an implicit node. But I know you >>> don’t like them, they’re probably just broken altogether and because >>> libvirt doesn’t use backup (yet), probably nobody cares. >>> >>>> 2. there no move write notifiers here, so is_write_notifier parameter >>> >>> s/there/there are/, I suppose? >>> >>>> is dropped from block-copy paths. >>>> >>>> 3. Intersecting requests handling changed, now synchronization between >>>> backup-top, backup and guest writes are all done in block/block-copy.c >>>> and works as follows: >>>> >>>> On copy operation, we work only with dirty areas. If bits are dirty it >>>> means that there are no requests intersecting with this area. We clear >>>> dirty bits and take bdrv range lock (bdrv_co_try_lock) on this area to >>>> prevent further operations from interaction with guest (only with >>>> guest, as neither backup nor backup-top will touch non-dirty area). If >>>> copy-operation failed we set dirty bits back together with releasing >>>> the lock. >>>> >>>> The actual difference with old scheme is that on guest writes we >>>> don't lock the whole region but only dirty-parts, and to be more >>>> precise: only dirty-part we are currently operate on. In old scheme >>>> guest write to non-dirty area (which may be safely ignored by backup) >>>> may wait for intersecting request, touching some other area which is >>>> dirty. >>>> >>>> 4. To sync with in-flight requests at job finish we now have drained >>>> removing of the filter, we don't need rw-lock. >>>> >>>> = Notes = >>>> >>>> Note the consequence of three objects appearing: backup-top, backup job >>>> and block-copy-state: >>>> >>>> 1. We want to insert backup-top before job creation, to behave similar >>>> with mirror and commit, where job is started upon filter. >>>> >>>> 2. We also have to create block-copy-state after filter injection, as >>>> we don't want it's source child be replaced by fitler. Instead we want >>> >>> s/it's/its/, s/filter/filter/ (although “fitler” does have an amusing >>> ring to it) >>> >>>> to keep BCS.source to be real source node, as we want to use >>>> bdrv_co_try_lock in CBW operations and it can't be used on filter, as >>>> on filter we already have in-flight (write) request from upper layer. >>> >>> Reasonable, even more so as I suppose BCS.source should eventually be a >>> BdrvChild of backup-top. >>> >>> What looks wrong is that the sync_bitmap is created on the source (“bs” >>> in backup_job_create()), but backup_cleanup_sync_bitmap() still assumes >>> it is on blk_bs(job->common.blk) (which is no longer true). >>> >>>> So, we firstly create inject backup-top, then create job and BCS. BCS >>>> is the latest just to not create extra variable for it. Finally we set >>>> bcs for backup-top filter. >>>> >>>> = Iotest changes = >>>> >>>> 56: op-blocker doesn't shot now, as we set it on source, but then check >>> >>> s/shot/show/? >>> >>>> on filter, when trying to start second backup, so error caught in >>>> test_dismiss_collision is changed. It's OK anyway, as this test-case >>>> seems to test that after some collision we can dismiss first job and >>>> successfully start the second one. So, the change is that collision is >>>> changed from op-blocker to file-posix locks. >>> >>> Well, but the op blocker belongs to the source, which should be covered >>> by internal permissions. The fact that it now shows up as a file-posix >>> error thus shows that the conflict also moves from the source to the >>> target. It’s OK because we actually don’t have a conflict on the source. >>> >>> But I wonder whether it would be better for test_dismiss_collision() to >>> do a blockdev-backup instead where we can see the collision on the target. >>> >>> Hm. On second thought, why do we even get a conflict on the target? >>> block-copy does share the WRITE permission for it... >> >> Not sure, but assume that this is because in file-posix.c in raw_co_create >> we do want RESIZE perm. >> >> I can instead move this test to use specified job-id, to move the collision >> to "job-id already exists" error. Is it better? > > It’s probably the only thing that actually makes sense. > >> I'm afraid that posix locks will not work if disable them in config. > > Yes (or if the environment doesn’t support them); which is why I > suggested using blockdev-backup. (But that would have the same problem > of “Where does the permission conflict even come from and is it > inevitable?”) > > [...] > >>>> diff --git a/block/block-copy.c b/block/block-copy.c >>>> index 6828c46ba0..f3102ec3ff 100644 >>>> --- a/block/block-copy.c >>>> +++ b/block/block-copy.c >>>> @@ -98,27 +98,32 @@ fail: >>>> * error occurred, return a negative error number >>>> */ >>>> static int coroutine_fn block_copy_with_bounce_buffer( >>>> - BlockCopyState *s, int64_t start, int64_t end, bool >>>> is_write_notifier, >>>> + BlockCopyState *s, int64_t start, int64_t end, >>>> bool *error_is_read, void **bounce_buffer) >>>> { >>>> int ret; >>>> - int nbytes; >>>> - int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0; >>>> + int nbytes = MIN(s->cluster_size, s->len - start); >>>> + void *lock = bdrv_co_try_lock(blk_bs(s->source), start, nbytes); >>>> + >>>> + /* >>>> + * Function must be called only on full-dirty region, so nobody >>>> touching or >>>> + * touched these bytes. Therefore, we must successfully get lock. >>> >>> Which is OK because backup-top does not share the WRITE permission. But >>> it is organized a bit weirdly, because it’s actually block-copy here >>> that relies on the WRITE permission not to be shared; which it itself >>> cannot do because backup-top needs it. >>> >>> This of course results from the fact that backup-top and block-copy >>> should really use the same object to access the source, namely the >>> backing BdrvChild of backup-top. >>> >>> Maybe there should be a comment somewhere that points this out? >> >> But I wanted block-copy not to be directly connected to backup-top. > > But it doesn’t really work. There’s an obscure contract that already > binds block-copy to backup-top. > >> I think actually we rely on the fact that user of block-copy() guarantees >> that >> nobody is touching dirty areas (except block_copy). And with backup, this is >> done by >> backup-top filter. I'll add a comment on this. > > Yes, that’s the contract. And it’s weird, because block-copy has its > own BB, so if it doesn’t want anyone to write
doesn't want anyone to write to dirty areas, and we don't have such option in the permission system... > to the source outside of > its control, it should just unshare the WRITE permission – but it cannot > do that, because its user needs WRITE access, and so it implicitly > relies on that user to then unshare WRITE. > > If both used the same single BdrvChild, it would be clear > > (1) why the BdrvChild belongs to block-copy’s user (because block-copy’s > user would need to own a node that then has this BdrvChild), > > (2) why block-copy does not unshare the WRITE permission (it cannot, > because it doesn’t create the BdrvChild), > > (3) how block-copy could explicitly ensure that WRITE is unshared (by > asserting !(child->shared_perm & WRITE)). > > > I’m not saying that we need to abandon having BBs right now, but I think > there are a couple of cases which show why I say it’s uglier than using > BdrvChildren instead. > OK. I'd prefer to move block-copy to BdrvChildren as follow-up, if you don't mind. -- Best regards, Vladimir