On 07/06/2017 11:02 AM, Kevin Wolf wrote: > Am 05.07.2017 um 23:08 hat Eric Blake geschrieben: >> We are gradually moving away from sector-based interfaces, towards >> byte-based. In the common case, allocation is unlikely to ever use >> values that are not naturally sector-aligned, but it is possible >> that byte-based values will let us be more precise about allocation >> at the end of an unaligned file that can do byte-based access. >> >> Changing the signature of the function to use int64_t *pnum ensures >> that the compiler enforces that all callers are updated. For now, >> the io.c layer still assert()s that all callers are sector-aligned >> on input and that *pnum is sector-aligned on return to the caller, >> but that can be relaxed when a later patch implements byte-based >> block status. Therefore, this code adds usages like >> DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned >> values, where the call might reasonbly give non-aligned results >> in the future; on the other hand, no rounding is needed for callers >> that should just continue to work with byte alignment. >> >> For the most part this patch is just the addition of scaling at the >> callers followed by inverse scaling at bdrv_is_allocated(). But >> some code, particularly bdrv_commit(), gets a lot simpler because it >> no longer has to mess with sectors; also, it is now possible to pass >> NULL if the caller does not care how much of the image is allocated >> beyond the initial offset. >> >> For ease of review, bdrv_is_allocated_above() will be tackled >> separately. >>
>> @@ -1036,14 +1036,15 @@ static int coroutine_fn >> bdrv_aligned_preadv(BdrvChild *child, >> >> - if (!ret || pnum != nb_sectors) { >> + if (!ret || pnum != nb_sectors << BDRV_SECTOR_BITS) { >> ret = bdrv_co_do_copy_on_readv(child, offset, bytes, qiov); >> goto out; >> } > > This code would become a lot simpler if you just removed the local > variables start_sector/end_sector and used the byte offsets instead that > are available in this function. (And even simpler once sector alignment > isn't required any more for bdrv_is_allocated(). Should we leave a > comment here so we won't forget the conversion?) Hmm - that cleanup is not (yet) done in my bdrv_get_block_status series (where I finally drop the need for sector alignment in bdrv_is_allocated). Yes, I can add a temporary comment here (that aids with the understanding of intermediate states), as well as a new followup patch that actually does the cleanup. >> -int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, >> - int nb_sectors, int *pnum) >> +int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset, >> + int64_t bytes, int64_t *pnum) >> { >> BlockDriverState *file; >> - int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum, >> - &file); >> + int64_t sector_num = offset >> BDRV_SECTOR_BITS; >> + int nb_sectors = bytes >> BDRV_SECTOR_BITS; >> + int64_t ret; >> + int psectors; >> + >> + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE)); >> + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) && >> + bytes < INT_MAX * BDRV_SECTOR_SIZE); > > I think we can actually assert bytes <= INT_MAX. Both ways to write the > assertion are only true because of BDRV_REQUEST_MAX_SECTORS, which > ensures that the byte counts fit in an int. That's a tighter assertion, but if it passes, I'm fine using it. > > Some of the places that call bdrv_is_allocated() actually depend on this > because your patches tend to use nb_sectors << BDRV_SECTOR_BITS (which > truncates) rather than nb_sectors * BDRV_SECTOR_BYTES (which is always a > 64 bit calculation). Actually, for this series of cleanups, I tried really hard to use * _BYTES everywhere that I widen sectors to bytes, and limit myself to >> _BITS for going from bytes back to sectors (after also checking if I needed DIV_ROUND_UP instead of straight division). If you spot places where I'm still doing a 32-bit << _BITS, please call it out (as I do remember that I caused bugs that way back when I converted pdiscard to byte-based, back near commit d2cb36af). >> +++ b/block/stream.c >> @@ -137,6 +137,7 @@ static void coroutine_fn stream_run(void *opaque) >> >> for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) { >> bool copy; >> + int64_t count = 0; >> >> /* Note that even when no rate limit is applied we need to yield >> * with no pending I/O here so that bdrv_drain_all() returns. >> @@ -148,8 +149,8 @@ static void coroutine_fn stream_run(void *opaque) >> >> copy = false; >> >> - ret = bdrv_is_allocated(bs, offset / BDRV_SECTOR_SIZE, >> - STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n); >> + ret = bdrv_is_allocated(bs, offset, STREAM_BUFFER_SIZE, &count); >> + n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE); >> if (ret == 1) { >> /* Allocated in the top, no need to copy. */ >> } else if (ret >= 0) { > > I'm not sure if I agree with rounding up here. If ret == 0, it could > mean that we skip some data that is actually allocated. So I'd like an > assertion better. > > The last patch of the series gets rid of this by switching the whole > function to bytes, so not that important. A temporary assertion is fine (and you're right that the assertion goes away once the function uses bytes everywhere). Sounds like we're racking up enough cleanups that I'll spin a v5, but also sounds like we're real close to getting this in. >> @@ -274,9 +275,9 @@ static int mig_save_device_bulk(QEMUFile *f, >> BlkMigDevState *bmds) >> /* Skip unallocated sectors; intentionally treats failure as >> * an allocated sector */ >> while (cur_sector < total_sectors && >> - !bdrv_is_allocated(blk_bs(bb), cur_sector, >> - MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) { >> - cur_sector += nr_sectors; >> + !bdrv_is_allocated(blk_bs(bb), cur_sector * BDRV_SECTOR_SIZE, >> + MAX_IS_ALLOCATED_SEARCH, &count)) { >> + cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE); >> } >> aio_context_release(blk_get_aio_context(bb)); >> qemu_mutex_unlock_iothread(); > > This one stays and it has the same problem. I think you need to round > down if you don't want to accidentally skip migrating allocated data. I haven't (yet) tackled trying to rewrite this one to do byte-base iteration, but that may help. I'm also worried (in general) that if something truncates down a sub-sector result in code that is still sector-based, then we could get stuck in an inf-loop of querying the same sector offset over and over again instead of making progress. But for this particular code, I see what you are saying - we are trying to optimize the migration by skipping unallocated sectors, but if there is nothing to skip, we still make migration progress as long as I remember to break from the loop if I ever get a sub-sector answer (thus migrating the whole sector, including the [unlikely] initial unallocated portion). Okay, I'll fix it for v5. > >> diff --git a/qemu-img.c b/qemu-img.c >> index d5a1560..5271b41 100644 >> --- a/qemu-img.c >> +++ b/qemu-img.c >> @@ -3229,6 +3229,7 @@ static int img_rebase(int argc, char **argv) >> int64_t new_backing_num_sectors = 0; >> uint64_t sector; >> int n; >> + int64_t count; >> float local_progress = 0; >> >> buf_old = blk_blockalign(blk, IO_BUF_SIZE); >> @@ -3276,12 +3277,14 @@ static int img_rebase(int argc, char **argv) >> } >> >> /* If the cluster is allocated, we don't need to take action */ >> - ret = bdrv_is_allocated(bs, sector, n, &n); >> + ret = bdrv_is_allocated(bs, sector << BDRV_SECTOR_BITS, >> + n << BDRV_SECTOR_BITS, &count); >> if (ret < 0) { >> error_report("error while reading image metadata: %s", >> strerror(-ret)); >> goto out; >> } >> + n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE); >> if (ret) { >> continue; >> } > > Same thing. Sectors that are half allocated and half free must be > considered completely allocated if the caller can't do byte alignment. > Just rounding up without looking at ret isn't correct. > >> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c >> index b0ea327..e529b8f 100644 >> --- a/qemu-io-cmds.c >> +++ b/qemu-io-cmds.c >> @@ -1894,15 +1889,14 @@ static int map_f(BlockBackend *blk, int argc, char >> **argv) >> } >> >> retstr = ret ? " allocated" : "not allocated"; >> - cvtstr(num << BDRV_SECTOR_BITS, s1, sizeof(s1)); >> - cvtstr(offset << BDRV_SECTOR_BITS, s2, sizeof(s2)); >> + cvtstr(num, s1, sizeof(s1)); >> + cvtstr(offset, s2, sizeof(s2)); >> printf("%s (0x%" PRIx64 ") bytes %s at offset %s (0x%" PRIx64 ")\n", >> - s1, num << BDRV_SECTOR_BITS, retstr, >> - s2, offset << BDRV_SECTOR_BITS); >> + s1, num, retstr, s2, offset); >> >> offset += num; >> - nb_sectors -= num; >> - } while (offset < total_sectors); >> + bytes -= num; >> + }; > > This semicolon isn't needed. Indeed. I converted do/while to plain while, but not quite correctly ;) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
signature.asc
Description: OpenPGP digital signature