On 9/19/23 13:46, Hanna Czenczek wrote:
> On 15.09.23 18:20, Andrey Drobyshev wrote:
>> When rebasing an image from one backing file to another, we need to
>> compare data from the old and new backings.  If the diff between that
>> data happens to be unaligned to the target cluster size, we might end
>> up doing partial writes, which would lead to copy-on-write and
>> additional IO.
>>
>> Consider the following simple case (virtual_size == cluster_size == 64K):
>>
>> base <-- inc1 <-- inc2
>>
>> qemu-io -c "write -P 0xaa 0 32K" base.qcow2
>> qemu-io -c "write -P 0xcc 32K 32K" base.qcow2
>> qemu-io -c "write -P 0xbb 0 32K" inc1.qcow2
>> qemu-io -c "write -P 0xcc 32K 32K" inc1.qcow2
>> qemu-img rebase -f qcow2 -b base.qcow2 -F qcow2 inc2.qcow2
>>
>> While doing rebase, we'll write half of the cluster to inc2, and the
>> block layer will have to read the 2nd half of the same cluster from
>> the base image inc1 while doing this write operation, although the
>> whole cluster was already read earlier to perform the data comparison.
>>
>> In order to avoid these unnecessary IO cycles, let's make sure every
>> write request is aligned to the overlay's subcluster boundaries.
>> Using the subcluster size is universal: for images which don't have
>> subclusters it equals the cluster size, so in any case we end up
>> aligning to the smallest unit of allocation.
>>
>> Signed-off-by: Andrey Drobyshev <andrey.drobys...@virtuozzo.com>
>> ---
>>  qemu-img.c | 76 ++++++++++++++++++++++++++++++++++++++++--------------
>>  1 file changed, 56 insertions(+), 20 deletions(-)
>
> Looks good, I like the changes from v1!  Two minor things:
>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index fcd31d7b5b..83950af42b 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>
> [...]
>
>> @@ -3844,33 +3861,48 @@ static int img_rebase(int argc, char **argv)
>>                  }
>>              }
>> +            /*
>> +             * At this point we know that the region [offset; offset + n)
>> +             * is unallocated within the target image.  This region might be
>> +             * unaligned to the target image's (sub)cluster boundaries, as
>> +             * old backing may have smaller clusters (or have subclusters).
>> +             * We extend it to the aligned boundaries to avoid CoW on
>> +             * partial writes in blk_pwrite().
>> +             */
>> +            n += offset - QEMU_ALIGN_DOWN(offset, write_align);
>> +            offset = QEMU_ALIGN_DOWN(offset, write_align);
>> +            n += QEMU_ALIGN_UP(offset + n, write_align) - (offset + n);
>> +            n = MIN(n, size - offset);
>> +            assert(!bdrv_is_allocated(unfiltered_bs, offset, n, &n_alloc) &&
>> +                   n_alloc == n);
>> +
>> +            /*
>> +             * Much like the with the target image, we'll try to read as much
>
> s/the with the/with the/
>
Noted.

>> +             * of the old and new backings as we can.
>> +             */
>> +            n_old = MIN(n, MAX(0, old_backing_size - (int64_t) offset));
>> +            if (blk_new_backing) {
>> +                n_new = MIN(n, MAX(0, new_backing_size - (int64_t) offset));
>> +            }
>
> If we don’t have a check for blk_old_backing (old_backing_size is 0 if
> blk_old_backing is NULL), why do we have a check for blk_new_backing
> (new_backing_size is 0 if blk_new_backing is NULL)?
>
> (Perhaps because the previous check was `offset >= new_backing_size ||
> !blk_new_backing`, i.e. included exactly such a check – but I don’t
> think it’s necessary, new_backing_size will be 0 if blk_new_backing is
> NULL.)
>
>> +
>>              /*
>>               * Read old and new backing file and take into consideration that
>>               * backing files may be smaller than the COW image.
>>               */
>> -            if (offset >= old_backing_size) {
>> -                memset(buf_old, 0, n);
>> -                buf_old_is_zero = true;
>> +            memset(buf_old + n_old, 0, n - n_old);
>> +            if (!n_old) {
>> +                old_backing_eof = true;
>>              } else {
>> -                if (offset + n > old_backing_size) {
>> -                    n = old_backing_size - offset;
>> -                }
>> -
>> -                ret = blk_pread(blk_old_backing, offset, n, buf_old, 0);
>> +                ret = blk_pread(blk_old_backing, offset, n_old, buf_old, 0);
>>                  if (ret < 0) {
>>                      error_report("error while reading from old backing file");
>>                      goto out;
>>                  }
>>              }
>> -            if (offset >= new_backing_size || !blk_new_backing) {
>> -                memset(buf_new, 0, n);
>> -            } else {
>> -                if (offset + n > new_backing_size) {
>> -                    n = new_backing_size - offset;
>> -                }
>> -
>> -                ret = blk_pread(blk_new_backing, offset, n, buf_new, 0);
>> +            memset(buf_new + n_new, 0, n - n_new);
>> +            if (blk_new_backing && n_new) {
>
> Same as above, I think we can drop the blk_new_backing check, just so
> that both blocks (for old and new) look the same.
>
> (Also, the memset() already has to trust that n_new is 0 if
> blk_new_backing is NULL, so the check doesn’t make much sense from that
> perspective either, and makes the memset() look wrong.)
>

Good point, thank you.  Looks like we indeed can safely remove these
extra checks.

>> +                ret = blk_pread(blk_new_backing, offset, n_new, buf_new, 0);
>>                  if (ret < 0) {
>>                      error_report("error while reading from new backing file");
>>                      goto out;
>