On 5/6/20 4:25 AM, Vladimir Sementsov-Ogievskiy wrote:
In case of !VDI_IS_ALLOCATED[], we do zero out the corresponding chunk
of qiov. So, this should be reported as ZERO.
This part makes sense outright, since vdi does not allow backing files,
so all reads of a VDI disk result in data rather than deferral to
another layer, and we indeed read zero in this case. However, it _is_ a
behavior change: previously, 'qemu-io -c map' on a vdi image will show
unallocated portions, but with this change, the entire image now shows
as allocated. You need to call out this fact in the commit message,
documenting that it is intentional (it matches what we do for posix
files: since neither files nor vdi support backing images, we guarantee
that all read attempts will be satisfied by this layer rather than
deferring to a backing layer, and thus can treat all data as allocated
in this layer, regardless of whether it is sparse).
Do any existing iotests need an update to reflect change in output? If
not, should we have an iotest covering it?
After block-status update, it never reports 0, so setting
unallocated_blocks_are_zero doesn't make sense. Drop it.
This is a harder claim. To prove it, we need to inspect all callers that
use unallocated_blocks_are_zero, to see their intent. Fortunately, the
list is small - a mere 2 clients!
block/io.c:bdrv_co_block_status(): if .bdrv_co_block_status sets either
_DATA or _ZERO, block status adds _ALLOCATED; but if the driver did not
set either, we use bdrv_unallocated_blocks_are_zero() in order to set
_ZERO but not _ALLOCATED. This is the code that explains the behavior
change mentioned above (now that vdi is reporting _ZERO instead of 0,
block_status is appending _ALLOCATED instead of querying
bdrv_unallocated_blocks_are_zero). But you are correct that it does not
change where _ZERO is reported.
qemu-img.c:img_convert(): calls bdrv_get_info() up front and caches what
it learned from unallocated_blocks_are_zero about the target; later on,
in convert_iteration_sectors(), it uses this information to optimize the
case where the source reads as zero, but the target has a backing file
and the range being written lies beyond the end of the backing file.
That is, given the following scenario:
qemu-img convert input -B backing output
input: ABCD-0E
backing: ACBD
the optimization lets us produce:
output: -BC---E
instead of the larger:
output: -BC--0E
Do we have any iotests that cover this particular scenario, to prove
that our optimization does indeed result in a smaller image file without
explicit zeros written in the tail? Or put another way, if we were to
disable the post_backing_zero optimization in
convert_iteration_sectors(), would any iotests fail? If not, we should
really enhance our testsuite.
With that said, we could just as easily achieve the optimization of the
conversion to the tail of a destination with short backing file by
checking block_status to see whether the tail already reads as all
zeroes, rather than relying on unallocated_blocks_are_zero. Even if
querying block_status is a slight pessimization in time, it would avoid
regressing in the more important factor of artificially bloating the
destination. I would _really_ love to see a patch to qemu-img.c
reworking the logic to avoid the use of unallocated_blocks_are_zero
first, prior to removing the setting of that field in the various
drivers. Once bdrv_co_block_status() remains as the only client of the
field, then I could accept this patch with a better commit message about
the intentional change in block_status _ALLOCATION behavior.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
---
block/vdi.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/vdi.c b/block/vdi.c
index 0c7835ae70..83471528d2 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -334,7 +334,6 @@ static int vdi_get_info(BlockDriverState *bs,
BlockDriverInfo *bdi)
logout("\n");
bdi->cluster_size = s->block_size;
bdi->vm_state_offset = 0;
- bdi->unallocated_blocks_are_zero = true;
return 0;
}
@@ -536,7 +535,7 @@ static int coroutine_fn vdi_co_block_status(BlockDriverState *bs,
*pnum = MIN(s->block_size - index_in_block, bytes);
result = VDI_IS_ALLOCATED(bmap_entry);
if (!result) {
- return 0;
+ return BDRV_BLOCK_ZERO;
}
*map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org