On Fri, Jul 12, 2019 at 08:33:14PM +0200, Max Reitz wrote: > On 12.07.19 12:27, Stefano Garzarella wrote: > > On Thu, Jul 11, 2019 at 03:29:35PM +0200, Max Reitz wrote: > >> When preallocating an encrypted qcow2 image, it just lets the protocol > >> driver write data and then does not mark the clusters as zero. > >> Therefore, reading this image will yield effectively random data. > >> > >> As such, we have not fulfilled the promise of always writing zeroes when > >> preallocating an image in a while. It seems that nobody has really > >> cared, so change the documentation to conform to qemu's actual behavior. > >> > >> Signed-off-by: Max Reitz <mre...@redhat.com> > >> --- > >> qapi/block-core.json | 9 +++++---- > >> docs/qemu-block-drivers.texi | 4 ++-- > >> qemu-img.texi | 4 ++-- > >> 3 files changed, 9 insertions(+), 8 deletions(-) > >> > >> diff --git a/qapi/block-core.json b/qapi/block-core.json > >> index 0d43d4f37c..a4363b84d2 100644 > >> --- a/qapi/block-core.json > >> +++ b/qapi/block-core.json > >> @@ -5167,10 +5167,11 @@ > >> # @off: no preallocation > >> # @metadata: preallocate only for metadata > >> # @falloc: like @full preallocation but allocate disk space by > >> -# posix_fallocate() rather than writing zeros. > >> -# @full: preallocate all data by writing zeros to device to ensure disk > >> -# space is really available. @full preallocation also sets up > >> -# metadata correctly. > >> +# posix_fallocate() rather than writing data. > >> +# @full: preallocate all data by writing it to the device to ensure > >> +# disk space is really available. This data may or may not be > >> +# zero, depending on the image format and storage. > >> +# @full preallocation also sets up metadata correctly. > >> # > >> # Since: 2.2 > >> ## > >> diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi > >> index 91ab0eceae..c02547e28c 100644 > >> --- a/docs/qemu-block-drivers.texi > >> +++ b/docs/qemu-block-drivers.texi > >> @@ -31,8 +31,8 @@ Supported options: > >> @item preallocation > >> Preallocation mode (allowed values: @code{off}, @code{falloc}, > >> @code{full}). > >> @code{falloc} mode preallocates space for image by calling > >> posix_fallocate(). > >> -@code{full} mode preallocates space for image by writing zeros to > >> underlying > >> -storage. > >> +@code{full} mode preallocates space for image by writing data to > >> underlying > >> +storage. This data may or may not be zero, depending on the storage > >> location. > >> @end table > >> > >> @item qcow2 > >> diff --git a/qemu-img.texi b/qemu-img.texi > >> index c8e9bba515..b5156d6316 100644 > >> --- a/qemu-img.texi > >> +++ b/qemu-img.texi > >> @@ -666,8 +666,8 @@ Supported options: > >> @item preallocation > >> Preallocation mode (allowed values: @code{off}, @code{falloc}, > >> @code{full}). > >> @code{falloc} mode preallocates space for image by calling > >> posix_fallocate(). > >> -@code{full} mode preallocates space for image by writing zeros to > >> underlying > >> -storage. > >> +@code{full} mode preallocates space for image by writing data to > >> underlying > >> +storage. This data may or may not be zero, depending on the storage > >> location. > >> @end table > >> > >> @item qcow2 > > > > Just a question: > > But a very good one, actually.
:-) > > > if a protocol driver returns 1 with the .bdrv_has_zero_init callback, is it > > expected that the preallocation will fill the image with zeroes? > > Yes. > > > IIUC, for example, the qcow2 returns 1 with the .bdrv_has_zero_init. but > > during > > the qcow2_co_truncate() it calls bdrv_co_truncate() and, depending on the > > backend driver, it should fill the image with zeroes (or not?). > > Yes. > > > Maybe I miss something related to the metadata... > > Well. If the image isn’t preallocated, all will be well because nothing > of the added range is entered into the metadata, so it returns zero when > read (unless there is a backing file, but that is handled independently > of has_zero_init). Okay, that makes sense. > > But you were asking about preallocation. As I wrote in the commit > message, the qcow2 driver lets the protocol driver preallocate the data > and then enters it as normal data clusters into its metadata. If the > image is encrypted, it will appears as random data (or if the protocol > dirver writes non-zero data). But then it shouldn’t report > has_zero_init as 1. > > Let’s test it. > > $ qemu-img create -f qcow2 src.qcow2 64M > Formatting 'src.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 > lazy_refcounts=off refcount_bits=16 > $ qemu-img convert -O qcow2 \ > -o \ > encrypt.format=luks,encrypt.key-secret=pass,preallocation=metadata \ > --object secret,id=pass,data=123456 \ > src.qcow2 dest.qcow2 > $ qemu-img compare --image-opts \ > file.filename=src.qcow2 \ > file.filename=dest.qcow2,encrypt.key-secret=pass \ > --object secret,id=pass,data=123456 > Content mismatch at offset 0! > > Oops. > > > So. We can do two things here. > > (A) We drop this patch and actually make sure that preallocation always > writes zeroes, and if that cannot be done efficiently, then too bad. > > Note that for qcow2, we cannot just mark all clusters as zero clusters > because that would kind of defeat the purpose of metadata preallocation. > One of its main purposes is to prevent COW when you write to a new > image, i.e. that the qcow2 driver needs to do a read-modify-write cycle > just to zero a new cluster. If we kept preallocating potentially random > data and hooked it up as preallocated zero clusters, those RMW cycles > would remain, thus defeating the point of metadata preallocation. So > even for qcow2, if there is a chance that the data stored is not zero, > it needs to explicitly store zeroes then. > > But the good thing here is that the protocol driver would always > guarantee that it preallocates pure zeroes. > > The bad thing is that I don’t think we could support pure metadata or > falloc preallocation together with encryption any longer, which would > definitely be an incompatible change. Well, because we wouldn’t want to > break this support, I suppose we would in this case (encryption) resort > to linking the data clusters as preallocated zero clusters. Which is > bad because of RMW. but well. That’s life. Yes, I agree that it is a bad thing. > > > (B) We keep this patch and audit our use of bdrv_has_zero_init(). So > for qcow2, we need to return 0 for encrypted images. That is suboptimal > for non-preallocated encrypted images, but again, that’s life. > > Also, we need to return 0 if the protocol layer does not preallocate the > data to be zero -- which we can see from its setting of > bdrv_has_zero_init(), I’d suppose. So for preallocated unencrypted > images, qcow2 would need to return bdrv_has_zero_init(s->data_file->bs). Yes, that's what I had in mind, too. > > But here’s the problem: How do we know whether an image has been > preallocated or not? And that is a problem. I don’t know a solution > off the top of my head. We could add a parameter to > bdrv_has_zero_init() for that, but what would blockdev-mirror assume? > It doesn’t know. Why we need to know if an image has been preallocated with zeroes or not? Maybe when we convert it, but I'm not sure. > > Hm. I suppose qcow2’s implementation could sift through its L2 tables > and if there are any links to data clusters, it is by definition in some > way preallocated. > > > So I suppose B seems like the better solution if there is a way for all > format drivers to determine whether their image has been preallocated or > not? Seems like the better solution to me, also if I miss something why we need to determine whether or not it's been pre-allocated. :-( Thanks for very detailed answer, Stefano