Am 23.04.2020 um 17:36 hat Eric Blake geschrieben:
> On 4/23/20 10:01 AM, Kevin Wolf wrote:
> > The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the
> > image is possibly preallocated and then the zero flag is added to all
> > clusters. This means that a copy-on-write operation may be needed when
> > writing to these clusters, despite having used preallocation, negating
> > one of the major benefits of preallocation.
> > 
> > Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver,
> > and if the protocol driver can ensure that the new area reads as zeros,
> > we can skip setting the zero flag in the qcow2 layer.
> 
> Hmm. When we get block status, it is very easy to tell that something
> reads as zero when the qcow2 file has the zero bit set, but when the
> qcow2 file does not have the zero bit set, we have to then query the
> format layer whether it reads as zeros (which is expensive enough for
> some format layers that we no longer report things as reading as zero).
> I'm worried that optimizing this case could penalize later block status.
That's just how preallocation works. If you don't want that, you need
preallocation=off.

> We already can tell the difference between a cluster that has the zero
> bit set but is not preallocated, vs. has the zero bit set and is
> preallocated. Are we really forcing a copy-on-write to a cluster that
> is marked zero but preallocated? Is the problem that we don't have a
> way to distinguish between 'reads as zeroes, allocated, but we don't
> know state of format layer' and 'reads as zeroes, allocated, and we
> know format layer reads as zeroes'?

Basically, yes. If we had this, we could have a type of cluster where
writing to it still involves a metadata update (to clear the zero
flag), but never copy-on-write, even for partial writes. I'm not sure
if this would cover a very relevant case, though.

> > Unfortunately, the same approach doesn't work for metadata
> > preallocation, so we'll still set the zero flag there.
> > 
> > Signed-off-by: Kevin Wolf <kw...@redhat.com>
> > ---
> >  block/qcow2.c              | 22 +++++++++++++++++++---
> >  tests/qemu-iotests/274.out |  4 ++--
> >  2 files changed, 21 insertions(+), 5 deletions(-)
> > 
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index ad621fe404..b28e588942 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -4170,9 +4170,25 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> >          /* Allocate the data area */
> >          new_file_size = allocation_start +
> >                          nb_new_data_clusters * s->cluster_size;
> > -        /* Image file grows, so @exact does not matter */
> > -        ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
> > -                               errp);
> > +        /*
> > +         * Image file grows, so @exact does not matter.
> > +         *
> > +         * If we need to zero out the new area, try first whether the
> > +         * protocol driver can already take care of this.
> > +         */
> > +        if (flags & BDRV_REQ_ZERO_WRITE) {
> > +            ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc,
> > +                                   BDRV_REQ_ZERO_WRITE, errp);
> > +            if (ret >= 0) {
> > +                flags &= ~BDRV_REQ_ZERO_WRITE;
> > +            }
> 
> Hmm - just noticing things: how does this series interplay with the
> existing bdrv_has_zero_init_truncate? Should this series automatically
> handle BDRV_REQ_ZERO_WRITE on a bdrv_co_truncate(PREALLOC_NONE) request
> for all drivers that report true, even if that driver does not
> advertise support for the BDRV_REQ_ZERO_WRITE flag?

Hmm... It feels risky to me.

> > +        } else {
> > +            ret = -1;
> > +        }
> 
> Here, ret == -1 does not imply whether errp is set - but annoyingly,
> errp CAN be set if bdrv_co_truncate() failed.
> 
> > +        if (ret < 0) {
> > +            ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc,
> > +                                   0, errp);
> 
> And here, you are passing a possibly-set errp back to bdrv_co_truncate().
> That is a bug that can abort. You need to pass NULL to the first
> bdrv_co_truncate() call or else clear errp prior to trying a fallback to
> this second bdrv_co_truncate() call.

Yes, you're right. If nothing else comes up, I'll fix this while
applying.

Kevin