Am 19.05.2017 um 11:34 hat Anton Nefedov geschrieben:
> From: "Denis V. Lunev" <d...@openvz.org>
>
> Currently each single write operation can result in 3 write operations
> if guest offsets are not cluster aligned. One write is performed for the
> real payload and two for COW-ed areas. Thus the data possibly lies
> non-contiguously on the host filesystem. This will significantly reduce
> subsequent sequential read performance.
>
> The patch allocates the space in the file with cluster granularity,
> ensuring
> 1. better host offset locality
> 2. less space allocation operations
>    (which can be expensive on distributed storages)
>
> Signed-off-by: Denis V. Lunev <d...@openvz.org>
> Signed-off-by: Anton Nefedov <anton.nefe...@virtuozzo.com>
I don't think this is the right approach. You end up with two write
operations: one write_zeroes and then one data write. If the backend
actually supports efficient zero writes, write_zeroes won't necessarily
allocate space, but writing data will definitely split the already
existing allocation. If anything, we need a new bdrv_allocate() or
something that would call fallocate() instead of abusing write_zeroes.

It seems much clearer to me that simply unifying the three write
requests into a single one is an improvement. And it's easy to do, I
even had a patch once to do this. The reason that I didn't send it was
that it seemed to conflict with the data cache approach.

>  block/qcow2.c | 32 +++++++++++++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index a8d61f0..2e6a0ec 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1575,6 +1575,32 @@ fail:
>      return ret;
>  }
>
> +static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
> +{
> +    BDRVQcow2State *s = bs->opaque;
> +    BlockDriverState *file = bs->file->bs;
> +    QCowL2Meta *m;
> +    int ret;
> +
> +    for (m = l2meta; m != NULL; m = m->next) {
> +        uint64_t bytes = m->nb_clusters << s->cluster_bits;
> +
> +        if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
> +            continue;
> +        }
> +
> +        /* try to alloc host space in one chunk for better locality */
> +        ret = file->drv->bdrv_co_pwrite_zeroes(file, m->alloc_offset, bytes, 0);

No. This is what you bypass:

* All sanity checks that the block layer does

* bdrv_inc/dec_in_flight(), which is required for drain to work
  correctly. Not doing this will cause crashes.

* tracked_request_begin/end(), mark_request_serialising() and
  wait_serialising_requests(), which are required for serialising
  requests to work correctly

* Ensuring correct request alignment for file. This means that e.g.
  qcow2 with cluster size 512 on a host with a 4k native disk will
  break.
* blkdebug events

* before_write_notifiers. Not calling these will cause corrupted
  backups if someone backs up file.

* Dirty bitmap updates

* Updating write_gen, wr_highest_offset and total_sectors

* Ensuring that bl.max_pwrite_zeroes and bl.pwrite_zeroes_alignment
  are respected

And these are just the obvious things. I'm sure I missed some.

> +        if (ret != 0) {
> +            continue;
> +        }
> +
> +        file->total_sectors = MAX(file->total_sectors,
> +                                  (m->alloc_offset + bytes) / BDRV_SECTOR_SIZE);

You only compensate for part of a single item in the list above, by
duplicating code with block/io.c. This is not how to do things.

As I said above, I think you don't really want write_zeroes anyway, but
if you wanted a write_zeroes "but only if it's efficient" (which I'm not
sure is a good thing to want), then a better way might be introducing a
new request flag.

> +    }
> +}
> +
>  static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
>                                           uint64_t bytes, QEMUIOVector *qiov,
>                                           int flags)

Kevin