On Mon, Oct 24, 2016 at 2:21 PM, Eric Blake <ebl...@redhat.com> wrote: > How are you getting max_transfer == 65536? I can't reproduce it with > the following setup: > > $ qemu-img create -f qcow2 -o cluster_size=1M file 10M > $ qemu-io -f qcow2 -c 'w 7m 1k' file > $ qemu-io -f qcow2 -c 'w -z 8003584 2093056' file > > although I did confirm that the above sequence was enough to get the > -ENOTSUP failure and fall into the code calculating max_transfer. > > I'm guessing that you are using something other than a file system as > the backing protocol for your qcow2 image. But do you really have a > protocol that takes AT MOST 64k per transaction, while still trying to a > cluster size of 1M in the qcow2 format? That's rather awkward, as it > means that you are required to do 16 transactions per cluster (the whole > point of using larger clusters is usually to get fewer transactions). I > think we need to get to a root cause of why you are seeing such a small > max_transfer, before I can propose the right patch, since I haven't been > able to reproduce it locally yet (although I admit I haven't tried to > see if blkdebug could reliably introduce artificial limits to simulate > your setup). And it may turn out that I just have to fix the > bdrv_co_do_pwrite_zeroes() code to loop multiple times if the size of > the unaligned head really does exceed the max_transfer size that the > underlying protocol is able to support, rather than assuming that the > unaligned head/tail always fit in a single fallback write.
In this case I'm using a qcow2 image that's stored directly in a raw dm-crypt/LUKS container, which is itself a loop device on an ext4 filesystem. It appears loop devices (with or without dm-crypt/LUKS) report a 255-sector maximum per request via the BLKSECTGET ioctl, which qemu rounds down to 64k in raw_refresh_limits(). However this maximum appears to be just a hint: bdrv_driver_pwritev() succeeds even with a 385024-byte buffer of zeroes. As for the 1M cluster size, this is a temporary workaround for another qemu issue (the default qcow2 L2 table cache size performs well with random reads covering only up to 8 GB of image data with 64k clusters; beyond that the L2 table cache thrashes). I agree this is not an optimal configuration for writes. > Can you also try this patch? If I'm right, you'll still fail, but the > assertion will be slightly different. (Again, I'm passing locally, but > that's because I'm using the file protocol, and my file system does not > impose a puny 64k max transfer). > > diff --git i/block/io.c w/block/io.c > index b136c89..8757063 100644 > --- i/block/io.c > +++ w/block/io.c > @@ -1179,6 +1179,8 @@ static int coroutine_fn > bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, > int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX); > int alignment = MAX(bs->bl.pwrite_zeroes_alignment, > bs->bl.request_alignment); > + int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, > + MAX_WRITE_ZEROES_BOUNCE_BUFFER); > > assert(alignment % bs->bl.request_alignment == 0); > head = offset % alignment; > @@ -1197,6 +1199,8 @@ static int coroutine_fn > bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, > /* Make a small request up to the first aligned sector. */ > num = MIN(count, alignment - head); > head = 0; > + assert(num < max_write_zeroes); > + assert(num < max_transfer); > } else if (tail && num > alignment) { > /* Shorten the request to the last aligned sector. */ > num -= tail; > @@ -1222,8 +1226,6 @@ static int coroutine_fn > bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, > > if (ret == -ENOTSUP) { > /* Fall back to bounce buffer if write zeroes is unsupported */ > - int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, > - > MAX_WRITE_ZEROES_BOUNCE_BUFFER); > BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE; > > if ((flags & BDRV_REQ_FUA) && With this change, the num < max_transfer assertion fails on the first iteration (with num=385024 and max_transfer=65536). --Ed