Am 06.01.26 um 3:23 PM schrieb Stefan Hajnoczi:
> On Mon, Jan 05, 2026 at 03:29:55PM +0100, Fiona Ebner wrote:
>> Commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
>> enables the BLKZEROOUT ioctl when using 'writeback' cache, regressing
>> certain 'qemu-img convert' invocations, because of a pre-existing
>> issue. Namely, the BLKZEROOUT ioctl might fail with errno EINVAL when
>> the request is shorter than the block size of the block device.
>> Fallback to the bounce buffer, similar to when the ioctl is not
>> supported at all, rather than treating such an error as fatal.
>>
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3257
>> Resolves: https://bugzilla.proxmox.com/show_bug.cgi?id=7197
>> Cc: [email protected]
>> Signed-off-by: Fiona Ebner <[email protected]>
>> ---
>> block/io.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index cace297f22..e37d9d2417 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1917,7 +1917,8 @@ bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, int64_t
>> offset, int64_t bytes,
>> assert(!bs->supported_zero_flags);
>> }
>>
>> - if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) {
>> + if ((ret == -ENOTSUP || (ret == -EINVAL && num < alignment)) &&
>> + !(flags & BDRV_REQ_NO_FALLBACK)) {
>> /* Fall back to bounce buffer if write zeroes is unsupported */
>> BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
>
> This patch invokes the ioctl(BLKZERO) first and then falls back to a
> write request. In the bug report this only occurs at the end of qemu-img
> convert, so the performance overhead of failing the ioctl first and then
> submitting a write doesn't matter.
In my testing, it does not only happen at the end of the image, but
after the 2 GiB mark for larger zero images. Initially,
convert_iteration_sectors() returns 4096, but at the 2 GiB mark
s->sector_next_status - sector_num is 4095, so it will take precedence
in the assignment
n = MIN(n, s->sector_next_status - sector_num);
in convert_iteration_sectors(). After that, convert_iteration_sectors()
will return 4096 again, but short zero writes happen from there on,
because of that one time throwing off the alignment. (Similarly, around
other multiples of 2 GiB, there will be one time where 4095 is returned
in between).
> I would prefer taking advantage of the existing alignment code in
> bdrv_co_do_zero_pwritev() instead of falling back like this. The
> alignment code currently has no effect because
> BlockLimits->request_alignment = 1 for cache=writeback. It is unaware
> that ioctl(BLKZERO) requires block alignment. Once that is fixed,
> bdrv_co_do_zero_pwritev() will submit a regular write request for the
> misaligned tail.
>
> Block drivers with specific alignment requirements for write zeroes
> report them in BlockLimits->pwrite_zeroes_alignment. I think
> bdrv_co_do_zero_pwritev() should prioritize
> bs->bl.pwrite_zeroes_alignment over bs->bl.request_alignment.
> file-posix.c already populates pwrite_zeroes_alignment - it's just not
> used by bdrv_co_do_zero_pwritev() yet.
>
> Do you want to give that a try?
Yes, I'll try. This approach sounds cleaner :)
Best Regards,
Fiona