On Fri, Jan 09, 2026 at 01:08:27PM +0100, Fiona Ebner wrote:
> Previous discussion here:
> https://lore.kernel.org/qemu-devel/[email protected]/
>
> Commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
> enables the BLKZEROOUT ioctl when using 'writeback' cache, regressing
> certain 'qemu-img convert' invocations, because of a pre-existing
> issue. Namely, the BLKZEROOUT ioctl might fail with errno EINVAL when
> the request is shorter than the block size of the block device.
>
> Stefan suggested prioritizing bl.pwrite_zeroes_alignment in
> bdrv_co_do_zero_pwritev(). This RFC explores that approach and the
> issues with qcow2 I encountered, where
> bl.pwrite_zeroes_alignment = s->subcluster_size;
> I would be happy to discuss potential solutions and whether we should
> use this approach after all.
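
Roughly, the approach comes down to rounding the range to be zeroed
outward to bl.pwrite_zeroes_alignment boundaries and preserving the
existing data in the head and tail via read-modify-write. A minimal
sketch, assuming a non-negative offset and a positive length and
alignment; ZeroPadding and pad_zero_request() are hypothetical names,
and this is not the code in bdrv_init_padding():

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type describing a widened zero-write request. */
typedef struct {
    int64_t offset; /* start, rounded down to the alignment */
    int64_t bytes;  /* length of the widened request */
    int64_t head;   /* existing bytes in front of the original request */
    int64_t tail;   /* existing bytes behind the original request */
} ZeroPadding;

/*
 * Widen [offset, offset + bytes) to 'align' boundaries. Rounding uses
 * division, so 'align' does not have to be a power of two.
 */
static ZeroPadding pad_zero_request(int64_t offset, int64_t bytes,
                                    int64_t align)
{
    assert(offset >= 0 && bytes > 0 && align > 0);

    int64_t end = offset + bytes;
    int64_t aligned_start = offset / align * align;
    int64_t aligned_end = (end + align - 1) / align * align;

    ZeroPadding p = {
        .offset = aligned_start,
        .bytes = aligned_end - aligned_start,
        .head = offset - aligned_start,
        .tail = aligned_end - end,
    };
    return p;
}
```

Note that aligned_end can lie past the end of an image whose length is
not a multiple of align, which is the situation behind the iotest 154
and 271 assertion failures.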

These issues are a headache, but I think it's important for us to
consider them. They indicate that QEMU does not properly distinguish
between read/write and pwrite_zeroes constraints. If we can agree on
how the block layer should handle pwrite_zeroes constraints in a
consistent way that makes the tests pass, then that should serve the
QEMU block layer well in the future.

I will mention this patch series to Kevin as well so we can get his
opinion.

>
> For example, in iotests 154 and 271, there are assertion failures,
> because the padded request extends beyond the end of the image:
> Assertion `offset + bytes <= bs->total_sectors * BDRV_SECTOR_SIZE ||
> child->perm & BLK_PERM_RESIZE' failed.
> The total image length is not necessarily aligned to the cluster size.
> This could be solved by shortening the relevant requests in
> bdrv_co_do_zero_pwritev() and submitting them without the
> BDRV_REQ_ZERO_WRITE flag and with bl.request_alignment as the
> alignment; see patch 5/6.
>
> For iotest 179, I would need to avoid clearing BDRV_REQ_ZERO_WRITE for
> the head and tail parts as long as the buffer is fully zero.
> Otherwise, we end up with more 'data' sectors in the target map. See
> patch 6/6. With or without that, iotests 154 and 271 produce
> different output (I think it might be expected, but haven't checked in
> detail yet).
>
> Another issue is exposed by iotest 177, where the (sub-)cluster size
> is 1MiB, but max-transfer is only 64KiB, leading to assertion failures,
> because max_transfer =
> QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX), align);
> evaluates to 0 (because align > bs->bl.max_transfer). This could be
> fixed by doing the QEMU_ALIGN_DOWN only if the value is bigger than
> align; see patch 4/6.
>
> I'm also not sure what to do about iotests 204 and 177, which use
> 'opt-write-zero=15M' for the blkdebug driver (which assigns that value
> to pwrite_zeroes_alignment), making an is_power_of_2(align) assertion
> fail.
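
The iotest 177 arithmetic is easy to reproduce in isolation. A minimal
sketch, assuming local stand-ins ALIGN_DOWN and MIN_NON_ZERO that mirror
the semantics of QEMU's QEMU_ALIGN_DOWN and MIN_NON_ZERO (they are not
the real headers), plus a hypothetical helper safe_max_transfer() that
applies the guard described for patch 4/6:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Local stand-ins, assumed to match QEMU's division-based semantics. */
#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define MIN_NON_ZERO(a, b) \
    ((a) == 0 ? (b) : ((b) == 0 ? (a) : ((a) < (b) ? (a) : (b))))

/*
 * Hypothetical helper: the transfer limit used when splitting a write.
 * A limit of 0 means "no limit" and falls back to INT_MAX. Aligning a
 * 64 KiB limit down to a 1 MiB alignment would yield 0, so the value
 * is only aligned down when it is at least 'align'.
 */
static int64_t safe_max_transfer(int64_t bl_max_transfer, int64_t align)
{
    assert(bl_max_transfer >= 0 && align > 0);

    int64_t max = MIN_NON_ZERO(bl_max_transfer, (int64_t)INT_MAX);
    if (max >= align) {
        max = ALIGN_DOWN(max, align);
    }
    return max;
}
```

Because the stand-in rounds by division, it also gives a sensible
result for a non-power-of-two alignment such as the 15 MiB from
'opt-write-zero=15M'; mask-based rounding would not.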
>
> Yet another issue is the 'detect_zeroes' option. If the option is set,
> bdrv_aligned_pwritev() might set the BDRV_REQ_ZERO_WRITE flag even if
> the request is not aligned to pwrite_zeroes_alignment and the original
> bug could resurface.
>
> Best Regards,
> Fiona
>
>
> Fiona Ebner (6):
>   block/io: pass alignment to bdrv_init_padding()
>   block/io: add 'bytes' parameter to bdrv_padding_rmw_read()
>   block/io: honor pwrite_zeroes_alignment in bdrv_co_do_zero_pwritev()
>   block/io: safeguard max transfer calculation in bdrv_aligned_pwritev()
>   block/io: handle image length not aligned to write zeroes alignment in
>     bdrv_co_do_zero_pwritev()
>   block/io: keep zero flag for head/tail parts of misaligned zero write
>     when possible
>
>  block/io.c | 78 ++++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 55 insertions(+), 23 deletions(-)
>
> --
> 2.47.3
>
>
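
On the detect_zeroes point: whatever the final design, the condition
that path would need to respect can be stated compactly. A minimal
sketch; may_use_zero_write() is a hypothetical name that is not part of
this series, and treating pwrite_zeroes_alignment == 0 as "nothing
beyond request_alignment is required" is an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: may a request that detect_zeroes found to be all
 * zeroes be submitted with BDRV_REQ_ZERO_WRITE? Only if both ends sit
 * on pwrite_zeroes_alignment boundaries. An alignment of 0 is assumed
 * to mean request_alignment suffices, so nothing extra is checked.
 */
static bool may_use_zero_write(int64_t offset, int64_t bytes,
                               int64_t pwrite_zeroes_alignment)
{
    assert(offset >= 0 && bytes >= 0 && pwrite_zeroes_alignment >= 0);

    if (pwrite_zeroes_alignment == 0) {
        return true;
    }
    /* Modulo rather than a bit mask, so a 15 MiB alignment works too. */
    return offset % pwrite_zeroes_alignment == 0 &&
           bytes % pwrite_zeroes_alignment == 0;
}
```

A request failing such a check could, for example, stay a regular data
write instead of reaching BLKZEROOUT as a misaligned zero write.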
