On Wed, Oct 29, 2025 at 03:50:27PM +0900, Akihiko Odaki wrote:
> file-posix used to assume that existing holes satisfy the requested
> alignment, which equals to the estimated direct I/O alignment
> requirement if direct I/O is requested, and assert the assumption
> unless it is at EOF.
>
> However, the estimation of direct I/O alignment requirement is sometimes
> inexact and can be overly strict. For example, I observed that QEMU
> estimated the alignment requirement as 16K while the real requirement
> is 4K when Btrfs is used on Linux 6.14.6 and the host page size is 16K.

If needed, it should be possible to use nbdkit and an NBD device to
force other unusual alignment scenarios. But for this patch, I agree
with your analysis that...

> Moreover, even if we could figure out the direct I/O alignment
> requirement, I could not find a documentation saying it will exactly
> match with the alignment of holes.
>
> So stop asserting the assumption on the holes and handle unaligned holes
> properly.

...not asserting, and merely handling rounding ourselves, is the best
path forward.

> block/file-posix.c | 41 +++++++++++++++++++++++++----------------
> 1 file changed, 25 insertions(+), 16 deletions(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 8c738674cedb..b6d7a31b4d04 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -3315,29 +3315,38 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
> + /*
> + * We may have allocation unaligned with the requested
> + * alignment due to the following reaons:
s/reaons/reasons/
> + * - unaligned file size
> + * - inexact direct I/O alignment requirement estimation
> + * - mismatches between the allocation size and
> + * direct I/O alignment requirement.
> + *
> + * We are not allowed to return partial sectors, though, so
> + * round up the end of allocation if necessary.
> + */
> + *pnum = ROUND_UP(*pnum, bs->bl.request_alignment);
> ret = BDRV_BLOCK_DATA;
> } else {
> /* On a hole, compute bytes to the beginning of the next extent. */
> assert(hole == offset);
> *pnum = data - offset;
> - ret = BDRV_BLOCK_ZERO;
> +
> + /*
> + * We may have allocation unaligned, so round down the beginning
> + * of allocation if necessary.
> + */
> + if (*pnum < bs->bl.request_alignment) {
> + *pnum = bs->bl.request_alignment;
> + ret = BDRV_BLOCK_DATA;
> + } else {
> + *pnum = ROUND_DOWN(*pnum, bs->bl.request_alignment);
> + ret = BDRV_BLOCK_ZERO;
> + }
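
To illustrate the rounding in the hunk above, here is a minimal
standalone sketch, not the actual file-posix.c code. The function
block_status() and the STATUS_* values are hypothetical stand-ins for
raw_co_block_status() and BDRV_BLOCK_*, the macros are simplified
versions of QEMU's ROUND_UP/ROUND_DOWN, and data/hole are assumed to be
what lseek(SEEK_DATA)/lseek(SEEK_HOLE) would report from an aligned
offset.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for QEMU's macros; valid for any positive d. */
#define ROUND_DOWN(n, d) ((n) / (d) * (d))
#define ROUND_UP(n, d)   ROUND_DOWN((n) + (d) - 1, (d))

/* Hypothetical status values mirroring BDRV_BLOCK_DATA/BDRV_BLOCK_ZERO. */
enum { STATUS_DATA, STATUS_ZERO };

/*
 * offset: aligned start of the query (assumption of this sketch)
 * data:   start of the next data extent at or after offset
 * hole:   start of the next hole at or after offset
 * align:  stand-in for bs->bl.request_alignment
 */
static int block_status(int64_t offset, int64_t data, int64_t hole,
                        int64_t align, int64_t *pnum)
{
    assert(align > 0 && offset % align == 0);

    if (data == offset) {
        /* In data: round the end of the extent up to the alignment. */
        *pnum = ROUND_UP(hole - offset, align);
        return STATUS_DATA;
    }

    /* In a hole: bytes up to the beginning of the next data extent. */
    assert(hole == offset);
    *pnum = data - offset;
    if (*pnum < align) {
        /* Hole shorter than one aligned unit: report it as data. */
        *pnum = align;
        return STATUS_DATA;
    }
    /* Otherwise trim the hole so it ends on an aligned boundary. */
    *pnum = ROUND_DOWN(*pnum, align);
    return STATUS_ZERO;
}
```

With a 16K alignment, a 4K data extent at offset 0 is reported as 16K
of data, and a 40K hole at offset 0 is reported as 32K of zeroes; the
trailing 8K is then reported as data on the next query.
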
With the typo fix,
Reviewed-by: Eric Blake <[email protected]>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org