On 03.04.23 16:33, Hanna Czenczek wrote:
(Sorry for the rather late reply... Thanks for the review!)
On 20.03.23 11:31, Vladimir Sementsov-Ogievskiy wrote:
On 17.03.23 20:50, Hanna Czenczek wrote:
[...]
diff --git a/block/io.c b/block/io.c
index 8974d46941..1e9cdba17a 100644
--- a/block/io.c
+++ b/block/io.c
[..]
+ pad->write = write;
+
return true;
}
@@ -1545,6 +1561,18 @@ zero_mem:
static void bdrv_padding_destroy(BdrvRequestPadding *pad)
Maybe rename it to _finalize, to stress that it's not only about freeing memory.
Sounds good!
[...]
@@ -1552,6 +1580,101 @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad)
memset(pad, 0, sizeof(*pad));
}
+/*
+ * Create pad->local_qiov by wrapping @iov in the padding head and tail, while
+ * ensuring that the resulting vector will not exceed IOV_MAX elements.
+ *
+ * To ensure this, when necessary, the first couple of elements (up to three)
maybe, "first two-three elements"
Sure (here and...
[...]
+ /*
+ * If padded_niov > IOV_MAX, we cannot just concatenate everything.
+ * Instead, merge the first couple of elements of @iov to reduce the number
maybe, "first two-three elements"
...here).
+ * of vector elements as necessary.
+ */
+ if (padded_niov > IOV_MAX) {
[..]
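(As an illustration of the merge described in the comment above, reduced to plain struct iovec: this is a minimal sketch only, with made-up names; the patch itself does this on the QEMUIOVector, reusing the padding infrastructure.)

#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Illustrative sketch, not the actual QEMU code: merge the first
 * @n_merge entries of @iov into one bounce buffer, so that adding the
 * padding head/tail entries no longer pushes the vector past IOV_MAX.
 */
static void *merge_front_iovs(const struct iovec *iov, int n_merge,
                              size_t *merged_len)
{
    size_t total = 0, off = 0;
    char *buf;
    int i;

    for (i = 0; i < n_merge; i++) {
        total += iov[i].iov_len;
    }

    buf = malloc(total);
    if (!buf) {
        return NULL;
    }

    /*
     * For writes, copy the merged entries' data back-to-back into the
     * buffer.  For reads, the data must instead be copied back from
     * the buffer into the original entries once the request completes
     * (that is the copy-back discussed below).
     */
    for (i = 0; i < n_merge; i++) {
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }

    *merged_len = total;
    /* The caller replaces iov[0..n_merge-1] by a single entry for buf. */
    return buf;
}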
@@ -1653,8 +1786,8 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
flags |= BDRV_REQ_COPY_ON_READ;
}
- ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad,
- NULL, &flags);
+ ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, false,
+ &pad, NULL, &flags);
if (ret < 0) {
goto fail;
}
a bit later:
tracked_request_end(&req);
bdrv_padding_destroy(&pad);
Now the request is formally finished inside bdrv_padding_destroy(). I'm not sure whether it really violates anything, but it seems safer to swap these two calls.
I’d rather not, for two reasons: First, tracked requests are (as far as I
understand) only there to implement request serialization, and so only care
about metadata (offset, length, and type), which is not changed by changes to
the I/O vector.
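(For reference, roughly the shape of a tracked request, paraphrased from memory rather than copied from block_int.h; note that it holds no pointer to the request's I/O vector:)

typedef struct BdrvTrackedRequest {
    BlockDriverState *bs;
    int64_t offset;                     /* metadata used for ... */
    int64_t bytes;
    enum BdrvTrackedRequestType type;   /* ... overlap/serialization checks */
    /* serialization state follows; no qiov pointer anywhere */
} BdrvTrackedRequest;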
Second, even if the state of the I/O vector were relevant to tracked requests,
I think it would actually be the other way around, i.e. the tracked request
must be ended before the padding is finalized/destroyed. The tracked request
is about the actual request we submit to `child` (which is why
tracked_request_begin() is called after bdrv_pad_request()), and that request
is done using the modified I/O vector. So if the tracked request had any
connection to the request’s I/O vector (which it doesn’t), it would be to this
modified one, so we mustn’t invalidate it via bdrv_padding_finalize() while the
tracked request lives.
Or, said differently: I generally try to clean things up in the inverse order they were set up, and because bdrv_pad_request() comes before tracked_request_begin(), I think tracked_request_end() should come before bdrv_padding_finalize().
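(Roughly, the ordering I mean on the read path, as a simplified sketch; the call signatures are paraphrased from the patch context, not copied verbatim:)

    /* set-up, simplified from bdrv_co_preadv_part(): */
    ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, false,
                           &pad, NULL, &flags);
    if (ret < 0) {
        goto fail;
    }
    tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_READ);
    ret = bdrv_aligned_preadv(child, &req, offset, bytes, align,
                              qiov, qiov_offset, flags);

    /* tear-down, in the inverse order of the set-up: */
    tracked_request_end(&req);
    bdrv_padding_finalize(&pad); /* copy read data back to the user's qiov,
                                  * free the padding buffers */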
Note that it's vice versa in bdrv_co_pwritev_part().
For me, it's just simpler to think of the whole request, including filling the user-given qiov with data on the read path, as happening inside the tracked_request_begin() / tracked_request_end() span. Moving the last manipulation of the qiov out of that span breaks this simple model.
The guest should not care about this, as it doesn't know about request tracking. But what about internal code? Some code may depend on requests being finished after a bdrv_drained_begin() call, but now they may not be fully finished, and some data may not yet have been copied back to the original qiov.
I agree with your point about the order of object finalization, but maybe that just shows that copying the data back to the qiov should not be part of bdrv_padding_finalize(), but should instead be a separate function, called before tracked_request_end().
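(Something like the following, with a hypothetical bdrv_padding_copy_back(), the name is made up, split out of bdrv_padding_finalize():)

    /* hypothetical tear-down order for the read path: */
    bdrv_padding_copy_back(&pad);  /* copy read data back into the user qiov */
    tracked_request_end(&req);     /* the request is now fully finished */
    bdrv_padding_finalize(&pad);   /* now really only frees the buffers */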
With that:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsement...@yandex-team.ru>
PS: I feel there is still some small room for optimization here:
The question is whether any optimization is really worth it, and I’m not sure
it is. The bug has been in qemu for over two years, and because the only
report I’ve seen about it came from our QE department, it seems like a very
rare case, so I find it more important for the code to be as simple as possible
than to optimize.
move the logic to bdrv_init_padding(), and
1. allocate only one buffer
2. make the new collapse area be attached to the head or tail padding
3. avoid creating an extra iov slice, maybe with the help of some new qemu_iovec_* API that can control the number of copied/to-be-copied iovs and/or calculate the number of iovs in a qiov/qiov_offset/bytes slice
I actually began by trying to reuse the padding buffer and to collapse head/tail into it, but found it rather complicated. See also my reply to Stefan here:
https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg04774.html
Hanna
--
Best regards,
Vladimir