Re: [PULL for-9.0 0/1] Block patches

2024-03-22 Thread Peter Maydell
On Thu, 21 Mar 2024 at 17:23, Stefan Hajnoczi  wrote:
>
> The following changes since commit fea445e8fe9acea4f775a832815ee22bdf2b0222:
>
>   Merge tag 'pull-maintainer-final-for-real-this-time-200324-1' of 
> https://gitlab.com/stsquad/qemu into staging (2024-03-21 10:31:56 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to 9352f80cd926fe2dde7c89b93ee33bb0356ff40e:
>
>   coroutine: reserve 5,000 mappings (2024-03-21 13:14:30 -0400)
>
> 
> Pull request
>
> I was too quick in sending the coroutine pool sizing change for -rc0 and still
> needed to address feedback from Daniel Berrangé.
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM



Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-03-22 Thread Eugenio Perez Martin
On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:
>
> The goal of these patches is to add support to a variety of virtio and
> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
> indicates that all buffers are used by the device in the same order in
> which they were made available by the driver.
>
> These patches attempt to implement a generalized, non-device-specific
> solution to support this feature.
>
> The core feature behind this solution is a buffer mechanism in the form
> of GLib's GHashTable. The decision behind using a hash table was to
> leverage their ability for quick lookup, insertion, and removal
> operations. Given that our keys are simply numbers of an ordered
> sequence, a hash table seemed like the best choice for a buffer
> mechanism.
>
> -
>
> The strategy behind this implementation is as follows:
>
> We know that buffers that are popped from the available ring and enqueued
> for further processing will always done in the same order in which they
> were made available by the driver. Given this, we can note their order
> by assigning the resulting VirtQueueElement a key. This key is a number
> in a sequence that represents the order in which they were popped from
> the available ring, relative to the other VirtQueueElements.
>
> For example, given 3 "elements" that were popped from the available
> ring, we assign a key value to them which represents their order (elem0
> is popped first, then elem1, then lastly elem2):
>
>  elem2   --  elem1   --  elem0   ---> Enqueue for processing
> (key: 2)(key: 1)(key: 0)
>
> Then these elements are enqueued for further processing by the host.
>
> While most devices will return these completed elements in the same
> order in which they were enqueued, some devices may not (e.g.
> virtio-blk). To guarantee that these elements are put on the used ring
> in the same order in which they were enqueued, we can use a buffering
> mechanism that keeps track of the next expected sequence number of an
> element.
>
> In other words, if the completed element does not have a key value that
> matches the next expected sequence number, then we know this element is
> not in-order and we must stash it away in a hash table until an order
> can be made. The element's key value is used as the key for placing it
> in the hash table.
>
> If the completed element has a key value that matches the next expected
> sequence number, then we know this element is in-order and we can push
> it on the used ring. Then we increment the next expected sequence number
> and check if the hash table contains an element at this key location.
>
> If so, we retrieve this element, push it to the used ring, delete the
> key-value pair from the hash table, increment the next expected sequence
> number, and check the hash table again for an element at this new key
> location. This process is repeated until we're unable to find an element
> in the hash table to continue the order.
>
> So, for example, say the 3 elements we enqueued were completed in the
> following order: elem1, elem2, elem0. The next expected sequence number
> is 0:
>
> exp-seq-num = 0:
>
>  elem1   --> elem1.key == exp-seq-num ? --> No, stash it
> (key: 1) |
>  |
>  v
>
>|key: 1 - elem1|
>
> -
> exp-seq-num = 0:
>
>  elem2   --> elem2.key == exp-seq-num ? --> No, stash it
> (key: 2) |
>  |
>  v
>
>|key: 1 - elem1|
>|--|
>|key: 2 - elem2|
>
> -
> exp-seq-num = 0:
>
>  elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
> (key: 0)
>
> exp-seq-num = 1:
>
> lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>  remove elem from table
>  |
>  v
>
>|key: 2 - elem2|
>
>
> exp-seq-num = 2:
>
> lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>   

Re: [RFC 8/8] virtio: Add VIRTIO_F_IN_ORDER property definition

2024-03-22 Thread Eugenio Perez Martin
On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:
>
> Extend the virtio device property definitions to include the
> VIRTIO_F_IN_ORDER feature.
>
> The default state of this feature is disabled, allowing it to be
> explicitly enabled where it's supported.
>

Acked-by: Eugenio Pérez 

Thanks!

> Signed-off-by: Jonah Palmer 
> ---
>  include/hw/virtio/virtio.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index eeeda397a9..ffd78830a3 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -400,7 +400,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
>  DEFINE_PROP_BIT64("packed", _state, _field, \
>VIRTIO_F_RING_PACKED, false), \
>  DEFINE_PROP_BIT64("queue_reset", _state, _field, \
> -  VIRTIO_F_RING_RESET, true)
> +  VIRTIO_F_RING_RESET, true), \
> +DEFINE_PROP_BIT64("in_order", _state, _field, \
> +  VIRTIO_F_IN_ORDER, false)
>
>  hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
>  bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
> --
> 2.39.3
>




Re: [RFC 7/8] vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits

2024-03-22 Thread Eugenio Perez Martin
On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:
>
> Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
> devices.
>
> The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
> devices ensures that the backend is capable of offering and providing
> support for this feature, and that it can be disabled if the backend
> does not support it.
>

Acked-by: Eugenio Pérez 

Thanks!

> Signed-off-by: Jonah Palmer 
> ---
>  hw/block/vhost-user-blk.c| 1 +
>  hw/net/vhost_net.c   | 2 ++
>  hw/scsi/vhost-scsi.c | 1 +
>  hw/scsi/vhost-user-scsi.c| 1 +
>  hw/virtio/vhost-user-fs.c| 1 +
>  hw/virtio/vhost-user-vsock.c | 1 +
>  net/vhost-vdpa.c | 1 +
>  7 files changed, 8 insertions(+)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 6a856ad51a..d176ed857e 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_IOMMU_PLATFORM,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_IN_ORDER,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index e8e1661646..33d1d4b9d3 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
>  VIRTIO_F_IOMMU_PLATFORM,
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_IN_ORDER,
>  VIRTIO_NET_F_HASH_REPORT,
>  VHOST_INVALID_FEATURE_BIT
>  };
> @@ -76,6 +77,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_F_IOMMU_PLATFORM,
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_IN_ORDER,
>  VIRTIO_NET_F_RSS,
>  VIRTIO_NET_F_HASH_REPORT,
>  VIRTIO_NET_F_GUEST_USO4,
> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> index ae26bc19a4..40e7630191 100644
> --- a/hw/scsi/vhost-scsi.c
> +++ b/hw/scsi/vhost-scsi.c
> @@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_SCSI_F_HOTPLUG,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_IN_ORDER,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index a63b1f4948..1d59951ab7 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_SCSI_F_HOTPLUG,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_IN_ORDER,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index cca2cd41be..9243dbb128 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -33,6 +33,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_IOMMU_PLATFORM,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_IN_ORDER,
>
>  VHOST_INVALID_FEATURE_BIT
>  };
> diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
> index 9431b9792c..cc7e4e47b4 100644
> --- a/hw/virtio/vhost-user-vsock.c
> +++ b/hw/virtio/vhost-user-vsock.c
> @@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_RING_F_INDIRECT_DESC,
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_F_NOTIFY_ON_EMPTY,
> +VIRTIO_F_IN_ORDER,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 85e73dd6a7..ed3185acfa 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_RING_RESET,
>  VIRTIO_F_VERSION_1,
> +VIRTIO_F_IN_ORDER,
>  VIRTIO_NET_F_CSUM,
>  VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
>  VIRTIO_NET_F_CTRL_MAC_ADDR,
> --
> 2.39.3
>




Re: [RFC 4/8] virtio: Implement in-order handling for virtio devices

2024-03-22 Thread Eugenio Perez Martin
On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:
>
> Implements in-order handling for most virtio devices using the
> VIRTIO_F_IN_ORDER transport feature, specifically those who call
> virtqueue_push to push their used elements onto the used ring.
>
> The logic behind this implementation is as follows:
>
> 1.) virtqueue_pop always enqueues VirtQueueElements in-order.
>
> virtqueue_pop always retrieves one or more buffer descriptors in-order
> from the available ring and converts them into a VirtQueueElement. This
> means that the order in which VirtQueueElements are enqueued are
> in-order by default.
>
> By virtue, as VirtQueueElements are created, we can assign a sequential
> key value to them. This preserves the order of buffers that have been
> made available to the device by the driver.
>
> As VirtQueueElements are assigned a key value, the current sequence
> number is incremented.
>
> 2.) Requests can be completed out-of-order.
>
> While most devices complete requests in the same order that they were
> enqueued by default, some devices don't (e.g. virtio-blk). The goal of
> this out-of-order handling is to reduce the impact of devices that
> process elements in-order by default while also guaranteeing compliance
> with the VIRTIO_F_IN_ORDER feature.
>
> Below is the logic behind handling completed requests (which may or may
> not be in-order).
>
> 3.) Does the incoming used VirtQueueElement preserve the correct order?
>
> In other words, is the sequence number (key) assigned to the
> VirtQueueElement the expected number that would preserve the original
> order?
>
> 3a.)
> If it does... immediately push the used element onto the used ring.
> Then increment the next expected sequence number and check to see if
> any previous out-of-order VirtQueueElements stored on the hash table
> has a key that matches this next expected sequence number.
>
> For each VirtQueueElement found on the hash table with a matching key:
> push the element on the used ring, remove the key-value pair from the
> hash table, and then increment the next expected sequence number. Repeat
> this process until we're unable to find an element with a matching key.
>
> Note that if the device uses batching (e.g. virtio-net), then we skip
> the virtqueue_flush call and let the device call it themselves.
>
> 3b.)
> If it does not... stash the VirtQueueElement, along with relevant data,
> as a InOrderVQElement on the hash table. The key used is the order_key
> that was assigned when the VirtQueueElement was created.
>
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio.c | 70 --
>  include/hw/virtio/virtio.h |  8 +
>  2 files changed, 76 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 40124545d6..40e4377f1e 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -992,12 +992,56 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
>  }
>  }
>
> +void virtqueue_order_element(VirtQueue *vq, const VirtQueueElement *elem,
> + unsigned int len, unsigned int idx,
> + unsigned int count)
> +{
> +InOrderVQElement *in_order_elem;
> +
> +if (elem->order_key == vq->current_order_idx) {
> +/* Element is in-order, push to used ring */
> +virtqueue_fill(vq, elem, len, idx);
> +
> +/* Batching? Don't flush */
> +if (count) {
> +virtqueue_flush(vq, count);

The "count" parameter is the number of heads used, but here you're
only using one head (elem). Same with the other virtqueue_flush in the
function.

Also, this function sometimes replaces virtqueue_fill and other
replaces virtqueue_fill + virtqueue_flush (both examples in patch
6/8). I have the impression the series would be simpler if
virtqueue_order_element is a static function just handling the
virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER) path of
virtqueue_fill, so the caller does not need to know if the in_order
feature is on or off.

> +}
> +
> +/* Increment next expected order, search for more in-order elements 
> */
> +while ((in_order_elem = g_hash_table_lookup(vq->in_order_ht,
> +GUINT_TO_POINTER(++vq->current_order_idx))) != NULL) 
> {
> +/* Found in-order element, push to used ring */
> +virtqueue_fill(vq, in_order_elem->elem, in_order_elem->len,
> +   in_order_elem->idx);
> +
> +/* Batching? Don't flush */
> +if (count) {
> +virtqueue_flush(vq, in_order_elem->count);
> +}
> +
> +/* Remove key-value pair from hash table */
> +g_hash_table_remove(vq->in_order_ht,
> +GUINT_TO_POINTER(vq->current_order_idx));
> +}
> +} else {
> +/* Element is out-of-order, stash in hash table */
> +in_order_elem = 

[PATCH v3 1/4] block/io: accept NULL qiov in bdrv_pad_request

2024-03-22 Thread Fiona Ebner
From: Stefan Reiter 

Some operations, e.g. block-stream, perform reads while discarding the
results (only copy-on-read matters). In this case, they will pass NULL
as the target QEMUIOVector, which will however trip bdrv_pad_request,
since it wants to extend its passed vector. In particular, this is the
case for the blk_co_preadv() call in stream_populate().

If there is no qiov, no operation can be done with it, but the bytes
and offset still need to be updated, so the subsequent aligned read
will actually be aligned and not run into an assertion failure.

In particular, this can happen when the request alignment of the top
node is larger than the allocated part of the bottom node, in which
case padding becomes necessary. For example:

> ./qemu-img create /tmp/backing.qcow2 -f qcow2 64M -o cluster_size=32768
> ./qemu-io -c "write -P42 0x0 0x1" /tmp/backing.qcow2
> ./qemu-img create /tmp/top.qcow2 -f qcow2 64M -b /tmp/backing.qcow2 -F qcow2
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev 
> qcow2,node-name=node0,file.driver=file,file.filename=/tmp/top.qcow2 \
> < {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "compress", "file": 
> "node0", "node-name": "node1" } }
> {"execute": "block-stream", "arguments": { "job-id": "stream0", "device": 
> "node1" } }
> EOF

Originally-by: Stefan Reiter 
Signed-off-by: Thomas Lamprecht 
[FE: do update bytes and offset in any case
 add reproducer to commit message]
Signed-off-by: Fiona Ebner 
---

No changes in v3.
No changes in v2.

 block/io.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/block/io.c b/block/io.c
index 33150c0359..395bea3bac 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1726,22 +1726,29 @@ static int bdrv_pad_request(BlockDriverState *bs,
 return 0;
 }
 
-sliced_iov = qemu_iovec_slice(*qiov, *qiov_offset, *bytes,
-  _head, _tail,
-  _niov);
+/*
+ * For prefetching in stream_populate(), no qiov is passed along, because
+ * only copy-on-read matters.
+ */
+if (qiov && *qiov) {
+sliced_iov = qemu_iovec_slice(*qiov, *qiov_offset, *bytes,
+  _head, _tail,
+  _niov);
 
-/* Guaranteed by bdrv_check_request32() */
-assert(*bytes <= SIZE_MAX);
-ret = bdrv_create_padded_qiov(bs, pad, sliced_iov, sliced_niov,
-  sliced_head, *bytes);
-if (ret < 0) {
-bdrv_padding_finalize(pad);
-return ret;
+/* Guaranteed by bdrv_check_request32() */
+assert(*bytes <= SIZE_MAX);
+ret = bdrv_create_padded_qiov(bs, pad, sliced_iov, sliced_niov,
+  sliced_head, *bytes);
+if (ret < 0) {
+bdrv_padding_finalize(pad);
+return ret;
+}
+*qiov = >local_qiov;
+*qiov_offset = 0;
 }
+
 *bytes += pad->head + pad->tail;
 *offset -= pad->head;
-*qiov = >local_qiov;
-*qiov_offset = 0;
 if (padded) {
 *padded = true;
 }
-- 
2.39.2





[PATCH v3 4/4] iotests: add test for stream job with an unaligned prefetch read

2024-03-22 Thread Fiona Ebner
Previously, bdrv_pad_request() could not deal with a NULL qiov when
a read needed to be aligned. During prefetch, a stream job will pass a
NULL qiov. Add a test case to cover this scenario.

By accident, also covers a previous race during shutdown, where block
graph changes during iteration in bdrv_flush_all() could lead to
unreferencing the wrong block driver state and an assertion failure
later.

Signed-off-by: Fiona Ebner 
---

No changes in v3.
New in v2.

 .../tests/stream-unaligned-prefetch   | 86 +++
 .../tests/stream-unaligned-prefetch.out   |  5 ++
 2 files changed, 91 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/stream-unaligned-prefetch
 create mode 100644 tests/qemu-iotests/tests/stream-unaligned-prefetch.out

diff --git a/tests/qemu-iotests/tests/stream-unaligned-prefetch 
b/tests/qemu-iotests/tests/stream-unaligned-prefetch
new file mode 100755
index 00..546db1d369
--- /dev/null
+++ b/tests/qemu-iotests/tests/stream-unaligned-prefetch
@@ -0,0 +1,86 @@
+#!/usr/bin/env python3
+# group: rw quick
+#
+# Test what happens when a stream job does an unaligned prefetch read
+# which requires padding while having a NULL qiov.
+#
+# Copyright (C) Proxmox Server Solutions GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import iotests
+from iotests import imgfmt, qemu_img_create, qemu_io, QMPTestCase
+
+image_size = 1 * 1024 * 1024
+cluster_size = 64 * 1024
+base = os.path.join(iotests.test_dir, 'base.img')
+top = os.path.join(iotests.test_dir, 'top.img')
+
+class TestStreamUnalignedPrefetch(QMPTestCase):
+def setUp(self) -> None:
+"""
+Create two images:
+- base image {base} with {cluster_size // 2} bytes allocated
+- top image {top} without any data allocated and coarser
+  cluster size
+
+Attach a compress filter for the top image, because that
+requires that the request alignment is the top image's cluster
+size.
+"""
+qemu_img_create('-f', imgfmt,
+'-o', 'cluster_size={}'.format(cluster_size // 2),
+base, str(image_size))
+qemu_io('-c', f'write 0 {cluster_size // 2}', base)
+qemu_img_create('-f', imgfmt,
+'-o', 'cluster_size={}'.format(cluster_size),
+top, str(image_size))
+
+self.vm = iotests.VM()
+self.vm.add_blockdev(self.vm.qmp_to_opts({
+'driver': imgfmt,
+'node-name': 'base',
+'file': {
+'driver': 'file',
+'filename': base
+}
+}))
+self.vm.add_blockdev(self.vm.qmp_to_opts({
+'driver': 'compress',
+'node-name': 'compress-top',
+'file': {
+'driver': imgfmt,
+'node-name': 'top',
+'file': {
+'driver': 'file',
+'filename': top
+},
+'backing': 'base'
+}
+}))
+self.vm.launch()
+
+def tearDown(self) -> None:
+self.vm.shutdown()
+os.remove(top)
+os.remove(base)
+
+def test_stream_unaligned_prefetch(self) -> None:
+self.vm.cmd('block-stream', job_id='stream', device='compress-top')
+
+
+if __name__ == '__main__':
+iotests.main(supported_fmts=['qcow2'], supported_protocols=['file'])
diff --git a/tests/qemu-iotests/tests/stream-unaligned-prefetch.out 
b/tests/qemu-iotests/tests/stream-unaligned-prefetch.out
new file mode 100644
index 00..ae1213e6f8
--- /dev/null
+++ b/tests/qemu-iotests/tests/stream-unaligned-prefetch.out
@@ -0,0 +1,5 @@
+.
+--
+Ran 1 tests
+
+OK
-- 
2.39.2





[PATCH v3 0/4] fix two edge cases related to stream block jobs

2024-03-22 Thread Fiona Ebner
Changes in v3:
* Also deal with edge case in bdrv_next_cleanup(). Haven't run
  into an actual issue there, but at least the caller in
  migration/block.c uses bdrv_nb_sectors() which, while not a
  coroutine wrapper itself (it's written manually), may call
  bdrv_refresh_total_sectors(), which is a generated coroutine
  wrapper, so AFAIU, the block graph can change during that call.
  And even without that, it's just better to be more consistent
  with bdrv_next().

Changes in v2:
* Ran into another issue while writing the IO test Stefan wanted
  to have (good call :)), so include a fix for that and add the
  test. I didn't notice during manual testing, because I hadn't
  used a scripted QMP 'quit', so there was no race.

Fiona Ebner (3):
  block-backend: fix edge case in bdrv_next() where BDS associated to BB
changes
  block-backend: fix edge case in bdrv_next_cleanup() where BDS
associated to BB changes
  iotests: add test for stream job with an unaligned prefetch read

Stefan Reiter (1):
  block/io: accept NULL qiov in bdrv_pad_request

 block/block-backend.c | 18 ++--
 block/io.c| 31 ---
 .../tests/stream-unaligned-prefetch   | 86 +++
 .../tests/stream-unaligned-prefetch.out   |  5 ++
 4 files changed, 117 insertions(+), 23 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/stream-unaligned-prefetch
 create mode 100644 tests/qemu-iotests/tests/stream-unaligned-prefetch.out

-- 
2.39.2





[PATCH v3 2/4] block-backend: fix edge case in bdrv_next() where BDS associated to BB changes

2024-03-22 Thread Fiona Ebner
The old_bs variable in bdrv_next() is currently determined by looking
at the old block backend. However, if the block graph changes before
the next bdrv_next() call, it might be that the associated BDS is not
the same that was referenced previously. In that case, the wrong BDS
is unreferenced, leading to an assertion failure later:

> bdrv_unref: Assertion `bs->refcnt > 0' failed.

In particular, this can happen in the context of bdrv_flush_all(),
when polling for bdrv_co_flush() in the generated co-wrapper leads to
a graph change (for example with a stream block job [0]).

A racy reproducer:

> #!/bin/bash
> rm -f /tmp/backing.qcow2
> rm -f /tmp/top.qcow2
> ./qemu-img create /tmp/backing.qcow2 -f qcow2 64M
> ./qemu-io -c "write -P42 0x0 0x1" /tmp/backing.qcow2
> ./qemu-img create /tmp/top.qcow2 -f qcow2 64M -b /tmp/backing.qcow2 -F qcow2
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev 
> qcow2,node-name=node0,file.driver=file,file.filename=/tmp/top.qcow2 \
> < {"execute": "qmp_capabilities"}
> {"execute": "block-stream", "arguments": { "job-id": "stream0", "device": 
> "node0" } }
> {"execute": "quit"}
> EOF

[0]:

> #0  bdrv_replace_child_tran (child=..., new_bs=..., tran=...)
> #1  bdrv_replace_node_noperm (from=..., to=..., auto_skip=..., tran=..., 
> errp=...)
> #2  bdrv_replace_node_common (from=..., to=..., auto_skip=..., 
> detach_subchain=..., errp=...)
> #3  bdrv_drop_filter (bs=..., errp=...)
> #4  bdrv_cor_filter_drop (cor_filter_bs=...)
> #5  stream_prepare (job=...)
> #6  job_prepare_locked (job=...)
> #7  job_txn_apply_locked (fn=..., job=...)
> #8  job_do_finalize_locked (job=...)
> #9  job_exit (opaque=...)
> #10 aio_bh_poll (ctx=...)
> #11 aio_poll (ctx=..., blocking=...)
> #12 bdrv_poll_co (s=...)
> #13 bdrv_flush (bs=...)
> #14 bdrv_flush_all ()
> #15 do_vm_stop (state=..., send_stop=...)
> #16 vm_shutdown ()

Signed-off-by: Fiona Ebner 
---

No changes in v3.
New in v2.

 block/block-backend.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 9c4de79e6b..28af1eb17a 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -599,14 +599,14 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it)
 /* Must be called from the main loop */
 assert(qemu_get_current_aio_context() == qemu_get_aio_context());
 
+old_bs = it->bs;
+
 /* First, return all root nodes of BlockBackends. In order to avoid
  * returning a BDS twice when multiple BBs refer to it, we only return it
  * if the BB is the first one in the parent list of the BDS. */
 if (it->phase == BDRV_NEXT_BACKEND_ROOTS) {
 BlockBackend *old_blk = it->blk;
 
-old_bs = old_blk ? blk_bs(old_blk) : NULL;
-
 do {
 it->blk = blk_all_next(it->blk);
 bs = it->blk ? blk_bs(it->blk) : NULL;
@@ -620,11 +620,10 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it)
 if (bs) {
 bdrv_ref(bs);
 bdrv_unref(old_bs);
+it->bs = bs;
 return bs;
 }
 it->phase = BDRV_NEXT_MONITOR_OWNED;
-} else {
-old_bs = it->bs;
 }
 
 /* Then return the monitor-owned BDSes without a BB attached. Ignore all
-- 
2.39.2





[PATCH v3 3/4] block-backend: fix edge case in bdrv_next_cleanup() where BDS associated to BB changes

2024-03-22 Thread Fiona Ebner
Same rationale as for commit "block-backend: fix edge case in
bdrv_next() where BDS associated to BB changes". The block graph might
change between the bdrv_next() call and the bdrv_next_cleanup() call,
so it could be that the associated BDS is not the same that was
referenced previously anymore. Instead, rely on bdrv_next() to set
it->bs to the BDS it referenced and unreference that one in any case.

Signed-off-by: Fiona Ebner 
---

New in v3.

 block/block-backend.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 28af1eb17a..db6f9b92a3 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -663,13 +663,10 @@ void bdrv_next_cleanup(BdrvNextIterator *it)
 /* Must be called from the main loop */
 assert(qemu_get_current_aio_context() == qemu_get_aio_context());
 
-if (it->phase == BDRV_NEXT_BACKEND_ROOTS) {
-if (it->blk) {
-bdrv_unref(blk_bs(it->blk));
-blk_unref(it->blk);
-}
-} else {
-bdrv_unref(it->bs);
+bdrv_unref(it->bs);
+
+if (it->phase == BDRV_NEXT_BACKEND_ROOTS && it->blk) {
+blk_unref(it->blk);
 }
 
 bdrv_next_reset(it);
-- 
2.39.2





Re: [RFC 1/8] virtio: Define InOrderVQElement

2024-03-22 Thread Eugenio Perez Martin
On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:
>
> Define the InOrderVQElement structure for the VIRTIO_F_IN_ORDER
> transport feature implementation.
>
> The InOrderVQElement structure is used to encapsulate out-of-order
> VirtQueueElement data that was processed by the host. This data
> includes:
>  - The processed VirtQueueElement (elem)
>  - Length of data (len)
>  - VirtQueueElement array index (idx)
>  - Number of processed VirtQueueElements (count)
>
> InOrderVQElements will be stored in a buffering mechanism until an
> order can be achieved.
>
> Signed-off-by: Jonah Palmer 
> ---
>  include/hw/virtio/virtio.h | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index b3c74a1bca..c8aa435a5e 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -77,6 +77,13 @@ typedef struct VirtQueueElement
>  struct iovec *out_sg;
>  } VirtQueueElement;
>
> +typedef struct InOrderVQElement {
> +const VirtQueueElement *elem;

Some subsystems allocate space for extra elements after
VirtQueueElement, like VirtIOBlockReq. You can request virtqueue_pop
to allocate this extra space by its second argument. Would it work for
this?

> +unsigned int len;
> +unsigned int idx;
> +unsigned int count;

Now I don't get why these fields cannot be obtained from elem->(len,
index, ndescs) ?

> +} InOrderVQElement;
> +
>  #define VIRTIO_QUEUE_MAX 1024
>
>  #define VIRTIO_NO_VECTOR 0x
> --
> 2.39.3
>