Re: [PATCH v2 5/7] block/block-copy: block_copy(): add timeout_ns parameter

2022-04-04 Thread Hanna Reitz

On 01.04.22 18:08, Vladimir Sementsov-Ogievskiy wrote:

01.04.2022 16:16, Hanna Reitz wrote:

On 01.04.22 11:19, Vladimir Sementsov-Ogievskiy wrote:

Add the possibility to limit a block_copy() call in time. To be used in the
next commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/block-copy.c         | 26 +++---
  block/copy-before-write.c  |  2 +-
  include/block/block-copy.h |  2 +-
  3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index ec46775ea5..b47cb188dd 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c


[...]

@@ -894,12 +902,16 @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t start, int64_t bytes,
         .max_workers = BLOCK_COPY_MAX_WORKERS,
     };
-    return block_copy_common(&call_state);
-}
+    ret = qemu_co_timeout(block_copy_async_co_entry, call_state, timeout_ns,
+                          g_free);


A direct path for timeout_ns == 0 might still be nice to have.
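Something along these lines, I mean (just a sketch; it assumes call_state
is heap-allocated, as it is in this patch):

    if (timeout_ns == 0) {
        /* No timeout requested: skip qemu_co_timeout() entirely */
        ret = block_copy_common(call_state);
        g_free(call_state);
        return ret;
    }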


+    if (ret < 0) {
+    /* Timeout. call_state will be freed by running coroutine. */


Maybe assert(ret == -ETIMEDOUT);?


OK




+    return ret;


If I’m right in understanding how qemu_co_timeout() works, 
block_copy_common() will continue to run here.  Shouldn’t we at least 
cancel it by setting call_state->cancelled to true?


Agree
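Taken together, the agreed-on changes might look roughly like this (a
sketch, not the actual v3):

    ret = qemu_co_timeout(block_copy_async_co_entry, call_state, timeout_ns,
                          g_free);
    if (ret < 0) {
        assert(ret == -ETIMEDOUT);
        /* Stop the still-running background copy; call_state will be
         * freed by that coroutine once it finishes. */
        call_state->cancelled = true;
        return ret;
    }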



(Besides this, I think that letting block_copy_common() run in 
the background should be OK.  I’m not sure what the implications are 
if we do cancel the call here while on-cbw-error is 
break-guest-write, though.  Should be fine, I guess, because 
block_copy_common() will still correctly keep track of what it has 
successfully copied and what it hasn’t?)


Hmm. I now think that we should at least wait for such cancelled 
background requests before block_copy_state_free() in cbw_close(). But 
in "[PATCH v5 00/45] Transactional block-graph modifying API" I want 
to detach children from the CBW filter before calling .close(). So a 
possible solution is to wait for all cancelled requests in 
.bdrv_co_drain_begin().


Or alternatively, maybe just increase bs->in_flight on the CBW filter 
for each cancelled background request, and decrease it when the request 
finishes? For this we would need to add a kind of callback to be called 
when the timed-out coroutine entry finishes.


in_flight sounds good to me.  That would automatically work for 
draining, right?
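Roughly, the in_flight idea might look like this (a hypothetical sketch;
the completion callback discussed above does not exist yet):

    /* before leaving the timed-out request running in the background */
    bdrv_inc_in_flight(bs);

    /* invoked via the to-be-added callback once the timed-out
     * coroutine entry finally finishes */
    static void cbw_background_request_done(void *opaque)
    {
        BlockDriverState *bs = opaque;
        bdrv_dec_in_flight(bs);
    }

Since bdrv_drain() and friends wait for bs->in_flight to drop to zero,
this would indeed cover draining automatically.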





Re: [PATCH v3] spec: Clarify BLOCK_STATUS reply details

2022-04-04 Thread Richard W.M. Jones
On Fri, Apr 01, 2022 at 04:08:07PM -0500, Eric Blake wrote:
> Our docs were inconsistent on whether a NBD_REPLY_TYPE_BLOCK_STATUS
> reply chunk can exceed the client's requested length, and silent on
> whether the lengths must be consistent when multiple contexts were
> negotiated.  Clarify this to match existing practice as implemented in
> qemu-nbd.  Clean up some nearby grammatical errors while at it.
> ---
> 
> Another round of rewording attempts, based on feedback from Rich on
> v2.  I went ahead and pushed patch 1 and 2 of the v2 series, as they
> were less controversial.
> 
>  doc/proto.md | 42 --
>  1 file changed, 28 insertions(+), 14 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 8a817d2..bacccfa 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -882,15 +882,25 @@ The procedure works as follows:
>    server supports.
>  - During transmission, a client can then indicate interest in metadata
>for a given region by way of the `NBD_CMD_BLOCK_STATUS` command,
> -  where *offset* and *length* indicate the area of interest. The
> -  server MUST then respond with the requested information, for all
> -  contexts which were selected during negotiation. For every metadata
> -  context, the server sends one set of extent chunks, where the sizes
> -  of the extents MUST be less than or equal to the length as specified
> -  in the request. Each extent comes with a *flags* field, the
> -  semantics of which are defined by the metadata context.
> -- A server MUST reply to `NBD_CMD_BLOCK_STATUS` with a structured
> -  reply of type `NBD_REPLY_TYPE_BLOCK_STATUS`.
> +  where *offset* and *length* indicate the area of interest. On
> +  success, the server MUST respond with one structured reply chunk of
> +  type `NBD_REPLY_TYPE_BLOCK_STATUS` per metadata context selected
> +  during negotiation, where each reply chunk is a list of one or more
> +  consecutive extents for that context.  Each extent comes with a
> +  *flags* field, the semantics of which are defined by the metadata
> +  context.
> +
> +The client's requested *length* is only a hint to the server, so the
> +cumulative extent length contained in a chunk of the server's reply
> +may be shorter or longer than the original request.  When more than one
> +metadata context was negotiated, the reply chunks for the different
> +contexts of a single block status request need not have the same
> +number of extents or cumulative extent length.
> +
> +In the request, the client may use the `NBD_CMD_FLAG_REQ_ONE` command
> +flag to further constrain the server's reply so that each chunk
> +contains exactly one extent whose length does not exceed the client's
> +original *length*.
> 
>  A client MUST NOT use `NBD_CMD_BLOCK_STATUS` unless it selected a
>  nonzero number of metadata contexts during negotiation, and used the
> @@ -1778,8 +1788,8 @@ MUST initiate a hard disconnect.
>*length* MUST be 4 + (a positive integer multiple of 8).  This reply
>represents a series of consecutive block descriptors where the sum
>of the length fields within the descriptors is subject to further
> -  constraints documented below. This chunk type MUST appear
> -  exactly once per metadata ID in a structured reply.
> +  constraints documented below.  A successful block status request MUST
> +  have exactly one status chunk per negotiated metadata context ID.
> 
>The payload starts with:
> 
> @@ -1801,15 +1811,19 @@ MUST initiate a hard disconnect.
>*length* of the final extent MAY result in a sum larger than the
>original requested length, if the server has that information anyway
>as a side effect of reporting the status of the requested region.
> +  When multiple metadata contexts are negotiated, the reply chunks for
> +  the different contexts need not have the same number of extents or
> +  cumulative extent length.
> 
>Even if the client did not use the `NBD_CMD_FLAG_REQ_ONE` flag in
>its request, the server MAY return fewer descriptors in the reply
>than would be required to fully specify the whole range of requested
>information to the client, if looking up the information would be
>too resource-intensive for the server, so long as at least one
> -  extent is returned. Servers should however be aware that most
> -  clients implementations will then simply ask for the next extent
> -  instead.
> +  extent is returned.  Servers should however be aware that most
> +  client implementations will likely follow up with a request for
> +  extent information at the first offset not covered by a
> +  reduced-length reply.
> 
>  All error chunk types have bit 15 set, and begin with the same
>  *error*, *message length*, and optional *message* fields as
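(For reference, the chunk layout being constrained here, as a rough C
sketch; this is not from the patch itself, and all fields are big-endian
on the wire:

    struct nbd_block_status_chunk {
        uint32_t context_id;   /* metadata context ID from negotiation */
        struct {
            uint32_t length;   /* length of this extent, in bytes */
            uint32_t flags;    /* semantics defined by the context */
        } extents[];           /* one or more consecutive descriptors */
    };

With NBD_CMD_FLAG_REQ_ONE, each such chunk carries exactly one extent.)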

This seems clearer now, thanks.

Reviewed-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com

Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept

2022-04-04 Thread Paolo Bonzini
On 4/4/22 11:51, Emanuele Giuseppe Esposito wrote:
>>
>> I agree that it doesn't. This new lock is only protecting ->parents and
>> ->children. 
> Side note: it will also be used to protect other fields, like
> .aio_context I think. I haven't checked if there is something else we
> might want to protect that is currently protected by AioContext lock.
> 
> At least, I think we are going to use the same lock, right?

I have no idea honestly.  It can make sense for anything that is changed
very rarely and read during requests.

.aio_context has the .detach/.attach callbacks, and I wonder whether
there is any reason to access it outside those callbacks.  A lot of uses
of .aio_context (for example for aio_bh_new or
aio_bh_schedule_oneshot/replay_bh_schedule_oneshot_event) can, and
perhaps should, be changed to just qemu_get_current_aio_context().  For
multiqueue we probably want the same BlockDriverState to use the
AioContext corresponding to a virtio queue, rather than always the same one.
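For instance (an illustrative sketch, not a concrete patch):

    /* instead of scheduling on the BDS's stored AioContext ... */
    aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), cb, opaque);

    /* ... schedule on whatever context the caller is already running in */
    aio_bh_schedule_oneshot(qemu_get_current_aio_context(), cb, opaque);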

Paolo




Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept

2022-04-04 Thread Emanuele Giuseppe Esposito



On 04/04/2022 11:41, Paolo Bonzini wrote:
> 
> 
> On Mon, Apr 4, 2022 at 11:25 AM Stefan Hajnoczi wrote:
> 
> - The new API doesn't stop more I/O requests from being submitted, it
>   just blocks the current coroutine so request processing is deferred.
> 
> 
> New I/O requests would not complete until the write-side critical
> section ends. However they would still be accepted: from the point of
> view of the guest, the "consumed" index of the virtio ring would move
> forward, unlike bdrv_drained_begin/end().
> 
> - In other words, is_external is a flow control API whereas the new API
>   queues up request coroutines without notifying the caller.
> 
> 
> Yes, I think this is the same as what I wrote above.
> 
> - The new API still needs to be combined with bdrv_drained_begin/end()
>   to ensure in-flight requests are done.
> 
> 
> I don't think so, because in-flight requests would take the lock for
> reading. The write side would not start until those in-flight requests
> release the lock.
> 
> - It's not obvious to me whether the new API obsoletes is_external.
> I think it probably doesn't.
> 
> 
> I agree that it doesn't. This new lock is only protecting ->parents and
> ->children. 

Side note: it will also be used to protect other fields, like
.aio_context I think. I haven't checked if there is something else we
might want to protect that is currently protected by AioContext lock.

At least, I think we are going to use the same lock, right?

Emanuele

> bdrv_drained_begin()/end() remains necessary, for example,
> when you need to send a request during the drained section. An example
> is block_resize.
> 
> In addition, bdrv_drained_begin()/end() ensures that the callback of
> blk_aio_*() functions has been invoked (see commit 46aaf2a566,
> "block-backend: Decrease in_flight only after callback", 2018-09-25). 
> This new lock would not ensure that.
> 
> As an aside, instead of is_external, QEMU could remove/add the ioeventfd
> handler in the blk->dev_ops->drained_begin and blk->dev_ops->drained_end
> callbacks respectively. But that's just a code cleanup.
> 
> Paolo




Re: [PATCH v7 12/12] hw/acpi: Make the PCI hot-plug aware of SR-IOV

2022-04-04 Thread Łukasz Gieryk
On Thu, Mar 31, 2022 at 02:38:41PM +0200, Igor Mammedov wrote:
> it's unclear what's being hotplugged and unplugged; it would be better if
> you included the QEMU CLI and relevant qmp/monitor commands to reproduce it.

Qemu CLI:
---------
-device pcie-root-port,slot=0,id=rp0
-device nvme-subsys,id=subsys0
-device nvme,id=nvme0,bus=rp0,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,sriov_vq_flexible=2,sriov_vi_flexible=1

Guest OS:
---------
# nvme virt-mgmt flags: -c cntlid, -r resource type (0=VQ, 1=VI),
# -a action (1=set primary flexible allocation, 8=assign to secondary,
# 9=secondary online), -n resource count
sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0
sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0
echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
sleep 1
echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0
sleep 2
echo 0000:01:00.1 > /sys/bus/pci/drivers/nvme/bind

Qemu monitor:
-------------
device_del nvme0
 



Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept

2022-04-04 Thread Paolo Bonzini
On Mon, Apr 4, 2022 at 11:25 AM Stefan Hajnoczi  wrote:

> - The new API doesn't stop more I/O requests from being submitted, it
>   just blocks the current coroutine so request processing is deferred.
>

New I/O requests would not complete until the write-side critical section
ends. However they would still be accepted: from the point of view of the
guest, the "consumed" index of the virtio ring would move forward, unlike
bdrv_drained_begin/end().

> - In other words, is_external is a flow control API whereas the new API
>   queues up request coroutines without notifying the caller.
>

Yes, I think this is the same as what I wrote above.

> - The new API still needs to be combined with bdrv_drained_begin/end()
>   to ensure in-flight requests are done.
>

I don't think so, because in-flight requests would take the lock for
reading. The write side would not start until those in-flight requests
release the lock.
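In other words, the intended usage pattern would be roughly this (a
sketch; the names are assumed from the RFC, not a final API):

    /* I/O request coroutine: reader side */
    bdrv_graph_co_rdlock();
    /* ... safely traverse bs->parents / bs->children ... */
    bdrv_graph_co_rdunlock();

    /* graph modification in the main loop, under the BQL: writer side */
    bdrv_graph_wrlock();    /* blocks until all readers have unlocked */
    /* ... modify ->parents / ->children ... */
    bdrv_graph_wrunlock();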

> - It's not obvious to me whether the new API obsoletes is_external. I think
> it probably doesn't.
>

I agree that it doesn't. This new lock is only protecting ->parents and
->children. bdrv_drained_begin()/end() remains necessary, for example, when
you need to send a request during the drained section. An example is
block_resize.

In addition, bdrv_drained_begin()/end() ensures that the callback of
blk_aio_*() functions has been invoked (see commit 46aaf2a566,
"block-backend: Decrease in_flight only after callback", 2018-09-25).  This
new lock would not ensure that.

As an aside, instead of is_external, QEMU could remove/add the ioeventfd
handler in the blk->dev_ops->drained_begin and blk->dev_ops->drained_end
callbacks respectively. But that's just a code cleanup.

Paolo


Re: [PATCH v4 0/4] util/thread-pool: Expose minimum and maximum size

2022-04-04 Thread Stefan Hajnoczi
On Fri, Apr 01, 2022 at 11:35:20AM +0200, Nicolas Saenz Julienne wrote:
> As discussed on the previous RFC[1] the thread-pool's dynamic thread
> management doesn't play well with real-time and latency sensitive
> systems. This series introduces a set of controls that'll permit
> achieving more deterministic behaviours, for example by fixing the
> pool's size.
> 
> We first introduce a new common interface to event loop configuration by
> moving iothread's already available properties into an abstract class
> called 'EventLoopBackend' and have both 'IOThread' and the newly
> created 'MainLoop' inherit the properties from that class.
> 
> With this new configuration interface in place it's relatively simple to
> introduce new options to fix the event loop's thread pool sizes. The
> resulting QAPI looks like this:
> 
> -object main-loop,id=main-loop,thread-pool-min=1,thread-pool-max=1
> 
> Note that all patches are bisect friendly and pass all the tests.
> 
> [1] 
> https://patchwork.ozlabs.org/project/qemu-devel/patch/20220202175234.656711-1-nsaen...@redhat.com/
> 
> @Stefan I kept your Signed-off-by, since the changes are trivial/not
> thread-pool related.

Looks good to me. I will wait for Markus to review the QAPI schema changes.
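(Presumably the same knobs will work for an IOThread too, e.g.

    -object iothread,id=iothread0,thread-pool-min=4,thread-pool-max=4

since IOThread inherits from the new base class.)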

Stefan




Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept

2022-04-04 Thread Stefan Hajnoczi
On Fri, Apr 01, 2022 at 01:01:53PM +0200, Paolo Bonzini wrote:
> On 4/1/22 10:05, Emanuele Giuseppe Esposito wrote:
> > > The list itself would be used internally to implement the write-side
> > > lock and unlock primitives, but it would not be protected by the above
> > > functions.  So there would be a couple additional functions:
> > > 
> > >    bdrv_graph_list_lock <-> cpu_list_lock
> > >    bdrv_graph_list_unlock <-> cpu_list_unlock
> > 
> > The list would be graph_bdrv_states, why do we need to protect it with a
> > lock? Currently it is protected by BQL, and theoretically only
> > bdrv_graph_wrlock iterates on it. And as we defined in the assertion
> > below, wrlock is always in the main loop too.
> 
> You're right, CPU_FOREACH only appears in start_exclusive; so likewise you
> only need to walk the list in bdrv_graph_wrlock, i.e. only under BQL.
> 
> My thought was that, within the implementation, you'll need a mutex to
> protect has_waiter, and protecting the list with the same mutex made sense
> to me.  But indeed it's not necessary.

What is the relationship between this new API and aio_set_fd_handler()'s
is_external?

A few thoughts:

- The new API doesn't stop more I/O requests from being submitted, it
  just blocks the current coroutine so request processing is deferred.

- In other words, is_external is a flow control API whereas the new API
  queues up request coroutines without notifying the caller.

- The new API still needs to be combined with bdrv_drained_begin/end()
  to ensure in-flight requests are done.

- It's not obvious to me whether the new API obsoletes is_external. I
  think it probably doesn't.

Stefan




Re: [PATCH v4 0/4] util/thread-pool: Expose minimum and maximum size

2022-04-04 Thread Nicolas Saenz Julienne
On Fri, 2022-04-01 at 11:35 +0200, Nicolas Saenz Julienne wrote:

Subject says 0/4 where it should've been 0/3.

> As discussed on the previous RFC[1] the thread-pool's dynamic thread
> management doesn't play well with real-time and latency sensitive
> systems. This series introduces a set of controls that'll permit
> achieving more deterministic behaviours, for example by fixing the
> pool's size.
> 
> We first introduce a new common interface to event loop configuration by
> moving iothread's already available properties into an abstract class
> called 'EventLoopBackend' and have both 'IOThread' and the newly
> created 'MainLoop' inherit the properties from that class.
> 
> With this new configuration interface in place it's relatively simple to
> introduce new options to fix the even loop's thread pool sizes. The
> resulting QAPI looks like this:
> 
> -object main-loop,id=main-loop,thread-pool-min=1,thread-pool-max=1
> 
> Note that all patches are bisect friendly and pass all the tests.
> 
> [1] 
> https://patchwork.ozlabs.org/project/qemu-devel/patch/20220202175234.656711-1-nsaen...@redhat.com/
> 
> @Stefan I kept your Signed-off-by, since the changes are trivial/not
> thread-pool related.
> 
> ---
> Changes since v3:
>  - Avoid duplication in qom.json by creating EventLoopBaseProperties.
>  - Fix failures on first compilation due to race between
>event-loop-base.o and qapi header generation.
> 
> Changes since v2:
>  - Get rid of wrong locking/waiting
>  - Fix qapi versioning
>  - Better commit messages
> 
> Changes since v1:
>  - Address all Stefan's comments
>  - Introduce new fix
> 
> Nicolas Saenz Julienne (3):
>   Introduce event-loop-base abstract class
>   util/main-loop: Introduce the main loop into QOM
>   util/event-loop-base: Introduce options to set the thread pool size
> 
>  event-loop-base.c                | 140 +++
>  include/block/aio.h              |  10 +++
>  include/block/thread-pool.h      |   3 +
>  include/qemu/main-loop.h         |  10 +++
>  include/sysemu/event-loop-base.h |  41 +
>  include/sysemu/iothread.h        |   6 +-
>  iothread.c                       |  68 +--
>  meson.build                      |  26 +++---
>  qapi/qom.json                    |  40 +++--
>  util/aio-posix.c                 |   1 +
>  util/async.c                     |  20 +
>  util/main-loop.c                 |  65 ++
>  util/thread-pool.c               |  55 +++-
>  13 files changed, 416 insertions(+), 69 deletions(-)
>  create mode 100644 event-loop-base.c
>  create mode 100644 include/sysemu/event-loop-base.h
> 

-- 
Nicolás Sáenz