Re: [PATCH 00/10] vhost: stick to -errno error return convention

2021-11-12 Thread Roman Kagan
On Thu, Nov 11, 2021 at 03:14:56PM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2021 at 06:33:44PM +0300, Roman Kagan wrote:
> > Error propagation between the generic vhost code and the specific backends 
> > is
> > not quite consistent: some places follow "return -1 and set errno" 
> > convention,
> > while others assume "return negated errno".  Furthermore, not enough care is
> > taken not to clobber errno.
> > 
> > As a result, on certain code paths the errno resulting from a failure may 
> > get
> > overridden by another function call, and then that zero errno inidicating
> > success is propagated up the stack, leading to failures being lost.  In
> > particular, we've seen errors in the communication with a vhost-user-blk 
> > slave
> > not trigger an immediate connection drop and reconnection, leaving it in a
> > broken state.
> > 
> > Rework error propagation to always return negated errno on errors and
> > correctly pass it up the stack.
> 
> Looks like something we want post release. I'll tag it
> but pls ping me after the release to help make sure
> it's not lost.

It doesn't introduce new features so I guess it might qualify for rc0,
but the churn is somewhat too big indeed.

OK I'll reiterate once 6.2 is out; meanwhile if anyone has spare cycles
to review it, it'll be much appreciated.

Thanks,
Roman.

> 
> 
> > Roman Kagan (10):
> >   vhost-user-blk: reconnect on any error during realize
> >   chardev/char-socket: tcp_chr_recv: don't clobber errno
> >   chardev/char-socket: tcp_chr_sync_read: don't clobber errno
> >   chardev/char-fe: don't allow EAGAIN from blocking read
> >   vhost-backend: avoid overflow on memslots_limit
> >   vhost-backend: stick to -errno error return convention
> >   vhost-vdpa: stick to -errno error return convention
> >   vhost-user: stick to -errno error return convention
> >   vhost: stick to -errno error return convention
> >   vhost-user-blk: propagate error return from generic vhost
> > 
> >  chardev/char-fe.c |   7 +-
> >  chardev/char-socket.c |  17 +-
> >  hw/block/vhost-user-blk.c |   4 +-
> >  hw/virtio/vhost-backend.c |   4 +-
> >  hw/virtio/vhost-user.c| 401 +-
> >  hw/virtio/vhost-vdpa.c|  37 ++--
> >  hw/virtio/vhost.c |  98 +-
> >  7 files changed, 307 insertions(+), 261 deletions(-)
> > 
> > -- 
> > 2.33.1
> > 
> 



Re: [PATCH 04/10] chardev/char-fe: don't allow EAGAIN from blocking read

2021-11-12 Thread Marc-André Lureau
Hi

On Thu, Nov 11, 2021 at 7:44 PM Roman Kagan  wrote:

> As its name suggests, ChardevClass.chr_sync_read is supposed to do a
> blocking read.  The only implementation of it, tcp_chr_sync_read, does
> set the underlying io channel to the blocking mode indeed.
>
> Therefore a failure return with EAGAIN is not expected from this call.
>
> So do not retry it in qemu_chr_fe_read_all; instead place an assertion
> that it doesn't fail with EAGAIN.
>

The code was introduced in :
commit 7b0bfdf52d694c9a3a96505aa42ce3f8d63acd35
Author: Nikolay Nikolaev 
Date:   Tue May 27 15:03:48 2014 +0300

Add chardev API qemu_chr_fe_read_all

Also touched later by Daniel in:
commit 53628efbc8aa7a7ab5354d24b971f4d69452151d
Author: Daniel P. Berrangé 
Date:   Thu Mar 31 16:29:27 2016 +0100

char: fix broken EAGAIN retry on OS-X due to errno clobbering



> Signed-off-by: Roman Kagan 
> ---
>  chardev/char-fe.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/chardev/char-fe.c b/chardev/char-fe.c
> index 7789f7be9c..f94efe928e 100644
> --- a/chardev/char-fe.c
> +++ b/chardev/char-fe.c
> @@ -68,13 +68,10 @@ int qemu_chr_fe_read_all(CharBackend *be, uint8_t
> *buf, int len)
>  }
>
>  while (offset < len) {
> -retry:
>  res = CHARDEV_GET_CLASS(s)->chr_sync_read(s, buf + offset,
>len - offset);
> -if (res == -1 && errno == EAGAIN) {
> -g_usleep(100);
> -goto retry;
> -}
> +/* ->chr_sync_read should block */
> +assert(!(res < 0 && errno == EAGAIN));
>
>
While I agree with the rationale to clean this code a bit, I am not so sure
about replacing it with an assert(). In the past, when we did such things
we had unexpected regressions :)

A slightly better approach perhaps is g_warn_if_fail(), although it's not
very popular in qemu.



>  if (res == 0) {
>  break;
> --
> 2.33.1
>
>
>

-- 
Marc-André Lureau


Re: [PATCH 03/10] chardev/char-socket: tcp_chr_sync_read: don't clobber errno

2021-11-12 Thread Marc-André Lureau
On Thu, Nov 11, 2021 at 7:36 PM Roman Kagan  wrote:

> After the return from tcp_chr_recv, tcp_chr_sync_read calls into a
> function which eventually makes a system call and may clobber errno.
>
> Make a copy of errno right after tcp_chr_recv and restore the errno on
> return from tcp_chr_sync_read.
>
> Signed-off-by: Roman Kagan 
>

Reviewed-by: Marc-André Lureau 

---
>  chardev/char-socket.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 90054ce58c..cf7f2ba65a 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -581,6 +581,7 @@ static int tcp_chr_sync_read(Chardev *chr, const
> uint8_t *buf, int len)
>  {
>  SocketChardev *s = SOCKET_CHARDEV(chr);
>  int size;
> +int saved_errno;
>
>  if (s->state != TCP_CHARDEV_STATE_CONNECTED) {
>  return 0;
> @@ -588,6 +589,7 @@ static int tcp_chr_sync_read(Chardev *chr, const
> uint8_t *buf, int len)
>
>  qio_channel_set_blocking(s->ioc, true, NULL);
>  size = tcp_chr_recv(chr, (void *) buf, len);
> +saved_errno = errno;
>  if (s->state != TCP_CHARDEV_STATE_DISCONNECTED) {
>  qio_channel_set_blocking(s->ioc, false, NULL);
>  }
> @@ -596,6 +598,7 @@ static int tcp_chr_sync_read(Chardev *chr, const
> uint8_t *buf, int len)
>  tcp_chr_disconnect(chr);
>  }
>
> +errno = saved_errno;
>  return size;
>  }
>
> --
> 2.33.1
>
>
>

-- 
Marc-André Lureau


Re: [PATCH 02/10] chardev/char-socket: tcp_chr_recv: don't clobber errno

2021-11-12 Thread Marc-André Lureau
On Thu, Nov 11, 2021 at 7:38 PM Roman Kagan  wrote:

> tcp_chr_recv communicates the specific error condition to the caller via
> errno.  However, after setting it, it may call into some system calls or
> library functions which can clobber the errno.
>
> Avoid this by moving the errno assignment to the end of the function.
>
> Signed-off-by: Roman Kagan 
>

Reviewed-by: Marc-André Lureau 

---
>  chardev/char-socket.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 836cfa0bc2..90054ce58c 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -346,13 +346,6 @@ static ssize_t tcp_chr_recv(Chardev *chr, char *buf,
> size_t len)
>   NULL);
>  }
>
> -if (ret == QIO_CHANNEL_ERR_BLOCK) {
> -errno = EAGAIN;
> -ret = -1;
> -} else if (ret == -1) {
> -errno = EIO;
> -}
> -
>  if (msgfds_num) {
>  /* close and clean read_msgfds */
>  for (i = 0; i < s->read_msgfds_num; i++) {
> @@ -381,6 +374,13 @@ static ssize_t tcp_chr_recv(Chardev *chr, char *buf,
> size_t len)
>  #endif
>  }
>
> +if (ret == QIO_CHANNEL_ERR_BLOCK) {
> +errno = EAGAIN;
> +ret = -1;
> +} else if (ret == -1) {
> +errno = EIO;
> +}
> +
>  return ret;
>  }
>
> --
> 2.33.1
>
>
>

-- 
Marc-André Lureau


Re: [PATCH 05/10] vhost-backend: avoid overflow on memslots_limit

2021-11-12 Thread Daniel P . Berrangé
On Fri, Nov 12, 2021 at 10:46:46AM +0300, Roman Kagan wrote:
> On Thu, Nov 11, 2021 at 06:59:43PM +0100, Philippe Mathieu-Daudé wrote:
> > On 11/11/21 16:33, Roman Kagan wrote:
> > > Fix the (hypothetical) potential problem when the value parsed out of
> > > the vhost module parameter in sysfs overflows the return value from
> > > vhost_kernel_memslots_limit.
> > > 
> > > Signed-off-by: Roman Kagan 
> > > ---
> > >  hw/virtio/vhost-backend.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > > index b65f8f7e97..44f7dbb243 100644
> > > --- a/hw/virtio/vhost-backend.c
> > > +++ b/hw/virtio/vhost-backend.c
> > > @@ -58,7 +58,7 @@ static int vhost_kernel_memslots_limit(struct vhost_dev 
> > > *dev)
> > >  if 
> > > (g_file_get_contents("/sys/module/vhost/parameters/max_mem_regions",
> > >  &s, NULL, NULL)) {
> > >  uint64_t val = g_ascii_strtoull(s, NULL, 10);
> > 
> > Would using qemu_strtou64() simplify this?
> 
> I'm afraid not.  None of the existing strtoXX converting functions has
> the desired output range (0 < retval < INT_MAX), so the following
> condition will remain necessary anyway; then it doesn't seem to matter
> which particular parser is used to extract the value which is in the
> range, so I left the one that was already there to reduce churn.

If  qemu_strtou64() can't handle all values in (0 < retval < INT_MAX)
isn't that a bug in qemu_strtou64 ?


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v4 04/25] include/sysemu/block-backend: split header into I/O and global state (GS) API

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

Similarly to the previous patches, split block-backend.h
in block-backend-io.h and block-backend-global-state.h

In addition, remove "block/block.h" include as it seems
it is not necessary anymore, together with "qemu/iov.h"

block-backend-common.h contains the structures shared between
the two headers, and the functions that can't be categorized as
I/O or global state.

Assertions are added in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  block/block-backend.c   |   9 +-
  include/sysemu/block-backend-common.h   |  74 ++
  include/sysemu/block-backend-global-state.h | 122 +
  include/sysemu/block-backend-io.h   | 139 ++
  include/sysemu/block-backend.h  | 269 +---
  5 files changed, 344 insertions(+), 269 deletions(-)
  create mode 100644 include/sysemu/block-backend-common.h
  create mode 100644 include/sysemu/block-backend-global-state.h
  create mode 100644 include/sysemu/block-backend-io.h

diff --git a/block/block-backend.c b/block/block-backend.c
index 39cd99df2b..0afc03fd66 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -79,6 +79,7 @@ struct BlockBackend {
  bool allow_aio_context_change;
  bool allow_write_beyond_eof;
  
+/* Protected by BQL lock */

  NotifierList remove_bs_notifiers, insert_bs_notifiers;
  QLIST_HEAD(, BlockBackendAioNotifier) aio_notifiers;
  
@@ -111,12 +112,14 @@ static const AIOCBInfo block_backend_aiocb_info = {

  static void drive_info_del(DriveInfo *dinfo);
  static BlockBackend *bdrv_first_blk(BlockDriverState *bs);
  
-/* All BlockBackends */

+/* All BlockBackends. Protected by BQL lock. */
  static QTAILQ_HEAD(, BlockBackend) block_backends =
  QTAILQ_HEAD_INITIALIZER(block_backends);
  
-/* All BlockBackends referenced by the monitor and which are iterated through by

- * blk_next() */
+/*
+ * All BlockBackends referenced by the monitor and which are iterated through 
by
+ * blk_next(). Protected by BQL lock.
+ */
  static QTAILQ_HEAD(, BlockBackend) monitor_block_backends =
  QTAILQ_HEAD_INITIALIZER(monitor_block_backends);
  
diff --git a/include/sysemu/block-backend-common.h b/include/sysemu/block-backend-common.h

new file mode 100644
index 00..52ff6a4d26
--- /dev/null
+++ b/include/sysemu/block-backend-common.h
@@ -0,0 +1,74 @@
+/*
+ * QEMU Block backends
+ *
+ * Copyright (C) 2014-2016 Red Hat, Inc.
+ *
+ * Authors:
+ *  Markus Armbruster ,
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1
+ * or later.  See the COPYING.LIB file in the top-level directory.
+ */
+
+#ifndef BLOCK_BACKEND_COMMON_H
+#define BLOCK_BACKEND_COMMON_H
+
+#include "block/throttle-groups.h"
+
+/* Callbacks for block device models */
+typedef struct BlockDevOps {
+/*
+ * Runs when virtual media changed (monitor commands eject, change)
+ * Argument load is true on load and false on eject.
+ * Beware: doesn't run when a host device's physical media
+ * changes.  Sure would be useful if it did.
+ * Device models with removable media must implement this callback.
+ */
+void (*change_media_cb)(void *opaque, bool load, Error **errp);
+/*
+ * Runs when an eject request is issued from the monitor, the tray
+ * is closed, and the medium is locked.
+ * Device models that do not implement is_medium_locked will not need
+ * this callback.  Device models that can lock the medium or tray might
+ * want to implement the callback and unlock the tray when "force" is
+ * true, even if they do not support eject requests.
+ */
+void (*eject_request_cb)(void *opaque, bool force);
+/*
+ * Is the virtual tray open?
+ * Device models implement this only when the device has a tray.
+ */
+bool (*is_tray_open)(void *opaque);
+/*
+ * Is the virtual medium locked into the device?
+ * Device models implement this only when device has such a lock.
+ */
+bool (*is_medium_locked)(void *opaque);
+/*
+ * Runs when the size changed (e.g. monitor command block_resize)
+ */
+void (*resize_cb)(void *opaque);
+/*
+ * Runs when the backend receives a drain request.
+ */
+void (*drained_begin)(void *opaque);
+/*
+ * Runs when the backend's last drain request ends.
+ */
+void (*drained_end)(void *opaque);
+/*
+ * Is the device still busy?
+ */
+bool (*drained_poll)(void *opaque);
+} BlockDevOps;
+
+/*
+ * This struct is embedded in (the private) BlockBackend struct and contains
+ * fields that must be public. This is in particular for QLIST_ENTRY() and
+ * friends so that BlockBackends can be kept in lists outside block-backend.c
+ */
+typedef struct BlockBackendPublic {
+ThrottleGroupMember throttle_group_member;
+} BlockBackendPublic;
+
+#endif /* BLOCK_BACKEND_COMMON_H */
diff --git a/include/sy

Re: [PATCH v2 0/1] Jobs based on custom runners: add CentOS Stream 8

2021-11-12 Thread Alex Bennée


Cleber Rosa  writes:

> This adds a new custom runner, showing an example of how other
> entities can add their own custom jobs to the GitLab CI pipeline.
>

>
> One example of a job introduced here, running on the host reserved for
> this purpose can be seen at:
>
>  - https://gitlab.com/cleber.gnu/qemu/-/jobs/1773761640

As I'm not going to be able to test myself I'll just queue this and we
can see it spring into live once merged ;-)

Queued to for-6.2/misc-fixes, thanks.

-- 
Alex Bennée



Re: [PATCH v4 05/25] block/block-backend.c: assertions for block-backend

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

All the global state (GS) API functions will check that
qemu_in_main_thread() returns true. If not, it means
that the safety of BQL cannot be guaranteed, and
they need to be moved to I/O.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  block/block-backend.c  | 90 +-
  softmmu/qdev-monitor.c |  2 +
  2 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 0afc03fd66..ed45576007 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c


[...]


@@ -1550,6 +1596,7 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t 
offset,
  
  void blk_aio_cancel(BlockAIOCB *acb)

  {
+assert(qemu_in_main_thread());
  bdrv_aio_cancel(acb);
  }
  


This function is in block-backend-io.h, though.

[...]


@@ -1879,7 +1936,6 @@ void blk_invalidate_cache(BlockBackend *blk, Error **errp)
  bool blk_is_inserted(BlockBackend *blk)
  {
  BlockDriverState *bs = blk_bs(blk);
-
  return bs && bdrv_is_inserted(bs);
  }


Seems like an unrelated hunk.

[...]


@@ -2443,11 +2529,13 @@ int coroutine_fn blk_co_copy_range(BlockBackend 
*blk_in, int64_t off_in,


[…]


  int blk_make_empty(BlockBackend *blk, Error **errp)
  {
+assert(qemu_in_main_thread());
  if (!blk_is_available(blk)) {
  error_setg(errp, "No medium inserted");
  return -ENOMEDIUM;


This function too is in block-backend-io.h.

Hanna




Re: [PATCH 05/10] vhost-backend: avoid overflow on memslots_limit

2021-11-12 Thread Roman Kagan
On Fri, Nov 12, 2021 at 09:56:17AM +, Daniel P. Berrangé wrote:
> On Fri, Nov 12, 2021 at 10:46:46AM +0300, Roman Kagan wrote:
> > On Thu, Nov 11, 2021 at 06:59:43PM +0100, Philippe Mathieu-Daudé wrote:
> > > On 11/11/21 16:33, Roman Kagan wrote:
> > > > Fix the (hypothetical) potential problem when the value parsed out of
> > > > the vhost module parameter in sysfs overflows the return value from
> > > > vhost_kernel_memslots_limit.
> > > > 
> > > > Signed-off-by: Roman Kagan 
> > > > ---
> > > >  hw/virtio/vhost-backend.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > > > index b65f8f7e97..44f7dbb243 100644
> > > > --- a/hw/virtio/vhost-backend.c
> > > > +++ b/hw/virtio/vhost-backend.c
> > > > @@ -58,7 +58,7 @@ static int vhost_kernel_memslots_limit(struct 
> > > > vhost_dev *dev)
> > > >  if 
> > > > (g_file_get_contents("/sys/module/vhost/parameters/max_mem_regions",
> > > >  &s, NULL, NULL)) {
> > > >  uint64_t val = g_ascii_strtoull(s, NULL, 10);
> > > 
> > > Would using qemu_strtou64() simplify this?
> > 
> > I'm afraid not.  None of the existing strtoXX converting functions has
> > the desired output range (0 < retval < INT_MAX), so the following
> > condition will remain necessary anyway; then it doesn't seem to matter
> > which particular parser is used to extract the value which is in the
> > range, so I left the one that was already there to reduce churn.
> 
> If  qemu_strtou64() can't handle all values in (0 < retval < INT_MAX)
> isn't that a bug in qemu_strtou64 ?

I must have been unclear.  It sure can handle all values in this range;
the point is that the range check after it would still be needed, so
switching from g_ascii_strtoull to qemu_strtoXX saves nothing, therefore
I left it as it was.

Thanks,
Roman.



Re: [PATCH v4 03/25] assertions for block global state API

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

All the global state (GS) API functions will check that
qemu_in_main_thread() returns true. If not, it means
that the safety of BQL cannot be guaranteed, and
they need to be moved to I/O.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  block.c| 136 +++--
  block/commit.c |   2 +
  block/io.c |  20 
  blockdev.c |   1 +
  4 files changed, 156 insertions(+), 3 deletions(-)


bdrv_make_zero() seems missing here – it can be considered an I/O or a 
GS function, but patch 2 classified it as GS.


Hanna




Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize

2021-11-12 Thread Kevin Wolf
Am 12.11.2021 um 08:39 hat Roman Kagan geschrieben:
> On Thu, Nov 11, 2021 at 06:52:30PM +0100, Kevin Wolf wrote:
> > Am 11.11.2021 um 16:33 hat Roman Kagan geschrieben:
> > > vhost-user-blk realize only attempts to reconnect if the previous
> > > connection attempt failed on "a problem with the connection and not an
> > > error related to the content (which would fail again the same way in the
> > > next attempt)".
> > > 
> > > However this distinction is very subtle, and may be inadvertently broken
> > > if the code changes somewhere deep down the stack and a new error gets
> > > propagated up to here.
> > > 
> > > OTOH now that the number of reconnection attempts is limited it seems
> > > harmless to try reconnecting on any error.
> > > 
> > > So relax the condition of whether to retry connecting to check for any
> > > error.
> > > 
> > > This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
> > > during realize".
> > > 
> > > Signed-off-by: Roman Kagan 
> > 
> > It results in less than perfect error messages. With a modified export
> > that just crashes qemu-storage-daemon during get_features, I get:
> > 
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to read 
> > msg header. Read 0 instead of 12. Original request 1.
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting 
> > after error: vhost_backend_init failed: Protocol error
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting 
> > after error: Failed to connect to '/tmp/vsock': Connection refused
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting 
> > after error: Failed to connect to '/tmp/vsock': Connection refused
> > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to connect 
> > to '/tmp/vsock': Connection refused
> 
> This patch doesn't change any error messages.  Which ones specifically
> became less than perfect as a result of this patch?

But it adds error messages (for each retry), which are different from
the first error message. As I said this is not the end of the world, but
maybe a bit more confusing.

> > I guess this might be tolerable. On the other hand, the patch doesn't
> > really fix anything either, but just gets rid of possible subtleties.
> 
> The remaining patches in the series make other errors beside -EPROTO
> propagate up to this point, and some (most) of them are retryable.  This
> was the reason to include this patch at the beginning of the series (I
> guess I should've mentioned that in the patch log).

I see. I hadn't looked at the rest of the series yet because I ran out
of time, but now that I'm skimming them, I see quite a few places that
use non-EPROTO, but I wonder which of them actually should be
reconnected. So far all I saw were presumably persistent errors where a
retry won't help. Can you give me some examples?

Kevin




Re: [PATCH v2 05/10] block: Pass BdrvChild ** to replace_child_noperm

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

11.11.2021 15:08, Hanna Reitz wrote:

bdrv_replace_child_noperm() modifies BdrvChild.bs, and can potentially
set it to NULL.  That is dangerous, because BDS parents generally assume
that their children's .bs pointer is never NULL.  We therefore want to
let bdrv_replace_child_noperm() set the corresponding BdrvChild pointer
to NULL, too.

This patch lays the foundation for it by passing a BdrvChild ** pointer
to bdrv_replace_child_noperm() so that it can later use it to NULL the
BdrvChild pointer immediately after setting BdrvChild.bs to NULL.

(We will still need to undertake some intermediate steps, though.)

Signed-off-by: Hanna Reitz 


Series already applied, but I still feel myself responsible to track how 
transactions changed:)

Don't bother with applying my r-b marks into applied series.


---
  block.c | 23 ---
  1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index c7d5aa5254..d668156eca 100644
--- a/block.c
+++ b/block.c
@@ -87,7 +87,7 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
  static bool bdrv_recurse_has_child(BlockDriverState *bs,
 BlockDriverState *child);
  
-static void bdrv_replace_child_noperm(BdrvChild *child,

+static void bdrv_replace_child_noperm(BdrvChild **child,
BlockDriverState *new_bs);
  static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
BdrvChild *child,
@@ -2270,7 +2270,7 @@ static void bdrv_replace_child_abort(void *opaque)
  BlockDriverState *new_bs = s->child->bs;
  
  /* old_bs reference is transparently moved from @s to @s->child */

-bdrv_replace_child_noperm(s->child, s->old_bs);
+bdrv_replace_child_noperm(&s->child, s->old_bs);


 - no sense / no harm in  clearing the pointer, as it's a field in transaction 
state struct, and should not be used after abort
 - hard to say do we really need clearing some another pointer, upper level 
should care about it


  bdrv_unref(new_bs);
  }
  
@@ -2300,7 +2300,7 @@ static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,

  if (new_bs) {
  bdrv_ref(new_bs);
  }
-bdrv_replace_child_noperm(child, new_bs);
+bdrv_replace_child_noperm(&child, new_bs);


 - no sence / no harm, as it's a local variable, which is not used anymore
 - most probably we have some another pointer that should be cleared, but it's 
not available here.. To make it available, bdrv_replace_child_tran() should get 
BdrvChild **.. maybe later patch will do it


  /* old_bs reference is transparently moved from @child to @s */
  }
  
@@ -2672,9 +2672,10 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)

  return permissions[qapi_perm];
  }
  
-static void bdrv_replace_child_noperm(BdrvChild *child,

+static void bdrv_replace_child_noperm(BdrvChild **childp,
BlockDriverState *new_bs)
  {
+BdrvChild *child = *childp;


No real logic change for now, OK


  BlockDriverState *old_bs = child->bs;
  int new_bs_quiesce_counter;
  int drain_saldo;
@@ -2767,7 +2768,7 @@ static void bdrv_attach_child_common_abort(void *opaque)
  BdrvChild *child = *s->child;
  BlockDriverState *bs = child->bs;
  
-bdrv_replace_child_noperm(child, NULL);

+bdrv_replace_child_noperm(s->child, NULL);


More interesting. Currently bdrv_replace_child_tran() clear the pointer as last action, so 
later we can remove this last "*s->child = NULL" as bdrv_replace_child_noperm() 
will do it.
No harm: in the the function we use local variable, initialized as *s->child.

  
  if (bdrv_get_aio_context(bs) != s->old_child_ctx) {

  bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
@@ -2867,7 +2868,7 @@ static int bdrv_attach_child_common(BlockDriverState 
*child_bs,
  }
  
  bdrv_ref(child_bs);

-bdrv_replace_child_noperm(new_child, child_bs);
+bdrv_replace_child_noperm(&new_child, child_bs);


Here child_bs must not be NULL, otherwise bdrv_ref() crashes. So, nothing would 
be cleared.

  
  *child = new_child;
  
@@ -2922,12 +2923,12 @@ static int bdrv_attach_child_noperm(BlockDriverState *parent_bs,

  return 0;
  }
  
-static void bdrv_detach_child(BdrvChild *child)

+static void bdrv_detach_child(BdrvChild **childp)
  {
-BlockDriverState *old_bs = child->bs;
+BlockDriverState *old_bs = (*childp)->bs;
  
-bdrv_replace_child_noperm(child, NULL);

-bdrv_child_free(child);
+bdrv_replace_child_noperm(childp, NULL);


And here for sure we'll clear the pointer


+bdrv_child_free(*childp);


This obviously should be changed in further patches

  
  if (old_bs) {

  /*
@@ -3033,7 +3034,7 @@ void bdrv_root_unref_child(BdrvChild *child)
  BlockDriverState *child_bs;
  
  child_bs = child->bs;

-bdrv_detach_child(child);
+bdrv_detach_chil

Re: [PATCH v2 06/10] block: Restructure remove_file_or_backing_child()

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

11.11.2021 15:08, Hanna Reitz wrote:

As of a future patch, bdrv_replace_child_tran() will take a BdrvChild **
pointer.  Prepare for that by getting such a pointer and using it where
applicable, and (dereferenced) as a parameter for
bdrv_replace_child_tran().

Signed-off-by: Hanna Reitz


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH v4 06/25] include/block/block_int: split header into I/O and global state API

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

Similarly to the previous patch, split block_int.h
in block_int-io.h and block_int-global-state.h

block_int-common.h contains the structures shared between
the two headers, and the functions that can't be categorized as
I/O or global state.

Assertions are added in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  blockdev.c |5 +
  include/block/block_int-common.h   | 1164 +++
  include/block/block_int-global-state.h |  319 +
  include/block/block_int-io.h   |  163 +++
  include/block/block_int.h  | 1478 +---
  5 files changed, 1654 insertions(+), 1475 deletions(-)
  create mode 100644 include/block/block_int-common.h
  create mode 100644 include/block/block_int-global-state.h
  create mode 100644 include/block/block_int-io.h


[...]


diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
new file mode 100644
index 00..79a3d801d2
--- /dev/null
+++ b/include/block/block_int-common.h


[...]


+struct BlockDriver {


[...]


+/**
+ * Try to get @bs's logical and physical block size.
+ * On success, store them in @bsz and return zero.
+ * On failure, return negative errno.
+ */
+/* I/O API, even though if it's a filter jumps on parent */


I don’t understand this...


+int (*bdrv_probe_blocksizes)(BlockDriverState *bs, BlockSizes *bsz);
+/**
+ * Try to get @bs's geometry (cyls, heads, sectors)
+ * On success, store them in @geo and return 0.
+ * On failure return -errno.
+ * Only drivers that want to override guest geometry implement this
+ * callback; see hd_geometry_guess().
+ */
+/* I/O API, even though if it's a filter jumps on parent */


...or this comment.  bdrv_probe_blocksizes() and bdrv_probe_geometry() 
are in block-global-state.h, so why are these methods part of the I/O 
API?  (And I’m afraid I can’t parse “even though if it’s a filter jumps 
on parent”.)


Hanna


+int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);





Re: [PATCH v2 07/10] transactions: Invoke clean() after everything else

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

11.11.2021 15:08, Hanna Reitz wrote:

Invoke the transaction drivers' .clean() methods only after all
.commit() or .abort() handlers are done.

This makes it easier to have nested transactions where the top-level
transactions pass objects to lower transactions that the latter can
still use throughout their commit/abort phases, while the top-level
transaction keeps a reference that is released in its .clean() method.

(Before this commit, that is also possible, but the top-level
transaction would need to take care to invoke tran_add() before the
lower-level transaction does.  This commit makes the ordering
irrelevant, which is just a bit nicer.)

Signed-off-by: Hanna Reitz


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH v4 02/25] include/block/block: split header into I/O and global state API

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

block.h currently contains a mix of functions:
some of them run under the BQL and modify the block layer graph,
others are instead thread-safe and perform I/O in iothreads.
It is not easy to understand which function is part of which
group (I/O vs GS), and this patch aims to clarify it.

The "GS" functions need the BQL, and often use
aio_context_acquire/release and/or drain to be sure they
can modify the graph safely.
The I/O function are instead thread safe, and can run in
any AioContext.

By splitting the header in two files, block-io.h
and block-global-state.h we have a clearer view on what
needs what kind of protection. block-common.h
contains common structures shared by both headers.

block.h is left there for legacy and to avoid changing
all includes in all c files that use the block APIs.

Assertions are added in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  block.c|   3 +
  block/meson.build  |   7 +-
  include/block/block-common.h   | 389 +
  include/block/block-global-state.h | 286 ++
  include/block/block-io.h   | 306 ++
  include/block/block.h  | 878 +
  6 files changed, 1012 insertions(+), 857 deletions(-)
  create mode 100644 include/block/block-common.h
  create mode 100644 include/block/block-global-state.h
  create mode 100644 include/block/block-io.h


[...]


diff --git a/include/block/block-common.h b/include/block/block-common.h
new file mode 100644
index 00..4f1fd8de21
--- /dev/null
+++ b/include/block/block-common.h


[...]


+#define BLKDBG_EVENT(child, evt) \
+do { \
+if (child) { \
+bdrv_debug_event(child->bs, evt); \
+} \
+} while (0)


This is defined twice, once here, and...


diff --git a/include/block/block-io.h b/include/block/block-io.h
new file mode 100644
index 00..9af4609ccb
--- /dev/null
+++ b/include/block/block-io.h


[...]


+#define BLKDBG_EVENT(child, evt) \
+do { \
+if (child) { \
+bdrv_debug_event(child->bs, evt); \
+} \
+} while (0)


...once here.

[...]


+/**
+ * bdrv_drained_begin:
+ *
+ * Begin a quiesced section for exclusive access to the BDS, by disabling
+ * external request sources including NBD server and device model. Note that
+ * this doesn't block timers or coroutines from submitting more requests, which
+ * means block_job_pause is still necessary.


Where does this sentence come from?  I can’t see it in master or in the 
lines removed from block.h:



+ *
+ * This function can be recursive.
+ */
+void bdrv_drained_begin(BlockDriverState *bs);


[...]


diff --git a/include/block/block.h b/include/block/block.h
index e5dd22b034..1e6b8fef1e 100644
--- a/include/block/block.h
+++ b/include/block/block.h


[...]


-/**
- * bdrv_drained_begin:
- *
- * Begin a quiesced section for exclusive access to the BDS, by disabling
- * external request sources including NBD server, block jobs, and device model.
- *
- * This function can be recursive.
- */
-void bdrv_drained_begin(BlockDriverState *bs);





Re: [PATCH v4 04/25] include/sysemu/block-backend: split header into I/O and global state (GS) API

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

Similarly to the previous patches, split block-backend.h
in block-backend-io.h and block-backend-global-state.h

In addition, remove "block/block.h" include as it seems
it is not necessary anymore, together with "qemu/iov.h"

block-backend-common.h contains the structures shared between
the two headers, and the functions that can't be categorized as
I/O or global state.

Assertions are added in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  block/block-backend.c   |   9 +-
  include/sysemu/block-backend-common.h   |  74 ++
  include/sysemu/block-backend-global-state.h | 122 +
  include/sysemu/block-backend-io.h   | 139 ++
  include/sysemu/block-backend.h  | 269 +---
  5 files changed, 344 insertions(+), 269 deletions(-)
  create mode 100644 include/sysemu/block-backend-common.h
  create mode 100644 include/sysemu/block-backend-global-state.h
  create mode 100644 include/sysemu/block-backend-io.h


[...]


diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index e5e1524f06..038be9fc40 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -13,272 +13,9 @@
  #ifndef BLOCK_BACKEND_H
  #define BLOCK_BACKEND_H
  
-#include "qemu/iov.h"

-#include "block/throttle-groups.h"
+#include "block-backend-global-state.h"
+#include "block-backend-io.h"
  
-/*

- * TODO Have to include block/block.h for a bunch of block layer
- * types.  Unfortunately, this pulls in the whole BlockDriverState
- * API, which we don't want used by many BlockBackend users.  Some of
- * the types belong here, and the rest should be split into a common
- * header and one for the BlockDriverState API.
- */
-#include "block/block.h"


This note and the include is gone.  Sounds like something positive, but 
why is this possible?


Hanna




Re: does drive_get_next(IF_NONE) make sense?

2021-11-12 Thread Markus Armbruster
Thomas Huth  writes:

> On 03/11/2021 09.41, Markus Armbruster wrote:
>> Peter Maydell  writes:
>> 
>>> Does it make sense for a device/board to do drive_get_next(IF_NONE) ?
>> Short answer: hell, no!  ;)
>
> Would it make sense to add an "assert(type != IF_NONE)" to drive_get()
> to avoid such mistakes in the future?

Worth a try.




Re: [PATCH v4 07/25] assertions for block_int global state API

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  block.c | 17 +
  block/backup.c  |  1 +
  block/block-backend.c   |  3 +++
  block/commit.c  |  2 ++
  block/dirty-bitmap.c|  1 +
  block/io.c  |  6 ++
  block/mirror.c  |  4 
  block/monitor/bitmap-qmp-cmds.c |  6 ++
  block/stream.c  |  2 ++
  blockdev.c  |  7 +++
  10 files changed, 49 insertions(+)

diff --git a/block.c b/block.c
index 672f946065..41c5883c5c 100644
--- a/block.c
+++ b/block.c


[...]


@@ -7473,6 +7488,7 @@ static bool append_strong_runtime_options(QDict *d, 
BlockDriverState *bs)
   * would result in exactly bs->backing. */
  bool bdrv_backing_overridden(BlockDriverState *bs)
  {
+assert(qemu_in_main_thread());
  if (bs->backing) {
  return strcmp(bs->auto_backing_file,
bs->backing->bs->filename);


This function is in block_int-common.h, though.

[...]


diff --git a/block/io.c b/block/io.c
index c5d7f8495e..f271ab3684 100644
--- a/block/io.c
+++ b/block/io.c


[...]


@@ -3419,6 +3423,7 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, 
int64_t src_offset,
  {
  trace_bdrv_co_copy_range_from(src, src_offset, dst, dst_offset, bytes,
read_flags, write_flags);
+assert(qemu_in_main_thread());
  return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
 bytes, read_flags, write_flags, true);
  }


This is a block_int-io.h function.


@@ -3435,6 +3440,7 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, 
int64_t src_offset,
  {
  trace_bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes,
  read_flags, write_flags);
+assert(qemu_in_main_thread());
  return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
 bytes, read_flags, write_flags, false);
  }


This, too.

Hanna




Re: [PATCH v4 08/25] block: introduce assert_bdrv_graph_writable

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

We want to be sure that the functions that write the child and
parent list of a bs are under BQL and drain.

BQL prevents from concurrent writings from the GS API, while
drains protect from I/O.

TODO: drains are missing in some functions using this assert.
Therefore a proper assertion will fail. Because adding drains
requires additional discussions, they will be added in future
series.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Stefan Hajnoczi 
---
  block.c|  5 +
  block/io.c | 11 +++
  include/block/block_int-global-state.h | 10 +-
  3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 41c5883c5c..94bff5c757 100644
--- a/block.c
+++ b/block.c
@@ -2734,12 +2734,14 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
  if (child->klass->detach) {
  child->klass->detach(child);
  }
+assert_bdrv_graph_writable(old_bs);
  QLIST_REMOVE(child, next_parent);


I think this belongs above the .detach() call (and the QLIST_REMOVE() 
belongs into the .detach() implementation, as done in 
https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00240.html, 
which has been merged to Kevin’s block branch).



  }
  
  child->bs = new_bs;
  
  if (new_bs) {

+assert_bdrv_graph_writable(new_bs);
  QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);


In both these places it’s a bit strange that the assertion is done on 
the child nodes.  The subgraph starting from them isn’t modified after 
all, so their subgraph technically doesn’t need to be writable.  I think 
a single assertion on the parent node would be preferable.


I presume the problem with that is that we don’t have the parent node 
here?  Do we need a new BdrvChildClass method that performs this 
assertion on the parent node?


  
  /*

@@ -2940,6 +2942,7 @@ static int bdrv_attach_child_noperm(BlockDriverState 
*parent_bs,
  return ret;
  }
  
+assert_bdrv_graph_writable(parent_bs);

  QLIST_INSERT_HEAD(&parent_bs->children, *child, next);
  /*
   * child is removed in bdrv_attach_child_common_abort(), so don't care to
@@ -3140,6 +3143,7 @@ static void bdrv_unset_inherits_from(BlockDriverState 
*root, BdrvChild *child,
  void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
  {
  assert(qemu_in_main_thread());
+assert_bdrv_graph_writable(parent);


It looks to me like we have this assertion mainly because 
bdrv_replace_child_noperm() doesn’t have a pointer to this parent node.  
It’s a workaround, but we should have this in every path that eventually 
ends up at bdrv_replace_child_noperm(), and that seems rather difficult 
for the bdrv_replace_node() family of functions. That to me sounds like 
it’d be good to have this as a BdrvChildClass function.



  if (child == NULL) {
  return;
  }
@@ -4903,6 +4907,7 @@ static void bdrv_remove_filter_or_cow_child_abort(void 
*opaque)
  BdrvRemoveFilterOrCowChild *s = opaque;
  BlockDriverState *parent_bs = s->child->opaque;
  
+assert_bdrv_graph_writable(parent_bs);

  QLIST_INSERT_HEAD(&parent_bs->children, s->child, next);
  if (s->is_backing) {
  parent_bs->backing = s->child;
diff --git a/block/io.c b/block/io.c
index f271ab3684..1c71e354d6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -740,6 +740,17 @@ void bdrv_drain_all(void)
  bdrv_drain_all_end();
  }
  
+void assert_bdrv_graph_writable(BlockDriverState *bs)

+{
+/*
+ * TODO: this function is incomplete. Because the users of this
+ * assert lack the necessary drains, check only for BQL.
+ * Once the necessary drains are added,
+ * assert also for qatomic_read(&bs->quiesce_counter) > 0
+ */
+assert(qemu_in_main_thread());
+}
+
  /**
   * Remove an active request from the tracked requests list
   *
diff --git a/include/block/block_int-global-state.h 
b/include/block/block_int-global-state.h
index d08e80222c..6bd7746409 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -316,4 +316,12 @@ void bdrv_remove_aio_context_notifier(BlockDriverState *bs,
   */
  void bdrv_drain_all_end_quiesce(BlockDriverState *bs);
  
-#endif /* BLOCK_INT_GLOBAL_STATE*/

+/**
+ * Make sure that the function is either running under
+ * drain and BQL. The latter protects from concurrent writings


“either ... and” sounds wrong to me.  I’d drop the “either” or say 
“running under both drain and BQL”.


Hanna


+ * from the GS API, while the former prevents concurrent reads
+ * from I/O.
+ */
+void assert_bdrv_graph_writable(BlockDriverState *bs);
+
+#endif /* BLOCK_INT_GLOBAL_STATE */





Re: [PATCH v4 10/25] assertions for blockjob_int.h

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  blockjob.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 4bad1408cb..fbd6c7d873 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -83,6 +83,7 @@ BlockJob *block_job_get(const char *id)
  
  void block_job_free(Job *job)

  {
+assert(qemu_in_main_thread());
  BlockJob *bjob = container_of(job, BlockJob, job);


Our coding style (docs/devel/style.rst) requires all statements to come 
after all declarations in a block, so the assert() may not precede the 
bjob declaration.


  
  block_job_remove_all_bdrv(bjob);

@@ -436,6 +437,8 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
  BlockBackend *blk;
  BlockJob *job;
  
+assert(qemu_in_main_thread());

+
  if (job_id == NULL && !(flags & JOB_INTERNAL)) {
  job_id = bdrv_get_device_name(bs);
  }
@@ -504,6 +507,7 @@ void block_job_iostatus_reset(BlockJob *job)
  
  void block_job_user_resume(Job *job)

  {
+assert(qemu_in_main_thread());
  BlockJob *bjob = container_of(job, BlockJob, job);


Same here.

(And now I see that I’ve missed such instances in the other assertion 
patches, like in bdrv_save_vmstate(), those should be fixed, too)


Hanna


  block_job_iostatus_reset(bjob);
  }





Re: [PATCH v4 12/25] assertions for blockob.h global state API

2021-11-12 Thread Hanna Reitz

Subject: s/blockob.h/blockjob.h/

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  blockjob.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index fbd6c7d873..4982f6a2b5 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -61,6 +61,7 @@ static bool is_block_job(Job *job)
  
  BlockJob *block_job_next(BlockJob *bjob)

  {
+assert(qemu_in_main_thread());
  Job *job = bjob ? &bjob->job : NULL;


Here and...


  do {
@@ -72,6 +73,7 @@ BlockJob *block_job_next(BlockJob *bjob)
  
  BlockJob *block_job_get(const char *id)

  {
+assert(qemu_in_main_thread());
  Job *job = job_get(id);


...here, the assertion and declaration should be switched.

Hanna




Re: [PATCH v2 08/10] block: Let replace_child_tran keep indirect pointer

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

11.11.2021 15:08, Hanna Reitz wrote:

As of a future commit, bdrv_replace_child_noperm() will clear the
indirect BdrvChild pointer passed to it if the new child BDS is NULL.
bdrv_replace_child_tran() will want to let it do that, but revert this
change in its abort handler.  For that, we need to have it receive a
BdrvChild ** pointer, too, and keep it stored in the
BdrvReplaceChildState object that we attach to the transaction.

Note that we do not need to store it in the BdrvReplaceChildState when
new_bs is not NULL, because then there is nothing to revert.  This is
important so that bdrv_replace_node_noperm() can pass a pointer to a
loop-local variable to bdrv_replace_child_tran() without worrying that
this pointer will outlive one loop iteration.

(Of course, for that to work, bdrv_replace_node_noperm() and in turn
bdrv_replace_node() and its relatives may not be called with a NULL @to
node.  Luckily, they already are not, but now we should assert this.)

bdrv_remove_file_or_backing_child() on the other hand needs to ensure
that the indirect pointer it passes will stay valid for the duration of
the transaction.  Ensure this by keeping a strong reference to the BDS
whose &bs->backing or &bs->file it passes to bdrv_replace_child_tran(),
and giving up that reference only in the transaction .clean() handler.

Signed-off-by: Hanna Reitz


Reviewed-by: Vladimir Sementsov-Ogievskiy 


--
Best regards,
Vladimir



Re: [PATCH v4 13/25] include/sysemu/blockdev.h: move drive_add and inline drive_def

2021-11-12 Thread Hanna Reitz

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

drive_add is only used in softmmu/vl.c, so it can be a static
function there,and drive_def is only a particular use case of
qemu_opts_parse_noisily, so it can be inlined.

Also remove drive_mark_claimed_by_board, as it is only defined
but not implemented (nor used) anywhere.

This also helps simplifying next patch.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Stefan Hajnoczi 
---
  block/monitor/block-hmp-cmds.c |  2 +-
  blockdev.c | 27 +--
  include/sysemu/blockdev.h  |  6 ++
  softmmu/vl.c   | 25 -
  4 files changed, 28 insertions(+), 32 deletions(-)


[...]


diff --git a/blockdev.c b/blockdev.c
index c1f6171c6c..1bf49ef610 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -73,7 +73,7 @@ void bdrv_set_monitor_owned(BlockDriverState *bs)
  QTAILQ_INSERT_TAIL(&monitor_bdrv_states, bs, monitor_list);
  }
  
-static const char *const if_name[IF_COUNT] = {

+const char *const if_name[IF_COUNT] = {


When making this global, I’d give its name a prefix, like 
`block_if_name` (or even `block_if_type_to_name`).


Hanna


  [IF_NONE] = "none",
  [IF_IDE] = "ide",
  [IF_SCSI] = "scsi",





Re: [PATCH v4 14/25] include/systemu/blockdev.h: global state API

2021-11-12 Thread Hanna Reitz

Subject: s/systemu/sysemu/

On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

blockdev functions run always under the BQL lock.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  include/sysemu/blockdev.h | 18 ++
  1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index 960b54d320..b07f15df09 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -13,9 +13,6 @@
  #include "block/block.h"
  #include "qemu/queue.h"
  
-void blockdev_mark_auto_del(BlockBackend *blk);

-void blockdev_auto_del(BlockBackend *blk);
-
  typedef enum {
  IF_DEFAULT = -1,/* for use with drive_add() only */
  /*
@@ -40,6 +37,16 @@ struct DriveInfo {
  QTAILQ_ENTRY(DriveInfo) next;
  };
  
+/*

+ * Global state (GS) API. These functions run under the BQL lock.
+ *
+ * See include/block/block-global-state.h for more information about
+ * the GS API.
+ */
+
+void blockdev_mark_auto_del(BlockBackend *blk);
+void blockdev_auto_del(BlockBackend *blk);
+
  DriveInfo *blk_legacy_dinfo(BlockBackend *blk);
  DriveInfo *blk_set_legacy_dinfo(BlockBackend *blk, DriveInfo *dinfo);
  BlockBackend *blk_by_legacy_dinfo(DriveInfo *dinfo);
@@ -50,10 +57,13 @@ DriveInfo *drive_get(BlockInterfaceType type, int bus, int 
unit);
  void drive_check_orphaned(void);
  DriveInfo *drive_get_by_index(BlockInterfaceType type, int index);
  int drive_get_max_bus(BlockInterfaceType type);
-int drive_get_max_devs(BlockInterfaceType type);
  DriveInfo *drive_get_next(BlockInterfaceType type);
  
  DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type,

   Error **errp);
  
+/* Common functions that are neither I/O nor Global State */

+
+int drive_get_max_devs(BlockInterfaceType type);
+


It seems to me like this function is never used and could just be 
dropped.  In any case, if it were used, it looks to me like it’d be used 
in a GS context.  (Not that I know anything about it, but I don’t see 
what makes it different from the other functions here.)


Hanna




Re: [PATCH v2 09/10] block: Let replace_child_noperm free children

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

11.11.2021 15:08, Hanna Reitz wrote:

In most of the block layer, especially when traversing down from other
BlockDriverStates, we assume that BdrvChild.bs can never be NULL.  When
it becomes NULL, it is expected that the corresponding BdrvChild pointer
also becomes NULL and the BdrvChild object is freed.

Therefore, once bdrv_replace_child_noperm() sets the BdrvChild.bs
pointer to NULL, it should also immediately set the corresponding
BdrvChild pointer (like bs->file or bs->backing) to NULL.

In that context, it also makes sense for this function to free the
child.  Sometimes we cannot do so, though, because it is called in a
transactional context where the caller might still want to reinstate the
child in the abort branch (and free it only on commit), so this behavior
has to remain optional.

In bdrv_replace_child_tran()'s abort handler, we now rely on the fact
that the BdrvChild passed to bdrv_replace_child_tran() must have had a
non-NULL .bs pointer initially.  Make a note of that and assert it.

Signed-off-by: Hanna Reitz 
---
  block.c | 102 +++-
  1 file changed, 79 insertions(+), 23 deletions(-)

diff --git a/block.c b/block.c
index a40027161c..0ac5b163d2 100644
--- a/block.c
+++ b/block.c
@@ -87,8 +87,10 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
  static bool bdrv_recurse_has_child(BlockDriverState *bs,
 BlockDriverState *child);
  
+static void bdrv_child_free(BdrvChild *child);

  static void bdrv_replace_child_noperm(BdrvChild **child,
-  BlockDriverState *new_bs);
+  BlockDriverState *new_bs,
+  bool free_empty_child);
  static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
BdrvChild *child,
Transaction *tran);
@@ -2256,12 +2258,16 @@ typedef struct BdrvReplaceChildState {
  BdrvChild *child;
  BdrvChild **childp;
  BlockDriverState *old_bs;
+bool free_empty_child;
  } BdrvReplaceChildState;
  
  static void bdrv_replace_child_commit(void *opaque)

  {
  BdrvReplaceChildState *s = opaque;
  
+if (s->free_empty_child && !s->child->bs) {

+bdrv_child_free(s->child);
+}
  bdrv_unref(s->old_bs);
  }
  
@@ -2278,22 +2284,26 @@ static void bdrv_replace_child_abort(void *opaque)

   * modify the BdrvChild * pointer we indirectly pass to it, i.e. it
   * will not modify s->child.  From that perspective, it does not 
matter
   * whether we pass s->childp or &s->child.
- * (TODO: Right now, bdrv_replace_child_noperm() never modifies that
- * pointer anyway (though it will in the future), so at this point it
- * absolutely does not matter whether we pass s->childp or &s->child.)
   * (2) If new_bs is not NULL, s->childp will be NULL.  We then cannot use
   * it here.
   * (3) If new_bs is NULL, *s->childp will have been NULLed by
   * bdrv_replace_child_tran()'s bdrv_replace_child_noperm() call, and 
we
   * must not pass a NULL *s->childp here.
- * (TODO: In its current state, bdrv_replace_child_noperm() will not
- * have NULLed *s->childp, so this does not apply yet.  It will in the
- * future.)


What I don't like about this patch is that it does two different things: 
zeroing the pointer and clearing the object. And if we look at the latter in 
separate, it seems that it's not needed:

Look: bdrv_replace_child_tran(): new parameter is set to true in two places, in 
both of them we are sure (and do assertion and comment) that new bs is not NULL 
and nothing will be freed.

Similarly, bdrv_replace_child_noperm() is called with true in two places where 
we sure that new bs is not NULL.

and only one place where new parameter set to true really do something:


@@ -2960,8 +3013,7 @@ static void bdrv_detach_child(BdrvChild **childp)
  {
  BlockDriverState *old_bs = (*childp)->bs;
  
-bdrv_replace_child_noperm(childp, NULL);

-bdrv_child_free(*childp);
+bdrv_replace_child_noperm(childp, NULL, true);
  
  if (old_bs) {

  /*


And it doesn't worth the whole complexity of new parameters for two functions.

In this place we can simply do something like

BdrvChild *child = *childp;

bdrv_replace_child_noperm(childp, NULL);

bdrv_child_free(child);


I understand the idea: it seems good and intuitive to do zeroing the pointer and clearing 
the object in one shot. But this patch itself shows that we just can't do it in 90% of 
cases. So, I think better is not do it and live with only "zeroing the pointer" 
part of this patch.





Another idea that come to my mind while reviewing this series: did you consider zeroing 
bs->file / bs->backing in .detach, like you do with bs->children list at start of 
the series?  We c

Re: [PATCH v2 10/10] iotests/030: Unthrottle parallel jobs in reverse

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

11.11.2021 15:08, Hanna Reitz wrote:

See the comment for why this is necessary.

Signed-off-by: Hanna Reitz 
---
  tests/qemu-iotests/030 | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 5fb65b4bef..567bf1da67 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -251,7 +251,16 @@ class TestParallelOps(iotests.QMPTestCase):
   speed=1024)
  self.assert_qmp(result, 'return', {})
  
-for job in pending_jobs:

+# Do this in reverse: After unthrottling them, some jobs may finish
+# before we have unthrottled all of them.  This will drain their
+# subgraph, and this will make jobs above them advance (despite those
+# jobs on top being throttled).  In the worst case, all jobs below the
+# top one are finished before we can unthrottle it, and this makes it
+# advance so far that it completes before we can unthrottle it - which
+# results in an error.
+# Starting from the top (i.e. in reverse) does not have this problem:
+# When a job finishes, the ones below it are not advanced.


Hmm, interesting why only jobs above the finished job may advance in the 
situation..

Looks like something may change and this workaround will stop working.

Isn't it better just handle the error, and don't care if job was just finished?

Something like

if result['return'] != {}:
   # Job was finished during drain caused by finish of already unthrottled job
   self.assert_qmp(result, 'error/class', 'DeviceNotActive')

Next thing in the test case is checking for completion events, so we'll get all 
events anyway.



+for job in reversed(pending_jobs):
  result = self.vm.qmp('block-job-set-speed', device=job, speed=0)
  self.assert_qmp(result, 'return', {})
  




--
Best regards,
Vladimir



Re: [PATCH v1] job.c: add missing notifier initialization

2021-11-12 Thread Vladimir Sementsov-Ogievskiy

03.11.2021 19:21, Emanuele Giuseppe Esposito wrote:

It seems that on_idle list is not properly initialized like
the other notifiers.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  job.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/job.c b/job.c
index dbfa67bb0a..54db80df66 100644
--- a/job.c
+++ b/job.c
@@ -352,6 +352,7 @@ void *job_create(const char *job_id, const JobDriver 
*driver, JobTxn *txn,
  notifier_list_init(&job->on_finalize_completed);
  notifier_list_init(&job->on_pending);
  notifier_list_init(&job->on_ready);
+notifier_list_init(&job->on_idle);
  
  job_state_transition(job, JOB_STATUS_CREATED);

  aio_timer_init(qemu_get_aio_context(), &job->sleep_timer,



Reviewed-by: Vladimir Sementsov-Ogievskiy 


I don't think it worth applying it now:

job object is alloced with g_malloc0, so job->on_idle is initialized to zero.

notifier_list_init() simply calls QLIST_INIT(), which initializes the only 
field of QLIST structure to NULL. So, actually these notifier_list_init() calls 
are no-op in this context.

I queue it in jobs branch, but will not send a pull request until more critical 
fix comes for 6.2 or 6.3 development starts.

Thanks!

--
Best regards,
Vladimir



Re: [PATCH v2 1/1] Jobs based on custom runners: add CentOS Stream 8

2021-11-12 Thread Willian Rampazzo
On Thu, Nov 11, 2021 at 1:06 PM Cleber Rosa  wrote:
>
> This introduces three different parts of a job designed to run
> on a custom runner managed by Red Hat.  The goals include:
>
>   a) propose a model for other organizations that want to onboard
>  their own runners, with their specific platforms, build
>  configuration and tests.
>
>   b) bring awareness to the differences between upstream QEMU and the
>  version available under CentOS Stream, which is "A preview of
>  upcoming Red Hat Enterprise Linux minor and major releases".
>
>   c) because of b), it should be easier to identify and reduce the gap
>  between Red Hat's downstream and upstream QEMU.
>
> The components of this custom job are:
>
>   I) OS build environment setup code:
>
>  - additions to the existing "build-environment.yml" playbook
>that can be used to set up CentOS/EL 8 systems.
>
>  - a CentOS Stream 8 specific "build-environment.yml" playbook
>that adds to the generic one.
>
>  II) QEMU build configuration: a script that will produce binaries with
>  features as similar as possible to the ones built and packaged on
>  CentOS stream 8.
>
> III) Scripts that define the minimum amount of testing that the
>  binaries built with the given configuration (point II) under the
>  given OS build environment (point I) should be subjected to.
>
>  IV) Job definition: GitLab CI jobs that will dispatch the build/test
>  jobs (see points #II and #III) to the machine specifically
>  configured according to #I.
>
> Signed-off-by: Cleber Rosa 
> ---
>  .gitlab-ci.d/custom-runners.yml   |  29 +++
>  docs/devel/ci-jobs.rst.inc|   7 +
>  .../org.centos/stream/8/build-environment.yml |  51 +
>  .../ci/org.centos/stream/8/x86_64/configure   | 208 ++
>  .../org.centos/stream/8/x86_64/test-avocado   |  70 ++
>  scripts/ci/org.centos/stream/README   |  17 ++
>  scripts/ci/setup/build-environment.yml|  38 
>  7 files changed, 420 insertions(+)
>  create mode 100644 scripts/ci/org.centos/stream/8/build-environment.yml
>  create mode 100755 scripts/ci/org.centos/stream/8/x86_64/configure
>  create mode 100755 scripts/ci/org.centos/stream/8/x86_64/test-avocado
>  create mode 100644 scripts/ci/org.centos/stream/README
>

Maybe it is too late, but just for the records:

Reviewed-by: Willian Rampazzo 
Tested-by: Willian Rampazzo 

CI job on a custom VM runner:
https://gitlab.com/willianrampazzo/qemu/-/jobs/1778451942




Re: [PATCH 04/10] chardev/char-fe: don't allow EAGAIN from blocking read

2021-11-12 Thread Roman Kagan
On Fri, Nov 12, 2021 at 12:24:06PM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Nov 11, 2021 at 7:44 PM Roman Kagan  wrote:
> 
> > As its name suggests, ChardevClass.chr_sync_read is supposed to do a
> > blocking read.  The only implementation of it, tcp_chr_sync_read, does
> > set the underlying io channel to the blocking mode indeed.
> >
> > Therefore a failure return with EAGAIN is not expected from this call.
> >
> > So do not retry it in qemu_chr_fe_read_all; instead place an assertion
> > that it doesn't fail with EAGAIN.
> >
> 
> The code was introduced in :
> commit 7b0bfdf52d694c9a3a96505aa42ce3f8d63acd35
> Author: Nikolay Nikolaev 
> Date:   Tue May 27 15:03:48 2014 +0300
> 
> Add chardev API qemu_chr_fe_read_all

Right, but at that point chr_sync_read wasn't made to block.  It
happened later in

commit bcdeb9be566ded2eb35233aaccf38742a21e5daa
Author: Marc-André Lureau 
Date:   Thu Jul 6 19:03:53 2017 +0200

chardev: block during sync read

A sync read should block until all requested data is
available (instead of retrying in qemu_chr_fe_read_all). Change the
channel to blocking during sync_read.

> > @@ -68,13 +68,10 @@ int qemu_chr_fe_read_all(CharBackend *be, uint8_t
> > *buf, int len)
> >  }
> >
> >  while (offset < len) {
> > -retry:
> >  res = CHARDEV_GET_CLASS(s)->chr_sync_read(s, buf + offset,
> >len - offset);
> > -if (res == -1 && errno == EAGAIN) {
> > -g_usleep(100);
> > -goto retry;
> > -}
> > +/* ->chr_sync_read should block */
> > +assert(!(res < 0 && errno == EAGAIN));
> >
> >
> While I agree with the rationale to clean this code a bit, I am not so sure
> about replacing it with an assert(). In the past, when we did such things
> we had unexpected regressions :)

Valid point, qemu may be run against some OS where a blocking call may
sporadically return -EAGAIN, and it would be hard to reliably catch this
with testing.

> A slightly better approach perhaps is g_warn_if_fail(), although it's not
> very popular in qemu.

I think the first thing to decide is whether -EAGAIN from a blocking
call isn't broken enough, and justifies (unlimited) retries.  I'm
tempted to just remove any special handling of -EAGAIN and treat it as
any other error, leaving up to the caller to handle (most probably to
fail the call and initiate a recovery, if possible).

Does this make sense?

Thanks,
Roman.



Re: [PATCH 01/10] vhost-user-blk: reconnect on any error during realize

2021-11-12 Thread Roman Kagan
On Fri, Nov 12, 2021 at 12:37:59PM +0100, Kevin Wolf wrote:
> Am 12.11.2021 um 08:39 hat Roman Kagan geschrieben:
> > On Thu, Nov 11, 2021 at 06:52:30PM +0100, Kevin Wolf wrote:
> > > Am 11.11.2021 um 16:33 hat Roman Kagan geschrieben:
> > > > vhost-user-blk realize only attempts to reconnect if the previous
> > > > connection attempt failed on "a problem with the connection and not an
> > > > error related to the content (which would fail again the same way in the
> > > > next attempt)".
> > > > 
> > > > However this distinction is very subtle, and may be inadvertently broken
> > > > if the code changes somewhere deep down the stack and a new error gets
> > > > propagated up to here.
> > > > 
> > > > OTOH now that the number of reconnection attempts is limited it seems
> > > > harmless to try reconnecting on any error.
> > > > 
> > > > So relax the condition of whether to retry connecting to check for any
> > > > error.
> > > > 
> > > > This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
> > > > during realize".
> > > > 
> > > > Signed-off-by: Roman Kagan 
> > > 
> > > It results in less than perfect error messages. With a modified export
> > > that just crashes qemu-storage-daemon during get_features, I get:
> > > 
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to read 
> > > msg header. Read 0 instead of 12. Original request 1.
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting 
> > > after error: vhost_backend_init failed: Protocol error
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting 
> > > after error: Failed to connect to '/tmp/vsock': Connection refused
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Reconnecting 
> > > after error: Failed to connect to '/tmp/vsock': Connection refused
> > > qemu-system-x86_64: -device vhost-user-blk-pci,chardev=c: Failed to 
> > > connect to '/tmp/vsock': Connection refused
> > 
> > This patch doesn't change any error messages.  Which ones specifically
> > became less than perfect as a result of this patch?
> 
> But it adds error messages (for each retry), which are different from
> the first error message. As I said this is not the end of the world, but
> maybe a bit more confusing.

Ah, now I see what you mean: it adds reconnection attempts where there
used to be immediate failure return, so now every failed attempt logs
its own message.

> > > I guess this might be tolerable. On the other hand, the patch doesn't
> > > really fix anything either, but just gets rid of possible subtleties.
> > 
> > The remaining patches in the series make other errors beside -EPROTO
> > propagate up to this point, and some (most) of them are retryable.  This
> > was the reason to include this patch at the beginning of the series (I
> > guess I should've mentioned that in the patch log).
> 
> I see. I hadn't looked at the rest of the series yet because I ran out
> of time, but now that I'm skimming them, I see quite a few places that
> use non-EPROTO, but I wonder which of them actually should be
> reconnected. So far all I saw were presumably persistent errors where a
> retry won't help. Can you give me some examples?

E.g. the particular case you mention earlier, -ECONNREFUSED, is not
unlikely to happen due to the vhost-user server restart for maintenance;
in this case retying looks like a reasonable thing to do, doesn't it?

Thanks,
Roman.