Re: making a qdev bus available from a (non-qtree?) device

2021-05-11 Thread Philippe Mathieu-Daudé
On 5/11/21 8:17 PM, Klaus Jensen wrote:
> Hi all,
> 
> I need some help with grok'ing qdev busses. Stefan, Michael - David
> suggested on IRC that I CC'ed you guys since you might have solved a
> similar issue with virtio devices. I've tried to study how that works,
> but I'm not exactly sure how to apply it to the issue I'm having.

The experts on this topic are Peter/Markus/Eduardo.

> Currently, to support multiple namespaces on the emulated nvme device,
> one can do something like this:
> 
>   -device nvme,id=nvme-ctrl-0,serial=foo,...
>   -device nvme-ns,id=nvme-ns-0,bus=nvme-ctrl-0,...
>   -device nvme-ns,id-nvme-ns-1,bus=nvme-ctrl-0,...
> 
> The nvme device creates an 'nvme-bus' and the nvme-ns devices has
> dc->bus_type = TYPE_NVME_BUS. This all works very well and provides a
> nice overview in `info qtree`:
> 
>   bus: main-system-bus
>   type System
>     ...
>     dev: q35-pcihost, id ""
>   ..
>   bus: pcie.0
> type PCIE
> ..
> dev: nvme, id "nvme-ctrl-0"
>   ..
>   bus: nvme-ctrl-0
>     type nvme-bus
>     dev: nvme-ns, id "nvme-ns-0"
>   ..
>     dev: nvme-ns, id "nvme-ns-1"
>   ..
> 
> 
> Nice and qdevy.
> 
> We have since introduced support for NVM Subsystems through an
> nvme-subsys device. The nvme-subsys device is just a TYPE_DEVICE and
> does not show in `info qtree` (I wonder if this should actually just
> have been an -object?). Anyway. The nvme device has a 'subsys' link
> parameter and we use this to manage the namespaces across the subsystem
> that may contain several nvme devices (controllers). The problem is that
> this doesnt work too well with unplugging since if the nvme device is
> `device_del`'ed, the nvme-ns devices on the nvme-bus are unrealized
> which is not what we want. We really want the namespaces to linger,
> preferably on an nvme-bus of the nvme-subsys device so they can be
> attached to other nvme devices that may show up (or already exist) in
> the subsystem.

IIUC, while we can have unattached drives, we can't (by design) have
qdev unattached to qbus.

Not sure this is a good suggestion (bad design IMO) but you could add
a fake nvme qbus to hold the lingering nvme devices...

> The core problem I'm having is that I can't seem to create an nvme-bus
> from the nvme-subsys device and make it available to the nvme-ns device
> on the command line:
> 
>   -device nvme-subsys,id=nvme-subsys-0,...
>   -device nvme-ns,bus=nvme-subsys-0
> 
> The above results in 'No 'nvme-bus' bus found for device 'nvme-ns', even
> though I do `qbus_create_inplace()` just like the nvme device. However,
> I *can* reparent the nvme-ns device in its realize() method, so if I
> instead define it like so:
> 
>   -device nvme-subsys,id=nvme-subsys-0,...
>   -device nvme,id=nvme-ctrl-0,subsys=nvme-subsys-0
>   -device nvme-ns,bus=nvme-ctrl-0
> 
> I can then call `qdev_set_parent_bus()` and set the parent bus to the
> bus creates in the nvme-subsys device. This solves the problem since the
> namespaces are not "garbage collected" when the nvme device is removed,
> but it just feels wrong you know? Also, if possible, I'd of course
> really like to retain the nice entries in `info qtree`.
> 
> 
> Thanks,
> Klaus




Re: [PATCH 3/3] virtio-net: Constify VirtIOFeature feature_sizes[]

2021-05-11 Thread Jason Wang



在 2021/5/11 下午6:41, Philippe Mathieu-Daudé 写道:

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/net/virtio-net.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 66b9ff45118..6b7e8dd04ef 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -89,7 +89,7 @@
   VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \
   VIRTIO_NET_RSS_HASH_TYPE_UDP_EX)
  
-static VirtIOFeature feature_sizes[] = {

+static const VirtIOFeature feature_sizes[] = {
  {.flags = 1ULL << VIRTIO_NET_F_MAC,
   .end = endof(struct virtio_net_config, mac)},
  {.flags = 1ULL << VIRTIO_NET_F_STATUS,



Acked-by: Jason Wang 





Re: [PATCH 2/3] virtio-blk: Constify VirtIOFeature feature_sizes[]

2021-05-11 Thread Jason Wang



在 2021/5/11 下午6:41, Philippe Mathieu-Daudé 写道:

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/block/virtio-blk.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index d28979efb8d..f139cd7cc9c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -40,7 +40,7 @@
   * Starting from the discard feature, we can use this array to properly
   * set the config size depending on the features enabled.
   */
-static VirtIOFeature feature_sizes[] = {
+static const VirtIOFeature feature_sizes[] = {
  {.flags = 1ULL << VIRTIO_BLK_F_DISCARD,
   .end = endof(struct virtio_blk_config, discard_sector_alignment)},
  {.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,



Acked-by: Jason Wang 





Re: [PATCH 1/3] hw/virtio: Pass virtio_feature_get_config_size() a const argument

2021-05-11 Thread Jason Wang



在 2021/5/11 下午6:41, Philippe Mathieu-Daudé 写道:

The VirtIOFeature structure isn't modified, mark it const.

Signed-off-by: Philippe Mathieu-Daudé 



Acked-by: Jason Wang 



---
  include/hw/virtio/virtio.h | 2 +-
  hw/virtio/virtio.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b7ece7a6a89..8bab9cfb750 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -43,7 +43,7 @@ typedef struct VirtIOFeature {
  size_t end;
  } VirtIOFeature;
  
-size_t virtio_feature_get_config_size(VirtIOFeature *features,

+size_t virtio_feature_get_config_size(const VirtIOFeature *features,
uint64_t host_features);
  
  typedef struct VirtQueue VirtQueue;

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 9e13cb9e3ad..e02544b2df7 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2981,7 +2981,7 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
  return ret;
  }
  
-size_t virtio_feature_get_config_size(VirtIOFeature *feature_sizes,

+size_t virtio_feature_get_config_size(const VirtIOFeature *feature_sizes,
uint64_t host_features)
  {
  size_t config_size = 0;





Re: [PATCH v4 06/11] block: make BlockLimits::max_pwrite_zeroes 64bit

2021-05-11 Thread Eric Blake
On 3/24/21 3:51 PM, Vladimir Sementsov-Ogievskiy wrote:
> We are going to support 64 bit write-zeroes requests. Now update the
> limit variable. It's absolutely safe. The variable is set in some
> drivers, and used in bdrv_co_do_pwrite_zeroes().
> 
> Update also max_write_zeroes variable in bdrv_co_do_pwrite_zeroes(), so
> that bdrv_co_do_pwrite_zeroes() is now prepared to 64bit requests. The
> remaining logic including num, offset and bytes variables is already
> supporting 64bit requests.
> 
> So the only thing that prevents 64 bit requests is limiting
> max_write_zeroes variable to INT_MAX in bdrv_co_do_pwrite_zeroes().
> We'll drop this limitation after updating all block drivers.
> 
> Ah, we also have bdrv_check_request32() in bdrv_co_pwritev_part(). It
> will be modified to do bdrv_check_request() for write-zeroes path.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/block_int.h | 7 +++
>  block/io.c| 2 +-
>  2 files changed, 4 insertions(+), 5 deletions(-)
> 

> +++ b/include/block/block_int.h
> @@ -676,10 +676,9 @@ typedef struct BlockLimits {
>   * that is set. May be 0 if bl.request_alignment is good enough */
>  uint32_t pdiscard_alignment;
>  
> -/* Maximum number of bytes that can zeroized at once (since it is
> - * signed, it must be < 2G, if set). Must be multiple of
> - * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
> -int32_t max_pwrite_zeroes;
> +/* Maximum number of bytes that can zeroized at once. Must be multiple of
> + * pwrite_zeroes_alignment. May be 0 if no inherent 64-bit limit */

Is the comment still right?

Leaving as 0 is the easiest way for a driver to say "default limit", but
I would feel safer with the default being 2G-align rather than 63-bit
limit.  And it is a 63-bit limit, not 64-bit, if the driver opts in to
INT64_MAX.

> +int64_t max_pwrite_zeroes;
>  
>  /* Optimal alignment for write zeroes requests in bytes. A power
>   * of 2 is best but not mandatory.  Must be a multiple of
> diff --git a/block/io.c b/block/io.c
> index 55095dd08e..79e600af27 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1836,7 +1836,7 @@ static int coroutine_fn 
> bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>  int head = 0;
>  int tail = 0;
>  
> -int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
> +int64_t max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, 
> INT_MAX);

You are correct that for now we have no behavior change; a driver opting
in to a larger limit will still be clamped until we revisit this patch
later to drop the MIN() - but I agree with your approach of keeping
MIN() here until all drivers are audited.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v4 05/11] block: use int64_t instead of uint64_t in copy_range driver handlers

2021-05-11 Thread Eric Blake
On 3/24/21 3:51 PM, Vladimir Sementsov-Ogievskiy wrote:
> We are generally moving to int64_t for both offset and bytes parameters
> on all io paths.
> 
> Main motivation is realization of 64-bit write_zeroes operation for
> fast zeroing large disk chunks, up to the whole disk.
> 
> We chose signed type, to be consistent with off_t (which is signed) and
> with possibility for signed return type (where negative value means
> error).
> 
> So, convert driver copy_range handlers parameters which are already
> 64bit to signed type.
> 
> Now let's consider all callers. Simple
> 
>   git grep '\->bdrv_co_copy_range'
> 
> shows the only caller:
> 
>   bdrv_co_copy_range_internal(), which doesn bdrv_check_request32(),

s/doesn/does/

>   so everything is OK.
> 
> Still, the functions may be called directly, not only by drv->...
> Let's check:
> 
> git grep '\.bdrv_co_copy_range_\(from\|to\)\s*=' | \
> awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
> while read func; do git grep "$func(" | \
> grep -v "$func(BlockDriverState"; done
> 
> shows no more callers. So, we are done.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/block_int.h | 12 ++--
>  block/file-posix.c| 10 +-
>  block/iscsi.c | 12 ++--
>  block/qcow2.c | 12 ++--
>  block/raw-format.c| 16 
>  5 files changed, 31 insertions(+), 31 deletions(-)

Fewer drivers implement this, so easier review :)

> +++ b/block/qcow2.c
> @@ -3975,9 +3975,9 @@ static coroutine_fn int 
> qcow2_co_pdiscard(BlockDriverState *bs,
>  
>  static int coroutine_fn
>  qcow2_co_copy_range_from(BlockDriverState *bs,
> - BdrvChild *src, uint64_t src_offset,
> - BdrvChild *dst, uint64_t dst_offset,
> - uint64_t bytes, BdrvRequestFlags read_flags,
> + BdrvChild *src, int64_t src_offset,
> + BdrvChild *dst, int64_t dst_offset,
> + int64_t bytes, BdrvRequestFlags read_flags,
>   BdrvRequestFlags write_flags)
>  {
>  BDRVQcow2State *s = bs->opaque;

The use of cur_bytes = MIN(bytes, INT_MAX) looks odd, when we could
instead clamp to a nicer aligned boundary.  As noted before,
qcow2_get_host_offset() then further clamps cur_bytes, and the caller is
already splitting up requests larger than 2G, but copy_from is one of
those interfaces that is actually designed to have potentially nice
speedups with large byte ranges within the same filesystem (faster than
what is possible with usual pread/pwrite).  Then again, that's probably
more for file-posix (where we know the bytes are contiguous) than qcow2
(where we do have to worry about whether clusters are contiguous), so
we'll still have to keep a while(bytes) fragmenting loop in this
function.  Thoughts for a future patch, doesn't affect correctness of
this one.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 18/33] nbd/client-connection: shutdown connection on release

2021-05-11 Thread Roman Kagan
On Fri, Apr 16, 2021 at 11:08:56AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Now, when thread can do negotiation and retry, it may run relatively
> long. We need a mechanism to stop it, when user is not interested in
> result anymore. So, on nbd_client_connection_release() let's shutdown
> the socked, and do not retry connection if thread is detached.

This kinda answers my question to the previous patch about reconnect
cancellation.

> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  nbd/client-connection.c | 36 ++--
>  1 file changed, 26 insertions(+), 10 deletions(-)
> 
> diff --git a/nbd/client-connection.c b/nbd/client-connection.c
> index 002bd91f42..54f73c6c24 100644
> --- a/nbd/client-connection.c
> +++ b/nbd/client-connection.c
> @@ -158,9 +158,13 @@ static void *connect_thread_func(void *opaque)
>  uint64_t timeout = 1;
>  uint64_t max_timeout = 16;
>  
> -while (true) {
> +qemu_mutex_lock(>mutex);
> +while (!conn->detached) {
> +assert(!conn->sioc);
>  conn->sioc = qio_channel_socket_new();
>  
> +qemu_mutex_unlock(>mutex);
> +
>  error_free(conn->err);
>  conn->err = NULL;
>  conn->updated_info = conn->initial_info;
> @@ -171,14 +175,20 @@ static void *connect_thread_func(void *opaque)
>  conn->updated_info.x_dirty_bitmap = NULL;
>  conn->updated_info.name = NULL;
>  
> +qemu_mutex_lock(>mutex);
> +
>  if (ret < 0) {
>  object_unref(OBJECT(conn->sioc));
>  conn->sioc = NULL;
> -if (conn->do_retry) {
> +if (conn->do_retry && !conn->detached) {
> +qemu_mutex_unlock(>mutex);
> +
>  sleep(timeout);
>  if (timeout < max_timeout) {
>  timeout *= 2;
>  }
> +
> +qemu_mutex_lock(>mutex);
>  continue;
>  }
>  }
> @@ -186,15 +196,17 @@ static void *connect_thread_func(void *opaque)
>  break;
>  }
>  
> -WITH_QEMU_LOCK_GUARD(>mutex) {
> -assert(conn->running);
> -conn->running = false;
> -if (conn->wait_co) {
> -aio_co_schedule(NULL, conn->wait_co);
> -conn->wait_co = NULL;
> -}
> -do_free = conn->detached;
> +/* mutex is locked */
> +
> +assert(conn->running);
> +conn->running = false;
> +if (conn->wait_co) {
> +aio_co_schedule(NULL, conn->wait_co);
> +conn->wait_co = NULL;
>  }
> +do_free = conn->detached;
> +
> +qemu_mutex_unlock(>mutex);

This basically reverts some hunks from patch 15 "nbd/client-connection:
use QEMU_LOCK_GUARD".  How about dropping them there (they weren't an
obvious improvement even then).

Roman.

>  
>  if (do_free) {
>  nbd_client_connection_do_free(conn);
> @@ -215,6 +227,10 @@ void nbd_client_connection_release(NBDClientConnection 
> *conn)
>  if (conn->running) {
>  conn->detached = true;
>  }
> +if (conn->sioc) {
> +qio_channel_shutdown(QIO_CHANNEL(conn->sioc),
> + QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> +}
>  do_free = !conn->running && !conn->detached;
>  qemu_mutex_unlock(>mutex);
>  
> -- 
> 2.29.2
> 



Re: [PATCH v4 04/11] block: use int64_t instead of uint64_t in driver write handlers

2021-05-11 Thread Eric Blake
On 3/24/21 3:51 PM, Vladimir Sementsov-Ogievskiy wrote:
> We are generally moving to int64_t for both offset and bytes parameters
> on all io paths.
> 
> Main motivation is realization of 64-bit write_zeroes operation for
> fast zeroing large disk chunks, up to the whole disk.
> 
> We chose signed type, to be consistent with off_t (which is signed) and
> with possibility for signed return type (where negative value means
> error).
> 
> So, convert driver write handlers parameters which are already 64bit to
> signed type.
> 
> While being here, convert also flags parameter to be BdrvRequestFlags.
> 
> Now let's consider all callers. Simple
> 
>   git grep '\->bdrv_\(aio\|co\)_pwritev\(_part\)\?'
> 
> shows that's there two callers of driver function:
> 
>  bdrv_driver_pwritev() in block/io.c, passes int64_t, checked by
>bdrv_check_qiov_request() to be non-negative.
> 
>  qcow2_save_vmstate() does bdrv_check_qiov_request().
> 
> Still, the functions may be called directly, not only by drv->...
> Let's check:
> 
> git grep '\.bdrv_\(aio\|co\)_pwritev\(_part\)\?\s*=' | \
> awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
> while read func; do git grep "$func(" | \
> grep -v "$func(BlockDriverState"; done
> 
> shows several callers:

...
Thanks for recording the audit.

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/block_int.h| 16 
>  block/backup-top.c   |  6 +++---

>  30 files changed, 95 insertions(+), 86 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index e6bcf74e46..928369e0bc 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -237,8 +237,8 @@ struct BlockDriver {
>  int64_t offset, int64_t bytes, QEMUIOVector *qiov,
>  BdrvRequestFlags flags, BlockCompletionFunc *cb, void *opaque);
>  BlockAIOCB *(*bdrv_aio_pwritev)(BlockDriverState *bs,
> -uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags,
> -BlockCompletionFunc *cb, void *opaque);
> +int64_t offset, int64_t bytes, QEMUIOVector *qiov,
> +BdrvRequestFlags flags, BlockCompletionFunc *cb, void *opaque);
>  BlockAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
>  BlockCompletionFunc *cb, void *opaque);
>  BlockAIOCB *(*bdrv_aio_pdiscard)(BlockDriverState *bs,
> @@ -287,10 +287,11 @@ struct BlockDriver {
>   * The buffer in @qiov may point directly to guest memory.
>   */
>  int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
> -uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
> +int64_t offset, int64_t bytes, QEMUIOVector *qiov,
> +BdrvRequestFlags flags);
>  int coroutine_fn (*bdrv_co_pwritev_part)(BlockDriverState *bs,
> -uint64_t offset, uint64_t bytes,
> -QEMUIOVector *qiov, size_t qiov_offset, int flags);
> +int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t 
> qiov_offset,
> +BdrvRequestFlags flags);
>  
>  /*
>   * Efficiently zero a region of the disk image.  Typically an image 
> format
> @@ -428,10 +429,9 @@ struct BlockDriver {
>Error **errp);
>  
>  int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs,
> -uint64_t offset, uint64_t bytes, QEMUIOVector *qiov);
> +int64_t offset, int64_t bytes, QEMUIOVector *qiov);
>  int coroutine_fn (*bdrv_co_pwritev_compressed_part)(BlockDriverState *bs,
> -uint64_t offset, uint64_t bytes, QEMUIOVector *qiov,
> -size_t qiov_offset);
> +int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t 
> qiov_offset);

Someday it may be nice to convert remaining drivers of the old callbacks
to use the newer ones instead, for fewer callbacks to manage here.  But
not the problem of this patch.

> +++ b/block/backup-top.c
> @@ -97,9 +97,9 @@ static int coroutine_fn 
> backup_top_co_pwrite_zeroes(BlockDriverState *bs,
>  }
>  
>  static coroutine_fn int backup_top_co_pwritev(BlockDriverState *bs,
> -  uint64_t offset,
> -  uint64_t bytes,
> -  QEMUIOVector *qiov, int flags)
> +  int64_t offset, int64_t bytes,
> +  QEMUIOVector *qiov,
> +  BdrvRequestFlags flags)
>  {
>  int ret = backup_top_cbw(bs, offset, bytes, flags);

We should clean up the signature of backup_top_cbw() in a followup, now
that pdiscard, pwrite_zeroes, pwritev all pass int64_t.

> +++ b/block/blkdebug.c
> @@ -635,8 +635,8 @@ blkdebug_co_preadv(BlockDriverState *bs, int64_t offset, 
> int64_t bytes,
>  }
>  
>  static int coroutine_fn
> -blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
> -QEMUIOVector *qiov, int 

Re: [PATCH v3 17/33] nbd/client-connection: implement connection retry

2021-05-11 Thread Roman Kagan
On Fri, Apr 16, 2021 at 11:08:55AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Add an option for thread to retry connection until success. We'll use
> nbd/client-connection both for reconnect and for initial connection in
> nbd_open(), so we need a possibility to use same NBDClientConnection
> instance to connect once in nbd_open() and then use retry semantics for
> reconnect.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/nbd.h |  2 ++
>  nbd/client-connection.c | 55 +
>  2 files changed, 41 insertions(+), 16 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 5d86e6a393..5bb54d831c 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -409,6 +409,8 @@ const char *nbd_err_lookup(int err);
>  /* nbd/client-connection.c */
>  typedef struct NBDClientConnection NBDClientConnection;
>  
> +void nbd_client_connection_enable_retry(NBDClientConnection *conn);
> +
>  NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
> bool do_negotiation,
> const char *export_name,
> diff --git a/nbd/client-connection.c b/nbd/client-connection.c
> index ae4a77f826..002bd91f42 100644
> --- a/nbd/client-connection.c
> +++ b/nbd/client-connection.c
> @@ -36,6 +36,8 @@ struct NBDClientConnection {
>  NBDExportInfo initial_info;
>  bool do_negotiation;
>  
> +bool do_retry;
> +
>  /*
>   * Result of last attempt. Valid in FAIL and SUCCESS states.
>   * If you want to steal error, don't forget to set pointer to NULL.
> @@ -52,6 +54,15 @@ struct NBDClientConnection {
>  Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */
>  };
>  
> +/*
> + * The function isn't protected by any mutex, so call it when thread is not
> + * running.
> + */
> +void nbd_client_connection_enable_retry(NBDClientConnection *conn)
> +{
> +conn->do_retry = true;
> +}
> +
>  NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
> bool do_negotiation,
> const char *export_name,
> @@ -144,24 +155,37 @@ static void *connect_thread_func(void *opaque)
>  NBDClientConnection *conn = opaque;
>  bool do_free;
>  int ret;
> +uint64_t timeout = 1;
> +uint64_t max_timeout = 16;
> +
> +while (true) {
> +conn->sioc = qio_channel_socket_new();
> +
> +error_free(conn->err);
> +conn->err = NULL;
> +conn->updated_info = conn->initial_info;
> +
> +ret = nbd_connect(conn->sioc, conn->saddr,
> +  conn->do_negotiation ? >updated_info : NULL,
> +  conn->tlscreds, >ioc, >err);
> +conn->updated_info.x_dirty_bitmap = NULL;
> +conn->updated_info.name = NULL;
> +
> +if (ret < 0) {
> +object_unref(OBJECT(conn->sioc));
> +conn->sioc = NULL;
> +if (conn->do_retry) {
> +sleep(timeout);
> +if (timeout < max_timeout) {
> +timeout *= 2;
> +}
> +continue;
> +}
> +}

How is it supposed to get canceled?

Roman.

> -conn->sioc = qio_channel_socket_new();
> -
> -error_free(conn->err);
> -conn->err = NULL;
> -conn->updated_info = conn->initial_info;
> -
> -ret = nbd_connect(conn->sioc, conn->saddr,
> -  conn->do_negotiation ? >updated_info : NULL,
> -  conn->tlscreds, >ioc, >err);
> -if (ret < 0) {
> -object_unref(OBJECT(conn->sioc));
> -conn->sioc = NULL;
> +break;
>  }
>  
> -conn->updated_info.x_dirty_bitmap = NULL;
> -conn->updated_info.name = NULL;
> -
>  WITH_QEMU_LOCK_GUARD(>mutex) {
>  assert(conn->running);
>  conn->running = false;
> @@ -172,7 +196,6 @@ static void *connect_thread_func(void *opaque)
>  do_free = conn->detached;
>  }
>  
> -
>  if (do_free) {
>  nbd_client_connection_do_free(conn);
>  }
> -- 
> 2.29.2
> 



Re: [PATCH v4 03/11] block: use int64_t instead of uint64_t in driver read handlers

2021-05-11 Thread Eric Blake
On 3/24/21 3:51 PM, Vladimir Sementsov-Ogievskiy wrote:
> We are going to convert .bdrv_co_preadv_part and .bdrv_co_pwritev_part
> to int64_t type for offset and bytes parameters (as it's already done
> for generic block/io.c layer).
> 
> In qcow2 .bdrv_co_preadv_part is used in some places, so let's add
> corresponding checks and assertions.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Unusual line, especially since...

> 
> block: use int64_t instead of uint64_t in driver read handlers
> 
> We are generally moving to int64_t for both offset and bytes parameters
> on all io paths.
> 
> Main motivation is realization of 64-bit write_zeroes operation for
> fast zeroing large disk chunks, up to the whole disk.
> 
> We chose signed type, to be consistent with off_t (which is signed) and
> with possibility for signed return type (where negative value means
> error).
> 
> So, convert driver read handlers parameters which are already 64bit to
> signed type.
> 
> While being here, convert also flags parameter to be BdrvRequestFlags.
> 
> Now let's consider all callers. Simple
> 
>   git grep '\->bdrv_\(aio\|co\)_preadv\(_part\)\?'
> 
> shows that's there three callers of driver function:
> 
>  bdrv_driver_preadv() in block/io.c, passes int64_t, checked by
>bdrv_check_qiov_request() to be non-negative.
> 
>  qcow2_load_vmstate() does bdrv_check_qiov_request().
> 
>  do_perform_cow_read() has uint64_t argument. And a lot of things in
>  qcow2 driver are uint64_t, so converting it is big job. But we must
>  not work with requests that doesn't satisfy bdrv_check_qiov_request(),

s/doesn't/don't/

>  so let's just assert it here.
> 
> Still, the functions may be called directly, not only by drv->...
> Let's check:
> 
> git grep '\.bdrv_\(aio\|co\)_preadv\(_part\)\?\s*=' | \
> awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
> while read func; do git grep "$func(" | \
> grep -v "$func(BlockDriverState"; done
> 
> The only one such caller:
> 
> QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, , 1);
> ...
> ret = bdrv_replace_test_co_preadv(bs, 0, 1, , 0);
> 
> in tesTS/unit/test-bdrv-drain.c, and it's OK obviously.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

...it is repeated here. I'm fine dropping the first.

> ---
>  include/block/block_int.h| 11 ++-
>  block/backup-top.c   |  4 ++--

>  35 files changed, 120 insertions(+), 89 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index db7a909ea9..e6bcf74e46 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -234,8 +234,8 @@ struct BlockDriver {
>  
>  /* aio */
>  BlockAIOCB *(*bdrv_aio_preadv)(BlockDriverState *bs,
> -uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags,
> -BlockCompletionFunc *cb, void *opaque);
> +int64_t offset, int64_t bytes, QEMUIOVector *qiov,
> +BdrvRequestFlags flags, BlockCompletionFunc *cb, void *opaque);
>  BlockAIOCB *(*bdrv_aio_pwritev)(BlockDriverState *bs,
>  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags,
>  BlockCompletionFunc *cb, void *opaque);
> @@ -264,10 +264,11 @@ struct BlockDriver {
>   * The buffer in @qiov may point directly to guest memory.
>   */
>  int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs,
> -uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
> +int64_t offset, int64_t bytes, QEMUIOVector *qiov,
> +BdrvRequestFlags flags);
>  int coroutine_fn (*bdrv_co_preadv_part)(BlockDriverState *bs,
> -uint64_t offset, uint64_t bytes,
> -QEMUIOVector *qiov, size_t qiov_offset, int flags);
> +int64_t offset, int64_t bytes,
> +QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags);

Lots of fallout, but I'm in favor of this signature change.


> +++ b/block/blkdebug.c
> @@ -614,8 +614,8 @@ static int rule_check(BlockDriverState *bs, uint64_t 
> offset, uint64_t bytes,
>  }
>  
>  static int coroutine_fn
> -blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
> -   QEMUIOVector *qiov, int flags)
> +blkdebug_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
> +   QEMUIOVector *qiov, BdrvRequestFlags flags)
>  {
>  int err;

Still calls rule_check() with uint64_t parameters, but since we've
asserted the caller was within range, no behavior change.

> +++ b/block/blkverify.c
> @@ -221,8 +221,8 @@ blkverify_co_prwv(BlockDriverState *bs, BlkverifyRequest 
> *r, uint64_t offset,
>  }
>  
>  static int coroutine_fn
> -blkverify_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
> -QEMUIOVector *qiov, int flags)
> +blkverify_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
> +QEMUIOVector *qiov, BdrvRequestFlags flags)
>  {

Similarly for the call to blkverify_co_prwv(), and probably 

Re: [PATCH] docs: add table of contents to QAPI references

2021-05-11 Thread Connor Kuehl
On 5/11/21 4:25 AM, Daniel P. Berrangé wrote:
> The QAPI reference docs for the guest agent, storage daemon and QMP are
> all rather long and hard to navigate unless you already know the name of
> the command and can do full text search for it.
> 
> A table of contents in each doc will help people locate stuff much more
> easily.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---

This looks so much better!

Reviewed-by: Connor Kuehl 




making a qdev bus available from a (non-qtree?) device

2021-05-11 Thread Klaus Jensen

Hi all,

I need some help with grok'ing qdev busses. Stefan, Michael - David 
suggested on IRC that I CC'ed you guys since you might have solved a 
similar issue with virtio devices. I've tried to study how that works, 
but I'm not exactly sure how to apply it to the issue I'm having.


Currently, to support multiple namespaces on the emulated nvme device, 
one can do something like this:


  -device nvme,id=nvme-ctrl-0,serial=foo,...
  -device nvme-ns,id=nvme-ns-0,bus=nvme-ctrl-0,...
  -device nvme-ns,id-nvme-ns-1,bus=nvme-ctrl-0,...

The nvme device creates an 'nvme-bus' and the nvme-ns devices has 
dc->bus_type = TYPE_NVME_BUS. This all works very well and provides a 
nice overview in `info qtree`:


  bus: main-system-bus
  type System
...
dev: q35-pcihost, id ""
  ..
  bus: pcie.0
type PCIE
..
dev: nvme, id "nvme-ctrl-0"
  ..
  bus: nvme-ctrl-0
type nvme-bus
dev: nvme-ns, id "nvme-ns-0"
  ..
dev: nvme-ns, id "nvme-ns-1"
  ..


Nice and qdevy.

We have since introduced support for NVM Subsystems through an 
nvme-subsys device. The nvme-subsys device is just a TYPE_DEVICE and 
does not show in `info qtree` (I wonder if this should actually just 
have been an -object?). Anyway. The nvme device has a 'subsys' link 
parameter and we use this to manage the namespaces across the subsystem 
that may contain several nvme devices (controllers). The problem is that 
this doesnt work too well with unplugging since if the nvme device is 
`device_del`'ed, the nvme-ns devices on the nvme-bus are unrealized 
which is not what we want. We really want the namespaces to linger, 
preferably on an nvme-bus of the nvme-subsys device so they can be 
attached to other nvme devices that may show up (or already exist) in 
the subsystem.


The core problem I'm having is that I can't seem to create an nvme-bus 
from the nvme-subsys device and make it available to the nvme-ns device 
on the command line:


  -device nvme-subsys,id=nvme-subsys-0,...
  -device nvme-ns,bus=nvme-subsys-0

The above results in 'No 'nvme-bus' bus found for device 'nvme-ns', even 
though I do `qbus_create_inplace()` just like the nvme device. However, 
I *can* reparent the nvme-ns device in its realize() method, so if I 
instead define it like so:


  -device nvme-subsys,id=nvme-subsys-0,...
  -device nvme,id=nvme-ctrl-0,subsys=nvme-subsys-0
  -device nvme-ns,bus=nvme-ctrl-0

I can then call `qdev_set_parent_bus()` and set the parent bus to the 
bus creates in the nvme-subsys device. This solves the problem since the 
namespaces are not "garbage collected" when the nvme device is removed, 
but it just feels wrong you know? Also, if possible, I'd of course 
really like to retain the nice entries in `info qtree`.



Thanks,
Klaus


signature.asc
Description: PGP signature


Re: [PATCH v4 02/11] qcow2: check request on vmstate save/load path

2021-05-11 Thread Eric Blake
On 3/24/21 3:51 PM, Vladimir Sementsov-Ogievskiy wrote:
> We modify the request by adding an offset to vmstate. Let's check the
> modified request. It will help us to safely move .bdrv_co_preadv_part
> and .bdrv_co_pwritev_part to int64_t type of offset and bytes.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/block_int.h |  3 +++
>  block/io.c|  6 +++---
>  block/qcow2.c | 43 +--
>  3 files changed, 43 insertions(+), 9 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v4 01/11] block/io: bring request check to bdrv_co_{read,write}v_vmstate

2021-05-11 Thread Eric Blake
On 3/24/21 3:51 PM, Vladimir Sementsov-Ogievskiy wrote:
> There are only two drivers supporting vmstate: qcow2 and sheepdog.
> Sheepdog is deprecated. In qcow2 these requests go through
> .bdrv_co_p{read,write}v_part handlers.
> 
> So, let's do our basic check for the request on vmstate generic
> handlers.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/io.c | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 10/10] qcow2-refcount: check_refblocks(): add separate message for reserved

2021-05-11 Thread Kirill Tkhai
On 05.05.2021 09:59, Vladimir Sementsov-Ogievskiy wrote:
> Split checking for reserved bits out of aligned offset check.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Eric Blake 

Tested-by: Kirill Tkhai 

> ---
>  block/qcow2.h  |  1 +
>  block/qcow2-refcount.c | 10 +-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 58fd7f1678..fd48a89d45 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -591,6 +591,7 @@ typedef enum QCow2MetadataOverlap {
>  #define L2E_STD_RESERVED_MASK 0x3f0001feULL
>  
>  #define REFT_OFFSET_MASK 0xfe00ULL
> +#define REFT_RESERVED_MASK 0x1ffULL
>  
>  #define INV_OFFSET (-1ULL)
>  
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 15c4f6b075..472a7026db 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -2091,9 +2091,17 @@ static int check_refblocks(BlockDriverState *bs, 
> BdrvCheckResult *res,
>  
>  for(i = 0; i < s->refcount_table_size; i++) {
>  uint64_t offset, cluster;
> -offset = s->refcount_table[i];
> +offset = s->refcount_table[i] & REFT_OFFSET_MASK;
>  cluster = offset >> s->cluster_bits;
>  
> +if (s->refcount_table[i] & REFT_RESERVED_MASK) {
> +fprintf(stderr, "ERROR refcount table entry %" PRId64 " has "
> +"reserved bits set\n", i);
> +res->corruptions++;
> +*rebuild = true;
> +continue;
> +}
> +
>  /* Refcount blocks are cluster aligned */
>  if (offset_into_cluster(s, offset)) {
>  fprintf(stderr, "ERROR refcount block %" PRId64 " is not "
> 




Re: [PATCH] hmp: Fix loadvm to resume the VM on success instead of failure

2021-05-11 Thread Daniel P . Berrangé
On Tue, May 11, 2021 at 06:58:01PM +0200, Kevin Wolf wrote:
> Am 11.05.2021 um 18:49 hat Daniel P. Berrangé geschrieben:
> > On Tue, May 11, 2021 at 06:31:51PM +0200, Kevin Wolf wrote:
> > > Commit f61fe11aa6f broke hmp_loadvm() by adding an incorrect negation
> > > when converting from 0/-errno return values to a bool value. The result
> > > is that loadvm resumes the VM now if it failed and keeps it stopped if
> > > it failed. Fix it to restore the old behaviour and do it the other way
> > > around.
> > > 
> > > Fixes: f61fe11aa6f7f8f0ffe4ddaa56a8108f3ab57854
> > > Cc: qemu-sta...@nongnu.org
> > > Reported-by: Yanhui Ma 
> > > Signed-off-by: Kevin Wolf 
> > > ---
> > >  monitor/hmp-cmds.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> > > index 0ad5b77477..cc15d9b6ee 100644
> > > --- a/monitor/hmp-cmds.c
> > > +++ b/monitor/hmp-cmds.c
> > > @@ -1133,7 +1133,7 @@ void hmp_loadvm(Monitor *mon, const QDict *qdict)
> > >  
> > >  vm_stop(RUN_STATE_RESTORE_VM);
> > >  
> > > -if (!load_snapshot(name, NULL, false, NULL, ) && 
> > > saved_vm_running) {
> > > +if (load_snapshot(name, NULL, false, NULL, ) && 
> > > saved_vm_running) {
> > >  vm_start();
> > >  }
> > >  hmp_handle_error(mon, err);
> > 
> > Paolo had sent a different fix here:
> > 
> >   https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg01093.html
> 
> Hm... I missed that patch, but doesn't it just generalise the buggy HMP
> code instead of fixing it? That is, we still resume the VM on failure
> rather than on success?

Yes, if I use my iotest patch on Paolo's patch it shows that it is
still broken.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] hmp: Fix loadvm to resume the VM on success instead of failure

2021-05-11 Thread Kevin Wolf
Am 11.05.2021 um 18:49 hat Daniel P. Berrangé geschrieben:
> On Tue, May 11, 2021 at 06:31:51PM +0200, Kevin Wolf wrote:
> > Commit f61fe11aa6f broke hmp_loadvm() by adding an incorrect negation
> > when converting from 0/-errno return values to a bool value. The result
> > is that loadvm resumes the VM now if it failed and keeps it stopped if
> > it failed. Fix it to restore the old behaviour and do it the other way
> > around.
> > 
> > Fixes: f61fe11aa6f7f8f0ffe4ddaa56a8108f3ab57854
> > Cc: qemu-sta...@nongnu.org
> > Reported-by: Yanhui Ma 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  monitor/hmp-cmds.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> > index 0ad5b77477..cc15d9b6ee 100644
> > --- a/monitor/hmp-cmds.c
> > +++ b/monitor/hmp-cmds.c
> > @@ -1133,7 +1133,7 @@ void hmp_loadvm(Monitor *mon, const QDict *qdict)
> >  
> >  vm_stop(RUN_STATE_RESTORE_VM);
> >  
> > -if (!load_snapshot(name, NULL, false, NULL, ) && saved_vm_running) 
> > {
> > +if (load_snapshot(name, NULL, false, NULL, ) && saved_vm_running) {
> >  vm_start();
> >  }
> >  hmp_handle_error(mon, err);
> 
> Paolo had sent a different fix here:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg01093.html

Hm... I missed that patch, but doesn't it just generalise the buggy HMP
code instead of fixing it? That is, we still resume the VM on failure
rather than on success?

> As with my feedback there, I think we should be adding test coverage
> when fixing this. How about this:
> [...]

Yes, this looks good.

Kevin




Re: [PATCH v2 09/10] qcow2-refcount: check_refcounts_l1(): check reserved bits

2021-05-11 Thread Kirill Tkhai
On 05.05.2021 09:59, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2.h  | 1 +
>  block/qcow2-refcount.c | 6 ++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index b8b1093b61..58fd7f1678 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -586,6 +586,7 @@ typedef enum QCow2MetadataOverlap {
>  (QCOW2_OL_CACHED | QCOW2_OL_INACTIVE_L2)
>  
>  #define L1E_OFFSET_MASK 0x00fffe00ULL
> +#define L1E_RESERVED_MASK 0x7f0001ffULL
>  #define L2E_OFFSET_MASK 0x00fffe00ULL
>  #define L2E_STD_RESERVED_MASK 0x3f0001feULL
>  
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 69294a94fe..15c4f6b075 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1904,6 +1904,12 @@ static int check_refcounts_l1(BlockDriverState *bs,
>  continue;
>  }
>  
> +if (l1_table[i] & L1E_RESERVED_MASK) {
> +fprintf(stderr, "ERROR found L1 entry with reserved bits set: "
> +"%" PRIx64, l1_table[i]);

'\n' is missed.

The rest is OK for me.

Tested-by: Kirill Tkhai 

> +res->corruptions++;
> +}
> +
>  l2_offset = l1_table[i] & L1E_OFFSET_MASK;
>  
>  /* Mark L2 table as used */
> 




Re: [PATCH v2 07/10] qcow2-refcount: check_refcounts_l2(): check reserved bits

2021-05-11 Thread Kirill Tkhai
On 05.05.2021 09:59, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2.h  |  1 +
>  block/qcow2-refcount.c | 12 +++-
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index c0e1e83796..b8b1093b61 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -587,6 +587,7 @@ typedef enum QCow2MetadataOverlap {
>  
>  #define L1E_OFFSET_MASK 0x00fffe00ULL
>  #define L2E_OFFSET_MASK 0x00fffe00ULL
> +#define L2E_STD_RESERVED_MASK 0x3f0001feULL
>  
>  #define REFT_OFFSET_MASK 0xfe00ULL
>  
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 062ec48a15..47cc82449b 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1682,8 +1682,18 @@ static int check_refcounts_l2(BlockDriverState *bs, 
> BdrvCheckResult *res,
>  int csize;
>  l2_entry = get_l2_entry(s, l2_table, i);
>  uint64_t l2_bitmap = get_l2_bitmap(s, l2_table, i);
> +QCow2ClusterType type = qcow2_get_cluster_type(bs, l2_entry);
>  
> -switch (qcow2_get_cluster_type(bs, l2_entry)) {
> +if (type != QCOW2_CLUSTER_COMPRESSED) {
> +/* Check reserved bits of Standard Cluster Descriptor */
> +if (l2_entry & L2E_STD_RESERVED_MASK) {
> +fprintf(stderr, "ERROR found l2 entry with reserved bits 
> set: "
> +"%" PRIx64, l2_entry);

'\n' is missed.

The rest is OK for me.

Tested-by: Kirill Tkhai 

> +res->corruptions++;
> +}
> +}
> +
> +switch (type) {
>  case QCOW2_CLUSTER_COMPRESSED:
>  /* Compressed clusters don't have QCOW_OFLAG_COPIED */
>  if (l2_entry & QCOW_OFLAG_COPIED) {
> 




Re: [PATCH 0/3] hw/virtio: Constify VirtIOFeature

2021-05-11 Thread Richard Henderson

On 5/11/21 5:41 AM, Philippe Mathieu-Daudé wrote:

Philippe Mathieu-Daudé (3):
   hw/virtio: Pass virtio_feature_get_config_size() a const argument
   virtio-blk: Constify VirtIOFeature feature_sizes[]
   virtio-net: Constify VirtIOFeature feature_sizes[]


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 06/10] qcow2-refcount: check_refcounts_l2(): check l2_bitmap

2021-05-11 Thread Kirill Tkhai
On 05.05.2021 09:59, Vladimir Sementsov-Ogievskiy wrote:
> Check subcluster bitmap of the l2 entry for different types of
> clusters:
> 
>  - for compressed it must be zero
>  - for allocated check consistency of two parts of the bitmap
>  - for unallocated all subclusters should be unallocated
>(or zero-plain)
> 
> For unallocated clusters we can safely fix the entry by making it
> zero-plain.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Eric Blake 

Tested-by: Kirill Tkhai 

> ---
>  block/qcow2-refcount.c | 30 +-
>  1 file changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index f48c5e1b5d..062ec48a15 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1681,6 +1681,7 @@ static int check_refcounts_l2(BlockDriverState *bs, 
> BdrvCheckResult *res,
>  uint64_t coffset;
>  int csize;
>  l2_entry = get_l2_entry(s, l2_table, i);
> +uint64_t l2_bitmap = get_l2_bitmap(s, l2_table, i);
>  
>  switch (qcow2_get_cluster_type(bs, l2_entry)) {
>  case QCOW2_CLUSTER_COMPRESSED:
> @@ -1700,6 +1701,14 @@ static int check_refcounts_l2(BlockDriverState *bs, 
> BdrvCheckResult *res,
>  break;
>  }
>  
> +if (l2_bitmap) {
> +fprintf(stderr, "ERROR compressed cluster %d with non-zero "
> +"subcluster allocation bitmap, entry=0x%" PRIx64 
> "\n",
> +i, l2_entry);
> +res->corruptions++;
> +break;
> +}
> +
>  /* Mark cluster as used */
>  qcow2_parse_compressed_l2_entry(bs, l2_entry, , );
>  ret = qcow2_inc_refcounts_imrt(
> @@ -1727,13 +1736,19 @@ static int check_refcounts_l2(BlockDriverState *bs, 
> BdrvCheckResult *res,
>  {
>  uint64_t offset = l2_entry & L2E_OFFSET_MASK;
>  
> +if ((l2_bitmap >> 32) & l2_bitmap) {
> +res->corruptions++;
> +fprintf(stderr, "ERROR offset=%" PRIx64 ": Allocated "
> +"cluster has corrupted subcluster allocation 
> bitmap\n",
> +offset);
> +}
> +
>  /* Correct offsets are cluster aligned */
>  if (offset_into_cluster(s, offset)) {
>  bool contains_data;
>  res->corruptions++;
>  
>  if (has_subclusters(s)) {
> -uint64_t l2_bitmap = get_l2_bitmap(s, l2_table, i);
>  contains_data = (l2_bitmap & QCOW_L2_BITMAP_ALL_ALLOC);
>  } else {
>  contains_data = !(l2_entry & QCOW_OFLAG_ZERO);
> @@ -1800,6 +1815,19 @@ static int check_refcounts_l2(BlockDriverState *bs, 
> BdrvCheckResult *res,
>  
>  case QCOW2_CLUSTER_ZERO_PLAIN:
>  case QCOW2_CLUSTER_UNALLOCATED:
> +if (l2_bitmap & QCOW_L2_BITMAP_ALL_ALLOC) {
> +res->corruptions++;
> +fprintf(stderr, "%s: Unallocated "
> +"cluster has non-zero subcluster allocation map\n",
> +fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR");
> +if (fix & BDRV_FIX_ERRORS) {
> +ret = fix_l2_entry_by_zero(bs, res, l2_offset, l2_table, 
> i,
> +   active, _overlap);
> +if (metadata_overlap) {
> +return ret;
> +}
> +}
> +}
>  break;
>  
>  default:
> 




Re: [PATCH] hmp: Fix loadvm to resume the VM on success instead of failure

2021-05-11 Thread Daniel P . Berrangé
On Tue, May 11, 2021 at 06:31:51PM +0200, Kevin Wolf wrote:
> Commit f61fe11aa6f broke hmp_loadvm() by adding an incorrect negation
> when converting from 0/-errno return values to a bool value. The result
> is that loadvm resumes the VM now if it failed and keeps it stopped if
> it failed. Fix it to restore the old behaviour and do it the other way
> around.
> 
> Fixes: f61fe11aa6f7f8f0ffe4ddaa56a8108f3ab57854
> Cc: qemu-sta...@nongnu.org
> Reported-by: Yanhui Ma 
> Signed-off-by: Kevin Wolf 
> ---
>  monitor/hmp-cmds.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index 0ad5b77477..cc15d9b6ee 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -1133,7 +1133,7 @@ void hmp_loadvm(Monitor *mon, const QDict *qdict)
>  
>  vm_stop(RUN_STATE_RESTORE_VM);
>  
> -if (!load_snapshot(name, NULL, false, NULL, ) && saved_vm_running) {
> +if (load_snapshot(name, NULL, false, NULL, ) && saved_vm_running) {
>  vm_start();
>  }
>  hmp_handle_error(mon, err);

Paolo had sent a different fix here:

  https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg01093.html

As with my feedback there, I think we should be adding test coverage
when fixing this. How about this:

diff --git a/tests/qemu-iotests/068 b/tests/qemu-iotests/068
index 54e49c8ffa..137c5d0577 100755
--- a/tests/qemu-iotests/068
+++ b/tests/qemu-iotests/068
@@ -77,7 +77,7 @@ for extra_args in \
 # Give qemu some time to boot before saving the VM state
 { sleep 1; printf "savevm 0\nquit\n"; } | _qemu $extra_args
 # Now try to continue from that VM state (this should just work)
-{ sleep 1; printf "loadvm 0\nloadvm 0\nquit\n"; } | _qemu $extra_args -S
+{ sleep 1; printf "info status\nloadvm 0\ninfo status\ncont\ninfo 
status\nloadvm 0\ninfo status\nquit\n"; } | _qemu $extra_args -S
 done
 
 # success, all done
diff --git a/tests/qemu-iotests/068.out b/tests/qemu-iotests/068.out
index f07a938a38..75c0a5df5f 100644
--- a/tests/qemu-iotests/068.out
+++ b/tests/qemu-iotests/068.out
@@ -7,8 +7,17 @@ QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) savevm 0
 (qemu) quit
 QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) info status
+VM status: paused (prelaunch)
 (qemu) loadvm 0
+(qemu) info status
+VM status: paused (prelaunch)
+(qemu) cont
+(qemu) info status
+VM status: running
 (qemu) loadvm 0
+(qemu) info status
+VM status: running
 (qemu) quit
 
 === Saving and reloading a VM state to/from a qcow2 image (-object 
iothread,id=iothread0 -set device.hba0.iothread=iothread0) ===
@@ -18,7 +27,16 @@ QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) savevm 0
 (qemu) quit
 QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) info status
+VM status: paused (prelaunch)
 (qemu) loadvm 0
+(qemu) info status
+VM status: paused (prelaunch)
+(qemu) cont
+(qemu) info status
+VM status: running
 (qemu) loadvm 0
+(qemu) info status
+VM status: running
 (qemu) quit
 *** done


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH] hmp: Fix loadvm to resume the VM on success instead of failure

2021-05-11 Thread Kevin Wolf
Commit f61fe11aa6f broke hmp_loadvm() by adding an incorrect negation
when converting from 0/-errno return values to a bool value. The result
is that loadvm resumes the VM now if it failed and keeps it stopped if
it failed. Fix it to restore the old behaviour and do it the other way
around.

Fixes: f61fe11aa6f7f8f0ffe4ddaa56a8108f3ab57854
Cc: qemu-sta...@nongnu.org
Reported-by: Yanhui Ma 
Signed-off-by: Kevin Wolf 
---
 monitor/hmp-cmds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 0ad5b77477..cc15d9b6ee 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1133,7 +1133,7 @@ void hmp_loadvm(Monitor *mon, const QDict *qdict)
 
 vm_stop(RUN_STATE_RESTORE_VM);
 
-if (!load_snapshot(name, NULL, false, NULL, ) && saved_vm_running) {
+if (load_snapshot(name, NULL, false, NULL, ) && saved_vm_running) {
 vm_start();
 }
 hmp_handle_error(mon, err);
-- 
2.30.2




[PULL 1/2] hw/block/pflash_cfi02: Set romd mode in pflash_cfi02_realize()

2021-05-11 Thread Philippe Mathieu-Daudé
From: Philippe Mathieu-Daudé 

The ROMD mode isn't related to mapping setup.
Ideally we'd set this mode when the state machine resets,
but for now simply move it to pflash_cfi02_realize() to
not introduce logical change.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20210325120921.858993-2-f4...@amsat.org>
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/pflash_cfi02.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index 25c053693ce..35e30bb812c 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -173,7 +173,6 @@ static void pflash_setup_mappings(PFlashCFI02 *pfl)
  "pflash-alias", >orig_mem, 0, size);
 memory_region_add_subregion(>mem, i * size, 
>mem_mappings[i]);
 }
-pfl->rom_mode = true;
 }
 
 static void pflash_reset_state_machine(PFlashCFI02 *pfl)
@@ -917,6 +916,7 @@ static void pflash_cfi02_realize(DeviceState *dev, Error 
**errp)
 /* Allocate memory for a bitmap for sectors being erased. */
 pfl->sector_erase_map = bitmap_new(pfl->total_sectors);
 
+pfl->rom_mode = true;
 pflash_setup_mappings(pfl);
 sysbus_init_mmio(SYS_BUS_DEVICE(dev), >mem);
 
-- 
2.26.3




[PULL 2/2] hw/block/pflash_cfi02: Do not create aliases when not necessary

2021-05-11 Thread Philippe Mathieu-Daudé
From: Philippe Mathieu-Daudé 

When no mapping is requested, it is pointless to create
alias regions.
Only create them when multiple mappings are requested to
simplify the memory layout. The flatview is not changed.

For example using 'qemu-system-sh4 -M r2d -S -monitor stdio',

* before:

  (qemu) info mtree
  address-space: memory
- (prio 0, i/o): system
  -00ff (prio 0, i/o): pflash
-00ff (prio 0, romd): alias pflash-alias 
@r2d.flash -00ff
  0400-043f (prio 0, i/o): r2d-fpga
  0c00-0fff (prio 0, ram): r2d.sdram
  (qemu) info mtree -f
  FlatView #0
   AS "memory", root: system
   AS "cpu-memory-0", root: system
   Root memory region: system
-00ff (prio 0, romd): r2d.flash
0400-043f (prio 0, i/o): r2d-fpga
0c00-0fff (prio 0, ram): r2d.sdram

* after:

  (qemu) info mtree
  address-space: memory
- (prio 0, i/o): system
  -00ff (prio 0, romd): r2d.flash
  0400-043f (prio 0, i/o): r2d-fpga
  0c00-0fff (prio 0, ram): r2d.sdram
  (qemu) info mtree -f
  FlatView #0
   AS "memory", root: system
   AS "cpu-memory-0", root: system
   Root memory region: system
-00ff (prio 0, romd): r2d.flash
0400-043f (prio 0, i/o): r2d-fpga
0c00-0fff (prio 0, ram): r2d.sdram

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20210325120921.858993-3-f4...@amsat.org>
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/pflash_cfi02.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index 35e30bb812c..02c514fb6e0 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -917,8 +917,12 @@ static void pflash_cfi02_realize(DeviceState *dev, Error 
**errp)
 pfl->sector_erase_map = bitmap_new(pfl->total_sectors);
 
 pfl->rom_mode = true;
-pflash_setup_mappings(pfl);
-sysbus_init_mmio(SYS_BUS_DEVICE(dev), >mem);
+if (pfl->mappings > 1) {
+pflash_setup_mappings(pfl);
+sysbus_init_mmio(SYS_BUS_DEVICE(dev), >mem);
+} else {
+sysbus_init_mmio(SYS_BUS_DEVICE(dev), >orig_mem);
+}
 
 timer_init_ns(>timer, QEMU_CLOCK_VIRTUAL, pflash_timer, pfl);
 pfl->status = 0;
-- 
2.26.3




[PULL 0/2] pflash patches for 2021-05-11

2021-05-11 Thread Philippe Mathieu-Daudé
The following changes since commit f9a576a818044133f8564e0d243ebd97df0b3280:

  Merge remote-tracking branch 
'remotes/dgilbert-gitlab/tags/pull-virtiofs-20210506' into staging (2021-05-11 
13:03:44 +0100)

are available in the Git repository at:

  https://github.com/philmd/qemu.git tags/pflash-20210511

for you to fetch changes up to 27545c9df24f509c6d1c1f17478281a357125554:

  hw/block/pflash_cfi02: Do not create aliases when not necessary (2021-05-11 
18:11:02 +0200)


Parallel NOR Flash patches queue

- Simplify memory layout when no pflash_cfi02 mapping requested


Philippe Mathieu-Daudé (2):
  hw/block/pflash_cfi02: Set romd mode in pflash_cfi02_realize()
  hw/block/pflash_cfi02: Do not create aliases when not necessary

 hw/block/pflash_cfi02.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

-- 
2.26.3





Re: [PATCH] hw/block/nvme: re-enable NVMe PCI hotplug

2021-05-11 Thread Hannes Reinecke

On 5/11/21 6:03 PM, Klaus Jensen wrote:

On May 11 16:54, Hannes Reinecke wrote:

On 5/11/21 3:37 PM, Klaus Jensen wrote:

On May 11 15:12, Hannes Reinecke wrote:

On 5/11/21 2:22 PM, Klaus Jensen wrote:

[ .. ]
The hotplug fix looks good - I'll post a series that tries to 
integrate

both.


Ta.

The more I think about it, the more I think we should be looking into
reparenting the namespaces to the subsystem.
That would have the _immediate_ benefit that 'device_del' and
'device_add' becomes symmetric (ie one doesn't have to do a separate
'device_add nvme-ns'), as the nvme namespace is not affected by the
hotplug event.



I have that working, but I'm struggling with a QEMU API technicality in
that I apparently cannot simply move the NvmeBus creation to the
nvme-subsys device. For some reason the bus is not available for the
nvme-ns devices. That is, if one does something like this:

  -device nvme-subsys,...
  -device nvme-ns,...

Then I get an error that "no 'nvme-bus' bus found for device 'nvme'ns".
This is probably just me not grok'ing the qdev well enough, so I'll keep
trying to fix that. What works now is to have the regular setup:


_Normally_ the 'id' of the parent device spans a bus, so the syntax
should be

-device nvme-subsys,id=subsys1,...
-device nvme-ns,bus=subsys1,...


Yeah, I know, I just oversimplified the example. This *is* how I wanted 
it to work ;)




As for the nvme device I would initially expose any namespace from the
subsystem to the controller; the nvme spec has some concept of 'active'
or 'inactive' namespaces which would allow us to blank out individual
namespaces on a per-controller basis, but I fear that's not easy to
model with qdev and the structure above.



The nvme-ns device already supports the boolean 'detached' parameter to 
support the concept of an inactive namespace.


Yeah, but that doesn't really work if we move the namespace to the 
subsystem; the 'detached' parameter is for the controller<->namespace

relationship.
That's why I meant we have to have some sort of NSID map for the 
controller such that the controller knows with NSID to access.
I guess we can copy the trick with the NSID array, and reverse the 
operation we have now wrt subsystem; keep the main NSID array in the 
subsystem, and per-controller NSID arrays holding those which can be 
accessed.


And ignore the commandline for now; figure that one out later.

Cheers,

Hannes

  -device nvme-subsys,...
  -device nvme,...
  -device nvme-ns,...

And the nvme-ns device will then reparent to the NvmeBus on nvme-subsys
(which magically now IS available when nvme-ns is realized). This has
the same end result, but I really would like that the namespaces could
be specified as children of the subsys directly.


Shudder.
Automatic reparenting.
To my understanding from qdev that shouldn't even be possible.
Please don't.



It's perfectly possible with the API and used to implement stuff like 
failover. We are not changing the parent object, we are changing the 
parent bus. hw/sd does something like this (but does mention that its a 
bit of a hack). In this case I'd say we could argue to get away with it 
as well.
Allowing the nvme-ns device to be a child of the controller allows the 
initially attached controller of non-shared namespaces to be easily 
expressible. But I agree, the approach is a bit wacky, which is why I 
havnt posted anything yet.


Yep, I did try to implement multipathing for SCSI at one point, and that 
became patently horrible as it would run against qdev where everything 
is ordered within a tree.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer



Re: [PATCH] hw/block/nvme: re-enable NVMe PCI hotplug

2021-05-11 Thread Klaus Jensen

On May 11 16:54, Hannes Reinecke wrote:

On 5/11/21 3:37 PM, Klaus Jensen wrote:

On May 11 15:12, Hannes Reinecke wrote:

On 5/11/21 2:22 PM, Klaus Jensen wrote:

[ .. ]

The hotplug fix looks good - I'll post a series that tries to integrate
both.


Ta.

The more I think about it, the more I think we should be looking into
reparenting the namespaces to the subsystem.
That would have the _immediate_ benefit that 'device_del' and
'device_add' becomes symmetric (ie one doesn't have to do a separate
'device_add nvme-ns'), as the nvme namespace is not affected by the
hotplug event.



I have that working, but I'm struggling with a QEMU API technicality in
that I apparently cannot simply move the NvmeBus creation to the
nvme-subsys device. For some reason the bus is not available for the
nvme-ns devices. That is, if one does something like this:

  -device nvme-subsys,...
  -device nvme-ns,...

Then I get an error that "no 'nvme-bus' bus found for device 'nvme'ns".
This is probably just me not grok'ing the qdev well enough, so I'll keep
trying to fix that. What works now is to have the regular setup:


_Normally_ the 'id' of the parent device spans a bus, so the syntax
should be

-device nvme-subsys,id=subsys1,...
-device nvme-ns,bus=subsys1,...


Yeah, I know, I just oversimplified the example. This *is* how I wanted 
it to work ;)




As for the nvme device I would initially expose any namespace from the
subsystem to the controller; the nvme spec has some concept of 'active'
or 'inactive' namespaces which would allow us to blank out individual
namespaces on a per-controller basis, but I fear that's not easy to
model with qdev and the structure above.



The nvme-ns device already supports the boolean 'detached' parameter to 
support the concept of an inactive namespace.



  -device nvme-subsys,...
  -device nvme,...
  -device nvme-ns,...

And the nvme-ns device will then reparent to the NvmeBus on nvme-subsys
(which magically now IS available when nvme-ns is realized). This has
the same end result, but I really would like that the namespaces could
be specified as children of the subsys directly.


Shudder.
Automatic reparenting.
To my understanding from qdev that shouldn't even be possible.
Please don't.



It's perfectly possible with the API and used to implement stuff like 
failover. We are not changing the parent object, we are changing the 
parent bus. hw/sd does something like this (but does mention that its a 
bit of a hack). In this case I'd say we could argue to get away with it 
as well. 

Allowing the nvme-ns device to be a child of the controller allows the 
initially attached controller of non-shared namespaces to be easily 
expressible. But I agree, the approach is a bit wacky, which is why I 
havnt posted anything yet.


signature.asc
Description: PGP signature


[PATCH] block: add more commands to preconfig mode

2021-05-11 Thread Paolo Bonzini
Most block device commands do not require a fully constructed machine.
Allow running them before machine initialization has concluded.

Signed-off-by: Paolo Bonzini 
---
 hmp-commands.hx|  14 +
 qapi/block-core.json   | 117 +++--
 qapi/block-export.json |  21 +---
 qapi/block.json|   6 ++-
 4 files changed, 110 insertions(+), 48 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 146a13c896..901a50ebd1 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -78,6 +78,7 @@ ERST
 .help   = "resize a block image",
 .cmd= hmp_block_resize,
 .coroutine  = true,
+.flags  = "p",
 },
 
 SRST
@@ -94,6 +95,7 @@ ERST
 .params = "device [speed [base]]",
 .help   = "copy data from a backing file into a block device",
 .cmd= hmp_block_stream,
+.flags  = "p",
 },
 
 SRST
@@ -107,6 +109,7 @@ ERST
 .params = "device speed",
 .help   = "set maximum speed for a background block operation",
 .cmd= hmp_block_job_set_speed,
+.flags  = "p",
 },
 
 SRST
@@ -122,6 +125,7 @@ ERST
   "\n\t\t\t if you want to abort the operation immediately"
   "\n\t\t\t instead of keep running until data is in 
sync)",
 .cmd= hmp_block_job_cancel,
+.flags  = "p",
 },
 
 SRST
@@ -135,6 +139,7 @@ ERST
 .params = "device",
 .help   = "stop an active background block operation",
 .cmd= hmp_block_job_complete,
+.flags  = "p",
 },
 
 SRST
@@ -149,6 +154,7 @@ ERST
 .params = "device",
 .help   = "pause an active background block operation",
 .cmd= hmp_block_job_pause,
+.flags  = "p",
 },
 
 SRST
@@ -162,6 +168,7 @@ ERST
 .params = "device",
 .help   = "resume a paused background block operation",
 .cmd= hmp_block_job_resume,
+.flags  = "p",
 },
 
 SRST
@@ -1396,6 +1403,7 @@ ERST
 .params = "nbd_server_start [-a] [-w] host:port",
 .help   = "serve block devices on the given host and port",
 .cmd= hmp_nbd_server_start,
+.flags  = "p",
 },
 SRST
 ``nbd_server_start`` *host*:*port*
@@ -1411,6 +1419,7 @@ ERST
 .params = "nbd_server_add [-w] device [name]",
 .help   = "export a block device via NBD",
 .cmd= hmp_nbd_server_add,
+.flags  = "p",
 },
 SRST
 ``nbd_server_add`` *device* [ *name* ]
@@ -1426,6 +1435,7 @@ ERST
 .params = "nbd_server_remove [-f] name",
 .help   = "remove an export previously exposed via NBD",
 .cmd= hmp_nbd_server_remove,
+.flags  = "p",
 },
 SRST
 ``nbd_server_remove [-f]`` *name*
@@ -1442,6 +1452,7 @@ ERST
 .params = "nbd_server_stop",
 .help   = "stop serving block devices using the NBD protocol",
 .cmd= hmp_nbd_server_stop,
+.flags  = "p",
 },
 SRST
 ``nbd_server_stop``
@@ -1471,6 +1482,7 @@ ERST
 .params = "getfd name",
 .help   = "receive a file descriptor via SCM rights and assign it 
a name",
 .cmd= hmp_getfd,
+.flags  = "p",
 },
 
 SRST
@@ -1486,6 +1498,7 @@ ERST
 .params = "closefd name",
 .help   = "close a file descriptor previously passed via SCM 
rights",
 .cmd= hmp_closefd,
+.flags  = "p",
 },
 
 SRST
@@ -1501,6 +1514,7 @@ ERST
 .params = "device bps bps_rd bps_wr iops iops_rd iops_wr",
 .help   = "change I/O throttle limits for a block drive",
 .cmd= hmp_block_set_io_throttle,
+.flags  = "p",
 },
 
 SRST
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6d227924d0..8bbc06ac86 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -680,7 +680,8 @@
 #}
 #
 ##
-{ 'command': 'query-block', 'returns': ['BlockInfo'] }
+{ 'command': 'query-block', 'returns': ['BlockInfo'],
+  'allow-preconfig': true }
 
 
 ##
@@ -1056,7 +1057,8 @@
 ##
 { 'command': 'query-blockstats',
   'data': { '*query-nodes': 'bool' },
-  'returns': ['BlockStats'] }
+  'returns': ['BlockStats'],
+  'allow-preconfig': true }
 
 ##
 # @BlockdevOnError:
@@ -1205,7 +1207,8 @@
 #
 # Since: 1.1
 ##
-{ 'command': 'query-block-jobs', 'returns': ['BlockJobInfo'] }
+{ 'command': 'query-block-jobs', 'returns': ['BlockJobInfo'],
+  'allow-preconfig': true }
 
 ##
 # @block_resize:
@@ -1236,7 +1239,8 @@
   'data': { '*device': 'str',
 '*node-name': 'str',
 'size': 'int' },
-  'coroutine': true }
+  'coroutine': true,
+  'allow-preconfig': true }
 
 ##
 # @NewImageMode:
@@ -1447,7 +1451,8 @@
 #
 ##
 { 'command': 'blockdev-snapshot-sync',
-  'data': 'BlockdevSnapshotSync' 

Re: [PATCH] hw/block/nvme: re-enable NVMe PCI hotplug

2021-05-11 Thread Hannes Reinecke
On 5/11/21 3:37 PM, Klaus Jensen wrote:
> On May 11 15:12, Hannes Reinecke wrote:
>> On 5/11/21 2:22 PM, Klaus Jensen wrote:
[ .. ]
>>> The hotplug fix looks good - I'll post a series that tries to integrate
>>> both.
>>>
>> Ta.
>>
>> The more I think about it, the more I think we should be looking into
>> reparenting the namespaces to the subsystem.
>> That would have the _immediate_ benefit that 'device_del' and
>> 'device_add' becomes symmetric (ie one doesn't have to do a separate
>> 'device_add nvme-ns'), as the nvme namespace is not affected by the
>> hotplug event.
>>
> 
> I have that working, but I'm struggling with a QEMU API technicality in
> that I apparently cannot simply move the NvmeBus creation to the
> nvme-subsys device. For some reason the bus is not available for the
> nvme-ns devices. That is, if one does something like this:
> 
>   -device nvme-subsys,...
>   -device nvme-ns,...
> 
> Then I get an error that "no 'nvme-bus' bus found for device 'nvme'ns".
> This is probably just me not grok'ing the qdev well enough, so I'll keep
> trying to fix that. What works now is to have the regular setup:
> 
_Normally_ the 'id' of the parent device spans a bus, so the syntax
should be

-device nvme-subsys,id=subsys1,...
-device nvme-ns,bus=subsys1,...

As for the nvme device I would initially expose any namespace from the
subsystem to the controller; the nvme spec has some concept of 'active'
or 'inactive' namespaces which would allow us to blank out individual
namespaces on a per-controller basis, but I fear that's not easy to
model with qdev and the structure above.

>   -device nvme-subsys,...
>   -device nvme,...
>   -device nvme-ns,...
> 
> And the nvme-ns device will then reparent to the NvmeBus on nvme-subsys
> (which magically now IS available when nvme-ns is realized). This has
> the same end result, but I really would like that the namespaces could
> be specified as children of the subsys directly.
> 
Shudder.
Automatic reparenting.
To my understanding from qdev that shouldn't even be possible.
Please don't.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke Kernel Storage Architect
h...@suse.de   +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)



Re: [PATCH] block: Improve backing file validation

2021-05-11 Thread lizhij...@fujitsu.com

On 2021/5/10 16:41, Daniel P. Berrangé wrote:
> On Mon, May 10, 2021 at 12:30:45PM +0800, Li Zhijian wrote:
>> Image below user cases:
>> case 1:
>> ```
>> $ qemu-img create -f raw source.raw 1G
>> $ qemu-img create -f qcow2 -F raw -b source.raw ./source.raw
>> qemu-img info source.raw
>> image: source.raw
>> file format: qcow2
>> virtual size: 193K (197120 bytes)
>> disk size: 196K
>> cluster_size: 65536
>> backing file: source.raw <<
>> backing file format: raw
>> Format specific information:
>>  compat: 1.1
>>  lazy refcounts: false
>>  refcount bits: 16
>>  corrupt: false
>> ```
>>
>> case 2:
>> ```
>> $ qemu-img create -f raw source.raw 1G
>> $ ln -sf source.raw destination.qcow2
>> $ qemu-img create -f qcow2 -F raw -b source.raw ./destination.qcow2
>> qemu-img info source.raw
>> image: source.raw
>> file format: qcow2 <<
>> virtual size: 2.0G (2147483648 bytes)
>> disk size: 196K
>> cluster_size: 65536
>> backing file: source.raw
>> backing file format: raw
>> Format specific information:
>>  compat: 1.1
>>  lazy refcounts: false
>>  refcount bits: 16
>>  corrupt: false
>> ```
>> Generally, we don't expect to corrupte the source.raw anyway, while
>> actually it does.
>>
>> Here we validate the realpath of file instead the input string.
> That still won't handle the case where you use hard links
>
>$ ln source.raw destination.qcow2
>
> To properly validate the scenarios I think it is neccessary
> to ignore the filename sentirely.
>
> Instead attempt to open both files, and if successful, fstat()
> them both, and then compare the st_dev + st_ino  fields.


Sounds great, i will update it.

Thanks

Zhijian


Re: [PATCH] hw/block/nvme: re-enable NVMe PCI hotplug

2021-05-11 Thread Klaus Jensen

On May 11 15:12, Hannes Reinecke wrote:

On 5/11/21 2:22 PM, Klaus Jensen wrote:

On May 11 09:35, Hannes Reinecke wrote:

Ever since commit e570768566 ("hw/block/nvme: support for shared
namespace in subsystem") NVMe PCI hotplug is broken, as the PCI
hotplug infrastructure will only work for the nvme devices (which
are PCI devices), but not for any attached namespaces.
So when re-adding the NVMe PCI device via 'device_add' the NVMe
controller is added, but all namespaces are missing.
This patch adds device hotplug hooks for NVMe namespaces, such that one
can call 'device_add nvme-ns' to (re-)attach the namespaces after
the PCI NVMe device 'device_add nvme' hotplug call.



Hi Hannes,

Thanks for this.

The real fix here is that namespaces are properly detached from other
controllers that it may be shared on.

But is this really the behavior we want? That nvme-ns devices always
"belongs to" (in QEMU qdev terms) an nvme device is an artifact of the
Bus/Device architecture and not really how an NVM subsystem should
behave. Removing a controller should not cause shared namespaces to
disappear from other controllers.

I have a WIP that instead adds an NvmeBus to the nvme-subsys device and
reparents the nvme-ns devices to that if the parent controller is linked
to a sybsystem. This way, nvme-ns devices wont be unrealized under the
feet of other controllers.


That would be the other direction I thought of; _technically_ NVMe
namespaces are objects of the subsystem, and 'controllers' are just
temporary objects providing access to the namespaces presented by the
subsystem.
So if you are going to rework it I'd rather make the namespaces children
objects of the subsystem, and have nsid maps per controller detailing
which nsids are accessible from the individual controllers.
That would probably a simple memcpy() to start with, but it would allow
us to modify that map via NVMe-MI and stuff.

However, if you do that you'll find that subsystems can't be hotplugged,
too; but I'm sure you'll be able to fix it up :-)


The hotplug fix looks good - I'll post a series that tries to integrate
both.


Ta.

The more I think about it, the more I think we should be looking into
reparenting the namespaces to the subsystem.
That would have the _immediate_ benefit that 'device_del' and
'device_add' becomes symmetric (ie one doesn't have to do a separate
'device_add nvme-ns'), as the nvme namespace is not affected by the
hotplug event.



I have that working, but I'm struggling with a QEMU API technicality in 
that I apparently cannot simply move the NvmeBus creation to the 
nvme-subsys device. For some reason the bus is not available for the 
nvme-ns devices. That is, if one does something like this:


  -device nvme-subsys,...
  -device nvme-ns,...

Then I get an error that "no 'nvme-bus' bus found for device 'nvme'ns". 
This is probably just me not grok'ing the qdev well enough, so I'll keep 
trying to fix that. What works now is to have the regular setup:


  -device nvme-subsys,...
  -device nvme,...
  -device nvme-ns,...

And the nvme-ns device will then reparent to the NvmeBus on nvme-subsys 
(which magically now IS available when nvme-ns is realized). This has 
the same end result, but I really would like that the namespaces could 
be specified as children of the subsys directly.


Any help from qdev experts would be appreciated! :)


This really was a quick hack to demonstrate a shortcoming in the linux
NVMe stack (cf 'nvme-mpath: delete disk after last connection' if you
are interested in details), so I'm sure there is room for improvement.



I follow linux-nvme, so I saw the patch ;)


And the prime reason for sending it out was to gauge interest by
qemu-devel; I have a somewhat mixed experience when sending patches to
the qemu ML ...



Your contribution is very much appreciated :)


signature.asc
Description: PGP signature


Re: [PATCH] hw/block/nvme: re-enable NVMe PCI hotplug

2021-05-11 Thread Hannes Reinecke
On 5/11/21 2:22 PM, Klaus Jensen wrote:
> On May 11 09:35, Hannes Reinecke wrote:
>> Ever since commit e570768566 ("hw/block/nvme: support for shared
>> namespace in subsystem") NVMe PCI hotplug is broken, as the PCI
>> hotplug infrastructure will only work for the nvme devices (which
>> are PCI devices), but not for any attached namespaces.
>> So when re-adding the NVMe PCI device via 'device_add' the NVMe
>> controller is added, but all namespaces are missing.
>> This patch adds device hotplug hooks for NVMe namespaces, such that one
>> can call 'device_add nvme-ns' to (re-)attach the namespaces after
>> the PCI NVMe device 'device_add nvme' hotplug call.
>>
> 
> Hi Hannes,
> 
> Thanks for this.
> 
> The real fix here is that namespaces are properly detached from other
> controllers that it may be shared on.
> 
> But is this really the behavior we want? That nvme-ns devices always
> "belongs to" (in QEMU qdev terms) an nvme device is an artifact of the
> Bus/Device architecture and not really how an NVM subsystem should
> behave. Removing a controller should not cause shared namespaces to
> disappear from other controllers.
> 
> I have a WIP that instead adds an NvmeBus to the nvme-subsys device and
> reparents the nvme-ns devices to that if the parent controller is linked
> to a sybsystem. This way, nvme-ns devices wont be unrealized under the
> feet of other controllers.
> 
That would be the other direction I thought of; _technically_ NVMe
namespaces are objects of the subsystem, and 'controllers' are just
temporary objects providing access to the namespaces presented by the
subsystem.
So if you are going to rework it I'd rather make the namespaces children
objects of the subsystem, and have nsid maps per controller detailing
which nsids are accessible from the individual controllers.
That would probably a simple memcpy() to start with, but it would allow
us to modify that map via NVMe-MI and stuff.

However, if you do that you'll find that subsystems can't be hotplugged,
too; but I'm sure you'll be able to fix it up :-)

> The hotplug fix looks good - I'll post a series that tries to integrate
> both.
> 
Ta.

The more I think about it, the more I think we should be looking into
reparenting the namespaces to the subsystem.
That would have the _immediate_ benefit that 'device_del' and
'device_add' becomes symmetric (ie one doesn't have to do a separate
'device_add nvme-ns'), as the nvme namespace is not affected by the
hotplug event.

This really was a quick hack to demonstrate a shortcoming in the linux
NVMe stack (cf 'nvme-mpath: delete disk after last connection' if you
are interested in details), so I'm sure there is room for improvement.

And the prime reason for sending it out was to gauge interest by
qemu-devel; I have a somewhat mixed experience when sending patches to
the qemu ML ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke Kernel Storage Architect
h...@suse.de   +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)



Re: [PATCH v2 0/6] hw/southbridge: QOM'ify vt82c686 as VT82C686B_SOUTHBRIDGE

2021-05-11 Thread BALATON Zoltan

On Tue, 11 May 2021, Philippe Mathieu-Daudé wrote:

Hi Zoltan,

On 5/11/21 1:28 PM, BALATON Zoltan wrote:

On Tue, 11 May 2021, Philippe Mathieu-Daudé wrote:

The motivation behind this series is to remove the
isa_get_irq(NULL) call to simplify the ISA generic model.

Since v1:
- rebased on top of remotes/dg-gitlab/tags/ppc-for-6.1-20210504


I'll try to have a look at these later but some notes: The pegasos2
changes are now in master so if this was before that maybe rebasing on
master is now enough.


This is what this series does, simply rebase on top of your merged
patches.


However I wonder if any changes to pegasos2.c is
needed due to changed init of the chip model or is that only affecting
82c686b?


There is no change in 'init' in this series, it is only QOM boilerplate
code churn, no logical change intended.


Please also note that pegasos2 is not enabled by default due to
needing undistributable firmware ROM so to test it you need to enable it
in default-configs/devices/ppc-softmmu.mak


I remember you said you were mostly interested in the VT8231, not
the VT82C686. This series only QOM'ify the latter.


OK as I said I haven't looked at it in detail.


What is your idea? Send the firmware off-list and explain how
the OS works and how (what) to test?


I've already sent you this info:

https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg01553.html

but I can't write a test case so if you want to automate this and make it 
part of QEMU tests then some help with that would be appreciated.


As for the firmware, once Alexey's VOF (Virtual Open Firmware, minimial OF 
emulation in QEMU) is merged I plan to try to use that to make it possible 
to boot some guests with that so no firmware image would be needed and 
pegasos2 could be enabled by default. But for now a firmware image is 
needed as guests expect an OF environment to boot.


Regards,
BALATON Zoltan

Re: [PATCH 4/6] progressmeter: protect with a mutex

2021-05-11 Thread Paolo Bonzini

On 10/05/21 13:28, Vladimir Sementsov-Ogievskiy wrote:



Could we instead add a c file and add the structure private? Then we'll 
have progress_new() and progress_free() APIs instead.


This way, it would be a lot simpler to control that nobady use structure 
fields directly.


I don't know...  I prefer embedding structs to heap allocation.

Paolo




Re: [PATCH v2 0/6] hw/southbridge: QOM'ify vt82c686 as VT82C686B_SOUTHBRIDGE

2021-05-11 Thread Philippe Mathieu-Daudé
Hi Zoltan,

On 5/11/21 1:28 PM, BALATON Zoltan wrote:
> On Tue, 11 May 2021, Philippe Mathieu-Daudé wrote:
>> The motivation behind this series is to remove the
>> isa_get_irq(NULL) call to simplify the ISA generic model.
>>
>> Since v1:
>> - rebased on top of remotes/dg-gitlab/tags/ppc-for-6.1-20210504
> 
> I'll try to have a look at these later but some notes: The pegasos2
> changes are now in master so if this was before that maybe rebasing on
> master is now enough.

This is what this series does, simply rebase on top of your merged
patches.

> However I wonder if any changes to pegasos2.c is
> needed due to changed init of the chip model or is that only affecting
> 82c686b?

There is no change in 'init' in this series, it is only QOM boilerplate
code churn, no logical change intended.

> Please also note that pegasos2 is not enabled by default due to
> needing undistributable firmware ROM so to test it you need to enable it
> in default-configs/devices/ppc-softmmu.mak

I remember you said you were mostly interested in the VT8231, not
the VT82C686. This series only QOM'ify the latter.

What is your idea? Send the firmware off-list and explain how
the OS works and how (what) to test?

Regards,

Phil.

>> Philippe Mathieu-Daudé (6):
>>  hw/isa/vt82c686: Name output IRQ as 'intr'
>>  hw/isa/vt82c686: Simplify removing unuseful qemu_allocate_irqs() call
>>  hw/isa/vt82c686: Let ISA function expose ISA IRQs
>>  hw/ide/via: Replace magic 2 value by ARRAY_SIZE / MAX_IDE_DEVS
>>  hw/ide/via: Connect IDE function output IRQs to the ISA function input
>>  hw/southbridge/vt82c686: Introduce VT82C686B_SOUTHBRIDGE



Re: [PATCH] hw/block/nvme: re-enable NVMe PCI hotplug

2021-05-11 Thread Klaus Jensen

On May 11 09:35, Hannes Reinecke wrote:

Ever since commit e570768566 ("hw/block/nvme: support for shared
namespace in subsystem") NVMe PCI hotplug is broken, as the PCI
hotplug infrastructure will only work for the nvme devices (which
are PCI devices), but not for any attached namespaces.
So when re-adding the NVMe PCI device via 'device_add' the NVMe
controller is added, but all namespaces are missing.
This patch adds device hotplug hooks for NVMe namespaces, such that one
can call 'device_add nvme-ns' to (re-)attach the namespaces after
the PCI NVMe device 'device_add nvme' hotplug call.



Hi Hannes,

Thanks for this.

The real fix here is that namespaces are properly detached from other 
controllers that it may be shared on.


But is this really the behavior we want? That nvme-ns devices always 
"belongs to" (in QEMU qdev terms) an nvme device is an artifact of the 
Bus/Device architecture and not really how an NVM subsystem should 
behave. Removing a controller should not cause shared namespaces to 
disappear from other controllers.


I have a WIP that instead adds an NvmeBus to the nvme-subsys device and 
reparents the nvme-ns devices to that if the parent controller is linked 
to a sybsystem. This way, nvme-ns devices wont be unrealized under the 
feet of other controllers.


The hotplug fix looks good - I'll post a series that tries to integrate 
both.



Fixes: e570768566 ("hw/block/nvme: support for shared namespace in subsystem")
Signed-off-by: Hannes Reinecke 
---
capstone   |  2 +-
hw/block/nvme-ns.c | 31 ++
hw/block/nvme-subsys.c | 12 +
hw/block/nvme-subsys.h |  1 +
hw/block/nvme.c| 60 +++---
hw/block/nvme.h|  1 +
roms/SLOF  |  2 +-
roms/openbios  |  2 +-
roms/u-boot|  2 +-
9 files changed, 93 insertions(+), 20 deletions(-)

diff --git a/capstone b/capstone
index f8b1b83301..22ead3e0bf 16
--- a/capstone
+++ b/capstone
@@ -1 +1 @@
-Subproject commit f8b1b833015a4ae47110ed068e0deb7106ced66d
+Subproject commit 22ead3e0bfdb87516656453336160e0a37b066bf
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7bb618f182..3a7e01f10f 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -526,6 +526,36 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
nvme_attach_ns(n, ns);
}

+static void nvme_ns_unrealize(DeviceState *dev)
+{
+NvmeNamespace *ns = NVME_NS(dev);
+BusState *s = qdev_get_parent_bus(dev);
+NvmeCtrl *n = NVME(s->parent);
+NvmeSubsystem *subsys = n->subsys;
+uint32_t nsid = ns->params.nsid;
+int i;
+
+nvme_ns_drain(ns);
+nvme_ns_shutdown(ns);
+nvme_ns_cleanup(ns);
+
+if (subsys) {
+subsys->namespaces[nsid] = NULL;
+
+if (ns->params.shared) {
+for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+NvmeCtrl *ctrl = subsys->ctrls[i];
+
+if (ctrl) {
+nvme_detach_ns(ctrl, ns);
+}
+}
+return;
+}
+}
+nvme_detach_ns(n, ns);
+}
+
static Property nvme_ns_props[] = {
DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
DEFINE_PROP_BOOL("detached", NvmeNamespace, params.detached, false),
@@ -563,6 +593,7 @@ static void nvme_ns_class_init(ObjectClass *oc, void *data)

dc->bus_type = TYPE_NVME_BUS;
dc->realize = nvme_ns_realize;
+dc->unrealize = nvme_ns_unrealize;
device_class_set_props(dc, nvme_ns_props);
dc->desc = "Virtual NVMe namespace";
}
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index 9604c19117..1c00508f33 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -42,6 +42,18 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
return cntlid;
}

+void nvme_subsys_unregister_ctrl(NvmeCtrl *n)
+{
+NvmeSubsystem *subsys = n->subsys;
+int cntlid = n->cntlid;
+
+if (!n->subsys)
+return;
+assert(cntlid < ARRAY_SIZE(subsys->ctrls));
+subsys->ctrls[cntlid] = NULL;
+n->cntlid = -1;
+}
+
static void nvme_subsys_setup(NvmeSubsystem *subsys)
{
const char *nqn = subsys->params.nqn ?
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 7d7ef5f7f1..2d8a146c4f 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -32,6 +32,7 @@ typedef struct NvmeSubsystem {
} NvmeSubsystem;

int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+void nvme_subsys_unregister_ctrl(NvmeCtrl *n);

static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
uint32_t cntlid)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 5fe082ec34..515678b686 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4963,26 +4963,12 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
}

nvme_attach_ns(ctrl, ns);
-__nvme_select_ns_iocs(ctrl, ns);
} else {
if (!nvme_ns(ctrl, nsid)) {
return 

Re: [PATCH v2 0/6] hw/southbridge: QOM'ify vt82c686 as VT82C686B_SOUTHBRIDGE

2021-05-11 Thread BALATON Zoltan

On Tue, 11 May 2021, Philippe Mathieu-Daudé wrote:

The motivation behind this series is to remove the
isa_get_irq(NULL) call to simplify the ISA generic model.

Since v1:
- rebased on top of remotes/dg-gitlab/tags/ppc-for-6.1-20210504


I'll try to have a look at these later but some notes: The pegasos2 
changes are now in master so if this was before that maybe rebasing on 
master is now enough. However I wonder if any changes to pegasos2.c is 
needed due to changed init of the chip model or is that only affecting 
82c686b? Please also note that pegasos2 is not enabled by default due to 
needing undistributable firmware ROM so to test it you need to enable it 
in default-configs/devices/ppc-softmmu.mak


Regards,
BALATON Zoltan


Philippe Mathieu-Daudé (6):
 hw/isa/vt82c686: Name output IRQ as 'intr'
 hw/isa/vt82c686: Simplify removing unuseful qemu_allocate_irqs() call
 hw/isa/vt82c686: Let ISA function expose ISA IRQs
 hw/ide/via: Replace magic 2 value by ARRAY_SIZE / MAX_IDE_DEVS
 hw/ide/via: Connect IDE function output IRQs to the ISA function input
 hw/southbridge/vt82c686: Introduce VT82C686B_SOUTHBRIDGE

hw/ide/via.c   |  31 ---
hw/isa/vt82c686.c  |  27 +-
hw/mips/fuloong2e.c|  35 +++-
hw/southbridge/vt82c686.c  | 107 +
MAINTAINERS|   1 +
hw/Kconfig |   1 +
hw/isa/Kconfig |   9 
hw/meson.build |   1 +
hw/southbridge/Kconfig |   8 +++
hw/southbridge/meson.build |   1 +
10 files changed, 164 insertions(+), 57 deletions(-)
create mode 100644 hw/southbridge/vt82c686.c
create mode 100644 hw/southbridge/Kconfig
create mode 100644 hw/southbridge/meson.build

--
2.26.3




[PATCH 0/3] hw/virtio: Constify VirtIOFeature

2021-05-11 Thread Philippe Mathieu-Daudé
Trivial patches to keep VirtIOFeature arrays read-only
(better safe than sorry).

Philippe Mathieu-Daudé (3):
  hw/virtio: Pass virtio_feature_get_config_size() a const argument
  virtio-blk: Constify VirtIOFeature feature_sizes[]
  virtio-net: Constify VirtIOFeature feature_sizes[]

 include/hw/virtio/virtio.h | 2 +-
 hw/block/virtio-blk.c  | 2 +-
 hw/net/virtio-net.c| 2 +-
 hw/virtio/virtio.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

-- 
2.26.3





[PATCH 3/3] virtio-net: Constify VirtIOFeature feature_sizes[]

2021-05-11 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/net/virtio-net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 66b9ff45118..6b7e8dd04ef 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -89,7 +89,7 @@
  VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \
  VIRTIO_NET_RSS_HASH_TYPE_UDP_EX)
 
-static VirtIOFeature feature_sizes[] = {
+static const VirtIOFeature feature_sizes[] = {
 {.flags = 1ULL << VIRTIO_NET_F_MAC,
  .end = endof(struct virtio_net_config, mac)},
 {.flags = 1ULL << VIRTIO_NET_F_STATUS,
-- 
2.26.3




[PATCH 1/3] hw/virtio: Pass virtio_feature_get_config_size() a const argument

2021-05-11 Thread Philippe Mathieu-Daudé
The VirtIOFeature structure isn't modified, mark it const.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/virtio/virtio.h | 2 +-
 hw/virtio/virtio.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b7ece7a6a89..8bab9cfb750 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -43,7 +43,7 @@ typedef struct VirtIOFeature {
 size_t end;
 } VirtIOFeature;
 
-size_t virtio_feature_get_config_size(VirtIOFeature *features,
+size_t virtio_feature_get_config_size(const VirtIOFeature *features,
   uint64_t host_features);
 
 typedef struct VirtQueue VirtQueue;
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 9e13cb9e3ad..e02544b2df7 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2981,7 +2981,7 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
 return ret;
 }
 
-size_t virtio_feature_get_config_size(VirtIOFeature *feature_sizes,
+size_t virtio_feature_get_config_size(const VirtIOFeature *feature_sizes,
   uint64_t host_features)
 {
 size_t config_size = 0;
-- 
2.26.3




Re: [PATCH v3 16/33] nbd/client-connection: add possibility of negotiation

2021-05-11 Thread Roman Kagan
On Fri, Apr 16, 2021 at 11:08:54AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Add arguments and logic to support nbd negotiation in the same thread
> after successful connection.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/nbd.h |   9 +++-
>  block/nbd.c |   4 +-
>  nbd/client-connection.c | 105 ++--
>  3 files changed, 109 insertions(+), 9 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 57381be76f..5d86e6a393 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -409,11 +409,16 @@ const char *nbd_err_lookup(int err);
>  /* nbd/client-connection.c */
>  typedef struct NBDClientConnection NBDClientConnection;
>  
> -NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr);
> +NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
> +   bool do_negotiation,
> +   const char *export_name,
> +   const char *x_dirty_bitmap,
> +   QCryptoTLSCreds *tlscreds);
>  void nbd_client_connection_release(NBDClientConnection *conn);
>  
>  QIOChannelSocket *coroutine_fn
> -nbd_co_establish_connection(NBDClientConnection *conn, Error **errp);
> +nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
> +QIOChannel **ioc, Error **errp);
>  
>  void coroutine_fn nbd_co_establish_connection_cancel(NBDClientConnection 
> *conn);
>  
> diff --git a/block/nbd.c b/block/nbd.c
> index 9bd68dcf10..5e63caaf4b 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -361,7 +361,7 @@ static coroutine_fn void 
> nbd_reconnect_attempt(BDRVNBDState *s)
>  s->ioc = NULL;
>  }
>  
> -s->sioc = nbd_co_establish_connection(s->conn, NULL);
> +s->sioc = nbd_co_establish_connection(s->conn, NULL, NULL, NULL);
>  if (!s->sioc) {
>  ret = -ECONNREFUSED;
>  goto out;
> @@ -2033,7 +2033,7 @@ static int nbd_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  goto fail;
>  }
>  
> -s->conn = nbd_client_connection_new(s->saddr);
> +s->conn = nbd_client_connection_new(s->saddr, false, NULL, NULL, NULL);
>  
>  /*
>   * establish TCP connection, return error if it fails
> diff --git a/nbd/client-connection.c b/nbd/client-connection.c
> index b45a0bd5f6..ae4a77f826 100644
> --- a/nbd/client-connection.c
> +++ b/nbd/client-connection.c
> @@ -30,14 +30,19 @@
>  #include "qapi/clone-visitor.h"
>  
>  struct NBDClientConnection {
> -/* Initialization constants */
> +/* Initialization constants, never change */
>  SocketAddress *saddr; /* address to connect to */
> +QCryptoTLSCreds *tlscreds;
> +NBDExportInfo initial_info;
> +bool do_negotiation;
>  
>  /*
>   * Result of last attempt. Valid in FAIL and SUCCESS states.
>   * If you want to steal error, don't forget to set pointer to NULL.
>   */
> +NBDExportInfo updated_info;
>  QIOChannelSocket *sioc;
> +QIOChannel *ioc;
>  Error *err;
>  
>  QemuMutex mutex;
> @@ -47,12 +52,25 @@ struct NBDClientConnection {
>  Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */
>  };
>  
> -NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr)
> +NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
> +   bool do_negotiation,
> +   const char *export_name,
> +   const char *x_dirty_bitmap,
> +   QCryptoTLSCreds *tlscreds)
>  {
>  NBDClientConnection *conn = g_new(NBDClientConnection, 1);
>  
> +object_ref(OBJECT(tlscreds));
>  *conn = (NBDClientConnection) {
>  .saddr = QAPI_CLONE(SocketAddress, saddr),
> +.tlscreds = tlscreds,
> +.do_negotiation = do_negotiation,
> +
> +.initial_info.request_sizes = true,
> +.initial_info.structured_reply = true,
> +.initial_info.base_allocation = true,
> +.initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap),
> +.initial_info.name = g_strdup(export_name ?: "")
>  };
>  
>  qemu_mutex_init(>mutex);
> @@ -68,9 +86,59 @@ static void 
> nbd_client_connection_do_free(NBDClientConnection *conn)
>  }
>  error_free(conn->err);
>  qapi_free_SocketAddress(conn->saddr);
> +object_unref(OBJECT(conn->tlscreds));
> +g_free(conn->initial_info.x_dirty_bitmap);
> +g_free(conn->initial_info.name);
>  g_free(conn);
>  }
>  
> +/*
> + * Connect to @addr and do NBD negotiation if @info is not null. If @tlscreds
> + * given @outioc is provided. @outioc is provided only on success.  The call 
> may

s/given/are given/
s/provided/returned/g

> + * 

[PATCH 2/3] virtio-blk: Constify VirtIOFeature feature_sizes[]

2021-05-11 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index d28979efb8d..f139cd7cc9c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -40,7 +40,7 @@
  * Starting from the discard feature, we can use this array to properly
  * set the config size depending on the features enabled.
  */
-static VirtIOFeature feature_sizes[] = {
+static const VirtIOFeature feature_sizes[] = {
 {.flags = 1ULL << VIRTIO_BLK_F_DISCARD,
  .end = endof(struct virtio_blk_config, discard_sector_alignment)},
 {.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
-- 
2.26.3




Re: [PATCH v3 0/2] virtio-blk: Convert QEMUBH callback to "bitops.h" API

2021-05-11 Thread Stefan Hajnoczi
On Mon, May 10, 2021 at 10:07:56PM +0200, Philippe Mathieu-Daudé wrote:
> This series follow a suggestion from Stefan to use the bitops
> API in virtio-blk:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg805139.html
> 
> Since v2:
> - clear bitmap to avoid spurious interrupts! (Stefan)
> - use 'further' in find_next docstring (Eric)
> - added Richard R-b tag
> 
> Since v1:
> - improved "bitops.h" docstring
> - addressed Richard's review comments
> 
> Philippe Mathieu-Daudé (2):
>   bitops.h: Improve find_xxx_bit() documentation
>   virtio-blk: Convert QEMUBH callback to "bitops.h" API
> 
>  include/qemu/bitops.h   | 15 ---
>  hw/block/dataplane/virtio-blk.c | 20 +---
>  2 files changed, 17 insertions(+), 18 deletions(-)
> 
> -- 
> 2.26.3
> 
> 

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PULL 0/9] scripts/simplebench patches

2021-05-11 Thread Vladimir Sementsov-Ogievskiy

11.05.2021 10:39, Peter Maydell wrote:

On Tue, 11 May 2021 at 06:59, Vladimir Sementsov-Ogievskiy
 wrote:


Hi!

Kindly ping, or what's wrong with it?


You've never sent me a pullreq before. Pull requests from new
people are a pain and take more time to deal with, so I only
look at them when I have the time to do that. Pull requests
from established submaintainers are easy because I know they
know the process and they have their gpg key set up and so on.



OK, I'll wait)

--
Best regards,
Vladimir



[PATCH] docs: add table of contents to QAPI references

2021-05-11 Thread Daniel P . Berrangé
The QAPI reference docs for the guest agent, storage daemon and QMP are
all rather long and hard to navigate unless you already know the name of
the command and can do full text search for it.

A table of contents in each doc will help people locate stuff much more
easily.

Signed-off-by: Daniel P. Berrangé 
---
 docs/interop/qemu-ga-ref.rst | 3 +++
 docs/interop/qemu-qmp-ref.rst| 3 +++
 docs/interop/qemu-storage-daemon-qmp-ref.rst | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/docs/interop/qemu-ga-ref.rst b/docs/interop/qemu-ga-ref.rst
index 3f1c4f908f..db1e946124 100644
--- a/docs/interop/qemu-ga-ref.rst
+++ b/docs/interop/qemu-ga-ref.rst
@@ -10,4 +10,7 @@ QEMU Guest Agent Protocol Reference
TODO: display the QEMU version, both here and in our Sphinx manuals
more generally.
 
+.. contents::
+   :depth: 3
+
 .. qapi-doc:: qga/qapi-schema.json
diff --git a/docs/interop/qemu-qmp-ref.rst b/docs/interop/qemu-qmp-ref.rst
index c8abaaf8e3..b5bebf6b9a 100644
--- a/docs/interop/qemu-qmp-ref.rst
+++ b/docs/interop/qemu-qmp-ref.rst
@@ -10,4 +10,7 @@ QEMU QMP Reference Manual
TODO: display the QEMU version, both here and in our Sphinx manuals
more generally.
 
+.. contents::
+   :depth: 3
+
 .. qapi-doc:: qapi/qapi-schema.json
diff --git a/docs/interop/qemu-storage-daemon-qmp-ref.rst 
b/docs/interop/qemu-storage-daemon-qmp-ref.rst
index caf9dad23a..d0ebb42ebd 100644
--- a/docs/interop/qemu-storage-daemon-qmp-ref.rst
+++ b/docs/interop/qemu-storage-daemon-qmp-ref.rst
@@ -10,4 +10,7 @@ QEMU Storage Daemon QMP Reference Manual
TODO: display the QEMU version, both here and in our Sphinx manuals
more generally.
 
+.. contents::
+   :depth: 3
+
 .. qapi-doc:: storage-daemon/qapi/qapi-schema.json
-- 
2.31.1




Re: [PATCH v2 5/5] blkdebug: protect rules and suspended_reqs with a lock

2021-05-11 Thread Emanuele Giuseppe Esposito




On 11/05/2021 10:37, Paolo Bonzini wrote:

On 07/05/21 17:29, Eric Blake wrote:

+    qemu_mutex_lock(>lock);
  QLIST_FOREACH(r, >suspended_reqs, next) {
  if (!strcmp(r->tag, tag)) {
+    qemu_mutex_unlock(>lock);
  return true;
  }
  }
+    qemu_mutex_unlock(>lock);
  return false;

Would code like this be easier to write by using QEMU_LOCK_GUARD from
lockable.h?


Yes, this one would.  In other cases (rule_check) it's not so clear cut. 
  It depends whether you prefer to have the simplest code, or rather to 
have homogeneous use of either guards or lock/unlock.


Makes sense. I will use the lock guard and fix the "yield" typos in the 
other patches.


Thank you,
Emanuele




Re: [PATCH v2 5/5] blkdebug: protect rules and suspended_reqs with a lock

2021-05-11 Thread Paolo Bonzini

On 07/05/21 17:29, Eric Blake wrote:

+qemu_mutex_lock(>lock);
  QLIST_FOREACH(r, >suspended_reqs, next) {
  if (!strcmp(r->tag, tag)) {
+qemu_mutex_unlock(>lock);
  return true;
  }
  }
+qemu_mutex_unlock(>lock);
  return false;

Would code like this be easier to write by using QEMU_LOCK_GUARD from
lockable.h?


Yes, this one would.  In other cases (rule_check) it's not so clear cut. 
 It depends whether you prefer to have the simplest code, or rather to 
have homogeneous use of either guards or lock/unlock.


Paolo




Re: [PATCH v2] block: Improve backing file validation

2021-05-11 Thread Daniel P . Berrangé
On Tue, May 11, 2021 at 01:55:18PM +0800, Li Zhijian wrote:
> Image below user cases:
> case 1:
> ```
> $ qemu-img create -f raw source.raw 1G
> $ qemu-img create -f qcow2 -F raw -b source.raw ./source.raw
> qemu-img info source.raw
> image: source.raw
> file format: qcow2
> virtual size: 193K (197120 bytes)
> disk size: 196K
> cluster_size: 65536
> backing file: source.raw <<
> backing file format: raw
> Format specific information:
> compat: 1.1
> lazy refcounts: false
> refcount bits: 16
> corrupt: false
> ```
> 
> case 2:
> ```
> $ qemu-img create -f raw source.raw 1G
> $ ln -sf source.raw destination.qcow2
> $ qemu-img create -f qcow2 -F raw -b source.raw ./destination.qcow2
> qemu-img info source.raw
> image: source.raw
> file format: qcow2 <<
> virtual size: 2.0G (2147483648 bytes)
> disk size: 196K
> cluster_size: 65536
> backing file: source.raw
> backing file format: raw
> Format specific information:
> compat: 1.1
> lazy refcounts: false
> refcount bits: 16
> corrupt: false
> ```
> Generally, we don't expect to corrupte the source.raw anyway, while
> actually it does.
> 
> Here we check their inode number instead of file name.
> 
> Suggested-by: Daniel P. Berrangé 
> Signed-off-by: Li Zhijian 
> 
> ---
> v2: utilize stat() instead of realpath() (Daniel)
> 
> Signed-off-by: Li Zhijian 
> ---
>  block.c | 39 ---
>  1 file changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 9ad725d205..db4ae57959 100644
> --- a/block.c
> +++ b/block.c
> @@ -6431,6 +6431,37 @@ bool bdrv_op_blocker_is_empty(BlockDriverState *bs)
>  return true;
>  }
>  
> +static bool validate_backing_file(const char *filename,
> +  const char *backing_file, Error **errp)
> +{
> +struct stat filename_stat, backing_stat;
> +
> +if (backing_file[0] == '\0') {
> +error_setg(errp, "Expected backing file name, got empty string");
> +goto out;
> +}
> +
> +/* check whether filename and backing_file are refering to the same file 
> */
> +if (stat(backing_file, _stat) == -1) {
> +error_setg(errp, "Cannot stat backing file %s", backing_file);
> +goto out;
> +}
> +if (stat(filename, _stat) == -1) {
> +/* Simply consider filename doesn't exist, no need to further check 
> */
> +return true;
> +}
> +if ((filename_stat.st_dev == backing_stat.st_dev) &&
> +(filename_stat.st_ino == backing_stat.st_ino)) {
> +error_setg(errp, "Error: Trying to create an image with the "
> + "same filename as the backing file");
> +goto out;
> +}
> +
> +return true;
> +out:
> +return false;
> +}
> +
>  void bdrv_img_create(const char *filename, const char *fmt,
>   const char *base_filename, const char *base_fmt,
>   char *options, uint64_t img_size, int flags, bool quiet,
> @@ -6507,13 +6538,7 @@ void bdrv_img_create(const char *filename, const char 
> *fmt,
>  
>  backing_file = qemu_opt_get(opts, BLOCK_OPT_BACKING_FILE);
>  if (backing_file) {
> -if (!strcmp(filename, backing_file)) {
> -error_setg(errp, "Error: Trying to create an image with the "
> - "same filename as the backing file");
> -goto out;
> -}
> -if (backing_file[0] == '\0') {
> -error_setg(errp, "Expected backing file name, got empty string");
> +if (!validate_backing_file(filename, backing_file, errp)) {
>  goto out;
>  }
>  }

Thinking about this again, this seems to be quite high in the generic block
layer code. As such I don't think we can assume that the backing file here
is actually a plain file on disk. IIUC the backing file could still be any
of the block drivers. Only once we get down into the protocol specific
drivers can be validate the type of backend.

I'm not sure what the right way to deal with that is, so perhaps Kevin or
Max can make a suggestion.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 6/6] aiopool: protect with a mutex

2021-05-11 Thread Paolo Bonzini

On 10/05/21 13:56, Vladimir Sementsov-Ogievskiy wrote:


+    }
-    if (task->ret < 0 && pool->status == 0) {
-    pool->status = task->ret;
+    if (ret < 0) {
+    qatomic_cmpxchg(>status, 0, ret);
  }


Can we just do it inside critical section above and avoid extra cmpxchg? 
We'll need just qatomic_set as a pair to qatomic_read()


Good idea.


  g_free(task);
-    if (pool->waiting) {
-    pool->waiting = false;
-    aio_co_wake(pool->main_co);
-    }
+    qemu_co_queue_next(>queue);


this call doesn't need mutex protection?


It does indeed.

I second the idea of "stealing" Denis's two patches to block/aio_task 
and only adding the mutex (plus qatomic_read/set) here.


Paolo

Then we should modify comment 
insid AioTaskPool structure.


Anyway, I think it's simpler to just have one QEMU_MUTEX_GUARD() for the 
whole function.





Re: [PATCH 0/3] vhost-user-blk-test: add tests for the vhost-user-blk server

2021-05-11 Thread Stefan Hajnoczi
On Mon, Mar 22, 2021 at 09:23:24AM +, Stefan Hajnoczi wrote:
> These patches add a qtest for the vhost-user-blk server. CI found several
> issues that caused these patches to be dropped from Michael Tsirkin and Kevin
> Wolf's pull requests in the past. Hopefully they will go in smoothly this 
> time!
> 
> Coiby Xu (1):
>   test: new qTest case to test the vhost-user-blk-server
> 
> Stefan Hajnoczi (2):
>   tests/qtest: add multi-queue test case to vhost-user-blk-test
>   vhost-user-blk-test: test discard/write zeroes invalid inputs
> 
>  MAINTAINERS |   2 +
>  tests/qtest/libqos/vhost-user-blk.h |  48 ++
>  tests/qtest/libqos/vhost-user-blk.c | 130 
>  tests/qtest/vhost-user-blk-test.c   | 989 
>  tests/qtest/libqos/meson.build  |   1 +
>  tests/qtest/meson.build |   4 +
>  6 files changed, 1174 insertions(+)
>  create mode 100644 tests/qtest/libqos/vhost-user-blk.h
>  create mode 100644 tests/qtest/libqos/vhost-user-blk.c
>  create mode 100644 tests/qtest/vhost-user-blk-test.c

Ping for QEMU 6.1

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 5/6] co-shared-resource: protect with a mutex

2021-05-11 Thread Paolo Bonzini

On 10/05/21 13:40, Vladimir Sementsov-Ogievskiy wrote:



co-shared-resource is currently not thread-safe, as also reported
in co-shared-resource.h. Add a QemuMutex because co_try_get_from_shres
can also be invoked from non-coroutine context.


But it doesn't. It's called only from co_get_from_shres(). So, better 
make it a static function first.


It's a sensible interface though.  It lets you sleep or retry in your 
own way if you cannot get the resources, so (apart from the 
unlocked/locked confusion in the names) I like keeping it in the public API.


Paolo




Re: [PATCH] block/export: improve vu_blk_sect_range_ok()

2021-05-11 Thread Stefan Hajnoczi
On Wed, Mar 31, 2021 at 03:27:27PM +0100, Stefan Hajnoczi wrote:
> The checks in vu_blk_sect_range_ok() assume VIRTIO_BLK_SECTOR_SIZE is
> equal to BDRV_SECTOR_SIZE. This is true, but let's add a
> QEMU_BUILD_BUG_ON() to make it explicit.
> 
> We might as well check that the request buffer size is a multiple of
> VIRTIO_BLK_SECTOR_SIZE while we're at it.
> 
> Suggested-by: Max Reitz 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  block/export/vhost-user-blk-server.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)

Ping for QEMU 6.1.

Stefan

> 
> diff --git a/block/export/vhost-user-blk-server.c 
> b/block/export/vhost-user-blk-server.c
> index fa06996d37..1862563336 100644
> --- a/block/export/vhost-user-blk-server.c
> +++ b/block/export/vhost-user-blk-server.c
> @@ -70,9 +70,16 @@ static void vu_blk_req_complete(VuBlkReq *req)
>  static bool vu_blk_sect_range_ok(VuBlkExport *vexp, uint64_t sector,
>   size_t size)
>  {
> -uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
> +uint64_t nb_sectors;
>  uint64_t total_sectors;
>  
> +if (size % VIRTIO_BLK_SECTOR_SIZE) {
> +return false;
> +}
> +
> +nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS;
> +
> +QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE);
>  if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
>  return false;
>  }
> -- 
> 2.30.2
> 


signature.asc
Description: PGP signature


Re: [PATCH] hw/block/nvme: re-enable NVMe PCI hotplug

2021-05-11 Thread Philippe Mathieu-Daudé
Cc'ing Klaus (maintainer)

On 5/11/21 9:35 AM, Hannes Reinecke wrote:
> Ever since commit e570768566 ("hw/block/nvme: support for shared
> namespace in subsystem") NVMe PCI hotplug is broken, as the PCI
> hotplug infrastructure will only work for the nvme devices (which
> are PCI devices), but not for any attached namespaces.
> So when re-adding the NVMe PCI device via 'device_add' the NVMe
> controller is added, but all namespaces are missing.
> This patch adds device hotplug hooks for NVMe namespaces, such that one
> can call 'device_add nvme-ns' to (re-)attach the namespaces after
> the PCI NVMe device 'device_add nvme' hotplug call.
> 
> Fixes: e570768566 ("hw/block/nvme: support for shared namespace in subsystem")
> Signed-off-by: Hannes Reinecke 
> ---
>  capstone   |  2 +-

>  roms/SLOF  |  2 +-
>  roms/openbios  |  2 +-
>  roms/u-boot|  2 +-
>  9 files changed, 93 insertions(+), 20 deletions(-)
> 
> diff --git a/capstone b/capstone
> index f8b1b83301..22ead3e0bf 16
> --- a/capstone
> +++ b/capstone
> @@ -1 +1 @@
> -Subproject commit f8b1b833015a4ae47110ed068e0deb7106ced66d
> +Subproject commit 22ead3e0bfdb87516656453336160e0a37b066bf

> index 33a7322de1..e18ddad851 16
> --- a/roms/SLOF
> +++ b/roms/SLOF
> @@ -1 +1 @@
> -Subproject commit 33a7322de13e9dca4b38851a345a58d37e7a441d
> +Subproject commit e18ddad8516ff2cfe36ec130200318f7251aa78c
> diff --git a/roms/openbios b/roms/openbios
> index 4a0041107b..7f28286f5c 16
> --- a/roms/openbios
> +++ b/roms/openbios
> @@ -1 +1 @@
> -Subproject commit 4a0041107b8ef77e0e8337bfcb5f8078887261a7
> +Subproject commit 7f28286f5cb1ca682e3ba0a8706d8884f12bc49e
> diff --git a/roms/u-boot b/roms/u-boot
> index b46dd116ce..d3689267f9 16
> --- a/roms/u-boot
> +++ b/roms/u-boot
> @@ -1 +1 @@
> -Subproject commit b46dd116ce03e235f2a7d4843c6278e1da44b5e1
> +Subproject commit d3689267f92c5956e09cc7d1baa4700141662bff
> 

Submodule changes unlikely related.




Re: [PATCH] hw/block/nvme: re-enable NVMe PCI hotplug

2021-05-11 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20210511073511.32511-1-h...@suse.de/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210511073511.32511-1-h...@suse.de
Subject: [PATCH] hw/block/nvme: re-enable NVMe PCI hotplug

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
   e58c7a3..e4f3ede  master -> master
 * [new tag] patchew/20210511073511.32511-1-h...@suse.de -> 
patchew/20210511073511.32511-1-h...@suse.de
Switched to a new branch 'test'
1d24622 hw/block/nvme: re-enable NVMe PCI hotplug

=== OUTPUT BEGIN ===
ERROR: braces {} are necessary for all arms of this statement
#101: FILE: hw/block/nvme-subsys.c:50:
+if (!n->subsys)
[...]

total: 1 errors, 0 warnings, 182 lines checked

Commit 1d24622c5d19 (hw/block/nvme: re-enable NVMe PCI hotplug) has style 
problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210511073511.32511-1-h...@suse.de/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PULL 0/9] scripts/simplebench patches

2021-05-11 Thread Peter Maydell
On Tue, 11 May 2021 at 06:59, Vladimir Sementsov-Ogievskiy
 wrote:
>
> Hi!
>
> Kindly ping, or what's wrong with it?

You've never sent me a pullreq before. Pull requests from new
people are a pain and take more time to deal with, so I only
look at them when I have the time to do that. Pull requests
from established submaintainers are easy because I know they
know the process and they have their gpg key set up and so on.

-- PMM



[PATCH] hw/block/nvme: re-enable NVMe PCI hotplug

2021-05-11 Thread Hannes Reinecke
Ever since commit e570768566 ("hw/block/nvme: support for shared
namespace in subsystem") NVMe PCI hotplug is broken, as the PCI
hotplug infrastructure will only work for the nvme devices (which
are PCI devices), but not for any attached namespaces.
So when re-adding the NVMe PCI device via 'device_add' the NVMe
controller is added, but all namespaces are missing.
This patch adds device hotplug hooks for NVMe namespaces, such that one
can call 'device_add nvme-ns' to (re-)attach the namespaces after
the PCI NVMe device 'device_add nvme' hotplug call.

Fixes: e570768566 ("hw/block/nvme: support for shared namespace in subsystem")
Signed-off-by: Hannes Reinecke 
---
 capstone   |  2 +-
 hw/block/nvme-ns.c | 31 ++
 hw/block/nvme-subsys.c | 12 +
 hw/block/nvme-subsys.h |  1 +
 hw/block/nvme.c| 60 +++---
 hw/block/nvme.h|  1 +
 roms/SLOF  |  2 +-
 roms/openbios  |  2 +-
 roms/u-boot|  2 +-
 9 files changed, 93 insertions(+), 20 deletions(-)

diff --git a/capstone b/capstone
index f8b1b83301..22ead3e0bf 16
--- a/capstone
+++ b/capstone
@@ -1 +1 @@
-Subproject commit f8b1b833015a4ae47110ed068e0deb7106ced66d
+Subproject commit 22ead3e0bfdb87516656453336160e0a37b066bf
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7bb618f182..3a7e01f10f 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -526,6 +526,36 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 nvme_attach_ns(n, ns);
 }
 
+static void nvme_ns_unrealize(DeviceState *dev)
+{
+NvmeNamespace *ns = NVME_NS(dev);
+BusState *s = qdev_get_parent_bus(dev);
+NvmeCtrl *n = NVME(s->parent);
+NvmeSubsystem *subsys = n->subsys;
+uint32_t nsid = ns->params.nsid;
+int i;
+
+nvme_ns_drain(ns);
+nvme_ns_shutdown(ns);
+nvme_ns_cleanup(ns);
+
+if (subsys) {
+subsys->namespaces[nsid] = NULL;
+
+if (ns->params.shared) {
+for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+NvmeCtrl *ctrl = subsys->ctrls[i];
+
+if (ctrl) {
+nvme_detach_ns(ctrl, ns);
+}
+}
+return;
+}
+}
+nvme_detach_ns(n, ns);
+}
+
 static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
 DEFINE_PROP_BOOL("detached", NvmeNamespace, params.detached, false),
@@ -563,6 +593,7 @@ static void nvme_ns_class_init(ObjectClass *oc, void *data)
 
 dc->bus_type = TYPE_NVME_BUS;
 dc->realize = nvme_ns_realize;
+dc->unrealize = nvme_ns_unrealize;
 device_class_set_props(dc, nvme_ns_props);
 dc->desc = "Virtual NVMe namespace";
 }
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index 9604c19117..1c00508f33 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -42,6 +42,18 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
 return cntlid;
 }
 
+void nvme_subsys_unregister_ctrl(NvmeCtrl *n)
+{
+NvmeSubsystem *subsys = n->subsys;
+int cntlid = n->cntlid;
+
+if (!n->subsys)
+return;
+assert(cntlid < ARRAY_SIZE(subsys->ctrls));
+subsys->ctrls[cntlid] = NULL;
+n->cntlid = -1;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 const char *nqn = subsys->params.nqn ?
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 7d7ef5f7f1..2d8a146c4f 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -32,6 +32,7 @@ typedef struct NvmeSubsystem {
 } NvmeSubsystem;
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+void nvme_subsys_unregister_ctrl(NvmeCtrl *n);
 
 static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
 uint32_t cntlid)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 5fe082ec34..515678b686 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4963,26 +4963,12 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 }
 
 nvme_attach_ns(ctrl, ns);
-__nvme_select_ns_iocs(ctrl, ns);
 } else {
 if (!nvme_ns(ctrl, nsid)) {
 return NVME_NS_NOT_ATTACHED | NVME_DNR;
 }
 
-ctrl->namespaces[nsid - 1] = NULL;
-ns->attached--;
-
-nvme_update_dmrsl(ctrl);
-}
-
-/*
- * Add namespace id to the changed namespace id list for event clearing
- * via Get Log Page command.
- */
-if (!test_and_set_bit(nsid, ctrl->changed_nsids)) {
-nvme_enqueue_event(ctrl, NVME_AER_TYPE_NOTICE,
-   NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED,
-   NVME_LOG_CHANGED_NSLIST);
+nvme_detach_ns(ctrl, ns);
 }
 }
 
@@ -6166,6 +6152,34 @@ void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns)
 
 n->dmrsl = MIN_NON_ZERO(n->dmrsl,
 

[PATCH v2] block: Improve backing file validation

2021-05-11 Thread Li Zhijian
Image below user cases:
case 1:
```
$ qemu-img create -f raw source.raw 1G
$ qemu-img create -f qcow2 -F raw -b source.raw ./source.raw
qemu-img info source.raw
image: source.raw
file format: qcow2
virtual size: 193K (197120 bytes)
disk size: 196K
cluster_size: 65536
backing file: source.raw <<
backing file format: raw
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
```

case 2:
```
$ qemu-img create -f raw source.raw 1G
$ ln -sf source.raw destination.qcow2
$ qemu-img create -f qcow2 -F raw -b source.raw ./destination.qcow2
qemu-img info source.raw
image: source.raw
file format: qcow2 <<
virtual size: 2.0G (2147483648 bytes)
disk size: 196K
cluster_size: 65536
backing file: source.raw
backing file format: raw
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
```
Generally, we don't expect to corrupte the source.raw anyway, while
actually it does.

Here we check their inode number instead of file name.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Li Zhijian 

---
v2: utilize stat() instead of realpath() (Daniel)

Signed-off-by: Li Zhijian 
---
 block.c | 39 ---
 1 file changed, 32 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index 9ad725d205..db4ae57959 100644
--- a/block.c
+++ b/block.c
@@ -6431,6 +6431,37 @@ bool bdrv_op_blocker_is_empty(BlockDriverState *bs)
 return true;
 }
 
+static bool validate_backing_file(const char *filename,
+  const char *backing_file, Error **errp)
+{
+struct stat filename_stat, backing_stat;
+
+if (backing_file[0] == '\0') {
+error_setg(errp, "Expected backing file name, got empty string");
+goto out;
+}
+
+/* check whether filename and backing_file are refering to the same file */
+if (stat(backing_file, _stat) == -1) {
+error_setg(errp, "Cannot stat backing file %s", backing_file);
+goto out;
+}
+if (stat(filename, _stat) == -1) {
+/* Simply consider filename doesn't exist, no need to further check */
+return true;
+}
+if ((filename_stat.st_dev == backing_stat.st_dev) &&
+(filename_stat.st_ino == backing_stat.st_ino)) {
+error_setg(errp, "Error: Trying to create an image with the "
+ "same filename as the backing file");
+goto out;
+}
+
+return true;
+out:
+return false;
+}
+
 void bdrv_img_create(const char *filename, const char *fmt,
  const char *base_filename, const char *base_fmt,
  char *options, uint64_t img_size, int flags, bool quiet,
@@ -6507,13 +6538,7 @@ void bdrv_img_create(const char *filename, const char 
*fmt,
 
 backing_file = qemu_opt_get(opts, BLOCK_OPT_BACKING_FILE);
 if (backing_file) {
-if (!strcmp(filename, backing_file)) {
-error_setg(errp, "Error: Trying to create an image with the "
- "same filename as the backing file");
-goto out;
-}
-if (backing_file[0] == '\0') {
-error_setg(errp, "Expected backing file name, got empty string");
+if (!validate_backing_file(filename, backing_file, errp)) {
 goto out;
 }
 }
-- 
2.30.2