Re: [PULL v2 00/40] Misc patches for 2023-08-31

2023-09-05 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.


signature.asc
Description: PGP signature


Re: [PATCH v2 11/12] hw/vmapple/virtio-blk: Add support for apple virtio-blk

2023-08-31 Thread Stefan Hajnoczi
On Wed, Aug 30, 2023 at 04:14:24PM +, Alexander Graf wrote:
> Apple has its own virtio-blk PCI device ID where it deviates from the
> official virtio-pci spec slightly: It puts a new "apple type"
> field at a static offset in config space and introduces a new barrier
> command.
> 
> This patch first creates a mechanism for virtio-blk downstream classes to
> handle unknown commands. It then creates such a downstream class and a new
> vmapple-virtio-blk-pci class which support the additional apple type config
> identifier as well as the barrier command.
> 
> It then exposes 2 subclasses from that that we can use to expose root and
> aux virtio-blk devices: "vmapple-virtio-root" and "vmapple-virtio-aux".
> 
> Signed-off-by: Alexander Graf 

Aside from my comments below:

Reviewed-by: Stefan Hajnoczi 

> 
> ---
> 
> v1 -> v2:
> 
>   - Rework to make all vmapple virtio-blk logic a subclass
> ---
>  include/hw/pci/pci_ids.h        |   1 +
>  include/hw/virtio/virtio-blk.h  |  12 +-
>  include/hw/vmapple/virtio-blk.h |  39 ++
>  hw/block/virtio-blk.c           |  19 ++-
>  hw/vmapple/virtio-blk.c         | 212 
>  hw/vmapple/Kconfig              |   3 +
>  hw/vmapple/meson.build          |   1 +
>  7 files changed, 282 insertions(+), 5 deletions(-)
>  create mode 100644 include/hw/vmapple/virtio-blk.h
>  create mode 100644 hw/vmapple/virtio-blk.c
> 
> diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
> index e4386ebb20..74e589a298 100644
> --- a/include/hw/pci/pci_ids.h
> +++ b/include/hw/pci/pci_ids.h
> @@ -188,6 +188,7 @@
>  #define PCI_DEVICE_ID_APPLE_UNI_N_AGP0x0020
>  #define PCI_DEVICE_ID_APPLE_U3_AGP   0x004b
>  #define PCI_DEVICE_ID_APPLE_UNI_N_GMAC   0x0021
> +#define PCI_DEVICE_ID_APPLE_VIRTIO_BLK   0x1a00
>  
>  #define PCI_VENDOR_ID_SUN0x108e
>  #define PCI_DEVICE_ID_SUN_EBUS   0x1000
> diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
> index dafec432ce..381a906410 100644
> --- a/include/hw/virtio/virtio-blk.h
> +++ b/include/hw/virtio/virtio-blk.h
> @@ -23,7 +23,7 @@
>  #include "qom/object.h"
>  
>  #define TYPE_VIRTIO_BLK "virtio-blk-device"
> -OBJECT_DECLARE_SIMPLE_TYPE(VirtIOBlock, VIRTIO_BLK)
> +OBJECT_DECLARE_TYPE(VirtIOBlock, VirtIOBlkClass, VIRTIO_BLK)
>  
>  /* This is the last element of the write scatter-gather list */
>  struct virtio_blk_inhdr
> @@ -91,6 +91,16 @@ typedef struct MultiReqBuffer {
>  bool is_write;
>  } MultiReqBuffer;
>  
> +typedef struct VirtIOBlkClass {
> +/*< private >*/
> +VirtioDeviceClass parent;
> +/*< public >*/
> +bool (*handle_unknown_request)(VirtIOBlockReq *req, MultiReqBuffer *mrb,
> +   uint32_t type);

Vendor-specific commands ;_;

> +} VirtIOBlkClass;
> +
>  void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq);
> +void virtio_blk_free_request(VirtIOBlockReq *req);
> +void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status);
>  
>  #endif
> diff --git a/include/hw/vmapple/virtio-blk.h b/include/hw/vmapple/virtio-blk.h
> new file mode 100644
> index 00..b23106a3df
> --- /dev/null
> +++ b/include/hw/vmapple/virtio-blk.h
> @@ -0,0 +1,39 @@
> +/*
> + * VMApple specific VirtIO Block implementation
> + *
> + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef HW_VMAPPLE_VIRTIO_BLK_H
> +#define HW_VMAPPLE_VIRTIO_BLK_H
> +
> +#include "hw/sysbus.h"

I'm surprised to see this header since this isn't a SysBus device. Is it
really needed?

> +#include "qom/object.h"
> +#include "hw/virtio/virtio-pci.h"
> +#include "hw/virtio/virtio-blk.h"
> +
> +#define TYPE_VMAPPLE_VIRTIO_BLK "vmapple-virtio-blk"
> +#define TYPE_VMAPPLE_VIRTIO_ROOT "vmapple-virtio-root"
> +#define TYPE_VMAPPLE_VIRTIO_AUX "vmapple-virtio-aux"
> +
> +OBJECT_DECLARE_TYPE(VMAppleVirtIOBlk, VMAppleVirtIOBlkClass,
> +                    VMAPPLE_VIRTIO_BLK)
> +
> +typedef struct VMAppleVirtIOBlkClass {
> +/*< private >*/
> +VirtIOBlkClass parent;
> +/*< public >*/
> +void (*get_config)(VirtIODevice *vdev, uint8_t *config);
> +} VMAppleVirtIOBlkClass;
> +
> +typedef struct VMAppleVirtIOBlk {
> +/*< private >*/
> +VirtIOBlock parent_obj;
> +
> +/*< public >*/
> +uint32_t apple_type;
> +} VMAppleVirtIOBlk;
> +
> +#endif /* HW_VMAPPLE_VIRTIO_BLK_H */
> diff -

Re: [PATCH v2 08/12] hw/vmapple/bdif: Introduce vmapple backdoor interface

2023-08-31 Thread Stefan Hajnoczi
On Wed, Aug 30, 2023 at 04:14:21PM +, Alexander Graf wrote:
> The VMApple machine exposes AUX and ROOT block devices (as well as USB OTG
> emulation) via virtio-pci as well as a special, simple backdoor platform
> device.
> 
> This patch implements this backdoor platform device to the best of my
> understanding. I left out any USB OTG parts; they're only needed for
> guest recovery and I don't understand the protocol yet.

Out of curiosity: This interface has no way to check the size of the
block device? I guess that's not necessary in a boot loader that just
parses a boot record and then loads the next stage...

I posted comments below. Otherwise:
Reviewed-by: Stefan Hajnoczi 

> 
> Signed-off-by: Alexander Graf 
> 
> ---
> 
> v1 -> v2:
> 
>   - Adapt to system_ss meson.build target
> ---
>  include/hw/vmapple/bdif.h |  31 +
>  hw/vmapple/bdif.c | 245 ++
>  hw/vmapple/Kconfig|   2 +
>  hw/vmapple/meson.build|   1 +
>  hw/vmapple/trace-events   |   5 +
>  5 files changed, 284 insertions(+)
>  create mode 100644 include/hw/vmapple/bdif.h
>  create mode 100644 hw/vmapple/bdif.c
> 
> diff --git a/include/hw/vmapple/bdif.h b/include/hw/vmapple/bdif.h
> new file mode 100644
> index 00..65ee43457b
> --- /dev/null
> +++ b/include/hw/vmapple/bdif.h
> @@ -0,0 +1,31 @@
> +/*
> + * VMApple Backdoor Interface
> + *
> + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef HW_VMAPPLE_BDIF_H
> +#define HW_VMAPPLE_BDIF_H
> +
> +#include "hw/sysbus.h"
> +#include "qom/object.h"
> +
> +#define TYPE_VMAPPLE_BDIF "vmapple-bdif"
> +OBJECT_DECLARE_SIMPLE_TYPE(VMAppleBdifState, VMAPPLE_BDIF)
> +
> +struct VMAppleBdifState {
> +/*< private >*/
> +SysBusDevice parent_obj;
> +
> +/*< public >*/
> +BlockBackend *aux;
> +BlockBackend *root;
> +MemoryRegion mmio;
> +};
> +
> +#define VMAPPLE_BDIF_SIZE 0x0020
> +
> +#endif /* HW_VMAPPLE_BDIF_H */
> diff --git a/hw/vmapple/bdif.c b/hw/vmapple/bdif.c
> new file mode 100644
> index 00..36b5915ff3
> --- /dev/null
> +++ b/hw/vmapple/bdif.c
> @@ -0,0 +1,245 @@
> +/*
> + * VMApple Backdoor Interface
> + *
> + * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/vmapple/bdif.h"
> +#include "qemu/log.h"
> +#include "qemu/module.h"
> +#include "qapi/error.h"
> +#include "trace.h"
> +#include "hw/block/block.h"
> +#include "sysemu/block-backend.h"
> +
> +#define REG_DEVID_MASK  0x
> +#define DEVID_ROOT  0x
> +#define DEVID_AUX   0x0001
> +#define DEVID_USB   0x0010
> +
> +#define REG_STATUS        0x0
> +#define REG_STATUS_ACTIVE BIT(0)
> +#define REG_CFG           0x4
> +#define REG_CFG_ACTIVE    BIT(1)
> +#define REG_UNK1          0x8
> +#define REG_BUSY          0x10
> +#define REG_BUSY_READY    BIT(0)
> +#define REG_UNK2          0x400
> +#define REG_CMD           0x408
> +#define REG_NEXT_DEVICE   0x420
> +#define REG_UNK3          0x434
> +
> +typedef struct vblk_sector {
> +uint32_t pad;
> +uint32_t pad2;
> +uint32_t sector;
> +uint32_t pad3;
> +} VblkSector;
> +
> +typedef struct vblk_req_cmd {
> +uint64_t addr;
> +uint32_t len;
> +uint32_t flags;
> +} VblkReqCmd;
> +
> +typedef struct vblk_req {
> +VblkReqCmd sector;
> +VblkReqCmd data;
> +VblkReqCmd retval;
> +} VblkReq;
> +
> +#define VBLK_DATA_FLAGS_READ  0x00030001
> +#define VBLK_DATA_FLAGS_WRITE 0x00010001
> +
> +#define VBLK_RET_SUCCESS  0
> +#define VBLK_RET_FAILED   1
> +
> +static uint64_t bdif_read(void *opaque, hwaddr offset, unsigned size)
> +{
> +uint64_t ret = -1;
> +uint64_t devid = (offset & REG_DEVID_MASK);
> +
> +switch (offset & ~REG_DEVID_MASK) {
> +case REG_STATUS:
> +ret = REG_STATUS_ACTIVE;
> +break;
> +case REG_CFG:
> +ret = REG_CFG_ACTIVE;
> +break;
> +case REG_UNK1:
> +ret = 0x420;
> +break;
> +case REG_BUSY:
> +ret = REG_BUSY_READY;
> +break;
> +case REG_UNK2

Re: [PATCH 4/7] block/dirty-bitmap: Clean up local variable shadowing

2023-08-31 Thread Stefan Hajnoczi
On Thu, Aug 31, 2023 at 03:25:43PM +0200, Markus Armbruster wrote:
> Local variables shadowing other local variables or parameters make the
> code needlessly hard to understand.  Tracked down with -Wshadow=local.
> Clean up: delete inner declarations when they are actually redundant,
> else rename variables.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  block/monitor/bitmap-qmp-cmds.c | 2 +-
>  block/qcow2-bitmap.c            | 3 +--
>  2 files changed, 2 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 5/7] block/vdi: Clean up local variable shadowing

2023-08-31 Thread Stefan Hajnoczi
On Thu, Aug 31, 2023 at 03:25:44PM +0200, Markus Armbruster wrote:
> Local variables shadowing other local variables or parameters make the
> code needlessly hard to understand.  Tracked down with -Wshadow=local.
> Clean up: delete inner declarations when they are actually redundant,
> else rename variables.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  block/vdi.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 6/7] block: Clean up local variable shadowing

2023-08-31 Thread Stefan Hajnoczi
On Thu, Aug 31, 2023 at 03:25:45PM +0200, Markus Armbruster wrote:
> Local variables shadowing other local variables or parameters make the
> code needlessly hard to understand.  Tracked down with -Wshadow=local.
> Clean up: delete inner declarations when they are actually redundant,
> else rename variables.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  block.c              |  7 ---
>  block/rbd.c          |  2 +-
>  block/stream.c       |  1 -
>  block/vvfat.c        | 34 +-
>  hw/block/xen-block.c |  6 +++---
>  5 files changed, 25 insertions(+), 25 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PULL 0/1] Quick fix patches

2023-08-31 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PULL 00/24] target-arm queue

2023-08-31 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PULL v2 0/2] xen-virtio-2-tag

2023-08-31 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PULL 00/12] testing and gdbstub updates

2023-08-31 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




[PATCH v3 4/4] io: follow coroutine AioContext in qio_channel_yield()

2023-08-30 Thread Stefan Hajnoczi
The ongoing QEMU multi-queue block layer effort makes it possible for multiple
threads to process I/O in parallel. The nbd block driver is not compatible with
the multi-queue block layer yet because QIOChannel cannot be used easily from
coroutines running in multiple threads. This series changes the QIOChannel API
to make that possible.

In the current API, calling qio_channel_attach_aio_context() sets the
AioContext where qio_channel_yield() installs an fd handler prior to yielding:

  qio_channel_attach_aio_context(ioc, my_ctx);
  ...
  qio_channel_yield(ioc); // my_ctx is used here
  ...
  qio_channel_detach_aio_context(ioc);

This API design has limitations: reading and writing must be done in the same
AioContext and moving between AioContexts involves a cumbersome sequence of API
calls that is not suitable for doing on a per-request basis.

There is no fundamental reason why a QIOChannel needs to run within the
same AioContext every time qio_channel_yield() is called. QIOChannel
only uses the AioContext while inside qio_channel_yield(). The rest of
the time, QIOChannel is independent of any AioContext.

In the new API, qio_channel_yield() queries the AioContext from the current
coroutine using qemu_coroutine_get_aio_context(). There is no need to
explicitly attach/detach AioContexts anymore and
qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are gone.
One coroutine can read from the QIOChannel while another coroutine writes from
a different AioContext.
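In code, the difference between the two models looks roughly like this (a sketch of the call sequences only; the GIOCondition argument and error handling are abbreviated):

```c
/* Old API: the AioContext is pinned per channel */
qio_channel_attach_aio_context(ioc, my_ctx);
qio_channel_yield(ioc, G_IO_IN);        /* fd handler installed in my_ctx */
qio_channel_detach_aio_context(ioc);

/* New API: the AioContext follows whichever coroutine yields */
qio_channel_set_follow_coroutine_ctx(ioc, true);
qio_channel_yield(ioc, G_IO_IN);        /* uses the calling coroutine's context */
```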

This API change allows the nbd block driver to use QIOChannel from any thread.
It's important to keep in mind that the block driver already synchronizes
QIOChannel access and ensures that two coroutines never read simultaneously or
write simultaneously.

This patch updates all users of qio_channel_attach_aio_context() to the
new API. Most conversions are simple, but vhost-user-server requires a
new qemu_coroutine_yield() call to quiesce the vu_client_trip()
coroutine when not attached to any AioContext.

While the API has become simpler, there is one wart: QIOChannel has a
special case for the iohandler AioContext (used for handlers that must not run
in nested event loops). I didn't find an elegant way to preserve that behavior,
so I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, true|false)
for opting in to the new AioContext model. By default QIOChannel uses the
iohandler AioContext. Code that formerly called
qio_channel_attach_aio_context() now calls
qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
created.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Acked-by: Daniel P. Berrangé 
---
 include/io/channel-util.h|  23 ++
 include/io/channel.h |  69 --
 include/qemu/vhost-user-server.h |   1 +
 block/nbd.c  |  11 +--
 io/channel-command.c |  10 ++-
 io/channel-file.c|   9 ++-
 io/channel-null.c|   3 +-
 io/channel-socket.c  |   9 ++-
 io/channel-tls.c |   6 +-
 io/channel-util.c|  24 +++
 io/channel.c | 120 ++-
 migration/channel-block.c|   3 +-
 nbd/server.c |  14 +---
 scsi/qemu-pr-helper.c|   4 +-
 util/vhost-user-server.c |  27 +--
 15 files changed, 216 insertions(+), 117 deletions(-)

diff --git a/include/io/channel-util.h b/include/io/channel-util.h
index a5d720d9a0..fa18a3756d 100644
--- a/include/io/channel-util.h
+++ b/include/io/channel-util.h
@@ -49,4 +49,27 @@
 QIOChannel *qio_channel_new_fd(int fd,
Error **errp);
 
+/**
+ * qio_channel_util_set_aio_fd_handler:
+ * @read_fd: the file descriptor for the read handler
+ * @read_ctx: the AioContext for the read handler
+ * @io_read: the read handler
+ * @write_fd: the file descriptor for the write handler
+ * @write_ctx: the AioContext for the write handler
+ * @io_write: the write handler
+ * @opaque: the opaque argument to the read and write handler
+ *
+ * Set the read and write handlers when @read_ctx and @write_ctx are non-NULL,
+ * respectively. To leave a handler in its current state, pass a NULL
+ * AioContext. To clear a handler, pass a non-NULL AioContext and a NULL
+ * handler.
+ */
+void qio_channel_util_set_aio_fd_handler(int read_fd,
+ AioContext *read_ctx,
+ IOHandler *io_read,
+ int write_fd,
+ AioContext *write_ctx,
+ IOHandler *io_write,
+ void *opaque);
+
 #endif /* QIO_CHANNEL_UTIL_H */
diff --git a/include/io/channel.h b/include/io/channel.h
index 229bf36910..5f9dbaab65 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -81,9 +81,11 @@ struct QIOChannel

[PATCH v3 3/4] io: check there are no qio_channel_yield() coroutines during ->finalize()

2023-08-30 Thread Stefan Hajnoczi
Callers must clean up their coroutines before calling
object_unref(OBJECT(ioc)) to prevent an fd handler leak. Add an
assertion to check this.

This patch is preparation for the fd handler changes that follow.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Eric Blake 
---
 io/channel.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/io/channel.c b/io/channel.c
index 72f0066af5..c415f3fc88 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -653,6 +653,10 @@ static void qio_channel_finalize(Object *obj)
 {
 QIOChannel *ioc = QIO_CHANNEL(obj);
 
+/* Must not have coroutines in qio_channel_yield() */
+assert(!ioc->read_coroutine);
+assert(!ioc->write_coroutine);
+
 g_free(ioc->name);
 
 #ifdef _WIN32
-- 
2.41.0




[PATCH v3 1/4] nbd: drop unused nbd_receive_negotiate() aio_context argument

2023-08-30 Thread Stefan Hajnoczi
aio_context is always NULL, so drop it.

Suggested-by: Fabiano Rosas 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/block/nbd.h | 3 +--
 nbd/client-connection.c | 3 +--
 nbd/client.c| 5 ++---
 qemu-nbd.c  | 4 ++--
 4 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 4428bcffbb..f672b76173 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -324,8 +324,7 @@ typedef struct NBDExportInfo {
 char **contexts;
 } NBDExportInfo;
 
-int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
-  QCryptoTLSCreds *tlscreds,
+int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
   const char *hostname, QIOChannel **outioc,
   NBDExportInfo *info, Error **errp);
 void nbd_free_export_list(NBDExportInfo *info, int count);
diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 3d14296c04..aafb3d0fb4 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -146,8 +146,7 @@ static int nbd_connect(QIOChannelSocket *sioc, SocketAddress *addr,
 return 0;
 }
 
-ret = nbd_receive_negotiate(NULL, QIO_CHANNEL(sioc), tlscreds,
-tlshostname,
+ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), tlscreds, tlshostname,
 outioc, info, errp);
 if (ret < 0) {
 /*
diff --git a/nbd/client.c b/nbd/client.c
index 479208d5d9..16ec10c8a9 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1014,8 +1014,7 @@ static int nbd_negotiate_finish_oldstyle(QIOChannel *ioc, NBDExportInfo *info,
  * Returns: negative errno: failure talking to server
  *  0: server is connected
  */
-int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
-  QCryptoTLSCreds *tlscreds,
+int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
   const char *hostname, QIOChannel **outioc,
   NBDExportInfo *info, Error **errp)
 {
@@ -1027,7 +1026,7 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
 assert(info->name && strlen(info->name) <= NBD_MAX_STRING_SIZE);
 trace_nbd_receive_negotiate_name(info->name);
 
-result = nbd_start_negotiate(aio_context, ioc, tlscreds, hostname, outioc,
+result = nbd_start_negotiate(NULL, ioc, tlscreds, hostname, outioc,
  info->structured_reply, , errp);
 if (result < 0) {
 return result;
diff --git a/qemu-nbd.c b/qemu-nbd.c
index aaccaa3318..b47459f781 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -295,8 +295,8 @@ static void *nbd_client_thread(void *arg)
 goto out;
 }
 
-if (nbd_receive_negotiate(NULL, QIO_CHANNEL(sioc),
-  NULL, NULL, NULL, , _error) < 0) {
+if (nbd_receive_negotiate(QIO_CHANNEL(sioc), NULL, NULL, NULL,
+  , _error) < 0) {
 if (local_error) {
 error_report_err(local_error);
 }
-- 
2.41.0




[PATCH v3 2/4] nbd: drop unused nbd_start_negotiate() aio_context argument

2023-08-30 Thread Stefan Hajnoczi
aio_context is always NULL, so drop it.

Suggested-by: Fabiano Rosas 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
---
 nbd/client.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 16ec10c8a9..bd7e200136 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -877,8 +877,7 @@ static int nbd_list_meta_contexts(QIOChannel *ioc,
  * Returns: negative errno: failure talking to server
  *  non-negative: enum NBDMode describing server abilities
  */
-static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
-   QCryptoTLSCreds *tlscreds,
+static int nbd_start_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
const char *hostname, QIOChannel **outioc,
bool structured_reply, bool *zeroes,
Error **errp)
@@ -946,10 +945,6 @@ static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
 return -EINVAL;
 }
 ioc = *outioc;
-if (aio_context) {
-qio_channel_set_blocking(ioc, false, NULL);
-qio_channel_attach_aio_context(ioc, aio_context);
-}
 } else {
 error_setg(errp, "Server does not support STARTTLS");
 return -EINVAL;
@@ -1026,7 +1021,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
 assert(info->name && strlen(info->name) <= NBD_MAX_STRING_SIZE);
 trace_nbd_receive_negotiate_name(info->name);
 
-result = nbd_start_negotiate(NULL, ioc, tlscreds, hostname, outioc,
+result = nbd_start_negotiate(ioc, tlscreds, hostname, outioc,
  info->structured_reply, , errp);
 if (result < 0) {
 return result;
@@ -1149,7 +1144,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
 QIOChannel *sioc = NULL;
 
 *info = NULL;
-result = nbd_start_negotiate(NULL, ioc, tlscreds, hostname, , true,
+result = nbd_start_negotiate(ioc, tlscreds, hostname, , true,
  NULL, errp);
 if (tlscreds && sioc) {
 ioc = sioc;
-- 
2.41.0




[PATCH v3 0/4] io: follow coroutine AioContext in qio_channel_yield()

2023-08-30 Thread Stefan Hajnoczi
v3:
- Fix wrong copy-pasted function name in doc comment [Eric]
- Fix "match" -> "matches" in comment [Eric]
v2:
- Add Patch 1 & 2 to remove unused NBD aio_context arguments and dead code [Fabiano]
- Remove attach/detach prototypes from "io/channel.h" [Daniel]
- Add utility function to set fd handlers [Daniel]

The ongoing QEMU multi-queue block layer effort makes it possible for multiple
threads to process I/O in parallel. The nbd block driver is not compatible with
the multi-queue block layer yet because QIOChannel cannot be used easily from
coroutines running in multiple threads. This series changes the QIOChannel API
to make that possible.

Stefan Hajnoczi (4):
  nbd: drop unused nbd_receive_negotiate() aio_context argument
  nbd: drop unused nbd_start_negotiate() aio_context argument
  io: check there are no qio_channel_yield() coroutines during
->finalize()
  io: follow coroutine AioContext in qio_channel_yield()

 include/block/nbd.h  |   3 +-
 include/io/channel-util.h|  23 ++
 include/io/channel.h |  69 -
 include/qemu/vhost-user-server.h |   1 +
 block/nbd.c  |  11 +--
 io/channel-command.c |  10 ++-
 io/channel-file.c|   9 ++-
 io/channel-null.c|   3 +-
 io/channel-socket.c  |   9 ++-
 io/channel-tls.c |   6 +-
 io/channel-util.c|  24 ++
 io/channel.c | 124 ++-
 migration/channel-block.c|   3 +-
 nbd/client-connection.c  |   3 +-
 nbd/client.c |  14 +---
 nbd/server.c |  14 +---
 qemu-nbd.c   |   4 +-
 scsi/qemu-pr-helper.c|   4 +-
 util/vhost-user-server.c |  27 +--
 19 files changed, 228 insertions(+), 133 deletions(-)

-- 
2.41.0




Re: [PULL v3 0/5] Block patches

2023-08-30 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PATCH v2 0/5] tcg patch queue

2023-08-30 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PULL 0/3] Dirty page rate and dirty page limit 20230829 patches

2023-08-30 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PATCH 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-08-30 Thread Stefan Hajnoczi
On Wed, 30 Aug 2023 at 09:30, Laszlo Ersek  wrote:
>
> On 8/30/23 14:10, Stefan Hajnoczi wrote:
> > On Sun, 27 Aug 2023 at 14:31, Laszlo Ersek  wrote:
> >>
> >> (1) The virtio-1.0 specification
> >> <http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html> writes:
> >>
> >>> 3 General Initialization And Device Operation
> >>> 3.1   Device Initialization
> >>> 3.1.1 Driver Requirements: Device Initialization
> >>>
> >>> [...]
> >>>
> >>> 7. Perform device-specific setup, including discovery of virtqueues for
> >>>the device, optional per-bus setup, reading and possibly writing the
> >>>device’s virtio configuration space, and population of virtqueues.
> >>>
> >>> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> >>
> >> and
> >>
> >>> 4 Virtio Transport Options
> >>> 4.1   Virtio Over PCI Bus
> >>> 4.1.4 Virtio Structure PCI Capabilities
> >>> 4.1.4.3   Common configuration structure layout
> >>> 4.1.4.3.2 Driver Requirements: Common configuration structure layout
> >>>
> >>> [...]
> >>>
> >>> The driver MUST configure the other virtqueue fields before enabling the
> >>> virtqueue with queue_enable.
> >>>
> >>> [...]
> >>
> >> These together mean that the following sub-sequence of steps is valid for
> >> a virtio-1.0 guest driver:
> >>
> >> (1.1) set "queue_enable" for the needed queues as the final part of device
> >> initialization step (7),
> >>
> >> (1.2) set DRIVER_OK in step (8),
> >>
> >> (1.3) immediately start sending virtio requests to the device.
> >>
> >> (2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES
> >> special virtio feature is negotiated, then virtio rings start in disabled
> >> state, according to
> >> <https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states>.
> >> In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed for
> >> enabling vrings.
> >>
> >> Therefore setting "queue_enable" from the guest (1.1) is a *control plane*
> >> operation, which travels from the guest through QEMU to the vhost-user
> >> backend, using a unix domain socket.
> >>
> >> Whereas sending a virtio request (1.3) is a *data plane* operation, which
> >> evades QEMU -- it travels from guest to the vhost-user backend via
> >> eventfd.
> >>
> >> This means that steps (1.1) and (1.3) travel through different channels,
> >> and their relative order can be reversed, as perceived by the vhost-user
> >> backend.
> >>
> >> That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe) runs
> >> against the Rust-language virtiofsd version 1.7.2. (Which uses version
> >> 0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost
> >> crate.)
> >>
> >> Namely, when VirtioFsDxe binds a virtiofs device, it goes through the
> >> device initialization steps (i.e., control plane operations), and
> >> immediately sends a FUSE_INIT request too (i.e., performs a data plane
> >> operation). In the Rust-language virtiofsd, this creates a race between
> >> two components that run *concurrently*, i.e., in different threads or
> >> processes:
> >>
> >> - Control plane, handling vhost-user protocol messages:
> >>
> >>   The "VhostUserSlaveReqHandlerMut::set_vring_enable" method
> >>   [crates/vhost-user-backend/src/handler.rs] handles
> >>   VHOST_USER_SET_VRING_ENABLE messages, and updates each vring's "enabled"
> >>   flag according to the message processed.
> >>
> >> - Data plane, handling virtio / FUSE requests:
> >>
> >>   The "VringEpollHandler::handle_event" method
> >>   [crates/vhost-user-backend/src/event_loop.rs] handles the incoming
> >>   virtio / FUSE request, consuming the virtio kick at the same time. If
> >>   the vring's "enabled" flag is set, the virtio / FUSE request is
> >>   processed genuinely. If the vring's "enabled" flag is clear, then the
> >>   virtio / FUSE request is discarded.
> >
> > Why is virtiofsd monitoring the virtqueue and discarding requests
> > while it's disabled?
>
> That's what the vhost-user spec requires

Re: [PULL 0/1] Quick fix patches

2023-08-30 Thread Stefan Hajnoczi
Hi,
The patch introduces the following build failure:

cc -m64 -mcx16 -Isubprojects/libvhost-user/libvhost-user.a.p
-Isubprojects/libvhost-user -I../subprojects/libvhost-user
-fdiagnostics-color=auto -Wall -Winvalid-pch -Werror -std=gnu99 -O2 -g
-Wsign-compare -Wdeclaration-after-statement -Wstrict-aliasing
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing
-fno-common -fwrapv -fPIE -pthread -D_GNU_SOURCE -MD -MQ
subprojects/libvhost-user/libvhost-user.a.p/libvhost-user.c.o -MF
subprojects/libvhost-user/libvhost-user.a.p/libvhost-user.c.o.d -o
subprojects/libvhost-user/libvhost-user.a.p/libvhost-user.c.o -c
../subprojects/libvhost-user/libvhost-user.c
In file included from ../subprojects/libvhost-user/include/atomic.h:18,
from ../subprojects/libvhost-user/libvhost-user.c:53:
../subprojects/libvhost-user/include/compiler.h:38:40: error: missing
binary operator before token "("
38 | #if defined(__clang__) && __has_warning("-Waddress-of-packed-member")
| ^

https://gitlab.com/qemu-project/qemu/-/jobs/4981576093

Stefan



Re: [PATCH 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously

2023-08-30 Thread Stefan Hajnoczi
On Sun, 27 Aug 2023 at 14:31, Laszlo Ersek  wrote:
>
> (1) The virtio-1.0 specification
>  writes:
>
> > 3 General Initialization And Device Operation
> > 3.1   Device Initialization
> > 3.1.1 Driver Requirements: Device Initialization
> >
> > [...]
> >
> > 7. Perform device-specific setup, including discovery of virtqueues for
> >the device, optional per-bus setup, reading and possibly writing the
> >device’s virtio configuration space, and population of virtqueues.
> >
> > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
>
> and
>
> > 4 Virtio Transport Options
> > 4.1   Virtio Over PCI Bus
> > 4.1.4 Virtio Structure PCI Capabilities
> > 4.1.4.3   Common configuration structure layout
> > 4.1.4.3.2 Driver Requirements: Common configuration structure layout
> >
> > [...]
> >
> > The driver MUST configure the other virtqueue fields before enabling the
> > virtqueue with queue_enable.
> >
> > [...]
>
> These together mean that the following sub-sequence of steps is valid for
> a virtio-1.0 guest driver:
>
> (1.1) set "queue_enable" for the needed queues as the final part of device
> initialization step (7),
>
> (1.2) set DRIVER_OK in step (8),
>
> (1.3) immediately start sending virtio requests to the device.
>
> (2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES
> special virtio feature is negotiated, then virtio rings start in disabled
> state, according to
> .
> In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed for
> enabling vrings.
>
> Therefore setting "queue_enable" from the guest (1.1) is a *control plane*
> operation, which travels from the guest through QEMU to the vhost-user
> backend, using a unix domain socket.
>
> Whereas sending a virtio request (1.3) is a *data plane* operation, which
> evades QEMU -- it travels from guest to the vhost-user backend via
> eventfd.
>
> This means that steps (1.1) and (1.3) travel through different channels,
> and their relative order can be reversed, as perceived by the vhost-user
> backend.
>
> That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe) runs
> against the Rust-language virtiofsd version 1.7.2. (Which uses version
> 0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost
> crate.)
>
> Namely, when VirtioFsDxe binds a virtiofs device, it goes through the
> device initialization steps (i.e., control plane operations), and
> immediately sends a FUSE_INIT request too (i.e., performs a data plane
> operation). In the Rust-language virtiofsd, this creates a race between
> two components that run *concurrently*, i.e., in different threads or
> processes:
>
> - Control plane, handling vhost-user protocol messages:
>
>   The "VhostUserSlaveReqHandlerMut::set_vring_enable" method
>   [crates/vhost-user-backend/src/handler.rs] handles
>   VHOST_USER_SET_VRING_ENABLE messages, and updates each vring's "enabled"
>   flag according to the message processed.
>
> - Data plane, handling virtio / FUSE requests:
>
>   The "VringEpollHandler::handle_event" method
>   [crates/vhost-user-backend/src/event_loop.rs] handles the incoming
>   virtio / FUSE request, consuming the virtio kick at the same time. If
>   the vring's "enabled" flag is set, the virtio / FUSE request is
>   processed genuinely. If the vring's "enabled" flag is clear, then the
>   virtio / FUSE request is discarded.

Why is virtiofsd monitoring the virtqueue and discarding requests
while it's disabled? This seems like a bug in the vhost-user backend
to me.

When the virtqueue is disabled, don't monitor the kickfd.

When the virtqueue transitions from disabled to enabled, the control
plane should self-trigger the kickfd so that any available buffers
will be processed.

QEMU uses this scheme to switch between vhost/IOThreads and built-in
virtqueue kick processing.

This approach is more robust than relying on buffers being enqueued after
the virtqueue is enabled.

Stefan

>
> Note that OVMF enables the queue *first*, and sends FUSE_INIT *second*.
> However, if the data plane processor in virtiofsd wins the race, then it
> sees the FUSE_INIT *before* the control plane processor took notice of
> VHOST_USER_SET_VRING_ENABLE and green-lit the queue for the data plane
> processor. Therefore the latter drops FUSE_INIT on the floor, and goes
> back to waiting for further virtio / FUSE requests with epoll_wait.
> Meanwhile OVMF is stuck waiting for the FUSE_INIT response -- a deadlock.
>
> The deadlock is not deterministic. OVMF hangs infrequently during first
> boot. However, OVMF hangs almost certainly during reboots from the UEFI
> shell.
>
> The race can be "reliably masked" by inserting a very small delay -- a
> single debug message -- at the top of "VringEpollHandler::handle_event",
> i.e., just before the data plane processor checks the "enabled" field 

[PULL v3 4/5] tests/qemu-iotests/197: add testcase for CoR with subclusters

2023-08-30 Thread Stefan Hajnoczi
From: Andrey Drobyshev 

Add testcase which checks that allocations during copy-on-read are
performed on the subcluster basis when subclusters are enabled in target
image.

This testcase also triggers the following assert with previous commit
not being applied, so we check that as well:

qemu-io: ../block/io.c:1236: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < 
pnum' failed.

Reviewed-by: Eric Blake 
Reviewed-by: Denis V. Lunev 
Signed-off-by: Andrey Drobyshev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20230711172553.234055-4-andrey.drobys...@virtuozzo.com>
---
 tests/qemu-iotests/197 | 29 +
 tests/qemu-iotests/197.out | 24 
 2 files changed, 53 insertions(+)

diff --git a/tests/qemu-iotests/197 b/tests/qemu-iotests/197
index a2547bc280..f07a9da136 100755
--- a/tests/qemu-iotests/197
+++ b/tests/qemu-iotests/197
@@ -122,6 +122,35 @@ $QEMU_IO -f qcow2 -C -c 'read 0 1024' "$TEST_WRAP" | 
_filter_qemu_io
 $QEMU_IO -f qcow2 -c map "$TEST_WRAP"
 _check_test_img
 
+echo
+echo '=== Copy-on-read with subclusters ==='
+echo
+
+# Create base and top images 64K (1 cluster) each.  Make subclusters enabled
+# for the top image
+_make_test_img 64K
+IMGPROTO=file IMGFMT=qcow2 TEST_IMG_FILE="$TEST_WRAP" \
+_make_test_img --no-opts -o extended_l2=true -F "$IMGFMT" -b "$TEST_IMG" \
+64K | _filter_img_create
+
+$QEMU_IO -c "write -P 0xaa 0 64k" "$TEST_IMG" | _filter_qemu_io
+
+# Allocate individual subclusters in the top image, and not the whole cluster
+$QEMU_IO -c "write -P 0xbb 28K 2K" -c "write -P 0xcc 34K 2K" "$TEST_WRAP" \
+| _filter_qemu_io
+
+# Only 2 subclusters should be allocated in the top image at this point
+$QEMU_IMG map "$TEST_WRAP" | _filter_qemu_img_map
+
+# Actual copy-on-read operation
+$QEMU_IO -C -c "read -P 0xaa 30K 4K" "$TEST_WRAP" | _filter_qemu_io
+
+# And here we should have 4 subclusters allocated right in the middle of the
+# top image. Make sure the whole cluster remains unallocated
+$QEMU_IMG map "$TEST_WRAP" | _filter_qemu_img_map
+
+_check_test_img
+
 # success, all done
 echo '*** done'
 status=0
diff --git a/tests/qemu-iotests/197.out b/tests/qemu-iotests/197.out
index ad414c3b0e..8f34a30afe 100644
--- a/tests/qemu-iotests/197.out
+++ b/tests/qemu-iotests/197.out
@@ -31,4 +31,28 @@ read 1024/1024 bytes at offset 0
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 1 KiB (0x400) bytes allocated at offset 0 bytes (0x0)
 No errors were found on the image.
+
+=== Copy-on-read with subclusters ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65536
+Formatting 'TEST_DIR/t.wrap.IMGFMT', fmt=IMGFMT size=65536 
backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 28672
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 34816
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset  Length  File
+0   0x7000  TEST_DIR/t.IMGFMT
+0x7000  0x800   TEST_DIR/t.wrap.IMGFMT
+0x7800  0x1000  TEST_DIR/t.IMGFMT
+0x8800  0x800   TEST_DIR/t.wrap.IMGFMT
+0x9000  0x7000  TEST_DIR/t.IMGFMT
+read 4096/4096 bytes at offset 30720
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset  Length  File
+0   0x7000  TEST_DIR/t.IMGFMT
+0x7000  0x2000  TEST_DIR/t.wrap.IMGFMT
+0x9000  0x7000  TEST_DIR/t.IMGFMT
+No errors were found on the image.
 *** done
-- 
2.41.0




[PULL v3 1/5] block-migration: Ensure we don't crash during migration cleanup

2023-08-30 Thread Stefan Hajnoczi
From: Fabiano Rosas 

We can fail the blk_insert_bs() at init_blk_migration(), leaving the
BlkMigDevState without a dirty_bitmap and BlockDriverState. Account
for the possibly missing elements when doing cleanup.

Fix the following crashes:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x55ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at 
../block/dirty-bitmap.c:359
359 BlockDriverState *bs = bitmap->bs;
 #0  0x55ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at 
../block/dirty-bitmap.c:359
 #1  0x55bba331 in unset_dirty_tracking () at ../migration/block.c:371
 #2  0x55bbad98 in block_migration_cleanup_bmds () at 
../migration/block.c:681

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x55e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, 
reason=0x0) at ../block.c:7073
7073    QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) {
 #0  0x55e971ff in bdrv_op_unblock (bs=0x0, 
op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
 #1  0x55e9734a in bdrv_op_unblock_all (bs=0x0, reason=0x0) at 
../block.c:7095
 #2  0x55bbae13 in block_migration_cleanup_bmds () at 
../migration/block.c:690

Signed-off-by: Fabiano Rosas 
Message-id: 20230731203338.27581-1-faro...@suse.de
Signed-off-by: Stefan Hajnoczi 
---
 migration/block.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index b9580a6c7e..86c2256a2b 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -368,7 +368,9 @@ static void unset_dirty_tracking(void)
 BlkMigDevState *bmds;
 
QSIMPLEQ_FOREACH(bmds, &block_mig_state.bmds_list, entry) {
-bdrv_release_dirty_bitmap(bmds->dirty_bitmap);
+if (bmds->dirty_bitmap) {
+bdrv_release_dirty_bitmap(bmds->dirty_bitmap);
+}
 }
 }
 
@@ -676,13 +678,18 @@ static int64_t get_remaining_dirty(void)
 static void block_migration_cleanup_bmds(void)
 {
 BlkMigDevState *bmds;
+BlockDriverState *bs;
 AioContext *ctx;
 
 unset_dirty_tracking();
 
while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) {
QSIMPLEQ_REMOVE_HEAD(&block_mig_state.bmds_list, entry);
-bdrv_op_unblock_all(blk_bs(bmds->blk), bmds->blocker);
+
+bs = blk_bs(bmds->blk);
+if (bs) {
+bdrv_op_unblock_all(bs, bmds->blocker);
+}
 error_free(bmds->blocker);
 
 /* Save ctx, because bmds->blk can disappear during blk_unref.  */
-- 
2.41.0




[PULL v3 5/5] aio-posix: zero out io_uring sqe user_data

2023-08-30 Thread Stefan Hajnoczi
liburing does not clear sqe->user_data. We must do it ourselves to avoid
undefined behavior in process_cqe() when user_data is used.

Note that fdmon-io_uring is currently disabled, so this is a latent bug
that does not affect users. Let's merge this fix now to make it easier
to enable fdmon-io_uring in the future (and I'm working on that).

Signed-off-by: Stefan Hajnoczi 
Message-ID: <20230426212639.82310-1-stefa...@redhat.com>
---
 util/fdmon-io_uring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index 17ec18b7bd..16054c5ede 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -184,6 +184,7 @@ static void add_poll_remove_sqe(AioContext *ctx, AioHandler 
*node)
 #else
 io_uring_prep_poll_remove(sqe, node);
 #endif
+io_uring_sqe_set_data(sqe, NULL);
 }
 
 /* Add a timeout that self-cancels when another cqe becomes ready */
@@ -197,6 +198,7 @@ static void add_timeout_sqe(AioContext *ctx, int64_t ns)
 
 sqe = get_sqe(ctx);
io_uring_prep_timeout(sqe, &ts, 1, 0);
+io_uring_sqe_set_data(sqe, NULL);
 }
 
 /* Add sqes from ctx->submit_list for submission */
-- 
2.41.0




[PULL v3 2/5] block: add subcluster_size field to BlockDriverInfo

2023-08-30 Thread Stefan Hajnoczi
From: Andrey Drobyshev 

This is going to be used in the subsequent commit as requests alignment
(in particular, during copy-on-read).  This value only makes sense for
the formats which support subclusters (currently QCOW2 only).  If this
field isn't set by driver's own bdrv_get_info() implementation, we
simply set it equal to the cluster size thus treating each cluster as
having a single subcluster.

Reviewed-by: Eric Blake 
Reviewed-by: Denis V. Lunev 
Signed-off-by: Andrey Drobyshev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20230711172553.234055-2-andrey.drobys...@virtuozzo.com>
---
 include/block/block-common.h | 5 +
 block.c  | 7 +++
 block/qcow2.c| 1 +
 3 files changed, 13 insertions(+)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index e15395f2cb..df5ffc8d09 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -132,6 +132,11 @@ typedef struct BlockZoneWps {
 typedef struct BlockDriverInfo {
 /* in bytes, 0 if irrelevant */
 int cluster_size;
+/*
+ * A fraction of cluster_size, if supported (currently QCOW2 only); if
+ * disabled or unsupported, set equal to cluster_size.
+ */
+int subcluster_size;
 /* offset at which the VM state can be saved (0 if not possible) */
 int64_t vm_state_offset;
 bool is_dirty;
diff --git a/block.c b/block.c
index a307c151a8..0af890f647 100644
--- a/block.c
+++ b/block.c
@@ -6480,6 +6480,13 @@ int coroutine_fn bdrv_co_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 }
 memset(bdi, 0, sizeof(*bdi));
 ret = drv->bdrv_co_get_info(bs, bdi);
+if (bdi->subcluster_size == 0) {
+/*
+ * If the driver left this unset, subclusters are not supported.
+ * Then it is safe to treat each cluster as having only one subcluster.
+ */
+bdi->subcluster_size = bdi->cluster_size;
+}
 if (ret < 0) {
 return ret;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index c51388e99d..b48cd9ce63 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5197,6 +5197,7 @@ qcow2_co_get_info(BlockDriverState *bs, BlockDriverInfo 
*bdi)
 {
 BDRVQcow2State *s = bs->opaque;
 bdi->cluster_size = s->cluster_size;
+bdi->subcluster_size = s->subcluster_size;
 bdi->vm_state_offset = qcow2_vm_state_offset(s);
 bdi->is_dirty = s->incompatible_features & QCOW2_INCOMPAT_DIRTY;
 return 0;
-- 
2.41.0




[PULL v3 0/5] Block patches

2023-08-30 Thread Stefan Hajnoczi
The following changes since commit 813bac3d8d70d85cb7835f7945eb9eed84c2d8d0:

  Merge tag '2023q3-bsd-user-pull-request' of https://gitlab.com/bsdimp/qemu 
into staging (2023-08-29 08:58:00 -0400)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 87ec6f55af38e29be5b2b65a8acf84da73e06d06:

  aio-posix: zero out io_uring sqe user_data (2023-08-30 07:39:59 -0400)


Pull request

v3:
- Drop UFS emulation due to CI failures
- Add "aio-posix: zero out io_uring sqe user_data"



Andrey Drobyshev (3):
  block: add subcluster_size field to BlockDriverInfo
  block/io: align requests to subcluster_size
  tests/qemu-iotests/197: add testcase for CoR with subclusters

Fabiano Rosas (1):
  block-migration: Ensure we don't crash during migration cleanup

Stefan Hajnoczi (1):
  aio-posix: zero out io_uring sqe user_data

 include/block/block-common.h |  5 
 include/block/block-io.h |  8 +++---
 block.c  |  7 +
 block/io.c   | 50 ++--
 block/mirror.c   |  8 +++---
 block/qcow2.c|  1 +
 migration/block.c| 11 ++--
 util/fdmon-io_uring.c|  2 ++
 tests/qemu-iotests/197   | 29 +
 tests/qemu-iotests/197.out   | 24 +
 10 files changed, 110 insertions(+), 35 deletions(-)

-- 
2.41.0




[PULL v3 3/5] block/io: align requests to subcluster_size

2023-08-30 Thread Stefan Hajnoczi
From: Andrey Drobyshev 

When target image is using subclusters, and we align the request during
copy-on-read, it makes sense to align to subcluster_size rather than
cluster_size.  Otherwise we end up with unnecessary allocations.

This commit renames bdrv_round_to_clusters() to bdrv_round_to_subclusters()
and utilizes subcluster_size field of BlockDriverInfo to make necessary
alignments.  It affects copy-on-read as well as mirror job (which is
using bdrv_round_to_clusters()).

This change also fixes the following bug with failing assert (covered by
the test in the subsequent commit):

qemu-img create -f qcow2 base.qcow2 64K
qemu-img create -f qcow2 -o 
extended_l2=on,backing_file=base.qcow2,backing_fmt=qcow2 img.qcow2 64K
qemu-io -c "write -P 0xaa 0 2K" img.qcow2
qemu-io -C -c "read -P 0x00 2K 62K" img.qcow2

qemu-io: ../block/io.c:1236: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < 
pnum' failed.

Reviewed-by: Eric Blake 
Reviewed-by: Denis V. Lunev 
Signed-off-by: Andrey Drobyshev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20230711172553.234055-3-andrey.drobys...@virtuozzo.com>
---
 include/block/block-io.h |  8 +++
 block/io.c   | 50 
 block/mirror.c   |  8 +++
 3 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index 4415506e40..6db48f2d35 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -189,10 +189,10 @@ bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs,
   Error **errp);
 BlockStatsSpecific *bdrv_get_specific_stats(BlockDriverState *bs);
-void bdrv_round_to_clusters(BlockDriverState *bs,
-int64_t offset, int64_t bytes,
-int64_t *cluster_offset,
-int64_t *cluster_bytes);
+void bdrv_round_to_subclusters(BlockDriverState *bs,
+   int64_t offset, int64_t bytes,
+   int64_t *cluster_offset,
+   int64_t *cluster_bytes);
 
 void bdrv_get_backing_filename(BlockDriverState *bs,
char *filename, int filename_size);
diff --git a/block/io.c b/block/io.c
index 055fcf7438..76e7df18d8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -728,21 +728,21 @@ BdrvTrackedRequest *coroutine_fn 
bdrv_co_get_self_request(BlockDriverState *bs)
 }
 
 /**
- * Round a region to cluster boundaries
+ * Round a region to subcluster (if supported) or cluster boundaries
  */
 void coroutine_fn GRAPH_RDLOCK
-bdrv_round_to_clusters(BlockDriverState *bs, int64_t offset, int64_t bytes,
-   int64_t *cluster_offset, int64_t *cluster_bytes)
+bdrv_round_to_subclusters(BlockDriverState *bs, int64_t offset, int64_t bytes,
+  int64_t *align_offset, int64_t *align_bytes)
 {
 BlockDriverInfo bdi;
 IO_CODE();
-if (bdrv_co_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
-*cluster_offset = offset;
-*cluster_bytes = bytes;
+if (bdrv_co_get_info(bs, &bdi) < 0 || bdi.subcluster_size == 0) {
+*align_offset = offset;
+*align_bytes = bytes;
 } else {
-int64_t c = bdi.cluster_size;
-*cluster_offset = QEMU_ALIGN_DOWN(offset, c);
-*cluster_bytes = QEMU_ALIGN_UP(offset - *cluster_offset + bytes, c);
+int64_t c = bdi.subcluster_size;
+*align_offset = QEMU_ALIGN_DOWN(offset, c);
+*align_bytes = QEMU_ALIGN_UP(offset - *align_offset + bytes, c);
 }
 }
 
@@ -1168,8 +1168,8 @@ bdrv_co_do_copy_on_readv(BdrvChild *child, int64_t 
offset, int64_t bytes,
 void *bounce_buffer = NULL;
 
 BlockDriver *drv = bs->drv;
-int64_t cluster_offset;
-int64_t cluster_bytes;
+int64_t align_offset;
+int64_t align_bytes;
 int64_t skip_bytes;
 int ret;
 int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
@@ -1203,28 +1203,28 @@ bdrv_co_do_copy_on_readv(BdrvChild *child, int64_t 
offset, int64_t bytes,
  * BDRV_REQUEST_MAX_BYTES (even when the original read did not), which
  * is one reason we loop rather than doing it all at once.
  */
-bdrv_round_to_clusters(bs, offset, bytes, &cluster_offset, &cluster_bytes);
-skip_bytes = offset - cluster_offset;
+bdrv_round_to_subclusters(bs, offset, bytes, &align_offset, &align_bytes);
+skip_bytes = offset - align_offset;
 
 trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
-   cluster_offset, cluster_bytes);
+   align_offset, align_bytes);
 
-while (cluster_bytes) {
+while (align_bytes) {
 int64_t pnum;
 
 if (skip_write) {
 ret = 1; /* "already allocated", so nothing will be copied */
-pnum = MIN(cl

Re: [PATCH v9 0/4] hw/ufs: Add Universal Flash Storage (UFS) support

2023-08-30 Thread Stefan Hajnoczi
On Thu, 3 Aug 2023 at 07:49, Jeuk Kim  wrote:
>
> Dear Stefan,
> I'm really sorry, but could you please put this patch series
> instead of v8, which was previously merged into block-next?
> The fixes from v8 are below.
> Please let me know if you have any comments or issues.

The CI hit a test failure:
https://gitlab.com/qemu-project/qemu/-/jobs/4977256030

Please investigate how to fix this so this series can be merged. Thanks!

Stefan



Re: [PATCH v9 0/4] hw/ufs: Add Universal Flash Storage (UFS) support

2023-08-30 Thread Stefan Hajnoczi
On Thu, 3 Aug 2023 at 07:49, Jeuk Kim  wrote:
>
> Dear Stefan,
> I'm really sorry, but could you please put this patch series
> instead of v8, which was previously merged into block-next?
> The fixes from v8 are below.
> Please let me know if you have any comments or issues.

Hi,
This series fails to compile for win32. I have dropped it from my queue.

You can use "make docker-test-build@fedora-win32-cross" to run a win32
build locally on a Linux host. Please take a look and I'll merge it
again when the build issue is fixed.

Thanks!

https://gitlab.com/qemu-project/qemu/-/jobs/4977255992

In file included from /usr/i686-w64-mingw32/sys-root/mingw/include/windows.h:72,
from /usr/i686-w64-mingw32/sys-root/mingw/include/winsock2.h:23,
from /builds/qemu-project/qemu/include/sysemu/os-win32.h:29,
from /builds/qemu-project/qemu/include/qemu/osdep.h:160,
from ../hw/ufs/lu.c:11:
/builds/qemu-project/qemu/include/block/ufs.h:456:5: error: expected
identifier before numeric constant
456 | PWR_OK = 0x0,
| ^~
[1657/3707] Compiling C object libcommon.fa.p/hw_ufs_ufs.c.obj
FAILED: libcommon.fa.p/hw_ufs_ufs.c.obj
i686-w64-mingw32-gcc -m32 -Ilibcommon.fa.p -Isubprojects/dtc/libfdt
-I../subprojects/dtc/libfdt
-I/usr/i686-w64-mingw32/sys-root/mingw/include/pixman-1
-I/usr/i686-w64-mingw32/sys-root/mingw/include/libpng16
-I/usr/i686-w64-mingw32/sys-root/mingw/include/p11-kit-1
-I/usr/i686-w64-mingw32/sys-root/mingw/include/SDL2
-I/usr/i686-w64-mingw32/sys-root/mingw/include/glib-2.0
-I/usr/i686-w64-mingw32/sys-root/mingw/lib/glib-2.0/include
-I/usr/i686-w64-mingw32/sys-root/mingw/include/gtk-3.0
-I/usr/i686-w64-mingw32/sys-root/mingw/include/pango-1.0
-I/usr/i686-w64-mingw32/sys-root/mingw/include/harfbuzz
-I/usr/i686-w64-mingw32/sys-root/mingw/include/freetype2
-I/usr/i686-w64-mingw32/sys-root/mingw/include/fribidi
-I/usr/i686-w64-mingw32/sys-root/mingw/include/cairo
-I/usr/i686-w64-mingw32/sys-root/mingw/include/gdk-pixbuf-2.0
-I/usr/i686-w64-mingw32/sys-root/mingw/include/atk-1.0
-I/usr/i686-w64-mingw32/sys-root/mingw/include/webp
-fdiagnostics-color=auto -Wall -Winvalid-pch -Werror -std=gnu11 -O2 -g
-fstack-protector-strong -Wundef -Wwrite-strings -Wmissing-prototypes
-Wstrict-prototypes -Wredundant-decls -Wold-style-declaration
-Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k
-Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs
-Wendif-labels -Wexpansion-to-defined -Wimplicit-fallthrough=2
-Wmissing-format-attribute -Wno-missing-include-dirs
-Wno-shift-negative-value -Wno-psabi -iquote . -iquote
/builds/qemu-project/qemu -iquote /builds/qemu-project/qemu/include
-iquote /builds/qemu-project/qemu/host/include/i386 -iquote
/builds/qemu-project/qemu/host/include/generic -iquote
/builds/qemu-project/qemu/tcg/i386 -mms-bitfields -D_GNU_SOURCE
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing
-fno-common -fwrapv -fno-pie -no-pie -mms-bitfields -mms-bitfields
-mms-bitfields -Dmain=SDL_main -MD -MQ libcommon.fa.p/hw_ufs_ufs.c.obj
-MF libcommon.fa.p/hw_ufs_ufs.c.obj.d -o
libcommon.fa.p/hw_ufs_ufs.c.obj -c ../hw/ufs/ufs.c
In file included from /usr/i686-w64-mingw32/sys-root/mingw/include/windows.h:72,
from /usr/i686-w64-mingw32/sys-root/mingw/include/winsock2.h:23,
from /builds/qemu-project/qemu/include/sysemu/os-win32.h:29,
from /builds/qemu-project/qemu/include/qemu/osdep.h:160,
from ../hw/ufs/ufs.c:24:
/builds/qemu-project/qemu/include/block/ufs.h:456:5: error: expected
identifier before numeric constant
456 | PWR_OK = 0x0,
| ^~
In file included from /builds/qemu-project/qemu/include/block/ufs.h:6,
from ../hw/ufs/ufs.h:16,
from ../hw/ufs/ufs.c:28:
../hw/ufs/ufs.c: In function 'ufs_process_uiccmd':
../hw/ufs/ufs.c:310:58: error: 'PWR_LOCAL' undeclared (first use in
this function)
310 | u->reg.hcs = FIELD_DP32(u->reg.hcs, HCS, UPMCRS, PWR_LOCAL);
| ^
/builds/qemu-project/qemu/include/hw/registerfields.h:104:19: note: in
definition of macro 'FIELD_DP32'
104 | } _v = { .v = val }; \
| ^~~
../hw/ufs/ufs.c:310:58: note: each undeclared identifier is reported
only once for each function it appears in
310 | u->reg.hcs = FIELD_DP32(u->reg.hcs, HCS, UPMCRS, PWR_LOCAL);
| ^
/builds/qemu-project/qemu/include/hw/registerfields.h:104:19: note: in
definition of macro 'FIELD_DP32'
104 | } _v = { .v = val }; \
| ^~~



Re: [PATCH v2 4/4] io: follow coroutine AioContext in qio_channel_yield()

2023-08-29 Thread Stefan Hajnoczi
On Tue, Aug 29, 2023 at 02:32:46PM -0500, Eric Blake wrote:
> On Tue, Aug 29, 2023 at 12:06:22PM -0400, Stefan Hajnoczi wrote:
> > The ongoing QEMU multi-queue block layer effort makes it possible for 
> > multiple
> > threads to process I/O in parallel. The nbd block driver is not compatible 
> > with
> > the multi-queue block layer yet because QIOChannel cannot be used easily 
> > from
> > coroutines running in multiple threads. This series changes the QIOChannel 
> > API
> > to make that possible.
> > 
> > In the current API, calling qio_channel_attach_aio_context() sets the
> > AioContext where qio_channel_yield() installs an fd handler prior to 
> > yielding:
> > 
> >   qio_channel_attach_aio_context(ioc, my_ctx);
> >   ...
> >   qio_channel_yield(ioc); // my_ctx is used here
> >   ...
> >   qio_channel_detach_aio_context(ioc);
> > 
> > This API design has limitations: reading and writing must be done in the 
> > same
> > AioContext and moving between AioContexts involves a cumbersome sequence of 
> > API
> > calls that is not suitable for doing on a per-request basis.
> > 
> > There is no fundamental reason why a QIOChannel needs to run within the
> > same AioContext every time qio_channel_yield() is called. QIOChannel
> > only uses the AioContext while inside qio_channel_yield(). The rest of
> > the time, QIOChannel is independent of any AioContext.
> > 
> > In the new API, qio_channel_yield() queries the AioContext from the current
> > coroutine using qemu_coroutine_get_aio_context(). There is no need to
> > explicitly attach/detach AioContexts anymore and
> > qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are 
> > gone.
> > One coroutine can read from the QIOChannel while another coroutine writes 
> > from
> > a different AioContext.
> > 
> > This API change allows the nbd block driver to use QIOChannel from any 
> > thread.
> > It's important to keep in mind that the block driver already synchronizes
> > QIOChannel access and ensures that two coroutines never read simultaneously 
> > or
> > write simultaneously.
> > 
> > This patch updates all users of qio_channel_attach_aio_context() to the
> > new API. Most conversions are simple, but vhost-user-server requires a
> > new qemu_coroutine_yield() call to quiesce the vu_client_trip()
> > coroutine when not attached to any AioContext.
> > 
> > While the API has become simpler, there is one wart: QIOChannel has a
> > special case for the iohandler AioContext (used for handlers that must not 
> > run
> > in nested event loops). I didn't find an elegant way to preserve that 
> > behavior, so
> > I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, 
> > true|false)
> > for opting in to the new AioContext model. By default QIOChannel uses the
> > iohandler AioContext. Code that formerly called
> > qio_channel_attach_aio_context() now calls
> > qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
> > created.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> > +++ b/include/io/channel-util.h
> > @@ -49,4 +49,27 @@
> >  QIOChannel *qio_channel_new_fd(int fd,
> > Error **errp);
> >  
> > +/**
> > + * qio_channel_yield:
> 
> Wrong function name...
> 
> > + * @read_fd: the file descriptor for the read handler
> > + * @read_ctx: the AioContext for the read handler
> > + * @io_read: the read handler
> > + * @write_fd: the file descriptor for the write handler
> > + * @write_ctx: the AioContext for the write handler
> > + * @io_write: the write handler
> > + * @opaque: the opaque argument to the read and write handler
> > + *
> > + * Set the read and write handlers when @read_ctx and @write_ctx are 
> > non-NULL,
> > + * respectively. To leave a handler in its current state, pass a NULL
> > + * AioContext. To clear a handler, pass a non-NULL AioContext and a NULL
> > + * handler.
> > + */
> > +void qio_channel_util_set_aio_fd_handler(int read_fd,
> 
> ...should be this.
> 
> > + AioContext *read_ctx,
> > + IOHandler *io_read,
> > + int write_fd,
> > + AioContext *write_ctx,
> > + IOHandler *io_write,
> > + void *opaque);
> > +
> 

Re: [PULL 0/3] Dirty page rate and dirty page limit 20230828 patches

2023-08-29 Thread Stefan Hajnoczi
On Tue, 29 Aug 2023 at 12:30, Yong Huang  wrote:
> On Tue, Aug 29, 2023 at 4:01 AM Stefan Hajnoczi  wrote:
>>
>> On Mon, 28 Aug 2023 at 10:36, Hyman Huang  wrote:
>> >
>> > From: Hyman 
>> >
>> > The following changes since commit 
>> > 50e7a40af372ee5931c99ef7390f5d3d6fbf6ec4:
>> >
>> >   Merge tag 'pull-target-arm-20230824' of 
>> > https://git.linaro.org/people/pmaydell/qemu-arm into staging (2023-08-24 
>> > 10:08:33 -0400)
>> >
>> > are available in the git repository at:
>> >
>> >   https://github.com/newfriday/qemu.git 
>> > tags/dirtylimit-dirtyrate-fixes-pull-request
>>
>> Hi,
>> This is not a signed tag. Please use "git tag -s" so the tag is signed
>> with your GPG key.
>>
>> I also noticed that this pull request email thread only has a cover
>> letter. Please also send the individual patches along with the pull
>> request email. This makes it easier for people to reply if they have
>> comments about a patch.
>>
>> After pushing a signed tag, please send the pull request again with
>> "PULL v2" in the subject line. Thanks!
>
>
> Sorry for not noticing this earlier and I have sent a pull request with "PULL"
> in the subject line instead of "PULL v3" that you mentioned above, please
> ping me if PULL request resending is required indeed.

I have applied the pull request. To make it easier to verify future
pull requests, please publish your public key to a keyserver:

  $ gpg --send-keys DFF223D6B3FECB9C

Thanks!

Stefan



[PULL v2 8/8] tests/qemu-iotests/197: add testcase for CoR with subclusters

2023-08-29 Thread Stefan Hajnoczi
From: Andrey Drobyshev 

Add testcase which checks that allocations during copy-on-read are
performed on the subcluster basis when subclusters are enabled in target
image.

This testcase also triggers the following assert with previous commit
not being applied, so we check that as well:

qemu-io: ../block/io.c:1236: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < 
pnum' failed.

Reviewed-by: Eric Blake 
Reviewed-by: Denis V. Lunev 
Signed-off-by: Andrey Drobyshev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20230711172553.234055-4-andrey.drobys...@virtuozzo.com>
---
 tests/qemu-iotests/197 | 29 +
 tests/qemu-iotests/197.out | 24 
 2 files changed, 53 insertions(+)

diff --git a/tests/qemu-iotests/197 b/tests/qemu-iotests/197
index a2547bc280..f07a9da136 100755
--- a/tests/qemu-iotests/197
+++ b/tests/qemu-iotests/197
@@ -122,6 +122,35 @@ $QEMU_IO -f qcow2 -C -c 'read 0 1024' "$TEST_WRAP" | 
_filter_qemu_io
 $QEMU_IO -f qcow2 -c map "$TEST_WRAP"
 _check_test_img
 
+echo
+echo '=== Copy-on-read with subclusters ==='
+echo
+
+# Create base and top images 64K (1 cluster) each.  Make subclusters enabled
+# for the top image
+_make_test_img 64K
+IMGPROTO=file IMGFMT=qcow2 TEST_IMG_FILE="$TEST_WRAP" \
+_make_test_img --no-opts -o extended_l2=true -F "$IMGFMT" -b "$TEST_IMG" \
+64K | _filter_img_create
+
+$QEMU_IO -c "write -P 0xaa 0 64k" "$TEST_IMG" | _filter_qemu_io
+
+# Allocate individual subclusters in the top image, and not the whole cluster
+$QEMU_IO -c "write -P 0xbb 28K 2K" -c "write -P 0xcc 34K 2K" "$TEST_WRAP" \
+| _filter_qemu_io
+
+# Only 2 subclusters should be allocated in the top image at this point
+$QEMU_IMG map "$TEST_WRAP" | _filter_qemu_img_map
+
+# Actual copy-on-read operation
+$QEMU_IO -C -c "read -P 0xaa 30K 4K" "$TEST_WRAP" | _filter_qemu_io
+
+# And here we should have 4 subclusters allocated right in the middle of the
+# top image. Make sure the whole cluster remains unallocated
+$QEMU_IMG map "$TEST_WRAP" | _filter_qemu_img_map
+
+_check_test_img
+
 # success, all done
 echo '*** done'
 status=0
diff --git a/tests/qemu-iotests/197.out b/tests/qemu-iotests/197.out
index ad414c3b0e..8f34a30afe 100644
--- a/tests/qemu-iotests/197.out
+++ b/tests/qemu-iotests/197.out
@@ -31,4 +31,28 @@ read 1024/1024 bytes at offset 0
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 1 KiB (0x400) bytes allocated at offset 0 bytes (0x0)
 No errors were found on the image.
+
+=== Copy-on-read with subclusters ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65536
+Formatting 'TEST_DIR/t.wrap.IMGFMT', fmt=IMGFMT size=65536 
backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 28672
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 34816
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset  Length  File
+0   0x7000  TEST_DIR/t.IMGFMT
+0x7000  0x800   TEST_DIR/t.wrap.IMGFMT
+0x7800  0x1000  TEST_DIR/t.IMGFMT
+0x8800  0x800   TEST_DIR/t.wrap.IMGFMT
+0x9000  0x7000  TEST_DIR/t.IMGFMT
+read 4096/4096 bytes at offset 30720
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset  Length  File
+0   0x7000  TEST_DIR/t.IMGFMT
+0x7000  0x2000  TEST_DIR/t.wrap.IMGFMT
+0x9000  0x7000  TEST_DIR/t.IMGFMT
+No errors were found on the image.
 *** done
-- 
2.41.0




[PULL v2 7/8] block/io: align requests to subcluster_size

2023-08-29 Thread Stefan Hajnoczi
From: Andrey Drobyshev 

When the target image is using subclusters, and we align the request during
copy-on-read, it makes sense to align to subcluster_size rather than
cluster_size.  Otherwise we end up with unnecessary allocations.

This commit renames bdrv_round_to_clusters() to bdrv_round_to_subclusters()
and utilizes subcluster_size field of BlockDriverInfo to make necessary
alignments.  It affects copy-on-read as well as mirror job (which is
using bdrv_round_to_clusters()).

This change also fixes the following bug with failing assert (covered by
the test in the subsequent commit):

qemu-img create -f qcow2 base.qcow2 64K
qemu-img create -f qcow2 -o extended_l2=on,backing_file=base.qcow2,backing_fmt=qcow2 img.qcow2 64K
qemu-io -c "write -P 0xaa 0 2K" img.qcow2
qemu-io -C -c "read -P 0x00 2K 62K" img.qcow2

qemu-io: ../block/io.c:1236: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < pnum' failed.

Reviewed-by: Eric Blake 
Reviewed-by: Denis V. Lunev 
Signed-off-by: Andrey Drobyshev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20230711172553.234055-3-andrey.drobys...@virtuozzo.com>
---
 include/block/block-io.h |  8 +++
 block/io.c   | 50 
 block/mirror.c   |  8 +++
 3 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index 4415506e40..6db48f2d35 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -189,10 +189,10 @@ bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs,
   Error **errp);
 BlockStatsSpecific *bdrv_get_specific_stats(BlockDriverState *bs);
-void bdrv_round_to_clusters(BlockDriverState *bs,
-int64_t offset, int64_t bytes,
-int64_t *cluster_offset,
-int64_t *cluster_bytes);
+void bdrv_round_to_subclusters(BlockDriverState *bs,
+   int64_t offset, int64_t bytes,
+   int64_t *cluster_offset,
+   int64_t *cluster_bytes);
 
 void bdrv_get_backing_filename(BlockDriverState *bs,
char *filename, int filename_size);
diff --git a/block/io.c b/block/io.c
index 055fcf7438..76e7df18d8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -728,21 +728,21 @@ BdrvTrackedRequest *coroutine_fn bdrv_co_get_self_request(BlockDriverState *bs)
 }
 
 /**
- * Round a region to cluster boundaries
+ * Round a region to subcluster (if supported) or cluster boundaries
  */
 void coroutine_fn GRAPH_RDLOCK
-bdrv_round_to_clusters(BlockDriverState *bs, int64_t offset, int64_t bytes,
-   int64_t *cluster_offset, int64_t *cluster_bytes)
+bdrv_round_to_subclusters(BlockDriverState *bs, int64_t offset, int64_t bytes,
+  int64_t *align_offset, int64_t *align_bytes)
 {
 BlockDriverInfo bdi;
 IO_CODE();
-if (bdrv_co_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
-*cluster_offset = offset;
-*cluster_bytes = bytes;
+if (bdrv_co_get_info(bs, &bdi) < 0 || bdi.subcluster_size == 0) {
+*align_offset = offset;
+*align_bytes = bytes;
 } else {
-int64_t c = bdi.cluster_size;
-*cluster_offset = QEMU_ALIGN_DOWN(offset, c);
-*cluster_bytes = QEMU_ALIGN_UP(offset - *cluster_offset + bytes, c);
+int64_t c = bdi.subcluster_size;
+*align_offset = QEMU_ALIGN_DOWN(offset, c);
+*align_bytes = QEMU_ALIGN_UP(offset - *align_offset + bytes, c);
 }
 }
 
@@ -1168,8 +1168,8 @@ bdrv_co_do_copy_on_readv(BdrvChild *child, int64_t offset, int64_t bytes,
 void *bounce_buffer = NULL;
 
 BlockDriver *drv = bs->drv;
-int64_t cluster_offset;
-int64_t cluster_bytes;
+int64_t align_offset;
+int64_t align_bytes;
 int64_t skip_bytes;
 int ret;
 int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
@@ -1203,28 +1203,28 @@ bdrv_co_do_copy_on_readv(BdrvChild *child, int64_t offset, int64_t bytes,
  * BDRV_REQUEST_MAX_BYTES (even when the original read did not), which
  * is one reason we loop rather than doing it all at once.
  */
-bdrv_round_to_clusters(bs, offset, bytes, &cluster_offset, &cluster_bytes);
-skip_bytes = offset - cluster_offset;
+bdrv_round_to_subclusters(bs, offset, bytes, &align_offset, &align_bytes);
+skip_bytes = offset - align_offset;
 
 trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
-   cluster_offset, cluster_bytes);
+   align_offset, align_bytes);
 
-while (cluster_bytes) {
+while (align_bytes) {
 int64_t pnum;
 
 if (skip_write) {
 ret = 1; /* "already allocated", so nothing will be copied */
-pnum = MIN(cl

[PULL v2 1/8] hw/ufs: Initial commit for emulated Universal-Flash-Storage

2023-08-29 Thread Stefan Hajnoczi
From: Jeuk Kim 

Universal Flash Storage (UFS) is a high-performance mass storage device
with a serial interface. It is primarily used as a high-performance
data storage device for embedded applications.

This commit contains the code for the UFS device to be
recognized as a UFS PCI device.
Patches to handle the UFS logical unit and Transfer Requests will follow.

Signed-off-by: Jeuk Kim 
Reviewed-by: Stefan Hajnoczi 
Message-id: 9f3db32fe1c708090a6bb764d456973b5abef55f.1691062912.git.jeuk20@samsung.com
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS  |6 +
 docs/specs/pci-ids.rst   |2 +
 meson.build  |1 +
 hw/ufs/trace.h   |1 +
 hw/ufs/ufs.h |   42 ++
 include/block/ufs.h  | 1090 ++
 include/hw/pci/pci.h |1 +
 include/hw/pci/pci_ids.h |1 +
 hw/ufs/ufs.c |  278 ++
 hw/Kconfig   |1 +
 hw/meson.build   |1 +
 hw/ufs/Kconfig   |4 +
 hw/ufs/meson.build   |1 +
 hw/ufs/trace-events  |   32 ++
 14 files changed, 1461 insertions(+)
 create mode 100644 hw/ufs/trace.h
 create mode 100644 hw/ufs/ufs.h
 create mode 100644 include/block/ufs.h
 create mode 100644 hw/ufs/ufs.c
 create mode 100644 hw/ufs/Kconfig
 create mode 100644 hw/ufs/meson.build
 create mode 100644 hw/ufs/trace-events

diff --git a/MAINTAINERS b/MAINTAINERS
index 6111b6b4d9..7da40dabc2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2256,6 +2256,12 @@ F: tests/qtest/nvme-test.c
 F: docs/system/devices/nvme.rst
 T: git git://git.infradead.org/qemu-nvme.git nvme-next
 
+ufs
+M: Jeuk Kim 
+S: Supported
+F: hw/ufs/*
+F: include/block/ufs.h
+
 megasas
 M: Hannes Reinecke 
 L: qemu-bl...@nongnu.org
diff --git a/docs/specs/pci-ids.rst b/docs/specs/pci-ids.rst
index e302bea484..d6707fa069 100644
--- a/docs/specs/pci-ids.rst
+++ b/docs/specs/pci-ids.rst
@@ -92,6 +92,8 @@ PCI devices (other than virtio):
   PCI PVPanic device (``-device pvpanic-pci``)
 1b36:0012
   PCI ACPI ERST device (``-device acpi-erst``)
+1b36:0013
+  PCI UFS device (``-device ufs``)
 
 All these devices are documented in :doc:`index`.
 
diff --git a/meson.build b/meson.build
index 98e68ef0b1..76578b2bf2 100644
--- a/meson.build
+++ b/meson.build
@@ -3277,6 +3277,7 @@ if have_system
 'hw/ssi',
 'hw/timer',
 'hw/tpm',
+'hw/ufs',
 'hw/usb',
 'hw/vfio',
 'hw/virtio',
diff --git a/hw/ufs/trace.h b/hw/ufs/trace.h
new file mode 100644
index 00..2dbd6397c3
--- /dev/null
+++ b/hw/ufs/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_ufs.h"
diff --git a/hw/ufs/ufs.h b/hw/ufs/ufs.h
new file mode 100644
index 00..d9d195caec
--- /dev/null
+++ b/hw/ufs/ufs.h
@@ -0,0 +1,42 @@
+/*
+ * QEMU UFS
+ *
+ * Copyright (c) 2023 Samsung Electronics Co., Ltd. All rights reserved.
+ *
+ * Written by Jeuk Kim 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_UFS_UFS_H
+#define HW_UFS_UFS_H
+
+#include "hw/pci/pci_device.h"
+#include "hw/scsi/scsi.h"
+#include "block/ufs.h"
+
+#define UFS_MAX_LUS 32
+#define UFS_BLOCK_SIZE 4096
+
+typedef struct UfsParams {
+char *serial;
+uint8_t nutrs; /* Number of UTP Transfer Request Slots */
+uint8_t nutmrs; /* Number of UTP Task Management Request Slots */
+} UfsParams;
+
+typedef struct UfsHc {
+PCIDevice parent_obj;
+MemoryRegion iomem;
+UfsReg reg;
+UfsParams params;
+uint32_t reg_size;
+
+qemu_irq irq;
+QEMUBH *doorbell_bh;
+QEMUBH *complete_bh;
+} UfsHc;
+
+#define TYPE_UFS "ufs"
+#define UFS(obj) OBJECT_CHECK(UfsHc, (obj), TYPE_UFS)
+
+#endif /* HW_UFS_UFS_H */
diff --git a/include/block/ufs.h b/include/block/ufs.h
new file mode 100644
index 00..e70ee23183
--- /dev/null
+++ b/include/block/ufs.h
@@ -0,0 +1,1090 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef BLOCK_UFS_H
+#define BLOCK_UFS_H
+
+#include "hw/registerfields.h"
+
+typedef struct QEMU_PACKED UfsReg {
+uint32_t cap;
+uint32_t rsvd0;
+uint32_t ver;
+uint32_t rsvd1;
+uint32_t hcpid;
+uint32_t hcmid;
+uint32_t ahit;
+uint32_t rsvd2;
+uint32_t is;
+uint32_t ie;
+uint32_t rsvd3[2];
+uint32_t hcs;
+uint32_t hce;
+uint32_t uecpa;
+uint32_t uecdl;
+uint32_t uecn;
+uint32_t uect;
+uint32_t uecdme;
+uint32_t utriacr;
+uint32_t utrlba;
+uint32_t utrlbau;
+uint32_t utrldbr;
+uint32_t utrlclr;
+uint32_t utrlrsr;
+uint32_t utrlcnr;
+uint32_t rsvd4[2];
+uint32_t utmrlba;
+uint32_t utmrlbau;
+uint32_t utmrldbr;
+uint32_t utmrlclr;
+uint32_t utmrlrsr;
+uint32_t rsvd5[3];
+uint32_t uiccmd;
+uint32_t ucmdarg1;
+uint32_t ucmdarg2;
+uint32_t ucmdarg3;
+uint32_t rsvd6[4];
+uint32_t rsvd7[4];
+uint32_t rsvd8[16];
+uint32_t ccap;
+} UfsReg;
+
+REG32(CAP, offsetof(UfsReg, cap))
+FIELD(CAP, NUTRS, 0, 5)
+FIELD(C

[PULL v2 2/8] hw/ufs: Support for Query Transfer Requests

2023-08-29 Thread Stefan Hajnoczi
From: Jeuk Kim 

This commit makes the UFS device support query
and nop out transfer requests.

The next patch will add support for the UFS logical
unit and SCSI command transfer requests.

Signed-off-by: Jeuk Kim 
Reviewed-by: Stefan Hajnoczi 
Message-id: d06b440d660872092f70af1b8167bd5f4704c957.1691062912.git.jeuk20@samsung.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/ufs/ufs.h|  46 +++
 hw/ufs/ufs.c| 980 +++-
 hw/ufs/trace-events |   1 +
 3 files changed, 1025 insertions(+), 2 deletions(-)

diff --git a/hw/ufs/ufs.h b/hw/ufs/ufs.h
index d9d195caec..3d1b2cff4e 100644
--- a/hw/ufs/ufs.h
+++ b/hw/ufs/ufs.h
@@ -18,6 +18,32 @@
 #define UFS_MAX_LUS 32
 #define UFS_BLOCK_SIZE 4096
 
+typedef enum UfsRequestState {
+UFS_REQUEST_IDLE = 0,
+UFS_REQUEST_READY = 1,
+UFS_REQUEST_RUNNING = 2,
+UFS_REQUEST_COMPLETE = 3,
+UFS_REQUEST_ERROR = 4,
+} UfsRequestState;
+
+typedef enum UfsReqResult {
+UFS_REQUEST_SUCCESS = 0,
+UFS_REQUEST_FAIL = 1,
+} UfsReqResult;
+
+typedef struct UfsRequest {
+struct UfsHc *hc;
+UfsRequestState state;
+int slot;
+
+UtpTransferReqDesc utrd;
+UtpUpiuReq req_upiu;
+UtpUpiuRsp rsp_upiu;
+
+/* for scsi command */
+QEMUSGList *sg;
+} UfsRequest;
+
 typedef struct UfsParams {
 char *serial;
 uint8_t nutrs; /* Number of UTP Transfer Request Slots */
@@ -30,6 +56,12 @@ typedef struct UfsHc {
 UfsReg reg;
 UfsParams params;
 uint32_t reg_size;
+UfsRequest *req_list;
+
+DeviceDescriptor device_desc;
+GeometryDescriptor geometry_desc;
+Attributes attributes;
+Flags flags;
 
 qemu_irq irq;
 QEMUBH *doorbell_bh;
@@ -39,4 +71,18 @@ typedef struct UfsHc {
 #define TYPE_UFS "ufs"
 #define UFS(obj) OBJECT_CHECK(UfsHc, (obj), TYPE_UFS)
 
+typedef enum UfsQueryFlagPerm {
+UFS_QUERY_FLAG_NONE = 0x0,
+UFS_QUERY_FLAG_READ = 0x1,
+UFS_QUERY_FLAG_SET = 0x2,
+UFS_QUERY_FLAG_CLEAR = 0x4,
+UFS_QUERY_FLAG_TOGGLE = 0x8,
+} UfsQueryFlagPerm;
+
+typedef enum UfsQueryAttrPerm {
+UFS_QUERY_ATTR_NONE = 0x0,
+UFS_QUERY_ATTR_READ = 0x1,
+UFS_QUERY_ATTR_WRITE = 0x2,
+} UfsQueryAttrPerm;
+
 #endif /* HW_UFS_UFS_H */
diff --git a/hw/ufs/ufs.c b/hw/ufs/ufs.c
index 101082a8a3..c771f8e0b4 100644
--- a/hw/ufs/ufs.c
+++ b/hw/ufs/ufs.c
@@ -15,10 +15,221 @@
 #include "ufs.h"
 
 /* The QEMU-UFS device follows spec version 3.1 */
-#define UFS_SPEC_VER 0x0310
+#define UFS_SPEC_VER 0x0310
 #define UFS_MAX_NUTRS 32
 #define UFS_MAX_NUTMRS 8
 
+static MemTxResult ufs_addr_read(UfsHc *u, hwaddr addr, void *buf, int size)
+{
+hwaddr hi = addr + size - 1;
+
+if (hi < addr) {
+return MEMTX_DECODE_ERROR;
+}
+
+if (!FIELD_EX32(u->reg.cap, CAP, 64AS) && (hi >> 32)) {
+return MEMTX_DECODE_ERROR;
+}
+
+return pci_dma_read(PCI_DEVICE(u), addr, buf, size);
+}
+
+static MemTxResult ufs_addr_write(UfsHc *u, hwaddr addr, const void *buf,
+  int size)
+{
+hwaddr hi = addr + size - 1;
+if (hi < addr) {
+return MEMTX_DECODE_ERROR;
+}
+
+if (!FIELD_EX32(u->reg.cap, CAP, 64AS) && (hi >> 32)) {
+return MEMTX_DECODE_ERROR;
+}
+
+return pci_dma_write(PCI_DEVICE(u), addr, buf, size);
+}
+
+static void ufs_complete_req(UfsRequest *req, UfsReqResult req_result);
+
+static inline hwaddr ufs_get_utrd_addr(UfsHc *u, uint32_t slot)
+{
+hwaddr utrl_base_addr = (((hwaddr)u->reg.utrlbau) << 32) + u->reg.utrlba;
+hwaddr utrd_addr = utrl_base_addr + slot * sizeof(UtpTransferReqDesc);
+
+return utrd_addr;
+}
+
+static inline hwaddr ufs_get_req_upiu_base_addr(const UtpTransferReqDesc *utrd)
+{
+uint32_t cmd_desc_base_addr_lo =
+le32_to_cpu(utrd->command_desc_base_addr_lo);
+uint32_t cmd_desc_base_addr_hi =
+le32_to_cpu(utrd->command_desc_base_addr_hi);
+
+return (((hwaddr)cmd_desc_base_addr_hi) << 32) + cmd_desc_base_addr_lo;
+}
+
+static inline hwaddr ufs_get_rsp_upiu_base_addr(const UtpTransferReqDesc *utrd)
+{
+hwaddr req_upiu_base_addr = ufs_get_req_upiu_base_addr(utrd);
+uint32_t rsp_upiu_byte_off =
+le16_to_cpu(utrd->response_upiu_offset) * sizeof(uint32_t);
+return req_upiu_base_addr + rsp_upiu_byte_off;
+}
+
+static MemTxResult ufs_dma_read_utrd(UfsRequest *req)
+{
+UfsHc *u = req->hc;
+hwaddr utrd_addr = ufs_get_utrd_addr(u, req->slot);
+MemTxResult ret;
+
+ret = ufs_addr_read(u, utrd_addr, &req->utrd, sizeof(req->utrd));
+if (ret) {
+trace_ufs_err_dma_read_utrd(req->slot, utrd_addr);
+}
+return ret;
+}
+
+static MemTxResult ufs_dma_read_req_upiu(UfsRequest *req)
+{
+UfsHc *u = req->hc;
+hwaddr req_upiu_base_addr = ufs_get_req_upiu_base_addr(&req->utrd);
+UtpUpiuReq *req_upiu = &req->req_upiu;
+uint32_t copy_size;
+uint16_t

[PULL v2 4/8] tests/qtest: Introduce tests for UFS

2023-08-29 Thread Stefan Hajnoczi
From: Jeuk Kim 

This patch includes the following tests:
  Test mmio read
  Test ufs device initialization and ufs-lu recognition
  Test I/O (Performs a write followed by a read to verify)

Signed-off-by: Jeuk Kim 
Acked-by: Thomas Huth 
Reviewed-by: Stefan Hajnoczi 
Message-id: af6b8d54c049490b3533a784a0aeac4798bb9217.1691062912.git.jeuk20@samsung.com
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS |   1 +
 tests/qtest/ufs-test.c  | 584 
 tests/qtest/meson.build |   1 +
 3 files changed, 586 insertions(+)
 create mode 100644 tests/qtest/ufs-test.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7da40dabc2..a45bac20c9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2261,6 +2261,7 @@ M: Jeuk Kim 
 S: Supported
 F: hw/ufs/*
 F: include/block/ufs.h
+F: tests/qtest/ufs-test.c
 
 megasas
 M: Hannes Reinecke 
diff --git a/tests/qtest/ufs-test.c b/tests/qtest/ufs-test.c
new file mode 100644
index 00..34c97bdf1b
--- /dev/null
+++ b/tests/qtest/ufs-test.c
@@ -0,0 +1,584 @@
+/*
+ * QTest testcase for UFS
+ *
+ * Copyright (c) 2023 Samsung Electronics Co., Ltd. All rights reserved.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/module.h"
+#include "qemu/units.h"
+#include "libqtest.h"
+#include "libqos/qgraph.h"
+#include "libqos/pci.h"
+#include "scsi/constants.h"
+#include "include/block/ufs.h"
+
+/* Test images sizes in Bytes */
+#define TEST_IMAGE_SIZE (64 * 1024 * 1024)
+/* Timeout for various operations, in seconds. */
+#define TIMEOUT_SECONDS 10
+/* Maximum PRD entry count */
+#define MAX_PRD_ENTRY_COUNT 10
+#define PRD_ENTRY_DATA_SIZE 4096
+/* Constants to build upiu */
+#define UTP_COMMAND_DESCRIPTOR_SIZE 4096
+#define UTP_RESPONSE_UPIU_OFFSET 1024
+#define UTP_PRDT_UPIU_OFFSET 2048
+
+typedef struct QUfs QUfs;
+
+struct QUfs {
+QOSGraphObject obj;
+QPCIDevice dev;
+QPCIBar bar;
+
+uint64_t utrlba;
+uint64_t utmrlba;
+uint64_t cmd_desc_addr;
+uint64_t data_buffer_addr;
+
+bool enabled;
+};
+
+static inline uint32_t ufs_rreg(QUfs *ufs, size_t offset)
+{
+return qpci_io_readl(&ufs->dev, ufs->bar, offset);
+}
+
+static inline void ufs_wreg(QUfs *ufs, size_t offset, uint32_t value)
+{
+qpci_io_writel(&ufs->dev, ufs->bar, offset, value);
+}
+
+static void ufs_wait_for_irq(QUfs *ufs)
+{
+uint64_t end_time;
+uint32_t is;
+/* Wait for device to reset as the linux driver does. */
+end_time = g_get_monotonic_time() + TIMEOUT_SECONDS * G_TIME_SPAN_SECOND;
+do {
+qtest_clock_step(ufs->dev.bus->qts, 100);
+is = ufs_rreg(ufs, A_IS);
+} while (is == 0 && g_get_monotonic_time() < end_time);
+}
+
+static UtpTransferReqDesc ufs_build_req_utrd(uint64_t cmd_desc_addr,
+ uint8_t slot,
+ uint32_t data_direction,
+ uint16_t prd_table_length)
+{
+UtpTransferReqDesc req = { 0 };
+uint64_t command_desc_base_addr =
+cmd_desc_addr + slot * UTP_COMMAND_DESCRIPTOR_SIZE;
+
+req.header.dword_0 =
+cpu_to_le32(1 << 28 | data_direction | UTP_REQ_DESC_INT_CMD);
+req.header.dword_2 = cpu_to_le32(OCS_INVALID_COMMAND_STATUS);
+
+req.command_desc_base_addr_hi = cpu_to_le32(command_desc_base_addr >> 32);
+req.command_desc_base_addr_lo =
+cpu_to_le32(command_desc_base_addr & 0xffffffff);
+req.response_upiu_offset =
+cpu_to_le16(UTP_RESPONSE_UPIU_OFFSET / sizeof(uint32_t));
+req.response_upiu_length = cpu_to_le16(sizeof(UtpUpiuRsp));
+req.prd_table_offset = cpu_to_le16(UTP_PRDT_UPIU_OFFSET / sizeof(uint32_t));
+req.prd_table_length = cpu_to_le16(prd_table_length);
+return req;
+}
+
+static void ufs_send_nop_out(QUfs *ufs, uint8_t slot,
+ UtpTransferReqDesc *utrd_out, UtpUpiuRsp *rsp_out)
+{
+/* Build up utp transfer request descriptor */
+UtpTransferReqDesc utrd =
+ufs_build_req_utrd(ufs->cmd_desc_addr, slot, UTP_NO_DATA_TRANSFER, 0);
+uint64_t utrd_addr = ufs->utrlba + slot * sizeof(UtpTransferReqDesc);
+uint64_t req_upiu_addr =
+ufs->cmd_desc_addr + slot * UTP_COMMAND_DESCRIPTOR_SIZE;
+uint64_t rsp_upiu_addr = req_upiu_addr + UTP_RESPONSE_UPIU_OFFSET;
+qtest_memwrite(ufs->dev.bus->qts, utrd_addr, &utrd, sizeof(utrd));
+
+/* Build up request upiu */
+UtpUpiuReq req_upiu = { 0 };
+req_upiu.header.trans_type = UPIU_TRANSACTION_NOP_OUT;
+req_upiu.header.task_tag = slot;
+qtest_memwrite(ufs->dev.bus->qts, req_upiu_addr, &req_upiu,
+   sizeof(req_upiu));
+
+/* Ring Doorbell */
+ufs_wreg(ufs, A_UTRLDBR, 1);
+ufs_wait_for_irq(ufs);
+g_assert_true(FIELD_EX32(ufs_rreg(ufs, A_IS), IS, UTRCS));
+ufs_wreg(ufs, A_IS, F

[PULL v2 6/8] block: add subcluster_size field to BlockDriverInfo

2023-08-29 Thread Stefan Hajnoczi
From: Andrey Drobyshev 

This is going to be used in the subsequent commit as the request alignment
(in particular, during copy-on-read).  This value only makes sense for
the formats which support subclusters (currently QCOW2 only).  If this
field isn't set by driver's own bdrv_get_info() implementation, we
simply set it equal to the cluster size thus treating each cluster as
having a single subcluster.

Reviewed-by: Eric Blake 
Reviewed-by: Denis V. Lunev 
Signed-off-by: Andrey Drobyshev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20230711172553.234055-2-andrey.drobys...@virtuozzo.com>
---
 include/block/block-common.h | 5 +
 block.c  | 7 +++
 block/qcow2.c| 1 +
 3 files changed, 13 insertions(+)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index e15395f2cb..df5ffc8d09 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -132,6 +132,11 @@ typedef struct BlockZoneWps {
 typedef struct BlockDriverInfo {
 /* in bytes, 0 if irrelevant */
 int cluster_size;
+/*
+ * A fraction of cluster_size, if supported (currently QCOW2 only); if
+ * disabled or unsupported, set equal to cluster_size.
+ */
+int subcluster_size;
 /* offset at which the VM state can be saved (0 if not possible) */
 int64_t vm_state_offset;
 bool is_dirty;
diff --git a/block.c b/block.c
index a307c151a8..0af890f647 100644
--- a/block.c
+++ b/block.c
@@ -6480,6 +6480,13 @@ int coroutine_fn bdrv_co_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 }
 memset(bdi, 0, sizeof(*bdi));
 ret = drv->bdrv_co_get_info(bs, bdi);
+if (bdi->subcluster_size == 0) {
+/*
+ * If the driver left this unset, subclusters are not supported.
+ * Then it is safe to treat each cluster as having only one subcluster.
+ */
+bdi->subcluster_size = bdi->cluster_size;
+}
 if (ret < 0) {
 return ret;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index c51388e99d..b48cd9ce63 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5197,6 +5197,7 @@ qcow2_co_get_info(BlockDriverState *bs, BlockDriverInfo 
*bdi)
 {
 BDRVQcow2State *s = bs->opaque;
 bdi->cluster_size = s->cluster_size;
+bdi->subcluster_size = s->subcluster_size;
 bdi->vm_state_offset = qcow2_vm_state_offset(s);
 bdi->is_dirty = s->incompatible_features & QCOW2_INCOMPAT_DIRTY;
 return 0;
-- 
2.41.0




[PULL v2 5/8] block-migration: Ensure we don't crash during migration cleanup

2023-08-29 Thread Stefan Hajnoczi
From: Fabiano Rosas 

We can fail the blk_insert_bs() at init_blk_migration(), leaving the
BlkMigDevState without a dirty_bitmap and BlockDriverState. Account
for the possibly missing elements when doing cleanup.

Fix the following crashes:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x55ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359
359     BlockDriverState *bs = bitmap->bs;
 #0  0x55ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359
 #1  0x55bba331 in unset_dirty_tracking () at ../migration/block.c:371
 #2  0x55bbad98 in block_migration_cleanup_bmds () at ../migration/block.c:681

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x55e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
7073    QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) {
 #0  0x55e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
 #1  0x55e9734a in bdrv_op_unblock_all (bs=0x0, reason=0x0) at ../block.c:7095
 #2  0x55bbae13 in block_migration_cleanup_bmds () at ../migration/block.c:690

Signed-off-by: Fabiano Rosas 
Message-id: 20230731203338.27581-1-faro...@suse.de
Signed-off-by: Stefan Hajnoczi 
---
 migration/block.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index b9580a6c7e..86c2256a2b 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -368,7 +368,9 @@ static void unset_dirty_tracking(void)
 BlkMigDevState *bmds;
 
QSIMPLEQ_FOREACH(bmds, &block_mig_state.bmds_list, entry) {
-bdrv_release_dirty_bitmap(bmds->dirty_bitmap);
+if (bmds->dirty_bitmap) {
+bdrv_release_dirty_bitmap(bmds->dirty_bitmap);
+}
 }
 }
 
@@ -676,13 +678,18 @@ static int64_t get_remaining_dirty(void)
 static void block_migration_cleanup_bmds(void)
 {
 BlkMigDevState *bmds;
+BlockDriverState *bs;
 AioContext *ctx;
 
 unset_dirty_tracking();
 
while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) {
QSIMPLEQ_REMOVE_HEAD(&block_mig_state.bmds_list, entry);
-bdrv_op_unblock_all(blk_bs(bmds->blk), bmds->blocker);
+
+bs = blk_bs(bmds->blk);
+if (bs) {
+bdrv_op_unblock_all(bs, bmds->blocker);
+}
 error_free(bmds->blocker);
 
 /* Save ctx, because bmds->blk can disappear during blk_unref.  */
-- 
2.41.0




[PULL v2 0/8] Block patches

2023-08-29 Thread Stefan Hajnoczi
The following changes since commit 813bac3d8d70d85cb7835f7945eb9eed84c2d8d0:

  Merge tag '2023q3-bsd-user-pull-request' of https://gitlab.com/bsdimp/qemu into staging (2023-08-29 08:58:00 -0400)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 3f5f2285bfcdd855508a55da7875fb92de1a6ed0:

  tests/qemu-iotests/197: add testcase for CoR with subclusters (2023-08-29 13:19:56 -0400)


Pull request

v2:
- Fix authorship information lost by the mailing list for Andrey Drobyshev



Andrey Drobyshev (3):
  block: add subcluster_size field to BlockDriverInfo
  block/io: align requests to subcluster_size
  tests/qemu-iotests/197: add testcase for CoR with subclusters

Fabiano Rosas (1):
  block-migration: Ensure we don't crash during migration cleanup

Jeuk Kim (4):
  hw/ufs: Initial commit for emulated Universal-Flash-Storage
  hw/ufs: Support for Query Transfer Requests
  hw/ufs: Support for UFS logical unit
  tests/qtest: Introduce tests for UFS

 MAINTAINERS  |7 +
 docs/specs/pci-ids.rst   |2 +
 meson.build  |1 +
 hw/ufs/trace.h   |1 +
 hw/ufs/ufs.h |  131 +++
 include/block/block-common.h |5 +
 include/block/block-io.h |8 +-
 include/block/ufs.h  | 1090 +
 include/hw/pci/pci.h |1 +
 include/hw/pci/pci_ids.h |1 +
 include/scsi/constants.h |1 +
 block.c  |7 +
 block/io.c   |   50 +-
 block/mirror.c   |8 +-
 block/qcow2.c|1 +
 hw/ufs/lu.c  | 1445 
 hw/ufs/ufs.c | 1494 ++
 migration/block.c|   11 +-
 tests/qtest/ufs-test.c   |  584 +
 hw/Kconfig   |1 +
 hw/meson.build   |1 +
 hw/ufs/Kconfig   |4 +
 hw/ufs/meson.build   |1 +
 hw/ufs/trace-events  |   58 ++
 tests/qemu-iotests/197   |   29 +
 tests/qemu-iotests/197.out   |   24 +
 tests/qtest/meson.build  |1 +
 27 files changed, 4932 insertions(+), 35 deletions(-)
 create mode 100644 hw/ufs/trace.h
 create mode 100644 hw/ufs/ufs.h
 create mode 100644 include/block/ufs.h
 create mode 100644 hw/ufs/lu.c
 create mode 100644 hw/ufs/ufs.c
 create mode 100644 tests/qtest/ufs-test.c
 create mode 100644 hw/ufs/Kconfig
 create mode 100644 hw/ufs/meson.build
 create mode 100644 hw/ufs/trace-events

-- 
2.41.0




[PULL v2 3/8] hw/ufs: Support for UFS logical unit

2023-08-29 Thread Stefan Hajnoczi
From: Jeuk Kim 

This commit adds support for the UFS logical unit.
The LU handles processing of SCSI commands and
unit descriptor query requests.

This commit enables the UFS device to process
I/O requests.

Signed-off-by: Jeuk Kim 
Reviewed-by: Stefan Hajnoczi 
Message-id: 898cea923e819dc21a99597bf045a12d7983be28.1691062912.git.jeuk20@samsung.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/ufs/ufs.h |   43 ++
 include/scsi/constants.h |1 +
 hw/ufs/lu.c  | 1445 ++
 hw/ufs/ufs.c |  252 ++-
 hw/ufs/meson.build   |2 +-
 hw/ufs/trace-events  |   25 +
 6 files changed, 1761 insertions(+), 7 deletions(-)
 create mode 100644 hw/ufs/lu.c

diff --git a/hw/ufs/ufs.h b/hw/ufs/ufs.h
index 3d1b2cff4e..f244228617 100644
--- a/hw/ufs/ufs.h
+++ b/hw/ufs/ufs.h
@@ -18,6 +18,18 @@
 #define UFS_MAX_LUS 32
 #define UFS_BLOCK_SIZE 4096
 
+typedef struct UfsBusClass {
+BusClass parent_class;
+bool (*parent_check_address)(BusState *bus, DeviceState *dev, Error 
**errp);
+} UfsBusClass;
+
+typedef struct UfsBus {
+SCSIBus parent_bus;
+} UfsBus;
+
+#define TYPE_UFS_BUS "ufs-bus"
+DECLARE_OBJ_CHECKERS(UfsBus, UfsBusClass, UFS_BUS, TYPE_UFS_BUS)
+
 typedef enum UfsRequestState {
 UFS_REQUEST_IDLE = 0,
 UFS_REQUEST_READY = 1,
@@ -29,6 +41,7 @@ typedef enum UfsRequestState {
 typedef enum UfsReqResult {
 UFS_REQUEST_SUCCESS = 0,
 UFS_REQUEST_FAIL = 1,
+UFS_REQUEST_NO_COMPLETE = 2,
 } UfsReqResult;
 
 typedef struct UfsRequest {
@@ -44,6 +57,17 @@ typedef struct UfsRequest {
 QEMUSGList *sg;
 } UfsRequest;
 
+typedef struct UfsLu {
+SCSIDevice qdev;
+uint8_t lun;
+UnitDescriptor unit_desc;
+} UfsLu;
+
+typedef struct UfsWLu {
+SCSIDevice qdev;
+uint8_t lun;
+} UfsWLu;
+
 typedef struct UfsParams {
 char *serial;
 uint8_t nutrs; /* Number of UTP Transfer Request Slots */
@@ -52,12 +76,18 @@ typedef struct UfsParams {
 
 typedef struct UfsHc {
 PCIDevice parent_obj;
+UfsBus bus;
 MemoryRegion iomem;
 UfsReg reg;
 UfsParams params;
 uint32_t reg_size;
 UfsRequest *req_list;
 
+UfsLu *lus[UFS_MAX_LUS];
+UfsWLu *report_wlu;
+UfsWLu *dev_wlu;
+UfsWLu *boot_wlu;
+UfsWLu *rpmb_wlu;
 DeviceDescriptor device_desc;
 GeometryDescriptor geometry_desc;
 Attributes attributes;
@@ -71,6 +101,12 @@ typedef struct UfsHc {
 #define TYPE_UFS "ufs"
 #define UFS(obj) OBJECT_CHECK(UfsHc, (obj), TYPE_UFS)
 
+#define TYPE_UFS_LU "ufs-lu"
+#define UFSLU(obj) OBJECT_CHECK(UfsLu, (obj), TYPE_UFS_LU)
+
+#define TYPE_UFS_WLU "ufs-wlu"
+#define UFSWLU(obj) OBJECT_CHECK(UfsWLu, (obj), TYPE_UFS_WLU)
+
 typedef enum UfsQueryFlagPerm {
 UFS_QUERY_FLAG_NONE = 0x0,
 UFS_QUERY_FLAG_READ = 0x1,
@@ -85,4 +121,11 @@ typedef enum UfsQueryAttrPerm {
 UFS_QUERY_ATTR_WRITE = 0x2,
 } UfsQueryAttrPerm;
 
+static inline bool is_wlun(uint8_t lun)
+{
+return (lun == UFS_UPIU_REPORT_LUNS_WLUN ||
+lun == UFS_UPIU_UFS_DEVICE_WLUN || lun == UFS_UPIU_BOOT_WLUN ||
+lun == UFS_UPIU_RPMB_WLUN);
+}
+
 #endif /* HW_UFS_UFS_H */
diff --git a/include/scsi/constants.h b/include/scsi/constants.h
index 6a8bad556a..9b98451912 100644
--- a/include/scsi/constants.h
+++ b/include/scsi/constants.h
@@ -231,6 +231,7 @@
 #define MODE_PAGE_FLEXIBLE_DISK_GEOMETRY  0x05
 #define MODE_PAGE_CACHING 0x08
 #define MODE_PAGE_AUDIO_CTL   0x0e
+#define MODE_PAGE_CONTROL 0x0a
 #define MODE_PAGE_POWER   0x1a
 #define MODE_PAGE_FAULT_FAIL  0x1c
 #define MODE_PAGE_TO_PROTECT  0x1d
diff --git a/hw/ufs/lu.c b/hw/ufs/lu.c
new file mode 100644
index 00..ec8932ff86
--- /dev/null
+++ b/hw/ufs/lu.c
@@ -0,0 +1,1445 @@
+/*
+ * QEMU UFS Logical Unit
+ *
+ * Copyright (c) 2023 Samsung Electronics Co., Ltd. All rights reserved.
+ *
+ * Written by Jeuk Kim 
+ *
+ * This code is licensed under the GNU GPL v2 or later.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qapi/error.h"
+#include "qemu/memalign.h"
+#include "hw/scsi/scsi.h"
+#include "scsi/constants.h"
+#include "sysemu/block-backend.h"
+#include "qemu/cutils.h"
+#include "trace.h"
+#include "ufs.h"
+
+/*
+ * The code below handling SCSI commands is copied from hw/scsi/scsi-disk.c,
+ * with minor adjustments to make it work for UFS.
+ */
+
+#define SCSI_DMA_BUF_SIZE (128 * KiB)
+#define SCSI_MAX_INQUIRY_LEN 256
+#define SCSI_INQUIRY_DATA_SIZE 36
+#define SCSI_MAX_MODE_LEN 256
+
+typedef struct UfsSCSIReq {
+SCSIRequest req;
+/* Both sector and sector_count are in terms of BDRV_SECTOR_SIZE bytes.  */
+uint64_t sector;
+uint32_t sector_count;
+uint32_t buflen;
+bool started;
+bool need_fua_emulation;
+struct iovec iov;
+ 

[PATCH v2 4/4] io: follow coroutine AioContext in qio_channel_yield()

2023-08-29 Thread Stefan Hajnoczi
The ongoing QEMU multi-queue block layer effort makes it possible for multiple
threads to process I/O in parallel. The nbd block driver is not compatible with
the multi-queue block layer yet because QIOChannel cannot be used easily from
coroutines running in multiple threads. This series changes the QIOChannel API
to make that possible.

In the current API, calling qio_channel_attach_aio_context() sets the
AioContext where qio_channel_yield() installs an fd handler prior to yielding:

  qio_channel_attach_aio_context(ioc, my_ctx);
  ...
  qio_channel_yield(ioc); // my_ctx is used here
  ...
  qio_channel_detach_aio_context(ioc);

This API design has limitations: reading and writing must be done in the same
AioContext and moving between AioContexts involves a cumbersome sequence of API
calls that is not suitable for doing on a per-request basis.

There is no fundamental reason why a QIOChannel needs to run within the
same AioContext every time qio_channel_yield() is called. QIOChannel
only uses the AioContext while inside qio_channel_yield(). The rest of
the time, QIOChannel is independent of any AioContext.

In the new API, qio_channel_yield() queries the AioContext from the current
coroutine using qemu_coroutine_get_aio_context(). There is no need to
explicitly attach/detach AioContexts anymore and
qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are gone.
One coroutine can read from the QIOChannel while another coroutine writes from
a different AioContext.

This API change allows the nbd block driver to use QIOChannel from any thread.
It's important to keep in mind that the block driver already synchronizes
QIOChannel access and ensures that two coroutines never read simultaneously or
write simultaneously.

This patch updates all users of qio_channel_attach_aio_context() to the
new API. Most conversions are simple, but vhost-user-server requires a
new qemu_coroutine_yield() call to quiesce the vu_client_trip()
coroutine when not attached to any AioContext.

While the API has become simpler, there is one wart: QIOChannel has a
special case for the iohandler AioContext (used for handlers that must not run
in nested event loops). I didn't find an elegant way to preserve that behavior, so
I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, true|false)
for opting in to the new AioContext model. By default QIOChannel uses the
iohandler AioHandler. Code that formerly called
qio_channel_attach_aio_context() now calls
qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
created.

Signed-off-by: Stefan Hajnoczi 
---
 include/io/channel-util.h|  23 ++
 include/io/channel.h |  69 --
 include/qemu/vhost-user-server.h |   1 +
 block/nbd.c  |  11 +--
 io/channel-command.c |  10 ++-
 io/channel-file.c|   9 ++-
 io/channel-null.c|   3 +-
 io/channel-socket.c  |   9 ++-
 io/channel-tls.c |   6 +-
 io/channel-util.c|  24 +++
 io/channel.c | 120 ++-
 migration/channel-block.c|   3 +-
 nbd/server.c |  14 +---
 scsi/qemu-pr-helper.c|   4 +-
 util/vhost-user-server.c |  27 +--
 15 files changed, 216 insertions(+), 117 deletions(-)

diff --git a/include/io/channel-util.h b/include/io/channel-util.h
index a5d720d9a0..74ed299fad 100644
--- a/include/io/channel-util.h
+++ b/include/io/channel-util.h
@@ -49,4 +49,27 @@
 QIOChannel *qio_channel_new_fd(int fd,
Error **errp);
 
+/**
+ * qio_channel_util_set_aio_fd_handler:
+ * @read_fd: the file descriptor for the read handler
+ * @read_ctx: the AioContext for the read handler
+ * @io_read: the read handler
+ * @write_fd: the file descriptor for the write handler
+ * @write_ctx: the AioContext for the write handler
+ * @io_write: the write handler
+ * @opaque: the opaque argument to the read and write handler
+ *
+ * Set the read and write handlers when @read_ctx and @write_ctx are non-NULL,
+ * respectively. To leave a handler in its current state, pass a NULL
+ * AioContext. To clear a handler, pass a non-NULL AioContext and a NULL
+ * handler.
+ */
+void qio_channel_util_set_aio_fd_handler(int read_fd,
+ AioContext *read_ctx,
+ IOHandler *io_read,
+ int write_fd,
+ AioContext *write_ctx,
+ IOHandler *io_write,
+ void *opaque);
+
 #endif /* QIO_CHANNEL_UTIL_H */
diff --git a/include/io/channel.h b/include/io/channel.h
index 229bf36910..5f9dbaab65 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -81,9 +81,11 @@ struct QIOChannel {
 Object parent;
 unsigned int features; /* bitmask of QIOChannelFeatures

[PATCH v2 1/4] nbd: drop unused nbd_receive_negotiate() aio_context argument

2023-08-29 Thread Stefan Hajnoczi
aio_context is always NULL, so drop it.

Suggested-by: Fabiano Rosas 
Signed-off-by: Stefan Hajnoczi 
---
 include/block/nbd.h | 3 +--
 nbd/client-connection.c | 3 +--
 nbd/client.c| 5 ++---
 qemu-nbd.c  | 4 ++--
 4 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 4428bcffbb..f672b76173 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -324,8 +324,7 @@ typedef struct NBDExportInfo {
 char **contexts;
 } NBDExportInfo;
 
-int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
-  QCryptoTLSCreds *tlscreds,
+int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
   const char *hostname, QIOChannel **outioc,
   NBDExportInfo *info, Error **errp);
 void nbd_free_export_list(NBDExportInfo *info, int count);
diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 3d14296c04..aafb3d0fb4 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -146,8 +146,7 @@ static int nbd_connect(QIOChannelSocket *sioc, SocketAddress *addr,
 return 0;
 }
 
-ret = nbd_receive_negotiate(NULL, QIO_CHANNEL(sioc), tlscreds,
-tlshostname,
+ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), tlscreds, tlshostname,
 outioc, info, errp);
 if (ret < 0) {
 /*
diff --git a/nbd/client.c b/nbd/client.c
index 479208d5d9..16ec10c8a9 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1014,8 +1014,7 @@ static int nbd_negotiate_finish_oldstyle(QIOChannel *ioc, NBDExportInfo *info,
  * Returns: negative errno: failure talking to server
  *  0: server is connected
  */
-int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
-  QCryptoTLSCreds *tlscreds,
+int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
   const char *hostname, QIOChannel **outioc,
   NBDExportInfo *info, Error **errp)
 {
@@ -1027,7 +1026,7 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
 assert(info->name && strlen(info->name) <= NBD_MAX_STRING_SIZE);
 trace_nbd_receive_negotiate_name(info->name);
 
-result = nbd_start_negotiate(aio_context, ioc, tlscreds, hostname, outioc,
+result = nbd_start_negotiate(NULL, ioc, tlscreds, hostname, outioc,
 info->structured_reply, &zeroes, errp);
 if (result < 0) {
 return result;
diff --git a/qemu-nbd.c b/qemu-nbd.c
index aaccaa3318..b47459f781 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -295,8 +295,8 @@ static void *nbd_client_thread(void *arg)
 goto out;
 }
 
-if (nbd_receive_negotiate(NULL, QIO_CHANNEL(sioc),
-  NULL, NULL, NULL, &info, &local_error) < 0) {
+if (nbd_receive_negotiate(QIO_CHANNEL(sioc), NULL, NULL, NULL,
+  &info, &local_error) < 0) {
 if (local_error) {
 error_report_err(local_error);
 }
-- 
2.41.0




[PATCH v2 2/4] nbd: drop unused nbd_start_negotiate() aio_context argument

2023-08-29 Thread Stefan Hajnoczi
aio_context is always NULL, so drop it.

Suggested-by: Fabiano Rosas 
Signed-off-by: Stefan Hajnoczi 
---
 nbd/client.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 16ec10c8a9..bd7e200136 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -877,8 +877,7 @@ static int nbd_list_meta_contexts(QIOChannel *ioc,
  * Returns: negative errno: failure talking to server
  *  non-negative: enum NBDMode describing server abilities
  */
-static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
-   QCryptoTLSCreds *tlscreds,
+static int nbd_start_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
const char *hostname, QIOChannel **outioc,
bool structured_reply, bool *zeroes,
Error **errp)
@@ -946,10 +945,6 @@ static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
 return -EINVAL;
 }
 ioc = *outioc;
-if (aio_context) {
-qio_channel_set_blocking(ioc, false, NULL);
-qio_channel_attach_aio_context(ioc, aio_context);
-}
 } else {
 error_setg(errp, "Server does not support STARTTLS");
 return -EINVAL;
@@ -1026,7 +1021,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
 assert(info->name && strlen(info->name) <= NBD_MAX_STRING_SIZE);
 trace_nbd_receive_negotiate_name(info->name);
 
-result = nbd_start_negotiate(NULL, ioc, tlscreds, hostname, outioc,
+result = nbd_start_negotiate(ioc, tlscreds, hostname, outioc,
 info->structured_reply, &zeroes, errp);
 if (result < 0) {
 return result;
@@ -1149,7 +1144,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
 QIOChannel *sioc = NULL;
 
 *info = NULL;
-result = nbd_start_negotiate(NULL, ioc, tlscreds, hostname, &sioc, true,
+result = nbd_start_negotiate(ioc, tlscreds, hostname, &sioc, true,
  NULL, errp);
 if (tlscreds && sioc) {
 ioc = sioc;
-- 
2.41.0




[PATCH v2 3/4] io: check there are no qio_channel_yield() coroutines during ->finalize()

2023-08-29 Thread Stefan Hajnoczi
Callers must clean up their coroutines before calling
object_unref(OBJECT(ioc)) to prevent an fd handler leak. Add an
assertion to check this.

This patch is preparation for the fd handler changes that follow.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Eric Blake 
---
 io/channel.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/io/channel.c b/io/channel.c
index 72f0066af5..c415f3fc88 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -653,6 +653,10 @@ static void qio_channel_finalize(Object *obj)
 {
 QIOChannel *ioc = QIO_CHANNEL(obj);
 
+/* Must not have coroutines in qio_channel_yield() */
+assert(!ioc->read_coroutine);
+assert(!ioc->write_coroutine);
+
 g_free(ioc->name);
 
 #ifdef _WIN32
-- 
2.41.0




[PATCH v2 0/4] io: follow coroutine AioContext in qio_channel_yield()

2023-08-29 Thread Stefan Hajnoczi
v2:
- Add Patch 1 & 2 to remove unused NBD aio_context arguments and dead code 
[Fabiano]
- Remove attach/detach prototypes from "io/channel.h" [Daniel]
- Add utility function to set fd handlers [Daniel]

The ongoing QEMU multi-queue block layer effort makes it possible for multiple
threads to process I/O in parallel. The nbd block driver is not compatible with
the multi-queue block layer yet because QIOChannel cannot be used easily from
coroutines running in multiple threads. This series changes the QIOChannel API
to make that possible.

Stefan Hajnoczi (4):
  nbd: drop unused nbd_receive_negotiate() aio_context argument
  nbd: drop unused nbd_start_negotiate() aio_context argument
  io: check there are no qio_channel_yield() coroutines during
->finalize()
  io: follow coroutine AioContext in qio_channel_yield()

 include/block/nbd.h  |   3 +-
 include/io/channel-util.h|  23 ++
 include/io/channel.h |  69 -
 include/qemu/vhost-user-server.h |   1 +
 block/nbd.c  |  11 +--
 io/channel-command.c |  10 ++-
 io/channel-file.c|   9 ++-
 io/channel-null.c|   3 +-
 io/channel-socket.c  |   9 ++-
 io/channel-tls.c |   6 +-
 io/channel-util.c|  24 ++
 io/channel.c | 124 ++-
 migration/channel-block.c|   3 +-
 nbd/client-connection.c  |   3 +-
 nbd/client.c |  14 +---
 nbd/server.c |  14 +---
 qemu-nbd.c   |   4 +-
 scsi/qemu-pr-helper.c|   4 +-
 util/vhost-user-server.c |  27 +--
 19 files changed, 228 insertions(+), 133 deletions(-)

-- 
2.41.0




Re: [PULL 00/36] 2023q3 bsd user patches

2023-08-29 Thread Stefan Hajnoczi
Applied, thanks. In the future, please invoke git-request-pull(1) with
the public HTTPS repo URL to make it easier for anyone to fetch the
changes:

> are available in the Git repository at:
>
>  git@gitlab.com:bsdimp/qemu.git tags/2023q3-bsd-user-pull-request
   ^^

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.

Thanks!




Re: [PULL 00/14] Python, i386 changes for 2023-08-28

2023-08-29 Thread Stefan Hajnoczi
Hi Paolo,
I see a new test error that may have been introduced by this pull request:
https://gitlab.com/qemu-project/qemu/-/jobs/4968468877#L131

AVOCADO Downloading avocado tests VM image for s390x
Failed to load plugin from module "avocado.plugins.journal":
ImportError("Module 'sqlite3' is not installed.\nUse:\n sudo zypper
install python311\nto install it.") :
File 
"/builds/qemu-project/qemu/build/pyvenv/lib64/python3.11/site-packages/avocado/core/extension_manager.py",
line 63, in __init__
plugin = ep.load()
^
File "/usr/lib/python3.11/site-packages/pkg_resources/__init__.py",
line 2517, in load
return self.resolve()
^^
File "/usr/lib/python3.11/site-packages/pkg_resources/__init__.py",
line 2523, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)

File 
"/builds/qemu-project/qemu/build/pyvenv/lib64/python3.11/site-packages/avocado/plugins/journal.py",
line 19, in 
import sqlite3
File "/usr/lib64/python3.11/_import_failed/sqlite3.py", line 16, in 

Stefan

On Mon, 28 Aug 2023 at 06:41, Paolo Bonzini  wrote:
>
> The following changes since commit 50e7a40af372ee5931c99ef7390f5d3d6fbf6ec4:
>
>   Merge tag 'pull-target-arm-20230824' of 
> https://git.linaro.org/people/pmaydell/qemu-arm into staging (2023-08-24 
> 10:08:33 -0400)
>
> are available in the Git repository at:
>
>   https://gitlab.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 29a8238510df27080b0ffa92c58400412ce19daa:
>
>   configure: remove unnecessary mkdir -p (2023-08-28 10:01:44 +0200)
>
> 
> * separate accepted and auto-installed versions of Python dependencies
> * bump tricore container to Debian 11
> * small configure cleanups
>
> 
> Ake Koomsin (1):
>   target/i386: add support for VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE
>
> Paolo Bonzini (13):
>   configure: fix and complete detection of tricore tools
>   dockerfiles: bump tricore cross compiler container to Debian 11
>   python: mkvenv: tweak the matching of --diagnose to depspecs
>   python: mkvenv: introduce TOML-like representation of dependencies
>   python: mkvenv: add ensuregroup command
>   lcitool: bump libvirt-ci submodule and regenerate
>   configure: never use PyPI for Meson
>   python: use vendored tomli
>   configure: switch to ensuregroup
>   Revert "tests: Use separate virtual environment for avocado"
>   tests/docker: add python3-tomli dependency to containers
>   configure: fix container_hosts misspellings and duplications
>   configure: remove unnecessary mkdir -p
>
>  .gitlab-ci.d/buildtest.yml |   6 +-
>  .gitlab-ci.d/cirrus/freebsd-13.vars|   2 +-
>  .gitlab-ci.d/cirrus/macos-12.vars  |   2 +-
>  configure  |  31 +---
>  docs/devel/acpi-bits.rst   |   6 +-
>  docs/devel/testing.rst |  14 +-
>  python/scripts/mkvenv.py   | 201 
> +++--
>  python/scripts/vendor.py   |   5 +-
>  python/setup.cfg   |   6 +
>  python/wheels/tomli-2.0.1-py3-none-any.whl | Bin 0 -> 12757 bytes
>  pythondeps.toml|  32 
>  scripts/ci/org.centos/stream/8/x86_64/test-avocado |   4 +-
>  scripts/device-crash-test  |   2 +-
>  target/i386/cpu.c  |   6 +-
>  target/i386/cpu.h  |   1 +
>  tests/Makefile.include |  19 +-
>  tests/docker/dockerfiles/centos8.docker|   3 +-
>  .../dockerfiles/debian-all-test-cross.docker   |   7 +-
>  tests/docker/dockerfiles/debian-amd64-cross.docker |   6 +-
>  tests/docker/dockerfiles/debian-amd64.docker   |   4 +
>  tests/docker/dockerfiles/debian-arm64-cross.docker |   6 +-
>  tests/docker/dockerfiles/debian-armel-cross.docker |   6 +-
>  tests/docker/dockerfiles/debian-armhf-cross.docker |   6 +-
>  .../docker/dockerfiles/debian-hexagon-cross.docker |   6 +-
>  .../dockerfiles/debian-mips64el-cross.docker   |   6 +-
>  .../docker/dockerfiles/debian-mipsel-cross.docker  |   6 +-
>  .../docker/dockerfiles/debian-ppc64el-cross.docker |   6 +-
>  .../docker/dockerfiles/debian-riscv64-cross.docker |   2 +-
>  tests/docker/dockerfiles/debian-s390x-cross.docker |   6 +-
>  .../docker/dockerfiles/debian-tricore-cross.docker |   4 +-
>  tests/docker/dockerfiles/fedora-i386-cross.docker  |   1 +
>  tests/docker/dockerfiles/fedora-win32-cross.docker |   2 +-
>  tests/docker/dockerfiles/fedora-win64-cross.docker |   2 +-
>  tests/docker/dockerfiles/opensuse-leap.docker  |  22 +--
>  tests/docker/dockerfiles/ubuntu2004.docker  

Re: [PATCH v2 00/48] tcg patch queue

2023-08-28 Thread Stefan Hajnoczi
On Thu, 24 Aug 2023 at 14:29, Richard Henderson
 wrote:
>
> The following changes since commit 50e7a40af372ee5931c99ef7390f5d3d6fbf6ec4:
>
>   Merge tag 'pull-target-arm-20230824' of 
> https://git.linaro.org/people/pmaydell/qemu-arm into staging (2023-08-24 
> 10:08:33 -0400)
>
> are available in the Git repository at:
>
>   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230823-2
>
> for you to fetch changes up to 4daad8d9d6b9d426beb8ce505d2164ba36ea3168:
>
>   tcg: spelling fixes (2023-08-24 11:22:42 -0700)
>
> 
> accel/*: Widen pc/saved_insn for *_sw_breakpoint
> accel/tcg: Replace remaining target_ulong in system-mode accel
> tcg: spelling fixes
> tcg: Document bswap, hswap, wswap byte patterns
> tcg: Introduce negsetcond opcodes
> tcg: Fold deposit with zero to and
> tcg: Unify TCG_TARGET_HAS_extr[lh]_i64_i32
> tcg/i386: Drop BYTEH deposits for 64-bit
> tcg/i386: Allow immediate as input to deposit
> target/*: Use tcg_gen_negsetcond_*

Hi Richard,
I'm seeing a segfault in "make docker-test-tcg@debian-tricore-cross"
after this pull request.

Note that it has already been merged into qemu.git/master. CI actually
caught it but I accidentally pushed staging to master.

Stefan

>
> 
> Anton Johansson (9):
>   accel/kvm: Widen pc/saved_insn for kvm_sw_breakpoint
>   accel/hvf: Widen pc/saved_insn for hvf_sw_breakpoint
>   sysemu/kvm: Use vaddr for kvm_arch_[insert|remove]_hw_breakpoint
>   sysemu/hvf: Use vaddr for hvf_arch_[insert|remove]_hw_breakpoint
>   include/exec: Replace target_ulong with abi_ptr in cpu_[st|ld]*()
>   include/exec: typedef abi_ptr to vaddr in softmmu
>   include/exec: Widen tlb_hit/tlb_hit_page()
>   accel/tcg: Widen address arg in tlb_compare_set()
>   accel/tcg: Update run_on_cpu_data static assert
>
> Mark Cave-Ayland (1):
>   docs/devel/tcg-ops: fix missing newlines in "Host vector operations"
>
> Michael Tokarev (1):
>   tcg: spelling fixes
>
> Philippe Mathieu-Daudé (9):
>   docs/devel/tcg-ops: Bury mentions of trunc_shr_i64_i32()
>   tcg/tcg-op: Document bswap16_i32() byte pattern
>   tcg/tcg-op: Document bswap16_i64() byte pattern
>   tcg/tcg-op: Document bswap32_i32() byte pattern
>   tcg/tcg-op: Document bswap32_i64() byte pattern
>   tcg/tcg-op: Document bswap64_i64() byte pattern
>   tcg/tcg-op: Document hswap_i32/64() byte pattern
>   tcg/tcg-op: Document wswap_i64() byte pattern
>   target/cris: Fix a typo in gen_swapr()
>
> Richard Henderson (28):
>   target/m68k: Use tcg_gen_deposit_i32 in gen_partset_reg
>   tcg/i386: Drop BYTEH deposits for 64-bit
>   tcg: Fold deposit with zero to and
>   tcg/i386: Allow immediate as input to deposit_*
>   tcg: Unify TCG_TARGET_HAS_extr[lh]_i64_i32
>   tcg: Introduce negsetcond opcodes
>   tcg: Use tcg_gen_negsetcond_*
>   target/alpha: Use tcg_gen_movcond_i64 in gen_fold_mzero
>   target/arm: Use tcg_gen_negsetcond_*
>   target/m68k: Use tcg_gen_negsetcond_*
>   target/openrisc: Use tcg_gen_negsetcond_*
>   target/ppc: Use tcg_gen_negsetcond_*
>   target/sparc: Use tcg_gen_movcond_i64 in gen_edge
>   target/tricore: Replace gen_cond_w with tcg_gen_negsetcond_tl
>   tcg/ppc: Implement negsetcond_*
>   tcg/ppc: Use the Set Boolean Extension
>   tcg/aarch64: Implement negsetcond_*
>   tcg/arm: Implement negsetcond_i32
>   tcg/riscv: Implement negsetcond_*
>   tcg/s390x: Implement negsetcond_*
>   tcg/sparc64: Implement negsetcond_*
>   tcg/i386: Merge tcg_out_brcond{32,64}
>   tcg/i386: Merge tcg_out_setcond{32,64}
>   tcg/i386: Merge tcg_out_movcond{32,64}
>   tcg/i386: Use CMP+SBB in tcg_out_setcond
>   tcg/i386: Clear dest first in tcg_out_setcond if possible
>   tcg/i386: Use shift in tcg_out_setcond
>   tcg/i386: Implement negsetcond_*
>
>  docs/devel/tcg-ops.rst |  15 +-
>  accel/tcg/atomic_template.h|  16 +-
>  include/exec/cpu-all.h |   4 +-
>  include/exec/cpu_ldst.h|  28 +--
>  include/sysemu/hvf.h   |  12 +-
>  include/sysemu/kvm.h   |  12 +-
>  include/tcg/tcg-op-common.h|   4 +
>  include/tcg/tcg-op.h   |   2 +
>  include/tcg/tcg-opc.h  |   6 +-
>  include/tcg/tcg.h  |   4 +-
>  tcg/aarch64/tcg-target.h   |   5 +-
>  tcg/arm/tcg-target.h   |   1 +
>  tcg/i386/tcg-target-con-set.h  |   2 +-
>  tcg/i386/tcg-target-con-str.h  |   1 -
>  tcg/i386/tcg-target.h  |   9 +-
>  tcg/loongarch64/tcg-target.h   |   6 +-
>  tcg/mips/tcg-target.h  |   5 +-
>  tcg/ppc/tcg-target.h   |   5 +-
>  

Re: [PULL 0/5] Devel hppa priv cleanup2 patches

2023-08-28 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PULL 00/14] Python, i386 changes for 2023-08-28

2023-08-28 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PATCH v2 00/48] tcg patch queue

2023-08-28 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PULL 0/3] Dirty page rate and dirty page limit 20230828 patches

2023-08-28 Thread Stefan Hajnoczi
I see you sent a second pull request that includes the emails.

Please send a v3 with a signed tag. Thanks!

Stefan



Re: [PULL 0/3] Dirty page rate and dirty page limit 20230828 patches

2023-08-28 Thread Stefan Hajnoczi
On Mon, 28 Aug 2023 at 10:36, Hyman Huang  wrote:
>
> From: Hyman 
>
> The following changes since commit 50e7a40af372ee5931c99ef7390f5d3d6fbf6ec4:
>
>   Merge tag 'pull-target-arm-20230824' of 
> https://git.linaro.org/people/pmaydell/qemu-arm into staging (2023-08-24 
> 10:08:33 -0400)
>
> are available in the git repository at:
>
>   https://github.com/newfriday/qemu.git 
> tags/dirtylimit-dirtyrate-fixes-pull-request

Hi,
This is not a signed tag. Please use "git tag -s" so the tag is signed
with your GPG key.

I also noticed that this pull request email thread only has a cover
letter. Please also send the individual patches along with the pull
request email. This makes it easier for people to reply if they have
comments about a patch.

After pushing a signed tag, please send the pull request again with
"PULL v2" in the subject line. Thanks!

Thanks,
Stefan

>
> for you to fetch changes up to e424d9f7e749c84de4a6ce532981271db1c14b23:
>
>   migration/dirtyrate: Fix precision losses and g_usleep overshoot 
> (2023-08-28 21:03:58 +0800)
>
> 
> Dirty page limit and dirty page rate PULL request
>
> Hi, this is the fix for dirty page limit and dirty page rate.
>
> Please apply.
>
> Thanks, Yong.
> 
> Andrei Gudkov (1):
>   migration/dirtyrate: Fix precision losses and g_usleep overshoot
>
> alloc.young (2):
>   softmmu: Fix dirtylimit memory leak
>   softmmu/dirtylimit: Convert free to g_free
>
>  migration/dirtyrate.c | 10 --
>  softmmu/dirtylimit.c  | 26 --
>  2 files changed, 20 insertions(+), 16 deletions(-)
>
> --
> 1.8.3.1
>
>



Re: [PATCH 2/2] io: follow coroutine AioContext in qio_channel_yield()

2023-08-24 Thread Stefan Hajnoczi
On Thu, Aug 24, 2023 at 12:26:05PM +0100, Daniel P. Berrangé wrote:
> On Wed, Aug 23, 2023 at 07:45:04PM -0400, Stefan Hajnoczi wrote:
> > The ongoing QEMU multi-queue block layer effort makes it possible for 
> > multiple
> > threads to process I/O in parallel. The nbd block driver is not compatible 
> > with
> > the multi-queue block layer yet because QIOChannel cannot be used easily 
> > from
> > coroutines running in multiple threads. This series changes the QIOChannel 
> > API
> > to make that possible.
> > 
> > In the current API, calling qio_channel_attach_aio_context() sets the
> > AioContext where qio_channel_yield() installs an fd handler prior to 
> > yielding:
> > 
> >   qio_channel_attach_aio_context(ioc, my_ctx);
> >   ...
> >   qio_channel_yield(ioc); // my_ctx is used here
> >   ...
> >   qio_channel_detach_aio_context(ioc);
> > 
> > This API design has limitations: reading and writing must be done in the 
> > same
> > AioContext and moving between AioContexts involves a cumbersome sequence of 
> > API
> > calls that is not suitable for doing on a per-request basis.
> > 
> > There is no fundamental reason why a QIOChannel needs to run within the
> > same AioContext every time qio_channel_yield() is called. QIOChannel
> > only uses the AioContext while inside qio_channel_yield(). The rest of
> > the time, QIOChannel is independent of any AioContext.
> > 
> > In the new API, qio_channel_yield() queries the AioContext from the current
> > coroutine using qemu_coroutine_get_aio_context(). There is no need to
> > explicitly attach/detach AioContexts anymore and
> > qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are 
> > gone.
> > One coroutine can read from the QIOChannel while another coroutine writes 
> > from
> > a different AioContext.
> > 
> > This API change allows the nbd block driver to use QIOChannel from any 
> > thread.
> > It's important to keep in mind that the block driver already synchronizes
> > QIOChannel access and ensures that two coroutines never read simultaneously 
> > or
> > write simultaneously.
> > 
> > This patch updates all users of qio_channel_attach_aio_context() to the
> > new API. Most conversions are simple, but vhost-user-server requires a
> > new qemu_coroutine_yield() call to quiesce the vu_client_trip()
> > coroutine when not attached to any AioContext.
> > 
> > While the API is has become simpler, there is one wart: QIOChannel has a
> > special case for the iohandler AioContext (used for handlers that must not 
> > run
> > in nested event loops). I didn't find an elegant way preserve that 
> > behavior, so
> > I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, 
> > true|false)
> > for opting in to the new AioContext model. By default QIOChannel uses the
> > iohandler AioHandler. Code that formerly called
> > qio_channel_attach_aio_context() now calls
> > qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
> > created.
> 
> I wonder if it is better to just pass the AioContext object into
> qio_channel_yield explicitly eg have
> 
>   qio_channel_yield(QIOChannel *ioc,
> AioContext *ctx,
> GIOCondition cond);
> 
> With semantics that if 'ctx == NULL', then we assume the default
> global iohandler context, and for non-default context it must
> be non-NULL ?
> 
> That could nicely de-couple the API  from relying on global
> coroutine/thread state for querying an AioContext, which
> makes it easier to reason about IMHO.

Hi Dan,
I've done most of the audit necessary to understand which AioContext is
used where. The call graph is large because qio_channel_yield() is used
internally by qio_channel_readv_full_all_eof(),
qio_channel_writev_full_all(), and their variants. They would all need
a new AioContext argument.

I think it's not worth passing AioContext explicitly everywhere since
this involves a lot of code changes and more verbosity to achieve what
we already have.

However, If you think the QIOChannel API should pass AioContext
explicitly then I'll go ahead and make the changes.

Here is what I've explored so far:

qio_channel_readv_full_all_eof
  mpqemu_read - should be doable
  qio_channel_readv_all_eof
qio_channel_read_all_eof
  multifd_recv_thread - NULL non-coroutine
vu_message_read - coroutine AioContext
  qio_channel_readv_full_all
hw/virtio/vhost-user.c:backend_read() - NULL non-coroutine
qio_channel_readv_all
  nbd_co_receive_offset_data_payload - coroutine AioContext
  nbd_co_do_receive_on

Re: [PATCH 2/2] io: follow coroutine AioContext in qio_channel_yield()

2023-08-24 Thread Stefan Hajnoczi
On Thu, Aug 24, 2023 at 01:09:59PM -0300, Fabiano Rosas wrote:
> Stefan Hajnoczi  writes:
> 
> > @@ -2089,10 +2088,6 @@ static void nbd_attach_aio_context(BlockDriverState 
> > *bs,
> >   * the reconnect_delay_timer cannot be active here.
> >   */
> >  assert(!s->reconnect_delay_timer);
> > -
> > -if (s->ioc) {
> > -qio_channel_attach_aio_context(s->ioc, new_context);
> > -}
> >  }
> >  
> >  static void nbd_detach_aio_context(BlockDriverState *bs)
> > @@ -2101,10 +2096,6 @@ static void nbd_detach_aio_context(BlockDriverState 
> > *bs)
> >  
> >  assert(!s->open_timer);
> >  assert(!s->reconnect_delay_timer);
> > -
> > -if (s->ioc) {
> > -qio_channel_detach_aio_context(s->ioc);
> > -}
> >  }
> 
> The whole attach/detach functions could go away.

Yes, but at the expense of losing the assertions. Some of them have
extensive comments and I guess they are important to someone, so I
didn't drop them.

> 
> >  
> >  static BlockDriver bdrv_nbd = {
> > diff --git a/io/channel-command.c b/io/channel-command.c
> > index 7ed726c802..1f61026222 100644
> > --- a/io/channel-command.c
> > +++ b/io/channel-command.c
> > @@ -331,14 +331,21 @@ static int qio_channel_command_close(QIOChannel *ioc,
> >  
> >  
> >  static void qio_channel_command_set_aio_fd_handler(QIOChannel *ioc,
> > -   AioContext *ctx,
> > +   AioContext *read_ctx,
> > IOHandler *io_read,
> > +   AioContext *write_ctx,
> > IOHandler *io_write,
> > void *opaque)
> >  {
> >  QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc);
> > -aio_set_fd_handler(ctx, cioc->readfd, io_read, NULL, NULL, NULL, 
> > opaque);
> > -aio_set_fd_handler(ctx, cioc->writefd, NULL, io_write, NULL, NULL, 
> > opaque);
> > +if (read_ctx) {
> > +aio_set_fd_handler(read_ctx, cioc->readfd, io_read, NULL,
> > +NULL, NULL, opaque);
> > +}
> > +if (write_ctx) {
> > +aio_set_fd_handler(write_ctx, cioc->writefd, NULL, io_write,
> > +NULL, NULL, opaque);
> > +}
> >  }
> >  
> >  
> 
> ...
> 
> > diff --git a/nbd/client.c b/nbd/client.c
> > index 479208d5d9..81877d088d 100644
> > --- a/nbd/client.c
> > +++ b/nbd/client.c
> > @@ -948,7 +948,7 @@ static int nbd_start_negotiate(AioContext *aio_context, 
> > QIOChannel *ioc,
> >  ioc = *outioc;
> >  if (aio_context) {
> >  qio_channel_set_blocking(ioc, false, NULL);
> > -qio_channel_attach_aio_context(ioc, aio_context);
> > +qio_channel_set_follow_coroutine_ctx(ioc, true);
> 
> This is actually unreachable, aio_context is always NULL here.

Interesting, I will add a patch to remove the dead code. Thanks!




Re: [PATCH v2 0/3] block: align CoR requests to subclusters

2023-08-24 Thread Stefan Hajnoczi
On Thu, Aug 24, 2023 at 06:47:20PM +0300, Andrey Drobyshev wrote:
> On 8/24/23 17:32, Stefan Hajnoczi wrote:
> > On Wed, Aug 23, 2023 at 03:50:55PM +0300, Andrey Drobyshev wrote:
> >> On 8/22/23 22:58, John Snow wrote:
> >>> On Tue, Aug 22, 2023 at 1:33 PM Andrey Drobyshev
> >>>  wrote:
> >>>>
> >>>> On 8/16/23 12:22, Andrey Drobyshev wrote:
> >>>>> On 7/31/23 17:51, Andrey Drobyshev wrote:
> >>>>>> On 7/24/23 16:11, Andrey Drobyshev wrote:
> >>>>>>> On 7/11/23 20:25, Andrey Drobyshev wrote:
> >>>>>>>> v1 --> v2:
> >>>>>>>>  * Fixed line indentation;
> >>>>>>>>  * Fixed wording in a comment;
> >>>>>>>>  * Added R-b.
> >>>>>>>>
> >>>>>>>> v1: 
> >>>>>>>> https://lists.nongnu.org/archive/html/qemu-block/2023-06/msg00606.html
> >>>>>>>>
> >>>>>>>> Andrey Drobyshev (3):
> >>>>>>>>   block: add subcluster_size field to BlockDriverInfo
> >>>>>>>>   block/io: align requests to subcluster_size
> >>>>>>>>   tests/qemu-iotests/197: add testcase for CoR with subclusters
> >>>>>>>>
> >>>>>>>>  block.c  |  7 +
> >>>>>>>>  block/io.c   | 50 
> >>>>>>>> ++--
> >>>>>>>>  block/mirror.c   |  8 +++---
> >>>>>>>>  block/qcow2.c|  1 +
> >>>>>>>>  include/block/block-common.h |  5 
> >>>>>>>>  include/block/block-io.h |  8 +++---
> >>>>>>>>  tests/qemu-iotests/197   | 29 +
> >>>>>>>>  tests/qemu-iotests/197.out   | 24 +
> >>>>>>>>  8 files changed, 99 insertions(+), 33 deletions(-)
> >>>>>>>>
> >>>>>>>
> >>>>>>> Ping
> >>>>>>
> >>>>>> Another ping
> >>>>>
> >>>>> Yet another friendly ping
> >>>>
> >>>> One more friendly ping
> >>>
> >>> Looks like Stefan gave you an R-B for the series; do we just need an
> >>> ACK by the block maintainers at this point or is there someone
> >>> specific you're hoping will review this?
> >>>
> >>> --js
> >>>
> >>
> >> Hi John,
> >>
> >> I figure a maintainer's R-b doesn't imply the patches being merged into
> >> the tree.  Hence I'm waiting for the notice that they actually are merged.
> >>
> >> Please let me know if the process should be different.
> > 
> > Hi Andrey,
> > Kevin is away right now but seemed happy enough when I mentioned this
> > series to him, so I have merged this into my own tree:
> > 
> >   https://gitlab.com/stefanha/qemu block
> > 
> > Sorry that your patch series have not been merged in a timely manner.
> > 
> > Stefan
> 
> Hi Stefan,
> Good news! Thank you for the notice.
> 
> When you have some time would you mind taking a look at these series as
> well:
> 
> https://lists.nongnu.org/archive/html/qemu-block/2023-06/msg00068.html
> https://lists.nongnu.org/archive/html/qemu-block/2023-07/msg00106.html
> 
> They've been hanging in the list for weeks (probably even longer than
> this one) with little to no feedback.  Would be nice to know if people
> find those changes useful at all.

Kevin will be back on Tuesday and Hanna Czenczek said she may have time
to look before then. Hopefully they will get taken care of soon.

Thanks,
Stefan




Re: [PATCH] hw/block/dataplane/virtio-block: Avoid dynamic stack allocation

2023-08-24 Thread Stefan Hajnoczi
On Thu, Aug 24, 2023 at 05:57:40PM +0100, Peter Maydell wrote:
> Instead of using a variable length array in notify_guest_bh(), always
> use a fixed sized bitmap (this will be 128 bytes).  This means we
> need to avoid assuming that bitmap and the s->batch_notify_vqs bitmap
> are the same size; the neatest way to do this is to switch to using
> bitmap.h APIs to declare, copy and clear, because then we can specify
> the length in bits, exactly as we did when creating
> s->batch_notify_vqs with bitmap_new().
> 
> The codebase has very few VLAs, and if we can get rid of them all we
> can make the compiler error on new additions.  This is a defensive
> measure against security bugs where an on-stack dynamic allocation
> isn't correctly size-checked (e.g.  CVE-2021-3527).
> 
> Signed-off-by: Peter Maydell 
> ---
> In discussion on Philippe's attempt at getting rid of this VLA:
> https://patchew.org/QEMU/20210505211047.1496765-1-phi...@redhat.com/20210505211047.1496765-7-phi...@redhat.com/
> Stefan suggested getting rid of the local bitmap array entirely.
> But I don't know this code at all and have no idea of the
> implications (presumably there is a reason we have the local
> array rather than iterating directly on batch_notify_vqs),
> so I have opted for the more minimal change.
> 
> Usual disclaimer: tested only with "make check" and
> "make check-avocado".
> ---
>  hw/block/dataplane/virtio-blk.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)

Hi Peter,
I recently sent a patch series that removes notify_guest_bh() completely:
https://lore.kernel.org/qemu-devel/20230817155847.3605115-5-stefa...@redhat.com/

If it's urgent we can merge your patch immediately, though I hope my
series will be merged soon anyway:
Reviewed-by: Stefan Hajnoczi 

> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index da36fcfd0b5..f31ec79d0b2 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -59,11 +59,16 @@ static void notify_guest_bh(void *opaque)
>  {
>  VirtIOBlockDataPlane *s = opaque;
>  unsigned nvqs = s->conf->num_queues;
> -unsigned long bitmap[BITS_TO_LONGS(nvqs)];
> +DECLARE_BITMAP(bitmap, VIRTIO_QUEUE_MAX);
>  unsigned j;
>  
> -memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
> -memset(s->batch_notify_vqs, 0, sizeof(bitmap));
> +/*
> + * Note that our local 'bitmap' is declared at a fixed
> + * worst case size, but s->batch_notify_vqs has only
> + * nvqs bits in it.
> + */
> +bitmap_copy(bitmap, s->batch_notify_vqs, nvqs);
> +bitmap_zero(s->batch_notify_vqs, nvqs);
>  
>  for (j = 0; j < nvqs; j += BITS_PER_LONG) {
>  unsigned long bits = bitmap[j / BITS_PER_LONG];
> -- 
> 2.34.1
> 




Re: [PATCH 2/2] io: follow coroutine AioContext in qio_channel_yield()

2023-08-24 Thread Stefan Hajnoczi
On Thu, Aug 24, 2023 at 12:26:05PM +0100, Daniel P. Berrangé wrote:
> On Wed, Aug 23, 2023 at 07:45:04PM -0400, Stefan Hajnoczi wrote:
> > The ongoing QEMU multi-queue block layer effort makes it possible for 
> > multiple
> > threads to process I/O in parallel. The nbd block driver is not compatible 
> > with
> > the multi-queue block layer yet because QIOChannel cannot be used easily 
> > from
> > coroutines running in multiple threads. This series changes the QIOChannel 
> > API
> > to make that possible.
> > 
> > In the current API, calling qio_channel_attach_aio_context() sets the
> > AioContext where qio_channel_yield() installs an fd handler prior to 
> > yielding:
> > 
> >   qio_channel_attach_aio_context(ioc, my_ctx);
> >   ...
> >   qio_channel_yield(ioc); // my_ctx is used here
> >   ...
> >   qio_channel_detach_aio_context(ioc);
> > 
> > This API design has limitations: reading and writing must be done in the 
> > same
> > AioContext and moving between AioContexts involves a cumbersome sequence of 
> > API
> > calls that is not suitable for doing on a per-request basis.
> > 
> > There is no fundamental reason why a QIOChannel needs to run within the
> > same AioContext every time qio_channel_yield() is called. QIOChannel
> > only uses the AioContext while inside qio_channel_yield(). The rest of
> > the time, QIOChannel is independent of any AioContext.
> > 
> > In the new API, qio_channel_yield() queries the AioContext from the current
> > coroutine using qemu_coroutine_get_aio_context(). There is no need to
> > explicitly attach/detach AioContexts anymore and
> > qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are 
> > gone.
> > One coroutine can read from the QIOChannel while another coroutine writes 
> > from
> > a different AioContext.
> > 
> > This API change allows the nbd block driver to use QIOChannel from any 
> > thread.
> > It's important to keep in mind that the block driver already synchronizes
> > QIOChannel access and ensures that two coroutines never read simultaneously 
> > or
> > write simultaneously.
> > 
> > This patch updates all users of qio_channel_attach_aio_context() to the
> > new API. Most conversions are simple, but vhost-user-server requires a
> > new qemu_coroutine_yield() call to quiesce the vu_client_trip()
> > coroutine when not attached to any AioContext.
> > 
> > While the API has become simpler, there is one wart: QIOChannel has a
> > special case for the iohandler AioContext (used for handlers that must not 
> > run
> > in nested event loops). I didn't find an elegant way to preserve that 
> > behavior, so
> > I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, 
> > true|false)
> > for opting in to the new AioContext model. By default QIOChannel uses the
> > iohandler AioContext. Code that formerly called
> > qio_channel_attach_aio_context() now calls
> > qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
> > created.
> 
> I wonder if it is better to just pass the AioContext object into
> qio_channel_yield explicitly eg have
> 
>   qio_channel_yield(QIOChannel *ioc,
> AioContext *ctx,
> GIOCondition cond);
> 
> With semantics that if 'ctx == NULL', then we assume the default
> global iohandler context, and for non-default context it must
> be non-NULL ?
> 
> That could nicely de-couple the API  from relying on global
> coroutine/thread state for querying an AioContext, which
> makes it easier to reason about IMHO.

I like the idea and am auditing all callers of qio_channel_yield() to
see whether passing along the AioContext is feasible. Hopefully the next
version of this series can take that approach.

> 
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  include/io/channel.h |  34 +++--
> >  include/qemu/vhost-user-server.h |   1 +
> >  block/nbd.c  |  11 +--
> >  io/channel-command.c |  13 +++-
> >  io/channel-file.c|  18 -
> >  io/channel-null.c|   3 +-
> >  io/channel-socket.c  |  18 -
> >  io/channel-tls.c |   6 +-
> >  io/channel.c | 120 ++-
> >  migration/channel-block.c|   3 +-
> >  nbd/client.c |   2 +-
> >  nbd/server.c |  14 +---
> >  scsi/qemu-pr-helper.c|   4 +-
> >  util/vhost-user

Re: [PATCH] aio-posix: zero out io_uring sqe user_data

2023-08-24 Thread Stefan Hajnoczi
On Wed, Apr 26, 2023 at 05:26:39PM -0400, Stefan Hajnoczi wrote:
> liburing does not clear sqe->user_data. We must do it ourselves to avoid
> undefined behavior in process_cqe() when user_data is used.
> 
> Note that fdmon-io_uring is currently disabled, so this is a latent bug
> that does not affect users. Let's merge this fix now to make it easier
> to enable fdmon-io_uring in the future (and I'm working on that).
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  util/fdmon-io_uring.c | 2 ++
>  1 file changed, 2 insertions(+)

Thanks, applied to my block-next tree:
https://gitlab.com/stefanha/qemu/commits/block-next

Stefan




[PULL 8/8] tests/qemu-iotests/197: add testcase for CoR with subclusters

2023-08-24 Thread Stefan Hajnoczi
From: Andrey Drobyshev via 

Add a testcase which checks that allocations during copy-on-read are
performed on a subcluster basis when subclusters are enabled in the
target image.

This testcase also triggers the following assert when the previous
commit is not applied, so we check that as well:
qemu-io: ../block/io.c:1236: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < 
pnum' failed.

Reviewed-by: Eric Blake 
Reviewed-by: Denis V. Lunev 
Signed-off-by: Andrey Drobyshev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20230711172553.234055-4-andrey.drobys...@virtuozzo.com>
---
 tests/qemu-iotests/197 | 29 +
 tests/qemu-iotests/197.out | 24 
 2 files changed, 53 insertions(+)

diff --git a/tests/qemu-iotests/197 b/tests/qemu-iotests/197
index a2547bc280..f07a9da136 100755
--- a/tests/qemu-iotests/197
+++ b/tests/qemu-iotests/197
@@ -122,6 +122,35 @@ $QEMU_IO -f qcow2 -C -c 'read 0 1024' "$TEST_WRAP" | 
_filter_qemu_io
 $QEMU_IO -f qcow2 -c map "$TEST_WRAP"
 _check_test_img
 
+echo
+echo '=== Copy-on-read with subclusters ==='
+echo
+
+# Create base and top images 64K (1 cluster) each.  Make subclusters enabled
+# for the top image
+_make_test_img 64K
+IMGPROTO=file IMGFMT=qcow2 TEST_IMG_FILE="$TEST_WRAP" \
+_make_test_img --no-opts -o extended_l2=true -F "$IMGFMT" -b "$TEST_IMG" \
+64K | _filter_img_create
+
+$QEMU_IO -c "write -P 0xaa 0 64k" "$TEST_IMG" | _filter_qemu_io
+
+# Allocate individual subclusters in the top image, and not the whole cluster
+$QEMU_IO -c "write -P 0xbb 28K 2K" -c "write -P 0xcc 34K 2K" "$TEST_WRAP" \
+| _filter_qemu_io
+
+# Only 2 subclusters should be allocated in the top image at this point
+$QEMU_IMG map "$TEST_WRAP" | _filter_qemu_img_map
+
+# Actual copy-on-read operation
+$QEMU_IO -C -c "read -P 0xaa 30K 4K" "$TEST_WRAP" | _filter_qemu_io
+
+# And here we should have 4 subclusters allocated right in the middle of the
+# top image. Make sure the whole cluster remains unallocated
+$QEMU_IMG map "$TEST_WRAP" | _filter_qemu_img_map
+
+_check_test_img
+
 # success, all done
 echo '*** done'
 status=0
diff --git a/tests/qemu-iotests/197.out b/tests/qemu-iotests/197.out
index ad414c3b0e..8f34a30afe 100644
--- a/tests/qemu-iotests/197.out
+++ b/tests/qemu-iotests/197.out
@@ -31,4 +31,28 @@ read 1024/1024 bytes at offset 0
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 1 KiB (0x400) bytes allocated at offset 0 bytes (0x0)
 No errors were found on the image.
+
+=== Copy-on-read with subclusters ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65536
+Formatting 'TEST_DIR/t.wrap.IMGFMT', fmt=IMGFMT size=65536 
backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 28672
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 34816
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset  Length  File
+0   0x7000  TEST_DIR/t.IMGFMT
+0x7000  0x800   TEST_DIR/t.wrap.IMGFMT
+0x7800  0x1000  TEST_DIR/t.IMGFMT
+0x8800  0x800   TEST_DIR/t.wrap.IMGFMT
+0x9000  0x7000  TEST_DIR/t.IMGFMT
+read 4096/4096 bytes at offset 30720
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset  Length  File
+0   0x7000  TEST_DIR/t.IMGFMT
+0x7000  0x2000  TEST_DIR/t.wrap.IMGFMT
+0x9000  0x7000  TEST_DIR/t.IMGFMT
+No errors were found on the image.
 *** done
-- 
2.41.0




[PULL 5/8] block-migration: Ensure we don't crash during migration cleanup

2023-08-24 Thread Stefan Hajnoczi
From: Fabiano Rosas 

We can fail the blk_insert_bs() at init_blk_migration(), leaving the
BlkMigDevState without a dirty_bitmap and BlockDriverState. Account
for the possibly missing elements when doing cleanup.

Fix the following crashes:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x55ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at 
../block/dirty-bitmap.c:359
359 BlockDriverState *bs = bitmap->bs;
 #0  0x55ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at 
../block/dirty-bitmap.c:359
 #1  0x55bba331 in unset_dirty_tracking () at ../migration/block.c:371
 #2  0x55bbad98 in block_migration_cleanup_bmds () at 
../migration/block.c:681

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x55e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, 
reason=0x0) at ../block.c:7073
7073        QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) {
 #0  0x55e971ff in bdrv_op_unblock (bs=0x0, 
op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073
 #1  0x55e9734a in bdrv_op_unblock_all (bs=0x0, reason=0x0) at 
../block.c:7095
 #2  0x55bbae13 in block_migration_cleanup_bmds () at 
../migration/block.c:690

Signed-off-by: Fabiano Rosas 
Message-id: 20230731203338.27581-1-faro...@suse.de
Signed-off-by: Stefan Hajnoczi 
---
 migration/block.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index b9580a6c7e..86c2256a2b 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -368,7 +368,9 @@ static void unset_dirty_tracking(void)
 BlkMigDevState *bmds;
 
    QSIMPLEQ_FOREACH(bmds, &block_mig_state.bmds_list, entry) {
-bdrv_release_dirty_bitmap(bmds->dirty_bitmap);
+if (bmds->dirty_bitmap) {
+bdrv_release_dirty_bitmap(bmds->dirty_bitmap);
+}
 }
 }
 
@@ -676,13 +678,18 @@ static int64_t get_remaining_dirty(void)
 static void block_migration_cleanup_bmds(void)
 {
 BlkMigDevState *bmds;
+BlockDriverState *bs;
 AioContext *ctx;
 
 unset_dirty_tracking();
 
    while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) {
        QSIMPLEQ_REMOVE_HEAD(&block_mig_state.bmds_list, entry);
-bdrv_op_unblock_all(blk_bs(bmds->blk), bmds->blocker);
+
+bs = blk_bs(bmds->blk);
+if (bs) {
+bdrv_op_unblock_all(bs, bmds->blocker);
+}
 error_free(bmds->blocker);
 
 /* Save ctx, because bmds->blk can disappear during blk_unref.  */
-- 
2.41.0




[PULL 7/8] block/io: align requests to subcluster_size

2023-08-24 Thread Stefan Hajnoczi
From: Andrey Drobyshev via 

When the target image is using subclusters, and we align the request during
copy-on-read, it makes sense to align to subcluster_size rather than
cluster_size.  Otherwise we end up with unnecessary allocations.

This commit renames bdrv_round_to_clusters() to bdrv_round_to_subclusters()
and utilizes subcluster_size field of BlockDriverInfo to make necessary
alignments.  It affects copy-on-read as well as mirror job (which is
using bdrv_round_to_clusters()).

This change also fixes the following bug with failing assert (covered by
the test in the subsequent commit):

qemu-img create -f qcow2 base.qcow2 64K
qemu-img create -f qcow2 -o 
extended_l2=on,backing_file=base.qcow2,backing_fmt=qcow2 img.qcow2 64K
qemu-io -c "write -P 0xaa 0 2K" img.qcow2
qemu-io -C -c "read -P 0x00 2K 62K" img.qcow2

qemu-io: ../block/io.c:1236: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < 
pnum' failed.

Reviewed-by: Eric Blake 
Reviewed-by: Denis V. Lunev 
Signed-off-by: Andrey Drobyshev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20230711172553.234055-3-andrey.drobys...@virtuozzo.com>
---
 include/block/block-io.h |  8 +++
 block/io.c   | 50 
 block/mirror.c   |  8 +++
 3 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index 4415506e40..6db48f2d35 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -189,10 +189,10 @@ bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs,
   Error **errp);
 BlockStatsSpecific *bdrv_get_specific_stats(BlockDriverState *bs);
-void bdrv_round_to_clusters(BlockDriverState *bs,
-int64_t offset, int64_t bytes,
-int64_t *cluster_offset,
-int64_t *cluster_bytes);
+void bdrv_round_to_subclusters(BlockDriverState *bs,
+   int64_t offset, int64_t bytes,
+   int64_t *cluster_offset,
+   int64_t *cluster_bytes);
 
 void bdrv_get_backing_filename(BlockDriverState *bs,
char *filename, int filename_size);
diff --git a/block/io.c b/block/io.c
index 055fcf7438..76e7df18d8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -728,21 +728,21 @@ BdrvTrackedRequest *coroutine_fn 
bdrv_co_get_self_request(BlockDriverState *bs)
 }
 
 /**
- * Round a region to cluster boundaries
+ * Round a region to subcluster (if supported) or cluster boundaries
  */
 void coroutine_fn GRAPH_RDLOCK
-bdrv_round_to_clusters(BlockDriverState *bs, int64_t offset, int64_t bytes,
-   int64_t *cluster_offset, int64_t *cluster_bytes)
+bdrv_round_to_subclusters(BlockDriverState *bs, int64_t offset, int64_t bytes,
+  int64_t *align_offset, int64_t *align_bytes)
 {
 BlockDriverInfo bdi;
 IO_CODE();
-    if (bdrv_co_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
-*cluster_offset = offset;
-*cluster_bytes = bytes;
+    if (bdrv_co_get_info(bs, &bdi) < 0 || bdi.subcluster_size == 0) {
+*align_offset = offset;
+*align_bytes = bytes;
 } else {
-int64_t c = bdi.cluster_size;
-*cluster_offset = QEMU_ALIGN_DOWN(offset, c);
-*cluster_bytes = QEMU_ALIGN_UP(offset - *cluster_offset + bytes, c);
+int64_t c = bdi.subcluster_size;
+*align_offset = QEMU_ALIGN_DOWN(offset, c);
+*align_bytes = QEMU_ALIGN_UP(offset - *align_offset + bytes, c);
 }
 }
 
@@ -1168,8 +1168,8 @@ bdrv_co_do_copy_on_readv(BdrvChild *child, int64_t 
offset, int64_t bytes,
 void *bounce_buffer = NULL;
 
 BlockDriver *drv = bs->drv;
-int64_t cluster_offset;
-int64_t cluster_bytes;
+int64_t align_offset;
+int64_t align_bytes;
 int64_t skip_bytes;
 int ret;
 int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
@@ -1203,28 +1203,28 @@ bdrv_co_do_copy_on_readv(BdrvChild *child, int64_t 
offset, int64_t bytes,
  * BDRV_REQUEST_MAX_BYTES (even when the original read did not), which
  * is one reason we loop rather than doing it all at once.
  */
-    bdrv_round_to_clusters(bs, offset, bytes, &cluster_offset, &cluster_bytes);
-    skip_bytes = offset - cluster_offset;
+    bdrv_round_to_subclusters(bs, offset, bytes, &align_offset, &align_bytes);
+skip_bytes = offset - align_offset;
 
 trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
-   cluster_offset, cluster_bytes);
+   align_offset, align_bytes);
 
-while (cluster_bytes) {
+while (align_bytes) {
 int64_t pnum;
 
 if (skip_write) {
 ret = 1; /* "already allocated", so nothing will be copied */
-pnum = 

[PULL 3/8] hw/ufs: Support for UFS logical unit

2023-08-24 Thread Stefan Hajnoczi
From: Jeuk Kim 

This commit adds support for the UFS logical unit.
The LU handles processing of SCSI commands and
unit descriptor query requests.

It enables the UFS device to process
I/O requests.

Signed-off-by: Jeuk Kim 
Reviewed-by: Stefan Hajnoczi 
Message-id: 
898cea923e819dc21a99597bf045a12d7983be28.1691062912.git.jeuk20@samsung.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/ufs/ufs.h |   43 ++
 include/scsi/constants.h |1 +
 hw/ufs/lu.c  | 1445 ++
 hw/ufs/ufs.c |  252 ++-
 hw/ufs/meson.build   |2 +-
 hw/ufs/trace-events  |   25 +
 6 files changed, 1761 insertions(+), 7 deletions(-)
 create mode 100644 hw/ufs/lu.c

diff --git a/hw/ufs/ufs.h b/hw/ufs/ufs.h
index 3d1b2cff4e..f244228617 100644
--- a/hw/ufs/ufs.h
+++ b/hw/ufs/ufs.h
@@ -18,6 +18,18 @@
 #define UFS_MAX_LUS 32
 #define UFS_BLOCK_SIZE 4096
 
+typedef struct UfsBusClass {
+BusClass parent_class;
+bool (*parent_check_address)(BusState *bus, DeviceState *dev, Error 
**errp);
+} UfsBusClass;
+
+typedef struct UfsBus {
+SCSIBus parent_bus;
+} UfsBus;
+
+#define TYPE_UFS_BUS "ufs-bus"
+DECLARE_OBJ_CHECKERS(UfsBus, UfsBusClass, UFS_BUS, TYPE_UFS_BUS)
+
 typedef enum UfsRequestState {
 UFS_REQUEST_IDLE = 0,
 UFS_REQUEST_READY = 1,
@@ -29,6 +41,7 @@ typedef enum UfsRequestState {
 typedef enum UfsReqResult {
 UFS_REQUEST_SUCCESS = 0,
 UFS_REQUEST_FAIL = 1,
+UFS_REQUEST_NO_COMPLETE = 2,
 } UfsReqResult;
 
 typedef struct UfsRequest {
@@ -44,6 +57,17 @@ typedef struct UfsRequest {
 QEMUSGList *sg;
 } UfsRequest;
 
+typedef struct UfsLu {
+SCSIDevice qdev;
+uint8_t lun;
+UnitDescriptor unit_desc;
+} UfsLu;
+
+typedef struct UfsWLu {
+SCSIDevice qdev;
+uint8_t lun;
+} UfsWLu;
+
 typedef struct UfsParams {
 char *serial;
 uint8_t nutrs; /* Number of UTP Transfer Request Slots */
@@ -52,12 +76,18 @@ typedef struct UfsParams {
 
 typedef struct UfsHc {
 PCIDevice parent_obj;
+UfsBus bus;
 MemoryRegion iomem;
 UfsReg reg;
 UfsParams params;
 uint32_t reg_size;
 UfsRequest *req_list;
 
+UfsLu *lus[UFS_MAX_LUS];
+UfsWLu *report_wlu;
+UfsWLu *dev_wlu;
+UfsWLu *boot_wlu;
+UfsWLu *rpmb_wlu;
 DeviceDescriptor device_desc;
 GeometryDescriptor geometry_desc;
 Attributes attributes;
@@ -71,6 +101,12 @@ typedef struct UfsHc {
 #define TYPE_UFS "ufs"
 #define UFS(obj) OBJECT_CHECK(UfsHc, (obj), TYPE_UFS)
 
+#define TYPE_UFS_LU "ufs-lu"
+#define UFSLU(obj) OBJECT_CHECK(UfsLu, (obj), TYPE_UFS_LU)
+
+#define TYPE_UFS_WLU "ufs-wlu"
+#define UFSWLU(obj) OBJECT_CHECK(UfsWLu, (obj), TYPE_UFS_WLU)
+
 typedef enum UfsQueryFlagPerm {
 UFS_QUERY_FLAG_NONE = 0x0,
 UFS_QUERY_FLAG_READ = 0x1,
@@ -85,4 +121,11 @@ typedef enum UfsQueryAttrPerm {
 UFS_QUERY_ATTR_WRITE = 0x2,
 } UfsQueryAttrPerm;
 
+static inline bool is_wlun(uint8_t lun)
+{
+return (lun == UFS_UPIU_REPORT_LUNS_WLUN ||
+lun == UFS_UPIU_UFS_DEVICE_WLUN || lun == UFS_UPIU_BOOT_WLUN ||
+lun == UFS_UPIU_RPMB_WLUN);
+}
+
 #endif /* HW_UFS_UFS_H */
diff --git a/include/scsi/constants.h b/include/scsi/constants.h
index 6a8bad556a..9b98451912 100644
--- a/include/scsi/constants.h
+++ b/include/scsi/constants.h
@@ -231,6 +231,7 @@
 #define MODE_PAGE_FLEXIBLE_DISK_GEOMETRY  0x05
 #define MODE_PAGE_CACHING 0x08
 #define MODE_PAGE_AUDIO_CTL   0x0e
+#define MODE_PAGE_CONTROL 0x0a
 #define MODE_PAGE_POWER   0x1a
 #define MODE_PAGE_FAULT_FAIL  0x1c
 #define MODE_PAGE_TO_PROTECT  0x1d
diff --git a/hw/ufs/lu.c b/hw/ufs/lu.c
new file mode 100644
index 00..ec8932ff86
--- /dev/null
+++ b/hw/ufs/lu.c
@@ -0,0 +1,1445 @@
+/*
+ * QEMU UFS Logical Unit
+ *
+ * Copyright (c) 2023 Samsung Electronics Co., Ltd. All rights reserved.
+ *
+ * Written by Jeuk Kim 
+ *
+ * This code is licensed under the GNU GPL v2 or later.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qapi/error.h"
+#include "qemu/memalign.h"
+#include "hw/scsi/scsi.h"
+#include "scsi/constants.h"
+#include "sysemu/block-backend.h"
+#include "qemu/cutils.h"
+#include "trace.h"
+#include "ufs.h"
+
+/*
+ * The code below handling SCSI commands is copied from hw/scsi/scsi-disk.c,
+ * with minor adjustments to make it work for UFS.
+ */
+
+#define SCSI_DMA_BUF_SIZE (128 * KiB)
+#define SCSI_MAX_INQUIRY_LEN 256
+#define SCSI_INQUIRY_DATA_SIZE 36
+#define SCSI_MAX_MODE_LEN 256
+
+typedef struct UfsSCSIReq {
+SCSIRequest req;
+/* Both sector and sector_count are in terms of BDRV_SECTOR_SIZE bytes.  */
+uint64_t sector;
+uint32_t sector_count;
+uint32_t buflen;
+bool started;
+bool need_fua_emulation;
+struct iovec iov;
+ 

[PULL 6/8] block: add subcluster_size field to BlockDriverInfo

2023-08-24 Thread Stefan Hajnoczi
From: Andrey Drobyshev via 

This is going to be used in the subsequent commit as the request alignment
(in particular, during copy-on-read).  This value only makes sense for
the formats which support subclusters (currently QCOW2 only).  If this
field isn't set by driver's own bdrv_get_info() implementation, we
simply set it equal to the cluster size thus treating each cluster as
having a single subcluster.

Reviewed-by: Eric Blake 
Reviewed-by: Denis V. Lunev 
Signed-off-by: Andrey Drobyshev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20230711172553.234055-2-andrey.drobys...@virtuozzo.com>
---
 include/block/block-common.h | 5 +
 block.c  | 7 +++
 block/qcow2.c| 1 +
 3 files changed, 13 insertions(+)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index e15395f2cb..df5ffc8d09 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -132,6 +132,11 @@ typedef struct BlockZoneWps {
 typedef struct BlockDriverInfo {
 /* in bytes, 0 if irrelevant */
 int cluster_size;
+/*
+ * A fraction of cluster_size, if supported (currently QCOW2 only); if
+ * disabled or unsupported, set equal to cluster_size.
+ */
+int subcluster_size;
 /* offset at which the VM state can be saved (0 if not possible) */
 int64_t vm_state_offset;
 bool is_dirty;
diff --git a/block.c b/block.c
index a307c151a8..0af890f647 100644
--- a/block.c
+++ b/block.c
@@ -6480,6 +6480,13 @@ int coroutine_fn bdrv_co_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 }
 memset(bdi, 0, sizeof(*bdi));
 ret = drv->bdrv_co_get_info(bs, bdi);
+if (bdi->subcluster_size == 0) {
+/*
+ * If the driver left this unset, subclusters are not supported.
+ * Then it is safe to treat each cluster as having only one subcluster.
+ */
+bdi->subcluster_size = bdi->cluster_size;
+}
 if (ret < 0) {
 return ret;
 }
diff --git a/block/qcow2.c b/block/qcow2.c
index c51388e99d..b48cd9ce63 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5197,6 +5197,7 @@ qcow2_co_get_info(BlockDriverState *bs, BlockDriverInfo 
*bdi)
 {
 BDRVQcow2State *s = bs->opaque;
 bdi->cluster_size = s->cluster_size;
+bdi->subcluster_size = s->subcluster_size;
 bdi->vm_state_offset = qcow2_vm_state_offset(s);
 bdi->is_dirty = s->incompatible_features & QCOW2_INCOMPAT_DIRTY;
 return 0;
-- 
2.41.0




[PULL 4/8] tests/qtest: Introduce tests for UFS

2023-08-24 Thread Stefan Hajnoczi
From: Jeuk Kim 

This patch includes the following tests
  Test mmio read
  Test ufs device initialization and ufs-lu recognition
  Test I/O (Performs a write followed by a read to verify)

Signed-off-by: Jeuk Kim 
Acked-by: Thomas Huth 
Reviewed-by: Stefan Hajnoczi 
Message-id: 
af6b8d54c049490b3533a784a0aeac4798bb9217.1691062912.git.jeuk20@samsung.com
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS |   1 +
 tests/qtest/ufs-test.c  | 584 
 tests/qtest/meson.build |   1 +
 3 files changed, 586 insertions(+)
 create mode 100644 tests/qtest/ufs-test.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7da40dabc2..a45bac20c9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2261,6 +2261,7 @@ M: Jeuk Kim 
 S: Supported
 F: hw/ufs/*
 F: include/block/ufs.h
+F: tests/qtest/ufs-test.c
 
 megasas
 M: Hannes Reinecke 
diff --git a/tests/qtest/ufs-test.c b/tests/qtest/ufs-test.c
new file mode 100644
index 00..34c97bdf1b
--- /dev/null
+++ b/tests/qtest/ufs-test.c
@@ -0,0 +1,584 @@
+/*
+ * QTest testcase for UFS
+ *
+ * Copyright (c) 2023 Samsung Electronics Co., Ltd. All rights reserved.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/module.h"
+#include "qemu/units.h"
+#include "libqtest.h"
+#include "libqos/qgraph.h"
+#include "libqos/pci.h"
+#include "scsi/constants.h"
+#include "include/block/ufs.h"
+
+/* Test images sizes in Bytes */
+#define TEST_IMAGE_SIZE (64 * 1024 * 1024)
+/* Timeout for various operations, in seconds. */
+#define TIMEOUT_SECONDS 10
+/* Maximum PRD entry count */
+#define MAX_PRD_ENTRY_COUNT 10
+#define PRD_ENTRY_DATA_SIZE 4096
+/* Constants to build upiu */
+#define UTP_COMMAND_DESCRIPTOR_SIZE 4096
+#define UTP_RESPONSE_UPIU_OFFSET 1024
+#define UTP_PRDT_UPIU_OFFSET 2048
+
+typedef struct QUfs QUfs;
+
+struct QUfs {
+QOSGraphObject obj;
+QPCIDevice dev;
+QPCIBar bar;
+
+uint64_t utrlba;
+uint64_t utmrlba;
+uint64_t cmd_desc_addr;
+uint64_t data_buffer_addr;
+
+bool enabled;
+};
+
+static inline uint32_t ufs_rreg(QUfs *ufs, size_t offset)
+{
+    return qpci_io_readl(&ufs->dev, ufs->bar, offset);
+}
+
+static inline void ufs_wreg(QUfs *ufs, size_t offset, uint32_t value)
+{
+    qpci_io_writel(&ufs->dev, ufs->bar, offset, value);
+}
+
+static void ufs_wait_for_irq(QUfs *ufs)
+{
+uint64_t end_time;
+uint32_t is;
+/* Wait for device to reset as the linux driver does. */
+end_time = g_get_monotonic_time() + TIMEOUT_SECONDS * G_TIME_SPAN_SECOND;
+do {
+qtest_clock_step(ufs->dev.bus->qts, 100);
+is = ufs_rreg(ufs, A_IS);
+} while (is == 0 && g_get_monotonic_time() < end_time);
+}
+
+static UtpTransferReqDesc ufs_build_req_utrd(uint64_t cmd_desc_addr,
+ uint8_t slot,
+ uint32_t data_direction,
+ uint16_t prd_table_length)
+{
+UtpTransferReqDesc req = { 0 };
+uint64_t command_desc_base_addr =
+cmd_desc_addr + slot * UTP_COMMAND_DESCRIPTOR_SIZE;
+
+req.header.dword_0 =
+cpu_to_le32(1 << 28 | data_direction | UTP_REQ_DESC_INT_CMD);
+req.header.dword_2 = cpu_to_le32(OCS_INVALID_COMMAND_STATUS);
+
+req.command_desc_base_addr_hi = cpu_to_le32(command_desc_base_addr >> 32);
+req.command_desc_base_addr_lo =
+cpu_to_le32(command_desc_base_addr & 0x);
+req.response_upiu_offset =
+cpu_to_le16(UTP_RESPONSE_UPIU_OFFSET / sizeof(uint32_t));
+req.response_upiu_length = cpu_to_le16(sizeof(UtpUpiuRsp));
+req.prd_table_offset = cpu_to_le16(UTP_PRDT_UPIU_OFFSET / 
sizeof(uint32_t));
+req.prd_table_length = cpu_to_le16(prd_table_length);
+return req;
+}
+
+static void ufs_send_nop_out(QUfs *ufs, uint8_t slot,
+ UtpTransferReqDesc *utrd_out, UtpUpiuRsp *rsp_out)
+{
+/* Build up utp transfer request descriptor */
+UtpTransferReqDesc utrd =
+ufs_build_req_utrd(ufs->cmd_desc_addr, slot, UTP_NO_DATA_TRANSFER, 0);
+uint64_t utrd_addr = ufs->utrlba + slot * sizeof(UtpTransferReqDesc);
+uint64_t req_upiu_addr =
+ufs->cmd_desc_addr + slot * UTP_COMMAND_DESCRIPTOR_SIZE;
+uint64_t rsp_upiu_addr = req_upiu_addr + UTP_RESPONSE_UPIU_OFFSET;
+    qtest_memwrite(ufs->dev.bus->qts, utrd_addr, &utrd, sizeof(utrd));
+
+/* Build up request upiu */
+UtpUpiuReq req_upiu = { 0 };
+req_upiu.header.trans_type = UPIU_TRANSACTION_NOP_OUT;
+req_upiu.header.task_tag = slot;
+    qtest_memwrite(ufs->dev.bus->qts, req_upiu_addr, &req_upiu,
+   sizeof(req_upiu));
+
+/* Ring Doorbell */
+ufs_wreg(ufs, A_UTRLDBR, 1);
+ufs_wait_for_irq(ufs);
+g_assert_true(FIELD_EX32(ufs_rreg(ufs, A_IS), IS, UTRCS));
+ufs_wreg(ufs, A_IS, F

[PULL 2/8] hw/ufs: Support for Query Transfer Requests

2023-08-24 Thread Stefan Hajnoczi
From: Jeuk Kim 

This commit makes the UFS device support query
and NOP OUT transfer requests.

The next patch adds support for the UFS logical
unit and SCSI command transfer requests.

Signed-off-by: Jeuk Kim 
Reviewed-by: Stefan Hajnoczi 
Message-id: 
d06b440d660872092f70af1b8167bd5f4704c957.1691062912.git.jeuk20@samsung.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/ufs/ufs.h|  46 +++
 hw/ufs/ufs.c| 980 +++-
 hw/ufs/trace-events |   1 +
 3 files changed, 1025 insertions(+), 2 deletions(-)

diff --git a/hw/ufs/ufs.h b/hw/ufs/ufs.h
index d9d195caec..3d1b2cff4e 100644
--- a/hw/ufs/ufs.h
+++ b/hw/ufs/ufs.h
@@ -18,6 +18,32 @@
 #define UFS_MAX_LUS 32
 #define UFS_BLOCK_SIZE 4096
 
+typedef enum UfsRequestState {
+UFS_REQUEST_IDLE = 0,
+UFS_REQUEST_READY = 1,
+UFS_REQUEST_RUNNING = 2,
+UFS_REQUEST_COMPLETE = 3,
+UFS_REQUEST_ERROR = 4,
+} UfsRequestState;
+
+typedef enum UfsReqResult {
+UFS_REQUEST_SUCCESS = 0,
+UFS_REQUEST_FAIL = 1,
+} UfsReqResult;
+
+typedef struct UfsRequest {
+struct UfsHc *hc;
+UfsRequestState state;
+int slot;
+
+UtpTransferReqDesc utrd;
+UtpUpiuReq req_upiu;
+UtpUpiuRsp rsp_upiu;
+
+/* for scsi command */
+QEMUSGList *sg;
+} UfsRequest;
+
 typedef struct UfsParams {
 char *serial;
 uint8_t nutrs; /* Number of UTP Transfer Request Slots */
@@ -30,6 +56,12 @@ typedef struct UfsHc {
 UfsReg reg;
 UfsParams params;
 uint32_t reg_size;
+UfsRequest *req_list;
+
+DeviceDescriptor device_desc;
+GeometryDescriptor geometry_desc;
+Attributes attributes;
+Flags flags;
 
 qemu_irq irq;
 QEMUBH *doorbell_bh;
@@ -39,4 +71,18 @@ typedef struct UfsHc {
 #define TYPE_UFS "ufs"
 #define UFS(obj) OBJECT_CHECK(UfsHc, (obj), TYPE_UFS)
 
+typedef enum UfsQueryFlagPerm {
+UFS_QUERY_FLAG_NONE = 0x0,
+UFS_QUERY_FLAG_READ = 0x1,
+UFS_QUERY_FLAG_SET = 0x2,
+UFS_QUERY_FLAG_CLEAR = 0x4,
+UFS_QUERY_FLAG_TOGGLE = 0x8,
+} UfsQueryFlagPerm;
+
+typedef enum UfsQueryAttrPerm {
+UFS_QUERY_ATTR_NONE = 0x0,
+UFS_QUERY_ATTR_READ = 0x1,
+UFS_QUERY_ATTR_WRITE = 0x2,
+} UfsQueryAttrPerm;
+
 #endif /* HW_UFS_UFS_H */
diff --git a/hw/ufs/ufs.c b/hw/ufs/ufs.c
index 101082a8a3..c771f8e0b4 100644
--- a/hw/ufs/ufs.c
+++ b/hw/ufs/ufs.c
@@ -15,10 +15,221 @@
 #include "ufs.h"
 
 /* The QEMU-UFS device follows spec version 3.1 */
-#define UFS_SPEC_VER 0x0310
+#define UFS_SPEC_VER 0x0310
 #define UFS_MAX_NUTRS 32
 #define UFS_MAX_NUTMRS 8
 
+static MemTxResult ufs_addr_read(UfsHc *u, hwaddr addr, void *buf, int size)
+{
+hwaddr hi = addr + size - 1;
+
+if (hi < addr) {
+return MEMTX_DECODE_ERROR;
+}
+
+if (!FIELD_EX32(u->reg.cap, CAP, 64AS) && (hi >> 32)) {
+return MEMTX_DECODE_ERROR;
+}
+
+return pci_dma_read(PCI_DEVICE(u), addr, buf, size);
+}
+
+static MemTxResult ufs_addr_write(UfsHc *u, hwaddr addr, const void *buf,
+  int size)
+{
+hwaddr hi = addr + size - 1;
+if (hi < addr) {
+return MEMTX_DECODE_ERROR;
+}
+
+if (!FIELD_EX32(u->reg.cap, CAP, 64AS) && (hi >> 32)) {
+return MEMTX_DECODE_ERROR;
+}
+
+return pci_dma_write(PCI_DEVICE(u), addr, buf, size);
+}
+
+static void ufs_complete_req(UfsRequest *req, UfsReqResult req_result);
+
+static inline hwaddr ufs_get_utrd_addr(UfsHc *u, uint32_t slot)
+{
+hwaddr utrl_base_addr = (((hwaddr)u->reg.utrlbau) << 32) + u->reg.utrlba;
+hwaddr utrd_addr = utrl_base_addr + slot * sizeof(UtpTransferReqDesc);
+
+return utrd_addr;
+}
+
+static inline hwaddr ufs_get_req_upiu_base_addr(const UtpTransferReqDesc *utrd)
+{
+uint32_t cmd_desc_base_addr_lo =
+le32_to_cpu(utrd->command_desc_base_addr_lo);
+uint32_t cmd_desc_base_addr_hi =
+le32_to_cpu(utrd->command_desc_base_addr_hi);
+
+return (((hwaddr)cmd_desc_base_addr_hi) << 32) + cmd_desc_base_addr_lo;
+}
+
+static inline hwaddr ufs_get_rsp_upiu_base_addr(const UtpTransferReqDesc *utrd)
+{
+hwaddr req_upiu_base_addr = ufs_get_req_upiu_base_addr(utrd);
+uint32_t rsp_upiu_byte_off =
+le16_to_cpu(utrd->response_upiu_offset) * sizeof(uint32_t);
+return req_upiu_base_addr + rsp_upiu_byte_off;
+}
+
+static MemTxResult ufs_dma_read_utrd(UfsRequest *req)
+{
+UfsHc *u = req->hc;
+hwaddr utrd_addr = ufs_get_utrd_addr(u, req->slot);
+MemTxResult ret;
+
+ret = ufs_addr_read(u, utrd_addr, &req->utrd, sizeof(req->utrd));
+if (ret) {
+trace_ufs_err_dma_read_utrd(req->slot, utrd_addr);
+}
+return ret;
+}
+
+static MemTxResult ufs_dma_read_req_upiu(UfsRequest *req)
+{
+UfsHc *u = req->hc;
+hwaddr req_upiu_base_addr = ufs_get_req_upiu_base_addr(&req->utrd);
+UtpUpiuReq *req_upiu = &req->req_upiu;
+uint32_t copy_size;
+uint16_t

[PULL 0/8] Block patches

2023-08-24 Thread Stefan Hajnoczi
The following changes since commit b0dd9a7d6dd15a6898e9c585b521e6bec79b25aa:

  Open 8.2 development tree (2023-08-22 07:14:07 -0700)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 892470a8de848a685acb67ba739485424efc3aed:

  tests/qemu-iotests/197: add testcase for CoR with subclusters (2023-08-24 
10:28:50 -0400)


Pull request

First block pull request for the QEMU 8.2 release cycle.



Andrey Drobyshev via (3):
  block: add subcluster_size field to BlockDriverInfo
  block/io: align requests to subcluster_size
  tests/qemu-iotests/197: add testcase for CoR with subclusters

Fabiano Rosas (1):
  block-migration: Ensure we don't crash during migration cleanup

Jeuk Kim (4):
  hw/ufs: Initial commit for emulated Universal-Flash-Storage
  hw/ufs: Support for Query Transfer Requests
  hw/ufs: Support for UFS logical unit
  tests/qtest: Introduce tests for UFS

 MAINTAINERS  |7 +
 docs/specs/pci-ids.rst   |2 +
 meson.build  |1 +
 hw/ufs/trace.h   |1 +
 hw/ufs/ufs.h |  131 +++
 include/block/block-common.h |5 +
 include/block/block-io.h |8 +-
 include/block/ufs.h  | 1090 +
 include/hw/pci/pci.h |1 +
 include/hw/pci/pci_ids.h |1 +
 include/scsi/constants.h |1 +
 block.c  |7 +
 block/io.c   |   50 +-
 block/mirror.c   |8 +-
 block/qcow2.c|1 +
 hw/ufs/lu.c  | 1445 
 hw/ufs/ufs.c | 1494 ++
 migration/block.c|   11 +-
 tests/qtest/ufs-test.c   |  584 +
 hw/Kconfig   |1 +
 hw/meson.build   |1 +
 hw/ufs/Kconfig   |4 +
 hw/ufs/meson.build   |1 +
 hw/ufs/trace-events  |   58 ++
 tests/qemu-iotests/197   |   29 +
 tests/qemu-iotests/197.out   |   24 +
 tests/qtest/meson.build  |1 +
 27 files changed, 4932 insertions(+), 35 deletions(-)
 create mode 100644 hw/ufs/trace.h
 create mode 100644 hw/ufs/ufs.h
 create mode 100644 include/block/ufs.h
 create mode 100644 hw/ufs/lu.c
 create mode 100644 hw/ufs/ufs.c
 create mode 100644 tests/qtest/ufs-test.c
 create mode 100644 hw/ufs/Kconfig
 create mode 100644 hw/ufs/meson.build
 create mode 100644 hw/ufs/trace-events

-- 
2.41.0




[PULL 1/8] hw/ufs: Initial commit for emulated Universal-Flash-Storage

2023-08-24 Thread Stefan Hajnoczi
From: Jeuk Kim 

Universal Flash Storage (UFS) is a high-performance mass storage device
with a serial interface. It is primarily used as a high-performance
data storage device for embedded applications.

This commit contains code for UFS device to be recognized
as a UFS PCI device.
Patches to handle UFS logical unit and Transfer Request will follow.

Signed-off-by: Jeuk Kim 
Reviewed-by: Stefan Hajnoczi 
Message-id: 
9f3db32fe1c708090a6bb764d456973b5abef55f.1691062912.git.jeuk20@samsung.com
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS  |6 +
 docs/specs/pci-ids.rst   |2 +
 meson.build  |1 +
 hw/ufs/trace.h   |1 +
 hw/ufs/ufs.h |   42 ++
 include/block/ufs.h  | 1090 ++
 include/hw/pci/pci.h |1 +
 include/hw/pci/pci_ids.h |1 +
 hw/ufs/ufs.c |  278 ++
 hw/Kconfig   |1 +
 hw/meson.build   |1 +
 hw/ufs/Kconfig   |4 +
 hw/ufs/meson.build   |1 +
 hw/ufs/trace-events  |   32 ++
 14 files changed, 1461 insertions(+)
 create mode 100644 hw/ufs/trace.h
 create mode 100644 hw/ufs/ufs.h
 create mode 100644 include/block/ufs.h
 create mode 100644 hw/ufs/ufs.c
 create mode 100644 hw/ufs/Kconfig
 create mode 100644 hw/ufs/meson.build
 create mode 100644 hw/ufs/trace-events

diff --git a/MAINTAINERS b/MAINTAINERS
index 6111b6b4d9..7da40dabc2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2256,6 +2256,12 @@ F: tests/qtest/nvme-test.c
 F: docs/system/devices/nvme.rst
 T: git git://git.infradead.org/qemu-nvme.git nvme-next
 
+ufs
+M: Jeuk Kim 
+S: Supported
+F: hw/ufs/*
+F: include/block/ufs.h
+
 megasas
 M: Hannes Reinecke 
 L: qemu-bl...@nongnu.org
diff --git a/docs/specs/pci-ids.rst b/docs/specs/pci-ids.rst
index e302bea484..d6707fa069 100644
--- a/docs/specs/pci-ids.rst
+++ b/docs/specs/pci-ids.rst
@@ -92,6 +92,8 @@ PCI devices (other than virtio):
   PCI PVPanic device (``-device pvpanic-pci``)
 1b36:0012
   PCI ACPI ERST device (``-device acpi-erst``)
+1b36:0013
+  PCI UFS device (``-device ufs``)
 
 All these devices are documented in :doc:`index`.
 
diff --git a/meson.build b/meson.build
index 98e68ef0b1..76578b2bf2 100644
--- a/meson.build
+++ b/meson.build
@@ -3277,6 +3277,7 @@ if have_system
 'hw/ssi',
 'hw/timer',
 'hw/tpm',
+'hw/ufs',
 'hw/usb',
 'hw/vfio',
 'hw/virtio',
diff --git a/hw/ufs/trace.h b/hw/ufs/trace.h
new file mode 100644
index 00..2dbd6397c3
--- /dev/null
+++ b/hw/ufs/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_ufs.h"
diff --git a/hw/ufs/ufs.h b/hw/ufs/ufs.h
new file mode 100644
index 00..d9d195caec
--- /dev/null
+++ b/hw/ufs/ufs.h
@@ -0,0 +1,42 @@
+/*
+ * QEMU UFS
+ *
+ * Copyright (c) 2023 Samsung Electronics Co., Ltd. All rights reserved.
+ *
+ * Written by Jeuk Kim 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_UFS_UFS_H
+#define HW_UFS_UFS_H
+
+#include "hw/pci/pci_device.h"
+#include "hw/scsi/scsi.h"
+#include "block/ufs.h"
+
+#define UFS_MAX_LUS 32
+#define UFS_BLOCK_SIZE 4096
+
+typedef struct UfsParams {
+char *serial;
+uint8_t nutrs; /* Number of UTP Transfer Request Slots */
+uint8_t nutmrs; /* Number of UTP Task Management Request Slots */
+} UfsParams;
+
+typedef struct UfsHc {
+PCIDevice parent_obj;
+MemoryRegion iomem;
+UfsReg reg;
+UfsParams params;
+uint32_t reg_size;
+
+qemu_irq irq;
+QEMUBH *doorbell_bh;
+QEMUBH *complete_bh;
+} UfsHc;
+
+#define TYPE_UFS "ufs"
+#define UFS(obj) OBJECT_CHECK(UfsHc, (obj), TYPE_UFS)
+
+#endif /* HW_UFS_UFS_H */
diff --git a/include/block/ufs.h b/include/block/ufs.h
new file mode 100644
index 00..e70ee23183
--- /dev/null
+++ b/include/block/ufs.h
@@ -0,0 +1,1090 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef BLOCK_UFS_H
+#define BLOCK_UFS_H
+
+#include "hw/registerfields.h"
+
+typedef struct QEMU_PACKED UfsReg {
+uint32_t cap;
+uint32_t rsvd0;
+uint32_t ver;
+uint32_t rsvd1;
+uint32_t hcpid;
+uint32_t hcmid;
+uint32_t ahit;
+uint32_t rsvd2;
+uint32_t is;
+uint32_t ie;
+uint32_t rsvd3[2];
+uint32_t hcs;
+uint32_t hce;
+uint32_t uecpa;
+uint32_t uecdl;
+uint32_t uecn;
+uint32_t uect;
+uint32_t uecdme;
+uint32_t utriacr;
+uint32_t utrlba;
+uint32_t utrlbau;
+uint32_t utrldbr;
+uint32_t utrlclr;
+uint32_t utrlrsr;
+uint32_t utrlcnr;
+uint32_t rsvd4[2];
+uint32_t utmrlba;
+uint32_t utmrlbau;
+uint32_t utmrldbr;
+uint32_t utmrlclr;
+uint32_t utmrlrsr;
+uint32_t rsvd5[3];
+uint32_t uiccmd;
+uint32_t ucmdarg1;
+uint32_t ucmdarg2;
+uint32_t ucmdarg3;
+uint32_t rsvd6[4];
+uint32_t rsvd7[4];
+uint32_t rsvd8[16];
+uint32_t ccap;
+} UfsReg;
+
+REG32(CAP, offsetof(UfsReg, cap))
+FIELD(CAP, NUTRS, 0, 5)
+FIELD(C

Re: [PULL 00/35] target-arm queue

2023-08-24 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.


signature.asc
Description: PGP signature


Re: [PULL 00/31] loongarch-to-apply queue

2023-08-24 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PULL 00/12] First batch of s390x patches for QEMU 8.2

2023-08-24 Thread Stefan Hajnoczi
On Thu, 24 Aug 2023 at 02:53, Thomas Huth  wrote:
>
> On 23/08/2023 18.34, Stefan Hajnoczi wrote:
> > On Wed, Aug 23, 2023 at 01:45:32PM +0200, Thomas Huth wrote:
> >> The following changes since commit 
> >> b0dd9a7d6dd15a6898e9c585b521e6bec79b25aa:
> >>
> >>Open 8.2 development tree (2023-08-22 07:14:07 -0700)
> >>
> >> are available in the Git repository at:
> >>
> >>https://gitlab.com/thuth/qemu.git tags/pull-request-2023-08-23
> >>
> >> for you to fetch changes up to 6c49f685d30ffe81cfa47da2c258904ad28ac368:
> >>
> >>tests/tcg/s390x: Test VSTRS (2023-08-23 12:07:30 +0200)
> >
> > Hi Thomas,
> > Please take a look at the following ubuntu-20.04-s390x-all CI failure:
> > https://gitlab.com/qemu-project/qemu/-/jobs/4931341536
>
> It says: "TimeoutError: Timeout waiting for job to pause" ... could you
> please check the load on that host? ... I think that s390x runner is known
> for being too slow some times, so I assume that problem should go away if
> you re-run the job when it is less loaded.

I ran it again and it timed out. I've merged the PR and assume the
test is just flaky.

Stefan



Re: [PATCH] .gitlab-ci.d/cirrus.yml: Update FreeBSD to v13.2

2023-08-24 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PATCH v2 0/3] block: align CoR requests to subclusters

2023-08-24 Thread Stefan Hajnoczi
On Wed, Aug 23, 2023 at 03:50:55PM +0300, Andrey Drobyshev wrote:
> On 8/22/23 22:58, John Snow wrote:
> > On Tue, Aug 22, 2023 at 1:33 PM Andrey Drobyshev
> >  wrote:
> >>
> >> On 8/16/23 12:22, Andrey Drobyshev wrote:
> >>> On 7/31/23 17:51, Andrey Drobyshev wrote:
>  On 7/24/23 16:11, Andrey Drobyshev wrote:
> > On 7/11/23 20:25, Andrey Drobyshev wrote:
> >> v1 --> v2:
> >>  * Fixed line indentation;
> >>  * Fixed wording in a comment;
> >>  * Added R-b.
> >>
> >> v1: 
> >> https://lists.nongnu.org/archive/html/qemu-block/2023-06/msg00606.html
> >>
> >> Andrey Drobyshev (3):
> >>   block: add subcluster_size field to BlockDriverInfo
> >>   block/io: align requests to subcluster_size
> >>   tests/qemu-iotests/197: add testcase for CoR with subclusters
> >>
> >>  block.c  |  7 +
> >>  block/io.c   | 50 ++--
> >>  block/mirror.c   |  8 +++---
> >>  block/qcow2.c|  1 +
> >>  include/block/block-common.h |  5 
> >>  include/block/block-io.h |  8 +++---
> >>  tests/qemu-iotests/197   | 29 +
> >>  tests/qemu-iotests/197.out   | 24 +
> >>  8 files changed, 99 insertions(+), 33 deletions(-)
> >>
> >
> > Ping
> 
>  Another ping
> >>>
> >>> Yet another friendly ping
> >>
> >> One more friendly ping
> > 
> > Looks like Stefan gave you an R-B for the series; do we just need an
> > ACK by the block maintainers at this point or is there someone
> > specific you're hoping will review this?
> > 
> > --js
> > 
> 
> Hi John,
> 
> I figure a maintainer's R-b doesn't imply the patches being merged into
> the tree.  Hence I'm waiting for the notice that they actually are merged.
> 
> Please let me know if the process should be different.

Hi Andrey,
Kevin is away right now but seemed happy enough when I mentioned this
series to him, so I have merged this into my own tree:

  https://gitlab.com/stefanha/qemu block

Sorry that your patch series has not been merged in a timely manner.

Stefan




Re: [PULL 00/48] tcg patch queue

2023-08-24 Thread Stefan Hajnoczi
On Wed, 23 Aug 2023 at 16:24, Richard Henderson
 wrote:
>
> The following changes since commit b0dd9a7d6dd15a6898e9c585b521e6bec79b25aa:
>
>   Open 8.2 development tree (2023-08-22 07:14:07 -0700)
>
> are available in the Git repository at:
>
>   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230823
>
> for you to fetch changes up to 05e09d2a830a74d651ca6106e2002ec4f7b6997a:
>
>   tcg: spelling fixes (2023-08-23 13:20:47 -0700)
>
> 
> accel/*: Widen pc/saved_insn for *_sw_breakpoint
> accel/tcg: Replace remaining target_ulong in system-mode accel
> tcg: spelling fixes
> tcg: Document bswap, hswap, wswap byte patterns
> tcg: Introduce negsetcond opcodes
> tcg: Fold deposit with zero to and
> tcg: Unify TCG_TARGET_HAS_extr[lh]_i64_i32
> tcg/i386: Drop BYTEH deposits for 64-bit
> tcg/i386: Allow immediate as input to deposit
> target/*: Use tcg_gen_negsetcond_*
>
> 
> Anton Johansson via (9):

Hi Richard,
Please fix up the authorship information for Anton Johansson. The pull
request has "Anton Johansson via ". I think the
email address should be "Anton Johansson ".

Thanks!

Stefan



Re: [PATCH v2 1/4] softmmu: Support concurrent bounce buffers

2023-08-24 Thread Stefan Hajnoczi
On Wed, Aug 23, 2023 at 04:54:06PM -0400, Peter Xu wrote:
> On Wed, Aug 23, 2023 at 10:08:08PM +0200, Mattias Nissler wrote:
> > On Wed, Aug 23, 2023 at 7:35 PM Peter Xu  wrote:
> > > On Wed, Aug 23, 2023 at 02:29:02AM -0700, Mattias Nissler wrote:
> > > > diff --git a/softmmu/vl.c b/softmmu/vl.c
> > > > index b0b96f67fa..dbe52f5ea1 100644
> > > > --- a/softmmu/vl.c
> > > > +++ b/softmmu/vl.c
> > > > @@ -3469,6 +3469,12 @@ void qemu_init(int argc, char **argv)
> > > >  exit(1);
> > > >  #endif
> > > >  break;
> > > > +case QEMU_OPTION_max_bounce_buffer_size:
> > > > +if (qemu_strtosz(optarg, NULL, 
> > > > &max_bounce_buffer_size) < 0) {
> > > > +error_report("invalid -max-bounce-buffer-size 
> > > > value");
> > > > +exit(1);
> > > > +}
> > > > +break;
> > >
> > > PS: I had a vague memory that we do not recommend adding more qemu cmdline
> > > options, but I don't know enough on the plan to say anything real.
> > 
> > I am aware of that, and I'm really not happy with the command line
> > option myself. Consider the command line flag a straw man I put in to
> > see whether any reviewers have better ideas :)
> > 
> > More seriously, I actually did look around to see whether I can add
> > the parameter to one of the existing option groupings somewhere, but
> > neither do I have a suitable QOM object that I can attach the
> > parameter to, nor did I find any global option groups that fits: this
> > is not really memory configuration, and it's not really CPU
> > configuration, it's more related to shared device model
> > infrastructure... If you have a good idea for a home for this, I'm all
> > ears.
> 
> No good & simple suggestion here, sorry.  We can keep the option there
> until someone jumps in, then the better alternative could also come along.
> 
> After all I expect if we can choose a sensible enough default value, this
> new option shouldn't be used by anyone for real.

QEMU commits to stability in its external interfaces. Once the
command-line option is added, it needs to be supported in the future or
go through the deprecation process. I think we should agree on the
command-line option now.

Two ways to avoid the issue:
1. Drop the command-line option until someone needs it.
2. Make it an experimental option (with an "x-" prefix).

The closest to a proper solution that I found was adding it as a
-machine property. What bothers me is that if QEMU supports
multi-machine emulation in a single process some day, then the property
doesn't make sense since it's global rather than specific to a machine.

CCing Markus Armbruster for ideas.

Stefan




Re: Starting the QEMU 8.2 release cycle

2023-08-24 Thread Stefan Hajnoczi
On Thu, 24 Aug 2023 at 05:25, Peter Maydell  wrote:
>
> On Wed, 23 Aug 2023 at 12:23, Stefan Hajnoczi  wrote:
> >
> > Hi,
> > The QEMU 8.2 release cycle has begun. I am merging qemu.git pull
> > requests for this release. You do not need to CC me on pull requests
> > because I am monitoring qemu-devel.
> >
> > The proposed schedule is as follows:
> >
> > 2023-08-22  Beginning of development phase
> > 2023-11-14  Soft feature freeze. Only bug fixes after this point. All 
> > feature changes must be already in a sub maintainer tree and all pull 
> > requests from submaintainers must have been sent to the list by this date.
> > 2023-11-21  Hard feature freeze. Tag rc0
> > 2023-11-28  Tag rc1
> > 2023-12-05  Tag rc2
> > 2023-12-12  Tag rc3
> > 2023-12-19  Release; or tag rc4 if needed
> > 2023-12-26  Release if we needed an rc4
> >
> > Please let me know if these dates are inconvenient. I may adjust the
> > last date depending on my availability at the end of 2023.
>
> Up to you, but my personal preference when doing the end-of-year
> releases was to move the cycle a week or so earlier, because
> it's pretty common that we need the rc4 and it's better not
> to be trying to do the final release in the middle of the
> holiday season (especially bearing in mind that releases
> require not just your availability but also Mike Roth's).

That sounds good. I will bring the dates forward by one week.

Stefan



[PATCH v2 1/4] block: remove AIOCBInfo->get_aio_context()

2023-08-23 Thread Stefan Hajnoczi
The synchronous bdrv_aio_cancel() function needs the acb's AioContext so
it can call aio_poll() to wait for cancellation.

It turns out that all users run under the BQL in the main AioContext, so
this callback is not needed.

Remove the callback, mark bdrv_aio_cancel() GLOBAL_STATE_CODE just like
its blk_aio_cancel() caller, and poll the main loop AioContext.

The purpose of this cleanup is to identify bdrv_aio_cancel() as an API
that does not work with the multi-queue block layer.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio.h|  1 -
 include/block/block-global-state.h |  2 ++
 include/block/block-io.h   |  1 -
 block/block-backend.c  | 17 -
 block/io.c | 23 ---
 hw/nvme/ctrl.c |  7 ---
 softmmu/dma-helpers.c  |  8 
 util/thread-pool.c |  8 
 8 files changed, 10 insertions(+), 57 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 32042e8905..bcc165c974 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -31,7 +31,6 @@ typedef void BlockCompletionFunc(void *opaque, int ret);
 
 typedef struct AIOCBInfo {
 void (*cancel_async)(BlockAIOCB *acb);
-AioContext *(*get_aio_context)(BlockAIOCB *acb);
 size_t aiocb_size;
 } AIOCBInfo;
 
diff --git a/include/block/block-global-state.h 
b/include/block/block-global-state.h
index f347199bff..ac2a605ef5 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -185,6 +185,8 @@ void bdrv_drain_all_begin_nopoll(void);
 void bdrv_drain_all_end(void);
 void bdrv_drain_all(void);
 
+void bdrv_aio_cancel(BlockAIOCB *acb);
+
 int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
 BlockDriverState *bdrv_find_node(const char *node_name);
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 4415506e40..b078d17bf1 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -101,7 +101,6 @@ bdrv_co_delete_file_noerr(BlockDriverState *bs);
 
 
 /* async block I/O */
-void bdrv_aio_cancel(BlockAIOCB *acb);
 void bdrv_aio_cancel_async(BlockAIOCB *acb);
 
 /* sg packet commands */
diff --git a/block/block-backend.c b/block/block-backend.c
index 4009ed5fed..a77295a198 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -33,8 +33,6 @@
 
 #define NOT_DONE 0x7fff /* used while emulated sync operation in progress 
*/
 
-static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
-
 typedef struct BlockBackendAioNotifier {
 void (*attached_aio_context)(AioContext *new_context, void *opaque);
 void (*detach_aio_context)(void *opaque);
@@ -103,7 +101,6 @@ typedef struct BlockBackendAIOCB {
 } BlockBackendAIOCB;
 
 static const AIOCBInfo block_backend_aiocb_info = {
-.get_aio_context = blk_aiocb_get_aio_context,
 .aiocb_size = sizeof(BlockBackendAIOCB),
 };
 
@@ -1545,16 +1542,8 @@ typedef struct BlkAioEmAIOCB {
 bool has_returned;
 } BlkAioEmAIOCB;
 
-static AioContext *blk_aio_em_aiocb_get_aio_context(BlockAIOCB *acb_)
-{
-BlkAioEmAIOCB *acb = container_of(acb_, BlkAioEmAIOCB, common);
-
-return blk_get_aio_context(acb->rwco.blk);
-}
-
 static const AIOCBInfo blk_aio_em_aiocb_info = {
 .aiocb_size = sizeof(BlkAioEmAIOCB),
-.get_aio_context= blk_aio_em_aiocb_get_aio_context,
 };
 
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
@@ -2434,12 +2423,6 @@ AioContext *blk_get_aio_context(BlockBackend *blk)
 return blk->ctx;
 }
 
-static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb)
-{
-BlockBackendAIOCB *blk_acb = DO_UPCAST(BlockBackendAIOCB, common, acb);
-return blk_get_aio_context(blk_acb->blk);
-}
-
 int blk_set_aio_context(BlockBackend *blk, AioContext *new_context,
 Error **errp)
 {
diff --git a/block/io.c b/block/io.c
index 055fcf7438..16245dc93a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2944,25 +2944,18 @@ int bdrv_load_vmstate(BlockDriverState *bs, uint8_t 
*buf,
 /**/
 /* async I/Os */
 
+/**
+ * Synchronously cancels an acb. Must be called with the BQL held and the acb
+ * must be processed with the BQL held too (IOThreads are not allowed).
+ *
+ * Use bdrv_aio_cancel_async() instead when possible.
+ */
 void bdrv_aio_cancel(BlockAIOCB *acb)
 {
-IO_CODE();
+GLOBAL_STATE_CODE();
 qemu_aio_ref(acb);
 bdrv_aio_cancel_async(acb);
-while (acb->refcnt > 1) {
-if (acb->aiocb_info->get_aio_context) {
-aio_poll(acb->aiocb_info->get_aio_context(acb), true);
-} else if (acb->bs) {
-/* qemu_aio_ref and qemu_aio_unref are not thread-safe, so
- * assert that we're not using an I/O thread.  Thread-safe
- * code should use bdrv_aio_cancel_async exclusively.
- */
-asser

[PATCH v2 4/4] block-coroutine-wrapper: use qemu_get_current_aio_context()

2023-08-23 Thread Stefan Hajnoczi
Use qemu_get_current_aio_context() in mixed wrappers and coroutine
wrappers so that code runs in the caller's AioContext instead of moving
to the BlockDriverState's AioContext. This change is necessary for the
multi-queue block layer where any thread can call into the block layer.

Most wrappers are IO_CODE where it's safe to use the current AioContext
nowadays. BlockDrivers and the core block layer use their own locks and
no longer depend on the AioContext lock for thread-safety.

The bdrv_create() wrapper invokes GLOBAL_STATE code. Using the current
AioContext is safe because this code is only called with the BQL held
from the main loop thread.

The output of qemu-iotests 051 is sensitive to event loop activity.
Update the output because the monitor BH runs at a different time,
causing prompts to be printed differently in the output.

Signed-off-by: Stefan Hajnoczi 
---
 scripts/block-coroutine-wrapper.py | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/scripts/block-coroutine-wrapper.py 
b/scripts/block-coroutine-wrapper.py
index d4a183db61..f93fe154c3 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -88,8 +88,6 @@ def __init__(self, wrapper_type: str, return_type: str, name: 
str,
 raise ValueError(f"no_co function can't be rdlock: 
{self.name}")
 self.target_name = f'{subsystem}_{subname}'
 
-self.ctx = self.gen_ctx()
-
 self.get_result = 's->ret = '
 self.ret = 'return s.ret;'
 self.co_ret = 'return '
@@ -162,7 +160,7 @@ def create_mixed_wrapper(func: FuncDecl) -> str:
 {func.co_ret}{name}({ func.gen_list('{name}') });
 }} else {{
 {struct_name} s = {{
-.poll_state.ctx = {func.ctx},
+.poll_state.ctx = qemu_get_current_aio_context(),
 .poll_state.in_progress = true,
 
 { func.gen_block('.{name} = {name},') }
@@ -186,7 +184,7 @@ def create_co_wrapper(func: FuncDecl) -> str:
 {func.return_type} {func.name}({ func.gen_list('{decl}') })
 {{
 {struct_name} s = {{
-.poll_state.ctx = {func.ctx},
+.poll_state.ctx = qemu_get_current_aio_context(),
 .poll_state.in_progress = true,
 
 { func.gen_block('.{name} = {name},') }
-- 
2.41.0




[PATCH v2 0/4] block-backend: process I/O in the current AioContext

2023-08-23 Thread Stefan Hajnoczi
v2
- Add patch to remove AIOCBInfo->get_aio_context() [Kevin]
- Add patch to use qemu_get_current_aio_context() in block-coroutine-wrapper so
  that the wrappers use the current AioContext instead of
  bdrv_get_aio_context().

Switch blk_aio_*() APIs over to multi-queue by using
qemu_get_current_aio_context() instead of blk_get_aio_context(). This change
will allow devices to process I/O in multiple IOThreads in the future.

The final patch requires my QIOChannel AioContext series to pass
tests/qemu-iotests/check -qcow2 281 because the nbd block driver is now
accessed from the main loop thread in addition to the IOThread:
https://lore.kernel.org/qemu-devel/20230823234504.1387239-1-stefa...@redhat.com/T/#t

Based-on: 20230823234504.1387239-1-stefa...@redhat.com

Stefan Hajnoczi (4):
  block: remove AIOCBInfo->get_aio_context()
  block-backend: process I/O in the current AioContext
  block-backend: process zoned requests in the current AioContext
  block-coroutine-wrapper: use qemu_get_current_aio_context()

 include/block/aio.h|  1 -
 include/block/block-global-state.h |  2 ++
 include/block/block-io.h   |  1 -
 block/block-backend.c  | 35 --
 block/io.c | 23 +++-
 hw/nvme/ctrl.c |  7 --
 softmmu/dma-helpers.c  |  8 ---
 util/thread-pool.c |  8 ---
 scripts/block-coroutine-wrapper.py |  6 ++---
 9 files changed, 21 insertions(+), 70 deletions(-)

-- 
2.41.0




[PATCH v2 3/4] block-backend: process zoned requests in the current AioContext

2023-08-23 Thread Stefan Hajnoczi
Process zoned requests in the current thread's AioContext instead of in
the BlockBackend's AioContext.

There is no need to use the BlockBackend's AioContext thanks to CoMutex
bs->wps->colock, which protects zone metadata.

Signed-off-by: Stefan Hajnoczi 
---
 block/block-backend.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 4863be5691..427ebcc0e4 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1890,11 +1890,11 @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, 
int64_t offset,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
@@ -1931,11 +1931,11 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, 
BlockZoneOp op,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
@@ -1971,10 +1971,10 @@ BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, 
int64_t *offset,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
-- 
2.41.0




[PATCH v2 2/4] block-backend: process I/O in the current AioContext

2023-08-23 Thread Stefan Hajnoczi
Switch blk_aio_*() APIs over to multi-queue by using
qemu_get_current_aio_context() instead of blk_get_aio_context(). This
change will allow devices to process I/O in multiple IOThreads in the
future.

I audited existing blk_aio_*() callers:
- migration/block.c: blk_mig_lock() protects the data accessed by the
  completion callback.
- The remaining emulated devices and exports run with
  qemu_get_aio_context() == blk_get_aio_context().

Signed-off-by: Stefan Hajnoczi 
---
 block/block-backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index a77295a198..4863be5691 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1530,7 +1530,7 @@ BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
 acb->blk = blk;
 acb->ret = ret;
 
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  error_callback_bh, acb);
 return &acb->common;
 }
@@ -1584,11 +1584,11 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, 
int64_t offset,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(co_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
-- 
2.41.0




[PATCH 2/2] io: follow coroutine AioContext in qio_channel_yield()

2023-08-23 Thread Stefan Hajnoczi
The ongoing QEMU multi-queue block layer effort makes it possible for multiple
threads to process I/O in parallel. The nbd block driver is not compatible with
the multi-queue block layer yet because QIOChannel cannot be used easily from
coroutines running in multiple threads. This series changes the QIOChannel API
to make that possible.

In the current API, calling qio_channel_attach_aio_context() sets the
AioContext where qio_channel_yield() installs an fd handler prior to yielding:

  qio_channel_attach_aio_context(ioc, my_ctx);
  ...
  qio_channel_yield(ioc); // my_ctx is used here
  ...
  qio_channel_detach_aio_context(ioc);

This API design has limitations: reading and writing must be done in the same
AioContext and moving between AioContexts involves a cumbersome sequence of API
calls that is not suitable for doing on a per-request basis.

There is no fundamental reason why a QIOChannel needs to run within the
same AioContext every time qio_channel_yield() is called. QIOChannel
only uses the AioContext while inside qio_channel_yield(). The rest of
the time, QIOChannel is independent of any AioContext.

In the new API, qio_channel_yield() queries the AioContext from the current
coroutine using qemu_coroutine_get_aio_context(). There is no need to
explicitly attach/detach AioContexts anymore and
qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are gone.
One coroutine can read from the QIOChannel while another coroutine writes from
a different AioContext.

This API change allows the nbd block driver to use QIOChannel from any thread.
It's important to keep in mind that the block driver already synchronizes
QIOChannel access and ensures that two coroutines never read simultaneously or
write simultaneously.

This patch updates all users of qio_channel_attach_aio_context() to the
new API. Most conversions are simple, but vhost-user-server requires a
new qemu_coroutine_yield() call to quiesce the vu_client_trip()
coroutine when not attached to any AioContext.

While the API has become simpler, there is one wart: QIOChannel has a
special case for the iohandler AioContext (used for handlers that must not run
in nested event loops). I didn't find an elegant way to preserve that behavior, so
I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, true|false)
for opting in to the new AioContext model. By default QIOChannel uses the
iohandler AioContext. Code that formerly called
qio_channel_attach_aio_context() now calls
qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
created.
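
The difference between the pinned-context model and the follow-the-coroutine model can be sketched with a small, QEMU-independent Python model. The names below ("Channel", "current_ctx") are illustrative stand-ins for QIOChannel, AioContext, and qemu_coroutine_get_aio_context(), not real QEMU APIs:

```python
# Model of the two AioContext-selection strategies described above.
current_ctx = None  # stands in for the running coroutine's AioContext

class Channel:
    def __init__(self):
        self.pinned_ctx = None             # old API: set by attach_aio_context()
        self.follow_coroutine_ctx = False  # new API: opt-in flag

    def attach(self, ctx):                 # old qio_channel_attach_aio_context()
        self.pinned_ctx = ctx

    def yield_ctx(self):
        """Context whose event loop would receive the fd handler on yield."""
        if self.follow_coroutine_ctx:
            return current_ctx             # queried per call, per coroutine
        return self.pinned_ctx or "iohandler"  # legacy default

ch = Channel()
ch.attach("ctx-A")
current_ctx = "ctx-B"
print(ch.yield_ctx())      # ctx-A: the old model stays pinned to attach time

ch.follow_coroutine_ctx = True
print(ch.yield_ctx())      # ctx-B: the new model follows the calling coroutine
```

In the old model a channel moved between threads only via the attach/detach dance; in the new model each qio_channel_yield() call simply uses whatever context the current coroutine runs in.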

Signed-off-by: Stefan Hajnoczi 
---
 include/io/channel.h |  34 +++--
 include/qemu/vhost-user-server.h |   1 +
 block/nbd.c  |  11 +--
 io/channel-command.c |  13 +++-
 io/channel-file.c|  18 -
 io/channel-null.c|   3 +-
 io/channel-socket.c  |  18 -
 io/channel-tls.c |   6 +-
 io/channel.c | 120 ++-
 migration/channel-block.c|   3 +-
 nbd/client.c |   2 +-
 nbd/server.c |  14 +---
 scsi/qemu-pr-helper.c|   4 +-
 util/vhost-user-server.c |  27 +--
 14 files changed, 191 insertions(+), 83 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 229bf36910..dfbe6f2931 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -81,9 +81,11 @@ struct QIOChannel {
 Object parent;
 unsigned int features; /* bitmask of QIOChannelFeatures */
 char *name;
-AioContext *ctx;
+AioContext *read_ctx;
 Coroutine *read_coroutine;
+AioContext *write_ctx;
 Coroutine *write_coroutine;
+bool follow_coroutine_ctx;
 #ifdef _WIN32
 HANDLE event; /* For use with GSource on Win32 */
 #endif
@@ -140,8 +142,9 @@ struct QIOChannelClass {
  int whence,
  Error **errp);
 void (*io_set_aio_fd_handler)(QIOChannel *ioc,
-  AioContext *ctx,
+  AioContext *read_ctx,
   IOHandler *io_read,
+  AioContext *write_ctx,
   IOHandler *io_write,
   void *opaque);
 int (*io_flush)(QIOChannel *ioc,
@@ -498,6 +501,21 @@ int qio_channel_set_blocking(QIOChannel *ioc,
  bool enabled,
  Error **errp);
 
+/**
+ * qio_channel_set_follow_coroutine_ctx:
+ * @ioc: the channel object
+ * @enabled: whether or not to follow the coroutine's AioContext
+ *
+ * If @enabled is true, calls to qio_channel_yield() use the current
+ * coroutine's AioContext. Usually this is desirable.
+ *
+ * If @enabled is false, calls to qio_channel_yield() use the global iohandler
AioContext. This may be used by coroutines that run in the main loop and
+ * do

[PATCH 0/2] io: follow coroutine AioContext in qio_channel_yield()

2023-08-23 Thread Stefan Hajnoczi
The ongoing QEMU multi-queue block layer effort makes it possible for multiple
threads to process I/O in parallel. The nbd block driver is not compatible with
the multi-queue block layer yet because QIOChannel cannot be used easily from
coroutines running in multiple threads. This series changes the QIOChannel API
to make that possible.

Stefan Hajnoczi (2):
  io: check there are no qio_channel_yield() coroutines during
->finalize()
  io: follow coroutine AioContext in qio_channel_yield()

 include/io/channel.h |  34 -
 include/qemu/vhost-user-server.h |   1 +
 block/nbd.c  |  11 +--
 io/channel-command.c |  13 +++-
 io/channel-file.c|  18 -
 io/channel-null.c|   3 +-
 io/channel-socket.c  |  18 -
 io/channel-tls.c |   6 +-
 io/channel.c | 124 ++-
 migration/channel-block.c|   3 +-
 nbd/client.c |   2 +-
 nbd/server.c |  14 +---
 scsi/qemu-pr-helper.c|   4 +-
 util/vhost-user-server.c |  27 +--
 14 files changed, 195 insertions(+), 83 deletions(-)

-- 
2.41.0




[PATCH 1/2] io: check there are no qio_channel_yield() coroutines during ->finalize()

2023-08-23 Thread Stefan Hajnoczi
Callers must clean up their coroutines before calling
object_unref(OBJECT(ioc)) to prevent an fd handler leak. Add an
assertion to check this.

This patch is preparation for the fd handler changes that follow.

Signed-off-by: Stefan Hajnoczi 
---
 io/channel.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/io/channel.c b/io/channel.c
index 72f0066af5..c415f3fc88 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -653,6 +653,10 @@ static void qio_channel_finalize(Object *obj)
 {
 QIOChannel *ioc = QIO_CHANNEL(obj);
 
+/* Must not have coroutines in qio_channel_yield() */
+assert(!ioc->read_coroutine);
+assert(!ioc->write_coroutine);
+
 g_free(ioc->name);
 
 #ifdef _WIN32
-- 
2.41.0




Re: [PULL 00/12] First batch of s390x patches for QEMU 8.2

2023-08-23 Thread Stefan Hajnoczi
On Wed, Aug 23, 2023 at 01:45:32PM +0200, Thomas Huth wrote:
> The following changes since commit b0dd9a7d6dd15a6898e9c585b521e6bec79b25aa:
> 
>   Open 8.2 development tree (2023-08-22 07:14:07 -0700)
> 
> are available in the Git repository at:
> 
>   https://gitlab.com/thuth/qemu.git tags/pull-request-2023-08-23
> 
> for you to fetch changes up to 6c49f685d30ffe81cfa47da2c258904ad28ac368:
> 
>   tests/tcg/s390x: Test VSTRS (2023-08-23 12:07:30 +0200)

Hi Thomas,
Please take a look at the following ubuntu-20.04-s390x-all CI failure:
https://gitlab.com/qemu-project/qemu/-/jobs/4931341536

Stefan


signature.asc
Description: PGP signature


Re: NVMe ZNS last zone size

2023-08-23 Thread Stefan Hajnoczi
On Wed, 23 Aug 2023 at 14:53, Klaus Jensen  wrote:
>
> On Aug 23 22:58, Sam Li wrote:
> > > Stefan Hajnoczi  wrote on Wed, Aug 23, 2023 at 22:41:
> > >
> > > On Wed, 23 Aug 2023 at 10:24, Sam Li  wrote:
> > > >
> > > > Hi Stefan,
> > > >
> > > > > Stefan Hajnoczi  wrote on Wed, Aug 23, 2023 at 21:26:
> > > > >
> > > > > Hi Sam and Klaus,
> > > > > Val is adding nvme-io_uring ZNS support to libblkio
> > > > > (https://gitlab.com/libblkio/libblkio/-/merge_requests/221) and asked
> > > > > how to test the size of the last zone when the namespace's total size
> > > > > is not a multiple of the zone size.
> > > >
> > > > I think a zone report operation can do the trick. Given zone configs,
> > > > the size of last zone should be [size - (nr_zones - 1) * zone_size].
> > > > Reporting last zone on such devices tells whether the value is
> > > > correct.
> > >
> > > In nvme_ns_zoned_check_calc_geometry() the number of zones is rounded 
> > > down:
> > >
> > >   ns->num_zones = le64_to_cpu(ns->id_ns.nsze) / ns->zone_size;
> > >
> > > Afterwards nsze is recalculated as follows:
> > >
> > >   ns->id_ns.nsze = cpu_to_le64(ns->num_zones * ns->zone_size);
> > >
> > > I interpret this to mean that when the namespace's total size is not a
> > > multiple of the zone size, then the last part will be ignored and not
> > > exposed as a zone.
> >
> > I see. Current ZNS emulation does not support this case.
> >
>
> NVMe Zoned Namespaces requires all zones to be the same size. The
> "trailing zone" is a thing in SMR HDDs.

Thanks for letting me know.

Stefan



Re: NVMe ZNS last zone size

2023-08-23 Thread Stefan Hajnoczi
On Wed, 23 Aug 2023 at 10:24, Sam Li  wrote:
>
> Hi Stefan,
>
> Stefan Hajnoczi  wrote on Wed, Aug 23, 2023 at 21:26:
> >
> > Hi Sam and Klaus,
> > Val is adding nvme-io_uring ZNS support to libblkio
> > (https://gitlab.com/libblkio/libblkio/-/merge_requests/221) and asked
> > how to test the size of the last zone when the namespace's total size
> > is not a multiple of the zone size.
>
> I think a zone report operation can do the trick. Given zone configs,
> the size of last zone should be [size - (nr_zones - 1) * zone_size].
> Reporting last zone on such devices tells whether the value is
> correct.

In nvme_ns_zoned_check_calc_geometry() the number of zones is rounded down:

  ns->num_zones = le64_to_cpu(ns->id_ns.nsze) / ns->zone_size;

Afterwards nsze is recalculated as follows:

  ns->id_ns.nsze = cpu_to_le64(ns->num_zones * ns->zone_size);

I interpret this to mean that when the namespace's total size is not a
multiple of the zone size, then the last part will be ignored and not
exposed as a zone.
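
A quick model of that rounding (the numbers are illustrative, not from any real device; nsze is in logical blocks as in Identify Namespace):

```python
# Mirrors the two lines from nvme_ns_zoned_check_calc_geometry() quoted above.
def calc_zoned_geometry(nsze, zone_size):
    num_zones = nsze // zone_size    # ns->num_zones: rounded down
    nsze = num_zones * zone_size     # ns->id_ns.nsze: recalculated
    return num_zones, nsze

num_zones, nsze = calc_zoned_geometry(nsze=1000, zone_size=64)
print(num_zones, nsze)   # 15 960 -- the trailing 40 blocks are not exposed
```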

>
> >
> > My understanding is that the zoned storage model allows the last zone
> > to be smaller than the zone size in this case. However, the NVMe ZNS
> > emulation code in QEMU makes all zones a multiple of the zone size. I
> > think QEMU cannot be used for this test case at the moment.
> >
> > Are there any plans to allow the last zone to have a different size?
> > Maybe Sam's qcow2 work will allow this?
>
> Yes, the zone report in qcow2 allows smaller last zone.
> Please let me know if there is any problem.

Great. Val can try your qcow2 patches and see if that allows her to
test last zone size != zone_size.

Thanks,
Stefan



NVMe ZNS last zone size

2023-08-23 Thread Stefan Hajnoczi
Hi Sam and Klaus,
Val is adding nvme-io_uring ZNS support to libblkio
(https://gitlab.com/libblkio/libblkio/-/merge_requests/221) and asked
how to test the size of the last zone when the namespace's total size
is not a multiple of the zone size.

My understanding is that the zoned storage model allows the last zone
to be smaller than the zone size in this case. However, the NVMe ZNS
emulation code in QEMU makes all zones a multiple of the zone size. I
think QEMU cannot be used for this test case at the moment.

Are there any plans to allow the last zone to have a different size?
Maybe Sam's qcow2 work will allow this?

Thanks,
Stefan



Re: [PULL 00/12] First batch of s390x patches for QEMU 8.2

2023-08-23 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PATCH v2] docs/about/license: Update LICENSE URL

2023-08-23 Thread Stefan Hajnoczi
On Tue, Aug 22, 2023 at 02:57:16PM +0200, Philippe Mathieu-Daudé wrote:
> In early 2021 (see commit 2ad784339e "docs: update README to use
> GitLab repo URLs") almost all of the code base was converted to
> point to GitLab instead of git.qemu.org. During 2023, git.qemu.org
> switched from a git mirror to a http redirect to GitLab (see [1]).
> 
> Update the LICENSE URL to match its previous content, displaying
> the file raw content similarly to gitweb 'blob_plain' format ([2]).
> 
> [1] 
> https://lore.kernel.org/qemu-devel/cabgobfzu3mfc8tm20k-yxdt7f-7ev-ukzn4skdarseu7dyo...@mail.gmail.com/
> [2] https://git-scm.com/docs/gitweb#Documentation/gitweb.txt-blobplain
> 
> Reviewed-by: Daniel P. Berrangé 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> v2: Drop '_type=heads' (danpb)
> ---
>  docs/about/license.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks, applied to my staging tree.

Stefan




Starting the QEMU 8.2 release cycle

2023-08-23 Thread Stefan Hajnoczi
Hi,
The QEMU 8.2 release cycle has begun. I am merging qemu.git pull
requests for this release. You do not need to CC me on pull requests
because I am monitoring qemu-devel.

The proposed schedule is as follows:

2023-08-22  Beginning of development phase
2023-11-14  Soft feature freeze. Only bug fixes after this point. All 
feature changes must already be in a submaintainer tree and all pull requests 
from submaintainers must have been sent to the list by this date.
2023-11-21  Hard feature freeze. Tag rc0
2023-11-28  Tag rc1
2023-12-05  Tag rc2
2023-12-12  Tag rc3
2023-12-19  Release; or tag rc4 if needed
2023-12-26  Release if we needed an rc4

Please let me know if these dates are inconvenient. I may adjust the
last date depending on my availability at the end of 2023.

Add the following milestone to issues that need to be tracked for this
release:
https://gitlab.com/qemu-project/qemu/-/milestones/10

You can update the changelog here:
https://wiki.qemu.org/ChangeLog/8.2

The release schedule is here:
https://wiki.qemu.org/Planning/8.2

Thanks,
Stefan




Re: [PATCH v2 4/4] iotests: test the zoned format feature for qcow2 file

2023-08-22 Thread Stefan Hajnoczi
On Mon, Aug 14, 2023 at 04:58:02PM +0800, Sam Li wrote:
> The zoned format feature can be tested by:
> $ tests/qemu-iotests/check zoned-qcow2
> 
> Signed-off-by: Sam Li 
> ---
>  tests/qemu-iotests/tests/zoned-qcow2 | 135 ++
>  tests/qemu-iotests/tests/zoned-qcow2.out | 140 +++
>  2 files changed, 275 insertions(+)
>  create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
>  create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH v2 3/4] qcow2: add zoned emulation capability

2023-08-22 Thread Stefan Hajnoczi
On Mon, Aug 14, 2023 at 04:58:01PM +0800, Sam Li wrote:
> By adding zone operations and zoned metadata, the zoned emulation
> capability enables full emulation support of zoned device using
> a qcow2 file. The zoned device metadata includes zone type,
> zoned device state and write pointer of each zone, which is stored
> to an array of unsigned integers.
> 
> Each zone of a zoned device makes state transitions following
> the zone state machine. The zone state machine mainly describes
> five states, IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
> READ ONLY and OFFLINE states will generally be affected by device
> internal events. The operations on zones cause corresponding state
> changing.
> 
> Zoned devices have a limit on zone resources, which puts constraints on
> write operations into zones.
> 
> Signed-off-by: Sam Li 
> ---
>  block/qcow2.c  | 676 -
>  block/qcow2.h  |   2 +
>  docs/interop/qcow2.txt |   2 +
>  3 files changed, 678 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index c1077c4a4a..5ccf79cbe7 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -194,6 +194,164 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char 
> *fmt, Error **errp)
>  return cryptoopts_qdict;
>  }
>  
> +#define QCOW2_ZT_IS_CONV(wp)(wp & 1ULL << 59)
> +
> +static inline int qcow2_get_wp(uint64_t wp)
> +{
> +/* clear state and type information */
> +return ((wp << 5) >> 5);
> +}
> +
> +static inline int qcow2_get_zs(uint64_t wp)
> +{
> +return (wp >> 60);
> +}
> +
> +static inline void qcow2_set_wp(uint64_t *wp, BlockZoneState zs)
> +{
> +uint64_t addr = qcow2_get_wp(*wp);
> +addr |= ((uint64_t)zs << 60);
> +*wp = addr;
> +}
> +
> +/*
> + * File wp tracking: reset zone, finish zone and append zone can
> + * change the value of write pointer. All zone operations will change
> + * the state of that/those zone.
> + * */
> +static inline void qcow2_wp_tracking_helper(int index, uint64_t wp) {
> +/* format: operations, the wp. */
> +printf("wps[%d]: 0x%x\n", index, qcow2_get_wp(wp)>>BDRV_SECTOR_BITS);
> +}
> +
> +/*
> + * Perform a state assignment and a flush operation that writes the new wp
> + * value to the dedicated location of the disk file.
> + */
> +static int qcow2_write_wp_at(BlockDriverState *bs, uint64_t *wp,
> + uint32_t index, BlockZoneState zs) {
> +BDRVQcow2State *s = bs->opaque;
> +int ret;
> +
> +qcow2_set_wp(wp, zs);
> +ret = bdrv_pwrite(bs->file, s->zoned_header.zonedmeta_offset
> ++ sizeof(uint64_t) * index, sizeof(uint64_t), wp, 0);
> +
> +if (ret < 0) {
> +goto exit;

Should *wp be restored to its original value to undo the effect of
qcow2_set_wp()?

> +}
> +qcow2_wp_tracking_helper(index, *wp);
> +return ret;
> +
> +exit:
> +error_report("Failed to write metadata with file");
> +return ret;
> +}
> +
> +static int qcow2_check_active(BlockDriverState *bs)

Please rename this to qcow2_check_active_zones() to avoid confusion with
other uses of "active" in qcow2.

> +{
> +BDRVQcow2State *s = bs->opaque;
> +
> +if (!s->zoned_header.max_active_zones) {
> +return 0;
> +}
> +
> +if (s->nr_zones_exp_open + s->nr_zones_imp_open + s->nr_zones_closed
> +< s->zoned_header.max_active_zones) {
> +return 0;
> +}
> +
> +return -1;
> +}

(This function could return a bool instead of 0/-1 since it doesn't
really need an int.)

> +
> +static int qcow2_check_open(BlockDriverState *bs)

qcow2_check_open_zones() or, even better, qcow2_can_open_zone().

> +{
> +BDRVQcow2State *s = bs->opaque;
> +int ret;
> +
> +if (!s->zoned_header.max_open_zones) {
> +return 0;
> +}
> +
> +if (s->nr_zones_exp_open + s->nr_zones_imp_open
> +< s->zoned_header.max_open_zones) {
> +return 0;
> +}
> +
> +if(s->nr_zones_imp_open) {
> +ret = qcow2_check_active(bs);
> +if (ret == 0) {
> +/* TODO: it takes O(n) time complexity (n = nr_zones).
> + * Optimizations required. */

One solution is to keep an implicitly open list. Then this operation is
O(1).
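
That suggestion could look like the following sketch (not part of the patch; zone indices are illustrative):

```python
from collections import deque

# Keep the implicitly open zones in a FIFO so that closing one to free up a
# zone resource is O(1) instead of an O(nr_zones) scan over the wp array.
imp_open = deque()

def mark_implicitly_open(zone_idx):
    imp_open.append(zone_idx)

def close_oldest_implicit():
    """Return the zone index to close, or None if none are implicitly open."""
    return imp_open.popleft() if imp_open else None

mark_implicitly_open(3)
mark_implicitly_open(7)
print(close_oldest_implicit())   # 3
```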

> +/* close one implicitly open zones to make it available */
> +for (int i = s->zoned_header.zone_nr_conv;
> +i < bs->bl.nr_zones; ++i) {
> +uint64_t *wp = &s->wps->wp[i];
> +if (qcow2_get_zs(*wp) == BLK_ZS_IOPEN) {
> +ret = qcow2_write_wp_at(bs, wp, i, BLK_ZS_CLOSED);

I'm wondering if it's correct to store the zone state persistently in
the qcow2 file. If the guest or QEMU crashes, then zones will be left in
states like EOPEN. Since the guest software will have forgotten about
explicitly opened zones, the guest would need to recover zone states.
I'm not sure if existing software is designed to do that.

Damien: Should the zone state be persistent?

> +

Re: [PATCH 21/21] block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK

2023-08-22 Thread Stefan Hajnoczi
On Thu, Aug 17, 2023 at 02:50:20PM +0200, Kevin Wolf wrote:
> The functions read the parents list in the generic block layer, so we
> need to hold the graph lock already there. The BlockDriver
> implementations actually modify the graph, so it has to be a writer
> lock.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block-global-state.h |  8 +---
>  include/block/block_int-common.h   |  9 +
>  block/quorum.c | 23 ++-
>  blockdev.c | 17 +++--
>  4 files changed, 27 insertions(+), 30 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 20/21] block: Mark bdrv_unref_child() GRAPH_WRLOCK

2023-08-22 Thread Stefan Hajnoczi
On Thu, Aug 17, 2023 at 02:50:19PM +0200, Kevin Wolf wrote:
> Instead of taking the writer lock internally, require callers to already
> hold it when calling bdrv_unref_child(). These callers will typically
> already hold the graph lock once the locking work is completed, which
> means that they can't call functions that take it internally.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block-global-state.h |  7 ++-
>  block.c| 11 +++
>  block/blklogwrites.c   |  4 
>  block/blkverify.c  |  2 ++
>  block/qcow2.c  |  4 +++-
>  block/quorum.c |  6 ++
>  block/replication.c|  3 +++
>  block/snapshot.c   |  2 ++
>  block/vmdk.c   | 11 +++
>  tests/unit/test-bdrv-drain.c   |  8 ++--
>  10 files changed, 50 insertions(+), 8 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 19/21] block: Mark bdrv_root_unref_child() GRAPH_WRLOCK

2023-08-22 Thread Stefan Hajnoczi
On Thu, Aug 17, 2023 at 02:50:18PM +0200, Kevin Wolf wrote:
> Instead of taking the writer lock internally, require callers to already
> hold it when calling bdrv_root_unref_child(). These callers will
> typically already hold the graph lock once the locking work is
> completed, which means that they can't call functions that take it
> internally.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block_int-global-state.h | 2 +-
>  block.c| 6 +++---
>  block/block-backend.c  | 3 +++
>  blockjob.c | 2 ++
>  4 files changed, 9 insertions(+), 4 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 17/21] block: Take graph rdlock in bdrv_drop_intermediate()

2023-08-22 Thread Stefan Hajnoczi
On Thu, Aug 17, 2023 at 02:50:16PM +0200, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 7df8780d6e..a82389f742 100644
> --- a/block.c
> +++ b/block.c
> @@ -5934,9 +5934,11 @@ int bdrv_drop_intermediate(BlockDriverState *top, 
> BlockDriverState *base,
>  backing_file_str = base->filename;
>  }
>  
> +bdrv_graph_rdlock_main_loop();
>  QLIST_FOREACH(c, >parents, next_parent) {
>  updated_children = g_slist_prepend(updated_children, c);
>  }
> +bdrv_graph_rdunlock_main_loop();

This is GLOBAL_STATE_CODE, so why take the read lock? I thought nothing
can modify the graph here. If it could, then stashing the parents in the
updated_children probably wouldn't be safe anyway.

>  
>  /*
>   * It seems correct to pass detach_subchain=true here, but it triggers
> -- 
> 2.41.0
> 




Re: [PATCH 16/21] block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK

2023-08-22 Thread Stefan Hajnoczi
On Thu, Aug 17, 2023 at 02:50:15PM +0200, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 15/21] block: Mark bdrv_child_perm() GRAPH_RDLOCK

2023-08-22 Thread Stefan Hajnoczi
On Thu, Aug 17, 2023 at 02:50:14PM +0200, Kevin Wolf wrote:
> This adds GRAPH_RDLOCK annotations to declare that callers of
> bdrv_child_perm() need to hold a reader lock for the graph because
> some implementations access the children list of a node.
> 
> The callers of bdrv_child_perm() conveniently already hold the lock.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block_int-common.h | 10 +-
>  block.c  | 11 ++-
>  block/copy-before-write.c| 10 +-
>  3 files changed, 16 insertions(+), 15 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 14/21] block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK

2023-08-22 Thread Stefan Hajnoczi
On Thu, Aug 17, 2023 at 02:50:13PM +0200, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> This happens to result in BlockDriver.bdrv_set_perm() to be called with
> the graph lock held. For consistency, make it the same for all of the
> BlockDriver callbacks for updating permissions and annotate the function
> pointers with GRAPH_RDLOCK_PTR.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block_int-common.h   |  9 ---
>  include/block/block_int-global-state.h |  4 +--
>  block.c| 35 --
>  blockdev.c |  6 +
>  4 files changed, 40 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi 



