[PATCH v2 2/3] libvduse: Replace strcpy() with strncpy()

2022-07-06 Thread Xie Yongji
Coverity reported a string overflow issue since we copied
"name" to "dev_config->name" without checking the length.
This should be a false positive since we already checked
the length of "name" in vduse_name_is_invalid(). But anyway,
let's replace strcpy() with strncpy() to fix the Coverity
complaint. As a general-purpose library, we'd like to minimize
dependencies on other libraries, so we don't use g_strlcpy() here.

Fixes: Coverity CID 1490224
Signed-off-by: Xie Yongji 
Reviewed-by: Markus Armbruster 
---
 subprojects/libvduse/libvduse.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index 6374933881..1e36227388 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -1309,7 +1309,8 @@ VduseDev *vduse_dev_create(const char *name, uint32_t 
device_id,
 goto err_dev;
 }
 
-strcpy(dev_config->name, name);
+strncpy(dev_config->name, name, VDUSE_NAME_MAX);
+dev_config->name[VDUSE_NAME_MAX - 1] = '\0';
 dev_config->device_id = device_id;
 dev_config->vendor_id = vendor_id;
 dev_config->features = features;
-- 
2.20.1
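
For readers following the thread, the bounded-copy idiom from this patch can be sketched in isolation. This is only an illustration: EXAMPLE_NAME_MAX and example_copy_name() are made-up names, and the small buffer size is chosen for demonstration rather than taken from the VDUSE headers.

```c
#include <string.h>

/* Hypothetical stand-in for VDUSE_NAME_MAX, kept small for illustration. */
#define EXAMPLE_NAME_MAX 8

/*
 * Copy src into a fixed-size name buffer. strncpy() does not write a
 * terminating NUL when src has EXAMPLE_NAME_MAX or more characters, so the
 * last byte is set explicitly, mirroring the two lines added by the patch.
 */
static void example_copy_name(char dst[EXAMPLE_NAME_MAX], const char *src)
{
    strncpy(dst, src, EXAMPLE_NAME_MAX);
    dst[EXAMPLE_NAME_MAX - 1] = '\0';
}
```

With the explicit terminator, an over-long source is silently truncated instead of leaving the buffer unterminated. In libvduse itself the earlier length check should already reject such names.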




[PATCH v2 0/3] Fix some coverity issues on VDUSE

2022-07-06 Thread Xie Yongji
This series fixes some issues reported by Coverity.

Patch 1 fixes an incorrect function name.

Patch 2 fixes Coverity CID 1490224.

Patch 3 fixes Coverity CID 1490226, 1490223.

V1 to V2:
- Drop the patch to fix Coverity CID 1490222, 1490227 [Markus]
- Add some commit log to explain why we don't use g_strlcpy() [Markus]

Xie Yongji (3):
  libvduse: Fix the incorrect function name
  libvduse: Replace strcpy() with strncpy()
  libvduse: Pass positive value to strerror()

 subprojects/libvduse/libvduse.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

-- 
2.20.1




[PATCH v2 3/3] libvduse: Pass positive value to strerror()

2022-07-06 Thread Xie Yongji
The value passed to strerror() should be positive.
So let's fix it.

Fixes: Coverity CID 1490226, 1490223
Signed-off-by: Xie Yongji 
Reviewed-by: Richard Henderson 
Reviewed-by: Markus Armbruster 
---
 subprojects/libvduse/libvduse.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index 1e36227388..1a5981445c 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -1257,7 +1257,7 @@ VduseDev *vduse_dev_create_by_name(const char *name, 
uint16_t num_queues,
 ret = vduse_dev_init(dev, name, num_queues, ops, priv);
 if (ret < 0) {
 fprintf(stderr, "Failed to init vduse device %s: %s\n",
-name, strerror(ret));
+name, strerror(-ret));
 free(dev);
 return NULL;
 }
@@ -1331,7 +1331,7 @@ VduseDev *vduse_dev_create(const char *name, uint32_t 
device_id,
 ret = vduse_dev_init(dev, name, num_queues, ops, priv);
 if (ret < 0) {
 fprintf(stderr, "Failed to init vduse device %s: %s\n",
-name, strerror(ret));
+name, strerror(-ret));
 goto err;
 }
 
-- 
2.20.1
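
A minimal sketch of why the sign matters, assuming the usual "return -errno on failure" convention. example_init() and example_report() are hypothetical helpers, not libvduse functions.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical initializer that reports failure as a negative errno value. */
static int example_init(int should_fail)
{
    return should_fail ? -EINVAL : 0;
}

/*
 * Format a failure message into buf. ret is negative here, so it is negated
 * to recover the positive errno value that strerror() expects.
 */
static int example_report(char *buf, size_t len, int ret)
{
    return snprintf(buf, len, "Failed to init: %s", strerror(-ret));
}
```

Passing the negative value straight to strerror() typically yields an "Unknown error" string, which hides the real cause from the user.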




[PATCH v2 1/3] libvduse: Fix the incorrect function name

2022-07-06 Thread Xie Yongji
In vduse_name_is_valid(), we actually check whether
the name is invalid or not. So let's change the
function name to vduse_name_is_invalid() to match
the behavior.

Signed-off-by: Xie Yongji 
Reviewed-by: Markus Armbruster 
---
 subprojects/libvduse/libvduse.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index 9a2bcec282..6374933881 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -1193,7 +1193,7 @@ static int vduse_dev_init(VduseDev *dev, const char *name,
 return 0;
 }
 
-static inline bool vduse_name_is_valid(const char *name)
+static inline bool vduse_name_is_invalid(const char *name)
 {
 return strlen(name) >= VDUSE_NAME_MAX || strstr(name, "..");
 }
@@ -1242,7 +1242,7 @@ VduseDev *vduse_dev_create_by_name(const char *name, 
uint16_t num_queues,
 VduseDev *dev;
 int ret;
 
-if (!name || vduse_name_is_valid(name) || !ops ||
+if (!name || vduse_name_is_invalid(name) || !ops ||
 !ops->enable_queue || !ops->disable_queue) {
 fprintf(stderr, "Invalid parameter for vduse\n");
 return NULL;
@@ -1276,7 +1276,7 @@ VduseDev *vduse_dev_create(const char *name, uint32_t 
device_id,
 struct vduse_dev_config *dev_config;
 size_t size = offsetof(struct vduse_dev_config, config);
 
-if (!name || vduse_name_is_valid(name) ||
+if (!name || vduse_name_is_invalid(name) ||
 !has_feature(features,  VIRTIO_F_VERSION_1) || !config ||
 !config_size || !ops || !ops->enable_queue || !ops->disable_queue) {
 fprintf(stderr, "Invalid parameter for vduse\n");
-- 
2.20.1
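
To show how the renamed predicate reads at a call site, here is a standalone sketch. EXAMPLE_NAME_MAX and example_name_is_invalid() are illustrative names, and the limit is an assumption chosen for the example.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for VDUSE_NAME_MAX, kept small for illustration. */
#define EXAMPLE_NAME_MAX 8

/*
 * Returns true when the name must be rejected: it would not fit in the
 * buffer together with its terminating NUL, or it contains "..".
 */
static bool example_name_is_invalid(const char *name)
{
    return strlen(name) >= EXAMPLE_NAME_MAX || strstr(name, "..") != NULL;
}
```

A caller can then write `if (!name || example_name_is_invalid(name))`, which matches what the check actually does.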




[PATCH 3/4] libvduse: Pass positive value to strerror()

2022-06-27 Thread Xie Yongji
The value passed to strerror() should be positive.
So let's fix it.

Fixes: Coverity CID 1490226, 1490223
Signed-off-by: Xie Yongji 
---
 subprojects/libvduse/libvduse.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index 1e36227388..1a5981445c 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -1257,7 +1257,7 @@ VduseDev *vduse_dev_create_by_name(const char *name, 
uint16_t num_queues,
 ret = vduse_dev_init(dev, name, num_queues, ops, priv);
 if (ret < 0) {
 fprintf(stderr, "Failed to init vduse device %s: %s\n",
-name, strerror(ret));
+name, strerror(-ret));
 free(dev);
 return NULL;
 }
@@ -1331,7 +1331,7 @@ VduseDev *vduse_dev_create(const char *name, uint32_t 
device_id,
 ret = vduse_dev_init(dev, name, num_queues, ops, priv);
 if (ret < 0) {
 fprintf(stderr, "Failed to init vduse device %s: %s\n",
-name, strerror(ret));
+name, strerror(-ret));
 goto err;
 }
 
-- 
2.20.1




[PATCH 4/4] libvduse: Check the return value of some ioctls

2022-06-27 Thread Xie Yongji
Coverity pointed out (CID 1490222, 1490227) that we called
ioctl somewhere without checking the return value. This
patch fixes these issues.

Fixes: Coverity CID 1490222, 1490227
Signed-off-by: Xie Yongji 
---
 subprojects/libvduse/libvduse.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index 1a5981445c..bf7302c60a 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -947,7 +947,10 @@ static void vduse_queue_disable(VduseVirtq *vq)
 
 eventfd.index = vq->index;
 eventfd.fd = VDUSE_EVENTFD_DEASSIGN;
-ioctl(dev->fd, VDUSE_VQ_SETUP_KICKFD, &eventfd);
+if (ioctl(dev->fd, VDUSE_VQ_SETUP_KICKFD, &eventfd)) {
+fprintf(stderr, "Failed to disable eventfd for vq[%d]: %s\n",
+vq->index, strerror(errno));
+}
 close(vq->fd);
 
 assert(vq->inuse == 0);
@@ -1337,7 +1340,10 @@ VduseDev *vduse_dev_create(const char *name, uint32_t 
device_id,
 
 return dev;
 err:
-ioctl(ctrl_fd, VDUSE_DESTROY_DEV, name);
+if (ioctl(ctrl_fd, VDUSE_DESTROY_DEV, name)) {
+fprintf(stderr, "Failed to destroy vduse device %s: %s\n",
+name, strerror(errno));
+}
 err_dev:
 close(ctrl_fd);
 err_ctrl:
-- 
2.20.1
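
The pattern in this patch can be sketched as a small wrapper. example_checked_ioctl() is a hypothetical helper and not part of libvduse; it only illustrates checking the return value and saving errno before anything else can overwrite it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Issue an ioctl and report failure instead of ignoring it.
 * Returns 0 on success or a negative errno value on failure.
 */
static int example_checked_ioctl(int fd, unsigned long request, void *arg)
{
    if (ioctl(fd, request, arg) < 0) {
        int err = errno; /* save before fprintf() can change it */

        fprintf(stderr, "ioctl failed: %s\n", strerror(err));
        return -err;
    }
    return 0;
}
```

Whether a failure on a cleanup path should only be logged or also propagated is a design choice for the caller. The patch above only logs it.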




[PATCH 0/4] Fix some coverity issues on VDUSE

2022-06-27 Thread Xie Yongji
This series fixes some issues reported by Coverity.

Patch 1 fixes an incorrect function name.

Patch 2 fixes Coverity CID 1490224.

Patch 3 fixes Coverity CID 1490226, 1490223.

Patch 4 fixes Coverity CID 1490222, 1490227.

Xie Yongji (4):
  libvduse: Fix the incorrect function name
  libvduse: Replace strcpy() with strncpy()
  libvduse: Pass positive value to strerror()
  libvduse: Check the return value of some ioctls

 subprojects/libvduse/libvduse.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

-- 
2.20.1




[PATCH 1/4] libvduse: Fix the incorrect function name

2022-06-27 Thread Xie Yongji
In vduse_name_is_valid(), we actually check whether
the name is invalid or not. So let's change the
function name to vduse_name_is_invalid() to match
the behavior.

Signed-off-by: Xie Yongji 
---
 subprojects/libvduse/libvduse.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index 9a2bcec282..6374933881 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -1193,7 +1193,7 @@ static int vduse_dev_init(VduseDev *dev, const char *name,
 return 0;
 }
 
-static inline bool vduse_name_is_valid(const char *name)
+static inline bool vduse_name_is_invalid(const char *name)
 {
 return strlen(name) >= VDUSE_NAME_MAX || strstr(name, "..");
 }
@@ -1242,7 +1242,7 @@ VduseDev *vduse_dev_create_by_name(const char *name, 
uint16_t num_queues,
 VduseDev *dev;
 int ret;
 
-if (!name || vduse_name_is_valid(name) || !ops ||
+if (!name || vduse_name_is_invalid(name) || !ops ||
 !ops->enable_queue || !ops->disable_queue) {
 fprintf(stderr, "Invalid parameter for vduse\n");
 return NULL;
@@ -1276,7 +1276,7 @@ VduseDev *vduse_dev_create(const char *name, uint32_t 
device_id,
 struct vduse_dev_config *dev_config;
 size_t size = offsetof(struct vduse_dev_config, config);
 
-if (!name || vduse_name_is_valid(name) ||
+if (!name || vduse_name_is_invalid(name) ||
 !has_feature(features,  VIRTIO_F_VERSION_1) || !config ||
 !config_size || !ops || !ops->enable_queue || !ops->disable_queue) {
 fprintf(stderr, "Invalid parameter for vduse\n");
-- 
2.20.1




[PATCH 2/4] libvduse: Replace strcpy() with strncpy()

2022-06-27 Thread Xie Yongji
Coverity reported a string overflow issue since we copied
"name" to "dev_config->name" without checking the length.
This should be a false positive since we already checked
the length of "name" in vduse_name_is_invalid(). But anyway,
let's replace strcpy() with strncpy() to fix the Coverity
complaint.

Fixes: Coverity CID 1490224
Signed-off-by: Xie Yongji 
---
 subprojects/libvduse/libvduse.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index 6374933881..1e36227388 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -1309,7 +1309,8 @@ VduseDev *vduse_dev_create(const char *name, uint32_t 
device_id,
 goto err_dev;
 }
 
-strcpy(dev_config->name, name);
+strncpy(dev_config->name, name, VDUSE_NAME_MAX);
+dev_config->name[VDUSE_NAME_MAX - 1] = '\0';
 dev_config->device_id = device_id;
 dev_config->vendor_id = vendor_id;
 dev_config->features = features;
-- 
2.20.1




[PATCH v2 6/6] vduse-blk: Add name option

2022-06-13 Thread Xie Yongji
Currently we use the 'id' option as the name of the VDUSE device.
It's a bit confusing since we use one value for two different
purposes: the ID to identify the export within QEMU (must be
distinct from any other exports in the same QEMU process, but
can overlap with names used by other processes), and the VDUSE
name to uniquely identify it on the host (must be distinct from
other VDUSE devices on the same host, but can overlap with other
export types like NBD in the same process). To make it clear,
this patch adds a separate 'name' option to specify the VDUSE
name for the vduse-blk export instead.

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 4 ++--
 docs/tools/qemu-storage-daemon.rst   | 5 +++--
 qapi/block-export.json   | 7 ---
 storage-daemon/qemu-storage-daemon.c | 8 
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 066e088b00..f101c24c3f 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -300,7 +300,7 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 features |= 1ULL << VIRTIO_BLK_F_RO;
 }
 
-vblk_exp->dev = vduse_dev_create(exp->id, VIRTIO_ID_BLOCK, 0,
+vblk_exp->dev = vduse_dev_create(vblk_opts->name, VIRTIO_ID_BLOCK, 0,
  features, num_queues,
  sizeof(struct virtio_blk_config),
  (char *)&config, &vduse_blk_ops,
@@ -312,7 +312,7 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 }
 
 vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
-   g_get_tmp_dir(), exp->id);
+   g_get_tmp_dir(), vblk_opts->name);
 if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
 error_setg(errp, "failed to set reconnect log file");
 ret = -EINVAL;
diff --git a/docs/tools/qemu-storage-daemon.rst 
b/docs/tools/qemu-storage-daemon.rst
index 034f2809a6..ea00149a63 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -77,7 +77,7 @@ Standard options:
   --export 
[type=]vhost-user-blk,id=,node-name=,addr.type=unix,addr.path=[,writable=on|off][,logical-block-size=][,num-queues=]
   --export 
[type=]vhost-user-blk,id=,node-name=,addr.type=fd,addr.str=[,writable=on|off][,logical-block-size=][,num-queues=]
   --export 
[type=]fuse,id=,node-name=,mountpoint=[,growable=on|off][,writable=on|off][,allow-other=on|off|auto]
-  --export 
[type=]vduse-blk,id=,node-name=[,writable=on|off][,num-queues=][,queue-size=][,logical-block-size=][,serial=]
+  --export 
[type=]vduse-blk,id=,node-name=,name=[,writable=on|off][,num-queues=][,queue-size=][,logical-block-size=][,serial=]
 
   is a block export definition. ``node-name`` is the block node that should be
   exported. ``writable`` determines whether or not the export allows write
@@ -111,7 +111,8 @@ Standard options:
   ``allow-other`` to auto (the default) will try enabling this option, and on
   error fall back to disabling it.
 
-  The ``vduse-blk`` export type uses the ``id`` as the VDUSE device name.
+  The ``vduse-blk`` export type takes a ``name`` (must be unique across the 
host)
+  to create the VDUSE device.
   ``num-queues`` sets the number of virtqueues (the default is 1).
   ``queue-size`` sets the virtqueue descriptor table size (the default is 256).
 
diff --git a/qapi/block-export.json b/qapi/block-export.json
index d7aeb1fbf7..81ef1e3dcd 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -182,6 +182,7 @@
 #
 # A vduse-blk block export.
 #
+# @name: the name of VDUSE device (must be unique across the host).
 # @num-queues: the number of virtqueues. Defaults to 1.
 # @queue-size: the size of virtqueue. Defaults to 256.
 # @logical-block-size: Logical block size in bytes. Range [512, PAGE_SIZE]
@@ -191,7 +192,8 @@
 # Since: 7.1
 ##
 { 'struct': 'BlockExportOptionsVduseBlk',
-  'data': { '*num-queues': 'uint16',
+  'data': { 'name': 'str',
+'*num-queues': 'uint16',
 '*queue-size': 'uint16',
 '*logical-block-size': 'size',
 '*serial': 'str' } }
@@ -316,8 +318,7 @@
 # Describes a block export, i.e. how single node should be exported on an
 # external interface.
 #
-# @id: A unique identifier for the block export (across the host for vduse-blk
-#  export type or across all export types for other types)
+# @id: A unique identifier for the block export (across all export types)
 #
 # @node-name: The node name of the block node to be exported (since: 5.2)
 #
diff --git a/stora

[PATCH v2 4/6] vduse-blk: Don't delete the export until all inflight I/Os completed

2022-06-13 Thread Xie Yongji
Don't delete the export until all inflight I/Os have completed.
Otherwise, it might lead to a use-after-free.

Fixes: cc241b5505b2 ("vduse-blk: Implement vduse-blk export")
Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index c3a89894ae..251d73c841 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -31,6 +31,7 @@ typedef struct VduseBlkExport {
 VduseDev *dev;
 uint16_t num_queues;
 char *recon_file;
+unsigned int inflight;
 } VduseBlkExport;
 
 typedef struct VduseBlkReq {
@@ -38,6 +39,18 @@ typedef struct VduseBlkReq {
 VduseVirtq *vq;
 } VduseBlkReq;
 
+static void vduse_blk_inflight_inc(VduseBlkExport *vblk_exp)
+{
+vblk_exp->inflight++;
+}
+
+static void vduse_blk_inflight_dec(VduseBlkExport *vblk_exp)
+{
+if (--vblk_exp->inflight == 0) {
+aio_wait_kick();
+}
+}
+
 static void vduse_blk_req_complete(VduseBlkReq *req, size_t in_len)
 {
 vduse_queue_push(req->vq, &req->elem, in_len);
@@ -68,10 +81,13 @@ static void coroutine_fn vduse_blk_virtio_process_req(void 
*opaque)
 }
 
 vduse_blk_req_complete(req, in_len);
+vduse_blk_inflight_dec(vblk_exp);
 }
 
 static void vduse_blk_vq_handler(VduseDev *dev, VduseVirtq *vq)
 {
+VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+
 while (1) {
 VduseBlkReq *req;
 
@@ -83,6 +99,8 @@ static void vduse_blk_vq_handler(VduseDev *dev, VduseVirtq 
*vq)
 
 Coroutine *co =
 qemu_coroutine_create(vduse_blk_virtio_process_req, req);
+
+vduse_blk_inflight_inc(vblk_exp);
 qemu_coroutine_enter(co);
 }
 }
@@ -168,6 +186,8 @@ static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp)
 }
 aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
true, NULL, NULL, NULL, NULL, NULL);
+
+AIO_WAIT_WHILE(vblk_exp->export.ctx, vblk_exp->inflight > 0);
 }
 
 
@@ -332,7 +352,9 @@ static void vduse_blk_exp_request_shutdown(BlockExport *exp)
 {
 VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
 
+aio_context_acquire(vblk_exp->export.ctx);
 vduse_blk_detach_ctx(vblk_exp);
+aio_context_release(vblk_exp->export.ctx);
 }
 
 const BlockExportDriver blk_exp_vduse_blk = {
-- 
2.20.1




[PATCH v2 0/6] Some fixes and improvements for vduse-blk

2022-06-13 Thread Xie Yongji
This series includes a few fixes and improvements for the
vduse-blk export.

Patch 1 fixes some compile errors with clang on 32-bit machines.

Patch 2 fixes a resource leak when the vduse fd is zero.

Patches 3 and 4 fix two bugs which could be triggered
by force deleting a vduse-blk export under high I/O load.

Patches 5 and 6 add two new options for the vduse-blk export.

V1 to V2:
- Add a patch to fix some compile errors with clang

Xie Yongji (6):
  libvduse: Fix some compile errors with clang
  libvduse: Fix resources leak in vduse_dev_destroy()
  vduse-blk: Don't unlink the reconnect file if device exists
  vduse-blk: Don't delete the export until all inflight I/Os completed
  vduse-blk: Add serial option
  vduse-blk: Add name option

 block/export/vduse-blk.c | 53 ++--
 block/export/vhost-user-blk-server.c |  4 ++-
 block/export/virtio-blk-handler.h|  2 +-
 docs/tools/qemu-storage-daemon.rst   |  5 +--
 qapi/block-export.json   | 11 +++---
 storage-daemon/qemu-storage-daemon.c |  9 ++---
 subprojects/libvduse/libvduse.c  | 27 +++---
 7 files changed, 67 insertions(+), 44 deletions(-)

-- 
2.20.1




[PATCH v2 5/6] vduse-blk: Add serial option

2022-06-13 Thread Xie Yongji
Add a 'serial' option to allow the user to specify this value
explicitly. The default value is changed to an empty
string, as is done in "hw/block/virtio-blk.c".

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 20 ++--
 block/export/vhost-user-blk-server.c |  4 +++-
 block/export/virtio-blk-handler.h|  2 +-
 docs/tools/qemu-storage-daemon.rst   |  2 +-
 qapi/block-export.json   |  4 +++-
 storage-daemon/qemu-storage-daemon.c |  1 +
 6 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 251d73c841..066e088b00 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -235,7 +235,7 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 Error *local_err = NULL;
 struct virtio_blk_config config = { 0 };
 uint64_t features;
-int i;
+int i, ret;
 
 if (vblk_opts->has_num_queues) {
 num_queues = vblk_opts->num_queues;
@@ -265,7 +265,8 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 }
 vblk_exp->num_queues = num_queues;
 vblk_exp->handler.blk = exp->blk;
-vblk_exp->handler.serial = exp->id;
+vblk_exp->handler.serial = g_strdup(vblk_opts->has_serial ?
+vblk_opts->serial : "");
 vblk_exp->handler.logical_block_size = logical_block_size;
 vblk_exp->handler.writable = opts->writable;
 
@@ -306,16 +307,16 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
  vblk_exp);
 if (!vblk_exp->dev) {
 error_setg(errp, "failed to create vduse device");
-return -ENOMEM;
+ret = -ENOMEM;
+goto err_dev;
 }
 
 vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
g_get_tmp_dir(), exp->id);
 if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
 error_setg(errp, "failed to set reconnect log file");
-vduse_dev_destroy(vblk_exp->dev);
-g_free(vblk_exp->recon_file);
-return -EINVAL;
+ret = -EINVAL;
+goto err;
 }
 
 for (i = 0; i < num_queues; i++) {
@@ -331,6 +332,12 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
 
 return 0;
+err:
+vduse_dev_destroy(vblk_exp->dev);
+g_free(vblk_exp->recon_file);
+err_dev:
+g_free(vblk_exp->handler.serial);
+return ret;
 }
 
 static void vduse_blk_exp_delete(BlockExport *exp)
@@ -346,6 +353,7 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 unlink(vblk_exp->recon_file);
 }
 g_free(vblk_exp->recon_file);
+g_free(vblk_exp->handler.serial);
 }
 
 static void vduse_blk_exp_request_shutdown(BlockExport *exp)
diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index c9c290cc4c..3409d9e02e 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -282,7 +282,7 @@ static int vu_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 return -EINVAL;
 }
 vexp->handler.blk = exp->blk;
-vexp->handler.serial = "vhost_user_blk";
+vexp->handler.serial = g_strdup("vhost_user_blk");
 vexp->handler.logical_block_size = logical_block_size;
 vexp->handler.writable = opts->writable;
 
@@ -296,6 +296,7 @@ static int vu_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
  num_queues, &vu_blk_iface, errp)) {
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
 blk_aio_detach, vexp);
+g_free(vexp->handler.serial);
 return -EADDRNOTAVAIL;
 }
 
@@ -308,6 +309,7 @@ static void vu_blk_exp_delete(BlockExport *exp)
 
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
 vexp);
+g_free(vexp->handler.serial);
 }
 
 const BlockExportDriver blk_exp_vhost_user_blk = {
diff --git a/block/export/virtio-blk-handler.h 
b/block/export/virtio-blk-handler.h
index 1c7a5e32ad..150d44cff2 100644
--- a/block/export/virtio-blk-handler.h
+++ b/block/export/virtio-blk-handler.h
@@ -23,7 +23,7 @@
 
 typedef struct {
 BlockBackend *blk;
-const char *serial;
+char *serial;
 uint32_t logical_block_size;
 bool writable;
 } VirtioBlkHandler;
diff --git a/docs/tools/qemu-storage-daemon.rst 
b/docs/tools/qemu-storage-daemon.rst
index fbeaf76954..034f2809a6 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -77,7 +77,7 @@ Standard options:
   --expor

[PATCH v2 3/6] vduse-blk: Don't unlink the reconnect file if device exists

2022-06-13 Thread Xie Yongji
We should not unlink the reconnect file if vduse_dev_destroy()
fails with -EBUSY, which means the VDUSE device has not been
removed from the vDPA bus. Otherwise, a later reconnection
might fail.

Fixes: 730abef0e873 ("libvduse: Add support for reconnecting")
Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 3b10349173..c3a89894ae 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -316,12 +316,15 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 static void vduse_blk_exp_delete(BlockExport *exp)
 {
 VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+int ret;
 
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
 vblk_exp);
 blk_set_dev_ops(exp->blk, NULL, NULL);
-vduse_dev_destroy(vblk_exp->dev);
-unlink(vblk_exp->recon_file);
+ret = vduse_dev_destroy(vblk_exp->dev);
+if (ret != -EBUSY) {
+unlink(vblk_exp->recon_file);
+}
 g_free(vblk_exp->recon_file);
 }
 
-- 
2.20.1
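
The conditional cleanup above can be sketched on its own. example_cleanup_log() and its destroy_ret argument are hypothetical; destroy_ret stands for the result of a destroy call that follows the -errno convention.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Remove the reconnect log unless the device is still busy, in which case
 * the file is kept so that a later reconnection can use it.
 * Returns 1 if removal was attempted, 0 if the file was kept.
 */
static int example_cleanup_log(int destroy_ret, const char *path)
{
    if (destroy_ret != -EBUSY) {
        unlink(path);
        return 1;
    }
    return 0;
}
```

Only -EBUSY keeps the file. Any other result, including other errors, still removes it, which matches the condition used in the patch.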




[PATCH v2 2/6] libvduse: Fix resources leak in vduse_dev_destroy()

2022-06-13 Thread Xie Yongji
This fixes a resource leak when the fd is zero in
vduse_dev_destroy().

Fixes: 8dbd281c1675 ("libvduse: Add VDUSE (vDPA Device in Userspace) library")
Signed-off-by: Xie Yongji 
---
 subprojects/libvduse/libvduse.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index dd1faffe66..9a2bcec282 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -1357,11 +1357,11 @@ int vduse_dev_destroy(VduseDev *dev)
 free(dev->vqs[i].resubmit_list);
 }
 free(dev->vqs);
-if (dev->fd > 0) {
+if (dev->fd >= 0) {
 close(dev->fd);
 dev->fd = -1;
 }
-if (dev->ctrl_fd > 0) {
+if (dev->ctrl_fd >= 0) {
 if (ioctl(dev->ctrl_fd, VDUSE_DESTROY_DEV, dev->name)) {
 ret = -errno;
 }
-- 
2.20.1
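
As a standalone illustration of the corrected comparison, assuming -1 is used as the "not open" sentinel as in the patch. example_close_fd() is a hypothetical helper.

```c
#include <unistd.h>

/*
 * Close *fd if it refers to an open descriptor and mark it as closed.
 * 0 is a valid file descriptor, so only negative values mean "not open".
 * Returns 1 if a descriptor was closed, 0 if there was nothing to do.
 */
static int example_close_fd(int *fd)
{
    if (*fd >= 0) {
        close(*fd);
        *fd = -1;
        return 1;
    }
    return 0;
}
```

A process that has closed stdin can legitimately get descriptor 0 back from open(), and a `> 0` test would leak it.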




[PATCH v2 1/6] libvduse: Fix some compile errors with clang

2022-06-13 Thread Xie Yongji
This fixes some compile errors with clang:

../subprojects/libvduse/libvduse.c:578:20: error: unused function
'vring_used_flags_set_bit' [-Werror,-Wunused-function]
static inline void vring_used_flags_set_bit(VduseVirtq *vq, int mask)
   ^
../subprojects/libvduse/libvduse.c:587:20: error: unused function
'vring_used_flags_unset_bit' [-Werror,-Wunused-function]
static inline void vring_used_flags_unset_bit(VduseVirtq *vq, int mask)

../subprojects/libvduse/libvduse.c:325:20: error: cast to pointer from
integer of different size [-Werror=int-to-pointer-cast]
   325 | munmap((void *)dev->regions[i].mmap_addr,
   |^
../subprojects/libvduse/libvduse.c: In function 'vduse_dev_create':
../subprojects/libvduse/libvduse.c:1318:54: error: format '%lu' expects
argument of type 'long unsigned int', but argument 3 has type 'uint64_t'
{aka 'long long unsigned int'} [-Werror=format=]
 1318 | fprintf(stderr, "Failed to set api version %lu: %s\n",
  |~~^
  |  |
  |  long unsigned int
  |%llu
 1319 | version, strerror(errno));
  | ~~~
  |     |
  | uint64_t {aka long long unsigned int}

Signed-off-by: Xie Yongji 
---
 subprojects/libvduse/libvduse.c | 23 +++
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index 78bb777402..dd1faffe66 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -322,7 +323,7 @@ static void vduse_iova_remove_region(VduseDev *dev, 
uint64_t start,
 
 if (start <= dev->regions[i].iova &&
 last >= (dev->regions[i].iova + dev->regions[i].size - 1)) {
-munmap((void *)dev->regions[i].mmap_addr,
+munmap((void *)(uintptr_t)dev->regions[i].mmap_addr,
dev->regions[i].mmap_offset + dev->regions[i].size);
 dev->regions[i].mmap_addr = 0;
 dev->num_regions--;
@@ -575,24 +576,6 @@ void vduse_queue_notify(VduseVirtq *vq)
 }
 }
 
-static inline void vring_used_flags_set_bit(VduseVirtq *vq, int mask)
-{
-uint16_t *flags;
-
-flags = (uint16_t *)((char*)vq->vring.used +
- offsetof(struct vring_used, flags));
-*flags = htole16(le16toh(*flags) | mask);
-}
-
-static inline void vring_used_flags_unset_bit(VduseVirtq *vq, int mask)
-{
-uint16_t *flags;
-
-flags = (uint16_t *)((char*)vq->vring.used +
- offsetof(struct vring_used, flags));
-*flags = htole16(le16toh(*flags) & ~mask);
-}
-
 static inline void vring_set_avail_event(VduseVirtq *vq, uint16_t val)
 {
 *((uint16_t *)&vq->vring.used->ring[vq->vring.num]) = htole16(val);
@@ -1315,7 +1298,7 @@ VduseDev *vduse_dev_create(const char *name, uint32_t 
device_id,
 
 version = VDUSE_API_VERSION;
 if (ioctl(ctrl_fd, VDUSE_SET_API_VERSION, &version)) {
-fprintf(stderr, "Failed to set api version %lu: %s\n",
+fprintf(stderr, "Failed to set api version %" PRIu64 ": %s\n",
 version, strerror(errno));
 goto err_dev;
 }
-- 
2.20.1




[PATCH 5/5] vduse-blk: Add name option

2022-06-13 Thread Xie Yongji
Currently we use the 'id' option as the name of the VDUSE device.
It's a bit confusing since we use one value for two different
purposes: the ID to identify the export within QEMU (must be
distinct from any other exports in the same QEMU process, but
can overlap with names used by other processes), and the VDUSE
name to uniquely identify it on the host (must be distinct from
other VDUSE devices on the same host, but can overlap with other
export types like NBD in the same process). To make it clear,
this patch adds a separate 'name' option to specify the VDUSE
name for the vduse-blk export instead.

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 4 ++--
 docs/tools/qemu-storage-daemon.rst   | 5 +++--
 qapi/block-export.json   | 7 ---
 storage-daemon/qemu-storage-daemon.c | 8 
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 066e088b00..f101c24c3f 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -300,7 +300,7 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 features |= 1ULL << VIRTIO_BLK_F_RO;
 }
 
-vblk_exp->dev = vduse_dev_create(exp->id, VIRTIO_ID_BLOCK, 0,
+vblk_exp->dev = vduse_dev_create(vblk_opts->name, VIRTIO_ID_BLOCK, 0,
  features, num_queues,
  sizeof(struct virtio_blk_config),
  (char *)&config, &vduse_blk_ops,
@@ -312,7 +312,7 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 }
 
 vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
-   g_get_tmp_dir(), exp->id);
+   g_get_tmp_dir(), vblk_opts->name);
 if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
 error_setg(errp, "failed to set reconnect log file");
 ret = -EINVAL;
diff --git a/docs/tools/qemu-storage-daemon.rst 
b/docs/tools/qemu-storage-daemon.rst
index 034f2809a6..ea00149a63 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -77,7 +77,7 @@ Standard options:
   --export 
[type=]vhost-user-blk,id=,node-name=,addr.type=unix,addr.path=[,writable=on|off][,logical-block-size=][,num-queues=]
   --export 
[type=]vhost-user-blk,id=,node-name=,addr.type=fd,addr.str=[,writable=on|off][,logical-block-size=][,num-queues=]
   --export 
[type=]fuse,id=,node-name=,mountpoint=[,growable=on|off][,writable=on|off][,allow-other=on|off|auto]
-  --export 
[type=]vduse-blk,id=,node-name=[,writable=on|off][,num-queues=][,queue-size=][,logical-block-size=][,serial=]
+  --export 
[type=]vduse-blk,id=,node-name=,name=[,writable=on|off][,num-queues=][,queue-size=][,logical-block-size=][,serial=]
 
   is a block export definition. ``node-name`` is the block node that should be
   exported. ``writable`` determines whether or not the export allows write
@@ -111,7 +111,8 @@ Standard options:
   ``allow-other`` to auto (the default) will try enabling this option, and on
   error fall back to disabling it.
 
-  The ``vduse-blk`` export type uses the ``id`` as the VDUSE device name.
+  The ``vduse-blk`` export type takes a ``name`` (must be unique across the 
host)
+  to create the VDUSE device.
   ``num-queues`` sets the number of virtqueues (the default is 1).
   ``queue-size`` sets the virtqueue descriptor table size (the default is 256).
 
diff --git a/qapi/block-export.json b/qapi/block-export.json
index d7aeb1fbf7..81ef1e3dcd 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -182,6 +182,7 @@
 #
 # A vduse-blk block export.
 #
+# @name: the name of VDUSE device (must be unique across the host).
 # @num-queues: the number of virtqueues. Defaults to 1.
 # @queue-size: the size of virtqueue. Defaults to 256.
 # @logical-block-size: Logical block size in bytes. Range [512, PAGE_SIZE]
@@ -191,7 +192,8 @@
 # Since: 7.1
 ##
 { 'struct': 'BlockExportOptionsVduseBlk',
-  'data': { '*num-queues': 'uint16',
+  'data': { 'name': 'str',
+'*num-queues': 'uint16',
 '*queue-size': 'uint16',
 '*logical-block-size': 'size',
 '*serial': 'str' } }
@@ -316,8 +318,7 @@
 # Describes a block export, i.e. how single node should be exported on an
 # external interface.
 #
-# @id: A unique identifier for the block export (across the host for vduse-blk
-#  export type or across all export types for other types)
+# @id: A unique identifier for the block export (across all export types)
 #
 # @node-name: The node name of the block node to be exported (since: 5.2)
 #
diff --git a/stora

[PATCH 3/5] vduse-blk: Don't delete the export until all inflight I/Os completed

2022-06-13 Thread Xie Yongji
Don't delete the export until all inflight I/Os have completed.
Otherwise, it might lead to a use-after-free.

Fixes: cc241b5505b2 ("vduse-blk: Implement vduse-blk export")
Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index c3a89894ae..251d73c841 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -31,6 +31,7 @@ typedef struct VduseBlkExport {
 VduseDev *dev;
 uint16_t num_queues;
 char *recon_file;
+unsigned int inflight;
 } VduseBlkExport;
 
 typedef struct VduseBlkReq {
@@ -38,6 +39,18 @@ typedef struct VduseBlkReq {
 VduseVirtq *vq;
 } VduseBlkReq;
 
+static void vduse_blk_inflight_inc(VduseBlkExport *vblk_exp)
+{
+vblk_exp->inflight++;
+}
+
+static void vduse_blk_inflight_dec(VduseBlkExport *vblk_exp)
+{
+if (--vblk_exp->inflight == 0) {
+aio_wait_kick();
+}
+}
+
 static void vduse_blk_req_complete(VduseBlkReq *req, size_t in_len)
 {
 vduse_queue_push(req->vq, &req->elem, in_len);
@@ -68,10 +81,13 @@ static void coroutine_fn vduse_blk_virtio_process_req(void *opaque)
 }
 
 vduse_blk_req_complete(req, in_len);
+vduse_blk_inflight_dec(vblk_exp);
 }
 
 static void vduse_blk_vq_handler(VduseDev *dev, VduseVirtq *vq)
 {
+VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+
 while (1) {
 VduseBlkReq *req;
 
@@ -83,6 +99,8 @@ static void vduse_blk_vq_handler(VduseDev *dev, VduseVirtq *vq)
 
 Coroutine *co =
 qemu_coroutine_create(vduse_blk_virtio_process_req, req);
+
+vduse_blk_inflight_inc(vblk_exp);
 qemu_coroutine_enter(co);
 }
 }
@@ -168,6 +186,8 @@ static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp)
 }
 aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
true, NULL, NULL, NULL, NULL, NULL);
+
+AIO_WAIT_WHILE(vblk_exp->export.ctx, vblk_exp->inflight > 0);
 }
 
 
@@ -332,7 +352,9 @@ static void vduse_blk_exp_request_shutdown(BlockExport *exp)
 {
 VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
 
+aio_context_acquire(vblk_exp->export.ctx);
 vduse_blk_detach_ctx(vblk_exp);
+aio_context_release(vblk_exp->export.ctx);
 }
 
 const BlockExportDriver blk_exp_vduse_blk = {
-- 
2.20.1
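
For readers following the inflight accounting in this patch, here is a
minimal standalone sketch of the same idea: count requests on entry,
decrement on completion, and signal the waiter when the count returns to
zero. The Demo* names are hypothetical and not part of QEMU; the boolean
flag merely stands in for the wakeup that aio_wait_kick() provides in the
patch, and no real AioContext is involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the export; only the counter matters here. */
typedef struct {
    unsigned int inflight;   /* requests started but not yet completed */
    bool drained_signalled;  /* set when the counter drops back to zero */
} DemoExport;

/* Called when a request is handed to its coroutine. */
static void demo_inflight_inc(DemoExport *e)
{
    e->inflight++;
}

/* Called when a request has been completed and pushed to the virtqueue. */
static void demo_inflight_dec(DemoExport *e)
{
    assert(e->inflight > 0);
    if (--e->inflight == 0) {
        /* Plays the role of aio_wait_kick() in the patch. */
        e->drained_signalled = true;
    }
}

/* The export may only be torn down once nothing is inflight. */
static bool demo_can_delete(const DemoExport *e)
{
    return e->inflight == 0;
}
```

In the patch itself the waiting side is AIO_WAIT_WHILE(ctx, inflight > 0);
the sketch only models the counter and the wakeup condition.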




[PATCH 2/5] vduse-blk: Don't unlink the reconnect file if device exists

2022-06-13 Thread Xie Yongji
We should not unlink the reconnect file if vduse_dev_destroy()
fails with -EBUSY, which means the VDUSE device has not been
removed from the vDPA bus. Otherwise, a later reconnection
might fail.

Fixes: 730abef0e873 ("libvduse: Add support for reconnecting")
Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 3b10349173..c3a89894ae 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -316,12 +316,15 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 static void vduse_blk_exp_delete(BlockExport *exp)
 {
 VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+int ret;
 
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
 vblk_exp);
 blk_set_dev_ops(exp->blk, NULL, NULL);
-vduse_dev_destroy(vblk_exp->dev);
-unlink(vblk_exp->recon_file);
+ret = vduse_dev_destroy(vblk_exp->dev);
+if (ret != -EBUSY) {
+unlink(vblk_exp->recon_file);
+}
 g_free(vblk_exp->recon_file);
 }
 
-- 
2.20.1
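
The rule this patch introduces can be sketched in isolation as follows.
This is only an illustration: demo_cleanup_recon_file() is a hypothetical
helper, and destroy_ret stands for whatever the destroy step returned (in
the patch, the return value of vduse_dev_destroy()).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>   /* open(), handy when trying the sketch out */
#include <stdbool.h>
#include <unistd.h>

/*
 * Remove the reconnect file unless the destroy step reported -EBUSY.
 * Returns true if the file was actually unlinked.
 */
static bool demo_cleanup_recon_file(int destroy_ret, const char *recon_path)
{
    assert(recon_path != NULL);
    if (destroy_ret == -EBUSY) {
        /* Device still exists: keep the file so a reconnect can use it. */
        return false;
    }
    return unlink(recon_path) == 0;
}
```

Calling it with -EBUSY leaves the file in place; any other return value
removes it, which matches the "ret != -EBUSY" check in the diff above.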




[PATCH 1/5] libvduse: Fix resources leak in vduse_dev_destroy()

2022-06-13 Thread Xie Yongji
This fixes a resource leak when the fd is zero in
vduse_dev_destroy().

Fixes: 8dbd281c1675 ("libvduse: Add VDUSE (vDPA Device in Userspace) library")
Signed-off-by: Xie Yongji 
---
 subprojects/libvduse/libvduse.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index 78bb777402..e781bfa907 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -1374,11 +1374,11 @@ int vduse_dev_destroy(VduseDev *dev)
 free(dev->vqs[i].resubmit_list);
 }
 free(dev->vqs);
-if (dev->fd > 0) {
+if (dev->fd >= 0) {
 close(dev->fd);
 dev->fd = -1;
 }
-if (dev->ctrl_fd > 0) {
+if (dev->ctrl_fd >= 0) {
 if (ioctl(dev->ctrl_fd, VDUSE_DESTROY_DEV, dev->name)) {
 ret = -errno;
 }
-- 
2.20.1
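
As background for the "> 0" versus ">= 0" change: 0 is a valid file
descriptor (open() returns the lowest unused descriptor, so it can return
0 if stdin has been closed), while -1 is the conventional "no descriptor"
value. The sketch below contrasts the two checks; the demo_* helpers are
hypothetical and not part of libvduse.

```c
#include <assert.h>
#include <fcntl.h>   /* open(), handy when trying the sketch out */
#include <stdbool.h>
#include <unistd.h>

/* The old check: wrongly treats descriptor 0 as "not open". */
static bool demo_fd_is_open_buggy(int fd)
{
    return fd > 0;
}

/* The fixed check: every non-negative value is a descriptor. */
static bool demo_fd_is_open_fixed(int fd)
{
    return fd >= 0;
}

/* Close *fd if it holds a descriptor and reset it to -1. */
static void demo_close_fd(int *fd)
{
    assert(fd != NULL);
    if (demo_fd_is_open_fixed(*fd)) {
        close(*fd);
        *fd = -1;
    }
}
```

With the old check a device whose fd happened to be 0 would never be
closed, which is the leak described in the commit message.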




[PATCH 4/5] vduse-blk: Add serial option

2022-06-13 Thread Xie Yongji
Add a 'serial' option to allow the user to specify this value
explicitly. The default value is changed to an empty
string, as is done in "hw/block/virtio-blk.c".

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 20 ++--
 block/export/vhost-user-blk-server.c |  4 +++-
 block/export/virtio-blk-handler.h|  2 +-
 docs/tools/qemu-storage-daemon.rst   |  2 +-
 qapi/block-export.json   |  4 +++-
 storage-daemon/qemu-storage-daemon.c |  1 +
 6 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 251d73c841..066e088b00 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -235,7 +235,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 Error *local_err = NULL;
 struct virtio_blk_config config = { 0 };
 uint64_t features;
-int i;
+int i, ret;
 
 if (vblk_opts->has_num_queues) {
 num_queues = vblk_opts->num_queues;
@@ -265,7 +265,8 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 }
 vblk_exp->num_queues = num_queues;
 vblk_exp->handler.blk = exp->blk;
-vblk_exp->handler.serial = exp->id;
+vblk_exp->handler.serial = g_strdup(vblk_opts->has_serial ?
+vblk_opts->serial : "");
 vblk_exp->handler.logical_block_size = logical_block_size;
 vblk_exp->handler.writable = opts->writable;
 
@@ -306,16 +307,16 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
  vblk_exp);
 if (!vblk_exp->dev) {
 error_setg(errp, "failed to create vduse device");
-return -ENOMEM;
+ret = -ENOMEM;
+goto err_dev;
 }
 
 vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
g_get_tmp_dir(), exp->id);
 if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
 error_setg(errp, "failed to set reconnect log file");
-vduse_dev_destroy(vblk_exp->dev);
-g_free(vblk_exp->recon_file);
-return -EINVAL;
+ret = -EINVAL;
+goto err;
 }
 
 for (i = 0; i < num_queues; i++) {
@@ -331,6 +332,12 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
 
 return 0;
+err:
+vduse_dev_destroy(vblk_exp->dev);
+g_free(vblk_exp->recon_file);
+err_dev:
+g_free(vblk_exp->handler.serial);
+return ret;
 }
 
 static void vduse_blk_exp_delete(BlockExport *exp)
@@ -346,6 +353,7 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 unlink(vblk_exp->recon_file);
 }
 g_free(vblk_exp->recon_file);
+g_free(vblk_exp->handler.serial);
 }
 
 static void vduse_blk_exp_request_shutdown(BlockExport *exp)
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index c9c290cc4c..3409d9e02e 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -282,7 +282,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 return -EINVAL;
 }
 vexp->handler.blk = exp->blk;
-vexp->handler.serial = "vhost_user_blk";
+vexp->handler.serial = g_strdup("vhost_user_blk");
 vexp->handler.logical_block_size = logical_block_size;
 vexp->handler.writable = opts->writable;
 
@@ -296,6 +296,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
  num_queues, &vu_blk_iface, errp)) {
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
 blk_aio_detach, vexp);
+g_free(vexp->handler.serial);
 return -EADDRNOTAVAIL;
 }
 
@@ -308,6 +309,7 @@ static void vu_blk_exp_delete(BlockExport *exp)
 
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
 vexp);
+g_free(vexp->handler.serial);
 }
 
 const BlockExportDriver blk_exp_vhost_user_blk = {
diff --git a/block/export/virtio-blk-handler.h b/block/export/virtio-blk-handler.h
index 1c7a5e32ad..150d44cff2 100644
--- a/block/export/virtio-blk-handler.h
+++ b/block/export/virtio-blk-handler.h
@@ -23,7 +23,7 @@
 
 typedef struct {
 BlockBackend *blk;
-const char *serial;
+char *serial;
 uint32_t logical_block_size;
 bool writable;
 } VirtioBlkHandler;
diff --git a/docs/tools/qemu-storage-daemon.rst b/docs/tools/qemu-storage-daemon.rst
index fbeaf76954..034f2809a6 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -77,7 +77,7 @@ Standard options:
   --expor
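
The ownership change in this patch (the handler now owns a heap copy of
the serial string and frees it on every exit path) can be sketched with
plain libc as below. The patch uses g_strdup()/g_free(); the sketch uses
malloc()/free() only to stay self-contained, and DemoHandler and the
demo_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *serial; /* owned by the handler, released in demo_handler_cleanup() */
} DemoHandler;

/* Copy the user-supplied serial, or fall back to "" when none was given. */
static int demo_handler_init(DemoHandler *h, const char *serial_opt)
{
    const char *src = serial_opt ? serial_opt : "";
    size_t len = strlen(src) + 1;

    assert(h != NULL);
    h->serial = malloc(len);
    if (!h->serial) {
        return -1;
    }
    memcpy(h->serial, src, len);
    return 0;
}

/* Must be called on both the error path and the normal delete path. */
static void demo_handler_cleanup(DemoHandler *h)
{
    free(h->serial);
    h->serial = NULL;
}
```

Because the handler holds its own copy, the caller's option string can be
freed or modified afterwards without affecting the serial that is reported.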

[PATCH 0/5] Some fixes and improvements for vduse-blk

2022-06-13 Thread Xie Yongji
This series includes a few fixes and improvements for the
vduse-blk export.

Patch 1 fixes a resource leak when the vduse fd is zero.

Patches 2 and 3 fix two bugs that could be triggered
by force-deleting a vduse-blk export under high I/O load.

Patches 4 and 5 add two new options for the vduse-blk export.

Xie Yongji (5):
  libvduse: Fix resources leak in vduse_dev_destroy()
  vduse-blk: Don't unlink the reconnect file if device exists
  vduse-blk: Don't delete the export until all inflight I/Os completed
  vduse-blk: Add serial option
  vduse-blk: Add name option

 block/export/vduse-blk.c | 53 ++--
 block/export/vhost-user-blk-server.c |  4 ++-
 block/export/virtio-blk-handler.h|  2 +-
 docs/tools/qemu-storage-daemon.rst   |  5 +--
 qapi/block-export.json   | 11 +++---
 storage-daemon/qemu-storage-daemon.c |  9 ++---
 subprojects/libvduse/libvduse.c  |  4 +--
 7 files changed, 64 insertions(+), 24 deletions(-)

-- 
2.20.1




[PATCH v2] vduse-blk: Add name option

2022-05-31 Thread Xie Yongji
Currently we use the 'id' option as the name of the VDUSE device.
This is confusing since one value serves two different
purposes: the ID to identify the export within QEMU (must be
distinct from any other exports in the same QEMU process, but
can overlap with names used by other processes), and the VDUSE
name to uniquely identify it on the host (must be distinct from
other VDUSE devices on the same host, but can overlap with other
export types like NBD in the same process). To make this clear,
this patch adds a separate 'name' option to specify the VDUSE
name for the vduse-blk export instead.

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 9 ++---
 docs/tools/qemu-storage-daemon.rst   | 5 +++--
 qapi/block-export.json   | 7 ---
 storage-daemon/qemu-storage-daemon.c | 8 
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 3b10349173..d96993bdf5 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -245,7 +245,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 }
 vblk_exp->num_queues = num_queues;
 vblk_exp->handler.blk = exp->blk;
-vblk_exp->handler.serial = exp->id;
+vblk_exp->handler.serial = g_strdup(vblk_opts->name);
 vblk_exp->handler.logical_block_size = logical_block_size;
 vblk_exp->handler.writable = opts->writable;
 
@@ -279,22 +279,24 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 features |= 1ULL << VIRTIO_BLK_F_RO;
 }
 
-vblk_exp->dev = vduse_dev_create(exp->id, VIRTIO_ID_BLOCK, 0,
+vblk_exp->dev = vduse_dev_create(vblk_opts->name, VIRTIO_ID_BLOCK, 0,
  features, num_queues,
  sizeof(struct virtio_blk_config),
  (char *)&config, &vduse_blk_ops,
  vblk_exp);
 if (!vblk_exp->dev) {
 error_setg(errp, "failed to create vduse device");
+g_free((void *)vblk_exp->handler.serial);
 return -ENOMEM;
 }
 
 vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
-   g_get_tmp_dir(), exp->id);
+   g_get_tmp_dir(), vblk_opts->name);
 if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
 error_setg(errp, "failed to set reconnect log file");
 vduse_dev_destroy(vblk_exp->dev);
 g_free(vblk_exp->recon_file);
+g_free((void *)vblk_exp->handler.serial);
 return -EINVAL;
 }
 
@@ -323,6 +325,7 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 vduse_dev_destroy(vblk_exp->dev);
 unlink(vblk_exp->recon_file);
 g_free(vblk_exp->recon_file);
+g_free((void *)vblk_exp->handler.serial);
 }
 
 static void vduse_blk_exp_request_shutdown(BlockExport *exp)
diff --git a/docs/tools/qemu-storage-daemon.rst b/docs/tools/qemu-storage-daemon.rst
index fbeaf76954..8a20ebd304 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -77,7 +77,7 @@ Standard options:
   --export [type=]vhost-user-blk,id=,node-name=,addr.type=unix,addr.path=[,writable=on|off][,logical-block-size=][,num-queues=]
   --export [type=]vhost-user-blk,id=,node-name=,addr.type=fd,addr.str=[,writable=on|off][,logical-block-size=][,num-queues=]
   --export [type=]fuse,id=,node-name=,mountpoint=[,growable=on|off][,writable=on|off][,allow-other=on|off|auto]
-  --export [type=]vduse-blk,id=,node-name=[,writable=on|off][,num-queues=][,queue-size=][,logical-block-size=]
+  --export [type=]vduse-blk,id=,node-name=,name=[,writable=on|off][,num-queues=][,queue-size=][,logical-block-size=]
 
   is a block export definition. ``node-name`` is the block node that should be
   exported. ``writable`` determines whether or not the export allows write
@@ -111,7 +111,8 @@ Standard options:
   ``allow-other`` to auto (the default) will try enabling this option, and on
   error fall back to disabling it.
 
-  The ``vduse-blk`` export type uses the ``id`` as the VDUSE device name.
+  The ``vduse-blk`` export type takes a ``name`` (must be unique across the host)
+  to create the VDUSE device.
   ``num-queues`` sets the number of virtqueues (the default is 1).
   ``queue-size`` sets the virtqueue descriptor table size (the default is 256).
 
diff --git a/qapi/block-export.json b/qapi/block-export.json
index e4bd4de363..f5a2713e59 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -182,6 +182,7 @@
 #
 # A vduse-blk block export.
 #
+# @name: the name of VDUSE device (must be unique across the host).
 # @num-queues: the number of virtqueues. Defaults to 1.
 # @queue-size: the size of

[PATCH] vduse-blk: Add name option

2022-05-30 Thread Xie Yongji
Currently we use the 'id' option as the name of the VDUSE device.
This is confusing since one value serves two different
purposes: the ID to identify the export within QEMU (must be
distinct from any other exports in the same QEMU process, but
can overlap with names used by other processes), and the VDUSE
name to uniquely identify it on the host (must be distinct from
other VDUSE devices on the same host, but can overlap with other
export types like NBD in the same process). To make this clear,
this patch adds a separate 'name' option to specify the VDUSE
name for the vduse-blk export instead.

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 6 +++---
 docs/tools/qemu-storage-daemon.rst   | 5 +++--
 qapi/block-export.json   | 7 ---
 storage-daemon/qemu-storage-daemon.c | 8 
 4 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 3b10349173..acf2d30e6c 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -245,7 +245,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 }
 vblk_exp->num_queues = num_queues;
 vblk_exp->handler.blk = exp->blk;
-vblk_exp->handler.serial = exp->id;
+vblk_exp->handler.serial = vblk_opts->name;
 vblk_exp->handler.logical_block_size = logical_block_size;
 vblk_exp->handler.writable = opts->writable;
 
@@ -279,7 +279,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 features |= 1ULL << VIRTIO_BLK_F_RO;
 }
 
-vblk_exp->dev = vduse_dev_create(exp->id, VIRTIO_ID_BLOCK, 0,
+vblk_exp->dev = vduse_dev_create(vblk_opts->name, VIRTIO_ID_BLOCK, 0,
  features, num_queues,
  sizeof(struct virtio_blk_config),
  (char *)&config, &vduse_blk_ops,
@@ -290,7 +290,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 }
 
 vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
-   g_get_tmp_dir(), exp->id);
+   g_get_tmp_dir(), vblk_opts->name);
 if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
 error_setg(errp, "failed to set reconnect log file");
 vduse_dev_destroy(vblk_exp->dev);
diff --git a/docs/tools/qemu-storage-daemon.rst b/docs/tools/qemu-storage-daemon.rst
index fbeaf76954..8a20ebd304 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -77,7 +77,7 @@ Standard options:
   --export [type=]vhost-user-blk,id=,node-name=,addr.type=unix,addr.path=[,writable=on|off][,logical-block-size=][,num-queues=]
   --export [type=]vhost-user-blk,id=,node-name=,addr.type=fd,addr.str=[,writable=on|off][,logical-block-size=][,num-queues=]
   --export [type=]fuse,id=,node-name=,mountpoint=[,growable=on|off][,writable=on|off][,allow-other=on|off|auto]
-  --export [type=]vduse-blk,id=,node-name=[,writable=on|off][,num-queues=][,queue-size=][,logical-block-size=]
+  --export [type=]vduse-blk,id=,node-name=,name=[,writable=on|off][,num-queues=][,queue-size=][,logical-block-size=]
 
   is a block export definition. ``node-name`` is the block node that should be
   exported. ``writable`` determines whether or not the export allows write
@@ -111,7 +111,8 @@ Standard options:
   ``allow-other`` to auto (the default) will try enabling this option, and on
   error fall back to disabling it.
 
-  The ``vduse-blk`` export type uses the ``id`` as the VDUSE device name.
+  The ``vduse-blk`` export type takes a ``name`` (must be unique across the host)
+  to create the VDUSE device.
   ``num-queues`` sets the number of virtqueues (the default is 1).
   ``queue-size`` sets the virtqueue descriptor table size (the default is 256).
 
diff --git a/qapi/block-export.json b/qapi/block-export.json
index e4bd4de363..f5a2713e59 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -182,6 +182,7 @@
 #
 # A vduse-blk block export.
 #
+# @name: the name of VDUSE device (must be unique across the host).
 # @num-queues: the number of virtqueues. Defaults to 1.
 # @queue-size: the size of virtqueue. Defaults to 256.
 # @logical-block-size: Logical block size in bytes. Range [512, PAGE_SIZE]
@@ -190,7 +191,8 @@
 # Since: 7.1
 ##
 { 'struct': 'BlockExportOptionsVduseBlk',
-  'data': { '*num-queues': 'uint16',
+  'data': { 'name': 'str',
+'*num-queues': 'uint16',
 '*queue-size': 'uint16',
 '*logical-block-size': 'size'} }
 
@@ -314,8 +316,7 @@
 # Describes a block export, i.e. how single n

[PATCH v6 0/8] Support exporting BDSs via VDUSE

2022-05-23 Thread Xie Yongji
Hi all,

A few months ago, VDUSE (vDPA Device in Userspace) [1] was
merged into the Linux kernel as a framework that makes it
possible to emulate a vDPA device in userspace. This series
aims to implement a VDUSE block backend based on the
qemu-storage-daemon infrastructure.

To support that, we first introduce a VDUSE library as a
subproject (like libvhost-user) to help implement
VDUSE backends in QEMU. Then a VDUSE block export is implemented
based on this library. Finally, we add resize and reconnect support
to the VDUSE block export and the VDUSE library.

This series is based on Stefan's patch [2]. Since QEMU does not
currently support vdpa-blk, the VM case is tested with my
previous patchset [3].

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html
[2] https://lore.kernel.org/all/20220518130945.2657905-1-stefa...@redhat.com/
[3] https://www.mail-archive.com/qemu-devel@nongnu.org/msg797569.html

Please review, thanks!

V5 to V6:
- Remove blk_get_guest_block_size() [Stefan]
- Split out a patch to fix the incorrect length for
  vhost-user-blk-server
- Define a VirtioBlkHandler structure for virtio-blk
  I/O process [Stefan]
- Add documentation for block export id [Stefan]
- Remove some assert from libvduse library [Stefan]
- Remove unused VIRTIO_BLK_F_SIZE_MAX for vduse block export

V4 to V5:
- Abstract out the logic for virtio-blk I/O processing from
  vhost-user-blk-server and reuse it [Kevin]
- Fix missing VIRTIO_BLK_F_FLUSH [Kevin]
- Support discard and write_zeroes [Kevin]
- Rebase to the newest tree

V3 to V4:
- Fix some comments on QAPI [Eric]

V2 to V3:
- Introduce vduse_get_virtio_features() [Stefan]
- Update MAINTAINERS file [Stefan]
- Fix handler of VIRTIO_BLK_T_GET_ID request [Stefan]
- Add barrier for vduse_queue_inflight_get() [Stefan]

V1 to V2:
- Move vduse header to linux-headers [Stefan]
- Add two new APIs to support creating a device from /dev/vduse/$NAME or
  a file descriptor [Stefan]
- Check VIRTIO_F_VERSION_1 during initialization [Stefan]
- Replace malloc() + memset() with calloc() [Stefan]
- Increase default queue size to 256 for vduse-blk [Stefan]
- Zero-initialize virtio-blk config space [Stefan]
- Add a patch to support reset blk->dev_ops
- Validate vq->log->inflight fields [Stefan]
- Add vduse_set_reconnect_log_file() API to support specifying the
  reconnect log file
- Fix some bugs [Stefan]

Xie Yongji (8):
  block: Support passing NULL ops to blk_set_dev_ops()
  block/export: Fix incorrect length passed to vu_queue_push()
  block/export: Abstract out the logic of virtio-blk I/O process
  linux-headers: Add vduse.h
  libvduse: Add VDUSE (vDPA Device in Userspace) library
  vduse-blk: Implement vduse-blk export
  vduse-blk: Add vduse-blk resize support
  libvduse: Add support for reconnecting

 MAINTAINERS |9 +
 block/block-backend.c   |2 +-
 block/export/export.c   |6 +
 block/export/meson.build|7 +-
 block/export/vduse-blk.c|  341 +
 block/export/vduse-blk.h|   20 +
 block/export/vhost-user-blk-server.c|  260 +---
 block/export/virtio-blk-handler.c   |  240 
 block/export/virtio-blk-handler.h   |   37 +
 linux-headers/linux/vduse.h |  306 
 meson.build |   28 +
 meson_options.txt   |4 +
 qapi/block-export.json  |   28 +-
 scripts/meson-buildoptions.sh   |7 +
 scripts/update-linux-headers.sh |2 +-
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/include/compiler.h |1 +
 subprojects/libvduse/libvduse.c | 1392 +++
 subprojects/libvduse/libvduse.h |  247 
 subprojects/libvduse/linux-headers/linux|1 +
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 22 files changed, 2705 insertions(+), 245 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h
 create mode 100644 block/export/virtio-blk-handler.c
 create mode 100644 block/export/virtio-blk-handler.h
 create mode 100644 linux-headers/linux/vduse.h
 create mode 120000 subprojects/libvduse/include/atomic.h
 create mode 120000 subprojects/libvduse/include/compiler.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 120000 subprojects/libvduse/linux-headers/linux
 create mode 100644 subprojects/libvduse/meson.build
 create mode 120000 subprojects/libvduse/standard-headers/linux

-- 
2.20.1




[PATCH v6 8/8] libvduse: Add support for reconnecting

2022-05-23 Thread Xie Yongji
To support reconnecting after a restart or crash, a VDUSE backend
might need to resubmit inflight I/Os. This patch stores metadata
such as the indexes of inflight I/O descriptors in a shm file so
that the VDUSE backend can restore them when reconnecting.

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c|  14 ++
 subprojects/libvduse/libvduse.c | 235 +++-
 subprojects/libvduse/libvduse.h |  12 ++
 3 files changed, 256 insertions(+), 5 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 1040130f52..3b10349173 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -30,6 +30,7 @@ typedef struct VduseBlkExport {
 VirtioBlkHandler handler;
 VduseDev *dev;
 uint16_t num_queues;
+char *recon_file;
 } VduseBlkExport;
 
 typedef struct VduseBlkReq {
@@ -107,6 +108,8 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
 
 aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+/* Make sure we don't miss any kick after reconnecting */
+eventfd_write(vduse_queue_get_fd(vq), 1);
 }
 
 static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
@@ -286,6 +289,15 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 return -ENOMEM;
 }
 
+vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
+   g_get_tmp_dir(), exp->id);
+if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
+error_setg(errp, "failed to set reconnect log file");
+vduse_dev_destroy(vblk_exp->dev);
+g_free(vblk_exp->recon_file);
+return -EINVAL;
+}
+
 for (i = 0; i < num_queues; i++) {
 vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
 }
@@ -309,6 +321,8 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 vblk_exp);
 blk_set_dev_ops(exp->blk, NULL, NULL);
 vduse_dev_destroy(vblk_exp->dev);
+unlink(vblk_exp->recon_file);
+g_free(vblk_exp->recon_file);
 }
 
 static void vduse_blk_exp_request_shutdown(BlockExport *exp)
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index fa4822b9a9..78bb777402 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -41,6 +41,8 @@
 #define VDUSE_VQ_ALIGN 4096
 #define MAX_IOVA_REGIONS 256
 
+#define LOG_ALIGNMENT 64
+
 /* Round number down to multiple */
 #define ALIGN_DOWN(n, m) ((n) / (m) * (m))
 
@@ -51,6 +53,31 @@
 #define unlikely(x)   __builtin_expect(!!(x), 0)
 #endif
 
+typedef struct VduseDescStateSplit {
+uint8_t inflight;
+uint8_t padding[5];
+uint16_t next;
+uint64_t counter;
+} VduseDescStateSplit;
+
+typedef struct VduseVirtqLogInflight {
+uint64_t features;
+uint16_t version;
+uint16_t desc_num;
+uint16_t last_batch_head;
+uint16_t used_idx;
+VduseDescStateSplit desc[];
+} VduseVirtqLogInflight;
+
+typedef struct VduseVirtqLog {
+VduseVirtqLogInflight inflight;
+} VduseVirtqLog;
+
+typedef struct VduseVirtqInflightDesc {
+uint16_t index;
+uint64_t counter;
+} VduseVirtqInflightDesc;
+
 typedef struct VduseRing {
 unsigned int num;
 uint64_t desc_addr;
@@ -73,6 +100,10 @@ struct VduseVirtq {
 bool ready;
 int fd;
 VduseDev *dev;
+VduseVirtqInflightDesc *resubmit_list;
+uint16_t resubmit_num;
+uint64_t counter;
+VduseVirtqLog *log;
 };
 
 typedef struct VduseIovaRegion {
@@ -96,8 +127,36 @@ struct VduseDev {
 int fd;
 int ctrl_fd;
 void *priv;
+void *log;
 };
 
+static inline size_t vduse_vq_log_size(uint16_t queue_size)
+{
+return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size +
+sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT);
+}
+
+static void *vduse_log_get(const char *filename, size_t size)
+{
+void *ptr = MAP_FAILED;
+int fd;
+
+fd = open(filename, O_RDWR | O_CREAT, 0600);
+if (fd == -1) {
+return MAP_FAILED;
+}
+
+if (ftruncate(fd, size) == -1) {
+goto out;
+}
+
+ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+out:
+close(fd);
+return ptr;
+}
+
 static inline bool has_feature(uint64_t features, unsigned int fbit)
 {
 assert(fbit < 64);
@@ -148,6 +207,105 @@ static int vduse_inject_irq(VduseDev *dev, int index)
 return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, &index);
 }
 
+static int inflight_desc_compare(const void *a, const void *b)
+{
+VduseVirtqInflightDesc *desc0 = (VduseVirtqInflightDesc *)a,
+   *desc1 = (VduseVirtqInflightDesc *)b;
+
+if (desc1->counter > desc0->counter &&
+(desc1->counter - desc0->counter) < VIRTQUEUE_MAX_SIZE * 2) {
+return 1
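
The per-queue log size computed by vduse_vq_log_size() above can be
reproduced with the standalone sketch below. The two structs mirror the
layouts added in this patch under Demo* names; DEMO_ALIGN_UP is an
assumed round-up definition, since the patch only shows ALIGN_DOWN and
the ALIGN_UP macro it uses is defined outside the quoted hunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_LOG_ALIGNMENT 64
/* Round n up to a multiple of m (assumed definition, m > 0). */
#define DEMO_ALIGN_UP(n, m) (((n) + (m) - 1) / (m) * (m))

/* Same field layout as VduseDescStateSplit in the patch. */
typedef struct {
    uint8_t inflight;
    uint8_t padding[5];
    uint16_t next;
    uint64_t counter;
} DemoDescState;

/* Same field layout as VduseVirtqLogInflight in the patch. */
typedef struct {
    uint64_t features;
    uint16_t version;
    uint16_t desc_num;
    uint16_t last_batch_head;
    uint16_t used_idx;
    DemoDescState desc[];
} DemoLogInflight;

/* Header plus one state entry per descriptor, padded to the log alignment. */
static size_t demo_vq_log_size(uint16_t queue_size)
{
    assert(queue_size > 0);
    return DEMO_ALIGN_UP(sizeof(DemoDescState) * queue_size +
                         sizeof(DemoLogInflight), DEMO_LOG_ALIGNMENT);
}
```

The padding keeps each queue's log region starting on a 64-byte boundary
when several regions are laid out back to back in the same file.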

[PATCH v6 7/8] vduse-blk: Add vduse-blk resize support

2022-05-23 Thread Xie Yongji
To support block resize, this uses vduse_dev_update_config()
to update the capacity field in the configuration space and inject
a config interrupt from the block resize callback.

Signed-off-by: Xie Yongji 
Reviewed-by: Stefan Hajnoczi 
---
 block/export/vduse-blk.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 143d58a3f2..1040130f52 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -184,6 +184,23 @@ static void blk_aio_detach(void *opaque)
 vblk_exp->export.ctx = NULL;
 }
 
+static void vduse_blk_resize(void *opaque)
+{
+BlockExport *exp = opaque;
+VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+struct virtio_blk_config config;
+
+config.capacity =
+cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
+vduse_dev_update_config(vblk_exp->dev, sizeof(config.capacity),
+offsetof(struct virtio_blk_config, capacity),
+(char *)&config.capacity);
+}
+
+static const BlockDevOps vduse_block_ops = {
+.resize_cb = vduse_blk_resize,
+};
+
 static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 Error **errp)
 {
@@ -279,6 +296,8 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
  vblk_exp);
 
+blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
+
 return 0;
 }
 
@@ -288,6 +307,7 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
 vblk_exp);
+blk_set_dev_ops(exp->blk, NULL, NULL);
 vduse_dev_destroy(vblk_exp->dev);
 }
 
-- 
2.20.1




[PATCH v6 6/8] vduse-blk: Implement vduse-blk export

2022-05-23 Thread Xie Yongji
This implements a VDUSE block backend based on
the libvduse library. We can use it to export BDSs
for both VM and container (host) usage.

The new command-line syntax is:

$ qemu-storage-daemon \
--blockdev file,node-name=drive0,filename=test.img \
--export vduse-blk,node-name=drive0,id=vduse-export0,writable=on

After qemu-storage-daemon has started, we need to use
the "vdpa" command to attach the device to the vDPA bus:

$ vdpa dev add name vduse-export0 mgmtdev vduse

The device must also be removed via the "vdpa" command
before we stop qemu-storage-daemon.

Signed-off-by: Xie Yongji 
Reviewed-by: Stefan Hajnoczi 
---
 MAINTAINERS   |   4 +-
 block/export/export.c |   6 +
 block/export/meson.build  |   5 +
 block/export/vduse-blk.c  | 307 ++
 block/export/vduse-blk.h  |  20 +++
 meson.build   |  13 ++
 meson_options.txt |   2 +
 qapi/block-export.json|  28 +++-
 scripts/meson-buildoptions.sh |   4 +
 9 files changed, 385 insertions(+), 4 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 966e07b7a0..d6fc0285a1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3554,10 +3554,12 @@ L: qemu-block@nongnu.org
 S: Supported
 F: block/export/fuse.c
 
-VDUSE library
+VDUSE library and block device exports
 M: Xie Yongji 
 S: Maintained
 F: subprojects/libvduse/
+F: block/export/vduse-blk.c
+F: block/export/vduse-blk.h
 
 Replication
 M: Wen Congyang 
diff --git a/block/export/export.c b/block/export/export.c
index 7253af3bc3..4744862915 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -26,6 +26,9 @@
 #ifdef CONFIG_VHOST_USER_BLK_SERVER
 #include "vhost-user-blk-server.h"
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+#include "vduse-blk.h"
+#endif
 
 static const BlockExportDriver *blk_exp_drivers[] = {
 &blk_exp_nbd,
@@ -35,6 +38,9 @@ static const BlockExportDriver *blk_exp_drivers[] = {
 #ifdef CONFIG_FUSE
 &blk_exp_fuse,
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+&blk_exp_vduse_blk,
+#endif
 };
 
 /* Only accessed from the main thread */
diff --git a/block/export/meson.build b/block/export/meson.build
index 431e47ca51..c60116f455 100644
--- a/block/export/meson.build
+++ b/block/export/meson.build
@@ -5,3 +5,8 @@ if have_vhost_user_blk_server
 endif
 
 blockdev_ss.add(when: fuse, if_true: files('fuse.c'))
+
+if have_vduse_blk_export
+blockdev_ss.add(files('vduse-blk.c', 'virtio-blk-handler.c'))
+blockdev_ss.add(libvduse)
+endif
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
new file mode 100644
index 0000000000..143d58a3f2
--- /dev/null
+++ b/block/export/vduse-blk.c
@@ -0,0 +1,307 @@
+/*
+ * Export QEMU block device via VDUSE
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
+ *
+ * Author:
+ *   Xie Yongji 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include 
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "block/export.h"
+#include "qemu/error-report.h"
+#include "util/block-helpers.h"
+#include "subprojects/libvduse/libvduse.h"
+#include "virtio-blk-handler.h"
+
+#include "standard-headers/linux/virtio_blk.h"
+
+#define VDUSE_DEFAULT_NUM_QUEUE 1
+#define VDUSE_DEFAULT_QUEUE_SIZE 256
+
+typedef struct VduseBlkExport {
+BlockExport export;
+VirtioBlkHandler handler;
+VduseDev *dev;
+uint16_t num_queues;
+} VduseBlkExport;
+
+typedef struct VduseBlkReq {
+VduseVirtqElement elem;
+VduseVirtq *vq;
+} VduseBlkReq;
+
+static void vduse_blk_req_complete(VduseBlkReq *req, size_t in_len)
+{
+vduse_queue_push(req->vq, &req->elem, in_len);
+vduse_queue_notify(req->vq);
+
+free(req);
+}
+
+static void coroutine_fn vduse_blk_virtio_process_req(void *opaque)
+{
+VduseBlkReq *req = opaque;
+VduseVirtq *vq = req->vq;
+VduseDev *dev = vduse_queue_get_dev(vq);
+VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+VirtioBlkHandler *handler = &vblk_exp->handler;
+VduseVirtqElement *elem = &req->elem;
+struct iovec *in_iov = elem->in_sg;
+struct iovec *out_iov = elem->out_sg;
+unsigned in_num = elem->in_num;
+unsigned out_num = elem->out_num;
+int in_len;
+
+in_len = virtio_blk_process_req(handler, in_iov,
+out_iov, in_num, out_num);
+if (in_len < 0) {
+free(req);
+return;
+}
+
+vduse_blk_req_complete(req, in_len);
+}
+
+static void vduse_blk_vq_handler(VduseDev *dev, VduseVirtq *vq)
+{
+while (1) {
+VduseBlkReq *req;
+
+req = vduse

[PATCH v6 1/8] block: Support passing NULL ops to blk_set_dev_ops()

2022-05-23 Thread Xie Yongji
This supports passing NULL ops to blk_set_dev_ops()
so that we can remove stale ops in some cases.

Signed-off-by: Xie Yongji 
Reviewed-by: Stefan Hajnoczi 
---
 block/block-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index e0e1aff4b1..35457a6a1d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1062,7 +1062,7 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
 blk->dev_opaque = opaque;
 
 /* Are we currently quiesced? Should we enforce this right now? */
-if (blk->quiesce_counter && ops->drained_begin) {
+if (blk->quiesce_counter && ops && ops->drained_begin) {
 ops->drained_begin(opaque);
 }
 }
-- 
2.20.1
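
For readers following the series, the guard added in patch 1/8 can be sketched outside of QEMU as below. This is only an illustration under stated assumptions: DemoDevOps, DemoBackend, demo_set_dev_ops() and demo_count_drain() are hypothetical stand-ins, not the real BlockDevOps/BlockBackend definitions. The only point shown is that testing the ops pointer before dereferencing it makes NULL a valid "remove the callbacks" argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for BlockDevOps. */
typedef struct DemoDevOps {
    void (*drained_begin)(void *opaque);
} DemoDevOps;

/* Hypothetical, simplified stand-in for BlockBackend. */
typedef struct DemoBackend {
    const DemoDevOps *dev_ops;
    void *dev_opaque;
    int quiesce_counter;
} DemoBackend;

/* Install the callbacks, or clear them when ops is NULL. */
static void demo_set_dev_ops(DemoBackend *blk, const DemoDevOps *ops,
                             void *opaque)
{
    assert(blk != NULL);
    blk->dev_ops = ops;
    blk->dev_opaque = opaque;

    /* Test ops itself first so that a NULL ops is never dereferenced. */
    if (blk->quiesce_counter && ops && ops->drained_begin) {
        ops->drained_begin(opaque);
    }
}

/* Example callback: counts how often it was invoked. */
static void demo_count_drain(void *opaque)
{
    int *calls = opaque;
    (*calls)++;
}
```

A caller would register with demo_set_dev_ops(&blk, &ops, &state) and later unregister with demo_set_dev_ops(&blk, NULL, NULL), which is the pattern the vduse-blk export uses on delete later in the series.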




[PATCH v6 5/8] libvduse: Add VDUSE (vDPA Device in Userspace) library

2022-05-23 Thread Xie Yongji
VDUSE [1] is a Linux framework that makes it possible to implement
software-emulated vDPA devices in userspace. This adds a library
as a subproject to help implement VDUSE backends in QEMU.

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html

Signed-off-by: Xie Yongji 
---
 MAINTAINERS |5 +
 meson.build |   15 +
 meson_options.txt   |2 +
 scripts/meson-buildoptions.sh   |3 +
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/include/compiler.h |1 +
 subprojects/libvduse/libvduse.c | 1167 +++
 subprojects/libvduse/libvduse.h |  235 
 subprojects/libvduse/linux-headers/linux|1 +
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 11 files changed, 1441 insertions(+)
 create mode 120000 subprojects/libvduse/include/atomic.h
 create mode 120000 subprojects/libvduse/include/compiler.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 120000 subprojects/libvduse/linux-headers/linux
 create mode 100644 subprojects/libvduse/meson.build
 create mode 120000 subprojects/libvduse/standard-headers/linux

diff --git a/MAINTAINERS b/MAINTAINERS
index 01fb25421b..966e07b7a0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3554,6 +3554,11 @@ L: qemu-block@nongnu.org
 S: Supported
 F: block/export/fuse.c
 
+VDUSE library
+M: Xie Yongji 
+S: Maintained
+F: subprojects/libvduse/
+
 Replication
 M: Wen Congyang 
 M: Xie Changlong 
diff --git a/meson.build b/meson.build
index 9ebc00f032..3f70ccec60 100644
--- a/meson.build
+++ b/meson.build
@@ -1514,6 +1514,21 @@ if get_option('fuse_lseek').allowed()
   endif
 endif
 
+have_libvduse = (targetos == 'linux')
+if get_option('libvduse').enabled()
+if targetos != 'linux'
+error('libvduse requires linux')
+endif
+elif get_option('libvduse').disabled()
+have_libvduse = false
+endif
+
+libvduse = not_found
+if have_libvduse
+  libvduse_proj = subproject('libvduse')
+  libvduse = libvduse_proj.get_variable('libvduse_dep')
+endif
+
 # libbpf
 libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
 if libbpf.found() and not cc.links('''
diff --git a/meson_options.txt b/meson_options.txt
index 2de94af037..50da8dea94 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -253,6 +253,8 @@ option('virtfs', type: 'feature', value: 'auto',
description: 'virtio-9p support')
 option('virtiofsd', type: 'feature', value: 'auto',
description: 'build virtiofs daemon (virtiofsd)')
+option('libvduse', type: 'feature', value: 'auto',
+   description: 'build VDUSE Library')
 
 option('capstone', type: 'feature', value: 'auto',
description: 'Whether and how to find the capstone library')
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 731e5ea1cf..957a46ec89 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -109,6 +109,7 @@ meson_options_help() {
   printf "%s\n" '  libssh  ssh block device support'
   printf "%s\n" '  libudev Use libudev to enumerate host devices'
   printf "%s\n" '  libusb  libusb support for USB passthrough'
+  printf "%s\n" '  libvduse        build VDUSE Library'
   printf "%s\n" '  linux-aio   Linux AIO support'
   printf "%s\n" '  linux-io-uring  Linux io_uring support'
   printf "%s\n" '  live-block-migration'
@@ -302,6 +303,8 @@ _meson_option_parse() {
 --disable-libudev) printf "%s" -Dlibudev=disabled ;;
 --enable-libusb) printf "%s" -Dlibusb=enabled ;;
 --disable-libusb) printf "%s" -Dlibusb=disabled ;;
+--enable-libvduse) printf "%s" -Dlibvduse=enabled ;;
+--disable-libvduse) printf "%s" -Dlibvduse=disabled ;;
 --enable-linux-aio) printf "%s" -Dlinux_aio=enabled ;;
 --disable-linux-aio) printf "%s" -Dlinux_aio=disabled ;;
 --enable-linux-io-uring) printf "%s" -Dlinux_io_uring=enabled ;;
diff --git a/subprojects/libvduse/include/atomic.h b/subprojects/libvduse/include/atomic.h
new file mode 120000
index 0000000000..8c2be64f7b
--- /dev/null
+++ b/subprojects/libvduse/include/atomic.h
@@ -0,0 +1 @@
+../../../include/qemu/atomic.h
\ No newline at end of file
diff --git a/subprojects/libvduse/include/compiler.h 
b/subprojects/libvduse/include/compiler.

[PATCH v6 2/8] block/export: Fix incorrect length passed to vu_queue_push()

2022-05-23 Thread Xie Yongji
Currently, req->size is set to the correct value only
when handling a VIRTIO_BLK_T_GET_ID request. This patch
sets it correctly for every request type.

Signed-off-by: Xie Yongji 
---
 block/export/vhost-user-blk-server.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index b2e458ade3..19c6ee51d3 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -60,8 +60,7 @@ static void vu_blk_req_complete(VuBlkReq *req)
 {
 VuDev *vu_dev = &req->server->vu_dev;
 
-/* IO size with 1 extra status byte */
-vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1);
+vu_queue_push(vu_dev, req->vq, &req->elem, req->size);
 vu_queue_notify(vu_dev, req->vq);
 
 free(req);
@@ -207,6 +206,7 @@ static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
 goto err;
 }
 
+req->size = iov_size(in_iov, in_num);
 /* We always touch the last byte, so just see how big in_iov is.  */
 req->in = (void *)in_iov[in_num - 1].iov_base
   + in_iov[in_num - 1].iov_len
@@ -267,7 +267,6 @@ static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
   VIRTIO_BLK_ID_BYTES);
 snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk");
 req->in->status = VIRTIO_BLK_S_OK;
-req->size = elem->in_sg[0].iov_len;
 break;
 }
 case VIRTIO_BLK_T_DISCARD:
-- 
2.20.1
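
As I read the diff in patch 2/8, the length reported to the driver becomes the total size of the device-writable buffers (data plus the one-byte status), computed once for every request, instead of a per-request size plus one. A minimal sketch of that computation follows. demo_iov_size() is a hypothetical helper that only mirrors what an iov_size()-style function computes; it is not QEMU's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: sum the lengths of all entries in an iovec array.
 * For a virtio-blk request the "in" (device-writable) vector ends with the
 * status byte, so the sum already includes it.
 */
static size_t demo_iov_size(const struct iovec *iov, unsigned int iov_cnt)
{
    size_t len = 0;

    assert(iov != NULL || iov_cnt == 0);
    for (unsigned int i = 0; i < iov_cnt; i++) {
        len += iov[i].iov_len;
    }
    return len;
}
```

For example, a 4096-byte read whose in-vector holds a data buffer followed by the status byte yields 4097, with no separate "+ 1" at completion time.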




[PATCH v6 4/8] linux-headers: Add vduse.h

2022-05-23 Thread Xie Yongji
This adds the VDUSE header to linux-headers so that the
relevant VDUSE API can be used in subsequent patches.

Signed-off-by: Xie Yongji 
Reviewed-by: Stefan Hajnoczi 
---
 linux-headers/linux/vduse.h | 306 
 scripts/update-linux-headers.sh |   2 +-
 2 files changed, 307 insertions(+), 1 deletion(-)
 create mode 100644 linux-headers/linux/vduse.h

diff --git a/linux-headers/linux/vduse.h b/linux-headers/linux/vduse.h
new file mode 100644
index 0000000000..d47b004ce6
--- /dev/null
+++ b/linux-headers/linux/vduse.h
@@ -0,0 +1,306 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _VDUSE_H_
+#define _VDUSE_H_
+
+#include 
+
+#define VDUSE_BASE 0x81
+
+/* The ioctls for control device (/dev/vduse/control) */
+
+#define VDUSE_API_VERSION  0
+
+/*
+ * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
+ * This is used for future extension.
+ */
+#define VDUSE_GET_API_VERSION  _IOR(VDUSE_BASE, 0x00, __u64)
+
+/* Set the version of VDUSE API that userspace supported. */
+#define VDUSE_SET_API_VERSION  _IOW(VDUSE_BASE, 0x01, __u64)
+
+/**
+ * struct vduse_dev_config - basic configuration of a VDUSE device
+ * @name: VDUSE device name, needs to be NUL terminated
+ * @vendor_id: virtio vendor id
+ * @device_id: virtio device id
+ * @features: virtio features
+ * @vq_num: the number of virtqueues
+ * @vq_align: the allocation alignment of virtqueue's metadata
+ * @reserved: for future use, needs to be initialized to zero
+ * @config_size: the size of the configuration space
+ * @config: the buffer of the configuration space
+ *
+ * Structure used by VDUSE_CREATE_DEV ioctl to create VDUSE device.
+ */
+struct vduse_dev_config {
+#define VDUSE_NAME_MAX 256
+   char name[VDUSE_NAME_MAX];
+   __u32 vendor_id;
+   __u32 device_id;
+   __u64 features;
+   __u32 vq_num;
+   __u32 vq_align;
+   __u32 reserved[13];
+   __u32 config_size;
+   __u8 config[];
+};
+
+/* Create a VDUSE device which is represented by a char device (/dev/vduse/$NAME) */
+#define VDUSE_CREATE_DEV   _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config)
+
+/*
+ * Destroy a VDUSE device. Make sure there are no more references
+ * to the char device (/dev/vduse/$NAME).
+ */
+#define VDUSE_DESTROY_DEV  _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX])
+
+/* The ioctls for VDUSE device (/dev/vduse/$NAME) */
+
+/**
+ * struct vduse_iotlb_entry - entry of IOTLB to describe one IOVA region [start, last]
+ * @offset: the mmap offset on returned file descriptor
+ * @start: start of the IOVA region
+ * @last: last of the IOVA region
+ * @perm: access permission of the IOVA region
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
+ */
+struct vduse_iotlb_entry {
+   __u64 offset;
+   __u64 start;
+   __u64 last;
+#define VDUSE_ACCESS_RO 0x1
+#define VDUSE_ACCESS_WO 0x2
+#define VDUSE_ACCESS_RW 0x3
+   __u8 perm;
+};
+
+/*
+ * Find the first IOVA region that overlaps with the range [start, last]
+ * and return the corresponding file descriptor. Return -EINVAL means the
+ * IOVA region doesn't exist. Caller should set start and last fields.
+ */
+#define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x10, struct vduse_iotlb_entry)
+
+/*
+ * Get the negotiated virtio features. It's a subset of the features in
+ * struct vduse_dev_config which can be accepted by virtio driver. It's
+ * only valid after FEATURES_OK status bit is set.
+ */
+#define VDUSE_DEV_GET_FEATURES _IOR(VDUSE_BASE, 0x11, __u64)
+
+/**
+ * struct vduse_config_data - data used to update configuration space
+ * @offset: the offset from the beginning of configuration space
+ * @length: the length to write to configuration space
+ * @buffer: the buffer used to write from
+ *
+ * Structure used by VDUSE_DEV_SET_CONFIG ioctl to update device
+ * configuration space.
+ */
+struct vduse_config_data {
+   __u32 offset;
+   __u32 length;
+   __u8 buffer[];
+};
+
+/* Set device configuration space */
+#define VDUSE_DEV_SET_CONFIG   _IOW(VDUSE_BASE, 0x12, struct vduse_config_data)
+
+/*
+ * Inject a config interrupt. It's usually used to notify virtio driver
+ * that device configuration space has changed.
+ */
+#define VDUSE_DEV_INJECT_CONFIG_IRQ    _IO(VDUSE_BASE, 0x13)
+
+/**
+ * struct vduse_vq_config - basic configuration of a virtqueue
+ * @index: virtqueue index
+ * @max_size: the max size of virtqueue
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
+ */
+struct vduse_vq_config {
+   __u32 index;
+   __u16 max_size;
+   __u16 reserved[13];
+};
+
+/*
+ * Setup the specified virtqueue. Make sure all virtqueues have been
+ * configured before the device is attached to vDPA bus.
+ */
+#define VDUSE_VQ_SETUP _IOW(VDUSE_BASE, 0x14, struct vduse_vq_config)
+
+/**
+ * struct vduse_vq_state_spli

[PATCH v6 3/8] block/export: Abstract out the logic of virtio-blk I/O process

2022-05-23 Thread Xie Yongji
Abstract the common virtio-blk I/O processing logic into a function
named virtio_blk_process_req(). It is needed by the following commit.

Signed-off-by: Xie Yongji 
---
 MAINTAINERS  |   2 +
 block/export/meson.build |   2 +-
 block/export/vhost-user-blk-server.c | 259 +++
 block/export/virtio-blk-handler.c| 240 +
 block/export/virtio-blk-handler.h|  37 
 5 files changed, 301 insertions(+), 239 deletions(-)
 create mode 100644 block/export/virtio-blk-handler.c
 create mode 100644 block/export/virtio-blk-handler.h

diff --git a/MAINTAINERS b/MAINTAINERS
index dff0200f70..01fb25421b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3540,6 +3540,8 @@ M: Coiby Xu 
 S: Maintained
 F: block/export/vhost-user-blk-server.c
 F: block/export/vhost-user-blk-server.h
+F: block/export/virtio-blk-handler.c
+F: block/export/virtio-blk-handler.h
 F: include/qemu/vhost-user-server.h
 F: tests/qtest/libqos/vhost-user-blk.c
 F: tests/qtest/libqos/vhost-user-blk.h
diff --git a/block/export/meson.build b/block/export/meson.build
index 0a08e384c7..431e47ca51 100644
--- a/block/export/meson.build
+++ b/block/export/meson.build
@@ -1,7 +1,7 @@
 blockdev_ss.add(files('export.c'))
 
 if have_vhost_user_blk_server
-blockdev_ss.add(files('vhost-user-blk-server.c'))
+blockdev_ss.add(files('vhost-user-blk-server.c', 'virtio-blk-handler.c'))
 endif
 
 blockdev_ss.add(when: fuse, if_true: files('fuse.c'))
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index 19c6ee51d3..c9c290cc4c 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -17,31 +17,15 @@
 #include "vhost-user-blk-server.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
-#include "sysemu/block-backend.h"
 #include "util/block-helpers.h"
-
-/*
- * Sector units are 512 bytes regardless of the
- * virtio_blk_config->blk_size value.
- */
-#define VIRTIO_BLK_SECTOR_BITS 9
-#define VIRTIO_BLK_SECTOR_SIZE (1ull << VIRTIO_BLK_SECTOR_BITS)
+#include "virtio-blk-handler.h"
 
 enum {
 VHOST_USER_BLK_NUM_QUEUES_DEFAULT = 1,
-VHOST_USER_BLK_MAX_DISCARD_SECTORS = 32768,
-VHOST_USER_BLK_MAX_WRITE_ZEROES_SECTORS = 32768,
-};
-struct virtio_blk_inhdr {
-unsigned char status;
 };
 
 typedef struct VuBlkReq {
 VuVirtqElement elem;
-int64_t sector_num;
-size_t size;
-struct virtio_blk_inhdr *in;
-struct virtio_blk_outhdr out;
 VuServer *server;
 struct VuVirtq *vq;
 } VuBlkReq;
@@ -50,247 +34,44 @@ typedef struct VuBlkReq {
 typedef struct {
 BlockExport export;
 VuServer vu_server;
-uint32_t blk_size;
+VirtioBlkHandler handler;
 QIOChannelSocket *sioc;
 struct virtio_blk_config blkcfg;
-bool writable;
 } VuBlkExport;
 
-static void vu_blk_req_complete(VuBlkReq *req)
+static void vu_blk_req_complete(VuBlkReq *req, size_t in_len)
 {
 VuDev *vu_dev = &req->server->vu_dev;
 
-vu_queue_push(vu_dev, req->vq, &req->elem, req->size);
+vu_queue_push(vu_dev, req->vq, &req->elem, in_len);
 vu_queue_notify(vu_dev, req->vq);
 
 free(req);
 }
 
-static bool vu_blk_sect_range_ok(VuBlkExport *vexp, uint64_t sector,
- size_t size)
-{
-uint64_t nb_sectors;
-uint64_t total_sectors;
-
-if (size % VIRTIO_BLK_SECTOR_SIZE) {
-return false;
-}
-
-nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS;
-
-QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE);
-if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-return false;
-}
-if ((sector << VIRTIO_BLK_SECTOR_BITS) % vexp->blk_size) {
-return false;
-}
-blk_get_geometry(vexp->export.blk, &total_sectors);
-if (sector > total_sectors || nb_sectors > total_sectors - sector) {
-return false;
-}
-return true;
-}
-
-static int coroutine_fn
-vu_blk_discard_write_zeroes(VuBlkExport *vexp, struct iovec *iov,
-uint32_t iovcnt, uint32_t type)
-{
-BlockBackend *blk = vexp->export.blk;
-struct virtio_blk_discard_write_zeroes desc;
-ssize_t size;
-uint64_t sector;
-uint32_t num_sectors;
-uint32_t max_sectors;
-uint32_t flags;
-int bytes;
-
-/* Only one desc is currently supported */
-if (unlikely(iov_size(iov, iovcnt) > sizeof(desc))) {
-return VIRTIO_BLK_S_UNSUPP;
-}
-
-size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
-if (unlikely(size != sizeof(desc))) {
-error_report("Invalid size %zd, expected %zu", size, sizeof(desc));
-return VIRTIO_BLK_S_IOERR;
-}
-
-sector = le64_to_cpu(desc.sector);
-num_sectors = le32_to_cpu(desc.num_sectors);
-flags = le32_to

[PATCH v5 5/8] libvduse: Add VDUSE (vDPA Device in Userspace) library

2022-05-04 Thread Xie Yongji
VDUSE [1] is a Linux framework that makes it possible to implement
software-emulated vDPA devices in userspace. This adds a library
as a subproject to help implement VDUSE backends in QEMU.

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html

Signed-off-by: Xie Yongji 
---
 MAINTAINERS |5 +
 meson.build |   15 +
 meson_options.txt   |2 +
 scripts/meson-buildoptions.sh   |3 +
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/include/compiler.h |1 +
 subprojects/libvduse/libvduse.c | 1161 +++
 subprojects/libvduse/libvduse.h |  235 
 subprojects/libvduse/linux-headers/linux|1 +
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 11 files changed, 1435 insertions(+)
 create mode 120000 subprojects/libvduse/include/atomic.h
 create mode 120000 subprojects/libvduse/include/compiler.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 120000 subprojects/libvduse/linux-headers/linux
 create mode 100644 subprojects/libvduse/meson.build
 create mode 120000 subprojects/libvduse/standard-headers/linux

diff --git a/MAINTAINERS b/MAINTAINERS
index 4113b6fc5c..6de3cbaa1e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3550,6 +3550,11 @@ L: qemu-block@nongnu.org
 S: Supported
 F: block/export/fuse.c
 
+VDUSE library
+M: Xie Yongji 
+S: Maintained
+F: subprojects/libvduse/
+
 Replication
 M: Wen Congyang 
 M: Xie Changlong 
diff --git a/meson.build b/meson.build
index 1fe7d257ff..4a0f1a2016 100644
--- a/meson.build
+++ b/meson.build
@@ -1392,6 +1392,21 @@ if get_option('fuse_lseek').allowed()
   endif
 endif
 
+have_libvduse = (targetos == 'linux')
+if get_option('libvduse').enabled()
+if targetos != 'linux'
+error('libvduse requires linux')
+endif
+elif get_option('libvduse').disabled()
+have_libvduse = false
+endif
+
+libvduse = not_found
+if have_libvduse
+  libvduse_proj = subproject('libvduse')
+  libvduse = libvduse_proj.get_variable('libvduse_dep')
+endif
+
 # libbpf
 libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
 if libbpf.found() and not cc.links('''
diff --git a/meson_options.txt b/meson_options.txt
index af432a4ee6..47955f972d 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -231,6 +231,8 @@ option('virtfs', type: 'feature', value: 'auto',
description: 'virtio-9p support')
 option('virtiofsd', type: 'feature', value: 'auto',
description: 'build virtiofs daemon (virtiofsd)')
+option('libvduse', type: 'feature', value: 'auto',
+   description: 'build VDUSE Library')
 
 option('capstone', type: 'combo', value: 'auto',
choices: ['disabled', 'enabled', 'auto', 'system', 'internal'],
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 21366b2102..f725636ea8 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -81,6 +81,7 @@ meson_options_help() {
   printf "%s\n" '  libssh  ssh block device support'
   printf "%s\n" '  libudev Use libudev to enumerate host devices'
   printf "%s\n" '  libusb  libusb support for USB passthrough'
+  printf "%s\n" '  libvduse        build VDUSE Library'
   printf "%s\n" '  linux-aio   Linux AIO support'
   printf "%s\n" '  linux-io-uring  Linux io_uring support'
   printf "%s\n" '  live-block-migration'
@@ -255,6 +256,8 @@ _meson_option_parse() {
 --disable-libudev) printf "%s" -Dlibudev=disabled ;;
 --enable-libusb) printf "%s" -Dlibusb=enabled ;;
 --disable-libusb) printf "%s" -Dlibusb=disabled ;;
+--enable-libvduse) printf "%s" -Dlibvduse=enabled ;;
+--disable-libvduse) printf "%s" -Dlibvduse=disabled ;;
 --enable-linux-aio) printf "%s" -Dlinux_aio=enabled ;;
 --disable-linux-aio) printf "%s" -Dlinux_aio=disabled ;;
 --enable-linux-io-uring) printf "%s" -Dlinux_io_uring=enabled ;;
diff --git a/subprojects/libvduse/include/atomic.h b/subprojects/libvduse/include/atomic.h
new file mode 120000
index 0000000000..8c2be64f7b
--- /dev/null
+++ b/subprojects/libvduse/include/atomic.h
@@ -0,0 +1 @@
+../../../include/qemu/atomic.h
\ No newline at end of file
diff --git a/subprojects/libvduse/include/compiler.h 

[PATCH v5 7/8] vduse-blk: Add vduse-blk resize support

2022-05-04 Thread Xie Yongji
To support block resize, this uses vduse_dev_update_config()
in the block resize callback to update the capacity field in the
configuration space and inject a config interrupt.

Signed-off-by: Xie Yongji 
Reviewed-by: Stefan Hajnoczi 
---
 block/export/vduse-blk.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 8580ae929f..2b72baf7ab 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -188,6 +188,23 @@ static void blk_aio_detach(void *opaque)
 vblk_exp->export.ctx = NULL;
 }
 
+static void vduse_blk_resize(void *opaque)
+{
+BlockExport *exp = opaque;
+VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+struct virtio_blk_config config;
+
+config.capacity =
+cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
+vduse_dev_update_config(vblk_exp->dev, sizeof(config.capacity),
+offsetof(struct virtio_blk_config, capacity),
+(char *)&config.capacity);
+}
+
+static const BlockDevOps vduse_block_ops = {
+.resize_cb = vduse_blk_resize,
+};
+
 static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 Error **errp)
 {
@@ -284,6 +301,8 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
  vblk_exp);
 
+blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
+
 return 0;
 }
 
@@ -293,6 +312,7 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
 vblk_exp);
+blk_set_dev_ops(exp->blk, NULL, NULL);
 vduse_dev_destroy(vblk_exp->dev);
 }
 
-- 
2.20.1
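
The capacity update in patch 7/8 boils down to two steps: convert the image length from bytes to 512-byte sectors, then store the result in little-endian order at the offset of the capacity field. The sketch below shows only that arithmetic. struct demo_blk_config, DEMO_SECTOR_BITS and both helper functions are hypothetical and merely mimic the start of struct virtio_blk_config; the real patch uses cpu_to_le64() and vduse_dev_update_config().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: sectors are 512 bytes, as with VIRTIO_BLK_SECTOR_BITS. */
#define DEMO_SECTOR_BITS 9

/* Hypothetical layout mimicking the first fields of virtio_blk_config. */
struct demo_blk_config {
    uint64_t capacity;   /* little-endian, in 512-byte sectors */
    uint32_t size_max;
};

/* Store v in little-endian byte order, independent of host endianness. */
static void demo_store_le64(uint8_t out[8], uint64_t v)
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/*
 * Fill out[] with the new capacity bytes and return the config-space
 * offset at which they would be written.
 */
static size_t demo_capacity_update(uint64_t length_bytes, uint8_t out[8])
{
    demo_store_le64(out, length_bytes >> DEMO_SECTOR_BITS);
    return offsetof(struct demo_blk_config, capacity);
}
```

For a 1 MiB image this produces 2048 sectors (0x800), written at offset 0 of the configuration space.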




[PATCH v5 8/8] libvduse: Add support for reconnecting

2022-05-04 Thread Xie Yongji
To support reconnecting after a restart or crash, the VDUSE backend
might need to resubmit inflight I/Os. This stores metadata,
such as the indexes of the inflight I/Os' descriptors, in a shm file so
that the VDUSE backend can restore them when reconnecting.

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c|  14 ++
 subprojects/libvduse/libvduse.c | 235 +++-
 subprojects/libvduse/libvduse.h |  12 ++
 3 files changed, 256 insertions(+), 5 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 2b72baf7ab..1fd787a862 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -33,6 +33,7 @@ typedef struct VduseBlkExport {
 VduseDev *dev;
 uint16_t num_queues;
 bool writable;
+char *recon_file;
 } VduseBlkExport;
 
 typedef struct VduseBlkReq {
@@ -111,6 +112,8 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
 
 aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+/* Make sure we don't miss any kick after reconnecting */
+eventfd_write(vduse_queue_get_fd(vq), 1);
 }
 
 static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
@@ -291,6 +294,15 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 return -ENOMEM;
 }
 
+vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
+   g_get_tmp_dir(), exp->id);
+if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
+error_setg(errp, "failed to set reconnect log file");
+vduse_dev_destroy(vblk_exp->dev);
+g_free(vblk_exp->recon_file);
+return -EINVAL;
+}
+
 for (i = 0; i < num_queues; i++) {
 vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
 }
@@ -314,6 +326,8 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 vblk_exp);
 blk_set_dev_ops(exp->blk, NULL, NULL);
 vduse_dev_destroy(vblk_exp->dev);
+unlink(vblk_exp->recon_file);
+g_free(vblk_exp->recon_file);
 }
 
 static void vduse_blk_exp_request_shutdown(BlockExport *exp)
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index ecee9c0568..b27145ceed 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -41,6 +41,8 @@
 #define VDUSE_VQ_ALIGN 4096
 #define MAX_IOVA_REGIONS 256
 
+#define LOG_ALIGNMENT 64
+
 /* Round number down to multiple */
 #define ALIGN_DOWN(n, m) ((n) / (m) * (m))
 
@@ -51,6 +53,31 @@
 #define unlikely(x)   __builtin_expect(!!(x), 0)
 #endif
 
+typedef struct VduseDescStateSplit {
+uint8_t inflight;
+uint8_t padding[5];
+uint16_t next;
+uint64_t counter;
+} VduseDescStateSplit;
+
+typedef struct VduseVirtqLogInflight {
+uint64_t features;
+uint16_t version;
+uint16_t desc_num;
+uint16_t last_batch_head;
+uint16_t used_idx;
+VduseDescStateSplit desc[];
+} VduseVirtqLogInflight;
+
+typedef struct VduseVirtqLog {
+VduseVirtqLogInflight inflight;
+} VduseVirtqLog;
+
+typedef struct VduseVirtqInflightDesc {
+uint16_t index;
+uint64_t counter;
+} VduseVirtqInflightDesc;
+
 typedef struct VduseRing {
 unsigned int num;
 uint64_t desc_addr;
@@ -73,6 +100,10 @@ struct VduseVirtq {
 bool ready;
 int fd;
 VduseDev *dev;
+VduseVirtqInflightDesc *resubmit_list;
+uint16_t resubmit_num;
+uint64_t counter;
+VduseVirtqLog *log;
 };
 
 typedef struct VduseIovaRegion {
@@ -96,8 +127,36 @@ struct VduseDev {
 int fd;
 int ctrl_fd;
 void *priv;
+void *log;
 };
 
+static inline size_t vduse_vq_log_size(uint16_t queue_size)
+{
+return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size +
+sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT);
+}
+
+static void *vduse_log_get(const char *filename, size_t size)
+{
+void *ptr = MAP_FAILED;
+int fd;
+
+fd = open(filename, O_RDWR | O_CREAT, 0600);
+if (fd == -1) {
+return MAP_FAILED;
+}
+
+if (ftruncate(fd, size) == -1) {
+goto out;
+}
+
+ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+out:
+close(fd);
+return ptr;
+}
+
 static inline bool has_feature(uint64_t features, unsigned int fbit)
 {
 assert(fbit < 64);
@@ -148,6 +207,105 @@ static int vduse_inject_irq(VduseDev *dev, int index)
 return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, &index);
 }
 
+static int inflight_desc_compare(const void *a, const void *b)
+{
+VduseVirtqInflightDesc *desc0 = (VduseVirtqInflightDesc *)a,
+   *desc1 = (VduseVirtqInflightDesc *)b;
+
+if (desc1->counter > desc0->counter &&
+(desc1->counter - desc0->counter) < VIRTQUEUE_MAX_SIZE * 2) {
+return 1;
+}
+
+retu

[PATCH v5 3/8] block/export: Abstract out the logic of virtio-blk I/O process

2022-05-04 Thread Xie Yongji
Abstract the common virtio-blk I/O processing logic into a function
named virtio_blk_process_req(). It is needed by the following commit.

Signed-off-by: Xie Yongji 
---
 MAINTAINERS  |   2 +
 block/export/meson.build |   2 +-
 block/export/vhost-user-blk-server.c | 249 ++-
 block/export/virtio-blk-handler.c| 237 +
 block/export/virtio-blk-handler.h|  33 
 5 files changed, 287 insertions(+), 236 deletions(-)
 create mode 100644 block/export/virtio-blk-handler.c
 create mode 100644 block/export/virtio-blk-handler.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 294c88ace9..4113b6fc5c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3536,6 +3536,8 @@ M: Coiby Xu 
 S: Maintained
 F: block/export/vhost-user-blk-server.c
 F: block/export/vhost-user-blk-server.h
+F: block/export/virtio-blk-handler.c
+F: block/export/virtio-blk-handler.h
 F: include/qemu/vhost-user-server.h
 F: tests/qtest/libqos/vhost-user-blk.c
 F: tests/qtest/libqos/vhost-user-blk.h
diff --git a/block/export/meson.build b/block/export/meson.build
index 0a08e384c7..431e47ca51 100644
--- a/block/export/meson.build
+++ b/block/export/meson.build
@@ -1,7 +1,7 @@
 blockdev_ss.add(files('export.c'))
 
 if have_vhost_user_blk_server
-blockdev_ss.add(files('vhost-user-blk-server.c'))
+blockdev_ss.add(files('vhost-user-blk-server.c', 'virtio-blk-handler.c'))
 endif
 
 blockdev_ss.add(when: fuse, if_true: files('fuse.c'))
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index a129204c44..8705f7c27c 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -17,31 +17,15 @@
 #include "vhost-user-blk-server.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
-#include "sysemu/block-backend.h"
 #include "util/block-helpers.h"
-
-/*
- * Sector units are 512 bytes regardless of the
- * virtio_blk_config->blk_size value.
- */
-#define VIRTIO_BLK_SECTOR_BITS 9
-#define VIRTIO_BLK_SECTOR_SIZE (1ull << VIRTIO_BLK_SECTOR_BITS)
+#include "virtio-blk-handler.h"
 
 enum {
 VHOST_USER_BLK_NUM_QUEUES_DEFAULT = 1,
-VHOST_USER_BLK_MAX_DISCARD_SECTORS = 32768,
-VHOST_USER_BLK_MAX_WRITE_ZEROES_SECTORS = 32768,
-};
-struct virtio_blk_inhdr {
-unsigned char status;
 };
 
 typedef struct VuBlkReq {
 VuVirtqElement elem;
-int64_t sector_num;
-size_t size;
-struct virtio_blk_inhdr *in;
-struct virtio_blk_outhdr out;
 VuServer *server;
 struct VuVirtq *vq;
 } VuBlkReq;
@@ -50,248 +34,44 @@ typedef struct VuBlkReq {
 typedef struct {
 BlockExport export;
 VuServer vu_server;
-uint32_t blk_size;
 QIOChannelSocket *sioc;
 struct virtio_blk_config blkcfg;
 bool writable;
 } VuBlkExport;
 
-static void vu_blk_req_complete(VuBlkReq *req)
+static void vu_blk_req_complete(VuBlkReq *req, size_t in_len)
 {
 VuDev *vu_dev = &req->server->vu_dev;
 
-/* IO size with 1 extra status byte */
-vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1);
+vu_queue_push(vu_dev, req->vq, &req->elem, in_len);
 vu_queue_notify(vu_dev, req->vq);
 
 free(req);
 }
 
-static bool vu_blk_sect_range_ok(VuBlkExport *vexp, uint64_t sector,
- size_t size)
-{
-uint64_t nb_sectors;
-uint64_t total_sectors;
-
-if (size % VIRTIO_BLK_SECTOR_SIZE) {
-return false;
-}
-
-nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS;
-
-QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE);
-if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-return false;
-}
-if ((sector << VIRTIO_BLK_SECTOR_BITS) % vexp->blk_size) {
-return false;
-}
-blk_get_geometry(vexp->export.blk, &total_sectors);
-if (sector > total_sectors || nb_sectors > total_sectors - sector) {
-return false;
-}
-return true;
-}
-
-static int coroutine_fn
-vu_blk_discard_write_zeroes(VuBlkExport *vexp, struct iovec *iov,
-uint32_t iovcnt, uint32_t type)
-{
-BlockBackend *blk = vexp->export.blk;
-struct virtio_blk_discard_write_zeroes desc;
-ssize_t size;
-uint64_t sector;
-uint32_t num_sectors;
-uint32_t max_sectors;
-uint32_t flags;
-int bytes;
-
-/* Only one desc is currently supported */
-if (unlikely(iov_size(iov, iovcnt) > sizeof(desc))) {
-return VIRTIO_BLK_S_UNSUPP;
-}
-
-size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
-if (unlikely(size != sizeof(desc))) {
-error_report("Invalid size %zd, expected %zu", size, sizeof(desc));
-return VIRTIO_BLK_S_IOERR;
-}
-
-sector = le64_to_cpu(desc.sector);
-num_sectors = le32_to_cpu(desc.num_sectors)

[PATCH v5 6/8] vduse-blk: Implement vduse-blk export

2022-05-04 Thread Xie Yongji
This implements a VDUSE block backend based on
the libvduse library. We can use it to export BDSs
for both VM and container (host) usage.

The new command-line syntax is:

$ qemu-storage-daemon \
--blockdev file,node-name=drive0,filename=test.img \
--export vduse-blk,node-name=drive0,id=vduse-export0,writable=on

After qemu-storage-daemon has started, we need to use
the "vdpa" command to attach the device to the vDPA bus:

$ vdpa dev add name vduse-export0 mgmtdev vduse

Also the device must be removed via the "vdpa" command
before we stop the qemu-storage-daemon.

Signed-off-by: Xie Yongji 
---
 MAINTAINERS   |   4 +-
 block/export/export.c |   6 +
 block/export/meson.build  |   5 +
 block/export/vduse-blk.c  | 312 ++
 block/export/vduse-blk.h  |  20 +++
 meson.build   |  13 ++
 meson_options.txt |   2 +
 qapi/block-export.json|  25 ++-
 scripts/meson-buildoptions.sh |   4 +
 9 files changed, 388 insertions(+), 3 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6de3cbaa1e..704489e50d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3550,10 +3550,12 @@ L: qemu-block@nongnu.org
 S: Supported
 F: block/export/fuse.c
 
-VDUSE library
+VDUSE library and block device exports
 M: Xie Yongji 
 S: Maintained
 F: subprojects/libvduse/
+F: block/export/vduse-blk.c
+F: block/export/vduse-blk.h
 
 Replication
 M: Wen Congyang 
diff --git a/block/export/export.c b/block/export/export.c
index 7253af3bc3..4744862915 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -26,6 +26,9 @@
 #ifdef CONFIG_VHOST_USER_BLK_SERVER
 #include "vhost-user-blk-server.h"
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+#include "vduse-blk.h"
+#endif
 
 static const BlockExportDriver *blk_exp_drivers[] = {
 &blk_exp_nbd,
@@ -35,6 +38,9 @@ static const BlockExportDriver *blk_exp_drivers[] = {
 #ifdef CONFIG_FUSE
 &blk_exp_fuse,
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+&blk_exp_vduse_blk,
+#endif
 };
 
 /* Only accessed from the main thread */
diff --git a/block/export/meson.build b/block/export/meson.build
index 431e47ca51..c60116f455 100644
--- a/block/export/meson.build
+++ b/block/export/meson.build
@@ -5,3 +5,8 @@ if have_vhost_user_blk_server
 endif
 
 blockdev_ss.add(when: fuse, if_true: files('fuse.c'))
+
+if have_vduse_blk_export
+blockdev_ss.add(files('vduse-blk.c', 'virtio-blk-handler.c'))
+blockdev_ss.add(libvduse)
+endif
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
new file mode 100644
index 0000000000..8580ae929f
--- /dev/null
+++ b/block/export/vduse-blk.c
@@ -0,0 +1,312 @@
+/*
+ * Export QEMU block device via VDUSE
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
+ *   Portions of codes and concepts borrowed from vhost-user-blk-server.c, so:
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * Author:
+ *   Xie Yongji 
+ *   Coiby Xu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include 
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "block/export.h"
+#include "qemu/error-report.h"
+#include "util/block-helpers.h"
+#include "subprojects/libvduse/libvduse.h"
+#include "virtio-blk-handler.h"
+
+#include "standard-headers/linux/virtio_blk.h"
+
+#define VDUSE_DEFAULT_NUM_QUEUE 1
+#define VDUSE_DEFAULT_QUEUE_SIZE 256
+
+typedef struct VduseBlkExport {
+BlockExport export;
+VduseDev *dev;
+uint16_t num_queues;
+bool writable;
+} VduseBlkExport;
+
+typedef struct VduseBlkReq {
+VduseVirtqElement elem;
+VduseVirtq *vq;
+} VduseBlkReq;
+
+static void vduse_blk_req_complete(VduseBlkReq *req, size_t in_len)
+{
+vduse_queue_push(req->vq, &req->elem, in_len);
+vduse_queue_notify(req->vq);
+
+free(req);
+}
+
+static void coroutine_fn vduse_blk_virtio_process_req(void *opaque)
+{
+VduseBlkReq *req = opaque;
+VduseVirtq *vq = req->vq;
+VduseDev *dev = vduse_queue_get_dev(vq);
+VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+BlockBackend *blk = vblk_exp->export.blk;
+VduseVirtqElement *elem = &req->elem;
+struct iovec *in_iov = elem->in_sg;
+struct iovec *out_iov = elem->out_sg;
+unsigned in_num = elem->in_num;
+unsigned out_num = elem->out_num;
+int in_len;
+
+in_len = virtio_blk_process_req(blk, vblk_exp->writable,
+vblk_exp->export.id, in_iov,
+out_iov, in_num, out_num);
+if (in_len < 0) {
+free(req);
+return;
+}
+
+vduse_blk_req_complet

[PATCH v5 2/8] block-backend: Introduce blk_get_guest_block_size()

2022-05-04 Thread Xie Yongji
Support getting the guest block size for the block backend.
It's needed for the following commit.

Signed-off-by: Xie Yongji 
---
 block/block-backend.c | 6 ++
 include/sysemu/block-backend-io.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 35457a6a1d..1582ff81c9 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2106,6 +2106,12 @@ void blk_set_guest_block_size(BlockBackend *blk, int align)
 blk->guest_block_size = align;
 }
 
+int blk_get_guest_block_size(BlockBackend *blk)
+{
+IO_CODE();
+return blk->guest_block_size;
+}
+
 void *blk_try_blockalign(BlockBackend *blk, size_t size)
 {
 IO_CODE();
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index 6517c39295..7600822196 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -73,6 +73,7 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
+int blk_get_guest_block_size(BlockBackend *blk);
 
 void blk_io_plug(BlockBackend *blk);
 void blk_io_unplug(BlockBackend *blk);
-- 
2.20.1




[PATCH v5 1/8] block: Support passing NULL ops to blk_set_dev_ops()

2022-05-04 Thread Xie Yongji
This supports passing NULL ops to blk_set_dev_ops()
so that we can remove stale ops in some cases.

Signed-off-by: Xie Yongji 
---
 block/block-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index e0e1aff4b1..35457a6a1d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1062,7 +1062,7 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
 blk->dev_opaque = opaque;
 
 /* Are we currently quiesced? Should we enforce this right now? */
-if (blk->quiesce_counter && ops->drained_begin) {
+if (blk->quiesce_counter && ops && ops->drained_begin) {
 ops->drained_begin(opaque);
 }
 }
-- 
2.20.1
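
For illustration only, the guard added by this patch can be reduced to a
small standalone sketch. The Demo* types and demo_* functions below are
hypothetical stand-ins, not QEMU's BlockBackend or BlockDevOps; the sketch
only shows that testing ops before ops->drained_begin lets a caller pass
NULL to clear stale callbacks, even while the backend is quiesced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDevOps: a table of optional callbacks. */
typedef struct DemoDevOps {
    void (*drained_begin)(void *opaque);
} DemoDevOps;

/* Hypothetical stand-in for the parts of BlockBackend used here. */
typedef struct DemoBackend {
    const DemoDevOps *dev_ops;
    void *dev_opaque;
    int quiesce_counter;
} DemoBackend;

/* Install the device callbacks, or clear them when ops is NULL. */
static void demo_set_dev_ops(DemoBackend *blk, const DemoDevOps *ops,
                             void *opaque)
{
    assert(blk != NULL);

    blk->dev_ops = ops;
    blk->dev_opaque = opaque;

    /* Test ops first so that a NULL table is never dereferenced. */
    if (blk->quiesce_counter && ops && ops->drained_begin) {
        ops->drained_begin(opaque);
    }
}

/* Example callback: counts how many times it has been invoked. */
static void demo_count_drained(void *opaque)
{
    ++*(int *)opaque;
}
```

With a quiesced backend, installing a table that sets drained_begin to
demo_count_drained invokes the callback once, and a later call with
ops == NULL simply clears the pointers without touching the old table.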




[PATCH v5 4/8] linux-headers: Add vduse.h

2022-05-04 Thread Xie Yongji
This adds the VDUSE header to linux-headers so that the
relevant VDUSE API can be used in subsequent patches.

Signed-off-by: Xie Yongji 
---
 linux-headers/linux/vduse.h | 306 
 scripts/update-linux-headers.sh |   2 +-
 2 files changed, 307 insertions(+), 1 deletion(-)
 create mode 100644 linux-headers/linux/vduse.h

diff --git a/linux-headers/linux/vduse.h b/linux-headers/linux/vduse.h
new file mode 100644
index 0000000000..d47b004ce6
--- /dev/null
+++ b/linux-headers/linux/vduse.h
@@ -0,0 +1,306 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _VDUSE_H_
+#define _VDUSE_H_
+
+#include 
+
+#define VDUSE_BASE 0x81
+
+/* The ioctls for control device (/dev/vduse/control) */
+
+#define VDUSE_API_VERSION  0
+
+/*
+ * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
+ * This is used for future extension.
+ */
+#define VDUSE_GET_API_VERSION  _IOR(VDUSE_BASE, 0x00, __u64)
+
+/* Set the version of VDUSE API that userspace supported. */
+#define VDUSE_SET_API_VERSION  _IOW(VDUSE_BASE, 0x01, __u64)
+
+/**
+ * struct vduse_dev_config - basic configuration of a VDUSE device
+ * @name: VDUSE device name, needs to be NUL terminated
+ * @vendor_id: virtio vendor id
+ * @device_id: virtio device id
+ * @features: virtio features
+ * @vq_num: the number of virtqueues
+ * @vq_align: the allocation alignment of virtqueue's metadata
+ * @reserved: for future use, needs to be initialized to zero
+ * @config_size: the size of the configuration space
+ * @config: the buffer of the configuration space
+ *
+ * Structure used by VDUSE_CREATE_DEV ioctl to create VDUSE device.
+ */
+struct vduse_dev_config {
+#define VDUSE_NAME_MAX 256
+   char name[VDUSE_NAME_MAX];
+   __u32 vendor_id;
+   __u32 device_id;
+   __u64 features;
+   __u32 vq_num;
+   __u32 vq_align;
+   __u32 reserved[13];
+   __u32 config_size;
+   __u8 config[];
+};
+
+/* Create a VDUSE device which is represented by a char device (/dev/vduse/$NAME) */
+#define VDUSE_CREATE_DEV   _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config)
+
+/*
+ * Destroy a VDUSE device. Make sure there are no more references
+ * to the char device (/dev/vduse/$NAME).
+ */
+#define VDUSE_DESTROY_DEV  _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX])
+
+/* The ioctls for VDUSE device (/dev/vduse/$NAME) */
+
+/**
+ * struct vduse_iotlb_entry - entry of IOTLB to describe one IOVA region [start, last]
+ * @offset: the mmap offset on returned file descriptor
+ * @start: start of the IOVA region
+ * @last: last of the IOVA region
+ * @perm: access permission of the IOVA region
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
+ */
+struct vduse_iotlb_entry {
+   __u64 offset;
+   __u64 start;
+   __u64 last;
+#define VDUSE_ACCESS_RO 0x1
+#define VDUSE_ACCESS_WO 0x2
+#define VDUSE_ACCESS_RW 0x3
+   __u8 perm;
+};
+
+/*
+ * Find the first IOVA region that overlaps with the range [start, last]
+ * and return the corresponding file descriptor. Return -EINVAL means the
+ * IOVA region doesn't exist. Caller should set start and last fields.
+ */
+#define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x10, struct vduse_iotlb_entry)
+
+/*
+ * Get the negotiated virtio features. It's a subset of the features in
+ * struct vduse_dev_config which can be accepted by virtio driver. It's
+ * only valid after FEATURES_OK status bit is set.
+ */
+#define VDUSE_DEV_GET_FEATURES _IOR(VDUSE_BASE, 0x11, __u64)
+
+/**
+ * struct vduse_config_data - data used to update configuration space
+ * @offset: the offset from the beginning of configuration space
+ * @length: the length to write to configuration space
+ * @buffer: the buffer used to write from
+ *
+ * Structure used by VDUSE_DEV_SET_CONFIG ioctl to update device
+ * configuration space.
+ */
+struct vduse_config_data {
+   __u32 offset;
+   __u32 length;
+   __u8 buffer[];
+};
+
+/* Set device configuration space */
+#define VDUSE_DEV_SET_CONFIG   _IOW(VDUSE_BASE, 0x12, struct vduse_config_data)
+
+/*
+ * Inject a config interrupt. It's usually used to notify virtio driver
+ * that device configuration space has changed.
+ */
+#define VDUSE_DEV_INJECT_CONFIG_IRQ _IO(VDUSE_BASE, 0x13)
+
+/**
+ * struct vduse_vq_config - basic configuration of a virtqueue
+ * @index: virtqueue index
+ * @max_size: the max size of virtqueue
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
+ */
+struct vduse_vq_config {
+   __u32 index;
+   __u16 max_size;
+   __u16 reserved[13];
+};
+
+/*
+ * Setup the specified virtqueue. Make sure all virtqueues have been
+ * configured before the device is attached to vDPA bus.
+ */
+#define VDUSE_VQ_SETUP _IOW(VDUSE_BASE, 0x14, struct vduse_vq_config)
+
+/**
+ * struct vduse_vq_state_split - split virtqueue state
+ * @ava

[PATCH v5 0/8] Support exporting BDSs via VDUSE

2022-05-04 Thread Xie Yongji
Hi all,

A few months ago, VDUSE (vDPA Device in Userspace) [1] was
merged into the Linux kernel as a framework that makes it
possible to emulate a vDPA device in userspace. This series
aims to implement a VDUSE block backend based on the
qemu-storage-daemon infrastructure.

To support that, we first introduce a VDUSE library as a
subproject (like libvhost-user) to help implement VDUSE
backends in QEMU. Then a VDUSE block export is implemented
based on this library. Finally, we add resize and reconnect support
to the VDUSE block export and the VDUSE library.

Since we don't support vdpa-blk in QEMU currently, the VM case is
tested with my previous patchset [2].

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html
[2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg797569.html

Please review, thanks!

V4 to V5:
- Abstract out the virtio-blk I/O processing logic from
  vhost-user-blk-server and reuse it [Kevin]
- Fix missing VIRTIO_BLK_F_FLUSH [Kevin]
- Support discard and write_zeroes [Kevin]
- Rebase to the newest tree

V3 to V4:
- Fix some comments on QAPI [Eric]

V2 to V3:
- Introduce vduse_get_virtio_features() [Stefan]
- Update MAINTAINERS file [Stefan]
- Fix handler of VIRTIO_BLK_T_GET_ID request [Stefan]
- Add barrier for vduse_queue_inflight_get() [Stefan]

V1 to V2:
- Move vduse header to linux-headers [Stefan]
- Add two new APIs to support creating a device from /dev/vduse/$NAME or
  a file descriptor [Stefan]
- Check VIRTIO_F_VERSION_1 during initialization [Stefan]
- Replace malloc() + memset() with calloc() [Stefan]
- Increase default queue size to 256 for vduse-blk [Stefan]
- Zero-initialize virtio-blk config space [Stefan]
- Add a patch to support resetting blk->dev_ops
- Validate vq->log->inflight fields [Stefan]
- Add vduse_set_reconnect_log_file() API to support specifying the
  reconnect log file
- Fix some bugs [Stefan]

Xie Yongji (8):
  block: Support passing NULL ops to blk_set_dev_ops()
  block-backend: Introduce blk_get_guest_block_size()
  block/export: Abstract out the logic of virtio-blk I/O process
  linux-headers: Add vduse.h
  libvduse: Add VDUSE (vDPA Device in Userspace) library
  vduse-blk: Implement vduse-blk export
  vduse-blk: Add vduse-blk resize support
  libvduse: Add support for reconnecting

 MAINTAINERS |9 +
 block/block-backend.c   |8 +-
 block/export/export.c   |6 +
 block/export/meson.build|7 +-
 block/export/vduse-blk.c|  346 +
 block/export/vduse-blk.h|   20 +
 block/export/vhost-user-blk-server.c|  249 +---
 block/export/virtio-blk-handler.c   |  237 
 block/export/virtio-blk-handler.h   |   33 +
 include/sysemu/block-backend-io.h   |1 +
 linux-headers/linux/vduse.h |  306 
 meson.build |   28 +
 meson_options.txt   |4 +
 qapi/block-export.json  |   25 +-
 scripts/meson-buildoptions.sh   |7 +
 scripts/update-linux-headers.sh |2 +-
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/include/compiler.h |1 +
 subprojects/libvduse/libvduse.c | 1386 +++
 subprojects/libvduse/libvduse.h |  247 
 subprojects/libvduse/linux-headers/linux|1 +
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 23 files changed, 2695 insertions(+), 240 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h
 create mode 100644 block/export/virtio-blk-handler.c
 create mode 100644 block/export/virtio-blk-handler.h
 create mode 100644 linux-headers/linux/vduse.h
 create mode 120000 subprojects/libvduse/include/atomic.h
 create mode 120000 subprojects/libvduse/include/compiler.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 120000 subprojects/libvduse/linux-headers/linux
 create mode 100644 subprojects/libvduse/meson.build
 create mode 120000 subprojects/libvduse/standard-headers/linux

-- 
2.20.1




[PATCH v4 1/6] block: Support passing NULL ops to blk_set_dev_ops()

2022-04-06 Thread Xie Yongji
This supports passing NULL ops to blk_set_dev_ops()
so that we can remove stale ops in some cases.

Signed-off-by: Xie Yongji 
---
 block/block-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index e0e1aff4b1..35457a6a1d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1062,7 +1062,7 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
 blk->dev_opaque = opaque;
 
 /* Are we currently quiesced? Should we enforce this right now? */
-if (blk->quiesce_counter && ops->drained_begin) {
+if (blk->quiesce_counter && ops && ops->drained_begin) {
 ops->drained_begin(opaque);
 }
 }
-- 
2.20.1




[PATCH v4 0/6] Support exporting BDSs via VDUSE

2022-04-06 Thread Xie Yongji
Hi all,

A few months ago, VDUSE (vDPA Device in Userspace) [1] was
merged into the Linux kernel as a framework that makes it
possible to emulate a vDPA device in userspace. This series
aims to implement a VDUSE block backend based on the
qemu-storage-daemon infrastructure.

To support that, we first introduce a VDUSE library as a
subproject (like libvhost-user) to help implement VDUSE
backends in QEMU. Then a VDUSE block export is implemented
based on this library. Finally, we add resize and reconnect support
to the VDUSE block export and the VDUSE library.

Since we don't support vdpa-blk in QEMU currently, the VM case is
tested with my previous patchset [2].

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html
[2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg797569.html

Please review, thanks!

V3 to V4:
- Fix some comments on QAPI [Eric]

V2 to V3:
- Introduce vduse_get_virtio_features() [Stefan]
- Update MAINTAINERS file [Stefan]
- Fix handler of VIRTIO_BLK_T_GET_ID request [Stefan]
- Add barrier for vduse_queue_inflight_get() [Stefan]

V1 to V2:
- Move vduse header to linux-headers [Stefan]
- Add two new APIs to support creating a device from /dev/vduse/$NAME or
  a file descriptor [Stefan]
- Check VIRTIO_F_VERSION_1 during initialization [Stefan]
- Replace malloc() + memset() with calloc() [Stefan]
- Increase default queue size to 256 for vduse-blk [Stefan]
- Zero-initialize virtio-blk config space [Stefan]
- Add a patch to support resetting blk->dev_ops
- Validate vq->log->inflight fields [Stefan]
- Add vduse_set_reconnect_log_file() API to support specifying the
  reconnect log file
- Fix some bugs [Stefan]

Xie Yongji (6):
  block: Support passing NULL ops to blk_set_dev_ops()
  linux-headers: Add vduse.h
  libvduse: Add VDUSE (vDPA Device in Userspace) library
  vduse-blk: implements vduse-blk export
  vduse-blk: Add vduse-blk resize support
  libvduse: Add support for reconnecting

 MAINTAINERS |7 +
 block/block-backend.c   |2 +-
 block/export/export.c   |6 +
 block/export/meson.build|5 +
 block/export/vduse-blk.c|  459 ++
 block/export/vduse-blk.h|   20 +
 linux-headers/linux/vduse.h |  306 
 meson.build |   28 +
 meson_options.txt   |4 +
 qapi/block-export.json  |   25 +-
 scripts/meson-buildoptions.sh   |7 +
 scripts/update-linux-headers.sh |2 +-
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/libvduse.c | 1386 +++
 subprojects/libvduse/libvduse.h |  247 
 subprojects/libvduse/linux-headers/linux|1 +
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 18 files changed, 2513 insertions(+), 4 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h
 create mode 100644 linux-headers/linux/vduse.h
 create mode 120000 subprojects/libvduse/include/atomic.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 120000 subprojects/libvduse/linux-headers/linux
 create mode 100644 subprojects/libvduse/meson.build
 create mode 120000 subprojects/libvduse/standard-headers/linux

-- 
2.20.1




[PATCH v4 6/6] libvduse: Add support for reconnecting

2022-04-06 Thread Xie Yongji
To support reconnecting after a restart or crash, a VDUSE backend
might need to resubmit inflight I/Os. This stores metadata, such
as the indexes of the inflight I/Os' descriptors, in a shm file so
that the VDUSE backend can restore them when reconnecting.
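
As a rough sketch of the idea (not the code in this patch): a log kept in
a MAP_SHARED file mapping survives the process, so a restarted backend can
map the same file again and find out which descriptors were still inflight.
The DemoDescState layout and the demo_* helpers are hypothetical; only the
open()/ftruncate()/mmap() sequence mirrors what the patch does.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical per-descriptor record; the real layout is in the patch. */
typedef struct DemoDescState {
    uint8_t inflight;    /* non-zero while the request is outstanding */
    uint8_t padding[7];
    uint64_t counter;    /* submission order, used when resubmitting */
} DemoDescState;

/* Map a file-backed log of size bytes, creating the file if needed. */
static void *demo_log_map(const char *filename, size_t size)
{
    void *ptr = MAP_FAILED;
    int fd = open(filename, O_RDWR | O_CREAT, 0600);

    if (fd == -1) {
        return MAP_FAILED;
    }
    if (ftruncate(fd, (off_t)size) == 0) {
        ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }
    close(fd);           /* the mapping stays valid after close() */
    return ptr;
}

/* Record that descriptor idx has been handed to the backend. */
static void demo_mark_inflight(DemoDescState *log, uint16_t idx,
                               uint64_t counter)
{
    assert(log != NULL);
    log[idx].counter = counter;
    log[idx].inflight = 1;
}

/* After a restart: collect the descriptor indexes still marked inflight. */
static size_t demo_collect_inflight(const DemoDescState *log, uint16_t num,
                                    uint16_t *out)
{
    size_t n = 0;

    for (uint16_t i = 0; i < num; i++) {
        if (log[i].inflight) {
            out[n++] = i;
        }
    }
    return n;
}
```

A backend following this scheme would call demo_mark_inflight() before
processing a request, clear the flag on completion, and run
demo_collect_inflight() once after remapping the file on reconnect.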

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c|  14 ++
 subprojects/libvduse/libvduse.c | 235 +++-
 subprojects/libvduse/libvduse.h |  12 ++
 3 files changed, 256 insertions(+), 5 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index e027b2e5ff..b24b5aeda9 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -38,6 +38,7 @@ typedef struct VduseBlkExport {
 uint16_t num_queues;
 uint32_t blk_size;
 bool writable;
+char *recon_file;
 } VduseBlkExport;
 
 struct virtio_blk_inhdr {
@@ -233,6 +234,8 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
 
 aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+/* Make sure we don't miss any kick after reconnecting */
+eventfd_write(vduse_queue_get_fd(vq), 1);
 }
 
 static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
@@ -404,6 +407,15 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 return -ENOMEM;
 }
 
+vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
+   g_get_tmp_dir(), exp->id);
+if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
+error_setg(errp, "failed to set reconnect log file");
+vduse_dev_destroy(vblk_exp->dev);
+g_free(vblk_exp->recon_file);
+return -EINVAL;
+}
+
 for (i = 0; i < num_queues; i++) {
 vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
 }
@@ -427,6 +439,8 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 vblk_exp);
 blk_set_dev_ops(exp->blk, NULL, NULL);
 vduse_dev_destroy(vblk_exp->dev);
+unlink(vblk_exp->recon_file);
+g_free(vblk_exp->recon_file);
 }
 
 static void vduse_blk_exp_request_shutdown(BlockExport *exp)
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index ecee9c0568..b27145ceed 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -41,6 +41,8 @@
 #define VDUSE_VQ_ALIGN 4096
 #define MAX_IOVA_REGIONS 256
 
+#define LOG_ALIGNMENT 64
+
 /* Round number down to multiple */
 #define ALIGN_DOWN(n, m) ((n) / (m) * (m))
 
@@ -51,6 +53,31 @@
 #define unlikely(x)   __builtin_expect(!!(x), 0)
 #endif
 
+typedef struct VduseDescStateSplit {
+uint8_t inflight;
+uint8_t padding[5];
+uint16_t next;
+uint64_t counter;
+} VduseDescStateSplit;
+
+typedef struct VduseVirtqLogInflight {
+uint64_t features;
+uint16_t version;
+uint16_t desc_num;
+uint16_t last_batch_head;
+uint16_t used_idx;
+VduseDescStateSplit desc[];
+} VduseVirtqLogInflight;
+
+typedef struct VduseVirtqLog {
+VduseVirtqLogInflight inflight;
+} VduseVirtqLog;
+
+typedef struct VduseVirtqInflightDesc {
+uint16_t index;
+uint64_t counter;
+} VduseVirtqInflightDesc;
+
 typedef struct VduseRing {
 unsigned int num;
 uint64_t desc_addr;
@@ -73,6 +100,10 @@ struct VduseVirtq {
 bool ready;
 int fd;
 VduseDev *dev;
+VduseVirtqInflightDesc *resubmit_list;
+uint16_t resubmit_num;
+uint64_t counter;
+VduseVirtqLog *log;
 };
 
 typedef struct VduseIovaRegion {
@@ -96,8 +127,36 @@ struct VduseDev {
 int fd;
 int ctrl_fd;
 void *priv;
+void *log;
 };
 
+static inline size_t vduse_vq_log_size(uint16_t queue_size)
+{
+return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size +
+sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT);
+}
+
+static void *vduse_log_get(const char *filename, size_t size)
+{
+void *ptr = MAP_FAILED;
+int fd;
+
+fd = open(filename, O_RDWR | O_CREAT, 0600);
+if (fd == -1) {
+return MAP_FAILED;
+}
+
+if (ftruncate(fd, size) == -1) {
+goto out;
+}
+
+ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+out:
+close(fd);
+return ptr;
+}
+
 static inline bool has_feature(uint64_t features, unsigned int fbit)
 {
 assert(fbit < 64);
@@ -148,6 +207,105 @@ static int vduse_inject_irq(VduseDev *dev, int index)
 return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, &index);
 }
 
+static int inflight_desc_compare(const void *a, const void *b)
+{
+VduseVirtqInflightDesc *desc0 = (VduseVirtqInflightDesc *)a,
+   *desc1 = (VduseVirtqInflightDesc *)b;
+
+if (desc1->counter > desc0->counter &&
+(desc1->counter - desc0->counter) < VIRTQUEUE_MAX_SIZE * 2) {
+return 1;
+}
+
+retu

[PATCH v4 5/6] vduse-blk: Add vduse-blk resize support

2022-04-06 Thread Xie Yongji
To support block resize, this uses vduse_dev_update_config()
to update the capacity field in the configuration space and inject
a config interrupt from the block resize callback.

Signed-off-by: Xie Yongji 
Reviewed-by: Stefan Hajnoczi 
---
 block/export/vduse-blk.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 3f4e0df34b..e027b2e5ff 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -310,6 +310,23 @@ static void blk_aio_detach(void *opaque)
 vblk_exp->export.ctx = NULL;
 }
 
+static void vduse_blk_resize(void *opaque)
+{
+BlockExport *exp = opaque;
+VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+struct virtio_blk_config config;
+
+config.capacity =
+cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
+vduse_dev_update_config(vblk_exp->dev, sizeof(config.capacity),
+offsetof(struct virtio_blk_config, capacity),
+(char *)&config.capacity);
+}
+
+static const BlockDevOps vduse_block_ops = {
+.resize_cb = vduse_blk_resize,
+};
+
 static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 Error **errp)
 {
@@ -397,6 +414,8 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
  vblk_exp);
 
+blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
+
 return 0;
 }
 
@@ -406,6 +425,7 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
 vblk_exp);
+blk_set_dev_ops(exp->blk, NULL, NULL);
 vduse_dev_destroy(vblk_exp->dev);
 }
 
-- 
2.20.1
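
For illustration only, the capacity update performed by the resize callback
can be sketched in isolation. The demo_* names and the trimmed-down config
struct are hypothetical; the sketch assumes, as the patch does, that the
capacity field counts 512-byte sectors in little-endian order and that the
update is described by a size and an offset into the config space.

```c
#define _DEFAULT_SOURCE  /* for htole64()/le64toh() in <endian.h> */
#include <assert.h>
#include <endian.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SECTOR_BITS 9  /* virtio-blk sectors are always 512 bytes */

/* Hypothetical, trimmed-down config space with a leading capacity field. */
struct demo_blk_config {
    uint64_t capacity;      /* little-endian, in 512-byte sectors */
    uint32_t size_max;
};

/* Convert a byte length into the little-endian sector count. */
static uint64_t demo_capacity_le(uint64_t length_bytes)
{
    return htole64(length_bytes >> DEMO_SECTOR_BITS);
}

/*
 * Copy size bytes from data into the config space at offset.
 * Returns 0 on success and -1 if the range falls outside the config space.
 */
static int demo_update_config(struct demo_blk_config *cfg, uint32_t size,
                              uint32_t offset, const void *data)
{
    assert(cfg != NULL && data != NULL);

    if (offset > sizeof(*cfg) || size > sizeof(*cfg) - offset) {
        return -1;
    }
    memcpy((char *)cfg + offset, data, size);
    return 0;
}
```

A caller would compute demo_capacity_le() from the new image length and
pass it with offsetof(struct demo_blk_config, capacity), after checking
that the length query itself did not fail; notifying the driver with a
config interrupt is a separate step that this sketch leaves out.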




[PATCH v4 3/6] libvduse: Add VDUSE (vDPA Device in Userspace) library

2022-04-06 Thread Xie Yongji
VDUSE [1] is a Linux framework that makes it possible to implement
software-emulated vDPA devices in userspace. This adds a library
as a subproject to help implement VDUSE backends in QEMU.

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html

Signed-off-by: Xie Yongji 
---
 MAINTAINERS |5 +
 meson.build |   15 +
 meson_options.txt   |2 +
 scripts/meson-buildoptions.sh   |3 +
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/libvduse.c | 1161 +++
 subprojects/libvduse/libvduse.h |  235 
 subprojects/libvduse/linux-headers/linux|1 +
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 10 files changed, 1434 insertions(+)
 create mode 120000 subprojects/libvduse/include/atomic.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 120000 subprojects/libvduse/linux-headers/linux
 create mode 100644 subprojects/libvduse/meson.build
 create mode 120000 subprojects/libvduse/standard-headers/linux

diff --git a/MAINTAINERS b/MAINTAINERS
index 9aed5f3e04..53a14bf7a8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3547,6 +3547,11 @@ L: qemu-block@nongnu.org
 S: Supported
 F: block/export/fuse.c
 
+VDUSE library
+M: Xie Yongji 
+S: Maintained
+F: subprojects/libvduse/
+
 Replication
 M: Wen Congyang 
 M: Xie Changlong 
diff --git a/meson.build b/meson.build
index bae62efc9c..5c71904461 100644
--- a/meson.build
+++ b/meson.build
@@ -1351,6 +1351,21 @@ if get_option('fuse_lseek').allowed()
   endif
 endif
 
+have_libvduse = (targetos == 'linux')
+if get_option('libvduse').enabled()
+if targetos != 'linux'
+error('libvduse requires linux')
+endif
+elif get_option('libvduse').disabled()
+have_libvduse = false
+endif
+
+libvduse = not_found
+if have_libvduse
+  libvduse_proj = subproject('libvduse')
+  libvduse = libvduse_proj.get_variable('libvduse_dep')
+endif
+
 # libbpf
 libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
 if libbpf.found() and not cc.links('''
diff --git a/meson_options.txt b/meson_options.txt
index 52b11cead4..e25af3277d 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -219,6 +219,8 @@ option('virtfs', type: 'feature', value: 'auto',
description: 'virtio-9p support')
 option('virtiofsd', type: 'feature', value: 'auto',
description: 'build virtiofs daemon (virtiofsd)')
+option('libvduse', type: 'feature', value: 'auto',
+   description: 'build VDUSE Library')
 
 option('capstone', type: 'combo', value: 'auto',
choices: ['disabled', 'enabled', 'auto', 'system', 'internal'],
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 1e26f4571e..ccab9ca9da 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -77,6 +77,7 @@ meson_options_help() {
   printf "%s\n" '  libssh  ssh block device support'
   printf "%s\n" '  libudev Use libudev to enumerate host devices'
   printf "%s\n" '  libusb  libusb support for USB passthrough'
+  printf "%s\n" '  libvduse        build VDUSE Library'
   printf "%s\n" '  linux-aio   Linux AIO support'
   printf "%s\n" '  linux-io-uring  Linux io_uring support'
   printf "%s\n" '  live-block-migration'
@@ -244,6 +245,8 @@ _meson_option_parse() {
 --disable-libudev) printf "%s" -Dlibudev=disabled ;;
 --enable-libusb) printf "%s" -Dlibusb=enabled ;;
 --disable-libusb) printf "%s" -Dlibusb=disabled ;;
+--enable-libvduse) printf "%s" -Dlibvduse=enabled ;;
+--disable-libvduse) printf "%s" -Dlibvduse=disabled ;;
 --enable-linux-aio) printf "%s" -Dlinux_aio=enabled ;;
 --disable-linux-aio) printf "%s" -Dlinux_aio=disabled ;;
 --enable-linux-io-uring) printf "%s" -Dlinux_io_uring=enabled ;;
diff --git a/subprojects/libvduse/include/atomic.h b/subprojects/libvduse/include/atomic.h
new file mode 120000
index 0000000000..8c2be64f7b
--- /dev/null
+++ b/subprojects/libvduse/include/atomic.h
@@ -0,0 +1 @@
+../../../include/qemu/atomic.h
\ No newline at end of file
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
new file mode 100644
index 0000000000..ecee9c0568
--- /dev/null
+++ b/subprojects/libvdus

[PATCH v4 4/6] vduse-blk: implements vduse-blk export

2022-04-06 Thread Xie Yongji
This implements a VDUSE block backend based on
the libvduse library. We can use it to export BDSs
for both VM and container (host) usage.

The new command-line syntax is:

$ qemu-storage-daemon \
--blockdev file,node-name=drive0,filename=test.img \
--export vduse-blk,node-name=drive0,id=vduse-export0,writable=on

After qemu-storage-daemon has started, we need to use
the "vdpa" command to attach the device to the vDPA bus:

$ vdpa dev add name vduse-export0 mgmtdev vduse

Also the device must be removed via the "vdpa" command
before we stop the qemu-storage-daemon.

Signed-off-by: Xie Yongji 
---
 MAINTAINERS   |   4 +-
 block/export/export.c |   6 +
 block/export/meson.build  |   5 +
 block/export/vduse-blk.c  | 425 ++
 block/export/vduse-blk.h  |  20 ++
 meson.build   |  13 ++
 meson_options.txt |   2 +
 qapi/block-export.json|  25 +-
 scripts/meson-buildoptions.sh |   4 +
 9 files changed, 501 insertions(+), 3 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 53a14bf7a8..9d9f68479f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3547,10 +3547,12 @@ L: qemu-block@nongnu.org
 S: Supported
 F: block/export/fuse.c
 
-VDUSE library
+VDUSE library and block device exports
 M: Xie Yongji 
 S: Maintained
 F: subprojects/libvduse/
+F: block/export/vduse-blk.c
+F: block/export/vduse-blk.h
 
 Replication
 M: Wen Congyang 
diff --git a/block/export/export.c b/block/export/export.c
index 7253af3bc3..4744862915 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -26,6 +26,9 @@
 #ifdef CONFIG_VHOST_USER_BLK_SERVER
 #include "vhost-user-blk-server.h"
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+#include "vduse-blk.h"
+#endif
 
 static const BlockExportDriver *blk_exp_drivers[] = {
 &blk_exp_nbd,
@@ -35,6 +38,9 @@ static const BlockExportDriver *blk_exp_drivers[] = {
 #ifdef CONFIG_FUSE
 &blk_exp_fuse,
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+&blk_exp_vduse_blk,
+#endif
 };
 
 /* Only accessed from the main thread */
diff --git a/block/export/meson.build b/block/export/meson.build
index 0a08e384c7..cf311d2b1b 100644
--- a/block/export/meson.build
+++ b/block/export/meson.build
@@ -5,3 +5,8 @@ if have_vhost_user_blk_server
 endif
 
 blockdev_ss.add(when: fuse, if_true: files('fuse.c'))
+
+if have_vduse_blk_export
+blockdev_ss.add(files('vduse-blk.c'))
+blockdev_ss.add(libvduse)
+endif
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
new file mode 100644
index 00..3f4e0df34b
--- /dev/null
+++ b/block/export/vduse-blk.c
@@ -0,0 +1,425 @@
+/*
+ * Export QEMU block device via VDUSE
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
+ *   Portions of codes and concepts borrowed from vhost-user-blk-server.c, so:
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * Author:
+ *   Xie Yongji 
+ *   Coiby Xu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include 
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "sysemu/block-backend.h"
+#include "block/export.h"
+#include "qemu/error-report.h"
+#include "util/block-helpers.h"
+#include "subprojects/libvduse/libvduse.h"
+
+#include "standard-headers/linux/virtio_ring.h"
+#include "standard-headers/linux/virtio_blk.h"
+
+#define VIRTIO_BLK_SECTOR_BITS 9
+#define VIRTIO_BLK_SECTOR_SIZE (1ULL << VIRTIO_BLK_SECTOR_BITS)
+
+#define VDUSE_DEFAULT_NUM_QUEUE 1
+#define VDUSE_DEFAULT_QUEUE_SIZE 256
+
+typedef struct VduseBlkExport {
+BlockExport export;
+VduseDev *dev;
+uint16_t num_queues;
+uint32_t blk_size;
+bool writable;
+} VduseBlkExport;
+
+struct virtio_blk_inhdr {
+unsigned char status;
+};
+
+typedef struct VduseBlkReq {
+VduseVirtqElement elem;
+int64_t sector_num;
+size_t in_len;
+struct virtio_blk_inhdr *in;
+struct virtio_blk_outhdr out;
+VduseVirtq *vq;
+} VduseBlkReq;
+
+static void vduse_blk_req_complete(VduseBlkReq *req)
+{
+vduse_queue_push(req->vq, &req->elem, req->in_len);
+vduse_queue_notify(req->vq);
+
+free(req);
+}
+
+static bool vduse_blk_sect_range_ok(VduseBlkExport *vblk_exp,
+uint64_t sector, size_t size)
+{
+uint64_t nb_sectors;
+uint64_t total_sectors;
+
+if (size % VIRTIO_BLK_SECTOR_SIZE) {
+return false;
+}
+
+nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS;
+
+QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE);
+if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
+return false;
+}
+if ((sector << VIRTIO_BLK

[PATCH v4 2/6] linux-headers: Add vduse.h

2022-04-06 Thread Xie Yongji
This adds the VDUSE header to linux-headers so that the
relevant VDUSE API can be used in subsequent patches.

Signed-off-by: Xie Yongji 
---
 linux-headers/linux/vduse.h | 306 
 scripts/update-linux-headers.sh |   2 +-
 2 files changed, 307 insertions(+), 1 deletion(-)
 create mode 100644 linux-headers/linux/vduse.h

diff --git a/linux-headers/linux/vduse.h b/linux-headers/linux/vduse.h
new file mode 100644
index 00..d47b004ce6
--- /dev/null
+++ b/linux-headers/linux/vduse.h
@@ -0,0 +1,306 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _VDUSE_H_
+#define _VDUSE_H_
+
+#include 
+
+#define VDUSE_BASE 0x81
+
+/* The ioctls for control device (/dev/vduse/control) */
+
+#define VDUSE_API_VERSION  0
+
+/*
+ * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
+ * This is used for future extension.
+ */
+#define VDUSE_GET_API_VERSION  _IOR(VDUSE_BASE, 0x00, __u64)
+
+/* Set the version of VDUSE API that userspace supported. */
+#define VDUSE_SET_API_VERSION  _IOW(VDUSE_BASE, 0x01, __u64)
+
+/**
+ * struct vduse_dev_config - basic configuration of a VDUSE device
+ * @name: VDUSE device name, needs to be NUL terminated
+ * @vendor_id: virtio vendor id
+ * @device_id: virtio device id
+ * @features: virtio features
+ * @vq_num: the number of virtqueues
+ * @vq_align: the allocation alignment of virtqueue's metadata
+ * @reserved: for future use, needs to be initialized to zero
+ * @config_size: the size of the configuration space
+ * @config: the buffer of the configuration space
+ *
+ * Structure used by VDUSE_CREATE_DEV ioctl to create VDUSE device.
+ */
+struct vduse_dev_config {
+#define VDUSE_NAME_MAX 256
+   char name[VDUSE_NAME_MAX];
+   __u32 vendor_id;
+   __u32 device_id;
+   __u64 features;
+   __u32 vq_num;
+   __u32 vq_align;
+   __u32 reserved[13];
+   __u32 config_size;
+   __u8 config[];
+};
+
+/* Create a VDUSE device which is represented by a char device (/dev/vduse/$NAME) */
+#define VDUSE_CREATE_DEV   _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config)
+
+/*
+ * Destroy a VDUSE device. Make sure there are no more references
+ * to the char device (/dev/vduse/$NAME).
+ */
+#define VDUSE_DESTROY_DEV  _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX])
+
+/* The ioctls for VDUSE device (/dev/vduse/$NAME) */
+
+/**
+ * struct vduse_iotlb_entry - entry of IOTLB to describe one IOVA region [start, last]
+ * @offset: the mmap offset on returned file descriptor
+ * @start: start of the IOVA region
+ * @last: last of the IOVA region
+ * @perm: access permission of the IOVA region
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
+ */
+struct vduse_iotlb_entry {
+   __u64 offset;
+   __u64 start;
+   __u64 last;
+#define VDUSE_ACCESS_RO 0x1
+#define VDUSE_ACCESS_WO 0x2
+#define VDUSE_ACCESS_RW 0x3
+   __u8 perm;
+};
+
+/*
+ * Find the first IOVA region that overlaps with the range [start, last]
+ * and return the corresponding file descriptor. Return -EINVAL means the
+ * IOVA region doesn't exist. Caller should set start and last fields.
+ */
+#define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x10, struct vduse_iotlb_entry)
+
+/*
+ * Get the negotiated virtio features. It's a subset of the features in
+ * struct vduse_dev_config which can be accepted by virtio driver. It's
+ * only valid after FEATURES_OK status bit is set.
+ */
+#define VDUSE_DEV_GET_FEATURES _IOR(VDUSE_BASE, 0x11, __u64)
+
+/**
+ * struct vduse_config_data - data used to update configuration space
+ * @offset: the offset from the beginning of configuration space
+ * @length: the length to write to configuration space
+ * @buffer: the buffer used to write from
+ *
+ * Structure used by VDUSE_DEV_SET_CONFIG ioctl to update device
+ * configuration space.
+ */
+struct vduse_config_data {
+   __u32 offset;
+   __u32 length;
+   __u8 buffer[];
+};
+
+/* Set device configuration space */
+#define VDUSE_DEV_SET_CONFIG   _IOW(VDUSE_BASE, 0x12, struct vduse_config_data)
+
+/*
+ * Inject a config interrupt. It's usually used to notify virtio driver
+ * that device configuration space has changed.
+ */
+#define VDUSE_DEV_INJECT_CONFIG_IRQ _IO(VDUSE_BASE, 0x13)
+
+/**
+ * struct vduse_vq_config - basic configuration of a virtqueue
+ * @index: virtqueue index
+ * @max_size: the max size of virtqueue
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
+ */
+struct vduse_vq_config {
+   __u32 index;
+   __u16 max_size;
+   __u16 reserved[13];
+};
+
+/*
+ * Setup the specified virtqueue. Make sure all virtqueues have been
+ * configured before the device is attached to vDPA bus.
+ */
+#define VDUSE_VQ_SETUP _IOW(VDUSE_BASE, 0x14, struct vduse_vq_config)
+
+/**
+ * struct vduse_vq_state_split - split virtqueue state
+ * @ava

[PATCH v3 4/6] vduse-blk: implements vduse-blk export

2022-03-21 Thread Xie Yongji
This implements a VDUSE block backend based on
the libvduse library. We can use it to export the BDSs
for both VM and container (host) usage.

The new command-line syntax is:

$ qemu-storage-daemon \
--blockdev file,node-name=drive0,filename=test.img \
--export vduse-blk,node-name=drive0,id=vduse-export0,writable=on

After qemu-storage-daemon has started, we need to use
the "vdpa" command to attach the device to the vDPA bus:

$ vdpa dev add name vduse-export0 mgmtdev vduse

The device must also be removed via the "vdpa" command
before we stop qemu-storage-daemon.

Signed-off-by: Xie Yongji 
---
 MAINTAINERS   |   4 +-
 block/export/export.c |   6 +
 block/export/meson.build  |   5 +
 block/export/vduse-blk.c  | 425 ++
 block/export/vduse-blk.h  |  20 ++
 meson.build   |  13 ++
 meson_options.txt |   2 +
 qapi/block-export.json|  24 +-
 scripts/meson-buildoptions.sh |   4 +
 9 files changed, 500 insertions(+), 3 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 53a14bf7a8..9d9f68479f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3547,10 +3547,12 @@ L: qemu-block@nongnu.org
 S: Supported
 F: block/export/fuse.c
 
-VDUSE library
+VDUSE library and block device exports
 M: Xie Yongji 
 S: Maintained
 F: subprojects/libvduse/
+F: block/export/vduse-blk.c
+F: block/export/vduse-blk.h
 
 Replication
 M: Wen Congyang 
diff --git a/block/export/export.c b/block/export/export.c
index 7253af3bc3..4744862915 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -26,6 +26,9 @@
 #ifdef CONFIG_VHOST_USER_BLK_SERVER
 #include "vhost-user-blk-server.h"
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+#include "vduse-blk.h"
+#endif
 
 static const BlockExportDriver *blk_exp_drivers[] = {
 &blk_exp_nbd,
@@ -35,6 +38,9 @@ static const BlockExportDriver *blk_exp_drivers[] = {
 #ifdef CONFIG_FUSE
 &blk_exp_fuse,
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+&blk_exp_vduse_blk,
+#endif
 };
 
 /* Only accessed from the main thread */
diff --git a/block/export/meson.build b/block/export/meson.build
index 0a08e384c7..cf311d2b1b 100644
--- a/block/export/meson.build
+++ b/block/export/meson.build
@@ -5,3 +5,8 @@ if have_vhost_user_blk_server
 endif
 
 blockdev_ss.add(when: fuse, if_true: files('fuse.c'))
+
+if have_vduse_blk_export
+blockdev_ss.add(files('vduse-blk.c'))
+blockdev_ss.add(libvduse)
+endif
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
new file mode 100644
index 00..3f4e0df34b
--- /dev/null
+++ b/block/export/vduse-blk.c
@@ -0,0 +1,425 @@
+/*
+ * Export QEMU block device via VDUSE
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
+ *   Portions of codes and concepts borrowed from vhost-user-blk-server.c, so:
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * Author:
+ *   Xie Yongji 
+ *   Coiby Xu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include 
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "sysemu/block-backend.h"
+#include "block/export.h"
+#include "qemu/error-report.h"
+#include "util/block-helpers.h"
+#include "subprojects/libvduse/libvduse.h"
+
+#include "standard-headers/linux/virtio_ring.h"
+#include "standard-headers/linux/virtio_blk.h"
+
+#define VIRTIO_BLK_SECTOR_BITS 9
+#define VIRTIO_BLK_SECTOR_SIZE (1ULL << VIRTIO_BLK_SECTOR_BITS)
+
+#define VDUSE_DEFAULT_NUM_QUEUE 1
+#define VDUSE_DEFAULT_QUEUE_SIZE 256
+
+typedef struct VduseBlkExport {
+BlockExport export;
+VduseDev *dev;
+uint16_t num_queues;
+uint32_t blk_size;
+bool writable;
+} VduseBlkExport;
+
+struct virtio_blk_inhdr {
+unsigned char status;
+};
+
+typedef struct VduseBlkReq {
+VduseVirtqElement elem;
+int64_t sector_num;
+size_t in_len;
+struct virtio_blk_inhdr *in;
+struct virtio_blk_outhdr out;
+VduseVirtq *vq;
+} VduseBlkReq;
+
+static void vduse_blk_req_complete(VduseBlkReq *req)
+{
+vduse_queue_push(req->vq, &req->elem, req->in_len);
+vduse_queue_notify(req->vq);
+
+free(req);
+}
+
+static bool vduse_blk_sect_range_ok(VduseBlkExport *vblk_exp,
+uint64_t sector, size_t size)
+{
+uint64_t nb_sectors;
+uint64_t total_sectors;
+
+if (size % VIRTIO_BLK_SECTOR_SIZE) {
+return false;
+}
+
+nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS;
+
+QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE);
+if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
+return false;
+}
+if ((sector << VIRTIO_BLK

[PATCH v3 3/6] libvduse: Add VDUSE (vDPA Device in Userspace) library

2022-03-21 Thread Xie Yongji
VDUSE [1] is a Linux framework that makes it possible to implement
software-emulated vDPA devices in userspace. This adds a library
as a subproject to help implement VDUSE backends in QEMU.

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html

Signed-off-by: Xie Yongji 
---
 MAINTAINERS |5 +
 meson.build |   15 +
 meson_options.txt   |2 +
 scripts/meson-buildoptions.sh   |3 +
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/libvduse.c | 1161 +++
 subprojects/libvduse/libvduse.h |  235 
 subprojects/libvduse/linux-headers/linux|1 +
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 10 files changed, 1434 insertions(+)
 create mode 12 subprojects/libvduse/include/atomic.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 12 subprojects/libvduse/linux-headers/linux
 create mode 100644 subprojects/libvduse/meson.build
 create mode 12 subprojects/libvduse/standard-headers/linux

diff --git a/MAINTAINERS b/MAINTAINERS
index 9aed5f3e04..53a14bf7a8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3547,6 +3547,11 @@ L: qemu-block@nongnu.org
 S: Supported
 F: block/export/fuse.c
 
+VDUSE library
+M: Xie Yongji 
+S: Maintained
+F: subprojects/libvduse/
+
 Replication
 M: Wen Congyang 
 M: Xie Changlong 
diff --git a/meson.build b/meson.build
index bae62efc9c..5c71904461 100644
--- a/meson.build
+++ b/meson.build
@@ -1351,6 +1351,21 @@ if get_option('fuse_lseek').allowed()
   endif
 endif
 
+have_libvduse = (targetos == 'linux')
+if get_option('libvduse').enabled()
+if targetos != 'linux'
+error('libvduse requires linux')
+endif
+elif get_option('libvduse').disabled()
+have_libvduse = false
+endif
+
+libvduse = not_found
+if have_libvduse
+  libvduse_proj = subproject('libvduse')
+  libvduse = libvduse_proj.get_variable('libvduse_dep')
+endif
+
 # libbpf
 libbpf = dependency('libbpf', required: get_option('bpf'), method: 
'pkg-config')
 if libbpf.found() and not cc.links('''
diff --git a/meson_options.txt b/meson_options.txt
index 52b11cead4..e25af3277d 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -219,6 +219,8 @@ option('virtfs', type: 'feature', value: 'auto',
description: 'virtio-9p support')
 option('virtiofsd', type: 'feature', value: 'auto',
description: 'build virtiofs daemon (virtiofsd)')
+option('libvduse', type: 'feature', value: 'auto',
+   description: 'build VDUSE Library')
 
 option('capstone', type: 'combo', value: 'auto',
choices: ['disabled', 'enabled', 'auto', 'system', 'internal'],
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 1e26f4571e..ccab9ca9da 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -77,6 +77,7 @@ meson_options_help() {
   printf "%s\n" '  libssh  ssh block device support'
   printf "%s\n" '  libudev Use libudev to enumerate host devices'
   printf "%s\n" '  libusb  libusb support for USB passthrough'
+  printf "%s\n" '  libvduse build VDUSE Library'
   printf "%s\n" '  linux-aio   Linux AIO support'
   printf "%s\n" '  linux-io-uring  Linux io_uring support'
   printf "%s\n" '  live-block-migration'
@@ -244,6 +245,8 @@ _meson_option_parse() {
 --disable-libudev) printf "%s" -Dlibudev=disabled ;;
 --enable-libusb) printf "%s" -Dlibusb=enabled ;;
 --disable-libusb) printf "%s" -Dlibusb=disabled ;;
+--enable-libvduse) printf "%s" -Dlibvduse=enabled ;;
+--disable-libvduse) printf "%s" -Dlibvduse=disabled ;;
 --enable-linux-aio) printf "%s" -Dlinux_aio=enabled ;;
 --disable-linux-aio) printf "%s" -Dlinux_aio=disabled ;;
 --enable-linux-io-uring) printf "%s" -Dlinux_io_uring=enabled ;;
diff --git a/subprojects/libvduse/include/atomic.h b/subprojects/libvduse/include/atomic.h
new file mode 12
index 00..8c2be64f7b
--- /dev/null
+++ b/subprojects/libvduse/include/atomic.h
@@ -0,0 +1 @@
+../../../include/qemu/atomic.h
\ No newline at end of file
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
new file mode 100644
index 00..ecee9c0568
--- /dev/null
+++ b/subprojects/libvdus

[PATCH v3 6/6] libvduse: Add support for reconnecting

2022-03-21 Thread Xie Yongji
To support reconnecting after a restart or crash, the VDUSE backend
might need to resubmit inflight I/Os. This stores metadata, such as
the indexes of the inflight I/Os' descriptors, in a shm file so
that the VDUSE backend can restore them when reconnecting.
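
As a rough illustration of how much space the per-queue log takes, the
sketch below mirrors the two log structures and the size helper from the
diff further down. It is not part of the patch. ALIGN_UP is an assumed
definition (the excerpt only shows ALIGN_DOWN), and demo_vq_log_size() is
a hypothetical name used here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_ALIGNMENT 64

/* ALIGN_DOWN is taken from the diff; ALIGN_UP is an assumed definition. */
#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define ALIGN_UP(n, m)   ALIGN_DOWN((n) + (m) - 1, (m))

/* Per-descriptor inflight state, laid out as in the patch. */
typedef struct VduseDescStateSplit {
    uint8_t inflight;
    uint8_t padding[5];
    uint16_t next;
    uint64_t counter;
} VduseDescStateSplit;

/* Per-queue log header followed by one entry per descriptor. */
typedef struct VduseVirtqLogInflight {
    uint64_t features;
    uint16_t version;
    uint16_t desc_num;
    uint16_t last_batch_head;
    uint16_t used_idx;
    VduseDescStateSplit desc[];
} VduseVirtqLogInflight;

/* Bytes needed to log one virtqueue, rounded up to LOG_ALIGNMENT. */
static size_t demo_vq_log_size(uint16_t queue_size)
{
    assert(queue_size > 0);
    return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size +
                    sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT);
}
```

With these layouts each descriptor entry and the header are 16 bytes, so
a 256-entry queue would need 16 * 256 + 16 = 4112 bytes, rounded up to
4160.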

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c|  14 ++
 subprojects/libvduse/libvduse.c | 235 +++-
 subprojects/libvduse/libvduse.h |  12 ++
 3 files changed, 256 insertions(+), 5 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index e027b2e5ff..b24b5aeda9 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -38,6 +38,7 @@ typedef struct VduseBlkExport {
 uint16_t num_queues;
 uint32_t blk_size;
 bool writable;
+char *recon_file;
 } VduseBlkExport;
 
 struct virtio_blk_inhdr {
@@ -233,6 +234,8 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
 
 aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+/* Make sure we don't miss any kick after reconnecting */
+eventfd_write(vduse_queue_get_fd(vq), 1);
 }
 
 static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
@@ -404,6 +407,15 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 return -ENOMEM;
 }
 
+vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
+   g_get_tmp_dir(), exp->id);
+if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
+error_setg(errp, "failed to set reconnect log file");
+vduse_dev_destroy(vblk_exp->dev);
+g_free(vblk_exp->recon_file);
+return -EINVAL;
+}
+
 for (i = 0; i < num_queues; i++) {
 vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
 }
@@ -427,6 +439,8 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 vblk_exp);
 blk_set_dev_ops(exp->blk, NULL, NULL);
 vduse_dev_destroy(vblk_exp->dev);
+unlink(vblk_exp->recon_file);
+g_free(vblk_exp->recon_file);
 }
 
 static void vduse_blk_exp_request_shutdown(BlockExport *exp)
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index ecee9c0568..b27145ceed 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -41,6 +41,8 @@
 #define VDUSE_VQ_ALIGN 4096
 #define MAX_IOVA_REGIONS 256
 
+#define LOG_ALIGNMENT 64
+
 /* Round number down to multiple */
 #define ALIGN_DOWN(n, m) ((n) / (m) * (m))
 
@@ -51,6 +53,31 @@
 #define unlikely(x)   __builtin_expect(!!(x), 0)
 #endif
 
+typedef struct VduseDescStateSplit {
+uint8_t inflight;
+uint8_t padding[5];
+uint16_t next;
+uint64_t counter;
+} VduseDescStateSplit;
+
+typedef struct VduseVirtqLogInflight {
+uint64_t features;
+uint16_t version;
+uint16_t desc_num;
+uint16_t last_batch_head;
+uint16_t used_idx;
+VduseDescStateSplit desc[];
+} VduseVirtqLogInflight;
+
+typedef struct VduseVirtqLog {
+VduseVirtqLogInflight inflight;
+} VduseVirtqLog;
+
+typedef struct VduseVirtqInflightDesc {
+uint16_t index;
+uint64_t counter;
+} VduseVirtqInflightDesc;
+
 typedef struct VduseRing {
 unsigned int num;
 uint64_t desc_addr;
@@ -73,6 +100,10 @@ struct VduseVirtq {
 bool ready;
 int fd;
 VduseDev *dev;
+VduseVirtqInflightDesc *resubmit_list;
+uint16_t resubmit_num;
+uint64_t counter;
+VduseVirtqLog *log;
 };
 
 typedef struct VduseIovaRegion {
@@ -96,8 +127,36 @@ struct VduseDev {
 int fd;
 int ctrl_fd;
 void *priv;
+void *log;
 };
 
+static inline size_t vduse_vq_log_size(uint16_t queue_size)
+{
+return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size +
+sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT);
+}
+
+static void *vduse_log_get(const char *filename, size_t size)
+{
+void *ptr = MAP_FAILED;
+int fd;
+
+fd = open(filename, O_RDWR | O_CREAT, 0600);
+if (fd == -1) {
+return MAP_FAILED;
+}
+
+if (ftruncate(fd, size) == -1) {
+goto out;
+}
+
+ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+out:
+close(fd);
+return ptr;
+}
+
 static inline bool has_feature(uint64_t features, unsigned int fbit)
 {
 assert(fbit < 64);
@@ -148,6 +207,105 @@ static int vduse_inject_irq(VduseDev *dev, int index)
 return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, &index);
 }
 
+static int inflight_desc_compare(const void *a, const void *b)
+{
+VduseVirtqInflightDesc *desc0 = (VduseVirtqInflightDesc *)a,
+   *desc1 = (VduseVirtqInflightDesc *)b;
+
+if (desc1->counter > desc0->counter &&
+(desc1->counter - desc0->counter) < VIRTQUEUE_MAX_SIZE * 2) {
+return 1;
+}
+
+retu

[PATCH v3 2/6] linux-headers: Add vduse.h

2022-03-21 Thread Xie Yongji
This adds the VDUSE header to linux-headers so that the
relevant VDUSE API can be used in subsequent patches.

Signed-off-by: Xie Yongji 
---
 linux-headers/linux/vduse.h | 306 
 scripts/update-linux-headers.sh |   2 +-
 2 files changed, 307 insertions(+), 1 deletion(-)
 create mode 100644 linux-headers/linux/vduse.h

diff --git a/linux-headers/linux/vduse.h b/linux-headers/linux/vduse.h
new file mode 100644
index 00..d47b004ce6
--- /dev/null
+++ b/linux-headers/linux/vduse.h
@@ -0,0 +1,306 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _VDUSE_H_
+#define _VDUSE_H_
+
+#include 
+
+#define VDUSE_BASE 0x81
+
+/* The ioctls for control device (/dev/vduse/control) */
+
+#define VDUSE_API_VERSION  0
+
+/*
+ * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
+ * This is used for future extension.
+ */
+#define VDUSE_GET_API_VERSION  _IOR(VDUSE_BASE, 0x00, __u64)
+
+/* Set the version of VDUSE API that userspace supported. */
+#define VDUSE_SET_API_VERSION  _IOW(VDUSE_BASE, 0x01, __u64)
+
+/**
+ * struct vduse_dev_config - basic configuration of a VDUSE device
+ * @name: VDUSE device name, needs to be NUL terminated
+ * @vendor_id: virtio vendor id
+ * @device_id: virtio device id
+ * @features: virtio features
+ * @vq_num: the number of virtqueues
+ * @vq_align: the allocation alignment of virtqueue's metadata
+ * @reserved: for future use, needs to be initialized to zero
+ * @config_size: the size of the configuration space
+ * @config: the buffer of the configuration space
+ *
+ * Structure used by VDUSE_CREATE_DEV ioctl to create VDUSE device.
+ */
+struct vduse_dev_config {
+#define VDUSE_NAME_MAX 256
+   char name[VDUSE_NAME_MAX];
+   __u32 vendor_id;
+   __u32 device_id;
+   __u64 features;
+   __u32 vq_num;
+   __u32 vq_align;
+   __u32 reserved[13];
+   __u32 config_size;
+   __u8 config[];
+};
+
+/* Create a VDUSE device which is represented by a char device (/dev/vduse/$NAME) */
+#define VDUSE_CREATE_DEV   _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config)
+
+/*
+ * Destroy a VDUSE device. Make sure there are no more references
+ * to the char device (/dev/vduse/$NAME).
+ */
+#define VDUSE_DESTROY_DEV  _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX])
+
+/* The ioctls for VDUSE device (/dev/vduse/$NAME) */
+
+/**
+ * struct vduse_iotlb_entry - entry of IOTLB to describe one IOVA region [start, last]
+ * @offset: the mmap offset on returned file descriptor
+ * @start: start of the IOVA region
+ * @last: last of the IOVA region
+ * @perm: access permission of the IOVA region
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
+ */
+struct vduse_iotlb_entry {
+   __u64 offset;
+   __u64 start;
+   __u64 last;
+#define VDUSE_ACCESS_RO 0x1
+#define VDUSE_ACCESS_WO 0x2
+#define VDUSE_ACCESS_RW 0x3
+   __u8 perm;
+};
+
+/*
+ * Find the first IOVA region that overlaps with the range [start, last]
+ * and return the corresponding file descriptor. Return -EINVAL means the
+ * IOVA region doesn't exist. Caller should set start and last fields.
+ */
+#define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x10, struct vduse_iotlb_entry)
+
+/*
+ * Get the negotiated virtio features. It's a subset of the features in
+ * struct vduse_dev_config which can be accepted by virtio driver. It's
+ * only valid after FEATURES_OK status bit is set.
+ */
+#define VDUSE_DEV_GET_FEATURES _IOR(VDUSE_BASE, 0x11, __u64)
+
+/**
+ * struct vduse_config_data - data used to update configuration space
+ * @offset: the offset from the beginning of configuration space
+ * @length: the length to write to configuration space
+ * @buffer: the buffer used to write from
+ *
+ * Structure used by VDUSE_DEV_SET_CONFIG ioctl to update device
+ * configuration space.
+ */
+struct vduse_config_data {
+   __u32 offset;
+   __u32 length;
+   __u8 buffer[];
+};
+
+/* Set device configuration space */
+#define VDUSE_DEV_SET_CONFIG   _IOW(VDUSE_BASE, 0x12, struct vduse_config_data)
+
+/*
+ * Inject a config interrupt. It's usually used to notify virtio driver
+ * that device configuration space has changed.
+ */
+#define VDUSE_DEV_INJECT_CONFIG_IRQ _IO(VDUSE_BASE, 0x13)
+
+/**
+ * struct vduse_vq_config - basic configuration of a virtqueue
+ * @index: virtqueue index
+ * @max_size: the max size of virtqueue
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
+ */
+struct vduse_vq_config {
+   __u32 index;
+   __u16 max_size;
+   __u16 reserved[13];
+};
+
+/*
+ * Setup the specified virtqueue. Make sure all virtqueues have been
+ * configured before the device is attached to vDPA bus.
+ */
+#define VDUSE_VQ_SETUP _IOW(VDUSE_BASE, 0x14, struct vduse_vq_config)
+
+/**
+ * struct vduse_vq_state_split - split virtqueue state
+ * @ava

[PATCH v3 5/6] vduse-blk: Add vduse-blk resize support

2022-03-21 Thread Xie Yongji
To support block resize, this uses vduse_dev_update_config()
to update the capacity field in the configuration space and inject
a config interrupt from the block resize callback.

Signed-off-by: Xie Yongji 
Reviewed-by: Stefan Hajnoczi 
---
 block/export/vduse-blk.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 3f4e0df34b..e027b2e5ff 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -310,6 +310,23 @@ static void blk_aio_detach(void *opaque)
 vblk_exp->export.ctx = NULL;
 }
 
+static void vduse_blk_resize(void *opaque)
+{
+BlockExport *exp = opaque;
+VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+struct virtio_blk_config config;
+
+config.capacity =
+cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
+vduse_dev_update_config(vblk_exp->dev, sizeof(config.capacity),
+offsetof(struct virtio_blk_config, capacity),
+(char *)&config.capacity);
+}
+
+static const BlockDevOps vduse_block_ops = {
+.resize_cb = vduse_blk_resize,
+};
+
 static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 Error **errp)
 {
@@ -397,6 +414,8 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
  vblk_exp);
 
+blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
+
 return 0;
 }
 
@@ -406,6 +425,7 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
 vblk_exp);
+blk_set_dev_ops(exp->blk, NULL, NULL);
 vduse_dev_destroy(vblk_exp->dev);
 }
 
-- 
2.20.1
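
For readers following the resize path, here is a minimal sketch of the
arithmetic involved: the device length in bytes is converted to 512-byte
sectors and stored little-endian at the offset of the capacity field.
The names demo_blk_config, demo_capacity_sectors() and demo_store_le64()
are hypothetical stand-ins for illustration, not QEMU or libvduse APIs.
The struct only mirrors the leading fields of struct virtio_blk_config.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SECTOR_BITS 9  /* 512-byte sectors, like VIRTIO_BLK_SECTOR_BITS */

/* Hypothetical stand-in for the first fields of struct virtio_blk_config. */
struct demo_blk_config {
    uint64_t capacity;  /* device size in 512-byte sectors, little-endian */
    uint32_t size_max;
    uint32_t seg_max;
};

/*
 * Convert a byte length into a sector count. A negative length (an
 * error code from the length query) is mapped to 0 here; that policy
 * is an assumption of this sketch, not something the patch does.
 */
static uint64_t demo_capacity_sectors(int64_t length_bytes)
{
    if (length_bytes < 0) {
        return 0;
    }
    return (uint64_t)length_bytes >> DEMO_SECTOR_BITS;
}

/* Store a 64-bit value as little-endian bytes on any host byte order. */
static void demo_store_le64(uint8_t out[8], uint64_t value)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(value >> (8 * i));
    }
}
```

A caller would pass offsetof(struct demo_blk_config, capacity) as the
config-space offset and the 8 encoded bytes as the data, which is the
same shape as the vduse_dev_update_config() call in the patch. Note that
the patch shifts the result of blk_getlength() directly, so checking
for a negative return first may be worth considering.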




[PATCH v3 0/6] Support exporting BDSs via VDUSE

2022-03-21 Thread Xie Yongji
Hi all,

A few months ago, VDUSE (vDPA Device in Userspace) [1] was
merged into the Linux kernel as a framework that makes it
possible to emulate a vDPA device in userspace. This series
aims to implement a VDUSE block backend based on the
qemu-storage-daemon infrastructure.

To support that, we first introduce a VDUSE library as a
subproject (like what libvhost-user does) to help implement
VDUSE backends in QEMU. Then a VDUSE block export is implemented
based on this library. Finally, we add resize and reconnect support
to the VDUSE block export and VDUSE library.

Since we don't support vdpa-blk in QEMU currently, the VM case is
tested with my previous patchset [2].

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html
[2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg797569.html

Please review, thanks!

V2 to V3:
- Introduce vduse_get_virtio_features() [Stefan]
- Update MAINTAINERS file [Stefan]
- Fix handler of VIRTIO_BLK_T_GET_ID request [Stefan]
- Add barrier for vduse_queue_inflight_get() [Stefan]

V1 to V2:
- Move vduse header to linux-headers [Stefan]
- Add two new APIs to support creating a device from /dev/vduse/$NAME or
  a file descriptor [Stefan]
- Check VIRTIO_F_VERSION_1 during initialization [Stefan]
- Replace malloc() + memset() with calloc() [Stefan]
- Increase default queue size to 256 for vduse-blk [Stefan]
- Zero-initialize virtio-blk config space [Stefan]
- Add a patch to support resetting blk->dev_ops
- Validate vq->log->inflight fields [Stefan]
- Add vduse_set_reconnect_log_file() API to support specifying the
  reconnect log file
- Fix some bugs [Stefan]

Xie Yongji (6):
  block: Support passing NULL ops to blk_set_dev_ops()
  linux-headers: Add vduse.h
  libvduse: Add VDUSE (vDPA Device in Userspace) library
  vduse-blk: implements vduse-blk export
  vduse-blk: Add vduse-blk resize support
  libvduse: Add support for reconnecting

 MAINTAINERS |7 +
 block/block-backend.c   |2 +-
 block/export/export.c   |6 +
 block/export/meson.build|5 +
 block/export/vduse-blk.c|  459 ++
 block/export/vduse-blk.h|   20 +
 linux-headers/linux/vduse.h |  306 
 meson.build |   28 +
 meson_options.txt   |4 +
 qapi/block-export.json  |   24 +-
 scripts/meson-buildoptions.sh   |7 +
 scripts/update-linux-headers.sh |2 +-
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/libvduse.c | 1386 +++
 subprojects/libvduse/libvduse.h |  247 
 subprojects/libvduse/linux-headers/linux|1 +
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 18 files changed, 2512 insertions(+), 4 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h
 create mode 100644 linux-headers/linux/vduse.h
 create mode 12 subprojects/libvduse/include/atomic.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 12 subprojects/libvduse/linux-headers/linux
 create mode 100644 subprojects/libvduse/meson.build
 create mode 12 subprojects/libvduse/standard-headers/linux

-- 
2.20.1




[PATCH v3 1/6] block: Support passing NULL ops to blk_set_dev_ops()

2022-03-21 Thread Xie Yongji
This supports passing NULL ops to blk_set_dev_ops()
so that we can remove stale ops in some cases.

Signed-off-by: Xie Yongji 
---
 block/block-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index e0e1aff4b1..35457a6a1d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1062,7 +1062,7 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps 
*ops,
 blk->dev_opaque = opaque;
 
 /* Are we currently quiesced? Should we enforce this right now? */
-if (blk->quiesce_counter && ops->drained_begin) {
+if (blk->quiesce_counter && ops && ops->drained_begin) {
 ops->drained_begin(opaque);
 }
 }
-- 
2.20.1




[PATCH v2 6/6] libvduse: Add support for reconnecting

2022-02-15 Thread Xie Yongji
To support reconnecting after a restart or crash, the VDUSE backend
might need to resubmit inflight I/Os. This stores metadata, such as
the indices of the inflight I/O descriptors, in a shm file so that
the VDUSE backend can restore them when reconnecting.

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c|  14 ++
 subprojects/libvduse/libvduse.c | 232 +++-
 subprojects/libvduse/libvduse.h |  12 ++
 3 files changed, 253 insertions(+), 5 deletions(-)
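
As a reading aid, here is a rough sketch of the idea behind the log: size
the per-queue area from the queue depth, back it with a file, and map it
MAP_SHARED so the state is still in the file after the mapping goes away.
The struct layouts and the size formula follow the hunks in this patch;
the names DescState, InflightLog, log_size() and log_map(), and the
ALIGN_UP definition used here, are assumptions made for this note rather
than the library's API.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define LOG_ALIGNMENT 64
/* Assumed definition: round n up to a multiple of m. */
#define ALIGN_UP(n, m) (((n) + (m) - 1) / (m) * (m))

/* Per-descriptor state, laid out as in the patch. */
typedef struct DescState {
    uint8_t inflight;
    uint8_t padding[5];
    uint16_t next;
    uint64_t counter;
} DescState;

/* Per-queue log header followed by one DescState per descriptor. */
typedef struct InflightLog {
    uint64_t features;
    uint16_t version;
    uint16_t desc_num;
    uint16_t last_batch_head;
    uint16_t used_idx;
    DescState desc[];
} InflightLog;

/* Bytes needed for one queue's log, rounded up to LOG_ALIGNMENT. */
static size_t log_size(uint16_t queue_size)
{
    return ALIGN_UP(sizeof(DescState) * queue_size + sizeof(InflightLog),
                    LOG_ALIGNMENT);
}

/* Map a file-backed log of the given size; returns MAP_FAILED on error. */
static void *log_map(const char *filename, size_t size)
{
    void *ptr = MAP_FAILED;
    int fd;

    assert(size > 0);
    fd = open(filename, O_RDWR | O_CREAT, 0600);
    if (fd == -1) {
        return MAP_FAILED;
    }
    if (ftruncate(fd, (off_t)size) == 0) {
        ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return ptr;
}
```

A backend that marks desc[i].inflight before processing a request and
clears it on completion can map the same file again after a restart and
find which descriptors were still outstanding.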

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index e456dfe2b3..b519a34370 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -38,6 +38,7 @@ typedef struct VduseBlkExport {
 uint16_t num_queues;
 uint32_t blk_size;
 bool writable;
+char *recon_file;
 } VduseBlkExport;
 
 struct virtio_blk_inhdr {
@@ -232,6 +233,8 @@ static void vduse_blk_enable_queue(VduseDev *dev, 
VduseVirtq *vq)
 
 aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+/* Make sure we don't miss any kick after reconnecting */
+eventfd_write(vduse_queue_get_fd(vq), 1);
 }
 
 static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
@@ -407,6 +410,15 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 return -ENOMEM;
 }
 
+vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
+   g_get_tmp_dir(), exp->id);
+if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
+error_setg(errp, "failed to set reconnect log file");
+vduse_dev_destroy(vblk_exp->dev);
+g_free(vblk_exp->recon_file);
+return -EINVAL;
+}
+
 for (i = 0; i < num_queues; i++) {
 vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
 }
@@ -430,6 +442,8 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 vblk_exp);
 blk_set_dev_ops(exp->blk, NULL, NULL);
 vduse_dev_destroy(vblk_exp->dev);
+unlink(vblk_exp->recon_file);
+g_free(vblk_exp->recon_file);
 }
 
 static void vduse_blk_exp_request_shutdown(BlockExport *exp)
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index 9c6a8bcf6e..31f7d6e119 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -41,6 +41,8 @@
 #define VDUSE_VQ_ALIGN 4096
 #define MAX_IOVA_REGIONS 256
 
+#define LOG_ALIGNMENT 64
+
 /* Round number down to multiple */
 #define ALIGN_DOWN(n, m) ((n) / (m) * (m))
 
@@ -51,6 +53,31 @@
 #define unlikely(x)   __builtin_expect(!!(x), 0)
 #endif
 
+typedef struct VduseDescStateSplit {
+uint8_t inflight;
+uint8_t padding[5];
+uint16_t next;
+uint64_t counter;
+} VduseDescStateSplit;
+
+typedef struct VduseVirtqLogInflight {
+uint64_t features;
+uint16_t version;
+uint16_t desc_num;
+uint16_t last_batch_head;
+uint16_t used_idx;
+VduseDescStateSplit desc[];
+} VduseVirtqLogInflight;
+
+typedef struct VduseVirtqLog {
+VduseVirtqLogInflight inflight;
+} VduseVirtqLog;
+
+typedef struct VduseVirtqInflightDesc {
+uint16_t index;
+uint64_t counter;
+} VduseVirtqInflightDesc;
+
 typedef struct VduseRing {
 unsigned int num;
 uint64_t desc_addr;
@@ -73,6 +100,10 @@ struct VduseVirtq {
 bool ready;
 int fd;
 VduseDev *dev;
+VduseVirtqInflightDesc *resubmit_list;
+uint16_t resubmit_num;
+uint64_t counter;
+VduseVirtqLog *log;
 };
 
 typedef struct VduseIovaRegion {
@@ -96,8 +127,36 @@ struct VduseDev {
 int fd;
 int ctrl_fd;
 void *priv;
+void *log;
 };
 
+static inline size_t vduse_vq_log_size(uint16_t queue_size)
+{
+return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size +
+sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT);
+}
+
+static void *vduse_log_get(const char *filename, size_t size)
+{
+void *ptr = MAP_FAILED;
+int fd;
+
+fd = open(filename, O_RDWR | O_CREAT, 0600);
+if (fd == -1) {
+return MAP_FAILED;
+}
+
+if (ftruncate(fd, size) == -1) {
+goto out;
+}
+
+ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+out:
+close(fd);
+return ptr;
+}
+
 static inline bool has_feature(uint64_t features, unsigned int fbit)
 {
 assert(fbit < 64);
@@ -139,6 +198,102 @@ static int vduse_inject_irq(VduseDev *dev, int index)
 return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, &index);
 }
 
+static int inflight_desc_compare(const void *a, const void *b)
+{
+VduseVirtqInflightDesc *desc0 = (VduseVirtqInflightDesc *)a,
+   *desc1 = (VduseVirtqInflightDesc *)b;
+
+if (desc1->counter > desc0->counter &&
+(desc1->counter - desc0->counter) < VIRTQUEUE_MAX_SIZE * 2) {
+return 1;
+}
+
+retu

[PATCH v2 5/6] vduse-blk: Add vduse-blk resize support

2022-02-15 Thread Xie Yongji
To support block resize, this uses vduse_dev_update_config()
to update the capacity field in the configuration space and inject
a config interrupt from the block resize callback.

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 20 
 1 file changed, 20 insertions(+)
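
To illustrate the partial update described above, here is a small
sketch that rewrites only the capacity field of a config space buffer.
struct blk_config is a cut-down, hypothetical stand-in for struct
virtio_blk_config, and put_le64(), config_update() and on_resize() are
names invented for this note; the config interrupt that the real
callback raises is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_BITS 9

/* Hypothetical, cut-down stand-in for struct virtio_blk_config. */
struct blk_config {
    uint64_t capacity; /* little-endian, in 512-byte sectors */
    uint32_t size_max;
    uint32_t seg_max;
};

/* Encode a 64-bit value as little-endian bytes, independent of host order. */
static void put_le64(uint8_t *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Overwrite len bytes at offset in the config space; -1 if out of range. */
static int config_update(uint8_t *space, size_t space_size,
                         size_t offset, size_t len, const void *src)
{
    assert(space != NULL && src != NULL);
    if (offset > space_size || len > space_size - offset) {
        return -1;
    }
    memcpy(space + offset, src, len);
    return 0;
}

/* Resize handler: publish the new capacity, touching only that field. */
static int on_resize(uint8_t *space, size_t space_size, uint64_t new_bytes)
{
    uint8_t capacity[sizeof(uint64_t)];

    put_le64(capacity, new_bytes >> SECTOR_BITS);
    return config_update(space, space_size,
                         offsetof(struct blk_config, capacity),
                         sizeof(capacity), capacity);
}
```

Passing an offset and a length, rather than a whole new config struct,
is what lets the other fields keep their current values.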

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 942f985de3..e456dfe2b3 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -309,6 +309,23 @@ static void blk_aio_detach(void *opaque)
 vblk_exp->export.ctx = NULL;
 }
 
+static void vduse_blk_resize(void *opaque)
+{
+BlockExport *exp = opaque;
+VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+struct virtio_blk_config config;
+
+config.capacity =
+cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
+vduse_dev_update_config(vblk_exp->dev, sizeof(config.capacity),
+offsetof(struct virtio_blk_config, capacity),
+(char *)&config.capacity);
+}
+
+static const BlockDevOps vduse_block_ops = {
+.resize_cb = vduse_blk_resize,
+};
+
 static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 Error **errp)
 {
@@ -400,6 +417,8 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
  vblk_exp);
 
+blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
+
 return 0;
 }
 
@@ -409,6 +428,7 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
 vblk_exp);
+blk_set_dev_ops(exp->blk, NULL, NULL);
 vduse_dev_destroy(vblk_exp->dev);
 }
 
-- 
2.20.1




[PATCH v2 3/6] libvduse: Add VDUSE (vDPA Device in Userspace) library

2022-02-15 Thread Xie Yongji
VDUSE [1] is a Linux framework that makes it possible to implement
software-emulated vDPA devices in userspace. This adds a library
as a subproject to help implement VDUSE backends in QEMU.

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html

Signed-off-by: Xie Yongji 
---
 meson.build |   15 +
 meson_options.txt   |2 +
 scripts/meson-buildoptions.sh   |3 +
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/libvduse.c | 1152 +++
 subprojects/libvduse/libvduse.h |  225 
 subprojects/libvduse/linux-headers/linux|1 +
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 9 files changed, 1410 insertions(+)
 create mode 120000 subprojects/libvduse/include/atomic.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 120000 subprojects/libvduse/linux-headers/linux
 create mode 100644 subprojects/libvduse/meson.build
 create mode 120000 subprojects/libvduse/standard-headers/linux

diff --git a/meson.build b/meson.build
index ae5f7eec6e..27e6e07110 100644
--- a/meson.build
+++ b/meson.build
@@ -1304,6 +1304,21 @@ if not get_option('fuse_lseek').disabled()
   endif
 endif
 
+have_libvduse = (targetos == 'linux')
+if get_option('libvduse').enabled()
+if targetos != 'linux'
+error('libvduse requires linux')
+endif
+elif get_option('libvduse').disabled()
+have_libvduse = false
+endif
+
+libvduse = not_found
+if have_libvduse
+  libvduse_proj = subproject('libvduse')
+  libvduse = libvduse_proj.get_variable('libvduse_dep')
+endif
+
 # libbpf
 libbpf = dependency('libbpf', required: get_option('bpf'), method: 
'pkg-config')
 if libbpf.found() and not cc.links('''
diff --git a/meson_options.txt b/meson_options.txt
index 95d527f773..7099ab7b8b 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -193,6 +193,8 @@ option('virtfs', type: 'feature', value: 'auto',
description: 'virtio-9p support')
 option('virtiofsd', type: 'feature', value: 'auto',
description: 'build virtiofs daemon (virtiofsd)')
+option('libvduse', type: 'feature', value: 'auto',
+   description: 'build VDUSE Library')
 
 option('capstone', type: 'combo', value: 'auto',
choices: ['disabled', 'enabled', 'auto', 'system', 'internal'],
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 48a454cece..8f031ea92e 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -58,6 +58,7 @@ meson_options_help() {
   printf "%s\n" '  libssh  ssh block device support'
   printf "%s\n" '  libudev Use libudev to enumerate host devices'
   printf "%s\n" '  libusb  libusb support for USB passthrough'
+  printf "%s\n" '  libvduse build VDUSE Library'
   printf "%s\n" '  linux-aio   Linux AIO support'
   printf "%s\n" '  linux-io-uring  Linux io_uring support'
   printf "%s\n" '  lzfse   lzfse support for DMG images'
@@ -187,6 +188,8 @@ _meson_option_parse() {
 --disable-libudev) printf "%s" -Dlibudev=disabled ;;
 --enable-libusb) printf "%s" -Dlibusb=enabled ;;
 --disable-libusb) printf "%s" -Dlibusb=disabled ;;
+--enable-libvduse) printf "%s" -Dlibvduse=enabled ;;
+--disable-libvduse) printf "%s" -Dlibvduse=disabled ;;
 --enable-linux-aio) printf "%s" -Dlinux_aio=enabled ;;
 --disable-linux-aio) printf "%s" -Dlinux_aio=disabled ;;
 --enable-linux-io-uring) printf "%s" -Dlinux_io_uring=enabled ;;
diff --git a/subprojects/libvduse/include/atomic.h 
b/subprojects/libvduse/include/atomic.h
new file mode 120000
index 0000000000..8c2be64f7b
--- /dev/null
+++ b/subprojects/libvduse/include/atomic.h
@@ -0,0 +1 @@
+../../../include/qemu/atomic.h
\ No newline at end of file
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
new file mode 100644
index 0000000000..9c6a8bcf6e
--- /dev/null
+++ b/subprojects/libvduse/libvduse.c
@@ -0,0 +1,1152 @@
+/*
+ * VDUSE (vDPA Device in Userspace) library
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *   Portions of codes and concepts borrowed from libvhost-user.c, so:
+ * Copyright IBM, Corp. 2007
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * Author:
+ *   Xie Yongji 

[PATCH v2 1/6] block: Support passing NULL ops to blk_set_dev_ops()

2022-02-15 Thread Xie Yongji
This supports passing NULL ops to blk_set_dev_ops()
so that we can remove stale ops in some cases.

Signed-off-by: Xie Yongji 
---
 block/block-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 4ff6b4d785..08dd0a3093 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1015,7 +1015,7 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps 
*ops,
 blk->dev_opaque = opaque;
 
 /* Are we currently quiesced? Should we enforce this right now? */
-if (blk->quiesce_counter && ops->drained_begin) {
+if (blk->quiesce_counter && ops && ops->drained_begin) {
 ops->drained_begin(opaque);
 }
 }
-- 
2.20.1




[PATCH v2 4/6] vduse-blk: implements vduse-blk export

2022-02-15 Thread Xie Yongji
This implements a VDUSE block backend based on
the libvduse library. We can use it to export BDSs
for both VM and container (host) usage.

The new command-line syntax is:

$ qemu-storage-daemon \
--blockdev file,node-name=drive0,filename=test.img \
--export vduse-blk,node-name=drive0,id=vduse-export0,writable=on

After qemu-storage-daemon has started, we need to use
the "vdpa" command to attach the device to the vDPA bus:

$ vdpa dev add name vduse-export0 mgmtdev vduse

Also, the device must be removed via the "vdpa" command
before we stop qemu-storage-daemon.

Signed-off-by: Xie Yongji 
---
 block/export/export.c |   6 +
 block/export/meson.build  |   5 +
 block/export/vduse-blk.c  | 428 ++
 block/export/vduse-blk.h  |  20 ++
 meson.build   |  13 ++
 meson_options.txt |   2 +
 qapi/block-export.json|  24 +-
 scripts/meson-buildoptions.sh |   4 +
 8 files changed, 500 insertions(+), 2 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h
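
Since the export has to validate every request that arrives from the
virtqueue, a standalone sketch of the sector range check may help when
reading vduse_blk_sect_range_ok() in the diff. The function name
sect_range_ok() and its flat parameter list are assumptions made for this
note, and the BDRV_REQUEST_MAX_SECTORS cap from the real function is
omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_BITS 9
#define SECTOR_SIZE (1ULL << SECTOR_BITS)

/*
 * Return true if a request of size bytes starting at sector fits the
 * device: whole sectors only, start aligned to the logical block size,
 * and entirely below total_sectors.
 */
static bool sect_range_ok(uint64_t sector, size_t size,
                          uint32_t blk_size, uint64_t total_sectors)
{
    uint64_t nb_sectors;

    assert(blk_size != 0);
    if (size % SECTOR_SIZE) {
        return false;
    }
    nb_sectors = size >> SECTOR_BITS;
    if ((sector << SECTOR_BITS) % blk_size) {
        return false;
    }
    /* Written as a subtraction so that sector + nb_sectors cannot overflow. */
    if (sector > total_sectors || nb_sectors > total_sectors - sector) {
        return false;
    }
    return true;
}
```

On an 8-sector device, a 512-byte request at sector 7 is accepted, while
the same request at sector 8 is rejected because it would start at the
end of the device.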

diff --git a/block/export/export.c b/block/export/export.c
index 6d3b9964c8..00dd505540 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -26,6 +26,9 @@
 #ifdef CONFIG_VHOST_USER_BLK_SERVER
 #include "vhost-user-blk-server.h"
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+#include "vduse-blk.h"
+#endif
 
 static const BlockExportDriver *blk_exp_drivers[] = {
 &blk_exp_nbd,
@@ -35,6 +38,9 @@ static const BlockExportDriver *blk_exp_drivers[] = {
 #ifdef CONFIG_FUSE
 &blk_exp_fuse,
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+&blk_exp_vduse_blk,
+#endif
 };
 
 /* Only accessed from the main thread */
diff --git a/block/export/meson.build b/block/export/meson.build
index 0a08e384c7..cf311d2b1b 100644
--- a/block/export/meson.build
+++ b/block/export/meson.build
@@ -5,3 +5,8 @@ if have_vhost_user_blk_server
 endif
 
 blockdev_ss.add(when: fuse, if_true: files('fuse.c'))
+
+if have_vduse_blk_export
+blockdev_ss.add(files('vduse-blk.c'))
+blockdev_ss.add(libvduse)
+endif
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
new file mode 100644
index 0000000000..942f985de3
--- /dev/null
+++ b/block/export/vduse-blk.c
@@ -0,0 +1,428 @@
+/*
+ * Export QEMU block device via VDUSE
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *   Portions of codes and concepts borrowed from vhost-user-blk-server.c, so:
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * Author:
+ *   Xie Yongji 
+ *   Coiby Xu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include 
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "sysemu/block-backend.h"
+#include "block/export.h"
+#include "qemu/error-report.h"
+#include "util/block-helpers.h"
+#include "subprojects/libvduse/libvduse.h"
+
+#include "standard-headers/linux/virtio_ring.h"
+#include "standard-headers/linux/virtio_blk.h"
+
+#define VIRTIO_BLK_SECTOR_BITS 9
+#define VIRTIO_BLK_SECTOR_SIZE (1ULL << VIRTIO_BLK_SECTOR_BITS)
+
+#define VDUSE_DEFAULT_NUM_QUEUE 1
+#define VDUSE_DEFAULT_QUEUE_SIZE 256
+
+typedef struct VduseBlkExport {
+BlockExport export;
+VduseDev *dev;
+uint16_t num_queues;
+uint32_t blk_size;
+bool writable;
+} VduseBlkExport;
+
+struct virtio_blk_inhdr {
+unsigned char status;
+};
+
+typedef struct VduseBlkReq {
+VduseVirtqElement elem;
+int64_t sector_num;
+size_t in_len;
+struct virtio_blk_inhdr *in;
+struct virtio_blk_outhdr out;
+VduseVirtq *vq;
+} VduseBlkReq;
+
+static void vduse_blk_req_complete(VduseBlkReq *req)
+{
+vduse_queue_push(req->vq, &req->elem, req->in_len);
+vduse_queue_notify(req->vq);
+
+free(req);
+}
+
+static bool vduse_blk_sect_range_ok(VduseBlkExport *vblk_exp,
+uint64_t sector, size_t size)
+{
+uint64_t nb_sectors;
+uint64_t total_sectors;
+
+if (size % VIRTIO_BLK_SECTOR_SIZE) {
+return false;
+}
+
+nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS;
+
+QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE);
+if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
+return false;
+}
+if ((sector << VIRTIO_BLK_SECTOR_BITS) % vblk_exp->blk_size) {
+return false;
+}
+blk_get_geometry(vblk_exp->export.blk, &total_sectors);
+if (sector > total_sectors || nb_sectors > total_sectors - sector) {
+return false;
+}
+return true;
+}
+
+static void coroutine_fn vduse_blk_virtio_process_req(void *opaque)
+{
+VduseBlkReq *req = opaque;
+VduseVirtq *vq = req->vq;
+VduseDev *dev = vduse_queue_get_dev(v

[PATCH v2 0/6] Support exporting BDSs via VDUSE

2022-02-15 Thread Xie Yongji
Hi all,

A few months ago, VDUSE (vDPA Device in Userspace) [1] was
merged into the Linux kernel as a framework that makes it
possible to emulate a vDPA device in userspace. This series
aims to implement a VDUSE block backend based on the
qemu-storage-daemon infrastructure.

To support that, we first introduce a VDUSE library as a
subproject (as libvhost-user does) to help implement
VDUSE backends in QEMU. Then a VDUSE block export is implemented
based on this library. Finally, we add resize and reconnect support
to the VDUSE block export and the VDUSE library.

Since we don't support vdpa-blk in QEMU currently, the VM case is
tested with my previous patchset [2].

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html
[2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg797569.html

Please review, thanks!

V1 to V2:
- Move vduse header to linux-headers [Stefan]
- Add two new APIs to support creating a device from /dev/vduse/$NAME or
  a file descriptor [Stefan]
- Check VIRTIO_F_VERSION_1 during initialization [Stefan]
- Replace malloc() + memset() with calloc() [Stefan]
- Increase default queue size to 256 for vduse-blk [Stefan]
- Zero-initialize virtio-blk config space [Stefan]
- Add a patch to support resetting blk->dev_ops
- Validate vq->log->inflight fields [Stefan]
- Add vduse_set_reconnect_log_file() API to support specifying the
  reconnect log file
- Fix some bugs [Stefan]

Xie Yongji (6):
  block: Support passing NULL ops to blk_set_dev_ops()
  linux-headers: Add vduse.h
  libvduse: Add VDUSE (vDPA Device in Userspace) library
  vduse-blk: implements vduse-blk export
  vduse-blk: Add vduse-blk resize support
  libvduse: Add support for reconnecting

 block/block-backend.c   |2 +-
 block/export/export.c   |6 +
 block/export/meson.build|5 +
 block/export/vduse-blk.c|  462 +++
 block/export/vduse-blk.h|   20 +
 linux-headers/linux/vduse.h |  306 +
 meson.build |   28 +
 meson_options.txt   |4 +
 qapi/block-export.json  |   24 +-
 scripts/meson-buildoptions.sh   |7 +
 scripts/update-linux-headers.sh |2 +-
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/libvduse.c | 1374 +++
 subprojects/libvduse/libvduse.h |  237 
 subprojects/libvduse/linux-headers/linux|1 +
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 17 files changed, 2486 insertions(+), 4 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h
 create mode 100644 linux-headers/linux/vduse.h
 create mode 120000 subprojects/libvduse/include/atomic.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 120000 subprojects/libvduse/linux-headers/linux
 create mode 100644 subprojects/libvduse/meson.build
 create mode 120000 subprojects/libvduse/standard-headers/linux

-- 
2.20.1




[PATCH v2 2/6] linux-headers: Add vduse.h

2022-02-15 Thread Xie Yongji
This adds the VDUSE header to linux-headers so that the
relevant VDUSE API can be used in subsequent patches.

Signed-off-by: Xie Yongji 
---
 linux-headers/linux/vduse.h | 306 
 scripts/update-linux-headers.sh |   2 +-
 2 files changed, 307 insertions(+), 1 deletion(-)
 create mode 100644 linux-headers/linux/vduse.h
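
The header documents two requirements for struct vduse_dev_config: the
name must be NUL terminated, and reserved[] must be zero. A minimal
sketch of filling such a structure follows. struct dev_config is a local
mirror of the layout in this patch using fixed-width C types, and
SKETCH_NAME_MAX and dev_config_new() are hypothetical names for this
note; the VDUSE_CREATE_DEV ioctl itself is not issued here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NAME_MAX 256 /* mirrors VDUSE_NAME_MAX */

/* Local mirror of struct vduse_dev_config, with fixed-width C types. */
struct dev_config {
    char name[SKETCH_NAME_MAX];
    uint32_t vendor_id;
    uint32_t device_id;
    uint64_t features;
    uint32_t vq_num;
    uint32_t vq_align;
    uint32_t reserved[13];
    uint32_t config_size;
    uint8_t config[];
};

/*
 * Allocate a config with a trailing config space of config_size bytes.
 * Returns NULL if the name does not fit or if allocation fails.
 */
static struct dev_config *dev_config_new(const char *name,
                                         const void *config,
                                         uint32_t config_size)
{
    struct dev_config *cfg;
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len >= SKETCH_NAME_MAX) {
        return NULL; /* no room for the terminating NUL */
    }
    /* calloc() zeroes reserved[] and the unused tail of name[]. */
    cfg = calloc(1, sizeof(*cfg) + config_size);
    if (!cfg) {
        return NULL;
    }
    memcpy(cfg->name, name, len + 1);
    cfg->config_size = config_size;
    if (config_size) {
        memcpy(cfg->config, config, config_size);
    }
    return cfg;
}
```

Checking the length up front means the copy can never truncate the name
silently or leave it unterminated.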

diff --git a/linux-headers/linux/vduse.h b/linux-headers/linux/vduse.h
new file mode 100644
index 0000000000..d47b004ce6
--- /dev/null
+++ b/linux-headers/linux/vduse.h
@@ -0,0 +1,306 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _VDUSE_H_
+#define _VDUSE_H_
+
+#include 
+
+#define VDUSE_BASE 0x81
+
+/* The ioctls for control device (/dev/vduse/control) */
+
+#define VDUSE_API_VERSION  0
+
+/*
+ * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
+ * This is used for future extension.
+ */
+#define VDUSE_GET_API_VERSION  _IOR(VDUSE_BASE, 0x00, __u64)
+
+/* Set the version of VDUSE API that userspace supported. */
+#define VDUSE_SET_API_VERSION  _IOW(VDUSE_BASE, 0x01, __u64)
+
+/**
+ * struct vduse_dev_config - basic configuration of a VDUSE device
+ * @name: VDUSE device name, needs to be NUL terminated
+ * @vendor_id: virtio vendor id
+ * @device_id: virtio device id
+ * @features: virtio features
+ * @vq_num: the number of virtqueues
+ * @vq_align: the allocation alignment of virtqueue's metadata
+ * @reserved: for future use, needs to be initialized to zero
+ * @config_size: the size of the configuration space
+ * @config: the buffer of the configuration space
+ *
+ * Structure used by VDUSE_CREATE_DEV ioctl to create VDUSE device.
+ */
+struct vduse_dev_config {
+#define VDUSE_NAME_MAX 256
+   char name[VDUSE_NAME_MAX];
+   __u32 vendor_id;
+   __u32 device_id;
+   __u64 features;
+   __u32 vq_num;
+   __u32 vq_align;
+   __u32 reserved[13];
+   __u32 config_size;
+   __u8 config[];
+};
+
+/* Create a VDUSE device which is represented by a char device 
(/dev/vduse/$NAME) */
+#define VDUSE_CREATE_DEV   _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config)
+
+/*
+ * Destroy a VDUSE device. Make sure there are no more references
+ * to the char device (/dev/vduse/$NAME).
+ */
+#define VDUSE_DESTROY_DEV  _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX])
+
+/* The ioctls for VDUSE device (/dev/vduse/$NAME) */
+
+/**
+ * struct vduse_iotlb_entry - entry of IOTLB to describe one IOVA region 
[start, last]
+ * @offset: the mmap offset on returned file descriptor
+ * @start: start of the IOVA region
+ * @last: last of the IOVA region
+ * @perm: access permission of the IOVA region
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA 
region.
+ */
+struct vduse_iotlb_entry {
+   __u64 offset;
+   __u64 start;
+   __u64 last;
+#define VDUSE_ACCESS_RO 0x1
+#define VDUSE_ACCESS_WO 0x2
+#define VDUSE_ACCESS_RW 0x3
+   __u8 perm;
+};
+
+/*
+ * Find the first IOVA region that overlaps with the range [start, last]
+ * and return the corresponding file descriptor. Return -EINVAL means the
+ * IOVA region doesn't exist. Caller should set start and last fields.
+ */
+#define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x10, struct 
vduse_iotlb_entry)
+
+/*
+ * Get the negotiated virtio features. It's a subset of the features in
+ * struct vduse_dev_config which can be accepted by virtio driver. It's
+ * only valid after FEATURES_OK status bit is set.
+ */
+#define VDUSE_DEV_GET_FEATURES _IOR(VDUSE_BASE, 0x11, __u64)
+
+/**
+ * struct vduse_config_data - data used to update configuration space
+ * @offset: the offset from the beginning of configuration space
+ * @length: the length to write to configuration space
+ * @buffer: the buffer used to write from
+ *
+ * Structure used by VDUSE_DEV_SET_CONFIG ioctl to update device
+ * configuration space.
+ */
+struct vduse_config_data {
+   __u32 offset;
+   __u32 length;
+   __u8 buffer[];
+};
+
+/* Set device configuration space */
+#define VDUSE_DEV_SET_CONFIG   _IOW(VDUSE_BASE, 0x12, struct vduse_config_data)
+
+/*
+ * Inject a config interrupt. It's usually used to notify virtio driver
+ * that device configuration space has changed.
+ */
+#define VDUSE_DEV_INJECT_CONFIG_IRQ _IO(VDUSE_BASE, 0x13)
+
+/**
+ * struct vduse_vq_config - basic configuration of a virtqueue
+ * @index: virtqueue index
+ * @max_size: the max size of virtqueue
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
+ */
+struct vduse_vq_config {
+   __u32 index;
+   __u16 max_size;
+   __u16 reserved[13];
+};
+
+/*
+ * Setup the specified virtqueue. Make sure all virtqueues have been
+ * configured before the device is attached to vDPA bus.
+ */
+#define VDUSE_VQ_SETUP _IOW(VDUSE_BASE, 0x14, struct vduse_vq_config)
+
+/**
+ * struct vduse_vq_state_split - split virtqueue state
+ * @ava

[PATCH 4/5] vduse-blk: Add vduse-blk resize support

2022-01-25 Thread Xie Yongji
To support block resize, this uses vduse_dev_update_config()
to update the capacity field in the configuration space and inject
a config interrupt from the block resize callback.

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 5a8d289685..83845e9a9a 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -297,6 +297,23 @@ static void blk_aio_detach(void *opaque)
 vblk_exp->export.ctx = NULL;
 }
 
+static void vduse_blk_resize(void *opaque)
+{
+BlockExport *exp = opaque;
+VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+struct virtio_blk_config config;
+
+config.capacity =
+cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
+vduse_dev_update_config(vblk_exp->dev, sizeof(config.capacity),
+offsetof(struct virtio_blk_config, capacity),
+(char *)&config.capacity);
+}
+
+static const BlockDevOps vduse_block_ops = {
+.resize_cb = vduse_blk_resize,
+};
+
 static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 Error **errp)
 {
@@ -387,6 +404,8 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
  vblk_exp);
 
+blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
+
 return 0;
 }
 
-- 
2.20.1




[PATCH 2/5] libvduse: Add VDUSE (vDPA Device in Userspace) library

2022-01-25 Thread Xie Yongji
VDUSE [1] is a Linux framework that makes it possible to implement
software-emulated vDPA devices in userspace. This adds a library
as a subproject to help implement VDUSE backends in QEMU.

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html

Signed-off-by: Xie Yongji 
---
 meson.build |   15 +
 meson_options.txt   |2 +
 scripts/meson-buildoptions.sh   |3 +
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/libvduse.c | 1025 +++
 subprojects/libvduse/libvduse.h |  193 
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 8 files changed, 1250 insertions(+)
 create mode 120000 subprojects/libvduse/include/atomic.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 100644 subprojects/libvduse/meson.build
 create mode 120000 subprojects/libvduse/standard-headers/linux

diff --git a/meson.build b/meson.build
index 333c61deba..864fb50ade 100644
--- a/meson.build
+++ b/meson.build
@@ -1305,6 +1305,21 @@ if not get_option('fuse_lseek').disabled()
   endif
 endif
 
+have_libvduse = (targetos == 'linux')
+if get_option('libvduse').enabled()
+if targetos != 'linux'
+error('libvduse requires linux')
+endif
+elif get_option('libvduse').disabled()
+have_libvduse = false
+endif
+
+libvduse = not_found
+if have_libvduse
+  libvduse_proj = subproject('libvduse')
+  libvduse = libvduse_proj.get_variable('libvduse_dep')
+endif
+
 # libbpf
 libbpf = dependency('libbpf', required: get_option('bpf'), method: 
'pkg-config')
 if libbpf.found() and not cc.links('''
diff --git a/meson_options.txt b/meson_options.txt
index 921967eddb..16790d1814 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -195,6 +195,8 @@ option('virtfs', type: 'feature', value: 'auto',
description: 'virtio-9p support')
 option('virtiofsd', type: 'feature', value: 'auto',
description: 'build virtiofs daemon (virtiofsd)')
+option('libvduse', type: 'feature', value: 'auto',
+   description: 'build VDUSE Library')
 
 option('capstone', type: 'combo', value: 'auto',
choices: ['disabled', 'enabled', 'auto', 'system', 'internal'],
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index a4af02c527..af5c75d758 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -58,6 +58,7 @@ meson_options_help() {
   printf "%s\n" '  libssh  ssh block device support'
   printf "%s\n" '  libudev Use libudev to enumerate host devices'
   printf "%s\n" '  libusb  libusb support for USB passthrough'
+  printf "%s\n" '  libvduse build VDUSE Library'
   printf "%s\n" '  libxml2 libxml2 support for Parallels image format'
   printf "%s\n" '  linux-aio   Linux AIO support'
   printf "%s\n" '  linux-io-uring  Linux io_uring support'
@@ -188,6 +189,8 @@ _meson_option_parse() {
 --disable-libudev) printf "%s" -Dlibudev=disabled ;;
 --enable-libusb) printf "%s" -Dlibusb=enabled ;;
 --disable-libusb) printf "%s" -Dlibusb=disabled ;;
+--enable-libvduse) printf "%s" -Dlibvduse=enabled ;;
+--disable-libvduse) printf "%s" -Dlibvduse=disabled ;;
 --enable-libxml2) printf "%s" -Dlibxml2=enabled ;;
 --disable-libxml2) printf "%s" -Dlibxml2=disabled ;;
 --enable-linux-aio) printf "%s" -Dlinux_aio=enabled ;;
diff --git a/subprojects/libvduse/include/atomic.h 
b/subprojects/libvduse/include/atomic.h
new file mode 120000
index 0000000000..8c2be64f7b
--- /dev/null
+++ b/subprojects/libvduse/include/atomic.h
@@ -0,0 +1 @@
+../../../include/qemu/atomic.h
\ No newline at end of file
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
new file mode 100644
index 0000000000..7671864bca
--- /dev/null
+++ b/subprojects/libvduse/libvduse.c
@@ -0,0 +1,1025 @@
+/*
+ * VDUSE (vDPA Device in Userspace) library
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *   Portions of codes and concepts borrowed from libvhost-user.c, so:
+ * Copyright IBM, Corp. 2007
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * Author:
+ *   Xie Yongji 
+ *   Anthony Liguori 
+ *   Marc-André Lureau 
+ *   Victor Kaplansky 
+ *
+ * This work is licensed under the t

[PATCH 1/5] linux-headers: Add vduse.h

2022-01-25 Thread Xie Yongji
This adds the VDUSE header to standard-headers so that the
relevant VDUSE API can be used in subsequent patches.

Signed-off-by: Xie Yongji 
---
 include/standard-headers/linux/vduse.h | 306 +
 scripts/update-linux-headers.sh|   1 +
 2 files changed, 307 insertions(+)
 create mode 100644 include/standard-headers/linux/vduse.h

diff --git a/include/standard-headers/linux/vduse.h 
b/include/standard-headers/linux/vduse.h
new file mode 100644
index 0000000000..4242bc9fdf
--- /dev/null
+++ b/include/standard-headers/linux/vduse.h
@@ -0,0 +1,306 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _VDUSE_H_
+#define _VDUSE_H_
+
+#include "standard-headers/linux/types.h"
+
+#define VDUSE_BASE 0x81
+
+/* The ioctls for control device (/dev/vduse/control) */
+
+#define VDUSE_API_VERSION  0
+
+/*
+ * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
+ * This is used for future extension.
+ */
+#define VDUSE_GET_API_VERSION  _IOR(VDUSE_BASE, 0x00, uint64_t)
+
+/* Set the version of VDUSE API that userspace supported. */
+#define VDUSE_SET_API_VERSION  _IOW(VDUSE_BASE, 0x01, uint64_t)
+
+/**
+ * struct vduse_dev_config - basic configuration of a VDUSE device
+ * @name: VDUSE device name, needs to be NUL terminated
+ * @vendor_id: virtio vendor id
+ * @device_id: virtio device id
+ * @features: virtio features
+ * @vq_num: the number of virtqueues
+ * @vq_align: the allocation alignment of virtqueue's metadata
+ * @reserved: for future use, needs to be initialized to zero
+ * @config_size: the size of the configuration space
+ * @config: the buffer of the configuration space
+ *
+ * Structure used by VDUSE_CREATE_DEV ioctl to create VDUSE device.
+ */
+struct vduse_dev_config {
+#define VDUSE_NAME_MAX 256
+   char name[VDUSE_NAME_MAX];
+   uint32_t vendor_id;
+   uint32_t device_id;
+   uint64_t features;
+   uint32_t vq_num;
+   uint32_t vq_align;
+   uint32_t reserved[13];
+   uint32_t config_size;
+   uint8_t config[];
+};
+
+/* Create a VDUSE device which is represented by a char device 
(/dev/vduse/$NAME) */
+#define VDUSE_CREATE_DEV   _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config)
+
+/*
+ * Destroy a VDUSE device. Make sure there are no more references
+ * to the char device (/dev/vduse/$NAME).
+ */
+#define VDUSE_DESTROY_DEV  _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX])
+
+/* The ioctls for VDUSE device (/dev/vduse/$NAME) */
+
+/**
+ * struct vduse_iotlb_entry - entry of IOTLB to describe one IOVA region [start, last]
+ * @offset: the mmap offset on returned file descriptor
+ * @start: start of the IOVA region
+ * @last: last of the IOVA region
+ * @perm: access permission of the IOVA region
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
+ */
+struct vduse_iotlb_entry {
+   uint64_t offset;
+   uint64_t start;
+   uint64_t last;
+#define VDUSE_ACCESS_RO 0x1
+#define VDUSE_ACCESS_WO 0x2
+#define VDUSE_ACCESS_RW 0x3
+   uint8_t perm;
+};
+
+/*
+ * Find the first IOVA region that overlaps with the range [start, last]
+ * and return the corresponding file descriptor. Return -EINVAL means the
+ * IOVA region doesn't exist. Caller should set start and last fields.
+ */
+#define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x10, struct vduse_iotlb_entry)
+
+/*
+ * Get the negotiated virtio features. It's a subset of the features in
+ * struct vduse_dev_config which can be accepted by virtio driver. It's
+ * only valid after FEATURES_OK status bit is set.
+ */
+#define VDUSE_DEV_GET_FEATURES _IOR(VDUSE_BASE, 0x11, uint64_t)
+
+/**
+ * struct vduse_config_data - data used to update configuration space
+ * @offset: the offset from the beginning of configuration space
+ * @length: the length to write to configuration space
+ * @buffer: the buffer used to write from
+ *
+ * Structure used by VDUSE_DEV_SET_CONFIG ioctl to update device
+ * configuration space.
+ */
+struct vduse_config_data {
+   uint32_t offset;
+   uint32_t length;
+   uint8_t buffer[];
+};
+
+/* Set device configuration space */
+#define VDUSE_DEV_SET_CONFIG   _IOW(VDUSE_BASE, 0x12, struct vduse_config_data)
+
+/*
+ * Inject a config interrupt. It's usually used to notify virtio driver
+ * that device configuration space has changed.
+ */
+#define VDUSE_DEV_INJECT_CONFIG_IRQ _IO(VDUSE_BASE, 0x13)
+
+/**
+ * struct vduse_vq_config - basic configuration of a virtqueue
+ * @index: virtqueue index
+ * @max_size: the max size of virtqueue
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
+ */
+struct vduse_vq_config {
+   uint32_t index;
+   uint16_t max_size;
+   uint16_t reserved[13];
+};
+
+/*
+ * Setup the specified virtqueue. Make sure all virtqueues have been
+ * configured before the device is attached to vDPA bus.
+ */
+#define VDU

[PATCH 3/5] vduse-blk: implements vduse-blk export

2022-01-25 Thread Xie Yongji
This implements a VDUSE block backend based on
the libvduse library. We can use it to export BDSs
for both VM and container (host) usage.

The new command-line syntax is:

$ qemu-storage-daemon \
--blockdev file,node-name=drive0,filename=test.img \
--export vduse-blk,node-name=drive0,id=vduse-export0,writable=on

After qemu-storage-daemon has started, we need to use
the "vdpa" command to attach the device to the vDPA bus:

$ vdpa dev add name vduse-export0 mgmtdev vduse

The device must also be removed via the "vdpa" command
before we stop qemu-storage-daemon.

Signed-off-by: Xie Yongji 
---
 block/export/export.c |   6 +
 block/export/meson.build  |   5 +
 block/export/vduse-blk.c  | 427 ++
 block/export/vduse-blk.h  |  20 ++
 meson.build   |  13 ++
 meson_options.txt |   2 +
 qapi/block-export.json|  24 +-
 scripts/meson-buildoptions.sh |   4 +
 8 files changed, 499 insertions(+), 2 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h

diff --git a/block/export/export.c b/block/export/export.c
index 6d3b9964c8..00dd505540 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -26,6 +26,9 @@
 #ifdef CONFIG_VHOST_USER_BLK_SERVER
 #include "vhost-user-blk-server.h"
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+#include "vduse-blk.h"
+#endif
 
 static const BlockExportDriver *blk_exp_drivers[] = {
 &blk_exp_nbd,
@@ -35,6 +38,9 @@ static const BlockExportDriver *blk_exp_drivers[] = {
 #ifdef CONFIG_FUSE
 &blk_exp_fuse,
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+&blk_exp_vduse_blk,
+#endif
 };
 
 /* Only accessed from the main thread */
diff --git a/block/export/meson.build b/block/export/meson.build
index 0a08e384c7..cf311d2b1b 100644
--- a/block/export/meson.build
+++ b/block/export/meson.build
@@ -5,3 +5,8 @@ if have_vhost_user_blk_server
 endif
 
 blockdev_ss.add(when: fuse, if_true: files('fuse.c'))
+
+if have_vduse_blk_export
+blockdev_ss.add(files('vduse-blk.c'))
+blockdev_ss.add(libvduse)
+endif
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
new file mode 100644
index 0000000000..5a8d289685
--- /dev/null
+++ b/block/export/vduse-blk.c
@@ -0,0 +1,427 @@
+/*
+ * Export QEMU block device via VDUSE
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
+ *   Portions of codes and concepts borrowed from vhost-user-blk-server.c, so:
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * Author:
+ *   Xie Yongji 
+ *   Coiby Xu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include 
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "sysemu/block-backend.h"
+#include "block/export.h"
+#include "qemu/error-report.h"
+#include "util/block-helpers.h"
+#include "subprojects/libvduse/libvduse.h"
+
+#include "standard-headers/linux/virtio_ring.h"
+#include "standard-headers/linux/virtio_blk.h"
+
+#define VIRTIO_BLK_SECTOR_BITS 9
+#define VIRTIO_BLK_SECTOR_SIZE (1ULL << VIRTIO_BLK_SECTOR_BITS)
+
+#define VDUSE_DEFAULT_NUM_QUEUE 1
+#define VDUSE_DEFAULT_QUEUE_SIZE 128
+
+typedef struct VduseBlkExport {
+BlockExport export;
+VduseDev *dev;
+uint16_t num_queues;
+uint32_t blk_size;
+bool writable;
+} VduseBlkExport;
+
+struct virtio_blk_inhdr {
+unsigned char status;
+};
+
+typedef struct VduseBlkReq {
+VduseVirtqElement elem;
+int64_t sector_num;
+size_t in_len;
+struct virtio_blk_inhdr *in;
+struct virtio_blk_outhdr out;
+VduseVirtq *vq;
+} VduseBlkReq;
+
+static void vduse_blk_req_complete(VduseBlkReq *req)
+{
+vduse_queue_push(req->vq, &req->elem, req->in_len);
+vduse_queue_notify(req->vq);
+
+free(req);
+}
+
+static bool vduse_blk_sect_range_ok(VduseBlkExport *vblk_exp,
+uint64_t sector, size_t size)
+{
+uint64_t nb_sectors;
+uint64_t total_sectors;
+
+if (size % VIRTIO_BLK_SECTOR_SIZE) {
+return false;
+}
+
+nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS;
+
+QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE);
+if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
+return false;
+}
+if ((sector << VIRTIO_BLK_SECTOR_BITS) % vblk_exp->blk_size) {
+return false;
+}
+blk_get_geometry(vblk_exp->export.blk, &total_sectors);
+if (sector > total_sectors || nb_sectors > total_sectors - sector) {
+return false;
+}
+return true;
+}
+
+static void coroutine_fn vduse_blk_virtio_process_req(void *opaque)
+{
+VduseBlkReq *req = opaque;
+VduseVirtq *vq = req->vq;
+VduseDev *dev = vduse_queue_get_dev(v

[PATCH 5/5] libvduse: Add support for reconnecting

2022-01-25 Thread Xie Yongji
To support reconnecting after a restart or crash, a VDUSE backend
might need to resubmit inflight I/Os. This stores metadata, such
as the indexes of the inflight I/Os' descriptors, in a shm file so
that the VDUSE backend can restore them when reconnecting.
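
The idea of keeping inflight state in a file-backed shared mapping can be
sketched as follows. This is only an illustration under stated assumptions:
DemoDescState, demo_log_map() and demo_log_count_inflight() are hypothetical
names, and the record layout is simplified and is not libvduse's actual log
format.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical per-descriptor record; NOT libvduse's real layout. */
typedef struct DemoDescState {
    uint8_t inflight;    /* 1 while the request is being processed */
    uint8_t padding[7];
    uint64_t counter;    /* submission order, usable to sort on resubmit */
} DemoDescState;

/* Map (creating it if necessary) a file-backed log holding `num` records. */
static DemoDescState *demo_log_map(const char *path, size_t num)
{
    size_t size = num * sizeof(DemoDescState);
    void *ptr;
    int fd;

    assert(num > 0);

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        return NULL;
    }
    /* A new file is zero-filled; an existing one keeps its contents. */
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the fd is closed */

    return ptr == MAP_FAILED ? NULL : ptr;
}

/* Count the descriptors still marked inflight, e.g. after a restart. */
static size_t demo_log_count_inflight(const DemoDescState *log, size_t num)
{
    size_t i, n = 0;

    for (i = 0; i < num; i++) {
        if (log[i].inflight) {
            n++;
        }
    }
    return n;
}
```

Because the mapping is MAP_SHARED, a process that maps the same path after a
restart sees the records written by the previous instance and can decide
which requests need to be resubmitted.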

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c|   4 +-
 subprojects/libvduse/libvduse.c | 254 +++-
 subprojects/libvduse/libvduse.h |   4 +-
 3 files changed, 254 insertions(+), 8 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 83845e9a9a..bc14fd798b 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -232,6 +232,8 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
 
 aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+/* Make sure we don't miss any kick after reconnecting */
+eventfd_write(vduse_queue_get_fd(vq), 1);
 }
 
 static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
@@ -388,7 +390,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
  features, num_queues,
  sizeof(struct virtio_blk_config),
  (char *)&config, &vduse_blk_ops,
- vblk_exp);
+ g_get_tmp_dir(), vblk_exp);
 if (!vblk_exp->dev) {
 error_setg(errp, "failed to create vduse device");
 return -ENOMEM;
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index 7671864bca..ce2f6c7949 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -41,6 +41,8 @@
 #define VDUSE_VQ_ALIGN 4096
 #define MAX_IOVA_REGIONS 256
 
+#define LOG_ALIGNMENT 64
+
 /* Round number down to multiple */
 #define ALIGN_DOWN(n, m) ((n) / (m) * (m))
 
@@ -51,6 +53,31 @@
 #define unlikely(x)   __builtin_expect(!!(x), 0)
 #endif
 
+typedef struct VduseDescStateSplit {
+uint8_t inflight;
+uint8_t padding[5];
+uint16_t next;
+uint64_t counter;
+} VduseDescStateSplit;
+
+typedef struct VduseVirtqLogInflight {
+uint64_t features;
+uint16_t version;
+uint16_t desc_num;
+uint16_t last_batch_head;
+uint16_t used_idx;
+VduseDescStateSplit desc[];
+} VduseVirtqLogInflight;
+
+typedef struct VduseVirtqLog {
+VduseVirtqLogInflight inflight;
+} VduseVirtqLog;
+
+typedef struct VduseVirtqInflightDesc {
+uint16_t index;
+uint64_t counter;
+} VduseVirtqInflightDesc;
+
 typedef struct VduseRing {
 unsigned int num;
 uint64_t desc_addr;
@@ -73,6 +100,10 @@ struct VduseVirtq {
 bool ready;
 int fd;
 VduseDev *dev;
+VduseVirtqInflightDesc *resubmit_list;
+uint16_t resubmit_num;
+uint64_t counter;
+VduseVirtqLog *log;
 };
 
 typedef struct VduseIovaRegion {
@@ -96,8 +127,67 @@ struct VduseDev {
 int fd;
 int ctrl_fd;
 void *priv;
+char *shm_log_dir;
+void *log;
+bool reconnect;
 };
 
+static inline size_t vduse_vq_log_size(uint16_t queue_size)
+{
+return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size +
+sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT);
+}
+
+static void *vduse_log_get(const char *dir, const char *name, size_t size)
+{
+void *ptr = MAP_FAILED;
+char *path;
+int fd;
+
+path = (char *)malloc(strlen(dir) + strlen(name) +
+  strlen("/vduse-log-") + 1);
+if (!path) {
+return ptr;
+}
+sprintf(path, "%s/vduse-log-%s", dir, name);
+
+fd = open(path, O_RDWR | O_CREAT, 0600);
+if (fd == -1) {
+goto out;
+}
+
+if (ftruncate(fd, size) == -1) {
+goto out;
+}
+
+ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+if (ptr == MAP_FAILED) {
+goto out;
+}
+out:
+if (fd >= 0) {
+close(fd);
+}
+free(path);
+
+return ptr;
+}
+
+static void vduse_log_destroy(const char *dir, const char *name)
+{
+char *path;
+
+path = (char *)malloc(strlen(dir) + strlen(name) +
+  strlen("/vduse-log-") + 1);
+if (!path) {
+return;
+}
+sprintf(path, "%s/vduse-log-%s", dir, name);
+
+unlink(path);
+free(path);
+}
+
 static inline bool has_feature(uint64_t features, unsigned int fbit)
 {
 assert(fbit < 64);
@@ -139,6 +229,98 @@ static int vduse_inject_irq(VduseDev *dev, int index)
 return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, &index);
 }
 
+static int inflight_desc_compare(const void *a, const void *b)
+{
+VduseVirtqInflightDesc *desc0 = (VduseVirtqInflightDesc *)a,
+   *desc1 = (VduseVirtqInflightDesc *)b;
+
+if (desc1->counter > desc0->counter &&
+(desc1->counter

[PATCH 0/5] Support exporting BDSs via VDUSE

2022-01-25 Thread Xie Yongji
Hi all,

A few months ago, VDUSE (vDPA Device in Userspace) [1] was
merged into the Linux kernel as a framework that makes it
possible to emulate a vDPA device in userspace. This series
aims to implement a VDUSE block backend based on the
qemu-storage-daemon infrastructure.

To support that, we first introduce a VDUSE library as a
subproject (like libvhost-user) to help implement
VDUSE backends in QEMU. Then a VDUSE block export is implemented
based on this library. Finally, we add resize and reconnect support
to the VDUSE block export and the VDUSE library.

Since we don't support vdpa-blk in QEMU currently, the VM case is
tested with my previous patchset [2].

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html
[2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg797569.html

Please review, thanks!

Xie Yongji (5):
  headers: Add vduse.h
  libvduse: Add VDUSE (vDPA Device in Userspace) library
  vduse-blk: implements vduse-blk export
  vduse-blk: Add vduse-blk resize support
  libvduse: Add support for reconnecting

 block/export/export.c   |6 +
 block/export/meson.build|5 +
 block/export/vduse-blk.c|  448 +++
 block/export/vduse-blk.h|   20 +
 include/standard-headers/linux/vduse.h  |  306 +
 meson.build |   28 +
 meson_options.txt   |4 +
 qapi/block-export.json  |   24 +-
 scripts/meson-buildoptions.sh   |7 +
 scripts/update-linux-headers.sh |1 +
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/libvduse.c | 1267 +++
 subprojects/libvduse/libvduse.h |  195 +++
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 15 files changed, 2321 insertions(+), 2 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h
 create mode 100644 include/standard-headers/linux/vduse.h
 create mode 120000 subprojects/libvduse/include/atomic.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 100644 subprojects/libvduse/meson.build
 create mode 120000 subprojects/libvduse/standard-headers/linux

-- 
2.20.1




[PATCH 2/2] block/export: Add vhost-user-blk resize support

2022-01-21 Thread Xie Yongji
To support block resize, this updates the capacity field
in the configuration space and uses vu_notify_config_change()
to notify the vhost-user master from the block resize callback.
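
The capacity update amounts to converting the image length in bytes into
512-byte sectors and storing the result in little-endian byte order. A
minimal standalone sketch of that arithmetic, assuming hypothetical helper
names (demo_bytes_to_sectors() and demo_store_le64() are not QEMU functions):

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_BITS 9 /* virtio-blk sectors are 512 bytes */

/* Convert a byte length into the number of whole 512-byte sectors. */
static uint64_t demo_bytes_to_sectors(int64_t length_bytes)
{
    assert(length_bytes >= 0);
    return (uint64_t)length_bytes >> DEMO_SECTOR_BITS;
}

/* Store a 64-bit value as little-endian bytes on any host endianness. */
static void demo_store_le64(uint8_t out[8], uint64_t value)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(value >> (8 * i));
    }
}
```

For example, a 1 MiB image yields 2048 sectors, which is encoded with the
low byte first.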

Signed-off-by: Xie Yongji 
---
 block/export/vhost-user-blk-server.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index 1862563336..929a0bd007 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -435,6 +435,20 @@ static void blk_aio_detach(void *opaque)
 vexp->export.ctx = NULL;
 }
 
+static void vu_blk_resize(void *opaque)
+{
+BlockExport *exp = opaque;
+VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
+
+vexp->blkcfg.capacity =
+cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
+vu_notify_config_change(&vexp->vu_server.vu_dev);
+}
+
+static const BlockDevOps vu_block_ops = {
+.resize_cb = vu_blk_resize,
+};
+
 static void
 vu_blk_initialize_config(BlockDriverState *bs,
  struct virtio_blk_config *config,
@@ -513,6 +527,8 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 return -EADDRNOTAVAIL;
 }
 
+blk_set_dev_ops(exp->blk, &vu_block_ops, exp);
+
 return 0;
 }
 
-- 
2.20.1




[PATCH 1/2] libvhost-user: Add vu_notify_config_change() to support config change notify

2022-01-21 Thread Xie Yongji
This adds a new API, vu_notify_config_change(), which sends
a VHOST_USER_SLAVE_CONFIG_CHANGE_MSG message to notify the
vhost-user master that the configuration space has changed.

Signed-off-by: Xie Yongji 
---
 subprojects/libvhost-user/libvhost-user.c | 20 
 subprojects/libvhost-user/libvhost-user.h |  8 
 2 files changed, 28 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 787f4d2d4f..ff95ccd6f3 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -1545,6 +1545,26 @@ vu_set_config(VuDev *dev, VhostUserMsg *vmsg)
 return false;
 }
 
+bool vu_notify_config_change(VuDev *dev)
+{
+bool ret;
+VhostUserMsg vmsg = {
+.request = VHOST_USER_SLAVE_CONFIG_CHANGE_MSG,
+.flags = VHOST_USER_VERSION,
+.size = 0,
+};
+
+if (!vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_CONFIG)) {
+return false;
+}
+
+pthread_mutex_lock(&dev->slave_mutex);
+ret = !vu_message_write(dev, dev->slave_fd, &vmsg);
+pthread_mutex_unlock(&dev->slave_mutex);
+
+return ret;
+}
+
 static bool
 vu_set_postcopy_advise(VuDev *dev, VhostUserMsg *vmsg)
 {
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index 3d13dfadde..dd14242a7b 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -491,6 +491,14 @@ bool vu_dispatch(VuDev *dev);
  */
 void *vu_gpa_to_va(VuDev *dev, uint64_t *plen, uint64_t guest_addr);
 
+/**
+ * vu_notify_config_change:
+ * @dev: a VuDev context
+ *
+ * Notify that the configuration space has changed. Returns FALSE on failure.
+ */
+bool vu_notify_config_change(VuDev *dev);
+
 /**
  * vu_get_queue:
  * @dev: a VuDev context
-- 
2.20.1




[PATCH 1/2] iscsi: handle check condition status in retry loop

2020-07-01 Thread Xie Yongji
The handling of check condition was incorrect because
we would only do it after the retries exceeded the maximum.

Fixes: 8c460269aa ("iscsi: base all handling of check condition on scsi_sense_to_errno")
Signed-off-by: Xie Yongji 
---
 block/iscsi.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index a8b76979d8..2964c9f8d2 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -266,16 +266,16 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
 timer_mod(&iTask->retry_timer,
   qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + retry_time);
 iTask->do_retry = 1;
-}
-} else if (status == SCSI_STATUS_CHECK_CONDITION) {
-int error = iscsi_translate_sense(&task->sense);
-if (error == EAGAIN) {
-error_report("iSCSI CheckCondition: %s",
- iscsi_get_error(iscsi));
-iTask->do_retry = 1;
-} else {
-iTask->err_code = -error;
-iTask->err_str = g_strdup(iscsi_get_error(iscsi));
+} else if (status == SCSI_STATUS_CHECK_CONDITION) {
+int error = iscsi_translate_sense(&task->sense);
+if (error == EAGAIN) {
+error_report("iSCSI CheckCondition: %s",
+ iscsi_get_error(iscsi));
+iTask->do_retry = 1;
+} else {
+iTask->err_code = -error;
+iTask->err_str = g_strdup(iscsi_get_error(iscsi));
+}
 }
 }
 }
-- 
2.11.0




[PATCH 2/2] iscsi: return -EIO when sense fields are meaningless

2020-07-01 Thread Xie Yongji
When an I/O request fails, we currently only return a correct
value on SCSI check condition. We should also have a
default errno such as -EIO in the other cases.
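
The intended completion logic can be sketched in isolation. This is a
simplified model under stated assumptions, not the code in block/iscsi.c:
DemoTask, demo_status, demo_complete() and DEMO_MAX_RETRIES are hypothetical
names, and the sense data is reduced to an already translated errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of completion statuses. */
enum demo_status { DEMO_GOOD, DEMO_BUSY, DEMO_CHECK_CONDITION, DEMO_OTHER };

typedef struct DemoTask {
    int retries;
    bool do_retry;
    int err_code;
} DemoTask;

#define DEMO_MAX_RETRIES 5 /* assumed limit, for illustration only */

/*
 * Record the outcome of one completed request.
 * sense_errno is the errno derived from the sense data; it is only
 * consulted for DEMO_CHECK_CONDITION.
 */
static void demo_complete(DemoTask *t, enum demo_status status,
                          int sense_errno)
{
    assert(t != NULL);

    t->do_retry = false;
    t->err_code = 0;

    if (status == DEMO_GOOD) {
        return;
    }

    /* Default error for any failure without a more specific cause. */
    t->err_code = -EIO;

    if (t->retries++ < DEMO_MAX_RETRIES) {
        if (status == DEMO_BUSY) {
            t->do_retry = true;
        } else if (status == DEMO_CHECK_CONDITION) {
            if (sense_errno == EAGAIN) {
                t->do_retry = true;
            } else {
                t->err_code = -sense_errno;
            }
        }
    }
}
```

With this shape, a failure whose sense data carries no useful information
still ends up with a negative errno instead of 0.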

Signed-off-by: Xie Yongji 
---
 block/iscsi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 2964c9f8d2..387ed872ef 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -241,9 +241,11 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
 
 iTask->status = status;
 iTask->do_retry = 0;
+iTask->err_code = 0;
 iTask->task = task;
 
 if (status != SCSI_STATUS_GOOD) {
+iTask->err_code = -EIO;
 if (iTask->retries++ < ISCSI_CMD_RETRIES) {
 if (status == SCSI_STATUS_BUSY ||
 status == SCSI_STATUS_TIMEOUT ||
-- 
2.11.0