[PATCH] hw/sd: Correct CMD58's R3 response "in idle state" bit in SPI-mode

2022-01-25 Thread frank . chang
From: Frank Chang 

In SPI-mode, CMD58 returns R3 response with the format:

39      32 31                       0
+--------+ +------------------------+
|   R1   | |          OCR           |
+--------+ +------------------------+

Bit 0 of R1 indicates whether the SD card is "in idle state".
However, according to the SD card state transition table, CMD58 can only
transition from the trans state to the data state, so the "in idle state"
bit should not be set in CMD58's R3 response.
(CMD8, however, should still set the "in idle state" bit in its R7
response, because it can only transition from the idle state to the idle
state.)
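For illustration only, a minimal C sketch of the 5-byte SPI response
described above: one R1 byte followed by a 32-bit payload sent most
significant byte first. The helper build_spi_long_response() and the
R1_IN_IDLE_STATE name are hypothetical and not taken from hw/sd/ssi-sd.c;
the point is only that an R3 (CMD58) would be built with the idle bit
clear while an R7 (CMD8) keeps it set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* R1 bit 0: the card is in the idle state (SPI mode). */
#define R1_IN_IDLE_STATE 0x01

/*
 * Hypothetical helper: fill a 5-byte SPI response consisting of an R1
 * byte followed by a 32-bit payload (the OCR for R3, the echoed
 * voltage/check pattern for R7), most significant byte first.
 */
static void build_spi_long_response(uint8_t resp[5], bool idle,
                                    uint32_t payload)
{
    assert(resp != NULL);
    resp[0] = idle ? R1_IN_IDLE_STATE : 0x00;
    resp[1] = (uint8_t)(payload >> 24);
    resp[2] = (uint8_t)(payload >> 16);
    resp[3] = (uint8_t)(payload >> 8);
    resp[4] = (uint8_t)payload;
}
```

For example, an R3 carrying an OCR of 0x80FF8000 would go out as
00 80 FF 80 00, while an R7 echoing 0x1AA would go out as 01 00 00 01 AA.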

Signed-off-by: Frank Chang 
---
 hw/sd/ssi-sd.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/sd/ssi-sd.c b/hw/sd/ssi-sd.c
index 167c03b780..7faa969e82 100644
--- a/hw/sd/ssi-sd.c
+++ b/hw/sd/ssi-sd.c
@@ -176,12 +176,17 @@ static uint32_t ssi_sd_transfer(SSIPeripheral *dev, uint32_t val)
 s->arglen = 1;
 s->response[0] = 4;
 DPRINTF("SD command failed\n");
-} else if (s->cmd == 8 || s->cmd == 58) {
-/* CMD8/CMD58 returns R3/R7 response */
-DPRINTF("Returned R3/R7\n");
+} else if (s->cmd == 8) {
+/* CMD8 returns R7 response */
+DPRINTF("Returned R7\n");
 s->arglen = 5;
 s->response[0] = 1;
 memcpy(&s->response[1], longresp, 4);
+} else if (s->cmd == 58) {
+/* CMD58 returns R3 response */
+DPRINTF("Returned R3\n");
+s->arglen = 5;
+memcpy(&s->response[1], longresp, 4);
 } else if (s->arglen != 4) {
 BADF("Unexpected response to cmd %d\n", s->cmd);
 /* Illegal command is about as near as we can get.  */
-- 
2.31.1




Re: [PATCH v1 08/22] drop libxml2 checks since libxml is not actually used (for parallels)

2022-01-25 Thread Vladimir Sementsov-Ogievskiy

24.01.2022 23:15, Alex Bennée wrote:

From: Michael Tokarev 

For a long time, we assumed that libxml2 is necessary for parallels
block format support (block/parallels*). However, this format actually
does not use libxml [*]. Since this is the only user of libxml2 in the
whole QEMU tree, we can drop all libxml2 checks and dependencies too.

Moreover, the --enable-parallels configure option was the only
option that was silently ignored when its (fake) dependency
(libxml2) was not installed.

Drop all mentions of libxml2.

[*] Actually, the basis for libxml use was introduced in commit
 ed279a06c53 ("configure: add dependency"), but the implementation
 was never merged:
 
https://lore.kernel.org/qemu-devel/70227bbd-a517-70e9-714f-e6e0ec431...@openvz.org/

Signed-off-by: Michael Tokarev 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20220119090423.149315-1-...@msgid.tls.msk.ru>
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
[PMD: Updated description and adapted to use lcitool]
Reviewed-by: Thomas Huth 
Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Alex Bennée 
Message-Id: <20220121154134.315047-5-f4...@amsat.org>


Reviewed-by: Vladimir Sementsov-Ogievskiy 

The Parallels format includes an XML disk descriptor format, which was never merged into
QEMU. Yes, the introduced dependency was premature, sorry for that :(

An implementation of the XML part was posted, but only for anyone else who wants to
continue this work; Virtuozzo has no such plans now:

https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg03601.html

--
Best regards,
Vladimir



Re: [PATCH v6 04/33] block/export/fuse.c: allow writable exports to take RESIZE permission

2022-01-25 Thread Hanna Reitz

On 21.01.22 18:05, Emanuele Giuseppe Esposito wrote:

Allow writable exports to get BLK_PERM_RESIZE permission
from creation, in fuse_export_create().
In this way, there is no need to give the permission in
fuse_do_truncate(), which might be run in an iothread.

Permissions should be set only in the main thread, so
in any case if an iothread tries to set RESIZE, it will
be blocked.

Also assert in fuse_do_truncate that if we give the
RESIZE permission we can then restore the original ones,
since we don't check the return value of blk_set_perm.


We do, because we pass &error_abort for errp, so if an error were to
occur, QEMU would abort.


Not that I mind adding an assertion on the return value, just noting
that we kind of omitted that intentionally.


Reviewed-by: Hanna Reitz 


Signed-off-by: Emanuele Giuseppe Esposito 
---
  block/export/fuse.c | 25 ++---
  1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 823c126d23..3c177b9e67 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -86,8 +86,8 @@ static int fuse_export_create(BlockExport *blk_exp,
  
  assert(blk_exp_args->type == BLOCK_EXPORT_TYPE_FUSE);
  
-/* For growable exports, take the RESIZE permission */
-if (args->growable) {
+/* For growable and writable exports, take the RESIZE permission */
+if (args->growable || blk_exp_args->writable) {
  uint64_t blk_perm, blk_shared_perm;
  
  blk_get_perm(exp->common.blk, &blk_perm, &blk_shared_perm);
@@ -392,14 +392,23 @@ static int fuse_do_truncate(const FuseExport *exp, int64_t size,
  {
  uint64_t blk_perm, blk_shared_perm;
  BdrvRequestFlags truncate_flags = 0;
-int ret;
+bool add_resize_perm;
+int ret, ret_check;
+
+/* Growable and writable exports have a permanent RESIZE permission */
+add_resize_perm = !exp->growable && !exp->writable;
  
  if (req_zero_write) {
  truncate_flags |= BDRV_REQ_ZERO_WRITE;
  }
  
-/* Growable exports have a permanent RESIZE permission */
-if (!exp->growable) {
+if (add_resize_perm) {
+
+if (!qemu_in_main_thread()) {
+/* Changing permissions like below only works in the main thread */
+return -EPERM;
+}
+
  blk_get_perm(exp->common.blk, &blk_perm, &blk_shared_perm);
  
  ret = blk_set_perm(exp->common.blk, blk_perm | BLK_PERM_RESIZE,

@@ -412,9 +421,11 @@ static int fuse_do_truncate(const FuseExport *exp, int64_t size,
  ret = blk_truncate(exp->common.blk, size, true, prealloc,
 truncate_flags, NULL);
  
-if (!exp->growable) {
+if (add_resize_perm) {
  /* Must succeed, because we are only giving up the RESIZE permission */
-blk_set_perm(exp->common.blk, blk_perm, blk_shared_perm, &error_abort);
+ret_check = blk_set_perm(exp->common.blk, blk_perm,
+ blk_shared_perm, &error_abort);
+assert(ret_check == 0);
  }
  
  return ret;
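
A simplified, QEMU-independent model of the permission logic in this
patch may help when reviewing it. ExportModel, PERM_RESIZE,
export_create() and export_truncate() are hypothetical stand-ins for the
FUSE export, BLK_PERM_RESIZE, fuse_export_create() and
fuse_do_truncate(); the real code queries and changes permissions with
blk_get_perm()/blk_set_perm() and checks the thread with
qemu_in_main_thread().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for BLK_PERM_RESIZE; only the bit matters here. */
#define PERM_RESIZE (1u << 3)

typedef struct {
    bool growable;
    bool writable;
    uint64_t perm;      /* currently held permissions */
} ExportModel;

/* Creation: growable or writable exports take RESIZE once, up front. */
static void export_create(ExportModel *exp)
{
    if (exp->growable || exp->writable) {
        exp->perm |= PERM_RESIZE;
    }
}

/*
 * Truncate: only exports without a permanent RESIZE permission add it
 * temporarily, and changing permissions is only allowed in the main
 * thread.
 */
static int export_truncate(ExportModel *exp, bool in_main_thread)
{
    bool add_resize_perm = !exp->growable && !exp->writable;
    uint64_t saved = exp->perm;

    if (add_resize_perm) {
        if (!in_main_thread) {
            return -EPERM;
        }
        exp->perm |= PERM_RESIZE;
    }

    /* The actual resize would happen here, with RESIZE held. */
    assert(exp->perm & PERM_RESIZE);

    if (add_resize_perm) {
        exp->perm = saved;  /* give the temporary permission back */
    }
    return 0;
}
```

With this split, a writable export can be truncated from any thread,
while an export that holds RESIZE only temporarily fails with -EPERM
outside the main thread instead of trying to change permissions there.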





[PATCH] qemu-storage-daemon: Fix typo in vhost-user-blk help

2022-01-25 Thread Kevin Wolf
The syntax shown for the fd-passing case is missing the "addr.type=" key. Add it.

Signed-off-by: Kevin Wolf 
---
 storage-daemon/qemu-storage-daemon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/storage-daemon/qemu-storage-daemon.c b/storage-daemon/qemu-storage-daemon.c
index 9d76d1114d..ec9aa79b55 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -111,7 +111,7 @@ static void help(void)
 " export the specified block node as a\n"
 " vhost-user-blk device over UNIX domain socket\n"
 "  --export [type=]vhost-user-blk,id=,node-name=,\n"
-"   fd,addr.str=[,writable=on|off]\n"
+"   addr.type=fd,addr.str=[,writable=on|off]\n"
 "   [,logical-block-size=][,num-queues=]\n"
 " export the specified block node as a\n"
 " vhost-user-blk device over file descriptor\n"
-- 
2.31.1




Re: [PATCH v2 1/3] block/export/fuse: Extract fuse_fallocate_punch_hole()

2022-01-25 Thread Philippe Mathieu-Daudé via



On 1/25/22 12:10, Thomas Huth wrote:
> On 24/01/2022 23.03, Philippe Mathieu-Daudé via wrote:
>> Extract fuse_fallocate_punch_hole() to avoid #ifdef'ry
>> mixed within if/else statement.
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>   block/export/fuse.c | 59 +++--
>>   1 file changed, 35 insertions(+), 24 deletions(-)
>>
>> diff --git a/block/export/fuse.c b/block/export/fuse.c
>> index 6710d8aed86..31cb0503adc 100644
>> --- a/block/export/fuse.c
>> +++ b/block/export/fuse.c
>> @@ -603,6 +603,38 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
>>   }
>>   }
>>   +static bool fuse_fallocate_zero_range(fuse_req_t req, fuse_ino_t inode,
>> +  int mode, int64_t blk_len,
>> +  off_t offset, off_t *length)
>> +{
>> +#ifdef CONFIG_FALLOCATE_ZERO_RANGE
>> +    FuseExport *exp = fuse_req_userdata(req);
>> +
>> +    if (mode & FALLOC_FL_ZERO_RANGE) {
>> +    int ret;
>> +
>> +   if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + *length > blk_len) {
>> +    /* No need for zeroes, we are going to write them ourselves */
>> +    ret = fuse_do_truncate(exp, offset + *length, false,
>> +   PREALLOC_MODE_OFF);
>> +    if (ret < 0) {
>> +    fuse_reply_err(req, -ret);
>> +    return false;
>> +    }
>> +    }
>> +
>> +    do {
>> +    int size = MIN(*length, BDRV_REQUEST_MAX_BYTES);
>> +
>> +    ret = blk_pwrite_zeroes(exp->common.blk, offset, size, 0);
>> +    offset += size;
>> +    *length -= size;
>> +    } while (ret == 0 && *length > 0);
>> +    }
>> +#endif /* CONFIG_FALLOCATE_ZERO_RANGE */
>> +    return true;
>> +}
>> +
>>   /**
>>    * Let clients perform various fallocate() operations.
>>    */
>> @@ -642,30 +674,9 @@ static void fuse_fallocate(fuse_req_t req, fuse_ino_t inode, int mode,
>>   offset += size;
>>   length -= size;
>>   } while (ret == 0 && length > 0);
>> -    }
>> -#ifdef CONFIG_FALLOCATE_ZERO_RANGE
>> -    else if (mode & FALLOC_FL_ZERO_RANGE) {
>> -    if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + length > blk_len) {
>> -    /* No need for zeroes, we are going to write them ourselves */
>> -    ret = fuse_do_truncate(exp, offset + length, false,
>> -   PREALLOC_MODE_OFF);
>> -    if (ret < 0) {
>> -    fuse_reply_err(req, -ret);
>> -    return;
>> -    }
>> -    }
>> -
>> -    do {
>> -    int size = MIN(length, BDRV_REQUEST_MAX_BYTES);
>> -
>> -    ret = blk_pwrite_zeroes(exp->common.blk,
>> -    offset, size, 0);
>> -    offset += size;
>> -    length -= size;
>> -    } while (ret == 0 && length > 0);
> 
> I might not have enough coffee today yet, but I think your patch is
> wrong: If the code executed this do-while loop/if-statement-branch, the
> other else-statements below were never called. Now with your patch, if
> the do-while loop in fuse_fallocate_zero_range() is called, the function
> will return with "true" at the end, causing the other else-statements
> below to be called, so that ret finally gets set to -EOPNOTSUPP. Or did
> I miss something?

You are right, this patch is crap, sorry.
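
Thomas's point is that a single "true" return cannot distinguish "handled
this request" from "not my mode, keep going". One way to keep a helper
extraction without changing behaviour is to have the helper report
whether it handled the request, as in this reduced C sketch. The MODE_*
flags, try_zero_range() and fallocate_dispatch() are hypothetical
simplifications of the FALLOC_FL_* flags and fuse_fallocate(), with the
actual I/O and FUSE replies left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mode bits standing in for FALLOC_FL_* flags. */
#define MODE_PUNCH_HOLE 0x1
#define MODE_ZERO_RANGE 0x2

/*
 * Returns true if it recognised and handled the request, storing the
 * result in *ret; false means "not mine, try the next branch".
 */
static bool try_zero_range(int mode, int *ret)
{
    if (!(mode & MODE_ZERO_RANGE)) {
        return false;
    }
    *ret = 0;   /* the zero-writing loop would run here */
    return true;
}

static int fallocate_dispatch(int mode)
{
    int ret;

    if (mode & MODE_PUNCH_HOLE) {
        ret = 0;    /* punch-hole path */
    } else if (!try_zero_range(mode, &ret)) {
        /* Nothing recognised the request. */
        ret = -EOPNOTSUPP;
    }
    return ret;
}
```

Here a zero-range request never reaches the -EOPNOTSUPP fallback, which
is the behaviour the original if/else chain had.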



[PATCH] block/export: Fix vhost-user-blk shutdown with requests in flight

2022-01-25 Thread Kevin Wolf
The vhost-user-blk export runs requests asynchronously in their own
coroutine. When the vhost connection goes away and we want to stop the
vhost-user server, we need to wait for these coroutines to stop before
we can unmap the shared memory. Otherwise, they would still access the
unmapped memory and crash.

This introduces a refcount to VuServer which is increased when spawning
a new request coroutine and decreased before the coroutine exits. The
memory is only unmapped when the refcount reaches zero.
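
The refcount scheme described above can be modelled in a few lines of
single-threaded C. ServerModel, server_ref(), server_unref() and
server_begin_shutdown() are hypothetical; a "woken" flag stands in for
aio_co_wake() on the client coroutine, and returning false from
server_begin_shutdown() stands in for qemu_coroutine_yield().

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    unsigned int refcount;  /* request coroutines still running */
    bool wait_idle;         /* shutdown is waiting for refcount == 0 */
    bool woken;             /* models aio_co_wake() on the client trip */
} ServerModel;

static void server_ref(ServerModel *s)
{
    assert(!s->wait_idle);  /* no new requests once shutdown waits */
    s->refcount++;
}

static void server_unref(ServerModel *s)
{
    assert(s->refcount > 0);
    s->refcount--;
    if (s->wait_idle && s->refcount == 0) {
        s->woken = true;    /* last request done: resume shutdown */
    }
}

/* Returns true if shutdown may unmap the memory right away. */
static bool server_begin_shutdown(ServerModel *s)
{
    if (s->refcount) {
        s->wait_idle = true;    /* would yield here until woken */
        return false;
    }
    return true;
}
```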

Signed-off-by: Kevin Wolf 
---
 include/qemu/vhost-user-server.h |  5 +
 block/export/vhost-user-blk-server.c |  5 +
 util/vhost-user-server.c | 22 ++
 3 files changed, 32 insertions(+)

diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
index 121ea1dedf..cd43193b80 100644
--- a/include/qemu/vhost-user-server.h
+++ b/include/qemu/vhost-user-server.h
@@ -42,6 +42,8 @@ typedef struct {
 const VuDevIface *vu_iface;
 
 /* Protected by ctx lock */
+unsigned int refcount;
+bool wait_idle;
 VuDev vu_dev;
 QIOChannel *ioc; /* The I/O channel with the client */
 QIOChannelSocket *sioc; /* The underlying data channel with the client */
@@ -59,6 +61,9 @@ bool vhost_user_server_start(VuServer *server,
 
 void vhost_user_server_stop(VuServer *server);
 
+void vhost_user_server_ref(VuServer *server);
+void vhost_user_server_unref(VuServer *server);
+
 void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
 void vhost_user_server_detach_aio_context(VuServer *server);
 
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index 1862563336..a129204c44 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -172,6 +172,7 @@ vu_blk_discard_write_zeroes(VuBlkExport *vexp, struct iovec *iov,
 return VIRTIO_BLK_S_IOERR;
 }
 
+/* Called with server refcount increased, must decrease before returning */
 static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
 {
 VuBlkReq *req = opaque;
@@ -286,10 +287,12 @@ static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
 }
 
 vu_blk_req_complete(req);
+vhost_user_server_unref(server);
 return;
 
 err:
 free(req);
+vhost_user_server_unref(server);
 }
 
 static void vu_blk_process_vq(VuDev *vu_dev, int idx)
@@ -310,6 +313,8 @@ static void vu_blk_process_vq(VuDev *vu_dev, int idx)
 
 Coroutine *co =
 qemu_coroutine_create(vu_blk_virtio_process_req, req);
+
+vhost_user_server_ref(server);
 qemu_coroutine_enter(co);
 }
 }
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index f68287e811..f66fbba710 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -74,6 +74,20 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
 error_report("vu_panic: %s", buf);
 }
 
+void vhost_user_server_ref(VuServer *server)
+{
+assert(!server->wait_idle);
+server->refcount++;
+}
+
+void vhost_user_server_unref(VuServer *server)
+{
+server->refcount--;
+if (server->wait_idle && !server->refcount) {
+aio_co_wake(server->co_trip);
+}
+}
+
 static bool coroutine_fn
 vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
 {
@@ -177,6 +191,14 @@ static coroutine_fn void vu_client_trip(void *opaque)
 /* Keep running */
 }
 
+if (server->refcount) {
+/* Wait for requests to complete before we can unmap the memory */
+server->wait_idle = true;
+qemu_coroutine_yield();
+server->wait_idle = false;
+}
+assert(server->refcount == 0);
+
 vu_deinit(vu_dev);
 
 /* vu_deinit() should have called remove_watch() */
-- 
2.31.1




[PATCH 4/5] vduse-blk: Add vduse-blk resize support

2022-01-25 Thread Xie Yongji
To support block resize, this uses vduse_dev_update_config()
to update the capacity field in the configuration space and inject
a config interrupt from the block resize callback.
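
As a sketch of the value this callback writes: the virtio-blk capacity
field is a little-endian 64-bit count of 512-byte sectors, so the byte
length from the block layer is shifted right by VIRTIO_BLK_SECTOR_BITS
and stored in little-endian order. encode_capacity_le() is a
hypothetical helper that does the byte-order conversion by hand instead
of using QEMU's cpu_to_le64(), and it assumes a non-negative length
(blk_getlength() returns a negative errno on failure).

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_BLK_SECTOR_BITS 9    /* 512-byte sectors, as in the patch */

/*
 * Hypothetical helper: convert a length in bytes to the 8-byte
 * little-endian virtio-blk "capacity" field (in 512-byte sectors).
 * Written byte by byte so it does not depend on host endianness.
 */
static void encode_capacity_le(int64_t len_bytes, uint8_t out[8])
{
    uint64_t sectors;

    assert(len_bytes >= 0);
    sectors = (uint64_t)len_bytes >> VIRTIO_BLK_SECTOR_BITS;
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(sectors >> (8 * i));
    }
}
```

A 1 MiB image, for instance, is 2048 sectors (0x800), stored as the
bytes 00 08 00 00 00 00 00 00; a trailing partial sector is dropped.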

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 5a8d289685..83845e9a9a 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -297,6 +297,23 @@ static void blk_aio_detach(void *opaque)
 vblk_exp->export.ctx = NULL;
 }
 
+static void vduse_blk_resize(void *opaque)
+{
+BlockExport *exp = opaque;
+VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+struct virtio_blk_config config;
+
+config.capacity =
+cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
+vduse_dev_update_config(vblk_exp->dev, sizeof(config.capacity),
+offsetof(struct virtio_blk_config, capacity),
+(char *)&config.capacity);
+}
+
+static const BlockDevOps vduse_block_ops = {
+.resize_cb = vduse_blk_resize,
+};
+
 static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 Error **errp)
 {
@@ -387,6 +404,8 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
  vblk_exp);
 
+blk_set_dev_ops(exp->blk, &vduse_block_ops, exp);
+
 return 0;
 }
 
-- 
2.20.1




[PATCH 2/5] libvduse: Add VDUSE (vDPA Device in Userspace) library

2022-01-25 Thread Xie Yongji
VDUSE [1] is a Linux framework that makes it possible to implement
software-emulated vDPA devices in userspace. This adds a library
as a subproject to help implement VDUSE backends in QEMU.

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html

Signed-off-by: Xie Yongji 
---
 meson.build |   15 +
 meson_options.txt   |2 +
 scripts/meson-buildoptions.sh   |3 +
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/libvduse.c | 1025 +++
 subprojects/libvduse/libvduse.h |  193 
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 8 files changed, 1250 insertions(+)
 create mode 12 subprojects/libvduse/include/atomic.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 100644 subprojects/libvduse/meson.build
 create mode 12 subprojects/libvduse/standard-headers/linux

diff --git a/meson.build b/meson.build
index 333c61deba..864fb50ade 100644
--- a/meson.build
+++ b/meson.build
@@ -1305,6 +1305,21 @@ if not get_option('fuse_lseek').disabled()
   endif
 endif
 
+have_libvduse = (targetos == 'linux')
+if get_option('libvduse').enabled()
+if targetos != 'linux'
+error('libvduse requires linux')
+endif
+elif get_option('libvduse').disabled()
+have_libvduse = false
+endif
+
+libvduse = not_found
+if have_libvduse
+  libvduse_proj = subproject('libvduse')
+  libvduse = libvduse_proj.get_variable('libvduse_dep')
+endif
+
 # libbpf
 libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
 if libbpf.found() and not cc.links('''
diff --git a/meson_options.txt b/meson_options.txt
index 921967eddb..16790d1814 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -195,6 +195,8 @@ option('virtfs', type: 'feature', value: 'auto',
description: 'virtio-9p support')
 option('virtiofsd', type: 'feature', value: 'auto',
description: 'build virtiofs daemon (virtiofsd)')
+option('libvduse', type: 'feature', value: 'auto',
+   description: 'build VDUSE Library')
 
 option('capstone', type: 'combo', value: 'auto',
choices: ['disabled', 'enabled', 'auto', 'system', 'internal'],
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index a4af02c527..af5c75d758 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -58,6 +58,7 @@ meson_options_help() {
   printf "%s\n" '  libssh  ssh block device support'
   printf "%s\n" '  libudev Use libudev to enumerate host devices'
   printf "%s\n" '  libusb  libusb support for USB passthrough'
+  printf "%s\n" '  libvduse        build VDUSE Library'
   printf "%s\n" '  libxml2 libxml2 support for Parallels image format'
   printf "%s\n" '  linux-aio   Linux AIO support'
   printf "%s\n" '  linux-io-uring  Linux io_uring support'
@@ -188,6 +189,8 @@ _meson_option_parse() {
 --disable-libudev) printf "%s" -Dlibudev=disabled ;;
 --enable-libusb) printf "%s" -Dlibusb=enabled ;;
 --disable-libusb) printf "%s" -Dlibusb=disabled ;;
+--enable-libvduse) printf "%s" -Dlibvduse=enabled ;;
+--disable-libvduse) printf "%s" -Dlibvduse=disabled ;;
 --enable-libxml2) printf "%s" -Dlibxml2=enabled ;;
 --disable-libxml2) printf "%s" -Dlibxml2=disabled ;;
 --enable-linux-aio) printf "%s" -Dlinux_aio=enabled ;;
diff --git a/subprojects/libvduse/include/atomic.h b/subprojects/libvduse/include/atomic.h
new file mode 12
index 00..8c2be64f7b
--- /dev/null
+++ b/subprojects/libvduse/include/atomic.h
@@ -0,0 +1 @@
+../../../include/qemu/atomic.h
\ No newline at end of file
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
new file mode 100644
index 00..7671864bca
--- /dev/null
+++ b/subprojects/libvduse/libvduse.c
@@ -0,0 +1,1025 @@
+/*
+ * VDUSE (vDPA Device in Userspace) library
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
+ *   Portions of codes and concepts borrowed from libvhost-user.c, so:
+ * Copyright IBM, Corp. 2007
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * Author:
+ *   Xie Yongji 
+ *   Anthony Liguori 
+ *   Marc-André Lureau 
+ *   Victor Kaplansky 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "include/atomic.h"
+#include "standard-headers/linux/vhost_types.h"
+#include "standard-headers/linux/vduse.h"
+#include "libvduse.h"
+
+#define VIRTQUEUE_MAX_SIZE 1024
+#define VDUSE_VQ_ALIGN 4096
+#define MAX_IOVA_REGIONS 256
+
+/* Round number down to multiple */
+#define 

[PATCH 1/5] linux-headers: Add vduse.h

2022-01-25 Thread Xie Yongji
This adds the vduse.h header to the standard headers so that the
relevant VDUSE API can be used in subsequent patches.
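
The ioctl request codes in this header follow the standard Linux
_IO/_IOR/_IOW encoding, which packs the direction, the type byte
(VDUSE_BASE, 0x81), the command number and the argument size into one
number. A small sketch, assuming a Linux build environment, that decodes
three of them with the _IOC_* accessor macros from <linux/ioctl.h>;
IoctlFields and decode_ioctl() are hypothetical names, and the three
defines are copied from the header added by this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Copied from include/standard-headers/linux/vduse.h in this patch. */
#define VDUSE_BASE 0x81
#define VDUSE_GET_API_VERSION       _IOR(VDUSE_BASE, 0x00, uint64_t)
#define VDUSE_SET_API_VERSION       _IOW(VDUSE_BASE, 0x01, uint64_t)
#define VDUSE_DEV_INJECT_CONFIG_IRQ _IO(VDUSE_BASE, 0x13)

/* Hypothetical container for the fields packed into a request code. */
typedef struct {
    unsigned int dir, type, nr, size;
} IoctlFields;

static IoctlFields decode_ioctl(unsigned long req)
{
    IoctlFields f = {
        .dir  = _IOC_DIR(req),
        .type = _IOC_TYPE(req),
        .nr   = _IOC_NR(req),
        .size = _IOC_SIZE(req),
    };
    return f;
}
```

For example, VDUSE_GET_API_VERSION decodes to a read-direction request
of type 0x81, number 0x00, carrying an 8-byte argument.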

Signed-off-by: Xie Yongji 
---
 include/standard-headers/linux/vduse.h | 306 +
 scripts/update-linux-headers.sh|   1 +
 2 files changed, 307 insertions(+)
 create mode 100644 include/standard-headers/linux/vduse.h

diff --git a/include/standard-headers/linux/vduse.h b/include/standard-headers/linux/vduse.h
new file mode 100644
index 00..4242bc9fdf
--- /dev/null
+++ b/include/standard-headers/linux/vduse.h
@@ -0,0 +1,306 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _VDUSE_H_
+#define _VDUSE_H_
+
+#include "standard-headers/linux/types.h"
+
+#define VDUSE_BASE 0x81
+
+/* The ioctls for control device (/dev/vduse/control) */
+
+#define VDUSE_API_VERSION  0
+
+/*
+ * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
+ * This is used for future extension.
+ */
+#define VDUSE_GET_API_VERSION  _IOR(VDUSE_BASE, 0x00, uint64_t)
+
+/* Set the version of VDUSE API that userspace supported. */
+#define VDUSE_SET_API_VERSION  _IOW(VDUSE_BASE, 0x01, uint64_t)
+
+/**
+ * struct vduse_dev_config - basic configuration of a VDUSE device
+ * @name: VDUSE device name, needs to be NUL terminated
+ * @vendor_id: virtio vendor id
+ * @device_id: virtio device id
+ * @features: virtio features
+ * @vq_num: the number of virtqueues
+ * @vq_align: the allocation alignment of virtqueue's metadata
+ * @reserved: for future use, needs to be initialized to zero
+ * @config_size: the size of the configuration space
+ * @config: the buffer of the configuration space
+ *
+ * Structure used by VDUSE_CREATE_DEV ioctl to create VDUSE device.
+ */
+struct vduse_dev_config {
+#define VDUSE_NAME_MAX 256
+   char name[VDUSE_NAME_MAX];
+   uint32_t vendor_id;
+   uint32_t device_id;
+   uint64_t features;
+   uint32_t vq_num;
+   uint32_t vq_align;
+   uint32_t reserved[13];
+   uint32_t config_size;
+   uint8_t config[];
+};
+
+/* Create a VDUSE device which is represented by a char device (/dev/vduse/$NAME) */
+#define VDUSE_CREATE_DEV   _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config)
+
+/*
+ * Destroy a VDUSE device. Make sure there are no more references
+ * to the char device (/dev/vduse/$NAME).
+ */
+#define VDUSE_DESTROY_DEV  _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX])
+
+/* The ioctls for VDUSE device (/dev/vduse/$NAME) */
+
+/**
+ * struct vduse_iotlb_entry - entry of IOTLB to describe one IOVA region [start, last]
+ * @offset: the mmap offset on returned file descriptor
+ * @start: start of the IOVA region
+ * @last: last of the IOVA region
+ * @perm: access permission of the IOVA region
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
+ */
+struct vduse_iotlb_entry {
+   uint64_t offset;
+   uint64_t start;
+   uint64_t last;
+#define VDUSE_ACCESS_RO 0x1
+#define VDUSE_ACCESS_WO 0x2
+#define VDUSE_ACCESS_RW 0x3
+   uint8_t perm;
+};
+
+/*
+ * Find the first IOVA region that overlaps with the range [start, last]
+ * and return the corresponding file descriptor. Return -EINVAL means the
+ * IOVA region doesn't exist. Caller should set start and last fields.
+ */
+#define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x10, struct vduse_iotlb_entry)
+
+/*
+ * Get the negotiated virtio features. It's a subset of the features in
+ * struct vduse_dev_config which can be accepted by virtio driver. It's
+ * only valid after FEATURES_OK status bit is set.
+ */
+#define VDUSE_DEV_GET_FEATURES _IOR(VDUSE_BASE, 0x11, uint64_t)
+
+/**
+ * struct vduse_config_data - data used to update configuration space
+ * @offset: the offset from the beginning of configuration space
+ * @length: the length to write to configuration space
+ * @buffer: the buffer used to write from
+ *
+ * Structure used by VDUSE_DEV_SET_CONFIG ioctl to update device
+ * configuration space.
+ */
+struct vduse_config_data {
+   uint32_t offset;
+   uint32_t length;
+   uint8_t buffer[];
+};
+
+/* Set device configuration space */
+#define VDUSE_DEV_SET_CONFIG   _IOW(VDUSE_BASE, 0x12, struct vduse_config_data)
+
+/*
+ * Inject a config interrupt. It's usually used to notify virtio driver
+ * that device configuration space has changed.
+ */
+#define VDUSE_DEV_INJECT_CONFIG_IRQ _IO(VDUSE_BASE, 0x13)
+
+/**
+ * struct vduse_vq_config - basic configuration of a virtqueue
+ * @index: virtqueue index
+ * @max_size: the max size of virtqueue
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
+ */
+struct vduse_vq_config {
+   uint32_t index;
+   uint16_t max_size;
+   uint16_t reserved[13];
+};
+
+/*
+ * Setup the specified virtqueue. Make sure all virtqueues have been
+ * configured before the device is attached to vDPA bus.
+ */
+#define VDUSE_VQ_SETUP _IOW(VDUSE_BASE, 

[PATCH 3/5] vduse-blk: implements vduse-blk export

2022-01-25 Thread Xie Yongji
This implements a VDUSE block backend based on
the libvduse library. It can be used to export BDSs
for both VM and container (host) usage.

The new command-line syntax is:

$ qemu-storage-daemon \
--blockdev file,node-name=drive0,filename=test.img \
--export vduse-blk,node-name=drive0,id=vduse-export0,writable=on

After qemu-storage-daemon has started, use the "vdpa"
command to attach the device to the vDPA bus:

$ vdpa dev add name vduse-export0 mgmtdev vduse

The device must also be removed via the "vdpa" command
before stopping qemu-storage-daemon.

Signed-off-by: Xie Yongji 
---
 block/export/export.c |   6 +
 block/export/meson.build  |   5 +
 block/export/vduse-blk.c  | 427 ++
 block/export/vduse-blk.h  |  20 ++
 meson.build   |  13 ++
 meson_options.txt |   2 +
 qapi/block-export.json|  24 +-
 scripts/meson-buildoptions.sh |   4 +
 8 files changed, 499 insertions(+), 2 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h

diff --git a/block/export/export.c b/block/export/export.c
index 6d3b9964c8..00dd505540 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -26,6 +26,9 @@
 #ifdef CONFIG_VHOST_USER_BLK_SERVER
 #include "vhost-user-blk-server.h"
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+#include "vduse-blk.h"
+#endif
 
 static const BlockExportDriver *blk_exp_drivers[] = {
 &blk_exp_nbd,
@@ -35,6 +38,9 @@ static const BlockExportDriver *blk_exp_drivers[] = {
 #ifdef CONFIG_FUSE
 &blk_exp_fuse,
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+&blk_exp_vduse_blk,
+#endif
 };
 
 /* Only accessed from the main thread */
diff --git a/block/export/meson.build b/block/export/meson.build
index 0a08e384c7..cf311d2b1b 100644
--- a/block/export/meson.build
+++ b/block/export/meson.build
@@ -5,3 +5,8 @@ if have_vhost_user_blk_server
 endif
 
 blockdev_ss.add(when: fuse, if_true: files('fuse.c'))
+
+if have_vduse_blk_export
+blockdev_ss.add(files('vduse-blk.c'))
+blockdev_ss.add(libvduse)
+endif
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
new file mode 100644
index 00..5a8d289685
--- /dev/null
+++ b/block/export/vduse-blk.c
@@ -0,0 +1,427 @@
+/*
+ * Export QEMU block device via VDUSE
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
+ *   Portions of codes and concepts borrowed from vhost-user-blk-server.c, so:
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * Author:
+ *   Xie Yongji 
+ *   Coiby Xu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include 
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "sysemu/block-backend.h"
+#include "block/export.h"
+#include "qemu/error-report.h"
+#include "util/block-helpers.h"
+#include "subprojects/libvduse/libvduse.h"
+
+#include "standard-headers/linux/virtio_ring.h"
+#include "standard-headers/linux/virtio_blk.h"
+
+#define VIRTIO_BLK_SECTOR_BITS 9
+#define VIRTIO_BLK_SECTOR_SIZE (1ULL << VIRTIO_BLK_SECTOR_BITS)
+
+#define VDUSE_DEFAULT_NUM_QUEUE 1
+#define VDUSE_DEFAULT_QUEUE_SIZE 128
+
+typedef struct VduseBlkExport {
+BlockExport export;
+VduseDev *dev;
+uint16_t num_queues;
+uint32_t blk_size;
+bool writable;
+} VduseBlkExport;
+
+struct virtio_blk_inhdr {
+unsigned char status;
+};
+
+typedef struct VduseBlkReq {
+VduseVirtqElement elem;
+int64_t sector_num;
+size_t in_len;
+struct virtio_blk_inhdr *in;
+struct virtio_blk_outhdr out;
+VduseVirtq *vq;
+} VduseBlkReq;
+
+static void vduse_blk_req_complete(VduseBlkReq *req)
+{
+vduse_queue_push(req->vq, &req->elem, req->in_len);
+vduse_queue_notify(req->vq);
+
+free(req);
+}
+
+static bool vduse_blk_sect_range_ok(VduseBlkExport *vblk_exp,
+uint64_t sector, size_t size)
+{
+uint64_t nb_sectors;
+uint64_t total_sectors;
+
+if (size % VIRTIO_BLK_SECTOR_SIZE) {
+return false;
+}
+
+nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS;
+
+QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE);
+if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
+return false;
+}
+if ((sector << VIRTIO_BLK_SECTOR_BITS) % vblk_exp->blk_size) {
+return false;
+}
+blk_get_geometry(vblk_exp->export.blk, &total_sectors);
+if (sector > total_sectors || nb_sectors > total_sectors - sector) {
+return false;
+}
+return true;
+}
+
+static void coroutine_fn vduse_blk_virtio_process_req(void *opaque)
+{
+VduseBlkReq *req = opaque;
+VduseVirtq *vq = req->vq;
+VduseDev *dev = vduse_queue_get_dev(vq);
+VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+BlockBackend *blk = vblk_exp->export.blk;
+VduseVirtqElement *elem = &req->elem;
+struct iovec *in_iov = elem->in_sg;
+struct iovec *out_iov = elem->out_sg;
+
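
The checks in vduse_blk_sect_range_ok() above can be exercised on their
own. This standalone C sketch mirrors them with the block-layer
dependencies replaced by parameters: max_sectors stands in for
BDRV_REQUEST_MAX_SECTORS, total_sectors for the value blk_get_geometry()
reports, and sect_range_ok() itself is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_BITS 9
#define SECTOR_SIZE (1ULL << SECTOR_BITS)

/*
 * The request must be a whole number of sectors, no larger than
 * max_sectors, start on a blk_size boundary and end inside the device.
 */
static bool sect_range_ok(uint64_t sector, size_t size, uint32_t blk_size,
                          uint64_t max_sectors, uint64_t total_sectors)
{
    uint64_t nb_sectors;

    if (size % SECTOR_SIZE) {
        return false;
    }
    nb_sectors = size >> SECTOR_BITS;
    if (nb_sectors > max_sectors) {
        return false;
    }
    if ((sector << SECTOR_BITS) % blk_size) {
        return false;
    }
    /* Compared this way so that sector + nb_sectors cannot overflow. */
    if (sector > total_sectors || nb_sectors > total_sectors - sector) {
        return false;
    }
    return true;
}
```

On a 2048-sector device with 4 KiB logical blocks, a 4 KiB request at
sector 2040 is accepted (it ends exactly at the last sector), while a
request at sector 1 is rejected for misalignment.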

[PATCH 5/5] libvduse: Add support for reconnecting

2022-01-25 Thread Xie Yongji
To support reconnecting after a restart or crash, a VDUSE backend
might need to resubmit in-flight I/Os. This stores metadata, such as
the indexes of in-flight I/O descriptors, in a shm file so that the
VDUSE backend can restore them when reconnecting.
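
As a rough illustration of the reconnect idea (not libvduse's code), the
sketch below keeps a per-descriptor "in flight" flag plus a submission
counter, and after a reconnect gathers the descriptors still marked in
flight and orders them by counter so the oldest can be resubmitted
first. DescState, InflightDesc, by_counter() and collect_resubmit() are
hypothetical; the real VduseDescStateSplit log lives in an mmap'ed shm
file, and the real inflight_desc_compare() and resubmission order may
differ (for example in how counter wraparound is handled).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified per-descriptor record from the log. */
typedef struct {
    uint8_t inflight;   /* 1 while the request has not completed */
    uint64_t counter;   /* increases with every submission */
} DescState;

typedef struct {
    uint16_t index;
    uint64_t counter;
} InflightDesc;

static int by_counter(const void *a, const void *b)
{
    const InflightDesc *x = a, *y = b;

    /* Compare without subtraction to avoid truncating uint64_t to int. */
    return (x->counter > y->counter) - (x->counter < y->counter);
}

/*
 * Collect descriptors still marked in flight, oldest first.
 * Returns the number of entries written to out (at most n).
 */
static size_t collect_resubmit(const DescState *log, uint16_t n,
                               InflightDesc *out)
{
    size_t count = 0;

    for (uint16_t i = 0; i < n; i++) {
        if (log[i].inflight) {
            out[count].index = i;
            out[count].counter = log[i].counter;
            count++;
        }
    }
    qsort(out, count, sizeof(*out), by_counter);
    return count;
}
```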

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c|   4 +-
 subprojects/libvduse/libvduse.c | 254 +++-
 subprojects/libvduse/libvduse.h |   4 +-
 3 files changed, 254 insertions(+), 8 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 83845e9a9a..bc14fd798b 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -232,6 +232,8 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
 
 aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+/* Make sure we don't miss any kick after reconnecting */
+eventfd_write(vduse_queue_get_fd(vq), 1);
 }
 
 static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
@@ -388,7 +390,7 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
  features, num_queues,
  sizeof(struct virtio_blk_config),
  (char *), _blk_ops,
- vblk_exp);
+ g_get_tmp_dir(), vblk_exp);
 if (!vblk_exp->dev) {
 error_setg(errp, "failed to create vduse device");
 return -ENOMEM;
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index 7671864bca..ce2f6c7949 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -41,6 +41,8 @@
 #define VDUSE_VQ_ALIGN 4096
 #define MAX_IOVA_REGIONS 256
 
+#define LOG_ALIGNMENT 64
+
 /* Round number down to multiple */
 #define ALIGN_DOWN(n, m) ((n) / (m) * (m))
 
@@ -51,6 +53,31 @@
 #define unlikely(x)   __builtin_expect(!!(x), 0)
 #endif
 
+typedef struct VduseDescStateSplit {
+    uint8_t inflight;
+    uint8_t padding[5];
+    uint16_t next;
+    uint64_t counter;
+} VduseDescStateSplit;
+
+typedef struct VduseVirtqLogInflight {
+    uint64_t features;
+    uint16_t version;
+    uint16_t desc_num;
+    uint16_t last_batch_head;
+    uint16_t used_idx;
+    VduseDescStateSplit desc[];
+} VduseVirtqLogInflight;
+
+typedef struct VduseVirtqLog {
+    VduseVirtqLogInflight inflight;
+} VduseVirtqLog;
+
+typedef struct VduseVirtqInflightDesc {
+    uint16_t index;
+    uint64_t counter;
+} VduseVirtqInflightDesc;
+
 typedef struct VduseRing {
     unsigned int num;
     uint64_t desc_addr;
@@ -73,6 +100,10 @@ struct VduseVirtq {
     bool ready;
     int fd;
     VduseDev *dev;
+    VduseVirtqInflightDesc *resubmit_list;
+    uint16_t resubmit_num;
+    uint64_t counter;
+    VduseVirtqLog *log;
 };
 
 typedef struct VduseIovaRegion {
@@ -96,8 +127,67 @@ struct VduseDev {
     int fd;
     int ctrl_fd;
     void *priv;
+    char *shm_log_dir;
+    void *log;
+    bool reconnect;
 };
 
+static inline size_t vduse_vq_log_size(uint16_t queue_size)
+{
+    return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size +
+                    sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT);
+}
+
+static void *vduse_log_get(const char *dir, const char *name, size_t size)
+{
+    void *ptr = MAP_FAILED;
+    char *path;
+    int fd;
+
+    path = (char *)malloc(strlen(dir) + strlen(name) +
+                          strlen("/vduse-log-") + 1);
+    if (!path) {
+        return ptr;
+    }
+    sprintf(path, "%s/vduse-log-%s", dir, name);
+
+    fd = open(path, O_RDWR | O_CREAT, 0600);
+    if (fd == -1) {
+        goto out;
+    }
+
+    if (ftruncate(fd, size) == -1) {
+        goto out;
+    }
+
+    ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+    if (ptr == MAP_FAILED) {
+        goto out;
+    }
+out:
+    if (fd > 0) {
+        close(fd);
+    }
+    free(path);
+
+    return ptr;
+}
+
+static void vduse_log_destroy(const char *dir, const char *name)
+{
+    char *path;
+
+    path = (char *)malloc(strlen(dir) + strlen(name) +
+                          strlen("/vduse-log-") + 1);
+    if (!path) {
+        return;
+    }
+    sprintf(path, "%s/vduse-log-%s", dir, name);
+
+    unlink(path);
+    free(path);
+}
+
 static inline bool has_feature(uint64_t features, unsigned int fbit)
 {
 assert(fbit < 64);
@@ -139,6 +229,98 @@ static int vduse_inject_irq(VduseDev *dev, int index)
 return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, );
 }
 
+static int inflight_desc_compare(const void *a, const void *b)
+{
+    VduseVirtqInflightDesc *desc0 = (VduseVirtqInflightDesc *)a,
+                           *desc1 = (VduseVirtqInflightDesc *)b;
+
+    if (desc1->counter > desc0->counter &&
+        (desc1->counter - desc0->counter) < VIRTQUEUE_MAX_SIZE * 2) {
+        return 1;
+    }
+
+    return -1;
+}
+
+static int 
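The reconnect patch above stores per-queue in-flight state in a file named `vduse-log-<name>` under a caller-supplied directory (vduse-blk passes `g_get_tmp_dir()`) and maps it with `MAP_SHARED`. The following is a minimal POSIX sketch of why such a log survives a backend restart, assuming a header layout copied from `VduseVirtqLogInflight`; the names `DescState`, `LogInflight`, `log_size()` and `log_map()` are hypothetical and not libvduse API. Unlike the quoted helper, it checks `fd < 0`, so descriptor 0 is treated as valid.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define LOG_ALIGNMENT 64
#define ALIGN_UP(n, m) (((n) + (m) - 1) / (m) * (m))

/* Per-descriptor state; same field layout as VduseDescStateSplit. */
typedef struct DescState {
    uint8_t inflight;
    uint8_t padding[5];
    uint16_t next;
    uint64_t counter;
} DescState;

/* Per-queue header followed by one DescState per ring entry. */
typedef struct LogInflight {
    uint64_t features;
    uint16_t version;
    uint16_t desc_num;
    uint16_t last_batch_head;
    uint16_t used_idx;
    DescState desc[];
} LogInflight;

/* Bytes needed for one queue's log, rounded up to LOG_ALIGNMENT. */
static size_t log_size(uint16_t queue_size)
{
    return ALIGN_UP(sizeof(DescState) * queue_size + sizeof(LogInflight),
                    LOG_ALIGNMENT);
}

/*
 * Map the log file at `path`, creating and sizing it if necessary.
 * MAP_SHARED makes stores land in the file, so a later mapping of the
 * same path (e.g. after a backend restart) sees the same contents.
 */
static void *log_map(const char *path, size_t size)
{
    void *ptr = MAP_FAILED;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0) {
        return MAP_FAILED;
    }
    if (ftruncate(fd, (off_t)size) == 0) {
        ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }
    close(fd);  /* an established mapping stays valid after close() */
    return ptr;
}
```

Because `ftruncate()` zero-fills a newly created file, the very first start sees no in-flight descriptors; on reconnect, mapping the same path again yields the previously written state.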

[PATCH 0/5] Support exporting BDSs via VDUSE

2022-01-25 Thread Xie Yongji
Hi all,

A few months ago, VDUSE (vDPA Device in Userspace) [1] was
merged into the Linux kernel as a framework that makes it
possible to emulate a vDPA device in userspace. This series
aims to implement a VDUSE block backend based on the
qemu-storage-daemon infrastructure.

To support that, we first introduce a VDUSE library as a
subproject (like libvhost-user) to help implement VDUSE
backends in QEMU. Then a VDUSE block export is implemented
based on this library. Finally, we add resize and reconnect
support to the VDUSE block export and the VDUSE library.
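One detail of the reconnect support is that, after re-registering a queue's kick fd, vduse_blk_enable_queue() writes 1 to it with eventfd_write(), presumably so that the queue is processed at least once even if a kick was missed while reconnecting. The following minimal, Linux-only sketch shows the eventfd semantics this relies on; `kick_pending()`, `rekick_queue()` and `consume_kicks()` are hypothetical helpers, not libvduse functions.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Non-blocking check: 1 if the kick fd is readable, 0 if not, -1 on error. */
static int kick_pending(int kick_fd)
{
    struct pollfd pfd = { .fd = kick_fd, .events = POLLIN };
    int n = poll(&pfd, 1, 0);

    if (n < 0) {
        return -1;
    }
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Force one wakeup so the queue handler scans the ring at least once. */
static int rekick_queue(int kick_fd)
{
    return eventfd_write(kick_fd, 1);
}

/*
 * Simulated queue handler: consumes the eventfd counter and returns the
 * number of accumulated kicks, or 0 if none were pending.
 */
static uint64_t consume_kicks(int kick_fd)
{
    eventfd_t count = 0;

    if (eventfd_read(kick_fd, &count) < 0) {
        return 0;   /* EAGAIN on a non-blocking eventfd: nothing pending */
    }
    return count;
}
```

Because eventfd counters accumulate, two kicks before the handler runs appear as a single wakeup with a count of 2. A handler should therefore process everything available in the ring on each wakeup, rather than one request per kick.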

Since QEMU does not currently support vdpa-blk, the VM case was
tested with my previous patchset [2].

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html
[2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg797569.html

Please review, thanks!

Xie Yongji (5):
  headers: Add vduse.h
  libvduse: Add VDUSE (vDPA Device in Userspace) library
  vduse-blk: implements vduse-blk export
  vduse-blk: Add vduse-blk resize support
  libvduse: Add support for reconnecting

 block/export/export.c                       |    6 +
 block/export/meson.build                    |    5 +
 block/export/vduse-blk.c                    |  448 +++
 block/export/vduse-blk.h                    |   20 +
 include/standard-headers/linux/vduse.h      |  306 +
 meson.build                                 |   28 +
 meson_options.txt                           |    4 +
 qapi/block-export.json                      |   24 +-
 scripts/meson-buildoptions.sh               |    7 +
 scripts/update-linux-headers.sh             |    1 +
 subprojects/libvduse/include/atomic.h       |    1 +
 subprojects/libvduse/libvduse.c             | 1267 +++
 subprojects/libvduse/libvduse.h             |  195 +++
 subprojects/libvduse/meson.build            |   10 +
 subprojects/libvduse/standard-headers/linux |    1 +
 15 files changed, 2321 insertions(+), 2 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h
 create mode 100644 include/standard-headers/linux/vduse.h
 create mode 12 subprojects/libvduse/include/atomic.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 100644 subprojects/libvduse/meson.build
 create mode 12 subprojects/libvduse/standard-headers/linux

-- 
2.20.1




Re: [PATCH v2 2/3] block/export/fuse: Extract fuse_fallocate_zero_range()

2022-01-25 Thread Thomas Huth

On 24/01/2022 23.03, Philippe Mathieu-Daudé via wrote:

Signed-off-by: Philippe Mathieu-Daudé 
---
  block/export/fuse.c | 48 +
  1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 31cb0503adc..3a158342c75 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -603,6 +603,35 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
  }
  }
  
+static bool fuse_fallocate_punch_hole(fuse_req_t req, fuse_ino_t inode,
+                                      int mode, int64_t blk_len,
+                                      off_t offset, off_t *length)
+{
+    FuseExport *exp = fuse_req_userdata(req);
+
+    if (mode & FALLOC_FL_KEEP_SIZE) {
+        *length = MIN(*length, blk_len - offset);
+    }
+
+    if (mode & FALLOC_FL_PUNCH_HOLE) {
+        int ret;
+
+        if (!(mode & FALLOC_FL_KEEP_SIZE)) {
+            fuse_reply_err(req, EINVAL);
+            return false;
+        }
+
+        do {
+            int size = MIN(*length, BDRV_REQUEST_MAX_BYTES);
+
+            ret = blk_pdiscard(exp->common.blk, offset, size);
+            offset += size;
+            *length -= size;
+        } while (ret == 0 && *length > 0);
+    }
+    return true;
+}
+
  static bool fuse_fallocate_zero_range(fuse_req_t req, fuse_ino_t inode,
int mode, int64_t blk_len,
off_t offset, off_t *length)
@@ -657,23 +686,8 @@ static void fuse_fallocate(fuse_req_t req, fuse_ino_t inode, int mode,
  return;
  }
  
-    if (mode & FALLOC_FL_KEEP_SIZE) {
-        length = MIN(length, blk_len - offset);
-    }
-
-    if (mode & FALLOC_FL_PUNCH_HOLE) {
-        if (!(mode & FALLOC_FL_KEEP_SIZE)) {
-            fuse_reply_err(req, EINVAL);
-            return;
-        }
-
-        do {
-            int size = MIN(length, BDRV_REQUEST_MAX_BYTES);
-
-            ret = blk_pdiscard(exp->common.blk, offset, size);
-            offset += size;
-            length -= size;
-        } while (ret == 0 && length > 0);
+    if (!fuse_fallocate_punch_hole(req, inode, mode, blk_len, offset, )) {
+        return;


Same issue as with the previous patch? If the do-while loop has been 
executed, the else branches below should not be called...


 Thomas


  } else if (!fuse_fallocate_zero_range(req, inode, blk_len, mode, offset, )) {
  return;
  } else if (!mode) {
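Both helpers in this series split the operation into pieces of at most `BDRV_REQUEST_MAX_BYTES` and stop at the first error. The following standalone sketch shows that chunking pattern, with a hypothetical callback type `RangeOp` standing in for `blk_pdiscard()`/`blk_pwrite_zeroes()`. It uses a plain `while` loop, so unlike the quoted `do`/`while` it issues no call when the length is already 0. It also reports how many bytes were processed. `Recorder` and `record_op()` are a test double, not QEMU code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for blk_pdiscard() / blk_pwrite_zeroes(). */
typedef int (*RangeOp)(void *opaque, int64_t offset, int64_t bytes);

/*
 * Apply op to [offset, offset + length) in chunks of at most max_chunk
 * bytes, stopping at the first error. Returns 0 or the first negative
 * error; *done receives the number of bytes processed successfully.
 */
static int apply_in_chunks(RangeOp op, void *opaque, int64_t offset,
                           int64_t length, int64_t max_chunk, int64_t *done)
{
    int ret = 0;

    *done = 0;
    while (length > 0) {
        int64_t size = MIN(length, max_chunk);

        ret = op(opaque, offset, size);
        if (ret < 0) {
            break;
        }
        offset += size;
        length -= size;
        *done += size;
    }
    return ret;
}

/* Test double: records calls and fails once offset reaches fail_at. */
typedef struct Recorder {
    int calls;
    int64_t last_offset;
    int64_t last_bytes;
    int64_t fail_at;    /* -1 means never fail */
} Recorder;

static int record_op(void *opaque, int64_t offset, int64_t bytes)
{
    Recorder *r = opaque;

    if (r->fail_at >= 0 && offset >= r->fail_at) {
        return -EIO;
    }
    r->calls++;
    r->last_offset = offset;
    r->last_bytes = bytes;
    return 0;
}
```

For example, a 25-byte range at offset 100 with a 10-byte limit becomes three calls, at offsets 100, 110 and 120, covering 10, 10 and 5 bytes.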





Re: [PATCH v2 1/3] block/export/fuse: Extract fuse_fallocate_punch_hole()

2022-01-25 Thread Thomas Huth

On 24/01/2022 23.03, Philippe Mathieu-Daudé via wrote:

Extract fuse_fallocate_punch_hole() to avoid #ifdef'ry
mixed within an if/else statement.

Signed-off-by: Philippe Mathieu-Daudé 
---
  block/export/fuse.c | 59 +++--
  1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 6710d8aed86..31cb0503adc 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -603,6 +603,38 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
  }
  }
  
+static bool fuse_fallocate_zero_range(fuse_req_t req, fuse_ino_t inode,
+                                      int mode, int64_t blk_len,
+                                      off_t offset, off_t *length)
+{
+#ifdef CONFIG_FALLOCATE_ZERO_RANGE
+    FuseExport *exp = fuse_req_userdata(req);
+
+    if (mode & FALLOC_FL_ZERO_RANGE) {
+        int ret;
+
+        if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + *length > blk_len) {
+            /* No need for zeroes, we are going to write them ourselves */
+            ret = fuse_do_truncate(exp, offset + *length, false,
+                                   PREALLOC_MODE_OFF);
+            if (ret < 0) {
+                fuse_reply_err(req, -ret);
+                return false;
+            }
+        }
+
+        do {
+            int size = MIN(*length, BDRV_REQUEST_MAX_BYTES);
+
+            ret = blk_pwrite_zeroes(exp->common.blk, offset, size, 0);
+            offset += size;
+            *length -= size;
+        } while (ret == 0 && *length > 0);
+    }
+#endif /* CONFIG_FALLOCATE_ZERO_RANGE */
+    return true;
+}
+
  /**
   * Let clients perform various fallocate() operations.
   */
@@ -642,30 +674,9 @@ static void fuse_fallocate(fuse_req_t req, fuse_ino_t inode, int mode,
  offset += size;
  length -= size;
  } while (ret == 0 && length > 0);
-    }
-#ifdef CONFIG_FALLOCATE_ZERO_RANGE
-    else if (mode & FALLOC_FL_ZERO_RANGE) {
-        if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + length > blk_len) {
-            /* No need for zeroes, we are going to write them ourselves */
-            ret = fuse_do_truncate(exp, offset + length, false,
-                                   PREALLOC_MODE_OFF);
-            if (ret < 0) {
-                fuse_reply_err(req, -ret);
-                return;
-            }
-        }
-
-        do {
-            int size = MIN(length, BDRV_REQUEST_MAX_BYTES);
-
-            ret = blk_pwrite_zeroes(exp->common.blk,
-                                    offset, size, 0);
-            offset += size;
-            length -= size;
-        } while (ret == 0 && length > 0);


I might not have had enough coffee yet today, but I think your patch is wrong:
if the code executed this do-while loop/if-statement branch, the other
else branches below were never executed. Now with your patch, if the
do-while loop in fuse_fallocate_zero_range() runs, the function will
return "true" at the end, causing the other else branches below to be
evaluated, so that ret finally gets set to -EOPNOTSUPP. Or did I miss something?


 Thomas



-    }
-#endif /* CONFIG_FALLOCATE_ZERO_RANGE */
-    else if (!mode) {
+    } else if (!fuse_fallocate_zero_range(req, inode, blk_len, mode, offset, )) {
+        return;
+    } else if (!mode) {
  /* We can only fallocate at the EOF with a truncate */
  if (offset < blk_len) {
  fuse_reply_err(req, EOPNOTSUPP);





Re: [PATCH v1 09/22] tests/lcitool: Refresh submodule and remove libxml2

2022-01-25 Thread Philippe Mathieu-Daudé via
On 1/25/22 11:23, Thomas Huth wrote:
> On 24/01/2022 21.15, Alex Bennée wrote:
>> From: Philippe Mathieu-Daudé 
>>
>> The previous commit removed all uses of libxml2.
>>
>> Refresh lcitool submodule, update qemu.yml and refresh the generated
>> files by running:
>>
>>    $ make lcitool-refresh
>>
>> Note: This refreshment also removes libudev dependency on Fedora
>> and CentOS due to libvirt-ci commit 18bfaee ("mappings: Improve
>> mapping for libudev"), since "The udev project has been absorbed
>> by the systemd project", and lttng-ust on FreeBSD runners due to
>> libvirt-ci commit 6dd9b6f ("guests: drop lttng-ust from FreeBSD
>> platform").
>>
>> Reviewed-by: Daniel P. Berrangé 
>> Signed-off-by: Philippe Mathieu-Daudé 
>> Signed-off-by: Alex Bennée 
>> Message-Id: <20220121154134.315047-6-f4...@amsat.org>
>> ---
>>   .gitlab-ci.d/cirrus/freebsd-12.vars           | 2 +-
>>   .gitlab-ci.d/cirrus/freebsd-13.vars           | 2 +-
>>   .gitlab-ci.d/cirrus/macos-11.vars             | 2 +-
>>   tests/docker/dockerfiles/alpine.docker        | 4 ++--
>>   tests/docker/dockerfiles/centos8.docker       | 4 +---
>>   tests/docker/dockerfiles/fedora.docker        | 4 +---
>>   tests/docker/dockerfiles/opensuse-leap.docker | 3 +--
>>   tests/docker/dockerfiles/ubuntu1804.docker    | 3 +--
>>   tests/docker/dockerfiles/ubuntu2004.docker    | 3 +--
>>   tests/lcitool/libvirt-ci                      | 2 +-
>>   tests/lcitool/projects/qemu.yml               | 1 -
>>   11 files changed, 11 insertions(+), 19 deletions(-)
>>
>> diff --git a/.gitlab-ci.d/cirrus/freebsd-12.vars b/.gitlab-ci.d/cirrus/freebsd-12.vars
>> index 9c52266811..07f313aa3a 100644
>> --- a/.gitlab-ci.d/cirrus/freebsd-12.vars
>> +++ b/.gitlab-ci.d/cirrus/freebsd-12.vars
>> @@ -11,6 +11,6 @@ MAKE='/usr/local/bin/gmake'
>>   NINJA='/usr/local/bin/ninja'
>>   PACKAGING_COMMAND='pkg'
>>   PIP3='/usr/local/bin/pip-3.8'
>> -PKGS='alsa-lib bash bzip2 ca_root_nss capstone4 ccache
>> cdrkit-genisoimage ctags curl cyrus-sasl dbus diffutils dtc gettext
>> git glib gmake gnutls gsed gtk3 libepoxy libffi libgcrypt
>> libjpeg-turbo libnfs libspice-server libssh libtasn1 libxml2 llvm
>> lttng-ust lzo2 meson ncurses nettle ninja opencv p5-Test-Harness perl5
>> pixman pkgconf png py38-numpy py38-pillow py38-pip py38-sphinx
>> py38-sphinx_rtd_theme py38-virtualenv py38-yaml python3 rpm2cpio sdl2
>> sdl2_image snappy spice-protocol tesseract texinfo usbredir
>> virglrenderer vte3 zstd'
>> +PKGS='alsa-lib bash bzip2 ca_root_nss capstone4 ccache
>> cdrkit-genisoimage ctags curl cyrus-sasl dbus diffutils dtc
>> fusefs-libs3 gettext git glib gmake gnutls gsed gtk3 libepoxy libffi
>> libgcrypt libjpeg-turbo libnfs libspice-server libssh libtasn1 llvm
>> lzo2 meson ncurses nettle ninja opencv p5-Test-Harness perl5 pixman
>> pkgconf png py38-numpy py38-pillow py38-pip py38-sphinx
>> py38-sphinx_rtd_theme py38-virtualenv py38-yaml python3 rpm2cpio sdl2
>> sdl2_image snappy spice-protocol tesseract texinfo usbredir
>> virglrenderer vte3 zstd'
>>   PYPI_PKGS=''
>>   PYTHON='/usr/local/bin/python3'
>> diff --git a/.gitlab-ci.d/cirrus/freebsd-13.vars b/.gitlab-ci.d/cirrus/freebsd-13.vars
>> index 7b44dba324..8a648dda1e 100644
>> --- a/.gitlab-ci.d/cirrus/freebsd-13.vars
>> +++ b/.gitlab-ci.d/cirrus/freebsd-13.vars
>> @@ -11,6 +11,6 @@ MAKE='/usr/local/bin/gmake'
>>   NINJA='/usr/local/bin/ninja'
>>   PACKAGING_COMMAND='pkg'
>>   PIP3='/usr/local/bin/pip-3.8'
>> -PKGS='alsa-lib bash bzip2 ca_root_nss capstone4 ccache
>> cdrkit-genisoimage ctags curl cyrus-sasl dbus diffutils dtc gettext
>> git glib gmake gnutls gsed gtk3 libepoxy libffi libgcrypt
>> libjpeg-turbo libnfs libspice-server libssh libtasn1 libxml2 llvm
>> lttng-ust lzo2 meson ncurses nettle ninja opencv p5-Test-Harness perl5
>> pixman pkgconf png py38-numpy py38-pillow py38-pip py38-sphinx
>> py38-sphinx_rtd_theme py38-virtualenv py38-yaml python3 rpm2cpio sdl2
>> sdl2_image snappy spice-protocol tesseract texinfo usbredir
>> virglrenderer vte3 zstd'
>> +PKGS='alsa-lib bash bzip2 ca_root_nss capstone4 ccache
>> cdrkit-genisoimage ctags curl cyrus-sasl dbus diffutils dtc
>> fusefs-libs3 gettext git glib gmake gnutls gsed gtk3 libepoxy libffi
>> libgcrypt libjpeg-turbo libnfs libspice-server libssh libtasn1 llvm
>> lzo2 meson ncurses nettle ninja opencv p5-Test-Harness perl5 pixman
>> pkgconf png py38-numpy py38-pillow py38-pip py38-sphinx
>> py38-sphinx_rtd_theme py38-virtualenv py38-yaml python3 rpm2cpio sdl2
>> sdl2_image snappy spice-protocol tesseract texinfo usbredir
>> virglrenderer vte3 zstd'
> 
> It seems this now also adds fusefs-libs3 on FreeBSD, which causes the
> build to break:
> 
>  https://gitlab.com/thuth/qemu/-/jobs/2012083924#L3454
> 
> Any ideas how to best fix this?

Candidate patch:
https://lore.kernel.org/qemu-devel/20220124220357.74017-1-f4...@amsat.org/



Re: [PATCH v1 09/22] tests/lcitool: Refresh submodule and remove libxml2

2022-01-25 Thread Thomas Huth

On 24/01/2022 21.15, Alex Bennée wrote:

From: Philippe Mathieu-Daudé 

The previous commit removed all uses of libxml2.

Refresh lcitool submodule, update qemu.yml and refresh the generated
files by running:

   $ make lcitool-refresh

Note: This refreshment also removes libudev dependency on Fedora
and CentOS due to libvirt-ci commit 18bfaee ("mappings: Improve
mapping for libudev"), since "The udev project has been absorbed
by the systemd project", and lttng-ust on FreeBSD runners due to
libvirt-ci commit 6dd9b6f ("guests: drop lttng-ust from FreeBSD
platform").

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Alex Bennée 
Message-Id: <20220121154134.315047-6-f4...@amsat.org>
---
  .gitlab-ci.d/cirrus/freebsd-12.vars           | 2 +-
  .gitlab-ci.d/cirrus/freebsd-13.vars           | 2 +-
  .gitlab-ci.d/cirrus/macos-11.vars             | 2 +-
  tests/docker/dockerfiles/alpine.docker        | 4 ++--
  tests/docker/dockerfiles/centos8.docker       | 4 +---
  tests/docker/dockerfiles/fedora.docker        | 4 +---
  tests/docker/dockerfiles/opensuse-leap.docker | 3 +--
  tests/docker/dockerfiles/ubuntu1804.docker    | 3 +--
  tests/docker/dockerfiles/ubuntu2004.docker    | 3 +--
  tests/lcitool/libvirt-ci                      | 2 +-
  tests/lcitool/projects/qemu.yml               | 1 -
  11 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/.gitlab-ci.d/cirrus/freebsd-12.vars b/.gitlab-ci.d/cirrus/freebsd-12.vars
index 9c52266811..07f313aa3a 100644
--- a/.gitlab-ci.d/cirrus/freebsd-12.vars
+++ b/.gitlab-ci.d/cirrus/freebsd-12.vars
@@ -11,6 +11,6 @@ MAKE='/usr/local/bin/gmake'
  NINJA='/usr/local/bin/ninja'
  PACKAGING_COMMAND='pkg'
  PIP3='/usr/local/bin/pip-3.8'
-PKGS='alsa-lib bash bzip2 ca_root_nss capstone4 ccache cdrkit-genisoimage 
ctags curl cyrus-sasl dbus diffutils dtc gettext git glib gmake gnutls gsed 
gtk3 libepoxy libffi libgcrypt libjpeg-turbo libnfs libspice-server libssh 
libtasn1 libxml2 llvm lttng-ust lzo2 meson ncurses nettle ninja opencv 
p5-Test-Harness perl5 pixman pkgconf png py38-numpy py38-pillow py38-pip 
py38-sphinx py38-sphinx_rtd_theme py38-virtualenv py38-yaml python3 rpm2cpio 
sdl2 sdl2_image snappy spice-protocol tesseract texinfo usbredir virglrenderer 
vte3 zstd'
+PKGS='alsa-lib bash bzip2 ca_root_nss capstone4 ccache cdrkit-genisoimage 
ctags curl cyrus-sasl dbus diffutils dtc fusefs-libs3 gettext git glib gmake 
gnutls gsed gtk3 libepoxy libffi libgcrypt libjpeg-turbo libnfs libspice-server 
libssh libtasn1 llvm lzo2 meson ncurses nettle ninja opencv p5-Test-Harness 
perl5 pixman pkgconf png py38-numpy py38-pillow py38-pip py38-sphinx 
py38-sphinx_rtd_theme py38-virtualenv py38-yaml python3 rpm2cpio sdl2 
sdl2_image snappy spice-protocol tesseract texinfo usbredir virglrenderer vte3 
zstd'
  PYPI_PKGS=''
  PYTHON='/usr/local/bin/python3'
diff --git a/.gitlab-ci.d/cirrus/freebsd-13.vars b/.gitlab-ci.d/cirrus/freebsd-13.vars
index 7b44dba324..8a648dda1e 100644
--- a/.gitlab-ci.d/cirrus/freebsd-13.vars
+++ b/.gitlab-ci.d/cirrus/freebsd-13.vars
@@ -11,6 +11,6 @@ MAKE='/usr/local/bin/gmake'
  NINJA='/usr/local/bin/ninja'
  PACKAGING_COMMAND='pkg'
  PIP3='/usr/local/bin/pip-3.8'
-PKGS='alsa-lib bash bzip2 ca_root_nss capstone4 ccache cdrkit-genisoimage 
ctags curl cyrus-sasl dbus diffutils dtc gettext git glib gmake gnutls gsed 
gtk3 libepoxy libffi libgcrypt libjpeg-turbo libnfs libspice-server libssh 
libtasn1 libxml2 llvm lttng-ust lzo2 meson ncurses nettle ninja opencv 
p5-Test-Harness perl5 pixman pkgconf png py38-numpy py38-pillow py38-pip 
py38-sphinx py38-sphinx_rtd_theme py38-virtualenv py38-yaml python3 rpm2cpio 
sdl2 sdl2_image snappy spice-protocol tesseract texinfo usbredir virglrenderer 
vte3 zstd'
+PKGS='alsa-lib bash bzip2 ca_root_nss capstone4 ccache cdrkit-genisoimage 
ctags curl cyrus-sasl dbus diffutils dtc fusefs-libs3 gettext git glib gmake 
gnutls gsed gtk3 libepoxy libffi libgcrypt libjpeg-turbo libnfs libspice-server 
libssh libtasn1 llvm lzo2 meson ncurses nettle ninja opencv p5-Test-Harness 
perl5 pixman pkgconf png py38-numpy py38-pillow py38-pip py38-sphinx 
py38-sphinx_rtd_theme py38-virtualenv py38-yaml python3 rpm2cpio sdl2 
sdl2_image snappy spice-protocol tesseract texinfo usbredir virglrenderer vte3 
zstd'


It seems this now also adds fusefs-libs3 on FreeBSD, which causes the
build to break:


 https://gitlab.com/thuth/qemu/-/jobs/2012083924#L3454

Any ideas how to best fix this?

 Thomas




Re: [PATCH V2 for-6.2 0/2] fixes for bdrv_co_block_status

2022-01-25 Thread Stefano Garzarella

On Thu, Jan 20, 2022 at 10:19:27AM +0100, Peter Lieven wrote:

Am 19.01.22 um 15:57 schrieb Stefano Garzarella:

On Fri, Jan 14, 2022 at 11:58:40AM +0100, Ilya Dryomov wrote:

On Thu, Jan 13, 2022 at 3:44 PM Peter Lieven  wrote:


V1->V2:
 Patch 1: Treat a hole just like an unallocated area. [Ilya]
 Patch 2: Apply workaround only for pre-Quincy librbd versions and
  ensure default striping and non-child images. [Ilya]

Peter Lieven (2):
  block/rbd: fix handling of holes in .bdrv_co_block_status
  block/rbd: workaround for ceph issue #53784

 block/rbd.c | 52 +---
 1 file changed, 45 insertions(+), 7 deletions(-)

--
2.25.1




These patches have both "for-6.2" in the subject and
Cc: qemu-sta...@nongnu.org in the description, which is a little
confusing.  Just want to clarify that they should go into master
and be backported to 6.2.


Yeah, a bit confusing. These are for 7.0, so @Kevin, can these patches go through
your tree?



Yes, sorry, my fault. It should be 7.0


Don't worry :-)

What about sending a v3 fixing the version tag (I think you can just 
remove for-6.2), the extra space in the comment, and the Fixes tag on 
patch 2?


If you send a v3, remember to carry over the R-b/T-b tags you received on
this version from me and Ilya.


Thanks,
Stefano




Re: [PATCH] block: bdrv_set_backing_hd(): use drained section

2022-01-25 Thread Vladimir Sementsov-Ogievskiy

25.01.2022 12:24, Paolo Bonzini wrote:

On 1/24/22 18:37, Vladimir Sementsov-Ogievskiy wrote:

Graph modifications should be done in a drained section. The stream_prepare()
handler of the block stream job calls bdrv_set_backing_hd() without using a
drained section, so it is theoretically possible that some I/O request
will interleave with the graph modification and use outdated pointers
to removed block nodes.

Some other callers also use bdrv_set_backing_hd() without caring about drained
sections. So it seems good to create the drained section directly in
bdrv_set_backing_hd().


Emanuele has a similar patch in his series to protect all graph
modifications with drains:

@@ -3456,6 +3478,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,

  assert(qemu_in_main_thread());

+    bdrv_subtree_drained_begin_unlocked(bs);
+    if (backing_hd) {
+    bdrv_subtree_drained_begin_unlocked(backing_hd);
+    }
+
  ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
  if (ret < 0) {
  goto out;
@@ -3464,6 +3491,10 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
  ret = bdrv_refresh_perms(bs, errp);
  out:
  tran_finalize(tran, ret);
+    if (backing_hd) {
+    bdrv_subtree_drained_end_unlocked(backing_hd);
+    }
+    bdrv_subtree_drained_end_unlocked(bs);

  return ret;
  }

so the idea at least is correct.

I don't object to fixing this independently, but please check
1) if a subtree drain would be more appropriate, 2) whether
backing_hd should be drained as well, 3) whether we're guaranteed
to be holding the AioContext lock as required for
bdrv_drained_begin/end.



Hmm.

1. Subtree draining of backing_hd will not help: as long as bs is not drained, we
may still have in-flight requests in bs touching the old bs->backing.

2. I think a non-recursive drain of bs is enough. We modify only the bs node, so we
should drain it. backing_hd itself is not modified. If backing_hd participates in
some other backing chain, that chain is not touched, and in-flight requests in it
are not broken by the modification, so why drain it? The same goes for the old
bs->backing and the other children of bs. We are not interested in in-flight requests
in the subtree that are not part of a request in bs. So, if there are no in-flight
requests in bs, we can modify bs without caring about requests in the subtree.

3. Jobs are bound to an AioContext, so I believe they take care to hold the
AioContext lock. For example, job_prepare may be called through job_exit(), which
does aio_context_acquire(job->aio_context), or through job_cancel(), which seems to
be called under aio_context_acquire() as well. So in general it seems we take care
of it, and of course bdrv_set_backing_hd() must be called with the AioContext lock
held. If some code path doesn't do that, it's a bug.
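Point 2 argues that draining only the node being modified is enough. Once no request is in flight in bs and no new one can start, nothing can still be holding the old bs->backing pointer. The following toy pthread model illustrates that reasoning. `Node`, `request_begin()`, `drained_begin()`, `set_backing()` and the other names are hypothetical and much simpler than QEMU's real drain machinery (bdrv_drained_begin()/bdrv_drained_end()).

```c
#include <assert.h>
#include <pthread.h>

/* Toy stand-in for a block node (hypothetical, not BlockDriverState). */
typedef struct Node {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int in_flight;        /* requests currently using ->backing */
    int quiesce_counter;  /* > 0 while inside a drained section */
    const char *backing;  /* stands in for bs->backing */
} Node;

static void node_init(Node *n, const char *backing)
{
    pthread_mutex_init(&n->lock, NULL);
    pthread_cond_init(&n->cond, NULL);
    n->in_flight = 0;
    n->quiesce_counter = 0;
    n->backing = backing;
}

/* A request waits while the node is drained, then pins ->backing. */
static const char *request_begin(Node *n)
{
    const char *b;

    pthread_mutex_lock(&n->lock);
    while (n->quiesce_counter > 0) {
        pthread_cond_wait(&n->cond, &n->lock);
    }
    n->in_flight++;
    b = n->backing;
    pthread_mutex_unlock(&n->lock);
    return b;
}

static void request_end(Node *n)
{
    pthread_mutex_lock(&n->lock);
    n->in_flight--;
    pthread_cond_broadcast(&n->cond);
    pthread_mutex_unlock(&n->lock);
}

/* Stop new requests on this node and wait for in-flight ones to finish. */
static void drained_begin(Node *n)
{
    pthread_mutex_lock(&n->lock);
    n->quiesce_counter++;
    while (n->in_flight > 0) {
        pthread_cond_wait(&n->cond, &n->lock);
    }
    pthread_mutex_unlock(&n->lock);
}

static void drained_end(Node *n)
{
    pthread_mutex_lock(&n->lock);
    n->quiesce_counter--;
    pthread_cond_broadcast(&n->cond);
    pthread_mutex_unlock(&n->lock);
}

/* Only the node being modified is drained (non-recursively). */
static void set_backing(Node *n, const char *new_backing)
{
    drained_begin(n);
    /* No request holds the old pointer, and none can start until drained_end(). */
    n->backing = new_backing;
    drained_end(n);
}
```

In this model, requests that live entirely in another chain never touch `n->backing`, so there is nothing to wait for there. This mirrors the argument against also draining backing_hd.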


--
Best regards,
Vladimir



Re: [PATCH] block: bdrv_set_backing_hd(): use drained section

2022-01-25 Thread Paolo Bonzini

On 1/24/22 18:37, Vladimir Sementsov-Ogievskiy wrote:

Graph modifications should be done in a drained section. The stream_prepare()
handler of the block stream job calls bdrv_set_backing_hd() without using a
drained section, so it is theoretically possible that some I/O request
will interleave with the graph modification and use outdated pointers
to removed block nodes.

Some other callers also use bdrv_set_backing_hd() without caring about drained
sections. So it seems good to create the drained section directly in
bdrv_set_backing_hd().


Emanuele has a similar patch in his series to protect all graph
modifications with drains:

@@ -3456,6 +3478,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
 
     assert(qemu_in_main_thread());
 
+    bdrv_subtree_drained_begin_unlocked(bs);
+    if (backing_hd) {
+        bdrv_subtree_drained_begin_unlocked(backing_hd);
+    }
+
     ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
     if (ret < 0) {
         goto out;
@@ -3464,6 +3491,10 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
     ret = bdrv_refresh_perms(bs, errp);
 out:
     tran_finalize(tran, ret);
+    if (backing_hd) {
+        bdrv_subtree_drained_end_unlocked(backing_hd);
+    }
+    bdrv_subtree_drained_end_unlocked(bs);
 
     return ret;
 }

so the idea at least is correct.

I don't object to fixing this independently, but please check
1) if a subtree drain would be more appropriate, 2) whether
backing_hd should be drained as well, 3) whether we're guaranteed
to be holding the AioContext lock as required for
bdrv_drained_begin/end.

Paolo



Re: [PATCH 1/1] hw/usb: pacify xhciwmi.exe warning

2022-01-25 Thread Denis V. Lunev
On 1/25/22 10:09 AM, Gerd Hoffmann wrote:
>   Hi,
>
>>>  case CR_VENDOR_NEC_FIRMWARE_REVISION:
>>>  if (xhci->nec_quirks) {
>>>  event.type = 48; /* NEC reply */
>>> -event.length = 0x3025;
>>> +event.length = 0x3034;
>> ping v2
> Queued now (missed the last pull request due to missing qemu-devel, which
> confused my mail filtering).
>
> take care,
>   Gerd
>
thanks!



Re: [PATCH 1/1] hw/usb: pacify xhciwmi.exe warning

2022-01-25 Thread Gerd Hoffmann
  Hi,

> >  case CR_VENDOR_NEC_FIRMWARE_REVISION:
> >  if (xhci->nec_quirks) {
> >  event.type = 48; /* NEC reply */
> > -event.length = 0x3025;
> > +event.length = 0x3034;

> ping v2

Queued now (missed the last pull request due to missing qemu-devel, which
confused my mail filtering).

take care,
  Gerd