Re: [PATCH v1] vhost-user-blk: use different event handlers on init and operation

2021-03-22 Thread Denis Plotnikov

ping!


On 11.03.2021 11:10, Denis Plotnikov wrote:

Commit a1a20d06b73e "vhost-user-blk: delay vhost_user_blk_disconnect"
introduced postponing of the vhost_dev cleanup, aiming to eliminate qemu
aborts caused by connection problems with the vhost-blk daemon.

However, it introduces a new problem. Now, any communication error
during the execution of vhost_dev_init(), called by
vhost_user_blk_device_realize(), leads to a qemu abort on an assert in
vhost_dev_get_config().

This happens because vhost_user_blk_disconnect() is postponed, although
it should have dropped the s->connected flag by the time
vhost_user_blk_device_realize() opens a new connection.
When the connection is opened, the vhost_dev initialization in
vhost_user_blk_connect() relies on the s->connected flag, and
if the flag hasn't been dropped, it skips vhost_dev initialization and
returns success. Then, vhost_user_blk_device_realize()'s execution flow
goes on to vhost_dev_get_config(), where it aborts on the assert.

It seems connection/disconnection processing should happen
differently during initialization and during operation of vhost-user-blk.
During initialization (in vhost_user_blk_device_realize()) we fully
control the initialization process. At that point, nobody can use the
device since it isn't initialized yet and we don't need to postpone any
cleanups, so we can do the cleanup right away when there are communication
problems with the vhost-blk daemon.
During operation a disconnect may happen while the device is in use, and
the device users may want to use vhost_dev's data to roll back before
vhost_dev is re-initialized (e.g. in vhost_dev_set_log()), so we
postpone the cleanup.

The patch splits those two cases: it performs the cleanup immediately
during initialization and postpones it once the device is initialized and
in use.
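
As a standalone illustration (not part of this patch, and deliberately not
using QEMU types), the dispatch pattern introduced here boils down to one
shared handler plus two thin wrappers that select the stage. All names are
toy stand-ins; blk_event_init()/blk_event_oper() merely mirror the patch's
vhost_user_blk_event_init()/_oper() wrappers:

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for QEMUChrEvent; not the real QEMU enum. */
typedef enum { EV_OPENED, EV_CLOSED } ChrEvent;

/* One shared handler whose behaviour depends on the device stage. */
static void blk_event(void *opaque, ChrEvent event, bool init)
{
    (void)opaque;
    if (event != EV_CLOSED) {
        return;
    }
    if (init) {
        printf("realize path: clean up immediately and retry the connection\n");
    } else {
        printf("operation path: postpone cleanup via a bottom half\n");
    }
}

/* Thin wrappers with the two-argument signature a chardev callback needs. */
static void blk_event_init(void *opaque, ChrEvent event)
{
    blk_event(opaque, event, true);
}

static void blk_event_oper(void *opaque, ChrEvent event)
{
    blk_event(opaque, event, false);
}

int main(void)
{
    blk_event_init(NULL, EV_CLOSED); /* handler registered during realize */
    blk_event_oper(NULL, EV_CLOSED); /* handler registered once in operation */
    return 0;
}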

Signed-off-by: Denis Plotnikov 
---
  hw/block/vhost-user-blk.c | 88 ---
  1 file changed, 54 insertions(+), 34 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index b870a50e6b20..84940122b8ca 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -362,7 +362,17 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
  vhost_dev_cleanup(&s->dev);
  }
  
-static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);

+static void vhost_user_blk_event(void *opaque, QEMUChrEvent event, bool init);
+
+static void vhost_user_blk_event_init(void *opaque, QEMUChrEvent event)
+{
+vhost_user_blk_event(opaque, event, true);
+}
+
+static void vhost_user_blk_event_oper(void *opaque, QEMUChrEvent event)
+{
+vhost_user_blk_event(opaque, event, false);
+}
  
  static void vhost_user_blk_chr_closed_bh(void *opaque)

  {
@@ -371,11 +381,11 @@ static void vhost_user_blk_chr_closed_bh(void *opaque)
  VHostUserBlk *s = VHOST_USER_BLK(vdev);
  
  vhost_user_blk_disconnect(dev);

-qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
-NULL, opaque, NULL, true);
+qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL,
+vhost_user_blk_event_oper, NULL, opaque, NULL, true);
  }
  
-static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)

+static void vhost_user_blk_event(void *opaque, QEMUChrEvent event, bool init)
  {
  DeviceState *dev = opaque;
  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -390,38 +400,42 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
  break;
  case CHR_EVENT_CLOSED:
  /*
- * A close event may happen during a read/write, but vhost
- * code assumes the vhost_dev remains setup, so delay the
- * stop & clear. There are two possible paths to hit this
- * disconnect event:
- * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
- * vhost_user_blk_device_realize() is a caller.
- * 2. In tha main loop phase after VM start.
- *
- * For p2 the disconnect event will be delayed. We can't
- * do the same for p1, because we are not running the loop
- * at this moment. So just skip this step and perform
- * disconnect in the caller function.
- *
- * TODO: maybe it is a good idea to make the same fix
- * for other vhost-user devices.
+ * Closing the connection should happen differently on device
+ * initialization and operation stages.
+ * On initialization, we want to re-start vhost_dev initialization
+ * from the very beginning right away when the connection is closed,
+ * so we clean up vhost_dev on each connection closing.
+ * On operation, we want to postpone vhost_dev cleanup to let the
+ * other code perform its own cleanup sequence using vhost_dev data
+ * (e.g. vhost_dev_set_log).
   */
-if (runstate_is_running()) {
+if (init) {
+vhost_user_blk_disconnect(dev);
+} else {
+/*
+ * A close event may happen during a re

[PATCH] nvme: expose 'bootindex' property

2021-03-22 Thread Joelle van Dyne
The check for `n->namespace.blkconf.blk` always fails because
this is in the initialization function.

Signed-off-by: Joelle van Dyne 
---
 hw/block/nvme.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6842b01ab5..42605fc55d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -6330,11 +6330,9 @@ static void nvme_instance_init(Object *obj)
 {
 NvmeCtrl *n = NVME(obj);
 
-if (n->namespace.blkconf.blk) {
-device_add_bootindex_property(obj, &n->namespace.blkconf.bootindex,
-  "bootindex", "/namespace@1,0",
-  DEVICE(obj));
-}
+device_add_bootindex_property(obj, &n->namespace.blkconf.bootindex,
+  "bootindex", "/namespace@1,0",
+  DEVICE(obj));
 
 object_property_add(obj, "smart_critical_warning", "uint8",
 nvme_get_smart_warning,
-- 
2.28.0




[PATCH 3/3] vhost-user-blk-test: test discard/write zeroes invalid inputs

2021-03-22 Thread Stefan Hajnoczi
Exercise input validation code paths in
block/export/vhost-user-blk-server.c.

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20210309094106.196911-5-stefa...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qtest/vhost-user-blk-test.c | 124 ++
 1 file changed, 124 insertions(+)

diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c
index d37e1c30bd..8796c74ca4 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -94,6 +94,124 @@ static uint64_t virtio_blk_request(QGuestAllocator *alloc, QVirtioDevice *d,
 return addr;
 }
 
+static void test_invalid_discard_write_zeroes(QVirtioDevice *dev,
+  QGuestAllocator *alloc,
+  QTestState *qts,
+  QVirtQueue *vq,
+  uint32_t type)
+{
+QVirtioBlkReq req;
+struct virtio_blk_discard_write_zeroes dwz_hdr;
+struct virtio_blk_discard_write_zeroes dwz_hdr2[2];
+uint64_t req_addr;
+uint32_t free_head;
+uint8_t status;
+
+/* More than one dwz is not supported */
+req.type = type;
+req.data = (char *) dwz_hdr2;
+dwz_hdr2[0].sector = 0;
+dwz_hdr2[0].num_sectors = 1;
+dwz_hdr2[0].flags = 0;
+dwz_hdr2[1].sector = 1;
+dwz_hdr2[1].num_sectors = 1;
+dwz_hdr2[1].flags = 0;
+
+virtio_blk_fix_dwz_hdr(dev, &dwz_hdr2[0]);
+virtio_blk_fix_dwz_hdr(dev, &dwz_hdr2[1]);
+
+req_addr = virtio_blk_request(alloc, dev, &req, sizeof(dwz_hdr2));
+
+free_head = qvirtqueue_add(qts, vq, req_addr, 16, false, true);
+qvirtqueue_add(qts, vq, req_addr + 16, sizeof(dwz_hdr2), false, true);
+qvirtqueue_add(qts, vq, req_addr + 16 + sizeof(dwz_hdr2), 1, true,
+   false);
+
+qvirtqueue_kick(qts, dev, vq, free_head);
+
+qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+   QVIRTIO_BLK_TIMEOUT_US);
+status = readb(req_addr + 16 + sizeof(dwz_hdr2));
+g_assert_cmpint(status, ==, VIRTIO_BLK_S_UNSUPP);
+
+guest_free(alloc, req_addr);
+
+/* num_sectors must be less than config->max_write_zeroes_sectors */
+req.type = type;
+req.data = (char *) &dwz_hdr;
+dwz_hdr.sector = 0;
+dwz_hdr.num_sectors = 0xffffffff;
+dwz_hdr.flags = 0;
+
+virtio_blk_fix_dwz_hdr(dev, &dwz_hdr);
+
+req_addr = virtio_blk_request(alloc, dev, &req, sizeof(dwz_hdr));
+
+free_head = qvirtqueue_add(qts, vq, req_addr, 16, false, true);
+qvirtqueue_add(qts, vq, req_addr + 16, sizeof(dwz_hdr), false, true);
+qvirtqueue_add(qts, vq, req_addr + 16 + sizeof(dwz_hdr), 1, true,
+   false);
+
+qvirtqueue_kick(qts, dev, vq, free_head);
+
+qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+   QVIRTIO_BLK_TIMEOUT_US);
+status = readb(req_addr + 16 + sizeof(dwz_hdr));
+g_assert_cmpint(status, ==, VIRTIO_BLK_S_IOERR);
+
+guest_free(alloc, req_addr);
+
+/* sector must be less than the device capacity */
+req.type = type;
+req.data = (char *) &dwz_hdr;
+dwz_hdr.sector = TEST_IMAGE_SIZE / 512 + 1;
+dwz_hdr.num_sectors = 1;
+dwz_hdr.flags = 0;
+
+virtio_blk_fix_dwz_hdr(dev, &dwz_hdr);
+
+req_addr = virtio_blk_request(alloc, dev, &req, sizeof(dwz_hdr));
+
+free_head = qvirtqueue_add(qts, vq, req_addr, 16, false, true);
+qvirtqueue_add(qts, vq, req_addr + 16, sizeof(dwz_hdr), false, true);
+qvirtqueue_add(qts, vq, req_addr + 16 + sizeof(dwz_hdr), 1, true,
+   false);
+
+qvirtqueue_kick(qts, dev, vq, free_head);
+
+qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+   QVIRTIO_BLK_TIMEOUT_US);
+status = readb(req_addr + 16 + sizeof(dwz_hdr));
+g_assert_cmpint(status, ==, VIRTIO_BLK_S_IOERR);
+
+guest_free(alloc, req_addr);
+
+/* reserved flag bits must be zero */
+req.type = type;
+req.data = (char *) &dwz_hdr;
+dwz_hdr.sector = 0;
+dwz_hdr.num_sectors = 1;
+dwz_hdr.flags = ~VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP;
+
+virtio_blk_fix_dwz_hdr(dev, &dwz_hdr);
+
+req_addr = virtio_blk_request(alloc, dev, &req, sizeof(dwz_hdr));
+
+free_head = qvirtqueue_add(qts, vq, req_addr, 16, false, true);
+qvirtqueue_add(qts, vq, req_addr + 16, sizeof(dwz_hdr), false, true);
+qvirtqueue_add(qts, vq, req_addr + 16 + sizeof(dwz_hdr), 1, true,
+   false);
+
+qvirtqueue_kick(qts, dev, vq, free_head);
+
+qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+   QVIRTIO_BLK_TIMEOUT_US);
+status = readb(req_addr + 16 + sizeof(dwz_hdr));
+g_assert_cmpint(status, ==, VIRTIO_BLK_S_UNSUPP);
+
+guest_free(alloc, req_addr);
+}
+
 /* Returns the request virtqueue so the caller can perform further tests */
 static QVirtQueue *test_basic(QVirtioDevice *dev, QGuestAllocator *all

[PATCH 2/3] tests/qtest: add multi-queue test case to vhost-user-blk-test

2021-03-22 Thread Stefan Hajnoczi
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20210309094106.196911-4-stefa...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 tests/qtest/vhost-user-blk-test.c | 81 +--
 1 file changed, 76 insertions(+), 5 deletions(-)

diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c
index 3e79549899..d37e1c30bd 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -569,6 +569,67 @@ static void pci_hotplug(void *obj, void *data, QGuestAllocator *t_alloc)
 qpci_unplug_acpi_device_test(qts, "drv1", PCI_SLOT_HP);
 }
 
+static void multiqueue(void *obj, void *data, QGuestAllocator *t_alloc)
+{
+QVirtioPCIDevice *pdev1 = obj;
+QVirtioDevice *dev1 = &pdev1->vdev;
+QVirtioPCIDevice *pdev8;
+QVirtioDevice *dev8;
+QTestState *qts = pdev1->pdev->bus->qts;
+uint64_t features;
+uint16_t num_queues;
+
+/*
+ * The primary device has 1 queue and VIRTIO_BLK_F_MQ is not enabled. The
+ * VIRTIO specification allows VIRTIO_BLK_F_MQ to be enabled when there is
+ * only 1 virtqueue, but --device vhost-user-blk-pci doesn't do this (which
+ * is also spec-compliant).
+ */
+features = qvirtio_get_features(dev1);
+g_assert_cmpint(features & (1u << VIRTIO_BLK_F_MQ), ==, 0);
+features = features & ~(QVIRTIO_F_BAD_FEATURE |
+(1u << VIRTIO_RING_F_INDIRECT_DESC) |
+(1u << VIRTIO_F_NOTIFY_ON_EMPTY) |
+(1u << VIRTIO_BLK_F_SCSI));
+qvirtio_set_features(dev1, features);
+
+/* Hotplug a secondary device with 8 queues */
+qtest_qmp_device_add(qts, "vhost-user-blk-pci", "drv1",
+ "{'addr': %s, 'chardev': 'char2', 'num-queues': 8}",
+ stringify(PCI_SLOT_HP) ".0");
+
+pdev8 = virtio_pci_new(pdev1->pdev->bus,
+   &(QPCIAddress) {
+   .devfn = QPCI_DEVFN(PCI_SLOT_HP, 0)
+   });
+g_assert_nonnull(pdev8);
+g_assert_cmpint(pdev8->vdev.device_type, ==, VIRTIO_ID_BLOCK);
+
+qos_object_start_hw(&pdev8->obj);
+
+dev8 = &pdev8->vdev;
+features = qvirtio_get_features(dev8);
+g_assert_cmpint(features & (1u << VIRTIO_BLK_F_MQ),
+==,
+(1u << VIRTIO_BLK_F_MQ));
+features = features & ~(QVIRTIO_F_BAD_FEATURE |
+(1u << VIRTIO_RING_F_INDIRECT_DESC) |
+(1u << VIRTIO_F_NOTIFY_ON_EMPTY) |
+(1u << VIRTIO_BLK_F_SCSI) |
+(1u << VIRTIO_BLK_F_MQ));
+qvirtio_set_features(dev8, features);
+
+num_queues = qvirtio_config_readw(dev8,
+offsetof(struct virtio_blk_config, num_queues));
+g_assert_cmpint(num_queues, ==, 8);
+
+qvirtio_pci_device_disable(pdev8);
+qos_object_destroy(&pdev8->obj);
+
+/* unplug secondary disk */
+qpci_unplug_acpi_device_test(qts, "drv1", PCI_SLOT_HP);
+}
+
 /*
  * Check that setting the vring addr on a non-existent virtqueue does
  * not crash.
@@ -688,7 +749,8 @@ static void quit_storage_daemon(void *data)
 g_free(data);
 }
 
-static void start_vhost_user_blk(GString *cmd_line, int vus_instances)
+static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
+ int num_queues)
 {
 const char *vhost_user_blk_bin = qtest_qemu_storage_daemon_binary();
 int i;
@@ -713,8 +775,8 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances)
 g_string_append_printf(storage_daemon_command,
 "--blockdev driver=file,node-name=disk%d,filename=%s "
 "--export 
type=vhost-user-blk,id=disk%d,addr.type=unix,addr.path=%s,"
-"node-name=disk%i,writable=on ",
-i, img_path, i, sock_path, i);
+"node-name=disk%i,writable=on,num-queues=%d ",
+i, img_path, i, sock_path, i, num_queues);
 
 g_string_append_printf(cmd_line, "-chardev socket,id=char%d,path=%s ",
i + 1, sock_path);
@@ -748,7 +810,7 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances)
 
 static void *vhost_user_blk_test_setup(GString *cmd_line, void *arg)
 {
-start_vhost_user_blk(cmd_line, 1);
+start_vhost_user_blk(cmd_line, 1, 1);
 return arg;
 }
 
@@ -762,7 +824,13 @@ static void *vhost_user_blk_test_setup(GString *cmd_line, void *arg)
 static void *vhost_user_blk_hotplug_test_setup(GString *cmd_line, void *arg)
 {
 /* "-chardev socket,id=char2" is used for pci_hotplug*/
-start_vhost_user_blk(cmd_line, 2);
+start_vhost_user_blk(cmd_line, 2, 1);
+return arg;
+}
+
+static void *vhost_user_blk_multiqueue_test_setup(GString *cmd_line, void *arg)
+{
+start_vhost_user_blk(cmd_line, 2, 8);
 return arg;
 }
 
@@ -789,6 +857,9 @@ static void register_vhost_user_blk_test(void)
 
 opts.before = vhos

[PATCH 0/3] vhost-user-blk-test: add tests for the vhost-user-blk server

2021-03-22 Thread Stefan Hajnoczi
These patches add a qtest for the vhost-user-blk server. CI found several
issues that caused these patches to be dropped from Michael Tsirkin and Kevin
Wolf's pull requests in the past. Hopefully they will go in smoothly this time!

Coiby Xu (1):
  test: new qTest case to test the vhost-user-blk-server

Stefan Hajnoczi (2):
  tests/qtest: add multi-queue test case to vhost-user-blk-test
  vhost-user-blk-test: test discard/write zeroes invalid inputs

 MAINTAINERS |   2 +
 tests/qtest/libqos/vhost-user-blk.h |  48 ++
 tests/qtest/libqos/vhost-user-blk.c | 130 
 tests/qtest/vhost-user-blk-test.c   | 989 
 tests/qtest/libqos/meson.build  |   1 +
 tests/qtest/meson.build |   4 +
 6 files changed, 1174 insertions(+)
 create mode 100644 tests/qtest/libqos/vhost-user-blk.h
 create mode 100644 tests/qtest/libqos/vhost-user-blk.c
 create mode 100644 tests/qtest/vhost-user-blk-test.c

-- 
2.30.2



[PATCH 1/3] test: new qTest case to test the vhost-user-blk-server

2021-03-22 Thread Stefan Hajnoczi
From: Coiby Xu 

This test case has the same tests as tests/virtio-blk-test.c except for
the tests that involve block_resize. Since the vhost-user-blk export only
serves one client at a time, two exports are started by qemu-storage-daemon
for the hotplug test.

Suggested-by: Thomas Huth 
Signed-off-by: Coiby Xu 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20210309094106.196911-3-stefa...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 MAINTAINERS |   2 +
 tests/qtest/libqos/vhost-user-blk.h |  48 ++
 tests/qtest/libqos/vhost-user-blk.c | 130 +
 tests/qtest/vhost-user-blk-test.c   | 794 
 tests/qtest/libqos/meson.build  |   1 +
 tests/qtest/meson.build |   4 +
 6 files changed, 979 insertions(+)
 create mode 100644 tests/qtest/libqos/vhost-user-blk.h
 create mode 100644 tests/qtest/libqos/vhost-user-blk.c
 create mode 100644 tests/qtest/vhost-user-blk-test.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 25fc49d1dc..34cfe468f1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3251,6 +3251,8 @@ F: block/export/vhost-user-blk-server.c
 F: block/export/vhost-user-blk-server.h
 F: include/qemu/vhost-user-server.h
 F: tests/qtest/libqos/vhost-user-blk.c
+F: tests/qtest/libqos/vhost-user-blk.h
+F: tests/qtest/vhost-user-blk-test.c
 F: util/vhost-user-server.c
 
 FUSE block device exports
diff --git a/tests/qtest/libqos/vhost-user-blk.h b/tests/qtest/libqos/vhost-user-blk.h
new file mode 100644
index 00..2a03456a45
--- /dev/null
+++ b/tests/qtest/libqos/vhost-user-blk.h
@@ -0,0 +1,48 @@
+/*
+ * libqos driver framework
+ *
+ * Based on tests/qtest/libqos/virtio-blk.c
+ *
+ * Copyright (c) 2020 Coiby Xu 
+ *
+ * Copyright (c) 2018 Emanuele Giuseppe Esposito 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License version 2 as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#ifndef TESTS_LIBQOS_VHOST_USER_BLK_H
+#define TESTS_LIBQOS_VHOST_USER_BLK_H
+
+#include "qgraph.h"
+#include "virtio.h"
+#include "virtio-pci.h"
+
+typedef struct QVhostUserBlk QVhostUserBlk;
+typedef struct QVhostUserBlkPCI QVhostUserBlkPCI;
+typedef struct QVhostUserBlkDevice QVhostUserBlkDevice;
+
+struct QVhostUserBlk {
+QVirtioDevice *vdev;
+};
+
+struct QVhostUserBlkPCI {
+QVirtioPCIDevice pci_vdev;
+QVhostUserBlk blk;
+};
+
+struct QVhostUserBlkDevice {
+QOSGraphObject obj;
+QVhostUserBlk blk;
+};
+
+#endif
diff --git a/tests/qtest/libqos/vhost-user-blk.c b/tests/qtest/libqos/vhost-user-blk.c
new file mode 100644
index 00..568c3426ed
--- /dev/null
+++ b/tests/qtest/libqos/vhost-user-blk.c
@@ -0,0 +1,130 @@
+/*
+ * libqos driver framework
+ *
+ * Based on tests/qtest/libqos/virtio-blk.c
+ *
+ * Copyright (c) 2020 Coiby Xu 
+ *
+ * Copyright (c) 2018 Emanuele Giuseppe Esposito 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License version 2.1 as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "qemu/module.h"
+#include "standard-headers/linux/virtio_blk.h"
+#include "vhost-user-blk.h"
+
+#define PCI_SLOT 0x04
+#define PCI_FN  0x00
+
+/* virtio-blk-device */
+static void *qvhost_user_blk_get_driver(QVhostUserBlk *v_blk,
+const char *interface)
+{
+if (!g_strcmp0(interface, "vhost-user-blk")) {
+return v_blk;
+}
+if (!g_strcmp0(interface, "virtio")) {
+return v_blk->vdev;
+}
+
+fprintf(stderr, "%s not present in vhost-user-blk-device\n", interface);
+g_assert_not_reached();
+}
+
+static void *qvhost_user_blk_device_get_driver(void *object,
+   const char *interface)
+{
+QVhostUserBlkDevice *v_blk = object;
+return qvhost_user_blk_get_driver(&v_blk->blk, interface);
+}
+
+static void *vhost_user_blk_device_create(void *virtio_dev,
+  QGuestAllocator *t_alloc,
+  void *addr)
+{
+QV

[PATCH V4] file-posix: allow -EBUSY error during ioctl(fd, BLKZEROOUT, range) on block

2021-03-22 Thread ChangLimin
On Linux 5.10/5.11, qemu writing zeros to a multipath device using
ioctl(fd, BLKZEROOUT, range) with cache=none or cache=directsync returns
-EBUSY permanently. Fall back to pwritev instead of failing on the -EBUSY
error.

The issue was introduced in Linux 5.10:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=384d87ef2c954fc58e6c5fd8253e4a1984f5fe02

Fixed in Linux 5.12:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56887cffe946bb0a90c74429fa94d6110a73119d
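
As a standalone illustration of the intent (toy code, not the actual
file-posix.c; has_write_zeroes and handle_blkzeroout_result() are made-up
stand-ins), the mapping below shows why both errors are reported to the
caller as -ENOTSUP so it falls back to pwritev, while only -ENOTSUP
disables BLKZEROOUT for subsequent requests:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool has_write_zeroes = true;   /* toy stand-in for s->has_write_zeroes */

static int handle_blkzeroout_result(int ret)
{
    switch (ret) {
    case -ENOTSUP:
        has_write_zeroes = false;      /* never try BLKZEROOUT again */
        /* fall through */
    case -EBUSY:                       /* transient on Linux 5.10/5.11 multipath */
        return -ENOTSUP;               /* caller falls back to pwritev */
    default:
        return ret;
    }
}

int main(void)
{
    printf("-EBUSY   -> %d (has_write_zeroes=%d)\n",
           handle_blkzeroout_result(-EBUSY), has_write_zeroes);
    printf("-ENOTSUP -> %d (has_write_zeroes=%d)\n",
           handle_blkzeroout_result(-ENOTSUP), has_write_zeroes);
    return 0;
}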

Signed-off-by: ChangLimin 
---
 block/file-posix.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 20e14f8e96..d4054ac9cb 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1624,8 +1624,12 @@ static ssize_t handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
 } while (errno == EINTR);

 ret = translate_err(-errno);
-if (ret == -ENOTSUP) {
-s->has_write_zeroes = false;
+switch (ret) {
+case -ENOTSUP:
+s->has_write_zeroes = false; /* fall through */
+case -EBUSY: /* Linux 5.10/5.11 may return -EBUSY for multipath devices */
+return -ENOTSUP;
+break;
 }
 }
 #endif
--
2.27.0



[PATCH 2/2] migrate-bitmaps-postcopy-test: check that we can't remove in-flight bitmaps

2021-03-22 Thread Vladimir Sementsov-Ogievskiy
Check that we can't remove bitmaps being migrated on destination vm.
The new check proves that previous commit helps.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test b/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test
index d046ebeb94..584062b412 100755
--- a/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test
+++ b/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test
@@ -224,6 +224,16 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.start_postcopy()
 
 self.vm_b_events += self.vm_b.get_qmp_events()
+
+# While being here, let's check that we can't remove in-flight bitmaps.
+for vm in (self.vm_a, self.vm_b):
+for i in range(0, nb_bitmaps):
+result = vm.qmp('block-dirty-bitmap-remove', node='drive0',
+name=f'bitmap{i}')
+self.assert_qmp(result, 'error/desc',
+f"Bitmap 'bitmap{i}' is currently in use by "
+"another operation and cannot be used")
+
 self.vm_b.shutdown()
 # recreate vm_b, so there is no incoming option, which prevents
 # loading bitmaps from disk
-- 
2.29.2




[PATCH for-6.0 0/2] Fix use-after-free, if remove bitmap during migration

2021-03-22 Thread Vladimir Sementsov-Ogievskiy
Hi all! We accidentally found a use-after-free. Normally a user should
not remove bitmaps during migration. But some wrong user actions may
simply lead to a Qemu crash, and that's not good.

Vladimir Sementsov-Ogievskiy (2):
  migration/block-dirty-bitmap: make incoming disabled bitmaps busy
  migrate-bitmaps-postcopy-test: check that we can't remove in-flight
bitmaps

 migration/block-dirty-bitmap.c |  6 ++
 tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test | 10 ++
 2 files changed, 16 insertions(+)

-- 
2.29.2




[PATCH 1/2] migration/block-dirty-bitmap: make incoming disabled bitmaps busy

2021-03-22 Thread Vladimir Sementsov-Ogievskiy
Incoming enabled bitmaps are busy, because we do
bdrv_dirty_bitmap_create_successor() for them. But disabled bitmaps
being migrated are not marked busy, and the user can remove them during
the incoming migration. Then we may crash in cancel_incoming_locked()
when we try to remove a bitmap that was already removed by the user,
like this:

 #0  qemu_mutex_lock_impl (mutex=0x5593d88c50d1, file=0x559680554b20
   "../block/dirty-bitmap.c", line=64) at ../util/qemu-thread-posix.c:77
 #1  bdrv_dirty_bitmaps_lock (bs=0x5593d88c0ee9)
   at ../block/dirty-bitmap.c:64
 #2  bdrv_release_dirty_bitmap (bitmap=0x5596810e9570)
   at ../block/dirty-bitmap.c:362
 #3  cancel_incoming_locked (s=0x559680be8208 )
   at ../migration/block-dirty-bitmap.c:918
 #4  dirty_bitmap_load (f=0x559681d02b10, opaque=0x559680be81e0
   , version_id=1) at ../migration/block-dirty-bitmap.c:1194
 #5  vmstate_load (f=0x559681d02b10, se=0x559680fb5810)
   at ../migration/savevm.c:908
 #6  qemu_loadvm_section_part_end (f=0x559681d02b10,
   mis=0x559680fb4a30) at ../migration/savevm.c:2473
 #7  qemu_loadvm_state_main (f=0x559681d02b10, mis=0x559680fb4a30)
   at ../migration/savevm.c:2626
 #8  postcopy_ram_listen_thread (opaque=0x0)
   at ../migration/savevm.c:1871
 #9  qemu_thread_start (args=0x5596817ccd10)
   at ../util/qemu-thread-posix.c:521
 #10 start_thread () at /lib64/libpthread.so.0
 #11 clone () at /lib64/libc.so.6

Note the bs pointer taken from the bitmap: it's clearly misaligned. That's
because this is a use-after-free; the bitmap has already been freed.

So, let's make disabled bitmaps (being migrated) busy during the incoming
migration.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 migration/block-dirty-bitmap.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 975093610a..35f5ef688d 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -839,6 +839,8 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
 error_report_err(local_err);
 return -EINVAL;
 }
+} else {
+bdrv_dirty_bitmap_set_busy(s->bitmap, true);
 }
 
 b = g_new(LoadBitmapState, 1);
@@ -914,6 +916,8 @@ static void cancel_incoming_locked(DBMLoadState *s)
 assert(!s->before_vm_start_handled || !b->migrated);
 if (bdrv_dirty_bitmap_has_successor(b->bitmap)) {
 bdrv_reclaim_dirty_bitmap(b->bitmap, &error_abort);
+} else {
+bdrv_dirty_bitmap_set_busy(b->bitmap, false);
 }
 bdrv_release_dirty_bitmap(b->bitmap);
 }
@@ -951,6 +955,8 @@ static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 
 if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
 bdrv_reclaim_dirty_bitmap(s->bitmap, &error_abort);
+} else {
+bdrv_dirty_bitmap_set_busy(s->bitmap, false);
 }
 
 for (item = s->bitmaps; item; item = g_slist_next(item)) {
-- 
2.29.2




Re: [PATCH] nvme: expose 'bootindex' property

2021-03-22 Thread Philippe Mathieu-Daudé
On 3/22/21 9:24 AM, Joelle van Dyne wrote:
> The check for `n->namespace.blkconf.blk` always fails because
> this is in the initialization function.

This usually means the code depends on some state only available
during the QOM 'realization' step, so this code should be in
nvme_realize(). Maybe in this case we don't need it there and
can add the property regardless of whether a block drive is provided;
I haven't checked.

> 
> Signed-off-by: Joelle van Dyne 
> ---
>  hw/block/nvme.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 6842b01ab5..42605fc55d 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -6330,11 +6330,9 @@ static void nvme_instance_init(Object *obj)
>  {
>  NvmeCtrl *n = NVME(obj);
>  
> -if (n->namespace.blkconf.blk) {
> -device_add_bootindex_property(obj, &n->namespace.blkconf.bootindex,
> -  "bootindex", "/namespace@1,0",
> -  DEVICE(obj));
> -}
> +device_add_bootindex_property(obj, &n->namespace.blkconf.bootindex,
> +  "bootindex", "/namespace@1,0",
> +  DEVICE(obj));
>  
>  object_property_add(obj, "smart_critical_warning", "uint8",
>  nvme_get_smart_warning,
> 




Re: [PATCH 2/2] hw/block/nvme: fix resource leak in nvme_format_ns

2021-03-22 Thread Max Reitz

On 22.03.21 07:19, Klaus Jensen wrote:

From: Klaus Jensen 

In nvme_format_ns(), if the namespace is of zero size (which might be
useless, but not invalid), the `count` variable will leak. Fix this by
returning early in that case.


When looking at the Coverity report, something else caught my eye: As 
far as I’m aware, blk_aio_pwrite_zeroes() may invoke the CB before 
returning (if blk_do_pwritev_part() returns without yielding).  I don’t 
think that will happen with real hardware (who knows, though), but it 
should be possible to see with the null-co block driver.


nvme_format_ns() doesn’t quite look like it takes that into account. 
For example, because *count starts at 1 and is decremented after the 
while (len) loop, all nvme_aio_format_cb() invocations (if they are 
invoked before their blk_aio_pwrite_zeroes() returns) will see

*count == 2, and thus not free it, or call nvme_enqueue_req_completion().

I don’t know whether the latter is problematic, but not freeing `count` 
doesn’t seem right.  Perhaps this could be addressed by adding a 
condition to the `(*count)--` to see whether `(*count)-- == 1` (or 
rather `--(*count) == 0`), which would indicate that there are no AIO 
functions still in flight?
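
To make the suggested scheme concrete, here is a small standalone model of
the counting (toy code, not nvme.c; submit() and aio_cb() are hypothetical
stand-ins for nvme_format_ns() and nvme_aio_format_cb()): the counter starts
at 1 for the submitter, is incremented per AIO, and whoever drops it to zero
finalizes, which also covers callbacks that run before submission returns:

#include <stdio.h>
#include <stdlib.h>

static int *count;

static void aio_cb(void)              /* stand-in for nvme_aio_format_cb() */
{
    if (--(*count) == 0) {            /* last reference gone: finalize here */
        printf("callback finalizes\n");
        free(count);
        count = NULL;
    }
}

static void submit(int num_aios, int cb_runs_early)
{
    count = malloc(sizeof(*count));
    *count = 1;                       /* submitter's own reference */
    for (int i = 0; i < num_aios; i++) {
        (*count)++;
        if (cb_runs_early) {
            aio_cb();                 /* e.g. null-co completing synchronously */
        }
    }
    if (--(*count) == 0) {            /* drop the submitter's reference */
        printf("submitter finalizes\n");
        free(count);
        count = NULL;
    }
}

int main(void)
{
    submit(2, 1);   /* callbacks ran early: the submitter still finalizes */
    submit(0, 0);   /* zero-size namespace: nothing leaks */
    return 0;
}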


Max


Reported-by: Coverity (CID 1451082)
Fixes: dc04d25e2f3f ("hw/block/nvme: add support for the format nvm command")
Signed-off-by: Klaus Jensen 
---
  hw/block/nvme.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6842b01ab58b..dad275971a84 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4984,6 +4984,11 @@ static uint16_t nvme_format_ns(NvmeCtrl *n, NvmeNamespace *ns, uint8_t lbaf,
  ns->status = NVME_FORMAT_IN_PROGRESS;
  
  len = ns->size;

+
+if (!len) {
+return NVME_SUCCESS;
+}
+
  offset = 0;
  
  count = g_new(int, 1);







Re: Fwd: [PATCH 0/2] block/raw: implemented persistent dirty bitmap and ability to dump bitmap content via qapi

2021-03-22 Thread Max Reitz

Hi,

On 20.03.21 11:01, Patrik Janoušek wrote:
I'm sorry, but I forgot to add you to the cc, so I'm forwarding the 
patch to you additionally. I don't want to spam the mailing list 
unnecessarily.


I think it’s better to still CC the list.  It’s so full of mail, one 
more won’t hurt. :)


(Re-adding qemu-block and qemu-devel, because the discussion belongs on 
the list(s).)



 Forwarded Message 
Subject: 	[PATCH 0/2] block/raw: implemented persistent dirty bitmap and 
ability to dump bitmap content via qapi

Date:   Sat, 20 Mar 2021 10:32:33 +0100
From:   Patrik Janoušek 
To: qemu-de...@nongnu.org
CC: Patrik Janoušek , lmate...@kiv.zcu.cz



Currently, QEMU doesn't support persistent dirty bitmaps for the raw format,
and dirty bitmaps are for internal use only and cannot be accessed
using third-party applications. These facts are very limiting
in case someone would like to develop their own backup tool, because
without access to the dirty bitmap it would only be possible to implement
full backups. And without persistent dirty bitmaps, it wouldn't
be possible to keep track of changed data after QEMU is restarted. And
this is exactly what I do as a part of my bachelor's thesis. I've
developed a tool that is able to create incremental backups of drives
in raw format that are LVM volumes (the ability to create a snapshot is
required).


Similarly to what Vladimir has said already, the thing is that 
conceptually I can see no difference between having a raw image with the 
bitmaps stored in some other file, i.e.:


  { "driver": "raw",
"dirty-bitmaps": [ {
  "filename": "sdc1.bitmap",
  "persistent": true
} ],
"file": {
  "driver": "file",
  "filename": "/dev/sdc1"
} }

And having a qcow2 image with the raw data stored in some other file, i.e.:

  { "driver": "qcow2",
"file": {
  "driver": "file",
  "filename": "sdc1.metadata"
},
"data-file": {
  "driver": "file",
  "filename": "/dev/sdc1"
} }

(Where sdc1.metadata is a qcow2 file created with
“data-file=/dev/sdc1,data-file-raw=on”.)

To use persistent bitmaps with raw images, you need to add metadata 
(namely, the bitmaps).  Why not store that metadata in a qcow2 file?


Max




Re: [PATCH 2/2] hw/block/nvme: fix resource leak in nvme_format_ns

2021-03-22 Thread Klaus Jensen
On Mar 22 11:02, Max Reitz wrote:
> On 22.03.21 07:19, Klaus Jensen wrote:
> > From: Klaus Jensen 
> > 
> > In nvme_format_ns(), if the namespace is of zero size (which might be
> > useless, but not invalid), the `count` variable will leak. Fix this by
> > returning early in that case.
> 
> When looking at the Coverity report, something else caught my eye: As far as
> I’m aware, blk_aio_pwrite_zeroes() may invoke the CB before returning (if
> blk_do_pwritev_part() returns without yielding).  I don’t think that will
> happen with real hardware (who knows, though), but it should be possible to
> see with the null-co block driver.
> 
> nvme_format_ns() doesn’t quite look like it takes that into account. For
> example, because *count starts at 1 and is decremented after the while (len)
> loop, all nvme_aio_format_cb() invocations (if they are invoked before their
> blk_aio_pwrite_zeroes() returns) will see
> *count == 2, and thus not free it, or call nvme_enqueue_req_completion().
> 
> I don’t know whether the latter is problematic, but not freeing `count`
> doesn’t seem right.  Perhaps this could be addressed by adding a condition
> to the `(*count)--` to see whether `(*count)-- == 1` (or rather `--(*count)
> == 0`), which would indicate that there are no AIO functions still in
> flight?

Hi Max,

Thanks for looking this over.

And, yeah, you are right, nice catch. The reference counting is exactly
there to guard against pwrite_zeroes invoking the CB before returning, but
when that happens it is not handled correctly anyway :(

This stuff is exactly why I proposed converting all this into the
"proper" BlockAIOCB approach (see [1]), but it needs a little more work.

I'll v2 this with a fix for this! Thanks!


  [1]: 
https://lore.kernel.org/qemu-devel/20210302111040.289244-1-...@irrelevant.dk/




Re: [PATCH 2/2] hw/block/nvme: fix resource leak in nvme_format_ns

2021-03-22 Thread Max Reitz

On 22.03.21 11:48, Klaus Jensen wrote:

On Mar 22 11:02, Max Reitz wrote:

On 22.03.21 07:19, Klaus Jensen wrote:

From: Klaus Jensen 

In nvme_format_ns(), if the namespace is of zero size (which might be
useless, but not invalid), the `count` variable will leak. Fix this by
returning early in that case.


When looking at the Coverity report, something else caught my eye: As far as
I’m aware, blk_aio_pwrite_zeroes() may invoke the CB before returning (if
blk_do_pwritev_part() returns without yielding).  I don’t think that will
happen with real hardware (who knows, though), but it should be possible to
see with the null-co block driver.

nvme_format_ns() doesn’t quite look like it takes that into account. For
example, because *count starts at 1 and is decremented after the while (len)
loop, all nvme_aio_format_cb() invocations (if they are invoked before their
blk_aio_pwrite_zeroes() returns) will see
*count == 2, and thus not free it, or call nvme_enqueue_req_completion().

I don’t know whether the latter is problematic, but not freeing `count`
doesn’t seem right.  Perhaps this could be addressed by adding a condition
to the `(*count)--` to see whether `(*count)-- == 1` (or rather `--(*count)
== 0`), which would indicate that there are no AIO functions still in
flight?


Hi Max,

Thanks for glossing over this.

And, yeah, you are right, nice catch. The reference counting is exactly
to guard against pwrite_zeroes invoking the CB before returning, but if
it happens it is not correctly handling it anyway :(

This stuff is exactly why I proposed converting all this into the
"proper" BlockAIOCB approach (see [1]), but it need a little more work.

I'll v2 this with a fix for this! Thanks!


   [1]: 
https://lore.kernel.org/qemu-devel/20210302111040.289244-1-...@irrelevant.dk/


OK, thanks! :)

That rewrite does look more in-line with how AIO is handled elsewhere, yes.

Max




Re: [PATCH 1/2] block/raw: added support of persistent dirty bitmaps

2021-03-22 Thread Vladimir Sementsov-Ogievskiy

22.03.2021 13:46, Vladimir Sementsov-Ogievskiy wrote:

22.03.2021 13:18, Patrik Janoušek wrote:

On 3/22/21 9:41 AM, Vladimir Sementsov-Ogievskiy wrote:

20.03.2021 12:32, Patrik Janoušek wrote:

The current implementation of dirty bitmaps for the raw format is very
limited, because those bitmaps cannot be persistent. Basically this
makes sense, because the raw format doesn't have space where the dirty
bitmap could be stored when QEMU is offline. This patch solves it
by storing the content of every dirty bitmap in a separate file on the
host filesystem.

However, this only solves one part of the problem. We also have to
store information about the existence of the dirty bitmap. This is
solved by adding custom options that store all required metadata
about the dirty bitmap (the filename where the bitmap is stored on the
host filesystem, granularity, persistence, etc.).

Signed-off-by: Patrik Janoušek



Hmm. Did you consider other ways? Honestly, I don't see a reason for
yet another storage format for bitmaps.

The task could be simply solved with existing features:

1. We have the external-data-file feature in qcow2 (read
docs/interop/qcow2.txt). With this thing enabled, the qcow2 file contains
only metadata (persistent bitmaps, for example) and the data is stored in
a separate sequential raw file. I think you should start from it.


I didn't know about that feature. I'll look at it.

In case I use NBD to access the bitmap content and qcow2 as a solution
for the persistence layer, would the patch be acceptable? This is a
significant change to my solution and I don't have enough time for it at
the moment (mainly due to other parts of my bachelor's thesis). I just
want to know whether this kind of feature is interesting to you and its
implementation is worth my time.


Honestly, at this point I think it doesn't. If existing features satisfy your
use case, there is no reason to increase the complexity of the file-posix
driver and QAPI.



It's unpleasant to say this, keeping in mind that that's your first submission 
:(

In connection with your bachelor's thesis, I can still recommend looking at
the videos on the kvm-forum YouTube channel, searching for backup:

  https://www.youtube.com/channel/UCRCSQmAOh7yzgheq-emy1xA/search?query=backup

You'll get a lot of information about current developments of external backup 
API.

Also note that there is (or there will be?) a libvirt Backup API, which
includes an API for external backup. I don't know its current status, but
if your project is based on libvirt, it's better to use the libvirt Backup
API instead of using qemu directly. About the libvirt Backup API it's better
to ask Eric Blake (adding him to CC).


--
Best regards,
Vladimir



Re: [PATCH 0/2] Fix crash if try to remove bitmap on target during migration

2021-03-22 Thread Stefan Hajnoczi
On Fri, Mar 19, 2021 at 11:41:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> Bitmaps on source are marked busy during migration.
> 
> Enabled bitmaps on target have successor, so they are busy.
> 
> But disabled migrated bitmaps are not protected on target. User can
> simple remove them and it lead to use-after-free. These bitmaps should
> be marked busy.
> 
> Vladimir Sementsov-Ogievskiy (2):
>   migration/block-dirty-bitmap: make incoming disabled bitmaps busy
>   migrate-bitmaps-postcopy-test: check that we can't remove in-flight
> bitmaps
> 
>  migration/block-dirty-bitmap.c | 6 ++
>  tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test | 9 +
>  2 files changed, 15 insertions(+)
> 
> -- 
> 2.29.2
> 

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan




Re: [PATCH 0/2] Fix crash if try to remove bitmap on target during migration

2021-03-22 Thread Vladimir Sementsov-Ogievskiy

22.03.2021 14:28, Stefan Hajnoczi wrote:

On Fri, Mar 19, 2021 at 11:41:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

Bitmaps on source are marked busy during migration.

Enabled bitmaps on target have successor, so they are busy.

But disabled migrated bitmaps are not protected on target. User can
simple remove them and it lead to use-after-free. These bitmaps should
be marked busy.

Vladimir Sementsov-Ogievskiy (2):
   migration/block-dirty-bitmap: make incoming disabled bitmaps busy
   migrate-bitmaps-postcopy-test: check that we can't remove in-flight
 bitmaps

  migration/block-dirty-bitmap.c | 6 ++
  tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test | 9 +
  2 files changed, 15 insertions(+)

--
2.29.2



Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan



Thanks!

O_o. Somehow, I've sent this thing twice, look at "[PATCH for-6.0 0/2] Fix 
use-after-free, if remove bitmap during migration". Sorry for the mess :\

Patch 1 is the same, but patch 2 in the new submission is updated to check that
bitmaps can't be removed on the source too. If it doesn't bother you, you can
update patch 2 in your branch too.

--
Best regards,
Vladimir



[PATCH v2 1/2] hw/block/nvme: fix resource leak in nvme_dif_rw

2021-03-22 Thread Klaus Jensen
From: Klaus Jensen 

If nvme_map_dptr() fails, nvme_dif_rw() will leak the bounce context.
Fix this by using the same error handling as everywhere else in the
function.

Reported-by: Coverity (CID 1451080)
Fixes: 146f720c5563 ("hw/block/nvme: end-to-end data protection")
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-dif.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme-dif.c b/hw/block/nvme-dif.c
index 2038d724bda5..e6f04faafb5f 100644
--- a/hw/block/nvme-dif.c
+++ b/hw/block/nvme-dif.c
@@ -432,7 +432,7 @@ uint16_t nvme_dif_rw(NvmeCtrl *n, NvmeRequest *req)
 
 status = nvme_map_dptr(n, &req->sg, mapped_len, &req->cmd);
 if (status) {
-return status;
+goto err;
 }
 
 ctx->data.bounce = g_malloc(len);
-- 
2.31.0




[PATCH v2 0/2] hw/block/nvme: coverity fixes

2021-03-22 Thread Klaus Jensen
From: Klaus Jensen 

Fix two issues reported by coverity (CID 1451080 and 1451082).

v2:
  - replace [2/2] with a fix for the bad reference counting noticed by
Max

Klaus Jensen (2):
  hw/block/nvme: fix resource leak in nvme_dif_rw
  hw/block/nvme: fix ref counting in nvme_format_ns

 hw/block/nvme-dif.c |  2 +-
 hw/block/nvme.c | 10 --
 2 files changed, 9 insertions(+), 3 deletions(-)

-- 
2.31.0




Re: Fwd: [PATCH 0/2] block/raw: implemented persistent dirty bitmap and ability to dump bitmap content via qapi

2021-03-22 Thread Max Reitz

On 22.03.21 12:27, Patrik Janoušek wrote:

On 3/22/21 11:48 AM, Max Reitz wrote:

Hi,

On 20.03.21 11:01, Patrik Janoušek wrote:

I'm sorry, but I forgot to add you to the cc, so I'm forwarding the
patch to you additionally. I don't want to spam the mailing list
unnecessarily.


I think it’s better to still CC the list.  It’s so full of mail, one
more won’t hurt. :)

(Re-adding qemu-block and qemu-devel, because the discussion belongs
on the list(s).)


 Forwarded Message 
Subject: [PATCH 0/2] block/raw: implemented persistent dirty
bitmap and ability to dump bitmap content via qapi
Date: Sat, 20 Mar 2021 10:32:33 +0100
From: Patrik Janoušek 
To: qemu-de...@nongnu.org
CC: Patrik Janoušek , lmate...@kiv.zcu.cz



Currently, QEMU doesn't support persistent dirty bitmaps for raw format
and also dirty bitmaps are for internal use only, and cannot be accessed
using third-party applications. These facts are very limiting
in case someone would like to develop their own backup tool becaouse
without access to the dirty bitmap it would be possible to implement
only full backups. And without persistent dirty bitmaps, it wouldn't
be possible to keep track of changed data after QEMU is restarted. And
this is exactly what I do as a part of my bachelor thesis. I've
developed a tool that is able to create incremental backups of drives
in raw format that are LVM volumes (ability to create snapshot is
required).


Similarly to what Vladimir has said already, the thing is that
conceptually I can see no difference between having a raw image with
the bitmaps stored in some other file, i.e.:

   { "driver": "raw",
     "dirty-bitmaps": [ {
   "filename": "sdc1.bitmap",
   "persistent": true
     } ],
     "file": {
   "driver": "file",
   "filename": "/dev/sdc1"
     } }

And having a qcow2 image with the raw data stored in some other file,
i.e.:

   { "driver": "qcow2",
     "file": {
   "driver": "file",
   "filename": "sdc1.metadata"
     },
     "data-file": {
   "driver": "file",
   "filename": "/dev/sdc1"
     } }

(Where sdc1.metadata is a qcow2 file created with
“data-file=/dev/sdc1,data-file-raw=on”.)

To use persistent bitmaps with raw images, you need to add metadata
(namely, the bitmaps).  Why not store that metadata in a qcow2 file?

Max


So, if I understand it correctly, I can configure dirty bitmaps in the
latest version of QEMU to be persistently stored in some other file?
Because even Proxmox Backup Server can't perform an incremental backup
after restarting QEMU, and that means something to me. I think they
would have implemented it if it were that simple.

Could you please send me a simple example of how to configure (via command
line args) one raw-format drive that can store dirty bitmaps
persistently in another qcow2 file? I may be missing something, but I
thought QEMU couldn't do it, because the Proxmox community has wanted this
feature for a long time.


One trouble is that if you use qemu-img create to create the qcow2
image, it will always create an empty image, and so if you pass the
existing raw image to it as the data_file, it will empty that raw image:


$ cp ~/tmp/arch.iso raw.img # Just some Arch Linux ISO

$ qemu-img create \
-f qcow2 \
-o data_file=raw.img,data_file_raw=on,preallocation=metadata \
metadata.qcow2 \
$(stat -c '%s' raw.img)
Formatting 'metadata.qcow2', fmt=qcow2 cluster_size=65536 
preallocation=metadata compression_type=zlib size=687865856 
data_file=raw.img data_file_raw=on lazy_refcounts=off refcount_bits=16


(If you check raw.img at this point, you’ll find that it’s empty, so you 
need to copy it from the source again:)


$ cp ~/tmp/arch.iso raw.img

Now if you use metadata.qcow2, the image data will actually all be 
stored in raw.img.



To get around the “creating metadata.qcow2 clears raw.img” problem, you
can either create a temporary empty image of the same size as raw.img,
pass that to qemu-img create, and then use qemu-img amend to change the
data-file pointer (which will not overwrite the new data-file’s contents):


$ qemu-img create -f raw tmp.img $(stat -c '%s' raw.img)

$ qemu-img create \
-f qcow2 \
-o data_file=tmp.img,data_file_raw=on,preallocation=metadata \
metadata.qcow2 \
$(stat -c '%s' raw.img)
Formatting 'metadata.qcow2', fmt=qcow2 cluster_size=65536 
preallocation=metadata compression_type=zlib size=687865856 
data_file=tmp.img data_file_raw=on lazy_refcounts=off refcount_bits=16


$ qemu-img amend -o data_file=raw.img metadata.qcow2

$ rm tmp.img


Or you use the blockdev-create job to create the qcow2 image (because 
contrary to qemu-img create, that will not clear the data file):


$ touch metadata.qcow2

(Note that in the following QMP communication, what I sent and what qemu 
replies is mixed.  Everything that begins with '{ "execute"' is from me, 
everything else from qemu.  The number 687865856 is the size of raw.img 
in bytes.)


$ qemu-system-x86_64 -qmp stdio \
-blockdev 

[PATCH v2 2/2] hw/block/nvme: fix ref counting in nvme_format_ns

2021-03-22 Thread Klaus Jensen
From: Klaus Jensen 

Max noticed that since blk_aio_pwrite_zeroes() may invoke the callback
before returning, the callbacks will never see *count == 0 and thus
never free the count variable or decrement num_formats, causing a CQE to
never be posted.

Coverity (CID 1451082) also picked up on the fact that count would not
be freed if the namespace was of zero size.

Fix both of these issues by explicitly checking *count and finalizing for
the given namespace if --(*count) is zero. Enqueuing a CQE if there are
no AIOs outstanding after this case is already handled by nvme_format()
by inspecting *num_formats.

Reported-by: Max Reitz 
Reported-by: Coverity (CID 1451082)
Fixes: dc04d25e2f3f ("hw/block/nvme: add support for the format nvm command")
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6842b01ab58b..c54ec3c9523c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -5009,9 +5009,15 @@ static uint16_t nvme_format_ns(NvmeCtrl *n, 
NvmeNamespace *ns, uint8_t lbaf,
 
 }
 
-(*count)--;
+if (--(*count)) {
+return NVME_NO_COMPLETE;
+}
 
-return NVME_NO_COMPLETE;
+g_free(count);
+ns->status = 0x0;
+(*num_formats)--;
+
+return NVME_SUCCESS;
 }
 
 static uint16_t nvme_format(NvmeCtrl *n, NvmeRequest *req)
-- 
2.31.0




Re: [PATCH for-6.0 0/2] Fix use-after-free, if remove bitmap during migration

2021-03-22 Thread Vladimir Sementsov-Ogievskiy

22.03.2021 12:49, Vladimir Sementsov-Ogievskiy wrote:

Hi all! We accidentally found a use-after-free. Normally a user should
not remove bitmaps during migration. But some wrong user actions may
simply lead to a Qemu crash, and that's not good.

Vladimir Sementsov-Ogievskiy (2):
   migration/block-dirty-bitmap: make incoming disabled bitmaps busy
   migrate-bitmaps-postcopy-test: check that we can't remove in-flight
 bitmaps

  migration/block-dirty-bitmap.c |  6 ++
  tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test | 10 ++
  2 files changed, 16 insertions(+)



Oops, sorry. Actually, it's a v2 for "[PATCH 0/2] Fix crash if try to remove bitmap
on target during migration" with a slightly improved test, patch 1 unchanged.

Supersedes: <20210319204124.364312-1-vsement...@virtuozzo.com>

--
Best regards,
Vladimir



Re: [PATCH] nvme: expose 'bootindex' property

2021-03-22 Thread Klaus Jensen
On Mar 22 10:58, Philippe Mathieu-Daudé wrote:
> On 3/22/21 9:24 AM, Joelle van Dyne wrote:
> > The check for `n->namespace.blkconf.blk` always fails because
> > this is in the initialization function.
> 
> This usually mean the code depends to some state only available
> during the QOM 'realization' step, so this code should be in
> nvme_realize(). Maybe in this case we don't need it there and
> can add the property regardless a block drive is provided, I
> haven't checked.
> 

If we defer to realization, it won't be available as a parameter on the
command line, but as far as I can test, adding it unconditionally
doesn't break anything when there is no drive attached to the controller
device.

> > 
> > Signed-off-by: Joelle van Dyne 
> > ---
> >  hw/block/nvme.c | 8 +++-
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 6842b01ab5..42605fc55d 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -6330,11 +6330,9 @@ static void nvme_instance_init(Object *obj)
> >  {
> >  NvmeCtrl *n = NVME(obj);
> >  
> > -if (n->namespace.blkconf.blk) {
> > -device_add_bootindex_property(obj, &n->namespace.blkconf.bootindex,
> > -  "bootindex", "/namespace@1,0",
> > -  DEVICE(obj));
> > -}
> > +device_add_bootindex_property(obj, &n->namespace.blkconf.bootindex,
> > +  "bootindex", "/namespace@1,0",
> > +  DEVICE(obj));
> >  
> >  object_property_add(obj, "smart_critical_warning", "uint8",
> >  nvme_get_smart_warning,
> > 
> 
> 




Re: Fwd: [PATCH 0/2] block/raw: implemented persistent dirty bitmap and ability to dump bitmap content via qapi

2021-03-22 Thread Fabian Grünbichler
On March 22, 2021 12:27 pm, Patrik Janoušek wrote:
> On 3/22/21 11:48 AM, Max Reitz wrote:
>> Hi,
>>
>> On 20.03.21 11:01, Patrik Janoušek wrote:
>>> I'm sorry, but I forgot to add you to the cc, so I'm forwarding the
>>> patch to you additionally. I don't want to spam the mailing list
>>> unnecessarily.
>>
>> I think it’s better to still CC the list.  It’s so full of mail, one
>> more won’t hurt. :)
>>
>> (Re-adding qemu-block and qemu-devel, because the discussion belongs
>> on the list(s).)
>>
>>>  Forwarded Message 
>>> Subject: [PATCH 0/2] block/raw: implemented persistent dirty
>>> bitmap and ability to dump bitmap content via qapi
>>> Date: Sat, 20 Mar 2021 10:32:33 +0100
>>> From: Patrik Janoušek 
>>> To: qemu-de...@nongnu.org
>>> CC: Patrik Janoušek , lmate...@kiv.zcu.cz
>>>
>>>
>>>
>>> Currently, QEMU doesn't support persistent dirty bitmaps for raw format
>>> and also dirty bitmaps are for internal use only, and cannot be accessed
>>> using third-party applications. These facts are very limiting
>>> in case someone would like to develop their own backup tool becaouse
>>> without access to the dirty bitmap it would be possible to implement
>>> only full backups. And without persistent dirty bitmaps, it wouldn't
>>> be possible to keep track of changed data after QEMU is restarted. And
>>> this is exactly what I do as a part of my bachelor thesis. I've
>>> developed a tool that is able to create incremental backups of drives
>>> in raw format that are LVM volumes (ability to create snapshot is
>>> required).
>>
>> Similarly to what Vladimir has said already, the thing is that
>> conceptually I can see no difference between having a raw image with
>> the bitmaps stored in some other file, i.e.:
>>
>>   { "driver": "raw",
>>     "dirty-bitmaps": [ {
>>   "filename": "sdc1.bitmap",
>>   "persistent": true
>>     } ],
>>     "file": {
>>   "driver": "file",
>>   "filename": "/dev/sdc1"
>>     } }
>>
>> And having a qcow2 image with the raw data stored in some other file,
>> i.e.:
>>
>>   { "driver": "qcow2",
>>     "file": {
>>   "driver": "file",
>>   "filename": "sdc1.metadata"
>>     },
>>     "data-file": {
>>   "driver": "file",
>>   "filename": "/dev/sdc1"
>>     } }
>>
>> (Where sdc1.metadata is a qcow2 file created with
>> “data-file=/dev/sdc1,data-file-raw=on”.)
>>
>> To use persistent bitmaps with raw images, you need to add metadata
>> (namely, the bitmaps).  Why not store that metadata in a qcow2 file?
>>
>> Max
> 
> So if I understand it correctly. I can configure dirty bitmaps in the
> latest version of QEMU to be persistently stored in some other file.
> Because even Proxmox Backup Server can't perform an incremental backup
> after restarting QEMU, and that means something to me. I think they
> would implement it if it was that simple.

the main reason we haven't implemented something like that (yet) is not 
that it is technically difficult, but that we have no way to safeguard 
against external manipulation of the (raw) image:

- VM is running, backup happens, bitmap represents the delta since this 
  backup
- VM is stopped
- user/admin does something to VM disk image, believing such an action 
  is safe because VM is not running
- VM is started again - bitmap does not contain the manual changes
- VM is running, backup happens, backup is inconsistent with actual data 
  stored in image

because of persistence, this error is now carried forward indefinitely 
(it might self-correct at some point if all the invalid stuff has been 
dirtied by the guest).

while it is of course possible to argue that this is entirely the user's 
fault, it looks like it's the backup software/hypervisor's fault (no 
indication anything is wrong, backup data is bogus).

it might happen at some point as opt-in feature with all the warnings 
associated with potentially dangerous features, but for now we are okay 
with just carrying the bitmap as long as the VM instance is running 
(including migrations), and having to re-read and chunk/hash the disks 
for fresh instances. the latter is expensive, but less expensive than 
having to explain to users why their backups are bogus.

> 
> Could you please send me simple example on how to configure (via command
> line args) one raw format drive that can store dirty bitmaps
> persistently in other qcow2 file? I may be missing something, but I
> thought QEMU couldn't do it, because Proxmox community wants this
> feature for a long time.
> 
> Patrik
> 
> 
> 
> 
> 
> 




Re: [PATCH 1/2] block/raw: added support of persistent dirty bitmaps

2021-03-22 Thread Patrik Janoušek


On 3/22/21 12:18 PM, Vladimir Sementsov-Ogievskiy wrote:
> 22.03.2021 13:46, Vladimir Sementsov-Ogievskiy wrote:
>> 22.03.2021 13:18, Patrik Janoušek wrote:
>>> On 3/22/21 9:41 AM, Vladimir Sementsov-Ogievskiy wrote:
 20.03.2021 12:32, Patrik Janoušek wrote:
> The current implementation of dirty bitmaps for the raw format is very
> limited, because those bitmaps cannot be persistent. Basically that
> makes sense, because the raw format has no space where a dirty bitmap
> could be stored while QEMU is offline. This patch solves it
> by storing the content of every dirty bitmap in a separate file on the
> host filesystem.
>
> However, this only solves one part of the problem. We also have to
> store information about the existence of the dirty bitmap. This is
> solved by adding custom options that store all required metadata
> about the dirty bitmap (the filename where the bitmap is stored on the
> host filesystem, granularity, persistence, etc.).
>
> Signed-off-by: Patrik Janoušek


 Hmm. Did you consider other ways? Honestly, I don't see a reason for
 yet another storage format for bitmaps.

 The task could be simply solved with existing features:

 1. We have the external-data-file feature in qcow2 (read
 docs/interop/qcow2.txt). With this feature enabled, the qcow2 file contains
 only metadata (persistent bitmaps, for example) and the data is stored in a
 separate sequential raw file. I think you should start from it.
>>>
>>> I didn't know about that feature. I'll look at it.
>>>
>>> In case I use NBD to access the bitmap content and qcow2 as a solution
>>> for the persistent layer, would the patch be acceptable? This is a
>>> significant change to my solution and I don't have enough time for it
>>> at the moment (mainly due to other parts of my bachelor's thesis).
>>> I just want to know if this kind of feature is interesting to you and
>>> its implementation is worth my time.
>>
>> Honestly, at this point I think it doesn't. If existing features
>> satisfy your use-case, no reason to increase complexity of file-posix
>> driver and QAPI.
>>
>
> It's unpleasant to say this, keeping in mind that that's your first
> submission :(
>
> I can still recommend, in connection with your bachelor's thesis,
> looking at the videos on the kvm-forum YouTube channel, searching for backup:
>
>  
> https://www.youtube.com/channel/UCRCSQmAOh7yzgheq-emy1xA/search?query=backup
>
> You'll get a lot of information about current developments of external
> backup API.
>
> Also note, that there is (or there will be ?) libvirt Backup API,
> which includes an API for external backup. I don't know the current
> status of it, but if your project is based on libvirt, it's better to
> use libvirt backup API instead of using qemu directly. About Libvirt
> Backup API it's better to ask Eric Blake (adding him to CC).
Unfortunately, my solution is based on Proxmox so I can't use libvirt's
features. I know that a beta version of Proxmox Backup Server has been
released and it would be much better to improve their solution, but they
did it too late, so I couldn't change the assignment of my bachelor's thesis.




Re: Fwd: [PATCH 0/2] block/raw: implemented persistent dirty bitmap and ability to dump bitmap content via qapi

2021-03-22 Thread Patrik Janoušek
On 3/22/21 11:48 AM, Max Reitz wrote:
> Hi,
>
> On 20.03.21 11:01, Patrik Janoušek wrote:
>> I'm sorry, but I forgot to add you to the cc, so I'm forwarding the
>> patch to you additionally. I don't want to spam the mailing list
>> unnecessarily.
>
> I think it’s better to still CC the list.  It’s so full of mail, one
> more won’t hurt. :)
>
> (Re-adding qemu-block and qemu-devel, because the discussion belongs
> on the list(s).)
>
>>  Forwarded Message 
>> Subject: [PATCH 0/2] block/raw: implemented persistent dirty
>> bitmap and ability to dump bitmap content via qapi
>> Date: Sat, 20 Mar 2021 10:32:33 +0100
>> From: Patrik Janoušek 
>> To: qemu-de...@nongnu.org
>> CC: Patrik Janoušek , lmate...@kiv.zcu.cz
>>
>>
>>
>> Currently, QEMU doesn't support persistent dirty bitmaps for raw format
>> and also dirty bitmaps are for internal use only, and cannot be accessed
>> using third-party applications. These facts are very limiting
>> in case someone would like to develop their own backup tool because
>> without access to the dirty bitmap it would be possible to implement
>> only full backups. And without persistent dirty bitmaps, it wouldn't
>> be possible to keep track of changed data after QEMU is restarted. And
>> this is exactly what I do as a part of my bachelor thesis. I've
>> developed a tool that is able to create incremental backups of drives
>> in raw format that are LVM volumes (ability to create snapshot is
>> required).
>
> Similarly to what Vladimir has said already, the thing is that
> conceptually I can see no difference between having a raw image with
> the bitmaps stored in some other file, i.e.:
>
>   { "driver": "raw",
>     "dirty-bitmaps": [ {
>   "filename": "sdc1.bitmap",
>   "persistent": true
>     } ],
>     "file": {
>   "driver": "file",
>   "filename": "/dev/sdc1"
>     } }
>
> And having a qcow2 image with the raw data stored in some other file,
> i.e.:
>
>   { "driver": "qcow2",
>     "file": {
>   "driver": "file",
>   "filename": "sdc1.metadata"
>     },
>     "data-file": {
>   "driver": "file",
>   "filename": "/dev/sdc1"
>     } }
>
> (Where sdc1.metadata is a qcow2 file created with
> “data-file=/dev/sdc1,data-file-raw=on”.)
>
> To use persistent bitmaps with raw images, you need to add metadata
> (namely, the bitmaps).  Why not store that metadata in a qcow2 file?
>
> Max

So if I understand it correctly, I can configure dirty bitmaps in the
latest version of QEMU to be persistently stored in some other file?
Because even Proxmox Backup Server can't perform an incremental backup
after restarting QEMU, and that means something to me. I think they
would implement it if it were that simple.

Could you please send me a simple example of how to configure (via command
line args) one raw-format drive that can store dirty bitmaps
persistently in another qcow2 file? I may be missing something, but I
thought QEMU couldn't do it, because the Proxmox community has wanted this
feature for a long time.

Patrik






Re: [PATCH] nvme: expose 'bootindex' property

2021-03-22 Thread Philippe Mathieu-Daudé
On 3/22/21 1:37 PM, Klaus Jensen wrote:
> On Mar 22 10:58, Philippe Mathieu-Daudé wrote:
>> On 3/22/21 9:24 AM, Joelle van Dyne wrote:
>>> The check for `n->namespace.blkconf.blk` always fails because
>>> this is in the initialization function.
>>
>> This usually means the code depends on some state only available
>> during the QOM 'realization' step, so this code should be in
>> nvme_realize(). Maybe in this case we don't need it there and
>> can add the property regardless of whether a block drive is provided, I
>> haven't checked.
>>
> 
> If we defer to realization, it won't be available as a parameter on the
> command line, but as far as I can test, adding it unconditionally
> doesn't break anything when there is no drive attached to the controller
> device.

Patch is good then :)




Re: [PATCH] nvme: expose 'bootindex' property

2021-03-22 Thread Klaus Jensen
On Mar 22 14:10, Philippe Mathieu-Daudé wrote:
> On 3/22/21 1:37 PM, Klaus Jensen wrote:
> > On Mar 22 10:58, Philippe Mathieu-Daudé wrote:
> >> On 3/22/21 9:24 AM, Joelle van Dyne wrote:
> >>> The check for `n->namespace.blkconf.blk` always fails because
> >>> this is in the initialization function.
> >>
> >> This usually means the code depends on some state only available
> >> during the QOM 'realization' step, so this code should be in
> >> nvme_realize(). Maybe in this case we don't need it there and
> >> can add the property regardless of whether a block drive is provided, I
> >> haven't checked.
> >>
> > 
> > If we defer to realization, it won't be available as a parameter on the
> > command line, but as far as I can test, adding it unconditionally
> > doesn't break anything when there is no drive attached to the controller
> > device.
> 
> Patch is good then :)
> 

Agreed :)

Reviewed-by: Klaus Jensen 




Re: [PATCH] hw/sd: sd: Fix build error when DEBUG_SD is on

2021-03-22 Thread Philippe Mathieu-Daudé
On 2/28/21 6:06 AM, Bin Meng wrote:
> From: Bin Meng 
> 
> "qemu-common.h" should be included to provide the forward declaration
> of qemu_hexdump() when DEBUG_SD is on.
> 
> Signed-off-by: Bin Meng 
> ---
> 
>  hw/sd/sd.c | 1 +
>  1 file changed, 1 insertion(+)

Thanks, patch applied to sdmmc-fixes.



Re: [PATCH] xen-block: Fix removal of backend instance via xenstore

2021-03-22 Thread Anthony PERARD via
Hi Paul, Stefano,

Could one of you give an Ack to this patch?

Thanks,


On Mon, Mar 08, 2021 at 02:32:32PM +, Anthony PERARD wrote:
> From: Anthony PERARD 
> 
> Whenever a Xen block device is detached via xenstore, the image
> associated with it remains open by the backend QEMU and an error is
> logged:
> qemu-system-i386: failed to destroy drive: Node xvdz-qcow2 is in use
> 
> This happens since object_unparent() doesn't immediately free the
> object and thus keeps a reference to the node we are trying to free.
> The reference is held by the "drive" property and the call to
> xen_block_drive_destroy() fails.
> 
> In order to fix that, we call drain_call_rcu() to run the callback
> setup by bus_remove_child() via object_unparent().
> 
> Fixes: 2d24a6466154 ("device-core: use RCU for list of children of a bus")
> 
> Signed-off-by: Anthony PERARD 
> ---
> CCing people who introduced/reviewed the change to use RCU to give
> them a chance to say if the change is fine.
> ---
>  hw/block/xen-block.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
> index a3b69e27096f..fe5f828e2d25 100644
> --- a/hw/block/xen-block.c
> +++ b/hw/block/xen-block.c
> @@ -972,6 +972,15 @@ static void xen_block_device_destroy(XenBackendInstance 
> *backend,
>  
>  object_unparent(OBJECT(xendev));
>  
> +/*
> + * Drall all pending RCU callbacks as object_unparent() frees `xendev'
> + * in a RCU callback.
> + * And due to the property "drive" still existing in `xendev', we
> + * cann't destroy the XenBlockDrive associated with `xendev' with
> + * xen_block_drive_destroy() below.
> + */
> +drain_call_rcu();
> +
>  if (iothread) {
>  xen_block_iothread_destroy(iothread, errp);
>  if (*errp) {
> -- 
> Anthony PERARD
> 

-- 
Anthony PERARD



Re: [PATCH v3] hw/sd: sd: Actually perform the erase operation

2021-03-22 Thread Philippe Mathieu-Daudé
On 3/10/21 8:16 AM, Bin Meng wrote:
> Hi Philippe,
> 
> On Sat, Feb 20, 2021 at 4:58 PM Bin Meng  wrote:
>>
>> From: Bin Meng 
>>
>> At present the sd_erase() does not erase the requested range of card
>> data to 0xFFs. Let's make the erase operation actually happen.
>>
>> Signed-off-by: Bin Meng 
>>
>> ---
>>
>> Changes in v3:
>> - fix the skip erase logic for SDSC cards
>>
> 
> Any comments for v3?

None :)

Reviewed-by: Philippe Mathieu-Daudé 

Patch applied to sdmmc-fixes.



Re: [PATCH] xen-block: Fix removal of backend instance via xenstore

2021-03-22 Thread Paul Durrant

On 08/03/2021 14:32, Anthony PERARD wrote:

From: Anthony PERARD 

Whenever a Xen block device is detached via xenstore, the image
associated with it remains open by the backend QEMU and an error is
logged:
 qemu-system-i386: failed to destroy drive: Node xvdz-qcow2 is in use

This happens since object_unparent() doesn't immediately free the
object and thus keeps a reference to the node we are trying to free.
The reference is held by the "drive" property and the call to
xen_block_drive_destroy() fails.

In order to fix that, we call drain_call_rcu() to run the callback
setup by bus_remove_child() via object_unparent().

Fixes: 2d24a6466154 ("device-core: use RCU for list of children of a bus")

Signed-off-by: Anthony PERARD 
---
CCing people who introduced/reviewed the change to use RCU to give
them a chance to say if the change is fine.
---
  hw/block/xen-block.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index a3b69e27096f..fe5f828e2d25 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -972,6 +972,15 @@ static void xen_block_device_destroy(XenBackendInstance 
*backend,
  
  object_unparent(OBJECT(xendev));
  
+/*

+ * Drall all pending RCU callbacks as object_unparent() frees `xendev'


s/Drall/Drain ?


+ * in a RCU callback.
+ * And due to the property "drive" still existing in `xendev', we
+ * cann't destroy the XenBlockDrive associated with `xendev' with


s/cann't/can't

With those fixed...

Reviewed-by: Paul Durrant 


+ * xen_block_drive_destroy() below.
+ */
+drain_call_rcu();
+
  if (iothread) {
  xen_block_iothread_destroy(iothread, errp);
  if (*errp) {






Re: [PATCH v2] hw/block: m25p80: Support fast read for SST flashes

2021-03-22 Thread Alistair Francis
On Sat, Mar 6, 2021 at 1:02 AM Bin Meng  wrote:
>
> From: Bin Meng 
>
> Per SST25VF016B datasheet [1], SST flash requires a dummy byte after
> the address bytes. Note only SPI mode is supported by SST flashes.
>
> [1] http://ww1.microchip.com/downloads/en/devicedoc/s71271_04.pdf
>
> Signed-off-by: Bin Meng 
> Acked-by: Alistair Francis 

Thanks!

Applied to riscv-to-apply.next

Alistair

>
> ---
>
> Changes in v2:
> - rebase on qemu/master
>
>  hw/block/m25p80.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> index 5f9471d83c..183d3f44c2 100644
> --- a/hw/block/m25p80.c
> +++ b/hw/block/m25p80.c
> @@ -895,6 +895,9 @@ static void decode_fast_read_cmd(Flash *s)
>  s->needed_bytes = get_addr_length(s);
>  switch (get_man(s)) {
>  /* Dummy cycles - modeled with bytes writes instead of bits */
> +case MAN_SST:
> +s->needed_bytes += 1;
> +break;
>  case MAN_WINBOND:
>  s->needed_bytes += 8;
>  break;
> --
> 2.25.1
>
>



Re: [PATCH for-6.0 0/2] Fix use-after-free, if remove bitmap during migration

2021-03-22 Thread Stefan Hajnoczi
On Mon, Mar 22, 2021 at 12:49:04PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all! We accidentally found a use-after-free. Normally the user should
> not remove bitmaps during migration, but some wrong user actions may
> simply lead to a Qemu crash and that's not good.
> 
> Vladimir Sementsov-Ogievskiy (2):
>   migration/block-dirty-bitmap: make incoming disabled bitmaps busy
>   migrate-bitmaps-postcopy-test: check that we can't remove in-flight
> bitmaps
> 
>  migration/block-dirty-bitmap.c |  6 ++
>  tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test | 10 ++
>  2 files changed, 16 insertions(+)
> 
> -- 
> 2.29.2
> 

Thanks, applied to my cpuidle-haltpoll-virtqueue tree:
https://gitlab.com/stefanha/qemu/commits/cpuidle-haltpoll-virtqueue

Stefan




Re: [RESEND PATCH v3 0/5] hw/sd: sdhci: Fixes to CVE-2020-17380, CVE-2020-25085, CVE-2021-3409

2021-03-22 Thread Philippe Mathieu-Daudé
On 3/3/21 1:26 PM, Bin Meng wrote:
> This series includes several fixes to CVE-2020-17380, CVE-2020-25085
> and CVE-2021-3409 that are heap-based buffer overflow issues existing
> in the sdhci model.
> 
> These CVEs are pretty much similar, and were filed using different
> reproducers. With this series, the currently known reproducers I have
> can no longer trigger the issues.
> 
> The implementation of this model still has some issues, e.g. some code
> does not seem to strictly match the spec, but since this series only aims
> to address the CVEs, such issues should be fixed in a separate series in the
> future, if I have time :)

> Bin Meng (5):
>   hw/sd: sdhci: Don't transfer any data when command time out
>   hw/sd: sdhci: Don't write to SDHC_SYSAD register when transfer is in
> progress
>   hw/sd: sdhci: Correctly set the controller status for ADMA
>   hw/sd: sdhci: Limit block size only when SDHC_BLKSIZE register is
> writable
>   hw/sd: sdhci: Reset the data pointer of s->fifo_buffer[] when a
> different block size is programmed
> 
>  hw/sd/sdhci.c | 53 ++-
>  1 file changed, 36 insertions(+), 17 deletions(-)

Thanks, patch applied to sdmmc-fixes.



[PULL 0/7] SD/MMC patches for 2021-03-21

2021-03-22 Thread Philippe Mathieu-Daudé
The following changes since commit b184750926812cb78ac0caf4c4b2b13683b5bde3:

  Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into 
staging (2021-03-22 11:24:55 +)

are available in the Git repository at:

  https://github.com/philmd/qemu.git tags/sdmmc-20210322

for you to fetch changes up to cffb446e8fd19a14e1634c7a3a8b07be3f01d5c9:

  hw/sd: sdhci: Reset the data pointer of s->fifo_buffer[] when a different 
block size is programmed (2021-03-22 16:56:22 +0100)


SD/MMC patches queue

- Fix build error when DEBUG_SD is on
- Perform SD ERASE operation
- SDHCI ADMA heap buffer overflow
  (CVE-2020-17380, CVE-2020-25085, CVE-2021-3409)


Bin Meng (7):
  hw/sd: sd: Fix build error when DEBUG_SD is on
  hw/sd: sd: Actually perform the erase operation
  hw/sd: sdhci: Don't transfer any data when command time out
  hw/sd: sdhci: Don't write to SDHC_SYSAD register when transfer is in
progress
  hw/sd: sdhci: Correctly set the controller status for ADMA
  hw/sd: sdhci: Limit block size only when SDHC_BLKSIZE register is
writable
  hw/sd: sdhci: Reset the data pointer of s->fifo_buffer[] when a
different block size is programmed

 hw/sd/sd.c| 23 +-
 hw/sd/sdhci.c | 53 ++-
 2 files changed, 50 insertions(+), 26 deletions(-)

-- 
2.26.2




[PULL 1/7] hw/sd: sd: Fix build error when DEBUG_SD is on

2021-03-22 Thread Philippe Mathieu-Daudé
From: Bin Meng 

"qemu-common.h" should be included to provide the forward declaration
of qemu_hexdump() when DEBUG_SD is on.

Signed-off-by: Bin Meng 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20210228050609.24779-1-bmeng...@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 8b397effbcc..7b09ce9c2ef 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -47,6 +47,7 @@
 #include "qemu/timer.h"
 #include "qemu/log.h"
 #include "qemu/module.h"
+#include "qemu-common.h"
 #include "sdmmc-internal.h"
 #include "trace.h"
 
-- 
2.26.2




[PULL 2/7] hw/sd: sd: Actually perform the erase operation

2021-03-22 Thread Philippe Mathieu-Daudé
From: Bin Meng 

At present the sd_erase() does not erase the requested range of card
data to 0xFFs. Let's make the erase operation actually happen.

Signed-off-by: Bin Meng 
Message-Id: <1613811493-58815-1-git-send-email-bmeng...@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sd.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 7b09ce9c2ef..282d39a7042 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -763,10 +763,12 @@ static void sd_blk_write(SDState *sd, uint64_t addr, 
uint32_t len)
 
 static void sd_erase(SDState *sd)
 {
-int i;
 uint64_t erase_start = sd->erase_start;
 uint64_t erase_end = sd->erase_end;
 bool sdsc = true;
+uint64_t wpnum;
+uint64_t erase_addr;
+int erase_len = 1 << HWBLOCK_SHIFT;
 
 trace_sdcard_erase(sd->erase_start, sd->erase_end);
 if (sd->erase_start == INVALID_ADDRESS
@@ -795,17 +797,19 @@ static void sd_erase(SDState *sd)
 sd->erase_end = INVALID_ADDRESS;
 sd->csd[14] |= 0x40;
 
-/* Only SDSC cards support write protect groups */
-if (sdsc) {
-erase_start = sd_addr_to_wpnum(erase_start);
-erase_end = sd_addr_to_wpnum(erase_end);
-
-for (i = erase_start; i <= erase_end; i++) {
-assert(i < sd->wpgrps_size);
-if (test_bit(i, sd->wp_groups)) {
+memset(sd->data, 0xff, erase_len);
+for (erase_addr = erase_start; erase_addr <= erase_end;
+ erase_addr += erase_len) {
+if (sdsc) {
+/* Only SDSC cards support write protect groups */
+wpnum = sd_addr_to_wpnum(erase_addr);
+assert(wpnum < sd->wpgrps_size);
+if (test_bit(wpnum, sd->wp_groups)) {
 sd->card_status |= WP_ERASE_SKIP;
+continue;
 }
 }
+BLK_WRITE_BLOCK(erase_addr, erase_len);
 }
 }
 
-- 
2.26.2




[PULL 3/7] hw/sd: sdhci: Don't transfer any data when command time out

2021-03-22 Thread Philippe Mathieu-Daudé
From: Bin Meng 

At the end of sdhci_send_command(), it starts a data transfer if the
command register indicates data is associated. But the data transfer
should only be initiated when the command execution has succeeded.

With this fix, the following reproducer:

outl 0xcf8 0x80001810
outl 0xcfc 0xe1068000
outl 0xcf8 0x80001804
outw 0xcfc 0x7
write 0xe106802c 0x1 0x0f
write 0xe1068004 0xc 0x2801d10101fbff28a384
write 0xe106800c 0x1f 
0x9dacbbcad9e8f7061524334251606f7e8d9cabbac9d8e7f60514233241505f
write 0xe1068003 0x28 
0x80d000251480d000252280d000253080d000253e80d000254c80d000255a80d000256880d0002576
write 0xe1068003 0x1 0xfe

cannot be reproduced with the following QEMU command line:

$ qemu-system-x86_64 -nographic -M pc-q35-5.0 \
  -device sdhci-pci,sd-spec-version=3 \
  -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \
  -device sd-card,drive=mydrive \
  -monitor none -serial none -qtest stdio

Cc: qemu-sta...@nongnu.org
Fixes: CVE-2020-17380
Fixes: CVE-2020-25085
Fixes: CVE-2021-3409
Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller")
Reported-by: Alexander Bulekov 
Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum)
Reported-by: Sergej Schumilo (Ruhr-Universität Bochum)
Reported-by: Simon Wörner (Ruhr-Universität Bochum)
Buglink: https://bugs.launchpad.net/qemu/+bug/1892960
Buglink: https://bugs.launchpad.net/qemu/+bug/1909418
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146
Acked-by: Alistair Francis 
Tested-by: Alexander Bulekov 
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Bin Meng 
Message-Id: <20210303122639.20004-2-bmeng...@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sdhci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 9acf4467a32..f72d76c1784 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -326,6 +326,7 @@ static void sdhci_send_command(SDHCIState *s)
 SDRequest request;
 uint8_t response[16];
 int rlen;
+bool timeout = false;
 
 s->errintsts = 0;
 s->acmd12errsts = 0;
@@ -349,6 +350,7 @@ static void sdhci_send_command(SDHCIState *s)
 trace_sdhci_response16(s->rspreg[3], s->rspreg[2],
s->rspreg[1], s->rspreg[0]);
 } else {
+timeout = true;
 trace_sdhci_error("timeout waiting for command response");
 if (s->errintstsen & SDHC_EISEN_CMDTIMEOUT) {
 s->errintsts |= SDHC_EIS_CMDTIMEOUT;
@@ -369,7 +371,7 @@ static void sdhci_send_command(SDHCIState *s)
 
 sdhci_update_irq(s);
 
-if (s->blksize && (s->cmdreg & SDHC_CMD_DATA_PRESENT)) {
+if (!timeout && s->blksize && (s->cmdreg & SDHC_CMD_DATA_PRESENT)) {
 s->data_count = 0;
 sdhci_data_transfer(s);
 }
-- 
2.26.2




[PULL 6/7] hw/sd: sdhci: Limit block size only when SDHC_BLKSIZE register is writable

2021-03-22 Thread Philippe Mathieu-Daudé
From: Bin Meng 

The code to limit the maximum block size is only necessary when the
SDHC_BLKSIZE register is writable.

Tested-by: Alexander Bulekov 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Bin Meng 
Message-Id: <20210303122639.20004-5-bmeng...@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sdhci.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 7a2003b28b3..d0c8e293c0b 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -1142,15 +1142,15 @@ sdhci_write(void *opaque, hwaddr offset, uint64_t val, 
unsigned size)
 if (!TRANSFERRING_DATA(s->prnsts)) {
 MASKED_WRITE(s->blksize, mask, extract32(value, 0, 12));
 MASKED_WRITE(s->blkcnt, mask >> 16, value >> 16);
-}
 
-/* Limit block size to the maximum buffer size */
-if (extract32(s->blksize, 0, 12) > s->buf_maxsz) {
-qemu_log_mask(LOG_GUEST_ERROR, "%s: Size 0x%x is larger than "
-  "the maximum buffer 0x%x\n", __func__, s->blksize,
-  s->buf_maxsz);
+/* Limit block size to the maximum buffer size */
+if (extract32(s->blksize, 0, 12) > s->buf_maxsz) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Size 0x%x is larger than "
+  "the maximum buffer 0x%x\n", __func__, 
s->blksize,
+  s->buf_maxsz);
 
-s->blksize = deposit32(s->blksize, 0, 12, s->buf_maxsz);
+s->blksize = deposit32(s->blksize, 0, 12, s->buf_maxsz);
+}
 }
 
 break;
-- 
2.26.2




[PULL 4/7] hw/sd: sdhci: Don't write to SDHC_SYSAD register when transfer is in progress

2021-03-22 Thread Philippe Mathieu-Daudé
From: Bin Meng 

Per "SD Host Controller Standard Specification Version 7.00"
chapter 2.2.1 SDMA System Address Register:

This register can be accessed only if no transaction is executing
(i.e., after a transaction has stopped).

With this fix, the following reproducer:

outl 0xcf8 0x80001010
outl 0xcfc 0xfbefff00
outl 0xcf8 0x80001001
outl 0xcfc 0x0600
write 0xfbefff2c 0x1 0x05
write 0xfbefff0f 0x1 0x37
write 0xfbefff0a 0x1 0x01
write 0xfbefff0f 0x1 0x29
write 0xfbefff0f 0x1 0x02
write 0xfbefff0f 0x1 0x03
write 0xfbefff04 0x1 0x01
write 0xfbefff05 0x1 0x01
write 0xfbefff07 0x1 0x02
write 0xfbefff0c 0x1 0x33
write 0xfbefff0e 0x1 0x20
write 0xfbefff0f 0x1 0x00
write 0xfbefff2a 0x1 0x01
write 0xfbefff0c 0x1 0x00
write 0xfbefff03 0x1 0x00
write 0xfbefff05 0x1 0x00
write 0xfbefff2a 0x1 0x02
write 0xfbefff0c 0x1 0x32
write 0xfbefff01 0x1 0x01
write 0xfbefff02 0x1 0x01
write 0xfbefff03 0x1 0x01

cannot be reproduced with the following QEMU command line:

$ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \
   -nodefaults -device sdhci-pci,sd-spec-version=3 \
   -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \
   -device sd-card,drive=mydrive -qtest stdio

Cc: qemu-sta...@nongnu.org
Fixes: CVE-2020-17380
Fixes: CVE-2020-25085
Fixes: CVE-2021-3409
Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller")
Reported-by: Alexander Bulekov 
Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum)
Reported-by: Sergej Schumilo (Ruhr-Universität Bochum)
Reported-by: Simon Wörner (Ruhr-Universität Bochum)
Buglink: https://bugs.launchpad.net/qemu/+bug/1892960
Buglink: https://bugs.launchpad.net/qemu/+bug/1909418
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146
Tested-by: Alexander Bulekov 
Signed-off-by: Bin Meng 
Message-Id: <20210303122639.20004-3-bmeng...@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sdhci.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index f72d76c1784..3feb6c3a1fe 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -1121,15 +1121,17 @@ sdhci_write(void *opaque, hwaddr offset, uint64_t val, 
unsigned size)
 
 switch (offset & ~0x3) {
 case SDHC_SYSAD:
-s->sdmasysad = (s->sdmasysad & mask) | value;
-MASKED_WRITE(s->sdmasysad, mask, value);
-/* Writing to last byte of sdmasysad might trigger transfer */
-if (!(mask & 0xFF00) && TRANSFERRING_DATA(s->prnsts) && s->blkcnt 
&&
-s->blksize && SDHC_DMA_TYPE(s->hostctl1) == SDHC_CTRL_SDMA) {
-if (s->trnmod & SDHC_TRNS_MULTI) {
-sdhci_sdma_transfer_multi_blocks(s);
-} else {
-sdhci_sdma_transfer_single_block(s);
+if (!TRANSFERRING_DATA(s->prnsts)) {
+s->sdmasysad = (s->sdmasysad & mask) | value;
+MASKED_WRITE(s->sdmasysad, mask, value);
+/* Writing to last byte of sdmasysad might trigger transfer */
+if (!(mask & 0xFF00) && s->blkcnt && s->blksize &&
+SDHC_DMA_TYPE(s->hostctl1) == SDHC_CTRL_SDMA) {
+if (s->trnmod & SDHC_TRNS_MULTI) {
+sdhci_sdma_transfer_multi_blocks(s);
+} else {
+sdhci_sdma_transfer_single_block(s);
+}
 }
 }
 break;
-- 
2.26.2




[PULL 5/7] hw/sd: sdhci: Correctly set the controller status for ADMA

2021-03-22 Thread Philippe Mathieu-Daudé
From: Bin Meng 

When an ADMA transfer is started, the code forgets to set the
controller status to indicate a transfer is in progress.

With this fix, the following 2 reproducers:

https://paste.debian.net/plain/1185136
https://paste.debian.net/plain/1185141

cannot be reproduced with the following QEMU command line:

$ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \
  -nodefaults -device sdhci-pci,sd-spec-version=3 \
  -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \
  -device sd-card,drive=mydrive -qtest stdio

Cc: qemu-sta...@nongnu.org
Fixes: CVE-2020-17380
Fixes: CVE-2020-25085
Fixes: CVE-2021-3409
Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller")
Reported-by: Alexander Bulekov 
Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum)
Reported-by: Sergej Schumilo (Ruhr-Universität Bochum)
Reported-by: Simon Wörner (Ruhr-Universität Bochum)
Buglink: https://bugs.launchpad.net/qemu/+bug/1892960
Buglink: https://bugs.launchpad.net/qemu/+bug/1909418
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146
Tested-by: Alexander Bulekov 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Bin Meng 
Message-Id: <20210303122639.20004-4-bmeng...@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sdhci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 3feb6c3a1fe..7a2003b28b3 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -768,7 +768,9 @@ static void sdhci_do_adma(SDHCIState *s)
 
 switch (dscr.attr & SDHC_ADMA_ATTR_ACT_MASK) {
 case SDHC_ADMA_ATTR_ACT_TRAN:  /* data transfer */
+s->prnsts |= SDHC_DATA_INHIBIT | SDHC_DAT_LINE_ACTIVE;
 if (s->trnmod & SDHC_TRNS_READ) {
+s->prnsts |= SDHC_DOING_READ;
 while (length) {
 if (s->data_count == 0) {
 sdbus_read_data(&s->sdbus, s->fifo_buffer, block_size);
@@ -796,6 +798,7 @@ static void sdhci_do_adma(SDHCIState *s)
 }
 }
 } else {
+s->prnsts |= SDHC_DOING_WRITE;
 while (length) {
 begin = s->data_count;
 if ((length + begin) < block_size) {
-- 
2.26.2




[PULL 7/7] hw/sd: sdhci: Reset the data pointer of s->fifo_buffer[] when a different block size is programmed

2021-03-22 Thread Philippe Mathieu-Daudé
From: Bin Meng 

If the block size is programmed to a different value from the
previous one, reset the data pointer of s->fifo_buffer[] so that
s->fifo_buffer[] can be filled in using the new block size in
the next transfer.

With this fix, the following reproducer:

outl 0xcf8 0x80001010
outl 0xcfc 0xe000
outl 0xcf8 0x80001001
outl 0xcfc 0x0600
write 0xe02c 0x1 0x05
write 0xe005 0x1 0x02
write 0xe007 0x1 0x01
write 0xe028 0x1 0x10
write 0x0 0x1 0x23
write 0x2 0x1 0x08
write 0xe00c 0x1 0x01
write 0xe00e 0x1 0x20
write 0xe00f 0x1 0x00
write 0xe00c 0x1 0x32
write 0xe004 0x2 0x0200
write 0xe028 0x1 0x00
write 0xe003 0x1 0x40

cannot be reproduced with the following QEMU command line:

$ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \
  -nodefaults -device sdhci-pci,sd-spec-version=3 \
  -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \
  -device sd-card,drive=mydrive -qtest stdio

Cc: qemu-sta...@nongnu.org
Fixes: CVE-2020-17380
Fixes: CVE-2020-25085
Fixes: CVE-2021-3409
Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller")
Reported-by: Alexander Bulekov 
Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum)
Reported-by: Sergej Schumilo (Ruhr-Universität Bochum)
Reported-by: Simon Wörner (Ruhr-Universität Bochum)
Buglink: https://bugs.launchpad.net/qemu/+bug/1892960
Buglink: https://bugs.launchpad.net/qemu/+bug/1909418
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146
Tested-by: Alexander Bulekov 
Signed-off-by: Bin Meng 
Message-Id: <20210303122639.20004-6-bmeng...@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sdhci.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index d0c8e293c0b..5b8678110b0 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -1140,6 +1140,8 @@ sdhci_write(void *opaque, hwaddr offset, uint64_t val, 
unsigned size)
 break;
 case SDHC_BLKSIZE:
 if (!TRANSFERRING_DATA(s->prnsts)) {
+uint16_t blksize = s->blksize;
+
 MASKED_WRITE(s->blksize, mask, extract32(value, 0, 12));
 MASKED_WRITE(s->blkcnt, mask >> 16, value >> 16);
 
@@ -1151,6 +1153,16 @@ sdhci_write(void *opaque, hwaddr offset, uint64_t val, 
unsigned size)
 
 s->blksize = deposit32(s->blksize, 0, 12, s->buf_maxsz);
 }
+
+/*
+ * If the block size is programmed to a different value from
+ * the previous one, reset the data pointer of s->fifo_buffer[]
+ * so that s->fifo_buffer[] can be filled in using the new block
+ * size in the next transfer.
+ */
+if (blksize != s->blksize) {
+s->data_count = 0;
+}
 }
 
 break;
-- 
2.26.2




Re: [PATCH v3 1/4] block: feature detection for host block support

2021-03-22 Thread Joelle van Dyne
On Mon, Mar 15, 2021 at 11:03 AM Joelle van Dyne  wrote:
>
> On Darwin (iOS), there are no system level APIs for directly accessing
> host block devices. We detect this at configure time.
>
> Signed-off-by: Joelle van Dyne 

Hi all, this is the last patch in this set that has to be reviewed. I
was wondering if it's still possible to make it into 6.0? Thanks.

-j



Re: [PATCH V4] file-posix: allow -EBUSY error during ioctl(fd, BLKZEROOUT, range) on block

2021-03-22 Thread John Snow

On 3/22/21 5:25 AM, ChangLimin wrote:

On Linux 5.10/5.11, QEMU writing zeros to a multipath device using
ioctl(fd, BLKZEROOUT, range) with cache=none or cache=directsync returns
-EBUSY permanently. Fall back to pwritev instead of exiting on the -EBUSY error.

The issue was introduced in Linux 5.10:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=384d87ef2c954fc58e6c5fd8253e4a1984f5fe02

Fixed in Linux 5.12:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56887cffe946bb0a90c74429fa94d6110a73119d

Signed-off-by: ChangLimin 


To be clear, when I asked "When do we get -EINVAL?" it wasn't because I
doubted that we would ever get it; I was just unclear about the
circumstances in which we might receive EINVAL and was hoping you would
explain it to me.



---
  block/file-posix.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 20e14f8e96..d4054ac9cb 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1624,8 +1624,12 @@ static ssize_t 
handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)

          } while (errno == EINTR);

          ret = translate_err(-errno);
-        if (ret == -ENOTSUP) {
-            s->has_write_zeroes = false;
+        switch (ret) {
+        case -ENOTSUP:
+            s->has_write_zeroes = false; /* fall through */
+        case -EBUSY: /* Linux 5.10/5.11 may return -EBUSY for multipath 
devices */

+            return -ENOTSUP;
+            break;


What effect does this have, now?

We'll return ENOTSUP but we won't disable trying it again in the future, 
is that right?


Kevin, is this what you had in mind?

--js


          }
      }
  #endif
--
2.27.0






Re: [PATCH v3 0/6] iotests: fix failures with non-PCI machines

2021-03-22 Thread Alex Bennée


Laurent Vivier  writes:

> Tests are executed using virtio-*-pci even on a non-PCI machine.
>
> The problem can be easily fixed using the virtio aliases
> (virtio-*), to run virtio-*-ccw on s390x and virtio-*-device on
> m68k.
>
> A first attempt was tried with virtio-*-ccw by detecting
> the machine type, this series removes it to use the aliases that
> are a cleaner approach.

Queued to for-6.0/fixes-for-rc1, thanks.

-- 
Alex Bennée



[PATCH RFC] docs: document file-posix locking protocol

2021-03-22 Thread Vladimir Sementsov-Ogievskiy
Let's document how we use file locks in the file-posix driver, to allow
external programs to "communicate" in this way with Qemu.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all!

We need to access disk images from non-Qemu code and coordinate with
Qemu utilities which may use the same image. So, we want to support Qemu
file locking in the external code.

So, here is a patch to document how Qemu locking works, and make this
thing "public".

This is an RFC, because I'm unsure how we should actually document the
different operations we have.

For example, graph-mod is a strange thing; I think we should get rid
of it entirely. And at the least, there is no sense in locking the
corresponding byte in a raw file.

The other thing is write-unchanged. What does it mean when we consider a
raw file opened in several processes? Probably we don't need it either.

 docs/system/qemu-block-drivers.rst.inc | 55 ++
 1 file changed, 55 insertions(+)

diff --git a/docs/system/qemu-block-drivers.rst.inc 
b/docs/system/qemu-block-drivers.rst.inc
index b052a6d14e..3cd708b3dc 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -952,3 +952,58 @@ on host and see if there are locks held by the QEMU 
process on the image file.
 More than one byte could be locked by the QEMU instance, each byte of which
 reflects a particular permission that is acquired or protected by the running
 block driver.
+
+Image locking protocol
+~~
+
+QEMU holds rd locks and never rw locks. Instead, GETLK fcntl is used with 
F_WRLCK
+to handle permissions as described below.
+QEMU process may rd-lock the following bytes of the image with corresponding
+meaning:
+
+Permission bytes. If a permission byte is rd-locked, it means that some process
+uses the corresponding permission on that file.
+
+ByteOperation
+100 read
+  Lock holder can read
+101 write
+  Lock holder can write
+102 write-unchanged
+  Lock holder can write same data
+103 resize
+  Lock holder can resize the file
+104 graph-mod
+  Undefined. QEMU sometimes locks this byte, but external programs
+  should not. QEMU will stop locking this byte in future
+
+Unshare bytes. If an unshare byte is rd-locked, it means that some process
+does not allow other processes to use the corresponding operation on that file.
+
+ByteOperation
+200 read
+  Lock holder doesn't allow the read operation to other processes.
+201 write
+  Lock holder doesn't allow the write operation to other processes.
+202 write-unchanged
+  Lock holder doesn't allow the write-unchanged operation to other processes.
+203 resize
+  Lock holder doesn't allow other processes to resize the file.
+204 graph-mod
+  Undefined. QEMU sometimes locks this byte, but external programs
+  should not. QEMU will stop locking this byte in future
+
+Handling the permissions works as follows: assume we want to open the file to 
do
+some operations and at the same time want to disallow some operations to other
+processes. So, we want to lock some of the bytes described above. We operate as
+follows:
+
+1. rd-lock all needed bytes, both "permission" bytes and "unshare" bytes.
+
+2. For each "unshare" byte we rd-locked, do a GETLK that "tries" to wr-lock the
+corresponding "permission" byte. This checks whether there is any other process
+that uses the permission we want to unshare. If there is one, we fail.
+
+3. For each "permission" byte we rd-locked, do a GETLK that "tries" to wr-lock
+the corresponding "unshare" byte. This checks whether there is any other process
+that unshares the permission we want to have. If there is one, we fail.
-- 
2.29.2
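As a rough illustration of the checks in steps 2 and 3, an external program
could probe a permission byte with a plain fcntl() F_GETLK call. Below is a
minimal sketch (not QEMU code, and not part of the patch above; error handling
is trimmed). It only performs the probe from step 2 for the "write" permission,
using the byte offsets from the tables above, and does not take the rd-locks
from step 1:

    #include <fcntl.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Return true if another process holds a lock that would conflict with
     * a wr-lock on the given byte, i.e. the byte is rd-locked elsewhere. */
    static bool byte_locked_by_others(int fd, off_t byte)
    {
        struct flock fl = {
            .l_type = F_WRLCK,   /* F_GETLK only "tries", it takes no lock */
            .l_whence = SEEK_SET,
            .l_start = byte,
            .l_len = 1,
        };

        if (fcntl(fd, F_GETLK, &fl) == -1) {
            perror("fcntl(F_GETLK)");
            return true;         /* be conservative on error */
        }
        return fl.l_type != F_UNLCK;
    }

    int main(int argc, char **argv)
    {
        int fd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s IMAGE\n", argv[0]);
            return 1;
        }
        fd = open(argv[1], O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        /* Step 2 for the "write" permission: before unsharing write access
         * (byte 201), check whether any process uses permission byte 101. */
        printf("write permission in use by others: %s\n",
               byte_locked_by_others(fd, 101) ? "yes" : "no");
        close(fd);
        return 0;
    }

A real client would additionally rd-lock the bytes it needs (step 1) with
F_SETLK and F_RDLCK, and keep the file descriptor open for as long as it uses
the image.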




Re: [PULL 00/11] emulated nvme updates and fixes

2021-03-22 Thread Klaus Jensen
On Mar 18 12:11, Daniel P. Berrangé wrote:
> On Thu, Mar 18, 2021 at 01:01:58PM +0100, Klaus Jensen wrote:
> > On Mar 18 11:28, Peter Maydell wrote:
> > > On Thu, 18 Mar 2021 at 11:27, Klaus Jensen  wrote:
> > > >
> > > > On Mar 18 11:26, Peter Maydell wrote:
> > > > > On Tue, 16 Mar 2021 at 21:47, Klaus Jensen  wrote:
> > > > > Hi. This tag includes a submodule update which is not mentioned
> > > > > in the cover letter or listed in the cover letter diffstat:
> > > > >
> > > > >  roms/opensbi   |2 +-
> > > > >
> > > > > so I suspect it was inadvertent. Please fix up and resend.
> > > > >
> > > >
> > > > Oh crap. Sorry!
> > > 
> > > No worries -- git makes this a very easy mistake to make when
> > > doing rebases. That's why I have a check for it in my 'apply
> > > a pull request' scripts :-)
> > > 
> > 
> > Out of curiosity, are there any obvious safeguards I can implement
> > myself to stop this from happening?
> 
> AFAICT, latest versions of git no longer add a submodule when doing
> "git add -u". You have to explicitly specify the submodule path
> to stage it. So this prevent exactly this kind of accident.
> 

Never said thanks for that tip, so Thanks! :)

git commit -a is a pretty bad habit and it's too bad it can't easily be
disallowed in the config file ;)




Re: Fwd: [PATCH 0/2] block/raw: implemented persistent dirty bitmap and ability to dump bitmap content via qapi

2021-03-22 Thread Patrik Janoušek
On 3/22/21 1:06 PM, Max Reitz wrote:
> On 22.03.21 12:27, Patrik Janoušek wrote:
>> On 3/22/21 11:48 AM, Max Reitz wrote:
>>> Hi,
>>>
>>> On 20.03.21 11:01, Patrik Janoušek wrote:
 I'm sorry, but I forgot to add you to the cc, so I'm forwarding the
 patch to you additionally. I don't want to spam the mailing list
 unnecessarily.
>>>
>>> I think it’s better to still CC the list.  It’s so full of mail, one
>>> more won’t hurt. :)
>>>
>>> (Re-adding qemu-block and qemu-devel, because the discussion belongs
>>> on the list(s).)
>>>
  Forwarded Message 
 Subject: [PATCH 0/2] block/raw: implemented persistent dirty
 bitmap and ability to dump bitmap content via qapi
 Date: Sat, 20 Mar 2021 10:32:33 +0100
 From: Patrik Janoušek 
 To: qemu-de...@nongnu.org
 CC: Patrik Janoušek , lmate...@kiv.zcu.cz



 Currently, QEMU doesn't support persistent dirty bitmaps for raw
 format
 and also dirty bitmaps are for internal use only, and cannot be
 accessed
 using third-party applications. These facts are very limiting
 in case someone would like to develop their own backup tool because
 without access to the dirty bitmap it would be possible to implement
 only full backups. And without persistent dirty bitmaps, it wouldn't
 be possible to keep track of changed data after QEMU is restarted. And
 this is exactly what I do as a part of my bachelor thesis. I've
 developed a tool that is able to create incremental backups of drives
 in raw format that are LVM volumes (ability to create snapshot is
 required).
>>>
>>> Similarly to what Vladimir has said already, the thing is that
>>> conceptually I can see no difference between having a raw image with
>>> the bitmaps stored in some other file, i.e.:
>>>
>>>    { "driver": "raw",
>>>  "dirty-bitmaps": [ {
>>>    "filename": "sdc1.bitmap",
>>>    "persistent": true
>>>  } ],
>>>  "file": {
>>>    "driver": "file",
>>>    "filename": "/dev/sdc1"
>>>  } }
>>>
>>> And having a qcow2 image with the raw data stored in some other file,
>>> i.e.:
>>>
>>>    { "driver": "qcow2",
>>>  "file": {
>>>    "driver": "file",
>>>    "filename": "sdc1.metadata"
>>>  },
>>>  "data-file": {
>>>    "driver": "file",
>>>    "filename": "/dev/sdc1"
>>>  } }
>>>
>>> (Where sdc1.metadata is a qcow2 file created with
>>> “data-file=/dev/sdc1,data-file-raw=on”.)
>>>
>>> To use persistent bitmaps with raw images, you need to add metadata
>>> (namely, the bitmaps).  Why not store that metadata in a qcow2 file?
>>>
>>> Max
>>
>> So if I understand it correctly, I can configure dirty bitmaps in the
>> latest version of QEMU to be persistently stored in some other file?
>> Because even Proxmox Backup Server can't perform an incremental backup
>> after restarting QEMU, and that means something to me. I think they
>> would implement it if it were that simple.
>>
>> Could you please send me a simple example of how to configure (via command
>> line args) one raw-format drive that can store dirty bitmaps
>> persistently in another qcow2 file? I may be missing something, but I
>> thought QEMU couldn't do it, because the Proxmox community has wanted this
>> feature for a long time.
>
> One trouble is that if you use qemu-img create to create the qcow2
> image, it will always create an empty image, and so if use pass
> data_file to it, it will empty the existing raw image:
>
> $ cp ~/tmp/arch.iso raw.img # Just some Arch Linux ISO
>
> $ qemu-img create \
>     -f qcow2 \
>     -o data_file=raw.img,data_file_raw=on,preallocation=metadata \
>     metadata.qcow2 \
>     $(stat -c '%s' raw.img)
> Formatting 'metadata.qcow2', fmt=qcow2 cluster_size=65536
> preallocation=metadata compression_type=zlib size=687865856
> data_file=raw.img data_file_raw=on lazy_refcounts=off refcount_bits=16
>
> (If you check raw.img at this point, you’ll find that it’s empty, so
> you need to copy it from the source again:)
>
> $ cp ~/tmp/arch.iso raw.img
>
> Now if you use metadata.qcow2, the image data will actually all be
> stored in raw.img.
>
>
> To get around the “creating metadata.qcow2 clears raw.img” problem,
> you can either create a temporary empty image of the same size as
> raw.img that you pass to qemu-img create, and then you use qemu-img
> amend to change the data-file pointer (which will not overwrite the
> new data-file’s contents):
>
> $ qemu-img create -f raw tmp.raw $(stat -c '%s' raw.img)
>
> $ qemu-img create \
>     -f qcow2 \
>     -o data_file=tmp.img,data_file_raw=on,preallocation=metadata \
>     metadata.qcow2 \
>     $(stat -c '%s' raw.img)
> Formatting 'metadata.qcow2', fmt=qcow2 cluster_size=65536
> preallocation=metadata compression_type=zlib size=687865856
> data_file=tmp.img data_file_raw=on lazy_refcounts=off refcount_bits=16
>
> $ qemu-img amend -o data_file=raw.img metadata.qcow2
>
> $ rm tmp.img
>
>
> Or you use the bl
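Once such a metadata overlay exists, a persistent bitmap can be created in it
with the standard tooling, for example (a sketch, not part of the quoted reply;
"metadata.qcow2" and "bitmap0" are placeholder names):

    $ qemu-img bitmap --add metadata.qcow2 bitmap0

or, while the VM is running, via QMP with "persistent" set to true (the "node"
value must match the node-name given to the qcow2 format node):

    { "execute": "block-dirty-bitmap-add",
      "arguments": { "node": "fmt", "name": "bitmap0", "persistent": true } }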

Re: [PATCH v2] piix: fix regression during unplug in Xen HVM domUs

2021-03-22 Thread John Snow

On 3/17/21 3:00 AM, Olaf Hering wrote:

Commit ee358e919e385fdc79d59d0d47b4a81e349cd5c9 causes a regression in
Xen HVM domUs which run xenlinux based kernels.

If the domU has a USB device assigned, for example with
"usbdevice=['tablet']" in domU.cfg, the late unplug of devices will
kill the emulated USB host. As a result the khubd thread hangs, and
with it the entire boot process.

For some reason this does not affect pvops based kernels. This is
most likely caused by the fact that unplugging happens very early
during boot.



I'm not entirely sure how the commit message relates to the patch,
actually. (Sorry, I am not very familiar with XEN.)



Signed-off-by: Olaf Hering 
---
  hw/ide/piix.c| 5 +
  include/hw/ide/pci.h | 1 +
  2 files changed, 6 insertions(+)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index b9860e35a5..7f1998bf04 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -109,6 +109,9 @@ static void piix_ide_reset(DeviceState *dev)
  uint8_t *pci_conf = pd->config;
  int i;
  
+if (d->xen_unplug_done == true) {

+return;
+}


My understanding is that XEN has some extra disks that it unplugs when 
it later figures out it doesn't need them. How exactly this works is 
something I've not looked into too closely.


So if these IDE devices have been "unplugged" already, we avoid 
resetting them here. What about this reset causes the bug you describe 
in the commit message?


Does this reset now happen earlier/later as compared to what it did 
prior to ee358e91 ?



  for (i = 0; i < 2; i++) {
  ide_bus_reset(&d->bus[i]);
  }
@@ -151,6 +154,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
  PCIIDEState *d = PCI_IDE(dev);
  uint8_t *pci_conf = dev->config;
  
+d->xen_unplug_done = false;

  pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
  
  bmdma_setup_bar(d);

@@ -170,6 +174,7 @@ int pci_piix3_xen_ide_unplug(DeviceState *dev, bool aux)
  BlockBackend *blk;
  
  pci_ide = PCI_IDE(dev);

+pci_ide->xen_unplug_done = true;
  
  for (i = aux ? 1 : 0; i < 4; i++) {

  idebus = &pci_ide->bus[i / 2];
diff --git a/include/hw/ide/pci.h b/include/hw/ide/pci.h
index d8384e1c42..9e71cfec3b 100644
--- a/include/hw/ide/pci.h
+++ b/include/hw/ide/pci.h
@@ -50,6 +50,7 @@ struct PCIIDEState {
  IDEBus bus[2];
  BMDMAState bmdma[2];
  uint32_t secondary; /* used only for cmd646 */
+bool xen_unplug_done;


I am hesitant to put a new XEN-specific boolean here, but don't know 
enough about the problem to outright say "no".


This looks like a band-aid that's out of place, but I don't understand 
the problem well enough yet to suggest a better place.



  MemoryRegion bmdma_bar;
  MemoryRegion cmd_bar[2];
  MemoryRegion data_bar[2];



(If anyone else with more experience with XEN wants to take over the 
review of this patch, let me know. I only really care about the IDE bits.)





Re: [PATCH v1] vhost-user-blk: use different event handlers on init and operation

2021-03-22 Thread Raphael Norwitz
I'm mostly happy with this. My biggest overall comment is that I think
this should be split into two, as your refactor using different event
handlers for init is a standalone improvement over and above the bugfix.

I would have the first commit split out vhost_user_blk_event_init() and
vhost_user_blk_event_oper(), replace the runstate_is_running() check 
...etc and the second commit immediately call
vhost_user_blk_disconnect() during device realization.

A couple of other comments mixed in below:

On Thu, Mar 11, 2021 at 11:10:45AM +0300, Denis Plotnikov wrote:
> Commit a1a20d06b73e "vhost-user-blk: delay vhost_user_blk_disconnect"

For the hash above can we rather use the first digits of the commit hash
instead of the last?

> introduced postponing vhost_dev cleanup aiming to eliminate qemu aborts
> because of connection problems with vhost-blk daemon.
> 
> However, it introdues a new problem. Now, any communication errors
> during execution of vhost_dev_init() called by vhost_user_blk_device_realize()
> lead to qemu abort on assert in vhost_dev_get_config().
> 
> This happens because vhost_user_blk_disconnect() is postponed but
> it should have dropped s->connected flag by the time
> vhost_user_blk_device_realize() performs a new connection opening.
> On the connection opening, vhost_dev initialization in
> vhost_user_blk_connect() relies on s->connection flag and
> if it's not dropped, it skips vhost_dev initialization and returns
> with success. Then, vhost_user_blk_device_realize()'s execution flow
> goes to vhost_dev_get_config() where it's aborted on the assert.
> 
> It seems connection/disconnection processing should happen
> differently on initialization and operation of vhost-user-blk.
> On initialization (in vhost_user_blk_device_realize()) we fully
> control the initialization process. At that point, nobody can use the
> device since it isn't initialized and we don't need to postpone any
> cleanups, so we can do cleanup right away when there is communication
> problems with the vhost-blk daemon.
> On operation the disconnect may happen when the device is in use, so
> the device users may want to use vhost_dev's data to do rollback before
> vhost_dev is re-initialized (e.g. in vhost_dev_set_log()), so we
> postpone the cleanup.
> 
> The patch splits those two cases, and performs the cleanup immediately on
> initialization, and postpones cleanup when the device is initialized and
> in use.
> 
> Signed-off-by: Denis Plotnikov 
> ---
>  hw/block/vhost-user-blk.c | 88 ---
>  1 file changed, 54 insertions(+), 34 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index b870a50e6b20..84940122b8ca 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -362,7 +362,17 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
>  vhost_dev_cleanup(&s->dev);
>  }
>  
> -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
> +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event, bool 
> init);

The parameter name "init" feels a little unclear. Maybe "realized"
would be better? I would also change the vhost_user_blk_event_init
function name accordingly.

> +
> +static void vhost_user_blk_event_init(void *opaque, QEMUChrEvent event)
> +{
> +vhost_user_blk_event(opaque, event, true);
> +}
> +
> +static void vhost_user_blk_event_oper(void *opaque, QEMUChrEvent event)
> +{
> +vhost_user_blk_event(opaque, event, false);
> +}
>  
>  static void vhost_user_blk_chr_closed_bh(void *opaque)
>  {
> @@ -371,11 +381,11 @@ static void vhost_user_blk_chr_closed_bh(void *opaque)
>  VHostUserBlk *s = VHOST_USER_BLK(vdev);
>  
>  vhost_user_blk_disconnect(dev);
> -qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> -NULL, opaque, NULL, true);
> +qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL,
> +vhost_user_blk_event_oper, NULL, opaque, NULL, true);
>  }
>  
> -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
> +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event, bool init)
>  {
>  DeviceState *dev = opaque;
>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> @@ -390,38 +400,42 @@ static void vhost_user_blk_event(void *opaque, 
> QEMUChrEvent event)
>  break;
>  case CHR_EVENT_CLOSED:
>  /*
> - * A close event may happen during a read/write, but vhost
> - * code assumes the vhost_dev remains setup, so delay the
> - * stop & clear. There are two possible paths to hit this
> - * disconnect event:
> - * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
> - * vhost_user_blk_device_realize() is a caller.
> - * 2. In tha main loop phase after VM start.
> - *
> - * For p2 the disconnect event will be delayed. We can't
> - * do the same for p1, because we are not running the loop
> - * at this mome

Re: [PATCH 1/2] block/raw: added support of persistent dirty bitmaps

2021-03-22 Thread Lubos Matejka
When can we get in touch about further development?

Sent from iPhone

> 22. 3. 2021 v 12:37, Patrik Janoušek :
> 
> 
>> On 3/22/21 12:18 PM, Vladimir Sementsov-Ogievskiy wrote:
>> 22.03.2021 13:46, Vladimir Sementsov-Ogievskiy wrote:
>>> 22.03.2021 13:18, Patrik Janoušek wrote:
 On 3/22/21 9:41 AM, Vladimir Sementsov-Ogievskiy wrote:
> 20.03.2021 12:32, Patrik Janoušek wrote:
>> The current implementation of dirty bitmaps for the raw format is very
>> limited, because those bitmaps cannot be persistent. Basically that
>> makes sense, because the raw format has no space where a dirty bitmap
>> could be stored while QEMU is offline. This patch solves it
>> by storing the content of every dirty bitmap in a separate file on the
>> host filesystem.
>> 
>> However, this only solves one part of the problem. We also have to
>> store information about the existence of the dirty bitmap. This is
>> solved by adding custom options that store all required metadata
>> about the dirty bitmap (the filename where the bitmap is stored on the
>> host filesystem, granularity, persistence, etc.).
>> 
>> Signed-off-by: Patrik Janoušek
> 
> 
> Hmm. Did you consider other ways? Honestly, I don't see a reason for
> yet another storage format for bitmaps.
> 
> The task could be simply solved with existing features:
> 
> 1. We have the external-data-file feature in qcow2 (read
> docs/interop/qcow2.txt). With this feature enabled, the qcow2 file contains
> only metadata (persistent bitmaps, for example) and the data is stored in a
> separate sequential raw file. I think you should start from it.
 
 I didn't know about that feature. I'll look at it.
 
 In case I use NBD to access the bitmap content and qcow2 as a solution
 for the persistent layer, would the patch be acceptable? This is a
 significant change to my solution and I don't have enough time for it
 at the moment (mainly due to other parts of my bachelor's thesis).
 I just want to know if this kind of feature is interesting to you and
 its implementation is worth my time.
>>> 
>>> Honestly, at this point I think it doesn't. If existing features
>>> satisfy your use-case, no reason to increase complexity of file-posix
>>> driver and QAPI.
>>> 
>> 
>> It's unpleasant to say this, keeping in mind that that's your first
>> submission :(
>> 
>> I can still recommend, in connection with your bachelor's thesis,
>> looking at the videos on the kvm-forum YouTube channel, searching for backup:
>> 
>>  
>> https://www.youtube.com/channel/UCRCSQmAOh7yzgheq-emy1xA/search?query=backup
>> 
>> You'll get a lot of information about current developments of external
>> backup API.
>> 
>> Also note, that there is (or there will be ?) libvirt Backup API,
>> which includes an API for external backup. I don't know the current
>> status of it, but if your project is based on libvirt, it's better to
>> use libvirt backup API instead of using qemu directly. About Libvirt
>> Backup API it's better to ask Eric Blake (adding him to CC).
> Unfortunately, my solution is based on Proxmox so I can't use libvirt's
> features. I know that a beta version of Proxmox Backup Server has been
> released and it would be much better to improve their solution, but they
> did it too late, so I couldn't change the assignment of my bachelor's thesis.
>