[PATCH 3/3] iotests: add backup-discard-source

2022-03-31 Thread Vladimir Sementsov-Ogievskiy
Add test for a new backup option: discard-source.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 .../qemu-iotests/tests/backup-discard-source  | 154 ++
 .../tests/backup-discard-source.out   |   5 +
 2 files changed, 159 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/backup-discard-source
 create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out

diff --git a/tests/qemu-iotests/tests/backup-discard-source b/tests/qemu-iotests/tests/backup-discard-source
new file mode 100755
index 00..d301fbd2d1
--- /dev/null
+++ b/tests/qemu-iotests/tests/backup-discard-source
@@ -0,0 +1,154 @@
+#!/usr/bin/env python3
+#
+# Test the backup discard-source option
+#
+# Copyright (c) 2022 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+
+import iotests
+from iotests import qemu_img_create, qemu_img_map, qemu_io
+
+
+temp_img = os.path.join(iotests.test_dir, 'temp')
+source_img = os.path.join(iotests.test_dir, 'source')
+target_img = os.path.join(iotests.test_dir, 'target')
+size = '1M'
+
+def get_actual_size(vm, node_name):
+    nodes = vm.qmp('query-named-block-nodes', flat=True)['return']
+    node = next(n for n in nodes if n['node-name'] == node_name)
+    return node['image']['actual-size']
+
+
+class TestBackup(iotests.QMPTestCase):
+    def setUp(self):
+        qemu_img_create('-f', iotests.imgfmt, source_img, size)
+        qemu_img_create('-f', iotests.imgfmt, temp_img, size)
+        qemu_img_create('-f', iotests.imgfmt, target_img, size)
+        qemu_io('-c', 'write 0 1M', source_img)
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+
+        result = self.vm.qmp('blockdev-add', {
+            'node-name': 'cbw',
+            'driver': 'copy-before-write',
+            'file': {
+                'driver': iotests.imgfmt,
+                'file': {
+                    'driver': 'file',
+                    'filename': source_img,
+                }
+            },
+            'target': {
+                'driver': iotests.imgfmt,
+                'discard': 'unmap',
+                'node-name': 'temp',
+                'file': {
+                    'driver': 'file',
+                    'filename': temp_img
+                }
+            }
+        })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', {
+            'node-name': 'access',
+            'discard': 'unmap',
+            'driver': 'snapshot-access',
+            'file': 'cbw'
+        })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('blockdev-add', {
+            'driver': iotests.imgfmt,
+            'node-name': 'target',
+            'file': {
+                'driver': 'file',
+                'filename': target_img
+            }
+        })
+        self.assert_qmp(result, 'return', {})
+
+        self.assertLess(get_actual_size(self.vm, 'temp'), 512 * 1024)
+
+    def tearDown(self):
+        # That should fail, because region is discarded
+        self.vm.hmp_qemu_io('access', 'read 0 1M')
+
+        self.vm.shutdown()
+
+        self.assertTrue('read failed: Permission denied' in self.vm.get_log())
+
+        # Final check that temp image is empty
+        mapping = qemu_img_map(temp_img)
+        self.assertEqual(len(mapping), 1)
+        self.assertEqual(mapping[0]['start'], 0)
+        self.assertEqual(mapping[0]['length'], 1024 * 1024)
+        self.assertEqual(mapping[0]['data'], False)
+
+        os.remove(temp_img)
+        os.remove(source_img)
+        os.remove(target_img)
+
+    def do_backup(self):
+        result = self.vm.qmp('blockdev-backup', device='access',
+                             sync='full', target='target',
+                             job_id='backup0',
+                             discard_source=True)
+        self.assert_qmp(result, 'return', {})
+
+        self.vm.event_wait(name='BLOCK_JOB_COMPLETED')
+
+    def test_discard_written(self):
+        """
+        1. Guest writes
+        2. copy-before-write operation, data is stored to temp
+        3. start backup(discard_source=True), check that data is
+           removed from temp
+        """
+        # Trigger copy-before-write operation
+        result = self.vm.hmp_qemu_io('cbw', 'write 0 1M')
+        self.assert_qmp(result, 'return', '')
+
+        # Check

[PATCH 0/3] backup: discard-source parameter

2022-03-31 Thread Vladimir Sementsov-Ogievskiy
Hi all!

Here is a new option for backup, that brings two things into
push-backup-with-fleecing scheme:

 - discard the copied region in the temporary image to save disk space
 - avoid extra copy-before-write operations in regions that are already
   copied

This is based on
"[PATCH v5 00/45] Transactional block-graph modifying API"
Based-on: <20220330212902.590099-1-vsement...@openvz.org>

Vladimir Sementsov-Ogievskiy (3):
  block/copy-before-write: create block_copy bitmap in filter node
  qapi: blockdev-backup: add discard-source parameter
  iotests: add backup-discard-source

 block/backup.c|   5 +-
 block/block-copy.c|  13 +-
 block/copy-before-write.c |   4 +-
 block/replication.c   |   4 +-
 blockdev.c|   2 +-
 include/block/block-copy.h|   3 +-
 include/block/block_int-global-state.h|   2 +-
 qapi/block-core.json  |   4 +
 tests/qemu-iotests/257.out| 112 ++---
 .../qemu-iotests/tests/backup-discard-source  | 154 ++
 .../tests/backup-discard-source.out   |   5 +
 11 files changed, 240 insertions(+), 68 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/backup-discard-source
 create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out

-- 
2.35.1




[PATCH 2/3] qapi: blockdev-backup: add discard-source parameter

2022-03-31 Thread Vladimir Sementsov-Ogievskiy
Add a parameter that enables discard-after-copy. That is mostly useful
in the "push backup with fleecing" scheme, where the source is a
snapshot-access format driver node based on the copy-before-write
filter's snapshot-access API:

[guest]           [snapshot-access] ~~ blockdev-backup ~~> [backup target]
  |                    |
  | root               | file
  v                    v
[copy-before-write]
  |             |
  | file        | target
  v             v
[active disk]   [temp.img]

In this case discard-after-copy does two things:

 - discard data in temp.img to save disk space
 - avoid further copy-before-write operations in the discarded area

Note that we have to declare WRITE permission on the source in the
copy-before-write filter for discard to work. An alternative would be
to pass an option to bdrv_cbw_append() and add an internal open-option
for the copy-before-write filter, so that WRITE permission is required
only for backup with discard-source=true. But I'm not sure that is
worth the complexity.
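For concreteness, the new parameter as driven from QMP might look like
the sketch below. This is only an illustration of the command payload
(node and job names are taken from the iotest in this series, and
'discard-source' is the assumed QMP spelling of the new argument), not
a live QEMU session:

```python
# Sketch: the QMP payload for a fleecing backup that discards
# already-copied clusters from the temporary image. Node names
# ('access', 'target') and the job id follow the iotest in this
# series; 'discard-source' is the assumed wire name of the parameter.

def fleecing_backup_command(access_node='access', target_node='target',
                            job_id='backup0'):
    """Build the blockdev-backup command for the fleecing scheme."""
    return {
        'execute': 'blockdev-backup',
        'arguments': {
            'device': access_node,   # read through snapshot-access
            'target': target_node,
            'sync': 'full',
            'job-id': job_id,
            # New in this series: discard each region from the source
            # (and hence from temp.img) once it has been copied.
            'discard-source': True,
        },
    }

cmd = fleecing_backup_command()
assert cmd['arguments']['discard-source'] is True
```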

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/backup.c |  5 +++--
 block/block-copy.c | 10 --
 block/copy-before-write.c  |  2 +-
 block/replication.c|  4 ++--
 blockdev.c |  2 +-
 include/block/block-copy.h |  2 +-
 include/block/block_int-global-state.h |  2 +-
 qapi/block-core.json   |  4 
 8 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 5cfd0b999c..d0d512ec61 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -355,7 +355,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
   BlockDriverState *target, int64_t speed,
   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
   BitmapSyncMode bitmap_mode,
-  bool compress,
+  bool compress, bool discard_source,
   const char *filter_node_name,
   BackupPerf *perf,
   BlockdevOnError on_source_error,
@@ -486,7 +486,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
 job->len = len;
 job->perf = *perf;
 
-block_copy_set_copy_opts(bcs, perf->use_copy_range, compress);
+block_copy_set_copy_opts(bcs, perf->use_copy_range, compress,
+ discard_source);
 block_copy_set_progress_meter(bcs, &job->common.job.progress);
 block_copy_set_speed(bcs, speed);
 
diff --git a/block/block-copy.c b/block/block-copy.c
index 9626043480..2d8373f63f 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -133,6 +133,7 @@ typedef struct BlockCopyState {
 CoMutex lock;
 int64_t in_flight_bytes;
 BlockCopyMethod method;
+bool discard_source;
 BlockReqList reqs;
 QLIST_HEAD(, BlockCopyCallState) calls;
 /*
@@ -278,11 +279,12 @@ static uint32_t block_copy_max_transfer(BdrvChild *source, BdrvChild *target)
 }
 
 void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
-  bool compress)
+  bool compress, bool discard_source)
 {
 /* Keep BDRV_REQ_SERIALISING set (or not set) in block_copy_state_new() */
 s->write_flags = (s->write_flags & BDRV_REQ_SERIALISING) |
 (compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+s->discard_source = discard_source;
 
 if (s->max_transfer < s->cluster_size) {
 /*
@@ -405,7 +407,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
 cluster_size),
 };
 
-block_copy_set_copy_opts(s, false, false);
+block_copy_set_copy_opts(s, false, false, false);
 
 ratelimit_init(&s->rate_limit);
 qemu_co_mutex_init(&s->lock);
@@ -575,6 +577,10 @@ static coroutine_fn int block_copy_task_entry(AioTask *task)
 co_put_to_shres(s->mem, t->req.bytes);
 block_copy_task_end(t, ret);
 
+if (s->discard_source && ret == 0) {
+bdrv_co_pdiscard(s->source, t->req.offset, t->req.bytes);
+}
+
 return ret;
 }
 
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 79cf12380e..3e77313a9a 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -319,7 +319,7 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
 bdrv_default_perms(bs, c, role, reopen_queue,
perm, shared, nperm, nshared);
 
-*nperm = *nperm | BLK_PERM_CONSISTENT_READ;
+*nperm = *nperm | BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE;
 *nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
 }
 }
diff --git a/block/replication.c b/block/replication.c
index 2f17397764..f6a0b23563 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -587,8 +587,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
 
 s->backup_job = backup_job_create(
 NULL, s->secondary

[PATCH 1/3] block/copy-before-write: create block_copy bitmap in filter node

2022-03-31 Thread Vladimir Sementsov-Ogievskiy
Currently block_copy creates copy_bitmap in the source node. But that
interacts badly with .independent_close=true of the copy-before-write
filter: the source node may be detached and removed before the
.bdrv_close() handler is called, which should call
block_copy_state_free(), which in turn should remove copy_bitmap.

None of this is ideal: it would be better if the internal bitmap of the
block-copy object were not attached to any node. But that is not
possible now.

The simplest solution is to just create copy_bitmap in the filter node,
where two other bitmaps are already created.
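The lifetime argument can be illustrated with a toy model (all names
below are illustrative, not QEMU API):

```python
# Toy model of the lifetime problem described above: if the copy bitmap
# is created in the source node, the source may go away before
# block_copy_state_free() runs; creating it in the filter node, whose
# lifetime matches the block-copy state, keeps freeing well-defined.

class Node:
    def __init__(self, name):
        self.name = name
        self.bitmaps = []

class BlockCopyState:
    def __init__(self, bitmap_owner):
        # bdrv_create_dirty_bitmap() analogue: attach to the owner node.
        self.copy_bitmap = 'copy-bitmap'
        self.bitmap_owner = bitmap_owner
        bitmap_owner.bitmaps.append(self.copy_bitmap)

    def free(self):
        # Needs the owner node to still exist to remove the bitmap.
        self.bitmap_owner.bitmaps.remove(self.copy_bitmap)

source = Node('source')
fltr = Node('copy-before-write')

# The patch's choice: create the bitmap in the filter node.
bcs = BlockCopyState(bitmap_owner=fltr)

# The source node may be detached and dropped before .bdrv_close();
# the filter still owns the bitmap, so freeing the state is safe.
del source
bcs.free()
assert fltr.bitmaps == []
```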

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/block-copy.c |   3 +-
 block/copy-before-write.c  |   2 +-
 include/block/block-copy.h |   1 +
 tests/qemu-iotests/257.out | 112 ++---
 4 files changed, 60 insertions(+), 58 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index ec46775ea5..9626043480 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -342,6 +342,7 @@ static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+ BlockDriverState *copy_bitmap_bs,
  const BdrvDirtyBitmap *bitmap,
  Error **errp)
 {
@@ -356,7 +357,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
 return NULL;
 }
 
-copy_bitmap = bdrv_create_dirty_bitmap(source->bs, cluster_size, NULL,
+copy_bitmap = bdrv_create_dirty_bitmap(copy_bitmap_bs, cluster_size, NULL,
errp);
 if (!copy_bitmap) {
 return NULL;
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 90a9c7874a..79cf12380e 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -398,7 +398,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  bs->file->bs->supported_zero_flags);
 
-s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
+s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap, errp);
 if (!s->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 return -EINVAL;
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 68bbd344b2..b03eb5f016 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,6 +25,7 @@ typedef struct BlockCopyState BlockCopyState;
 typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+ BlockDriverState *copy_bitmap_bs,
  const BdrvDirtyBitmap *bitmap,
  Error **errp);
 
diff --git a/tests/qemu-iotests/257.out b/tests/qemu-iotests/257.out
index aa76131ca9..c33dd7f3a9 100644
--- a/tests/qemu-iotests/257.out
+++ b/tests/qemu-iotests/257.out
@@ -120,16 +120,16 @@ write -P0x67 0x3fe 0x2
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  }
-],
-"drive0": [
+  },
   {
 "busy": false,
 "count": 0,
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  },
+  }
+],
+"drive0": [
   {
 "busy": false,
 "count": 458752,
@@ -596,16 +596,16 @@ write -P0x67 0x3fe 0x2
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  }
-],
-"drive0": [
+  },
   {
 "busy": false,
 "count": 0,
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  },
+  }
+],
+"drive0": [
   {
 "busy": false,
 "count": 458752,
@@ -865,16 +865,16 @@ write -P0x67 0x3fe 0x2
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  }
-],
-"drive0": [
+  },
   {
 "busy": false,
 "count": 0,
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  },
+  }
+],
+"drive0": [
   {
 "busy": false,
 "count": 458752,
@@ -1341,16 +1341,16 @@ write -P0x67 0x3fe 0x2
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  }
-],
-"drive0": [
+  },
   {
 "busy": false,
 "count": 0,
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  },
+  }
+],
+"drive0": [
   {
 "busy": false,
 "count": 458752,
@@ -1610,16 +1610,16 @@ write -P0x67 0x3fe 0x2
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  }
-]

Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept

2022-03-31 Thread Paolo Bonzini

On 3/31/22 15:51, Emanuele Giuseppe Esposito wrote:


bdrv_graph_list_wrlock <-> start_exclusive
bdrv_graph_list_wrunlock <-> end_exclusive
bdrv_graph_list_rdlock <-> cpu_exec_start
bdrv_graph_list_rdunlock <-> cpu_exec_end


This wouldn't protect the list but the whole graph, i.e. the parents and 
children of all BDSes.  So the functions would be:


  bdrv_graph_wrlock <-> start_exclusive
  bdrv_graph_wrunlock <-> end_exclusive
  bdrv_graph_rdlock <-> cpu_exec_start
  bdrv_graph_rdunlock <-> cpu_exec_end


The list itself would be used internally to implement the write-side 
lock and unlock primitives, but it would not be protected by the above 
functions.  So there would be a couple additional functions:


  bdrv_graph_list_lock <-> cpu_list_lock
  bdrv_graph_list_unlock <-> cpu_list_unlock


+void bdrv_graph_list_rdlock(BlockDriverState *bs);
+void bdrv_graph_list_rdunlock(BlockDriverState *bs);


Apart from the naming change, these two would be coroutine_fn.


+#define BS_GRAPH_READER(bs) /* in main loop OR bs->reading_graph */
+#define BS_GRAPH_WRITER(bs) /* in main loop AND bs->bs_graph_pending_op > 0 */


bs_graph_pending_op is not part of bs->, it is a global variable 
(corresponding to pending_cpus in cpus-common.c).  I would call it 
bs_graph_pending_reader since you have "has_writer" below.


Also, this second #define does not need an argument, and is really the 
same as assert_bdrv_graph_writable(bs).  So perhaps you can rename the 
first one to assert_bdrv_graph_readable(bs).




+/*
+ * If true, the main loop is modifying the graph.
+ * bs cannot read the graph.
+ * Protected by bs_graph_list_lock.
+ */
+bool has_writer;


Note that it's "has_waiter" in cpus-common.c. :)  has_writer is fine too.

Paolo
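The cpus-common.c scheme mapped out in this thread
(start_exclusive/end_exclusive for the writer, cpu_exec_start/cpu_exec_end
for readers) can be sketched with ordinary threads. This is a rough
analogue only: names are illustrative, the reader path here takes a plain
mutex whereas the real cpus-common.c fast path avoids it with memory
barriers, and QEMU proper would use coroutines and a CoQueue rather than
condition variables:

```python
# Thread-based sketch of the proposed graph lock, modelled on
# cpus-common.c: readers register/deregister cheaply, a writer waits
# for in-flight readers to drain and blocks new ones while pending.

import threading

class GraphLock:
    def __init__(self):
        self._lock = threading.Lock()            # cpu_list_lock analogue
        self._cond = threading.Condition(self._lock)
        self._pending_writer = False             # pending_cpus analogue
        self._active_readers = 0

    def rdlock(self):                            # cpu_exec_start analogue
        with self._cond:
            # New readers wait while a writer is pending or active.
            while self._pending_writer:
                self._cond.wait()
            self._active_readers += 1

    def rdunlock(self):                          # cpu_exec_end analogue
        with self._cond:
            self._active_readers -= 1
            if self._active_readers == 0:
                # Last reader out wakes a waiting writer.
                self._cond.notify_all()

    def wrlock(self):                            # start_exclusive analogue
        with self._cond:
            while self._pending_writer:          # one writer at a time
                self._cond.wait()
            self._pending_writer = True
            # Wait for in-flight readers to drain.
            while self._active_readers > 0:
                self._cond.wait()

    def wrunlock(self):                          # end_exclusive analogue
        with self._cond:
            self._pending_writer = False
            self._cond.notify_all()
```

The writer's "mark pending, then drain readers" step is what makes the
reader side cheap: readers only ever touch the shared state briefly on
entry and exit, never during the graph access itself.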



Re: [PATCH v2 05/17] iotests/040: Fix TestCommitWithFilters test

2022-03-31 Thread John Snow
On Thu, Mar 24, 2022 at 9:33 PM Eric Blake  wrote:
>
> On Thu, Mar 24, 2022 at 02:30:06PM -0400, John Snow wrote:
> > Without this change, asserting that qemu_io always returns 0 causes this
> > test to fail in a way we happened not to be catching previously:
> >
> >  qemu.utils.VerboseProcessError: Command
> >   '('/home/jsnow/src/qemu/bin/git/tests/qemu-iotests/../../qemu-io',
> >   '--cache', 'writeback', '--aio', 'threads', '-f', 'qcow2', '-c',
> >   'read -P 4 3M 1M',
> >   '/home/jsnow/src/qemu/bin/git/tests/qemu-iotests/scratch/3.img')'
> >   returned non-zero exit status 1.
> >   ┏━ output 
> >   ┃ qemu-io: can't open device
> >   ┃ /home/jsnow/src/qemu/bin/git/tests/qemu-iotests/scratch/3.img:
> >   ┃ Could not open backing file: Could not open backing file: Throttle
> >   ┃ group 'tg' does not exist
> >   ┗━
> >
> > Explicitly provide the backing file so that opening the file outside of
> > QEMU (Where we will not have throttle groups) will succeed.
> >
> > [Patch entirely written by Hanna but I don't have her S-o-B]
>
> Yeah, you'll want that.
>
> > [My commit message is probably also garbage, sorry]
>
> No, it was actually decent.
>
> > [Feel free to suggest a better one]
> > [I hope your day is going well]
> > Signed-off-by: John Snow 
> >
> > Signed-off-by: John Snow 
>
> So giving your S-o-b twice makes up for it, right ;)

This happens when I add a '---' myself into the commit message, and
git-publish sees that the end of the commit message doesn't have a
S-o-B and adds one into the ignored region.
Haven't bothered to fix it yet.

>
> Well, you did say v3 would fix this.  But while you're having fun
> fixing it, you can add:
>
> Reviewed-by: Eric Blake 
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>




Re: [PATCH] hw/vhost-user-blk: fix ioeventfd add failed when start reenter

2022-03-31 Thread Raphael Norwitz
High level looks good but I have some questions.

Rather than adding a new boolean, I'd prefer we reuse started_vu by
changing it to an enum with different values for starting and started.
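That suggestion could be sketched like this (a toy model in Python;
STOPPED/STARTING/STARTED and the method names are hypothetical, the
real code is C in hw/block/vhost-user-blk.c):

```python
# Sketch of the suggested reentrancy guard: a single state enum instead
# of separate booleans. A nested start call observes STARTING and bails
# out instead of adding ioeventfds a second time.

from enum import Enum, auto

class VhostState(Enum):
    STOPPED = auto()
    STARTING = auto()
    STARTED = auto()

class VhostUserBlk:
    def __init__(self):
        self.state = VhostState.STOPPED
        self.ioeventfd_adds = 0

    def start(self):
        if self.state is VhostState.STARTING:
            # Reentered from the notifier-cleanup path: do nothing,
            # the outer invocation is still in progress.
            return 0
        if self.state is VhostState.STARTED:
            return 0
        self.state = VhostState.STARTING
        self.ioeventfd_adds += 1      # stands in for enable_notifiers()
        # A backend failure here recurses into start(); the STARTING
        # check above turns that recursion into a no-op.
        self.start()                  # simulate the reentrant kick
        self.state = VhostState.STARTED
        return 0
```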

On Tue, Mar 29, 2022 at 12:15:46AM +0800, Jie Wang wrote:
> During a Virtio 1.0 device start (start_on_kick) in vhost_user_blk_start,
> if spdk fails after vhost_dev_enable_notifiers, then vhost_user_blk_start
> will go to vhost_dev_disable_notifiers, reenter vhost_user_blk_start, and
> add the ioeventfd again.
> 
> The function call chain is as follows:
> vhost_user_blk_start(spdk abnormal after vhost_dev_enable_notifiers)
> ->vhost_dev_disable_notifiers
> ->virtio_bus_cleanup_host_notifier
> ->virtio_queue_host_notifier_read
> ->virtio_queue_notify_vq
> ->vhost_user_blk_handle_output
> ->vhost_user_blk_start
> ->vhost_dev_enable_notifiers
> 
> then kvm_mem_ioeventfd_add will fail with errno 17 (File exists) and
> abort().
> 
> The GDB stack is as follows:
> (gdb) bt
> 0  0x7fca4264c81b in raise () from /usr/lib64/libc.so.6
> 1  0x7fca4264db41 in abort () from /usr/lib64/libc.so.6
> 2  0x7fca423ebe8b in kvm_mem_ioeventfd_add
> 3  0x7fca4241c816 in address_space_add_del_ioeventfds
> 4  0x7fca4241ddc6 in address_space_update_ioeventfds
> 5  0x7fca424203d5 in memory_region_commit ()
> 6  0x7fca424204e5 in memory_region_transaction_commit ()
> 7  0x7fca42421861 in memory_region_add_eventfd
> 8  0x7fca42917a4c in virtio_pci_ioeventfd_assign
> 9  0x7fca41054178 in virtio_bus_set_host_notifier
> 10 0x7fca41058729 in vhost_dev_enable_notifiers
> 11 0x7fca40fdec1e in vhost_user_blk_start
> 12 0x7fca40fdefa8 in vhost_user_blk_handle_output
> 13 0x7fca4104e135 in virtio_queue_notify_vq
> 14 0x7fca4104f192 in virtio_queue_host_notifier_read
> 15 0x7fca41054054 in virtio_bus_cleanup_host_notifier
> 16 0x7fca41058916 in vhost_dev_disable_notifiers
> 17 0x7fca40fdede0 in vhost_user_blk_start
> 18 0x7fca40fdefa8 in vhost_user_blk_handle_output
> 19 0x7fca41050a6d in virtio_queue_notify
> 20 0x7fca4241bbae in memory_region_write_accessor
> 21 0x7fca4241ab1d in access_with_adjusted_size
> 22 0x7fca4241e466 in memory_region_dispatch_write
> 23 0x7fca4242da36 in flatview_write_continue
> 24 0x7fca4242db75 in flatview_write
> 25 0x7fca42430beb in address_space_write
> 26 0x7fca42430c25 in address_space_rw
> 27 0x7fca423e8ecc in kvm_handle_io
> 28 0x7fca423ecb48 in kvm_cpu_exec
> 29 0x7fca424279d5 in qemu_kvm_cpu_thread_fn
> 30 0x7fca423c9480 in qemu_thread_start
> 31 0x7fca4257ff3b in ?? () from /usr/lib64/libpthread.so.0
> 32 0x7fca4270b550 in clone () from /usr/lib64/libc.so.6
> 
> Signed-off-by: Jie Wang 
> ---
>  hw/block/vhost-user-blk.c  | 12 +++-
>  include/hw/virtio/vhost-user-blk.h |  2 ++
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 1a42ae9187..2182769676 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -124,6 +124,13 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error **errp)
>  VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
>  int i, ret;
>

I would like a comment here explaining the check.

Also wouldn't you want to set starting irrespective of whether
start_on_kick is set?

> +if (vdev->start_on_kick) {
> +if (s->starting) {
> +return 0;
> +}
> +s->starting = true;
> +}
> +
>  if (!k->set_guest_notifiers) {
>  error_setg(errp, "binding does not support guest notifiers");
>  return -ENOSYS;
> @@ -178,6 +185,8 @@ static int vhost_user_blk_start(VirtIODevice *vdev, Error **errp)
>  vhost_virtqueue_mask(&s->dev, vdev, i, false);
>  }
>  
> +s->starting = false;
> +
>  return ret;
>  
>  err_guest_notifiers:
> @@ -344,7 +353,7 @@ static int vhost_user_blk_connect(DeviceState *dev, Error **errp)
>  }
>  
>  /* restore vhost state */

Can you explain the case where we could enter this path? If the device
is starting and there is a full reconnect, why would we want to enter
vhost_user_blk_start() again? Seems like allowing it to enter
vhost_user_blk_start could cause the same issue?

> -if (virtio_device_started(vdev, vdev->status)) {
> +if (s->starting || virtio_device_started(vdev, vdev->status)) {
>  ret = vhost_user_blk_start(vdev, errp);
>  if (ret < 0) {
>  return ret;
> @@ -500,6 +509,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  vhost_user_blk_handle_output);
>  }
>  
> +s->starting = false;
>  s->inflight = g_new0(struct vhost_inflight, 1);
>  s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues);
>  
> diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
> index 7c91f15040..6e67f36962 1

Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept

2022-03-31 Thread Emanuele Giuseppe Esposito



Am 31/03/2022 um 11:59 schrieb Paolo Bonzini:
> On 3/30/22 18:02, Paolo Bonzini wrote:
>> On 3/30/22 12:53, Hanna Reitz wrote:

 Seems a good compromise between drains and rwlock. What do you think?
>>>
>>> Well, sounds complicated.  So I’m asking myself whether this would be
>>> noticeably better than just an RwLock for graph modifications, like
>>> the global lock Vladimir has proposed.
> 
> [try again, this time with brain connected]
> 
> A global lock would have to be taken by all iothreads on every I/O
> operation, even a CoRwLock would not scale because it has a global
> CoMutex inside and rdlock/unlock both take it.  Even if the critical
> section is small, the cacheline bumping would be pretty bad.
> 
> Perhaps we could reuse the code in cpus-common.c, which relies on a list
> of possible readers and is quite cheap (two memory barriers basically)
> for readers.  Here we would have a list of AioContexts (or perhaps
> BlockDriverStates?) as the possible readers.
> 
> The slow path uses a QemuMutex, a QemuCond for the readers and a
> QemuCond for the writers.  The reader QemuCond can be replaced by a
> CoQueue, I think.
> 

It would be something like this:
Essentially, make graph_bdrv_states public, so that we can iterate
through all BDSes as cpus-common.c does with CPUs, and then duplicate
the already existing API in cpus-common.c:

bdrv_graph_list_wrlock <-> start_exclusive
bdrv_graph_list_wrunlock <-> end_exclusive
bdrv_graph_list_rdlock <-> cpu_exec_start
bdrv_graph_list_rdunlock <-> cpu_exec_end

The only difference is that there is no qemu_cpu_kick(), but I don't
think it matters.

What do you think?

diff --git a/block.c b/block.c
index 718e4cae8b..af8dd37101 100644
--- a/block.c
+++ b/block.c
@@ -52,6 +52,7 @@
 #include "qemu/range.h"
 #include "qemu/rcu.h"
 #include "block/coroutines.h"
+#include "block/block-list.h"

 #ifdef CONFIG_BSD
 #include 
@@ -67,10 +68,6 @@

 #define NOT_DONE 0x7fff /* used while emulated sync operation in progress */

-/* Protected by BQL */
-static QTAILQ_HEAD(, BlockDriverState) graph_bdrv_states =
-QTAILQ_HEAD_INITIALIZER(graph_bdrv_states);
-
 /* Protected by BQL */
 static QTAILQ_HEAD(, BlockDriverState) all_bdrv_states =
 QTAILQ_HEAD_INITIALIZER(all_bdrv_states);
diff --git a/include/block/block-list.h b/include/block/block-list.h
new file mode 100644
index 00..d55a056938
--- /dev/null
+++ b/include/block/block-list.h
@@ -0,0 +1,54 @@
+#ifndef BLOCK_LIST_H
+#define BLOCK_LIST_H
+
+#include "qemu/osdep.h"
+
+
+/* Protected by BQL */
+QTAILQ_HEAD(, BlockDriverState) graph_bdrv_states =
+    QTAILQ_HEAD_INITIALIZER(graph_bdrv_states);
+
+void bdrv_init_graph_list_lock(void);
+
+/*
+ * bdrv_graph_list_wrlock:
+ * Modify the graph. Nobody else is allowed to access the graph.
+ * Set pending op >= 1, so that the next readers will wait in a
+ * coroutine queue.
+ * Then keep track of the running readers using bs->has_writer,
+ * and wait until they finish.
+ */
+void bdrv_graph_list_wrlock(void);
+
+/*
+ * bdrv_graph_list_wrunlock:
+ * Write finished, reset pending operations to 0 and restart
+ * all readers that are waiting.
+ */
+void bdrv_graph_list_wrunlock(void);
+
+/*
+ * bdrv_graph_list_rdlock:
+ * Read the bs graph. Set bs->reading_graph true, and if there are pending
+ * operations, it means that the main loop is modifying the graph,
+ * therefore wait in exclusive_resume coroutine queue.
+ * If this read comes while the writer has already marked the
+ * running read requests and is waiting for them to finish,
+ * wait until the write finishes first.
+ * The main loop will then wake this coroutine once it is done.
+ */
+void bdrv_graph_list_rdlock(BlockDriverState *bs);
+
+/*
+ * bdrv_graph_list_rdunlock:
+ * Read terminated, decrease the count of pending operations.
+ * If it's the last read that the writer is waiting for, signal
+ * the writer that we are done and let it continue.
+ */
+void bdrv_graph_list_rdunlock(BlockDriverState *bs);
+
+
+
+
+#define BS_GRAPH_READER(bs) /* in main loop OR bs->reading_graph */
+#define BS_GRAPH_WRITER(bs) /* in main loop AND bs->bs_graph_pending_op > 0 */
+#endif /* BLOCK_LIST_H */
+
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 5a04c778e4..0266876a59 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -985,6 +985,20 @@ struct BlockDriverState {
 bool force_share; /* if true, always allow all shared permissions */
 bool implicit;  /* if true, this filter node was automatically inserted */

+/*
+ * If true, the bs is reading the graph.
+ * No write should happen in the meanwhile.
+ * Atomic.
+ */
+bool reading_graph;
+
+/*
+ * If true, the main loop is modifying the graph.
+ * bs cannot read the graph.
+ * Protected by bs_graph_list_lock.
+ */
+bool has_writer;
+
 BlockDriver *drv; /* NULL means no media */
 void *opaque;




Re: [PATCH v7 12/12] hw/acpi: Make the PCI hot-plug aware of SR-IOV

2022-03-31 Thread Igor Mammedov
On Fri, 18 Mar 2022 20:18:19 +0100
Lukasz Maniak  wrote:

> From: Łukasz Gieryk 
> 
> PCI device capable of SR-IOV support is a new, still-experimental
> feature with only a single working example of the Nvme device.
> 
> This patch is an attempt to fix a double-free problem when a
> SR-IOV-capable Nvme device is hot-unplugged. The problem and the
> reproduction steps can be found in this thread:
> 
> https://patchew.org/QEMU/20220217174504.1051716-1-lukasz.man...@linux.intel.com/20220217174504.1051716-14-lukasz.man...@linux.intel.com/

pls include that in patch description.

> Details of the proposed solution are, for convenience, included below.
> 
> 1) The current SR-IOV implementation assumes it’s the PhysicalFunction
>that creates and deletes VirtualFunctions.
> 2) It’s a design decision (the Nvme device at least) for the VFs to be
>of the same class as PF. Effectively, they share the dc->hotpluggable
>value.
> 3) When a VF is created, it’s added as a child node to PF’s PCI bus
>slot.
> 4) Monitor/device_del triggers the ACPI mechanism. The implementation is
>not aware of SR/IOV and ejects PF’s PCI slot, directly unrealizing all
>hot-pluggable (!acpi_pcihp_pc_no_hotplug) children nodes.
> 5) VFs are unrealized directly, and it doesn’t work well with (1).
>SR/IOV structures are not updated, so when it’s PF’s turn to be
>unrealized, it works on stale pointers to already-deleted VFs.
it's unclear what's being hotplugged and unplugged; it would be better if
you included the QEMU CLI and relevant qmp/monitor commands to reproduce it.

> 
> Signed-off-by: Łukasz Gieryk 
> ---
>  hw/acpi/pcihp.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index 6351bd3424d..248839e1110 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -192,8 +192,12 @@ static bool acpi_pcihp_pc_no_hotplug(AcpiPciHpState *s, PCIDevice *dev)
>   * ACPI doesn't allow hotplug of bridge devices.  Don't allow
>   * hot-unplug of bridge devices unless they were added by hotplug
>   * (and so, not described by acpi).
> + *
> + * Don't allow hot-unplug of SR-IOV Virtual Functions, as they
> + * will be removed implicitly, when Physical Function is unplugged.
>   */
> -return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable;
> +return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable ||
> +   pci_is_vf(dev);
>  }
>  
>  static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slots)




Re: [PATCH experiment 00/16] C++20 coroutine backend

2022-03-31 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> On Tue, Mar 15, 2022 at 03:50:26PM +0100, Kevin Wolf wrote:

[...]

>> ...using C++ in coroutine code means that all of the block layer would
>> suddenly become C++ and would be most affected by this effect. I'm not
>> sure if that's something I would like to see, at least until there is a
>> clear tree-wide policy (that preferably limits it to a subset that I
>> understand).
>
> Expecting maintainers to enforce a subset during code review feels
> like it would be a tedious burden, that will inevitably let stuff
> through because humans are fallible, especially when presented
> with uninspiring, tedious, repetitive tasks.
>
> Restricting ourselves to a subset is only viable if we have
> an automated tool that can reliably enforce that subset.

Concur.  We're talking about a complex subset of an even more complex
whole, where few to none of us will fully understand either.

>  I'm not
> sure that any such tool exists, and not convinced our time is
> best served by trying to write & maintainer one either.
>
> IOW, I fear once we allow C++ at any level, it won't be practical
> to constrain it as much as we desire. I fear us turning QEMU into
> even more of a monster like other big C++ apps I see which take
> all hours to compile while using all available RAM in Fedora RPM
> build hosts.

While I'm certainly concerned about the often poor ergonomics of
compiling C++ code, I'm even more concerned about the poor ergonomics of
*grokking* C++ code.

> My other question is whether adoption of C++ would complicate any
> desire to make more use of Rust in QEMU ? I know Rust came out of
> work by the Mozilla Firefox crew, and Firefox was C++, but I don't
> have any idea how they integrated use of Rust with Firefox, so
> whether there are any gotchas for us or not?

Good question.




Re: [PATCH experiment 00/16] C++20 coroutine backend

2022-03-31 Thread Markus Armbruster
Hanna Reitz  writes:

> On 17.03.22 16:11, Paolo Bonzini wrote:
>> On 3/16/22 13:32, Stefan Hajnoczi wrote:
>>> You can define rules and a way to enforce a subset of C++, but I think
>>> over time the code will be C++. A policy that is complicated discourages
>>> contributors.
>>>
>>> For these reasons I think that if code runs through a C++ compiler we
>>> should just allow C++. Either way, it will take time but that way no one
>>> will feel betrayed when C++ creeps in.
>>
>> Fair enough.  We even already have some conventions that will make
>> any C++ that creeps in less weird (for example, mandatory typedef of 
>> structs).
>>
>> I don't think it would be a big deal overall.  I actually agree that
>> we should "just allow C++", what matters more to have style rules
>> that make QEMU's flavors of C and C++ consistent.
>
> I just want to throw in that I personally am absolutely not confident
> in reviewing C++ code.

Me neither.

> As for Rust, you keep mentioning that we don’t
> have anyone who would “shepherd us through the learning experience”,
> but I find the very same argument applies to C++.
>
> C++ may start out looking like C, but if used ideally, then it is a
> very different language, too.  I understand the difference is that we
> can incrementally convert our C code to C++,

Bad C++ in need of rework, presumably.

>  but I’m not comfortable 
> overseeing that process, which I would have to do as a maintainer. 

That makes two of us.

The plain language version of "I'm not comfortable serving" is of course
"I do not intend to serve".

> Assuming our overall stance does change to “just allowing C++”, I’d
> feel unjust if I were to reject C++isms just based on the fact that I
> don’t know C++, so I’d be forced to learn C++.  I’m not strictly
> opposed to that (though from experience I’m more than hesitant), but
> forcing maintainers to learn C++ is something that has a cost
> associated with it.

I'm a lot more interested in learning Rust than in (re-)learning C++.

> My biggest gripe about C++ is that as far as I understand there are
> many antipatterns, many of which I think stem from the exact fact that
> it is kind of compatible with C, and so you can easily accidentally
> write really bad C++ code; but without years of experience, I’m
> absolutely not confident that I could recognize them.  Now, I might
> say I’d just disallow complicated stuff, and keep everything C-like,
> but from what I’ve heard, C-like C++ seems to be exactly one case that
> is considered bad C++.  I’m really under the impression that I’d need
> years of experience to discern good from bad C++, and I don’t have
> that experience.

Right.




Re: qemu iotest 161 and make check

2022-03-31 Thread Li Zhang

On 3/31/22 09:44, Christian Borntraeger wrote:



On 21.02.22 at 11:27, Christian Borntraeger wrote:


On 10.02.22 at 18:44, Vladimir Sementsov-Ogievskiy wrote:

10.02.2022 20:13, Thomas Huth wrote:

On 10/02/2022 15.51, Christian Borntraeger wrote:



On 10.02.22 at 15:47, Vladimir Sementsov-Ogievskiy wrote:

10.02.2022 10:57, Christian Borntraeger wrote:

Hello,

I do see spurious failures of 161 in our CI, but only when I use
make check with parallelism (-j).
I have not yet figured out which other testcase could interfere.

@@ -34,6 +34,8 @@
  *** Commit and then change an option on the backing file

  Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
+qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock


FWIW, qemu_lock_fd_test returns -11 (EAGAIN)
and raw_check_lock_bytes spits this error.



I also run into this issue on S390 when running test cases.
I think it will report this "write" lock error if different processes 
are using the same image.




Is this just some overload situation from which we do not recover because we
do not handle EAGAIN specially?







Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept

2022-03-31 Thread Paolo Bonzini

On 3/30/22 18:02, Paolo Bonzini wrote:

On 3/30/22 12:53, Hanna Reitz wrote:


Seems a good compromise between drains and rwlock. What do you think?


Well, sounds complicated.  So I’m asking myself whether this would be 
noticeably better than just an RwLock for graph modifications, like 
the global lock Vladimir has proposed.


[try again, this time with brain connected]

A global lock would have to be taken by all iothreads on every I/O
operation; even a CoRwLock would not scale, because it has a global
CoMutex inside and rdlock/unlock both take it.  Even if the critical
section is small, the cache-line bouncing would be pretty bad.

Perhaps we could reuse the code in cpus-common.c, which relies on a list
of possible readers and is quite cheap (two memory barriers basically)
for readers.  Here we would have a list of AioContexts (or perhaps 
BlockDriverStates?) as the possible readers.


The slow path uses a QemuMutex, a QemuCond for the readers and a
QemuCond for the writers.  The reader QemuCond can be replaced by a
CoQueue, I think.
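The scheme described above might be sketched with plain pthreads and C11
atomics instead of QEMU primitives. This is a simplified, hypothetical
illustration of the cpus-common.c pattern, not actual QEMU code: each
reader owns a slot in a fixed array, the read fast path is one atomic
store plus one atomic load, and only the writer (or a reader that races
with one) falls back to the mutex/condvar slow path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical sketch of a cpus-common.c-style reader list.
 * reading[i] plays the role of the per-CPU "I am in a read-side
 * critical section" flag; pending_writer is the exclusive request. */
enum { MAX_READERS = 8 };

static atomic_int reading[MAX_READERS];
static atomic_int pending_writer;
static pthread_mutex_t slow_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake_writer = PTHREAD_COND_INITIALIZER;
static pthread_cond_t wake_readers = PTHREAD_COND_INITIALIZER;

static void read_lock(int me)
{
    for (;;) {
        atomic_store(&reading[me], 1);
        /* seq_cst store/load pair orders us against the writer's check */
        if (!atomic_load(&pending_writer)) {
            return;               /* fast path: no writer pending */
        }
        /* slow path: back off and wait for the writer to finish */
        atomic_store(&reading[me], 0);
        pthread_mutex_lock(&slow_lock);
        pthread_cond_signal(&wake_writer);
        while (atomic_load(&pending_writer)) {
            pthread_cond_wait(&wake_readers, &slow_lock);
        }
        pthread_mutex_unlock(&slow_lock);
    }
}

static void read_unlock(int me)
{
    atomic_store(&reading[me], 0);
    if (atomic_load(&pending_writer)) {
        /* a writer may be waiting for our slot to go idle */
        pthread_mutex_lock(&slow_lock);
        pthread_cond_signal(&wake_writer);
        pthread_mutex_unlock(&slow_lock);
    }
}

static void write_lock(void)
{
    pthread_mutex_lock(&slow_lock);
    atomic_store(&pending_writer, 1);
    /* wait until every registered reader slot is idle */
    for (int i = 0; i < MAX_READERS; i++) {
        while (atomic_load(&reading[i])) {
            pthread_cond_wait(&wake_writer, &slow_lock);
        }
    }
}

static void write_unlock(void)
{
    atomic_store(&pending_writer, 0);
    pthread_cond_broadcast(&wake_readers);
    pthread_mutex_unlock(&slow_lock);
}
```

In QEMU terms the reader-side QemuCond would become a CoQueue, as
suggested above, so that coroutine readers yield instead of blocking
the iothread.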

Paolo




Re: qemu iotest 161 and make check

2022-03-31 Thread Christian Borntraeger




On 31.03.22 at 09:44, Christian Borntraeger wrote:



On 21.02.22 at 11:27, Christian Borntraeger wrote:


On 10.02.22 at 18:44, Vladimir Sementsov-Ogievskiy wrote:

10.02.2022 20:13, Thomas Huth wrote:

On 10/02/2022 15.51, Christian Borntraeger wrote:



On 10.02.22 at 15:47, Vladimir Sementsov-Ogievskiy wrote:

10.02.2022 10:57, Christian Borntraeger wrote:

Hello,

I do see spurious failures of 161 in our CI, but only when I use
make check with parallelism (-j).
I have not yet figured out which other testcase could interfere.

@@ -34,6 +34,8 @@
  *** Commit and then change an option on the backing file

  Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
+qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock


FWIW, qemu_lock_fd_test returns -11 (EAGAIN)
and raw_check_lock_bytes spits this error.



And it's coming from here (ret is 0):

int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive)
{
    int ret;
    struct flock fl = {
        .l_whence = SEEK_SET,
        .l_start  = start,
        .l_len    = len,
        .l_type   = exclusive ? F_WRLCK : F_RDLCK,
    };
    qemu_probe_lock_ops();
    ret = fcntl(fd, fcntl_op_getlk, &fl);
    if (ret == -1) {
        return -errno;
    } else {
->      return fl.l_type == F_UNLCK ? 0 : -EAGAIN;
    }
}




Is this just some overload situation from which we do not recover because we do
not handle EAGAIN specially?
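If EAGAIN from the lock probe were treated as transient, the caller
could retry with a short backoff instead of failing the test. A minimal
sketch of such handling, using plain POSIX fcntl() rather than QEMU's
wrappers; lock_test_retry and self_check are hypothetical helpers, not
existing QEMU functions:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical retry wrapper: probe whether a write lock on the given
 * byte range would succeed, backing off a few times on contention
 * instead of failing immediately.  Returns 0 when the range is
 * unlocked, a negative errno otherwise. */
static int lock_test_retry(int fd, off_t start, off_t len, int attempts)
{
    struct flock fl = {
        .l_whence = SEEK_SET,
        .l_start  = start,
        .l_len    = len,
        .l_type   = F_WRLCK,
    };
    for (int i = 0; i < attempts; i++) {
        if (fcntl(fd, F_GETLK, &fl) == -1) {
            return -errno;
        }
        if (fl.l_type == F_UNLCK) {
            return 0;             /* nobody holds a conflicting lock */
        }
        usleep(1000 << i);        /* crude exponential backoff */
        fl.l_type = F_WRLCK;      /* F_GETLK overwrote l_type */
    }
    return -EAGAIN;
}

/* Self-check: nobody locks a fresh temp file, so the probe must
 * succeed on the first attempt. */
static int self_check(void)
{
    FILE *f = tmpfile();
    int ret = f ? lock_test_retry(fileno(f), 0, 1, 3) : -1;
    if (f) {
        fclose(f);
    }
    return ret;
}
```

Whether retrying is actually correct here depends on why the other
process still holds the lock; under pure "make check -j" overload a
bounded retry would likely be enough to hide the spurious failure.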



