On 18/11/2021 10:55, Emanuele Giuseppe Esposito wrote:
On 12/11/2021 15:40, Hanna Reitz wrote:
On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:
We want to be sure that the functions that write the child and
parent list of a bs are under BQL and drain.
BQL prevents from concurrent writings from the GS API, while
drains protect from I/O.
TODO: drains are missing in some functions using this assert.
Therefore a proper assertion will fail. Because adding drains
requires additional discussions, they will be added in future
series.
Signed-off-by: Emanuele Giuseppe Esposito <eespo...@redhat.com>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
---
block.c | 5 +++++
block/io.c | 11 +++++++++++
include/block/block_int-global-state.h | 10 +++++++++-
3 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/block.c b/block.c
index 41c5883c5c..94bff5c757 100644
--- a/block.c
+++ b/block.c
@@ -2734,12 +2734,14 @@ static void
bdrv_replace_child_noperm(BdrvChild *child,
if (child->klass->detach) {
child->klass->detach(child);
}
+ assert_bdrv_graph_writable(old_bs);
QLIST_REMOVE(child, next_parent);
I think this belongs above the .detach() call (and the QLIST_REMOVE()
belongs into the .detach() implementation, as done in
https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00240.html, which
has been merged to Kevin’s block branch).
Yes, I rebased on kwolf/block branch. Thank you for pointing that out.
}
child->bs = new_bs;
if (new_bs) {
+ assert_bdrv_graph_writable(new_bs);
QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
In both these places it’s a bit strange that the assertion is done on
the child nodes. The subgraph starting from them isn’t modified after
all, so their subgraph technically doesn’t need to be writable. I
think a single assertion on the parent node would be preferable.
I presume the problem with that is that we don’t have the parent node
here? Do we need a new BdrvChildClass method that performs this
assertion on the parent node?
Uhm I am not sure what you mean here.
Just to recap on how I see this: the assertion
assert_bdrv_graph_writable(bs) is basically used to make sure we are
protecting the write on some fields (childrens and parents lists in this
patch) of a given @bs. It should work like a rwlock: reading is allowed
to be concurrent, but a write should stop all readers to prevent
concurrency issues. This is achieved by draining.
I am thinking to add an additional explanation to
assert_bdrv_graph_writable header comment by saying
"Drains act as a rwlock: while reading is allowed to be concurrent from
all iothreads, when a write needs to be performed we need to stop
(drain) all involved iothreads from reading the graph, to avoid race
conditions."
Somethink like that.
Emanuele
Let's use the first case that you point out, old_bs (it's specular for
new_bs):
>> + assert_bdrv_graph_writable(old_bs);
>> QLIST_REMOVE(child, next_parent);
So old_bs should be the child "son" (child->bs), meaning old_bs->parents
contains the child. Therefore when a child is removed by old_bs, we need
to be sure we are doing it safely.
So we should check that if old_bs exists, old_bs should be drained, to
prevent any other iothread from reading the ->parents list that is being
updated.
The only thing to keep in mind in this case is that just wrapping a
drain around that won't be enough, because then the child won't be
included in the drain_end(old_bs). Therefore the right way to cover this
drain-wise once the assertion also checks for drains is:
drain_begin(old_bs)
assert_bdrv_graph_writable(old_bs)
QLIST_REMOVE(child, next_parent)
/* old_bs will be under drain_end, but not the child */
bdrv_parent_drained_end_single(child);
bdrv_drained_end(old_bs);
I think you agree on this so far.
Now I think your concern is related to the child "parent", namely
child->opaque. The problem is that in the .detach and .attach callbacks
we are firstly adding/removing the child from the list, and then calling
drain on the subtree. We would ideally need to do the opposite:
assert_bdrv_graph_writable(bs);
QLIST_REMOVE(child, next);
bdrv_unapply_subtree_drain(child, bs);
In this case I think this would actually work, because removing/adding
the child from the ->children list beforehand just prevents an
additional recursion call (I think, and the fact that tests are passing
seems to confirm my theory).
Of course you know this stuff better than me, so let me know if
something here is wrong.
/*
@@ -2940,6 +2942,7 @@ static int
bdrv_attach_child_noperm(BlockDriverState *parent_bs,
return ret;
}
+ assert_bdrv_graph_writable(parent_bs);
QLIST_INSERT_HEAD(&parent_bs->children, *child, next);
/*
* child is removed in bdrv_attach_child_common_abort(), so
don't care to
@@ -3140,6 +3143,7 @@ static void
bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child,
void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
{
assert(qemu_in_main_thread());
+ assert_bdrv_graph_writable(parent);
It looks to me like we have this assertion mainly because
bdrv_replace_child_noperm() doesn’t have a pointer to this parent
node. It’s a workaround, but we should have this in every path that
eventually ends up at bdrv_replace_child_noperm(), and that seems
rather difficult for the bdrv_replace_node() family of functions. That
to me sounds like it’d be good to have this as a BdrvChildClass function.
I think this assertion is wrong. There is no ->childrens or ->parents
manipulation here, it used to be in one of the function that it calls
internally, but now as you pointed out is moved to .attach and .detach.
So I will remove this.
Not sure about the BdrvChildClass function, feel free to elaborate more
if what I wrote above is wrong/does not make sense to you.
Thank you,
Emanuele
if (child == NULL) {
return;
}
@@ -4903,6 +4907,7 @@ static void
bdrv_remove_filter_or_cow_child_abort(void *opaque)
BdrvRemoveFilterOrCowChild *s = opaque;
BlockDriverState *parent_bs = s->child->opaque;
+ assert_bdrv_graph_writable(parent_bs);
QLIST_INSERT_HEAD(&parent_bs->children, s->child, next);
if (s->is_backing) {
parent_bs->backing = s->child;
diff --git a/block/io.c b/block/io.c
index f271ab3684..1c71e354d6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -740,6 +740,17 @@ void bdrv_drain_all(void)
bdrv_drain_all_end();
}
+void assert_bdrv_graph_writable(BlockDriverState *bs)
+{
+ /*
+ * TODO: this function is incomplete. Because the users of this
+ * assert lack the necessary drains, check only for BQL.
+ * Once the necessary drains are added,
+ * assert also for qatomic_read(&bs->quiesce_counter) > 0
+ */
+ assert(qemu_in_main_thread());
+}
+
/**
* Remove an active request from the tracked requests list
*
diff --git a/include/block/block_int-global-state.h
b/include/block/block_int-global-state.h
index d08e80222c..6bd7746409 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -316,4 +316,12 @@ void
bdrv_remove_aio_context_notifier(BlockDriverState *bs,
*/
void bdrv_drain_all_end_quiesce(BlockDriverState *bs);
-#endif /* BLOCK_INT_GLOBAL_STATE*/
+/**
+ * Make sure that the function is either running under
+ * drain and BQL. The latter protects from concurrent writings
“either ... and” sounds wrong to me. I’d drop the “either” or say
“running under both drain and BQL”.
Hanna
+ * from the GS API, while the former prevents concurrent reads
+ * from I/O.
+ */
+void assert_bdrv_graph_writable(BlockDriverState *bs);
+
+#endif /* BLOCK_INT_GLOBAL_STATE */