On 25.09.19 16:12, Vladimir Sementsov-Ogievskiy wrote: > 20.09.2019 18:27, Max Reitz wrote: >> Signed-off-by: Max Reitz <mre...@redhat.com> >> --- >> block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 62 insertions(+) >> >> diff --git a/block/quorum.c b/block/quorum.c >> index 207054a64e..81b57dbae2 100644 >> --- a/block/quorum.c >> +++ b/block/quorum.c >> @@ -825,6 +825,67 @@ static bool >> quorum_recurse_is_first_non_filter(BlockDriverState *bs, >> return false; >> } >> >> +static bool quorum_recurse_can_replace(BlockDriverState *bs, >> + BlockDriverState *to_replace) >> +{ >> + BDRVQuorumState *s = bs->opaque; >> + int i; >> + >> + for (i = 0; i < s->num_children; i++) { >> + /* >> + * We have no idea whether our children show the same data as >> + * this node (@bs). It is actually highly likely that >> + * @to_replace does not, because replacing a broken child is >> + * one of the main use cases here. >> + * >> + * We do know that the new BDS will match @bs, so replacing >> + * any of our children by it will be safe. It cannot change >> + * the data this quorum node presents to its parents. >> + * >> + * However, replacing @to_replace by @bs in any of our >> + * children's chains may change visible data somewhere in >> + * there. We therefore cannot recurse down those chains with >> + * bdrv_recurse_can_replace(). >> + * (More formally, bdrv_recurse_can_replace() requires that >> + * @to_replace will be replaced by something matching the @bs >> + * passed to it. We cannot guarantee that.) >> + * >> + * Thus, we can only check whether any of our immediate >> + * children matches @to_replace. >> + * >> + * (In the future, we might add a function to recurse down a >> + * chain that checks that nothing there cares about a change >> + * in data from the respective child in question. For >> + * example, most filters do not care when their child's data >> + * suddenly changes, as long as their parents do not care.) >> + */ >> + if (s->children[i].child->bs == to_replace) { >> + Error *local_err = NULL; >> + >> + /* >> + * We now have to ensure that there is no other parent >> + * that cares about replacing this child by a node with >> + * potentially different data. >> + */ >> + s->children[i].to_be_replaced = true; >> + bdrv_child_refresh_perms(bs, s->children[i].child, &local_err); >> + > > So we are trying to answer on a question "is it ok to replace" it, by > cheating on > permission system... Possibly, it's a problem of general design, and instead > of > examining one subtree, we should ask all parents of to_replace node, are > they > OK with such replacement..
I’m not sure whether it’s cheating. We want to replace some node. A parent should be A-OK with that as long as it hasn’t frozen its child link, and as long as it doesn’t care about data changes (it should not have taken CONSISTENT_READ, and it must have shared WRITE). The only actual problem we have is that currently basically everything takes CONSISTENT_READ (which is completely fine), but the only thing that doesn’t is the mirror_top_bs, and that has exactly the problem of “I can only get away without CONSISTENT_READ if it was me who unshared it”. But that’s a different problem. I don’t think this is cheating. > Another idea is that it's strange to check permissions somewhere else than in > generic > permission check functions. But I've no idea how to handle it in permission > system. I don’t check the permissions, though. I let quorum take what it needs to allow changing one of its children. What is a problem is that I should keep the permissions tightened until the node is actually replaced and only then release them. But that turned out to be a huge mess so I resorted to just double-checking before mirror actually completes. Max
signature.asc
Description: OpenPGP digital signature