Am 08.10.2015 um 13:02 hat Kevin Wolf geschrieben: > Am 08.10.2015 um 08:15 hat Markus Armbruster geschrieben: > > Max Reitz <mre...@redhat.com> writes: > > > E.g. you may have a block filter in the future where you want to > > > exchange its child BDS. This exchange should be an atomic operation, so > > > we cannot use this interface there anyway. For quorum, such an exchange > > > does not need to be atomic, since you can just add the new child first > > > and remove the old one afterwards. > > > > > > So maybe in the future we get some block driver other than quorum for > > > which adding and removing children (as opposed to atomically exchanging > > > them) makes sense, but for now I can only see quorum. Therefore, that > > > this works for quorum only is in my opinion not a reason to make it > > > experimental. I think we actually want to keep it that way. > > > > Are you telling us the existing interface is insufficiently general? > > That the general interface neeeds to support atomic replacement? > > > > If yes, why isn't the general interface is what we should do for quorum? > > Delete is atomic replacement by nothing, add is atomic replacement of > > nothing. > > The general thing is what we used to call "dynamic reconfiguration". If > we want a single command to enable it (which would be great if we > could), we need to some more design work first. Atomic replacement might > be the operation we're looking for, but we have to be sure. > > So far we haven't thought about dynamic reconfiguation enough that we > would know the right solution, but enough that we know it's hard. That > would be an argument for me that makes adding an x-* command now > acceptable. On the other hand, the fact that we want a single command in > the end makes me want to keep it experimental. > > > What types of dynamic reconfiguration do we need to support? I'll start > with a small list, feel free to extend it: > > * Replace a child node with another node. This works pretty much > everywhere in the tree - including the root, i.e. BlockBackend! > Working just on BDSes doesn't seem to be enough. > > * Adding a child to a node that can take additional children (e.g. > quorum can take an arbitrary number; but also COW image formats have > an option child, so you could add a backing file to a node originally > opened with BDRV_O_NO_BACKING) > > Same as atomically replacing nothing by a node. > > * Removing an optional child from a node that remains valid with that > child removed. The same examples apply. > > Same as atomically replacing a child by nothing. > > * Add a new node between two existing nodes. An example is taking a live > snapshot, which inserts a new node between the BlockBackend and the > first BDS. Or it could be used to insert a filter somewhere in the > graph. > > Same as creating the new node pointing to node B (or dynamically > adding it) and then atomically replacing the reference of node A that > pointed to B with a reference to the new node. > > * Add a new node between multiple existing nodes. This is one of the > examples we always used to discuss with dynamic reconfiguration: > > base <- sn1 <- sn2 <--+-- virtio-blk > | > +-- NBD server > > Adding a filter could result in this: > > base <- sn1 <- sn2 <- throttling <--+-- virtio-blk > | > +-- NBD server > > Or this: > > base <- sn1 <- sn2 <--+-- throttling <- virtio-blk > | > +-- NBD server > > Or this: > > base <- sn1 <- sn2 <--+-- virtio-blk > | > +-- throttling <- NBD server > > All of these are different kinds of "adding a filter", and all of > them are valid operations that a user could want to perform. > > Case 2 and 3 are really just "add a new node between two existing > nodes", as explained above. > > Case 1 is new: We still create the throttling node so that it already > points to sn2, but now we have to atomically change the children of > two things (the BlockBackends of virtio-blk and the NBD server). Not a > problem per se because we can just do that, but it raises the question > whether the atomic replacement operation needs to be transactionable. > > * Remove a node between two (or more) other nodes. > > Same as atomically replacing a child by a grandchild. For more than > two involved nodes, again a transactional version might be needed. > > So at the first sight, this operation seems to work as the general > interface for dynamic reconfiguration.
And actually, after re-reading the other branch of this email thread where I replied to Berto that he wants to use bdrv_reopen() to change the voting threshold for quorum, it occurred to me that bdrv_reopen() should possibly also be this atomatic replacement function. After all, the references to other nodes are blockdev-add options, and if we implement bdrv_reopen() fully so that it can change any options that can be given to blockdev-add, that means that we must be able to replace children. Kevin > One problem we discussed earlier that I'm not sure whether it's related > is filter nodes inserted automatically once we change e.g. the I/O > throttling QMP commands to add a throttling filter BDS to the graph. > > If the user creates nodes A and B, but sets throttling options, they > might end up with a chain like this: > > A <- throttling <- B > > Now imagine that they want to add another filter between A and B, let's > say blkdebug. They would need to know that they have to insert the new > node between A and throttling or B and throttling, but not between A and > B. If they tried to insert it between A and B, the algorithm above says > that they would let blkdebug point to A, and replace B's child with > blkdebug, the resulting tree wouldn't be throttled any more! > > A <- blkdebug <- B > > throttling (ref = 0 -> delete) > > Even if they knew that they have to consider the throttling node, it > currently wouldn't have a node-name, and with Jeff's autogenerated names > it wouldn't be predictable. > > Maybe the dynamic reconfiguration interface does need to be a bit > cleverer. > > > Anyway, after writing all of this, I'm almost convinced now that an > experimental interface is the right thing to do in this patch series. > > Kevin >