Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
On Fri, Jul 28, 2017 at 01:55:38PM +0200, Kevin Wolf wrote:
> Am 28.07.2017 um 00:09 hat John Snow geschrieben:
> > On 07/26/2017 02:23 PM, Manos Pitsidianakis wrote:
> > > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
> > > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> >
> > I think I agree with Stefan's concerns that this is more of a
> > temporary workaround for the benefit of jobs, but that there may well
> > be other reasons (especially in the future) where graph manipulations
> > may occur invisibly to the user.
>
> We should do our best to avoid this. Implicit graph changes only make
> libvirt's life hard. I agree that we'll have many more graph changes in
> the future, but let's keep them as explicit as possible.

Speaking of implicit graph changes: the COLO replication driver
(block/replication.c) uses the backup job internally, and that requires
before-write notifiers. That means there is an implicit graph change when
COLO decides to complete the backup job.

Stefan
Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
On Tue, Aug 01, 2017 at 03:50:36PM +0200, Kevin Wolf wrote: Am 31.07.2017 um 19:30 hat Manos Pitsidianakis geschrieben: On Fri, Jul 28, 2017 at 02:08:43PM +0200, Kevin Wolf wrote: > Am 27.07.2017 um 12:07 hat Stefan Hajnoczi geschrieben: > > On Wed, Jul 26, 2017 at 09:23:20PM +0300, Manos Pitsidianakis wrote: > > > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote: > > > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote: > > > > > This proposal follows a discussion we had with Kevin and Stefan on filter > > > > > node management. > > > > > > > > > > With block filter drivers arises a need to configure filter nodes on runtime > > > > > with QMP on live graphs. A problem with doing live graph modifications is > > > > > that some block jobs modify the graph when they are done and don't operate > > > > > under any synchronisation, resulting in a race condition if we try to insert > > > > > a filter node in the place of an existing edge. > > > > > > > > But maybe you are thinking about higher level race conditions between > > > > QMP commands. Can you give an example of the race? > > > > > > Exactly, synchronisation between the QMP/user actions. Here's an example, > > > correct me if I'm wrong: > > > > > > User issues a block-commit to this tree to commit A onto B: > > >BB -> A -> B > > > > > > This inserts the dummy mirror node at the top since this is a mirror job > > > (active commit) > > >BB -> dummy -> A -> B > > > > > > User wants to add a filter node F below the BB. First we create the node: > > >BB -> dummy -> A -> B > > > F / > > > > > > Commit finishes before we do block-insert-node, dummy and A is removed from > > > the BB's chain, mirror_exit calls bdrv_replace_node() > > > > > >BB > B > > >F -> / > > > > > > Now inserting F under BB will error since dummy does not exist any more. > > > > I see the diagrams but the flow isn't clear without the user's QMP > > commands. > > > > F is created using blockdev-add? 
What are the parameters and how does > > it know about dummy? I think this is an interesting question in itself > > because dummy is a transient internal node that QMP users don't > > necessarily know about. > > We can expect that users of block-insert-node also know about block job > filter nodes, simply because the former is newer than the latter. > > (I also don't like the name "dummy", this makes it sound like it's not > really a proper node. In reality, there is little reason to treat it > specially.) > > > What is the full block-insert-node command? > > > > > The proposal doesn't guard against the following: > > > User issues a block-commit to this tree to commit B onto C: > > >BB -> A -> B -> C > > > > > > The dummy node (commit's top bs is A): > > >BB -> A -> dummy -> B -> C > > > > > > blockdev-add of a filter we wish to eventually be between A and C: > > >BB -> A -> dummy -> B -> C > > > F/ > > > > > > if commit finishes before we issue block-insert-node (commit_complete() > > > calls bdrv_set_backing_hd() which only touches A) > > >BB -> A --> C > > > F -> dummy -> B / > > >resulting in error when issuing block-insert-node, > > > > > > else if we set the job to manual-delete and insert F: > > >BB -> A -> F -> dummy -> B -> C > > > deleting the job will result in F being lost since the job's top bs wasn't > > > updated. > > > > manual-delete is a solution for block jobs. The write threshold > > feature is a plain QMP command that in the future will create a > > transient internal node (before write notifier). > > Yes, that becomes a legacy QMP command then. The new way is blockdev-add > and block-insert-node for write threshold, too. > > Apart from that, a write threshold node never disappears by itself, it > is only inserted or removed in the context of a QMP command. That makes > it a lot less dangerous than automatic block completion. > > > I'm not sure it makes sense to turn the write threshold feature into a > > block job. 
I guess it could work, but it seems a little unnatural and > > maybe there will be other features that use transient internal nodes. > > Yeah, no reason to do so. Just a manually created filter node. > Can you explain what you have in mind? The current workflow is using block-set-write-threshold on the targetted bs. If we want write threshold to be on node level we would want a new filter driver so that it can take options special to write-threshold. Yes, I think that's what we want. Unless we make the notify filter be a write threshold by default, and when using it internally by the backup job, disable the threshold and add our backup notifier to the node's list. Of course the current write threshold code could be refactored into a new driver so that it doesn't have to rely on notifiers. As discussed on IRC, we shouldn't implement a bad interface (which notifiers are, they bypass normal block device c
Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
Am 31.07.2017 um 19:30 hat Manos Pitsidianakis geschrieben: > On Fri, Jul 28, 2017 at 02:08:43PM +0200, Kevin Wolf wrote: > > Am 27.07.2017 um 12:07 hat Stefan Hajnoczi geschrieben: > > > On Wed, Jul 26, 2017 at 09:23:20PM +0300, Manos Pitsidianakis wrote: > > > > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote: > > > > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote: > > > > > > This proposal follows a discussion we had with Kevin and Stefan on > > > > > > filter > > > > > > node management. > > > > > > > > > > > > With block filter drivers arises a need to configure filter nodes > > > > > > on runtime > > > > > > with QMP on live graphs. A problem with doing live graph > > > > > > modifications is > > > > > > that some block jobs modify the graph when they are done and don't > > > > > > operate > > > > > > under any synchronisation, resulting in a race condition if we try > > > > > > to insert > > > > > > a filter node in the place of an existing edge. > > > > > > > > > > But maybe you are thinking about higher level race conditions between > > > > > QMP commands. Can you give an example of the race? > > > > > > > > Exactly, synchronisation between the QMP/user actions. Here's an > > > > example, > > > > correct me if I'm wrong: > > > > > > > > User issues a block-commit to this tree to commit A onto B: > > > >BB -> A -> B > > > > > > > > This inserts the dummy mirror node at the top since this is a mirror job > > > > (active commit) > > > >BB -> dummy -> A -> B > > > > > > > > User wants to add a filter node F below the BB. First we create the > > > > node: > > > >BB -> dummy -> A -> B > > > > F / > > > > > > > > Commit finishes before we do block-insert-node, dummy and A is removed > > > > from > > > > the BB's chain, mirror_exit calls bdrv_replace_node() > > > > > > > >BB > B > > > >F -> / > > > > > > > > Now inserting F under BB will error since dummy does not exist any more. 
> > > > > > I see the diagrams but the flow isn't clear without the user's QMP > > > commands. > > > > > > F is created using blockdev-add? What are the parameters and how does > > > it know about dummy? I think this is an interesting question in itself > > > because dummy is a transient internal node that QMP users don't > > > necessarily know about. > > > > We can expect that users of block-insert-node also know about block job > > filter nodes, simply because the former is newer than the latter. > > > > (I also don't like the name "dummy", this makes it sound like it's not > > really a proper node. In reality, there is little reason to treat it > > specially.) > > > > > What is the full block-insert-node command? > > > > > > > The proposal doesn't guard against the following: > > > > User issues a block-commit to this tree to commit B onto C: > > > >BB -> A -> B -> C > > > > > > > > The dummy node (commit's top bs is A): > > > >BB -> A -> dummy -> B -> C > > > > > > > > blockdev-add of a filter we wish to eventually be between A and C: > > > >BB -> A -> dummy -> B -> C > > > > F/ > > > > > > > > if commit finishes before we issue block-insert-node (commit_complete() > > > > calls bdrv_set_backing_hd() which only touches A) > > > >BB -> A --> C > > > > F -> dummy -> B / > > > >resulting in error when issuing block-insert-node, > > > > > > > > else if we set the job to manual-delete and insert F: > > > >BB -> A -> F -> dummy -> B -> C > > > > deleting the job will result in F being lost since the job's top bs > > > > wasn't > > > > updated. > > > > > > manual-delete is a solution for block jobs. The write threshold > > > feature is a plain QMP command that in the future will create a > > > transient internal node (before write notifier). > > > > Yes, that becomes a legacy QMP command then. The new way is blockdev-add > > and block-insert-node for write threshold, too. 
> > > > Apart from that, a write threshold node never disappears by itself, it > > is only inserted or removed in the context of a QMP command. That makes > > it a lot less dangerous than automatic block completion. > > > > > I'm not sure it makes sense to turn the write threshold feature into a > > > block job. I guess it could work, but it seems a little unnatural and > > > maybe there will be other features that use transient internal nodes. > > > > Yeah, no reason to do so. Just a manually created filter node. > > > > Can you explain what you have in mind? The current workflow is using > block-set-write-threshold on the targetted bs. If we want write threshold to > be on node level we would want a new filter driver so that it can take > options special to write-threshold. Yes, I think that's what we want. > Unless we make the notify filter be a write threshold by default, and > when using it internally by the backup job, disable the threshold and > add our backup notifier to the node's li
Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
On Fri, Jul 28, 2017 at 02:08:43PM +0200, Kevin Wolf wrote:
Am 27.07.2017 um 12:07 hat Stefan Hajnoczi geschrieben:
On Wed, Jul 26, 2017 at 09:23:20PM +0300, Manos Pitsidianakis wrote:
> On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> > > This proposal follows a discussion we had with Kevin and Stefan on
> > > filter node management.
> > >
> > > With block filter drivers arises a need to configure filter nodes at
> > > runtime with QMP on live graphs. A problem with doing live graph
> > > modifications is that some block jobs modify the graph when they are
> > > done and don't operate under any synchronisation, resulting in a race
> > > condition if we try to insert a filter node in the place of an
> > > existing edge.
> >
> > But maybe you are thinking about higher level race conditions between
> > QMP commands. Can you give an example of the race?
>
> Exactly, synchronisation between the QMP/user actions. Here's an example,
> correct me if I'm wrong:
>
> User issues a block-commit to this tree to commit A onto B:
>    BB -> A -> B
>
> This inserts the dummy mirror node at the top since this is a mirror job
> (active commit):
>    BB -> dummy -> A -> B
>
> User wants to add a filter node F below the BB. First we create the node:
>    BB -> dummy -> A -> B
>    F /
>
> Commit finishes before we do block-insert-node; dummy and A are removed
> from the BB's chain, and mirror_exit calls bdrv_replace_node():
>    BB > B
>    F -> /
>
> Now inserting F under BB will error since dummy does not exist any more.

I see the diagrams but the flow isn't clear without the user's QMP
commands.

F is created using blockdev-add? What are the parameters and how does it
know about dummy? I think this is an interesting question in itself
because dummy is a transient internal node that QMP users don't
necessarily know about.
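Spelled out on the QMP wire, the racing sequence in this example might look something like the following. The throttle filter options and, especially, the block-insert-node arguments are illustrative guesses only; the shape of that command is exactly what is being debated in this thread:

```json
{ "execute": "block-commit",
  "arguments": { "device": "drive0" } }

{ "execute": "blockdev-add",
  "arguments": { "driver": "throttle", "node-name": "F",
                 "throttle-group": "tg0", "file": "dummy" } }

{ "execute": "block-insert-node",
  "arguments": { "device": "drive0", "node-name": "F" } }
```

If BLOCK_JOB_COMPLETED for the commit arrives between the second and third commands, "dummy" is gone and the insert fails.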
We can expect that users of block-insert-node also know about block job
filter nodes, simply because the former is newer than the latter.

(I also don't like the name "dummy"; this makes it sound like it's not
really a proper node. In reality, there is little reason to treat it
specially.)

What is the full block-insert-node command?

> The proposal doesn't guard against the following:
> User issues a block-commit to this tree to commit B onto C:
>    BB -> A -> B -> C
>
> The dummy node (commit's top bs is A):
>    BB -> A -> dummy -> B -> C
>
> blockdev-add of a filter we wish to eventually be between A and C:
>    BB -> A -> dummy -> B -> C
>    F/
>
> If commit finishes before we issue block-insert-node (commit_complete()
> calls bdrv_set_backing_hd(), which only touches A):
>    BB -> A --> C
>    F -> dummy -> B /
> resulting in an error when issuing block-insert-node;
>
> else, if we set the job to manual-delete and insert F:
>    BB -> A -> F -> dummy -> B -> C
> deleting the job will result in F being lost, since the job's top bs
> wasn't updated.

manual-delete is a solution for block jobs. The write threshold feature
is a plain QMP command that in the future will create a transient
internal node (before-write notifier).

Yes, that becomes a legacy QMP command then. The new way is blockdev-add
and block-insert-node for write threshold, too.

Apart from that, a write threshold node never disappears by itself; it
is only inserted or removed in the context of a QMP command. That makes
it a lot less dangerous than automatic block completion.

I'm not sure it makes sense to turn the write threshold feature into a
block job. I guess it could work, but it seems a little unnatural and
maybe there will be other features that use transient internal nodes.

Yeah, no reason to do so. Just a manually created filter node.

Can you explain what you have in mind? The current workflow is using
block-set-write-threshold on the targeted bs.
If we want write threshold to be on node level, we would want a new
filter driver so that it can take options special to write-threshold.
Unless we make the notify filter be a write threshold by default, and
when using it internally by the backup job, disable the threshold and
add our backup notifier to the node's list. Of course, the current
write threshold code could be refactored into a new driver so that it
doesn't have to rely on notifiers.

The way I've currently done the conversion is to add an implicit filter
when enabling the threshold, just like in backup jobs.
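For concreteness, the existing workflow is the plain command on a node, while the node-level variant discussed here would be a filter created with blockdev-add. The first command below exists today; the "write-threshold" filter driver in the second is hypothetical, a sketch of what such a driver's options could look like:

```json
{ "execute": "block-set-write-threshold",
  "arguments": { "node-name": "node0", "write-threshold": 10737418240 } }

{ "execute": "blockdev-add",
  "arguments": { "driver": "write-threshold", "node-name": "wt0",
                 "write-threshold": 10737418240, "file": "node0" } }
```

The filter form would make the threshold an explicit, user-visible graph node instead of a before-write notifier hidden inside the node.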
Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
On Fri, Jul 28, 2017 at 02:08:43PM +0200, Kevin Wolf wrote: > Am 27.07.2017 um 12:07 hat Stefan Hajnoczi geschrieben: > > On Wed, Jul 26, 2017 at 09:23:20PM +0300, Manos Pitsidianakis wrote: > > > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote: > > > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote: > > > > > This proposal follows a discussion we had with Kevin and Stefan on > > > > > filter > > > > > node management. > > > > > > > > > > With block filter drivers arises a need to configure filter nodes on > > > > > runtime > > > > > with QMP on live graphs. A problem with doing live graph > > > > > modifications is > > > > > that some block jobs modify the graph when they are done and don't > > > > > operate > > > > > under any synchronisation, resulting in a race condition if we try to > > > > > insert > > > > > a filter node in the place of an existing edge. > > > > > > > > But maybe you are thinking about higher level race conditions between > > > > QMP commands. Can you give an example of the race? > > > > > > Exactly, synchronisation between the QMP/user actions. Here's an example, > > > correct me if I'm wrong: > > > > > > User issues a block-commit to this tree to commit A onto B: > > >BB -> A -> B > > > > > > This inserts the dummy mirror node at the top since this is a mirror job > > > (active commit) > > >BB -> dummy -> A -> B > > > > > > User wants to add a filter node F below the BB. First we create the node: > > >BB -> dummy -> A -> B > > > F / > > > > > > Commit finishes before we do block-insert-node, dummy and A is removed > > > from > > > the BB's chain, mirror_exit calls bdrv_replace_node() > > > > > >BB > B > > >F -> / > > > > > > Now inserting F under BB will error since dummy does not exist any more. > > > > I see the diagrams but the flow isn't clear without the user's QMP > > commands. > > > > F is created using blockdev-add? What are the parameters and how does > > it know about dummy? 
I think this is an interesting question in itself
> > because dummy is a transient internal node that QMP users don't
> > necessarily know about.
>
> We can expect that users of block-insert-node also know about block job
> filter nodes, simply because the former is newer than the latter.
>
> (I also don't like the name "dummy", this makes it sound like it's not
> really a proper node. In reality, there is little reason to treat it
> specially.)
>
> > What is the full block-insert-node command?
> >
> > > The proposal doesn't guard against the following:
> > > User issues a block-commit to this tree to commit B onto C:
> > >    BB -> A -> B -> C
> > >
> > > The dummy node (commit's top bs is A):
> > >    BB -> A -> dummy -> B -> C
> > >
> > > blockdev-add of a filter we wish to eventually be between A and C:
> > >    BB -> A -> dummy -> B -> C
> > >    F/
> > >
> > > if commit finishes before we issue block-insert-node (commit_complete()
> > > calls bdrv_set_backing_hd() which only touches A)
> > >    BB -> A --> C
> > >    F -> dummy -> B /
> > > resulting in error when issuing block-insert-node,
> > >
> > > else if we set the job to manual-delete and insert F:
> > >    BB -> A -> F -> dummy -> B -> C
> > > deleting the job will result in F being lost since the job's top bs
> > > wasn't updated.
> >
> > manual-delete is a solution for block jobs. The write threshold
> > feature is a plain QMP command that in the future will create a
> > transient internal node (before write notifier).
>
> Yes, that becomes a legacy QMP command then. The new way is blockdev-add
> and block-insert-node for write threshold, too.
>
> Apart from that, a write threshold node never disappears by itself, it
> is only inserted or removed in the context of a QMP command. That makes
> it a lot less dangerous than automatic block completion.

Nice thinking - write threshold can be a node too!
Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
Am 27.07.2017 um 12:07 hat Stefan Hajnoczi geschrieben: > On Wed, Jul 26, 2017 at 09:23:20PM +0300, Manos Pitsidianakis wrote: > > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote: > > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote: > > > > This proposal follows a discussion we had with Kevin and Stefan on > > > > filter > > > > node management. > > > > > > > > With block filter drivers arises a need to configure filter nodes on > > > > runtime > > > > with QMP on live graphs. A problem with doing live graph modifications > > > > is > > > > that some block jobs modify the graph when they are done and don't > > > > operate > > > > under any synchronisation, resulting in a race condition if we try to > > > > insert > > > > a filter node in the place of an existing edge. > > > > > > But maybe you are thinking about higher level race conditions between > > > QMP commands. Can you give an example of the race? > > > > Exactly, synchronisation between the QMP/user actions. Here's an example, > > correct me if I'm wrong: > > > > User issues a block-commit to this tree to commit A onto B: > >BB -> A -> B > > > > This inserts the dummy mirror node at the top since this is a mirror job > > (active commit) > >BB -> dummy -> A -> B > > > > User wants to add a filter node F below the BB. First we create the node: > >BB -> dummy -> A -> B > > F / > > > > Commit finishes before we do block-insert-node, dummy and A is removed from > > the BB's chain, mirror_exit calls bdrv_replace_node() > > > >BB > B > >F -> / > > > > Now inserting F under BB will error since dummy does not exist any more. > > I see the diagrams but the flow isn't clear without the user's QMP > commands. > > F is created using blockdev-add? What are the parameters and how does > it know about dummy? I think this is an interesting question in itself > because dummy is a transient internal node that QMP users don't > necessarily know about. 
We can expect that users of block-insert-node also know about block job filter nodes, simply because the former is newer than the latter. (I also don't like the name "dummy", this makes it sound like it's not really a proper node. In reality, there is little reason to treat it specially.) > What is the full block-insert-node command? > > > The proposal doesn't guard against the following: > > User issues a block-commit to this tree to commit B onto C: > >BB -> A -> B -> C > > > > The dummy node (commit's top bs is A): > >BB -> A -> dummy -> B -> C > > > > blockdev-add of a filter we wish to eventually be between A and C: > >BB -> A -> dummy -> B -> C > > F/ > > > > if commit finishes before we issue block-insert-node (commit_complete() > > calls bdrv_set_backing_hd() which only touches A) > >BB -> A --> C > > F -> dummy -> B / > >resulting in error when issuing block-insert-node, > > > > else if we set the job to manual-delete and insert F: > >BB -> A -> F -> dummy -> B -> C > > deleting the job will result in F being lost since the job's top bs wasn't > > updated. > > manual-delete is a solution for block jobs. The write threshold > feature is a plain QMP command that in the future will create a > transient internal node (before write notifier). Yes, that becomes a legacy QMP command then. The new way is blockdev-add and block-insert-node for write threshold, too. Apart from that, a write threshold node never disappears by itself, it is only inserted or removed in the context of a QMP command. That makes it a lot less dangerous than automatic block completion. > I'm not sure it makes sense to turn the write threshold feature into a > block job. I guess it could work, but it seems a little unnatural and > maybe there will be other features that use transient internal nodes. Yeah, no reason to do so. Just a manually created filter node. > What I'm getting at is that there might be alternative to manual-delete > that work in the general case. 
> For example, if blockdev-add + block-insert-node are part of a QMP
> 'transaction' command, can we lock the graph to protect it against
> modifications?

We thought about this at the last STR meeting. However, not for much
longer than a minute. Markus and I both agreed that we wouldn't be the
ones to try and make blockdev-add transactionable. It sounds like a huge
amount of work and a high risk of bugs for little benefit.

Let's just make sure that we provide a way for management tools to avoid
any surprise changes in the block graph.

Kevin
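The transactional variant Stefan floats would presumably look something like this; it is purely illustrative, since (as the reply explains) neither blockdev-add nor the proposed block-insert-node is a supported transaction action:

```json
{ "execute": "transaction",
  "arguments": { "actions": [
      { "type": "blockdev-add",
        "data": { "driver": "throttle", "node-name": "F",
                  "throttle-group": "tg0", "file": "dummy" } },
      { "type": "block-insert-node",
        "data": { "device": "drive0", "node-name": "F" } } ] } }
```

The attraction would be atomicity: either both the node creation and the insertion happen before any job-completion graph change, or neither does.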
Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
Am 28.07.2017 um 00:09 hat John Snow geschrieben:
> On 07/26/2017 02:23 PM, Manos Pitsidianakis wrote:
> > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
> > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> > > > This proposal follows a discussion we had with Kevin and Stefan on
> > > > filter node management.
> > > >
> > > > With block filter drivers arises a need to configure filter nodes
> > > > at runtime with QMP on live graphs. A problem with doing live
> > > > graph modifications is that some block jobs modify the graph when
> > > > they are done and don't operate under any synchronisation,
> > > > resulting in a race condition if we try to insert a filter node in
> > > > the place of an existing edge.
> > >
> > > But maybe you are thinking about higher level race conditions
> > > between QMP commands. Can you give an example of the race?
> >
> > Exactly, synchronisation between the QMP/user actions. Here's an
> > example, correct me if I'm wrong:
> >
> > User issues a block-commit to this tree to commit A onto B:
> >    BB -> A -> B
> >
> > This inserts the dummy mirror node at the top since this is a mirror
> > job (active commit):
> >    BB -> dummy -> A -> B
> >
> > User wants to add a filter node F below the BB. First we create the
> > node:
> >    BB -> dummy -> A -> B
> >    F /
> >
> > Commit finishes before we do block-insert-node; dummy and A are
> > removed from the BB's chain, and mirror_exit calls
> > bdrv_replace_node():
> >    BB > B
> >    F -> /
> >
> > Now inserting F under BB will error since dummy does not exist any
> > more.

So the basic scenario is the one from the last section of the STR block
meeting notes:
http://lists.gnu.org/archive/html/qemu-block/2016-12/msg00045.html

At the time, this scenario seemed to make sense, but in fact, as your
explanation shows, it's not a real problem.
The theory that inserting the filter would bring the mirror_top node
back is false, because removing the mirror_top node updates _all_ of its
parents (including the new throttling node), rather than just the BB, to
point to the BDS below.

However, whether block-insert-node actually fails depends on how you
address the edge(s) on which you want to insert a filter node. We would
only get an error if you specify it as "insert filter above this node".
However, this is not a way to unambiguously reference a single edge; you
could only update _all_ incoming edges for a node at once. Therefore I
don't think this is a good way to do things.

An unambiguous way to identify a single edge is the name of the parent
node (usually a node-name, but possibly a BB name - they share a single
namespace, so having one QMP argument for both could be enough) and of
the child (as in BdrvChild), e.g. "insert as new child=backing of
node=myimage". This way, the example scenario would actually just work.

> Exactly how much of a problem is this? You, the user, have instructed
> QEMU to do two conflicting things:
>
> (1) Squash A down into B, and
> (2) Insert a filter above A
>
> Shame on you for deleting the node you were trying to reference, no?
> And as an added bonus, QEMU will even error out on your command instead
> of trying to do something and getting it wrong.
>
> Is this a problem?

I don't think these two actions can reasonably be considered
conflicting; adding a filter on top doesn't influence the commit
operation at all.

We used to just forbid everything while doing block jobs, but these
operations are block jobs because they are long-running, and just
forbidding everything for a long time isn't very helpful. There is no
reason why a user shouldn't be able to take a snapshot while they are
creating a backup, etc.
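In QMP terms, an edge-addressed insert along these lines might look like the following; the argument names are only a sketch of the parent/child addressing idea, not an agreed interface:

```json
{ "execute": "block-insert-node",
  "arguments": { "node": "myimage", "child": "backing",
                 "filter-node-name": "F" } }
```

Because the edge is named by its parent ("myimage") and child role ("backing") rather than by the transient node currently sitting below it, the command survives the mirror node disappearing in the meantime.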
As I said above, filter nodes automagically going away at some point
without an explicit user command isn't necessarily a problem, but I also
don't feel very comfortable with it. It certainly feels like an easy way
to introduce bugs. Things should become much easier to manage when all
graph modifications are explicit.

I'm surprised that you seem to be opposed to an explicit job-delete now,
as we already discussed this before and it would solve several problems:

* Block job completion issues only an event. If the management tool
  reconnects, it has no way to get the same information with a query-*
  command. If jobs stay around until job-delete, this is not a problem.

* Completion is usually rather complicated with all of the graph changes
  and involves operations that can fail, but we don't have a way to
  actually report errors to the user. job-delete could do that.

* And finally, the management problem we're discussing here. Yes, if I
  want to snapshot below a mirror node and the mirror goes away
  immediately before I send the command, libvirt will just get an error
  and can process all events and update its graph view and then retry
  with another node name. Though even if it's po
Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
On Thu, Jul 27, 2017 at 06:09:04PM -0400, John Snow wrote:
> On 07/26/2017 02:23 PM, Manos Pitsidianakis wrote:
> > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
> > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> > > > This proposal follows a discussion we had with Kevin and Stefan on
> > > > filter node management.
> > > >
> > > > With block filter drivers arises a need to configure filter nodes
> > > > at runtime with QMP on live graphs. A problem with doing live
> > > > graph modifications is that some block jobs modify the graph when
> > > > they are done and don't operate under any synchronisation,
> > > > resulting in a race condition if we try to insert a filter node in
> > > > the place of an existing edge.
> > >
> > > But maybe you are thinking about higher level race conditions
> > > between QMP commands. Can you give an example of the race?
> >
> > Exactly, synchronisation between the QMP/user actions. Here's an
> > example, correct me if I'm wrong:
> >
> > User issues a block-commit to this tree to commit A onto B:
> >    BB -> A -> B
> >
> > This inserts the dummy mirror node at the top since this is a mirror
> > job (active commit):
> >    BB -> dummy -> A -> B
> >
> > User wants to add a filter node F below the BB. First we create the
> > node:
> >    BB -> dummy -> A -> B
> >    F /
> >
> > Commit finishes before we do block-insert-node; dummy and A are
> > removed from the BB's chain, and mirror_exit calls
> > bdrv_replace_node():
> >    BB > B
> >    F -> /
> >
> > Now inserting F under BB will error since dummy does not exist any
> > more.
>
> Exactly how much of a problem is this? You, the user, have instructed
> QEMU to do two conflicting things:
>
> (1) Squash A down into B, and
> (2) Insert a filter above A
>
> Shame on you for deleting the node you were trying to reference, no?
> And as an added bonus, QEMU will even error out on your command instead
> of trying to do something and getting it wrong.
>
> Is this a problem?
> > The proposal doesn't guard against the following:
> > User issues a block-commit to this tree to commit B onto C:
> >    BB -> A -> B -> C
> >
> > The dummy node (commit's top bs is A):
> >    BB -> A -> dummy -> B -> C
> >
> > blockdev-add of a filter we wish to eventually be between A and C:
> >    BB -> A -> dummy -> B -> C
> >    F/
> >
> > if commit finishes before we issue block-insert-node
> > (commit_complete() calls bdrv_set_backing_hd() which only touches A)
> >    BB -> A --> C
> >    F -> dummy -> B /
> > resulting in error when issuing block-insert-node,
>
> Because "B" and "dummy" have already actually been deleted, I assume.
> (The diagram is confusing.)

Because we have an edge (A, C) and C is not a child of F, so we cannot
insert it as a child to A. (I rotated the usual ascii art by 90 degrees
to save me from handling whitespace, but I might have done it worse.)

> > else if we set the job to manual-delete and insert F:
> >    BB -> A -> F -> dummy -> B -> C
> > deleting the job will result in F being lost since the job's top bs
> > wasn't updated.
>
> Seems like a fairly good reason not to pursue this avenue then, yes?

I think this is just a different problem: not a race, but one of
broadcasting graph operations to affected parties (block jobs).

> I think I agree with Stefan's concerns that this is more of a temporary
> workaround for the benefit of jobs, but that there may well be other
> reasons (especially in the future) where graph manipulations may occur
> invisibly to the user.
>
> Do you have any use cases for failure here that don't involve the user
> themselves asking for two conflicting things? That is, QEMU doing some
> graph reorganization without the user's knowledge coupled with the user
> asking for a new filter node?
>
> Are there any failures that result in anything more dangerous or bad
> than a "404: node not found" style error where we simply reject the
> premise of what the user was attempting to accomplish?

I couldn't come up with a case that caused catastrophic failure, no
(though I lack familiarity with block jobs). If this isn't a problem
after all, good news.
Basically, block-insert-node needs to work without dangerous conflicts
with other graph operations. Is there a case where we don't just get an
error and cannot recover?
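The failure mode discussed above can be sketched with a toy graph model. This is illustrative Python, not QEMU code; the dict-based graph, the `block_insert_node()` helper, and its edge check are all assumptions made for the sketch:

```python
# Toy model of the race: the graph maps each node to its single child.
# All names and helpers here are illustrative, not QEMU APIs.

class NodeNotFound(Exception):
    pass

def block_insert_node(graph, parent, node):
    """Insert `node` into the edge between `parent` and the child that
    `node` was created on top of; fail if that edge no longer exists."""
    want_child = graph[node]            # F was created on top of dummy
    if graph.get(parent) != want_child:
        raise NodeNotFound("edge (%s, %s) no longer exists"
                           % (parent, want_child))
    graph[parent] = node

# BB -> dummy -> A -> B, plus F (from blockdev-add) with child dummy:
graph = {"BB": "dummy", "dummy": "A", "A": "B", "B": None, "F": "dummy"}

# The active commit completes first: mirror_exit's bdrv_replace_node()
# drops dummy and A from the BB's chain.
graph["BB"] = "B"
del graph["A"]

# block-insert-node now errors: (BB, dummy) is not an edge any more.
try:
    block_insert_node(graph, "BB", "F")
except NodeNotFound as err:
    print("block-insert-node failed:", err)
```

Had the insert run before the job completed, the same call would have succeeded, which is exactly the ordering dependency being debated.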
Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
On 07/26/2017 02:23 PM, Manos Pitsidianakis wrote:
> On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> > > This proposal follows a discussion we had with Kevin and Stefan on
> > > filter node management.
> > >
> > > With block filter drivers arises a need to configure filter nodes on
> > > runtime with QMP on live graphs. A problem with doing live graph
> > > modifications is that some block jobs modify the graph when they are
> > > done and don't operate under any synchronisation, resulting in a race
> > > condition if we try to insert a filter node in the place of an
> > > existing edge.
> >
> > But maybe you are thinking about higher level race conditions between
> > QMP commands. Can you give an example of the race?
>
> Exactly, synchronisation between the QMP/user actions. Here's an
> example, correct me if I'm wrong:
>
> User issues a block-commit to this tree to commit A onto B:
>    BB -> A -> B
>
> This inserts the dummy mirror node at the top since this is a mirror
> job (active commit)
>    BB -> dummy -> A -> B
>
> User wants to add a filter node F below the BB. First we create the
> node:
>    BB -> dummy -> A -> B
>       F /
>
> Commit finishes before we do block-insert-node, dummy and A is removed
> from the BB's chain, mirror_exit calls bdrv_replace_node()
>
>    BB ---------> B
>    F -> dummy /
>
> Now inserting F under BB will error since dummy does not exist any
> more.

Exactly how much of a problem is this? You, the user, have instructed
QEMU to do two conflicting things:

(1) Squash A down into B, and
(2) Insert a filter above A

Shame on you for deleting the node you were trying to reference, no?

And as an added bonus, QEMU will even error out on your command instead
of trying to do something and getting it wrong.

Is this a problem?
> The proposal doesn't guard against the following:
>
> User issues a block-commit to this tree to commit B onto C:
>    BB -> A -> B -> C
>
> The dummy node (commit's top bs is A):
>    BB -> A -> dummy -> B -> C
>
> blockdev-add of a filter we wish to eventually be between A and C:
>    BB -> A -> dummy -> B -> C
>             F /
>
> if commit finishes before we issue block-insert-node (commit_complete()
> calls bdrv_set_backing_hd() which only touches A)
>    BB -> A --------> C
>       F -> dummy -> B /
> resulting in error when issuing block-insert-node,

Because "B" and "dummy" have already actually been deleted, I assume.
(The diagram is confusing.)

> else if we set the job to manual-delete and insert F:
>    BB -> A -> F -> dummy -> B -> C
> deleting the job will result in F being lost since the job's top bs
> wasn't updated.

Seems like a fairly good reason not to pursue this avenue then, yes?

I think I agree with Stefan's concerns that this is more of a temporary
workaround for the benefit of jobs, but that there may well be other
reasons (especially in the future) where graph manipulations may occur
invisibly to the user.

Do you have any use cases for failure here that don't involve the user
themselves asking for two conflicting things? That is, QEMU doing some
graph reorganization without the user's knowledge coupled with the user
asking for a new filter node?

Are there any failures that result in anything more dangerous or bad
than a "404: node not found" style error where we simply reject the
premise of what the user was attempting to accomplish?
Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
On Wed, Jul 26, 2017 at 09:23:20PM +0300, Manos Pitsidianakis wrote:
> On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> > > This proposal follows a discussion we had with Kevin and Stefan on
> > > filter node management.
> > >
> > > With block filter drivers arises a need to configure filter nodes on
> > > runtime with QMP on live graphs. A problem with doing live graph
> > > modifications is that some block jobs modify the graph when they are
> > > done and don't operate under any synchronisation, resulting in a race
> > > condition if we try to insert a filter node in the place of an
> > > existing edge.
> >
> > But maybe you are thinking about higher level race conditions between
> > QMP commands. Can you give an example of the race?
>
> Exactly, synchronisation between the QMP/user actions. Here's an
> example, correct me if I'm wrong:
>
> User issues a block-commit to this tree to commit A onto B:
>    BB -> A -> B
>
> This inserts the dummy mirror node at the top since this is a mirror
> job (active commit)
>    BB -> dummy -> A -> B
>
> User wants to add a filter node F below the BB. First we create the
> node:
>    BB -> dummy -> A -> B
>       F /
>
> Commit finishes before we do block-insert-node, dummy and A is removed
> from the BB's chain, mirror_exit calls bdrv_replace_node()
>
>    BB ---------> B
>    F -> dummy /
>
> Now inserting F under BB will error since dummy does not exist any
> more.

I see the diagrams but the flow isn't clear without the user's QMP
commands.

F is created using blockdev-add? What are the parameters and how does it
know about dummy? I think this is an interesting question in itself
because dummy is a transient internal node that QMP users don't
necessarily know about.

What is the full block-insert-node command?
> The proposal doesn't guard against the following:
>
> User issues a block-commit to this tree to commit B onto C:
>    BB -> A -> B -> C
>
> The dummy node (commit's top bs is A):
>    BB -> A -> dummy -> B -> C
>
> blockdev-add of a filter we wish to eventually be between A and C:
>    BB -> A -> dummy -> B -> C
>             F /
>
> if commit finishes before we issue block-insert-node (commit_complete()
> calls bdrv_set_backing_hd() which only touches A)
>    BB -> A --------> C
>       F -> dummy -> B /
> resulting in error when issuing block-insert-node,
>
> else if we set the job to manual-delete and insert F:
>    BB -> A -> F -> dummy -> B -> C
> deleting the job will result in F being lost since the job's top bs
> wasn't updated.

manual-delete is a solution for block jobs. The write threshold feature
is a plain QMP command that in the future will create a transient
internal node (before write notifier). I'm not sure it makes sense to
turn the write threshold feature into a block job. I guess it could
work, but it seems a little unnatural, and maybe there will be other
features that use transient internal nodes.

What I'm getting at is that there might be alternatives to manual-delete
that work in the general case. For example, if blockdev-add +
block-insert-node are part of a QMP 'transaction' command, can we lock
the graph to protect it against modifications?
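The lock-the-graph idea could look roughly like the toy sketch below. Everything here is hypothetical: QEMU has no such `Graph`, `transaction()`, or completion-deferral API today; the sketch only shows the ordering that a graph lock would buy, namely that a job completion arriving mid-transaction is applied after the filter has been inserted, so the filter survives:

```python
# Hypothetical sketch: a transaction holds a graph lock; job completions
# that would rewrite the graph are deferred until the lock is released.
from contextlib import contextmanager

class Graph:
    def __init__(self, edges):
        self.edges = dict(edges)   # node -> its single child
        self.locked = False
        self.deferred = []         # job completions queued while locked

    @contextmanager
    def transaction(self):
        """Hold the graph lock across a group of QMP commands."""
        self.locked = True
        try:
            yield self
        finally:
            self.locked = False
            while self.deferred:   # apply deferred completions in order
                self.deferred.pop(0)()

    def job_complete(self, fn):
        """A job completion only rewrites the graph when unlocked."""
        if self.locked:
            self.deferred.append(fn)
        else:
            fn()

    def replace(self, old, new):
        """bdrv_replace_node()-style: repoint all parents of `old`."""
        for parent, child in list(self.edges.items()):
            if child == old:
                self.edges[parent] = new
        self.edges.pop(old, None)

    def insert(self, parent, node, child):
        assert self.edges[parent] == child, "edge is gone"
        self.edges[parent] = node
        self.edges[node] = child

g = Graph({"BB": "dummy", "dummy": "A", "A": "B", "B": None})

def commit_done():                  # active commit of A onto B finishes
    g.replace("dummy", "B")
    g.edges.pop("A", None)

with g.transaction():
    g.job_complete(commit_done)     # completion arrives mid-transaction
    g.edges["F"] = None             # blockdev-add
    g.insert("BB", "F", "dummy")    # block-insert-node succeeds
# The deferred completion ran afterwards and F survived: BB -> F -> B
```

Because the deferred completion repoints every parent of dummy, F ends up directly on top of B once the transaction ends, instead of referencing a deleted node.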
Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
> On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> > This proposal follows a discussion we had with Kevin and Stefan on
> > filter node management.
> >
> > With block filter drivers arises a need to configure filter nodes on
> > runtime with QMP on live graphs. A problem with doing live graph
> > modifications is that some block jobs modify the graph when they are
> > done and don't operate under any synchronisation, resulting in a race
> > condition if we try to insert a filter node in the place of an
> > existing edge.
>
> But maybe you are thinking about higher level race conditions between
> QMP commands. Can you give an example of the race?

Exactly, synchronisation between the QMP/user actions. Here's an
example, correct me if I'm wrong:

User issues a block-commit to this tree to commit A onto B:
   BB -> A -> B

This inserts the dummy mirror node at the top since this is a mirror job
(active commit):
   BB -> dummy -> A -> B

User wants to add a filter node F below the BB. First we create the
node:
   BB -> dummy -> A -> B
      F /

Commit finishes before we do block-insert-node; dummy and A are removed
from the BB's chain, and mirror_exit calls bdrv_replace_node():

   BB ---------> B
   F -> dummy /

Now inserting F under BB will error since dummy does not exist any more.

The proposal doesn't guard against the following:

User issues a block-commit to this tree to commit B onto C:
   BB -> A -> B -> C

The dummy node (commit's top bs is A):
   BB -> A -> dummy -> B -> C

blockdev-add of a filter we wish to eventually be between A and C:
   BB -> A -> dummy -> B -> C
            F /

If commit finishes before we issue block-insert-node (commit_complete()
calls bdrv_set_backing_hd(), which only touches A):
   BB -> A --------> C
      F -> dummy -> B /
this results in an error when issuing block-insert-node,

else if we set the job to manual-delete and insert F:
   BB -> A -> F -> dummy -> B -> C
deleting the job will result in F being lost since the job's top bs
wasn't updated.
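The manual-delete pitfall in the second scenario can be sketched as follows. The `job` dict, the graph encoding, and the way completion is modeled are all illustrative assumptions, not QEMU code; the point is only that the job keeps a stale reference to the top bs it recorded when it started:

```python
# Illustrative sketch of the manual-delete pitfall (not QEMU code).
# BB -> A -> dummy -> B -> C, committing B onto C; the job recorded
# its top bs (A) and base (C) when it started.
graph = {"BB": "A", "A": "dummy", "dummy": "B", "B": "C", "C": None}
job = {"top": "A", "base": "C"}

# With manual-delete set, the user inserts F between A and dummy:
graph["A"] = "F"
graph["F"] = "dummy"

# block-job-delete: completion does the equivalent of
# bdrv_set_backing_hd(top, base) on the stale top bs, bypassing F.
graph[job["top"]] = job["base"]
for node in ("F", "dummy", "B"):     # no longer in the BB's chain
    del graph[node]

print(graph)                         # F is lost: BB -> A -> C
```

Because the job never learned that F was spliced in below A, deleting the job reattaches A directly to C and the filter silently drops out of the chain.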
Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> This proposal follows a discussion we had with Kevin and Stefan on
> filter node management.
>
> With block filter drivers arises a need to configure filter nodes on
> runtime with QMP on live graphs. A problem with doing live graph
> modifications is that some block jobs modify the graph when they are
> done and don't operate under any synchronisation, resulting in a race
> condition if we try to insert a filter node in the place of an existing
> edge.

Block jobs *do* operate under synchronization. They only manipulate the
graph from the main loop (under the QEMU global mutex, just like a
monitor command).

But maybe you are thinking about higher level race conditions between
QMP commands. Can you give an example of the race?

CCing John Snow (interested in commands for 'reaping' completed block
jobs) and Jeff Cody (block jobs maintainer).
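The distinction Stefan draws can be sketched with a toy model: each individual command below is serialized under one lock (standing in for the QEMU global mutex), so there is no low-level data race, yet a two-command user workflow can still fail because the job completes between the commands. All names here are illustrative, not QEMU APIs:

```python
# Toy model: per-command locking does not prevent higher-level races
# between QMP commands.  Names are illustrative, not QEMU APIs.
import threading

lock = threading.Lock()              # stands in for the global mutex
graph = {"BB": "dummy", "dummy": "A", "A": "B", "B": None}

def qmp_node_exists(name):           # command 1 of the user's workflow
    with lock:
        return name in graph

def job_completion():                # runs in the main loop, under lock
    with lock:
        graph["BB"] = "B"
        del graph["dummy"], graph["A"]

def qmp_insert_below_bb(node, child):    # command 2 of the workflow
    with lock:
        if graph.get("BB") != child:
            return "error: edge gone"
        graph[node] = child
        graph["BB"] = node
        return "ok"

assert qmp_node_exists("dummy")      # user checks, everything looks fine
job_completion()                     # ...but the job completes in between
print(qmp_insert_below_bb("F", "dummy"))   # prints "error: edge gone"
```

Every graph mutation happens under the lock, exactly as Stefan describes for the main loop; the race Manos is worried about lives one level up, in the gap between two separately-locked commands.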