Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On Mon, 27 Feb 2012 10:42:31 -0600 Anthony Liguori wrote: > On 02/27/2012 10:33 AM, Federico Simoncelli wrote: > > I'm all for the modularity of the commands (I suggested it since the > > beginning), > > but all this infrastructure goes way beyond what I'd need for oVirt now. > > > > When I submitted my patches we knew that my work wasn't the definitive > > solution > > (eg: the future implementation of -blockdev). So I'd suggest to try and > > settle > > with something that is good enough and that is not preventing a future > > improvement. > > > > What about having a (temporary) flag in drive-mirror to accept also a > > new-image-file > > until we will have the optimal solution? > > Unless there are extenuating circumstances (like the absence of core > infrastructure in QEMU), then we should not add commands that we know are not > the right command. > > We have to support our commands forever. Agreed. Worst case we have vendor extension support that downstream can use to add its own commands.
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 05:58 PM, Anthony Liguori wrote: >> >> Jeff could rework his patches to work with transaction begin/commit >> commands, and Federico can then add drive-reopen and drive-migrate on >> top. > > Yes, maybe I lack imagination but I fail to see how it generalizes > easily/nicely. > From what I can tell, all of the rollback logic is very specific to the > commands being used, right? The rollback logic is just "close the new devices". The commit logic is specific to the commands being used, but reopen should be easier than snapshot (and basically the same except for backing_hd handling). Migrate is really syntactic sugar around reopen, so no surprises there. Paolo
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 10:54 AM, Paolo Bonzini wrote: On 02/27/2012 05:53 PM, Anthony Liguori wrote: It looks like this is exactly the case where the core infrastructure (transactions) is missing. Batch requests are incredibly easy to add. I'm stuck in meetings for the next couple days but I'm sure Luiz throw it together in no time at all. Batch requests are unnecessary. They should be a convenience, not a correctness feature. I think this discussion has gotten a bit unwieldy as I'm having trouble keeping up with the various proposals. Can we move to something more formal and written on the wiki? Can we start with a clear description of what the various requirements are? Regards, Anthony Liguroi Paolo
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 10:51 AM, Paolo Bonzini wrote: On 02/27/2012 04:24 PM, Anthony Liguori wrote: Then you get an error with the block devices still frozen. You can execute another command to reopen back to the old image to roll back the transaction. Pushing the rollback logic to the client does make the client interface a bit more complicated and adds latency to the error path but it's much easier than building a complex transaction infrastructure. And there are examples of this in the wild too. LDAP uses a similar mechanism. Actually, have you seen Jeff's atomic snapshot patch? It really implements all that is needed to do transactions, and gets 100% ok error-recovery unlike the existing blockdev_snapshot_sync. It really looks like we can do better than client-side error management, and there is not that much complexity at all. Jeff could rework his patches to work with transaction begin/commit commands, and Federico can then add drive-reopen and drive-migrate on top. Yes, maybe I lack imagination but I fail to see how it generalizes easily/nicely. From what I can tell, all of the rollback logic is very specific to the commands being used, right? Regards, Anthony Liguori Paolo
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 05:53 PM, Anthony Liguori wrote: >> It looks like this is exactly >> the case where the core infrastructure (transactions) is missing. > > Batch requests are incredibly easy to add. I'm stuck in meetings for > the next couple days but I'm sure Luiz throw it together in no time at all. Batch requests are unnecessary. They should be a convenience, not a correctness feature. Paolo
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 10:33 AM, Federico Simoncelli wrote: I'm all for the modularity of the commands (I suggested it since the beginning), but all this infrastructure goes way beyond what I'd need for oVirt now. When I submitted my patches we knew that my work wasn't the definitive solution (eg: the future implementation of -blockdev). So I'd suggest to try and settle with something that is good enough and that is not preventing a future improvement. What about having a (temporary) flag in drive-mirror to accept also a new-image-file until we will have the optimal solution? Unless there are extenuating circumstances (like the absence of core infrastructure in QEMU), then we should not add commands that we know are not the right command. So are you in favor or against my suggestion? Against. I think we have everything we need. It looks like this is exactly the case where the core infrastructure (transactions) is missing. Batch requests are incredibly easy to add. I'm stuck in meetings for the next couple days but I'm sure Luiz throw it together in no time at all. Regards, Anthony Liguori
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 04:24 PM, Anthony Liguori wrote: > > Then you get an error with the block devices still frozen. You can > execute another command to reopen back to the old image to roll back the > transaction. > > Pushing the rollback logic to the client does make the client interface > a bit more complicated and adds latency to the error path but it's much > easier than building a complex transaction infrastructure. > > And there are examples of this in the wild too. LDAP uses a similar > mechanism. Actually, have you seen Jeff's atomic snapshot patch? It really implements all that is needed to do transactions, and gets 100% ok error-recovery unlike the existing blockdev_snapshot_sync. It really looks like we can do better than client-side error management, and there is not that much complexity at all. Jeff could rework his patches to work with transaction begin/commit commands, and Federico can then add drive-reopen and drive-migrate on top. Paolo
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
- Original Message - > From: "Anthony Liguori" > To: "Federico Simoncelli" > Cc: "Paolo Bonzini" , kw...@redhat.com, > arm...@redhat.com, "Jeff Cody" , > mtosa...@redhat.com, qemu-devel@nongnu.org, "Luiz Capitulino" > > Sent: Monday, February 27, 2012 5:42:31 PM > Subject: Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the > blockdev-reopen and blockdev-migrate > commands) > > On 02/27/2012 10:33 AM, Federico Simoncelli wrote: > > I'm all for the modularity of the commands (I suggested it since > > the beginning), > > but all this infrastructure goes way beyond what I'd need for oVirt > > now. > > > > When I submitted my patches we knew that my work wasn't the > > definitive solution > > (eg: the future implementation of -blockdev). So I'd suggest to try > > and settle > > with something that is good enough and that is not preventing a > > future improvement. > > > > What about having a (temporary) flag in drive-mirror to accept also > > a new-image-file > > until we will have the optimal solution? > > Unless there are extenuating circumstances (like the absence of core > infrastructure in QEMU), then we should not add commands that we know > are not > the right command. So are you in favor or against my suggestion? It looks like this is exactly the case where the core infrastructure (transactions) is missing. -- Federico
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 10:33 AM, Federico Simoncelli wrote: I'm all for the modularity of the commands (I suggested it since the beginning), but all this infrastructure goes way beyond what I'd need for oVirt now. When I submitted my patches we knew that my work wasn't the definitive solution (eg: the future implementation of -blockdev). So I'd suggest to try and settle with something that is good enough and that is not preventing a future improvement. What about having a (temporary) flag in drive-mirror to accept also a new-image-file until we will have the optimal solution? Unless there are extenuating circumstances (like the absence of core infrastructure in QEMU), then we should not add commands that we know are not the right command. We have to support our commands forever. Regards, Anthony Liguori
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 05:33 PM, Federico Simoncelli wrote: >> > >> > blockdev-begin-transaction >> > drive-reopen device new-image-file >> > drive-mirror streaming=false device dest >> > blockdev-commit-transaction >> > >> > No strange optional arguments, no proliferation of commands, etc. >> > The >> > only downside is that if someone tries to do (4) without transactions >> > (or without stopping the VM) they'll get corruption because atomicity >> > is >> > required. > I'm all for the modularity of the commands (I suggested it since the > beginning), > but all this infrastructure goes way beyond what I'd need for oVirt now. > > When I submitted my patches we knew that my work wasn't the definitive > solution > (eg: the future implementation of -blockdev). So I'd suggest to try and settle > with something that is good enough and that is not preventing a future > improvement. > > What about having a (temporary) flag in drive-mirror to accept also a > new-image-file > until we will have the optimal solution? What if libvirt could simply try blockdev-freeze and, if it's not there, try passing a new-image-file argument. Add the new-image-file downstream if you have time constraints, and leave it out upstream. Too ugly? Paolo
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
- Original Message - > From: "Paolo Bonzini" > To: "Luiz Capitulino" > Cc: "Federico Simoncelli" , kw...@redhat.com, > mtosa...@redhat.com, qemu-devel@nongnu.org, > arm...@redhat.com, "Jeff Cody" > Sent: Monday, February 27, 2012 3:39:33 PM > Subject: drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen > and blockdev-migrate commands) > > > > 4) incremental=true, new_image_file passed: > > >- no images will be created > > >- writes will be mirrored to new_image_file and dest > > No need to provide this from within QEMU, because libvirt/oVirt can > do > the dance using elementary operations: > > blockdev-begin-transaction > drive-reopen device new-image-file > drive-mirror streaming=false device dest > blockdev-commit-transaction > > No strange optional arguments, no proliferation of commands, etc. > The > only downside is that if someone tries to do (4) without transactions > (or without stopping the VM) they'll get corruption because atomicity > is > required. I'm all for the modularity of the commands (I suggested it since the beginning), but all this infrastructure goes way beyond what I'd need for oVirt now. When I submitted my patches we knew that my work wasn't the definitive solution (eg: the future implementation of -blockdev). So I'd suggest to try and settle with something that is good enough and that is not preventing a future improvement. What about having a (temporary) flag in drive-mirror to accept also a new-image-file until we will have the optimal solution? -- Federico
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 09:17 AM, Kevin Wolf wrote: Am 27.02.2012 15:59, schrieb Anthony Liguori: On 02/27/2012 08:54 AM, Paolo Bonzini wrote: On 02/27/2012 03:46 PM, Anthony Liguori wrote: I think a better way to think of this is as a batch submission. It would be relatively easy to model in QMP too (just have a batch-command that has a list of commands as it's argument). The difference between batch submission and a transaction is atomic rollback. But I don't think atomic rollback is really needed here. A transaction enforces atomicity at the block layer level. It's different from batch commands in two ways: * bdrv_drain_all/bdrv_flush needs to be called at the beginning of the commit. This may not be the case with batch commands. * with batch commands, atomicity happens by chance because VCPUs cannot send I/O while the monitor is holding the global mutex. If the commands are designed correctly, then this works well. The problem is that the current commands are not designed well. For instance, multi-snapshot could look like: block-freeze ide0-hd0 block-freeze ide1-hd1 block-reopen ide0-hd0 my-new-file0.qcow2 block-reopen ide1-hd1 my-new-file1.qcow2 block-unfreeze ide1-hd1 block-unfreeze ide1-hd0 This would work regardless of whether the commands were implemented asynchronously within QEMU too. What if block-reopen ide1-hd1 fails? In case of failure, you want both drives to point to their old image, not the first one to the new image and the second one to the old image. Then you get an error with the block devices still frozen. You can execute another command to reopen back to the old image to roll back the transaction. Pushing the rollback logic to the client does make the client interface a bit more complicated and adds latency to the error path but it's much easier than building a complex transaction infrastructure. And there are examples of this in the wild too. LDAP uses a similar mechanism. Regards, Anthony Liguori Kevin
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
Am 27.02.2012 15:59, schrieb Anthony Liguori: > On 02/27/2012 08:54 AM, Paolo Bonzini wrote: >> On 02/27/2012 03:46 PM, Anthony Liguori wrote: >>> I think a better way to think of this is as a batch submission. It >>> would be relatively easy to model in QMP too (just have a batch-command >>> that has a list of commands as it's argument). >>> >>> The difference between batch submission and a transaction is atomic >>> rollback. But I don't think atomic rollback is really needed here. >> >> A transaction enforces atomicity at the block layer level. It's >> different from batch commands in two ways: >> >> * bdrv_drain_all/bdrv_flush needs to be called at the beginning of the >> commit. This may not be the case with batch commands. >> >> * with batch commands, atomicity happens by chance because VCPUs cannot >> send I/O while the monitor is holding the global mutex. > > If the commands are designed correctly, then this works well. The problem is > that the current commands are not designed well. For instance, > multi-snapshot > could look like: > > block-freeze ide0-hd0 > block-freeze ide1-hd1 > block-reopen ide0-hd0 my-new-file0.qcow2 > block-reopen ide1-hd1 my-new-file1.qcow2 > block-unfreeze ide1-hd1 > block-unfreeze ide1-hd0 > > This would work regardless of whether the commands were implemented > asynchronously within QEMU too. What if block-reopen ide1-hd1 fails? In case of failure, you want both drives to point to their old image, not the first one to the new image and the second one to the old image. Kevin
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 09:03 AM, Paolo Bonzini wrote: On 02/27/2012 03:59 PM, Anthony Liguori wrote: The problem is that the current commands are not designed well. For instance, multi-snapshot could look like: block-freeze ide0-hd0 block-freeze ide1-hd1 block-reopen ide0-hd0 my-new-file0.qcow2 block-reopen ide1-hd1 my-new-file1.qcow2 block-unfreeze ide1-hd1 block-unfreeze ide1-hd0 This would work regardless of whether the commands were implemented asynchronously within QEMU too. This looks good, too. Positive: maps well to fsfreeze/thaw with help from the guest agent. Negative: you have to specify the devices three times. Overall, I think I like it. However, you need to add freeze/unfreeze capabilities to the block layer. Not hard, but one more thing to do. Right. But it also generalizes to other QMP operations which is potentially interesting. And providing mechanisms like this gives more flexibility to management tools to implement interesting features without constantly chancing new QMP commands. Regards, Anthony Liguori Paolo
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 03:59 PM, Anthony Liguori wrote: > The problem is that the current commands are not designed well. For > instance, multi-snapshot could look like: > > block-freeze ide0-hd0 > block-freeze ide1-hd1 > block-reopen ide0-hd0 my-new-file0.qcow2 > block-reopen ide1-hd1 my-new-file1.qcow2 > block-unfreeze ide1-hd1 > block-unfreeze ide1-hd0 > > This would work regardless of whether the commands were implemented > asynchronously within QEMU too. This looks good, too. Positive: maps well to fsfreeze/thaw with help from the guest agent. Negative: you have to specify the devices three times. Overall, I think I like it. However, you need to add freeze/unfreeze capabilities to the block layer. Not hard, but one more thing to do. Paolo
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 08:54 AM, Paolo Bonzini wrote: On 02/27/2012 03:46 PM, Anthony Liguori wrote: I think a better way to think of this is as a batch submission. It would be relatively easy to model in QMP too (just have a batch-command that has a list of commands as it's argument). The difference between batch submission and a transaction is atomic rollback. But I don't think atomic rollback is really needed here. A transaction enforces atomicity at the block layer level. It's different from batch commands in two ways: * bdrv_drain_all/bdrv_flush needs to be called at the beginning of the commit. This may not be the case with batch commands. * with batch commands, atomicity happens by chance because VCPUs cannot send I/O while the monitor is holding the global mutex. If the commands are designed correctly, then this works well. The problem is that the current commands are not designed well. For instance, multi-snapshot could look like: block-freeze ide0-hd0 block-freeze ide1-hd1 block-reopen ide0-hd0 my-new-file0.qcow2 block-reopen ide1-hd1 my-new-file1.qcow2 block-unfreeze ide1-hd1 block-unfreeze ide1-hd0 This would work regardless of whether the commands were implemented asynchronously within QEMU too. Regards, Anthony Liguori Paolo
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 03:46 PM, Anthony Liguori wrote: > I think a better way to think of this is as a batch submission. It > would be relatively easy to model in QMP too (just have a batch-command > that has a list of commands as it's argument). > > The difference between batch submission and a transaction is atomic > rollback. But I don't think atomic rollback is really needed here. A transaction enforces atomicity at the block layer level. It's different from batch commands in two ways: * bdrv_drain_all/bdrv_flush needs to be called at the beginning of the commit. This may not be the case with batch commands. * with batch commands, atomicity happens by chance because VCPUs cannot send I/O while the monitor is holding the global mutex. Paolo
Re: [Qemu-devel] drive transactions (was Re: [PATCH 2/2 v2] Add the blockdev-reopen and blockdev-migrate commands)
On 02/27/2012 08:39 AM, Paolo Bonzini wrote: On 02/27/2012 02:06 PM, Luiz Capitulino wrote: IMHO, this is asking for two commands, where cases 1& 2 is one of them and cases 3& 4 is the other one. Note how 'incremental' goes away and 'new_image_file' really becomes an optional. I really would have no idea on naming except perhaps "drive_migrate" and "funny_drive_migrate_for_ovirt". But actually I must admit that it's a rat's nest. First, there's no reason why live-snapshotting to new_image_file shouldn't be handled within QEMU. That would make the above table much more orthogonal. However, oVirt likes to create its own snapshots. Perhaps we do need to rethink this together with group snapshots. There are four kinds of operations that we do on block devices: (a) create an image. This is part of what blockdev-snapshot does. (b) switch a block device to a new image. drive-reopen does this. (c) add mirroring to a new destination. (d) activate streaming from a base image drive-migrate does (b) and (c), will do (a) and (d) when we add pre-copy, and should do (a) right now if Federico wasn't an oVirt developer. :) Thinking more about it, the commands we _do_ need are: - start a transaction - switch to a new image - add mirroring to a new destination - commit a transaction - rollback a transaction - query transaction errors I think a better way to think of this is as a batch submission. It would be relatively easy to model in QMP too (just have a batch-command that has a list of commands as it's argument). The difference between batch submission and a transaction is atomic rollback. But I don't think atomic rollback is really needed here. Regards, Anthony Liguori