On 03/14/2012 03:34 AM, Kevin Wolf wrote: >> That said, reopen is really hard to be implemented as a transaction >> without breaking that rule. For example in the blkmirror case you'd >> need to open the destination image in r/w while the mirroring is in >> action (already having the same image in r/w mode). >> >> There are several solutions here but they are either really hard to >> implement or non definitive. For example: >> >> * We could try to implement the reopen command for each special case, >> eg: blkmirror, reopening the same image, etc... and in such cases >> reusing the same bs that we already have. The downside is that this >> command will be coupled with all this special cases. > > The problem we're trying to solve is that we have a graph of open > BlockDriverStates (connected by bs->file, backing file relations etc.) > and we want to transform it into a different graph of open BDSs > atomically, reusing zero, one or many of the existing BDSs, and possibly > with changed properties (cache mode, read-only, etc.) > > What we can have reasonably easily (there are patches floating around, > they just need to be completed), is a bdrv_reopen() that changes flags > on one given BDS, without changing the file it's backed by. This is > already broken up into prepare/commit/abort stages as we need it to > reopen VMDK's split images safely. > > In theory this should be enough to build the new graph by opening yet > unused BDSs, preparing the reopen of reused ones and only if all of that > was successful, committing the bdrv_reopen and changing the relations > between the nodes. I hope that at the same time it's clear that this > isn't exactly trivial to implement.
Agreed that 1) it looks implementable, and 2) it does not look trivial. > >> * We could use the transaction APIs without actually making it >> transaction (if we fail in the middle we can't rollback). The only >> advantage of this is that we'd provide a consistent API to libvirt >> and we would postpone the problem to the future. Anyway I strongly >> discourage this as it's completely unsafe and it's going to break >> the transaction semantic. Moreover it's a solution that relies too >> much on the hope of finding something appropriate in the future. > > This is not an option. Advertising transactional behaviour and not > implementing it is just plain wrong. Absolutely concur. From libvirt's perspective, I will be assuming that: if 'transaction' exists, then it supports 'blockdev-snapshot-sync' and 'drive-mirror' (assuming that these are the only two commands that are in 'transaction' in qemu 1.1), but nothing else is safe to use without a further probe. Then, if down the road, we go through the pain of making 'drive-reopen' transactionable, then there will be some new introspection command (one idea was an optional argument to query-commands which limits output to just the commands that can also occur in a 'transaction'), and libvirt will have to use that introspection before using the additional features. > >> * We could leave it as it is, a distinct command that is not part of >> the transaction and that it's closing the old image before opening >> the new one. > > Yes, this would be the short-term preliminary solution. I would tend to > leave it to downstreams to implement it as an extension, though. Correct - I'm fine with libvirt using the direct, non-atomic, 'drive-reopen' for the immediate needs of oVirt while we work on a more complete solution for down the road. > >> This is not completely correct, the main intent was to not spread one >> image chain across two storage domains (making it incomplete if one of >> them was missing). In the next oVirt release a VM can have different >> disks on different storage domains, so this wouldn't be a special case >> but just a normal situation. > > The problem with this kind of argument is that we're not developing only > for oVirt, but need to look for what makes sense for any management tool > (or even just direct users of qemu). Indeed - that's why I still think it's dangerous to not have an atomic 'drive-reopen', even if oVirt can be coded to work in spite of an initial implementation being non-atomic. But there's a difference between wish-lists (atomic reopen) and practicality (what can we implement now), and I'm not going to insist on the impossible :) -- Eric Blake ebl...@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature