On Mon, Mar 05, 2012 at 10:20:36AM -0700, Eric Blake wrote:
> On 03/05/2012 09:59 AM, Marcelo Tosatti wrote:
> > On Wed, Feb 22, 2012 at 05:13:32PM +0000, Federico Simoncelli wrote:
> >> Hi,
> >> recently I've been working on live block migration combining the live
> >> snapshots and the blkmirror patch sent by Marcelo Tosatti a few months ago.
> >>
> >> The design is summarized at this URL as "Mirrored-Snapshot":
> >>
> >> http://www.ovirt.org/wiki/Features/Design/StorageLiveMigration
> >>
> >> The design assumes that the qemu process can reach both the source and
> >> destination storages and no real VM migration between hosts is involved.
> >> The principal problem that it tries to solve is moving a VM to a new
> >> reachable storage (more space, faster) without temporarily disrupting its
> >> services.
> >>
> >> The following set of patches implements the required changes in
> >> QEMU.
> >
> > What is the motivation here? What is the limitation with image streaming
> > that this tries to solve?
>
> My understanding is that this solves the scenario of a storage failure
> during the migration. The original post-copy approach has the flaw that
> you are setting up a situation where qemu is operating on a qcow2 file
> on one storage domain that is backed by a file on another storage
> domain. After you start the migration process, but before it completes,
> any failure in the migration is fatal to the domain: if the destination
> storage domain fails, then you have lost all the delta changes made
> since the migration started. And after the migration has completed, you
> still have the problem that qemu is crossing storage domains: if the
> source storage domain fails, then qemu's access to the backing file
> renders the destination qcow2 worthless, so you cannot shut down the
> source storage domain without also restarting the guest.
>
> But a mirrored solution does not have these drawbacks: at all points
> through the migration phase, you are guaranteed that _all_ data is
> accessible from a single storage domain. If the destination storage
> fails, you still have the source storage intact, and can restart the
> migration process. Then, when the migration is complete, you tell qemu
> to atomically switch storage domains, at which point the entire storage
> is accessed from the destination domain, and you can safely shut down
> the source storage domain while the guest continues to run.

OK, can't it be fixed by image streaming on top of a blkmirror device?
This would avoid a duplicate interface (for example, there would be no
need for snapshot_blkdev to switch to the final copy).

That is, start image streaming to a blkmirror device so that updates to
the new snapshot are replicated across the source and destination
domains. Obviously, blkmirror would then only be necessary when moving
across storage domains.