Re: [Qemu-devel] Live Block Migration using Mirroring
On Wed, Feb 22, 2012 at 05:13:32PM +, Federico Simoncelli wrote:
> Hi,
> recently I've been working on live block migration combining the live
> snapshots and the blkmirror patch sent by Marcelo Tosatti a few months
> ago.
>
> The design is summarized at this url as Mirrored-Snapshot:
>
> http://www.ovirt.org/wiki/Features/Design/StorageLiveMigration
>
> The design assumes that the qemu process can reach both the source and
> destination storages and no real VM migration between hosts is involved.
> The principal problem that it tries to solve is moving a VM to a new
> reachable storage (more space, faster) without temporarily disrupting
> its services.
>
> The following set of patches implements the required changes in QEMU.

What is the motivation here? What is the limitation with image streaming
that this tries to solve?

> Here is a quick example of the use case (for consistency with the
> design at the url above I will use the same step numbers):
>
> Preparation
> ===========
>
> $ mkdir /tmp/{src,dst}
> $ qemu-img create -f qcow2 /tmp/src/hd0base.qcow2 20G
> Formatting '/tmp/src/hd0base.qcow2', fmt=qcow2 size=21474836480 encryption=off cluster_size=65536
>
> Step 1 - Initial Scenario
> =========================
>
> VM1 is running on the src/hd0base. (Where = stands for uses)
>
> [src/hd0base] = VM1(read-write)
>
> $ qemu-system-x86_64 -hda /tmp/src/hd0base.qcow2 -monitor stdio
> QEMU 1.0.50 monitor - type 'help' for more information
> (qemu)
>
> Step 3 - Mirrored Live Snapshot
> ===============================
>
> A mirrored live snapshot is issued using src/hd0snap1 and dst/hd0snap1
> as image files. (Where - stands for has backing file)
>
> [src/hd0base] - [src/hd0snap1] = VM1(read-write)
>           ... - [dst/hd0snap1] = VM1(write-only)
>
> $ qemu-img create -f qcow2 \
>     -b /tmp/src/hd0base.qcow2 /tmp/src/hd0snap1.qcow2 20G
> Formatting '/tmp/src/hd0snap1.qcow2', fmt=qcow2 size=21474836480 backing_file='/tmp/src/hd0base.qcow2' encryption=off cluster_size=65536
>
> $ qemu-img create -f qcow2 \
>     -b /tmp/dst/hd0base.qcow2 /tmp/dst/hd0snap1.qcow2 20G
> Formatting '/tmp/dst/hd0snap1.qcow2', fmt=qcow2 size=21474836480 backing_file='/tmp/src/hd0base.qcow2' encryption=off cluster_size=65536
>
> (qemu) snapshot_blkdev -n ide0-hd0 \
>     blkmirror:/tmp/src/hd0snap1.qcow2:/tmp/dst/hd0snap1.qcow2 blkmirror
>
> Step 4 - Backing File Copy
> ==========================
>
> An external manager copies src/hd0base to the destination dst/hd0base.
>
> [src/hd0base] - [src/hd0snap1] = VM1(read-write)
> [dst/hd0base] - [dst/hd0snap1] = VM1(write-only)
>
> $ cp -a /tmp/src/hd0base.qcow2 /tmp/dst/hd0base.qcow2
>
> Step 5 - Final Switch to Destination
> ====================================
>
> VM1 is now able to switch to the destination for both read and write
> operations.
>
> [src/hd0base] - [src/hd0snap1] = VM1(read-write)
>
> (qemu) snapshot_blkdev -n ide0-hd0 /tmp/dst/hd0snap1.qcow2
>
> --
> Federico
Re: [Qemu-devel] Live Block Migration using Mirroring
On 03/05/2012 09:59 AM, Marcelo Tosatti wrote:
> On Wed, Feb 22, 2012 at 05:13:32PM +, Federico Simoncelli wrote:
>> Hi,
>> recently I've been working on live block migration combining the live
>> snapshots and the blkmirror patch sent by Marcelo Tosatti a few months
>> ago.
>>
>> The design is summarized at this url as Mirrored-Snapshot:
>>
>> http://www.ovirt.org/wiki/Features/Design/StorageLiveMigration
>>
>> The design assumes that the qemu process can reach both the source and
>> destination storages and no real VM migration between hosts is
>> involved. The principal problem that it tries to solve is moving a VM
>> to a new reachable storage (more space, faster) without temporarily
>> disrupting its services.
>>
>> The following set of patches implements the required changes in QEMU.
>
> What is the motivation here? What is the limitation with image
> streaming that this tries to solve?

My understanding is that this solves the scenario of a storage failure
during the migration. The original post-copy approach has the flaw that
you are setting up a situation where qemu is operating on a qcow2 file
on one storage domain that is backed by a file on another storage
domain. After you start the migration process, but before it completes,
any failure in the migration is fatal to the domain: if the destination
storage domain fails, then you have lost all the delta changes made
since the migration started. And after the migration has completed, you
still have the problem that qemu is crossing storage domains - if the
source storage domain fails, then qemu's access to the backing file
renders the destination qcow2 worthless, so you cannot shut down the
source storage domain without also restarting the guest.

But a mirrored solution does not have these drawbacks - at all points
through the migration phase, you are guaranteed that _all_ data is
accessible from a single storage domain. If the destination storage
fails, you still have the source storage intact, and can restart the
migration process. Then, when the migration is complete, you tell qemu
to atomically switch storage domains, at which point the entire storage
is accessed from the destination domain, and you can safely shut down
the source storage domain while the guest continues to run.

--
Eric Blake   ebl...@redhat.com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org
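The failure cases Eric describes can be written down as a small table. The following is only a toy sketch of that reasoning: the strategy and phase labels and the `survives` helper are hypothetical names introduced for illustration, and the table covers the phases for which the message makes an explicit claim.

```python
# Toy model of the failure cases described above. The strategy and phase
# names are illustrative labels, not QEMU identifiers.

# Storage domains that must stay healthy for the guest to keep all its data.
REQUIRED_DOMAINS = {
    # Active qcow2 on dst, still backed by a file on src.
    ("post-copy", "during"): {"src", "dst"},
    # Writes go to both sides, but src alone still holds everything.
    ("mirror", "during"): {"src"},
    # After the atomic switch, everything is read from dst.
    ("mirror", "after-switch"): {"dst"},
}


def survives(strategy, phase, failed_domain):
    """Return True if losing failed_domain loses no guest data in this model."""
    return failed_domain not in REQUIRED_DOMAINS[(strategy, phase)]


if __name__ == "__main__":
    for (strategy, phase), needed in REQUIRED_DOMAINS.items():
        print(f"{strategy:9} {phase:12} needs {sorted(needed)}")
```

In this model a destination failure during a mirrored migration is survivable, while during post-copy the loss of either domain is not.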
Re: [Qemu-devel] Live Block Migration using Mirroring
On Mon, Mar 05, 2012 at 10:20:36AM -0700, Eric Blake wrote:
> On 03/05/2012 09:59 AM, Marcelo Tosatti wrote:
>> On Wed, Feb 22, 2012 at 05:13:32PM +, Federico Simoncelli wrote:
>>> Hi,
>>> recently I've been working on live block migration combining the
>>> live snapshots and the blkmirror patch sent by Marcelo Tosatti a few
>>> months ago.
>>>
>>> The design is summarized at this url as Mirrored-Snapshot:
>>>
>>> http://www.ovirt.org/wiki/Features/Design/StorageLiveMigration
>>>
>>> The design assumes that the qemu process can reach both the source
>>> and destination storages and no real VM migration between hosts is
>>> involved. The principal problem that it tries to solve is moving a
>>> VM to a new reachable storage (more space, faster) without
>>> temporarily disrupting its services.
>>>
>>> The following set of patches implements the required changes in
>>> QEMU.
>>
>> What is the motivation here? What is the limitation with image
>> streaming that this tries to solve?
>
> My understanding is that this solves the scenario of a storage failure
> during the migration. The original post-copy approach has the flaw
> that you are setting up a situation where qemu is operating on a qcow2
> file on one storage domain that is backed by a file on another storage
> domain. After you start the migration process, but before it
> completes, any failure in the migration is fatal to the domain: if the
> destination storage domain fails, then you have lost all the delta
> changes made since the migration started. And after the migration has
> completed, you still have the problem that qemu is crossing storage
> domains - if the source storage domain fails, then qemu's access to
> the backing file renders the destination qcow2 worthless, so you
> cannot shut down the source storage domain without also restarting the
> guest.
>
> But a mirrored solution does not have these drawbacks - at all points
> through the migration phase, you are guaranteed that _all_ data is
> accessible from a single storage domain. If the destination storage
> fails, you still have the source storage intact, and can restart the
> migration process. Then, when the migration is complete, you tell qemu
> to atomically switch storage domains, at which point the entire
> storage is accessed from the destination domain, and you can safely
> shut down the source storage domain while the guest continues to run.

OK, can't it be fixed by image streaming on top of a blkmirror device?
This would avoid a duplicate interface (for example, there would be no
need for snapshot_blkdev to change to the final copy).

That is, start image streaming to a blkmirror device so that updates to
the new snapshot are replicated across target and destination domains.

Obviously the use of blkmirror is then only necessary when moving
across image domains.
Re: [Qemu-devel] Live Block Migration using Mirroring
On 05/03/2012 18:44, Marcelo Tosatti wrote:
> OK, can't it be fixed by image streaming on top of a blkmirror device?
> This would avoid a duplicate interface (such as no need to
> snapshot_blkdev to change to final copy).
>
> That is, start image streaming to a blkmirror device so that updates
> to the new snapshot are replicated across target and destination
> domains.

This works too, but if you don't have a base image, streaming will
complete both the source and destination images with zero clusters.
It's just a limitation of the current implementation, of course.

Paolo
Re: [Qemu-devel] Live Block Migration using Mirroring
On Wed, Feb 22, 2012 at 5:13 PM, Federico Simoncelli fsimo...@redhat.com wrote:
> Step 3 - Mirrored Live Snapshot
> ===============================
>
> A mirrored live snapshot is issued using src/hd0snap1 and dst/hd0snap1
> as image files. (Where - stands for has backing file)
>
> [src/hd0base] - [src/hd0snap1] = VM1(read-write)
>           ... - [dst/hd0snap1] = VM1(write-only)
>
> $ qemu-img create -f qcow2 \
>     -b /tmp/src/hd0base.qcow2 /tmp/src/hd0snap1.qcow2 20G
> Formatting '/tmp/src/hd0snap1.qcow2', fmt=qcow2 size=21474836480 backing_file='/tmp/src/hd0base.qcow2' encryption=off cluster_size=65536
>
> $ qemu-img create -f qcow2 \
>     -b /tmp/dst/hd0base.qcow2 /tmp/dst/hd0snap1.qcow2 20G
> Formatting '/tmp/dst/hd0snap1.qcow2', fmt=qcow2 size=21474836480 backing_file='/tmp/src/hd0base.qcow2' encryption=off cluster_size=65536
>
> (qemu) snapshot_blkdev -n ide0-hd0 \
>     blkmirror:/tmp/src/hd0snap1.qcow2:/tmp/dst/hd0snap1.qcow2 blkmirror
>
> Step 4 - Backing File Copy
> ==========================
>
> An external manager copies src/hd0base to the destination dst/hd0base.
>
> [src/hd0base] - [src/hd0snap1] = VM1(read-write)
> [dst/hd0base] - [dst/hd0snap1] = VM1(write-only)

At this stage we have dst/hd0snap1 opened with BDRV_O_NO_BACKING. If it
has no backing file and the guest issues a write request that is smaller
than a cluster in the image file, the untouched areas of that cluster
will be populated with zeroes.

Once dst/hd0snap1 is reopened with dst/hd0base in place there will be
zeros in clusters where the guest wrote only a few sectors. We will not
see the backing file data in those clusters.

Have you hit this problem or did I miss something?

Stefan
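The partial-cluster problem Stefan raises can be reproduced with a toy model. This is a minimal sketch, not QEMU code: the `Overlay` class, the 8-sector cluster size, and the list-of-integers "disk" are all assumptions made for illustration. It only mimics the behaviour described above, where allocating a cluster without a backing file zero-fills the sectors the guest did not write.

```python
CLUSTER = 8  # toy cluster size in sectors (an assumption, chosen for brevity)


class Overlay:
    """Toy copy-on-write image: allocated clusters plus an optional backing."""

    def __init__(self, backing=None):
        self.backing = backing  # list of sector values, or None
        self.clusters = {}      # cluster index -> list of sector values

    def write(self, sector, data):
        for i, value in enumerate(data):
            idx, off = divmod(sector + i, CLUSTER)
            if idx not in self.clusters:
                if self.backing is not None:
                    # Copy-on-write: fill the new cluster from the backing.
                    start = idx * CLUSTER
                    self.clusters[idx] = self.backing[start:start + CLUSTER]
                else:
                    # No backing file: untouched sectors become zeros.
                    self.clusters[idx] = [0] * CLUSTER
            self.clusters[idx][off] = value

    def read(self, sector):
        idx, off = divmod(sector, CLUSTER)
        if idx in self.clusters:
            return self.clusters[idx][off]
        return self.backing[sector] if self.backing is not None else 0


base = list(range(100, 116))  # two clusters of recognisable base data

dst = Overlay(backing=None)   # opened as if with no backing file
dst.write(2, [7])             # guest writes a single sector
dst.backing = base            # later "reopened" with the base in place

src = Overlay(backing=base)   # same write with the backing present
src.write(2, [7])
```

Reading sector 3 through `dst` now returns the zero written at allocation time rather than the base data, while `src` still returns it; unallocated clusters in `dst` fall through to the base as expected.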
Re: [Qemu-devel] Live Block Migration using Mirroring
----- Original Message -----
From: Stefan Hajnoczi stefa...@gmail.com
To: Federico Simoncelli fsimo...@redhat.com
Cc: qemu-devel@nongnu.org, kw...@redhat.com, mtosa...@redhat.com
Sent: Tuesday, February 28, 2012 4:47:48 PM
Subject: Re: [Qemu-devel] Live Block Migration using Mirroring

> On Wed, Feb 22, 2012 at 5:13 PM, Federico Simoncelli fsimo...@redhat.com wrote:
>> Step 3 - Mirrored Live Snapshot
>> ===============================
>>
>> A mirrored live snapshot is issued using src/hd0snap1 and
>> dst/hd0snap1 as image files. (Where - stands for has backing file)
>>
>> [src/hd0base] - [src/hd0snap1] = VM1(read-write)
>>           ... - [dst/hd0snap1] = VM1(write-only)
>>
>> $ qemu-img create -f qcow2 \
>>     -b /tmp/src/hd0base.qcow2 /tmp/src/hd0snap1.qcow2 20G
>> Formatting '/tmp/src/hd0snap1.qcow2', fmt=qcow2 size=21474836480 backing_file='/tmp/src/hd0base.qcow2' encryption=off cluster_size=65536
>>
>> $ qemu-img create -f qcow2 \
>>     -b /tmp/dst/hd0base.qcow2 /tmp/dst/hd0snap1.qcow2 20G
>> Formatting '/tmp/dst/hd0snap1.qcow2', fmt=qcow2 size=21474836480 backing_file='/tmp/src/hd0base.qcow2' encryption=off cluster_size=65536
>>
>> (qemu) snapshot_blkdev -n ide0-hd0 \
>>     blkmirror:/tmp/src/hd0snap1.qcow2:/tmp/dst/hd0snap1.qcow2 blkmirror
>>
>> Step 4 - Backing File Copy
>> ==========================
>>
>> An external manager copies src/hd0base to the destination
>> dst/hd0base.
>>
>> [src/hd0base] - [src/hd0snap1] = VM1(read-write)
>> [dst/hd0base] - [dst/hd0snap1] = VM1(write-only)
>
> At this stage we have dst/hd0snap1 opened with BDRV_O_NO_BACKING. If
> it has no backing file and the guest issues a write request that is
> smaller than a cluster in the image file, the untouched areas of that
> cluster will be populated with zeroes.
>
> Once dst/hd0snap1 is reopened with dst/hd0base in place there will be
> zeros in clusters where the guest wrote only a few sectors. We will
> not see the backing file data in those clusters.
>
> Have you hit this problem or did I miss something?

Thank you for getting this. Being able to have a bogus backing file was
a bonus but it's not really required for the mirrored live block
migration. We can add the support for switching the backing file in the
drive-reopen part.
I'll remove the BDRV_O_NO_BACKING flag from the blkmirror patch.

--
Federico
Re: [Qemu-devel] Live Block Migration using Mirroring
On 28/02/2012 16:47, Stefan Hajnoczi wrote:
> At this stage we have dst/hd0snap1 opened with BDRV_O_NO_BACKING. If
> it has no backing file and the guest issues a write request that is
> smaller than a cluster in the image file, the untouched areas of that
> cluster will be populated with zeroes.
>
> Once dst/hd0snap1 is reopened with dst/hd0base in place there will be
> zeros in clusters where the guest wrote only a few sectors. We will
> not see the backing file data in those clusters.
>
> Have you hit this problem or did I miss something?

I'm afraid not.

Federico, perhaps you have to rewrite blkmirror to reuse the
copy-on-read mechanism. You can implement is_allocated and make the
base image a real backing_file even though you can write to it. Perhaps
some hack with BDRV_O_NO_BACKING, or perhaps it just works with Jeff's
open-on-top behavior. It looks like it would also make streaming of the
base image Just Work, at least to a non-raw destination.

Remaining problems:

1) differentiating writes from copy-on-read and writes from the guest.
   This is needed to avoid spurious writes to the backing image each
   time you're doing a copy-on-read.

2) triggering a copy-on-read before writing partial clusters.

Stefan, what do you think?

Paolo
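The two remaining problems Paolo lists can be illustrated with a toy mirror. This is a hedged sketch of one way to read his proposal, not the blkmirror driver: `MirrorSketch`, its methods, and the 4-sector cluster are hypothetical, and each guest write is assumed to touch a single sector.

```python
CLUSTER = 4  # toy cluster size in sectors (an assumption)


class MirrorSketch:
    """Toy mirror whose destination uses the source as its backing data."""

    def __init__(self, source):
        self.source = list(source)  # complete image on the source storage
        self.dest = {}              # allocated destination clusters
        self.source_writes = 0      # writes that actually reached the source

    def is_allocated(self, idx):
        return idx in self.dest

    def _copy_on_read(self, idx):
        # Internal write: populate the destination cluster from the source.
        # It goes to the destination only, so the source sees no spurious
        # write (problem 1).
        start = idx * CLUSTER
        self.dest[idx] = self.source[start:start + CLUSTER]

    def guest_write(self, sector, value):
        idx, off = divmod(sector, CLUSTER)
        if not self.is_allocated(idx):
            # Fill the whole cluster before a partial write (problem 2).
            self._copy_on_read(idx)
        # Guest writes are mirrored to both sides.
        self.dest[idx][off] = value
        self.source[sector] = value
        self.source_writes += 1


m = MirrorSketch(range(10, 18))
m.guest_write(1, 99)
```

After the single guest write, the destination cluster holds the source data around the written sector instead of zeros, and only one write has reached the source.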
Re: [Qemu-devel] Live Block Migration using Mirroring
On 28/02/2012 18:15, Federico Simoncelli wrote:
> Thank you for getting this. Being able to have a bogus backing file
> was a bonus but it's not really required for the mirrored live block
> migration. We can add the support for switching the backing file in
> the drive-reopen part.

Wait, it's not really required for oVirt because it creates the
snapshot outside QEMU. What about everyone else?

Paolo
Re: [Qemu-devel] Live Block Migration using Mirroring
----- Original Message -----
From: Paolo Bonzini pbonz...@redhat.com
To: Federico Simoncelli fsimo...@redhat.com
Cc: Stefan Hajnoczi stefa...@gmail.com, qemu-devel@nongnu.org, kw...@redhat.com, mtosa...@redhat.com
Sent: Tuesday, February 28, 2012 6:36:57 PM
Subject: Re: [Qemu-devel] Live Block Migration using Mirroring

> On 28/02/2012 18:15, Federico Simoncelli wrote:
>> Thank you for getting this. Being able to have a bogus backing file
>> was a bonus but it's not really required for the mirrored live block
>> migration. We can add the support for switching the backing file in
>> the drive-reopen part.
>
> Wait, it's not really required for oVirt because it creates the
> snapshot outside QEMU. What about everyone else?

They'll have (as oVirt) two mirrored snapshots pointing at the same
base. The only difference is that the image is created internally, but
that's not hard.

--
Federico
Re: [Qemu-devel] Live Block Migration using Mirroring
On 28/02/2012 18:46, Federico Simoncelli wrote:
>>> Thank you for getting this. Being able to have a bogus backing file
>>> was a bonus but it's not really required for the mirrored live block
>>> migration. We can add the support for switching the backing file in
>>> the drive-reopen part.
>>
>> Wait, it's not really required for oVirt because it creates the
>> snapshot outside QEMU. What about everyone else?
>
> They'll have (as oVirt) two mirrored snapshots pointing at the same
> base. The only difference is that the image is created internally, but
> that's not hard.

Can you detail how you are switching the backing file in the
drive-reopen? Either the BlockDriverState opened by blkmirror, or the
one opened at the end, will have to use the wrong backing_file. How do
you arrange for that?

Paolo
Re: [Qemu-devel] Live Block Migration using Mirroring
----- Original Message -----
From: Paolo Bonzini pbonz...@redhat.com
To: qemu-devel@nongnu.org
Sent: Tuesday, February 28, 2012 7:02:40 PM
Subject: Re: [Qemu-devel] Live Block Migration using Mirroring

> On 28/02/2012 18:46, Federico Simoncelli wrote:
>>>> Thank you for getting this. Being able to have a bogus backing file
>>>> was a bonus but it's not really required for the mirrored live
>>>> block migration. We can add the support for switching the backing
>>>> file in the drive-reopen part.
>>>
>>> Wait, it's not really required for oVirt because it creates the
>>> snapshot outside QEMU. What about everyone else?
>>
>> They'll have (as oVirt) two mirrored snapshots pointing at the same
>> base. The only difference is that the image is created internally,
>> but that's not hard.
>
> Can you detail how you are switching the backing file in the
> drive-reopen? Either the BlockDriverState opened by blkmirror, or the
> one opened at the end, will have to use the wrong backing_file. How do
> you arrange for that?

Step 1 - Initial Scenario
=========================

VM1 is running on the src/hd0base.

[src/hd0base] = VM1(read-write)

Step 3 - Mirrored Live Snapshot
===============================

A mirrored live snapshot is issued using src/hd0snap1 and dst/hd0snap1
as image files (both having src/hd0base as backing file).

[src/hd0base] - [src/hd0snap1] = VM1(read-write)
             ^-- [dst/hd0snap1] = VM1(read-write)

Step 4 - Backing File Copy
==========================

An external manager copies src/hd0base to the destination
(dst/hd0base).

[src/hd0base] - [src/hd0snap1] = VM1(read-write)
[dst/hd0base]^-- [dst/hd0snap1] = VM1(read-write)

Step 5 - Final Switch to Destination
====================================

VM1 is now able to switch to the destination for both read and write
operations, fixing the backing file path in dst/hd0snap1.

[src/hd0base] - [src/hd0snap1]
[dst/hd0base] - [dst/hd0snap1] = VM1(read-write)

--
Federico
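The backing chains in these revised steps can be traced with a small model. This is an illustrative sketch only: the image paths are the ones used in the thread, while the `chain` and `domains` helpers and the dictionary representation are hypothetical and say nothing about how drive-reopen would implement the switch.

```python
# Toy model of the backing chains in the revised steps; paths are the ones
# used in the thread, the helper names are hypothetical.

def chain(image, backing_of):
    """Follow backing-file links from image down to the base."""
    links = [image]
    while links[-1] in backing_of:
        links.append(backing_of[links[-1]])
    return links


def domains(image, backing_of):
    """Storage domains (first path component) touched when reading image."""
    return {path.split("/")[0] for path in chain(image, backing_of)}


# Step 3: both snapshots are created on top of src/hd0base.
backing_of = {
    "src/hd0snap1": "src/hd0base",
    "dst/hd0snap1": "src/hd0base",
}
during = domains("dst/hd0snap1", backing_of)

# Step 4 copies src/hd0base to dst/hd0base outside QEMU; step 5 fixes the
# backing file path of dst/hd0snap1 as part of the final switch.
backing_of["dst/hd0snap1"] = "dst/hd0base"
after = domains("dst/hd0snap1", backing_of)

print(sorted(during), sorted(after))
```

Until step 5 the destination snapshot still reads through the source domain; only after the backing path is fixed does its whole chain live on the destination.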
Re: [Qemu-devel] Live Block Migration using Mirroring
On Wed, Feb 22, 2012 at 5:13 PM, Federico Simoncelli fsimo...@redhat.com wrote:
> Preparation
> ===========
>
> $ mkdir /tmp/{src,dst}
> $ qemu-img create -f qcow2 /tmp/src/hd0base.qcow2 20G
> Formatting '/tmp/src/hd0base.qcow2', fmt=qcow2 size=21474836480 encryption=off cluster_size=65536
>
> Step 1 - Initial Scenario
> =========================
>
> VM1 is running on the src/hd0base. (Where = stands for uses)
>
> [src/hd0base] = VM1(read-write)
>
> $ qemu-system-x86_64 -hda /tmp/src/hd0base.qcow2 -monitor stdio
> QEMU 1.0.50 monitor - type 'help' for more information
> (qemu)
>
> Step 3 - Mirrored Live Snapshot
> ===============================
>
> A mirrored live snapshot is issued using src/hd0snap1 and dst/hd0snap1
> as image files. (Where - stands for has backing file)
>
> [src/hd0base] - [src/hd0snap1] = VM1(read-write)
>           ... - [dst/hd0snap1] = VM1(write-only)
>
> $ qemu-img create -f qcow2 \
>     -b /tmp/src/hd0base.qcow2 /tmp/src/hd0snap1.qcow2 20G
> Formatting '/tmp/src/hd0snap1.qcow2', fmt=qcow2 size=21474836480 backing_file='/tmp/src/hd0base.qcow2' encryption=off cluster_size=65536
>
> $ qemu-img create -f qcow2 \
>     -b /tmp/dst/hd0base.qcow2 /tmp/dst/hd0snap1.qcow2 20G
> Formatting '/tmp/dst/hd0snap1.qcow2', fmt=qcow2 size=21474836480 backing_file='/tmp/src/hd0base.qcow2' encryption=off cluster_size=65536

At this stage /tmp/dst/hd0base.qcow2 does not exist yet. The qemu-img
output you pasted shows /tmp/src/hd0base.qcow2 was actually used. Typo?

> (qemu) snapshot_blkdev -n ide0-hd0 \
>     blkmirror:/tmp/src/hd0snap1.qcow2:/tmp/dst/hd0snap1.qcow2 blkmirror
>
> Step 4 - Backing File Copy
> ==========================
>
> An external manager copies src/hd0base to the destination dst/hd0base.
>
> [src/hd0base] - [src/hd0snap1] = VM1(read-write)
> [dst/hd0base] - [dst/hd0snap1] = VM1(write-only)
>
> $ cp -a /tmp/src/hd0base.qcow2 /tmp/dst/hd0base.qcow2

Are we missing a fixup step that changes backing_file in
dst/hd0snap1.qcow2 to point at dst/hd0base.qcow2?

> Step 5 - Final Switch to Destination
> ====================================
>
> VM1 is now able to switch to the destination for both read and write
> operations.
>
> [src/hd0base] - [src/hd0snap1] = VM1(read-write)
>
> (qemu) snapshot_blkdev -n ide0-hd0 /tmp/dst/hd0snap1.qcow2
>
> --
> Federico
Re: [Qemu-devel] Live Block Migration using Mirroring
----- Original Message -----
From: Stefan Hajnoczi stefa...@gmail.com
To: Federico Simoncelli fsimo...@redhat.com
Cc: qemu-devel@nongnu.org, kw...@redhat.com, mtosa...@redhat.com
Sent: Thursday, February 23, 2012 4:47:38 PM
Subject: Re: [Qemu-devel] Live Block Migration using Mirroring

> On Wed, Feb 22, 2012 at 5:13 PM, Federico Simoncelli fsimo...@redhat.com wrote:
>> Step 3 - Mirrored Live Snapshot
>> ===============================
>>
>> A mirrored live snapshot is issued using src/hd0snap1 and
>> dst/hd0snap1 as image files. (Where - stands for has backing file)
>>
>> [src/hd0base] - [src/hd0snap1] = VM1(read-write)
>>           ... - [dst/hd0snap1] = VM1(write-only)
>>
>> $ qemu-img create -f qcow2 \
>>     -b /tmp/src/hd0base.qcow2 /tmp/src/hd0snap1.qcow2 20G
>> Formatting '/tmp/src/hd0snap1.qcow2', fmt=qcow2 size=21474836480 backing_file='/tmp/src/hd0base.qcow2' encryption=off cluster_size=65536
>>
>> $ qemu-img create -f qcow2 \
>>     -b /tmp/dst/hd0base.qcow2 /tmp/dst/hd0snap1.qcow2 20G
>> Formatting '/tmp/dst/hd0snap1.qcow2', fmt=qcow2 size=21474836480 backing_file='/tmp/src/hd0base.qcow2' encryption=off cluster_size=65536
>
> At this stage /tmp/dst/hd0base.qcow2 does not exist yet. The qemu-img
> output you pasted shows /tmp/src/hd0base.qcow2 was actually used.
> Typo?

No, that's part of the flag used in [PATCH 2/3] (Update the blkmirror
block driver): BDRV_O_NO_BACKING

It's also documented in the design:

http://www.ovirt.org/wiki/File:StorageLiveMigration2.png

>> (qemu) snapshot_blkdev -n ide0-hd0 \
>>     blkmirror:/tmp/src/hd0snap1.qcow2:/tmp/dst/hd0snap1.qcow2 blkmirror
>>
>> Step 4 - Backing File Copy
>> ==========================
>>
>> An external manager copies src/hd0base to the destination
>> dst/hd0base.
>>
>> [src/hd0base] - [src/hd0snap1] = VM1(read-write)
>> [dst/hd0base] - [dst/hd0snap1] = VM1(write-only)
>>
>> $ cp -a /tmp/src/hd0base.qcow2 /tmp/dst/hd0base.qcow2
>
> Are we missing a fixup step that changes backing_file in
> dst/hd0snap1.qcow2 to point at dst/hd0base.qcow2?

See above.

--
Federico
Re: [Qemu-devel] Live Block Migration using Mirroring
On Wed, Feb 22, 2012 at 5:13 PM, Federico Simoncelli fsimo...@redhat.com wrote:
> recently I've been working on live block migration combining the live
> snapshots and the blkmirror patch sent by Marcelo Tosatti a few months
> ago.
>
> The design is summarized at this url as Mirrored-Snapshot:
>
> http://www.ovirt.org/wiki/Features/Design/StorageLiveMigration

After mirrored-snapshot completes we're left with the base and the
snapshot. Is the idea to implement live snapshot merge next? Or do you
have something else planned to avoid growing the backing file chain each
time mirrored-snapshot is used?

Stefan
Re: [Qemu-devel] Live Block Migration using Mirroring
----- Original Message -----
From: Stefan Hajnoczi stefa...@gmail.com
To: Federico Simoncelli fsimo...@redhat.com
Cc: qemu-devel@nongnu.org, kw...@redhat.com, mtosa...@redhat.com
Sent: Thursday, February 23, 2012 5:35:23 PM
Subject: Re: [Qemu-devel] Live Block Migration using Mirroring

> On Wed, Feb 22, 2012 at 5:13 PM, Federico Simoncelli fsimo...@redhat.com wrote:
>> recently I've been working on live block migration combining the live
>> snapshots and the blkmirror patch sent by Marcelo Tosatti a few
>> months ago.
>>
>> The design is summarized at this url as Mirrored-Snapshot:
>>
>> http://www.ovirt.org/wiki/Features/Design/StorageLiveMigration
>
> After mirrored-snapshot completes we're left with the base and the
> snapshot. Is the idea to implement live snapshot merge next? Or do you
> have something else planned to avoid growing the backing file chain
> each time mirrored-snapshot is used?

The general idea is that I don't expect the need to migrate to a new
storage to be very frequent. Being able to live merge the new snapshot
would be great.

--
Federico