Re: A Live Backup feature for KVM

2011-04-25 Thread Jagane Sundar

Hello Stefan,

It's good to know that live snapshots and online backup are useful
functions.

I read through the two snapshot proposals that you pointed me at.

The direction that I chose to go is slightly different. In both of the
proposals you pointed me at, the original virtual disk is made
read-only and the VM writes to a different COW file. After backup
of the original virtual disk file is complete, the COW file is merged
with the original vdisk file.

Instead, I create an Original-Blocks-COW-file to store the original
blocks that are overwritten by the VM every time the VM performs
a write while the backup is in progress. Livebackup copies these
underlying blocks from the original virtual disk file before the VM's
write to the original virtual disk file is scheduled. The advantage of
this is that there is no merge necessary at the end of the backup, we
can simply delete the Original-Blocks-COW-file.
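
To make the write path concrete, here is a minimal standalone sketch of the
copy-before-write idea using plain POSIX I/O (this is not the actual
livebackup code; the block size, bitmap layout and function names are
assumptions for illustration only):

    /* Preserve the original block in the Original-Blocks-COW-file before
     * the guest's write is allowed to proceed. */
    #include <stdint.h>
    #include <unistd.h>

    #define BLOCK_SIZE 4096

    static int block_saved(const uint8_t *bm, uint64_t blk)
    {
        return bm[blk / 8] & (1 << (blk % 8));
    }

    static void mark_saved(uint8_t *bm, uint64_t blk)
    {
        bm[blk / 8] |= 1 << (blk % 8);
    }

    /* Called once per guest write to block 'blk' of the original image. */
    static int preserve_original_block(int disk_fd, int cow_fd,
                                       uint8_t *saved_bm, uint64_t blk)
    {
        uint8_t buf[BLOCK_SIZE];
        off_t off = (off_t)blk * BLOCK_SIZE;

        if (block_saved(saved_bm, blk))
            return 0;                     /* already preserved once */
        if (pread(disk_fd, buf, BLOCK_SIZE, off) != BLOCK_SIZE)
            return -1;                    /* read the original contents */
        if (pwrite(cow_fd, buf, BLOCK_SIZE, off) != BLOCK_SIZE)
            return -1;                    /* stash them in the COW file */
        mark_saved(saved_bm, blk);
        return 0;                         /* the guest write may now proceed */
    }

Since aborting a backup just means deleting the COW file, the primary image
is never rewritten on the abort path.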

I have some reasons to believe that the Original-Blocks-COW-file
design that I am putting forth might work better. I have listed them
below. (It's past midnight here, so pardon me if it sounds garbled -- I
will try to clarify more in a writeup on wiki.qemu.org).
Let me know what your thoughts are..

I feel that the livebackup mechanism will impact the running VM
less. For example, if something goes wrong with the backup process,
then we can simply delete the Original-Blocks-COW-file and force
the backup client to do a full backup the next time around. The
running VM or its virtual disks are not impacted at all.

Adjunct functionality such as block migration and live migration
might work more easily with the Original-Blocks-COW-file approach, since
the original virtual disk file functions as the only virtual disk
file for the VM. If a live migration needs to happen while a
backup is in progress, we can just delete the Original-Blocks-COW-file
and be on our way.

Livebackup includes a rudimentary network protocol to transfer
the modified blocks to a livebackup_client. It supports incremental
backups. Also, livebackup treats a backup as containing all the virtual
disks of a VM. Hence a snapshot in livebackup terms refers to a
snapshot of all the virtual disks.
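
The real framing lives in livebackup_client.c; purely as a hypothetical
illustration of what such a request/reply protocol could look like (the
command set, structures and field names below are mine, not the actual wire
format):

    #include <stdint.h>

    enum {
        LB_CMD_DO_SNAPSHOT      = 1,   /* freeze a consistent dirty set  */
        LB_CMD_GET_DIRTY_BLOCKS = 2,   /* stream blocks for one disk     */
        LB_CMD_DESTROY_SNAPSHOT = 3,   /* drop COW files, end the backup */
    };

    struct lb_request {
        uint32_t cmd;          /* one of LB_CMD_* */
        uint32_t disk_index;   /* which virtual disk of the VM */
        uint64_t first_block;  /* start of the requested range */
        uint32_t nb_blocks;    /* number of blocks requested */
    } __attribute__((packed));

    struct lb_reply {
        uint32_t status;       /* 0 on success */
        uint32_t payload_len;  /* bytes of block data that follow */
    } __attribute__((packed));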

The approximate sequence of operation is as follows:
1. VM boots up. When bdrv_open_common opens any file backed
virtual disk, it checks for a file called base_file.livebackupconf.
If such a file exists, then the virtual disk is part of the backup set,
and a chunk of memory is allocated to keep track of dirty blocks.
2. qemu starts up a livebackup thread that listens on a specified port
(e.g., port 7900) for connections from the livebackup client.
3. The livebackup_client connects to qemu at port 7900.
4. livebackup_client sends a 'do snapshot' command.
5. qemu waits 30 seconds for outstanding asynchronous I/O to complete.
6. When there are no more outstanding async I/O requests, qemu
copies the dirty_bitmap to its snapshot structure and starts a new dirty
bitmap (a rough sketch of this step follows the list).
7. livebackup_client starts iterating through the list of dirty blocks, and
starts saving these blocks to the backup image
8. When all blocks have been backed up, then the backup_client sends a
destroy snapshot command; the server simply deletes the
Original-Blocks-COW-files for each of the virtual disks and frees the
calloc'd memory holding the dirty blocks list.
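
As a rough, simplified model of step 6 (structure and field names are
assumptions, not the code in livebackup.[ch]): once in-flight I/O has
drained, the live bitmap is handed over to the snapshot and a fresh bitmap
starts tracking writes for the next backup.

    #include <stdint.h>
    #include <stdlib.h>

    struct lb_snapshot {
        uint8_t  *dirty_bitmap;   /* blocks written since the last backup */
        uint64_t  nb_blocks;
    };

    struct lb_disk {
        uint8_t  *dirty_bitmap;   /* live bitmap, updated on every write */
        uint64_t  nb_blocks;
        struct lb_snapshot snap;
    };

    static int lb_do_snapshot(struct lb_disk *disk)
    {
        size_t bytes = (disk->nb_blocks + 7) / 8;

        /* Hand the current bitmap over to the snapshot... */
        disk->snap.dirty_bitmap = disk->dirty_bitmap;
        disk->snap.nb_blocks    = disk->nb_blocks;

        /* ...and start an empty bitmap for writes made after the snapshot. */
        disk->dirty_bitmap = calloc(1, bytes);
        return disk->dirty_bitmap ? 0 : -1;
    }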

Thanks for the pointers to virtagent and fsfreeze. fsfreeze looks exactly like
what is necessary to quiesce file system activity.

I have pushed my code to the following git tree.
git://github.com/jagane/qemu-kvm-livebackup.git

It started as a clone of the linux kvm tree at:

git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git

If you want to look at the code, see livebackup.[ch] and livebackup_client.c

This is very much a work in progress, and I expect to do a lot of
testing/debugging over the next few weeks. I will also create a
detailed proposal on wiki.qemu.org, with much more information.

Thanks,
Jagane

On 4/24/2011 1:32 AM, Stefan Hajnoczi wrote:

On Sun, Apr 24, 2011 at 12:17 AM, Jagane Sundar jag...@sundar.org wrote:

I would like to get your input on a KVM feature that I am
currently developing.

What it does is this - it can perform full and incremental
disk backups of running KVM VMs, where a backup is defined
as a snapshot of the disk state of all virtual disks
configured for the VM.

Great, there is definitely demand for live snapshots and online
backup.  Some efforts are already underway to implement this.

Jes has worked on a live snapshot feature for online backups.  The
snapshot_blkdev QEMU monitor command is available in qemu.git and
works like this:
(qemu) snapshot_blkdev virtio-disk0 /tmp/new-img.qcow2

It will create a new image file backed by the current image file.  It
then switches the VM disk to the new image file.  All writes will go
to the new image file.  The backup software on the host can now read
from the original image file since it will not be modified.

Re: A Live Backup feature for KVM

2011-04-25 Thread Stefan Hajnoczi
On Mon, Apr 25, 2011 at 9:16 AM, Jagane Sundar jag...@sundar.org wrote:
 The direction that I chose to go is slightly different. In both of the
 proposals you pointed me at, the original virtual disk is made
 read-only and the VM writes to a different COW file. After backup
 of the original virtual disk file is complete, the COW file is merged
 with the original vdisk file.

 Instead, I create an Original-Blocks-COW-file to store the original
 blocks that are overwritten by the VM every time the VM performs
 a write while the backup is in progress. Livebackup copies these
 underlying blocks from the original virtual disk file before the VM's
 write to the original virtual disk file is scheduled. The advantage of
 this is that there is no merge necessary at the end of the backup, we
 can simply delete the Original-Blocks-COW-file.

The advantage of the approach that redirects writes to a new file
instead is that the heavy work of copying data is done asynchronously
during the merge operation instead of in the write path which will
impact guest performance.

Here's what I understand:

1. User takes a snapshot of the disk, QEMU creates old-disk.img backed
by the current-disk.img.
2. Guest issues a write A.
3. QEMU reads B from current-disk.img.
4. QEMU writes B to old-disk.img.
5. QEMU writes A to current-disk.img.
6. Guest receives write completion A.

The tricky thing is what happens if there is a failure after Step 5.
If writes A and B were unstable writes (no fsync()) then no ordering
is guaranteed and perhaps write A reached current-disk.img but write B
did not reach old-disk.img.  In this case we no longer have a
consistent old-disk.img snapshot - we're left with an updated
current-disk.img and old-disk.img does not have a copy of the old
data.

The solution is to fsync() after Step 4 and before Step 5 but this
will hurt performance.  We now have an extra read, write, and fsync()
on every write.
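
To make the ordering problem concrete, a crash-safe copy-before-write path
would need roughly the following (a standalone sketch with illustrative
names, not a patch): the preserved block must reach stable storage before
the new data is allowed to land.

    #include <stdint.h>
    #include <unistd.h>

    #define BLOCK_SIZE 4096

    static int guest_write_block(int disk_fd, int cow_fd, uint64_t blk,
                                 const void *new_data)
    {
        uint8_t old[BLOCK_SIZE];
        off_t off = (off_t)blk * BLOCK_SIZE;

        if (pread(disk_fd, old, BLOCK_SIZE, off) != BLOCK_SIZE)       /* 3 */
            return -1;
        if (pwrite(cow_fd, old, BLOCK_SIZE, off) != BLOCK_SIZE)       /* 4 */
            return -1;
        if (fdatasync(cow_fd) != 0)    /* the extra sync discussed above */
            return -1;
        if (pwrite(disk_fd, new_data, BLOCK_SIZE, off) != BLOCK_SIZE) /* 5 */
            return -1;
        return 0;                      /* 6: complete the guest's write */
    }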

 I have some reasons to believe that the Original-Blocks-COW-file
 design that I am putting forth might work better. I have listed them
 below. (It's past midnight here, so pardon me if it sounds garbled -- I
 will try to clarify more in a writeup on wiki.qemu.org).
 Let me know what your thoughts are..

 I feel that the livebackup mechanism will impact the running VM
 less. For example, if something goes wrong with the backup process,
 then we can simply delete the Original-Blocks-COW-file and force
 the backup client to do a full backup the next time around. The
 running VM or its virtual disks are not impacted at all.

Abandoning snapshots is not okay.  Snapshots will be used in scenarios
beyond backup and I don't think we can make them
unreliable/throw-away.

 Livebackup includes a rudimentary network protocol to transfer
 the modified blocks to a livebackup_client. It supports incremental
 backups. Also, livebackup treats a backup as containing all the virtual
 disks of a VM. Hence a snapshot in livebackup terms refers to a
 snapshot of all the virtual disks.

 The approximate sequence of operation is as follows:
 1. VM boots up. When bdrv_open_common opens any file backed
    virtual disk, it checks for a file called base_file.livebackupconf.
    If such a file exists, then the virtual disk is part of the backup set,
    and a chunk of memory is allocated to keep track of dirty blocks.
 2. qemu starts up a livebackup thread that listens on a specified port
    (e.g., port 7900) for connections from the livebackup client.
 3. The livebackup_client connects to qemu at port 7900.
 4. livebackup_client sends a 'do snapshot' command.
 5. qemu waits 30 seconds for outstanding asynchronous I/O to complete.
 6. When there are no more outstanding async I/O requests, qemu
    copies the dirty_bitmap to its snapshot structure and starts a new dirty
    bitmap.
 7. livebackup_client starts iterating through the list of dirty blocks, and
    starts saving these blocks to the backup image
 8. When all blocks have been backed up, then the backup_client sends a
    destroy snapshot command; the server simply deletes the
    Original-Blocks-COW-files for each of the virtual disks and frees the
    calloc'd memory holding the dirty blocks list.

I think there's a benefit to just pointing at
Original-Blocks-COW-files and letting the client access them directly.
This even works with shared storage where the actual backup work is
performed on another host via access to a shared network filesystem or
LUN.  It may not be desirable to send everything over the network.


Perhaps you made a custom network client because you are writing a
full-blown backup solution for KVM?  In that case it's your job to
move the data around and get it backed up.  But from QEMU's point of
view we just need to provide the data and it's up to the backup
software to send it over the network and do its magic.

 I have pushed my code to the following git tree.
 git://github.com/jagane/qemu-kvm-livebackup.git

 It started as a clone of the linux kvm tree at:
 git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git

Re: A Live Backup feature for KVM

2011-04-25 Thread Jagane Sundar

On 4/25/2011 6:34 AM, Stefan Hajnoczi wrote:

On Mon, Apr 25, 2011 at 9:16 AM, Jagane Sundar jag...@sundar.org wrote:

The direction that I chose to go is slightly different. In both of the
proposals you pointed me at, the original virtual disk is made
read-only and the VM writes to a different COW file. After backup
of the original virtual disk file is complete, the COW file is merged
with the original vdisk file.

Instead, I create an Original-Blocks-COW-file to store the original
blocks that are overwritten by the VM every time the VM performs
a write while the backup is in progress. Livebackup copies these
underlying blocks from the original virtual disk file before the VM's
write to the original virtual disk file is scheduled. The advantage of
this is that there is no merge necessary at the end of the backup, we
can simply delete the Original-Blocks-COW-file.

The advantage of the approach that redirects writes to a new file
instead is that the heavy work of copying data is done asynchronously
during the merge operation instead of in the write path which will
impact guest performance.

Here's what I understand:

1. User takes a snapshot of the disk, QEMU creates old-disk.img backed
by the current-disk.img.
2. Guest issues a write A.
3. QEMU reads B from current-disk.img.
4. QEMU writes B to old-disk.img.
5. QEMU writes A to current-disk.img.
6. Guest receives write completion A.

The tricky thing is what happens if there is a failure after Step 5.
If writes A and B were unstable writes (no fsync()) then no ordering
is guaranteed and perhaps write A reached current-disk.img but write B
did not reach old-disk.img.  In this case we no longer have a
consistent old-disk.img snapshot - we're left with an updated
current-disk.img and old-disk.img does not have a copy of the old
data.


In both approaches the number of I/O operations remains constant:

WRITES_TO_NEW_FILE_APPROACH
Create snapshot
- As new writes from the VM come in:
1. Write to new-disk.img
Asynchronously:
a. Read from new-disk.img
b. Write into old-disk.img
Delete snapshot

WRITES_TO_CURRENT_FILE_APPROACH
Create snapshot
- As new writes from the VM come in:
1. Read old block from current-disk.img
2. Write old block to old-disk.img
3. Write new block to current-disk.img
Delete snapshot

The number of I/O operations is 2 writes and 1 read, in both cases.
The critical factor, then, is the duration for which the VM must
maintain the snapshot.


The solution is to fsync() after Step 4 and before Step 5 but this
will hurt performance.  We now have an extra read, write, and fsync()
on every write.

I agree - fsync() just defeats the whole purpose of building a super efficient
live backup mechanism. I'm not planning to introduce fsync()s.
However, I want to treat the snapshot as a limited snapshot, only for backup
purposes. In my proposal, the old-disk.img is valid only for the time when
the livebackup client connects to qemu and transfers the blocks for
that backup over. If the disk suffers an intermittent failure after (5),
then the snapshot is deemed inconsistent, and discarded.


I have some reasons to believe that the Original-Blocks-COW-file
design that I am putting forth might work better. I have listed them
below. (It's past midnight here, so pardon me if it sounds garbled -- I
will try to clarify more in a writeup on wiki.qemu.org).
Let me know what your thoughts are..

I feel that the livebackup mechanism will impact the running VM
less. For example, if something goes wrong with the backup process,
then we can simply delete the Original-Blocks-COW-file and force
the backup client to do a full backup the next time around. The
running VM or its virtual disks are not impacted at all.

Abandoning snapshots is not okay.  Snapshots will be used in scenarios
beyond backup and I don't think we can make them
unreliable/throw-away.

My proposal is to treat the snapshot as a livebackup-specific entity that exists
only for the duration of the livebackup_client's connection to qemu to transfer
the blocks over. At other times, there is no snapshot, just a dirty blocks bitmap
indicating which blocks were modified since the last backup was taken.

Consider the use case of daily incremental backups:

WRITES_TO_NEW_FILE_APPROACH
- 1:00 AM Create snapshot A
  24 hours go by. All writes by the VM during this time are
  stored in the new-disk.img file.
- 1:00 AM next day, the backup program starts copying its
  incremental backup blocks, i.e. the blocks that were modified
  in the last 24 hours, and are all stored in new-disk.img
- 1:15 AM Merge snapshot A
  The asynchronous process now kicks in, and starts merging
  the blocks from new-disk.img into the old-disk.img
- 1:15 AM Create snapshot B

WRITES_TO_CURRENT_FILE_APPROACH
- 1:00 AM livebackup_client connects to qemu and creates snapshot
- livebackup_client starts transferring blocks modified by VM
  in the last 24 hours over the network to the backup server.
  Let's say that this takes about 15 minutes.
- 

Re: A Live Backup feature for KVM

2011-04-24 Thread Stefan Hajnoczi
On Sun, Apr 24, 2011 at 12:17 AM, Jagane Sundar jag...@sundar.org wrote:
 I would like to get your input on a KVM feature that I am
 currently developing.

 What it does is this - it can perform full and incremental
 disk backups of running KVM VMs, where a backup is defined
 as a snapshot of the disk state of all virtual disks
 configured for the VM.

Great, there is definitely demand for live snapshots and online
backup.  Some efforts are already underway to implement this.

Jes has worked on a live snapshot feature for online backups.  The
snapshot_blkdev QEMU monitor command is available in qemu.git and
works like this:
(qemu) snapshot_blkdev virtio-disk0 /tmp/new-img.qcow2

It will create a new image file backed by the current image file.  It
then switches the VM disk to the new image file.  All writes will go
to the new image file.  The backup software on the host can now read
from the original image file since it will not be modified.

There is no support yet for live merging the new image file back into
the original image file (live commit).

Here are some of the workflows and requirements:

http://wiki.qemu.org/Features/Snapshots
http://wiki.qemu.org/Features/Snapshots2
http://wiki.qemu.org/Features/Block/Merge

It is possible to find the dirty blocks by enumerating allocated
clusters in the new image file - these are the clusters that have been
written to since the snapshot.
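
As an aside, the 'map' subcommand that later qemu-img releases provide
reports exactly this allocation information:

qemu-img map /tmp/new-img.qcow2

Clusters shown as mapped to the new image file are the ones written since
the snapshot; everything else still resolves to the backing file.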

 My proposal will also eventually need the capability to run an
 agent in the guest for sync'ing the filesystem, flushing
 database caches, etc. I am also unsure whether just sync'ing
 an ext3 or ext4 FS and then snapshotting is adequate for backup
 purposes.

virtagent is being developed by Mike Roth as a guest agent for QEMU.
One of the use cases for virtagent is backup/snapshots and Jes has
submitted patches to add file system freeze.  You can find both
virtagent and fsfreeze on the qemu mailing list.

 Please let me know if you find this feature interesting. I am
 looking forward to feedback on any and all aspects of this
 design. I would like to work with the KVM community to
 contribute this feature to the KVM code base.

Do you have a link to a git repo with your code?

Stefan


A Live Backup feature for KVM

2011-04-23 Thread Jagane Sundar

Hello All,

I would like to get your input on a KVM feature that I am
currently developing.

What it does is this - it can perform full and incremental
disk backups of running KVM VMs, where a backup is defined
as a snapshot of the disk state of all virtual disks
configured for the VM.

This backup mechanism is built by modifying the qemu-kvm
userland process, and works as follows:
- If a VM is configured for backup, qemu-kvm maintains a
   dirty blocks list since the last backup. Note that this
   is different from the dirty blocks list currently
   maintained for block migration purposes in that it is
   persistent across VM reboots (see the sketch after this list).
- qemu-kvm creates a thread and listens for backup clients.
- A backup client connects to qemu-kvm and initiates an
   incremental backup.
  * A snapshot of each virtual disk is created by
qemu-kvm. This is as simple as saving the dirty
blocks map in the snapshot structure
  * The dirty blocks are now transferred over to the
backup client.
  * While this transfer is in progress, if any blocks
are written by the VM, the livebackup code
intercepts these writes, saves the old blocks in
a qcow2 file, and then allows the write to progress.
  * When the transfer of all dirty blocks in the
incremental backup is completed, then the snapshot
is destroyed.
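
A standalone sketch of how the persistent dirty blocks list might be kept
across restarts (the sidecar file layout and the function names are
illustrative assumptions, not the actual on-disk format):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Write the bitmap plus its size to a sidecar file next to the image. */
    static int save_dirty_bitmap(const char *path, const uint8_t *bm,
                                 uint64_t nb_blocks)
    {
        size_t bytes = (nb_blocks + 7) / 8;
        FILE *f = fopen(path, "wb");

        if (!f)
            return -1;
        if (fwrite(&nb_blocks, sizeof(nb_blocks), 1, f) != 1 ||
            fwrite(bm, 1, bytes, f) != bytes) {
            fclose(f);
            return -1;
        }
        return fclose(f) == 0 ? 0 : -1;
    }

    /* Read it back when the virtual disk is opened after a VM restart. */
    static uint8_t *load_dirty_bitmap(const char *path, uint64_t *nb_blocks)
    {
        FILE *f = fopen(path, "rb");
        uint8_t *bm = NULL;

        if (!f)
            return NULL;
        if (fread(nb_blocks, sizeof(*nb_blocks), 1, f) == 1) {
            size_t bytes = (*nb_blocks + 7) / 8;
            bm = calloc(1, bytes);
            if (bm && fread(bm, 1, bytes, f) != bytes) {
                free(bm);
                bm = NULL;
            }
        }
        fclose(f);
        return bm;
    }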

I have considered other technologies that may be utilized
to solve the same problem such as LVM snapshots. It is
possible to create a new LVM partition for each virtual disk
in the VM. When a VM needs to be backed up, each of these LVM
partitions is snapshotted. At this point things get messy
- I don't really know of a good way to identify the blocks
that were modified since the last backup. Also, once these
blocks are identified, we need a mechanism to transfer
them over a TCP connection to the backup server. Perhaps
a way to export the 'dirty blocks' map to userland and use
a daemon to transfer the blocks. Or maybe a kernel thread
capable of listening on TCP sockets and transferring the
blocks over to the backup client (I don't know if this
is possible).
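
For reference, the per-disk LVM snapshot step itself is simple (volume
group and LV names below are made up for illustration):

lvcreate --snapshot --size 1G --name vm1-disk0-snap /dev/vg0/vm1-disk0

It is the change tracking and the transfer to a backup server, as described
above, that LVM does not give us.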

In any case, my first attempt is to implement this in the
qemu-kvm userland binary.

The benefit to the end user of this technology is this: Today
IaaS cloud platforms such as EC2 provide you with the ability
to have two types of virtual disks in VM instances
1. Ephemeral virtual disks that are lost if there is a
hardware failure
2. EBS storage volumes which are costly.

I think that an efficient disk backup mechanism will enable
a third type of virtual disk - one that is backed up, perhaps
every hour or so. So a cloud operator using KVM virtual
machines can offer three types of VMs:
1. An ephemeral VM that is lost if a hardware failure happens
2. A backed up VM that can be restored from the last hourly
backup
3. A highly available VM running off of a NAS or SAN
or some such shared storage.

VMware has extensive support for backing up running Virtual
Machines in their products. It is called VMware Consolidated
Backup. A lot of it seems to be targeted at Windows VMs,
with hooks provided into Microsoft's Volume Snapshot Service
running in the guest.

My proposal will also eventually need the capability to run an
agent in the guest for sync'ing the filesystem, flushing
database caches, etc. I am also unsure whether just sync'ing
an ext3 or ext4 FS and then snapshotting is adequate for backup
purposes.

I want to target this feature squarely at the cloud use model,
with automated backups scheduled for instances created using
an EC2 or Openstack API.

Please let me know if you find this feature interesting. I am
looking forward to feedback on any and all aspects of this
design. I would like to work with the KVM community to
contribute this feature to the KVM code base.

Thanks,
Jagane Sundar
