Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-28 Thread Stefan Hajnoczi
On Wed, Jul 27, 2011 at 3:17 PM, Eric Blake  wrote:
> On 07/27/2011 04:04 AM, Stefan Hajnoczi wrote:
>>
>> On Thu, Jun 16, 2011 at 6:41 AM, Eric Blake  wrote:
>>>
>>> Right now, libvirt has a snapshot API via virDomainSnapshotCreateXML,
>>> but for qemu domains, it only works if all the guest disk images are
>>> qcow2, and qemu rather than libvirt does all the work.  However, it has
>>> a couple of drawbacks: it is inherently tied to domains (there is no way
>>> to manage snapshots of storage volumes not tied to a domain, even though
>>> libvirt does that for qcow2 images associated with offline qemu domains
>>> by using the qemu-img application).  And it necessarily operates on all
>>> of the images associated with a domain in parallel - if any disk image
>>> is not qcow2, the snapshot fails, and there is no way to select a subset
>>> of disks to save.  However, it works on both active (disk and memory
>>> state) and inactive domains (just disk state).
>>
>> Hi Eric,
>> Any updates on your proposed snapshot API enhancements?
>
> I still need to post a v2 RFC that gives the XML changes needed to support
> disk snapshots on top of virDomainSnapshotCreateXML.
>
> Meanwhile, I still want to add additional API to make it easier to manage
> offline storage volume snapshots (storage volumes that are not in use by a
> defined or running domain), although it obviously won't happen by 0.9.4.

Previous discussion has focussed on image files.  Is part of your plan
to extend virStorageBackend and let the LVM backend support the new
APIs?

>> The use case I am particularly interested in is a backup solution
>> using libvirt snapshot APIs to take consistent backups of guests.  The
>> two workflows are reading out full snapshots of disk images (full
>> backup) and reading only those blocks that have changed since the last
>> backup (incremental backup).
>
> Full backup should be supported via virDomainSnapshotCreateXML, once I have
> the new XML in place, by using the new qemu snapshot_blkdev monitor
> commands.

Excellent :)

>> Incremental backups can be done just like full backups except with an
>> API call to get a dirty blocks list.  The client only reads out those
>> dirty blocks from the snapshot.
>
> Incremental backups will need more work - qemu does not yet have the monitor
> commands for exposing which blocks are dirty.  Of course, as code is written
> for qemu, it would be nice to also be thinking about how to support that in
> libvirt, so once I do propose my XML changes for full snapshots, it would be
> nice to remember in your review to think about whether it remains extensible
> to the incremental case.

Robert Wang and I would like to help make the incremental use case
possible.  Like you say, QEMU does not have this feature yet but it
makes sense to plan together with libvirt.  Will keep you CCed on QEMU
discussions about adding this feature.

Stefan

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-27 Thread Eric Blake

On 07/27/2011 04:04 AM, Stefan Hajnoczi wrote:

On Thu, Jun 16, 2011 at 6:41 AM, Eric Blake  wrote:

Right now, libvirt has a snapshot API via virDomainSnapshotCreateXML,
but for qemu domains, it only works if all the guest disk images are
qcow2, and qemu rather than libvirt does all the work.  However, it has
a couple of drawbacks: it is inherently tied to domains (there is no way
to manage snapshots of storage volumes not tied to a domain, even though
libvirt does that for qcow2 images associated with offline qemu domains
by using the qemu-img application).  And it necessarily operates on all
of the images associated with a domain in parallel - if any disk image
is not qcow2, the snapshot fails, and there is no way to select a subset
of disks to save.  However, it works on both active (disk and memory
state) and inactive domains (just disk state).


Hi Eric,
Any updates on your proposed snapshot API enhancements?


I still need to post a v2 RFC that gives the XML changes needed to 
support disk snapshots on top of virDomainSnapshotCreateXML.


Meanwhile, I still want to add additional API to make it easier to 
manage offline storage volume snapshots (storage volumes that are not in 
use by a defined or running domain), although it obviously won't happen 
by 0.9.4.




The use case I am particularly interested in is a backup solution
using libvirt snapshot APIs to take consistent backups of guests.  The
two workflows are reading out full snapshots of disk images (full
backup) and reading only those blocks that have changed since the last
backup (incremental backup).


Full backup should be supported via virDomainSnapshotCreateXML, once I 
have the new XML in place, by using the new qemu snapshot_blkdev monitor 
commands.




Incremental backups can be done just like full backups except with an
API call to get a dirty blocks list.  The client only reads out those
dirty blocks from the snapshot.


Incremental backups will need more work - qemu does not yet have the 
monitor commands for exposing which blocks are dirty.  Of course, as 
code is written for qemu, it would be nice to also be thinking about how 
to support that in libvirt, so once I do propose my XML changes for full 
snapshots, it would be nice to remember in your review to think about 
whether it remains extensible to the incremental case.


--
Eric Blake   ebl...@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org



Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-27 Thread Stefan Hajnoczi
On Thu, Jun 16, 2011 at 6:41 AM, Eric Blake  wrote:
> Right now, libvirt has a snapshot API via virDomainSnapshotCreateXML,
> but for qemu domains, it only works if all the guest disk images are
> qcow2, and qemu rather than libvirt does all the work.  However, it has
> a couple of drawbacks: it is inherently tied to domains (there is no way
> to manage snapshots of storage volumes not tied to a domain, even though
> libvirt does that for qcow2 images associated with offline qemu domains
> by using the qemu-img application).  And it necessarily operates on all
> of the images associated with a domain in parallel - if any disk image
> is not qcow2, the snapshot fails, and there is no way to select a subset
> of disks to save.  However, it works on both active (disk and memory
> state) and inactive domains (just disk state).

Hi Eric,
Any updates on your proposed snapshot API enhancements?

The use case I am particularly interested in is a backup solution
using libvirt snapshot APIs to take consistent backups of guests.  The
two workflows are reading out full snapshots of disk images (full
backup) and reading only those blocks that have changed since the last
backup (incremental backup).

Incremental backups can be done just like full backups except with an
API call to get a dirty blocks list.  The client only reads out those
dirty blocks from the snapshot.
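As a rough sketch (plain Python, purely illustrative: the snapshot object, read_block() and the dirty-block list are hypothetical stand-ins, since no such API exists yet), the two workflows look like:

```python
# Illustrative model of the two backup workflows: a full backup reads
# every block of the snapshot; an incremental backup reads only the
# blocks reported dirty since the last backup.  All names here are
# hypothetical stand-ins, not real libvirt or QEMU APIs.

def full_backup(snapshot, num_blocks, read_block):
    """Full backup: read out every block of the snapshot."""
    return {i: read_block(snapshot, i) for i in range(num_blocks)}

def incremental_backup(snapshot, dirty_blocks, read_block):
    """Incremental backup: read out only the dirty blocks."""
    return {i: read_block(snapshot, i) for i in dirty_blocks}
```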

Stefan



Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-08 Thread Eric Blake
On 07/08/2011 07:35 AM, Jes Sorensen wrote:
> On 07/08/11 10:58, Stefan Hajnoczi wrote:
>> On Thu, Jul 7, 2011 at 8:34 PM, Eric Blake  wrote:
>>> Well, the best thing (from libvirt's point of view) would be if
>>> snapshot_blkdev took a single string argument, which is either a
>>> /path/to/filename (and qemu does open()) or fd:name notation (to refer
>>> to a previously-named fd passed via the getfd monitor command, so that
>>> libvirt does open()).  This would make SELinux integration easier, as
>>> one of the sVirt goals is to get to the point where we can use SELinux
>>> to forbid qemu from open()ing files on NFS shares, while still
>>> permitting all other operations on already-open fds passed in from libvirt.
>>
>> Today QEMU supports /path/to/filename.  An fd argument to
>> snapshot_blkdev requires a little bit of work since the QEMU block
>> layer .bdrv_create() interface takes a filename and tries to create
>> it.
>>
>> Jes: Is the fd argument to snapshot_blkdev in your plans?
> 
> I only ever heard suggestions for taking fd arguments yesterday, so I
> cannot say it really is in my plans. If I get a good justification I
> might be convinced :)

I already gave the justification - SELinux.

For a disk mounted over NFS, the current SELinux bool virt_use_nfs is a
bit broad (it gives all-or-nothing access to qemu to be able to open()
arbitrary files on NFS).  The way to tighten it is to instead have
libvirt in charge of opening any file on NFS, then pass that fd to qemu
- in which case SELinux policy can give qemu carte-blanche to do
anything on an existing NFS fd, but not to open() a new one.  Then sVirt
has gained an additional layer of protection where qemu cannot blindly
stomp on another VM's storage merely because both VMs happened to store
their disk images on the same NFS server.  But this requires that _all_
places in qemu that currently open() a file also be given an alternative
to get the fd via command-line or getfd inheritance.

-- 
Eric Blake   ebl...@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org




Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-08 Thread Jes Sorensen
On 07/08/11 10:58, Stefan Hajnoczi wrote:
> On Thu, Jul 7, 2011 at 8:34 PM, Eric Blake  wrote:
>> Well, the best thing (from libvirt's point of view) would be if
>> snapshot_blkdev took a single string argument, which is either a
>> /path/to/filename (and qemu does open()) or fd:name notation (to refer
>> to a previously-named fd passed via the getfd monitor command, so that
>> libvirt does open()).  This would make SELinux integration easier, as
>> one of the sVirt goals is to get to the point where we can use SELinux
>> to forbid qemu from open()ing files on NFS shares, while still
>> permitting all other operations on already-open fds passed in from libvirt.
> 
> Today QEMU supports /path/to/filename.  An fd argument to
> snapshot_blkdev requires a little bit of work since the QEMU block
> layer .bdrv_create() interface takes a filename and tries to create
> it.
> 
> Jes: Is the fd argument to snapshot_blkdev in your plans?

I only ever heard suggestions for taking fd arguments yesterday, so I
cannot say it really is in my plans. If I get a good justification I
might be convinced :)

Cheers,
Jes



Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-08 Thread Stefan Hajnoczi
On Thu, Jul 7, 2011 at 8:34 PM, Eric Blake  wrote:
> On 07/07/2011 03:13 AM, Stefan Hajnoczi wrote:
>> On Wed, Jul 6, 2011 at 3:03 PM, Eric Blake  wrote:
>>> In other words, it looks like we are stuck with updating XML to track
>>> new file names any time we take a snapshot.
>>
>> Yes, but QEMU's snapshot_blkdev command takes a filename argument so
>> at least you get to specify that new filename.
>
> Well, the best thing (from libvirt's point of view) would be if
> snapshot_blkdev took a single string argument, which is either a
> /path/to/filename (and qemu does open()) or fd:name notation (to refer
> to a previously-named fd passed via the getfd monitor command, so that
> libvirt does open()).  This would make SELinux integration easier, as
> one of the sVirt goals is to get to the point where we can use SELinux
> to forbid qemu from open()ing files on NFS shares, while still
> permitting all other operations on already-open fds passed in from libvirt.

Today QEMU supports /path/to/filename.  An fd argument to
snapshot_blkdev requires a little bit of work since the QEMU block
layer .bdrv_create() interface takes a filename and tries to create
it.

Jes: Is the fd argument to snapshot_blkdev in your plans?

Stefan



Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-07 Thread Eric Blake
On 07/07/2011 03:13 AM, Stefan Hajnoczi wrote:
> On Wed, Jul 6, 2011 at 3:03 PM, Eric Blake  wrote:
>> In other words, it looks like we are stuck with updating XML to track
>> new file names any time we take a snapshot.
> 
> Yes, but QEMU's snapshot_blkdev command takes a filename argument so
> at least you get to specify that new filename.

Well, the best thing (from libvirt's point of view) would be if
snapshot_blkdev took a single string argument, which is either a
/path/to/filename (and qemu does open()) or fd:name notation (to refer
to a previously-named fd passed via the getfd monitor command, so that
libvirt does open()).  This would make SELinux integration easier, as
one of the sVirt goals is to get to the point where we can use SELinux
to forbid qemu from open()ing files on NFS shares, while still
permitting all other operations on already-open fds passed in from libvirt.

-- 
Eric Blake   ebl...@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org




Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-07 Thread Stefan Hajnoczi
On Wed, Jul 6, 2011 at 3:03 PM, Eric Blake  wrote:
> In other words, it looks like we are stuck with updating XML to track
> new file names any time we take a snapshot.

Yes, but QEMU's snapshot_blkdev command takes a filename argument so
at least you get to specify that new filename.

Stefan



Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-06 Thread Eric Blake
On 07/06/2011 02:59 AM, Nicolas Sebrecht wrote:
> The 05/07/11, Eric Blake wrote:
> 
>> I was trying to model this after the virDomainSnapshot API, which has a
>> notion of current snapshot, but I guess I don't fully understand what
>> that API was implying by current domain snapshot.
>>
>> After looking more through the code, it looks like the idea was to
>> support the notion of a branching hierarchy of snapshots:
>>
>> A -> B -> D
>>  \-> C -> E

Someone pointed out to me (I'm not sure if it was in this thread) that
qcow2 does _not_ by itself maintain any hierarchy - from qemu-img's
point of view, all snapshots are equally independent reference-counted
copies of data at a moment in time.  So the hierarchy present in
libvirt's use of virDomainSnapshot* APIs as applied to system
checkpoints is a factor of libvirt maintaining extra metadata alongside
the qcow2 images.  That is, libvirt remembers which checkpoint was most
recently created/reverted, and when creating a new checkpoint, assigns
the current snapshot as the parent of the new one.

> 
> May I ask what would happen if the user asks for deletion of A (as both
> base and snapshot)?  Would it be possible?  Would it be merged to both B
> and C? Using which filename?

Libvirt already handles this just fine for live checkpoints (although in
my testing yesterday, it looks like libvirt forgets to maintain a notion
of current snapshot if you used virDomainSnapshotCreateXML while a
domain was inactive, making for an incomplete hierarchy).

Basically, when deleting a snapshot, you have two options - delete that
snapshot and all descendants, or delete just the snapshot and reparent
all immediate children.  That is, in the above scenario, when deleting
B, you can choose whether to end up with:

A -> C -> E   (B and B's children all deleted)

or

A -> D
 \-> C -> E   (B deleted, and D reparented to B's parent)

Right now, libvirt's handling of checkpoint hierarchy under
virDomainSnapshot* API fits in with the qemu 'savevm' usage, which uses
only internal qcow2 snapshots.  So there are no filenames to merge.  You
are indeed correct, though, that when deleting snapshots that
involve external files, it would be nice to have user control over
which of the two files gets deleted.  But the same principles apply - if
E.img has C.img as a backing file, then requesting to delete snapshot C
means that either E.img will also need to be deleted, or it will need to
be rebased to have its backing-file updated to match the parent of C.img.
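The two deletion policies can be modeled with a toy parent map (illustrative Python only; libvirt's real metadata and the qemu-img rebase step are not shown):

```python
# Toy model of the two deletion policies: each snapshot records its
# parent; deleting a victim either removes its whole subtree, or
# reparents the victim's children onto the victim's parent (the
# external-file analogue of rebasing E.img onto the parent of C.img).

def delete_subtree(parents, victim):
    """Delete victim and every descendant."""
    doomed = {victim}
    changed = True
    while changed:
        changed = False
        for child, parent in parents.items():
            if parent in doomed and child not in doomed:
                doomed.add(child)
                changed = True
    return {c: p for c, p in parents.items() if c not in doomed}

def delete_reparent(parents, victim):
    """Delete only victim; its children move up to victim's parent."""
    new_parent = parents[victim]
    out = {}
    for child, parent in parents.items():
        if child == victim:
            continue
        out[child] = new_parent if parent == victim else parent
    return out
```

With the hierarchy A -> B -> D, A -> C -> E from above, deleting B yields either A -> C -> E (subtree) or A -> D plus A -> C -> E (reparent), matching the two outcomes described.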

-- 
Eric Blake   ebl...@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org




Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-06 Thread Eric Blake
On 07/06/2011 02:47 AM, Nicolas Sebrecht wrote:
>> And even if you have control over which of the two images to
>> delete, you may also want to have control over the final filename used
>> for the merged image (that is, in the 5% dirty case, use the
>> snapshot->base merge followed by rename(base,snapshot), rather than
>> wasting time on the base->snapshot merge, to still get the end result
>> that the final filename is snapshot).
> 
> I agree with your analysis. The current behaviour is blocking us from
> using qcow2 snapshots while your RFC fix the issues.
> 
> But in your last sentence, what do you mean by "to still get the end
> result that the final filename is snapshot"? As end result, are you
> talking about:
> 
> 1) the filename (which would mean we could have moving path for disks)

Starting from a single file base.img, if I take an external snapshot, I
would now have base.img as the read-only base and new.img as the live
file with a backing of base.img.  If I then want to merge the two files,
I want to choose which file to keep:

if I want to keep base.img, then I either merge the dirty blocks from
new.img back to base.img, or I merge the clean blocks from base.img to
new.img then rename new.img to base.img.

if I want to keep new.img, then I either merge clean blocks from
base.img to new.img, or I merge dirty blocks from new.img to base.img
then rename base.img to new.img.
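A toy sketch of that choice (illustrative Python; 'commit' and 'pull' are informal labels for the two merge directions, not real commands):

```python
# Toy model of the merge choices: collapse new.img (the delta) and
# base.img into one file, picking the direction that copies fewer
# clusters, then renaming if needed so the surviving file keeps the
# requested name.  Purely illustrative; no real images are touched.

def merge_plan(keep, dirty_fraction):
    """Return the cheaper operation sequence, preserving filename 'keep'."""
    commit_cheaper = dirty_fraction <= 0.5   # few dirty clusters to copy back
    if commit_cheaper:
        ops = ['copy dirty clusters new.img -> base.img']
        if keep == 'new':
            ops.append('rename base.img -> new.img')
    else:
        ops = ['copy clean clusters base.img -> new.img']
        if keep == 'base':
            ops.append('rename new.img -> base.img')
    return ops
```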

But your point about changing file names is also an important
consideration - when libvirt tells qemu to do a snapshot, what is really
happening (and should qemu support both modes of operation)?

1. libvirt pre-creates an empty file new.img with correct permissions,
then tells qemu that file name; qemu uses the existing base.img as the
read-only base and makes all further edits into the file new.img; so
libvirt has to update the domain XML

2. libvirt creates a hard link new.img as an alternate name to base.img,
then tells qemu that file name; qemu then opens new.img, unlinks
base.img, and recreates base.img with new.img as the backing file,
making all further edits to the new inode but existing base.img file
name; so libvirt does not have to edit domain XML.  Except that
permissions may prevent qemu from re-creating a file, and truncating a
hard-linked file is insufficient.  So this method would involve some
additional handshaking steps, where qemu would have to get help from
libvirt in re-creating the new file.  So this is a non-starter.

3. libvirt renames base.img to new.img while qemu still has the fd open,
then creates base.img with the right permissions, and tells qemu to make
the snapshot into base.img with a backing of new.img; all further edits
go into the file base.img which now has a backing file of new.img.  But
this method implies that qemu has to either trust that libvirt did the
rename correctly (or compare the inode between an fstat of the existing
open fd and the stat of the backing file name), as well as implying that
libvirt has to pass two filenames instead of one.  It also implies that
you can rename an in-use file (renaming devices doesn't work as well,
and this isn't portable to mingw).  So this also sounds like a non-starter.

4. Any other possibilities?

In other words, it looks like we are stuck with updating XML to track
new file names any time we take a snapshot.

-- 
Eric Blake   ebl...@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org




Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-06 Thread Stefan Hajnoczi
On Tue, Jul 5, 2011 at 8:59 PM, Eric Blake  wrote:
> On 07/04/2011 08:19 AM, Stefan Hajnoczi wrote:
>> On Thu, Jun 16, 2011 at 6:41 AM, Eric Blake  wrote:
>>
>> Robert, Fernando, Jagane: I have CCed you because we have discussed
>> snapshot APIs and I thought you'd be interested in Eric's work to
>> build them for libvirt.
>>
>> Does each volume have its own independent snapshot namespace?  It may
>> be wise to document that snapshot namespaces are *not* independent
>> because storage backends may not be able to provide these semantics.
>
> Good question, and I'm not quite sure on the best way to represent this.
>
> For qcow2 internal snapshots, the answer is obvious - each qcow2 image
> has its own snapshot namespace (and you can currently view this with
> qemu-img snapshot -l).  But it looks like the existing drive to add qemu
> snapshot support to live domains is focusing solely on external
> snapshots (that is, a qcow2 snapshot involves creating a new filename,
> then marking the old filename as a read-only backing image of the new
> qcow2 filename; where qcow2 can even be used as a snapshot around a raw
> file).
>
> For external snapshots, I'm not sure whether the namespace should be
> specific to a storage pool (that is, any additional metadata that
> libvirt needs to track snapshot relationships should be stored in the
> same directory as the disk images themselves) or specific to the libvirt
> host (that is, just as libvirt's notion of a persistent storage pool
> happens to be stored in /etc/libvirt/storage/pool.xml, libvirt should
> also manage a file /etc/libvirt/storage/pool/snapshot.xml to describe
> all snapshots tracked within "pool").  But I'm certainly thinking that
> it is more likely to be a pool-wide namespace, rather than a
> storage-volume local namespace.

Pool-wide seems reasonable.

>>
>> Is this function necessary when you already have
>> virStorageVolSnapshotListNames()?
>
> Maybe I need to understand why we have it for the virDomainSnapshot
> case, and whether it still makes sense for a disk image that is not
> associated with a domain.  To some degree, I think it seems necessary,
> to reflect the fact that with internal qcow2 snapshots, I can do:
>
> qemu-img snapshot -c one file
> run then stop vm to modify file
> qemu-img snapshot -c two file
> qemu-img snapshot -a one file
> run then stop vm to modify file
> qemu-img snapshot -c three file
>
> with the resulting hierarchy:
>
> one -> two
>   \-> three
>
> On the other hand, qemu-img doesn't appear to list any hierarchies
> between internal snapshots - that is, while 'qemu-img snapshot -l' will
> list one, two, and three, it gives no indication that three depends on
> one but not two, nor whether the current state of the file would be a
> delta against three, two, one, or even parent-less.

There is no explicit relationship between internal qcow2 snapshots.
qcow2 does reference counting of the actual data clusters and tables
but the snapshot itself is oblivious.  You can delete "one" without
affecting "two" or "three".  There is no dependency relationship
between snapshots themselves, only reference counts on data clusters
and tables.

Here is how qcow2 snapshot operations work:

1. Create snapshot
Increment reference counts for entire active image.
Copy active L1 table into the snapshot data structure.

2. Activate snapshot
Decrement reference counts for entire active image.
Copy snapshot L1 table into active data structure.
Increment reference counts for entire active image.

3. Delete snapshot
Decrement reference counts for entire snapshot image.
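Those three operations can be modeled with a toy reference-count table (illustrative Python; this is not qcow2's actual on-disk format, just the counting behaviour described above):

```python
# Toy model of qcow2 internal snapshots: snapshots and the active image
# each hold references on data clusters; create/activate/delete only
# adjust reference counts, so no snapshot depends on another.

class Qcow2:
    def __init__(self):
        self.refcount = {}      # cluster -> reference count
        self.active = set()     # clusters referenced by the active image
        self.snapshots = {}     # name -> frozen copy of the "L1 table"

    def _ref(self, clusters, delta):
        for c in clusters:
            self.refcount[c] = self.refcount.get(c, 0) + delta

    def create(self, name):
        self._ref(self.active, +1)               # bump entire active image
        self.snapshots[name] = frozenset(self.active)

    def activate(self, name):
        self._ref(self.active, -1)               # drop old active refs
        self.active = set(self.snapshots[name])  # copy snapshot L1 table
        self._ref(self.active, +1)

    def delete(self, name):
        self._ref(self.snapshots.pop(name), -1)  # count-0 clusters are free
```

Deleting one snapshot leaves every other snapshot's cluster set intact, which is why "one" can be removed without affecting "two" or "three".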

> This also starts to get into questions about the ability to split a
> qcow2 image with internal snapshots.  That is, if I have a single file
> with snapshot one and a delta against that snapshot as the current disk
> state, it would be nice to create a new qcow2 file with identical
> contents to snapshot one, then rebase the existing qcow2 file to have a
> backing file of my new clone file and delete the internal snapshot from
> the original file.  But this starts to sound like work on live block
> copy APIs.  For an offline storage volume, we can do things manually
> (qemu-img snapshot -c to temporarily create yet another snapshot point
> to later return to, qemu-img snapshot -a to revert to the snapshot of
> interest, then qemu-img convert to copy off the contents, then qemu-img
> snapshot -a to the temporary state, then qemu-img snapshot -d to clean
> up the temporary snapshot).  But for a storage volume currently in
> use by qemu, this would imply a new qemu command to have qemu assist in
> streaming out the contents of the snapshot state.

The current live block copy/image streaming APIs do not know about
internal snapshots.  Copying the contents of a snapshot while the VM is
running is technically doable but there is no API and no code for it
in QEMU.

>>> /* Return the most recent snapshot of a volume, if one exists, or NULL
>>> on failure.  Flags is 0 for now.  */
>>>

Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-05 Thread Jagane Sundar

On 7/5/2011 2:36 PM, Eric Blake wrote:

On 07/04/2011 07:53 PM, Jagane Sundar wrote:

Thanks for looping me in, Stefan.


Does each volume have its own independent snapshot namespace?  It may
be wise to document that snapshot namespaces are *not* independent
because storage backends may not be able to provide these semantics.


There is a need for 'just-a-volume-snapshot', and for a
'whole-vm-snapshot'.
The 'whole-vm-snapshot' can possibly be a collection of
'just-a-volume-snapshot'.

In the case of the current libvirt API, which makes use of the qemu
'savevm' monitor command (and thus is more of a checkpoint, rather than
just disk snapshots):

'savevm' fails unless all disks associated with the guest are already
qcow2 format.  Additionally, it creates a snapshot visible to 'qemu-img
snapshot -l' in all associated disks, but where only the primary disk
additionally has the state of RAM also in the image.

As for the creating a snapshot commands - the proposal for
virStorageVolSnapshotCreateXML is _solely_ for offline management of
storage volumes.  If a storage volume is in use by a running qemu
domain, then the only appropriate way to take an online snapshot of that
disk (short of stopping the domain to get to the offline snapshot case)
is to use the existing virDomainSnapshotCreateXML API instead.  And that
API is already flexible enough to support 'whole-vm-snapshot' vs.
'just-a-volume-snapshot'.

OK. My use case is solved by virDomainSnapshotCreateXML, so long as this
function is enhanced to support disk-only snapshots.


information to explicitly specify the filename to use on the created
external snapshot file (rather than letting libvirt generate the
snapshot name).  That is, provide at least one  element, and you
now have fine-grained control over which volumes get a snapshot (or even
how that snapshot is created).


There are two types of snapshots that I am aware of:
- Base file is left unmodified after snapshot, snapshot file is created
and modified. e.g. qcow2 (I think)
- Base file continues to be modified. The snapshot file gets COW blocks
copied into it. e.g. LVM, Livebackup, etc.

There's a third - the qcow2 internal snapshot:

- Base file contains both the snapshot and the delta.


Can we enhance the libvirt API to indicate what type of snapshot is
desired. Also, when a snapshot is listed, can we try and describe it as
one kind or the other?

Yes, there are already some read-only XML elements in the
<domainsnapshot> XML (that is, libvirt ignores or rejects them if you
pass them to virDomainSnapshotCreateXML, but virDomainSnapshotGetXMLDesc
will list those additional elements to give you more details about the
snapshot); adding a sub-element to state whether the snapshot is
backing-file based (original is now treated as read-only, and
modifications affect the snapshot) or COW based (original and backup
share all blocks to begin with, but as the original gets modified, the
read-only backup has more unique blocks) would fit the same pattern.



Sounds fine.


There is no facility in the API to track dirty bitmaps. Suppose a disk
format or qemu proper has the ability to maintain a dirty bitmap of
blocks(or clusters) modified since some event (time in ms, perhaps). I
would like libvirt to provide a function such as:

/*
* returns NULL if the underlying block driver does not support
* maintaining a dirty bitmap. If it does support a dirty bitmap,
* the driver returns an opaque object that represents the time
* since which this dirty bitmap is valid.
*
* Used by incremental backup programs to determine if qemu
* has a bitmap of blocks that were dirtied since the last time
* a backup was taken.
*/
virStorageDirtyBitmapTimeOpaquePtr
virStorageVolDirtyBitmapPresent(virStorageVolPtr vol)
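A toy model of the proposed semantics (illustrative Python; Volume and its methods are hypothetical, mirroring the proposed C signature only loosely):

```python
# Toy model of the proposed dirty-bitmap query: a driver either tracks
# writes since some point in time (and returns that point as the opaque
# validity token) or does not support tracking and returns None, the
# analogue of the NULL return in the proposed C API.

class Volume:
    def __init__(self, tracks_dirty=True):
        self.tracks_dirty = tracks_dirty
        self.bitmap_since = 0 if tracks_dirty else None  # opaque token
        self.dirty = set()

    def write(self, cluster):
        if self.tracks_dirty:
            self.dirty.add(cluster)          # record since bitmap_since

    def dirty_bitmap_present(self):
        """Analogue of virStorageVolDirtyBitmapPresent(): token or None."""
        return self.bitmap_since if self.tracks_dirty else None
```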

Yes, we already had a discussion about the utility of being able to
expose how much of an image is directly contained within a file, vs.
being pulled in from a backing file (which can also be read as how much
of an image is dirty compared to the state of a snapshot).  See Daniel's
earlier thoughts:
https://www.redhat.com/archives/libvir-list/2011-April/msg00555.html

You are right. The thread that you point to is indeed more relevant to
my use case.

I missed that thread of conversation. However, I do not see a conclusion
to the discussion. My position is similar to Daniel's - I would find an
API that exposes 'logical allocation information' (to quote Daniel)
useful. His use case is for COR image streaming. My use case is for
making incremental backups. The backup program wants to know which
blocks were modified since the last backup was taken.

Thanks,
Jagane



Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-05 Thread Eric Blake
On 07/05/2011 04:02 AM, Stefan Hajnoczi wrote:
> On Tue, Jul 5, 2011 at 2:53 AM, Jagane Sundar  wrote:
>>> /* Create a snapshot of a storage volume.  XML is optional; if non-NULL,
>>>  * it would be a new top-level element which is similar to
>>>  * the top-level <domainsnapshot> for virDomainSnapshotCreateXML, to
>>>  * specify name and description.  Flags is 0 for now.
>>>  */
>>> virStorageVolSnapshotPtr virStorageVolSnapshotCreateXML(
>>> virStorageVolPtr vol, const char *xml, unsigned int flags);
>>>
>> There are two types of snapshots that I am aware of:
>> - Base file is left unmodified after snapshot, snapshot file is created and
>> modified. e.g. qcow2 (I think)
> 
> More detail on this approach as implemented by QEMU's snapshot_blkdev:
> 
> Create snapshot.qcow2 with base.img as backing file.  base.img is now
> read-only and can be accessed as a "snapshot".  All writes go to
> snapshot.qcow2.
> 
> When the snapshot is no longer needed it is necessary to merge the COW
> data back into base.img before deleting snapshot.qcow2.

or, to merge all of base.img into snapshot.qcow2 then change
snapshot.qcow2 to no longer have a backing file, before deleting base.img.

As I understand it, either file can be deleted when a snapshot is no
longer needed, but having the flexibility to decide which of the two
files to delete would be useful, and may require knowing how dirty the
snapshot file is in relation to the original file (if it is 95% dirty,
it is faster to just pull in the last few blocks from base into snapshot
before deleting base, whereas if it is only 5% dirty, it is faster to
sync the dirtied blocks from snapshot back to base before deleting
snapshot).  And even if you have control over which of the two images to
delete, you may also want to have control over the final filename used
for the merged image (that is, in the 5% dirty case, use the
snapshot->base merge followed by rename(base,snapshot), rather than
wasting time on the base->snapshot merge, to still get the end result
that the final filename is snapshot).
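
The trade-off just described can be sketched as a small heuristic
(illustrative Python, not libvirt code; the file names and block counts
are hypothetical):

```python
def choose_merge_direction(dirty_blocks, total_blocks):
    """Pick the cheaper merge direction for an external snapshot.

    dirty_blocks: blocks rewritten in snapshot.qcow2 since it was taken.
    Returns (merge_target, file_to_delete): the direction that copies
    fewer blocks wins.
    """
    clean_blocks = total_blocks - dirty_blocks
    if dirty_blocks < clean_blocks:
        # Few blocks dirtied: sync them back into base, drop the snapshot.
        return ("base.img", "snapshot.qcow2")
    # Mostly dirtied: pull the remaining clean blocks into the snapshot,
    # drop the base (optionally rename(base, snapshot) afterwards).
    return ("snapshot.qcow2", "base.img")

print(choose_merge_direction(5, 100))   # 5% dirty: merge into base
print(choose_merge_direction(95, 100))  # 95% dirty: merge into snapshot
```

Either way, knowing the dirty fraction up front is what makes the choice
cheap to get right.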

>  This merge
> has not been implemented in QEMU yet.

Not to mention that it overlaps somewhat with the concept of live block
copying.

> 
>> - Base file continues to be modified. The snapshot file gets COW blocks
>> copied into it. e.g. LVM, Livebackup, etc.
>>
>> Can we enhance the libvirt API to indicate what type of snapshot is desired.
>> Also, when a snapshot is listed, can we try and describe it as one kind or
>> the other?
> 
> I think the snapshot mechanism will depend on your storage backend.
> If the disk image is an LVM volume, then it is natural for the
> snapshot to be an LVM snapshot.  If the disk image is a qcow2 file,
> then it is natural for the snapshot be a QEMU snapshot_blkdev
> snapshot.

What if it is both at once?  That is, it is possible to create an LVM
partition whose contents are a qcow2 image.  In that case, it seems like
the user might want the flexibility to determine whether the snapshot is
done at the qcow2 level or at the LVM level.

> 
> Also, it is often not possible to mix these snapshot mechanisms.  For
> example, LVM snapshots don't work on qcow2 image files.

Why not?  They might not be as space-efficient (the whole idea of LVM
cloning is that each block of the original LVM partition is now
COW-shared between multiple partitions, and that the backup partition
only consumes as additional space according to the amount of blocks that
get dirtied in the original partition), but I'm not seeing a technical
reason that would prohibit them (and I welcome evidence to the contrary,
so that I know more about what I am up against).

> 
> Does the application have to be aware of which snapshotting approach
> is used by the backend?  Perhaps there are a few cases where it is
> technically possible to mix-and-match but it just seems to expose
> complexity without much gain.
> 
> Put another way: "If a storage backend fundamentally doesn't support
> snapshotting the way you like, use a different backend".

So is this an accurate summary of your suggestion?

vir{StorageVol,Domain}SnapshotGetXMLDesc should have a sub-element or
attribute stating which method of snapshotting is in use, but other than
telling you about the method, libvirt doesn't expose any further control
over the matter (each disk gets snapshotted in the most efficient manner
for that disk, given the constraints of the storage pool [directory vs.
LVM partition] and image type [raw vs. qcow2 vs. qed] involved in that
storage volume).
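
That per-disk "most efficient manner" rule could be sketched as a simple
dispatch (illustrative Python; the pool and format names here are
assumptions for the example, not libvirt constants):

```python
def snapshot_method(pool_type, image_format):
    """Pick a snapshot mechanism from the pool type and image format,
    mirroring the idea that each disk gets the most efficient method
    available given its storage constraints."""
    if pool_type == "logical":          # LVM volume group pool
        return "lvm-snapshot"
    if image_format in ("qcow2", "qed"):
        return "backing-file"           # external snapshot via backing file
    return "full-copy"                  # raw file: nothing better available

print(snapshot_method("logical", "raw"))   # lvm-snapshot
print(snapshot_method("dir", "qcow2"))     # backing-file
print(snapshot_method("dir", "raw"))       # full-copy
```

The application would then learn the chosen method from the snapshot XML
rather than controlling it directly.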

> 
> Yes, dirty bitmap support is important.  This will make backup much
> more efficient on storage backends that support it.
> 
> For QEMU image files it will be possible to provide dirty block
> information in the future.  btrfs and a SAN appliance that I have
> looked both have mechanisms that could be used to provide dirty block
> tracking.

-- 
Eric Blake   ebl...@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org




Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-05 Thread Eric Blake
On 07/04/2011 07:53 PM, Jagane Sundar wrote:
> Thanks for looping me in, Stefan.
> 
>> Does each volume have its own independent snapshot namespace?  It may
>> be wise to document that snapshot namespaces are *not* independent
>> because storage backends may not be able to provide these semantics.
>>
> There is a need for 'just-a-volume-snapshot', and for a
> 'whole-vm-snapshot'.
> The 'whole-vm-snapshot' can possibly be collection of
> 'just-a-volume-snapshot'.

In the case of the current libvirt API, which makes use of the qemu
'savevm' monitor command (and thus is more of a checkpoint, rather than
just disk snapshots):

'savevm' fails unless all disks associated with the guest are already
qcow2 format.  Additionally, it creates a snapshot visible to 'qemu-img
snapshot -l' in all associated disks, but where only the primary disk
additionally has the state of RAM also in the image.

As for the creating a snapshot commands - the proposal for
virStorageVolSnapshotCreateXML is _solely_ for offline management of
storage volumes.  If a storage volume is in use by a running qemu
domain, then the only appropriate way to take an online snapshot of that
disk (short of stopping the domain to get to the offline snapshot case)
is to use the existing virDomainSnapshotCreateXML API instead.  And that
API is already flexible enough to support 'whole-vm-snapshot' vs.
'just-a-volume-snapshot'.

The existing virDomainSnapshotCreateXML API is currently mapped to the
'savevm' command (which takes a checkpoint, which is the
whole-vm-snapshot + memory), but can easily be modified to take just
disk snapshots, and I already mentioned doing that by modifying the XML
and adding a flag.  That is, on creation:

virDomainSnapshotCreateXML(domain, "
<domainsnapshot>
  <name>whatever</name>
</domainsnapshot>
", 0)

is the existing usage, which creates a checkpoint (all qcow2 images get
an internal snapshot named "whatever", and the first image also saves
the memory state).

virDomainSnapshotCreateXML(domain, "
<domainsnapshot>
  <name>whatever</name>
</domainsnapshot>
", VIR_DOMAIN_SNAPSHOT_DISK_ONLY)

would try to create a snapshot of all disks associated with the image
(although without the name of the snapshot file, either libvirt will
have to have some default smarts for how to generate a reasonable backup
file name, or this will fail).  That is, omit all mention of <disk>
subelements, and libvirt will then fill out the XML to cover all disks
(the whole-vm-snapshot case).

virDomainSnapshotCreateXML(domain, "
<domainsnapshot>
  <name>whatever</name>
  <disks>
    <disk name='vda2'>
      ...
    </disk>
  </disks>
</domainsnapshot>
", VIR_DOMAIN_SNAPSHOT_DISK_ONLY)

will only do a snapshot of disk image 2, using the <source>
information to explicitly specify the filename to use on the created
external snapshot file (rather than letting libvirt generate the
snapshot name).  That is, provide at least one <disk> element, and you
now have fine-grained control over which volumes get a snapshot (or even
how that snapshot is created).

> There are two types of snapshots that I am aware of:
> - Base file is left unmodified after snapshot, snapshot file is created
> and modified. e.g. qcow2 (I think)
> - Base file continues to be modified. The snapshot file gets COW blocks
> copied into it. e.g. LVM, Livebackup, etc.

There's a third - the qcow2 internal snapshot:

- Base file contains both the snapshot and the delta.

> 
> Can we enhance the libvirt API to indicate what type of snapshot is
> desired. Also, when a snapshot is listed, can we try and describe it as
> one kind or the other?

Yes, there are already some read-only XML elements in the
<domainsnapshot> XML (that is, libvirt ignores or rejects them if you
pass them to virDomainSnapshotCreateXML, but virDomainSnapshotGetXMLDesc
will list those additional elements to give you more details about the
snapshot); we could add a sub-element to state whether the snapshot is
backing-file based (original is now treated as read-only, and
modifications affect the snapshot) or COW based (original and backup
share all blocks to begin with, but as original get modified, the
read-only backup has more unique blocks).

> 
> There is no facility in the API to track dirty bitmaps. Suppose a disk
> format or qemu proper has the ability to maintain a dirty bitmap of
> blocks(or clusters) modified since some event (time in ms, perhaps). I
> would like libvirt to provide a function such as:
> 
> /*
> * returns NULL if the underlying block driver does not support
> * maintaining a dirty bitmap. If it does support a dirty bitmap,
> * the driver returns an opaque object that represents the time
> * since which this dirty bitmap is valid.
> *
> * Used by incremental backup programs to determine if qemu
> * has a bitmap of blocks that were dirtied since the last time
> * a backup was taken.
> */
> virStorageDirtyBitmapTimeOpaquePtr
> virStorageVolDirtyBitmapPresent(virStorageVolPtr vol)

Yes, we already had a discussion about the utility of being able to
expose how much of an image is directly contained within a file, vs.
being pulled in from a backing file (which can also be read as how much
of an image is dirty compared to the state

Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-05 Thread Eric Blake
On 07/04/2011 08:19 AM, Stefan Hajnoczi wrote:
> On Thu, Jun 16, 2011 at 6:41 AM, Eric Blake  wrote:
> 
> Robert, Fernando, Jagane: I have CCed you because we have discussed
> snapshot APIs and I thought you'd be interested in Eric's work to
> build them for libvirt.
> 
> Does each volume have its own independent snapshot namespace?  It may
> be wise to document that snapshot namespaces are *not* independent
> because storage backends may not be able to provide these semantics.

Good question, and I'm not quite sure on the best way to represent this.

For qcow2 internal snapshots, the answer is obvious - each qcow2 image
has its own snapshot namespace (and you can currently view this with
qemu-img snapshot -l).  But it looks like the existing drive to add qemu
snapshot support to live domains is focusing solely on external
snapshots (that is, a qcow2 snapshot involves creating a new filename,
then marking the old filename as a read-only backing image of the new
qcow2 filename; where qcow2 can even be used as a snapshot around a raw
file).

For external snapshots, I'm not sure whether the namespace should be
specific to a storage pool (that is, any additional metadata that
libvirt needs to track snapshot relationships should be stored in the
same directory as the disk images themselves) or specific to the libvirt
host (that is, just as libvirt's notion of a persistent storage pool
happens to be stored in /etc/libvirt/storage/pool.xml, libvirt should
also manage a file /etc/libvirt/storage/pool/snapshot.xml to describe
all snapshots tracked within "pool").  But I'm certainly thinking that
it is more likely to be a pool-wide namespace, rather than a
storage-volume local namespace.

> 
> I found this formatting quite hard to read.  Something along these
> lines would be easier to parse visually:

Yeah, sometimes there's only so much you can do with plain text emails.
I'll try to make future revisions of this RFC easier to parse, and/or
get it all in the wiki with nicer formatting first.

>> /* Probe if vol has snapshots.  1 if true, 0 if false, -1 on error.
>> Flags is 0 for now.  */
>> int virStorageVolHasCurrentSnapshot(virStorageVolPtr vol, unsigned int
>> flags);
>> [For qcow2 images, snapshots can be contained within the same file and
>> managed with qemu-img -l, but for other formats, this may mean that
>> libvirt has to start managing externally saved data associated with the
>> storage pool that associates snapshots with filenames.  In fact, even
>> for qcow2 it might be useful to support creation of new files backed by
>> the previous snapshot rather than cramming multiple snapshots in one
>> file, so we may have a use for flags to filter out the presence of
>> single-file vs. multiple-file snapshot setups.]
> 
> What is the "current snapshot"?

I was trying to model this after the virDomainSnapshot API, which has a
notion of current snapshot, but I guess I don't fully understand what
that API was implying by current domain snapshot.

After looking more through the code, it looks like the idea was to
support the notion of a branching hierarchy of snapshots:

A -> B -> D
 \-> C -> E

such that you can revert to the snapshot at point B, start the domain,
and create snapshot D; then stop the domain, revert to the snapshot at
point C, start the domain, and create snapshot E.  Every time you start
the domain, you know from which state it was started, so the current
snapshot is the snapshot that would be the parent if you were to create
a snapshot right now.

As long as you can create a branching hierarchy (that is, after creating
a snapshot, you can revert to its parent state, then make different
changes, and create a new snapshot), then there really is a need to know
which snapshot is current - listing all snapshots can show you the
parent(s) of each snapshot in that list, but can't tell you which
snapshot would be the parent of a new one created right now.
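
The branching behaviour just described can be modeled as a small tree
that tracks the "current" pointer (an illustrative Python sketch, not
the libvirt implementation):

```python
class SnapshotTree:
    """Toy model of libvirt's 'current snapshot': the node that would
    become the parent of a snapshot created right now."""
    def __init__(self):
        self.parent = {}      # snapshot name -> parent name (or None)
        self.current = None

    def create(self, name):
        self.parent[name] = self.current
        self.current = name

    def revert(self, name):
        self.current = name   # next create() will branch from here

t = SnapshotTree()
t.create("A"); t.create("B")   # A -> B
t.create("D")                  # B -> D
t.revert("A"); t.create("C")   # branch: A -> C
t.create("E")                  # C -> E
print(t.parent)   # {'A': None, 'B': 'A', 'D': 'B', 'C': 'A', 'E': 'C'}
print(t.current)  # 'E'
```

Listing the tree shows each snapshot's parent, but only the separate
"current" pointer says where the next snapshot would attach.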

> 
> Is this function necessary when you already have
> virStorageVolSnapshotListNames()?

Maybe I need to understand why we have it for the virDomainSnapshot
case, and whether it still makes sense for a disk image that is not
associated with a domain.  To some degree, I think it seems necessary,
to reflect the fact that with internal qcow2 snapshots, I can do:

qemu-img snapshot -c one file
run then stop vm to modify file
qemu-img snapshot -c two file
qemu-img snapshot -a one file
run then stop vm to modify file
qemu-img snapshot -c three file

with the resulting hierarchy:

one -> two
   \-> three

On the other hand, qemu-img doesn't appear to list any hierarchies
between internal snapshots - that is, while 'qemu-img snapshot -l' will
list one, two, and three, it gives no indication that three depends on
one but not two, nor whether the current state of the file would be a
delta against three, two, one, or even parent-less.

This also starts to get into questions about the ability to split a
qcow2 image with internal snapshots.  That is, if I have

Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-05 Thread Eric Blake
On 07/05/2011 04:26 AM, Osier Yang wrote:
> 
> Does it also need to enhance volume's XML so that it can include
> snapshot information?

No.  We _don't_ want to modify virStorageVolGetXMLDesc to output
snapshot information from a virStorageVolPtr, rather you would use the
new APIs to query whether a virStorageVol has associated snapshot
information, and if so, grab a new virStorageVolSnapshotPtr object that
wraps the information, and use virStorageVolSnapshotGetXMLDesc on that
new object to get the <volsnapshot> XML.

That is, this is parallel to the existing virDomainPtr and
virDomainSnapshotPtr - the XML for a given domain does not give you any
details about snapshots (or checkpoints) tied to the domain; rather, you
use the virDomainSnapshot API to get the corresponding snapshot object
tied to a domain, then do all your queries on the snapshot object.

>  Since you are already planning
> to introduce kinds of new storage volume APIs. One might want to
> manage the snapshots without domain.

That's the whole point of my RFC.  The _existing_ virDomainSnapshot APIs
allow management of checkpoints (disk + memory), and can easily be
extended to also manage snapshots (disk only) associated with a domain,
whereas my proposed new virStorageVolSnapshot API is for management of
snapshots of a given disk image without an associated domain.

> If so, it might also need modifications on bunch of internal storage
> volume APIs, such as volDelete, volCreateFromXML, volGetXMLDesc.
> etc, it might also need to introduce new flags for the according public
> APIs, the good is that it seems all of these APIs has "flags" argument.

I'm not sure I follow what additional modifications you are envisioning,
although I'll give it a shot:

Right now, virStorageVolDelete has a flags argument, but does not have
any defined flags.  What should happen if the virStorageVolPtr has
associated snapshots?  I can see the use of adding new flags:

virStorageVolDelete(,0) - fail if there are associated snapshots
virStorageVolDelete(,VIR_STORAGE_VOL_DELETE_SNAPSHOTS) - delete both the
image and all snapshots associated with the image
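
The proposed flag semantics can be sketched as follows (illustrative
Python; the flag value and the dict-based volume layout are hypothetical
stand-ins, not libvirt's actual implementation):

```python
VIR_STORAGE_VOL_DELETE_SNAPSHOTS = 1 << 0  # hypothetical flag value

def vol_delete(vol, flags=0):
    """Refuse to delete a volume that still has snapshots unless the
    caller explicitly asks for cascade deletion via the flag."""
    if vol["snapshots"] and not (flags & VIR_STORAGE_VOL_DELETE_SNAPSHOTS):
        raise RuntimeError("volume has snapshots; pass DELETE_SNAPSHOTS")
    vol["snapshots"].clear()   # cascade: drop the snapshots too
    vol["deleted"] = True

vol = {"snapshots": ["s1"], "deleted": False}
try:
    vol_delete(vol)            # flags=0: must fail while snapshots exist
except RuntimeError as e:
    print(e)
vol_delete(vol, VIR_STORAGE_VOL_DELETE_SNAPSHOTS)
print(vol["deleted"])          # True
```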

I don't see any changes to virStorageVolGetXMLDesc.

Right now, virStorageVolCreateFromXML creates a clone storage volume,
and takes a flags argument but no defined flags.  What happens if the
input virStorageVolPtr clonevol passed to that API has associated
snapshots?  Will the new volume also have the same snapshots as the old
volume?  Will the cloning also duplicate the snapshots?  Is the new
volume a flattened version of the input (no backing files in the new
image)?  There are probably several useful behaviors, and I'm not sure
whether they should be controlled by new flags, by new elements in the
xmldesc argument, or some combination of those.  But you certainly have
a point that there is more to think about when cloning an existing
volume if that existing volume has associated snapshots.

-- 
Eric Blake   ebl...@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org




Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-05 Thread Osier Yang

On 06/16/2011 01:41 PM, Eric Blake wrote:

Right now, libvirt has a snapshot API via virDomainSnapshotCreateXML,
but for qemu domains, it only works if all the guest disk images are
qcow2, and qemu rather than libvirt does all the work.  However, it has
a couple of drawbacks: it is inherently tied to domains (there is no way
to manage snapshots of storage volumes not tied to a domain, even though
libvirt does that for qcow2 images associated with offline qemu domains
by using the qemu-img application).  And it necessarily operates on all
of the images associated with a domain in parallel - if any disk image
is not qcow2, the snapshot fails, and there is no way to select a subset
of disks to save.  However, it works on both active (disk and memory
state) and inactive domains (just disk state).

Upstream qemu is developing a 'live snapshot' feature, which allows the
creation of a snapshot without the current downtime of several seconds
required by the current 'savevm' monitor command, as well as means for
controlling applications (libvirt) to request that qemu pause I/O to a
particular disk, then externally perform a snapshot, then tell qemu to
resume I/O (perhaps on a different file name or fd from the host, but
with no change to the contents seen by the guest).  Eventually, these
changes will make it possible for libvirt to create fast snapshots of
LVM partitions or btrfs files for guest disk images, as well as to
select which disks are saved in a snapshot (that is, save a
crash-consistent state of a subset of disks, without the corresponding
RAM state, rather than making a full system restore point); the latter
would work best with guest cooperation to quiesce disks before qemu
pauses I/O to that disk, but that is an orthogonal enhancement.
However, my first goal with API enhancements is to merely prove that
libvirt can manage a live snapshot by using qemu-img on a qcow2 image
rather than the current 'savevm' approach of qemu doing all the work.

Additionally, libvirt provides the virDomainSave command, which saves
just the state of the domain's memory, and stops the guest.  A crude
libvirt-only snapshot could thus already be done by using virDomainSave,
then externally doing a snapshot of all disk images associated with the
domain by using virStorageVol APIs, except that such APIs don't yet
exist.  Additionally, virDomainSave has no flags argument, so there is
no way to request that the guest be resumed after the snapshot completes.

Right now, I'm proposing the addition of virDomainSaveFlags, along with
a series of virStorageVolSnapshot* APIs that mirror the
virDomainSnapshot* APIs.  This would mean adding:


/* Opaque type to manage a snapshot of a single storage volume.  */
typedef struct _virStorageVolSnapshot *virStorageVolSnapshotPtr;

/* Create a snapshot of a storage volume.  XML is optional, if non-NULL,
it would be a new top-level element <volsnapshot> which is similar to
the top-level <domainsnapshot> for virDomainSnapshotCreateXML, to
specify name and description. Flags is 0 for now. */
virStorageVolSnapshotPtr virStorageVolSnapshotCreateXML(virStorageVolPtr
vol, const char *xml, unsigned int flags);
[For qcow2, this would be implemented with 'qemu-img snapshot -c',
similar to what virDomainSnapshotXML already does on inactive domains.
Later, we can add LVM and btrfs support, or even allow full file copies
of any file type.  Also in the future, we could enhance XML to take a
new element that describes a relationship between the name of the
original and of the snapshot, in the case where a new filename has to be
created to complete the snapshot process.]


/* Probe if vol has snapshots.  1 if true, 0 if false, -1 on error.
Flags is 0 for now.  */
int virStorageVolHasCurrentSnapshot(virStorageVolPtr vol, unsigned int
flags);
[For qcow2 images, snapshots can be contained within the same file and
managed with qemu-img -l, but for other formats, this may mean that
libvirt has to start managing externally saved data associated with the
storage pool that associates snapshots with filenames.  In fact, even
for qcow2 it might be useful to support creation of new files backed by
the previous snapshot rather than cramming multiple snapshots in one
file, so we may have a use for flags to filter out the presence of
single-file vs. multiple-file snapshot setups.]


/* Revert a volume back to the state of a snapshot, returning 0 on
success.  Flags is 0 for now.  */
int virStorageVolRevertToSnapshot(virStorageVolSnapshotPtr snapshot,
unsigned int flags);
[For qcow2, this would involve qemu-img snapshot -a.  Here, a useful
flag might be whether to delete any changes made after the point of the
snapshot; virDomainRevertToSnapshot should probably honor the same type
of flag.]


/* Return the most recent snapshot of a volume, if one exists, or NULL
on failure.  Flags is 0 for now.  */
virStorageVolSnapshotPtr virStorageVolSnapshotCurrent(virStorageVolPtr
vol, unsigned int flags);


/* Delete the storage associated with a snapshot (although the opaque
snapshot object must still be independently freed).  If flags is 0, any
child snapshots based off of this one are rebased onto the parent; if
flags is VIR_STORAGE_VOL_SNAPSHOT_DELETE_CHILDREN, then any child
snapshots based off of this one are also deleted.  */
int virStorageVolSnapshotDelete(virStorageVolSnapshotPtr snapshot,
unsigned int flags);

Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-05 Thread Stefan Hajnoczi
On Tue, Jul 5, 2011 at 2:53 AM, Jagane Sundar  wrote:
>> /* Create a snapshot of a storage volume.  XML is optional, if non-NULL,
>>  * it would be a new top-level element <volsnapshot> which is similar to
>>  * the top-level <domainsnapshot> for virDomainSnapshotCreateXML, to
>>  * specify name and description. Flags is 0 for now.
>>  */
>> virStorageVolSnapshotPtr virStorageVolSnapshotCreateXML(
>>     virStorageVolPtr vol, const char *xml, unsigned int flags);
>>
> There are two types of snapshots that I am aware of:
> - Base file is left unmodified after snapshot, snapshot file is created and
> modified. e.g. qcow2 (I think)

More detail on this approach as implemented by QEMU's snapshot_blkdev:

Create snapshot.qcow2 with base.img as backing file.  base.img is now
read-only and can be accessed as a "snapshot".  All writes go to
snapshot.qcow2.

When the snapshot is no longer needed it is necessary to merge the COW
data back into base.img before deleting snapshot.qcow2.  This merge
has not been implemented in QEMU yet.
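
The snapshot_blkdev behaviour described above amounts to a two-level
copy-on-write read path, which can be modeled roughly like this (a toy
Python sketch, not QEMU code):

```python
class CowImage:
    """Toy model of an external snapshot: reads fall through to the
    backing file for any block not rewritten since the snapshot."""
    def __init__(self, backing=None):
        self.blocks = {}        # block index -> data written post-snapshot
        self.backing = backing  # read-only backing image, if any

    def write(self, idx, data):
        self.blocks[idx] = data  # all writes land in the top image

    def read(self, idx):
        if idx in self.blocks:
            return self.blocks[idx]
        return self.backing.read(idx) if self.backing else b"\0"

base = CowImage()
base.write(0, b"old")
snap = CowImage(backing=base)   # base is now the read-only "snapshot"
snap.write(1, b"new")           # base never sees this write
print(snap.read(0), snap.read(1))  # b'old' b'new'
```

The missing merge operation would copy snap.blocks back into base
before snap could be deleted.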

> - Base file continues to be modified. The snapshot file gets COW blocks
> copied into it. e.g. LVM, Livebackup, etc.
>
> Can we enhance the libvirt API to indicate what type of snapshot is desired.
> Also, when a snapshot is listed, can we try and describe it as one kind or
> the other?

I think the snapshot mechanism will depend on your storage backend.
If the disk image is an LVM volume, then it is natural for the
snapshot to be an LVM snapshot.  If the disk image is a qcow2 file,
then it is natural for the snapshot be a QEMU snapshot_blkdev
snapshot.

Also, it is often not possible to mix these snapshot mechanisms.  For
example, LVM snapshots don't work on qcow2 image files.

Does the application have to be aware of which snapshotting approach
is used by the backend?  Perhaps there are a few cases where it is
technically possible to mix-and-match but it just seems to expose
complexity without much gain.

Put another way: "If a storage backend fundamentally doesn't support
snapshotting the way you like, use a different backend".

> There is no facility in the API to track dirty bitmaps. Suppose a disk
> format or qemu proper has the ability to maintain a dirty bitmap of
> blocks(or clusters) modified since some event (time in ms, perhaps). I would
> like libvirt to provide a function such as:
>
> /*
> * returns NULL if the underlying block driver does not support
> * maintaining a dirty bitmap. If it does support a dirty bitmap,
> * the driver returns an opaque object that represents the time
> * since which this dirty bitmap is valid.
> *
> * Used by incremental backup programs to determine if qemu
> * has a bitmap of blocks that were dirtied since the last time
> * a backup was taken.
> */
> virStorageDirtyBitmapTimeOpaquePtr
> virStorageVolDirtyBitmapPresent(virStorageVolPtr vol)

Yes, dirty bitmap support is important.  This will make backup much
more efficient on storage backends that support it.

For QEMU image files it will be possible to provide dirty block
information in the future.  btrfs and a SAN appliance that I have
looked both have mechanisms that could be used to provide dirty block
tracking.

Stefan



Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-04 Thread Jagane Sundar

Thanks for looping me in, Stefan.


Does each volume have its own independent snapshot namespace?  It may
be wise to document that snapshot namespaces are *not* independent
because storage backends may not be able to provide these semantics.


There is a need for 'just-a-volume-snapshot', and for a 'whole-vm-snapshot'.
The 'whole-vm-snapshot' can possibly be collection of 
'just-a-volume-snapshot'.

/* Create a snapshot of a storage volume.  XML is optional, if non-NULL,
it would be a new top-level element <volsnapshot> which is similar to
the top-level <domainsnapshot> for virDomainSnapshotCreateXML, to
specify name and description. Flags is 0 for now. */
virStorageVolSnapshotPtr virStorageVolSnapshotCreateXML(virStorageVolPtr
vol, const char *xml, unsigned int flags);
[For qcow2, this would be implemented with 'qemu-img snapshot -c',
similar to what virDomainSnapshotXML already does on inactive domains.
Later, we can add LVM and btrfs support, or even allow full file copies
of any file type.  Also in the future, we could enhance XML to take a
new element that describes a relationship between the name of the
original and of the snapshot, in the case where a new filename has to be
created to complete the snapshot process.]

I found this formatting quite hard to read.  Something along these
lines would be easier to parse visually:

/* Create a snapshot of a storage volume.  XML is optional, if non-NULL,
 * it would be a new top-level element <volsnapshot> which is similar to
 * the top-level <domainsnapshot> for virDomainSnapshotCreateXML, to
 * specify name and description. Flags is 0 for now.
 */
virStorageVolSnapshotPtr virStorageVolSnapshotCreateXML(
    virStorageVolPtr vol, const char *xml, unsigned int flags);


There are two types of snapshots that I am aware of:
- Base file is left unmodified after snapshot, snapshot file is created 
and modified. e.g. qcow2 (I think)
- Base file continues to be modified. The snapshot file gets COW blocks 
copied into it. e.g. LVM, Livebackup, etc.


Can we enhance the libvirt API to indicate what type of snapshot is
desired? Also, when a snapshot is listed, can we try to describe it as
one kind or the other?


There is no facility in the API to track dirty bitmaps. Suppose a disk 
format or qemu proper has the ability to maintain a dirty bitmap of 
blocks(or clusters) modified since some event (time in ms, perhaps). I 
would like libvirt to provide a function such as:


/*
* returns NULL if the underlying block driver does not support
* maintaining a dirty bitmap. If it does support a dirty bitmap,
* the driver returns an opaque object that represents the time
* since which this dirty bitmap is valid.
*
* Used by incremental backup programs to determine if qemu
* has a bitmap of blocks that were dirtied since the last time
* a backup was taken.
*/
virStorageDirtyBitmapTimeOpaquePtr 
virStorageVolDirtyBitmapPresent(virStorageVolPtr vol)
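
As a rough model of what such a driver-side bitmap could look like
(illustrative Python; the epoch value stands in for the proposed opaque
time object, and none of these names are real libvirt or qemu APIs):

```python
import time

class DirtyBitmap:
    """Toy dirty-block tracker: records which blocks were written since
    the bitmap's epoch, so an incremental backup copies only those."""
    def __init__(self, nblocks):
        self.epoch = time.time()   # stands in for the opaque time object
        self.dirty = set()
        self.nblocks = nblocks

    def mark(self, idx):
        self.dirty.add(idx)        # called on every guest write

    def changed_blocks(self):
        return sorted(self.dirty)  # what the backup program reads out

    def reset(self):
        """Start a new epoch after a backup completes."""
        self.dirty.clear()
        self.epoch = time.time()

bm = DirtyBitmap(1024)
bm.mark(7); bm.mark(42)
print(bm.changed_blocks())  # [7, 42]
bm.reset()
print(bm.changed_blocks())  # []
```

A backup tool would compare the returned epoch against the time of its
last backup to decide whether the bitmap is usable or a full copy is
needed.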


Thanks,
Jagane



Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-07-04 Thread Stefan Hajnoczi
On Thu, Jun 16, 2011 at 6:41 AM, Eric Blake  wrote:

Robert, Fernando, Jagane: I have CCed you because we have discussed
snapshot APIs and I thought you'd be interested in Eric's work to
build them for libvirt.

Does each volume have its own independent snapshot namespace?  It may
be wise to document that snapshot namespaces are *not* independent
because storage backends may not be able to provide these semantics.

> /* Create a snapshot of a storage volume.  XML is optional, if non-NULL,
> it would be a new top-level element <volsnapshot> which is similar to
> the top-level <domainsnapshot> for virDomainSnapshotCreateXML, to
> specify name and description. Flags is 0 for now. */
> virStorageVolSnapshotPtr virStorageVolSnapshotCreateXML(virStorageVolPtr
> vol, const char *xml, unsigned int flags);
> [For qcow2, this would be implemented with 'qemu-img snapshot -c',
> similar to what virDomainSnapshotXML already does on inactive domains.
> Later, we can add LVM and btrfs support, or even allow full file copies
> of any file type.  Also in the future, we could enhance XML to take a
> new element that describes a relationship between the name of the
> original and of the snapshot, in the case where a new filename has to be
> created to complete the snapshot process.]

I found this formatting quite hard to read.  Something along these
lines would be easier to parse visually:

/* Create a snapshot of a storage volume.  XML is optional, if non-NULL,
 * it would be a new top-level element <volsnapshot> which is similar to
 * the top-level <domainsnapshot> for virDomainSnapshotCreateXML, to
 * specify name and description. Flags is 0 for now.
 */
virStorageVolSnapshotPtr virStorageVolSnapshotCreateXML(
    virStorageVolPtr vol, const char *xml, unsigned int flags);

...comments...

> /* Probe if vol has snapshots.  1 if true, 0 if false, -1 on error.
> Flags is 0 for now.  */
> int virStorageVolHasCurrentSnapshot(virStorageVolPtr vol, unsigned int
> flags);
> [For qcow2 images, snapshots can be contained within the same file and
> managed with qemu-img -l, but for other formats, this may mean that
> libvirt has to start managing externally saved data associated with the
> storage pool that associates snapshots with filenames.  In fact, even
> for qcow2 it might be useful to support creation of new files backed by
> the previous snapshot rather than cramming multiple snapshots in one
> file, so we may have a use for flags to filter out the presence of
> single-file vs. multiple-file snapshot setups.]

What is the "current snapshot"?

Is this function necessary when you already have
virStorageVolSnapshotListNames()?

> /* Return the most recent snapshot of a volume, if one exists, or NULL
> on failure.  Flags is 0 for now.  */
> virStorageVolSnapshotPtr virStorageVolSnapshotCurrent(virStorageVolPtr
> vol, unsigned int flags);

The name should include "revert".  This looks like a shortcut function
for virStorageVolRevertToSnapshot().

> /* Delete the storage associated with a snapshot (although the opaque
> snapshot object must still be independently freed).  If flags is 0, any
> child snapshots based off of this one are rebased onto the parent; if
> flags is VIR_STORAGE_VOL_SNAPSHOT_DELETE_CHILDREN, then any child
> snapshots based off of this one are also deleted.  */

What is the "opaque snapshot object"?

> int virStorageVolSnapshotDelete(virStorageVolSnapshotPtr snapshot,
> unsigned int flags);
> [For qcow2, this would involve qemu-img snapshot -d.  For
> multiple-file snapshots, this would also involve qemu-img commit.]

Is virStorageVolDelete() modified to also delete the volume's snapshots?

Stefan

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-06-22 Thread Stefan Hajnoczi
On Wed, Jun 22, 2011 at 9:27 AM, Daniel Veillard  wrote:
> On Tue, Jun 21, 2011 at 02:12:28PM +0100, Stefan Hajnoczi wrote:
>> On Tue, Jun 21, 2011 at 11:30 AM, Daniel P. Berrange
>>  wrote:
>> > For formats like LVM, btrfs, SCSI, etc., libvirt will have to do all
>> > the work of creating the snapshot, possibly then telling QEMU to
>> > switch the backing file of a virtual disk to the new image (if the
>> > snapshot mechanism works that way).
>>
>> Putting non-virtualization storage management code into libvirt seems
>> suboptimal since other applications may also want to use these generic
>> features.  However, I'm not aware of a storage management API for
>> Linux that would support LVM and various SAN/NAS appliances.  Ideally
>> we would have something like that and libvirt can use the storage
>> management API without knowing all the different storage types.
>>
>> A service like udisks with plugins for SAN/NAS appliances could solve
>> the problem of where to put the storage management code.
>
>  Well, there are multiple answers to this "sub-optimality":
>  1/ we can't really wait; there are some parallel developments I'm
>     following (from a distance) which may provide some of this.
>     The key point is trying to keep some compatibility in the
>     objects and terms of the API to be able to reuse them.
>  2/ if we develop some code in libvirt for this, it could be separated
>     as a library once it's mature enough.
>
> IMHO the point is making sure the way we represent things maps easily
> with how other specs or libraries may do it; then we should be able to
> reuse them (e.g. CDMI snapshots
> http://cdmi.sniacloud.com/CDMI_Spec/14-Snapshots/14-Snapshots.htm )

I agree.  The main thing is getting the semantics and the model right
so that it can represent the various storage types/APIs.

Stefan



Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-06-22 Thread Daniel Veillard
On Tue, Jun 21, 2011 at 02:12:28PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jun 21, 2011 at 11:30 AM, Daniel P. Berrange
>  wrote:
> > For formats like LVM, btrfs, SCSI, etc., libvirt will have to do all
> > the work of creating the snapshot, possibly then telling QEMU to
> > switch the backing file of a virtual disk to the new image (if the
> > snapshot mechanism works that way).
> 
> Putting non-virtualization storage management code into libvirt seems
> suboptimal since other applications may also want to use these generic
> features.  However, I'm not aware of a storage management API for
> Linux that would support LVM and various SAN/NAS appliances.  Ideally
> we would have something like that and libvirt can use the storage
> management API without knowing all the different storage types.
> 
> A service like udisks with plugins for SAN/NAS appliances could solve
> the problem of where to put the storage management code.

  Well, there are multiple answers to this "sub-optimality":
  1/ we can't really wait; there are some parallel developments I'm
     following (from a distance) which may provide some of this.
     The key point is trying to keep some compatibility in the
     objects and terms of the API to be able to reuse them.
  2/ if we develop some code in libvirt for this, it could be separated
     as a library once it's mature enough.

IMHO the point is making sure the way we represent things maps easily
with how other specs or libraries may do it; then we should be able to
reuse them (e.g. CDMI snapshots
http://cdmi.sniacloud.com/CDMI_Spec/14-Snapshots/14-Snapshots.htm )

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/



Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-06-21 Thread Stefan Hajnoczi
On Tue, Jun 21, 2011 at 2:53 PM, Eric Blake  wrote:
> On 06/21/2011 04:30 AM, Daniel P. Berrange wrote:
>>> Upstream qemu is developing a 'live snapshot' feature, which allows the
>>> creation of a snapshot without the current downtime of several seconds
>>> required by the current 'savevm' monitor command, as well as means for
>>> controlling applications (libvirt) to request that qemu pause I/O to a
>>> particular disk, then externally perform a snapshot, then tell qemu to
>>> resume I/O (perhaps on a different file name or fd from the host, but
>>> with no change to the contents seen by the guest).  Eventually, these
>>> changes will make it possible for libvirt to create fast snapshots of
>>> LVM partitions or btrfs files for guest disk images, as well as to
>>
>> Actually, IIUC, the QEMU 'live snapshot' feature is only for special
>> disk formats like qcow2, qed, etc.
>
> Does anyone have pointers to the qemu implementation of monitor commands
> used for live snapshot?

http://repo.or.cz/w/qemu.git/blob/HEAD:/blockdev.c#l572

Jes implemented the snapshot_blkdev command and is integrating guest
agent fsfreeze support.

I think it needs to be a multi-step process instead of just one
command so that libvirt can do storage management for LVM and co.

Stefan



Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-06-21 Thread Eric Blake
On 06/21/2011 04:30 AM, Daniel P. Berrange wrote:
>> Upstream qemu is developing a 'live snapshot' feature, which allows the
>> creation of a snapshot without the current downtime of several seconds
>> required by the current 'savevm' monitor command, as well as means for
>> controlling applications (libvirt) to request that qemu pause I/O to a
>> particular disk, then externally perform a snapshot, then tell qemu to
>> resume I/O (perhaps on a different file name or fd from the host, but
>> with no change to the contents seen by the guest).  Eventually, these
>> changes will make it possible for libvirt to create fast snapshots of
>> LVM partitions or btrfs files for guest disk images, as well as to
> 
> Actually, IIUC, the QEMU 'live snapshot' feature is only for special
> disk formats like qcow2, qed, etc.

Does anyone have pointers to the qemu implementation of monitor commands
used for live snapshot?

> For formats like LVM, btrfs, SCSI, etc., libvirt will have to do all
> the work of creating the snapshot, possibly then telling QEMU to
> switch the backing file of a virtual disk to the new image (if the
> snapshot mechanism works that way).

Yes, that was what I was envisioning.

> 
>> select which disks are saved in a snapshot (that is, save a
>> crash-consistent state of a subset of disks, without the corresponding
>> RAM state, rather than making a full system restore point); the latter
>> would work best with guest cooperation to quiesce disks before qemu
>> pauses I/O to that disk, but that is an orthogonal enhancement.
> 
> At the very least, you need a way to stop QEMU writing to the disk
> for a period of time, whether or not the guest is quiesced. There
> are basically 3 options
> 
>  1. Pause the guest CPUs (eg  'stop' on the monitor)
>  2. QEMU queues I/O from guest in memory temporarily (does not currently 
> exist)
>  3. QEMU tells guest to quiesce I/O temporarily (does not currently exist)
> 
> To perform a snapshot libvirt would need to do
> 
>  1. Stop I/O using one of the 3 methods above
>  2. If disk is a special format
>       - Ask QEMU to snapshot it
>     Else
>       - Create snapshot ourselves
>       - Update QEMU disk backing path (optional)
>  3. Resume I/O

It is step 2B (create the snapshot ourselves) where the proposed
virStorageVolSnapshot* APIs would be useful.  The remaining steps also
need implementation, but I believe that they can fit into existing APIs
by the use of new flag values, rather than requiring any new API.

> 
>> However, my first goal with API enhancements is to merely prove that
>> libvirt can manage a live snapshot by using qemu-img on a qcow2 image
>> rather than the current 'savevm' approach of qemu doing all the work.
> 
> FYI, QEMU developers are adamant that if the disk image is open
> by QEMU you should, in general, not do anything using qemu-img
> on that disk image.

Agreed.  And I further think that we need to expend some efforts making
the new image locking code also play well with libvirt - that is, any
virStorageVol API that can modify a disk image (rather than just do a
read-only operation describing the image) should probably be taught to
fail if any active domain is also using that image.  Conversely, if a
long-running virStorageVol API is started on a volume, then an attempt
to virDomainStart a domain should see that the volume is already in use
and fail just as if the volume had been locked by another running domain.

> libvirt does currently do things like querying
> disk capacity, but we can get away with that because it is an
> invariant section of the header. We certainly can't create internal
> snapshots with qemu-img while the guest is live. Creating external
> snapshots with qemu-img is probably OK, but when I've suggested
> this before QEMU developers were unhappy with even that.

Basically, my proposed virStorageVolSnapshot APIs should only be used on
inactive volumes; for a running domain, you should always go through the
existing virDomainSnapshot API, which can then make appropriate
decisions whether to do external snapshots, or whether to have qemu do
the work because the image is qcow2.  I think we're in agreement here,
and that it still doesn't impact the decision for adding new API for
offline snapshot management.

> 
> What I'm not seeing here, is how these APIs all relate to the existing
> support we have in virStorageVol APIs for creating snapshots. THis is
> already implemented for LVM, QCow, QCow2.

The only existing snapshot API that I found was
virDomainSnapshotCreateXML, which only works on qcow2 (not lvm or qcow),
and which works either online (via qemu) or offline (via qemu-img).  But
I could have overlooked something - where is the existing API for
creating an LVM snapshot?  For volume creation, I'm aware of code for
specifying a backing file for an existing file, but backing files aren't
necessarily the same as snapshots, are they?

> The snapshots are created by
> specifying a backing file in the initial 

Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-06-21 Thread Stefan Hajnoczi
On Tue, Jun 21, 2011 at 11:30 AM, Daniel P. Berrange
 wrote:
> On Wed, Jun 15, 2011 at 11:41:27PM -0600, Eric Blake wrote:
>> Right now, libvirt has a snapshot API via virDomainSnapshotCreateXML,
>> but for qemu domains, it only works if all the guest disk images are
>> qcow2, and qemu rather than libvirt does all the work.  However, it has
>> a couple of drawbacks: it is inherently tied to domains (there is no way
>> to manage snapshots of storage volumes not tied to a domain, even though
>> libvirt does that for qcow2 images associated with offline qemu domains
>> by using the qemu-img application).  And it necessarily operates on all
>> of the images associated with a domain in parallel - if any disk image
>> is not qcow2, the snapshot fails, and there is no way to select a subset
>> of disks to save.  However, it works on both active (disk and memory
>> state) and inactive domains (just disk state).
>>
>> Upstream qemu is developing a 'live snapshot' feature, which allows the
>> creation of a snapshot without the current downtime of several seconds
>> required by the current 'savevm' monitor command, as well as means for
>> controlling applications (libvirt) to request that qemu pause I/O to a
>> particular disk, then externally perform a snapshot, then tell qemu to
>> resume I/O (perhaps on a different file name or fd from the host, but
>> with no change to the contents seen by the guest).  Eventually, these
>> changes will make it possible for libvirt to create fast snapshots of
>> LVM partitions or btrfs files for guest disk images, as well as to
>
> Actually, IIUC, the QEMU 'live snapshot' feature is only for special
> disk formats like qcow2, qed, etc.

Yes.  The live snapshot feature in QEMU will not do btrfs, LVM, or
SAN/NAS snapshots.

> For formats like LVM, btrfs, SCSI, etc., libvirt will have to do all
> the work of creating the snapshot, possibly then telling QEMU to
> switch the backing file of a virtual disk to the new image (if the
> snapshot mechanism works that way).

Putting non-virtualization storage management code into libvirt seems
suboptimal since other applications may also want to use these generic
features.  However, I'm not aware of a storage management API for
Linux that would support LVM and various SAN/NAS appliances.  Ideally
we would have something like that and libvirt can use the storage
management API without knowing all the different storage types.

A service like udisks with plugins for SAN/NAS appliances could solve
the problem of where to put the storage management code.

>> select which disks are saved in a snapshot (that is, save a
>> crash-consistent state of a subset of disks, without the corresponding
>> RAM state, rather than making a full system restore point); the latter
>> would work best with guest cooperation to quiesce disks before qemu
>> pauses I/O to that disk, but that is an orthogonal enhancement.
>
> At the very least, you need a way to stop QEMU writing to the disk
> for a period of time, whether or not the guest is quiesced. There
> are basically 3 options
>
>  1. Pause the guest CPUs (eg  'stop' on the monitor)
>  2. QEMU queues I/O from guest in memory temporarily (does not currently 
> exist)
>  3. QEMU tells guest to quiesce I/O temporarily (does not currently exist)
>
> To perform a snapshot libvirt would need to do
>
>  1. Stop I/O using one of the 3 methods above
>  2. If disk is a special format
>      - Ask QEMU to snapshot it
>    Else
>      - Create snapshot ourselves
>      - Update QEMU disk backing path (optional)
>  3. Resume I/O

Yes, QEMU needs to provide commands for these individual steps.  Also,
the guest must be notified of the snapshot operation so that it can
flush in-memory data to disk - otherwise this cannot be used for
backup purposes since guests with several GBs of RAM will keep a
considerable portion of state in memory and disk will be out-of-date.

>> /* Save a domain into the file 'to' with additional actions.  If flags
>> is 0, then xml is ignored, and this is like virDomainSave.  If flags
>> includes VIR_DOMAIN_SAVE_DISKS, then all of the associated disk images
>> are also snapshotted, as if by virStorageVolSnapshotCreateXML; the xml
>> argument is optional, but if present, it should be a 
>> element with  sub-elements for directions on each disk that needs
>> a non-empty xml argument for proper volume snapshot creation.  If flags
>> includes VIR_DOMAIN_SAVE_RESUME, then the guest is resumed after the
>> offline snapshot is complete (note that VIR_DOMAIN_SAVE_RESUME without
>> VIR_DOMAIN_SAVE_DISKS makes little sense, as a saved state file is
>> rendered useless if the disk images are modified before it is resumed).
>>  If flags includes VIR_DOMAIN_SAVE_QUIESCE, this requests that a guest
>> agent quiesce disk state before the saved state file is created.  */
>> int virDomainSaveFlags(virDomainPtr domain, const char *to, const char
>> *xml, unsigned int flags);
>
>
> What I'm not seeing here, is how these APIs 

Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-06-21 Thread Daniel P. Berrange
On Wed, Jun 15, 2011 at 11:41:27PM -0600, Eric Blake wrote:
> Right now, libvirt has a snapshot API via virDomainSnapshotCreateXML,
> but for qemu domains, it only works if all the guest disk images are
> qcow2, and qemu rather than libvirt does all the work.  However, it has
> a couple of drawbacks: it is inherently tied to domains (there is no way
> to manage snapshots of storage volumes not tied to a domain, even though
> libvirt does that for qcow2 images associated with offline qemu domains
> by using the qemu-img application).  And it necessarily operates on all
> of the images associated with a domain in parallel - if any disk image
> is not qcow2, the snapshot fails, and there is no way to select a subset
> of disks to save.  However, it works on both active (disk and memory
> state) and inactive domains (just disk state).
> 
> Upstream qemu is developing a 'live snapshot' feature, which allows the
> creation of a snapshot without the current downtime of several seconds
> required by the current 'savevm' monitor command, as well as means for
> controlling applications (libvirt) to request that qemu pause I/O to a
> particular disk, then externally perform a snapshot, then tell qemu to
> resume I/O (perhaps on a different file name or fd from the host, but
> with no change to the contents seen by the guest).  Eventually, these
> changes will make it possible for libvirt to create fast snapshots of
> LVM partitions or btrfs files for guest disk images, as well as to

Actually, IIUC, the QEMU 'live snapshot' feature is only for special
disk formats like qcow2, qed, etc.

For formats like LVM, btrfs, SCSI, etc., libvirt will have to do all
the work of creating the snapshot, possibly then telling QEMU to
switch the backing file of a virtual disk to the new image (if the
snapshot mechanism works that way).

> select which disks are saved in a snapshot (that is, save a
> crash-consistent state of a subset of disks, without the corresponding
> RAM state, rather than making a full system restore point); the latter
> would work best with guest cooperation to quiesce disks before qemu
> pauses I/O to that disk, but that is an orthogonal enhancement.

At the very least, you need a way to stop QEMU writing to the disk
for a period of time, whether or not the guest is quiesced. There
are basically 3 options

 1. Pause the guest CPUs (eg  'stop' on the monitor)
 2. QEMU queues I/O from guest in memory temporarily (does not currently exist)
 3. QEMU tells guest to quiesce I/O temporarily (does not currently exist)

To perform a snapshot libvirt would need to do

 1. Stop I/O using one of the 3 methods above
 2. If disk is a special format
      - Ask QEMU to snapshot it
    Else
      - Create snapshot ourselves
      - Update QEMU disk backing path (optional)
 3. Resume I/O

> However, my first goal with API enhancements is to merely prove that
> libvirt can manage a live snapshot by using qemu-img on a qcow2 image
> rather than the current 'savevm' approach of qemu doing all the work.

FYI, QEMU developers are adamant that if the disk image is open
by QEMU you should, in general, not do anything using qemu-img
on that disk image. libvirt does currently do things like querying
disk capacity, but we can get away with that because it is an
invariant section of the header. We certainly can't create internal
snapshots with qemu-img while the guest is live. Creating external
snapshots with qemu-img is probably OK, but when I've suggested
this before QEMU developers were unhappy with even that.

> Additionally, libvirt provides the virDomainSave command, which saves
> just the state of the domain's memory, and stops the guest.  A crude
> libvirt-only snapshot could thus already be done by using virDomainSave,
> then externally doing a snapshot of all disk images associated with the
> domain by using virStorageVol APIs, except that such APIs don't yet
> exist.  Additionally, virDomainSave has no flags argument, so there is
> no way to request that the guest be resumed after the snapshot completes.
> 
> Right now, I'm proposing the addition of virDomainSaveFlags, along with
> a series of virStorageVolSnapshot* APIs that mirror the
> virDomainSnapshot* APIs.  This would mean adding:
> 
> 
> /* Opaque type to manage a snapshot of a single storage volume.  */
> typedef virStorageVolSnapshotPtr;
> 
> /* Create a snapshot of a storage volume.  XML is optional, if non-NULL,
> it would be a new top-level element <volsnapshot> which is similar to
> the top-level <domainsnapshot> for virDomainSnapshotCreateXML, to
> specify name and description. Flags is 0 for now. */
> virStorageVolSnapshotPtr virDomainSnapshotCreateXML(virStorageVolPtr
> vol, const char *xml, unsigned int flags);
> [For qcow2, this would be implemented with 'qemu-img snapshot -c',
> similar to what virDomainSnapshotXML already does on inactive domains.
> Later, we can add LVM and btrfs support, or even allow full file copies
> of any file type.  Also in the future, we could enhance XML to 

Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-06-20 Thread Motonobu Ichimura
Hi,

On June 21, 2011 at 7:52, Eric Blake wrote:

>
> >
> > For LVM, it may be lvconvert --merge, but that requires a recent version
> > of lvm and the linux kernel.
> > So there are several failure modes (the toolchain doesn't support it, or
> > the toolchain supports it but the operation fails).
> > I don't know how to handle these cases for now, but we may need to add
> > some virErrorNumber values to indicate snapshot-related errors.
>
> Yes, we can add new virError values as we come across situations where
> such operations are unsupported, but which don't fit well into any
> existing error values.
>

I understand.


>
> >> virStorageVolSnapshotCreateXML.  [And since my first phase of
> >> implementation will be focused on inline qcow2 snapshots, I don't yet
> >> know what that XML will need to contain for any other type of snapshots,
> >> such as mapping out how the snapshot backing file will be named in
> >> relation to the possibly new live file.]
> >>
> >
> > In the LVM case, taking a snapshot (lvcreate) needs at least the snapshot
> > volume size as an argument,
> > so I think a storage-specific element can be inserted in this XML format
>
> Wouldn't the snapshot volume size always be equal to the volume size of
> the original?  Or is there really value in allowing the snapshot size to
> be configured to something different (and if different, must it always
> be <= original, or does a larger snapshot than the original make sense)?
>

The scenario I am concerned about is:

1.  creating a snapshot through libvirt,
2.  taking a backup from the snapshot,
3.  removing the snapshot.

In this case, I think a sysadmin may want to create a snapshot smaller
than the original size.
But this scenario seems to introduce another problem (the snapshot becomes
invalid, as described below), so assuming that the snapshot volume size is
always equal to the original volume size is better for us.

>> Any feedback on this approach?  Any other APIs that would be useful to
> >> add?  I'd like to get all the new APIs in place for 0.9.3 with minimal
> >> qcow2 functionality, then use the time before 0.9.4 to further enhance
> >> the APIs to cover more snapshot cases but without having to add any new
> >> APIs.
> >>
> >
> > An LVM snapshot may become invalid if it runs out of volume space,
> > so adding an API for checking whether a snapshot is valid will be needed.
>
> I'm not quite sure I follow the scenario you are envisioning here.
> Would you mind stepping through an example (mapping [proposed] libvirt
> API calls to lvm commands would be helpful), on how an original lvm
> partition is created, then a snapshot partition, then how you run out of
> volume size in that snapshot?  It sounds like you are saying that there
> is a way to create an lvm snapshot that is valid at the time that is
> created, but later on, subsequent actions cause the snapshot to run out
> of space and no longer be valid.  But my understanding is that a
> snapshot is a constant size (it represents a known state of the disk at
> a fixed point in time), only the deltas to that snapshot (aka the live
> disk) ever have the potential to grow beyond the amount of storage used
> by the snapshot.  Or are you worried about creating an lvm snapshot by
> libvirt, but then a third party program changes the property of the lvm
> snapshot volume to change its size?
>
>
I was worried about the case I mentioned above (the case where the snapshot
size is smaller than the original).


Regards,

Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-06-20 Thread Eric Blake
On 06/20/2011 04:52 PM, Eric Blake wrote:
>> An LVM snapshot may become invalid if it runs out of volume space,
>> so adding an API for checking whether a snapshot is valid will be needed.
> 
> I'm not quite sure I follow the scenario you are envisioning here.
> Would you mind stepping through an example (mapping [proposed] libvirt
> API calls to lvm commands would be helpful), on how an original lvm
> partition is created, then a snapshot partition, then how you run out of
> volume size in that snapshot?  It sounds like you are saying that there
> is a way to create an lvm snapshot that is valid at the time that is
> created, but later on, subsequent actions cause the snapshot to run out
> of space and no longer be valid.  But my understanding is that a
> snapshot is a constant size (it represents a known state of the disk at
> a fixed point in time), only the deltas to that snapshot (aka the live
> disk) ever have the potential to grow beyond the amount of storage used
> by the snapshot.  Or are you worried about creating an lvm snapshot by
> libvirt, but then a third party program changes the property of the lvm
> snapshot volume to change its size?

Or is it something like the following scenario:

start with 10G lvm partition
create snapshot A to a 10GB partition
make 2G worth of changes
create snapshot B to a 2GB partition
decide to delete snapshot A (merging it back into B)

at that point, the 10G partition for the live file is still adequate,
but now the effort of merging the 10G of data from snapshot into the 2G
of storage reserved for snapshot B will fail, because the partition for
B is not large enough.

But I still don't see how that needs a new API; it seems like this would
be a case of making the 'delete A' operation fail rather than a case of
silently making snapshot B invalid.  If the only way to make a snapshot
lvm partition invalid is by making changes to that lvm partition outside
the knowledge of libvirt, then I don't know that libvirt can protect
itself against such 3rd party actions; and if libvirt's own actions can
never cause an invalidation, then what is the point of adding an API to
detect an invalid snapshot?  But I'm relatively inexperienced with lvm,
so an example of what you mean will go a long way to help me understand
the scenario you are envisioning.

-- 
Eric Blake   ebl...@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org




Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-06-20 Thread Eric Blake
On 06/20/2011 04:22 PM, Motonobu Ichimura wrote:
>> /* Opaque type to manage a snapshot of a single storage volume.  */
>> typedef virStorageVolSnapshotPtr;
>>
>>
> I'm just wondering how to detect the storage type and choose the
> associated snapshot functionality.
> Is there any idea about this?

I'm not yet convinced we need storage-type-specific functionality; the
XML elements can be optional, and if a particular element is not needed
by a particular storage driver, then that driver can ignore the element
(that is, the point of the snapshot XML on the create API is to provide
details like name or description where the default is not good enough,
and on dumpxml it will be the minimum xml for what was actually used in
creating the snapshot).

I believe that libvirt already has code for distinguishing between
different storage types (directory vs. block device), and can therefore
already tell whether a block device is an LVM partition.  It also
shouldn't be too hard to tell whether a disk image belongs to a btrfs
mount point, for using btrfs copy-on-write cloning (if nothing else,
coreutils' cp has already done this, so I have examples to copy from).
But I'll probably have to get more familiar with that code as I expand
this API beyond qcow2, and it certainly makes sense to consider whether
'virsh vol-info' should be expanded to display information already known
by libvirt (whether the volume is file- or block-based; if block-based
whether the partition is LVM, and if file-based what file system the
containing mount point appears to be).  That may indeed involve adding
more APIs beyond this proposal, but I think that this proposal will work
for 0.9.3 even if such new APIs are delayed until after the release.

>> /* Revert a volume back to the state of a snapshot, returning 0 on
>> success.  Flags is 0 for now.  */
>> int virStorageVolRevertToSnapshot(virStorageVolSnapshotPtr snapshot,
>> unsigned int flags);
>> [For qcow2, this would involve qemu-img snapshot -a.  Here, a useful
>> flag might be whether to delete any changes made after the point of the
>> snapshot; virDomainRevertToSnapshot should probably honor the same type
>> of flag.]
>>
> 
> For LVM, it may be lvconvert --merge, but that requires a recent version
> of lvm and the linux kernel.
> So there are several failure modes (the toolchain doesn't support it, or
> the toolchain supports it but the operation fails).
> I don't know how to handle these cases for now, but we may need to add some
> virErrorNumber values to indicate snapshot-related errors.

Yes, we can add new virError values as we come across situations where
such operations are unsupported, but which don't fit well into any
existing error values.

>> virStorageVolSnapshotCreateXML.  [And since my first phase of
>> implementation will be focused on inline qcow2 snapshots, I don't yet
>> know what that XML will need to contain for any other type of snapshots,
>> such as mapping out how the snapshot backing file will be named in
>> relation to the possibly new live file.]
>>
> 
> In the LVM case, taking a snapshot (lvcreate) needs at least the snapshot
> volume size as an argument,
> so I think a storage-specific element can be inserted in this XML format

Wouldn't the snapshot volume size always be equal to the volume size of
the original?  Or is there really value in allowing the snapshot size to
be configured to something different (and if different, must it always
be <= original, or does a larger snapshot than the original make sense)?

>> Any feedback on this approach?  Any other APIs that would be useful to
>> add?  I'd like to get all the new APIs in place for 0.9.3 with minimal
>> qcow2 functionality, then use the time before 0.9.4 to further enhance
>> the APIs to cover more snapshot cases but without having to add any new
>> APIs.
>>
> 
> An LVM snapshot may become invalid if it runs out of volume space,
> so adding an API for checking whether a snapshot is valid will be needed.

I'm not quite sure I follow the scenario you are envisioning here.
Would you mind stepping through an example (mapping [proposed] libvirt
API calls to lvm commands would be helpful), on how an original lvm
partition is created, then a snapshot partition, then how you run out of
volume size in that snapshot?  It sounds like you are saying that there
is a way to create an lvm snapshot that is valid at the time that is
created, but later on, subsequent actions cause the snapshot to run out
of space and no longer be valid.  But my understanding is that a
snapshot is a constant size (it represents a known state of the disk at
a fixed point in time), only the deltas to that snapshot (aka the live
disk) ever have the potential to grow beyond the amount of storage used
by the snapshot.  Or are you worried about creating an lvm snapshot by
libvirt, but then a third party program changes the property of the lvm
snapshot volume to change its size?
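For context on the scenario in question: an LVM snapshot is copy-on-write, and the size passed to lvcreate is only the space reserved for copied-out blocks, not a full copy of the origin, so a snapshot that was valid when created can later fill up. A minimal sketch, assuming a hypothetical volume group vg0:

```shell
# 10G origin volume, snapshot with only 1G of copy-on-write space
lvcreate -L 10G -n orig vg0
lvcreate -s -L 1G -n snap /dev/vg0/orig

# Rewriting more than ~1G of the origin exhausts the snapshot's COW area;
# lvm then marks the snapshot invalid and its contents are lost
dd if=/dev/zero of=/dev/vg0/orig bs=1M count=2048
lvs vg0    # 'snap' shows full data usage / an invalid snapshot state
```

This is also why the snapshot size is configurable and typically smaller than the origin; a validity-check API could be implemented by polling lvs output (for example the snapshot usage percentage).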

-- 
Eric Blake   ebl...@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org




Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-06-20 Thread Eric Blake
On 06/15/2011 11:41 PM, Eric Blake wrote:
> /* Opaque type to manage a snapshot of a single storage volume.  */
> typedef virStorageVolSnapshotPtr;
> 
> /* Create a snapshot of a storage volume.  XML is optional; if non-NULL,
> it would be a new top-level element <volsnapshot> which is similar to
> the top-level <domainsnapshot> for virDomainSnapshotCreateXML, to
> specify name and description. Flags is 0 for now. */
> virStorageVolSnapshotPtr virDomainSnapshotCreateXML(virStorageVolPtr
> vol, const char *xml, unsigned int flags);

Copy-and-paste error; should be:

virStorageVolSnapshotPtr virStorageVolSnapshotCreateXML(virStorageVolPtr
vol, const char *xml, unsigned int flags);

-- 
Eric Blake   ebl...@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org



--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] RFC: API additions for enhanced snapshot support

2011-06-20 Thread Motonobu Ichimura
Hi,

I'm new to this ML, and very interested in this topic.
I'm going to try to implement the LVM snapshot feature using this framework.

2011/6/16 Eric Blake 

>
>
> /* Opaque type to manage a snapshot of a single storage volume.  */
> typedef virStorageVolSnapshotPtr;
>
>
I'm just wondering how to detect the storage type and choose the associated
snapshot functionality.
Is there any idea about how to do that?


>
> /* Revert a volume back to the state of a snapshot, returning 0 on
> success.  Flags is 0 for now.  */
> int virStorageVolRevertToSnapshot(virStorageVolSnapshotPtr snapshot,
> unsigned int flags);
> [For qcow2, this would involve qemu-img snapshot -a.  Here, a useful
> flag might be whether to delete any changes made after the point of the
> snapshot; virDomainRevertToSnapshot should probably honor the same type
> of flag.]
>

For LVM, this may be lvconvert --merge, but that requires recent versions of
lvm and the Linux kernel,
so there are several possible failure modes (the toolchain doesn't support it,
or it supports it but the operation fails).
I don't know how to handle these cases for now, but we may need to add some
virErrorNumber values
to indicate snapshot-related errors.
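For reference, a sketch of the merge-based revert being described, with a hypothetical volume group vg0 (snapshot merging needs lvm2 >= 2.02.58 and a kernel with dm snapshot-merge support, which is why both toolchain and kernel checks apply):

```shell
# Take a snapshot, let the guest modify the origin, then roll back
lvcreate -s -L 1G -n snap /dev/vg0/orig
# ... writes happen to /dev/vg0/orig ...
lvconvert --merge /dev/vg0/snap   # origin reverts to the snapshot state;
                                  # if the origin is in use, the merge is
                                  # deferred until its next activation
```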


>
>
> As for the XML changes, it makes sense to snapshot just a subset of
> disks when you only care about crash-consistent state or if you can rely
> on a guest agent to quiesce the subset of disk(s) you care about, so the
> existing <domainsnapshot> element needs a new optional subelement to
> control which disks are snapshotted; additionally, this subelement will
> be useful for disk image formats that require additional complexity
> (such as a secondary file name, rather than the inline snapshot feature
> of qcow2).  I'm envisioning something like the following:
>
> <domainsnapshot>
>   <name>whatever</name>
>   <disk name='vda' snapshot='no'/>
>   <disk name='vdb'>
>     <volsnapshot>...</volsnapshot>
>   </disk>
> </domainsnapshot>
>
> where there can be up to as many <disk> elements as there are disks
> in the domain xml; any domain disk not listed is given default
> treatment.  The name attribute of <disk> is mandatory, in order to match
> this disk element to one of the domain disks.  The snapshot='yes|no'
> attribute is optional, defaulting to yes; setting it to no skips a
> particular disk.  The <volsnapshot> subelement is optional, but if present, it
> would be the same XML as is provided to the
> virStorageVolSnapshotCreateXML.  [And since my first phase of
> implementation will be focused on inline qcow2 snapshots, I don't yet
> know what that XML will need to contain for any other type of snapshots,
> such as mapping out how the snapshot backing file will be named in
> relation to the possibly new live file.]
>

In the LVM case, taking a snapshot (lvcreate) needs at least the snapshot
volume size as an argument,
so I think a storage-specific element could be inserted in this XML format
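For illustration only, such a storage-specific element might look like the following; neither the <volsnapshot> contents nor a <size> subelement are defined by this RFC, so all names here are hypothetical:

```xml
<volsnapshot>
  <name>backup-1</name>
  <description>nightly backup point</description>
  <!-- hypothetical LVM-specific element: COW space to reserve -->
  <size unit='bytes'>1073741824</size>
</volsnapshot>
```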


>
> Any feedback on this approach?  Any other APIs that would be useful to
> add?  I'd like to get all the new APIs in place for 0.9.3 with minimal
> qcow2 functionality, then use the time before 0.9.4 to further enhance
> the APIs to cover more snapshot cases but without having to add any new
> APIs.
>

An LVM snapshot may become invalid if it runs out of volume space,
so an API for checking whether a snapshot is valid will be needed.

Regards,

[libvirt] RFC: API additions for enhanced snapshot support

2011-06-15 Thread Eric Blake
Right now, libvirt has a snapshot API via virDomainSnapshotCreateXML,
but for qemu domains, it only works if all the guest disk images are
qcow2, and qemu rather than libvirt does all the work.  However, it has
a couple of drawbacks: it is inherently tied to domains (there is no way
to manage snapshots of storage volumes not tied to a domain, even though
libvirt does that for qcow2 images associated with offline qemu domains
by using the qemu-img application).  And it necessarily operates on all
of the images associated with a domain in parallel - if any disk image
is not qcow2, the snapshot fails, and there is no way to select a subset
of disks to save.  However, it works on both active (disk and memory
state) and inactive domains (just disk state).

Upstream qemu is developing a 'live snapshot' feature, which allows the
creation of a snapshot without the current downtime of several seconds
required by the current 'savevm' monitor command, as well as means for
controlling applications (libvirt) to request that qemu pause I/O to a
particular disk, then externally perform a snapshot, then tell qemu to
resume I/O (perhaps on a different file name or fd from the host, but
with no change to the contents seen by the guest).  Eventually, these
changes will make it possible for libvirt to create fast snapshots of
LVM partitions or btrfs files for guest disk images, as well as to
select which disks are saved in a snapshot (that is, save a
crash-consistent state of a subset of disks, without the corresponding
RAM state, rather than making a full system restore point); the latter
would work best with guest cooperation to quiesce disks before qemu
pauses I/O to that disk, but that is an orthogonal enhancement.
However, my first goal with API enhancements is to merely prove that
libvirt can manage a live snapshot by using qemu-img on a qcow2 image
rather than the current 'savevm' approach of qemu doing all the work.

Additionally, libvirt provides the virDomainSave command, which saves
just the state of the domain's memory, and stops the guest.  A crude
libvirt-only snapshot could thus already be done by using virDomainSave,
then externally doing a snapshot of all disk images associated with the
domain by using virStorageVol APIs, except that such APIs don't yet
exist.  Additionally, virDomainSave has no flags argument, so there is
no way to request that the guest be resumed after the snapshot completes.
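Today that crude sequence can be approximated by hand, which also shows the missing resume flag; the domain name and paths below are hypothetical:

```shell
# Save memory state; as noted, this stops the guest and there is no
# flag to resume it automatically afterwards
virsh save mydomain /var/lib/libvirt/save/mydomain.sav

# Snapshot the (now quiescent) disk image out-of-band
qemu-img snapshot -c pre-upgrade /var/lib/libvirt/images/mydomain.qcow2

# Manually resume the guest from the saved memory image
virsh restore /var/lib/libvirt/save/mydomain.sav
```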

Right now, I'm proposing the addition of virDomainSaveFlags, along with
a series of virStorageVolSnapshot* APIs that mirror the
virDomainSnapshot* APIs.  This would mean adding:


/* Opaque type to manage a snapshot of a single storage volume.  */
typedef virStorageVolSnapshotPtr;

/* Create a snapshot of a storage volume.  XML is optional; if non-NULL,
it would be a new top-level element <volsnapshot> which is similar to
the top-level <domainsnapshot> for virDomainSnapshotCreateXML, to
specify name and description. Flags is 0 for now. */
virStorageVolSnapshotPtr virDomainSnapshotCreateXML(virStorageVolPtr
vol, const char *xml, unsigned int flags);
[For qcow2, this would be implemented with 'qemu-img snapshot -c',
similar to what virDomainSnapshotXML already does on inactive domains.
Later, we can add LVM and btrfs support, or even allow full file copies
of any file type.  Also in the future, we could enhance XML to take a
new element that describes a relationship between the name of the
original and of the snapshot, in the case where a new filename has to be
created to complete the snapshot process.]
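The qcow2 path of this API would reduce to something like the following, assuming the volume maps to a file not currently opened by qemu (paths and snapshot name are illustrative):

```shell
# Create an internal qcow2 snapshot named 'snap1' inside the volume file
qemu-img snapshot -c snap1 /var/lib/libvirt/images/vol.qcow2

# List the recorded snapshots to confirm it took effect
qemu-img snapshot -l /var/lib/libvirt/images/vol.qcow2
```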


/* Probe if vol has snapshots.  1 if true, 0 if false, -1 on error.
Flags is 0 for now.  */
int virStorageVolHasCurrentSnapshot(virStorageVolPtr vol, unsigned int
flags);
[For qcow2 images, snapshots can be contained within the same file and
managed with 'qemu-img snapshot -l', but for other formats, this may mean that
libvirt has to start managing externally saved data associated with the
storage pool that associates snapshots with filenames.  In fact, even
for qcow2 it might be useful to support creation of new files backed by
the previous snapshot rather than cramming multiple snapshots in one
file, so we may have a use for flags to filter out the presence of
single-file vs. multiple-file snapshot setups.]


/* Revert a volume back to the state of a snapshot, returning 0 on
success.  Flags is 0 for now.  */
int virStorageVolRevertToSnapshot(virStorageVolSnapshotPtr snapshot,
unsigned int flags);
[For qcow2, this would involve qemu-img snapshot -a.  Here, a useful
flag might be whether to delete any changes made after the point of the
snapshot; virDomainRevertToSnapshot should probably honor the same type
of flag.]
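For the qcow2 case, the plain revert and the proposed keep-changes behavior could be sketched as (snapshot names illustrative; the flag itself is only a proposal):

```shell
# Plain revert: discard all changes made after 'snap1'
qemu-img snapshot -a snap1 /var/lib/libvirt/images/vol.qcow2

# With a hypothetical "preserve changes" flag, the driver could first
# checkpoint the current state so the discarded changes remain reachable
qemu-img snapshot -c before-revert /var/lib/libvirt/images/vol.qcow2
qemu-img snapshot -a snap1 /var/lib/libvirt/images/vol.qcow2
```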


/* Return the most recent snapshot of a volume, if one exists, or NULL
on failure.  Flags is 0 for now.  */
virStorageVolSnapshotPtr virStorageVolSnapshotCurrent(virStorageVolPtr
vol, unsigned int flags);


/* Delete the storage associated with a snapshot (although the opaque
snapshot object must still be independently freed).  If flags is 0,