Who is going to decide whether the hypervisor snapshot should actually
happen or not? Or how?

Darren

On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
<chris.su...@netapp.com> wrote:
>
> --
> Chris Suich
> chris.su...@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
> On Oct 8, 2013, at 2:24 PM, Darren Shepherd <darren.s.sheph...@gmail.com> 
> wrote:
>
>> So in the implementation, when we say "quiesce" is that actually being
>> implemented as a VM snapshot (memory and disk).  And then when you say
>> "unquiesce" you are talking about deleting the VM snapshot?
>
> If the VM snapshot is not going to the hypervisor, then yes, the quiesce will
> actually be implemented as a hypervisor snapshot. Just to be clear, the
> unquiesce is not quite a delete - it is a collapse of the VM snapshot and the
> active VM back into one file.
>
>>
>> In NetApp, what are you snapshotting?  The whole NetApp volume (I
>> don't know the correct term), a file on NFS, an iSCSI volume?  I don't
>> know a whole heck of a lot about the NetApp snapshot capabilities.
>
> Essentially we are using internal APIs to create file-level backups - don't
> worry too much about the terminology.
>
>>
>> I know storage solutions can snapshot better and faster than
>> hypervisors can with COW files.  I've personally just always been
>> perplexed about what's the best way to implement it.  For storage
>> solutions that are block-based, it's really easy to have the storage
>> do the snapshot.  For shared file systems, like NFS, it seems way
>> more complicated as you don't want to snapshot the entire filesystem
>> in order to snapshot one file.
>
> With filesystems like NFS, things are certainly more complicated, but that is 
> taken care of by our controller's operating system, Data ONTAP, and we simply 
> use APIs to communicate with it.
>
>>
>> Darren
>>
>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
>> <chris.su...@netapp.com> wrote:
>>> I can comment on the second half.
>>>
>>> Through storage operations, storage providers can create backups much
>>> faster than hypervisors can, and over time their snapshots are more
>>> efficient than the snapshot chains that hypervisors create. It is true that
>>> a VM snapshot taken at the storage level is slightly different, as it would
>>> be pseudo-quiesced and would not have its memory snapshotted. This is
>>> accomplished through hypervisor snapshots:
>>>
>>> 1) VM snapshot request (let's say for VM 'A')
>>> 2) Create hypervisor snapshot (optional)
>>>  -VM 'A' is snapshotted, creating active VM 'A*'
>>>  -All disk traffic now goes to VM 'A*' and 'A' is a snapshot of 'A*'
>>> 3) Storage driver(s) take snapshots of each volume
>>> 4) Undo hypervisor snapshot (optional)
>>>  -VM snapshot 'A' is rolled back into VM 'A*', so the hypervisor snapshot no
>>> longer exists
>>>
>>> Now, a couple of notes:
>>> -The reason this is optional is that not all users necessarily care about
>>> the memory or disk consistency of their VMs and would prefer faster
>>> snapshots to consistency.
>>> -Preemptively, yes, we are actually taking hypervisor snapshots, which means
>>> there isn't actually a performance gain from taking storage snapshots when
>>> quiescing the VM. However, the performance gain will come both during
>>> restore of the VM and during normal operations, as described above.
>>>
>>> Although you can think of it as a poor man's VM snapshot, I would think of 
>>> it more as a consistent multi-volume snapshot. Again, the difference being 
>>> that this snapshot was not truly quiesced like a hypervisor snapshot would 
>>> be.
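>>>
>>> As a rough illustration only (every type name below is a placeholder, not an
>>> actual ACS or NetApp class), the optional-quiesce sequence above could be
>>> sketched in Java like this:
>>>
>>> // Illustrative sketch of steps 2-4; all names here are hypothetical.
>>> interface HypervisorHelper {
>>>     void createHypervisorSnapshot(String vmName); // step 2: 'A' -> active 'A*'
>>>     void undoHypervisorSnapshot(String vmName);   // step 4: collapse back into 'A*'
>>> }
>>>
>>> interface StorageDriver {
>>>     void takeVolumeSnapshot(String volumeUuid);   // step 3, per volume
>>> }
>>>
>>> class ConsistentMultiVolumeSnapshot {
>>>     static void take(String vmName, java.util.Map<String, StorageDriver> volumes,
>>>                      HypervisorHelper hypervisor, boolean quiesce) {
>>>         if (quiesce) {
>>>             hypervisor.createHypervisorSnapshot(vmName);   // optional step 2
>>>         }
>>>         for (java.util.Map.Entry<String, StorageDriver> e : volumes.entrySet()) {
>>>             e.getValue().takeVolumeSnapshot(e.getKey());   // step 3
>>>         }
>>>         if (quiesce) {
>>>             hypervisor.undoHypervisorSnapshot(vmName);     // optional step 4
>>>         }
>>>     }
>>> }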
>>>
>>> --
>>> Chris Suich
>>> chris.su...@netapp.com
>>> NetApp Software Engineer
>>> Data Center Platforms – Cloud Solutions
>>> Citrix, Cisco & Red Hat
>>>
>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd <darren.s.sheph...@gmail.com> 
>>> wrote:
>>>
>>>> My only comment is that having the return type as boolean and using
>>>> that to indicate quiesce behaviour seems obscure and will probably lead
>>>> to a problem later.  You're basically saying the result of
>>>> takeVMSnapshot will only ever need to communicate back whether
>>>> unquiesce needs to happen.  Maybe some result object would be more
>>>> extensible.
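>>>>
>>>> For example (just a hypothetical sketch of the idea, not a concrete
>>>> proposal), a result object could start out carrying only the unquiesce
>>>> flag and grow later without breaking the interface:
>>>>
>>>> // Hypothetical result object for takeVMSnapshot; names are illustrative.
>>>> public class VMSnapshotResult {
>>>>     private final boolean unquiesceNeeded;
>>>>     private final String details; // room for errors, timings, etc. later
>>>>
>>>>     public VMSnapshotResult(boolean unquiesceNeeded, String details) {
>>>>         this.unquiesceNeeded = unquiesceNeeded;
>>>>         this.details = details;
>>>>     }
>>>>
>>>>     public boolean isUnquiesceNeeded() { return unquiesceNeeded; }
>>>>     public String getDetails() { return details; }
>>>> }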
>>>>
>>>> Actually, I think I have more comments.  This seems a bit odd to me.
>>>> Why would a storage driver in ACS implement VM snapshot
>>>> functionality?  VM snapshot is really a hypervisor-orchestrated
>>>> operation.  So it seems like we're trying to implement a poor man's VM
>>>> snapshot.  Maybe if I understood what NetApp was trying to do it would
>>>> make more sense, but it's all odd.  To do a proper VM snapshot you need
>>>> to snapshot memory and disk at the exact same time.  How are we going
>>>> to do that if ACS is orchestrating the VM snapshot and delegating to
>>>> storage providers?  It's not like you are going to pause the VM... or
>>>> are you?
>>>>
>>>> Darren
>>>>
>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <edison...@citrix.com> wrote:
>>>>> I created a design document page at
>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+snapshot+related+operations,
>>>>> feel free to add items to it.
>>>>> And a new branch "pluggable_vm_snapshot" has been created.
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: SuichII, Christopher [mailto:chris.su...@netapp.com]
>>>>>> Sent: Monday, October 07, 2013 10:02 AM
>>>>>> To: <dev@cloudstack.apache.org>
>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>>>>>>
>>>>>> I'm a fan of option 2 - this gives us the most flexibility (as you stated).
>>>>>> The option is given to completely override the way VM snapshots work AND
>>>>>> storage providers are given the opportunity to work within the default VM
>>>>>> snapshot workflow.
>>>>>>
>>>>>> I believe this option should satisfy your concern, Mike. The snapshot and
>>>>>> quiesce strategy would be in charge of communicating with the hypervisor.
>>>>>> Storage providers should be able to leverage the default strategies and
>>>>>> simply perform the storage operations.
>>>>>>
>>>>>> I don't think it should be much of an issue that new methods added to the
>>>>>> storage driver interface may not apply to everyone. In fact, that is
>>>>>> already the case. Some methods, such as un/maintain(), attachToXXX() and
>>>>>> takeSnapshot(), are already not implemented by every driver - they just
>>>>>> return false when asked if they can handle the operation.
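>>>>>>
>>>>>> (Very roughly, and with made-up names rather than the real driver
>>>>>> interface, that opt-out pattern looks something like this:)
>>>>>>
>>>>>> // Illustrative only - not the actual ACS storage driver API.
>>>>>> enum SnapshotOperation { TAKE, REVERT, DELETE }
>>>>>>
>>>>>> interface SnapshotCapableDriver {
>>>>>>     boolean canHandle(SnapshotOperation op); // decline by returning false
>>>>>> }
>>>>>>
>>>>>> class BasicDriver implements SnapshotCapableDriver {
>>>>>>     public boolean canHandle(SnapshotOperation op) {
>>>>>>         return false; // this driver doesn't implement snapshot operations
>>>>>>     }
>>>>>> }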
>>>>>>
>>>>>> --
>>>>>> Chris Suich
>>>>>> chris.su...@netapp.com
>>>>>> NetApp Software Engineer
>>>>>> Data Center Platforms - Cloud Solutions
>>>>>> Citrix, Cisco & Red Hat
>>>>>>
>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski 
>>>>>> <mike.tutkow...@solidfire.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Well, my first thought on this is that the storage driver should not
>>>>>>> be telling the hypervisor to do anything. It should be responsible for
>>>>>>> creating/deleting volumes, snapshots, etc. on its storage system only.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <edison...@citrix.com> wrote:
>>>>>>>
>>>>>>>> In 4.2, we added VM snapshot for VMware/XenServer. The current
>>>>>>>> workflow is like the following:
>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot ->
>>>>>>>> send CreateVMSnapshotCommand to the hypervisor to create the VM snapshot.
>>>>>>>>
>>>>>>>> If anybody wants to change the workflow, they need to either change
>>>>>>>> VMSnapshotManagerImpl directly or subclass it. Neither is the ideal
>>>>>>>> choice, as VMSnapshotManagerImpl should be able to handle different
>>>>>>>> ways to take a VM snapshot, instead of hard-coding one.
>>>>>>>>
>>>>>>>> The requirements for pluggable VM snapshot come from:
>>>>>>>> Storage vendors may have their own optimizations, such as NetApp.
>>>>>>>> VM snapshot can be implemented in a totally different way (for
>>>>>>>> example, I could just send a command to the guest VM to tell my
>>>>>>>> application to flush the disk and hold disk writes, then go to the
>>>>>>>> hypervisor to take a volume snapshot).
>>>>>>>>
>>>>>>>> If we agree on enabling pluggable VM snapshots, then we can move on
>>>>>>>> to discussing how to implement it.
>>>>>>>>
>>>>>>>> The possible options:
>>>>>>>> 1. Coarse-grained interface. Add a VMSnapshotStrategy interface,
>>>>>>>> which has the following methods:
>>>>>>>>  VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>  Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>  Boolean deleteVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>
>>>>>>>> The workflow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity checks,
>>>>>>>> then hand over to VMSnapshotStrategy.
>>>>>>>> In a VMSnapshotStrategy implementation, it may just send a
>>>>>>>> Create/Revert/Delete VMSnapshotCommand to the hypervisor host, or
>>>>>>>> perform any special operations.
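>>>>>>>>
>>>>>>>> (A rough sketch of what an option 1 strategy could look like; everything
>>>>>>>> other than the VMSnapshotStrategy/VMSnapshot names above is a
>>>>>>>> placeholder, e.g. the HypervisorFacade helper is hypothetical:)
>>>>>>>>
>>>>>>>> // Hypothetical default strategy: just delegate everything to the hypervisor.
>>>>>>>> public class DefaultVMSnapshotStrategy implements VMSnapshotStrategy {
>>>>>>>>     private final HypervisorFacade hypervisor; // hypothetical helper
>>>>>>>>
>>>>>>>>     public DefaultVMSnapshotStrategy(HypervisorFacade hypervisor) {
>>>>>>>>         this.hypervisor = hypervisor;
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     public VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot) {
>>>>>>>>         return hypervisor.createVMSnapshot(vmSnapshot); // send command to host
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     public Boolean revertVMSnapshot(VMSnapshot vmSnapshot) {
>>>>>>>>         return hypervisor.revertVMSnapshot(vmSnapshot);
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     public Boolean deleteVMSnapshot(VMSnapshot vmSnapshot) {
>>>>>>>>         return hypervisor.deleteVMSnapshot(vmSnapshot);
>>>>>>>>     }
>>>>>>>> }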
>>>>>>>>
>>>>>>>> 2. Fine-grained interface. Not only add a VMSnapshotStrategy
>>>>>>>> interface, but also add certain methods on the storage driver.
>>>>>>>>  The VMSnapshotStrategy interface will be the same as in option 1.
>>>>>>>>  The following methods will be added to the storage driver:
>>>>>>>> /* volumesBelongToVM is the list of volumes of the VM that were created
>>>>>>>>    on this storage; a storage vendor can either take one snapshot of
>>>>>>>>    these volumes in one shot, or take a snapshot of each volume separately.
>>>>>>>>    Pre-condition: the VM is quiesced.
>>>>>>>>    It returns a Boolean indicating whether the VM needs to be unquiesced.
>>>>>>>>    The default storage driver will return false.
>>>>>>>>  */
>>>>>>>>  boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>> VMSnapshot vmSnapshot);
>>>>>>>>  Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>> VMSnapshot vmSnapshot);
>>>>>>>> Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>> VMSnapshot vmSnapshot);
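>>>>>>>>
>>>>>>>> (Purely for illustration, the default driver behavior mentioned in the
>>>>>>>> comment above could simply be:)
>>>>>>>>
>>>>>>>> // Default driver: no storage-side snapshot work and no unquiesce requested,
>>>>>>>> // so the hypervisor snapshot taken during quiesce is kept as-is.
>>>>>>>> public boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>                               VMSnapshot vmSnapshot) {
>>>>>>>>     return false;
>>>>>>>> }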
>>>>>>>>
>>>>>>>> The workflow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
>>>>>>>> driver: takeVMSnapshot. In the implementation of VMSnapshotStrategy's
>>>>>>>> takeVMSnapshot, the pseudo code looks like:
>>>>>>>>     HypervisorHelper.quiesceVM(vm);
>>>>>>>>     val volumes = vm.getVolumes();
>>>>>>>>     val maps = new Map[driver, List[VolumeInfo]]();
>>>>>>>>     volumes.foreach(volume => maps.put(volume.getDriver(),
>>>>>>>>         volume :: maps.get(volume.getDriver())));
>>>>>>>>     var needUnquiesce = true;
>>>>>>>>     maps.foreach((driver, driverVolumes) => needUnquiesce =
>>>>>>>>         needUnquiesce && driver.takeVMSnapshot(driverVolumes, vmSnapshot));
>>>>>>>>     if (needUnquiesce) {
>>>>>>>>         HypervisorHelper.unquiesce(vm);
>>>>>>>>     }
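>>>>>>>>
>>>>>>>> (In Java, very roughly - "StorageDriver", getDriver() and HypervisorHelper
>>>>>>>> are the same placeholder names as in the pseudo code above:)
>>>>>>>>
>>>>>>>>     HypervisorHelper.quiesceVM(vm);
>>>>>>>>     Map<StorageDriver, List<VolumeInfo>> volumesByDriver =
>>>>>>>>         new HashMap<StorageDriver, List<VolumeInfo>>();
>>>>>>>>     for (VolumeInfo volume : vm.getVolumes()) {
>>>>>>>>         List<VolumeInfo> list = volumesByDriver.get(volume.getDriver());
>>>>>>>>         if (list == null) {
>>>>>>>>             list = new ArrayList<VolumeInfo>();
>>>>>>>>             volumesByDriver.put(volume.getDriver(), list);
>>>>>>>>         }
>>>>>>>>         list.add(volume);   // group the VM's volumes by owning driver
>>>>>>>>     }
>>>>>>>>     boolean needUnquiesce = true;
>>>>>>>>     for (Map.Entry<StorageDriver, List<VolumeInfo>> e : volumesByDriver.entrySet()) {
>>>>>>>>         // a single driver returning false keeps the hypervisor snapshot (no unquiesce)
>>>>>>>>         needUnquiesce = needUnquiesce && e.getKey().takeVMSnapshot(e.getValue(), vmSnapshot);
>>>>>>>>     }
>>>>>>>>     if (needUnquiesce) {
>>>>>>>>         HypervisorHelper.unquiesce(vm);
>>>>>>>>     }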
>>>>>>>>
>>>>>>>> By default, the quiesceVM in HypervisorHelper will actually take a VM
>>>>>>>> snapshot through the hypervisor.
>>>>>>>> Does the above logic make sense?
>>>>>>>>
>>>>>>>> The pro of option 1 is that it's simple: no need to change the storage
>>>>>>>> driver interfaces. The con is that each storage vendor needs to
>>>>>>>> implement a strategy, and they may all end up doing the same thing.
>>>>>>>> The pro of option 2 is that storage drivers won't need to worry
>>>>>>>> about how to quiesce/unquiesce the VM. The con is that it adds
>>>>>>>> these methods to every storage driver, so it assumes that this
>>>>>>>> workflow will work for everybody.
>>>>>>>>
>>>>>>>> So which option should we take? Or if you have other options, please
>>>>>>>> let us know.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Mike Tutkowski*
>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>>>> e: mike.tutkow...@solidfire.com
>>>>>>> o: 303.746.7302
>>>>>>> Advancing the way the world uses the
>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>>>>> *(tm)*
>>>>>
>>>
>
