Who is going to decide whether the hypervisor snapshot should actually happen or not? Or how?
Darren

On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher <chris.su...@netapp.com> wrote:
>
> --
> Chris Suich
> chris.su...@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
> On Oct 8, 2013, at 2:24 PM, Darren Shepherd <darren.s.sheph...@gmail.com>
> wrote:
>
>> So in the implementation, when we say "quiesce", is that actually being
>> implemented as a VM snapshot (memory and disk)? And then when you say
>> "unquiesce", are you talking about deleting the VM snapshot?
>
> If the VM snapshot is not going to the hypervisor, then yes, it will actually
> be a hypervisor snapshot. Just to be clear, the unquiesce is not quite a
> delete - it is a collapse of the VM snapshot and the active VM back into one
> file.
>
>>
>> In NetApp, what are you snapshotting? The whole NetApp volume (I
>> don't know the correct term), a file on NFS, an iSCSI volume? I don't
>> know a whole heck of a lot about the NetApp snapshot capabilities.
>
> Essentially we are using internal APIs to create file-level backups - don't
> worry too much about the terminology.
>
>>
>> I know storage solutions can snapshot better and faster than
>> hypervisors can with COW files. I've personally always been
>> perplexed about the best way to implement it. For storage
>> solutions that are block-based, it's really easy to have the storage
>> do the snapshot. For shared file systems like NFS, it seems way
>> more complicated, as you don't want to snapshot the entire filesystem
>> in order to snapshot one file.
>
> With filesystems like NFS, things are certainly more complicated, but that is
> taken care of by our controller's operating system, Data ONTAP, and we simply
> use APIs to communicate with it.
>
>>
>> Darren
>>
>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
>> <chris.su...@netapp.com> wrote:
>>> I can comment on the second half.
>>>
>>> Through storage operations, storage providers can create backups much
>>> faster than hypervisors can, and over time their snapshots are more
>>> efficient than the snapshot chains that hypervisors create. It is true
>>> that a VM snapshot taken at the storage level is slightly different, as it
>>> would be pseudo-quiesced and would not have its memory snapshotted. This is
>>> accomplished through hypervisor snapshots:
>>>
>>> 1) VM snapshot request (let's say VM 'A')
>>> 2) Create hypervisor snapshot (optional)
>>>    - VM 'A' is snapshotted, creating active VM 'A*'
>>>    - All disk traffic now goes to VM 'A*', and 'A' is a snapshot of 'A*'
>>> 3) Storage driver(s) take snapshots of each volume
>>> 4) Undo hypervisor snapshot (optional)
>>>    - VM snapshot 'A' is rolled back into VM 'A*', so the hypervisor
>>>      snapshot no longer exists
>>>
>>> Now, a couple of notes:
>>> - The reason this is optional is that not all users necessarily care about
>>>   the memory or disk consistency of their VMs, and some would prefer faster
>>>   snapshots to consistency.
>>> - Preemptively, yes, we are actually taking hypervisor snapshots, which
>>>   means there isn't actually a performance gain from taking storage
>>>   snapshots when quiescing the VM. However, the performance gain will come
>>>   both during restoring the VM and during normal operations, as described
>>>   above.
>>>
>>> Although you can think of it as a poor man's VM snapshot, I would think of
>>> it more as a consistent multi-volume snapshot. Again, the difference is
>>> that this snapshot was not truly quiesced the way a hypervisor snapshot
>>> would be.
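The four steps above can be sketched in Java as follows, purely to illustrate the ordering: an optional hypervisor snapshot acts as the quiesce, each volume is then snapshotted on storage, and the temporary hypervisor snapshot is collapsed again. `Hypervisor`, `StorageDriver`, `createSnapshot`, `collapseSnapshot`, `snapshotVolume` and `trace` are hypothetical names for illustration, not CloudStack or Data ONTAP APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the four-step flow described above. Hypervisor and StorageDriver are
// hypothetical interfaces for illustration; they are not CloudStack or Data ONTAP APIs.
class StorageBackedVmSnapshot {

    interface Hypervisor {
        void createSnapshot(String vm);   // step 2: 'A' becomes the snapshot, writes go to 'A*'
        void collapseSnapshot(String vm); // step 4: 'A' is rolled back into 'A*'
    }

    interface StorageDriver {
        void snapshotVolume(String volume); // step 3: storage-side snapshot of one volume
    }

    // Step 1 is the caller invoking this method; `quiesce` enables the optional steps 2 and 4.
    static void takeSnapshot(String vm, List<String> volumes, boolean quiesce,
                             Hypervisor hv, StorageDriver storage) {
        if (quiesce) {
            hv.createSnapshot(vm);
        }
        try {
            for (String volume : volumes) {
                storage.snapshotVolume(volume);
            }
        } finally {
            // Collapse the temporary hypervisor snapshot even if a storage snapshot failed.
            if (quiesce) {
                hv.collapseSnapshot(vm);
            }
        }
    }

    // Records the order of operations for a VM 'A' with two volumes.
    static List<String> trace(boolean quiesce) {
        List<String> log = new ArrayList<>();
        Hypervisor hv = new Hypervisor() {
            public void createSnapshot(String vm) { log.add("hypervisor snapshot " + vm); }
            public void collapseSnapshot(String vm) { log.add("collapse " + vm); }
        };
        StorageDriver storage = volume -> log.add("storage snapshot " + volume);
        takeSnapshot("A", List.of("root", "data"), quiesce, hv, storage);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace(true));
        System.out.println(trace(false));
    }
}
```

Running `main` prints the quiesced trace (hypervisor snapshot, both storage snapshots, collapse) followed by the unquiesced one with only the two storage snapshots. Collapsing in a `finally` block is a choice made for this sketch, so that a failed storage snapshot does not leave the VM running on a leftover hypervisor snapshot; the thread does not say how failures are handled.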
>>>
>>> --
>>> Chris Suich
>>> chris.su...@netapp.com
>>> NetApp Software Engineer
>>> Data Center Platforms – Cloud Solutions
>>> Citrix, Cisco & Red Hat
>>>
>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd <darren.s.sheph...@gmail.com>
>>> wrote:
>>>
>>>> My only comment is that having the return type as boolean and using that
>>>> to indicate quiesce behaviour seems obscure and will probably lead
>>>> to a problem later. You're basically saying the result of
>>>> takeVMSnapshot will only ever need to communicate back whether
>>>> unquiesce needs to happen. Maybe some result object would be more
>>>> extensible.
>>>>
>>>> Actually, I think I have more comments. This seems a bit odd to me.
>>>> Why would a storage driver in ACS implement VM snapshot
>>>> functionality? VM snapshot is really a hypervisor-orchestrated
>>>> operation. So it seems like we're trying to implement a poor man's VM
>>>> snapshot. Maybe if I understood what NetApp was trying to do it would
>>>> make more sense, but it's all odd. To do a proper VM snapshot you need
>>>> to snapshot memory and disk at the exact same time. How are we going
>>>> to do that if ACS is orchestrating the VM snapshot and delegating to
>>>> storage providers? It's not like you are going to pause the VM... or
>>>> are you?
>>>>
>>>> Darren
>>>>
>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <edison...@citrix.com> wrote:
>>>>> I created a design document page at
>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+snapshot+related+operations
>>>>> Feel free to add items to it.
>>>>> A new branch, "pluggable_vm_snapshot", has also been created.
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: SuichII, Christopher [mailto:chris.su...@netapp.com]
>>>>>> Sent: Monday, October 07, 2013 10:02 AM
>>>>>> To: <dev@cloudstack.apache.org>
>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>>>>>>
>>>>>> I'm a fan of option 2 - this gives us the most flexibility (as you
>>>>>> stated).
>>>>>> The option is given to completely override the way VM snapshots work,
>>>>>> AND storage providers are given the opportunity to work within the
>>>>>> default VM snapshot workflow.
>>>>>>
>>>>>> I believe this option should satisfy your concern, Mike. The snapshot and
>>>>>> quiesce strategy would be in charge of communicating with the hypervisor.
>>>>>> Storage providers should be able to leverage the default strategies and
>>>>>> simply perform the storage operations.
>>>>>>
>>>>>> I don't think it should be much of an issue that a new method on the
>>>>>> storage driver interface may not apply to everyone. In fact, that is
>>>>>> already the case. Some methods, such as un/maintain(), attachToXXX() and
>>>>>> takeSnapshot(), are already not implemented by every driver - they just
>>>>>> return false when asked if they can handle the operation.
>>>>>>
>>>>>> --
>>>>>> Chris Suich
>>>>>> chris.su...@netapp.com
>>>>>> NetApp Software Engineer
>>>>>> Data Center Platforms - Cloud Solutions
>>>>>> Citrix, Cisco & Red Hat
>>>>>>
>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
>>>>>> <mike.tutkow...@solidfire.com> wrote:
>>>>>>
>>>>>>> Well, my first thought on this is that the storage driver should not
>>>>>>> be telling the hypervisor to do anything. It should be responsible for
>>>>>>> creating/deleting volumes, snapshots, etc. on its storage system only.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <edison...@citrix.com> wrote:
>>>>>>>
>>>>>>>> In 4.2, we added VM snapshots for VMware/XenServer. The current
>>>>>>>> workflow is as follows:
>>>>>>>> createVMSnapshot API -> VMSnapshotManagerImpl: createVMSnapshot ->
>>>>>>>> send CreateVMSnapshotCommand to the hypervisor to create the VM
>>>>>>>> snapshot.
>>>>>>>>
>>>>>>>> If anybody wants to change the workflow, they need to either change
>>>>>>>> VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl.
>>>>>>>> Neither is an ideal choice, as VMSnapshotManagerImpl should be able to
>>>>>>>> handle different ways of taking a VM snapshot instead of hard-coding
>>>>>>>> one.
>>>>>>>>
>>>>>>>> The requirements for a pluggable VM snapshot come from:
>>>>>>>> - Storage vendors may have their own optimizations, such as NetApp.
>>>>>>>> - A VM snapshot can be implemented in a totally different way (for
>>>>>>>>   example, I could just send a command to the guest VM telling my
>>>>>>>>   application to flush to disk and hold disk writes, then go to the
>>>>>>>>   hypervisor to take a volume snapshot).
>>>>>>>>
>>>>>>>> If we agree to enable pluggable VM snapshots, then we can move on to
>>>>>>>> discussing how to implement them.
>>>>>>>>
>>>>>>>> The possible options:
>>>>>>>> 1. Coarse-grained interface. Add a VMSnapshotStrategy interface,
>>>>>>>> which has the following methods:
>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>> Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>> Boolean deleteVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>
>>>>>>>> The workflow will be: createVMSnapshot API ->
>>>>>>>> VMSnapshotManagerImpl: createVMSnapshot -> VMSnapshotStrategy:
>>>>>>>> takeVMSnapshot.
>>>>>>>> VMSnapshotManagerImpl will manage VM state and do the sanity checks,
>>>>>>>> then hand over to VMSnapshotStrategy.
>>>>>>>> A VMSnapshotStrategy implementation may just send a
>>>>>>>> create/revert/delete VMSnapshotCommand to the hypervisor host, or
>>>>>>>> perform any special operations.
>>>>>>>>
>>>>>>>> 2. Fine-grained interface. Not only add a VMSnapshotStrategy
>>>>>>>> interface, but also add certain methods to the storage driver.
>>>>>>>> The VMSnapshotStrategy interface will be the same as in option 1.
>>>>>>>> The following methods will be added to the storage driver:
>>>>>>>> /* volumesBelongToVM is the list of the VM's volumes that were created
>>>>>>>>    on this storage. The storage vendor can either take one snapshot of
>>>>>>>>    these volumes in one shot, or take a snapshot of each volume
>>>>>>>>    separately.
>>>>>>>>    Precondition: the VM is quiesced.
>>>>>>>>    Returns a boolean indicating whether the VM needs to be unquiesced.
>>>>>>>>    The default storage driver returns false.
>>>>>>>> */
>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>         VMSnapshot vmSnapshot);
>>>>>>>> Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>         VMSnapshot vmSnapshot);
>>>>>>>> Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>         VMSnapshot vmSnapshot);
>>>>>>>>
>>>>>>>> The workflow will be: createVMSnapshot API ->
>>>>>>>> VMSnapshotManagerImpl: createVMSnapshot -> VMSnapshotStrategy:
>>>>>>>> takeVMSnapshot -> storage driver: takeVMSnapshot.
>>>>>>>> In the implementation of VMSnapshotStrategy's takeVMSnapshot, the
>>>>>>>> pseudocode looks like:
>>>>>>>> HypervisorHelper.quiesceVM(vm);
>>>>>>>> val volumes = vm.getVolumes();
>>>>>>>> // group the VM's volumes by the storage driver that owns them
>>>>>>>> val maps = mutable.Map[Driver, List[VolumeInfo]]();
>>>>>>>> volumes.foreach(volume => maps.put(volume.getDriver(),
>>>>>>>>     volume :: maps.getOrElse(volume.getDriver(), Nil)))
>>>>>>>> var needUnquiesce = true;
>>>>>>>> // call the driver first so that && cannot skip the remaining drivers
>>>>>>>> maps.foreach { case (driver, vols) => needUnquiesce =
>>>>>>>>     driver.takeVMSnapshot(vols, vmSnapshot) && needUnquiesce }
>>>>>>>> if (needUnquiesce) {
>>>>>>>>   HypervisorHelper.unquiesce(vm);
>>>>>>>> }
>>>>>>>>
>>>>>>>> By default, quiesceVM in HypervisorHelper will actually take a VM
>>>>>>>> snapshot through the hypervisor.
>>>>>>>> Does the above logic make sense?
>>>>>>>>
>>>>>>>> The pro of option 1 is that it's simple: there is no need to change the
>>>>>>>> storage driver interfaces. The con is that each storage vendor needs to
>>>>>>>> implement a strategy, and they may all end up doing the same thing.
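Edison's option-2 pseudocode might translate to Java roughly as below. This is a sketch under assumptions, not CloudStack code: `StorageDriver`, `VolumeInfo` and `HypervisorHelper` are simplified, hypothetical stand-ins for the names in the proposal, VMs and snapshots are plain strings, and `trace` is a helper that only records calls. Each driver is invoked before its result is combined, because writing `needUnquiesce && driver.takeVMSnapshot(...)` would short-circuit and skip every driver after the first one that returns `false`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the option-2 takeVMSnapshot flow. StorageDriver, VolumeInfo and
// HypervisorHelper are simplified, hypothetical stand-ins for the proposal's names.
class FineGrainedStrategySketch {

    interface StorageDriver {
        // Returns true if the VM should be unquiesced afterwards.
        boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM, String vmSnapshot);
    }

    static final class VolumeInfo {
        final String name;
        final StorageDriver driver;

        VolumeInfo(String name, StorageDriver driver) {
            this.name = name;
            this.driver = driver;
        }
    }

    interface HypervisorHelper {
        void quiesceVM(String vm); // by default: take a hypervisor VM snapshot
        void unquiesce(String vm); // collapse that hypervisor snapshot again
    }

    static void takeVMSnapshot(String vm, List<VolumeInfo> volumes, String vmSnapshot,
                               HypervisorHelper helper) {
        helper.quiesceVM(vm);

        // Group the VM's volumes by the storage driver that owns them.
        Map<StorageDriver, List<VolumeInfo>> byDriver = new LinkedHashMap<>();
        for (VolumeInfo volume : volumes) {
            byDriver.computeIfAbsent(volume.driver, d -> new ArrayList<>()).add(volume);
        }

        // Invoke every driver before combining results, so one driver
        // returning false does not short-circuit the remaining drivers.
        boolean needUnquiesce = true;
        for (Map.Entry<StorageDriver, List<VolumeInfo>> entry : byDriver.entrySet()) {
            boolean driverAsksForUnquiesce = entry.getKey().takeVMSnapshot(entry.getValue(), vmSnapshot);
            needUnquiesce = needUnquiesce && driverAsksForUnquiesce;
        }

        if (needUnquiesce) {
            helper.unquiesce(vm);
        }
    }

    // Records calls for a VM with two volumes on driver 1 and one volume on driver 2.
    static List<String> trace(boolean driver1Result, boolean driver2Result) {
        List<String> log = new ArrayList<>();
        StorageDriver driver1 = (vols, snap) -> { log.add("driver1:" + vols.size()); return driver1Result; };
        StorageDriver driver2 = (vols, snap) -> { log.add("driver2:" + vols.size()); return driver2Result; };
        HypervisorHelper helper = new HypervisorHelper() {
            public void quiesceVM(String vm) { log.add("quiesce"); }
            public void unquiesce(String vm) { log.add("unquiesce"); }
        };
        List<VolumeInfo> volumes = List.of(
                new VolumeInfo("root", driver1),
                new VolumeInfo("data1", driver1),
                new VolumeInfo("data2", driver2));
        takeVMSnapshot("A", volumes, "snap-1", helper);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace(true, true));
        System.out.println(trace(false, true));
    }
}
```

`trace(false, true)` shows that the second driver still snapshots its volume even though the first one declined the unquiesce, and that the hypervisor snapshot is then kept. Grouping with a `LinkedHashMap` keeps drivers in the order their volumes appear, which only matters for reproducible traces.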
>>>>>>>> The pro of option 2 is that the storage driver won't need to worry
>>>>>>>> about how to quiesce/unquiesce the VM. The con is that it adds these
>>>>>>>> methods to every storage driver, so it assumes that this workflow
>>>>>>>> will work for everybody.
>>>>>>>>
>>>>>>>> So which option should we take? Or if you have other options, please
>>>>>>>> let us know.
>>>>>>>
>>>>>>> --
>>>>>>> *Mike Tutkowski*
>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>>>> e: mike.tutkow...@solidfire.com
>>>>>>> o: 303.746.7302
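Darren's suggestion to have the driver return a result object rather than a bare boolean could look something like the following. `VMSnapshotResult`, its `details` map, `defaultDriver` and `combine` are hypothetical names chosen for illustration; the idea is that a field added later does not force a change to every driver's method signature, while the strategy can still derive the unquiesce decision from all drivers' results.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical result type suggested in the thread as a replacement for a boolean return.
final class VMSnapshotResult {
    private final boolean needUnquiesce;
    private final Map<String, String> details; // hypothetical driver-specific data, e.g. snapshot ids

    VMSnapshotResult(boolean needUnquiesce, Map<String, String> details) {
        this.needUnquiesce = needUnquiesce;
        this.details = Collections.unmodifiableMap(new HashMap<>(details));
    }

    // Mirrors the proposal: the default driver does not ask for an unquiesce.
    static VMSnapshotResult defaultDriver() {
        return new VMSnapshotResult(false, Map.of());
    }

    boolean needUnquiesce() { return needUnquiesce; }

    Map<String, String> details() { return details; }

    // Merge per-driver results: unquiesce only if every driver asks for it.
    static VMSnapshotResult combine(List<VMSnapshotResult> results) {
        boolean needUnquiesce = true;
        Map<String, String> merged = new HashMap<>();
        for (VMSnapshotResult r : results) {
            needUnquiesce = needUnquiesce && r.needUnquiesce;
            merged.putAll(r.details);
        }
        return new VMSnapshotResult(needUnquiesce, merged);
    }

    public static void main(String[] args) {
        VMSnapshotResult storage = new VMSnapshotResult(true, Map.of("root", "snap-root-1"));
        VMSnapshotResult combined = combine(List.of(storage, defaultDriver()));
        System.out.println(combined.needUnquiesce() + " " + combined.details());
    }
}
```

Running `main` prints `false {root=snap-root-1}`: one driver asked for an unquiesce but the default driver did not, so the hypervisor snapshot is kept while the storage-side details are still merged.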