Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder - discussion reminder

2014-03-21 Thread Deepak Shetty
Bruce,
   Thanks, it helps clarify.

thanx,
deepak



On Thu, Mar 20, 2014 at 10:07 PM, Bruce Montague 
bruce_monta...@symantec.com wrote:

 Hi, Deepak. With the caveat that both the etherpad and Ron's presentation
 are pretty high-level, my guess is:



 1)  DR middleware refers to the orchestration engine managing the
 entire DR process between the primary and secondary sites. (Something like
 two Heat workflows interacting or a workflow that works across multiple
 OpenStack deployments.) The replication agent does something that resembles
 continually cloning a volume from the primary to the secondary, with
 snapshots appearing on the secondary at times when the volumes' contents are
 application-consistent and consistent with each other (for all the volumes
 of a VM or a multi-tier app). These secondary-site snapshots appear at
 specified rates (so you know how recent the snapshots there will be). For
 instance, the replication agent might take some sort of snapshot(s) on the
 primary and then update the corresponding volume(s) on the secondary using
 the primary snapshot(s). This resembles (maybe it could even be) something
 like DRBD or NBD. Many SAN vendors provide some form of replication agent
 between SANs.



 2)  Regarding metadata, the replication agent might only be
 replicating the volumes of some tenant VMs. It might not be replicating any
 volumes containing OpenStack metadata. (This is for the smaller tenant
 use-case, not complete OpenStack deployment mirroring, or somesuch. If
 complete mirroring were done, maybe you wouldn't have to sync metadata if
 you designed the system just for that.) DR is often something that a tenant
 might apply only to a set of core servers (key pets). In this use-case the
 two (or more) DR sites might not be symmetrical. The secondary site needs
 to know it is in the secondary role. Things like IP addresses, and maybe
 security and firewall rules, might have to change for the workload to run
 at the secondary site. Applying this metadata to VMs on the secondary site
 (what needs to change in the personality) when they boot is probably
 something Heat can do.
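Point 1's replication agent (snapshot the primary, then update the secondary from those snapshots) can be sketched as a toy model. It assumes a volume is a list of equally sized blocks and that the agent ships only the blocks that changed between two successive primary snapshots; `Volume`, `changed_blocks` and `replicate_delta` are hypothetical names, not an OpenStack, DRBD or SAN vendor API. On its own this yields crash-consistent copies; application consistency would additionally require quiescing the guest before each primary snapshot.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Volume:
    """Toy block device: a list of equally sized blocks (hypothetical model)."""
    blocks: List[bytes]
    snapshots: Dict[str, List[bytes]] = field(default_factory=dict)

    def snapshot(self, name: str) -> List[bytes]:
        # A snapshot is a point-in-time copy of every block.
        self.snapshots[name] = list(self.blocks)
        return self.snapshots[name]


def changed_blocks(old: List[bytes], new: List[bytes]) -> Dict[int, bytes]:
    """Return {block_index: data} for blocks that differ between two snapshots."""
    return {i: b for i, (a, b) in enumerate(zip(old, new)) if a != b}


def replicate_delta(primary: Volume, secondary: Volume,
                    prev_snap: str, new_snap: str) -> int:
    """Take a new primary snapshot, ship only the blocks changed since
    `prev_snap`, then snapshot the secondary under the same name.

    Assumes the secondary already matches the primary's `prev_snap`.
    Returns the number of blocks transferred.
    """
    old = primary.snapshots[prev_snap]
    new = primary.snapshot(new_snap)
    delta = changed_blocks(old, new)
    for index, data in delta.items():
        secondary.blocks[index] = data
    # Same snapshot name at both sites, so both agree on the point in time.
    secondary.snapshot(new_snap)
    return len(delta)
```

Naming the secondary snapshot after the primary one is a simple way to let the DR middleware ask "what is the newest point both sites share?" without any extra bookkeeping.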





 -bruce



 *From:* Deepak Shetty [mailto:dpkshe...@gmail.com]
 *Sent:* Wednesday, March 19, 2014 11:54 PM

 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] Disaster Recovery for OpenStack - call for
 stakeholder - discussion reminder



 Hi List,

 I was looking at the etherpad and March 19 notes and have a few Qs:

 1) How is the DR middleware (depicted in Ron's YouTube video) different
 from the replication agent (noted in the March 19 etherpad notes)? Are
 they the same? If not, how/why are they different?

 2) Maybe a dumb Q.. but still.. Why do we need to worry about syncing
 metadata differently? If all the storage that is used across OpenStack
 services (in the typical case it might be just 1 backend, say GlusterFS)
 is being replicated during DR, wouldn't the metadata be replicated
 too? Why do we need to be concerned about it as a separate entity?

 thanx,
 deepak



 On Wed, Mar 19, 2014 at 2:11 PM, Ronen Kat ronen...@il.ibm.com wrote:

 For those who are interested we will discuss the disaster recovery
 use-cases and how to proceed toward the Juno summit on March 19 at 17:00
 UTC (invitation below)



 Call-in:
 https://www.teleconference.att.com/servlet/glbAccess?process=1accessCode=6406941accessNumber=1809417783#C2
 Passcode: 6406941

 Etherpad:
 https://etherpad.openstack.org/p/juno-disaster-recovery-call-for-stakeholders
 Wiki: https://wiki.openstack.org/wiki/DisasterRecovery

 Regards,
 __
 Ronen I. Kat, PhD
 Storage Research
 *IBM Research - Haifa*
 Phone: +972.3.7689493
 Email: ronen...@il.ibm.com




 From: Luohao (brian) brian.luo...@huawei.com
 To: OpenStack Development Mailing List (not for usage questions)
 openstack-dev@lists.openstack.org,
 Date: 14/03/2014 03:59 AM
 Subject: Re: [openstack-dev] Disaster Recovery for OpenStack -
 call for stakeholder
 --




 1.  fsfreeze with VSS has been added to qemu upstream; see
 http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg01963.html for
 usage.
 2.  libvirt allows a client to send arbitrary commands to qemu-ga; see
 http://wiki.libvirt.org/page/Qemu_guest_agent
 3.  Linux fsfreeze is not equivalent to Windows fsfreeze+VSS. Linux
 fsfreeze offers filesystem consistency only, while Windows VSS allows
 agents like SQL Server to register their plugins to flush their caches to
 disk when a snapshot occurs.
 4.  My understanding is that XenServer does not support fsfreeze+VSS now,
 because XenServer normally does not use the block backend in qemu.
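Point 2 can be made concrete with the qemu-ga commands `guest-fsfreeze-freeze`, `guest-fsfreeze-thaw` and `guest-fsfreeze-status`. The Python sketch below builds those JSON commands and wraps freeze/thaw so that a thaw is always sent, even if the work done while frozen fails. The `send` transport is an injected callable (an assumption, so the sketch runs without a hypervisor); with libvirt-python it could wrap `libvirt_qemu.qemuAgentCommand(dom, cmd, timeout, 0)`. Per point 3, on Linux this gives filesystem consistency only.

```python
import json
from contextlib import contextmanager
from typing import Any, Callable, Iterator

# A transport takes a qemu-ga JSON command string and returns the JSON reply.
# Assumption: in a libvirt deployment this could wrap
# libvirt_qemu.qemuAgentCommand(domain, cmd, timeout, 0); it is injected here
# so the sketch runs without a hypervisor.
Transport = Callable[[str], str]


def agent_command(name: str) -> str:
    """Serialize an argument-less qemu-ga command, e.g. guest-fsfreeze-status."""
    return json.dumps({"execute": name})


def run(send: Transport, name: str) -> Any:
    """Send a command and return the agent's "return" value, raising on error."""
    reply = json.loads(send(agent_command(name)))
    if "error" in reply:
        raise RuntimeError(f"{name} failed: {reply['error']}")
    return reply["return"]


@contextmanager
def frozen(send: Transport) -> Iterator[int]:
    """Freeze guest filesystems for the body of the `with`, then always thaw."""
    count = run(send, "guest-fsfreeze-freeze")  # number of filesystems frozen
    try:
        yield count
    finally:
        run(send, "guest-fsfreeze-thaw")
```

The storage snapshot would be taken inside the `with frozen(send):` block. Keeping that window short matters, because guest writes block while filesystems are frozen.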

 -Original Message-
 From: Bruce Montague [mailto:bruce_monta...@symantec.com]

 Sent: Thursday, March 13, 2014 10:35 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack

Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder - discussion reminder

2014-03-20 Thread Deepak Shetty
Hi List,
I was looking at the etherpad and March 19 notes and have a few Qs:

1) How is the DR middleware (depicted in Ron's YouTube video) different
from the replication agent (noted in the March 19 etherpad notes)? Are
they the same? If not, how/why are they different?

2) Maybe a dumb Q.. but still.. Why do we need to worry about syncing
metadata differently? If all the storage that is used across OpenStack
services (in the typical case it might be just 1 backend, say GlusterFS)
is being replicated during DR, wouldn't the metadata be replicated
too? Why do we need to be concerned about it as a separate entity?

thanx,
deepak



On Wed, Mar 19, 2014 at 2:11 PM, Ronen Kat ronen...@il.ibm.com wrote:

 For those who are interested we will discuss the disaster recovery
 use-cases and how to proceed toward the Juno summit on March 19 at 17:00
 UTC (invitation below)



 Call-in:
 https://www.teleconference.att.com/servlet/glbAccess?process=1accessCode=6406941accessNumber=1809417783#C2
 Passcode: 6406941

 Etherpad:
 https://etherpad.openstack.org/p/juno-disaster-recovery-call-for-stakeholders
 Wiki: https://wiki.openstack.org/wiki/DisasterRecovery

 Regards,
 __
 Ronen I. Kat, PhD
 Storage Research
 *IBM Research - Haifa*
 Phone: +972.3.7689493
 Email: ronen...@il.ibm.com




 From: Luohao (brian) brian.luo...@huawei.com
 To: OpenStack Development Mailing List (not for usage questions)
 openstack-dev@lists.openstack.org,
 Date: 14/03/2014 03:59 AM
 Subject: Re: [openstack-dev] Disaster Recovery for OpenStack -
 call for stakeholder
 --



 1.  fsfreeze with VSS has been added to qemu upstream; see
 http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg01963.html for
 usage.
 2.  libvirt allows a client to send arbitrary commands to qemu-ga; see
 http://wiki.libvirt.org/page/Qemu_guest_agent
 3.  Linux fsfreeze is not equivalent to Windows fsfreeze+VSS. Linux
 fsfreeze offers filesystem consistency only, while Windows VSS allows
 agents like SQL Server to register their plugins to flush their caches to
 disk when a snapshot occurs.
 4.  My understanding is that XenServer does not support fsfreeze+VSS now,
 because XenServer normally does not use the block backend in qemu.

 -Original Message-
 From: Bruce Montague [mailto:bruce_monta...@symantec.com]

 Sent: Thursday, March 13, 2014 10:35 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for
 stakeholder

 Hi, about OpenStack and VSS. Does anyone have experience with the qemu
 project's implementation of VSS support? They appear to have a within-guest
 agent, qemu-ga, that perhaps can work as a VSS requestor. Does it also work
 with KVM? Does qemu-ga work with libvirt (can VSS quiesce be triggered via
 libvirt)? I think there was an effort for qemu-ga to use fsfreeze as an
 equivalent to VSS on Linux systems; was that done? If so, could an
 OpenStack API provide a generic quiesce request that would then get passed
 to libvirt? (Also, the XenServer VSS support seems different from
 qemu/KVM's; is this true? Can it also be accessed through libvirt?)

 Thanks,

 -bruce
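The generic quiesce request asked about above could take the form of a thin dispatch layer: one API entry point, with per-hypervisor drivers doing the real work (qemu-ga fsfreeze for libvirt/KVM, VSS for Hyper-V, and so on). The sketch is purely hypothetical: `QuiesceDriver`, `register_driver` and `quiesce` are illustrative names, not an existing Nova API, and `RecordingDriver` only records calls instead of talking to a hypervisor.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import Dict, Iterator, List


class QuiesceDriver(ABC):
    """Hypervisor-specific quiesce mechanism (hypothetical interface)."""

    @abstractmethod
    def freeze(self, instance_id: str) -> None: ...

    @abstractmethod
    def thaw(self, instance_id: str) -> None: ...


class RecordingDriver(QuiesceDriver):
    """Stand-in driver that records calls instead of contacting a hypervisor."""

    def __init__(self, mechanism: str) -> None:
        self.mechanism = mechanism  # informational, e.g. "qemu-ga fsfreeze" or "VSS"
        self.calls: List[str] = []

    def freeze(self, instance_id: str) -> None:
        self.calls.append(f"freeze:{instance_id}")

    def thaw(self, instance_id: str) -> None:
        self.calls.append(f"thaw:{instance_id}")


_DRIVERS: Dict[str, QuiesceDriver] = {}


def register_driver(hypervisor_type: str, driver: QuiesceDriver) -> None:
    """Associate a hypervisor type with the driver that can quiesce it."""
    _DRIVERS[hypervisor_type] = driver


@contextmanager
def quiesce(hypervisor_type: str, instance_id: str) -> Iterator[None]:
    """Generic quiesce request: dispatch to the right driver, always thaw after."""
    driver = _DRIVERS.get(hypervisor_type)
    if driver is None:
        raise NotImplementedError(f"no quiesce support for {hypervisor_type!r}")
    driver.freeze(instance_id)
    try:
        yield
    finally:
        driver.thaw(instance_id)
```

Failing loudly for hypervisors without a driver (rather than silently taking a crash-consistent snapshot) lets callers decide whether a non-quiesced snapshot is acceptable.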

 -Original Message-
 From: Alessandro Pilotti [mailto:apilo...@cloudbasesolutions.com]
 Sent: Thursday, March 13, 2014 6:49 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for
 stakeholder

 Those use cases are very important in enterprise scenarios, but
 there's an important missing piece in the current OpenStack APIs:
 support for application-consistent backups via Volume Shadow Copy (or other
 solutions) at the instance level, including differential / incremental
 backups.

 VSS can be seamlessly added to the Nova Hyper-V driver (it's included with
 the free Hyper-V Server), with e.g. vSphere and XenServer supporting it as
 well (quiescing), and with the option for third-party vendors to add drivers
 for their solutions.

 A generic Nova backup / restore API supporting those features is quite
 straightforward to design. The main question at this stage is if the
 OpenStack community wants to support those use cases or not. Cinder
 backup/restore support [1] and volume replication [2] are surely a great
 starting point in this direction.

 Alessandro

 [1] https://review.openstack.org/#/c/69351/
 [2] https://review.openstack.org/#/c/64026/
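The differential / incremental backups mentioned above can be modeled as a chain: one full backup, then incrementals that store only the blocks changed since the previous backup, with restore replaying the chain in order. A minimal Python sketch under that assumption; `BackupChain` is a hypothetical name, volumes are lists of equally sized blocks, and this is not the Cinder backup API referenced in [1].

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BackupChain:
    """One full backup followed by ordered incremental backups.

    Hypothetical model: each incremental stores only {block_index: data} for
    blocks that changed since the previous backup in the chain.
    """
    full: List[bytes]
    incrementals: List[Dict[int, bytes]] = field(default_factory=list)

    def restore(self, applied: Optional[int] = None) -> List[bytes]:
        """Rebuild the volume from the full backup plus the first `applied`
        incrementals (all of them when `applied` is None)."""
        chain = self.incrementals if applied is None else self.incrementals[:applied]
        blocks = list(self.full)  # never mutate the stored full backup
        for delta in chain:
            for index, data in delta.items():
                blocks[index] = data
        return blocks

    def add_incremental(self, current: List[bytes]) -> int:
        """Record the blocks of `current` that differ from the latest backup point."""
        previous = self.restore()
        delta = {i: b for i, (a, b) in enumerate(zip(previous, current)) if a != b}
        self.incrementals.append(delta)
        return len(delta)
```

A differential scheme would instead diff every new backup against `full`, so each backup grows over time but a restore needs only the full backup plus one differential.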


  On 12/mar/2014, at 20:45, Bruce Montague bruce_monta...@symantec.com
 wrote:
 
 
  Hi, regarding the call to create a list of disaster recovery (DR) use
 cases (
 http://lists.openstack.org/pipermail/openstack-dev/2014-March/028859.html), 
 the following list sketches some speculative OpenStack DR use cases.
 These use cases do not reflect

[openstack-dev] Disaster Recovery for OpenStack - call for stakeholder - discussion reminder

2014-03-19 Thread Ronen Kat
For those who are interested we will discuss the disaster recovery 
use-cases and how to proceed toward the Juno summit on March 19 at 17:00 
UTC (invitation below)



Call-in: 
https://www.teleconference.att.com/servlet/glbAccess?process=1accessCode=6406941accessNumber=1809417783#C2
 

Passcode: 6406941

Etherpad: 
https://etherpad.openstack.org/p/juno-disaster-recovery-call-for-stakeholders
Wiki: https://wiki.openstack.org/wiki/DisasterRecovery

Regards,
__
Ronen I. Kat, PhD
Storage Research
IBM Research - Haifa
Phone: +972.3.7689493
Email: ronen...@il.ibm.com




From:   Luohao (brian) brian.luo...@huawei.com
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org, 
Date:   14/03/2014 03:59 AM
Subject:Re: [openstack-dev] Disaster Recovery for OpenStack - call 
for stakeholder



1.  fsfreeze with VSS has been added to qemu upstream; see 
http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg01963.html for 
usage.
2.  libvirt allows a client to send arbitrary commands to qemu-ga; see 
http://wiki.libvirt.org/page/Qemu_guest_agent
3.  Linux fsfreeze is not equivalent to Windows fsfreeze+VSS. Linux 
fsfreeze offers filesystem consistency only, while Windows VSS allows 
agents like SQL Server to register their plugins to flush their caches to 
disk when a snapshot occurs.
4.  My understanding is that XenServer does not support fsfreeze+VSS now, 
because XenServer normally does not use the block backend in qemu.

-Original Message-
From: Bruce Montague [mailto:bruce_monta...@symantec.com] 
Sent: Thursday, March 13, 2014 10:35 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for 
stakeholder

Hi, about OpenStack and VSS. Does anyone have experience with the qemu 
project's implementation of VSS support? They appear to have a 
within-guest agent, qemu-ga, that perhaps can work as a VSS requestor. 
Does it also work with KVM? Does qemu-ga work with libvirt (can VSS 
quiesce be triggered via libvirt)? I think there was an effort for qemu-ga 
to use fsfreeze as an equivalent to VSS on Linux systems; was that done? 
If so, could an OpenStack API provide a generic quiesce request that would 
then get passed to libvirt? (Also, the XenServer VSS support seems 
different from qemu/KVM's; is this true? Can it also be accessed through 
libvirt?)

Thanks,

-bruce

-Original Message-
From: Alessandro Pilotti [mailto:apilo...@cloudbasesolutions.com]
Sent: Thursday, March 13, 2014 6:49 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for 
stakeholder

Those use cases are very important in enterprise scenarios, 
but there's an important missing piece in the current OpenStack APIs: 
support for application-consistent backups via Volume Shadow Copy (or 
other solutions) at the instance level, including differential / 
incremental backups.

VSS can be seamlessly added to the Nova Hyper-V driver (it's included with 
the free Hyper-V Server), with e.g. vSphere and XenServer supporting it as 
well (quiescing), and with the option for third-party vendors to add drivers 
for their solutions.

A generic Nova backup / restore API supporting those features is quite 
straightforward to design. The main question at this stage is if the 
OpenStack community wants to support those use cases or not. Cinder 
backup/restore support [1] and volume replication [2] are surely a great 
starting point in this direction.

Alessandro

[1] https://review.openstack.org/#/c/69351/
[2] https://review.openstack.org/#/c/64026/


 On 12/mar/2014, at 20:45, Bruce Montague bruce_monta...@symantec.com 
wrote:


 Hi, regarding the call to create a list of disaster recovery (DR) use 
cases (http://lists.openstack.org/pipermail/openstack-dev/2014-March/028859.html), 
the following list sketches some speculative OpenStack DR use cases. 
These use cases do not reflect any specific product behavior and span a 
wide spectrum. This list is not a proposal; it is intended primarily to 
solicit additional discussion. The first basic use case, (1), is described 
in a bit more detail than the others; many of the others are elaborations 
on this basic theme.



 * (1) [Single VM]

 A single Windows VM with 4 volumes and VSS (Microsoft's Volume 
Shadow Copy Service) installed runs a key application and integral 
database. VSS can quiesce the app, database, filesystem, and I/O on demand 
and can be invoked external to the guest.

   a. The VM's volumes, including the boot volume, are replicated to a 
remote DR site (another OpenStack deployment).

   b. Some form of replicated VM or VM metadata exists at the remote 
site. This VM/description includes the replicated volumes. Some systems 
might use cold migration or some form of wide-area live VM migration to 
establish this remote site VM/description.

   c. When specified

Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder

2014-03-13 Thread Zhangleiqiang (Trump)
For the (1) [Single VM] use case, the following could be added as supplements:

1. Protection Group: Define the set of instances to be protected.
2. Protection Policy: Define the policy for a protection group, such as sync 
period, sync priority, advanced features, etc.
3. Recovery Plan: Define the steps taken during recovery, such as the 
power-off and boot order of instances, etc.
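The three concepts above map naturally onto simple data structures. A Python sketch, where every field name (`sync_period_s`, `priority`, `features`, `power_off_order`, `boot_order`) is an assumption chosen to mirror the examples in the list, not an agreed schema; `validate` shows the kind of check a DR service might run before accepting a recovery plan.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProtectionGroup:
    """The set of instances protected together."""
    name: str
    instance_ids: List[str]


@dataclass
class ProtectionPolicy:
    """How a protection group is replicated (hypothetical fields)."""
    group: ProtectionGroup
    sync_period_s: int                                  # sync period in seconds
    priority: int = 0                                   # higher syncs first
    features: List[str] = field(default_factory=list)   # e.g. "app-consistent"


@dataclass
class RecoveryPlan:
    """Ordered steps executed at the secondary site during recovery."""
    group: ProtectionGroup
    power_off_order: List[str]
    boot_order: List[str]

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the plan looks sane."""
        problems = []
        expected = set(self.group.instance_ids)
        # Every protected instance must be booted exactly once.
        if sorted(self.boot_order) != sorted(expected):
            problems.append("boot_order must list each protected instance exactly once")
        unknown = set(self.power_off_order) - expected
        if unknown:
            problems.append(f"power_off_order has unknown instances: {sorted(unknown)}")
        return problems
```

Keeping the policy and the plan as separate objects that both reference the group lets one group carry different sync policies (say, for a test environment) without duplicating its recovery steps.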

--
zhangleiqiang (Ray)

Best Regards


 -Original Message-
 From: Bruce Montague [mailto:bruce_monta...@symantec.com]
 Sent: Thursday, March 13, 2014 2:38 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for
 stakeholder
 
 
 Hi, regarding the call to create a list of disaster recovery (DR) use cases
 (http://lists.openstack.org/pipermail/openstack-dev/2014-March/028859.html),
 the following list sketches some speculative OpenStack DR use cases. These
 use cases do not reflect any specific product behavior and span a wide
 spectrum. This list is not a proposal; it is intended primarily to solicit
 additional discussion. The first basic use case, (1), is described in a bit
 more detail than the others; many of the others are elaborations on this
 basic theme.
 
 
 
 * (1) [Single VM]
 
 A single Windows VM with 4 volumes and VSS (Microsoft's Volume Shadow Copy
 Service) installed runs a key application and integral database. VSS can
 quiesce the app, database, filesystem, and I/O on demand and can be invoked
 external to the guest.
 
a. The VM's volumes, including the boot volume, are replicated to a remote
 DR site (another OpenStack deployment).
 
b. Some form of replicated VM or VM metadata exists at the remote site.
 This VM/description includes the replicated volumes. Some systems might use
 cold migration or some form of wide-area live VM migration to establish this
 remote site VM/description.
 
c. When specified by an SLA or policy, VSS is invoked, putting the VM's
 volumes in an application-consistent state. This state is flushed all the way
 through to the remote volumes. As each remote volume reaches its
 application-consistent state, this is recognized in some fashion, perhaps by
 an in-band signal, and a snapshot of the volume is made at the remote site.
 Volume replication is re-enabled immediately following the snapshot. A backup
 is then made of the snapshot on the remote site. At the completion of this
 cycle, application-consistent volume snapshots and backups exist on the
 remote site.
 
d.  When a disaster or firedrill happens, the replication network
 connection is cut. The remote site VM, pre-created or defined so as to use
 the replicated volumes, is then booted using the latest application-consistent
 state of the replicated volumes. The entire VM environment (management
 accounts, networking, external firewalling, console access, etc.), similar
 to that of the primary, either needs to pre-exist in some fashion on the
 secondary or be created dynamically by the DR system. The booting VM either
 needs to attach to a virtual network environment similar to that at the
 primary site, or needs to have boot code that can alter its network
 personality. Networking configuration may occur in conjunction with an
 update to DNS and other networking infrastructure. It is necessary for all
 required networking configuration to be pre-specified or done
 automatically. No manual admin activity should be required. Environment
 requirements may be stored in a DR configuration or database associated
 with the replication.
 
e. In a firedrill or test, the virtual network environment at the remote site
 may be a test bubble isolated from the real network, with some provision for
 protected access (such as NAT). Automatic testing is necessary to verify that
 replication succeeded. These tests need to be configurable by the end-user and
 admin and integrated with DR orchestration.
 
f. After the VM has booted and been operational, the network connection
 between the two sites is re-established. A replication connection between the
 replicated volumes is re-established, and the replicated volumes are re-synced,
 with the roles of primary and secondary reversed. (Ongoing replication in this
 configuration may occur, driven from the new primary.)
 
g. A planned failback of the VM to the old primary proceeds similarly to the
 failover from the old primary to the old replica, but with roles reversed and
 with the process minimizing offline time and data loss.
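Step c hinges on recognizing, at the remote site, when each replicated volume has reached the application-consistent point. One way to do that, assumed here and suggested by the "in-band signal" wording, is to inject a marker record into every volume's ordered replication stream while the guest is quiesced. The remote side snapshots a volume when it applies the marker, and once every volume of the VM has passed the same marker, the set of snapshots forms one consistent point that can be backed up. All names are hypothetical, and `quiesce_log` merely stands in for invoking VSS.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Set, Tuple, Union

Write = Tuple[int, bytes]   # (block index, data)
Marker = str                # in-band consistency marker carrying the cycle id
Record = Union[Write, Marker]


@dataclass
class RemoteVolume:
    blocks: List[bytes]
    snapshots: Dict[str, List[bytes]] = field(default_factory=dict)


@dataclass
class RemoteSite:
    volumes: Dict[str, RemoteVolume]
    backups: List[str] = field(default_factory=list)
    seen: Dict[str, Set[str]] = field(default_factory=dict)  # cycle -> volume ids

    def apply(self, volume_id: str, stream: Deque[Record]) -> None:
        """Apply replicated records in order; snapshot when a marker arrives."""
        vol = self.volumes[volume_id]
        while stream:
            record = stream.popleft()
            if isinstance(record, str):
                vol.snapshots[record] = list(vol.blocks)
                done = self.seen.setdefault(record, set())
                done.add(volume_id)
                # Once every volume has reached this marker, the snapshots form
                # one application-consistent point, so back it up.
                if done == set(self.volumes):
                    self.backups.append(record)
            else:
                index, data = record
                vol.blocks[index] = data


def consistency_cycle(cycle_id: str, streams: Dict[str, Deque[Record]],
                      quiesce_log: List[str]) -> None:
    """Primary side: quiesce, inject a marker into every stream, resume."""
    quiesce_log.append(f"quiesce:{cycle_id}")  # stand-in for invoking VSS
    for stream in streams.values():
        stream.append(cycle_id)
    quiesce_log.append(f"resume:{cycle_id}")
```

Because the marker travels in the same ordered stream as the data, writes issued after the guest resumes can never land in the snapshot taken for that cycle, even if the volumes replicate at different speeds.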
 
 
 
 * (2) [Core tenant/project infrastructure VMs]
 
 Twenty VMs power the core infrastructure of a group using a private cloud
 (OpenStack in their own datacenter). Not all VMs run Windows with VSS; some
 run Linux with some equivalent mechanism, such as qemu-ga, driving fsfreeze
 and signal scripts. These VMs are replicated to a remote OpenStack
 deployment, in a fashion similar to (1). Orchestration occurring at the remote
 site on failover is more

Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder

2014-03-13 Thread Alessandro Pilotti
Those use cases are very important in enterprise scenarios, but 
there's an important missing piece in the current OpenStack APIs: support for 
application-consistent backups via Volume Shadow Copy (or other solutions) at 
the instance level, including differential / incremental backups.

VSS can be seamlessly added to the Nova Hyper-V driver (it's included with the 
free Hyper-V Server), with e.g. vSphere and XenServer supporting it as well 
(quiescing), and with the option for third-party vendors to add drivers for their 
solutions.

A generic Nova backup / restore API supporting those features is quite 
straightforward to design. The main question at this stage is if the OpenStack 
community wants to support those use cases or not. Cinder backup/restore 
support [1] and volume replication [2] are surely a great starting point in 
this direction.

Alessandro

[1] https://review.openstack.org/#/c/69351/
[2] https://review.openstack.org/#/c/64026/


 On 12/mar/2014, at 20:45, Bruce Montague bruce_monta...@symantec.com 
 wrote:
 
 
 Hi, regarding the call to create a list of disaster recovery (DR) use cases
 (http://lists.openstack.org/pipermail/openstack-dev/2014-March/028859.html),
 the following list sketches some speculative OpenStack DR use cases. These
 use cases do not reflect any specific product behavior and span a wide
 spectrum. This list is not a proposal; it is intended primarily to solicit
 additional discussion. The first basic use case, (1), is described in a bit 
 more detail than the others; many of the others are elaborations on this 
 basic theme. 
 
 
 
 * (1) [Single VM]
 
 A single Windows VM with 4 volumes and VSS (Microsoft's Volume Shadow Copy 
 Service) installed runs a key application and integral database. VSS can 
 quiesce the app, database, filesystem, and I/O on demand and can be invoked 
 external to the guest.
 
   a. The VM's volumes, including the boot volume, are replicated to a remote 
 DR site (another OpenStack deployment).
 
   b. Some form of replicated VM or VM metadata exists at the remote site. 
 This VM/description includes the replicated volumes. Some systems might use 
 cold migration or some form of wide-area live VM migration to establish this 
 remote site VM/description.
 
   c. When specified by an SLA or policy, VSS is invoked, putting the VM's 
 volumes in an application-consistent state. This state is flushed all the way 
 through to the remote volumes. As each remote volume reaches its 
 application-consistent state, this is recognized in some fashion, perhaps by 
 an in-band signal, and a snapshot of the volume is made at the remote site. 
 Volume replication is re-enabled immediately following the snapshot. A backup 
 is then made of the snapshot on the remote site. At the completion of this 
 cycle, application-consistent volume snapshots and backups exist on the 
 remote site.
 
   d.  When a disaster or firedrill happens, the replication network 
 connection is cut. The remote site VM, pre-created or defined so as to use 
 the replicated volumes, is then booted using the latest application-consistent 
 state of the replicated volumes. The entire VM environment (management 
 accounts, networking, external firewalling, console access, etc.), similar 
 to that of the primary, either needs to pre-exist in some fashion on the 
 secondary or be created dynamically by the DR system. The booting VM either 
 needs to attach to a virtual network environment similar to that at the 
 primary site, or needs to have boot code that can alter its network 
 personality. Networking configuration may occur in conjunction with an update 
 to DNS and other networking infrastructure. It is necessary for all required 
 networking configuration to be pre-specified or done automatically. No 
 manual admin activity should be required. Environment requirements may be 
 stored in a DR configuration or database associated with the replication. 
 
   e. In a firedrill or test, the virtual network environment at the remote 
 site may be a test bubble isolated from the real network, with some 
 provision for protected access (such as NAT). Automatic testing is necessary 
 to verify that replication succeeded. These tests need to be configurable by 
 the end-user and admin and integrated with DR orchestration.
 
   f. After the VM has booted and been operational, the network connection 
 between the two sites is re-established. A replication connection between the 
 replicated volumes is re-established, and the replicated volumes are re-synced, 
 with the roles of primary and secondary reversed. (Ongoing replication in 
 this configuration may occur, driven from the new primary.)
 
   g. A planned failback of the VM to the old primary proceeds similar to the 
 failover from the old primary to the old replica, but with roles reversed and 
 the process minimizing offline time and data loss.
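Steps d, f and g describe a failover followed by a role reversal and a later failback. A minimal Python sketch of that sequence, assuming each site stores its own network personality (here just an IP address) in DR configuration so that no manual admin activity is needed; `Site`, `DRPair`, `failover` and `resync` are hypothetical names. Real orchestration would also handle the disaster case where the old primary is unreachable; the sketch simply updates both sites' state.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Site:
    name: str
    role: str                      # "primary" or "secondary"
    network: Dict[str, str]        # site-specific personality, e.g. {"ip": ...}
    running: List[str] = field(default_factory=list)


@dataclass
class DRPair:
    """Two sites replicating one protected VM (hypothetical model)."""
    a: Site
    b: Site
    vm: str
    replicating: bool = True
    dns: Dict[str, str] = field(default_factory=dict)
    events: List[str] = field(default_factory=list)

    def _by_role(self, role: str) -> Site:
        return self.a if self.a.role == role else self.b

    def failover(self) -> Site:
        """Cut replication, boot the replica with the secondary's network
        personality, repoint DNS, then swap roles. Returns the new primary."""
        old, new = self._by_role("primary"), self._by_role("secondary")
        self.replicating = False
        if self.vm in old.running:
            old.running.remove(self.vm)   # planned case; after a disaster it is gone
        new.running.append(self.vm)
        self.dns[self.vm] = new.network["ip"]
        old.role, new.role = "secondary", "primary"
        self.events.append(f"failover {old.name}->{new.name}")
        return new

    def resync(self) -> None:
        """Re-establish replication, now driven from the new primary (step f)."""
        self.replicating = True
        self.events.append(f"resync from {self._by_role('primary').name}")
```

Failback (step g) is then just `resync()` followed by another `failover()`, which is the symmetry the use case describes.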
 
 
 
 * (2) [Core tenant/project infrastructure VMs] 
 
 Twenty VMs 

Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder

2014-03-13 Thread Bruce Montague
Hi, about OpenStack and VSS. Does anyone have experience with the qemu 
project's implementation of VSS support? They appear to have a within-guest 
agent, qemu-ga, that perhaps can work as a VSS requestor. Does it also work 
with KVM? Does qemu-ga work with libvirt (can VSS quiesce be triggered via 
libvirt)? I think there was an effort for qemu-ga to use fsfreeze as an 
equivalent to VSS on Linux systems; was that done? If so, could an OpenStack 
API provide a generic quiesce request that would then get passed to libvirt? 
(Also, the XenServer VSS support seems different from qemu/KVM's; is this true? 
Can it also be accessed through libvirt?)

Thanks,

-bruce

-Original Message-
From: Alessandro Pilotti [mailto:apilo...@cloudbasesolutions.com]
Sent: Thursday, March 13, 2014 6:49 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for 
stakeholder

Those use cases are very important in enterprise scenarios, but 
there's an important missing piece in the current OpenStack APIs: support for 
application-consistent backups via Volume Shadow Copy (or other solutions) at 
the instance level, including differential / incremental backups.

VSS can be seamlessly added to the Nova Hyper-V driver (it's included with the 
free Hyper-V Server), with e.g. vSphere and XenServer supporting it as well 
(quiescing), and with the option for third-party vendors to add drivers for their 
solutions.

A generic Nova backup / restore API supporting those features is quite 
straightforward to design. The main question at this stage is if the OpenStack 
community wants to support those use cases or not. Cinder backup/restore 
support [1] and volume replication [2] are surely a great starting point in 
this direction.

Alessandro

[1] https://review.openstack.org/#/c/69351/
[2] https://review.openstack.org/#/c/64026/


 On 12/mar/2014, at 20:45, Bruce Montague bruce_monta...@symantec.com 
 wrote:


 Hi, regarding the call to create a list of disaster recovery (DR) use cases
 (http://lists.openstack.org/pipermail/openstack-dev/2014-March/028859.html),
 the following list sketches some speculative OpenStack DR use cases. These
 use cases do not reflect any specific product behavior and span a wide
 spectrum. This list is not a proposal; it is intended primarily to solicit
 additional discussion. The first basic use case, (1), is described in a bit 
 more detail than the others; many of the others are elaborations on this 
 basic theme.



 * (1) [Single VM]

 A single Windows VM with 4 volumes and VSS (Microsoft's Volume Shadow Copy 
 Service) installed runs a key application and integral database. VSS can 
 quiesce the app, database, filesystem, and I/O on demand and can be invoked 
 external to the guest.

   a. The VM's volumes, including the boot volume, are replicated to a remote 
 DR site (another OpenStack deployment).

   b. Some form of replicated VM or VM metadata exists at the remote site. 
 This VM/description includes the replicated volumes. Some systems might use 
 cold migration or some form of wide-area live VM migration to establish this 
 remote site VM/description.

   c. When specified by an SLA or policy, VSS is invoked, putting the VM's 
 volumes in an application-consistent state. This state is flushed all the way 
 through to the remote volumes. As each remote volume reaches its 
 application-consistent state, this is recognized in some fashion, perhaps by 
 an in-band signal, and a snapshot of the volume is made at the remote site. 
 Volume replication is re-enabled immediately following the snapshot. A backup 
 is then made of the snapshot on the remote site. At the completion of this 
 cycle, application-consistent volume snapshots and backups exist on the 
 remote site.

   d.  When a disaster or firedrill happens, the replication network
 connection is cut. The remote site VM, pre-created or defined so as to use 
 the replicated volumes, is then booted using the latest application-consistent 
 state of the replicated volumes. The entire VM environment (management 
 accounts, networking, external firewalling, console access, etc.), similar 
 to that of the primary, either needs to pre-exist in some fashion on the 
 secondary or be created dynamically by the DR system. The booting VM either 
 needs to attach to a virtual network environment similar to that at the 
 primary site, or needs to have boot code that can alter its network 
 personality. Networking configuration may occur in conjunction with an update 
 to DNS and other networking infrastructure. It is necessary for all required 
 networking configuration to be pre-specified or done automatically. No 
 manual admin activity should be required. Environment requirements may be 
 stored in a DR configuration or database associated with the replication.

   e. In a firedrill or test, the virtual network environment at the remote 
 site may be a test bubble isolated

Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder

2014-03-13 Thread Michael Factor
Bruce,

Nice list of use cases; thank you for sharing. One thought:

Bruce Montague bruce_monta...@symantec.com wrote on 13/03/2014 04:34:59 
PM:


  * (2) [Core tenant/project infrastructure VMs]
 
  Twenty VMs power the core infrastructure of a group using a 
 private cloud (OpenStack in their own datacenter). Not all VMs run 
 Windows with VSS; some run Linux with some equivalent mechanism, 
 such as qemu-ga, driving fsfreeze and signal scripts. These VMs are 
 replicated to a remote OpenStack deployment, in a fashion similar to
 (1). Orchestration occurring at the remote site on failover is more 
 complex (correct VM boot order is orchestrated, DHCP service is 
 configured as expected, all IPs are made available and verified). An
 equivalent virtual network topology consisting of multiple networks 
 or subnets might be pre-created or dynamically created at failover time.
 
a. Storage for all volumes of all VMs might be on a single 
 storage backend (logically a single large volume containing many 
 smaller sub-volumes, examples being a VMware datastore or Hyper-V 
 CSV). This entire large volume might be replicated between similar 
 storage backends at the primary and secondary site. A single 
 replicated large volume thus replicates all the tenant VM's volumes.
 The DR system must trigger quiesce of all volumes to application-
 consistent state.
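Use case (2) calls for the correct VM boot order to be orchestrated at failover. If each VM declares which VMs it depends on (an assumption; the use case does not say how order is specified), Python 3.9+'s `graphlib.TopologicalSorter` can group the VMs into boot "waves", where everything in a wave can start in parallel once the previous waves are up. The VM names in the usage below are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def boot_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group VMs into boot waves: every VM boots after all VMs it depends on.

    VMs within one wave do not depend on each other and could be booted in
    parallel. Raises graphlib.CycleError (a ValueError) on circular dependencies.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        # A real orchestrator would boot these VMs here and wait for health
        # checks (DHCP answering, IPs reachable) before marking them done.
        sorter.done(*ready)
    return waves
```

For example, `boot_waves({"dns": set(), "dhcp": set(), "db": {"dhcp"}, "app": {"db", "dns"}, "web": {"app"}})` gives `[["dhcp", "dns"], ["db"], ["app"], ["web"]]`. Detecting cycles up front is useful, since a circular dependency would otherwise only surface as a failover that never finishes.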

A variant of having logically a single volume on a single storage backend 
is having all the volumes allocated from storage that provides consistency 
groups.  This may also be related to cross-VM consistent 
backups/snapshots.  Of course a question would be whether, and if so, how 
to surface this.
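To make the consistency-group idea concrete, here is a minimal Python sketch, assuming a hypothetical GuestVM object whose quiesce/unquiesce stand in for VSS or qemu-ga fsfreeze, and a hypothetical snapshot_volumes backend hook that snapshots a set of volumes atomically. Neither is an existing Nova or Cinder API. What it shows is the ordering: every VM is quiesced before the single group snapshot, and only the VMs that were actually quiesced are resumed, even on failure.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GuestVM:
    """Hypothetical stand-in for a tenant VM (not a Nova object)."""
    name: str
    volumes: List[str]
    frozen: bool = False
    log: List[str] = field(default_factory=list)

    def quiesce(self) -> None:
        # A real system might invoke VSS (Windows) or qemu-ga fsfreeze (Linux).
        self.frozen = True
        self.log.append("quiesce")

    def unquiesce(self) -> None:
        self.frozen = False
        self.log.append("unquiesce")


def group_snapshot(vms: List[GuestVM],
                   snapshot_volumes: Callable[[List[str]], str]) -> str:
    """Quiesce all VMs, snapshot all their volumes as one group, then resume.

    snapshot_volumes is a hypothetical backend hook that snapshots the given
    volumes atomically (e.g. via a storage consistency group) and returns an id.
    """
    quiesced: List[GuestVM] = []
    try:
        for vm in vms:
            vm.quiesce()
            quiesced.append(vm)
        # All VMs are now quiesced at the same time, so one group snapshot
        # captures a cross-VM consistent point.
        all_volumes = [vol for vm in vms for vol in vm.volumes]
        return snapshot_volumes(all_volumes)
    finally:
        # Resume only the VMs that were actually quiesced, in reverse order,
        # whether or not the snapshot succeeded.
        for vm in reversed(quiesced):
            vm.unquiesce()
```

Whether the atomic group snapshot comes from one large replicated volume or from backend consistency groups is hidden behind snapshot_volumes, which is roughly the "how to surface this" question.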

-- Michael

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder

2014-03-13 Thread Fox, Kevin M
Funny this topic came up. I was just looking into some of this yesterday. 
Here are some links that I came up with:

* https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Administration_Guide/sub-sect-qemu-ga-freeze-thaw.html
 - Describes how application-level safe backups of VMs can be accomplished. 
The proper framework wasn't there prior to Red Hat 6.5. Looks reasonable now.

* http://lists.gnu.org/archive/html/qemu-devel/2012-11/msg01043.html - An 
example of a hook that lets you snapshot MySQL safely while it is still running.

* https://wiki.openstack.org/wiki/Cinder/QuiescedSnapshotWithQemuGuestAgent - A 
blueprint for making safe live snapshots enabled via the Cinder API. It's not 
there yet, but it is being worked on.

* https://blueprints.launchpad.net/nova/+spec/qemu-guest-agent-support - Nova 
supports freezing/thawing the instance.
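The freeze/thaw flow in those links can be sketched roughly as follows. The qemu guest agent accepts JSON commands such as guest-fsfreeze-freeze and guest-fsfreeze-thaw; how the JSON actually reaches the agent (for example via `virsh qemu-agent-command`) is left as a pluggable transport, which is an assumption of this sketch and not a specific OpenStack interface. The property that matters is that thaw always runs, even if the snapshot fails, so the guest is never left frozen.

```python
import json
from typing import Callable, TypeVar

T = TypeVar("T")

# A transport takes a JSON command string and returns the agent's JSON reply.
# How it reaches qemu-ga (virsh, libvirt bindings, a socket) is assumed here.
Transport = Callable[[str], str]


def agent_call(send: Transport, command: str) -> object:
    """Send one qemu-ga command and return its "return" value or raise."""
    reply = json.loads(send(json.dumps({"execute": command})))
    if "error" in reply:
        raise RuntimeError(f"{command} failed: {reply['error']}")
    return reply["return"]


def with_frozen_filesystems(send: Transport, take_snapshot: Callable[[], T]) -> T:
    """Freeze guest filesystems, run take_snapshot, and always thaw afterwards."""
    agent_call(send, "guest-fsfreeze-freeze")
    try:
        return take_snapshot()
    finally:
        agent_call(send, "guest-fsfreeze-thaw")
```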

Thanks,
Kevin

From: Bruce Montague [bruce_monta...@symantec.com]
Sent: Thursday, March 13, 2014 7:34 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for 
stakeholder

Hi, about OpenStack and VSS. Does anyone have experience with the qemu 
project's implementation of VSS support? They appear to have a within-guest 
agent, qemu-ga, that perhaps can work as a VSS requestor. Does it also work 
with KVM? Does qemu-ga work with libvirt (can VSS quiesce be triggered via 
libvirt)? I think there was an effort for qemu-ga to use fsfreeze as an 
equivalent to VSS on Linux systems, was that done?  If so, could an OpenStack 
API provide a generic quiesce request that would then get passed to libvirt? 
(Also, the XenServer VSS support seems different from qemu/KVM's; is this true? 
Can it also be accessed through libvirt?)
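One way a generic quiesce request could be structured, sketched under stated assumptions: QuiesceDriver, the driver classes, DRIVERS, and generic_quiesce are all hypothetical names, and the driver bodies only return a description of what a real driver might do. The caller issues a single request; the hypervisor type selects the mechanism (qemu-ga via libvirt, VSS on Hyper-V, and so on).

```python
from abc import ABC, abstractmethod
from typing import Dict


class QuiesceDriver(ABC):
    """Hypothetical per-hypervisor quiesce hook; not an existing Nova interface."""

    @abstractmethod
    def quiesce(self, instance_id: str) -> str: ...

    @abstractmethod
    def unquiesce(self, instance_id: str) -> str: ...


class LibvirtAgentDriver(QuiesceDriver):
    # Would forward to qemu-ga inside the guest through libvirt.
    def quiesce(self, instance_id: str) -> str:
        return f"libvirt: guest-fsfreeze-freeze on {instance_id}"

    def unquiesce(self, instance_id: str) -> str:
        return f"libvirt: guest-fsfreeze-thaw on {instance_id}"


class HyperVVssDriver(QuiesceDriver):
    # Would ask the Hyper-V host to drive VSS inside the guest.
    def quiesce(self, instance_id: str) -> str:
        return f"hyperv: VSS quiesce on {instance_id}"

    def unquiesce(self, instance_id: str) -> str:
        return f"hyperv: VSS resume on {instance_id}"


# Hypothetical registry keyed by hypervisor type.
DRIVERS: Dict[str, QuiesceDriver] = {
    "qemu": LibvirtAgentDriver(),
    "kvm": LibvirtAgentDriver(),
    "hyperv": HyperVVssDriver(),
}


def generic_quiesce(hypervisor_type: str, instance_id: str) -> str:
    """One API entry point; the driver decides how quiescing is done."""
    driver = DRIVERS.get(hypervisor_type)
    if driver is None:
        raise NotImplementedError(f"no quiesce support for {hypervisor_type!r}")
    return driver.quiesce(instance_id)
```

Hypervisors without a registered driver fail loudly rather than silently taking a crash-consistent snapshot.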

Thanks,

-bruce

-Original Message-
From: Alessandro Pilotti [mailto:apilo...@cloudbasesolutions.com]
Sent: Thursday, March 13, 2014 6:49 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for 
stakeholder

Those use cases are very important requirements in enterprise scenarios, but 
there's an important missing piece in the current OpenStack APIs: support for 
application-consistent backups via Volume Shadow Copy (or other solutions) at 
the instance level, including differential / incremental backups.

VSS can be seamlessly added to the Nova Hyper-V driver (it's included with the 
free Hyper-V Server); vSphere and XenServer, for example, support it as well 
(quiescing), and third-party vendors have the option to add drivers for their 
solutions.

A generic Nova backup / restore API supporting those features is quite 
straightforward to design. The main question at this stage is whether the OpenStack 
community wants to support those use cases or not. Cinder backup/restore 
support [1] and volume replication [2] are surely a great starting point in 
this direction.
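Differential and incremental backups both come down to following parent links back to a full backup at restore time. A minimal sketch, assuming a hypothetical Backup record with a parent_id (None for a full backup); it is not taken from any existing Nova or Cinder code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Backup:
    backup_id: str
    parent_id: Optional[str]  # None marks a full backup


def restore_chain(backups: Dict[str, Backup], target_id: str) -> List[str]:
    """Return the backups to apply, oldest (the full) first, to restore target_id."""
    chain: List[str] = []
    seen: Set[str] = set()
    current: Optional[str] = target_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in backup chain at {current!r}")
        if current not in backups:
            raise KeyError(f"missing backup {current!r}; chain is broken")
        seen.add(current)
        chain.append(current)
        current = backups[current].parent_id
    chain.reverse()
    return chain
```

A differential backup simply has the full backup as its parent, while an incremental one points at the previous backup, so the same walk covers both.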

Alessandro

[1] https://review.openstack.org/#/c/69351/
[2] https://review.openstack.org/#/c/64026/


 On 12/mar/2014, at 20:45, Bruce Montague bruce_monta...@symantec.com 
 wrote:


 Hi, regarding the call to create a list of disaster recovery (DR) use cases ( 
 http://lists.openstack.org/pipermail/openstack-dev/2014-March/028859.html ), 
 the following list sketches some speculative OpenStack DR use cases. These 
 use cases do not reflect any specific product behavior and span a wide 
 spectrum. This list is not a proposal; it is intended primarily to solicit 
 additional discussion. The first basic use case, (1), is described in a bit 
 more detail than the others; many of the others are elaborations on this 
 basic theme.



 * (1) [Single VM]

 A single Windows VM with 4 volumes and VSS (Microsoft's Volume Shadow Copy 
 Service) installed runs a key application and integral database. VSS can 
 quiesce the app, database, filesystem, and I/O on demand and can be invoked 
 external to the guest.

   a. The VM's volumes, including the boot volume, are replicated to a remote 
 DR site (another OpenStack deployment).

   b. Some form of replicated VM or VM metadata exists at the remote site. 
 This VM/description includes the replicated volumes. Some systems might use 
 cold migration or some form of wide-area live VM migration to establish this 
 remote site VM/description.

   c. When specified by an SLA or policy, VSS is invoked, putting the VM's 
 volumes in an application-consistent state. This state is flushed all the way 
 through to the remote volumes. As each remote volume reaches its 
 application-consistent state, this is recognized in some fashion, perhaps by 
 an in-band signal, and a snapshot of the volume is made at the remote site. 
 Volume replication is re-enabled immediately following the snapshot. A backup 
 is then made of the snapshot on the remote site. At the completion of this 
 cycle, application-consistent volume snapshots and backups exist on the 
 remote site.
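The replicate/mark/snapshot cycle in (c) can be sketched on the remote side as below, assuming a hypothetical in-band ConsistencyMarker that the primary sends once VSS has flushed the volumes. Write, ConsistencyMarker, and RemoteReplica are illustrative names, not a real replication protocol. Writes keep flowing after each marker, but only the copies taken at markers are treated as recovery points, which is also what a failover would boot from.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Write:
    block: int
    data: bytes


@dataclass
class ConsistencyMarker:
    # Hypothetical in-band signal sent after the primary reaches an
    # application-consistent state.
    label: str


Event = Union[Write, ConsistencyMarker]


@dataclass
class RemoteReplica:
    blocks: Dict[int, bytes] = field(default_factory=dict)
    snapshots: List[Tuple[str, Dict[int, bytes]]] = field(default_factory=list)

    def apply(self, stream: List[Event]) -> None:
        for event in stream:
            if isinstance(event, Write):
                self.blocks[event.block] = event.data
            else:
                # Everything before the marker is application-consistent, so a
                # point-in-time copy of the replica is a usable recovery point.
                self.snapshots.append((event.label, dict(self.blocks)))

    def latest_consistent(self) -> Dict[int, bytes]:
        """State to boot from on failover: the newest marked copy, not the live replica."""
        if not self.snapshots:
            raise LookupError("no application-consistent snapshot yet")
        return self.snapshots[-1][1]
```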

Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder

2014-03-13 Thread Luohao (brian)
1.  fsfreeze with VSS has been added to QEMU upstream; see 
http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg01963.html for usage.
2.  libvirt allows a client to send arbitrary commands to qemu-ga; see 
http://wiki.libvirt.org/page/Qemu_guest_agent
3.  Linux fsfreeze is not equivalent to Windows fsfreeze+VSS. Linux fsfreeze 
offers filesystem consistency only, while Windows VSS allows agents like SQL Server to 
register plugins that flush their caches to disk when a snapshot occurs.
4.  My understanding is that XenServer does not support fsfreeze+VSS now, because 
XenServer normally does not use the block backend in QEMU.
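Point 3 is essentially about ordering: VSS lets applications flush first, while Linux fsfreeze only freezes the filesystem. A minimal sketch of layering application hooks around a filesystem freeze (similar in spirit to the fsfreeze hook scripts qemu-ga can run) follows; Writer and app_consistent_snapshot are hypothetical names, and the callables stand in for real flush, freeze, thaw, and snapshot operations.

```python
from typing import Callable, List, NamedTuple


class Writer(NamedTuple):
    """Hypothetical application hook, loosely analogous to a VSS writer."""
    name: str
    prepare: Callable[[], None]  # flush caches and pause writes
    resume: Callable[[], None]   # resume normal operation


def app_consistent_snapshot(writers: List[Writer],
                            freeze_fs: Callable[[], None],
                            thaw_fs: Callable[[], None],
                            snapshot: Callable[[], str]) -> str:
    """Flush applications, freeze filesystems, snapshot, then undo in reverse."""
    prepared: List[Writer] = []
    try:
        for writer in writers:
            writer.prepare()
            prepared.append(writer)
        freeze_fs()  # filesystem-level consistency only, like Linux fsfreeze
        try:
            return snapshot()
        finally:
            thaw_fs()
    finally:
        # Resume only the applications that were prepared, newest first.
        for writer in reversed(prepared):
            writer.resume()
```

Without the prepare step the snapshot is only filesystem-consistent, which is the gap between plain fsfreeze and VSS described above.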


 On 12/mar/2014, at 20:45, Bruce Montague bruce_monta...@symantec.com 
 wrote:


 Hi, regarding the call to create a list of disaster recovery (DR) use cases ( 
 http://lists.openstack.org/pipermail/openstack-dev/2014-March/028859.html ), 
 the following list sketches some speculative OpenStack DR use cases. These 
 use cases do not reflect any specific product behavior and span a wide 
 spectrum. This list is not a proposal; it is intended primarily to solicit 
 additional discussion. The first basic use case, (1), is described in a bit 
 more detail than the others; many of the others are elaborations on this 
 basic theme.



 * (1) [Single VM]

 A single Windows VM with 4 volumes and VSS (Microsoft's Volume Shadow Copy 
 Service) installed runs a key application and integral database. VSS can 
 quiesce the app, database, filesystem, and I/O on demand and can be invoked 
 external to the guest.

   a. The VM's volumes, including the boot volume, are replicated to a remote 
 DR site (another OpenStack deployment).

   b. Some form of replicated VM or VM metadata exists at the remote site. 
 This VM/description includes the replicated volumes. Some systems might use 
 cold migration or some form of wide-area live VM migration to establish this 
 remote site VM/description.

   c. When specified by an SLA or policy, VSS is invoked, putting the VM's 
 volumes in an application-consistent state. This state is flushed all the way 
 through to the remote volumes. As each remote volume reaches its 
 application-consistent state, this is recognized in some fashion, perhaps by 
 an in-band signal, and a snapshot of the volume is made at the remote site. 
 Volume replication is re-enabled immediately following the snapshot. A backup 
 is then made of the snapshot on the remote site. At the completion of this 
 cycle, application-consistent volume snapshots and backups exist on the 
 remote site.

   d.  When a disaster or firedrill happens, the replication network 
 connection is cut. The remote-site VM, pre-created or defined so as to use the 
 replicated volumes, is then booted using the latest application-consistent 
 state of the replicated volumes. The entire VM

[openstack-dev] Disaster Recovery for OpenStack - call for stakeholder

2014-03-04 Thread Ronen Kat
Hello,

At the Hong Kong summit, there was a lot of interest in OpenStack 
support for disaster recovery, including a design summit session, an 
unconference session, and a breakout session.
In addition, we set up a wiki for OpenStack disaster recovery; see 
https://wiki.openstack.org/wiki/DisasterRecovery 
The first step was enabling volume replication in Cinder, which 
started in the Icehouse development cycle and will continue into Juno.

Toward the Juno summit and development cycle, we would like to send out a 
call for disaster recovery stakeholders, looking to:
* Create a list of use cases and scenarios for disaster recovery with 
OpenStack
* Find interested parties who wish to contribute features and code to 
advance disaster recovery in OpenStack
* Plan the discussions needed at the Juno summit

To coordinate these efforts, I would like to invite you to a conference 
call on Wednesday, March 5 at 12pm ET to work together on coordinating 
actions for the Juno summit (an invitation is attached).
We will record minutes of the call at 
https://etherpad.openstack.org/p/juno-disaster-recovery-call-for-stakeholders 
(link also available from the disaster recovery wiki page).
If you are interested but unable to join, please register yourself and 
share your thoughts.



Call-in numbers are available at 
https://www.teleconference.att.com/servlet/glbAccess?process=1&accessCode=6406941&accessNumber=1809417783#C2

Passcode: 6406941

Regards,
__
Ronen I. Kat, PhD
Storage Research
IBM Research - Haifa
Phone: +972.3.7689493
Email: ronen...@il.ibm.com


invite.ics
Description: Binary data