On Wed, Jun 3, 2009 at 7:09 AM, Drew Weaver <drew.wea...@thenap.com> wrote:
> Hi All,
>
> I'm attempting to devise a method which will provide continuous operation
> of certain resources in the event of a disaster at a single facility.
>
> The types of resources that need to be available in the event of a disaster
> are ecommerce applications and other business critical resources.
>
> Some of the questions I keep running into are:
>
> Should the additional sites be connected to the primary site
> (and/or the Internet directly)?
> What is the best way to handle the routing? Obviously two
> devices cannot occupy the same IP address at the same time, so how do you
> provide that instant 'cut-over'? I could see using application balancers to
> do this but then what if the application balancers fail, etc?
>
> Any advice from folks on list or off who have done similar work is greatly
> appreciated.
>
> Thanks,
> -Drew

In an environment where a DR site is deemed critical, it is my experience that critical business applications also have a test or development environment associated with the production one. Looked at this way, a DR site equipped with the test/devel systems, with one "instance" of production always available, would only be challenging in terms of data sync. Various SAN solutions would resolve that (SAN syncing over WAN/MAN/etc.).

Virtualization of critical systems may also add some benefits here: clone the critical VMs at the DR site and, with the storage available there, you'll be able to bring up these machines in no time. Just make sure you have some sort of L2 connectivity available between the sites, maybe EoS or tunneling over L3 connectivity. There is plenty of information out there if you search for virtual machine mobility and inter-site connectivity.

Voice has to be considered as well. For PSTN, make arrangements with your provider to re-route (8xx) numbers in case of disaster.
VoIP may add some extra capabilities in terms of reachability over the Internet in case your DR site cannot accommodate everyone. C/S people, for example, who are critical for interfacing with customers during a disaster, have to be able to connect even from home (no information means a bigger loss, because of perception issues).

As far as an "immediate" switch from one site to the other goes, DNS is the primary concern (unless some wise people have hardcoded IPs all over), but there are other issues people tend to forget at the core of some clients. Take the Oracle "fat" client and its TNS names: I've seen those associated with IPs instead of host names, etc.

Disclaimer: the above covers only one of many aspects. I have seen DNS comments already, so I won't repeat those.

HTH,
-- 
***Stefan
http://twitter.com/netfortius
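
As a rough illustration of the hardcoded-IP point above, here is a minimal Python sketch that flags HOST entries in tnsnames.ora-style text that are IP literals rather than host names, since those are the ones a DNS-based cut-over would not redirect. The function name, the regular expression, and the sample entries (ORDERS, BILLING, 192.0.2.10, db.example.com) are all made up for illustration, and the pattern only covers the simple (HOST = value) form, so treat it as a starting point rather than a complete parser.

```python
import ipaddress
import re

# Matches the HOST parameter inside an ADDRESS entry, e.g. (HOST = 192.0.2.10).
# Assumption: only the simple "(HOST = value)" form is handled here.
HOST_RE = re.compile(r"\(\s*HOST\s*=\s*([^)\s]+)\s*\)", re.IGNORECASE)


def hardcoded_hosts(tns_text):
    """Return HOST values that are IP literals rather than host names."""
    found = []
    for match in HOST_RE.finditer(tns_text):
        value = match.group(1)
        try:
            ipaddress.ip_address(value)
        except ValueError:
            continue  # not an IP literal, so a DNS change can redirect it
        found.append(value)
    return found


# Hypothetical sample content, not taken from any real configuration.
SAMPLE = """
ORDERS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.0.2.10)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orders))
  )
BILLING =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = billing))
  )
"""

print(hardcoded_hosts(SAMPLE))
```

Running something like this over the client configs before a DR test gives a quick list of entries that would have to be changed by hand during a cut-over.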