DR is such an open-ended topic.

First, what constitutes a disaster to your organization?
What systems can your organization do without?
How much capital and operational overhead are you willing to assume to
maintain a DR system and strategy?
Can you solve DR problems with technology alone, or do you need a plan?
How big should the plan be?

Exchange DR is also a little open-ended, because Exchange 200x is basically
an application service that depends on AD, DNS, IIS, TCP/IP, and hardware.

My suggestions are as follows.

Planning/Strategy

The organization needs to decide how important each system is, and what the
procedures are for restoring it.  If email is important, a cost study should
be made to determine the actual cost of operating without the system, and
which times of the year it is most critical to have it.  I always say you
don't know the true value of something until you don't have it.  That is
when the real costs are discovered.  Many companies go out of business as a
result of disasters and data loss.
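
To make that cost study concrete, here is a back-of-the-envelope sketch in
Python.  Every figure in it is a made-up placeholder; substitute the numbers
from your own study.

    # Back-of-the-envelope downtime cost estimate (all figures assumed).
    users = 5_000
    loaded_hourly_rate = 45.00   # average loaded cost per employee-hour
    productivity_hit = 0.25      # fraction of work blocked without email
    outage_hours = 8

    cost = users * loaded_hourly_rate * productivity_hit * outage_hours
    print(f"Estimated cost of an {outage_hours}-hour outage: ${cost:,.0f}")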

The Plan

1.  Establish a communication hierarchy to communicate the disaster and the
response.
2.  Make a plan that is easy to distribute and follow.
(No more than 4 pages)
        A. Page one should identify what constitutes a disaster, the roles /
responsibilities, the communication process, and the major contact points.
        B. Page two should have a systems map that describes how everything
interrelates.
        C. Page three should have a process map that explains how to
transfer to the backup system.
        D. Page four should have a list of references to assist with
troubleshooting and communicating the switchover.
3.  A command center should be established when a disaster happens.  There
should be briefings every two hours for direct reports.  People should work
in twos.
4.  After the disaster, there should be a post-mortem of the resolution,
documentation should be updated, and commendations given to those who
performed well under pressure.

Implementing

Training is the only way you can prepare for a disaster.  Every effort
should be made to make the transition from one system to the other seamless.
If email is a critical service in your organization, the following things
will affect it: network connectivity, environment, and hardware and software
configuration.  These are the major factors that need to be addressed in a
disaster recovery operation in order to mitigate risk.

If the network is down, the only way to get to the data is to be at the
server that holds it, or to switch over to a network that can reach it.  If
the power is out, or the Internet is down, you will need backup network
access and facilities to gain access to the data.  This might mean investing
in more hardware, and in technology that lets you duplicate the data from
one system to another in near real time.  The current limit I have heard of
for data replication technology is the medium you transfer the data over.
Fiber is the fastest mechanism, and with one repeater it can span about 12
miles at nearly LAN speed.  Multiplexed connections might be able to
increase the distance, but not before the costs get outrageous.
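
For a feel of why distance matters, here is a quick sketch of the one-way
propagation delay over fiber.  It assumes signals travel at roughly
two-thirds the speed of light in glass; queuing and repeater delays come on
top of this.

    # One-way propagation delay over fiber, assuming ~200,000 km/s
    # (about 2/3 of c in glass).
    SPEED_IN_FIBER_KM_S = 200_000

    def one_way_latency_ms(distance_km: float) -> float:
        return distance_km / SPEED_IN_FIBER_KM_S * 1000

    for miles in (12, 60, 300):
        km = miles * 1.609
        print(f"{miles:4d} mi: ~{one_way_latency_ms(km):.2f} ms one way")

At 12 miles the glass itself adds only about a tenth of a millisecond each
way; the practical limits come from the replication protocol and the
equipment, not the fiber.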

Hardware failures are mitigated with the use of redundant hardware; that is
pretty straightforward.  Just call HP and Dell and have them fight it out
over price and features.  I prefer Compaq for server hardware.

Software is the last bastion, and this is where you need to make your
choices wisely.  Generally, in large environments you will need anti-virus,
backup, data retention, and data recovery solutions in order to maintain a
reasonably complete collaborative environment.  You will also need good
operations to manage the various elements of this system and troubleshoot
problems as they occur.  Troubleshooting requires good tactical analysis
tools, specifically reporting, plus a test environment and a staging
environment for testing solutions and upgrades before they reach production
networks.

The Team 

The best way to implement this system design is to develop a team approach.
I just finished reading a book called Team Secrets of the Navy SEALs.  In
it, the SEALs are organized for special operations: two tech/officers in
charge and four enlisted personnel.  Each person is responsible for a
specific function, and for training the others in his group so they
understand how the whole operation fits together.  In the event that there
is a problem or that person is incapacitated, others can fill that role.
SEALs rely on one another, and they are a volunteer organization, so if
anyone can't make it, they are given the opportunity to leave and go work
someplace else in the organization.  Communication and self-management are
also important, and they are expected to be constantly improving themselves.
SEAL teams train with each other for extended periods of time before they
are put on an operation together, and they never leave anyone behind.

The reason I mention this is that in a disaster there is a lot of confusion:
some people will be heroic, some will panic, some will want to assign blame,
and there is the possibility of loss of property.  Discipline, knowing
yourself, and moderation are your only defenses against the randomness of
the event.  Chance favors only the prepared mind; hope is not a strategy.

Exchange DR

The biggest problem you run into with Exchange is viruses.  It is really not
in the best interest of any organization to allow anyone to use free or open
email systems on a corporate network.  Basically, block port 25 inbound and
outbound at the firewall, and only allow specific bastion hosts to handle
incoming and outgoing email.  Subscribing to a service that helps eliminate
spam through content filtering and identifying open relays is also a good
investment.
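
As a minimal sketch of how you might spot-check that policy, the Python
below attempts a TCP connection to port 25 on a handful of hosts and flags
any non-bastion host that answers.  The host names are hypothetical
placeholders for your own environment.

    # Spot-check which hosts accept connections on port 25.
    import socket

    BASTIONS = {"smtp-bastion1.example.com", "smtp-bastion2.example.com"}
    HOSTS = BASTIONS | {"mailbox1.example.com", "mailbox2.example.com"}

    for host in sorted(HOSTS):
        try:
            with socket.create_connection((host, 25), timeout=5):
                listening = True
        except OSError:
            listening = False
        ok = listening == (host in BASTIONS)
        print(f"{host}: {'open' if listening else 'closed'}"
              f"{'' if ok else '  <-- policy violation?'}")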

Protocols

Exchange 200x relies on IIS to run the SMTP and NNTP protocols.  Basically,
Microsoft is turning IIS into an application protocol router.  What you have
to plan for is a front-end/back-end setup of your Exchange organization.
This means there is a component exposed to the Internet for web-based access
to your Exchange servers via port 80 and port 25.  I recommend that you look
into a hardware solution that supports SSL and load balancing over port 25.
There is a solution out there for about $20K for Exchange that does the
following: IDS, anti-virus, content scanning, SSL for POP, IMAP, and HTTP,
and load-balancing redirection.  I just can't remember the vendor.
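
Here is a similar small sketch, again with a hypothetical host name, for
confirming that a front end answers on the ports this design exposes,
including completing an SSL handshake on 443:

    # Confirm a front end answers on 80/25 and completes an SSL handshake.
    import socket, ssl

    FRONT_END = "owa.example.com"   # hypothetical placeholder

    for port in (80, 25):
        try:
            with socket.create_connection((FRONT_END, port), timeout=5):
                print(f"port {port}: open")
        except OSError as exc:
            print(f"port {port}: {exc}")

    ctx = ssl.create_default_context()
    with socket.create_connection((FRONT_END, 443), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname=FRONT_END) as tls:
            print("port 443: SSL OK,", tls.version())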

Storage Hardware

Storage is the next major issue and you need to know the following
information.

1.  How many users do you support?
2.  How big of a mail store do you want to support for your users?
3.  How many users do you want to put on a server?
4.  How much do you want to pay per user for mail storage?

The rule of thumb for our organization is that we support as many users as
needed.  Our organization is 28K users, and we are willing to support all of
them if necessary, and if they are willing to pay.

We currently support 50 MB per user, but are looking to expand to 100 MB per
user and beyond with special technology.  Specifically, we are looking
forward to Exchange 200x technology that allows smaller store sizes, plus
vaulting to remove attachments from the server after a period of time and
archive them.

As a general rule we like to keep our servers around 1500 users as a
watermark, and allow them to grow or shrink up to 2000 users.  MS clustering
currently has a limitation of about 1000 users, plus or minus 500.  If
clustering is necessary in your organization, look to SAN technology and
two-node active/active clustering, with Compaq/HP being the premier player
in this market for Wintel.  If off-site data replication is your cup of tea,
then look at EMC solutions.  They are expensive, but they are the Cadillac
of data storage.
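
Plugging the numbers above into a quick sizing sketch (raw mailbox capacity
only; real planning also needs headroom for logs, deleted-item retention,
and growth):

    # Back-of-the-envelope sizing from the figures above.
    users = 28_000
    mailbox_mb = 100          # target quota per user
    users_per_server = 1_500  # watermark before growing toward 2000

    servers = -(-users // users_per_server)   # ceiling division
    raw_gb = users * mailbox_mb / 1024
    print(f"~{servers} servers, ~{raw_gb:,.0f} GB raw mailbox storage")

At the 100 MB quota that works out to roughly 19 servers and about 2.7 TB of
raw mailbox storage.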

The general rule is that commercially hosted email costs around $100 per
user per year for full Exchange 200x support and 100 MB of storage.
POP/IMAP support is considerably less, but it is not as feature rich and
requires more work on the client side.  Find out how much the organization
is willing to pay per user, and determine the cost per feature.  If you
can't do it in-house for at least a 40% savings to the organization, I would
look to outsource it.  Remember that savings also means downtime!
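
A hedged sketch of that comparison; the in-house figure is a placeholder for
whatever your own cost study produces:

    # Outsource-vs-in-house check against the 40% rule above.
    hosted = 100.0     # $/user/year, the quoted rule of thumb
    in_house = 70.0    # $/user/year, hypothetical cost-study result

    savings = 1 - in_house / hosted
    print(f"In-house savings: {savings:.0%}")
    if savings < 0.40:
        print("Under the 40% threshold -- consider outsourcing.")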

Backup/Recovery

Most major vendors offer backup solutions for Exchange.  Veritas is the
standard that most people stand by, but everyone I hear from says their
support is lousy.  The rising star is CommVault.  I say give them a look.

These are the features you're looking for in your backup and recovery
selection.

Microsoft backup format support
Open Database/file support
Shadow copy support
Bare metal restore function 
(Recovery of a server using a bootable CD and tape)
*Encryption (Good Feature)
*Expandability to newer storage technology
*Support other Systems (Linux and UNIX)

Notice I did not list brick level.  This is because true brick-level backups
break the single-instance storage of Exchange, and they take forever to run.

Aelita just came out with ARM, which can basically take a tape in Microsoft
backup format and recover a single mailbox or multiple mailboxes without the
need for an expensive, unsecured recovery forest or large, unwieldy restore
procedures.  Using their interface, restoring mailboxes is like falling off
a log.  No special training required.

In the event you need to recover the entire server, the bare metal restore
option and a SAN can make recovery a snap.  You could also just place a
fresh server in the old server's place, change the IP address and host name,
then restore the IS and PFs, and recover from a down server in an hour or
two.  This is possible because the directory is AD.
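
As a rough check on that "hour or two" claim, restore time is dominated by
store size divided by restore throughput.  Both numbers below are
illustrative assumptions, not measurements:

    # Restore-time estimate: store size over streaming throughput.
    store_gb = 150           # assumed size of the IS plus PFs
    throughput_mb_s = 30     # assumed tape/restore streaming rate

    hours = store_gb * 1024 / throughput_mb_s / 3600
    print(f"~{hours:.1f} hours to restore {store_gb} GB "
          f"at {throughput_mb_s} MB/s")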

Troubleshooting

Troubleshooting is more of a tactical skill nowadays.  It used to be an
operations skill, but with so many functions that need to be managed, you
can't rely on the same techs who plan and maintain the technology to also
troubleshoot it.  Something has to give.  To be a good troubleshooter you
need to know network, hardware, OS, and ultimately application
troubleshooting.  You have to know your own abilities, be willing to grow,
think differently, research, test, and ultimately execute.  Also, you can't
plan for things you can't see, so a good reporting package is a must.
BindView Control has good reporting tools for both Exchange and security.
Aelita InTrust is another good utility.  Quest also has a pretty good tool
for interactive troubleshooting called Spotlight; it is like Perfmon on
steroids.  Proactive monitoring is also a must.  MOM and NetIQ's AppManager
are good tools to monitor your environment with.  MOM is more event driven
and can fire off resolutions.  AppManager is more historical information
gathering; it is basically good at telling you something broke and then
letting you research the historical data.

Troubleshooting Exchange can be a challenge, because most of the problems
come from the client side.  You need to be able to collect data from the
client's perspective and the server's perspective, see what systems are in
between, and determine whether it is a network bottleneck or a hardware
bottleneck.  Knowing the protocols, how they act, and how they act when
there are problems is a very important thing to understand.  Understanding
the quirks of the systems and software is also good knowledge.
Documentation and contacts are also valuable tools.  I highly recommend that
you look at Chris Wolf's newest book, Troubleshooting Microsoft
Technologies, for further information.  He is also working on a book on
enterprise troubleshooting.
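
One low-tech way to get that client-versus-server view, sketched below with
a hypothetical server name: time TCP connects to the server from the
complaining client and from a machine on the server's own segment, then
compare.  A large gap implicates the network in between rather than the
server.

    # Time TCP connects to the Exchange server (port 135, the RPC
    # endpoint mapper MAPI clients hit first).  Run from the client AND
    # from beside the server, then compare the averages.
    import socket, time

    def avg_connect_ms(host, port=135, samples=5):
        total = 0.0
        for _ in range(samples):
            start = time.perf_counter()
            with socket.create_connection((host, port), timeout=5):
                pass
            total += time.perf_counter() - start
        return total / samples * 1000

    print(f"avg connect: {avg_connect_ms('exchange1.example.com'):.1f} ms")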

Conclusion    

I have been through 7 disasters in my lifetime.  I used to work at a
hospital as an orderly; train wrecks, blizzards, and patients coding taught
me that you have to work together in order to protect and heal people.  In
IT, I am a veteran of I Love You, several data disasters, 9/11, and most
recently SQL Slammer.  What is interesting is that SQL Slammer was actually
the worst disaster I ran into, probably because it involved the most
managers, and not a team.  It got way too political.

As you can see, DR for Exchange sometimes only shows you the tip of the
iceberg.  I hope sharing this information with you all is helpful.

Please tell me what you think; I am always open to critical review.

Toddler
  


-----Original Message-----
From: Rick Kingslan [mailto:[EMAIL PROTECTED] 
Sent: Sunday, August 10, 2003 12:13 PM
To: [EMAIL PROTECTED]
Subject: RE: [ActiveDir] Disaster recovery scenario comments requested.

Jan,

Do you know if they have published a paper or some detail on this process?
Naturally, I'm interested in what they are proposing.

Currently, their full-fledged technical document is slated for March 2004,
which, IMHO, is way too late.

Rick Kingslan  MCSE, MCSA, MCT
Microsoft MVP - Active Directory
Associate Expert
Expert Zone - www.microsoft.com/windowsxp/expertzone
  

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jan Wilson
Sent: Sunday, August 10, 2003 10:56 AM
To: [EMAIL PROTECTED]
Subject: Re: [ActiveDir] Disaster recovery scenario comments requested.


Just as an aside here - MS of course displayed their VM server at tech ed -
one nice idea was DR for Exchange 2003 - you would basically generate a new
email server in minutes on a VM - users are then back online and you then
begin to backfill their email from tape.

List info   : http://www.activedir.org/mail_list.htm
List FAQ    : http://www.activedir.org/list_faq.htm
List archive: http://www.mail-archive.com/activedir%40mail.activedir.org/


