Title: Message
Kim:
Any thoughts?
OK, I'll bite.
CAUTION: As always, my email is long, wordy, technical, and sometimes skirts off topic. However, I have to put up with free marketing references to Cisco/Juniper at every turn on NANOG, so it's nice to get Foundry's name in here every once in a while. We're all good companies. For the most part (95%), I'll stay on HA operational topics for the NANOG reader.
Some recommended books on this subject are listed below; I will refer to them throughout this email:
Top-Down Network Design by Priscilla Oppenheimer (Cisco Press)
Designing Enterprise Routing and Switching Architectures by Howard Berkowitz (Macmillan Press)
High Availability Network Fundamentals by Chris Oggerino (Cisco Press)
There are lots of other references, including industry standards from Telcordia (how's your calculus?). See High Availability Network Fundamentals for a good root reference list.
I guess the main thing to do is look at page 48 of Priscilla's book. She categorizes customer requirements and recommends a method for prioritizing them so you can make network design tradeoffs. I've added "profitability" to the list for service providers; you can tailor it to your own business goals. I go back to this a lot, and it helps me see where availability sits as a design requirement.
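One simple way to make that prioritization concrete is to spread 100 points across the design goals and sort them. This is only a sketch: the 100-point split and the example weights (including "profitability") are illustrative assumptions, not a copy of the worksheet on page 48.

```python
from typing import Dict, List, Tuple


def rank_goals(weights: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return design goals sorted from most to least important.

    Raises ValueError if the weights do not add up to 100 points.
    """
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"weights must sum to 100, got {total}")
    # Highest weight first; ties broken alphabetically so the order is stable.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical weights for a service provider; tailor them to your business goals.
example = {
    "availability": 30,
    "profitability": 20,
    "performance": 15,
    "scalability": 15,
    "security": 10,
    "manageability": 10,
}

for goal, points in rank_goals(example):
    print(f"{goal:15} {points:3}")
```

Forcing the total to 100 is the point of the exercise: raising availability's weight has to take points away from something else, which is exactly the tradeoff conversation you want to have.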
Now we have to think about what you mean by "Industry Standard". This definitely depends on the industry; however, it also varies per company, based on design requirements mapped to the business goals of YOUR company in that industry. Obviously, better availability is just one part of a multi-faceted competitive business plan. In some industries, high availability is assumed to be basic; others ignore it.
Some components you must look at are:
Human Error/Process Error
Physical Infrastructure Security and Robustness
Equipment Quality
Technologies
Special Events, Risks, and Threats (Sean Donelan digging up your fiber, hacker attack, governmental or organizational shutdown/de-peering, war or political unrest, resource shortages, economy, insert your imagination here)
Maintenance
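To see how individual components roll up into an availability number, the standard reliability-block formulas are a reasonable starting point: A = MTBF / (MTBF + MTTR) per component, a product for parts in series, and 1 minus the product of unavailabilities for redundant parts in parallel. The sketch below applies them; the MTBF/MTTR figures are hypothetical, and the parallel formula assumes independent failures.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(*parts: float) -> float:
    """Every part must be up (e.g., a single chain of boxes and links)."""
    return prod(parts)


def parallel(*parts: float) -> float:
    """At least one redundant part must be up (assumes independent failures)."""
    return 1 - prod(1 - a for a in parts)


def downtime_minutes_per_year(a: float) -> float:
    return (1 - a) * MINUTES_PER_YEAR


# Hypothetical numbers for illustration only.
router = availability(mtbf_hours=100_000, mttr_hours=4)
circuit = availability(mtbf_hours=8_760, mttr_hours=8)

single_path = series(router, circuit, router)
redundant = parallel(single_path, single_path)

print(f"single path: {single_path:.6f}, ~{downtime_minutes_per_year(single_path):.0f} min/yr down")
print(f"two paths:   {redundant:.6f}, ~{downtime_minutes_per_year(redundant):.1f} min/yr down")
```

The independence assumption is the weak spot: if both "redundant" fibers run through the same trench, one backhoe takes out both, and the parallel math overstates your availability.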
On the technology side, basically: the lower in the stack you push your redundant failover technology, the better your failover. SONET APS can fail over in 50 ms, over and over again, while the L2 and L3 protocols above it continue to operate as normal with minimal In-Flight Packet (IFP) loss. This is exactly why the 10GbE Forum is promoting APS in the 10GbE WAN PHY!
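IFP loss is roughly the packet rate on the failed path multiplied by the outage time, which is why failover speed matters so much. A quick sketch of that arithmetic follows. The 100,000 pps rate is hypothetical; the ~30 s figure is classic 802.1D STP with default timers (two 15 s forward-delay stages), and it can stretch toward 50 s when the 20 s max age has to expire first.

```python
def packets_lost(rate_pps: int, outage_ms: int) -> int:
    """In-flight packets lost during an outage: rate x outage time, rounded up.

    Integer milliseconds keep the arithmetic exact (no float rounding).
    """
    return (rate_pps * outage_ms + 999) // 1000


RATE_PPS = 100_000  # hypothetical packet rate on the failed path

for label, outage_ms in [
    ("SONET APS, ~50 ms", 50),
    ("1 s failover", 1_000),
    ("802.1D STP, ~30 s", 30_000),
]:
    print(f"{label:20} {packets_lost(RATE_PPS, outage_ms):>10,} packets lost")
```

The loss scales linearly with outage time, so going from tens of seconds to tens of milliseconds cuts IFP loss by roughly three orders of magnitude at any given rate.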
Foundry Networks (my company) has two new technologies that can give you sub-second failover without relying on L3 failover or the slower L2 redundancy protocols (RSTP and STP): Metro Ring Protocol (MRP) and Virtual Switch Redundancy Protocol (VSRP). Both are currently in beta (soon to be released), but I've been playing with them for the past week. VSRP is VRRP on L2 steroids (sub-second failover) and is easy to understand: one switch actively forwards while the other is on standby. All of these L2 protocols (RSTP, STP, VSRP, MRP) interoperate on the same devices in the same networks, so a customer can run STP with a provider VSRP edge and an MRP core. VLAN stacking and STP tunneling are supported for those of you looking at Metro business plans. Below is an example of HA technology with MRP. Take a look at this topology:
   _P1A
PE1    1
 |      ___P2A___P2C
  \_P1B/    2
     |              _P4A
      \___P2B___P3A/    3
           |
            \_PE2
I've got the P2B-to-P3A link running MPLS (LER to LER; don't ask why, it's just a lab) over OC-48 (2.5 Gbps wire speed) with Draft Martini L2 VPN. The P2A-to-P2C link is 802.3ae draft 4.2 compliant 10GbE. All other links are GbE. I've got 50 VLANs: 25 of them travel clockwise around the rings and 25 travel counterclockwise. Each set of 25 is placed in a topology group, and an instance of MRP runs on the lead (master) VLAN of that topology group. The rings are numbered 1, 2, and 3. I really hope my diagram shows up OK for the readers of this email.
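The topology-group idea can be modeled in a few lines: one protocol instance runs on the master VLAN, and every member VLAN simply inherits that instance's port states, so 25 VLANs cost one MRP instance instead of 25. This is a sketch, not Foundry's implementation; the VLAN IDs (101-125, 201-225) and the port names "east"/"west" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TopologyGroup:
    """One ring-protocol instance runs on the master VLAN; member VLANs follow it."""
    master_vlan: int
    member_vlans: List[int]
    # Port states computed by the protocol instance on the master VLAN.
    port_states: Dict[str, str] = field(default_factory=dict)

    def vlans(self) -> List[int]:
        return [self.master_vlan] + self.member_vlans

    def port_state(self, vlan: int, port: str) -> str:
        """Every VLAN in the group inherits the master VLAN's state for a port."""
        if vlan not in self.vlans():
            raise KeyError(f"VLAN {vlan} is not in this topology group")
        return self.port_states[port]


# Hypothetical VLAN IDs and port names on a ring master. Blocking opposite
# ports sends each group of 25 VLANs the opposite way around the ring.
clockwise = TopologyGroup(
    master_vlan=101,
    member_vlans=list(range(102, 126)),
    port_states={"east": "forwarding", "west": "blocking"},
)
counterclockwise = TopologyGroup(
    master_vlan=201,
    member_vlans=list(range(202, 226)),
    port_states={"east": "blocking", "west": "forwarding"},
)

print(len(clockwise.vlans()), len(counterclockwise.vlans()))  # 25 25
print(clockwise.port_state(110, "west"), counterclockwise.port_state(210, "west"))
```

Splitting the VLANs into two groups that block opposite ports is what gives you load sharing: both directions around the ring carry traffic while the ring is intact.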
The MRP ring masters are PE1 for ring 1, P1B for ring 2, and PE2 for ring 3. MRP masters send Ring Health Packets (RHPs) around the ring every 100 ms (configurable). They originate RHPs out of their primary ports and receive them on their secondary ports. While a master keeps receiving its RHPs, it blocks forwarding on its secondary port; when it stops receiving them (ring broken), it transitions that port to forwarding.
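The master's secondary-port logic described above fits in a small state machine. This is a simplified model of that behavior, not Foundry's code. The 100 ms RHP interval comes from the text; the "three missed intervals" dead time (DEAD_MS) is a hypothetical parameter, and the real protocol likely has more states and timers than this, for example to avoid a transient loop when a broken link is restored.

```python
from dataclasses import dataclass
from typing import Dict

HELLO_MS = 100           # RHP interval from the text (configurable on the real box)
DEAD_MS = 3 * HELLO_MS   # hypothetical: silence longer than this means "ring broken"


@dataclass
class RingMaster:
    """Simplified ring-master logic for the secondary port only."""
    last_rhp_ms: int = 0
    secondary: str = "blocking"

    def on_rhp_received(self, now_ms: int) -> None:
        # Our own RHP made it all the way around: the ring is whole, so block
        # the secondary port to keep the ring loop-free.
        self.last_rhp_ms = now_ms
        self.secondary = "blocking"

    def tick(self, now_ms: int) -> None:
        # No RHP for too long: the ring is broken somewhere, so forward on the
        # secondary port and reach the far side the other way around the ring.
        if now_ms - self.last_rhp_ms > DEAD_MS:
            self.secondary = "forwarding"


def simulate(cut_at_ms: int, restore_at_ms: int, end_ms: int) -> Dict[int, str]:
    """Secondary-port state at each hello interval, with a link cut in between."""
    master = RingMaster()
    states: Dict[int, str] = {}
    for now in range(0, end_ms + 1, HELLO_MS):
        ring_intact = not (cut_at_ms <= now < restore_at_ms)
        if ring_intact:
            master.on_rhp_received(now)
        master.tick(now)
        states[now] = master.secondary
    return states


for t, state in simulate(cut_at_ms=500, restore_at_ms=1_000, end_ms=1_200).items():
    print(f"t={t:5} ms  secondary port: {state}")
```

In this run the last RHP arrives at t=400 ms, the secondary port opens at t=800 ms, and it goes back to blocking at t=1000 ms when RHPs return. The knob that decides failover time is how quickly the master declares the ring broken, and the 3x hello value used here is only an assumption.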
Now let's assume that all traffic is taking the bottom path via the MRP primary paths on the masters. OK, let's start pinging (192.168.1.40 is PE2's loopback address):
PE1#ping 192.168.1.40 count 1 time 800
Sending 1, 16-byte ICMP Echo to 192.168.1.40, timeout 800 msec, TTL 64
Type Control-c to abort
511000
Request timed out.
Here I unplug the PE1-to-P1B link (primary path): 1 In-Flight Packet (IFP) lost.
854000
Request timed out.
Here I unplug the P1B-to-P2B link (primary path): 1 IFP lost.
116
Request timed out.
Here I unplug the P3A-to-PE2 link (primary path): 2 IFPs lost. All traffic on