(with a red face, reconstructing my original response. We're having
a violent thunderstorm, and the power went out. I saved a lot of
things, but trusted my UPS. Unfortunately, the UPS is much more
helpful when the CPU is plugged into one of the battery backup
outlets, not a surge protector outlet. Now it is!)
>Priscilla wrote, in response to John Hardman,
>Well, you're preaching to the choir, but I have a couple comments in line
>nonetheless. A really technical discussion would require more knowledge of
>statistics, queuing theory, etc., than I have. But it can't hurt to discuss
>the issues at a high level also.
>
>
>
>In the phone industry, we can use Erlang and other obscure methods for
>calculating the amount of bandwidth needed based on an acceptable frequency
>of someone picking up the phone and not getting dial tone. Why can't we do
>something similar with networking? I suspect it's because network traffic
>is so different from phone traffic. We claim that network traffic is
>"bursty," but it's not nearly as bursty as phone traffic. There's very
>little quiet time. Even if the user isn't doing something there's still
>overhead traffic, keepalives, routing table updates, etc. The consequences
>of not being able to send this overhead traffic can result in serious
>performance degradation.
While Erlang C distributions can be of some use in estimating
buffered data network device capacity, the statistical aspects get
complex. The "conventional wisdom" was that packet interarrivals
were exponential, but more recent research (e.g., by Will Leland at
Telcordia) showed that self-similar ("fractal") models are much more
accurate. There's also the complication that traffic in a routed
system is actively affected by congestion and other feedback.
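To make the Erlang point concrete, here's a back-of-the-envelope
sketch (Python) of the Erlang C delay probability. Note that it bakes
in exactly the Poisson-arrival, exponential-service assumptions that
the self-similarity results call into question; the load and trunk
figures are invented.

    from math import factorial

    def erlang_c(offered_load_erlangs, servers):
        """Probability that an arrival has to wait (Erlang C), assuming
        Poisson arrivals and exponential service times -- the very
        assumptions that self-similar traffic violates."""
        a, n = offered_load_erlangs, servers
        if a >= n:
            return 1.0  # at or above capacity: everything queues
        top = (a ** n / factorial(n)) * (n / (n - a))
        bottom = sum(a ** k / factorial(k) for k in range(n)) + top
        return top / bottom

    # Example: 8 Erlangs offered to 10 trunks (or parallel servers)
    print(round(erlang_c(8.0, 10), 2))  # ~0.41: 41% chance of queueing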
A surprisingly large number of Internet packets are 40 to 48 bytes.
These turn out to be TCP, and primarily HTTP, acknowledgements,
possibly with some buffer padding beyond the essential 20 bytes of IP
and 20 bytes of TCP. HTTP is _not_ bandwidth efficient, or address
space efficient, but there's a lot of inertia in upgrading it.
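To put numbers on the overhead, a quick sketch (Python), assuming
bare 20-byte IP and TCP headers with no options; the payload sizes
are just illustrative:

    IP_HEADER = 20   # bytes, minimum IPv4 header
    TCP_HEADER = 20  # bytes, minimum TCP header

    def header_overhead(payload_bytes):
        """Fraction of the packet that is IP + TCP header."""
        total = IP_HEADER + TCP_HEADER + payload_bytes
        return (IP_HEADER + TCP_HEADER) / total

    print(header_overhead(0))     # 1.0: a bare 40-byte ACK is all header
    print(header_overhead(536))   # ~0.07: a default-MSS-sized segment
    print(header_overhead(1460))  # ~0.03: a full Ethernet-sized segment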
>
>>Now this begs the next question... do people need 99.999% uptime on the
>>phone system or on their network? Keep in mind that 99.999% uptime equals
>>apx 1 minute of downtime per 30 days.
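For reference, the arithmetic is easy to check, and it shows how each
extra nine cuts the downtime budget by a factor of ten -- a minimal
sketch in Python, assuming a 30-day month and a 365-day year:

    def allowed_downtime_minutes(nines, period_minutes):
        """Downtime budget for N nines over the given period."""
        availability = 1 - 10 ** -nines
        return period_minutes * (1 - availability)

    MONTH = 30 * 24 * 60   # 43,200 minutes in a 30-day month
    YEAR = 365 * 24 * 60   # 525,600 minutes in a non-leap year

    for n in (3, 4, 5):
        per_month = allowed_downtime_minutes(n, MONTH)
        per_year = allowed_downtime_minutes(n, YEAR)
        print(f"{n} nines: {per_month:.2f} min/mo, {per_year:.2f} min/yr")
    # 3 nines: 43.20 min/mo, 525.60 min/yr
    # 4 nines: 4.32 min/mo, 52.56 min/yr
    # 5 nines: 0.43 min/mo, 5.26 min/yr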
And adding each additional nine adds a lot of cost. Also, adding
redundancy in particular parts of the overall system doesn't
necessarily make things better. Perhaps going wildly off topic,
perhaps not: professional futurists use the idea of a
"Hahn-Strassmann point" in forecasting technology. It refers to the
physical demonstration of nuclear fission. All the theoretical work
in the world wasn't going to go much further without that
experimental breakthrough. In other cases, all the experiments in
the world may be waiting on theoretical breakthroughs.
In networking, there's a point at which you MUST physically
diversify to protect against major disasters. Military command
posts like Cheyenne Mountain or Raven Rock could at one point have a
chance against nuclear attack, but not once warheads reached a
certain level of accuracy. Nuclear warfighting strategy means getting
the National Command Authority into an airborne, mobile command post
ASAP--and having a chain of delegation if that plane becomes a
fireball. If your mission-critical data center sits on the San
Andreas Fault, it might survive The Big One, but the power and data
cables feeding it are less likely to.
There are also the sorts of problems that redundancy doesn't help
with. Radia Perlman's dissertation is on the Byzantine corruption
problem, which deals with the class of reliability problems caused by
at least partially incorrect information rather than by node
failures. Routing protocols are subject to Byzantine corruption.
>
>>The idea that BGP is growing widely, with all of the /24 companies
>>joining the table, is a real shame. I would venture to say that many of the
>>companies out there could stand to take the downtime of a single connection,
>>or of multiple connections to the same ISP, and never really hurt their
>>business.
>>I cannot say if BGP will scale to meet this growing "need", but I
>>can tell you that having to get more and more memory and CPU to handle the
>>larger and larger routing table is a burden and a pain. Hopefully someone
>>much more intelligent than I will find a simple and easy solution.
Simple answer: without at least some operational changes, we have
2-5 years before the global routing system gets into real trouble.
There are short-term fixes being considered, but the 7-10 year
horizon calls for new research ideas.
I'm one of the speakers at the Internet Society meeting coming up in
Stockholm, in the "New Approaches to Internet Routing" session
chaired by Lyman Chapin. I will be defining the problem space--the
"what problem are we trying to solve" section. Sue Hares will talk
about short-term fixes to both BGP proper and operational practices,
and Frank Kastenholz will talk about research trends for the long
term.
There are lots of problems. It's more than the pure number of routes.
Contrast, for example, the number of "best" routes seen with a show
ip bgp at a major Tier 1 provider router with the total number of
route instances in the BGP table -- all the paths, not just the best
ones. It used to be a ratio of about 4 or 5 route instances to each
best route, but the ratio is climbing to more like 10:1. The Internet
routing topology, conceived as hierarchical, is flattening.
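A toy illustration of that ratio (Python), with invented per-prefix
path counts:

    # Each prefix contributes one best route but may be heard over many
    # paths; every extra instance costs memory and, worse, churn.
    paths_per_prefix = {
        "192.0.2.0/24": 4,
        "198.51.100.0/24": 12,
        "203.0.113.0/24": 9,
    }

    best_routes = len(paths_per_prefix)              # one best per prefix
    route_instances = sum(paths_per_prefix.values()) # all paths held
    print(route_instances / best_routes)             # ~8.3 instances per best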
People are injecting routes for reasons that aren't necessarily good
for anyone but themselves -- the tragedy of the commons (below) is
relevant. People want to "multihome," but don't necessarily do it in
a manner that really improves their overall reliability. Another big
problem is injecting lots of routes for traffic engineering -- the
desire for "load balancing". Some routes simply are being injected
due to cluelessness. If you are in Australia and there are only four
(hypothetically) transoceanic links leaving the continent, there
really is no value in your /24 being seen in Norway or Argentina.
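That /24 is normally covered by its provider's aggregate anyway -- a
minimal sketch with Python's ipaddress module, using made-up
prefixes:

    import ipaddress

    # Hypothetical provider aggregate and customer assignment.
    provider_aggregate = ipaddress.ip_network("198.51.0.0/16")
    customer_prefix = ipaddress.ip_network("198.51.100.0/24")

    # If the more-specific is covered by the aggregate, a distant AS that
    # carries only the /16 still reaches the customer via the provider;
    # the extra /24 contributes nothing there but table growth and churn.
    print(customer_prefix.subnet_of(provider_aggregate))  # True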
It's not so much a memory problem as a processing and convergence
problem. The more small routes there are, the more likely they are to
change and force routers to reconverge. BGP uses a path vector
algorithm, which derives from distance vector and has the classic
tradeoff of stability (e.g., using holddown) versus fast convergence
with possible loops. We've also learned that the conventional wisdom
that "bad news travels fast" -- that withdrawals propagate faster
than announcements -- is wrong; things work the other way around.
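The loop-avoidance half of that tradeoff is easy to sketch -- a toy
in Python, not real BGP: refuse any update whose AS_PATH already
contains our own AS number, which is what spares path vector from
distance vector's counting-to-infinity loops.

    def accept_update(local_as, as_path):
        """Toy path-vector sanity check: reject any route whose AS_PATH
        already contains our own AS, so a route can never loop back
        through us. Real BGP does far more, but this is the core idea."""
        return local_as not in as_path

    print(accept_update(65001, [65010, 65020]))  # True: no loop
    # False below: our own AS is already in the path
    print(accept_update(65001, [65010, 65001, 65030]))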
>
>But as Howard and Geoff would say, we're dealing with the "tragedy of the
>commons." Everyone wants to meet their own particular needs and is
>unwilling to meet the needs of the overall community. The phrase comes from
>something to do with sheep herders sharing a common area in Medieval
>Britain, if I recall. ;-)
>
One of the few things I remember from Economics 101, and very
relevant. Small English farming communities would have a commons, or
shared grazing area for livestock. It had enough capacity to feed
the animals that the households needed for their own meat, milk,
wool, etc.
But some greedy residents sent additional animals, intended for sale,
into the common area. Overgrazing soon wiped out the entire pasture.
And so it is with routing. Yakov Rekhter once observed that an IP
address has economic value if, and only if, it is reachable. Load
balancing may not be worth it if it causes instability. Multihoming
to more than two providers in a geographic area may be a matter of
diminishing returns, especially if local loop, electrical power, or
server redundancy isn't at the same level. Multihoming to more than
one POP of a single, reliable provider may be much more effective
than many people believe.
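To put rough numbers on the diminishing-returns point, a crude sketch
in Python, assuming independent failures and invented availability
figures:

    def parallel(*availabilities):
        """Redundant parts: fail only if every member fails."""
        unavail = 1.0
        for a in availabilities:
            unavail *= (1.0 - a)
        return 1.0 - unavail

    def series(*availabilities):
        """A chain: fails if any element fails."""
        total = 1.0
        for a in availabilities:
            total *= a
        return total

    # Invented availability figures, for illustration only.
    POWER, SERVERS, LOCAL_LOOP, PROVIDER = 0.999, 0.999, 0.999, 0.99

    shared = series(POWER, SERVERS, LOCAL_LOOP)
    print(round(shared * parallel(PROVIDER, PROVIDER), 6))
    # ~0.996903 with two providers
    print(round(shared * parallel(PROVIDER, PROVIDER, PROVIDER), 6))
    # ~0.997002 -- a third provider barely moves the needle

Once the shared local loop, power, and servers dominate the figure,
the extra provider buys almost nothing.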
Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=5489&t=5468