(with a red face, reconstructing my original response. We're having
a violent thunderstorm, and the power went out. I saved a lot of
things, but trusted my UPS. Unfortunately, the UPS is much more
helpful when the CPU is plugged into one of the battery backup
outlets, not a surge protector outlet. Now it is!)
>Priscilla wrote, in response to John Hardman,
>Well, you're preaching to the choir, but I have a couple comments in line
>nonetheless. A really technical discussion would require more knowledge of
>statistics, queuing theory, etc., than I have. But it can't hurt to discuss
>the issues at a high level also.
>
>
>
>In the phone industry, we can use Erlang and other obscure methods for
>calculating the amount of bandwidth needed based on an acceptable frequency
>of someone picking up the phone and not getting dial tone. Why can't we do
>something similar with networking? I suspect it's because network traffic
>is so different from phone traffic. We claim that network traffic is
>"bursty," but it's not nearly as bursty as phone traffic. There's very
>little quiet time. Even if the user isn't doing something there's still
>overhead traffic, keepalives, routing table updates, etc. The consequences
>of not being able to send this overhead traffic can result in serious
>performance degradation.
While Erlang C distributions can be of some use in estimating
buffered data network device capacity, the statistical aspects get
complex. The "conventional wisdom" was that packet interarrivals
were exponential, but more recent research (e.g., by Will Leland at
Telcordia) showed that self-similar ("fractal") models are much more
accurate. There's also the complication that traffic in a routed
system is actively affected by congestion and other feedback.
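To make the Erlang point concrete, here's a back-of-the-envelope
sketch (Python) of the Erlang C delay probability. Note that it bakes
in exactly the Poisson-arrival, exponential-service assumptions that
the self-similarity results call into question; the load and trunk
figures are invented.

    from math import factorial

    def erlang_c(offered_load_erlangs, servers):
        """Probability that an arrival has to wait (Erlang C), assuming
        Poisson arrivals and exponential service times -- the very
        assumptions that self-similar traffic violates."""
        a, n = offered_load_erlangs, servers
        if a >= n:
            return 1.0  # at or above capacity: everything queues
        top = (a ** n / factorial(n)) * (n / (n - a))
        bottom = sum(a ** k / factorial(k) for k in range(n)) + top
        return top / bottom

    # Example: 8 Erlangs offered to 10 trunks (or parallel servers)
    print(round(erlang_c(8.0, 10), 2))  # ~0.41: 41% chance of queueing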
A surprisingly large number of Internet packets are 40 to 48 bytes.
These turn out to be TCP, and primarily HTTP, acknowledgements,
possibly with some buffer padding beyond the essential 20 bytes of IP
and 20 bytes of TCP. HTTP is _not_ bandwidth efficient, or address
space efficient, but there's a lot of inertia in upgrading it.
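To put numbers on the overhead, a quick sketch (Python), assuming
bare 20-byte IP and TCP headers with no options; the payload sizes
are just illustrative:

    IP_HEADER = 20   # bytes, minimum IPv4 header
    TCP_HEADER = 20  # bytes, minimum TCP header

    def header_overhead(payload_bytes):
        """Fraction of the packet that is IP + TCP header."""
        total = IP_HEADER + TCP_HEADER + payload_bytes
        return (IP_HEADER + TCP_HEADER) / total

    print(header_overhead(0))     # 1.0: a bare 40-byte ACK is all header
    print(header_overhead(536))   # ~0.07: a default-MSS-sized segment
    print(header_overhead(1460))  # ~0.03: a full Ethernet-sized segment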
>
>>Now this begs the next question... do people need 99.999% uptime on the
>>phone system or on their network? Keep in mind that 99.999% uptime equals
>>apx 1 minute of downtime per 30 days.
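For reference, the arithmetic is easy to check, and it shows how each
extra nine cuts the downtime budget by a factor of ten -- a minimal
sketch in Python, assuming a 30-day month and a 365-day year:

    def allowed_downtime_minutes(nines, period_minutes):
        """Downtime budget for N nines over the given period."""
        availability = 1 - 10 ** -nines
        return period_minutes * (1 - availability)

    MONTH = 30 * 24 * 60   # 43,200 minutes in a 30-day month
    YEAR = 365 * 24 * 60   # 525,600 minutes in a non-leap year

    for n in (3, 4, 5):
        per_month = allowed_downtime_minutes(n, MONTH)
        per_year = allowed_downtime_minutes(n, YEAR)
        print(f"{n} nines: {per_month:.2f} min/mo, {per_year:.2f} min/yr")
    # 3 nines: 43.20 min/mo, 525.60 min/yr
    # 4 nines: 4.32 min/mo, 52.56 min/yr
    # 5 nines: 0.43 min/mo, 5.26 min/yr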
And adding each additional nine adds a lot of cost. Also, adding
redundancy in particular parts of the overall system doesn't
necessarily make things better. Perhaps going wildly off topic,
perhaps not: professional futurists use the idea of a
"Hahn-Strassmann point" in forecasting technology. It refers to the
physical demonstration of nuclear fission. All the theoretical work
in the world wasn't going to go much further without that
experimental breakthrough. In other cases, all the experiments in
the world may be waiting on theoretical breakthroughs.
In networking, there's a point at which you MUST physically
diversify to protect against major disasters. Military command
posts like Cheyenne Mountain or Raven Rock could at one point have a
chance against nuclear attack, but not once warheads reached a
certain level of accuracy. Nuclear warfighting strategy means getting
the National Command Authority into an airborne, mobile command post
ASAP--and having a chain of delegation if that plane becomes a
fireball. If your mission-critical data center sits on the San
Andreas Fault, it might survive The Big One, but the power and data
cables feeding it are less likely to.
There are also the sorts of problems that redundancy doesn't help
with. Radia Perlman's dissertation is on the Byzantine corruption
problem, which deals with the class of reliability problems caused by
at least partially incorrect information rather than by node
failures. Routing protocols are subject to Byzantine corruption.
>
>>The idea that BGP is growing widely, with all of the /24 companies
>>joining the table, is a real shame. I would venture to say that many of the
>>companies out there could stand to take the downtime of a single connection,
>>or of multiple connections to the same ISP, and never really hurt their
>>business.
>>I cannot say if BGP will scale to meet this growing "need", but I
>>can tell you that having to get more and more memory and CPU to handle the
>>larger and larger routing table is a burden and a pain. Hopefully someone
>>much more intelligent than I will find a simple and easy solution.
Simple answer: without at least some operational changes, we have
2-5 years before the global routing system gets into real trouble.
There are short-term fixes being considered, but the 7-10 year
horizon calls for new research ideas.
I'm one of the speakers at the Internet Society meeting coming up in
Stockholm, in the "New Approaches to Internet Routing" session
chaired by Lyman Chapin. I will be defining the problem space--the
"what problem are we trying to solve" section. Sue Hares will talk
about short-term fixes to both BGP proper and operational practices,
and Frank Kastenholz will talk about research trends for the long
term.
There are lots of problems. It's more than the pure number of routes.
Contrast, for example, the number of "best" routes seen with a show
ip bgp at a major Tier 1 provider router with the total number of
route instances in the BGP table -- all the paths, not just the best
ones. It used to be a ratio of about 4 or 5 route instances to each
best route, but the ratio is climbing to more like 10:1. The Internet
routing topology, conceived as hierarchical, is flattening.
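A toy illustration of that ratio (Python), with invented per-prefix
path counts:

    # Each prefix contributes one best route but may be heard over many
    # paths; every extra instance costs memory and, worse, churn.
    paths_per_prefix = {
        "192.0.2.0/24": 4,
        "198.51.100.0/24": 12,
        "203.0.113.0/24": 9,
    }

    best_routes = len(paths_per_prefix)              # one best per prefix
    route_instances = sum(paths_per_prefix.values()) # all paths held
    print(route_instances / best_routes)             # ~8.3 instances per best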
People are injecting routes for reasons that aren't necessarily good
for anyone but themselves -- the tragedy of the commons (below) is
relevant. People want to "multihome," but don't necessarily do it in
a manner that really improves their overall reliability. Another big
problem is injecting lots of routes for traffic engineering -- the
desire for "load balancing". Some routes simply are being injected
due to cluelessness. If you are in Australia and there are only four
(hypothetically) transoceanic links leaving the continent, there
really is no value in your /24 being seen in Norway or Argentina.
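That /24 is normally covered by its provider's aggregate anyway -- a
minimal sketch with Python's ipaddress module, using made-up
prefixes:

    import ipaddress

    # Hypothetical provider aggregate and customer assignment.
    provider_aggregate = ipaddress.ip_network("198.51.0.0/16")
    customer_prefix = ipaddress.ip_network("198.51.100.0/24")

    # If the more-specific is covered by the aggregate, a distant AS that
    # carries only the /16 still reaches the customer via the provider;
    # the extra /24 contributes nothing there but table growth and churn.
    print(customer_prefix.subnet_of(provider_aggregate))  # True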
It's not so much a memory problem as a processing and convergence
problem. The more small routes there are, the more likely they are to
change and force routers to reconverge. BGP uses a path vector
algorithm, which derives from distance vector and has the classic
tradeoff of stability (e.g., using holddown) versus fast convergence
with possible loops. We've also learned that the conventional wisdom
that "bad news travels fast" -- that withdrawals propagate faster
than announcements -- is wrong; things work the other way around.
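The loop-avoidance half of that tradeoff is easy to sketch -- a toy
in Python, not real BGP: refuse any update whose AS_PATH already
contains our own AS number, which is what spares path vector from
distance vector's counting-to-infinity loops.

    def accept_update(local_as, as_path):
        """Toy path-vector sanity check: reject any route whose AS_PATH
        already contains our own AS, so a route can never loop back
        through us. Real BGP does far more, but this is the core idea."""
        return local_as not in as_path

    print(accept_update(65001, [65010, 65020]))  # True: no loop
    # False below: our own AS is already in the path
    print(accept_update(65001, [65010, 65001, 65030]))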
>
>But as Howard and Geoff would say, we're dealing with the "tragedy of the
>commons." Everyone wants to meet their own particular needs and is
>unwilling to meet the needs of the overall community. The phrase comes from
>something to do with sheep herders sharing a common area in Medieval
>Britain, if I recall. ;-)
>
One of the few things I remember from Economics 101, and very
relevant. Small English farming communities would have a commons, or
shared grazing area for livestock. It had enough capacity to feed
the animals that the households needed for their own meat, milk,
wool, etc.
But some greedy residents sent additional animals, intended for sale,
into the common area. Overgrazing soon wiped out the entire pasture.
And so it is with routing. Yakov Rekhter once observed that an IP
address has economic value if, and only if, it is reachable. Load
balancing may not be worth it if it causes instability. Multihoming
to more than two providers in a geographic area may be a matter of
diminishing returns, especially if local loop, electrical power, or
server redundancy isn't at the same level. Multihoming to more than
one POP of a single, reliable provider may be much more effective
than many people believe.
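To put rough numbers on the diminishing-returns point, a crude sketch
in Python, assuming independent failures and invented availability
figures:

    def parallel(*availabilities):
        """Redundant parts: fail only if every member fails."""
        unavail = 1.0
        for a in availabilities:
            unavail *= (1.0 - a)
        return 1.0 - unavail

    def series(*availabilities):
        """A chain: fails if any element fails."""
        total = 1.0
        for a in availabilities:
            total *= a
        return total

    # Invented availability figures, for illustration only.
    POWER, SERVERS, LOCAL_LOOP, PROVIDER = 0.999, 0.999, 0.999, 0.99

    shared = series(POWER, SERVERS, LOCAL_LOOP)
    print(round(shared * parallel(PROVIDER, PROVIDER), 6))
    # ~0.996903 with two providers
    print(round(shared * parallel(PROVIDER, PROVIDER, PROVIDER), 6))
    # ~0.997002 -- a third provider barely moves the needle

Once the shared local loop, power, and servers dominate the figure,
the extra provider buys almost nothing.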
Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=5489&t=5468