Re: redundancy [was: something about arrogance]

2002-07-30 Thread Pedro R Marques


Pedro Roque Marques wrote:

>--- Start of forwarded message ---
>From: [EMAIL PROTECTED] (Patrick Evans)
>To: Jim Shankland <[EMAIL PROTECTED]>
>Cc: [EMAIL PROTECTED]
>Newsgroups: jnx.ext.nanog
>Subject: Re: redundancy [was: something  about arrogance]
>Message-ID: <[EMAIL PROTECTED]>
>Date: 31 Jul 02 00:32:49 GMT
>References: <[EMAIL PROTECTED]>
>Organization: Juniper Networks, San Francisco, California
>
>
>On Tue, 30 Jul 2002, Jim Shankland wrote:
>
>  
>
>>Patrick Evans <[EMAIL PROTECTED]> writes:
>>
>>
>>
>>>My first project, if network availability were a key issue, within any
>>>organisation would be to a) obtain [an AS number] and b) make use of
>>>it.
>>>  
>>>
>>Heh.  How many bits in an AS number, again?
>>
>>
>>
>*grin*
>
>That's a problem with the underlying protocol. I get paid to run
>operational networks, not bleat endlessly about "how much work would
>it *really* take to implement 24bit AS numbers?" :)
>  
>

The plan is 32 bits... (see draft-ietf-idr-as4bytes-05.txt for details).
Essentially i think it just takes interest/demand from ISPs, since the 
mechanism can be implemented and deployed in a non-disruptive way.
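
For a sense of scale, the arithmetic behind "how many bits" can be sketched as follows. This is only an illustration: the function names are made up here, and the "asdot" rendering (high 16 bits, a dot, low 16 bits) is one textual notation used for 4-byte AS numbers, not something taken from the messages in this thread.

```python
def as_number_space(bits: int) -> int:
    """Return how many distinct AS numbers fit in a field of `bits` bits."""
    return 2 ** bits


def to_asdot(asn: int) -> str:
    """Render an AS number in asdot style.

    Values that fit in 16 bits are printed as a plain integer;
    larger values are printed as <high 16 bits>.<low 16 bits>.
    """
    if not 0 <= asn < 2 ** 32:
        raise ValueError("AS number must fit in 32 bits")
    high, low = divmod(asn, 65536)
    return str(low) if high == 0 else f"{high}.{low}"


if __name__ == "__main__":
    # 16 bits is the size under discussion, 24 the size joked about,
    # 32 the size proposed in the draft.
    for bits in (16, 24, 32):
        print(f"{bits}-bit AS numbers: {as_number_space(bits):,} values")
    print(to_asdot(65536))
```

Any 2-byte AS number keeps its value when carried in a 4-byte field, which is part of why such a change can be rolled out gradually.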

>Crying about protocol deficiencies is a distant second to keeping a
>business up and running these days.
>  
>
imho, protocol efficiencies are not so much the problem. If the scale 
that routing must operate at is clear, the right hardware/software can be 
engineered... assuming that people are willing to upgrade their 
existing boxes and that it isn't a requirement that it must run on 5 
year old small enterprise boxes.

The latter seems to be the biggest problem, though. Effectively the 
growth of routing table size is bound by the maximum memory size and 
CPU capacity present in the most common boxes used in the network and 
not by protocol efficiency.

It is not so much a question of whether one can build a database engine and 
its distribution protocol that can scale up to n million paths, but 
of the limits of the current-day moral equivalent of the AGS+. Thus all 
the people that have these deployed in their networks tend to be 
concerned about the need to upgrade them as the size of the routing 
table increases.

As one of the posters was kind enough to point out, these sometimes end 
up being more issues of economics/business than of engineering.

regards,
  Pedro.




Re: redundancy [was: something about arrogance]

2002-07-30 Thread Patrick Evans


On Tue, 30 Jul 2002, Jim Shankland wrote:

> Patrick Evans <[EMAIL PROTECTED]> writes:
>
> > My first project, if network availability were a key issue, within any
> > organisation would be to a) obtain [an AS number] and b) make use of
> > it.
>
> Heh.  How many bits in an AS number, again?
>
*grin*

That's a problem with the underlying protocol. I get paid to run
operational networks, not bleat endlessly about "how much work would
it *really* take to implement 24bit AS numbers?" :)

Crying about protocol deficiencies is a distant second to keeping a
business up and running these days.

-- 
Patrick Evans, allegedly
Email:  [EMAIL PROTECTED]
CV: www.pre.org/pre/cv
Wheels: Kawasaki ZXR400L9





Re: redundancy [was: something about arrogance]

2002-07-30 Thread Jim Shankland


Patrick Evans <[EMAIL PROTECTED]> writes:

> My first project, if network availability were a key issue, within any
> organisation would be to a) obtain [an AS number] and b) make use of
> it.

Heh.  How many bits in an AS number, again?

Jim Shankland



Re: redundancy [was: something about arrogance]

2002-07-30 Thread Patrick Evans


On Tue, 30 Jul 2002, David Schwartz wrote:

> One more just for kicks. Client had a 100Mbps circuit from their sole
> provider (100Mbps to colocated router, DS3 from this router to their
> premises). The circuit had been in place for several years and the contract
> had long since expired. One day, they got a call

Er, what does due diligence mean to you?

(We're way into no-shit-sherlock territory here)

(For the record, I'd consider any operation without an AS number a
startup, and my first project, if network availability were a key
issue, within any organisation would be to a) obtain one and b) make
use of it. YMMV, but some V are more equal than others. Particularly
in the current economic climate.)

-- 
Patrick Evans, allegedly
Email:  [EMAIL PROTECTED]
CV: www.pre.org/pre/cv
Wheels: Kawasaki ZXR400L9




RE: redundancy [was: something about arrogance]

2002-07-30 Thread Brad Knowles


At 1:23 PM -0400 2002/07/30, Derek Samford wrote:

>   At the same time, I've been able to maintain aggregation of all
>  of my routes, and maintain true stability in my network. There is
>  absolutely no excuse to fill up the routing tables with nonsense.

Seeing as I don't understand much about this process, I would 
love to hear a detailed explanation of how you have managed to do all 
this.

-- 
Brad Knowles, <[EMAIL PROTECTED]>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
 -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI$ P+>++ L+ !E W+++(--) N+ !w---
O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+() DI+() D+(++) G+() e++> h--- r---(+++)* z(+++)



Re: redundancy [was: something about arrogance]

2002-07-30 Thread Brad Knowles


At 3:23 AM -0700 2002/07/30, Pedro R Marques wrote:

>  It is my impression, from reading this list and tidbits of gossip,
>  that the most common causes of failure are:
>  - link failure
>  - equipment failure (routers mostly), both software and hardware
>  - configuration errors

Most likely true.

>  To do so, one can look at:
>  - 2 external links to distinct providers
>  - 2 external links to the same provider

The latter doesn't protect you from a mis-configuration problem 
from the same provider, upstream of their redundant links to you. 
Moreover, it also doesn't protect you if they have a SPOF above your 
redundant links. Even if logically they have two (or more) separate 
outward links, if those run over the same fiber, or the fibers in 
question are physically close to each other, a single backhoe 
could take you out.

A second provider doesn't necessarily protect you against the 
backhoe problem, but it would reduce the chances of a problem caused 
by an upstream misconfiguration.
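
One way to make the SPOF argument concrete is to list what each link depends on and look at the overlap. The sketch below assumes a completely hypothetical inventory (router, conduit, POP and provider names are all invented); it only illustrates the comparison, not any real network.

```python
def shared_elements(path_a, path_b):
    """Return the elements both paths depend on (candidate single points of failure)."""
    return set(path_a) & set(path_b)


# Hypothetical dependency lists for three access links from the same site.
SAME_PROVIDER_LINK_1 = ["cpe-router-1", "conduit-main-st", "pop-A", "provider-X-core"]
SAME_PROVIDER_LINK_2 = ["cpe-router-2", "conduit-main-st", "pop-A", "provider-X-core"]
SECOND_PROVIDER_LINK = ["cpe-router-2", "conduit-main-st", "pop-B", "provider-Y-core"]

if __name__ == "__main__":
    # Two links to the same provider still share the conduit, the POP
    # and the provider's own network.
    print(sorted(shared_elements(SAME_PROVIDER_LINK_1, SAME_PROVIDER_LINK_2)))
    # A second provider removes the shared provider elements, but in this
    # made-up layout the conduit (the backhoe target) is still common.
    print(sorted(shared_elements(SAME_PROVIDER_LINK_1, SECOND_PROVIDER_LINK)))
```

In this invented layout the second provider removes the shared POP and provider core but not the shared conduit, which matches the distinction drawn above between the misconfiguration problem and the backhoe problem.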

>  While i can't speak to the economics part of the equation (although
>  i would expect it to be cheaper to buy an additional link than connect
>  to a different provider) from a point of view of restoration,
>  protecting a path with an alternate path from the same provider
>  is certainly an approach that gives you much better convergence times.

Perhaps, perhaps not.  I would be willing to bet that there are 
at least a few large providers that effectively run each city as a 
separate business, and they'll rape you just as much or more for two 
connections as you would pay to get one connection each from two 
companies.

>  Unless the main concern is that the upstream ISP fails entirely...
>  which given the fact that it tends to have frontpage honors on the
>  NYTimes these days does not appear to be an all too common occurrence
>  (i mean operationally, not financially - clarification added to
>  dispel potential humorous remarks).

Again, I think that this is at least partly dependent on who the 
upstreams are.  If they're small enough, then a single backhoe could 
take out all the fiber (or cause the remaining fiber to be loaded 
well past capacity and practically useless) or cause a power loss 
across the entire facility.

Even if you buy connectivity from a pretty big upstream, what 
with WorldCom and Qwest both being in serious trouble (and KPN/Qwest 
having completely shut down operations), I would indeed be very 
concerned about complete failure of my upstream.

-- 
Brad Knowles, <[EMAIL PROTECTED]>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
 -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI$ P+>++ L+ !E W+++(--) N+ !w---
O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+() DI+() D+(++) G+() e++> h--- r---(+++)* z(+++)



RE: redundancy [was: something about arrogance]

2002-07-30 Thread Derek Samford


That is even worse than what we have been talking about. You should be
running a P2P T1 back to yourself, and distributing the access from a
POP, or have the carrier you're reselling the T1 for allocate a /24.
There is no reason to run BGP for a single /24 whatsoever; it should be
announced in Carrier address space. Using your AS for another company
totally violates the whole idea of an "Autonomous System". 

Derek

-Original Message-
From: Manolo Hernandez [mailto:[EMAIL PROTECTED]] 
Sent: Tuesday, July 30, 2002 1:30 PM
To: Derek Samford
Cc: [EMAIL PROTECTED]; 'Pedro R Marques'; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: RE: redundancy [was: something about arrogance]

Yes, there is a reason to have some /24s advertised to the world. If this were a
class on BGP they would tell you that was a no-no, but since this is the
real world it happens and is sometimes required. It is required when you
need to give a customer T-1 access at a location separate from yours that
has a separate connection to the net, and you are using your AS on the
access router. A /24 is a solution that works nicely and still works
with your aggregated /20 address. 



RE: redundancy [was: something about arrogance]

2002-07-30 Thread Manolo Hernandez


Yes, there is a reason to have some /24s advertised to the world. If this were a
class on BGP they would tell you that was a no-no, but since this is the
real world it happens and is sometimes required. It is required when you
need to give a customer T-1 access at a location separate from yours that
has a separate connection to the net, and you are using your AS on the
access router. A /24 is a solution that works nicely and still works
with your aggregated /20 address. 
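
The reason a /24 can coexist with an aggregated /20 is longest-prefix matching: both announcements cover the customer's addresses, and the more specific one wins. A minimal sketch using Python's standard `ipaddress` module follows. The prefixes are made-up private addresses and `best_route` is a hypothetical helper, not any router's API (`subnet_of` needs Python 3.7 or later).

```python
import ipaddress

# Hypothetical prefixes: an aggregated /20 and one customer /24 inside it.
AGGREGATE = ipaddress.ip_network("10.20.16.0/20")
CUSTOMER = ipaddress.ip_network("10.20.21.0/24")


def best_route(destination, routes):
    """Pick the most specific (longest-prefix) route covering `destination`."""
    addr = ipaddress.ip_address(destination)
    matches = [route for route in routes if addr in route]
    return max(matches, key=lambda route: route.prefixlen) if matches else None


if __name__ == "__main__":
    routes = [AGGREGATE, CUSTOMER]
    print(CUSTOMER.subnet_of(AGGREGATE))                 # the /24 sits inside the /20
    print(len(list(AGGREGATE.subnets(new_prefix=24))))   # number of /24s in a /20
    print(best_route("10.20.21.7", routes))              # customer address
    print(best_route("10.20.17.1", routes))              # any other address in the /20
```

The cost, as argued elsewhere in the thread, is one more entry in everyone's routing table for each such more-specific announcement.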


On Tue, 2002-07-30 at 13:23, Derek Samford wrote:
> 
> I couldn't possibly agree more. In fact, my approach has been to create
> a mesh between different Colo centers, and keep it at about 3 Transit
> carriers. Because of the different methods of interconnection, I haven't
> ever had a long-term outage. Also, I've been able to filter any issues
> that are beyond my carrier's immediate reach (i.e. congested peering
> points.) At the same time, I've been able to maintain aggregation of all
> of my routes, and maintain true stability in my network. There is
> absolutely no excuse to fill up the routing tables with nonsense.
> 
> Derek

RE: redundancy [was: something about arrogance]

2002-07-30 Thread Derek Samford


I couldn't possibly agree more. In fact, my approach has been to create
a mesh between different Colo centers, and keep it at about 3 Transit
carriers. Because of the different methods of interconnection, I haven't
ever had a long-term outage. Also, I've been able to filter any issues
that are beyond my carrier's immediate reach (i.e. congested peering
points.) At the same time, I've been able to maintain aggregation of all
of my routes, and maintain true stability in my network. There is
absolutely no excuse to fill up the routing tables with nonsense.

Derek

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
Phil Rosenthal
Sent: Tuesday, July 30, 2002 12:52 PM
To: 'Pedro R Marques'; [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: RE: redundancy [was: something about arrogance]


I have in the past single-homed to Level(3) and Verio, each in their own
facility in NC.
In that time, both carriers had about 1 solid hour of downtime a
month (some months were worse, some were better). Some of the outages
were on the order of 8 solid hours (Verio) or 4 hours (Level3).

We did not run HSRP with Level3, so it may be difficult to guarantee the
uptime of one gige handoff... But we ran HSRP with verio, and of all the
outages (about 20 of them) -- Maybe two of them were avoided because of
HSRP.

Other than that, it was all downtime.

At this point, I couldn't conceive of single-homing to any uplink anymore.

--Phil


RE: redundancy [was: something about arrogance]

2002-07-30 Thread Phil Rosenthal


I have in the past single-homed to Level(3) and Verio, each in their own
facility in NC.
In that time, both carriers had about 1 solid hour of downtime a
month (some months were worse, some were better). Some of the outages
were on the order of 8 solid hours (Verio) or 4 hours (Level3).
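
To put those figures in availability terms, the conversion is a one-liner. The sketch below assumes a 30-day month (720 hours), and the function name is invented for this example.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def availability(downtime_hours, period_hours=HOURS_PER_MONTH):
    """Fraction of the period during which the link was up."""
    return 1.0 - downtime_hours / period_hours


if __name__ == "__main__":
    # Downtime figures taken from the message: about 1 hour in a typical
    # month, with single outages of 4 and 8 hours in bad months.
    for hours in (1, 4, 8):
        print(f"{hours} h down in a month: {availability(hours) * 100:.2f}% available")
```

Under that assumption, one hour of downtime a month is roughly 99.86% availability, and a month containing a single 8-hour outage drops to about 98.89%.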

We did not run HSRP with Level3, so it may be difficult to guarantee the
uptime of one gige handoff... But we ran HSRP with verio, and of all the
outages (about 20 of them) -- Maybe two of them were avoided because of
HSRP.

Other than that, it was all downtime.

At this point, I couldn't conceive of single-homing to any uplink anymore.

--Phil


RE: redundancy [was: something about arrogance]

2002-07-30 Thread John Ferriby

>   You cannot as easily be held hostage. I have consulted for a few
> ISPs and have my share of war stories.
>
>   Here's a (true!) example. One day, a certain head of a fairly large
> ISP decided that he wouldn't route traffic to or from IPs he had
> assigned that didn't reverse resolve because he felt it was imperative
> that people be able to find network contacts in this way (I think he
> got sick of being the one to get the abuse emails). He told my client
> three days before implementing a sweep and filter. He had the
> equivalent of about 38 /24s from this ISP distributed over about 180
> customers, they were his sole uplink.

[SNIP]

Often overlooked is the redundancy in business processes.  We tend
to view events with an external-forces engineering perspective while
frequently the culprits are uninformed decisions, knee-jerk reactions and
opportunism by humans at our vendors.   (Not to downplay other risks.)

-John

--
John Ferriby - PGP Key: www.ferriby.com/pgpkey





Re: redundancy [was: something about arrogance]

2002-07-30 Thread David Schwartz



On Tue, 30 Jul 2002 03:23:24 -0700, Pedro R Marques wrote:

>All of those are much more frequent than the failure of an entire ISP (a
>transit provider). It is expected, i believe, of a competent ISP to
>provide redundancy both within a POP and intra-POP links/equipment and
>its connections to upstreams/peers.

Yes, but when the ISP that all your redundant links go to and that you got
all your IPs from goes out of business, what's the mean time to repair? 30
days?

>So, my question to the list is, why is multi-homing to 2 different
>providers such a desirable thing ? What is the motivation and why is it
>preferred over multiple connections to the same upstream ?

You cannot as easily be held hostage. I have consulted for a few ISPs and
have my share of war stories.

Here's a (true!) example. One day, a certain head of a fairly large ISP
decided that he wouldn't route traffic to or from IPs he had assigned that
didn't reverse resolve because he felt it was imperative that people be able
to find network contacts in this way (I think he got sick of being the one to
get the abuse emails). He told my client three days before implementing a
sweep and filter. My client had the equivalent of about 38 /24s from this ISP
distributed over about 180 customers; they were his sole uplink.

Here's another good one. A client needed a /22 immediately for a major
customer about to come online: set it up fast or lose the account. We made
sure to meet all the IP assignment guidelines and our justification was
impeccable; we had >90% utilization of a /18. The only problem was, the
client's provider had a screw-up in their allocations and justifications and
their applications were being refused by ARIN until they fixed their
problems. Now what?
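
For readers who don't carry prefix sizes in their heads, the numbers in these two stories work out as in the sketch below. It is plain IPv4 arithmetic; the helper names are invented for this example.

```python
def addresses_in(prefix_len: int) -> int:
    """Number of IPv4 addresses covered by a prefix of the given length."""
    return 2 ** (32 - prefix_len)


def slash24_equivalents(prefix_len: int) -> int:
    """How many /24s fit in the prefix (meaningful for prefix_len <= 24)."""
    return addresses_in(prefix_len) // addresses_in(24)


if __name__ == "__main__":
    print(38 * addresses_in(24))        # addresses in "about 38 /24s"
    print(addresses_in(18))             # size of the /18 already held
    print(0.9 * addresses_in(18))       # the 90% utilization threshold
    print(addresses_in(22), slash24_equivalents(22))  # size of the /22 requested
```

So the first client had just under ten thousand addresses to renumber on three days' notice, and the second was asking for 1,024 more addresses on top of a /18 of 16,384 that was already more than 90% in use.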

One more just for kicks. Client had a 100Mbps circuit from their sole
provider (100Mbps to colocated router, DS3 from this router to their
premises). The circuit had been in place for several years and the contract
had long since expired. One day, they got a call -- they had 5 days to agree
to a new (and MUCH higher) pricing scheme with a much higher minimum paid
bandwidth amount or their circuit would be turned off. The kicker -- they had
to agree to a two year term!

The other issue is provider misconfigurations/meltdowns. They're not common,
but if you're multihomed, you can just shut down the circuit to the
misconfigured provider. There have been a few cases of these that I've seen
where the repair time was several hours.

If you add cases where just one POP was out, the number goes way up. If
you're only in one location yourself and only use one provider, all of your
redundant links will likely go to the same POP.

DS





redundancy [was: something about arrogance]

2002-07-30 Thread Pedro R Marques


Brad writes:
 >I'm probably demonstrating my ignorance here (and my stupidity in 
 > stepping into a long-standing highly charged argument), but I'm 
 > completely missing something.  For reasons of redundancy & 
 > reliability, even if you were to buy bandwidth in only one location, 
 > wouldn't you want to buy it from at least two different providers?
 
 >If you buy bandwidth from two different providers at two 
 > different locations, this would seem to me to be a good way to 
 > provide backup in case one provider or one location goes 
 > Tango-Uniform, and you could always backhaul the bandwidth for the 
 > site/provider that is down.

Several other posters have mentioned reasons why redundancy between 2 
different connections to separate providers is not, in most situations, 
the preferable approach but i would like to add another point/question...

When considering redundancy/reliability/etc it is important to think 
about what kind of failures you want to protect against vs the cost of 
doing so.

It is my impression, from reading this list and tidbits of gossip, that 
the most common causes of failure are:
- link failure
- equipment failure (routers mostly), both software and hardware
- configuration errors

All of those are much more frequent than the failure of an entire ISP (a 
transit provider). It is expected, i believe, of a competent ISP to 
provide redundancy both within a POP and intra-POP links/equipment and 
its connections to upstreams/peers.

As such, probably the first level of redundancy that an origin AS 
(non-transit) would look at would be with the intent to protect from 
failures of its external connectivity link and termination equipment 
(routers on both ends).

To do so, one can look at:
- 2 external links to distinct providers
- 2 external links to the same provider

While i can't speak to the economics part of the equation (although i 
would expect it to be cheaper to buy an additional link than connect to 
a different provider), from a point of view of restoration, protecting a 
path with an alternate path from the same provider is certainly an 
approach that gives you much better convergence times.

This comes from the fact that in terms of network topology, the distance 
between 2 links to the same upstream is much shorter than 2 links to 
different upstreams. While, if you protect a path with an alternate path 
to the same ISP you can expect convergence to occur within the IGP 
convergence times of your provider, with 2 different providers you need 
global BGP convergence to occur.

This gets longer depending on how topologically distant your 2 
upstreams are... for instance, attempting to protect a path to an ISP 
with very wide connectivity with a protection path from one with very 
limited connectivity would be a particularly bad case, as you would have 
to wait for the path announced by the larger ISP to be withdrawn n times 
from all its peering points and the protection path to make its way 
through in replacement.
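
The notion of "topological distance" between two attachment points can be illustrated with a toy graph and a breadth-first search. Everything in the sketch is hypothetical (node names, topology, the use of hop count as the metric), and hop count is only a crude stand-in for convergence time, which in practice depends on protocol timers, policy and implementation.

```python
from collections import deque


def hop_distance(graph, start, goal):
    """Breadth-first search: number of hops between two nodes, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical topology: two POPs of ISP A, one POP of ISP B, and a single
# exchange point where the two ISPs meet.
TOPOLOGY = {
    "ispA-pop1": ["ispA-core"],
    "ispA-pop2": ["ispA-core"],
    "ispA-core": ["ispA-pop1", "ispA-pop2", "exchange"],
    "exchange": ["ispA-core", "ispB-core"],
    "ispB-core": ["exchange", "ispB-pop1"],
    "ispB-pop1": ["ispB-core"],
}

if __name__ == "__main__":
    # Two links into the same provider: the failover stays inside one network.
    print(hop_distance(TOPOLOGY, "ispA-pop1", "ispA-pop2"))
    # One link into each provider: the change has to cross an AS boundary.
    print(hop_distance(TOPOLOGY, "ispA-pop1", "ispB-pop1"))
```

In this toy layout the two attachment points inside one provider are two hops apart, all within a single IGP domain, while the attachment points at different providers are four hops apart with an inter-AS boundary in between.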

It is counter-intuitive to me what i perceive to be the standard 
practice of attempting to multi-home to 2 distinct providers by 
origin-only ASes... from several points of view: convergence times, load 
on the global routing system, complexity of management, etc, dual 
connectivity to different routers of the same provider (using distinct 
physical paths) would seem to me to make more sense.

Unless the main concern is that the upstream ISP fails entirely... which, 
given the fact that it tends to have frontpage honors on the NYTimes 
these days, does not appear to be an all too common occurrence (i mean 
operationally, not financially - clarification added to dispel potential 
humorous remarks).

So, my question to the list is, why is multi-homing to 2 different 
providers such a desirable thing ? What is the motivation and why is it 
preferred over multiple connections to the same upstream ?

Is the main motivation not so much reliability but having a shorter 
as-path to more destinations ? This would appear to me to be a clear 
advantage since that doesn't necessarily reflect in better quality of 
interconnection.

My apologies in advance if these seem to be stupid questions...

thanks,
  Pedro.