Re: redundancy [was: something about arrogance]
Pedro Roque Marques wrote:

>--- Start of forwarded message ---
>From: [EMAIL PROTECTED] (Patrick Evans)
>To: Jim Shankland <[EMAIL PROTECTED]>
>Cc: [EMAIL PROTECTED]
>Newsgroups: jnx.ext.nanog
>Subject: Re: redundancy [was: something about arrogance]
>Message-ID: <[EMAIL PROTECTED]>
>Date: 31 Jul 02 00:32:49 GMT
>References: <[EMAIL PROTECTED]>
>Organization: Juniper Networks, San Francisco, California
>
>On Tue, 30 Jul 2002, Jim Shankland wrote:
>
>>Patrick Evans <[EMAIL PROTECTED]> writes:
>>
>>>My first project, if network availability were a key issue, within any
>>>organisation would be to a) obtain [an AS number] and b) make use of
>>>it.
>>
>>Heh. How many bits in an AS number, again?
>
>*grin*
>
>That's a problem with the underlying protocol. I get paid to run
>operational networks, not bleat endlessly about "how much work would
>it *really* take to implement 24bit AS numbers?" :)

The plan is 32 bits... (see draft-ietf-idr-as4bytes-05.txt for details).
Essentially i think it just takes interest/demand from ISPs, since the
mechanism can be implemented and deployed in a non-disruptive way.

>Crying about protocol deficiencies is a distant second to keeping a
>business up and running these days.

imho, protocol efficiencies are not so much the problem. If the scale at
which routing must operate is clear, the right hardware/software can be
engineered... assuming that people are willing to upgrade their existing
boxes and that it isn't a requirement that it must run on 5-year-old small
enterprise boxes. The latter seems to be the biggest problem, though.

Effectively, the growth of the routing table is bound by the maximum memory
size and CPU capacity present in the most common boxes used in the network,
and not by protocol efficiency.

It is not so much a question of whether one can build a database engine and
respective distribution protocol that can scale up to n million paths, but
of the limits of the current-day moral equivalent of the AGS+. Thus all the
people that have these deployed in their networks tend to be concerned about
the need to upgrade them as the size of the routing table increases.

As one of the posters was kind enough to point out, these sometimes end up
being more issues of economics/business than of engineering.

regards,
Pedro.
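As a rough illustration of the 16-bit versus 32-bit point above, the sketch below computes how many AS numbers each width allows and renders a 32-bit value in the dotted high.low notation often called "asdot". The function names and the sample values are made up for illustration; nothing here is taken from the draft itself.

```python
def as_number_space(bits: int) -> int:
    """Return how many distinct AS numbers fit in `bits` bits."""
    return 2 ** bits


def to_asdot(asn: int) -> str:
    """Render an AS number as 'high.low' when it needs more than 16 bits."""
    if not 0 <= asn < 2 ** 32:
        raise ValueError("AS number must fit in 32 bits")
    if asn < 2 ** 16:
        # Values that fit in the old 16-bit space are written as plain decimals.
        return str(asn)
    high, low = divmod(asn, 2 ** 16)
    return f"{high}.{low}"


if __name__ == "__main__":
    print(as_number_space(16))   # 65536
    print(as_number_space(32))   # 4294967296
    print(to_asdot(65546))       # 1.10
```

The jump from 65,536 to roughly 4.3 billion values is why the thread treats the width of the field, rather than the rest of the protocol, as the limiting factor for handing every multihomed site its own AS.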
Re: redundancy [was: something about arrogance]
On Tue, 30 Jul 2002, Jim Shankland wrote:

> Patrick Evans <[EMAIL PROTECTED]> writes:
>
> > My first project, if network availability were a key issue, within any
> > organisation would be to a) obtain [an AS number] and b) make use of
> > it.
>
> Heh. How many bits in an AS number, again?

*grin*

That's a problem with the underlying protocol. I get paid to run
operational networks, not bleat endlessly about "how much work would
it *really* take to implement 24bit AS numbers?" :)

Crying about protocol deficiencies is a distant second to keeping a
business up and running these days.

--
Patrick Evans, allegedly
Email: [EMAIL PROTECTED]
CV: www.pre.org/pre/cv
Wheels: Kawasaki ZXR400L9
Re: redundancy [was: something about arrogance]
Patrick Evans <[EMAIL PROTECTED]> writes:

> My first project, if network availability were a key issue, within any
> organisation would be to a) obtain [an AS number] and b) make use of
> it.

Heh. How many bits in an AS number, again?

Jim Shankland
Re: redundancy [was: something about arrogance]
On Tue, 30 Jul 2002, David Schwartz wrote:

> One more just for kicks. Client had a 100Mbps circuit from their sole
> provider (100Mbps to colocated router, DS3 from this router to their
> premises). The circuit had been in place for several years and the contract
> had long since expired. One day, they got a call

Er, what does due diligence mean to you?

(We're way into no-shit-sherlock territory here)

(For the record, I'd consider any operation without an AS number a
startup, and my first project, if network availability were a key issue,
within any organisation would be to a) obtain one and b) make use of it.
YMMV, but some V are more equal than others. Particularly in the current
economic climate.)

--
Patrick Evans, allegedly
Email: [EMAIL PROTECTED]
CV: www.pre.org/pre/cv
Wheels: Kawasaki ZXR400L9
RE: redundancy [was: something about arrogance]
At 1:23 PM -0400 2002/07/30, Derek Samford wrote: > At the same time, I've been able to maintain aggregation of all > of my routes, and maintain true stability in my network. There is > absolutely no excuse to fill up the routing tables with nonsense. Seeing as I don't understand much about this process, I would love to hear a detailed explanation of how you have managed to do all this. -- Brad Knowles, <[EMAIL PROTECTED]> "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI$ P+>++ L+ !E W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++> h--- r---(+++)* z(+++)
Re: redundancy [was: something about arrogance]
At 3:23 AM -0700 2002/07/30, Pedro R Marques wrote: > It is my impression, from reading this list and tidbits of gossip, > that the most common causes of failure are: > - link failure > - equipment failure (routers mostly), both software and hardware > - configuration errors Most likely true. > To do so, one can look at: > - 2 external links to distinct providers > - 2 external links to the same provider The latter doesn't protect you from a mis-configuration problem from the same provider, upstream of their redundant links to you. Moreover, it also doesn't protect you if they have a SPOF above your redundant links, even if logically they have two (or more) separate outward links, if they are over the same fiber, or the fibers in question are physically close to each other, then a single backhoe could take you out. A second provider doesn't necessarily protect you against the backhoe problem, but it would reduce the chances of a problem caused by an upstream misconfiguration. > While i can't speak to the economics part of the equation (although > i would expect it to be cheaper to buy an additional link than connect > to a different provider) from a point of view of restoration, > protecting a path with an alternate path from the same provider > is certainly an aproach that gives you much better convengence times. Perhaps, perhaps not. I would be willing to bet that there are at least a few large providers that effectively run each city as a separate business, and they'll rape you just as much or more for two connections as you would pay to get one connection each from two companies. > Unless the main concern is that the upstream ISP fails entirely... > which given the fact that it tends to have frontpage honors on the > NYTimes this days does not apear to be an all to common occurence > (i mean operationally, not financially - clarification added to > dispel potential humorous remarks). 
Again, I think that this is at least partly dependent on who the upstreams are. If they're small enough, then a single backhoe could take out all the fiber (or cause the remaining fiber to be loaded well past capacity and practically useless) or cause a power loss across the entire facility.

Even if you buy connectivity from a pretty big upstream, what with WorldCom and Qwest both being in serious trouble (and KPN/Qwest having completely shut down operations), I would indeed be very concerned about complete failure of my upstream.

--
Brad Knowles, <[EMAIL PROTECTED]>
RE: redundancy [was: something about arrogance]
That is even worse than what we have been talking about. You should be running a P2P T1 back to yourself and distributing the access from a POP, or have the carrier you're reselling the T1 for allocate a /24. There is no reason whatsoever to run BGP for a single /24; it should be announced in carrier address space. Using your AS for another company totally violates the whole idea of an "Autonomous System".

Derek

-----Original Message-----
From: Manolo Hernandez [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 30, 2002 1:30 PM
To: Derek Samford
Cc: [EMAIL PROTECTED]; 'Pedro R Marques'; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: redundancy [was: something about arrogance]

Yes, there is a reason to have some /24s advertised to the world. If this were a class on BGP they would tell you that was a no-no, but since this is the real world it happens and is sometimes required. It is required when you need to give a customer T-1 access at a location separate from yours, one that has a separate connection to the net, and you are using your AS on the access router. A /24 is a solution that works nicely and still works with your aggregated /20 address.

On Tue, 2002-07-30 at 13:23, Derek Samford wrote:
>
> I couldn't possibly agree more. In fact, my approach has been to create
> a mesh between different Colo centers, and keep it at about 3 Transit
> carriers. Because of the different methods of interconnection, I haven't
> ever had a long-term outage. Also, I've been able to filter any issues
> that are beyond my carrier's immediate reach (i.e. congested peering
> points.) At the same time, I've been able to maintain aggregation of all
> of my routes, and maintain true stability in my network. There is
> absolutely no excuse to fill up the routing tables with nonsense.
RE: redundancy [was: something about arrogance]
Yes, there is a reason to have some /24s advertised to the world. If this were a class on BGP they would tell you that was a no-no, but since this is the real world it happens and is sometimes required. It is required when you need to give a customer T-1 access at a location separate from yours, one that has a separate connection to the net, and you are using your AS on the access router. A /24 is a solution that works nicely and still works with your aggregated /20 address.

On Tue, 2002-07-30 at 13:23, Derek Samford wrote:
>
> I couldn't possibly agree more. In fact, my approach has been to create
> a mesh between different Colo centers, and keep it at about 3 Transit
> carriers. Because of the different methods of interconnection, I haven't
> ever had a long-term outage. Also, I've been able to filter any issues
> that are beyond my carrier's immediate reach (i.e. congested peering
> points.) At the same time, I've been able to maintain aggregation of all
> of my routes, and maintain true stability in my network. There is
> absolutely no excuse to fill up the routing tables with nonsense.
>
> Derek
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> Phil Rosenthal
> Sent: Tuesday, July 30, 2002 12:52 PM
> To: 'Pedro R Marques'; [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: RE: redundancy [was: something about arrogance]
>
>
> I have in the past single-homed to Level(3) and Verio, each in their own
> facility in NC.
> In that time, both carriers had about 1 solid hour a month of solid
> downtime (some months were worse, some were better). Some of the outages
> were on the order of 8 solid hours (verio) or 4 hours (level3).
>
> We did not run HSRP with Level3, so it may be difficult to guarantee the
> uptime of one gige handoff... But we ran HSRP with verio, and of all the
> outages (about 20 of them) -- Maybe two of them were avoided because of
> HSRP.
>
> Other than that, it was all downtime.
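To make the /24-inside-a-/20 arrangement concrete, here is a minimal sketch using Python's standard `ipaddress` module. The two prefixes are hypothetical (private address space chosen only for illustration), and `longest_match` is a toy stand-in for what a router's forwarding lookup does, not any vendor's implementation.

```python
import ipaddress

AGGREGATE = ipaddress.ip_network("10.20.0.0/20")       # hypothetical aggregate
MORE_SPECIFIC = ipaddress.ip_network("10.20.5.0/24")   # hypothetical customer /24


def longest_match(address, routes):
    """Return the most specific route in `routes` containing `address`, or None."""
    addr = ipaddress.ip_address(address)
    candidates = [route for route in routes if addr in route]
    return max(candidates, key=lambda route: route.prefixlen, default=None)


if __name__ == "__main__":
    routes = [AGGREGATE, MORE_SPECIFIC]
    print(MORE_SPECIFIC.subnet_of(AGGREGATE))    # True
    print(longest_match("10.20.5.9", routes))    # 10.20.5.0/24
    print(longest_match("10.20.9.9", routes))    # 10.20.0.0/20
```

Traffic for the customer follows the more specific /24 wherever it is visible and falls back to the /20 elsewhere, which is why the setup "still works" with the aggregate. The cost that the replies object to is that every such /24 is one more entry carried in the global routing table.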
RE: redundancy [was: something about arrogance]
I couldn't possibly agree more. In fact, my approach has been to create a mesh between different Colo centers, and keep it at about 3 Transit carriers. Because of the different methods of interconnection, I haven't ever had a long-term outage. Also, I've been able to filter any issues that are beyond my carrier's immediate reach (i.e. congested peering points.) At the same time, I've been able to maintain aggregation of all of my routes, and maintain true stability in my network. There is absolutely no excuse to fill up the routing tables with nonsense. Derek -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Phil Rosenthal Sent: Tuesday, July 30, 2002 12:52 PM To: 'Pedro R Marques'; [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: RE: redundancy [was: something about arrogance] I have in the past single-homed to Level(3) and Verio, each in their own facility in NC. In that time, both carriers had about 1 solid hour a month of solid downtime (some months were worse, some were better). Some of the outages were on the order of 8 solid hours (verio) or 4 hours (level3). We did not run HSRP with Level3, so it may be difficult to guarantee the uptime of one gige handoff... But we ran HSRP with verio, and of all the outages (about 20 of them) -- Maybe two of them were avoided because of HSRP. Other than that, it was all downtime. At this point, I couldn't conceive single-homing to any uplink anymore. --Phil -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Pedro R Marques Sent: Tuesday, July 30, 2002 6:23 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: redundancy [was: something about arrogance] Brad writes: >I'm probably demonstrating my ignorance here (and my stupidity in > stepping into a long-standing highly charged argument), but I'm > completely missing something. 
RE: redundancy [was: something about arrogance]
I have in the past single-homed to Level(3) and Verio, each in their own facility in NC. In that time, both carriers had about 1 solid hour a month of solid downtime (some months were worse, some were better). Some of the outages were on the order of 8 solid hours (verio) or 4 hours (level3). We did not run HSRP with Level3, so it may be difficult to guarantee the uptime of one gige handoff... But we ran HSRP with verio, and of all the outages (about 20 of them) -- Maybe two of them were avoided because of HSRP. Other than that, it was all downtime. At this point, I couldn't conceive single-homing to any uplink anymore. --Phil -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Pedro R Marques Sent: Tuesday, July 30, 2002 6:23 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: redundancy [was: something about arrogance] Brad writes: >I'm probably demonstrating my ignorance here (and my stupidity in > stepping into a long-standing highly charged argument), but I'm > completely missing something. For reasons of redundancy & > reliability, even if you were to buy bandwidth in only one location, > wouldn't you want to buy it from at least two different providers? >If you buy bandwidth from two different providers at two > different locations, this would seem to me to be a good way to > provide backup in case on provider or one location goes > Tango-Uniform, and you could always backhaul the bandwidth for the > site/provider that is down. Several other posters have mentioned reasons why redundancy between 2 different connections to separate providers are not, in most situations, the preferable aproach but i would like to add another point/question... When considering redudancy/reliability/etc it is important to think about what kind of failures do you want to protect against vs cost of doing so. 
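The "about 1 solid hour a month" figure can be turned into an availability number with a little arithmetic. The sketch below assumes a 30-day month and, for the two-provider case, assumes the providers fail completely independently; that second assumption is optimistic, since shared fiber paths or a shared facility would make the real figure worse. The function names are illustrative only.

```python
HOURS_PER_MONTH = 30 * 24  # assumes a 30-day month


def availability(downtime_hours_per_month: float) -> float:
    """Fraction of the month a single uplink is usable."""
    return 1.0 - downtime_hours_per_month / HOURS_PER_MONTH


def combined_availability(*availabilities: float) -> float:
    """Availability of several uplinks, assuming fully independent failures."""
    all_down = 1.0
    for value in availabilities:
        all_down *= 1.0 - value  # probability that this uplink is also down
    return 1.0 - all_down


if __name__ == "__main__":
    single = availability(1.0)                      # one hour down per month
    dual = combined_availability(single, single)    # two such providers
    print(f"single-homed: {single:.4%}")
    print(f"dual-homed:   {dual:.6%}")
```

Under these assumptions a single uplink comes out at roughly 99.86% and two independent ones at better than 99.999%, which is the intuition behind not wanting to single-home again.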
RE: redundancy [was: something about arrogance]
> You cannot as easily be held hostage. I have consulted for a few ISPs and
> have my share of war stories.
>
> Here's a (true!) example. One day, a certain head of a fairly large ISP
> decided that he wouldn't route traffic to or from IPs he had assigned that
> didn't reverse resolve because he felt it was imperative that people be able
> to find network contacts in this way (I think he got sick of being the one to
> get the abuse emails). He told my client three days before implementing a
> sweep and filter. He had the equivalent of about 38 /24s from this ISP
> distributed over about 180 customers, they were his sole uplink.

[SNIP]

Often overlooked is the redundancy in business processes. We tend to view events with an external-forces engineering perspective while frequently the culprits are uninformed decisions, knee-jerk reactions and opportunism by humans at our vendors. (Not to downplay other risks.)

-John

--
John Ferriby - PGP Key: www.ferriby.com/pgpkey
Re: redundancy [was: something about arrogance]
On Tue, 30 Jul 2002 03:23:24 -0700, Pedro R Marques wrote:

>All of those are much more frequent than the failure of an entire ISP (a
>transit provider). It is expected, i believe, of a competent ISP to
>provide redundancy both within a POP and in its inter-POP links/equipment
>and its connections to upstreams/peers.

Yes, but when the ISP that all your redundant links go to and that you got all your IPs from goes out of business, what's the mean time to repair? 30 days?

>So, my question to the list is, why is multi-homing to 2 different
>providers such a desirable thing ? What is the motivation and why is it
>preferred over multiple connections to the same upstream ?

You cannot as easily be held hostage. I have consulted for a few ISPs and have my share of war stories.

Here's a (true!) example. One day, a certain head of a fairly large ISP decided that he wouldn't route traffic to or from IPs he had assigned that didn't reverse resolve because he felt it was imperative that people be able to find network contacts in this way (I think he got sick of being the one to get the abuse emails). He told my client three days before implementing a sweep and filter. He had the equivalent of about 38 /24s from this ISP distributed over about 180 customers, they were his sole uplink.

Here's another good one. A client needed a /22 immediately for a major customer about to come online: set it up fast or lose the account. We made sure to meet all the IP assignment guidelines and our justification was impeccable; we had >90% utilization of a /18. The only problem was, the client's provider had a screw-up in their allocations and justifications, and their applications were being refused by ARIN until they fixed their problems. Now what?

One more just for kicks. Client had a 100Mbps circuit from their sole provider (100Mbps to colocated router, DS3 from this router to their premises). The circuit had been in place for several years and the contract had long since expired. One day, they got a call -- they had 5 days to agree to a new (and MUCH higher) pricing scheme with a much higher minimum paid bandwidth amount or their circuit would be turned off. The kicker -- they had to agree to a two year term!

The other issue is provider misconfigurations/meltdowns. They're not common, but if you're multihomed, you can just shut down the circuit to the misconfigured provider. There have been a few cases of these that I've seen where the repair time was several hours. If you add cases where just one POP was out, the number goes way up. If you're only in one location yourself and only use one provider, all of your redundant links will likely go to the same POP.

DS
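For a sense of scale in the reverse-DNS story, the sketch below counts the addresses involved and builds the `in-addr.arpa` names that would each need a PTR record. It uses only Python's standard `ipaddress` module; the prefix `192.0.2.0/24` is a documentation range standing in for the real assignments, the helper names are made up, and the actual DNS lookups are deliberately left out.

```python
import ipaddress


def ptr_names(prefix: str):
    """Yield the in-addr.arpa name for every address in an IPv4 prefix."""
    for addr in ipaddress.ip_network(prefix):
        yield addr.reverse_pointer


def addresses_in(prefix_len: int, count: int = 1) -> int:
    """Number of IPv4 addresses covered by `count` prefixes of this length."""
    return count * 2 ** (32 - prefix_len)


if __name__ == "__main__":
    names = list(ptr_names("192.0.2.0/24"))   # stand-in documentation prefix
    print(len(names), names[10])              # 256 10.2.0.192.in-addr.arpa
    print(addresses_in(24, 38))               # 9728 addresses in 38 /24s
    print(addresses_in(18), addresses_in(22)) # 16384 1024
```

Roughly 9,700 addresses spread over about 180 customers is a lot of PTR records to audit in three days, which is the leverage the story is describing.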
redundancy [was: something about arrogance]
Brad writes:
> I'm probably demonstrating my ignorance here (and my stupidity in
> stepping into a long-standing highly charged argument), but I'm
> completely missing something. For reasons of redundancy &
> reliability, even if you were to buy bandwidth in only one location,
> wouldn't you want to buy it from at least two different providers?
>
> If you buy bandwidth from two different providers at two
> different locations, this would seem to me to be a good way to
> provide backup in case one provider or one location goes
> Tango-Uniform, and you could always backhaul the bandwidth for the
> site/provider that is down.

Several other posters have mentioned reasons why redundancy between 2 different connections to separate providers is not, in most situations, the preferable approach, but i would like to add another point/question...

When considering redundancy/reliability/etc. it is important to think about what kinds of failures you want to protect against versus the cost of doing so.

It is my impression, from reading this list and tidbits of gossip, that the most common causes of failure are:
- link failure
- equipment failure (routers mostly), both software and hardware
- configuration errors

All of those are much more frequent than the failure of an entire ISP (a transit provider). It is expected, i believe, of a competent ISP to provide redundancy both within a POP and in its inter-POP links/equipment and its connections to upstreams/peers.

As such, probably the first level of redundancy that an origin AS (non-transit) would look at would be intended to protect against failures of its external connectivity link and termination equipment (routers on both ends).

To do so, one can look at:
- 2 external links to distinct providers
- 2 external links to the same provider

While i can't speak to the economics part of the equation (although i would expect it to be cheaper to buy an additional link than to connect to a different provider), from the point of view of restoration, protecting a path with an alternate path from the same provider is certainly an approach that gives you much better convergence times.

This comes from the fact that, in terms of network topology, the distance between 2 links to the same upstream is much shorter than between 2 links to different upstreams. If you protect a path with an alternate path to the same ISP, you can expect convergence to occur within the IGP convergence times of your provider; with 2 different providers you need global BGP convergence to occur. This takes longer the more topologically distant your 2 upstreams are... for instance, attempting to protect a path to an ISP with very wide connectivity with a protection path from one with very limited connectivity would be a particularly bad case, as you would have to wait for the path announced by the larger ISP to be withdrawn n times from all its peering points and for the protection path to make its way through in replacement.

What i perceive to be the standard practice, origin-only ASes attempting to multi-home to 2 distinct providers, is counter-intuitive to me... from several points of view (convergence times, load on the global routing system, complexity of management, etc.), dual connectivity to different routers of the same provider (using distinct physical paths) would seem to me to make more sense.

Unless the main concern is that the upstream ISP fails entirely... which, given the fact that it tends to have front-page honors in the NYTimes these days, does not appear to be an all too common occurrence (i mean operationally, not financially - clarification added to dispel potential humorous remarks).

So, my question to the list is: why is multi-homing to 2 different providers such a desirable thing? What is the motivation, and why is it preferred over multiple connections to the same upstream?

Is the main motivation not so much reliability but having a shorter as-path to more destinations? This would not appear to me to be a clear advantage, since that doesn't necessarily reflect in better quality of interconnection.

My apologies in advance if these seem to be stupid questions...

thanks,
Pedro.