RE: Converged Networks Threat (Was: Level3 Outage)

2004-03-02 Thread Kuhtz, Christian



   From where I'm sitting, I see a number of potentially 
 dangerous trends that could result in some quite catastrophic 
 failures of networks. No, I'm not predicting that the 
 internet will end in 8^H7 days or anything like that.  I 
 think the Level3 outage as seen from the outside is a clear 
 case that single providers will continue to have their own 
 network failures for some time to come.  (I just hope daily it's 
 not my employer's network ;-) )

I don't agree with this 'the sky is falling' perspective; we've seen
these discussions over and over.  Survivability was and continues to be
a design goal of anything we do here.  It was from the first days, and it's
true to this day.

When you implement a critical service, you need to do due diligence on
whether the path chosen meets the needs.

   Now the question of Emergency Services is being posed 
 here but also in parallel by a number of other people at the 
 FCC.  We've seen the E911 recommendation come out regarding 
 VoIP calls.  How long until a simple power failure results in 
 the inability to place calls?

There are specific requirements (read: gov't regulations) to implement
E911 with a number of redundancy options, typically calling for things
like triple path redundancy.  While I have worked on E911 infrastructure
in the past, I'm not aware of an exhaustive analysis of E911 over
IP, and I don't see a reason off the top of my head why you can't do the
same thing on IP.
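As a back-of-the-envelope illustration (my arithmetic, not anything from the E911
rules): if the redundant paths really do fail independently, the combined
availability is one minus the product of the individual path failure probabilities,
which is what makes triple path redundancy attractive.

```python
# Availability of N independent, redundant paths: the service is down only
# when every path is down at the same time.  Figures below are illustrative.
def combined_availability(path_availabilities):
    downtime = 1.0
    for a in path_availabilities:
        downtime *= (1.0 - a)      # probability that this particular path is down
    return 1.0 - downtime          # up unless all paths are down simultaneously

# Three paths at 99.9% each -> about 0.999999999 combined, assuming truly
# independent failures (shared conduit or shared fate breaks this assumption).
print(combined_availability([0.999, 0.999, 0.999]))
```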

Sure, requires careful planning.  But what critical service doesn't?

What are you asking for?  More gov't regulation?

   While my friends who are in the local VFD do still have the 
 traditional pager service with towers, etc... how long until 
 the T1s that are used for dial-in or speaking to the towers 
 are moved to some sort of IP-based system?  The global 
 economy seems to be going in this direction with varying degrees 
 of caution.
 
   I'm concerned, but not worried.. the network will survive..

What's your point then? :)

There's no panacea for poor implementation.  That's why knowledge and
experience are important in network design, and their importance is
directly linked to the defined critical need of the service being
implemented.

Sorry, just angst for me here.  No visible life.

Thanks
Christian




RE: Converged Networks Threat (Was: Level3 Outage)

2004-03-02 Thread Kuhtz, Christian

   If events are not properly triggered back upstream (i.e., 
 adjacencies stay up, BGP remains fairly stable) and you end up 
 dumping a lot of traffic on the floor, it's sometimes a bit 
 more difficult to diagnose than loss of light on a physical path.
 
   On the sunny side, I see this improving over time.  
 Software bugs will be squashed.  Poorly designed networks 
 will be reconfigured to better handle these situations.

But this happens everywhere, every day, regardless of the underlying
technology.





Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-26 Thread Michael . Dillon

 Wouldn't it be great 
if routers had the equivalent of 'User mode Linux', each process 
handling a service, isolated and protected from the others.  The 
physical router would be nothing more than a generic kernel handling 
resource allocation.  Each virtual router would have access to x amount 
of resources and would either halt, sleep, or crash when it exhausts those 
resources for a given time slice. 

This is possible today. Build your own routers using
the right microkernel, OSKIT and the Click Modular Router
software and you can have this. When we restrict ourselves
only to router packages from major vendors then we are 
doomed to using outdated technology at inflated prices.
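For what it's worth, the resource-isolation half of this is approachable on any
POSIX box today; a minimal sketch (mine, not Click or OSKIT) that caps each
"virtual router" process with rlimits, so a runaway process is killed by the
kernel without starving its neighbours:

```python
import resource
import subprocess
import sys

def launch_virtual_router(cmd, cpu_seconds, max_bytes):
    """Start one 'virtual router' process under hard CPU and memory caps.

    If it exhausts its allocation, the kernel kills that process alone;
    other virtual routers on the same box keep running.
    """
    def apply_limits():  # runs in the child, just before exec
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.Popen(cmd, preexec_fn=apply_limits)

# Stand-in for a runaway routing process: it spins, hits its 2-second CPU cap,
# and is terminated by the kernel (SIGXCPU/SIGKILL) without touching anything
# else on the machine.
runaway = launch_virtual_router(
    [sys.executable, "-c", "while True: pass"],
    cpu_seconds=2,
    max_bytes=512 * 2**20,
)
print("runaway virtual router exited with", runaway.wait())
```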

--Michael Dillon





Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-26 Thread Valdis . Kletnieks
On Thu, 26 Feb 2004 14:48:55 GMT, [EMAIL PROTECTED]  said:

 History shows that if you can build a mousetrap that is technically
 better than anything on the market, your best route for success is
 to sell it into niche markets where the customer appreciates the
 technical advances that you can provide and is willing to pay for
 those technical advances. I don't think that describes the larger
 Internet provider networks.

So your target market is those mom-and-pop ISPs that *don't* buy
their Ciscos from eBay? :)




Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-26 Thread vijay gill

On Thu, Feb 26, 2004 at 02:48:55PM +, [EMAIL PROTECTED] wrote:
 
  This is possible today. Build your own routers using
  the right microkernel, OSKIT and the Click Modular Router
  software and you can have this. When we restrict ourselves
  only to router packages from major vendors then we are 
  doomed to using outdated technology at inflated prices.
 
 Tell you what Michael, build me some of those, have it pass my labs
 and I'll give you millions in business. Deal? 
 
 The problem with your lab is that you have too many millions
 to give. In order to win those millions people would have to prove
 that their box is at least as good as C and J in the core of the
 largest Internet backbones in the world. That is an awfully big

Let me try this one more time. From the top.

You said:
begin quote
  software and you can have this. When we restrict ourselves
  only to router packages from major vendors then we are
  doomed to using outdated technology at inflated prices.
end quote

So now we have
 to give. In order to win those millions people would have to prove
 that their box is at least as good as C and J in the core of the

So the outdated technology at inflated prices is too high a hurdle
to clear for the magic Click Modular Router software, the stuff that is
allegedly NOT antiquated and not using outdated technology?
But somehow still cannot function in a core? 


 History shows that if you can build a mousetrap that is technically
 better than anything on the market, your best route for success is

I thought it went "build a better mousetrap and the world will beat a 
path to your door", etc etc etc.  


 to sell it into niche markets where the customer appreciates the
 technical advances that you can provide and is willing to pay for
 those technical advances. I don't think that describes the larger
 Internet provider networks.

How would you know this?  Historically, the cutting edge technology
has always gone into the large cores first because they are the
ones pushing the bleeding edge in terms of capacity, power, and
routing.

/vijay


Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-26 Thread vijay gill

On Thu, Feb 26, 2004 at 10:05:03AM -0800, David Barak wrote:
 
 --- vijay gill [EMAIL PROTECTED] wrote:
  How would you know this?  Historically, the cutting edge technology
  has always gone into the large cores first because they are the
  ones pushing the bleeding edge in terms of capacity, power, and
  routing.
  
  /vijay
 
 I'm not sure that I'd agree with that statement: most
 of the large providers with whom I'm familiar tend to
 be relatively conservative with regard to new
 technology deployments, for a couple of reasons:
 
 1) their backbones currently work - changing them
 into something which may or may not work better is a
 non-trivial operation, and risks the network.

This is perhaps true currently. Check back at the large deployments:
GSR - Sprint/UUNET
GRF - UUNET
Juniper - UUNET/CWUSA

In all of the above cases, those were the large ISPs that forced
development of the boxes. Most of the smaller cutting edge
networks are still running 7513s.

GSR was invented because the 7513s were running out of PPS.
CEF was designed to support offloading the RP.

 2) they have an installed base of customers who are
 living with existing functionality - this goes back to
 reason 1 - unless there is money to be made, nobody
 wants to deploy anything.
 
 3) It makes more sense to deploy a new box at the
 edge, and eventually permit it to migrate to the core
 after it's been thoroughly proven - the IP model has
 features living on the edges of the network, while
 capacity lives in the core.  If you have 3 high-cap
 boxes in the core, it's probably easier to add a
 fourth than it is to rip the three out and replace
 them with two higher-cap boxes.

The core has expanded to the edge, not the other way around.
The aggregate backplane bandwidth requirements tend to
drive core box evolution first while the edge box normally
has to deal with high touch features and port multiplexing.
These of course are becoming more and more specialized over
time.

 4) existing management infrastructure permits the
 management of existing boxes - it's easier to deploy
 an all-new network than it is to upgrade from one
 technology/platform to another.

Only if you are willing to write off your entire capital
investment. No one is willing to do that today.


 
 -David Barak
 -Fully RFC 1925 Compliant
 


/vijay


Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-26 Thread Brett Watson

 1) their backbones currently work - changing them
 into something which may or may not work better is a
 non-trivial operation, and risks the network.

i would disagree.  their backbones tend to reach scaling problems, hence the
need for bleeding/leading edge technologies.  that's been my experience in
three past large networks.

 
 This is perhaps current. Check back to see large deployments
 GSR - sprint/UUNEt
 GRF - uunet
 Juniper - UUNET/CWUSA

indeed, and going back even further

is-is, 7000 and the original SSE - mci/sprint
vip and netflow - genuity (the original)/probably many others

-b




Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-26 Thread Petri Helenius
vijay gill wrote:

CEF was designed to support offloading the RP.

 

Not really. There existed distributed fast switching before DCEF came 
along. It might still exist. CEF was developed to address the issue of 
route cache insertion and purging. The unnecessarily painful 60-second 
new-destination stall was widely documented before CEF got 
widespread use. The fast switching approach was also particularly 
painful when DDoS attacks occurred.
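For readers who weren't around for it, a toy contrast of the two approaches (my
own sketch, not vendor code): a demand-built route cache takes the slow path on
the first packet to each new destination and again after every purge, which is
where the stalls and the DDoS pain came from, while a CEF-style FIB is
precomputed from the routing table so every destination is already resolvable.

```python
# Demand-populated fast-switching cache: the first packet to a new destination
# takes the slow path, and the periodic purge forces that to happen all over again.
class RouteCache:
    def __init__(self, rib):
        self.rib = rib              # routing table, simplified to exact-match keys
        self.cache = {}

    def lookup(self, dst):
        if dst not in self.cache:             # cache miss: "process switch" the packet
            self.cache[dst] = self.rib[dst]   # slow-path lookup, then insert
        return self.cache[dst]

    def purge(self):
        self.cache.clear()                    # the periodic ager behind the stalls

# CEF-style forwarding table: built ahead of time from the routing table, so a
# flood of previously unseen destinations never triggers slow-path insertions.
class Fib:
    def __init__(self, rib):
        self.fib = dict(rib)                  # precomputed copy of every route

    def lookup(self, dst):
        return self.fib[dst]

rib = {"192.0.2.1": "ge-0/0/0", "198.51.100.7": "ge-0/0/1"}
print(RouteCache(rib).lookup("192.0.2.1"), Fib(rib).lookup("198.51.100.7"))
```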

Pete



Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-26 Thread vijay gill

On Thu, Feb 26, 2004 at 09:32:07PM +0200, Petri Helenius wrote:

 along. It might still exist. CEF was developed to address the issue of 
 route cache insertion and purging. The unneccessarily painful 60 second 
 interval new destination stall was widely documented before CEF got 
 widespread use. The fast switching approach was also particularly 
 painful when DDOS attacks occurred.


Thanks for the correction. I clearly was not paying enough attention
when composing.

/vijay


Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-26 Thread Randy Bush

 History shows that if you can build a mousetrap that is technically
 better than anything on the market, your best route for success is
 to sell it into niche markets where the customer appreciates the
 technical advances that you can provide and is willing to pay for
 those technical advances. I don't think that describes the larger
 Internet provider networks.

and this has been so well shown by the blazing successes of
bay networks, avici, what-its-name that burst into flames in
everyone's labs, ...

watch out for flying pigs

randy



Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-26 Thread Deepak Jain

and this has been so well shown by the blazing successes of
bay networks, avici, what-its-name that burst into flames in
everyone's labs, ...
That's a very good point. Building a router that works (at least 
judging from J's example) comes down to hiring away the most important talent
from your competition. Though it could also be said that the companies 
that hired that same talent away from J have not met with the same success, yet.

Deepak


Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-26 Thread David Barak


--- vijay gill [EMAIL PROTECTED] wrote:

 In all of the above cases, those were the large isps that forced
 development of the boxes. Most of the smaller cutting edge
 networks are still running 7513s.
 
Hmm - what I was getting at was that the big ISPs for
the most part still have a whole lot of 7513s running
around (figuratively), while if I were building a new
network from the ground up, I'd be unlikely to use
them.

 GSR was invented because the 7513s were running out of PPS.
 CEF was designed to support offloading the RP.
 
  2) they have an installed base of customers who are
  living with existing functionality - this goes back to
  reason 1 - unless there is money to be made, nobody
  wants to deploy anything.
  
  3) It makes more sense to deploy a new box at the
  edge, and eventually permit it to migrate to the core
  after it's been thoroughly proven - the IP model has
  features living on the edges of the network, while
  capacity lives in the core.  If you have 3 high-cap
  boxes in the core, it's probably easier to add a
  fourth than it is to rip the three out and replace
  them with two higher-cap boxes.
 
 The core has expanded to the edge, not the other way around.
 The aggregate backplane bandwidth requirements tend to
 drive core box evolution first while the edge box normally
 has to deal with high touch features and port multiplexing.
 These of course are becoming more and more specialized over time.
 
I agree, from a capacity perspective: the GSR began
life as a core router because it supported big pipes. 
It's only recently that it's had anywhere near the
number of features which the 7500 has (and there are
still a whole lot of specialized features which it
doesn't have).  From a feature-deployment perspective,
new boxes come in at the edge (think of the deployment
of the 7500 itself: it was an IP front-end for ATM
networks).


  4) existing management infrastructure permits the
  management of existing boxes - it's easier to deploy
  an all-new network than it is to upgrade from one
  technology/platform to another.
 
 Only if you are willing to write off your entire capital
 investment. No one is willing to do that today.

That is EXACTLY my point: as companies are unwilling to write off an
investment, they MUST keep supporting the old stuff.  Once they're
supporting the old stuff of vendor X, that provides an incentive to
get more new stuff from vendor X, if the management platform is the
same.

For instance, if I've got a Marconi ATM network, I'm
unlikely to buy new Cisco ATM gear, unless I'm either
building a parallel network, or am looking for an edge
front-end to offer new features.  
However, if I were building a new ATM network today, I
would do a bake-off between the vendors and see which
one met my needs best.

-David Barak
-Fully RFC 1925 Compliant-



Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-26 Thread David Barak


--- vijay gill [EMAIL PROTECTED] wrote:
 How would you know this?  Historically, the cutting edge technology
 has always gone into the large cores first because they are the
 ones pushing the bleeding edge in terms of capacity, power, and
 routing.
 
 /vijay

I'm not sure that I'd agree with that statement: most
of the large providers with whom I'm familiar tend to
be relatively conservative with regard to new
technology deployments, for a couple of reasons:

1) their backbones currently work - changing them
into something which may or may not work better is a
non-trivial operation, and risks the network.

2) they have an installed base of customers who are
living with existing functionality - this goes back to
reason 1 - unless there is money to be made, nobody
wants to deploy anything.

3) It makes more sense to deploy a new box at the
edge, and eventually permit it to migrate to the core
after it's been thoroughly proven - the IP model has
features living on the edges of the network, while
capacity lives in the core.  If you have 3 high-cap
boxes in the core, it's probably easier to add a
fourth than it is to rip the three out and replace
them with two higher-cap boxes.

4) existing management infrastructure permits the
management of existing boxes - it's easier to deploy
an all-new network than it is to upgrade from one
technology/platform to another.

-David Barak
-Fully RFC 1925 Compliant



Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-26 Thread vijay gill

On Thu, Feb 26, 2004 at 11:28:09AM +, [EMAIL PROTECTED] wrote:
 
  Wouldn't it be great 
 if routers had the equivalent of 'User mode Linux', each process 
 handling a service, isolated and protected from the others.  The 
 physical router would be nothing more than a generic kernel handling 
 resource allocation.  Each virtual router would have access to x amount 
 of resources and would either halt, sleep, or crash when it exhausts those 
 resources for a given time slice. 
 
 This is possible today. Build your own routers using
 the right microkernel, OSKIT and the Click Modular Router
 software and you can have this. When we restrict ourselves
 only to router packages from major vendors then we are 
 doomed to using outdated technology at inflated prices.

Tell you what Michael, build me some of those, have it pass my labs
and I'll give you millions in business. Deal? 

Let me draw it out here: 

Step 1: Buy box
Step 2: Install Click Modular Router Software
Step 3: Profit

/vijay


Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread Dave Stewart
At 10:52 AM 2/25/2004, you wrote:

recommendation come out regarding VoIP calls.  How long until a simple
power failure results in the inability to place calls?
We're already at that point.  If the power goes out at home, I'd have to 
grab a flashlight and go hunting for a regular ol' POTS-powered phone.  Or 
use the cell phone (as I did when Bubba had a few too many to drink one 
night recently and took out a power transformer).  But I do have a few old 
regular phones.  How many people don't?

Interactive Intelligence, Artisoft and many others are selling businesses 
phone systems that run entirely on a server that may or may not be 
connected to a UPS of sufficient capacity to keep the server running during 
an extended outage.  These systems are frequently handling a PRI instead of 
POTS lines, so there's no backup when the UPS dies.  Once the phone server 
goes down, no phone service.

VOIP services have the same problem.  Lights go out, that whiz-bang 
handy-dandy VOIP phone doesn't work, either.

Sure, we're talking about the end user, not the core/backbone.  But the answer 
to the question, strictly speaking, is that a simple power outage can 
result in many people being unable to make a simple phone call (or at best, 
relying on their cell phones... assuming the generator fired up at their 
nearest cell site when the lights went out).




Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread Steven M. Bellovin

In message [EMAIL PROTECTED], Jared Mauch writes:

   (I know this is treading on a few what if scenarios, but it could
actually mean a lot if we convert to a mostly IP world as I see the trend).


I think your analysis is dead-on.

--Steve Bellovin, http://www.research.att.com/~smb




Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread Matthew Crocker

	I'm saying that if a network had a FR/ATM/TDM failure in the past
it would be limited to just the FR/ATM/TDM network.  (well, aside from
any IP circuits that are riding that FR/ATM/TDM network).  We're now seeing
the change from the TDM based network being the underlying network to the
IP/MPLS Core being this underlying network.

What it means is that a failure of the IP portion of the network
that disrupts the underlying MPLS/GMPLS/whatnot core that is now
transporting these FR/ATM/TDM services, does pose a risk.  Is the risk
greater than in the past, relying on the TDM/WDM network?  I think that
there could be some more spectacular network failures to come.  Overall
I think people will learn from these to make the resulting networks
more reliable.  (eg: there has been a lot learned as a result of the
NE power outage last year).
Internet traffic should run over an IP/MPLS core in a separate session
(VRF, virtual context, whatever...) so the MPLS core never sees the full
BGP routing information of the Internet.  So long as router vendors can
provide proper protection between routing instances so one virtual
router can't consume all memory/CPU, the MPLS core should be pretty
stable.  The core MPLS network and control plane should be completely
separate from regular traffic and much less complex for any given
carrier.  VoIP, Internet, EoM, AToM, FRoM, and TDMoM should all run in
separate sessions, all isolated from each other.

A router should act like a unix machine, treating each MPLS/VRF session
as a separate user, isolating and protecting users from each other, and
providing resource allocation and limits.  I'm not sure of the
effectiveness of current generation routers, but it should be coming
down the line.  That said, the IP/MPLS core should be more stable than
traditional TDM networks; the Internet itself may not stabilize, but
that shouldn't affect the core.  What happened at L3 was an internet
outage, which shouldn't in theory affect the MPLS core.

Think back 10 years, when it was common for a unix binary to wipe out a
machine by consuming all resources (fork bombs, anyone?).  Unix machines
have come a long way since then.  Routers need to follow the same
progression.  What is the routing equivalent of 'while (1) { fork(); };'?
Currently it is massive BGP flapping that chews up resources.  A good
router should be immune to that, and can be with proper resource management.
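One mechanism that already exists for exactly this "routing fork bomb" is route
flap damping (RFC 2439): each flap adds a penalty that decays exponentially, and
the route is suppressed while the penalty sits above a threshold. A rough sketch
of that logic (mine, with illustrative parameters, not any particular vendor's
implementation):

```python
import math
import time

class FlapDamping:
    """Per-prefix flap damping, RFC 2439 style: a penalty per flap, exponential
    decay with a half-life, suppression above one threshold, reuse below another."""

    def __init__(self, penalty_per_flap=1000, suppress=2000, reuse=750, half_life=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress = suppress
        self.reuse = reuse
        self.half_life = half_life          # seconds for the penalty to halve
        self.penalty = 0.0
        self.last_update = time.time()
        self.suppressed = False

    def _decay(self):
        now = time.time()
        self.penalty *= math.exp(-math.log(2) * (now - self.last_update) / self.half_life)
        self.last_update = now

    def flap(self):
        """Call on each withdraw/re-announce of the prefix."""
        self._decay()
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress:
            self.suppressed = True

    def usable(self):
        """True if the route may be installed/advertised right now."""
        self._decay()
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False
        return not self.suppressed

state = FlapDamping()
for _ in range(3):          # three quick flaps push the penalty past the suppress threshold
    state.flap()
print(state.usable())       # False until the penalty decays back below the reuse limit
```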

-Matt



RE: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread Sean Crandall

From Jared:

   I keep hearing of Frame-Relay and ATM signaling that is 
 going to happen in large providers' MPLS cores.  That's right: 
 your safe TDM-based services will be transported over 
 someone's IP backbone first.
 This means if they don't protect their IP network, the TDM 
 services could fail.  These types of CES services are not 
 just limited to Frame and ATM.
 (Did anyone with frame/atm/vpn services from Level3 
 experience the same outage?)

We use Level3 for IP transit and transport (both DS-3 and Ethernet over
MPLS (via Martini)) all over the country.  As with everyone else, we saw
the problems with the transit traffic out of SJC and ATL.  However, our
transport services were not affected at all by the problems.  In fact, I
just ended up sending my Level3-SJC bound traffic to LAX via Level3
which was going through the same equipment as the transit traffic which
was having problems.

From Pete:

  From this, it can be deduced that reducing unnecessary 
 system complexity and shortening the strings of pearls that 
 make up the system contribute to better availability and 
 resiliency of the system. Diversity works both ways in this 
 equation. It lessens the probability of the same failure hitting the 
 majority of your boxes but at the same time increases the 
 knowledge needed to understand and maintain the whole system.
 
 I would vote for the KISS principle if in doubt.

I agree.  Granted, the string of pearls is always going to be pretty
long, but there definitely is a trend, from what I have seen with
customers, to make the string longer than it needs to be.

-Sean

Sean P. Crandall
VP Engineering Operations
MegaPath Networks Inc.
6691 Owens Drive
Pleasanton, CA  94588
(925) 201-2530 (office)
(925) 201-2550 (fax)






Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread Paul Vixie

[EMAIL PROTECTED] (Jared Mauch) writes:

 ...
   I keep hearing of Frame-Relay and ATM signaling that is going
 to happen in large providers' MPLS cores.  That's right: your safe TDM-based
 services will be transported over someone's IP backbone first.

One of my DS3/DS1 vendors recently told me of a plan to use MPLS for part
of the route inside their switching center.  I said not with my circuits
you won't.  Once they understood that I was willing to take my business
elsewhere or simply do without, they decided that an M13 was worth having
after all.  My advice is, walk softly but carry a big stick.  When we all
say everything over IP that means teaching more devices how to speak
802.11 or other packet-based access protocols rather than giving them ATM
or F/R or dialup modem circuitry.  It does *not* mean simulating an ISO-L1
or ISO-L2 circuit using a ISO-L3 network.  (Ick.)
-- 
Paul Vixie


Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread Jeff S Wheeler

On Wed, 2004-02-25 at 13:34, David Meyer wrote:
 Is it that sharing fate in the switching fabric (as
 opposed to say, in the transport fabric, or even
 conduit) reduces the resiliency of a given service (in
 this case FR/ATM/TDM), and as such poses the danger
 you describe?

Our vendors will tell us that the IP routing fabrics of today are indeed
quite reliable and resistant to failure, and they may be right when it
comes to hardware MTBF.  However, the IP network relies a great deal
more on shared/inter-domain, real-time configuration (BGP) than do any
traditional telecommunications networks utilizing the tried and true
technologies referenced above.

Yesterday we witnessed a large scale failure that has yet to be
attributed to configuration, software, or hardware; however one need
look no further than the 168.0.0.0/6 thread, or the GBLX customer who
leaked several tens of thousands of their peers' routes to GBLX shortly
before the Level(3) event, to show that configuration-induced failures
in the Internet reach much further than in traditional TDM or single
vendor PVC networks.

The single point of failure we all share is our reliance on a correct
BGP table, populated by our peers and transit providers; and kept free
of errors by those same operators.

-- JSW




RE: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread Bora Akyol


SNIP
 
 I think it has been proven a few times that physical fate sharing is 
 only a minor contributor to the total connectivity availability, while 
 system complexity, mostly controlled by software written and operated by 
 imperfect humans, contributes a major share to end-to-end availability.
 
  From this, it can be deduced that reducing unnecessary system 
 complexity and shortening the strings of pearls that make up the system 
 contribute to better availability and resiliency of the system. Diversity 
 works both ways in this equation. It lessens the probability of the same 
 failure hitting the majority of your boxes but at the same time increases 
 the knowledge needed to understand and maintain the whole system.
 
 I would vote for the KISS principle if in doubt.

Hi Pete

This train of thought works well only for accidental failures.
Unfortunately, if you have an adversary that is bent on disturbing
communications and damaging the critical infrastructure of a country,
physical fate sharing makes things less robust than they need to be.
By the way, no disagreement from me on any of the points you make.
Keeping it simple and robust is definitely a good first step. Having
diverse paths in the fiber infrastructure is also necessary.

Regards, 

Bora




Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread Matthew Crocker

Yesterday we witnessed a large scale failure that has yet to be
attributed to configuration, software, or hardware; however one need
look no further than the 168.0.0.0/6 thread, or the GBLX customer who
leaked several tens of thousands of their peers' routes to GBLX shortly
This should be rewritten 'Or GBLX, who LET one of their customers leak 
several tens of thousands of their peers' routes...'.  I'm sorry, but a 
network should be able to protect itself from its users and customers.  
BGP filters are not that hard to figure out, and peer prefix limits 
should be part of every config.  Don't trust the guy at the other end 
of the pipe to do the right thing.
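The peer-prefix-limit point is mechanically simple; a sketch of the idea (mine,
not any vendor's code): count what a neighbour is announcing and tear the session
down when it blows past what you agreed to carry, rather than absorbing a leaked
table.

```python
class PeerSession:
    """Track the prefixes announced by one BGP neighbour and enforce a ceiling,
    the way a maximum-prefix knob does: exceed it and the session comes down."""

    def __init__(self, peer, max_prefixes):
        self.peer = peer
        self.max_prefixes = max_prefixes
        self.prefixes = set()
        self.established = True

    def on_update(self, announced, withdrawn):
        if not self.established:
            return
        self.prefixes -= set(withdrawn)
        self.prefixes |= set(announced)
        if len(self.prefixes) > self.max_prefixes:
            # Likely a leak: drop the session instead of carrying the junk.
            self.established = False
            print(f"{self.peer}: prefix limit {self.max_prefixes} exceeded, tearing down session")

# A customer expected to send ~500 routes suddenly leaks 600: the session drops.
session = PeerSession("customer-A", max_prefixes=500)
session.on_update(announced=[f"10.{i // 256}.{i % 256}.0/24" for i in range(600)], withdrawn=[])
```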

-Matt



RE: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread Erik Haagsman

On Wed, 2004-02-25 at 20:16, Bora Akyol wrote:
 This train of thought works well only for accidental failures.
 Unfortunately, if you have an adversary that is bent on disturbing
 communications and damaging the critical infrastructure of a country,
 physical fate sharing makes things less robust than they need to be.
 By the way, no disagreement from me on any of the points you make.
 Keeping it simple and robust is definitely a good first step. Having
 diverse paths in the fiber infrastructure is also necessary.

I don't think fate sharing prevents us from having diverse paths, since
this is where redundancy comes in. Even if all services run over the
same fibre paths, there isn't any problem as long as there's a
sufficient number of alternative paths in case any of the paths goes
down. 

Cheers,

-- 
---
Erik Haagsman
Network Architect
We Dare BV
tel: +31.10.7507008
fax: +31.10.7507005
http://www.we-dare.nl






Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread David Meyer

Jared,

  I keep hearing of Frame-Relay and ATM signaling that is going
 to happen in large providers' MPLS cores.  That's right: your safe TDM-based
 services will be transported over someone's IP backbone first.
 This means if they don't protect their IP network, the TDM services could
 fail.  These types of CES services are not just limited to Frame and ATM.
 (Did anyone with frame/atm/vpn services from Level3 experience the
 same outage?)

Is your concern that carrying FR/ATM/TDM over a packet
core (IP or MPLS or ..) will, via some mechanism, reduce
the resilience of those services, of the packet core,
of both, or something else?

  We're at (or already past) the dangerous point of network
 convergence.  While I suspect that nobody directly died as a result of
 the recent outage, the trend to link together hospitals, doctors
 and other agencies via the Internet and a series of VPN clients continues
 to grow.  (I say this knowing how important the internet is to
 the medical community, reading x-rays and other data scans at
 home for the oncall is quite common). 

Again, I'm unclear as to what constitutes the dangerous
point of network convergence, or for that matter, what
constitutes convergence (I'm sure we have close to a
common understanding, but it's worth making that
explicit).  In any event, can you be more explicit about
what you mean here?

Thanks,

Dave





Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread dan

Convergence, and our lust to throw TDM/ATM infrastructure in the garbage,
is an area very near and dear to my heart.

I apologize if I am being a bit redundant here... but from our
perspective, we are an ISP that is under a lot of pressure to deploy a
VoIP solution.  I just don't think we can... It's just not reliable enough
yet. Period.

In a TDM environment the end node switch is incredibly reliable.  I can't
ever remember in my 30 years on this earth when the end node my telephone
was connected to was EVER down, not once, not EVER.  A circuit-switched
environment gives us inherent admission control (if there are not enough
tandem/interswitch trunks we just get a fast busy).  This allows them to
guarantee end-to-end quality.  The one problem is that if any of the
tandems along the path my call is connected through gets nuked off the face
of the earth, I am completely off the air.

In an IP (packet-based) environment, theoretically routing protocols can
reroute my call while it is in progress if a catastrophic event occurs,
like the entire NE losing power.  The inherent problem with IP is that it
has no admission control, and that its fundamentally resilient design was to
make sure that the core of the network knew nothing about the flows
within, so that it _could_ survive a failure.  This design goal is the
problem when trying to guarantee end-to-end quality of service.  Without
admission control, we can pack it full so that nothing works. 
Variable-length frames mean that we have little idea of what is coming
down the pipe next.
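The admission-control difference is easy to state in code: the circuit world
simply refuses the (n+1)th call when only n trunks exist, so the calls it does
accept keep their quality, while plain IP keeps accepting traffic and degrades
everyone. A toy sketch of trunk-style admission (mine, not any product's):

```python
class TrunkGroup:
    """Circuit-style admission control: a call either gets a dedicated trunk or is
    rejected up front (the 'fast busy'), so admitted calls keep their quality."""

    def __init__(self, trunks):
        self.capacity = trunks
        self.in_use = 0

    def setup_call(self):
        if self.in_use >= self.capacity:
            return False              # fast busy: refuse rather than degrade everyone
        self.in_use += 1
        return True

    def teardown_call(self):
        if self.in_use > 0:
            self.in_use -= 1

group = TrunkGroup(trunks=2)
print([group.setup_call() for _ in range(3)])    # [True, True, False]
```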

This can all be solved by massively overbuilding our network.

Other than the occasional DoS against an area of the network, outages
caused by overuse are relatively rare.

The big problem is the end node hardware in IP networks.  Routers crash
ALL the time; it is actually a joke.  Yes, theoretically a user could
have 3 separate connections to the Internet and use their VoIP phone and
be happy, but that is not the case.  They buy Internet service from one
place, and that is aggregated in the same building as that TDM end node in the
voice world (usually).  That aggregation (access) layer is the single
biggest vulnerability in both worlds.  It just does not fail in the TDM
world like it does in the IP world.  We need to find ways to make that
work better in the IP world so it can be as reliable as the TDM world.  I
realize that we (the public) are asking IP hardware vendors for new
features far faster than they can be released reliably... but surely we can
find ways to fail it over more effectively than it does now...


Dan.







Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread Jared Mauch

On Wed, Feb 25, 2004 at 09:44:51AM -0800, David Meyer wrote:
   Jared,
 
 I keep hearing of Frame-Relay and ATM signaling that is going
  to happen in large providers' MPLS cores.  That's right: your safe TDM-based
  services will be transported over someone's IP backbone first.
  This means if they don't protect their IP network, the TDM services could
  fail.  These types of CES services are not just limited to Frame and ATM.
  (Did anyone with frame/atm/vpn services from Level3 experience the
  same outage?)
 
   Is your concern that carrying FR/ATM/TDM over a packet
   core (IP or MPLS or ..) will, via some mechanism, reduce
   the resilience of those services, of the packet core,
   of both, or something else?

I'm saying that if a network had a FR/ATM/TDM failure in the past
it would be limited to just the FR/ATM/TDM network.  (well, aside from
any IP circuits that are riding that FR/ATM/TDM network).  We're now seeing
the change from the TDM based network being the underlying network to the
IP/MPLS Core being this underlying network.

What it means is that a failure of the IP portion of the network
that disrupts the underlying MPLS/GMPLS/whatnot core that is now 
transporting these FR/ATM/TDM services, does pose a risk.  Is the risk
greater than in the past, relying on the TDM/WDM network?  I think that
there could be some more spectacular network failures to come.  Overall
I think people will learn from these to make the resulting networks
more reliable.  (eg: there has been a lot learned as a result of the
NE power outage last year).

 We're at (or already past) the dangerous point of network
  convergence.  While I suspect that nobody directly died as a result of
  the recent outage, the trend to link together hospitals, doctors
  and other agencies via the Internet and a series of VPN clients continues
  to grow.  (I say this knowing how important the internet is to
  the medical community, reading x-rays and other data scans at
  home for the oncall is quite common). 
 
   Again, I'm unclear as to what constitutes the dangerous
   point of network convergence, or for that matter, what
   constitutes convergence (I'm sure we have close to a
 common understanding, but it's worth making that
   explicit).  In any event, can you be more explicit about
   what you mean here?

Transporting FR/ATM/TDM/Voice over the IP/MPLS core, as well as
some of the technology shifts (VoIP, Voice over Cable, etc.), is removing
some of the resilience from the end-user network that existed in the past.

I think that most companies that offer frame-relay and also
have an IP network are looking at moving their frame-relay onto their IP
network.  (I could be wrong here, clearly.)  This means that overall we need
to continue to provide a more reliable IP network than in the past.  It
is critically important.  I think that Pete Templin is right to question
people's statements that nobody died because of a network outage.  While
I think that the answer is likely no, will that be the case in 2-3 years
as Qwest, SBC, Verizon, and others move to a more native VoIP infrastructure?

A failure within their IP network could result in some emergency
calling (e.g., 911) not working.  While there are alternate means of calling
for help (cell phone, etc.) that may not rely upon the same network elements
that have failed, some people would consider a 60-second delay as you
switch contact methods too long and an excessive risk to someone's health.

I think it bolsters the case for personal emergency preparedness,
but also for spending more time looking at the services you purchase.  If
you are relying on a private frame-relay circuit as backup for your VPN over
the public internet, knowing whether it is switched over an IP network becomes
more important.

(I know this is treading on a few what if scenarios, but it could
actually mean a lot if we convert to a mostly IP world as I see the trend).

- jared

-- 
Jared Mauch  | pgp key available via finger from [EMAIL PROTECTED]
clue++;  | http://puck.nether.net/~jared/  My statements are only mine.


Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread David Meyer

Jared,

 Is your concern that carrying FR/ATM/TDM over a packet
 core (IP or MPLS or ..) will, via some mechanism, reduce
 the resilience of those services, of the packet core,
 of both, or something else?
 
  I'm saying that if a network had a FR/ATM/TDM failure in
 the past it would be limited to just the FR/ATM/TDM network.
 (well, aside from any IP circuits that are riding that FR/ATM/TDM
 network).  We're now seeing the change from the TDM based
 network being the underlying network to the IP/MPLS Core
 being this underlying network. 
 
  What it means is that a failure of the IP portion of the network
 that disrupts the underlying MPLS/GMPLS/whatnot core that is now 
 transporting these FR/ATM/TDM services, does pose a risk.  Is the risk
 greater than in the past, relying on the TDM/WDM network?  I think that
 there could be some more spectacular network failures to come.  Overall
 I think people will learn from these to make the resulting networks
 more reliable.  (eg: there has been a lot learned as a result of the
 NE power outage last year).

I think folks can almost certainly agree that when you
share fate, well, you share fate. But maybe there is
something else here. Many of these services have always
shared fate at the transport level; that is, in most
cases, I didn't have a separate fiber plant/DWDM
infrastructure for FR/ATM/TDM, IP, Service X, etc,  so
fate was already being/has always been shared in the
transport infrastructure. 

So maybe try this question: 

  Is it that sharing fate in the switching fabric (as
  opposed to say, in the transport fabric, or even
  conduit) reduces the resiliency of a given service (in
  this case FR/ATM/TDM), and as such poses the danger
  you describe?

Is this an accurate characterization of your point? If
so, why should sharing fate in the switching fabric
necessarily reduce the resiliency of those services
that share that fabric (i.e., why should this be so)? I
have some ideas, but I'm interested in what ideas other
folks have.   

Thanks,

Dave




Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread Petri Helenius
David Meyer wrote:

	Is this an accurate characterization of your point? If
	so, why should sharing fate in the switching fabric
	necessarily reduce the resiliency of those services
	that share that fabric (i.e., why should this be so)? I
	have some ideas, but I'm interested in what ideas other
	folks have.   
 

I think it has been proven a few times that physical fate sharing is 
only a minor contributor to the total connectivity availability, while 
system complexity, mostly controlled by software written and operated by 
imperfect humans, contributes a major share to end-to-end availability.

From this, it can be deduced that reducing unnecessary system 
complexity and shortening the strings of pearls that make up the system 
contribute to better availability and resiliency of the system. Diversity 
works both ways in this equation. It lessens the probability of the same 
failure hitting the majority of your boxes but at the same time increases 
the knowledge needed to understand and maintain the whole system.

I would vote for the KISS principle if in doubt.

Pete



Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread Jared Mauch

On Wed, Feb 25, 2004 at 10:34:55AM -0800, David Meyer wrote:
   Jared,
 
Is your concern that carrying FR/ATM/TDM over a packet
core (IP or MPLS or ..) will, via some mechanism, reduce
the resilience of those services, of the packet core,
of both, or something else?
  
 I'm saying that if a network had a FR/ATM/TDM failure in
  the past it would be limited to just the FR/ATM/TDM network.
  (well, aside from any IP circuits that are riding that FR/ATM/TDM
  network).  We're now seeing the change from the TDM based
  network being the underlying network to the IP/MPLS Core
  being this underlying network. 
  
 What it means is that a failure of the IP portion of the network
  that disrupts the underlying MPLS/GMPLS/whatnot core that is now 
  transporting these FR/ATM/TDM services, does pose a risk.  Is the risk
  greater than in the past, relying on the TDM/WDM network?  I think that
  there could be some more spectacular network failures to come.  Overall
  I think people will learn from these to make the resulting networks
  more reliable.  (eg: there has been a lot learned as a result of the
  NE power outage last year).
 
   I think folks can almost certainly agree that when you
   share fate, well, you share fate. But maybe there is
   something else here. Many of these services have always
   shared fate at the transport level; that is, in most
   cases, I didn't have a separate fiber plant/DWDM
   infrastructure for FR/ATM/TDM, IP, Service X, etc,  so
   fate was already being/has always been shared in the
   transport infrastructure. 
 
   So maybe try this question: 
 
 Is it that sharing fate in the switching fabric (as
 opposed to say, in the transport fabric, or even
 conduit) reduces the resiliency of a given service (in
 this case FR/ATM/TDM), and as such poses the danger
 you describe?

I think the threat is that the switching fabric and
forwarding plane can be disrupted by more things than exist in a 
pure TDM based network.  This isn't to say that the packet (or even
label) network isn't the future of these services, it's just
that today there are some interesting problems that still exist as
the technology continues to mature.

   Is this an accurate characterization of your point? If
   so, why should sharing fate in the switching fabric
   necessarily reduce the resiliency of those services
   that share that fabric (i.e., why should this be so)? I
   have some ideas, but I'm interested in what ideas other
   folks have.   

I believe that there still exist a number of cases where the
switching fabric can get out-of-sync with the control-plane.

If events are not properly triggered back upstream (i.e., adjacencies
stay up, BGP remains fairly stable) and you end up dumping a lot of
traffic on the floor, it's sometimes a bit more difficult to diagnose
than loss of light on a physical path.
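One practical way that failure mode gets caught is to probe the forwarding plane
independently of the control plane and compare the two; a rough sketch of such a
check (mine; the two probe callables are assumed helpers, not any vendor's API):

```python
def check_path(control_plane_up, data_plane_probe):
    """Compare what the control plane claims with what the forwarding plane does.

    control_plane_up: callable returning True if the adjacency/BGP session is
    established.  data_plane_probe: callable returning True only if an
    end-to-end probe through the forwarding path actually got replies back.
    """
    session_ok = control_plane_up()
    forwarding_ok = data_plane_probe()
    if session_ok and not forwarding_ok:
        # The hard case: adjacencies stay up, BGP looks stable, traffic is on the floor.
        return "control and forwarding plane out of sync: silent packet loss"
    if not session_ok:
        return "control plane down: visible, and routing will reconverge around it"
    return "ok"

# Toy usage with stubbed probes standing in for real session checks and pings.
print(check_path(lambda: True, lambda: False))
```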

On the sunny side, I see this improving over time.  Software
bugs will be squashed.  Poorly designed networks will be reconfigured to
better handle these situations.

- jared

-- 
Jared Mauch  | pgp key available via finger from [EMAIL PROTECTED]
clue++;  | http://puck.nether.net/~jared/  My statements are only mine.


Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread David Meyer

Petri,

 I think it has been proven a few times that physical fate sharing is 
 only a minor contributor to the total connectivity availability, while 
 system complexity, mostly controlled by software written and operated by 
 imperfect humans, contributes a major share to end-to-end availability.

Yes, and at the very least would seem to match our
intuition and experience. 

 From this, it can be deduced that reducing unnecessary system 
 complexity and shortening the strings of pearls that make up the system 
 contribute to better availability and resiliency of the system. Diversity 
 works both ways in this equation. It lessens the probability of the same 
 failure hitting the majority of your boxes but at the same time increases 
 the knowledge needed to understand and maintain the whole system.

No doubt. However, the problem is: What constitutes
unnecessary system complexity? A designed system's
robustness comes in part from its complexity. So it's not
that complexity is inherently bad; rather, it is just
that you wind up with extreme sensitivity to outlying
events which is exhibited by catastrophic cascading
failures if you push a system's complexity past some
point; these are the so-called robust yet fragile
systems (think NE power outage).  

BTW, the extreme sensitivity to outlying events/catastrophic
cascading failures property is a signature of a class of
dynamical systems of which we believe the Internet is an
example; unfortunately, the machinery we currently have
(in dynamical systems theory) isn't yet mature enough to
provide us with engineering rules.

 I would vote for the KISS principle if in doubt.

Truly. See RFC 3439 and/or
http://www.1-4-5.net/~dmm/complexity_and_the_internet. I
also said a few words about this topic at NANOG26,
where we had a panel on it (my slides are at
http://www.maoz.com/~dmm/NANOG26/complexity_panel).

Dave




Re: Converged Networks Threat (Was: Level3 Outage)

2004-02-25 Thread Petri Helenius
David Meyer wrote:

	
	No doubt. However, the problem is: What constitutes
	unnecessary system complexity? A designed system's
	robustness comes in part from its complexity. So it's not
	that complexity is inherently bad; rather, it is just
	that you wind up with extreme sensitivity to outlying
	events which is exhibited by catastrophic cascading
	failures if you push a system's complexity past some
	point; these are the so-called robust yet fragile
	systems (think NE power outage).  
 

I think you hit the nail on the head. I view complexity as a diminishing 
returns play. When you increase complexity, the increase benefits a 
decreasing percentage of the users. A way to manage complexity is 
splitting large systems into smaller pieces and trying to make the pieces 
independent enough to survive the failure of a neighboring piece. This 
approach exists at least in the marketing materials of many 
telecommunications equipment vendors. The question then becomes, what 
good is a backbone router without its BGP process? So far I haven't seen a 
router with a disposable entity on a per-interface or per-peer basis, so that 
if the BGP speaker to 10.1.1.1 crashes the system would still be able to 
maintain its relationship to 10.2.2.2. Obviously the point of single-device 
availability becomes moot if we can figure out a way to route/switch 
around the failed device quickly enough. Today we don't even have a 
generic IP-layer liveness protocol, so by default packets will be 
blackholed for a definite duration until a routing protocol starts to 
miss its hello packets. (I'm aware of work towards this goal.)
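A sketch of the "disposable entity per peer" idea (my own illustration, not a
shipping router): run one speaker process per neighbour under a supervisor, so a
crash while talking to 10.1.1.1 never takes down the session to 10.2.2.2; the
supervisor just restarts the dead one.

```python
import subprocess

class PerPeerSupervisor:
    """One child process per BGP neighbour: a crash is contained to that peer,
    and the supervisor simply respawns it."""

    def __init__(self, speaker_cmd, peers):
        # speaker_cmd is a hypothetical per-peer BGP speaker binary.
        self.speaker_cmd = speaker_cmd
        self.children = {peer: self._spawn(peer) for peer in peers}

    def _spawn(self, peer):
        return subprocess.Popen([self.speaker_cmd, "--neighbor", peer])

    def poll(self):
        for peer, proc in self.children.items():
            if proc.poll() is not None:        # this peer's speaker died
                # Only this one session is re-established; the others are untouched.
                self.children[peer] = self._spawn(peer)

# Hypothetical usage ("./bgp-speaker" is a placeholder daemon, not a real binary):
# supervisor = PerPeerSupervisor("./bgp-speaker", ["10.1.1.1", "10.2.2.2"])
```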

In summary, I feel systems should be designed to run independently in all 
failure modes. If you lose 1 to n neighbors, the system should be 
self-sufficient, figure out the situation near-immediately, and continue 
working while negotiating with its neighbors about the overall picture.

Pete