Re: Level 3 RFO

2005-10-24 Thread Florian Weimer

* Daniel Roesen:

 On Sun, Oct 23, 2005 at 09:48:58PM +0200, Florian Weimer wrote:
 This isn't the first time this has happened to an ISP. 8-(

 Indeed.

 Are there any configuration tweaks which can locally confine such an
 event?  Something like the hard prefix limit for BGP, perhaps.

 JunOS:
 set protocols ospf prefix-export-limit n
 set protocols isis level n prefix-export-limit n

Wouldn't an import limit be better?  If you've got an
almost-fully-meshed MPLS core, export limits won't really work, will
they?

In more traditional networks, I can imagine that it helps to confine
anomalies.  Has anybody tried that on a real network? 8-)


Re: Level 3 RFO

2005-10-24 Thread Daniel Roesen

On Mon, Oct 24, 2005 at 01:25:23PM +0200, Florian Weimer wrote:
  Are there any configuration tweaks which can locally confine such an
  event?  Something like the hard prefix limit for BGP, perhaps.
 
  JunOS:
  set protocols ospf prefix-export-limit n
  set protocols isis level n prefix-export-limit n
 
 Wouldn't an import limit be better?

We're talking link-state protocols here... they need to have the same
view everywhere. The only thing you can limit is what you inject into
the (IGP-)global view.
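
The point about needing the same view everywhere can be illustrated with a toy model. This is a minimal sketch, not a real OSPF or IS-IS implementation: a link-state database is reduced to a set of prefixes, the router names and leaked prefixes are made up, and the per-router "import limit" is purely hypothetical (it stands in for the knob that link-state protocols do not offer). Truncating at the limit is also an assumption of the sketch, not a description of what any router OS does when its limit is hit.

```python
# Toy model only: a router's link-state database is reduced to a set of
# prefixes, and flooding is reduced to "everyone receives everything".

def export_limited(prefixes, export_limit):
    """Cap what one router injects into the IGP (the export-limit idea).

    Simple truncation is an assumption of this sketch.
    """
    return sorted(prefixes)[:export_limit]


def flood(originated, routers, import_limits=None):
    """Return each router's database after `originated` has been flooded.

    `import_limits` maps router name -> hypothetical per-router cap.
    """
    import_limits = import_limits or {}
    databases = {}
    for name in routers:
        accepted = sorted(originated)
        if name in import_limits:
            accepted = accepted[:import_limits[name]]
        databases[name] = set(accepted)
    return databases


def consistent(databases):
    """True if every router holds exactly the same database."""
    views = list(databases.values())
    return all(view == views[0] for view in views)


leaked = [f"10.0.{i}.0/24" for i in range(200)]  # made-up leaked routes
routers = ["r1", "r2", "r3"]                     # made-up router names

# Limit applied where the routes enter the IGP: small, identical databases.
at_source = flood(export_limited(leaked, 10), routers)
# Hypothetical limit applied on one receiving router: databases diverge.
at_receiver = flood(leaked, routers, import_limits={"r2": 10})

print(consistent(at_source), len(at_source["r1"]))      # True 10
print(consistent(at_receiver), len(at_receiver["r1"]))  # False 200
```

In the second case r2 computes paths from a different database than r1 and r3, which is exactly the inconsistency a link-state protocol cannot tolerate.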

 If you've got an almost-fully-meshed MPLS core, export limits won't
 really work, will they?

I don't understand this question. What has MPLS to do with IGP route
filtering?!?


Regards,
Daniel

-- 
CLUE-RIPE -- Jabber: [EMAIL PROTECTED] -- [EMAIL PROTECTED] -- PGP: 0xA85C8AA0


Re: Level 3 RFO

2005-10-24 Thread Florian Weimer

* Daniel Roesen:

 On Mon, Oct 24, 2005 at 01:25:23PM +0200, Florian Weimer wrote:
  Are there any configuration tweaks which can locally confine such an
  event?  Something like the hard prefix limit for BGP, perhaps.
 
  JunOS:
  set protocols ospf prefix-export-limit n
  set protocols isis level n prefix-export-limit n
 
 Wouldn't an import limit be better?

 We're talking link-state protocols here... they need to have the same
 view everywhere. The only thing you can limit is what you inject into
 the (IGP-)global view.

What a pity.  There isn't an ugly workaround, either?  There has to be
something that can be done, given the operational risk that is
involved.

Certainly, this adds a new dimension to the distributed single point
of failure concept. 8-(

 If you've got an almost-fully-meshed MPLS core, export limits won't
 really work, will they?

 I don't understand this question. What has MPLS to do with IGP route
 filtering?!?

It's the almost fully-meshed part.  In such a setup, a single router
which exceeds the limit can affect a large part of the network,
even if other routers do not propagate the bogus data.

But as you say, if the limit you mentioned is just a local limit on
redistribution to the IGP for a single router, my point is moot--if
it's in the IGP, you lose because the limit does not apply to routes
which are received over the IGP.


Re: Level 3 RFO

2005-10-23 Thread Florian Weimer

 However, due to the number of flooded LSAs, other devices in the
 Level 3 network had difficulty fully loading the OSPF tables and
 processing the volume of updates.  This caused abnormal conditions
 within portions of the Level 3 network.  Manual intervention on
 specific routers was required to allow a number of routers to return
 to a normal routing state.

This isn't the first time this has happened to an ISP. 8-(

Are there any configuration tweaks which can locally confine such an
event?  Something like the hard prefix limit for BGP, perhaps.  (I'm
not an OSPF expert, and understand that things are generally more
difficult with link-state protocols.)


Re: Level 3 RFO

2005-10-23 Thread Daniel Roesen

On Sun, Oct 23, 2005 at 09:48:58PM +0200, Florian Weimer wrote:
 This isn't the first time this has happened to an ISP. 8-(

Indeed.

 Are there any configuration tweaks which can locally confine such an
 event?  Something like the hard prefix limit for BGP, perhaps.

JunOS:
set protocols ospf prefix-export-limit n
set protocols isis level n prefix-export-limit n

I'm told IOS has the ~same.


Best regards,
Daniel

-- 
CLUE-RIPE -- Jabber: [EMAIL PROTECTED] -- [EMAIL PROTECTED] -- PGP: 0xA85C8AA0


Level 3 RFO

2005-10-22 Thread erikk


Customer Information

   Customer Company Name:  (Internap)
   Customer Contact Information:  ([EMAIL PROTECTED])
   Customer Location:  (All services with Level3 Communications)
   Original Ticket Number:  SM Parent 1429209
   Customer Impact:  Outage



Event Summary

   Outage location:  IP North America, Trans-Atlantic and European Markets
   Ticket Create Date and Time:  10/21/2005 12:01 MDT
   Service Restore Date and Time:  Between 10/21/2005 12:25 MDT and
   10/21/2005 5:31 MDT, depending on location
   Total Duration:  Varied by Location



   Event Description:

A configuration update was applied to an edge router in Chicago as part of
an approved low-risk maintenance activity. This validated and approved
configuration change was applied to four other major markets with no impact.
However, in this specific case the configuration was corrupted during the
deployment process on this specific edge router.  Upon load of the corrupted
configuration, the device created an open-ended policy allowing this
router's routes to be redistributed to OSPF.



The engineering team immediately reverted to the previous saved
configuration to mitigate route propagation.  The rollback was followed by
deliberate router isolation and a complete device reload to ensure no stale
LSAs (Link State Advertisements) existed on the device; this was completed by
12:08 MDT.  After reloading the edge router, the initial cause of the event
was effectively mitigated.  However, due to the number of flooded LSAs,
other devices in the Level 3 network had difficulty fully loading the OSPF
tables and processing the volume of updates.  This caused abnormal
conditions within portions of the Level 3 network.  Manual intervention on
specific routers was required to allow a number of routers to return to a
normal routing state.
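
Why densely connected routers suffer most from a burst of LSAs can be sketched with a naive flooding model. This is an illustration under stated assumptions, not a description of the Level 3 network or of a full OSPF flooding procedure: on first receipt a router forwards the LSA to every neighbour except the one it came from, duplicates are counted but dropped, and acknowledgements, retransmissions and pacing are ignored. The function names and topologies are hypothetical.

```python
from collections import deque


def flood_copies(adjacency, origin):
    """Count the copies of one LSA each router receives under naive flooding.

    Rule modelled: on first receipt, a router forwards the LSA to every
    neighbour except the one it arrived from; later copies are dropped.
    """
    received = {router: 0 for router in adjacency}
    seen = {origin}
    queue = deque((origin, neighbour) for neighbour in adjacency[origin])
    while queue:
        sender, receiver = queue.popleft()
        received[receiver] += 1
        if receiver in seen:
            continue  # duplicate copy: counted, but not flooded again
        seen.add(receiver)
        for neighbour in adjacency[receiver]:
            if neighbour != sender:
                queue.append((receiver, neighbour))
    return received


def full_mesh(n):
    """Every router adjacent to every other router."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def ring(n):
    """Each router adjacent to exactly two others (n >= 3)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


mesh_total = sum(flood_copies(full_mesh(5), origin=0).values())
ring_total = sum(flood_copies(ring(5), origin=0).values())
print(mesh_total, ring_total)  # 16 6
```

In this model a single LSA costs 16 receptions in a five-router full mesh against 6 in a five-router ring, and the per-LSA cost is multiplied by every LSA in the burst. Fewer adjacencies per router mean fewer duplicate copies to process.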





Root Cause Analysis



A loopback redistribution statement was committed in an erroneous state.









Repair



   On devices with a large number of adjacent neighbors, a selective
process of disabling interfaces on redundant paths or restarting the OSPF
process stabilized the affected portions of the network.



Future Preventive Actions



The Level 3 engineering team is currently analyzing the event in order to
determine an appropriate action plan.  Details of this specific plan will be
available after the analysis is complete.