Re: [c-nsp] BGP Hold time expired/ospf dropping 6500 Sup720-3BXL

Jason LeBlanc Thu, 21 Jan 2010 16:59:01 -0800

Can you send your <snipped> OSPF config?

On Jan 21, 2010, at 5:28 PM, Andy B. wrote:


> Hi,
> 
> I just came across this thread while doing a little research to solve a
> similar situation.
> 
> Hardware:
> 
> - 6509 with SUP720-3BXL on both ends
> - SXF15a
> - Uptime: 46 weeks
> 
> Problem:
> 
> - OSPF (for the loopback between cores) and BGP (mostly customers to whom
> we send the full table) going up and down all the time:
> 
> %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.130 on TenGigabitEthernet4/1 from
> FULL to DOWN, Neighbor Down: Dead timer expired
> %OSPF-5-ADJCHG: Process 1, Nbr x.x.x.131 on TenGigabitEthernet9/1 from
> LOADING to FULL, Loading Done
> %BGP-5-ADJCHANGE: neighbor y.y.y.14 Down BGP Notification sent
> %BGP-3-NOTIFICATION: sent to neighbor y.y.y.14 4/0 (hold time expired) 0 bytes
> %BGP-5-ADJCHANGE: neighbor y.y.y.14 Up
> 
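
To get a feel for how often each neighbor flaps, the messages above can be tallied with a short script. This is only a rough sketch: it assumes each syslog message sits on a single line (the mail client wrapped them above), and the function name and sample list are made up for illustration.

```python
import re
from collections import Counter

# Patterns modelled on the two message types quoted above.
OSPF_RE = re.compile(
    r"%OSPF-5-ADJCHG: Process \d+, Nbr (\S+) on (\S+) from (\w+) to (\w+)"
)
BGP_RE = re.compile(r"%BGP-5-ADJCHANGE: neighbor (\S+) (Up|Down)")


def count_down_events(lines):
    """Count transitions to DOWN/Down per (protocol, neighbor)."""
    downs = Counter()
    for line in lines:
        m = OSPF_RE.search(line)
        if m:
            if m.group(4) == "DOWN":
                downs[("ospf", m.group(1))] += 1
            continue
        m = BGP_RE.search(line)
        if m and m.group(2) == "Down":
            downs[("bgp", m.group(1))] += 1
    return downs


# The log lines from the message above, unwrapped to one message per line.
SAMPLE = [
    "%OSPF-5-ADJCHG: Process 1, Nbr x.x.x.130 on TenGigabitEthernet4/1 "
    "from FULL to DOWN, Neighbor Down: Dead timer expired",
    "%OSPF-5-ADJCHG: Process 1, Nbr x.x.x.131 on TenGigabitEthernet9/1 "
    "from LOADING to FULL, Loading Done",
    "%BGP-5-ADJCHANGE: neighbor y.y.y.14 Down BGP Notification sent",
    "%BGP-3-NOTIFICATION: sent to neighbor y.y.y.14 4/0 (hold time expired) 0 bytes",
    "%BGP-5-ADJCHANGE: neighbor y.y.y.14 Up",
]

if __name__ == "__main__":
    for (proto, nbr), n in sorted(count_down_events(SAMPLE).items()):
        print(f"{proto} {nbr}: {n} down event(s)")
```

Run against a full day of logs, the per-neighbor counts would show whether the flaps are spread across all circuits or concentrated on a few neighbors.
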
> This keeps going on for several hours, and suddenly it stabilizes itself.
> 
> Furthermore, I use Cacti to generate graphs from the core router via
> SNMP. I have one VLAN that carries around 15 Gbps of traffic at peak times,
> and as soon as it exceeds 15 Gbps, no more graphs are drawn, the core
> router's console becomes rather unresponsive, and OSPF starts to behave
> strangely.
> 
> What I can rule out is the fiber capacity. I have multiple circuits
> and different paths and operators. The OSPF issue happens on all
> circuits, not just a specific one. No 10 GE link is used more than
> 60%. In fact, traffic from inside my backbone to any place outside
> remains unaffected (thank God), but the core router itself is pretty
> useless. Pinging the core's loopback or any IP address configured on that
> box results in 40-60% packet loss.
> 
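
For what it's worth, 40-60% loss toward the box itself is enough to explain dead timer expiries on its own. Here is a back-of-the-envelope sketch. It assumes the IOS defaults for broadcast interfaces (hello 10 s, dead 40 s), since the actual OSPF config has not been posted, and it assumes hellos are dropped independently at the same rate as the pings, which is a simplification. The function names are made up for illustration.

```python
def hellos_per_dead_interval(hello_interval=10, dead_interval=40):
    """Number of hellos a neighbor sends within one dead interval."""
    return dead_interval // hello_interval


def prob_all_hellos_lost(loss_rate, hello_interval=10, dead_interval=40):
    """Chance that every hello in one dead interval is dropped.

    Assumes independent loss per packet (a simplification).
    """
    n = hellos_per_dead_interval(hello_interval, dead_interval)
    return loss_rate ** n


if __name__ == "__main__":
    for loss in (0.4, 0.5, 0.6):
        p = prob_all_hellos_lost(loss)
        print(f"{loss:.0%} loss: {p:.1%} chance a given dead interval sees no hello")
```

Even a few percent per 40-second window adds up to an adjacency drop every several minutes, which would match the flapping described above.
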
> CPU usage is not high and is stable. No unusual processes, just IP
> Input and BGP Scanner. More than 50% of memory is still free at that
> time.
> 
> I've had this many times recently, but it only happens when my
> core goes beyond roughly 15 Gbps of outbound traffic. We were below 15
> Gbps for 2 years and it never happened during that time. Now this mess
> happens almost daily, rendering important billing graphs useless and
> annoying full-table BGP customers.
> 
> Is this a memory issue, due to the router's long uptime? Would
> reloading the router help in this case? That's the last thing I would
> want to do, but if it helps...
> 
> Cheers,
> 
> Andy
> 
> On Fri, Dec 11, 2009 at 5:22 PM, Drew Weaver <drew.wea...@thenap.com> wrote:
>> Howdy all,
>> 
>> Last night I had an interesting encounter on one of my 6509s w/ SUP720-3BXL.
>> 
>> This switch has 3x iBGP sessions with full internet tables and is also 
>> running OSPF.
>> 
>> Two of the three iBGP sessions randomly dropped with:
>> 
>> %BGP-3-NOTIFICATION: sent to neighbor x.x.x.3 4/0 (hold time expired) 0 bytes
>> 
>> I also noticed that during this period OSPF dropped with "Neighbor Down:
>> Dead timer expired", then re-established, then failed again, and so on.
>> 
>> I checked the physical interfaces between this 6500 and the two GSR 12000s
>> it peers with and there were no errors. There was also no obvious spike in
>> traffic that would account for latency that might cause the hold timers to
>> expire. I remember that when this system first came online it took a really
>> long time to download the full internet tables from the upstream GSRs, and
>> a lot of CPU time was being eaten up during that time. I am wondering
>> whether the first session failing caused a sort of 'performance' domino
>> effect that then caused everything else to fail. The issue eventually
>> corrected itself and stabilized.
>> 
>> This particular box is running 12.2(18)SXF17 so I am less likely to believe 
>> it is a software bug.
>> 
>> Does anyone have any tips on how I can avoid the hold timer issue
>> altogether, and also on how to keep a session that goes down and
>> re-establishes from totally nailing the CPU while it
>> re-establishes/downloads the routes? A long time ago I also read that
>> increasing the MTU on both ends of a circuit can make BGP tables download
>> faster. I don't know whether that's true; has anyone else found that?
>> 
>> thanks,
>> -Drew
>> 
>> 
>> _______________________________________________
>> cisco-nsp mailing list  cisco-nsp@puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>> 

