You appear to have a high number of input queue drops and input errors on the 7613 side (1936665 input queue drops and 1929071 input errors on Gi9/1), although, granted, the counters have never been cleared. Do you have any PPS graphs of the link between these two boxes? I would suspect a traffic spike or a link fault that is causing control messages to be dropped.
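Since the counters have never been cleared, one way to tell whether the drops are still happening is to capture "show interface" twice, some minutes apart, and compare the counters. Below is a minimal Python sketch of that comparison. It assumes the classic IOS output layout seen in the quoted message, and the function names (parse_counters, counter_deltas) are made up for illustration.

```python
import re

# Regexes for a few counters as they appear in classic IOS "show interface"
# output. The layout is an assumption based on the captures in this thread.
PATTERNS = {
    "input_queue_drops": re.compile(r"Input queue: \d+/\d+/(\d+)/\d+"),
    "input_errors": re.compile(r"(\d+) input errors"),
    "giants": re.compile(r"(\d+) giants"),
    "packets_input": re.compile(r"(\d+) packets input"),
}


def parse_counters(show_interface_text):
    """Return the counters found in one 'show interface' capture as ints."""
    counters = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(show_interface_text)
        if match:
            counters[name] = int(match.group(1))
    return counters


def counter_deltas(earlier_text, later_text):
    """Subtract an earlier capture from a later one, counter by counter."""
    earlier = parse_counters(earlier_text)
    later = parse_counters(later_text)
    # Only report counters that were present in both captures.
    return {name: later[name] - earlier[name] for name in later if name in earlier}
```

Feed it two saved captures of the same interface. Any counter with a non-zero delta is still incrementing, and dividing the packets_input delta by the seconds between captures gives a rough PPS figure if no graphs are available.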
Dave.

Justin Shore wrote:
> This afternoon I stumbled across a problem with a LDP session between a
> 7613 and a 7201. Actually both LDP and iBGP were flapping every 10
> seconds or so. I had both interfaces configured for MPLS, LDP, IS-IS
> (with AUTH and BFD though BFD isn't enabled on the interface itself yet)
> with an interface MTU of 9000 and CLNS MTU of 1496. Nothing too fancy.
> The systems as a whole are configured with MPLS graceful-restart, LDP,
> no mpls ip propagate-ttl, and LDP router-ID on a loopback:
>
> # 7201
> mpls label protocol ldp
> no mpls ip propagate-ttl
> mpls ldp graceful-restart
> mpls ldp router-id Loopback0 force
>
> # 7613
> mls mpls tunnel-recir
> mpls traffic-eng tunnels
> mpls ldp graceful-restart
> no mpls ip propagate-ttl
> mpls label protocol ldp
> mpls ldp router-id Loopback0 force
>
> This morning at 7:05 the router stopped responding to SNMP queries for
> about 15m. The load was about 13 before. Cacti shows the load doubling
> in the 10m prior to the 15m of nothing. When it came back the load was
> just shy of 50 and stayed there for about 30m. After that it stayed at
> around 30-35 for the next 7.5hrs before I noticed the BGP flapping issue
> and shutdown the peer for troubleshooting. The load dropped back to
> around 16, higher than it was before the hiccup this morning. I'm at a
> loss to adequately explain why the load has been so jacked. I think the
> 30-35 load was because BGP flapping and the slightly higher load now is
> due to the LDP flapping issue. That's my best guess.
>
> Anyone know how to troubleshoot a LDP neighbor flapping issue?
> The 7613 is logging this:
>
> 730278: Mar 4 20:43:48.696 CST: LDP GR: Received FT Sess TLV from
> 10.64.0.34:0 (fl 0x1, rs 0x0, rconn 0, rcov 120000)
> 730279: Mar 4 20:43:48.696 CST: LDP GR: MFI cutover wait delay =
> 600000, Forwarding State Hold Timer = 600000
> 730280: Mar 4 20:43:48.696 CST: LDP GR: searching for down nbr record
> (10.64.0.34:0, 10.64.0.178)
> 730281: Mar 4 20:43:48.696 CST: LDP GR: Added FT Sess TLV (Rconn
> 120000, Rcov 0) to INIT msg to 10.64.0.34:0
>
> The 7201 is logging this:
>
> 054705: Mar 5 00:28:19.599 CST: LDP GR: GR session 10.64.0.20:0:: lost
> 054706: Mar 5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: created
> [1 total]
> 054707: Mar 5 00:28:19 CST: %LDP-5-GR: GR session 10.64.0.20:0 (inst.
> 3): interrupted--recovery pending
> 054708: Mar 5 00:28:19.599 CST: LDP GR: GR session 10.64.0.20:0::
> bindings retained
> 054709: Mar 5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: state
> change (None -> Reconnect-Wait)
> 054710: Mar 5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0::
> reconnect timer started [120000 msecs]
> 054711: Mar 5 00:28:19.599 CST: LDP GR: down nbr 10.64.0.20:0:: added
> to bindings task queue [1 entries]
> 054712: Mar 5 00:28:19 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0
> (0) is DOWN (Received error notification from peer: Shut down)
>
> 054713: Mar 5 00:28:25.923 CST: LDP GR: searching for down nbr record
> (10.64.0.20:0, 10.64.0.179)
> 054714: Mar 5 00:28:25.923 CST: LDP GR: search for down nbr record
> (10.64.0.20:0, 10.64.0.179) returned 10.64.0.20:0
> 054715: Mar 5 00:28:25.923 CST: LDP GR: Added FT Sess TLV (Rconn 0,
> Rcov 120000) to INIT msg to 10.64.0.20:0
> 054716: Mar 5 00:28:25.947 CST: LDP GR: Received FT Sess TLV from
> 10.64.0.20:0 (fl 0x1, rs 0x0, rconn 120000, rcov 0)
> 054717: Mar 5 00:28:25.947 CST: LDP GR: GR session 10.64.0.20:0::
> established
> 054718: Mar 5 00:28:25.947 CST: LDP GR: GR session 10.64.0.20:0:: found
> down nbr 10.64.0.20:0
> 054719: Mar 5 00:28:25.947 CST: LDP GR: down nbr 10.64.0.20:0::
> reconnect timer stopped
> 054720: Mar 5 00:28:25.947 CST: LDP GR: down nbr 10.64.0.20:0:: state
> change (Reconnect-Wait -> Recovering)
> 054721: Mar 5 00:28:25.947 CST: LDP GR: down nbr 10.64.0.20:0::
> recovery timer started [1 msecs]
> 054722: Mar 5 00:28:25 CST: %LDP-5-GR: GR session 10.64.0.20:0 (inst.
> 4): starting graceful recovery
> 054723: Mar 5 00:28:25 CST: %LDP-5-NBRCHG: LDP Neighbor 10.64.0.20:0
> (4) is UP
> 054724: Mar 5 00:28:25.951 CST: LDP GR: down nbr 10.64.0.20:0::
> recovery timer expired
> 054725: Mar 5 00:28:25 CST: %LDP-5-GR: GR session 10.64.0.20:0 (inst.
> 4): completed graceful recovery
> 054726: Mar 5 00:28:25.951 CST: LDP GR: down nbr 10.64.0.20:0::
> destroying record [0 left]
> 054727: Mar 5 00:28:25.951 CST: LDP GR: down nbr 10.64.0.20:0:: state
> change (Recovering -> Delete-Wait)
>
> 054728: Mar 5 00:28:28.091 CST: LDP GR: Tagcon querying for up to 12
> bindings update tasks [table 0]
> 054729: Mar 5 00:28:28.091 CST: LDP GR: down nbr 10.64.0.20:0::
> requesting bindings DEL for {10.64.0.20:0, 3}
> 054730: Mar 5 00:28:28.091 CST: LDP GR: down nbr 10.64.0.20:0:: removed
> from bindings task queue [0 entries]
> 054731: Mar 5 00:28:28.091 CST: LDP GR: Requesting 1 bindings update
> tasks [0 left in queue]
>
> 10.64.0.20 is a loopback on the 7613 and 10.64.0.34 is a loopback on the
> 7201.
>
> I do have some interface errors which I also can't explain. They do not
> appear to be incrementing though.
> 7613:
>
> GigabitEthernet9/1 is up, line protocol is up (connected)
> Hardware is C6k 1000Mb 802.3, address is 001a.3063.0a80 (bia
> 001a.3063.0a80)
> Description: TO 2821-2.dc Gi0/0
> Internet address is 10.64.0.179/31
> MTU 9000 bytes, BW 1000000 Kbit, DLY 10 usec,
> reliability 255/255, txload 1/255, rxload 1/255
> Encapsulation ARPA, loopback not set
> Keepalive set (10 sec)
> Full-duplex, 1000Mb/s
> input flow-control is off, output flow-control is off
> Clock mode is auto
> ARP type: ARPA, ARP Timeout 04:00:00
> Last input 00:00:02, output 00:00:00, output hang never
> Last clearing of "show interface" counters never
> Input queue: 0/75/1936665/7581 (size/max/drops/flushes); Total output
> drops: 4
> Queueing strategy: fifo
> Output queue: 0/40 (size/max)
> 5 minute input rate 49000 bits/sec, 17 packets/sec
> 5 minute output rate 56000 bits/sec, 24 packets/sec
> L2 Switched: ucast: 52903876 pkt, 3771470311 bytes - mcast: 15056043
> pkt, 1653756471 bytes
> L3 in Switched: ucast: 80170438 pkt, 12709078926 bytes - mcast: 0 pkt,
> 0 bytes mcast
> L3 out Switched: ucast: 185161821 pkt, 36022953056 bytes mcast: 0 pkt,
> 0 bytes
> 150040994 packets input, 30087625055 bytes, 0 no buffer
> Received 15660647 broadcasts (0 IP multicasts)
> 30 runts, 4247159 giants, 0 throttles
> 1929071 input errors, 68 CRC, 0 frame, 13 overrun, 0 ignored
> 0 watchdog, 0 multicast, 0 pause input
> 0 input packets with dribble condition detected
> 257650143 packets output, 64726258058 bytes, 0 underruns
> 2 output errors, 0 collisions, 2 interface resets
> 0 babbles, 0 late collision, 0 deferred
> 0 lost carrier, 0 no carrier, 0 PAUSE output
> 0 output buffer failures, 0 output buffers swapped out
>
> 7201:
> GigabitEthernet0/0 is up, line protocol is up
> Hardware is MV64460 Internal MAC, address is 0023.5ee9.ac1b (bia
> 0023.5ee9.ac1b)
> Description: TO 7613-2.clr Gi9/1
> Internet address is 10.64.0.178/31
> MTU 9000 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
> reliability 255/255, txload 1/255, rxload 1/255
> Encapsulation ARPA, loopback not set
> Keepalive set (10 sec)
> Full-duplex, 1000Mb/s, media type is RJ45
> output flow-control is XON, input flow-control is unsupported
> ARP type: ARPA, ARP Timeout 04:00:00
> Last input 00:00:00, output 00:00:00, output hang never
> Last clearing of "show interface" counters never
> Input queue: 0/75/3951/0 (size/max/drops/flushes); Total output drops: 6
> Queueing strategy: fifo
> Output queue: 0/40 (size/max)
> 5 minute input rate 45000 bits/sec, 19 packets/sec
> 5 minute output rate 64000 bits/sec, 13 packets/sec
> 51466122 packets input, 1916487584 bytes, 0 no buffer
> Received 1891956 broadcasts, 0 runts, 0 giants, 0 throttles
> 5 input errors, 0 CRC, 0 frame, 0 overrun, 5 ignored
> 0 watchdog, 2247902 multicast, 0 pause input
> 0 input packets with dribble condition detected
> 32927369 packets output, 1549013167 bytes, 0 underruns
> 8 output errors, 0 collisions, 1 interface resets
> 23 unknown protocol drops
> 23 unknown protocol drops
> 0 babbles, 0 late collision, 0 deferred
> 8 lost carrier, 0 no carrier, 0 pause output
> 0 output buffer failures, 0 output buffers swapped out
>
>
> Any thoughts as to what's going on here? I can't tell for certain which
> of the 2 routers is causing LDP and BGP to drop. Knowing that would
> help me narrow my troubleshooting focus. The 7600 is running SRB1 and
> the 7201 is running 12.4(15)T7.
>
> Thanks
> Justin
>
> _______________________________________________
> cisco-nsp mailing list cisco-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/