Hi,

http://www.gossamer-threads.com/lists/nsp/juniper/32538#32538

So either solve the load/transmission issues on the RR or upgrade the RR to 10.4.



On 14.02.2012 20:55, Serge Vautour wrote:
Hello,

We have an MPLS network made up of many MX960s and MX80s. We run OSPF as our IGP, 
with all links in area 0. BGP is used for signaling all L2VPN & VPLS. At this 
time we only have 1 L3VPN for mgmt. LDP is used for transport LSPs. We have 
M10i as dedicated Route Reflectors. Most MXes are on 10.4S5; the M10is are still on 
10.0R3. Each PE peers with 2 RRs and has 2 diverse uplinks for redundancy: if 1 
link fails, there's always another path.

It's been rare, but we've seen random iBGP peer drops. The first was several 
months ago; we've now seen 2 in the last week. 2 of the 3 were related to link 
failures: the primary path from the PE to the RR failed and BGP timed out after a 
bit. Here's an example:

Feb  8 14:05:32  OURBOX-re0 mib2d[2279]: %DAEMON-4-SNMP_TRAP_LINK_DOWN: ifIndex 
129, ifAdminStatus up(1), ifOperStatus down(2), ifName xe-7/0/0
Feb  8 14:05:32  OURBOX-re0 mib2d[2279]: %DAEMON-4-SNMP_TRAP_LINK_DOWN: ifIndex 
120, ifAdminStatus up(1), ifOperStatus down(2), ifName xe-0/0/0
Feb  8 14:06:33  OURBOX-re0 rpd[1413]: %DAEMON-4: bgp_hold_timeout:3660: 
NOTIFICATION sent to 10.1.1.2 (Internal AS 123): code 4 (Hold Timer Expired 
Error), Reason: holdtime expired for 10.1.1.2 (Internal AS 123), socket buffer 
sndcc: 0 rcvcc: 0 TCP state: 4, snd_una: 1056225956 snd_nxt: 1056225956 
snd_wnd: 16384 rcv_nxt: 3883304584 rcv_adv: 3883320968, hold timer 0

The BGP hold time is 90 sec. This is more than enough time for OSPF to find the other 
path and converge. The BGP peer came back up before the link did, so things did 
eventually converge.
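As an aside, if the goal is to stop leaning on the 90-sec hold timer for failure 
detection, BFD on the iBGP sessions is the usual answer. A sketch of what that 
might look like on a PE (the group name "internal" is a placeholder, not from 
our config):

```
protocols {
    bgp {
        group internal {
            type internal;
            hold-time 90;
            bfd-liveness-detection {
                minimum-interval 300;   /* 3 x 300 ms detection, far below 90 s */
                multiplier 3;
            }
        }
    }
}
```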

The last BGP peer drop happened without any link failure. Out of the blue, BGP 
just went down. The logs on the PE:

Feb 13 20:40:48  OUR-PE1 rpd[1159]: %DAEMON-4: bgp_hold_timeout:3660: 
NOTIFICATION sent to 10.1.1.2 (Internal AS 123): code 4 (Hold Timer Expired 
Error), Reason: holdtime expired for 10.1.1.2 (Internal AS 123), socket buffer 
sndcc: 0 rcvcc: 0 TCP state: 4, snd_una: 2149021074 snd_nxt: 2149021074 
snd_wnd: 16384 rcv_nxt: 2049196833 rcv_adv: 2049213217, hold timer 0
Feb 13 20:40:48  OUR-PE1 rpd[1159]: %DAEMON-4-RPD_BGP_NEIGHBOR_STATE_CHANGED: 
BGP peer 10.1.1.2 (Internal AS 123) changed state from Established to Idle 
(event HoldTime)
Feb 13 20:41:21  OUR-PE1 rpd[1159]: %DAEMON-4-RPD_BGP_NEIGHBOR_STATE_CHANGED: 
BGP peer 10.1.1.2 (Internal AS 123) changed state from OpenConfirm to 
Established (event RecvKeepAlive)

The RR side shows the same:

Feb 13 20:40:49  OUR-RR1-re0 rpd[1187]: 
%DAEMON-4-RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 10.1.1.61 (Internal AS 123) 
changed state from Established to Idle (event RecvNotify)
Feb 13 20:40:49  OUR-RR1-re0 rpd[1187]: %DAEMON-4: bgp_read_v4_message:8927: 
NOTIFICATION received from 10.1.1.61 (Internal AS 123): code 4 (Hold Timer 
Expired Error), socket buffer sndcc: 57 rcvcc: 0 TCP state: 4, snd_una: 
2049196833 snd_nxt: 2049196871 snd_wnd: 16384 rcv_nxt: 2149021095 rcv_adv: 
2149037458, hold timer 1:03.112744
Feb 13 20:41:21  OUR-RR1-re0 rpd[1187]: 
%DAEMON-4-RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 10.1.1.61 (Internal AS 123) 
changed state from EstabSync to Established (event RsyncAck)
Feb 13 20:41:30  OUR-RR1-re0 rpd[1187]: %DAEMON-3: bgp_send: sending 30 bytes 
to 10.1.1.61 (Internal AS 123) blocked (no spooling requested): Resource 
temporarily unavailable


You can see the peer wasn't down long and re-established on its own. The logs 
on the RR make it look like it received a message from the PE saying it was dropping 
the BGP session. The last error on the RR seems odd as well.
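One thing worth noting in those log lines: on the PE, snd_una equals snd_nxt and 
sndcc is 0, which in standard TCP bookkeeping means everything the PE sent was 
ACKed and nothing was queued, i.e. the TCP session itself looked healthy when the 
hold timer fired. A small throwaway sketch (not from the post; the log text is 
copied from above) of pulling those counters out and making that check:

```python
import re

# One of the bgp_hold_timeout lines from the PE log above.
LOG = ("NOTIFICATION sent to 10.1.1.2 (Internal AS 123): code 4 (Hold Timer "
       "Expired Error), Reason: holdtime expired for 10.1.1.2 (Internal AS 123), "
       "socket buffer sndcc: 0 rcvcc: 0 TCP state: 4, snd_una: 2149021074 "
       "snd_nxt: 2149021074 snd_wnd: 16384 rcv_nxt: 2049196833 "
       "rcv_adv: 2049213217, hold timer 0")

def parse_hold_timeout(line):
    """Extract the socket counters from a bgp_hold_timeout message."""
    fields = dict(re.findall(r"(\w+):\s(\d+)", line))
    keep = ("sndcc", "rcvcc", "snd_una", "snd_nxt", "snd_wnd")
    return {k: int(v) for k, v in fields.items() if k in keep}

def diagnosis(f):
    # snd_una == snd_nxt with sndcc == 0: everything sent was ACKed and nothing
    # is queued, so TCP was fine -- no KEEPALIVE arrived (or was processed)
    # within the hold time, pointing at the peer or at rpd scheduling.
    if f["sndcc"] == 0 and f["snd_una"] == f["snd_nxt"]:
        return "tcp-idle-and-drained"
    return "tcp-backed-up"
```

On this line, diagnosis() reports the drained case, which is consistent with an 
rpd that was too busy to send or process keepalives rather than a stuck TCP session.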


Has anyone seen something like this before? We do have a case open regarding a 
large number of LSA retransmits. TAC says this is a bug related to NSR but 
shouldn't cause any negative impact. I'm not sure if this is related. I'm 
considering opening a case for this as well, but I'm not very confident I'll get 
far.


Any help would be appreciated.
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
