Re: [j-nsp] MX960 JunOS recommendations

Tima Maryin Mon, 23 Nov 2009 03:22:18 -0800

Hello!

(If anyone is interested)


It was PR463989



p.s.
It took me almost a month(!!) to extract an _existing_ PR number from JTAC!

/angry



Krzysztof Szarkowicz wrote:

JNPR sends the notification because the hold timer expired (meaning no BGP
messages were received from the neighbor). This is correct behavior from the
BGP perspective.

Do you have logs on the CSCO side for the same event? I assume you will see
retransmission of an UPDATE message (not a Keepalive message). This UPDATE
message is dropped somewhere on the path between CSCO and JNPR, and CSCO
retransmits it. Since the UPDATE message is sent within the Keepalive timer,
no Keepalives are sent.

The most common cause of such drops is a mismatch of MPLS MTU, or an L2 device
with misconfigured MTUs somewhere in between.

You have to figure out (debugs, traceoptions, tcpdumps, whatever) which device
on the path is dropping it.

//Krzysztof

-----Original Message-----
From: Tima Maryin [mailto:t...@transtelecom.net]
Sent: Thursday, 12 November, 2009 9:07
To: kszarkow...@gmail.com
Cc: juniper-nsp@puck.nether.net
Subject: Re: [j-nsp] MX960 JunOS recommendations

First of all, thanks to all who care :)
I'll reply one by one


Derick Winkworth wrote:
 > How about some debugs or traceoptions?
 >
 >
traceoptions at last Jun says that the box doesn't receive bgp notifications sometimes. Haven't tried anything more yet.

sth...@nethelp.no wrote:
 >
 > Make sure that your IP MTU is the same on both Cisco and Juniper sides.
 > If you run IS-IS, make sure your CLNS MTU is the same on both Cisco and
 > Juniper sides.


The IP MTUs are the same, otherwise OSPF would not come up


 > People have been running interoperable Cisco and Juniper networks for
 > many years. This is not rocket science.

Yeah, we installed several Juns into our network several months ago and this is the only problem which we couldn't solve, so we rolled back to the previous software
(well, I do not count some rpd crashes on a box with aggregated interfaces which we can avoid for now. JTAC eventually said that it's PR439627. I can't read this hidden PR, but it's supposed to be fixed in 10.x and 9.3Rnextrelease)

Krzysztof Szarkowicz wrote:
With MTUs around 9000 configured on ALL links in the network there should be no 
problem with BGP,
since as per RFC4271, section 4:

The maximum message size is 4096 octets.  All implementations are required to 
support this maximum
message size.

So even if MPLS and IP MTUs slightly differ, with sizes around 9000 it doesn't 
matter from BGP
perspective.
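
To put rough numbers on that argument, here is a minimal Python sketch. It assumes one BGP message per TCP segment, maximum-size IPv4 and TCP headers (60 bytes each) and 4 bytes per MPLS label; the function names are made up for illustration. TCP is free to pack several BGP messages into one segment up to the negotiated MSS, which this sketch does not model.

```python
# Sketch: worst-case size of a labelled packet carrying ONE BGP message.
# Assumption: one BGP message per TCP segment (TCP coalescing is ignored).
BGP_MAX_MESSAGE = 4096   # RFC 4271, section 4
IPV4_HEADER_MAX = 60     # IPv4 header with maximum options
TCP_HEADER_MAX = 60      # TCP header with maximum options
MPLS_LABEL = 4           # bytes per label stack entry


def worst_case_packet(labels: int) -> int:
    """Largest labelled packet needed to carry one whole BGP message."""
    return BGP_MAX_MESSAGE + IPV4_HEADER_MAX + TCP_HEADER_MAX + labels * MPLS_LABEL


def fits(mtu: int, labels: int = 3) -> bool:
    """True if the worst-case packet fits within the given MTU."""
    return worst_case_packet(labels) <= mtu


if __name__ == "__main__":
    for mtu in (8988, 9000, 1500):
        print(mtu, worst_case_packet(3), fits(mtu))
```

Under these assumptions a single maximum-size message with 3 labels needs 4228 bytes, well below either 8988 or 9000, so a 12-byte difference between the two only matters for segments that are filled close to the MTU.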

The only thing that comes to my mind is that there are some L2 switches in
between and there is something wrong with the MTU on those switches. Worth
checking.

There are no switches between them. It's
7301-geoptic-7606-tengig-t1600-tengig-mx960
It's a lab setup. On the real network it was slightly different, but it's actually the same from this problem's point of view.

Could you paste from the log the Notification message generated when the BGP
session is torn down?

I didn't find any dependence on interface load or anything else.
It can be 3-4 gig load (like it was on the real network) or empty (like it is in the lab); the bgp session may drop once per minute or stay up for 30 - 60 mins.
Cisco can be either GSR or 7301, Juniper can be MX or T.

There is nothing special in the logs.
That's the one from mx960:
Nov 12 06:18:31 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 rcvcc: 0 TCP state: 4, snd_una: 307818660 snd_nxt: 307818660 snd_wnd: 16230 rcv_nxt: 614682635 rcv_adv: 614699019, hold timer 0
Nov 12 06:20:48 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 rcvcc: 0 TCP state: 4, snd_una: 1301747029 snd_nxt: 1301747029 snd_wnd: 16211 rcv_nxt: 732160622 rcv_adv: 732177006, hold timer 0
Nov 12 06:22:53 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 rcvcc: 0 TCP state: 4, snd_una: 2024212109 snd_nxt: 2024212109 snd_wnd: 16230 rcv_nxt: 3950965686 rcv_adv: 3950982070, hold timer 0
Nov 12 06:24:56 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 rcvcc: 0 TCP state: 4, snd_una: 2363347692 snd_nxt: 2363347692 snd_wnd: 16230 rcv_nxt: 1449362513 rcv_adv: 1449378897, hold timer 0
Nov 12 06:59:09 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 rcvcc: 0 TCP state: 4, snd_una: 3704141975 snd_nxt: 3704141975 snd_wnd: 15985 rcv_nxt: 2261397920 rcv_adv: 2261414304, hold timer 0
Nov 12 07:01:19 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 rcvcc: 0 TCP state: 4, snd_una: 1379635866 snd_nxt: 1379635866 snd_wnd: 16230 rcv_nxt: 612357774 rcv_adv: 612374158, hold timer 0
Nov 12 07:04:06 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 rcvcc: 0 TCP state: 4, snd_una: 3377139997 snd_nxt: 3377139997 snd_wnd: 16211 rcv_nxt: 544711184 rcv_adv: 544727568, hold timer 0
Nov 12 07:20:37 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 rcvcc: 0 TCP state: 4, snd_una: 3633708680 snd_nxt: 3633708680 snd_wnd: 16175 rcv_nxt: 1216109422 rcv_adv: 1216125806, hold timer 0
Nov 12 07:22:54 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 rcvcc: 0 TCP state: 4, snd_una: 4034247055 snd_nxt: 4034247055 snd_wnd: 16211 rcv_nxt: 2010186633 rcv_adv: 2010203017, hold timer 0
Nov 12 07:25:00 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 38 rcvcc: 0 TCP state: 4, snd_una: 3122195868 snd_nxt: 3122195868 snd_wnd: 16268 rcv_nxt: 209999860 rcv_adv: 210016244, hold timer 0

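
For anyone wanting to check how irregular the drops are, a minimal Python sketch that pulls the timestamps out of bgp_hold_timeout entries like the ones above and prints the gap between consecutive drops. The regular expression, the helper names and the year default are assumptions made for this illustration; the sample text is the first three entries from the log, truncated after the error code.

```python
import re
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Timestamp that starts each bgp_hold_timeout entry. Searching (rather than
# splitting on newlines) also copes with entries run together on one line.
ENTRY = re.compile(
    r"([A-Z][a-z]{2}) +(\d+) (\d{2}):(\d{2}):(\d{2}) \S+ rpd\[\d+\]: bgp_hold_timeout")


def drop_times(log_text, year=2009):
    """Return one datetime per hold-timer NOTIFICATION found in log_text."""
    return [datetime(year, MONTHS[mon], int(day), int(h), int(m), int(s))
            for mon, day, h, m, s in ENTRY.findall(log_text)]


def gaps_seconds(times):
    """Seconds between consecutive drops."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# First three entries from the log above, truncated after the error code.
SAMPLE = (
    "Nov 12 06:18:31 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION "
    "sent to 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error)\n"
    "Nov 12 06:20:48 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION "
    "sent to 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error)\n"
    "Nov 12 06:22:53 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION "
    "sent to 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error)\n"
)

if __name__ == "__main__":
    print(gaps_seconds(drop_times(SAMPLE)))  # [137, 125]
```

Run over the whole log, the same helper shows drops roughly every two minutes in bursts, separated by quieter stretches of 15 to 35 minutes.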
Thanks,
Krzysztof



-----Original Message-----
From: Tima Maryin [mailto:t...@transtelecom.net]
Sent: Wednesday, 11 November, 2009 15:12
To: kszarkow...@gmail.com
Cc: juniper-nsp@puck.nether.net
Subject: Re: [j-nsp] MX960 JunOS recommendations

Uhm, i see your point here.
We indeed have cisco - cisco - Jun - Jun setup


My cisco interface mtu = ip mtu = mpls mtu = 9000
But I reeeally doubt that a bgp keepalive packet size can come close to that mtu.


On Juniper i set interface mtu = cisco mtu +14 and it works fine!
And! As you say, it reports different mpls mtu value:

 > show interfaces xe-1/0/0 | match MTU
Link-level type: Ethernet, MTU: 9014, LAN-PHY mode, Speed: 10Gbps, Loopback: None, Source filtering: Disabled,
     Protocol inet, MTU: 9000
     Protocol mpls, MTU: 8988
     Protocol multiservice, MTU: Unlimited

As far as I understand the "default mpls mtu" term (not sure that I _fully_ understand it though), it seems Juniper assumes 3 labels maximum. I don't see any reason for the device to drop packets which have 1 or 2 labels and are bigger than the mpls mtu, but still OK from the interface mtu point of view.
As per your logic, the device should drop all traffic that matches such criteria, but it seems to affect only bgp session keepalives and I didn't see any other problems.
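
The numbers in the output above are consistent with that guess. A small Python sketch, assuming an untagged Ethernet header of 14 bytes and room reserved for 3 labels of 4 bytes each (the 3-label figure is the guess from this message, not a documented rule quoted here; the function name is made up):

```python
ETHERNET_HEADER = 14      # dst MAC + src MAC + EtherType, no VLAN tag
MPLS_LABEL = 4            # bytes per label stack entry
ASSUMED_MAX_LABELS = 3    # assumption: the guess made in the message above


def derived_mtus(link_mtu, max_labels=ASSUMED_MAX_LABELS):
    """Derive the per-protocol MTUs from the link-level MTU."""
    inet = link_mtu - ETHERNET_HEADER
    mpls = inet - max_labels * MPLS_LABEL
    return {"inet": inet, "mpls": mpls}


if __name__ == "__main__":
    print(derived_mtus(9014))  # {'inet': 9000, 'mpls': 8988}
```

With a link-level MTU of 9014 this reproduces both values reported by the box: inet 9000 and mpls 8988.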

But still, I made an experiment on a Juniper and a cisco which have a bgp session between them.

cisco:
#sh mpls interfaces g 0/0 detail  | i MTU
         MTU = 9000
#sh ip int g 0/0 | i MTU
   MTU is 9000 bytes
#sh run int g 0/0
Building configuration...

Current configuration : 212 bytes
!
interface GigabitEthernet0/0
  description --- to 7606-2 ---
  mtu 9000
  ip address 10.3.13.2 255.255.255.0
  load-interval 30
  duplex full
  speed 1000
  media-type gbic
  no negotiation auto
  tag-switching ip
end


If i set mtu 9000 under family mpls and commit it, it looks like this:

 > show interfaces xe-1/0/0 | match MTU
Link-level type: Ethernet, MTU: 9014, LAN-PHY mode, Speed: 10Gbps, Loopback: None, Source filtering: Disabled,
     Protocol inet, MTU: 9000
     Protocol mpls, MTU: 9000
       Flags: Is-Primary, User-MTU
     Protocol multiservice, MTU: Unlimited



and the problem still persists



please let me know if you have any other ideas :)

p.s. It's the same effect if I set tag-sw mtu 8988 on cisco and leave it 'default' (=8988) on juniper

Krzysztof Szarkowicz wrote:
Let me guess.

Your network is a multivendor network (JNPR and CSCO) and some transit devices
are CSCO?

CSCO and JNPR use different algorithms to calculate the default MPLS MTU (if
the MPLS MTU is not explicitly configured), which results in a 4 byte
difference between the CSCO side and the JNPR side of the same link for the
MPLS MTU (the IP MTU is equal on both ends, so no problem with OSPF).

If on the JNPR side your MPLS MTU is, say, 1500 and on the CSCO side the MPLS
MTU is 1504, then when the CSCO device sends a BGP update packet of size 1502
towards the JNPR device, this packet is dropped by the JNPR device (as it is
too big), and no TCP ACK is sent back. CSCO keeps resending this 1502-byte
packet, and JNPR keeps dropping it. Thus, after the hold timer expires, the
Notification message is sent.
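
The sequence described above can be sketched as a toy model in Python. Everything numeric here is an assumption for illustration: a 90-second hold time, a first retransmission timeout of 1 second that doubles up to 64 seconds, and a hold timer that starts when the packet is first sent. Real TCP stacks and BGP implementations differ in the details.

```python
def hold_timer_outcome(packet_size, receiver_mtu, hold_time=90.0,
                       first_rto=1.0, max_rto=64.0):
    """Toy model: one packet is retransmitted until the receiver accepts it
    or the receiver's hold timer fires.

    Returns (outcome, retransmissions). All timer values are assumptions.
    """
    if packet_size <= receiver_mtu:
        return ("delivered", 0)

    elapsed, rto, retransmissions = 0.0, first_rto, 0
    # Every retransmission is the same size, so every one is dropped again.
    while elapsed + rto < hold_time:
        elapsed += rto
        retransmissions += 1
        rto = min(rto * 2, max_rto)   # exponential backoff
    return ("hold timer expired", retransmissions)


if __name__ == "__main__":
    print(hold_timer_outcome(1502, 1500))  # receiver side with MTU 1500
    print(hold_timer_outcome(1502, 1504))  # receiver side with MTU 1504
```

The point of the model is only that retrying cannot help: a packet that is too big on the first attempt is too big on every retransmission, so the session sits idle from the receiver's point of view until the hold timer runs out.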

I assume that with 9.3R3.8 you didn't catch the '1502' packet sizes.

Could you check with some show commands what the MPLS MTU is on both ends of
the link (which is terminated on CSCO on one side and JNPR on the other side)?

//Krzysztof

-----Original Message-----
From: Tima Maryin [mailto:t...@transtelecom.net]
Sent: Wednesday, 11 November, 2009 9:57
To: kszarkow...@gmail.com
Cc: juniper-nsp@puck.nether.net
Subject: Re: [j-nsp] MX960 JunOS recommendations

What did you mean by "inappropriately configured"?

There are the same mtu settings everywhere and traffic passes quite well.
And the ospf session comes up without problems.
And how come "inappropriately configured IP and MPLS MTU" work well on 9.3R3.8?

Krzysztof Szarkowicz wrote:
It is not a nasty bug, but a problem of inappropriately configured IP and MPLS
MTUs on transit nodes.
//Krzysztof

-----Original Message-----
From: juniper-nsp-boun...@puck.nether.net 
[mailto:juniper-nsp-boun...@puck.nether.net] On Behalf
Of
Tima Maryin
Sent: Wednesday, 11 November, 2009 8:28
To: juniper-nsp@puck.nether.net
Subject: Re: [j-nsp] MX960 JunOS recommendations

9.3R4.4 has a nasty bug which occurs in a setup where you have a bgp session over a chain of a few routers/links with ospf/ldp.
The bgp session occasionally goes down with a notification timeout, even when there is no traffic at all and no physical errors.
Rolling back to 9.3R3 helps, though.


JTAC has still not confirmed it, but it can easily be reproduced in the lab.


_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp