Hi,

11.2 is past end of support, so my guess is that there's no point in raising a case. As a first step I'd try upgrading to a supported release and then check whether that helps.
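For reference, a minimal sketch of what that first step could look like (the jinstall filename below is only a placeholder; use whichever supported release Juniper recommends for your M-series hardware):

    show version                        # confirm the running release on both REs
    request system snapshot             # preserve the current, known-bootable system first
    request system software add /var/tmp/jinstall-13.3R9-domestic-signed.tgz   # placeholder filename
    request system reboot

On a dual-RE M320 you would normally upgrade the backup RE first, switch mastership, then upgrade the former master, to keep the outage window small.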
Regards,
Wojciech

On 4 Jan 2016 at 17:02, "Niall Donaghy" <niall.dona...@geant.org> wrote:
>
> From your comments I understand there was no CPU spike, and traceoptions aren't the cause either.
>
> By this point* I would have raised a JTAC case for analysis of the core dump, and taken their lead.
>
> * assuming you've checked all sources of information and found no clues as to the cause, i.e. logfile analysis, resource exhaustion checks, analysis of config (e.g. are you using suspected buggy features, or anything non-standard/complex/advanced?)
>
> We are running 14.1R5.5 on MX series and have lots of features turned on, and several workarounds in place. We have found a few bugs for JNPR...
>
> Kind regards,
> Niall
>
> From: Alireza Soltanian [mailto:soltan...@gmail.com]
> Sent: 04 January 2016 15:18
> To: Niall Donaghy
> Cc: juniper-nsp@puck.nether.net
> Subject: RE: [j-nsp] RPD Crash on M320
>
> Just asking. Anyway, any idea about my comments? Also, is there any mechanism or approach for dealing with these kinds of situations?
>
> On Jan 4, 2016 6:45 PM, "Niall Donaghy" <niall.dona...@geant.org> wrote:
>
> Reading the core dump is beyond my expertise, I'm afraid.
>
> Br,
> Niall
>
> From: Alireza Soltanian [mailto:soltan...@gmail.com]
> Sent: 04 January 2016 15:14
> To: Niall Donaghy
> Cc: juniper-nsp@puck.nether.net
> Subject: RE: [j-nsp] RPD Crash on M320
>
> Hi,
> Yes, I checked the CPU graph and there was no spike in CPU load.
> The link was flappy 20 minutes before the crash, and it remained flappy for two hours afterwards. During this time we could see LDP sessions go up and down over and over, but the only time there was a crash was this once, and with no spike in CPU.
> I must mention we had another issue with another M320: whenever a link flapped, rpd's CPU went high and all OSPF sessions reset. I found the root cause for that: it was traceoptions for LDP. On this box we don't use traceoptions.
> Is there any way to read the dump?
>
> Thank you
>
> On Jan 4, 2016 6:34 PM, "Niall Donaghy" <niall.dona...@geant.org> wrote:
>
> Hi Alireza,
>
> It seemed to me this event could be related to the core dump:
> Jan 3 00:31:28 apa-rtr-028 /kernel: jsr_prl_recv_ack_msg(): received PRL ACK message on non-active socket w/handle 0x10046fa0000004e
> However, upon further investigation (http://kb.juniper.net/InfoCenter/index?page=content&id=KB18195) I see these messages are normal/harmless.
>
> Do you have Cacti graphs of CPU utilisation for both REs before the rpd crash? Link flapping may be giving rise to CPU hogging, leading to instability and a subsequent rpd crash.
> Was the link particularly flappy just before the crash?
>
> Kind regards,
> Niall
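An aside on the CPU-graph and traceoptions points above: if Cacti graphs aren't available, the box itself can answer roughly the same questions. A minimal sketch, assuming a standard Junos CLI:

    show chassis routing-engine                      # CPU/memory utilisation per RE
    show system processes extensive | match rpd      # rpd's current CPU and memory footprint
    show configuration protocols ldp traceoptions    # confirm no LDP traceoptions are left enabled

If traceoptions do turn out to be configured, "delete protocols ldp traceoptions" plus a commit, in configuration mode, removes them.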
> > -----Original Message-----
> > From: juniper-nsp [mailto:juniper-nsp-boun...@puck.nether.net] On Behalf Of Alireza Soltanian
> > Sent: 04 January 2016 11:04
> > To: juniper-nsp@puck.nether.net
> > Subject: [j-nsp] RPD Crash on M320
> >
> > Hi everybody
> >
> > Recently, we had continuous link flapping between our M320 and remote sites. We have a lot of L2circuits between these sites on our M320. At one point the rpd process crashed, which led to the following log. I must mention the link flap started at 12:10AM and continued until 2:30AM, but the crash occurred at 12:30AM.
> >
> > Jan 3 00:31:04 apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.253.168 is down, reason: received notification from peer
> > Jan 3 00:31:05 apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.254.1 is down, reason: received notification from peer
> > Jan 3 00:31:05 apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.253.120 is down, reason: received notification from peer
> > Jan 3 00:31:05 apa-rtr-028 /kernel: jsr_prl_recv_ack_msg(): received PRL ACK message on non-active socket w/handle 0x1008af8000001c6
> > Jan 3 00:31:06 apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.253.192 is down, reason: received notification from peer
> > Jan 3 00:31:28 apa-rtr-028 /kernel: jsr_prl_recv_ack_msg(): received PRL ACK message on non-active socket w/handle 0x10046fa0000004e
> >
> > Jan 3 00:32:18 apa-rtr-028 init: routing (PID 42128) terminated by signal number 6. Core dumped!
> > Jan 3 00:32:18 apa-rtr-028 init: routing (PID 18307) started
> > Jan 3 00:32:18 apa-rtr-028 rpd[18307]: L2CKT acquiring mastership for primary
> > Jan 3 00:32:18 apa-rtr-028 rpd[18307]: L2VPN acquiring mastership for primary
> > Jan 3 00:32:20 apa-rtr-028 rpd[18307]: RPD_KRT_KERNEL_BAD_ROUTE: KRT: lost ifl 0 for route (null)
> > Jan 3 00:32:20 apa-rtr-028 last message repeated 65 times
> > Jan 3 00:32:20 apa-rtr-028 rpd[18307]: L2CKT acquiring mastership for primary
> > Jan 3 00:32:20 apa-rtr-028 rpd[18307]: Primary starts deleting all L2circuit IFL Repository
> > Jan 3 00:32:20 apa-rtr-028 rpd[18307]: RPD_TASK_BEGIN: Commencing routing updates, version 11.2R2.4, built 2011-09-01 06:53:31 UTC by builder
> >
> > Jan 3 00:32:21 apa-rtr-028 mib2d[33413]: SNMP_TRAP_LINK_DOWN: ifIndex 1329, ifAdminStatus up(1), ifOperStatus down(2), ifName ae1.1041
> > Jan 3 00:32:21 apa-rtr-028 mib2d[33413]: SNMP_TRAP_LINK_DOWN: ifIndex 1311, ifAdminStatus up(1), ifOperStatus down(2), ifName ae1.1039
> > Jan 3 00:32:21 apa-rtr-028 mib2d[33413]: SNMP_TRAP_LINK_DOWN: ifIndex 1312, ifAdminStatus up(1), ifOperStatus down(2), ifName ae1.1038
> >
> > The thing is, we always see this kind of log (except the crash) on the device. Is there any clue as to why the rpd process crashed? I don't have access to JTAC, so I cannot analyze the dump.
> >
> > The Junos version is 11.2R2.4.
> >
> > Thank you for your help and support
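A footnote on reading the dump without JTAC: about all you can do yourself is confirm where the core landed and what was logged around it, since decoding the backtrace needs symbol files matching the exact rpd build, which only Juniper has. A rough sketch, assuming a standard Junos CLI:

    show system core-dumps                          # list core files with timestamps and sizes
    file list /var/crash detail                     # one usual landing spot for rpd cores
    show log messages | match "rpd|Core dumped"     # context around the crash

One data point that is free: signal 6 is SIGABRT, i.e. rpd aborted itself (typically on a failed internal assertion) rather than being killed from outside, which is more consistent with a software bug tripped by the LDP session churn than with, say, an external kill or resource exhaustion.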