Hi Tim,

Do you happen to have port mirroring/sampling enabled on the router? We encountered a similar issue; JTAC found that the sampled process was causing this behavior and that it is fixed in 11.4R4. (We have not upgraded yet to test this in our environment, and the fix does not appear in the release notes, but the JTAC engineer said it is solved.)

The relevant PR is PR726841. It is written up with the details of our specific test case, but according to JTAC the issue is: "Sampled being the slow daemon lead to the slow operation of route updation from KRT --- PFE and KRT was stuck for some time."
Regards,
Ido

-----Original Message-----
From: juniper-nsp-boun...@puck.nether.net [mailto:juniper-nsp-boun...@puck.nether.net] On Behalf Of Tim Vollebregt
Sent: Wednesday, July 18, 2012 1:04 AM
To: Juniper-NSP
Subject: [j-nsp] route BGP stall bug

Hi All,

This morning during a maintenance I experienced the route stall bug Richard has mentioned a few times already on j-nsp.

Hardware kit:
- MX480 with SCB (non-E)
- 2 x RE-S-1800x4
- 4 x MPC 3D 16x 10GE

Software version: 10.4R8.5

During this maintenance I was installing 2 new routing engines in the router, replacing the 'old' RE-S-2000s. This router pushes a lot of traffic and receives 14 full BGP tables from eBGP peers, a full table over 1 RR session to its 'mate', and partial tables from several iBGP peers.

After the REs were replaced, the FPCs initialized and the BGP sessions were established, but it took quite some time before the RIB was completely filled. After checking some hosts I concluded that some destinations were unreachable, even though the RIB looked fine. When I checked the FIB with 'show route forwarding-table summary', I saw that only 11K prefixes had been pushed to the FIB and that it was hanging. As I was aware of the bug, I waited, and it eventually took about 30 minutes to fill the FIB with 414K prefixes. During these 30 minutes a lot of destinations were unreachable and traffic was being blackholed, since the router was exchanging routes (RIB) with its peers normally.

As there was still some time left in the maintenance window and I really wanted a workaround for this bug, I did the following: I deactivated all eBGP peer groups and did a switchover to the other routing engine. When the FPCs were initialized, the router started building its iBGP sessions towards the core routers and its RR session (full table). This worked out quite well: the FIB was filled with the full table within 5 minutes.
Afterwards I activated all eBGP peer groups again and monitored the FIB. It again took about 30 minutes to fill the FIB with the correct next-hops, but this time the blackholing lasted only a limited amount of time.

This bug seems to have been present since release 10.0 (MPC), and there doesn't seem to be a fix yet. Does anyone have more information about it, such as a PR number? IMHO this is a really bad one, and it can be a showstopper in some cases.

Thanks for your time.

BR,
Tim

_______________________________________________
juniper-nsp mailing list
juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp