Re: [j-nsp] rib-groups && VPN reflection
I think you've got it clear, Adam. Take routes that are originally destined for inet.0, create a rib-group to leak them into a secondary routing instance, and then expect normal L3VPN behaviour for those routing-instance routes. (The idea is to drop the routes into an overlay topology, which includes a multi-homed network element with an interface and BGP relationship to inet.0, in order to influence it and "override" based on original route properties.)

Yes, of course, loops are possible, but the whole thing is perfectly workable, and policed by all the usual routing protocol mechanisms, IFF the route table advertisement mode is non-reflector mode (meaning that routes are exported directly from the VRFs and not from the bgp.l3vpn.0 holding table). It's this change in functionality and behaviour based on other features that disappoints me most here.

KB32423, I've since found, describes the situation, and it advises turning the reflection off. Off-box reflection for VPN is probably the best idea anyway, but it's a shame that the feature documentation isn't clearer that this road has serious pitfalls.

I do understand your point on using RTs to determine the remote PE table destination. But there's no easy way to take undecorated inet.0 routes and annotate them with RTs, is there? Even if there were, I have to assume that the original route in inet.0 can/will be displaced. That was actually the very nice thing about rib-groups: the two copies of the route do not share any route selection fate.

Appreciate the feedback from all. Haven't quite given up yet.

-- Adam.

On Sun, 21 Apr 2019 at 11:22, wrote:
>
> I'm not sure I understand your objective, so just to confirm.
> Is your objective to leak routes from routing table A to routing table B
> while being able to advertise the leaked route from table B to other PEs
> (where the route is expected to land in table B)?
> If that's the case then this is not allowed, as it can form routing loops.
> Instead one is expected to set the RTs on export from table A so that
> other PEs can import these into the desired table.
> This is where the use of inet.0 is troublesome - so in that case one is
> expected to do the route leaking on all remote ends.
> But you can see the pattern here: advertising is done from the originating
> table and then the "leaking" is supposed to be done on/by the remote end.
>
> adam
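For readers following the RT discussion: when table A is itself a VRF, the decoration adam describes is the standard vrf-export pattern, roughly as below (policy, community, and instance names are all illustrative, not config from this thread). The sticking point here is that inet.0 has no equivalent vrf-export hook for undecorated global routes.

    policy-options {
        policy-statement EXPORT-VRF-A {
            term add-rt {
                then {
                    community add RT-VRF-A;   # attach the route-target on export
                    accept;
                }
            }
        }
        community RT-VRF-A members target:65000:100;
    }
    routing-instances {
        VRF-A {
            instance-type vrf;
            vrf-export EXPORT-VRF-A;
        }
    }

Remote PEs that import target:65000:100 then land the route in whichever table their own vrf-import policy selects, which is the "leaking done on/by the remote end" pattern adam refers to.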
[j-nsp] rib-groups && VPN reflection
Hello all. I figure this topic is fundamental and probably frequently asked/answered, although it's new problem space for me. I thought I'd consult the font of knowledge here to seek any advice.

Environment: MX, JUNOS 15.1F6

Headline requirement: Leak EBGP routes from the global inet.0 into a VPN (in order to implement off-ramp/on-ramp for DDoS protection traffic conditioning).

Experience: The challenge is quite simple on the surface. Use a rib-group directive on the EBGP peer to group inet.0 and VRF.inet.0 together as the import-rib/Adj-RIB-In for the peer. Indeed this works as you would expect, and received routes appear in both inet.0 and VRF.inet.0.

But the problem is that if rpd is also configured with any of:

- IBGP reflection for the inet-vpn family
- EBGP for inet-vpn
- advertise-from-main-vpn-table

then any leaked routes, while present in the VRF, do not get advertised internally to other PEs' VPN routing tables.

The cause seems to be that these features change the mechanics of advertising VPN routes internally. They bring in a requirement for rpd to retain VPN routes in their "native" inet-vpn form, rather than simply consult the origin routing-instances and synthesise on demand, so that the interaction with reflection clients or EBGP peers can be handled. So when these features are enabled, rpd opportunistically switches to a mode where it goes to the trouble of cloning the instance-based vanilla routes as inet-vpn within bgp.l3vpn.0 or equivalent. Indeed, battle-scarred Juniper engineers are probably familiar with this document, which offers counsel on how to maintain uptime in the face of this optimisation gear-shift:

https://www.juniper.net/documentation/en_US/junos/topics/example/bgp-vpn-session-flap-prevention.html

I can understand and appreciate this, even if I might not like it. But the abstraction seems to be incomplete. The method of copying routes to bgp.l3vpn.0 is similar, if not identical, under the hood to the initial rib-group operation I am performing at the route source to leak the original inet.0 route, and that route, as seen in the VRF.inet.0 table, becomes a secondary route. As such, it apparently isn't a candidate for further cloning/copying into bgp.l3vpn.0, and as a consequence the "leaked" route doesn't actually make it into the VPN tables of other PEs.

The document suggests a workaround of maintaining the original route in inet.0, but sadly for my use case the whole premise of the leak operation is ultimately to produce a global-table inet.0 redirect elsewhere, so depending on inet.0 route selection is a bit fragile.

My question to others is: is this a well-known man-trap that I am naively unaware of? Is it simply the case that the best practice of getting reflection off of production VRF-hosting PEs is actually mandatory here, or are others surprised by this apparent feature clash? Can I reasonably expect it to be addressed further down the software road? Or is there another, perhaps better, way of achieving my objective?

Any thoughts appreciated.

-- Adam.
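For concreteness, the leak arrangement described above is roughly the following (rib-group, BGP group, and VRF names are placeholders, not the real config):

    routing-options {
        rib-groups {
            INET-TO-VRF {
                import-rib [ inet.0 DDOS-VRF.inet.0 ];
            }
        }
    }
    protocols {
        bgp {
            group EBGP-TRANSIT {
                family inet {
                    unicast {
                        rib-group INET-TO-VRF;   # received routes land in both tables
                    }
                }
            }
        }
    }

The first table in import-rib is the primary; the copy placed in DDOS-VRF.inet.0 is the secondary route that, per the behaviour described above, never gets cloned onward into bgp.l3vpn.0 once the reflection/EBGP-inet-vpn features are in play.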
[j-nsp] BGP apparent I/O throttling on MX960 (JUNOS 14.1R6)
Hello all. Anyone any experience with situations where "show bgp neighbor X.X.X.X" on the JUNOS CLI produces a small appendix to the usual output stating: "Received and buffered octets: 20"? The figure (20 in this case) seems to vary between invocations, but is usually under 100. Example pseudo-sanitised output is at the end of the mail for anyone interested.

It seems to suggest that rpd completed a short read or is otherwise still waiting for a complete message from the remote peer. In this case, the remote peer was aware of the situation because they monitor their XR speaker's BGP OutQ instrumentation to watch for slow readers. Their observation was a zero-size advertised receive window in TCP preventing their BGP speaker from sending queued messages.

The problem disappeared after several days with no obvious action taken by either party. OutQ on the peer's side returned to zero, and I could see no further messages of partial reception or buffering on our side. Beyond the instrumentation, I could find no obvious evidence of degraded function. The peer's large OutQ was a cause for concern (their export policy to us is a full table), but it went unrealised. No other BGP peer on the box exhibited similar symptoms. NSR is in use (I wondered if replication between REs could slow down the effective TCP receive rate). Link-layer to the peer was reliable and with low latency.

Probably in the X-Files, I realise, but I thought a stab-in-the-dark here might be worthwhile.

-- Adam.

Peer: REMOTE_IP+179 AS REMOTE_AS Local: LOCAL+51007 AS MY_AS
  Description: Peer
  Type: External    State: Established    Flags:
  Last State: EstabSync    Last Event: RecvKeepAlive
  Last Error: Cease
  Export: [ OUT ]    Import: [ IN ]
  Options:
  Options:
  Authentication key is configured
  Holdtime: 90 Preference: 170
  Number of flaps: 1
  Last flap event: Stop
  Error: 'Cease' Sent: 1 Recv: 0
  Peer ID: PEER-ROUTER-ID    Local ID: MY-ROUTER-ID    Active Holdtime: 90
  Keepalive Interval: 30    Group index: 9    Peer index: 0
  BFD: disabled, down
  Local Interface: ae12.0
  NLRI for restart configured on peer: inet-unicast
  NLRI advertised by peer: inet-unicast inet-multicast
  NLRI for this session: inet-unicast
  Peer supports Refresh capability (2)
  Stale routes from peer are kept for: 300
  Peer does not support Restarter functionality
  NLRI that restart is negotiated for: inet-unicast
  NLRI of received end-of-rib markers: inet-unicast
  NLRI of all end-of-rib markers sent: inet-unicast
  Peer supports 4 byte AS extension (peer-as REMOTE_AS)
  Peer does not support Addpath
  Table inet.0 Bit: 10007
    RIB State: BGP restart is complete
    Send state: in sync
    Active prefixes:              203815
    Received prefixes:            595715
    Accepted prefixes:            505370
    Suppressed due to damping:    0
    Advertised prefixes:          5755
  Last traffic (seconds): Received 27    Sent 2    Checked 57
  Input messages:  Total 13670093  Updates 13660814  Refreshes 41  Octets 1307202704
  Output messages: Total 299601    Updates 174126    Refreshes 0   Octets 15932309
  Output Queue[0]: 0
  *Received and buffered octets: 29*

adamc@router> show system connections extensive | find REMOTE
tcp4  0  38  LOCAL.51007  REMOTE.179  ESTABLISHED
    sndsbcc:          38  sndsbmbcnt:      256  sndsbmbmax:   131072
    sndsblowat:     2048  sndsbhiwat:    16384
    rcvsbcc:           0  rcvsbmbcnt:        0  rcvsbmbmax:   131072
    rcvsblowat:        1  rcvsbhiwat:    16384
    proc id: 1873  proc name: rpd
    iss: 3882968990   sndup: 3901286434
    snduna: 3901286434   sndnxt: 3901286472   sndwnd: 32195
    sndmax: 3901286472   sndcwnd: 12397   sndssthresh: 0
    irs: 811065039   rcvup: 2118576537   rcvnxt: 2118576537   rcvadv: 1059075027
    rcvwnd: 16384
    rtt: 1977181859   srtt: 556992   rttv: 388071
    rxtcur: 64000   rxtshift: 0   rtseq: 3901286434   rttmin: 0
    mss: 1404
    flags: REQ_SCALE RCVD_SCALE REQ_TSTMP [0x13e0]
[j-nsp] JUNOS precision-timers for BGP
Does anyone have positive or negative experience with this feature in 14.1, please?

We're currently troubleshooting the consequences of high CPU usage with a number of aggravating factors. Most sensitive to the scarcity of CPU resources, however, is a number of BGP sessions with aggressive timers. Quite often a commit operation seems to make rpd block long enough (or indeed it's already starved out by other processes) to neglect keepalives for these unforgiving BGP sessions, and we end up losing them.

Juniper have recommended that we consider "precision-timers", a global BGP knob which, if I understand it correctly, offloads the crucial BGP session-maintenance functionality to a different rpd thread in order to leave the main thread free to handle config requests etc., not too dissimilar to the session-management separation in openbgpd and friends. The Juniper documentation says this feature is recommended for low hold timers, and from what we can ascertain rpd is able to transition to off-thread session management without a session down/up, which is pretty neat.

I'm aware of PR1044141, which apparently causes pain when used in conjunction with traceoptions, but I'm keen to understand whether others have operational experience.

We're also making inroads into lowering CPU demands through the use of distributed PPM etc., but the regular pattern I tend to see there is that this doesn't fly once the PPM'd protocol has a security knob added, e.g. adding authentication to BFD, VRRP etc.

-- Adam.
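For anyone searching the archives later: as I understand it the knob in question is a single global statement, along the lines of

    protocols {
        bgp {
            precision-timers;
        }
    }

(set protocols bgp precision-timers in set-style). Do check it against your release's documentation; this is my reading of the docs rather than battle-tested config.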
Re: [j-nsp] Multi Core on JUNOS?
On 8 October 2015 at 17:46, Saku Ytti wrote:
>
> Hard step#3 is to make rpd use multiple cores. JNPR seems to choose to
> DIY inside rpd and just launch threads. I personally would like to see
> rpd distributed to multiple OS processes, and capitalise more on
> FreeBSD's memory management and scheduling. But I'm not sure how to
> handle the IPC efficiently without large cost in data duplications
> across processes.

I can imagine that making rpd MT is probably hard to the point of almost not being worth the benefit (with current REs), unless one can adequately break down the problem into divisible chunks of work that are insensitive to execution order. BGP peer route analysis and comparison against import policy and the current RIB might fall into that category, but not without a lot of locking and potential for races.

I think there's more bang for buck in the 64-bit address space change, given Internet FIB table sizes, and I'm quite interested in the developments to make rpd 64-bit clean, which I'm sure are also not insignificant. I notice Mr Tinka already expressed a conservative view on jumping onto a 64-bit rpd, and I can totally understand and appreciate that view. Juniper haven't made this a default on the 14.1R5 cut of code that we're currently testing, so I imagine they're still looking for bleeding-edge feedback to whittle out the potential nasties.

I'm quite intrigued by the tidbit in the Juniper docs, though, that suggests that switching from a 32-bit to a 64-bit rpd is not service affecting, which implies the wait-and-see approach is viable. Or am I totally misunderstanding this?

https://www.juniper.net/documentation/en_US/junos14.1/topics/reference/configuration-statement/routing-edit-system-processes.html

It doesn't say that one needs NSR or any sort of "help" from the backup RE, which might be the first assumption. So I wonder how they manage to get one process to exit and the other to start up with different runtimes, differently sized pointers etc., and somehow share the same in-core RIB and protocol state etc.?

If this really does work, there's probably someone sitting somewhere in Sunnyvale immensely smug and under-stated right now, and if so I'm sure he/she'd eat the MT problem for breakfast!

-- Adam.
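For reference, the statement the linked page documents is, as far as I can tell, the routing daemon's 64-bit switch; in set-style it should look something like

    set system processes routing force-64-bit

though I'm going from the 14.1 documentation rather than from a router I've flipped myself, so treat the exact syntax as unverified.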