> - My transit provider in Sydney uses localpref on their side to designate one session as “primary” and I am not able to change that. But I can and do send traffic out on both links as equal cost.
That's interesting, I haven't had a vendor do that. I typically use MED to prefer one path over another for the same vendor.

> In terms of time it takes to learn a new outbound path, I don’t see this as an issue given the options I have to announce multiple paths over iBGP and use of BFD – this should be possible to make quick by tuning my internal peer configs.

I guess this comes down to the hardware. I was testing with MikroTik routers and found that inserting and deleting routes could take a long time.

A

On 27 February 2018 at 13:12, Rhys Hanrahan <r...@nexusone.com.au> wrote:
> Hi Guys,
>
> Thanks David for confirming BFD is the way to go here. Luckily, I have
> been able to enable BFD on all my transit links so far, so the time to
> detect peer failure has been quick.
>
> And thanks Geoff for your detailed reply. From some off-list discussions,
> I think that I first need to apply some of the configs (like Add-Path) that
> I mentioned originally and see how I go from there, and also need to
> pinpoint with more certainty where the issue is occurring.
>
> I know that I’ve mentioned primary/secondary transit links, but I actually
> _*am*_ announcing all prefixes on all transit links, and I’m only using
> AS Path prepending to try and optimise routing for prefixes that are in VIC
> vs NSW. So it’s not a case of conditionally advertising routes in this
> case. I did also try advertising more specific prefixes (e.g. /22 at NSW
> and /24 in VIC) but I found anecdotally that AS path prepending was faster
> for the inbound traffic to converge during failover.
>
> So in a sense, I _*am*_ talking about MRAI timers, which I totally
> understand is just not a valid discussion to be having in the context of
> the general internet and it’s likely that yes, the outage window I’m seeing
> when a prefix is announced over a new transit path is totally reasonable.
> BUT where I start to run into a problem is that the outcome is still this way
> when I have multiple links with a single transit provider. For example:
>
> - I have a cross-connect directly between one of my transit edge routers
>   and one of their routers.
> - I have another cross-connect directly between another of my transit
>   edge routers and another of their routers (and this is not to mean that I
>   intend this to be a backup path – I send out traffic active/active).
> - Both links are to the same transit provider, in the same POP.
> - I am advertising the same prefixes over both links, no AS path
>   prepending, so the announcements are basically identical.
> - My transit provider in Sydney uses localpref on their side to
>   designate one session as “primary” and I am not able to change that. But I
>   can and do send traffic out on both links as equal cost.
> - As far as the rest of the internet is concerned my prefixes are
>   still being announced from the same transit provider, so there shouldn’t be
>   a need to propagate routing changes beyond my directly adjacent peer and
>   their internal network. This is primarily why I am expecting not to see any
>   impact in this scenario.
> - Given that I have adjusted my MRAI timer down to 0 with my adjacent
>   transit peers, and have BFD enabled, they should be able to switch over to
>   the alternate link fairly quickly.
> - And yet, I see a 20 second outage window even in this scenario when
>   I ping from an external connection into one of my prefixes announced over
>   this transit.
>
> That scenario above is mainly what I am concerned about as I didn’t expect
> much/any service impact in the above scenario, since I would have thought
> the path over the internet in general would remain unchanged up till my
> transit provider’s internal network.
>
> Regarding what you listed as problem b), I totally understand this, and I
> would expect some kind of delay when re-announcing via another transit
> since as you say, this has to propagate through countless upstreams
> throughout the internet - naturally this will take time. It’s good to hear
> you say 20-30 seconds is a good number in terms of getting everyone to
> re-learn routes. That’s really helpful.
>
> In terms of time it takes to learn a new outbound path, I don’t see this
> as an issue given the options I have to announce multiple paths over iBGP
> and use of BFD – this should be possible to make quick by tuning my
> internal peer configs.
>
> Thanks everyone for your experiences and insights. Based on some of the
> replies I got, it seems like it is reasonable to expect that in the
> scenario described in the bullet points above, it’s possible to see very
> little if any forwarding loss. And only once I am forced to advertise via a
> new transit would I expect to see the 20-30 second window as everyone on
> the internet learns a new path. I do need to improve my iBGP convergence
> and actually implement some of the methods I mentioned originally, and
> re-evaluate so as to rule out my iBGP convergence time as the issue I’m
> currently seeing for the scenario in the bullet points above.
>
> Thanks everyone for your help.
>
> Rhys Hanrahan
> Chief Information Officer
> Nexus One Pty Ltd
>
> E: supp...@nexusone.com.au
> P: +61 2 9191 0606
> W: http://www.nexusone.com.au/
> M: PO Box 127, Royal Exchange NSW 1225
> A: Level 10 307 Pitt St, Sydney NSW 2000
>
> From: AusNOG <ausnog-boun...@lists.ausnog.net> on behalf of David
> Hughes <da...@hughes.com.au>
> Date: Tuesday, 27 February 2018 at 9:39 am
> To: Geoff Huston <g...@apnic.net>
> Cc: "ausnog@lists.ausnog.net" <ausnog@lists.ausnog.net>
> Subject: Re: [AusNOG] Best practices on speeding up BGP convergence times
>
> On 26 Feb 2018, at 9:52 pm, Geoff Huston <g...@apnic.net> wrote:
>
> > a) detecting link down quickly
> >
> > You can adjust your BGP session keepalive timers to smaller values and
> > make the session more sensitive to outages as a result. I also thought that
> > these days you can get the interface status to directly map to the session
> > state, but it's been a while since I’ve done this in anger and frankly I
> > have NFC how to do that, even if I used to know! Maybe you are already
> > doing that anyway.
>
> This is the scenario I was talking about (references below). You can
> easily have link on a northbound interface even if the peer isn’t there
> (you hit a layer-2 agg switch on the way for example). If the peer fails
> but you still have link on the interface you’ll be blindly forwarding
> packets to it, even though it’s not there anymore, until the BGP timers
> expire. That was the point of the lightning talk I gave way-back-then.
> Default timers aren’t helpful in this situation.
>
> Fast forward to this decade and you have routing protocols that are
> “BFD-aware” so you have sub-second link failure detection. That allows the
> control plane to pull down the peer session and remove paths to that peer
> from the FIB.
> You can only run BFD if your upstream is as well so you know
> they will dump the prefixes from that peer session as quickly as you will.
> It makes failing over to a secondary link within the same upstream provider
> pretty seamless.
>
> Ref:
>
> http://archive.apnic.net/meetings/21/docs/sigs/routing/routing-pres-hughes-bgp.pdf
>
> http://lists.ausnog.net/pipermail/ausnog/2015-January/029486.html
>
> David
>
> ...
>
> _______________________________________________
> AusNOG mailing list
> AusNOG@lists.ausnog.net
> http://lists.ausnog.net/mailman/listinfo/ausnog
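
To put David's point about BFD versus default BGP timers into rough numbers, here is a minimal Python sketch. It assumes the asynchronous-mode detection time from RFC 5880 (the peer's detect multiplier times the larger of our required receive interval and the peer's desired transmit interval). The function names are made up for illustration, and the 300 ms x 3 BFD settings and the 180 second hold time are example values only, not anything reported in this thread.

```python
def bfd_detection_time_ms(remote_detect_mult, local_required_min_rx_ms,
                          remote_desired_min_tx_ms):
    """Approximate BFD async-mode detection time in milliseconds.

    The peer transmits at the slower of what it wants to send and what
    we are willing to receive; we declare it down after missing
    remote_detect_mult consecutive packets at that interval.
    """
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


def hold_timer_worst_case_ms(hold_time_s):
    """Upper bound on detection when only the BGP hold timer is used."""
    return hold_time_s * 1000


# Illustrative values only (assumptions, not taken from the thread).
bfd_ms = bfd_detection_time_ms(3, 300, 300)
hold_ms = hold_timer_worst_case_ms(180)

print(f"BFD declares the peer down after about {bfd_ms} ms")
print(f"The hold timer alone can take up to {hold_ms} ms")
```

With numbers like these, detection itself is well under a second, so a 20 second window would have to come from somewhere after detection (route withdrawal, best-path recalculation, FIB programming) rather than from noticing the failure.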
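
On the MED versus localpref point at the top of this thread, the following Python sketch shows why a provider-side localpref overrides whatever MED the customer sends. It is a heavily simplified model of the BGP decision process (highest local-pref, then shortest AS path, then lowest MED) for paths learned from a single neighbour AS; origin, eBGP/iBGP preference, IGP cost and router-ID tie-breakers are left out. The `Path` class, the session labels "A" and "B", and all attribute values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    session: str      # hypothetical label for the eBGP session
    local_pref: int   # set by the receiving network's own policy
    as_path_len: int  # grows when the sender prepends
    med: int          # hint sent by the advertising network


def best_path(paths):
    """Pick the preferred path among paths from ONE neighbour AS.

    Simplified order: highest local-pref, then shortest AS path,
    then lowest MED. Later tie-breakers are omitted.
    """
    return min(paths, key=lambda p: (-p.local_pref, p.as_path_len, p.med))


# Provider pins session "A" as primary via local-pref: the MED is never reached.
pinned = [Path("A", 200, 1, 50), Path("B", 100, 1, 0)]

# With equal local-pref, the lower MED on session "B" decides.
neutral = [Path("A", 100, 1, 50), Path("B", 100, 1, 0)]

print(best_path(pinned).session)
print(best_path(neutral).session)
```

In this model MED only steers inbound traffic when the provider leaves local-pref equal on both sessions, which matches the situation described: once the provider has pinned a primary, neither MED nor prepending on the customer side changes their choice.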