Hi Guys,

Thanks David for confirming BFD is the way to go here. Luckily, I have been 
able to enable BFD on all my transit links so far, so the time to detect peer 
failure has been quick.

And thanks Geoff for your detailed reply. From some off-list discussions, I 
think that I first need to apply some of the configs (like Add-Path) that I 
mentioned originally and see how I go from there, and also need to pinpoint 
with more certainty where the issue is occurring.

I know that I’ve mentioned primary/secondary transit links, but I actually _am_ 
announcing all prefixes on all transit links, and I’m only using AS Path 
prepending to try and optimise routing for prefixes that are in VIC vs NSW. So 
I’m not conditionally advertising routes here. I did also try advertising 
more specific prefixes (e.g. /22 at NSW and /24 in VIC), but I found 
anecdotally that inbound traffic converged faster during failover with AS 
path prepending.

So in a sense, I _am_ talking about MRAI timers, which I totally understand is 
just not a valid discussion to be having in the context of the general internet, 
and it’s likely that yes, the outage window I’m seeing when a prefix is 
announced over a new transit path is totally reasonable. BUT where I run into a 
problem is that the outcome is still the same when I have multiple links with a 
single transit provider. For example:


  *   I have a cross-connect directly between one of my transit edge routers and 
one of their routers.
  *   I have another cross-connect directly between another of my transit edge 
routers and another of their routers (and this is not to mean that I intend 
this to be a backup path – I send out traffic active/active).
  *   Both links are to the same transit provider, in the same POP.
  *   I am advertising the same prefixes over both links, no AS path 
prepending, so the announcements are basically identical.
  *   My transit provider in Sydney uses localpref on their side to designate 
one session as “primary” and I am not able to change that. But I can and do 
send traffic out on both links as equal cost.
  *   As far as the rest of the internet is concerned my prefixes are still 
being announced from the same transit provider, so there shouldn’t be a need to 
propagate routing changes beyond my directly adjacent peer and their internal 
network. This is primarily why I am expecting not to see any impact in this 
scenario.
  *   Given that I have adjusted my MRAI timer down to 0 with my adjacent 
transit peers, and have BFD enabled, they should be able to switch over to the 
alternate link fairly quickly.
  *   And yet, I see a 20 second outage window even in this scenario when I 
ping from an external connection into one of my prefixes announced over this 
transit.
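
For anyone wanting to put a number on that outage window rather than eyeball 
the ping output, here is a minimal Python sketch. It assumes the probe results 
have already been collected as (send time in seconds, reply received) pairs; 
the function name and input format are illustrative only and do not come from 
any particular tool.

```python
from typing import Iterable, Tuple


def longest_outage(probes: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest gap in seconds between two successful probes
    that has at least one lost probe in between.

    Each probe is (send_time_seconds, got_reply). Losses before the first
    reply or after the last reply are ignored, because the outage cannot
    be bounded on both sides in those cases.
    """
    longest = 0.0
    last_ok = None          # send time of the most recent successful probe
    lost_since_ok = False   # True once a probe is lost after last_ok
    for sent_at, ok in probes:
        if ok:
            if last_ok is not None and lost_since_ok:
                longest = max(longest, sent_at - last_ok)
            last_ok = sent_at
            lost_since_ok = False
        else:
            lost_since_ok = True
    return longest


# Hypothetical run: one probe per second for 30 seconds,
# with replies missing for probes sent at t=5 through t=24.
sample = [(float(t), not (5 <= t <= 24)) for t in range(30)]
outage_seconds = longest_outage(sample)
```

The result is only accurate to within one probe interval, so a shorter ping 
interval gives a tighter bound on the real forwarding loss.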

That scenario is mainly what I am concerned about, as I didn’t expect much, if 
any, service impact there: I would have thought the path over the internet in 
general would remain unchanged up to my transit provider’s internal network.

Regarding what you listed as problem b), I totally understand this, and I would 
expect some kind of delay when re-announcing via another transit since as you 
say, this has to propagate through countless upstreams throughout the internet 
- naturally this will take time. It’s good to hear you say 20-30 seconds is a 
good number in terms of getting everyone to re-learn routes. That’s really 
helpful.

In terms of time it takes to learn a new outbound path, I don’t see this as an 
issue given the options I have to announce multiple paths over iBGP and use of 
BFD – this should be possible to make quick by tuning my internal peer configs.

Thanks everyone for your experiences and insights. Based on some of the replies 
I got, it seems like it is reasonable to expect that in the scenario described 
in the bullet points above, it’s possible to see very little if any forwarding 
loss. And only once I am forced to advertise via a new transit would I expect 
to see the 20-30 second window as everyone on the internet learns a new path. I 
do need to improve my iBGP convergence and actually implement some of the 
methods I mentioned originally, and re-evaluate so as to rule out my iBGP 
convergence time as the issue I’m currently seeing for the scenario in the 
bullet points above.

Thanks everyone for your help.

Rhys Hanrahan
Chief Information Officer
Nexus One Pty Ltd

E: supp...@nexusone.com.au
P: +61 2 9191 0606
W: http://www.nexusone.com.au/
M: PO Box 127, Royal Exchange NSW 1225
A: Level 10 307 Pitt St, Sydney NSW 2000

From: AusNOG <ausnog-boun...@lists.ausnog.net> on behalf of David Hughes 
<da...@hughes.com.au>
Date: Tuesday, 27 February 2018 at 9:39 am
To: Geoff Huston <g...@apnic.net>
Cc: "ausnog@lists.ausnog.net" <ausnog@lists.ausnog.net>
Subject: Re: [AusNOG] Best practices on speeding up BGP convergence times


On 26 Feb 2018, at 9:52 pm, Geoff Huston <g...@apnic.net> wrote:


a) detecting link down quickly

You can adjust your BGP session keepalive timers to smaller values and make the 
session more sensitive to outages as a result. I also thought that these days 
you can get the interface status to directly map to the session state, but it’s 
been a while since I’ve done this in anger and frankly I have NFC how to do 
that, even if I used to know! Maybe you are already doing that anyway.


This is the scenario I was talking about (references below).  You can easily 
have link on a northbound interface even if the peer isn’t there (you hit a 
layer-2 agg switch on the way for example).  If the peer fails but you still 
have link on the interface you’ll be blindly forwarding packets to it, even 
though it’s not there anymore, until the BGP timers expire.  That was the point 
of the lightning talk I gave way back then.  Default timers aren’t helpful in 
this situation.

Fast forward to this decade and you have routing protocols that are “BFD-aware” 
so you have sub-second link failure detection.  That allows the control plane 
to pull down the peer session and remove paths to that peer from the FIB.  You 
can only run BFD if your upstream does as well, so you know they will dump the 
prefixes from that peer session as quickly as you will.  It makes failing over 
to a secondary link within the same upstream provider pretty seamless.
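
As a rough illustration of the timing difference described above, here is a 
small Python sketch comparing BFD detection time with waiting for a BGP hold 
timer to expire. The BFD calculation follows the asynchronous-mode rule in 
RFC 5880 (the remote detect multiplier times the greater of the local required 
minimum RX interval and the remote desired minimum TX interval). The specific 
numbers are assumptions for illustration only: 300 ms intervals with a 
multiplier of 3, and a 180 second hold time, which is a common vendor default 
rather than anything stated in this thread.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Detection time seen by the local system in BFD asynchronous mode.

    The agreed interval is the slower (larger) of what the local side is
    willing to receive and what the remote side wants to send; the session
    is declared down after remote_detect_mult such intervals with no packet.
    """
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


# Assumed example values, not taken from any real configuration.
bfd_ms = bfd_detection_time_ms(300, 300, 3)

# Without BFD, and with link still up on the interface, a dead peer is only
# noticed when the hold timer expires. 180 s is an assumed default.
assumed_hold_time_ms = 180 * 1000

speedup = assumed_hold_time_ms / bfd_ms
```

With these assumed values the BFD case is sub-second, while the hold-timer 
case can be blind for minutes, which is the gap the lightning talk was about.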


Ref :
http://archive.apnic.net/meetings/21/docs/sigs/routing/routing-pres-hughes-bgp.pdf
http://lists.ausnog.net/pipermail/ausnog/2015-January/029486.html


David
...
_______________________________________________
AusNOG mailing list
AusNOG@lists.ausnog.net
http://lists.ausnog.net/mailman/listinfo/ausnog
