- My transit provider in Sydney uses localpref on their side to
   designate one session as “primary” and I am not able to change that. But I
   can and do send traffic out on both links as equal cost.


That's interesting, I haven't had a vendor do that. I typically use MED to
prefer one path over another for the same vendor.
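
For context on why MED does not help in the localpref case above: in the standard BGP decision process, local preference is compared before AS path length and MED, so a provider-side localpref wins regardless of what the customer sends. Below is a minimal Python sketch of that ordering. It assumes made-up attribute values and a heavily simplified comparison (origin, eBGP vs iBGP, IGP cost and router ID tie-breakers are left out). `Path` and `best_path` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    local_pref: int   # set by the receiving network's own policy
    as_path_len: int  # grows with AS path prepending
    med: int          # hint sent by the neighbouring AS


def best_path(paths):
    """Pick a path using a simplified subset of the BGP decision process.

    Order used here: highest local-pref, then shortest AS path, then
    lowest MED. Real implementations apply more steps than this.
    """
    return min(paths, key=lambda p: (-p.local_pref, p.as_path_len, p.med))


# Hypothetical values: the customer sends a lower MED on session B,
# but the provider has set a higher local-pref on session A.
a = Path("session-a", local_pref=200, as_path_len=1, med=100)
b = Path("session-b", local_pref=100, as_path_len=1, med=10)

print(best_path([a, b]).name)
```

With equal local-pref and AS path length on both sessions, the lower MED would decide instead, which is the case where using MED to steer inbound traffic works.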


In terms of time it takes to learn a new outbound path, I don’t see this as
an issue given the options I have to announce multiple paths over iBGP and
use of BFD – this should be possible to make quick by tuning my internal
peer configs.


I guess this comes down to the hardware. I was testing with MikroTik routers
and found that inserting / deleting routes could take a long time.
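
A rough way to reason about this: failover time is approximately the detection time plus the time the router needs to reprogram its forwarding table. The sketch below assumes BFD detection time is the transmit interval multiplied by the detect multiplier, and models route programming as a fixed per-route cost applied serially. The interval, multiplier, route count and per-route costs are all hypothetical placeholders, not measurements from MikroTik or any other hardware.

```python
def bfd_detect_seconds(interval_ms, multiplier):
    """Approximate BFD detection time: interval times detect multiplier."""
    return interval_ms * multiplier / 1000.0


def failover_seconds(detect_s, routes, per_route_us):
    """Detection time plus a simple serial cost for updating each route."""
    return detect_s + routes * per_route_us / 1_000_000.0


# Hypothetical inputs for illustration only.
detect = bfd_detect_seconds(interval_ms=300, multiplier=3)
fast = failover_seconds(detect, routes=700_000, per_route_us=1)
slow = failover_seconds(detect, routes=700_000, per_route_us=50)

print(f"detection:        {detect:.1f} s")
print(f"fast FIB updates: {fast:.1f} s")
print(f"slow FIB updates: {slow:.1f} s")
```

Under these assumptions, tuning BFD further gains little once the per-route update cost dominates, which matches the point about hardware being the limiting factor.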

A


On 27 February 2018 at 13:12, Rhys Hanrahan <r...@nexusone.com.au> wrote:

> Hi Guys,
>
>
>
> Thanks David for confirming BFD is the way to go here. Luckily, I have
> been able to enable BFD on all my transit links so far, so the time to
> detect peer failure has been quick.
>
>
>
> And thanks Geoff for your detailed reply. From some off-list discussions,
> I think that I first need to apply some of the configs (like Add-Path) that
> I mentioned originally and see how I go from there, and also need to
> pinpoint with more certainty where the issue is occurring.
>
>
>
> I know that I’ve mentioned primary/secondary transit links, but I actually
> _*am*_ announcing all prefixes on all transit links, and I’m only using
> AS Path prepending to try and optimise routing for prefixes that are in VIC
> vs NSW. So it’s not a case of conditionally advertising routes in this
> case. I did also try advertising more specific prefixes (e.g. /22 at NSW
> and /24 in VIC) but I found anecdotally that AS path prepending was faster
> for the inbound traffic to converge during failover.
>
>
>
> So in a sense, I _*am*_ talking about MRAI timers, which I totally
> understand is just not a valid discussion to be having in the context of
> the general internet and it’s likely that yes, the outage window I’m seeing
> when a prefix is announced over a new transit path is totally reasonable.
> BUT where I start to run into a problem is that the outcome is still the same
> when I have multiple links with a single transit provider. For example:
>
>
>
>    - I have cross-connect directly between one of my transit edge routers
>    and one of their routers.
>    - I have another cross-connect directly between another of my transit
>    edge routers and another of their routers (and this is not to mean that I
>    intend this to be a backup path – I send out traffic active/active).
>    - Both links are to the same transit provider, in the same POP.
>    - I am advertising the same prefixes over both links, no AS path
>    prepending, so the announcements are basically identical.
>    - My transit provider in Sydney uses localpref on their side to
>    designate one session as “primary” and I am not able to change that. But I
>    can and do send traffic out on both links as equal cost.
>    - As far as the rest of the internet is concerned my prefixes are
>    still being announced from the same transit provider, so there shouldn’t be
>    a need to propagate routing changes beyond my directly adjacent peer and
>    their internal network. This is primarily why I am expecting not to see any
>    impact in this scenario.
>    - Given that I have adjusted my MRAI timer down to 0 with my adjacent
>    transit peers, and have BFD enabled, they should be able to switchover to
>    the alternate link fairly quickly.
>    - And yet, I see a 20 second outage window even in this scenario when
>    I ping from an external connection into one of my prefixes announced over
>    this transit.
>
>
>
> That scenario above is mainly what I am concerned about as I didn’t expect
> much/any service impact in the above scenario, since I would have thought
> the path over the internet in general would remain unchanged up till my
> transit provider’s internal network.
>
>
>
> Regarding what you listed as problem b) totally understand this, and I
> would expect some kind of delay when re-announcing via another transit
> since as you say, this has to propagate through countless upstreams
> throughout the internet - naturally this will take time. It’s good to hear
> you say 20-30 seconds is a good number in terms of getting everyone to
> re-learn routes. That’s really helpful.
>
>
>
> In terms of time it takes to learn a new outbound path, I don’t see this
> as an issue given the options I have to announce multiple paths over iBGP
> and use of BFD – this should be possible to make quick by tuning my
> internal peer configs.
>
>
>
> Thanks everyone for your experiences and insights. Based on some of the
> replies I got, it seems like it is reasonable to expect that in the
> scenario described in the bullet points above, it’s possible to see very
> little if any forwarding loss. And only once I am forced to advertise via a
> new transit would I expect to see the 20-30 second window as everyone on
> the internet learns a new path. I do need to improve my iBGP convergence
> and actually implement some of the methods I mentioned originally, and
> re-evaluate so as to rule out my iBGP convergence time as the issue I’m
> currently seeing for the scenario in the bullet points above.
>
>
>
> Thanks everyone for your help.
>
>
> Rhys Hanrahan
> Chief Information Officer
> Nexus One Pty Ltd
>
> E: supp...@nexusone.com.au
> P: +61 2 9191 0606
> W: http://www.nexusone.com.au/
> M: PO Box 127, Royal Exchange NSW 1225
> A: Level 10 307 Pitt St, Sydney NSW 2000
>
> *From: *AusNOG <ausnog-boun...@lists.ausnog.net> on behalf of David
> Hughes <da...@hughes.com.au>
> *Date: *Tuesday, 27 February 2018 at 9:39 am
> *To: *Geoff Huston <g...@apnic.net>
> *Cc: *"ausnog@lists.ausnog.net" <ausnog@lists.ausnog.net>
> *Subject: *Re: [AusNOG] Best practices on speeding up BGP convergence
> times
>
>
>
>
>
> On 26 Feb 2018, at 9:52 pm, Geoff Huston <g...@apnic.net> wrote:
>
>
>
>
> a) detecting link down quickly
>
> You can adjust your BGP session keepalive timers to smaller values and
> make the session more sensitive to outages as a result. I also thought that
> these days you can get the interface status to directly map to the session
> state, but it’s been a while since I’ve done this in anger and frankly I
> have NFC how to do that, even if I used to know! Maybe you are already
> doing that anyway.
>
>
>
>
>
> This is the scenario I was talking about (references below).  You can
> easily have link on a northbound interface even if the peer isn’t there
> (you hit a layer-2 agg switch on the way for example).  If the peer fails
> but you still have link on the interface you’ll be blindly forwarding
> packets to it, even though it’s not there anymore, until the BGP timers
> expire.  That was the point of the lightning talk I gave way back then.
> Default timers aren’t helpful in this situation.
>
>
>
> Fast forward to this decade and you have routing protocols that are
> “BFD-aware” so you have sub-second link failure detection.  That allows the
> control plane to pull down the peer session and remove paths to that peer
> from the FIB.  You can only run BFD if your upstream is as well so you know
> they will dump the prefixes from that peer session as quickly as you will.
> It makes failing over to a secondary link within the same upstream provider
> pretty seamless.
>
>
>
>
>
> Ref :
>
> http://archive.apnic.net/meetings/21/docs/sigs/routing/routing-pres-hughes-bgp.pdf
>
> http://lists.ausnog.net/pipermail/ausnog/2015-January/029486.html
>
>
>
>
>
> David
>
> ...
>
> _______________________________________________
> AusNOG mailing list
> AusNOG@lists.ausnog.net
> http://lists.ausnog.net/mailman/listinfo/ausnog
>
>
