Re: BFD for routes learned trough Route-servers in IXPs

2020-09-22 Thread Jared Mauch



> On Sep 22, 2020, at 4:46 AM, Andy Davidson  wrote:
> 
> Hi,
> 
> Douglas Fischer wrote:
>> B) There is any other alternative to that?
> 
> Don't connect to IXPs with very very large and complicated topologies. 
> Connect to local IXPs where the design makes a forwarding plane failure that 
> causes the problem you describe less likely. 

Or don’t use a route server except to bootstrap.  I regularly see issues 
related to them.  I get it’s not easy to peer at an IXP, but IXP peering isn’t 
for everyone as some people might make it sound.  This is why back in the day 
there was a push to require 24x7 staffing of the remote side to ensure it was 
being monitored/supported.

That may no longer apply to many people, but without active monitoring, you 
won’t know what the state of the remote side is.

- Jared

Re: BFD for routes learned trough Route-servers in IXPs

2020-09-22 Thread Andy Davidson
Hi,

Douglas Fischer wrote:
> B) There is any other alternative to that?

Don't connect to IXPs with very very large and complicated topologies. Connect 
to local IXPs where the design makes a forwarding plane failure that causes the 
problem you describe less likely. 

Andy



Re: BFD for routes learned trough Route-servers in IXPs

2020-09-20 Thread Baldur Norddahl
Hello

ARP timeout should be lower than MAC timeout, but usually the default is
the other way around. Which is extremely stupid. To those who do not know
why, let me give a simple example:

Router R1 is connected to switch SW1 with a connection to server SRV:
  R1 <-> SW1 <-> SRV
Router R2 is connected to switch SW2 with a connection to server SRV:
  R2 <-> SW2 <-> SRV

The server is using R1 as default gateway. Traffic is arriving from the
internet through R2 towards the server. The server will however send
replies back through the default gateway at R1. This is a usual case with
redundant routers - only one will be used as a default gateway but traffic
may come from both.

Initially all will be good. But SW2 is only seeing unidirectional traffic
from R2. No traffic goes from SRV to R2 and thus, after some time, SW2 will
expire the MAC learning for SRV. This has the unfortunate result that SW2
will start flooding traffic to SRV out through all ports.

Then after more time has passed, R2 will renew the ARP binding by sending
out an ARP query to SRV. The server will send back an ARP reply to R2. This
packet from SRV to R2 will pass SW2 and thus have the effect of renewing
the MAC binding at SW2 too. The flooding stops and all is well again. Until
the MAC binding expires and the story repeats.

If the MAC timeout is 5 minutes and the ARP timeout is 20 minutes, which is
very common, you will have flooding for 15 minutes out of every 20-minute
interval! Stupid!
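
The arithmetic behind that figure can be sketched in a few lines of Python.
This is only a minimal sketch of the simplified model described above: it
assumes the switch sees a frame from the server only when the router
refreshes its ARP entry, so the MAC entry ages out one MAC timeout after
each refresh and stays missing until the next one. The function names are
made up for illustration.

```python
def flooding_seconds_per_cycle(mac_timeout_s: int, arp_timeout_s: int) -> int:
    """Seconds of unknown-unicast flooding in one ARP refresh cycle.

    Simplified model: the MAC entry is relearned only at each ARP refresh,
    expires mac_timeout_s later, and stays absent until the next refresh.
    """
    return max(0, arp_timeout_s - mac_timeout_s)


def flooding_fraction(mac_timeout_s: int, arp_timeout_s: int) -> float:
    """Share of time spent flooding under the same simplified model."""
    return flooding_seconds_per_cycle(mac_timeout_s, arp_timeout_s) / arp_timeout_s


if __name__ == "__main__":
    # MAC timeout 5 minutes, ARP timeout 20 minutes, as in the example above.
    print(flooding_seconds_per_cycle(300, 1200) // 60, "minutes per cycle")
    print(flooding_fraction(300, 1200))
```

With an ARP timeout at or below the MAC timeout, the same model gives zero
flooding, which is the point of the ordering argument.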

Why have vendors not fixed their defaults for this case?

Regards,

Baldur



On Thu, Sep 17, 2020 at 7:51 AM Saku Ytti  wrote:

> On Wed, 16 Sep 2020 at 23:15, Chriztoffer Hansen
>  wrote:
> > On 16/09/2020 04:01, Ryan Hamel wrote:
>
> > > CoPP is always important, and it's not just Mikrotik's with default low
> > > ARP timeouts.
> > >
> > > Linux - 1 minute
> > > Brocade - 10 minutes
> > > Cumulus  - 18 minutes
> > > BSD distros - 20 minutes
> > > Extreme - 20 minutes
> > Juniper - 20 minutes
> > > HP - 25 minutes
> IOS - 4 hours
>
> Why are these considered (by Ryan) low values? Does low have a
> negative connotation here?
>
> ARP timeout should be lower than MAC timeout, and MAC timeout usually
> is 300 seconds. Anything above 300seconds is probably poor BCP for
> default value, as defaults should interoperate in a somewhat sane
> manner.
> Of course operators are free to configure very high ARP timeout, as
> long as they also remember to equally configure higher MAC timeout.
>
> --
>   ++ytti
>


Re: BFD for routes learned trough Route-servers in IXPs

2020-09-17 Thread Randy Bush
> a) Check if there is anything hindering the evolution of this draft to
> an RFC.

was i unclear?

> the draft passed wglc in 1948.  it is awaiting two
> implementations, as is the wont of the idr wg.

randy


Re: BFD for routes learned trough Route-servers in IXPs

2020-09-17 Thread Paul Timmins

On 9/17/20 1:51 PM, Douglas Fischer wrote:

> But 30 Seconds for an IXP? It does not make any sense!
> Those packets are stealing CPU cycles of the Control Plane of any 
> router in the LAN.


Especially given how some exchanges lock the MAC address of 
participants. You could probably get away with ARP timeouts of a day, or 
even permanent entries with manual clearing when you see a peer go down.


-Paul



Re: BFD for routes learned trough Route-servers in IXPs

2020-09-17 Thread Douglas Fischer
If you look only at normal situations...
12K vs 576K may not amount to very much.

But if you look at the ARP request graphs during a significant topology
change on a big IXP, and also at the CPU-per-process graphs, what I'm
suggesting may become clearer.

I'm talking about good boxes freezing because of that.
Of course CoPP exists to avoid that. But vanilla CoPP configurations
combined with lunatic ARP timeouts cause many day-to-day problems...

So, in this case, the solution would be a BCP with some "MUST"s defining
acceptable rates.

And with that, everyone who doesn't like being woken up at dawn will be
happier (at least on this account).


On Thu, Sep 17, 2020 at 15:07, Saku Ytti  wrote:

> On Thu, 17 Sep 2020 at 20:51, Douglas Fischer 
> wrote:
>
> > Why should we spend CPU cycles on 576K ARP requests a day (2K
> > participants, 5 min ARP timeout)
> > instead of 12K ARP requests a day (2K participants, 4 hours ARP timeout)?
> > I would prefer to use those CPU cycles to process other things like BGP
> > messages, BFD, etc...
>
> I think this communication may not be very communicative.
>
> How many more BGP messages per day can we process if we do 12k ARP
> requests a day instead of 576k? How many more days of DFZ BGP UPDATE
> growth is that?
>
> --
>   ++ytti
>


-- 
Douglas Fernando Fischer
Engº de Controle e Automação


Re: BFD for routes learned trough Route-servers in IXPs

2020-09-17 Thread Saku Ytti
On Thu, 17 Sep 2020 at 20:51, Douglas Fischer  wrote:

> Why should we spend CPU cycles on 576K ARP requests a day (2K participants, 
> 5 min ARP timeout)
> instead of 12K ARP requests a day (2K participants, 4 hours ARP timeout)?
> I would prefer to use those CPU cycles to process other things like BGP 
> messages, BFD, etc...

I think this communication may not be very communicative.

How many more BGP messages per day can we process if we do 12k ARP
requests a day instead of 576k? How many more days of DFZ BGP UPDATE
growth is that?

-- 
  ++ytti


Re: BFD for routes learned trough Route-servers in IXPs

2020-09-17 Thread Douglas Fischer
Well...
My idea with the initial mail was:

a) Check whether anything is hindering the evolution of this draft into an
RFC.

b) Try to make possible something that nowadays could be considered
impossible, like:
   "How to enable the BFD capability on a route-server with 2000 BGP
sessions without crashing the box?"


And maybe:
c) How about suggesting a standard best practice for the ARP timeout at IXPs,
   creating tools to measure the ARP timeout configuration of each
participant, and making this info available through standard protocols?
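
As a rough idea of what such a measurement tool could do, here is a minimal
Python sketch. It assumes someone has already collected, from the peering
LAN, the timestamps (in seconds) at which one participant sent ARP requests
for one given target, and it takes the median gap between them as an
estimate of that participant's refresh interval. The function name and the
input format are hypothetical, and the estimate only approximates the
configured timeout on stacks that re-ARP on a fixed timer.

```python
from statistics import median
from typing import Optional, Sequence


def estimate_arp_refresh_interval(timestamps: Sequence[float]) -> Optional[float]:
    """Median gap, in seconds, between successive ARP requests.

    `timestamps` are observation times of ARP requests from one sender for
    one target. Returns None when fewer than two observations exist.
    """
    if len(timestamps) < 2:
        return None
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps)


if __name__ == "__main__":
    # Made-up observations of a sender refreshing roughly every 30 seconds.
    print(estimate_arp_refresh_interval([0, 30, 61, 90, 120]))
```

The median is used instead of the mean so that a single missed or retried
request does not skew the estimate much.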


On Wed, Sep 16, 2020 at 18:14, Christopher Morrow <
morrowc.li...@gmail.com> wrote:

> On Wed, Sep 16, 2020 at 4:55 PM Randy Bush  wrote:
> >
> > >>> So, I was searching on how to solve that and I found a draft (8th
> > >>> release) with the intention to solve that...
> > >>> https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
> > >>>
> > >>> If understood correctly, the effective implementation of it will
> > >>> depend on new code on any BGP engine that will want to do that check.
> > >>> It is kind of frustrating... At least 10 years after the release of
> > >>> RFC until the refresh os every router involved in IXPs in the world.
> > >>
> > >> you have a better (== easier to implement and deploy) signaling path?
> > >>
> > >> the draft passed wglc in 1948.  it is awaiting two implementations, as
> > >> is the wont of the idr wg.
> > >
> > > I think you also mean to say: "this is actually still a DRAFT and not
> > > an RFC, so really no BGP implementor is beholden to this document,
> > > unless they have coin bearing customers who wish to see this feature
> > > implemented"
> >
> > if i had meant to say that, i probably would have.  no one on this
> > thread has called it anything other than a draft, so i am quite unsure
> > what your point is; and i will not put words in your mouth.
>
> I think the OP said:
> " At least 10 years after the release of RFC
> > >>> until the refresh os every router involved in IXPs in the world."
>
> it's not an rfc yet.
>
> > sadly, these years, vendors do not seem to care a lot about drafts,
> > rfcs, ...  anything which sells.
>
> sure :(
>


-- 
Douglas Fernando Fischer
Engº de Controle e Automação


Re: BFD for routes learned trough Route-servers in IXPs

2020-09-17 Thread Douglas Fischer
About this comparison between CAM-table timeout and ARP-table timeout:
I tend to partially agree with you...

Ethernet is a protocol used very widely, across many different scenarios.
We need to consider the different needs of each type of communication.


For example:
I'm not a big fan of Mikrotik/RouterOS.
But I know they are there and, like it or not, I need to accept that I will
need to deal with them (as a peer or even as an operator).

One of the most common uses of Mikrotik is for HotSpot/Captive Portal.
And for that, an ARP timeout of 30 seconds is perfectly OK!
It is a good way to check whether the end user is still reachable on the
network, and to do the billing based on that.

But 30 seconds for an IXP? It does not make any sense!
Those packets are stealing CPU cycles from the control plane of every router
in the LAN.

Another example:
You suggested equalizing ARP timeout and MAC timeout.
For a campus LAN, with frequent topology changes and hosts being added and
removed all the time?
That is perfect!


But talking about an IXP LAN:
In an ideal scenario, how often should topology changes happen on an IXP?
How often do hosts join or leave an IXP LAN?

Why should we spend CPU cycles on 576K ARP requests a day (2K
participants, 5 min ARP timeout)
instead of 12K ARP requests a day (2K participants, 4 hours ARP timeout)?
I would prefer to use those CPU cycles to process other things like BGP
messages, BFD, etc...
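
For reference, the per-day figures can be reproduced with a small Python
sketch. It assumes the simple model implied above: one ARP request per
participant per timeout period, with no retries and no gratuitous ARP. The
function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def arp_requests_per_day(participants: int, arp_timeout_s: int) -> int:
    """ARP requests per day, assuming one refresh per participant per timeout."""
    return participants * SECONDS_PER_DAY // arp_timeout_s


if __name__ == "__main__":
    five_minutes = 5 * 60
    four_hours = 4 * 3600
    print(arp_requests_per_day(2000, five_minutes))  # 576000
    print(arp_requests_per_day(2000, four_hours))    # 12000
```

Under this model the 5-minute timeout generates 48 times as many requests as
the 4-hour one.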





On Thu, Sep 17, 2020 at 02:54, Saku Ytti  wrote:

> On Wed, 16 Sep 2020 at 23:15, Chriztoffer Hansen
>  wrote:
> > On 16/09/2020 04:01, Ryan Hamel wrote:
>
> > > CoPP is always important, and it's not just Mikrotik's with default low
> > > ARP timeouts.
> > >
> > > Linux - 1 minute
> > > Brocade - 10 minutes
> > > Cumulus  - 18 minutes
> > > BSD distros - 20 minutes
> > > Extreme - 20 minutes
> > Juniper - 20 minutes
> > > HP - 25 minutes
> IOS - 4 hours
>
> Why are these considered (by Ryan) low values? Does low have a
> negative connotation here?
>
> ARP timeout should be lower than MAC timeout, and MAC timeout usually
> is 300 seconds. Anything above 300seconds is probably poor BCP for
> default value, as defaults should interoperate in a somewhat sane
> manner.
> Of course operators are free to configure very high ARP timeout, as
> long as they also remember to equally configure higher MAC timeout.
>
> --
>   ++ytti
>


-- 
Douglas Fernando Fischer
Engº de Controle e Automação


Re: BFD for routes learned trough Route-servers in IXPs

2020-09-17 Thread Robert Raszuk
>
> If the traffic is that important then the public internet is the wrong
> way to transport it.


Nonsense.

It is usually something said by those who do not know how to use the Internet
as a reliable transport between two endpoints.

In your book, what is the Internet good for? Torrents and porn?

>  The internet has convergence times up to multiple minutes.

It does not matter how long it takes to "converge" any single path.

Hint: Consider using multiple disjoint paths and you will see that for the
vast majority of "Internet failures" the connectivity restoration time would
be very close to the RTT between your endpoints.

Rgs,
R.


Re: BFD for routes learned trough Route-servers in IXPs

2020-09-16 Thread Karsten Elfenbein
On Wed, Sep 16, 2020 at 02:57, Douglas Fischer
 wrote:
>
> Time-to-time, in some IXP in the world some issue on the forwarding plane 
> occurs.
> When it occurs, this topic comes back.
>
> The failures are not big enough to drop the BGP sessions between IXP 
> participants and route-servers.
>
> But are enough to prejudice traffic between participants.
>
> And then the problem comes:
> "How can I check if my communication against the NextHop of the routes that I 
> learn from the route-servers are OK?
> If it is not OK, how can I remove it from my FIB?"

If the traffic is that important then the public internet is the wrong
way to transport it. The internet has convergence times up to multiple
minutes. Failures can occur everywhere.
Reacting to these changes comes at a global cost.

> Some other possible causes of this feeling are:
> - ARP Resolution issues
> (CPU protection and lunatic Mikrotiks with 30 seconds ARP timeout is a 
> bombastic recipe)
> - MAC-Address Learning limitations on the transport link of the participants 
> can be a pain in the a..rm.

IXPs can and do limit the MAC addresses allowed on a participant port.
IXPs usually provide a sane config which includes ARP timeouts (which
can be checked, and an ARP sponge helps as well).
The same goes for all the other multicast/broadcast protocols.

>
> So, I was searching on how to solve that and I found a draft (8th release) 
> with the intention to solve that...
> https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
>
> If understood correctly, the effective implementation of it will depend on 
> new code on any BGP engine that will want to do that check.
> It is kind of frustrating... At least 10 years after the release of RFC until 
> the refresh os every router involved in IXPs in the world.
>
>
> Some questions come:
> A) There is anything that we can do to rush this?
> B) There is any other alternative to that?

IXPs are not simple L2 switches anymore; forwarding is done with
LACP/MPLS/VXLAN/... over multiple paths. The fact that A and B can each
reach a route-server does not guarantee that A can reach B.
Using BFD between members might help or might not, as you cannot check
the complete topology below.

The IXP should use BFD and maybe even compare interface counters on
both sides of a link in their infrastructure.

At a past day job, we monitored IXP health by pinging our peers/next-hops
every X minutes and alerted the NOC when there were bigger changes,
such as 10% of the peers/next-hops that had responded before no longer
responding to ICMP.
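
The alerting rule can be sketched roughly as below. This is a minimal Python
sketch of the threshold comparison only, not the tooling we actually ran: it
assumes some other component does the pinging and hands over the set of
next-hops that answered in the previous round and in the current one. The
function names, the default threshold, and the example addresses are
illustrative assumptions.

```python
from typing import Set


def lost_fraction(previously_up: Set[str], currently_up: Set[str]) -> float:
    """Share of next-hops that answered last round but not this round."""
    if not previously_up:
        return 0.0
    return len(previously_up - currently_up) / len(previously_up)


def should_alert(previously_up: Set[str], currently_up: Set[str],
                 threshold: float = 0.10) -> bool:
    """True when at least `threshold` of the previously responsive hops are gone."""
    return lost_fraction(previously_up, currently_up) >= threshold


if __name__ == "__main__":
    # Example addresses from the documentation range, not real peers.
    before = {f"192.0.2.{i}" for i in range(1, 21)}
    now = before - {"192.0.2.3", "192.0.2.7", "192.0.2.9"}
    print(lost_fraction(before, now), should_alert(before, now))
```

Comparing against the previous round, rather than against the full peer
list, keeps peers that never answer ICMP from triggering alerts.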

>
> P.S.1: I gave up of inventing crazy BGP filter polices to test reachability 
> of NextHop. The effectiveness of it can't even be compared to BFD, and almost 
> kill de processing capacity of my router.
>
> P.S.2: IMHO, the biggest downside of those problems is the evasion of 
> route-servers from some participants when issues described  above occurs.

Route-servers have caused some issues in the past, like not propagating the
withdrawal/timeout of prefixes.
Some peers also prefer a more direct relationship.


Re: BFD for routes learned trough Route-servers in IXPs

2020-09-16 Thread Saku Ytti
On Wed, 16 Sep 2020 at 23:15, Chriztoffer Hansen
 wrote:
> On 16/09/2020 04:01, Ryan Hamel wrote:

> > CoPP is always important, and it's not just Mikrotik's with default low
> > ARP timeouts.
> >
> > Linux - 1 minute
> > Brocade - 10 minutes
> > Cumulus  - 18 minutes
> > BSD distros - 20 minutes
> > Extreme - 20 minutes
> Juniper - 20 minutes
> > HP - 25 minutes
IOS - 4 hours

Why are these considered (by Ryan) low values? Does low have a
negative connotation here?

ARP timeout should be lower than MAC timeout, and MAC timeout usually
is 300 seconds. Anything above 300 seconds is probably poor practice for a
default value, as defaults should interoperate in a somewhat sane
manner.
Of course operators are free to configure a very high ARP timeout, as
long as they also remember to configure an equally high MAC timeout.
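
To make the comparison concrete, here is a small Python sketch that checks
the default ARP timeouts mentioned in this thread against a 300-second MAC
timeout. The values are simply the ones quoted by posters here and are not
verified against vendor documentation; the names are made up for
illustration.

```python
MAC_TIMEOUT_S = 300  # the usual MAC timeout mentioned above

# Default ARP timeouts in seconds, as quoted in this thread (unverified).
ARP_DEFAULTS_S = {
    "Mikrotik": 30,
    "Linux": 60,
    "Brocade": 10 * 60,
    "Cumulus": 18 * 60,
    "BSD distros": 20 * 60,
    "Extreme": 20 * 60,
    "Juniper": 20 * 60,
    "HP": 25 * 60,
    "IOS": 4 * 3600,
}


def arp_exceeds_mac(defaults: dict, mac_timeout_s: int = MAC_TIMEOUT_S) -> list:
    """Names whose ARP timeout is longer than the given MAC timeout."""
    return sorted(name for name, arp_s in defaults.items() if arp_s > mac_timeout_s)


if __name__ == "__main__":
    for name in arp_exceeds_mac(ARP_DEFAULTS_S):
        print(f"{name}: ARP {ARP_DEFAULTS_S[name]} s > MAC {MAC_TIMEOUT_S} s")
```

By this measure only the two shortest defaults in the list stay at or below
the MAC timeout.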

-- 
  ++ytti


Re: BFD for routes learned trough Route-servers in IXPs

2020-09-16 Thread Christopher Morrow
On Wed, Sep 16, 2020 at 4:55 PM Randy Bush  wrote:
>
> >>> So, I was searching on how to solve that and I found a draft (8th release)
> >>> with the intention to solve that...
> >>> https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
> >>>
> >>> If understood correctly, the effective implementation of it will depend on
> >>> new code on any BGP engine that will want to do that check.
> >>> It is kind of frustrating... At least 10 years after the release of RFC
> >>> until the refresh os every router involved in IXPs in the world.
> >>
> >> you have a better (== easier to implement and deploy) signaling path?
> >>
> >> the draft passed wglc in 1948.  it is awaiting two implementations, as
> >> is the wont of the idr wg.
> >
> > I think you also mean to say: "this is actually still a DRAFT and not
> > an RFC, so really no BGP implementor is beholden to this document,
> > unless they have coin bearing customers who wish to see this feature
> > implemented"
>
> if i had meant to say that, i probably would have.  no one on this
> thread has called it anything other than a draft, so i am quite unsure
> what your point is; and i will not put words in your mouth.

I think the OP said:
" At least 10 years after the release of RFC
> >>> until the refresh os every router involved in IXPs in the world."

it's not an rfc yet.

> sadly, these years, vendors do not seem to care a lot about drafts,
> rfcs, ...  anything which sells.

sure :(


Re: BFD for routes learned trough Route-servers in IXPs

2020-09-16 Thread Randy Bush
>>> So, I was searching on how to solve that and I found a draft (8th release)
>>> with the intention to solve that...
>>> https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
>>>
>>> If understood correctly, the effective implementation of it will depend on
>>> new code on any BGP engine that will want to do that check.
>>> It is kind of frustrating... At least 10 years after the release of RFC
>>> until the refresh os every router involved in IXPs in the world.
>>
>> you have a better (== easier to implement and deploy) signaling path?
>>
>> the draft passed wglc in 1948.  it is awaiting two implementations, as
>> is the wont of the idr wg.
> 
> I think you also mean to say: "this is actually still a DRAFT and not
> an RFC, so really no BGP implementor is beholden to this document,
> unless they have coin bearing customers who wish to see this feature
> implemented"

if i had meant to say that, i probably would have.  no one on this
thread has called it anything other than a draft, so i am quite unsure
what your point is; and i will not put words in your mouth.

sadly, these years, vendors do not seem to care a lot about drafts,
rfcs, ...  anything which sells.

randy


Re: BFD for routes learned trough Route-servers in IXPs

2020-09-16 Thread Christopher Morrow
On Tue, Sep 15, 2020 at 9:40 PM Randy Bush  wrote:
>
> > So, I was searching on how to solve that and I found a draft (8th release)
> > with the intention to solve that...
> > https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
> >
> > If understood correctly, the effective implementation of it will depend on
> > new code on any BGP engine that will want to do that check.
> > It is kind of frustrating... At least 10 years after the release of RFC
> > until the refresh os every router involved in IXPs in the world.
>
> you have a better (== easier to implement and deploy) signaling path?
>
> the draft passed wglc in 1948.  it is awaiting two implementations, as
> is the wont of the idr wg.

I think you also mean to say: "this is actually still a DRAFT and not
an RFC, so really no BGP implementor is beholden to this document,
unless they have coin bearing customers who wish to see this feature
implemented"


BFD for routes learned trough Route-servers in IXPs

2020-09-16 Thread Chriztoffer Hansen

On 16/09/2020 04:01, Ryan Hamel wrote:
> CoPP is always important, and it's not just Mikrotik's with default low
> ARP timeouts.
> 
> Linux - 1 minute
> Brocade - 10 minutes
> Cumulus  - 18 minutes
> BSD distros - 20 minutes
> Extreme - 20 minutes

Juniper - 20 minutes

> HP - 25 minutes

-- 
Chriztoffer




Re: BFD for routes learned trough Route-servers in IXPs

2020-09-16 Thread Nick Hilliard

Ryan Hamel wrote on 16/09/2020 03:01:

> Install a route optimizer that constantly pings next hops


or if you want a more reliable IXP experience, don't install a route 
optimiser and if you do, don't make it ping next-hops.


- you're not guaranteed that the icmp reply back to the route optimiser 
will follow the forward path.


- you are guaranteed that icmp is heavily deprioritised on ixp routers

- the busier the IXP, the busier the control planes of all the IXP 
routers you're going to ping, and the more likely they are to drop your 
ping packets. This will lead to greater route churn. If this approach is 
widely deployed it will lead to wider-scale routing oscillations due to 
control plane mismanagement.


- route optimisers are associated with serious bgp leakage issues. if 
you're doing this at an IXP, the danger is significantly magnified 
because bi-lat peering sessions rarely, if ever, implement prefix filtering.


It is true that IXPs occasionally see forwarding plane failures.  These 
tend to be pretty unusual these days.


Be careful about optimising edge cases like this.  You'll often end up 
introducing new failure modes which may be more serious and which may 
occur more regularly.


Nick


Re: BFD for routes learned trough Route-servers in IXPs

2020-09-16 Thread Zbyněk Pospíchal
Hi,

In some IXPs, getting BFD-protected BGP sessions with their
route-servers is possible. However, it is usually optional, so there is
no way to discover which of your MLPA peering partners have their
sessions protected the same way and which don't.

You can also ask peers you have a session with to enable BFD there. If
they run carrier-grade border routers connected to IXP switches with just
fibers, it works pretty well.

So just try to talk with your peers about BFD.

-- 
Best Regards,

Zbyněk Pospíchal




On 16.09.20 at 2:55, Douglas Fischer wrote:
> Time-to-time, in some IXP in the world some issue on the forwarding
> plane occurs.
> When it occurs, this topic comes back.
> 
> The failures are not big enough to drop the BGP sessions between IXP
> participants and route-servers.
> 
> But are enough to prejudice traffic between participants.
> 
> And then the problem comes:
> "How can I check if my communication against the NextHop of the routes
> that I learn from the route-servers are OK?
> If it is not OK, how can I remove it from my FIB?"
> 
> Some other possible causes of this feeling are:
> - ARP Resolution issues
> (CPU protection and lunatic Mikrotiks with 30 seconds ARP timeout is a
> bombastic recipe)
> - MAC-Address Learning limitations on the transport link of the
> participants can be a pain in the a..rm.
> 
> 
> So, I was searching on how to solve that and I found a draft (8th
> release) with the intention to solve that...
> https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
> 
> If understood correctly, the effective implementation of it will depend
> on new code on any BGP engine that will want to do that check.
> It is kind of frustrating... At least 10 years after the release of RFC
> until the refresh os every router involved in IXPs in the world.
> 
> 
> Some questions come:
> A) There is anything that we can do to rush this?
> B) There is any other alternative to that?
> 
> 
> P.S.1: I gave up of inventing crazy BGP filter polices to test
> reachability of NextHop. The effectiveness of it can't even be compared
> to BFD, and almost kill de processing capacity of my router.
> 
> P.S.2: IMHO, the biggest downside of those problems is the evasion of
> route-servers from some participants when issues described  above occurs.




Re: BFD for routes learned trough Route-servers in IXPs

2020-09-15 Thread Ryan Hamel
> "How can I check if my communication against the NextHop of the routes that I 
> learn from the route-servers are OK? If it is not OK, how can I remove it 
> from my FIB?"

Install a route optimizer that constantly pings next hops; when the drop 
threshold is met, remove the routes. No one is going to open BFD to whole 
subnets, especially those they don't have peering agreements with, making this 
pointless.

> - ARP Resolution issues (CPU protection and lunatic Mikrotiks with 30 seconds 
> ARP timeout is a bombastic recipe)

CoPP is always important, and it's not just Mikrotiks with low default ARP 
timeouts.

Linux - 1 minute
Brocade - 10 minutes
Cumulus - 18 minutes
BSD distros - 20 minutes
Extreme - 20 minutes
HP - 25 minutes

> - MAC-Address Learning limitations on the transport link of the participants 
> can be a pain in the a..rm.

As you said, this issue doesn't seem important enough to warrant significant 
action. For transport, colo a switch that can handle BGP announcements, 
routes, and ARPs, then transport that across with only 2 MACs and internal 
point-to-point IP assignments.

Ryan

On Sep 15 2020, at 5:55 pm, Douglas Fischer  wrote:
> Time-to-time, in some IXP in the world some issue on the forwarding plane 
> occurs.
> When it occurs, this topic comes back.
>
> The failures are not big enough to drop the BGP sessions between IXP 
> participants and route-servers.
>
> But are enough to prejudice traffic between participants.
>
> And then the problem comes:
> "How can I check if my communication against the NextHop of the routes that I 
> learn from the route-servers are OK?
> If it is not OK, how can I remove it from my FIB?"
>
> Some other possible causes of this feeling are:
> - ARP Resolution issues
> (CPU protection and lunatic Mikrotiks with 30 seconds ARP timeout is a 
> bombastic recipe)
> - MAC-Address Learning limitations on the transport link of the participants 
> can be a pain in the a..rm.
>
>
> So, I was searching on how to solve that and I found a draft (8th release) 
> with the intention to solve that...
> https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
>
>
> If understood correctly, the effective implementation of it will depend on 
> new code on any BGP engine that will want to do that check.
> It is kind of frustrating... At least 10 years after the release of RFC until 
> the refresh os every router involved in IXPs in the world.
>
>
> Some questions come:
> A) There is anything that we can do to rush this?
> B) There is any other alternative to that?
>
>
> P.S.1: I gave up of inventing crazy BGP filter polices to test reachability 
> of NextHop. The effectiveness of it can't even be compared to BFD, and almost 
> kill de processing capacity of my router.
>
> P.S.2: IMHO, the biggest downside of those problems is the evasion of 
> route-servers from some participants when issues described above occurs.

Re: BFD for routes learned trough Route-servers in IXPs

2020-09-15 Thread Randy Bush
> So, I was searching on how to solve that and I found a draft (8th release)
> with the intention to solve that...
> https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08
> 
> If understood correctly, the effective implementation of it will depend on
> new code on any BGP engine that will want to do that check.
> It is kind of frustrating... At least 10 years after the release of RFC
> until the refresh os every router involved in IXPs in the world.

you have a better (== easier to implement and deploy) signaling path?

the draft passed wglc in 1948.  it is awaiting two implementations, as
is the wont of the idr wg.

randy


BFD for routes learned trough Route-servers in IXPs

2020-09-15 Thread Douglas Fischer
From time to time, at some IXP in the world, an issue occurs on the
forwarding plane.
When it occurs, this topic comes back.

The failures are not big enough to drop the BGP sessions between IXP
participants and route-servers.

But they are enough to harm traffic between participants.

And then the problem comes:
"How can I check whether my communication with the NextHop of the routes
that I learn from the route-servers is OK?
If it is not OK, how can I remove it from my FIB?"

Some other possible causes of this feeling are:
- ARP resolution issues
(CPU protection and lunatic Mikrotiks with a 30-second ARP timeout are an
explosive recipe)
- MAC-address learning limitations on the transport link of the
participants can be a pain in the a..rm.


So, I was searching for a way to solve that and I found a draft (8th
revision) intended to solve it...
https://tools.ietf.org/html/draft-ietf-idr-rs-bfd-08

If I understood correctly, an effective implementation of it will depend on
new code in every BGP engine that wants to do that check.
It is kind of frustrating... At least 10 years from the release of the RFC
until every router involved in IXPs in the world has been refreshed.


Some questions come up:
A) Is there anything that we can do to rush this?
B) Is there any other alternative to that?


P.S.1: I gave up on inventing crazy BGP filter policies to test reachability
of the NextHop. Their effectiveness can't even be compared to BFD, and they
almost kill the processing capacity of my router.

P.S.2: IMHO, the biggest downside of those problems is that some
participants abandon the route-servers when the issues described above occur.