Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-11 Thread adamv0025
> Rob Foehl
> Sent: Tuesday, November 10, 2020 6:26 PM
> 
> On Tue, 10 Nov 2020, Jeffrey Haas wrote:
> 
> > The thing to remember is that even though you're not getting a given
afi/safi
> as front-loaded as you want (absolute front of queue), as soon as we have
> routes for that priority they're dispatched accordingly.
> 
> Right, that turns out to be the essential issue -- the output queues
actually are
> working as configured, but the AFI/SAFI routes relevant to a higher
priority
> queue arrive so late in the process that it's basically irrelevant whether
they
> get to cut in line at that point.  Certainly wasn't observable to human
eyes, had
> to capture the traffic to verify.
> 
I agree -- if the priority of route processing is to be user
controllable/selectable, then it needs to apply end to end, i.e. RX,
processing, and TX.
 

> > Full table walks to populate the queues take some seconds to several
minutes
> depending on the scale of the router.  In the absence of prioritization,
> something like the evpn routes might not go out for most of a minute
rather
> than getting delayed some number of seconds until the rib walker has
reached
> that table.
> 
> Ah, maybe this is the sticking point: on a route reflector with an
> RE-S-X6-64 carrying ~10M inet routes and ~10K evpn routes, a new session
> toward an RR client PE needing to be sent ~1.6M inet routes (full table,
add-
> path 2) and maybe ~3K evpn routes takes between 11-17 minutes to get
> through the initial batch.  The evpn routes only arrive at the tail end of
that,
> and may only preempt around 1000 inet routes in the output queues, as
> confirmed by TAC.
> 
> I have some RRs that tend toward the low end of that range and some that
tend
> toward the high end -- and not entirely sure why in either case -- but
that
> timing is pretty consistent overall, and pretty horrifying.  I could
almost live
> with "most of a minute", but this is not that.
> 
Well, regardless of the issue at hand, I urge you to use separate RRs for
distribution of Internet prefixes and separate ones for VPN (L2/L3) prefixes.
Not only might it *address your problem, it's also much safer, since the
probability of a malformed message arriving via the Internet (e.g. some
university doing experiments) is much higher than of one being originated by
your own PEs.

*it won't fully address your issue, because PEs on the receiving end will
still have the broken prioritization. 
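
For illustration, a minimal sketch of that split on a PE -- one group toward
RRs that carry only Internet unicast, another toward RRs that carry only the
VPN/EVPN families (group names and addresses are made up; adjust the AFI mix
to taste):

set protocols bgp group RR-INET type internal local-address 192.0.2.1
set protocols bgp group RR-INET family inet unicast
set protocols bgp group RR-INET neighbor 192.0.2.11
set protocols bgp group RR-INET neighbor 192.0.2.12
set protocols bgp group RR-VPN type internal local-address 192.0.2.1
set protocols bgp group RR-VPN family inet-vpn unicast
set protocols bgp group RR-VPN family evpn signaling
set protocols bgp group RR-VPN neighbor 192.0.2.21
set protocols bgp group RR-VPN neighbor 192.0.2.22

That way a malformed UPDATE learned from the Internet never touches the
sessions carrying your VPN/EVPN state.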
 


> 
> [on the topic of route refreshes]
> 
> > The intent of the code is to issue the minimum set of refreshes for new
> configuration.  If it's provably not minimum for a given config, there
should be
> a PR on that.
> 
> I'm pretty sure that much is working as intended, given what is actually
> sent -- this issue is the time spent walking other RIBs that have no
> bearing on what's being refreshed.
> 
This is a notorious case actually, again probably because of missing state.
I ran into an issue with 2k VRFs plus a VRF containing Internet routes:
say, after adding the 2001st VRF it would take up to 10 minutes for routes
already present in VPNv4 on the local PE to actually make it into the newly
configured VRF (directly connected prefixes and static routes appeared
instantly).
  

> > The cost of the refresh in getting routes sent to you is another
artifact of "we
> don't keep that state" - at least in that configuration.  This is a
circumstance
> where family route-target (RT-Constrain) may help.  You should find when
> using that feature that adding a new VRF with support for that feature
results in
> the missing routes arriving quite fast - we keep the state.
> 
> I'd briefly looked at RT-Constrain, but wasn't convinced it'd be useful
> here since disinterested PEs only have to discard at most ~10K EVPN routes
> at present.  Worth revisiting that assessment?
> 
It would definitely save some cycles and I'd say it's worth implementing.
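
For reference, a minimal sketch of enabling it -- family route-target added
alongside the existing VPN families on both ends of the RR-client sessions
(group names are made up; RT-Constrain only takes effect once both peers
negotiate the family):

set protocols bgp group TO-RR family inet-vpn unicast
set protocols bgp group TO-RR family evpn signaling
set protocols bgp group TO-RR family route-target
set protocols bgp group RR-CLIENTS family inet-vpn unicast
set protocols bgp group RR-CLIENTS family evpn signaling
set protocols bgp group RR-CLIENTS family route-target

With that in place the RR should only send a PE the VPN/EVPN routes whose
route targets the PE actually imports, which is where the "we keep the state"
behaviour Jeff describes comes from.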


adam 



Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-10 Thread Rob Foehl

On Tue, 10 Nov 2020, Robert Raszuk wrote:


But what seems weird is the last statement: 

"This has problems with blackholing traffic for long periods in several
cases,..." 

We as an industry solved this problem many years ago, by clearly
decoupling connectivity restoration from protocol convergence. 


Fundamentally, yes -- but not for EVPN DF elections.  Each PE making its 
own decisions about who wins without any round-trip handshake agreement is 
the root of the problem, at least when coupled with all of the fun that 
comes with layer 2 flooding.


There's also no binding between whether a PE has actually converged and 
when it brings up IRBs and starts announcing those routes, which leads to 
a different sort of blackholing.  Or in the single-active case, whether 
the IRB should even be brought up at all, which leads to some really dumb 
traffic paths.  (Think layer 3 via P -> inactive PE -> same P, different 
encapsulation -> active PE -> layer 2 segment, for an example.)



I think this would be the recommended direction, rather than mangling the BGP
code to optimize here and at the same time causing new, maybe more severe,
issues somewhere else. Sure, per-SAFI refresh should be the norm, but I don't
think this is the main issue here. 


Absolutely.  The reason for the concern here is that the output queue 
priorities would be sufficient to work around the more fundamental flaws, 
if not for the fact that they're largely ineffective in this exact case.


-Rob


Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-10 Thread Rob Foehl

On Tue, 10 Nov 2020, Gert Doering wrote:


Can you do the EVPN routes on a separate session (different loopback on
both ends, dedicated to EVPN-afi-only BGP)?  Or separate RRs?

Yes, this is not what you're asking, just a wild idea to make life
better :-)


Not that wild -- I've already been pinning up EVPN-only sessions between 
adjacent PEs to smooth out the DF elections where possible.  Discrete 
sessions over multiple loopback addresses also work, at the cost of extra 
complexity.
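
For completeness, a minimal sketch of one such dedicated EVPN-only session
over a second loopback address (addresses and group name are made up; the
extra loopback still has to be reachable through the IGP):

set interfaces lo0 unit 0 family inet address 192.0.2.1/32
set interfaces lo0 unit 0 family inet address 198.51.100.1/32
set protocols bgp group EVPN-ONLY type internal local-address 198.51.100.1
set protocols bgp group EVPN-ONLY family evpn signaling
set protocols bgp group EVPN-ONLY neighbor 198.51.100.2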


At some point that starts to look like giving up on RRs, though -- which 
I'd rather avoid, they're kinda useful :)


-Rob


Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-10 Thread Robert Raszuk
>
> Can you do the EVPN routes on a separate session (different loopback on
> both ends, dedicated to EVPN-afi-only BGP)?


Separate sessions would help if the TCP socket were the real issue, but
here it clearly is not.


> Or separate RRs?
>

Sure, that may help. In fact even a separate RPD daemon on the same box may
help :)

But what seems weird is the last statement:

"This has problems with blackholing traffic for long periods in several
cases,..."

We as an industry solved this problem many years ago, by clearly
decoupling connectivity restoration from protocol convergence.

IMO protocols can take as long as they like to "converge" after a bad or good
network event, yet connectivity restoration upon any network event within a
domain (RRs were brought up as an example) should be a max of 100s of ms.
Clearly sub-second.

How:

- The RIB tracks next hops; when they go down (known via fast IGP flooding)
or their metric changes, paths with such a next hop are either removed or
best path is rerun.
- The data plane has precomputed backup paths, and switchover happens in PIC
fashion, in parallel to and free of any control-plane stress (see the sketch
below).
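
On the Junos side, the data-plane half of that is roughly BGP PIC edge. A
minimal sketch, assuming the protect core knob is supported on the platform
and release in use:

set routing-options protect core

That precomputes backup paths for BGP prefixes so the FIB can switch over on
a next-hop failure while BGP itself is still reconverging; the control-plane
half (next-hop liveness) comes from the IGP and next-hop tracking rather than
from BGP timers.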

I think this would be the recommended direction, rather than mangling the
BGP code to optimize here and at the same time causing new, maybe more
severe, issues somewhere else. Sure, per-SAFI refresh should be the norm,
but I don't think this is the main issue here.

Thx,
R.


Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-10 Thread Gert Doering
Hi,

On Tue, Nov 10, 2020 at 01:26:09PM -0500, Rob Foehl wrote:
> Ah, maybe this is the sticking point: on a route reflector with an 
> RE-S-X6-64 carrying ~10M inet routes and ~10K evpn routes, a new session 
> toward an RR client PE needing to be sent ~1.6M inet routes (full table, 
> add-path 2) and maybe ~3K evpn routes takes between 11-17 minutes to get 
> through the initial batch.  The evpn routes only arrive at the tail end of 
> that, and may only preempt around 1000 inet routes in the output queues, 
> as confirmed by TAC.

Can you do the EVPN routes on a separate session (different loopback on
both ends, dedicated to EVPN-afi-only BGP)?  Or separate RRs?

Yes, this is not what you're asking, just a wild idea to make life
better :-)

gert
-- 
"If was one thing all people took for granted, was conviction that if you 
 feed honest figures into a computer, honest figures come out. Never doubted 
 it myself till I met a computer with a sense of humor."
 Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany g...@greenie.muc.de




Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-10 Thread Rob Foehl

On Tue, 10 Nov 2020, Jeffrey Haas wrote:


The thing to remember is that even though you're not getting a given afi/safi 
as front-loaded as you want (absolute front of queue), as soon as we have 
routes for that priority they're dispatched accordingly.


Right, that turns out to be the essential issue -- the output queues 
actually are working as configured, but the AFI/SAFI routes relevant to a 
higher priority queue arrive so late in the process that it's basically 
irrelevant whether they get to cut in line at that point.  Certainly 
wasn't observable to human eyes, had to capture the traffic to verify.



Full table walks to populate the queues take some seconds to several minutes 
depending on the scale of the router.  In the absence of prioritization, 
something like the evpn routes might not go out for most of a minute rather 
than getting delayed some number of seconds until the rib walker has reached 
that table.


Ah, maybe this is the sticking point: on a route reflector with an 
RE-S-X6-64 carrying ~10M inet routes and ~10K evpn routes, a new session 
toward an RR client PE needing to be sent ~1.6M inet routes (full table, 
add-path 2) and maybe ~3K evpn routes takes between 11-17 minutes to get 
through the initial batch.  The evpn routes only arrive at the tail end of 
that, and may only preempt around 1000 inet routes in the output queues, 
as confirmed by TAC.


I have some RRs that tend toward the low end of that range and some that 
tend toward the high end -- and not entirely sure why in either case -- 
but that timing is pretty consistent overall, and pretty horrifying.  I 
could almost live with "most of a minute", but this is not that.


This has problems with blackholing traffic for long periods in several 
cases, but the consequences for DF elections are particularly disastrous, 
given that they make up their own minds based on received state without 
any affirmative handshake: the only possible behaviors are discarding or 
looping traffic for every ethernet segment involved until the routes 
settle, depending on whether the PE involved believes it's going to win 
the election and how soon.  Setting extremely long 20 minute DF election 
hold timers is currently the least worst "solution", as losing traffic for 
up to 20 minutes is preferable to flooding a segment into oblivion -- but 
only just.


I wouldn't be nearly as concerned with this if we weren't taking 15-20 
minute outages every time anything changes on one of the PEs involved...



[on the topic of route refreshes]


The intent of the code is to issue the minimum set of refreshes for new 
configuration.  If it's provably not minimum for a given config, there should 
be a PR on that.


I'm pretty sure that much is working as intended, given what is actually 
sent -- this issue is the time spent walking other RIBs that have no 
bearing on what's being refreshed.



The cost of the refresh in getting routes sent to you is another artifact of "we 
don't keep that state" - at least in that configuration.  This is a circumstance 
where family route-target (RT-Constrain) may help.  You should find when using that 
feature that adding a new VRF with support for that feature results in the missing routes 
arriving quite fast - we keep the state.


I'd briefly looked at RT-Constrain, but wasn't convinced it'd be useful 
here since disinterested PEs only have to discard at most ~10K EVPN routes 
at present.  Worth revisiting that assessment?


-Rob




Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-10 Thread Jeffrey Haas via juniper-nsp
Rob,


> On Nov 9, 2020, at 9:53 PM, Rob Foehl  wrote:
> 
>> An immense amount of work in the BGP code is built around the need to not 
>> have to keep full state on EVERYTHING.  We're already one of the most 
>> stateful BGP implementations on the planet.  Many times that helps us, 
>> sometimes it doesn't.
>> 
>> But as a result of such designs, for certain kinds of large work it is 
>> necessary to have a consistent work list and build a simple iterator on 
>> that.  One of the more common patterns that is impacted by this is the walk 
>> of the various routing tables.  As noted, we start roughly at inet.0 and go 
>> forward based on internal table order.
> 
> Makes sense, but also erases the utility of output queue priorities when
> multiple tables are involved.  Is there any feasibility of moving the RIB
> walking in the direction of more parallelism, or at least something like
> round robin between tables, without incurring too much overhead / bug
> surface / et cetera?

Recent RPD work has been done toward introducing multiple threads of execution 
on the BGP pipeline.  (Sharding.)  The output queue work is still applicable 
here.

The thing to remember is that even though you're not getting a given afi/safi 
as front-loaded as you want (absolute front of queue), as soon as we have 
routes for that priority they're dispatched accordingly.

Full table walks to populate the queues take some seconds to several minutes 
depending on the scale of the router.  In the absence of prioritization, 
something like the evpn routes might not go out for most of a minute rather 
than getting delayed some number of seconds until the rib walker has reached 
that table.

Perfect?  No. Better, yes.  It was a tradeoff of complexity vs. perfect 
queueing behavior.

> 
>> The primary challenge for populating the route queues in user desired orders 
>> is to move that code out of the pattern that is used for quite a few other 
>> things.  While you may want your evpn routes to go first, you likely don't 
>> want route resolution which is using earlier tables to be negatively 
>> impacted.  Decoupling the iterators for the overlapping table impacts is 
>> challenging, at best.  Once we're able to achieve that, the user 
>> configuration becomes a small thing.
> 
> I'm actually worried that if the open ER goes anywhere, it'll result in
> the ability to specify a table order only, and that's an awfully big
> hammer when what's really needed is the equivalent of the output queue
> priorities covering the entire process.  Some of these animals are more
> equal than others.

Which is why there's 16 queues to work with.  In my usual presentation on this 
feature, "why 16?".  The answer is "we needed at least the usual 3 
(low/medium/high), but then users would fight over arbitrary arrangement of 
those among different tables, and you can't do absolute prioritization of those 
per table because VPN may be least priority for some people and Internet 
highest or vice versa... and also, it should be less than 32 based on available 
bits in the data structure".

Yes, some more equal than others.  Hence the flexibility.

Once there's an ability to adjust the walker, it wouldn't impact the use of the 
queues.  It simply would make sure that things were prioritized earlier.

And, as noted above, we're continuing threading work.  At some point the tables 
may gain additional levels of independence which would obviate an explicit 
feature... maybe.  The core observation I make for a lot of this stuff is 
"There's always too much work to do".

> 
>> I don't recall seeing the question about the route refreshes, but I can 
>> offer a small bit of commentary: The CLI for our route refresh isn't as 
>> fine-grained as it could be.  The BGP extension for route refresh permits 
>> per afi/safi refreshing and honestly, we should expose that to the user.  I 
>> know I flagged this for PLM at one point in the past.
> 
> The route refresh issue mostly causes trouble when bringing new PEs into
> existing instances, and is presumably a consequence of the same behavior:
> the refresh message includes the correct AFI/SAFI, but the remote winds up
> walking every RIB before it starts emitting routes for the requested
> family (and no others).  The open case for the output queue issue has a
> note from 9/2 wherein TAC was able to reproduce this behavior and collect
> packet captures of both the specific refresh message and the long period
> of silence before any routes were sent.

The intent of the code is to issue the minimum set of refreshes for new 
configuration.  If it's provably not minimum for a given config, there should 
be a PR on that.

The cost of the refresh in getting routes sent to you is another artifact of 
"we don't keep that state" - at least in that configuration.  This is a 
circumstance where family route-target (RT-Constrain) may help.  You should 
find when using that feature that adding a new VRF with support for that
feature results in the missing routes arriving quite fast - we keep the state.

Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-09 Thread Rob Foehl

On Mon, 9 Nov 2020, Jeffrey Haas wrote:


As the source of this particular bit of difficulty, a bit of explanation for 
why it simply wasn't done when the initial feature was authored.


Much appreciated -- the explanation, anyway ;)


An immense amount of work in the BGP code is built around the need to not have 
to keep full state on EVERYTHING.  We're already one of the most stateful BGP 
implementations on the planet.  Many times that helps us, sometimes it doesn't.

But as a result of such designs, for certain kinds of large work it is 
necessary to have a consistent work list and build a simple iterator on that.  
One of the more common patterns that is impacted by this is the walk of the 
various routing tables.  As noted, we start roughly at inet.0 and go forward 
based on internal table order.


Makes sense, but also erases the utility of output queue priorities when 
multiple tables are involved.  Is there any feasibility of moving the RIB 
walking in the direction of more parallelism, or at least something like 
round robin between tables, without incurring too much overhead / bug 
surface / et cetera?



The primary challenge for populating the route queues in user desired orders is 
to move that code out of the pattern that is used for quite a few other things. 
 While you may want your evpn routes to go first, you likely don't want route 
resolution which is using earlier tables to be negatively impacted.  Decoupling 
the iterators for the overlapping table impacts is challenging, at best.  Once 
we're able to achieve that, the user configuration becomes a small thing.


I'm actually worried that if the open ER goes anywhere, it'll result in 
the ability to specify a table order only, and that's an awfully big 
hammer when what's really needed is the equivalent of the output queue 
priorities covering the entire process.  Some of these animals are more 
equal than others.



I don't recall seeing the question about the route refreshes, but I can offer a 
small bit of commentary: The CLI for our route refresh isn't as fine-grained as 
it could be.  The BGP extension for route refresh permits per afi/safi 
refreshing and honestly, we should expose that to the user.  I know I flagged 
this for PLM at one point in the past.


The route refresh issue mostly causes trouble when bringing new PEs into 
existing instances, and is presumably a consequence of the same behavior: 
the refresh message includes the correct AFI/SAFI, but the remote winds up 
walking every RIB before it starts emitting routes for the requested 
family (and no others).  The open case for the output queue issue has a 
note from 9/2 wherein TAC was able to reproduce this behavior and collect 
packet captures of both the specific refresh message and the long period 
of silence before any routes were sent.


-Rob



Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-09 Thread Jeffrey Haas via juniper-nsp


> On Nov 9, 2020, at 12:19 PM, Rob Foehl  wrote:
> 
> 
> On Mon, 27 Jul 2020, Rob Foehl wrote:
> 
>> Anyone know the secret to getting BGP output queue priorities working across
>> multiple NLRIs?
> [...]
>> I've tried about a dozen combinations of options, and cannot get any other
>> result with inet/evpn routes in the same session -- inet.0 routes always
>> arrive ahead of *.evpn.0.
> 
> Following up on this for posterity:
> 
> That last part turns out to not be entirely true.  It appears that the
> output queue priorities do work as intended, but route generation walks
> through the RIBs in a static order, always starting with inet.0 -- so
> maybe the last ~1000 inet routes wind up in the output queues at the same
> time as evpn routes.
> 
> This was declared to be working as designed, and the issue is now stuck in
> ER hell; best estimate for a real solution is "maybe next year".  Route
> refresh for EVPN routes triggering a full walk of all RIBs was also
> confirmed, but remains unexplained.

As the source of this particular bit of difficulty, a bit of explanation for 
why it simply wasn't done when the initial feature was authored.

An immense amount of work in the BGP code is built around the need to not have 
to keep full state on EVERYTHING.  We're already one of the most stateful BGP 
implementations on the planet.  Many times that helps us, sometimes it doesn't. 
 

But as a result of such designs, for certain kinds of large work it is 
necessary to have a consistent work list and build a simple iterator on that.  
One of the more common patterns that is impacted by this is the walk of the 
various routing tables.  As noted, we start roughly at inet.0 and go forward 
based on internal table order.

The primary challenge for populating the route queues in user desired orders is 
to move that code out of the pattern that is used for quite a few other things. 
 While you may want your evpn routes to go first, you likely don't want route 
resolution which is using earlier tables to be negatively impacted.  Decoupling 
the iterators for the overlapping table impacts is challenging, at best.  Once 
we're able to achieve that, the user configuration becomes a small thing.

I don't recall seeing the question about the route refreshes, but I can offer a 
small bit of commentary: The CLI for our route refresh isn't as fine-grained as 
it could be.  The BGP extension for route refresh permits per afi/safi 
refreshing and honestly, we should expose that to the user.  I know I flagged 
this for PLM at one point in the past.

-- Jeff



Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-11-09 Thread Rob Foehl

On Mon, 27 Jul 2020, Rob Foehl wrote:

Anyone know the secret to getting BGP output queue priorities working across 
multiple NLRIs?

[...]
I've tried about a dozen combinations of options, and cannot get any other 
result with inet/evpn routes in the same session -- inet.0 routes always 
arrive ahead of *.evpn.0.


Following up on this for posterity:

That last part turns out to not be entirely true.  It appears that the 
output queue priorities do work as intended, but route generation walks 
through the RIBs in a static order, always starting with inet.0 -- so 
maybe the last ~1000 inet routes wind up in the output queues at the same 
time as evpn routes.


This was declared to be working as designed, and the issue is now stuck in 
ER hell; best estimate for a real solution is "maybe next year".  Route 
refresh for EVPN routes triggering a full walk of all RIBs was also 
confirmed, but remains unexplained.


-Rob




Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-07-29 Thread Rob Foehl

On Tue, 28 Jul 2020, Jeffrey Haas wrote:


- "show bgp output-scheduler" is empty without top-level "protocols bgp
 output-queue-priority" config, regardless of anything else

- Top-level "protocols bgp family evpn signaling" priority config -- and
 nothing else within that stanza -- broke every v6 session on the box,
 even with family inet6 explicitly configured under those groups


If you're simply trying to prioritize evpn differently than inet unicast, 
simply having a separate priority for that address family should have been 
sufficient.


Right, that's what I took away from the docs...  No luck in any case, 
starting from the "simplest" of just adding this:


set protocols bgp group X family evpn signaling output-queue-priority expedited

That'll produce this in "show bgp group output-queues" for that group:

  NLRI evpn:
OutQ: expedited RRQ: priority 1 WDQ: priority 1

...but that's it, and no change in behavior.  Same config for family inet 
in the same group would show NLRI inet: output, and no more evpn if both 
were configured.  Still no change.



Can you clarify what you mean "broke every v6 session"?


For that one, it shut down every session on the box that didn't explicitly 
have family inet / family evpn configured at the group/neighbor level, 
refused all the incoming family inet sessions with NLRI mismatch (trying 
to send evpn only), and made no attempt to reestablish any of the family 
inet6 sessions.



I think what you're running into is one of the generally gross things about the 
address-family stanza and the inheritance model global => group => neighbor.  
If you specify ANY address-family configuration at a given scope level, it doesn't 
treat it as inheriting the less specific scopes; it overrides it.


In that specific case, yes; maybe I didn't wait long enough, but this was 
only an experiment to see whether setting something under global family 
evpn would do anything different -- and had about the expected result, 
given the way inheritance works.  (This was the least surprising result 
out of everything I tried.  I have logs, if you want 'em.)



FWIW, the use case of "prioritize a family differently" is one of the things this 
was intended to address.  Once you have a working config you may find that you want to do 
policy driven config and use the route-type policy to prioritize the DF related routes in 
its own queue.  That way you're not dealing with the swarm of ARP related routes.


Eventually, yes -- same for certain classes of inet routes -- but for now 
I'd have been happy with "just shove everything EVPN into the expedited 
queue".  I couldn't get them ahead of inet, and it was a many-minute wait 
for anything else to arrive, so pretty easy to observe...
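
A sketch of the policy-driven flavor Jeff mentions -- pushing the ES/DF-related
EVPN routes (type 4) into the expedited queue ahead of the type-2 MAC/IP swarm.
This assumes the nlri-route-type match condition and the output-queue-priority
policy action are both available in the release in use; names are made up:

set policy-options policy-statement EVPN-DF-FIRST term es-routes from family evpn
set policy-options policy-statement EVPN-DF-FIRST term es-routes from nlri-route-type 4
set policy-options policy-statement EVPN-DF-FIRST term es-routes then output-queue-priority expedited
set protocols bgp group X export EVPN-DF-FIRST

Routes not matched by the term should keep whatever family- or RIB-level
priority they already have.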


-Rob



- Per-group family evpn priority config would show up under "show bgp
 group output-queues" and similar, but adding family inet would cause the
 NLRI evpn priority output to disappear

- Policy-level adjustments to any of the above had no effect between NLRIs

- "show bgp neighbor output-queue" output always looks like this:

 Peer: x.x.x.x+179 AS 20021 Local: y.y.y.y+52199 AS n
   Output Queue[1]: 0(inet.0, inet-unicast)

 Peer: x.x.x.x+179 AS 20021 Local: y.y.y.y+52199 AS n
   Output Queue[2]: 0(bgp.evpn.0, evpn)

 ...which seems to fit the default per-RIB behavior as described.



Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-07-28 Thread Michael Hare via juniper-nsp
I'm quite interested in this topic, as I am in the same boat.  I have problems 
similar to Rob's in 18.3R3.

We do have JTAC support, but I haven't contacted them; it's been a time/priority 
issue so far.

- "show bgp output-scheduler" is empty without top-level "protocols bgp 
output-queue-priority" config, regardless of anything else
= same here, so I pasted a canonical top level from 
https://www.juniper.net/documentation/en_US/junos/topics/topic-map/bgp-route-prioritization.html
= I'm not sure I get the significance of the defaults section if a priority has a 
token assignment; what ends up in low/medium/high by default?  Is this related 
to assignment via policy-statement?

protocols {
bgp {
output-queue-priority {
expedited update-tokens 100;
priority 1 update-tokens 1;
priority 2 update-tokens 10;
..
..
priority 15 update-tokens 75;
priority 16 update-tokens 80;
defaults {
low priority 1;
medium priority 10;
high expedited;
}
}
}
}

Anyway, I tried the following under lab iBGP, for fun, to prioritize VPN-ish 
things before global [for us internet is NOT in VRF].

Group: iBGP-reflector-client-v4
family inet-vpn {
unicast {
output-queue-priority priority 10;
route-refresh-priority priority 4;
withdraw-priority priority 16;
}
}
family inet6-vpn {
unicast {
output-queue-priority priority 10;
route-refresh-priority priority 4;
withdraw-priority priority 16;
}
}
family evpn {
signaling {
output-queue-priority priority 11;
route-refresh-priority priority 5;
withdraw-priority expedited;
}
}


And the output [below] implies only the first NLRI in the list has priority.  
Where is the priority output for evpn and inet6-vpn-unicast?  With this 
technique must you do a different group per NLRI?  

Lastly, the lack of counters and the reliance on gauges make it really difficult 
to determine what is going on.

@lab # run show bgp group output-queues iBGP-reflector-client-v4 
Group Type: Internal    AS: 65400  Local AS: 65400
  Name: iBGP-reflector-client-v4 Index: 4  Flags: 
  Export: [ flowspec-advertise select-iBGP-reflector-routes next-hop-self 
accept-selected-routes ] 
  Options: 
  Holdtime: 0
  NLRI inet-vpn-unicast: 
OutQ: priority 10 RRQ: priority 4 WDQ: priority 16 

  Total peers: 2    Established: 2
  $rrip1+179
  $rrip2+179
Table                     Tot Paths  Act Paths  Suppressed  History  Damp State  Pending
inet.0                    12  0
inetflow.0                0  0
bgp.l3vpn.0               6  0
bgp.l3vpn-inet6.0         6  0
bgp.evpn.0                38  0
L3VPN-9105.inet.0         1  0
L3VPN-9105.inet6.0        1  0
L3VPN-9104.inet.0         1  0
L3VPN-9104.inet6.0        1  0
EVPN-9100.evpn.0          31  0
EVPN-9101.evpn.0          3  0
__default_evpn__.evpn.0   4  0

[FIN]

-Michael


> -Original Message-
> From: juniper-nsp  On Behalf Of Rob
> Foehl
> Sent: Monday, July 27, 2020 10:06 PM
> To: juniper-nsp@puck.nether.net
> Subject: [j-nsp] BGP output queue priorities between RIBs/NLRIs
> 
> Anyone know the secret to getting BGP output queue priorities working
> across multiple NLRIs?
> 
> Had trouble with EVPN routes getting stuck behind full refreshes of the v4
> RIB, often for minutes at a time, which causes havoc with the default DF
> election hold timer of 3 seconds.  Bumping those timers up to tens of
> minutes solves this, but... poorly.
> 
> The documentation[1] says:
> 
> "In the default configuration, that is, when no output-queue-priority
> configuration or policy that overrides priority exists, the routing
> protocol process (rpd) enqueues BGP routes into the output queue per
> routing information base (RIB). [...] While processing output queues, the
> BGP update code flushes the output queue for the current RIB before moving
> on to the next RIB that has a non-empty output queue."

Re: [j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-07-28 Thread Jeffrey Haas via juniper-nsp
See below:

> On Jul 27, 2020, at 11:05 PM, Rob Foehl  wrote:
> 
> 
> Anyone know the secret to getting BGP output queue priorities working
> across multiple NLRIs?
> 
> Had trouble with EVPN routes getting stuck behind full refreshes of the v4
> RIB, often for minutes at a time, which causes havoc with the default DF
> election hold timer of 3 seconds.  Bumping those timers up to tens of
> minutes solves this, but... poorly.
> 
> The documentation[1] says:
> 
> "In the default configuration, that is, when no output-queue-priority
> configuration or policy that overrides priority exists, the routing
> protocol process (rpd) enqueues BGP routes into the output queue per
> routing information base (RIB). [...] While processing output queues, the
> BGP update code flushes the output queue for the current RIB before moving
> on to the next RIB that has a non-empty output queue."
> 
> I've tried about a dozen combinations of options, and cannot get any other
> result with inet/evpn routes in the same session -- inet.0 routes always
> arrive ahead of *.evpn.0.  Am I missing something[2], or is that text not
> quite accurate?
> 
> -Rob
> 
> 
> [1] 
> https://www.juniper.net/documentation/en_US/junos/topics/topic-map/bgp-route-prioritization.html
> 
> [2] Highlight reel of failed attempts, all on 19.2R2 thus far:
> 
> - "show bgp output-scheduler" is empty without top-level "protocols bgp
>  output-queue-priority" config, regardless of anything else
> 
> - Top-level "protocols bgp family evpn signaling" priority config -- and
>  nothing else within that stanza -- broke every v6 session on the box,
>  even with family inet6 explicitly configured under those groups

If you're simply trying to prioritize evpn differently than inet unicast, 
simply having a separate priority for that address family should have been 
sufficient.

Can you clarify what you mean "broke every v6 session"?

I think what you're running into is one of the generally gross things about the 
address-family stanza and the inheritance model global => group => neighbor.  
If you specify ANY address-family configuration at a given scope level, it 
doesn't treat it as inheriting the less specific scopes; it overrides it.

FWIW, the use case of "prioritize a family differently" is one of the things this 
was intended to address.  Once you have a working config you may find that you 
want to do policy driven config and use the route-type policy to prioritize the 
DF related routes in its own queue.  That way you're not dealing with the swarm 
of ARP related routes.

-- Jeff



> 
> - Per-group family evpn priority config would show up under "show bgp
>  group output-queues" and similar, but adding family inet would cause the
>  NLRI evpn priority output to disappear
> 
> - Policy-level adjustments to any of the above had no effect between NLRIs
> 
> - "show bgp neighbor output-queue" output always looks like this:
> 
>  Peer: x.x.x.x+179 AS 20021 Local: y.y.y.y+52199 AS n
>Output Queue[1]: 0(inet.0, inet-unicast)
> 
>  Peer: x.x.x.x+179 AS 20021 Local: y.y.y.y+52199 AS n
>Output Queue[2]: 0(bgp.evpn.0, evpn)
> 
>  ...which seems to fit the default per-RIB behavior as described.
> 


[j-nsp] BGP output queue priorities between RIBs/NLRIs

2020-07-27 Thread Rob Foehl
Anyone know the secret to getting BGP output queue priorities working 
across multiple NLRIs?


Had trouble with EVPN routes getting stuck behind full refreshes of the v4 
RIB, often for minutes at a time, which causes havoc with the default DF 
election hold timer of 3 seconds.  Bumping those timers up to tens of 
minutes solves this, but... poorly.


The documentation[1] says:

"In the default configuration, that is, when no output-queue-priority 
configuration or policy that overrides priority exists, the routing 
protocol process (rpd) enqueues BGP routes into the output queue per 
routing information base (RIB). [...] While processing output queues, the 
BGP update code flushes the output queue for the current RIB before moving 
on to the next RIB that has a non-empty output queue."


I've tried about a dozen combinations of options, and cannot get any other 
result with inet/evpn routes in the same session -- inet.0 routes always 
arrive ahead of *.evpn.0.  Am I missing something[2], or is that text not 
quite accurate?


-Rob


[1] 
https://www.juniper.net/documentation/en_US/junos/topics/topic-map/bgp-route-prioritization.html

[2] Highlight reel of failed attempts, all on 19.2R2 thus far:

- "show bgp output-scheduler" is empty without top-level "protocols bgp
  output-queue-priority" config, regardless of anything else

- Top-level "protocols bgp family evpn signaling" priority config -- and
  nothing else within that stanza -- broke every v6 session on the box,
  even with family inet6 explicitly configured under those groups

- Per-group family evpn priority config would show up under "show bgp
  group output-queues" and similar, but adding family inet would cause the
  NLRI evpn priority output to disappear

- Policy-level adjustments to any of the above had no effect between NLRIs

- "show bgp neighbor output-queue" output always looks like this:

  Peer: x.x.x.x+179 AS 20021 Local: y.y.y.y+52199 AS n
Output Queue[1]: 0(inet.0, inet-unicast)

  Peer: x.x.x.x+179 AS 20021 Local: y.y.y.y+52199 AS n
Output Queue[2]: 0(bgp.evpn.0, evpn)

  ...which seems to fit the default per-RIB behavior as described.
