On Thu, 3 Dec 2015, Daniel Walton wrote:

> If both routes are received from iBGP peers (doesn't matter if they were originated in our AS or not) then we compare the router-id. The compare router-id knob only comes into play when comparing two paths from eBGP peers, because it causes us to skip the 'prefer oldest' check.

> My point being that it is confusing to explain the knob in the context of
> iBGP when the knob only has an impact for eBGP peers.

Yes. Tried to fix.

>> It seems relevant to an operator. DMED is not free, it has an
>> intrinsic cost. Operators surely will want to have the
>> information they need to be able to balance the costs against
>> the benefits?


> It is an implementation detail though. We should explain what the knob does, and based on that the operator can decide whether or not he/she wants to enable it. If doing so causes performance issues then we address those and fix them.

That cost won't be fixable though. It unavoidably comes with some additional cost - supra-linear at least, so far as we know.

We have a metric which intrinsically has no total order: MED is comparable only between routes received from the same neighbouring AS.

So it becomes impossible to scan along, pair by pair, in one O(n) pass and find the "largest" or "most preferred" route, because such a thing is intrinsically undefined and non-existent for this metric.
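
To make that concrete, here's a minimal, self-contained C sketch (not Quagga code - the comparator is a deliberately simplified stand-in for the real decision process) showing a single pairwise pass over the same three routes picking a different "best" route for each scan order, because MED comparisons between different neighbour ASes fall through to another tie-break:

    #include <stdio.h>

    /* Simplified route: just enough attributes to show the problem. */
    struct route {
      const char *name;
      unsigned neighbour_as; /* AS the route was received from          */
      unsigned med;          /* comparable only within one neighbour AS */
      unsigned router_id;    /* stand-in for the later tie-breakers     */
    };

    /* Pairwise "better" relation: MED applies only between routes from
     * the same neighbouring AS, otherwise fall through to the tie-break.
     * This relation is NOT a total order - it isn't even transitive.   */
    static const struct route *better (const struct route *a,
                                       const struct route *b)
    {
      if (a->neighbour_as == b->neighbour_as)
        return (a->med <= b->med) ? a : b;
      return (a->router_id <= b->router_id) ? a : b;
    }

    static void scan (const struct route *r[], int n)
    {
      const struct route *best = r[0];
      for (int i = 1; i < n; i++)
        best = better (best, r[i]);
      printf ("scan order %s,%s,%s -> best: %s\n",
              r[0]->name, r[1]->name, r[2]->name, best->name);
    }

    int main (void)
    {
      /* r1 beats r2 (router-id), r2 beats r3 (router-id), yet r3 beats
       * r1 (same AS, lower MED): a preference cycle.                   */
      struct route r1 = { "r1", 65001, 10, 1 };
      struct route r2 = { "r2", 65002,  0, 2 };
      struct route r3 = { "r3", 65001,  5, 3 };

      const struct route *o1[] = { &r1, &r2, &r3 }; /* -> best: r3 */
      const struct route *o2[] = { &r2, &r3, &r1 }; /* -> best: r1 */
      const struct route *o3[] = { &r3, &r1, &r2 }; /* -> best: r2 */
      scan (o1, 3); scan (o2, 3); scan (o3, 3);
      return 0;
    }

Each of the three orders returns a different answer, which is exactly why the "free" received order gives non-deterministic MED behaviour.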

Therefore, in order to get a deterministic answer for "most preferred", we must _impose_ some other order on to the routes - given that we don't like the existing "free" order of 'received order' that we store them in.

Imposing that external order unavoidably has a cost. I can't think of a way to impose an order other than via a sort, so that implies an O(n log n) scaling cost, at least.

We can stagger that cost by doing an insertion sort as routes are received, so that no single incoming route needs more than O(n) work, but overall it will still be an _additional_ O(n log n) amount of work.
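
A sketch of that staggered approach, assuming a per-prefix singly-linked list kept ordered by (neighbour AS, MED) - hypothetical types and names, not bgpd's actual structures:

    /* Keep the per-prefix route list ordered by (neighbour_as, med), so
     * routes from one neighbouring AS sit adjacent, lowest MED first.  */
    struct route {
      unsigned neighbour_as;
      unsigned med;
    };

    struct rnode {
      struct route r;
      struct rnode *next;
    };

    /* The imposed, deterministic order: group by neighbour AS, lowest
     * MED first within each group.                                     */
    static int order_cmp (const struct route *a, const struct route *b)
    {
      if (a->neighbour_as != b->neighbour_as)
        return (a->neighbour_as < b->neighbour_as) ? -1 : 1;
      if (a->med != b->med)
        return (a->med < b->med) ? -1 : 1;
      return 0;
    }

    /* O(n) worst case per incoming route: walk to the insertion point
     * and splice in, so the list stays ordered at all times.           */
    static void ordered_insert (struct rnode **head, struct rnode *new)
    {
      struct rnode **pp = head;
      while (*pp && order_cmp (&(*pp)->r, &new->r) <= 0)
        pp = &(*pp)->next;
      new->next = *pp;
      *pp = new;
    }

With a plain list the total is O(n^2) comparisons in the worst case; a balanced structure would get it down to the O(n log n) comparison-sort floor, but either way it's work the 'received order' list never had to do.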

Further, if one destination has a lot of paths, often many other destinations in the routing table will too, so it's more like O(d n log n) - and if d is the number of Internet DFZ prefixes, well, that's been growing super-linearly, polynomially or maybe even exponentially.

It could be a very real cost if you have a lot of neighbouring ASes. And DMED doesn't even fix much, to balance against that cost. If I was an operator I'd want to know these things. :)

Note: for add-path bestpath-TX, where we'll have to sort anyway to do the best-path comparison, then yes, you get DMED for free (well, for a constant-factor additional cost, which can hence be ignored when compared to an O(n log n) scaling cost).

> I could say that the algorithm for replicating updates among members of a peer-group is non-existent, and that therefore you shouldn't configure lots of peers because it will have a performance impact. That would be a true statement, but the operator doesn't really need to know that unless they configure so many peers and carry so many prefixes that they notice poor performance.

I don't know. If we knew the scaling cost of something, and if the feature was optional and the admin might want to balance the costs against the benefits, why not give them that information?

I don't know the costs of adding peers, but if someone wants to do a study and measure it, I'd be happy to put it in the docs. :) (If the scaling figure looks bad and that makes someone go improve it, all the better).

Other vendors might be afraid to tell admins less-than-glossy facts about their implementation, or the reality of the costs of additional features in their documentation, but we're different, surely? :)

> I have not done a mathematical proof on it, but it has been 14 years since
> we found MED churn and we do not know of any flavor of MED churn that is
> not solved by the three bullets below.

Tweaking IGP metrics can indeed solve things, but the rules you give are a bit topology-specific. They're not general. It's not even clear they always apply (maybe they stop applying with certain combinations of clusters, who knows - it sounds like your Type-II churn is exactly that).

Also, spare a thought for the poor operators who have to figure this stuff out and experiment with it and try to get it right, potentially while customers are getting irritated at their cat-video packets regularly suffering routing glitches as BGP oscillates.

There are some simple mathematical rules that can be followed in designing a path-vector routing protocol and its metrics which will _ensure_ the protocol converges, always, on any network. What I'd love is if we could embrace those rules and start figuring out if and how we can bend BGP into following them.

1. Make paths map to a weight (lowest weight == most preferred), and do so
   without reference to any other path (MED fails on that)

2. Ensure that the weight of a path always increases when propagated

   (many BGP metrics fail on this, but are saved by the IGP cost
   hopefully reflecting it - except if the iBGP and IGP topologies are
   at cross-purposes / not aligned, in which case you can have
   oscillation on IGP costs alone; or if you don't run an IGP at all,
   which seems more common today. The CLUSTER_LIST length check is good
   though: it always increases with reflection. Lift that earlier into
   the decision process and bingo...).

That's it. Do that, or something that can be shown to be equivalent, and you can be _sure_ your iBGP will converge. No tweaking needed.
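
For the record, a rough formal statement of those two rules (my paraphrase of the strict-monotonicity condition from the algebraic-routing literature, e.g. Sobrinho's work - not a quotation from it):

    Rule 1:  w : Paths -> (W, <)        each path maps, by itself, to a
                                        weight in a totally ordered set
    Rule 2:  w(extend(l, p)) > w(p)     for every path p and every
                                        propagation step l

    Claim: if both hold, a path can never become more preferred as it
    propagates, so the "dispute wheels" that drive oscillation cannot
    form, and the path-vector computation converges on any topology.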

If the CLUSTER_LIST length check were lifted to before the IGP cost check, you wouldn't even need to care about tweaking IGP costs and having to ensure they were _just right_. BGP would just work reliably, even if you screwed up the IGP costs, no matter how complex your iBGP topology - no need for complex extensions.
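
To illustrate what "lifting" it would mean: a toy decision-process comparator in C (a sketch only - not Quagga's actual bgp_info_cmp(), and the attribute fields are illustrative) where the CLUSTER_LIST length check sits before the IGP cost, so each reflection strictly demotes a path regardless of what the IGP costs are:

    #include <stddef.h>

    /* Illustrative path attributes - not bgpd's real structures. */
    struct path {
      unsigned local_pref;       /* higher preferred                     */
      size_t   as_path_len;      /* shorter preferred                    */
      size_t   cluster_list_len; /* shorter preferred; +1 per reflection */
      unsigned igp_cost;         /* lower preferred                      */
    };

    /* Return <0 if a is preferred, >0 if b is, 0 if tied.  The point:
     * cluster_list_len is compared BEFORE igp_cost, so within the iBGP
     * mesh the comparison behaves like a weight that strictly increases
     * every time a route reflector propagates the path (rule 2 above),
     * even when the IGP costs are wrong or there is no IGP at all.     */
    static int path_cmp (const struct path *a, const struct path *b)
    {
      if (a->local_pref != b->local_pref)
        return (a->local_pref > b->local_pref) ? -1 : 1;
      if (a->as_path_len != b->as_path_len)
        return (a->as_path_len < b->as_path_len) ? -1 : 1;
      if (a->cluster_list_len != b->cluster_list_len)
        return (a->cluster_list_len < b->cluster_list_len) ? -1 : 1;
      if (a->igp_cost != b->igp_cost)
        return (a->igp_cost < b->igp_cost) ? -1 : 1;
      return 0;
    }

Because cluster_list_len grows by one on every reflection, rule 2 holds for the iBGP-internal part of the comparison even with a misconfigured or absent IGP.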

I would like Quagga to do what it can to give operators a BGP that "Just Works" out of the box. :)

I don't know how to do that, but I do think we need to stop revering the old and provably borked stuff.
 
> We have 'prefer oldest external' enabled by default, which does help.

It doesn't help on any of the examples in the literature I can think of. :)

> The only other knob we could flip is always-compare-med, but I don't think customers would like it much if we did.

We obviously can't, but for any operator who doesn't care about MED - who doesn't want to give neighbours the power to inject badly-defined, internal routing metrics - the smart thing to do is:

- set MED to 0 on all routes
- set always-compare on

No performance cost.
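
E.g., something along these lines in bgpd.conf (addresses and ASNs are placeholders, obviously; repeat the in-bound route-map for each eBGP neighbour):

    router bgp 64512
     bgp always-compare-med
     neighbor 192.0.2.1 remote-as 64513
     neighbor 192.0.2.1 route-map ZERO-MED in
    !
    route-map ZERO-MED permit 10
     set metric 0

With every MED forced to 0, always-compare-med makes the MED step a universal no-op tie, so the decision process stays deterministic without any sorting cost.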

> The reality is that not many people actually hit MED churn, and almost all of the ones that do hit the Type I scenario, which is easily solved.

Why even risk having to solve it? Just disable MED, as above. Problem gone for good.

> I think it would be best to just refer them to RFC3345 and say "read this
> if you are seeing oscillating routes in bgp".

Will add that.
  
> Mathematically prove... no, I have never attempted to do so and we've never been asked to provide a mathematical proof. Am I confident that MED churn is solved with these approaches? Absolutely.

At least one of those approaches is admin-intensive and even error-prone, and it's still a prerequisite for the other.

"It just works, provably, by maths, you really can't screw it up" - is much better, and what I'd prefer to work towards. Maybe that's just me.

> Maybe mention it in the section on 'prefer oldest external' and compare
> router-id, and chop the mention here?

Made it more specific, but it does seem relevant there?

I suspect a lot of people going into those docs won't read the explanatory stuff and only read the command-specific text.

regards,
--
Paul Jakma, HPE Networking, Advanced Technology Group
Fortune:
Your analyst has you mixed up with another patient.  Don't believe a
thing he tells you.