On Thu, 3 Dec 2015, Daniel Walton wrote:

> If both routes are received from iBGP peers (doesn't matter if they were originated in our AS or not) then we compare the router-id. The compare router-id knob only comes into play when comparing two paths from eBGP peers, because it causes us to skip the 'prefer oldest' check.

> My point being that it is confusing to explain the knob in the context of
> iBGP when the knob only has an impact for eBGP peers.

Yes. Tried to fix.

>> It seems relevant to an operator. DMED is not free, it has an
>> intrinsic cost. Operators surely will want to have the
>> information they need to be able to balance the costs against
>> the benefits?


> It is an implementation detail though. We should explain what the knob does, and based on that the operator can decide whether or not he/she wants to enable it. If doing so causes performance issues then we address those and fix them.

That cost won't be fixable though. It unavoidably comes with some additional cost - supra-linear at least, so far as we know.

We have a metric which intrinsically has no total order: MED is comparable only between routes received from the same neighbouring AS.

So it becomes impossible to scan along, pair by pair, in one O(n) pass and find the "largest" or "most preferred" route, because such a thing is intrinsically undefined and non-existent for this metric.
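
To make that concrete, here's a minimal, self-contained C sketch (not Quagga code - the comparator is a deliberately simplified stand-in for the real decision process) showing a single pairwise pass over the same three routes picking a different "best" route for each scan order, because MED comparisons between different neighbour ASes fall through to another tie-break:

    #include <stdio.h>

    /* Simplified route: just enough attributes to show the problem. */
    struct route {
      const char *name;
      unsigned neighbour_as; /* AS the route was received from          */
      unsigned med;          /* comparable only within one neighbour AS */
      unsigned router_id;    /* stand-in for the later tie-breakers     */
    };

    /* Pairwise "better" relation: MED applies only between routes from
     * the same neighbouring AS, otherwise fall through to the tie-break.
     * This relation is NOT a total order - it isn't even transitive.   */
    static const struct route *better (const struct route *a,
                                       const struct route *b)
    {
      if (a->neighbour_as == b->neighbour_as)
        return (a->med <= b->med) ? a : b;
      return (a->router_id <= b->router_id) ? a : b;
    }

    static void scan (const struct route *r[], int n)
    {
      const struct route *best = r[0];
      for (int i = 1; i < n; i++)
        best = better (best, r[i]);
      printf ("scan order %s,%s,%s -> best: %s\n",
              r[0]->name, r[1]->name, r[2]->name, best->name);
    }

    int main (void)
    {
      /* r1 beats r2 (router-id), r2 beats r3 (router-id), yet r3 beats
       * r1 (same AS, lower MED): a preference cycle.                   */
      struct route r1 = { "r1", 65001, 10, 1 };
      struct route r2 = { "r2", 65002,  0, 2 };
      struct route r3 = { "r3", 65001,  5, 3 };

      const struct route *o1[] = { &r1, &r2, &r3 }; /* -> best: r3 */
      const struct route *o2[] = { &r2, &r3, &r1 }; /* -> best: r1 */
      const struct route *o3[] = { &r3, &r1, &r2 }; /* -> best: r2 */
      scan (o1, 3); scan (o2, 3); scan (o3, 3);
      return 0;
    }

Each of the three orders returns a different answer, which is exactly why the "free" received order gives non-deterministic MED behaviour.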

Therefore, in order to get a deterministic answer for "most preferred", we must _impose_ some other order on to the routes - given that we don't like the existing "free" order of 'received order' that we store them in.

Imposing that external order unavoidably has a cost. I can't think of a way to impose an order other than via a sort, so that implies an O(n log n) scaling cost, at least.

We can stagger that cost by doing an insertion sort as routes are received, so that no single incoming route needs more than O(n) work, but overall it will still be an _additional_ O(n log n) amount of work.
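
A sketch of that staggered approach, assuming a per-prefix singly-linked list kept ordered by (neighbour AS, MED) - hypothetical types and names, not bgpd's actual structures:

    /* Keep the per-prefix route list ordered by (neighbour_as, med), so
     * routes from one neighbouring AS sit adjacent, lowest MED first.  */
    struct route {
      unsigned neighbour_as;
      unsigned med;
    };

    struct rnode {
      struct route r;
      struct rnode *next;
    };

    /* The imposed, deterministic order: group by neighbour AS, lowest
     * MED first within each group.                                     */
    static int order_cmp (const struct route *a, const struct route *b)
    {
      if (a->neighbour_as != b->neighbour_as)
        return (a->neighbour_as < b->neighbour_as) ? -1 : 1;
      if (a->med != b->med)
        return (a->med < b->med) ? -1 : 1;
      return 0;
    }

    /* O(n) worst case per incoming route: walk to the insertion point
     * and splice in, so the list stays ordered at all times.           */
    static void ordered_insert (struct rnode **head, struct rnode *new)
    {
      struct rnode **pp = head;
      while (*pp && order_cmp (&(*pp)->r, &new->r) <= 0)
        pp = &(*pp)->next;
      new->next = *pp;
      *pp = new;
    }

With a plain list the total is O(n^2) comparisons in the worst case; a balanced structure would get it down to the O(n log n) comparison-sort floor, but either way it's work the 'received order' list never had to do.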

Further, if one destination has a lot of paths, often many other destinations in the routing table will too, so it's more like O(d n log n) - and if d is the number of Internet DFZ prefixes, well, that's been growing super-linearly, polynomially or maybe even exponentially.

It could be a very real cost if you have a lot of neighbouring ASes. And DMED doesn't even fix much, to balance against that cost. If I was an operator I'd want to know these things. :)

Note: for add-path bestpath-TX, where we'll have to sort anyway to do the best-path comparison, then yes, you get DMED for free (well, for a constant-factor additional cost, which can hence be ignored when compared to an O(n log n) scaling cost).

> I could say that the algorithm for replicating updates among members of a peer-group is non-existent, and that therefore you shouldn't configure lots of peers because it will have a performance impact. That would be a true statement, but the operator doesn't really need to know that unless they configure so many peers and carry so many prefixes that they notice poor performance.

I don't know. If we knew the scaling cost of something, and if the feature was optional and the admin might want to balance the costs against the benefits, why not give them that information?

I don't know the costs of adding peers, but if someone wants to do a study and measure it, I'd be happy to put it in the docs. :) (If the scaling figure looks bad and that makes someone go improve it, all the better).

Other vendors might be afraid to tell admins less-than-glossy facts about their implementation, or the reality of the costs of additional features in their documentation, but we're different, surely? :)

> I have not done a mathematical proof on it, but it has been 14 years since
> we found MED churn and we do not know of any flavor of MED churn that is
> not solved by the three bullets below.

Tweaking IGP metrics can indeed solve things, but the rules you give are a bit topology-specific. They're not general. It's not even clear they always apply (maybe they stop applying with certain combinations of clusters, who knows - it sounds like your Type-II churn is exactly that).

Also, spare a thought for the poor operators who have to figure this stuff out and experiment with it and try to get it right, potentially while customers are getting irritated at their cat-video packets regularly suffering routing glitches as BGP oscillates.

There are some simple mathematical rules that can be followed in designing a path-vector routing protocol and its metrics which will _ensure_ the protocol converges, always, on any network. What I'd love is if we could embrace those rules and start figuring out if and how we can bend BGP into following them.

1. Make paths map to a weight (lowest weight == most preferred), and do so
   without reference to any other path (MED fails on that)

2. Ensure that the weight of a path always increases when propagated

   (many BGP metrics fail on this, but are saved by the IGP cost
   hopefully reflecting it - except if the iBGP and IGP topologies are
   at cross-purposes / not aligned, in which case you can have
   oscillation on IGP costs alone; or if you don't run an IGP at all,
   which seems more common today. The CLUSTER_LIST length check is good
   though: it always increases with reflection. Lift that earlier into
   the decision process and bingo...).

That's it. Do that, or something that can be shown to be equivalent, and you can be _sure_ your iBGP will converge. No tweaking needed.
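
For the record, a rough formal statement of those two rules (my paraphrase of the strict-monotonicity condition from the algebraic-routing literature, e.g. Sobrinho's work - not a quotation from it):

    Rule 1:  w : Paths -> (W, <)        each path maps, by itself, to a
                                        weight in a totally ordered set
    Rule 2:  w(extend(l, p)) > w(p)     for every path p and every
                                        propagation step l

    Claim: if both hold, a path can never become more preferred as it
    propagates, so the "dispute wheels" that drive oscillation cannot
    form, and the path-vector computation converges on any topology.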

If the CLUSTER_LIST length check were lifted to before the IGP cost check, you wouldn't even need to care about tweaking IGP costs and having to ensure they were _just right_. BGP would just work reliably, even if you screwed up the IGP costs, no matter how complex your iBGP topology - no need for complex extensions.
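
To illustrate what "lifting" it would mean: a toy decision-process comparator in C (a sketch only - not Quagga's actual bgp_info_cmp(), and the attribute fields are illustrative) where the CLUSTER_LIST length check sits before the IGP cost, so each reflection strictly demotes a path regardless of what the IGP costs are:

    #include <stddef.h>

    /* Illustrative path attributes - not bgpd's real structures. */
    struct path {
      unsigned local_pref;       /* higher preferred                     */
      size_t   as_path_len;      /* shorter preferred                    */
      size_t   cluster_list_len; /* shorter preferred; +1 per reflection */
      unsigned igp_cost;         /* lower preferred                      */
    };

    /* Return <0 if a is preferred, >0 if b is, 0 if tied.  The point:
     * cluster_list_len is compared BEFORE igp_cost, so within the iBGP
     * mesh the comparison behaves like a weight that strictly increases
     * every time a route reflector propagates the path (rule 2 above),
     * even when the IGP costs are wrong or there is no IGP at all.     */
    static int path_cmp (const struct path *a, const struct path *b)
    {
      if (a->local_pref != b->local_pref)
        return (a->local_pref > b->local_pref) ? -1 : 1;
      if (a->as_path_len != b->as_path_len)
        return (a->as_path_len < b->as_path_len) ? -1 : 1;
      if (a->cluster_list_len != b->cluster_list_len)
        return (a->cluster_list_len < b->cluster_list_len) ? -1 : 1;
      if (a->igp_cost != b->igp_cost)
        return (a->igp_cost < b->igp_cost) ? -1 : 1;
      return 0;
    }

Because cluster_list_len grows by one on every reflection, rule 2 holds for the iBGP-internal part of the comparison even with a misconfigured or absent IGP.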

I would like Quagga to do what it can to give operators a BGP that "Just Works" out of the box. :)

I don't know how to do that, but I do think we need to stop revering the old and provably borked stuff.
 
> We have 'prefer oldest external' enabled by default, which does help.

It doesn't help on any of the examples in the literature I can think of. :)

> The only other knob we could flip is always-compare-med, but I don't think customers would like it much if we did.

We obviously can't, but for any operator who doesn't care about MED - who doesn't want to give neighbours the power to inject badly-defined, internal routing metrics - the smart thing to do is:

- set MED to 0 on all routes
- set always-compare on

No performance cost.
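
E.g., something along these lines in bgpd.conf (addresses and ASNs are placeholders, obviously; repeat the in-bound route-map for each eBGP neighbour):

    router bgp 64512
     bgp always-compare-med
     neighbor 192.0.2.1 remote-as 64513
     neighbor 192.0.2.1 route-map ZERO-MED in
    !
    route-map ZERO-MED permit 10
     set metric 0

With every MED forced to 0, always-compare-med makes the MED step a universal no-op tie, so the decision process stays deterministic without any sorting cost.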

> The reality is that not many people actually hit MED churn, and almost all of the ones that do hit the Type I scenario, which is easily solved.

Why even risk having to solve it? Just disable MED, as above. Problem gone for good.

> I think it would be best to just refer them to RFC3345 and say "read this
> if you are seeing oscillating routes in bgp".

Will add that.
  
> Mathematically prove... no, I have never attempted to do so and we've never been asked to provide a mathematical proof. Am I confident that MED churn is solved with these approaches? Absolutely.

At least one of those approaches is admin-intensive and even error-prone, and it's still a prerequisite for the other.

"It just works, provably, by maths, you really can't screw it up" - is much better, and what I'd prefer to work towards. Maybe that's just me.

> Maybe mention it in the section on 'prefer oldest external' and compare
> router-id, and chop the mention here?

Made it more specific, but it does seem relevant there?

I suspect a lot of people going into those docs won't read the explanatory stuff and only read the command-specific text.

regards,
--
Paul Jakma, HPE Networking, Advanced Technology Group
Fortune:
Your analyst has you mixed up with another patient.  Don't believe a
thing he tells you.