On Thu, 3 Dec 2015, Daniel Walton wrote:
I think it is a good idea to document how the bestpath algorithm works
but personally there is an overwhelming amount of text here about MED.
There's a dearth of operator-orientated info out there on MED and its
issues. Most of the docs seem to focus on the immediate, local
consequences of DMED and always-compare, but don't really explore the
bigger picture.
In particular, the non-transitive preferences over routes that MED can
induce, and the intrinsic problems these can cause with route-hiding iBGP
mechanisms, don't seem well-described outside of academic literature -
which isn't very accessible to non-academics (both in literal access
terms, and ease of reading).
+@deffn {BGP} {bgp bestpath compare-routerid} {}
+@anchor{bgp bestpath compare-routerid}
+
+Ensure that where iBGP routes are equal on most metrics, including
+local-pref, AS_PATH length, IGP cost, MED, the tie is broken based on
+router-ID. If a route has an ORIGINATOR_ID attribute, i.e. it has been
+reflected, that ID will be used. Otherwise, the router-ID of the peer the
+route was received from will be used.
+
+The advantage of this is that the route-selection (at this point) will be
+deterministic, across iBGP. The disadvantage is that such equal routes will
+tend to take the same exit out of the AS, via the lowest-ID router.
+
Comparing the router-id always happens if both paths are from iBGP
peers; it is only if they are both from eBGP peers that this option applies.
"iBGP routes" in the above is probably badly worded. I didn't mean that to
be iBGP origin there, but routes being compared where both were received
from iBGP (or both from eBGP, as you note - but then only if the
external-age check didn't do a return).
Or did you mean something else?
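
For reference, the tie-break described in the quoted text can be sketched
roughly as below. This is only an illustration, not bgpd code: the Path
class, its field names and the addresses are made up for the example. The
IDs are compared numerically, since a plain string comparison would sort
"10.0.0.20" ahead of "10.0.0.9".

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


@dataclass(frozen=True)
class Path:
    """Hypothetical, stripped-down view of a BGP path (not a bgpd structure)."""
    peer_router_id: str                  # router-ID of the peer the route came from
    originator_id: Optional[str] = None  # ORIGINATOR_ID, set only on reflected routes


def tiebreak_id(path: Path) -> IPv4Address:
    """ID used for the tie-break: ORIGINATOR_ID if present, else the peer's router-ID."""
    return IPv4Address(path.originator_id or path.peer_router_id)


def prefer_by_router_id(a: Path, b: Path) -> Path:
    """Return the path with the numerically lowest ID (a wins an exact tie)."""
    return a if tiebreak_id(a) <= tiebreak_id(b) else b


# A route learned directly from 10.0.0.9, and one reflected by 10.0.0.1
# on behalf of originator 10.0.0.20.
direct = Path(peer_router_id="10.0.0.9")
reflected = Path(peer_router_id="10.0.0.1", originator_id="10.0.0.20")

# The reflector's own router-ID is ignored; the ORIGINATOR_ID is what counts.
best = prefer_by_router_id(direct, reflected)
```

Here the directly learned route wins, because the reflected route is
judged on its ORIGINATOR_ID rather than on the reflector's lower router-ID.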
Not your change but above reads "The use of t is not" instead of "The use of
it is not"
Can fix in another trivial patch.
+A deterministic comparison tends to imply an additional overhead of sorting
+over any set of n routes to a destination. The implementation of
+deterministic MED in Quagga scales significantly worse than most sorting
+algorithms at present, with the number of paths to a given destination.
+That number is often low enough to not cause any issues, but where there are
+many paths, the deterministic comparison may quickly become increasingly
+expensive in terms of CPU.
I would say that the details of the sorting algorithm used are probably more
info than the average person is interested in if they are trying to
understand how bestpath works.
It seems relevant to an operator. DMED is not free, it has an intrinsic
cost. Operators surely will want to have the information they need to be
able to balance the costs against the benefits?
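
To make the cost concrete, here is a toy sketch of why a deterministic
selection has to look at the whole set of paths rather than one pair at a
time. It is not Quagga's implementation: the Path fields, the sample values
and the two-step decision (MED within a neighbour AS, then IGP cost) are
simplifying assumptions for illustration only.

```python
from collections import defaultdict
from itertools import permutations
from typing import List, NamedTuple


class Path(NamedTuple):
    name: str
    neighbor_as: int
    med: int
    igp_cost: int


def better(a: Path, b: Path) -> Path:
    """Pairwise preference: MED only within one neighbour AS, then IGP cost."""
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    return a if a.igp_cost <= b.igp_cost else b


def best_sequential(paths: List[Path]) -> Path:
    """Compare each path against the running best, in arrival order."""
    best = paths[0]
    for p in paths[1:]:
        best = better(best, p)
    return best


def best_deterministic(paths: List[Path]) -> Path:
    """Group by neighbour AS, pick a winner per group, then compare the winners."""
    groups = defaultdict(list)
    for p in paths:
        groups[p.neighbor_as].append(p)
    winners = [min(g, key=lambda p: (p.med, p.igp_cost, p.name))
               for g in groups.values()]
    return min(winners, key=lambda p: (p.igp_cost, p.name))


paths = [
    Path("A", neighbor_as=65001, med=10, igp_cost=1),
    Path("B", neighbor_as=65001, med=5, igp_cost=3),
    Path("C", neighbor_as=65002, med=0, igp_cost=2),
]

# Try every arrival order and collect which path ends up selected.
sequential_results = {best_sequential(list(o)).name for o in permutations(paths)}
deterministic_results = {best_deterministic(list(o)).name for o in permutations(paths)}
```

With these values the sequential selection can end on any of the three
paths depending on arrival order, while the grouped selection always picks
the same one. The grouping pass is the extra work the quoted text is
talking about.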
+There is as of this writing @emph{no} known way to use MED for its original
+purpose; @emph{and} reduce routing information in non-full-mesh iBGP
+topologies (e.g. with reflectors); @emph{and} be sure to avoid the
+instability problems of MED due to the non-transitive routing preferences it
+can induce.
But there is a way :)
Is it _sure to avoid_ though? There are many, many networks, and there are
different ways BGP can behave even on the same network.
MED intrinsically has an undefined order of preference across routes,
that's the source of all the issues with it. iBGP topologies are getting
bigger and more complex (though some, DCs, are getting more regular in
structure), and we ship with defaults that leave the more complex iBGP
networks wide open to problems caused by MED.
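
A small worked example of that undefined ordering, as a sketch only: the
Route fields, AS numbers and metric values are invented, and the decision
process is cut down to just MED (compared only within one neighbour AS)
followed by IGP cost.

```python
from itertools import permutations
from typing import NamedTuple


class Route(NamedTuple):
    name: str
    neighbor_as: int
    med: int
    igp_cost: int


def preferred(x: Route, y: Route) -> bool:
    """True if x beats y: lower MED within one neighbour AS, otherwise lower IGP cost."""
    if x.neighbor_as == y.neighbor_as and x.med != y.med:
        return x.med < y.med
    return x.igp_cost < y.igp_cost


r1 = Route("r1", 65001, med=20, igp_cost=10)
r2 = Route("r2", 65001, med=10, igp_cost=30)
r3 = Route("r3", 65002, med=0, igp_cost=20)

# Look for preference cycles: a beats b, b beats c, and c beats a.
cycles = [
    (a.name, b.name, c.name)
    for a, b, c in permutations([r1, r2, r3], 3)
    if preferred(a, b) and preferred(b, c) and preferred(c, a)
]
```

r2 beats r1 on MED (same neighbour AS), r1 beats r3 on IGP cost, and r3
beats r2 on IGP cost, so there is no consistent "best" among the three.
Which one a router settles on then depends on which routes it can see,
which is exactly what route-hiding iBGP mechanisms change.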
* Preferring the oldest external path solves one scenario
* "Type I" churn (as described in RFC 3345) can be solved by tweaking IGP
metrics. If you are using RRs you just have to make your inter-cluster
links have a higher cost than your intra-cluster links (same theory with
confeds). When we first discovered MED churn most customers that were
hitting it were able to solve it via this approach.
* "Type II" churn can be solved by using addpath to TX the bestpath per
neighbor-AS...see draft-ietf-idr-route-oscillation-stop-01
I don't know how to describe these cases in a way that an operator could
apply the advice and be sure they had avoided MED issues, though. Patch to
the doc welcome. ;)
Indeed, can you prove the issues are solved with those approaches? There
are papers that derive quite simple rules that engineers can apply and be
_sure_ that their path-vector protocol will converge, and even will
converge on optimal routes. The IGP cost case potentially can be proven to
meet those rules, but that proof will be specific to the network - not a
general proof to any network.
There are simple fixes to the "churn" issues, certainly if one leverages
the academic work and recognises the root of the problem: it's due to
fundamental ordering properties of the metrics involved (or utter lack
thereof).
I think the text above remains correct: there is no way to have all those
3 things, including being "sure to avoid the instability problems", as far
as I am aware. At least in general (that phrase might be missing).
+Note that even if action is taken to address the MED non-transitivity
+issues, other oscillations may still be possible. E.g. on IGP cost if iBGP
+and IGP topologies are at cross-purposes with each other.
Can you clarify here?
Flavel and Roughan give an example, and I think at least one of Griffin's
papers might give a few examples of IGP "wedgies", iirc.
Would say "produces deterministic" instead of "produces more deterministic".
Ack.
+Setting this option will have a performance cost that may be noticeable when
+there are many routes for each destination. Currently in Quagga it is
+implemented in a way that scales poorly as the number of routes per
+destination increases.
Why don't we fix our implementation so that it is less expensive and
chop the paragraph above? I am worried that we will end up discouraging
customers from enabling deterministic-med.
Well, I'm not aware of DMED fixing anything, so I'm not going to spend my
time on that. Someone else could, and update the above.
Till then, it seems like important information for admins. If they choose
not to enable DMED, they're not losing anything afaik.
Really, they should enable always-compare and set all MEDs to 0 when
received from eBGP, unless they have a specific use for MED. In which
case, DMED would be irrelevant anyway.
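
As a rough sketch of why that combination sidesteps the ordering problem:
once MED is compared across all neighbour ASes and is zeroed on receipt,
the comparison reduces to an ordinary sort key. The Route fields and values
below are invented, and the decision process is again cut down to MED then
IGP cost. In Quagga terms I'd expect this to correspond to `bgp
always-compare-med` plus an inbound route-map doing `set metric 0`, but
treat that mapping as an assumption and check it against your version.

```python
from itertools import permutations
from typing import NamedTuple


class Route(NamedTuple):
    name: str
    neighbor_as: int
    med: int
    igp_cost: int


def zero_med(route: Route) -> Route:
    """Model an inbound policy that resets MED to 0 on routes received from eBGP."""
    return route._replace(med=0)


def sort_key(route: Route):
    """With always-compare, MED is compared regardless of neighbour AS,
    so a plain tuple key gives a total order over all routes."""
    return (route.med, route.igp_cost, route.name)


received = [
    Route("r1", 65001, med=20, igp_cost=10),
    Route("r2", 65001, med=10, igp_cost=30),
    Route("r3", 65002, med=0, igp_cost=20),
]

normalised = [zero_med(r) for r in received]

# Whatever order the routes arrive in, the same one is selected.
winners = {min(order, key=sort_key).name for order in permutations(normalised)}
```

With every MED at 0 the choice here falls through to IGP cost alone, and
the selected route no longer depends on arrival order or on which subset of
routes was compared first.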
+Note that there are other sources of indeterminism in the route selection
+process, @xref{BGP decision process}.
Other than "prefer oldest external" what sources of indeterminism are there?
That's the one I had in mind.
regards,
--
Paul Jakma, HPE Networking, Advanced Technology Group
Fortune:
Live within your income, even if you have to borrow to do so.
-Josh Billings