I think it is a good idea to document how the bestpath algorithm works but
personally there is an overwhelming amount of text here about MED.
> +@deffn {BGP} {bgp bestpath compare-routerid} {}
> +@anchor{bgp bestpath compare-routerid}
> +
> +Ensure that where iBGP routes are equal on most metrics, including
> +local-pref, AS_PATH length, IGP cost, MED, the tie is broken based on
> +router-ID. If a route has an ORIGINATOR_ID attribute, i.e. it has been
> +reflected, that ID will be used. Otherwise, the router-ID of the peer the
> +route was received from will be used.
> +
> +The advantage of this is that the route-selection (at this point) will be
> +deterministic, across iBGP. The disadvantage is that such equal routes
> will
> +tend to take the same exit out of the AS, via the lowest-ID router.
> +
>
Comparing the router-id always happens if both paths are from iBGP peers,
it is only if they are both from eBGP peers that it applies.
> +If this option is enabled, then the external-age check, where already
> +selected eBGP routes are preferred, is skipped.
> +@end deffn
> +
> +
> +
> @node BGP route flap dampening
> @subsection BGP route flap dampening
>
> @@ -151,6 +220,189 @@ The route-flap damping algorithm is compatible with
> @cite{RFC2439}. The use of t
> is not recommended nowadays, see @uref{
> http://www.ripe.net/ripe/docs/ripe-378,,RIPE-378}.
> @end deffn
>
>
Not your change but above reads "The use of t is not" instead of "The use
of it is not"
> +@node BGP MED
> +@section BGP MED
> +
> +The BGP @acronym{MED, Multi_Exit_Discriminator} attribute is intended to
> +allow one AS to indicate its preferences for its ingress points to another
> +AS. The MED attribute will not be propagated on to another AS by the
> +receiving AS - it is `non-transitive' in the BGP sense.
> +
> +E.g.@:, if AS X and AS Y have 2 different BGP peering points, then AS X
> +might set a MED of 100 on routes advertised at one and a MED of 200 at the
> +other. When AS Y selects between otherwise equal routes to or via
> +AS X, AS Y should prefer to take the path via the lower MED peering of
> 100 with
> +AS X. Setting the MED allows an AS to influence the routing taken to it
> +within another, neighbouring AS.
> +
> +In this use of MED it is not really meaningful to compare the MED value on
> +routes where the next AS on the paths differs. E.g., if AS Y also had a
> +route for some destination via AS Z in addition to the routes from AS X,
> and
> +AS Z had also set a MED, it wouldn't make sense for AS Y to compare AS Z's
> +MED values to those of AS X. The MED values have been set by different
> +administrators, with different frames of reference.
> +
> +The default behaviour of BGP therefore is to not compare MED values across
> +routes received from different neighbouring ASes. In Quagga this is done
> by
> +comparing the neighbouring, left-most AS in the received AS_PATHs of the
> +routes and only comparing MED if those are the same.
> +
> +Unfortunately, this behaviour of MED, of sometimes being compared across
> +routes and sometimes not, depending on the properties of those other
> routes,
> +means MED can cause the order of preference over all the routes to be
> +undefined. That is, given routes A, B, and C, if A is preferred to B,
> and B
> +is preferred to C, then a well-defined order should mean the preference is
> +transitive (in the sense of orders @footnote{For some set of objects to
> have
> +an order, there @emph{must} be some binary ordering relation that is
> defined
> +between @emph{every} combination of those objects, @math{a \prec b}, and
> +that relation @emph{must} be transitive, i.e. if @math{a \prec b} and
> +@math{b \prec c} then that relation must carry over and it must be that
> +@math{a \prec c} for the objects to have an order. If the relation allows
> +for equality, i.e. if @math{a \prec b} and @math{b \prec a} may both be
> true
> +and this implies that @math{a = b}, then some objects may be equal in
> order to each
> +other and the order is partial. Otherwise, if there is an order, all the
> +objects are distinct and have a total order. MED unfortunately does not
> +define its order over all cases.}) and that A would be preferred to C.
> +
> +However, when MED is involved this need not be the case. With MED it is
> +possible that C is actually preferred over A. This can be true even where
> +BGP defines a deterministic ``most preferred'' route out of the full set
> of
> +A,B,C. With MED, for any given set of routes there may be a
> deterministically
> +preferred route, but there may be no way to arrange them into
> +any order of preference.
> +
> +That MED can induce non-transitive orders of preference over routes can
> +cause issues. Firstly, it may be perceived to cause routing table churn
> +locally at speakers; secondly it may cause routing instability in
> +non-full-mesh iBGP topologies, where sets of speakers continually
> oscillate
> +between different paths.
> +
> +The first issue arises from how speakers often implement routing
> decisions.
> +Though BGP defines a selection process that will deterministically select
> +the same route as best at any given speaker, even with MED, that process
> +requires evaluating all routes together. For performance and ease of
> +implementation reasons, many implementations evaluate route preferences
> in a
> +pair-wise fashion instead. Given there is no well-defined order when MED
> is
> +involved, the best route that will be chosen becomes subject to
> +implementation details, such as the order the routes are stored in. That
> +may be (locally) non-deterministic, e.g.@: it may be the order the routes
> +were received in.
> +
> +This indeterminism may be considered undesirable, though it need not cause
> +problems. It may mean additional routing churn is perceived, as sometimes
> +more updates may be produced than at other times in reaction to some
> event .
> +
> +This first issue can be fixed with a more deterministic route selection
> that
> +ensures routes are ordered by the neighbouring AS during selection.
> +@xref{bgp deterministic-med}. This may reduce the number of updates as
> +routes are received, and may in some cases reduce routing churn. Though,
> it
> +could equally deterministically produce the largest possible set of
> updates
> +in response to the most common sequence of received updates.
> +
> +A deterministic comparison tends to imply an additional overhead of
> sorting
> +over any set of n routes to a destination. The implementation of
> +deterministic MED in Quagga scales significantly worse than most sorting
> +algorithms at present, with the number of paths to a given destination.
> +That number is often low enough to not cause any issues, but where there
> are
> +many paths, the deterministic comparison may quickly become increasingly
> +expensive in terms of CPU.
>
I would say that the details of the sorting algorithm used is probably more
info that the average person is interested in if they are trying to
understand how bestpath works.
> +
> +Deterministic local evaluation can @emph{not} fix the second issue of MED
> +however. Which is that the non-transitive preference of routes MED can
> +cause may lead to routing instability or oscillation across multiple
> +speakers. This can occur with non-full-mesh iBGP topologies that reduce
> the
> +routing information known to each speaker. This has primarily been
> +documented with iBGP route-reflection topologies. However, any other
> +route-hiding technologies potentially could also cause oscillation with
> MED.
> +
> +The second issue occurs where speakers each have only a subset of routes.
> +E.g. speaker X might have routes A,B, and speaker Y might have route C.
> X
> +selects A as its best, Y obviously can only choose C. They exchange
> routes
> +and then X might choose C as best from A,B,C while Y might choose A as
> best
> +from A,C - the non-transitive, non-defined order of preference of routes
> +that MED may induce allows this. They then withdraw their routes and the
> +cycle repeats. This can occur even if all speakers use a deterministic
> +order in route selection.
> +
> +More complex and insidious cycles of oscillation have been documented in
> the
> +literature. See, e.g., @cite{McPherson, D. and Gill, V. and Walton, D.,
> + "Border Gateway Protocol (BGP) Persistent Route Oscillation Condition",
> + IETF RFC3345}, and @cite{Flavel, A. and M. Roughan, "Stable and
> flexible
> + iBGP", ACM SIGCOMM 2009}, and @cite{Griffin, T. and G. Wilfong,
> +"On the correctness of IBGP configuration", ACM SIGCOMM 2002} for
> concrete examples and further
> +references.
> +
> +There is as of this writing @emph{no} known way to use MED for its
> original
> +purpose; @emph{and} reduce routing information in non-full-mesh iBGP
> +topologies (e.g with reflectors); @emph{and} be sure to avoid the
> +instability problems of MED due the non-transitive routing preferences it
> +can induce.
>
But there is a way :)
- Preferring the oldest external path solves one scenario
- "Type I" churn (as described in RFC 3345) can be solved by tweaking
IGP metrics. If you are using RRs you just have to make your inter-cluster
links have a higher cost than your intra-cluster links (same theory with
confeds). When we first discovered MED churn most customers that were
hitting it were able to solve it via this approach.
- "Type II" churn can be solved by using addpath to TX the bestpath per
neighbor-AS...see draft-ietf-idr-route-oscillation-stop-01
> +
> +The instability problems that MED can introduce on more complex,
> +non-full-mesh, iBGP topologies may be avoided either by:
> +
> +@itemize
> +@item
> +Deleting MED from all routes received from neighbouring ASes,
> +and/or by ignoring MED entirely in the decision process. There is no way
> to
> +do this at this time in Quagga.
> +@item
> +Setting @ref{bgp always-compare-med}, however this allows MED to be
> compared
> +across values set by different neighbour ASes, which may not produce
> +desirable results.
> +@item
> +Setting MED to the same value (e.g. 0) using @ref{routemap set metric}
> on all
> +received routes, in combination with setting @ref{bgp always-compare-med}
> on
> +all speakers.
> +@end itemize
> +
> +As MED is evaluated after the AS_PATH length check, another possible use
> for
> +MED is for intra-AS steering of routes with equal AS_PATH length, as an
> +extension of the last case above. As MED is evaluated before IGP metric,
> +this can allow cold-potato routing to be implemented, sending traffic to
> +preferred hand-offs with neighbours, rather than the closest hand-off
> +according to the IGP metric. This would be done with @ref{routemap set
> +metric} and by setting @ref{bgp always-compare-med} on all speakers.
> +
> +Note that even if action is taken to address the MED non-transitivity
> +issues, other oscillations may still be possible. E.g. on IGP cost if
> iBGP
> +and IGP topologies are at cross-purposes with each other.
>
Can you clarify here?
> +
> +@deffn {BGP} {bgp deterministic-med} {}
> +@anchor{bgp deterministic-med}
> +
> +Carry out route-selection in way that produces more deterministic answers
> +locally, even in the face of MED and the lack of a well-defined order of
> +preference it can induce on routes. Without this option the preferred
> route
> +with MED may be determined largely by the order that routes were received
> +in.
>
Would say "produces deterministic" instead of "produces more deterministic".
> +
> +Setting this option will have a performance cost that may be noticeable
> when
> +there are many routes for each destination. Currently in Quagga it is
> +implemented in a way that scales poorly as the number of routes per
> +destination increases.
>
Why don't we fix our implementation so that it is less expensive and chop
the paragraph above? I am worried that we will end up discouraging
customers from enabling deterministic-med.
> +
> +The default is that this option is not set.
> +@end deffn
> +
> +Note that there are other sources of indeterminism in the route selection
> +process, @xref{BGP decision process}.
>
Other than "prefer oldest external" what sources of indeterminism are there?
Daniel
_______________________________________________
Quagga-dev mailing list
[email protected]
https://lists.quagga.net/mailman/listinfo/quagga-dev