In item 5 (MED check) you have "See ...", which I assume you meant to point at your later MED section.
donald On Thu, Dec 3, 2015 at 5:33 AM, Paul Jakma <[email protected]> wrote: > Take 3. Add the new section to the menu so the Info version builds. Add a > footnote with a quick refresher on orders. > > * bgpd.texi: Document the -l argument. Update the 'BGP decision process' > table > to reflect what /actually/ is implemented. Add docs on > 'compare-routerid' in > the bestpath section. > > Add a section on MED, to highlight the issues it has by default, and to > highlight that it is terminally broken for its original purpose in many > modern iBGP topologies. > > * routemap.texi: set an anchor on 'set metric' so bgpd.texi can reference > it. > --- > doc/bgpd.texi | 264 > ++++++++++++++++++++++++++++++++++++++++++++++++++++-- > doc/routemap.texi | 1 + > 2 files changed, 259 insertions(+), 6 deletions(-) > > diff --git a/doc/bgpd.texi b/doc/bgpd.texi > index 7d92b5e..89ac8a3 100644 > --- a/doc/bgpd.texi > +++ b/doc/bgpd.texi > @@ -18,6 +18,7 @@ BGP-4. > @menu > * Starting BGP:: > * BGP router:: > +* BGP MED:: > * BGP network:: > * BGP Peer:: > * BGP Peer Group:: > @@ -53,6 +54,13 @@ Set the bgp protocol's port number. > @item -r > @itemx --retain > When program terminates, retain BGP routes added by zebra. > + > +@item -l > +@itemx --listenon > +Specify a specific IP address for bgpd to listen on, rather than its > +default of INADDR_ANY / IN6ADDR_ANY. This can be useful to constrain bgpd > +to an internal address, or to run multiple bgpd processes on one host. > + > @end table > > @node BGP router > @@ -104,18 +112,59 @@ This command set distance value to > @node BGP decision process > @subsection BGP decision process > > +The decision process Quagga BGP uses to select routes is as follows: > + > @table @asis > @item 1. Weight check > +prefer higher local weight routes to lower routes. > > -@item 2. Local preference check. > +@item 2. Local preference check > +prefer higher local preference routes to lower. > + > +@item 3. 
Local route check > +Prefer local routes (statics, aggregates, redistributed) to received > routes. > + > +@item AS path length check > +Prefer shortest hop-count AS_PATHs. > + > +@item 4. Origin check > +Prefer the lowest origin type route. That is, prefer IGP origin routes to > +EGP, to Incomplete routes. > + > +@item 5. MED check > +Where routes with a MED were received from the same AS, > +prefer the route with the lowest MED. See ... > + > +@item 6. External check > +Prefer the route received from an external, eBGP peer > +over routes received from other types of peers. > + > +@item 7. IGP cost check > +Prefer the route with the lower IGP cost. > + > +@item 8. Multi-path check > +If multi-pathing is enabled, then check whether > +the routes not yet distinguished in preference may be considered equal. If > +@ref{bgp bestpath as-path multipath-relax} is set, all such routes are > +considered equal, otherwise routes received via iBGP with identical > AS_PATHs > +or routes received from eBGP neighbours in the same AS are considered > equal. > + > > -@item 3. Local route check. > +@item 10. Router-ID check > +Prefer the route with the lowest router-ID. If the > +route has an ORIGINATOR_ID attribute, through iBGP reflection, then that > +router ID is used, otherwise the router-ID of the peer the route was > +received from is used. > > -@item 4. AS path length check. > +@item 11. Cluster-List length check > +The route with the shortest cluster-list > +length is used. The cluster-list reflects the iBGP reflection path the > +route has taken. > > -@item 5. Origin check. > +@item 12. Peer address > +Prefer the route received from the peer with the higher > +transport layer address, as a last-resort tie-breaker. > > -@item 6. MED check. > @end table > > @deffn {BGP} {bgp bestpath as-path confed} {} > @@ -125,11 +174,31 @@ decision process. 
> @end deffn > > @deffn {BGP} {bgp bestpath as-path multipath-relax} {} > +@anchor{bgp bestpath as-path multipath-relax} > This command specifies that BGP decision process should consider paths > of equal AS_PATH length candidates for multipath computation. Without > the knob, the entire AS_PATH must match for multipath computation. > @end deffn > > +@deffn {BGP} {bgp bestpath compare-routerid} {} > +@anchor{bgp bestpath compare-routerid} > + > +Ensure that where iBGP routes are equal on most metrics, including > +local-pref, AS_PATH length, IGP cost, MED, the tie is broken based on > +router-ID. If a route has an ORIGINATOR_ID attribute, i.e. it has been > +reflected, that ID will be used. Otherwise, the router-ID of the peer the > +route was received from will be used. > + > +The advantage of this is that the route-selection (at this point) will be > +deterministic, across iBGP. The disadvantage is that such equal routes > will > +tend to take the same exit out of the AS, via the lowest-ID router. > + > +If this option is enabled, then the external-age check, where already > +selected eBGP routes are preferred, is skipped. > +@end deffn > + > + > + > @node BGP route flap dampening > @subsection BGP route flap dampening > > @@ -151,6 +220,189 @@ The route-flap damping algorithm is compatible with > @cite{RFC2439}. The use of t > is not recommended nowadays, see @uref{ > http://www.ripe.net/ripe/docs/ripe-378,,RIPE-378}. > @end deffn > > +@node BGP MED > +@section BGP MED > + > +The BGP @acronym{MED, Multi_Exit_Discriminator} attribute is intended to > +allow one AS to indicate its preferences for its ingress points to another > +AS. The MED attribute will not be propagated on to another AS by the > +receiving AS - it is `non-transitive' in the BGP sense. > + > +E.g.@:, if AS X and AS Y have 2 different BGP peering points, then AS X > +might set a MED of 100 on routes advertised at one and a MED of 200 at the > +other. 
When AS Y selects between otherwise equal routes to or via > +AS X, AS Y should prefer to take the path via the lower MED peering of > 100 with > +AS X. Setting the MED allows an AS to influence the routing taken to it > +within another, neighbouring AS. > + > +In this use of MED it is not really meaningful to compare the MED value on > +routes where the next AS on the paths differs. E.g., if AS Y also had a > +route for some destination via AS Z in addition to the routes from AS X, > and > +AS Z had also set a MED, it wouldn't make sense for AS Y to compare AS Z's > +MED values to those of AS X. The MED values have been set by different > +administrators, with different frames of reference. > + > +The default behaviour of BGP therefore is to not compare MED values across > +routes received from different neighbouring ASes. In Quagga this is done > by > +comparing the neighbouring, left-most AS in the received AS_PATHs of the > +routes and only comparing MED if those are the same. > + > +Unfortunately, this behaviour of MED, of sometimes being compared across > +routes and sometimes not, depending on the properties of those other > routes, > +means MED can cause the order of preference over all the routes to be > +undefined. That is, given routes A, B, and C, if A is preferred to B, > and B > +is preferred to C, then a well-defined order should mean the preference is > +transitive (in the sense of orders @footnote{For some set of objects to > have > +an order, there @emph{must} be some binary ordering relation that is > defined > +between @emph{every} combination of those objects, @math{a \prec b}, and > +that relation @emph{must} be transitive, i.e. if @math{a \prec b} and > +@math{b \prec c} then that relation must carry over and it must be that > +@math{a \prec c} for the objects to have an order. If the relation allows > +for equality, i.e. 
if @math{a \prec b} and @math{b \prec a} may both be > true > +and this implies that @math{a = b}, then some objects may be equal in > order to each > +other and the order is partial. Otherwise, if there is an order, all the > +objects are distinct and have a total order. MED unfortunately does not > +define its order over all cases.}) and that A would be preferred to C. > + > +However, when MED is involved this need not be the case. With MED it is > +possible that C is actually preferred over A. This can be true even where > +BGP defines a deterministic ``most preferred'' route out of the full set > of > +A,B,C. With MED, for any given set of routes there may be a > deterministically > +preferred route, but there may be no way to arrange them into > +any order of preference. > + > +That MED can induce non-transitive orders of preference over routes can > +cause issues. Firstly, it may be perceived to cause routing table churn > +locally at speakers; secondly it may cause routing instability in > +non-full-mesh iBGP topologies, where sets of speakers continually > oscillate > +between different paths. > + > +The first issue arises from how speakers often implement routing > decisions. > +Though BGP defines a selection process that will deterministically select > +the same route as best at any given speaker, even with MED, that process > +requires evaluating all routes together. For performance and ease of > +implementation reasons, many implementations evaluate route preferences > in a > +pair-wise fashion instead. Given there is no well-defined order when MED > is > +involved, the best route that will be chosen becomes subject to > +implementation details, such as the order the routes are stored in. That > +may be (locally) non-deterministic, e.g.@: it may be the order the routes > +were received in. > + > +This indeterminism may be considered undesirable, though it need not cause > +problems. 
It may mean additional routing churn is perceived, as sometimes > +more updates may be produced than at other times in reaction to some > event . > + > +This first issue can be fixed with a more deterministic route selection > that > +ensures routes are ordered by the neighbouring AS during selection. > +@xref{bgp deterministic-med}. This may reduce the number of updates as > +routes are received, and may in some cases reduce routing churn. Though, > it > +could equally deterministically produce the largest possible set of > updates > +in response to the most common sequence of received updates. > + > +A deterministic comparison tends to imply an additional overhead of > sorting > +over any set of n routes to a destination. The implementation of > +deterministic MED in Quagga scales significantly worse than most sorting > +algorithms at present, with the number of paths to a given destination. > +That number is often low enough to not cause any issues, but where there > are > +many paths, the deterministic comparison may quickly become increasingly > +expensive in terms of CPU. > + > +Deterministic local evaluation can @emph{not} fix the second issue of MED > +however. Which is that the non-transitive preference of routes MED can > +cause may lead to routing instability or oscillation across multiple > +speakers. This can occur with non-full-mesh iBGP topologies that reduce > the > +routing information known to each speaker. This has primarily been > +documented with iBGP route-reflection topologies. However, any other > +route-hiding technologies potentially could also cause oscillation with > MED. > + > +The second issue occurs where speakers each have only a subset of routes. > +E.g. speaker X might have routes A,B, and speaker Y might have route C. > X > +selects A as its best, Y obviously can only choose C. 
They exchange > routes > +and then X might choose C as best from A,B,C while Y might choose A as > best > +from A,C - the non-transitive, non-defined order of preference of routes > +that MED may induce allows this. They then withdraw their routes and the > +cycle repeats. This can occur even if all speakers use a deterministic > +order in route selection. > + > +More complex and insidious cycles of oscillation have been documented in > the > +literature. See, e.g., @cite{McPherson, D. and Gill, V. and Walton, D., > + "Border Gateway Protocol (BGP) Persistent Route Oscillation Condition", > + IETF RFC3345}, and @cite{Flavel, A. and M. Roughan, "Stable and > flexible > + iBGP", ACM SIGCOMM 2009}, and @cite{Griffin, T. and G. Wilfong, > +"On the correctness of IBGP configuration", ACM SIGCOMM 2002} for > concrete examples and further > +references. > + > +There is as of this writing @emph{no} known way to use MED for its > original > +purpose; @emph{and} reduce routing information in non-full-mesh iBGP > +topologies (e.g with reflectors); @emph{and} be sure to avoid the > +instability problems of MED due the non-transitive routing preferences it > +can induce. > + > +The instability problems that MED can introduce on more complex, > +non-full-mesh, iBGP topologies may be avoided either by: > + > +@itemize > +@item > +Deleting MED from all routes received from neighbouring ASes, > +and/or by ignoring MED entirely in the decision process. There is no way > to > +do this at this time in Quagga. > +@item > +Setting @ref{bgp always-compare-med}, however this allows MED to be > compared > +across values set by different neighbour ASes, which may not produce > +desirable results. > +@item > +Setting MED to the same value (e.g. 0) using @ref{routemap set metric} > on all > +received routes, in combination with setting @ref{bgp always-compare-med} > on > +all speakers. 
> +@end itemize > + > +As MED is evaluated after the AS_PATH length check, another possible use > for > +MED is for intra-AS steering of routes with equal AS_PATH length, as an > +extension of the last case above. As MED is evaluated before IGP metric, > +this can allow cold-potato routing to be implemented, sending traffic to > +preferred hand-offs with neighbours, rather than the closest hand-off > +according to the IGP metric. This would be done with @ref{routemap set > +metric} and by setting @ref{bgp always-compare-med} on all speakers. > + > +Note that even if action is taken to address the MED non-transitivity > +issues, other oscillations may still be possible. E.g. on IGP cost if > iBGP > +and IGP topologies are at cross-purposes with each other. > + > +@deffn {BGP} {bgp deterministic-med} {} > +@anchor{bgp deterministic-med} > + > +Carry out route-selection in way that produces more deterministic answers > +locally, even in the face of MED and the lack of a well-defined order of > +preference it can induce on routes. Without this option the preferred > route > +with MED may be determined largely by the order that routes were received > +in. > + > +Setting this option will have a performance cost that may be noticeable > when > +there are many routes for each destination. Currently in Quagga it is > +implemented in a way that scales poorly as the number of routes per > +destination increases. > + > +The default is that this option is not set. > +@end deffn > + > +Note that there are other sources of indeterminism in the route selection > +process, @xref{BGP decision process}. > + > +@deffn {BGP} {bgp always-compare-med} {} > +@anchor{bgp always-compare-med} > + > +Always compare the MED on routes, even when they were received from > +different neighbouring ASes. Setting this option makes the order of > +preference of routes more defined, and should eliminate MED induced > +oscillations. 
> + > +This option can be used, together with @ref{routemap set metric} to use > MED > +as an intra-AS metric to steer equal-length AS_PATH routes to, e.g., > desired > +exit points. > +@end deffn > + > + > + > @node BGP network > @section BGP network > > @@ -188,7 +440,7 @@ This command specifies an aggregate address. > @end deffn > > @deffn {BGP} {aggregate-address @var{A.B.C.D/M} as-set} {} > -This command specifies an aggregate address. Resulting routes inlucde > +This command specifies an aggregate address. Resulting routes include > AS set. > @end deffn > > diff --git a/doc/routemap.texi b/doc/routemap.texi > index db3e72d..7938c96 100644 > --- a/doc/routemap.texi > +++ b/doc/routemap.texi > @@ -171,6 +171,7 @@ Set the route's weight. > @end deffn > > @deffn {Route-map Command} {set metric @var{metric}} {} > +@anchor{routemap set metric} > Set the BGP attribute MED. > @end deffn > > -- > 2.5.0 > > > _______________________________________________ > Quagga-dev mailing list > [email protected] > https://lists.quagga.net/mailman/listinfo/quagga-dev >
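
For readers of the new MED section, the non-transitive preference it describes can be reproduced with three routes. The sketch below is not Quagga's code and is only a rough illustration: the Route tuple, the prefer() rule (MED compared only between routes from the same neighbouring AS, otherwise lower IGP cost wins) and all the values are assumptions made up for the example. It shows that a pair-wise scan picks a different best route depending on arrival order, while grouping by neighbouring AS first, roughly the idea behind deterministic-med, does not.

```python
from collections import namedtuple
from itertools import permutations

# Hypothetical, heavily simplified route: only the attributes this sketch needs.
Route = namedtuple("Route", "name neighbor_as med igp_cost")

A = Route("A", neighbor_as=1, med=200, igp_cost=10)
B = Route("B", neighbor_as=2, med=0, igp_cost=20)
C = Route("C", neighbor_as=1, med=100, igp_cost=30)
ROUTES = [A, B, C]


def prefer(a, b):
    """Return the preferred of two routes under the simplified rule."""
    # MED is only compared when both routes came from the same neighbouring AS.
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    # Otherwise fall through to the IGP cost check.
    return a if a.igp_cost <= b.igp_cost else b


def pairwise_best(routes):
    """Scan the routes in stored order, keeping the current winner."""
    best = routes[0]
    for route in routes[1:]:
        best = prefer(best, route)
    return best


def grouped_best(routes):
    """Pick a winner per neighbouring AS first, then compare the winners."""
    groups = {}
    for route in routes:
        groups.setdefault(route.neighbor_as, []).append(route)
    winners = [pairwise_best(group) for group in groups.values()]
    # Winners all come from different ASes, so MED never applies here and
    # IGP cost (with the name as an arbitrary tie-break) gives a real order.
    return min(winners, key=lambda r: (r.igp_cost, r.name))


if __name__ == "__main__":
    for order in permutations(ROUTES):
        names = "".join(r.name for r in order)
        print(names, pairwise_best(list(order)).name, grouped_best(list(order)).name)
```

With these values A beats B and B beats C on IGP cost, yet C beats A on MED, so there is no consistent order over all three. The grouped variant settles on one route whatever the arrival order, but as the section notes, that only addresses the local indeterminism and not the oscillation between speakers.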
