On Tue, Jan 23, 2024 at 10:12:33PM -0800, William Herrin wrote:
> Respectfully Chris, you are mistaken.
> 
> https://datatracker.ietf.org/doc/html/rfc4271#section-9.1.2.2
> 
> "a) Remove from consideration all routes that are not tied for having
> the smallest number of AS numbers present in their AS_PATH
> attributes."
> 
> So literally, the first thing BGP does when picking the best next hop
> is to discard all but the routes with the shortest AS path.

Not true.  Read the whole RFC--you've omitted Sections 9.1 and 9.1.1, which 
are critical.

Discarding all but the routes with shortest AS path is _not_ literally the 
first thing BGP does as you stated above.

The first thing BGP does is calculate the degree of preference whenever it 
receives a new route, a withdrawn route or a replacement route (see Section 
9.1.1).  How the degree of preference is determined is a local matter for 
each Autonomous System: it applies its configured administrative policy to 
classify the incoming routes, and the result is typically expressed using 
LOCAL_PREF.

Only after 9.1.1 completes do Section 9.1.2 and the 9.1.2.2 you cited begin 
(Phase 2: Route Selection).  Route selection under 9.1.2 is invoked only 
after the degree of preference is determined (the 'Phase 1' decision), as 
clearly described in Section 9.1.
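
To make the ordering concrete, here is a rough Python sketch of the two 
phases as I read them.  The Route class, its field names and select_best() 
are made up for illustration; they don't come from the RFC or from any 
vendor implementation, and the sketch stops at tie-breaker (a).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    # Hypothetical, simplified route record (not an RFC data structure).
    prefix: str
    local_pref: int   # stands in for the Phase 1 degree of preference
    as_path: tuple    # AS numbers, nearest AS first

def select_best(routes):
    """Return the routes left after Phase 1 and tie-breaker (a)."""
    # Phase 1 (9.1.1) has already attached a degree of preference to each
    # route.  Phase 2 (9.1.2) keeps only the most preferred routes.
    top = max(r.local_pref for r in routes)
    candidates = [r for r in routes if r.local_pref == top]
    # 9.1.2.2 (a): among equally preferred routes, keep the shortest AS_PATH.
    shortest = min(len(r.as_path) for r in candidates)
    return [r for r in candidates if len(r.as_path) == shortest]

routes = [
    Route("192.0.2.0/24", local_pref=200, as_path=(64500, 64501, 64502)),
    Route("192.0.2.0/24", local_pref=100, as_path=(64510,)),
]
best = select_best(routes)
print(best[0].as_path)
```

In this example the route with three ASes in its path wins, because the 
shorter path never survives the preference comparison.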

In fact, even the 9.1.2.2 that you cited above clearly states:

   In its Adj-RIBs-In, a BGP speaker may have several routes to the same
   destination that have the same degree of preference. 

   [ snip ]

   The following tie-breaking procedure assumes that, for each candidate
   route, all the BGP speakers within an autonomous system can ascertain
   the cost of a path (interior distance) to the address depicted by the
   NEXT_HOP attribute of the route, and follow the same route selection
   algorithm.

   The tie-breaking algorithm begins by considering all equally
   preferable routes to the same destination, and then selects routes to
   be removed from consideration.  The algorithm terminates as soon as
   only one route remains in consideration.  The criteria MUST be
   applied in the order specified.

   [ snip ]

      a) Remove from consideration all routes that are not tied for
         having the smallest number of AS numbers present in their
         AS_PATH attributes.  Note that when counting this number, an
         AS_SET counts as 1, no matter how many ASes are in the set.
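
As an aside, the counting rule in (a) can be sketched in a few lines of 
Python.  The representation is my own assumption for illustration: an 
AS_PATH is a list of segments, where a tuple is an AS_SEQUENCE and a 
frozenset is an AS_SET.  The function name is hypothetical too.

```python
def as_path_length(as_path):
    """Count AS numbers for tie-breaker (a); an AS_SET counts as 1."""
    length = 0
    for segment in as_path:
        if isinstance(segment, frozenset):
            length += 1             # AS_SET: counts once, whatever its size
        else:
            length += len(segment)  # AS_SEQUENCE: every AS number counts
    return length

# Two ASes in sequence followed by a three-member AS_SET.
path = [(64500, 64501), frozenset({64502, 64503, 64504})]
print(as_path_length(path))  # prints 3
```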



So you see, the comparison of AS_PATH, and therefore the route selection 
process, can only begin after routes are first sorted by their degree of 
preference.  That is typically expressed through LOCAL_PREF across the AS (or 
through another similar import policy, such as Cisco's "weight" parameter, 
which is applied before LOCAL_PREF and is significant only to the router 
where it's been configured).  The route selection process, including the 
elimination of routes with inferior AS paths, is a tie-breaker algorithm that 
runs after the degree of preference is calculated, which is what we've been 
trying to tell you.  So no, AS_PATH comparison is not literally the first 
thing BGP does.

You're ignoring Section 9.1.1 in its entirety.  It runs before Section 
9.1.2.2 (the section you cited), and 9.1.2.2 itself clearly specifies that 
the route selection process it describes (including AS_PATH comparison) is a 
tie-breaking procedure.


> 
> It also says that BGP implementations are -allowed- to use other
> selection criteria.


That is followed immediately by this clause:
  "BGP implementations MAY use any algorithm that produces the __same results__ 
as those described here."

And restricted by the following clause in the preceding paragraph:
  "The criteria MUST be applied in the order specified."

And clarified by Section 9.1:
  "as long as the implementations support the described functionality and they 
exhibit the same externally visible behavior."


> And there are many situations where doing so is
> well advised and improves the result. But AS path length is
> unambiguously the default, off which a user has to move it.


So, when a BGP implementation is written for router software, how does the 
manufacturer know whether your network is going to need many degrees of 
preference, or none?  The vendors have no idea, and the RFC also clarifies 
that degree of preference is a local policy matter.  Therefore, the default 
behavior is to assume the same LOCAL_PREF for every route until a policy is 
configured, which has typically been '100' across many vendor 
implementations.  In this instance, since all routes have the same degree of 
preference of 100, the Section 9.1.2.2 you cited then begins to tie-break the 
routes of the same preference, starting with the AS_PATH comparison.  But 
that is by no means the first thing BGP does.  The first thing BGP does, as 
clearly specified in the RFC, is determine the degree of preference to meet 
local routing policy.
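
Here is a small Python sketch of why the default makes AS_PATH look like the 
first step.  Everything in it is an assumption for illustration: the dict 
layout, rank(), prefer_customers() and the value 200 are made up, and the 
default of 100 is the common vendor value mentioned above, not something the 
RFC mandates.

```python
DEFAULT_LOCAL_PREF = 100  # common vendor default; not mandated by the RFC

def rank(route, policy=None):
    """Sort key: higher LOCAL_PREF first, then shorter AS_PATH."""
    # policy is a hypothetical callable returning a LOCAL_PREF for a route;
    # None models a router with no import policy configured.
    local_pref = policy(route) if policy else DEFAULT_LOCAL_PREF
    return (-local_pref, len(route["as_path"]))

def prefer_customers(route):
    # Hypothetical policy: raise LOCAL_PREF on customer-learned routes.
    return 200 if route["via"] == "customer" else DEFAULT_LOCAL_PREF

routes = [
    {"via": "transit", "as_path": [64500]},
    {"via": "customer", "as_path": [64501, 64502, 64503]},
]

no_policy = min(routes, key=rank)
with_policy = min(routes, key=lambda r: rank(r, prefer_customers))
print(no_policy["via"], with_policy["via"])  # prints: transit customer
```

With no policy every route ties at 100, so the AS_PATH comparison decides.  
Once a policy assigns different values, the longer path can win.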


The degree of preference differs greatly depending on what type of network 
you run.  If you're an edge consumer ASN (such as a multi-homed stub 
enterprise running BGP) that doesn't provide downstream IP transit to other 
BGP customers and doesn't peer with other networks (at an IX or otherwise), 
then your network probably doesn't have much need to apply administrative 
policy to determine a degree of preference, and you can be happy fiddling 
with just AS_PATH.

But if you're running a network which provides transit to other ASNs and 
peers with other networks, then suddenly, applying administrative policy is 
not only desirable, but operationally required.  This isn't solely a 
revenue/greed problem as some have cynically stated.  It's also a critical 
service availability and reliability issue: without a degree of preference 
set according to established routing policy, an IP network loses the ability 
to engineer traffic predictably enough to meet capacity planning objectives 
for its network interconnections.

Are there exceptions and pitfalls to this, where poorly designed or 
poorly thought-out networks suffer in certain routing situations?  
Absolutely.  But that's the Internet--it's not perfect, but it works very 
well most of the time for most situations.

Your desired 'policy-free, AS_PATH-only' world may solve your particular 
complaint at hand, but it absolutely would break the rest of the Internet, 
leaving no effective way to implement routing policy for the large-scale 
network interconnections that make the Internet tick.  BGP exists to provide 
anchors for applying routing policy to the path selection process at scale.  
It is wrong to assume that AS_PATH is the first thing and the only thing 
which matters in BGP, through incorrect and out-of-context parsing of the RFC 
to fit your desired narrative.

In operational reality, backed by the history and the RFCs themselves, the 
single most important and influential knob in BGP is arguably LOCAL_PREF, 
more so than AS_PATH.  Sadly, most people won't get to experience this until 
they've run or dealt with the operational realities of managing a large IP 
network.  The problem you're complaining about is an exception, primarily 
caused by your poor selection of IP transit provider at the data center where 
you're running AS11875, and you're demanding that everyone else take 
responsibility for the purchasing decision you've made.  There are some good 
proposals, such as commonly accepted wide communities for commonly 
encountered traffic-engineering scenarios, to help improve upon this and make 
BGP a better experience for the end-user in situations like the one you're 
having, but we're not quite there today, and it's understandably not going to 
be a quick process.

In the meantime, for the immediate short term, I'm glad to hear that your 
route pollution announcement solved the issue for you.  In the medium term, 
you should get a new transit provider for AS11875 with better connectivity 
into 3356.  Long term, perhaps commonly accepted wide communities could 
become a standard some day to improve the knobs in situations like this.


James
