[Speaking as an individual contributor in this response]

> On May 6, 2026, at 03:17, Tiger Xu <[email protected]> wrote:
>
> In essence, this changes the extended community from non‑transitive to
> transitive and introduces the concept of bandwidth aggregation – both of
> which were already present in draft-xu-idr-fare version -00.

First, a few words on the general use cases of carrying bandwidth/capacity in BGP routes:

The link-bandwidth feature, its varying uses over the years, and the varying transitivities used for it[1] have been a long-running mess.

At a fundamental level, the feature of "we've sent a value, apply a multipath ratio across all paths for that destination based on the received values" has been broadly consistent across the implementations. As the use cases started to split between underlay and overlay topology, how multipath was handled at each layer, and how that handling interacted with load balancing, became messier.

One could observe that there are benefits to splitting the signaling for bandwidth/capacity based on the role the routes are intended to serve, and also where they are applied. The fact that the "role" of a given route is generally clear in most BGP contexts where BGP is carrying the underlay routing has made it less of a deployment problem to use the same signaling mechanism for both underlay and overlay. However, it does mean that in places where "math" on those values has been necessary, having an overloaded signaling mechanism complicates implementation and operational logic.

The ready example, covered in many places, is that having hop-by-hop underlay bandwidth capacity is great for load balancing across nexthops. However, when it comes time to consider multipath load balancing for individual overlay/service routes passing over links of disparate capacities, there tends to be a need to apply math based on the desired network-wide load balancing. Is it that you want to have a receiver acquire the minimal functional bandwidth that path can use? Or is it a ratio for traffic to be broadly load balanced behind a set of paths? And certainly there are more use cases.

The various use cases have been broadly solved on a single signaling mechanism and - frustrating to some - by operational paradigm and discipline.

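To make the two questions above concrete, here is a minimal Python sketch of the two kinds of "math". It is illustrative only: the function names, the next-hop addresses, and the bandwidth numbers are hypothetical, and nothing here is taken from an implementation or from the link-bw or FARE drafts.

```python
from typing import Dict, Iterable


def multipath_weights(path_bw: Dict[str, float]) -> Dict[str, float]:
    """Ratio view: share traffic across paths in proportion to the
    bandwidth value received with each path (hypothetical helper)."""
    total = sum(path_bw.values())
    if total <= 0:
        raise ValueError("at least one path must carry a positive bandwidth")
    return {nexthop: bw / total for nexthop, bw in path_bw.items()}


def bottleneck_bandwidth(hop_bw: Iterable[float]) -> float:
    """Minimum view: the usable bandwidth of one path is limited by its
    smallest hop (hypothetical helper)."""
    return min(hop_bw)


if __name__ == "__main__":
    # Two paths to the same destination; values are arbitrary units.
    received = {"192.0.2.1": 100.0, "192.0.2.2": 300.0}
    print(multipath_weights(received))

    # One overlay path crossing three underlay links of disparate capacity.
    print(bottleneck_bandwidth([400.0, 100.0, 200.0]))
```

The point of the sketch is only that the same received number feeds two different computations, so a receiver has to know which one the sender intended.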
Simply having more than one signaling mechanism would offer some flexibility to operators and implementors. This has been mentioned in multiple contexts over the years.

There has also been appropriate criticism of the link-bw encoding. The choice of IEEE 754 32-bit floating point numbers provided a useful way to carry big numbers across BGP in an existing encoding - extended communities. However, the poor granularity of that type for the numbers we use these days in networks leads to issues that are mostly operational. For example, you can configure one number, and the closest rounded number is what is encoded on the wire. Similarly, how do you do policy on numbers where rounding may be in place? And finally, such numbers don't encode or interact nicely with YANG.

There have been some proposals to simply change the encoding to get us out of this particular bit of unpleasantness. I think there is room for further work to provide for a less insane encoding. However, that will also lead us to figuring out how a new such mechanism (possibly a new community) interops with the existing stuff.

Since most of the use cases for link-bw are satisfied with being a ratio rather than carrying precise numbers, the pressure to address the deficiencies above hasn't been high. However, once there's a desire for more precise capacity encoding, we'll likely see the appropriate mechanisms being proposed for those use cases - and those use cases may overlap with the existing ones.

I think there's more room for work to provide cleaner separation of overlay and underlay use cases. In this respect, I'm supportive of continuing discussion on the work you've begun with FARE. But like the other comments above, much of that discussion will be whether a separate signaling mechanism makes our lives easier at the implementation and at the operations level. I look forward to that discussion.

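The rounding issue with 32-bit floats mentioned above can be seen with a few lines of Python. This is a sketch under stated assumptions: it models only the 4-octet float field, not the full extended community, it assumes the value is expressed in bytes per second, and the helper names are hypothetical.

```python
import struct


def encode_link_bw(bytes_per_sec: float) -> bytes:
    """Pack a bandwidth value as a big-endian IEEE 754 32-bit float.
    Assumption: only the 4-octet value field is modeled here."""
    return struct.pack(">f", float(bytes_per_sec))


def decode_link_bw(wire: bytes) -> float:
    """Unpack the 4-octet float back into a Python float."""
    return struct.unpack(">f", wire)[0]


def on_wire_value(bytes_per_sec: float) -> float:
    """Return the value a receiver would see after the float32 round trip."""
    return decode_link_bw(encode_link_bw(bytes_per_sec))


if __name__ == "__main__":
    # 1 Gb/s and 400 Gb/s expressed in bytes per second, plus a value
    # one byte per second above the latter.
    for configured in (125_000_000, 50_000_000_000, 50_000_000_001):
        seen = on_wire_value(configured)
        print(f"configured={configured} on_wire={seen:.0f} "
              f"error={seen - configured:+.0f}")
```

With only a 24-bit significand, values of this size are spaced thousands of units apart, so nearby configured values can collapse onto the same wire encoding. A policy that matches on an exact configured number may therefore not match what was actually received.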
A few terse technical comments on the draft itself:

Section 3:
Your requested encoding is impossible in RFC 4360 extended communities. You have six octets to work with (the eight-octet community less the type and sub-type octets). You have both global and local-admin fields that require 4 octets each.

Security/Operational considerations:
Your desire in this draft is to use transitive extended communities. Unlike the hop-by-hop (re-)generated non-transitive extended communities used by DMZ, you have attribute escape issues to address:
- If a given node doesn't "do math" on the community because it doesn't understand it, how does that impact the use case?
- You need to protect the deployment against receiving such communities from outside the deployment.
- You need to discuss how you remove the communities when the routes are being sent outside the deployment.

Some of these considerations are already addressed as part of the link-bw document.

-- Jeff

[1] Juniper issued the first version as non-transitive, and then immediately started shipping code where it was transitive while squatting on the transitive code point - sloppiness from my forebears that has made for unfortunate cleanup work in IETF along with interop issues.

_______________________________________________
BESS mailing list -- [email protected]
To unsubscribe send an email to [email protected]
