Ketan Talaulikar has entered the following ballot position for draft-ietf-grow-bmp-bgp-rib-stats-14: Discuss
When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ for more information about how to handle DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-grow-bmp-bgp-rib-stats/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Thanks to the authors and the WG for this document. Please find below certain points that I would like to discuss.

<discuss-1> Semantics of routes, paths, primary, and backup.

Section 2 of this document says:

   Primary route: A route to a prefix that is considered the best route by the BGP decision process [RFC4271] and actively used for forwarding traffic to that prefix.

   Backup route: A backup route is eligible for route selection, but it is not selected as the primary route and is also installed in the Loc-RIB. It is not used until all primary routes become unreachable. Backup routes are used for fast convergence in the event of failures.

Consider a BGP route for destination prefix x/y that is a multipath:

x/y
   via BGP NH1 (path1) (best)
   via BGP NH2 (path2) (multipath - say ECMP)
   via BGP NH3 (path3) (backup)
   via BGP NH4 (path4) (valid but not best/multipath/backup)
   via BGP NH5 (path5) (invalid - for whatsoever reason)

This is a single route. The best/multipath/backup/valid/invalid/etc. are qualifiers of its paths. Except for two stats that refer to paths (stale and suppressed), everything is referring to routes.

I would like to discuss the semantics of route vs path. It seems to me like some of the stats are for paths and not routes.

In general, I think the use of the terms primary/backup, which are related to forwarding plane aspects, can be confusing.
Instead, perhaps using terms that are more suitable for BGP Loc-RIB would be better? I've suggested some of them above for consideration. Also refer to draft-ietf-grow-bmp-path-marking-tlv - the terms of stats should be aligned across the BMP documents?

Furthermore, there is a wrong assumption that backup paths are only activated when all primary paths are down. This is very much implementation dependent. Some implementations have a 1:1 provisioning of primary/backup - where the backup would get used when its specific primary goes down - this draws on the FRR notion in the forwarding planes. Refer to the definition in draft-ietf-grow-bmp-path-marking-tlv.

These clarifications have implications on several of the stats as they are defined currently.

<discuss-2> Section 3 has the following text and Section 4 introduces a table that brings up an interesting aspect.

   "This section defines different statistics type for Adj-RIB-In and Adj-RIB-Out monitoring type. Some of these statistics are also applicable to Loc-RIB; refer to Section 4 for more details."

Types 24 through 28 are applicable to both Adj-RIB-In and Loc-RIB. How does one know what is being reported? Can this be clarified? It seems like this is the first document introducing such overloaded types, but I don't find the reason why this is being done.

There is also a sort of duplication, with the same stat being both global as well as per AFI/SAFI - is there any guidance on whether only one of them needs to be supported (this way avoiding the race conditions and discrepancies in their totaling)?

It is important to clarify these aspects if this is going to set a precedent/guidance for other similar stats in BMP in future documents.

<discuss-3> Section 5 - Operational considerations - is not entirely operational considerations.
There is reference to "implementations" in several places and it is not clear if this is on the router side or the collector/monitoring side - this needs to be clarified so that expectations on implementations on either side are clear. As an example: "Implementations MUST track discontinuities and log this information." - which side is this for?

Several aspects are not really operational considerations but implementation considerations. Please consider a "Procedures" section for documenting some of those aspects. As an example, how is this text an operational consideration:

   "Some statistics are dependent on feature configurations, such as GR, LLGR, and RPKI, so the corresponding statistics are only sent when these features are enabled. This statistics include Type 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 39, 40, 41, 42, and 43."

Another example is "A BMP implementation MUST ignore unrecognized stat types upon receipt and MUST exclude unsupported stat types upon transmission." ... this is a normative protocol behavior that is buried in the Operational Considerations section.

"Operators MAY consider rate-limiting statistic updates to minimize performance impact on control-plane processes." - why is this not at least a SHOULD and perhaps even a MUST?

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

I note that the WGLC for this document was not cross-posted to the IDR WG for soliciting review as required by the GROW WG charter. I hope this can be avoided going forward.

I support the DISCUSS positions of both Eric and Gunter. Some of their points are related to the points that I have raised in my ballot as well.

I also have some comments/suggestions that I hope will help improve the document.
1) Type = 37: (64-bit Gauge) Current number of routes in per-AFI/SAFI post-policy Adj-RIB-In not found by verifying route origin AS number through the ROA of RPKI [RFC6811]. The value is structured as: 2-byte AFI, 1-byte SAFI, followed by a 64-bit Gauge.

The phrase 'not found by verifying ...' is confusing. I assume this refers to routes that didn't find any match in the RPKI cache? If so, please clarify. This also applies to type 43.

2) Type = 39: (64-bit Gauge) Current number of routes refused to be sent by exceeding the maximum AS_PATH length supported by the local configuration.

The phrase 'refused to be sent ...' is confusing. Perhaps you mean routes that were not sent because ... This also applies to type 40.
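
For reference, the value layout quoted in comment 1 (2-byte AFI, 1-byte SAFI, followed by a 64-bit Gauge) can be sketched in Python as below. This is only an illustration and is not taken from the draft: the helper names are hypothetical, and network (big-endian) byte order is an assumption.

```python
import struct

# Layout quoted in comment 1: 2-byte AFI, 1-byte SAFI, 64-bit Gauge.
# ASSUMPTION: network (big-endian) byte order; "!" also disables padding,
# so the packed value is exactly 2 + 1 + 8 = 11 bytes.
AFI_SAFI_GAUGE = struct.Struct("!HBQ")


def pack_afi_safi_gauge(afi: int, safi: int, gauge: int) -> bytes:
    """Encode a per-AFI/SAFI gauge value (hypothetical helper)."""
    return AFI_SAFI_GAUGE.pack(afi, safi, gauge)


def unpack_afi_safi_gauge(data: bytes) -> tuple:
    """Decode a per-AFI/SAFI gauge value into (afi, safi, gauge)."""
    if len(data) != AFI_SAFI_GAUGE.size:
        raise ValueError(
            f"expected {AFI_SAFI_GAUGE.size} bytes, got {len(data)}"
        )
    return AFI_SAFI_GAUGE.unpack(data)


if __name__ == "__main__":
    # AFI 1 / SAFI 1 (IPv4 unicast) with a gauge of 5 routes.
    value = pack_afi_safi_gauge(1, 1, 5)
    print(value.hex(), unpack_afi_safi_gauge(value))
```

Note that nothing in this layout says whether the gauge describes Adj-RIB-In or Loc-RIB, so a receiver would have to take that from context outside the value itself.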
