Hi Mukul,

Thanks for your quick response. Please see my inline replies, marked GV2>.

From: Srivastava, Mukul Kumar <[email protected]>
Sent: Friday, November 14, 2025 10:56 PM
To: Gunter van de Velde (Nokia) <[email protected]>; The IESG 
<[email protected]>
Cc: [email protected]; [email protected]; 
[email protected]; [email protected]
Subject: Re: Gunter Van de Velde's Discuss on 
draft-ietf-grow-bmp-bgp-rib-stats-14: (with DISCUSS and COMMENT)

Hi Gunter

Sharing my inline response on some of the comments below as [MS].

Thanks
Mukul

From: Gunter Van de Velde via Datatracker <[email protected]>
Date: Friday, November 14, 2025 at 11:34 AM
To: The IESG <[email protected]>
Cc: [email protected], [email protected], [email protected], [email protected]
Subject: Gunter Van de Velde's Discuss on draft-ietf-grow-bmp-bgp-rib-stats-14:
(with DISCUSS and COMMENT)

Gunter Van de Velde has entered the following ballot position for
draft-ietf-grow-bmp-bgp-rib-stats-14: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to
https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-grow-bmp-bgp-rib-stats/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

# Gunter Van de Velde, RTG AD, comments for draft-ietf-grow-bmp-bgp-rib-stats-14

# The line numbers used are rendered from IETF idnits tool:
https://author-tools.ietf.org/api/idnits?url=https://www.ietf.org/archive/id/draft-ietf-grow-bmp-bgp-rib-stats-14.txt

# Many thanks for the RTGDIR review from Bruno and the shepherd writeup from
Job.

# Did I miss a cross-posting to IDR/BESS to check whether the various
suggested gauge definitions are accurately described and understood from a
protocol perspective?

# DISCUSS
# =======

#1# the section "5.  Operational Considerations" seems to document a mix of
operational considerations (no BCP14 language required) and BMP protocol
formal procedures (BCP14 language is required). Can these two be untangled? It
will make it easier for implementors to produce a correct implementation.
[MS] I am not clear what needs to be untangled and how do we want to word this.

GV2> I was suggesting splitting "5. Operational Considerations" into two
sections: "5.1. Operational Considerations to produce gauges for BMP
statistics messages" and "5.2. Operational Considerations for Operators using
gauges for BMP statistics messages".

GV2> For example, the following text would then fit in 5.1:

369          Counters may reset due to session restart, manual clearance, or
370          overflow.  Implementations MUST track discontinuities and log this
371          information.

GV2> Similarly, the following text would fit in 5.2:

373          Operators MAY consider rate-limiting statistic updates to minimize
374          performance impact on control-plane processes.  Operators SHOULD
375          enable only necessary statistics to reduce memory and CPU overhead.

#2# In general, I found that most of the gauges for the newly proposed
statistics types are not described very accurately. See my "COMMENT" section
for input and an overview; it is too lengthy for the DISCUSS section.

#3# some gauges seem duplicates from prior existing gauges. Not sure we need
two times the same gauge in different code-points. seems sub-optimal and error
prone.

#4# section 3 is named "Statistics Definition" and that does not seem a
well-chosen title. Can this be something that better describes the content? For
example "RIB monitoring type statistics"
[MS] I feel "Statistics Definition" is an appropriate generic section title.
It is followed by two sub-titles, "Adj-RIB-In Statistics Definition" and
"Adj-RIB-Out Statistics Definition", which align well with the "Statistics
Definition" title.

GV2> Thanks for the explanation. I still believe that “Statistics Definition” 
feels too broad as a section title since it only covers RIB statistics and hence
something like “RIB Monitoring Statistics” seems maybe more accurate? It’s 
narrower, matches the actual scope, and better reflects what the section 
contains.


#5# it was unclear to me that what the document specifies is that the gauge
that is formalized in this document is not simply a single dimensional gauge
alone, but that the value transferred by BMP is a combination of "AFI + SAFI +
gauge". I think i missed seeing that explicitly mentioned in the document.
Adding lengths (in general introduction section maybe to avoid repetition) of
each field would help making sure implementations interop well.
[MS] A gauge is a numeric (64-bit integer) value. The "AFI + SAFI" is the
additional encoding that goes in the BMP stats data. As mentioned in RFC 7854,
a BMP statistics message is encoded like this:

  *   BMP header + BMP per-peer header + Stats Count.

  *   The Stats Count is followed by that many TLVs (Stat Type, Stat Length,
Value, i.e. the Stat Data).

  *   For the per-AFI/SAFI types, the Stat Data is encoded as "AFI + SAFI" +
"64-bit gauge". This is referred to as "Value" in the doc.

Note that the wording is the same as in BMP rib-out RFC 8671. It is also used
in BMP loc-rib RFC 9069.
A BMP background is probably assumed in this draft.

GV2> While reviewing this draft alongside the earlier one, I noticed that some 
Types include the AFI/SAFI in their description (e.g., Type 19, 21), while 
others do not (e.g., Type 18, 20). When reading this document on its own, that 
inconsistency makes the meaning of the Value field unclear.
To avoid confusion, I suggest explicitly stating the meaning of “value” and 
where the AFI/SAFI context applies for each Type in this draft explicitly as 
well.
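
For illustration only, the layout discussed under #5 could be sketched in
Python as below. This is not taken from the draft. It assumes the 2-byte Stat
Type and 2-byte Stat Len fields of RFC 7854, followed by the 11-byte value
quoted for the per-AFI/SAFI types (2-byte AFI, 1-byte SAFI, 64-bit gauge). The
function names are hypothetical.

```python
import struct

# Network byte order, no padding:
#   Stat Type (2) | Stat Len (2) | AFI (2) | SAFI (1) | Gauge (8)
_TLV = struct.Struct("!HHHBQ")
_STAT_DATA_LEN = 11  # AFI + SAFI + 64-bit gauge


def encode_afi_safi_gauge(stat_type, afi, safi, gauge):
    """Build one statistics TLV carrying a per-AFI/SAFI gauge (sketch)."""
    return _TLV.pack(stat_type, _STAT_DATA_LEN, afi, safi, gauge)


def decode_afi_safi_gauge(data):
    """Parse one such TLV and return (stat_type, afi, safi, gauge)."""
    stat_type, length, afi, safi, gauge = _TLV.unpack(data)
    if length != _STAT_DATA_LEN:
        raise ValueError(f"unexpected Stat Len {length}")
    return stat_type, afi, safi, gauge


# Type 21 for AFI 1 / SAFI 1 (IPv4 unicast) with a gauge of 850000 routes.
tlv = encode_afi_safi_gauge(21, 1, 1, 850000)
print(len(tlv), decode_afi_safi_gauge(tlv))
```

A type without the AFI/SAFI prefix (for example Type 18 or 20) would carry only
the 8-byte gauge, which is why stating the value layout per type matters.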

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

# comments
# ========

19         This document defines new statistics type to monitor BMP Adj-RIB-In
20         and Adj-RIB-Out Routing Information Bases (RIBs).

GV> The abstract says that the document defines new statistics (but later it
is mentioned that they are gauges)

86         This document defines new gauges for BMP statistics message.

GV> The above does not fully align with what is written in the abstract, I
suspect you want to say:

"
This document defines gauges for new BMP statistics messages.
"
[MS] The abstract is usually a high-level thing, so it is mentioned as
"statistics". It could be a "32-bit count" or a "64-bit gauge". The statistics
definition clarifies that it is a "gauge". We can update the doc as you said:
"This document defines gauges for new BMP statistics messages."

GV2> Thank you. This document has only new gauges, no new counters. Would it
not be more correct to state that in the abstract to avoid confusion? It's only
a few words and does not inflate the abstract.

107        *  Pre-policy Adj-RIB-In: The result before applying the inbound
108           policy to an Adj-RIB-In.  Note that this aligns with the pre-
109           policy Adj-RIB-In concept specified in Section 2 of [RFC7854].

GV> Why is the text from RFC7854 not re-used? is there need for a new explicit
definition? GV> RFC7854 says:

"
   o  Adj-RIB-In: As defined in [RFC4271], "The Adj-RIBs-In contains
      unprocessed routing information that has been advertised to the
      local BGP speaker by its peers."  This is also referred to as the
      pre-policy Adj-RIB-In in this document.
"
[MS] I think this was a specific comment from Paulo & others to make it
explicit and say pre-policy. RFC 7854 defines Adj-RIB-In and says it is
referred to as pre-policy Adj-RIB-In. I feel it is OK to have this, and an
implementation may continue using stats type 7 if that is deemed appropriate.

GV2> This is only a non-blocking comment. It is up to the authors and the WG to
decide how to use this comment. At least from my perspective as a first-time
reader of the document, it reads slightly odd. It is not technically wrong.

127        *  Primary route: A route to a prefix that is considered the best
128           route by the BGP decision process [RFC4271] and actively used for
129           forwarding traffic to that prefix.

GV> is this accurate? is it not the BGP route that is selected by BGP for being
forwarded to its peers? There may be ECMP or uECMP routes actively used

131        *  Backup route: A backup route is eligible for route selection, but
132           it is not selected as the primary route and is also installed in
133           the Loc-RIB.  It is not used until all primary routes become
134           unreachable.  Backup routes are used for fast convergence in the
135           event of failures.

GV> Here the concept of "all primary routes" is used, indicating more than a
single best route. Does this not contradict the prior bullet point?

[MS] I think we need some clarification here, and we can update the doc if
there is agreement.
My understanding is that for a given prefix, there can be only one route marked
as the active route. ECMP is applicable to the forwarding layer only. When ECMP
is present, the active route can have multiple next hops (ECMP) installed in
the FIB to forward the traffic. From this BMP statistics draft's POV, to keep
things generic, I would suggest defining a primary route as "A route that is
marked as active by the local BGP protocol". Backup paths are all paths that
are "not primary route". When we bring in forwarding concepts, things might get
confusing.

GV2> I think this is serious enough to explicitly document what is considered
a "primary route". If implementors need to implement this, then they need to
understand exactly what this means and what to implement. Otherwise we end up
with statistics measuring different properties.
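
To make the concern concrete, here is a small Python sketch (not from the
draft) of how two readings of "primary route" produce different Type 24 and
Type 25 values for the same paths. The Path model and both counting functions
are hypothetical: one follows the "marked as active by BGP" reading, the other
the "actively used for forwarding" reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    prefix: str
    best: bool    # selected as best by the BGP decision process
    in_fib: bool  # used for forwarding, e.g. as an ECMP member


def count_by_best_path(paths):
    """Primary = BGP best path; every other path counts as backup."""
    primary = sum(p.best for p in paths)
    return primary, len(paths) - primary


def count_by_forwarding(paths):
    """Primary = any path used for forwarding; the rest count as backup."""
    primary = sum(p.in_fib for p in paths)
    return primary, len(paths) - primary


paths = [
    Path("192.0.2.0/24", best=True, in_fib=True),
    Path("192.0.2.0/24", best=False, in_fib=True),   # ECMP member
    Path("192.0.2.0/24", best=False, in_fib=False),
]
print(count_by_best_path(paths))   # (1, 2)
print(count_by_forwarding(paths))  # (2, 1)
```

Two implementations picking different readings would report different gauges
for identical routing state.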


137     3.  Statistics Definition

GV> This title seems rather undescriptive. What about calling this section:

"
RIB monitoring type statistics
"

145        *  Type = 18: (64-bit Gauge) Current number of routes in pre-policy
146           Adj-RIB-In.  This gauge is similar to stats type 7 defined in
147           [RFC7854] and makes it explicitly for the pre-policy Adj-RIB-In.

GV> It is written that this is similar to stats type 7, but when looking at the
definitions in section 2 it is exactly the same. The pre-existing stats type 7
is exactly the same as the proposed stats type 18. Do we need type 18?
[MS] As mentioned before, this was done based on explicit comment from Paulo 
and others. I have mentioned my other thoughts above about this topic.

GV2> ack

149        *  Type = 19: (64-bit Gauge) Current number of routes in per-Address
150           Family Identifier (AFI)/Subsequent Address Family Identifier
151           (SAFI) pre-policy Adj-RIB-In.  This gauge is similar to stats type
152           9 defined in Section 4.8 of [RFC7854] and makes it explicitly for
153           the pre-policy Adj-RIB-In.  The value is structured as: 2-byte
154           AFI, 1-byte SAFI, followed by a 64-bit Gauge.

GV> Same observation as the prior item. The newly suggested type 19 is exactly
the same as type 9. Do we need this new gauge? GV> What exactly is the "value"?
Can the structure of the field be clarified? How is the field encoded?
It seems more like a single-dimensional 64-bit gauge.
[MS] Same comment as above. The “value” is the “Value —> Stats Data” mentioned 
in the BMP statistics message encoding explained above.

GV2> see my prior response. Some extra text blob in this document will help 
reduce confusion

GV> first time usage of the AFI/SAFI in this document and adding a reference
can be handy. Also maybe a list of AFI/SAFI this is intended for if this is
only for a subset of them.
[MS] This is all AFI/SAFI that a BGP peer supports. There is no list that needs 
to be mentioned here.

GV2> Is there a reference to what afi/safi is? For example rfc4760 and 
https://www.iana.org/assignments/safi-namespace/safi-namespace.xhtml and 
https://www.iana.org/assignments/address-family-numbers/address-family-numbers.xhtml

GV2> When vendors claim support for a proposed standard, they’re expected to 
implement and follow all formal procedures in the RFC — meaning they must 
comply with every MUST / MUST NOT requirement. The SHOULD / MAY items matter 
less for “full support.”
My point was whether this RFC implicitly assumes a minimum set of AFI/SAFI a 
vendor must implement to claim compliance. Are some AFI/SAFI less essential 
than others?
Maybe the cleanest solution is to state explicitly, as you already hinted, that 
compliance applies only to the AFI/SAFI that the BGP peer actually supports.

159        *  Type = 21: (64-bit Gauge) Current number of routes in per-AFI/SAFI
160           post-policy Adj-RIB-In.  The value is structured as: 2-byte AFI,
161           1-byte SAFI, followed by a 64-bit Gauge.

GV> What exactly is the "value"? Can the structure of the field be clarified?
How is the field encoded? It seems more like a single-dimensional
64-bit gauge.
[MS] The “Value” is explained above.

GV2> ack

163        *  Type = 22: (64-bit Gauge) Current number of routes in per-AFI/SAFI
164           rejected by inbound policy.  This gauge is different from stats
165           type 0 defined in Section 4.8 of [RFC7854].  The stats type 0 is a
166           32-counter which is a monotonically increasing number and doesn't
167           represent the current number of routes rejected by an inbound
168           policy due to ongoing configuration changes.  The value is
169           structured as: 2-byte AFI, 1-byte SAFI, followed by a 64-bit
170           Gauge.

GV> If over time more and more routes are rejected, then how can the number of
rejected routes ever go down? It is an increasing number, unless there is an
assumption that the changing number of routes received/withdrawn by a peer is
accounted for, and it is the number of routes that were rejected out of the
number of routes received. This may need a more accurate definition of
what exactly is being measured and what reference is used.
[MS] - The rejected route can change based on policy configuration. RIB-in is 
associated with import policy. While RIB-out is associated with export policy. 
So we are measuring the effect of policy configuration.

GV2> Please add such an accurate description of the behavior to the text. It
clarifies assumptions that implementors may have and will help avoid
incompatible implementations of these fields.
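
The difference between a monotonically increasing counter (type 0 style) and a
"current number" gauge (Type 22 style) can be illustrated with the following
Python sketch. It is a simplified, hypothetical model, not the draft's
procedure: routes are plain strings, the policy is a callable, and the gauge is
recomputed against the routes currently held from the peer.

```python
class InboundPolicyStats:
    """Toy model of one peer's inbound policy statistics (illustrative)."""

    def __init__(self, policy):
        self.policy = policy        # callable: route -> True if accepted
        self.routes = set()         # routes currently received from the peer
        self.rejected_counter = 0   # counter: only ever increases

    def receive(self, route):
        self.routes.add(route)
        if not self.policy(route):
            self.rejected_counter += 1

    def withdraw(self, route):
        self.routes.discard(route)

    def set_policy(self, policy):
        # A policy change re-evaluates current routes; the counter is untouched.
        self.policy = policy

    def rejected_gauge(self):
        # Gauge: routes held now that the current policy rejects.
        return sum(1 for r in self.routes if not self.policy(r))


stats = InboundPolicyStats(lambda r: not r.startswith("10."))
for route in ("10.0.0.0/8", "10.1.0.0/16", "192.0.2.0/24"):
    stats.receive(route)
print(stats.rejected_counter, stats.rejected_gauge())  # 2 2

stats.set_policy(lambda r: True)  # operator relaxes the import policy
print(stats.rejected_counter, stats.rejected_gauge())  # 2 0
```

In this model the gauge also drops when a rejected route is withdrawn, which is
the accounting assumption raised in the comment above.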


172        *  Type = 23: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
173           accepted by inbound policy.  The value is structured as: 2-byte
174           AFI, 1-byte SAFI, followed by a 64-bit Gauge.  Some
175           implementations, or configurations in implementations, may discard
176           routes that do not match policy and thus the accepted count (type
177           23) and the Adj-RIB-In counts (type 21) will be identical in such
178           cases.

GV> I am not sure how the text starting with "Some implementations, or ..."
helps with the formal definition of the field. It is useful from an operational
perspective, but it complicates the formal part of the definition of the field
itself. Maybe move it to the operational implication section.
[MS] The text was added as part of a review comment.

GV2> Useful information, but it seems to belong in the operational
guidance section.


180        *  Type = 24: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
181           selected as primary route.  The value is structured as: 2-byte
182           AFI, 1-byte SAFI, followed by a 64-bit Gauge.

GV> Is the primary route the route forwarding traffic? Does this include all
ECMP and uECMP paths? BGP will only fwd the best BGP path, but it may use more
than a single path for forwarding.

184        *  Type = 25: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
185           selected as a backup route.  The value is structured as: 2-byte
186           AFI, 1-byte SAFI, followed by a 64-bit Gauge.

GV> does this include all routes that are not the BGP best path or only the
routes that are not used for forwarding? What makes a route a "backup" route.
[MS] Explained my thought above.

GV2> ok. As mentioned prior, this ties into an important point on what is 
considered as a backup route.

195        *  Type = 27: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
196           marked as stale by Graceful Restart (GR) events.  The value is
197           structured as: 2-byte AFI, 1-byte SAFI, followed by a 64-bit
198           Gauge.  'Stale' refers to a path which has been declared stale by
199           the BGP GR mechanism as described in Section 4.1 of [RFC4724].

GV> GR events happen when a CPM moves from a primary unit to a standby
unit/process. This involves significant processing. Hence I wonder how much
operational value this brings, or whether it would make the GR event worse than
it already is.
[MS] This is just a statistic sent to the collector. BMP stats do not interfere
with GR processing. These counters are created by the local BGP process after
processing. So I am not clear about this comment.

GV2> My understanding is that stale routes are routes that existed before the
restart but haven't been refreshed yet. They're kept temporarily to avoid
traffic loss, but that doesn't mean GR processing has completed: the switchover
is still ongoing and consumes CPU resources.
If BMP is running at the same time, generating accurate updates at high speed 
also takes CPU. From an operational perspective, it may be better to let the 
router focus its resources on completing the GR switchover and returning to a 
fully active state. Sending BMP statistics isn’t “free,” so there’s a trade-off 
to consider here. This could be documented in the operational guidance section.

201        *  Type = 28: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
202           marked as stale by Long-Lived Graceful Restart (LLGR).  The value
203           is structured as: 2-byte AFI, 1-byte SAFI, followed by a 64-bit
204           Gauge.  'Stale' refers to a path which has been declared stale by
205           the BGP LLGR mechanism as described in Section 4.3 of [RFC9494].

GV> see prior comments

211        *  Type = 30: (64-bit Gauge) Current Number of routes per-AFI/SAFI
212           left until reaching the received route threshold which corresponds
213           to the upper bound of accepted routes per Section 6.7 of
214           [RFC4271].  The value is structured as: 2-byte AFI, 1-byte SAFI,
215           followed by a 64-bit Gauge.

GV> Is this accurate? multiprotocol extensions are described in RFC4760 and not
in RFC4271. It is unclear how this counter referencing rfc4271 is to be applied
to rfc4760 when multiple afi/safi may be received from a single peer.

217        *  Type = 31: (64-bit Gauge) Current Number of routes left until
218           reaching a license-customized route threshold.  This value is
219           affected by whether a customized license exists, and when the
220           customized license is installed.

GV> This may be a soft threshold and in addition may be enforced outside the
router knowledge.

222        *  Type = 32: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
223           left until reaching a license-customized route threshold.  This
224           value is affected by whether a customized license exists for the
225           relevant address family, and when the customized license is
226           installed.  The value is structured as: 2-byte AFI, 1-byte SAFI,
227           followed by a 64-bit Gauge.

GV> This may be a soft threshold and in addition may be enforced outside the
router knowledge.

264        *  Type = 39: (64-bit Gauge) Current number of routes refused to be
265           sent by exceeding the maximum AS_PATH length supported by the
266           local configuration.

GV> Can this be described more accurately? Is it "refused to be sent" or simply
"not sent" because the route's AS_PATH is longer than the max AS_PATH length
towards the peer?

268        *  Type = 40: (64-bit Gauge) Current number of routes in per-AFI/SAFI
269           refused to be sent by exceeding the maximum AS_PATH length
270           supported by the local configuration.  The value is structured as:
271           2-byte AFI, 1-byte SAFI, followed by a 64-bit Gauge.

GV> See prior comment. I do not think 'refused' is the most accurate word to
use.... maybe filtered is a better term to use?
[MS] We can update that.

GV2> Thanks

328     5.  Operational Considerations

GV> Some of the definitions earlier have operational concerns included and are
maybe better added to the operational implication section.

330        This document defines new gauges for BMP statistics messages.  The

GV> i think more accurate would be that the document specifies gauges for "new
BMP statistics".

333        implementation-dependent.  Implementations SHOULD determine
334        appropriate report generation and delivery strategies, including
335        configurable timing intervals and threshold values.  The mechanism
336        for controlling the reporting of new gauges SHOULD be consistent with
337        that of existing types.  Implementations SHOULD also support per-
338        router configuration of statistic subsets for collection and
339        reporting.

GV> Why is this uppercase SHOULD? Is there a procedure that breaks? lowercase
seems sufficient as its documenting good behavior.
[MS] We can update that.

GV2> Thanks.

341        Some statistics are dependent on feature configurations, such as GR,
342        LLGR, and RPKI, so the corresponding statistics are only sent when
343        these features are enabled.  This statistics include Type 24, 25, 26,

GV> From operational perspective sending BGP Stats during a GR may impact the
GR event due to additional processing and dynamics. That is an operational
concern.

351        Certain statistics may have logical relationships (e.g., per-AFI/SAFI
352        counts summing to global totals).  Implementations MAY perform
353        consistency checks but MUST NOT assume strict dependencies (due to
354        potential race conditions or partial failures).  Discrepancies (e.g.,
355        sum(per-AFI/SAFI) != global count) SHOULD be logged as warnings but
356        MUST NOT disrupt protocol operation.

GV> not convinced these need to be BCP14 language. is BCP14 language required?
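
For reference, the behavior described in lines 351-356 of the draft (check,
warn on a discrepancy, never disrupt operation) might look like the following
Python sketch on the collector side. The function name, logger name, and data
shapes are assumptions made for illustration.

```python
import logging

log = logging.getLogger("bmp.stats")  # hypothetical logger name


def check_consistency(global_count, per_afi_safi):
    """Compare a global gauge with the sum of its per-AFI/SAFI gauges.

    per_afi_safi maps (afi, safi) tuples to gauge values. A mismatch is only
    logged as a warning; nothing is raised, so processing continues.
    """
    total = sum(per_afi_safi.values())
    if total != global_count:
        log.warning(
            "sum(per-AFI/SAFI)=%d != global count=%d", total, global_count
        )
        return False
    return True


# The values may be sampled at slightly different times, so a mismatch is
# treated as informational rather than as an error.
print(check_consistency(10, {(1, 1): 6, (2, 1): 4}))
print(check_consistency(10, {(1, 1): 6, (2, 1): 3}))
```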

358        For backward compatibility, and absent policy otherwise, it is
359        RECOMMENDED that monitored routers capable of generating both (Type 7
360        and Type 18) or (Type 9 and Type 19) BMP statistics SHOULD transmit
361        both corresponding types simultaneously.  This allows monitoring
362        stations to process either format according to their needs without
363        disrupting existing implementations that rely on Type 7 or Type 9.

GV> In what way are the new types different from the prior types? It is the
exact same value representing the exact same property.
[MS] This is an attempt to make counter explicit.

GV2> Can this be explained in the text within the draft. It is unclear from 
reading the document.

369        Counters may reset due to session restart, manual clearance, or
370        overflow.  Implementations MUST track discontinuities and log this
371        information.

GV> This document specifies gauges, not counters. Is this accurate usage of
words? is BCP14 language correct? seems not to be about formal protocol
procedure

373        Operators MAY consider rate-limiting statistic updates to minimize
374        performance impact on control-plane processes.  Operators SHOULD
375        enable only necessary statistics to reduce memory and CPU overhead.

GV> lowercase should/may seems sufficient
[MS] We can update that.

GV2> Thank you.

Many thanks for this document,

Kind Regards,
Gunter Van de Velde
RTG Area Director

