Hi Jean,

Thanks for considering my comments. Please see some follow-up 
questions/comments.

I notice that you are now calling out Data Manifest, Platform Manifest, and 
Data Collection Manifest. For Data Manifest, the current definition in the 
Terminology section states:

The necessary data required to interpret telemetry information

Can you be more specific about what “necessary data” is? How about saying “The 
collection period required to interpret telemetry information”?

> On Oct 20, 2025, at 2:48 PM, Jean Quilbeuf <[email protected]> wrote:
> 
> Hi Mahesh,
> Thank you very much for your comments. I tried to address your comments, the 
> diff is here: 
> https://author-tools.ietf.org/iddiff?url1=draft-ietf-opsawg-collected-data-manifest-09&url2=draft-ietf-opsawg-collected-data-manifest-10&difftype=--html
>  
> <https://author-tools.ietf.org/iddiff?url1=draft-ietf-opsawg-collected-data-manifest-09&url2=draft-ietf-opsawg-collected-data-manifest-10&difftype=--html>
>  
> 
> I did not address the following points:
> * issue with the non-normative YANG module

I will wait for this point to be addressed before progressing the document.

> * there was one open questions from Med’s review, do we align with YANG 
> catalog (ease of utilization of the module) or with some existing hardware 
> module (ease of implementation). Asked here to the WG: 
> https://mailarchive.ietf.org/arch/msg/opsawg/gZTy7zxEvkzBNfuOlZ9z3hfWxQc/ 
> <https://mailarchive.ietf.org/arch/msg/opsawg/gZTy7zxEvkzBNfuOlZ9z3hfWxQc/>  
> . I will ask again.

Ok.

> 
> Best,
> Jean
> 
>> -----Original Message-----
>> From: Mahesh Jethanandani <[email protected]>
>> Sent: Monday, September 29, 2025 4:10 PM
>> To: [email protected]
>> Cc: opsawg <[email protected]>
>> Subject: [OPSAWG]AD review of draft-ietf-opsawg-collected-data-manifest
>> 
>> Hi Authors,
>> 
>> Thanks for working on this document. Thanks also to reviewers who have
>> reviewed this document and helped improve it.
>> 
>> Please find here some comments that I would like to discuss with the authors.
>> Hope they go towards improving the draft.
>> 
>> 
>> Section 1, paragraph 0
>>>   Network platforms use Network Telemetry, such as YANG-Push [RFC8641],
>>>   to continuously stream information, including both counters and state
>>>   information.
>> 
>> Thanks to Med for performing the OPSDIR revew. I do not see a response from
>> the authors (at least on the mailing list). Have those points been addressed?
>> 
>> Thanks also to Joe for the shepherd writeup. He (and others) have brought up
>> the question or normative and non-normative YANG modules being mixed in the
>> draft. It should be captured in the shepherd's writeup. Regardless, the
>> combination of normative and non-normative in the normative section of the
>> document is problematic. All non-normative text should be moved to the
>> Appendix.
>> 
>> Section 1, paragraph 0
>>>   One part of this information is the actual collection period, as
>>>   opposed to the configured collection period.  On some platforms, that
>>>   period can be adjusted automatically by the platform, for instance to
>>>   reduce the load incurred by sending the telemetry.  To later exploit
>>>   the collected data, getting this actual collection period is crucial.
>>>   This document defines a YANG model augmenting the YANG-Push model
>>>   [RFC8641] to expose the actual collection period in Section 4.
>> 
>> If the server cannot satisfy the client's suggested collection period, 
>> should it not
>> just reject the session? Are you suggesting that servers can change the 
>> collection
>> period on the fly after a session is established? Should the server not 
>> report that
>> as an error or close the session if it cannot satisfy what was negotiated as 
>> part of
>> the session?
> 
> That would depend on the device, some devices would reduce the cadence rather 
> than interrupt the collection.
> We changed the text to " On some platforms, that period can be adjusted 
> *unilaterally* by the platform ..." to make it clear.

> 
>> Section 1, paragraph 7
>>>   To retrieve the context in which a particular piece of data was
>>>   collected, three elements are necessary: the time of data emission,
>>>   the originating platform and the subscription through which the data
>>>   arrived.  The approach described in this document delegates the time
>>>   retrieval to the database storing the collected telemetry and
>>>   focusing on providing a way to match a platform and a subscription
>>>   identifier to the collection context.  This is consistent with most
>>>   of the YANG modules for devices, which focus on describing the
>>>   current state of the device, rather than the evolution of that state
>>>   through time.
>> 
>> It is not clear why the document delegates the time retrieval to the database
>> storing the collected telemetry. Is this the database at the source or the 
>> database
>> where the data is being collected?
>> Moreover, while the motivation for matching the platform id to the 
>> subscription
>> is clear, the rationale behind the statement following that is not clear. 
>> Can this
>> be explained differently, or dropped if it does not provide motivation for 
>> the
>> work within the draft?
> 
> The paragraph has been rewritten to clarify our point: we don’t care about 
> recording history in the YANG model.

Got it. I think you mean to say “which focus on describing the state of the 
device at a particular point in time, rather than …”.

> 
>> Section 2, paragraph 2
>>>   Platform manifest: part of the data manifest that completely
>>>   characterizes the platform producing the telemetry information.
>> 
>> Isn't "*part* of the data manifest that *completely* characterizes” a
>> contradiction? Can this be reworded?
> 
> Removed "completely".
> 
>> Section 2, paragraph 2
>>>   Data collection manifest: part of the data manifest that completely
>>>   characterizes how and when the telemetry information was metered.
>> 
>> Same here.
>> 
> 
> Removed "completely".
> 
>> Section 3.1, paragraph 4
>>>   From a collection parameters point of view, the data scientists
>>>   analyzing the collected data must know whether the counter was
>>>   requested from the network platform as on-change or at specific
>>>   cadence [RFC8641].  Indeed, an on-change collection explains why
>>>   there is a single value as opposed to a time series.  In case of
>>>   periodic collection, this exact cadence might not be observable in
>>>   the time series.  Indeed, this time series might report some values
>>>   as 0 or might even omit some values.  The reason for this behavior
>>>   might be diverse: the network platform may be under stress, with a
>>>   too small observation period, compared to the minimum-observed-
>>>   period.  Knowing the conditions under which a counter was collected
>>>   and streamed (along with the platform details) helps drawing the
>>>   informed conclusions.  As an example, some platform might report a
>>>   value of 0 for counters when the collection period is too short with
>>>   respect to the capabilities of the platform.  Without context, this
>>>   value of 0 might lead to a wrong conclusion that the corresponding
>>>   counter dropped to zero.
>> 
>> When I read the term "too short", I understand it as a limitation that 
>> prevents
>> the device from reading the value correctly. Is that what is meant in this 
>> case?
> 
> Yes some devices have this bug. We added "erroneously" to make it clear that 
> is a bug in that case.
> 
>> Which also brings up the question of why the collection period is defined in
>> centiseconds? What use case requires data to be collected in centiseconds or
>> even milliseconds?
> 
> Centiseconds come from the ietf-yang-push module, we reused it.
> 
>> In particular, I wonder how the reporting of 0 value for the counter is 
>> different
>> between the time period being too short, or when the platform wants to report
>> a count of 0, because the counter was not incremented?
> 
> For this document, the point is that some misleading telemetry might be 
> collected and the potential cause of this misleading information must be 
> recorded.
> 
>> Section 3.2, paragraph 0
>>>   When a new device is onboarded, operators must check that the new
>>>   device streams data (e.g., with YANG-Push), that the Network
>>>   Telemetry data is the right one, that the data is correctly collected
>>>   at the data collection, and finally that the data can be analyzed
>>>   (compared with other similar devices).  For the last point, the data
>>>   manifest, which must be linked to the data up to the collection and
>>>   analytics system, contains the relevant information.
>> 
>> What is "relevant information" in this case? The collector is the one 
>> requesting
>> the data. Therefore, it knows which device is supposed to send the data. Is 
>> this a
>> check to see that the name/id of the device that the collector requested data
>> from matches the name/id in the data manifest? If so, can that be made more
>> apparent?
> 
> You think along the lines of the YANG management traditional way. A 
> controller configures the collection of metrics, so the controller acts as a 
> collector. 
> The dynamic subscription (https://www.rfc-editor.org/rfc/rfc8639#section-2.4 
> <https://www.rfc-editor.org/rfc/rfc8639#section-2.4>) does not represent all 
> the cases.
> Think of YANG-Push telemetry, like in 
> https://datatracker.ietf.org/doc/draft-ietf-nmop-network-anomaly-architecture/
>  
> <https://datatracker.ietf.org/doc/draft-ietf-nmop-network-anomaly-architecture/>
>  figure 1: 
>    so UDP based telemetry, 
>    distributed from all line cards, 
>    either configured by a collector that is NOT the controller or even 
> configured by default on all routers as a default config.
> 
> We mentioned the fact that the collector is not necessarily the one 
> configuring the telemetry.

Thanks for adding that clarification.

> 
>> Section 4, paragraph 5
>>>       "This module augments ietf-subscribed-notification and
>>>        ietf-yang-push with the current-period statistics reporting the
>>>        actual collection period, as opposed to the configured one.
>> 
>> This module seems not to be adding current-period statistics. It is adding 
>> the
>> current collection period.
> 
> Removed "statistics"
> 
>> Section 4, paragraph 8
>>>       description
>>>         "Adds current period statistics";
>> 
>> Same here.
> 
> Removed "statistics"
> 
>> Section 5.1, paragraph 3
>>>   The YANG module contains a list of platform manifests (in 'platforms/
>>>   platform'), indexed by the identifier of the platform.  That
>>>   identifier should be defined by the network manager so that each
>>>   platform emitting Network Telemetry has a unique identifier.  There
>>>   are several documents about managing the inventory of the network,
>>>   e.g., [I-D.ietf-ivy-network-inventory-yang].  The platform identifier
>>>   should be the same as the identifier used in inventories or the
>>>   'node-id' in [RFC8345].  As an example, the identifier could be the
>>>   'sysName' from [RFC3418].  The scope of the "ietf-platform-manifest"
>>>   module is the scope of the data collection, i.e., a given network,
>>>   therefore it contains a collection of platform manifests, as opposed
>>>   to the device scope, which would contain a single platform manifest.
>> 
>> Thanks to Qiufang Ma for her YANG doctors' review. She makes a few points
>> that are worth considering, which I will reiterate later.
>> 
>> The last few statements are clear on the scope of the model, i.e., a given
>> network and not a device. If that is the case, I am curious to know how the 
>> data
>> nodes in this module get populated. If the idea is for the module to be
>> implemented both at a network level (as a list) and at a device level (as a 
>> single
>> entry in the list), the statement above on the scope needs to be modified.
> 
> Clarified in the text.

Thanks

> 
>> Thanks for recognizing that some of the nodes in this module should be the
>> same as the nodes/definitions in other modules, such as platform/id, name,
>> software-version, etc. If that is the case, should they not be leafref to 
>> nodes in
>> the network inventory module or the hardware module defined in RFC8345?
>> How are the nodes defined differently from the nodes defined in that module?
> 
> Clarified that selecting the global identifier for devices is out of scope of 
> this document.

Ok.

> 
>> Section 5.2, paragraph 4
>>>       "This module describes the platform information to be used as
>>>        context of data collection from a given network element. The
>>>        contents of this model must be streamed along with the data
>>>        streamed from the network element so that the platform context
>>>        of the data collection can be retrieved later.
>> 
>> Why later? Isn't the idea that the collector wants to correlate the data 
>> being
>> streamed to the platform manifest as it is being written to a time series 
>> database
>> or being analyzed? Should it not be collected as part of establishing the 
>> session?
> 
> Clarified that context is needed at analysis time.

The updated text is better. Thanks.

> 
>> Section 5.2, paragraph 4
>>>        The data content of this model should not change except on
>>>        upgrade or patching of the device.
>> 
>> Thanks for bringing this up. If a device is "patched", by which I understand 
>> its
>> software is updated, why should the data content change? Or do you mean the
>> device is being replaced?
> 
> Clarified that we consider software updates here and indicated the possible 
> changes.

The updated statement says:

which might modify the software or os version or the YANG library

How does the modification of software or OS version affect the contents of the 
data being streamed? I think changes in the data being streamed, which you 
refer to as the YANG library will affect the data being collected, but I am not 
sure about other software or OS.

> 
>> Section 5.2, paragraph 7
>>>       leaf name {
>>>         type string {
>>>           length "1..1023";
>>>         }
>>>         description
>>>           "Model of the platform from which data is collected.";
>>>       }
>> 
>> Why constrain the length of the string? Most models have shied away from
>> putting any constraints on the name and leaving it to the implementation in
>> question. My suggestion would be to provide a rationale for the constraint or
>> remove it from this and all the nodes that have it.
> 
> The limit was added because of a comment from Joe: 
> https://mailarchive.ietf.org/arch/msg/opsawg/3xYDyNgTsODKDT4I8adAjBFmflA/ 
> <https://mailarchive.ietf.org/arch/msg/opsawg/3xYDyNgTsODKDT4I8adAjBFmflA/> 
> I am open to removing it.

Ok. I went back and looked at Joe’s comment. He is correct, in that one must 
have restriction on the size when writing to the database. The question is 
whether it needs to be in model. My observation is that most models have chosen 
not to put a restriction in the model itself, unless it is a well-known string, 
and leave it to the implementation to put that restriction based on their 
requirements. 

Thanks

> 
>> Section 5.2, paragraph 7
>>>       description
>>>         "Top container including all platforms in scope. If this model
>>>          is hosted on a single device, it should contain a single entry
>>>          in the list. At the network level, it should contain an entry
>>>          for every monitored platform.";
>> 
>> This description is what I alluded to above, and seems to contradict the
>> statement above, mainly:
>> 
>> The scope of the "ietf-platform-manifest" module is the scope of the data
>> collection, i.e., a given network, therefore it contains a collection of 
>> platform
>> manifests, as opposed to the device scope, which would contain a single
>> platform manifest.
>> 
>> 
>> If the idea is that this module needs to be implemented both at a network 
>> level
>> (as a list of platforms) AND at a device level (as a single entry in the 
>> list), the
>> above statement needs to be modified to make it clear.
> 
> The above statement was clarified.
> 
>> Section 6, paragraph 1
>>> 6.  Data Collection Manifest
>>> 
>>>   This section is non-normative.
>> 
>> Thanks to Qiufang Ma for her YANG doctors review and Joe who brings up the
>> point in his Shepherd review. Both bring up a valid point on this particular
>> module. If this section is not normative, then this section belongs in the
>> Appendix.
>> 
>> Section 13, paragraph 1
>>>   *  Do we want to handle the absence of values, i.e. add information
>>>      about missed collection or errors in the collection context ? It
>>>      could also explain why some values are missing.  On the other
>>>      hand, this might also be out scope.  CLOSED: the goal of the
>>>      manifest is to be able to detect miscollection a posteriori.
>>>      Assurance of the metric collection is out of scope and could be
>>>      done via an external mechanism such as SAIN.
>> 
>> Why is information on missed collection out of scope of this document? If the
>> system is allowed to change the collection period if it "too short", can it 
>> not
>> report an error when it is not able to because of it?
> 
> I believe that it is out of scope for this document, the missed collection 
> are not necessary part of the context for the collection.
> Also, the device might report that it did not sent a collection when it 
> should have, but that would not account the collection missed outside of the 
> device (i.e. UDP packet lost, collector overloaded and dropping some 
> telemetry). 
> There should be some other system to monitor the collection stack, report and 
> explain such miscollections.
> 
>> No reference entries found for these items, which were mentioned in the text:
>> [draft-opsarea-rfc5706bis].
> 
> That reference will be removed in the RFC, it’s part of the "Changes between 
> revisions" section.
> 
>> Found terminology that should be reviewed for inclusivity; see 
>> https://www.rfc- <https://www.rfc-/>
>> editor.org/part2/#inclusive_language 
>> <http://editor.org/part2/#inclusive_language> for background and more
>> guidance:
>> 
>> * Terms "natively" and "native"; alternatives might be "built-in",
>>   "fundamental", "ingrained", "intrinsic", "original"
> 
> Removed.
> 
>> -------------------------------------------------------------------------------
>> NIT
>> -------------------------------------------------------------------------------
>> 
>> All comments below are about very minor potential issues that you may choose
>> to address in some way - or ignore - as you see fit. Some were flagged by
>> automated tools (via https://github.com/larseggert/ietf-reviewtool), so there
>> will likely be some false positives. There is no need to let me know what 
>> you did
>> with these suggestions.
>> 
>> Section 4, paragraph 0
>> + s/current collection/current collection period/
>> 
>> Section 2, paragraph 2
>>>   Platform: equipment of the network able to produce telemetry data.
>> 
>> s/equipment of the network/equipment in the network/
>> 
>> Section 3.1, paragraph 2
>>>   *  This counter definition, typically as defined in the YANG model.
>> 
>> s/This counter/The counter/
>> 
>> Section 4, paragraph 0
>>>   Some platforms will adjust the collection period depending on their
>>>   capabilities and current load.  The YANG module in this section
>>>   augments the "ietf-subscribed-notification" module to provide the
>>>   "current-period" leaf.  The value of this leaf indicates the current
>>>   collection which might be different from the configured collection
>>>   period.
>> 
>> s/current collection/current collection period/
>> 
>> Mahesh Jethanandani
>> [email protected]


Mahesh Jethanandani
[email protected]






_______________________________________________
OPSAWG mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to