[OPSAWG] Comments on draft-claise-opsawg-collected-data-manifest-06

Jan Lindblad (jlindbla) Mon, 13 Nov 2023 13:12:51 -0800

Hi Benoît, draft authors, WG,

Thank you for the presentations (in several WGs) during IETF 118 and your great 
work around model driven telemetry. I have now read the latest version of 
draft-claise-opsawg-collected-data-manifest and would like to offer some 
comments.


This is really valuable work, and I will make sure to reference/use it in the 
next version of draft-lindblad-tlm-philatelist, which I generally feel fits 
quite nicely with this.

1) Controller level modules

I read the abstract and intro section of an earlier version of the 
collected-data-manifest work last year, but it wasn't until this week that I 
realized that this work is aimed at controllers. I think this fact is not 
mentioned in the abstract, nor anywhere in section 1 of the document. Many of 
the referenced modules (e.g. ietf-yang-library, ietf-subscribed-notifications) 
are device level modules, which IMHO makes it easy to misunderstand the 
proposed architecture. I'd suggest you clearly position this work as a set of 
controller level modules already in the abstract, even if a careful reading 
of the entire document reveals this elsewhere.

2) Copy pasting from device modules

I see there is quite a bit of copy+pasting from device modules in this work. I 
understand why, and I would have done the same thing just to get my points 
across, but we need to do something about this in upcoming versions.

3) YANG to Time Series Database (TSDB) mapping

Appendix A provides a sketch for a mapping from YANG to TSDB tagged format. May 
I propose that we collaborate on the details for this mapping in 
draft-kll-yang-label-tsdb and that you refer to that document in lieu of 
appendix A?

4) Configuring the collection process

A "principle" I have proposed in the IAB e-impact program mailing list is that 
the (sustainability) telemetry collection should be entirely controlled by 
configuration. It should be possible for the operators/consumers of the 
collected data output to control (and transparently inspect) the collection 
process, rather than having the choices of what is and is not included hard 
coded. Do you agree with this principle, and if so, would you have some thoughts 
about how the configuration framework I have proposed in 
draft-lindblad-tlm-philatelist could be merged with the platform and collection 
manifest?

5) Collection of the metadata

In section 5.2, it is mentioned that "We don’t focus on the timing aspect as 
storing both the data and their manifest in a time series database will allow 
the data scientists to look for the Data Manifest corresponding to the 
timestamp of the datapoint. In that scenario, the reliability of the collection 
of the Data Manifest is the same as the reliability of the data collection 
itself, since the Data Manifest is like any other data."

Could you elaborate a little on your exact ideas here? As I understand it, the 
main bulk of the data collection would be from a device to the TSDB. But the 
data manifest model would sit on a controller/collector, and not a device? So 
would the collector have a subscription on itself, or what exactly do you have 
in mind? Also, would this metadata collection process be granular, so that only 
the leafs that actually changed (e.g. period) are recorded, or would it record 
all data manifest values when any of them (e.g. period) changes? The example in figure 5 makes 
me think you might mean taking the entire thing each time anything changes. It 
seems to me the data manifest is potentially rather large, and if the period 
changes frequently, this could amount to a lot of data in the TSDB.
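
To make the two alternatives concrete, here is a minimal Python sketch of 
granular versus full recording. It is only an illustration under stated 
assumptions: the manifest is modeled as a flat dict of leaf name to value, and 
apart from "period" the leaf names and all values are hypothetical, not taken 
from the draft.

```python
def changed_leaves(previous, current):
    """Return only the leaves whose value differs between two flat manifests."""
    changes = {}
    for path in previous.keys() | current.keys():
        old = previous.get(path)
        new = current.get(path)
        if old != new:
            changes[path] = new  # None marks a leaf that was removed
    return changes


# Hypothetical manifest contents; only "period" is named in the text above.
previous = {"period": 1000, "on-change": False, "xpath": "/interfaces/interface"}
current = {"period": 500, "on-change": False, "xpath": "/interfaces/interface"}

granular_record = changed_leaves(previous, current)  # only what changed
full_record = dict(current)                          # everything, every time
```

With the granular approach, a period change writes one value to the TSDB; with 
the full approach, it writes the whole manifest again.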

6) Size of the platform manifest

The platform manifest includes pretty much the entire yang-library. For certain 
devices, this could be a large amount of data, more than 1 MB I would guess. 
This data is sent to the TSDB once per system the collector is fetching data 
from. If that is from a few hundred devices (or much more), this metadata alone 
may land in the GB zone (or much more). Is there something we could do to make 
this scale a bit better? Maybe structuring the metadata differently could make 
it easier to reduce the repetition across devices that lands in the TSDB?
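
One possible way of structuring this, sketched in Python below, is to store 
each distinct yang-library content once and let every device entry refer to it 
by a digest. This is an assumption for the sake of discussion, not something 
the draft proposes: the function names, the use of SHA-256 over canonical JSON, 
and the example device names are all hypothetical.

```python
import hashlib
import json


def library_digest(yang_library):
    """Digest of a yang-library instance, computed over canonical JSON."""
    canonical = json.dumps(yang_library, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def store_platform_manifests(devices):
    """Store each distinct library once; devices only keep a reference."""
    libraries = {}   # digest -> yang-library content, stored once
    references = {}  # device name -> digest of its library
    for name, yang_library in devices.items():
        digest = library_digest(yang_library)
        libraries.setdefault(digest, yang_library)
        references[name] = digest
    return libraries, references


# Hypothetical fleet: 300 devices running one of two software variants.
variant_a = {"module": [{"name": "ietf-interfaces", "revision": "2018-02-20"}]}
variant_b = {"module": [{"name": "ietf-interfaces", "revision": "2018-02-20"},
                        {"name": "ietf-ip", "revision": "2018-02-22"}]}
devices = {f"router-{i}": (variant_a if i % 2 == 0 else variant_b)
           for i in range(300)}

libraries, references = store_platform_manifests(devices)
```

In this example the TSDB would hold two library copies plus 300 short 
references, instead of 300 full copies.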

7) Wider applicability

Another of the "principles" I argued for on the e-impact mailing list was that 
we should collect telemetry data from existing device interfaces (available 
now), rather than require and wait for new ones to be implemented in real world 
networks. In practice, this implies also collecting data by means other than 
YANG-Push. I proposed some mechanisms for dealing with both the collection of 
data and metadata from such non-YANG sources in draft-lindblad-tlm-philatelist. 
Do you think we could incorporate some of those thoughts in the work here?

Thank you again for doing all this work and for sharing with the WG.

Best Regards,
/jan

_______________________________________________
OPSAWG mailing list
OPSAWG@ietf.org
https://www.ietf.org/mailman/listinfo/opsawg
