Re: [OPSAWG] AD review of draft-ietf-opsawg-service-assurance-yang-07

Jean Quilbeuf Tue, 18 Oct 2022 17:09:57 -0700

Hi Rob,
Thank you very much for you detailed review.

The latest version should address most of the comments. The diff is here: 
https://www.ietf.org/rfcdiff?url2=draft-ietf-opsawg-service-assurance-yang-08


Answers to some unanswered or added details are inline below.

Thanks Again
Best,
Jean


> -----Original Message-----
> From: Rob Wilton (rwilton) [mailto:rwil...@cisco.com]
> Sent: Monday 17 October 2022 13:06
> To: opsawg@ietf.org; draft-ietf-opsawg-service-assurance-yang....@ietf.org
> Subject: AD review of draft-ietf-opsawg-service-assurance-yang-07
> 
> Dear authors,
> 
> And here is my corresponding AD review of draft-ietf-opsawg-service-
> assurance-yang-07.  Again, I found the document to be well-written, with
> mostly minor comments/suggestions, bar one question about the symptom
> reference.
> 
> 
> Moderate level comments:
> 
> (1) p 11, sec 3.3.  YANG Module
> 
>   grouping symptom {
>     description
>       "A grouping for the symptoms for a specific subservice.";
>     leaf symptom-id {
>       type leafref {
>         path "/agents/symptoms-description/symptom-id";
>       }
>       description
>         "Identifier of the symptom, to be interpreted according
>          to the agent identified by the agent-id.";
>     }
> 
> Should this path be constrained to the specific agent instance identified by
> agent-id?  E.g., similar to how the 'id' path is constrained in subservice-
> reference below.
> 

Indeed that was a mistake, it's fixed now.

> 
> (4) p 5, sec 3.2.  Tree View
> 
>    module: ietf-service-assurance
>      +--ro assurance-graph-last-change    yang:date-and-time
>      +--rw subservices
>      |  +--rw subservice* [type id]
>      |     +--rw type                                identityref
>      |     +--rw id                                  string
> 
> I don't really like to mess with the structure of YANG modules during AD (or
> IESG) review, but I wonder whether having type as part of the subservice
> key doesn't end up making it more cumbersome to refer to subservices, and
> whether a flat id space might be better (i.e., removing 'type' from the
> subservice list keys).  But I also don't mind if the author/WG do not want to
> change the structure at this stage.
> 

I decided not to change it at the moment.

> 
> (8) p 5, sec 3.2.  Tree View
> 
>            +--ro subservices* [type id]
>               +--ro type    -> /subservices/subservice/type
>               +--ro id      leafref
> 
> A couple more consistency questions:
> 1) I note that subservices has been put into a container, but that agents,
> assured-services are not.
> 2) The "subservice" list is the singular tense (presumably because it is 
> under a
> container), but agents, assured-services and subservices are not.  I know that
> you have to optimize for XML or JSON readability.
> 

According to https://datatracker.ietf.org/doc/html/rfc8407#section-4.14.2 , top 
level lists with large number of nodes should be avoided. So I have wrapped 
them into containers.

> 
> (9) p 6, sec 3.2.  Tree View
> 
>            +--ro subservices* [type id]
>               +--ro type    -> /subservices/subservice/type
>               +--ro id      leafref
>    The date of last change "assurance-graph-last-change" is read only.
>    It must be updated each time the graph structure is changed by
>    addition or deletion of subservices, dependencies or modification of
>    their configurable attributes.  Such modifications correspond to a
>    structural change in the graph.  The date of last change is useful
>    for a client to quickly check if there is a need to update the graph
>    structure.  A change in the health-score or symptoms associated to a
>    service or subservice does not change the structure of the graph and
>    thus has no effect on the date of last change.
> 
> Presumably, you considered, and dismissed, having a top-level timestamp to
> track whether any changes in symptoms or health-score have occurred
> within any sub-service?  Perhaps this doesn't end up being useful in practice?
> 

The idea behind that timestamp was that the graph structure is "hard" to 
maintain using updates from streaming telemetry (need to keep track of 
dependencies and sometimes recursively delete orphan nodes), thus having a way 
to fully synchronize via this timestamp can simplify the algorithm.
Scores and symptom changes are easier to track via a simple subscription.
Also our assumption is that graph structure changes less frequently than scores 
and symptoms.

> 
> (11) p 6, sec 3.2.  Tree View
> 
>    The presence of "maintenance-contact" field inhibits the emission of
>    symptoms for that subservice and subservices that depend on them.
>    See Section 3.6 of [I-D.ietf-opsawg-service-assurance-architecture]
>    for a more detailed discussion.
> 
> This wasn't obvious to me from the leaf name, and I wonder whether it
> would have been better to have a presence-container "in-maintenance"
> with a child "contact" leaf.

Yes, it's more readable.

> 
> (12) p 6, sec 3.2.  Tree View
> 
>    The "health-score" contains a value normally between 0 and 100
>    indicating how healthy the subservice is.  The special value -1 can
>    be used to specify that no value could be computed for that health-
>    score, for instance if some metric needed for that computation could
>    not be collected.
> 
> Because this is an enum, this would often/normally be encoded as the string
> "missing" rather than as -1.

I am hoping that YANG-aware deserialization will replace the value -1 by the 
enum name, i.e. "missing" when targeting a human. The reason to use a number is 
for homogeneity with the health score, thinking of time series databases which 
fail if a time series changes type (int -> string for instance) over time.

> 
> (13) p 7, sec 3.2.  Tree View
> 
>    The "health-score" contains a value normally between 0 and 100
>    indicating how healthy the subservice is.  The special value -1 can
>    be used to specify that no value could be computed for that health-
>    score, for instance if some metric needed for that computation could
>    not be collected.
>    The "symptoms-history-start" is the cutoff date for reporting
>    symptoms.  Symptoms that were terminated before that date are not
>    reported anymore in the model.
> 
> Presumably it is implementation specific about how/when the symptoms get
> cleared (i.e., when the symptoms-history-start is updated).  I did wonder
> whether a YANG RPC could be defined to reset this.  But I might be confuses
> as to who is actually providing this timestatmp, e.g., is it the monitored
> device (or SAIN agent), or is it the SAIN Collector.  This could also be 
> deferred
> as future work.
> 

For now it is to the discretion of the entity exporting the symptoms (could be 
agent exporting to collector or collector exporting to the rest of the world) 
to decide that cutoff date.
Indeed a more sophisticated agent or collector might have a rpc to set the 
symptoms retention period, but I think that it is out of scope for this draft.

 
> Thanks,
> Rob

_______________________________________________
OPSAWG mailing list
OPSAWG@ietf.org
https://www.ietf.org/mailman/listinfo/opsawg

Re: [OPSAWG] AD review of draft-ietf-opsawg-service-assurance-yang-07

Reply via email to