Benjamin Kaduk has entered the following ballot position for draft-ietf-pce-stateful-pce-auto-bandwidth-11: Discuss
When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.) Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html for more information about IESG DISCUSS and COMMENT positions. The document, along with other ballot positions, can be found here: https://datatracker.ietf.org/doc/draft-ietf-pce-stateful-pce-auto-bandwidth/ ---------------------------------------------------------------------- DISCUSS: ---------------------------------------------------------------------- This document seems generally useful, and thanks for it. I think there's a couple places where it's ambiguous or unclear, and hopefully we can tighten up the language without needing extensive changes. There may be an internal inconsistency in Section 2.3's definitions relating the Overflow-Count and the Overflow-Threshold: the description of Overflow-Count reads as if it is a reported quantity, but the description of Overflow-Threshold implies that Overflow-Count is a configured parameter. (Similarly for the Underflow- variants.) Specifically, the sentence "[t]his value indicates how many times consecutively, the percentage or absolute difference between the current MaxAvgBw and the current bandwidth reservation of the LSP is greater than or equal to the Overflow-Threshold value" uses the descriptive verb "is" as opposed to a conditional such as "needs to be [...] in order to [...]". Section 5.2.3.2 says: If the percentage difference between the current MaxAvgBw and the current bandwidth reservation is greater than or less than or equal to the threshold percentage, the LSP bandwidth is adjusted to the current bandwidth demand (MaxAvgBw) (as long as the difference in the bandwidth is at least or above the Minimum-Threshold). As written ("greater than or less than or equal to"), this is always true, which cannot be what was intended. Probably we should reword to just "greater than" and talk about the percentage absolute difference, i.e., 100 * abs(MaxAvgBw - CurrentRes) / CurrentRes . ---------------------------------------------------------------------- COMMENT: ---------------------------------------------------------------------- As a general note, I'm not sure the phrase "calculated bandwidth to be adjusted" is a great fit for how it's currently being used. Specifically, the "calculation" in question seems to just be taking the maximum of a sliding window of per-timeslice average bandwidth values, and this seems like a sufficiently small amount of computation that I'd be comfortable describing it as a "measurement", for "measured bandwidth as input to path calculation". That is, this doesn't seem like it comes close to the amount of computation involved in actually computing paths, so I keep misreading what information is being sent where. Section 1 Over time, based on the varying traffic pattern, an LSP established with a certain bandwidth may require to adjust the bandwidth reserved in the network dynamically. The head-end Label Switch Router (LSR) monitors the actual bandwidth demand of the established LSP and periodically computes new bandwidth. The head-end LSR adjusts the bandwidth reservation of the LSP based on the computed bandwidth automatically. This feature is commonly referred to as Auto- Bandwidth. The Auto-Bandwidth feature is described in detail in Section 4 of this document. >From just this text, I can't tell if this is describing the mechanisms of this document or an existing Auto-Bandwidth feature that is related to the mechanisms of this document. In the model considered in this document, the PCC (head-end of the LSP) collects the traffic rate samples flowing through the LSP and calculates the new adjusted bandwidth. The PCC reports the calculated bandwidth to be adjusted to the PCE. This is similar to the Passive stateful PCE model, while the Passive stateful PCE uses path request/reply mechanism, the Active stateful PCE uses report/update mechanism. [...] nit: I think the punctuation in this last sentence needs some tweaking, as there looks to be a comma splice at present. (It's also unclear how much we need to talk about Passive stateful PCE, since just two paragraphs ago we say we focus on Active stateful PCE in this document.) This document defines the PCEP extensions needed to support Auto- Bandwidth feature in a Active stateful PCE model where the LSP bandwidth to be adjusted is calculated on the PCC (head-end of the LSP). The use of PCE to calculate the bandwidth to be adjusted is out of scope of this document. Maybe I'm just confused about active vs. passive in general, but I thought that active stateful PCE involved the PCE "making the decisions" with the PCC just providing input data. But here we're saying that the PCC is making the bandwidth decision, even though the rest of the LSP computation remains on the PCE? Or is "calculate the bandwidth to be adjusted" referring to "evaluate the expression max_{time periods}(avg. bandwidth per time period)" (as opposed to "compute the bandwidth adjustment to make)? Section 2.3 Are we assuming any relationship between the Up-Adjustment-Threshold and the Overflow-Threshold? traffic demand. If the percentage or absolute difference between the current MaxAvgBw and the current bandwidth reservation of the LSP is greater than or equal to the threshold value, the overflow condition is set to be met. The LSP bandwidth is adjusted to the nit: I think s/set to be met/said to be met/ ? (this appears twice) Minimum-Threshold: The increase or decrease of the LSP bandwidth should be at least or above the minimum-threshold represented as an absolute bandwidth value before the bandwidth adjustment for the LSP is made. This threshold can be seen as a suppression threshold that is used along with a percentage threshold to avoid unnecessary auto-bandwidth adjustments and re-signaling of the LSP at low bandwidth values. I can't tell from this text if this threshold is a (LSP) bandwidth value/measurement or a bandwidth offset between configured and measured values. (The subsequent protocol element definitions are more clear that it's the latter.) Section 3 The column headings in Table 1 refer to who initiates the LSP in question, right? The PCEP speaker supporting this document must have a mechanism to advertise the automatic bandwidth adjustment capability for both PCC- Initiated and PCE-Initiated LSPs. Just checking on "must" vs. "MUST" here. o It is required to identify and inform the PCC, which LSPs are enabled with Auto-Bandwidth feature. Not all LSPs in some deployments would like their bandwidth to be dependent on the real-time bandwidth usage but be constant as set by the operator. nit: I suggest NEW: % o It is required to identify and inform the PCC which LSPs have % enabled the Auto-Bandwidth feature. Not all LSPs in some % deployments would like their bandwidth to be dependent on the % real-time bandwidth usage; for some LSPs leaving the bandwidth % constant as set by the operator is preferred. Section 4.2 When the Auto-Bandwidth feature is enabled, the measured traffic rate is periodically sampled at each Sample-Interval (which can be configured by an operator and the default value as 5 minutes) by the PCC, when the PCC is the head-end node of the LSP. The traffic rate Do we cover what happens in the case when the PCC is not the head-end node of the LSP? Section 5.1 o The PCEP speaker that does not recognize the extensions defined in this document sends the PCErr message with error-type 2 (capability not supported) as per Section 6.9 in [RFC5440]. This refers to when the actual new messages (e.g., AUTO-BANDWIDTH-ATTRIBUTES TLV) are being used, not merely advertising the Auto-Bandwidth Capability TLV in the OPEN object, right? That is, this is descriptive language of what the core protocol spec mandates as error handling in the presence of unrecognized input? Section 5.1.1 nit: in the previous section we captialized this as "Auto-Bandwidth Capability", but here (including the section heading) it is "AUTO-BANDWIDTH-CAPABILITY". Should they be consistent? Section 5.2 Future specification can define additional sub-TLVs. I see that we see later that unrecognized sub-TLVs are ignored, but would there be a case where we want to use a bit in the AUTO-BANDWIDTH-CAPABILITY flags to negotiate support for a sub-TLV (i.e., to confirm that the peer will understand it, instead of just sending it and not knowing if it is understood)? Section 5.2.1 It sounds like we should ignore values larger than 7 days as well (rather than erroring out)? Section 5.2.2, 5.2.3 I'd like to see a little bit more detail specified of the interaction between (regular) Adjustment-{Interval,Threshold} and Down-Adjustment-{Interval,Threshold}. We do have this text about (e.g.) "[t]he Adjustment-Interval sub-TLV specifies the time interval for both upward and downward trend", which conveys the general intent, but I'm not sure I have all the details about there being one default value, that applies to both, and changes to the (regular) parameter continue to apply to both, until an explicit Down- variant is seen, at which point the control of the two parameters diverge permanently and they must always be separately controlled. Section 5.2.3.1 o Adjustment-Threshold: The absolute Adjustment-Threshold bandwidth value, encoded in IEEE floating point format (see nit: I'd suggest that this is a bandwidth *difference* value (or offset), pedantically speaking. (The following paragraph is quite clear on the actual usage, though, so this is nit-level.) This comment applies to basically all the threshold fields in the document, but I'll just make it once, here. Section 5.2.3.2 o Reserved: SHOULD be set to zero on transmission and MUST be ignored on receipt. I think the formulation of "MUST be set to zero on transmission and SHOULD be ignored on receipt" is more common. o Percentage: The Adjustment-Threshold value (7 bits), encoded in percentage (an integer from 1 to 100). The value 0 is considered to be invalid. The default value is 5 percent. I assume that the question of whether to allow sub-percentage granularity came up, and don't want to revisit the WG's decision. But please confirm that it was discussed. Also, are values over 100 also "invalid"? What should the recipient do upon receipt of an invalid value? If the percentage difference between the current MaxAvgBw and the current bandwidth reservation is greater than or less than or equal to the threshold percentage, the LSP bandwidth is adjusted to the current bandwidth demand (MaxAvgBw) (as long as the difference in the bandwidth is at least or above the Minimum-Threshold). I'd suggest rewording to make the two criteria equal peers (e.g., "If X and Y") rather than relegating the minimum-threshold to a parenthetical that's easy to ignore. Section 5.2.3.3 It would probably be good to reiterate here (and at other similar locations) that this "is used to decide" only when there's a need for different behavior in the upward and downard directions. Also, nit-level, I think "overrides" is more appropriate than "overwrites", since the non-Down- variant remains active for its separate role. Section 5.2.4.2 per second. The default maximum-bandwidth value is not set. I guess effectively the default is FLOAT_MAX, inherited from RFC 5440. Section 5.2.5.1 o Overflow-Threshold: The absolute Overflow-Threshold bandwidth value, encoded in IEEE floating point format (see [IEEE.754.1985]), expressed in bytes per second. Refer to Section 3.1.2 of [RFC3471] for a table of commonly used values. If the increase of the current MaxAvgBw from the current bandwidth reservation is greater than or equal to the threshold value, the overflow condition is met. nit: I'd suggest to s/increase/difference/, since (IIUC) the operation here is "observed bandwidth remains above the (reservation plus) threshold for <count> times", not "observed bandwidth increases in successive reporting intervals". This assumes that the Adjustment-Interval is larger than <count> times the reporting interval, of course. Section 5.2.5.3 o Underflow-Threshold: The absolute Underflow-Threshold bandwidth value, encoded in IEEE floating point format (see [IEEE.754.1985]), expressed in bytes per second. Refer to Section 3.1.2 of [RFC3471] for a table of commonly used values. If the decrease of the current MaxAvgBw from the current bandwidth reservation is greater than or equal to the threshold value, the underflow condition is met. As above, I suggest s/decrease/difference/ (perhaps with an additional indication that the magnitude of the difference is to be used, not the signed difference). Section 5.7 I'm not sure I understand why there's a need for an auto-bandwidth-specific "overload" notification, as opposed to having such messages suppressed as part of a more general overload condition. But that doesn't mean there's not a need, of course, as I'm not very familiar with this area! Section 6.6 An implementation MAY allow a limit to be placed on the rate of auto- bandwidth related messages sent by a PCEP speaker and received by a peer. An implementation MAY also allow sending a notification when a PCEP speaker is overwhelmed or the rate of messages reach a threshold. I don't want to revisit a WG consensus outcome, but I could see "SHOULD" for "allow sending a notification", noting that it is "SHOULD allow sending" and not "SHOULD send". Section 7 This document defines AUTO-BANDWIDTH-CAPABILITY TLV and AUTO- BANDWIDTH-ATTRIBUTES sub-TLVs which do not add any new security concerns beyond those already discussed in [RFC8231] and [RFC8281] I'd suggest to qualify this as "do not add any substantial new security concerns". We did just in the previous section talk about how this functionality could increase computational and operational load, which, taken to an extreme, can affect availability and could be considered a security consideration. We could also perhaps give guidance about setting the min and max bandwidth sub-TLVs to constrain automatic adjustment to be within a known range and thus bound the extent of second-order effects on the rest of the MPLS-TE domain. Section 8.2 It might be good to say within what registry the sub-registry being created is to be located. Also, the last paragraph could perhaps be rephrased as "the initial contents of the registry are empty, with all bits unassigned". Section 9.2 I think that the RFC 7525 and 8253 references fall under the RECOMMENDED guidance in Section 7, in which case per https://www.ietf.org/about/groups/iesg/statements/normative-informative-references/ they should be classified as Normative references. (It's probably also good to cite RFC 7525 as BCP 195 and not just RFC 7525.) The IEEE.754 reference is also normative since we use its wire encoding (but it could practically be considered "well-known" at this point, given its ubiquity). _______________________________________________ Pce mailing list Pce@ietf.org https://www.ietf.org/mailman/listinfo/pce