Hi Lars,

Just posted an updated version, and a diff from the previous version is available at:
https://www.ietf.org/rfcdiff?url2=draft-ietf-alto-performance-metrics-20
Thanks!
Richard

On Tue, Nov 30, 2021 at 8:49 PM Y. Richard Yang <y...@cs.yale.edu> wrote:

> Hi Lars,
>
> Thanks for the review! Please see below.
>
> On Mon, Nov 29, 2021 at 8:10 AM Lars Eggert via Datatracker <nore...@ietf.org> wrote:
>
>> Lars Eggert has entered the following ballot position for
>> draft-ietf-alto-performance-metrics-19: Discuss
>>
>> Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/
>> for more information about how to handle DISCUSS and COMMENT positions.
>>
>> The document, along with other ballot positions, can be found here:
>> https://datatracker.ietf.org/doc/draft-ietf-alto-performance-metrics/
>>
>> ----------------------------------------------------------------------
>> DISCUSS:
>> ----------------------------------------------------------------------
>>
>> This document needs to become much more formal about how it defines the
>> metrics it wishes to use with ALTO. This could be done either by
>> identifying and normatively referencing existing metrics the IETF has
>> defined, or by defining them here. When normatively referencing existing
>> IETF metrics, it would need to explain why their use with ALTO makes sense.
>>
>> At the moment, the document informatively points to a somewhat arbitrary
>> collection of prior IETF metrics (most of which are from IPPM, residual
>> bandwidth from IS-IS TE, but then reservable bandwidth from OSPF TE?).
>
> To give some background, the WG derived the list of metrics from RFC 8571
> (BGP - Link State (BGP-LS) Advertisement of IGP Traffic Engineering
> Performance Metric Extensions), focusing on network->application metrics.
> The list added Hop Count (which exists in the original ALTO RFC 7285),
> Round-trip Delay (to avoid two queries, and because many apps use RTT), and
> TCP Throughput, and removed Unidirectional Available Bandwidth and
> Unidirectional Utilized Bandwidth to reduce the number of bandwidth metrics.
>
>> But it only refers to them as "examples",
>
> I searched for the word "example" and do not see where the document says
> that they are examples. It says: "Since different applications may use
> different cost metrics, the ALTO base protocol introduces an ALTO Cost
> Metric Registry (Section 14.2 of [RFC7285]), as a systematic mechanism to
> allow different metrics to be specified. For example, a delay-sensitive
> application may want to use latency-related metrics, and a
> bandwidth-sensitive application may want to use bandwidth-related
> metrics."
>
> Does this paragraph give the impression that the metrics are only
> examples? If so, do you suggest removing the "For example" phrase to avoid
> that impression?
>
> The document does have the sentence: "The "Origin Example" column of Table
> 1 gives an example RFC that has defined each metric." Here the word
> "example" means one existing work.
>
>> without actually defining how exactly they are to be used with ALTO, or -
>> if not those - which actual metrics are supposed to be used.
>
> The document has "... the ALTO base protocol introduces an ALTO Cost
> Metric Registry (Section 14.2 of [RFC7285]), as a systematic mechanism to
> allow different metrics to be specified." and "When an ALTO server
> supports a cost metric defined in this document, it should announce this
> metric in its information resource directory (IRD) as defined in Section
> 9.2 of [RFC7285]." Does this say enough about how exactly they should be
> used? The function of this document is to register the metrics in the
> registry; their use is specified in the base protocol (RFC 7285). If you
> have a specific suggestion, it would be good to have it.
>
>> Defining a mechanism for exposing metric information to clients isn't
>> really useful unless the content of that information is much more clearly
>> specified.
>
> I agree that information should be specified as clearly as possible, but
> at the same time, we need abstraction to reduce complexity. One guiding
> principle in the design is that ALTO information provides reasonable
> guidance, not mathematical precision.
>
>> Section 4.1.3. , paragraph 2, discuss:
>> > Intended Semantics: To give the throughput of a TCP congestion-control
>> > conforming flow from the specified source to the specified
>> > destination; see [RFC3649, Section 5.1 of RFC8312] on how TCP
>> > throughput is estimated. The spatial aggregation level is specified
>> > in the query context (e.g., PID to PID, or endpoint to endpoint).
>>
>> A TCP bandwidth estimate can only meaningfully be derived for bulk TCP
>> transfers
>
> Yes. It is intended for bulk transfer.
>
>> under a set of pretty strict and simplistic assumptions, making this
>> metric meaningless at best and misleading at worst,
>
> I would say that the TCP throughput formula in general has turned out to
> be quite useful.
>
>> given that the source of this information doesn't know what workload,
>> congestion controller and network conditions the user of this information
>> will use or see.
>
> The network (the source) is in a pretty good position to estimate the
> potential TCP throughput. In a high-multiplexing setting (small fish in a
> big pond), the network can obtain the estimated loss rate, RTT, and
> typical packet size to compute the TCP throughput formula. In a
> low-multiplexing setting (big fish in a small pond), the network can know
> the set of flows and estimate the bandwidth share. See the citation of the
> Prophet work in the document and the G2 work in SIGMETRICS'21 and
> SIGCOMM'21. The congestion controller information is part of the metric
> (the link points to standard TCP/Reno). I made some minor edits to clarify.
>
>> Also, RFC3649 is an Experimental RFC (from 2003!) and RFC8312 is an
>> Informational RFC.
>> Since this document normatively refers to them, it needs to cite them,
>> and this will cause DOWNREFs for a PS document. I would argue that at
>> least RFC3649 is certainly not an appropriate DOWNREF.
>
> Good suggestion! I added the reference to RFC 3649 following the second
> paragraph of Section 5.1 of RFC 8312 (of which you are a co-author). It
> reads: "The average window sizes of Standard TCP and HSTCP are from
> [RFC3649]. The average window size of CUBIC is calculated using Eq. 6 and
> the CUBIC TCP-friendly region for three different values of C." Our plan,
> which Martin already suggested (it is my fault that we have not updated
> yet), is to remove RFC 3649 and use RFC8312bis. Does that make sense?
>
>> Why define this metric at all? The material you point to is the usual
>> model-based throughput calculation based on RTT and loss rates; a client
>> that intended to predict TCP performance could simply query ALTO for this
>> and perform their own computation, which will likely be more accurate,
>> since the client will hopefully know which congestion controller they will
>> use for the given workload, and what the characteristics of that workload
>> are.
>
> The throughput formula applies to a very limited setting, i.e., the small
> fish setting. What we found useful is the low-multiplexing setting, where
> the loss rate is the output, not the input, of the convergence process. It
> has good use cases. Please see the Prophet paper; one of the most recent
> examples is the set of use cases, such as accelerating time-bound
> constrained flows, in Sec. 3 of
> https://www.reservoir.com/wp-content/uploads/2021/08/G2_QTBS_TR_2021.pdf
> The paper uses max-min fairness, but the Internet uses other fairness
> criteria.
>
>> ----------------------------------------------------------------------
>> COMMENT:
>> ----------------------------------------------------------------------
>>
>> Section 1. , paragraph 6, comment:
>> > The purpose of this document is to ensure proper usage of the
>> > performance metrics defined in Table 1; it does not claim novelty of
>> > the metrics. The "Origin Example" column of Table 1 gives an example
>> > RFC that has defined each metric.
>>
>> I don't understand what the purpose of the "origin example" column is.
>> Most of these point to IPPM metrics, which have a pretty clear and
>> narrowly-defined area of applicability. Since ALTO isn't performing
>> IPPM-style network testing, it's not clear why IPPM metrics are
>> referenced here?
>
> The metrics that this document uses have been defined in multiple earlier
> IETF documents. The intention of the sentence is to give credit to that
> earlier work.
>
>> Section 2.2. , paragraph 23, comment:
>> > If a cost metric string does not have the optional statical operator
>> > string, the statistical operator SHOULD be interpreted as the default
>> > statical operator in the definition of the base metric. If the
>>
>> What is a "statical" operator; I am not familiar with the term and it
>> doesn't seem to appear in other RFCs? (Also occurs elsewhere in this
>> document.)
>
> Apologies for the typo: "statical operator" -> "statistical operator".
> These are fixed in an internal version, which we had not uploaded yet.
>
>> Section 3.1.4. , paragraph 4, comment:
>> > link statistics. Another example of a source to estimate the delay
>> > is the IPPM framework [RFC2330]. It is RECOMMENDED that the
>>
>> IPPM defines measurement metrics. How would they be a source for
>> estimates?
>
> The intention was to refer to the measurement methodology in Section 6.2
> of RFC 2330, but I can see the potential confusion now. How about we
> change the wording to "Another example of a source to estimate the delay
> is through active measurements, for example, considering the IETF IPPM
> framework [RFC2330]."?
>
>> Section 3.3. , paragraph 1, comment:
>> > 3.3. Cost Metric: Delay Variation (delay-variation)
>>
>> Is this supposed to apply to the one-way or bidirectional delay?
>
> This is the current specification:
> "3.3.3. Intended Semantics and Use
>
> Intended Semantics: To specify spatial and temporal aggregated delay
> variation (also called delay jitter)) with respect to the minimum
> delay observed on the stream over the one-way delay from the
> specified source and destination. The spatial aggregation level is
> specified in the query context (e.g., PID to PID, or endpoint to
> endpoint)."
>
> So it is one-way.
>
>> Also, delay variation is not independent from path utilization (c.f.
>> bufferbloat), so why is it being reported independently?
>
> I am not sure I understand the suggestion. We see jitter reported
> independently (e.g., https://cpr.att.com/pdf/se/0001-0003.pdf), in the
> sense of a single metric rather than as conditional values/probabilities.
>
>> Section 3.5. , paragraph 1, comment:
>> > 3.5. Cost Metric: Loss Rate (lossrate)
>>
>> What is this metric supposed to capture? Loss is generally not
>> independent from network utilization (apart from random corruption loss).
>> So it should be zero for unloaded networks, and depends on utilization
>> otherwise. Also, is this unidirectional or bidirectional loss (wording
>> below is unclear)?
>
> It is meaningful in high-multiplexing settings. There can also be a
> load-independent loss rate when there are wireless links (I can see that
> you may consider interference to be load as well).
>
> It is intended to be unidirectional: "3.5.3. Intended Semantics and Use
>
> Intended Semantics: To specify spatial and temporal aggregated packet
> loss rate from the specified source and the specified destination.
> The spatial aggregation level is specified in the query context
> (e.g., PID to PID, or endpoint to endpoint)."
>
> How about the following change:
> "To specify spatial and temporal aggregated packet loss rate from the
> specified source and the specified destination."
> =>
> "To specify spatial and temporal aggregated packet loss rate, in one way,
> from the specified source and the specified destination."
>
>> Using lowercase "not" together with an uppercase RFC2119 keyword is not
>> acceptable usage. Found: "MUST not"
>
> Got it. We have fixed the case:
> "The total length of the cost metric string MUST not exceed 32"
> =>
> "The total length of the cost metric string MUST NOT exceed 32"
>
>> The document has 6 authors, which exceeds the recommended author limit. I
>> assume the sponsoring AD has agreed that this is appropriate?
>>
>> No reference entries found for: [RFC3649] and [RFC8312].
>
> Thanks for pointing this out. The entries went missing after an update,
> and Martin pointed this out as well. It is fixed in the next version,
> which we will upload soon.
>
>> Found terminology that should be reviewed for inclusivity; see
>> https://www.rfc-editor.org/part2/#inclusive_language for background and
>> more guidance:
>>
>> * Term "man"; alternatives might be "individual", "people", "person".
>
> Hah. You mean changing
> "man-in-the-middle (MITM) attacks"
> =>
> "person-in-the-middle attacks"?
>
> I looked and indeed see PITM
> (https://en.wikipedia.org/wiki/Man-in-the-middle_attack).
>
> Interesting, and fixed. Thanks!
>
> The edits suggested below are great and have been applied. Thanks again!
>
> Richard
>
>> -------------------------------------------------------------------------------
>> All comments below are about very minor potential issues that you may
>> choose to address in some way - or ignore - as you see fit. Some were
>> flagged by automated tools (via https://github.com/larseggert/ietf-reviewtool),
>> so there will likely be some false positives. There is no need to let me
>> know what you did with these suggestions.
>>
>> "Abstract", paragraph 2, nit:
>> - types of cost metric. Since the ALTO base protocol (RFC 7285)
>> + types of cost metrics. Since the ALTO base protocol (RFC 7285)
>> +                     +
>>
>> Section 1. , paragraph 2, nit:
>> > ] on registering ALTO cost metrics. Hence it specifies the identifier, the in
>> >                                     ^^^^^
>> A comma may be missing after the conjunctive/linking adverb "Hence".
>>
>> Section 2.2. , paragraph 2, nit:
>> > of the observations. median: the mid point (i.e., p50) of the observations.
>> >                                  ^^^^^^^^^
>> This word is normally spelled with a hyphen.
>>
>> "IPPM ", paragraph 2, nit:
>> > Also, delay variation is not independent from path utilization (c.f. buffer
>> >                              ^^^^^^^^^^^^^^^^
>> The usual collocation for "independent" is "of", not "from". Did you mean
>> "independent of"?
>>
>> Section 3.3.3. , paragraph 7, nit:
>> > apture? Loss is generally not independent from network utilization (apart fr
>> >                               ^^^^^^^^^^^^^^^^
>> The usual collocation for "independent" is "of", not "from". Did you mean
>> "independent of"?
>>
>> Section 3.4.3. , paragraph 6, nit:
>> > imation" method. See Section 3.1.4 on on related discussions such as summing
>> >                                    ^^^^^
>> Possible typo: you repeated a word.
>>
>> Section 3.5.4. , paragraph 3, nit:
>> > [RFC8312]), it helps to specify as much details as possible on the the cong
>> >                                    ^^^^
>> Use "many" with countable plural nouns like "details".
>>
>> Section 3.5.4. , paragraph 3, nit:
>> > ify as much details as possible on the the congestion control algorithm used
>> >                                    ^^^^^^^
>> Two determiners in a row. Choose either "the" or "the".
>>
>> These URLs in the document can probably be converted to HTTPS:
>>  * http://www.iana.org/assignments/alto-protocol/alto-protocol.xhtml#cost-metrics
>>
>> _______________________________________________
>> alto mailing list
>> alto@ietf.org
>> https://www.ietf.org/mailman/listinfo/alto
>
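For reference, the model-based TCP throughput estimate debated in the Section 4.1.3 exchange above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method the draft specifies: it assumes a single long-lived bulk Standard TCP (Reno) flow in the high-multiplexing ("small fish") setting, uses the response function w = 1.2/sqrt(p) segments per RTT from RFC 3649 (the Standard TCP value cited in Section 5.1 of RFC 8312), and ignores timeouts, receiver-window limits, and the low-multiplexing setting where loss is an output rather than an input. The function names and the 1460-byte default MSS are hypothetical.

```python
import math


def reno_window_segments(loss_rate: float) -> float:
    """Average Standard TCP congestion window in segments: w = 1.2 / sqrt(p)."""
    if not 0.0 < loss_rate <= 1.0:
        raise ValueError("loss_rate must be in (0, 1]")
    return 1.2 / math.sqrt(loss_rate)


def estimate_tcp_throughput_bps(loss_rate: float, rtt_s: float, mss_bytes: int = 1460) -> float:
    """Hypothetical helper: model-based throughput of one bulk Reno flow, in bit/s.

    Throughput ~= w * MSS / RTT, with w taken from reno_window_segments().
    loss_rate and rtt_s are assumed to be network-side estimates for the path.
    """
    if rtt_s <= 0.0:
        raise ValueError("rtt_s must be positive")
    if mss_bytes <= 0:
        raise ValueError("mss_bytes must be positive")
    window = reno_window_segments(loss_rate)
    return window * mss_bytes * 8 / rtt_s


if __name__ == "__main__":
    # Example inputs: 0.01% loss, 50 ms RTT, 1460-byte segments.
    bps = estimate_tcp_throughput_bps(loss_rate=1e-4, rtt_s=0.05)
    print(f"estimated throughput: {bps / 1e6:.1f} Mbit/s")
```

With these example inputs the window is about 120 segments per 50 ms RTT, and the script prints an estimate of about 28.0 Mbit/s. The sketch only makes explicit which inputs (loss rate, RTT, segment size, congestion controller) such an estimate depends on.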
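Similarly, the one-way delay variation semantics quoted from Section 3.3.3 above (variation with respect to the minimum delay observed on the stream) can be illustrated with a short sketch. It assumes per-packet one-way delay samples for a single source/destination stream are already available, computes each sample's variation as D(i) - D(min) (the same form as the PDV metric in RFC 5481), and then applies two example statistical operators, median (p50) and max. The function names and sample values are hypothetical, and the draft's aggregation may differ.

```python
from statistics import median
from typing import Dict, List


def delay_variation_ms(one_way_delays_ms: List[float]) -> List[float]:
    """Per-sample delay variation relative to the minimum one-way delay on the stream."""
    if not one_way_delays_ms:
        raise ValueError("need at least one one-way delay sample")
    d_min = min(one_way_delays_ms)
    return [d - d_min for d in one_way_delays_ms]


def summarize(variations_ms: List[float]) -> Dict[str, float]:
    """Apply two example statistical operators (median = p50, and max)."""
    return {"median": median(variations_ms), "max": max(variations_ms)}


if __name__ == "__main__":
    # Hypothetical one-way delay samples (ms) for one source/destination pair.
    samples = [20.0, 22.5, 21.0, 30.0, 20.5]
    variations = delay_variation_ms(samples)
    print(variations)
    print(summarize(variations))
```

For the sample list, the variations are [0.0, 2.5, 1.0, 10.0, 0.5] ms, with a median of 1.0 ms and a maximum of 10.0 ms.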