I agree with most of the suggested features that must be tested for AQM evaluation.
But I have some doubts whether the proposed experiments/metrics are really applicable and able to reveal the required features. My comments in detail:

Section 2.1: Flow completion time:
- Applicable only to equally sized finite flows.
- Not really meaningful for variable-sized flows, e.g. the Tmix trace.
- Not applicable to infinite flows.

Section 2.2: Packet loss:
- Long-term loss probability is meaningful only in a steady-state scenario, and it characterizes the TCP flavor, not the AQM. (The loss probability remains the same, whatever you do with the AQM, as long as you reach roughly the same throughput.)
- Interval between consecutive losses: if the losses are well spaced, this somehow resembles the loss probability. If they are not well spaced (bursty), what is to be recorded?
- Packet loss patterns: the metric is undefined, except for the special case "packet loss synchronization" in the next section, 2.3. It is indeed highly interesting qualitatively in non-stationary cases, e.g. an abrupt capacity drop. But how to quantify it?

Section 2.4: Goodput:
- Meaningful only with steady-state occupancy by a number of more or less greedy TCP flows. Here it shows to what extent the AQM is able to keep the link close to 100% utilization.
- With a trace of variable-sized flows (Tmix), the goodput resembles the offered traffic (as long as the total stays below the link capacity, i.e. no overload).
- The overload scenario does not reach a steady state. Goodput in overload cases depends heavily on things other than the AQM, e.g. the test duration or the shuffling of the trace.

Section 2.6: Trade-off latency vs. goodput:
The section refers to two (x,y) plots of the form

  X=delay(parms)  Y=goodput(parms)
  X=delay(parms)  Y=drop_ratio(parms)

where <parms> are tuples of parameter values, each describing one experimental set-up. It remains unclear what <parms> might be in the context of the given document. The cited document [TCPEVAL2013] suggests that one dimension of <parms> might be the scaling of the applied Tmix trace.
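To make the discussion of these plots concrete, here is a minimal sketch of how one (x,y) point per <parms> tuple could be computed from per-packet measurements. This is my own illustration; the function name, the packet-log format and all numbers are invented, not taken from the draft:

```python
# Hypothetical sketch: how one (x, y) point of the Section 2.6 plots could be
# derived per experimental set-up <parms>. The packet-log format and all
# numbers are invented for illustration, not taken from the draft.

def summarize_run(packets, duration_s):
    """packets: list of (queue_delay_s, size_bytes, delivered) tuples."""
    delivered = [(d, s) for d, s, ok in packets if ok]
    mean_delay = sum(d for d, _ in delivered) / len(delivered)
    goodput_bps = 8 * sum(s for _, s in delivered) / duration_s
    drop_ratio = 1 - len(delivered) / len(packets)
    return mean_delay, goodput_bps, drop_ratio

# One synthetic run: three delivered 1500-byte packets, one drop, over 1 s.
packets = [(0.010, 1500, True), (0.020, 1500, True),
           (0.030, 1500, True), (0.025, 1500, False)]
delay, goodput, drops = summarize_run(packets, duration_s=1.0)
# X=delay, Y=goodput gives plot 1; X=delay, Y=drop_ratio gives plot 2.
# Repeating this over the <parms> tuples (e.g. Tmix scalings) yields the scatter.
```

Which dimensions of <parms> to vary remains the open question raised above.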
The other parameters in the cited document are not applicable here. More on this below.

Section 4.1: TCP-friendly sender:
Requires the plots according to Section 2.6. But at the same time it specifies one single long-lived, non-application-limited flow - that is one single dot in each of the plots.

Section 4.2: Aggressive Transport Sender:
Same problem.

Section 4.3: Unresponsive Transport:
Same problem. Moreover, the scenario is only applicable to scheduling, not to AQM. The described traffic simply overloads the link with no response to the AQM (which is, to my understanding, the meaning of unresponsive traffic). A "long-lived non-application-limited UDP flow" is in a sense infinite, unlike its TCP counterpart. I would suggest a different test here: in a mixture of responsive and unresponsive traffic, test to what extent the AQM scheme is still able to keep the responsive fraction under control. This requires that the unresponsive traffic stays well below the capacity limit. The rationale behind this test is that an AQM scheme might under- or over-react if it drops packets but does not see the expected reduction.

Section 4.4: Initial Congestion Window:
Makes sense only with a mix of short-lived flows; for long-lived flows the IW does not matter. Alternatively, a single experiment with a pre-existing long-lived flow and a newly appearing IW3/IW10 flow could be executed. But there is no reference to any traffic mix; the table specifies just 2 flows in parallel. What are the <parms> for the graphs according to Section 2.6?

Section 4.5: Traffic Mix:
The section defines its own traffic mixes in a table, but requires the graphs according to Section 2.6, which somehow implies the Tmix traffic.

Section 6: Burst absorption:
Same as above; the test comes with its own traffic mix but requests graphs according to Section 2.6, thus implying Tmix. The proposed bursty scenarios do not seem specific enough when compared with Section 4.5.
I would propose here, for reproducibility, something like UDP on/off background traffic.

Section 7: Stability:
This section mixes two different things:
(a) the impact of a general drop rate caused by other cross traffic, unrelated to the bottleneck link;
(b) the reaction to varying link capacity at the bottleneck.

The general drop rate experiment (a) is weakly specified: if the drop rate is too high, the bottleneck capacity cannot be reached and the AQM does not matter. The other way round, if the drop rate is too low, the AQM algorithm dominates the drop process and the background drops do not really matter. Only the transition between the two regimes could be of interest. But how to get there, and is this of relevance in practice?

The varying capacity experiment (b) is really relevant. I am asking myself whether there could be resonance effects in the AQM parameter adaptation algorithms, and how to test for them.

Wolfram Lautenschläger
Alcatel-Lucent Bell Labs

_______________________________________________
aqm mailing list
aqm@ietf.org
https://www.ietf.org/mailman/listinfo/aqm