I agree with most of the suggested features that must be tested in an
AQM evaluation.

But I have some doubts as to whether the proposed experiments/metrics
are really applicable and able to reveal the required features.

My comments in detail:

Section 2.1: Flow completion time:

This metric is applicable only to equally sized finite flows. It is not
really meaningful for variable-sized flows, e.g. the Tmix trace, and it
is not applicable to infinite flows.

Section 2.2: Packet loss: 

- Long-term loss probability is meaningful only in a steady-state
scenario, and it characterizes the TCP flavor rather than the AQM: the
loss probability comes out roughly the same, whatever the AQM does, as
long as roughly the same throughput is reached.

- Interval between consecutive losses: if the losses are well spaced,
this more or less mirrors the loss probability (the mean inter-loss
gap, in packets, is roughly 1/p). If they are not well spaced (bursty),
what should be recorded? See the sketch after this list.

- Packet loss patterns: the metric is undefined, except for the special
case of "packet loss synchronization" in the next section, 2.3. It is
indeed highly interesting qualitatively in non-stationary cases, e.g.
an abrupt capacity drop. But how is it to be quantified?
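
To make the burstiness concern concrete, here is a minimal sketch
(Python, operating on a made-up per-packet loss trace, nothing from the
draft) of what one could record: loss probability and mean inter-loss
gap say roughly the same thing, and only the full gap distribution
reveals the bursts.

    # Sketch: summarizing a per-packet loss trace (1 = lost, 0 = delivered).
    # The trace below is a made-up example with a bursty tail.
    losses = [0] * 49 + [1] + [0] * 49 + [1] + [0] * 8 + [1, 1, 1] + [0] * 37

    loss_prob = sum(losses) / len(losses)

    # Gaps (in packets) between consecutive loss events.
    loss_positions = [i for i, lost in enumerate(losses) if lost]
    gaps = [b - a for a, b in zip(loss_positions, loss_positions[1:])]

    print("loss probability   :", loss_prob)
    print("mean inter-loss gap:", sum(gaps) / len(gaps),
          "(~ 1/p only if the losses are well spaced)")
    print("gap distribution   :", sorted(gaps))  # bursts show up as gaps of 1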

Section 2.4: Goodput: 

- Meaningful only with steady-state occupancy by a number of more or
less greedy TCP flows. In that case it shows to what extent the AQM is
able to keep the link close to 100% utilization.

- With a trace of variable-sized flows (Tmix) the goodput simply
mirrors the offered traffic, as long as the total offer stays below the
link capacity (no overload).

- The overload scenario does not reach a steady state. Goodput under
overload depends heavily on things other than the AQM, e.g. the test
duration or the shuffling of the trace.

Section 2.6: Trade-off latency vs. goodput:

The section refers to two (x,y) plots of the form:
      X=delay(parms)
      Y=goodput(parms)
and
      X=delay(parms)
      Y=drop_ratio(parms)
where <parms> are tuples of parameter values, each describing 
one experimental set-up.
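
For illustration, a minimal sketch (Python/matplotlib, all numbers made
up) of how I read this requirement: each <parms> tuple, i.e. each
experimental set-up, contributes exactly one point to each of the two
scatter plots.

    # Sketch of the two plots required by section 2.6, with made-up results.
    # Each <parms> tuple (one experimental set-up) yields one (x, y) point.
    import matplotlib.pyplot as plt

    results = [
        # (set-up label, mean delay [ms], goodput [Mbit/s], drop ratio)
        ("setup A", 12.0, 9.1, 0.010),
        ("setup B", 25.0, 9.6, 0.006),
        ("setup C", 48.0, 9.8, 0.003),
    ]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    for label, delay, goodput, drops in results:
        ax1.plot(delay, goodput, "o")
        ax1.annotate(label, (delay, goodput))
        ax2.plot(delay, drops, "o")
        ax2.annotate(label, (delay, drops))
    ax1.set_xlabel("delay [ms]"); ax1.set_ylabel("goodput [Mbit/s]")
    ax2.set_xlabel("delay [ms]"); ax2.set_ylabel("drop ratio")
    fig.tight_layout()
    plt.show()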

It remains unclear what <parms> might be in the context of the given
document. The cited document [TCPEVAL2013] suggests that one dimension
of <parms> might be the scaling of the applied Tmix trace. The other
parameters in the cited document are not applicable here. More on this
below.

Section 4.1: TCP-friendly sender:

This section requires the plots according to section 2.6, but at the
same time it specifies one single long-lived, non-application-limited
flow - this yields one single dot in each of the plots.

Section 4.2: Aggressive Transport Sender: same problem

Section 4.3: Unresponsive Transport: same problem; moreover:

The scenario is only applicable to scheduling, not to AQM. The
described traffic simply overloads the link without any response to the
AQM (that is, to my understanding, the meaning of unresponsive
traffic). A "long-lived non application limited UDP flow" is
effectively infinite, unlike its TCP counterpart.

I would suggest a different test here: with a mixture of responsive and
unresponsive traffic, test to what extent the AQM scheme is still able
to keep the responsive fraction under control. This requires that the
unresponsive traffic stays well below the capacity limit. The rationale
behind this test is that an AQM scheme might under- or over-react if it
drops packets but does not see the expected reduction in load.
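
A rough sketch of the intended pass/fail check (Python; the capacity,
the 30% UDP share and the tolerance values are purely illustrative
assumptions, not a definitive test specification):

    # Sketch of the suggested responsive/unresponsive mix test.
    # All parameter values are illustrative assumptions.
    CAPACITY_MBPS   = 10.0
    UDP_SHARE       = 0.3     # unresponsive CBR load, well below the capacity
    QUEUE_TARGET_MS = 20.0    # whatever delay the AQM under test aims for

    def responsive_fraction_under_control(tcp_goodput_mbps, queue_delay_ms):
        """Pass if the AQM neither starves the responsive traffic (over-reaction)
        nor lets the standing queue grow (under-reaction)."""
        not_starved = tcp_goodput_mbps >= 0.85 * (1 - UDP_SHARE) * CAPACITY_MBPS
        not_bloated = queue_delay_ms <= 2.0 * QUEUE_TARGET_MS
        return not_starved and not_bloated

    # Example: plug in the measurements of one experiment run.
    print(responsive_fraction_under_control(tcp_goodput_mbps=6.8,
                                            queue_delay_ms=28.0))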

Section 4.4: Initial Congestion Window:

This makes sense only with a mix of short-lived flows; for long-lived
flows the IW does not matter. Alternatively, a single experiment with a
pre-existing long-lived flow and a newly appearing IW3/IW10 flow could
be executed. But there is no reference to any traffic mix; the table
specifies just 2 flows in parallel.

What are the <parms> for the graphs according to section 2.6?

Section 4.5: Traffic Mix

The section defines its own traffic mixes in a table, but requires the 
graphs according to 2.6, which somehow implies the Tmix traffic.

Section 6: Burst absorption

Same as above: the test comes with its own traffic mix, but requests
graphs according to 2.6, thus implying Tmix.

The proposed bursty scenarios do not seem specific enough when compared
with 4.5.

For reproducibility, I would propose something like UDP on/off
background traffic here.
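
For example, something along these lines (Python; destination address,
packet size, rate and on/off periods are placeholders for whatever
values the draft would fix):

    # Sketch of a reproducible UDP on/off background traffic source.
    import socket, time

    DEST        = ("192.0.2.1", 9000)   # placeholder documentation address
    PKT_SIZE    = 1000                  # payload bytes per packet
    ON_RATE_BPS = 5_000_000             # sending rate during the on-period
    T_ON, T_OFF = 1.0, 1.0              # on/off period lengths in seconds

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\x00" * PKT_SIZE
    interval = PKT_SIZE * 8 / ON_RATE_BPS   # packet spacing in the on-period

    while True:
        t_end = time.monotonic() + T_ON
        while time.monotonic() < t_end:     # on-period: constant bit rate
            sock.sendto(payload, DEST)
            time.sleep(interval)
        time.sleep(T_OFF)                   # off-period: silence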

Section 7: Stability

This section mixes two different things: 
(a) The impact of a general drop rate caused by other cross traffic, 
which is unrelated to the bottleneck link. 
(b) The reaction to varying link capacity at the bottleneck.

The general drop rate experiment (a) is weakly specified: if the drop
rate is too high, the bottleneck capacity cannot be reached and the AQM
does not matter. Conversely, if the drop rate is too low, the AQM
algorithm dominates the drop process and the background drops do not
really matter. Only the transition between the two regimes could be of
interest. But how does one get there, and is it of relevance in
practice?
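
The transition point could at least be estimated from the usual TCP
throughput approximation, rate ~ (MSS/RTT)*sqrt(3/(2p)); a rough sketch
with made-up numbers:

    # Rough estimate of the background drop rate at which the two regimes meet,
    # using the Mathis et al. TCP throughput approximation. Numbers are made up.
    from math import sqrt

    CAPACITY_BPS = 10e6
    N_FLOWS      = 5
    MSS_BITS     = 1500 * 8
    RTT_S        = 0.1

    def tcp_rate(p):
        # Mathis et al.: rate ~ (MSS/RTT) * sqrt(3/(2p))
        return (MSS_BITS / RTT_S) * sqrt(1.5 / p)

    # Background drop rate at which N flows together just fill the capacity:
    # above it the background loss alone throttles the flows and the AQM is
    # idle; well below it the AQM dominates the drop process.
    p_transition = 1.5 * (N_FLOWS * MSS_BITS / (RTT_S * CAPACITY_BPS)) ** 2
    print("transition drop rate ~", p_transition)
    print("check: N * rate(p)   =", N_FLOWS * tcp_rate(p_transition), "bit/s")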

The varying capacity experiment (b) is really relevant. I wonder
whether there could be resonance effects in the AQM parameter
adaptation algorithms, and how one would test for them.
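
One conceivable probe, sketched below with entirely assumed numbers:
modulate the bottleneck capacity periodically and sweep the modulation
period across the plausible time constants of the AQM adaptation,
looking for periods at which the queue delay oscillation is amplified
rather than damped.

    # Sketch: square-wave bottleneck capacity schedule with a swept period,
    # intended to probe for resonance in the AQM parameter adaptation.
    # Base capacity, modulation depth and period range are assumptions.
    BASE_MBPS = 10.0
    DEPTH     = 0.5                        # +/- 50 % capacity swing
    PERIODS_S = [0.1, 0.3, 1, 3, 10, 30]   # swept across plausible time constants

    def capacity_schedule(period_s, duration_s, step_s=0.05):
        """Return (time, capacity) samples of a square-wave capacity modulation."""
        samples, t = [], 0.0
        while t < duration_s:
            high = (t % period_s) < period_s / 2
            samples.append((round(t, 2),
                            BASE_MBPS * (1 + DEPTH if high else 1 - DEPTH)))
            t += step_s
        return samples

    for period in PERIODS_S:
        sched = capacity_schedule(period, duration_s=10 * period)
        # feed 'sched' to the testbed/simulator and record the queue delay
        print(f"period {period:>4} s: {len(sched)} capacity steps")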


Wolfram Lautenschläger
Alcatel-Lucent
Bell Labs

