Dear Jonathan,

Thanks for your extensive answer. If you could comment on a couple more things:

1. In Cake, count saturates at 2^32-1, am I right? (https://github.com/dtaht/sch_cake/blob/master/codel5.h#L352)

Could it make sense to saturate count once interval/sqrt(count) and interval/sqrt(count+1) become indistinguishable at the timer's resolution?

2. I found three more changes from the original Codel:

2.1 Line 292: if (sojourn_time < target || queue is empty) instead of (sojourn_time < target || there are fewer than MTU bytes in the queue).

Was it changed because queue_length_in_bytes < MTU implies sojourn_time < target?

2.2 Line 304:

else if (vars->count > 1 && now - vars->drop_next < 8 * interval) {
  /* we were recently dropping; be more aggressive */
  return now - vars->first_above_time >
     codel_control_law(now, interval, vars->rec_inv_sqrt);
}

2.3 Line 308:

else if (((now - vars->first_above_time) >> 15) *
         ((now - codel_get_enqueue_time(skb)) >> 15) > threshold) {
  return true;
}

This line is explained in this email from the Cake mailing list: https://lists.bufferbloat.net/pipermail/cake/2015-May/000227.html


My question is: are these changes valid for a standalone Codel, or are they made specifically to optimize Cake's scheduler?

Regards,
Polina


On 07/20/2015 09:34 PM, Jonathan Morton wrote:
I can’t speak for Kathy, but as Cake’s author I have some insight into the 
changes I’ve made to the associated version of Codel.

Ronald Bless:
1. after exiting dropping state, count is usually reset, unless "the
next drop state is entered too close to the previous one”.
In fact, in most cases (i.e. where the true RTT is near to or less than the one 
assumed by the interval parameter, and the flow is persistent) the drop state 
will be re-entered soon enough to prevent the reset.  The backoff behaviour 
for that case is thus the more significant one in practice.

The standard backoff behaviour is to reduce count by just 2.  This is obviously 
wrong in the case where count has already reached a significant value; with a 
high drop frequency, count will climb a lot during a single RTT.  But if the 
elevated drop frequency is necessary and sufficient to control the flow, we 
don’t want it to grow indefinitely.

Cake’s version halves the count, reducing the drop frequency by a factor of 
sqrt(2) after a pause.  This multiplicative backoff prevents count from growing 
indefinitely, and allows it to reduce gradually to a lower value if that 
becomes appropriate due to reduced network load.

Anil's suggestion of continuously reducing count during the non-dropping state 
is an interesting alternative which might also make sense.  It has the pleasing 
property of relating the backoff rate to the length of time spent outside the 
dropping state.

2. is 'count' supposed to be reset or saturated on overflow and what
should be its maximum value (it makes a difference whether you are using
16-, 32-, or 64-bit counter)?
It’s clear to me that a saturating increment is necessary to deal with 
unresponsive flows adequately.  Otherwise, the drop frequency will wrap around 
to a low value periodically when it needs to be consistently high.

My test case for this involves 50 upstream TCP flows on a narrow link (say 
10Mbit) and a single downstream TCP flow on a wide link (say 100Mbit).  In this 
configuration, the acks for the downstream flow constitute an unresponsive (in 
fact, anti-responsive) upstream flow of greater throughput demand than the 
individual upstream TCP flows.  Anti-responsive means that dropping packets 
actually increases the network load, because reducing the ack density permits 
higher throughput on the downlink, which in turn generates additional acks on 
the uplink.

Since Cake uses very robust flow isolation, simply delivering all the acks in 
order, using a fair-share of the bandwidth, would severely limit the throughput 
of the downstream flow.  However, the standard Codel behaviour resulted in 
wildly oscillating throughput on the downstream flow.  Straightforward 
calculations suggested that the count variable was probably wrapping.

At first, I only put in the modified backoff behaviour mentioned above.  Since 
the downstream flow was often reaching full throughput, I thought that 
occasionally the drop rate was managing to empty the ack queue, making the 
backoff behaviour significant. This assumption turned out to be correct, but 
the backoff modification was insufficient to completely correct the behaviour.  
Only when I changed count to use a saturating increment instead of a wrapping 
one did the downstream flow achieve full throughput consistently.

It’s possible that, had count been a wider variable (32 bits instead of 16), 
the backoff modification would have been sufficient to prevent wrapping in this 
case.  However, saturating arithmetic is inexpensive on modern CPUs (many have 
dedicated instructions which highly optimised implementations could use) and is 
the most robust solution.

Note that in the absence of robust flow isolation, e.g. with plain Codel or PIE, 
the upstream TCP flows will back off to make room for all the acks.  This does 
not require a particularly high drop rate, but it reduces the upstream goodput 
significantly.  Cake instead achieves high goodput in both directions in this 
especially challenging case.

Anil Agarwal:
I have attached the updated document.
There are indeed a lot of suggestions here, which I haven’t yet fully 
assimilated.  I’ll cherry-pick a few that stood out:

When RTT is small compared to the default Interval, CoDel is slow in reacting 
and controlling queue size/delay during TCP slow start.
For example, if Interval = 100 ms and RTT = 20 ms, CoDel will not drop or mark 
a packet for the first 100 ms of a new TCP connection; if link rate = 10 Mbps, 
initial window size = 14600 bytes, then during this period, the TCP cwnd will 
grow to 104 kbytes; queuing delay will be 63 ms.
Cake’s modified Codel does vary its sensitivity according to how quickly the 
queue is filling.  With the default parameters, it may trigger as quickly as 
35ms after a large burst arrives, or as slowly as 200ms if the queue is growing 
very slowly.

It could be improved further in theory, by giving Codel visibility of the 
length and drain rate of the queue (and thus the ability to predict future 
delay) rather than only observing the exit delay.  However, in the general case 
the drain rate is variable, so this may be difficult to make robust.

Ultimately, triggering instantly when the queue begins to fill is optimal to 
deal with slow-start, which will double its cwnd further while the congestion 
signal is in transit, before responding with a halving.  However, an instant 
trigger is suboptimal for the congestion-avoidance phase of TCP, which is why 
Codel always delays its response.  Flow isolation is a good (if imperfect) 
countermeasure for the suboptimal slow-start behaviour this causes.

My “Explicit Load Regulation” proposal, which hasn’t yet been widely 
circulated, aims to get closer to optimal behaviour in both situations.  It is 
an extension to ECN behaviour, giving the option of more moderate signals than 
ECN permits, while being easier to deploy incrementally than DCTCP is.  
Hopefully I’ll be able to write it up properly later.

The draft RFC states: “For example, there is about an 86% probability that no 
more than two of the 100 VoIP sessions will be involved in any given collision”
Based on analytical equations for hash collision probabilities (I derived them 
independently some time ago), the probability of no collision = 90.78%…
Cake’s set-associative hash function reduces the probability of flow collisions 
further, to negligible values for the above number of flows.

Alternatively, codel_dequeue() can be re-structured as two routines…
Certainly a refactoring would be beneficial for readability.  I haven’t got 
around to doing it yet.

  - Jonathan Morton

_______________________________________________
aqm mailing list
aqm@ietf.org
https://www.ietf.org/mailman/listinfo/aqm
