On Wed, Nov 5, 2014 at 8:30 AM, John Leslie <[email protected]> wrote:
> Mirja Kühlewind <[email protected]> wrote:
>>
>> Hi Dave, hi John,
>>
>> why would you drop packets before marking?
>
> Dave will have to answer that. Perhaps he means that without AQM
> the first action comes when the buffer fills. If so, I agree it's
> entirely reasonable to drop ECN-enabled packets at that point.
I don't necessarily want to fork this conversation on the scream draft. In particular, I also wanted to know what exactly was meant by the pacing language. But as I am not attending IETF this time...

After shipping a stable release of CeroWrt back in June, I have been working with Toke's and my enormous data set to tear apart why certain problems exist. For a long time now I have been very concerned about IW10 and slow start in particular, at the lower rates typically seen on home uplinks.

Plug: Toke is giving a preso at ICCRG (I believe) on his results; the title of his talk is "How to reliably measure the performance of modern AQMs and what comes of doing so?" The netperf-wrapper data is wonderful, as are the captures. He did not look into ECN behavior as part of that, but that is kind of next... and I still keep hoping to have a viable real-world test of webrtc and rmcat-like behaviors to fiddle with one day soon, from somebody.

> Dave seemed to imply that once tail-drop became necessary, he'd
> add ECN-marking at a low rate in addition to tail-drop. I agree that
> is reasonable as well.

Well, I primarily work with codel, which is head drop, and fq_codel, which usually dedicates a codel queue to a given flow; a given flow is usually exclusively ECN-capable or not. ECN was not part of the original codel research. I do happen to like ECN, and hope that someone is fiddling with it for primary frames in rmcat, and also for attempting more aggressive slow-start-like behavior.

The behaviors of an ECN-marking queue vs a dropping queue with that first Linux codel implementation were *different*, but we got good results under "nice" conditions for either kind of flow, if run exclusively. After adding ECN to the Linux codel and fq_codel versions, it seemed safe to declare that fq + AQM with ECN was OK to deploy globally (which is why it is on by default in fq_codel), but a single-queue AQM with mixed ECN-capable and -incapable traffic was not.
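For anyone unfamiliar with why the fq part makes that safe: fq_codel hashes each flow's 5-tuple to its own codel sub-queue, so ECN-capable and non-ECN flows essentially never share a queue. A toy sketch of the idea (the hash and names here are stand-ins of mine, not the kernel's Jenkins-hash implementation):

```python
import hashlib

NUM_QUEUES = 1024  # fq_codel's default number of flow queues

def queue_index(five_tuple):
    """Map a flow's 5-tuple to a sub-queue index.  (Stand-in hash;
    the kernel actually uses a Jenkins hash, not SHA-256.)"""
    key = repr(five_tuple).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % NUM_QUEUES

# An ECN-capable flow and a non-ECN flow land in their own queues,
# so each codel instance sees traffic that is (almost always)
# exclusively ECN-capable or exclusively not.
ecn_flow     = ("10.0.0.1", 5000, "10.0.0.2", 80, "tcp")
non_ecn_flow = ("10.0.0.3", 5001, "10.0.0.2", 80, "tcp")
print(queue_index(ecn_flow), queue_index(non_ecn_flow))
```

A single-queue AQM, by contrast, has no such isolation: ECN-capable and incapable packets share one queue and one marking/dropping decision.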
(Which is why it is off by default in pie and codel.) Later on we added probabilistic ECN to the pie algorithm, where it reverts to dropping on overload via what is basically a brick-wall filter. That works, but it is far from satisfying in terms of the overall latency seen by all flows when multiple flows are in slow start, under other overload, or with a misbehaving sender.

What I have been working on of late is deterministic: if codel's current drop schedule determines that without ECN it would drop two or more packets in a row, it drops all but the last packet, then marks and delivers that one. (It is gentler still when entering drop mode, dropping one packet and then always ECN-marking the next, to provide the earliest possible signal.)

This seems to have the desired properties: dropping more at low rates, under high load, or with a misbehaving sender; dropping less and marking more when things are closer to steady state; and marking first when a flow first starts ramping up. Overall drop vs mark ratios at various rates are comparable to the pie stuff, at the limited speeds and RTTs I have tested, but there are still problems. I think in particular we ran smack into a recently fixed bug in Linux's Reno implementation...

Note: I had originally bought into the idea of treating ECN as a multi-bit signal as per DCTCP, but the newly landed DCTCP implementation and tests also use a purely dedicated single queue for it, and no other TCPs do, so doing meaner things to ECN on serious overload seems necessary outside the datacenter.

Anyway, I am always willing to share patches in progress, raw data, etc. If you want some stuff that applies against Linux net-next as of a few days ago, the relevant patch set is in this directory.
http://snapon.lab.bufferbloat.net/~d/new_codel_models/everythingcompared.png

This also includes some new work on a better-than-htb rate limiter, with some (totally failed) attempts at wedging dart-like diffserv ideas into it, a better version of codel, and various other experimental ideas that work or do not work to varying degrees. Note: the above data set is quite polluted, as I iterated over various options to the algorithm; don't take any of the data, or graphs other than the above, at face value! And I have a couple more patches under test...

A lot of people have noted that Linux codel's performance drops off at higher rates and at high levels of traffic. This is because the Linux codel does not match the advanced ns2 model we had, and I think this finally fixes that, now that Linux handles TSO/GSO sizing more sanely. It is, however, hard to observe any difference between fq_codel and nfq_codel, as the fq part takes care of 98% of the problem and high rates are not observed by the codel sub-queues... and far more work is needed before a comprehensive set of improvements could be pushed upstream. ENOFUNDING here. If anyone is interested in collaborating on this work, let me know.

>
> I don't think Dave was implying that this should become standard.

Overload protection is obviously needed for ECN. How to do that is not standardized; this just happens to be one approach.

>> Isn't the idea of ECN to avoid drop by sending a congestion signal
>> more early on?
>
> Well...
>
> I'd like to get away from that "avoid drop" language. It suggests
> that ECN would be a way to defer drops on ECN-capable flows (whether
> or not the actual implementation reduces its rate). That would be bad.
> Instead, I'd like to get folks thinking in terms of ECN as an AQM
> signal sent _before_ any drops are needed.

Yes. In what I am trying to do, ECN marking happens first, then it starts to revert to dropping if that fails to get things under control soon enough.
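As a toy model of that mark-first, revert-to-dropping behavior (this is my own sketch of the heuristic described earlier in this mail, not the actual net-next patch; the function name and shape are mine):

```python
def ecn_actions(scheduled_drops: int):
    """Toy model of the deterministic marking heuristic: given how
    many consecutive packets codel's drop schedule would drop for a
    non-ECN flow, return the per-packet actions taken for an
    ECN-capable flow instead.  (My reading of the description, not
    the real patch.)"""
    if scheduled_drops <= 0:
        return []
    if scheduled_drops == 1:
        # Entering drop mode is gentler: drop one packet, then always
        # ECN-mark the next, to give the earliest possible signal.
        return ["drop", "mark"]
    # Two or more drops scheduled back to back: drop all but the
    # last packet, then ECN-mark and deliver that last one.
    return ["drop"] * (scheduled_drops - 1) + ["mark"]

# The harder codel would have dropped, the more actual drops remain;
# near steady state most signals become marks instead of drops.
print(ecn_actions(1))  # ['drop', 'mark']
print(ecn_actions(4))  # ['drop', 'drop', 'drop', 'mark']
```

This keeps the drop pressure for overload and misbehaving senders while converting most steady-state signals into marks.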
Hysteresis is an ongoing problem here.

> And, alas, the current ECN standard seems to discourage any early
> action by requiring CE to result in the _same_ reduction as packet loss.
> This inevitably would disadvantage ECN-capable flows if the sum of CE
> and drop exceeded the drops of non-ECN flows. That is a paradigm we
> have to break out of.

At one level, I would like it if we kept TCP's definition of ECN and morphed it for newer protocols and applications like rmcat. Mosh treats ECN interestingly, for example; I am using it interestingly in the Babel routing protocol; etc., etc.

>> Regarding the implementation at least for Linux, the network stack goes
>> into either CWR or Recovery state after the first marking or, respectively,
>> loss detection (3 dup ACKs) and stays there for about one RTT where it does
>> not perform any further decreases (if not implemented differently using a
>> recent update to the congestion control interface as DCTCP does).

Yea, that's one stack... but we don't have insight into what other stacks actually do, as what we mostly have data on is ECN capability being available, not it being fully tested as actually working as expected when used. Do any test suites exist for measuring flows as to their expected CE ack window marking behavior, and actual rate reduction, under a variety of conditions, including reordering?

> I am no Linux guru; but that is my understanding as well.
>
> --
> John Leslie <[email protected]>

--
Dave Täht
http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks

_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
