On Tue, May 12, 2015 at 6:31 PM, Bob Briscoe <[email protected]> wrote:
> Greg,
>
> At 23:44 12/05/2015, Greg White wrote:
>>
>> Bob,
>>
>> I haven't had a chance to fully read your review (though what I've read
>> looks to be insightful), but there was one comment in the email thread
>> that compelled me to respond.
>>
>> You wrote:  There really is no point in employing control
>> theory experts to work out the stability limits
>> for how fast a control algorithm can respond, if
>> implementers then add in extra arbitrary limits
>> on the responsiveness like this.
>>
>> I would personally truncate/modify that statement to something more like:
>> There really is no point in employing control theory experts to work out
>> the stability limits for how fast an AQM control algorithm can respond.
>> :)
>
>
> I disagree in this respect: the control theory analysis gave the designer of
> the algorithm (Rong) a feel for what autotuning would be needed, to keep it
> far enough away from instability (given the idealised assumptions, i.e. only
> TCP Reno etc.) That's the whole point of a gain and phase /margin/. Not
> gospel, but a better insight into how much 'margin of error' there is under
> idealised conditions that are something close to those expected to be
> common.

A minimal amount of tachyons and infinitely precise mathematics is needed
for any implementable algorithm.

> My comment was in response to discovering an arbitrary limit had been added
> to the Linux implementation: "Limit the change in p per T_UPDATE to 2%". The
> whole point of the rest of the PIE algorithm is to automatically limit how
> rapidly p changes, by steering a mid-course far enough away from two cliffs
> known to be out there somewhere (probably not where the theory says they
> are, but at least it gives a feel).
>
> So to write in a hard-coded limit that completely overrides all the
> autotuning is IMO just plain ignorant (I'll eat my words if someone like
> Rong wrote that code!). It will make PIE unnecessarily sluggish when
> conditions are changing fast and the rest of the code has judged that it
> will be quite safe to react fast.
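
To make the complaint concrete: the clamp amounts to something like the
following simplified sketch. This is my own illustration - the names,
fixed-point scaling, and gain handling are placeholders, not the actual
sch_pie.c code:

```c
/* Minimal sketch (not the kernel's code) of a PI probability update
 * with a hard per-update clamp, as in the Linux sch_pie.c comment
 * "to ensure we increase probability in steps of no more than 2%".
 * Names, fixed-point scaling and gains are illustrative assumptions.
 */
#include <stdint.h>

#define MAX_PROB 0xFFFFFFFFULL          /* p == 1.0 in fixed point */

/* One T_UPDATE step: alpha/beta are the autotuned PI gains. */
static uint64_t pie_update_prob(uint64_t prob,
                                int64_t alpha, int64_t qdelay_err,
                                int64_t beta,  int64_t qdelay_trend)
{
        int64_t delta = alpha * qdelay_err + beta * qdelay_trend;
        int64_t cap = (int64_t)(MAX_PROB * 2 / 100);

        if (delta > cap)        /* the contested hard-coded 2% limit:  */
                delta = cap;    /* whatever alpha/beta decided is lost */

        if (delta < 0 && -delta > (int64_t)prob)
                return 0;
        if (delta > 0 && (uint64_t)delta > MAX_PROB - prob)
                return MAX_PROB;
        return prob + delta;
}
```

Whenever the cap binds, everything the autotuned alpha/beta computed is
thrown away, which is exactly Bob's complaint.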

I assumed that the code I was given and helped polish had been thoroughly
reviewed by the authors, and had hoped it would have been reviewed by others
concerned with it long before now.

Certainly time and benchmarking can improve all things.

http://www.bufferbloat.net/projects/codel/wiki/Cake

It is important to recognize (and remember) that we are in an iterative
process. I recently took apart what dslreports was doing (I'll put up a
graph) to make sure it was reporting real stuff, and along the way
measured how CMTSes behave today compared to what we could do now - it
is *way better*...

but not ideal.

> And, I should add here that 'safe' only means 'unlikely to harm
> utilisation'. Even if PIE becomes unstable, the oscillations only hurt
> utilisation. It's not like we're dealing with nitroglycerin or something.

Well, it is certainly feasible someone could still break the net. Did anyone
imagine how NTP would be used for the recent DDoS attacks?

> No. Actually, I will push back harder. This community needs skilled
> mathematicians, skilled designers, skilled evangelists and skilled
> implementers.

+10!

EEs making hardware in FPGAs, coders coding, theorists theorizing,
experimenters experimenting, all in the same building would be nice.

It could also use a "red team" and a "blue team" - skeptics attacking,
defenders defending...

and a much shorter cycle time between idea/attempt/testing than
what classic academic publication allows.

and way better test tools.

As much as these personality types and skill sets (including mine) tend to
fight I am glad we are all locked in this virtual room together...

(well... some of the time. :) )

>And we all need humility, to listen carefully to the others.

+100

And funding! However, my government, having lost the war on drugs,
seems to be reluctant or unable to start one on internet congestion.

The EU seems to be doing better in this regard. I'm tired of having nothing
but canvas over my head, and my enlistment was up 2 years ago.

> And yes, agreed, we also need to be wary that someone who sounds like an
> expert could be pushing inapplicable theory.

This field is new to me. I brought my background in real-time control
systems (principally audio) to the table, as well as mods to one
algorithm I had been using for a decade....

http://www.bufferbloat.net/projects/cerowrt/wiki/Wondershaper_Must_Die

...

and everything since then has been kind of a wonderful learning experience.

>
>
>> Real networks aren't LTI systems, and I actually think it is disingenuous
>> to present classical control theoretical analyses of AQMs that are
>> intended to be deployed in real networks.  PIE borrows from classical
>> control theory, and that is fine, but I think that is about as far as one
>> should take it.

Agree

Also I kvetch that few commonly understood statistical methods apply here.
This is why I am excited about the Tracy-Widom stuff - the important part
is not that it differs from a standard distribution, but that it implies
other methods could be derived.

>
> I'm not sure what you're saying. How can it be fine, but not be taken?
>
> I found reading the stability analysis in the HPSR paper on PIE useful for
> understanding how to improve PIE autotuning over a wide range. I think
> your criticism assumes the analysis is trying to claim something about the
> solution. Whereas I saw it as providing insight for those who designed it,
> and for those wanting to understand (and improve) the design.
>
> As long as you know how much weight to give such analysis and what its
> limits are, it is much much better than the Dilbert rule of thumb: "87% of
> made up numbers are better than those derived from analysis". I assume
> whoever wrote that 2% limit in the Linux code must live by Dilbertisms like
> this.
>
>
>> To me PIE has always seemed like a "steampunk" AQM in that there are a lot
>> of knobs, gears and wires that make it look arcane.  All of those
>> knobs/gears/wires were added for a reason, and they do make for a system
>> that works well* in comparative testing.  But, is all of the algorithmic
>> complexity needed?  Maybe not.

In recent months, multiple episodes of "girl genius" have oddly been apropos
of exactly what was going on on these mailing lists.

My favorite:

http://www.girlgeniusonline.com/comic.php?date=20150406#.VVNghnU4l4s

>
> And when you read my review (pls do) you will get a feel for my assessment
> of PIE: it looks to me like the implementation (by which I mean the
> pseudocode on which other implementations are based) has failed to realise
> the intent of the design. In places this is because there are still holes in
> the design, and the implementation has made a mess of filling them. But it's
> often because the implementation has introduced a lot of bad practices (e.g.
> arbitrary constants). Most importantly, too little deep thought and care has
> gone into the pseudocode.
>
>> Is there a more streamlined version that
>> would work equally well (or better)?  Probably.
>
>
> We are nearly finished testing a very simple AQM that doesn't use any of
> those steampunk PI controls. But only having understood the control stuff
> could we build up the confidence and intuition to understand where it's
> probably not necessary.
>
> It's aimed at cost-reduction more than perfection for very large numbers of
> low stat-mux queues within a larger scheduling hierarchy. Not really your
> scenario (cable modems), but it's possible it could be retargeted at
> residential gateways and be as good as PIE, or maybe only so slightly worse
> it wouldn't matter.
>
>> One of the most
>> compelling arguments in favor of PIE for the cable industry was that
>> nearly all of the control-law functions could be done in CPU as opposed to
>> being burned into gates (in a cost-constrained high-performance mass-market
>> chip). So, it is relatively easy to make changes if further studies point
>> out improvements that can be made.  I'm happy that it is getting another
>> set of smart eyes looking at it and suggesting improvements.  I would be
>> especially pleased if we could show equivalent or better performance
>> across a broader range of scenarios with a simplified algorithm.
>>
>> *works well when integrated with the DOCSIS MAC layer, in simulation.
>> We'll have real equipment at some point to do validation testing with.
>>
>> Note, some of the comments in your review are arising from a few
>> errors/omissions in the pseudocode that were pointed out by others and
>> that Rong will have fixed in the next draft (e.g. status_update() runs
>> every 16ms regardless of packet arrivals, the T_UPDATE check was probably
>> not the best way to explain this. In DOCSIS the updates are suppressed
>> when in sleep mode, and resume with initialized state upon exit from
>> sleep).  But in my view many of the others look like good suggestions that
>> would warrant further study.
>
>
> OK. I suspected that might be the case. Happy to turn that criticism into an
> editorial nit.
>
>
>> Your comment on queue sampling comes close to one of my lingering mental
>> issues with PIE.

My own kvetch was that the sampling interval was a fixed constant, (as you
pointed out) too high for lower rates, and untunable in the code and API.

>> CoDel's insight was that it isn't the current queue
>> latency (or moving average queue latency) that matters as much as the
>> minimum queue latency over a recent window. PIE doesn't take advantage of
>> this insight.
>
>
> Although you're right to point out that the min over a window is a useful
> insight, the way it's done in CoDel leaves me cold. All the decisions in
> CoDel are either-or and switching into and out of modes. I would have been
> happier if the closer the min was to being over the target, and the more
> often the mins came close to the target in recent time (e.g. by an EWMA),
> then the more likely it would drop. Given no magic number will be correct,
> it's better to use a continuous approach, rather than when you're one side
> of a threshold you're fine and the other you're dead.

Tend to agree.
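
To make that concrete for myself, the continuous approach might look
something like this sketch. All of it is under my own assumptions - the
constants, the names, and the linear shaping of the probability are
placeholders, not a proposal:

```c
/* A sketch (mine, not a design) of the continuous alternative Bob
 * describes: rather than CoDel's either/or dropping state, keep an
 * EWMA of how close the recent windowed-min sojourn time comes to
 * target, and let drop probability grow smoothly with that average.
 * All constants and the shape of the curve are illustrative guesses.
 */
#include <stdint.h>

#define TARGET_US  5000                 /* CoDel's usual 5 ms target */
#define EWMA_SHIFT 4                    /* EWMA gain of 1/16 */

struct cont_aqm {
        int32_t closeness_avg;          /* EWMA of min/target, <<16 */
};

/* Called once per interval with the windowed minimum sojourn time;
 * returns a drop probability as a fixed-point fraction of (1 << 16). */
static uint32_t cont_aqm_update(struct cont_aqm *a, uint32_t min_sojourn_us)
{
        /* How close is the min to target?  0 when idle, 1<<16 at
         * target; saturate above rather than flipping a mode bit. */
        int64_t ratio = ((int64_t)min_sojourn_us << 16) / TARGET_US;
        if (ratio > (2LL << 16))
                ratio = 2LL << 16;

        /* The more often recent mins come close to target, the
         * higher the average climbs - no threshold cliff. */
        a->closeness_avg += (int32_t)((ratio - a->closeness_avg) /
                                      (1 << EWMA_SHIFT));

        /* Below half of target, never drop; above it, probability
         * rises continuously with the average closeness. */
        if (a->closeness_avg < (1 << 15))
                return 0;
        return (uint32_t)(a->closeness_avg - (1 << 15)) / 3;
}
```

Nothing in there switches modes; an idle link just lets the average
decay back down.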

>
> Bob
>
>> (CoDel, however, misses that it probably does matter how
>> big that minimum queue latency is and how fast it is growing/shrinking).
>>
>> -Greg
>>
>>
>> On 5/9/15, 11:13 AM, "Bob Briscoe" <[email protected]> wrote:
>>
>> >Dave,
>> >
>> >At 00:51 09/05/2015, Dave Taht wrote:
>> >>Dear Bob:
>> >>
>> >>  I now understand the linux codebase for pie a lot better, as well as
>> >>some of the experimental data I have. It  looks like I could make
>> >>several of the changes you describe and put them in my next series of
>> >>tests, and based on your parameters I should be able to exercise some
>> >>edge cases across those changes. Wow, thx!
>> >
>> >I'd like to allow the authors a chance to think
>> >and respond before anyone does any changes to
>> >implementations - they may have their reasons that I didn't understand.
>> >
>> >
>> >>I have not actually read the latest pie draft, but would like to make
>> >>a few comments on your comments quickly:
>> >>
>> >>re: 3.1: in linux, params->ecn is presently a boolean, and could
>> >>easily be modified to mean anything and compare anything you want.
>> >>What would be a good default?
>> >>
>> >>The ECN support in the linux code on enqueue looks like:
>> >>
>> >>         if (!drop_early(sch, skb->len)) {
>> >>                 enqueue = true;
>> >>         } else if (q->params.ecn && (q->vars.prob <= MAX_PROB / 10) &&
>> >>                    INET_ECN_set_ce(skb)) {
>> >>                 /* If packet is ecn capable, mark it if drop probability
>> >>                  * is lower than 10%, else drop it.
>> >>                  */
>> >>                 ....
>> >
>> >Next mail I'll send an email I sent to Fred offlist on this subject.
>> >
>> >
>> >>Re: 5.0: will look over that code
>> >
>> >My review didn't have a section 5.0
>> >:?
>> >
>> >
>> >>re: 5.1,  linux code:
>> >>
>> >>         /* Non-linear drop in probability: Reduce drop probability
>> >>          * quickly if delay is 0 for 2 consecutive Tupdate periods.
>> >>          */
>> >>         if ((qdelay == 0) && (qdelay_old == 0) && update_prob)
>> >>                 q->vars.prob = (q->vars.prob * 98) / 100;
>> >
>> >Ooh joy. The great thing about open source is
>> >that it allows everyone to assert their human
>> >right to add their own made-up numbers.
>> ><http://dilbert.com/strip/2008-05-08>
>> >
>> >Looking at the Linux code, I just found another magic number has been
>> >added:
>> ><https://github.com/torvalds/linux/blob/master/net/sched/sch_pie.c#L368>
>> >/* to ensure we increase probability in steps of no more than 2% */
>> >:-!
>> >
>> >There really is no point in employing control
>> >theory experts to work out the stability limits
>> >for how fast a control algorithm can respond, if
>> >implementers then add in extra arbitrary limits
>> >on the responsiveness like this.
>> >
>> >
>> >>re: 5.2: strongly agree that the lookup table doesn't scale properly.
>> >>The linux code appears to differ from the draft also, here, with a
>> >>smaller lookup table, and some other smoothing functions. I am going
>> >>to stop pasting now and just point at:
>> >>https://github.com/torvalds/linux/blob/master/net/sched/sch_pie.c#L334
>> >>
>> >>Will await more feedback from y'all on that.
>> >
>> >The factors I used in my review were added to the
>> >PIE draft when the pseudocode was added in Oct'14.
>> >The numbers in the Linux code you point to above
>> >match the original HPSR'13 paper on PIE.
>> >We need to wait to hear the answer from the
>> >authors about why the numbers were changed.
>> >
>> >
>> >>Codel also has a similar undershoot problem for which I proposed we
>> >>try a fixed point fractional count variable in a recent post to the
>> >>cake mailing list.
>> >
>> >Back in 2013 I pointed out that the CoDel
>> >square-root control law ends up increasing drop
>> >probability linearly with time. See my points 3 & 4 here:
>> ><http://www.ietf.org/mail-archive/web/aqm/current/msg00376.html>
>> >
>> >
>> >>re: 5.3.1: In the environments i work in it is extremely hard to get
>> >>timers to reliably fire in under 2ms intervals, particularly on vm'd
>> >>systems. Also as you fire the timer more rapidly the current
>> >>calculations in pie now done out of band of the packet processing have
>> >>a couple divides in them which tend to be processor intensive... both
>> >>things said, i figure this and other implementations could fire faster
>> >>than the default 16ms...
>> >
>> >My point was that an RFC should say what the
>> >algorithm is trying to achieve, not solely what a
>> >particular implementation for a particular
>> >environment might have to do as a compromise.
>> >
>> >
>> >>re: 5.3.2: I like what you are saying but I gotta go work it out for
>> >>myself. which will take a while. patches wanted.
>> >
>> >I'm not sure whether the inaccuracy is
>> >significant. I tried to find cases where it might
>> >be, but gave up wasting time using trial and
>> >error - I need to think about it analytically.
>> >
>> >
>> >>re: 5.4: linux and all my tests have always been against:
>> >>            /* default of 100 ms in pschedtime */
>> >>            vars->burst_time = PSCHED_NS2TICKS(100 * NSEC_PER_MSEC);
>> >
>> >I think it could be zero, given the alpha/beta
>> >formula already significantly filters out bursts.
>> >But again, wait for the authors to answer my
>> >question about whether they chose this value based on experiments.
>> >
>> >
>> >>5.5: explains a lot. Probably. Will think on it.
>> >>
>> >>5.6: :chortle: heats_this_room() indeed! Derandomization always looked
>> >>to me like an overly complex solution to a non-problem.
>> >>
>> >>5.7: don't think this problem exists in the linux code but will step
>> >>through.
>> >
>> >It does:
>> ><https://github.com/torvalds/linux/blob/master/net/sched/sch_pie.c#L383>
>> >
>> >
>> >>But: in one of my recent (450 rrul_50_up tcp flows) tests neither
>> >>codel or pie got to a sane drop rate in under 60 seconds, and pie
>> >>stayed stuck at 10000 packets outstanding, i did not try more - on
>> >>gigE local links. I think a lot of pie tests were run by others with a
>> >>very low outside packet limit (200 packets) and thus the tail drop
>> >>kicked in before pie itself could react.
>> >
>> >That's the opposite to my concern in 5.7 - I'm
>> >sure tail drop will often kick in before PIE can
>> >react (see my points 5.3.1 , 5.4 & 5.5).
>> >
>> >I'm talking about contriving an attack that
>> >drives PIE to 100% drop /before/ it reaches tail
>> >drop (because 100% drop is pathological, whereas
>> >tail drop is only mildly irritating by comparison).
>> >
>> >
>> >>5.8:  I think this is a quibblish change not relevant for any
>> >>reasonable length of queue measured in packets.
>> >>
>> >>but I do note that we switched from packet limits to byte limits in
>> >>cake but that was for other reasons - primarily due to the extreme
>> >>(1000x1) dynamic range of a modern "packet".
>> >
>> >I think you've misunderstood (i.e. I've explained it badly).
>> >
>> >I merely meant that, if PIE has to fall back to
>> >tail drop, at least it should always stop
>> >enqueuing packets when there's <1MTU free at the
>> >tail, which ensures there will be no bias against large packets.
>> >
>> >
>> >>6: I do wish the draft and the code I have still lined up, and the
>> >>constants clearly defined.
>> >>
>> >>7: exp(p) is not cheap, and there aint no floating point in the kernel
>> >>either.
>> >
>> >Sorry, again I've explained badly. I had defined
>> >p as a double, and I meant "extract the integer
>> >held in the exponent of the floating point representation".
>> >
>> >I should have described this for integer
>> >arithmetic, as "lg(p) in integer arithmetic".
>> >On Intel Architecture it's the bsr instruction in
>> >assembler, which returns the position of the
>> >highest set bit in an unsigned integer.
>> >
>> >I've clarified this point and 5.8 above in my
>> >review. I've updated it to issue 01A and uploaded it at the same URL:
>> ><http://www.bobbriscoe.net/projects/latency/piervw_tr.pdf>
>> >
>> >
>> >>8. Haven't read the draft, can't comment on the nits.
>> >>
>> >>One quick note on:
>> >>
>> >>"4.1 Random Dropping s/Like any state-of-the-art AQM scheme, PIE would
>> >>drop packets randomly/ /PIE drops packets randomly/ Rationale: The
>> >>other scheme with a claim to be state-of-the-art doesn't (CoDel). I
>> >>would agree if the draft had said “A state of the art scheme should
>> >>introduce randomness into packet dropping in order to desynchronize
>> >>flows,” but maybe it was decided not to introduce such underhand
>> >>criticism of CoDel. Whatever, the draft needs to be careful about
>> >>evangelising random drop, given it attempts to derandomize later."
>> >>
>> >>I don't buy the gospel that randomness is needed to avoid tcp global
>> >>synchronization. I would prefer that whatever evangelicalization of
>> >>randomness exists in any draft here or elsewhere be dropped in favor
>> >>always of discussing the real problem of tcp global sync instead...
>> >>
>> >>... which as near as I can tell both codel and fq_codel avoid just
>> >>fine without inducing a random number generator. I welcome evidence to
>> >>the contrary.
>> >
>> >Dropping single packets at a time (like CoDel and
>> >PDPC) should prevent global synch.
>> >Global synch is caused if a spike in one flow can
>> >induce a sequence of drops that hit other flows.
>> >
>> >Theoretically, there could be a problem with
>> >dropping in a deterministic pattern as CoDel does
>> >(a flow could game it by avoiding sending around
>> >the times it knows drops will occur). But it's a
>> >lot easier to just send faster by ignoring drop signals anyway.
>> >
>> >CoDel's determinism might become problematic if
>> >unresponsiveness to congestion became a serious
>> >problem, so that it became necessary to police congestion responsiveness.
>> >
>> >IOW, when you're putting algorithms in a network
>> >that could be there for decades, don't not use
>> >randomness lightly. Randomness has nice security properties.
>> >
>> >
>> >Bob
>> >
>> >
>> >________________________________________________________________
>> >Bob Briscoe,                                                  BT
>> >
>> >_______________________________________________
>> >aqm mailing list
>> >[email protected]
>> >https://www.ietf.org/mailman/listinfo/aqm
>
>
> ________________________________________________________________
> Bob Briscoe,                                                  BT



-- 
Dave Täht
Open Networking needs **Open Source Hardware**

https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67

_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
