I do not have time to discuss this. Will resume near the Prague IETF.

On Fri, Mar 20, 2015 at 3:37 AM, Michael Welzl <[email protected]> wrote:
> Hi,
>
> Thanks again for your comments!
> While you do raise some interesting points and ask some interesting questions 
> (e.g. if someone has used the linux DCTCP code...), I'll keep my responses in 
> line below focused on actual text suggestions for the draft.
>
>
>> On 20 Mar 2015, at 02:25, Dave Taht <[email protected]> wrote:
>>
>> On Thu, Mar 19, 2015 at 12:54 AM,  <[email protected]> wrote:
>>> Thanks Dave for reading this ID and providing your comments. It's really
>>
>> As I am the person that fought to get a pitfalls portion into this
>> document, and then spaced on adding any text, I apologize for the
>> delay in feedback. I am extremely busy with make-wifi-fast and have
>> otherwise dropped out of the ietf besides this group.
>>
>>> good to explore what may be missing.
>>
>> For starters, to what extent do others here have operational
>> experience with deploying ECN? I saw that gorry, in particular, was
>> doing some interesting work in testing satellite systems, to which I
>> provided a profusion of comments privately as to how I would use squid
>> with ecn and fq_codel to better handle web traffic. ?
>>
>> In my case, tcp + fq_codel (Well, cake, these days) with ecn is
>> enabled in both my labs to the fullest extent possible, and used day
>> in and day out, when not testing something else. It is also on the 10
>> machines I have spread around the world on linode, and isc... and as
>> best as I recall a few in my google compute cluster. It is used to
>> protect babel routing packets from being dropped by the queue
>> management system, I have a multiplicity of benchmarks comparing life
>> with and without ecn in netperf-wrapper, and so on.
>>
>> tcp with ecn enabled and fq_codel is also now used throughout
>> archive.org's systems, but operational difficulties (e.g. configuring
>> RED right) have precluded using it on the switches presently in use.
>> It was my hope, this year, to establish a full blown 10+GigE router on
>> at least some of their traffic this past year, but ENOFUNDING.
>>
>> I would love to know, in particular, if anyone has been trying the
>> latest and now readily available in linux DCTCP in a real deployment
>> anywhere, and was willing to talk about it? I see, for example, that
>> per route setting of ECN is also now in the kernel, and I surmise
>> there must be a good reason for that.
>>
>> I have several hacky test tools that use ECN in various ways, which
>> could use some more users and love.
>>
>>>> Dave Taht <[email protected]> wrote:
>>>>>
>>>>> section 6 addition. (could use more verbiage)
>>>>>
>>>>> 6.3 "An AQM that is ECN aware MUST have overload protection.
>>>>
>>>>   I fear I cannot discern what you mean this to say. :^(
>>
>> Overload protection has been discussed here before. Basically you need
>> an operational point at which you drop, rather than mark packets. The
>> consensus here is that operational point should be mark before you
>> would normally drop, but pie,codel,fq_codel, cake and red *do not do*
>> that presently, and there are severe constraints/hw/sw costs to having
>> two different setpoints.
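
To make the "overload protection" idea above concrete, here is a minimal sketch in Python. It assumes the AQM has already decided that this packet should carry a congestion signal; the names `Packet`, `signal_congestion`, and `overload_len` are hypothetical and are not taken from codel, pie, or any other implementation mentioned in the thread. Whether dropping ECN-capable packets this way is good practice is exactly what the thread disputes.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ecn_capable: bool        # sender set ECT(0) or ECT(1)
    ce_marked: bool = False  # set when the AQM marks instead of dropping


def signal_congestion(pkt, queue_len, overload_len):
    """Deliver a congestion signal for pkt.

    Returns True if the packet stays in the queue (CE-marked),
    False if it is dropped. overload_len is a hypothetical second
    setpoint above which even ECN-capable packets are dropped.
    """
    if pkt.ecn_capable and queue_len < overload_len:
        pkt.ce_marked = True
        return True
    # Either the packet is not ECN-capable, or the queue is past the
    # overload setpoint: fall back to dropping.
    return False
```

Without the `overload_len` check, an ECN-capable packet would always be marked and kept until the hard packet limit forces a tail drop, which is the behaviour attributed to codel below.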
>
> This statement seems to conflate two separate issues:
> The phrase "mark before you would normally drop" talks about where the 
> marking point should be (assuming that "normally" means: if the packet was 
> not ECN-enabled).
> What you say about overload protection is something else: it's a point at 
> which an AQM mechanism would make a decision to drop *ECN-enabled* packets.
>
> I have not seen any sign of consensus for the latter being good practice, and 
> I, for one, am strongly against it, for the following two reasons:
> 1) it is potentially harmful: later in your email you point at the importance 
> of ECN for non-TCP traffic - indeed the reaction to ECN might not always be 
> exactly the same as it would be to a packet drop (in particular with "mark 
> before you would normally drop"). However, any such behaviour becomes moot 
> when, in the same round-trip time, drops are enforced on some of the 
> ECN-enabled packets: then the sender has no way but to react to the drop the 
> "normal" way, meaning that any potential benefit from a different reaction to 
> ECN is eliminated.
> 2) I can't see how it would help against attacks: any queue has an upper 
> limit, and I can try to kill all other traffic by sending at a crazy high 
> rate with or without ECN. Most AQM mechanisms operate probabilistically 
> (well, not CoDel), based on an average (delay or queue length), and I can't 
> see how sometimes dropping instead of ECN-marking packets would help against 
> such sources.
>
>
>> The present version of codel in linux has no overload protection. It
>> will merrily keep marking packets until the packet limit is exceeded,
>> then drop, rather than drop at any threshold. Thus ecn is disabled by
>> default in that version.
>
> Why? It will drop anyway when the total queue length is exceeded.
>
>
>> There have long been several patches being
>> tested in cerowrt (and available for all to try) that attempt various
>> methods to do this more sanely, which I have also reported here. The
>> two we have settled on will hopefully be comprehensively evaluated
>> this summer.
>>
>> There was (last I looked) no way to do ecn in ns2, and support for ns3
>> has not quite landed yet as best I recall.
>>
>> We viewed fq_codel with/ecn as safe to deploy, due to the flow
>> isolation, and that is still mostly true. For the hardware
>> implementation however, we dropped the search all queues portion of
>> the algorithm (see last paragraph of section 5.1 of the fq_codel
>> draft) and are still in search of saner ways to find the largest
>> queue(s) to search in parallel.
>>
>> We added a mildly smarter version of overflow protection to the linux
>> version of pie, but it misbehaves when random numbers are excessively
>> random, dropping when it should probably still be marking.
>>
>> None of this is directly applicable to the language of the document,
>> except by better explaining multiple things to naive users.
>>
>> 1) enabling ECN by itself accomplishes nothing, unless there is an AQM
>> on the bottleneck link(s) also
>
> Isn't this blindingly obvious, even by the definition of ECN in RFC 3168?
>
>
>> I note that stuart cheshire did not fully grasp this duality until I
>> worked closely with him on:
>> http://www.bufferbloat.net/projects/cerowrt/wiki/Enable_ECN
>>
>> He's a smart cookie. Others aren't. More context around ECN is needed.
>>
>> 2) That application developers blithely enabling ecn is potentially
>> dangerous to the health of the network.
>
> I have neither seen evidence nor consensus of this being correct.
>
>
>> It would seem intuitive to a gamer, perhaps, to mark all their packets
>> with ECN, so that by god, all their packets got through. (It's not
>> only intuitive; other forms of sparse traffic can also benefit
>> from being ecn marked. I also favored the ECN enablement of the main
>> frame in the webrtc nada proposal, for example. I have marked dns and
>> icmpv6 traffic with ECN and watched that do fascinating things to the
>> network, also.)
>>
>> Everyone here is seemingly stuck on ecn + tcp, where I have long felt
>> that safer places to innovate were in quic and webrtc.
>>
>> ooh! another 6.x section addition:
>>
>> 6.x an example where ecn marking can be bad is where the inner header
>> is copied to the outer, verbatim, and not copied back.
>>
>> this error in code exists in the field today, it is presently in the
>> tinc 1.1 vpn system.
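
The encapsulation pitfall above can be illustrated with a small Python sketch. It is a simplified model, not tinc's code: the function names are hypothetical, and the decapsulation rule shown covers only the CE case (a full tunnel implementation, e.g. following RFC 6040, also handles ECT(1) and logging cases that are omitted here).

```python
# Two-bit ECN codepoints from RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encapsulate(inner_ecn):
    """Tunnel ingress: copy the inner ECN field to the outer header."""
    return inner_ecn


def decapsulate_buggy(inner_ecn, outer_ecn):
    """Tunnel egress that never copies back: a CE mark set on the
    outer header inside the tunnel is silently lost."""
    return inner_ecn


def decapsulate(inner_ecn, outer_ecn):
    """Tunnel egress that propagates congestion marks (simplified).

    Returns the ECN field for the forwarded inner packet, or None if
    the packet should be dropped because the inner transport cannot
    understand a CE mark.
    """
    if outer_ecn == CE:
        return None if inner_ecn == NOT_ECT else CE
    return inner_ecn
```

With the buggy egress, a sender inside the tunnel keeps sending ECT packets, an AQM on the tunnel path marks rather than drops them, and the sender never hears about the congestion.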
>
> Agreed, I think we should address that.
>
>
>>>>> It is trivial for a malbehaved application/worm/bot to mark all
>>>>> its packets with ECN and thus gain priority over other traffic
>>>>> not ecn marked.
>>>>
>>>>   This somewhat-paranoid claim rests on several assumptions that I
>>>> hope we will recommend against.
>>
>> Not paranoid at all. Trivially feasible, and a real potential attack
>> vector. If you would like to be scared about how a flood of ecn marked
>> packets could do worse damage, you might want to look at the scope of
>> attacks that cloudflare has to deal with regularly.
>
> Please share details. I have trouble understanding the danger of ECN.
>
>
>>>> - the most obvious is an assumption that a tail-drop node will mark
>>>>  _instead_ of dropping ECN-capable packets. This is not actually
>>>>  possible, and I hope we will strongly deprecate it. Tail-drop should
>>>>  drop packets regardless of ECN bits.
>>
>> I agree that a tail drop queue will not do ECN. However in an aqm
>> system without overload protection, you basically end up with a tail
>> drop queue, one that also ends up dropping all the non-ecn marked
>> packets.
>>
>>>>
>>>> - there is also an assumption that an ECN-capable transport can mark
>>>>  its packets as ECN-capable and then never reduce its sending rate.
>>>>  I suppose it could; but not-ECN-capable transports can also never
>>>>  reduce the sending rate. :^( And the not-ECN-capable transports
>>>>  could accomplish the same reduction in "lost" packets by FEC.
>>
>> This is false equivalence. If ecn can be gamed, it will be gamed.
>
> As above, yes it can be gamed - what I have not yet seen is evidence of this 
> being a serious problem (any more serious than transports sending many 
> non-ECN-capable packets without adapting their rate).
>
>
>> A lot of my support of ecn is basically that packet loss is so trivial
>> above 100mbit that it really doesn't matter much if it is used or not. So
>> it helps a little in the general case, but with well behaved apps
>> getting marked .01% of the time, on or off, the whole debate is a
>> tempest in a teacup.
>>
>> It does seem very useful on longer RTTs.
>>
>>>>
>>>>   I believe we are going to "suggest" a lower marking threshold for
>>
>> Despite 3 years of trying, we have been unable to come up with an
>> algorithm for that which works well with different setpoints with mixed
>> traffic.
>>
>>>> ECN-capable packets than the dropping threshold for not-ECN-capable
>>>> packets at AQM-capable nodes. This should reduce the paranoia level,
>>>> I hope, since the ECN-capable flows will get congestion signals when
>>>> not-ECN-capable packets are _not_ being dropped.
>>
>> Look forward to seeing a working version from someone.
>>
>>>>   We should concentrate our efforts on providing useful signals:
>>>> that some transports might make poor use of these signals is beyond
>>>> our scope.
>>
>> I thought we were providing useful *guidance* to developers of network
>> applications.
>>
>>>>
>>> I understand that router overload needs to be considered in the design of
>>> an AQM algorithm, but I am inclined to think there is not much to say to
>>> application designers, and that this need may have been said in the
>>> AQM Recommendations document. Agreeing with John, I don't see this as the
>>> place to start putting detail on how routers implement AQM.
>>
>> That's why it was a short sentence to begin with. However, some
>> discussion of the benefits and pitfalls of using ECN in new
>> applications I do feel is needed.
>>
>>>>> 6.4 Enabling ECN at the application layer requires access to the IP
>>>>>    header fields, which are usually abstracted out completely at the
>>>>>    tcp layer, and hard to access from udp with multiple non-portable
>>>>>    methods to do so.
>>>>
>>>>   Yes, there are TCP stacks which are ECN-unfriendly; but there are
>>>> enough _today_ which are friendly to ECN.
>>
>> Again, tcp thinking.
>>
>> 1) It is trivial to write a udp app that emits ecn. Same setsockopt
>> as IP_TOS. Mosh and multiple other apps do it already.
>> 2) It is less trivial to write a udp app that handles ecn correctly.
>> Mosh does that also, but so far as I know they got the BSD
>> implementation wrong.
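
As a rough illustration of point 1, the sending side can be sketched in Python as below. This assumes an IPv4 UDP socket on a platform where `IP_TOS` is settable (Linux, for instance); the helper names `tos_with_ecn` and `open_ect0_udp_socket` are hypothetical and this is not how Mosh itself is written.

```python
import socket

ECN_MASK = 0b11
# Two-bit ECN codepoints from RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def tos_with_ecn(tos, ecn):
    """Return the TOS byte with its low two (ECN) bits replaced,
    leaving the upper six DSCP bits untouched."""
    return (tos & 0xFF & ~ECN_MASK) | (ecn & ECN_MASK)


def open_ect0_udp_socket():
    """Create a UDP socket whose outgoing datagrams carry ECT(0)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    current = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS,
                    tos_with_ecn(current, ECT0))
    return sock
```

Point 2 is the harder half and is not shown: the receiver has to read the TOS byte of each incoming datagram (on Linux, for instance, via `IP_RECVTOS` ancillary data from `recvmsg`), detect CE, and report it back so that the sender actually reduces its rate.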
>>
>> the sendmsg and recvmsg apis are in dire need of an update since their
>> specification.
>>
>> IF you wish to refine the scope of this document to be only TCP with
>> ECN, and exclude use cases such as vpn encapsulation and udp
>> applications where it might be useful (like webrtc), ok... but....
>>
>>>>
>>> I also agree with what you say - although, again I'm not sure we need to
>>> add this here, I think the design of transports is really the topic of
>>> RFC5405.bis,
>>>
>>>>>    ECN over UDP in new applications such as webrtc and Quic has
>>>>>    great potential for many other applications, however the same
>>>>>    care of design that went into ECN on TCP needs to go into
>>>>>    future UDP based protocols.
>>>>
>>>>   I wouldn't disagree; but those issues are essentially-solved
>>>> problems today.
>>
>> You are kidding me, right?
>>
>>>>> Some other section that may end up here?
>>>>>
>>>>> ECN marking other sorts of flows (example routing packets) that have a
>>>>> higher priority than other flows on link-local packets may be of benefit
>>>>> with wider availability of aqm technologies that are ecn aware...
>>>>
>>> I'm not sure I understand what you are suggesting with respect to ECN.
>>>
>>>>   I suppose there might be _some_ use for ECN on routing packets; but
>>>> I doubt this is desirable today. ECN is not-at-all about getting a
>>>> higher priority -- it's about getting congestion signals without
>>>> packet loss.
>>
>> On that we agree, and I should probably have used a different example
>> from routing, citing the original webrtc nada draft as my example.
>>
>>>>
>>> I think the IETF would normally recommend diffserv priority marking for
>>> network control traffic.
>>
>> I am all in favor of CS6. Not so much CS7. And as you know, few
>> diffserv priorities survive e2e transit, and ECN markings survive much
>> more often end to end than diffserv.
>>
>>>
>>>> --
>>>> John Leslie <[email protected]>
>>>>
>>>
>>> Gorry
>>>
>>>
>>
>
> Cheers,
> Michael
>
>



-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb

_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
