I do not have time to discuss this. Will resume near the prague ietf.

On Fri, Mar 20, 2015 at 3:37 AM, Michael Welzl <[email protected]> wrote:
> Hi,
>
> Thanks again for your comments!
> While you do raise some interesting points and ask some interesting questions
> (e.g. if someone has used the linux DCTCP code...), I'll keep my responses in
> line below focused on actual text suggestions for the draft.
>
>
>> On 20 Mar 2015, at 02:25, Dave Taht <[email protected]> wrote:
>>
>> On Thu, Mar 19, 2015 at 12:54 AM, <[email protected]> wrote:
>>> Thanks Dave for reading this ID and providing your comments. It's really
>>
>> As I am the person that fought to get a pitfalls portion into this
>> document, and then spaced on adding any text, I apologize for the
>> delay in feedback. I am extremely busy with make-wifi-fast and have
>> otherwise dropped out of the ietf besides this group.
>>
>>> good to explore what may be missing.
>>
>> For starters, to what extent do others here have operational
>> experience with deploying ECN? I saw that gorry, in particular, was
>> doing some interesting work in testing satellite systems, to which I
>> provided a profusion of comments privately as to how I would use squid
>> with ecn and fq_codel to better handle web traffic. ?
>>
>> In my case, tcp + fq_codel (Well, cake, these days) with ecn is
>> enabled in both my labs to the fullest extent possible, and used day
>> in and day out, when not testing something else. It is also on the 10
>> machines I have spread around the world on linode, and isc... and as
>> best as I recall a few in my google compute cluster. It is used to
>> protect babel routing packets from being dropped by the queue
>> management system, I have a multiplicity of benchmarks comparing life
>> with and without ecn in netperf-wrapper, and so on.
>>
>> tcp with ecn enabled and fq_codel is also now used throughout
>> archive.org's systems, but operational difficulties (e.g.
>> configuring RED right) have precluded using it on the switches presently
>> in use.
>> It was my hope, this year, to establish a full blown 10+GigE router on
>> at least some of their traffic this past year, but ENOFUNDING.
>>
>> I would love to know, in particular, if anyone has been trying the
>> latest and now readily available in linux DCTCP in a real deployment
>> anywhere, and was willing to talk about it? I see, for example, that
>> per route setting of ECN is also now in the kernel, and I surmise
>> there must be a good reason for that.
>>
>> I have several hacky test tools that use ECN in various ways, which
>> could use some more users and love.
>>
>>>> Dave Taht <[email protected]> wrote:
>>>>>
>>>>> section 6 addition. (could use more verbiage)
>>>>>
>>>>> 6.3 "An AQM that is ECN aware MUST have overload protection.
>>>>
>>>> I fear I cannot discern what you mean this to say. :^(
>>
>> Overload protection has been discussed here before. Basically you need
>> an operational point at which you drop, rather than mark packets. The
>> consensus here is that operational point should be mark before you
>> would normally drop, but pie, codel, fq_codel, cake and red *do not do*
>> that presently, and there are severe constraints/hw/sw costs to having
>> two different setpoints.
>
> This statement seems to conflate two separate issues:
> The phrase "mark before you would normally drop" talks about where the
> marking point should be (assuming that "normally" means: if the packet was
> not ECN-enabled).
> What you say about overload protection is something else: it's a point at
> which an AQM mechanism would make a decision to drop *ECN-enabled* packets.
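
To make the two decision points concrete, here is a minimal sketch. Every name and threshold in it (`Packet`, `enqueue_action`, `overload_len`, `hard_limit`) is hypothetical and is not taken from codel, pie, fq_codel, cake or RED. It only shows what "overload protection" would mean mechanically; whether it is good practice is exactly what is disputed in this thread.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ect: bool          # sender declared itself ECN-capable (ECT(0) or ECT(1))
    ce: bool = False   # Congestion Experienced mark, set by the queue


def enqueue_action(pkt, queue_len, aqm_wants_signal, overload_len, hard_limit):
    """Return 'enqueue', 'mark' or 'drop' for one arriving packet.

    Hypothetical parameters:
      aqm_wants_signal: the AQM's own decision to signal congestion now
      overload_len:     queue length above which ECT packets are dropped too
      hard_limit:       total queue capacity (plain tail drop beyond it)
    """
    if queue_len >= hard_limit:
        return "drop"              # full queue: tail drop regardless of ECN
    if not aqm_wants_signal:
        return "enqueue"           # no congestion signal needed
    if pkt.ect and queue_len < overload_len:
        pkt.ce = True              # signal by marking; packet is still queued
        return "mark"
    return "drop"                  # not ECN-capable, or overload point reached
```

Setting `overload_len` equal to `hard_limit` gives the behaviour described for the present linux codel: ECN-capable packets keep being marked until the packet limit is hit.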
>
> I have not seen any sign of consensus for the latter being good practice, and
> I, for one, am strongly against it, for the following two reasons:
> 1) it is potentially harmful: later in your email you point at the importance
> of ECN for non-TCP traffic - indeed the reaction to ECN might not always be
> exactly the same as it would be to a packet drop (in particular with "mark
> before you would normally drop"). However, any such behaviour becomes moot
> when, in the same round-trip time, drops are enforced on some of the
> ECN-enabled packets: then the sender has no way but to react to the drop the
> "normal" way, meaning that any potential benefit from a different reaction to
> ECN is eliminated.
> 2) I can't see how it would help against attacks: any queue has an upper
> limit, and I can try to kill all other traffic by sending at a crazy high
> rate with or without ECN. Most AQM mechanisms operate probabilistically
> (well, not CoDel), based on an average (delay or queue length), and I can't
> see how sometimes dropping instead of ECN-marking packets would help against
> such sources.
>
>
>> The present version of codel in linux has no overload protection. It
>> will merrily keep marking packets until the packet limit is exceeded,
>> then drop, rather than drop at any threshold. Thus ecn is disabled by
>> default in that version.
>
> Why? It will drop anyway when the total queue length is exceeded.
>
>
>> There have long been several patches being
>> tested in cerowrt (and available for all to try) that attempt various
>> methods to do this more sanely, which I have also reported here. The
>> two we have settled on will hopefully be comprehensively evaluated
>> this summer.
>>
>> There was (last I looked) no way to do ecn in ns2, and support for ns3
>> has not quite landed yet as best I recall.
>>
>> We viewed fq_codel with/ecn as safe to deploy, due to the flow
>> isolation, and that is still mostly true.
>> For the hardware
>> implementation however, we dropped the search all queues portion of
>> the algorithm (see last paragraph of section 5.1 of the fq_codel
>> draft) and are still in search of saner ways to find the largest
>> queue(s) to search in parallel.
>>
>> We added a mildly smarter version of overflow protection to the linux
>> version of pie, but it misbehaves when random numbers are excessively
>> random, dropping when it should probably still be marking.
>>
>> None of this is directly applicable to the language of the document,
>> except by better explaining multiple things to naive users.
>>
>> 1) enabling ECN by itself accomplishes nothing, unless there is an AQM
>> on the bottleneck link(s) also
>
> Isn't this blindingly obvious, even by the definition of ECN in RFC 3168?
>
>
>> I note that stuart cheshire did not fully grasp this duality until I
>> worked closely with him on:
>> http://www.bufferbloat.net/projects/cerowrt/wiki/Enable_ECN
>>
>> He's a smart cookie. Others aren't. More context around ECN is needed.
>>
>> 2) That application developers blithely enabling ecn is potentially
>> dangerous to the health of the network.
>
> I have neither seen evidence nor consensus of this being correct.
>
>
>> It would seem intuitive to a gamer, perhaps, to mark all their packets
>> with ECN, so that by god, all their packets got through. (it's not
>> only intuitive, but other forms of sparse traffic can also benefit
>> from being ecn marked. I also did favor the ECN enablement of the main
>> frame in the webrtc nada proposal for example. I have marked dns and
>> icmpv6 traffic with ECN and watched that do fascinating things to the
>> network, also.
>>
>> Everyone here is seemingly stuck on ecn + tcp, where I have long felt
>> that safer places to innovate were in quic and webrtc.
>>
>> ooh! another 6.x section addition:
>>
>> 6.x an example where ecn marking can be bad is where the inner header
>> is copied to the outer, verbatim, and not copied back.
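
The encapsulation pitfall described in the proposed 6.x text can be sketched as follows. This is an illustration only: the function names are made up, and `decapsulate` follows the default tunnel egress behaviour of RFC 6040 as I understand it, which this thread does not itself cite. It is not code from any VPN implementation.

```python
# Two-bit ECN field codepoints from RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encapsulate(inner_ecn):
    """Tunnel ingress: copy the inner ECN field verbatim to the outer header."""
    return inner_ecn


def decapsulate_broken(inner_ecn, outer_ecn):
    """The error described above: the outer field is never copied back,
    so a CE mark applied inside the tunnel is silently lost."""
    return inner_ecn


def decapsulate(inner_ecn, outer_ecn):
    """Tunnel egress that propagates congestion marks to the inner header.

    Returns the ECN field to forward, or None if the packet should be
    dropped (outer CE on a packet whose inner header is Not-ECT).
    """
    if outer_ecn == CE:
        return None if inner_ecn == NOT_ECT else CE
    if inner_ecn in (NOT_ECT, CE):
        return inner_ecn
    if outer_ecn == ECT1:
        return ECT1            # inner ECT(0)/ECT(1) with outer ECT(1)
    return inner_ecn           # outer Not-ECT or ECT(0): keep inner value
```

With the broken egress, a sender behind the tunnel keeps declaring ECN capability but never sees a congestion signal from queues along the tunnel path.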
>>
>> this error in code exists in the field today, it is presently in the
>> tinc 1.1 vpn system.
>
> Agreed, I think we should address that.
>
>
>>>>> It is trivial for a malbehaved application/worm/bot to mark all
>>>>> its packets with ECN and thus gain priority over other traffic
>>>>> not ecn marked.
>>>>
>>>> This somewhat-paranoid claim rests on several assumptions that I
>>>> hope we will recommend against.
>>
>> Not paranoid at all. Trivially feasible, and a real potential attack
>> vector. If you would like to be scared about how a flood of ecn marked
>> packets could do worse damage, you might want to look at the scope of
>> attacks that cloudflare has to deal with regularly.
>
> Please share details. I have trouble understanding the danger of ECN.
>
>
>>>> - the most obvious is an assumption that a tail-drop node will mark
>>>> _instead_ of dropping ECN-capable packets. This is not actually
>>>> possible, and I hope we will strongly deprecate it. Tail-drop should
>>>> drop packets regardless of ECN bits.
>>
>> I agree that a tail drop queue will not do ECN. However in an aqm
>> system without overload protection, you basically end up with a tail
>> drop queue, one that also ends up dropping all the non-ecn marked
>> packets.
>>
>>>>
>>>> - there is also an assumption that an ECN-capable transport can mark
>>>> its packets as ECN-capable and then never reduce its sending rate.
>>>> I suppose it could; but not-ECN-capable transports can also never
>>>> reduce the sending rate. :^( And the not-ECN-capable transports
>>>> could accomplish the same reduction in "lost" packets by FEC.
>>
>> This is false equivalence. If ecn can be gamed, it will be gamed.
>
> As above, yes it can be gamed - what I have not yet seen is evidence of this
> being a serious problem (any more serious than transports sending many
> non-ECN-capable packets without adapting their rate).
>
>
>> A lot of my support of ecn is basically that packet loss is so trivial
>> above 100mbit that it really doesn't matter much if it is used or not, so
>> it helps a little in the general case, but with well behaved apps
>> getting marked .01% of the time, on or off and the whole debate is a
>> tempest in a tea-cup.
>>
>> It does seem very useful on longer RTTs.
>>
>>>>
>>>> I believe we are going to "suggest" a lower marking threshold for
>>
>> despite 3 years of trying have been unable to come up with an
>> algorithm for that that works well with different setpoints with mixed
>> traffic.
>>
>>>> ECN-capable packets than the dropping threshold for not-ECN-capable
>>>> packets at AQM-capable nodes. This should reduce the paranoia level,
>>>> I hope, since the ECN-capable flows will get congestion signals when
>>>> not-ECN-capable packets are _not_ being dropped.
>>
>> Look forward to seeing a working version from someone.
>>
>>>> We should concentrate our efforts on providing useful signals:
>>>> that some transports might make poor use of these signals is beyond
>>>> our scope.
>>
>> I thought we were providing useful *guidance* to developers of network
>> applications.
>>
>>>>
>>> I understand that router overload needs to be considered in the design of
>>> an AQM algorithm, but I am inclined to think there is not much to say to
>>> application designers, and that this need may have been said in the
>>> AQM Recommendations document. Agreeing with John, I don't see this as the
>>> place to start putting detail on how routers implement AQM.
>>
>> That's why it was a short sentence to begin with. However, some
>> discussion of the benefits and pitfalls of using ECN in new
>> applications I do feel is needed.
>>
>>>>> 6.4 Enabling ECN at the application layer requires access to the IP
>>>>> header fields, which are usually abstracted out completely at the
>>>>> tcp layer, and hard to access from udp with multiple non-portable
>>>>> methods to do so.
>>>>
>>>> Yes, there are TCP stacks which are ECN-unfriendly; but there are
>>>> enough _today_ which are friendly to ECN.
>>
>> Again, tcp thinking.
>>
>> 1) It is trivial to write a udp app that emits ecn. Same setsockopt
>> as IP_TOS. Mosh and multiple other apps do it already.
>> 2) It is less trivial to write a udp app that handles ecn correctly.
>> Mosh does that also, but so far as I know they got the BSD
>> implementation wrong.
>>
>> the sendmsg and recvmsg apis are in dire need of an update since their
>> specification.
>>
>> IF you wish to refine the scope of this document to be only TCP with
>> ECN, and exclude use cases such as vpn encapsulation and udp
>> applications where it might be useful (like webrtc), ok... but....
>>
>>>>
>>> I also agree with what you say - although, again I'm not sure we need to
>>> add this here, I think the design of transports is really the topic of
>>> RFC5405.bis,
>>>
>>>>> ECN over UDP in new applications such as webrtc and Quic has
>>>>> great potential for many other applications, however the same
>>>>> care of design that went into ECN on TCP needs to go into
>>>>> future UDP based protocols.
>>>>
>>>> I wouldn't disagree; but those issues are essentially-solved
>>>> problems today.
>>
>> You are kidding me, right?
>>
>>>>> Some other section that may end up here?
>>>>>
>>>>> ECN marking other sorts of flows (example routing packets) that have a
>>>>> higher priority than other flows on link-local packets may be of benefit
>>>>> with wider availability of aqm technologies that are ecn aware...
>>>>
>>> I'm not sure I understand what you are suggesting with respect to ECN.
>>>
>>>> I suppose there might be _some_ use for ECN on routing packets; but
>>>> I doubt this is desirable today. ECN is not-at-all about getting a
>>>> higher priority -- it's about getting congestion signals without
>>>> packet loss.
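
On the earlier point in this thread that a UDP application can emit ECN with the same setsockopt as IP_TOS, and that reading it back is the less portable half, a rough sketch follows. The helper names are hypothetical. It assumes IPv4 and a Unix-like stack; the ancillary message type delivered on receive differs between platforms, which is why both `IP_TOS` and `IP_RECVTOS` are accepted below. It is not Mosh's code.

```python
import socket

ECN_MASK = 0x03
NOT_ECT, ECT1, ECT0, CE = 0x00, 0x01, 0x02, 0x03
# IP_RECVTOS is not defined by every platform's socket module.
RECV_TOS = getattr(socket, "IP_RECVTOS", None)


def with_ecn(tos, ecn):
    """Return the TOS byte with its two ECN bits replaced, DSCP untouched."""
    return (tos & ~ECN_MASK & 0xFF) | (ecn & ECN_MASK)


def ecn_of(tos):
    """Extract the two-bit ECN field from a TOS byte."""
    return tos & ECN_MASK


def open_ect0_udp_socket(dscp=0):
    """Create an IPv4 UDP socket whose outgoing packets carry ECT(0)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS,
                    with_ecn(dscp << 2, ECT0))
    if RECV_TOS is not None:
        # Ask the stack to hand the received TOS byte to recvmsg().
        sock.setsockopt(socket.IPPROTO_IP, RECV_TOS, 1)
    return sock


def recv_with_ecn(sock, bufsize=2048):
    """Receive one datagram; return (data, addr, ecn), ecn None if unknown."""
    data, ancdata, _flags, addr = sock.recvmsg(bufsize, socket.CMSG_SPACE(4))
    ecn = None
    for level, ctype, cdata in ancdata:
        if level == socket.IPPROTO_IP and cdata and \
                ctype in (socket.IP_TOS, RECV_TOS):
            ecn = ecn_of(cdata[0])
    return data, addr, ecn
```

Emitting ECT(0) is the trivial part. An application that sets it also has to feed any CE marks seen by `recv_with_ecn` back to the sender and reduce its rate, and that part is protocol design rather than socket plumbing.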
>>
>> On that we agree, and I should probably have used a different example
>> from routing, citing the original webrtc nada draft as my example.
>>
>>>>
>>> I think the IETF would normally recommend diffserv priority marking for
>>> network control traffic.
>>
>> I am all in favor of CS6. Not so much CS7. And as you know, few
>> diffserv priorities survive e2e transit, and ECN markings survive much
>> more often end to end than diffserv.
>>
>>>
>>>> --
>>>> John Leslie <[email protected]>
>>>>
>>>
>>> Gorry
>>>
>>>
>>
>
> Cheers,
> Michael
>
>
--
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb

_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
