On 2010-06-22 06:41, Joel M. Halpern wrote:
> I want to emphasize one aspect of what Wes said.
> We see ECMP (and layer two link aggregates) used almost everywhere. They
> are used for many reasons. For example, in some network designs I have
> seen there are at least two links between every pair of devices. Those
> designs also usually include at least 2 L3 level (ECMP) paths.
> This is NOT a niche application.
>
> Conversely, as far as I can tell, all of the other uses for the field
> appear to be niche applications, with limited utility.
This is indeed why the draft aims to square the circle by allowing local
semantics but recommending ECMP/LAG compatible semantics as the default.

    Brian

>
> Yours,
> Joel
>
> George, Wes E IV [NTK] wrote:
>> -----Original Message-----
>> From: Mark Smith
>> [mailto:i...@69706e6720323030352d30312d31340a.nosense.org]
>> Sent: Saturday, June 19, 2010 3:28 AM
>> To: George, Wes E IV [NTK]
>> Cc: Brian E Carpenter; 6man
>> Subject: Re: [Fwd: I-D Action:draft-carpenter-6man-flow-update-03.txt]
>>
>> Sorry for the (very) late response, finally remembered to reply
>>
>> Does there have to be a single use, or more specifically, a single
>> specific use?
>> [[WEG]] well I don't think that there has to be a single use, but I do
>> think that the requirement for immutability dramatically limits our
>> options, and for the reasons I detailed originally, I think that
>> particular thing should go away. I'm not saying that there's no
>> possibility for other uses, just that those other uses should not
>> assume immutability.
>>
>>
>> It seems to me that one of the reasons why there have been quite a
>> variety of proposals on its use is that it has some attractive
>> properties that no other header fields or extension headers have i.e.
>>
>> - it's always in the IPv6 header, rather than optional like extension
>> headers
>>
>> - all IPv6 implementations in at least the past 10 years support it
>>
>> - it is a constant and fixed size, avoiding the complexity and
>> performance impact of dealing with a variable length field
>>
>> - it's mutable by the network
>>
>> - being 20 bits in size, it can be used to encode a million values
>>
>> [[WEG]] All good points, and I think other uses can coexist, should
>> they ever get enough support to be widely implemented.
>> I'm simply
>> saying that I support flow identification as the primary
>> implementation, and that I support changing the restrictions to make
>> this as effective/efficient as possible, vs waiting on unspecified and
>> otherwise TBD other proposals that don't, in my mind, have nearly as
>> compelling of a use case/requirement to use this specific field, nor
>> much in the way of support and wide-scale implementation to require us
>> to consider backward compatibility a major requirement.
>>
>>
>> One concern I have about changing the above properties to suit the
>> ECMP traffic load balancing use case is that I think ECMP is a fairly
>> niche use. It seems to me that only the largest of networks need to use
>> ECMP because individual link speeds such as 40Gbps aren't large enough
>> for them.
>> [[WEG]] response to this below
>>
>> All other networks running IPv6 won't gain any benefit from
>> this use case, yet they're always paying the price of the flow field
>> because it is in each IPv6 packet.
>> [[WEG]] since the flow field is always in the packet whether it's all
>> 0s, all 1s, or contains useful data, I think the "paying the price for
>> not using it" is a red herring at best. And since most networks
>> running IPv6 do eventually need their packets to traverse the big ISP
>> networks, they benefit from load balancing working properly on said
>> networks, even if they benefit indirectly because their TCP doesn't
>> throttle to compensate for a big UDP flow that is load balancing
>> poorly and monopolizing one of the pipes, etc.
>>
>>
>> I think with standardisation of 100Gbps link speeds, and talk of 1Tbps
>> link speeds in the relatively near future, ECMP traffic load balancing
>> usefulness for the largest of networks will also be reduced. I think
>> making the flow field more widely useful than this specific case would
>> be better.
>> [[WEG]] Ok, first, how do you propose "making the flow field more
>> widely useful"?
>> Do you have something specific in mind, or is this
>> more a case of, "let's wait and see what people come up with in the
>> future"?
>>
>> Second, I fundamentally disagree with the notion that ECMP is a niche
>> use limited to ISPs that need bigger pipes than whatever is currently
>> the largest available and will go away as soon as we have bigger
>> pipes. There are many, many enterprises that are using link bundling
>> to make bigger than 10G pipes out of cheap 10GEs (or even bundling GE
>> ports together because that's what they have on the boxes they're
>> using), especially when they have datacenters full of servers that can
>> now hand off 10GE themselves. At least on the routers I've worked
>> with, they often use the same hash-based layer 3 flow determination to
>> do load balancing between the members of the bundle even if the
>> bundling itself is at layer 2.
>> There are also implementations that carry traffic encapsulated and
>> obscure the src/dst for the infrastructure to use for load balancing,
>> even within the enterprise space. That aside, if you have multiple
>> servers that are capable of generating fractions of 10G flows by
>> themselves, you're still open to some risk if you try to balance that
>> across 10G pipes. It's not reasonable to assume that just because
>> there are multiple devices involved that simple src/dst hashing will
>> work reliably in all cases. Works ok if it's a web server and
>> thousands of unique requests are hitting it. More or less completely
>> breaks if it's doing database replication and it might talk to 2 or 3
>> other hosts at most, or if a lot of the traffic is UDP and doesn't
>> throttle, etc.
>>
>> I used 10G as an example above because that's what is common today.
>> But you can just as easily substitute 40GE, 100GE, etc into the above
>> and it still holds true.
>> The reality is that no matter what size the
>> biggest pipes are, there is always going to be a requirement to
>> aggregate traffic from multiple sources into things bigger than those
>> pipes, and it's not limited to the largest ISPs. There is certainly a
>> delay between the latest and greatest (40G, 100G, 1T) being available
>> on routers and people building servers that can singlehandedly fill
>> them, but it's only a delay. Further, there is a cost associated with
>> early adoption that makes it overly optimistic to assume that just
>> because 40GE or 100GE exists, that people will immediately move away
>> from the lower speeds in order to replace their bundles with bigger
>> pipes for all of their aggregation needs. It's not just about buying
>> new ports (and fabric) on routers. It requires a significant
>> investment in Metro and long-haul DWDM to support the higher speeds
>> on the WAN side, and there may be distance or performance tradeoffs
>> to consider.
>>
>> Wes George
>>
>> This e-mail may contain Sprint Nextel Company proprietary information
>> intended for the sole use of the recipient(s). Any use by others is
>> prohibited. If you are not the intended recipient, please contact the
>> sender and delete all copies of the message.
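
To make the src/dst hashing point in Wes's message concrete, here is a
minimal sketch in Python. It is an illustration only, not taken from the
draft or from any router implementation: the function name pick_link, the
use of SHA-256 as the hash, the 2001:db8:: documentation addresses and the
per-flow label values are all assumptions chosen for the example. Real
forwarding hardware typically uses much cheaper hashes over more fields.

```python
import hashlib
import ipaddress


def pick_link(n_links, src, dst, flow_label=None):
    """Pick an ECMP/LAG member link index by hashing header fields.

    Hypothetical helper: hashes the source and destination addresses and,
    if given, the 20-bit flow label, then reduces modulo the link count.
    """
    key = ipaddress.IPv6Address(src).packed + ipaddress.IPv6Address(dst).packed
    if flow_label is not None:
        if not 0 <= flow_label < 2**20:
            raise ValueError("flow label is a 20-bit field")
        key += flow_label.to_bytes(3, "big")
    digest = hashlib.sha256(key).digest()  # assumption: any stable hash would do
    return int.from_bytes(digest[:4], "big") % n_links


# Two hosts doing replication: many flows, but only one src/dst pair.
SRC, DST = "2001:db8::1", "2001:db8::2"
labels = range(1, 65)  # assumed per-flow labels set by the sending host

# Links used when hashing on addresses only vs. addresses plus flow label.
by_addr = {pick_link(4, SRC, DST) for _ in labels}
by_label = {pick_link(4, SRC, DST, fl) for fl in labels}
print(len(by_addr), len(by_label))
```

With address-only hashing every flow between the two hosts lands on the
same member link, so by_addr always contains a single index. Adding the
flow label to the hash input lets the same host pair spread across the
bundle, provided the sender assigns distinct labels per flow.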
--------------------------------------------------------------------
IETF IPv6 working group mailing list
ipv6@ietf.org
Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------