I'll read George's email more thoroughly tonight.

On Mon, 21 Jun 2010 14:41:06 -0400
"Joel M. Halpern" <j...@joelhalpern.com> wrote:

> I want to emphasize one aspect of what Wes said.
> We see ECMP (and layer two link aggregates) used almost everywhere. 
> They are used for many reasons.  For example, in some network designs I 
> have seen there are at least two links between every pair of devices. 
> Those designs also usually include at least 2 L3 level (ECMP) paths.
> This is NOT a niche application.
> 

Is there a distinction between ECMP being "in use" and ECMP being
deliberately "used"?

The reason I question that is that Cisco routers default to using it,
and I'd guess that Juniper's and other vendors' gear might too. So
ECMP might be commonly "in use" without being commonly and
deliberately "used".

ECMP can make troubleshooting harder, because it makes traffic paths
less predictable - traceroute doesn't necessarily show the path taken
by the traffic that is actually having trouble. In the past I've
specifically switched ECMP off for that reason - I gained no bandwidth
benefit from it, because the network had enough bandwidth (plenty,
actually - the throughput limitation was the routers themselves and
the packet processing they were doing), so it merely increased
troubleshooting time and complexity.
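
To make that concrete: routers typically pick among equal-cost next
hops by hashing header fields, so a traceroute probe (different
protocol, different ports) can hash onto a different link than the
flow you're chasing. A toy Python sketch - the field choice and hash
here are illustrative, not any particular vendor's:

    import hashlib

    def ecmp_next_hop(src, dst, proto, sport, dport, next_hops):
        # Hash the 5-tuple and pick one of the equal-cost next hops.
        key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
        digest = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
        return next_hops[digest % len(next_hops)]

    links = ["link-A", "link-B"]
    # The TCP session that's actually in trouble...
    print(ecmp_next_hop("2001:db8::1", "2001:db8::2", 6, 51000, 80,
                        links))
    # ...versus a traceroute UDP probe to the same destination, which
    # may well land on the other link.
    print(ecmp_next_hop("2001:db8::1", "2001:db8::2", 17, 33434, 33435,
                        links))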

ECMP does provide another benefit, which is a reason I've switched it
on in the past: it pre-loads alternative routes into the FIB. If a link
failure occurs and can be detected almost immediately, e.g. due to loss
of link carrier, then forwarding isn't interrupted while routing
protocol re-convergence occurs. I found value in that, although I'd
have liked the ability to switch off the resulting active sharing of
traffic across those FIB entries once they're installed, because of the
increased complexity in troubleshooting. This use case was certainly
valuable, however it was of course limited to destinations where equal
cost paths existed between the sources and destinations. It doesn't
provide this benefit to all paths through your network. IOW, across
your network, it's only a partial solution to reducing the time it
takes to recover from link failure.
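
A sketch of what that pre-loading buys, assuming (hypothetically) a
FIB entry that keeps every equal-cost next hop installed and a
link-down event that just flags one of them dead:

    class FibEntry:
        def __init__(self, prefix, next_hops):
            self.prefix = prefix
            # All equal-cost next hops installed up front.
            self.alive = {nh: True for nh in next_hops}

        def link_down(self, nh):
            # Triggered immediately, e.g. by loss of link carrier.
            self.alive[nh] = False

        def next_hop(self, flow_hash):
            up = [nh for nh, ok in self.alive.items() if ok]
            return up[flow_hash % len(up)]

    entry = FibEntry("2001:db8::/32", ["link-A", "link-B"])
    entry.link_down("link-A")            # carrier lost
    print(entry.next_hop(flow_hash=7))   # still forwards, via link-B

Forwarding never pauses for re-convergence, but only for prefixes that
had a second equal-cost next hop installed in the first place, which
is the partial-solution caveat above.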

If we're going to use the flow label field for ECMP, then when
comparing it to other potential uses I'd like to be sure that we're
either providing a useful benefit to the people who are running ECMP
unintentionally, or, conversely, that we're not taking away some other
use of the field that they'd place more value on and would be willing
to switch ECMP off to gain.
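
For reference, the mechanism change being discussed is small at the
forwarding level: fold the 20-bit flow label (the low 20 bits of the
first 32-bit word of the IPv6 header) into the ECMP hash, rather than
digging past extension headers for transport ports. A rough Python
sketch - the hash and field mix are my own illustration, not anything
from the draft:

    import hashlib

    def flow_label(ipv6_header: bytes) -> int:
        # Low 20 bits of the first 32-bit word, after the 4-bit
        # version and 8-bit traffic class fields.
        return int.from_bytes(ipv6_header[:4], "big") & 0xFFFFF

    def ecmp_next_hop(src, dst, label, next_hops):
        # 3-tuple hash: addresses plus flow label. No transport-header
        # parsing, so extension header chains don't defeat it.
        key = f"{src}|{dst}|{label:05x}".encode()
        digest = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
        return next_hops[digest % len(next_hops)]

All packets of a flow carry the same label, so they stay on one path,
while distinct flows spread across the equal-cost set - which is the
whole benefit being weighed against those other potential uses.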

Regards,
Mark.

> Conversely, as far as I can tell, all of the other uses for the field 
> appear to be niche applications, with limited utility.
> 
> Yours,
> Joel
> 
> George, Wes E IV [NTK] wrote:
> > -----Original Message-----
> > From: Mark Smith [mailto:i...@69706e6720323030352d30312d31340a.nosense.org]
> > Sent: Saturday, June 19, 2010 3:28 AM
> > To: George, Wes E IV [NTK]
> > Cc: Brian E Carpenter; 6man
> > Subject: Re: [Fwd: I-D Action:draft-carpenter-6man-flow-update-03.txt]
> > 
> > Sorry for the (very) late response, finally remembered to reply
> > 
> > Does there have to be a single use, or more specifically, a single
> > specific use?
> > [[WEG]] well I don't think that there has to be a single use, but I do 
> > think that the requirement for immutability dramatically limits our 
> > options, and for the reasons I detailed originally, I think that particular 
> > thing should go away. I'm not saying that there's no possibility for other 
> > uses, just that those other uses should not assume immutability.
> > 
> > 
> > It seems to me that one of the reasons why there have been quite a variety
> > of proposals on its use is that it has some attractive properties
> > that no other header fields or extension headers have, i.e.
> > 
> > - it's always in the IPv6 header, rather than optional like extension
> >   headers
> > 
> > - all IPv6 implementations in at least the past 10 years support it
> > 
> > - it is a fixed, constant size, avoiding the complexity and
> >   performance impact of dealing with a variable length field
> > 
> > - it's mutable by the network
> > 
> > - being 20 bits in size, it can encode just over a million (2^20)
> >   distinct values
> > 
> > [[WEG]] All good points, and I think other uses can coexist, should they 
> > ever get enough support to be widely implemented. I'm simply saying that I 
> > support flow identification as the primary implementation, and that I 
> > support changing the restrictions to make this as effective/efficient as 
> > possible, vs waiting on unspecified and otherwise TBD other proposals that 
> > don't, in my mind, have nearly as compelling a use case/requirement to
> > use this specific field, nor much in the way of support and wide-scale 
> > implementation to require us to consider backward compatibility a major 
> > requirement.
> > 
> > 
> > One concern I have about changing the above properties to suit the
> > ECMP traffic load balancing use case is that I think ECMP is a fairly
> > niche use. It seems to me that only the largest of networks need to use
> > ECMP because individual link speeds such as 40Gbps aren't large enough
> > for them.
> > [[WEG]] response to this below
> > 
> > All other networks running IPv6 won't gain any benefit from
> > this use case, yet they're always paying the price of the flow field
> > because it is in each IPv6 packet.
> > [[WEG]] since the flow field is always in the packet whether it's all 0s, 
> > all 1s, or contains useful data, I think the "paying the price for not 
> > using it" is a red herring at best. And since most networks running IPv6 do 
> > eventually need their packets to traverse the big ISP networks, they 
> > benefit from load balancing working properly on said networks, even if they 
> > benefit indirectly because their TCP doesn't throttle to compensate for a 
> > big UDP flow that is load balancing poorly and monopolizing one of the 
> > pipes, etc.
> > 
> > 
> > I think with the standardisation of 100Gbps link speeds, and talk of 1Tbps
> > link speeds in the relatively near future, the usefulness of ECMP traffic
> > load balancing for the largest of networks will also be reduced. I think
> > making the flow field more widely useful than this specific case would be
> > better.
> > [[WEG]] Ok, first, how do you propose "making the flow field more widely 
> > useful"? Do you have something specific in mind, or is this more a case of, 
> > "let's wait and see what people come up with in the future"?
> > 
> > Second, I fundamentally disagree with the notion that ECMP is a niche use 
> > limited to ISPs that need bigger pipes than whatever is currently the 
> > largest available and will go away as soon as we have bigger pipes. There 
> > are many, many enterprises that are using link bundling to make bigger than 
> > 10G pipes out of cheap 10GEs (or even bundling GE ports together because 
> > that's what they have on the boxes they're using), especially when they 
> > have datacenters full of servers that can now hand off 10GE themselves. At 
> > least on the routers I've worked with, they often use the same hash-based 
> > layer 3 flow determination to do load balancing between the members of the 
> > bundle even if the bundling itself is at layer 2.
> > There are also implementations that carry traffic encapsulated, obscuring
> > the src/dst that the infrastructure would otherwise use for load balancing,
> > even within the enterprise space. That aside, if you have multiple servers
> > that are
> > capable of generating fractions of 10G flows by themselves, you're still 
> > open to some risk if you try to balance that across 10G pipes. It's not
> > reasonable to assume that, just because there are multiple devices involved,
> > simple src/dst hashing will work reliably in all cases. It works OK if
> > it's a web server and thousands of unique requests are hitting it. More or
> > less completely breaks if it's doing database replication and it might talk 
> > to 2 or 3 other hosts at most, or if a lot of the traffic is UDP and 
> > doesn't throttle, etc.
> > 
> > I used 10G as an example above because that's what is common today. But you 
> > can just as easily substitute 40GE, 100GE, etc into the above and it still 
> > holds true. The reality is that no matter what size the biggest pipes are, 
> > there is always going to be a requirement to aggregate traffic from
> > multiple sources into things bigger than those pipes, and it's not limited 
> > to the largest ISPs. There is certainly a delay between the latest and 
> > greatest (40G, 100G, 1T) being available on routers and people building 
> > servers that can singlehandedly fill them, but it's only a delay. Further, 
> > there is a cost associated with early adoption that makes it overly 
> > optimistic to assume that just because 40GE or 100GE exists, people
> > will immediately move away from the lower speeds in order to replace their 
> > bundles with bigger pipes for all of their aggregation needs. It's not just 
> > about buying new ports (and fabric) on routers. It requires a significant 
> > investment in Metro and long-haul DWDM to support the higher speeds on the
> > WAN side, and there may be distance or performance tradeoffs to consider.
> > 
> > Wes George
> > 
> > This e-mail may contain Sprint Nextel Company proprietary information 
> > intended for the sole use of the recipient(s). Any use by others is 
> > prohibited. If you are not the intended recipient, please contact the 
> > sender and delete all copies of the message.
> > 
--------------------------------------------------------------------
IETF IPv6 working group mailing list
ipv6@ietf.org
Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------
