I want to emphasize one aspect of what Wes said.
We see ECMP (and layer two link aggregates) used almost everywhere.
They are used for many reasons. For example, in some network designs I
have seen, there are at least two links between every pair of devices.
Those designs also usually include at least two L3 (ECMP) paths.
This is NOT a niche application.
Conversely, as far as I can tell, all of the other uses for the field
appear to be niche applications, with limited utility.
Yours,
Joel
George, Wes E IV [NTK] wrote:
-----Original Message-----
From: Mark Smith [mailto:i...@69706e6720323030352d30312d31340a.nosense.org]
Sent: Saturday, June 19, 2010 3:28 AM
To: George, Wes E IV [NTK]
Cc: Brian E Carpenter; 6man
Subject: Re: [Fwd: I-D Action:draft-carpenter-6man-flow-update-03.txt]
Sorry for the (very) late response, finally remembered to reply
Does there have to be a single use, or more specifically, a single
specific use?
[[WEG]] well I don't think that there has to be a single use, but I do think
that the requirement for immutability dramatically limits our options, and for
the reasons I detailed originally, I think that particular thing should go
away. I'm not saying that there's no possibility for other uses, just that
those other uses should not assume immutability.
It seems to me that one of the reasons why there have been quite a variety
of proposals on its use is that it has some attractive properties
that no other header field or extension header has, i.e.:
- it's always in the IPv6 header, rather than optional like extension
headers
- all IPv6 implementations in at least the past 10 years support it
- it is a constant and fixed size, avoiding the complexity and
performance impact of dealing with a variable length field
- it's mutable by the network
- being 20 bits in size, it can encode over a million (2^20) distinct values
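As a minimal illustration of the fixed-position point above: the flow label occupies the low 20 bits of the first 32-bit word of the IPv6 header (after the 4-bit version and 8-bit traffic class), so extracting it is a single mask. This is a sketch, not any particular implementation; the function name and example value are made up.

```python
def ipv6_flow_label(first_word: int) -> int:
    """Extract the 20-bit flow label from the first 32-bit word of an
    IPv6 header (version: 4 bits | traffic class: 8 bits | flow label: 20 bits)."""
    return first_word & 0x000FFFFF

# Example: version 6, traffic class 0, flow label 0xABCDE
word = (6 << 28) | (0 << 20) | 0xABCDE
label = ipv6_flow_label(word)
```

Because the field is always at the same offset and the same size, a forwarding engine never has to walk a chain of extension headers to find it.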
[[WEG]] All good points, and I think other uses can coexist, should they ever
get enough support to be widely implemented. I'm simply saying that I support
flow identification as the primary implementation, and that I support changing
the restrictions to make this as effective/efficient as possible, vs waiting on
unspecified and otherwise TBD other proposals that don't, in my mind, have
nearly as compelling a use case/requirement to use this specific field, nor
much in the way of support and wide-scale implementation to require us to
consider backward compatibility a major requirement.
One concern I have about changing the above properties to suit the
ECMP traffic load balancing use case is that I think ECMP is a fairly
niche use. It seems to me that only the largest of networks need to use
ECMP because individual link speeds such as 40Gbps aren't large enough
for them.
[[WEG]] response to this below
All other networks running IPv6 won't gain any benefit from
this use case, yet they're always paying the price of the flow field
because it is in each IPv6 packet.
[[WEG]] since the flow field is always in the packet whether it's all 0s, all 1s, or
contains useful data, I think the "paying the price for not using it" is a red
herring at best. And since most networks running IPv6 do eventually need their packets to
traverse the big ISP networks, they benefit from load balancing working properly on said
networks, even if they benefit indirectly because their TCP doesn't throttle to
compensate for a big UDP flow that is load balancing poorly and monopolizing one of the
pipes, etc.
I think with standardisation of 100Gbps link speeds, and talk of 1Tbps link
speeds in the relatively near future, the usefulness of ECMP traffic load
balancing for the largest of networks will also be reduced. I think making
the flow field more widely useful than this specific case would be better.
[[WEG]] Ok, first, how do you propose "making the flow field more widely useful"? Do you
have something specific in mind, or is this more a case of, "let's wait and see what people
come up with in the future"?
Second, I fundamentally disagree with the notion that ECMP is a niche use
limited to ISPs that need bigger pipes than whatever is currently the largest
available and will go away as soon as we have bigger pipes. There are many,
many enterprises that are using link bundling to make bigger than 10G pipes out
of cheap 10GEs (or even bundling GE ports together because that's what they
have on the boxes they're using), especially when they have datacenters full of
servers that can now hand off 10GE themselves. At least on the routers I've
worked with, they often use the same hash-based layer 3 flow determination to
do load balancing between the members of the bundle even if the bundling itself
is at layer 2.
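The hash-based flow determination described above can be sketched roughly as follows. This is an illustrative model only: `zlib.crc32` stands in for whatever hash a given router actually uses, and the function name and inputs are made up for the example.

```python
import zlib

def bundle_member(src: str, dst: str, flow_label: int, n_members: int) -> int:
    """Pick a link-bundle member by hashing L3 flow identifiers.

    All packets of one flow hash to the same member, which keeps the
    flow's packets in order while spreading distinct flows across links."""
    key = f"{src}|{dst}|{flow_label:05x}".encode()
    return zlib.crc32(key) % n_members
```

The design point is determinism: a given flow always lands on the same member, so balancing is only as good as the diversity of flows the hash inputs can distinguish.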
There are also implementations that carry traffic encapsulated, which obscures
the src/dst that the infrastructure would use for load balancing, even within
the enterprise space. That aside, if you have multiple servers that are capable of
generating fractions of 10G flows by themselves, you're still open to some risk
if you try to balance that across 10G pipes. It's not reasonable to assume that
just because there are multiple devices involved that simple src/dst hashing
will work reliably in all cases. Works ok if it's a web server and thousands of
unique requests are hitting it. More or less completely breaks if it's doing
database replication and it might talk to 2 or 3 other hosts at most, or if a
lot of the traffic is UDP and doesn't throttle, etc.
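The failure mode Wes describes is easy to simulate. In this sketch `crc32` is again just a stand-in hash and the host names are invented: with thousands of distinct clients the hash has plenty of inputs to spread, but a database talking to only three peers can never use more than three of four links, no matter how good the hash is.

```python
import zlib
from collections import Counter

def link_for(src: str, dst: str, n_links: int = 4) -> int:
    # Simple src/dst hash, standing in for a router's ECMP hash
    return zlib.crc32(f"{src}->{dst}".encode()) % n_links

# Web-server case: many distinct clients give the hash many inputs to spread
web = Counter(link_for(f"client{i}", "www") for i in range(1000))

# Replication case: one database talking to only three peers --
# at most three of the four links can ever carry any of this traffic
repl = Counter(link_for("db1", f"db{i}") for i in range(2, 5))
```

If one of those few replication flows is a large non-throttling UDP stream, whichever single link it hashes onto carries all of it.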
I used 10G as an example above because that's what is common today. But you can
just as easily substitute 40GE, 100GE, etc into the above and it still holds
true. The reality is that no matter what size the biggest pipes are, there is
always going to be a requirement to aggregate traffic from multiple sources
into things bigger than those pipes, and it's not limited to the largest ISPs.
There is certainly a delay between the latest and greatest (40G, 100G, 1T)
being available on routers and people building servers that can singlehandedly
fill them, but it's only a delay. Further, there is a cost associated with
early adoption that makes it overly optimistic to assume that just because 40GE
or 100GE exists, that people will immediately move away from the lower speeds
in order to replace their bundles with bigger pipes for all of their
aggregation needs. It's not just about buying new ports (and fabric) on
routers. It requires a significant investment in Metro and long-haul
DWDM to support the higher speeds on the WAN side, and there may be distance or
performance tradeoffs to consider.
Wes George
--------------------------------------------------------------------
IETF IPv6 working group mailing list
ipv6@ietf.org
Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------