Ran, Jari, All,

To emphasize some of the excellent points that Ran made ...

On Jun 21, 2011, at 9:00 AM, RJ Atkinson wrote:
> Separately, packet re-ordering can (and routinely does) happen
> in the deployed world already, regardless of contents of the
> Flow Label field.  So receiving nodes already have to be able
> to cope with reordered packets.

An example which routinely happens in today's networks is link
restoration.  In that case, the network is restoring traffic from a
much longer path to a shorter, more optimal path in the network.
Depending on the transmission rate of the transmitter, this can/will
lead to temporary reordering of microflows at the receivers.  Do note
that in well-operated networks this reordering is, hopefully, transient
and of an extremely short duration.

> There is a 4th implementation option, which is to use only fields
> in the base IPv6 header for all cases where a router adds a Flow Label.
> That implementation option adds little or no value for load-balancing
> as compared with a zero Flow Label, because existing load-balancing
> algorithms in the deployed world already tend to use the variable
> fields of the IPv6 header (e.g. source IPv6 address, destination IPv6
> address, maybe ToS byte) for deployed load-balancing situations.
> For non-fragmented packets, the deployed world already tends to use
> the 5 input values that these I-Ds discuss, by the way.

I think it's important to re-emphasize and expound upon the above.
Namely, deployed IPv6 routers *already* [should] identify fragmented
vs. non-fragmented packets, presumably by inspecting the "Next Header
field of the last header of the Unfragmentable Part" for the value 44
('Fragment Header') [RFC 2460, Section 4.5].  They do this to decide
whether /or not/ they should attempt to identify Next Headers
containing the Upper Layer protocol and, subsequently, the
{protocol, src_port, dst_port} that will be fed (along with
{src_ip, dst_ip}) into LAG and/or ECMP hash algorithms for fine-grained
load-balancing.

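As a rough illustration of the fragment check described above, here is
a minimal Python sketch that walks the extension-header chain of a raw
IPv6 packet, notes whether a Fragment Header (Next Header value 44) is
present, and extracts ports only for unfragmented TCP/UDP.  The function
name `classify`, the set of extension headers it skips, and the choice
to return no ports for any fragment are assumptions made for this
example; real forwarding hardware does this in fixed-function or
microcoded parsers, not like this.

```python
import struct

# IPv6 Next Header values (IANA protocol numbers)
HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS = 0, 43, 44, 60
TCP, UDP = 6, 17
SKIPPABLE = {HOP_BY_HOP, ROUTING, DEST_OPTS}  # assumption: only these are walked


def classify(packet: bytes):
    """Return (is_fragment, upper_proto, src_port, dst_port) for a raw IPv6 packet.

    Ports are None when the packet is a fragment or the upper-layer
    protocol is not one this sketch knows how to parse.
    """
    next_header = packet[6]   # Next Header field of the base IPv6 header
    offset = 40               # the base header is always 40 octets
    is_fragment = False
    while True:
        if next_header in SKIPPABLE:
            # Hdr Ext Len counts 8-octet units, not including the first 8 octets
            hdr_len = (packet[offset + 1] + 1) * 8
            next_header = packet[offset]
            offset += hdr_len
        elif next_header == FRAGMENT:
            is_fragment = True
            next_header = packet[offset]
            offset += 8       # the Fragment Header has a fixed size of 8 octets
        else:
            break
    if is_fragment or next_header not in (TCP, UDP):
        return is_fragment, next_header, None, None
    # TCP and UDP both start with 16-bit source and destination ports
    src_port, dst_port = struct.unpack_from("!HH", packet, offset)
    return False, next_header, src_port, dst_port
```

A router following the "conservative" approach would feed the ports
into its hash only when `is_fragment` is false and the ports are
present.
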
"Conservative" router/switch implementations strive to reduce the risk of _persistent_ reordering of an individual microflow. IOW, since non-first fragments will not contain Upper Layer protocol information, (specifically: {src_port, dst_port}), that can be fed as input-keys to LAG and/or ECMP hash algorithms, the "safe" thing they should do is to only use the 2-tuple of {src_ip, dst_ip} as input-keys for _all_ fragments within a microflow. Obviously, this leads to 'coarse-grained' load-balancing for microflows containing fragmented packets. As with many things, Engineering is about properly managing a series of trade-offs. Currently, the advantage of avoiding persistent reordering of fragmented microflows out-weighs the disadvantage of only being able to perform coarse-grained load-balancing of the assumed very small amount of fragmented microflows. If this draft is widely implemented & deployed and originating hosts are encoding a "uniformly distributed", non-zero flow-label in all packets (fragmented or not), then it would seem logical that routers would be adapted so that: a) If they encounter a Fragment Header they use: {src_ip, dst_ip + flow_label} as input-keys to the LAG + ECMP hash algorithms; and/or, b) If they encounter a Next Header with, for example, an Upper Layer Protocol that they have *not* (yet?) implemented a parsing routine to extract appropriate input-keys (or, can't, because it's too deep in the packet's headers), then they revert back to using {src_ip, dst_ip + flow_label} as input-keys to the LAG + ECMP hash algorithms[1]; and/or, c) [Assuming widespread use of the flow-label], they no longer even bother looking at any Next Headers in all packets and _always_ use {src_ip, dst_ip + flow_label} for input-keys to LAG + ECMP hash algorithms. 
Personally, I see (a) & (b) as short- to medium-term "wins" that could
be safely implemented, by default, in the next spin of NP, FPGA SW and
ASIC HW, given the existence of this, hopefully soon, RFC.  Obviously,
(c) is going to be a little further out.  I assume that, similar to
today's router implementations, router vendors will likely provide the
flow-label as yet another configurable input-key for LAG + ECMP hash
algorithms.  It will then be up to individual operators to determine
the appropriate time to configure their routers/switches to, for
example, only use {src_ip, dst_ip, flow_label} when they are
comfortable doing so for all traffic.

> IMHO, the vast majority of the benefit to using the IPv6 Flow Label
> for load-balancing accrues to those IPv6 packets that have been
> fragmented where the originating node inserts the non-zero Flow Label
> value based on the documented 5 input parameters.

+1 in the short- to medium (?) term.  I would also point out that a
substantial additional advantage is [long-term] architectural
flexibility, in that the end-points (hosts) may freely use *new*
transport protocols (SCTP, DCCP, UDP-lite, etc.) so long as they
continue to label all packets with a "uniformly distributed", non-zero
flow-label, so that [Core] routers/switches have something they can
safely use as input-keys for LAG and/or ECMP hash algorithms.  At
least, that's one part of the network that we don't need to worry about
upgrading to support new transport-layer protocols.  Unfortunately,
middleboxes (FWs or, more generally, "security GWs") might still have
to be adapted depending on the applicability of the new transport-layer
protocol to various network types (e.g.: SOHO vs. Large-ish
Enterprise).

Thanks,

-shane

[1] One example I can think of here is UDP-lite.
Silly as it may seem (since the formats of UDP and UDP-lite headers are
nearly identical), parsing routines to extract {src_port, dst_port}
from UDP-lite headers are not [widely] implemented in deployed
equipment today, because it is assumed this isn't a widely used
transport-layer protocol.  Depending on the implementation (ASIC, NP or
FPGA), they might be adapted to recognize UDP-lite, but that's a lot of
cost & work ... *just* for one additional transport-layer protocol!

--------------------------------------------------------------------
IETF IPv6 working group mailing list
ipv6@ietf.org
Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------