Ran, Jari, All,

To emphasize some of the excellent points that Ran made ...

On Jun 21, 2011, at 9:00 AM, RJ Atkinson wrote:
> Separately, packet re-ordering can (and routinely does) happen 
> in the deployed world already, regardless of contents of the
> Flow Label field.  So receiving nodes already have to be able 
> to cope with reordered packets.

An example which routinely happens in today's networks would be link 
restoration.  In that case, the network is restoring traffic from a much longer 
path to a shorter, more optimal path in the network.  Depending on the 
transmission rate of the transmitter, this can/will lead to temporary 
reordering of microflows at the receivers.  Do note that in well-operated 
networks this reordering is, hopefully, transient and of an extremely short 
duration.  
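
As a back-of-the-envelope sketch of why the transmitter's rate matters here:
packets sent on the new, shorter path can overtake packets still in flight on
the old, longer path for roughly the difference in one-way delay between the
two paths.  The function name and the delay/rate figures below are purely
illustrative assumptions, not measurements from any network.

```python
def packets_at_risk(old_path_delay_ms: int, new_path_delay_ms: int,
                    packets_per_second: int) -> int:
    """Rough upper bound on packets of one microflow that may be overtaken
    when traffic moves from a longer path to a shorter one.

    Packets sent on the shorter path during the first
    (old_delay - new_delay) milliseconds after the switch can arrive
    before packets that are still in flight on the longer path.
    """
    delta_ms = old_path_delay_ms - new_path_delay_ms
    if delta_ms <= 0:
        return 0  # an equal or longer new path cannot cause overtaking
    # Integer ceiling of delta_ms / 1000 * packets_per_second.
    return (delta_ms * packets_per_second + 999) // 1000


# Hypothetical numbers: 30 ms restoration path, 10 ms normal path.
print(packets_at_risk(30, 10, 1000))  # a fast sender
print(packets_at_risk(30, 10, 10))    # a slow sender
```

A slow sender may see no reordering at all, while a fast one sees a short
burst of it, which matches the "transient, extremely short duration" point.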


> There is a 4th implementation option, which is to use only fields 
> in the base IPv6 header for all cases where a router adds a Flow Label.  
> That implementation option adds little or no value for load-balancing 
> as compared with a zero Flow Label, because existing load-balancing 
> algorithms in the deployed world already tend to use the variable 
> fields of the IPv6 header (e.g. source IPv6 address, destination IPv6 
> address, maybe ToS byte) for deployed load-balancing situations.  
> For non-fragmented packets, the deployed world already tends to use 
> the 5 input values that these I-Ds discuss, by the way.

I think it's important to re-emphasize and expound upon the above.  Namely, 
deployed IPv6 routers *already* [should] identify fragmented vs. non-fragmented 
packets, presumably by inspecting the "Next Header field of the last header of 
the Unfragmentable Part" for value 44 ('Fragment Header') [RFC 2460, Section 
4.5].  They do so in order to decide whether /or not/ they should attempt to 
locate the Next Header containing the Upper Layer protocol and, subsequently, 
the {protocol, src_port, dst_port} that will be fed, along with {src_ip, 
dst_ip}, into LAG and/or ECMP hash algorithms for fine-grained load-balancing.
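
A minimal sketch of that decision, assuming a raw IPv6 packet held in a bytes
object.  The function names (ipv6_header, hash_keys) are hypothetical, only
Hop-by-Hop, Routing and Destination Options headers are skipped, and only TCP
and UDP are parsed; real forwarding hardware will differ in all of these.

```python
import struct

# IPv6 Next Header values (IANA protocol numbers).
HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS = 0, 43, 44, 60
TCP, UDP = 6, 17
SKIPPABLE = {HOP_BY_HOP, ROUTING, DEST_OPTS}


def ipv6_header(next_header: int, src: bytes, dst: bytes,
                payload_len: int = 0, flow_label: int = 0) -> bytes:
    """Build a 40-byte IPv6 base header (helper for the example only)."""
    first_word = (6 << 28) | (flow_label & 0xFFFFF)  # version 6, TC 0
    return struct.pack("!IHBB", first_word, payload_len, next_header, 64) + src + dst


def hash_keys(packet: bytes) -> tuple:
    """Return the tuple that would be fed to a LAG/ECMP hash."""
    src_ip, dst_ip = packet[8:24], packet[24:40]
    next_header, offset = packet[6], 40
    # Walk the extension headers that share the common
    # {Next Header, Hdr Ext Len} layout.
    while next_header in SKIPPABLE:
        if offset + 2 > len(packet):
            return (src_ip, dst_ip)  # truncated: fall back to 2-tuple
        next_header, ext_len = packet[offset], packet[offset + 1]
        offset += (ext_len + 1) * 8  # length is in 8-octet units, minus one
    if next_header == FRAGMENT:
        return (src_ip, dst_ip)  # fragmented: do not look for ports
    if next_header in (TCP, UDP) and offset + 4 <= len(packet):
        src_port, dst_port = struct.unpack_from("!HH", packet, offset)
        return (src_ip, dst_ip, next_header, src_port, dst_port)
    return (src_ip, dst_ip)  # unknown upper layer: 2-tuple only


src = bytes.fromhex("20010db8000000000000000000000001")
dst = bytes.fromhex("20010db8000000000000000000000002")
udp = struct.pack("!HHHH", 1234, 53, 8, 0)
print(len(hash_keys(ipv6_header(UDP, src, dst) + udp)))  # 5-tuple
```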

"Conservative" router/switch implementations strive to reduce the risk of 
_persistent_ reordering of an individual microflow.  IOW, since non-first 
fragments do not contain the Upper Layer protocol information (specifically: 
{src_port, dst_port}) that could be fed as input-keys to LAG and/or ECMP hash 
algorithms, the "safe" thing for them to do is to use only the 2-tuple of 
{src_ip, dst_ip} as input-keys for _all_ fragments within a microflow.  
Obviously, this leads to 'coarse-grained' load-balancing for microflows 
containing fragmented packets.
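
To illustrate why the 2-tuple is the "safe" choice, here is a small sketch
comparing a naive key selection (use ports whenever they are visible) with the
conservative one.  The Pkt type, the function names and the use of CRC-32 as
the hash are assumptions made for the example, not any vendor's algorithm.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pkt:
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: Optional[int]  # None when the ports are not in this packet
    dst_port: Optional[int]
    is_fragment: bool


def naive_keys(p: Pkt) -> tuple:
    """Use the ports whenever they happen to be visible."""
    if p.src_port is not None and p.dst_port is not None:
        return (p.src_ip, p.dst_ip, p.protocol, p.src_port, p.dst_port)
    return (p.src_ip, p.dst_ip)


def conservative_keys(p: Pkt) -> tuple:
    """Use only {src_ip, dst_ip} for every fragment, first or not."""
    if p.is_fragment:
        return (p.src_ip, p.dst_ip)
    return naive_keys(p)


def pick_link(keys: tuple, n_links: int) -> int:
    """Map a key tuple to one member link (CRC-32 stands in for the hash)."""
    return zlib.crc32(repr(keys).encode()) % n_links


# First fragment carries the UDP ports; a later fragment does not.
first = Pkt("2001:db8::1", "2001:db8::2", 17, 1234, 53, True)
later = Pkt("2001:db8::1", "2001:db8::2", 17, None, None, True)

# Naive: the two fragments hash on different keys, so they may be sent
# over different links for the whole life of the microflow.
print(naive_keys(first) == naive_keys(later))
# Conservative: same keys, therefore always the same link.
print(pick_link(conservative_keys(first), 4) == pick_link(conservative_keys(later), 4))
```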

As with many things, Engineering is about properly managing a series of 
trade-offs.  Currently, the advantage of avoiding persistent reordering of 
fragmented microflows outweighs the disadvantage of only being able to perform 
coarse-grained load-balancing of the (assumed) very small number of fragmented 
microflows.  If this draft is widely implemented & deployed, and originating 
hosts are encoding a "uniformly distributed", non-zero flow-label in all 
packets (fragmented or not), then it would seem logical that routers would be 
adapted so that:
a)  If they encounter a Fragment Header, they use {src_ip, dst_ip, flow_label} 
as input-keys to the LAG + ECMP hash algorithms; and/or,
b)  If they encounter a Next Header with, for example, an Upper Layer Protocol 
for which they have *not* (yet?) implemented a parsing routine to extract 
appropriate input-keys (or can't, because it's too deep in the packet's 
headers), then they fall back to using {src_ip, dst_ip, flow_label} as 
input-keys to the LAG + ECMP hash algorithms[1]; and/or,
c)  [Assuming widespread use of the flow-label], they no longer even bother 
looking at any Next Headers in any packet and _always_ use {src_ip, dst_ip, 
flow_label} as input-keys to the LAG + ECMP hash algorithms.
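
The three options above can be sketched as a single key-selection routine.
Everything named here (Packet, select_keys, KNOWN_PARSERS, the
always_use_flow_label knob) is hypothetical and only meant to show how the
cases relate; it is not taken from the draft or from any implementation.

```python
from dataclasses import dataclass
from typing import Optional

TCP, UDP, UDP_LITE = 6, 17, 136   # IANA protocol numbers
KNOWN_PARSERS = {TCP, UDP}        # assumption: what this router can parse


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    flow_label: int               # 20-bit value set by the originating host
    protocol: int                 # Upper Layer protocol
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    is_fragment: bool = False


def select_keys(p: Packet, always_use_flow_label: bool = False) -> tuple:
    """Choose the input-keys for a LAG/ECMP hash."""
    by_label = (p.src_ip, p.dst_ip, p.flow_label)
    if always_use_flow_label:              # option (c): operator opt-in
        return by_label
    if p.is_fragment:                      # option (a): Fragment Header seen
        return by_label
    if p.protocol not in KNOWN_PARSERS:    # option (b): no parser available
        return by_label
    # Otherwise keep today's behaviour: the classic 5-tuple.
    return (p.src_ip, p.dst_ip, p.protocol, p.src_port, p.dst_port)


tcp = Packet("2001:db8::1", "2001:db8::2", 0x12345, TCP, 40000, 443)
lite = Packet("2001:db8::1", "2001:db8::2", 0x0BEEF, UDP_LITE, 5004, 5004)
print(select_keys(tcp))    # 5-tuple
print(select_keys(lite))   # falls back to {src_ip, dst_ip, flow_label}
```

Note that every fragment of a microflow carries the same flow-label, so (a)
keeps the fragments together while still spreading different microflows
between the same pair of hosts.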

Personally, I see (a) & (b) as short- to medium-term "wins" that could 
be safely implemented, by default, in the next spin of NP, FPGA SW and ASIC HW, 
given the existence of this (hopefully soon-to-be) RFC.  Obviously, (c) is 
going to be a little further out.  I assume that, similar to today's router 
implementations, router vendors will likely offer the flow-label as yet 
another configurable input-key for LAG + ECMP hash algorithms.  It will then 
be up to individual operators to determine the appropriate time to configure 
their routers/switches to, for example, use only {src_ip, dst_ip, flow_label} 
for all traffic, once they are comfortable doing so.


> IMHO, the vast majority of the benefit to using the IPv6 Flow Label 
> for load-balancing accrues to those IPv6 packets that have been 
> fragmented where the originating node inserts the non-zero Flow Label 
> value based on the documented 5 input parameters.

+1 in the short- to medium-term (?).  I would also point out that a substantial 
additional advantage is [long-term] architectural flexibility, in that the 
end-points (hosts) may freely use *new* transport protocols (SCTP, DCCP, 
UDP-lite, etc.) so long as they continue to label all packets with a "uniformly 
distributed", non-zero flow-label, so that [Core] routers/switches have 
something they can safely use as input-keys for LAG and/or ECMP hash 
algorithms.  At least that's one part of the network that we don't need to 
worry about upgrading to support new transport-layer protocols.  Unfortunately, 
middleboxes (FWs or, more generally, "security GWs") might still have to be 
adapted, depending on the applicability of the new transport-layer protocol to 
various network types (e.g.: SOHO vs. Large-ish Enterprise).

Thanks,

-shane

[1] One example I can think of here is UDP-lite.  Silly as it may seem 
(since the formats of the UDP and UDP-lite headers are nearly identical), 
parsing routines to extract {src_port, dst_port} from UDP-lite headers are not 
[widely] implemented in deployed equipment today, because it is assumed that 
this isn't a widely used transport-layer protocol.  Depending on the 
implementation (ASIC, NP or FPGA), they might be adapted to recognize UDP-lite, 
but that's a lot of cost & work ... *just* for one additional transport-layer 
protocol!
--------------------------------------------------------------------
IETF IPv6 working group mailing list
ipv6@ietf.org
Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------
