Missed this:

On Thu, 18 Aug 2005, Pedro Roque Marques wrote:

> 
> > We are talking about
> > the requirements for routers. Conceptual, hypothetical routers, even.  
> > There is presently no requirement that routers have structures to prevent
> > fine grained load balancing.
> 
> Lets go down to basics:
> 
> You have an unicast advertisement from originating AS 1, which it
> advertises to 2 peer ASes (2, 3). In some random AS (4) where a source
> is located, its systems see two "comparable" paths, one that happens to
> transit through 2 and another through 3.
> 
> What are the requirements for load balancing for TCP congestion
> detection / avoidance on this system ?

None (RFC 1812). See the section on "load splitting".

On page 73, 5.2.4.3 Next Hop Address

   It is also possible for the
   algorithm to terminate when more than one route remains in the set.
   In this case, the router may arbitrarily discard all but one of them,
   or may perform "load-splitting" by choosing whichever of the routes
   has been least recently used.

Selecting the least-recently-used route for the next-hop address results
in fine-grained load balancing. But RFC 1812 gives a great deal of latitude
in the load-splitting algorithm:


On page 78, 5.2.4.5 Load Splitting

     At the end of the Next-hop selection process, multiple routes may
     still remain.  A router has several options when this occurs.  It
     may arbitrarily discard some of the routes.  It may reduce the
     number of candidate routes by comparing metrics of routes from
     routing domains that are not considered equivalent.  It may retain
     more than one route and employ a load-splitting mechanism to divide
     traffic among them.  Perhaps the only thing that can be said about
     the relative merits of the options is that load-splitting is useful
     in some situations but not in others, so a wise implementor who
     implements load-splitting will also provide a way for the network
     manager to disable it.
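The least-recently-used choice quoted from 5.2.4.3 above can be sketched in a few lines. This is a hypothetical structure for illustration, not how any particular router implements it:

```python
import time

class LruNextHopSelector:
    """Pick among equal candidate next hops by least-recently-used,
    as RFC 1812 section 5.2.4.3 permits (illustrative sketch only)."""

    def __init__(self, next_hops):
        # last-used timestamp per next hop; 0.0 = never used
        self.last_used = {hop: 0.0 for hop in next_hops}

    def select(self):
        # choose the hop used least recently, then mark it as used
        hop = min(self.last_used, key=self.last_used.get)
        self.last_used[hop] = time.monotonic()
        return hop

sel = LruNextHopSelector(["10.0.0.1", "10.0.1.1"])
# with two equal next hops, consecutive packets alternate between them
picks = [sel.select() for _ in range(4)]
```

With two equal routes, LRU selection alternates every packet: that is exactly fine-grained, per-packet load balancing falling out of an algorithm RFC 1812 explicitly allows.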

This is why RFC1546 notes that:

   It is important to remember that anycasting is a stateless service.
   An internetwork has no obligation to deliver two successive packets
   sent to the same anycast address to the same host.

In particular, pay attention to the second sentence:

   An internetwork has no obligation to deliver two successive packets
   sent to the same anycast address to the same host.

> If AS 4, as a load balancing behaviour of sending each consecutive
> packet through AS 2 and 3 paths, TCP for this unicast routing
> announcement will not work correctly. 

It works correctly now.  Perhaps you could explain.

> TCP RTT measurements, congestion detection / avoidance, PMTU, all assume
> that you do have *one* path for a flow.

They don't assume one path. Perhaps *you* assume one path.  But
RFC 1812, etc., make no such assumption.  RFC 2991 is Informational; it
expresses an opinion, but doesn't change the fundamentals.

As explained, PMTU works fine with fine-grained load balancing: the PMTU
becomes the smallest MTU over any path. If PMTU discovery is stopped by
the client after the first packet, the DF flag is cleared, and in that
case the other paths can fragment.  I should note that most clients that
use PMTUD never disable it, but it is permissible to stop PMTU discovery
after a session is established.
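The convergence can be illustrated numerically (the MTU values below are hypothetical):

```python
# With fine-grained load balancing, ICMP "fragmentation needed" reports
# may come from routers on any of the balanced paths.  The sender's
# cached PMTU only ever shrinks, so it converges to the smallest
# bottleneck MTU seen across all the paths the flow traverses.

def update_pmtu(cached_pmtu, reported_mtu):
    # standard PMTUD reaction to a report: shrink, never grow
    return min(cached_pmtu, reported_mtu)

pmtu = 1500                      # initial estimate: first-hop MTU
for mtu in [1500, 1480, 1500]:   # bottleneck MTUs of the balanced paths
    pmtu = update_pmtu(pmtu, mtu)
# pmtu is now 1480: the smallest MTU over any of the paths
```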

RTT measurements are of no consequence. There may already be a great deal 
of variation in RTT over one path. Multiple paths just (at most) add to 
the variation.

The point here, is that one should never assume there is only one path.  
If you've made such an assumption, you've made a mistake.

RFC2991 details some potential problems with multipath. It discusses PMTU
issues and TCP performance issues as reasons to avoid multipath. As I
showed, the PMTU issue is a red herring.  It also misstates the TCP
issue. RFC2581 describes the fast retransmit behavior:

   "The TCP sender SHOULD use the "fast retransmit" algorithm to detect
   and repair loss, based on incoming duplicate ACKs.  The fast
   retransmit algorithm uses the arrival of 3 duplicate ACKs (4
   identical ACKs without the arrival of any other intervening packets)
   as an indication that a segment has been lost.  After receiving 3
   duplicate ACKs, TCP performs a retransmission of what appears to be
   the missing segment, without waiting for the retransmission timer to
   expire."

RFC2991 misstates this as follows:

         When three or more packets are received before
         a "late" packet, TCP enters a mode called "fast-retransmit"

This is not the case. [However, if it were the case, it would still only
affect 6% of the packets.]  

A more thorough reading of RFC2581 reveals when an ACK should be sent:

   A TCP receiver SHOULD send an immediate duplicate ACK when an out-
   of-order segment arrives.  The purpose of this ACK is to inform the
   sender that a segment was received out-of-order and which sequence
   number is expected.  From the sender's perspective, duplicate ACKs
   can be caused by a number of network problems.  First, they can be
   caused by dropped segments.  In this case, all segments after the
   dropped segment will trigger duplicate ACKs.  Second, duplicate ACKs
   can be caused by the re-ordering of data segments by the network (not
   a rare event along some network paths [Pax97]). 

While out-of-order packets could trigger fast retransmit, that occurs
just 3% of the time.  So just 3% of packets are unnecessarily
retransmitted.  Not a great performance impact.
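The 3-duplicate-ACK trigger quoted from RFC 2581 can be sketched with a simplified sender model (hypothetical ACK traces, for illustration only):

```python
def fast_retransmit_triggered(acks):
    """Return True if the sender would fast-retransmit: 3 duplicate
    ACKs, i.e. 4 identical ACKs in a row, per RFC 2581 (sketch)."""
    dup_count = 0
    last_ack = None
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count >= 3:
                return True
        else:
            dup_count = 0
            last_ack = ack
    return False

# Mild reordering: one segment arrives one slot late, so the receiver
# emits a single duplicate ACK -- no fast retransmit.
assert not fast_retransmit_triggered([1000, 2000, 2000, 3000])

# Deep reordering (or an actual loss): three duplicate ACKs arrive and
# the sender retransmits a segment that may have been merely late.
assert fast_retransmit_triggered([1000, 2000, 2000, 2000, 2000])
```

As the first trace shows, a late packet alone does not trip fast retransmit; it takes three duplicate ACKs, which is why only a small fraction of reordered packets cause a spurious retransmission.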

But again, at worst, this is merely a performance issue that may be more
than compensated for by the additional performance of multiple links for
more sessions and other beneficial characteristics of having multiple
paths available.  

Let's consider for a moment what those benefits are:

For example, when a path fails, it can be immediately removed from the
router's FIB, and another path can be used immediately, without waiting
for routing processes to select the next-best route and add it to the
FIB. [No more black holes while waiting for the next BGP scan after a
link failure.]
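That failover behavior can be sketched with a toy multipath FIB entry (hypothetical structure and addresses; real FIBs live in forwarding hardware):

```python
class MultipathFib:
    """Sketch of a FIB entry holding several next hops for one prefix.
    When a link fails, the dead hop is dropped from the entry at once;
    forwarding continues on the survivors without waiting for the
    routing process to reconverge."""

    def __init__(self, prefix, next_hops):
        self.prefix = prefix
        self.next_hops = list(next_hops)
        self.counter = 0

    def forward(self):
        # per-packet round-robin over the currently live next hops
        hop = self.next_hops[self.counter % len(self.next_hops)]
        self.counter += 1
        return hop

    def link_down(self, hop):
        # immediate local repair: stop using the failed path
        if hop in self.next_hops:
            self.next_hops.remove(hop)

fib = MultipathFib("192.0.2.0/24", ["10.0.0.1", "10.0.1.1"])
fib.link_down("10.0.0.1")
# traffic keeps flowing on the surviving path, with no black hole
# while the routing process rescans
assert fib.forward() == "10.0.1.1"
```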

VoIP RTP buffers have no such performance issues with multipath. As long
as each packet arrives before it is to be consumed, it does not matter
what order they arrive in.  PPLB would greatly improve VoIP performance
characteristics.

These are just a couple of examples of the benefits of
multipath/PPLB/load-splitting/fine-grained load balancing.

But aside from the benefits of multipath, and aside from the fact that one
might be able to do coarse-grained load balancing, my point is this:

==========================================================================
The Anycast extension will not work with PPLB, and PPLB is allowed
behavior. Therefore, it should be well documented and clearly warned that
ANY use of this Anycast extension requires that no user and no
intermediate path served by this Anycast extension use
load-splitting/multipath/PPLB/etc. [And this requirement precludes the
use of this extension for root DNS service.]
==========================================================================

> The analysis of anycast has to start from the assumption that you system
> components do work correctly in unicast. Or it is completly
> uninteresting.
> 
> > On IOS, to enable PPLB, you disable the flow cache. 
> 
> This is an incorrect statement. You can both do "per-destination" load
> balancing with or without flow cache in most cisco boxes.
> Load balancing defaults to per-destination.

It can be turned off per interface on some boxes. The manual for the GSR
Engine2 indicates that all interfaces should have the same configuration,
i.e., it is effectively global.

> So disabling "flow cache" will not automatically enable per-packet load
> balancing. Which is not even supported in most platforms.

It's different on the GSR versus the other platforms. On the GSR, it's a
hw-module command.  The others all have you do "no route-cache".


> > If you don't have a flow cache (or its disabled), or some similar
> > mechanism, and you enter multiple routes to the same destination in the
> > route table, and the packet forwarding decision selects among multiple
> > routes (ie balances), then by definition, you will get fine grained load
> > balancing behavior.
> 
> This is incorrect.

It is certainly correct on Cisco. There are different commands on the GSR,
but the job is the same. Actually, it's correct on everything.  Everything
else, such as the three methods described in RFC 2991, falls under "some
similar mechanism".

> >  You may not have explicitly chosen this behavior, but
> > you don't have any structure to prevent it.
> 
> > > They do have the ability to do deterministic load balancing. Most don't
> > > have the ability to to per-packet.
> > 
> > How do they do this without a flow cache? 
> 
> I explained this in the previous e-mail. Most systems use a
> deterministic hash on the packet header to pick which of the N-possible
> paths to forward traffic.

As I explained, that is "[a] similar mechanism".  It's not required by
RFC1812, which permits fine-grained load balancing.

Telling us how to do coarse-grained load balancing is not an answer.
There is no requirement in RFC1812 that routers even BE ABLE to do
coarse-grained load balancing.
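For contrast, the deterministic-hash mechanism being described above (per-flow, coarse-grained) looks roughly like this (sketch; the hash function and field choice vary by vendor):

```python
import zlib

def pick_path_per_flow(src, dst, sport, dport, n_paths):
    """Coarse-grained balancing: hash the flow identifiers so that
    every packet of one flow takes the same path -- the 'similar
    mechanism' discussed above (illustrative sketch only)."""
    key = f"{src}|{dst}|{sport}|{dport}".encode()
    return zlib.crc32(key) % n_paths

# Every packet of one TCP session maps to the same path index, so TCP
# sees a single path even though several exist:
p1 = pick_path_per_flow("198.51.100.7", "192.0.2.1", 40000, 80, 2)
p2 = pick_path_per_flow("198.51.100.7", "192.0.2.1", 40000, 80, 2)
assert p1 == p2
```

Nothing in RFC 1812 obliges a router to hash like this; per-packet selection over the same candidate set is equally conformant.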

                --Dean

-- 
Av8 Internet   Prepared to pay a premium for better service?
www.av8.net         faster, more reliable, better service
617 344 9000   


_________________________________________________________________
web user interface: http://darkwing.uoregon.edu/~llynch/grow.html
web archive:        http://darkwing.uoregon.edu/~llynch/grow/
