[...Lots of good stuff deleted to get to this point...]
On Wed, 15 Aug 2007, Fred Baker wrote:
So I would suggest that a third thing that can be done, after the other two
avenues have been exhausted, is to decide to not start new sessions unless
there is some reasonable chance that they will be able to accomplish their
work. This is a burden I would not want to put on the host, because the
probability is vanishingly small - any competent network operator is going to
solve the problem with money if it is other than transient. But from where I
sit, it looks like the "simplest, cheapest, and most reliable" place to
detect overwhelming congestion is at the congested link, and given that
sessions tend to be of finite duration and present semi-predictable loads, if
you want to allow established sessions to complete, you want to run the
established sessions in preference to new ones. The thing to do is delay the
initiation of new sessions.
I view this as part of the flash crowd family of congestion problems, a
combination of a rapid increase in demand and a rapid decrease in
capacity. But instead of targeting a single destination, the impact is
across multiple networks in the region.
In the flash crowd cases (including DDOS variations), the place to respond
(note the change in wording from "detect" to "respond") to extreme
congestion does not seem to be at the congested link but several hops
upstream of it. Current "effective practice" seems to be 1-2 ASNs away
from the congestion/failure point, but that may simply be the distance
needed to reach an "effective" ISP backbone engineer response.
If I had an ICMP that went to the application, and if I trusted the
application to obey me, I might very well say "dear browser or p2p
application, I know you want to open 4-7 TCP sessions at a time, but for the
coming 60 seconds could I convince you to open only one at a time?". I
suspect that would go a long way. But there is a trust issue - would
enterprise firewalls let it get to the host, would the host be able to get it
to the application, would the application honor it, and would the ISP trust
the enterprise/host/application to do so? is ddos possible? <mumble>
For the malicious DDOS, of course we don't expect the hosts to obey.
However, in the more general flash crowd case, I think the expectation of
hosts following the RFC is pretty strong, although it may take years for
new things to make it into the stacks. It won't slow down all the
elephants, but maybe it can turn the stampede into just a rampage. And
the advantage of doing it in the edge hosts is that their scale grows
with the Internet.
But even if the hosts don't respond to the back-off, it would give the
edge more in-band trouble-shooting information. For example, ICMP
"Destination Unreachable - Load shedding in effect. Retry after 'N'
seconds" (where N is stored like the Next-Hop MTU). Sending more packets
to signal congestion just makes congestion worse. However, an explicit
Internet "busy signal" is mostly there to help network operators,
because firewalls will probably drop those ICMP messages just like PMTU.
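To make the "busy signal" idea concrete, here is a sketch of what such a
message might look like on the wire, reusing the Destination Unreachable
format where RFC 1191 stores the Next-Hop MTU (bytes 6-7 of the ICMP
header). The code value 16 is invented for illustration; no such code is
assigned.

```python
import struct

def icmp_checksum(data: bytes) -> int:
    """Standard Internet checksum (RFC 1071) over an ICMP message."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data)//2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_busy_signal(retry_after_secs: int, original_datagram: bytes) -> bytes:
    """Build a hypothetical ICMP 'load shedding' message.

    Type 3 (Destination Unreachable) with an invented code 16, carrying
    the Retry-After seconds in the 16-bit field that RFC 1191 uses for
    the Next-Hop MTU (bytes 6-7), followed by the leading bytes of the
    offending datagram, as usual for ICMP errors (RFC 792).
    """
    icmp_type, icmp_code = 3, 16   # code 16 is NOT assigned; illustration only
    unused = 0
    header = struct.pack("!BBHHH", icmp_type, icmp_code, 0,
                         unused, retry_after_secs)
    body = original_datagram[:28]  # IP header + first 8 bytes of payload
    cksum = icmp_checksum(header + body)
    return struct.pack("!BBHHH", icmp_type, icmp_code, cksum,
                       unused, retry_after_secs) + body
```

As with PMTU, the receiver would read N out of the same fixed position in
the header, so no new message length or option parsing is needed.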
So plan B would be to in some way rate limit the passage of TCP SYN/SYN-ACK
and SCTP INIT in such a way that the hosed links remain fully utilized but
sessions that have become established get acceptable service (maybe not great
service, but they eventually complete without failing).
This would be a useful plan B (or plan F - when things are really
FUBARed), but I still think you need a way to signal it upstream 1 or 2
ASNs from the extreme congestion to be effective. For example, a BGP
announcement could say: for all packets toward network w.x.y.z carrying
community a, implement back-off queue plan B. Probably not a queue per
network in backbone routers, just
one alternate queue plan B for all networks with that community. Once
the origin ASN feels things are back to "normal," they can remove the
community from their BGP announcements.
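The control-plane half of that reduces to a very small decision, sketched
below. The community value 65000:911 is invented for illustration; a real
deployment would express this in router policy configuration, not code.

```python
# Hypothetical community meaning "apply back-off queue plan B to traffic
# toward this prefix" -- the value 65000:911 is invented for illustration.
BACKOFF_COMMUNITY = "65000:911"

def prefixes_under_plan_b(received_routes):
    """Given BGP routes as (prefix, [communities]) pairs, return the set
    of destination prefixes whose traffic should use the single shared
    alternate queue plan B -- one plan for all tagged prefixes, not a
    queue per network."""
    return {prefix for prefix, communities in received_routes
            if BACKOFF_COMMUNITY in communities}

routes = [
    ("192.0.2.0/24", ["65000:911", "65000:100"]),  # origin ASN signals distress
    ("198.51.100.0/24", ["65000:100"]),            # normal announcement
]
```

Once the origin ASN withdraws the community, the prefix simply drops out
of the set and its traffic returns to the normal queues.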
But what should the alternate queue plan B be?
Probably not fixed capacity numbers, but a distributed percentage across
different upstreams.
Session protocol start packets (TCP SYN/SYN-ACK, SCTP INIT, etc) 1% queue
Datagram protocol packets (UDP, ICMP, GRE, etc) 20% queue
Session protocol established/finish packets (TCP ACK/FIN, etc) normal queue
That values session-oriented protocols more than datagram-oriented
protocols during extreme congestion.
Or would it be better to let the datagram protocols fight it out with the
session-oriented protocols, just like normal Internet operations?
Session protocol start packets (TCP SYN/SYN-ACK, SCTP INIT, etc) 1% queue
Everything else (UDP, ICMP, GRE, TCP ACK/FIN, etc) normal queue
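The two candidate plans could be prototyped with a classifier along these
lines. The protocol numbers and TCP flag bits are the real IANA/RFC 793
values; the function name, the separate_datagram_queue switch, and the
queue shares themselves are assumptions matching the percentages above.

```python
TCP, UDP, ICMP, GRE, SCTP = 6, 17, 1, 47, 132  # IANA protocol numbers
SYN = 0x02                                      # TCP SYN flag bit

def classify(proto, tcp_flags=0, sctp_init=False, separate_datagram_queue=True):
    """Map a packet to a queue under the two plans discussed above.

    With separate_datagram_queue=True: session-start packets -> 1% queue,
    datagram protocols -> 20% queue, established-session packets -> normal.
    With separate_datagram_queue=False: only session-start packets are
    squeezed to 1%; everything else shares the normal queue.
    """
    # SYN bit is set in both SYN and SYN-ACK, so one test covers both.
    is_start = (proto == TCP and tcp_flags & SYN) or (proto == SCTP and sctp_init)
    if is_start:
        return "1%"
    if separate_datagram_queue and proto in (UDP, ICMP, GRE):
        return "20%"
    return "normal"
```

Note that established-session packets land in the normal queue under
either plan; the two plans differ only in whether datagram traffic gets
its own 20% share or competes as usual.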
And finally why only do this during extreme congestion? Why not always
do it?