Re: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
Henk -

Thanx for your thoughtful posts. I have read your later posts on this thread as well, but decided to reply to this one. Top posting for better readability.

There is broad agreement that faster flooding is desirable. There are now two proposals as to how to address the issue - neither of which proposes to use TCP (or an equivalent). I have commented on why IS-IS flooding requirements are significantly different from those for which TCP is used. I think it is also useful to note that even the simple test case which Bruno reported on in last week's interim meeting demonstrated that, without any changes to the protocol at all, IS-IS was able to flood an order of magnitude faster than it commonly does today. This gives me hope that we are looking at the problem correctly and will not need "TCP".

Introducing a TCP-based solution requires:

a) A major change to the adjacency formation logic
b) Removal of the independence of the IS-IS protocol from the address families whose reachability advertisements it supports - something which I think is a great strength of the protocol, particularly in environments where multiple address family support is needed

I really don't want to do either of the above.

Your comments regarding PSNP response times are quite correct - and both of the draft proposals discuss this - though I agree more detail will be required. It is intuitive that if you want to flood faster you also need to ACK faster - and probably even retransmit faster when that is needed. The basic relationship between retransmit interval and PSNP interval is expressed in ISO 10589:

"partialSNPInterval - This is the amount of time between periodic action for transmission of Partial Sequence Number PDUs. It shall be less than minimumLSPTransmission-Interval."

Of course, the values ISO 10589 recommends (2 seconds and 5 seconds respectively) are associated with a much slower flooding rate, and the implementations I am aware of use values in this order of magnitude.
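The timer relationship above can be sketched in a few lines. This is an illustrative model only (the function and constant names are ours, not from any implementation or from ISO 10589): if both timers are scaled down by the same factor to flood faster, the invariant partialSNPInterval < minimumLSPTransmissionInterval is preserved automatically.

```python
# Illustrative sketch: preserve the ISO 10589 timer invariant while
# scaling flooding timers down for faster flooding.

ISO_DEFAULT_PSNP_INTERVAL = 2.0        # seconds (partialSNPInterval)
ISO_DEFAULT_RETRANSMIT_INTERVAL = 5.0  # seconds (minimumLSPTransmissionInterval)

def scaled_timers(speedup: float):
    """Scale both timers by the same factor so their ratio is unchanged."""
    psnp = ISO_DEFAULT_PSNP_INTERVAL / speedup
    retransmit = ISO_DEFAULT_RETRANSMIT_INTERVAL / speedup
    # ISO 10589: the PSNP interval must remain below the retransmit interval.
    assert psnp < retransmit
    return psnp, retransmit

# Flooding 10x faster: PSNPs every 200 ms, retransmission after 500 ms.
print(scaled_timers(10.0))  # (0.2, 0.5)
```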
These numbers need to be reduced if we are to flood faster, but the relationship between the two needs to remain the same. It is also true - as you state - that sending ACKs more quickly will result in additional PDUs which need to be received/processed by IS-IS - and this has some impact. But I think it is reasonable to expect that an implementation which can support sending and receiving LSPs at a faster rate should also be able to send/receive PSNPs at a faster rate. But we still need to be smarter than sending one PSNP per LSP in cases where we have a burst.

LANs are a more difficult problem than P2P - and thus far draft-ginsberg-lsr-isis-flooding-scale has been silent on this - not because we aren't aware of it, but because we have focused on the P2P behavior first. What the best behavior on a LAN may be is something I am still considering. Slowing flooding down to the speed which the slowest IS on the LAN can support may not be the best strategy, as it also slows down the propagation rate for systems downstream of the nodes on the LAN which can handle faster flooding - thereby having an impact on flooding speed throughout the network in a way which may be out of proportion. This is a smaller example of the larger issue that, when only some nodes in the network support faster flooding, the behavior of the whole network may not be "better" when faster flooding is enabled, because it prolongs the period of LSDB inconsistency. More work needs to be done here...

In summary, I don't expect to have to "reinvent TCP" - but I do think you have provided a useful perspective for us to consider as we progress on this topic.

Thanx.
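The "smarter than one PSNP per LSP" point above amounts to batching acknowledgments. A minimal sketch, with hypothetical names and constants (neither draft specifies these values): collect LSP entries received within a short window and acknowledge them in a single PSNP.

```python
# Sketch of batched acknowledgment: one PSNP covers a burst of LSPs
# instead of acking each LSP individually.

MAX_ENTRIES_PER_PSNP = 90  # roughly what fits in one PSNP PDU (assumption)
BATCH_WINDOW = 0.05        # seconds to wait for more LSPs to arrive (assumption)

class PsnpBatcher:
    def __init__(self):
        self.pending = []    # LSP IDs received but not yet acknowledged
        self.deadline = None

    def lsp_received(self, lsp_id, now):
        self.pending.append(lsp_id)
        if self.deadline is None:
            self.deadline = now + BATCH_WINDOW

    def maybe_send(self, now):
        """Return one PSNP's worth of acks, or None if we should keep waiting."""
        if not self.pending:
            return None
        if now >= self.deadline or len(self.pending) >= MAX_ENTRIES_PER_PSNP:
            batch = self.pending[:MAX_ENTRIES_PER_PSNP]
            self.pending = self.pending[MAX_ENTRIES_PER_PSNP:]
            self.deadline = now + BATCH_WINDOW if self.pending else None
            return batch
        return None
```

Note the trade-off Henk raises later in the thread: batching delays the acknowledgment, which makes PSNP timing an even weaker feedback signal for the sender.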
Les

> -----Original Message-----
> From: Lsr On Behalf Of Henk Smit
> Sent: Thursday, April 30, 2020 6:58 AM
> To: lsr@ietf.org
> Subject: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
>
> Hello all,
>
> Two years ago, Gunter Van de Velde and myself published this draft:
> https://tools.ietf.org/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
> That started this discussion about flow/congestion control and ISIS flooding.
>
> My thoughts were that once we start implementing new algorithms to
> optimize ISIS flooding speed, we'll end up with our own version of TCP.
> I think most people here have a good general understanding of TCP.
> But if not, this is a good overview how TCP does it:
> https://en.wikipedia.org/wiki/TCP_congestion_control
>
> What does TCP do:
>
> TCP does 2 things: flow control and congestion control.
>
> 1) Flow control is: the receiver trying to prevent itself from being
> overloaded. The receiver indicates, through the receiver-window-size
> in the TCP acks, how much data it can or wants to receive.
> 2) Congestion control is: the sender trying to prevent the links between
Re: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
> On May 4, 2020, at 5:47 AM, Henk Smit wrote:
>
> I'm looking forward to seeing (an outline of) your algorithm.

I'm not trying to push any particular algorithm; we already have some proposals. My intention was only to suggest that we not disregard solutions too aggressively. The argument that there are too many queues in one router's implementation of a punt path, or that there are so few (or none) that one can't tell IS-IS from other traffic early enough, doesn't mean that one couldn't consider what *would* be required for a simple solution to be viable. If a simple solution is elegant enough, then perhaps market forces start coming into play and the cost to implement isn't actually that high.

In any case, we already have a couple of proposed semi-solutions, and during the meeting it seemed to be agreed that people would write some code and collect some data at this point.

Thanks,
Chris.
[as WG member]

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
Re: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
Mitchell wrote:

> IS-IS has two levels of neighbors via hello level 1s (LSAs) and hello
> level 2s, so immediate is somewhat relative..

As Tony said, Level-2 neighbors are still directly adjacent. There might be layer-2 switches between them. But there are never layer-3 routers between 2 adjacent level-2 neighbors.

Les's point is that interfaces, linecards, and the interface between the data-plane and the control-plane can all be seen as points between 2 ISIS instances/processes on two different routers, where ISIS messages might be dropped. And that therefore you need congestion-control (instead of, or in addition to, receiver-side flow-control).

> Sorry, I disagree, Link capacity is always an issue..

Note, we're not trying to find the maximum number of LSPs we can transmit. We just want to improve the speed a bit. From 33 LSPs/sec today to 10k LSPs/sec or something in that order. There's no need to send 10 million LSPs/sec.

Suppose the average LSP is 500 bytes. Suppose a router sends 10k LSPs per second. I think if ISIS implementations can send 10k LSPs/sec, we've solved the problem for 99.99% of networks. 10k LSPs is 5 000 000 bytes. Is 40 000 000 bits. Is 40 Mbps. So a continuous stream of 10k LSPs/sec takes 40 Mbps to transmit. For LSP-flooding, bandwidth itself is never the problem.

henk.
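Henk's back-of-the-envelope arithmetic checks out and is trivial to verify:

```python
# Verify the bandwidth estimate in the text:
# 10k LSPs/sec at an average of 500 bytes per LSP.
lsps_per_sec = 10_000
avg_lsp_bytes = 500

total_bytes = lsps_per_sec * avg_lsp_bytes   # 5 000 000 bytes/sec
bits_per_sec = total_bytes * 8               # 40 000 000 bits/sec

print(bits_per_sec / 1e6, "Mbps")  # 40.0 Mbps
```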
Re: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
On Friday I wrote:

> I still think we'll end up re-implementing a new (and weaker) TCP.

Christian Hopps wrote 2020-05-04 01:27:

> Let's not be too cynical at the start though! :)

I wasn't trying to be cynical. Let me explain my line of reasoning two years ago.

When reading about the proposals for limiting the flooding topology in IS-IS, I read a requirements doc. It said that the goal was to support areas (flooding domains) of 10k routers. Or maybe even 100k routers. My immediate thought was: "how are you gonna sync the LSDB when a router boots up? That takes 300 to 3000 seconds!?" This is the problem I wanted to solve. I hadn't even thought of routers in dense topologies that have 1k+ neighbors.

There are currently heathens that use BGP as IGP in their data-centers. There's even a cult that is developing a new IGP on top of BGP (LSVR). If they think BGP/BGP-LS/LSVR are good choices for an IGP, why is that? One reason is that people claim that BGP is more scalable. Note, when doing "Internet-routing" with large numbers of prefixes, routers (or some implementations of BGP) still sometimes need minutes, or dozens of minutes, to process and re-advertise all those prefixes. So when we talk about minutes, why do people think BGP is so much more wonderful?

I think it's TCP. TCP can transport lots of info quickly and efficiently. And conceptually TCP is easy to understand for the user ("you write into a socket, you read from a socket on the other box. done"). If TCP is good enough for BGP bulk-transport, it should be good enough for IS-IS bulk-transport. If there are issues with using TCP for routing-protocols, I'm sure we've solved those by now (in our implementations). We can use those same solutions/tweaks we use for BGP's TCP in ISIS's TCP. Or am I too naive now?

BTW, all the implementations I've worked with used regular TCP. All the Open Source BGPs seem to be using the regular TCP in their kernels. Can someone explain why TCP is good for BGP but not for IS-IS?
Almost 24 years ago, I sat on a bench in Santa Cruz discussing protocols with an engineer who had a lot more experience than I had (and still have). He was designing LDP at the time (with Yakov). LDP also uses TCP. He said: "if we had to design IS-IS now, of course we'd use TCP as transport". I never forgot that.

The goal here is not to make IS-IS transport optimal. We don't need to use the maximum available bandwidth. I just happen to think we need the same 2 elements that TCP has: sender-side congestion-avoidance and receiver-side flow-control. I hope I have explained why sender-side congestion-control in IS-IS is not enough (you don't get the feedback you need to make it work). Les and others have tried to explain why receiver-side flow-control is hard to implement (the receiving IS-IS might not know about the state of its interfaces, linecards, etc). That's why I think we need both. And when we implement both, it'll start to look like TCP. So why not use TCP itself? Or QUIC? Or another transport that's already implemented?

> I'd note that our environment is a bit more controlled than the end-to-end
> internet environment. In IS-IS we are dealing with a single link (logical)
> so very simple solutions (CTS/RTS, ethernet PAUSE) could be viable.

Les's argument is that it's often not so controlled. Let me ask you one question: in your algorithm, the receiving IS-IS will send a "pause signal" when it is overrun. How does IS-IS know it is overrun? The router is dropping IS-IS PDUs on the interface, on the linecard, on the queue between linecards and Control Plane, on the IS-IS process's input-queue. When queues are full, you can't send a message up saying "we didn't have space for an IS-IS message, but we're sending you this message that we've just dropped an IS-IS message". How do you envision this works? Imho receiver-side flow-control can only send a rough upper-bound on how many PDUs it can receive normally.
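The "rough upper bound" idea can be sketched as sender-side pacing against a receiver-advertised rate. All names and numbers below are hypothetical (neither side of the thread specifies them): the receiver advertises a static PDU rate it can normally handle, and the sender paces flooding with a token bucket against that rate, rather than relying on a pause signal that may itself be dropped.

```python
# Sketch: sender paces LSP flooding against a receiver-advertised PDU rate.

class TokenBucket:
    def __init__(self, rate_pdus_per_sec, burst):
        self.rate = rate_pdus_per_sec  # receiver's advertised upper bound
        self.burst = burst             # how large a burst we allow
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        """True if one more LSP may be flooded at time `now`."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Simulate one second of back-to-back flooding attempts against a
# receiver that advertised 1000 PDUs/sec.
bucket = TokenBucket(rate_pdus_per_sec=1000, burst=50)
sent = sum(bucket.allow(now=i / 10_000) for i in range(10_000))
print(sent)  # roughly 1050: the 50-PDU burst plus ~1000 paced PDUs
```

This is flow control only in the crude sense Henk describes: it bounds the average rate, but tells the sender nothing about drops inside the receiving router.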
A solution with a "pause signal" is basically the same as receiver-side flow-control, where the receive-window is either 0 or infinite.

> Thus our choice of algorithms may well be less restricted.

I'm looking forward to seeing (an outline of) your algorithm.

Again, I'm not pushing for TCP (anymore). I'm not pushing for anything. I'm just trying to explain the problems that I see with solutions that are, imho, a bit too simple to really help. Maybe I'm wrong, and the problem is simpler than I think. Experimentation would be nice.

henk.
Re: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
Inline…

Mitchell Erblich
Implementation of IS-IS for Extreme Networks in what seems Eons ago…
erbli...@earthlink.net

> On May 3, 2020, at 11:12 PM, tony...@tony.li wrote:
>
> Mitchell,
>
>> I think we/you are looking at two different problems:
>> 1) a hop count of 1 or maybe two between the two end points
>> 2) and the multiple / many hop count between the two end points.
>
> IS-IS adjacencies are always between immediate L3 neighbors, ignoring
> strange things like tunneling.

IS-IS has two levels of neighbors via hello level 1s (LSAs) and hello level 2s, so immediate is somewhat relative..

>> Thus, I think that your issue is mostly the #2 problem
>> and the problem that most CA algorithms IMO always try to increase capacity
>> and thus at some point must exceed capacity. TCP must find a range of
>> capacity per flow (assuming a consistent number of packets per sec).
>> However, what is maybe missed (I missed it in the document) is the ability
>> not to overshoot the TCP threshold point and trigger multiple initial
>> congestion events in/exiting the slow-start phase.
>
> Modern router designs have interface bandwidths from 10-400Gb/s. The CPU
> would be hard pressed to supply 1Gb/s, therefore for most of the
> circumstances that we're concerned about, the link capacity is never the
> issue.

Sorry, I disagree. Link capacity is always an issue. Wrt video, we now have 4K / 8.3-megapixel video streams, with 8K support at the TV in the near future. And as you mentioned, 400Gb via Link Aggregation… Whether a single 1-core, 4-core or larger CPU is used, and the speed of the Eth interface(s), is based on the platform.
I would assume that level-2 neighbors / adjacencies would be more WAN-based, and thus stress a router's link capacity more than standard level-1 neighbors…

Even more grey scale, or a larger number of different colors of pixels, requires more link capacity… Pixelation on your TV is caused either by link capacity or by not being able to process the packets, which thus either get dropped or just skipped. Even compression consumes CPU processing and requires an "entire frame" / partial frame to be received before the frame can be processed.

If we jump to OSPF, it is the DRs and BDRs, I would assume, that consume more link capacity than DRothers (note: OSPF does not use TCP).

So, link capacity and/or packets-per-sec processing will always be an issue in some/many environments - and is more so with IPv4 due to fragmentation - given fixed input queue/FIFO sizes, and whether the router then does tail drops, or RED (random early discard), or delays dropping an adjacency because it is processing a massive number of LSAs and has not processed the hello…

Please tell me 1 router can process, say, 1 GByte of min-sized packets at wire speeds of, say, 400Gb/sec, via echo requests/responses… aka 1 million pings as a quick stress test. Even with long enough FIFOs, don't we have longer latencies as we queue up the packets? This is a side effect of link capacity…

Even limiting the number of routers in an area, and having slower or faster convergence, is based on link capacity and CPU capacity… aka the platform.

> Tony
Re: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
Mitchell,

> I think we/you are looking at two different problems:
> 1) a hop count of 1 or maybe two between the two end points
> 2) and the multiple / many hop count between the two end points.

IS-IS adjacencies are always between immediate L3 neighbors, ignoring strange things like tunneling.

> Thus, I think that your issue is mostly the #2 problem
> and the problem that most CA algorithms IMO always try to increase capacity
> and thus at some point must exceed capacity. TCP must find a range of
> capacity per flow (assuming a consistent number of packets per sec).
> However, what is maybe missed (I missed it in the document) is the ability
> not to overshoot the TCP threshold point and trigger multiple initial
> congestion events in/exiting the slow-start phase.

Modern router designs have interface bandwidths from 10-400Gb/s. The CPU would be hard pressed to supply 1Gb/s, therefore for most of the circumstances that we're concerned about, the link capacity is never the issue.

Tony
Re: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
Group,

I think we/you are looking at two different problems:
1) a hop count of 1 or maybe two between the two end points
2) and the multiple / many hop count between the two end points.

RFC 3649 does a "ships-in-the-night" type implementation of #1, where there is an excess of capacity between the two end points. It also IMO addresses the ability to burst and then close a shorter-term connection.

Thus, I think that your issue is mostly the #2 problem, and the problem that most CA algorithms IMO always try to increase capacity and thus at some point must exceed capacity. TCP must find a range of capacity per flow (assuming a consistent number of packets per sec). However, what is maybe missed (I missed it in the document) is the ability not to overshoot the TCP threshold point and trigger multiple initial congestion events in/exiting the slow-start phase.

FYI: My implementations tend to try to identify a connection's packets-per-sec within a range using prime numbers, and possibly identify excess capacity and subdivide that capacity into future multiple connections. This type of implementation is much easier to do from a server that implements the #1 scenario above, where the server is one of the end points.

Mitchell Erblich
erbli...@earthlink.net

> On Apr 30, 2020, at 6:57 AM, Henk Smit wrote:
>
> Hello all,
>
> Two years ago, Gunter Van de Velde and myself published this draft:
> https://tools.ietf.org/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
> That started this discussion about flow/congestion control and ISIS flooding.
>
> My thoughts were that once we start implementing new algorithms to
> optimize ISIS flooding speed, we'll end up with our own version of TCP.
> I think most people here have a good general understanding of TCP.
> But if not, this is a good overview how TCP does it:
> https://en.wikipedia.org/wiki/TCP_congestion_control
>
> What does TCP do:
>
> TCP does 2 things: flow control and congestion control.
>
> 1) Flow control is: the receiver trying to prevent itself from being
> overloaded. The receiver indicates, through the receiver-window-size
> in the TCP acks, how much data it can or wants to receive.
> 2) Congestion control is: the sender trying to prevent the links between
> sender and receiver from being overloaded. The sender makes an educated
> guess at what speed it can send.
>
> The part we seem to be missing:
>
> For the sender to make a guess at what speed it can send, it looks at
> how the transmission is behaving. Are there drops? What is the RTT?
> Do drop-percentage and RTT change? Do acks come in at the same rate
> as the sender sends segments? Are there duplicate acks? To be able
> to do this, the sender must know what to expect. How acks behave.
>
> If you want an ISIS sender to make a guess at what speed it can send,
> without changing the protocol, the only thing the sender can do is look
> at the PSNPs that come back from the receiver. But the RTT of PSNPs can
> not be predicted. Because a good ISIS implementation does not immediately
> send a PSNP when it receives an LSP: 1) the receiver should jitter the PSNP,
> like it should jitter all packets. And 2) the receiver should wait a little
> to see if it can combine multiple acks into a single PSNP packet.
>
> In TCP, if a single segment gets lost, each new segment will cause the
> receiver to send an ack with the seqnr of the last received byte. This
> is called "duplicate acks". This triggers the sender to do
> fast-retransmission. In ISIS, this can't be done. The information
> a sender can get from looking at incoming PSNPs is a lot less than what
> TCP can learn from incoming acks.
>
> The problem with sender-side congestion control:
>
> In ISIS, all we know is that the default retransmit-interval is 5 seconds.
> And I think most implementations use that as the default. This means that
> the receiver of an LSP has one requirement: send a PSNP within 5 seconds.
> For the rest, implementations are free to send PSNPs however and whenever
> they want. This means a sender can not really make conclusions about
> flooding speed, dropped LSPs, capacity of the receiver, etc.
> There is no ordering when flooding LSPs, or sending PSNPs. This makes
> a sender-side algorithm for ISIS a lot harder.
>
> When you think about it, you realize that a sender should wait the
> full 5 seconds before it can make any real conclusions about dropped LSPs.
> If a sender looks at PSNPs to determine its flooding speed, it will probably
> not be able to react without a delay of a few seconds. A sender might send
> hundreds or thousands of LSPs in those 5 seconds, which might all or
> partially be dropped, complicating matters even further.
>
> A sender-side algorithm should specify how to do PSNPs.
>
> So imho a sender-side only algorithm can't work just like that in a
> multi-vendor
Re: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
> On Apr 30, 2020, at 9:57 AM, Henk Smit wrote:
>
> I still think we'll end up re-implementing a new (and weaker) TCP.

Hi Henk,

Thanks for the thoughtful writeup. Let's not be too cynical at the start though! :)

I'd note that our environment is a bit more controlled than the end-to-end internet environment. In IS-IS we are dealing with a single link (logical), so very simple solutions (CTS/RTS, ethernet PAUSE) could be viable. That aside, yes, there are queues and opportunity for loss between the receiving linecard interface and the IS-IS process on the router, but that path is much more accessible (controllable) to the control loop than all the routers and networks between 2 endpoints for an internet TCP connection.

Also, while it's generally accepted that end-to-end internet-based protocols need to be TCP-friendly (e.g., the congestion-control algorithms of the Datagram Congestion Control Protocol, DCCP), I'm not sure this requirement needs to be applied the same way for IS-IS. Thus our choice of algorithms may well be less restricted.

Thanks,
Chris.
[as WG member]
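The kind of "very simple" single-link mechanism Chris alludes to can be sketched as an XON/XOFF-style pause driven by the receiver's input-queue depth. All names and watermark values are hypothetical, and Henk's rebuttal elsewhere in the thread still applies: the pause message can itself be dropped, and the receiver may not even see its own queue depths.

```python
# Sketch: receiver-driven pause/resume with high/low watermarks,
# hysteresis avoids flapping when the queue hovers near one threshold.

HIGH_WATERMARK = 800  # queue depth at which to signal "pause" (assumption)
LOW_WATERMARK = 200   # queue depth at which to signal "resume" (assumption)

class PauseSignaler:
    def __init__(self):
        self.paused = False

    def on_queue_depth(self, depth):
        """Return 'PAUSE', 'RESUME', or None as the input queue fills/drains."""
        if not self.paused and depth >= HIGH_WATERMARK:
            self.paused = True
            return "PAUSE"
        if self.paused and depth <= LOW_WATERMARK:
            self.paused = False
            return "RESUME"
        return None
```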
[Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
Hello all,

Two years ago, Gunter Van de Velde and I published this draft:
https://tools.ietf.org/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
That started this discussion about flow/congestion control and ISIS flooding.

My thoughts were that once we start implementing new algorithms to optimize ISIS flooding speed, we'll end up with our own version of TCP. I think most people here have a good general understanding of TCP. But if not, this is a good overview of how TCP does it:
https://en.wikipedia.org/wiki/TCP_congestion_control

What does TCP do:

TCP does 2 things: flow control and congestion control.

1) Flow control is: the receiver trying to prevent itself from being overloaded. The receiver indicates, through the receiver-window-size in the TCP acks, how much data it can or wants to receive.
2) Congestion control is: the sender trying to prevent the links between sender and receiver from being overloaded. The sender makes an educated guess at what speed it can send.

The part we seem to be missing:

For the sender to make a guess at what speed it can send, it looks at how the transmission is behaving. Are there drops? What is the RTT? Do drop-percentage and RTT change? Do acks come in at the same rate as the sender sends segments? Are there duplicate acks? To be able to do this, the sender must know what to expect. How acks behave.

If you want an ISIS sender to make a guess at what speed it can send, without changing the protocol, the only thing the sender can do is look at the PSNPs that come back from the receiver. But the RTT of PSNPs cannot be predicted. Because a good ISIS implementation does not immediately send a PSNP when it receives an LSP: 1) the receiver should jitter the PSNP, like it should jitter all packets. And 2) the receiver should wait a little to see if it can combine multiple acks into a single PSNP packet.
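The two TCP mechanisms described above combine in the sender as a toy model (heavily simplified, not a real TCP): at most min(congestion window, receiver window) unacknowledged segments may be in flight, so flow control caps the sender on behalf of the receiver, and congestion control caps it on behalf of the path.

```python
# Toy model of how TCP's two windows jointly limit the sender.

def can_send(in_flight, cwnd, rwnd):
    """May we put one more segment in flight?

    cwnd: congestion window (sender's guess at path capacity)
    rwnd: receiver-advertised window (flow control)
    """
    return in_flight < min(cwnd, rwnd)

print(can_send(in_flight=8, cwnd=10, rwnd=6))  # False: receiver window limits
print(can_send(in_flight=5, cwnd=10, rwnd=6))  # True
```

The point of the email is that IS-IS today has neither window: nothing the receiver advertises plays the role of rwnd, and PSNPs are too irregular to drive a cwnd estimate.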
In TCP, if a single segment gets lost, each new segment will cause the receiver to send an ack with the seqnr of the last received byte. This is called "duplicate acks". This triggers the sender to do fast-retransmission. In ISIS, this can't be done. The information a sender can get from looking at incoming PSNPs is a lot less than what TCP can learn from incoming acks.

The problem with sender-side congestion control:

In ISIS, all we know is that the default retransmit-interval is 5 seconds. And I think most implementations use that as the default. This means that the receiver of an LSP has one requirement: send a PSNP within 5 seconds. For the rest, implementations are free to send PSNPs however and whenever they want. This means a sender cannot really make conclusions about flooding speed, dropped LSPs, capacity of the receiver, etc. There is no ordering when flooding LSPs, or sending PSNPs. This makes a sender-side algorithm for ISIS a lot harder.

When you think about it, you realize that a sender should wait the full 5 seconds before it can make any real conclusions about dropped LSPs. If a sender looks at PSNPs to determine its flooding speed, it will probably not be able to react without a delay of a few seconds. A sender might send hundreds or thousands of LSPs in those 5 seconds, which might all or partially be dropped, complicating matters even further.

A sender-side algorithm should specify how to do PSNPs.

So imho a sender-side only algorithm can't work just like that in a multi-vendor environment. We must not only specify a congestion-control algorithm for the sender. We must also specify for the receiver a more specific algorithm for how and when to send PSNPs. At least how to do PSNPs under load. Note that this might result in the receiver sending more (and smaller) PSNPs. More packets might mean more congestion (inside routers).

Will receiver-side flow-control work?

I don't know if that's enough. It will certainly help.
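The duplicate-ack signal that IS-IS lacks can be shown with a toy model (simplified; real TCP fast retransmit, per RFC 5681, involves more state): three duplicate acks for the same sequence number tell the sender precisely which segment to retransmit, within one RTT. A PSNP gives no comparable per-loss, low-latency signal.

```python
# Toy model of TCP's duplicate-ack detection driving fast retransmit.

DUP_ACK_THRESHOLD = 3  # RFC 5681: retransmit after 3 duplicate acks

class DupAckDetector:
    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_seq):
        """Return the sequence number to fast-retransmit, or None."""
        if ack_seq == self.last_ack:
            self.dup_count += 1
            if self.dup_count == DUP_ACK_THRESHOLD:
                return ack_seq  # retransmit the segment starting here
        else:
            self.last_ack, self.dup_count = ack_seq, 0
        return None
```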
I think to tackle this problem, we need 3 parts:
1) a sender-side congestion-control algorithm
2) a more detailed algorithm on the receiver for when and how to send PSNPs
3) a receiver-side flow-control mechanism

As discussed at length, I don't know if the ISIS process on the receiving router can actually know whether it's running out of resources (buffers on interfaces, linecards, etc). That's implementation dependent. A receiver can definitely advertise a fixed value, so the sender has an upper bound to use when doing congestion-control. Just like TCP has both a flow-control window and a congestion-control window, and a sender uses both. Maybe the receiver can even advertise a dynamic value. Maybe now, maybe only in the future. An advertised upper limit seems useful to me today.

What I didn't like about our own proposal (flooding over TCP):

The problem I saw with flooding over TCP concerns multi-point networks (LANs). When flooding over a multi-point network, setting up TCP connections introduces serious challenges. Who are the endpoints of the TCP connections? Full mesh? Or