Re: [Lsr] Flooding across a network

Les Ginsberg (ginsberg) Wed, 06 May 2020 11:26:25 -0700

Joel -

No - I am not "asking" for anything.


I am simply trying to demonstrate that a deployment that has nodes with 
significantly different flooding rate support can encounter long periods of 
looping in certain failure scenarios - and that a similar period of looping 
would NOT occur if all nodes supported a consistent flooding rate.
Do not read anything else into this.

   Les


> -----Original Message-----
> From: Joel M. Halpern <j...@joelhalpern.com>
> Sent: Wednesday, May 06, 2020 11:19 AM
> To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>
> Cc: lsr@ietf.org
> Subject: Re: [Lsr] Flooding across a network
> 
> Les, maybe I am missing your point, but it sounds like what you are
> asking for is a (better?) version of the micro-loop prevention work, so
> as to mitigate the interaction between inconsistent convergence and
> fast-reroute?
> 
> Yours,
> Joel
> 
> On 5/6/2020 1:53 PM, Les Ginsberg (ginsberg) wrote:
> > Bruno -
> >
> > I am sorry it has been so difficult for us to understand each other. I am
> > trying my best.
> >
> > Look at it this way:
> >
> > You are the customer. 😊
> > I am the vendor.
> >
> > The failure scenario I describe below happens and you notice that all
> > Northbound destinations loop for 35 seconds whenever fast flooding is
> > enabled.
> > I think you are going to complain about this - to me. 😊
> >
> > And I am going to tell you that this is a consequence of enabling fast
> > flooding in the presence of a node which does not support it. Your options
> > to reduce the period of looping will be:
> >
> > 1) Upgrade the slow node to support faster flooding
> > 2) Disable fast flooding
> > 3) Redesign your network
> >
> >      Les
> >
> >> -----Original Message-----
> >> From: bruno.decra...@orange.com <bruno.decra...@orange.com>
> >> Sent: Wednesday, May 06, 2020 10:10 AM
> >> To: Christian Hopps <cho...@chopps.org>
> >> Cc: Les Ginsberg (ginsberg) <ginsb...@cisco.com>; lsr@ietf.org
> >> Subject: RE: [Lsr] Flooding across a network
> >>
> >>> From: Christian Hopps [mailto:cho...@chopps.org]
> >>>
> >>> Bruno's persistence has made me realize something fundamental here.
> >>>
> >>> The minute the LSP originator changes the LSP and floods it you have
> >>> LSDB inconsistency.
> >>
> >> Exactly my point. Thank you Chris.
> >> I would even say: "The minute the LSP originator changes the LSP then
> >> you have LSDB inconsistency." But no big deal if there is disagreement on
> >> this detail.
> >>
> >>> That is going to last until the last node in the network has updated its
> >>> LSDB.
> >>
> >> Absolutely.
> >> So the faster we flood, the shorter the LSDB inconsistency.
> >>
> >> Now IMO, even if a single/few nodes flood faster, there is a chance of
> >> shortening the LSDB inconsistency. But in all cases, I don't see how this
> >> could make the LSDB inconsistency longer.
> >>
> >>
> >>> Les is pointing out that LSDB inconsistency can be bad in certain
> >>> circumstances, e.g., if a critical node is slow and thus inconsistent.
> >>>
> >>> I believe the right way to fix this is a simple one: help the operator
> >>> flag the broken router software/hardware for replacement, but otherwise
> >>> IS-IS should just do the best job it can, which is to flood around the
> >>> problem (i.e., flood as optimally as possible).
> >>
> >> +1
> >> On a side note, I would not call a router flooding slowly "broken". I
> >> find it understandable that in a given network there are different types
> >> of routers (core vs aggregation), different roles (P having 50 IGP
> >> adjacencies with 50 PEs vs PE having only 2 IGP adjacencies with 2 P),
> >> different hardware generations, different software, different vendors
> >> with different perspectives/markets.
> >>
> >> Thank you Chris.
> >>
> >> --Bruno
> >>>
> >>> Thanks,
> >>> Chris.
> >>> [as WG member]
> >>>
> >>>
> >>>> On May 6, 2020, at 10:33 AM, bruno.decra...@orange.com wrote:
> >>>>
> >>>> Les,
> >>>>
> >>>> From: Les Ginsberg (ginsberg) [mailto:ginsb...@cisco.com]
> >>>> Sent: Wednesday, May 6, 2020 4:14 PM
> >>>> To: DECRAENE Bruno TGI/OLN
> >>>> Cc: lsr@ietf.org
> >>>> Subject: RE: Flooding across a network
> >>>>
> >>>> Bruno –
> >>>>
> >>>> I am somewhat at a loss to understand your comments.
> >>>> The example is straightforward and does not need to consider FIB
> >>>> update time nor the ordering of prefix updates on different nodes.
> >>>> [Bruno] The example is straightforward but you are referring to FIB and
> >>>> IP packet forwarding as per those FIBs.
> >>>> I’d like us to focus on LSP flooding and LSDB consistency.
> >>>>
> >>>> Consider the state of Node B and Node D at various time points from
> >>>> the trigger event.
> >>>>
> >>>> T + 2 seconds
> >>>> -----------------
> >>>> B has received all LSP Updates. It triggers an SPF and for all Northbound
> >>>> destinations previously reachable via C it installs paths via D.
> >>>> Let’s assume it takes 5 seconds to update the forwarding plane.
> >>>>
> >>>> D has received 40 of the 1000 LSP updates. It triggers an SPF and finds
> >>>> that all Northbound destinations are reachable via B-C. It makes no
> >>>> changes to the forwarding plane.
> >>>>
> >>>> T + 7 seconds
> >>>> -----------------
> >>>> B has completed FIB updates. Traffic to all Northbound destinations is
> >>>> being forwarded via D.
> >>>>
> >>>> D has now received 140 of the 1000 LSP updates. Entries in its
> >>>> forwarding plane for Northbound destinations still point to B.
> >>>>
> >>>> We have a loop.
> >>>>
> >>>> T + 30 seconds
> >>>> --------------------
> >>>> D has now received 600 of the 1000 LSP updates. Still no changes to its
> >>>> forwarding plane.
> >>>> Traffic to Northbound destinations is still looping.
> >>>>
> >>>> T + 50 seconds
> >>>> -------------------
> >>>> D has finally received all 1000 LSP updates.
> >>>> It triggers (another) SPF and calculates paths to Northbound
> >>>> destinations via E. It begins to update its forwarding plane.
> >>>> Let’s assume this will take 5 seconds.
> >>>>
> >>>> T + 55 seconds
> >>>> --------------------
> >>>> D has completed forwarding plane updates – no more looping.
> >>>>
> >>>> That is all I am trying to illustrate.
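
The timeline quoted above can be reproduced with a few lines of arithmetic. This is only a minimal sketch under simplifying assumptions that go beyond the thread: receive rates are constant, internode delay is ignored, and a node changes its forwarding plane only after it has received every LSP and then spent a fixed FIB update time. The function names are illustrative.

```python
def time_to_receive(lsp_count, rate_per_sec):
    """Seconds needed to receive lsp_count LSPs at a constant rate."""
    return lsp_count / rate_per_sec


def lsps_received(t_secs, rate_per_sec, lsp_count):
    """LSPs received t_secs after the trigger event (capped at lsp_count)."""
    return min(lsp_count, int(t_secs * rate_per_sec))


def loop_window(lsp_count, fast_rate, slow_rate, fib_update_secs):
    """Return (when B points at D, when D stops pointing at B, loop duration)."""
    # B (fast node) forwards via D once it has all LSPs and has updated its FIB.
    b_switch = time_to_receive(lsp_count, fast_rate) + fib_update_secs
    # D (slow node) keeps pointing at B until it has all LSPs and an updated FIB.
    d_switch = time_to_receive(lsp_count, slow_rate) + fib_update_secs
    return b_switch, d_switch, max(0.0, d_switch - b_switch)


# Numbers from the example: 1000 LSPs, 500 vs 20 LSPs/second, 5 s FIB update.
print(loop_window(1000, 500, 20, 5))   # (7.0, 55.0, 48.0)
# Same failure with every node flooding at the slow rate: no loop window.
print(loop_window(1000, 20, 20, 5))    # (55.0, 55.0, 0.0)
# D's progress at the checkpoints used in the timeline.
print([lsps_received(t, 20, 1000) for t in (2, 7, 30, 50)])  # [40, 140, 600, 1000]
```

Under these assumptions the loop lasts from T + 7 to T + 55 seconds in the mixed deployment, while the all-slow deployment converges at the same T + 55 seconds without this loop window (blackholing is not modeled here).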
> >>>>
> >>>> If you want to start arguing that node protecting LFAs + microloop
> >>>> avoidance could help (NOTE I explicitly took those out of the example
> >>>> for simplicity) – it is easy enough to change the example to include
> >>>> multiple node failures or a node failure plus some northbound link
> >>>> failures on other nodes.
> >>>> [Bruno] I’m not talking about LFA/FRR. And with regard to microloop
> >>>> avoidance, some algorithms can handle any graph transition, including
> >>>> multiple node failures.
> >>>>
> >>>> But again, let’s stick to LSP flooding and LSDB consistency. (You are
> >>>> the one speaking about microloops in the forwarding plane.)
> >>>>
> >>>> The point here is to look at the impact of long-lived LSDB inconsistency
> >>>> which results when some nodes support flooding an order of magnitude
> >>>> faster than other nodes – which is what you asked me to clarify.
> >>>> [Bruno] No. I asked you to clarify why having a node with faster
> >>>> flooding could prolong the period of LSDB inconsistency.
> >>>>
> >>>> Again, in your own words: “when only some nodes in the network
> >>>> support faster flooding the behavior of the whole network may not be
> >>>> "better" when faster flooding is enabled because it prolongs the period of
> >>>> LSDB inconsistency.”
> >>>> And in fewer words: “when only some nodes in the network support
> >>>> faster flooding […]  it prolongs the period of LSDB inconsistency.”
> >>>>
> >>>> --Bruno
> >>>>
> >>>>     Les
> >>>>
> >>>>
> >>>>
> >>>> From: bruno.decra...@orange.com <bruno.decra...@orange.com>
> >>>> Sent: Wednesday, May 06, 2020 6:21 AM
> >>>> To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>
> >>>> Cc: lsr@ietf.org
> >>>> Subject: RE: Flooding across a network
> >>>>
> >>>> Les,
> >>>>
> >>>> From: Les Ginsberg (ginsberg) [mailto:ginsb...@cisco.com]
> >>>> Sent: Wednesday, May 6, 2020 1:35 AM
> >>>> To: DECRAENE Bruno TGI/OLN; l...@ietf..org
> >>>> Subject: RE: Flooding across a network
> >>>>
> >>>> Bruno -
> >>>>
> >>>> Seems like it was not too long ago that we were discussing this in
> >>>> person. Ahhh...the good old days...
> >>>> [Bruno] Indeed, maybe not to the point of concluding. Indeed.
> >>>>
> >>>> First, let's agree that the interesting case does not involve 1 or even a
> >>>> small number of LSPs. For those cases flooding speed does not matter.
> >>>> The interesting cases involve a large number of LSPs (hundreds or
> >>>> thousands). And in such cases LFA/microloop avoidance techniques are
> >>>> not applicable.
> >>>>
> >>>> Take the following simple topology:
> >>>>
> >>>>     |  | ... |            |
> >>>>       +---+             +---+
> >>>>       | C |             | E |
> >>>>       +---+             +---+
> >>>>         |                 | 1000
> >>>>       +---+             +---+
> >>>>       | B |-------------| D |
> >>>>       +---+   1000      +---+
> >>>>         |                 |
> >>>>         |                 |
> >>>>          \               /
> >>>>           \            /
> >>>>            \         /
> >>>>             \      /
> >>>>               +---+
> >>>>               | A |
> >>>>               +---+
> >>>>
> >>>> There is a topology northbound of C and E (not shown) and a topology
> >>>> southbound of A (not shown).
> >>>> Cost on all links is 10 except B-D and D-E where cost is high.
> >>>>
> >>>> C is a node with 1000 neighbors.
> >>>> When all links are up, shortest path for all northbound destinations is
> >>>> via C.
> >>>> All nodes in the network support fast flooding except for Node D.
> >>>> Let’s say fast flooding is 500 LSPs/second and slow flooding (Node D) is
> >>>> 20 LSPs/second.
> >>>> If Node C fails we have 1000 LSPs to flood.
> >>>> All nodes except for D can receive these in 2 seconds (plus internode
> >>>> delay time).
> >>>> D can receive the LSPs in 50 seconds.
> >>>>
> >>>> [Bruno] Thanks for your example. Agreed so far.
> >>>>
> >>>> When A and B and all southbound nodes receive/process the LSP
> >>>> updates they will start sending traffic to Northbound destinations via D.
> >>>> But for the better part of 50 seconds, Node D has yet to receive all LSP
> >>>> updates and still believes that shortest path is via B-C. It will loop
> >>>> traffic.
> >>>>
> >>>> [Bruno] May I remind you that we are discussing IS-IS flooding in order
> >>>> to sync LSDB (LSP database). That is already a big enough subject. It does
> >>>> not include FIB (updates), nor IP forwarding.
> >>>>
> >>>> Quoting you “when only some nodes in the network support faster
> >>>> flooding the behavior of the whole network may not be "better" when
> >>>> faster flooding is enabled because it prolongs the period of LSDB
> >>>> inconsistency.”
> >>>>
> >>>> Taking your own examples, in both cases (all nodes support fast
> >>>> flooding; all nodes but D support fast flooding) the period of LSDB
> >>>> inconsistency is 50 seconds. Hence this example does not illustrate your
> >>>> statement.
> >>>>
> >>>> Hence I’m restating my questions:
> >>>>
> >>>>>> when only some nodes in the network support faster flooding the
> >>>>>> behavior of the whole network may not be "better" when faster flooding
> >>>>>> is enabled because it prolongs the period of LSDB inconsistency.
> >>>>>
> >>>>> 1) Do you have data on this?
> >>>>>
> >>>>> 2) If not, can you provide an example where increasing the flooding
> >>>>> rate on one adjacency prolongs the period of LSDB inconsistency across
> >>>>> the network?
> >>>>
> >>>>
> >>>> Had all nodes used slow flooding, it still would have taken 50 seconds to
> >>>> converge, but there would be significantly less looping. There could be a
> >>>> good amount of blackholing, but this is preferable to looping.
> >>>> [Bruno] You are using an example where ordering FIB updates across
> >>>> the network, e.g. as per [1], makes it possible to reduce _FIB_
> >>>> inconsistency across the path/network. And you seem to conclude from this
> >>>> that this translates to LSDB update ordering. Those are two different
> >>>> things. In this thread, I’d suggest that we focus on IGP flooding and
> >>>> LSDB sync only. (*)
> >>>> [1] https://tools.ietf.org/html/rfc6976
> >>>> (*) We can discuss loop-free IGP convergence in a different thread if you
> >>>> want. IMO, the use of segment routing/source routing is better than oFIB.
> >>>> But at some point, it still relies on fast flooding when multiple LSPs are
> >>>> involved. (and I mean _fast_ not _ordered_)
> >>>>
> >>>> --Bruno
> >>>>
> >>>> One can always come up with examples – based on a specific topology
> >>>> and a specific failure - where things might be better/worse/unchanged in
> >>>> the face of inconsistent flooding speed support.
> >>>> But I hope this simple example illustrates the pitfalls.
> >>>>
> >>>>      Les
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: bruno.decra...@orange.com <bruno.decra...@orange.com>
> >>>>> Sent: Tuesday, May 05, 2020 8:28 AM
> >>>>> To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>; lsr@ietf.org
> >>>>> Subject: Flooding across a network
> >>>>>
> >>>>> Les,
> >>>>>
> >>>>>> From: Lsr [mailto:lsr-boun...@ietf.org] On Behalf Of Les Ginsberg
> >>>>>> (ginsberg)
> >>>>>> Sent: Monday, May 4, 2020 4:39 PM
> >>>>> [...]
> >>>>>> when only some nodes in the network support faster flooding the
> >>>>>> behavior of the whole network may not be "better" when faster flooding
> >>>>>> is enabled because it prolongs the period of LSDB inconsistency.
> >>>>>
> >>>>> 1) Do you have data on this?
> >>>>>
> >>>>> 2) If not, can you provide an example where increasing the flooding
> >>>>> rate on one adjacency prolongs the period of LSDB inconsistency across
> >>>>> the network?
> >>>>>
> >>>>> 3) In the meantime, let's try the theoretical analysis on a simple
> >>>>> scenario where a single LSP needs to be flooded across the network.
> >>>>>
> >>>>> - Let's call Dij the time needed to flood the LSP from node i to the
> >>>>> adjacent node j. Clearly Dij>0.
> >>>>> - Let's call k the node originating this LSP at t0=0s
> >>>>>
> >>>>> From t0, the LSDB is inconsistent across the network as all nodes but
> >>>>> k are missing the LSP and hence only know about the 'old' topology.
> >>>>>
> >>>>> Let's call SPT(k) the SPT rooted on k, using Dij as the metric between
> >>>>> adjacent nodes i and j. Let's call SP(k,i) the shortest path from k to
> >>>>> i; and D(k,i) the shortest distance between k and i.
> >>>>>
> >>>>> It seems that the time needed:
> >>>>> - for node j to learn about the LSP, and get in sync with k, is D(k,j)
> >>>>> - for all nodes across the network to learn about the LSP, and get in
> >>>>> sync with k, is Max[for all j] D(k,j)
> >>>>>
> >>>>> Then how could reducing the flooding delay on one adjacency prolong
> >>>>> the period of LSDB inconsistency?
> >>>>> It seems to me that it can only improve/decrease it. Otherwise, this
> >>>>> would mean that decreasing the cost on a link can increase the cost of
> >>>>> the shortest path.
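
The single-LSP model above maps directly onto a shortest-path computation with Dij as the link metric. The following is a minimal sketch of that model only (one LSP, one originator); the topology and the delay values in milliseconds are hypothetical, and the function names are illustrative rather than taken from any implementation.

```python
import heapq


def flood_times(links, origin):
    """Return D(origin, j) for every reachable node j.

    links is a list of (i, j, Dij) tuples; delays are treated as symmetric.
    """
    adj = {}
    for i, j, dij in links:
        adj.setdefault(i, {})[j] = dij
        adj.setdefault(j, {})[i] = dij
    dist = {origin: 0}
    heap = [(0, origin)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry, a shorter path was already found
        for j, dij in adj.get(i, {}).items():
            if d + dij < dist.get(j, float("inf")):
                dist[j] = d + dij
                heapq.heappush(heap, (d + dij, j))
    return dist


def inconsistency_period(links, origin):
    """Max over all j of D(origin, j): when the last node learns the LSP."""
    return max(flood_times(links, origin).values())


# Hypothetical per-adjacency flooding delays, in milliseconds.
before = [("k", "a", 10), ("k", "b", 40), ("a", "b", 10), ("b", "c", 20)]
# Same topology with faster flooding on the k-b adjacency only.
after = [("k", "a", 10), ("k", "b", 5), ("a", "b", 10), ("b", "c", 20)]

print(inconsistency_period(before, "k"))  # 40
print(inconsistency_period(after, "k"))   # 25
```

In this model, lowering one Dij can leave the maximum unchanged or reduce it, but cannot raise it. The sketch does not cover the multi-LSP cases mentioned in the note that follows.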
> >>>>>
> >>>>> Note: I agree that there are other cases, such as multiple LSPs
> >>>>> originated by the same node, and multiple LSPs originated by multiple
> >>>>> nodes, but let's start with the simple case.
> >>>>>
> >>>>> Thanks,
> >>>>> --Bruno
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Lsr [mailto:lsr-boun...@ietf.org] On Behalf Of Les Ginsberg
> >>>>>> (ginsberg)
> >>>>>> Sent: Monday, May 4, 2020 4:39 PM
> >>>>>>
> >>>>>> Henk -
> >>>>>>
> >>>>>> Thanx for your thoughtful posts.
> >>>>>> I have read your later posts on this thread as well - but decided to
> >>>>>> reply to this one.
> >>>>>> Top posting for better readability.
> >>>>>>
> >>>>>> There is broad agreement that faster flooding is desirable.
> >>>>>> There are now two proposals as to how to address the issue - neither
> >>>>>> of which is proposing to use TCP (or equivalent).
> >>>>>>
> >>>>>> I have commented on why IS-IS flooding requirements are significantly
> >>>>>> different from those for which TCP is used.
> >>>>>> I think it is also useful to note that even the simple test case which
> >>>>>> Bruno reported on in last week's interim meeting demonstrated that
> >>>>>> without any changes to the protocol at all IS-IS was able to flood an
> >>>>>> order of magnitude faster than it commonly does today.
> >>>>>> This gives me hope that we are looking at the problem correctly and
> >>>>>> will not need "TCP".
> >>>>>>
> >>>>>> Introducing a TCP based solution requires:
> >>>>>>
> >>>>>> a) A major change to the adjacency formation logic
> >>>>>>
> >>>>>> b) Removal of the independence of the IS-IS protocol from the address
> >>>>>> families whose reachability advertisements it supports - something
> >>>>>> which I think is a great strength of the protocol - particularly in
> >>>>>> environments where multiple address family support is needed
> >>>>>>
> >>>>>> I really don't want to do either of the above.
> >>>>>>
> >>>>>> Your comments regarding PSNP response times are quite correct - and
> >>>>>> both of the draft proposals discuss this - though I agree more detail
> >>>>>> will be required.
> >>>>>> It is intuitive that if you want to flood faster you also need to ACK
> >>>>>> faster - and probably even retransmit faster when that is needed.
> >>>>>> The basic relationship between retransmit interval and PSNP interval
> >>>>>> is expressed in ISO 10589:
> >>>>>>
> >>>>>> "partialSNPInterval - This is the amount of time between periodic
> >>>>>> action for transmission of Partial Sequence Number PDUs.
> >>>>>> It shall be less than minimumLSPTransmission-Interval."
> >>>>>>
> >>>>>> Of course the ISO 10589 recommended values (2 seconds and 5 seconds
> >>>>>> respectively) are associated with a much slower flooding rate, and
> >>>>>> implementations I am aware of use values in this order of magnitude.
> >>>>>> These numbers need to be reduced if we are to flood faster, but the
> >>>>>> relationship between the two needs to remain the same.
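
The constraint quoted from ISO 10589 can be expressed as a simple check. This is a hedged sketch: it assumes both timers are scaled down by the same factor, which is one way to preserve the relationship but is not something the thread prescribes. The function names and the speedup parameter are hypothetical.

```python
# Recommended values cited in the message above, in seconds.
ISO_PARTIAL_SNP_INTERVAL = 2.0
ISO_MIN_LSP_TRANSMISSION_INTERVAL = 5.0


def check_timers(partial_snp_interval, min_lsp_transmission_interval):
    """Raise ValueError unless the PSNP interval is below the retransmit interval."""
    if not 0 < partial_snp_interval < min_lsp_transmission_interval:
        raise ValueError(
            "partialSNPInterval must be positive and less than "
            "minimumLSPTransmissionInterval"
        )


def scaled_timers(speedup):
    """Divide both timers by the same factor (an assumption of this sketch)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    psnp = ISO_PARTIAL_SNP_INTERVAL / speedup
    retransmit = ISO_MIN_LSP_TRANSMISSION_INTERVAL / speedup
    check_timers(psnp, retransmit)
    return psnp, retransmit


print(scaled_timers(10))  # (0.2, 0.5)
```

Scaling both values together keeps the PSNP interval below the retransmit interval at any speedup; choosing them independently would need the explicit check.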
> >>>>>>
> >>>>>> It is also true - as you state - that sending ACKs more quickly will
> >>>>>> result in additional PDUs which need to be received/processed by IS-IS -
> >>>>>> and this has some impact. But I think it is reasonable to expect that an
> >>>>>> implementation which can support sending and receiving LSPs at a faster
> >>>>>> rate should also be able to send/receive PSNPs at a faster rate. But we
> >>>>>> still need to be smarter than sending one PSNP/one LSP in cases where we
> >>>>>> have a burst.
> >>>>>>
> >>>>>> LANs are a more difficult problem than P2P - and thus far
> >>>>>> draft-ginsberg-lsr-isis-flooding-scale has been silent on this - but not
> >>>>>> because we aren't aware of this - we have just focused on the P2P
> >>>>>> behavior first.
> >>>>>> What the best behavior on a LAN may be is something I am still
> >>>>>> considering. Slowing flooding down to the speed which the slowest IS on
> >>>>>> the LAN can support may not be the best strategy - as it also slows down
> >>>>>> the propagation rate for systems downstream from the nodes on the LAN
> >>>>>> which can handle faster flooding - thereby having an impact on flooding
> >>>>>> speed throughout the network in a way which may be out of proportion.
> >>>>>> This is a smaller example of the larger issue that when only some nodes
> >>>>>> in the network support faster flooding the behavior of the whole network
> >>>>>> may not be "better" when faster flooding is enabled because it prolongs
> >>>>>> the period of LSDB inconsistency.
> >>>>>> More work needs to be done here...
> >>>>>>
> >>>>>> In summary, I don't expect to have to "reinvent TCP" - but I do think
> >>>>>> you have provided a useful perspective for us to consider as we
> >>>>>> progress on this topic.
> >>>>>>
> >>>>>> Thanx.
> >>>>>>
> >>>>>>    Les
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Lsr <lsr-boun...@ietf.org> On Behalf Of Henk Smit
> >>>>>>> Sent: Thursday, April 30, 2020 6:58 AM
> >>>>>>> To: lsr@ietf.org
> >>>>>>> Subject: [Lsr] Why only a congestion-avoidance algorithm on the
> >>>>>>> sender isn't enough
> >>>>>>>
> >>>>>>>
> >>>>>>> Hello all,
> >>>>>>>
> >>>>>>> Two years ago, Gunter Van de Velde and I published this draft:
> >>>>>>> https://tools.ietf.org/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
> >>>>>>> That started this discussion about flow/congestion control and ISIS
> >>>>>>> flooding.
> >>>>>>>
> >>>>>>> My thoughts were that once we start implementing new algorithms to
> >>>>>>> optimize ISIS flooding speed, we'll end up with our own version of TCP.
> >>>>>>> I think most people here have a good general understanding of TCP.
> >>>>>>> But if not, this is a good overview of how TCP does it:
> >>>>>>> https://en.wikipedia.org/wiki/TCP_congestion_control
> >>>>>>>
> >>>>>>>
> >>>>>>> What does TCP do:
> >>>>>>> ====
> >>>>>>> TCP does 2 things: flow control and congestion control.
> >>>>>>>
> >>>>>>> 1) Flow control is: the receiver trying to prevent itself from being
> >>>>>>> overloaded. The receiver indicates, through the receiver-window-size
> >>>>>>> in the TCP acks, how much data it can or wants to receive.
> >>>>>>> 2) Congestion control is: the sender trying to prevent the links
> >>>>>>> between sender and receiver from being overloaded. The sender makes an
> >>>>>>> educated guess at what speed it can send.
> >>>>>>>
> >>>>>>>
> >>>>>>> The part we seem to be missing:
> >>>>>>> ====
> >>>>>>> For the sender to make a guess at what speed it can send, it looks at
> >>>>>>> how the transmission is behaving. Are there drops ? What is the RTT ?
> >>>>>>> Do drop-percentage and RTT change ? Do acks come in at the same rate
> >>>>>>> as the sender sends segments ? Are there duplicate acks ? To be able
> >>>>>>> to do this, the sender must know what to expect. How acks behave.
> >>>>>>>
> >>>>>>> If you want an ISIS sender to make a guess at what speed it can send,
> >>>>>>> without changing the protocol, the only thing the sender can do is look
> >>>>>>> at the PSNPs that come back from the receiver. But the RTT of PSNPs
> >>>>>>> cannot be predicted, because a good ISIS implementation does not
> >>>>>>> immediately send a PSNP when it receives an LSP. 1) the receiver should
> >>>>>>> jitter the PSNP, like it should jitter all packets. And 2) the receiver
> >>>>>>> should wait a little to see if it can combine multiple acks into a
> >>>>>>> single PSNP packet.
> >>>>>>>
> >>>>>>> In TCP, if a single segment gets lost, each new segment will cause the
> >>>>>>> receiver to send an ack with the seqnr of the last received byte. This
> >>>>>>> is called "duplicate acks". This triggers the sender to do
> >>>>>>> fast-retransmission. In ISIS, this can't be done. The information
> >>>>>>> a sender can get from looking at incoming PSNPs is a lot less than what
> >>>>>>> TCP can learn from incoming acks.
> >>>>>>>
> >>>>>>>
> >>>>>>> The problem with sender-side congestion control:
> >>>>>>> ====
> >>>>>>> In ISIS, all we know is that the default retransmit-interval is 5
> >>>>>>> seconds. And I think most implementations use that as the default. This
> >>>>>>> means that the receiver of an LSP has one requirement: send a PSNP
> >>>>>>> within 5 seconds. For the rest, implementations are free to send PSNPs
> >>>>>>> however and whenever they want. This means a sender cannot really draw
> >>>>>>> conclusions about flooding speed, dropped LSPs, capacity of the
> >>>>>>> receiver, etc.
> >>>>>>> There is no ordering when flooding LSPs, or sending PSNPs. This makes
> >>>>>>> a sender-side algorithm for ISIS a lot harder.
> >>>>>>>
> >>>>>>> When you think about it, you realize that a sender should wait the
> >>>>>>> full 5 seconds before it can draw any real conclusions about dropped
> >>>>>>> LSPs. If a sender looks at PSNPs to determine its flooding speed, it
> >>>>>>> will probably not be able to react without a delay of a few seconds. A
> >>>>>>> sender might send hundreds or thousands of LSPs in those 5 seconds,
> >>>>>>> which might all or partially be dropped, complicating matters even
> >>>>>>> further.
> >>>>>>>
> >>>>>>>
> >>>>>>> A sender-side algorithm should specify how to do PSNPs.
> >>>>>>> ====
> >>>>>>> So imho a sender-side only algorithm can't work just like that in a
> >>>>>>> multi-vendor environment. We must not only specify a
> >>>>>>> congestion-control algorithm for the sender. We must also specify for
> >>>>>>> the receiver a more specific algorithm for how and when to send PSNPs.
> >>>>>>> At least how to do PSNPs under load.
> >>>>>>>
> >>>>>>> Note that this might result in the receiver sending more (and smaller)
> >>>>>>> PSNPs.
> >>>>>>> More packets might mean more congestion (inside routers).
> >>>>>>>
> >>>>>>>
> >>>>>>> Will receiver-side flow-control work ?
> >>>>>>> ====
> >>>>>>> I don't know if that's enough. It will certainly help.
> >>>>>>>
> >>>>>>> I think to tackle this problem, we need 3 parts:
> >>>>>>> 1) sender-side congestion-control algorithm
> >>>>>>> 2) more detailed algorithm on receiver when and how to send PSNPs
> >>>>>>> 3) receiver-side flow-control mechanism
> >>>>>>>
> >>>>>>> As discussed at length, I don't know if the ISIS process on the
> >>>>>>> receiving router can actually know if it's running out of resources
> >>>>>>> (buffers on interfaces, linecards, etc). That's implementation
> >>>>>>> dependent. A receiver can definitely advertise a fixed value. So the
> >>>>>>> sender has an upper bound to use when doing congestion-control. Just
> >>>>>>> like TCP has both a flow-control window and a congestion-control
> >>>>>>> window, and a sender uses both. Maybe the receiver can even advertise a
> >>>>>>> dynamic value. Maybe now, maybe only in the future. An advertised upper
> >>>>>>> limit seems useful to me today.
> >>>>>>>
> >>>>>>>
> >>>>>>> What I didn't like about our own proposal (flooding over TCP):
> >>>>>>> ====
> >>>>>>> The problem I saw with flooding over TCP concerns multi-point networks
> >>>>>>> (LANs).
> >>>>>>>
> >>>>>>> When flooding over a multi-point network, setting up TCP connections
> >>>>>>> introduces serious challenges. Who are the endpoints of the TCP
> >>>>>>> connections ?
> >>>>>>> Full mesh ? Or do all ISes on a LAN create a TCP-connection to the DIS ?
> >>>>>>> There is no backup DIS in ISIS (unlike OSPF). Things get messy quickly.
> >>>>>>>
> >>>>>>> However, the other two proposals do not solve this problem either.
> >>>>>>> How will a sender-side congestion-avoidance algorithm determine whether
> >>>>>>> there were drops ? There are no acks (PSNPs) on a LAN. We assume most
> >>>>>>> LSPs that are broadcast are received by all other ISes on the LAN.
> >>>>>>> There are no acks. Only after the DIS has sent its periodic CSNPs can
> >>>>>>> ISes send PSNPs to request retransmissions. It seems impossible (or very
> >>>>>>> hard) to me for all ISes on a LAN to keep track of dropped LSPs and
> >>>>>>> adjust their sending speed accordingly.
> >>>>>>>
> >>>>>>> When flooding on a LAN, the receiver-side algorithm seems best, because
> >>>>>>> all ISes can see what the lowest advertised sending-speed is, and make
> >>>>>>> sure they send slowly enough to not overload the slowest IS. I'm not
> >>>>>>> sure this is a good solution, but it seems easier and more realistic
> >>>>>>> than ISIS-flooding-over-TCP or sender-side congestion-avoidance.
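
The LAN behavior described above, combined with the earlier remark that a sender uses both a flow-control bound and a congestion-control bound, amounts to taking a minimum. The following is a minimal sketch under stated assumptions: each IS advertises a fixed receive rate in LSPs/second, and the sender may also have its own congestion estimate. The names, the dictionary layout, and the rates are hypothetical and do not come from either draft.

```python
def lan_send_rate(advertised_rates, own_estimate=None):
    """Pick a sending rate (LSPs/second) for a LAN.

    advertised_rates maps each IS on the LAN to the receive rate it advertises.
    own_estimate is an optional sender-side congestion-control estimate.
    """
    if not advertised_rates:
        raise ValueError("no advertised rates on this LAN")
    # Receiver-side bound: never exceed what the slowest IS says it can take.
    limit = min(advertised_rates.values())
    # Sender-side bound, if any, is applied on top, as TCP does with its
    # flow-control and congestion-control windows.
    return limit if own_estimate is None else min(limit, own_estimate)


rates = {"IS-1": 500, "IS-2": 500, "IS-3": 20}
print(lan_send_rate(rates))                   # 20
print(lan_send_rate(rates, own_estimate=10))  # 10
```

With these numbers a single slow IS pins the whole LAN to 20 LSPs/second, which is the trade-off raised elsewhere in the thread about slowing everyone down to the slowest node.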
> >>>>>>>
> >>>>>>>
> >>>>>>> My conclusion:
> >>>>>>> ====
> >>>>>>> Sender-side congestion-control won't work without specifying in more
> >>>>>>> detail how and when to send PSNPs.
> >>>>>>> Receiver-side flow-control will certainly help. I don't know if it's
> >>>>>>> good enough. I don't know if advertising a static value is good enough.
> >>>>>>> But it's a start.
> >>>>>>>
> >>>>>>> I still think we'll end up re-implementing a new (and weaker) TCP.
> >>>>>>>
> >>>>>>>
> >>>>>>> henk.
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Lsr mailing list
> >>>>>>> Lsr@ietf.org
> >>>>>>> https://www.ietf.org/mailman/listinfo/lsr
> >>>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________________________
> >>>>>
> >>>>> Ce message et ses pieces jointes peuvent contenir des informations
> >>>>> confidentielles ou privilegiees et ne doivent donc
> >>>>> pas etre diffuses, exploites ou copies sans autorisation. Si vous avez
> >> recu ce
> >>>>> message par erreur, veuillez le signaler
> >>>>> a l'expediteur et le detruire ainsi que les pieces jointes. Les messages
> >>>>> electroniques etant susceptibles d'alteration,
> >>>>> Orange decline toute responsabilite si ce message a ete altere,
> >> deforme ou
> >>>>> falsifie. Merci.
> >>>>>
> >>>>> This message and its attachments may contain confidential or
> privileged
> >>>>> information that may be protected by law;
> >>>>> they should not be distributed, used or copied without authorisation.
> >>>>> If you have received this email in error, please notify the sender and
> >> delete
> >>>>> this message and its attachments.
> >>>>> As emails may be altered, Orange is not liable for messages that have
> >> been
> >>>>> modified, changed or falsified.
> >>>>> Thank you.
> >>>
> >>
> >>
> >
_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
