Re: [Lsr] Dynamic flow control for flooding

2019-07-25 Thread Henk Smit


Hello Les,

Thanks for taking the time to respond.


> [Les:] Base specification defines partialSNPInterval (2 seconds).
> Clearly w faster flooding we should look at decreasing this
> timer - but we certainly should not do away with it.


That was the point I was trying to make:
You kept mentioning that your "tx based flow control" only needed
changes to the internal implementation of the LSP-sender.
That's not the case. Your algorithm also depends on behaviour
of the LSP-receiver. I did not see that mentioned anywhere before.
Good to see that you (and Tony) now acknowledge this necessity.

I hope you also realize (and agree) that changing the algorithm
to send PSNPs on the LSP-receiver, in a way to improve the
flow-control algorithm for the LSP-sender, will probably have a
negative impact on the current efficiency of bundling acks in
PSNPs. And that change can multiply the number of PSNPs (and thus
ISIS PDUs in input queues) that need to be received on routers.
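
To make the bundling concern concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a receiver that sends PSNPs only when its partialSNPInterval timer fires and packs as many acks as fit into each PSNP. The function name, the LSP arrival rate, and the figure of 90 acks per PSNP are illustrative assumptions, not values taken from the base specification or from this thread.

```python
import math


def psnps_per_second(lsp_rate, interval_s, acks_per_psnp):
    """Estimate how many PSNPs per second a receiver emits on one interface.

    lsp_rate      -- LSPs received per second (assumed steady)
    interval_s    -- partialSNPInterval in seconds
    acks_per_psnp -- how many LSP acks fit in one PSNP (assumed value)
    """
    acks_per_interval = lsp_rate * interval_s
    if acks_per_interval <= 0:
        return 0.0
    # Each timer expiry sends just enough PSNPs to carry the pending acks.
    psnps_per_interval = math.ceil(acks_per_interval / acks_per_psnp)
    return psnps_per_interval / interval_s


# Hypothetical numbers: 100 LSPs/s arriving, 90 acks fit in one PSNP.
slow_timer = psnps_per_second(100, 2.0, 90)   # default 2 second interval
fast_timer = psnps_per_second(100, 0.1, 90)   # interval cut to 100 ms
print(slow_timer, fast_timer)
```

With these assumed numbers the 2 second timer yields about 1.5 PSNPs per second, while the 100 ms timer yields about 10 per second, each carrying far fewer acks. That is the multiplication effect described above.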

> If you don’t like the name we can certainly find something more
> appealing.


I don't care much about the name.
(In general I do care about naming in programming. And even 10x more about
naming in protocol documents. But that's not important in the discussion
at the moment.)

The point I was trying to get across is that your proposal is not
something that happens internally on a single individual router. It is
an algorithm that involves 2 routers. And thus it is a protocol issue.


> What I am proposing does not require protocol extensions -
> therefore no draft is required.


Protocols do not only describe octets on the wire. They also describe
behaviour. Thus, as Tony has already said, your proposed algorithm
also needs to be documented. Probably in an RFC.


> Whether a BCP draft is desired is something I am open to considering.


I don't know much about process in the IETF. But I was always under
the assumption that BCPs were mostly network design/configuration
recommendations for network operators.


From an earlier email:

> [Les:] I think you know what I am about to say.. :)


Yes, my question of why we use exponential backoffs was a rhetorical
question (as I wrote at the end of my email).
I wrote:
> I hope it is clear to everyone that these are not serious questions.
> I'm just saying: "sometimes fast is slow".


FYI, few people probably know this, but I happen to be the guy that
initially came up with the idea of exponential backoffs in IGPs.
(Back in 1999 when I was at Cisco).

Anyway, to reiterate my point: "sometimes fast is slow". It seems we
now all agree that sending LSPs "rapidly" and then assuming retransmissions
will fix any problems is an approach that is way too naive. Good.

henk.

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Les Ginsberg (ginsberg)
Tony -

From: Tony Li  On Behalf Of tony...@tony.li
Sent: Wednesday, July 24, 2019 3:37 PM
To: Les Ginsberg (ginsberg) 
Cc: stephane.litkow...@orange.com; lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding


Les,


If you disagree please take things bullet-by-bullet:


  *   LSP input queue implementations are typically interface independent FIFOs


Very true.  It would not be unreasonable for an implementation to report free 
space in the FIFO (in number of PDUs) divided by the number of active 
adjacencies.  Everyone gets their fair share.

[If dynamic flooding is enabled, this could be based on the number of 
adjacencies that should be actively flooding. That should be a much smaller 
number.]
[Les:] So you are agreeing that when a receiver wants to “dial back” it will 
need to do so on all interfaces enabled for flooding?



  *   Overloaded Receiver does not know which senders are disproportionately 
causing the overflow


This doesn’t matter.  The receiver needs them all to slow down.



  *   LSPs may be dropped at lower layers – IS-IS receiver may be unaware that 
the overload condition exists


That’s an implementation problem. The implementation NEEDS to be able to see 
its input queue plus input drops.

[Les:] And you want to ship this feature when…? 
I think this is a difficult ask.
Before we decide this is what is required we should explore the path of 
monitoring the unacknowledged Tx queue.


  *   Updating hellos dynamically to alter flooding transmission rate is an OOB 
signaling mechanism consuming  resources at a time when routers are the most 
busy
  *   Consistent flooding rates will require updated hellos be sent to all 
neighbors – exacerbating the cost on both sender and receiver


This is why I suggest sending the feedback in PSNPs as well as in IIHs.  
Regardless of the details, we need to consider sending PSNPs back more 
frequently.  I concur that optimizing the rate and triggers for sending more 
PSNPs is an open issue.

Strictly speaking, sending a TLV inside of our protocol PDUs is an in-band 
signaling mechanism.
[Les:] I agree – PSNP would be better since we need to send it anyway in order 
to ACK. Still does not convince me this is the preferred approach – but I agree 
it is better than hellos.

The resources consumed by maintaining a running count of a queue in silicon or 
in process space is effectively zero.

[Les:] It is not about counting – it is about how a given queue might be used. 
It isn’t reasonable to mandate that a dataplane-to-forwarding plane queue be 
dedicated to IS-IS. What other control plane entities are using the queue and 
how they empty it will introduce new variables. And the implementation cost 
comes in providing “real time updates” on the current queue space to clients 
that need it.

I really think monitoring the unacknowledged TX queue will give us what we need 
and make the solution completely contained within the IS-IS implementation.
Guess I need to work on more details on that approach.

   Les


Tony




Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread tony . li

Les,

> If you disagree please take things bullet-by-bullet:
>  
> LSP input queue implementations are typically interface independent FIFOs


Very true.  It would not be unreasonable for an implementation to report free 
space in the FIFO (in number of PDUs) divided by the number of active 
adjacencies.  Everyone gets their fair share.

[If dynamic flooding is enabled, this could be based on the number of 
adjacencies that should be actively flooding. That should be a much smaller 
number.]
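
A minimal sketch of the fair-share calculation described above, written in Python. The function name and the example numbers are hypothetical; the only thing taken from the text is the rule itself: free space in the shared FIFO (counted in PDUs) divided by the number of adjacencies that are expected to flood.

```python
def advertised_share(free_pdus, flooding_adjacencies):
    """Return the per-neighbor share of the free input-queue space.

    free_pdus            -- free slots in the shared LSP input FIFO, in PDUs
    flooding_adjacencies -- adjacencies expected to be actively flooding
    """
    if flooding_adjacencies <= 0:
        # Nobody to advertise to; report no share rather than divide by zero.
        return 0
    # Integer division: never promise more slots than actually exist.
    return free_pdus // flooding_adjacencies


# Hypothetical router: 120 free slots in the FIFO.
all_adjacencies = advertised_share(120, 40)   # every adjacency counted
dynamic_flooding = advertised_share(120, 4)   # only the flooding topology
print(all_adjacencies, dynamic_flooding)
```

In this made-up case each neighbor would be offered 3 PDUs when all 40 adjacencies are counted, and 30 PDUs when only the 4 adjacencies on the flooding topology are counted, which is the point of the bracketed remark.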


> Overloaded Receiver does not know which senders are disproportionately 
> causing the overflow


This doesn’t matter.  The receiver needs them all to slow down.


> LSPs may be dropped at lower layers – IS-IS receiver may be unaware that the 
> overload condition exists


That’s an implementation problem. The implementation NEEDS to be able to see 
its input queue plus input drops.


> Updating hellos dynamically to alter flooding transmission rate is an OOB 
> signaling mechanism consuming  resources at a time when routers are the most 
> busy
> Consistent flooding rates will require updated hellos be sent to all 
> neighbors – exacerbating the cost on both sender and receiver


This is why I suggest sending the feedback in PSNPs as well as in IIHs.  
Regardless of the details, we need to consider sending PSNPs back more 
frequently.  I concur that optimizing the rate and triggers for sending more 
PSNPs is an open issue.

Strictly speaking, sending a TLV inside of our protocol PDUs is an in-band 
signaling mechanism.

The resources consumed by maintaining a running count of a queue in silicon or 
in process space is effectively zero.

Tony




Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread tony . li


> Whether a BCP draft is desired is something I am open to considering.


Process nit: what we’re talking about doing, regardless of how we do it, is 
overriding 10589.  As that’s a normative reference, overriding that requires 
that we have a standards track document.

I don’t believe that even a BCP is sufficient. 

Well, that’s my understanding, at least.

Tony




Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread tony . li

Les,


> This thread reminds me of how easy it is to miscommunicate – and I bear some 
> of the responsibility for that.


Communication takes two.  The receiver must also be focused on the input. My 
bad too.


> I don’t see anything in there about changing the PSNP signaling rate.  From 
> your comments to Henk, I infer that you’re open to changing that rate.
> [Les:] The proposal in the slides is simply an example/straw man. I did not 
> spend a lot of time on it – in fact in the first draft of the slides I did 
> not even provide a proposal. It certainly needs more refinement.
> It is meant only to illustrate how we can do things w/o requiring the receive 
> side to do calculations for which the raw data may be difficult and w/o 
> requiring new TLVs.


Understood.  The difficulty in implementation is not our primary concern and 
something of a chicken and egg situation: if we do not ask to get the data 
about the input critical path, we will not get it.

We’ve had a very similar situation with IP option handling: IP header options 
used to be very, very rare, so the first generation of forwarding ASICs passed 
off options for software-based forwarding. This made options slow.  IPv6 
designers then decided that options were slow, so they weren’t going to use 
options.  ;-)

If we need data from the platform silicon, then we should ask for it.  We’re 
not going to get it if we don’t ask.  And without specifics about what’s going 
on, we’re not going to make good approximations to the optimal rate.


> As soon as you do that, you’re now providing receiver based feedback and 
> creating flow control.  You’re accepting that rates will vary per interface.
>  
> [Les:] Yes – but only when we know that continuing to send at a high rate 
> isn’t useful. It isn’t meant to fix things (as I keep emphasizing) and in a 
> network that works as intended it should never be necessary.


Ok, below you say that you’re willing to be aggressive.  But being aggressive 
means that we WILL push things into flow control.


> What you’re NOT doing is providing information about the receiver’s input 
> queue and requested input rate.  With less information, the transmitter can 
> only approximate the optimal rate and your proposal seems like a Newton’s 
> method approach to determining that rate.  
>  
> [Les:] For all of the implementations I have worked on (5 now – across 3 
> different vendors – not all still available) such information is not 
> easily determined. Buffer pools are shared among many components, input 
> queues may have multiple stages not all of which are visible to the routing 
> protocol. Plus, since once flow control is needed there is already a problem, 
> this isn’t fixing things – it is just trying to get by.
>  
> A solution which depends on current receiver state “all the time” is hard – 
> and hard to optimize. And I think we don’t need that degree of precision for 
> optimal operation.



No one is suggesting “all the time”.   The point is to provide more information 
than what’s currently in the PSNP.  The more information the transmitter has, 
the more accurately it can approximate the optimal transmit rate and maximize 
goodput.
 

> Your proposal depends on two constants: Usafe and Umax.  How do you know what 
> those are?
>  
> [Les:] Not yet.


That seems like a core problem.


> That’s information about the receiver. 
>  
> [Les:] Happy to agree to that.


Yay!  One more thing that we agree on.  :-)


> I infer that you propose to hard code some conservative values for these.  In 
> my mind, that implies that you will be going more slowly than you could if 
> you had more accurate data.  And pretty much what we’re proposing is that the 
> receiver advertise this type of information so that we don’t have to assume 
> the worst case.  This also is nice because an implementation only has to know 
> about its own capabilities.
>  
> [Les:] I expect the values to be aggressive – because the downside of 
> flooding LSPs too fast for (say) a few seconds is small.


Hmmm…. I’m not yet convinced.  Elsewhere on the thread, you said that a good 
implementation should prioritize receiving IIHs over LSPs and *SNPs.  I concur 
with that, but in my experience (5 vendors) I only know of one that implemented 
that in silicon, and that one’s not available (sob!).

Thus, my concern is that without a good approximation for the goodput rate, we 
will either chronically underestimate, penalizing convergence, or chronically 
overestimate, risking the stability of the network.

Tony



Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread tony . li
Les,

Ok, let me reset.  I’ve re-read your slides.
 
I don’t see anything in there about changing the PSNP signaling rate.  From 
your comments to Henk, I infer that you’re open to changing that rate.

As soon as you do that, you’re now providing receiver based feedback and 
creating flow control.  You’re accepting that rates will vary per interface.

What you’re NOT doing is providing information about the receiver’s input queue 
and requested input rate.  With less information, the transmitter can only 
approximate the optimal rate and your proposal seems like a Newton’s method 
approach to determining that rate.  

Your proposal depends on two constants: Usafe and Umax.  How do you know what 
those are?

That’s information about the receiver. 

I infer that you propose to hard code some conservative values for these.  In 
my mind, that implies that you will be going more slowly than you could if you 
had more accurate data.  And pretty much what we’re proposing is that the 
receiver advertise this type of information so that we don’t have to assume the 
worst case.  This also is nice because an implementation only has to know about 
its own capabilities.

Tony


> On Jul 24, 2019, at 12:31 PM, Les Ginsberg (ginsberg)  
> wrote:
> 
> Tony –
>  
> I have NEVER proposed that the flooding rate be determined by the slowest 
> node.
> Quite the opposite.
>  
> Flooding rate should be based on the target convergence time and should be 
> aggressive because most topology changes involve much fewer than 1000 LSPs 
> (arbitrary number). So even w a slow node fast flooding won’t be an issue for 
> the vast majority of changes.
>  
> When we get a topology change with enough LSPs to expose the slowest node 
> limitations we (in decreasing order of importance):
>  
> 1) Continue to flood fast to those nodes/links which can handle it
> 2) Report the slow node to the operator (so they can address the limitation)
> 3) Do what we can to limit the overload on the slow node/link
>  
> Hope this helps.
>  
>Les
>  
>  
> From: Tony Li <tony1ath...@gmail.com> On Behalf Of tony...@tony.li
> Sent: Wednesday, July 24, 2019 12:04 PM
> To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>
> Cc: lsr@ietf.org
> Subject: Re: [Lsr] Dynamic flow control for flooding
>  
>  
> Les,
>  
>  
> Optimizing the throughput through a slow receiver is pretty low on my list 
> because the ROI is low.
>  
>  
> Ok, I disagree. The slow receiver is the critical path to convergence.  Only 
> when the slow receiver has absorbed all changes and SPFed do we have 
> convergence.
>  
>  
> First, the rate that you select might be too fast for one neighbor and not 
> for the others.  Real flow control would help address this.
>  
> [Les:] At the cost of convergence. Not a good tradeoff.
> I am arguing that we do want to flood at the same rate on all interfaces used 
> for flooding. When we cannot, flow control does not help with convergence. It 
> may decrease some wasted bandwidth – but as we all agree that bandwidth isn’t 
> a significant limitation this isn’t a great concern.
>  
>  
> Rate limiting flooding delays convergence.  Please consider the following 
> topology:
>  
>  
> 1 —— 2 —— 3
> |    |    |
> |    |    |
> 4 —— 5 —— 6
> |    |    |
> |    |    |
> 7 —— 8 —— 9
>  
>  
> Suppose that we have 1000 LSPs injected at router 1.  Suppose further that 
> router 2 runs at half the rate of router 4.  [How router 1 knows this 
> requires $DEITY and is out of scope for the moment.]
>  
> Router 1 now floods at the optimal rate for router 2.  Router 1 uses that 
> same rate to flood to router 4.  Suppose that it takes time T for this to 
> complete.
>  
> When does the network converge?
>  
> Option 1: All nodes use the same flooding rate.
>  
> Router 2 will flood to router 3 concurrent with receiving updates from router 
> 1. Thus, router 3 will receive all updates in time T + delta, where delta is 
> router 2’s processing time.  For now, let’s approximate delta as zero.
>  
> Similarly, all routers will use the same rate, so router 4 will flood to 7 in 
> time T + delta, and so on, with router 9 receiving everything in time T + 3 * 
> delta.
>  
> Assuming no nodes SPF during the process, the network converges nearly 
> simultaneously in about time T.
>  
> Option 2: We flood a bit faster where we can.
>  
> Suppose that router 1 now floods at the full rate to router 4.  The full 
> update now takes time T/2.  Because all of the other nodes in the network are 
> fast, router 4 floods in time T/2 + delta to nodes

Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Les Ginsberg (ginsberg)
Tony –

I have NEVER proposed that the flooding rate be determined by the slowest node.
Quite the opposite.

Flooding rate should be based on the target convergence time and should be 
aggressive because most topology changes involve much fewer than 1000 LSPs 
(arbitrary number). So even w a slow node fast flooding won’t be an issue for 
the vast majority of changes.

When we get a topology change with enough LSPs to expose the slowest node 
limitations we (in decreasing order of importance):

1) Continue to flood fast to those nodes/links which can handle it
2) Report the slow node to the operator (so they can address the limitation)
3) Do what we can to limit the overload on the slow node/link

Hope this helps.

   Les


From: Tony Li  On Behalf Of tony...@tony.li
Sent: Wednesday, July 24, 2019 12:04 PM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding


Les,


Optimizing the throughput through a slow receiver is pretty low on my list 
because the ROI is low.


Ok, I disagree. The slow receiver is the critical path to convergence.  Only 
when the slow receiver has absorbed all changes and SPFed do we have 
convergence.


First, the rate that you select might be too fast for one neighbor and not for 
the others.  Real flow control would help address this.

[Les:] At the cost of convergence. Not a good tradeoff.
I am arguing that we do want to flood at the same rate on all interfaces used 
for flooding. When we cannot, flow control does not help with convergence. It 
may decrease some wasted bandwidth – but as we all agree that bandwidth isn’t a 
significant limitation this isn’t a great concern.


Rate limiting flooding delays convergence.  Please consider the following 
topology:


1 —— 2 —— 3
|    |    |
|    |    |
4 —— 5 —— 6
|    |    |
|    |    |
7 —— 8 —— 9


Suppose that we have 1000 LSPs injected at router 1.  Suppose further that 
router 2 runs at half the rate of router 4.  [How router 1 knows this requires 
$DEITY and is out of scope for the moment.]

Router 1 now floods at the optimal rate for router 2.  Router 1 uses that same 
rate to flood to router 4.  Suppose that it takes time T for this to complete.

When does the network converge?

Option 1: All nodes use the same flooding rate.

Router 2 will flood to router 3 concurrent with receiving updates from router 
1. Thus, router 3 will receive all updates in time T + delta, where delta is 
router 2’s processing time.  For now, let’s approximate delta as zero.

Similarly, all routers will use the same rate, so router 4 will flood to 7 in 
time T + delta, and so on, with router 9 receiving everything in time T + 3 * 
delta.

Assuming no nodes SPF during the process, the network converges nearly 
simultaneously in about time T.

Option 2: We flood a bit faster where we can.

Suppose that router 1 now floods at the full rate to router 4.  The full update 
now takes time T/2.  Because all of the other nodes in the network are fast, 
router 4 floods in time T/2 + delta to nodes 5 and 7.  Carrying this forward, 
router 9 gets a full update in time T/2 + 3 * delta.  Even router 3 has full 
updates in T/2 + 3 * delta.

With the exception of node 2, the network has converged in half the time.  Even 
node 2 converges in time T.
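
The two options can be checked with a small Python sketch. It is a toy model, not a simulator: it assumes router 1 holds all LSPs at time 0, that every router forwards LSPs while still receiving them (pipelining), that the time a router needs to take in the full set depends only on its own receive rate, and that each hop adds a fixed processing delay delta. The function name, the values of T and DELTA, and the edge list encoding of the 3x3 grid are all illustrative choices.

```python
import heapq


def completion_times(edges, source, intake_time, delta):
    """Time at which each router holds the full set of LSPs.

    intake_time(v) is how long router v needs to absorb all LSPs at the
    rate its neighbors send to it. A router cannot finish earlier than
    its upstream neighbor plus delta, nor earlier than its own intake time.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    done = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > done[u]:
            continue  # stale heap entry, a better time was already found
        for v in adj[u]:
            cand = max(t + delta, intake_time(v))
            if cand < done.get(v, float("inf")):
                done[v] = cand
                heapq.heappush(heap, (cand, v))
    return done


# The 3x3 grid from the example: rows 1-2-3, 4-5-6, 7-8-9.
GRID = [(1, 2), (2, 3), (4, 5), (5, 6), (7, 8), (8, 9),
        (1, 4), (2, 5), (3, 6), (4, 7), (5, 8), (6, 9)]
T, DELTA = 10.0, 0.1  # arbitrary units

# Option 1: everyone is paced at router 2's rate, so every intake takes T.
option1 = completion_times(GRID, 1, lambda v: T, DELTA)
# Option 2: only router 2 is slow; every other router absorbs in T/2.
option2 = completion_times(GRID, 1, lambda v: T if v == 2 else T / 2, DELTA)

for router in sorted(option1):
    print(router, option1[router], option2[router])
```

Under these assumptions the model reproduces the figures in the text: in option 1 router 9 finishes at T + 3 * delta, while in option 2 routers 3 and 9 finish at T/2 + 3 * delta and only router 2 needs the full time T.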

Key points:

1) Yes, the slow node delays convergence and causes micro-loops as everyone 
around it SPFs.  The point here (and I think you agree) is that slow nodes need 
to be upgraded.

2) There is no way for us to know how fast a node can go without some form of 
flow control, other than to go absurdly slowly.

3) There are many folks who want to converge quickly.  It is mission critical 
for them.  They will address slow nodes. They will not accept pessimal timing 
to avoid micro-loops.



[Les:] I do not see how flow control improves things.


Flow control allows the transmitter to transmit at the optimal rate for the 
receiver.



Dropping down to the least common denominator CPU speed in the entire network 
is going to be undoable without an oracle, and absurdly slow even with that.

[Les:] Never advocated that – please do not put those words in my mouth.


How is that different than what you’ve proposed?  Router 1 can only flood at 
the rate that it gets PSNPs from router 2.  That paces its flooding to router 
4.  Following that logic, you somehow want router 4 to run at the same rate, 
forcing a uniformly slow rate.

Tony





Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread stephane.litkowski
Les,

From the transmitter's point of view, can’t we use the lack of ACKs from the 
receiver as a signal that the transmitter should slow down?
I agree that, depending on the exact bottleneck/issue, the IS-IS stack of the 
receiver may not be aware that something bad is happening and so can’t provide 
feedback to the transmitter. However, if the transmitter sees that the receiver 
is acking slowly or quickly, wouldn’t it be able to adjust its flooding speed? 
Can a receiver intentionally postpone a PSNP to aggregate multiple 
ACKs in a single message?
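
One way to picture a transmitter that reacts to ack lag is the Python sketch below. It is only an illustration of the idea, not the algorithm from Les's slides: the class name, the linear ramp between the two thresholds, and all numeric values are assumptions. The threshold names u_safe and u_max are borrowed from the Usafe and Umax constants mentioned elsewhere in the thread, but their exact meaning here is guessed.

```python
class TxPacer:
    """Pick a flooding rate from the number of unacknowledged LSPs.

    At or below u_safe unacked LSPs the sender floods at fast_rate; at or
    above u_max it drops to slow_rate; in between it ramps down linearly.
    (Hypothetical policy, chosen only to keep the sketch short.)
    """

    def __init__(self, u_safe, u_max, fast_rate, slow_rate):
        self.u_safe = u_safe
        self.u_max = u_max
        self.fast_rate = fast_rate
        self.slow_rate = slow_rate
        self.unacked = set()

    def on_send(self, lsp_id):
        # An LSP stays in the set until a PSNP acknowledges it.
        self.unacked.add(lsp_id)

    def on_psnp(self, acked_ids):
        # One PSNP may acknowledge many LSPs at once (bundled acks).
        self.unacked.difference_update(acked_ids)

    def rate(self):
        n = len(self.unacked)
        if n <= self.u_safe:
            return self.fast_rate
        if n >= self.u_max:
            return self.slow_rate
        frac = (n - self.u_safe) / (self.u_max - self.u_safe)
        return self.fast_rate - frac * (self.fast_rate - self.slow_rate)


pacer = TxPacer(u_safe=10, u_max=50, fast_rate=1000.0, slow_rate=100.0)
for lsp in range(30):
    pacer.on_send(lsp)
print(pacer.rate())          # 30 unacked: halfway down the ramp
pacer.on_psnp(range(25))     # a PSNP acks 25 of them
print(pacer.rate())          # 5 unacked: back to the fast rate
```

Note that a receiver which delays PSNPs to bundle acks makes the unacked count look worse than it is, so a sender using a scheme like this would need thresholds that account for the receiver's PSNP interval.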

From: Les Ginsberg (ginsberg) [mailto:ginsb...@cisco.com]
Sent: Wednesday, July 24, 2019 14:48
To: tony...@tony.li
Cc: LITKOWSKI Stephane OBS/OINIS; lsr@ietf.org
Subject: RE: [Lsr] Dynamic flow control for flooding

More inline…

From: Tony Li  On Behalf Of tony...@tony.li
Sent: Tuesday, July 23, 2019 10:56 PM
To: Les Ginsberg (ginsberg) 
Cc: stephane.litkow...@orange.com; lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding


Les,

There is something to be said for simply “flooding fast” and not worrying about 
flow control at all (regardless of whether TX or RX mechanisms would be used).


The only thing I would say to that is: really bad idea.

[Les:] I have to watch my words more closely. 
I am not arguing for this – but I do think that “most of the time” this 
strategy would actually be optimal.
We are discussing the extreme cases – as we should – where we have a large # of 
LSPs to flood. But let’s not lose sight of the fact that the simple approach 
works most of the time. For the times when the simple approach doesn’t work 
well, I am then arguing we should not overcomplicate the solution – 
particularly because the strategies we might use don’t help convergence.

If you supersaturate the receiver, you waste transmitter resources in the transmission, 
you swamp the receiver causing packet loss, you potentially trigger the loss of 
IIH’s, you risk causing a cascade failure, and until we come up with a better 
retransmission mechanism, you probably actually delay network convergence, as 
nothing is going to happen until you’ve completed retransmissions.
[Les:] Prioritization of hellos over LSPs/SNPs is a longstanding best practice 
(both on Tx and Rx) and this must not change. No one is advocating that – 
certainly not me.

The way to maximize goodput is NOT to transmit blindly.


[Les:] Not arguing for blindness, but I am arguing for simplicity.

But most important to me is to recognize that flow control (however done) is 
not fixing anything – nor making the flooding work better. The network is 
compromised and flow control won’t fix it.


 The network is not compromised.

[Les:] If the SLA the customer expects is convergence in less than N, then a 
slow link jeopardizes our ability to achieve that.

If you accept that, then it makes sense to look for the simplest way to do flow 
control and that is decidedly not from the RX side. (I expect Tony Li to 
disagree with that – but I have already outlined why it is more complex to do 
it from the Rx side.)



Flow control cannot be done without involvement of the RX side.  That’s why 
it’s called flow _control_.  The only thing that can be done purely from the TX 
side is a unilateral (and pessimal) transmit rate cap that will have to allow 
for the worst case CPU in the worst case situation (e.g., BGP impacting the 
CPU).

Flow control is the creation of a control loop and that requires feedback from 
the receiver.  This is true in every form of true flow control: XON/XOFF, 
RTS/CTS, sliding window protocols, credit based fabric mechanisms, etc.

I’ll go so far as to quote Wikipedia:

"In data communications<https://en.wikipedia.org/wiki/Data_communications>, 
flow control is the process of managing the rate of data transmission between 
two nodes to prevent a fast sender from overwhelming a slow receiver. It 
provides a mechanism for the receiver to control the transmission speed, so 
that the receiving node is not overwhelmed with data from transmitting node.”

[Les:] I will not argue about the definition.
In this specific case, there are difficulties in controlling the flooding rate 
based on advertisements from the RX side. The difficulties are outlined in my 
slides and largely have to do with the difficulties/costs of dynamically 
calculating what number to advertise. (A static advertisement is also difficult 
to calculate w/o being overly conservative.)

If you disagree please take things bullet-by-bullet:


  *   LSP input queue implementations are typically interface independent FIFOs
  *   Overloaded Receiver does not know which senders are disproportionately 
causing the overflow
  *   LSPs may be dropped at lower layers – IS-IS receiver may be unaware that 
the overload condition exists
  *   Updating hellos dynamically to alter flooding transmission rate is an OOB 
signaling mechanism consuming  resources at a time when routers are the most 
busy
  *   Consistent flo

Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Les Ginsberg (ginsberg)
Henk -

Welcome to the discussion.

Inline.

> -Original Message-
> From: henk.i...@xs4all.nl 
> Sent: Wednesday, July 24, 2019 5:34 AM
> To: Les Ginsberg (ginsberg) 
> Cc: stephane.litkow...@orange.com; Tony Li ; lsr@ietf.org
> Subject: Re: [Lsr] Dynamic flow control for flooding
> 
> 
> Hello Les,
> 
> Les Ginsberg (ginsberg) schreef op 2019-07-24 07:17:
> >
> > There is something to be said for simply “flooding fast” and not
> > worrying about flow control at all (regardless of whether TX or RX
> > mechanisms would be used). Some packets would be dropped, but
> > retransmission timers will ensure that the flooding eventually
> > succeeds and retransmit timers are long (5 seconds by default). (I am
> > not the only one mentioning this BTW…)
> 
> Why do we have initial waits and exponential backoffs for LSP-generation
> and SPF-computations ? Why not react immediately ? Why not react
> constantly ?
> 
[Les:] I think you know what I am about to say.. 

Initial wait exists in order to minimize premature SPF calculation. Topology 
changes always entail multiple LSP updates (often 2 for a single link failure - 
more for a node failure or line card failure).
We wait a short while to increase the likelihood that we have received all of 
the LSP updates associated w the topology event.

Backoff is used so that if the network has an extended period of instability we 
do not invest cycles w diminishing returns i.e., the updates we are making to 
forwarding are likely stale before we even complete the update.
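
As a rough illustration of the initial wait plus backoff behaviour described above, here is a minimal Python sketch. It assumes a simple doubling scheme with a cap; the function name and the millisecond values are hypothetical, and real implementations differ in how they grow and reset the timer.

```python
def backoff_schedule(initial_ms, max_ms, events):
    """Return the wait applied before each of `events` consecutive SPF runs.

    The first trigger waits initial_ms so that related LSP updates can
    arrive; every further trigger during the same period of instability
    doubles the wait, capped at max_ms.
    """
    waits = []
    wait = initial_ms
    for _ in range(events):
        waits.append(wait)
        wait = min(wait * 2, max_ms)
    return waits


# Hypothetical timers: 50 ms initial wait, 5000 ms ceiling.
print(backoff_schedule(50, 5000, 9))
```

With these example timers the waits grow 50, 100, 200, ... and settle at the 5000 ms ceiling from the eighth trigger on, so a flapping network costs a bounded number of SPF runs per minute instead of one per LSP.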

> We have a lot of bandwidth and cpu-power now. Isn't simple always better
> than "overly complex stuff" like exponential backoffs ? If you have more
> cpu-power, more memory and more bandwidth, why invent new algorithms ?

[Les:] Base spec never defined an algorithm for dynamically changing flooding 
speed (unless you consider retransmit timers). So this is "inventing" - not 
"reinventing".

   Les
> 
> henk.
> 
> 
> I hope it is clear to everyone that these are not serious questions. I'm
> just saying: "sometimes fast is slow". I am sure that if we ask the "old
> guys", they can come up with many stories how these problems are sometimes
> counter-intuitive. And how networks have melted because "fast is slow".
> I could tell at least 2 of those stories myself.


Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Les Ginsberg (ginsberg)
More inline…

From: Tony Li  On Behalf Of tony...@tony.li
Sent: Tuesday, July 23, 2019 10:56 PM
To: Les Ginsberg (ginsberg) 
Cc: stephane.litkow...@orange.com; lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding


Les,

There is something to be said for simply “flooding fast” and not worrying about 
flow control at all (regardless of whether TX or RX mechanisms would be used).


The only thing I would say to that is: really bad idea.

[Les:] I have to watch my words more closely. 
I am not arguing for this – but I do think that “most of the time” this 
strategy would actually be optimal.
We are discussing the extreme cases – as we should – where we have a large # of 
LSPs to flood. But let’s not lose sight of the fact that the simple approach 
works most of the time. For the times when the simple approach doesn’t work 
well, I am then arguing we should not overcomplicate the solution – 
particularly because the strategies we might use don’t help convergence.

If you supersaturate the receiver, you waste transmitter resources in the transmission, 
you swamp the receiver causing packet loss, you potentially trigger the loss of 
IIH’s, you risk causing a cascade failure, and until we come up with a better 
retransmission mechanism, you probably actually delay network convergence, as 
nothing is going to happen until you’ve completed retransmissions.
[Les:] Prioritization of hellos over LSPs/SNPs is a longstanding best practice 
(both on Tx and Rx) and this must not change. No one is advocating that – 
certainly not me.

The way to maximize goodput is NOT to transmit blindly.


[Les:] Not arguing for blindness, but I am arguing for simplicity.


But most important to me is to recognize that flow control (however done) is 
not fixing anything – nor making the flooding work better. The network is 
compromised and flow control won’t fix it.


 The network is not compromised.

[Les:] If the SLA the customer expects is convergence in less than N, then a 
slow link jeopardizes our ability to achieve that.


If you accept that, then it makes sense to look for the simplest way to do flow 
control and that is decidedly not from the RX side. (I expect Tony Li to 
disagree with that – but I have already outlined why it is more complex to do 
it from the Rx side.)



Flow control cannot be done without involvement of the RX side.  That’s why 
it’s called flow _control_.  The only thing that can be done purely from the TX 
side is a unilateral (and pessimal) transmit rate cap that will have to allow 
for the worst case CPU in the worst case situation (e.g., BGP impacting the 
CPU).

Flow control is the creation of a control loop and that requires feedback from 
the receiver.  This is true in every form of true flow control: XON/XOFF, 
RTS/CTS, sliding window protocols, credit based fabric mechanisms, etc.

I’ll go so far as to quote Wikipedia:

"In data communications<https://en.wikipedia.org/wiki/Data_communications>, 
flow control is the process of managing the rate of data transmission between 
two nodes to prevent a fast sender from overwhelming a slow receiver. It 
provides a mechanism for the receiver to control the transmission speed, so 
that the receiving node is not overwhelmed with data from transmitting node.”
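
The control loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not anything proposed on the list: the `Receiver` class, the `run` function, the queue size, and the per-tick drain rate are all hypothetical. It compares a sender that transmits blindly with one that limits each burst to the credits the receiver reports.

```python
from collections import deque


class Receiver:
    """Hypothetical receiver with a bounded input queue."""

    def __init__(self, queue_size):
        self.queue_size = queue_size
        self.queue = deque()

    def credits(self):
        # Feedback to the sender: how many more PDUs fit right now.
        return self.queue_size - len(self.queue)

    def receive(self, pdu):
        if len(self.queue) >= self.queue_size:
            return False  # queue overflow: the PDU is dropped
        self.queue.append(pdu)
        return True

    def process(self, n):
        # Drain up to n PDUs, as a CPU-limited receiver would per tick.
        for _ in range(min(n, len(self.queue))):
            self.queue.popleft()


def run(total, queue_size, drain_per_tick, use_feedback, burst=10):
    """Offer `total` PDUs to the receiver; return (ticks, dropped)."""
    rx = Receiver(queue_size)
    sent = dropped = ticks = 0
    while sent < total:
        # With feedback, never send more than the receiver has room for.
        allowed = min(burst, rx.credits()) if use_feedback else burst
        for _ in range(min(allowed, total - sent)):
            if not rx.receive(sent):
                dropped += 1
            sent += 1
        rx.process(drain_per_tick)
        ticks += 1
    return ticks, dropped


print(run(100, 20, 5, use_feedback=True)[1])   # 0 drops with feedback
print(run(100, 20, 5, use_feedback=False)[1] > 0)  # True: blind sending drops PDUs
```

The sketch ignores retransmission, so the blind case understates the real cost: every dropped PDU would have to be sent again later.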

[Les:] I will not argue about the definition.
In this specific case, there are difficulties in controlling the flooding rate 
based on advertisements from the RX side. The difficulties are outlined in my 
slides and largely have to do with the difficulties/costs of dynamically 
calculating what number to advertise. (A static advertisement is also difficult 
to calculate w/o being overly conservative.)

If you disagree please take things bullet-by-bullet:


  *   LSP input queue implementations are typically interface independent FIFOs
  *   Overloaded Receiver does not know which senders are disproportionately 
causing the overflow
  *   LSPs may be dropped at lower layers – IS-IS receiver may be unaware that 
the overload condition exists
  *   Updating hellos dynamically to alter flooding transmission rate is an OOB 
signaling mechanism consuming  resources at a time when routers are the most 
busy
  *   Consistent flooding rates will require updated hellos be sent to all 
neighbors – exacerbating the cost on both sender and receiver

   Les


Tony


___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Robert Raszuk
*[Les:] At the cost of convergence. Not a good tradeoff.*

Hi Les - why at the cost of convergence ? No one, as I see it, is proposing
the same rate to every peer. Quite the contrary -- a per-peer rate of flooding.
So if you keep flooding at high speed on fast links, convergence will not
be any slower with flow control.

Thx,
R.

On Wed, Jul 24, 2019 at 8:28 PM Les Ginsberg (ginsberg) 
wrote:

> Tony –
>
>
>
> Inline.
>
>
>
> *From:* Tony Li  *On Behalf Of *tony...@tony.li
> *Sent:* Tuesday, July 23, 2019 10:39 PM
> *To:* Les Ginsberg (ginsberg) 
> *Cc:* lsr@ietf.org
> *Subject:* Re: [Lsr] Dynamic flow control for flooding
>
>
>
>
>
>
>
> Les,
>
>
>
> I also think we all agree on the goal - which is to flood significantly
> faster than many implementations do today to handle deployments like the
> case you mention below.
>
>
>
>
>
> I agree with that, but you haven’t responded to the goal that I proposed:
> keep the receiver processing PDUs.
>
>
>
> *[Les:] My goals are:*
>
>
>
> *Optimize convergence.*
>
> *Minimize complexity.*
>
>
>
> *Optimizing the throughput through a slow receiver is pretty low on my
> list because the ROI is low.*
>
>
>
> Beyond this point, I have a different perspective.
>
>
>
> As network-wide convergence depends upon fast propagation of LSP changes –
> which in turn requires consistent flooding rates on all interfaces enabled
> for flooding – a properly provisioned network MUST be able to sustain a
> consistent flooding rate or the operation of the network will suffer. We
> therefore need to view flow control issues as indicative of a problem.
>
>
>
> It is a mistake to equate LSP flooding with a set of independent P2P
> “connections” – each of which can operate at a rate independent of the
> other.
>
>
>
> If we can agree on this, then I believe we will have placed the flow
> control problem in its proper perspective – in which case it will become
> easier to agree on the best way to implement flow control.
>
>
>
>
>
> I agree that we want network wide convergence.  However, I disagree that
> the right thing to do is to uniformly flood at the same rate on all
> interfaces.
>
>
>
> First, the rate that you select might be too fast for one neighbor and not
> for the others.  Real flow control would help address this.
>
>
>
> *[Les:] At the cost of convergence. Not a good tradeoff.*
>
> *I am arguing that we do want to flood at the same rate on all interfaces
> used for flooding. When we cannot, flow control does not help with
> convergence. It may decrease some wasted bandwidth – but as we all agree
> that bandwidth isn’t a significant limitation this isn’t a great concern.*
>
>
>
> *The same reasons that drive us to use same SPF delay and LSP Generation
> timers on all routers also apply to flooding rate.*
>
>
>
> Second, all flooding paths are not created equal.  You do not know, a
> priori, how to flood to get uniform network wide propagation.  The variance
> in CPU loading alone is going to cause different routers to flood at
> different rates, and so it may actually be optimal to flood quickly down
> one path, knowing that the data will reach the other end of the network and
> flood back before the slow CPU can absorb and flood.
>
>
>
> *[Les:] I do not see how flow control improves things.*
>
> *If you have alternate flooding paths around the slow link(s) this argues
> more that you should not have the slow link in the flooding topology in the
> first place. But this heads off topic – I think we agree that dynamic
> flooding is a separate discussion which we don’t want to bring into this
> thread.*
>
>
>
> Dropping down to the least common denominator CPU speed in the entire
> network is going to be undoable without an oracle, and absurdly slow even
> with that.
>
>
>
> *[Les:] Never advocated that – please do not put those words in my mouth.*
>
>
>
> *   Les*
>
>
>
>
>
> Tony
>
>


Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Les Ginsberg (ginsberg)
Tony –

Inline.

From: Tony Li  On Behalf Of tony...@tony.li
Sent: Tuesday, July 23, 2019 10:39 PM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding



Les,

I also think we all agree on the goal - which is to flood significantly faster 
than many implementations do today to handle deployments like the case you 
mention below.


I agree with that, but you haven’t responded to the goal that I proposed: keep 
the receiver processing PDUs.

[Les:] My goals are:

Optimize convergence.
Minimize complexity.

Optimizing the throughput through a slow receiver is pretty low on my list 
because the ROI is low.


Beyond this point, I have a different perspective.

As network-wide convergence depends upon fast propagation of LSP changes – 
which in turn requires consistent flooding rates on all interfaces enabled for 
flooding – a properly provisioned network MUST be able to sustain a consistent 
flooding rate or the operation of the network will suffer. We therefore need to 
view flow control issues as indicative of a problem.

It is a mistake to equate LSP flooding with a set of independent P2P 
“connections” – each of which can operate at a rate independent of the other.

If we can agree on this, then I believe we will have placed the flow control 
problem in its proper perspective – in which case it will become easier to 
agree on the best way to implement flow control.


I agree that we want network wide convergence.  However, I disagree that the 
right thing to do is to uniformly flood at the same rate on all interfaces.

First, the rate that you select might be too fast for one neighbor and not for 
the others.  Real flow control would help address this.

[Les:] At the cost of convergence. Not a good tradeoff.
I am arguing that we do want to flood at the same rate on all interfaces used 
for flooding. When we cannot, flow control does not help with convergence. It 
may decrease some wasted bandwidth – but as we all agree that bandwidth isn’t a 
significant limitation this isn’t a great concern.

The same reasons that drive us to use same SPF delay and LSP Generation timers 
on all routers also apply to flooding rate.

Second, all flooding paths are not created equal.  You do not know, a priori, 
how to flood to get uniform network wide propagation.  The variance in CPU 
loading alone is going to cause different routers to flood at different rates, 
and so it may actually be optimal to flood quickly down one path, knowing that 
the data will reach the other end of the network and flood back before the slow 
CPU can absorb and flood.

[Les:] I do not see how flow control improves things.
If you have alternate flooding paths around the slow link(s) this argues more 
that you should not have the slow link in the flooding topology in the first 
place. But this heads off topic – I think we agree that dynamic flooding is a 
separate discussion which we don’t want to bring into this thread.

Dropping down to the least common denominator CPU speed in the entire network 
is going to be undoable without an oracle, and absurdly slow even with that.

[Les:] Never advocated that – please do not put those words in my mouth.

   Les


Tony



Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread tony . li

Robert,

> The second part of the question was really about at what layer it makes most 
> sense to provide this control loop. 


To me, the obvious thing to do is to make minor revisions to the protocol. We 
need to:

- Add a TLV so that the receiver can provide feedback. IMHO, this should be in 
IIH’s and PSNPs.

- Add text to modify the transmitter's behavior.  In the presence of this TLV, 
the transmitter is released from 10589 compliance and may transmit (details 
TBD).

- Add text to modify the receiver's behavior.  If you support this feature, then 
add the TLV, send PSNPs more frequently (rate & trigger TBD).


> Options seem to be: 
> 
> * Invent new or use existing link layer flow control (IEEE)
> * Reuse existing transport layer (TCP) 
> * App layer (QUIC or QUIC like)


All of these seem like massive overkill.


> I guess it would be useful to up front list on what type of media this must 
> be supported as it may change the game drastically: 
> 
> * directly connected fiber p2p 
> * p2mp (via switch)
> * p2p over encapsulation 
> etc…


All of the above, plus legacy media too.  No reason why this doesn’t apply to 
100BaseT.  Bandwidth is not the constraint.

Tony




Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread tony . li

Robert,

Nothing has changed about the probability of network partitioning. That was 
simply a use case selected to motivate the discussion about flooding speed.

The entire discussion is almost orthogonal to dynamic flooding.  Let’s please 
take that out of the discussion.

Tony


> On Jul 24, 2019, at 7:38 AM, Robert Raszuk  wrote:
> 
> Hi,
> 
> Yes indeed. For your richly connected node restart problem, use of the 
> overload-bit should be explored, proposed, implemented. 
> 
> For the partition problem I have two general comments: 
> 
> a) If network partitions are likely to happen more often in the case of 
> dynamic flooding, perhaps, as already said before, we should increase the max 
> number of times a given LSP is to arrive at a flooding-optimized node. Two 
> may not be enough.
> 
> b) If protocol extensions help to mitigate the effects of network partition 
> via much faster repair, some folks may treat network partitions as a normal 
> operational model instead of re-architecting the network to make sure 
> network partition events are as rare as possible. 
> 
> Thx,
> R.
> 
> On Wed, Jul 24, 2019 at 4:12 PM Henk Smit wrote:
> 
> Hello Robert,
> 
> Tony brought up the example of a partitioned network.
> But there are more examples.
> 
> E.g. in a network there is a router with 1000 neighbors.
> (When discussing distributed vs centralized flooding-topology
>   reduction algorithms, I've been told these network designs exist).
> When such a router reboots/crashes/comes back up, all 1000 neighbors
> will create a new version of their own LSP. This causes 1000 different
> LSPs to be flooded through the network at the same time, impacting every
> router in the network.
> 
> The case I was thinking of myself was when a router in a large network
> boots. When it brings up a number of adjacencies, each neighbor will
> try to synchronize its LSPDB with the newly booted router. As the newly
> booted router will send empty CSNPs to each of its neighbors, each
> neighbor will start sending the full LSPDB. If such a network has 10k
> LSPs, and such a router has 100 neighbors, that router will receive
> 100 * 10k = 1 million LSPs. Having a faster and more efficient flooding
> transport, with flow-control, will make a reboot in such a topology less
> painful.
> 
> (In that last case, creative use of the overload-bit could prevent
> black-holing or microloops while IS-IS synchronizes its LSPDB after a
> reboot. Just like we used the overload-bit to solve the problem of slow
> convergence of BGP after a reboot, 22 years ago. I have no idea if there
> are any implementations that use the overload-bit to alleviate slow
> convergence of IS-IS after a reboot).
> 
> henk.
> 
> 
> Robert Raszuk wrote on 2019-07-24 15:33:
> > Hey Henk & all,
> > 
> > If acks for 1000 LSPs take 16 PSNPs (max 66 per PSNP), or even if the
> > full flooding takes as long as the 33 sec Tony mentioned - is this
> > really a problem ?
> > 
> > Remember we are not talking about protocol convergence after link flap
> > or node going down. We are talking about serious network partitioning
> > which itself may have lasted for minutes, hours or days. While just
> > considering absolute numbers yields a desire to go faster and faster, if
> > we put things in the overall perspective, is there really a problem to
> > be solved in the first place ?
> > 
> > Would there still be a problem if LSR WG recommends faster acking
> > maybe not for each LSP but for say 20 or 30 max ?
> > 
> > Thx,
> > R.
> 


Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Henk Smit

Les Ginsberg (ginsberg) wrote on 2019-07-23 22:29:


It is a mistake to equate LSP flooding with a set of independent P2P
“connections” – each of which can operate at a rate independent
of the other.


Of course, if some routers are slow, convergence in parts of the network
might be slow. But as Stephane has already suggested, it is up to the
operators to determine if slower convergence in parts of their network
is acceptable. E.g. they can choose to put fast/expensive/new routers
in the center of their network, and move older routers to, or buy
cheaper routers for, the edges of their network.


But I have a question for you, Les:

During your talk, or maybe in older emails, I was under the impression
that you wanted to warn about another problem. Namely microloops.
I am not sure I understand correctly. So let me try to explain what
I understood. And please correct me if I am wrong.


Between the time a link breaks, and the first router(s) start to repair
the network, some traffic is dropped. Bad for that traffic of course.
But the network keeps functioning. Once all routers have re-converged and
adjusted their FIBs, everything is fine again.

In the time in between, between the first router adjusting its FIB and
the last router adjusting its FIB, you can have microloops. Microloops
multiply traffic, which can cause the whole network to suffer from
congestion, impacting traffic that did not (originally) go over the
broken link.

So you want the transition from "wrong FIBs", that point over the broken
path, to "the final FIBs", where all traffic flows correctly, to happen
on all routers at once. That would make the network go from "drop some
traffic" to "forward over the new path" without a stage of "some
microloops" in between.

Am I correct ? Is this what you try to prevent ?
Is this why you want all flooding between routers go at the same speed ?

Thanks in advance,

henk.



Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Tony Przygienda
I'm under the impression the whole discussion got triggered by my rather
loose remark during the dinner based on, admittedly, my quite dated
implementation experience (with 2 disjoint distributed SPTs to reduce
flooding). I realized it seems not clearly spelled out on the thread. So to
hopefully clarify a bit: the scenario was not partition really, the
scenario was where I ended up in reduced graphs where rather large
"cliques" ended up being connected by long'ish thin chains (2 links
minimal cuts) which ended up "bottle-necking" the flooding on large changes
like redistribution/restart scenarios, causing a significant number of LSPs
to need refresh from "left" to "right" through the bottlenecks. Transients
on the network became longer to the point we ended up (then) scrapping the
attempt after considering how we would build two trees _per flooding
source_ to really deal with it which seemed like overkill (and RIFT does
BTW but only because of known topology properties) and then setting
overload for which we couldn't figure out a good dynamic way (too short
dangerous, too long as bad). All that with standard flooding rates (beauty
of having mix-of-implementations requirements posed). Controlling the span
in distributed fashion is of course much harder than centralized fashion so
one has to be careful not to compare apples with oranges here ...

that said, pretty much out of this thread

--- tony



On Wed, Jul 24, 2019 at 9:34 AM Robert Raszuk  wrote:

> Hey Henk & all,
>
> If acks for 1000 LSPs take 16 PSNPs (max 66 per PSNP), or even if the full
> flooding takes as long as the 33 sec Tony mentioned - is this really a
> problem ?
>
> Remember we are not talking about protocol convergence after link flap or
> node going down. We are talking about serious network partitioning which
> itself may have lasted for minutes, hours or days. While just considering
> absolute numbers yields a desire to go faster and faster, if we put things in
> the overall perspective, is there really a problem to be solved in the first
> place ?
>
> Would there still be a problem if LSR WG recommends faster acking maybe
> not for each LSP but for say 20 or 30 max ?
>
> Thx,
> R.
>
>
>
>
>
>
>
>
> On Wed, Jul 24, 2019 at 3:18 PM Henk Smit  wrote:
>
>>
>> Hello Les,
>>
>> Les Ginsberg (ginsberg) wrote on 2019-07-24 07:17:
>>
>> > If you accept that, then it makes sense to look for the simplest way
>> > to do flow control and that is decidedly not from the RX side. (I
>> > expect Tony Li to disagree with that  – but I have already
>> > outlined why it is more complex to do it from the Rx side.)
>>
>> In your talk on Monday you called the idea in
>> draft-decraene-lsr-isis-flooding-speed-01 "receiver driven flow
>> control".
>> You don't like that. You want "transmit based flow control".
>> You argued that you can do "transmit based flow control" on the sender
>> only.
>> Therefore your algorithm is merely a "local trick".
>> And "local tricks" don't need RFCs. I agree with that.
>> But I don't agree that your algorithm is just a "local trick".
>>
>>
>> In your algorithm, a "sender" sends a number of LSPs to a receiver.
>> Without waiting for acks (PSNPs). Like in any sliding window protocol.
>> The sending router keeps an eye on the number of unacked LSPs.
>> And it determines how fast it can send more LSPs based on the current
>> number of unacked LSPs. Every time the sender receives a PSNP, it
>> knows the receiver got a number of LSPs, so it can increase its
>> send-window again, and then send more LSPs.
>> Correct ?
>>
>> I agree that the core idea of this algorithm makes sense.
>> After all, it looks a lot like TCP.
>> I believe the authors of draft-decraene-lsr-isis-flooding-speed were
>> planning something like that for the next version of their draft.
>>
>>
>> However, I do not agree with the name "tx driven flow control".
>> I also do not agree that this algorithm is "a local trick".
>> Therefore I also do not agree that this algorithm doesn't need to be
>> documented (in an RFC).
>>
>> In your "tx based flow control", the sender (tx) sends LSPs at a rate
>> that is derived from the rate at which it receives PSNPs. Therefore
>> it is the sender of the PSNPs that sets the speed of transmission !
>> So it is still the receiver (of LSPs) that controls the flow control.
>> The name "tx based flow control" is a little misleading, imho.
>>
>>
>> It is important to realize that the success of your algorithm actually
>> depends on the behaviour of the receiver. How does it send PSNPs ?
>> Does it send one PSNP per received LSP ? Or does it pack multiple acks
>> in one PSNP ? Does it send a PSNP immediately, or does it wait a short
>> time ? Does it try to fill a PSNP to the max (putting ~90 acks in one
>> PSNP) ? Does the receiver do something in between ? I don't think
>> the behaviour is specified exactly anywhere.
>>
>> I know about an IS-IS implementation from the nineties. When a router
>> would receive an LSP, it would a) set the 
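
The sender-side window described in the quoted text can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the slides or from any implementation: the `LspSender` class, the `simulate` function, the fixed window size, and the idea of counting "rounds" are all hypothetical. The only thing it is meant to show is that the sender's progress is paced by how many acks each PSNP carries.

```python
class LspSender:
    """Hypothetical sender that bounds the number of unacknowledged LSPs."""

    def __init__(self, lsp_ids, window):
        self.pending = list(lsp_ids)  # LSPs not yet transmitted
        self.unacked = set()          # transmitted, awaiting an ack in a PSNP
        self.window = window          # maximum number of LSPs in flight

    def transmit(self):
        """Send as many LSPs as the window allows; return the IDs sent."""
        sent = []
        while self.pending and len(self.unacked) < self.window:
            lsp = self.pending.pop(0)
            self.unacked.add(lsp)
            sent.append(lsp)
        return sent

    def on_psnp(self, acked_ids):
        """A PSNP acknowledges a batch of LSPs and frees window space."""
        self.unacked -= set(acked_ids)


def simulate(num_lsps, window, acks_per_psnp):
    """Count send/ack rounds when each PSNP carries `acks_per_psnp` acks."""
    tx = LspSender(range(num_lsps), window)
    rounds = 0
    while tx.pending or tx.unacked:
        tx.transmit()
        in_flight = sorted(tx.unacked)
        tx.on_psnp(in_flight[:acks_per_psnp])  # receiver acks one batch
        rounds += 1
    return rounds


print(simulate(100, window=10, acks_per_psnp=10))  # 10
print(simulate(100, window=10, acks_per_psnp=1))   # 100
```

With the same window on the sender, the number of rounds changes by a factor of ten depending only on how the receiver batches its acks.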

Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Robert Raszuk
Hi,

Yes indeed. For your richly connected node restart problem, use of the
overload-bit should be explored, proposed, implemented.

For the partition problem I have two general comments:

a) If network partitions are likely to happen more often in the case of
dynamic flooding, perhaps, as already said before, we should increase the max
number of times a given LSP is to arrive at a flooding-optimized node.
Two may not be enough.

b) If protocol extensions help to mitigate the effects of network
partition via much faster repair, some folks may treat network partitions as
a normal operational model instead of re-architecting the network to make
sure network partition events are as rare as possible.

Thx,
R.

On Wed, Jul 24, 2019 at 4:12 PM Henk Smit  wrote:

>
> Hello Robert,
>
> Tony brought up the example of a partitioned network.
> But there are more examples.
>
> E.g. in a network there is a router with 1000 neighbors.
> (When discussing distributed vs centralized flooding-topology
>   reduction algorithms, I've been told these network designs exist).
> When such a router reboots/crashes/comes back up, all 1000 neighbors
> will create a new version of their own LSP. This causes 1000 different
> LSPs to be flooded through the network at the same time, impacting every
> router in the network.
>
> The case I was thinking of myself was when a router in a large network
> boots. When it brings up a number of adjacencies, each neighbor will
> try to synchronize its LSPDB with the newly booted router. As the newly
> booted router will send empty CSNPs to each of its neighbors, each
> neighbor will start sending the full LSPDB. If such a network has 10k
> LSPs, and such a router has 100 neighbors, that router will receive
> 100 * 10k = 1 million LSPs. Having a faster and more efficient flooding
> transport, with flow-control, will make a reboot in such a topology less
> painful.
>
> (In that last case, creative use of the overload-bit could prevent
> black-holing or microloops while IS-IS synchronizes its LSPDB after a
> reboot. Just like we used the overload-bit to solve the problem of slow
> convergence of BGP after a reboot, 22 years ago. I have no idea if there
> are any implementations that use the overload-bit to alleviate slow
> convergence of IS-IS after a reboot).
>
> henk.
>
>
> Robert Raszuk wrote on 2019-07-24 15:33:
> > Hey Henk & all,
> >
> > If acks for 1000 LSPs take 16 PSNPs (max 66 per PSNP), or even if the
> > full flooding takes as long as the 33 sec Tony mentioned - is this
> > really a problem ?
> >
> > Remember we are not talking about protocol convergence after link flap
> > or node going down. We are talking about serious network partitioning
> > which itself may have lasted for minutes, hours or days. While just
> > considering absolute numbers yields a desire to go faster and faster, if
> > we put things in the overall perspective, is there really a problem to
> > be solved in the first place ?
> >
> > Would there still be a problem if LSR WG recommends faster acking
> > maybe not for each LSP but for say 20 or 30 max ?
> >
> > Thx,
> > R.
>


Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Henk Smit



Hello Robert,

Tony brought up the example of a partitioned network.
But there are more examples.

E.g. in a network there is a router with 1000 neighbors.
(When discussing distributed vs centralized flooding-topology
 reduction algorithms, I've been told these network designs exist).
When such a router reboots/crashes/comes back up, all 1000 neighbors
will create a new version of their own LSP. This causes 1000 different
LSPs to be flooded through the network at the same time, impacting every
router in the network.

The case I was thinking of myself was when a router in a large network
boots. When it brings up a number of adjacencies, each neighbor will
try to synchronize its LSPDB with the newly booted router. As the newly
booted router will send empty CSNPs to each of its neighbors, each
neighbor will start sending the full LSPDB. If such a network has 10k
LSPs, and such a router has 100 neighbors, that router will receive
100 * 10k = 1 million LSPs. Having a faster and more efficient flooding
transport, with flow-control, will make a reboot in such a topology less
painful.

(In that last case, creative use of the overload-bit could prevent
black-holing or microloops while IS-IS synchronizes its LSPDB after a
reboot. Just like we used the overload-bit to solve the problem of slow
convergence of BGP after a reboot, 22 years ago. I have no idea if there
are any implementations that use the overload-bit to alleviate slow
convergence of IS-IS after a reboot).
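
The reboot numbers above can be checked with a few lines. This is only a back-of-the-envelope sketch: the function names are hypothetical, and the per-neighbor rate of 33 LSPs per second is an assumption taken from the figure of 66 LSPs in 2 seconds mentioned elsewhere in this thread.

```python
def lsps_received_after_reboot(lspdb_size, neighbors):
    """Total LSPs sent to a freshly booted router if every neighbor
    sends its full LSPDB."""
    return lspdb_size * neighbors


def aggregate_arrival_rate(neighbors, per_neighbor_rate):
    """LSPs per second arriving at the booted router when all
    neighbors transmit in parallel at the same paced rate."""
    return neighbors * per_neighbor_rate


total = lsps_received_after_reboot(lspdb_size=10_000, neighbors=100)
rate = aggregate_arrival_rate(neighbors=100, per_neighbor_rate=33)  # assumed rate
print(total)  # 1000000
print(rate)   # 3300
```

Even with each neighbor pacing itself, the booted router sees the sum of all its neighbors' rates, which is why its input queue is the place where things go wrong.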


henk.


Robert Raszuk wrote on 2019-07-24 15:33:

Hey Henk & all,

If acks for 1000 LSPs take 16 PSNPs (max 66 per PSNP), or even if the
full flooding takes as long as the 33 sec Tony mentioned - is this
really a problem ?

Remember we are not talking about protocol convergence after link flap
or node going down. We are talking about serious network partitioning
which itself may have lasted for minutes, hours or days. While just
considering absolute numbers yields a desire to go faster and faster, if
we put things in the overall perspective, is there really a problem to
be solved in the first place ?

Would there still be a problem if LSR WG recommends faster acking
maybe not for each LSP but for say 20 or 30 max ?

Thx,
R.




Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Robert Raszuk
Hey Henk & all,

If acks for 1000 LSPs take 16 PSNPs (max 66 per PSNP), or even if the full
flooding takes as long as the 33 sec Tony mentioned - is this really a
problem ?

Remember we are not talking about protocol convergence after link flap or
node going down. We are talking about serious network partitioning which
itself may have lasted for minutes, hours or days. While just considering
absolute numbers yields a desire to go faster and faster, if we put things in
the overall perspective, is there really a problem to be solved in the first
place ?

Would there still be a problem if LSR WG recommends faster acking maybe not
for each LSP but for say 20 or 30 max ?

Thx,
R.








On Wed, Jul 24, 2019 at 3:18 PM Henk Smit  wrote:

>
> Hello Les,
>
> Les Ginsberg (ginsberg) wrote on 2019-07-24 07:17:
>
> > If you accept that, then it makes sense to look for the simplest way
> > to do flow control and that is decidedly not from the RX side. (I
> > expect Tony Li to disagree with that  – but I have already
> > outlined why it is more complex to do it from the Rx side.)
>
> In your talk on Monday you called the idea in
> draft-decraene-lsr-isis-flooding-speed-01 "receiver driven flow
> control".
> You don't like that. You want "transmit based flow control".
> You argued that you can do "transmit based flow control" on the sender
> only.
> Therefore your algorithm is merely a "local trick".
> And "local tricks" don't need RFCs. I agree with that.
> But I don't agree that your algorithm is just a "local trick".
>
>
> In your algorithm, a "sender" sends a number of LSPs to a receiver.
> Without waiting for acks (PSNPs). Like in any sliding window protocol.
> The sending router keeps an eye on the number of unacked LSPs.
> And it determines how fast it can send more LSPs based on the current
> number of unacked LSPs. Every time the sender receives a PSNP, it
> knows the receiver got a number of LSPs, so it can increase its
> send-window again, and then send more LSPs.
> Correct ?
>
> I agree that the core idea of this algorithm makes sense.
> After all, it looks a lot like TCP.
> I believe the authors of draft-decraene-lsr-isis-flooding-speed were
> planning something like that for the next version of their draft.
>
>
> However, I do not agree with the name "tx driven flow control".
> I also do not agree that this algorithm is "a local trick".
> Therefore I also do not agree that this algorithm doesn't need to be
> documented (in an RFC).
>
> In your "tx based flow control", the sender (tx) sends LSPs at a rate
> that is derived from the rate at which it receives PSNPs. Therefore
> it is the sender of the PSNPs that sets the speed of transmission !
> So it is still the receiver (of LSPs) that controls the flow control.
> The name "tx based flow control" is a little misleading, imho.
>
>
> It is important to realize that the success of your algorithm actually
> depends on the behaviour of the receiver. How does it send PSNPs ?
> Does it send one PSNP per received LSP ? Or does it pack multiple acks
> in one PSNP ? Does it send a PSNP immediately, or does it wait a short
> time ? Does it try to fill a PSNP to the max (putting ~90 acks in one
> PSNP) ? Does the receiver do something in between ? I don't think
> the behaviour is specified exactly anywhere.
>
> I know about an IS-IS implementation from the nineties. When a router
> would receive an LSP, it would a) set the SSN bit (for that
> LSP/interface),
> and b) start the psnp-timer for that interface (if not already running).
> The psnp-timer would expire 2 seconds later. The router would then walk
> the LSPDB, find all LSPs with the SSN-bit set for that interface. And
> then build a PSNP with acks for all those LSPs. The result would be
> that: a) the first PSNP would be sent 2 seconds (+/- jitter) after
> receiving the first LSP, and b) the PSNP would include ~66 acks. (As
> a router receiving at full speed would have received 66 LSPs in 2
> seconds).
>
> For your "tx based flow control" algorithm to work properly, this has
> to change. The receiving router must send PSNPs more quickly and more
> aggressively. The result would be that there will be less acks in each
> PSNP. And thus more PSNPs will be sent.
>
> This makes us realize: in the current situation, if a router receives
> 1000 LSPs, and sends those LSPs to 64 neighbors, it would receive:
> - the 1000 LSPs from an upstream neighbor, plus
> - 1000/66 = 16 PSNPs from each downstream neighbor = 64 * 16 = 1024
> PSNPs.
> This makes a total of ~2000 PDUs received.
>
> If routers would send one PSNP per LSP (to have faster flow control),
> then the router in this example would receive:
> - the 1000 LSPs from an upstream neighbor, plus
> - 1000 PSNPs from each downstream neighbor * 16 = 16000 PSNPs.
> This makes a total of ~17000 PDUs received.
>
> The total number of PDUs received on this router would go from 2K PDUs
> to 17K PDUs.
>
> Remember that the problem we're trying to solve here is to make sure
> 

Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Henk Smit


Hello Les,

Les Ginsberg (ginsberg) wrote on 2019-07-24 07:17:


If you accept that, then it makes sense to look for the simplest way
to do flow control and that is decidedly not from the RX side. (I
expect Tony Li to disagree with that  – but I have already
outlined why it is more complex to do it from the Rx side.)


In your talk on Monday you called the idea in
draft-decraene-lsr-isis-flooding-speed-01 "receiver driven flow control".

You don't like that. You want "transmit based flow control".
You argued that you can do "transmit based flow control" on the sender only.

Therefore your algorithm is merely a "local trick".
And "local tricks" don't need RFCs. I agree with that.
But I don't agree that your algorithm is just a "local trick".


In your algorithm, a "sender" sends a number of LSPs to a receiver.
Without waiting for acks (PSNPs). Like in any sliding window protocol.
The sending router keeps an eye on the number of unacked LSPs.
And it determines how fast it can send more LSPs based on the current
number of unacked LSPs. Every time the sender receives a PSNP, it
knows the receiver got a number of LSPs, so it can increase its
send-window again, and then send more LSPs.
Correct ?
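
A minimal sketch of the sender-side behaviour described above, assuming a fixed window size and treating each PSNP as a list of acknowledged LSP IDs. The class name `WindowedSender`, the `window` parameter and the method names are hypothetical illustrations, not taken from any draft or implementation.

```python
from collections import deque


class WindowedSender:
    """Sender that keeps at most `window` LSPs unacknowledged (hypothetical sketch)."""

    def __init__(self, lsp_ids, window):
        self.pending = deque(lsp_ids)   # LSPs still waiting to be sent
        self.unacked = set()            # LSPs sent but not yet acked by a PSNP
        self.window = window            # assumed fixed send-window size

    def send_ready(self):
        """Send as many LSPs as the window allows; return the IDs sent."""
        sent = []
        while self.pending and len(self.unacked) < self.window:
            lsp = self.pending.popleft()
            self.unacked.add(lsp)
            sent.append(lsp)
        return sent

    def on_psnp(self, acked_ids):
        """A PSNP acks some LSPs, which reopens the window for more sends."""
        self.unacked.difference_update(acked_ids)
        return self.send_ready()


sender = WindowedSender(range(1, 11), window=4)
first_burst = sender.send_ready()     # fills the window, then stalls
after_ack = sender.on_psnp([1, 2])    # a PSNP acking two LSPs frees two slots
```

In this sketch the sender only makes progress when `on_psnp` is called, so its pace is set by how and when the receiver emits PSNPs.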

I agree that the core idea of this algorithm makes sense.
After all, it looks a lot like TCP.
I believe the authors of draft-decraene-lsr-isis-flooding-speed were
planning something like that for the next version of their draft.


However, I do not agree with the name "tx driven flow control".
I also do not agree that this algorithm is "a local trick".
Therefore I also think this algorithm does need to be
documented (in an RFC).

In your "tx based flow control", the sender (tx) sends LSPs at a rate
that is derived from the rate at which it receives PSNPs. Therefore
it is the sender of the PSNPs that sets the speed of transmission !
So it is still the receiver (of LSPs) that controls the flow control.
The name "tx based flow control" is a little misleading, imho.


It is important to realize that the success of your algorithm actually
depends on the behaviour of the receiver. How does it send PSNPs ?
Does it send one PSNP per received LSP ? Or does it pack multiple acks
in one PSNP ? Does it send a PSNP immediately, or does it wait a short
time ? Does it try to fill a PSNP to the max (putting ~90 acks in one
PSNP) ? Does the receiver do something in between ? I don't think
the behaviour is specified exactly anywhere.

I know about an IS-IS implementation from the nineties. When a router
would receive an LSP, it would a) set the SSN bit (for that LSP/interface),
and b) start the psnp-timer for that interface (if not already running).
The psnp-timer would expire 2 seconds later. The router would then walk
the LSPDB, find all LSPs with the SSN-bit set for that interface. And
then build a PSNP with acks for all those LSPs. The result would be
that: a) the first PSNP would be sent 2 seconds (+/- jitter) after
receiving the first LSP, and b) the PSNP would include ~66 acks. (As
a router receiving at full speed would have received 66 LSPs in 2 seconds).
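
The timer-driven batching described above can be sketched as a small simulation. It assumes a steady arrival rate of 33 LSPs/second (the "full speed" implied by 66 LSPs in 2 seconds), ignores jitter, and uses a hypothetical function name `psnp_batches`; it is an illustration, not the code of the implementation mentioned.

```python
from fractions import Fraction


def psnp_batches(arrival_times_ms, psnp_timer_ms=2000):
    """Group LSP arrivals into PSNPs the way a timer-driven receiver would.

    The first LSP after an idle period starts the timer; every LSP that
    arrives before the timer expires is acked in the same PSNP.
    Returns a list of (send_time_ms, number_of_acks) tuples.
    """
    batches = []
    expiry = None
    count = 0
    for t in arrival_times_ms:
        if expiry is not None and t >= expiry:
            batches.append((expiry, count))   # timer fired: emit the PSNP
            expiry, count = None, 0
        if expiry is None:
            expiry = t + psnp_timer_ms        # first flagged LSP starts the timer
        count += 1
    if count:
        batches.append((expiry, count))       # final PSNP for the leftovers
    return batches


# Assumed arrival pattern: 1000 LSPs at a steady 33 LSPs/second.
arrivals = [Fraction(1000 * i, 33) for i in range(1000)]
batches = psnp_batches(arrivals)
```

Under these assumptions the first PSNP leaves 2 seconds after the first LSP and carries 66 acks, and the 1000 LSPs are covered by 16 PSNPs in total.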


For your "tx based flow control" algorithm to work properly, this has
to change. The receiving router must send PSNPs more quickly and more
aggressively. The result would be that there will be fewer acks in each
PSNP. And thus more PSNPs will be sent.

This makes us realize: in the current situation, if a router receives
1000 LSPs, and sends those LSPs to 64 neighbors, it would receive:
- the 1000 LSPs from an upstream neighbor, plus
- 1000/66 = 16 PSNPs from each downstream neighbor = 64 * 16 = 1024 PSNPs.
This makes a total of ~2000 PDUs received.

If routers would send one PSNP per LSP (to have faster flow control),
then the router in this example would receive:
- the 1000 LSPs from an upstream neighbor, plus
- 1000 PSNPs from each downstream neighbor * 16 = 16000 PSNPs.
This makes a total of ~17000 PDUs received.

The total number of PDUs received on this router would go from 2K PDUs
to 17K PDUs.
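
The arithmetic above can be redone in a few lines. This is only a sketch under the message's own assumptions (1000 LSPs, 64 downstream neighbors, ~66 acks per batched PSNP); the function name `pdus_received` is hypothetical. Note that it applies all 64 neighbors to the one-PSNP-per-LSP case as well, whereas the figure in the text multiplies by 16, so the totals it produces for that case are larger than 17K.

```python
import math

LSPS = 1000            # LSPs received from the upstream neighbor
NEIGHBORS = 64         # downstream neighbors the LSPs are flooded to
ACKS_PER_PSNP = 66     # acks batched per PSNP with a 2-second psnp-timer


def pdus_received(lsps, neighbors, acks_per_psnp):
    """Total PDUs arriving at the router: upstream LSPs plus downstream PSNPs."""
    psnps_per_neighbor = math.ceil(lsps / acks_per_psnp)
    return lsps + neighbors * psnps_per_neighbor


batched = pdus_received(LSPS, NEIGHBORS, ACKS_PER_PSNP)   # 1000 + 64 * 16
per_lsp = pdus_received(LSPS, NEIGHBORS, 1)               # 1000 + 64 * 1000
every_other = pdus_received(LSPS, NEIGHBORS, 2)           # 1000 + 64 * 500
```

With these inputs the batched case stays near 2K PDUs, while acking every LSP or every second LSP from all 64 neighbors yields tens of thousands of PDUs, which points in the same direction as the argument in the message.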

Remember that the problem we're trying to solve here is to make sure
that routers do not get overrun on the receipt side with too many
packets too quickly. It seems an aggressive PSNP-scheme, to achieve
faster flow-control, is actually very counter-productive.

Of course the algorithm can be tweaked. E.g. TCP sends one ack per
every 2 received segments (if I'm not mistaken). If we do that here,
the number of PDUs would go down from 17K to 9K PDUs. What do you
propose ? How do you want the feedback of PSNPs to be quick, while
maintaining an efficient packing of multiple acks per PSNP ?


In any case, the points I'm trying to make here:
*) Your algorithm is not sender-driven, but still receiver-driven.
*) Your algorithm changes/dictates behaviour both on sender and receiver.
*) Interaction between a sender and a receiver is what we call a protocol.
   If you want to make this work, especially in multi-vendor environments,
   we need to document these 

Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread henk . ietf


Hello Les,

Les Ginsberg (ginsberg) wrote on 2019-07-24 07:17:


There is something to be said for simply “flooding fast” and not
worrying about flow control at all (regardless of whether TX or RX
mechanisms would be used). Some packets would be dropped, but
retransmission timers will ensure that the flooding eventually
succeeds and retransmit timers are long (5 seconds by default). (I am
not the only one mentioning this BTW…)


Why do we have initial waits and exponential backoffs for LSP-generation
and SPF-computations ? Why not react immediately ? Why not react constantly ?


We have a lot of bandwidth and cpu-power now. Isn't simple always better
than "overly complex stuff" like exponential backoffs ? If you have more
cpu-power, more memory and more bandwidth, why invent new algorithms ?

henk.


I hope it is clear to everyone that these are not serious questions. I'm just
saying: "sometimes fast is slow". I am sure that if we ask the "old guys", they
can come up with many stories of how these problems are sometimes
counter-intuitive. And how networks have melted because "fast is slow". I could
tell at least 2 of those stories myself.

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Robert Raszuk
Hi Tony,

> I am assuming that there is no link layer flow control.  I can’t recall
> working on a system that supports X.25 since about 1995, so I don’t think
> that’s a common use case today.
>

I was thinking more along the lines of leveraging the IEEE 802.3x or 802.1Qbb
standards, not necessarily suggesting fancy X.25 or Frame Relay :)

> Henk proposed that we simply pick up TCP for this, but my concern with
> that is really about introducing a whole new dependency to the protocol.
> That’s a lot to chew.  Do we really need it all? I hope not.  Thus, Bruno’s
> original suggestion sparked my interest in doing something dynamic and
> simple.
>

The second part of the question was really about at what layer it makes
most sense to provide this control loop.

Options seem to be:

* Invent new or use existing link layer flow control (IEEE)
* Reuse existing transport layer (TCP)
* App layer (QUIC or QUIC like)

I guess it would be useful to list up front which types of media this must
be supported on, as that may change the game drastically:

* directly connected fiber p2p
* p2mp (via switch)
* p2p over encapsulation
etc...

Thx,
R.


Re: [Lsr] Dynamic flow control for flooding

2019-07-23 Thread Les Ginsberg (ginsberg)
Stephane –

There is much we agree on.

There is something to be said for simply “flooding fast” and not worrying about 
flow control at all (regardless of whether TX or RX mechanisms would be used). 
Some packets would be dropped, but retransmission timers will ensure that the 
flooding eventually succeeds and retransmit timers are long (5 seconds by 
default). (I am not the only one mentioning this BTW…)

But most important to me is to recognize that flow control (however done) is 
not fixing anything – nor making the flooding work better. The network is 
compromised and flow control won’t fix it.
If you accept that, then it makes sense to look for the simplest way to do flow 
control and that is decidedly not from the RX side. (I expect Tony Li to 
disagree with that  – but I have already outlined why it is more complex to do 
it from the Rx side.)

   Les


From: stephane.litkow...@orange.com 
Sent: Tuesday, July 23, 2019 9:50 PM
To: Les Ginsberg (ginsberg) ; Tony Li ; 
lsr@ietf.org
Subject: RE: [Lsr] Dynamic flow control for flooding

Hi Les,

I agree that flooding is a global thing and not a local mechanism if we 
consider that the ultimate goal is to get the LSDB in-sync as fast as we can on 
all the nodes.

I just want to highlight three things:

  *   Link delay (due to transmission link distance) already affects the 
flooding speed (especially when we need to cross links that have 100 msec 
of RTD), so the flooding speed is already not equal on each link.
  *   I put this one in parentheses as it may be controversial ☺ (To converge a 
path after a topology change, we do not always require all the nodes to get the 
LSDB in sync (I mean from a forwarding point of view). That’s a tricky topic 
because it depends heavily on the network topology: on the one hand, flooding 
one or two hops away may be enough to converge the path, while on the other 
hand, it may create microloops with another network design.)
  *   I’m really wondering how much difference we may have between the 
different routers we have in a single area today. Even if we have some legacy 
routers still deployed, they are more powerful than routers were when the ISO 
spec was written. Are we expecting a difference of hundreds of msec, or tens, 
between the latest generation of routers deployed and the legacy ones ? In 
addition, in our case, we try to create a consistent design, which means that 
we try to avoid having legacy routers in transit between latest-generation 
routers, and we push the legacy ones to the edge or try to remove them. There 
may be transient situations where it happens, but that’s not a design goal. 
This is to say that I would not mind having a very fast flooding value on my 
core and latest-generation edges while keeping a more conservative value for 
legacy edges. And I’m not expecting that much difference between the two (at 
least not much more than the link delay that may already exist and impact 
flooding).

Another point is that I would be really glad to see how much the flooding time 
impacts the convergence time in real networks, taking into account that the 
FIB rewrite is usually the biggest contributor (unfortunately we don’t really 
have instrumentation today to measure flooding). I’m not saying that there 
is nothing to do; of course the default flooding time we have had for years 
could be improved, and I fully agree. I’m just always interested in having some 
measurement of the potential gain.

Flow control is required in any case: we can always find a case where the IS-IS 
process does not get enough CPU time because the CPU is busy doing other things 
and IS-IS can’t process the input PDUs (as an example).


Brgds,

From: Lsr [mailto:lsr-boun...@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
Sent: Tuesday, July 23, 2019 16:30
To: Tony Li; lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding

Tony –

Thanx for picking up the discussion.
Thanx also for doing the math to show that bandwidth is not a concern. I think 
most/all of us knew that – but it is good to put that small question behind us.

I also think we all agree on the goal - which is to flood significantly faster 
than many implementations do today to handle deployments like the case you 
mention below.

Beyond this point, I have a different perspective.

As network-wide convergence depends upon fast propagation of LSP changes – 
which in turn requires consistent flooding rates on all interfaces enabled for 
flooding – a properly provisioned network MUST be able to sustain a consistent 
flooding rate or the operation of the network will suffer. We therefore need to 
view flow control issues as indicative of a problem.

It is a mistake to equate LSP flooding with a set of independent P2P 
“connections” – each of which can operate at a rate independent of the other.

If we can agree on this, then I believe we will have placed the flow control 
problem in its proper perspective – in which case it will 

Re: [Lsr] Dynamic flow control for flooding

2019-07-23 Thread tony . li

Hi Robert,


> Are you working on the assumption that there is no data link layer flow 
> control already which could signal the OS congestion on the receiving end ? 


I am assuming that there is no link layer flow control.  I can’t recall working 
on a system that supports X.25 since about 1995, so I don’t think that’s a 
common use case today. 


> Are we also working on the assumption that when ISIS PDUs are sent in DCs 
> (a case unknown today, where all of a sudden 1000s of LSPs may need to get 
> flooded), use of some fancy L4 flow control is overkill and we must invent 
> a new, essentially L2, flow control to cover this case to address partition 
> repair ? 


I am not assuming that the issue is restricted to the DC. Flooding is an issue 
in all IS-IS networks.

1000s of LSPs can occur in any IS-IS network of significant scale.  All it 
takes is healing a partition and there can easily be a large number of LSPs to 
transmit.  The case of 1000s of LSPs is of interest because the scale magnifies 
the flooding problem.  If we only have one LSP that needs flooding, this entire 
discussion is moot.

I am assuming that we want to go faster.  That does seem to be something that 
we have agreement on.

I am assuming that we don’t want to go too fast.  Overrunning the receiver is 
wasteful. I think we all agree on that.

I am not assuming that we have to use some ‘L4 fancy flow control’, but I am 
trying to get a reasonable approximation to optimal goodput, with errors being 
on the conservative side (i.e., not overrunning the receiver).

My understanding of control theory is pretty rudimentary, but what I do know is 
that it is going to be very difficult to achieve high goodput without a control 
loop of some flavor. I’m open to how we do this.  Henk proposed that we simply 
pick up TCP for this, but my concern with that is really about introducing a 
whole new dependency to the protocol.  That’s a lot to chew.  Do we really need 
it all? I hope not.  Thus, Bruno’s original suggestion sparked my interest in 
doing something dynamic and simple.

Tony




Re: [Lsr] Dynamic flow control for flooding

2019-07-23 Thread stephane.litkowski
Hi Les,

I agree that flooding is a global thing and not a local mechanism if we 
consider that the ultimate goal is to get the LSDB in-sync as fast as we can on 
all the nodes.

I just want to highlight three things:

-  Link delay (due to transmission link distance) already affects 
the flooding speed (especially when we need to cross links that have 
100 msec of RTD), so the flooding speed is already not equal on each link.

-  I put this one in parentheses as it may be controversial ☺ (To 
converge a path after a topology change, we do not always require all the nodes 
to get the LSDB in sync (I mean from a forwarding point of view). That’s a 
tricky topic because it depends heavily on the network topology: on the one 
hand, flooding one or two hops away may be enough to converge the path, while 
on the other hand, it may create microloops with another network design.)

-  I’m really wondering how much difference we may have between the 
different routers we have in a single area today. Even if we have some legacy 
routers still deployed, they are more powerful than routers were when the ISO 
spec was written. Are we expecting a difference of hundreds of msec, or tens, 
between the latest generation of routers deployed and the legacy ones ? In 
addition, in our case, we try to create a consistent design, which means that 
we try to avoid having legacy routers in transit between latest-generation 
routers, and we push the legacy ones to the edge or try to remove them. There 
may be transient situations where it happens, but that’s not a design goal. 
This is to say that I would not mind having a very fast flooding value on my 
core and latest-generation edges while keeping a more conservative value for 
legacy edges. And I’m not expecting that much difference between the two (at 
least not much more than the link delay that may already exist and impact 
flooding).

Another point is that I would be really glad to see how much the flooding time 
impacts the convergence time in real networks, taking into account that the 
FIB rewrite is usually the biggest contributor (unfortunately we don’t really 
have instrumentation today to measure flooding). I’m not saying that there 
is nothing to do; of course the default flooding time we have had for years 
could be improved, and I fully agree. I’m just always interested in having some 
measurement of the potential gain.

Flow control is required in any case: we can always find a case where the IS-IS 
process does not get enough CPU time because the CPU is busy doing other things 
and IS-IS can’t process the input PDUs (as an example).


Brgds,

From: Lsr [mailto:lsr-boun...@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
Sent: Tuesday, July 23, 2019 16:30
To: Tony Li; lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding

Tony –

Thanx for picking up the discussion.
Thanx also for doing the math to show that bandwidth is not a concern. I think 
most/all of us knew that – but it is good to put that small question behind us.

I also think we all agree on the goal - which is to flood significantly faster 
than many implementations do today to handle deployments like the case you 
mention below.

Beyond this point, I have a different perspective.

As network-wide convergence depends upon fast propagation of LSP changes – 
which in turn requires consistent flooding rates on all interfaces enabled for 
flooding – a properly provisioned network MUST be able to sustain a consistent 
flooding rate or the operation of the network will suffer. We therefore need to 
view flow control issues as indicative of a problem.

It is a mistake to equate LSP flooding with a set of independent P2P 
“connections” – each of which can operate at a rate independent of the other.

If we can agree on this, then I believe we will have placed the flow control 
problem in its proper perspective – in which case it will become easier to 
agree on the best way to implement flow control.

   Les



From: Lsr <lsr-boun...@ietf.org> On Behalf Of Tony Li
Sent: Tuesday, July 23, 2019 6:34 AM
To: lsr@ietf.org
Subject: [Lsr] Dynamic flow control for flooding


Hi all,

I’d like to continue the discussion that we left off with last night.

The use case that I posited was a situation where we had 1000 LSPs to flood. 
This is an interesting case that can happen if there was a large network that 
partitioned and has now healed.  All LSPs from the other side of the partition 
are going to need to be updated.

Let’s further suppose that the LSPs have an average size of 1KB.  Thus, the 
entire transfer is around 1MB.

Suppose that we’re doing this on a 400Gb/s link. If we were to transmit the 
whole batch of LSPs at once, it takes a whopping 20us.  Not milliseconds, 
microseconds.  2x10^-5s.  Clearly, we are not going to be rate limited by 
bandwidth.
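
As a quick check of the 20us figure, here is a small sketch of the serialization-time calculation. It assumes "1KB" means 1000 bytes and ignores framing overhead; the function name `serialization_time_us` is hypothetical.

```python
LSP_COUNT = 1000                 # LSPs to flood after the partition heals
LSP_SIZE_BYTES = 1000            # assumed: "1KB" taken as 1000 bytes
LINK_BPS = 400 * 10**9           # 400 Gb/s link


def serialization_time_us(count, size_bytes, link_bps):
    """Time to put `count` PDUs of `size_bytes` on the wire, in microseconds."""
    total_bits = count * size_bytes * 8
    return total_bits * 10**6 / link_bps


burst_us = serialization_time_us(LSP_COUNT, LSP_SIZE_BYTES, LINK_BPS)
```

Under these assumptions `burst_us` comes out at 20 microseconds, so the link itself is not the limiting factor.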

Note that 20us is an unreasonable lower bound: we cannot reasonably expect a 
node to absorb 

Re: [Lsr] Dynamic flow control for flooding

2019-07-23 Thread Les Ginsberg (ginsberg)
David -

Let's take Tony's example test case.

A large network is partitioned and heals. As a result I now have 1000 LSPs 
which need to be flooded.

Let's suppose on most links/nodes in the network I can support receiving of 500 
LSPs/second.
But on one link/node I can only support receiving 33 LSPs/second.

This means we are at risk for part of the network converging the LSPDB in 2+ 
seconds and part of the network converging the LSPDB in 30+ seconds.
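
The two convergence figures follow from dividing the LSP count by the per-link receive rate. A minimal sketch, assuming a constant rate and no retransmissions (the function name `flood_time_seconds` is hypothetical):

```python
def flood_time_seconds(lsp_count, lsps_per_second):
    """Lower bound on the time to deliver lsp_count LSPs over one link at a fixed rate."""
    return lsp_count / lsps_per_second


fast = flood_time_seconds(1000, 500)   # links that can absorb 500 LSPs/second
slow = flood_time_seconds(1000, 33)    # the one link limited to 33 LSPs/second
```

At 500 LSPs/second the 1000 LSPs take 2 seconds; at 33 LSPs/second they take a little over 30 seconds.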

Having the means for the node which only supports 33 LSPs/second to signal its 
"upstream neighbors" to not overflow its receive queue isn't going to help 
network convergence.

That's all I am saying.

   Les


> -Original Message-
> From: David Lamparter 
> Sent: Tuesday, July 23, 2019 2:14 PM
> To: Les Ginsberg (ginsberg) 
> Cc: Tony Li ; lsr@ietf.org
> Subject: Re: [Lsr] Dynamic flow control for flooding
> 
> Hi Les,
> 
> 
> On Tue, Jul 23, 2019 at 08:29:30PM +, Les Ginsberg (ginsberg) wrote:
> > [...] As network-wide convergence depends upon fast propagation of LSP
> > changes -
> 
> you're losing me between that previous part and the next:
> 
> > - which in turn requires consistent flooding rates on all interfaces
> > enabled for flooding [...]
> 
> I understand and follow your reasoning if we have a classical timer that
> limits flooding rates per LSP.  If we get multiple updates to the same
> LSP, dissimilar flooding rates imply we might just have sent out the
> previous now-outdated state, and we block for some potentially lengthy
> time before sending out the most recent version of that LSP.
> 
> I don't understand how we get delayed propagation of LSP changes if we
> employ some mechanism to raise the flooding rate to something based
> around the target system's capabilities.
> 
> Could you elaborate on how we get delayed LSP propagation in this
> scenario?
> 
> Thanks,
> 
> 
> -David



Re: [Lsr] Dynamic flow control for flooding

2019-07-23 Thread Robert Raszuk
Hi Tony,

Are you working on the assumption that there is no data link layer flow
control already which could signal the OS congestion on the receiving end ?

Are we also working on the assumption that when ISIS PDUs are sent in DCs
(a case unknown today, where all of a sudden 1000s of LSPs may need to get
flooded), use of some fancy L4 flow control is overkill and we must
invent a new, essentially L2, flow control to cover this case to address
partition repair ?

Thx,
R.

On Tue, Jul 23, 2019 at 3:34 PM Tony Li  wrote:

>
> Hi all,
>
> I’d like to continue the discussion that we left off with last night.
>
> The use case that I posited was a situation where we had 1000 LSPs to
> flood. This is an interesting case that can happen if there was a large
> network that partitioned and has now healed.  All LSPs from the other side
> of the partition are going to need to be updated.
>
> Let’s further suppose that the LSPs have an average size of 1KB.  Thus,
> the entire transfer is around 1MB.
>
> Suppose that we’re doing this on a 400Gb/s link. If we were to transmit
> the whole batch of LSPs at once, it takes a whopping 20us.  Not
> milliseconds, microseconds.  2x10^-5s.  Clearly, we are not going to be
> rate limited by bandwidth.
>
> Note that 20us is an unreasonable lower bound: we cannot reasonably expect
> a node to absorb 1k PDUs back to back without loss today, in addition to
> all of its other responsibilities.
>
> At the opposite end of the spectrum, suppose we transmit one PDU every
> 33ms.  That’s then going to take us 33 seconds to complete. Unreasonably
> slow.
>
> How can we then maximize our goodput?  We know that the receiver has a set
> of buffers and a processing rate that it can support. The processing rate
> will vary, depending on other loads.
>
> What we would like the transmitter to do is to transmit enough to create a
> small processing queue on the receiver and then transmit at the receiver’s
> processing rate.
>
> Can we agree on this goal?
>
> Tony
>


Re: [Lsr] Dynamic flow control for flooding

2019-07-23 Thread Tony Przygienda
On Tue, Jul 23, 2019 at 5:24 PM Les Ginsberg (ginsberg) 
wrote:

> Tony –
>
>
>
> As usual, you cover a lot of territory – and even after a couple of
> readings I am not sure I got everything.
>

I was accused of being too flowery in my prose for many years so I
adopted an acerbic, terse style ;-)

>
> *From:* Tony Przygienda 
> *Sent:* Tuesday, July 23, 2019 1:56 PM
> *To:* Les Ginsberg (ginsberg) 
> *Cc:* Tony Li ; lsr@ietf.org
> *Subject:* Re: [Lsr] Dynamic flow control for flooding
>
> It is a mistake to equate LSP flooding with a set of independent P2P
> “connections” – each of which can operate at a rate independent of the
> other.
>
> At least my experience much disagrees with that and such a proposal seems
> to steer towards slowest receiver in the whole network problem so I wait
> for others to chime in.
>
> *[Les:] This is NOT AT ALL where I am going.*
>
> *If I have a “large network” and I have a node which consistently cannot
> support the flooding rates necessary to deal with Tony Li’s example (node w
> many neighbors fails) then the network has a problem.*
>
> *Slowing down everyone to meet the flooding speed of the slowest node is
> not something I would expect a customer to accept. The network will not be
> able to deliver the convergence expected. The node in question needs to be
> identified and steps taken to either fix it or upgrade or replace it or…*
>
>
>
> *The point I am also making is trying to run the network with some links
> flooding fast and some links flooding slow isn’t a solution either.*
>

hmm, then I don't know what you propose in the normal case except saying
nothing seems to skin the cat properly when your network is lopsided
enough. On which we agree I guess ...


>
>
> Then, to clarify on Tony's mail, the "problem" I mentioned anecdotally
> yesterday as behavior I saw on things I did in their time was of course
> when processors were still well under 1GHz and links in Gigs and not 10s
> and 100s of Gigs we have today but yes, the limiting factor was the
> flooding rate (or rather effective processing rate of receiver AFAIR before
> it started to drop the RX queues or was late enough to cause RE-TX on senders)
> in terms of losses/retransmissions necessary that were causing transients
> to the point it looked to me then the cure seemed worse than the disease
> (while the disease was likely a flu then compared to today given we didn't
> have massively dense meshes we steer towards today). The base spec &
> mandated flooding numbers didn't change but what is possible in terms of
> rates when breaking the spec did change of course in terms of CPU/links
> speed albeit most ISIS implementations go back to megahertz processors
> still ;-) And the dinner was great BTW ;-)
>
>
>
> So yes, I do think that anything that will flood @ reasonable rate without
> excessive losses will work well on well-computed
> double-flood-reduced-graph, the question is how to get the "reasonable" in
> place both in terms of numbers as well as mechanism for which we saw tons
> of lively discussions/proposals yesterday, most obvious being of course going
> and manually bumping everyone's implementation to the desired (? ;-) value
> ...  Other consideration is having computation always trying to get more
> than 2 links in minimal cut on the graph of course which should alleviate
> any bottleneck or rather, make the cut less likely. Given quality of
> max-disjoint-node/link graph computation algorithms that should be doable
> by gut feeling. If e.g. the flood rate per link is available the algorithms
> should be doing even better in centralized case.
>
>
>
> *[Les:] Convergence issues and flooding overload as a result of excessive
> redundant flooding is a real issue – but it is a different problem (for
> which we have solutions) and we should not introduce that issue into this
> discussion.*
>

hmm, we are trying to build flood reduction to deal with exactly this
problem I thought and we are trying to find a good solution in the design
space between a hamiltonian path and not reducing any links @ all where on
one hand the specter of long flooding chains & partitions on single link
failures looms while beckoning with very low CPU load and on the other hand
we can do nothing @ all while staring down the abyss of excessively large,
densely meshed networks and falling off the cliff of melted flooding ...
So, I'm not sure I introduced anything new but if I did, ignore my attempt
@ clarification of what I said yesterday ...

--- tony

>


Re: [Lsr] Dynamic flow control for flooding

2019-07-23 Thread Les Ginsberg (ginsberg)
Tony –

As usual, you cover a lot of territory – and even after a couple of readings I 
am not sure I got everything.
Still, I dare to reply.
Inline.

From: Tony Przygienda 
Sent: Tuesday, July 23, 2019 1:56 PM
To: Les Ginsberg (ginsberg) 
Cc: Tony Li ; lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding



It is a mistake to equate LSP flooding with a set of independent P2P 
“connections” – each of which can operate at a rate independent of the other.



At least my experience much disagrees with that and such a proposal seems to 
steer towards slowest receiver in the whole network problem so I wait for 
others to chime in.
[Les:] This is NOT AT ALL where I am going.
If I have a “large network” and I have a node which consistently cannot support 
the flooding rates necessary to deal with Tony Li’s example (node w many 
neighbors fails) then the network has a problem.
Slowing down everyone to meet the flooding speed of the slowest node is not 
something I would expect a customer to accept. The network will not be able to 
deliver the convergence expected. The node in question needs to be identified 
and steps taken to either fix it or upgrade or replace it or…

The point I am also making is trying to run the network with some links 
flooding fast and some links flooding slow isn’t a solution either.

Then, to clarify on Tony's mail, the "problem" I mentioned anecdotally 
yesterday as behavior I saw on things I did in their time was of course when 
processors were still well under 1GHz and links in Gigs and not 10s and 100s of 
Gigs we have today but yes, the limiting factor was the flooding rate (or 
rather effective processing rate of receiver AFAIR before it started to drop the 
RX queues or was late enough to cause RE-TX on senders) in terms of 
losses/retransmissions necessary that were causing transients to the point it 
looked to me then the cure seemed worse than the disease (while the disease was 
likely a flu then compared to today given we didn't have massively dense meshes 
we steer towards today). The base spec & mandated flooding numbers didn't 
change but what is possible in terms of rates when breaking the spec did change 
of course in terms of CPU/links speed albeit most ISIS implementations go back 
to megahertz processors still ;-) And the dinner was great BTW ;-)

So yes, I do think that anything that will flood @ reasonable rate without 
excessive losses will work well on well-computed double-flood-reduced-graph, 
the question is how to get the "reasonable" in place both in terms of numbers 
as well as mechanism for which we saw tons of lively discussions/proposals 
yesterday, most obvious being of course going and manually bumping e'one's 
implementation to the desired (? ;-) value ...  Other consideration is having 
computation always trying to get more than 2 links in minimal cut on the graph 
of course which should alleviate any bottleneck or rather, make the cut less 
likely. Given quality of max-disjoint-node/link graph computation algorithms 
that should be doable by gut feeling. If e.g. the flood rate per link is 
available the algorithms should be doing even better in centralized case.

[Les:] Convergence issues and flooding overload as a result of excessive 
redundant flooding are a real issue – but that is a different problem (for which 
we have solutions) and we should not introduce that issue into this discussion.

   Les

BTW, with all that experience (MANET did its share in different space as we 
know in terms of flood reduction as well) in RIFT we chose a solution based on 
MANET derivative where every source chooses a different set of trees to flood 
on using Fisher-Yates hashes but that seems possible only if you have 
directionality on the graph (that's what I said once on the mike that doing 
flood reduction in a lattice [partial rank-ordered graph with upper & lower 
bounds] is fairly trivial, on generic graphs not so much necessarily). But 
maybe Pascal reads that and gives it a think ;-)

as usual, 2 cents to improve the internet ;-)

--- tony


Re: [Lsr] Dynamic flow control for flooding

2019-07-23 Thread David Lamparter
Hi Les,


On Tue, Jul 23, 2019 at 08:29:30PM +, Les Ginsberg (ginsberg) wrote:
> [...] As network-wide convergence depends upon fast propagation of LSP
> changes -

you're losing me between that previous part and the next:

> - which in turn requires consistent flooding rates on all interfaces
> enabled for flooding [...]

I understand and follow your reasoning if we have a classical timer that
limits flooding rates per LSP.  If we get multiple updates to the same
LSP, dissimilar flooding rates imply we might just have sent out the
previous now-outdated state, and we block for some potentially lengthy
time before sending out the most recent version of that LSP.
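
A minimal sketch of the blocking effect described above, assuming a per-LSP minimum interval between transmissions on a link. The function name and the numbers are hypothetical and only illustrate the arithmetic; they are not taken from any implementation.

```python
def earliest_send_time(last_sent_at, update_arrived_at, min_interval):
    """Earliest time the newest version of an LSP may go out on a link,
    given a per-LSP minimum interval between transmissions (assumed model)."""
    return max(update_arrived_at, last_sent_at + min_interval)


# Hypothetical numbers: version N was sent at t=10.0 s, version N+1 arrives
# 0.2 s later, and the per-LSP timer is assumed to be 5 s.
MIN_INTERVAL = 5.0
ARRIVAL = 10.2

send_at = earliest_send_time(last_sent_at=10.0,
                             update_arrived_at=ARRIVAL,
                             min_interval=MIN_INTERVAL)
delay = send_at - ARRIVAL
print(f"newest version held back for {delay:.1f} s")
```

If the previous version had gone out long before the update arrived, the timer would already have expired and the function simply returns the arrival time, i.e. no added delay.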

I don't understand how we get delayed propagation of LSP changes if we
employ some mechanism to raise the flooding rate to something based
around the target system's capabilities.

Could you elaborate on how we get delayed LSP propagation in this
scenario?

Thanks,


-David



Re: [Lsr] Dynamic flow control for flooding

2019-07-23 Thread Tony Przygienda
> It is a mistake to equate LSP flooding with a set of independent P2P
> “connections” – each of which can operate at a rate independent of the
> other.

At least my experience much disagrees with that and such a proposal seems
to steer towards slowest receiver in the whole network problem so I wait
for others to chime in.

Then, to clarify on Tony's mail, the "problem" I mentioned anecdotally
yesterday as behavior I saw on things I did in their time was of course
when processors were still well under 1GHz and links in Gigs and not 10s
and 100s of Gigs we have today but yes, the limiting factor was the
flooding rate (or rather effective processing rate of receiver AFAIR before
it started to drop the RX queues or was late enough to cause RE-TX on senders)
in terms of losses/retransmissions necessary that were causing transients
to the point it looked to me then the cure seemed worse than the disease
(while the disease was likely a flu then compared to today given we didn't
have massively dense meshes we steer towards today). The base spec &
mandated flooding numbers didn't change but what is possible in terms of
rates when breaking the spec did change of course in terms of CPU/links
speed albeit most ISIS implementations go back to megahertz processors
still ;-) And the dinner was great BTW ;-)

So yes, I do think that anything that will flood @ reasonable rate without
excessive losses will work well on well-computed
double-flood-reduced-graph, the question is how to get the "reasonable" in
place both in terms of numbers as well as mechanism for which we saw tons
of lively discussions/proposals yesterday, most obvious being of course going
and manually bumping e'one's implementation to the desired (? ;-) value ...
Other consideration is having computation always trying to get more
than 2 links in minimal cut on the graph of course which should alleviate
any bottleneck or rather, make the cut less likely. Given quality of
max-disjoint-node/link graph computation algorithms that should be doable
by gut feeling. If e.g. the flood rate per link is available the algorithms
should be doing even better in centralized case.

BTW, with all that experience (MANET did its share in different space as we
know in terms of flood reduction as well) in RIFT we chose a solution based
on MANET derivative where every source chooses a different set of trees to
flood on using Fisher-Yates hashes but that seems possible only if you have
directionality on the graph (that's what I said once on the mike that doing
flood reduction in a lattice [partial rank-ordered graph with upper & lower
bounds] is fairly trivial, on generic graphs not so much necessarily). But
maybe Pascal reads that and gives it a think ;-)
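
For readers unfamiliar with the shuffle mentioned above, here is a rough sketch of a Fisher-Yates shuffle driven by a per-source seed, so that each source derives its own repeatable ordering of candidates. This is an illustration only and is not the RIFT procedure: the SHA-256 seeding, the function name, and the neighbor labels are all assumptions made for the example.

```python
import hashlib
import random


def seeded_shuffle(items, seed_text):
    """Fisher-Yates shuffle driven by a PRNG seeded from seed_text.

    The same seed_text always yields the same ordering (illustrative only).
    """
    digest = hashlib.sha256(seed_text.encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    result = list(items)
    # Walk from the last index down, swapping each slot with a random
    # earlier-or-equal slot.
    for i in range(len(result) - 1, 0, -1):
        j = rng.randrange(i + 1)  # 0 <= j <= i
        result[i], result[j] = result[j], result[i]
    return result


# Hypothetical neighbor labels and source identifiers.
neighbors = ["n1", "n2", "n3", "n4", "n5"]
order_a = seeded_shuffle(neighbors, "source-A")
order_b = seeded_shuffle(neighbors, "source-B")
print(order_a)
print(order_b)
```

Because the ordering depends only on the seed, every node that knows the source identifier can recompute the same permutation without exchanging it.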

as usual, 2 cents to improve the internet ;-)

--- tony


Re: [Lsr] Dynamic flow control for flooding

2019-07-23 Thread Les Ginsberg (ginsberg)
Tony –

Thanx for picking up the discussion.
Thanx also for doing the math to show that bandwidth is not a concern. I think 
most/all of us knew that – but it is good to put that small question behind us.

I also think we all agree on the goal - which is to flood significantly faster 
than many implementations do today to handle deployments like the case you 
mention below.

Beyond this point, I have a different perspective.

As network-wide convergence depends upon fast propagation of LSP changes – 
which in turn requires consistent flooding rates on all interfaces enabled for 
flooding – a properly provisioned network MUST be able to sustain a consistent 
flooding rate or the operation of the network will suffer. We therefore need to 
view flow control issues as indicative of a problem.

It is a mistake to equate LSP flooding with a set of independent P2P 
“connections” – each of which can operate at a rate independent of the other.

If we can agree on this, then I believe we will have placed the flow control 
problem in its proper perspective – in which case it will become easier to 
agree on the best way to implement flow control.

   Les



From: Lsr <lsr-boun...@ietf.org> On Behalf Of Tony Li
Sent: Tuesday, July 23, 2019 6:34 AM
To: lsr@ietf.org
Subject: [Lsr] Dynamic flow control for flooding


Hi all,

I’d like to continue the discussion that we left off with last night.

The use case that I posited was a situation where we had 1000 LSPs to flood. 
This is an interesting case that can happen if there was a large network that 
partitioned and has now healed.  All LSPs from the other side of the partition 
are going to need to be updated.

Let’s further suppose that the LSPs have an average size of 1KB.  Thus, the 
entire transfer is around 1MB.

Suppose that we’re doing this on a 400Gb/s link. If we were to transmit the 
whole batch of LSPs at once, it takes a whopping 20us.  Not milliseconds, 
microseconds.  2x10^-5s.  Clearly, we are not going to be rate limited by 
bandwidth.

Note that 20us is an unreasonable lower bound: we cannot reasonably expect a 
node to absorb 1k PDUs back to back without loss today, in addition to all of 
its other responsibilities.

At the opposite end of the spectrum, suppose we transmit one PDU every 33ms.  
That’s then going to take us 33 seconds to complete. Unreasonably slow.
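
The two bounds above can be checked with a few lines of arithmetic. This is a sketch that takes "1KB" as 1000 bytes (an assumption; using 1024 changes the result only slightly), and the function names are made up for the example.

```python
def burst_time_seconds(num_lsps, lsp_size_bytes, link_bits_per_second):
    """Time to serialize all LSPs back to back on the link."""
    total_bits = num_lsps * lsp_size_bytes * 8
    return total_bits / link_bits_per_second


def paced_time_seconds(num_lsps, interval_seconds):
    """Time to send all LSPs at one PDU per fixed interval."""
    return num_lsps * interval_seconds


NUM_LSPS = 1000
LSP_SIZE_BYTES = 1000      # "1KB" taken as 1000 bytes (assumption)
LINK_BPS = 400e9           # 400 Gb/s
PACING_INTERVAL = 0.033    # one PDU every 33 ms

burst = burst_time_seconds(NUM_LSPS, LSP_SIZE_BYTES, LINK_BPS)
paced = paced_time_seconds(NUM_LSPS, PACING_INTERVAL)
print(f"back to back: {burst * 1e6:.0f} us")
print(f"one PDU per 33 ms: {paced:.0f} s")
```

The two results differ by more than six orders of magnitude, which is the gap the rest of the discussion is trying to close.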

How can we then maximize our goodput?  We know that the receiver has a set of 
buffers and a processing rate that it can support. The processing rate will 
vary, depending on other loads.

What we would like the transmitter to do is to transmit enough to create a 
small processing queue on the receiver and then transmit at the receiver’s 
processing rate.
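
As a rough illustration of that goal, the sketch below simulates a sender that keeps a bounded number of LSPs outstanding while the receiver drains its queue at a fixed rate. Everything here is an assumption made for the example: time advances in discrete ticks, acknowledgements are instantaneous, the receiver rate is constant, and the parameter names are hypothetical. It is not a proposal for the actual mechanism.

```python
def simulate_flooding(num_lsps, window, receiver_rate, receiver_buffer):
    """Return (ticks_to_finish, dropped_pdus) for an idealized link.

    window          -- max LSPs the sender keeps outstanding (sent, not acked)
    receiver_rate   -- PDUs the receiver processes per tick
    receiver_buffer -- PDUs the receiver can queue before dropping
    """
    sent = acked = dropped = ticks = queue = 0
    while acked < num_lsps:
        ticks += 1
        # Sender: top up the outstanding LSPs to the window size.
        burst = min(window - (sent - acked), num_lsps - sent)
        for _ in range(burst):
            if queue < receiver_buffer:
                queue += 1
                sent += 1
            else:
                dropped += 1  # lost PDU; it is retried on a later tick
        # Receiver: process (and implicitly acknowledge) what it can.
        processed = min(queue, receiver_rate)
        queue -= processed
        acked += processed
    return ticks, dropped


print(simulate_flooding(1000, window=1, receiver_rate=10, receiver_buffer=64))
print(simulate_flooding(1000, window=20, receiver_rate=10, receiver_buffer=64))
print(simulate_flooding(1000, window=200, receiver_rate=10, receiver_buffer=64))
```

In this toy model a window of 1 leaves the receiver mostly idle, a window a little above the per-tick processing rate keeps it busy with no loss, and a window larger than the receiver's buffer finishes no sooner but causes drops.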

Can we agree on this goal?

Tony
