Tony –

This thread reminds me of how easy it is to miscommunicate – and I bear some of 
the responsibility for that.
Inline.

From: Tony Li <tony1ath...@gmail.com> On Behalf Of tony...@tony.li
Sent: Wednesday, July 24, 2019 1:07 PM
To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>
Cc: lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding

Les,

Ok, let me reset.  I’ve re-read your slides.

I don’t see anything in there about changing the PSNP signaling rate.  From 
your comments to Henk, I infer that you’re open to changing that rate.
[Les:] The proposal in the slides is simply an example/straw man. I did not 
spend a lot of time on it – in fact in the first draft of the slides I did not 
even provide a proposal. It certainly needs more refinement.
It is meant only to illustrate how we can do things w/o requiring the receive 
side to do calculations for which the raw data may be difficult to obtain and 
w/o requiring new TLVs.


As soon as you do that, you’re now providing receiver based feedback and 
creating flow control.  You’re accepting that rates will vary per interface.

[Les:] Yes – but only when we know that continuing to send at a high rate isn’t 
useful. It isn’t meant to fix things (as I keep emphasizing) and in a network 
that works as intended it should never be necessary.

What you’re NOT doing is providing information about the receiver’s input queue 
and requested input rate.  With less information, the transmitter can only 
approximate the optimal rate and your proposal seems like a Newton’s method 
approach to determining that rate.

[Les:] For all of the implementations I have worked on (5 now – across 3 
different vendors – not all still available 😊 ) such information is not easily 
determined. Buffer pools are shared among many components, input queues may 
have multiple stages not all of which are visible to the routing protocol. 
Plus, by the time flow control is needed there is already a problem, so this 
isn’t fixing things – it is just trying to get by.

A solution which depends on current receiver state “all the time” is hard – and 
hard to optimize. And I think we don’t need that degree of precision for 
optimal operation.


Your proposal depends on two constants: Usafe and Umax.  How do you know what 
those are?

[Les:] Not yet.

That’s information about the receiver.

[Les:] Happy to agree to that.

I infer that you propose to hard code some conservative values for these.  In 
my mind, that implies that you will be going more slowly than you could if you 
had more accurate data.  And pretty much what we’re proposing is that the 
receiver advertise this type of information so that we don’t have to assume the 
worst case.  This also is nice because an implementation only has to know about 
its own capabilities.

[Les:] I expect the values to be aggressive – because the downside of flooding 
LSPs too fast for (say) a few seconds is small.
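To make the straw man concrete: the sender could track how many flooded LSPs a 
neighbor has not yet covered with PSNPs and nudge its rate against the two 
thresholds. This is purely an illustrative sketch of that idea – the function, 
the multiplicative increase/decrease constants, and the parameter names are my 
own, not anything specified in the slides.

```python
def adjust_rate(rate, unacked, u_safe, u_max, min_rate, max_rate):
    """One control step for the sender's flooding rate (LSPs/sec).

    unacked: LSPs flooded but not yet acknowledged via PSNP by this neighbor.
    Below u_safe the receiver is clearly keeping up, so probe upward;
    above u_max it is falling behind, so back off sharply.
    The 1.25x / halving constants are illustrative placeholders.
    """
    if unacked < u_safe:
        rate = min(rate * 1.25, max_rate)   # receiver keeping up: speed up
    elif unacked > u_max:
        rate = max(rate / 2, min_rate)      # receiver falling behind: back off
    return rate                             # between thresholds: hold steady
```

Note the asymmetry: increase is gentle, decrease is aggressive, matching the 
view that backing off only happens once continuing at a high rate is known to 
be useless.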


Tony



On Jul 24, 2019, at 12:31 PM, Les Ginsberg (ginsberg) 
<ginsb...@cisco.com> wrote:

Tony –

I have NEVER proposed that the flooding rate be determined by the slowest node.
Quite the opposite.

Flooding rate should be based on the target convergence time and should be 
aggressive because most topology changes involve much fewer than 1000 LSPs 
(arbitrary number). So even with a slow node, fast flooding won’t be an issue for 
the vast majority of changes.

When we get a topology change with enough LSPs to expose the slowest node 
limitations we (in decreasing order of importance):

1) Continue to flood fast to those nodes/links which can handle it
2) Report the slow node to the operator (so they can address the limitation)
3) Do what we can to limit the overload on the slow node/link

Hope this helps.

   Les


From: Tony Li <tony1ath...@gmail.com> On Behalf Of tony...@tony.li
Sent: Wednesday, July 24, 2019 12:04 PM
To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>
Cc: lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding


Les,


Optimizing the throughput through a slow receiver is pretty low on my list 
because the ROI is low.


Ok, I disagree. The slow receiver is the critical path to convergence.  Only 
when the slow receiver has absorbed all changes and SPFed do we have 
convergence.


First, the rate that you select might be too fast for one neighbor and not for 
the others.  Real flow control would help address this.

[Les:] At the cost of convergence. Not a good tradeoff.
I am arguing that we do want to flood at the same rate on all interfaces used 
for flooding. When we cannot, flow control does not help with convergence. It 
may reduce some wasted bandwidth, but since we all agree that bandwidth isn’t a 
significant limitation, this isn’t a great concern.


Rate limiting flooding delays convergence.  Please consider the following 
topology:


1 —————— 2 —————— 3
|        |        |
|        |        |
4 —————— 5 —————— 6
|        |        |
|        |        |
7 —————— 8 —————— 9


Suppose that we have 1000 LSPs injected at router 1.  Suppose further that 
router 2 runs at half the rate of router 4.  [How router 1 knows this requires 
$DEITY and is out of scope for the moment.]

Router 1 now floods at the optimal rate for router 2.  Router 1 uses that same 
rate to flood to router 4.  Suppose that it takes time T for this to complete.

When does the network converge?

Option 1: All nodes use the same flooding rate.

Router 2 will flood to router 3 concurrent with receiving updates from router 
1. Thus, router 3 will receive all updates in time T + delta, where delta is 
router 2’s processing time.  For now, let’s approximate delta as zero.

Similarly, all routers will use the same rate, so router 4 will flood to 7 in 
time T + delta, and so on, with router 9 receiving everything in time T + 3 * 
delta.

Assuming no nodes SPF during the process, the network converges nearly 
simultaneously in about time T.

Option 2: We flood a bit faster where we can.

Suppose that router 1 now floods at the full rate to router 4.  The full update 
now takes time T/2.  Because all of the other nodes in the network are fast, 
router 4 floods in time T/2 + delta to nodes 5 and 7.  Carrying this forward, 
router 9 gets a full update in time T/2 + 3 * delta.  Even router 3 has full 
updates in T/2 + 3 * delta.

With the exception of node 2, the network has converged in half the time.  Even 
node 2 converges in time T.
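The arithmetic above is easy to sanity-check. The sketch below just encodes the 
pipelining assumption from the example (a node re-floods LSPs as they arrive, 
so each intermediate hop adds only delta and a path's time is its bottleneck 
link time plus per-hop deltas); T and delta are free parameters, and the 
specific values chosen here are arbitrary.

```python
T, delta = 1.0, 0.1  # arbitrary units: slow-link transfer time, per-hop delay

# Option 1: every link is paced to router 2's (half-speed) rate, so a
# pipelined stream takes T end to end plus delta per intermediate node.
opt1_r3 = T + 1 * delta   # 1 -> 2 -> 3          (one intermediate node)
opt1_r9 = T + 3 * delta   # 1 -> 4 -> 7 -> 8 -> 9 (three intermediates)

# Option 2: full rate (T/2 per link) everywhere except into router 2.
opt2_r9 = T / 2 + 3 * delta   # 1 -> 4 -> 7 -> 8 -> 9, bottleneck T/2
opt2_r3 = T / 2 + 3 * delta   # 1 -> 4 -> 5 -> 6 -> 3, routing around router 2
opt2_r2 = T                   # router 2 itself still absorbs at half rate
```

With delta near zero this reproduces the claim: Option 2 converges the whole 
network except router 2 in roughly T/2, versus T for Option 1.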

Key points:

1) Yes, the slow node delays convergence and causes micro-loops as everyone 
around it SPFs.  The point here (and I think you agree) is that slow nodes need 
to be upgraded.

2) There is no way for us to know how fast a node can go without some form of 
flow control, other than to go absurdly slowly.

3) There are many folks who want to converge quickly.  It is mission critical 
for them.  They will address slow nodes. They will not accept pessimal timing 
to avoid micro-loops.




[Les:] I do not see how flow control improves things.


Flow control allows the transmitter to transmit at the optimal rate for the 
receiver.




Dropping down to the least common denominator CPU speed in the entire network 
is going to be undoable without an oracle, and absurdly slow even with that.

[Les:] Never advocated that – please do not put those words in my mouth.


How is that different than what you’ve proposed?  Router 1 can only flood at 
the rate that it gets PSNPs from router 2.  That paces its flooding to router 
4.  Following that logic, you somehow want router 4 to run at the same rate, 
forcing a uniformly slow rate.
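The pacing described here can be pictured as a simple acknowledgement window: 
the sender stops flooding once some number of LSPs are outstanding toward the 
slow neighbor, and because one rate is used for all interfaces, every other 
neighbor is paced down to that neighbor's PSNP cadence. This is a minimal 
sketch of that behavior under an assumed fixed-window model; the class name, 
window mechanism, and API are illustrative, not from either proposal.

```python
from collections import deque

class PacedFlooder:
    """Flood LSPs toward a neighbor with at most `window` unacknowledged.

    If one slow neighbor gates a shared flooding rate, every interface
    sharing that rate is effectively paced by this neighbor's PSNPs.
    """
    def __init__(self, window):
        self.window = window
        self.pending = deque()   # LSPs queued for transmission
        self.unacked = 0         # sent but not yet covered by a PSNP

    def enqueue(self, lsp):
        self.pending.append(lsp)

    def sendable(self):
        """Return the LSPs that may be transmitted right now."""
        out = []
        while self.pending and self.unacked < self.window:
            out.append(self.pending.popleft())
            self.unacked += 1
        return out

    def on_psnp(self, acked):
        """A PSNP acknowledged `acked` LSPs; reopen the window."""
        self.unacked = max(0, self.unacked - acked)
```

The point of the sketch: no matter how deep the queue, nothing moves between 
PSNPs, which is exactly why the slow neighbor's acknowledgement rate becomes 
the flooding rate everywhere when a single shared rate is used.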

Tony

_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
