Jeffrey Haas wrote 2018-11-07 20:56:
I guess my question to those who live in IGP land is how often is this a problem? In the case of an IGP, the backpressure means you have databases that are out of sync and end up with bad forwarding.
As discussed below, if you have multiple flooding paths, and not all of them are congested or throttled, then convergence will be fast as soon as at least one copy of the LSP makes it across.
Both iBGP and eBGP. The two general issues are slow receivers (scale) or responses to dropped packets.
Slow receivers are a problem for native flooding too.
Although I suspect you mean: after congestion problems, when the situation improves, native flooding will recover quickly, while TCP might intentionally keep things slow for a longer period of time. Correct?
Yes, that is an issue.
The general experiment I recommend to people trying to do this sort of thing is: take your TCP stream of choice, pace it according to your transmission needs, and then drop 5-30% of the packets. Observe what happens.
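One way to run that experiment on a Linux box is with the netem qdisc; a minimal sketch, assuming Linux `tc`, where the interface name (`eth0`) and the 20% loss rate are placeholders you would adjust:

```shell
# Add 20% random packet loss on the outgoing interface (placeholder: eth0).
tc qdisc add dev eth0 root netem loss 20%

# Run the paced TCP transfer and watch retransmits/cwnd, e.g. with:
#   ss -ti dst <peer-address>

# Remove the impairment afterwards.
tc qdisc del dev eth0 root
```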
Don't we have DSCP for this?
And remember, the TCP connection will (almost) always be between two
directly connected endpoints.
TCP recovers fine. But the hiccups can do bad things when timeliness is expected. For example, 3 second hold times for aggressive BGP peering may time out.
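For reference, such an aggressive peering is typically configured with timers along these lines (IOS-style syntax as an illustration; the ASN and neighbor address are placeholders):

```
router bgp 64500
 ! 1-second keepalive, 3-second hold time
 neighbor 192.0.2.1 timers 1 3
```

Three missed keepalives during a TCP stall are then enough to tear the session down.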
Our proposal is to do only flooding over TCP.
Adjacency management is still done based on native IIHs (and BFD).
Even if TCP stalls the flooding, the adjacency should stay up.
With flooding over multiple paths, it should not be a fatal event.
I guess I'd restate my concern as "for this application, ensure that you're okay with the results of stalled transmission". Effectively, see the answer to the question I asked above about native behaviors.
[Flooding happens over multiple paths. As long as one path is quick, convergence will be quick too.]
This, I think, is a better point addressing my concerns. Thanks.
I expect there will be more issues that need to be addressed.
E.g. an old rule of thumb is: don't generate a routing update (packet) unless you are pretty sure you can send it right away, and the receiver can receive it. Otherwise a lot of the actual communication might be stale information.
I'm not sure if people find this rule of thumb still relevant these days. (I know people who do not.) With an abundance of CPU cores, memory and bandwidth, it seems many problems of the past are not visible anymore, unless you start pushing beyond what most people do. But if you do care, it is advisable to keep your TCP window sizes small. Maybe at the default 16KB, maybe even 8KB or 4KB. With a window size of 4KB you might still be able to send a dozen average-size LSPs, and those might get stuck/stale in TCP. But I think that's a good trade-off to get syncing of large LSDBs, as long as you don't set the window size to 64KB or larger. And maybe even then, stale LSPs might be less of a problem than old-timers think.
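The small-window advice above can be sketched at the socket level. A minimal illustration, assuming POSIX/Linux socket options; the helper name, the 4KB figure, and the ~350-byte average LSP size (chosen to match the "dozen LSPs in 4KB" estimate) are all assumptions, not from any draft:

```python
import socket

def make_flooding_socket(bufsize=4096):
    """Create a TCP socket with small kernel buffers, so that only a
    handful of LSPs can sit queued (and possibly go stale) in TCP.
    Helper name and 4KB default are illustrative assumptions."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set the buffers before connect(), so the window scale negotiated
    # in the SYN reflects the small buffer.  (Linux may round or double
    # the value you request.)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    return s

# With an assumed ~350-byte average LSP, a 4KB window holds about a dozen:
print(4096 // 350)  # 11
```

The point of the sketch is only that the buffer cap bounds how much un-acknowledged (and therefore potentially stale) flooding data TCP will hold on the sender's behalf.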
henk.
_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr