Well-informed discussion by now ;-) just one inline comment
> as tony (and others) have pointed out ... it will only work during
> nice-weather conditions ... as soon as you have the perfect storm, you will
> have micro-loops (and all the pain you try to protect yourself from) -
> therefore i'd be much more in favour of 'something else' which gets us to
> zero uloops, and for that i am afraid we need the notion of independent
> forwarding planes (a la make-before-break) -
>
> trying to synchronize RIB/FIB update across different CPUs/routing-
> stacks/vendors/load conditions is a battle that can never be won.
>
> [SLI] This is not a definitive solution for microloops, just a simple quick
> win to remove some of them.
> I'm still dreaming of a simple definitive solution ... but I don't see any
> ... oFIB or synchronized FIBs do not seem to have good support among
> implementations ...
> In the meantime, I do think there are small and fast areas of improvement,
> even if they are not definitive solutions.
[Tony said] The battle of synchronizing the nodes (and the network in a sense)
cannot be won since beside dealing with asynchronous networks (ah, the halcyon
days of TDM ;-) we are dealing here with hugely asynchronous systems within
networks (by now). The days of same single-core processor doing everything on
the side while running fast-path written in assembly by a pizza-fed wizard are
long, long past (but then, 200 msec before flooding something & other guy being
able to look @ it was considered quite normal ;-). Looking @ e.g. our
architecture I am having fun discussions along the lines "I gave you ACK but
you know, it only means I processed some of the stuff you gave me & then I have
to do all those other things until it's in FIB & fwd'ing but you surely don't
want me to wait until it's done before I ACK you. So, what does an 'ACK' really
mean here ?" ;-) We can get 'some' amount of better synchronization (and
again, we better not get too good @ it, a perfectly sync'ed network on LSA
refreshes or HELLOs is _not_ a fun thing to debug ;-) and that's
what this work will need to settle for. The important things in this work are
IMO:
* make sure you don't aim for _perfect_ synchronization but for some small
jitter to avoid network-wide Dirac pulses on your control planes. Even if you
don't, I'm pretty sure the asynchronous nature of today's architectures will
confound you. All kinds of hysteresis, like Hannes said, are built into all the
large systems today (packing, pipelining of async comms, state batching,
reordering of state updates to preserve FIB integrity, chip restrictions & so
on).
* allow the operator a knob for 'how many new LSAs and/or what timebound'
counts as a 'normal failure'. Those numbers may shift dramatically. If I'm
running large numbers of IGP shortcuts over a link, I may end up with tons of
stuff on my plate on a single phy link failure before I want my computation to
jump-start. Yes, 'normal' will probably be one link - jump @ it if you're @
cool temperature - but everybody is doing that today already pretty much, so
it's more 'what's bad enough to start to back off'.
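The two bullets above can be sketched as a jittered back-off timer with an
operator-configured 'normal failure' threshold. A minimal Python sketch -
the class, parameter names, and default values are all hypothetical
illustrations, not taken from any draft:

```python
import random

class SpfBackoff:
    """Hedged sketch of a jittered SPF back-off timer.

    All parameters are illustrative: the operator knob is expressed as
    'more than event_threshold new LSAs inside window_ms means this is
    not a normal failure, so back off to the slow timer'.
    """

    def __init__(self, initial_ms=50, slow_ms=2000, jitter_frac=0.2,
                 event_threshold=3, window_ms=1000):
        self.initial_ms = initial_ms            # delay for a 'normal' failure
        self.slow_ms = slow_ms                  # delay once things look bad
        self.jitter_frac = jitter_frac          # spread timers a little
        self.event_threshold = event_threshold  # how many LSAs = 'not normal'
        self.window_ms = window_ms              # timebound for counting LSAs
        self.events = []                        # timestamps (ms) of recent LSAs

    def _jitter(self, base_ms):
        # small random spread so routers don't all fire in lock-step
        # (avoids the network-wide 'Dirac pulse' on the control plane)
        return base_ms * (1.0 + random.uniform(-self.jitter_frac,
                                               self.jitter_frac))

    def delay_for_event(self, now_ms):
        """Return the delay before (re)starting SPF for an LSA seen at now_ms."""
        # keep only events inside the operator-configured window
        self.events = [t for t in self.events if now_ms - t < self.window_ms]
        self.events.append(now_ms)
        if len(self.events) < self.event_threshold:
            # 'normal failure': jump @ it quickly
            return self._jitter(self.initial_ms)
        # storm: bad enough to start to back off
        return self._jitter(self.slow_ms)
```

Setting jitter_frac to zero makes the sketch deterministic for testing; in
practice a non-zero jitter is the point, per the first bullet.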
Looking fwd to what will emerge as a practical proposed backoff algorithm here
--- tony
_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg