On Mon, 18 Aug 2025 at 21:22, Matthew Petach via NANOG
<nanog@lists.nanog.org> wrote:

> I don't know of many networks that choose link costs to ensure resulting
> uniqueness of the cumulative cost through the path.  Indeed, ECMP is taken
> to be an assumption for most IGPs we use in the real world.

That is funny, and of course we can beat Dijkstra massively if we can
make assumptions about specific environments, which is arguably what
engineering is: taking advantage of environment constants that allow
assumptions, which in turn yield optimisations.

How is SPF run today? I have no clue, because the modern approach to
convergence is not to converge fast, but to converge before the fault,
which is not something Dijkstra does on its own. The naive approach
would be to just run SPF many, many times, removing failed nodes and
edges from the topology to recover the post-convergence topology and
loop-free alternative paths.
But absolutely there exists some domain-specific solution which is
cheaper when you need to recover both the best current path and the
best post-convergence paths. Whether such an algorithm is actually
used, or whether the much more antifragile approach of throwing
compute at it and running SPF as many times as it takes is used, I
have no idea.
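
Something like this is what I mean by the naive approach, in rough
Python form (purely illustrative, single-link failures only, not how
any vendor actually implements it):

  import heapq

  def spf(graph, src):
      # graph: {node: {neighbour: cost}}; returns {node: (cost, first_hop)}
      dist = {src: (0, None)}
      pq = [(0, src, None)]
      while pq:
          cost, node, first_hop = heapq.heappop(pq)
          if cost > dist.get(node, (float("inf"), None))[0]:
              continue
          for nbr, w in graph[node].items():
              next_cost = cost + w
              fh = nbr if node == src else first_hop
              if next_cost < dist.get(nbr, (float("inf"), None))[0]:
                  dist[nbr] = (next_cost, fh)
                  heapq.heappush(pq, (next_cost, nbr, fh))
      return dist

  def post_convergence(graph, src):
      # Brute force: drop one link at a time, redo SPF, and keep the
      # result as the post-convergence answer for that failure.
      results = {}
      for a in graph:
          for b in graph[a]:
              pruned = {n: {m: c for m, c in adj.items()
                            if {n, m} != {a, b}}
                        for n, adj in graph.items()}
              results[(a, b)] = spf(pruned, src)
      return results

One full SPF for the current best paths, plus one per failure case you
care about; wasteful, but trivially correct and trivially parallel.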

In Junos a few years back they enabled this post-fault convergence
infrastructure out of the box, regardless of whether or not you choose
to install the backup paths.
How this is implemented in practice is that backup paths use the same
structure that ECMP uses; the backup path is simply programmed in the
hardware at a worse weight, so it is excluded as an ECMP option in the
lookup result. However, because the infrastructure is still enabled,
if for example an interface flaps, the hardware will invalidate the
best ECMP option, and the next-best (if any) becomes valid.
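
A toy model of that structure as I understand it (names, weights and
interfaces are mine, not Juniper's):

  from dataclasses import dataclass, field

  @dataclass
  class NextHop:
      interface: str
      weight: int          # lower is better; the backup gets a worse weight
      valid: bool = True   # cleared by "hardware" on link down, no SW needed

  @dataclass
  class NextHopGroup:
      entries: list = field(default_factory=list)

      def lookup(self, flow_hash):
          live = [e for e in self.entries if e.valid]
          if not live:
              return None                      # nothing left: blackhole
          best = min(e.weight for e in live)
          ecmp = [e for e in live if e.weight == best]
          return ecmp[flow_hash % len(ecmp)]   # ECMP only among best weight

  group = NextHopGroup([NextHop("et-0/0/0", 1), NextHop("et-0/0/1", 1),
                        NextHop("et-0/0/2", 10)])   # weight 10 = backup
  group.entries[0].valid = False    # one member flaps: ECMP on the rest
  group.entries[1].valid = False    # last primary flaps: backup takes over
  print(group.lookup(7).interface)  # prints et-0/0/2, no software involved

The point being that the lookup logic never changes; losing a link only
flips a valid bit, and the worse-weight entry becomes usable through
the same selection that was already programmed.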

In practice, what happened after Juniper enabled that infrastructure is
that we started to get a lot of bugs where, after a network event, we
had a blackholing event. These were largely caused by software omitting
to reprogram hardware: when something happens fast enough that software
hasn't yet had time to invalidate the best option, software will prune
the invalid+valid entries before they enter hardware. Which is a good
optimisation, unless you've now added the capability in hardware to
invalidate an adjacency without software.
To our surprise, the Junos code has accumulated so much technical debt
that Juniper doesn't actually know every place in the code where this
could happen. We raised a separate issue to figure out why so many
similar bugs were hitting us, and Juniper came back with an answer
which, paraphrased, was 'we just have to find all the bugs where this
can happen'. Naively you'd want all of these to go through one function
call, so you fix the bug once there, but apparently the codebase is far
less clean, so they cannot deterministically say whether all of those
cases are fixed or not.
In my experience it used to be super rare in Junos for HW and SW to
disagree, while it was extremely common in PFC3. We've not seen this
type of bug in a year or two, so maybe most are fixed.

But certainly, if you are running MPLS you can have 100% coverage for
all faults: if a post-convergence path exists, you can utilise it
immediately after hardware detects the fault (link down), without
waiting for software. This makes SPF performance quite uninteresting,
if rapid convergence is the goal.
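
Back-of-the-envelope, with made-up numbers (only the ordering matters):

  hw_detect_ms  = 0.05    # ASIC notices link down
  hw_repair_ms  = 0.0     # next lookup already lands on the backup entry
  spf_ms        = 200.0   # control plane recomputes whenever it gets to it
  fib_push_ms   = 500.0   # new primaries and new backups programmed later

  loss_window_ms = hw_detect_ms + hw_repair_ms   # what traffic actually sees
  background_ms  = spf_ms + fib_push_ms          # invisible to traffic

The SPF run only has to finish before the next fault, not before this
one, which is a very different performance target.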



-- 
  ++ytti
