On Tue, Oct 7, 2025 at 9:55 AM Martin Tonusoo via NANOG <
[email protected]> wrote:

> Hi.
>
> > 2. p3619 : "Then each new prefix will be propagated in parallel."
> >
> > Not really. Even if you assume the AS A sent a single UPDATE with 1 NLRI
> > for each prefix, ASes B C D are going to aggregate multiple NLRI changes in
> > a single UPDATE message to each other. This isn't going to cause the
> > amplification claimed.
>
> Perhaps the authors meant that each UPDATE message sent by AS A has
> unique path attributes, thus ensuring that ASes B, C and D cannot
> aggregate multiple NLRIs into a single UPDATE message.
>
>
> I tried to replicate the "BGP Vortices Delay Network Convergence" test
> demonstrated in paragraph 5.3. The setup (drawing:
> https://gist.github.com/tonusoo/1cced39aa6ae53143d12623a05f02331) is
> very similar to figure 4b on page 3621, but all my routers are
> running BIRD 3 (single thread mode). Router "rY" (ingress) injects a real
> BGP feed into the lab setup, router "rX" (upstream) periodically
> advertises and withdraws 50 routes, and router "rK" injects 5k prefixes
> for the BGP vortex. Running a packet capture on the Linux bridge
> connecting, for example, the "rN" and "rM" routers confirms that the
> BGP vortex is ongoing, and I'm seeing well over 10k UPDATE messages per
> second. I might be doing something wrong, but I don't see the
> delays shown in figure 5a on page 3622. That is, the 50 routes advertised
> or withdrawn by "rX" are propagated to "rZ" within a few hundred
> milliseconds and not delayed for 10+ seconds.
>


Looking at figure 6, the larger component appears to be the time between
when the BGP update message arrived at the bystander-AS and when FRR
finished logging the update message in its logs.  As the methodology
claims:

"By subtracting the time a route advertisement arrived at the bystander-AS
from when it was logged in the FRR’s BGP log, we computed the processing
time on the bystander-AS."

As someone who has dealt with logging of debugging output from programs
that need to be as real-time as possible, I can say that logging functions
are generally written to be asynchronous and separate from the main
processing path, so that delays in the logging subsystem don't hold up the
real work the program is doing.  Using the appearance of a log message as a
precise indicator of when a RIB update happened is handwavy at best, and
flat-out wrong at worst.  The timestamp at which the zlog subsystem of FRR
got the BGP update log message is unlikely to be the same timestamp at
which the RIB itself was updated.  Indeed, when I went looking into FRR
logging timestamps, this is what turned up:


   - Performance impact: Debug-level logging can significantly increase the
   load on the system and may not capture precise, real-time updates without
   impacting performance, especially for frequent RIB updates.
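To make the timestamp problem concrete, here is a minimal sketch of the
measurement.  It is a toy model, not FRR's actual logging code: it assumes
one worker updating the RIB, one slower asynchronous log writer, and that
the log line is timestamped when it is written rather than when it is
queued.  The names (Sample, simulate) and all the costs are made up for
illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    arrival_us: int   # UPDATE seen on the wire (what a packet capture records)
    rib_done_us: int  # RIB actually updated (what we want to measure)
    logged_us: int    # timestamp stamped on the log line by the log writer


def simulate(arrivals_us, rib_cost_us, log_cost_us):
    """Single RIB worker feeding a single, slower, asynchronous log writer."""
    samples = []
    rib_free_us = 0  # time at which the RIB worker is next idle
    log_free_us = 0  # time at which the log writer is next idle
    for arrival_us in arrivals_us:
        rib_done_us = max(arrival_us, rib_free_us) + rib_cost_us
        rib_free_us = rib_done_us
        # The log line is queued once the RIB is updated, but it is only
        # written (and, in this model, timestamped) when the writer gets to it.
        logged_us = max(rib_done_us, log_free_us) + log_cost_us
        log_free_us = logged_us
        samples.append(Sample(arrival_us, rib_done_us, logged_us))
    return samples


if __name__ == "__main__":
    # Ten updates arriving 1 ms apart; all costs are made-up numbers.
    for s in simulate(range(0, 10_000, 1_000), rib_cost_us=500, log_cost_us=2_000):
        true_us = s.rib_done_us - s.arrival_us
        measured_us = s.logged_us - s.arrival_us
        print(f"arrival={s.arrival_us:>5} true={true_us:>4} measured={measured_us:>6}")
```

With these made-up numbers the RIB is updated 500 us after every arrival,
yet "log time minus arrival time" grows from 2.5 ms for the first update
to 11.5 ms for the tenth, purely because the log writer falls behind.
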

So, you end up with a double-whammy: turning on debug logging to see the
logs for the routing updates significantly increases the load on the box
running FRR, which in turn slows down the rate at which it can process
incoming update messages.
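A rough way to see the per-update cost of formatting debug lines is a toy
loop like the one below.  This is only a sketch: Python string formatting
stands in for sprintf, the dict stands in for the RIB, and the function
names and the log line format are invented, not taken from FRR.  Absolute
timings will vary from machine to machine, so treat the output as a
relative comparison only.

```python
import time


def process_updates(prefixes, debug=False):
    """Install each prefix in a toy RIB; optionally format a debug line per update."""
    rib = {}
    log_lines = []
    for seq, prefix in enumerate(prefixes):
        rib[prefix] = seq  # stand-in for the real RIB update
        if debug:
            # Hypothetical log format; the formatting work is the point here.
            log_lines.append("%.6f toy debug: installed %s (seq %d)"
                             % (time.time(), prefix, seq))
    return rib, log_lines


def timed(prefixes, debug):
    """Return wall-clock seconds spent processing all prefixes."""
    start = time.perf_counter()
    process_updates(prefixes, debug)
    return time.perf_counter() - start


if __name__ == "__main__":
    prefixes = ["10.%d.%d.0/24" % (i // 256, i % 256) for i in range(50000)]
    print("debug off: %.4f s" % timed(prefixes, False))
    print("debug on:  %.4f s" % timed(prefixes, True))
```

The resulting RIB is identical either way; the only difference is the
extra formatting work done for every single update when debug is on.
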

I think we've all known for years the perils of turning on extensive debug
messages on routers.  How many of us have had the awkward moment of a
partner shaking us awake in bed saying, "What happened?  You were shouting
'undebug all!  undebug all!' in your sleep.  Were you having a nightmare?"

I suspect that if you turn on verbose debug logging on "rZ", you might
find that route updates to the RIB suddenly slow down noticeably.  This
has less to do with the actions of a route vortex, and much more to do
with hitting the CPU of your router over the head repeatedly with the
blunt hammer of sprintf.  ^_^;;

Matt
_______________________________________________
NANOG mailing list 
https://lists.nanog.org/archives/list/[email protected]/message/O6FEAOOIOT5VK6GGAN5EOEYSRS7HPOHZ/