Maxence,
Maria gave a lovely breakdown about some of the considerations for her
implementation. Among the details, the core one is hidden at the
bottom of the message: BIRD uses 8 bytes to store the timestamp.
Junos uses 4.
Fundamentally this is about how much memory you're willing to throw at
things that scale on a per-route basis. Junos has not only to scale up,
but over the years has had various needs to scale downward to run on
constrained systems. This means that each per-route scaled item is kept
as thin as possible.
Having hit one of the design choices, here are a few discussion points;
some will overlap the considerations Maria gave for her own implementation:
On 2/26/26 05:19, Maxence Younsi wrote:
> I checked the FRRouting code and it seems like they keep track of the uptime of the peers
> and paths as seconds with a monotonic clock. They then convert that monotonic time to
> "real clock time" and get some microseconds value that way. If I'm correct, I
> guess this is really only seconds granularity. This looks similar to what Juniper does.
One of the details you're seeing across implementations is that there's
a cost to getting a clock value you can trust. Where fetching the system
clock at the desired monotonic granularity means a system call,
implementations tend to treat it as an infrequent operation. A system
call's context switch may interrupt CPU-intensive work and cause the
daemon to yield to other daemons that are fighting for CPU during scaled
work. So, avoiding calls to fetch the time when it's "not really
important" is a common tradeoff.
This indirectly talks about daemon schedulers and also nods to the fact
that on unix-like environments, these are some of your design choices.
If you're on an RTOS, you may make different choices.
> We would also be interested in knowing whether vendors store different
> timestamps before and after inbound/outbound policy application. If they do,
> and if operators are interested in that, we can actually make Adj-RIB-In/Out
> Pre-policy and Adj-RIB-In/Out Post-policy timestamps, instead of the non
> specific Adj-RIB-In/Out timestamps we currently have. This would allow
> measuring the duration of policy computation.
> tl;dr:
> We kindly ask for feedback from vendors on their timestamp granularity. We
> would like to know:
> - What is the highest granularity of timestamp you can get?
> (seconds/ms/ns?)
> - What is the highest granularity of timestamp you actually store?
> (seconds/ms/ns?)
> - Do you store different timestamps for Adj-RIB-In Pre-policy and
> post-policy? Same question for Adj-RIB-Out.
As I'd already noted at the mic, even though we have access to
higher-granularity timestamps, Junos only bothers with second-level
granularity. This is enough, for reasons tied to your third question.
The rough pipeline for our implementation is a fairly common one:
- Pull the routes out of a buffer during a read callback.
- Do the work for import
- Create and insert the routes in the routing table, perhaps slightly
asynchronously depending on threading model. (Ours is flexible.)
For us, the timestamp is created when the route itself is created and
inserted. Time has obviously passed during the prior steps, which is
what you're inquiring about. Minimally this means we're "not tracking
the time it takes to create the route and run policy". While this
overhead is interesting from a performance perspective, accurately
measuring it at run time burns resources (memory) and may be disruptive
to the CPU - so we don't do it.
And somewhat similarly, we don't capture the gap between when routes
arrive on the wire and when they hit the later stations of the pipeline
described above. That gap depends not only on when the kernel decides
to hand off TCP data to the socket, but also on whether TCP has been
read in a fashion that delivers whole BGP frames. And that's before
asking whether TCP itself is operating in a way that transfers BGP
frames efficiently on the wire.
In my experience, managing the cadence of TCP interaction is far more
important to throughput of routes than many of the per-route costs noted
above.
-- Jeff
_______________________________________________
GROW mailing list -- [email protected]
To unsubscribe send an email to [email protected]