Hello Maxence,

TL;DR: … well it's complicated. See inline.

On Thu, Feb 26, 2026 at 11:19:15AM +0100, Maxence Younsi wrote:

> [...] u32 for seconds + u32 for microseconds.
> We then asked which format was easier for vendors to implement.
> 
> We learned that "Juniper's implementation only really keeps track of
> seconds, everything else is fake(d)". This means they don't really
> have a use for the microseconds field.
> 
> I checked the FRRouting code and it seems like they keep track of the
> uptime of the peers and paths as seconds with a monotonic clock. They
> then convert that monotonic time to "real clock time" and get some
> microseconds value that way. If I'm correct, I guess this is really
> only seconds granularity. This looks similar to what Juniper does.

BIRD has microsecond granularity for everything. Yet BIRD 2 updates its
internal timestamping clock only once in a while, so you may sometimes
be off by several seconds. (Except for BFD, which has its own clock.)

In BIRD 3, you get real microsecond granularity, yet I would be very
careful about interpreting anything from it – e.g. one thread may be
processing an older route export while another one is already adding a
new route.

Also, we have a large ARM machine (Ampere Altra, iirc) where
timestamping is so expensive that we can't get anything better than
about 50 us granularity.

> We would also be interested in knowing whether vendors store different
> timestamps before and after inbound/outbound policy application. If
> they do, and if operators are interested in that, we can actually make
> Adj-RIB-In/Out Pre-policy and Adj-RIB-In/Out Post-policy timestamps,
> instead of the non specific Adj-RIB-In/Out timestamps we currently
> have. This would allow measuring the duration of policy computation.

Our policy computation is usually sub-microsecond on export, unless the
user configures something crazy. We have measurements for this.

For import policy, the slowest thing in BIRD is often ROA, which may
eat around 100 us (depending on lots of factors).

Also, many parts of the route processing are impossible to attribute to
a specific route, as they are processed in large batches whenever
required. This also means that the measured data may carry quite a bit
of systematic error.

In the end, the question is what we are trying to measure and how much
computation time and memory we are willing to spend on it.

> tl;dr:
> We kindly ask for feedback from vendors on their timestamp granularity. We 
> would like to know:
>       - What is the highest granularity of timestamp you can get? 
> (seconds/ms/ns?)

For internal scheduling, we use nanoseconds but we use these timestamps
only as relative values. For routes, microseconds are feasible.

>       - What is the highest granularity of timestamp you actually store? 
> (seconds/ms/ns?)

Microseconds. And I'm kinda frowning upon the s/us split, as it means
we have to compute a division by a million every time we export a
timestamp to BMP. Division is costly.

Well, this reminds me that I should run a performance test on this to
see how costly this actually is in this case.

>       - Do you store different timestamps for Adj-RIB-In Pre-policy and 
> post-policy? Same question for Adj-RIB-Out.

Adj-RIB-In:
  - BIRD 2 stores pre-policy timestamps only if explicit adj-rib-in is
    configured
  - BIRD 3 does not and unless somebody explicitly requests it, we won't
    implement it
  - also please note that routes from adj-rib-in may get re-evaluated
    on RTR updates or when IGP recalculates the best route

Adj-RIB-Out:
  - if explicit table is configured, both BIRD 2 and BIRD 3 do store
    post-policy timestamps (but with different semantics)
  - please note that routes may get re-evaluated, and with that the
    timestamp difference becomes totally bonkers

> Optional questions:
> To define precisely what the timestamp types we would also like to know:
>       - Do timestamps stored in the RIB correspond to the last update on the 
> route attributes? 
>       - Do they correspond to the time of allocation of the route in the RIB?

Yes, basically both (the two differ by just a few instructions).

Also, it may happen that in BIRD 3.3.x or 3.4.x we implement deferred
best route selection, and then there will be even more uncertainty in
the timestamps … Right now I can think of these:

- attributes received
- next hop pre-resolved
- NLRI parsed
- pre-policy attributes stored
- filters done
- MPLS labels assigned
- next hop applied
- route inserted / withdrawn
- best route selected
- for each exporting protocol instance
  - notification received
  - route picked from queue
  - prefilter done (e.g. ignoring the BGP instance's own routes)
  - filters done
  - BGP postprocessing done
  - route queued for TX
  - route put into socket
  - socket ready for more TX

Huh. That's a lot. We store almost none of these, but all of them are
significant places in the route processing path. And I'm not even
thinking about more complicated setups involving EVPN and L3VPN route
translation or advanced route manipulation with multiple tables.

Also, please note that every timestamp is 8 bytes. At a route reflector
with 100 peers, storing just adj-rib-out pre-policy + post-policy
timestamps takes 2 * 8 * 100 * 1.3M ≈ 2G of memory for full IPv[6+4]
BGP tables.

Even if we decide to just send the timestamps and forget them instantly,
it's quite some more data into the firehose which BMP already is.

With all of that, it's compelling but massively complicated to do
right, and I'm afraid it's already kinda crossing the line where
implementation specifics matter more than is acceptable for RFC
standardization.

I hope that I haven't confused you too much with all this infodump.

Have a nice day!  
Maria

-- 
Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.
_______________________________________________
GROW mailing list -- [email protected]
To unsubscribe send an email to [email protected]
