I guess it might not fit with bird's abstractions (or perhaps the Babel 
protocol), but has thought been given to using SO_TIMESTAMPING to have the 
kernel compute TX/RX timestamps? 

- Erin


On Sat, 13 Apr 2024, at 16:14, Maria Matejka via Bird-users wrote:
> Hello Stephanie, Toke and list,
> 
> On Fri, Apr 12, 2024 at 04:22:50PM +0200, Toke Høiland-Jørgensen via 
> Bird-users wrote:
> 
>> Stephanie Wilde-Hobbs via Bird-users bird-users@network.cz writes:
>> 
>>> The babel RTT metric measurements provided by bird appears suspect for my 
>>> setup. The metric through a tunnel with a latency of about 5ms is shown in 
>>> babel as 150+ms.
>>> 
> […]
> 
>>> Debug logs show many RTT samples with approximately correct timestamps 
>>> (4-6ms) then the occasional IHU with 800-1200ms calculated instead. 
>>> Calculating the RTT metric by hand using babel packet logs shows that the 
>>> calculations are correct. By correlating two packet dumps (the machines 
>>> have <1ms NTP timekeeping) I can also see that the packets for which high 
>>> RTT is calculated have similar transit times through the tunnel as other 
>>> packets. Hence, I suspect the accuracy of the packet timestamps recorded by 
>>> bird. Is the current packet timestamping system giving correct timestamps 
>>> if the packet arrives while babel is processing another event?
>>> 
>> Hmm, so Babel implementation in Bird tries to get a timestamp as early as 
>> possible after receiving the packet, and set it as late as possible before 
>> sending out the packet. However, the former in practice means after 
>> returning from poll(), so if the packet has been sitting around in the OS 
>> buffer for a while before Bird gets around to process it, the timestamp is 
>> not set until Bird is done processing it. Likewise, if the packet sits 
>> around in a socket buffer (or in a lower-level buffer on the sending side) 
>> after Bird has sent it out, that time will also be counted as part of the 
>> RTT.
>> 
> I would suspect that the kernel table prune routine may be the case. It just 
> runs from begin to end synchronously.
> 
> I have just fast-tracked Babel in its own thread for BIRD 3, it may be worth 
> checking. (There should be also artifacts from the build process for download 
> available.) This should get you rid of most of the cases of suspiciously high 
> RTT.
> 
> `https://gitlab.nic.cz/labs/bird/-/tree/babel-in-threads`
> Just to be noted, updating a route in BIRD 3 is still a locking process so it 
> may still tamper the RTT measurements. At least it should happen only in 
> cases where Babel is doing the update. Anyway, with BIRD 3 internals, it 
> should be possible to easily *detect* such situations and disregard these 
> single measurements as unreliable. (Not implemented, though.)
> 
> There are even some thoughts on implementing lockless import queues for 
> routing tables, yet now we have to prioritize BIRD 3 stabilization to 
> actually release it as a stable version. Import queues must wait.
> 
> Also with this testing, feel free to report any weird behavior, notably 
> crashes of BIRD 3, as bugs. That would be very helpful with stabilizing BIRD 
> 3. Thanks a lot!
> 
> Maria
> 
> – Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.
> 

Reply via email to