Hi,

The babel RTT metric measurements provided by bird appear suspect for my setup. The RTT through a tunnel with a latency of about 5ms is shown in babel as 150+ms.

Can others replicate this issue? (should be easy to check for other babel users since RTT measurement is on by default in recent versions)

At first I suspected a problem with the tunnel, so I compared bird's babel RTT measurement against a long-running ping over the same time period. Bird's babel implementation measured ~160ms, while ping reported 4.6ms with a 28ms maximum latency through the same wireguard tunnel. Other machines across my network also report similarly inflated RTT metrics for all non-wired links.

Debug logs show many RTT samples with approximately correct values (4-6ms), then the occasional IHU for which 800-1200ms is calculated instead. Recomputing the RTT by hand from the babel packet logs shows that the arithmetic itself is correct. By correlating two packet dumps (the machines' clocks are NTP-synchronised to within 1ms) I can also see that the packets for which a high RTT is calculated have similar transit times through the tunnel to other packets. Hence, I suspect the accuracy of the packet timestamps recorded by bird. Does the current packet timestamping give correct timestamps if a packet arrives while babel is processing another event?
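
As a rough sketch of the hand calculation described above, assuming the timestamp arithmetic of the Babel RTT extension (RFC 9616): RTT = (t2 - t1) - (t2' - t1'), with 32-bit microsecond timestamps that wrap. The function and variable names and the sample values are hypothetical and chosen only for illustration; they are not taken from bird's source or from my logs.

```python
MOD = 1 << 32  # assumption: timestamps are 32-bit microsecond counters that wrap


def rtt_sample_us(origin, receive, transmit, local_rx):
    """Return one RTT sample in microseconds.

    origin   -- t1:  our clock when we sent the Hello (echoed back in the IHU)
    receive  -- t1': neighbour's clock when it received that Hello
    transmit -- t2': neighbour's clock when it sent the packet carrying the IHU
    local_rx -- t2:  our clock when we timestamped the incoming packet
    """
    local_elapsed = (local_rx - origin) % MOD   # t2 - t1, measured on our clock
    remote_hold = (transmit - receive) % MOD    # t2' - t1', measured on their clock
    return local_elapsed - remote_hold


# Hypothetical sample: neighbour holds the reply for 100 ms, path RTT is 5 ms.
on_time = rtt_sample_us(1_000_000, 50_002_500, 50_102_500, 1_105_000)

# Same packet, but our receive timestamp is taken 1 s late
# (for example because the daemon was busy with another event).
late = rtt_sample_us(1_000_000, 50_002_500, 50_102_500, 2_105_000)

print(on_time, late)  # 5000 1005000
```

Under these assumptions any delay in recording t2 is added directly to the sample, so a packet that crosses the tunnel in normal time can still produce a sample of around a second if it is timestamped late.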

I can provide packet captures for anyone interested in debugging further.

Thanks,
Stephanie.
