I'll turn this into a web page, but this is what I have now. Corrections/feedback encouraged. Off-list is fine.
The place to start is a system's loopstats file. This is from a low-cost DigitalOcean cloud server in San Francisco.
  http://users.megapathdsl.net/~hmurray/ntpsec/SFO-self.png
That is the system's opinion of how good its clock is.

There are two types of errors to consider. The first is the wiggles in that graph. They tell you how stable the local clock is. In this case, except for a few spikes early on, the system mostly thinks it is within 1/2 ms of the correct time. So as long as we are interested in millisecond accuracy rather than microseconds, this system is probably a good place to stand while looking at other servers and/or the internet connections from here to there.

The other type of error is systematic errors, for example, using the wrong edge of a PPS pulse or asymmetric network delays. They don't show up in loopstats. You can't detect them without digging deeper.

Both types of errors are something you need to keep in mind when looking at graphs.

After the typical request-response packet exchange, an NTP client has 4 time stamps:
  The time the request left the client
  The time the request arrived at the server
  The time the response left the server
  The time the response arrived at the client
Note that two different clocks are used to make those time stamps, either of which may be inaccurate.

NTP servers also act as clients to get their time from lower stratum servers.

ntpd logs those time stamps in the rawstats file. If you use the "noselect" option on a "server" line in your config file, you can collect info without letting dirty data corrupt your local clock.

Here is a graph of the round trip times from San Francisco to several servers on the east coast:
  http://users.megapathdsl.net/~hmurray/ntpsec/SFO-east-rtt.png
The steps in the green and red dots are due to routing changes. The fuzz on the blue dots is queuing delays on some overloaded link. The cap on the fuzz indicates that the overloaded link has 10 ms of buffering.

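For anyone who wants to redo the arithmetic, here is a minimal Python sketch of what can be computed from the four time stamps listed above. The formulas are the standard NTP ones. The function names and the sample numbers are made up for illustration, and this is not the script used to make the graphs. The offset calculation assumes the network delays are symmetrical; the one-way calculation assumes both clocks are accurate.

```python
# t1: request left the client      (client clock)
# t2: request arrived at server    (server clock)
# t3: response left the server     (server clock)
# t4: response arrived at client   (client clock)
# All four values must be in the same units (microseconds in the example).


def round_trip(t1, t2, t3, t4):
    """Network round trip time: elapsed time at the client minus the
    time the server spent holding the packet."""
    return (t4 - t1) - (t3 - t2)


def offset_symmetric(t1, t2, t3, t4):
    """Server clock minus client clock, assuming the outgoing and
    return network delays are equal."""
    return ((t2 - t1) + (t3 - t4)) / 2


def one_way_delays(t1, t2, t3, t4):
    """(out, back) transit delays, assuming both clocks are accurate."""
    return t2 - t1, t4 - t3


# Hypothetical sample exchange, in microseconds.
sample = (0, 45000, 46000, 81000)
rtt = round_trip(*sample)
offset = offset_symmetric(*sample)
out, back = one_way_delays(*sample)
```

Note that out is rtt/2 + offset and back is rtt/2 - offset, so the same data can be read either as a clock offset or as an asymmetric path. The time stamps alone can't tell you which.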
There are a few scattered red dots. The ones that indicate extra delays are typical network glitches. I don't have a good story for the ones at 14 and 15 hours that indicate reduced time. My guess would be a transient network path that was a few ms shorter but didn't happen often enough to show up clearly.

Normally, ntpd assumes that the network delays are symmetrical. That lets it compute the offset between the local clock and the remote clock. Here is a graph of the results of that calculation:
  http://users.megapathdsl.net/~hmurray/ntpsec/SFO-east-off.png

If, instead, you assume that both clocks are accurate, you can compute the network transit delays in each direction. I picked well-run servers for this experiment, so that assumption is probably valid. The limiting factor is probably the ms or so of error on the local clock.

Here is a graph of the delays to/from rackety:
  http://users.megapathdsl.net/~hmurray/ntpsec/SFO-rackety-out-back.png
That shows that the congestion is on the return path. It also shows that the return path takes about 5 ms longer than the forward path.

Here is the out/back graph for the NIST systems:
  http://users.megapathdsl.net/~hmurray/ntpsec/SFO-nist-out-back.png
The first thing to notice is that the outgoing path takes over twice as long as the return path. Going back to the round trip time graph, it's suspicious that systems located relatively near each other have such large differences in round trip times. The return times are close to the times to/from rackety.

Note that there are only a few steps in the bottom/return path, and the steps in the top/forward path match the steps in the round trip time, so most of the routing changes are on the long forward path.

There is an interesting event associated with time-d from 17.5 to 18.5 hours. Note that the out/back steps are mirror images of each other and that there is no change in the round trip time during that time slot. That would happen if the time on the remote system was offset.
It could also happen with some unlikely changes in routing.

Here is the round trip time graph for the nearby clocks used as references by this system:
  http://users.megapathdsl.net/~hmurray/ntpsec/SFO-local-rtt.png
And the corresponding offset graph:
  http://users.megapathdsl.net/~hmurray/ntpsec/SFO-local-off.png
The routing to all 3 clocks is stable, but something is off by 1/2 ms.

Here is the out/back graph:
  http://users.megapathdsl.net/~hmurray/ntpsec/SFO-local-out-back.png
(I dropped one of the HP clocks to reduce clutter.)

The mirror image pattern is due to offsets/errors in the local clock. (It could be due to errors in the remote clocks, but all 3 have GPS/PPS inputs and the return paths all agree.)

Note the 1/2 ms offset between the two out times. In order to figure out which clock/path is correct, I'll have to find at least one more good clock. (The 2 clocks at HP are on the same subnet, so they only get one vote.)

--
These are my opinions. I hate spam.