Re: [ntp:questions] what's the matter with my ntp

Mike Cook Sat, 29 Aug 2015 23:49:41 -0700

> On 30 Aug 2015, at 05:26, Brian Inglis <brian.ing...@systematicsw.ab.ca> wrote:
> 
> On 2015-08-29 15:23, Mike Cook wrote:
>>> On 29 Aug 2015, at 17:41, Charles Swiger <cswi...@mac.com> wrote:
>>> On Aug 29, 2015, at 1:03 AM, Mike Cook <michael.c...@sfr.fr> wrote:
>>>> For me this would not be considered acceptable. I would NEVER have ntp 
>>>> servers in virtual machines.
>>> +1 to this point.
>>>> Your maxpoll default of 10 (1024s) for the pool servers is too high. Try 
>>>> dropping it to 6 (64s) or 7 (128s).
>>> -1 to this.
>> Experience tells me otherwise.
>> Here’s a test:
>> 2 clients, same hardware, same LAN segment, same version NTP, same ntp 
>> config, with exception of their own GPS PPS ref clock.
>> The 2 clients reference 4 identical servers with the same versions of ntpd, 
>> all 4 have GPS PPS ref clocks in addition to their PPS refs and preferred 
>> GPS sync’d server.
>> All servers are on the same LAN segment as the clients. These servers are 
>> configured with maxpoll 10 on bb2, and 6 on bb3.
>> ntpq -pn data
>> bb2
>> Sat Aug 29 20:59:19 UTC 2015
>>      remote           refid      st t when poll reach   delay   offset  jitter
>> ==============================================================================
>> o127.127.22.0    .Neo6.           0 l   13   16  377    0.000    0.002   0.002
>> *192.168.1.4     .PPS1.           1 u    9   16  377    0.712    0.119   0.015
>> -192.168.1.15    .PA6H.           1 u  583 1024  377    1.163    0.358   0.056
>> -192.168.1.16    .MAX7.           1 u  597 1024  377    1.260    0.369 189.435
>> +192.168.1.17    .ResT.           1 u  493 1024  377    1.261    0.312   0.040
>> +192.168.1.18    .Neo8.           1 u  600 1024  377    1.136    0.250   0.088
>> bb3
>> Sat Aug 29 20:57:18 UTC 2015
>>      remote           refid      st t when poll reach   delay   offset  jitter
>> ==============================================================================
>> o127.127.22.0    .NS-T.           0 l   12   16  377    0.000    0.003   0.002
>> *192.168.1.4     .PPS1.           1 u   10   16  377    0.591    0.064   0.029
>> +192.168.1.15    .PA6H.           1 u   10   16  377    0.437    0.021   0.037
>> -192.168.1.16    .MAX7.           1 u    9   16  377    0.583    0.074   0.029
>> +192.168.1.17    .ResT.           1 u   10   16  377    0.448    0.017   0.030
>> +192.168.1.18    .Neo8.           1 u   10   16  377    0.593    0.055   0.022
>> 
>> You can see that the maxpoll 10 servers seen from bb2 have delays more than 
>> twice those of the same servers seen from bb3.
>> Reported offset on bb2 is up to 5 times that seen from bb3.
>> Reported jitter (ntpd’s error bars) is in 3 cases of the same order, though 
>> greater, but in one case is completely anomalous on bb2.
>> 
>> I have not made a complete analysis of this and offer the following as a 
>> probable cause.
>> Long intervals between client-server exchanges allow ARP cache entries to 
>> expire on clients, servers, and routers, forcing a fresh ARP resolution (who 
>> has IP, I have IP, here’s my MAC) before the next exchange.
>> This extra path delay will possibly be asymmetric, so offset and jitter 
>> will be increased.
>> Here are the clients' ARP caches to show what I mean:
>> bb2
>> mike@bb2:~$ arp -a
>> livebox-router (192.168.1.1) at 3c:81:d8:db:4e:b4 [ether] on eth0
>> muon.stratum1.d2g.com (192.168.1.4) at 00:00:24:c6:20:60 [ether] on eth0
>> electron.home (192.168.1.13) at 34:15:9e:01:e5:9c [ether] on eth0
>> cubieez.stratum1.d2g.com (192.168.1.124) at 02:cd:07:c1:82:5f [ether] on eth0
>> mike@bb2:~$ exit
>> bb3
>> mike@bb3:~$ arp -a
>> cubieez.stratum1.d2g.com (192.168.1.124) at 02:cd:07:c1:82:5f [ether] on eth0
>> livebox-router (192.168.1.1) at 3c:81:d8:db:4e:b4 [ether] on eth0
>> raspb3.stratum1.d2g.com (192.168.1.17) at b8:27:eb:71:05:50 [ether] on eth0
>> raspb2.stratum1.d2g.com (192.168.1.16) at b8:27:eb:2b:ab:90 [ether] on eth0
>> raspb4.stratum1.d2g.com (192.168.1.18) at b8:27:eb:fe:15:fa [ether] on eth0
>> electron.home (192.168.1.13) at 34:15:9e:01:e5:9c [ether] on eth0
>> muon.stratum1.d2g.com (192.168.1.4) at 00:00:24:c6:20:60 [ether] on eth0
>> raspb1.stratum1.d2g.com (192.168.1.15) at b8:27:eb:7e:0b:f2 [ether] on eth0
>> So in certain application environments using long poll intervals IS 
>> detrimental.
>> I am going to run the same configs overnight with the net loaded to see what 
>> difference that makes.
> 
>>> Unless your hardware is broken, it shouldn't need to ask what the time is 
>>> every minute just to manage decent timekeeping accuracy.  Admittedly, 
>>> running on a VM often resembles running on real hardware with a broken 
>>> clock, but to me that in turn means one should be running ntpd in the 
>>> hypervisor so that it is managing a real HW clock and let the VMs inherit 
>>> time from the hypervisor / Dom0 / etc.
> 
> Reports indicate min/maxpoll 4 improve results on LAN segments,
> minpoll 6 improves remote network server offsets, maxpoll 10
> improves system clock frequency and offset estimation using only
> remote network servers.
> I have seen the latter pull Windows systems within 10-100us of UTC,
> much better than the few to tens of ms expected on this platform.


Here are the results of my experiment with a loaded network (just pings between 
clients and servers at periods shorter than the maxpoll intervals).

bb2 
Sun Aug 30 05:37:32 UTC 2015
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
o127.127.22.0    .Neo6.           0 l    2   16  377    0.000    0.002   0.002
*192.168.1.4     .PPS1.           1 u   14   16  377    0.726    0.116   0.026
-192.168.1.15    .PA6H.           1 u 1002 1024  377    0.751    0.059   0.030
+192.168.1.16    .MAX7.           1 u  192 1024  377    0.733    0.115   0.117
-192.168.1.17    .ResT.           1 u  913 1024  377    0.717    0.015   0.035
+192.168.1.18    .Neo8.           1 u  108 1024  377    0.682    0.033   0.025
bb3
Sun Aug 30 05:37:43 UTC 2015
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
o127.127.22.0    .NS-T.           0 l    5   16  377    0.000    0.005   0.002
*192.168.1.4     .PPS1.           1 u    3   16  377    0.586    0.056   0.021
+192.168.1.15    .PA6H.           1 u    3   16  377    0.404    0.009   0.014
-192.168.1.16    .MAX7.           1 u    2   16  377    0.605    0.086   0.041
+192.168.1.17    .ResT.           1 u    3   16  377    0.444    0.013   0.027
+192.168.1.18    .Neo8.           1 u    3   16  377    0.541    0.065   0.022

So now we see that the figures on both clients are very similar, though the 
lower maxpoll still has the edge.
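
To put a number on "the edge", here is a rough Python sketch that averages the 
delay, offset and jitter columns for the four LAN servers (192.168.1.15 to .18) 
from the two loaded-network listings above. The figures are copied by hand from 
those listings; the helper name column_means is only illustrative, and a mean 
over four single samples is a crude summary, not a proper analysis.

```python
# Per-server figures in ms, copied from the loaded-network ntpq -pn listings:
# (delay, offset, jitter) for the four LAN servers only.
bb2 = {  # poll column shows 1024 s for these servers
    "192.168.1.15": (0.751, 0.059, 0.030),
    "192.168.1.16": (0.733, 0.115, 0.117),
    "192.168.1.17": (0.717, 0.015, 0.035),
    "192.168.1.18": (0.682, 0.033, 0.025),
}
bb3 = {  # poll column shows 16 s for these servers
    "192.168.1.15": (0.404, 0.009, 0.014),
    "192.168.1.16": (0.605, 0.086, 0.041),
    "192.168.1.17": (0.444, 0.013, 0.027),
    "192.168.1.18": (0.541, 0.065, 0.022),
}


def column_means(rows):
    """Return the mean absolute (delay, offset, jitter) over all servers."""
    n = len(rows)
    return tuple(sum(abs(r[i]) for r in rows.values()) / n for i in range(3))


means_bb2 = column_means(bb2)
means_bb3 = column_means(bb3)

for name, a, b in zip(("delay", "offset", "jitter"), means_bb2, means_bb3):
    print(f"{name:6s} bb2={a:.3f} ms  bb3={b:.3f} ms  ratio={a / b:.2f}")
```

On these samples all three means come out lower on bb3 than on bb2.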
I was not monitoring the ARP cache over the period, but here are some timelines for bb2.

The anomalous jitter figure from 192.168.1.16 was absorbed by 22:52 UTC, 2h20m 
after it appeared and about 60 min into this last experiment.
There were no other instances of strange jitter.
The delay figure dropped 300us after just 5 min of adding the network load.
Offset and jitter fell to current levels at about 23:48 UTC. A quick calculation 
makes that 7 poll cycles.
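
The quick calculation can be reproduced as follows. This is only a sketch: the 
start time of the network load (about 21:52 UTC) is an assumption, inferred from 
22:52 UTC being "about 60 min into" the experiment, and the poll interval is 
taken as 2^maxpoll = 2^10 = 1024 s as shown in the poll column for bb2.

```python
from datetime import datetime, timezone

MAXPOLL = 10                    # poll exponent bb2 uses for the LAN servers
poll_interval_s = 2 ** MAXPOLL  # 1024 s

# Assumption: the network load started about 60 min before 22:52 UTC.
load_start = datetime(2015, 8, 29, 21, 52, tzinfo=timezone.utc)
# Offset and jitter reached their current levels at about this time.
settled = datetime(2015, 8, 29, 23, 48, tzinfo=timezone.utc)

elapsed_s = (settled - load_start).total_seconds()
poll_cycles = elapsed_s / poll_interval_s

print(f"{elapsed_s:.0f} s elapsed = {poll_cycles:.1f} poll cycles")
# prints "6960 s elapsed = 6.8 poll cycles"
```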

I have described the setup: this is a very small group of micro systems which 
are themselves used only for time-related experiments. The real world is 
somewhat more cluttered, so YMMV.
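
For anyone repeating this while watching the ARP cache, here is a minimal 
Python sketch that reports which NTP servers are absent from a client's cache. 
It is run here against the bb2 `arp -a` output quoted earlier. The helper names 
(cached_ips, missing_from_cache) are hypothetical, and the regular expression 
assumes the Linux `arp -a` line format shown in this thread.

```python
import re

# Matches lines such as
# "host (192.168.1.4) at 00:00:24:c6:20:60 [ether] on eth0".
# Entries without a resolved MAC address are deliberately not matched.
ARP_LINE = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\) at ([0-9a-f:]{17})", re.I)


def cached_ips(arp_output):
    """Return the set of IPv4 addresses that have a MAC in the ARP cache."""
    return {m.group(1) for m in ARP_LINE.finditer(arp_output)}


def missing_from_cache(servers, arp_output):
    """Return the servers (in order) that have no entry in the ARP cache."""
    cached = cached_ips(arp_output)
    return [ip for ip in servers if ip not in cached]


# The five network servers from the ntpq listings.
servers = ["192.168.1.4", "192.168.1.15", "192.168.1.16",
           "192.168.1.17", "192.168.1.18"]

# bb2's cache as quoted earlier in this thread.
bb2_arp = """\
livebox-router (192.168.1.1) at 3c:81:d8:db:4e:b4 [ether] on eth0
muon.stratum1.d2g.com (192.168.1.4) at 00:00:24:c6:20:60 [ether] on eth0
electron.home (192.168.1.13) at 34:15:9e:01:e5:9c [ether] on eth0
cubieez.stratum1.d2g.com (192.168.1.124) at 02:cd:07:c1:82:5f [ether] on eth0
"""

print(missing_from_cache(servers, bb2_arp))
# prints ['192.168.1.15', '192.168.1.16', '192.168.1.17', '192.168.1.18']
```

The four servers missing on bb2 are exactly the ones it polls every 1024 s. 
To monitor a live system, the same function could be fed the output of `arp -a` 
captured at regular intervals.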

Mike

> 
> -- 
> Take care. Thanks, Brian Inglis
> _______________________________________________
> questions mailing list
> questions@lists.ntp.org
> http://lists.ntp.org/listinfo/questions

"Those who would give up essential Liberty, to purchase a little temporary 
Safety, deserve neither Liberty nor Safety."
Benjamin Franklin