We are experiencing an issue using Orphan mode and peering in our ntpd 4.2.6p4
set-up. With the loss of our stratum 1 time hosts, the stratum 2 servers do not
properly choose a primary time provider. Below is the ntp.conf used on all 4 of
the stratum 2 servers:
tinker step .010 stepout 60 panic 0
server tfds1 prefer minpoll 4 maxpoll 5 burst iburst
server tfds2 minpoll 4 maxpoll 5 burst iburst
server tfds3 minpoll 4 maxpoll 5 burst iburst
peer timehost1 minpoll 4 maxpoll 5 burst iburst
peer timehost2 minpoll 4 maxpoll 5 burst iburst
peer timehost3 minpoll 4 maxpoll 5 burst iburst
peer timehost4 minpoll 4 maxpoll 5 burst iburst
tos orphan 4
driftfile /etc/ntp/drift
When the stratum 1 hosts (tfds[1-3]) are lost, the stratum 2 hosts
(timehost[1-4]) attempt to peer with one another. However, instead of all
staying at stratum 4, as we saw with ntpd 4.2.4p7 (we have other issues with
4.2.4p7 and need to update), each peer drops to 1 stratum below the peer it
locks to. Since they are peering with one another, the timehosts slowly drop in
stratum as each tries to stay 1 stratum below the host it is locked to. They
continue to drop until they reach stratum 16. Once they hit stratum
16, all other hosts disconnect and the peers previously locking to the now
stratum 16 host will unlock and jump back to a stratum 4. Once at least 1 peer
jumps back to 4, the others will begin jumping to stratum 4-5. This process
will repeat itself until the stratum 1 hosts are reconnected or the timehosts
choose a primary. We have only once seen it stabilize with all 4 hosts and it
took almost a full 24 hours to do so. With only 3 timehosts running, they will
stabilize within minutes.
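
To make the cycle described above concrete, here is a rough toy model of it in
Python. It is only a sketch of the behavior we observe, not ntpd's actual
selection algorithm. It assumes (hypothetically) that the hosts lock to one
another in a fixed ring, that every host updates once per round, and that a
host falls back to the orphan stratum as soon as the host it follows reaches
stratum 16. The names step, simulate and follows are made up for illustration.

```python
ORPHAN_STRATUM = 4    # from "tos orphan 4" in the ntp.conf above
UNSYNC_STRATUM = 16   # stratum 16 means "unsynchronized" in NTP


def step(strata, follows):
    """Advance the toy model by one round.

    strata[i]  is the current stratum of host i.
    follows[i] is the index of the host that host i is locked to
               (an assumption of this model, fixed for the whole run).
    """
    new_strata = []
    for src in follows:
        if strata[src] >= UNSYNC_STRATUM:
            # The followed host is unsynchronized: unlock and
            # fall back to the orphan stratum.
            new_strata.append(ORPHAN_STRATUM)
        else:
            # Otherwise sit one stratum below the followed host.
            new_strata.append(min(strata[src] + 1, UNSYNC_STRATUM))
    return new_strata


def simulate(n_hosts=4, rounds=14):
    """Return the list of per-round strata, starting with round 0."""
    # Assumed topology: host i locks to host i+1, wrapping around.
    follows = [(i + 1) % n_hosts for i in range(n_hosts)]
    strata = [ORPHAN_STRATUM] * n_hosts
    history = [strata]
    for _ in range(rounds):
        strata = step(strata, follows)
        history.append(strata)
    return history


if __name__ == "__main__":
    for rnd, strata in enumerate(simulate()):
        print(rnd, strata)
```

In this model all hosts start at stratum 4, drift down together to stratum 16,
and then jump back to 4, after which the cycle repeats. The real hosts do not
move in lock-step like this, but the overall pattern matches what we see.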
From what we are able to tell, a primary peer is chosen when 3 of the 4
timehosts lock to the same peer. When the 4th peer sees that the others are
all connected to it, it syncs to its internal clock and remains a stratum 4.
Is this correct, or is something else going on here?
Further questions:
Are the peers intentionally dropping below the orphan mode set stratum, or is
that a bug?
Are we missing anything in ntp.conf to make orphan mode work properly?
Is this possibly just a limitation on the number of peers?
If working as intended, is there a way to force a primary peer quicker?
Note: We have tested without burst/iburst on the peer declarations, and also
with each host's peer declaration for itself removed. Neither modification had
an impact.
Thanks,
Matt