Ted Beatie wrote:
  server <one or more servers, external or internal>
  server <one or more other gateways, using the back-end addresses>

Add iburst to the end of each server line. This speeds up synchronization.
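As a sketch, a server section with iburst added might look like this (the hostnames and addresses here are placeholders, not from the actual config):

```
server ntp1.example.com iburst
server ntp2.example.com iburst
server 10.16.100.2      iburst   # internal gateway, back-end address
```

With iburst, ntpd sends a burst of packets at the first poll instead of a single one, so the initial exchange completes in seconds rather than many minutes.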


To all of the server lines, or just the internal-to-our-system servers?


All.


  server <two or more gateways, using the back-end addresses>

Three servers are an absolute minimum, because with only two there is no way of knowing which one is providing better information. (Let's leave aside what "better" means; it's a very complicated subject.)
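A minimal sketch of a config that satisfies this, using pool hostnames as stand-ins for whatever real servers are available:

```
# Three sources is the minimum for ntpd's selection algorithm to
# out-vote one bad source; four also tolerates one being unreachable.
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
server 3.pool.ntp.org iburst
```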


As I mentioned to Tom, what if we can't guarantee that?  As near as I
can tell, while more is better, the only hard requirement is for one
server.  In some cases we're lucky if we get even one, so we either
need to trust that one, or we need to set the time manually.


Should I assume that you have no control over these systems? Whose requirement?


Based on the above, the internal NTP server has a stratum of 2 and will almost always be preferred over a stratum-4 server. Is that internal NTP server getting its time from a stratum-1 server, and is it internal or external?


It is internal, and it looks like it gets its time from other internal machines:

portal-01:~# ntptrace -n
127.0.0.1: stratum 3, offset 0.000006, synch distance 15.20248
10.16.4.1: stratum 2, offset -2.558634, synch distance 1.00000
10.16.4.100: stratum 2, offset -2.571121, synch distance 1.00000
10.16.100.2: stratum 2, offset -2.520537, synch distance 0.04373
132.163.4.101:  *Timeout*

That means that 10.16.100.2 is not actually getting time from anywhere and is currently isolated. It can't reach the Boulder timeserver.


By obfuscating the addresses it's hard to know whether you've also removed the tally codes, which indicate what gateway1 thinks of each server. Since you are using private address space here, it really doesn't matter if the addresses are seen. If you don't want to show the names, just add -n and ntpq won't resolve the IP addresses to hostnames.


As I mentioned in the post, the tally codes were spaces.

portal-01:~# ntpq -nc pe localhost
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.16.4.1       10.16.4.100      2 u   40   64  377    0.280  -2558.0   4.447
 10.123.123.2    10.123.123.1     4 u  810 1024  377    0.172  -1849.0   2.014
 10.123.123.3    0.0.0.0         16 u  679 1024    0    0.000    0.000 4000.00


That means that it's not synchronized, and it hasn't even decided how valid or invalid each of the servers is.
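One quick way to see how far along it is: the reach column in that output is an octal bitmask of the last eight polls. A small bash sketch to decode it (the values 377 and 0 are taken from the listing above):

```shell
#!/bin/bash
# Decode ntpq's octal "reach" field: each bit is one of the
# last 8 polls, with the most recent poll in the low-order bit.
decode_reach() {
    dec=$((8#$1))               # octal -> decimal
    bits=""
    for i in 7 6 5 4 3 2 1 0; do
        bits="$bits$(( (dec >> i) & 1 ))"
    done
    echo "$bits"
}

decode_reach 377   # all of the last 8 polls answered -> 11111111
decode_reach 0     # never reached -> 00000000
```

So reach 377 on 10.16.4.1 means every recent poll got an answer, while reach 0 on 10.123.123.3 means it has never been heard from at all.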


This only has two servers, and you need at least three. As it is, gateway1 and gateway2 are at two different stratum levels. However, you need to fix the problem on the gateways first.


Despite the spec, that seems to be a consistent interpretation.  If
everything internal is fully meshed, and there is only one external time
source, will everything sync up to that external source, no matter the skew?


At best you will get the single source's time, but that's not guaranteed.
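If the worry is an isolated internal mesh with a single (or no) reachable external source, one option worth looking at, assuming a reasonably recent ntpd, is orphan mode: the meshed hosts elect a leader among themselves and stay agreed with each other when the outside source disappears. A sketch, following the placeholder style above:

```
server <external source, if any>  iburst
peer   <each other internal host>
tos orphan 10    # if no source below stratum 10 is reachable,
                 # the mesh elects one host as an orphan parent
```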


Looking at the debugging techniques, and seeing that the tally code is
a space, and delving deeper, I see;

  gateway1:~# ntpq -c as localhost
  ind assID status  conf reach auth condition  last_event cnt
  ===========================================================
  1 47900  9014   yes   yes  none    reject   reachable  1
  2 47901  9014   yes   yes  none    reject   reachable  1
  3 47902  8000   yes   yes  none    reject

  storage-node2:~# ntpq -c as localhost
  ind assID status  conf reach auth condition  last_event cnt
  ===========================================================
  1 16076  9064   yes   yes  none    reject   reachable  6
  2 16077  9064   yes   yes  none    reject   reachable  6


Usually you will see these kinds of results when the server you are looking at has just started. You really need to give it time to synchronize.


Not in this case;

portal-01:~# ps aux|grep ntp; for i in 2 51 52 53 54; do ssh -1 10.123.123.$i ps aux; done | grep ntp
root   11283  0.0  0.1  2328 2320 ?        SL   Sep30   0:05 /usr/sbin/ntpd
root   17856  0.0  0.1  2328 2320 ?        SL   Sep30   0:04 /usr/sbin/ntpd
root     382  0.0  0.1  2328 2320 ?        SL   Jun13   0:04 /usr/sbin/ntpd -g
root     382  0.0  0.1  2328 2320 ?        SL   Jun13   0:04 /usr/sbin/ntpd -g
root     383  0.0  0.1  2328 2320 ?        SL   Jun13   0:04 /usr/sbin/ntpd -g
root     389  0.0  0.1  2328 2320 ?        SL   Jun13   0:05 /usr/sbin/ntpd -g


This is hard to understand since you can't tell which system is which.

(the Sep30 processes are on the two gateways, the Jun13 processes are on
the servers.  I had recently manually stopped ntpd, resync'd the times,
and restarted ntpd on the gateways)


This appears to indicate that it has received just one packet, which is not enough to synchronize anything. How long after the server started did you wait before interrogating it? You need to wait at least 15-20 minutes when you don't use iburst.


How long would it take with iburst set?  How can we deal with the fact
that the gateways and servers all generally come up at the same time?


Usually with iburst it can be as fast as 15 seconds, but it depends on many factors. I don't think that's your issue here.

Danny

            --ted

--
Ted Beatie                         Permabit, Inc.             [EMAIL PROTECTED]
Sr. Systems Engineer       One Kendall Sq, Cambridge, MA       +1-617-995-9317

_______________________________________________
questions mailing list
[email protected]
https://lists.ntp.isc.org/mailman/listinfo/questions

