Re: [ntp:questions] NTP.log interpretation

2014-04-29 Thread GregL
As a follow-up to this discussion, which has been very helpful to me, I
wanted to say that we are moving toward adding 4 internal time servers
(Unix-based NTP servers on non-x86 hardware) to the two Windows domain
controllers that are running NTP as well.  All 6 will be synced externally
and will serve all our internal NTP client systems.  The 'load balancer'
will be removed, and every ntp.conf file will list all 6 time servers.

Thanks to everyone for the helpful and informative input.

-Greg
___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] NTP.log interpretation

2014-04-21 Thread Charles Swiger
Hi, Greg--

On Apr 21, 2014, at 9:13 AM, GregL  wrote:
> What you are saying is that if we are currently using a DNS load balancer
> appliance to point 'ntp.host' to ntp1.host or ntp2.host (ntp1 unless it
> fails, then ntp2), that isn't really doing us any good, because the DNS
> lookup is only ever done when xntpd starts.  Once it gets the IP address
> on startup, it always uses the numerical IP address to send out its UDP
> requests.  Is that correct?

That's largely correct, yes.  (Certainly for anything old enough to be called 
xntpd.  :-)

> Not even if the IP fails to respond to requests will it try the DNS lookup
> again?

Modern versions of ntpd have a "pool" directive, which can be used in place
of "server"; when using pool, ntpd will re-query DNS and hope to get a
different IP if the current IP fails to respond:

  http://www.eecis.udel.edu/~mills/ntp/html/discover.html#pool
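
A minimal Python sketch of the difference described in this thread, assuming a made-up dict (FAKE_DNS) in place of real DNS and hypothetical ServerEntry/PoolEntry classes. It is not ntpd's code: the real "pool" implementation manages several associations at once, and this only models the resolve-once versus re-resolve behaviour.

```python
# Hypothetical model: FAKE_DNS stands in for the resolver (or a DNS load
# balancer); real ntpd uses the system resolver and its own pool logic.
FAKE_DNS = {"ntp.host": ["10.0.0.1"]}


def resolve(name):
    """Return the addresses the (fake) DNS currently hands out for name."""
    return list(FAKE_DNS[name])


class ServerEntry:
    """Like a 'server' line: resolved once at config load, then fixed."""

    def __init__(self, name):
        self.name = name
        self.address = resolve(name)[0]

    def on_unreachable(self):
        # No new lookup: keeps polling the same address.
        pass


class PoolEntry:
    """Simplified 'pool' behaviour: look the name up again on failure."""

    def __init__(self, name):
        self.name = name
        self.address = resolve(name)[0]

    def on_unreachable(self):
        # Re-query DNS and switch if a different address comes back.
        fresh = [a for a in resolve(self.name) if a != self.address]
        if fresh:
            self.address = fresh[0]


def demo():
    """ntp1 (10.0.0.1) dies; the load balancer starts answering with ntp2."""
    FAKE_DNS["ntp.host"] = ["10.0.0.1"]
    server, pool = ServerEntry("ntp.host"), PoolEntry("ntp.host")
    FAKE_DNS["ntp.host"] = ["10.0.0.2"]
    server.on_unreachable()
    pool.on_unreachable()
    return server.address, pool.address


print(demo())  # ('10.0.0.1', '10.0.0.2')
```

The "server" entry keeps polling the dead address even though DNS now points elsewhere; only the re-resolving entry follows the load balancer's change.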

Regards,
-- 
-Chuck


Re: [ntp:questions] NTP.log interpretation

2014-04-21 Thread GregL
On Fri, Apr 18, 2014 at 3:15 PM, Jochen Bern wrote:

> (FWIW, ntpd does the DNS resolution *once* when loading its config and
> works with the one IP obtained from then on, plans of implementing
> automatic rotation/selection of "pool" servers in future versions
> notwithstanding. And having potentially disagreeing NTP servers put
> behind a V*IP* load balancer is discouraged as well.)
>

Forgive me for just double checking this point.

What you are saying is that if we are currently using a DNS load balancer
appliance to point 'ntp.host' to ntp1.host or ntp2.host (ntp1 unless it
fails, then ntp2), that isn't really doing us any good, because the DNS
lookup is only ever done when xntpd starts.  Once it gets the IP address on
startup, it always uses the numerical IP address to send out its UDP
requests.  Is that correct?  Not even if the IP fails to respond to
requests will it try the DNS lookup again?

Thanks!

-gregl

Re: [ntp:questions] NTP.log interpretation

2014-04-20 Thread David Taylor

On 21/04/2014 02:29, William Unruh wrote:
[]

> You might consider replacing those with a couple of small Linux or BSD
> systems just there for serving the time. Windows is not the greatest
> time platform, and a cheap (RPi with Sure GPS) time server which is not
> Windows and is not busy with other stuff might well be more consistent
> and useful.


As a guide, Windows on a wired network is satisfactory for
"sub-millisecond" level accuracy, with Windows 8 being the preferred
platform, or nearer "microsecond" with FreeBSD or Linux.  There are
some notes on using the Raspberry Pi here:


  http://www.satsignal.eu/ntp/Raspberry-Pi-NTP.html

and a low-cost, no-soldering GPS/PPS board for use with an internal or 
external antenna here:



http://ava.upuaut.net/store/index.php?route=product/product&path=59_60&product_id=95

--
Cheers,
David
Web: http://www.satsignal.eu


Re: [ntp:questions] NTP.log interpretation

2014-04-20 Thread William Unruh
On 2014-04-21, GregL  wrote:
>
>
> I really do appreciate all the responses to my questions.  I would like to
> give this the proper respect it deserves and evaluate some best practices
> with respect to time synchronization across our enterprise systems.  I
> would rather make configuration changes once, than have to go back and
> change them.
>
> Currently, while I mentioned that there are two servers serving time, I
> did not mention that they are Windows domain controllers.  I think that may
> add other factors that need to be considered as well.  This is an
> environment that I've been getting up to speed on over the last 6 months.

You might consider replacing those with a couple of small Linux or BSD
systems just there for serving the time. Windows is not the greatest
time platform, and a cheap (RPi with Sure GPS) time server which is not
Windows and is not busy with other stuff might well be more consistent
and useful.

>
> The more I learn, the more I see how much I don't know ;-)   But knowing
> what I don't know is better than not knowing ;-)  I think ;-)

Always. 


Re: [ntp:questions] NTP.log interpretation

2014-04-20 Thread GregL
On Fri, Apr 18, 2014 at 9:43 PM, Jason Rabel
wrote:

> Greg,
>
> As others have suggested, any client running NTP should point to *at
> least* 3 time sources (usually ~5 is preferred)... The reason
> being if one server goes wacko, but the other two agree, then the client
> knows to X out the bad one and keep the two others. With
> only two you are essentially just flipping a coin...
>
> I do not know where you are located, but if you are serving time to 100+
> clients, you should probably consider the "pool" servers as
> backup sources and look more into finding local public stratum 1 & 2
> servers:
>
> http://support.ntp.org/bin/view/Servers/StratumOneTimeServers
>
> http://support.ntp.org/bin/view/Servers/StratumTwoTimeServers
>
> NTP uses very very very little bandwidth, it's one small UDP packet (less
> than 128 bytes) that (assuming default configuration)
> works its way up to once every 17 minutes... There's no reason to be stingy
> with selecting a handful of external internet time
> servers (unless company policy prohibits it).
>


I really do appreciate all the responses to my questions.  I would like to
give this the proper respect it deserves and evaluate some best practices
with respect to time synchronization across our enterprise systems.  I
would rather make configuration changes once, than have to go back and
change them.

Currently, while I mentioned that there are two servers serving time, I
did not mention that they are Windows domain controllers.  I think that may
add other factors that need to be considered as well.  This is an
environment that I've been getting up to speed on over the last 6 months.

The more I learn, the more I see how much I don't know ;-)   But knowing
what I don't know is better than not knowing ;-)  I think ;-)



-gregl

Greg Leibfried
*email is preferred
*leave txt/voice msg at:  507.722.1151

Re: [ntp:questions] NTP.log interpretation

2014-04-19 Thread Jason Rabel
> Orphan Mode is an automatic server discovery scheme. Nothing more.
>
> Orphan Mode does not make it possible for the members of a "time island"
> to determine the correct time in the absence of reference sources.
>
> Symmetric Active/Passive Mode (aka Peer Associations) allows the
> creation of a bidirectional link between two ntpd instances.
>
> In my experience, the ntpds in a peer association will ignore each other
> when they have the same sys_peer (i.e. when they are "synchronised" to
> the same source).
>
> And the Mitigation Rules do not provide special Peer Classification for
> Symmetric Active/Passive Mode.

Steve, the way I read it, the "tos orphan" setting is a replacement for
the local clock driver, so I tried to follow the image of the peer network
as shown on the page you linked.

Each of my S2s is synced to a different S1 (using the server line). In
addition, each S2 has a peer line for each of the other two S2s, plus the
"tos orphan 5" line.

As a test I disconnected all S1s from the network and let things sit for a
while to see what would happen. It has been a while, but if I recall
correctly, the S2s dropped down to S5. I do not recall whether each S2
chose another S2 as its primary source, but they all continued to
distribute time.


If that is configured wrong, then please tell me the correct way and I will 
happily fix it.


Re: [ntp:questions] NTP.log interpretation

2014-04-19 Thread Steve Kostecke
On 2014-04-19, Jason Rabel  wrote:

> Then I have three Stratum-2 servers that use the "server" line for
> the S1 servers, but in addition they use the "peer" line with each
> other S2 server. When you combine that with "orphan" mode if all my S1
> servers went down, the S2's would work with each other to figure out
> their best guess at the right time.

Orphan Mode is an automatic server discovery scheme. Nothing more.

Orphan Mode does not make it possible for the members of a "time island"
to determine the correct time in the absence of reference sources.

http://doc.ntp.org/4.2.6p5/assoc.html#orphan

Symmetric Active/Passive Mode (aka Peer Associations) allows the
creation of a bidirectional link between two ntpd instances.

http://doc.ntp.org/4.2.6p5/assoc.html#symact

In my experience, the ntpds in a peer association will ignore each other
when they have the same sys_peer (i.e. when they are "synchronised" to
the same source).

And the Mitigation Rules do not provide special Peer Classification for
Symmetric Active/Passive Mode.

http://doc.ntp.org/4.2.6p5/prefer.html#peer
or http://doc.ntp.org/dev/prefer.html#peer

-- 
Steve Kostecke 
NTP Public Services Project - http://support.ntp.org/


Re: [ntp:questions] NTP.log interpretation

2014-04-19 Thread David Woolley

On 18/04/14 21:15, Jochen Bern wrote:

[Unthreaded reply.]

Are you aware that your mail/news client is broken and is not threading
your replies properly?  If you are using the mailing list, you need an
In-Reply-To header.  If you are using the newsgroup, you need a
References header.



Re: [ntp:questions] NTP.log interpretation

2014-04-18 Thread Jason Rabel
Greg,

As others have suggested, any client running NTP should point to *at least*
3 time sources (usually ~5 is preferred)... The reason being: if one server
goes wacko but the other two agree, then the client knows to X out the bad
one and keep the two others. With only two you are essentially just
flipping a coin...
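
To make the coin-flip point concrete, here is a toy majority vote in Python. It is an illustrative stand-in, not NTP's actual selection (intersection) algorithm; the function name, the tolerance, and the offsets are all made up.

```python
def select_agreeing(offsets, tolerance):
    """Return the names in the largest group of sources whose offsets all lie
    within `tolerance` seconds of each other, or None if that group is not a
    strict majority (i.e. the client has no way to tell who is right)."""
    names = sorted(offsets)
    best = []
    for anchor in names:
        # every source inside the window [anchor, anchor + tolerance]
        group = [n for n in names
                 if 0 <= offsets[n] - offsets[anchor] <= tolerance]
        if len(group) > len(best):
            best = group
    return best if 2 * len(best) > len(names) else None


# Made-up offsets in seconds, one server about 3 s wrong.
two = {"ntp1": 0.002, "ntp2": -3.1}
three = {"ntp1": 0.002, "ntp2": -3.1, "ntp3": 0.004}
print(select_agreeing(two, 0.1))    # None
print(select_agreeing(three, 0.1))  # ['ntp1', 'ntp3']
```

With two disagreeing sources neither side is a majority; adding a third independent source lets the outlier be identified.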

I do not know where you are located, but if you are serving time to 100+ 
clients, you should probably consider the "pool" servers as
backup sources and look more into finding local public stratum 1 & 2 servers:

http://support.ntp.org/bin/view/Servers/StratumOneTimeServers

http://support.ntp.org/bin/view/Servers/StratumTwoTimeServers

NTP uses very, very, very little bandwidth: it's one small UDP packet (less
than 128 bytes) that, assuming the default configuration, works its way up
to once every 17 minutes... There's no reason to be stingy with selecting a
handful of external Internet time servers (unless company policy prohibits
it).
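
A back-of-the-envelope check of the bandwidth claim, assuming a 48-byte NTP packet with no extension fields or MAC, IPv4 with no options, and a default maxpoll of 10 (2^10 = 1024 s, roughly the "17 minutes" above). Link-layer overhead is ignored, and the helper names are made up.

```python
NTP_HEADER = 48    # bytes: basic NTP packet, no extension fields or MAC
UDP_HEADER = 8
IPV4_HEADER = 20   # assuming no IP options; link-layer framing not counted
MAXPOLL = 10       # assumed default maxpoll: 2**10 = 1024 s between polls


def poll_interval_minutes(poll_exponent=MAXPOLL):
    return 2 ** poll_exponent / 60


def bytes_per_day(clients, poll_exponent=MAXPOLL):
    """Bytes per day for `clients` polling at the given interval.
    Each poll is one request plus one reply."""
    packet = NTP_HEADER + UDP_HEADER + IPV4_HEADER       # 76 bytes
    polls_per_day = 86400 / 2 ** poll_exponent           # 84.375 at 1024 s
    return clients * polls_per_day * 2 * packet


print(round(poll_interval_minutes(), 1))   # 17.1
print(bytes_per_day(100))                  # 1282500.0
```

So 100 clients at the maximum interval exchange about 1.3 MB a day. At a minpoll of 6 (64 s), before the interval backs off, the same fleet uses 16 times as much, which is still only about 20 MB a day.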

If your company has the funds and you have the ability to mount a GPS
antenna, then going with a commercial GPS-based NTP server might be the way
to go. You can choose from various oscillator options so that they will
flywheel if they lose GPS lock but still keep decent time over long
hold-over periods. Likewise, those same companies also offer CDMA-based
time servers if you have no sky-view access.

If you want the DIY route, any old PC running Linux or FreeBSD that has a
serial port, plus a GPS module that outputs a PPS signal, will yield far
better results than you could get syncing over a network.

It would also be worthwhile to have a layer of your time servers "peer"
with each other. For instance, I have several Stratum-1 servers that get
time via GPS. Then I have three Stratum-2 servers that use the "server"
line for the S1 servers, but in addition they use the "peer" line with each
other S2 server. When you combine that with "orphan" mode, if all my S1
servers went down, the S2s would work with each other to figure out their
best guess at the right time. Finally, all my clients point to the S2
servers... Because it's only my local LAN, I do not have any external NTP
servers listed, but if I did, those would end up being used as fallback
sources for the S2 servers.

My S2 servers are also not dedicated time servers, but they are servers
that are rarely taken down, rebooted, or even tinkered with. For instance,
one is a NAS that is the primary network storage for all clients. Another
is a database server.

A dedicated NTP server doesn't have to be a huge, powerful machine. If you
open up many commercial products, you would be surprised to find 486-class
PC/104 SBCs inside. The extra cost comes from their proprietary hardware,
which usually disciplines a TCXO, OCXO, or rubidium oscillator to GPS or
CDMA (giving the flywheel ability)...

I have built probably half a dozen GPS-based Stratum-1 NTP servers using
Soekris SBCs and Motorola Oncore GPS receivers, all off eBay... Maybe $50
total in hardware and an hour or less mounting everything in the little
chassis and soldering wires. The end result is a nice time server that
consumes maybe 5-10 watts... I also purchased a handful of old commercial
time servers that pop up on eBay from time to time at decent "hobbyist"
prices... But to be honest, they provide no better time than my homemade
ones, and most are running outdated OSes and NTP distributions that I would
not trust in a commercial environment because of the potential for
exploits (which is probably why they ended up on eBay). Not to mention
most are using old GPS receivers from the '90s (some aren't even timing
receivers).





Re: [ntp:questions] NTP.log interpretation

2014-04-18 Thread GregL
On Fri, Apr 18, 2014 at 3:15 PM, Jochen Bern wrote:

> Am I missing something, or will the setup described above (and assuming
> that the two servers disagree again) *force* your clients to do what you
> just called "the far greater problem"? Namely, being randomly split
> 50/50 between the two servers, not even *knowing* of the other one?
>
>
I think that is part of the reason I'm sanity checking.  I think if the
servers stay in sync, it's probably a non-issue.  But that is the issue...
I've seen the havoc when one out of two servers is bad... and with the load
balancer, there's no guarantee I'm any better off...

> (FWIW, ntpd does the DNS resolution *once* when loading its config and
> works with the one IP obtained from then on, plans of implementing
> automatic rotation/selection of "pool" servers in future versions
> notwithstanding. And having potentially disagreeing NTP servers put
> behind a V*IP* load balancer is discouraged as well.)
>

**especially** considering that statement!  Hmm... if that's the way ntpd
works, then I think the load balancer is worse than useless for NTP
clients... it could be disastrous, correct?

Thanks again for the feedback/advice... I think that re-examining the
configuration is even more important now.

I like the idea of one time server from a pool and the other from a
GPS-based source.

-greg

Re: [ntp:questions] NTP.log interpretation

2014-04-18 Thread William Unruh
On 2014-04-18, GregL  wrote:
>> > Yes, clearly the root of the most recent problem was a faulty configuration
>> > that allowed our internal time servers to grow to nearly 50 seconds apart
>> > at some point, and that wreaked havoc in many, many areas.
>>
>> What was causing that? Clearly one, or both, are not getting their time
>> from proper servers themselves. In your post there seemed to be a hint
>> that one of your servers was getting its time from the other. That is a
>> bad idea. It is no better than having just one server.
>>
>>
> Yes.  From what I understand, one of the servers that serves as a time
> server was rebuilt in January and the ntpd configuration was not put
> back on.  It was an oversight.  Because of other services that run there,
> that server *should* have kept in sync with the other server, but that sync
> didn't appear to happen either.

Having two servers, one of which gets its time from the other, is pretty
useless. It is equivalent at the best of times to having only one
server, and at the worst to having none (as you discovered).

You should always try to make sure that your sources of time really are
independent. That is a problem with the pool: you can get two or three
servers, all of which get their time from the same stratum 1 server.
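
One rough way to spot that kind of shared upstream is to compare the refid column of `ntpq -p` on your own servers. The Python sketch below assumes the usual column layout (tally code in column 0, then remote and refid) and uses invented host names and addresses; refid only shows each server's current system peer (and is a hash for IPv6 upstreams), so this check can miss shared sources.

```python
from collections import defaultdict

# Hypothetical `ntpq -p` output; host names and refids are made up.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*a.pool.example  192.0.2.10       2 u   33   64  377   12.345    0.210   0.050
+b.pool.example  192.0.2.10       2 u   10   64  377   15.002    0.305   0.071
-c.pool.example  198.51.100.7     2 u   47   64  377   20.118   -0.412   0.090
"""


def shared_upstreams(ntpq_output):
    """Map each refid reported by more than one peer to those peers."""
    by_refid = defaultdict(list)
    for line in ntpq_output.splitlines()[2:]:    # skip header and '=' rule
        # Column 0 holds the tally code (*, +, -, space, ...); drop it.
        fields = line[1:].split()
        if len(fields) >= 2:
            remote, refid = fields[0], fields[1]
            by_refid[refid].append(remote)
    return {r: peers for r, peers in by_refid.items() if len(peers) > 1}


print(shared_upstreams(SAMPLE))
# {'192.0.2.10': ['a.pool.example', 'b.pool.example']}
```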

If you can do it, a better solution would be to have, say, one server with
a GPS PPS clock source, and the other(s) from the outside NTP pool.

>
> Clearly a bad situation.  That is corrected now, with both internal time
> servers independently configured to go to an external pool of NTP servers.
> That is more of the "correct the problem" fix;  as a matter of looking at
> the big picture, we are just trying to determine any other changes we
> should make.   Building more dedicated time servers that aren't rebooted
> weekly is one thing I will lobby for ;-)
>
> I'm certainly learning more ;-)
>
>
> --Greg


Re: [ntp:questions] NTP.log interpretation

2014-04-18 Thread Jochen Bern
On 18.04.2014 20:45, questions-requ...@lists.ntp.org digested:
> From: GregL 
> 
> > > What about the idea of going to only one entry, but that entry is
> > > served by a DNS load balancer to choose one of two internal time
> > > servers to check.
> >
> > Well, that will [...]
> 
> I'm wrestling with that very question.  With 100+ systems, we have a far
> greater problem if some systems are *off* and others are not.

Am I missing something, or will the setup described above (and assuming
that the two servers disagree again) *force* your clients to do what you
just called "the far greater problem"? Namely, being randomly split
50/50 between the two servers, not even *knowing* of the other one?

(FWIW, ntpd does the DNS resolution *once* when loading its config and
works with the one IP obtained from then on, plans of implementing
automatic rotation/selection of "pool" servers in future versions
notwithstanding. And having potentially disagreeing NTP servers put
behind a V*IP* load balancer is discouraged as well.)
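
A small Python model of that 50/50 split, assuming a round-robin balancer and clients that follow their one server exactly. Both are simplifications: the appliance in this thread was described as failover (ntp1 unless it fails), which would instead pin each client to whichever server answered when that client started. The function names are hypothetical.

```python
import itertools


def assign_clients(num_clients, backends):
    """Hypothetical round-robin DNS load balancer: each client resolves the
    name once at startup and then keeps that single backend forever."""
    rr = itertools.cycle(backends)
    return [next(rr) for _ in range(num_clients)]


def fleet_spread(assignment, server_offsets):
    """Largest clock disagreement (seconds) between any two clients,
    assuming each client tracks its one server exactly."""
    offsets = [server_offsets[b] for b in assignment]
    return max(offsets) - min(offsets)


clients = assign_clients(100, ["ntp1", "ntp2"])
print(clients.count("ntp1"), clients.count("ntp2"))          # 50 50
print(fleet_spread(clients, {"ntp1": 0.0, "ntp2": 50.0}))    # 50.0
```

Because no client ever sees both servers, none of them can notice that its half of the fleet is 50 seconds away from the other half.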

Regards,
J. Bern
-- 
Jochen Bern, Systems Engineer --- LINworks GmbH
PO Box 100121, 64201 Darmstadt | Robert-Koch-Str. 9, 64331 Weiterstadt
PGP (1024D/4096g) FP = D18B 41B1 16C0 11BA 7F8C DCF7 E1D5 FAF4 444E 1C27
Tel. +49 6151 9067-231, switchboard -0, fax -299 - Local Court Darmstadt HRB 85202
Registered office: Weiterstadt; Managing Directors: Metin Dogan, Oliver Michel

Re: [ntp:questions] NTP.log interpretation

2014-04-18 Thread William Unruh
On 2014-04-18, Steve Kostecke  wrote:
> On 2014-04-18, William Unruh  wrote:
>
>> On 2014-04-18, GregL  wrote:
>>
>>> Now, I'm just planning on making changes to the ntp.conf, like adding
>>> the "-x" parameter. I'm hoping that that will prevent huge time
>>> resets backwards in time...should that ever be even possible again.
>>
>> ntpd will reset the time if it is off by more than 128 ms.
>
> The default step threshold is 128ms. This threshold is user
> configurable.
>
> As for the '-x' option. Using it could lead to having a clock so far off
> from the correct time that ntpd will never be able to correct the offset
> via slewing. 
>
>> Those highly non-linear jumps are one of the "features" of ntpd. If you
>> do not want them, run, for example, chrony. It will smoothly change the
>> time. It will, however, also at times slew the time much faster than
>> 500PPM to get the time back on track.
>
> 500PPM is 43 seconds per day. One could argue that a clock which
> requires more than 43 seconds per day of correction is fundamentally
> broken and requires repair rather than calibration.

If the rate error were off by that much, that would be true. However, if
the clock is off by, say, an hour, and you never want it to jump
backwards, then 43 sec per day would take about 83 days to correct that
offset error (assuming that none of that 43 sec per day were taken up by
rate error). At the max Linux slew rate of 10% (100,000 PPM), it would
take about 10 hours to correct. Yes, your rate might be out by 10%, but it
may be that never jumping is worth that to you.
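
The arithmetic in this exchange, as a short Python sketch; the helper names are made up, and it assumes the whole slew rate is spent on the offset rather than on frequency (rate) error.

```python
SECONDS_PER_DAY = 86400


def drift_per_day(ppm):
    """Seconds per day corresponding to a rate of `ppm` parts per million."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def slew_time(offset_s, ppm):
    """Seconds needed to remove `offset_s` by slewing at a constant `ppm`,
    assuming none of that rate is needed for frequency (rate) error."""
    return offset_s / (ppm * 1e-6)


print(round(drift_per_day(500), 1))                       # 43.2
print(round(slew_time(3600, 500) / SECONDS_PER_DAY, 1))   # 83.3
print(round(slew_time(3600, 100_000) / 3600, 1))          # 10.0
```

An hour of offset takes about 83 days at 500 PPM, but only about 10 hours at 10% (100,000 PPM).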

Also, some clocks are just out by over 500PPM. That could be the case
for Linux with its clock calibration routine for a while (very rare, but
possible). Since almost none of us are capable of rewriting the kernel,
"fixing the problem" was not an option. (On another boot, the rate error
could be very different.)



Re: [ntp:questions] NTP.log interpretation

2014-04-18 Thread Steve Kostecke
On 2014-04-18, William Unruh  wrote:

> On 2014-04-18, GregL  wrote:
>
>> Now, I'm just planning on making changes to the ntp.conf, like adding
>> the "-x" parameter. I'm hoping that that will prevent huge time
>> resets backwards in time...should that ever be even possible again.
>
> ntpd will reset the time if it is off by more than 128 ms.

The default step threshold is 128ms. This threshold is user
configurable.

As for the '-x' option. Using it could lead to having a clock so far off
from the correct time that ntpd will never be able to correct the offset
via slewing. 
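
A toy model of the step-versus-slew decision, in Python. The 128 ms default comes from this thread. The 600 s value is what the ntpd 4.x documentation gives for the -x option (the AIX xntpd man page should be checked for its own behaviour), and ntpd 4.x also lets the threshold be changed with `tinker step`. Real ntpd additionally waits for a large offset to persist (the stepout interval) before stepping, which this sketch ignores. The function names are hypothetical.

```python
DEFAULT_STEP = 0.128   # seconds; ntpd's default step threshold


def correction_action(offset_s, step_threshold=DEFAULT_STEP):
    """Toy decision rule: step the clock if |offset| exceeds the threshold,
    otherwise slew it. (No stepout/persistence check is modelled.)"""
    return "step" if abs(offset_s) > step_threshold else "slew"


def slew_days(offset_s, max_rate_ppm=500):
    """Days needed to slew away offset_s at the kernel's max rate."""
    return abs(offset_s) / (max_rate_ppm * 1e-6) / 86400


print(correction_action(0.05))                       # slew
print(correction_action(3.1))                        # step
print(correction_action(3.1, step_threshold=600.0))  # slew
print(round(slew_days(600.0), 1))                    # 13.9
```

The last line shows why a large threshold is risky: an offset just under 600 s would take roughly two weeks to slew out at 500 PPM.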

> Those highly non-linear jumps are one of the "features" of ntpd. If you
> do not want them, run, for example, chrony. It will smoothly change the
> time. It will, however, also at times slew the time much faster than
> 500PPM to get the time back on track.

500PPM is 43 seconds per day. One could argue that a clock which
requires more than 43 seconds per day of correction is fundamentally
broken and requires repair rather than calibration.

-- 
Steve Kostecke 
NTP Public Services Project - http://support.ntp.org/


Re: [ntp:questions] NTP.log interpretation

2014-04-18 Thread GregL
> > Yes, clearly the root of the most recent problem was a faulty configuration
> > that allowed our internal time servers to grow to nearly 50 seconds apart
> > at some point, and that wreaked havoc in many, many areas.
>
> What was causing that? Clearly one, or both, are not getting their time
> from proper servers themselves. In your post there seemed to be a hint
> that one of your servers was getting its time from the other. That is a
> bad idea. It is no better than having just one server.
>
>
Yes.  From what I understand, one of the servers that serves as a time
server was rebuilt in January and the ntpd configuration was not put
back on.  It was an oversight.  Because of other services that run there,
that server *should* have kept in sync with the other server, but that sync
didn't appear to happen either.

Clearly a bad situation.  That is corrected now, with both internal time
servers independently configured to go to an external pool of NTP servers.
That is more of the "correct the problem" fix;  as a matter of looking at
the big picture, we are just trying to determine any other changes we
should make.   Building more dedicated time servers that aren't rebooted
weekly is one thing I will lobby for ;-)

I'm certainly learning more ;-)


--Greg

Re: [ntp:questions] NTP.log interpretation

2014-04-18 Thread William Unruh
On 2014-04-18, GregL  wrote:
>> On Fri, Apr 18, 2014 at 09:01:09AM -0500, GregL wrote:
>> > >   What you should do is to add more servers to the config.
>> >
>> > What about the idea of going to only one entry, but that entry is served
>> > by a DNS load balancer to choose one of two internal time servers to check?
>> > Each of those is configured to point at a pool of time servers (4 each).
>>
>> Well, that will prevent the client from detecting that it's getting the
>> wrong time. Is that what you want?
>>
>>
> I'm wrestling with that very question.  With 100+ systems, we have a far
> greater problem if some systems are *off* and others are not.
>
>> From the log it seems that at least one server is completely wrong,
>> the offset between the two servers is around 3 seconds! I'd suggest to
>> fix that first.
>>
>>
> Yes, clearly the root of the most recent problem was a faulty configuration
> that allowed our internal time servers to grow to nearly 50 seconds apart
> at some point, and that wreaked havoc in many, many areas.

What was causing that? Clearly one, or both, are not getting their time
from proper servers themselves. In your post there seemed to be a hint
that one of your servers was getting its time from the other. That is a
bad idea. It is no better than having just one server.

>
> That is fixed, and our two internal time servers *should* be correct.

>
> Now, I'm just planning on making changes to the ntp.conf, like adding the
> "-x" parameter.  I'm hoping that that will prevent huge time resets
> backwards in time...should that ever be even possible again.

ntpd will reset the time if it is off by more than 128 ms. Those highly
non-linear jumps are one of the "features" of ntpd. If you do not want
them, run, for example, chrony. It will smoothly change the time. It will,
however, also at times slew the time much faster than 500PPM to get the
time back on track.
>
> But, was the "synchronisation lost" message *because* ntp saw the time
> difference so great on peer servers... and chose one to sync to...
> resulting in the time reset message?

And since there are only two, it had no idea which one to choose so it
chose randomly. 


Re: [ntp:questions] NTP.log interpretation

2014-04-18 Thread Miroslav Lichvar
On Fri, Apr 18, 2014 at 10:38:10AM -0500, GregL wrote:
> But, was the "synchronisation lost" message *because* ntp saw the time
> difference so great on peer servers... and chose one to sync to...
> resulting in the time reset message?

It seems so. Not sure how close this is to the version you are
running, but in xntp3-5.93e (dated 1998) it seems the system peer is
unselected (and the message logged) on every clock step.

-- 
Miroslav Lichvar

Re: [ntp:questions] NTP.log interpretation

2014-04-18 Thread GregL
> On Fri, Apr 18, 2014 at 09:01:09AM -0500, GregL wrote:
> > >   What you should do is to add more servers to the config.
> >
> > What about the idea of going to only one entry, but that entry is served
> > by a DNS load balancer to choose one of two internal time servers to check?
> > Each of those is configured to point at a pool of time servers (4 each).
>
> Well, that will prevent the client from detecting that it's getting the
> wrong time. Is that what you want?
>
>
I'm wrestling with that very question.  With 100+ systems, we have a far
greater problem if some systems are *off* and others are not.

> From the log it seems that at least one server is completely wrong,
> the offset between the two servers is around 3 seconds! I'd suggest to
> fix that first.
>
>
Yes, clearly the root of the most recent problem was a faulty configuration
that allowed our internal time servers to grow to nearly 50 seconds apart
at some point, and that wreaked havoc in many, many areas.

That is fixed, and our two internal time servers *should* be correct.

Now, I'm just planning on making changes to the ntp.conf, like adding the
"-x" parameter.  I'm hoping that that will prevent huge time resets
backwards in time...should that ever be even possible again.

But, was the "synchronisation lost" message *because* ntp saw the time
difference so great on peer servers... and chose one to sync to...
resulting in the time reset message?

Re: [ntp:questions] NTP.log interpretation

2014-04-18 Thread Miroslav Lichvar
On Fri, Apr 18, 2014 at 09:01:09AM -0500, GregL wrote:
> >   What you should do is to add more servers to the config.
> 
> What about the idea of going to only one entry, but that entry is served by
> a DNS load balancer to choose one of two internal time servers to check?
> Each of those is configured to point at a pool of time servers (4 each).

Well, that will prevent the client from detecting that it's getting the
wrong time. Is that what you want?

From the log it seems that at least one server is completely wrong,
the offset between the two servers is around 3 seconds! I'd suggest to
fix that first.

-- 
Miroslav Lichvar

Re: [ntp:questions] NTP.log interpretation

2014-04-18 Thread GregL
>
>
> A classic example of the adage "A man with two clocks doesn't know what
> the time is". So neither can NTP.
> It will hop between the two until the two agree. This is a bad
> configuration.
>
>
That is certainly the way it feels! ;-)


>
>   What you should do is to add more servers to the config.
> 


What about the idea of going to only one entry, but that entry is served by
a DNS load balancer to choose one of two internal time servers to check?
Each of those is configured to point at a pool of time servers (4 each).

Re: [ntp:questions] NTP.log interpretation

2014-04-18 Thread mike cook

A classic example of the adage "A man with two clocks doesn't know what the
time is". So neither can NTP.
It will hop between the two until the two agree. This is a bad configuration.

On 18 Apr 2014, at 05:53, GregL wrote:

< snip>

> Right now, my plan is to add the "-x" option to the xntpd startup;
> hopefully that would avoid setting the clock backwards.   Additionally,
> actions are taken to make sure the two time servers never get that far out
> of sync without throwing out some alerts.
> 

  What you should do is to add more servers to the config.


[ntp:questions] NTP.log interpretation

2014-04-18 Thread GregL
I'm trying to determine what this section of an ntp.log is telling me.

This is from a default xntpd instance on AIX 5.3.

ntp.conf has two servers listed, 34 and 97, with the '34' server preferred.

The first couple of lines are the last in a *long* list of hourly logged
offset messages.

It appears to me that *something* caused the system to sync to the
non-preferred server.  My guess is a lack of response from the '34' server?

Q#1)  Is that a safe assumption?  How many times, and how long would it
wait to get a response from '34'?

It turns out that due to a configuration issue, the '34' server was
actually out of time sync, and the second, non-preferred server, '97', was
actually the correct time.  They were seconds off.  ('34' ended up not
being configured to an outside time server pool... only to sync with
'97'... but clearly there was a problem with that 'sync' happening.)


After this first reset, the system went into a cycle of resetting the clock
back and forth. I'm trying to understand the logic here. Is it telling me
that the responses from the preferred server were sporadic enough that it
would regularly flip to syncing with the secondary server, thus the
flip-flopping back and forth in time? This continued for hours until the
configuration issue with the two time servers was fixed, so that they were
both "in sync" with the correct time... then it went back to regular old
boring offset log entries.
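One way to see the flip-flop in the log is to pull out just the "time reset (step)" lines and look at the signs. A minimal Python sketch, using three step lines copied from the log below; the regular expression is an assumption about the xntpd message format based only on these lines, and `step_events` is a hypothetical helper name.

```python
import re

LOG = """\
10 Apr 02:35:18 xntpd[245888]: time reset (step) -2.580356 s
10 Apr 02:40:09 xntpd[245888]: time reset (step) 2.569861 s
10 Apr 02:44:54 xntpd[245888]: time reset (step) -2.877219 s
"""

# Timestamp at the start of the line, signed offset before the trailing " s".
STEP_RE = re.compile(r"^(\d+ \w+ [\d:]+) .*time reset \(step\) (-?\d+\.\d+) s$")


def step_events(text):
    """Return (timestamp, offset_seconds) for each clock step in the log."""
    steps = []
    for line in text.splitlines():
        m = STEP_RE.match(line)
        if m:
            steps.append((m.group(1), float(m.group(2))))
    return steps


steps = step_events(LOG)
for ts, off in steps:
    print(ts, f"{off:+.6f}")

# Consecutive steps with opposite signs mean the clock is being pushed back
# and forth between two sources that disagree by roughly that amount.
sign_flips = sum(1 for (_, a), (_, b) in zip(steps, steps[1:]) if (a < 0) != (b < 0))
print("sign flips:", sign_flips)
```

Steps of about 2.5 to 2.9 s alternating in sign fit two servers that disagree by a few seconds, rather than a server that is merely slow to answer.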

Q#2)  Does "synchronisation lost" mean a lack of response from the preferred
server, or does it mean a dramatic difference in time, such that the clock
needs to be reset (as another log message indicates)?

Thanks for any help in learning how to read this. I have no indications of
any network issues, but given that NTP uses UDP, they could be hard to track.
It seems suspicious that once the two time servers were in sync, the issues
went away; i.e., a *real* network issue did not exist... perhaps just a small
hiccup that caused the flip-flopping to start and continue until the
configuration was fixed.

Right now, my plan is to add the "-x" option to the xntpd startup;
hopefully that would avoid setting the clock backwards. Additionally,
we are taking steps to make sure the two time servers can never get that
far out of sync without raising alerts.
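For the "-x" plan, the general rule is that offsets below the step threshold (128 ms by default in ntpd) are slewed gradually and larger ones are stepped; "-x" tells the daemon to slew instead (in ntpd 4.x it raises the step threshold to 600 s; what it does exactly in the AIX xntpd version should be checked against its man page). A simplified sketch follows, ignoring the stepout interval the daemon waits before actually stepping; the function and constant names are hypothetical. Slewing is also slow: at the 500 ppm maximum slew rate, a 2.58 s offset takes about 86 minutes to correct.

```python
STEP_THRESHOLD_S = 0.128     # ntpd's default step threshold
X_FLAG_THRESHOLD_S = 600.0   # ntpd 4.x raises the threshold to this with -x
MAX_SLEW_RATE = 500e-6       # 500 ppm: at most 0.5 ms of correction per second


def correction(offset_s: float, x_flag: bool = False) -> str:
    """Return 'step' or 'slew' for a measured offset (simplified model)."""
    threshold = X_FLAG_THRESHOLD_S if x_flag else STEP_THRESHOLD_S
    return "step" if abs(offset_s) > threshold else "slew"


def slew_time_s(offset_s: float) -> float:
    """Minimum time needed to slew away an offset at the maximum slew rate."""
    return abs(offset_s) / MAX_SLEW_RATE


offset = -2.580356  # first step seen in the log
print(correction(offset))               # step
print(correction(offset, x_flag=True))  # slew
print(round(slew_time_s(offset) / 60))  # 86
```

Note that "-x" does not resolve the two-server disagreement itself; the daemon would simply slew, slowly, toward whichever server it currently selects.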

Any advice/counsel concerning this scenario would be greatly appreciated.
I would like to get a better understanding of what the log is telling
me, which is why I am here ;-)

Thanks!



10 Apr 00:57:22 xntpd[245888]: offset -0.000638 freq 1.908 poll 6

10 Apr 01:57:22 xntpd[245888]: offset -0.000978 freq 2.094 poll 6

*10 Apr 02:19:40 xntpd[245888]: synchronized to 172.16.32.34, stratum=3

*10 Apr 02:21:05 xntpd[245888]: synchronisation lost

*10 Apr 02:21:48 xntpd[245888]: synchronized to 172.16.56.97, stratum=1

*10 Apr 02:27:48 xntpd[245888]: synchronized to 172.16.32.34, stratum=2

10 Apr 02:35:18 xntpd[245888]: time reset (step) -2.580356 s

10 Apr 02:35:18 xntpd[245888]: synchronized to 172.16.56.97, stratum=3

10 Apr 02:35:18 xntpd[245888]: synchronisation lost

10 Apr 02:35:18 xntpd[245888]: system event 'event_clock_reset' (0x05)
status 'sync_alarm, sync_unspec, 15 events, event_peer/strat_chg' (0xc0f4)

10 Apr 02:35:18 xntpd[245888]: system event 'event_sync_chg' (0x03) status
'sync_alarm, sync_unspec, 15 events, event_clock_reset' (0xc0f5)

10 Apr 02:35:18 xntpd[245888]: system event 'event_peer/strat_chg' (0x04)
status 'sync_alarm, sync_unspec, 15 events, event_sync_chg' (0xc0f3)

10 Apr 02:35:50 xntpd[245888]: peer 172.16.56.97 event 'event_reach' (0x84)
status 'reach, conf, 15 events, event_reach' (0x90f4)

10 Apr 02:36:22 xntpd[245888]: peer 172.16.32.34 event 'event_reach' (0x84)
status 'reach, conf, 15 events, event_reach' (0x90f4)

10 Apr 02:40:06 xntpd[245888]: synchronized to 172.16.56.97, stratum=3

10 Apr 02:40:09 xntpd[245888]: time reset (step) 2.569861 s

10 Apr 02:40:09 xntpd[245888]: synchronisation lost

10 Apr 02:40:09 xntpd[245888]: system event 'event_clock_reset' (0x05)
status 'sync_alarm, sync_unspec, 15 events, event_peer/strat_chg' (0xc0f4)

10 Apr 02:40:41 xntpd[245888]: peer 172.16.32.34 event 'event_reach' (0x84)
status 'reach, conf, 15 events, event_reach' (0x90f4)

10 Apr 02:41:13 xntpd[245888]: peer 172.16.56.97 event 'event_reach' (0x84)
status 'reach, conf, 15 events, event_reach' (0x90f4)

10 Apr 02:44:57 xntpd[245888]: system event 'event_peer/strat_chg' (0x04)
status 'sync_alarm, sync_ntp, 15 events, event_clock_reset' (0xc6f5)

10 Apr 02:44:57 xntpd[245888]: synchronized to 172.16.32.34, stratum=2

10 Apr 02:44:54 xntpd[245888]: time reset (step) -2.877219 s

10 Apr 02:44:54 xntpd[245888]: synchronisation lost

10 Apr 02:44:54 xntpd[245888]: system event 'event_clock_reset' (0x05)
status 'sync_alarm, sync_unspec, 15 events, event_peer/strat_chg' (0xc0f4)

10 Apr 02:45:26 xntpd[245888]: peer 172.16.56.97 event 'event_reach' (0x84)
status 'reach, conf, 15 events, event_reach' (0x90f4)

10 Apr 02:45:58 xntpd[245888]: peer 17