On 2015-02-12, Charles Swiger <cswi...@mac.com> wrote:
> On Feb 12, 2015, at 12:49 AM, William Unruh <un...@invalid.ca> wrote:
>> On 2015-02-11, Charles Swiger <cswi...@mac.com> wrote:
>>> On Feb 11, 2015, at 7:23 AM, Rob <nom...@example.com> wrote:
>>>> But I see it has also been explained elsewhere in the thread: ntpd has
>>>> a maximum on the momentary drift of 500ppm, no matter if it is static
>>>> or dynamic or the sum of two.  I think that is not warranted.
>>> 
>>> Do you believe that a clock which loses or gains over a minute per day
>>> should be assumed to be trustworthy?
> [ ... ]
>> A few kernels ago, the Linux kernel had problems calibrating the system
>> clock. It would be off by up to a few hundred PPM, and the error changed
>> from one boot to the next. I.e., this was a consistent drift error.
>
> Yes-- I had that specific kernel bug in mind.
>
>> It could be zeroed out using adjtimex, but ntpd is supposed to handle the
>> clock, not demand people fixing things by hand. chrony had no problem.
>
> chrony has a problem if it trusts an obviously broken clock!

I am not sure what you mean by this, or which broken clock chrony is
supposed to be trusting. If you mean external sources, chrony has a
selection algorithm similar to ntpd's. If you mean the local clock, then
yes, it tries really hard to discipline it. I do not remember whether
chrony has some rule that shuts it down if the local clock seems too
wild (i.e. something equivalent to ntpd's arbitrary 1000 sec rule: if
the clock is out by 1000 sec, ntpd gives up). But whether it should give
up or try its best is something that should be left to the user, not to
arbitrary rules set by the designer.

It is true that if others depend on that clock, then the responsible
thing to do might be to shut it down. But those foreign systems will
soon enough discover that this clock is crazy and exclude it anyway.
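
To make the "leave it to the user" point concrete, here is a minimal
sketch of a give-up rule whose threshold the user can change or switch
off. It is not ntpd's or chrony's actual code; the function name
should_give_up and the parameter panic_threshold_s are hypothetical,
and only the 1000 sec default comes from the rule mentioned above.

```python
from typing import Optional

# The 1000 sec figure discussed above, used here only as a default.
DEFAULT_PANIC_THRESHOLD_S = 1000.0


def should_give_up(offset_s: float,
                   panic_threshold_s: Optional[float] = DEFAULT_PANIC_THRESHOLD_S) -> bool:
    """Return True if the daemon should stop rather than correct the clock.

    offset_s is the measured offset of the local clock in seconds.
    Passing None or 0 as panic_threshold_s disables the check, so the
    daemon always tries its best (hypothetical policy knob).
    """
    if not panic_threshold_s:
        return False
    return abs(offset_s) > panic_threshold_s


print(should_give_up(1500.0))        # default rule applies
print(should_give_up(1500.0, None))  # user has disabled the rule
```

With a knob like this the designer still supplies a default, but the
decision to give up or keep trying rests with whoever runs the daemon.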


>
>> It uses both the
>> frequency and the tick rate adjustments of adjtimex. ntpd could have a
>> problem if the Linux clock was off by, say, 400 PPM, in which case it would
>> leave little headroom for normal operation.
>
> For what it's worth, real physical hardware on a non-broken OS seems to
> do fine with only +/- 100 ppm tolerance almost all of the time.  I've
> looked for counterexamples in a large fleet, but the only things I saw
> with an ntp.drift value beyond 100ppm were VMs.

I had much bigger drifts than that when the Linux kernel bug was in
place.  But I agree that the variation of the rate is less than 100 ppm,
and often less than 1 ppm.
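
For scale, the ppm figures in this thread can be turned into seconds
per day, and the headroom under a 500 ppm limit can be computed
directly. This is plain arithmetic, not code from ntpd; the function
names are made up for illustration, and the 500 ppm limit and 400 ppm
static error are the figures quoted earlier in the thread.

```python
SECONDS_PER_DAY = 86_400


def drift_seconds_per_day(ppm: float) -> float:
    """Seconds gained or lost per day by a clock running `ppm` fast or slow."""
    return ppm * SECONDS_PER_DAY / 1_000_000


def headroom_ppm(static_error_ppm: float, limit_ppm: float = 500.0) -> float:
    """Correction range left once a static error has used part of the limit."""
    return limit_ppm - abs(static_error_ppm)


for ppm in (1, 100, 400, 500):
    print(f"{ppm:>4} ppm = {drift_seconds_per_day(ppm):.4f} s/day")

print("headroom with a 400 ppm static error:", headroom_ppm(400.0), "ppm")
```

So 1 ppm is about 0.086 s/day, 100 ppm is 8.64 s/day, and a 400 ppm
static error leaves only 100 ppm of a 500 ppm limit for everything else.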

>
>> Of course the "right" procedure would be to fix the kernel, and that was
>> eventually done, but that eventually was on the order of a year or two. 
>
> That's exactly right.  From that situation, I drew the conclusion that
> precise timekeeping wasn't a priority for the Linux project.

Or that the people responsible for the timing did not understand what
the kernel was doing or why. It is a rather complex piece of code.

>
>> Were all Linux people to give up disciplining their clocks while waiting
>> for the kernel people to get their act together? That hardly seems
>> sensible advice. 
>
> I recall advising anyone who cared about precise timekeeping to run
> FreeBSD, NetBSD, or even maybe OpenSolaris instead of Linux.

Linux was fine once that initial transient was done with. Chrony fixed
it very quickly (in much less than an hour); ntpd had much more difficulty
with it.
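
As noted earlier, chrony uses both the tick and the frequency
adjustments of adjtimex. A rough sketch of how a large rate error could
be split between the two follows. It is an illustration only, not
chrony's actual algorithm. It assumes USER_HZ = 100 with a nominal tick
of 10000 us, so that one tick unit is worth about 100 ppm, and it uses
the adjtimex convention that the frequency field is in units of 2**-16
ppm. The function names are hypothetical.

```python
# Assumption: USER_HZ = 100 and a nominal tick of 10000 us, so changing
# the tick value by 1 changes the clock rate by about 100 ppm.
PPM_PER_TICK_UNIT = 100.0

# The adjtimex frequency field is "scaled ppm": units of 2**-16 ppm.
FREQ_SCALE = 65536


def split_correction(total_ppm: float) -> tuple[int, float]:
    """Split a rate correction into a coarse tick change and a fine residual.

    Returns (tick_delta, residual_ppm). The tick change takes the nearest
    multiple of 100 ppm, which keeps the residual small and leaves the
    frequency adjustment plenty of room.
    """
    tick_delta = round(total_ppm / PPM_PER_TICK_UNIT)
    residual_ppm = total_ppm - tick_delta * PPM_PER_TICK_UNIT
    return tick_delta, residual_ppm


def to_scaled_freq(residual_ppm: float) -> int:
    """Convert a residual in ppm to the scaled-ppm integer adjtimex uses."""
    return round(residual_ppm * FREQ_SCALE)


tick_delta, residual = split_correction(437.5)
print("tick delta:", tick_delta)
print("residual ppm:", residual)
print("scaled freq:", to_scaled_freq(residual))
```

With the coarse part carried by the tick value, a clock that is off by
several hundred ppm no longer eats into the fine frequency range.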
>
> And yes, I fully understand that some folks have a homogeneous environment
> and lack the flexibility to select the best platform for a particular purpose.
> Been there, had thousands of Java processes on Linux VMs get wonky with
> the June 2012 leap second due to an adjtimex() kernel hang.
>
> Regards,

_______________________________________________
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
