Re: [ntp:questions] NTP autokey: self-signed certificate expiration problem

2017-12-29 Thread David L. Mills

Stephane lasagni wrote:


Hello,


I tried the NTP Autokey protocol (the TC scheme at first, then with IFF parameters 
- the Schnorr algorithm, since it is the best-documented scheme). I managed to get 
both schemes working; however, I have noticed one problem: my product is an NTP 
client and generates its own self-signed, non-trusted certificate as described in 
the protocol (using the ntp-keygen -H command). However, when my product starts, it 
always starts with a default date, which is in 2015! Because the self-signed 
certificate is only valid for one year, it is already expired the moment it is 
generated! I need to be synchronized before I generate the certificate... but I 
need the certificate before I can synchronize!


I found a workaround, but I don't think it is a very "clean" solution: I use the "-l" option 
of ntp-keygen to specify the certificate lifetime, and I give it a large value (like 40 years) just 
to make sure the generated certificate is valid at power-up. I then renew the certificate 
every month or so (but each time with a 40-year lifetime). I've set up a cron job that launches a script to 
generate the certificate at power-up and then every month, but this script is fixed, so each time it 
is launched the newly generated certificate again has a 40-year lifetime...
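In case it helps, this is roughly what that workaround looks like (the 14610-day value for
"40 years", the paths, and the crontab syntax are only illustrative):

  # initial generation: new host key plus a long-lived self-signed certificate
  ntp-keygen -H -l 14610
  # illustrative monthly renewal (system crontab entry); regenerates the
  # certificate only, reusing the existing host key
  0 3 1 * * root ntp-keygen -l 14610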


I am thinking there must be a better way to deal with this! I'm probably not 
the only one to have this type of problem! :)


How can this type of problem be dealt with? Is there a better solution?


thank you very much for your help!

Best regards

Stéphane


PS: I am planning to also test the "private certificate" to try to understand 
how it works (I have sent a question about this scheme recently)





Stephane,

As an alternative, you can use the symmetric key scheme.  This does not 
require Autokey.
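For example, a minimal symmetric-key sketch might look like this (the server address,
key ID, and secret are only illustrative; the shared secret must be installed on both
client and server):

  # /etc/ntp.keys on both machines (keep it mode 600)
  1 MD5 sharedsecret

  # ntp.conf on the client
  keys /etc/ntp.keys
  trustedkey 1
  server 192.0.2.1 key 1 iburst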


The original intent of the keygen program with no argument was to 
generate a certificate using the current time of the operating system.  
Therefore, once you generate a proper certificate, the old certificate 
lifetime is updated.
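So, once the client has correct time (for instance via the symmetric-key setup sketched
above), the certificate can simply be regenerated. A sketch, assuming the ntp-wait helper
script from the distribution is available:

  ntp-wait -v     # block until ntpd reports it is synchronized
  ntp-keygen      # regenerate the certificate from the now-correct clock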


Dave


Re: [ntp:questions] NTP autokey and the "private certificate" scheme

2017-12-19 Thread David L. Mills

Stephane lasagni wrote:


Hello,
I apologize in advance if my questions further below seem basic to some of you: 
I am very new to NTP and cybersecurity (a whole new world for me!). I am trying 
to work out how NTP Autokey works when using the “private certificate” scheme, 
and I thought you might be able to help me understand it better. I know this 
scheme is not recommended by RFC 5906 (it is only for testing purposes). However, in 
my application this scheme could be appropriate. I think I understood how the 
other schemes (TC, IFF, ...) work, but for some reason I’m struggling to 
understand the “private certificate” scheme. I have the following questions 
(which I numbered to make the reading easier):

1. I understand the “private certificate” scheme is not recommended for 
general use (only for testing and development) only because, with this scheme, 
it is difficult to renew the certificate for all hosts in a secure way. Is that 
correct?

I understand that the TA (Trusted Authority) generates this private certificate 
off-line (signed by the TA) and provides it in a secure way to all hosts of the 
NTP group but what I am struggling to understand is what this private 
certificate contains exactly and how it is used:

2. Does the private certificate replace the self-signed certificate which 
is generated by each host at the beginning of the protocol? I.e., each host knows 
it can use the public key in that certificate (and the associated private key; 
see question 3) for the cookie encryption/decryption, etc.?

3. If the answer to question 2 is yes, does it mean that, in addition to 
the certificate, the TA has to provide each host with the private 
key that goes with the public key of the certificate?

4. If the answer to question 3 is no, does it mean each host has two 
certificates: the self-signed non-trusted certificate generated at the 
beginning of the protocol plus the private certificate? How exactly is the 
private certificate then used?

5. From RFC 5906, I understand that if the private certificate 
scheme is used, the certificate trail and the identification steps are not 
necessary. What about the SIGN exchange? The SIGN exchange only makes sense with 
a non-trusted self-signed certificate, so this brings me back to the previous 
questions.

6. Last question (beginner level, I think... sorry!), and I am sure I 
probably forgot some: what does this private certificate contain in terms of 
subject name (the issuer is clearly the TA, but is the subject name exactly the 
same for all hosts, i.e. is the certificate identical for all hosts? maybe it 
does not matter?), and how long is it valid for (one year by default, I guess, which 
makes this scheme difficult to use in practice for the reasons given above)?

Thank you very much in advance for your help!
Best regards
Stéphane



Stephane,

Golly. You are the first person in 20 years to have asked about the 
private certificate scheme.  Frankly, I don't remember all the tiny 
details you mentioned.  However, the Autokey scheme is about to be 
replaced by  new security proposals, so it is probably better to wait 
until the dust clears.


Dave


Re: [ntp:questions] Understanding MJD in NTPv4 test cases

2012-08-17 Thread David L. Mills

antony,

The NTP timescale described on the NTP Era and Era Numbering page at the 
NTP Project site reckons JDN days and fraction since noon on the first 
day of the year 4713 BC. The correspondence between JDN day and civil year prior 
to Pope Gregory's bull is in Julian years of 365.25 days, as is the habit 
of historians. The years since the papal bull correspond to the 
Gregorian calendar. The tables you see in the documentation were 
determined by an awesome Excel spreadsheet that apparently is unreadable 
with the current Excel version. Whatever bugs may remain are mine, but they 
do not affect current NTP timekeeping, only possible historic 
misadventures.
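As a quick illustration of the two reckonings at the JDN epoch (a sketch using the
common integer-arithmetic conversion formulas, not the spreadsheet mentioned above;
awk's int() truncation happens to be correct for this particular date):

  awk 'BEGIN {
    Y = -4712; M = 1; D = 1                  # 1 January 4713 BC
    # proleptic Gregorian rule
    a = int((14-M)/12); y = Y+4800-a; m = M+12*a-3
    g = D + int((153*m+2)/5) + 365*y + int(y/4) - int(y/100) + int(y/400) - 32045
    # Julian-calendar rule
    j = 367*Y - int((7*(Y+5001+int((M-9)/7)))/4) + int((275*M)/9) + D + 1729777
    printf "Gregorian rule: JDN %d   Julian rule: JDN %d\n", g, j
  }'

The Julian-calendar rule gives JDN 0, while the purely Gregorian rule gives 38, which
is exactly the discrepancy reported in the message quoted below.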


David

antony.arciu...@oooii.com wrote:


In http://www.ietf.org/rfc/rfc5905.txt and 
http://www.eecis.udel.edu/~mills/database/reports/ntp4/ntp4.pdf

Figure 4: Interesting Historic NTP Dates/Table 2: Interesting Historic NTP Dates

It seems to me that all MJD values in the table are calculated using Gregorian 
rules all the way back. If I add a branch so that dates before the Gregorian reform 
use a different leap-day calculation, then none of the pre-Gregorian values come out 
correctly.

This is all true EXCEPT for the first test, the Julian Day Number Epoch. The document 
describes what should be zero for MJD, i.e. the exact constant that is the 
offset from regular JD. Using the Gregorian calculation for Julian Days, the value for 
Jan 1, -4712 is 38, not 0. Thus the MJD I get is -2399963, not -2,400,001. That's way more 
than any precision/noon v. midnight would produce.

Why is that first date different algorithmically, if the test values are to be 
believed? It's not a Julian v. Gregorian thing because I can generate the other 
BCE dates consistently.

Am I missing something? I feel like either the first MJD in the table is 
incorrect, or all BCE values are incorrectly calculated using Gregorian math in 
a non-Gregorian era.

I am using code from http://www.tondering.dk/claus/calendar.html AND 
http://bowie.gsfc.nasa.gov/time/julian.txt (ported to C) for the JDN calc and they 
are always self-consistent (I do an assert(JDN1 == JDN2) to be sure I've got the 
calcs right). I also have a third implementation I wrote directly from 
http://en.wikipedia.org/wiki/Julian_day, but that too produces 38.

I can only get 0 JDN, and thus -2,400,001 MJD, with a non-Gregorian calculation, but then all the BCE calcs plus 
the Last Day Julian test fail (the CE tests after Last Day Julian all pass).

Please, can someone shed some light on what I am missing? What are the rules 
NTP uses for calculating JDN?



Re: [ntp:questions] NTP vs RADclock?

2012-06-07 Thread David L. Mills

Julien,

Thanks for the paper reference. Your ideas on feed-forward are similar 
to the ideas in Greg Troxel's MIT dissertation. These ideas were 
partially implemented in NTPv3 a very long time ago.


There are some minor misinterpretations in the paper. The NTP 
discipline loop is not critically damped; it is purposely underdamped 
with a minor overshoot of a few percent in order to improve the 
transient response. The impulse response was slavishly copied in the 
microkernel and nanokernel code that left here over a decade ago. The 
microkernel survives in Solaris and the nanokernel in most BSD systems 
with varying degrees of fidelity; however, many systems have elected to 
modify the original BSD tickadj semantics, which result in an extra 
pole. The result is a moderate instability at the longer poll intervals, 
especially if the step threshold is increased or eliminated. In any 
case, the response has no serious misbehavior as the paper described. 
Note that in no case are the daemon and kernel algorithms cascaded as 
the paper implies. Either one or the other is used, but not both.


The system behavior with multiple servers is indeed as the paper 
suggests, but there is considerable algorithm horsepower to diminish the 
effects, including the cluster and combine algorithms, plus the 
anti-clockhop and prefer peer mechanisms. These provisions were first 
implemented in the Internet of twenty years ago, when the congestion on 
overseas links frequently reached over one second. Perhaps today these 
algorithms could be more carefully tuned for LANs and even WiFi networks.


As the paper describes, NTP algorithms are designed for traditional 
mathematical analysis, but with both linear and nonlinear components. 
However, the FLL algorithm is based on a model described by Levine as 
predictive. The model  in the documentation describes both the PLL and 
FLL in predictive terms, but that doesn't change the conclusions in the 
paper.


The paper suggests possible improvements in data filtering and analysis. 
The clock filter and popcorn spike suppressor algorithms in NTP 
represent one approach. A persistent observation is that NTP does not 
effectively use offset/delay samples other than at the apex of the 
scattergram. While it does indeed do that for the huff-n'-puff filter, 
the possible improvement in other cases is problematic. The paper does 
not mention the implications of roundtrip delay in the maximum error 
statistic, such as in Cristian's model, as used by NTP. It is a natural 
error bound for asymmetric paths such as mentioned in the paper.
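As a toy illustration of the apex idea (made-up sample values, not the ntpd code):
keep the offset of the lowest-delay sample in the window.

  printf '%s\n' '0.28 -0.96' '0.19 -1.35' '0.25 -0.80' |
  awk '{ if (NR == 1 || $1 < d) { d = $1; o = $2 } }
       END { printf "use offset %s (delay %s)\n", o, d }'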


In summary, the NTP algorithms have evolved over thirty years in response 
to major changes in Internet service models and surely could use some 
further evolution. I am glad there is continuing interest in improvements.


Dave

Julien Ridoux wrote:


On Thursday, June 7, 2012 4:19:31 PM UTC+10, unruh wrote:
 


On 2012-06-07, skillz...@gmail.com skillz...@gmail.com wrote:
   


On Jun 5, 6:46 pm, Julien Ridoux jul...@synclab.org wrote:
 


On Tuesday, June 5, 2012 9:12:42 AM UTC+10, E-Mail Sent to this address will be 
added to the BlackLists wrote:
   


Thanks for response. It would be great to have a simplified
description of the algorithm. I've read most of the docs on the
synclab site.
 



That is a very impressive effort :)

 


I'm trying to synchronize several devices to a reference
clock of a similar device rather than to a true wall clock time of a
real NTP server. I can't use the RADclock code directly because it's
GPL so I'd like distill the algorithm down to something I can
implement from scratch. I'd like to adapt my current NTP client and
server code to use RADclock so I can compare performance. I'm aiming
 


Why not just use the reference implementation of radclock for the tests?
   



The next version of RADclock is likely to be released under a BSD licence. This 
should save you the trouble of reimplementing the algorithm.

 
 


for 10 microseconds or less of sync on a wireless LAN with a fast
 

Not going to work. 

   


initial sync time (less than 5 seconds, but I can send many packets
 


In 5 sec you could set the time, but you cannot get a good estimate of
the rate. 

   


close together if needed). I haven't been able to get even close to
this with my current NTP code (Ethernet is close, but WiFi introduces
a lot of variability). Some of the key points seem to be:
 


Precisely. And Radclock will not help with that. It uses the same ( or
perhaps much slower) synchronization algorithm.
   



I agree with Unruh, you are going to bump into quite a few issues, and the much 
noisier characteristics of the WiFi channel make it much harder for any 
synchronisation algorithm. RADclock does a pretty good job at filtering 
noise, but there is no magic involved. If the overall noise characteristics of 
all packets are bad, then you cannot do much about it.
I have run RADclock over WiFi, but I would 

[ntp:questions] Computer Network Time Synchronization: Russian translation

2012-04-26 Thread David L. Mills

Folks,

I thought you might get a kick out of this. See 
www.eecis.udel.edu/~mills for ISBN number. Translated from the second 
edition, published by CRC Press.


Dave


Re: [ntp:questions] Falseticker determination

2012-04-05 Thread David L. Mills

A C,

Before you take a hacksaw to the code, you should see the How NTP Works 
collection in the online documentation, in particular the clock select 
algorithm page. It includes advice on how to avoid falsetickers in 
cases like yours, including the use of the tinker and/or tos commands. There 
should be no need for additional trace lines in the code, as there 
already are some that demonstrate the results of the clock select and 
cluster algorithms.
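One knob often mentioned for the case where a precise PPS source keeps falling
outside the intersection interval is tos mindist, which widens the correctness
intervals; the value below is purely illustrative, so read the clock select page
before changing it:

  # ntp.conf fragment
  tos mindist 0.050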


D M

A C wrote:

On 4/4/2012 18:52, E-Mail Sent to this address will be added to the 
BlackLists wrote:



Dave Hart wrote:


A C agcarver+...@acarver.net wrote:


Where in the code of 4.2.7p270 is the determination that
  a peer is a falseticker?  I'm looking through ntp_proto.c
  but I don't think I'm fully grasping how the determination
  is made and the peer marked.

I want to put some debug lines in the area of the code
  where the falseticker is determined so I can figure out
  what conditions are causing the PPS to be marked as a
  false ticker.



Line 2519 of ntp_proto.c (in clock_select):
peer->new_status = CTL_PST_SEL_SANE;

All survivors to that point in the code get the x, fleetingly.  Those
that keep it fail to survive to line 2688:
peers[i].peer->new_status = CTL_PST_SEL_SELCAND;



I would have thought 2835 - 2855 might be where he would
  want to take a closer look.



Well that was a fun exercise.  The end result is that the PPS is no 
longer a false ticker and I don't need a prefer peer either. :)


Believe it or not, it's actually working better this way.  For 
starters, the offset is staying within +/- 10 us of the PPS pulse.  
Additionally, allowing the system to select the rest of the clocks 
using the normal clock selection instead of a prefer peer has actually 
quieted the sys_fuzz messages.  Usually I was seeing a sys_fuzz 
message perhaps once every couple minutes.  Now I see one maybe once 
in a few hours or more.  Plus if any of the peers explodes for 
whatever reason it doesn't wipe out my clock because the averaged time 
doesn't move much.



Re: [ntp:questions] Off topic: using delay in routing protocols

2011-12-29 Thread David L. Mills

Juliusz,

The fuzzballs indeed used a delay metric. They made little nests at the 
earth stations in the SATnet program, as well as in the routers used in the 
early NSFnet. In its original form, the ARPAnet also used a node-state 
metric like the fuzzballs, but switched to a link-based metric like 
OSPF. So far as I know, the fuzzballs used split horizon and hold-down 
before anybody else did. This was exemplified by the mantra "good news 
travels fast, but bad news travels forever." See below for 
additional references.


Mills, D.L. The Fuzzball. Proc. ACM SIGCOMM 88 Symposium (Palo Alto CA, 
August 1988), 115-122.


Mills, D.L., and H.-W. Braun. The NSFNET Backbone Network. Proc. ACM 
SIGCOMM 87 Symposium (Stoweflake VT, August 1987), 191-196.


Dave

Juliusz Chroboczek wrote:


Hi,

Sorry for the offtopic post, but I really don't see another place to ask
this question.

I hear that the Fuzzball routing protocol used packet delay as a routing
metric.  Does anyone recall if that's right?  Was it the RTT, or was it
attempting to perform an estimate of one-way delay?

More generally, I'll be grateful for any pointers to papers on the
subject of using delay in routing protocols.

Thanks for your help,

-- Juliusz Chroboczek



Re: [ntp:questions] Choice of local reference clock seems to affect synchronization on a leaf node

2011-11-08 Thread David L. Mills
  unruh, 

unhurt,


1. You have a broken interpretation of how the NTP discipline algorithm 
works. See the online document How NTP Works, and in particular the 
discipline and clock state machine pages.


2. Your comparison between NTP and Chrony is badly conceived. Talk to 
Miroslav; he knows the issues.


3. The PIC (sic) issues have already been carefully considered. See the 
startup algorithms described on the How NTP Works pages.
4. The orphan mode and local clock discipline require special provisions 
to delay clock adjustments until the configured sources have had a 
chance to activate. The paint isn't quite dry on some intricacies.


5. Starting NTP with an initial ten-year offset is not a frequent 
adventure. Under these conditions, if the clock takes a little longer to 
stabilize, I'm not going to worry a lot about it.


Dave

unruh wrote:


On 2011-11-07, Nathan Kitchen nkitc...@aristanetworks.com wrote:
 


On Sun, Nov 6, 2011 at 2:13 PM, Danny Mayer ma...@ntp.org wrote:
   


On 11/4/2011 7:27 PM, Nathan Kitchen wrote:
 


I'm curious about some behavior that I'm observing on a host running
ntpd as a client. As I understand it, configuring a local reference
clock--either an undisciplined local clock or orphan mode--shouldn't
help me, but I see different behavior when I do have one. In
particular, when I'm synchronizing after correcting a very large
offset, I synchronize about 2x faster in orphan mode than with no
local clock, and with an undisciplined local clock I don't even fix
the offset.

I'm curious about whether this difference should be expected.

I'm using the following configuration in all cases:

   driftfile /persist/local/ntp.drift
   server 172.22.22.50 iburst

My three different configurations for local clocks are the following:

1. No additional commands

2. tos orphan 10

3. server 127.127.1.0
   fudge 127.127.1.0 stratum 10

In all three cases, my test has these steps:

1. Stop ntpd.
2. Set the clock to 2000-1-1 00:00:00 (that is, more than 10 years ago).
3. Run ntpd -g.
4. Check that the 11-year offset is corrected.
5. Wait for synchronization to the time server.

With either configuration #1 (no local clock) or #2 (orphan mode), the
offset is corrected quickly: 4 and 13 seconds, respectively. With
configuration #3 (undisciplined local clock), it fails to be corrected
within 60 seconds.
   


In case #3 that's expected if there are no servers to get the correct
time. What else would you expect? Where would it get its time from?
 


In case #3, as in the other cases, the configuration includes the
server 172.22.22.50.

   


After the offset is corrected, configuration #1 takes 921 seconds to
synchronize to the server. Configuration #2 takes 472.

   


First, correcting the offset is the major concern. After that figuring
out the frequency changes need to be calculated with additional packets
being received and that takes time. It needs to have enough of them to
do the calculation.
 



Actually, that is not the way that ntpd works. It has no concept of
frequency error. All it knows is the offset. It then changes the
frequency in order to correct the offset. It does not correct the offset
directly. It never figures out what the frequency error is. All it does
is: if the offset is positive, speed up the clock; if negative, slow it down
(where I am defining the offset as 'true' time minus system clock time).
(There is lots that goes into ntp's best estimate of the 'true' time,
which is irrelevant to this discussion.)

chrony has a different philosophy, where it has a concept of both the
frequency error and the offset, and it tries to correct both
independently. It keeps a large number of measurements to estimate both
the frequency error and the offset from those measurements. This results
in a far far faster convergence, and a better system clock offset behaviour (by
factors of 2-20).
Another approach might be to use the PID concepts ( in which one uses
the present offset, the derivative of the offset and the integral of the
offset to drive the correction) to control the clock to get faster
convergence, without overshoot and with high long term accuracy. These
kinds of feedback systems are used for example to control the
temperature of scientific heat baths to high precision and fast
non ringing convergence (and have gained popular use in for example
sous vide cooking). 
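A toy sketch of that idea (the gains are arbitrary placeholders, not tuned values):
read one measured offset in seconds per line and emit a frequency correction.

  awk -v kp=0.1 -v ki=0.02 -v kd=0.05 '
    { integ += $1; deriv = $1 - prev; prev = $1
      printf "correction: %.3f ppm\n", (kp*$1 + ki*integ + kd*deriv) * 1e6 }'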


It might be interesting to get a Masters or PhD student somewhere to compare the
various techniques for clock control to see what their advantages and
disadvantages are especially under real life conditions. 

 


Why would it take fewer packets with orphan mode enabled (and no
peers) than with no local clock?

-- Nathan
   




Re: [ntp:questions] Arcron (type 27) driver users needed

2011-09-23 Thread David L. Mills

Guys,

Joe sent me a receiver and I have finally verified it works, sorta. It 
took a couple of weeks before it found WWVB, but it did. I connected it 
to deacon for test. When it first came up ntpq reported it as working, 
but after a couple of minutes the driver apparently went into a loop and 
stayed there. At least ntpq timed out for repeated pe queries. I left it 
commented out to avoid possible battery exhaustion. I need a little help 
here, as I cannot clearly see what I am doing.


Dave

Joe Landers wrote:


Dave,

I have a couple of Arcron's, although I don't actively use them anymore.
If you let us know exactly what you'd like us to do, we'd be happy
to help. Alternatively, I can ship one to you or Harlan and you can 
verify the fix yourself.


Joe

On 7/20/2011 8:33 PM, Dave Hart wrote:


If you use the refclock_arc.c driver (127.127.27.*) please reply to me
and mention which receiver and radio station you use.  The driver
supports MSF, DCF, and WWVB.

I'm asking because five years ago, a bug report was filed indicating
the driver was dependent on the system's time zone, when it should be
independent of it.  The driver maintainer provided a patch, and the
reporter of the bug said it cured his problem.  Then, unfortunately,
the bug was dropped on the floor and the fix never integrated into the
distributed ntpd sources.  Details at:

http://bugs.ntp.org/402

Harlan and I are embarrassed by the oversight and would like to
resolve it, but also aware the proposed fix is not
backwards-compatible and risks breaking currently-functioning setups.
Additionally, we are both leery of making changes to a reference clock
driver without adequate testing.  I would like to see the change
tested against several radio stations and with the system timezone set
to UTC and local time.

Cheers,
Dave Hart


Re: [ntp:questions] Magice Server Numbers: 4,5,7,9

2011-09-16 Thread David L. Mills

Danny,

We would not be having this discussion if folks read how NTP works in 
the online documentation, in particular, the page on the select 
algorithm. The number of candidates is not limited to ten. By default, 
ten is the high water mark for survivors mobilized as preemptible in 
manycast and pool modes.


Dave

Danny Mayer wrote:


On 9/16/2011 2:24 AM, unruh wrote:
Danny,
 


6. Seven clocks allow for the failure of three.


Etc, etc. . . .
 


The only answer is to have at least 11 clocks although that
is also not foolproof:-) 
   



Actually no. If you get too many reference clocks they will start to
gang up against each other and it becomes impossibly complex to try and
decide which set to use. As the number of clocks increase the number of
gangs will likely increase. That's why the reference implementation
limits the number of reference clocks to 10.

Danny


Re: [ntp:questions] Magice Server Numbers: 4,5,7,9

2011-09-16 Thread David L. Mills

Danny,

We would not be having this discussion if folks read how NTP works in 
the online documentation. The maximum number of selectable candidates is 
not limited to ten. Ten is the high water mark for the number of 
preemptable candidates mobilized by the manycast and pool modes.


Dave

Danny Mayer wrote:


On 9/16/2011 2:24 AM, unruh wrote:

 


6.  Seven clocks allow for the failure of three.
Etc, etc. . . .
 


The only answer is to have at least 11 clocks although that
is also not foolproof:-) 
   



Actually no. If you get too many reference clocks they will start to
gang up against each other and it becomes impossibly complex to try and
decide which set to use. As the number of clocks increase the number of
gangs will likely increase. That's why the reference implementation
limits the number of reference clocks to 10.

Danny


Re: [ntp:questions] how long does it take ntpd to sync up

2011-08-28 Thread David L. Mills

Brian,

See the release notes for the latest distribution in the online 
documentation.


There has been a bit of facebook engineering in this discussion. For the 
real story, see the How NTP Works pages in the latest online documentation.



Dave

Brian Utterback wrote:


On 08/28/11 06:53, David Woolley wrote:
 


Harlan Stenn wrote:

   


ntp *does* a fit/figure of the expected needed adjustment - your
sentence implies that it does not (at least that's how it reads to me).
 


No it doesn't.  It is basically a feedback controller.

If you take a modern central heating controller, which varies the output
by varying how much of a ten minute cycle the pump is running, you have
something similar to version 3 ntpd, except that it used 4 seconds.

The central heating controller will use the measured difference from the
target temperature to adjust the on to off proportion, and also include
some of the integral of that, to, eventually, remove any remaining offset.

A fitting process would be more like the controller measuring the rate
of temperature change when the heat was off, and when it was on, and
then calculating exactly how much on to off time to apply in one go. (In
practice, it isn't as simple as that for the central heating system, as
there is a lag involved for the heat to get from the boiler flame to
the thermostat.)

   



I think that there might be something to the process that Bill supports,
at least in some situations. As you know, I have had problems with NTP
and larger initial frequencies. (I know I owe you testing on the latest
NTP. Sorry, I have been busy. I'll get to it soon.) But if you think
about it, we already do something like this for offset.

Of course, NTP uses a PLL in the general case. (It used to use a FLL
when it settled in, but I seem to remember Dr. Mills saying that was
removed.) but with iburst, we futz with the algorithm to get the offset
right quickly. Couldn't we do something similar for frequency?
Particularly in the case where there is no drift file, the initial
frequency could be very far off. Save the first 16 (say) poll samples,
correct for the offset adjustment, do a best fit analysis and some
sanity checking and use the result for an initial frequency. You could
keep this up, doing best fit analysis until it got to within a certain
error interval and then switch to the normal regime.

If you only did the first analysis, you could trivially prove that it
does not break the PLL because you are left with an initial condition
that potentially could have occurred anyway. But in most cases, you are
going to zero in on the right frequency pretty quickly.
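A toy sketch of such a best-fit step (not the ntpd code): feed it time/offset pairs,
one sample per line with both columns in seconds, and it prints the least-squares
slope as an initial frequency estimate.

  awk '{ n++; sx += $1; sy += $2; sxx += $1*$1; sxy += $1*$2 }
       END { if (n > 1) {
               slope = (n*sxy - sx*sy) / (n*sxx - sx*sx)
               printf "initial frequency estimate: %.3f ppm\n", slope * 1e6 } }'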

 





Re: [ntp:questions] Setting back-up network servers to minpoll 10 automatically when sychronizied to a referrence clock.

2011-07-15 Thread David L. Mills

David,

Something like this was done in NTPv3 (xntpd) and it turned out to be a 
bad idea. The poll interval is determined by the time constant, which 
for PPS and other low-stratum sources is relatively small. If a backup 
is switched in at a poll interval much larger than this, it takes awhile 
for the time constant and backup poll interval to stabilize and 
meanwhile the Nyquist limit is exceeded. In NTPv3 this sometimes 
resulted in an evil twitch until things calmed down.


It gets worse in the general case where sources of unequal stratum are 
configured. The algorithms can result in survivors of more than one stratum 
being present, and the combine algorithm uses them all. They must all run 
the same poll interval or evil twitches can result. All in all, to do 
what you suggest requires an intricate evaluation of what already is a 
complicated and fragile algorithm, so this might not happen soon.


Dave

David J Taylor wrote:

Edward T. Mischanko etm1962@ wrote in message 
news:ivk80m$4uh$1...@speranza.aioe.org...


I am using GPS with PPS as my primary time source.  I don't want to 
set my back-up network servers to minpoll 10 in the configuration 
because if the GPS ever fails the servers would be fixed at minpoll 
10.  I propose an enhancement to the current NTPD functions:  Have 
NTPD automatically set network clocks to minpoll 10 when using a 
stratum 0 clock as a time reference.  This would cut network traffic 
and server load while still allowing a minpoll 6 setting or lower in 
the configuration to be used if the reference clock ever fails.



I like and support that suggestion.

David


Re: [ntp:questions] Controlling the combine algorithm

2011-07-04 Thread David L. Mills

Steve,

There is something wrong with the nomenclature in your message. By 
definition, the number of survivors is the number remaining after the 
cluster algorithm has completed. This number is three by default, but 
can be changed using the minclock option. To reduce the number of 
survivors below three is generally a bad idea, but as I said, the option 
is available.


I don't understand what you mean by "it doesn't work."  The maxclock 
option does not do what you want. It is intended for automatic 
configuration, where it specifies the maximum number of servers 
remaining after the preempt phase. For configured associations, this 
number is irrelevant.
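A minimal ntp.conf sketch of the distinction (values illustrative): minclock bounds
the survivors kept by the cluster algorithm, while maxclock only caps the preemptible
(pool/manycast) associations.

  # keep at least 4 survivors after the cluster algorithm
  tos minclock 4
  # cap the number of preemptible pool/manycast associations at 6
  tos maxclock 6
  pool 0.pool.ntp.org iburst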


Dave

Steve Kostecke wrote:


On 2011-07-02, David L. Mills mi...@udel.edu wrote:

 

The combine algorithm operates on the survivors of the cluster 
algorithm, as described on the How NTP Works page. The number of 
survivors can be set using the minclock option. I'm not sure why you 
want to do this, but the option is there.
   



Dave,

I've been having a discussion with someone who wishes to configure a
large number of unicast time servers and restrict the number of
survivors to a number below the default. He was attempting to accomplish
this using maxclock option, but it did not work.

Thanks,

 





Re: [ntp:questions] Failure of NIST Time Servers

2011-06-04 Thread David L. Mills

Eugen,

The remote NIST servers do not use the ACTS driver in the distribution. 
They use an algorithm called lockclock that functions as a modem device 
driver. I assume the unhealthy indication provided in the ACTS timecode 
is translated to the NTP LI indicator via the local clock driver, but 
Judah is using a relatively old and much modified version of the NTP 
reference implementation, so it is not clear how this is done.


Dave

Eugen COCA wrote:


In my opinion, transmitting the time with an offset of about 680
seconds with ... some of the systems transmitted the wrong time
without this indication. (unhealthy indicator), is a bit
unprofessional. Of course, it is the users' sole responsibility to
configure his/her time servers in order to avoid these things.

I think you should post a message stating that such a behavior
will never happen again in the future, now that you have discovered the bug
and corrected it.

Eugen

On May 27, 11:16 pm, jlevine jlev...@boulder.nist.gov wrote:
 


The primary and backup ACTS servers that are used to synchronize the
NIST
Internet time servers to the atomic clock ensemble in Boulder failed
on Wednesday 24 May at about 1900 UTC (1300 MDT). The failure
affected
11 of the 35 NIST Internet time servers, and the time transmitted by
the affected servers was wrong by up to 680 seconds. In most cases,
the incorrect time was accompanied by the unhealthy indicator, but
some
of the systems transmitted the wrong time without this indication. The
other 24 servers were not affected. In some cases, one of the physical
servers at a site was affected while the others were not, so that
repeated requests to the public site address resulted in time messages
that differed by up to 680 seconds.

The ACTS servers have been fully repaired as of 27 May, 1800 UTC (1200
MDT),
and all of the servers should be resynchronized within a few hours.
There
may be a transient error of up to 10 milliseconds during this period.

I apologize for this failure, and I regret the problems and
inconvenience
that have resulted.

Judah Levine
jlev...@boulder.nist.gov
   





Re: [ntp:questions] Loop Filter Gains vs. Polling Interval

2011-05-16 Thread David L. Mills

Edward,

The loop time constant varies directly with the poll interval. See How 
NTP Works in the current documentation. Note that the default value and 
range are purposely optimized for public time servers in order to manage 
network overhead and are not appropriate for the most accurate LAN 
servers. For that the poll interval and thus loop time constant should 
be clamped below this, but not below 4 (16 s) for typical 100-Mb/s networks.
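A minimal sketch of such clamping for a LAN client (the server address is illustrative):

  # ntp.conf fragment: clamp the poll interval, and hence the loop time
  # constant, to 16..64 s for an accurate LAN server
  server 192.168.1.1 iburst minpoll 4 maxpoll 6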


Dave

Mischanko, Edward T wrote:

Dave, 


How do I adjust the loop time constant so that it is shorter??

 

The integral component should be retained.  The effect of having too 
long a loop time constant should be that the system fails to track clock 
wander, so would be more vulnerable to temperature changes or forgetting 
to disable power management of clock frequency.
 



-Original Message-
On Behalf Of David Woolley
Sent: Monday, May 16, 2011 2:01 AM
To: questions@lists.ntp.org
Subject: Re: [ntp:questions] Loop Filter Gains vs. Polling Interval

Mischanko, Edward T wrote:
 


Can anyone tell me, does the sensitivity for frequency adjustment lessen
as the polling interval increases?  I ask because I'm observing that my

They are linked, but I have a feeling that the loop time constant is not

clamped, even though the poll interval is.  The aim is to always achieve

a certain level of oversampling.  At the very least, the intent is to 
vary the time constant and then choose a poll interval that is 
appropriate for that.


 


offset increases and the frequency adjustment decreases to the point I
fall out of sync at polling intervals above 256.  What am I doing wrong?

The integral component should be retained.  The effect of having too 
long a loop time constant should be that the system fails to track clock


wander, so would be more vulnerable to temperature changes or forgetting

to disable power management of clock frequency.



Re: [ntp:questions] ntp-keygen -H and update options

2011-05-13 Thread David L. Mills

Joe,

The documentation is rather specific. If you generate a new host or sign 
key, the certificates are invalid and should be regenerated. Running 
ntp-keygen with no arguments generates a new certificate of the same 
type and signature as the existing one.


Dave


Joe Smithian wrote:


Hi All,

I am trying to configure a trusted NTP server and some clients using
Autokey.

ntp-keygen document:


-H    Generate a new encrypted RSA public/private host key file and link. Note
that if the sign key is the same as the host key, generating a new host key
invalidates all certificates signed with the old host key.

My questions:

1. When should we use the -H option? When generating new keys? When updating
certificates? Or in both cases?



2. Does the -H flag only generate RSA keys, not DSA, even when we use the -S DSA
option, as in the example below?



Let's say we generate new keys using non-default options, e.g.:

ntp-keygen -password mypasword -c RSA-SHA -S RSA -modulus 1024



3. Should we use the same arguments when running ntp-keygen later to update
the certificates/keys? Is ntp-keygen smart enough to generate new
certificates of the same type as the existing ones without specifying the
arguments? If not, the problem is that if the user runs ntp-keygen with
no arguments or with different arguments, it may generate new certificates of a
different type.




I would appreciate your comments.

Regards

Joe


Re: [ntp:questions] POSIX leap seconds versus the current NTP behaviour

2011-05-07 Thread David L. Mills
 that if an adjustment rate of once every 
10 seconds is
all that is necessary to achieve this precision with this system's clock and 
this fine a
time source, then when you have the same system clock but a much sloppier time 
reference
source (e.g. time samples from the network) the adjustment rate justifiable by 
the
achievable timekeeping accuracy is going to be significantly lower (say once 
every
few hundred seconds, like the Allan intercept with a good NTP source).  This is 
a good
result, if it can be implemented this way, since being able to keep the clock 
as accurate
as it can be with a rate of adjustment which is typically quite small has some 
side
benefits with respect to the implementation of kernel timestamping of packets 
or other
events, or of system-call-free user space time stamping.

Dennis Ferguson


On 5 May 2011, at 04:10 , David L. Mills wrote:
 


Dennis,

Holy timewarp! Are you the same Dennis Ferguson that wrote much of the original 
xntpd code three decades ago? If so, your original leapseconds code has changed 
considerably, as documented in the white paper at 
www.eecis.udel.edu/~mills/leap.html. It does not speak POSIX, only UTC. This 
applies to both the daemon and the kernel.

Dave

Dennis Ferguson wrote:

   


Hello,

A strict reading of the part of the POSIX standard defining seconds
since the epoch would seem to require that when a leap second is added the
clock should be stepped back at 00:00:01.  That is, the second which should
be replayed is the second whose seconds since the epoch representation is
an even multiple of 86400.  Right now the NTP implementation doesn't do that,
it instead steps the clock back at 00:00:00 and replays the second which is
one before the even multiple of 86400 in the seconds since the epoch
representation, to match what seems to be required for the NTP timescale.

For a new implementation of this is there any reason not to do the kernel
timekeeping the way POSIX seems to want it?  I thought I preferred the NTP
handling since it seemed to keep the leap problem on the correct day (for
an all days have 86400 seconds timescale, which describes both the NTP and
the POSIX timescales), but I've since decided that might not be all that
important and I appreciate the symmetry of the POSIX approach (leaps forward
occur at 23:59:59, leaps back at 00:00:01, and both leaps end up at 00:00:00)
as well as the fact that the POSIX approach yields a simple equation to 
determine
the conversion from time-of-day to seconds-since-the-epoch which is always
valid, even across a leap (and even if the inverse conversion is ambiguous)
while I'm having difficulty finding a similar description of NTP's behaviour.

Dennis Ferguson
 



 





Re: [ntp:questions] POSIX leap seconds versus the current NTP behaviour

2011-05-04 Thread David L. Mills

Dennis,

Holy timewarp! Are you the same Dennis Ferguson that wrote much of the 
original xntpd code three decades ago? If so, your original leapseconds 
code has changed considerably, as documented in the white paper at 
www.eecis.udel.edu/~mills/leap.html. It does not speak POSIX, only UTC. 
This applies to both the daemon and the kernel.


Dave

Dennis Ferguson wrote:


Hello,

A strict reading of the part of the POSIX standard defining seconds
since the epoch would seem to require that when a leap second is added the
clock should be stepped back at 00:00:01.  That is, the second which should
be replayed is the second whose seconds since the epoch representation is
an even multiple of 86400.  Right now the NTP implementation doesn't do that,
it instead steps the clock back at 00:00:00 and replays the second which is
one before the even multiple of 86400 in the seconds since the epoch
representation, to match what seems to be required for the NTP timescale.

For a new implementation of this is there any reason not to do the kernel
timekeeping the way POSIX seems to want it?  I thought I preferred the NTP
handling since it seemed to keep the leap problem on the correct day (for
an all days have 86400 seconds timescale, which describes both the NTP and
the POSIX timescales), but I've since decided that might not be all that
important and I appreciate the symmetry of the POSIX approach (leaps forward
occur at 23:59:59, leaps back at 00:00:01, and both leaps end up at 00:00:00)
as well as the fact that the POSIX approach yields a simple equation to 
determine
the conversion from time-of-day to seconds-since-the-epoch which is always
valid, even across a leap (and even if the inverse conversion is ambiguous)
while I'm having difficulty finding a similar description of NTP's behaviour.

Dennis Ferguson


Re: [ntp:questions] Bug 1700 - Clock drifts excessively at pollinglevels above 256.

2011-04-24 Thread David L. Mills

Edward,

I don't know enough about the mechanism Windows uses to adjust the 
system clock. If some variant of the Unix adjtime(), the solution may be 
straightforward. The phase-lock loop parameters determine the risetime 
and overshoot of the discipline loop, in particular the loop gain and 
corer frequency. The Unix discipline loop is carefully designed for 
minimum risetime consistent with controlled overshoot of about six 
percent  The loop is designed to preserve this characteristic over a 
wide time constant an poll interval range from 8 s to 36 hr, but with 
the FLL in use at the longer time constants. The most critical parameter 
is the loop gain, which depends primarily on the timer frequency. In 
most Unix systems this is 100 Hz, but in some systems it can be as high 
as 1000 Hz with a change in parameters. It would not be feasible here to 
summarize in detail how to establish these parameters; however, Chapter 
4 of both the first and second edition of the boot Network Time 
Synchronization: the NTP Protocol on Earth and in Space, CRC Press,, 
has the mathematical basis.


Here is a quick litmus test. With the client in normal operation with 
the poll interval at 6 (64 s) and the time and frequency settled down, 
introduce a 100-ms step in time. The discipline loop should converge to 
zero in about 3000 s and overshoot by about six percent. If the response is 
far different from that, major surgery is required.
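A crude way to watch that response from the shell (illustrative only; the loopstats
file gives the same information):

  while sleep 64; do
    printf '%s ' "$(date -u +%H:%M:%S)"
    ntpq -c "rv 0" | tr ',' '\n' | grep -E 'offset|frequency' | tr '\n' ' '
    echo
  done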


Dave,

Mischanko, Edward T wrote:


Your question is a very good one that I don't know the answer to.  I
have observed this behavior while actually watching NTP Time Server
Monitor by Meinberg, live.  I didn't anticipate any power saving
features as being active.  I have seen a correction in the behavior by
changing the PLL and FLL gains, as noted.  I haven't specifically looked
for this problem in other ports, only Windows, and only on systems
without a reference clock.

-Original Message-
From: questions-bounces+edward.mischanko=arcelormittal@lists.ntp.org
[mailto:questions-bounces+edward.mischanko=arcelormittal@lists.ntp.o
rg] On Behalf Of unruh
Sent: Saturday, April 23, 2011 1:20 PM
To: questions@lists.ntp.org
Subject: Re: [ntp:questions] Bug 1700 - Clock drifts excessively at
pollinglevels above 256.

On 2011-04-22, Mischanko, Edward T edward.mischa...@arcelormittal.com
wrote:
 


256.

My system clock drifts excessively when polling above 256 in a Windows
environment, as much as 5 ms or more.  I have made changes to
.../ntpd/ntp_loopfilter.c CLOCK_PLL, CLOCK_FLL, CLOCK_LIMIT, and
CLOCK_PGATE to address this problem.  I realize the changes I have made
are global in nature and really only need changes in the Windows port.
I would welcome a patch to the Windows port to accomplish these changes
or any other changes that accomplish the 1 ms stability I have now
achieved.  I hope Dr. Mills will have constructive comments on this
problem and proposed solutions.
   



Does your system cpu use powersaving cpu frequency changes? Is it a
virtual Windows machine?

 


   





Re: [ntp:questions] NTPD can take 10 hours to achieve stability

2011-04-18 Thread David L. Mills

C.,

It doesn't take ten hours; it takes five/ten minutes. See the online 
documentation release notes for recent NTP development versions at 
www.ntp.org.


Dave

C BlacK wrote:


Why would it take ntpd ten hours to achieve its accuracy?  Can this be 
explained in layman's terms and mathematically?



 

Absolutely normal!  NTPD can sometimes need up to ten hours to achieve 
the accuracy it is capable of.


 





Re: [ntp:questions] Help getting IRIG working

2011-04-02 Thread David L. Mills

Chris & Co.,

The usual problem is overdriving the computer input. Most IRIG devices 
produce a modulated signal in the range of 10 V P-P, which is far larger 
than the line-in level. You might need an attenuator to produce on the 
order of 1 V P-P. As Chris says, the best way is to monitor the line out 
signal using the computer speaker. With a little practice, it is 
possible to slowly increase the input level until the speaker changes 
tone or becomes raspy. The bottom line is to monitor the AGC signal with 
that trace and bracket the input signal so the AGC reads in the middle 
of the range, about 127.
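For the audio IRIG driver, a minimal configuration sketch for this kind of check
(unit number and flags illustrative; flag3 turns on the audio monitor that Chris
suggests below):

  # ntp.conf fragment for the audio IRIG driver (type 6)
  server 127.127.6.0
  fudge 127.127.6.0 flag3 1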


Dave

Chris Albertson wrote:


On Fri, Apr 1, 2011 at 10:57 AM, Jim Kusznir jkusz...@gmail.com wrote:
 


Hello all:

I'm trying to set up a linux ntp server using IRIG as a time source,
from a SEL 2407 (http://www.selinc.com/sel-2407/).  Unfortunately,
I've not managed to get this running yet.
   




Have you tried flag3 to enable audio monitoring?  This should allow
you to hear the IRIG signal on the computer's speakers.  Hearing the
signal would 100% verify that the signal is being input.
 




Re: [ntp:questions] Venting steam: Autokey in 4.2.6/4.2.7

2011-03-30 Thread David L. Mills

Steve,

Whatever does or does not work with IFF applies also to GQ and MV. These 
have not been changed. However, from a purely practical view, IFF is 
probably best for typical Internet configurations.


Dave

Steve Kostecke wrote:


On 2011-03-29, Dave Hart h...@ntp.org wrote:

 


On Tue, Mar 29, 2011 at 12:53 AM, David L. Mills mi...@udel.edu wrote:

   


I sent you a message requesting to test this before deployment.


 


I was referring to docs galore as I thrashed about earlier.  I don't doubt
each of your changes was an improvement, but each one also made Steve's
4.2.4 step-by-step guide less useful.  I was looking at:
   



I've moved the legacy Autokey Configuration to
http://support.ntp.org/bin/view/Support/ConfiguringAutokeyFourTwoFour

http://support.ntp.org/bin/view/Support/ConfiguringAutokey is being
updated for the current Autokey configuration scheme. It currently
only covers IFF and it does not address any of the ident/group name
features.

At the moment I have ntp-dev-4.2.7p142 Autokey+IFF running between
psp-fb1 (trust group server) and psp-os1. Here's the view from the
client:

ntpq rv 6
assID=29118 \
status=f63a reach, conf, auth, sel_sys.peer, 3 events, event_10,
srcadr=psp-fb1.ntp.org, srcport=123, dstadr=2001:4f8:fff7:1::26,
dstport=123, leap=00, stratum=2, precision=-20, rootdelay=0.626,
rootdisp=16.495, refid=209.81.9.7,
reftime=d13c56aa.cc4f74b3  Tue, Mar 29 2011 13:01:30.798,
rec=d13c588e.76244c5b  Tue, Mar 29 2011 13:09:34.461, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=176, flash=00 ok,
keyid=2472358740, offset=-1.346, delay=0.194, dispersion=5.554,
jitter=0.605, xleave=0.028,
filtdelay=0.28   0.25   0.34   0.29   0.25   0.26  0.19  0.22,
filtoffset=  -0.96  -0.85  -0.72  -0.69  -0.80  -0.97 -1.35 -0.39,
filtdisp= 0.00   1.02   2.04   3.03   4.05   5.06  6.06  7.05,
host=psp-fb1.ntp.org, flags=0x87f21, signature=md5WithRSAEncryption

The flags decode as:

#define CRYPTO_FLAG_ENAB  0x0001 /* crypto enable */
#define CRYPTO_FLAG_IFF   0x0020 /* IFF identity scheme */
#define CRYPTO_FLAG_VALID 0x0100 /* public key verified */
#define CRYPTO_FLAG_VRFY  0x0200 /* identity verified */
#define CRYPTO_FLAG_PROV  0x0400 /* signature verified */
#define CRYPTO_FLAG_AGREE 0x0800 /* cookie verifed */
#define CRYPTO_FLAG_AUTO  0x1000 /* autokey verified */
#define CRYPTO_FLAG_SIGN  0x2000 /* certificate signed */
#define CRYPTO_FLAG_LEAP  0x4000 /* leapseconds table verified */

I also have Autokey+IFF running between a 4.7.7p142 (amd64) client and a
4.2.6p2 (686) server on my home LAN.

I appreciate Dave Hart's patience with me on IRC while getting this up
and running.

 





Re: [ntp:questions] new driver development

2011-03-30 Thread David L. Mills

Bruce,

You have completely missed the point. The war to minimize the number of 
drivers has not always been successful, but does represent many hours of 
work on my part to update all the drivers when some minor detail of the 
common interface has changed over the last thirty years. Your reference 
to April Fool does not help when assessing your credibility.


Dave

Bruce Lilly wrote:


On Mon, 28 Mar 2011 15:01:13 +, David L. Mills wrote:

 


You may not be aware that all Spectracom devices are supported with one
driver, all TrueTime devices are supported with one driver, all
telephone modem services are supported with one driver, all Austron
devices are supported with one driver, all Heath devices are supported
with one driver  and most GPS receivers are supported with one driver.
   



I'm going to give the benefit of the doubt and presume that that's an
early April Fool's joke:

Spectracom devices involve not one, but multiple drivers:
  refclock_acts.c
  refclock_irig.c
  refclock_wwvb.c

GPS receivers involve *many* drivers, including at least the following:
  refclock_acts.c
  refclock_arbiter.c
  refclock_as2201.c
  refclock_bancomm.c
  refclock_fg.c
  refclock_gpsvme.c
  refclock_hopfpci.c
  refclock_hopfser.c
  refclock_hpgps.c
  refclock_jupiter.c
  refclock_mx4200.c
  refclock_nmea.c
  refclock_oncore.c
  refclock_palisade.c
  refclock_parse.c
  refclock_ripencc.c
  refclock_trak.c
  refclock_true.c
  refclock_wwvb.c
  refclock_zyfer.c
...and your reference to all one Austron devices [sic] is contradictory
as the supported Austron device *is* a GPS receiver.

Note also in the above list of GPS drivers that there are two separate
ones for Hopf devices as well as for Trimble GPS devices.

The refclock_heath.c driver in fact supports only one of the two (long
since discontinued) Heathkit receivers, as is well-documented in the
source code, e.g.:
* The GC-1001 II was apparently never tested and, based on a Coverity
* scan, apparently never worked [Bug 689].  Related code has been 
disabled.


As for Truetime, the driver opens a serial port and parses the received
text; it does not need to access different types of objects, different
object namespaces, or different APIs -- it's really only one sort of 
device

with relatively minor data stream inconsistencies.

With somewhat greater accuracy, you might have said that there is one
driver that supports all supported SVID IPC interfaces.

 


This happened with many hours of dedicated effort on the part of
refclock developers. You can appreciate the serious pushback in creating
a new driver if a similar one is already available.
   



One might then ask, as many of the above all merely grab data from a
serial port, why they were not all required to be rolled into a single
driver, such as refclock_parse.c.  But then, a quick glance
at that shows how convoluted things can get...

One might also wonder why there are separate drivers for WWVB receivers,
all using the same type of serial port communications, and all apparently
minor variations derived from the first:
 refclock_wwvb.c
 refclock_chronolg.c
 refclock_dumbclock.c
 refclock_ulink.c
... or the ones for various IRIG time code receivers.

In the case of what I have to date proposed, there are no similar
drivers (I looked. Several times).  There aren't any that address
the issues outlined in the article which started this thread.  There
aren't any that use any form of POSIX IPC.  There seems to be some
confusion, probably on the part of those unfamiliar with the differences
between SVID and POSIX shared memory: SVID shared memory and POSIX
shared memory have as much in common as a Yamaha motorcycle and a Yamaha
piano. Less in common than other types of IPC (e.g. semaphores), which
are also quite different and in most cases also incompatible.

 


An appropriate plan is

[common interface code]
#ifdef POSIX
...
#else
...
#endif
   



That implies compiled-in support for one (exclusive-)or another
type of device, i.e. no possibility to use both types.  Worse,
it means one build will try to access one type of device given
a particular refclock server configuration, and a build with a
different set of (build) configure options will access a
different type of device given the same (runtime) ntpd
configuration file refclock server pseudo-IP address specification.
In effect, it means that there are two distinct drivers (only one
of which can be incorporated at build time) using a single device
driver number.
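
For concreteness, a minimal sketch of what such a compiled-in switch means;
the function and macro names here are purely illustrative, not actual ntpd code:

/* Illustrative only: with a build-time switch, exactly one of the two
 * access paths exists in any given binary. */
static void
shm_timer(int unit)
{
#ifdef USE_POSIX_SHM
	pshm_poll(unit);	/* POSIX build: only this path is compiled in */
#else
	svid_shm_poll(unit);	/* SVID build: only this path is compiled in */
#endif
}

/* A run-time selection would instead require both paths to be present
 * and a per-unit selector (for example a fudge flag) consulted here. */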

Are you really suggesting some sort of build-time configure option
and conditional compilation macro that would replace (e.g.) the
Heathkit driver with something completely different?

If so, in some sense that might not be such a terrible idea; some
products (and in some cases the companies that produced them) have
long since vanished.  Likewise for some technologies (e.g.  LORAN).
However, it invites confusion when bug reports etc. refer to a device
number that might be different

Re: [ntp:questions] Venting steam: Autokey in 4.2.6/4.2.7

2011-03-29 Thread David L. Mills

Dave,

I didn't mean to cause Steve problems, but something did need to be 
changed, particularly the binding between the trusted host name and the 
group name. Besides fixing the vulnerability, it makes use of non-keygen 
certificates less of a bother. Also, this allows more than one secure 
group to share the same broadcast network. This is the third 
more-or-less trivial change in syntax in fifteen years (from Autokey 
Version 1).


The -l option was added in order to change the certificate expiration 
time for test and to allow users to make long-lived certificates.


Dave

Dave Hart wrote:

On Tue, Mar 29, 2011 at 12:53 AM, David L. Mills mi...@udel.edu 
mailto:mi...@udel.edu wrote:


I sent you a message requesting to test this before deployment.


I was referring to docs galore as I thrashed about earlier. I don't 
doubt each of your changes was an improvement, but each one also made 
Steve's 4.2.4 step-by-step guide less useful. I was looking at:


http://www.eecis.udel.edu/~mills/ntp/html/autokey.html 
http://www.eecis.udel.edu/%7Emills/ntp/html/autokey.html
http://www.eecis.udel.edu/~mills/ntp/html/keygen.html 
http://www.eecis.udel.edu/%7Emills/ntp/html/keygen.html

http://support.ntp.org/bin/view/Support/ConfiguringAutokey
http://bugs.ntp.org/1864 https://bugs.ntp.org/show_bug.cgi?id=1864

BTW keygen.html mentions a -l days option which ntp-keygen doesn't 
understand, do you want me to fix the options processing so it does? 
 Or get rid of that item from the docs?


I'm not the dimmest bulb on the block, but when I was interested in 
reproducing the crash reported in bug 1864 and 1840, I didn't manage 
to. And I spent several hours trying. The crash may be a bug I 
introduced in ntp_config generic FIFO code that replaced the 
degenerate use of priority queues as FIFOs in Sachin's original 
ntp.conf parser rewrite. I was focused on getting past the 
configuration issues to debug the configuration code, not on setting 
up a working Autokey.


That said, Steve has kindly dove in head first and is extracting me 
from my confusion one step at a time. I never forgot that you wanted 
me to test pool + autokey operation, I just feared and loathed the 
idea of setting up autokey again from scratch and have had other 
things to keep me busy. I'm optimistic Steve will be able to help me 
get a working setup to test pool + autokey and also to see if 
ntp_crypto.c:2984 really is unneeded.


Cheers,
Dave Hart



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Venting steam: Autokey in 4.2.6/4.2.7

2011-03-29 Thread David L. Mills

Miroslav,

Unfortunately, while things were in flux, snapshots continued to be 
produced, which was counterproductive. I have no direct say in that.


The best advice is:

1. Produce a working version of the configuration without Autokey.
2. Roll keys for all group members using ntp-keygen with no options 
other than the -T option for the trusted hosts. Add the crypto command 
with no options to all configuration files. Add the autokey option to 
the server command for all clients of the trusted hosts. Verify the TC 
scheme works.
3. Make the group keys with the -I option on a trusted host or trusted 
agent.
4. Make the client keys from the group keys and distribute as in the 
original directions. Use an arbitrary file name, preferably the name of 
the group.
5. Add the ident option to the client server command with name the same 
as the client keys installed.
6. For broadcast clients, use the same files, but use the ident option 
in the crypto command instead.


All this is in the autokey.html page along with a detailed description 
of the operations. Note also the relevant white pages at the NTP project 
page www.eecis.udel.edu/~ntp.html, especially the security analysis and 
the simulation and analysis of the on-wire protocol.


In contrast with the previous version, no options are required on the 
crypto command other than those cited above. Note that the -s option is not 
required on the ntp-keygen program. These options can be added for 
special circumstances.
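
As a concrete illustration of the steps above, the commands and configuration 
fragments might look roughly as follows; the host name trusted.example.com and 
group name examplegroup are placeholders, and the exact command for exporting 
the client parameter files is omitted (see autokey.html for the authoritative 
procedure):

# Step 2, trusted host: generate keys, mark the host trusted
ntp-keygen -T
# ntp.conf on the trusted host
crypto

# Step 2, client: generate its own keys, enable Autokey (TC scheme)
ntp-keygen
# ntp.conf on the client
crypto
server trusted.example.com autokey

# Step 3, trusted host or agent: generate the group keys
ntp-keygen -I

# Steps 4-5, client: after installing the client parameter file,
# name the group on the server command
server trusted.example.com autokey ident examplegroup

# Step 6, broadcast client: the ident option moves to the crypto command
crypto ident examplegroup
broadcastclient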


Dave

Miroslav Lichvar wrote:


On Mon, Mar 28, 2011 at 11:11:28PM +, Dave Hart wrote:
 


Autokey is very clever in dealing with some unique challenges other
PKI OpenSSL client code doesn't have to.  Anyone attempting to
configure it should be on payroll, if not time and a half.

(insert series of profanities here)
   



I had a similar feeling when I was expanding my NTP test suite to test
basic Autokey functionality and compatibility between 4.2.2, 4.2.4 and
4.2.6 versions. I eventually got most of it working, but I'm not sure
if it's working as intended or accidentally by misplacing a private
key, etc.

I wasn't able to get the MV scheme working though. I have read the
official ntp-keygen page and the wiki document.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] new driver development

2011-03-28 Thread David L. Mills

Bruce  Co.,

You may not be aware that all Spectracom devices are supported with one 
driver, all TrueTime devices are supported with one driver, all 
telephone modem services are supported with one driver, all Austron 
devices are supported with one driver, all Heath devices are supported 
with one driver  and most GPS receivers are supported with one driver. 
This happened with many hours of dedicated effort on the part of 
refclock developers. You can appreciate the serious pushback in creating 
a new driver if a similar one is already available. An appropriate plan is


[common interface code]
#ifdef POSIX
...
#else
...
#endif

Dave

Bruce Lilly wrote:


On Fri, 18 Mar 2011 04:51:40 +, Harlan Stenn wrote:


 


I don't see this one.  If flag1 0 (the current default) means SVID,
and we decide that flag1 1 means POSIX, what is the issue?  How is
that significantly different from changing 127.127.28.x to 127.127.y.x ?
   



Among others,
 1. The following is workable:
   server 127.127.28.1 ...

   server 127.127.y.1 ...

Your proposal in this respect, viz.:
   server 127.127.28.1 ...
   fudge 127.127.28.1 flag1 0 ...

   server 127.127.28.1 ...
   fudge 127.127.28.1 flag1 1 ...

simply won't work. IOW, one can have 4 units ea. using different
drivers, but one cannot have multiple devices sharing the same
driver and unit numbers but differing flags (or ttl, etc.)

 2. With separate drivers, each can perform appropriate initialization
via the clock_init function pointer in its struct refclock
structure.  One cannot alter the way that works based on flag or ttl
values as neither are accessible; the prototype is:
  void (*clock_init) (void);
I.e. no pointer to a peer structure.


 


They are separate issues.  We support timespec where it exists.  We want
to support timespec under SHM regardless.
   



If I thought that was feasible, I would have done it and submitted 
patches a year ago.


___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Venting steam: Autokey in 4.2.6/4.2.7

2011-03-28 Thread David L. Mills

Dave,

When all else fails, read the documentation. There were good reasons to 
change the configuration in minor ways.


1. There was a huge vulnerability if the identity file was specified by 
the server, but the correct file was not specified by the client. The 
scheme devolved to TC with no warning to the user.
2. Multiple secure groups (including anycast and pool) sharing the same 
broadcast network are supported. The primary intent is to provide an 
engineered selection of pool servers from the same DNS collection.
3. Configuration is much simpler and for the TC identity scheme requires 
no arguments on the ntp-keygen program or crypto  configuration command.

4. Configuration for prior versions is possible; see the documentation.

I sent you a message requesting to test this before deployment.

Dave

Dave Hart wrote:


http://support.ntp.org/bin/view/Support/ConfiguringAutokey

For ntpd 4.2.4 and earlier, Steve Kostecke patiently worked out
step-by-step instructions, and refined them over time helping people to
use them, as seen on the page referenced above.

For 4.2.6 ntp-keygen and autokey got an overhaul which makes those
instructions useless.  To investigate http://bugs.ntp.org/1840 and
http://bugs.ntp.org/1864 filed by Rich Schmidt about ntpd 4.2.7
crashing when attempting to use Autokey, and to test a change to
remove a presumed unneeded line of code (ntp_crypto.c:2984) identified
through static analysis, I once again have tried to get a basic
Autokey setup working.

So far I have spent hours and achieved nothing but failure and
humiliation.  This is with Rich holding my hand telling me what to do.
I'm so pissed off I want a baseball bat and an effigy.  Now, granted,
I'm not scratching an itch to secure my NTP, I'm scratching an itch to
reproduce a fault and fix it, so I'm not typical, but if I were trying
to secure my NTP, I'd use symmetric key.

Autokey is very clever in dealing with some unique challenges other
PKI OpenSSL client code doesn't have to.  Anyone attempting to
configure it should be on payroll, if not time and a half.

(insert series of profanities here)

Dave Hart
___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Secure NTP

2011-03-24 Thread David L. Mills

Yassica,

In principle, NTP Autokey can use certificates generated by OpenSSL or 
by other certificate authorities (CA); however, there are some very 
minor details with these certificates, including the sequence number and 
use of the X.500 extension fields. Ideally, the CA would run the Autokey 
protocol and serve as the TH (trusted host) itself, which would be consistent with the 
TC model. Absent that, the choice is to use the certificates generated 
by the ntp-keygen program.


Yessica wrote:


Hello!
I am installing an NTP server, but requires authentication for that
clients can be synchronized with the server, and also that
authentication should be with public and private keys. Let me know if
I can work with certificates issued by any authority or can only use
the certificates generated by the ntp-keygen.

Thank you very much!
I hope you can answer.

PS: I'm working with ntp v4

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] new driver development

2011-03-17 Thread David L. Mills

Bruce,

I take it your driver will replace or modify an existing driver, right? 
Adding a new driver to the current population of over 40 drivers is  not 
a practical course.


Dave

Bruce Lilly wrote:


I'm preparing a POSIX shared memory driver (PSHM) for ntp to address a
few issues that exist with the present SHM driver.  In no particular
order, these are:

o POSIX (not SVID) shared memory

  -- POSIX shared memory namespace rather than hexadecimal constant
  -- avoids 0x4e545030 [...] Big units will give non-ascii
  -- provides ample namespace size for ridiculously huge numbers of
 units w/o obfuscation

o nanoseconds, not microseconds

  -- resolution compatible with bulk of the ntp reference
 implementation
  -- using POSIX struct timespec
  -- client compatibility with POSIX clock_gettime()

o per-unit configurable source type (a.k.a. class)

  -- unlike present SHM driver, which treats all units as identical
 and unconfigurable
  -- currently UHF; wrong for everything else
  -- was TELEPHONE; wrong for everything else

o per-unit PPS flag

  -- permits shared memory PPS drivers

o POSIX-conforming code

  -- no attempt to work around buggy (non-POSIX-conforming) systems!

o separate header file to simplify client code

  -- shared memory structure clearly defined and well-documented

o source code/header version strings for use by what(1) or ident(1)

  -- for facilitation of bug reporting, version verification, daemon-
 client compatibility checks

o client/driver run-time implementation/compatibility tests

  -- integer and pointer sizes
  -- endianness

o no dead code

  -- e.g. SHM driver nsamples-related cruft

o provision in shared memory to specify (variable part of) clockstats
  output

  -- client can control clockstats format and content (and frequency
 of logging)
  -- can be different for each unit

o POSIX mutex for synchronized access to shared memory for updates

  -- obviates mode 0 / mode 1 / OLDWAY


Comments and suggestions are welcome.
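
To make the proposal more concrete, here is a rough sketch of the sort of 
shared segment described above; the segment name, structure layout, and field 
names are invented for illustration and are not the actual proposed interface:

#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

struct pshm_time {
	pthread_mutex_t	lock;		/* process-shared mutex for updates */
	struct timespec	clock_ts;	/* reference clock time, ns resolution */
	struct timespec	recv_ts;	/* local receive time, ns resolution */
	int		valid;		/* set by the writer, cleared by the reader */
};

int
main(void)
{
	/* one POSIX name per unit instead of an SVID hexadecimal key */
	int fd = shm_open("/pshm-unit0", O_CREAT | O_RDWR, 0600);
	if (fd < 0 || ftruncate(fd, sizeof(struct pshm_time)) < 0)
		return 1;
	struct pshm_time *seg = mmap(NULL, sizeof(*seg),
	    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (seg == MAP_FAILED)
		return 1;

	/* writer side: initialize the process-shared mutex once */
	pthread_mutexattr_t ma;
	pthread_mutexattr_init(&ma);
	pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
	pthread_mutex_init(&seg->lock, &ma);

	/* publish one sample under the mutex */
	pthread_mutex_lock(&seg->lock);
	clock_gettime(CLOCK_REALTIME, &seg->clock_ts);
	seg->recv_ts = seg->clock_ts;
	seg->valid = 1;
	pthread_mutex_unlock(&seg->lock);
	return 0;
}

On some systems this needs -pthread and -lrt at link time.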

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Is a Spectracom Netclock/2 worth saving?

2011-03-11 Thread David L. Mills

Rick,

All the Spectracom WWVB and GPS receivers use the same serial protocol; 
I have had one or more of these things running for almost 30 years. The 
Netclock/2 is useless without its ferrite-stick loop antenna, and even 
then the rising noise pollution due to the noisy electrical grid and 
machine room UPS systems has rendered any WWVB receiver essentially 
useless. An ironic fact is that my WWV receivers now outperform the WWVB 
receivers and both are much inferior to a GPS receiver. Considering the 
sometimes difficult problem of finding rooftop real estate for a GPS 
antenna, a CDMA receiver, such as the EndRun CNTP, might now be the 
easiest to deploy.


Dave

Rick Jones wrote:


So, in a local on its way to its final reward pile I have come
across a Spectracom Netclock/2 and am wondering whether it might be
worth saving from the scrap heap.  It does power-up and shows a time.
Unsurprisingly I suppose since it has no antenna connected and I'm in
an office building, the Antenna, Signal and Time Sync LEDs light red
:)

rick jones
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] AutoKey again

2011-02-04 Thread David L. Mills

Jacek,

An index to the cryptic error comment is in ./include/ntp_crypto.h. It 
says bad or missing group key. This message is from the client; you 
should see the similar message at the server. Check to be sure you are 
using the correct client parameters file.


Recent changes to the configuration process make it much simpler to 
deploy a secure subnet. This doesn't change the protocol, just the 
commands to set it up. See the development documentation on the web and 
the Autokey Public Key Cryptography page.


Dave

Jacek Igalson wrote:


Hello,

Some time ago I reported a bug in the implementation of
AutoKey+IFF, in ntp ver 4.2.4p8.
The error is intermittent and has been observed in the long
run of ntpd, that is within 2 - 10 days.

When the error happens, ntpd keeps on running but authenticated
server is rejected:

ntpq -p
remote   refid  st t when poll reach   delay   offset 
jitter


neptune .CRYP.  16 u   6d   1600.0000.000 
0.000
*ntp2.tp.pl  .ATOM.   1 u   15   64  3772.5220.008 
0.088


ntpq -c associations
ind assID status  conf reach auth condition  last_event cnt
===
 1 60684  e0fe   yes   yes   ok reject 15
 2 60685  9614   yes   yes  none  sys.peer   reachable  1

Client synchronizes successfully to the another server which is
in the configuration file.
Server with the authentication is not used any more, reject
status seems to be permanent (unless ntpd is restarted).

The only hint is in cryptostats logfile:
...ntpkey_IFFkey_xxx.tpnet.pl.3479706582 mod 384
...error 10e opcode 8207 ts 3505303563 fs 3479706582

What is the meaning of error 10e opcode?
Has someone encountered such a problem in the longer run?

I appreciate your help.
Jacek
___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Number of servers needed to detect one falseticker?

2011-01-05 Thread David L. Mills

Terje,

That's why Autokey uses digital signatures and zero-knowledge identity 
proofs.


Dave

Terje Mathisen wrote:


David L. Mills wrote:


Miroslav,

Nowhere in the documentation produced by me is the statement that the
minimum number of servers to reliably find the truechimers is four.
There might have been some confusion in the past, in particular with
reference to Lamport's paper, which describes an algorithm much more
complicated and unsuitable for practical use. In that paper, four
Byzantine generals are necessary to detect a traitor, but only three if
digital signatures are available. The NTP algorithm, derived in part
from Keith Marzullo's dissertation, is not that algorithm.



I.e. Byzantine generals not only lie, they also lie about _who_ they 
are, spoofing messages from other generals. In NTP this would mean a 
falseticker which also sends out packets pretending to be responses 
from other servers, something which is effectively impossible unless 
they are based on the same (broadcast) network and can sniff incoming 
requests and/or poison the ARP tables to commandeer the other server's 
IP address.


Your digital signatures make such lies impossible.


The NTP algorithm is described on the page you cite. A constructive
proof, elaborated in my book, is simple and based on the intersection
properties of correctness intervals, which are loosely defined as the
interval equal to the roundtrip delay with the center point as the
maximum likelihood estimate of the server offset. If there are two
servers and their correctness intervals overlap, both are truechimers.
If the intervals do not overlap, no decision is possible. If there are
three servers and the intersection of two intervals is nonempty, both
are truechimers and the third is a falseticker. If no two intervals
intersect, no decision is possible.

So, it is incomplete to specify a minimum number of servers. The only
valid statement is on the page The intersection interval is the
smallest interval containing points from the largest number of
correctness intervals. If the intersection interval contains more than
half the total number of servers, those servers are truechimers and the
others are falsetickers.



I think Miroslav showed an ascii art example for when three servers 
might not be enough:


Two servers which don't overlap, and a third which overlaps (partly) 
both of them:


      server A and B
  ---   server C

In this particular situation C must be a survivor, but since it 
overlaps both A and B with an identical amount, there is no way to 
determine if (A^C) or (B^C) is the best interval to pick.


I guess the key here is that this situation is impossible unless at 
least one of the servers are lying (falseticker).


You could even extend this to four servers, where server D is 
identical to server C, and it would be equally hard to determine if A 
or B was the falseticker, right?


Fortunately, NTP timestamps have enough resolution to make the 
likelihood of multiple perfectly positioned confidence intervals 
extremely unlikely, and if it does happen in a particular poll cycle, 
then NTPD will happily coast on until the next poll. :-)


Terje



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Number of servers needed to detect one falseticker?

2011-01-05 Thread David L. Mills

Miroslav,

The select algorithm was changed in a very minor way to conform 
precisely to the formal assertion quoted in my previous message. It 
probably has very little practical significance. After all, the old 
algorithm has been going strong for nineteen years.


Dave

Miroslav Lichvar wrote:


On Wed, Jan 05, 2011 at 09:23:59AM +0100, Terje Mathisen wrote:
 


Two servers which don't overlap, and a third which overlaps (partly)
both of them:

     server A and B
 ---   server C

In this particular situation C must be a survivor, but since it
overlaps both A and B with an identical amount, there is no way to
determine if (A^C) or (B^C) is the best interval to pick.
   



The select algorithm doesn't care how much they overlap. Recent
ntp-dev versions work as described on the select.html web page, so the
intersection interval will be equal to C and all three sources will
pass. Older versions worked also with centers of the intervals and as
the centers of A and B are lying outside the intersection interval, C
would be the only truechimer. 


I'd be curious to hear why that approach was dropped.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Number of servers needed to detect one falseticker?

2011-01-05 Thread David L. Mills

Terje,

Read the formal assertion carefully and examine the algorithm on the 
Select Algorithm page. The algorithm would return interval C as the 
smallest intersection with the largest number of contributors.


Dave

Terje Mathisen wrote:


David L. Mills wrote:


Miroslav,

Nowhere in the documentation produced by me is the statement that the
minimum number of servers to reliably find the truechimers is four.
There might have been some confusion in the past, in particular with
reference to Lamport's paper, which describes an algorithm much more
complicated and unsuitable for practical use. In that paper, four
Byzantine generals are necessary to detect a traitor, but only three if
digital signatures are available. The NTP algorithm, derived in part
from Keith Marzullo's dissertation, is not that algorithm.



I.e. Byzantine generals not only lie, they also lie about _who_ they 
are, spoofing messages from other generals. In NTP this would mean a 
falseticker which also sends out packets pretending to be responses 
from other servers, something which is effectively impossible unless 
they are based on the same (broadcast) network and can sniff incoming 
requests and/or poison the ARP tables to commandeer the other server's 
IP address.


Your digital signatures make such lies impossible.


The NTP algorithm is described on the page you cite. A constructive
proof, elaborated in my book, is simple and based on the intersection
properties of correctness intervals, which are loosely defined as the
interval equal to the roundtrip delay with the center point as the
maximum likelihood estimate of the server offset. If there are two
servers and their correctness intervals overlap, both are truechimers.
If the intervals do not overlap, no decision is possible. If there are
three servers and the intersection of two intervals is nonempty, both
are truechimers and the third is a falseticker. If no two intervals
intersect, no decision is possible.

So, it is incomplete to specify a minimum number of servers. The only
valid statement is on the page The intersection interval is the
smallest interval containing points from the largest number of
correctness intervals. If the intersection interval contains more than
half the total number of servers, those servers are truechimers and the
others are falsetickers.



I think Miroslav showed an ascii art example for when three servers 
might not be enough:


Two servers which don't overlap, and a third which overlaps (partly) 
both of them:


      server A and B
  ---   server C

In this particular situation C must be a survivor, but since it 
overlaps both A and B with an identical amount, there is no way to 
determine if (A^C) or (B^C) is the best interval to pick.


I guess the key here is that this situation is impossible unless at 
least one of the servers are lying (falseticker).


You could even extend this to four servers, where server D is 
identical to server C, and it would be equally hard to determine if A 
or B was the falseticker, right?


Fortunately, NTP timestamps have enough resolution to make the 
likelihood of multiple perfectly positioned confidence intervals 
extremely unlikely, and if it does happen in a particular poll cycle, 
then NTPD will happily coast on until the next poll. :-)


Terje



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Number of servers needed to detect one falseticker?

2011-01-05 Thread David L. Mills

Miroslav,

According to your diagram, the algorithm would determine the 
intersection interval as interval a. The midpoints of all three 
intervals would be considered truechimers, since each of the intervals 
a, b and c, contain points in the intersection interval.


Dave

Miroslav Lichvar wrote:


On Wed, Jan 05, 2011 at 10:31:15AM -0500, Brian Utterback wrote:
 


Let's equalize a bit to make it a bit more fair:

c   b-
 a--

So, now, if you were NTP, which would you choose? You are correct in
your assessment that NTP would accept them all as truechimers. You are
correct also that adding a fourth still does not guarantee that you
will throw out the falseticker, but NTP uses intervals at this stage,
not actual servers, so adding another truechimer will guarantee that
the interval used will contain the real time.
   



Not necessarily.

 |
  -   A
 | -  B
  C 
  --- D

 |
   == X

Here, B is the only server off, but the result X doesn't contain the
actual time.

 


I think clockhopping can happen with any number of servers, there just
needs to be two or more similar sources on top of the list sorted by
synchronization distance.

 


With more servers on the list, the clustering and combining algorithms
will merge them into a single offset and they will not hop. With two
servers, these algorithms cannot function.
   



Combining doesn't affect clockhopping, it happens after the system
peer is selected.

 


By the way, over time Dr. Mills has added features to try to suppress
clock hopping as much as possible without compromising the correctness
proofs. With the latest versions, clock hopping may not be so much of
a problem. But it is still an issue. Even if you prefer one clock, it
might be inaccessible for a while and you will hop anyway.
   



Yes, the maximum anti-clockhopping threshold is a fixed value (1 ms by
default), so it can't work well in all situations. But it can be tuned
with the tos mindist command.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Number of servers needed to detect one falseticker?

2011-01-04 Thread David L. Mills

Miroslav,

Nowhere in the documentation produced by me is the statement that the 
minimum number of servers to reliably find the truechimers is four. 
There might have been some confusion in the past, in particular with 
reference to Lamport's paper, which describes an algorithm much more 
complicated and unsuitable for practical use. In that paper, four 
Byzantine generals are necessary to detect a traitor, but only three if 
digital signatures are available. The NTP algorithm, derived in part 
from Keith Marzullo's dissertation, is not that algorithm.


The NTP algorithm is described on the page you cite. A constructive 
proof, elaborated in my book, is simple and based on the intersection 
properties of correctness intervals, which are loosely defined as the 
interval equal to the roundtrip delay with the center point as the 
maximum likelihood estimate of the server offset. If there are two 
servers and their correctness intervals overlap, both are truechimers. 
If the intervals do not overlap, no decision is possible. If there are 
three servers and the intersection of two intervals is nonempty, both 
are truechimers and the third is a falseticker.  If no two intervals 
intersect, no decision is possible.


So, it is incomplete to specify a minimum number of servers. The only 
valid statement is on the page The intersection interval is the 
smallest interval containing points from the largest number of 
correctness intervals. If the intersection interval contains more than 
half the total number of servers, those servers are truechimers and the 
others are falsetickers.
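
To make the interval picture concrete, here is a toy sketch of the sweep that 
finds the smallest interval covered by the largest number of correctness 
intervals; this is illustrative only, not the ntpd code, and the sample 
offsets are invented:

/* Illustrative Marzullo-style sweep over correctness intervals. */
#include <stdio.h>
#include <stdlib.h>

struct edge { double x; int type; };	/* type -1 = interval start, +1 = end */

static int
cmp(const void *a, const void *b)
{
	const struct edge *p = a, *q = b;
	if (p->x != q->x)
		return p->x < q->x ? -1 : 1;
	return p->type - q->type;	/* starts sort before ends at the same point */
}

int
main(void)
{
	/* correctness intervals: offset +/- root distance for each server */
	double lo[] = { -0.02, -0.01, 0.05 };
	double hi[] = {  0.03,  0.04, 0.09 };
	int n = 3, best = 0, count = 0;
	double best_lo = 0, best_hi = 0;
	struct edge e[6];

	for (int i = 0; i < n; i++) {
		e[2 * i]     = (struct edge){ lo[i], -1 };
		e[2 * i + 1] = (struct edge){ hi[i], +1 };
	}
	qsort(e, 2 * n, sizeof(e[0]), cmp);
	for (int i = 0; i + 1 < 2 * n; i++) {
		count -= e[i].type;	/* a start raises the count, an end lowers it */
		if (count > best) {
			best = count;
			best_lo = e[i].x;
			best_hi = e[i + 1].x;
		}
	}
	/* servers whose intervals touch [best_lo, best_hi] are the candidate
	 * truechimers, provided best exceeds half the total number of servers */
	printf("intersection [%g, %g] covered by %d of %d\n",
	    best_lo, best_hi, best, n);
	return 0;
}

With these sample intervals, A and B overlap, so the reported intersection is 
covered by two of the three and the third is the falseticker, matching the 
description above.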


Dave

Miroslav Lichvar wrote:


Hi,

I'm wondering about the section 5.3.3 on the ntp support web

http://support.ntp.org/bin/view/Support/SelectingOffsiteNTPServers#Section_5.3.3.

It says and explains that minimum number of servers to detect one
falseticker is four, is that really correct? I understand that four is
better for reliability, but from the algorithm description
(http://www.eecis.udel.edu/~mills/ntp/html/select.html) and my tests
with a simulated falseticker it seems that three is enough.

Also, while running with two servers might be the worst configuration
for ntpd, it still could be preferred over the configuration with only
one server by users who would rather have two sources marked as
falsetickers and know a problem needs to be fixed than unknowingly
follow a bad truechimer.

Is it possible to reword that section?

Thanks,

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Number of servers needed to detect one falseticker?

2011-01-04 Thread David L. Mills

David,

As you might see from the online documentation, much of the tutorial 
material has been largely rewritten. A while back, some kind soul pointed 
out a logical discrepancy in the select algorithm. That was repaired, 
the code updated and the documentation refreshed. The pages linked from 
How NTP Works are offered as a definitive tutorial that might clear the 
air on these issues.


Dave

David Woolley wrote:


Miroslav Lichvar wrote:


On Tue, Jan 04, 2011 at 02:35:13PM -0500, Richard B. Gilbert wrote:


The problem with using only two servers is that NTPD has no means of
determining which is more nearly correct when the two differ, as
they inevitably will!



ntpd will pick the one with smaller distance if their intervals
overlap. Otherwise they both will be falsetickers.

In this case, ntpd will use an average of both of them, when the 
confidence intervals overlap; it will not pick just one except for the 
purposes of providing downstream error statistics.


___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] synchronization distance

2010-12-04 Thread David L. Mills

David,

I'm not making myself absolutely crystal clear and you are obscuring the 
point.


Windows has an awesome protocol that sets the time. It happens to use 
the NTP packet header format, but is not otherwise compliant with the 
NTPv4 specification, especially the 36-h poll interval limitation, which 
is an engineering parameter based on the expected wander of a commodity 
crystal oscillator. All that doesn't matter at all, other than Windows 
servers are compatible with Windows clients. What does matter is that 
Windows servers are NOT compatible with NTPv4 clients, which SHOULD NOT 
BE USED. Use one of the SNTP variants instead.


As a diehard workaround, use the tos maxdist command to set the distance 
threshold to something really high, like ten seconds. There is nothing 
whatsoever to be gained by this, as the expected error with update 
intervals of a week will be as bad or worse than with SNTP.
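
In configuration terms the workaround amounts to something like the following 
(the server name is a placeholder):

# ntp.conf sketch: raise the distance threshold so a source with a
# week-long poll interval is not rejected (not recommended)
tos maxdist 10
server w32time-host.example.com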


Dave

David Woolley wrote:


David L. Mills wrote:


BlackList,

I say again with considerable emphasis: this is a Microsoft product, 
not the NTPv4 distribution that leaves here. What you see is what you 
get, 



But it is often NTPv4 reference version that is used as the client and 
fails to synchronize because the root dispersion is too high.


Corporate politics are such that it is difficult to get a Unix system, 
or even Windows running the reference version, near the root of the 
time distribution  tree.  People deeper in the tree then see the 
effects, even if they are using the reference implementation.


warts and all. I doubt it has anything to do with root distance, or 
any other public specification, but that doesn't make any difference 
if the customer is satisfied with the performance. Just don't compare 
it with anything in the NTP distribution, documentation or 
specification.


Dave

E-Mail Sent to this address will be added to the BlackLists wrote:


David L. Mills wrote:
 


I had no idea somebody would try to configure current
NTPv4 with a poll interval of a week.
The current maximum allowed is 36 h.
  



http://technet.microsoft.com/en-us/library/cc773263%28WS.10%29.aspx
BlockQuote
SpecialPollInterval
This entry specifies the special poll interval in seconds
 for manual peers. ...
The default value on stand-alone clients and servers is 604,800.
/BlockQuote

{7 days}


 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] synchronization distance

2010-12-04 Thread David L. Mills

David,

I'm not learning anything at all in our exchange, and that is a real 
disappointment. Apparently, there are complaints NTPv4 does not play 
nicely with Microsoft. Microsoft is not about to change. NTPv4 is not 
about to change; however, there is a minor configuration option that 
makes NTPv4 work almost as well or maybe worse than SNTP. Whether this 
is good or bad corporate practice is not based on sound engineering 
principles, but on corporate convenience. I see absolutely no need to 
care about that or especially to prolong this discussion.


Dave

David Woolley wrote:


David L. Mills wrote:


David,

I'm not making myself absolutely crystal clear and you are obscuring 
the point.


Windows has an awesome protocol that sets the time. It happens to use 
the NTP packet header format, but is not otherwise compliant with the 
NTPv4 specification, especially the 36-h poll interval limitation, 
which is an engineering parameter based on the expected wander of a 
commodity crystal oscillator. All that doesn't matter at all, other 
than Windows servers are compatible with Windows clients. What does 
matter is that Windows servers are NOT compatible with NTPv4 clients, 
which SHOULD NOT BE USED. Use one of the SNTP variants instead.



To a large extent I would agree with you, but the net effect of this 
is to say if you work for a marketing led company (probably true of 
most of the Fortune 500), do not use NTP as it is almost certain that 
your IT department has a strict Microsoft policy for their core 
systems, and are not time synchronisation experts.




As a diehard workaround, use the tos maxdist command to set the 
distance threshold to something really high, like ten seconds. There 
is nothing whatsoever to be gained by this, as the expected error 
with update intervals of a week will be as bad or worse than with SNTP..


Dave

David Woolley wrote:


David L. Mills wrote:


BlackList,

I say again with considerable emphasis: this is a Microsoft 
product, not the NTPv4 distribution that leaves here. What you see 
is what you get, 




But it is often NTPv4 reference version that is used as the client 
and fails to synchronize because the root dispersion is too high.


Corporate politics are such that it is difficult to get a Unix 
system, or even Windows running the reference version, near the root 
of the time distribution  tree.  People deeper in the tree then see 
the effects, even if they are using the reference implementation.


warts and all. I doubt it has anything to do with root distance, or 
any other public specification, but that doesn't make any 
difference if the customer is satisfied with the performance. Just 
don't compare it with anything in the NTP distribution, 
documentation or specification.


Dave

E-Mail Sent to this address will be added to the BlackLists wrote:


David L. Mills wrote:
 


I had no idea somebody would try to configure current
NTPv4 with a poll interval of a week.
The current maximum allowed is 36 h.
  




http://technet.microsoft.com/en-us/library/cc773263%28WS.10%29.aspx
BlockQuote
SpecialPollInterval
This entry specifies the special poll interval in seconds
 for manual peers. ...
The default value on stand-alone clients and servers is 604,800.
/BlockQuote

{7 days}


 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions




___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] synchronization distance

2010-12-03 Thread David L. Mills

David,

I'm confused about your explanation, especially the default 
configuration. The definitive explanation of synchronization distance, 
in particular your reference more precisely to root distance, is on the 
page How NTP Works in the online documentation at ntp.org.


Dave

David Woolley wrote:


Atul Gupta wrote:


Can anyone explain me what is synchronization distance in case of ntp?
and what is distance exceeded problem?



It's an estimate of the maximum difference between the time on the 
stratum zero source and the time measured by the client, consisting of 
components for round trip time, system precision and clock drift since 
the last actual read of the stratum source.
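
Roughly, that estimate can be thought of as half the total round-trip delay 
back to the stratum-0 source plus the accumulated dispersion terms; a sketch, 
with variable names of my own choosing rather than ntpd's:

/* Illustrative only: an approximate root-distance calculation in the
 * spirit of the description above. */
double
root_distance(double rootdelay, double delay,	/* round-trip delays, s */
    double rootdisp, double disp,		/* dispersions, s */
    double jitter, double age, double phi)	/* jitter (s), age (s), drift (s/s) */
{
	return (rootdelay + delay) / 2.0
	    + rootdisp + disp + jitter + phi * age;
}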


It most commonly happens because people try to use default 
configurations of w32time as their source.  The default configuration 
has exceptionally long poll times, and doesn't adjust its stratum to 
indicate that it hasn't had an update in days.  (The default 
configuration is also far from full NTP compliance.)


___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] synchronization distance

2010-12-03 Thread David L. Mills

BlackList,

I say again with considerable emphasis: this is a Microsoft product, not 
the NTPv4 distribution that leaves here. What you see is what you get, 
warts and all. I doubt it has anything to do with root distance, or any 
other public specification, but that doesn't make any difference if the 
customer is satisfied with the performance. Just don't compare it with 
anything in the NTP distribution, documentation or specification.


Dave

E-Mail Sent to this address will be added to the BlackLists wrote:


David L. Mills wrote:
 


I had no idea somebody would try to configure current
NTPv4 with a poll interval of a week.
The current maximum allowed is 36 h.
   



http://technet.microsoft.com/en-us/library/cc773263%28WS.10%29.aspx
BlockQuote
SpecialPollInterval
This entry specifies the special poll interval in seconds
 for manual peers. ...
The default value on stand-alone clients and servers is 604,800.
/BlockQuote

{7 days}


 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] help needed for ntpd ipv6 setup

2010-12-01 Thread David L. Mills

horhe,

I can't speak to the versions used by other repackagers, but the current 
ntp-dev version interprets a nonzero broadcast delay option as defeating 
the calibration volley for all broadcast and multicast clients. This is 
why it replaced the novolley option of the broadcast client command. If 
this turns out not to be the case, a bug report is suggested.


Dave

horhe wrote:


W dniu 2010-11-29 21:37, Marc-Andre Alpers pisze:
 


Hello!

Have nobody a solution or idea wat is wrong with my server?
   



Hello,
I think this is the same bug : http://bugs.gentoo.org/326209 . I have
got very similar problem.
Regards

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Newbie question on the MD5 key of a public/remote NTP server.

2010-11-12 Thread David L. Mills

Harry,

As I said, NTP Autokey is designed to operate outside the NAT perimeter. 
In principle, although I don't recommend it, it is possible to use 
symmetric key cryptography transparently with a NAT box. The policies on 
assignment and distribution of keys depend on the agency. NIST has an 
experimental MD5 server with the expectation that you pay a service fee for the 
key. I am told NRC (Canada) either plans or has in operation a similar 
service.
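
For reference, a minimal symmetric-key setup looks roughly like this; the key 
number, password, and server name are placeholders, and the key itself must be 
delivered out of band:

# /etc/ntp.keys (mode 0600, identical on server and client)
1 MD5 exampleSecret

# ntp.conf on the client
keys /etc/ntp.keys
trustedkey 1
server ntp.example.com key 1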


Dave

Harry wrote:


Hello,

I'm quite new to the NTP world. I haven't had a chance to study and
understand the NTP trust model fully.
But I /have/ understood so far is...
 1. that MD5 symmetric keys can be used to authenticate a public/
remote NTP Server
 2. that this public/remote, MD5 talking NTP server can reach out to
NTP clients behind a NAT/Firewall (which Autokey protocol cannot)
 3. that the MD5 symmetric keys must be distributed securely somehow
to the NTP client.

What I haven't been able to figure out is...
 1. How/Where to locate a public/remote NTP server that supports MD5
authentication?
 2. How would the administrator of this NTP server (a human)
distribute the keys to me: Via email? Via Phone/Fax?
 3. Having received the keys even by secure means such as email/phone/
fax, what is stopping me from going rogue later... say, by using the
key values of the authentic server and distributing wrong time? (I
won't of course actually go rogue, just trying to understand.)

Can somebody please explain this in plain English?

Regards,
/HS

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Will AutoKey setup work on a NAT host behind a firewall?

2010-11-10 Thread David L. Mills

Harry,

Symmetric key cryptography works fine behind a NAT box. See the 
Authentication Support page in the official NTP documentation on 
ntp.org. As I said, the intended Autokey model is for the server and 
client to live on the Internet side of the NAT box and have it serve 
time to the internal network via a separate interface.
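
In configuration terms the intended model is roughly the following on the 
dual-homed host; names and addresses are placeholders:

# ntp.conf on the gateway host (Internet-facing interface runs Autokey)
crypto
server upstream.example.org autokey

# hosts on the inside network simply point at the gateway's internal
# address, e.g.:
#   server 192.168.1.1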


Dave

Harry wrote:


On Nov 10, 2:59 am, David L. Mills mi...@udel.edu wrote:
 


Harry,

Autokey is not designed to work behind NAT boxes. The Autokey server and
client must have the same (reversed) IP addresses. The intended model is
using two interfaces, one for the Internet side running Autokey, the
other for the inside net on the other side of the NAT box.

Dave

Harry wrote:
   


Hello,
 


I want to employ the AutoKey method of securing NTP.
 


Basically, I want one host that would act as an NTP client of an
external NTP server, talking AutoKey. This NTP client is to become the
NTP server for other hosts on the intranet. All these hosts are behind
a corporate firewall and are very likely using NAT / IP masquerading
as well. (I can tell NAT / IP masquerading is in use in our
environment because all hosts report the same IP address at
http://www.whatismyipaddress.com.)
 


I ask this question because I ran into a circa 2004 link (http://
www.ecsirt.net/tools/crypto-ntp.html) that says,
  Be Aware!
  Before we start building ntpd, one important notice:
  NTP with Autokey does not work from a host that is behind a
masquerading or NAT host!
 


Is this a conceptual / fundamental limitation, or something related to
NTP version? If latter, I'm hoping that it would probably have been
fixed by now.
 


If  AutoKey and NAT don't go together conceptually, what would be my
next best option of securing NTP? Though MD5 method is there but it is
symmetric cryptography and prone to man-in-the-middle attacks... which
is why btw I was hoping to be able to employ AutoKey.
 


Many thanks,
/HS
 


___
questions mailing list
questi...@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 

   



Dave, I really appreciate your response to my newbie question.

May I ask (you or other users of this forum)...

1. What, then, would be the next best way (MD5-based symmetric key
mode?) to syncing up a behind-NAT NTP client from an external NTP
server in a tamper-proof manner? I'm not competent/powerful enough to
advise the powers that be in my organization to have an Autokey NTP
client outside our NAT/Firewall; most likely, I'll be told to continue
to operate from behind the NAT/Firewall.

2. What physical/network setup should Autokey-desiring NTP clients
follow? Is it OK, e.g., to have a Autokey client host (AkH) outside
one's NAT network and have all the hosts inside the NAT network use
AkH as a NTP server?


I also skimmed thru your (excellent) book on NTP. I was hoping to find
a mention of NAT in Chapter 9, but didn't. Not complaining, just humbly/
respectfully bringing it up. So, please do elaborate here if you can
on this issue.

Many thanks in advance,
/HS

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Will AutoKey setup work on a NAT host behind a firewall?

2010-11-09 Thread David L. Mills

Harry,

Autokey is not designed to work behind NAT boxes. The Autokey server and 
client must have the same (reversed) IP addresses. The intended model is 
using two interfaces, one for the Internet side running Autokey, the 
other for the inside net on the other side of the NAT box.


Dave

Harry wrote:


Hello,

I want to employ the AutoKey method of securing NTP.

Basically, I want one host that would act as an NTP client of an
external NTP server, talking AutoKey. This NTP client is to become the
NTP server for other hosts on the intranet. All these hosts are behind
a corporate firewall and are very likely using NAT / IP masquerading
as well. (I can tell NAT / IP masquerading is in use in our
environment because all hosts report the same IP address at
http://www.whatismyipaddress.com.)

I ask this question because I ran into a circa 2004 link (http://
www.ecsirt.net/tools/crypto-ntp.html) that says,
   Be Aware!
   Before we start building ntpd, one important notice:
   NTP with Autokey does not work from a host that is behind a
masquerading or NAT host!

Is this a conceptual / fundamental limitation, or something related to
NTP version? If latter, I'm hoping that it would probably have been
fixed by now.

If  AutoKey and NAT don't go together conceptually, what would be my
next best option of securing NTP? Though MD5 method is there but it is
symmetric cryptography and prone to man-in-the-middle attacks... which
is why btw I was hoping to be able to employ AutoKey.

Many thanks,
/HS

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-11-09 Thread David L. Mills
 55680.806 0.000152206 0.000 0.000635788 0.00 6
55507 55748.805 0.17572 0.000 0.001913772 0.00 6
55507 55749.805 0.17023 0.000 0.002925621 0.00 6
55507 55816.805 0.02029 0.000 0.004627915 0.00 6
55507 55820.805 0.01787 0.000 0.006756535 0.00 6
55507 55887.805 0.00213 0.000 0.009488173 0.00 6
55507 55953.805 0.025053023 75.234 0.012286357 0.00 6
55507 56014.806 0.020799313 75.234 0.011590816 0.00 6
55507 56080.806 0.003262439 75.234 0.012489851 0.00 6
55507 56085.805 -0.002976502 75.234 0.011889591 0.00 6
55507 56284.806 -0.005790262 75.166 0.011166097 0.024282 6
55507 56486.805 -0.004832251 75.107 0.010450418 0.030644 6
55507 56619.806 -0.001653216 75.094 0.009839874 0.029037 6
55507 57107.805 0.001371581 75.134 0.009266278 0.030606 6
55507 57294.806 0.002566599 75.163 0.008678100 0.030363 6
55507 57565.806 0.003550332 75.220 0.008125067 0.034897 6
55507 57771.806 0.004306454 75.273 0.007605004 0.037617 6
55507 58259.806 0.004533934 75.405 0.007114285 0.058414 6
55507 58793.805 0.005079226 75.567 0.006657596 0.079074 6
55507 58851.806 0.005470119 75.585 0.006229144 0.074268 6
55507 59187.806 0.006022272 75.706 0.005830100 0.081514 6
55507 59256.805 0.006003279 75.731 0.005453563 0.076748 6
55507 59391.805 0.005970118 75.779 0.005101355 0.073773 6
55507 59919.805 0.006141090 75.972 0.004772263 0.097114 6
55507 60309.805 0.006432901 76.122 0.004465236 0.105107 6
55507 60441.805 0.006557244 76.173 0.004177077 0.06 6
55507 60635.805 0.006560367 76.249 0.003907298 0.097307 6
55507 60714.806 0.006276305 76.279 0.003656322 0.091620 6
55507 61240.806 0.004699083 76.426 0.003465336 0.100290 6
55507 61424.806 0.004688040 76.477 0.003241528 0.095558 6
55507 61566.805 0.004881127 76.519 0.003032940 0.090572 6
55507 62028.806 0.005194546 76.662 0.002839219 0.098669 6
55507 62146.805 0.005274662 76.699 0.002655997 0.093223 6
55507 62608.806 0.005172885 76.841 0.002484718 0.100701 6
55507 62937.806 0.004712989 76.934 0.002329922 0.099704 6
55507 63331.806 0.004662792 77.043 0.002179514 0.100980 6
55507 63397.805 0.004648114 77.061 0.002038756 0.094680 6
55507 63722.806 0.004832430 77.155 0.001908194 0.094547 6
55507 64246.805 0.004666484 77.301 0.001785916 0.102357 6
55507 64701.806 0.004691766 77.428 0.001670596 0.105788 6
55507 64768.805 0.004609847 77.446 0.001562968 0.099170 6
55507 65165.806 0.004017867 77.542 0.001476927 0.098667 6
55507 65548.806 0.003799573 77.628 0.001383693 0.097256 6
55507 65808.805 0.003839284 77.688 0.001294402 0.093375 6
55507 65893.805 0.003827564 77.707 0.001210810 0.087613 6
55507 66411.805 0.003819743 77.825 0.001132612 0.091952 6
55507 66611.806 0.003747307 77.870 0.001059771 0.087451 6
55507 66942.806 0.003664775 77.942 0.000991754 0.085704 6
55507 67268.805 0.003395993 78.008 0.000932556 0.083495 6
55507 67796.805 0.003322722 78.113 0.000872711 0.086411 6
55507 68061.806 0.003407083 78.166 0.000816891 0.083039 6
55507 68593.805 0.003187342 78.268 0.000768071 0.085501 6
55507 68989.805 0.003164909 78.342 0.000718508 0.084227 6
55507 69255.805 0.003198491 78.393 0.000672208 0.080801 6
55507 69428.805 0.003139896 78.425 0.000629134 0.076445 6
55507 69779.806 0.002965214 78.487 0.000591733 0.074796 6
55507 69976.806 0.002849590 78.521 0.000555023 0.070958 6
55507 70397.805 0.002597799 78.586 0.000526753 0.070263 6
55507 70501.806 0.002577416 78.602 0.000492785 0.065967 6
55507 70632.806 0.002619503 78.622 0.000461198 0.062129 6
55507 70941.809 0.002597823 78.670 0.000431480 0.060528 6
55507 71223.806 0.002607564 78.714 0.000403627 0.058701 6
55507 71471.806 0.002375362 78.749 0.000386381 0.056296 6
55507 71929.805 0.002268382 78.811 0.000363400 0.057030 6
55507 71941.805 0.002310620 78.813 0.000340257 0.053349 6
55507 72472.805 0.002392774 78.889 0.000319604 0.056633 6
55507 72668.805 0.002407426 78.917 0.000299007 0.053901 6
55507 72994.806 0.002265679 78.961 0.000284150 0.052767 6

With the clock off by 1ms at start, the frequency estimate is about 5 PPM low.

Happy hunting,
Dave Hart

On Sun, Nov 7, 2010 at 00:58 UTC, David L. Mills mi...@udel.edu wrote:
 


Dave,

I think I have hunted down what is going on. It takes some serious
investigation. Turns out the modern adjtime(), at least in some systems, is
far from what I knew some years back. I have already described that era from
what I knew of SunOS, Ultrix, OSF/1 and Tru64, since I or my graduate
assistant  implemented the precision time kernel used in those production
systems. Today, at least Solaris and Linux have put up stuff that turns out
to be absolute poison when attempting things like measuring frequency.

The original model I started with was (a) measure the initial offset and
tell adjtime() to slew the kernel to that offset; (b) after five minutes
assume the kernel has largely completed the slew, measure the current offset
and compute the frequency. This is a little tricky, since the amount the
kernel has slewed the time must be added before computing the time

Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-11-08 Thread David L. Mills

Dave,

Notice toward the end of the calibration period adjtime() is called with 
only a small offset, so issues like the slew rate and residual offset 
are moot. Those calls should minimize the residual and the actual clock 
time should be within the measurement offset. Apparently, at least the 
FreeBSD mechanism does not forget prior requests as the Solaris 
mechanism does. I tried it with the kernel disabled with the same result.


In the original precision time modifications for SunOS, Ultrix, OSF/1, 
HP-UX and Solaris some fifteen years ago, the kernels conformed to the 
original BSD semantics and all this claptrap about measuring clock 
frequency worked just fine. From recent experiments with initial clock 
offsets in the 30-50 ms range and hardware clock errors up to 40 PPM, all this 
worked fine. I strongly suspect at least the Solaris and Tru64 precision 
time kernels have not been modified, but the adjtime() semantics has, at 
least for relatively large initial offsets. It is not an issue of 
limiting the slew rate to less than 500 PPM, as that is in fact the 
result shown in the loopstats trace. It seems at least some kernels do 
not forget past programmed offsets when presented with new ones. For 
that reason if no other, the mission to measure the intrinsic clock 
frequency with large initial offsets is dead in the water.


The source will be modified to entirely avoid all such initial training.

Dave

Dave Hart wrote:


On Mon, Nov 8, 2010 at 05:14 UTC, David L. Mills mi...@udel.edu wrote:
 


Thanks for the test. I verified the same thing. Note that the measured
offset at the end of the frequency measurement phase was very small, so the
net frequency measurement should be the same as Solaris. Obviously, FreeBSD
is doing something very different than Solaris. I suspect Linux is doing
something completely different as well. At this point I am prepared to
abandon the mission entirely, as I don't want to get bogged down with the
specifics of each idiosyncratic operating system. Accordingly, I will back
out all the changes and revert to the bad old ugly algorithms.
   



That is disappointing but I understand your frustration.  I was hoping
the remainder returned by adjtime() would allow ntpd to know exactly
how much the OS had in fact slewed the clock, adapting to differing
adjtime() implementations.

Cheers,
Dave Hart
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-11-06 Thread David L. Mills

Dave,

I think I have hunted down what is going on. It takes some serious 
investigation. Turns out the modern adjtime(), at least in some systems, 
is far from what I knew some years back. I have already described that 
era from what I knew of SunOS, Ultrix, OSF/1 and Tru64, since I or my 
graduate assistant  implemented the precision time kernel used in those 
production systems. Today, at least Solaris and Linux have put up stuff 
that turns out to be absolute poison when attempting things like 
measuring frequency.


The original model I started with was (a) measure the initial offset and 
tell adjtime() to slew the kernel to that offset; (b) after five minutes 
assume the kernel has largely completed the slew, measure the current 
offset and compute the frequency. This is a little tricky, since the 
amount the kernel has slewed the time must be added before computing the 
time difference.


Well, this didn't work, apparently because the kernel didn't do what it 
was supposed to do. However, a hint is available in the form of the 
second phase of the training interval where the residual offset is 
amortized while holding the frequency constant. This involves 
periodically measuring the offset and updating the adjtime() programmed 
offset. This works, as is evident from my previous message. This could be due to 
a nonlinearity in the adjtime() calculation of the slew rate, which is 
apparently higher the larger the programmed offset. Apparently, the 
repeated calls to adjtime() eventually lower the offset and thus the 
slew rate to something reasonable. Whatever the cause, the behavior when 
the frequency file is present is within expectations.


So, I did the same thing during the frequency measurement phase, with 
result the following loopstats from a Solaris system with initial offset 
122 ms, no frequency file and the kernel enabled.


55506 78196.297 0.122624404 0.000 0.043354274 0.00 4
55506 78200.301 0.10804 0.000 0.040745455 0.00 4
55506 78208.300 0.083775590 0.000 0.040190338 0.00 4
55506 78226.299 0.047307323 0.000 0.045698174 0.00 4
55506 78244.298 0.026714020 0.000 0.054270636 0.00 4
55506 78262.298 0.015085167 0.000 0.063186252 0.00 4
55506 78280.298 0.008518459 0.000 0.071337915 0.00 4
55506 78298.298 0.004810297 0.000 0.078439342 0.00 4
55506 78316.298 0.002716332 0.000 0.084507033 0.00 4
55506 78334.297 0.001533888 0.000 0.089652370 0.00 4
55506 78352.298 0.000866173 0.000 0.094006017 0.00 4
55506 78370.298 0.000489120 0.000 0.097751274 0.00 4
55506 78388.297 0.000276202 0.000 0.100850407 0.00 4
55506 78406.297 0.000155969 0.000 0.103480985 0.00 4
55506 78424.297 0.88074 0.000 0.105713875 0.00 4
55506 78442.297 0.49735 0.000 0.107604976 0.00 4
55506 78460.297 0.28085 0.000 0.109209608 0.00 4
55506 78478.298 0.15859 0.000 0.110570569 0.00 4
55506 78496.298 0.003133702 10.416 0.111724541 0.00 4
55506 78512.297 0.001806182 10.416 0.104509792 0.00 4
55506 78528.297 0.000873748 10.416 0.097760515 0.00 4
55506 78544.297 0.000362151 10.416 0.091446767 0.00 4
55506 78656.297 0.000353000 12.931 0.085540618 0.889337 4
55506 78688.298 0.000354000 13.104 0.080015921 0.834140 4
55506 78720.297 0.000226000 13.214 0.074848054 0.781241 4
55506 78736.297 -0.000163000 13.175 0.070014079 0.730920 4
55506 78864.299 -0.000234000 12.718 0.065492179 0.702547 4
55506 78912.299 -0.000289000 12.506 0.061262326 0.661420 4
55506 78944.298 -0.000286000 12.366 0.057305659 0.620669 4
55506 78992.297 -0.000286000 12.157 0.053604536 0.585287 5
55506 79008.297 -0.000233000 12.143 0.050142455 0.547509 5
55506 79269.297 -0.000469000 11.676 0.046904046 0.538099 5
55506 79430.297 -0.000319000 11.480 0.043874749 0.508089 5
55506 79495.297 -0.000265000 11.414 0.041041075 0.475841 5
55506 79755.304 -0.000206000 11.210 0.038390415 0.450932 5
55506 79852.297 -0.000257000 11.115 0.035910950 0.423146 5
55506 79949.297 -0.000249000 11.023 0.033591618 0.397155 6

It starts at second 78196 with offset 122 ms and frequency zero. About 300 
s later, at second 78496, the frequency is set at 10.416, which happens to 
be within 1 PPM of the nominal value. At this point the residual offset 
is about 3.1 ms. At second 78544 the residual offset drops below 0.5 ms 
and the frequency clamp is removed. However, there is about a 0.3 ms 
residual offset, which at this low poll interval of 16 s and low time 
constant makes the frequency loop rather sensitive, so the frequency 
jumps to 12.9 PPM. While this slowly subsides to the nominal value, 
note the residual offset stays below 0.35 ms. Victory is declared.
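A rough consistency check on those numbers, assuming the repeated adjtime() 
calls removed essentially the whole initial 122 ms offset by the end of the 
300 s training interval (my arithmetic, not the ntpd computation):

#include <stdio.h>

int main(void)
{
    double residual = 0.003134;   /* offset left at second 78496, seconds */
    double interval = 300.0;      /* length of the training interval, seconds */

    /* whatever offset is left must have accrued from the intrinsic frequency error */
    printf("implied frequency error: %.2f PPM\n", residual / interval * 1e6);
    return 0;
}

This prints about 10.45 PPM, in the same ballpark as the 10.416 PPM the daemon 
installs in the trace.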


I haven't tried this on other machines, but the sheer blunderbuss 
approach here should tame even them. Your exploits to the contrary are 
invited. The new code is in the backroom, but not yet a snapshot.


Dave

Dave Hart wrote:


On Sat, Nov 6, 2010 at 03:34 UTC, David L. Mills mi...@udel.edu wrote:
 


Now to the apparent initial frequency

Re: [ntp:questions] calldelay syntax error ntp4.2.7p77 ?

2010-11-05 Thread David L. Mills

Steve,

Clarification received. My understanding was that the current ntp-dev 
documentation was published on the web site, but that assumption 
apparently is defective. The release version is so far behind the 
development version that the two are essentially completely different 
protocols. They don't belong on the same web site. In any case, a 
prospective development user must first obtain the distribution, then 
read the release notes to see if it would be useful. At least, the 
release notes should be on the web site.


Dave

Steve Kostecke wrote:


On 2010-11-05, David L. Mills mi...@udel.edu wrote:

 

The calldelay option is not mentioned in the master copy of the 
documentation that resides here. Sometimes there can be considerable 
delay for it to be published at ntp.org.
   



doc.ntp.org is an archive which houses copies of the original Official
Distribution Documentation for production (i.e. stable) releases of
NTP. 


doc.ntp.org links to the NTP-dev documentation you maintain. We do not
mirror your documentation tree.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-11-05 Thread David L. Mills

Dave,

Further investigation continues; however, you have a fundamental 
misunderstanding of the slew limit in Solaris or any other 
BSD-semantics system. The slew is not a limit, it is an intrinsic 
constant equal to 500 PPM. The slew is implemented in this way. When 
adjtime() is called, it computes the number of timer ticks necessary to 
amortize the given offset at a fixed amount at each tick. Those kernels 
I have seen, including SunOS, Ultrix, OSF/1, Tru64 and Solaris, use an 
increment of 5 microseconds at each interrupt of a 100-Hz clock, which 
results in a slew rate of 500 PPM. There is no issue of exceeding the 
slew rate; it is an intrinsic constant compiled into the kernel. In days 
past, this value could be tinkered with using the tickadj program, but 
today that is not always possible.
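The arithmetic behind that fixed figure, and how long such a kernel would need 
to amortize a typical test offset (an illustration only, not kernel code):

#include <stdio.h>

int main(void)
{
    double tick_adj = 5e-6;     /* seconds amortized per clock interrupt */
    double hz       = 100.0;    /* clock interrupt rate */
    double rate     = tick_adj * hz;    /* 500e-6 s/s, i.e. 500 PPM */
    double offset   = 0.090;    /* e.g. the 90 ms offsets used in the tests below */

    printf("slew rate: %.0f PPM\n", rate * 1e6);
    printf("time to amortize %.0f ms: %.0f s\n", offset * 1e3, offset / rate);
    return 0;
}

This prints a 500 PPM rate and 180 s to amortize a 90 ms offset.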


The test I did was to run two tests using initial offsets of +90 ms and 
-90 ms. The tests use ntptime -f 500 or -f -500 for three minutes in 
order to change the offset to about +-90 ms, then start ntpd normally. 
The frequency file was present and very near the expected hardware 
frequency. My results were far different from yours; the offset converges to 
within 0.5 ms and the frequency surge is less than 1 PPM. We need to explain 
why the results are so far different.


Dave

Dave Hart wrote:


I have reproduced the problems Miroslav reports with the new startup
behavior on FreeBSD, indicating the problem is not isolated to Linux.
My guess is Solaris is the outlier here, because its adjtime() doesn't
enforce a 500 PPM cap.  The system I'm using is psp-fb2, a FreeBSD 6.4
x86 machine with the latest ntp-dev snapshot which is known to
normally keep time well.

I ran several tests, but I think the last is the most interesting.  I
stopped ntpd and removed its drift file (containing 80.530), then
slewed the time backward 100 msec using adjtime() and patience.  After
restarting it takes a few minutes for its three manycast sources to be
found and sys.peer to be declared.  Five minutes later, the frequency
is mis-estimated at 236 PPM, about 155 PPM too much.  As a result, the
system sets up for quite a long settling period with many steps.
Below are the logs and some billboard snapshots.  Please ignore the
refclock, it is marked noselect.

If your FreeBSD test machine has a negative ntp.drift, you might see
worse behavior with a positive initial offset.  I am repeating the
test with the correct drift file in place now.

Cheers,
Dave Hart

loopstats

55505 59851.663 0.102341957 0.000 0.036183346 0.00 6
55505 60190.664 0.080163684 236.465 0.032321622 0.00 6
55505 60257.664 0.053082763 236.465 0.031713930 0.00 6
55505 60325.664 0.009274475 236.465 0.033465616 0.00 6
55505 60393.664 -0.029456239 236.465 0.034168151 0.00 6
55505 60458.664 -0.041054653 236.465 0.032223363 0.00 6
55505 60459.664 -0.041380233 236.465 0.030142416 0.00 6
55505 60662.664 -0.006418134 236.076 0.030786168 0.137559 6
55505 60943.120 -0.066341200 234.964 0.035751383 0.413381 6
55505 61215.145 -0.076412713 233.726 0.033631393 0.584265 6
55505 61334.664 -0.090128710 233.086 0.031830847 0.591422 6
55505 61669.516 0.0 233.086 0.01907 0.553225 6
55505 61748.516 -0.027231614 232.958 0.009627829 0.519476 6
55505 61883.516 -0.038290456 232.650 0.009818119 0.497985 6
55505 62351.516 -0.066567566 230.793 0.013575544 0.804986 6
55505 62555.516 -0.099115749 229.588 0.017137138 0.865194 6
55505 63161.378 0.0 229.588 0.01907 0.809315 6
55505 63244.058 -0.033000775 229.425 0.011667536 0.759242 6
55505 63358.379 -0.044957700 229.119 0.011704101 0.718371 6

ntp.log

5 Nov 16:27:36 ntpd[57402]: ntpd exiting on signal 15
5 Nov 16:34:05 ntpd[58309]: Listen normally on 10 multicast ff05::101 UDP 123
5 Nov 16:34:05 ntpd[58309]: Added Multicast Listener ff05::101 on
interface #10 multicast
5 Nov 16:34:05 ntpd[58309]: NTP PARSE support: Copyright (c)
1989-2009, Frank Kardel
5 Nov 16:34:05 ntpd[58309]: PARSE receiver #0: initializing PPS to ASSERT
5 Nov 16:34:05 ntpd[58309]: PARSE receiver #0: reference clock
Meinberg GPS16x receiver (I/O device /dev/refclock-0, PPS device
/dev/refclock-0) added
5 Nov 16:34:05 ntpd[58309]: PARSE receiver #0: Stratum 0, trust time
4d+00:00:00, precision -19
5 Nov 16:34:05 ntpd[58309]: PARSE receiver #0: rootdelay 0.00 s,
phase adjustment 0.001968 s, PPS phase adjustment 0.00 s, normal
IO handling
5 Nov 16:34:05 ntpd[58309]: PARSE receiver #0: Format recognition:
Meinberg GPS Extended
5 Nov 16:34:05 ntpd[58309]: PARSE receiver #0: PPS support
(implementation PPS API)
5 Nov 16:34:05 ntpd[58309]: GENERIC(0) 8011 81 mobilize assoc 62401
5 Nov 16:34:05 ntpd[58309]: refclock_newpeer: clock type 43 invalid
5 Nov 16:34:05 ntpd[58309]: 127.127.43.0 interface 127.0.0.1 - null
5 Nov 16:34:05 ntpd[58309]: 192.168.4.255 c811 81 mobilize assoc 62403
5 Nov 16:34:05 ntpd[58309]: ff05::101 c811 81 mobilize assoc 62404
5 Nov 16:34:05 ntpd[58309]: 0.0.0.0 c016 06 restart
5 Nov 16:34:05 ntpd[58309]: 0.0.0.0 

Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-11-05 Thread David L. Mills

Dave,

We need some order here, as what you report with a pre-existing 
frequency file is quite dubious. Consider the two runs here, both with 
an existing frequency file and starting at both plus and minus 90 ms 
offsets:


howland initial offset -91.7 ms

55505 63959.804 -0.01000 -7.250 0.11310 0.003221 6
55505 64177.095 -0.091755534 -7.247 0.032440480 0.00 4
55505 64181.093 -0.083409937 -7.247 0.030488404 0.00 4
55505 64189.093 -0.064119966 -7.247 0.029323417 0.00 4
55505 64207.094 -0.035152179 -7.247 0.029279200 0.00 4
55505 64225.094 -0.019192498 -7.247 0.027963396 0.00 4
55505 64245.095 -0.010407119 -7.247 0.026341136 0.00 4
55505 64263.095 -0.005596762 -7.247 0.024698501 0.00 4
55505 64281.095 -0.002971348 -7.247 0.023121971 0.00 4
55505 64301.095 -0.001484549 -7.247 0.021635011 0.00 4
55505 64319.095 -0.000680614 -7.247 0.020239695 0.00 4
55505 64393.095 -0.00042 -8.676 0.018932726 0.505275 4
55505 64535.096 0.00013 -8.394 0.017711010 0.483019 4
55505 64567.095 0.000178000 -8.308 0.016567141 0.452867 4
55505 64695.095 0.000156000 -8.003 0.015497143 0.437100 4
55505 64711.095 0.000131000 -7.971 0.014496253 0.409026 4
55505 64743.095 0.89000 -7.927 0.013560011 0.382917 4
55505 64871.095 0.81000 -7.769 0.012684229 0.362527 4
55505 64887.095 0.000108000 -7.743 0.011865014 0.339241 5
55505 65065.095 0.65000 -7.699 0.011098715 0.317714 5
55505 65325.095 0.85000 -7.614 0.010381899 0.298685 5
55505 65421.095 0.000131000 -7.566 0.009711391 0.279909 5
55505 65616.095 0.88000 -7.501 0.009084187 0.262852 5

howland initial offset +93.7 ms

55505 66464.549 0.093707184 -7.247 0.033130493 0.00 4
55505 66468.551 0.085202918 -7.247 0.031136252 0.00 4
55505 66476.551 0.065630650 -7.247 0.029936050 0.00 4
55505 66494.550 0.036251561 -7.247 0.029866998 0.00 4
55505 66512.549 0.020055273 -7.247 0.028518816 0.00 4
55505 66532.549 0.011178952 -7.247 0.026860866 0.00 4
55505 66550.549 0.006275409 -7.247 0.025185779 0.00 4
55505 66568.549 0.003628803 -7.247 0.023577714 0.00 4
55505 66588.549 0.002073854 -7.247 0.022061783 0.00 4
55505 66606.548 0.001265987 -7.247 0.020638884 0.00 4
55505 66624.549 0.000840837 -7.247 0.019306494 0.00 4
55505 2.549 0.000598000 -5.376 0.018059775 0.661350 4
55505 66736.549 0.49000 -5.321 0.016894488 0.618946 4
55505 66870.549 -0.000102000 -5.530 0.015803437 0.583647 4
55505 66886.549 -0.000258000 -5.593 0.014782864 0.546406 4
55505 67014.549 -0.000187000 -5.958 0.013828126 0.527176 4
55505 67110.548 -0.000212000 -6.268 0.012935031 0.505203 4
55505 67126.549 -0.000198000 -6.317 0.012099614 0.472883 4
55505 67254.548 -0.000151000 -6.612 0.011318165 0.454465 5
55505 67270.549 -0.9 -6.617 0.010587196 0.425117 5
55505 67336.549 -0.000115000 -6.646 0.009903419 0.397792 5

In both cases, the zeros in the wander column confirm that the frequency 
is not adjusted  until either the offset falls below 0.5 ms or the 
5-minute threshold runs out and the residual offset tickles the 
frequency. Considering this, it seems a stretch that the frequency is 
kicked leading to a massive frequency error. If your tests confirm this, 
please advise.


Now to the apparent initial frequency error. This is new, as tests in 
the past have not confirmed that. I need to plant some debug code in 
direct_freq().


Dave

Dave Hart wrote:


I have reproduced the problems Miroslav reports with the new startup
behavior on FreeBSD, indicating the problem is not isolated to Linux.
My guess is Solaris is the outlier here, because its adjtime() doesn't
enforce a 500 PPM cap.  The system I'm using is psp-fb2, a FreeBSD 6.4
x86 machine with the latest ntp-dev snapshot which is known to
normally keep time well.

I ran several tests, but I think the last is the most interesting.  I
stopped ntpd and removed its drift file (containing 80.530), then
slewed the time backward 100 msec using adjtime() and patience.  After
restarting it takes a few minutes for its three manycast sources to be
found and sys.peer to be declared.  Five minutes later, the frequency
is mis-estimated at 236 PPM, about 155 PPM too much.  As a result, the
system sets up for quite a long settling period with many steps.
Below are the logs and some billboard snapshots.  Please ignore the
refclock, it is marked noselect.

If your FreeBSD test machine has a negative ntp.drift, you might see
worse behavior with a positive initial offset.  I am repeating the
test with the correct drift file in place now.

Cheers,
Dave Hart

loopstats

55505 59851.663 0.102341957 0.000 0.036183346 0.00 6
55505 60190.664 0.080163684 236.465 0.032321622 0.00 6
55505 60257.664 0.053082763 236.465 0.031713930 0.00 6
55505 60325.664 0.009274475 236.465 0.033465616 0.00 6
55505 60393.664 -0.029456239 236.465 0.034168151 0.00 6
55505 60458.664 -0.041054653 236.465 0.032223363 0.00 6
55505 60459.664 -0.041380233 236.465 0.030142416 0.00 6

Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-11-04 Thread David L. Mills

Miroslav,

The NTP daemon purposely ignores the leftover from adjtime(). To do 
otherwise would invite massive instability. Each time an NTP update is 
received, a new offset estimate is available regardless of past history. 
Therefore, the intent is to ignore all past history and start with a 
fresh update. Note that the slew rate of adjtime() is not a factor with 
the kernel discipline.


Dave

Miroslav Lichvar wrote:


On Wed, Nov 03, 2010 at 04:06:39PM +, Dave Hart wrote:
 


On Wed, Nov 3, 2010 at 09:24 UTC, Miroslav Lichvar mlich...@redhat.com wrote:
   


On Tue, Nov 02, 2010 at 10:03:30PM +, David L. Mills wrote:
 


I ran the same test here on four different machines with the
expected results. These included Solaris on both SPARC and Intel
machines, as well as two FreeBSD machines.
   


[...]
   


Ok, I think I have found the problem. The adj_systime() routine is
called from adj_host_clock() with adjustments over 500 microseconds,
which means ntpd is trying to adjust the clock at a rate higher than
what uses the Linux adjtime(). It can't keep up and the lost offset
correction is what makes the ~170ppm frequency error.
 


Congratulations on isolating the problem.  If adjtime() is returning
failure, ntpd will log that mentioning adj_systime.  Do you see any of
those?
   



No, it's not an error in usage, adjtime() just don't have enough time
to apply whole correction and ntpd doesn't check the leftover, so
the offset is adjusted actually slower than what ntpd is assuming.

 


Is it a feature or a bug that FreeBSD and Solaris can apparently slew
faster than 500 PPM using adjtime()?

If it's a feature, is there a way we can detect at configure time what
the adjtime() slew limit is without actually trying it?  We don't want
to require root for configure.
   



Probably not. I think I saw on one BSD system only 100ppm rate, so it
will have to be clamped to either the lowest rate from all supported
systems or to a constant defined in the configure script based on
the system and version.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-11-04 Thread David L. Mills

Miroslav,

Wrong. The daemon starts off by setting the frequency to zero, as you can 
see in the protostats. When the frequency calibration is complete, the 
frequency is set directly, as you can also see in the protostats. It 
could be that a massively broken motherboard might set the frequency larger 
than 500 PPM, but in your case it was set at 172 PPM, well within the 
tolerance. During the 5 minutes following the direct set, the frequency 
update is suppressed, so there is no clamp. So, please explain where 
you find the bug.


Dave

Miroslav Lichvar wrote:


On Wed, Nov 03, 2010 at 03:54:33PM +, David L. Mills wrote:
 


The daemon clamps the adjtime() (sic) offset to 500 PPM, which is
consistent with ordinary Unix semantics. 
   



No, during that new fast phase correction on start it's not clamped to
anything. That's the bug I'm hitting here.

If 500 ppm is the standard rate, Linux is working fine and
the other systems are the bad ones.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-11-04 Thread David L. Mills

Miroslav,

IT IS NOT A BUG. Specifically, the Unix adjtime() semantics allows any 
argument, even if bizarre. The slew rate is constant at 500 PPM; the 
duration of the slew is calculated to amortize the argument as given. 
There is no way to exceed the slew rate; it is constant. You and Linux 
may have a different view, but the NTP implementation conforms to the 
traditional Unix semantics.


Dave

Miroslav Lichvar wrote:


On Thu, Nov 04, 2010 at 08:32:06PM +, David L. Mills wrote:
 


Wrong. The damon starts off be setting the frequency to zero, as you
can see in the protostats. When the frequency calibration is
complete, the frequency is set directly, as you can also see in the
protostats. It could be a massively broken motherboard might set the
frequency larger than 500 PPM, but in your case it set it at 172
PPM, well within the tolerance. During the 5 minutes following the
direct set, the frequency update is suppressed, so there is no
clamp. So, please explain where to you find the bug.
   



The bug is in the adj_host_clock() function in ntp_loopfilter.c. On
startup when freq_cnt > 0, a reduced time constant is used which makes
the adjustment so large that the adjtime() argument is over 500
microseconds. On systems following the standard slew rate of 500 ppm, the
adjustment will be applied only partially in the one second interval
it has and so there will be an error in clock_offset. The missing offset
is the cause of the 172ppm error in the frequency estimation.

In order to fix this bug without checking the adjtime() leftover, a
clamp for the adjustment has to be added to that function, so that
adjustment + drift_comp stays below 500 microseconds (or whatever
value is appropriate for the system).

For example:

    if (adjustment + drift_comp > 500e-6)
        adjustment = 500e-6 - drift_comp;
    else if (adjustment + drift_comp < -500e-6)
        adjustment = -500e-6 - drift_comp;

    clock_offset -= adjustment;
    adj_systime(adjustment + drift_comp);

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-11-04 Thread David L. Mills

Bill,

You have absolutely no idea what you are talking about and you do reveal 
an abysmal lack of understanding of control theory. To incorporate past 
history in future controls when the current control variable is measured 
violates the time delay constraint. Go back to the books.


Dave

unruh wrote:


On 2010-11-04, David L. Mills mi...@udel.edu wrote:
 


Miroslav,

The NTP daemon purposely ignores the leftover from adjtime(). To do 
otherwise would invite massive instability. Each time an NTP update is 
received, a new offset estimate is available regardless of past history. 
Therefore, the intent is to ignore all past history and start with a 
fresh update. Note that the slew rate of adjtime() is not a factor with 
the kernel discipline.
   



That is of course a philosophical position, and a strange one. Clocks
are largely predictable systems (that is why they are used as clocks).
Thus the past history is strongly determinative of what the future
behaviour will be. To act as if this is not true, that each new
measurement should be treated as if it were completely disconnected from the
past, is a strange way of treating a highly predictable system. That is
of course one of the key places where ntpd and chrony differ. The
evidence is that properly taking account of the past does not create massive
instability but rather creates far more accurate disciplining of the
clock than does past amnesia.
 


Dave

Miroslav Lichvar wrote:

   


On Wed, Nov 03, 2010 at 04:06:39PM +, Dave Hart wrote:


 


On Wed, Nov 3, 2010 at 09:24 UTC, Miroslav Lichvar mlich...@redhat.com wrote:
  

   


On Tue, Nov 02, 2010 at 10:03:30PM +, David L. Mills wrote:


 


I ran the same test here on four different machines with the
expected results. These included Solaris on both SPARC and Intel
machines, as well as two FreeBSD machines.
  

   


[...]
  

   


Ok, I think I have found the problem. The adj_systime() routine is
called from adj_host_clock() with adjustments over 500 microseconds,
which means ntpd is trying to adjust the clock at a rate higher than
what uses the Linux adjtime(). It can't keep up and the lost offset
correction is what makes the ~170ppm frequency error.


 


Congratulations on isolating the problem.  If adjtime() is returning
failure, ntpd will log that mentioning adj_systime.  Do you see any of
those?
  

   


No, it's not an error in usage, adjtime() just don't have enough time
to apply whole correction and ntpd doesn't check the leftover, so
the offset is adjusted actually slower than what ntpd is assuming.



 


Is it a feature or a bug that FreeBSD and Solaris can apparently slew
faster than 500 PPM using adjtime()?

If it's a feature, is there a way we can detect at configure time what
the adjtime() slew limit is without actually trying it?  We don't want
to require root for configure.
  

   


Probably not. I think I saw on one BSD system only 100ppm rate, so it
will have to be clamped to either the lowest rate from all supported
systems or to a constant defined in the configure script based on
the system and version.



 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-11-04 Thread David L. Mills

Dave,

How does this issue persist? NTP does not limit the slew rate, adjtime() 
does, and NTP does not care. The adjtime() semantics is the same whether 
or not the slew rate is exceeded and for any programmed offset. The 
only issue is whether the programmed offset is completed in the time 
allowed, in this case five minutes. If it does not complete, the 
discipline reverts to the ordinary algorithm, which will probably surge 
as it would without the initial training period. At 500 PPM, the slew rate 
is 0.5 ms/s, or 30 ms/min, or 150 ms in five minutes. This is sufficient for a 
maximum offset of 128 ms before the step kicks in. And yes, as reported 
several times, I have tested it with and without the kernel when the step 
threshold is exceeded, but not with Linux.


Dave

Dave Hart wrote:


On Thu, Nov 4, 2010 at 22:22 UTC, David L. Mills mi...@udel.edu wrote:
 


Miroslav,

IT IS NOT S BUG. Specifically, the Unix adjtime() semantics allows any
argument, even if bizarre. The slew rate is constant at 500 PPM; the
duration of the slew is calculate to amortize the argument as given. There
is no way to exceed the slew rate; it is constant. You and Linux may have
a different view, but the NTP implementation conforms to the traditional
Unix semantics.

Dave
   



Dr. Mills, I think you will find Miroslav in agreement with your view
about how adjtime() works.  As I understand it, the issue he's raising
is that during the initial offset convergence period only, ntpd does
not limit its slew rate to 500 PPM, as it does otherwise.  When it
exceeds 500 PPM, ntpd is assuming the full adjustment + drift_comp
value has been applied, when only the first 500 PPM has been.  As a
result, ntpd stops slewing the clock sooner than it should.

If that analysis is correct, two possible solutions come to mind.
Either ntpd can limit its slew rate during initial convergence to 500
PPM, or it can go faster on systems which allow it by using the
residual returned by adjtime() to accurately account for how much the
clock is being slewed.

Incidentally, Miroslav wrote a simple test program to measure the
maximum slew rate of adjtime(), by requesting a constant 100000
microsecond correction every second, and displaying the difference between
the requested amount and the residual returned:


#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main() {
    struct timeval tv, otv;

    tv.tv_sec = 0;
    tv.tv_usec = 100000;

    while (1) {
        if (adjtime(&tv, &otv)) {
            printf("fail\n");
            return 1;
        }
        printf("%ld\n", (long)(tv.tv_usec - otv.tv_usec));
        sleep(1);
    }

    return 0;
}

On Linux and FreeBSD 6.4, this displays a rock-steady 500 every
second after the first.  On OpenSolaris, it's around 63000.
Presumably the Solaris 10 you're using is similar.  If so, you should
be able to reproduce Miroslav's problem on FreeBSD, but not Solaris,
which will happily apply up to 63 msec/sec of slew.  By starting with
an initial offset just under the step threshold, ntpd should exceed
FreeBSD adjtime()'s ability to keep up during the initial offset
convergence.

Cheers,
Dave Hart
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] calldelay syntax error ntp4.2.7p77 ?

2010-11-04 Thread David L. Mills

BlackLists,

The calldelay option is not mentioned in the master copy of the 
documentation that resides here. Sometimes there can be considerable 
delay for it to be published at ntp.org. Sorry about that, but I have no 
control over the publishing process. I am told the relevant 
documentation is in the current ntp-dev that leaves here. In any case, 
calldelay is no more, believe me.


Dave

E-Mail Sent to this address will be added to the BlackLists wrote:


BlackLists wrote:
 


Somewhere between ntp4.2.7p41 and ntp4.2.7p77
 calldelay 69
became invalid ntp.conf syntax?
 {commenting it out eliminates the error.}
log:
syntax error in ntp.conf line 15, column 1
line 15 column 1 syntax error, unexpected T_String, expecting $end
 


Nevermind, I found it in the diffs circa 4.2.7p63/64 2010-oct-13/15
   



FYI, it is still mentioned twice in http://doc.ntp.org/4.2.6/confopt.html
with a link to http://doc.ntp.org/4.2.6/miscopt.html

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-11-03 Thread David L. Mills

Miroslav,

Why didn't you tell me you are using Linux? All bets are off. You are on 
your own.


The daemon clamps the adjtime() (sic) offset to 500 PPM, which is 
consistent with ordinary Unix semantics. The Unix adjtime() syscall can 
return the amount of time not amortized since the last call and leaves 
it up to the user to include in the next call. For ntpd, the leftover is 
ignored, since a new update has been measured from scratch. I don't know 
how Linux got other ideas. You might consider using FreeBSD.


Dave

Miroslav Lichvar wrote:


On Tue, Nov 02, 2010 at 10:03:30PM +, David L. Mills wrote:
 


I ran the same test here on four different machines with the
expected results. These included Solaris on both SPARC and Intel
machines, as well as two FreeBSD machines. I tested with and without
the kernel, with initial offset 300 ms (including step correction)
and 100 ms. I tested with initial poll interval of 16 s and 64 s. At
16 s, the leftover offset after frequency update was about half a
millisecond, within spec, but this torqued the frequency about 2 PPM
as expected. It settled down to within 1 PPM within 20 min.

Bottom line: I cannot verify your experience.
   



Ok, I think I have found the problem. The adj_systime() routine is
called from adj_host_clock() with adjustments over 500 microseconds,
which means ntpd is trying to adjust the clock at a rate higher than
what uses the Linux adjtime(). It can't keep up and the lost offset
correction is what makes the ~170ppm frequency error.

But I have to wonder at what rate adjtime() slews the clock on Solaris
and FreeBSD, it certainly has to be over 3000 ppm.

Anyway, clamping adjustment in adj_host_clock() so adjustment +
drift_clock doesn't go over 500 microseconds fixed the problem for me.
The error in frequency estimation is now below 2 ppm.

I have also rerun the original test with drift file and ntpd now
reaches the 200us sync in less than 2000 seconds with all initial
offsets.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-11-03 Thread David L. Mills

David,

In Miroslav's test, the frequency was computed at 172 PPM, which is well 
within the capabilities of the algorithm. On the other hand, even if the 
intrinsic hardware frequency error is more than 500 PPM, the frequency 
will be set to 500 PPM and the daemon will continue normally, but will 
not be able to reduce the residual time offset to zero. Miroslav's 
experience is far different; apparently, the computed frequency was not 
installed and the daemon continued at that frequency error and exceeded 
the step threshold every twenty minutes. This apparently occurred 
whether or not the kernel was enabled and whether or not the simulator 
was involved.


Again I stress no such behavior occurs with Solaris or FreeBSD, so 
something might be affected by Linux adjtime functionality. It might 
help to eyeball the ntp_loopfilter.c and the direct_freq() routine. 
There might be some angst with the Linux semantics.


Dave

David Woolley wrote:


David L. Mills wrote:

I don't think that is right. The adjtime() call argument can in principle 
be anything, according to the Solaris and FreeBSD man pages, but the 
rate of adjustment is fixed at 500 PPM in the Unix implementation. If 
the Linux argument is limited to 500 microseconds, Linux is 
essentially unusable with NTP. I would be surprised if this were the 
case.




I think what he is really saying is that he is not using the kernel 
discipline and ntpd is tweaking the clock every second, but he has 
broken hardware, which requires a correction of more than 500ppm, and, 
as he is describing it, adjtime has a residual correction to apply 
before the next tweak, or more likely ntpd is limiting it to 500ppm.


As to Linux, I would guess most users of ntpd are using Linux.

Miroslav: ntpd requires an uncorrected clock that is good to 
significantly better than 500ppm.  You can probably get away with 
450ppm, but the transient response will be compromised.


A good quality PC should be within about 10ppm.  A cheap one should be 
within about 50ppm.   500ppm is broken.  You can use tickadj to 
compensate in steps of 100ppm, but a machine with that error is likely 
to have other problems; the crystal may be barely disciplining the 
oscillator.


___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-11-01 Thread David L. Mills

Miroslav,

You have something seriously wrong. The frequency was apparently set 
correctly at -168 PPM, then shortly after that there was a series of 
step corrections of about 172 ms in 20 min, which is about 170 PPM. In 
other words, the frequency change did not work. It certainly worked here, 
even with the kernel enabled. You should see the frequency change in the 
loopstats. If not, try disabling the kernel to see if that is the problem.


Dave

Miroslav Lichvar wrote:


On Wed, Oct 27, 2010 at 11:55:22PM +, David L. Mills wrote:
 


See the most recent ntp-dev. It needed some tuning.
   



Hm, with ntp-dev-4.2.7p74 it still doesn't seem to work as expected.
In the same testing conditions as I used the last time, the frequency
estimate is now 170 ppm off and the clock is stepped several times
before it settles down.

http://mlichvar.fedorapeople.org/tmp/ntp_start4.png

1 Jan 01:00:00 ntpd.new2[3135]: ntpd 4.2.7...@1.2307 Mon Nov  1 12:20:56 UTC 
2010 (1)
1 Jan 01:00:00 ntpd.new2[3135]: proto: precision = 0.101 usec
1 Jan 01:00:00 ntpd.new2[3135]: ntp_io: estimated max descriptors: 1024, 
initial socket boundary: 16
1 Jan 01:00:00 ntpd.new2[3135]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
1 Jan 01:00:00 ntpd.new2[3135]: Listen normally on 1 lo 127.0.0.1 UDP 123
1 Jan 01:00:00 ntpd.new2[3135]: Listen normally on 2 eth0 192.168.123.4 UDP 123
1 Jan 01:00:00 ntpd.new2[3135]: 192.168.123.1 8011 81 mobilize assoc 9201
1 Jan 01:00:00 ntpd.new2[3135]: 0.0.0.0 c016 06 restart
1 Jan 01:00:00 ntpd.new2[3135]: 0.0.0.0 c012 02 freq_set ntpd 0.000 PPM
1 Jan 01:00:00 ntpd.new2[3135]: 0.0.0.0 c011 01 freq_not_set
1 Jan 01:00:01 ntpd.new2[3135]: 192.168.123.1 8024 84 reachable
1 Jan 01:03:17 ntpd.new2[3135]: 192.168.123.1 963a 8a sys_peer
1 Jan 01:03:17 ntpd.new2[3135]: 0.0.0.0 c614 04 freq_mode
1 Jan 01:08:43 ntpd.new2[3135]: 0.0.0.0 0612 02 freq_set ntpd -168.875 PPM
1 Jan 01:08:43 ntpd.new2[3135]: 0.0.0.0 0615 05 clock_sync
1 Jan 01:32:42 ntpd.new2[3135]: 0.0.0.0 0613 03 spike_detect +0.160001 s
1 Jan 01:41:23 ntpd.new2[3135]: 0.0.0.0 061c 0c clock_step +0.172040 s
1 Jan 01:41:23 ntpd.new2[3135]: 0.0.0.0 0615 05 clock_sync
1 Jan 01:41:24 ntpd.new2[3135]: 0.0.0.0 c618 08 no_sys_peer
1 Jan 01:41:24 ntpd.new2[3135]: 192.168.123.1 8044 84 reachable
1 Jan 01:45:53 ntpd.new2[3135]: 192.168.123.1 965a 8a sys_peer
1 Jan 01:58:07 ntpd.new2[3135]: 0.0.0.0 0613 03 spike_detect +0.140452 s
1 Jan 02:06:52 ntpd.new2[3135]: 0.0.0.0 061c 0c clock_step +0.170957 s
1 Jan 02:06:52 ntpd.new2[3135]: 0.0.0.0 0615 05 clock_sync
1 Jan 02:06:53 ntpd.new2[3135]: 0.0.0.0 c618 08 no_sys_peer

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] systems won't synchronize no matter what

2010-11-01 Thread David L. Mills

Miroslav,

There is a very good reason. First, the kernel can only be switched 
between PLL and FLL mode discretely, while the daemon has a gradual 
transition between modes so that the poll interval can vary seamlessly 
between 8 s and 36 hr. Second, the kernel PLL is most useful to minimize 
sawtooth errors, and is no better than the daemon loop at tracking 
incidental frequency noise. Seldom if ever is it useful to switch to FLL 
mode at poll intervals less than 1024 s, unless the incidental phase 
noise is less than a microsecond.


It may happen that the kernel PLL switches modes occasionally during 
operation at 1024 s, but that does not disturb timekeeping accuracy. 
Recent versions of ntpd suppress log messages when switching modes like 
that.


Dave

Miroslav Lichvar wrote:


On Sat, Oct 30, 2010 at 09:09:17AM -0700, Chuck Swiger wrote:
 


 http://www.ece.udel.edu/~mills/database/papers/nano/nano.pdf

In operation, PLL mode is preferred at small update intervals and
time constants and FLL mode at large intervals and time constants.
The optimum crossover point between the PLL and FLL modes, as
determined by simulation and analysis, is the Allan intercept. As a
compromise, the PLL/FLL algorithm operates in PLL mode for update
intervals of 256 s and smaller and in FLL mode for intervals of 1024
s and larger. Between 256 s and 1024 s the mode is specified by the
API. This behavior parallels the NTP daemon behavior, except that in
the latter the weight given the FLL prediction is linearly
interpolated from zero at 256 s to unity at 1024 s.
   



Is there a reason why ntpd doesn't make use of the STA_FLL flag? I
think it would be nice if ntpd switched the FLL limit to 256 s with
tinker allan 8 or lower.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] peers using same server

2010-11-01 Thread David L. Mills

Miroslav,

Depends on the synchronization distance. See the online page How NTP Works.

Dave

Miroslav Lichvar wrote:


I've come across an interesting problem and I'm not sure if this is a
bug or feature.

When two peers are configured to use the same server as their source
and the link between one peer and the server goes down, the peer
doesn't switch to the other peer and will stay unsynchronized.

1 Jan 01:00:00 ntpd[3468]: ntpd 4.2.7...@1.2297 Mon Oct 25 09:43:53 UTC 2010 (1)
1 Jan 01:00:00 ntpd[3468]: proto: precision = 0.101 usec
1 Jan 01:00:00 ntpd[3468]: ntp_io: estimated max descriptors: 1024, initial 
socket boundary: 16
1 Jan 01:00:00 ntpd[3468]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
1 Jan 01:00:00 ntpd[3468]: Listen normally on 1 lo 127.0.0.1 UDP 123
1 Jan 01:00:00 ntpd[3468]: Listen normally on 2 eth0 192.168.123.4 UDP 123
1 Jan 01:00:00 ntpd[3468]: 192.168.123.1 8011 81 mobilize assoc 9201
1 Jan 01:00:00 ntpd[3468]: 192.168.123.3 8011 81 mobilize assoc 9202
1 Jan 01:00:00 ntpd[3468]: 0.0.0.0 c016 06 restart
1 Jan 01:00:00 ntpd[3468]: 0.0.0.0 c012 02 freq_set kernel 100.000 PPM
1 Jan 01:05:29 ntpd[3468]: 192.168.123.3 8024 84 reachable
1 Jan 01:08:44 ntpd[3468]: 192.168.123.3 963a 8a sys_peer
1 Jan 01:08:44 ntpd[3468]: 0.0.0.0 c615 05 clock_sync
1 Jan 19:11:37 ntpd[3468]: 192.168.123.1 8024 84 reachable
1 Jan 19:40:48 ntpd[3468]: 192.168.123.1 963a 8a sys_peer
1 Jan 22:03:04 ntpd[3468]: 192.168.123.1 8643 83 unreachable
1 Jan 22:07:08 ntpd[3468]: 0.0.0.0 0618 08 no_sys_peer
2 Jan 16:07:01 ntpd[3468]: 192.168.123.1 8054 84 reachable
2 Jan 16:12:32 ntpd[3468]: 192.168.123.1 966a 8a sys_peer
2 Jan 18:58:12 ntpd[3468]: 192.168.123.1 8673 83 unreachable
2 Jan 19:00:34 ntpd[3468]: 0.0.0.0 0628 08 no_sys_peer

But when each peer uses a different server, they will switch to the
other peer quickly when the link goes down.

1 Jan 01:00:00 ntpd[3635]: ntpd 4.2.7...@1.2297 Mon Oct 25 09:43:53 UTC 2010 (1)
1 Jan 01:00:00 ntpd[3635]: proto: precision = 0.101 usec
1 Jan 01:00:00 ntpd[3635]: ntp_io: estimated max descriptors: 1024, initial 
socket boundary: 16
1 Jan 01:00:00 ntpd[3635]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
1 Jan 01:00:00 ntpd[3635]: Listen normally on 1 lo 127.0.0.1 UDP 123
1 Jan 01:00:00 ntpd[3635]: Listen normally on 2 eth0 192.168.123.4 UDP 123
1 Jan 01:00:00 ntpd[3635]: 192.168.123.2 8011 81 mobilize assoc 9201
1 Jan 01:00:00 ntpd[3635]: 192.168.123.3 8011 81 mobilize assoc 9202
1 Jan 01:00:00 ntpd[3635]: 0.0.0.0 c016 06 restart
1 Jan 01:00:00 ntpd[3635]: 0.0.0.0 c012 02 freq_set kernel 100.000 PPM
1 Jan 01:05:29 ntpd[3635]: 192.168.123.3 8024 84 reachable
1 Jan 01:09:54 ntpd[3635]: 192.168.123.3 963a 8a sys_peer
1 Jan 01:09:54 ntpd[3635]: 0.0.0.0 c615 05 clock_sync
1 Jan 02:49:28 ntpd[3635]: 192.168.123.2 8024 84 reachable
1 Jan 03:11:52 ntpd[3635]: 192.168.123.2 963a 8a sys_peer
1 Jan 05:40:36 ntpd[3635]: 192.168.123.2 8643 83 unreachable
1 Jan 05:48:17 ntpd[3635]: 192.168.123.3 961a 8a sys_peer
1 Jan 07:22:11 ntpd[3635]: 192.168.123.2 8054 84 reachable
1 Jan 08:01:28 ntpd[3635]: 192.168.123.2 966a 8a sys_peer
1 Jan 10:13:37 ntpd[3635]: 192.168.123.2 8673 83 unreachable
1 Jan 10:20:08 ntpd[3635]: 192.168.123.3 961a 8a sys_peer

Any explanation for this? I've tried versions 4.2.2, 4.2.4, 4.2.6 and
the latest dev, the result is always the same.

Thanks,

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] What level of timesynch error is typical onWinXP?

2010-10-25 Thread David L. Mills

Miroslav,

I uttered an egregious, terrible lie: the hold interval is not 600 s; it 
is 300 s, so the maximum time to converge, given a frequency file within 
1 PPM, is five minutes; without a frequency file, within ten minutes to 
0.5 ms and 1 PPM. Again, I caution that if for some reason the initial 
frequency file is in error by something like 500 PPM, there will be a 
considerable additional time to converge.


The model for this scheme is not to fix broken frequency files, but to 
quickly converge after a laptop has been off for a few hours.


Dave

David L. Mills wrote:


Miroslav,

No, it is not expected, unless you are referring to broadcast mode when 
started with +-100 ms initial offset. That has been corrected as per 
your bug report.


For the record, a hold timer is started when the first update is received 
after startup and ends when the residual offset is less than 0.5 ms or 
after a timeout of 600 s. During the hold interval the PLL loop time 
constant is set very low and the frequency discipline is disabled. 
With this arrangement, the offset typically converges within 600 s, even 
with initial offsets up to +-100 ms, and much less if the initial 
offset is in the 10-50 ms range. If you see different behavior, either 
with client/server or broadcast modes, please report.


Note that, if the initial frequency error is significant, there may 
still be a surge correction. If the frequency file is not present at 
startup, the frequency will be measured, typically within +-1 PPM, 
within 600 s, following which the above scheme will be in effect. Under 
worst case conditions, there still could be a wobble following startup 
not exceeding 1 ms. If somebody finds an extraordinarily unlikely 
set of circumstances leading to, say, 2 ms, I'm not going to lose sleep 
over that.


Dave

Miroslav Lichvar wrote:


On Fri, Oct 22, 2010 at 11:39:47AM +0100, David J Taylor wrote:
 


Thanks, Dave.  I may be missing something here, but it seems to me
that 4.2.7p58 still takes a number of hours to reach the accuracy
limits where thermal effects dominate.  It's that which matters to
me, rather than something in the first few minutes.  I agree the
graphs would not show such short time-scale initial disturbances.
   



Did the clock frequency change before you started the new version?

I played with the latest ntp-dev a bit and there indeed is an
improvement on start, mainly when the initial offset is around
0.01-0.05s. But the frequency error has to be very small to make a
difference, see these plots:

http://mlichvar.fedorapeople.org/tmp/ntp_start_offset.png
http://mlichvar.fedorapeople.org/tmp/ntp_start_freq.png

Also, I've noticed when ntpd is started without driftfile and the
initial offset is over 0.05 second, the overshoot can easily reach 100
percent, is this expected?

 





___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] ntp orphan mode without manycast or multicast or broadcast ?

2010-10-14 Thread David L. Mills

Jerzy,

See the online documentation at www.ntp.org. The release notes page has 
a link to a tutorial on orphan mode. Note the online documentation 
applies to the latest development version; the feature may or may not 
appear in the release version.


Dave

Miernik, Jerzy (Jerzy) wrote:


Orphan Mode description related to NTP Version 4 Release Notes does not require 
either manycast or multicast or broadcast to be configured. Is it then possible 
to have ntpd get into orphan mode, in a mesh, with just unicasts?
I think it should be possible to configure several servers in each client and 
all clients in each server, peer servers in servers, all IP addresses unicast, 
to have orphan mode operational, if needed. Servers could be of stratum lower 
by 1 than clients.

1. Could I get an expert comment if this would work?
2. Has anyone ever tried such a multi-unicast addressing for orphan mode?
3. If unicast addresses of servers are ordered in ntp.conf in ascending order 
of hop numbers, would ntpd try the closest servers before trying more distant 
ones?
4. If I still may, any config examples? Or something to avoid?
Best regards,
Jerzy.

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] GPS

2010-10-05 Thread David L. Mills

John,

My experience with SA some years ago was that the timing accuracy was in 
the order of LORAN C, that is, about one microsecond. However, the 
oscillator in my Austron 2201 GPS receiver was disciplined in frequency 
by LORAN C, with result the timing accuracy was in the order of 50 ns. I 
calibrated it with a Cesium oscillator. In other words, even if SA were 
turned back on, it is easily defeated.


Dave

John Hasler wrote:


Chris H writes:
 


With Selective Availability enabled would that cause time accuracy
issues with the signals from the satellite?
   



I believe that it would inject a maximum of about 300 ns of random
error.  However, SA is obsolete.  The newest satellites don't even
implement it.
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Allan deviation survey

2010-09-16 Thread David L. Mills

Dave,

I'm glad it's gone, as the code was never intended to measure 
resolution. It is intended to measure precision, defined in the 
specification as the time to read the system clock. This turns out to 
be really important when a client reads two or more sources on the same 
fast Ethernet. The intent is to avoid non-intersecting correctness 
intervals.
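As a point of comparison, a minimal sketch of that kind of measurement, using 
clock_gettime() rather than ntpd's internal get_systime() and simply taking the 
smallest nonzero back-to-back difference (an illustration, not the ntpd code):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec a, b;
    double best = 1.0;
    int i;

    for (i = 0; i < 1000; i++) {
        clock_gettime(CLOCK_REALTIME, &a);
        clock_gettime(CLOCK_REALTIME, &b);
        double d = (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
        if (d > 0 && d < best)
            best = d;
    }
    printf("time to read the clock: about %.0f ns\n", best * 1e9);
    return 0;
}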


Dave

Dave Hart wrote:


On Thu, Sep 16, 2010 at 1:23 AM, David L. Mills mi...@udel.edu wrote:
 


Miroslav,

The fastest machine I can find on campus has precision -22, or about 230 ns.
Then, I peeked at time.nist.gov, which is actually three machines behind a
load leveler. It reports to be an i386 running FreeBSD 6.1. Are you ready for
this? It reports precision -29 or 1.9 ns! I'm rather suspicious about that
number.
   



I think this can be attributed to some code that used to be in ntpd
which, on FreeBSD only, used for precision an OS estimate of the clock
resolution in place of the measured latency to read the clock used on
every other platform.  That FreeBSD exception was removed from ntpd
years ago, but apparently after the version in use by NIST.

Dave Hart
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Allan deviation survey

2010-09-15 Thread David L. Mills

Miroslav,

The fastest machine I can find on campus has precision -22, or about 230 
ns. Then, I peeked at time.nist.gov, which is actually three machines 
behind a load leveler. It reports to be an i386 running FreeBSD 6.1. Are 
you ready for this? It reports precision -29 or 1.9 ns! I'm rather 
suspicious about that number. What processor and operating system are 
you using? What is the precision reported by ntpd?


For historic perspective, the time to read the clock on a Sun SPARC IPC 
in the late 1980s was 42 microseconds.


Thanks for the jitter.c update.

Dave

Miroslav Lichvar wrote:


On Tue, Sep 14, 2010 at 07:17:04PM +, David L. Mills wrote:
 


Miroslav,

Better recalibrate your slide rule. On a 2.8 GHz dual-core Pentium
running OpenSolaris 10, the measured precision is -21, which works
out to 470 ns.
   



That probably just means the system on your machine is not using the
rdtsc instruction when reading time.

 


You claim ten times faster. What snake oil are you
using for your processor? To check, try running the jitter.c program
in the distribution.
   



Average   0.00088
First rank
0   0.00082
1   0.00082
2   0.00082
3   0.00082
4   0.00082
5   0.00082
6   0.00082
7   0.00082
8   0.00082
9   0.00082
Last rank
70   0.13065
71   0.13324
72   0.13477
73   0.13545
74   0.13782
75   0.14010
76   0.19316
77   0.23405
78   0.34991
79   0.000111950

But I had to apply the following patch, because there was time stored
as seconds since 1900 in double format, which for current time gives
only about 119ns resolution and so the differences ended up as zero.

--- jitter.c.orig   2008-07-16 23:20:59.0 +0200
+++ jitter.c    2010-09-15 10:10:06.0 +0200
@@ -15,6 +15,7 @@
 #include <sys/time.h>
 #include <stdlib.h>
 #include "jitter.h"
+#include <time.h>

#define NBUF   82
#define FRAC   4294967296. /* a illion */
@@ -33,7 +34,7 @@
   char *argv[]
   )
{
-   l_fp tr;
+   l_fp tr, first;
   int i, j;
   double dtemp, gtod[NBUF];

@@ -43,11 +44,13 @@
    for (i = 0; i < NBUF; i ++)
   gtod[i] = 0;

+   get_systime(&first);
   /*
* Construct gtod array
*/
    for (i = 0; i < NBUF; i ++) {
    get_systime(&tr);
+   tr.l_i -= first.l_i;
   LFPTOD(tr, gtod[i]);
   }

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Allan deviation survey

2010-09-14 Thread David L. Mills

Miroslav,

Better recalibrate your slide rule. On a 2.8 GHz dual-core Pentium 
running OpenSolaris 10, the measured precision is -21, which works out 
to 470 ns. You claim ten times faster. What snake oil are you using for 
your processor? To check, try running the jitter.c program in the 
distribution.


Dave

Miroslav Lichvar wrote:


On Tue, Sep 14, 2010 at 02:45:01AM +, David L. Mills wrote:
 


Don't get fooled by the MINSTEP. Precision is defined by the time to
read the system clock at the user interface and I have never seen
anything less than 500 ns for that, more typically 1000 ns.
   



This is what ntpd prints here (patched to use smaller MINSTEP):
ntpd[9357]: proto: precision = 0.086 usec

I've seen values as low as 40 ns. The system has to use TSC as
clocksource though.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Allan deviation survey

2010-09-14 Thread David L. Mills

Bill,

A feedback loop that minimizes time and frequency errors is a type-2 
loop whether linear or not. NTP as specified and implemented is not 
linear either, since it can use old samples in the clock filter 
algorithm, turns into an FLL at larger poll intervals, and has an 
automatic poll-adjust mechanism. For the purposes here, only the clock 
filter is significant.


I suspect you use the precision time support in the kernel and the 
ntp_adjtime() syscall or their equivalents in Linux. This code, which is 
a descendant of code I wrote for the Alpha, implements a linear, type-2 
loop with the same impulse response and time constant as the daemon loop 
used when the precision time support is not available. Both the daemon 
and kernel loops, and yours as well, are, crudely put, lowpass filters. 
What I suspect you did was control the frequency directly and move the 
corner frequency of the PLL up to out-of-the-way values by decreasing 
the time constant exponent (shift), then doing the lowpass function 
yourself. That might even give better results than the PLL alone. 
However, the ntpd measurements are after the clock filter and before the 
kernel call, while I suspect yours are after the discipline and before the 
kernel call. The two measurements are not comparable. Look at it 
this way. At a poll interval of 16 s, the PLL reduces a given time 
offset by a factor of 256, so in fact a 1-ms offset actually causes a 
4-microsecond change in the clock phase. If that were the criterion used 
to judge performance, ntpd would look 256 times better than advertised. I 
am not here judging whether chrony is better than ntpd or not, just that 
the performance measurements be comparable and honest.
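Spelling out the factor-of-256 arithmetic in that last example (illustration only):

#include <stdio.h>

int main(void)
{
    double offset = 1e-3;          /* 1 ms measured offset */
    double gain   = 1.0 / 256.0;   /* stated PLL reduction at a 16 s poll interval */

    printf("phase correction applied: %.1f us\n", offset * gain * 1e6);
    return 0;
}

This prints about 3.9 us, the roughly 4-microsecond change mentioned above.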


Dave

unruh wrote:


On 2010-09-14, David L. Mills mi...@udel.edu wrote:
 


Miroslav,

I think we are talking right past each other. Both Chrony and NTP 
implement the clock discipline using a second-order feedback loop that 
   



chrony does not use a second-order feedback loop; it is a high order, and
variable order, feedback loop. It remembers not just the slope and
offset, as does ntp, but also past values of the errors. It is, as far
as I can tell, stable (poles in the lower half of the complex plane).
The variable order also makes it non-linear.

The high order and the non-linearity both make it very different from
ntp.

 

can minimize error in both time and frequency, although each uses a 
different loop filter. Chrony uses a least-squares technique; NTP uses a 
traditional phase-lock loop. The response of these loops is 
characterized by risetime and overshoot, or alternatively time constant 
and damping factor. If Chrony were designed to have similar risetime and 
overshoot characteristics and equivalent time constant, when operated 
under the same conditions (trace 1) it will perform in a manner similar 
to NTP. That was and is my claim.
   


..

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Allan deviation survey

2010-09-13 Thread David L. Mills

Miroslav,

I think we are talking right past each other. Both Chrony and NTP 
implement the clock discipline using a second-order feedback loop that 
can minimize error in both time and frequency, although each uses a 
different loop filter. Chrony uses a least-squares technique; NTP uses a 
traditional phase-lock loop. The response of these loops is 
characterized by risetime and overshoot, or alternatively time constant 
and damping factor. If Chrony were designed to have similar risetime and 
overshoot characteristics and equivalent time constant, when operated 
under the same conditions (trace 1) it will perform in a manner similar 
to NTP. That was and is my claim.


I read your message very carefully and conclude you have done something 
very similar to what I have. You generated phase noise from an 
exponential distribution and verified it has slope -0.5 on a 
variance-time plot, then generated random-walk frequency noise and 
verified it has slope near zero on a variance-time plot, or used some 
other equivalent technique to verify the distributions. Using trial and 
error you found appropriate factors to combine the phase and frequency 
noise to produce an Allan variance characteristic similar to trace 1. 
All this is not hard using Matlab, but you might have used something else.
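For anyone repeating that exercise, here is a minimal sketch of the overlapping 
Allan deviation computed from evenly spaced phase samples, the statistic behind 
such variance-time plots (my C illustration, not code from the NTP distribution):

#include <math.h>
#include <stdio.h>

/* overlapping Allan deviation at averaging time m*tau0, from n phase samples x[] */
double allan_dev(const double *x, int n, double tau0, int m)
{
    double sum = 0.0, tau = m * tau0;
    int i, terms = n - 2 * m;

    for (i = 0; i < terms; i++) {
        double d = x[i + 2 * m] - 2.0 * x[i + m] + x[i];
        sum += d * d;
    }
    return sqrt(sum / (2.0 * tau * tau * terms));
}

int main(void)
{
    /* toy phase data: a constant 10 PPM frequency error sampled every 16 s;
       a constant frequency error has zero Allan deviation */
    double x[64];
    int i;

    for (i = 0; i < 64; i++)
        x[i] = 10e-6 * 16.0 * i;
    printf("sigma_y(64 s) = %g\n", allan_dev(x, 64, 16.0, 4));
    return 0;
}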


The interesting thing to me is how you used that information to develop 
the claim that Chrony is far better than NTP? To support your claim, you 
would have to confront both Chrony and NTP with samples drawn from the 
resulting distribution and compare statistics.  The cumulative 
probability distributions in Chapter 6 of my book were made using the 
NTP simulator included in the NTP distribution. I assume you have 
something similar.


It would be interesting to repeat the experiment with trace 3 and NTP 
operating at a poll interval of 16 s.


Dave

Miroslav Lichvar wrote:


On Fri, Sep 10, 2010 at 10:10:08PM +, David L. Mills wrote:
 


A previous message implied that, once the Allan characteristic was
determined, it would show chrony to be better than ntpd. Be advised
the default time constant (at 64 s poll interval) was specifically
chosen to match trace 1 on the graph mentioned above.
   



Wasn't that rather for a 16 s poll interval? From simulations it seems
that the phase noise would have to be 10-30 times higher (or the
frequency noise lower, but that's unrealistic) for ntpd to perform
well at a 64 s poll interval.

 


In other
words, it is in fact optimum for that characteristic and chrony can
do no better.
   



Well, it does better. With phase noise and random-walk frequency
corresponding to trace 1 from your graph, chrony is about 5 times
better than ntpd. With 30 times higher phase noise the difference is
only on the order of tens of percent, but it's still better.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Allan deviation survey

2010-09-13 Thread David L. Mills

Miroslav,

You don't need a week for that, since the anticipated intercept is in 
the order of 200 s (trace 3). However, plots such as these are really 
susceptible to little hidden resonances, so I tend to prefer a long tail 
and lots and lots of samples. For comparison, the averaging time for PPS 
signal in the kernel is 256 s, which is close to the expected Allan 
intercept for modern systems.


Don't get fooled by the MINSTEP. Precision is defined by the time to 
read the system clock at the user interface and I have never seen 
anything less than 500 ns for that, more typically 1000 ns.
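
A quick way to estimate that number on a POSIX system (a sketch, not ntpd's
actual precision probe) is to time back-to-back clock reads:

    import time

    def clock_read_latency(samples=100000):
        # The smallest positive difference between consecutive readings is
        # a rough estimate of the time taken to read the system clock.
        best = float("inf")
        prev = time.clock_gettime(time.CLOCK_REALTIME)
        for _ in range(samples):
            now = time.clock_gettime(time.CLOCK_REALTIME)
            if 0 < now - prev < best:
                best = now - prev
            prev = now
        return best

    print("approx. clock read time: %.0f ns" % (clock_read_latency() * 1e9))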


Dave

Miroslav Lichvar wrote:


On Fri, Sep 10, 2010 at 08:48:58PM +, David L. Mills wrote:
 


Miroslav,

I've done this many times with several machines in several places
and reported the results in Chapter 12 and 6 in both the first and
second editions of my book, as well as my 1995 paper in ACM Trans.
Networking. Judah Levine of NIST has done the same thing and
reported in IEEE Transactions. He pointed out valuable precautions
when making these measurements. You need to disconnect all time
disciplines and let the computer clock free-wheel. You need to
continue the measurements for at least a week, ten times longer than
the largest lag in the plot. You need to display on log-log
coordinates and look for straight lines intersecting at what I have
called the Allan intercept. I have Matlab programs here that do that
and produce graphs like the attached.
   



For simulation and development purposes, the part of the graph I'm most
interested in is the point at which the line starts to
diverge from the -1 slope. With a good PPS signal one day of collecting
data should be enough.

 


For those that might want to repeat the experiments, see the
attached figure. Trace 1 is from an old Sun SPARC IPC; trace 2 is
from a Digital Alpha. 
   



Thanks, that's very helpful.

 


Traces 3 and 4 were generated using artificial
noise sources with parameters chosen to closely match the measured
characteristics.  Phase noise is generated from an exponential
distribution, while frequency noise is generated from the integral of 
a Gaussian distribution, in other words a random walk. Trace 4 is
the interesting one. It shows the projected performance with
precision of one nanosecond. The fastest machines I have found have
a precision of about 500 ns. Note, precision is the time taken to
read the kernel clock and is not the resolution.
   



With current CPUs the precision is well below 100 ns (thus the
MINSTEP constant used in ntpd's precision routine is too high).

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Allan deviation survey

2010-09-12 Thread David L. Mills

Bill,

Please reread the definition of Allan deviation. It is a measure of 
frequency differences, not time errors. In principle, it could be applied 
to a virtual machine with virtual timer interrupts, but nobody familiar 
with the principles would do that and it would serve no useful purpose. 
The phase noise would be huge due to processor scheduling, etc. This 
would push the Allan intercept to very large values. The NTP clock 
discipline does that automatically. So, it makes no sense at all to use 
Allan analysis in such cases.
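
For reference, the standard definition, with x the time error and ybar the
fractional frequency averaged over an interval tau:

    \sigma_y^2(\tau) = \tfrac{1}{2}\,\big\langle (\bar{y}_{k+1} - \bar{y}_k)^2 \big\rangle,
    \qquad
    \bar{y}_k = \frac{x(t_k + \tau) - x(t_k)}{\tau}

so by construction it is a statistic of frequency differences, not of the
raw time errors.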


I'm getting really tired of this discussion, and it serves no useful 
purpose. And, by the way, mail sent to your alleged mail address is 
returned to sender as undeliverable.


Dave

unruh wrote:


On 2010-09-11, David L. Mills mi...@udel.edu wrote:
 


David,

With due respect, your comment has nothing to do with the issue. Allan 
deviation is measured across a quartz crystal oscillator, timer interrupt, 
interpolation mechanism and a kernel syscall to read the clock. It has 
nothing whatsoever to do with virtual machines.
   



?? Allan deviation is a measure of the error of a clock as a function of
lag. It does NOT specify the error source. It is not simply defined for
only certain machines used in certain ways. Now it may be simple for
some systems (like your lightly loaded systems) but that is largely
irrelevant.
The purpose of doing things like measuring Allan deviation is to
understand the noise sources affecting a clock. If those happen to be
diurnal temperature variations, then that is what needs to be handled.
If it is virtual machines and their clock reading then that is what you
need to look at.


Errors are errors and understanding them is crucial to designing a
decent error mitigation procedure. Closing one's eyes to a dominant
error source will simply mean that the error mitigation procedure will
suck.

 


Dave

David Woolley wrote:

   


David L. Mills wrote:

 


Bill,

Running a precision time server on a busy public machine with a 
widely varying load is not a good idea and I have no interest in 
that. Running 
   

As indicated by the sort of questions the group is getting recently, 
it is becoming the norm to run time servers on virtual machines, 
because that is how businesses now run all their servers.  The whole 
point of virtual machines is that the host is busy and running a 
varied load!


___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Allan deviation survey

2010-09-11 Thread David L. Mills

David,

With due respect, your comment has nothing to do with the issue. Allan 
deviation is measured across a quartz crystal oscillator, timer interrupt, 
interpolation mechanism and a kernel syscall to read the clock. It has 
nothing whatsoever to do with virtual machines.


Dave

David Woolley wrote:


David L. Mills wrote:


Bill,

Running a precision time server on a busy public machine with a 
widely varying load is not a good idea and I have no interest in 
that. Running 



As indicated by the sort of questions the group is getting recently, 
it is becoming the norm to run time servers on virtual machines, 
because that is how businesses now run all their servers.  The whole 
point of virtual machines is that the host is busy and running a 
varied load!


___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Allan deviation survey

2010-09-11 Thread David L. Mills

David,

I have no idea where you are coming from. At my feet are two GPS/CDMA 
time servers running embedded Linux systems. I have two more on campus 
plus two dedicated Unix machines connected to GPS receivers. NIST has 
about a dozen dedicated time servers running FreeBSD. USNO has about a 
dozen running HP-UX. The NRC in Canada runs at least two of them, as 
does an unknown number in Europe, Japan and Australia. There is even one 
in Antarctica and occasionally one or two in space and one on the 
seafloor of the Pacific Ocean. Maybe they are abnormal in your view, but 
they are the ones I am concerned about and they are the ones the Allan 
deviation analysis is intended for. If you want to run NTP in a virtual 
machine, the performance will depend on many factors, none of which 
have anything to do with Allan deviation.


Dave

David Woolley wrote:


David L. Mills wrote:



I beg to differ. All the machines I used are PCs or similar 
workstations. They really and truly behave according to an exponential 



As you note in another reply, you seem to use them in a way that is 
abnormal for most users of NTP, i.e. as dedicated real machines in 
well controlled environments.


___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Allan deviation survey

2010-09-10 Thread David L. Mills

Miroslav,

I've done this many times with several machines in several places and 
reported the results in Chapter 12 and 6 in both the first and second 
editions of my book, as well as my 1995 paper in ACM Trans. Networking. 
Judah Levine of NIST has done the same thing and reported in IEEE 
Transactions. He pointed out valuable precautions when making these 
measurements. You need to disconnect all time disciplines and let the 
computer clock free-wheel. You need to continue the measurements for at 
least a week, ten times longer than the largest lag in the plot. You 
need to display on log-log coordinates and look for straight lines 
intersecting at what I have called the Allan intercept. I have Matlab 
programs here that do that and produce graphs like the attached.
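
A compact sketch of that analysis, assuming x holds free-running time-error
samples (in seconds) taken every tau0 seconds, e.g. against a PPS:

    import numpy as np

    def allan_deviation(x, tau0, max_m=None):
        # Overlapping Allan deviation from time-error samples spaced tau0
        # apart; returns octave-spaced (tau, adev) pairs for a log-log plot.
        x = np.asarray(x, dtype=float)
        max_m = max_m or len(x) // 4      # keep lags within 1/4 of the span
        taus, adevs = [], []
        m = 1
        while m <= max_m:
            tau = m * tau0
            d2 = x[2 * m:] - 2.0 * x[m:-m] + x[:-2 * m]
            adevs.append(np.sqrt(np.mean(d2 ** 2) / (2.0 * tau ** 2)))
            taus.append(tau)
            m *= 2
        return np.array(taus), np.array(adevs)

Straight lines fitted to the low-lag and high-lag ends of the log-log plot
then intersect at the Allan intercept.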


Most of the prior work was done 15 years ago. I strongly suspect the 
Allan intercept has moved to lower time values due to the fact that 
modern processors are faster and the interrupt latency is smaller. The 
current NTP distribution includes a NTP simulator that can be excited 
with white phase noise and random-walk frequency noise that very nicely 
models the real noise sources.


For those that might want to repeat the experiments, see the attached 
figure. Trace 1 is from an old Sun SPARC IPC; trace 2 is from a Digital 
Alpha. Traces 3 and 4 were generated using artificial noise sources with 
parameters chosen to closely match the measured characteristics.  Phase 
noise is generated from an exponential distribution, while frequency 
noise is generated from the integral of a Gaussian distribution, in other 
words a random walk. Trace 4 is the interesting one. It shows the 
projected performance with precision of one nanosecond. The fastest 
machines I have found have a precision of about 500 ns. Note, precision 
is the time taken to read the kernel clock and is not the resolution.


Dave

Miroslav Lichvar wrote:


Hi,

I'm trying to find out how a typical computer clock oscillator
performs in normal conditions without temperature stabilization or a
stable CPU load and how far it is from the ideal case which includes
only a random-walk frequency noise.

A very useful statistic is the Allan deviation. It can be used to
compare performance of oscillators, to make a guess of the optimal
polling interval, whether enabling ntpd daemon loop to use FLL will
help, how much better chrony will be than ntpd, etc.

If you have a PPS device and would be willing to run the machine
unsynchronized for a day, I'd like to ask you to measure the Allan
deviation and send it to me.

I wrote a small ncurses program that can be used with LinuxPPS to
capture the PPS samples and create an Allan deviation plot. An
overview is displayed and continuously updated while samples are
collected. Data which can be used to make an accurate graph (e.g. in
gnuplot) are written to the file specified by -p option when the
program is ended or when the 'w' key is pressed.

Available at:
http://mlichvar.fedorapeople.org/ppsallan-0.1.tar.gz

Obligatory screenshot :-)
  Allan deviation plot (span 11:09:55, skew +0.0)
[log-log Allan deviation plot; y axis 1e-09 to 1e-05, x axis (lag) 1e+00 to 1e+05 s]
w:Write   q:Quit   r:Reset   1:Skew 0.0   2:Skew +1.0   3:Skew -0.5


To make a good plot:
1. disable everything that could make system clock adjustments
2. start ./ppsallan -p adev.plot /sys/devices/virtual/pps/pps0/assert
  (change the sys file as appropriate)
3. let it collect the PPS samples for at least one day
4. hit q and send me the adev.plot file

Thanks,

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions

Re: [ntp:questions] Allan deviation survey

2010-09-10 Thread David L. Mills

Bill,

All my measurements were in temperature-controlled environments, such as 
a campus lab or home office, and the data were collected over one week. 
The temperature varied less than a degree C. However, I have data from 
Poul-Henning Kamp for a similar experiment done in summertime Denmark 
where the environment was not controlled. As expected it looked 
something less than shown by the graph attached to my previous message. 
The Allan characteristic tends to fizzle out at lags greater than about 
one-fourth the span of the measurements. Thus, if you collect samples 
for only one day the maximum lag could not be more than a few hours and 
the diurnal effects would not be apparent.


A previous message implied that, once the Allan characteristic was 
determined, it would show chrony to be better than ntpd. Be advised the 
default time constant (at 64 s poll interval) was specifically chosen to 
match trace 1 on the graph mentioned above. In other words, it is in 
fact optimum for that characteristic and chrony can do no better.


Having said that, modern machines are faster and with less phase noise, 
although with the same rotten clock oscillator. Thus, I would expect a 
modern machine to behave something like trace 3 on my plot, where the 
intercept is more like 200 s than 2000 s. From anecdotal evidence, 16 s 
is about right, but 8 s is too vulnerable to jitter in the Ethernet NICs 
and switches. NIC jitter can vary widely, even using the same Ethernet 
chip, specifically the PCNET chips that do scatter-gather on the fly and 
coalesce interrupts. I have measured jitter components from 150 ns with 
an i386 running FreeBSD to over 1 ms with a Sun Ultra running Solaris 
10. So, any performance comparisons must take these differences into account.


Dave

unruh wrote:


On 2010-09-10, Miroslav Lichvar mlich...@redhat.com wrote:
 


Hi,

I'm trying to find out how a typical computer clock oscillator
performs in normal conditions without temperature stabilization or a
stable CPU load and how far it is from the ideal case which includes
only a random-walk frequency noise.

A very useful statistic is the Allan deviation. It can be used to
compare performance of oscillators, to make a guess of the optimal
polling interval, whether enabling ntpd daemon loop to use FLL will
help, how much better chrony will be than ntpd, etc.

If you have a PPS device and would be willing to run the machine
unsynchronized for a day, I'd like to ask you to measure the Allan
deviation and send it to me.

I wrote a small ncurses program that can be used with LinuxPPS to
capture the PPS samples and create an Allan deviation plot. An
overview is displayed and continuously updated while samples are
collected. Data which can be used to make an accurate graph (e.g. in
gnuplot) are written to the file specified by -p option when the
program is ended or when the 'w' key is pressed.

Available at:
http://mlichvar.fedorapeople.org/ppsallan-0.1.tar.gz

Obligatory screenshot :-)
  Allan deviation plot (span 11:09:55, skew +0.0)
[quoted copy of the ASCII Allan deviation plot above]
w:Write   q:Quit   r:Reset   1:Skew 0.0   2:Skew +1.0   3:Skew -0.5


To make a good plot:
1. disable everything that could make system clock adjustments
2. start ./ppsallan -p adev.plot /sys/devices/virtual/pps/pps0/assert
  (change the sys file as appropriate)
3. let it collect the PPS samples for at least one day
4. hit q and send me the adev.plot file

Thanks,

   



Except you know that the typical computer clock is driven mostly by
temperature variations, and those are time of day dependent. People tend
to work during the day thus their computer works during the day, and
does not at night. Ie, there is a very strong daily cycle in the temp of
the computer. That is NOT within the Allan model, and the Allan
variation and minimum are really irrelevant with this highly
non-stochastic noise model.

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Allan deviation survey

2010-09-10 Thread David L. Mills

David,

I beg to differ. All the machines I used are PCs or similar 
workstations. They really and truly behave according to an exponential 
distribution with a small mean of a few to a few tens of microseconds. I 
have done a tedious histogram from which I can pick out the cache 
replacement, context-switch and timer interrupts. I've used both uniform 
and exponential noise models with substantially the same results. Since 
I was looking for the best performers, most of the data was collected on 
lightly loaded machines; the characteristics with a heavily loaded 
campus server are much worse.


Dave

David Woolley wrote:


Miroslav Lichvar wrote:



A very useful statistic is the Allan deviation. It can be used to
compare performance of oscillators, to make a guess of the optimal



Surely that is based on a particular model of the phase noise and the 
big argument about ntpd is that PC's don't follow that model.


___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Allan deviation survey

2010-09-10 Thread David L. Mills

Bill,

Running a precision time server on a busy public machine with a widely 
varying load is not a good idea and I have no interest in that. Running 
experiments on a dedicated, but very busy, time server such as 
rackety.udel.edu is much more interesting. As for load-induced 
temperature variations, even on a busy NTP server, the CPU is loaded to 
about five percent and the load is constant.


As for your concern about diurnal variations for any reason, that's what 
the clock discipline algorithm is for and has nothing to do with Allan 
deviation.


As for the question about the graph, it's from my book. However, there 
are examples in the Precision Time Synchronization briefing slides on 
the NTP project page at www.eecis.udel.edu/~mills/ntp.html. Be advised, 
most of those briefings are from the 1990s.


Dave

unruh wrote:


On 2010-09-10, David L. Mills mi...@udel.edu wrote:
 



Miroslav,

I've done this many times with several machines in several places and 
reported the results in Chapter 12 and 6 in both the first and second 
editions of my book, as well as my 1995 paper in ACM Trans. Networking. 
Judah Levine of NIST has done the same thing and reported in IEEE 
Transactions. He pointed out valuable precautions when making these 
measurements. You need to disconnect all time disciplines and let the 
computer clock free-wheel. You need to continue the measurements for at 
least a week, ten times longer than the largest lag in the plot. You 
need to display on log-log coordinates and look for straight lines 
intersecting at what I have called the Allan intercept. I have Matlab 
programs here that do that and produce graphs like the attached.
   



What was the load on those computers? Were they running just this time
measurement software or were they being used for real work by normal  people?

From what I have seen from machines that are in use, they heat up

during the day when people use them and cool off at night when they are
idle. The amplitude of this component depends on how much work is being
done.  This does not fit the model of random phase noise/random walk
frequency noise. It has a strong periodic component with period of a
day. I see this in most of my systems. Such a periodic noise is not part
of the noise model on which the Allan intercept is based. 


Also this assumes a very particular model of how the measurements are
made, of how the time corrections are made  and of what is desired from 
the system. 

 


Most of the prior work was done 15 years ago. I strongly suspect the 
Allan intercept has moved to lower time values due to the fact that 
modern processors are faster and the interrupt latency is smaller. The 
current NTP distribution includes a NTP simulator that can be excited 
with white phase noise and random-walk frequency noise that very nicely 
models the real noise sources.

For those that might want to repeat the experiments, see the attached 
figure. Trace 1 is from an old Sun SPARC IPC; trace 2 is from a Digital 
Alpha. Traces 3 and 4 were generated using artificial noise sources with 
parameters chosen to closely match the measured characteristics.  Phase 
noise is generated from an exponential distribution, while frequency 
noise is generated from the integral of a Gaussian distribution, in other 
words a random walk. Trace 4 is the interesting one. It shows the 
projected performance with precision of one nanosecond. The fastest 
machines I have found have a precision of about 500 ns. Note, precision 
is the time taken to read the kernel clock and is not the resolution.

   



You graph of course did not make it through to newsnet.
Have you archived the figure somewhere?

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] [Pool] 4000 packets a second?

2010-09-01 Thread David L. Mills

Dave,

Looks like you got impaled on my own bug. The intent was indeed to 
express the minimum headway, aka guard time (ntp_minpkt), in seconds, 
not as an exponent. This is so folks could specify 3 s, for instance. This 
is what the configuration code assumes. On the other hand, the minimum 
average headway (ntp_minpoll) is specified as an exponent of two, since 
this is consistent with poll interval. Both of these items are used 
elsewhere in the rate management code. However, and it must be my bug, I 
mistakenly coded it in ntp_monitor.c and ntp_proto.c as an exponent. We can 
discuss offline how to fix this simple bug. Apparently, I never did get 
around to documenting the command other than an orphan reference on the 
rate management page. That is in the process of being fixed.
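
A much-simplified sketch of the two conventions (the real logic lives in
ntp_monitor.c and ntp_proto.c; the names and the averaging below are
illustrative only):

    import time

    NTP_MINPKT = 2      # minimum headway, in seconds (guard time)
    NTP_MINPOLL = 3     # minimum average headway, as an exponent of two

    last_seen = {}      # source address -> (last arrival, smoothed headway)

    def rate_limited(addr, now=None):
        # Returns True if a packet from addr should be dropped or KoD'd.
        now = time.monotonic() if now is None else now
        prev, avg = last_seen.get(addr, (None, float(1 << NTP_MINPOLL)))
        if prev is None:
            last_seen[addr] = (now, avg)
            return False
        headway = now - prev
        avg = 0.75 * avg + 0.25 * headway   # crude exponential average
        last_seen[addr] = (now, avg)
        return headway < NTP_MINPKT or avg < (1 << NTP_MINPOLL)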


Dave

Dave Hart wrote:


On Wed, Sep 1, 2010 at 03:43 UTC, David L. Mills mi...@udel.edu wrote:
 


Dave,

The code I wrote in ntp_monitor.c has apparently been rewritten.
   



Yes, I take what I like to think is credit for rewriting
ntp_restrict.c and ntp_monitor.c.  I was unhappy with the way code and
structure definitions had been duplicated when IPv6 support was added,
and wanted to share more of the code between v4 and v6.

However, it was intended to be (and I believe it is) equivalent to the
original code functionally.

 


The MRU resolution is in seconds.
   



As part of the rewrite the resolution recorded in the MRU list was
changed to keep the list correctly ordered at a subsecond level and to
enable the iterative retrieval used by ntpq's mrulist command
(replacing ntpdc's monlist).  However, decisions are still made based
on whole-second calculations.

 


The original interpretation of  minimum was that
any headway less than this would be dropped. Setting that to zero would mean
nothing would be dropped. Apparently, the current code is contrary to the
original intent and documentation.
   



You seemingly claim the units of discard minimum 0 are seconds and
that would mean zero minimum seconds between requests.  The units are
log-base-2 seconds, as with minpoll and maxpoll.  See the
documentation you have so lovingly maintained [1].  This was true with
the old code and remains true with the rewritten code.  I have tested
4.2.6 builds on the pool server I'm involved with, and the rate
limiting behavior appears unchanged by the rewrite, except it is more
effective on a busy server where 600 monlist entries was inadequate
for rate-limiting to be enforced.

 


I didn't check to see if the probabilistic choice to preempt old entries if
the list is full remains. My earlier experience is that this is important
for the busiest servers.
   



The code is still there, but it is much less likely to come into play
with the 600-entry cap lifted.  I also remember puzzling quite a bit
over that snippet of code and the documentation for discard monitor
describing it.  I recall thinking the code did not appear to be doing
what the documentation stated.  I welcome review of discard monitor
behavior in 4.2.7, where it can be made more relevant by limiting the
size of the MRU list using mru maxdepth 100 or so.

Cheers,
Dave Hart

[1] http://www.eecis.udel.edu/~mills/ntp/html/accopt.html#discard

 


Dave

Dave Hart wrote:

On Wed, Sep 1, 2010 at 00:42 UTC, David L. Mills mi...@udel.edu wrote:


Did you intend the discard minimum 0? That effectively disables the rate
control defense mechanism. you should leave it out.


That has not been my experience on the pool server I'm involved with:

h...@pool1 fgrep discard /etc/ntp.conf
# discard minimum 0 (power of 2 like poll interval) is needed
discard minimum 0 average 3
h...@psp-fb1 ntpq -c sysstats
uptime: 1059862
sysstats reset: 1059862
packets received:   263004216
current version:144454930
older version:  99867648
bad length or format:   18635251
authentication failed:  316799
declined:   3179
restricted: 14857
rate limited:   56970859
KoD responses:  1405175
processed for time: 76220
h...@pool1 ntpdc -c sysstats
time since restart: 1059868
time since reset:   1059868
packets received:   263005578
packets processed:  76220
current version:144455895
previous version:   99867947
declined:   3179
access denied:  14857
bad length or format:   18635348
bad authentication: 316800
rate exceeded:  56971000
h...@pool1

A bit over 20% of incoming traffic has exceeded rate limits with
discard minimum 0 used (1s minimum).

Cheers,
Dave Hart

   



 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] [Pool] 4000 packets a second?

2010-08-31 Thread David L. Mills

Dave,

The code I wrote in ntp_monitor.c has apparently been rewritten. The MRU 
resolution is in seconds. The original interpretation of  minimum was 
that any headway less than this would be dropped. Setting that to zero 
would mean nothing would be dropped. Apparently, the current code is 
contrary to the original intent and documentation.


I didn't check to see if the probabilistic choice to preempt old entries 
if the list is full remains. My earlier experience is that this is 
important for the busiest servers.


Dave

Dave Hart wrote:


On Wed, Sep 1, 2010 at 00:42 UTC, David L. Mills mi...@udel.edu wrote:
 


Did you intend the discard minimum 0? That effectively disables the rate
control defense mechanism. you should leave it out.
   



That has not been my experience on the pool server I'm involved with:

h...@pool1 fgrep discard /etc/ntp.conf
# discard minimum 0 (power of 2 like poll interval) is needed
discard minimum 0 average 3
h...@psp-fb1 ntpq -c sysstats
uptime: 1059862
sysstats reset: 1059862
packets received:   263004216
current version:144454930
older version:  99867648
bad length or format:   18635251
authentication failed:  316799
declined:   3179
restricted: 14857
rate limited:   56970859
KoD responses:  1405175
processed for time: 76220
h...@pool1 ntpdc -c sysstats
time since restart: 1059868
time since reset:   1059868
packets received:   263005578
packets processed:  76220
current version:144455895
previous version:   99867947
declined:   3179
access denied:  14857
bad length or format:   18635348
bad authentication: 316800
rate exceeded:  56971000
h...@pool1

A bit over 20% of incoming traffic has exceeded rate limits with
discard minimum 0 used (1s minimum).

Cheers,
Dave Hart
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] noise and stability values

2010-08-26 Thread David L. Mills

Jaap,

Please see the Event Messages and Status Codes and the ntpq pages in the 
documentation in your release.


Dave

Jaap Winius wrote:


Hi folks,

A few years ago I started graphing my NTP server's performance. The 
machine ran Debian lenny with ntp v4.2.4. However, after recently 
upgrading to squeeze, which comes with ntp v4.2.6, I noticed that two 
system variables included in my graphs -- noise and stability -- are no 
longer present: neither in the output of ntpq -c rv, nor in its 
associated manual, ntpq.html, which is part of the ntp-doc package. Well, 
actually the noise variable wasn't mentioned in the previous version of 
the manual either (dated July 28, 2005), but it was present in the output 
from the above command.


So, can ntpq no longer be used to examine a system's noise and stability 
variables, or is it currently necessary to use a different command?


Cheers,

Jaap

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Q: No state variable in ntpd 4....@1.2089-o?

2010-08-13 Thread David L. Mills

Ulrich,

The state variable was never intended to be externally visible. It has 
changed in many ways for many reasons. The variables designed for 
external monitoring are explicitly revealed in the documentation, in 
particular the ntpq page and the Event Messages and Status Words page. 
Anything else is and will be grossly misleading.


Dave

Ulrich Windl wrote:


Hi,

in a program I wrote I used the existence of the state variable to
detect an ntpd v4. To my surprise I found a server that runs NTPv4
without a state variable:

ntpq rl
assID=0 status=0415 leap_none, sync_uhf_clock, 1 event, event_clock_reset,
version=ntpd 4@1.2089-o Mon Feb  8 15:25:35 UTC 2010 (1),
processor=i686, system=Linux/2.6.32.7-ppsbeta, leap=00, stratum=1,
precision=-20, rootdelay=0.000, rootdisp=0.141, refid=GPS,
reftime=d00f9725.13579120  Fri, Aug 13 2010 12:04:21.075,
clock=d00f972d.e89a3fc1  Fri, Aug 13 2010 12:04:29.908, peer=46241,
tc=4, mintc=3, offset=0.002, frequency=26.025, sys_jitter=0.002,
clk_jitter=0.003, clk_wander=0.000

Is that standard, or is it a patched version? While I might like the
additional variables, I wonder where the old ones went to...

Regards,
Ulrich

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Advice for a (LAN-interconnected) WLAN Testbed w/ ARM nodes

2010-08-03 Thread David L. Mills

Guys,

May I suggest you review the definitions of system offset (THETA), root 
distance (LAMBDA) and system jitter (PSI) in Section 11.2 of rfc5905.


Dave

David Woolley wrote:


Miroslav Lichvar wrote:


But there will be more clock updates. Noise in frequency may go up,
but the offset will be the same or better, unless there are network
congestions that last longer than the clock filter can handle (8 *
poll interval).

The offset may be better, but offset is not the offset from true 
time, it is the offset from the sum of the upstream server time and 
the measurement errors for the last poll.  There should be no way of 
measuring the error from UTC.  The fact that you can do so, to some 
extent, is due to the NTP algorithms being less than ideal.


___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Is a packet with stratum 1 allowed to contain a KoD code?

2010-07-25 Thread David L. Mills

Danny,


KoD packets have the leap bits set to 3 (unsynchronized); the stratum is 
not significant. The reference implementation sets the stratum to 16 for 
the RATE kiss code. However, packet stratum 16 is mapped to stratum 0 as 
visible to the monitoring function. Codes like INIT and STEP are used as 
labels for associations and used only for monitoring purposes.
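
A minimal parse of the relevant header fields, following the RFC 5905 wire
layout (a sketch, not the reference implementation):

    def classify_reply(pkt: bytes):
        # Leap indicator is the top two bits of the first octet; stratum is
        # the second octet; the reference ID occupies octets 12-15 and
        # carries the ASCII kiss code when the packet stratum is 0.
        li = pkt[0] >> 6
        stratum = pkt[1]
        refid = pkt[12:16]
        if stratum == 0:
            return "KoD %r (leap=%d)" % (refid.decode("ascii", "replace"), li)
        return "stratum %d, leap=%d, refid=%s" % (stratum, li, refid.hex())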


Dave

Danny Mayer wrote:


On 7/19/2010 8:43 AM, Christer Eriksson wrote:
 


Hi,
Is an NTP packet with stratum set to 1 ever allowed to contain a kiss of
death code? I got a server (NTPv4) that sends NTP packets with stratum 1 and
KoD codes like INIT or STEP and I fail to find a confirmation in any RFC
relating to version 4 of NTP whether this is allowed or not.

   



See RFC5905 Section 7.4 for KOD packets. This has nothing to do with
Stratum. Why would you assume that Stratum 1 is somehow exempt? INIT and
STEP are just states of the server and it is instructing the client that
it is not yet ready to deliver accurate timestamps.

Danny

 


Thanks  Best Regards
Christer Eriksson
   


___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Do logs indicate a config problem?

2010-07-16 Thread David L. Mills

Ulrich,

From context I suspect Linux has incorporated the PPS kernel discipline 
code I wrote in the 1990s. That code has several provisions to groom 
noisy PPS signals, including a median filter, popcorn spike suppressor 
and range gate. Apparently, the PPS signal in this case is very noisy 
and these provisions are doing their job.
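
An illustrative sketch of those three stages (not the kernel code; the window
size and thresholds here are invented for the example):

    from collections import deque
    from statistics import median

    RANGE_GATE = 500e-6       # reject PPS offsets beyond +/-500 us (example)
    SPIKE_FACTOR = 4.0        # popcorn spike: far outside the recent spread

    window = deque(maxlen=9)  # median filter over recent PPS offsets

    def groom_pps(offset):
        # Range gate: discard samples that cannot be a valid PPS offset.
        if abs(offset) > RANGE_GATE:
            return None
        # Popcorn spike suppressor: discard isolated outliers.
        if window:
            med = median(window)
            spread = median(abs(o - med) for o in window) or 1e-9
            if abs(offset - med) > SPIKE_FACTOR * spread:
                return None
        window.append(offset)
        return median(window)  # median-filtered offset for the discipline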


Dave

Ulrich Windl wrote:


David Lord sn...@lordynet.org writes:

 


Do logs indicate a config problem?

system  server1:  MSFa, PPSa peer=server2 and remote servers
system  server2:  peer=server1 and remote servers
system  server3:  GPSb, PPSb, server2 and server1


This is total logged over period Feb 7 to Feb 12:
ntp.log.server1:
7 Feb 19:22:39 ntpd[26820]: clock SHM(0) event 'clk_noreply' (0x01)
7 Feb 19:22:40 ntpd[26820]: clock PPS(0) event 'clk_noreply' (0x01)
7 Feb 19:27:01 ntpd[26820]: synchronized to SHM(0), stratum 0
7 Feb 19:27:01 ntpd[26820]: kernel time sync status change 2001
7 Feb 19:28:01 ntpd[26820]: synchronized to PPS(0), stratum 0
12 Feb 10:13:44 ntpd[26820]: sendto(xxx.xxx.xx.xx) (fd=27): Host is down
12 Feb 10:20:41 ntpd[26820]: ntpd exiting on signal 15


This is tiny section of log on server1 with similar entries
repeating continuously over whole period Feb 12 to Feb 14. There
were too many reboots and restarts over period for peerstats to
be useful. I swapped kernels attempting to get frequency offset
down from just over 50ppm but the autoconfig still gave 50ppm
(on NetBSD-3.1 I could set options TIMER_FREQ= to  get  10ppm
without problem).

ntp.log.server1:
13 Feb 07:07:48 ntpd[22483]: kernel time sync status change 2901
13 Feb 07:08:53 ntpd[22483]: kernel time sync status change 2101
13 Feb 07:09:59 ntpd[22483]: kernel time sync status change 2301
13 Feb 07:17:28 ntpd[22483]: kernel time sync status change 2501
13 Feb 07:18:32 ntpd[22483]: kernel time sync status change 2301
13 Feb 07:19:38 ntpd[22483]: kernel time sync status change 2901
13 Feb 07:20:41 ntpd[22483]: kernel time sync status change 2301
13 Feb 07:21:47 ntpd[22483]: kernel time sync status change 2101
13 Feb 07:22:50 ntpd[22483]: kernel time sync status change 2501
13 Feb 07:23:54 ntpd[22483]: kernel time sync status change 2101

from timex.h:
0x0001 enable pll updates
0x0002 enable pps freq discipline
0x0004 enable pps time discipline
0x0100 pps signal present
0x0200 pps signal jitter exceeded
0x0400 pps signal wander exceeded
0x0800 pps signal calibration error
   



Hi!

I'd say all 0x800 and 0x400 should never happen unless there is a severe
problem. 0x200 should also happen extremely rarely under normal
conditions. (based on my NTP/Linux experience several years ago)

Regards,
Ulrich

 


This is total logged over period Feb 14 to Feb 19:
ntp.log.server1
14 Feb 10:49:34 ntpd[169]: clock SHM(0) event 'clk_noreply' (0x01)
14 Feb 10:52:53 ntpd[169]: synchronized to xx.xxx.xx.xxx, stratum 2
14 Feb 10:52:53 ntpd[169]: kernel time sync status change 2001
14 Feb 10:58:05 ntpd[169]: synchronized to SHM(0), stratum 0
15 Feb 10:32:40 ntpd[169]: ntpd exiting on signal 15
15 Feb 10:34:00 ntpd[169]: clock SHM(0) event 'clk_noreply' (0x01)
15 Feb 10:40:27 ntpd[169]: synchronized to xx.xxx.xx.xxx, stratum 2
15 Feb 10:40:27 ntpd[169]: kernel time sync status change 2001
15 Feb 10:41:47 ntpd[169]: synchronized to xx.xxx.xx.xxx, stratum 2
15 Feb 10:42:36 ntpd[169]: synchronized to SHM(0), stratum 0


This is tiny section of log on server3 with similar entries
repeating continuously over whole period Feb 7 to Feb 19.
ntp.log.server3:
19 Feb 13:08:24 ntpd[6745]: kernel time sync error \

0x2307PLL,PPSFREQ,PPSTIME,PPSSIGNAL,PPSJITTER,NANO,MODE=0x0=PLL,CLK=0x0=A
19 Feb 13:08:39 ntpd[6745]: kernel time sync status change \
0x2107PLL,PPSFREQ,PPSTIME,PPSSIGNAL,NANO,MODE=0x0=PLL,CLK=0x0=A
19 Feb 13:13:31 ntpd[6745]: kernel time sync error \

0x2307PLL,PPSFREQ,PPSTIME,PPSSIGNAL,PPSJITTER,NANO,MODE=0x0=PLL,CLK=0x0=A
19 Feb 13:13:46 ntpd[6745]: kernel time sync status change \
0x2107PLL,PPSFREQ,PPSTIME,PPSSIGNAL,NANO,MODE=0x0=PLL,CLK=0x0=A



Otherwise timekeeping is good enough for me:

peerstats
host     period       ident          mean    rms     max
========================================================
. MSF using serial dsr with shm and pps to serial dcd with atom
server1  20100208-11  127.127.22.0    0.045   1.038   7.100  PPSa
server3  20100208-11  127.127.22.0   -0.043   0.288   4.544  PPSb
server3  20100208-11  server1         0.031   0.416   3.485
server3  20100208-11  server2         0.422   0.577   5.196
. MSF config changed to use only serial dcd with shm driver
server1  20100215-18  127.127.28.0    0.197   0.427   2.723  MSFa
server3  20100215-18  127.127.22.0    0.000  -0.002   0.021  PPSb
server3  20100215-18  server1         1.371   0.412   1.139
server3  20100215-18  server2         1.452   0.504   1.222


Offset from MSF receiver varies significantly with temperature,
around 1ms/deg based on offset of server1 seen from 

Re: [ntp:questions] monitoring symmetric peers

2010-07-12 Thread David L. Mills

Kostas,

Your symmetric peers have the same upstream system peer, so by rule they 
are not going to believe each other unless one of them switches to a 
different upstream source. Even if they have different upstream 
sources, one of them will not believe the other, since that would create 
a loop.


Dave

Kostas Magkos wrote:


Hi all,

I have two Debian etch ntp servers, running on stock 2.6.18-5-amd64
kernel with ntp 4.2@1.1585-o. The two systems are configured as
stratum 2 ntp servers, synchronised with various public stratum 1
servers. They are configured as symmetric active peers.

How can I ensure that the two servers are actually symmetric peers? When
I list their associations, I can see that each server lists the other
one as reject, outlyer or falseticker. I understand that they couldn't
be sys.peers, but shouldn't they be candindates for each other, in that
way participating in the cluster?

Thanks in advance,

Kostas Magkos

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] SNTP with 1ms of precision?

2010-07-08 Thread David L. Mills

David,

The basic definition of SNTP has not changed over the years, although 
rfc5905 does clarify the intended scope and role of primary servers, 
secondary servers and clients. It was the expected, but not required, 
model that the Unix adjtime() system call be used if the offset was less 
than an unspecified value and the settimeofday() if greater. There was 
no intent, either in the earlier SNTP specifications or rfc5905, to 
specify the SNTP clock discipline algorithm itself.
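
A sketch of that expected-but-not-required client model; the 0.128 s step
threshold is an assumed value (the specifications leave it open) and the
system calls are only named in the comments:

    STEP_THRESHOLD = 0.128    # seconds; assumed, not taken from any RFC

    def slew(offset):
        print("would call adjtime() to amortize %+.3f s" % offset)

    def step(offset):
        print("would call settimeofday() to step the clock by %+.3f s" % offset)

    def apply_offset(offset):
        # Small offsets are slewed, large ones are stepped.
        (slew if abs(offset) < STEP_THRESHOLD else step)(offset)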


Dave

David Woolley wrote:


Danny Mayer wrote:


On 6/16/2010 5:22 PM, Maarten Wiltink wrote:


Marcelo Pimenta marcelopiment...@gmail.com wrote in message
news:aanlktilq6m8apeoasibr-o8mhwifqkfv9xyf6mudr...@mail.gmail.com...

[...]


The NTP algorithm is much more complicated than the SNTP algorithm.


The short, short version: there is no SNTP algorithm. SNTP is NTP
_without_ the algorithms. Using NTP means continuously adjusting the
speed of your clock so it tracks real time as best you can make it,
while SNTP is simply asking what time [they think] it is.



This is a totally inaccurate statement. See RFC 5905 Section 14. SNTP is



That RFC was published after this thread was started! You can't go 
changing the definitions just for you convenience.  Even if it had 
been published, say six months earlier, the reality is that de facto 
and historic definitions would still dominate the market.



merely a subset of the full NTP protocol. An SNTP server is one with
its own refclock and not dependent on any other upstream servers, while
an SNTP client is one with a single upstream server and no dependent
clients. An SNTP client or an SNTP server should be disciplining the
clock in the same way as an NTP server. An SNTP server should
continuously adjust the speed of its clock, otherwise it's not SNTP
compliant.



In reality, most SNTP clients step the clock.  A few may use a simple 
frequency control scheme.  Once you go much beyond that, it becomes 
simpler to use a full NTP client, but maybe configure only one server.


In fact, RFC 1305 doesn't require any specific clock discipline for 
NTPv3 clients; that is in an appendix, rather than in the main 
specification.


The important clarification about SNTP, ignoring any recent attempt to 
redefine it, is that it doesn't specify an algorithm, rather than that 
it requires the use of only a trivial algorithm.


___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Clock and Network Simulator

2010-07-01 Thread David L. Mills

Miroslav,

You have not revealed the result of the experiment I suggested, so I 
don't know whether the Linux kernel performs as expected with the 
original design parameters.


I think we are done with this discussion. The kernel discipline loop is 
conservatively designed according to sound engineering practice. This is 
the practice in the systems engineering course I taught over several 
years. You are invited to obstruct that practice to your own ends, but 
not in the public distribution.


Dave

Miroslav Lichvar wrote:


On Wed, Jun 30, 2010 at 10:00:06PM +, David L. Mills wrote:
 


Is there somebody around here that understands feedback control
theory? You are doing extreme violence to determine a really simple
thing, the discipline loop impulse response. There is a much simpler
way.
   



It was a demonstration of what clknetsim can do. You may be able
to predict the result, but I'm not. I think being able to verify a
theory with simulations is always a good thing.

 


Of particular importance is the damping factor, which is evident
from the overshoot. If SHIFT_PLL is radically changed, I would
expect the overshoot to be replaced by an exponentially decaying
ring characteristic.
   



That's not what I see in tests on real hw and simulations with
SHIFT_PLL 2.

 


The change in SHIFT_PLL would result
in unstable behavior below 5 (32 s), as well as serious transients
if the discipline shifts from the daemon to the kernel and back. All
feedback loops become unstable unless the time constant is at least
several times the frequency update interval, which in this case is
one second. If you do want to explore how stability may be affected,
restore the original design and recompile the distribution with
NTP_MINPOLL changed from 3 to 1.
   



Is poll 1 SHIFT_PLL 4 really equal to poll 3 SHIFT_PLL 2 in this
respect? If you can provide information on how to demonstrate the
instability with SHIFT_PLL 2 and normal polls, it'll be much easier to
convince the kernel folks to change it back to 4.

With polls 3-10 and SHIFT_PLL 2, the only instability I've seen is
with very long update intervals (e.g. when the network connection
repeatedly goes up and down): the frequency will eventually start
jumping between +500 and -500 ppm. But kernel loop with SHIFT_PLL 4
and daemon loop with small poll intervals have the same problem, the
threshold is just 4 times higher for them.

clknetsim has a pll_clamp option which can be enabled to avoid this
instability, it clamps the PLL update interval to
tc * (1 << (ntp_shift_pll + 1)), where tc is the time constant in
seconds. I will be doing more testing with it and possibly propose to
include a similar code in the kernel.

As for runtime switching between daemon and kernel discipline, I
haven't tried that. I didn't even know it is supported by ntpd.

 


To fix the original problem reported to me, change the frequency
gain (only) by the square of  100 divided by the new clock frequency
in Hz. For instance, to preserve the loop dynamics with a 1000-Hz
clock, divide the frequency gain parameter by 100. In the original
nanokernel routine ktime.c at line 60 there is a line

SHIFT_PLL * 2 + time_constant.

Replacing it by SHIFT_PLL * 20 + time_constant would fix the
progblem for 1000-Hz clocks.
   



I'm not a kernel developer, but I think this is already fixed. Current
kernels can be configured to use a dynamic HZ (CONFIG_NO_HZ aka
tickless mode), so the ntp code had to be rewritten to allow such
operation. With SHIFT_PLL 4, the response and the overshoot is exactly
as you describe it should be.

BTW, the effect of changing SHIFT_PLL to 2 on clock accuracy in
various network conditions is shown here:

http://fedorapeople.org/~mlichvar/clknetsim/test1_exp.png

With poll 6 and 10ppb/s wander, the crossover is around 10ms jitter.
With larger jitters SHIFT_PLL 2 can be up to 2 times worse (it seems
this can't be improved by lowering the poll interval) and with very
small jitters it can be about 50 times better.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Clock and Network Simulator

2010-07-01 Thread David L. Mills

Miroslav,

Exactly as expected. The overshoot exceeds the design limit of 10 
percent by as much as 40 percent. That's exactly what the design is 
intended to avoid.


Dave

Miroslav Lichvar wrote:


On Wed, Jun 30, 2010 at 10:00:06PM +, David L. Mills wrote:
 


The change in SHIFT_PLL would result
in unstable behavior below 5 (32 s), as well as serious transients
if the discipline shifts from the daemon to the kernel and back. All
feedback loops become unstable unless the time constant is at least
several times the frequency update interval, which is this case is
one second. If you do want to explore how stability may be affected,
restore the original design and recompile the distribution with
NTP_MINPOLL changed from 3 to 1.
   



I recompiled ntpd with NTP_MINPOLL 1 and here is a PLL response plot
for SHIFT_PLL 4 poll 1 and SHIFT_PLL 2 poll 3:

http://fedorapeople.org/~mlichvar/clknetsim/test6.png

The initial offset is 0.1 second, after crossing zero offset, they
both stay in negative.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Clock and Network Simulator

2010-07-01 Thread David L. Mills

Rob,

With due respect, I don't think you know what you are talking about. The 
original discipline loop described in RFC 1305 was refined as described 
in my 1995 paper and further refined over the years since then. For each 
and every refinement a series of tests, both in simulation and in situ, 
were performed with initial frequency offsets up to +-500 PPM and time offsets up to +-100 ms to 
verify correct behavior with the original parameters. The daemon loop is 
required to operate over a time constant between 8 s and 36 h, which is 
an extremely large range as verified by ongoing configurations here.


The kernel loop is designed to replicate the daemon loop over a much 
narrower range between 8 s and 1024 s. The ideal poll interval is between 16 and 
64 s, matching the Allan intercept as described in the literature. It was 
first implemented for the Alpha in 1992 and refined as described in my 
1995 paper and not changed since then. Correct behavior must be 
confirmed by an experiment such as I suggested in a previous message. 
Sound engineering principles project that the behavior at other time 
constants will be as I described. If you don't believe those principles, 
you are not exercising sound engineering judgment.


Dr. Dave

Rob wrote:


Miroslav Lichvar mlich...@redhat.com wrote:
 


On Wed, Jun 30, 2010 at 10:00:06PM +, David L. Mills wrote:
   


Is there somebody around here that understands feedback control
theory? You are doing extreme violence to determine a really simple
thing, the discipline loop impulse response. There is a much simpler
way.
 


It was a demonstration of what clknetsim can do. You may be able
to predict the result, but I'm not. I think being able to verify a
theory with simulations is always a good thing.
   



Mr Mills is of the school that says the design predicts that the
program behaves like that, and the implementation has not changed
for 10 years, so it must be correct.  While I normally adhere to the
same principles, it happened just a week or two ago that he had to
admit that there was a bug in code that he firmly believed was
correct and that an observed behaviour could not possibly happen.

So I agree with you that it never hurts to test something that theory
has already proved to be correct.  Maybe the actually released program
does not really implement the mechanism that was designed, without
the programmer knowing it.

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Clock and Network Simulator

2010-07-01 Thread David L. Mills

Rob,

Your comment makes no sense. The actual code implemented from my design 
was tested in Solaris and also in FreeBSD. In both cases the tests 
confirmed the behavior described previously. I have not tested it in 
Linux. If it performs other than as I described, the port is broken.


Dave

Rob wrote:


David L. Mills mi...@udel.edu wrote:
 


Rob,

With due respect, I don't think you know what you are talking about.
   



Read it again.  I don't question your design, I question your claims
that the code implements the design, claims which are upheld even when the
contrary is shown in observations.
Even when your design is perfect and your belief is firm, there can
still be bugs.

___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Clock and Network Simulator

2010-06-30 Thread David L. Mills

Miroslav,

Is there somebody around here that understands feedback control theory? 
You are doing extreme violence to determine a really simple thing, the 
discipline loop impulse response. There is a much simpler way.


Forget everything except the tools that come with the NTP distribution. 
Find a good, stable server and light up a selected client. Make sure the 
client kernel is enabled. Set minpoll and maxpoll to 6. Configure the 
loopstats monitoring function. Run the client until operation stabilizes 
as determined by the loopstats data.


While the daemon is running, use ntptime to set the clock offset to 100 
ms. Go away and do something useful for a couple of hours.


Inspect the loopstats data. It should start at 100 ms, exponentially 
decay to zero in about 3000 s, overshoot about six percent of the 
initial offset, then slowly decrease to zero over a period of hours. 
This is the intended nominal behavior for a poll interval (same as time 
constant) of 6. If you increase (decrease) the poll interval by one, the 
impulse response will look the same, but at double (half) the time 
scale. This should hold true for poll intervals from 3 to 10.
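
A small helper for inspecting the loopstats data from such an experiment
(assuming the usual column layout: MJD, seconds past midnight, offset in
seconds, frequency in ppm, then jitter, wander and time constant):

    def read_loopstats(path):
        # Yields (elapsed seconds, offset seconds) pairs, suitable for
        # checking the decay time and overshoot described above.
        t0 = None
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) < 3:
                    continue
                t = int(fields[0]) * 86400.0 + float(fields[1])
                t0 = t if t0 is None else t0
                yield t - t0, float(fields[2])

    # e.g. offsets = list(read_loopstats("/var/log/ntpstats/loopstats"))
    # (the path is only an example; use wherever loopstats is written)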


Of particular importance is the damping factor, which is evident from 
the overshoot. If SHIFT_PLL is radically changed, I would expect the 
overshoot to be replaced by an exponentially decaying ring 
characteristic. With the intended loop constants the behavior should 
have a single overshoot characteristic in the order of a few percent. 
From a mathematical and engineering point of view the intended behavior 
provides the fastest convergence, relative to the chosen time constant, 
with only nominal overshoot.


If the intended effect of the SHIFT_PLL change was to decrease the 
convergence time, that is the absolute worst thing to do. The nanokernel 
allows the time constant to range from zero to ten and carefully scales 
the state variables to match, although it (and the daemon discipline) 
starts to become unstable at values below 3, the minimum enforced by the 
daemon. The change in SHIFT_PLL would result in unstable behavior below 
5 (32 s), as well as serious transients if the discipline shifts from 
the daemon to the kernel and back. All feedback loops become unstable 
unless the time constant is at least several times the frequency update 
interval, which in this case is one second. If you do want to explore 
how stability may be affected, restore the original design and recompile 
the distribution with NTP_MINPOLL changed from 3 to 1.


Now to the issue of multiple tandem server/clients. You don't need to 
explore the behavior; it can be reliably predicted. Assume the server 
and all downstream clients are started at the same time. The impulse 
response of  the first downstream client of the original client 
operating as a server is the convolution of the original impulse 
response with itself. Roughly speaking, the offset decay is slower, 
reaching zero in twice the original time or about 6000 s. The behavior 
of the next downstream client is the convolution of this convolution and 
the original impulse response and so on.
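
That prediction is easy to reproduce numerically; a sketch using a stand-in
single-client impulse response h (any measured or simulated trace will do):

    import numpy as np

    h = np.exp(-np.arange(200) / 16.0)    # stand-in decay, one point per update
    h /= h.sum()                          # normalize so the area is preserved

    stratum2 = np.convolve(h, h)          # first downstream client
    stratum3 = np.convolve(stratum2, h)   # next one down, and so on

    # The convolved responses settle progressively more slowly, roughly
    # twice as slowly after the first stage, as described above.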


To fix the original problem reported to me, change the frequency gain 
(only) by the square of  100 divided by the new clock frequency in Hz. 
For instance, to preserve the loop dynamics with a 1000-Hz clock, divide 
the frequency gain parameter by 100. In the original nanokernel routine 
ktime.c at line 60 there is a line


SHIFT_PLL * 2 + time_constant.

Replacing it by SHIFT_PLL * 20 + time_constant would fix the problem 
for 1000-Hz clocks.
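
Worked out for the 1000-Hz case:

    # Frequency-gain scaling suggested above: the square of 100 divided by
    # the new clock frequency in Hz.
    HZ = 1000
    scale = (100.0 / HZ) ** 2
    print(scale)   # 0.01, i.e. divide the frequency gain by 100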


Dave

Miroslav Lichvar wrote:


On Tue, Jun 29, 2010 at 06:31:01PM +, David L. Mills wrote:
 


From your description your simulator is designed to do something
else, but what else is not clear from your messages. It might help
to describe an experiment using your simulator and show what results
it produces.
   



It's designed to test NTP implementations, but it uses a more general
approach.

Ntpdsim tests ntpd as an NTP client with simulated NTP servers in a
simulated network. Clknetsim doesn't simulate NTP servers, it
simulates only a network to which are connected real NTP clients and
servers.

The difference is that ntpdsim tests one NTP client and clknetsim
tests whole NTP network.

Say we want to test how the Linux SHIFT_PLL change affects an NTP
network. A chain of seven ntpd daemons is configured, all using
poll 6. Strata 1, 3, 5, 7 have SHIFT_PLL 2 and strata 2, 4, 6 have
SHIFT_PLL 4. Stratum 1 has a clock with zero wander and frequency offset
and is using the LOCAL driver; the rest have clocks with 1 ppb/s
wander. Between all nodes there is a network delay with an exponential
distribution and a constant jitter. The simulations are repeated with
the jitter starting from 10 microseconds and increasing to 0.1 second in 28
steps. Each simulation is 400 seconds long and the result is a
list of RMS offsets, one for each stratum.
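
A delay model of this general shape, a constant floor plus an
exponentially distributed component, can be sketched as follows; the
function and parameter names are placeholders, not clknetsim's:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* One-way network delay: constant floor plus an exponential tail whose
 * mean is the configured jitter.  Illustrative names and values only. */
static double net_delay(double floor_s, double jitter_s)
{
    double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);   /* u in (0,1) */
    return floor_s - jitter_s * log(u);
}

int main(void)
{
    for (int i = 0; i < 5; i++)
        printf("%f\n", net_delay(0.0001, 0.00001));  /* 100 us + exp(10 us) */
    return 0;
}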

After finishing all iterations, we'll make an RMS

Re: [ntp:questions] Clock and Network Simulator

2010-06-30 Thread David L. Mills

Bill,

The ntpdsim simulator uses the real system clock, which is modeled as 
the Allan variance. However, the server clock is modeled as a 
random-walk process computed as the integral of a Gaussian process. The 
network is modeled as an exponential distribution, although provisions 
have been made to model transients in the form of step changes.
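
A minimal sketch of such a random-walk clock model (illustrative
constants, not the simulator's code) integrates white Gaussian noise
into the frequency and the frequency into the offset:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double gauss(void)                  /* Box-Muller, N(0,1) */
{
    const double two_pi = 6.283185307179586;
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(two_pi * u2);
}

int main(void)
{
    double freq = 0.0, offset = 0.0;       /* s/s and s */
    const double wander = 1e-9;            /* frequency wander per step */

    for (int t = 0; t < 3600; t++) {       /* one-second steps */
        freq   += wander * gauss();        /* random walk in frequency    */
        offset += freq;                    /* offset integrates frequency */
    }
    printf("offset after one hour: %.9f s\n", offset);
    return 0;
}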


Dave

unruh wrote:


On 2010-06-30, David Woolley da...@ex.djwhome.demon.invalid wrote:
 


Miroslav Lichvar wrote:

   


and is using the LOCAL driver, the rest have clocks with 1ppb/s
wander. Between all nodes is network delay with exponential
distribution and a constant jitter. The simulations are repeated with
 

Real world NTP networks don't behave like that, and most of the things 
that annoy people relate to the real world behaviour.  Real networks are 
subject to diurnal and near step changes in frequency.
   



? It seems you are discussing the behaviour of the clocks, not of the
network. (the network has no frequency). His simulator works with real
clocks which have exactly the behaviour you describe. Now it may be that
the network model is not the best ( random long delays in one way trip
time due to network overload).

 


Dr Mills modelling also has this problem.
   






___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Clock and Network Simulator

2010-06-30 Thread David L. Mills

Guys,

Mercy me! I lied through my incisors. Upon review, the NTP simulator 
simulates the system interfaces to set, adjust and read the hardware 
system clock, so the actual hardware is not involved. It's been several 
years since I used the simulator and it has been updated since then.


Dave

David L. Mills wrote:


Bill,

The ntpdsim simulator uses the real system clock, which is modeled as 
the Allan variance. However, the server clock is modeled as a 
random-walk process computed as the integral of a Gaussian process. 
The network is modeled as an exponential distribution, although 
provisions have been made to model transients in the form of step 
changes.


Dave

unruh wrote:


On 2010-06-30, David Woolley da...@ex.djwhome.demon.invalid wrote:
 


Miroslav Lichvar wrote:

  


and is using the LOCAL driver, the rest have clocks with 1ppb/s
wander. Between all nodes is network delay with exponential
distribution and a constant jitter. The simulations are repeated with



Real world NTP networks don't behave like that, and most of the 
things that annoy people relate to the real world behaviour.  Real 
networks are subject to diurnal and near step changes in frequency.
  



? It seems you are discussing the behaviour of the clocks, not of the
network. (the network has no frequency). His simulator works with real
clocks which have exactly the behaviour you describe. Now it may be that
the network model is not the best ( random long delays in one way trip
time due to network overload).

 


Dr Mills modelling also has this problem.
  






___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Clock and Network Simulator

2010-06-29 Thread David L. Mills

Miroslav,

You don't need to read the code, just the documentation. The ntpdsim is 
what the literature calls a probabilistic discrete event simulator 
(PDES). It provides a virtual environment including multiple servers and 
simulated network behavior within which the actual ntpd code runs. 
Everything, including packet send and receive, is simulated independent 
of the operating system. It runs hundreds of times faster than the real 
system. Its intent is to study the synchronization behavior under 
low-probability and long-baseline conditions. It is not a simple system; 
it simulates the actual conditions in a real-world, multiple-server 
environment and provides the same error reports, statistics reports and 
synchronization data as the real system.


From your description your simulator is designed to do something else, 
but what else is not clear from your messages. It might help to describe 
an experiment using your simulator and show what results it produces.


Dave

Miroslav Lichvar wrote:


On Mon, Jun 28, 2010 at 05:35:34PM +, David L. Mills wrote:
 


How is your simulator different than the one included in the NTP
software distribution?
   



If I read the code correctly, ntpdsim simulates inside ntpd a
minimalistic NTP server (or multiple servers) with configured wander
and network delay, and an ideal local clock.

clknetsim simulates clocks and a network to which unmodified ntpd
daemons are connected. The simulation is transparent for them. With
symbol preloading (dynamic linker's LD_PRELOAD variable) they don't
use the real system calls like sendto(), recvfrom(), select(),
gettimeofday(), ntp_adjtime(), but they use the symbols provided by
clknetsim instead. The system calls are passed to the clknetsim
server, which synchronizes all events, adjusts the clocks, forwards
NTP packets and monitors the real time and frequency offsets of the
virtual clocks.
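
The interception idea can be sketched in a few lines; this is only an
illustration of the LD_PRELOAD technique, not clknetsim's code, and a
real interposer would forward the calls to the simulation server and
cover the socket and adjtime calls as well:

/* Build:  cc -shared -fPIC -o fakeclock.so fakeclock.c
 * Run:    LD_PRELOAD=./fakeclock.so some_program
 * The preloaded clock_gettime() answers from a simulated clock
 * instead of the system clock. */
#include <time.h>

static long long fake_ns = 1277856000000000000LL;    /* arbitrary epoch */

int clock_gettime(clockid_t clk_id, struct timespec *tp)
{
    (void)clk_id;                      /* one simulated clock for all ids */
    tp->tv_sec  = fake_ns / 1000000000LL;
    tp->tv_nsec = fake_ns % 1000000000LL;
    fake_ns += 1000;                   /* advance simulated time by 1 us  */
    return 0;
}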

So, for a simulation similar to ntpdsim, at least two ntpd daemons
have to be connected to clknetsim, one configured as a server with
LOCAL clock (adding a refclock source is on my todo list) and a client
which will be the subject of testing. 


The advantage of this approach is that it allows you to test pretty
much anything that can be tested in a real network: a chain of NTP
servers (a few such tests are on the clknetsim webpage), server/peer
modes, broadcast modes, authentication, compatibility with older
versions or different NTP implementations, etc.

The disadvantage is that the simulation runs slower.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Reference clock driver for /dev/rtc

2010-06-28 Thread David L. Mills

Miroslav,

Changing the value of  SHIFT_PLL from 4 to 2 makes the kernel discipline 
behave quite differently from the daemon discipline. When switching from 
one to the other, the result will be a serious transient. With a kernel 
update interval of one second, this value violates the minimum delay 
requirement of the feedback loop. The design was very carefully done 
according to sound engineering theory and practice. The design insures 
optimum rise time relative to the time constant along with a controlled 
overshoot of about five percent. The change to SHIFT_PLL will compromise 
the intended behavior. If you make changes like that, be sure someone is 
around with knowledge of feedback control theory.


The frequency gain problem was reported some time ago and I provided a 
simple fix which reduces the frequency gain by a factor of the square of 
the ratio of the actual clock frequency to 100 Hz. It has nothing to do 
with a tickless kernel.


The extra pole in the adjtime() emulation was reported to me some time 
ago and might have since been removed. The result with the extra pole 
will be underdamping at large poll intervals resulting in oscillatory 
behavior. It is most serious where the adjtime() pole frequency is close 
to the discipline poll frequency. If it makes any difference, SGI has 
the same problem.


Linux users should be told that the incompatibility with the 
ntp_gettime() call means the TAI-UTC offset feature provided by the 
NIST leapseconds file and the Autokey protocol will be unavailable. This 
is most important for NASA/JPL users when converting to and from UTC and 
TAI and eventually to TDB for deep space missions.
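
On Linux specifically, an application can still read the kernel TAI-UTC
offset through the timex.tai field of adjtimex(), assuming ntpd has
installed it from the leapseconds file or Autokey; this sketch is
illustrative and not part of the NTP distribution:

#include <stdio.h>
#include <sys/timex.h>

int main(void)
{
    struct timex tx = { .modes = 0 };      /* read-only query */

    if (adjtimex(&tx) == -1) {
        perror("adjtimex");
        return 1;
    }
    printf("kernel TAI-UTC offset: %d s\n", tx.tai);   /* 0 if never set */
    return 0;
}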


Dave

Miroslav Lichvar wrote:


On Sat, Jun 26, 2010 at 03:07:33PM +, David L. Mills wrote:
 


Another case in which the engineering model in Linux and NTP are not
compatible. Neither is necessarily wrong, just different. The
following issues are known to me.

1. The Linux kernel discipline code adapted from my Alpha code of
the 1990s does not account for the frequency gain at other than a
100-Hz clock. With a 1000-Hz clock this results in serious
instability. I pointed this out some years ago and it is a trivial
modification, but so far as I know it has not been fixed.
   



I think this was fixed a few years ago when the tickless mode was
introduced in the kernel.

However, current kernels have one compile time constant set
differently from the standard implementation, the PLL gain shift
(SHIFT_PLL) is 2 instead of 4.

BTW, on the clknetsim page I announced in my previous post there are
some tests done with both SHIFT_PLL constants.

 


2. The Linux adjtime mechanism inserts an extra pole in the impulse
response, presumably to speed convergence when relatively large
adjustments are made. This makes NTP unstable at the larger poll
intervals when the kernel discipline is not in use. Both the kernel
and daemon discipline loops are carefully designed according to
sound engineering principles for optimum response, but the extra
pole defeats the design.
   



In which Linux version was this, or how large did the adjustment need to
be? I don't remember ever seeing it, and the ntpd daemon mode works
fine as far as I can tell.

 


3. The calling sequence for the ntp_gettime() system call is
incompatible with current use. As a result, access to the TAI-UTC
offset by application programs is not available.
   



This probably won't be fixed as it would break glibc compatibility.

 



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Reference clock driver for /dev/rtc

2010-06-26 Thread David L. Mills

Bill,

Another case in which the engineering model in Linux and NTP are not 
compatible. Neither is necessarily wrong, just different. The following 
issues are known to me.


1. The Linux kernel discipline code adapted from my Alpha code of the 
1990s does not account for the frequency gain at other than a 100-Hz 
clock. With a 1000-Hz clock this results in serious instability. I 
pointed this out some years ago and it is a trivial modification, but so 
far as I know it has not been fixed.


2. The Linux adjtime mechanism inserts an extra pole in the impulse 
response, presumably to speed convergence when relatively large 
adjustments are made. This makes NTP unstable at the larger poll 
intervals when the kernel discipline is not in use. Both the kernel and 
daemon discipline loops are carefully designed according to sound 
engineering principles for optimum response, but the extra pole defeats 
the design.


3. The calling sequence for the ntp_gettime() system call is 
incompatible with current use. As a result, access to the TAI-UTC offset 
by application programs is not available.


4. As in the current instance, management of the RTC and system clock is 
incompatible. This issue should be reviewed in the context of the 
various models, whether the kernel or daemon discipline is in use and 
whether the system is awake or sleeping.


There are probably others and they probably could be resolved to insure 
a consistent model between Linux and other operating systems.


Dave

unruh wrote:


On 2010-06-23, David L. Mills mi...@udel.edu wrote:
 


Pavel,

Linux has many, many times broken the NTP model compatible with other 
systems such as Solaris and FreeBSD, among others. I have no trouble 
with that as long as whatever modifications are required in NTP to make 
the RTC driver work remain proprietary to Linux and never leak to other 
systems. I have no idea what the Linux 11-minute process is about, but 
it probably conflicts with the NTP 1-hour RTC alignment.
   



Linux, depending on the setting of a flag in the adjtimex setup, sets
the rtc from the system time once every 11 min. This is a disaster if
you have a procedure to discipline the rtc (e.g. hwclock, or chrony), and
the synchronization flag must be kept unset to prevent this behaviour.
On most systems the rtc is used to set the clock when the computer is
down, i.e. the rtc in those cases CANNOT be disciplined. All you can do
is determine the offset and drift rate of the rtc to make its use as
accurate as possible when the system comes up again. It is very hard to
determine the drift rate of a clock that keeps getting reset.
To then use the rtc mechanism in a VM seems to me to be overloading the
mechanism, making it hard to do anything reasonable with it. 
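
The flag in question can be manipulated through adjtimex(); a minimal
sketch (not hwclock's or chrony's code) that marks the kernel clock
unsynchronized, and so keeps the 11-minute RTC update from running,
follows. Note that ntpd clears the bit again once it synchronizes, so a
driver relying on this has to repeat it:

#include <stdio.h>
#include <sys/timex.h>

int main(void)
{
    struct timex tx = { .modes = 0 };

    if (adjtimex(&tx) == -1) {             /* read current status bits */
        perror("adjtimex");
        return 1;
    }
    tx.modes   = ADJ_STATUS;
    tx.status |= STA_UNSYNC;               /* clock "unsynchronized":  */
    if (adjtimex(&tx) == -1) {             /* no 11-minute RTC writes  */
        perror("adjtimex");
        return 1;
    }
    return 0;
}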



 

If your driver requires the Linux model, whatever modifications are 
required in the base code (#ifdefs) will not be supported here and may 
conflict with future developments. On the other hand, it could be, for 
example, that the RTC provide a 1-second interrupt similar to the PPS 
signal now. On that assumption the base code might have a feature that 
supports the RTC in much the same way the NMEA driver does now. That 
would be a generic solution and nicely fit the NTP.


Dave

Krejci, Pavel wrote:

   


Hello Dave,


   
   From: David L. Mills [mailto:mi...@udel.edu]
   Sent: Wednesday, June 23, 2010 4:42 AM
   To: Krejci, Pavel
   Cc: questions@lists.ntp.org
   Subject: Re: [ntp:questions] Reference clock driver for /dev/rtc

   Pavel,

   It's not as simple as that. Normally, ntpd uses settimeofday()
   once per hour to set the system clock, which has the side effect
   of setting the RTC. Obviously, you don't want that. If the RTC
   refclock is enabled, that has to be disabled, so some kind of
   interlock must be devised. This can be a tricky business and have
   unintended consequences if something or other fails.  The
   interlocks with the PPS signal come to mind.   
   Do you mean the 11 minute mode in Linux, when the system time is

   periodically written to the rtc in 11 minute intervals? This is
   triggered by the synch status (time_status variable in the
   kernel). I've solved this by periodically resetting this synch
   status in my refclock driver.

   You are correct in that the RTC has in general far better
   temperature compensation than either the system clock or the
   TSC/PCC counter. However, its resolution is generally far worse.
   Even so, the lowpass character of the clock discipline masks this
   so actual delivered system time should be quite good. Chapter 15
   of my new book due in September contains an extensive discussion
   on these issues.
    Theoretically the worst RTC resolution is 1 second, but usually it
    offers an update IRQ whenever the seconds counter changes. And this
   gives good resolution for my system. Attached is the /dev/rtc
   peerstats from my qemu guest

Re: [ntp:questions] Reference clock driver for /dev/rtc

2010-06-26 Thread David L. Mills

Kalle,

Calling settimeofday() is completely transparent to the kernel and ntpd 
state variables, including the UNSYNC bit; however, the actions in Linux 
might violate this design. Setting the RTC is a byproduct of 
settimeofday(), but in general setting the time to the current time is a 
no-op, at least to within 800 microseconds in a 1986 SPARC IPC running 
SunOS 4, but much less in modern times. See the comments at about line 
228 in ntp_util.c; note the code is enabled by the DOSYNCTODR define.


If there is a more generic way to set the RTC over all or most operating 
systems, it should be reconsidered. The Linux folks are invited to 
contribute #ifdefs as necessary. As it is, the current code goes back to 
SunOS circa 1986.


Dave

Kalle Pokki wrote:


On Sat, Jun 26, 2010 at 11:48, David Woolley
da...@ex.djwhome.demon.invalid wrote:

 


I think you missed Dave Mills' point that ntpd does this every 60 minutes,
so will also break mechanisms for compensating for RTC drift whilst the
processor is powered down.
   



I don't understand.

Settimeofday() isn't about updating the RTC. It updates the system
clock. Why would ntpd call that regularly and cause unnecessary jumps
in system time?

Calling settimeofday() also clears all NTP state variables inside the
kernel and sets the UNSYNC bit.



___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions


Re: [ntp:questions] Reference clock driver for /dev/rtc

2010-06-24 Thread David L. Mills

Pavel,

Linux has many, many times broken the NTP model compatible with other 
systems such as Solaris and FreeBSD, among others. I have no trouble 
with that as long as whatever modifications are required in NTP to make 
the RTC driver work remain proprietary to Linux and never leak to other 
systems. I have no idea what the Linux 11-minute process is about, but 
it probably conflicts with the NTP 1-hour RTC alignment.


If your driver requires the Linux model, whatever modifications are 
required in the base code (#ifdefs) will not be supported here and may 
conflict with future developments. On the other hand, it could be, for 
example, that the RTC provide a 1-second interrupt similar to the PPS 
signal now. On that assumption the base code might have a feature that 
supports the RTC in much the same way the NMEA driver does now. That 
would be a generic solution and nicely fit the NTP.


Dave

Krejci, Pavel wrote:


Hello Dave,
 



From: David L. Mills [mailto:mi...@udel.edu]
Sent: Wednesday, June 23, 2010 4:42 AM
To: Krejci, Pavel
Cc: questions@lists.ntp.org
Subject: Re: [ntp:questions] Reference clock driver for /dev/rtc

Pavel,

It's not as simple as that. Normally, ntpd uses settimeofday()
once per hour to set the system clock, which has the side effect
of setting the RTC. Obviously, you don't want that. If the RTC
refclock is enabled, that has to be disabled, so some kind of
interlock must be devised. This can be a tricky business and have
unintended consequences if something or other fails.  The
interlocks with the PPS signal come to mind.   
Do you mean the 11 minute mode in Linux, when the system time is

periodically written to the rtc in 11 minute intervals? This is
triggered by the synch status (time_status variable in the
kernel). I've solved this by periodically resetting this synch
status in my refclock driver.

You are correct in that the RTC has in general far better
temperature compensation than either the system clock or the
TSC/PCC counter. However, its resolution is generally far worse.
Even so, the lowpass character of the clock discipline masks this
so actual delivered system time should be quite good. Chapter 15
of my new book due in September contains an extensive discussion
on these issues.
Theoretically the worst RTC resolution is 1 second, but usually it
offers an update IRQ whenever the seconds counter changes. And this
gives good resolution for my system. Attached is the /dev/rtc
peerstats from my qemu guest system. The clock offset stays
under 1 millisecond, which is enough for our purposes. I will check
your book when it is published.
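
For reference, the update interrupt described here can be read on Linux
as sketched below (illustrative only, not Pavel's driver): after
RTC_UIE_ON, each read() on /dev/rtc blocks until the next
once-per-second update, which is the edge a refclock driver would
timestamp.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/rtc.h>

int main(void)
{
    unsigned long data;
    int fd = open("/dev/rtc", O_RDONLY);

    if (fd < 0 || ioctl(fd, RTC_UIE_ON, 0) < 0) {
        perror("/dev/rtc");
        return 1;
    }
    for (int i = 0; i < 5; i++) {
        if (read(fd, &data, sizeof data) < 0)    /* blocks until update IRQ */
            break;
        /* timestamp the system clock here and compare with the RTC time */
    }
    ioctl(fd, RTC_UIE_OFF, 0);
    close(fd);
    return 0;
}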
 
Regards
Pavel 
 
Dave


Krejci, Pavel wrote:


Hi,

well, then, do you find it useful? How should I proceed to contribute to the ntpd 
project?

Thanks
Pavel

 


-Original Message-
From: unruh [mailto:un...@wormhole.physics.ubc.ca]
Sent: Thursday, June 17, 2010 11:48 AM
To: questions@lists.ntp.org
Subject: Re: [ntp:questions] Reference clock driver for /dev/rtc

On 2010-06-16, Krejci, Pavel
pavel.kre...@siemens-enterprise.com wrote:
   


Hi,

 


-Original Message-
From: unruh [mailto:un...@wormhole.physics.ubc.ca]
Sent: Tuesday, June 15, 2010 7:15 PM
To: questions@lists.ntp.org
Subject: Re: [ntp:questions] Reference clock driver for /dev/rtc

On 2010-06-15, Krejci, Pavel
pavel.kre...@siemens-enterprise.com wrote:

Hi,

since I cannot use kvm-clock as the clock source (older
guest kernel) I am using pit as the clock source. According to my
tests this is the most stable clock source among tsc and hpet, but still
can drift. Since qemu keeps the /dev/rtc perfectly synchronized
with the Host's system time, it is a good time source for the ntpd on
the guest. The host itself is then synchronized via NTP with the
external time server. I don't know any other way to read the
system time from the Host, please offer one if you have it.

I do not understand. If your driver can read the rtc, it can read the
system clock instead.

I am not reading the Host's /dev/rtc. I am reading the
Guest's /dev/rtc, which is synchronized with the Host's system clock.

OK, if that is the way your virtual system works (i.e. it
delivers the system time via /dev/rtc), then so be it. I would
say it is terrible, since it uses a predefined item (rtc) to
deliver something totally different (the system time of the
underlying host). rtc has numerous idiosyncrasies, not least
being that it delivers only times with one second precision.
It also delivers an interrupt on one second boundaries, and is
written with a displacement of .5 sec (i.e. if you write the time
x to it, that time refers to the time of the rtc .5 sec in
the future.) I

  1   2   3   4   >