[ntp:questions] Windows Won't Synchronize to NTP

2008-03-30 Thread Matthew Lind
I can't get a Windows Client to sync to my NTP server.  All Linux
clients work fine.

Here is the info from my ntp.conf:

driftfile /var/lib/ntp/drift
restrict 127.0.0.1
restrict  mask 255.255.224.0 nomodify notrap 
server 0.rhel.pool.ntp.org iburst
server 1.rhel.pool.ntp.org iburst
server 2.rhel.pool.ntp.org iburst


Here is the tcpdump and ntpd -d info:

TCP DUMP OUTPUT:

13:02:19.440054 IP .ntp > .ntp: NTPv3, symmetric
active, length 48


ntpd -d output

receive: at 447 cent.ntp<-winclient mode 1 code 5 auth 0
transmit: at 447 cent.ntp->winclient mode 1


IPTables is off for testing

Any suggestions?

Thanks



___
questions mailing list
questions@lists.ntp.org
https://lists.ntp.org/mailman/listinfo/questions


[ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread starlight
Hello,

I'm trying to configure a small network for high precision time. 
Recently acquired an Endrun CDMA time server that runs like 
a dream, tracking CDMA time to about +/- 5 microseconds.

The clients are a rag-tag assembly of diverse systems including 
a Centos 4.5 Linux i686, Linux x86_64, Sun Ultra 10, Sun Ultra 80, 
IBM RS/6000 44p, Windows 2003 X64, and a Windows XP laptop.

All are configured to prefer the Endrun clock and poll it on a 
16 second interval.  All are attached to a single SMC gigabit 
Ethernet switch with only the Endrun and two Sun systems running 
at a lower speed of 100 MBPS.  Close to zero network traffic
and system loads.

All systems are running 'ntpd' 4.2.4p4.  Compiled NTP native 
64-bit for the Windows X64 system.  [A #ifdef tweak to 
'intptr_t' and 'uintptr_t' is required, will provide patch if 
desired].

It generally is working well, with the systems tracking anywhere 
from +/- 100 microseconds to +/- 500 microseconds most of the 
time.

However once or twice a day, all the systems experience a 
random, uncorrelated time shift of from one to several 
milliseconds.  Had an issue where a UPS voltage correction shift 
and cheap power supply on the Windows X64 box appeared to be a
problem, but that was fixed by configuring the UPS to consider 
110V nominal instead of 120V.

Does anyone have any ideas about what could be causing these 
random time jumps and what might be done to eliminate them?

Something I'm planning to try is to make sure that 'mlock' is 
configured in the daemons--presently 'autoconf' has left it 
disabled for some reason.  However I don't believe page
faults are the culprit.  All the daemons are running at 
the highest real-time priority in the respective systems.

The above configuration is a controlled lab setup.  The next 
target is a stack of eight DELL 1950 servers in a production 
data center running Windows 2003 R2 and slaved to a newer Endrun 
time server.  Don't have useful data from these systems yet 
because the network jitter is outrageous.  Working with the 
network admin to hopefully have the NTP traffic to and from the 
Endrun clock bypass level 3 switch/router rule checking.  They 
have large, complex router ACL rulesets I suspect as the cause
of the jitter.

Attached are fairly representative graphs of the offset and 
frequency for two of the lab servers.

Thanks


P.S. Resent without graphs as the list mailer says
they're not allowed.  Happy to send them or the raw
'loopstats' to anyone interested.



Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread Richard B. Gilbert
[EMAIL PROTECTED] wrote:
> Hello,
> 
> I'm trying to configure a small network for high precision time. 
> Recently acquired an Endrun CDMA time server that runs like 
> a dream, tracking CDMA time to about +/- 5 microseconds.
> 
> The clients are a rag-tag assembly of diverse systems including 
> a Centos 4.5 Linux i686, Linux x86_64, Sun Ultra 10, Sun Ultra 80, 
> IBM RS/6000 44p, Windows 2003 X64, and a Windows XP laptop.
> 
> All are configured to prefer the Endrun clock and poll it on a 
> 16 second interval.  All are attached to a single SMC gigabit 
> Ethernet switch with only the Endrun and two Sun systems running 
> at a lower speed of 100 MBPS.  Close to zero network traffic
> and system loads.
> 
> All systems are running 'ntpd' 4.2.4p4.  Compiled NTP native 
> 64-bit for the Windows X64 system.  [A #ifdef tweak to 
> 'intptr_t' and 'uintptr_t' is required, will provide patch if 
> desired].
> 
> It generally is working well, with the systems tracking anywhere 
> from +/- 100 microseconds to +/- 500 microseconds most of the 
> time.
> 
> However once or twice a day, all the systems experience a 
> random, uncorrelated time shift of from one to several 
> milliseconds.  


Forcing the poll interval to 16 seconds is not always a good idea!
Ntpd will select a poll interval, generally starting at 64 seconds, and 
ramping up to as long as 1024 seconds as the clock is beaten into 
submission!

Directly connected refclocks are frequently polled at shorter intervals
but I don't think your refclock is "directly connected" in the same 
sense that a clock working through a serial or parallel port is directly
connected!

A clock connected via ethernet with all the latencies and jitter 
thereunto appertaining is no different than any other network server and 
should be polled in the same manner!

The very short poll intervals correct large errors quickly and the very 
long intervals correct small errors very accurately!



Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread Unruh
[EMAIL PROTECTED] writes:

>Hello,

>I'm trying to configure a small network for high precision time. 
>Recently acquired an Endrun CDMA time server that runs like 
>a dream, tracking CDMA time to about +/- 5 microseconds.

No idea what CDMA time is, but that does not matter. 
Do you have peerstats running on the various machines so you can look at
the raw offset and particularly the round trip times? It may be that your
network is suddenly delaying packets in one direction by milliseconds, for
half an hour, say. 


>The clients are a rag-tag assembly of diverse systems including 
>a Centos 4.5 Linux i686, Linux x86_64, Sun Ultra 10, Sun Ultra 80, 
>IBM RS/6000 44p, Windows 2003 X64, and a Windows XP laptop.

>All are configured to prefer the Endrun clock and poll it on a 
>16 second interval.  All are attached to a single SMC gigabit 
>Ethernet switch with only the Endrun and two Sun systems running 
>at a lower speed of 100 MBPS.  Close to zero network traffic
>and system loads.

Maybe that ethernet switch suffers a nervous breakdown (too little to do?)
once a day. 



>All systems are running 'ntpd' 4.2.4p4.  Compiled NTP native 
>64-bit for the Windows X64 system.  [A #ifdef tweak to 
>'intptr_t' and 'uintptr_t' is required, will provide patch if 
>desired].

>It generally is working well, with the systems tracking anywhere 
>from +/- 100 microseconds to +/- 500 microseconds most of the 
>time.

Should be within 10s of usec, not hundreds.



>However once or twice a day, all the systems experience a 
>random, uncorrelated time shift of from one to several 
>milliseconds.  Had an issue where a UPS voltage correction shift 
>and cheap power supply on the Windows X64 box appeared to be a
>problem, but that was fixed by configuring the UPS to consider 
>110V nominal instead of 120V.

>Does anyone have any ideas about what could be causing these 
>random time jumps and what might be done to eliminate them?

>Something I'm planning to try is to make sure that 'mlock' is 
>configured in the daemons--presently 'autoconf' has left it 
>disabled for some reason.  However I don't believe page
>faults are the culprit.  All the daemons are running at 
>the highest real-time priority in the respective systems.

>The above configuration is a controlled lab setup.  The next 
>target is a stack of eight DELL 1950 servers in a production 
>data center running Windows 2003 R2 and slaved to a newer Endrun 
>time server.  Don't have useful data from these systems yet 

I would have just used a cheap GPS receiver, not pay $700 for one of these, 
but it's your money.

Ah, just looked at their web page. Would I really believe that the CDMA
cell phone network would care if their time signal were accurate to usec? 
There is no time path correction. But you should see that on your server
connected to the device. 

Anyway, look at the peerstats file, esp the roundtrip times and the
offsets. The ntp clock-filter tries to compensate for vast variations in
these but can only do so much.
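
A rough sketch of how one might scan a peerstats file for round-trip spikes (Python). It assumes the usual peerstats layout of MJD, seconds past midnight, peer address, status word, offset, delay, dispersion, jitter; the function names and the factor-of-three threshold are illustrative choices, not anything ntpd itself uses.

```python
from statistics import median

def parse_peerstats(lines):
    """Yield (mjd, seconds, peer, offset_s, delay_s) from peerstats lines."""
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        mjd, secs, peer = int(fields[0]), float(fields[1]), fields[2]
        offset, delay = float(fields[4]), float(fields[5])
        yield mjd, secs, peer, offset, delay

def delay_spikes(samples, factor=3.0):
    """Return samples whose round-trip delay exceeds factor * median delay."""
    samples = list(samples)
    if not samples:
        return []
    typical = median(s[4] for s in samples)
    return [s for s in samples if s[4] > factor * typical]
```

If the offset jumps line up with entries returned by delay_spikes(), the network path is the more likely culprit; if the delay stays flat while the offset moves, the local clock is.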




>because the network jitter is outrageous.  Working with the 
>network admin to hopefully have the NTP traffic to and from the 
>Endrun clock bypass level 3 switch/router rule checking.  They 
>have large, complex router ACL rulesets I suspect as the cause
>of the jitter.

Sounds a bit weird. On an ADSL link from home through the telco to the 
university, I get
better than 1ms time accuracy. 

>Attached are fairly representative graphs of the offset and 
>frequency for two of the lab servers.

Netnews is text only. Post the info on a web page where anyone can look at
it. 



>Thanks
>P.S. Resent without graphs as the list mailer says
>they're not allowed.  Happy to send them or the raw
>'loopstats' to anyone interested.

Just post them.



Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread David Woolley
[EMAIL PROTECTED] wrote:

> The clients are a rag-tag assembly of diverse systems including 
> a Centos 4.5 Linux i686, Linux x86_64, Sun Ultra 10, Sun Ultra 80, 
> IBM RS/6000 44p, Windows 2003 X64, and a Windows XP laptop.

How are you interpolating the 16ms ticks on the Windows system?  How are 
you disabling power management on the laptop?

> 
> It generally is working well, with the systems tracking anywhere 
> from +/- 100 microseconds to +/- 500 microseconds most of the 
> time.

How are you measuring the difference from true time?  In principle, if 
ntpd can measure it, it will correct it.

> 
> However once or twice a day, all the systems experience a 
> random, uncorrelated time shift of from one to several 
> milliseconds.  Had an issue where a UPS voltage correction shift 

In which direction is the slip?  Backward only slips against true time 
(these might appear as forward slips if the real error is in the server) 
are typically due to lost clock interrupts.  If that is the case it 
implies you are using a tick rate of other than 100Hz.  Please note that 
the Linux kernel code is broken for clock frequencies other than 100Hz 
and the use of 1000Hz significantly increases the likelihood of a lost 
interrupt.

The normal source of lost interrupts is disk drivers using programmed 
transfers.

> and cheap power supply on the Windows X64 box appeared to be a
> problem, but that was fixed by configuring the UPS to consider 
> 110V nominal instead of 120V.
> 



Re: [ntp:questions] Windows Won't Synchronize to NTP

2008-03-30 Thread Maarten Wiltink
"Matthew Lind" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]

> I can't get a Windows Client to sync to my NTP server.
> All Linux clients work fine.
[...]
> TCP DUMP OUTPUT:
>
> 13:02:19.440054 IP .ntp > .ntp: NTPv3,
> symmetric active, length 48
  

Don't do that. W32Time is asking to be a peer, which it has absolutely
no business to.
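
For reference, the mode that tcpdump reports sits in the low three bits of the first byte of the NTP header, next to the leap indicator and version. A minimal sketch of decoding it (Python; the function names are made up for illustration):

```python
# Names for the 3-bit mode field of the NTP packet header.
NTP_MODES = {
    0: "reserved",
    1: "symmetric active",
    2: "symmetric passive",
    3: "client",
    4: "server",
    5: "broadcast",
    6: "control",
    7: "private",
}

def decode_first_byte(b):
    """Split the first header byte into (leap indicator, version, mode)."""
    leap = (b >> 6) & 0x3
    version = (b >> 3) & 0x7
    mode = b & 0x7
    return leap, version, mode

def describe(b):
    """Render the version and mode roughly the way tcpdump prints them."""
    _leap, version, mode = decode_first_byte(b)
    return "NTPv%d, %s" % (version, NTP_MODES[mode])
```

Assuming a leap indicator of 0, the packet in the trace would start with 0x19 (NTPv3, mode 1), where a plain client request would start with 0x1b (NTPv3, mode 3).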

There have been recent posts (last two weeks or so) about how
to add a byte at the end of the name to request a less abusive
relationship with the NTP server.

Groetjes,
Maarten Wiltink




Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread Unruh
"Richard B. Gilbert" <[EMAIL PROTECTED]> writes:

>[EMAIL PROTECTED] wrote:
>> Hello,
>> 
>> I'm trying to configure a small network for high precision time. 
>> Recently acquired an Endrun CDMA time server that runs like 
>> a dream, tracking CDMA time to about +/- 5 microseconds.
>> 
>> The clients are a rag-tag assembly of diverse systems including 
>> a Centos 4.5 Linux i686, Linux x86_64, Sun Ultra 10, Sun Ultra 80, 
>> IBM RS/6000 44p, Windows 2003 X64, and a Windows XP laptop.
>> 
>> All are configured to prefer the Endrun clock and poll it on a 
>> 16 second interval.  All are attached to a single SMC gigabit 
>> Ethernet switch with only the Endrun and two Sun systems running 
>> at a lower speed of 100 MBPS.  Close to zero network traffic
>> and system loads.
>> 
>> All systems are running 'ntpd' 4.2.4p4.  Compiled NTP native 
>> 64-bit for the Windows X64 system.  [A #ifdef tweak to 
>> 'intptr_t' and 'uintptr_t' is required, will provide patch if 
>> desired].
>> 
>> It generally is working well, with the systems tracking anywhere 
>> from +/- 100 microseconds to +/- 500 microseconds most of the 
>> time.
>> 
>> However once or twice a day, all the systems experience a 
>> random, uncorrelated time shift of from one to several 
>> milliseconds.  
>

>Forcing the poll interval to 16 seconds is not always a good idea!
>Ntpd will select a poll interval, generally starting at 64 seconds, and 
>ramping up to as long as 1024 seconds as the clock is beaten into 
>submission!

It is his network, he is not going to overload it. So, if he wants a 16 sec
poll interval that is up to him. 
I agree it is not a good idea for remote servers, but on his own system it
is fine. 


>Directly connected refclocks are frequently polled at shorter intervals
>but I don't think your refclock is "directly connected" in the same 
>sense that a clock working through a serial or parallel port is directly
>connected!

>A clock connected via ethernet with all the latencies and jitter 
>thereunto appertaining is no different than any other network server and 
>should be polled in the same manner!

??? The longer polls are in order not to swamp the remote server with
1 people all polling every 16 sec ( or 1 sec) There is nothing in ntp
itself that mandates a longer poll interval. In fact a shorter poll
interval makes ntp much more responsive to changes ( clock drifts, etc)



>The very short poll intervals correct large errors quickly and the very 
>long intervals correct small errors very accurately!

No, for a properly designed system both should be corrected. 




Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread Maarten Wiltink
"Unruh" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]

> [...] Would I really believe that the CDMA cell phone network
> would care if their time signal were accurate to usec?

I would. Because IIUC, this is the basis on which they divide
timeslots between stations.

Groetjes,
Maarten Wiltink




Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread Unruh
David Woolley <[EMAIL PROTECTED]> writes:

>[EMAIL PROTECTED] wrote:

>> The clients are a rag-tag assembly of diverse systems including 
>> a Centos 4.5 Linux i686, Linux x86_64, Sun Ultra 10, Sun Ultra 80, 
>> IBM RS/6000 44p, Windows 2003 X64, and a Windows XP laptop.

>How are you interpolating the 16ms ticks on the Windows system?  How are 
>you disabling power management on the laptop?

>> 
>> It generally is working well, with the systems tracking anywhere 
>> from +/- 100 microseconds to +/- 500 microseconds most of the 
>> time.

>How are you measuring the difference from true time?  In principle, if 
>ntpd can measure it, it will correct it.

I expect that he means the offsets that ntp measures. NTP does NOT correct
random offsets. I.e., if there is a noise source which makes the offsets vary
by 500usec, ntp will not get rid of them. You will see them in the offsets
as measured by ntp. Now, the time keeping might (or might not) be more
accurate than that, but those offsets are what I suspect he means.


>> 
>> However once or twice a day, all the systems experience a 
>> random, uncorrelated time shift of from one to several 
>> milliseconds.  Had an issue where a UPS voltage correction shift 

>In which direction is the slip?  Backward only slips against true time 
>(these might appear as forward slips if the real error is in the server) 
>are typically due to lost clock interrupts.  If that is the case it 
>implies you are using a tick rate of other than 100Hz.  Please note that 
>the Linux kernel code is broken for clock frequencies other than 100Hz 
>and the use of 1000Hz significantly increases the likelihood of a lost 
>interrupt.

He claims it happens on all the systems. 


>The normal source of lost interrupts is disk drivers using programmed 
>transfers.

Almost all disk drives on Linux now use dma.


>> and cheap power supply on the Windows X64 box appeared to be a
>> problem, but that was fixed by configuring the UPS to consider 
>> 110V nominal instead of 120V.
>> 



Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread Maarten Wiltink
"Unruh" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> "Richard B. Gilbert" <[EMAIL PROTECTED]> writes:

>> Forcing the poll interval to 16 seconds is not always a good idea!
>> Ntpd will select a poll interval, generally starting at 64 seconds,
>> and ramping up to as long as 1024 seconds as the clock is beaten
>> into submission!
>
> It is his network, he is not going to overload it. So, if he wants a
> 16 sec poll interval that is up to him.
> I agree it is not a good idea for remote servers, but on his own system
> it is fine.
[...]
> ??? The longer polls are in order not to swamp the remote server with
> 1 people all polling every 16 sec ( or 1 sec) There is nothing in
> ntp itself that mandates a longer poll interval. In fact a shorter poll
> interval makes ntp much more responsive to changes ( clock drifts, etc)

>> The very short poll intervals correct large errors quickly and the
>> very long intervals correct small errors very accurately!
>
> No for a properly designed system both should be corrected.

You seem to be missing the point. Once the large errors have been
corrected, NTP goes on to the small errors. For that, it _needs_ a
longer poll interval. That this gives the server more air is a
happy coincidence, but not why it does it.

Given the measurement error, you need to let the small error
accumulate over a longer period. Otherwise it would simply be
lost in the noise.

Do the math: assume the (constant!) measurement error to be +/- 1 ms,
the frequency error in my local host to be 1000 PPM (1/1000). With a
1 s polling interval, the real value is 1 ms and the measurement
will be between 0 and 2 ms. Not very good. With a 1000 s polling
interval, the real value is 1 s and the measurement will be between
0.999 and 1.001 s. Now that's useful to correct your clock with.

Now use more realistic numbers, like 50 PPM to start with, a polling
interval of 64 s and I'm not exactly sure what for the measuring
jitter. But the gist should be clear: that 50 PPM will go down, the
SNR will worsen, and the polling interval should go up to improve it
again.
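
The same arithmetic as a small sketch (Python). The function names are made up for illustration, and the model is the simplified one above: a constant frequency error and a fixed +/- measurement error per sample.

```python
def accumulated_offset(freq_error_ppm, poll_s):
    """Offset (seconds) that builds up over one poll interval."""
    return freq_error_ppm * 1e-6 * poll_s

def measured_range(freq_error_ppm, poll_s, meas_error_s):
    """Lowest and highest value a single measurement could report."""
    true_offset = accumulated_offset(freq_error_ppm, poll_s)
    return true_offset - meas_error_s, true_offset + meas_error_s

def relative_uncertainty(freq_error_ppm, poll_s, meas_error_s):
    """Measurement error as a fraction of the offset being measured."""
    return meas_error_s / accumulated_offset(freq_error_ppm, poll_s)

if __name__ == "__main__":
    # The two cases from the text, then the "more realistic" 50 PPM at 64 s
    # with the same assumed 1 ms measurement error.
    for ppm, poll in ((1000, 1), (1000, 1000), (50, 64)):
        low, high = measured_range(ppm, poll, 0.001)
        rel = relative_uncertainty(ppm, poll, 0.001)
        print(ppm, poll, low, high, rel)
```

With a 1 ms error, 50 PPM over 64 s accumulates only 3.2 ms, so the uncertainty is still about a third of the signal; it shrinks in proportion as the interval grows.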

Starting with a short interval is good to correct large errors
quickly. Backing off once you've done so is good to avoid pestering
the server, but it's also good to correct small errors accurately,
and _that_ is why it's done. And of course, once a larger than
expected offset is measured, the polling interval is shortened
again.

Groetjes,
Maarten Wiltink




Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread Richard B. Gilbert
Unruh wrote:
> "Richard B. Gilbert" <[EMAIL PROTECTED]> writes:
> 
> 
>>[EMAIL PROTECTED] wrote:
>>
>>>Hello,
>>>
>>>I'm trying to configure a small network for high precision time. 
>>>Recently acquired an Endrun CDMA time server that runs like 
>>>a dream, tracking CDMA time to about +/- 5 microseconds.
>>>
>>>The clients are a rag-tag assembly of diverse systems including 
>>>a Centos 4.5 Linux i686, Linux x86_64, Sun Ultra 10, Sun Ultra 80, 
>>>IBM RS/6000 44p, Windows 2003 X64, and a Windows XP laptop.
>>>
>>>All are configured to prefer the Endrun clock and poll it on a 
>>>16 second interval.  All are attached to a single SMC gigabit 
>>>Ethernet switch with only the Endrun and two Sun systems running 
>>>at a lower speed of 100 MBPS.  Close to zero network traffic
>>>and system loads.
>>>
>>>All systems are running 'ntpd' 4.2.4p4.  Compiled NTP native 
>>>64-bit for the Windows X64 system.  [A #ifdef tweak to 
>>>'intptr_t' and 'uintptr_t' is required, will provide patch if 
>>>desired].
>>>
>>>It generally is working well, with the systems tracking anywhere 
>>>from +/- 100 microseconds to +/- 500 microseconds most of the 
>>>time.
>>>
>>>However once or twice a day, all the systems experience a 
>>>random, uncorrelated time shift of from one to several 
>>>milliseconds.  
>>
>>
> 
> 
>>Forcing the poll interval to 16 seconds is not always a good idea!
>>Ntpd will select a poll interval, generally starting at 64 seconds, and 
>>ramping up to as long as 1024 seconds as the clock is beaten into 
>>submission!
> 
> 
> It is his network, he is not going to overload it. So, if he wants a 16 sec
> poll interval that is up to him. 
> I agree it is not a good idea for remote servers, but on his own system it
> is fine. 
> 
> 
> 
>>Directly connected refclocks are frequently polled at shorter intervals
>>but I don't think your refclock is "directly connected" in the same 
>>sense that a clock working through a serial or parallel port is directly
>>connected!
> 
> 
>>A clock connected via ethernet with all the latencies and jitter 
>>thereunto appertaining is no different than any other network server and 
>>should be polled in the same manner!
> 
> 
> ??? The longer polls are in order not to swamp the remote server with
> 1 people all polling every 16 sec ( or 1 sec) There is nothing in ntp
> itself that mandates a longer poll interval. In fact a shorter poll
> interval makes ntp much more responsive to changes ( clock drifts, etc)
> 
> 
> 
> 
>>The very short poll intervals correct large errors quickly and the very 
>>long intervals correct small errors very accurately!
> 
> 
> No for a properly designed system both should be corrected. 
> 
> 

If you don't measure across a long interval, you will never see some of 
those small errors.  When you measure across 1024 seconds you overwhelm 
the network jitter.  The long interval is part of the design for just 
that reason.

Suppose your frequency error is 5 PPM or 0.43 seconds per day.  Do you 
think you can measure that error accurately with a 64 second poll 
interval?  If you are working over the internet, an error that small is 
going to disappear in the jitter.  It will be sixteen times more obvious 
at the longer interval.
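
Those figures can be checked with a few lines (Python; a sketch only, with made-up function names and the assumption that the frequency error stays constant):

```python
SECONDS_PER_DAY = 86400

def drift_per_day(freq_error_ppm):
    """Seconds gained or lost per day at a constant frequency error."""
    return freq_error_ppm * 1e-6 * SECONDS_PER_DAY

def drift_per_poll(freq_error_ppm, poll_s):
    """Seconds of error that accumulate between two polls."""
    return freq_error_ppm * 1e-6 * poll_s

def signal_gain(short_poll_s, long_poll_s):
    """How much larger the accumulated error is at the longer interval."""
    return long_poll_s / short_poll_s
```

At 5 PPM the clock gains or loses 320 microseconds between polls at 64 seconds and about 5.1 milliseconds at 1024 seconds, against whatever jitter the path adds to each sample.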

You can poll a hardware reference clock at 16 second intervals because 
the network is not involved!  The latency and jitter of a PPS signal over a 
serial port are an order or two of magnitude less than what you get 
over a busy network.



Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread David Woolley
Unruh wrote:
> 1 people all polling every 16 sec ( or 1 sec) There is nothing in ntp
> itself that mandates a longer poll interval. In fact a shorter poll
> interval makes ntp much more responsive to changes ( clock drifts, etc)

As I understand it, locking maxpoll low only slightly improves 
responsiveness.  The main effect is simply to oversample, as the time 
constants still adjust to values appropriate to a poll interval of 1024s.



Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread Hal Murray
>However once or twice a day, all the systems experience a
>random, uncorrelated time shift of from one to several
>milliseconds.

What does that mean?

I'm guessing that "uncorrelated" means the glitches don't happen
at the same time.

Are all clients seeing occasional problems?  Do they match
cron jobs or some activity burst on the system?
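
One way to check is to pull the glitch times out of loopstats and line them up against crontab. A minimal sketch (Python), assuming the usual loopstats layout where the first three fields are MJD, seconds past midnight, and offset in seconds; the function names and the 1 ms threshold are illustrative:

```python
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)  # Modified Julian Day 0, UTC

def mjd_to_utc(mjd, secs):
    """Convert a loopstats timestamp (MJD, seconds past midnight) to UTC."""
    return MJD_EPOCH + timedelta(days=mjd, seconds=secs)

def find_jumps(lines, threshold_s=0.001):
    """Return (utc_time, offset_s) for samples at or above the threshold."""
    jumps = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        mjd, secs, offset = int(fields[0]), float(fields[1]), float(fields[2])
        if abs(offset) >= threshold_s:
            jumps.append((mjd_to_utc(mjd, secs), offset))
    return jumps
```

Running find_jumps() over each client's loopstats and comparing the timestamps would also show whether the glitches really are uncorrelated between machines.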

Can you try another network switch?  Or maybe even run without
any switches?  (plug the CDMA box directly into a second ethernet
port)

Can you try another NTP server?  How about setting up a PC,
letting it run for a day to establish a good drift file, and
then making it run on the local clock only.  That will drift,
slowly, but there won't be any jumps.

How about adding another client that doesn't do anything?
(Turn off cron too.)

-- 
These are my opinions, not necessarily my employer's.  I hate spam.



Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread David Woolley
Unruh wrote:
>> 
> I expect that he means the offsets that ntp measures. NTP does NOT correct

I suspect that too.

> random offsets. Ie, if there is noise source which makes the offsets vary

It averages them so as to reduce their effective size.

> by 500usec ntp will not get rid of them. You will see them in the offsets
> as measured by ntp. Now, the time keeping might (or might not) be more
> accurate than that, but those offsets are what I suspect he means.

The question is about "measured errors" that significantly exceed the 
random offsets.  In any case the systematic error can also greatly 
exceed the measured offset - that represents an error that ntpd cannot 
measure.
> 
> 
> Almost all disk drives on Linux now use dma.

They need to do both and the drivers that caused this problem were 
capable of using DMA.  The problem was, I believe, that certain chipsets 
were unsafe with DMA, so the default, at least used to be, the 
unconditional one of doing programmed transfers; you could enable DMA at 
your own risk.

My impression is that there are still enough systems with lost disk 
interrupts that someone reporting one tick backward steps can reasonably 
be assumed to have that problem, and it is a reasonable probability for 
someone who doesn't report the direction of the step.  The other common 
cause of steps, which are balanced in both directions, is not applicable 
here.



Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread David Woolley
Maarten Wiltink wrote:
> 
> You seem to be missing the point. Once the large errors have been
> corrected, NTP goes on to the small errors. For that, it _needs_ a
> longer poll interval. That this gives the server more air is a
> happy coincidence, but not why it does it.

I don't believe it *needs* longer poll intervals; I think they are 
simply wasteful in that the offsets are low pass filtered in such a way 
that clamping maxpoll makes very little difference to the result, when 
the time constant goes high.

I'm not sure that there is any user configurable option that actually 
does what people think they are doing by locking down maxpoll, in terms 
of keeping the loop time constant low.

A clamped maxpoll may improve the responsiveness to faults causing time 
steps of more than 128ms, but one should be attacking the problem, not 
the symptom.



Re: [ntp:questions] Windows won't Sync to NTP server

2008-03-30 Thread Danny Mayer
David Woolley wrote:
> [EMAIL PROTECTED] wrote:
>> I can't get a Windows Client to sync to my NTP server.  All Linux
>> clients work fine.
> 
> You didn't say that you were running a non-NTP compliant version of 
> w32time on the Windows system (it's illegally using symmetric active).
> 
> It is possible that your version of ntpd does not have the workaround 
> for the w32time bug that was extensively discussed last week.  You 
> should try setting the options on w32time that causes it to generate 
> proper client associations, upgrading to Windows 2003 (which is reported 
> to be compliant). Alternatively, you could run the reference ntpd on the 
> Windows systems.

ntp 4.2.4p4 does not include that fix nor do any of the tarballs for 
ntp-dev yet. That fix is coming. Martin Burnicki or Ryan Malayter 
provided instructions on how to get w32time to send client packets 
instead of symmetric active packets. The clients are getting synched 
because the restrict statement is denying peers.

Danny


Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread Bill Unruh
On Sun, 30 Mar 2008, [EMAIL PROTECTED] wrote:

> At 04:51 PM 3/30/2008 -0700, Bill Unruh wrote:
>> Are those on the same day?
>
> Yes, same day.  Uncorrelated to anything I can identify
> or each other.  Same story on all the boxes.  Running
> a hefty multi-system compile with heavy NFS and Samba
> traffic does not produce these events, though it disturbs
> the Windows boxes slightly when CPU goes to 100%.
>
>> Which "linux" and which "windows" are those graphs since you
>> have 2 linux and 2 windows clients.
>
> That's the dual-core AMD 2.4GHz Athlon Tyan mobo whitebox
> running Centos 4.5 SMP kernel.  Similar results on the
> Dell Dimension 2400 2.4GHz Intel P4 running Centos 4.5
> mono-processor kernel.
>
> Windows is a dual-core 3.4GHz Pentium D Tyan mobo whitebox
> running 2003 R2 SP2 standard server.
>
>> As I said, seeing the
>> peerstats files would be helpful (offset and roundtrip)
>
> Might try them later, but I can't believe a high-quality
> SMC switch is causing multi-millisecond delays.  Just not
> possible.  Pings are all about 400 microseconds, consistent
> but slightly different on each system.  Round trip is
> 800 microseconds.  Attaching the output from a bulk 'ntpq -p'
> 'ntptrace' script I have below.  Note that's 'ntptrace'
> version 4.1 since the 4.2 script has useless offset info.

I have had weird latencies on some switches here. 
And since all your machines are experiencing this, that switch is the only
commonality (or the ntp server). Do you have the peerstats on the server as
well to make sure that there are not some weird delays there.



>
>> Also these graphs seem to have cut off the spikes. Are the
>> spikes actually higher or is that an illusion?
>
> Higher.  Sometimes 1ms, sometimes 5-6ms.
>
>> (Note the spikes are hundreds of usec, not many msec)
>
> That would be the ~1ms example, check out the other one.
>

I am also really really really disturbed that you have so many servers. You
are trying to test out one specific server. The others are simply liable to
confuse everything. For example ntp could for some bizarre reason, suddenly
decide to use one of those other sites as the preferred server and give a
glitch.

And what are all those CDMA servers? Set your system up with one single
source, the one you want to test.


>
>
>
>
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
>   Endrun CDMA
>  LOCAL(0)        LOCAL(0)        10 l   18   64  377    0.000    0.000   0.015
> *HOPF_S(0)       .CDMA.           0 l    6   16  377    0.000    0.000   0.015
>   Centos 32
> *eachna          .CDMA.           1 u    3   16  377    0.683   -0.004   0.009
> -tock.usno.navy. .USNO.           1 u  452 1024  377   20.678    1.432   2.822
> +navobs1.wustl.e .GPS.            1 u  479 1024  377   50.136   -1.513   0.164
> +time.nist.gov   .ACTS.           1 u  471 1024  377   66.528   -1.708   0.156
> -tick.ucla.edu   .GPS.            1 u  432 1024  377   87.372    3.296   0.085
>   Ultra 10
> *172.29.87.3     .CDMA.           1 u   11   16  377    0.869   -0.016   0.042
> 172.29.87.15: stratum 2, offset -0.07, synch distance 0.00783
> 172.29.87.3: stratum 1, offset -0.18, synch distance 0.00038, refid 'CDMA'
>   Ultra 80
> *172.29.87.3     .CDMA.           1 u    4   16  377    0.942   -0.012   0.012
> 172.29.87.17: stratum 2, offset -0.38, synch distance 0.00685
> 172.29.87.3: stratum 1, offset -0.17, synch distance 0.00038, refid 'CDMA'
>   44p
> *172.29.87.3     .CDMA.           1 u   13   16  377    0.809   -0.001   0.016
> 172.29.87.13: stratum 2, offset -0.14, synch distance 0.00627
> 172.29.87.3: stratum 1, offset -0.18, synch distance 0.00038, refid 'CDMA'
>   Centos 64
> *172.29.87.3     .CDMA.           1 u   12   16  377    0.664    0.003   0.487
> 172.29.87.19: stratum 2, offset -0.09, synch distance 0.00720
> 172.29.87.3: stratum 1, offset -0.18, synch distance 0.00038, refid 'CDMA'
>   W2K3 64
> *172.29.87.3     .CDMA.           1 u    4   16  377    0.734    0.053   0.014
> 172.29.87.20: stratum 2, offset -0.60, synch distance 0.00650
> 172.29.87.3: stratum 1, offset -0.19, synch distance 0.00038, refid 'CDMA'
>   XP 32 laptop
> *172.29.87.3     .CDMA.           1 u    7   16  377    0.819    0.468   0.256
> 172.29.87.12: stratum 2, offset -0.000173, synch distance 0.00655
> 172.29.87.3: stratum 1, offset -0.17, synch distance 0.00038, refid 'CDMA'
>

-- 
William G. Unruh   |  Canadian Institute for| Tel: +1(604)822-3273
Physics&Astronomy  | Advanced Research  | Fax: +1(604)822-5324
UBC, Vancouver,BC  |   Program in Cosmology | [EMAIL PROTECTED]
Canada V6T 1Z1 |  and Gravity   |  www.theory.physics.ubc.ca/


Re: [ntp:questions] high precision tracking: trying to understand sudden jumps

2008-03-30 Thread Unruh
"Maarten Wiltink" <[EMAIL PROTECTED]> writes:

>"Unruh" <[EMAIL PROTECTED]> wrote in message
>news:[EMAIL PROTECTED]
>> "Richard B. Gilbert" <[EMAIL PROTECTED]> writes:

>>> Forcing the poll interval to 16 seconds is not always a good idea!
>>> Ntpd will select a poll interval, generally starting at 64 seconds,
>>> and ramping up to as long as 1024 seconds as the clock is beaten
>>> into submission!
>>
>> It is his network, he is not going to overload it. So, if he wants a
>> 16 sec poll interval that is up to him.
>> I agree it is not a good idea for remote servers, but on his own system
>> it is fine.
>[...]
>> ??? The longer polls are in order not to swamp the remote server with
>> 1 people all polling every 16 sec (or 1 sec). There is nothing in
>> ntp itself that mandates a longer poll interval. In fact a shorter poll
>> interval makes ntp much more responsive to changes ( clock drifts, etc)

>>> The very short poll intervals correct large errors quickly and the
>>> very long intervals correct small errors very accurately!
>>
>> No, for a properly designed system both should be corrected.

>You seem to be missing the point. Once the large errors have been
>corrected, NTP goes on to the small errors. For that, it _needs_ a
>longer poll interval. That this gives the server more air is a
>happy coincidence, but not why it does it.


I have no idea what this means. ntp simply runs a second order feedback
network. It does not do anything for "large and small" errors.

>Given the measurement error, you need to let the small error
>accumulate over a longer period. Otherwise it would simply be
>lost in the noise.

No idea what you mean.



>Do the math: assume the (constant!) measurement error to be +/- 1 ms,
>the frequency error in my local host to be 1000 PPM (1/1000). With a
>1 s polling interval, the real value is 1 ms and the measurement
>will be between 0 and 2 ms. Not very good. With a 1000 s polling
>interval, the real value is 1 s and the measurement will be between
>0.999 and 1.001 s. Now that's useful to correct your clock with.

You are not talking about large and small errors, you are talking about
phase and frequency errors. And no computer has fixed either phase or
frequency errors. They keep changing. Thus integrating for a longer time
does not help if the frequency error (drift) keeps changing.
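
The arithmetic in the quoted example can be written out as a short Python
sketch. It is not part of the original thread, and the function names are
made up for illustration. It assumes, as the quoted example does, a constant
measurement error and a constant frequency error over the polling interval,
which is exactly the assumption disputed in the reply above.

```python
def accumulated_offset_s(freq_error_ppm, interval_s):
    """Offset (seconds) built up over one interval at a constant frequency error."""
    return freq_error_ppm * 1e-6 * interval_s


def freq_uncertainty_ppm(measurement_error_s, interval_s):
    """Uncertainty (PPM) of a frequency estimate taken from one offset
    measurement with the given error, spread over the interval."""
    return measurement_error_s / interval_s * 1e6


measurement_error_s = 0.001   # +/- 1 ms, from the quoted example
freq_error_ppm = 1000.0       # 1000 PPM, from the quoted example

for interval_s in (1.0, 64.0, 1000.0):
    offset = accumulated_offset_s(freq_error_ppm, interval_s)
    uncertainty = freq_uncertainty_ppm(measurement_error_s, interval_s)
    print("interval %6.0f s: offset %.3f s, frequency known to +/- %.1f PPM"
          % (interval_s, offset, uncertainty))
```

Under these assumptions the frequency uncertainty shrinks in proportion to
the interval. If the frequency error itself wanders during the interval, the
longer baseline averages over that wander instead of tracking it, so the
gain is smaller than this simple model suggests.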



>Now use more realistic numbers, like 50 PPM to start with, a polling
>interval of 64 s and I'm not exactly sure what for the measuring
>jitter. But the gist should be clear: that 50 PPM will go down, the
>SNR will worsen, and the polling interval should go up to improve it
>again.

??? What you are describing is one of the key problems with the ntp
algorithm.



>Starting with a short interval is good to correct large errors
>quickly. Backing off once you've done so is good to avoid pestering
>the server, but it's also good to correct small errors accurately,
>and _that_ is why it's done. And of course, once a larger than
>expected offset is measured, the polling interval is shortened
>again.

Anyway, that is not his problem. He is getting ms spikes in the loopfilter.
Those wipe out anything else he does. They destroy all attempts by ntp to
discipline the clock.



>Groetjes,
>Maarten Wiltink

