[ntp:questions] Beginner's questions to NTP configuration option "peer"

2008-09-02 Thread Nottorf, Stefan
Hello,
I am quite new to time synchronisation and hope this is the right place
to post my questions. I am trying to configure time synchronisation via
NTP in a small to medium-sized network. Having read several web pages
and parts of Mr. Mills' book, I still have some questions, which I will
post after briefly describing part of the network.
The network has a Meinberg GPS 167 clock, which serves as the source
for all time synchronisation. Using public time servers is not possible
because the network is isolated.
In each blade enclosure, one blade uses the Meinberg clock as its time
server and acts as time server for the other blades in that enclosure.
These stratum 2 blades then use each other as peers. If the reference
clock fails, these servers fall back to stratum 5 (via fudge).
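A configuration along these lines for one of the stratum 2 blades might
look like this (a sketch only; the host names are placeholders):

```
server lantime               # Meinberg GPS 167 / Lantime, stratum 1
peer   enclosure2-blade1     # stratum 2 blade in the other enclosure

# Fallback: undisciplined local clock, fudged to stratum 5
server 127.127.1.0
fudge  127.127.1.0 stratum 5
```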
My questions are:
1) In the described network setup: if the stratum 2 host of enclosure 1
fails to synchronize with its stratum 1 source...
before:

|-------------|
| Lantime V4  |
| Stratum 1   |
|-------------|
       |
       |
       V
|-------------|          |-------------|
| Enclosure 1 |<--peer-->| Enclosure 2 |
| Blade 1     |          | Blade 1     |
| Stratum 2   |          | Stratum 2   |
|-------------|          |-------------|
       |
       |
       V
|-------------|
| Enclosure 1 |
| Blade 2     |
| Stratum 3   |
|-------------|

---
after:

|-------------|
| Lantime V4  |
| Stratum 1   |
|-------------|
       |
       X

|-------------|          |-------------|
| Enclosure 1 |<--peer-->| Enclosure 2 |
| Blade 1     |          | Blade 1     |
| Stratum ?   |          | Stratum 2   |
|-------------|          |-------------|
       |        ________/
       |       /
       V      V
|-------------|
| Enclosure 1 |
| Blade 2     |
| Stratum 3   |
|-------------|

In the documentation it is written that "Should one of the peers lose
all reference sources or simply cease operation, the other peers will
automatically reconfigure so that time and related values can flow from
the surviving peers to all hosts in the subnet". Does the above
ASCII-art describe this behaviour correctly?
2) If a stratum 2 host has peer connections to other stratum 2 hosts AND
its network connection to the stratum 1 server(s) fails... does the host
in question
   a) drop to stratum 3 (because its next time source is the stratum 2
peer), as it would if both sides used each other as time server via the
"server" keyword?
   b) drop to its fudged stratum level (stratum 5 in this case)?

Regards,
Stefan

___
questions mailing list
questions@lists.ntp.org
https://lists.ntp.org/mailman/listinfo/questions


[ntp:questions] Output of ntpq -p regarding stratum level

2008-09-10 Thread Nottorf, Stefan
Hello,
although this is probably a beginner's question, I would like to
understand it before I run into trouble caused by misunderstanding
basic information. I think I have misunderstood how to read the output
of ntpq -p...
The third column ("st") shows the stratum.
But when I began monitoring time synchronization in our network, I
noticed that my understanding of stratum levels differs slightly from
that of our monitoring tool (Nagios).
When I use ntpq -p on one of our hosts I get the following output:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.x.y.z        .PPS.            1 u  338 1024  377    0.454   -1.559   0.078
 name1.name      10.x.y.z         2 u  471 1024  376    0.209    1.763   0.093
 name2.name      10.x.y.z         2 u 1461 1024  374    0.001    2.652   0.206
 name3.name      10.x.y.z         2 u  596 1024  376    0.385    0.052   0.322
 LOCAL(0)        .LOCL.           5 l   59   64  377    0.000    0.000   0.001

Until now I read the line "*10.x.y.z  .PPS.  1 u  338 1024  377  0.454
-1.559  0.078" the following way:
The host I ran ntpq -p on is synchronized to (*) server 10.x.y.z, whose
time source is a pulse-per-second (PPS) clock. 10.x.y.z is a server at
stratum 1, and 338 seconds have passed since the last poll. The poll
interval is 1024 seconds.
Because the time reference is located at stratum 1, this host is located
at stratum 2.
Similarly, if I look at server name1.name: name1.name is a server
located at stratum 2 and is itself synchronized to server 10.x.y.z.
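My reading amounts to the simple rule that a host ends up one stratum
number above (i.e. one level below) its selected source; as a minimal
sketch:

```python
def own_stratum(source_stratum: int) -> int:
    """A host's stratum is one more than that of its selected sync source."""
    return source_stratum + 1

# 10.x.y.z (the '*' source) is at stratum 1, so this host runs at stratum 2;
# a host synchronizing to name1.name (stratum 2) would run at stratum 3.
print(own_stratum(1))  # 2
print(own_stratum(2))  # 3
```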
If this is correct so far, the following output confuses me. I ran a
time-synchronization check from Nagios. Nagios provides a couple of
checks, one of them called check_ntp_peer, which I use to detect
differences between expected and current stratum levels.

//libexec/check_ntp_peer -H name1.name -W 4 -C 6 -v

1 candiate peers available
synchronization source found
Getting offset, jitter and stratum for peer 4551
parsing offset from peer 4551: -0.001193
parsing stratum from peer 4551: 1
NTP OK: Offset -0.001193 secs,
stratum=1|offset=-0.001193s;60.00;120.00; stratum=1;4;6;0;16

This output tells me that name1.name is located at stratum 1 (instead
of the expected stratum 2).

In short: does the column "st" refer to the stratum of the remote host,
or does it mean "if I synchronize to that remote host, I will end up at
stratum x"?
Thanks and regards,
Stefan Nottorf



[ntp:questions] Command option iburst: usable in peer relations?

2008-10-20 Thread Nottorf, Stefan
Hello,
I am not quite sure when to use the iburst option. In our network there
are stratum 2 peers and stratum 3 peers. While configuring, I read about
the command options - among them iburst.
The official documentation
(http://www.eecis.udel.edu/~mills/ntp/html/confopt.html#cfg) reads:
"This option is valid only with the server command and type s addresses.
It is a recommended option with this command."
While the ntp.org site
(http://support.ntp.org/bin/view/Support/StartingNTP4#Section_7.1.4.1.)
states: 
"Use iburst in the appropriate peer or server lines in your
/etc/ntp.conf file for faster sync (i.e. ~8-15 seconds instead of ~5-10
minutes) "
Is the iburst option usable in peers or not?
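For reference, the two readings correspond to configuration lines like
the following (host names are placeholders):

```
# iburst on a server line - explicitly recommended by the official docs:
server stratum1.example iburst

# iburst on a peer line - suggested by the ntp.org wiki, but the official
# confopt page documents the option for the server command only:
peer   stratum2.example iburst
```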
Regards,
Stefan



Re: [ntp:questions] Slow convergence of NTP with GPS/PPS

2008-10-24 Thread Nottorf, Stefan
Nicola Berndt wrote:
> Btw, I have minpoll 4 maxpoll 4 in my ntp.conf and ntpq -p says:
> "poll 16". Is it the poll of ntpq -p or of ntpd? 
> 
> Best regards,
> ../nico

Hello,
ntpq -p shows the time in seconds between two polls (i.e. 16). In the
configuration file the poll interval is given as an exponent of 2, so
your entry of minpoll 4 maxpoll 4 means 2^4 = 16 seconds for both
minpoll and maxpoll. The display of 16 seconds by ntpq -p is therefore
correct.
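In other words (a quick sketch of the conversion):

```python
def poll_seconds(poll_exponent: int) -> int:
    """minpoll/maxpoll in ntp.conf are base-2 exponents of the interval in seconds."""
    return 2 ** poll_exponent

print(poll_seconds(4))   # 16 - matches the "poll 16" reported by ntpq -p
print(poll_seconds(6))   # 64 - the usual default minpoll
print(poll_seconds(10))  # 1024 - the usual default maxpoll
```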
Hope this helped,
Stefan Nottorf



Re: [ntp:questions] Testing Sync Across Several Systems

2009-07-20 Thread Nottorf, Stefan
Hello,
We use Nagios to monitor our systems - you can use one of the bundled
checks (check_ntp_time) to monitor the synchronization of your nodes.
You will also need the NRPE plugin for Nagios. No costs involved (except
your time, of course).
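A minimal invocation looks like this (host name and thresholds are
placeholders; -w and -c are the warning/critical offsets in seconds):

```
check_ntp_time -H <master-server> -w 0.5 -c 1.0
```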
Regards,
Stefan

-Original Message-
Greetings:

We have about 50 Linux/Solaris/Windows boxes running ntpd at several
different sites. Some of the systems go out of sync from time to time.
My question: is there a way to check that the ntpd machines are all in
sync with the master server?

I was thinking of using ssh to log on to each machine, run date, then go
back to the master, run date, and compare the two, but this seems
problematic at best. What do people do to check that all machines are
in sync?

Thanks in advance for any help.

   Tom



[ntp:questions] Large offset (>200 seconds) during system boot

2009-11-23 Thread Nottorf, Stefan
Hello,
I encountered a somewhat strange problem with ntpd after system boots.
We powered down a whole HP enclosure with HP blades, all running ntp
4.2@1.1570 (packaged with Red Hat Enterprise Linux Server 5.1, 64
bit). All of these blades use an ntp.conf like the following:

driftfile /var/lib/ntp/drift
server blade0 iburst version 4
peer blade1
peer blade2
peer blade3
peer blade4
peer blade5
peer blade6

The blades were running fine before the shutdown, with a worst-case
offset of 6 ms (measured against a Meinberg GPS 167 radio clock).
After completely shutting down and rebooting the enclosure, the blades
showed an offset of 211000 ms (+- 1500 ms). ntpq -p showed that all
sources had (more or less) the same offset.
After a restart of the ntp daemon (via /etc/init.d/ntpd restart) -
without changing anything in the config file - everything worked fine
again...
This also happens if I shut down only a single blade.
1) Has anybody else encountered this behaviour?
2) Has anybody found a workaround (especially for this version of ntp)?
3) Is there a known problem when many ntp daemons (~16) start at
approximately the same time?
4) Is there perhaps some well-hidden "synchronization mechanism" by HP
that sets the time in advance (in this case to a time 211 s off...)?

I might also wait for the release of ntp 4.2.6 if that would fix this
behaviour, but my customer is quite sensitive about being an "early
adopter" (regardless of how well tested the software is).
My current "workaround" (if it can be called that) is restarting ntpd
after the system has finished booting.
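On RHEL 5 this workaround could be automated at the end of the boot
sequence, e.g. (a sketch, assuming SysV init and that a fixed delay is
acceptable):

```
# appended to /etc/rc.local
sleep 60                    # give all blades time to finish booting
/etc/init.d/ntpd restart    # discard the suspect initial clock setting
```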
Best regards,
Stefan Nottorf
 


Re: [ntp:questions] Large offset (>200 seconds) during system boot

2009-11-24 Thread Nottorf, Stefan

> -Original Message-
> From: Kevin Oberman [mailto:ober...@es.net] 
> Sent: Monday, November 23, 2009 8:27 PM
> To: Nottorf, Stefan
> Cc: questions@lists.ntp.org
> Subject: Re: [ntp:questions] Large offset (>200 seconds) 
> during system boot 
> 
> 
> I don't believe that this has anything to do with NTP, but is the
> hardware. I suspect that the HP bladeserver has a single hardware
> clock that is used to set the time for any blade at boot time.
> Unfortunately, this time is off by about 211 seconds and the system
> calls that would normally update the hardware clock are not doing so.
> This is not unreasonable when a single HW clock is used for multiple
> systems, as one system could mess up the time for all of them if it
> mis-set the time.
> 
> I am puzzled by one thing... where is the "real" time coming from? If
> all systems are off by 211 seconds, ntp is getting the "correct" time
> from somewhere, but I don't know where. If you have no external time
> source, I am confused. You mention a Meinberg, but I don't see any
> indication of how it fits in.
> -- 
> R. Kevin Oberman, Network Engineer
> Energy Sciences Network (ESnet)
> Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
> E-mail: ober...@es.netPhone: +1 510 486-8634
> Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751
Hello,
thanks for the response. Of course, you are right - in the ntp.conf that
I copied and pasted into my question there was no reference to the
Meinberg clock. The blade this ntp.conf was copied from is intended to
operate at stratum 3 and also peers with other stratum 3 blades; its
server entry refers to a blade that is intended to operate at stratum 2
(and thus gets its time from the Meinberg clock, which operates at
stratum 1). We use a Nagios check to regularly compare the
synchronisation of each blade against the Meinberg clock
(check_ntp_time -H  -w 0.015 -c 0.05) and notify the admins if the time
of a blade differs too much (as defined by the thresholds used) from the
Meinberg.
I'll check with our hardware vendor whether there is such a feature (and
how I can either stop it from messing with the hardware clock or have it
get the time from the Meinberg before it sets the time of the blades).
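For reference, the battery-backed hardware clock can be inspected, and
rewritten from the ntp-disciplined system time, with hwclock (run as
root):

```
hwclock --show      # print the current hardware (RTC) time
hwclock --systohc   # copy the system time into the hardware clock
```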
Best regards,
Stefan