Re: [Linux-cluster] NTP sync cause CNAM shutdown

BONNETOT Jean-Daniel (EXT THALES) Thu, 13 Oct 2011 09:32:27 -0700

Thanks for your answer, it helped me find my way ;)
I saw the "-x" option for ntpd, but it's not the only thing to apply.


First, I had to solve my timezone problem.
-> Hwclock set to GMT in the BIOS (UTC if you prefer)
-> timezone --utc Europe/Paris in kickstart, or set ZONE="Europe/Paris" and 
UTC=true in /etc/sysconfig/clock
With these two settings the kernel gets the right time at boot: it reads the 
time from the hwclock and knows that it has to apply my timezone on top of it.
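
A quick way to verify the second setting on a node might look like the sketch below. It assumes the clock file only contains simple KEY=value lines such as ZONE and UTC; check_clock_config is a made-up helper name, not a standard tool.

```shell
#!/bin/sh
# check_clock_config (hypothetical helper): print ZONE and UTC from a
# sysconfig clock file and succeed only if the hardware clock is declared UTC.
check_clock_config() {
    file="${1:-/etc/sysconfig/clock}"
    # Source the file in a subshell so the caller's variables stay untouched.
    (
        ZONE="" UTC=""
        . "$file" || exit 2
        echo "ZONE=$ZONE UTC=$UTC"
        [ "$UTC" = "true" ]
    )
}
```

Calling check_clock_config with no argument reads /etc/sysconfig/clock; a non-zero exit status means UTC is not set to true there.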

Then, I added the "-x" option in /etc/sysconfig/ntpd to tell ntpd not to make big steps.
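
For reference, adding the flag could be scripted roughly as follows. This is only a sketch: it assumes the file holds a single OPTIONS="..." line, and add_slew_option is a hypothetical helper name. A backup copy with a .bak suffix is left next to the file.

```shell
#!/bin/sh
# add_slew_option (hypothetical helper): make sure "-x" is among the ntpd
# OPTIONS in a sysconfig-style file. Assumes one OPTIONS="..." line.
add_slew_option() {
    file="${1:-/etc/sysconfig/ntpd}"
    # Nothing to do if -x is already one of the options.
    if grep -Eq '^OPTIONS=.*[" ]-x([" ]|$)' "$file"; then
        return 0
    fi
    # Insert -x right after the opening quote, keeping the other options.
    sed -i.bak 's/^OPTIONS="/OPTIONS="-x /' "$file"
}
```

ntpd has to be restarted afterwards for the new option to take effect.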

As a result, here is the boot log before the change:
Oct 13 12:02:20 s64lmwbig3b ntpd[7996]: ntpd 4.2.2p1@1.1570-o Thu Nov 26 
11:34:34 UTC 2009 (1)
Oct 13 12:02:20 s64lmwbig3b ntpd[7997]: precision = 1.000 usec
Oct 13 12:02:20 s64lmwbig3b ntpd[7997]: Listening on interface wildcard, 
0.0.0.0#123 Disabled
...
Oct 13 12:02:20 s64lmwbig3b ntpd[7997]: Listening on interface bond0, 
10.151.231.215#123 Enabled <== 2H TIME JUMP
Oct 13 14:02:31 s64lmwbig3b openais[7701]: [TOTEM] The token was lost in the 
OPERATIONAL state.
Oct 13 14:02:31 s64lmwbig3b openais[7701]: [TOTEM] Receive multicast socket 
recv buffer size (320000 bytes).
Oct 13 14:02:31 s64lmwbig3b openais[7701]: [TOTEM] Transmit multicast socket 
send buffer size (262142 bytes).
Oct 13 14:02:31 s64lmwbig3b openais[7701]: [TOTEM] entering GATHER state from 2.
=> CMAN crashed

Boot log now:
Oct 13 16:10:08 s64lmwbig3b clvmd: Cluster LVM daemon started - connected to 
CMAN
...
Oct 13 16:10:27 s64lmwbig3b ntpdate[7971]: step time server 10.151.156.87 
offset 1.306150 sec <== 1S TIME JUMP
Oct 13 16:10:29 s64lmwbig3b ntpd[7975]: ntpd 4.2.2p1@1.1570-o Thu Nov 26 
11:34:34 UTC 2009 (1)
Oct 13 16:10:29 s64lmwbig3b ntpd[7976]: precision = 1.000 usec
...
Oct 13 16:10:40 s64lmwbig3b modclusterd: startup succeeded
=> CMAN up and running
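
To spot such jumps without reading the whole log, something like the following sketch may help. It assumes classic syslog lines starting with "Mon DD HH:MM:SS" and only compares entries logged on the same day; find_time_jumps is a hypothetical name.

```shell
#!/bin/sh
# find_time_jumps (hypothetical helper): print log entries whose timestamp is
# more than LIMIT seconds later than the previous timestamped entry.
# Usage: find_time_jumps LIMIT_SECONDS LOGFILE
find_time_jumps() {
    awk -v limit="$1" '
        # Only lines with an HH:MM:SS third field carry a timestamp;
        # wrapped continuation lines are skipped.
        $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($3, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen && $2 == day && now - prev > limit)
                printf "jump of %d s before: %s\n", now - prev, $0
            prev = now; day = $2; seen = 1
        }
    ' "$2"
}
```

For example, find_time_jumps 60 /var/log/messages would report the gap between the 12:02:20 and 14:02:31 entries in the first log above as a jump of 7211 s.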

I looked for the FAQ you mentioned but found nothing; could you post the link 
when you have time? ;)

Jean-Daniel BONNETOT

-----Original Message-----
From: linux-cluster-boun...@redhat.com [mailto:linux-cluster-boun...@redhat.com] 
On Behalf Of Alvaro Jose Fernandez
Sent: Wednesday, 12 October 2011 17:52
To: linux clustering
Subject: Re: [Linux-cluster] NTP sync cause CNAM shutdown

Jean,

I too suffered the same issue, opened a case with support, etc. The best options 
for running ntpd with RHCS are:

-First, start the cman, rgmanager, etc. (I mean, all the RHCS daemons) always 
after ntpd startup. In RHEL5 at least the default is the other way around. 

You can do that by disabling automatic startup of all RHCS daemons (via 
chkconfig off) and then starting them explicitly from your rc.local init 
script, as the last action of the init sequence (i.e., after the network, basic 
systems, and most importantly after ntpd has initially adjusted the clock via 
its "ntpdate" call).

Be aware that if you do the above, you must explicitly (manually) stop them if 
you need to shut down the cluster or the nodes, as with this hack the init 
scripts of cman, rgmanager, etc. won't run for the "kill"/shutdown sequence.

-Start ntpd in "slew" mode (-x startup flag), set in the configuration 
file. Running it in slew mode makes ntpd adjust the time over a long time 
span, enough to ensure that CMAN's internal timings won't get messed up.
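
To give an idea of "a long time span": per the ntpd documentation, the kernel slew rate is limited to 0.5 ms per second, so each second of offset takes about 2000 s to absorb, and -x raises the step threshold from 128 ms to 600 s. Larger offsets are still stepped, which would apply to a 7200 s offset like the one reported in this thread. A small sketch of the arithmetic (slew_duration is a hypothetical helper name):

```shell
#!/bin/sh
# slew_duration (hypothetical helper): seconds needed to slew away an offset
# given in seconds, assuming the 0.5 ms/s kernel slew limit (factor 2000).
slew_duration() {
    awk -v offset="$1" 'BEGIN {
        if (offset < 0) offset = -offset   # direction does not matter
        printf "%.1f\n", offset * 2000
    }'
}
```

With these assumptions, slew_duration 1.306150 prints 2612.3 (about 44 minutes), and slew_duration 600 prints 1200000.0 (almost 14 days).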

Using that hack was Ok for me, no more node evictions or unexpected problems 
since.
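
The first option (starting the RHCS daemons from rc.local) could be sketched roughly as below. The service list and function names are assumptions for illustration, so adjust them to what is actually installed on your nodes. Setting DRY_RUN to any value prints the commands instead of running them.

```shell
#!/bin/sh
# Assumed service list, in start order; adjust to your installation.
RHCS_SERVICES="cman clvmd rgmanager"

# run: print the command when DRY_RUN is set, execute it otherwise.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# One-time setup: remove the cluster daemons from the automatic boot sequence.
disable_rhcs_autostart() {
    for svc in $RHCS_SERVICES; do run chkconfig "$svc" off; done
}

# Meant to be called at the end of rc.local, once ntpd has been started.
start_rhcs() {
    for svc in $RHCS_SERVICES; do run service "$svc" start; done
}

# Meant to be run by hand before a shutdown or reboot: stop in reverse order.
stop_rhcs() {
    reversed=""
    for svc in $RHCS_SERVICES; do reversed="$svc $reversed"; done
    for svc in $reversed; do run service "$svc" stop; done
}
```

Since the init scripts no longer run at shutdown, stop_rhcs (or the equivalent commands) has to be run manually before halting a node.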

There is a FAQ and best practices document on Red Hat Network for NTPD and RHCS, 
updated a few months ago as I recall. Just search for it on the Red Hat Network 
website (sorry, I don't have the link to the document at the moment).

regards,


Álvaro Fernández 
 Departamento de Sistemas

-------
Hi,

I posted a previous email asking what was wrong in my two-node cluster.conf. I 
think I found it and have some questions.

The problem was that the two nodes boot and join, then cman shuts down with:
Oct 12 15:55:30 s64lmwbig3c openais[7672]: [MAIN ] Killing node s64lmwbig3b 
because it has rejoined the cluster with existing state
Oct 12 15:55:30 s64lmwbig3c openais[7672]: [CMAN ] cman killed by node 1 
because we rejoined the cluster without a full restart

A few seconds before, ntpd synced and jumped forward by 7200 sec (2 hours, my 
timezone is GMT+2).

My questions are:
Which time do you set in your BIOS (GMT or your time zone)?
Do you use ntpd? All the documentation says to use it.
What are the best practices for ntp and RHCS?

Jean-Daniel BONNETOT

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

