Re: [Linux-cluster] NTP sync cause CNAM shutdown

Alvaro Jose Fernandez Thu, 13 Oct 2011 10:08:42 -0700

Hi Jean,

The DOC is https://access.redhat.com/kb/docs/DOC-42471 .


But, as Steven Drake said in a previous email, if you *can* upgrade to RHEL6, 
that would surely be the best option (I just cannot upgrade my customer; he 
will stay on RHEL 5.x until the end). 

In RHEL6 the cluster daemons are different and use a different API, not the 
openais one.

Best regards.

Álvaro Fernández 
 Systems Department
  
________________________________

SIVSA, Soluciones Informáticas S.A. 
Arenal nº 18 · 3ª Planta · 36201 · Vigo 
Phone: (+34) 986 092 100
Fax: (+34) 986 092 219
e-mail: [email protected]
www.sivsa.com
Spain
 

-----Original Message-----
From: [email protected] [mailto:[email protected]] 
On behalf of BONNETOT Jean-Daniel (EXT THALES)
Sent: Thursday, 13 October 2011 18:15
To: linux clustering
Subject: Re: [Linux-cluster] NTP sync cause CNAM shutdown

Thanks for your answer, it helped me find my way ;) I saw the "-x" option for 
ntpd, but it's not the only thing to apply.

First, I had to solve my timezone problem:
-> Hardware clock set to GMT in the BIOS (UTC if you prefer)
-> "timezone --utc Europe/Paris" in kickstart, or ZONE="Europe/Paris" and 
   UTC=true in /etc/sysconfig/clock
With these two settings the system comes up with the right time at boot: it 
gets the time from the hardware clock and knows that it has to apply my 
timezone on top of it.
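
As a minimal sketch, the /etc/sysconfig/clock variant mentioned above would look roughly like this. Only the two variables named in the message are shown; any other settings already in the file are assumed to stay untouched.

```shell
# /etc/sysconfig/clock (sketch; only the two settings named in the message)
ZONE="Europe/Paris"   # local timezone applied on top of the hardware clock
UTC=true              # the hardware (BIOS) clock is kept in UTC/GMT
```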

Then I added the "-x" option in /etc/sysconfig/ntpd to tell ntpd not to make 
big steps.
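
For illustration, on RHEL 5 the ntpd init script takes its flags from the OPTIONS variable in /etc/sysconfig/ntpd, so the change could look like the sketch below. The flags other than -x are assumed to be the stock defaults; keep whatever your file already contains and only add -x.

```shell
# /etc/sysconfig/ntpd (sketch)
# -x raises ntpd's step threshold (to 600 s according to the ntpd man page),
# so offsets below that are slewed gradually instead of stepped.
# Assumption: the remaining flags are the distribution defaults.
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
```

ntpd has to be restarted for the new flags to take effect.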

As a result, boot time before:
Oct 13 12:02:20 s64lmwbig3b ntpd[7996]: ntpd [email protected] Thu Nov 26 11:34:34 UTC 2009 (1)
Oct 13 12:02:20 s64lmwbig3b ntpd[7997]: precision = 1.000 usec
Oct 13 12:02:20 s64lmwbig3b ntpd[7997]: Listening on interface wildcard, 0.0.0.0#123 Disabled
...
Oct 13 12:02:20 s64lmwbig3b ntpd[7997]: Listening on interface bond0, 10.151.231.215#123 Enabled
<== 2H TIME JUMP
Oct 13 14:02:31 s64lmwbig3b openais[7701]: [TOTEM] The token was lost in the OPERATIONAL state.
Oct 13 14:02:31 s64lmwbig3b openais[7701]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Oct 13 14:02:31 s64lmwbig3b openais[7701]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Oct 13 14:02:31 s64lmwbig3b openais[7701]: [TOTEM] entering GATHER state from 2.
=> CMAN crashed

Boot time now:
Oct 13 16:10:08 s64lmwbig3b clvmd: Cluster LVM daemon started - connected to CMAN
...
Oct 13 16:10:27 s64lmwbig3b ntpdate[7971]: step time server 10.151.156.87 offset 1.306150 sec
<== 1S TIME JUMP
Oct 13 16:10:29 s64lmwbig3b ntpd[7975]: ntpd [email protected] Thu Nov 26 11:34:34 UTC 2009 (1)
Oct 13 16:10:29 s64lmwbig3b ntpd[7976]: precision = 1.000 usec
...
Oct 13 16:10:40 s64lmwbig3b modclusterd: startup succeeded
=> CMAN up and running
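
To put the two jumps in perspective: the ntpd documentation gives a maximum slew rate of 500 ppm, so each second of offset takes about 2000 seconds to slew away. The sketch below estimates how long pure slewing would need for the two offsets seen in the logs. The function name slew_seconds is only illustrative, and the offsets are rounded to whole milliseconds.

```shell
#!/bin/sh
# Maximum slew rate of ntpd: 500 ppm, i.e. 500 microseconds corrected per second.
SLEW_PPM=500

# slew_seconds OFFSET_MS
# Prints the number of seconds needed to slew away an offset given in
# milliseconds (offset in microseconds divided by the slew rate).
slew_seconds() {
    echo $(( $1 * 1000 / SLEW_PPM ))
}

# 1.306 s offset (the ntpdate step in the "now" log)
echo "1.306 s offset: $(slew_seconds 1306) s of slewing"

# 7200 s offset (the 2-hour jump in the "before" log), shown in whole days
echo "7200 s offset:  $(( $(slew_seconds 7200000) / 86400 )) days of slewing"
```

That is roughly 44 minutes for the small offset versus about 166 days for the 2-hour one, which suggests why the timezone error had to be fixed at the source rather than left to "-x".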

I looked for the FAQ you mentioned but found nothing; if you can post the link 
when you have time ;)

Jean-Daniel BONNETOT

-----Original Message-----
From: [email protected] [mailto:[email protected]] 
On behalf of Alvaro Jose Fernandez
Sent: Wednesday, 12 October 2011 17:52
To: linux clustering
Subject: Re: [Linux-cluster] NTP sync cause CNAM shutdown

Jean,

I too suffered from the same issue, opened a case with support, etc. The best 
options when running ntpd together with RHCS are:

-First, always start cman, rgmanager, etc. (I mean, all the RHCS daemons) 
after ntpd has started. In RHEL5, at least, the default is the other way around. 

You can do that by disabling automatic startup of all RHCS daemons (via 
chkconfig off) and then starting them explicitly from your rc.local init 
script, as the last action of the init sequence (i.e., after the network, the 
basic system services and, most importantly, after ntpd has initially adjusted 
the clock via its "ntpdate" call).

Be aware that if you do the above, you must explicitly (manually) stop them if 
you need to shut down the cluster or the nodes, because with this hack the init 
scripts of cman, rgmanager, etc. won't run in the "kill"/shutdown sequence.

-Start ntpd in "slew" mode (the -x startup flag), set in its configuration 
file. Running it in slew mode makes ntpd adjust the time gradually over a long 
time span, enough to ensure that CMAN's internal timings don't get messed up.
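
The first point could be sketched with a few shell helpers like the ones below. This is only an outline under assumptions: the service list (cman, clvmd, rgmanager) and its order must be adapted to the daemons actually in use, and the function names and the DRY_RUN variable are hypothetical helpers for this example, not part of RHCS.

```shell
#!/bin/sh
# Assumption: the RHCS daemons in use, listed in start order.
RHCS_SERVICES="cman clvmd rgmanager"

# Run a command, or just print it when DRY_RUN=1 (hypothetical test helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One-time setup: remove the daemons from automatic startup.
disable_rhcs_autostart() {
    for svc in $RHCS_SERVICES; do
        run chkconfig "$svc" off
    done
}

# To be called at the very end of rc.local, once ntpd has set the clock.
start_rhcs() {
    for svc in $RHCS_SERVICES; do
        run service "$svc" start
    done
}

# To be called by hand before a shutdown or reboot, since the init
# "kill" scripts no longer run for these daemons. Stops in reverse order.
stop_rhcs() {
    reversed=""
    for svc in $RHCS_SERVICES; do
        reversed="$svc $reversed"
    done
    for svc in $reversed; do
        run service "$svc" stop
    done
}
```

With DRY_RUN=1 set, start_rhcs and stop_rhcs only print the commands they would run, which makes it easy to check the order before putting a start_rhcs call at the end of rc.local.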

That hack worked for me: no more node evictions or unexpected problems since.

There is a FAQ and best practices document on Red Hat Network for NTPD and 
RHCS, updated a few months ago as I recall. Just search for it on the Red Hat 
Network website (sorry, I don't have the link for the DOC at the moment).

regards,


Álvaro Fernández
 Systems Department

-------
Hi,

I posted a previous email asking what was wrong in my two-node cluster.conf. I 
think I found it and have some questions.

The problem was that the two nodes boot and join, then cman shuts down with:
Oct 12 15:55:30 s64lmwbig3c openais[7672]: [MAIN ] Killing node s64lmwbig3b because it has rejoined the cluster with existing state
Oct 12 15:55:30 s64lmwbig3c openais[7672]: [CMAN ] cman killed by node 1 because we rejoined the cluster without a full restart

A few seconds before, ntpd synced and jumped forward by 7200 sec (2 hours; my 
timezone is GMT+2).

My questions are:
Which time do you set in your BIOS (GMT or your local time zone)?
Do you use ntpd? All the documentation says to use it.
What are the best practices for ntp and RHCS?

Jean-Daniel BONNETOT

--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster