Hi,

 

I'm running a two-node failover cluster. Yesterday the cluster attempted 
a state transition. In the log files I found the following entries:

 

heartbeat[6905]: 2009/02/10_21:45:55 WARN: node nagios-drbd2: is dead

heartbeat[6905]: 2009/02/10_21:45:55 info: Link nagios-drbd2:eth1 dead.

 

A few seconds later the node that was still alive tried to take over the 
resources and wrote the following entries to the log file (the resource 
"ipaddress" is just an example; there are many more entries for the other 
resources that were running on the cluster):

 

pengine[7370]: 2009/02/10_21:45:59 WARN: custom_action: Action resource_nagios_ipaddress_stop_0 on nagios-drbd2 is unrunnable (offline)

pengine[7370]: 2009/02/10_21:45:59 WARN: custom_action: Marking node nagios-drbd2 unclean

 

Furthermore, there are several entries like this one:

 

stonithd[6916]: 2009/02/10_21:46:30 ERROR: Failed to STONITH the node nagios-drbd2: optype=RESET, op_result=TIMEOUT

 

STONITH runs via ssh over a direct link between the two nodes. Since 
node 2 was down, the shutdown command never reached its destination.

 

My questions are:

Why did the surviving node try to stop resources on a cluster node that is 
considered dead?

Why did STONITH try to shut down a node that is considered down? (For safety 
reasons, I assume.)

Shouldn't the resources simply be started on the surviving node without any 
further action?

Did I miss something in the default behaviour of heartbeat? Maybe a timeout?

Would a hardware STONITH device solve such problems in the future?

 

Entries like the ones shown above fill the log from the time the node was 
found down until I reached my workstation this morning.
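
To quantify how often these entries repeat, a small script along the following lines could summarize the log. This is only a sketch: it assumes each entry sits on a single line in the "daemon[pid]: date_time LEVEL: message" layout quoted above, and the names parse_line, summarize and SAMPLE are made up for illustration and are not part of heartbeat.

```python
import re
from collections import Counter
from datetime import datetime

# Matches entries such as:
#   stonithd[6916]: 2009/02/10_21:46:30 ERROR: Failed to STONITH ...
# The layout is inferred from the excerpts quoted in this mail.
LINE_RE = re.compile(
    r"^(?P<daemon>[\w-]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<ts>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>\w+):\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict for one log entry, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "daemon": m.group("daemon"),
        "time": datetime.strptime(m.group("ts"), "%Y/%m/%d_%H:%M:%S"),
        "level": m.group("level"),
        "msg": m.group("msg"),
    }


def summarize(lines):
    """Count entries per (daemon, level) and track first/last timestamp."""
    counts = Counter()
    first = last = None
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip blank or unrecognized lines
        counts[(entry["daemon"], entry["level"])] += 1
        t = entry["time"]
        if first is None or t < first:
            first = t
        if last is None or t > last:
            last = t
    return counts, first, last


# Sample input: entries copied from the excerpts above, one per line.
SAMPLE = [
    "heartbeat[6905]: 2009/02/10_21:45:55 WARN: node nagios-drbd2: is dead",
    "heartbeat[6905]: 2009/02/10_21:45:55 info: Link nagios-drbd2:eth1 dead.",
    "pengine[7370]: 2009/02/10_21:45:59 WARN: custom_action: Marking node nagios-drbd2 unclean",
    "stonithd[6916]: 2009/02/10_21:46:30 ERROR: Failed to STONITH the node nagios-drbd2: optype=RESET, op_result=TIMEOUT",
]

if __name__ == "__main__":
    counts, first, last = summarize(SAMPLE)
    for (daemon, level), n in sorted(counts.items()):
        print(f"{daemon:10s} {level:6s} {n}")
    print("time span:", last - first)
```

Feeding the real log file to summarize() instead of SAMPLE (for example, passing an open file object) would show how many STONITH failures were logged and over what period.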

 

With kind regards

Kai Zemke

 

===========================================================
smartnet Online Service GmbH, Schnackenburgallee 177, 22525 Hamburg
===========================================================
Managing directors: Christian Suding, Claus Masch
VAT ID: DE191136350
Commercial register: HRB 66463
Tax number: FA: Hamburg 54/855/01047
Phone: +49 (0) 40 5540-0
Fax: +49 (0) 40 5540-1040
kai.ze...@smartnet.de
For more information see: http://www.smartnet.de
===========================================================

 
