Hi all,

I keep getting node reboots across my cluster. They seem random: the node being evicted changes, and it only happens every now and then.

I'm running RHEL 4, kernel 2.6.89.0.26.ELsmp, and OCFS2 1.2.9 Mon Jun 21 20:03:07 PDT 2010 (build 5e8325ec7f66b5189c65c7a8710fe8cb).

I am using OCFS2 as a general-purpose filesystem (i.e. not for Oracle datafiles, OCR, etc.), with the following entries in /etc/fstab:

/dev/emcpowera1 /u01/cfs ocfs2 _netdev 0 0

As a general-purpose filesystem, should I be using the nointr mount option?

Output of /etc/init.d/o2cb status:

Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster UATocfs2: Online
Heartbeat dead threshold: 61
Network idle timeout: 60000
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Active

/var/log/messages on the node being rebooted doesn't show anything leading up to the reboot; it shows only the restart itself:

Dec 8 00:59:02 dcapp01 syslogd 1.4.1: restart.

On the other nodes, I see the following entries:

Dec 8 01:56:01 dcapp02 kernel: o2net: connection to node dcapp01 (num 0) at 10.255.255.1:10007 has been idle for 60.0 seconds, shutting it down.
Dec 8 01:56:01 dcapp02 kernel: (0,3):o2net_idle_timer:1426 here are some times that might help debug the situation: (tmr 1291733701.691575 now 1291733761.692608 dr 1291733701.690949 adv 1291733701.696965:1291733701.696967 func (d399da91:500) 1291733701.691576:1291733701.696950)
Dec 8 01:56:01 dcapp02 kernel: o2net: no longer connected to node dcapp01 (num 0) at 10.255.255.1:10007
Dec 8 01:57:01 dcapp02 kernel: (16082,3):o2net_connect_expired:1585 ERROR: no connection established with node 0 after 60.0 seconds, giving up and returning errors.
Dec 8 01:57:01 dcapp02 kernel: (4215,2):dlm_send_remote_convert_request:398 ERROR: status = -107
Dec 8 01:57:01 dcapp02 kernel: (4215,2):dlm_wait_for_node_death:365 C5C06C9B675D41B99B60DE2EB28CE0F7: waiting 5000ms for notification of death of node 0
Dec 8 01:57:04 dcapp02 kernel: (16082,3):ocfs2_dlm_eviction_cb:119 device (120,1): dlm has evicted node 0
Dec 8 01:57:05 dcapp02 kernel: (4269,0):dlm_send_remote_convert_request:398 ERROR: status = -107
Dec 8 01:57:05 dcapp02 kernel: (4269,0):dlm_wait_for_node_death:365 D43AF814A25845F7B103EBBEA440BA18: waiting 5000ms for notification of death of node 0
Dec 8 01:57:05 dcapp02 kernel: (16082,3):ocfs2_dlm_eviction_cb:119 device (120,66): dlm has evicted node 0
Dec 8 01:57:06 dcapp02 kernel: (16082,3):ocfs2_dlm_eviction_cb:119 device (120,65): dlm has evicted node 0
Dec 8 02:00:15 dcapp02 kernel: o2net: connected to node dcapp01 (num 0) at 10.255.255.1:10007
Dec 8 02:00:29 dcapp02 kernel: ocfs2_dlm: Node 0 joins domain C5C06C9B675D41B99B60DE2EB28CE0F7
Dec 8 02:00:29 dcapp02 kernel: ocfs2_dlm: Nodes in domain ("C5C06C9B675D41B99B60DE2EB28CE0F7"): 0 1 2 3 6 7 8 9 10 11 14 15
Dec 8 02:00:35 dcapp02 kernel: ocfs2_dlm: Node 0 joins domain 97F22666B5A6494AAF38C53909275DB2
Dec 8 02:00:35 dcapp02 kernel: ocfs2_dlm: Nodes in domain ("97F22666B5A6494AAF38C53909275DB2"): 0 1 2 3
Dec 8 02:00:39 dcapp02 kernel: ocfs2_dlm: Node 0 joins domain D43AF814A25845F7B103EBBEA440BA18
Dec 8 02:00:39 dcapp02 kernel: ocfs2_dlm: Nodes in domain ("D43AF814A25845F7B103EBBEA440BA18"): 0 1 2 3

I would really appreciate some help with this, as I'm not sure where to go from here.

Thanks,
Neil
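
As a reading aid for the o2net_idle_timer line quoted above, here is a minimal Python sketch that pulls the timestamps out of that message and shows how far each one lies before "now". It is not part of OCFS2 or its tools. The function names parse_idle_timer and seconds_before_now are made up for illustration, and the only assumption about the fields is that they are epoch timestamps in seconds, as they appear to be.

```python
import re
from decimal import Decimal

# Idle-timer message copied from the dcapp02 log quoted above.
LINE = (
    "(tmr 1291733701.691575 now 1291733761.692608 dr 1291733701.690949 "
    "adv 1291733701.696965:1291733701.696967 "
    "func (d399da91:500) 1291733701.691576:1291733701.696950)"
)

# Hypothetical helper: one named group per timestamp in the message.
PATTERN = re.compile(
    r"tmr (?P<tmr>\d+\.\d+) now (?P<now>\d+\.\d+) dr (?P<dr>\d+\.\d+) "
    r"adv (?P<adv_start>\d+\.\d+):(?P<adv_stop>\d+\.\d+) "
    r"func \([0-9a-f]+:\d+\) "
    r"(?P<func_start>\d+\.\d+):(?P<func_stop>\d+\.\d+)"
)


def parse_idle_timer(line):
    """Return the timestamps in the message as exact Decimals, keyed by field."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not an o2net idle timer message")
    return {name: Decimal(value) for name, value in match.groupdict().items()}


def seconds_before_now(times):
    """Return, for every field except 'now', how many seconds it precedes 'now'."""
    now = times["now"]
    return {name: now - value for name, value in times.items() if name != "now"}


gaps = seconds_before_now(parse_idle_timer(LINE))
for name, gap in gaps.items():
    print(f"{name:>10}: {gap} s before 'now'")
```

For the line above, the tmr gap comes out at 60.001033 seconds, which lines up with the "idle for 60.0 seconds" message and the 60000 ms network idle timeout reported by o2cb status. Decimal is used instead of float so that the microsecond digits are not lost to rounding.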