Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node
Thanks for your push on this. I don't know why I didn't check this before. It looks like we have a problem with our interconnect. Our SA is checking the hardware now.

[r...@nyclx1 ~]# more /etc/ocfs2/cluster.conf
node:
        ip_port =
        ip_address = 192.168.0.218
        number = 0
        name = nyclx1
        cluster = tiaa

node:
        ip_port =
        ip_address = 192.168.0.217
        number = 1
        name = nyclx2
        cluster = tiaa

cluster:
        node_count = 2
        name = tiaa

[r...@nyclx1 ~]# ping 192.168.0.217
PING 192.168.0.217 (192.168.0.217) 56(84) bytes of data.
From 192.168.0.218 icmp_seq=2 Destination Host Unreachable
From 192.168.0.218 icmp_seq=3 Destination Host Unreachable
From 192.168.0.218 icmp_seq=4 Destination Host Unreachable

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
Sent: Wednesday, June 03, 2009 6:19 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

Do on both nodes:
$ netstat -ta --numeric-ports

Maybe port is already in use. Check your setup again. Ensure cluster.conf is the same on both nodes. And that the ips are correct. That tcpdump was capturing the traffic on the correct interface. etc. etc.
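The ping check above can be repeated for every address in cluster.conf rather than one at a time. What follows is only a rough sketch, not something posted in the thread: the function names conf_ips and check_interconnect are made up for illustration, and it assumes the Linux iputils ping (its -c and -W options).

```shell
#!/bin/sh
# Hypothetical helpers, not part of ocfs2-tools.

# Print every ip_address value found in an o2cb cluster.conf-style file.
conf_ips() {
    # Match lines such as "ip_address = 192.168.0.217" and print the value.
    sed -n 's/^[[:space:]]*ip_address[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$1"
}

# Ping each configured address once and report the ones that do not answer.
check_interconnect() {
    for ip in $(conf_ips "$1"); do
        if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
            echo "$ip reachable"
        else
            echo "$ip UNREACHABLE"
        fi
    done
}

# Usage (path as shown in the thread):
#   check_interconnect /etc/ocfs2/cluster.conf
```

Run on each node in turn; an UNREACHABLE line for the peer's address points at the interconnect rather than at O2CB itself.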
This message, including any attachments, contains confidential information intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, please contact the sender immediately by reply e-mail and destroy all copies. You are hereby notified that any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited. TIAA-CREF ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
[Ocfs2-users] O2CB heartbeat not active on 2nd node
We are having trouble getting the 2nd node in our 2 node RAC configuration to have an active O2CB heartbeat. We have our OCR and voting disks on an OCFS2 mount point, so we cannot bring up Clusterware on this node. I'm at a loss as to what the issue is. It was running fine for a few weeks, then we had a reboot and we cannot get the heartbeat active and we cannot mount any OCFS2 filesystems on the 2nd node. Any ideas are greatly appreciated. Dmesg errors are at the bottom.

Here are the rpm and status details:

[r...@nyclx1 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.4.1-1.el5
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5

[r...@nyclx1 ~]# /etc/init.d/o2cb status
Driver for configfs: Loaded
Filesystem configfs: Mounted
Driver for ocfs2_dlmfs: Loaded
Filesystem ocfs2_dlmfs: Mounted
Checking O2CB cluster tiaa: Online
Heartbeat dead threshold = 31
Network idle timeout: 3
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Active

[r...@nyclx2 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5

[r...@nyclx2 ~]# /etc/init.d/o2cb status
Driver for configfs: Loaded
Filesystem configfs: Mounted
Driver for ocfs2_dlmfs: Loaded
Filesystem ocfs2_dlmfs: Mounted
Checking O2CB cluster tiaa: Online
Heartbeat dead threshold = 31
Network idle timeout: 3
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Not active

OCFS2 1.4.1 Wed Jul 23 12:05:34 PDT 2008 (build 3fc82af4b5669945497b322b6aabd031)
(11296,1):o2net_connect_expired:1637 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(14212,1):dlm_request_join:1033 ERROR: status = -107
(14212,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(14212,1):dlm_join_domain:1485 ERROR: status = -107
(14212,1):dlm_register_domain:1732 ERROR: status = -107
(14212,1):ocfs2_dlm_init:2662 ERROR: status = -107
(14212,1):ocfs2_mount_volume:1251 ERROR: status = -107
ocfs2: Unmounting device (253,2) on (node 1)
(11296,1):o2net_connect_expired:1637 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(14350,1):dlm_request_join:1033 ERROR: status = -107
(14350,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(14350,1):dlm_join_domain:1485 ERROR: status = -107
(14350,1):dlm_register_domain:1732 ERROR: status = -107
(14350,1):ocfs2_dlm_init:2662 ERROR: status = -107
(14350,1):ocfs2_mount_volume:1251 ERROR: status = -107
ocfs2: Unmounting device (253,3) on (node 1)
(11296,1):o2net_connect_expired:1637 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(4347,1):dlm_request_join:1033 ERROR: status = -107
(4347,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(4347,1):dlm_join_domain:1485 ERROR: status = -107
(4347,1):dlm_register_domain:1732 ERROR: status = -107
(4347,1):ocfs2_dlm_init:2662 ERROR: status = -107
(4347,1):ocfs2_mount_volume:1251 ERROR: status = -107
ocfs2: Unmounting device (253,3) on (node 1)
(11296,1):o2net_connect_expired:1637 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(4948,1):dlm_request_join:1033 ERROR: status = -107
(4948,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(4948,1):dlm_join_domain:1485 ERROR: status = -107
(4948,1):dlm_register_domain:1732 ERROR: status = -107
(4948,1):ocfs2_dlm_init:2662 ERROR: status = -107
(4948,1):ocfs2_mount_volume:1251 ERROR: status = -107
ocfs2: Unmounting device (253,3) on (node 1)
OCFS2 Node Manager 1.4.1 Wed Jul 23 12:05:37 PDT 2008 (build 0f78045c75c0174e50e4cf0934bf9eae)
OCFS2 DLM 1.4.1 Wed Jul 23 12:05:37 PDT 2008 (build 4ce8fae327880c466761f40fb7619490)
OCFS2 DLMFS 1.4.1 Wed Jul 23 12:05:37 PDT 2008 (build 4ce8fae327880c466761f40fb7619490)
OCFS2 User DLM kernel interface loaded
[r...@nyclx2 ~]#

Reid McKinley
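A note on reading the dmesg output above: the kernel reports negated errno values, so on Linux status = -107 is ENOTCONN ("Transport endpoint is not connected") and status = -22 is EINVAL ("Invalid argument"). The sketch below is an illustration only; status_summary and errno_name are hypothetical helper names, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
#!/bin/sh
# Hypothetical helpers for summarising OCFS2 errors in dmesg output.

# Read log text on stdin and print each distinct "status = N" with its count.
status_summary() {
    grep -o 'status = -\{0,1\}[0-9][0-9]*' | sort | uniq -c | sed 's/^ *//'
}

# Translate the status codes commonly seen in OCFS2 mount failures on Linux.
errno_name() {
    case "$1" in
        -107) echo "ENOTCONN (Transport endpoint is not connected)" ;;
        -22)  echo "EINVAL (Invalid argument)" ;;
        *)    echo "unknown here; look it up in errno.h" ;;
    esac
}

# Usage:
#   dmesg | status_summary
#   errno_name -107
```

Every failing call in the log carries the same -107, which suggests a single cause (no network connection to node 0) propagating up through the DLM join and the mount, not several independent errors.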
Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node
The connect requests are not getting through. Do you have any firewalls setup? Is iptables running? If so, either shut it down or allow traffic on the o2cb port.
Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node
No, iptables is shutdown and disabled. No firewalls.

[r...@nyclx1 ~]# service iptables status
Firewall is stopped.

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
Sent: Wednesday, June 03, 2009 12:57 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

The connect requests are not getting through. Do you have any firewalls setup? Is iptables running? If so, either shut it down or allow traffic on the o2cb port.
Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node
Do:
$ tcpdump -i ethX -s 2500 -ttt 'port '
on both nodes. Replace ethX with the appropriate interface. Then issue the mount command on node 1. Do you see the traffic on node 0?
Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node
We can bring up the ocfs2 cluster on only 1 of the 2 nodes at a time, so the problem does not appear to be specific to one node. Right now we have the ocfs2 heartbeat operational on node2 (node1 in the cluster.conf). Here are the results of the mount and tcpdump.

[r...@nyclx1 ~]# mount -t ocfs2 /dev/mapper/mpath1 /oragrid
mount.ocfs2: Transport endpoint is not connected while mounting /dev/mapper/mpath1 on /oragrid. Check 'dmesg' for more information on this error.
[r...@nyclx1 ~]#

[r...@nyclx2 ~]# tcpdump -i eth1 -s 2500 -ttt 'port '
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 2500 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

[r...@nyclx2 ~]# /etc/init.d/o2cb status
Driver for configfs: Loaded
Filesystem configfs: Mounted
Driver for ocfs2_dlmfs: Loaded
Filesystem ocfs2_dlmfs: Mounted
Checking O2CB cluster tiaa: Online
Heartbeat dead threshold = 31
Network idle timeout: 3
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Active

[r...@nyclx1 ~]# tcpdump -i eth1 -s 2500 -ttt 'port '
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 2500 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

[r...@nyclx1 ~]# /etc/init.d/o2cb status
Driver for configfs: Loaded
Filesystem configfs: Mounted
Driver for ocfs2_dlmfs: Loaded
Filesystem ocfs2_dlmfs: Mounted
Checking O2CB cluster tiaa: Online
Heartbeat dead threshold = 31
Network idle timeout: 3
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Not active

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
Sent: Wednesday, June 03, 2009 4:55 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

Do:
$ tcpdump -i ethX -s 2500 -ttt 'port '
on both nodes. Replace ethX with the appropriate interface. Then issue the mount command on node 1. Do you see the traffic on node 0?
Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node
Did you have tcpdump running on a terminal when you attempted the mount on another terminal? Are the interface and port correct? It is one thing to not see the packets on nyclx2. But what confuses me is that there is no traffic on nyclx1 either.
Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node
Yes, I had tcpdump running in separate sessions on both servers. The port is correct. Here is the cluster.conf.

node:
        ip_port =
        ip_address = 192.168.0.218
        number = 0
        name = nyclx1
        cluster = tiaa

node:
        ip_port =
        ip_address = 192.168.0.217
        number = 1
        name = nyclx2
        cluster = tiaa

cluster:
        node_count = 2
        name = tiaa

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
Sent: Wednesday, June 03, 2009 5:35 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

Did you have tcpdump running on a terminal when you attempted the mount on another terminal? Is the interface and port correct? It is one thing to not see the packets on the nyclx2. But what confuses me is that there is no traffic on nyclx1 too.
Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node
Do on both nodes:
$ netstat -ta --numeric-ports

Maybe port is already in use. Check your setup again. Ensure cluster.conf is the same on both nodes. And that the ips are correct. That tcpdump was capturing the traffic on the correct interface. etc. etc.
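The two checks suggested above (is the port already taken, and is cluster.conf identical on both nodes) could be scripted roughly as follows. This is a sketch under assumptions, not a tool from ocfs2-tools: port_listening and conf_digest are hypothetical names, the netstat column layout is assumed to be the usual Linux net-tools one (local address in column 4, state in column 6), and the port has to be supplied by the reader since it is not shown in the thread.

```shell
#!/bin/sh
# Hypothetical helpers, shown only to illustrate the two checks.

# Read "netstat -tan" style output on stdin; succeed if some socket is
# listening on port $1 (column 4 = local address, column 6 = state).
port_listening() {
    awk -v p="$1" '$6 == "LISTEN" && $4 ~ (":" p "$") { found = 1 }
                   END { exit !found }'
}

# Print a checksum of a cluster.conf with whitespace differences ignored,
# so that tabs versus spaces or blank lines do not count as a mismatch.
conf_digest() {
    sed -e 's/[[:space:]]\{1,\}/ /g' -e 's/^ //' -e 's/ $//' -e '/^$/d' "$1" |
        cksum | cut -d' ' -f1
}

# Usage, with PORT set to the ip_port value from cluster.conf and the peer's
# file copied over first (for example with scp, an assumption here):
#   netstat -tan | port_listening "$PORT" && echo "something listens on $PORT"
#   [ "$(conf_digest /etc/ocfs2/cluster.conf)" = "$(conf_digest /tmp/cluster.conf.peer)" ] \
#       || echo "cluster.conf differs between nodes"
```

A listener on the o2cb port is expected once the cluster is online; the interesting case is a listener owned by some other process.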
Re: [Ocfs2-users] O2CB heartbeat: Not active
Hi

I am getting the following errors. Can someone help in rectifying these?

cluster:
* service cman is not running
* service cman is not started in default runlevel
* service rgmanager is not running
* cluster node is not quorate
* one or more nodes have no fencing agent configured: the cluster infrastructure might not work as intended

Regards
Vivek Aggarwal
+973-36583058

-----Original Message-----
From: ocfs2-users-boun...@oss.oracle.com [mailto:ocfs2-users-boun...@oss.oracle.com] On Behalf Of McKinley, Reid
Sent: Wednesday, April 29, 2009 11:59 PM
To: Sunil Mushran
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat: Not active

Thank you, Joel and Sunil. I think you pinpointed our issue! Here are the rpm versions from each node:

[r...@nyclx1 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.4.1-1.el5
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5

[r...@nyclx2 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.2.8-2.el5   <-- I think this is the issue
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
Sent: Wednesday, April 29, 2009 4:48 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat: Not active

You have different versions of the file system on the two nodes. On both nodes, do:
$ rpm -qa | grep ocfs2

Secondly, you should partition the devices. Features like mount-by-label do not work with unpartitioned devices.

McKinley, Reid wrote:

Sunil,

Here is the output... (note: nyclx1 is where we can mount the ocfs2 fs).

[r...@nyclx2 ~]# mounted.ocfs2 -d
Device    FS     UUID                                  Label
/dev/sda  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
/dev/sdb  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
/dev/sdi  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
/dev/sdj  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home

[r...@nyclx1 ~]# mounted.ocfs2 -d
Device    FS     UUID                                  Label
/dev/sda  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
/dev/sdb  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
/dev/sdi  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
/dev/sdj  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
[r...@nyclx1 ~]#

In the /var/log/messages, I see this at the time the mount fails:

Apr 29 12:01:13 nyclx2 kernel: (12430,0):o2net_check_handshake:1163 node nyclx1 (num 0) at 192.168.0.218: advertised net protocol version 11 but 103 is required, disconnecting
Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_initialize_super:1454 ERROR: couldn't mount because of unsupported optional features (10).
Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_fill_super:578 ERROR: status = -22
Apr 29 12:01:17 nyclx2 kernel: ocfs2: Unmounting device (8,0) on (node 255)

Thanks again,
Reid

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
Sent: Wednesday, April 29, 2009 4:32 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat: Not active

What does mounted.ocfs2 -d say on both nodes?

Not /var/log/dmesg. It is /var/log/messages. You could instead run dmesg. This is important as it will tell you why the mount failed.

Sunil
Re: [Ocfs2-users] O2CB heartbeat: Not active
Thank you! Everything appears to be fine then, except that we cannot mount an OCFS2 filesystem on our 2nd node. When I try to mount the fs using ocfs2console on the 2nd node, I receive this error message in a dialog box:

mount.ocfs2: Invalid argument while mounting /dev/sda on /oracle_home. Check 'dmesg' for more information on this error.
: Could not mount /dev/sda

I do not see any related messages in /var/log/dmesg. Any help is greatly appreciated.

Thanks,
Reid

The O2CB status is as follows on this 2nd node:

[r...@nyclx2 ~]# lsmod | grep ocfs2
ocfs2                 369640  0
ocfs2_dlmfs            55952  1
ocfs2_dlm             217104  2 ocfs2,ocfs2_dlmfs
ocfs2_nodemanager     225416  6 ocfs2,ocfs2_dlmfs,ocfs2_dlm
configfs               62301  2 ocfs2_nodemanager
jbd                    93873  2 ocfs2,ext3
[r...@nyclx2 ~]# service o2cb status
Driver for configfs: Loaded
Filesystem configfs: Mounted
Driver for ocfs2_dlmfs: Loaded
Filesystem ocfs2_dlmfs: Mounted
Checking O2CB cluster tiaa: Online
Heartbeat dead threshold: 31
Network idle timeout: 3
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Not active
[r...@nyclx2 ~]#

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
Sent: Tuesday, April 28, 2009 5:34 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat: Not active

McKinley, Reid wrote:
> We have installed OCFS2 1.4.1 and for some reason we can only get the mount point mounted on 1 of 2 nodes. The 2nd node shows that the heartbeat is not active.
>
> [r...@nyclx2 ~]# service o2cb status
> Driver for configfs: Loaded
> Filesystem configfs: Mounted
> Driver for ocfs2_dlmfs: Loaded
> Filesystem ocfs2_dlmfs: Mounted
> Checking O2CB cluster tiaa: Online
> Heartbeat dead threshold: 31
> Network idle timeout: 3
> Network keepalive delay: 2000
> Network reconnect delay: 2000
> Checking O2CB heartbeat: Not active ( --- why is heartbeat not active?? )

The o2cb heartbeat starts when a volume is mounted. Is that volume mounted on the second node? Or is your question that the mount on the second node is failing? For that, I would suggest you check cluster.conf to ensure that the IP addresses are correct. I also suggest either shutting down iptables or adding rules to allow traffic on the private network. Look for any firewalls between the nodes. dmesg should have more information. In short, a cluster mount requires the nodes to connect to each other.

> Also, should we be concerned that the ocfs2_nodemanager does not show in this status? We have seen this in some doc, but it never shows up in our status.
>
> [r...@nyclx1 log]# rpm -qa|grep -i ocfs2
> ocfs2-tools-1.4.1-1.el5
> ocfs2console-1.4.1-1.el5
> ocfs2-2.6.18-92.el5-1.4.1-1.el5
> ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
> ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5
> ocfs2-tools-debuginfo-1.4.1-1.el5
> [r...@nyclx1 log]# uname -r
> 2.6.18-92.el5

No need for concern. The output of the o2cb init script in 1.4 is slightly different than 1.2. (You could do lsmod to check if the module is loaded or not.)

$ lsmod | grep ocfs2
ocfs2_dlmfs            23944  1
ocfs2_dlm             176916  1 ocfs2_dlmfs
ocfs2_nodemanager     141044  103 ocfs2_dlmfs,ocfs2_dlm
configfs               28753  2 ocfs2_nodemanager

This message, including any attachments, contains confidential information intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, please contact the sender immediately by reply e-mail and destroy all copies. You are hereby notified that any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited.
TIAA-CREF

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
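The lsmod check suggested in this reply could be scripted roughly as follows. This is only a sketch: `REQUIRED_MODULES`, `loaded_modules` and `missing_modules` are illustrative names, and the list of required modules is an assumption taken from the lsmod output quoted in this thread, not from the OCFS2 documentation.

```python
# Sketch only: the names below are hypothetical helpers, not part of ocfs2-tools.
# REQUIRED_MODULES is an assumption based on the lsmod output quoted in the thread.
REQUIRED_MODULES = ("configfs", "ocfs2_nodemanager", "ocfs2_dlm", "ocfs2_dlmfs")


def loaded_modules(lsmod_text):
    """Return the set of module names found in lsmod-style output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header row.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def missing_modules(lsmod_text, required=REQUIRED_MODULES):
    """List the required modules that do not appear in the lsmod output."""
    present = loaded_modules(lsmod_text)
    return [name for name in required if name not in present]


# Sample input copied from the lsmod output quoted in the thread.
SAMPLE = """\
ocfs2_dlmfs            23944  1
ocfs2_dlm             176916  1 ocfs2_dlmfs
ocfs2_nodemanager     141044  103 ocfs2_dlmfs,ocfs2_dlm
configfs               28753  2 ocfs2_nodemanager
"""

if __name__ == "__main__":
    print(missing_modules(SAMPLE))
```

On a live node the text could come from running `lsmod` (for example through `subprocess.run(["lsmod"], capture_output=True, text=True).stdout`) instead of the sample string.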
Re: [Ocfs2-users] O2CB heartbeat: Not active
What does mounted.ocfs2 -d say on both nodes?

Not /var/log/dmesg. It is /var/log/messages. You could instead run dmesg. This is important as it will tell you why the mount failed.

Sunil
Re: [Ocfs2-users] O2CB heartbeat: Not active
On Wed, Apr 29, 2009 at 04:39:23PM -0400, McKinley, Reid wrote:
> In the /var/log/messages, I see this at the time the mount fails:
>
> Apr 29 12:01:13 nyclx2 kernel: (12430,0):o2net_check_handshake:1163 node nyclx1 (num 0) at 192.168.0.218: advertised net protocol version 11 but 103 is required, disconnecting
> Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_initialize_super:1454 ERROR: couldn't mount because of unsupported optional features (10).
> Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_fill_super:578 ERROR: status = -22
> Apr 29 12:01:17 nyclx2 kernel: ocfs2: Unmounting device (8,0) on (node 255)

It looks like one node is running 1.2 and the other 1.4. You cannot mount the same filesystem with different versions of the driver at the same time. Both versions may understand the disk format, but they cannot coordinate with each other. Specifically, you're seeing the network protocol version mismatch. Upgrade the other node to 1.4.1 and you should be able to mount.

Joel

--
A good programming language should have features that make the kind of people who use the phrase software engineering shake their heads disapprovingly.
- Paul Graham

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127
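The protocol mismatch described in this reply can be picked out of /var/log/messages mechanically. The following is a minimal sketch, assuming the message wording is exactly that of the single log line quoted in this thread; `HANDSHAKE_RE` and `find_protocol_mismatches` are hypothetical names, and other kernel versions may phrase the message differently.

```python
import re

# Pattern written against the one o2net_check_handshake line quoted in this
# thread; it is an assumption that other versions use the same wording.
HANDSHAKE_RE = re.compile(
    r"node (?P<node>\S+) \(num (?P<num>\d+)\) at (?P<addr>[\d.]+): "
    r"advertised net protocol version (?P<advertised>\d+) "
    r"but (?P<required>\d+) is required"
)


def find_protocol_mismatches(log_text):
    """Return one dict per o2net handshake mismatch found in log_text."""
    mismatches = []
    for match in HANDSHAKE_RE.finditer(log_text):
        info = match.groupdict()
        # Convert the numeric fields so they can be compared directly.
        for key in ("num", "advertised", "required"):
            info[key] = int(info[key])
        mismatches.append(info)
    return mismatches


# Sample input: the log line quoted in the thread.
SAMPLE_LOG = (
    "Apr 29 12:01:13 nyclx2 kernel: (12430,0):o2net_check_handshake:1163 "
    "node nyclx1 (num 0) at 192.168.0.218: advertised net protocol "
    "version 11 but 103 is required, disconnecting\n"
)

if __name__ == "__main__":
    for item in find_protocol_mismatches(SAMPLE_LOG):
        print(item["node"], item["advertised"], item["required"])
```

Any hit from such a scan points at the peer node named in the message as the one running a different driver version.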
Re: [Ocfs2-users] O2CB heartbeat: Not active
You have different versions of the file system on the two nodes. On both nodes, do:

$ rpm -qa | grep ocfs2

Secondly, you should partition the devices. Features like mount-by-label do not work with unpartitioned devices.

McKinley, Reid wrote:
> Sunil,
>
> Here is the output (note: nyclx1 is where we can mount the ocfs2 fs).
>
> [r...@nyclx2 ~]# mounted.ocfs2 -d
> Device    FS     UUID                                  Label
> /dev/sda  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
> /dev/sdb  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
> /dev/sdi  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
> /dev/sdj  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
>
> [r...@nyclx1 ~]# mounted.ocfs2 -d
> Device    FS     UUID                                  Label
> /dev/sda  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
> /dev/sdb  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
> /dev/sdi  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
> /dev/sdj  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
> [r...@nyclx1 ~]#
>
> In the /var/log/messages, I see this at the time the mount fails:
>
> Apr 29 12:01:13 nyclx2 kernel: (12430,0):o2net_check_handshake:1163 node nyclx1 (num 0) at 192.168.0.218: advertised net protocol version 11 but 103 is required, disconnecting
> Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_initialize_super:1454 ERROR: couldn't mount because of unsupported optional features (10).
> Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_fill_super:578 ERROR: status = -22
> Apr 29 12:01:17 nyclx2 kernel: ocfs2: Unmounting device (8,0) on (node 255)
>
> Thanks again,
> Reid
>
> -----Original Message-----
> From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
> Sent: Wednesday, April 29, 2009 4:32 PM
> To: McKinley, Reid
> Cc: ocfs2-users@oss.oracle.com
> Subject: Re: [Ocfs2-users] O2CB heartbeat: Not active
>
> What does mounted.ocfs2 -d say on both nodes?
>
> Not /var/log/dmesg. It is /var/log/messages. You could instead run dmesg. This is important as it will tell you why the mount failed.
>
> Sunil
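Comparing the `rpm -qa | grep ocfs2` output of the two nodes by eye is error-prone, so a small script may help. The sketch below assumes each line has the plain NAME-VERSION-RELEASE form seen in this thread (no architecture suffix); `parse_packages` and `version_mismatches` are hypothetical helper names, and the sample strings are package names that appear in the thread.

```python
def parse_packages(rpm_output):
    """Map package name -> version for NAME-VERSION-RELEASE entries.

    Assumes the default `rpm -qa` output form seen in this thread, where the
    last two dash-separated fields are the version and the release.
    """
    packages = {}
    for entry in rpm_output.split():
        parts = entry.rsplit("-", 2)
        if len(parts) != 3:
            continue  # not in NAME-VERSION-RELEASE form, ignore it
        name, version, _release = parts
        packages[name] = version
    return packages


def version_mismatches(node_a_output, node_b_output):
    """Return {name: (version_a, version_b)} for packages that differ."""
    a = parse_packages(node_a_output)
    b = parse_packages(node_b_output)
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


# Sample inputs built from package names quoted in the thread.
NYCLX1 = """\
ocfs2-tools-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.4.1-1.el5
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
"""
NYCLX2 = """\
ocfs2-tools-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.2.8-2.el5
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
"""

if __name__ == "__main__":
    print(version_mismatches(NYCLX1, NYCLX2))
```

Because the kernel version is part of the module package name, the comparison only flags packages built for the same kernel whose OCFS2 version differs.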
Re: [Ocfs2-users] O2CB heartbeat: Not active
Thank you, Joel and Sunil. I think you pinpointed our issue! Here are the rpm versions from each node:

[r...@nyclx1 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.4.1-1.el5
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5

[r...@nyclx2 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.2.8-2.el5    <-- I think this is the issue
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
Sent: Wednesday, April 29, 2009 4:48 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat: Not active

> You have different versions of the file system on the two nodes. On both nodes, do:
>
> $ rpm -qa | grep ocfs2
>
> Secondly, you should partition the devices. Features like mount-by-label do not work with unpartitioned devices.

This message, including any attachments