Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-04 Thread McKinley, Reid
Thanks for your push on this.  I don't know why I didn't check this
before.

It looks like we have a problem with our interconnect.  Our SA is
checking the hardware now.

[r...@nyclx1 ~]# more /etc/ocfs2/cluster.conf
node:
ip_port = 
ip_address = 192.168.0.218
number = 0
name = nyclx1
cluster = tiaa

node:
ip_port = 
ip_address = 192.168.0.217
number = 1
name = nyclx2
cluster = tiaa

cluster:
node_count = 2
name = tiaa

[r...@nyclx1 ~]# ping 192.168.0.217
PING 192.168.0.217 (192.168.0.217) 56(84) bytes of data.
From 192.168.0.218 icmp_seq=2 Destination Host Unreachable
From 192.168.0.218 icmp_seq=3 Destination Host Unreachable
From 192.168.0.218 icmp_seq=4 Destination Host Unreachable
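
Ping only exercises ICMP, so once the interconnect is repaired it may also be worth confirming that a TCP connection to the peer's o2cb port succeeds. A minimal Python sketch of such a check follows. `can_connect` is a hypothetical helper (not part of ocfs2-tools), and the port passed in is assumed to be whatever `ip_port` is set to in cluster.conf.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host, connects, and raises OSError
        # (refused, unreachable, timed out) if the connection cannot be made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from nyclx1, something like `can_connect("192.168.0.217", port)` would be expected to return False while the host is unreachable, and True once the link is back and the peer's o2cb stack is listening.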



This message, including any attachments, contains confidential information intended
for a specific individual and purpose, and is protected by law. If you are not the intended
recipient, please contact the sender immediately by reply e-mail and destroy all copies.
You are hereby notified that any disclosure, copying, or distribution of this message, or
the taking of any action based on it, is strictly prohibited.

TIAA-CREF



___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


[Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread McKinley, Reid
We are having trouble getting the 2nd node in our 2 node RAC
configuration to have an active O2CB heartbeat. We have our OCR and
voting disks on an OCFS2 mount point, so we cannot bring up Clusterware
on this node.

I'm at a loss as to what the issue is.  It was running fine for a few
weeks, then we had a reboot and we cannot get the heartbeat active and
we cannot mount any OCFS2 filesystems on the 2nd node.

Any ideas are greatly appreciated.

Dmesg errors are at the bottom.

Here are the rpm and status details:

[r...@nyclx1 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.4.1-1.el5
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5

[r...@nyclx1 ~]# /etc/init.d/o2cb status
Driver for configfs: Loaded
Filesystem configfs: Mounted
Driver for ocfs2_dlmfs: Loaded
Filesystem ocfs2_dlmfs: Mounted
Checking O2CB cluster tiaa: Online
Heartbeat dead threshold = 31
  Network idle timeout: 3
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Active

[r...@nyclx2 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5

[r...@nyclx2 ~]# /etc/init.d/o2cb status
Driver for configfs: Loaded
Filesystem configfs: Mounted
Driver for ocfs2_dlmfs: Loaded
Filesystem ocfs2_dlmfs: Mounted
Checking O2CB cluster tiaa: Online
Heartbeat dead threshold = 31
  Network idle timeout: 3
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Not active

OCFS2 1.4.1 Wed Jul 23 12:05:34 PDT 2008 (build 3fc82af4b5669945497b322b6aabd031)
(11296,1):o2net_connect_expired:1637 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(14212,1):dlm_request_join:1033 ERROR: status = -107
(14212,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(14212,1):dlm_join_domain:1485 ERROR: status = -107
(14212,1):dlm_register_domain:1732 ERROR: status = -107
(14212,1):ocfs2_dlm_init:2662 ERROR: status = -107
(14212,1):ocfs2_mount_volume:1251 ERROR: status = -107
ocfs2: Unmounting device (253,2) on (node 1)
(11296,1):o2net_connect_expired:1637 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(14350,1):dlm_request_join:1033 ERROR: status = -107
(14350,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(14350,1):dlm_join_domain:1485 ERROR: status = -107
(14350,1):dlm_register_domain:1732 ERROR: status = -107
(14350,1):ocfs2_dlm_init:2662 ERROR: status = -107
(14350,1):ocfs2_mount_volume:1251 ERROR: status = -107
ocfs2: Unmounting device (253,3) on (node 1)
(11296,1):o2net_connect_expired:1637 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(4347,1):dlm_request_join:1033 ERROR: status = -107
(4347,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(4347,1):dlm_join_domain:1485 ERROR: status = -107
(4347,1):dlm_register_domain:1732 ERROR: status = -107
(4347,1):ocfs2_dlm_init:2662 ERROR: status = -107
(4347,1):ocfs2_mount_volume:1251 ERROR: status = -107
ocfs2: Unmounting device (253,3) on (node 1)
(11296,1):o2net_connect_expired:1637 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(4948,1):dlm_request_join:1033 ERROR: status = -107
(4948,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(4948,1):dlm_join_domain:1485 ERROR: status = -107
(4948,1):dlm_register_domain:1732 ERROR: status = -107
(4948,1):ocfs2_dlm_init:2662 ERROR: status = -107
(4948,1):ocfs2_mount_volume:1251 ERROR: status = -107
ocfs2: Unmounting device (253,3) on (node 1)
OCFS2 Node Manager 1.4.1 Wed Jul 23 12:05:37 PDT 2008 (build 0f78045c75c0174e50e4cf0934bf9eae)
OCFS2 DLM 1.4.1 Wed Jul 23 12:05:37 PDT 2008 (build 4ce8fae327880c466761f40fb7619490)
OCFS2 DLMFS 1.4.1 Wed Jul 23 12:05:37 PDT 2008 (build 4ce8fae327880c466761f40fb7619490)
OCFS2 User DLM kernel interface loaded
[r...@nyclx2 ~]#

Reid McKinley




Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread Sunil Mushran
The connect requests are not getting through. Do you
have any firewalls set up? Is iptables running? If so, either
shut it down or allow traffic on the o2cb port.
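
For readers decoding the dmesg output: the repeated `status = -107` is a negated errno value, and on Linux 107 is ENOTCONN, which matches the "Transport endpoint is not connected" text that mount.ocfs2 prints. A small Python sketch of that lookup is below. `describe_status` is a hypothetical helper name, and the numeric value of ENOTCONN is platform-specific, so the mapping to 107 is an assumption that holds on Linux.

```python
import errno
import os


def describe_status(status):
    """Map a (possibly negated) errno status to its symbolic name and message."""
    code = abs(status)
    # errno.errorcode maps numeric codes to names such as 'ENOTCONN'.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)
```

On a Linux host, `describe_status(-107)` is expected to report ENOTCONN, which points at the network connection between the nodes rather than at the disk or the filesystem.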


Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread McKinley, Reid
No, iptables is shut down and disabled.  No firewalls.

[r...@nyclx1 ~]# service iptables status
Firewall is stopped.


Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread Sunil Mushran
Do:
$ tcpdump -i ethX -s 2500 -ttt 'port '

on both nodes. Replace ethX with the appropriate interface.
Then issue the mount command on node 1. Do you see the traffic
on node 0?




Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread McKinley, Reid
We can bring up the ocfs2 cluster on only one of the two nodes at a time.
So the problem does not appear to be specific to one node.  Right now we
have the ocfs2 heartbeat operational on node2 (nyclx2, number 1 in cluster.conf).

Here are the results of the mount and tcpdump.

[r...@nyclx1 ~]# mount -t ocfs2 /dev/mapper/mpath1 /oragrid
mount.ocfs2: Transport endpoint is not connected while mounting
/dev/mapper/mpath1 on /oragrid. Check 'dmesg' for more information on
this error.
[r...@nyclx1 ~]#

[r...@nyclx2 ~]#  tcpdump -i eth1 -s 2500 -ttt 'port '
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 2500 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

[r...@nyclx2 ~]# /etc/init.d/o2cb status
Driver for configfs: Loaded
Filesystem configfs: Mounted
Driver for ocfs2_dlmfs: Loaded
Filesystem ocfs2_dlmfs: Mounted
Checking O2CB cluster tiaa: Online
Heartbeat dead threshold = 31
  Network idle timeout: 3
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Active

[r...@nyclx1 ~]# tcpdump -i eth1 -s 2500 -ttt 'port '
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 2500 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

[r...@nyclx1 ~]# /etc/init.d/o2cb status
Driver for configfs: Loaded
Filesystem configfs: Mounted
Driver for ocfs2_dlmfs: Loaded
Filesystem ocfs2_dlmfs: Mounted
Checking O2CB cluster tiaa: Online
Heartbeat dead threshold = 31
  Network idle timeout: 3
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Not active






Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread Sunil Mushran
Did you have tcpdump running on a terminal when you attempted
the mount on another terminal? Are the interface and port correct?

It is one thing to not see the packets on nyclx2. But what
confuses me is that there is no traffic on nyclx1 either.






Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread McKinley, Reid
Yes, I had tcpdump running in separate sessions on both servers.

The port is correct.  Here is the cluster.conf.

node:
ip_port = 
ip_address = 192.168.0.218
number = 0
name = nyclx1
cluster = tiaa

node:
ip_port = 
ip_address = 192.168.0.217
number = 1
name = nyclx2
cluster = tiaa

cluster:
node_count = 2
name = tiaa





Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread Sunil Mushran
Do on both nodes:
$ netstat -ta --numeric-ports

Maybe port  is already in use.

Check your setup again. Ensure that cluster.conf is the same on both nodes,
that the IPs are correct, that tcpdump was capturing the traffic on the
correct interface, and so on.
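
One way to make the "same on both nodes" check less error-prone than reading the files by eye is to parse each copy into stanzas and compare them. The Python sketch below assumes the stanza layout shown in this thread (a `node:` or `cluster:` header followed by `key = value` lines). `parse_cluster_conf` and `conf_differences` are hypothetical helper names, not part of ocfs2-tools, and the text of each node's file is assumed to have been fetched already.

```python
def parse_cluster_conf(text):
    """Parse cluster.conf-style text into a list of (stanza_type, {key: value})."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A header such as "node:" or "cluster:" starts a new stanza.
            current = (line[:-1], {})
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[1][key.strip()] = value.strip()
    return stanzas


def conf_differences(text_a, text_b):
    """Return the stanzas that appear in only one of the two configs."""
    a, b = parse_cluster_conf(text_a), parse_cluster_conf(text_b)
    only_a = [s for s in a if s not in b]
    only_b = [s for s in b if s not in a]
    return only_a + only_b
```

An empty list from `conf_differences` means the two files describe the same stanzas, regardless of whitespace differences. Anything returned is a stanza worth inspecting, for example an `ip_address` or `ip_port` that differs between the nodes.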






Re: [Ocfs2-users] O2CB heartbeat: Not active

2009-05-04 Thread Agarwal Vivek-RNGB36
Hi

I am getting the following errors. Can someone help rectify them?

cluster:
* service cman is not running
* service cman is not started in default runlevel
* service rgmanager is not running
* cluster node is not quorate
* one or more nodes have no fencing agent configured: the cluster infrastructure might not work as intended

Regards
Vivek Aggarwal
+973-36583058 



-Original Message-
From: ocfs2-users-boun...@oss.oracle.com 
[mailto:ocfs2-users-boun...@oss.oracle.com] On Behalf Of McKinley, Reid
Sent: Wednesday, April 29, 2009 11:59 PM
To: Sunil Mushran
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat: Not active

Thank you, Joel and Sunil.  I think you pinpointed our issue!

Here are the rpm versions from each node:

[r...@nyclx1 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.4.1-1.el5
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5

[r...@nyclx2 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.2.8-2.el5   <-- I think this is the issue
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5

-Original Message-
From: Sunil Mushran [mailto:sunil.mush...@oracle.com] 
Sent: Wednesday, April 29, 2009 4:48 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat: Not active

You have different versions of the file system on the two nodes.
On both nodes, do:
$ rpm -qa | grep ocfs2

Secondly, you should partition the devices. Features like mount-by-label
do not work with unpartitioned devices.

McKinley, Reid wrote:
 Sunil,
 Here is the output...(note: nyclx1 is where we can mount the ocfs2 fs).

 [r...@nyclx2 ~]# mounted.ocfs2 -d
 Device    FS     UUID                                  Label
 /dev/sda  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
 /dev/sdb  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
 /dev/sdi  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
 /dev/sdj  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home

 [r...@nyclx1 ~]# mounted.ocfs2 -d
 Device    FS     UUID                                  Label
 /dev/sda  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
 /dev/sdb  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
 /dev/sdi  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
 /dev/sdj  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
 [r...@nyclx1 ~]#

 In the /var/log/messages, I see this at the time the mount fails:

 Apr 29 12:01:13 nyclx2 kernel: (12430,0):o2net_check_handshake:1163 node
 nyclx1 (num 0) at 192.168.0.218: advertised net protocol version 11
 but 103 is required, disconnecting
 Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_initialize_super:1454
 ERROR: couldn't mount because of unsupported optional features (10).
 Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_fill_super:578 ERROR:
 status = -22
 Apr 29 12:01:17 nyclx2 kernel: ocfs2: Unmounting device (8,0) on (node
 255)

 Thanks again,
 Reid

 -Original Message-
 From: Sunil Mushran [mailto:sunil.mush...@oracle.com] 
 Sent: Wednesday, April 29, 2009 4:32 PM
 To: McKinley, Reid
 Cc: ocfs2-users@oss.oracle.com
 Subject: Re: [Ocfs2-users] O2CB heartbeat: Not active

 What does mounted.ocfs2 -d say on both nodes?

 Not /var/log/dmesg. It is /var/log/messages. You could instead
 run dmesg. This is important as it will tell you why the
 mount failed.

 Sunil


 McKinley, Reid wrote:
   
 Thank you!

 Everything appears to be fine then, except that we cannot mount an OCFS2
 filesystem on our 2nd node.  When I try to mount the fs using
 ocfs2console on the 2nd node, I receive this error message in a dialog
 box:
 mount.ocfs2: Invalid argument while mounting /dev/sda on /oracle_home.
 Check 'dmesg' for more information on this error. : Could not mount
 /dev/sda

 I do not see any related messages in /var/log/dmesg.  
 Any help is greatly appreciated.
 Thanks,
 Reid

 The O2CB status is as follows on this 2nd node:

 [r...@nyclx2 ~]#  lsmod | grep ocfs2
 ocfs2                 369640  0
 ocfs2_dlmfs            55952  1
 ocfs2_dlm             217104  2 ocfs2,ocfs2_dlmfs
 ocfs2_nodemanager     225416  6 ocfs2,ocfs2_dlmfs,ocfs2_dlm
 configfs               62301  2 ocfs2_nodemanager
 jbd                    93873  2 ocfs2,ext3
 [r...@nyclx2 ~]# service o2cb status
 Driver for configfs: Loaded
 Filesystem configfs: Mounted
 Driver for ocfs2_dlmfs: Loaded
 Filesystem ocfs2_dlmfs: Mounted
 Checking O2CB cluster tiaa: Online
   Heartbeat dead threshold: 31
   Network idle timeout: 3
   Network keepalive delay: 2000
   Network reconnect delay: 2000
 Checking O2CB heartbeat: Not active
 [r...@nyclx2

Re: [Ocfs2-users] O2CB heartbeat: Not active

2009-04-29 Thread McKinley, Reid
Thank you!

Everything appears to be fine then, except that we cannot mount an OCFS2
filesystem on our 2nd node.  When I try to mount the fs using
ocfs2console on the 2nd node, I receive this error message in a dialog
box:
mount.ocfs2: Invalid argument while mounting /dev/sda on /oracle_home.
Check 'dmesg' for more information on this error. : Could not mount
/dev/sda

I do not see any related messages in /var/log/dmesg.  
Any help is greatly appreciated.
Thanks,
Reid

The O2CB status is as follows on this 2nd node:

[r...@nyclx2 ~]#  lsmod | grep ocfs2
ocfs2                 369640  0
ocfs2_dlmfs            55952  1
ocfs2_dlm             217104  2 ocfs2,ocfs2_dlmfs
ocfs2_nodemanager     225416  6 ocfs2,ocfs2_dlmfs,ocfs2_dlm
configfs               62301  2 ocfs2_nodemanager
jbd                    93873  2 ocfs2,ext3
[r...@nyclx2 ~]# service o2cb status
Driver for configfs: Loaded
Filesystem configfs: Mounted
Driver for ocfs2_dlmfs: Loaded
Filesystem ocfs2_dlmfs: Mounted
Checking O2CB cluster tiaa: Online
  Heartbeat dead threshold: 31
  Network idle timeout: 3
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Not active
[r...@nyclx2 ~]#

-Original Message-
From: Sunil Mushran [mailto:sunil.mush...@oracle.com] 
Sent: Tuesday, April 28, 2009 5:34 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat: Not active

McKinley, Reid wrote:

 We have installed OCFS2 1.4.1 and for some reason we can only get the 
 mount point mounted on 1 of 2 nodes.  The 2nd node shows that the 
 heartbeat is not active. 

  

 [r...@nyclx2 ~]# service o2cb status
 Driver for configfs: Loaded
 Filesystem configfs: Mounted
 Driver for ocfs2_dlmfs: Loaded
 Filesystem ocfs2_dlmfs: Mounted
 Checking O2CB cluster tiaa: Online
   Heartbeat dead threshold: 31
   Network idle timeout: 3
   Network keepalive delay: 2000
   Network reconnect delay: 2000
 Checking O2CB heartbeat: Not active   ( <--- why is heartbeat not active?? )


The o2cb heartbeat starts when a volume is mounted. Is that volume
mounted on the second node?

Or is your question that the mount on the second node is failing? In
that case, I would suggest checking cluster.conf to ensure that the IP
addresses are correct. I also suggest either shutting down iptables or
adding rules to allow traffic on the private network. Look for any
firewalls between the nodes. dmesg should have more information. In
short, a cluster mount requires the nodes to connect to each other.
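
Since a cluster mount depends on the nodes reaching each other, a quick TCP probe from one node to the other can help rule out a firewall or interconnect problem. This is a minimal sketch, assuming the peer address and port are the ip_address and ip_port values from cluster.conf; can_connect() is a hypothetical helper name, and a successful probe only shows that something accepts TCP connections on that port.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection() resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Run it on each node against the other node's private address. A False result in either direction points at iptables rules, a firewall between the nodes, or the interconnect itself.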

 Also, should we be concerned that the ocfs2_nodemanager does not show 
 in this status?  We have seen this in some doc, but it never shows up 
 in our status.

 [r...@nyclx1 log]# rpm -qa|grep -i ocfs2
 ocfs2-tools-1.4.1-1.el5
 ocfs2console-1.4.1-1.el5
 ocfs2-2.6.18-92.el5-1.4.1-1.el5
 ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
 ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5
 ocfs2-tools-debuginfo-1.4.1-1.el5

 [r...@nyclx1 log]# uname -r
 2.6.18-92.el5


No need for concern. The output of the o2cb init script in 1.4
is slightly different than 1.2. (You could do lsmod to check if the
module is loaded or not.)

$ lsmod | grep ocfs2
ocfs2_dlmfs            23944  1
ocfs2_dlm             176916  1 ocfs2_dlmfs
ocfs2_nodemanager     141044  103 ocfs2_dlmfs,ocfs2_dlm
configfs               28753  2 ocfs2_nodemanager
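
The lsmod check can also be scripted. The following is a minimal sketch, assuming the lsmod output has been captured as text; the default list of required modules is simply the set shown in the output above, and both function names are hypothetical.

```python
def loaded_modules(lsmod_text):
    """Return the set of module names found in lsmod output text."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header row.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def missing_modules(
    lsmod_text,
    required=("configfs", "ocfs2_nodemanager", "ocfs2_dlm", "ocfs2_dlmfs"),
):
    """Return the required modules that do not appear in the lsmod output."""
    present = loaded_modules(lsmod_text)
    return [name for name in required if name not in present]
```

An empty list from missing_modules() means every module in the required tuple is loaded, whether or not the o2cb status output mentions it by name.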



This message, including any attachments, contains confidential information intended
for a specific individual and purpose, and is protected by law. If you are not the intended
recipient, please contact the sender immediately by reply e-mail and destroy all copies.
You are hereby notified that any disclosure, copying, or distribution of this message, or
the taking of any action based on it, is strictly prohibited.

TIAA-CREF





Re: [Ocfs2-users] O2CB heartbeat: Not active

2009-04-29 Thread Sunil Mushran
What does mounted.ocfs2 -d say on both nodes?

The file to check is /var/log/messages, not /var/log/dmesg. Alternatively,
run dmesg. This is important, as it will tell you why the
mount failed.

Sunil


McKinley, Reid wrote:
 Thank you!

 Everything appears to be fine then, except that we cannot mount an OCFS2
 filesystem on our 2nd node.  When I try to mount the fs using
 ocfs2console on the 2nd node, I receive this error message in a dialog
 box:
 mount.ocfs2: Invalid argument while mounting /dev/sda on /oracle_home.
 Check 'dmesg' for more information on this error. : Could not mount
 /dev/sda

 I do not see any related messages in /var/log/dmesg.  
 Any help is greatly appreciated.
 Thanks,
 Reid

 The O2CB status is as follows on this 2nd node:

 [r...@nyclx2 ~]#  lsmod | grep ocfs2
 ocfs2                 369640  0
 ocfs2_dlmfs            55952  1
 ocfs2_dlm             217104  2 ocfs2,ocfs2_dlmfs
 ocfs2_nodemanager     225416  6 ocfs2,ocfs2_dlmfs,ocfs2_dlm
 configfs               62301  2 ocfs2_nodemanager
 jbd                    93873  2 ocfs2,ext3
 [r...@nyclx2 ~]# service o2cb status
 Driver for configfs: Loaded
 Filesystem configfs: Mounted
 Driver for ocfs2_dlmfs: Loaded
 Filesystem ocfs2_dlmfs: Mounted
 Checking O2CB cluster tiaa: Online
   Heartbeat dead threshold: 31
   Network idle timeout: 3
   Network keepalive delay: 2000
   Network reconnect delay: 2000
 Checking O2CB heartbeat: Not active
 [r...@nyclx2 ~]#



Re: [Ocfs2-users] O2CB heartbeat: Not active

2009-04-29 Thread Joel Becker
On Wed, Apr 29, 2009 at 04:39:23PM -0400, McKinley, Reid wrote:
 In the /var/log/messages, I see this at the time the mount fails:
 
 Apr 29 12:01:13 nyclx2 kernel: (12430,0):o2net_check_handshake:1163 node
 nyclx1 (num 0) at 192.168.0.218: advertised net protocol version 11
 but 103 is required, disconnecting
 Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_initialize_super:1454
 ERROR: couldn't mount because of unsupported optional features (10).
 Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_fill_super:578 ERROR:
 status = -22
 Apr 29 12:01:17 nyclx2 kernel: ocfs2: Unmounting device (8,0) on (node
 255)

It looks like one node is running 1.2 and the other 1.4.  You
cannot mount the same filesystem with different versions of the driver
at the same time.  Both versions may understand the disk format, but
they cannot coordinate with each other.
Specifically, you're seeing the network protocol version
mismatch.  Upgrade the other node to 1.4.1 and you should be able to
mount.
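
The handshake message quoted above carries both version numbers, so it can be picked out of /var/log/messages mechanically. This is a minimal sketch, assuming the kernel message has the wording shown in this thread; the regular expression and function name are hypothetical and written only against that one sample.

```python
import re

# Matches the o2net handshake complaint as it appears in this thread.
HANDSHAKE_RE = re.compile(
    r"node\s+(?P<node>\S+)\s+\(num\s+(?P<num>\d+)\)\s+at\s+(?P<addr>\S+?):\s+"
    r"advertised net protocol version\s+(?P<advertised>\d+)\s+"
    r"but\s+(?P<required>\d+)\s+is required"
)


def find_protocol_mismatches(log_text):
    """Return one dict per handshake mismatch found in the log text."""
    # Collapse whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        {
            "node": m.group("node"),
            "address": m.group("addr"),
            "advertised": int(m.group("advertised")),
            "required": int(m.group("required")),
        }
        for m in HANDSHAKE_RE.finditer(flat)
    ]
```

Any entry where "advertised" differs from "required" identifies a peer running a different version of the driver, which is the situation described above.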

Joel

-- 

A good programming language should have features that make the
kind of people who use the phrase software engineering shake
their heads disapprovingly.
- Paul Graham

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127



Re: [Ocfs2-users] O2CB heartbeat: Not active

2009-04-29 Thread Sunil Mushran
You have different versions of the file system on the two nodes.
On both nodes, do:
$ rpm -qa | grep ocfs2

Secondly, you should partition the devices. Features like mount-by-label
do not work with unpartitioned devices.

McKinley, Reid wrote:
 Sunil,
 Here is the output...(note: nyclx1 is where we can mount the ocfs2 fs).

 [r...@nyclx2 ~]# mounted.ocfs2 -d
 Device    FS     UUID                                  Label
 /dev/sda  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
 /dev/sdb  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
 /dev/sdi  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
 /dev/sdj  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home

 [r...@nyclx1 ~]# mounted.ocfs2 -d
 Device    FS     UUID                                  Label
 /dev/sda  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
 /dev/sdb  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
 /dev/sdi  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
 /dev/sdj  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
 [r...@nyclx1 ~]#

 In the /var/log/messages, I see this at the time the mount fails:

 Apr 29 12:01:13 nyclx2 kernel: (12430,0):o2net_check_handshake:1163 node
 nyclx1 (num 0) at 192.168.0.218: advertised net protocol version 11
 but 103 is required, disconnecting
 Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_initialize_super:1454
 ERROR: couldn't mount because of unsupported optional features (10).
 Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_fill_super:578 ERROR:
 status = -22
 Apr 29 12:01:17 nyclx2 kernel: ocfs2: Unmounting device (8,0) on (node
 255)

 Thanks again,
 Reid

 -Original Message-
 From: Sunil Mushran [mailto:sunil.mush...@oracle.com] 
 Sent: Wednesday, April 29, 2009 4:32 PM
 To: McKinley, Reid
 Cc: ocfs2-users@oss.oracle.com
 Subject: Re: [Ocfs2-users] O2CB heartbeat: Not active

 What does mounted.ocfs2 -d say on both nodes?

 Not /var/log/dmesg. It is /var/log/messages. You could instead
 run dmesg. This is important as it will tell you why the
 mount failed.

 Sunil


 McKinley, Reid wrote:
   
 Thank you!

 Everything appears to be fine then, except that we cannot mount an OCFS2
 filesystem on our 2nd node.  When I try to mount the fs using
 ocfs2console on the 2nd node, I receive this error message in a dialog
 box:
 mount.ocfs2: Invalid argument while mounting /dev/sda on /oracle_home.
 Check 'dmesg' for more information on this error. : Could not mount
 /dev/sda

 I do not see any related messages in /var/log/dmesg.  
 Any help is greatly appreciated.
 Thanks,
 Reid

 The O2CB status is as follows on this 2nd node:

 [r...@nyclx2 ~]#  lsmod | grep ocfs2
 ocfs2                 369640  0
 ocfs2_dlmfs            55952  1
 ocfs2_dlm             217104  2 ocfs2,ocfs2_dlmfs
 ocfs2_nodemanager     225416  6 ocfs2,ocfs2_dlmfs,ocfs2_dlm
 configfs               62301  2 ocfs2_nodemanager
 jbd                    93873  2 ocfs2,ext3
 [r...@nyclx2 ~]# service o2cb status
 Driver for configfs: Loaded
 Filesystem configfs: Mounted
 Driver for ocfs2_dlmfs: Loaded
 Filesystem ocfs2_dlmfs: Mounted
 Checking O2CB cluster tiaa: Online
   Heartbeat dead threshold: 31
   Network idle timeout: 3
   Network keepalive delay: 2000
   Network reconnect delay: 2000
 Checking O2CB heartbeat: Not active
 [r...@nyclx2 ~]#
 

 
 

   




Re: [Ocfs2-users] O2CB heartbeat: Not active

2009-04-29 Thread McKinley, Reid
Thank you, Joel and Sunil.  I think you pinpointed our issue!

Here are the rpm versions from each node:

[r...@nyclx1 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.4.1-1.el5
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5

[r...@nyclx2 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.2.8-2.el5   <-- I think this is the issue
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5
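
The mismatch is easy to miss by eye in two nearly identical lists, so the comparison can be done mechanically. This is a minimal sketch, assuming the output of rpm -qa | grep ocfs2 from each node has been saved as text; diff_package_lists() is a hypothetical helper name.

```python
def diff_package_lists(node_a_output, node_b_output):
    """Return (only_on_a, only_on_b) as sorted lists of package names."""
    # One package per whitespace-separated token, as printed by rpm -qa.
    a = set(node_a_output.split())
    b = set(node_b_output.split())
    return sorted(a - b), sorted(b - a)
```

With the two lists above, the first result holds the 1.4.1 kernel module package found only on nyclx1 and the second holds the 1.2.8 package found only on nyclx2. Two empty lists would mean both nodes have the same set of ocfs2 packages installed.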

-Original Message-
From: Sunil Mushran [mailto:sunil.mush...@oracle.com] 
Sent: Wednesday, April 29, 2009 4:48 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat: Not active

You have different versions of the file system on the two nodes.
On both nodes, do:
$ rpm -qa | grep ocfs2

Secondly, you should partition the devices. Features like mount-by-label
do not work with unpartitioned devices.

McKinley, Reid wrote:
 Sunil,
 Here is the output...(note: nyclx1 is where we can mount the ocfs2 fs).

 [r...@nyclx2 ~]# mounted.ocfs2 -d
 Device    FS     UUID                                  Label
 /dev/sda  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
 /dev/sdb  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
 /dev/sdi  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
 /dev/sdj  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home

 [r...@nyclx1 ~]# mounted.ocfs2 -d
 Device    FS     UUID                                  Label
 /dev/sda  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
 /dev/sdb  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
 /dev/sdi  ocfs2  f350f4e5-2bf8-4930-ad55-4cb703e38e25  ocfs1
 /dev/sdj  ocfs2  b5d03918-31f9-4280-a378-e0043c241517  oracle_home
 [r...@nyclx1 ~]#

 In the /var/log/messages, I see this at the time the mount fails:

 Apr 29 12:01:13 nyclx2 kernel: (12430,0):o2net_check_handshake:1163 node
 nyclx1 (num 0) at 192.168.0.218: advertised net protocol version 11
 but 103 is required, disconnecting
 Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_initialize_super:1454
 ERROR: couldn't mount because of unsupported optional features (10).
 Apr 29 12:01:17 nyclx2 kernel: (16953,0):ocfs2_fill_super:578 ERROR:
 status = -22
 Apr 29 12:01:17 nyclx2 kernel: ocfs2: Unmounting device (8,0) on (node
 255)

 Thanks again,
 Reid

 -Original Message-
 From: Sunil Mushran [mailto:sunil.mush...@oracle.com] 
 Sent: Wednesday, April 29, 2009 4:32 PM
 To: McKinley, Reid
 Cc: ocfs2-users@oss.oracle.com
 Subject: Re: [Ocfs2-users] O2CB heartbeat: Not active

 What does mounted.ocfs2 -d say on both nodes?

 Not /var/log/dmesg. It is /var/log/messages. You could instead
 run dmesg. This is important as it will tell you why the
 mount failed.

 Sunil


 McKinley, Reid wrote:
   
 Thank you!

 Everything appears to be fine then, except that we cannot mount an OCFS2
 filesystem on our 2nd node.  When I try to mount the fs using
 ocfs2console on the 2nd node, I receive this error message in a dialog
 box:
 mount.ocfs2: Invalid argument while mounting /dev/sda on /oracle_home.
 Check 'dmesg' for more information on this error. : Could not mount
 /dev/sda

 I do not see any related messages in /var/log/dmesg.  
 Any help is greatly appreciated.
 Thanks,
 Reid

 The O2CB status is as follows on this 2nd node:

 [r...@nyclx2 ~]#  lsmod | grep ocfs2
 ocfs2                 369640  0
 ocfs2_dlmfs            55952  1
 ocfs2_dlm             217104  2 ocfs2,ocfs2_dlmfs
 ocfs2_nodemanager     225416  6 ocfs2,ocfs2_dlmfs,ocfs2_dlm
 configfs               62301  2 ocfs2_nodemanager
 jbd                    93873  2 ocfs2,ext3
 [r...@nyclx2 ~]# service o2cb status
 Driver for configfs: Loaded
 Filesystem configfs: Mounted
 Driver for ocfs2_dlmfs: Loaded
 Filesystem ocfs2_dlmfs: Mounted
 Checking O2CB cluster tiaa: Online
   Heartbeat dead threshold: 31
   Network idle timeout: 3
   Network keepalive delay: 2000
   Network reconnect delay: 2000
 Checking O2CB heartbeat: Not active
 [r...@nyclx2 ~]#
 








   



This message, including any attachments