Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-04 Thread McKinley, Reid
Thanks for your push on this.  I don't know why I didn't check this
before.

It looks like we have a problem with our interconnect.  Our SA is
checking the hardware now.

[r...@nyclx1 ~]# more /etc/ocfs2/cluster.conf
node:
ip_port = 
ip_address = 192.168.0.218
number = 0
name = nyclx1
cluster = tiaa

node:
ip_port = 
ip_address = 192.168.0.217
number = 1
name = nyclx2
cluster = tiaa

cluster:
node_count = 2
name = tiaa

[r...@nyclx1 ~]# ping 192.168.0.217
PING 192.168.0.217 (192.168.0.217) 56(84) bytes of data.
From 192.168.0.218 icmp_seq=2 Destination Host Unreachable
From 192.168.0.218 icmp_seq=3 Destination Host Unreachable
From 192.168.0.218 icmp_seq=4 Destination Host Unreachable

-Original Message-
From: Sunil Mushran [mailto:sunil.mush...@oracle.com] 
Sent: Wednesday, June 03, 2009 6:19 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

Do on both nodes:
$ netstat -ta --numeric-ports

Maybe port  is already in use.

Check your setup again. Ensure that cluster.conf is the same on both nodes,
that the IPs are correct, and that tcpdump was capturing
the traffic on the correct interface. etc. etc.



This message, including any attachments, contains confidential information 
intended 
for a specific individual and purpose, and is protected by law. If you are not 
the intended 
recipient, please contact the sender immediately by reply e-mail and destroy 
all copies.
You are hereby notified that any disclosure, copying, or distribution of this 
message, or
the taking of any action based on it, is strictly prohibited.

TIAA-CREF



___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread Sunil Mushran
Do on both nodes:
$ netstat -ta --numeric-ports

Maybe port  is already in use.

Check your setup again. Ensure that cluster.conf is the same on both nodes,
that the IPs are correct, and that tcpdump was capturing
the traffic on the correct interface. etc. etc.
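
One way to check that cluster.conf really is the same on both nodes is to parse each copy into stanzas and compare them field by field. The sketch below is only an illustration: the function names `parse_cluster_conf` and `diff_confs` are made up for this example, the ip_port value in the sample is a placeholder (the value used in this thread is not shown), and it assumes you have already copied both files' contents to one machine.

```python
def parse_cluster_conf(text):
    """Parse "node:" / "cluster:" stanzas followed by "key = value" lines."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A stanza header such as "node:" or "cluster:".
            current = {"_type": line[:-1]}
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return stanzas


def diff_confs(text_a, text_b):
    """Return human-readable differences between two cluster.conf texts."""
    a, b = parse_cluster_conf(text_a), parse_cluster_conf(text_b)
    problems = []
    if len(a) != len(b):
        problems.append(f"stanza count differs: {len(a)} vs {len(b)}")
    for i, (sa, sb) in enumerate(zip(a, b)):
        for key in sorted(set(sa) | set(sb)):
            if sa.get(key) != sb.get(key):
                problems.append(
                    f"stanza {i} ({sa.get('_type')}): {key}: "
                    f"{sa.get(key)!r} != {sb.get(key)!r}"
                )
    return problems


# Sample input. ip_port = 7777 is a placeholder, not the thread's value.
CONF_A = """\
node:
    ip_port = 7777
    ip_address = 192.168.0.218
    number = 0
    name = nyclx1
    cluster = tiaa

node:
    ip_port = 7777
    ip_address = 192.168.0.217
    number = 1
    name = nyclx2
    cluster = tiaa

cluster:
    node_count = 2
    name = tiaa
"""

# A hypothetical second copy with a typo in one address.
CONF_B = CONF_A.replace("192.168.0.217", "192.168.0.127")

for problem in diff_confs(CONF_A, CONF_B):
    print(problem)
```

An empty list from `diff_confs` means the two copies agree stanza by stanza. It says nothing about whether the addresses are reachable.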


McKinley, Reid wrote:
> Yes, I had tcpdump running in separate sessions on both servers.
>
> The port is correct.  Here is the cluster.conf.
>
> node:
> ip_port = 
> ip_address = 192.168.0.218
> number = 0
> name = nyclx1
> cluster = tiaa
>
> node:
> ip_port = 
> ip_address = 192.168.0.217
> number = 1
> name = nyclx2
> cluster = tiaa
>
> cluster:
> node_count = 2
> name = tiaa


Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread McKinley, Reid
Yes, I had tcpdump running in separate sessions on both servers.

The port is correct.  Here is the cluster.conf.

node:
ip_port = 
ip_address = 192.168.0.218
number = 0
name = nyclx1
cluster = tiaa

node:
ip_port = 
ip_address = 192.168.0.217
number = 1
name = nyclx2
cluster = tiaa

cluster:
node_count = 2
name = tiaa

-Original Message-
From: Sunil Mushran [mailto:sunil.mush...@oracle.com] 
Sent: Wednesday, June 03, 2009 5:35 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

Did you have tcpdump running on a terminal when you attempted
the mount on another terminal? Are the interface and port correct?

It is one thing to not see the packets on nyclx2. But what
confuses me is that there is no traffic on nyclx1 either.



Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread Sunil Mushran
Did you have tcpdump running on a terminal when you attempted
the mount on another terminal? Are the interface and port correct?

It is one thing to not see the packets on nyclx2. But what
confuses me is that there is no traffic on nyclx1 either.


McKinley, Reid wrote:
> We can bring up the ocfs2 cluster on only one of the two nodes at a time,
> and it can be either one, so the problem does not appear to be specific to
> one node.  Right now we have the ocfs2 heartbeat operational on node2
> (number = 1 in cluster.conf).
>
> Here are the results of the mount and tcpdump.
>
> [r...@nyclx1 ~]# mount -t ocfs2 /dev/mapper/mpath1 /oragrid
> mount.ocfs2: Transport endpoint is not connected while mounting
> /dev/mapper/mpath1 on /oragrid. Check 'dmesg' for more information on
> this error.
> [r...@nyclx1 ~]#
>
> [r...@nyclx2 ~]#  tcpdump -i eth1 -s 2500 -ttt 'port '
> tcpdump: verbose output suppressed, use -v or -vv for full protocol
> decode
> listening on eth1, link-type EN10MB (Ethernet), capture size 2500 bytes
>
> 0 packets captured
> 0 packets received by filter
> 0 packets dropped by kernel
>
> [r...@nyclx2 ~]# /etc/init.d/o2cb status
> Driver for "configfs": Loaded
> Filesystem "configfs": Mounted
> Driver for "ocfs2_dlmfs": Loaded
> Filesystem "ocfs2_dlmfs": Mounted
> Checking O2CB cluster tiaa: Online
> Heartbeat dead threshold = 31
>   Network idle timeout: 3
>   Network keepalive delay: 2000
>   Network reconnect delay: 2000
> Checking O2CB heartbeat: Active
>
> [r...@nyclx1 ~]# tcpdump -i eth1 -s 2500 -ttt 'port '
> tcpdump: verbose output suppressed, use -v or -vv for full protocol
> decode
> listening on eth1, link-type EN10MB (Ethernet), capture size 2500 bytes
>
> 0 packets captured
> 0 packets received by filter
> 0 packets dropped by kernel
>
> [r...@nyclx1 ~]# /etc/init.d/o2cb status
> Driver for "configfs": Loaded
> Filesystem "configfs": Mounted
> Driver for "ocfs2_dlmfs": Loaded
> Filesystem "ocfs2_dlmfs": Mounted
> Checking O2CB cluster tiaa: Online
> Heartbeat dead threshold = 31
>   Network idle timeout: 3
>   Network keepalive delay: 2000
>   Network reconnect delay: 2000
> Checking O2CB heartbeat: Not active


Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread McKinley, Reid
We can bring up the ocfs2 cluster on only one of the two nodes at a time,
and it can be either one, so the problem does not appear to be specific to
one node.  Right now we have the ocfs2 heartbeat operational on node2
(number = 1 in cluster.conf).

Here are the results of the mount and tcpdump.

[r...@nyclx1 ~]# mount -t ocfs2 /dev/mapper/mpath1 /oragrid
mount.ocfs2: Transport endpoint is not connected while mounting /dev/mapper/mpath1 on /oragrid. Check 'dmesg' for more information on this error.
[r...@nyclx1 ~]#

[r...@nyclx2 ~]#  tcpdump -i eth1 -s 2500 -ttt 'port '
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 2500 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

[r...@nyclx2 ~]# /etc/init.d/o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster tiaa: Online
Heartbeat dead threshold = 31
  Network idle timeout: 3
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Active

[r...@nyclx1 ~]# tcpdump -i eth1 -s 2500 -ttt 'port '
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 2500 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

[r...@nyclx1 ~]# /etc/init.d/o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster tiaa: Online
Heartbeat dead threshold = 31
  Network idle timeout: 3
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Not active
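
The two status listings above differ only in the final heartbeat line, so when comparing nodes it can help to pull out just that field. The following is a rough sketch, assuming the output of `/etc/init.d/o2cb status` from each node has been saved as text. The helper names `heartbeat_state` and `inactive_nodes` are invented for this example and are not part of ocfs2-tools.

```python
import re

# Matches the last line of the o2cb status output shown above.
HEARTBEAT_RE = re.compile(r"^\s*Checking O2CB heartbeat:\s*(.+?)\s*$", re.MULTILINE)


def heartbeat_state(status_output):
    """Return the text after "Checking O2CB heartbeat:", or None if absent."""
    match = HEARTBEAT_RE.search(status_output)
    return match.group(1) if match else None


def inactive_nodes(status_by_node):
    """Return the sorted names of nodes whose heartbeat is not "Active"."""
    return sorted(
        node
        for node, output in status_by_node.items()
        if heartbeat_state(output) != "Active"
    )


# Abbreviated copies of the two listings above.
statuses = {
    "nyclx2": 'Checking O2CB cluster tiaa: Online\nChecking O2CB heartbeat: Active\n',
    "nyclx1": 'Checking O2CB cluster tiaa: Online\nChecking O2CB heartbeat: Not active\n',
}
print(inactive_nodes(statuses))
```

A node whose output has no heartbeat line at all is also reported, since `heartbeat_state` returns None for it.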


-Original Message-
From: Sunil Mushran [mailto:sunil.mush...@oracle.com] 
Sent: Wednesday, June 03, 2009 4:55 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

Do:
$ tcpdump -i ethX -s 2500 -ttt 'port '

on both nodes. Replace ethX with the appropriate interface.
Then issue the mount command on node 1. Do you see the traffic
on node 0?



Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread Sunil Mushran
Do:
$ tcpdump -i ethX -s 2500 -ttt 'port '

on both nodes. Replace ethX with the appropriate interface.
Then issue the mount command on node 1. Do you see the traffic
on node 0?

McKinley, Reid wrote:
> No, iptables is shut down and disabled.  No firewalls.
>
> [r...@nyclx1 ~]# service iptables status
> Firewall is stopped.


Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread McKinley, Reid
No, iptables is shut down and disabled.  No firewalls.

[r...@nyclx1 ~]# service iptables status
Firewall is stopped.

-Original Message-
From: Sunil Mushran [mailto:sunil.mush...@oracle.com] 
Sent: Wednesday, June 03, 2009 12:57 PM
To: McKinley, Reid
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

The connect requests are not getting through. Do you
have any firewalls set up? Is iptables running? If so, either
shut it down or allow traffic on the o2cb port.


Re: [Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread Sunil Mushran
The connect requests are not getting through. Do you
have any firewalls set up? Is iptables running? If so, either
shut it down or allow traffic on the o2cb port.
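
A quick way to see whether connect requests get through at all is to attempt a plain TCP connection from one node to the other's interconnect address. The snippet below is a minimal sketch rather than an o2cb tool: `probe` is a made-up helper, and the port must be supplied by the caller from the ip_port line in cluster.conf.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # The peer answered, but nothing accepted the connection on that port.
        return "refused"
    except OSError as exc:
        # Timeouts, "No route to host", "Network is unreachable", and so on.
        return f"unreachable: {exc}"


# Example call (not executed here); take the port from cluster.conf:
#   print(probe("192.168.0.217", <ip_port>))
```

A "refused" result usually means the IP path works and the problem is with the listener or a rejecting firewall. An "unreachable" result points lower down, at routing, cabling, or the interface itself.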

McKinley, Reid wrote:
>
> We are having trouble getting the 2nd node in our 2 node RAC 
> configuration to have an active O2CB heartbeat. We have our OCR and 
> voting disks on an OCFS2 mount point, so we cannot bring up 
> Clusterware on this node.
>
> I’m at a loss as to what the issue is. It was running fine for a few 
> weeks, then we had a reboot and we cannot get the heartbeat active and 
> we cannot mount any OCFS2 filesystems on the 2nd node.

[Ocfs2-users] O2CB heartbeat not active on 2nd node

2009-06-03 Thread McKinley, Reid
We are having trouble getting the 2nd node in our 2 node RAC
configuration to have an active O2CB heartbeat. We have our OCR and
voting disks on an OCFS2 mount point, so we cannot bring up Clusterware
on this node.

I'm at a loss as to what the issue is.  It was running fine for a few
weeks, then we had a reboot and we cannot get the heartbeat active and
we cannot mount any OCFS2 filesystems on the 2nd node.

Any ideas are greatly appreciated.

Dmesg errors are at the bottom.

Here are the rpm and status details:

[r...@nyclx1 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.4.1-1.el5
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5

[r...@nyclx1 ~]# /etc/init.d/o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster tiaa: Online
Heartbeat dead threshold = 31
  Network idle timeout: 3
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Active

[r...@nyclx2 ~]# rpm -qa | grep ocfs2
ocfs2-tools-1.4.1-1.el5
ocfs2-2.6.18-92.el5-1.4.1-1.el5
ocfs2-tools-debuginfo-1.4.1-1.el5
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-92.el5debug-1.2.8-2.el5
ocfs2-2.6.18-92.el5-debuginfo-1.4.1-1.el5

[r...@nyclx2 ~]# /etc/init.d/o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster tiaa: Online
Heartbeat dead threshold = 31
  Network idle timeout: 3
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Not active

OCFS2 1.4.1 Wed Jul 23 12:05:34 PDT 2008 (build 3fc82af4b5669945497b322b6aabd031)
(11296,1):o2net_connect_expired:1637 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(14212,1):dlm_request_join:1033 ERROR: status = -107
(14212,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(14212,1):dlm_join_domain:1485 ERROR: status = -107
(14212,1):dlm_register_domain:1732 ERROR: status = -107
(14212,1):ocfs2_dlm_init:2662 ERROR: status = -107
(14212,1):ocfs2_mount_volume:1251 ERROR: status = -107
ocfs2: Unmounting device (253,2) on (node 1)
(11296,1):o2net_connect_expired:1637 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(14350,1):dlm_request_join:1033 ERROR: status = -107
(14350,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(14350,1):dlm_join_domain:1485 ERROR: status = -107
(14350,1):dlm_register_domain:1732 ERROR: status = -107
(14350,1):ocfs2_dlm_init:2662 ERROR: status = -107
(14350,1):ocfs2_mount_volume:1251 ERROR: status = -107
ocfs2: Unmounting device (253,3) on (node 1)
(11296,1):o2net_connect_expired:1637 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(4347,1):dlm_request_join:1033 ERROR: status = -107
(4347,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(4347,1):dlm_join_domain:1485 ERROR: status = -107
(4347,1):dlm_register_domain:1732 ERROR: status = -107
(4347,1):ocfs2_dlm_init:2662 ERROR: status = -107
(4347,1):ocfs2_mount_volume:1251 ERROR: status = -107
ocfs2: Unmounting device (253,3) on (node 1)
(11296,1):o2net_connect_expired:1637 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(4948,1):dlm_request_join:1033 ERROR: status = -107
(4948,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(4948,1):dlm_join_domain:1485 ERROR: status = -107
(4948,1):dlm_register_domain:1732 ERROR: status = -107
(4948,1):ocfs2_dlm_init:2662 ERROR: status = -107
(4948,1):ocfs2_mount_volume:1251 ERROR: status = -107
ocfs2: Unmounting device (253,3) on (node 1)
OCFS2 Node Manager 1.4.1 Wed Jul 23 12:05:37 PDT 2008 (build 0f78045c75c0174e50e4cf0934bf9eae)
OCFS2 DLM 1.4.1 Wed Jul 23 12:05:37 PDT 2008 (build 4ce8fae327880c466761f40fb7619490)
OCFS2 DLMFS 1.4.1 Wed Jul 23 12:05:37 PDT 2008 (build 4ce8fae327880c466761f40fb7619490)
OCFS2 User DLM kernel interface loaded

[r...@nyclx2 ~]#
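
For reading the dmesg output above: each "status = -107" is a negated Linux errno, and 107 is ENOTCONN ("Transport endpoint is not connected"). A small sketch for decoding such lines follows. The function name `summarize_status_errors` and the one-entry lookup table are illustrative only, and the regular expression assumes the line layout shown in this message.

```python
import re

# Linux errno values relevant here. 107 is ENOTCONN on Linux.
LINUX_ERRNO = {107: ("ENOTCONN", "Transport endpoint is not connected")}

# Matches lines such as "(14212,1):dlm_request_join:1033 ERROR: status = -107".
STATUS_RE = re.compile(r"\(\d+,\d+\):(\w+):\d+ ERROR: status = (-?\d+)")


def summarize_status_errors(dmesg_text):
    """Map each failing function name to its decoded errno."""
    summary = {}
    for func, status in STATUS_RE.findall(dmesg_text):
        code = abs(int(status))
        name, message = LINUX_ERRNO.get(code, ("unknown", "not in the table above"))
        summary[func] = f"{name} ({message})"
    return summary


sample = (
    "(14212,1):dlm_request_join:1033 ERROR: status = -107\n"
    "(14212,1):ocfs2_mount_volume:1251 ERROR: status = -107\n"
)
for func, meaning in summarize_status_errors(sample).items():
    print(func, "->", meaning)
```

Every function in the chain reports the same code, which suggests one underlying failure (no connection to node 0) propagating upward rather than several separate errors.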

 

Reid McKinley


