Enable some debugging.

# debugfs.ocfs2 -l TCP allow
...do mount...
# debugfs.ocfs2 -l TCP off

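The enable, mount, disable sequence above can be scripted so that tracing is switched off again even when the mount fails. What follows is only a sketch, not part of the original reply: the function name `mount_with_tcp_trace`, the injectable `run` parameter, and the device and mount-point arguments are all assumptions made for illustration. The thread never gives the device path or mount point, so supply your own.

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def mount_with_tcp_trace(
    device: str,
    mountpoint: str,
    run: Callable[[Sequence[str]], None] = _run,
) -> None:
    """Enable o2net TCP tracing, attempt the mount, then always disable tracing.

    `device` and `mountpoint` are placeholders chosen by the caller; they are
    not taken from the thread. `run` is a hypothetical hook that lets the
    command runner be swapped out (for example, for a dry run).
    """
    run(["debugfs.ocfs2", "-l", "TCP", "allow"])
    try:
        run(["mount", "-t", "ocfs2", device, mountpoint])
    finally:
        # Runs whether or not the mount succeeded, so tracing is not left on.
        run(["debugfs.ocfs2", "-l", "TCP", "off"])
```

Run it as root on the node that fails to mount, then read the kernel log (dmesg) for the additional TCP messages. A failed mount still raises, so the caller sees the error after tracing has been turned off.
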
David Murphy wrote:
> [r...@web2 ~]# nc -z 192.168.102.140 7777
> Connection to 192.168.102.140 7777 port [tcp/cbt] succeeded!
>
> [r...@web1 /etc/sysconfig/network-scripts]# nc -z 192.168.102.141 7777
> Connection to 192.168.102.141 7777 port [tcp/cbt] succeeded!
>
> -----Original Message-----
> From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
> Sent: Monday, March 29, 2010 5:08 PM
> To: David Murphy
> Cc: ocfs2-users@oss.oracle.com
> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>
> What happens when you use netcat to ping the node?
> nc -z host.example.com 7777
>
> David Murphy wrote:
>
>> Some additional data:
>> From Web1 ( New Fedora Machine) to Web2:
>> [r...@web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141
>>
>> Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
>> Nmap scan report for 192.168.102.141
>> Host is up (0.000076s latency).
>> Not shown: 993 closed ports
>> PORT     STATE SERVICE
>> 22/tcp   open  ssh
>> 80/tcp   open  http
>> 81/tcp   open  hosts2-ns
>> 111/tcp  open  rpcbind
>> 5666/tcp open  nrpe
>> 7777/tcp open  unknown
>> 9102/tcp open  jetdirect
>> MAC Address: 00:50:56:A3:58:5D (VMware)
>>
>> Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds
>>
>>
>> From web2 -> web1 (new fedora machine)
>> [r...@web2 ~]# nmap 192.168.102.140
>>
>> Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
>> Interesting ports on 192.168.102.140:
>> Not shown: 994 closed ports
>> PORT     STATE SERVICE
>> 22/tcp   open  ssh
>> 80/tcp   open  http
>> 81/tcp   open  hosts2-ns
>> 111/tcp  open  rpcbind
>> 443/tcp  open  https
>> 7777/tcp open  unknown
>> MAC Address: 00:50:56:A3:14:62 (VMWare)
>>
>> Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds
>>
>>
>> Cluster.conf:
>> cluster:
>>     node_count = 6
>>     name = appshare
>>
>> node:
>>     ip_port = 7777
>>     ip_address = 192.168.102.140
>>     number = 1
>>     name = web1
>>     cluster = appshare
>>
>> node:
>>     ip_port = 7777
>>     ip_address = 192.168.102.141
>>     number = 2
>>     name = web2
>>     cluster = appshare
>>
>> node:
>>     ip_port = 7777
>>     ip_address = 192.168.102.142
>>     number = 3
>>     name = web3
>>     cluster = appshare
>>
>> node:
>>     ip_port = 7777
>>     ip_address = 192.168.102.111
>>     number = 4
>>     name = rgapp1
>>     cluster = appshare
>>
>> node:
>>     ip_port = 7777
>>     ip_address = 192.168.102.122
>>     number = 5
>>     name = deploy
>>     cluster = appshare
>>
>> node:
>>     ip_port = 7777
>>     ip_address = 192.168.102.112
>>     number = 6
>>     name = app1
>>     cluster = appshare
>>
>> DMESG on WEB1:
>> OCFS2 1.5.0
>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
>> (1262,0):dlm_request_join:1035 ERROR: status = -107
>> (1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>> (1262,0):dlm_join_domain:1487 ERROR: status = -107
>> (1262,0):dlm_register_domain:1753 ERROR: status = -107
>> (1262,0):o2cb_cluster_connect:313 ERROR: status = -107
>> (1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
>> (1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
>> ocfs2: Unmounting device (253,1) on (node 0)
>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
>> (1323,0):dlm_request_join:1035 ERROR: status = -107
>> (1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>> (1323,0):dlm_join_domain:1487 ERROR: status = -107
>> (1323,0):dlm_register_domain:1753 ERROR: status = -107
>> (1323,0):o2cb_cluster_connect:313 ERROR: status = -107
>> (1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
>> (1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
>> ocfs2: Unmounting device (253,1) on (node 0)
>> VMCI: Major device number is: 249
>> VMware memory control driver initialized
>> vmmemctl: started kernel thread pid=1522
>> ocfs2: Unregistered cluster interface o2cb
>> OCFS2 Node Manager 1.5.0
>> OCFS2 DLM 1.5.0
>> ocfs2: Registered cluster interface o2cb
>> OCFS2 DLMFS 1.5.0
>> OCFS2 User DLM kernel interface loaded
>> OCFS2 1.5.0
>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
>> (1839,0):dlm_request_join:1035 ERROR: status = -107
>> (1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>> (1839,0):dlm_join_domain:1487 ERROR: status = -107
>> (1839,0):dlm_register_domain:1753 ERROR: status = -107
>> (1839,0):o2cb_cluster_connect:313 ERROR: status = -107
>> (1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
>> (1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
>> ocfs2: Unmounting device (253,1) on (node 0)
>>
>> So clearly the ocfs2 service thinks it cannot connect to the node, but nmap
>> sees the connection just fine. And Web2 can see the port on web1 just fine,
>> so there is no firewall blocking the connections.
>>
>> I think it might be that Fedora 12 uses 1.5.0 for the OCFS2 kernel module
>> and CentOS 5.3/5.4 use 1.4.4-1. Am I correct in thinking this?
>>
>> David
>> -----Original Message-----
>> From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
>> Sent: Thursday, March 25, 2010 6:46 PM
>> To: David Murphy
>> Cc: ocfs2-users@oss.oracle.com
>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>
>> hmm.. o2cb_ctl makes no connections. It just reads the cluster.conf and
>> populates configfs. AFAIK.
>>
>> David Murphy wrote:
>>
>>> We had 6 nodes running CentOS 5.4 using 1.4.3 ocfs2-tools.
>>>
>>> I decided to rebuild one node with FC12.
>>>
>>> Which is working fine; however, nmap 192.168.200.112 shows 7777 as open,
>>> and o2cb_ctl is timing out when trying to connect to that node, which
>>> then causes a 107 error. This happens with all nodes, and all nodes have
>>> 7777 open via nmap from the FC machine.
>>>
>>> Is there a way to further debug this to see what exactly o2cb_ctl is
>>> seeing when trying to connect?
>>>
>>> David
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Ocfs2-users mailing list
>>> Ocfs2-users@oss.oracle.com
>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users