[r...@web1 /dev]# debugfs.ocfs2 -l TCP off /dev/mapper/OCFS2_200Gp1
[r...@web1 /dev]# mount /dev/mapper/OCFS2_200Gp1 -v
device=/dev/mapper/OCFS2_200Gp1
mount.ocfs2: Transport endpoint is not connected while mounting /dev/mapper/OCFS2_200Gp1 on /mnt/appshare. Check 'dmesg' for more information on this error.
[r...@web1 /dev]#dmesg
DMESG:
Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
Mar 30 10:23:38 web1 kernel: (1740,0):dlm_request_join:1035 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: (1740,0):dlm_try_to_join_domain:1209 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: (1740,0):dlm_join_domain:1487 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: (1740,0):dlm_register_domain:1753 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: (1740,0):o2cb_cluster_connect:313 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: (1740,0):ocfs2_dlm_init:2963 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: (1740,0):ocfs2_mount_volume:1788 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: ocfs2: Unmounting device (253,1) on (node 0)

DEBUGFS:
debugfs: curdev /dev/mapper/OCFS2_200Gp1
debugfs: controld dump
controld: Unable to access cluster service while obtaining the debug buffer
debugfs: slotmap
Slot#   Node#
0       3
1       5
2       2
4       4
5       6
debugfs: stats
Revision: 0.90
Mount Count: 0   Max Mount Count: 20
State: 0   Errors: 0
Check Interval: 0   Last Check: Mon Mar 29 10:53:52 2010
Creator OS: 0
Feature Compat: 1 backup-super
Feature Incompat: 16 sparse
Tunefs Incomplete: 0
Feature RO compat: 1 unwritten
Root Blknum: 5   System Dir Blknum: 6
First Cluster Group Blknum: 3
Block Size Bits: 12   Cluster Size Bits: 12
Max Node Slots: 6
Extended Attributes Inline Size: 0
Label: OCFS2_APPSHARE_200G
UUID: D6E0DD0AAC8844ED94A4A459FBB6F7FF
UUID_hash: 0 (0x0)
Cluster stack: classic o2cb
Inode: 2   Mode: 00   Generation: 2428834932 (0x90c51474)
FS Generation: 2428834932 (0x90c51474)
CRC32: 00000000   ECC: 0000
Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
Dynamic Features: (0x0)
User: 0 (root)   Group: 0 (root)   Size: 0
Links: 0   Clusters: 52428119
ctime: 0x4a0b2372 -- Wed May 13 14:45:54 2009
atime: 0x0 -- Wed Dec 31 18:00:00 1969
mtime: 0x4a0b2372 -- Wed May 13 14:45:54 2009
dtime: 0x0 -- Wed Dec 31 18:00:00 1969
ctime_nsec: 0x00000000 -- 0
atime_nsec: 0x00000000 -- 0
mtime_nsec: 0x00000000 -- 0
Last Extblk: 0
Sub Alloc Slot: Global   Sub Alloc Bit: 65535

It doesn't appear that any extra debug logging was actually created.

David

-----Original Message-----
From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
Sent: Monday, March 29, 2010 10:23 PM
To: Angelo McComis
Cc: David Murphy; ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2

No

On Mar 29, 2010, at 8:10 PM, Angelo McComis <ang...@mccomis.com> wrote:

> Does it matter that the nodes are numbered 1-6 instead of 0-5?
>
> On Mon, Mar 29, 2010 at 4:25 PM, Sunil Mushran <sunil.mush...@oracle.com> wrote:
>> Enable some debugging.
>>
>> #debugfs.ocfs2 -l TCP allow
>> ...do mount...
>> #debugfs.ocfs2 -l TCP off
>>
>> David Murphy wrote:
>>> [r...@web2 ~]# nc -z 192.168.102.140 7777
>>> Connection to 192.168.102.140 7777 port [tcp/cbt] succeeded!
>>>
>>> [r...@web1 /etc/sysconfig/network-scripts]# nc -z 192.168.102.141 7777
>>> Connection to 192.168.102.141 7777 port [tcp/cbt] succeeded!
>>>
>>> -----Original Message-----
>>> From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
>>> Sent: Monday, March 29, 2010 5:08 PM
>>> To: David Murphy
>>> Cc: ocfs2-users@oss.oracle.com
>>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>>
>>> What happens when you use netcat to ping the node?
>>> nc -z host.example.com 7777
>>>
>>> David Murphy wrote:
>>>
>>>> Some additional data:
>>>> From Web1 ( New Fedora Machine) to Web2:
>>>> [r...@web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141
>>>>
>>>> Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
>>>> Nmap scan report for 192.168.102.141
>>>> Host is up (0.000076s latency).
>>>> Not shown: 993 closed ports
>>>> PORT     STATE SERVICE
>>>> 22/tcp   open  ssh
>>>> 80/tcp   open  http
>>>> 81/tcp   open  hosts2-ns
>>>> 111/tcp  open  rpcbind
>>>> 5666/tcp open  nrpe
>>>> 7777/tcp open  unknown
>>>> 9102/tcp open  jetdirect
>>>> MAC Address: 00:50:56:A3:58:5D (VMware)
>>>>
>>>> Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds
>>>>
>>>> From web2 -> web1 (new fedora machine)
>>>> [r...@web2 ~]# nmap 192.168.102.140
>>>>
>>>> Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
>>>> Interesting ports on 192.168.102.140:
>>>> Not shown: 994 closed ports
>>>> PORT     STATE SERVICE
>>>> 22/tcp   open  ssh
>>>> 80/tcp   open  http
>>>> 81/tcp   open  hosts2-ns
>>>> 111/tcp  open  rpcbind
>>>> 443/tcp  open  https
>>>> 7777/tcp open  unknown
>>>> MAC Address: 00:50:56:A3:14:62 (VMWare)
>>>>
>>>> Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds
>>>>
>>>> Cluster.conf:
>>>> cluster:
>>>>     node_count = 6
>>>>     name = appshare
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 192.168.102.140
>>>>     number = 1
>>>>     name = web1
>>>>     cluster = appshare
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 192.168.102.141
>>>>     number = 2
>>>>     name = web2
>>>>     cluster = appshare
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 192.168.102.142
>>>>     number = 3
>>>>     name = web3
>>>>     cluster = appshare
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 192.168.102.111
>>>>     number = 4
>>>>     name = rgapp1
>>>>     cluster = appshare
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 192.168.102.122
>>>>     number = 5
>>>>     name = deploy
>>>>     cluster = appshare
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 192.168.102.112
>>>>     number = 6
>>>>     name = app1
>>>>     cluster = appshare
>>>>
>>>> DMESG on WEB1:
>>>> OCFS2 1.5.0
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 2 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 3 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 4 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 5 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 6 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1262,0):dlm_request_join:1035 ERROR: status = -107
>>>> (1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>>> (1262,0):dlm_join_domain:1487 ERROR: status = -107
>>>> (1262,0):dlm_register_domain:1753 ERROR: status = -107
>>>> (1262,0):o2cb_cluster_connect:313 ERROR: status = -107
>>>> (1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>>> (1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 2 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 3 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 5 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 6 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1323,0):dlm_request_join:1035 ERROR: status = -107
>>>> (1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>>> (1323,0):dlm_join_domain:1487 ERROR: status = -107
>>>> (1323,0):dlm_register_domain:1753 ERROR: status = -107
>>>> (1323,0):o2cb_cluster_connect:313 ERROR: status = -107
>>>> (1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>>> (1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>> VMCI: Major device number is: 249
>>>> VMware memory control driver initialized
>>>> vmmemctl: started kernel thread pid=1522
>>>> ocfs2: Unregistered cluster interface o2cb
>>>> OCFS2 Node Manager 1.5.0
>>>> OCFS2 DLM 1.5.0
>>>> ocfs2: Registered cluster interface o2cb
>>>> OCFS2 DLMFS 1.5.0
>>>> OCFS2 User DLM kernel interface loaded
>>>> OCFS2 1.5.0
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 4 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 5 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 6 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 2 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 3 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1839,0):dlm_request_join:1035 ERROR: status = -107
>>>> (1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>>> (1839,0):dlm_join_domain:1487 ERROR: status = -107
>>>> (1839,0):dlm_register_domain:1753 ERROR: status = -107
>>>> (1839,0):o2cb_cluster_connect:313 ERROR: status = -107
>>>> (1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>>> (1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>>
>>>> So clearly the ocfs2 service thinks it cannot connect to the nodes,
>>>> but nmap sees the port just fine. And web2 can see the port
>>>> on web1 just fine, so there is no firewall blocking the connections.
>>>>
>>>> I think it might be that Fedora 12 uses 1.5.0 for the OCFS2 kernel
>>>> module while CentOS 5.3/5.4 use 1.4.4-1. Am I correct in thinking this?
>>>>
>>>> David
>>>> -----Original Message-----
>>>> From: Sunil Mushran [mailto:sunil.mush...@oracle.com]
>>>> Sent: Thursday, March 25, 2010 6:46 PM
>>>> To: David Murphy
>>>> Cc: ocfs2-users@oss.oracle.com
>>>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>>>
>>>> hmm.. o2cb_ctl makes no connections. It just reads the cluster.conf
>>>> and populates configfs. AFAIK.
>>>>
>>>> David Murphy wrote:
>>>>
>>>>> We had 6 nodes running CentOS 5.4 using 1.4.3 ocfs2-tools.
>>>>>
>>>>> I decided to rebuild one node with FC12, which is working fine.
>>>>> However, nmap 192.168.200.112 shows 7777 as open, and o2cb_ctl is
>>>>> timing out when trying to connect to that node, which then causes
>>>>> a 107 error. This happens with all nodes, and all nodes have 7777
>>>>> open via nmap from the FC machine.
>>>>>
>>>>> Is there a way to further debug this to see what exactly
>>>>> o2cb_ctl is seeing when trying to connect?
>>>>>
>>>>> David
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-users mailing list
>>>>> Ocfs2-users@oss.oracle.com
>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
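
For anyone repeating the port checks from this thread across all nodes, here is a minimal sketch in Python, assuming the cluster.conf layout quoted above (a stanza name ending in a colon, followed by "key = value" lines). The function names parse_cluster_conf, node_endpoints, port_open and report are hypothetical and are not part of ocfs2-tools. Note that this only repeats the plain TCP test that `nc -z` performs; it does not exercise the o2net handshake, so a port reported as open here does not rule out the mount failure discussed above.

```python
import socket


def parse_cluster_conf(text):
    """Parse cluster.conf-style text into a list of (stanza_name, settings) pairs."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A stanza header such as "cluster:" or "node:" starts a new block.
            current = {}
            stanzas.append((line[:-1], current))
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return stanzas


def node_endpoints(stanzas):
    """Return (name, ip_address, ip_port) for every node stanza."""
    return [
        (settings["name"], settings["ip_address"], int(settings["ip_port"]))
        for kind, settings in stanzas
        if kind == "node"
    ]


def port_open(host, port, timeout=3.0):
    """Plain TCP connect test, comparable to `nc -z host port`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(conf_text):
    """Print one line per node showing whether its o2cb port accepts TCP connections."""
    for name, ip, port in node_endpoints(parse_cluster_conf(conf_text)):
        state = "open" if port_open(ip, port) else "unreachable"
        print(f"{name} {ip}:{port} {state}")


# Two stanzas copied from the cluster.conf posted in this thread.
SAMPLE = """\
cluster:
    node_count = 6
    name = appshare

node:
    ip_port = 7777
    ip_address = 192.168.102.140
    number = 1
    name = web1
    cluster = appshare

node:
    ip_port = 7777
    ip_address = 192.168.102.141
    number = 2
    name = web2
    cluster = appshare
"""
```

To run it against a live cluster, pass the contents of the node's own cluster.conf to report(), for example report(open("/etc/ocfs2/cluster.conf").read()), assuming the file is in that location on your system.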