On 2015/9/29 15:18, gjprabu wrote:
> Hi Joseph,
>
> We have 7 nodes in total, and this problem occurs on multiple nodes
> simultaneously, not on one particular node. We checked the network and it
> is fine. When we remount the ocfs2 partition the problem is fixed
> temporarily, but it reoccurs after some time.
>
> We also have a problem while unmounting: the umount process goes into "D"
> state, and then I need to restart the server itself. Is there any solution
> for this issue?
>
> I have tried running fsck.ocfs2 on the problematic machine, but it throws
> an error:
>
> fsck.ocfs2 1.8.0
> fsck.ocfs2: I/O error on channel while opening "/zoho/build/downloads"

IMO, this can happen if the mountpoint is offline.
> Please refer to the latest logs from one node.
>
> [258418.054204] o2cb: o2dlm has evicted node 7 from domain A895BC216BE641A8A7E20AA89D57E051
> [258418.957738] o2cb: o2dlm has evicted node 7 from domain A895BC216BE641A8A7E20AA89D57E051
> [264056.408719] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> [264464.605542] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 ) 4 nodes
> [275619.497198] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
> [426628.076148] o2cb: o2dlm has evicted node 1 from domain A895BC216BE641A8A7E20AA89D57E051
> [426628.885084] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
> [426628.891170] o2dlm: Node 3 (me) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
> [426634.182384] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [427001.383315] o2dlm: Node 1 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes

The messages above show that nodes in your cluster are frequently joining and
leaving. I suggest you check the cluster config on each node
(/etc/ocfs2/cluster.conf as well as /sys/kernel/config/cluster/<cluster_name>/node/).
I haven't used ocfs2 together with Ceph RBD, so I am not sure whether this is
related to RBD.

> Regards
> G.J
>
> ---- On Fri, 25 Sep 2015 06:26:57 +0530 Joseph Qi <joseph...@huawei.com> wrote ----
>
> On 2015/9/24 18:30, gjprabu wrote:
> > Hi All,
> >
> > Can someone tell me what kind of issue this is?
> >
> > Regards
> > Prabu GJ
> >
> > ---- On Wed, 23 Sep 2015 18:26:13 +0530 gjprabu <gjpr...@zohocorp.com> wrote ----
> >
> > Hi All,
> >
> > We face this issue on local machines as well, but only on two of the
> > ocfs2 clients, not on all of them.
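The cluster.conf consistency check suggested in the reply above can be sketched as a small script. This is a hypothetical helper, not part of the ocfs2-tools package: it parses an o2cb-style cluster.conf and verifies that the stanza count matches `node_count` and that node numbers, names, and IPs are unique. The sample text is trimmed from the config posted later in this thread.

```python
# Minimal sketch (assumption: simple "key = value" stanzas as in the
# cluster.conf posted in this thread; not the official o2cb parser).

SAMPLE = """\
cluster:
    node_count = 3
    heartbeat_mode = local
    name = ocfs2

node:
    ip_port = 7777
    ip_address = 10.1.1.50
    number = 1
    name = integ-hm5
    cluster = ocfs2

node:
    ip_port = 7777
    ip_address = 10.1.1.51
    number = 2
    name = integ-hm9
    cluster = ocfs2

node:
    ip_port = 7777
    ip_address = 10.1.1.52
    number = 3
    name = integ-hm2
    cluster = ocfs2
"""

def parse(conf):
    """Split the config into a list of stanza dicts."""
    stanzas = []
    for line in conf.splitlines():
        if not line.strip():
            continue
        if line.rstrip().endswith(":"):      # "cluster:" / "node:" header
            stanzas.append({"type": line.strip()[:-1]})
        else:
            key, _, val = line.partition("=")
            stanzas[-1][key.strip()] = val.strip()
    return stanzas

def check(conf):
    """Raise AssertionError if the config is internally inconsistent."""
    stanzas = parse(conf)
    cluster = next(s for s in stanzas if s["type"] == "cluster")
    nodes = [s for s in stanzas if s["type"] == "node"]
    assert int(cluster["node_count"]) == len(nodes), "node_count mismatch"
    for key in ("number", "ip_address", "name"):
        vals = [n[key] for n in nodes]
        assert len(vals) == len(set(vals)), "duplicate " + key
    return True

print(check(SAMPLE))  # True when the config is internally consistent
```

Running this against the copy of cluster.conf fetched from each node (and diffing the copies against each other) would catch the kind of mismatch that makes nodes join and leave repeatedly.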
> > Regards
> > Prabu GJ
> >
> > ---- On Wed, 23 Sep 2015 17:49:51 +0530 gjprabu <gjpr...@zohocorp.com> wrote ----
> >
> > Hi All,
> >
> > We are using ocfs2 on an RBD mount and everything works fine, but after
> > writing or moving data via scripts it shows the error below. Can anybody
> > please help with this issue?
> >
> > # ls -althr
> > ls: cannot access MICKEYLITE_3_0_M4_1_TEST: Input/output error
> > ls: cannot access MICKEYLITE_3_0_M4_1_OLD: Input/output error
> > total 0
> > d????????? ? ? ? ? ? MICKEYLITE_3_0_M4_1_TEST
> > d????????? ? ? ? ? ? MICKEYLITE_3_0_M4_1_OLD
> >
> > Partition details:
> >
> > /dev/rbd0 ocfs2 9.6T 140G 9.5T 2% /zoho/build/downloads
> >
> > /etc/ocfs2/cluster.conf:
> > cluster:
> > node_count = 7
> > heartbeat_mode = local
> > name = ocfs2
> >
> > node:
> > ip_port = 7777
> > ip_address = 10.1.1.50
> > number = 1
> > name = integ-hm5
> > cluster = ocfs2
> >
> > node:
> > ip_port = 7777
> > ip_address = 10.1.1.51
> > number = 2
> > name = integ-hm9
> > cluster = ocfs2
> >
> > node:
> > ip_port = 7777
> > ip_address = 10.1.1.52
> > number = 3
> > name = integ-hm2
> > cluster = ocfs2
> >
> > node:
> > ip_port = 7777
> > ip_address = 10.1.1.53
> > number = 4
> > name = integ-ci-1
> > cluster = ocfs2
> >
> > node:
> > ip_port = 7777
> > ip_address = 10.1.1.54
> > number = 5
> > name = integ-cm2
> > cluster = ocfs2
> >
> > node:
> > ip_port = 7777
> > ip_address = 10.1.1.55
> > number = 6
> > name = integ-cm1
> > cluster = ocfs2
> >
> > node:
> > ip_port = 7777
> > ip_address = 10.1.1.56
> > number = 7
> > name = integ-hm8
> > cluster = ocfs2
> >
> > Error on dmesg:
> >
> > [516421.342393] (dlm_thread,51005,25):dlm_flush_asts:599 ERROR: status = -112
> > [517119.689992] (httpd,64399,31):dlm_do_master_request:1383 ERROR: link to 1 went down!
> > [517119.690003] (dlm_thread,51005,25):dlm_send_proxy_ast_msg:482 ERROR: A895BC216BE641A8A7E20AA89D57E051: res S000000000000000000000200000000, error -112 send AST to node 1
> > [517119.690028] (dlm_thread,51005,25):dlm_flush_asts:599 ERROR: status = -112
> > [517119.690034] (dlm_thread,51005,25):dlm_send_proxy_ast_msg:482 ERROR: A895BC216BE641A8A7E20AA89D57E051: res S000000000000000000000200000000, error -107 send AST to node 1
> > [517119.690036] (dlm_thread,51005,25):dlm_flush_asts:599 ERROR: status = -107
> > [517119.700425] (httpd,64399,31):dlm_get_lock_resource:968 ERROR: status = -112
> > [517517.894949] (dlm_thread,51005,25):dlm_send_proxy_ast_msg:482 ERROR: A895BC216BE641A8A7E20AA89D57E051: res S000000000000000000000200000000, error -112 send AST to node 1
> > [517517.899640] (dlm_thread,51005,25):dlm_flush_asts:599 ERROR: status = -112
>
> These error messages mean that the connection between this node and node 1
> has a problem. You have to check the network.
>
> > Regards
> > Prabu GJ
> >
> > _______________________________________________
> > Ocfs2-users mailing list
> > Ocfs2-users@oss.oracle.com
> > https://oss.oracle.com/mailman/listinfo/ocfs2-users

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users
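[Editor's note] The negative status codes in the dmesg output above are negated Linux errno values, which is what makes "check the network" the right advice: on Linux, 112 is EHOSTDOWN and 107 is ENOTCONN, both transport-level failures on the o2net link between nodes. A quick decode using Python's standard errno tables:

```python
import errno
import os

# o2dlm reports negated Linux errno values; decode the two codes seen
# in the log above. On Linux, 112 -> EHOSTDOWN ("Host is down") and
# 107 -> ENOTCONN ("Transport endpoint is not connected").
for status in (-112, -107):
    code = -status
    print(code, errno.errorcode.get(code, "?"), os.strerror(code))
```

Both decode to network-layer failures, consistent with node 1 repeatedly dropping off the o2net port (7777 in the posted cluster.conf) rather than with a filesystem-level fault.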