Re: [Gluster-users] Self Heal fails...
I'm using GlusterFS version 3.2.3, built from the sources on the gluster.org website.

I think I've found a way. I shut down my volume, detached the peers, and basically recreated my storage volume from scratch. This time I started the setup by probing a peer from the node that had the up-to-date data in its underlying storage directory. Then I created the volume again from scratch, this time entering node2:/export first and then node1:/export. Then I mounted the Gluster volume locally, and I am currently running the find one-liner on it. Judging from the logs, it seems to be rebuilding.

I'm just wondering if there is a more elegant way to force a resync. It would be nice if there were a feature or a command that let you say: "OK node2, you are the main source; node1, listen to what node2 has to say."

On 09/16/2011 08:31 PM, Burnash, James wrote:
> Hi Robert.
>
> Can you tell us what version you are running? That helps nail down if this
> is a known bug in a specific version.
>
> James Burnash
> Unix Engineer
> Knight Capital Group
>
>
> -----Original Message-----
> From: gluster-users-boun...@gluster.org
> [mailto:gluster-users-boun...@gluster.org] On Behalf Of Robert Krig
> Sent: Friday, September 16, 2011 2:17 PM
> To: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Self Heal fails...
>
>
> On 09/16/2011 06:36 PM, Robert Krig wrote:
>> Hi there. I'm new to GlusterFS. I'm currently evaluating it for
>> production usage.
>>
>> I have two storage servers which use JFS as the filesystem for the
>> underlying export.
>>
>> The setup is supposed to be replicated.
>>
>> I've been experimenting with various settings for benchmarking and
>> such, as well as trying out different failure scenarios.
>>
>> Anyway, the export directory on node 1 is out of sync with node 2,
>> so I mounted the storage volume via the glusterfs client on node1 in
>> another directory.
>>
>> The fuse-mounted directory is /storage.
>>
>> As per the manual, I tried doing the "find -noleaf -print0
>> | xargs --null stat >/dev/null" dance; however, the logs throw a bunch
>> of errors:
>>
>> ##########
>> [2011-09-16 18:29:33.759729] E [client3_1-fops.c:1216:client3_1_inodelk_cbk] 0-GLSTORAGE-client-0: error
>> [2011-09-16 18:29:33.759747] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
>> [2011-09-16 18:29:33.759942] E [afr-self-heal-metadata.c:672:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-GLSTORAGE-replicate-0: Non Blocking metadata inodelks failed for /.
>> [2011-09-16 18:29:33.759961] E [afr-self-heal-metadata.c:674:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-GLSTORAGE-replicate-0: Metadata self-heal failed for /.
>> [2011-09-16 18:29:33.760167] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x7d) [0x7f4702a751ad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f4702a74de5] (-->/usr/local/lib/glusterfs/3.2.3/xlator/protocol/client.so(client3_1_entrylk_cbk+0x52) [0x7f46ff88a572]))) 0-xdr: XDR decoding failed
>> [2011-09-16 18:29:33.760200] E [client3_1-fops.c:1292:client3_1_entrylk_cbk] 0-GLSTORAGE-client-0: error
>> [2011-09-16 18:29:33.760215] I [client3_1-fops.c:1303:client3_1_entrylk_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
>> [2011-09-16 18:29:33.760417] E [afr-self-heal-entry.c:2292:afr_sh_post_nonblocking_entry_cbk] 0-GLSTORAGE-replicate-0: Non Blocking entrylks failed for /.
>> [2011-09-16 18:29:33.760447] E [afr-self-heal-common.c:1554:afr_self_heal_completion_cbk] 0-GLSTORAGE-replicate-0: background meta-data entry self-heal failed on /
>> [2011-09-16 18:29:33.760808] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
>> ##########
>>
>> Is this normal? The directory in question already has 150GB of data,
>> so the find command is still running. Will it be OK once it finishes?
>> From what I understand of the manual, the files should repair as the
>> find process runs, or did I misinterpret that?
>>
>> If self-heal should fail, is there a failsafe method to ensure that
>> both nodes are in sync again?
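For reference, the rebuild-from-the-known-good-node procedure described in this message can be sketched roughly as below. This is a hypothetical sketch, not the thread's confirmed commands: the volume name GLSTORAGE is taken from the log messages, and "replica 2" is an assumption for a two-node mirror. The run() wrapper only prints each command, so the sketch is safe to execute as-is; remove the echo to run the commands for real.

```shell
# Dry-run wrapper: print each command instead of executing it.
run() { echo "+ $*"; }

# All commands are issued on node2, the node with the up-to-date data,
# so that the peer probe (and thus the first brick) comes from it.
run gluster volume stop GLSTORAGE
run gluster volume delete GLSTORAGE
run gluster peer detach node1
run gluster peer probe node1          # probe FROM the good node
run gluster volume create GLSTORAGE replica 2 node2:/export node1:/export
run gluster volume start GLSTORAGE
# Then mount the volume and walk it with the find one-liner to rebuild.
run mount -t glusterfs node2:/GLSTORAGE /storage
```

Ordering the good brick (node2:/export) first mirrors what the message describes; whether brick order alone is what made the resync work is not confirmed in the thread.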
Re: [Gluster-users] Self Heal fails...
Hi Robert.

Can you tell us what version you are running? That helps nail down if this is a known bug in a specific version.

James Burnash
Unix Engineer
Knight Capital Group


-----Original Message-----
From: gluster-users-boun...@gluster.org
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Robert Krig
Sent: Friday, September 16, 2011 2:17 PM
To: gluster-users@gluster.org
Subject: Re: [Gluster-users] Self Heal fails...

> On 09/16/2011 06:36 PM, Robert Krig wrote:
>> Hi there. I'm new to GlusterFS. I'm currently evaluating it for
>> production usage.
>>
>> I have two storage servers which use JFS as the filesystem for the
>> underlying export.
>>
>> The setup is supposed to be replicated.
>>
>> I've been experimenting with various settings for benchmarking and
>> such, as well as trying out different failure scenarios.
>>
>> Anyway, the export directory on node 1 is out of sync with node 2,
>> so I mounted the storage volume via the glusterfs client on node1 in
>> another directory.
>>
>> The fuse-mounted directory is /storage.
>>
>> As per the manual, I tried doing the "find -noleaf -print0
>> | xargs --null stat >/dev/null" dance; however, the logs throw a bunch
>> of errors:
>>
>> ##########
>> [2011-09-16 18:29:33.759729] E [client3_1-fops.c:1216:client3_1_inodelk_cbk] 0-GLSTORAGE-client-0: error
>> [2011-09-16 18:29:33.759747] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
>> [2011-09-16 18:29:33.759942] E [afr-self-heal-metadata.c:672:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-GLSTORAGE-replicate-0: Non Blocking metadata inodelks failed for /.
>> [2011-09-16 18:29:33.759961] E [afr-self-heal-metadata.c:674:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-GLSTORAGE-replicate-0: Metadata self-heal failed for /.
>> [2011-09-16 18:29:33.760167] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x7d) [0x7f4702a751ad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f4702a74de5] (-->/usr/local/lib/glusterfs/3.2.3/xlator/protocol/client.so(client3_1_entrylk_cbk+0x52) [0x7f46ff88a572]))) 0-xdr: XDR decoding failed
>> [2011-09-16 18:29:33.760200] E [client3_1-fops.c:1292:client3_1_entrylk_cbk] 0-GLSTORAGE-client-0: error
>> [2011-09-16 18:29:33.760215] I [client3_1-fops.c:1303:client3_1_entrylk_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
>> [2011-09-16 18:29:33.760417] E [afr-self-heal-entry.c:2292:afr_sh_post_nonblocking_entry_cbk] 0-GLSTORAGE-replicate-0: Non Blocking entrylks failed for /.
>> [2011-09-16 18:29:33.760447] E [afr-self-heal-common.c:1554:afr_self_heal_completion_cbk] 0-GLSTORAGE-replicate-0: background meta-data entry self-heal failed on /
>> [2011-09-16 18:29:33.760808] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
>> ##########
>>
>> Is this normal? The directory in question already has 150GB of data,
>> so the find command is still running. Will it be OK once it finishes?
>> From what I understand of the manual, the files should repair as the
>> find process runs, or did I misinterpret that?
>>
>> If self-heal should fail, is there a failsafe method to ensure that
>> both nodes are in sync again?
>
> Well, the find process has finished in the meantime and, as expected, it
> didn't fix anything.
>
> Here are the last few lines of the client mount log:
>
> ##########
> [2011-09-16 18:48:45.287954] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 18:48:45.288394] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 18:48:45.288921] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 18:48:45.289535] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 18:48:45.290063] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 18:48:45.290649] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> ##########
Re: [Gluster-users] Self Heal fails...
On 09/16/2011 06:36 PM, Robert Krig wrote:
> Hi there. I'm new to GlusterFS. I'm currently evaluating it for
> production usage.
>
> I have two storage servers which use JFS as the filesystem for the
> underlying export.
>
> The setup is supposed to be replicated.
>
> I've been experimenting with various settings for benchmarking and such,
> as well as trying out different failure scenarios.
>
> Anyway, the export directory on node 1 is out of sync with node 2,
> so I mounted the storage volume via the glusterfs client on node1 in
> another directory.
>
> The fuse-mounted directory is /storage.
>
> As per the manual, I tried doing the "find -noleaf -print0 |
> xargs --null stat >/dev/null" dance; however, the logs throw a bunch of
> errors:
>
> ##########
> [2011-09-16 18:29:33.759729] E [client3_1-fops.c:1216:client3_1_inodelk_cbk] 0-GLSTORAGE-client-0: error
> [2011-09-16 18:29:33.759747] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 18:29:33.759942] E [afr-self-heal-metadata.c:672:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-GLSTORAGE-replicate-0: Non Blocking metadata inodelks failed for /.
> [2011-09-16 18:29:33.759961] E [afr-self-heal-metadata.c:674:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-GLSTORAGE-replicate-0: Metadata self-heal failed for /.
> [2011-09-16 18:29:33.760167] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x7d) [0x7f4702a751ad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f4702a74de5] (-->/usr/local/lib/glusterfs/3.2.3/xlator/protocol/client.so(client3_1_entrylk_cbk+0x52) [0x7f46ff88a572]))) 0-xdr: XDR decoding failed
> [2011-09-16 18:29:33.760200] E [client3_1-fops.c:1292:client3_1_entrylk_cbk] 0-GLSTORAGE-client-0: error
> [2011-09-16 18:29:33.760215] I [client3_1-fops.c:1303:client3_1_entrylk_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> [2011-09-16 18:29:33.760417] E [afr-self-heal-entry.c:2292:afr_sh_post_nonblocking_entry_cbk] 0-GLSTORAGE-replicate-0: Non Blocking entrylks failed for /.
> [2011-09-16 18:29:33.760447] E [afr-self-heal-common.c:1554:afr_self_heal_completion_cbk] 0-GLSTORAGE-replicate-0: background meta-data entry self-heal failed on /
> [2011-09-16 18:29:33.760808] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
> ##########
>
> Is this normal? The directory in question already has 150GB of data, so
> the find command is still running. Will it be OK once it finishes?
> From what I understand of the manual, the files should repair as the
> find process runs, or did I misinterpret that?
>
> If self-heal should fail, is there a failsafe method to ensure that both
> nodes are in sync again?

Well, the find process has finished in the meantime and, as expected, it didn't fix anything.

Here are the last few lines of the client mount log:

##########
[2011-09-16 18:48:45.287954] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
[2011-09-16 18:48:45.288394] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
[2011-09-16 18:48:45.288921] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
[2011-09-16 18:48:45.289535] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
[2011-09-16 18:48:45.290063] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
[2011-09-16 18:48:45.290649] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
[2011-09-16 18:48:45.291126] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
[2011-09-16 20:14:52.289901] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x7d) [0x7f4702a751ad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f4702a74de5] (-->/usr/local/lib/glusterfs/3.2.3/xlator/protocol/client.so(client3_1_statfs_cbk+0x71) [0x7f46ff88b741]))) 0-xdr: XDR decoding failed
[2011-09-16 20:14:52.289928] E [client3_1-fops.c:624:client3_1_statfs_cbk] 0-GLSTORAGE-client-0: error
[2011-09-16 20:14:52.289939] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
##########
[Gluster-users] Self Heal fails...
Hi there. I'm new to GlusterFS. I'm currently evaluating it for production usage.

I have two storage servers which use JFS as the filesystem for the underlying export.

The setup is supposed to be replicated.

I've been experimenting with various settings for benchmarking and such, as well as trying out different failure scenarios.

Anyway, the export directory on node 1 is out of sync with node 2, so I mounted the storage volume via the glusterfs client on node1 in another directory.

The fuse-mounted directory is /storage.

As per the manual, I tried doing the "find -noleaf -print0 | xargs --null stat >/dev/null" dance; however, the logs throw a bunch of errors:

##########
[2011-09-16 18:29:33.759729] E [client3_1-fops.c:1216:client3_1_inodelk_cbk] 0-GLSTORAGE-client-0: error
[2011-09-16 18:29:33.759747] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
[2011-09-16 18:29:33.759942] E [afr-self-heal-metadata.c:672:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-GLSTORAGE-replicate-0: Non Blocking metadata inodelks failed for /.
[2011-09-16 18:29:33.759961] E [afr-self-heal-metadata.c:674:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-GLSTORAGE-replicate-0: Metadata self-heal failed for /.
[2011-09-16 18:29:33.760167] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x7d) [0x7f4702a751ad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f4702a74de5] (-->/usr/local/lib/glusterfs/3.2.3/xlator/protocol/client.so(client3_1_entrylk_cbk+0x52) [0x7f46ff88a572]))) 0-xdr: XDR decoding failed
[2011-09-16 18:29:33.760200] E [client3_1-fops.c:1292:client3_1_entrylk_cbk] 0-GLSTORAGE-client-0: error
[2011-09-16 18:29:33.760215] I [client3_1-fops.c:1303:client3_1_entrylk_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
[2011-09-16 18:29:33.760417] E [afr-self-heal-entry.c:2292:afr_sh_post_nonblocking_entry_cbk] 0-GLSTORAGE-replicate-0: Non Blocking entrylks failed for /.
[2011-09-16 18:29:33.760447] E [afr-self-heal-common.c:1554:afr_self_heal_completion_cbk] 0-GLSTORAGE-replicate-0: background meta-data entry self-heal failed on /
[2011-09-16 18:29:33.760808] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
##########

Is this normal? The directory in question already has 150GB of data, so the find command is still running. Will it be OK once it finishes? From what I understand of the manual, the files should repair as the find process runs, or did I misinterpret that?

If self-heal should fail, is there a failsafe method to ensure that both nodes are in sync again?

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
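The self-heal trigger mentioned above works by making the client perform a lookup on every path; it is the lookup, not the stat output, that kicks off per-file self-heal on a replicated volume. A runnable sketch of the same walk is below. It uses a throwaway temp directory as a stand-in for the fuse mount, so it can be tried anywhere; against a real setup you would point find at the GlusterFS mount point (/storage in this thread) instead.

```shell
# Throwaway stand-in for the GlusterFS fuse mount (/storage in the thread).
MNT=$(mktemp -d)
mkdir "$MNT/dir"
touch "$MNT/file-a" "$MNT/file-b" "$MNT/dir/file-c"

# stat every entry, null-delimited so odd filenames survive the pipe;
# the stat output itself is discarded, only the lookups matter.
find "$MNT" -noleaf -print0 | xargs --null stat >/dev/null && echo "walk complete"

rm -rf "$MNT"
```

Note that -noleaf and --null are GNU find/xargs options; on non-GNU userlands the equivalent would be plain `find "$MNT" -print0 | xargs -0 stat >/dev/null`.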