Re: [Gluster-devel] [Gluster-users] lockd: server not responding, timed out
[truncated TCP/IP statistics quoted from an earlier message in the thread, apparently `netstat -s` output:]

    [...] in slow start
    1654 other TCP timeouts
    40 SACK retransmits failed
    49154 packets collapsed in receive queue due to low socket buffer
    14237653 DSACKs sent for old packets
    1 DSACKs sent for out of order packets
    8963734 DSACKs received
    121369 connections reset due to unexpected data
    5968 connections reset due to early user close
    34 connections aborted due to timeout
    TCPSACKDiscard: 253
    TCPDSACKIgnoredOld: 64
    TCPDSACKIgnoredNoUndo: 12840
    TCPSpuriousRTOs: 14
    TCPSackShifted: 48580479
    TCPSackMerged: 43924691
    TCPSackShiftFallback: 159482792
    TCPBacklogDrop: 521
    TCPChallengeACK: 4858
    TCPSYNChallenge: 53
    IpExt:
        InBcastPkts: 344
        InOctets: -1991304967
        OutOctets: 1560295186
        InBcastOctets: 142592

From: Niels de Vos [nde...@redhat.com]
Sent: Monday, January 26, 2015 4:37 AM
To: Peter Auyeung
Cc: gluster-us...@gluster.org; gluster-devel@gluster.org
Subject: Re: [Gluster-devel] [Gluster-users] lockd: server not responding, timed out

On Mon, Jan 26, 2015 at 12:26:53AM +, Peter Auyeung wrote:
> Hi Niels,
> The question is that we keep getting the lockd error even after
> restarting and rebooting the NFS client.

This particular error would only occur when the NFS-server could not
register the nlockmgr RPC-program with rpcbind/portmapper. The most
likely scenario where this fails is when there is an NFS-client (or
service) on the storage server that conflicts with the Gluster/NFS
service.

If there are conflicting RPC services in rpcbind/portmapper, you may be
able to check and remove those with the 'rpcinfo' command. Ports that
are listed in the output, but are not listed in netstat/ss, are in use
by kernel services (like the lockd kernel module).

In order to restore the NLM function of Gluster/NFS, you can take these
steps:

1. Ensure that there are no other NFS-services (server or client)
   running on the Gluster storage server. Gluster/NFS should be the
   only service doing NFS on the server.
2. Stop the rpcbind service.
3. Clear the rpcbind cache (rm /var/lib/rpcbind/portmap.xdr).
4. Start the rpcbind service.
5. Restart the Gluster/NFS service.

In case your NFS-client got connected to the incorrect NLM service on
your storage server, you would need to unmount and mount the export
again.

Niels

Peter

From: Niels de Vos [nde...@redhat.com]
Sent: Saturday, January 24, 2015 3:26 AM
To: Peter Auyeung
Cc: gluster-us...@gluster.org; gluster-devel@gluster.org
Subject: Re: [Gluster-devel] [Gluster-users] lockd: server not responding, timed out

On Fri, Jan 23, 2015 at 11:50:26PM +, Peter Auyeung wrote:
> We have a 6-node gluster cluster running Ubuntu on XFS, sharing
> gluster volumes over NFS, and it has been running fine for 3 months.
> We restarted glusterfs-server on one of the nodes and all NFS clients
> started getting "lockd: server not responding, timed out" in
> /var/log/messages.
> We are still able to read and write, but processes that require a
> persistent file lock fail, such as database exports.
> We have an interim fix of remounting the NFS share with the nolock
> option, but we need to know why that is suddenly necessary after a
> service glusterfs-server restart on one of the gluster nodes.

The reason that you need to mount with 'nolock' is that one server can
only have one NLM-service active. The Linux NFS-client uses the 'lockd'
kernel module, and the Gluster/NFS server provides its own lock
manager. To be able to use a lock manager, it needs to be registered
with rpcbind/portmapper. Only one lock manager can be registered at a
time; the 2nd one that tries to register will fail.

In case the NFS-client has registered the lockd kernel module as lock
manager, any locking requests to the Gluster/NFS service will fail and
you will see those messages in /var/log/messages.

This is one of the main reasons why it is not advised to access volumes
over NFS on a Gluster storage server. You should rather use the
GlusterFS protocol for mounting volumes locally. (Or even better,
separate your storage servers from the application servers.)
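The rpcinfo-versus-ss comparison described above can be sketched as a small shell helper. The `nlm_ports` function and the canned `rpcinfo -p` lines below are illustrative assumptions, not part of the thread:

```shell
#!/bin/sh
# Sketch: find which port(s) rpcbind has registered for the NLM program
# (nlockmgr). Ports listed by rpcinfo but absent from `ss`/`netstat`
# are typically held by kernel services such as the lockd module.

# Extract nlockmgr ports from `rpcinfo -p` output
# (columns: program vers proto port service).
nlm_ports() {
    awk '$5 == "nlockmgr" { print $4 }' | sort -un
}

# On a live storage server you would run:
#   rpcinfo -p | nlm_ports
# and check each port against `ss -tulpn`; a port with no userspace
# owner was probably registered by the kernel lockd module, not by
# Gluster/NFS.

# Demonstration on canned rpcinfo output:
nlm_ports <<'EOF'
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100021    1   udp  38465  nlockmgr
    100021    4   tcp  38466  nlockmgr
EOF
```

If the port owner turns out to be the kernel lockd module, some local NFS mount or kernel NFS service grabbed the NLM registration before Gluster/NFS could.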
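Steps 2-5 of the recovery procedure above might look like the following on a sysvinit-style Ubuntu host. The service names and cache path come from the thread, but verify them for your distribution; `RUN=echo` keeps this sketch side-effect free (set `RUN=` and run as root to actually perform the steps):

```shell
#!/bin/sh
# Dry-run sketch of the rpcbind reset sequence. With the default
# RUN=echo the script only prints the commands it would run.
RUN=${RUN-echo}

$RUN service rpcbind stop                 # step 2
$RUN rm -f /var/lib/rpcbind/portmap.xdr   # step 3: clear the rpcbind cache
$RUN service rpcbind start                # step 4
$RUN service glusterfs-server restart     # step 5: re-register nlockmgr
```

Remember step 1 first: no other NFS server or client may be running on the storage server, or the race for the nlockmgr registration simply repeats.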
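As a companion to the interim `nolock` workaround mentioned above, a hypothetical helper (not from the thread) can list the NFS mounts on a client that still depend on a working NLM lock manager, by scanning `/proc/mounts`-formatted input:

```shell
#!/bin/sh
# Sketch: print mount points of NFS mounts whose options do NOT include
# "nolock", i.e. mounts that will log "lockd: server not responding"
# when the server-side lock manager is broken.

nfs_mounts_needing_nlm() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        if ($4 !~ /(^|,)nolock(,|$)/) print $2
    }'
}

# On a client: nfs_mounts_needing_nlm < /proc/mounts
# Demonstration with canned /proc/mounts lines ("gv0"/"gv1" are
# placeholder volume names):
nfs_mounts_needing_nlm <<'EOF'
gluster1:/gv0 /mnt/data nfs rw,vers=3,nolock 0 0
gluster1:/gv1 /mnt/db nfs rw,vers=3 0 0
EOF
```

Mounting such an export with `-o nolock` silences the lockd timeouts but makes locks local to the client, so as the thread says it is only an interim fix; restoring the Gluster/NFS NLM registration is the real solution.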
HTH,
Niels

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel