Re: [Gluster-devel] [Gluster-users] lockd: server not responding, timed out

2015-01-27 Thread Peter Auyeung
 in slow start
1654 other TCP timeouts
40 SACK retransmits failed
49154 packets collapsed in receive queue due to low socket buffer
14237653 DSACKs sent for old packets
1 DSACKs sent for out of order packets
8963734 DSACKs received
121369 connections reset due to unexpected data
5968 connections reset due to early user close
34 connections aborted due to timeout
TCPSACKDiscard: 253
TCPDSACKIgnoredOld: 64
TCPDSACKIgnoredNoUndo: 12840
TCPSpuriousRTOs: 14
TCPSackShifted: 48580479
TCPSackMerged: 43924691
TCPSackShiftFallback: 159482792
TCPBacklogDrop: 521
TCPChallengeACK: 4858
TCPSYNChallenge: 53
IpExt:
InBcastPkts: 344
InOctets: -1991304967
OutOctets: 1560295186
InBcastOctets: 142592

From: Niels de Vos [nde...@redhat.com]
Sent: Monday, January 26, 2015 4:37 AM
To: Peter Auyeung
Cc: gluster-us...@gluster.org; gluster-devel@gluster.org
Subject: Re: [Gluster-devel] [Gluster-users] lockd: server  not responding, 
timed out

On Mon, Jan 26, 2015 at 12:26:53AM +, Peter Auyeung wrote:
 Hi Niels,

 The question is why we keep getting the lockd error even after we
 restarted and rebooted the NFS client.

This particular error would only occur when the NFS-server could not
register the nlockmgr RPC-program to rpcbind/portmapper. The most likely
scenario where this fails, is where there is an NFS-client (or service)
on the storage server that conflicts with the Gluster/NFS service.

If there are conflicting RPC services in rpcbind/portmapper, you may be
able to check and remove those with the 'rpcinfo' command. Ports that
are listed in the output, but are not listed in netstat/ss, are in use
by kernel services (like the lockd kernel module).
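As an illustrative sketch of that cross-check (the sample rpcinfo output
and the port numbers below are made up, not taken from this thread):

```shell
# Sample 'rpcinfo -p' output, captured as a string for illustration;
# columns are: program, version, proto, port, service. Real ports differ.
rpcinfo_out='   100021    1   udp  38468  nlockmgr
   100021    4   tcp  38468  nlockmgr
   100003    3   tcp   2049  nfs'

# Ports that rpcbind has registered for nlockmgr (the NLM lock manager)
nlm_ports=$(printf '%s\n' "$rpcinfo_out" | awk '$5 == "nlockmgr" {print $4}' | sort -u)
echo "nlockmgr ports: $nlm_ports"

# On a live system, compare each port against the listening sockets:
#   ss -tlnp | grep ":38468"
# A port that rpcinfo reports but ss does not show is held by a kernel
# service (such as the lockd module), not by a userspace process.
```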

In order to restore the NLM function of Gluster/NFS, you can take these
steps:

1. ensure that there are no other NFS-services (server or client)
   running on the Gluster storage server. Gluster/NFS should be the only
   service which does some NFS on the server.
2. stop the rpcbind service
3. clear the rpcbind-cache (rm /var/lib/rpcbind/portmap.xdr)
4. start the rpcbind service
5. restart the Gluster/NFS service
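The steps above can be sketched as a script. Service names assume the
Ubuntu-style init used in this thread; the DRY_RUN guard is an
illustrative safety measure (it prints the commands instead of running
them), not part of any Gluster tooling.

```shell
# Set DRY_RUN to empty to actually execute the commands.
DRY_RUN=echo

# 1. Stop any conflicting kernel NFS services on the storage server.
$DRY_RUN service nfs-kernel-server stop

# 2.-4. Restart rpcbind with a clean registration cache.
$DRY_RUN service rpcbind stop
$DRY_RUN rm -f /var/lib/rpcbind/portmap.xdr
$DRY_RUN service rpcbind start

# 5. Restart Gluster/NFS so it re-registers nlockmgr with rpcbind.
$DRY_RUN service glusterfs-server restart
```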


In case your NFS-client got connected to the incorrect NLM service on
your storage server, you would need to unmount and mount the export
again.
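For the remount, a minimal sketch; the hostname, volume name, and
mountpoint below are placeholders, and DRY_RUN again only prints the
commands:

```shell
# Placeholders: replace with your real server, volume, and mountpoint.
SERVER=storage1
VOLUME=gv0
MNT=/mnt/gv0
DRY_RUN=echo   # set to empty to actually execute

$DRY_RUN umount "$MNT"
# NFSv3 mount against Gluster/NFS; no 'nolock', so NLM locking is used.
$DRY_RUN mount -t nfs -o vers=3,proto=tcp "$SERVER:/$VOLUME" "$MNT"
```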

Niels


 Peter
 
 From: Niels de Vos [nde...@redhat.com]
 Sent: Saturday, January 24, 2015 3:26 AM
 To: Peter Auyeung
 Cc: gluster-us...@gluster.org; gluster-devel@gluster.org
 Subject: Re: [Gluster-devel] [Gluster-users] lockd: server  not responding, 
 timed out

 On Fri, Jan 23, 2015 at 11:50:26PM +, Peter Auyeung wrote:
  We have a 6-node Gluster cluster running Ubuntu on XFS, sharing
  Gluster volumes over NFS; it has been running fine for 3 months.
  We restarted glusterfs-server on one of the nodes, and all NFS clients
  started getting "lockd: server not responding, timed out" in
  /var/log/messages.

  We are still able to read and write, but processes that require a
  persistent file lock, such as database exports, appear to fail.

  We have an interim fix of remounting the NFS export with the 'nolock'
  option, but we need to know why that is suddenly necessary after a
  glusterfs-server service restart on one of the Gluster nodes.

 The reason you need to mount with 'nolock' is that one server can
 only have one NLM-service active. The Linux NFS-client uses the 'lockd'
 kernel module, and the Gluster/NFS server provides its own lock manager.
 To be able to use a lock manager, it needs to be registered at
 rpcbind/portmapper. Only one lock manager can be registered at a time;
 the second one that tries to register will fail. In case the NFS-client has
 registered the lockd kernel module as lock manager, any locking requests
 to the Gluster/NFS service will fail and you will see those messages in
 /var/log/messages.
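For illustration, the registration conflict can be inspected with
rpcinfo; nlockmgr is RPC program number 100021:

```shell
# NLM is RPC program 100021; only one registration per version can exist.
NLM_PROG=100021

# On a live system:
#   rpcinfo -p | grep nlockmgr   # who is registered right now?
#   ss -tlnp | grep ":<port>"    # is the port owned by a userspace
#                                # process (Gluster/NFS), or by nothing
#                                # visible in ss (kernel lockd)?
echo "checking RPC program $NLM_PROG (nlockmgr)"
```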

 This is one of the main reasons why it is not advised to access volumes
 over NFS on a Gluster storage server. You should rather use the
 GlusterFS protocol for mounting volumes locally. (Or even better,
 separate your storage servers from the application servers.)

 HTH,
 Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] lockd: server not responding, timed out

2015-01-26 Thread Niels de Vos
On Mon, Jan 26, 2015 at 12:26:53AM +, Peter Auyeung wrote:
 Hi Niels,
 
 The question is why we keep getting the lockd error even after we
 restarted and rebooted the NFS client.

This particular error would only occur when the NFS-server could not
register the nlockmgr RPC-program to rpcbind/portmapper. The most likely
scenario where this fails, is where there is an NFS-client (or service)
on the storage server that conflicts with the Gluster/NFS service.

If there are conflicting RPC services in rpcbind/portmapper, you may be
able to check and remove those with the 'rpcinfo' command. Ports that
are listed in the output, but are not listed in netstat/ss, are in use
by kernel services (like the lockd kernel module).

In order to restore the NLM function of Gluster/NFS, you can take these
steps:

1. ensure that there are no other NFS-services (server or client)
   running on the Gluster storage server. Gluster/NFS should be the only
   service which does some NFS on the server.
2. stop the rpcbind service
3. clear the rpcbind-cache (rm /var/lib/rpcbind/portmap.xdr)
4. start the rpcbind service
5. restart the Gluster/NFS service


In case your NFS-client got connected to the incorrect NLM service on
your storage server, you would need to unmount and mount the export
again.

Niels

 
 Peter
 
 From: Niels de Vos [nde...@redhat.com]
 Sent: Saturday, January 24, 2015 3:26 AM
 To: Peter Auyeung
 Cc: gluster-us...@gluster.org; gluster-devel@gluster.org
 Subject: Re: [Gluster-devel] [Gluster-users] lockd: server  not responding, 
 timed out
 
 On Fri, Jan 23, 2015 at 11:50:26PM +, Peter Auyeung wrote:
  We have a 6-node Gluster cluster running Ubuntu on XFS, sharing
  Gluster volumes over NFS; it has been running fine for 3 months.
  We restarted glusterfs-server on one of the nodes, and all NFS clients
  started getting "lockd: server not responding, timed out" in
  /var/log/messages.

  We are still able to read and write, but processes that require a
  persistent file lock, such as database exports, appear to fail.

  We have an interim fix of remounting the NFS export with the 'nolock'
  option, but we need to know why that is suddenly necessary after a
  glusterfs-server service restart on one of the Gluster nodes.
 
 The reason you need to mount with 'nolock' is that one server can
 only have one NLM-service active. The Linux NFS-client uses the 'lockd'
 kernel module, and the Gluster/NFS server provides its own lock manager.
 To be able to use a lock manager, it needs to be registered at
 rpcbind/portmapper. Only one lock manager can be registered at a time;
 the second one that tries to register will fail. In case the NFS-client has
 registered the lockd kernel module as lock manager, any locking requests
 to the Gluster/NFS service will fail and you will see those messages in
 /var/log/messages.
 
 This is one of the main reasons why it is not advised to access volumes
 over NFS on a Gluster storage server. You should rather use the
 GlusterFS protocol for mounting volumes locally. (Or even better,
 separate your storage servers from the application servers.)
 
 HTH,
 Niels




Re: [Gluster-devel] [Gluster-users] lockd: server not responding, timed out

2015-01-24 Thread Niels de Vos
On Fri, Jan 23, 2015 at 11:50:26PM +, Peter Auyeung wrote:
 We have a 6-node Gluster cluster running Ubuntu on XFS, sharing
 Gluster volumes over NFS; it has been running fine for 3 months.
 We restarted glusterfs-server on one of the nodes, and all NFS clients
 started getting "lockd: server not responding, timed out" in
 /var/log/messages.

 We are still able to read and write, but processes that require a
 persistent file lock, such as database exports, appear to fail.

 We have an interim fix of remounting the NFS export with the 'nolock'
 option, but we need to know why that is suddenly necessary after a
 glusterfs-server service restart on one of the Gluster nodes.

The reason you need to mount with 'nolock' is that one server can
only have one NLM-service active. The Linux NFS-client uses the 'lockd'
kernel module, and the Gluster/NFS server provides its own lock manager.
To be able to use a lock manager, it needs to be registered at
rpcbind/portmapper. Only one lock manager can be registered at a time;
the second one that tries to register will fail. In case the NFS-client has
registered the lockd kernel module as lock manager, any locking requests
to the Gluster/NFS service will fail and you will see those messages in
/var/log/messages.

This is one of the main reasons why it is not advised to access volumes
over NFS on a Gluster storage server. You should rather use the
GlusterFS protocol for mounting volumes locally. (Or even better,
separate your storage servers from the application servers.)

HTH,
Niels

