Bug#617666: nfs-kernel-server: Periodic nfsd failure - single nfsd process with high CPU and no mounts working

2011-03-20 Thread Luk Claes
On 10/03/11 12:54, Debian Bug Tracking System wrote:

> I have some extra information about this problem - the syslog contains
> some kernel error messages related to nfs and xfs (the filesystem of the
> /export partition).  I have attached the relevant log section...
>
> It could be this is a problem with xfs or even with our hardware raid
> controller.  I have rebooted the machine with /export unmounted and am
> currently running xfs_repair over it to see if that picks up any problems.

Hi

I guess your xfs_repair has finished by now? Did it shed any more light on
the issue, or should we look more closely into the nfs code?

Cheers

Luk






Bug#617666: nfs-kernel-server: Periodic nfsd failure - single nfsd process with high CPU and no mounts working

2011-03-10 Thread Dan Tomlinson
Package: nfs-kernel-server
Version: 1:1.2.2-4
Severity: grave
Justification: renders package unusable


Hi there,

Apologies if this has already been reported, but I couldn't find anything quite
matching what I'm seeing.

I have a 26TB Debian squeeze fileserver providing NFS mounts to a large number
of users.  The system has been working flawlessly for a number of months, but
twice in the last week NFS seems to have crashed.  The first thing I noticed was
that users reported being unable to access shares.  Logging into the system, I
see a single nfsd process taking 100% CPU with a very long run time.
Restarting nfs-kernel-server has no effect.  The process is unkillable (even
with -9) and the system has required a reboot to get it usable again.  jnettop
is not showing significant network traffic, and lsof on /export/ (where all my
NFS exports are located) shows no nfs access to any files.
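
To capture evidence before rebooting, the state and accumulated CPU time of
each nfsd thread can be read from /proc.  The following is only a minimal
sketch, assuming the Linux /proc/<pid>/stat layout described in proc(5); the
helper names parse_stat and nfsd_threads are made up for illustration and are
not part of any NFS tooling.

```python
import os


def parse_stat(line):
    """Parse one /proc/<pid>/stat line into (pid, comm, state, cpu_ticks)."""
    # comm is wrapped in parentheses and may contain spaces, so split
    # around the last ')' instead of splitting the whole line on whitespace.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()
    state = fields[0]                               # field 3: R, S, D, Z, ...
    cpu_ticks = int(fields[11]) + int(fields[12])   # fields 14 + 15: utime + stime
    return int(pid_str), comm, state, cpu_ticks


def nfsd_threads(proc_root="/proc"):
    """Return nfsd entries found under proc_root, busiest first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                row = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if row[1] == "nfsd":
            rows.append(row)
    return sorted(rows, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    for pid, comm, state, ticks in nfsd_threads():
        print(pid, comm, state, ticks)
```

Running this twice a few seconds apart and comparing the tick counts would
show which nfsd thread is accumulating CPU time and what state it reports.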

Please let me know if you need any further information.  I am going to reboot
the server now, so I may not be able to reproduce the problem straight away
(but as it's happened twice, I am quite sure it will happen again at some
point...).

Thanks in advance for your help.

Dan Tomlinson

My /etc/exports file is below:


# /etc/exports: the access control list for filesystems which may be exported
#   to NFS clients.  See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes   hostname1(no_subtree_check,rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
#

# misc shares
/export/software 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/system_tools 192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)
/export/home 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)

# flychip shares
/export/flychip/archives 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/flychip/misc 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/flychip/production 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/flychip/share 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/flychip/temp 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)

# mickelm shares
/export/micklem/releases 192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)
/export/micklem/data 192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)

# logic shares
/export/logic/data 192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)
/export/logic/webdav 192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)
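
In the listing above, /export/home is exported with root_squash to
192.168.32.0/24 but with no_root_squash to 192.168.128.0/24; the report does
not say whether that is intentional.  A rough sketch of how such per-client
differences could be spotted automatically follows.  It assumes each entry
sits on one line (or uses backslash continuations, as exports(5) allows) and
that paths and option lists contain no spaces; parse_exports and mixed_squash
are hypothetical helper names, and SAMPLE holds just two entries copied from
the file.

```python
import re

# client(options) token, e.g. 192.168.32.0/24(rw,sync)
CLIENT_RE = re.compile(r"([^()\s]+)\(([^()]*)\)")

SAMPLE = """\
# misc shares
/export/software 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/home 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)
"""


def parse_exports(text):
    """Return {path: {client: set_of_options}} for exports-style text."""
    exports = {}
    # Fold backslash-continued lines into one logical line first.
    for line in text.replace("\\\n", " ").splitlines():
        tokens = line.split("#", 1)[0].split()   # drop comments, then tokenise
        if not tokens:
            continue
        clients = {}
        for tok in tokens[1:]:
            m = CLIENT_RE.fullmatch(tok)
            if m:
                clients[m.group(1)] = set(m.group(2).split(","))
            else:
                clients[tok] = set()             # bare client, default options
        exports[tokens[0]] = clients
    return exports


def mixed_squash(exports):
    """List paths whose clients disagree on no_root_squash."""
    flagged = []
    for path, clients in exports.items():
        kinds = {"no_root_squash" in opts for opts in clients.values()}
        if len(kinds) > 1:
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    print(mixed_squash(parse_exports(SAMPLE)))
```

This only compares options between clients of the same export; it says
nothing about whether the settings are related to the nfsd hang.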



-- System Information:
Debian Release: 6.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/16 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages nfs-kernel-server depends on:
ii  libblkid1         2.17.2-9          block device id library
ii  libc6             2.11.2-10         Embedded GNU C Library: Shared lib
ii  libcomerr2        1.41.12-2         common error description library
ii  libgssapi-krb5-2  1.8.3+dfsg-4      MIT Kerberos runtime libraries - k
ii  libgssglue1       0.1-4             mechanism-switch gssapi library
ii  libk5crypto3      1.8.3+dfsg-4      MIT Kerberos runtime libraries - C
ii  libkrb5-3         1.8.3+dfsg-4      MIT Kerberos runtime libraries
ii  libnfsidmap2      0.23-2            An nfs idmapping library
ii  librpcsecgss3     0.19-2            allows secure rpc communication us
ii  libwrap0          7.6.q-19          Wietse Venema's TCP wrappers libra
ii  lsb-base          3.2-23.2squeeze1  Linux Standard Base 3.2 init scrip
ii  nfs-common        1:1.2.2-4         NFS support files common to client
ii  ucf               3.0025+nmu1       Update Configuration File: preserv

nfs-kernel-server recommends no packages.

nfs-kernel-server suggests no packages.

-- no debconf