I may be seeing the same problem. Our AMD Opteron 4386 (16 cores) machine
is also getting stuck with lots of hung tasks.

Although it responds to ping, and even a KVM virtual machine running on it
appears to continue working correctly, the host itself is locked up. This
happens once a week, probably when the machine is under its heaviest
direct CPU and NFS load.

Once the machine is in this state, I can type a username at the login
prompt, but no password prompt ever appears.

I forced a crash dump, and it contained hundreds of tasks whose backtraces
show them blocked in mutex_lock, called from walk_component or (via
fh_lock_nested) from nfsd_lookup_dentry. They look similar to Alexander's:

PID: 499    TASK: ffff880490a29080  CPU: 11  COMMAND: "nrpe"
 #0 [ffff880454e099a8] __schedule at ffffffff8134f195
 #1 [ffff880454e09a30] __mutex_lock_common.isra.5 at ffffffff8134fb74
 #2 [ffff880454e09aa0] mutex_lock at ffffffff8134fa62
 #3 [ffff880454e09ac0] walk_component at ffffffff81103868
 #4 [ffff880454e09b30] link_path_walk at ffffffff811040c1
 #5 [ffff880454e09bc0] path_openat at ffffffff8110611d
 #6 [ffff880454e09c50] do_filp_open at ffffffff8110646d
 #7 [ffff880454e09d20] open_exec at ffffffff810fed80
 #8 [ffff880454e09d40] load_elf_binary at ffffffff81135939
 #9 [ffff880454e09e50] search_binary_handler at ffffffff810ff7fd
#10 [ffff880454e09ea0] do_execve_common.isra.24 at ffffffff81100551
#11 [ffff880454e09f10] sys_execve at ffffffff81014dd2
#12 [ffff880454e09f50] stub_execve at ffffffff813559ec
    RIP: 00007fcc8991ca87  RSP: 00007fffe8b91ef8  RFLAGS: 00000246
    RAX: 000000000000003b  RBX: 0000000000000003  RCX: ffffffffffffffff
    RDX: 000000000164d180  RSI: 00007fffe8b91f10  RDI: 00007fcc899bc3ad
    RBP: 0000000000000003   R8: 0000000000000000   R9: 00000000000001f2
    R10: 00007fcc8a88f9d0  R11: 0000000000000246  R12: 00007fffe8b91f10
    R13: 0000000000000400  R14: 0000000000000001  R15: 00007fffe8b91f10
    ORIG_RAX: 000000000000003b  CS: 0033  SS: 002b

and:

PID: 4087   TASK: ffff88040ea63840  CPU: 2   COMMAND: "nfsd"
 #0 [ffff8804034b9c00] __schedule at ffffffff8134f195
 #1 [ffff8804034b9c88] __mutex_lock_common.isra.5 at ffffffff8134fb74
 #2 [ffff8804034b9cf8] mutex_lock at ffffffff8134fa62
 #3 [ffff8804034b9d18] fh_lock_nested.isra.6 at ffffffffa043d63c [nfsd]
 #4 [ffff8804034b9d28] nfsd_lookup_dentry at ffffffffa043df1a [nfsd]
 #5 [ffff8804034b9d98] nfsd4_secinfo.part.15 at ffffffffa0447692 [nfsd]
 #6 [ffff8804034b9dc8] nfsd4_proc_compound at ffffffffa04468d6 [nfsd]
 #7 [ffff8804034b9e18] nfsd_dispatch at ffffffffa043a7cd [nfsd]
 #8 [ffff8804034b9e48] svc_process_common at ffffffffa0336c3f [sunrpc]
 #9 [ffff8804034b9eb8] svc_process at ffffffffa0337050 [sunrpc]
#10 [ffff8804034b9ed8] nfsd at ffffffffa043a0e3 [nfsd]
#11 [ffff8804034b9ef8] kthread at ffffffff8105f701
#12 [ffff8804034b9f48] kernel_thread_helper at ffffffff813576f4

and:

PID: 5013   TASK: ffff880805c8b180  CPU: 8   COMMAND: "getty"
 #0 [ffff88080cb8b9a8] __schedule at ffffffff8134f195
 #1 [ffff88080cb8ba30] __mutex_lock_common.isra.5 at ffffffff8134fb74
 #2 [ffff88080cb8baa0] mutex_lock at ffffffff8134fa62
 #3 [ffff88080cb8bac0] walk_component at ffffffff81103868
 #4 [ffff88080cb8bb30] link_path_walk at ffffffff811040c1
 #5 [ffff88080cb8bbc0] path_openat at ffffffff8110611d
 #6 [ffff88080cb8bc50] do_filp_open at ffffffff8110646d
 #7 [ffff88080cb8bd20] open_exec at ffffffff810fed80
 #8 [ffff88080cb8bd40] load_elf_binary at ffffffff81135939
 #9 [ffff88080cb8be50] search_binary_handler at ffffffff810ff7fd
#10 [ffff88080cb8bea0] do_execve_common.isra.24 at ffffffff81100551
#11 [ffff88080cb8bf10] sys_execve at ffffffff81014dd2
#12 [ffff88080cb8bf50] stub_execve at ffffffff813559ec
    RIP: 00007f0d1ed74a87  RSP: 00007fffab157528  RFLAGS: 00000206
    RAX: 000000000000003b  RBX: 0000000000000000  RCX: ffffffffffffffff
    RDX: 00007fffab159ee8  RSI: 00007fffab157600  RDI: 0000000000405d7c
    RBP: 0000000000000003   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000206  R12: 00000000006075a0
    R13: 00000000011da750  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 000000000000003b  CS: 0033  SS: 002b
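
For anyone who wants to tally these across a whole dump rather than read
hundreds of traces by eye, here is a minimal sketch. It assumes the
backtraces are in the text format shown above (for example, the output of
crash's "foreach bt" saved to a file). The helper names parse_tasks and
mutex_waiters are made up for this illustration and are not part of the
crash utility. The SAMPLE text is abbreviated from the three traces above.

```python
import re
from collections import Counter

# Header line of each task, e.g.:
# PID: 499    TASK: ffff880490a29080  CPU: 11  COMMAND: "nrpe"
TASK_RE = re.compile(
    r'^PID:\s+(\d+)\s+TASK:\s+\S+\s+CPU:\s+\d+\s+COMMAND:\s+"([^"]*)"'
)
# Stack frame line, e.g.:
#  #2 [ffff880454e09aa0] mutex_lock at ffffffff8134fa62
FRAME_RE = re.compile(r'^\s*#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+')


def strip_suffix(symbol):
    """Drop GCC clone suffixes such as '.isra.5' or '.part.15'."""
    return symbol.split(".", 1)[0]


def parse_tasks(text):
    """Return a list of {'pid', 'command', 'frames'} dicts, one per task."""
    tasks = []
    current = None
    for line in text.splitlines():
        m = TASK_RE.match(line)
        if m:
            current = {"pid": int(m.group(1)), "command": m.group(2), "frames": []}
            tasks.append(current)
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            current["frames"].append(strip_suffix(m.group(2)))
    return tasks


def mutex_waiters(tasks):
    """Count tasks sleeping in mutex_lock, keyed by the function that called it."""
    counts = Counter()
    for task in tasks:
        frames = task["frames"]
        if "mutex_lock" in frames:
            i = frames.index("mutex_lock")
            if i + 1 < len(frames):
                counts[frames[i + 1]] += 1
    return counts


SAMPLE = """\
PID: 499    TASK: ffff880490a29080  CPU: 11  COMMAND: "nrpe"
 #0 [ffff880454e099a8] __schedule at ffffffff8134f195
 #1 [ffff880454e09a30] __mutex_lock_common.isra.5 at ffffffff8134fb74
 #2 [ffff880454e09aa0] mutex_lock at ffffffff8134fa62
 #3 [ffff880454e09ac0] walk_component at ffffffff81103868
 #4 [ffff880454e09b30] link_path_walk at ffffffff811040c1

PID: 4087   TASK: ffff88040ea63840  CPU: 2   COMMAND: "nfsd"
 #0 [ffff8804034b9c00] __schedule at ffffffff8134f195
 #1 [ffff8804034b9c88] __mutex_lock_common.isra.5 at ffffffff8134fb74
 #2 [ffff8804034b9cf8] mutex_lock at ffffffff8134fa62
 #3 [ffff8804034b9d18] fh_lock_nested.isra.6 at ffffffffa043d63c [nfsd]
 #4 [ffff8804034b9d28] nfsd_lookup_dentry at ffffffffa043df1a [nfsd]

PID: 5013   TASK: ffff880805c8b180  CPU: 8   COMMAND: "getty"
 #0 [ffff88080cb8b9a8] __schedule at ffffffff8134f195
 #1 [ffff88080cb8ba30] __mutex_lock_common.isra.5 at ffffffff8134fb74
 #2 [ffff88080cb8baa0] mutex_lock at ffffffff8134fa62
 #3 [ffff88080cb8bac0] walk_component at ffffffff81103868
 #4 [ffff88080cb8bb30] link_path_walk at ffffffff811040c1
"""

if __name__ == "__main__":
    for caller, n in mutex_waiters(parse_tasks(SAMPLE)).most_common():
        print(f"{n:4d}  {caller}")
```

This only groups waiters by the immediate caller of mutex_lock. It does not
identify which task holds the mutex, which would still have to be worked
out from the dump itself.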

ii  linux-image-amd64                            3.2+46
ii  nfs-kernel-server                            1:1.2.6-4

Mike.

