Package: linux-image-3.16.0-4-amd64 Version: 3.16.36-1+deb8u1 One of our systems is suddenly unable to stat() a particular file (cookies.sqlite-wal in a user's Firefox profile). Any attempt to do so hangs in the system call, as shown by strace. The file resides on an NFSv4 share (sec=krb5p). Other files in the same directory on the same share remain accessible. The affected file is normally accessible on the NFS server and from other NFS clients running the same kernel.
The user has reported a similar incident yesterday on some directories on a different NFS share (also sec=krb5p, but hosted on a different server). He rebooted to clear up the problem. I'd like advice on how to troubleshoot this effectively. I've tried rpcdebug -m {nfs,rpc,nlm} -s all but didn't see any smoking gun; maybe some information cached by the kernel is suppressing NFS activity associated with the stat() calls. The log entries I do see say NFS: nfs_lookup_revalidate(cookies.sqlite-wal) is valid Some related kernel traces from this system's logs (in chronological order, with an intervening reboot; the first trace is associated with yesterday's incident, the second trace is 2-3 minutes newer than the timestamp on cookies.sqlite-wal): [97483.663949] INFO: task ls:23767 blocked for more than 120 seconds. [97483.663951] Tainted: P W O 3.16.0-4-amd64 #1 [97483.663952] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [97483.663954] ls D ffff8800afd3b808 0 23767 1 0x00000004 [97483.663957] ffff8800afd3b3b0 0000000000000086 0000000000012f40 ffff88039a057fd8 [97483.663960] 0000000000012f40 ffff8800afd3b3b0 ffff88041ea137f0 ffff88041edcd128 [97483.663962] 0000000000000002 ffffffff8113eb50 ffff88039a057d60 ffff88039a057e40 [97483.663965] Call Trace: [97483.663968] [<ffffffff8113eb50>] ? wait_on_page_read+0x60/0x60 [97483.663971] [<ffffffff81515069>] ? io_schedule+0x99/0x120 [97483.663974] [<ffffffff8113eb5a>] ? sleep_on_page+0xa/0x10 [97483.663977] [<ffffffff815153ec>] ? __wait_on_bit+0x5c/0x90 [97483.663980] [<ffffffff8113e956>] ? wait_on_page_bit+0xc6/0xd0 [97483.663983] [<ffffffff810a95c0>] ? autoremove_wake_function+0x30/0x30 [97483.663986] [<ffffffff8114c22d>] ? pagevec_lookup_tag+0x1d/0x30 [97483.663989] [<ffffffff8113ea30>] ? filemap_fdatawait_range+0xd0/0x160 [97483.663993] [<ffffffff81140656>] ? filemap_write_and_wait+0x36/0x50 [97483.664002] [<ffffffffa06c4c08>] ? nfs_getattr+0x108/0x220 [nfs] [97483.664005] [<ffffffff811af057>] ? vfs_fstatat+0x57/0x90 [97483.664009] [<ffffffff811af51d>] ? SYSC_newlstat+0x1d/0x40 [97483.664013] [<ffffffff8151854d>] ? system_call_fast_compare_end+0x10/0x15 [ 9724.415533] INFO: task mozStorage #5:2748 blocked for more than 120 seconds. [ 9724.415537] Tainted: P W O 3.16.0-4-amd64 #1 [ 9724.415538] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 9724.415539] mozStorage #5 D ffff8803ee153a88 0 2748 2323 0x00000000 [ 9724.415542] ffff8803ee153630 0000000000000082 0000000000012f40 ffff8803ee2cffd8 [ 9724.415544] 0000000000012f40 ffff8803ee153630 ffff88041ea537f0 ffff88041edaf7a0 [ 9724.415545] 0000000000000002 ffffffffa0669800 ffff8803ee2cfc30 ffff88040b1152c0 [ 9724.415547] Call Trace: [ 9724.415557] [<ffffffffa0669800>] ? nfs_pageio_doio+0x50/0x50 [nfs] [ 9724.415560] [<ffffffff81515069>] ? io_schedule+0x99/0x120 [ 9724.415563] [<ffffffffa066980a>] ? nfs_wait_bit_uninterruptible+0xa/0x10 [nfs] [ 9724.415566] [<ffffffff815153ec>] ? __wait_on_bit+0x5c/0x90 [ 9724.415568] [<ffffffff810738ba>] ? internal_add_timer+0x2a/0x70 [ 9724.415571] [<ffffffffa0669800>] ? nfs_pageio_doio+0x50/0x50 [nfs] [ 9724.415573] [<ffffffff81515497>] ? out_of_line_wait_on_bit+0x77/0x90 [ 9724.415575] [<ffffffff810a95c0>] ? autoremove_wake_function+0x30/0x30 [ 9724.415578] [<ffffffffa066ef2e>] ? nfs_updatepage+0x15e/0x830 [nfs] [ 9724.415582] [<ffffffffa06609a7>] ? nfs_write_end+0x57/0x320 [nfs] [ 9724.415585] [<ffffffff81164d25>] ? iov_iter_copy_from_user_atomic+0x75/0x190 [ 9724.415588] [<ffffffff8113f08b>] ? generic_perform_write+0x11b/0x1c0 [ 9724.415590] [<ffffffff81140f98>] ? __generic_file_write_iter+0x158/0x340 [ 9724.415592] [<ffffffff811411b9>] ? generic_file_write_iter+0x39/0xa0 [ 9724.415595] [<ffffffffa065fdb3>] ? nfs_file_write+0x83/0x1a0 [nfs] [ 9724.415598] [<ffffffff811aa134>] ? new_sync_write+0x74/0xa0 [ 9724.415600] [<ffffffff811aa872>] ? vfs_write+0xb2/0x1f0 [ 9724.415601] [<ffffffff811ab3b2>] ? SyS_write+0x42/0xa0 [ 9724.415603] [<ffffffff811aa393>] ? SyS_lseek+0x43/0xa0 [ 9724.415605] [<ffffffff8151854d>] ? system_call_fast_compare_end+0x10/0x15