05.03.2016 19:32, Dmitry Sivachenko wrote:

I am running a number of machines with /home mounted via nfs (FreeBSD 
10.3-PRERELEASE #0 r294799, rw,bg,intr,soft).

Sometimes I get the following messages in syslog:

nfs_getpages: error 4
vm_fault: pager read error, pid NNN (myprog)

After that I see a lot of processes stuck in "pfault" state (these are 
computational processes which use some files from the NFS mount); they use 
0% of CPU from then on.

On the NFS server machine I see nothing strange in the logs.  procstat -kk 
for such stuck processes shows:
  PID    TID COMM             TDNAME           KSTACK
85274 102056 myprog           -                mi_switch+0xbe sleepq_wait+0x3a 
_sleep+0x287 vm_waitpfault+0x8a vm_fault_hold+0xdd0 vm_fault+0x77 
trap_pfault+0x180 trap+0x52c calltrap+0x8


What can be the reason for this?

For example, if some processes running on NFS server box modify some files 
"in-place"
and these files are opened by processes running on NFS client, that could be 
the reason.
If so, change this so processes updating such files create new temporary 
versions of them first
and then rename them atomically.
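
A minimal sketch of that write-then-rename pattern, in case it helps. The 
function name atomic_update is hypothetical (not an existing tool), and it 
assumes the new contents arrive on stdin. The temporary file is created in the 
same directory as the target so that mv can use rename(2) rather than a copy.

```shell
#!/bin/sh
# Sketch: replace a file atomically instead of modifying it in place.
# atomic_update is a hypothetical helper name used only for illustration.
atomic_update() {
    target=$1
    dir=$(dirname "$target")
    # Create the temporary file next to the target so the final rename
    # stays within one filesystem.
    tmp=$(mktemp "$dir/.tmp.XXXXXX") || return 1
    # Write the new contents (read from stdin) into the temporary file.
    if cat > "$tmp"; then
        # Within one filesystem, mv replaces the target via rename(2).
        mv -f "$tmp" "$target"
    else
        # Writing failed: clean up and leave the original untouched.
        rm -f "$tmp"
        return 1
    fi
}
```

Usage would look like: printf 'new data\n' | atomic_update /path/to/file. 
Note that mktemp creates the file with restrictive permissions, so a chmod on 
the temporary file before the mv may be needed if others must read the result.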


This should not be the case: users are working only on NFS clients.
Moreover, the nature of the computations is such that each process uses its 
own set of files.

(Forgot to mention in my previous e-mail that these processes can't be stopped 
even with kill -9)

Make sure you use TCP mounts and that TSO is disabled. Try switching between 
NFSv3 and NFSv4 to work around this bug
and to find out which version is broken. Also, please show the full mount 
command and option set.
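
Roughly, the checks above could look like the following on the client. This is 
only a sketch: the interface name em0 and the server name nfsserver are 
placeholders, not taken from your setup, and the commands need root.

```shell
# Show the options currently in effect for the NFS mounts.
nfsstat -m

# Disable TSO on one interface (em0 is a placeholder name) ...
ifconfig em0 -tso
# ... or disable it for the whole TCP stack.
sysctl net.inet.tcp.tso=0

# Remount over TCP with NFSv3 (nfsserver is a placeholder host name).
umount /home
mount -t nfs -o rw,tcp,nfsv3 nfsserver:/home /home

# To compare, try NFSv4 instead (the server must export it for v4).
# umount /home
# mount -t nfs -o rw,nfsv4 nfsserver:/home /home
```

The soft and intr options from the original mount are left out here on purpose, 
only so that the two protocol versions can be compared with otherwise default 
settings.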

_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"