On Mon, 13 Jun 2005, Federico Giannici wrote:

> I have an MX mail server that receives email messages and saves them to
> an email storage server via NFS.
>
> Both pc are OpenBSD i386, version 3.7 for the NFS client (MX server) and
> 3.4 for the NFS server (the storage server).
>
> From time to time the connections from the NFS clients seem to freeze
> (at least the new ones).

"Connection" is a slightly confusing term here, since NFS is by default a
connectionless protocol: it uses UDP. It only uses connections if you tell
it to use TCP. Are you using TCP or UDP?

> I applied the famous NFS patch that disables write gathering for v3
> (http://marc.theaimsgroup.com/?l=openbsd-misc&m=110676811107986&w=2),
> but the problem remains (perhaps a little less frequent).
>
> I have also raised the number of nfsd processes and "vfs.nfs.iothreads" to 20.
>
> The server uses a fxp interface and the client an sk one. From "netstat -i" I
> have seen that there are no errors or collisions.
>
> Here is the nfsstat output for the client and the server after almost a day of
> uptime:
>
> Client Info:
> Rpc Counts:
>    Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
>     136400        12    429651         0     13653    178549     25790     25819
>     Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
>       5415     20016         0        16         0    336388         0   1359316
>      Mknod    Fsstat    Fsinfo  PathConf    Commit
>          0     27008         1         0     42530
> Rpc Info:
>   TimedOut   Invalid X Replies   Retries  Requests
>          4         0       139     28889   2600564
> Cache Info:
>  Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
>    1669083    136400   1239823    424253     67080     13653    632860    178549
>  BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses
>          0         0         0         0     26996     27954
>
>
> Server Info:
>    Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
>      90847         0    269426         0      8882    137000     16908     16947
>     Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
>       3263     13427         0         0         0    197032         0    872760
>      Mknod    Fsstat    Fsinfo  PathConf    Commit
>          0     16594         0         0     28598
> Server Ret-Failed
>      65447
> Server Faults
>          0
> Server Cache Stats:
>     Inprog      Idem  Non-idem    Misses
>         21     14256       920   1657428
> Server Write Gathering:
>   WriteOps  WriteRPC   Opsaved
>     136997    137000         3
>
>
> What make me worry is the hight value of the "Ret-Failed" field.
> Is it normal?

Yes. Ret-Failed is incremented when a client tries to open a nonexistent
file, for example.

A little experiment on an idle NFS server:

server: nfsstat -s > a
client: cat nonexistentfile
server: nfsstat -s > b
server: diff -u a b
--- a	Tue Jun 14 09:03:05 2005
+++ b	Tue Jun 14 09:03:11 2005
@@ -1,17 +1,17 @@
 Server Info:
    Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
-    160196       432     37640        17     11621     19508      1092       866
+    160196       432     37641        17     11621     19508      1092       866
     Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
-       207        50         4        28        28       909         0   3214591
+       207        50         4        28        28       909         0   3214619
      Mknod    Fsstat    Fsinfo  PathConf    Commit
          0       362         4         0      3426
 Server Ret-Failed
-     18377
+     18378
 Server Faults
          0
 Server Cache Stats:
     Inprog      Idem  Non-idem    Misses
-        13       290         1   3450691
+        13       290         1   3450720
 Server Write Gathering:
   WriteOps  WriteRPC   Opsaved
      17979     19508      1529

> I have no experience of NFS, is it normal that sometime ot stalls?

No, this is not normal. But there have been quite a few fixes in NFS
since 3.4, so upgrading the server might be worth the trouble.

> What else I could do to prevent this to happen?

Some relevant questions/hints:

- Is all I/O stalled? Or just I/O to certain mailboxes?
- Are there any stale lock files in the mail dir?
- Does the maillog say anything about failing deliveries on the NFS
  client? Or do the local delivery processes just hang?
- Try using tcpdump to check the traffic between client and server;
  this might give a clue about what is going on.

	-Otto
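
To follow only the Ret-Failed counter from the experiment above, rather than
diffing the whole output, a small filter can pull the value out of a saved
"nfsstat -s" snapshot. This is a minimal sketch: the function name ret_failed
is made up for illustration, and it assumes the layout shown above, where the
value sits on the line right after the "Server Ret-Failed" header.

```shell
# Hypothetical helper: print the Ret-Failed value found in `nfsstat -s`
# output read from stdin. Assumes the number is the first field on the
# line that follows the "Server Ret-Failed" header.
ret_failed() {
    awk '/Ret-Failed/ { getline; print $1; exit }'
}

# Example use with the snapshot files a and b from the experiment:
#   echo $(( $(ret_failed < b) - $(ret_failed < a) ))
```

If the counter only grows at about the rate at which clients look up files
that do not exist, it is probably not related to the stalls.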