On Mon, 13 Jun 2005, Federico Giannici wrote:

> I have an MX mail server that receives email messages and saves them to
> an email storage server via NFS.
> 
> Both PCs run OpenBSD i386: version 3.7 on the NFS client (MX server) and
> 3.4 on the NFS server (the storage server).
> 
> From time to time the connections from the NFS clients seem to freeze
> (at least the new ones).

"Connection" is a somewhat confusing term here: by default NFS runs over
UDP, which is connectionless. It uses connections only if you tell it to
use TCP. Are you using TCP or UDP?

> 
> I applied the famous NFS patch that disables write gathering for v3
> (http://marc.theaimsgroup.com/?l=openbsd-misc&m=110676811107986&w=2),
> but the problem remains (perhaps a little less frequent).
> 
> I have also raised the number of nfsd processes and "vfs.nfs.iothreads" to 20.
> 
> The server uses a fxp interface and the client an sk one. From "netstat -i" I
> have seen that there are no errors or collisions.
> 
> Here is the nfsstat output for the client and the server after almost a day of
> uptime:
> 
> Client Info:
> Rpc Counts:
>   Getattr   Setattr    Lookup  Readlink      Read     Write    Create   Remove
>    136400        12    429651         0     13653    178549     25790    25819
>    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus   Access
>      5415     20016         0        16         0    336388         0  1359316
>     Mknod    Fsstat    Fsinfo  PathConf    Commit
>         0     27008         1         0     42530
> Rpc Info:
>  TimedOut   Invalid X Replies   Retries  Requests
>         4         0       139     28889   2600564
> Cache Info:
> Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits  Misses
>   1669083    136400   1239823    424253     67080     13653    632860   178549
> BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses
>         0         0         0         0     26996     27954
> 
> 
> Server Info:
>   Getattr   Setattr    Lookup  Readlink      Read     Write    Create   Remove
>     90847         0    269426         0      8882    137000     16908    16947
>    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus   Access
>      3263     13427         0         0         0    197032         0   872760
>     Mknod    Fsstat    Fsinfo  PathConf    Commit
>         0     16594         0         0     28598
> Server Ret-Failed
>             65447
> Server Faults
>             0
> Server Cache Stats:
>    Inprog      Idem  Non-idem    Misses
>        21     14256       920   1657428
> Server Write Gathering:
>  WriteOps  WriteRPC   Opsaved
>    136997    137000         3
> 
> 
> What worries me is the high value of the "Ret-Failed" field.
> Is it normal?

Yes. Ret-Failed is incremented when, for example, a client tries to open
a nonexistent file.

A little experiment on an idle NFS server:

server: nfsstat -s > a
client: cat nonexistentfile
server: nfsstat -s > b
server: diff -u a b
--- a   Tue Jun 14 09:03:05 2005
+++ b   Tue Jun 14 09:03:11 2005
@@ -1,17 +1,17 @@
 Server Info:
   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
-   160196       432     37640        17     11621     19508      1092       866
+   160196       432     37641        17     11621     19508      1092       866
    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
-      207        50         4        28        28       909         0   3214591
+      207        50         4        28        28       909         0   3214619
     Mknod    Fsstat    Fsinfo  PathConf    Commit
         0       362         4         0      3426
 Server Ret-Failed
-            18377
+            18378
 Server Faults
             0
 Server Cache Stats:
    Inprog      Idem  Non-idem    Misses
-       13       290         1   3450691
+       13       290         1   3450720
 Server Write Gathering:
  WriteOps  WriteRPC   Opsaved
     17979     19508      1529
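
If you want to repeat this before/after comparison without reading the
diff by eye, something like the following Python sketch could do it. It
is not part of nfsstat: the function names are made up for illustration,
and the parser assumes the "nfsstat -s" layout shown above (a line of
single-word counter names followed by a line of numbers, or a one-line
label such as "Server Ret-Failed" followed by a single number). The
client output has multi-word column names and repeated "Misses" columns,
which this sketch does not handle.

```python
def parse_counters(text):
    """Map counter names to integer values from `nfsstat -s`-style text.

    Assumption: a header line is immediately followed by its value line.
    """
    counters = {}
    lines = text.splitlines()
    for header, values in zip(lines, lines[1:]):
        nums = values.split()
        if not nums or not all(n.isdigit() for n in nums):
            continue  # the second line of the pair is not a value line
        names = header.split()
        if len(names) == len(nums):
            # One column name per number, e.g. "Getattr Setattr Lookup ..."
            for name, num in zip(names, nums):
                counters[name] = int(num)
        elif len(nums) == 1:
            # A whole-line label such as "Server Ret-Failed"
            counters[header.strip()] = int(nums[0])
    return counters


def deltas(before, after):
    """Return only the counters whose value changed, with the difference."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# Shortened sample based on the snapshots above.
BEFORE = """\
Server Info:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
   160196       432     37640        17     11621     19508      1092       866
Server Ret-Failed
            18377
Server Faults
            0
"""
AFTER = BEFORE.replace("37640", "37641").replace("18377", "18378")

print(deltas(parse_counters(BEFORE), parse_counters(AFTER)))
# prints {'Lookup': 1, 'Server Ret-Failed': 1}
```

On real data you would read the two snapshot files (a and b above) and
pass their contents to parse_counters().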
                                           
> I have no experience with NFS. Is it normal that it sometimes stalls?

No, this is not normal. But there have been quite a few NFS fixes since
3.4, so it might be worth the trouble to upgrade the server.

> What else could I do to prevent this from happening?

Some relevant questions/hints:

- Is all I/O stalled? Or just I/O to certain mailboxes?
- Are there any stale lock files in the mail dir?
- Does the maillog say something about failing deliveries on the NFS
client? Or do the local delivery processes just hang?
- Try using tcpdump to check the traffic between client and server,
this might give a clue what is going on.
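
For the stale lock file question, a rough Python sketch like the one
below could list candidates. Everything specific in it is an assumption:
I am guessing that your delivery agent creates lock files ending in
".lock", that ten minutes is old enough to count as stale, and the
directory in the usage comment is only a placeholder. Adjust all three to
your setup. Ages are computed against the local clock, so when you run it
on the NFS client, clock skew between client and server can distort them.

```python
import os
import time


def find_stale_locks(mail_dir, max_age_seconds=600, suffix=".lock", now=None):
    """Return sorted (path, age_in_seconds) for old lock files under mail_dir.

    The suffix and the age threshold are illustrative defaults, not
    something your mail software is known to use.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(mail_dir):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                age = now - os.stat(path).st_mtime
            except OSError:
                continue  # file vanished between listing and stat
            if age > max_age_seconds:
                stale.append((path, age))
    return sorted(stale)


# Usage (the path is a placeholder):
#   for path, age in find_stale_locks("/path/to/maildir"):
#       print("%s is %d seconds old" % (path, age))
```

A lock file that stays in this list across several runs while deliveries
to that mailbox hang would be a good place to start looking.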

        -Otto
