Hello,

I'm one of the Linux sysadmins for an e-broker, and we are running into a
kernel limit in sunrpc on several production systems. After much searching
on Google, in HOWTOs, etc., I decided to post here. I have a possible fix in
mind, but I am not sure it will solve our problem. :( I'm really desperate
here, so any ideas would be very welcome.

** Description of our architecture:

We have 32 VA-Linux systems (each with SCSI disks, dual PIII-600 CPUs and
1 GB of RAM), each running 200 apache-ssl web server processes.
Some CGIs access -one- NFS partition on an NFS server in the same
subnet, but these CGIs are run every time a customer connects to our
site, since they display dynamic data.

The VA-Linux machines run Debian "Potato" with kernel 2.2.14. The NFS
server is a Compaq 1850 with a SmartArray X2000 controller (56 MB cache)
running Red Hat 6.1 (with all updates applied) and knfsd with 64 kernel
threads.

** Description of the problem:

On -all- web servers, during peak hours we get things like:

Apr  7 09:06:15 pawww-001 kernel: nfs: task 5820 can't get a request slot
Apr  7 09:06:15 pawww-001 kernel: nfs: task 5933 can't get a request slot
Apr  7 09:06:15 pawww-001 kernel: nfs: task 5816 can't get a request slot
Apr  7 09:06:15 pawww-001 kernel: nfs: task 5938 can't get a request slot

We are getting more and more of these, and they cause delays for the
affected customers because the CGIs have to retry. We are also getting a
lot of messages like these:

Apr  7 10:57:40 pawww-001 kernel: nfs: server 172.17.7.229 OK
Apr  7 11:11:55 pawww-001 kernel: nfs: server 172.17.7.229 not responding, still trying
Apr  7 11:11:55 pawww-001 kernel: nfs: server 172.17.7.229 OK

I have tracked the first error to this file:

# grep -n 'request slot' /usr/src/linux/net/sunrpc/*c
....
/usr/src/linux/net/sunrpc/clnt.c:603:                   printk("%s: task %d can't get a request slot\n",

I'm thinking of increasing the following values in
/usr/src/linux/include/linux/sunrpc/xprt.h and recompiling:

#define MAX_IOVEC       8
#define RPC_MAXCONG             16

MAX_IOVEC is listed in /usr/src/linux/net/TUNABLE, but RPC_MAXCONG is not.

Does anyone know whether this would help with our current problem?
Would increasing these values cause other problems (something like a
limit of fewer than 256 fds?) What values should I use? I'm really in
the dark here. From what I hear, this is subject to change in the 2.4.x
kernels, but we need to do something about it now because we are in
trouble here... :( Also, I assumed the problem was on the web servers'
side because I am not seeing any errors on the NFS server, but I really
don't know whether it is coming from the NFS clients; I just -think- so
from the errors that I am seeing.

Thanks for reading.

Vincent, [EMAIL PROTECTED]
