Hello,
This is a follow-up to my own post. I received an answer from Alan
Cox, who suggested starting more nfsd's than the default 8. The problem is
that I am already running 64 nfsd's (the maximum). When I try to start more, I
get a kmalloc error (RH6.1 + latest patches, default 2.2.12 kernel),
along the lines of:
Apr 10 10:43:40 solace kernel: Installing knfsd (copyright (C) 1996 [EMAIL PROTECTED]).
Apr 10 10:43:40 solace kernel: nfsd_fh_init : Could not allocate fhcache
Apr 10 10:43:40 solace rpc.nfsd: nfssvc: Cannot allocate memory
Apr 10 10:43:40 solace nfs: rpc.nfsd startup failed
However, Alan's answer seems to imply that the problem comes from the
server and not from the clients' limits. Does everyone else agree? Is there
any way to run more nfsd's on Linux 2.2 than the (apparent) maximum of 64?
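
To put rough numbers on the client-versus-server question, here is a
back-of-envelope sketch using only figures from this thread (32 web servers,
200 apache processes each, 64 nfsd threads, RPC_MAXCONG 16). It ASSUMES that
RPC_MAXCONG caps the number of outstanding RPC requests per client, that each
client has a single mount of the server, and that every apache process may
issue NFS requests; I have not verified any of this against the sunrpc code.
The function names are made up for illustration.

```shell
#!/bin/sh
# Rough estimate only. Inputs come from this thread; the model is an assumption.

# Upper bound on requests in flight across all clients, ASSUMING each client
# can have at most slots_per_client (RPC_MAXCONG) requests outstanding.
max_in_flight() {
    clients=$1
    slots_per_client=$2
    echo $(( clients * slots_per_client ))
}

# Ceiling division: how many units of "total" fall on each of "parts".
ceil_div() {
    total=$1
    parts=$2
    echo $(( (total + parts - 1) / parts ))
}

total=$(max_in_flight 32 16)
echo "max requests in flight (32 clients x 16 slots): $total"
echo "requests per nfsd thread (64 threads):          $(ceil_div "$total" 64)"
echo "apache processes per slot on one client:        $(ceil_div 200 16)"
```

Under these assumptions that is 512 requests possibly in flight, 8 per nfsd
thread, and about 13 apache processes competing for each slot on a single web
server. If the model is roughly right, both sides could be a bottleneck at
peak, so it does not settle the question by itself.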
Vincent
On Fri, 7 Apr 2000, Vincent Cojot wrote:
>
> Hello,
>
> I'm one of the Linux sysadmins for an e-broker and we are running
> into a kernel limit for sunrpc on several production systems. After much
> searching on Google, HOWTOs, etc., I decided to post here. I have a
> possible solution in mind but I am not sure it will fix our problem. :( I'm
> really desperate here, so if anyone has any ideas... :(
>
> ** Description of our architecture:
>
> We have 32 VA-Linux systems (each with SCSI, Dual PIII-600, 1GB ram) that
> are running (each) 200 apache-ssl web servers.
> Some CGI's access -one- NFS partition on an NFS server in the same
> subnet but these CGI's are loaded every time a customer connects to our
> site since they display dynamic data.
>
> The VA-Linux machines run Debian "Potato" with kernel 2.2.14 while the NFS
> Server runs RedHat 6.1 (with all updates applied) with knfsd and 64 kernel
> threads on a Compaq 1850 with a SmartArray X2000 with 56MB cache.
>
> ** Description of the problem.
>
> On -all- web servers, during peak hours we get things like:
>
> Apr 7 09:06:15 pawww-001 kernel: nfs: task 5820 can't get a request slot
> Apr 7 09:06:15 pawww-001 kernel: nfs: task 5933 can't get a request slot
> Apr 7 09:06:15 pawww-001 kernel: nfs: task 5816 can't get a request slot
> Apr 7 09:06:15 pawww-001 kernel: nfs: task 5938 can't get a request slot
>
> We are getting more and more of these and this is causing delays for the
> affected customers because the cgi's have to re-try. We are also getting a
> lot of:
> Apr 7 10:57:40 pawww-001 kernel: nfs: server 172.17.7.229 OK
> Apr 7 11:11:55 pawww-001 kernel: nfs: server 172.17.7.229 not responding, still trying
> Apr 7 11:11:55 pawww-001 kernel: nfs: server 172.17.7.229 OK
>
> I have tracked the first error to this file:
>
> # grep -n 'request slot' /usr/src/linux/net/sunrpc/*c
> ....
> /usr/src/linux/net/sunrpc/clnt.c:603: printk("%s: task %d can't get a request slot\n",
>
> I'm thinking of increasing the following values and recompiling:
> #define MAX_IOVEC 8
> #define RPC_MAXCONG 16
> (from /usr/src/linux/include/linux/sunrpc/xprt.h)
>
> MAX_IOVEC is listed in /usr/src/linux/net/TUNABLE but RPC_MAXCONG is not.
>
> Does anyone have any idea if this would help with our current problem?
> Would we run into other problems by increasing these values (something
> like the number of fd's < 256)? What values should I use? I'm really in
> the dark here. From what I hear this is subject to change in
> kernels 2.4.x, but we need to do something about it now since they are in
> trouble here... :( Also, I assumed the problem was on the web servers'
> side because I am not getting any errors on the NFS server, but I really
> have no idea whether this is coming from the NFS clients; I just -think- so
> from the errors that I am seeing.
>
> Thanks for reading.
>
> Vincent, [EMAIL PROTECTED]
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-net" in
> the body of a message to [EMAIL PROTECTED]
>