I think errno 61 is "connection refused". My guess is that the accept() queue
length is too small. Looking through the Tomcat 4 source, it looks like the
default accept backlog is 10. Perhaps there's a way to increase it?
You could always rebuild with a larger default. But if you are throwing a lot
of simultaneous connections at Tomcat, you may overflow the accept backlog.
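To make the backlog concrete, here is a minimal Python sketch. Python is only a neutral illustration: Tomcat's connector is Java, where the same knob is the second argument of `java.net.ServerSocket(int port, int backlog)`. The value 10 is the default Fernando found in the Tomcat 4 source and may not match Tomcat 3.2.2; the helper names are hypothetical. Clients connect while the server has not yet called accept(), so their connections wait in the kernel's accept queue until they are drained.

```python
import socket

# Assumed value: the default Fernando reports from the Tomcat 4 source,
# used here for illustration only.
BACKLOG = 10


def open_listener(backlog: int = BACKLOG) -> socket.socket:
    """Bind an ephemeral localhost port with an explicit accept backlog.

    The backlog bounds how many established connections the kernel holds
    before accept() picks them up. Connections that arrive while the queue
    is full are refused or dropped, depending on the OS.
    """
    return socket.create_server(("127.0.0.1", 0), backlog=backlog)


def queued_then_accepted(n_clients: int, backlog: int = BACKLOG) -> int:
    """Connect n_clients without accepting, then drain the accept queue."""
    with open_listener(backlog) as listener:
        port = listener.getsockname()[1]
        # These connects complete in the kernel; nobody has called accept() yet.
        clients = [socket.create_connection(("127.0.0.1", port), timeout=5)
                   for _ in range(n_clients)]
        accepted = []
        try:
            for _ in range(n_clients):
                conn, _ = listener.accept()
                accepted.append(conn)
            return len(accepted)
        finally:
            for s in accepted + clients:
                s.close()
```

`queued_then_accepted(5)` returns 5, because five pending connections fit into a backlog of 10. If more clients arrive than the queue can hold before accept() catches up, the extra ones are refused or dropped, depending on the OS.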
The other possibility is that you're running out of file descriptors. I'm not
sure how you increase the per-process limit in Win2K ...
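On Linux, every socket consumes a file descriptor, and the per-process cap is RLIMIT_NOFILE (the soft value is what `ulimit -Sn` shows). A minimal Python sketch for POSIX systems only; it does not answer the Win2K question, and the helper names are hypothetical:

```python
import resource  # POSIX only: this module does not exist on Windows


def nofile_limits() -> tuple:
    """Return the (soft, hard) per-process open-descriptor limits."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def try_raise_soft_limit(wanted: int) -> int:
    """Raise the soft RLIMIT_NOFILE towards `wanted`, capped at the hard limit.

    An unprivileged process may raise its soft limit up to the hard limit.
    Returns the soft limit in effect afterwards; never lowers it.
    """
    soft, hard = nofile_limits()
    if soft == resource.RLIM_INFINITY:
        return soft                      # already unlimited
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)       # cannot exceed the hard limit
    if wanted <= soft:
        return soft                      # nothing to do
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    except (ValueError, OSError):
        pass                             # e.g. the OS caps descriptors lower
    return nofile_limits()[0]
```

A server expecting a few hundred concurrent connections could call `try_raise_soft_limit(1024)` (an arbitrary example value) at startup and check the value it returns.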
The CLOSE_WAIT sockets may or may not be a problem ... a socket in CLOSE_WAIT
is one whose peer has already closed its end of the connection, and it is now
waiting for the local application to close its own end. If that application
never closes its end, these sockets will hang around in that state for a long
time (TCP itself has no timeout for CLOSE_WAIT).
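A minimal Python sketch of that situation: the client closes first, so the server-side socket enters CLOSE_WAIT, and it only leaves that state once the server calls close(). This shows general TCP behaviour, not Tomcat's or mod_jk's code; the helper names are hypothetical.

```python
import socket


def drain_and_close(conn: socket.socket) -> int:
    """Read until the peer closes, then close our end; return bytes read.

    A recv() that returns b"" means the peer has sent FIN. From then on
    our socket sits in CLOSE_WAIT until this process calls close().
    """
    total = 0
    try:
        while True:
            chunk = conn.recv(4096)
            if not chunk:        # peer closed its end
                return total
            total += len(chunk)
    finally:
        conn.close()             # sends our FIN: CLOSE_WAIT -> LAST_ACK


def close_wait_demo() -> int:
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        client = socket.create_connection(("127.0.0.1", port))
        conn, _ = listener.accept()
        client.sendall(b"hello")
        client.close()           # client sends FIN; conn enters CLOSE_WAIT
        return drain_and_close(conn)
```

`close_wait_demo()` returns 5, the bytes read before the peer's FIN. Without the `conn.close()` in the `finally` block, the server socket would stay in CLOSE_WAIT for as long as the process keeps it open.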
Hope this helps ...
- Fernando
Hi all,
We have the following configuration:
- Windows 2000 Service Pack 1 (machines with 2, 4 and 8 processors)
- Apache 1.3.19 with mod_ssl 2.8.3
- Tomcat 3.2.2 (Apache and Tomcat are talking AJP13)
- A load balancer configured with 3 Tomcat workers
- Load-generating test clients implemented using HttpUnit 1.2.4 + JSSE 1.0.2
Under heavy load (from 50 up to 200 concurrent requests) we observe
non-deterministic TCP/socket problems.
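One way to check whether the connect failures happen at the TCP level, independent of Apache and HttpUnit, is to fire a burst of raw connections straight at the AJP13 port and count the outcomes. A minimal Python sketch with hypothetical helper names; the 5-second timeout and one thread per attempt are assumptions chosen for simplicity:

```python
import socket
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def try_connect(host: str, port: int, timeout: float = 5.0) -> str:
    """Open and immediately close one TCP connection; classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:   # "connection refused", how errno 61 is read here
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error {exc.errno}"


def connect_burst(host: str, port: int, n: int) -> Counter:
    """Fire n connection attempts at once and tally the outcomes."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return Counter(pool.map(lambda _: try_connect(host, port), range(n)))
```

For example, `connect_burst("localhost", 8009, 200)`, where 8009 is assumed to be the AJP13 port. A nonzero "refused" count would be consistent with an overflowing accept backlog. A bare connect is not an AJP13 request, so this only exercises the listening socket.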
In almost every case, the only place where we see any kind of exception is
the mod_jk log file:
...
[jk_connect.c (143)]: jk_open_socket, connect() failed errno = 61
[jk_ajp13_worker.c (173)]: In jk_endpoint_t::connect_to_tomcat, failed errno = 61
[jk_ajp13_worker.c (584)]: Error connecting to the Tomcat process.
[jk_ajp13_worker.c (203)]: connection_tcp_get_message: Error - jk_tcp_socket_recvfull failed
[jk_ajp13_worker.c (619)]: Error reading request
...
[jk_ajp13_worker.c (271)]: read_into_msg_buff: Error - read_fully_from_server failed
[jk_lb_worker.c (349)]: In jk_endpoint_t::service, none recoverable error...
...
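Errno 61 is not "connection refused" on Linux (ECONNREFUSED is 111 there), but it fits Windows: Winsock error codes are the BSD errno values offset by WSABASEERR (10000), and WSAECONNREFUSED is 10061. A hedged Python sketch of the decoding, assuming (the log itself does not confirm it) that mod_jk on Win32 prints the Winsock error minus WSABASEERR. The table is a small hand-picked subset of winsock2.h, and `decode_jk_errno` is a hypothetical helper:

```python
# Assumption: on Win32, mod_jk logs the Winsock error code minus
# WSABASEERR, so "errno = 61" would correspond to Winsock error 10061.
WSABASEERR = 10000

# A few Winsock error codes relevant to this thread (values from winsock2.h).
WINSOCK_ERRORS = {
    10024: "WSAEMFILE (too many open sockets)",
    10053: "WSAECONNABORTED (connection aborted)",
    10054: "WSAECONNRESET (connection reset by peer)",
    10055: "WSAENOBUFS (no buffer space available)",
    10060: "WSAETIMEDOUT (connection timed out)",
    10061: "WSAECONNREFUSED (connection refused)",
}


def decode_jk_errno(jk_errno: int) -> str:
    """Map an errno printed by mod_jk on Windows to a Winsock error name."""
    code = WSABASEERR + jk_errno
    return WINSOCK_ERRORS.get(code, f"unknown Winsock error {code}")
```

Under that assumption, `decode_jk_errno(61)` gives "WSAECONNREFUSED (connection refused)", matching Fernando's reading, while a descriptor shortage would show up as errno 24 (WSAEMFILE) instead.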
Judging from the exceptions thrown by HttpUnit, sometimes the socket cannot
connect at all and sometimes the response cannot be retrieved completely.
Most errors occur in the early startup phase of our load test.
Using netstat, we observe a *lot* of sockets in the CLOSE_WAIT state connected
to the AJP13 port.
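Because a CLOSE_WAIT socket belongs to the side that has not yet closed, it helps to know which end of the AJP13 connection the stuck sockets are on. This Python sketch tallies `netstat -an` lines by state and by whether the AJP13 port is the local or the remote end. It assumes 8009 as the AJP13 port (the usual Tomcat default; use whatever your Ajp13 connector is configured with) and that it runs on the Tomcat machine, so a local port of 8009 means Tomcat's end. It is a heuristic for the Windows and Linux `netstat -an` layouts, not a complete parser, and the names are hypothetical.

```python
from collections import Counter

# Assumption: 8009 is the AJP13 port; substitute your connector's port.
AJP13_PORT = 8009


def _port(address: str) -> str:
    """Return the port part of 'host:port' (also works for '[::]:8009')."""
    return address.rsplit(":", 1)[-1]


def count_ajp_states(netstat_text: str, port: int = AJP13_PORT) -> Counter:
    """Count TCP states of sockets touching `port` in `netstat -an` output.

    Keys are (side, state): "local" means `port` is this socket's own end
    (Tomcat's side when run on the Tomcat host), "remote" means it is the
    peer's end (the mod_jk side).
    """
    wanted = str(port)
    counts = Counter()
    for line in netstat_text.splitlines():
        tokens = line.split()
        if len(tokens) < 4 or not tokens[0].lower().startswith("tcp"):
            continue
        # Address columns are the tokens containing ':' (Windows puts them
        # right after the protocol, Linux after the queue sizes).
        addrs = [t for t in tokens[1:-1] if ":" in t]
        if len(addrs) != 2:
            continue
        local, remote = addrs
        state = tokens[-1]
        if _port(local) == wanted:
            counts[("local", state)] += 1
        elif _port(remote) == wanted:
            counts[("remote", state)] += 1
    return counts
```

For example, pass it `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`. Many `("local", "CLOSE_WAIT")` entries would point at Tomcat not closing its side; many `("remote", "CLOSE_WAIT")` entries would point at the mod_jk side.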
The settings we use in our Apache configuration are as follows:
...
Timeout 300
KeepAlive On
MaxKeepAliveRequests 500
KeepAliveTimeout 300
MaxRequestsPerChild 0
ThreadsPerChild 500
...
I would appreciate hearing from anyone who has gone through the same kind of
problems, or any suggestion that may help to solve them. In particular, if
someone has an equivalent Linux setup, I would appreciate it if she/he could
tell us about her/his experience.
Kind regards,
Norbert Klose