RE: Socket problems under heavy load on Win2K

2001-06-29 Thread GOMEZ Henri

>   The description is right, but the implications are wrong.  Due to
>how TCP/IP works, these sockets will be closed by the OS within a few
>minutes.  I know on some UNIX systems you can set this at run time
>(e.g. Solaris) and on some it's a compile-time directive (e.g. FreeBSD).
>I don't know about Windows, but they seem to be closed in about 5 minutes;
>all defaults that I've seen under Unix are between 2 and 5 minutes.

FYI, they stay FOREVER on AS/400!




RE: Socket problems under heavy load on Win2K

2001-06-29 Thread Randy Layman



> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Friday, June 29, 2001 2:06 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Socket problems under heavy load on Win2K
> 
> 
> 
> The CLOSE_WAIT sockets may or may not be a problem ... one end of the
> connection has been closed, and now you're waiting for the other end to
> finish and close.  If someone is not closing their end, these sockets
> would hang around in that state a long time.
> 

The description is right, but the implications are wrong.  Due to
how TCP/IP works, these sockets will be closed by the OS within a few
minutes.  I know on some UNIX systems you can set this at run time (e.g.
Solaris) and on some it's a compile-time directive (e.g. FreeBSD).  I don't
know about Windows, but they seem to be closed in about 5 minutes; all
defaults that I've seen under Unix are between 2 and 5 minutes.

The purpose of this is to keep the same remote computer (using the
same port) from connecting to the same server port for a different
connection while there might still be some lost packets from the old one
floating through the net.  It should be harmless.  Under high load these
will probably stay around a little longer, since it's not really important
to close these sockets.
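
If it helps to picture it, here is a tiny Java sketch of that reservation.
This has nothing to do with the Tomcat code; it assumes something is
listening on localhost:8009 and pins the client to an arbitrary local port
(40000) so the second connect tries to reuse the exact same address pair
while the first connection is still being timed out:

import java.net.BindException;
import java.net.Socket;

// Sketch only.  Assumes a server is listening on localhost:8009.
public class TimeWaitSketch {
    public static void main(String[] args) throws Exception {
        Socket first = new Socket("localhost", 8009, null, 40000);
        first.close();   // this end lingers for a while (eventually TIME_WAIT)

        try {
            // Same local port, same server port -> typically refused by the
            // stack while the old connection is still being timed out.
            Socket second = new Socket("localhost", 8009, null, 40000);
            second.close();
        } catch (BindException e) {
            System.out.println("local port still reserved: " + e.getMessage());
        }
    }
}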

Randy

PS I kind of feel sorry for Petra Hora.  If they send a message in the
future, no one will read it, since everyone is filtering their messages to
the trash.



Re: Socket problems under heavy load on Win2K

2001-06-29 Thread Fernando_Salazar


I think 61 is "connection refused".  My guess is that the accept() queue
length is too small.  Looking through the Tomcat 4 source, it looks like the
default accept backlog is 10.  Possibly there's a way to increase it?
You could always rebuild with a larger default.  But if you are throwing a
lot of simultaneous connections at Tomcat, you may overflow the accept
backlog.
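
For what it's worth, in plain Java the backlog is just the third argument
to the ServerSocket constructor.  The sketch below is not the Tomcat code,
and the port and backlog values are arbitrary:

import java.net.ServerSocket;
import java.net.Socket;

// Sketch only -- not the Tomcat source.  The third constructor argument
// is the accept backlog: how many completed connections the OS will
// queue for accept() before refusing new ones.
public class BacklogSketch {
    public static void main(String[] args) throws Exception {
        int port = 8009;     // AJP13 port, used here purely as an example
        int backlog = 100;   // larger queue to survive connection bursts
        ServerSocket server = new ServerSocket(port, backlog);
        while (true) {
            Socket client = server.accept();
            // hand the connection off to a worker thread here ...
            client.close();
        }
    }
}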

The other possibility is that you're running out of file descriptors.  Not
sure how you increase the number allowed per process in Win2K ...

The CLOSE_WAIT sockets may or may not be a problem ... one end of the
connection has been closed, and now you're waiting for the other end to
finish and close.  If someone is not closing their end, these sockets would
hang around in that state a long time.
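
In plain Java terms (again just a sketch, not Tomcat or mod_jk code), the
CLOSE_WAIT build-up is what you get when the peer closes but this side
never does:

import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

// Sketch only.  Once read() returns -1 the peer has closed its end, and
// this socket sits in CLOSE_WAIT until we call close() (or the process
// exits).  Closing in a finally block keeps these from piling up.
public class CloseWaitSketch {
    static void handle(Socket s) throws IOException {
        try {
            InputStream in = s.getInputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                // ... process the data ...
            }
        } finally {
            s.close();   // forgetting this is what leaves CLOSE_WAIT around
        }
    }
}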

Hope this helps ...

- Fernando




Hi all,

we have the following configuration:
- Windows 2000 (Service Pack 1) (2, 4 and 8 processors)
- Apache 1.3.19 with mod_ssl 2.8.3
- Tomcat 3.2.2 (Apache and Tomcat are talking AJP13)
- We have a load balancer configured with 3 Tomcat workers
- Our load generating test clients are implemented using HttpUnit 1.2.4 +
JSSE 1.0.2

On heavy load (starting from 50 concurrent requests up to 200 concurrent
requests) we observe non-deterministic TCP/socket problems.
It seems that in almost every case, the only place where we can see some
kind of exception is the mod_jk log file:

...
[jk_connect.c (143)]: jk_open_socket, connect() failed errno = 61
[jk_ajp13_worker.c (173)]: In jk_endpoint_t::connect_to_tomcat, failed
errno = 61
[jk_ajp13_worker.c (584)]: Error connecting to the Tomcat process.
[jk_ajp13_worker.c (203)]: connection_tcp_get_message: Error -
jk_tcp_socket_recvfull failed
[jk_ajp13_worker.c (619)]: Error reading request
...
[jk_ajp13_worker.c (271)]: read_into_msg_buff: Error -
read_fully_from_server failed
[jk_lb_worker.c (349)]: In jk_endpoint_t::service, none recoverable
error...
...

Analysing the exceptions thrown from HttpUnit, it looks like sometimes the
socket cannot connect at all, and sometimes the response cannot be retrieved
completely.  Most errors occur in the early startup phase of our load test.
Using netstat we can observe a *lot* of sockets in CLOSE_WAIT state
connected to the AJP13 port.
The settings we use in our Apache configuration are as follows:

...
Timeout 300
KeepAlive On
MaxKeepAliveRequests 500
KeepAliveTimeout 300
MaxRequestsPerChild 0
ThreadsPerChild 500
...

I would appreciate hearing from anyone who has gone through the same kind
of problems, or who knows of a solution that may help.  If there is someone
with an equivalent Linux setup, I would also appreciate it if she/he could
tell us about her/his experience.

Mit freundlichen Grüßen / Kind regards,
Norbert Klose