On Thu, Dec 13, 2007 at 12:35:03PM +0100, Otto Moerbeek wrote:
:
> > > or it might be a program "forgetting" to do a close.
> >
> > Does select() notify the application of FIN from the other side?
> >
> > If not, that would explain things, it wouldn't be reasonable for
> > httpd to manually try and receive from all sockets in keepalive
> > to see whether it needs to close the socket, since it will only
> > wait KeepAliveTimeout (default 15s) before it closes them anyway.
>
> Nice suggestion, but if you've marked the fd for read I would expect
> select to notify if the other side does a shutdown(SHUT_WR).
>
> Other scenarios are also thinkable: like the server socket being
> blocked because of outgoing data that cannot be written out. That
> might prevent the server from doing a close too. But in the end the
> close will happen, otherwise you would run out of fd's very soon.
>
> -Otto

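The select() behaviour discussed above can be checked with a small
sketch. This is only an illustration under assumptions: a local
socketpair() stands in for the TCP connection (so there is no literal
FIN, but the end-of-file semantics seen by the reader are the same),
and the function name half_close_demo is made up for this example. It
says nothing about what httpd actually does.

```python
import select
import socket


def half_close_demo():
    # A connected pair of stream sockets stands in for client and server.
    client, server = socket.socketpair()
    try:
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        # Half-close: the client will send nothing more but can still read.
        client.shutdown(socket.SHUT_WR)

        request = b""
        saw_eof = False
        while not saw_eof:
            readable, _, _ = select.select([server], [], [], 1.0)
            if not readable:
                break  # timed out: select() did not report the socket
            chunk = server.recv(4096)
            if chunk:
                request += chunk
            else:
                # Zero-length read: end-of-file (on TCP, the peer's FIN).
                saw_eof = True

        # The server can still deliver its reply on the half-closed socket.
        server.sendall(b"reply")
        reply = client.recv(4096)
        return request, saw_eof, reply
    finally:
        client.close()
        server.close()
```

After the request has been drained, select() reports the server side
as readable and recv() returns zero bytes, so a program that has the
fd marked for read does get to see the half-close, while the reply
still reaches the client.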
The behaviour is starting to make sense now. Scenario:

* The client connects to the server, sends its request and then
  half-closes the socket with shutdown(). It can still read the
  reply.
* The server accepts the connection, reads the request, and may or
  may not notice that the client has done a shutdown(); that is not
  important. Either way the server cannot close the socket, since it
  has a reply to deliver. The server host's TCP stack has noticed the
  shutdown(), so the socket has already entered CLOSE_WAIT.
* The server starts sending the reply, which may be large, e.g. a
  file download. In the middle of this transfer the client's
  Ethernet cable gets unplugged, the client host gets powered off,
  a firewall in the path goes bananas or whatnot.
* The server is now stuck in a write() call, since the server host's
  TCP stack has to wait quite a while to be sure the connection is
  really dead. And the state is still CLOSE_WAIT.

If the client program merely died, the client host's TCP stack would
close the socket and tell the server host's TCP stack, and that
would fail the hanging write() call. So there must be a harder
error, such as a network outage or power outage, to induce this
problem.

If this scenario is correct, there is nothing to do about it except
decreasing the likelihood of the server socket being half-closed
while sending the reply. Having KeepAliveTimeout in httpd.conf at
its default (15) or slightly lower seems to do the trick, but I do
not know how. If there is some quirk in httpd's implementation of
KeepAliveTimeout that makes it not notice the half-close and keep
the socket open for the whole KeepAliveTimeout, that would explain
it.

-- 
/ Raimo Niskanen, Erlang/OTP, Ericsson AB