On Thu, Dec 13, 2007 at 12:35:03PM +0100, Otto Moerbeek wrote:
:
> > > or it might be a program "forgetting" to do a close.
> > 
> > Does select() notify the application of FIN from the other side?
> > 
> > If not, that would explain things, it wouldn't be reasonable for
> > httpd to manually try and receive from all sockets in keepalive
> > to see whether it needs to close the socket, since it will only
> > wait KeepAliveTimeout (default 15s) before it closes them anyway.
> 
> Nice suggestion, but if you've marked the fd for read I would expect
> select to notify if the other side does a shutdown(SHUT_WR).
> 
> Other scenarios are also thinkable: like the server socket being
> blocked because of outgoing data that cannot be written out. That
> might prevent the server from doing a close too. But in the end the
> close will happen, otherwise you would run out of fd's very soon.
> 
>       -Otto

The behaviour is starting to make sense now. Scenario:

* The client connects to the server, sends its request and
  then half-closes the socket with shutdown(SHUT_WR).
  It can still read the reply.
* The server accepts the connection, reads the request,
  and may or may not notice that the client has done
  a shutdown() - it is not important. Either way the
  server cannot close the socket since it has a
  reply to deliver. And the server host TCP stack
  has noticed the shutdown(), so the socket is already
  in CLOSE_WAIT.
* The server starts sending the reply, which may be large,
  e.g. a file download. In the middle of this transfer
  the client's ethernet cable gets unplugged, the
  client host gets powered off, a firewall in the
  path goes bananas or whatnot.
* The server is now stuck in a write() call since the
  server host TCP stack has to wait quite a while
  to be sure the connection is really dead.
  And the state is still CLOSE_WAIT.
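
To make the first two steps concrete, here is a minimal sketch in
Python (not httpd's code) of a client doing shutdown(SHUT_WR) over a
loopback TCP connection while the server waits in select(). The
function name, the port selection and the 5 second select() timeout
are my own assumptions for illustration.

```python
import select
import socket


def half_close_demo():
    """Client half-closes; server sees EOF via select() and still replies."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()

    # Client: send the request, then half-close the sending direction.
    client.sendall(b"GET / HTTP/1.0\r\n\r\n")
    client.shutdown(socket.SHUT_WR)

    # Server: wait for readability; an empty read means the peer's FIN arrived.
    request = b""
    saw_eof = False
    while not saw_eof:
        readable, _, _ = select.select([server], [], [], 5.0)
        if not readable:
            break  # nothing arrived within the timeout
        chunk = server.recv(4096)
        if chunk == b"":
            saw_eof = True
        else:
            request += chunk

    # The server side can still write after the peer's half-close.
    server.sendall(b"reply")
    server.close()

    # Client: the receiving direction is still open, so the reply arrives.
    reply = b""
    while True:
        chunk = client.recv(4096)
        if not chunk:
            break
        reply += chunk
    client.close()
    return request, saw_eof, reply
```

Between the client's shutdown() and the server's close(), the server
side of this connection is the one sitting in CLOSE_WAIT.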

If the client program merely died, the client host TCP
stack would close the socket and tell the server host
TCP stack, which would make the hanging write() call fail.
So it takes a harder error, such as a network
outage or power outage, to induce this problem.
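
The stuck write can be imitated without pulling any cables. In the
sketch below (Python again, all names and sizes are my own
assumptions) the client half-closes and then simply never reads,
which stands in for an unreachable client. This is not the same as a
real outage, where the stack retransmits until it gives up, but the
effect on the writer is similar: the send buffers fill and the write
stops making progress. A socket timeout is used only so the demo
returns instead of hanging.

```python
import socket


def stalled_write_demo(timeout=0.5, chunk=b"x" * 65536, limit=1 << 30):
    """Write to a half-closed peer that never reads, until the send stalls."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()

    # Client half-closes and then goes silent (it never calls recv()).
    client.shutdown(socket.SHUT_WR)

    # Without this timeout the send() below would block indefinitely.
    server.settimeout(timeout)
    sent = 0
    stalled = False
    try:
        while sent < limit:  # safety bound so the loop cannot run forever
            sent += server.send(chunk)
    except socket.timeout:
        stalled = True  # buffers are full and nothing drained in time
    finally:
        server.close()
        client.close()
    return sent, stalled
```

Calling stalled_write_demo() should return the number of bytes the
stack accepted before stalling, plus a flag saying the write timed
out. How many bytes that is depends on the host's buffer sizes.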

If this scenario is correct, there is nothing to do
about it, except decreasing the likelihood of the
server socket being half-closed while sending
the reply. Having KeepAliveTimeout in
httpd.conf at its default (15) or slightly lower
seems to do the trick, though I do not know why.

If there is some quirk in httpd's implementation
of the KeepAliveTimeout that makes it miss
the half-close and keep the socket open for the whole
KeepAliveTimeout, that would explain it.

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
