Am 20.01.2015 um 10:15 schrieb Rainer Jung:
Am 20.01.2015 um 08:45 schrieb Ruediger Pluem:
On 01/19/2015 11:40 PM, Rainer Jung wrote:
I noticed a hanging child process on our ASF server aurora.
It currently uses 2.4.11 (plus the post tag commit) and event MPM.
Most processes exiting due to MaxConnectionsPerChild get cleaned up
after some time but this one doesn't. It now hangs
for more than an hour. I'll let it hang. In case anyone has a good
question I can answer with gdb let me know.

It shows a strange connection view in the server status table:

PID     Connections     Threads Async connections
total   accepting       busy    idle    writing keep-alive      closing
93557   1       yes     0       0       0       0       0

So it has 1 connection, but 0s in all other columns.

The connection can be seen by lsof:

  FD     TYPE             DEVICE   SIZE/OFF   NODE NAME
txt     VREG     183,3400335528   36497117 275235
/x1/www/archive.apache.org/dist/cordova/cordova-3.4.0-src.zip
   9u    PIPE 0xfffffe061ecfab60      16384        ->0xfffffe061ecfacb8
  10u    PIPE 0xfffffe061ecfacb8          0        ->0xfffffe061ecfab60
  24u  KQUEUE 0xfffffe033071be00                   count=0, state=0x2
  41u    IPv4 0xfffffe01316243d0        0t0    TCP
127.0.0.1:35849->127.0.0.1:8050 (CLOSE_WAIT)
  83u    IPv4 0xfffffe0255d08b70        0t0    TCP
127.0.0.1:52023->127.0.0.1:8050 (CLOSE_WAIT)
108u    IPv4 0xfffffe09990eeb70        0t0    TCP
127.0.0.1:22532->127.0.0.1:8050 (CLOSE_WAIT)

This is the established connectioN:

110u    IPv4 0xfffffe0255ab4b70        0t0    TCP
192.87.106.229:http->179.206.174.192:65496 (ESTABLISHED)

And this is likely the file being served on that connection:

126r    VREG     183,3400335528   36497117 275235
/x1/www/archive.apache.org/dist/cordova/cordova-3.4.0-src.zip
156u    IPv4 0xfffffe048d0ff3d0        0t0    TCP
127.0.0.1:26685->127.0.0.1:8050 (CLOSE_WAIT)
229u    IPv4 0xfffffe0131d013d0        0t0    TCP
127.0.0.1:31538->127.0.0.1:8050 (CLOSE_WAIT)

netstat shows:

Proto Recv-Q Send-Q Local Address          Foreign Address
(state)
tcp4       0  87650 192.87.106.229.80      179.206.174.192.65496
ESTABLISHED

so there's 87650 bytes in the send-q. Most lilely the client hans't
acked what we send.

Isn't it weird that the connection remains in this state for an hour?
I would guess the OS tries to resent whats in the
buffer and if doesn't get  ACK'ed it would somehow timeout the TCP
connection and assume the peer is gone.

I was misguided by the server-status table. I have another child process
now with one open connection, all zeroes in the other server status
columns and for that I checked network and process activity. And indeed
ktrace and tcpdump both show, that the process still does send some data
every few seconds. It just takes very long for the big file served and
the low rate of transfer. Thanks for putting me on the right track. I
will check for a few more "hanging" children during the next days
whether they actually still serve files.

So it all boils down to why the connection isn't shown in any of the
state columns in the server status table. Plus it could be interesting
to limit the time a closing child still serves requests, because it
keeps blocking a scoreboard entry and aborting one or two requests every
now and them might be better than running out of children.

And now another hour later, the same process actually shows the connection in the "Async connection - writing" column. Even more confusing. I wonder in which state the connection was in between.

Regards,

Rainer

Reply via email to