On Wed, 4 Feb 2015, Sverre Froyen wrote:

I'd also look at the open descriptors of the named process (although they
should be closed at this time, since TIME_WAIT means closed on this side,
and waiting for the 4 minutes to expire before killing the connection)...

Also I'd record that information every minute or so to see how many
connections are added and how many are going away.
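
For the per-minute recording, a small helper along these lines could do the counting. This is only a sketch: count_tcp_state is a made-up name, and it assumes the saved text has one connection per line with the TCP state in the last column, as in 'netstat -an' output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the lines of netstat-style text whose last whitespace-separated
 * field equals `state` (for example "TIME_WAIT").
 * Hypothetical helper: it assumes the state is the final column.
 */
size_t
count_tcp_state(const char *text, const char *state)
{
	size_t count = 0;
	size_t state_len;

	assert(text != NULL && state != NULL);
	state_len = strlen(state);

	while (*text != '\0') {
		/* Find the end of the current line. */
		size_t line_len = strcspn(text, "\n");
		const char *end = text + line_len;
		const char *field;

		/* Skip trailing blanks, then walk back to the field start. */
		while (end > text &&
		    (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '\r'))
			end--;
		field = end;
		while (field > text && field[-1] != ' ' && field[-1] != '\t')
			field--;

		/* Compare the last field with the requested state. */
		if ((size_t)(end - field) == state_len &&
		    strncmp(field, state, state_len) == 0)
			count++;

		/* Advance past the line and its newline, if any. */
		text += line_len;
		if (*text == '\n')
			text++;
	}
	return count;
}
```

Calling it once a minute on fresh output and logging the TIME_WAIT count next to a timestamp would show whether the number only grows or whether entries also go away.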

Perhaps there is some bug triggered in the tcp stack and somehow connections
are not being GC’ed?

This is vaguely similar to a problem I have seen from time to time. On my 
servers, it is usually port 80 that gets attacked. Someone opens TCP 
connections to this port on the server, sends no request, and leaves the 
connection open indefinitely. See 
http://mail-index.netbsd.org/netbsd-users/2011/01/04/msg007484.html

When I test such a scenario to port 53 (using telnet), the connection shows as 
ESTABLISHED for 30 seconds. Then, presumably, named times out and closes the 
connection. At this point netstat shows the connection as TIME_WAIT for another 
10 seconds. After that it disappears.

If I disable the network connection during the 30 second period before named 
times out, however, I instead observe the connection in the FIN_WAIT_1 state for 
another 30 minutes or so.

This is on netbsd-6. I notice that your netstat output has the client and 
server columns in the reverse order from what I see. Could it be that in your 
netstat output, FIN_WAIT_1 is reported as TIME_WAIT?

Regards,
Sverre

The two problems are not identical. In my case, the connections really are in the TIME_WAIT state. Christos has also found that the 2MSL timer of each connection is negative. If this value is negative, the connection should already have been removed.

The callout code in kern_timeout.c counts a callout as late when delta is negative:

                if (delta < 0)
                        cc->cc_ev_late.ev_count++;
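
As far as I understand it, delta is the signed distance between a callout's expiry tick and the current tick, so a negative value means the expiry time has already passed. The following is only a sketch of that idea, not code from the kernel: ticks_until and callout_is_late are names I made up, and the tick values are plain ints.

```c
#include <assert.h>
#include <limits.h>

/* The conversion below assumes unsigned int has exactly one more value bit than int. */
static_assert(UINT_MAX / 2 == INT_MAX, "unexpected int/unsigned int widths");

/*
 * Signed distance from the current tick to a callout's expiry tick.
 * Hypothetical helper for illustration; not taken from kern_timeout.c.
 * The subtraction is done on unsigned values so that it stays well
 * defined when the tick counter wraps around.
 */
int
ticks_until(int expiry, int now)
{
	unsigned int diff = (unsigned int)expiry - (unsigned int)now;

	/* Map the modular difference back into the signed range. */
	if (diff <= (unsigned int)INT_MAX)
		return (int)diff;
	return -(int)(UINT_MAX - diff) - 1;
}

/* A callout is "late" when its expiry tick is already in the past. */
int
callout_is_late(int expiry, int now)
{
	return ticks_until(expiry, now) < 0;
}
```

With an expiry tick of 900 and a current tick of 1000, ticks_until returns -100 and callout_is_late returns 1. A timer in that condition should fire and be cleaned up, which is why connections that stay around with a negative 2MSL value look wrong.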

At the same time, a second problem occurs: expired entries are not deleted from the NDP table. 'ndp -a' still shows the expired entries.

Both problems occur only after several days of uptime. They probably have the same cause.

The problem you described is different.


Regards
Uwe
