Hi Robert,
Sounds like there's a bug somewhere. Before we start trying to track it
[...]
So, with that introduction, we're interested in resolving:
Quite comprehensive indeed; thank you for all that information. I was
not aware that the various parts of the abstractions were decoupled, but
now that I think of it, it's more or less logical, I guess.
The first is the easiest to resolve, as all we need to do is see whether
[...]
the file descriptor numbers being returned to see whether, perhaps, that
number only goes up over time, and gets really big.
My personal feeling is that it's a race condition; I can't say why, but it
feels that way. Maybe it's because the number of affected connections is
so small compared to the large number of connections being made.
As far as I can see, I do not leak file descriptors. I can send you the
information you asked for (netstat, sockstat, fstat, etc.) off-list if you
like, or, if you prefer, I can give you access to the machine. Please let
me know which you prefer.
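
As a sketch only: one way to check from inside the process whether the
number of open descriptors creeps up over time is to probe each slot
with fcntl(). The helper name count_open_fds and its max_fd bound are
hypothetical, not taken from either application.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Count the descriptors currently open in this process in the range
 * [0, max_fd). fcntl(fd, F_GETFD) fails with EBADF for a slot that is
 * not open, so a successful call means the descriptor is in use.
 */
int count_open_fds(int max_fd)
{
    int open_count = 0;

    for (int fd = 0; fd < max_fd; fd++) {
        if (fcntl(fd, F_GETFD) != -1)
            open_count++;
    }
    return open_count;
}
```

Logging this value periodically from the main loop should show whether
the count stays flat or only ever grows.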
I'd like to reiterate that at this moment I'm not at all sure whether it's
my code or kernel code. However, I feel I've seen enough information to
reasonably suspect that it _might_ be something outside my code :).
wedged-up state. It would be most helpful if you could actually shut
down to single-user mode, killing all user processes, then waiting ten
minutes, and capturing the output of those above commands to files that
you can then e-mail to me.
Because it's a live machine, that would be very difficult. If you really
need it that way and we can't find another way, I can announce
maintenance and do it in the middle of the night :).
Without accusing you of having buggy code, I should say that I think
there's a reasonable chance that what you're seeing is an interaction
between an existing leak of resources in the application and the way the
kernel state management has changed. The output from netstat pretty
Yes, that was the first thing I thought of as well. However, one of the
two applications in particular is so simple that I would be ashamed to
death if I still had a bug in there :). If it turns out that way:
sssstttt ;).
precisely matches what you'd expect: lots of TCP connections in the
CLOSED state reflecting a series of connections built by an application
but then not properly discarded. Likewise, when the application is
killed, all of the connections go away -- most likely because the file
descriptors are all closed, allowing them to be garbage collected and
connection state freed. If it is this sort of bug, then most likely
you're missing a call to close() in a work loop somewhere, and in some
exceptional case, you fall out of the loop without calling close().
I will double-check this once more, but honestly, I strongly doubt it...
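
For the double-check, this is the shape I would compare the work loops
against: a minimal sketch in which every way out of the loop passes
through the same close(). The function serve_and_close is hypothetical
(a plain echo handler), and MSG_NOSIGNAL is assumed to be available on
the target system.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical per-connection handler: echo data back until the peer
 * closes or an error occurs. Normal and exceptional exits all reach
 * the single close() at the bottom, so the descriptor cannot leak.
 * Returns 0 on a clean shutdown, -1 if any call failed.
 */
int serve_and_close(int fd)
{
    char buf[512];
    int rc = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n == 0)
            break;                      /* peer closed its side */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry */
            rc = -1;
            break;                      /* error: still falls to close() */
        }

        ssize_t off = 0;
        while (off < n) {
            ssize_t w = send(fd, buf + off, (size_t)(n - off), MSG_NOSIGNAL);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                rc = -1;
                goto done;              /* error: jump to the shared exit */
            }
            off += w;
        }
    }

done:
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```

Any early return inside the loop that skips the shared exit would be the
kind of missing close() described above.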
One other thing I've noticed is that it's always the input
buffer that has bytes left, never the output buffer...
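
To see this from the application side, the number of unread bytes on a
socket can be queried just before it is closed. A minimal sketch,
assuming FIONREAD is supported for the socket type in use; the helper
name unread_bytes is hypothetical.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>

/*
 * Return the number of bytes waiting in the socket's receive buffer,
 * or -1 if the ioctl fails. FIONREAD stores the count in an int.
 */
int unread_bytes(int fd)
{
    int pending = 0;

    if (ioctl(fd, FIONREAD, &pending) == -1)
        return -1;
    return pending;
}
```

Logging a non-zero result together with the descriptor number at close
time would show whether the application is closing connections that
still have input queued.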
Moreover, I've seen close() report EBADF, but due to the sheer number of
connections I cannot say for certain that that's when the connection
goes into the CLOSED state. The IPs do match, but it's also very common
for the same IPs to make numerous connections.
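
To pin this down, each close() failure could be logged with the
descriptor number and the call site, so it can be matched against the
connection state later. A minimal sketch; checked_close and its "where"
tag are hypothetical names, not something already in my code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Close fd and report any failure on stderr with the descriptor number
 * and a caller-supplied tag. Returns 0 on success, otherwise the errno
 * value from close(). EBADF here means fd was not an open descriptor,
 * for example because it had already been closed elsewhere.
 */
int checked_close(int fd, const char *where)
{
    if (close(fd) == 0)
        return 0;

    int err = errno;    /* save before fprintf can overwrite it */
    fprintf(stderr, "close(%d) failed at %s: %s\n", fd, where, strerror(err));
    return err;
}
```

If the EBADF cases turn out to be double closes, that would fit a race:
in a threaded program a second close() can hit a descriptor number that
has meanwhile been reused for another connection.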
Kind Regards,
Ali
--
Transip BV | http://www.transip.nl/
We never let you down.