On Fri, Jul 25, 2008 at 03:35:41PM -0500, Jason Bratton wrote:
> After posting my email, I finally figured out the problem. For some
> reason, I had to set tcp-listen-queue. I never had it set before, so
> something changed in the code, but yeah, that fixed it. I set both
> tcp-clients and tcp-listen-queue to 1000 and haven't had any problems
> like that since.

That didn't seem to do it for us: the bind instance in question ran for
about 38 hours and then refused to accept tcp connections again. I found
the following error messages in the logs at about the time of the outage:

27-Jul-2008 15:35:12.234 resolver: notice: clients-per-query decreased to 17
27-Jul-2008 15:35:34.440 general: error: socket.c:1996: unexpected error:
27-Jul-2008 15:35:34.440 general: error: internal_accept: fcntl() failed: Too many open files
27-Jul-2008 15:35:34.452 general: error: socket.c:1996: unexpected error:
27-Jul-2008 15:35:34.452 general: error: internal_accept: fcntl() failed: Too many open files

Since there seem to be several messages on the list already about the
file handle limit being exceeded, I presume it's the same problem that
other people are having with the 9.5.0-P1 patch.

Does anybody have any suggestions as to what specifically I should be
looking at during the next outage?

> -- Jason
>
> Thomas Jacob wrote:
> > Hello list,
> >
> > We're having problems with the -P1 version: some time after
> > starting the server (could be minutes or hours), the tcp request
> > handler seems to get stuck, and all (or almost all) new requests
> > get stuck in the SYN_RECV tcp state. We haven't found out what
> > exactly triggers this yet, could be load, could be specific
> > types of queries.
> >
> > This seems to be the same problem as described
> > in the following post by Jason Bratton:
> >
> > http://marc.info/?l=bind-users&m=121628960603391&w=2
> >
> > The main difference should be that we're running
> > the version of bind that comes with Ubuntu 8.04 LTS x86_64,
> > and the problems happen when upgrading from
> > version bind9_9.4.2-10 to bind9_9.4.2-10ubuntu0.1; a diff
> > between these two shows the exact same -P1 patch as in the upstream
> > version.
> >
> > Our tcp related settings:
> >
> > transfers-out 100;
> > transfers-per-ns 100;
> > tcp-clients 5000;
> > recursive-clients 10000;
> >
> > Is anyone else seeing this?
> > Is this really a bind bug? And if yes, is
> > there a workaround?
> >
> > Regards,
> > Thomas
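
Since the log shows fcntl() failing with "Too many open files", one thing
that could be checked at the next outage is how close the named process is
to its file descriptor limit. Below is a minimal sketch of such a check,
assuming a Linux host where /proc/<pid>/limits and /proc/<pid>/fd exist and
are readable by the user running the script. The function names are made up
for illustration and are not part of BIND.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    None stands for 'unlimited'. Raises ValueError if the row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            fields = line.split()
            soft, hard = fields[3], fields[4]
            return tuple(None if v == "unlimited" else int(v)
                         for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")


def count_open_fds(pid):
    """Count the descriptors currently open in the process (Linux /proc only)."""
    return len(os.listdir("/proc/%d/fd" % pid))


def describe_usage(open_fds, soft_limit):
    """Format descriptor usage against the soft limit as one line of text."""
    if soft_limit is None:
        return "%d open descriptors, no soft limit" % open_fds
    percent = 100.0 * open_fds / soft_limit
    return "%d of %d descriptors in use (%.0f%%)" % (open_fds, soft_limit, percent)


def report(pid):
    """Read the limit and the current descriptor count for pid and describe them."""
    with open("/proc/%d/limits" % pid) as f:
        soft, _hard = parse_max_open_files(f.read())
    return describe_usage(count_open_fds(pid), soft)
```

Calling report() with the pid of named every minute or so and logging the
result would show whether the descriptor count climbs steadily towards the
limit before the outage or jumps suddenly.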
