Re: Recursive queries fail after bind has been running for a few hours

2012-03-12 Thread Kevin Oberman
On Mon, Mar 12, 2012 at 12:05 PM, Mr X  wrote:
> Hey there
>
> I'm having a bizarre issue with 9.7.3-P3-RedHat-9.7.3-8.P3.el6_2.2 -
> recursive queries stop functioning after bind has been running for a few
> hours. It's a very low volume system (dev), maybe a few queries per hour at
> most. It's not due to cache filling or anything like I've dealt with in the
> past. I suspect it's related to DNSSEC and root-server validation but I
> could use another set of eyes on my debug log. Sorry for posting from an
> inconspicuous e-mail address. My employer asks that I'm careful about the
> information I disclose on public mailing lists.
>
> You can see my debug log during a failed query
> http://pastebin.com/5hh05WjM
>
> Successful query here
> http://pastebin.com/H9qSQcyG
>
> If you would like to see my config, I can include portions, but it's huge so
> please let me know exactly what parts you're looking for.

You are getting timeouts for some reason. The obvious question is
whether the queries are actually being sent or whether they are and
responses are not coming back. Or, perhaps the responses ARE coming back,
but named is not picking them up.

Could you try getting a packet capture? As these are UDP and assuming
Unix, something like 'tcpdump -w badquery.bpf -s0 -p port 53'. This
will capture all DNS traffic to/from this system, but you say it is
not all that much, so it should be tractable.

Once you have captured the data, you can use a tool like wireshark to
look at it.
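
Once a capture exists, the question above (are queries leaving, and are
responses returning?) comes down to pairing each outbound query with an
inbound reply from the same server that carries the same DNS transaction
ID. What follows is only a rough sketch in Python. It assumes the capture
has already been reduced to simple records by some other tool; the Packet
record layout and the unanswered_queries function are hypothetical names
made up for illustration, not something tcpdump or wireshark emits.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical, pre-extracted view of one DNS packet in the capture."""
    ts: float        # seconds since the start of the capture
    outbound: bool   # True = sent by this host, False = received by it
    peer: str        # address of the remote name server
    txid: int        # DNS transaction ID from the packet header


def unanswered_queries(packets, timeout=5.0):
    """Return outbound queries that got no matching reply within `timeout`."""
    pending = {}  # (peer, txid) -> earliest query still waiting for a reply
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        key = (p.peer, p.txid)
        if p.outbound:
            # Keep the first transmission; retries reuse the same key.
            pending.setdefault(key, p)
        else:
            query = pending.get(key)
            # A reply only counts if it arrived in time.
            if query is not None and p.ts - query.ts <= timeout:
                del pending[key]
    return sorted(pending.values(), key=lambda pkt: pkt.ts)
```

An empty result while named still reports timeouts would point toward
responses arriving but not being picked up; a long list would point toward
replies never reaching the host.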
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6...@gmail.com
___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: Recursive queries fail after bind has been running for a few hours

2012-03-12 Thread Lyle Giese
I don't look at debug logs and may be WAY off base.  But the time period 
for the log seems to be about 10 seconds start to finish in the failed 
query.  However line 56 indicates that it timed out the query after 30 
seconds.


That just doesn't add up to me for some reason.  Or is there 20 seconds 
of preceding logs missing when the query started?
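
The arithmetic above can be checked mechanically instead of by eye. This
is a minimal Python sketch, assuming each log line starts with a timestamp
shaped like "12-Mar-2012 15:05:01.123"; that layout, the constants, and
the log_span_seconds name are assumptions for illustration and should be
adjusted to whatever the pasted log really contains.

```python
from datetime import datetime

# Assumed layout of the leading timestamp, e.g. "12-Mar-2012 15:05:01.123".
STAMP_FORMAT = "%d-%b-%Y %H:%M:%S.%f"
STAMP_LENGTH = 24  # number of characters in a timestamp of that layout


def log_span_seconds(lines):
    """Return the seconds between the earliest and latest timestamped line."""
    stamps = []
    for line in lines:
        try:
            stamps.append(datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT))
        except ValueError:
            continue  # line does not begin with a timestamp; ignore it
    if len(stamps) < 2:
        return 0.0
    return (max(stamps) - min(stamps)).total_seconds()
```

If the result is close to 10 while the log itself reports a 30 second
timeout, then roughly 20 seconds of earlier lines are probably missing
from the paste.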


Lyle Giese
LCR Computer Services, Inc.

On 03/12/12 15:05, Mr X wrote:

Hey there

I'm having a bizarre issue with 9.7.3-P3-RedHat-9.7.3-8.P3.el6_2.2 - 
recursive queries stop functioning after bind has been running for a 
few hours. It's a very low volume system (dev), maybe a few queries 
per hour at most. It's not due to cache filling or anything like I've 
dealt with in the past. I suspect it's related to DNSSEC and 
root-server validation but I could use another set of eyes on my debug 
log. Sorry for posting from an inconspicuous e-mail address. My 
employer asks that I'm careful about the information I disclose on 
public mailing lists.


You can see my debug log during a failed query
http://pastebin.com/5hh05WjM

Successful query here
http://pastebin.com/H9qSQcyG

If you would like to see my config, I can include portions, but it's 
huge so please let me know exactly what parts you're looking for.


- Brian




Re: Recursive queries fail after bind has been running for a few hours

2012-03-13 Thread G.W. Haywood

Hi there,

On Mon, Mar 12, 2012 at 12:05 PM, Mr X  wrote:


I'm having a bizarre issue with 9.7.3-P3-RedHat-9.7.3-8.P3.el6_2.2 -
recursive queries stop functioning after bind has been running for a few
hours. It's a very low volume system (dev), maybe a few queries per hour
...


I saw something very similar with versions of 9.7 and I believe 9.8.

I was never able to pin it down, and never collected any evidence that
it was BIND itself that was the problem, but I did have to restart it
on several occasions when recursive queries suddenly started to fail.

Your suspicions are similar to mine although your setup appears not to
be.  I was using self-compiled binaries on a Debian system, but I do
run DNSSEC.  Now that I'm running 9.9 the problem seems to have gone.

Try upgrading?

Is your server also authoritative?

--

73,
Ged.


Re: Recursive queries fail after bind has been running for a few hours

2012-03-13 Thread Mr X
On Mon, Mar 12, 2012 at 3:37 PM, Kevin Oberman  wrote:

> On Mon, Mar 12, 2012 at 12:05 PM, Mr X  wrote:
> > Hey there
> >
> > I'm having a bizarre issue with 9.7.3-P3-RedHat-9.7.3-8.P3.el6_2.2 -
> > recursive queries stop functioning after bind has been running for a few
> > hours. It's a very low volume system (dev), maybe a few queries per hour at
> > most. It's not due to cache filling or anything like I've dealt with in the
> > past. I suspect it's related to DNSSEC and root-server validation but I
> > could use another set of eyes on my debug log. Sorry for posting from an
> > inconspicuous e-mail address. My employer asks that I'm careful about the
> > information I disclose on public mailing lists.
> >
> > You can see my debug log during a failed query
> > http://pastebin.com/5hh05WjM
> >
> > Successful query here
> > http://pastebin.com/H9qSQcyG
> >
> > If you would like to see my config, I can include portions, but it's huge so
> > please let me know exactly what parts you're looking for.
>
> You are getting timeouts for some reason. The obvious question is
> whether the queries are actually being sent or whether they are and
> responses are not coming back. Or, perhaps the responses ARE coming back,
> but named is not picking them up.
>
> Could you try getting a packet capture? As these are UDP and assuming
> Unix, something like 'tcpdump -w badquery.bpf -s0 -p port 53'. This
> will capture all DNS traffic to/from this system, but you say it is
> not all that much, so it should be tractable.
>
> Once you have captured the data, you can use a tool like wireshark to
> look at it.
>


I had to sanitize some data, so the -vvv output of the packet capture is
here:

http://pastebin.com/GKSspL2L

Unfortunately this server is both authoritative and recursive. I have an
upcoming project to segment these two functions, but for now getting this
host operational is my priority. It's also worth mentioning that I have IO
data center nameservers as a forwarder as seen in this packet capture. When
bind is in a failed state I can query against these nameservers directly -
so I had not considered this a potential cause.
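
The comparison described above (the forwarders answer when queried
directly while the local named does not) can be repeated in a scripted
way. The following is a rough Python sketch that builds a plain recursive
query by hand and sends it over UDP; the build_query and probe names, the
fixed transaction ID, and the 3 second timeout are all assumptions chosen
for illustration. It sends no EDNS or DNSSEC options, so it does not
reproduce the queries named itself sends to the forwarders.

```python
import socket
import struct


def build_query(name, txid=0x1234, qtype=1):
    """Build a minimal recursive DNS query (qtype 1 = A record, class IN)."""
    # Header: ID, flags with only RD set, 1 question, 0 other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Encode the name as length-prefixed labels, ended by a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def probe(server, name, timeout=3.0):
    """Send one UDP query; return the RCODE, or None on timeout or error."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            data, _ = sock.recvfrom(4096)
        except OSError:  # includes socket.timeout
            return None
    if len(data) < 12 or data[:2] != query[:2]:
        return None  # truncated reply or transaction ID mismatch
    flags = struct.unpack("!H", data[2:4])[0]
    return flags & 0x000F  # 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN
```

Calling probe() once against 127.0.0.1 and once against a forwarder
address while named is in the failed state gives a side by side result:
None or 2 locally together with 0 from the forwarder would match the
behaviour described above.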

I really appreciate everyone's help.


> --
> R. Kevin Oberman, Network Engineer
> E-mail: kob6...@gmail.com
>