RE: Bind9.5.1 under no Root Name Servers

Todd Snyder Fri, 17 Jul 2009 06:35:03 -0700

Martin,

It looks like you were relying on an odd mechanism to determine an
outage.  What you were seeing is the server filling up all the available
recursive "slots" because queries weren't getting answered, backing up
the queue.  That wasn't necessarily an indication of an outage; it could
have meant that you had too many people trying to do lookups at once.
However, I suspect that worked well for you, and would generally
indicate there was a problem.


I'd suggest instead using stats to look for problems.  We've been
testing running "rndc stats" every couple of minutes on a server, then
parsing that data to both dump into a DB to graph the results, and to
raise alerts.  With some pretty simple programming, you can keep a
rolling average of errors.  Then, if you get a value that's more than X
above that average, you could raise an alert, or consider that to be an
"outage".  What's harder is getting a really good way to detect
"abnormal" numbers of queries, as a single average isn't the best way.
Weekends are lower, weekdays are higher ... I guess the best way to do
it would be to keep a separate average for each day of the week (Monday
through Sunday), and if the current error count is greater than that
day's norm, it's abnormal.  But I digress...
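
A minimal sketch of the rolling-average idea above, in Python rather
than Perl.  The class name, the window size and the margin (the "X")
are made up for illustration.  It assumes you have already turned the
"rndc stats" output into one error count per polling interval; parsing
the stats file is not shown because the layout differs between BIND
versions.

```python
from collections import defaultdict, deque


class RollingErrorAlert:
    """Flag a sample that exceeds the rolling average by more than `margin`.

    Hypothetical helper: `window` is how many past samples form the
    average, `margin` is the "X" above that average that counts as abnormal.
    """

    def __init__(self, window, margin):
        self.window = window
        self.margin = margin
        # One rolling history per bucket, so weekdays can be kept apart.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, errors, bucket=None):
        """Record one per-interval error count; return True if abnormal."""
        samples = self.history[bucket]
        abnormal = False
        # Only judge once a full window of history exists for this bucket.
        if len(samples) == self.window:
            average = sum(samples) / len(samples)
            abnormal = errors > average + self.margin
        samples.append(errors)
        return abnormal
```

Leaving `bucket` unset gives a single rolling average.  Passing
something like `datetime.date.today().weekday()` as the bucket keeps a
separate norm for each day of the week, which is one way to handle the
weekend/weekday difference.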

In your situation, looking for hard downs on your connectivity, you
would see successful queries drop to 0 (or near 0), and your errors ramp
up.  That wouldn't be a hard one to detect programmatically.
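
Something along these lines would do for the hard-down case.  This is
only a sketch: the function names and both thresholds are invented for
illustration and would need tuning to your traffic.  It also assumes
the counters you read are cumulative since named started, so two
consecutive readings have to be subtracted to get a per-interval count.

```python
def interval_delta(previous, current):
    """Per-interval count from two cumulative counter readings.

    A reading lower than the previous one is treated as a counter
    reset (for example, named was restarted in between).
    """
    if current < previous:
        return current
    return current - previous


def looks_like_hard_down(successes, errors, success_floor=5, error_floor=50):
    """True when successes are at or near zero while errors have ramped up.

    `success_floor` and `error_floor` are hypothetical per-interval
    thresholds, not values taken from BIND.
    """
    return successes <= success_floor and errors >= error_floor
```

Requiring both conditions keeps a quiet period (few successes, few
errors) from being reported as an outage.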

The other nice thing about putting this all into a DB is that you can
look back and get historical stats quite easily.
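
As a rough illustration of the DB side, here is a sketch using SQLite
from Python.  The table name, column names and helper functions are all
assumptions made up for this example; any database would do.

```python
import sqlite3


def open_stats_db(path=":memory:"):
    """Open (or create) a small database of per-interval DNS counters."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS dns_stats ("
        " taken_at INTEGER PRIMARY KEY,"  # Unix time of the sample
        " successes INTEGER NOT NULL,"
        " errors INTEGER NOT NULL)"
    )
    return db


def record_sample(db, taken_at, successes, errors):
    """Store one sample; the `with` block commits the insert."""
    with db:
        db.execute(
            "INSERT INTO dns_stats (taken_at, successes, errors)"
            " VALUES (?, ?, ?)",
            (taken_at, successes, errors),
        )


def average_errors(db, start, end):
    """Average errors per interval for start <= taken_at < end.

    Returns None when no samples fall in the range.
    """
    row = db.execute(
        "SELECT AVG(errors) FROM dns_stats"
        " WHERE taken_at >= ? AND taken_at < ?",
        (start, end),
    ).fetchone()
    return row[0]
```

With samples stored this way, looking back over last week or last
month is a single query with a different time range.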

Look at tools like RRDtool/Cacti for graphing; we've been using Perl for
the monitoring stuff.

Not quite as simple as looking for log lines, but all pretty easy
overall, and has some nice bonuses.

Cheers,

Todd.

-----Original Message-----
From: bind-users-boun...@lists.isc.org
[mailto:bind-users-boun...@lists.isc.org] On Behalf Of Martin McCormick
Sent: Friday, July 17, 2009 9:20 AM
To: bind-us...@isc.org
Subject: Bind9.5.1 under no Root Name Servers

What does bind9.5.1 do when there is an Internet issue and we
lose all root name servers?

        The bind9.3.x we had been running always began producing
tons of lines saying that there were no more recursive clients. I
had written a program that looked for the time stamp when the
mess starts and then for the time stamp of the last distress
call and we called that an outage since bind certainly wasn't
happy.

        We had a very brief outage on the day we switched to
bind9.5.1 and I saw nothing remarkable in the named.log file
during the period where we lost all roots. Either bind9.5.1
doesn't produce this message or the hit just didn't last long
enough for all the recursive slots to fill up.

        We do allow recursion from within our network but
disallow it for third parties.

        Bind is an excellent place to take the pulse of one's
whole network since it is so closely tied to everything else.

        Here is an actual example of the message we look for:

08-Jul-2009 08:38:20.296 client 139.78.102.224#53631:
 no more recursive clients: quota reached
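
The log-scanning approach described above (first and last time stamp of
the quota message) could be sketched as follows.  This is not the
original program; the function name is hypothetical, and it assumes
each message sits on a single line in named.log, beginning with a
time stamp in the format shown in the example, with English month
abbreviations.

```python
from datetime import datetime

MARKER = "no more recursive clients: quota reached"
STAMP_FORMAT = "%d-%b-%Y %H:%M:%S.%f"  # e.g. 08-Jul-2009 08:38:20.296


def quota_outage_window(lines):
    """Return (first, last) time stamps of quota messages, or None."""
    first = last = None
    for line in lines:
        if MARKER not in line:
            continue
        fields = line.split()
        try:
            stamp = datetime.strptime(" ".join(fields[:2]), STAMP_FORMAT)
        except ValueError:
            # Marker present but no leading time stamp; skip the line.
            continue
        if first is None:
            first = stamp
        last = stamp
    if first is None:
        return None
    return first, last
```

Calling it as `quota_outage_window(open("named.log"))` would scan the
whole file; the path is whatever your logging configuration uses.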

Martin McCormick WB5AGZ  Stillwater, OK 
Systems Engineer
OSU Information Technology Department Telecommunications Services Group
_______________________________________________
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
