Re: Runaway BIND

2007-02-14 Thread Rich Johnson


On Feb 13, 2007, at 11:14 PM, Andy Smith wrote:


> On Tue, Feb 13, 2007 at 10:17:49AM -0500, Rich Johnson wrote:
>> What is surprising is that such an event brought down _another_
>> machine.
>>
>> Would it be fair to say that excessive loggers are ill-behaved?
>
> It sounds like your bind was misconfigured and didn't know what the
> IPs of the root servers were.

Always a possibility--though it would mean that it has been
misconfigured for 6-1/2 years since the initial installation (potato,
or maybe the woody upgrade) without the problem ever being
encountered.  The list of root servers in /etc/bind/db.root is
provided by the bind/bind9 package.  The only changes to the package
distribution are the list of forwarders in named.conf.options and the
zones in named.conf.local.



> That is a critical situation so I
> don't blame it for logging like mad.  Some monitoring to check your
> logs and disk space would be advised.

I have no objection to bind logging its concerns.  But it's the "like
mad" bit I find troublesome.  15+MB of log per hour repeatedly
recording the same problem, however critical, strikes me as
excessive.  At this rate the recording eventually causes more damage
than the problem itself.  My server withstood 10 hrs (from Sat PM to
Sun AM) before being overwhelmed.
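
To illustrate what "recording the same problem once rather than 42000 times" could look like in general, here is a minimal sketch of repeat suppression using Python's standard `logging` module. It is not how bind or syslogd works internally; the class name `RepeatSuppressor`, the 60-second window and the logger name are all assumptions made for the example.

```python
import logging
import time


class RepeatSuppressor(logging.Filter):
    """Drop a record if the same message text was emitted within `window` seconds.

    Hypothetical helper for illustration only; not part of bind or syslog.
    """

    def __init__(self, window=60.0, clock=time.monotonic):
        super().__init__()
        self.window = window
        self.clock = clock      # injectable so the behaviour can be tested
        self.last_seen = {}     # message text -> time it was last emitted
        self.suppressed = 0     # number of records dropped so far

    def filter(self, record):
        now = self.clock()
        text = record.getMessage()
        last = self.last_seen.get(text)
        if last is not None and now - last < self.window:
            self.suppressed += 1
            return False        # same message seen recently: drop it
        self.last_seen[text] = now
        return True             # first sighting, or the window has expired


if __name__ == "__main__":
    logging.basicConfig(format="%(asctime)s %(name)s: %(message)s")
    log = logging.getLogger("named-demo")   # assumed name, for the demo only
    suppressor = RepeatSuppressor(window=60.0)
    log.addFilter(suppressor)
    for _ in range(42000):
        log.warning("sysquery: no addrs found for root NS (B.GTLD-SERVERS.net)")
    print("suppressed:", suppressor.suppressed)
```

Note that the sketch keeps one dictionary entry per distinct message, so a real implementation would also want to expire old entries.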




> Try logcheck and, well, I'm not sure what.  I use nagios but that's
> a bit much for a single machine home user.  There must be something
> though.  Maybe one of those desktop widgets that display graphs of
> disk space?

Hmm...are any of these capable of shutting down the system as it
enters a red zone (e.g. /var free space < 10M) and before it becomes
unbootable?  I find an unexpected automatic shutdown preferable to
recovery.
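
A red-zone check of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the 10M threshold is taken from the example in the message, and it assumes the script is run periodically (for instance from root's crontab) with permission to call `shutdown`.

```python
import shutil
import subprocess

RED_ZONE_BYTES = 10 * 1024 * 1024   # 10M, the example threshold from the message


def free_bytes(path):
    """Return the free space, in bytes, on the filesystem holding `path`."""
    return shutil.disk_usage(path).free


def in_red_zone(free, threshold=RED_ZONE_BYTES):
    """True when free space has dropped below the threshold."""
    return free < threshold


def check_and_halt(path="/var", threshold=RED_ZONE_BYTES,
                   halt_cmd=("shutdown", "-h", "now")):
    """Run `halt_cmd` if `path` is in the red zone; return whether it was."""
    if in_red_zone(free_bytes(path), threshold):
        # Assumption: the caller is root, otherwise shutdown will refuse.
        subprocess.run(list(halt_cmd), check=False)
        return True
    return False
```

Calling `check_and_halt()` once a minute would bound the damage, though at the ~550 messages/sec reported in this thread a full minute of logging is itself sizeable, so the threshold would need some headroom.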


Thx,
--rich





--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] 
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]




Re: Runaway BIND

2007-02-13 Thread Matus UHLAR - fantomas
On 12.02.07 09:55, Rich Johnson wrote:
> I just recovered from bind (8.4.7-1) flooding /var/log/syslog with
> several hundred megabytes of messages along the lines of:
>
> $ grep "no addrs found for root" syslog | head
> Feb 10 19:54:59 creaky named[7652]: sysquery: no addrs found for root NS (B.GTLD-SERVERS.net)
> Feb 10 19:54:59 creaky named[7652]: sysquery: no addrs found for root NS (C.GTLD-SERVERS.net)
> Feb 10 19:54:59 creaky named[7652]: sysquery: no addrs found for root NS (D.GTLD-SERVERS.net)
>
> When bind goes nuts it repeatedly cycles through the entire set of root
> servers at both ROOT-SERVERS.NET and GTLD-SERVERS.net at the rate of
> ~550 logs/sec.  A typical burst runs for ~75 seconds or so and emits
> ~42000 messages.  In my case the bursts started at 19:54:59 EST
> (UTC-5) and affected both master and slave servers.  That and the
> maturity of bind lead me to suspect some external trigger.
>
> The named.conf option set is rather daunting.  Can anyone suggest
> some options to throttle back the verbosity?

Well, I suggest you
- upgrade to bind9 (preferably 9.3)
- check whether your named.root zone exists and whether you have it configured.

-- 
Matus UHLAR - fantomas, [EMAIL PROTECTED] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Fucking windows! Bring Bill Gates! (Southpark the movie)





Re: Runaway BIND

2007-02-13 Thread Rich Johnson


On Feb 13, 2007, at 3:12 AM, Matus UHLAR - fantomas wrote:


> On 12.02.07 09:55, Rich Johnson wrote:
> well, i suggest you
> - upgrade to bind9 (preferably 9.3)
> - check your named.root zone, if it exists and if you have it
>   configured.


Done!  Let's hope it helps.  I run dist-upgrade (testing) regularly.   
Perhaps at some point it should specify bind9?


I think I've identified the triggering event.  It appears that it was  
my gateway router (a DSL modem) losing power.  Under such conditions  
the failure to reach the root servers is not surprising.


What is surprising is that such an event brought down _another_ machine.

Would it be fair to say that excessive loggers are ill-behaved? 
  







Re: Runaway BIND

2007-02-13 Thread Andy Smith
On Tue, Feb 13, 2007 at 10:17:49AM -0500, Rich Johnson wrote:
> What is surprising is that such an event brought down _another_ machine.
>
> Would it be fair to say that excessive loggers are ill-behaved?

It sounds like your bind was misconfigured and didn't know what the
IPs of the root servers were.  That is a critical situation so I
don't blame it for logging like mad.  Some monitoring to check your
logs and disk space would be advised.

Try logcheck and, well, I'm not sure what.  I use nagios but that's
a bit much for a single machine home user.  There must be something
though.  Maybe one of those desktop widgets that display graphs of
disk space?

Cheers,
Andy

-- 
http://bitfolk.com/ -- No-nonsense VPS hosting
Encrypted mail welcome - keyid 0x604DE5DB




Runaway BIND

2007-02-12 Thread Rich Johnson

OUCH!

I just recovered from bind (8.4.7-1)  flooding /var/log/syslog with  
several hundred megabytes of messages along the lines of:


$ grep "no addrs found for root" syslog | head
Feb 10 19:54:59 creaky named[7652]: sysquery: no addrs found for root NS (B.GTLD-SERVERS.net)
Feb 10 19:54:59 creaky named[7652]: sysquery: no addrs found for root NS (C.GTLD-SERVERS.net)
Feb 10 19:54:59 creaky named[7652]: sysquery: no addrs found for root NS (D.GTLD-SERVERS.net)


When bind goes nuts it repeatedly cycles through the entire set of root
servers at both ROOT-SERVERS.NET and GTLD-SERVERS.net at the rate of
~550 logs/sec.  A typical burst runs for ~75 seconds or so and emits
~42000 messages.  In my case the bursts started at 19:54:59 EST
(UTC-5) and affected both master and slave servers.  That and the
maturity of bind lead me to suspect some external trigger.
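
As a sanity check on those figures, 550 messages/sec over 75 seconds is 41250 messages, which agrees with the ~42000 per burst. Numbers like these can be pulled from the log with a short script; the following is a rough sketch, assuming the traditional syslog layout in which the first 15 characters of each line are the timestamp (e.g. "Feb 10 19:54:59"). The function names are made up for the example.

```python
from collections import Counter


def per_second_counts(lines, needle="no addrs found for root"):
    """Count matching syslog lines per one-second timestamp."""
    counts = Counter()
    for line in lines:
        if needle in line:
            counts[line[:15]] += 1   # "Mon DD HH:MM:SS" is 15 characters
    return counts


def summarize(counts):
    """Return (total messages, seconds with at least one message, mean rate)."""
    total = sum(counts.values())
    seconds = len(counts)
    rate = total / seconds if seconds else 0.0
    return total, seconds, rate


if __name__ == "__main__":
    import sys
    # Usage (path is an example): python count_bursts.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as f:
        total, seconds, rate = summarize(per_second_counts(f))
    print(f"{total} messages over {seconds} active seconds, ~{rate:.0f}/sec")
```

The "seconds" figure counts only seconds in which at least one matching message was logged, so it approximates total burst time rather than wall-clock duration.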


Upon reboot, init would not tolerate a full /var.  I was able to  
recover with no data loss by:

1.  rebooting into /bin/sh
2.  manually running fsck
3.  moving the morbidly obese syslog to another partition for analysis
4.  normal reboot.

The named.conf option set is rather daunting.  Can anyone suggest  
some options to throttle back the verbosity?


Thx,
--rich





