Hi!
Just to let you know that I've committed a change to CVS today for an issue reported by Pawel Malachowski: the plugins appeared to be making too many resolver/DNS calls when compiled with IPv6 options enabled.
This should reduce the number of timeouts. However, I do like the idea of making the Nagios server a caching name server too...
Ton

On 9 Nov 2006, at 22:25, Steve Shipway wrote:

We dealt with this by installing a local caching-only nameserver on the Nagios host itself. This also took a lot of the load off the main nameservers. So, resolv.conf was set to use 127.0.0.1 by default, with our normal name servers as secondaries. A nice side effect was that it vastly sped up name resolution.

Steve

--
Steve Shipway
ITSS, University of Auckland
(09) 3737 599 x 86487
[EMAIL PROTECTED]
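One rough way to check the kind of speed-up Steve describes is to time a lookup before and after changing resolv.conf. The sketch below is only an illustration and is not from the thread: `time_lookup` is a hypothetical helper, and it assumes the system resolver is what `socket.getaddrinfo` ends up using. Any local cache (nscd, for example) will affect the numbers.

```python
import socket
import time


def time_lookup(hostname, resolve=socket.getaddrinfo):
    """Return (elapsed_seconds, ok) for a single lookup of hostname.

    `resolve` is injectable so the helper can be exercised without DNS;
    by default it goes through the system resolver via getaddrinfo.
    """
    start = time.monotonic()
    try:
        resolve(hostname, None)
        ok = True
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError.
        ok = False
    return time.monotonic() - start, ok


if __name__ == "__main__":
    # "localhost" is only a placeholder; substitute hosts you monitor.
    for name in ["localhost"]:
        elapsed, ok = time_lookup(name)
        print(f"{name}: {elapsed:.3f}s {'ok' if ok else 'FAILED'}")
```

Lookups that consistently take close to a whole multiple of the resolver timeout suggest that the first nameserver is not answering.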
Yay!! That totally did it. Thx Az. I hadn't even considered messing with the resolver cuz I was sure it was a Nagios issue, so I had to fix Nagios. If that wasn't a textbook example of how well mailing lists can work, then I don't know what is...
thx
On 11/7/06, Az <[EMAIL PROTECTED]> wrote:

stucky wrote:
> I use the check_by_ssh plugin for most of my stuff and I noticed that
> if the primary nameserver is unavailable Nagios starts freaking out.
> All of a sudden all plugins time out. I tested it using the 'host'
> command and it only takes about 1 second longer to look up hosts using
> the secondary nameserver.
> The default timeout for check_by_ssh is 10 seconds. I cranked it up to
> 30 and still I get timeouts. I'm not sure I understand that one.
> Has anyone else seen this?

We had a similar issue in that our primary DNS was doing strange things, and it quite often took 5 or even 10 seconds to perform a DNS lookup. What we were seeing was 70% of service checks (and subsequently host checks) failing by timing out. The key was the multiple of 5 seconds. The resolver timeout on, say, RHEL3 is based on RES_TIMEOUT in resolv.h... which was 5 seconds.
We added the following to our resolv.conf , and found the problems went away:
options timeout:2 rotate
This sets the timeout for waiting for a reply to 2 seconds, and tells the resolver to rotate through your 'nameserver' entries rather than always hitting #1, then #2, etc.
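To make the effect of that line concrete, here is a minimal sketch that reads resolv.conf-style text and reports the nameserver order and the options it finds. It is an illustration only, not how the system resolver parses the file: `parse_resolv_conf` is a hypothetical helper, and the 192.0.2.x addresses are documentation placeholders, not servers from this thread.

```python
def parse_resolv_conf(text):
    """Return (nameservers, options) from resolv.conf-style text.

    Options of the form name:value with a numeric value become ints;
    bare flags such as 'rotate' become True.
    """
    nameservers = []
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                name, _, value = opt.partition(":")
                options[name] = int(value) if value.isdigit() else True
    return nameservers, options


# Placeholder nameservers plus the options line quoted above.
SAMPLE = """\
nameserver 192.0.2.1
nameserver 192.0.2.2
options timeout:2 rotate
"""

nameservers, options = parse_resolv_conf(SAMPLE)
print("nameservers:", nameservers)
print("options:", options)
```

To inspect a real host, the same function could be given the contents of /etc/resolv.conf.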
Cheers.
-- stucky
T: +44 (0)870 787 9243 F: +44 (0)845 280 1725 Skype: tonvoon |
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting
any issue.
::: Messages without supporting info will risk being sent to /dev/null