I use service dependencies. Most of my services are NRPE checks, so I create a dependency on an NRPE service. If a check comes back critical (or whichever state you choose to trigger the dependency), it does an NRPE check; if NRPE returns critical (or whichever state you choose), it stops executing the services that depend on NRPE.
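For illustration, here is a minimal sketch of what such a dependency could look like in the object configuration. The host name "node001", the service descriptions "NRPE" and "Load Average", the command name "check_nrpe_alive", and the "generic-service" template are placeholders I made up for the example, not my actual config; adjust them to your setup.

```
# Hypothetical command: check_nrpe run without -c only tests that the
# NRPE daemon answers on the remote host.
define command{
        command_name    check_nrpe_alive
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$
        }

# The master service that the other NRPE checks depend on.
define service{
        use                     generic-service   ; assumed template name
        host_name               node001           ; placeholder host
        service_description     NRPE
        check_command           check_nrpe_alive
        }

# While "NRPE" is CRITICAL, "Load Average" is neither executed nor notified.
define servicedependency{
        host_name                       node001
        service_description             NRPE
        dependent_host_name             node001
        dependent_service_description   Load Average
        execution_failure_criteria      c   ; add w and/or u if you prefer
        notification_failure_criteria   c
        }
```

You would add one servicedependency like this for each NRPE-based service on the host (total-processes, disk usage, and so on).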
My load is less than 2 on a machine monitoring 800 hosts and 6000 services. Active host checks are disabled. As for ping, I don't run it as a service, only as the host check, which gets executed if any service turns critical. You can also use check_ssh as the host check command instead of ping if you prefer.

On 1/27/09, Mathieu Gagné <mga...@iweb.com> wrote:
> Hi,
>
> Rahul Nabar wrote:
>> I set up my nagios system to monitor 256 odd nodes each with about 6
>> services (direct and NRPE). It is working fine but my load averages have
>> started edging upwards. Not critical yet but I wanted some tips to make
>> things more efficient and see if there are things I might have done
>> inefficiently.
>
> We have +2000 hosts and +4700 services configured on one of our Nagios
> instances. Load average is between 1.3 and 2.0 which I find acceptable.
>
> Our hardware is the following: Core2 Duo 4300 @ 1.80GHz with 2GB of RAM.
>
>> One of the points I identified is this: I am doing a ping and ssh check
>> on each server. This seems redundant. Is there a way to set it up so that:
>> Do a ssh check; if this succeeds obviously ping is ok. If it fails do a
>> ping check and report on that.
>
> "check-host-alive" is only triggered when a service associated with the
> host changes state.
>
> However, I personally consider PING to be a service in itself,
> monitoring the network performance/quality.
>
> PING can still answer but with degraded performance (packet loss, poor
> response time). You probably want to be informed about such problems.
> (i.e. in case of a (D)DoS where your network port is maxed out)
>
>> How about the other way around too? I have a bunch of NRPE checks:
>> load_average, total-processes, scratch and home dir usage, pbs_mom,
>> ntp_time. If ssh fails then there is obviously no reason to try these
>> other checks right? But I think the monitoring_host wastes its cycles
>> still trying them (based on the "Last Check" time)
>
> The SSH service state can be CRITICAL while all the other services are
> still OK. (i.e. ssh server misconfiguration) You probably want to be
> informed about it too.
>
>> Any tips how I can achieve these efficiency tweaks? Or is there a
>> problem in my strategy? Any other performance tweaks so that I can
>> squeeze every ounce of Nagios performance?
>>
>> Already I am using NRPE rather than check_by_ssh since I was told the
>> latter might be inefficient for the monitoring host load usage.
>
> What kind of server are you using?
>
> Also, what's the check_interval? A 1 minute interval might put the
> server on its knees since it would be scheduling and executing 1536
> checks per minute. (as per your information)
>
> There are a lot of factors that could impact Nagios performance and you
> should be aware of all of them. Reading the documentation and
> understanding the impact of each configuration would be a good start.
>
> --
> Mathieu
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by:
> SourceForge Community
> SourceForge wants to tell your story.
> http://p.sf.net/sfu/sf-spreadtheword
> _______________________________________________
> Nagios-users mailing list
> Nagios-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null