Re: [Nagios-users] graphing trends across hosts or services instead of a timeseries
On Wed, Mar 4, 2009 at 3:33 PM, Marco Tirado marco.tir...@gmail.com wrote:

> PNP4Nagios has a feature called pages that allows you to show multiple services for the same host or multiple hosts for the same service. It should be easy to use since it supports regular expressions. Check the following link: http://www.pnp4nagios.org/pnp/pages

I can't seem to get the pages feature of pnp4nagios working correctly to get an assembly of charts. I just tried creating a page by modifying the canned file web_traffic.cfg-sample in the directory /nagios/etc/pnp/pages/ like so:

filename: cpu_temperatures.cfg

#
define page {
    use_regex 1
    page_name Compute Nodes Current CPU Core Temperatures
}
define graph {
    host_name star177,star178,star179
    service_desc CpuCoreTemperature
}

I still do not see the webpage. Where should I be looking in the web interface?

I already have a corresponding entry in the services.cfg file:

define service{
    use rpn_intermediate_service
    hostgroup_name 64bit-compute-nodes
    service_description CpuCoreTemperature
    check_command check_nrpe!check_cpu_temp
    use srv-pnp-rpn-intermediate
}

The permissions and ownership of cpu_temperatures.cfg also seem correct. What else could I be messing up? Any advice?

--
Rahul

--
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] how can I access data stored by pnp4nagios
On Thu, Jun 25, 2009 at 7:31 PM, Rahul Nabar rpna...@gmail.com wrote:

> In which case is there another way for me to access historic_perf_quantity at a given point in time for all servers? Or perhaps it is possible to generate something like this using custom rrdtool queries? Any suggestions?

I just noticed that one of my old emails had received a pointer to the pages feature in pnp4nagios. Maybe that will work this time around.

--
Rahul
[Nagios-users] how can I access data stored by pnp4nagios
I just updated my nagios installation so that I can get CPU core temperatures with lm_sensors. Works fine. I have pnp4nagios, which gives good time series trends of temperatures as well.

The problem is that if I want a snapshot of CPU temperatures right now for my whole server room hardware (or at any other past instant), pnp4nagios is not useful. It gives a time series but not a spatial graph (a trend over servers at a point in time).

If I wanted to hack together my own graphing tool, what's the best place to pull data from? Digging in, I noticed the /nagios/share/perfdata/serverxxx/ directory, which does have the xml and rrd files. Is this a good spot to pull data out from for each monitored server? Unfortunately that would only be for the latest time instance, right?

But since pnp4nagios can plot over a time range it must have access to historic temperatures too. So, where are these stored? I suspect internally as a binary produced by rrdtool? In which case is there another way for me to access historic_perf_quantity at a given point in time for all servers? Or perhaps it is possible to generate something like this using custom rrdtool queries? Any suggestions?

--
Rahul
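One way to approach the snapshot question, sketched under assumptions: the .rrd files in the perfdata directory hold the history that the time-range plots are drawn from, and `rrdtool fetch <file> AVERAGE --start T --end T` prints rows of the form `timestamp: value ...`. The Python sketch below only parses that text output and picks, per host, the known value closest to a requested time. The helper names and the sample host name are hypothetical, not part of pnp4nagios or rrdtool.

```python
import math


def parse_fetch(text):
    """Parse the text printed by `rrdtool fetch` into {timestamp: [values]}."""
    rows = {}
    for line in text.splitlines():
        head, sep, tail = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skips the data-source header line and blank lines
        rows[int(head)] = [float(v) for v in tail.split()]
    return rows


def value_at(rows, when, column=0):
    """Return the known (non-NaN) value whose timestamp is closest to `when`."""
    known = {
        t: v[column]
        for t, v in rows.items()
        if len(v) > column and not math.isnan(v[column])
    }
    if not known:
        return None
    nearest = min(known, key=lambda t: abs(t - when))
    return known[nearest]


def snapshot(fetch_outputs, when):
    """Map each host to its value at `when`.

    `fetch_outputs` maps a host name to the text that `rrdtool fetch`
    printed for that host's .rrd file (collected by the caller).
    """
    return {
        host: value_at(parse_fetch(text), when)
        for host, text in fetch_outputs.items()
    }
```

Feeding `snapshot` the fetch output of every host's temperature .rrd gives one value per server for a single instant, which can then be plotted as a bar chart across hosts.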
Re: [Nagios-users] nagvis requires ndoutils; how stable is ndoutils?
On Tue, Jun 16, 2009 at 6:47 AM, Kevin Keane subscript...@kkeane.com wrote:

> I just installed ndoutils with mysql. There indeed was one pitfall: the database is growing quite large very quickly. Eventually, the DB got sluggish and couldn't keep up with the data Nagios threw at it (the DB server is quite underpowered). It got so bad that after a week or so, Nagios wouldn't even start up.
>
> It turned out that it wasn't primarily the database itself, but binary logging. It is turned on by default (at least on CentOS) but you only need it for replication. If you are not using replication, simply turn off binary logging and you should be good to go. At least, I hope so; I only made that change yesterday, so I won't know for another week or so.

Thanks for all those helpful comments guys! You might have saved me from a few disasters here. I think I am staying away from Nagvis (and ndoutils) for now. Nagvis seems to me one of those tools that simply look great but whose back-end still needs quite some work before I'd be brave enough to unleash it in a production environment!

--
Rahul
Re: [Nagios-users] nagvis requires ndoutils; how stable is ndoutils?
On Thu, Jun 25, 2009 at 12:20 AM, Kevin Keane subscript...@kkeane.com wrote:

> I think that is a bit overreacting. ndoutils is a database client.

Thanks Kevin. Point taken.

> Databases need management and tuning to get you good performance - that's just routine, regardless of the brand you are using: mysql, SQL Server, Oracle, Postgres,

But the way nagios natively stores data seems to be pretty robust. Nagios has scaled excellently right out of the box. From all these discussions it seems that the problems arise when I try to hook up ndoutils etc. in there. Maybe I am wrong!

> No amount of work or polishing will change that. There's a reason DBAs are highly valued professionals.

I feel that's the crux though. If each native nagios install needed a skilled DBA to tune it till it worked, I doubt it'd have been so successful.

--
Rahul
Re: [Nagios-users] nagvis requires ndoutils; how stable is ndoutils?
On Wed, Jun 10, 2009 at 2:56 AM, Giorgio Zarrelli zarre...@linux.it wrote:

> it works. It's not the best, it has some overhead problems with MySql, causing some taxing utilization of the cpu, but it works. Sometimes you can fall in some indexing problems, but you can workaround them with this sql clause:

Thanks Giorgio and Marc-André. I think those comments confirm what I thought. It is somewhat unstable and unfit to push into a production environment. Nagvis did look so cool otherwise though!

Support also seems sort of iffy. I was toying with using the ndo2fs backend but there is very little documentation. Besides, Nagios questions usually generate a lot of responses on the list, but the relative silence for my Nagvis questions seems to say that not so many users are trying it yet.

--
Rahul
[Nagios-users] cannot make comments under nagios after server crash. nagios.cmd missing
I recently had a server crash. I recovered and restarted nagios manually but now I seem to have lost the ability to make comments on hosts. If I try, I get the error message:

Error: Could not stat() command file '/usr/local/nagios/var/rw/nagios.cmd'!
The external command file may be missing, Nagios may not be running, and/or Nagios may not be checking external commands.
An error occurred while attempting to commit your command for processing.

That file is indeed missing. Running locate nagios.cmd, though, shows the file at that location, so it must have been there before the crash.

Do I need to restart something? What am I missing? The older comments are intact. It is just that I cannot make new comments.

--
Rahul
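Some background that may help with the question above: nagios.cmd is a named pipe that the Nagios daemon creates when it starts with external commands enabled (check_external_commands=1 in nagios.cfg), and locate answers from a periodically rebuilt index, so it can still list a file that no longer exists. A minimal Python sketch for checking what is actually on disk follows; the function name is hypothetical and the path is the one from the error message.

```python
import os
import stat


def command_file_status(path):
    """Report whether `path` is missing, a named pipe, or something else."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    return "fifo" if stat.S_ISFIFO(mode) else "not-a-fifo"


if __name__ == "__main__":
    print(command_file_status("/usr/local/nagios/var/rw/nagios.cmd"))
```

If this reports "missing" while the daemon is running, the external command settings in nagios.cfg and the permissions on the rw directory are the first things to look at before restarting Nagios.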
[Nagios-users] how to generate an availability report for a specific service on a whole hostgroup.
I cannot figure out the best way to generate (if at all possible) this kind of report in Nagios: for a specific service (ssh), list all the nodes in a hostgroup that had a status other than normal in, say, the last year.

I can get a trend for a service on a specific host. But how do I get this report for an entire hostgroup+service combination? Any tips?

--
Rahul
Re: [Nagios-users] A group of Nagios users are:
On Mon, Mar 9, 2009 at 6:15 PM, Martyn mar...@chetnet.co.uk wrote:

> Just on a lighter note, what do we call a bunch of Nagios users; Nagiothions?

Nagiosers is my vote. :)
Re: [Nagios-users] Alert History seems empty
On Mon, Mar 2, 2009 at 1:12 PM, Marc Powell m...@ena.com wrote:

> On Mar 2, 2009, at 11:25 AM, Rahul Nabar wrote:
>
> Alert history isn't performance data. An 'alert' is logged when the service changes state (i.e. OK -> CRITICAL for example). Your service has not changed state in the current log file.

Thanks Marc. My bad. What I should have been looking for is View Trends for this service. I got confused between those two.

--
Rahul
Re: [Nagios-users] Alert History seems empty
On Mon, Mar 2, 2009 at 1:57 PM, Jim Avery j...@jimavery.me.uk wrote:

> The history will normally only record anything if the ping check has changed state (for example from OK to Warning). If there's nothing in the log for a particular day, it simply means it's been pinging fine all day (or that it's been critical all day long).
>
> Nagios itself doesn't do anything with the performance data, but can be configured to pass it on to flat files, a database or a grapher, for example PNP or nagiosgrapher. I use (and recommend) PNP as it's easy to install and use and seems to get better and better with every new release.
>
> http://www.pnp4nagios.org/pnp/start

Thanks Jim. That makes sense now. I was misinterpreting the term alert history. I already have PNP4nagios working.

--
Rahul
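To make the state-change point concrete, here is a hedged Python sketch that scans log lines for service state changes. It assumes the usual nagios.log entry shape `[epoch] SERVICE ALERT: host;service;STATE;SOFT|HARD;attempt;output`; the function names and sample hosts are hypothetical. A service that never changed state produces no such lines, which is why the alert history page comes up empty.

```python
import re

# One state-change entry as written to nagios.log (assumed format, see above).
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);([^;]+);([^;]+);(\d+);(.*)$"
)


def service_alerts(log_lines, service):
    """Return (timestamp, host, state, state_type) for one service's alerts."""
    alerts = []
    for line in log_lines:
        m = ALERT_RE.match(line.strip())
        if m and m.group(3) == service:
            alerts.append((int(m.group(1)), m.group(2), m.group(4), m.group(5)))
    return alerts


def hosts_with_problems(log_lines, service):
    """Hosts on which the service was ever logged in a state other than OK."""
    return sorted(
        {host for _, host, state, _ in service_alerts(log_lines, service)
         if state != "OK"}
    )
```

Run over the current and archived log files, `hosts_with_problems` also gives a rough per-service list of hosts that left the OK state, which the caller can filter down to the members of one hostgroup.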
[Nagios-users] Alert History seems empty
If I try View Alert History for this Service I get the error:

No history information was found for this service in the current log file

The file that it reports,

File: /usr/local/nagios/var/nagios.log

is indeed present. What else could be wrong? Does Alert History have to be explicitly enabled in some way? This is a simple ping service, so it does have performance data.

--
Rahul
Re: [Nagios-users] graphing trends across hosts or services instead of a timeseries
On Tue, Feb 24, 2009 at 3:17 AM, Jim Avery j...@jimavery.me.uk wrote:

> I had to install some dependencies, I forget which ones, but I'm pretty sure librrds-perl was one of them. The web interface for drraw is fairly intuitive, except it took me a few minutes to notice that in order to save a graph, you need to specify a Graph Title in the Graph Options section!
>
> hth,
> Jim

Awesome! That will be a lot of help I am sure. Thank you again, Jim. I appreciate the help!

--
Rahul
[Nagios-users] nagios does not produce performance data for check_total_procs
While getting PNP to graph stuff for my Nagios services I get an error whenever I try to plot the Total Processes field:

RRD Database /usr/local/nagios/share/perfdata/star23/Total_Processes.rrd not found.

I checked, and that performance data file does indeed seem to be absent. I do have process_perf_data set to 1. Any ideas why Nagios is not producing performance data for this particular service?

define service{
    hostgroup_name npre-compute-nodes
    service_description Total Processes
    check_command check_nrpe!check_total_procs
    process_perf_data 1
    use srv-pnp-rpn-intermediate
}

--
Rahul
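One thing worth checking here, as an assumption since the thread does not show the plugin's output: Nagios only has performance data to hand over when the plugin prints something after a `|` in its output, in the form `label=value[UOM];warn;crit;min;max`. The Python sketch below splits a plugin output line and parses that part. The helper names and the sample outputs are hypothetical.

```python
import re

# label=value[UOM]; the thresholds after the first ';' are ignored here.
PERF_RE = re.compile(r"('[^']+'|[^\s=]+)=(-?\d+(?:\.\d+)?)([^;\s]*)")


def split_output(plugin_output):
    """Split plugin output into (status text, raw performance data)."""
    text, _, perf = plugin_output.partition("|")
    return text.strip(), perf.strip()


def parse_perfdata(plugin_output):
    """Return {label: (value, unit)} for each performance data item."""
    _, perf = split_output(plugin_output)
    return {
        m.group(1).strip("'"): (float(m.group(2)), m.group(3))
        for m in PERF_RE.finditer(perf)
    }
```

Running the check command by hand on the remote host and passing its output through `parse_perfdata` shows quickly whether there is anything for PNP to store; an empty result means no .rrd file will be created regardless of process_perf_data.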
Re: [Nagios-users] check_lm_sensors and the correct sensor names for checking cpu temperatures
On Thu, Feb 12, 2009 at 1:35 AM, Matteo Corti matteo.co...@gmail.com wrote:

> Hi Rahul,
>
> On Feb 11, 2009, at 20:55, Rahul Nabar wrote:
>> On Wed, Feb 11, 2009 at 11:15 AM, Matteo Corti matteo.co...@gmail.com wrote:
>>> Dear Rahul, Does the input include the newline between the Core0 Temp: and the temperature?
>>
>> Yes, it does! Is that messing up the regexes?
>
> Yes ... :-( I hoped that the sensors -u output was more standard and that I could rely on that. I'll try to see if I can get the sensors information in a way which is consistent on several systems ...

Regexes always have this habit of breaking just when one is sure one has covered all test cases! :) Thanks for helping me figure out what it was! I'll see if I can quickly hack together something that'll make it work for me!

--
Rahul
[Nagios-users] graphing trends across hosts or services instead of a timeseries
One other thing that I haven't figured out yet with PNP-NAGIOS is this: how does one get trending across services or hosts? It is easy to see time series graphs of ping times, load averages, disk usage etc., but sometimes what seems more relevant is a chart across hosts or services for a given snapshot in time. Say, to identify a hot node, or a node with unusually high load averages.

Is there a way to do this? Or am I tinkering with the wrong tool?

--
Rahul
Re: [Nagios-users] graphing trends across hosts or services instead of a timeseries
On Thu, Feb 12, 2009 at 11:46 AM, Lee Azzarello l...@dropio.com wrote:

> Nagios itself does have some trending tools in version 3, though they are not very comprehensive. Are you looking for something beyond their scope?

Thanks Lee. I am not aware of the scope of the inbuilt trending tools. Maybe that's a good place to start. How does one use those? Say, how can one obtain a graph of ping times across all hosts in a suitable format? That might make it easy to identify problem machines.

--
Rahul
Re: [Nagios-users] graphing trends across hosts or services instead of a timeseries
On Thu, Feb 12, 2009 at 2:06 PM, Lee Azzarello l...@dropio.com wrote:

> In Nagios version 3, you click on Reporting-Trends and use the menus to generate a picture

Thanks again Lee!

> The limitation is you can only see one picture at a time for a particular host or service

That is a drawback. The whole idea is to get a picture across *many* hosts or services. For a given host my PNP already generates better plots than the inbuilt Nagios trending suite. PNP does seem very geared for this. I am just not sure how to make it plot a certain time slice instead of a historic time series!

--
Rahul
[Nagios-users] check_iptables and the -S option for iptables; now defunct?
I was trying to roll out the check_iptables script but ran into a hiccup, since my system iptables refuses to accept the -S option that the script passes when it invokes iptables:

iptables v1.3.5: Unknown arg `-S'

Any other users of this script? Have you guys done away with the -S option? Any workarounds? It seems this option is only available in newer iptables versions than mine. But I am not expert enough with iptables to exactly understand its relevance.

--
Rahul
Re: [Nagios-users] check_iptables and the -S option for iptables; now defunct?
Found this on the list:

> I had to make one modification within the script: The -S argument is not known by the version, 1.3.8, of iptables on the server in question, so I replaced it with the -L argument.

[ http://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg23867.html ]

It does seem to work, although I am not really sure what I am doing! :)

--
Rahul
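A small Python sketch of that workaround, under a loudly stated assumption: that -S (list rules) first appeared in the iptables 1.4 series, which fits the two 1.3.x versions in this thread rejecting it. The threshold and the function names are hypothetical. Note too that -L prints a different layout than -S, so a script that parses the listing may need more than the flag swapped.

```python
import re


def parse_version(banner):
    """Extract (major, minor, patch) from text such as 'iptables v1.3.5'."""
    m = re.search(r"v(\d+)\.(\d+)\.(\d+)", banner)
    return tuple(int(x) for x in m.groups()) if m else None


def listing_flag(banner, first_with_s=(1, 4, 0)):
    """Pick the listing flag for this iptables version.

    `first_with_s` is an assumption about when -S became available;
    unknown versions fall back to -L, which the old releases accept.
    """
    version = parse_version(banner)
    if version is None or version < first_with_s:
        return "-L"
    return "-S"
```

The banner would come from the output of `iptables --version` or, as in the error above, from the version string iptables prints when it rejects an argument.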
Re: [Nagios-users] check_lm_sensors and the correct sensor names for checking cpu temperatures
On Wed, Feb 11, 2009 at 1:28 AM, Matteo Corti matteo.co...@gmail.com wrote:

> Could you try to send me both the output of 'sensors' and 'check_lm_sensors --list -v -v'. Many thanks,

Sure. Here it is:

rpna...@star255:~ /usr/local/nagios/libexec/check_lm_sensors --list -v -v
warning: hddtemp not found: HDD temperatures not checked
sensors found at /usr/bin/sensors
LM_SENSORS OK - |

rpna...@star255:~ sensors
k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp:
+25°C
Core1 Temp:
+20°C
Re: [Nagios-users] check_lm_sensors and the correct sensor names for checking cpu temperatures
On Wed, Feb 11, 2009 at 11:15 AM, Matteo Corti matteo.co...@gmail.com wrote:

> Dear Rahul, Does the input include the newline between the Core0 Temp: and the temperature?

Yes, it does! Is that messing up the regexes?

--
Rahul
Re: [Nagios-users] check_lm_sensors and the correct sensor names for checking cpu temperatures
On Wed, Feb 11, 2009 at 11:18 AM, Matteo Corti matteo.co...@gmail.com wrote:

> I forgot: could you also please send me the output of 'sensors -uA'
>
> I feel that I could parse the raw output to avoid problems with spaces and newlines.
>
> Cheers and thanks again
> Matteo

k8temp-pci-00c3
Core0 Temp: 24.00 (temp1)
ERROR: Can't get feature `temp2' data!
Core1 Temp: 20.00 (temp3)
ERROR: Can't get feature `temp4' data!

--
Rahul
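A hedged illustration of the newline problem discussed in this thread: a pattern that allows any whitespace, including a line break, between the label and the number matches both output shapes posted here. This is a Python sketch, not the code used inside check_lm_sensors, and the function name is hypothetical.

```python
import re

# Label up to the colon, then optional whitespace (which may include a
# newline), then a signed number. Unit suffixes such as "°C" are ignored.
TEMP_RE = re.compile(
    r"^(?P<label>[^:\n]+):\s*(?P<value>[+-]?\d+(?:\.\d+)?)",
    re.MULTILINE,
)


def parse_temps(sensors_output):
    """Return {label: temperature} for every 'label: number' pair found."""
    return {
        m.group("label").strip(): float(m.group("value"))
        for m in TEMP_RE.finditer(sensors_output)
    }
```

Lines without a number after the colon, such as the adapter line or the ERROR lines, are simply skipped.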
Re: [Nagios-users] check_lm_sensors and the correct sensor names for checking cpu temperatures
On Tue, Feb 10, 2009 at 2:57 PM, Matteo Corti matteo.co...@gmail.com wrote:

> Dear Rahul, Please post the output of /usr/local/nagios/libexec/check_lm_sensors --list
>
> Maybe, the --list option should tell us what check_lm_sensors sees but since it parses the output of sensors it should work.

Thanks again Matteo!

/usr/local/nagios/libexec/check_lm_sensors --list
LM_SENSORS OK - |

That's the only output I get.

--
Rahul
Re: [Nagios-users] check_lm_sensors and the correct sensor names for checking cpu temperatures
On Tue, Feb 10, 2009 at 3:03 PM, Rahul Nabar rpna...@gmail.com wrote:

> On Tue, Feb 10, 2009 at 2:57 PM, Matteo Corti matteo.co...@gmail.com wrote:
>> Dear Rahul, Please post the output of /usr/local/nagios/libexec/check_lm_sensors --list
>>
>> Maybe, the --list option should tell us what check_lm_sensors sees but since it parses the output of sensors it should work.
>
> Thanks again Matteo!
>
> /usr/local/nagios/libexec/check_lm_sensors --list
> LM_SENSORS OK - |
>
> That's the only output I get.

Just a thought: could it have something to do with my $LANG variable? It was set to en_US.UTF-8 earlier, and then in the output of sensors the centigrade sign appeared messed up. Once I set it with export LANG=en_EN it appears correctly. Could this be screwing up the regexes inside check_lm_sensors? If so, what's the workaround? Maybe I'm totally off the mark and this is just a red herring.

--
Rahul
Re: [Nagios-users] NRPE clutters /var/log/messages
On Sun, Feb 8, 2009 at 9:18 AM, Hiren Patel hir3npa...@gmail.com wrote:

> Rahul Nabar wrote:
>> My /var/log/messages shows hundreds of entries of this sort:
>>
>> Feb 6 23:33:00 star256 xinetd[15109]: START: nrpe pid=17610 from=:::11.0.0.100
>> Feb 6 23:33:01 star256 xinetd[15109]: EXIT: nrpe status=0 pid=17610 duration=1(sec)
>>
>> Are they just indicative of normal nrpe operations? If so, how can I disable them so as not to clutter my log? I do have debug=0 in my nrpe.conf. Why still these messages?
>
> looks normal for me. the messages seem like they come from xinetd though, you could look at:
> 1) xinetd logging options
> 2) getting inetd logging to a separate file using xinet/syslog configuration.

Thanks Hiren. I'll look into those. Otherwise it is too much data to be logged during normal operations!

--
Rahul
Re: [Nagios-users] NRPE and redundant calls to remote hosts.
On Sun, Feb 8, 2009 at 9:08 AM, Hiren Patel hir3npa...@gmail.com wrote:

> Marc Powell wrote:
>
> Passive checks with NSCA is pretty close, minus the 'if there is a status change' part. You could build that logic into whatever wrapper you are using to run the plugins on the remote host though. From the perspective of the nagios host, passive checks are much better than active checks.

Thanks Hiren and Marc!

> and if you're processing performance data for graphing or the like, you want the results submitted even if the service is okay.

True. But for some services I'd like to know much more quickly when something is wrong than when it is just sending performance data back for graphs. The passive approach seems perfect for this.

--
Rahul
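The wrapper logic mentioned above could look roughly like the following Python sketch. It assumes the send_nsca input convention of one tab-separated line per result (host, service description, return code, plugin output). The heartbeat interval is a hypothetical addition so that performance data still arrives while the state is unchanged, and both function names are made up for illustration.

```python
def nsca_line(host, service, return_code, output):
    """Format one passive service check result for send_nsca's stdin."""
    return f"{host}\t{service}\t{return_code}\t{output}\n"


def should_submit(previous_code, current_code, seconds_since_last, heartbeat=300):
    """Submit at once on a state change, otherwise only every `heartbeat` seconds."""
    return previous_code != current_code or seconds_since_last >= heartbeat
```

A wrapper on the remote host would run the plugin frequently, call `should_submit` with the last return code it sent, and pipe `nsca_line(...)` to send_nsca only when that returns True. This reports a problem as soon as it appears without sending every OK result.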
[Nagios-users] how is Service check Latency defined in nagios?
What exactly is the service check latency in Nagios? My processor load averages are still OK after I enabled PNP, but my latencies have shot through the roof. Should I be worried or not? I have latencies around 46,000 ms (about 46 seconds) and execution times of around 800 ms for my services. -- Rahul
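As a rough illustration of the two numbers in the question: latency is generally understood as how long a check started after the time it was scheduled for, while execution time is how long the plugin itself ran. The sketch below uses made-up timestamps in milliseconds, chosen to reproduce the figures quoted above; the function names are hypothetical.

```python
def check_latency_ms(scheduled_ms, started_ms):
    """How late the check started relative to its scheduled time."""
    return max(0, started_ms - scheduled_ms)


def execution_time_ms(started_ms, finished_ms):
    """How long the plugin itself took to run."""
    return finished_ms - started_ms


# Hypothetical timeline: scheduled at t=0, started 46 s late, ran for 0.8 s.
scheduled, started, finished = 0, 46_000, 46_800

latency = check_latency_ms(scheduled, started)
execution = execution_time_ms(started, finished)
print(latency, execution)
```

On this reading, a 46-second latency with sub-second execution means the checks are fast but wait a long time in the queue before they are started.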
Re: [Nagios-users] how is Service check Latency defined in nagios?
On Mon, Feb 9, 2009 at 1:31 PM, Marc Powell m...@ena.com wrote: - not allowing nagios to run sufficient concurrent checks for your configuration. running 'bin/nagios -s etc/nagios.cfg' will provide you with a recommendation. Make sure max_concurrent_checks is that high or higher.

Thanks Marc. I have: max_concurrent_checks=0 I ran the -s option and it produced a bunch of stats (see below). But I am not sure which line is the recommendation you refer to. In addition it does report some savings I could get by using the -x and -u options. Maybe I ought to enable those. Not sure whether those will give a latency reduction (good) or an execution time reduction (not so relevant for me). -- Rahul

/usr/local/nagios/bin/nagios -s /usr/local/nagios/etc/nagios.cfg

Nagios 3.0.6
Copyright (c) 1999-2008 Ethan Galstad (http://www.nagios.org)
Last Modified: 12-01-2008
License: GPL

Timing information on object configuration processing is listed below. You can use this information to see if precaching your object configuration would be useful.

Object Config Source: Config files (uncached)

OBJECT CONFIG PROCESSING TIMES (* = Potential for precache savings with -u option)
  Read: 0.012501 sec
  Resolve: 0.000822 sec *
  Recomb Contactgroups: 0.81 sec *
  Recomb Hostgroups: 0.004820 sec *
  Dup Services: 0.031199 sec *
  Recomb Servicegroups: 0.000847 sec *
  Duplicate: 0.03 sec *
  Inherit: 0.003707 sec *
  Recomb Contacts: 0.05 sec *
  Sort: 0.01 sec *
  Register: 0.028375 sec
  Free: 0.002792 sec
  TOTAL: 0.085155 sec * = 0.041487 sec (48.72%) estimated savings

RETENTION DATA TIMES
  Read and Process: 0.429982 sec
  TOTAL: 0.429982 sec

Timing information on configuration verification is listed below.

CONFIG VERIFICATION TIMES (* = Potential for speedup with -x option)
  Object Relationships: 0.103647 sec
  Circular Paths: 0.002239 sec *
  Misc: 0.003193 sec
  TOTAL: 0.109079 sec * = 0.002239 sec (2.1%) estimated savings

EVENT SCHEDULING TIMES
  Get service info: 0.023477 sec
  Get host info info: 0.000172 sec
  Get service params: 0.35 sec
  Schedule service times: 0.041946 sec
  Schedule service events: 0.021094 sec
  Get host params: 0.07 sec
  Schedule host times: 0.003610 sec
  Schedule host events: 0.008372 sec
  TOTAL: 0.098713 sec

Projected scheduling information for host and service checks is listed below. This information assumes that you are going to start running Nagios with your current config files.

HOST SCHEDULING INFORMATION
  Total hosts: 265
  Total scheduled hosts: 262
  Host inter-check delay method: SMART
  Average host check interval: 300.00 sec
  Host inter-check delay: 1.15 sec
  Max host check spread: 30 min
  First scheduled check: Mon Feb 9 13:50:01 2009
  Last scheduled check: Mon Feb 9 13:54:59 2009

SERVICE SCHEDULING INFORMATION
  Total services: 2073
  Total scheduled services: 2057
  Service inter-check delay method: SMART
  Average service check interval: 300.58 sec
  Inter-check delay: 0.15 sec
  Interleave factor method: SMART
  Average services per host: 7.82
  Service interleave factor: 8
  Max service check spread: 30 min
  First scheduled check: Mon Feb 9 13:50:38 2009
  Last scheduled check: Mon Feb 9 13:55:40 2009

CHECK PROCESSING INFORMATION
  Check result reaper interval: 10 sec
  Max concurrent service checks: Unlimited

PERFORMANCE SUGGESTIONS
  I have no suggestions - things look okay.
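The scheduling figures in this output can be reproduced with a little arithmetic. The sketch below assumes the SMART method works as the Nagios documentation on check scheduling describes it: inter-check delay is the average check interval divided by the number of scheduled checks, and the interleave factor is the average number of services per host rounded up. The function names are hypothetical; the input numbers are taken from the output above.

```python
import math


def smart_inter_check_delay(avg_interval_sec, scheduled_checks):
    """Assumed SMART rule: spread all checks evenly over the average interval."""
    return avg_interval_sec / scheduled_checks


def smart_interleave_factor(total_services, total_hosts):
    """Assumed SMART rule: ceiling of the average number of services per host."""
    return math.ceil(total_services / total_hosts)


service_delay = smart_inter_check_delay(300.58, 2057)
host_delay = smart_inter_check_delay(300.00, 262)
interleave = smart_interleave_factor(2073, 265)

print(round(service_delay, 2), round(host_delay, 2), interleave)
print(round(1 / service_delay, 1), "service checks started per second on average")
```

The results agree with the reported 0.15 sec, 1.15 sec and interleave factor of 8, which suggests the projected schedule itself is unremarkable for roughly 2000 services on a five-minute interval.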
Re: [Nagios-users] how is Service check Latency defined in nagios?
On Mon, Feb 9, 2009 at 3:58 PM, Max perld...@webwizarddesign.com wrote: Rahul, On Mon, Feb 9, 2009 at 2:53 PM, Rahul Nabar rpna...@gmail.com wrote: Thanks Marc. I have: max_concurrent_checks=0 Our experience has been that with max_concurrent_checks set to 0 and inter-check delay and nagios sleep set very low we get high reported service check latencies as we are basically asking Nagios to try and run everything as soon as possible ... 1000s of checks over a few seconds in essence ... which it can't do. As far as 'real life' negative impact the high latency in this singular case hasn't meant much; it initially really worried me until i realized that the high service latency is just happening because we are basically telling nagios to pause / sleep / wait for as little time as possible and run things as quickly as possible. We have around a 146 second service check latency but from our detailed Nagios metrics we see that check runs are completing in right around 4 minutes, under our 5 minute hard-ceiling (around 6000 checks). our PNP performance graphs prove our suspicions .. our reporting server receives 6000 metrics in 4 minutes or less and we have no gaps in our graphs or major under or over sampling problems with the data we retrieve from our remote agents. I only bring that up because if you not only have max_concurrent_checks set to 0 but also have tuned way down inter-check delay settings and sleep time you might be encountering the same situation and the high latency might not be something to worry about .. but only IF you have all your delays tuned very low and no ceiling on max checks. for any other situation it is definitely something to investigate. Thanks Max. That is a pretty intricate issue that I had no idea about! I'm still trying to figure out the exact implications of what you describe. Maybe I need to visit the Nagios manual again to re-read nagios's scheduling logic. 
It's especially important to me now that I also have PNP collecting performance stats. Meanwhile, this is a dump of the relevant parameters you mention. I don't recall changing any of them from their defaults. Maybe I ought to, in light of what you said?

service_inter_check_delay_method=s
host_inter_check_delay_method=s
sleep_time=0.25
#Timeouts:
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5

-- Rahul
Re: [Nagios-users] how is Service check Latency defined in nagios?
On Mon, Feb 9, 2009 at 4:52 PM, Max perld...@webwizarddesign.com wrote: Yes, definitely do that. I talk about how my team set up nagios and PNP to minimize delays in polling on my blog, though be warned that we break some of the rules that the documentation says to always follow, like doing a fork() in a NEB module and setting inter-check delay methods to n .. none. so while it works for us I know that a number of people on this list would probably balk at how we did things and call us idiots :). Thanks again Max! I think sometimes one is forced to disobey the standard prescriptions! Maybe that is idiotic but whatever works! :) http://www.semintelligent.com/blog/ Thanks for the blog. Just found a very useful snippet there: ps -e -a -x -f -o %u | sort | uniq -c | sort -rn If I use this I find that the number of nagios-owned processes fluctuates a lot. Suddenly it goes as high as 54, and then for a while nagios owns only 3 processes. Then it shoots up again. Very interesting. Maybe that is the phenomenon you were referring to? I should probably wrap it in a bash wrapper and get it to graph the nagios processes at 1 sec resolution to get a finer-grained idea of what is going on! -- Rahul
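The per-user count from that pipeline is easy to reproduce in a script that could later be run once a second. This is only a sketch: count_by_user is a hypothetical helper, and the sample text stands in for real ps output with one user name per line.

```python
from collections import Counter


def count_by_user(ps_output):
    """Count processes per owner from text containing one user name per line."""
    return Counter(line.strip() for line in ps_output.splitlines() if line.strip())


# Hypothetical sample standing in for the output of a ps call.
sample = "root\nnagios\nnagios\napache\nnagios\n"

counts = count_by_user(sample)
print(counts["nagios"], counts.most_common(1))
```

On a Linux host with procps, the input could probably be obtained with subprocess.run(["ps", "-eo", "user="], capture_output=True, text=True).stdout and sampled in a loop with time.sleep(1); that part is an assumption about the local ps and is not exercised here.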
Re: [Nagios-users] One Time Check
On Fri, Feb 6, 2009 at 10:59 AM, Marc Powell m...@ena.com wrote: set the correct check_interval in the service definition. http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#service check_interval 1440 # 1440 minutes, i.e. once every 24 hours. Unless it is important to control *when* the check runs within a 24 hour period? -- Rahul
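The 1440 figure only equals one day if the interval unit is one minute. A small sketch of the conversion, assuming the usual interval_length of 60 seconds in nagios.cfg (the function name is hypothetical):

```python
def check_interval_seconds(check_interval, interval_length=60):
    """Convert a check_interval in time units to seconds."""
    return check_interval * interval_length


seconds = check_interval_seconds(1440)
print(seconds, seconds / 3600)
```

If interval_length has been changed from 60, the same check_interval value would correspond to a different wall-clock period.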
[Nagios-users] NRPE and redundant calls to remote hosts.
I've been adding a bunch of checks via NRPE on remote nodes and this got me thinking. Isn't it inefficient to keep starting check_nrpe calls from the monitoring host all the time? Why can't nrpe on the remote node monitor some of the local services and only send a message back to nagios if there is a status change? For warnings based on things like disk usage, cpu usage, total procs, pbs scheduler daemon status, cpu temperatures etc., couldn't this approach relieve the central host's cpu of a lot of endless check_nrpe calls? NRPE is already doing the work; it's just a question of what initiates a communication channel between NRPE and nagios. Just curious. Or maybe there is a way of achieving this already that I don't know of! -- Rahul
Re: [Nagios-users] One Time Check
On Fri, Feb 6, 2009 at 11:57 AM, Marc Powell m...@ena.com wrote: On Feb 6, 2009, at 11:08 AM, Rahul Nabar wrote: check_interval 1440 # 1440 minutes, i.e. once every 24 hours. Unless it is important to control *when* the check runs within a 24 hour period? The OP didn't state any such requirement. You are right. I over-assumed. -- Rahul
Re: [Nagios-users] Nagios remote GUI interface
On Fri, Feb 6, 2009 at 11:47 AM, michael.washing...@fitchratings.com wrote: Just completed an upgrade, from Nagios 1.x to 3.x on separate desktop devices however. On replacement device, I can locally load GUI using http specified to either localhost or actual static ip, but cannot load remotely as I was able to on 1.x device. 1.x was loaded on REL enterprise 3.x level while 3.x was through Fedora 9/SELinux. My browser fails to present an authentication window for the 3.x device. I thought it was the Linux firewall config which I have temporarily disabled while resolving, but still same problem. Any thoughts? Maybe SELinux is blocking it? Try setenforce 0 just to check? Maybe you already did. Just a thought. -- Rahul
Re: [Nagios-users] Nagios remote GUI interface
On Fri, Feb 6, 2009 at 12:49 PM, michael.washing...@fitchratings.com wrote: Just performed it...but still no luck Another random idea. Can you open any pages at all if they reside on the new machine? Just wondering if it's an Apache (etc.) issue. I had a bunch of restrictive conditions in my /etc/httpd/conf/httpd.conf about who could access what pages. -- Rahul
[Nagios-users] NRPE clutters /var/log/messages
My /var/log/messages shows hundreds of entries of this sort: Feb 6 23:33:00 star256 xinetd[15109]: START: nrpe pid=17610 from=:::11.0.0.100 Feb 6 23:33:01 star256 xinetd[15109]: EXIT: nrpe status=0 pid=17610 duration=1(sec) Are they just indicative of normal nrpe operations? If so, how can I disable them so as not to clutter my log? I do have debug=0 in my nrpe.conf. Why still these messages? -- Rahul
[Nagios-users] posting graphs of pnp4nagios performance stats.: latencies of host and service checks are terribly degraded
After I enabled pnp4nagios my service and host latencies have shot up disastrously! The transition around midnight yesterday (when I got pnp running) is amazing! Just posting two graphs here in case it helps anybody else: http://picasaweb.google.com/rpnabar/Nagios_debug?feat=directlink Either: (A) I am doing something stupidly bad with the default mode OR (B) I really need to go to the bulk mode. I would really appreciate any other tips / stories users have for me! -- Rahul
Re: [Nagios-users] nagios and lm_sensors
On Wed, Feb 4, 2009 at 12:46 AM, Matteo Corti matteo.co...@id.ethz.ch wrote: Dear Rahul, You need the Nagios::Plugin and Nagios::Plugin::Threshold Perl modules which you can get on CPAN http://search.cpan.org/~tonvoon/Nagios-Plugin-0.31/lib/Nagios/Plugin.pm I did install 'Nagios::Plugin' from the link. Is there a separate 'Nagios::Plugin::Threshold' module to be installed or is it included by default? It seems I still don't have success. 'perl Makefile.PL' says: Warning: prerequisite Class::Accessor 0 not found. Warning: prerequisite Config::Tiny 0 not found. Warning: prerequisite Math::Calc::Units 0 not found. Warning: prerequisite Params::Validate 0 not found. Writing Makefile for Nagios::Plugin make doesn't complain *but* make test is a complete disaster! Failed 16/16 test scripts, 0.00% okay. 715/718 subtests failed, 0.42% okay. make: *** [test_dynamic] Error 255 I must still be doing something stupid! Any other suggestions? -- Rahul
[Nagios-users] added PNP in the default mode; how serious is the performance degradation to warrant a switch to the bulk mode?
So, I finally succeeded in configuring PNP for Nagios (whew!!)! It's been a long bloody battle but I think I eventually won! :) I've just added PNP performance graphs to my 4 switches for now. I am a bit hesitant about adding it to all my 300 hosts due to all the caveats about performance and PNP in the default mode. Any other PNP users? How many services / hosts are you running PNP performance graphs on? How is your performance? Have you been forced to switch to the bulk mode already? -- Rahul
Re: [Nagios-users] nagios and lm_sensors
Ah! Ok. Yes. I can dig deeper and install them all from CPAN. Didn't realize the dependencies were so many! Thanks again! -- Rahul On Wed, Feb 4, 2009 at 11:22 PM, Matteo Corti matteo.co...@id.ethz.ch wrote: Dear Rahul, On Feb 5, 2009, at 5:23, Rahul Nabar wrote: On Wed, Feb 4, 2009 at 12:46 AM, Matteo Corti matteo.co...@id.ethz.ch wrote: Dear Rahul, You need the Nagios::Plugin and Nagios::Plugin::Threshold Perl modules which you can get on CPAN http://search.cpan.org/~tonvoon/Nagios-Plugin-0.31/lib/Nagios/Plugin.pm I did install 'Nagios::Plugin' from the link. Is there a separate 'Nagios::Plugin::Threshold' module to be installed or is it included by default? It seems I still don't have success. 'perl Makefile.PL' says: Warning: prerequisite Class::Accessor 0 not found. Warning: prerequisite Config::Tiny 0 not found. Warning: prerequisite Math::Calc::Units 0 not found. Warning: prerequisite Params::Validate 0 not found. Writing Makefile for Nagios::Plugin make doesn't complain *but* make test is a complete disaster! Failed 16/16 test scripts, 0.00% okay. 715/718 subtests failed, 0.42% okay. make: *** [test_dynamic] Error 255 I must still be doing something stupid! Any other suggestions? All the warnings are telling you that you are missing several *needed* modules (e.g., Class::Accessor, Config::Tiny, ...) You can install them via CPAN or maybe you can already find them packaged for your OS. Consult the documentation of the Perl distribution you are using. Matteo -- ETH Zurich, Dr. Matteo Corti, Informatikdienste / Basisdienste STC E 13, Stampfenbachstrasse 67, 8092 Zurich Tel +41 44 6327944, http://www.id.ethz.ch
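Matteo's point, that each warning names a module which still has to be installed, can be turned into a quick list. The sketch below is illustrative only: the regular expression is an assumption based on the four warning lines quoted in this thread, and missing_modules is a hypothetical helper.

```python
import re

# Assumed format, taken from the 'perl Makefile.PL' warnings quoted above.
PREREQ = re.compile(r"Warning: prerequisite (\S+) \S+ not found\.")


def missing_modules(makefile_pl_output):
    """Return the module names reported as missing prerequisites, in order."""
    return PREREQ.findall(makefile_pl_output)


output = (
    "Warning: prerequisite Class::Accessor 0 not found. "
    "Warning: prerequisite Config::Tiny 0 not found. "
    "Warning: prerequisite Math::Calc::Units 0 not found. "
    "Warning: prerequisite Params::Validate 0 not found. "
    "Writing Makefile for Nagios::Plugin"
)

modules = missing_modules(output)
print(modules)
```

Each name in the resulting list is a candidate for installation from CPAN or from the OS package repository, as suggested in the reply.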
Re: [Nagios-users] nagios and lm_sensors
On Wed, Feb 4, 2009 at 11:22 PM, Matteo Corti matteo.co...@id.ethz.ch wrote: All the warnings are telling you that you are missing several *needed* modules (e.g., Class::Accessor, Config::Tiny, ...) You can install them via CPAN or maybe you can already find them packaged for your OS. Consult the documentation of the Perl distribution you are using.

Thanks for all the help thus far Matteo! Maybe I can bug you one last time! I think I have success in getting Nagios::Plugin etc. working. But when I come back and try to compile check_lm_sensors it has only one last complaint (I solved all the other previous dependency issues):

perl Makefile.PL
Cannot determine perl version info from check_lm_sensors.pod
WARNING: INSTALLSITESCRIPT is not a known parameter.
'INSTALLSITESCRIPT' is not a known MakeMaker parameter name.
Writing Makefile for check_lm_sensors

I'm not sure if this is a problem or not? If I just go ahead, then make and make install silently proceed. But if I try using check_lm_sensors it crashes badly!

/usr/lib/nagios/plugins/contrib/check_lm_sensors
Can't locate version.pm in @INC (@INC contains: /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.7/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.7/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.6/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl /usr/lib/perl5/5.8.8/i386-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /usr/lib/nagios/plugins/contrib/check_lm_sensors line 33.
BEGIN failed--compilation aborted at /usr/lib/nagios/plugins/contrib/check_lm_sensors line 33.

Any more pointers? -- Rahul
Re: [Nagios-users] nagios and lm_sensors
On Thu, Feb 5, 2009 at 12:14 AM, Matteo Corti matteo.co...@id.ethz.ch wrote: Yes you are missing another module 'version'. I forgot to put it in the list of requirements since it is usually part of most Perl distributions. Awesome. It works! Thanks for all the help Matteo! I'm one step closer to monitoring my remote server cpu temperatures via nrpe + check_lm_sensors + nagios. This was a statistic we have been lacking on our HPC cluster for a long long time! My Perl distribution probably needs updating. I think we have been lazy about this! -- Rahul
[Nagios-users] nagios and lm_sensors
I just installed the lm_sensors package on my remote machines to get their temperatures. sensors works OK. I wanted to use check_nrpe and somehow add the remote machine temps to my nagios webpage. I found the check_sensors command but that only returns sensor ok. I suppose the check_lm_sensors plugin ( http://www.nagiosexchange.org/cgi-bin/page.cgi?g=Detailed%2F1289.html;d=1) is what I need? Are there other users of this? I did try compiling it but am running into problems following the instructions in the INSTALL file: Cannot determine perl version info from check_lm_sensors.pod WARNING: INSTALLSITESCRIPT is not a known parameter. Warning: prerequisite Nagios::Plugin 0 not found. Warning: prerequisite Nagios::Plugin::Threshold 0 not found. 'INSTALLSITESCRIPT' is not a known MakeMaker parameter name. Writing Makefile for check_lm_sensors Do I need to configure more prerequisites before I can get this running? -- Rahul
Re: [Nagios-users] SNMP monitoring of a Dell switch: snmpwalk succeds but check_snmp fails.
On Thu, Jan 29, 2009 at 8:40 PM, Max perld...@webwizarddesign.com wrote: Thanks for the detailed comments Max! The OID that maps to ifOperStatus from RFC1213-MIB is 1.3.6.1.2.1.2.2.1.8 So grep for 1.3.6.1.2.1.2.2.1.8.1 No luck. No such string in there. if the interface you want to look at is indeed at index 1 :). Ah! Now I am lost! What do you mean by this? Sorry, I am a networks newbie, and SNMP especially is all Greek and Latin to me! Actually I am not even sure what I should be monitoring on a switch. I was just using the example from the nagios tutorial for now. Maybe its alive/dead status; bandwidth of individual ports (but that's mrtg's job, right?); dropped packets; some thermal events? How does one go about this? What are other users monitoring on their switches, and how does one go about translating the fairly cryptic SNMP fields into something usable? Should I dig into my Dell switch manuals? Or is this reinventing the wheel and Nagios has an automated way to achieve this already? If you are just running this for one port on one switch, then loading the MIB is no biggie, if you plan to monitor hundreds or thousands of ports, would be better to use the numeric form of the OID and run an ePN plugin using the perl Net::SNMP or NSNMP library or a plugin that implements the C Net-SNMP libraries directly The maximum I'll end up monitoring is perhaps 4 switches with 48 ports each. So from your stats this should be on the fairly low side. -- Rahul
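Max's remark about "index 1" refers to how the per-interface OID is formed: the column OID for ifOperStatus is followed by the interface's index in the interface table. A minimal sketch, using only the OID quoted above and the three status values defined in RFC 1213; the helper names are hypothetical.

```python
IF_OPER_STATUS = "1.3.6.1.2.1.2.2.1.8"  # ifOperStatus column, as quoted in the thread

# Status values from RFC 1213; newer MIBs define more, which fall back to 'other' here.
STATUS = {1: "up", 2: "down", 3: "testing"}


def oper_status_oid(if_index):
    """Numeric OID for the operational status of one interface."""
    return f"{IF_OPER_STATUS}.{if_index}"


def describe(value):
    """Render a status value the way check_snmp printed it, e.g. down(2)."""
    return f"{STATUS.get(value, 'other')}({value})"


print(oper_status_oid(1), describe(2))
print([oper_status_oid(i) for i in range(1, 4)])
```

Read this way, the "down(2)" shown in the web interface would simply mean that whatever interface sits at index 1 has no link; the interface numbering on a given switch is not necessarily the same as the front-panel port numbering.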
[Nagios-users] SNMP monitoring of a Dell switch: snmpwalk succeds but check_snmp fails.
I was trying to monitor my Dell PowerConnect switch via nagios. I used the default templates and have this check_command in my switch.cfg: check_command check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB Unfortunately the web-interface shows: SNMP CRITICAL - *down(2)* Now I tried a naked SNMP query on this switch: snmpwalk -v1 -c public switch3 -m ALL .1 > /tmp/switch3.snmp.log The switch does respond with reams of output! But I am not really sure what I should be looking for in there. I tried grepping for ifOperStatus.1 but that is not to be found. Any other suggestions on how to monitor this recalcitrant Dell switch? -- Rahul
Re: [Nagios-users] Ways and tweaks to make nagios more efficient. load average on monitoring host edging up.
On Tue, Jan 27, 2009 at 5:58 PM, Jake jakepau...@gmail.com wrote:

> I use ping as both a service check and a host check because i want to ping all of the time to measure latency, etc. I wouldn't think so much about eliminating service checks that aren't directly redundant as much as making sure the checks you do are as fast as possible.

Thanks, Jake! I'll heed the advice. I wasn't sure which parts were most worth tackling to gain efficiency.

> Specifically, look for any service check that takes longer than a second.

Is there a place where it logs how long a service check took? How do you usually find out? I can only see when it was last checked on my interface, but not how long it took.

> Also make sure your timeouts are set low as this can easily be a source for high load averages - e.g. if you consider 500ms latency on the ping service to be critical then why not set your timeout value to one or two seconds instead of 10 (which is the default for check_ping).

Is service_check_timeout=60 in the main config file the timeout that you are talking about? I might be mistaking what you mean. Shouldn't this matter only for the nodes that *do* have a latency problem? I hope these will remain a minor fraction. The major chunk will be the ones that respond within the timeout but are still a *lot* of work. How does it work out that the timeout made such a huge difference for you?

> That single change for check_ping made a huge difference for me and that was before I started even looking at other services like my check_dell-hardware and check_hp-hardware which were awfully slow prior to rewriting them (now available on nagiosexchange.)
>
> -- Jake Paulus jakepau...@gmail.com
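On the question of where check durations are recorded: as far as I know, Nagios 3 writes a check_execution_time field (in seconds) for every service into its status file, and the per-service detail page in the web interface shows the same figure. The sketch below filters status-file text for services slower than a limit. The sample data is invented, and the status file path in the comment is only the usual default for a source install, so treat both as assumptions.

```bash
#!/bin/sh
# Sketch: print "<host> <service> <seconds>" for services whose last check
# ran longer than a limit. Reads status.dat-style text on stdin.
# Live use would look something like:
#   slow_services 1 < /usr/local/nagios/var/status.dat   # path is an assumption

slow_services() {
    limit="$1"
    awk -F= -v limit="$limit" '
        $0 ~ /servicestatus[ \t]*[{]/ { in_svc = 1; host = ""; svc = ""; next }
        $0 ~ /^[ \t]*[}]/             { in_svc = 0; next }
        in_svc {
            key = $1
            sub(/^[ \t]+/, "", key)               # strip leading indentation
            if (key == "host_name")           host = $2
            if (key == "service_description") svc  = $2
            if (key == "check_execution_time" && $2 + 0 > limit + 0)
                print host, svc, $2
        }'
}

# Made-up sample data, for illustration only.
sample_status() {
    cat <<'EOF'
servicestatus {
    host_name=star01
    service_description=PING
    check_execution_time=4.012
    }
servicestatus {
    host_name=star02
    service_description=SSH
    check_execution_time=0.031
    }
EOF
}

sample_status | slow_services 1
```

Sorting that output by the last column gives a quick list of the checks worth speeding up first.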
Re: [Nagios-users] Ways and tweaks to make nagios more efficient. load average on monitoring host edging up.
On Wed, Jan 28, 2009 at 1:32 AM, Kyle O'Donnell kyleodonn...@gmail.com wrote:

> I use service deps. Most of my services are nrpe checks and I create a dep on nrpe. If a check comes back critical (or which ever state you choose to execute the dep) it does an nrpe check, if nrpe returns critical (or whichever state you choose) it stops executing the services dependant on nrpe. My load is less than 2 on a machine with 800 hosts and 6000 services. Active host checks are disabled.

So, you have no active checks at all? Or just no active host checks? I am a bit confused. All my checks are active. How does one disable active host checks? And then when will the host check be done at all?

> As for ping I don't check as a service only a host check which gets executed if any service turns critical.

That might be the exact functionality I was thinking of. If I look under Host Status Details for all host groups, I see very recent and regular checks being done on all my hosts under the Last Check column, even on ones that do not have any critical services. Or will I only see the behavior you describe after I somehow disable active host checks?

> You can use check_ssh as the host check command instead of ping if you prefer as well.

Good idea. But I still want ping to fall back on: only if ssh fails, then ping. Is that logical?

-- Rahul
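The service-dependency approach described above might look roughly like the fragment below. This is a sketch, not Kyle's actual configuration: the master service name NRPE and the command name check_nrpe_alive are hypothetical, while the host star177, the template rpn_intermediate_service and the dependent service come from other posts in this archive.

```
# Hypothetical master service: a plain check_nrpe call with no -c argument
# only tests whether the NRPE daemon answers.
define service{
        use                     rpn_intermediate_service
        host_name               star177
        service_description     NRPE
        check_command           check_nrpe_alive    ; hypothetical command name
        }

# While NRPE is CRITICAL or UNKNOWN, do not execute or notify for the
# dependent service on the same host.
define servicedependency{
        host_name                       star177
        service_description             NRPE
        dependent_host_name             star177
        dependent_service_description   /scratch Partition on nodes
        execution_failure_criteria      c,u
        notification_failure_criteria   c,u
        }
```

One dependency definition is needed per dependent service (or a comma-separated list in dependent_service_description), so for six services per node it is worth generating these from a template.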
Re: [Nagios-users] Ways and tweaks to make nagios more efficient. load average on monitoring host edging up.
On Tue, Jan 27, 2009 at 6:04 PM, Mathieu Gagné mga...@iweb.com wrote:

> We have +2000 hosts and +4700 services configured on one of our Nagios instance. Load average is between 1.3 an 2.0 which I find acceptable.

Wow. That's way bigger than what I have. Mine's a cluster of 256 machines with around 6 services checked on each. I have the advantage that most are on a local LAN, so there are no internet connectivity issues or external bandwidth bottlenecks.

> The SSH service state can be CRITICAL while all the other services are still OK. (ie. ssh server misconfiguration) You probably want to be informed about it too.

True. But if SSH is down, will NRPE still work? Or are they totally independent?

> What kind of server are you using?

Intel(R) Xeon(TM) CPU 2.80GHz dual core. 2 GB RAM

> Also, what's the check_interval? A 1 minute interval might put the server on its knee since it would be scheduling and executing 1536 checks per minute. (as per your informations)

nagios.cfg:
command_check_interval=-1

services.cfg:
normal_check_interval 5
retry_check_interval 1

-- Rahul
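The 1536 figure above is simply 256 hosts times 6 services. A quick way to see what the interval does to the average check rate is sketched below; it assumes one interval unit is 60 seconds (the Nagios default interval_length) and ignores retries and host checks.

```bash
#!/bin/sh
# Sketch: average active service checks per second.
# Arguments: hosts, services per host, check interval in minutes.
checks_per_second() {
    awk -v h="$1" -v s="$2" -v m="$3" \
        'BEGIN { printf "%.2f\n", (h * s) / (m * 60) }'
}

checks_per_second 256 6 5   # normal_check_interval 5, as configured above
checks_per_second 256 6 1   # the 1-minute case Mathieu warns about
```

With the 5-minute interval the monitoring host averages about 5 checks per second; at 1 minute it would be about 25.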
Re: [Nagios-users] Ways and tweaks to make nagios more efficient. load average on monitoring host edging up.
On Wed, Jan 28, 2009 at 12:32 PM, Marc Powell m...@ena.com wrote:

> Just out of curiosity, what is the magnitude of the 'edging upwards' that you are seeing?

It's not bad right now, but the trend is what I am wary about. My load averages are around 3. But I am still planning on adding more hosts and services, and I thought it best to investigate early on whether I was doing things efficiently, before it came to a critical point.

> Just about any hardware released in the past 5 years or so should have no problems with that number of checks at all if they're 'normal' (base nagios-plugins) and run at a normal interval (5 min). Even older hardware could probably do it. What are they types of checks you are performing? How often?

Intel(R) Xeon(TM) CPU 2.80GHz dual core. 2 GB RAM. It's about 5 years old now, I think. The checks I have are:

check_ssh
check_ping

NRPE:
check_nrpe!check_load
check_nrpe!check_total_procs
check_nrpe!check_disk
check_nrpe!check_disk_scratch
check_nrpe!check_pbsmom
check_nrpe!check_time_node

> Are they perl checks and do you have the embedded perl interpreter (ePN) enabled?

I don't think I have that enabled. I think it is disabled by default at compile time.

-- Rahul
Re: [Nagios-users] Ways and tweaks to make nagios more efficient. load average on monitoring host edging up.
On Wed, Jan 28, 2009 at 4:34 PM, Marc Powell m...@ena.com wrote:

> On Jan 28, 2009, at 2:21 PM, Rahul Nabar wrote:
>
>> Intel(R) Xeon(TM) CPU 2.80GHz dual core. 2 GB RAM Its about 5 years old now I think.

A minor correction: mine is just a hyperthreaded machine. I don't think it has two real cores, but it still shows up as twin CPUs. In case it matters.

-- Rahul
Re: [Nagios-users] Ways and tweaks to make nagios more efficient. load average on monitoring host edging up.
On Wed, Jan 28, 2009 at 5:07 PM, Mathieu Gagné mga...@iweb.com wrote:

> According to cpubenchmark.net, my el cheapo CPU is better than yours:
>
> Intel Xeon 2.80GHz
> Score: 495 Rank: 281
> Link: http://www.cpubenchmark.net/cpu_lookup.php?cpu=Intel+Xeon+2.80GHz
>
> Intel Core2 4300 @ 1.80GHz
> Score: 983 Rank: 170
> Link: http://www.cpubenchmark.net/cpu_lookup.php?cpu=Intel+Core2+4300+%40+1.80GHz
>
> Xeon isn't always better. Sorry. :-(

Haha! I guess I have to live with that for now! Too bad!

-- Rahul
[Nagios-users] Ways and tweaks to make nagios more efficient. load average on monitoring host edging up.
I set up my Nagios system to monitor 256-odd nodes, each with about 6 services (direct and NRPE). It is working fine, but my load averages have started edging upwards. Not critical yet, but I wanted some tips to make things more efficient and to see if there are things I might have done inefficiently.

One of the points I identified is this: I am doing a ping and an ssh check on each server. This seems redundant. Is there a way to set it up so that Nagios does an ssh check first; if this succeeds, ping is obviously OK; if it fails, do a ping check and report on that?

How about the other way around too? I have a bunch of NRPE checks: load_average, total-processes, scratch and home dir usage, pbs_mom, ntp_time. If ssh fails, then there is obviously no reason to try these other checks, right? But I think the monitoring host still wastes its cycles trying them (based on the Last Check time).

Any tips on how I can achieve these efficiency tweaks? Or is there a problem in my strategy? Any other performance tweaks so that I can squeeze out every ounce of Nagios performance? I am already using NRPE rather than check_by_ssh, since I was told the latter might be inefficient in terms of load on the monitoring host.

-- Rahul
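The "ssh first, ping only if ssh fails" idea can be sketched as a small wrapper plugin. This is one possible approach, not something the standard plugins provide as far as I know; the function name is made up, and the plugin paths and thresholds in the comment are assumptions based on a default source install.

```bash
#!/bin/sh
# Sketch: run a primary check; only if it fails, run the fallback check and
# report the fallback's output and exit code instead.
# Both arguments are complete command lines given as strings.
check_with_fallback() {
    primary="$1"
    fallback="$2"
    if out="$(sh -c "$primary" 2>&1)"; then
        echo "$out"
        return 0
    fi
    out="$(sh -c "$fallback" 2>&1)"
    rc=$?
    echo "$out"
    return "$rc"
}

# Intended use as a host check (paths and thresholds are assumptions):
#   check_with_fallback \
#       "/usr/local/nagios/libexec/check_ssh -H star01" \
#       "/usr/local/nagios/libexec/check_ping -H star01 -w 100.0,20% -c 500.0,60%"

# Demonstration with stand-in commands:
check_with_fallback "echo 'SSH OK'" "echo 'PING OK'"
```

Note the trade-off: with this wrapper a dead sshd on a pingable host still reports the host as up, so sshd itself would need its own service check if that failure matters.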
[Nagios-users] NRPE installation fails since check_nrpe plugin is not found in the libexec directory
I could successfully install nagios-plugins-1.4.13.tar.gz, but when I try installing nrpe-2.8 the make install-plugin step fails. It does not seem to find the check_nrpe file in the libexec directory. All the other check_* files seem to be present, but not this one! Any suggestions as to what I could be messing up? The relevant error is below.

-- Rahul

[r...@star177 nrpe-2.8]# make install-plugin
cd ./src/ make install-plugin
make[1]: Entering directory `/usr/local/src/nagios_nodes/downloads/nrpe-2.8/src'
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/libexec
/usr/bin/install -c -m 775 -o nagios -g nagios check_nrpe /usr/local/nagios/libexec
/usr/bin/install: cannot stat `check_nrpe': No such file or directory
make[1]: *** [install-plugin] Error 1
make[1]: Leaving directory `/usr/local/src/nagios_nodes/downloads/nrpe-2.8/src'
make: *** [install-plugin] Error 2
Re: [Nagios-users] NRPE installation fails since check_nrpe plugin is not found in the libexec directory
On Mon, Jan 26, 2009 at 1:38 PM, Andy Shellam andy-li...@networkmail.eu wrote:

> Hi Rahul, The error it's returning suggests that check_nrpe is not in the src subdirectory of the current directory (/usr/local/src/nagios_nodes/downloads/nrpe-2.8/src.) It looks as if the plugin has not been built. Do you in fact have check_nrpe in the above mentioned directory? What was the final output from make ? I'm guessing it threw an error.

Thanks again, Andy! I don't seem to have check_nrpe, neither here nor in /usr/local/nagios/libexec. All the other plugins seem to be present there, though! But not this one.

You are right. make did throw errors (I got carried away by my previous successful installs and did not notice!). A snippet is attached below, but it seems to be a bunch of SSL errors that prevent it from compiling check_nrpe. I traced back further and checked config.log for nagios-plugins-1.4.13, and it shows:

configure:24258: WARNING: OpenSSL or GnuTLS libs could not be found or were disabled

I'm not sure why, though! yum info openssl.i686 openssl-devel.i386 shows both of those packages as installed. Any other suggestions?

-- Rahul

/usr/include/openssl/bn.h:287: error: expected specifier-qualifier-list before ‘BN_ULONG’
/usr/include/openssl/bn.h:303: error: expected specifier-qualifier-list before ‘BN_ULONG’
/usr/include/openssl/bn.h:449: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘BN_mod_word’
/usr/include/openssl/bn.h:450: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘BN_div_word’
/usr/include/openssl/bn.h:451: error: expected declaration specifiers or ‘...’ before ‘BN_ULONG’
/usr/include/openssl/bn.h:452: error: expected declaration specifiers or ‘...’ before ‘BN_ULONG’
/usr/include/openssl/bn.h:453: error: expected declaration specifiers or ‘...’ before ‘BN_ULONG’
/usr/include/openssl/bn.h:454: error: expected declaration specifiers or ‘...’ before ‘BN_ULONG’
/usr/include/openssl/bn.h:455: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘BN_get_word’
/usr/include/openssl/bn.h:470: error: expected declaration specifiers or ‘...’ before ‘BN_ULONG’
/usr/include/openssl/bn.h:743: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘bn_mul_add_words’
/usr/include/openssl/bn.h:744: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘bn_mul_words’
/usr/include/openssl/bn.h:745: error: expected ‘)’ before ‘*’ token
/usr/include/openssl/bn.h:746: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘bn_div_words’
/usr/include/openssl/bn.h:747: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘bn_add_words’
/usr/include/openssl/bn.h:748: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘bn_sub_words’
In file included from /usr/include/openssl/ssl.h:978,
                 from ../include/config.h:228,
                 from ../include/common.h:24,
                 from utils.c:32:
/usr/include/openssl/ssl3.h:303: error: expected specifier-qualifier-list before ‘PQ_64BIT’
In file included from /usr/include/openssl/dtls1.h:64,
                 from /usr/include/openssl/ssl.h:980,
                 from ../include/config.h:228,
                 from ../include/common.h:24,
                 from utils.c:32:
/usr/include/openssl/pqueue.h:73: error: expected specifier-qualifier-list before ‘PQ_64BIT’
/usr/include/openssl/pqueue.h:80: error: expected ‘)’ before ‘priority’
/usr/include/openssl/pqueue.h:89: error: expected declaration specifiers or ‘...’ before ‘PQ_64BIT’
In file included from /usr/include/openssl/ssl.h:980,
                 from ../include/config.h:228,
                 from ../include/common.h:24,
                 from utils.c:32:
/usr/include/openssl/dtls1.h:92: error: expected specifier-qualifier-list before ‘PQ_64BIT’
make[1]: *** [nrpe] Error 1
make[1]: Leaving directory `/usr/local/src/nagios_nodes/downloads/nrpe-2.8/src'
Re: [Nagios-users] NRPE installation fails since check_nrpe plugin is not found in the libexec directory
> Yes, check to ensure gnutls is also installed (rpm -q | grep tls) , and also run ldconfig -v | grep ssl and just be sure it can see the openssl-related *.so file(s) ;)

Thanks, Jamie! I'm stumped here and am doing all the checks possible to sniff out what my problems are! I am not very sure, but the output below seems to indicate that we have what we need, right?

rpm -q gnutls
gnutls-1.6.3-2.fc8
gnutls-1.6.3-2.fc8

ldconfig -v | grep ssl
libssl.so.6 -> libssl.so.0.9.8b
libssl.so.6 -> libssl.so.0.9.8b
libssl3.so -> libssl3.so
libgnutls-openssl.so.13 -> libgnutls-openssl.so.13.3.0
libssl3.so -> libssl3.so
libgnutls-openssl.so.13 -> libgnutls-openssl.so.13.3.0

> and just be sure it can see the openssl-related *.so file(s) ;)

ls -al /usr/lib/libgnutls-openssl.so.13.3.0
-rwxr-xr-x 1 root root 102572 2007-08-21 16:25 /usr/lib/libgnutls-openssl.so.13.3.0

Hmm, I am not sure what you mean. Is the above a sufficient check? Feel free to shoot more suggestions at me, however unlikely! At this point I really am grasping at straws! :-(

In case it matters, I am running FC8; pretty standard. Hence I had never expected NRPE to be so difficult to get up and running!

-- Rahul
Re: [Nagios-users] NRPE installation fails since check_nrpe plugin is not found in the libexec directory
On Mon, Jan 26, 2009 at 3:34 PM, James Pratt jpr...@norwich.edu wrote:

> Yes all appears to be in order... I'm not sure what to tell you there The only other thing I can think of is that FC8 is just too old (?)... I don't think there are even updates for it anymore...

Thanks for your tips, James! I don't think it is an FC8 issue. Just last week I got NRPE running on about 175 machines, all using FC8. These were 32-bit machines, though. Today I tried extending this to our remaining machines (64-bit), and that is when I started running into problems. The 64-bit architecture might be a red herring, though. It may just be this class of machines and have nothing to do with the 64-bit arch.

> You may want to just install FC10 or 11 (Or whatever the most recent is!) - I remember I once setup Nagios on FC10 or 11 and it was a breeze using RPM's for everything, and the Nagios site has good instructions... (I now use CentOS 5.2 - fedora's release cycle is much too fast, and CentOS is as stable as RHEL for me...

Upgrading my OS isn't an option, unfortunately. Too big a project to roll out on 256 machines for now.

-- Rahul
Re: [Nagios-users] NRPE installation fails since check_nrpe plugin is not found in the libexec directory
On Mon, Jan 26, 2009 at 3:34 PM, James Pratt jpr...@norwich.edu wrote:

> Yes all appears to be in order... I'm not sure what to tell you there The only other thing I can think of is that FC8 is just too old (?)... I don't think there are even updates for it anymore... You may want to just install FC10 or 11 (Or whatever the most recent is!) - I remember I once setup Nagios on FC10 or 11 and it was a breeze using RPM's for everything, and the Nagios site has good instructions... (I now use CentOS 5.2 - fedora's release cycle is much too fast, and CentOS is as stable as RHEL for me...

Solved it! Thanks for all your help, guys.

Well, the 64-bit issue was a red herring. I am still not 100% sure what my issue was, but here's what I think: it was all a problem with NFS-mounted drives. I have a base system, and several of my remote hosts find their executables via NFS mounts of the relevant dirs. I was trying to install from one such machine, and that's when I had these problems. I went back and tried on the home machine where these NFS mounts reside, and it all worked. No idea why! The only suspicion I have is some soft-link business; as far as I remember, these don't span NFS mounts.

-- Rahul
[Nagios-users] nagios service flapping
I just had a bunch of services start flapping on me. The common factor seems to be that all of these are services monitored by NRPE.

//
Notifications for this service are being suppressed because it was detected as having been flapping between different states (22.4% change >= 20.0% threshold). When the service state stabilizes and the flapping stops, notifications will be re-enabled.
//

My nrpe.cfg is pristine except for:

command[check_disk_scratch]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /scratch

What could be causing a service to start flapping? It has never happened to me before. Any debugging suggestions? The status for the service is correct, though:

DISK OK - free space: /scratch 14886 MB (52% inode=98%):

-- Rahul

Snippet from services.cfg:

define service{
    use rpn_intermediate_service
    hostgroup_name npre-compute-nodes
    service_description /scratch Partition on nodes
    check_command check_nrpe!check_disk_scratch ; details defined in the nrpe.conf
}
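For context on the "22.4% change" figure: flap detection looks at how often the service changed state over its recent check results and compares that percentage with a threshold. The sketch below is a simplification that counts every transition equally; the real Nagios calculation, as I understand the documentation, weights recent changes more heavily, so its numbers will differ. The state history shown is invented.

```bash
#!/bin/sh
# Sketch: unweighted percent state change over a list of recent check
# results (oldest first). 0 = OK, 1 = WARNING, 2 = CRITICAL.
# Simplified: Nagios itself applies a weighting that this ignores.
percent_state_change() {
    echo "$*" | awk '{
        changes = 0
        for (i = 2; i <= NF; i++)
            if ($i != $(i - 1)) changes++
        pct = 0
        if (NF > 1) pct = 100 * changes / (NF - 1)
        printf "%.1f\n", pct
    }'
}

# Invented history of 21 results with three brief CRITICAL blips:
percent_state_change 0 0 0 2 0 0 0 0 2 0 0 0 0 0 0 2 0 0 0 0 0
```

A few short-lived CRITICAL results, for example from NRPE timeouts, are enough to cross a 20% threshold even though the current status looks fine, so the service's state history in the log is the place to look.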
[Nagios-users] using NRPE to monitor pbs_mom. Error: NRPE: Unable to read output
I'm a bit confused about how exactly to add commands to NRPE to monitor local services on my remote hosts. I got the basics out of the way, and I can already monitor the easy stuff like users, procs, swap, etc. More ambitiously, I wanted to monitor the status of pbs_mom (the Torque scheduler daemon) on each node in my cluster.

I found the script check_pbsmom.sh on NagiosExchange (snippet below) and copied it to my /usr/local/nagios/libexec. Then I added this line to my nrpe.cfg:

command[check_pbsmom]=/usr/local/nagios/libexec/check_pbsmom

But then I don't seem to have much success:

remotehost> /usr/local/nagios/libexec/check_nrpe -H localhost -c check_pbsmom
NRPE: Unable to read output

If I just run the shell script, though, it seems to be working:

/usr/local/nagios/libexec/check_pbsmom.sh
PBS_MOM OK: Daemon is running. Host is listening.

What am I doing wrong here? I'm still a bit confused about the interaction between command.cfg on the monitoring machine and nrpe.cfg on the remote host. Any advice?

-- Rahul

#!/bin/bash
# SYNOPSIS
#   check_pbsmom [TCP port] [TCP port] ...
#
# DESCRIPTION
#   This NAGIOS plugin checks whether: 1) pbs_mom is running and
#   2) the host is listening on the given port(s). If no port
#   number is specified TCP ports 15002 and 15003 are checked.
#
# AUTHOR
#   wayne.mall...@jcu.edu.au

OK=0
WARN=1
CRITICAL=2
PATH=/bin:/sbin:/usr/bin:/usr/sbin

# Default listening ports are TCP 15002 and 15003.
if [ $# -lt 1 ] ; then
    list="15002 15003"
else
    list="$*"
fi

# ps prints a header line, so fewer than 2 lines means no pbs_mom process.
if [ `ps -C pbs_mom | wc -l` -lt 2 ]; then
    echo "PBS_MOM CRITICAL: Daemon is NOT running!"
    exit $CRITICAL
else
    for port in $list ; do
        if [ `netstat -ln | grep -E "tcp.*:$port" | wc -l` -lt 1 ]; then
            echo "PBS_MOM CRITICAL: Host is NOT listening on TCP port $port!"
            exit $CRITICAL
        fi
    done
    echo "PBS_MOM OK: Daemon is running. Host is listening."
    exit $OK
fi
Re: [Nagios-users] using NRPE to monitor pbs_mom. Error: NRPE: Unable to read output
On Thu, Jan 22, 2009 at 3:18 PM, Seth Simmons ssimm...@cymfony.com wrote:

> The filename you specified is check_pbsmom.sh though your command shows check_pbsmom

I was careless. That was exactly it! Thanks, Seth. My bad. It works now.

-- Rahul
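Since "NRPE: Unable to read output" here came down to a command line pointing at a file that did not exist, a quick sanity check over nrpe.cfg can catch this class of mistake. The sketch below is a hypothetical helper, not part of NRPE; it only looks at the first word after the "=" and assumes paths contain no spaces. The demonstration builds a throwaway config in a temporary directory.

```bash
#!/bin/sh
# Sketch: report command[...] entries in an nrpe.cfg-style file whose
# executable (first word after "=") is missing or not executable.
find_broken_nrpe_commands() {
    cfg="$1"
    sed -n 's/^command\[\([^]]*\)]=\([^ ]*\).*/\1 \2/p' "$cfg" |
    while read -r name path; do
        [ -x "$path" ] || echo "$name: $path not found or not executable"
    done
}

# Demonstration: reproduce the mistake from this thread in a temp directory.
demo_dir="$(mktemp -d)"
printf '#!/bin/sh\nexit 0\n' > "$demo_dir/check_pbsmom.sh"
chmod +x "$demo_dir/check_pbsmom.sh"
printf 'command[check_pbsmom]=%s/check_pbsmom\n' "$demo_dir" > "$demo_dir/nrpe.cfg"

find_broken_nrpe_commands "$demo_dir/nrpe.cfg"
```

On a real node the call would be something like find_broken_nrpe_commands /usr/local/nagios/etc/nrpe.cfg, with that path being an assumption about the install layout.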
[Nagios-users] tweaking the order of sorting in nagios lists: numeric rather than alphabetical
Is there a way to tweak the manner in which Nagios sorts host names in the Service Status Details? I have hosts named star01, star02 and so on, all the way through star256, and Nagios insists on sorting these like so:

star10
star101
star102
[snip]
star119
star12
etc.

Can I make the names sort in numeric order rather than strict alphabetical order?

-- Rahul
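The difference between the two orderings is easy to reproduce outside Nagios. The sketch below does not change how the CGIs sort; it only shows the numeric ordering being asked for, assuming every name has the same four-character "star" prefix. A common workaround, for what it is worth, is to zero-pad the names (star001 ... star256) so that alphabetical and numeric order coincide.

```bash
#!/bin/sh
# Sketch: sort host names numerically on the part after the "star" prefix.
# The sort key starts at character 5 of the name and is compared as a number.
natural_host_sort() {
    sort -k1.5n
}

printf '%s\n' star10 star101 star12 star02 star256 | natural_host_sort
```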
Re: [Nagios-users] Looking for a Nagios answering service
On Wed, Jan 7, 2009 at 4:54 PM, Baron Schwartz ba...@percona.com wrote:

> * watch Nagios email or SMS alerts 24/7
> * filter out obvious spam
> * response time must be on the order of minutes
> * call our on-call engineer, and once our engineer acks, the job is done.

Maybe I am missing something, but what's the additional service provided by this intermediate company, again? I'm just curious. Why can't an appropriately set notification scheme directly targeting your on-call engineer work? Isn't that the purpose of notification policies?

-- Rahul
Re: [Nagios-users] nagios error: Could not stat() command file
On Wed, Dec 24, 2008 at 8:41 AM, Marc Powell m...@ena.com wrote:

>> list. I missed out on this one. Sorry.
>
> Make sure you've followed the Documentation on enabling External Commands. The CGI's use that functionality to send commands to nagios. It's disabled by default.

I must be missing something very basic. Here's what I have confirmed in my nagios.cfg:

check_external_commands=1
command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd

The permissions on the directory /usr/local/nagios/var/rw seem correct too:

drwxrwsr-x 2 nagios nagcmd 4096 Dec 22 17:06 rw

cat /etc/group shows that both the required users are part of the correct group:

nagcmd:x:1239:nagios,apache

locate nagios.cmd returns nothing, showing that this file is not accidentally being created in a wrong location. getenforce gives Disabled, so I guess it is not damn-SELinux-once-more day yet!

At this point I am stumped again! Any other checks I am missing?

Ian, I did peruse the list postings from last month on this topic, which is how I came up with the checks I outlined above. In case I am still missing the relevant instruction, it'd be great if you could point me to the post that you have in mind.

Thanks again, guys; and I apologize if I am missing something clearly basic!

-- Rahul
Re: [Nagios-users] nagios error: Could not stat() command file
On Wed, Dec 24, 2008 at 9:51 AM, Marc Powell m...@ena.com wrote:

> Make sure you've restarted nagios after adding these. Also check for errors in nagios.log.

Works! Thanks, Marc. I am not sure what it was, but earlier I was doing a

/etc/init.d/nagios reload

Now I tried a restart. Perfect! Thanks again, guys!

-- Rahul
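The checks walked through in this thread can be bundled into one small script. The command file is a named pipe that the running Nagios daemon creates, which is why an empty rw directory points at the daemon rather than at permissions. This is a sketch with made-up function names; it reads only the two settings discussed here and tests for the pipe, and the demonstration uses a temporary directory rather than a real install.

```bash
#!/bin/sh
# Sketch: sanity-check the external command setup for a nagios.cfg-style file.

# Print the value of a key=value setting (last occurrence wins).
cfg_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

check_external_commands() {
    cfg="$1"
    enabled="$(cfg_value "$cfg" check_external_commands)"
    cmd_file="$(cfg_value "$cfg" command_file)"
    if [ "$enabled" != "1" ]; then
        echo "check_external_commands is not set to 1"
        return 1
    fi
    if [ ! -p "$cmd_file" ]; then
        echo "command file $cmd_file is not a named pipe (is Nagios running?)"
        return 1
    fi
    echo "external commands look OK: $cmd_file"
}

# Demonstration in a temp directory; on a real system the argument would be
# something like /usr/local/nagios/etc/nagios.cfg (path is an assumption).
demo="$(mktemp -d)"
mkfifo "$demo/nagios.cmd"
printf 'check_external_commands=1\ncommand_file=%s/nagios.cmd\n' "$demo" > "$demo/nagios.cfg"

check_external_commands "$demo/nagios.cfg"
```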
[Nagios-users] re-enabling check_snmp
Getting more and more impressed with Nagios's capabilities, I was getting more ambitious and was now getting it to monitor the switches on my university research computing cluster as well. The pings work fine, but the SNMP monitoring fails. Digging deeper, I noticed that I did not have the check_snmp command. I think this is because when I installed Nagios two days ago I did not have snmpwalk, snmpget, etc. installed on my system. I just did a yum install net-snmp earlier today.

How can I now retroactively get this check_snmp functionality? Do I have to do the ./configure, make, make install dance again on the nagios-plugins source? That's OK, but I was afraid it would overwrite my configs etc. What is the recommended procedure now?

[I guess I was stupid to skimp on reading config.log in my eagerness to go ahead. I also find that ./configure has skipped some other potentially useful plugins, e.g. mysql.]

I hope I am not overusing this group in my eagerness to get more done with Nagios! I apologize in advance if I did!

-- Rahul
Re: [Nagios-users] Cannot add remote-linux server to my setup to be monitored
On Tue, Dec 23, 2008 at 4:02 AM, Kenneth Holter kenneho@gmail.com wrote:

> Just a little side note: I don't think you need to maintain the hostgroup- host relationship in both the hostgroup and host definitions. Keep the definition in one of the two to get a cleaner code. Someone please correct me if I'm wrong. :)

Thanks, guys! I got it working now.

Another question: I see a critical notification of the sort

PROCS CRITICAL: 1217 processes with STATE = RSZDT

on my localhost itself. What is this? Any clues? I'm stumped.

-- Rahul
Re: [Nagios-users] Cannot add remote-linux server to my setup to be monitored
On Tue, Dec 23, 2008 at 6:52 PM, Andy Shellam andy-li...@networkmail.eu wrote:
> It means you have a service check set up to check how many processes are in the state RSZDT (I believe these are active processes) with a critical threshold. The current number of (active?) processes on the machine is 1,217, which is above the critical threshold you have defined, so Nagios is alerting you (good Nagios.)

Good Nagios indeed! It has paid back pretty quickly! Something did indeed go wrong on my server and had spawned a lot of processes in the S state. I am looking into this now. Glad I did not ignore the red critical flag!

-- Rahul
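For anyone trying to work out which state is responsible for such a count: the letters in RSZDT are the process state codes that ps prints in its STAT column (R running, S sleeping, Z zombie, D uninterruptible sleep, T stopped). The sketch below tallies processes per state and applies a simplified threshold comparison. It is not the plugin's own code; the function names are made up, and the thresholds 250 and 400 are placeholder values, not taken from this thread.

```python
import subprocess
from collections import Counter


def count_states(stat_lines):
    """Count processes by the first letter of each ps STAT value."""
    counts = Counter()
    for line in stat_lines:
        stat = line.strip()
        if stat:
            counts[stat[0]] += 1
    return counts


def procs_status(counts, states="RSZDT", warn=250, crit=400):
    """Simplified check: sum the selected states and compare to thresholds.

    warn and crit are hypothetical defaults chosen for illustration.
    """
    total = sum(counts[s] for s in states)
    if total > crit:
        return "CRITICAL", total
    if total > warn:
        return "WARNING", total
    return "OK", total


def read_ps_states():
    """Read one STAT value per process; 'stat=' suppresses the header line."""
    result = subprocess.run(
        ["ps", "-eo", "stat="], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Calling `count_states(read_ps_states())` on the affected machine gives a per-state breakdown, which makes it easy to confirm that the bulk of the processes sit in one state (S in the case described above).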
[Nagios-users] nagios error: Could not stat() command file
When I try to use many of the sub-menu options in the Nagios web interface (e.g. deactivate a service), I get the following error:

Error: Could not stat() command file '/usr/local/nagios/var/rw/nagios.cmd'! The external command file may be missing, Nagios may not be running, and/or Nagios may not be checking external commands. An error occurred while attempting to commit your command for processing.

I looked in the indicated directory and it seems empty. Should there be something in there? Does this point to a faulty Nagios install? All my tests seemed OK. Any suggestions?

-- Rahul
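The three causes listed in the error message can be narrowed down by checking what, if anything, exists at that path. Nagios creates the external command file as a named pipe when it starts with check_external_commands=1 in nagios.cfg, so an empty directory usually means one of those two conditions is not met. The snippet below is a minimal diagnostic sketch; the function name `describe_command_file` is made up for illustration.

```python
import os
import stat


def describe_command_file(path):
    """Classify the path as 'missing', 'not accessible', 'fifo' or 'not a fifo'."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "not accessible"
    return "fifo" if stat.S_ISFIFO(mode) else "not a fifo"


# Example, using the path from the error message above:
#   describe_command_file("/usr/local/nagios/var/rw/nagios.cmd")
```

It is worth running this as the user the web server runs as, since the CGIs need access to the directory as well; a result of "not accessible" would point at permissions rather than at a missing pipe.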
Re: [Nagios-users] nagios error: Could not stat() command file
On Wed, Dec 24, 2008 at 1:14 AM, Ian Masters i...@acces.co.jp wrote:
> Have you Googled and checked the list archives? I answered this same question earlier this month.

Oh! I'll search the Nagios list archives. I missed this one. Sorry.

-- Rahul
[Nagios-users] Cannot add remote-linux server to my setup to be monitored
I just installed Nagios and I can monitor my localhost all right. I tried to start with one of my remote compute nodes, but this does not seem to work so well. I see my new group compute-nodes on the web interface, but it does not list the remote machine I tried adding. I'm stumped as to what I am doing wrong!

To my nagios.cfg I added this line:

    cfg_file=/usr/local/nagios/etc/hosts.cfg

And made a new /usr/local/nagios/etc/hosts.cfg like so:

    define hostgroup{
        hostgroup_name                  compute-nodes
        alias                           compute-nodes
        members                         star01
        }

    define host{
        host_name                       star01
        alias                           star01
        address                         11.0.0.1
        hostgroups                      compute-nodes
        check_command                   check-host-alive
        max_check_attempts              5
        check_period                    24x7
        process_perf_data               0
        retain_nonstatus_information    0
        contact_groups                  admins
        notification_interval           30
        notification_period             24x7
        notification_options            d,u,r
        }

Shouldn't this be a basic template to get me started? What else do I need to do? Any debug suggestions? A ping to 11.0.0.1 is successful.

-- Rahul
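As a debugging aid for definitions like the ones above, here is a rough sketch that parses `define ... { ... }` blocks and reports hostgroup members that have no matching host definition. It is a simplified reader written for illustration, not Nagios's own parser: it ignores templates, `use` inheritance and wildcards, and the function names are made up. Nagios's built-in verification (`nagios -v` followed by the main config file) remains the authoritative check.

```python
import re

# Matches "define <type> { ... }" blocks; assumes no nested braces.
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Return a list of (object_type, {directive: value}) tuples."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        directives = {}
        for raw in body.splitlines():
            line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, directives))
    return objects


def undefined_members(objects):
    """Map each hostgroup name to members that lack a host definition."""
    hosts = {d["host_name"] for t, d in objects if t == "host" and "host_name" in d}
    problems = {}
    for obj_type, d in objects:
        if obj_type != "hostgroup":
            continue
        members = [m.strip() for m in d.get("members", "").split(",") if m.strip()]
        unknown = [m for m in members if m not in hosts]
        if unknown:
            problems[d.get("hostgroup_name", "?")] = unknown
    return problems
```

Feeding it the contents of hosts.cfg and getting an empty dictionary back suggests the hostgroup and host names line up, in which case the next things to check are whether the file is actually being read and whether Nagios was restarted after the change.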