[Nagios-users] more nagiosgraph issues
Hello List,

I've been struggling with nagiosgraph (1.4) and Nagios 3.2 for the last 3 weeks and can't get it to work. My settings are as follows:

Nagios.cfg:

# PROCESS PERFORMANCE DATA OPTION
process_performance_data=1
service_perfdata_file=/var/spool/nagios/perfdata.log
service_perfdata_file_template=$LASTSERVICECHECK$||$HOSTNAME$||$SERVICEDESC$||$SERVICEOUTPUT$||$SERVICEPERFDATA$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=30
service_perfdata_file_processing_command=process-service-perfdata

commands.cfg:

# Nagios Performance Commands
define command {
    command_name process-service-perfdata
    command_line /usr/local/nagios/lib/insert.pl
}

nagiosgraph.conf (all comments removed):

debug = 5
debug_showgraph = 5
logfile = /usr/local/nagios/var/nagiosgraph.log
rrddir = /usr/local/nagios/nagiosgraph/rrd
mapfile = /usr/local/nagios/nagiosgraph/map
colorscheme = 1
heartbeat = 600
perflog = /var/spool/nagios/perfdata.log
dbseparator = subdir
plotas = LINE2
maximums = Current Load,PLW,Procs: total,Procs: zombie,User Count
minimums = APCUPSD,Mem: free,Mem: swap
withmaximums = PING
withminimums = PING
hostdb = /usr/local/nagios/nagiosgraph/hostdb.conf
servdb = /usr/local/nagios/nagiosgraph/servdb.conf
nagioscgiurl = https://nagiosserver/nagios/cgi-bin
javascript = /nagios/nagiosgraph.js
stylesheet = /nagios/stylesheets/nagiosgraph.css
graphlabels = true
small = 650x150
clear = clear the list
selecthost = Select server
selectitems = Optionally, select the data set(s) to graph:
selectserv = Select service
fixedscale = Fixed Scale
submit = Update Graphs
zoom = Resize the graphs:
perfforhost = Performance data for host
perfforserv = Performance data for service
service = service
asof = as of
dai = Today
daily = Daily
day = Today
week = This Week
weekly = Weekly
month = This Month
monthly = Monthly
year = This Year
yearly = Yearly
configerror = Configuration Error (email mailto:alan.bren...@ithaka.org";>Alan).
noservicegiven = Bad URL for showservice.cgi; no service given
apcupsd = Uninterruptible Power Supply Status (Battery Charge, Tempurature, Load Percentage, Time Left)
bps = Bits Per Second
clamdb = Clam Database
diskgb = Disk Usage in Gigabytes
diskpct = Disk Usage in Percent
http = Bits Per Second
load = Load Average
losspct = Loss Percentage
mailq = Pending Output E-mail Messages
memory = RAM Usage
Mem%3A%20swap = Swap Utilization
swap = Swap Utilization
ping = Ping Loss Percentage and Round Trip Average
pingloss = Ping Loss Percentage
pingrta = Ping Round Trip Average
PLW = Perl Log Watcher Events
procs = Processes
qsize = Messages in Outbound Queue
rta = Round Trip Average
smtp = E-mail Status
testcolor = Show Colors
typesome = Type some space seperated nagiosgraph line names here
graph = Graph
previous = previous
next = next
createdby = Created by

The nagiosgraph directory and all of its subdirectories and files are owned by nagios:nagios, with full rwx for user and group. Using rrdtool dump I can see data in the RRD files, but no graphs are being displayed.

From nagiosgraph.log:

Wed Feb 3 10:03:02 2010 insert.pl debug getrules(/usr/local/nagios/nagiosgraph/map)
Wed Feb 3 10:03:02 2010 insert.pl debug inputdata()
Wed Feb 3 10:03:02 2010 insert.pl debug inputdata empty /var/spool/nagios/perfdata.log
Wed Feb 3 10:03:02 2010 insert.pl debug insert.pl exited

But running tail -f /var/spool/nagios/perfdata.log shows:

1265194199||dec1-be-107||Check NTP Time||NTP OK: Offset -0.0004923343658 secs||offset=-0.000492s;10.00;500.00;
1265194199||dec1-be-107||Total Processes||PROCS OK: 80 processes||
1265194199||dec1-be-71||Check NTP Time||NTP OK: Offset -0.0008155107498 secs||offset=-0.000816s;10.00;500.00;
1265194199||dec1-be-71||Total Processes||PROCS OK: 130 processes||

Does anyone have any idea why this is happening and how I can get it to work?

Thanks Assaf -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com ___ Nagios-users mailing list Nagios-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null
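The perfdata lines quoted above follow the service_perfdata_file_template from the post, with fields separated by `||`. As a rough illustration only (this is not how insert.pl works internally, and the function and type names are made up for this note), a minimal Python sketch that splits such lines and flags the ones that carry no performance data might look like this. It assumes the plugin output itself never contains `||`.

```python
from typing import NamedTuple, Optional

# Field order taken from the service_perfdata_file_template quoted in the post.
FIELDS = ("last_check", "host", "service", "output", "perfdata")


class PerfRecord(NamedTuple):
    last_check: int
    host: str
    service: str
    output: str
    perfdata: str


def parse_perfdata_line(line: str) -> Optional[PerfRecord]:
    """Split one '||'-separated line; return None if it is malformed."""
    parts = line.rstrip("\n").split("||")
    if len(parts) != len(FIELDS):
        return None
    try:
        timestamp = int(parts[0])
    except ValueError:
        return None
    return PerfRecord(timestamp, *(p.strip() for p in parts[1:]))


# Two of the lines shown in the tail -f output of the post.
SAMPLE = [
    "1265194199||dec1-be-107||Check NTP Time||NTP OK: Offset -0.0004923343658 secs||offset=-0.000492s;10.00;500.00;",
    "1265194199||dec1-be-107||Total Processes||PROCS OK: 80 processes||",
]

for raw in SAMPLE:
    rec = parse_perfdata_line(raw)
    status = "has perfdata" if rec and rec.perfdata else "no perfdata"
    print(rec.service if rec else "malformed", "->", status)
```

In the sample, the "Total Processes" lines end in an empty fifth field, so a check like this would report that there is nothing to graph for that service regardless of how the rest of the pipeline is configured.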
[Nagios-users] Event Handlers
I have a service that needs to be monitored every minute. I need some help understanding how services go from a soft to a hard state, and whether an event handler can be run after a service has gone into a hard state. I'm sure everyone has a very dynamic and custom environment to some extent. I have event handlers that will not run if a lock file is present (because I am deploying code, or so that other scripts do not step on each other). For this service, which I monitor every minute, I have Max Retries set to 3, Check Interval set to 1, and Retry Interval set to 1. Can someone help shed some light on how I can get an event handler to run again after a service has gone into a hard state? Thanks, JB
[Nagios-users] Asterisk Questions
Hi all! I want to write a plugin that tells Asterisk to make a test call every so often and then reports back to Nagios whether the call was successful. The caveat here is that Asterisk is on one server and Nagios is on its own dedicated server. If anyone knows of a script or plugin that can do this, I would greatly appreciate any pointers in the right direction. Thanks!
Re: [Nagios-users] Event Handlers
On Feb 3, 2010, at 8:16 AM, Jeff wrote:

> I have a service that needs to be monitored every minute. I need some help
> understanding how services go from soft to a hard state

When a service check results in a non-OK state, the service goes from a Soft to a Hard state when it reaches max_check_attempts. http://nagios.sourceforge.net/docs/3_0/statetypes.html

> and if an event handler can be run after a service has gone into a hard
> state.

Only for its initial Hard problem state or initial Hard recovery state. http://nagios.sourceforge.net/docs/3_0/eventhandlers.html

> I'm sure everyone has a very dynamic and custom environment to some extent.
> I have event handlers that will not run if a lock file is present (cause i am
> deploying code or so other scripts do not step on each other). So I for this
> service that I monitor every minute, I have Max Retries set to 3, Check
> Interval is 1, and retry interval is 1. Can someone help shed some light on
> how I can get an event handler to run again after a service has gone into a
> hard state?

You can't really... The only real facility Nagios has to do this (that I can think of right now) is is_volatile (http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#service), but that's probably overkill for your needs, particularly given the notification implications.

-- Marc
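To make the soft/hard behaviour described in this reply concrete, here is a small simulation in Python. It is a deliberately simplified model, not Nagios code: it only distinguishes OK from non-OK results, assumes the service starts in a hard OK state, and the function name `simulate` is invented for this note. It follows the rule from the Nagios 3 documentation that event handlers run for soft problem states, for the initial hard problem state, and for recoveries.

```python
from typing import Iterable, List, Tuple


def simulate(results: Iterable[str], max_check_attempts: int) -> List[Tuple[str, str, int, bool]]:
    """Return (state, state_type, attempt, handler_runs) for each check result.

    Simplified model: any result other than "OK" counts as the same problem,
    and the service is assumed to start in a HARD OK state.
    """
    state, state_type, attempt = "OK", "HARD", 1
    timeline = []
    for result in results:
        handler = False
        if result == "OK":
            if state == "OK":
                state_type = "HARD"  # steady OK, nothing to do
            else:
                handler = True  # recovery keeps the state type of the problem
            attempt = 1
        elif state == "OK":
            # First non-OK result: start counting retries.
            attempt = 1
            state_type = "HARD" if max_check_attempts <= 1 else "SOFT"
            handler = True
        elif state_type == "SOFT":
            attempt += 1
            if attempt >= max_check_attempts:
                state_type = "HARD"  # initial hard problem state
            handler = True
        # Otherwise the service is already in a HARD problem state:
        # repeated failures do not run the handler again.
        state = result
        timeline.append((state, state_type, attempt, handler))
    return timeline


# Max Retries of 3, as in the original question.
checks = ["OK", "CRITICAL", "CRITICAL", "CRITICAL", "CRITICAL", "OK"]
for row in simulate(checks, max_check_attempts=3):
    print(row)
```

With a one-minute check and retry interval, the fourth and later consecutive failures in this model produce no handler run, which is the situation the original question describes.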
[Nagios-users] check load oddity
This is making no sense to me at all. It is obvious to me that my load is less than the critical threshold, why is the plugin reporting a critical state?

[r...@monitor1 plugins]# ./check_load 2.0 1.8 1.5 3.0 2.8 2.5
CRITICAL - load average: 1.96, 1.01, 0.75|load1=1.960;0.000;0.000;0; load5=1.010;0.000;0.000;0; load15=0.750;0.000;0.000;0;
Usage:check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15

Options:
 -h, --help
    Print detailed help screen
 -V, --version
    Print version information
 -w, --warning=WLOAD1,WLOAD5,WLOAD15
    Exit with WARNING status if load average exceeds WLOADn
 -c, --critical=CLOAD1,CLOAD5,CLOAD15
    Exit with CRITICAL status if load average exceed CLOADn
    the load average format is the same used by "uptime" and "w"
 -r, --percpu
    Divide the load averages by the number of CPUs (when possible)

Completely confused with this one.

DAve

--
"Posterity, you will know how much it cost the present generation to preserve your freedom. I hope you will make good use of it. If you do not, I shall repent in heaven that ever I took half the pains to preserve it." John Adams
http://appleseedinfo.org
Re: [Nagios-users] check load oddity
I've seen that with check_procs. Justin
Re: [Nagios-users] Could not expand hostgroups error
Chris Weiss wrote:

> When I run nagios -v with cfg_file=/etc/nagios/objects/twiki.cfg
> present, it complains:
> Error: Could not expand hostgroups and/or hosts specified in service
> (config file '/etc/nagios/objects/twiki.cfg', starting on line 24)
> Error processing object config files!
>
> I'm stumped as to why it cannot expand the hostname. It's defined in
> the same file and nowhere else. The only other place "twiki-servers"
> is referenced is in one of the other host files in conf.d

That's precisely the reason for the error. You're assigning a service to a hostgroup you have not defined.
Re: [Nagios-users] check load oddity
DAve wrote:

> This is making no sense to me at all. It is obvious to me that my load
> is less than the critical threshold, why is the plugin reporting a
> critical state?
>
> [r...@monitor1 plugins]# ./check_load 2.0 1.8 1.5 3.0 2.8 2.5
> CRITICAL - load average: 1.96, 1.01, 0.75|load1=1.960;0.000;0.000;0;
> load5=1.010;0.000;0.000;0; load15=0.750;0.000;0.000;0;

Re-read the syntax help the plugin is giving you. You are not passing the plugin valid parameters.
Re: [Nagios-users] check load oddity
On Feb 3, 2010, at 9:59 AM, DAve wrote:

> This is making no sense to me at all. It is obvious to me that my load
> is less than the critical threshold, why is the plugin reporting a
> critical state?

Almost certainly because these don't match (i.e. you're using it wrong):

> [r...@monitor1 plugins]# ./check_load 2.0 1.8 1.5 3.0 2.8 2.5

> Usage:check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15

-- Marc
[Nagios-users] Exclusions to check_procs plugin?
Alle,

I've searched and found this question has been asked before, but there don't seem to have been any responses. Is it possible to exclude a process with the check_procs plugin? I have John the Ripper running on one of my machines, which consistently uses 100% of one of the four CPUs:

Cpu0 : 0.3%us, 0.3%sy, 0.0%ni, 93.0%id, 6.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.7%us, 0.7%sy, 0.0%ni, 97.0%id, 1.3%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu2 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 :*100.0%us*, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3912852k total, 3761780k used, 151072k free, 180016k buffers
Swap: 8388600k total, 112k used, 8388488k free, 2994944k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11066 root 25 0 12376 7012 644 R 100.1 0.2 70505:06 john

I'd like to ignore this if possible.

Best Regards,
Camron

--
Camron W. Fox
Hilo Office
High Performance Computing Group
Fujitsu Management Services of America, Inc.
E-mail: cw...@us.fujitsu.com
Re: [Nagios-users] check load oddity
Morris, Patrick wrote:
> DAve wrote:
>> This is making no sense to me at all. It is obvious to me that my load
>> is less than the critical threshold, why is the plugin reporting a
>> critical state?
>>
>> [r...@monitor1 plugins]# ./check_load 2.0 1.8 1.5 3.0 2.8 2.5
>> CRITICAL - load average: 1.96, 1.01, 0.75|load1=1.960;0.000;0.000;0;
>> load5=1.010;0.000;0.000;0; load15=0.750;0.000;0.000;0;
>
> Re-read the output of the syntax help the plugin is giving you. You are
> not passing the plugin valid paramaters.

The preloaded command in NagiosQL is wrong and I never even checked it. Coffee needed... sigh...

[r...@monitor1 plugins]# ./check_load -w 2.0,1.8,1.5 -c 3.0,2.8,2.5
OK - load average: 1.76, 0.86, 0.72|load1=1.760;2.000;3.000;0; load5=0.860;1.800;2.800;0; load15=0.720;1.500;2.500;0;

DAve

--
"Posterity, you will know how much it cost the present generation to preserve your freedom. I hope you will make good use of it. If you do not, I shall repent in heaven that ever I took half the pains to preserve it." John Adams
http://appleseedinfo.org
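The two check_load outputs in this thread differ in more than the status word: the perfdata after the `|` carries the thresholds the plugin actually used, in the standard `label=value;warn;crit;min;max` layout. The following Python sketch pulls those out for comparison. It is only an illustration under stated assumptions: the helper name `thresholds` is invented here, and it handles plain numeric thresholds only, not the range syntax some plugins accept.

```python
import re
from typing import Dict, Optional, Tuple

# Matches label=value[UOM];warn;crit and ignores the trailing min/max fields.
PERF_ITEM = re.compile(
    r"(?P<label>[^=\s]+)=(?P<value>-?[\d.]+)[^;\s]*;(?P<warn>[^;\s]*);(?P<crit>[^;\s]*)"
)


def thresholds(plugin_output: str) -> Dict[str, Tuple[float, Optional[float], Optional[float]]]:
    """Map each perfdata label to (value, warn, crit) from one line of plugin output."""
    _, _, perfdata = plugin_output.partition("|")
    found = {}
    for m in PERF_ITEM.finditer(perfdata):
        warn = float(m["warn"]) if m["warn"] else None
        crit = float(m["crit"]) if m["crit"] else None
        found[m["label"]] = (float(m["value"]), warn, crit)
    return found


# Outputs copied from the two invocations shown in the thread.
BAD = ("CRITICAL - load average: 1.96, 1.01, 0.75|load1=1.960;0.000;0.000;0; "
       "load5=1.010;0.000;0.000;0; load15=0.750;0.000;0.000;0;")
GOOD = ("OK - load average: 1.76, 0.86, 0.72|load1=1.760;2.000;3.000;0; "
        "load5=0.860;1.800;2.800;0; load15=0.720;1.500;2.500;0;")

for line in (BAD, GOOD):
    for label, (value, warn, crit) in thresholds(line).items():
        print(f"{label}: value={value} warn={warn} crit={crit}")
```

For the first invocation every warn and crit value comes out as 0.0, which suggests the positional arguments were never read as thresholds, so any non-zero load exceeded the critical limit.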
[Nagios-users] Change From Address in notifications
Is there a way to statically set the from address in notifications? Our notifications are all being generated as nag...@localhost which causes them to be blocked by various spam filters. Thanks! Chip Burke
Re: [Nagios-users] Exclusions to check_procs plugin?
On Feb 3, 2010, at 11:28 AM, Camron W. Fox wrote:

> Alle,
>
> I've searched and found this question has been asked before, but there
> don't seem to have been any responses.
> Is it possible to exclude a process with the check_procs plugin.

The --help output lists no such option, so, taking it on trust and without going through the code, I'd say no. A quick search of http://exchange.nagios.org shows a number of check_proc* scripts. Perhaps one of them has that functionality or could easily be modified to ignore processes you specify.

-- Marc
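As a rough idea of what a modified or home-grown check with an ignore list could look like, here is a Python sketch. It is not check_procs and does not use any check_procs option: the function names, the 90% limit, and the input format (lines shaped like the output of `ps -eo pcpu,comm`) are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple


def busy_processes(ps_lines: Iterable[str], cpu_limit: float, ignore: Set[str]) -> List[Tuple[str, float]]:
    """Return (command, %cpu) pairs above cpu_limit, skipping ignored commands.

    Expects lines shaped like the output of `ps -eo pcpu,comm` (a header is fine).
    """
    offenders = []
    for line in ps_lines:
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        try:
            cpu = float(fields[0])
        except ValueError:
            continue  # header line or unparsable row
        command = fields[1].strip()
        if command in ignore:
            continue  # explicitly excluded, e.g. a known CPU-bound job
        if cpu > cpu_limit:
            offenders.append((command, cpu))
    return offenders


def nagios_status(offenders: List[Tuple[str, float]]) -> Tuple[int, str]:
    """Translate the result into a Nagios exit code (0 = OK, 2 = CRITICAL) and message."""
    if offenders:
        names = ", ".join(f"{cmd} ({cpu:.1f}%)" for cmd, cpu in offenders)
        return 2, f"PROCS CRITICAL: {names}"
    return 0, "PROCS OK: no process above the CPU limit"


# Hypothetical sample input; only the "john" row mirrors the top output in the question.
SAMPLE_PS = ["%CPU COMMAND", "100.1 john", " 0.3 sshd", "95.0 gzip"]
print(nagios_status(busy_processes(SAMPLE_PS, 90.0, {"john"})))
```

A real plugin would read the process list itself and exit with the returned code; the sketch keeps the filtering logic separate so it can be tried on canned input first.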
Re: [Nagios-users] Change From Address in notifications
On Feb 3, 2010, at 12:31 PM, Chip Burke wrote:

> Is there a way to statically set the from address in notifications? Our
> notifications are all being generated as nag...@localhost which causes them
> to be blocked by various spam filters.

This is a configuration problem with your mailer daemon; it sounds like you haven't properly configured it to be Internet friendly. How you fix it depends on what MTA (postfix, sendmail, exim, etc) you're using on the Nagios machine and is pretty much outside the scope of this list. All accounts on the machine will have this problem, not just the nagios account.

http://support.nagios.com/knowledgebase/faqs/index.php?option=com_content&view=article&id=52&catid=35&faq_id=338

If you just want to fix it for your nagios user, there are several options that may work for you. Google result you may find useful (several ways discussed) --

http://www.groundworkopensource.com/community/forums/viewtopic.php?t=1286

-- Marc
Re: [Nagios-users] Could not expand hostgroups error
On Wed, Feb 3, 2010 at 8:46 AM, Morris, Patrick wrote:
> Chris Weiss wrote:
>>
>> When I run nagios -v with cfg_file=/etc/nagios/objects/twiki.cfg
>> present, it complains:
>> Error: Could not expand hostgroups and/or hosts specified in service
>> (config file '/etc/nagios/objects/twiki.cfg', starting on line 24)
>> Error processing object config files!
>>
>> I'm stumped as to why it cannot expand the hostname. It's defined in
>> the same file and nowhere else. The only other place "twiki-servers"
>> is referenced is in one of the other host files in conf.d
>>
>
> That's precisely the reason for the error. You're assigning a service to a
> hostgroup you have not defined.
>

Patrick - I'm defining the hostgroup immediately above the service definition in that file. I could see where that might not be supported, but I do the same thing with my exchange.cfg file (and all the others) and that seems to work fine.

Matt - I do have one host that references those hostgroups. Thanks for the link! I'd run across (and bookmarked) it before when I was hunting for ways to optimize Nagios configs.

--
-Chris
Re: [Nagios-users] Change From Address in notifications
Hi,

On Wednesday 03 February 2010 07:31:32 pm Chip Burke wrote:
> Is there a way to statically set the from address in notifications? Our
> notifications are all being generated as nag...@localhost which causes them
> to be blocked by various spam filters.

Just edit the command in your Nagios configuration and add the -r option to the mail command:

define command{
    command_name notify-service-by-email
    command_line /usr/bin/printf "%b" "..." | /usr/bin/mail -s "..." $CONTACTEMAIL$ -r nag...@example.com
}

See man mailx.

Greetings,
Christian

> Thanks!
>
> Chip Burke

--
Christian Schneemann
IT Consultant & Trainer
B1 Systems GmbH
Mobile: +49-(0)-1757250665
Email: schneem...@b1-systems.de
http://www.b1-systems.de
Address: B1 Systems GmbH, Osterfeldstraße 7, 85088 Vohburg
Managing Director: Ralph Dehner
Registered office: Vohburg
District court: Ingolstadt
Commercial register: HRB 3537
GPG: http://pgpkeys.pca.dfn.de/pks/lookup?op=get&search=0x2FA8643A41BDAB81
[Nagios-users] Change From Address in notifications
That's got it, thanks!

Chip Burke
Re: [Nagios-users] Nagios 3.0.5 problem
Well, I have more information to add. I found a script that was being launched at midnight to purge old data from the database. The tables being pruned are used by perfparse to store perfdata and the like. They have > 180M rows, are 30-60GB, and are actively being inserted into all the while. As I understand it, they are InnoDB and should be using row (not table) locks, and really should not have much trouble with concurrent inserts. While this goes on, one CPU/core is largely in iowait, but the other 7 are largely idle, and we generally don't have any trouble with RAM or other resource exhaustion. Now that I know what caused my problem, I can reproduce it, which is ... interesting. After only a few minutes, nagios starts falling behind on service checks. It appears to be getting new checks with current timestamps in the nagios.log, but a service detail sorted by "Last Check" descending slowly shows the timestamps getting further and further behind current. A bit later, nagios starts taking 100% of 2 CPU cores, and nsca processes start to stack up... leading to the problem as I was observing it in the morning. In an attempt to diagnose I tried a few things. I have found that by the time nagios starts to bug out it can't be saved. If you cancel the delete query after seeing a lag on the check results, it does not slowly improve, and 'catch up' as I had hoped. This happens even if there are no rows to be deleted, though not if you use LIMIT to keep the query to a reasonable timeframe. I'm still looking for fresh ideas, but in the meantime I am writing a script to loop over the delete and do it in 10,000 row increments which are ~10 seconds instead of ~3M rows which takes over an hour per table. If you do the math, though, you'll see it'll be nearly as time-consuming, and I'm just hoping that we'll lock whatever is going on for a shorter period with room for inserts to happen in-between. Even if that 'fixes' it, I won't be satisfied. 
Any and all suggestions are welcomed. --Rick On Fri, Jan 29, 2010 at 11:01 AM, Rick Mangus > wrote: > Hello, all. > > Forgive me, I am new to the list, and have only begun working with nagios > recently. I have searched this list and googled furiously with little > result, so must cease my lurking and present my problem to you. > > I will begin with the problem: Sometime after midnight every night, my > nagios server starts to have trouble processing service checks. I don't > know the cause, and cannot find a solution. I can describe the symptoms in > detail and hope we can diagnose it. > > The web interface shows the last service check came in at 02:28:34 (EST). > I know that around 4:15 every morning, xinetd starts refusing connections to > nsca due to high load (max_load is 18), and that eventually I will have > 32000+ nsca connections using up all available PIDs leading to an inability > to fork new processes, effectively killing the machine. While all this > happens, the nagios.log appears to periodically stall, making no new entries > for 15 minutes at a time, and then flush 15000 in the space of a single > second. Also, it seems the checkresults directory is empty most of the > time, but sometimes pops up to 2045 files (it's on a ramdisk with 2048 > inodes) and not a single one gets deleted in a time period I have been > patient enough to observe. > > The periods in which the nagios log is going nowhere are accompanied by > nagios taking 100% of 2 CPUs. One thread appears to poll() approximately > every 25 usecs, and another is inscrutable, with mprotect() the only > strace-visible syscall. All the nsca processes have a blocking write() they > are waiting on. When the log is showing new entries, there are still no > updates made to the services, and it seems that that is what is filling up > checkresults. 
I admit I have not checked to find the order of the log and > checkresults processes, though I assumed they would operate in the opposite > order of what this appears to show. > > I know this behavior has been ongoing for at least 1 month. I have > disabled all cron jobs that I feared might be interfering. I will answer > any and all questions to the best of my ability, and hope someone here can > shed some light on the situation. > > --Rick >
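The batched purge described in this message (deleting in increments of about 10,000 rows rather than millions in one statement) can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 as a stand-in so it runs anywhere, whereas the poster's tables are MySQL/InnoDB, where `DELETE ... LIMIT n` can be used directly. The table name `perfdata` and column `ctime` are hypothetical, not the perfparse schema.

```python
import sqlite3


def purge_in_batches(conn: sqlite3.Connection, cutoff: int, batch_size: int) -> int:
    """Delete rows older than cutoff in small transactions; return rows deleted."""
    total = 0
    while True:
        with conn:  # commit after every batch so locks are held only briefly
            cur = conn.execute(
                "DELETE FROM perfdata WHERE rowid IN "
                "(SELECT rowid FROM perfdata WHERE ctime < ? LIMIT ?)",
                (cutoff, batch_size),
            )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        # A short sleep here would leave more room for concurrent inserts.


# Small in-memory demonstration with 1000 rows and made-up timestamps 0..999.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE perfdata (ctime INTEGER, value REAL)")
conn.executemany("INSERT INTO perfdata VALUES (?, ?)", [(t, 0.0) for t in range(1000)])
conn.commit()
print(purge_in_batches(conn, cutoff=750, batch_size=100))
```

As the message notes, the total work is about the same as one large delete; the point of the loop is that each transaction is short, so inserts can interleave between batches.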
Re: [Nagios-users] Event Handlers
Marc Powell wrote:
> On Feb 3, 2010, at 8:16 AM, Jeff wrote:
>
>> I have a service that needs to be monitored every minute. I need some help
>> understanding how services go from a soft to a hard state
>
> When a service check results in a non-OK state, the service goes from a Soft
> to a Hard state when it reaches max_check_attempts.
> http://nagios.sourceforge.net/docs/3_0/statetypes.html
>
>> and if an event handler can be run after a service has gone into a hard
>> state.
>
> Only for its initial Hard problem state or initial Hard recovery state.
> http://nagios.sourceforge.net/docs/3_0/eventhandlers.html
>
>> I'm sure everyone has a very dynamic and custom environment to some extent.
>> I have event handlers that will not run if a lock file is present (because I
>> am deploying code, or so that other scripts do not step on each other). So for
>> this service that I monitor every minute, I have Max Retries set to 3, Check
>> Interval is 1, and retry interval is 1. Can someone help shed some light on
>> how I can get an event handler to run again after a service has gone into a
>> hard state?
>
> You can't really... The only real facility nagios has to do this (that I can
> think of right now) is is_volatile
> (http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#service) but
> that's probably overkill for your needs; particularly the notification
> implications.

The other possibility for having something run every time the service is checked is to configure your ocsp_command. Not exactly what it's generally used for, but it'll do in a pinch.

--
Mike Lindsey
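A minimal sketch of the kind of lock-aware event handler described in this thread may help make the soft/hard distinction concrete. The function name, the argument order, and the lock file path are all assumptions for illustration; the script expects a command definition that passes $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$.

```shell
#!/bin/sh
# Hypothetical event handler sketch. Assumed command_line:
#   handler.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$

# Assumed lock file path; a deployment would create it to pause handlers.
LOCKFILE="${LOCKFILE:-/tmp/deploy.lock}"

# Prints the action the handler would take for one invocation.
handler_action() {
    state="$1"; state_type="$2"; attempt="$3"

    # A lock file suppresses the handler entirely.
    if [ -e "$LOCKFILE" ]; then
        echo "skip: lock file present"
        return 0
    fi

    case "$state" in
        OK)
            echo "nothing to do: service is OK"
            ;;
        *)
            if [ "$state_type" = "HARD" ]; then
                echo "restart: ${state} reached HARD on attempt ${attempt}"
            else
                echo "wait: ${state} is still SOFT (attempt ${attempt})"
            fi
            ;;
    esac
}

# Only act when Nagios supplied the three arguments.
if [ "$#" -ge 3 ]; then
    handler_action "$1" "$2" "$3"
fi
```

With max_check_attempts set to 3, a handler like this would report "wait" on attempts 1 and 2 and "restart" on attempt 3. If the lock file happens to exist at that moment, the handler is skipped and, as noted above, Nagios does not call it again for the same hard state.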
Re: [Nagios-users] Asterisk Questions
> -----Original Message-----
> From: Wolfe, Robert [mailto:robert.wo...@robertwolfe.org]
> Sent: Thursday, 4 February 2010 01:52
>
> Hi all!
>
> I am wanting to write a plugin that will tell Asterisk to make a test
> call every so often and then report back to Nagios if a call was
> successful or not. The caveat here is that Asterisk is on one server
> and Nagios is on its own dedicated server.
>
> If anyone knows of any script or plugin that can do this, I would
> greatly appreciate any pointers in the right direction.

Use NRPE (active) or NSCA (passive) for the check result submission. Write your test routine and house it on your remote server.

For NRPE, use 'check_nrpe -H <host> -c <command>'. Have the NRPE daemon listen on your asterisk box with the '<command>' defined in the 'nrpe.cfg' file.

For NSCA, the NSCA daemon listens on your Nagios server, and a cron job runs your check and submits the results.

See http://nagios.sourceforge.net/docs/3_0/addons.html for a better description and more documentation on the check submission methods.

Stuart
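For the passive (NSCA) route, the cron job has to hand send_nsca one tab-separated line per result: host name, service description, return code, plugin output. A rough sketch of that step follows. The function name, the host and service names, and the config path are placeholders, and the test call itself is left out because it depends entirely on the Asterisk setup.

```shell
#!/bin/sh
# Sketch of the passive (NSCA) path. All names below are placeholders.

# Build one passive check result line in the tab-separated form
# that send_nsca reads on standard input.
# Arguments: host, service description, "ok" or anything else, detail text.
passive_result() {
    host="$1"; service="$2"; outcome="$3"; detail="$4"
    if [ "$outcome" = "ok" ]; then
        code=0; label="OK"
    else
        code=2; label="CRITICAL"
    fi
    printf '%s\t%s\t%s\t%s\n' "$host" "$service" "$code" "${label}: ${detail}"
}

# From cron on the Asterisk box, the line would be piped to send_nsca, e.g.:
#   passive_result asterisk1 "Test Call" ok "call completed" \
#       | send_nsca -H nagios.example.org -c /etc/send_nsca.cfg
```

The host and service description must match a service defined on the Nagios server with passive checks enabled, otherwise the result is discarded.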
[Nagios-users] Customizing notifications
I have a request to "plain English"-ify my notifications. One item I have been asked for is when the service state changes, to report the duration of the previous service state.

Example: HTTP is now OK after 00:02:35 of down time.

Is there an easy way to do this? It seems Nagios doesn't offer a Last State Duration macro, so I am assuming this is going to be a matter of some sort of custom scripting. Has anyone had experience with this sort of thing?

Thanks!

Chip Burke
Re: [Nagios-users] Customizing notifications
Chip Burke wrote:
> I have a request to “plain English”-ify my notifications. One item I
> have been asked for is when the service state changes, to report the
> duration of the previous service state.
>
> Example: HTTP is now OK after 00:02:35 of down time.
>
> Is there an easy way to do this? It seems Nagios doesn’t offer a Last
> State Duration macro, so I am assuming this is going to be a matter of
> some sort of custom scripting. Has anyone had experience with this sort
> of thing?

Likely, your best option will be to set up an event handler script for that service. If you already have event handlers configured, and you want this logic to run everywhere, consider setting up a script like this as your global event handler.

In the event handler, you will want to touch a file in /tmp based on the host, service, and state, whenever there's a hard state change. For example, /tmp/localhost-load-ok. You could even simplify if all you care about is ok/not ok.

Then in your notification script, just check for the presence of those files, and do your date calculation by pulling the modification date out with stat (or script code, if your notification command isn't a chunk of bash).

Something like:

    now=`date +%s`
    # The file for a state is touched when the service enters that state,
    # so the previous state's duration is measured from its own file.
    if [ "${NAGIOS_LASTSERVICESTATE}" = "OK" ]
    then
        # BSD stat syntax; with GNU coreutils use: stat -c %Y
        filetime=`stat -f "%m" /tmp/localhost-load-ok`
    else
        filetime=`stat -f "%m" /tmp/localhost-load-notok`
    fi
    time=`echo ${now} - ${filetime} | bc`
    echo "${NAGIOS_SERVICEDESC} is now ${NAGIOS_SERVICESTATE} after ${time} seconds."

You might want to flesh it out with some file-exists tests as well.

Good luck!

--
Mike Lindsey
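The event-handler half of this suggestion (touching a marker file on each hard state change) could look roughly like the sketch below. The function name, the MARKER_DIR variable, and the way spaces in the service name are handled are assumptions for illustration, not part of the original suggestion.

```shell
#!/bin/sh
# Sketch of the event-handler half: record when a HARD state began.
# MARKER_DIR and the file naming are assumptions for illustration.
MARKER_DIR="${MARKER_DIR:-/tmp}"

# Arguments: host, service, state (OK/WARNING/...), state type (SOFT/HARD).
# Prints the marker path when a file was touched, nothing otherwise.
mark_state() {
    host="$1"; service="$2"; state="$3"; state_type="$4"

    # Ignore soft states; only hard changes are recorded.
    [ "$state_type" = "HARD" ] || return 0

    # Collapse everything that is not OK into "notok".
    if [ "$state" = "OK" ]; then suffix="ok"; else suffix="notok"; fi

    # Replace spaces in the service name so the path stays simple.
    name=$(printf '%s' "$service" | tr ' ' '_')
    marker="${MARKER_DIR}/${host}-${name}-${suffix}"
    touch "$marker"
    echo "$marker"
}
```

A command definition would call it with something like `mark_state "$HOSTNAME$" "$SERVICEDESC$" "$SERVICESTATE$" "$SERVICESTATETYPE$"`, with the macros expanded by Nagios before the script runs.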
Re: [Nagios-users] Customizing notifications
On Feb 3, 2010, at 4:51 PM, Chip Burke wrote:
> I have a request to “plain English”-ify my notifications. One item I have
> been asked for is when the service state changes, to report the duration of
> the previous service state.
>
> Example: HTTP is now OK after 00:02:35 of down time.
>
> Is there an easy way to do this? It seems Nagios doesn’t offer a Last State
> Duration macro, so I am assuming this is going to be a matter of some sort of
> custom scripting. Has anyone had experience with this sort of thing?

$LASTSERVICEOK$ has potential, depending on when it's updated.

"This is a timestamp in time_t format (seconds since the UNIX epoch) indicating the time at which the service was last detected as being in an OK state."

so

time_t(now) - $LASTSERVICEOK$ = number of seconds in non-OK state

I am _assuming_ that the macro is not updated until after the recovery notification is sent.

--
Marc
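The subtraction above, plus the HH:MM:SS formatting from the original request, can be sketched in a few lines of shell. The function names are made up for illustration, and the result is only meaningful if the assumption about when $LASTSERVICEOK$ is updated holds.

```shell
#!/bin/sh
# Format a number of seconds as HH:MM:SS.
format_duration() {
    secs="$1"
    printf '%02d:%02d:%02d\n' \
        $(( $secs / 3600 )) $(( $secs % 3600 / 60 )) $(( $secs % 60 ))
}

# Seconds between a time_t timestamp (e.g. $LASTSERVICEOK$) and "now".
# The optional second argument overrides "now", which helps with testing.
seconds_since() {
    last="$1"
    now="${2:-$(date +%s)}"
    echo $(( $now - $last ))
}
```

For example, `format_duration "$(seconds_since 1265194199 1265194354)"` prints 00:02:35, which is the form asked for in the original question.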
Re: [Nagios-users] Nagios 3.0.5 problem
Rick Mangus wrote:
> Well, I have more information to add.
>
> I found a script that was being launched at midnight to purge old data
> from the database. The tables being pruned are used by perfparse to
> store perfdata and the like. They have > 180M rows, are 30-60GB, and
> are actively being inserted into all the while. As I understand it,
> they are InnoDB and should be using row (not table) locks, and really
> should not have much trouble with concurrent inserts. While this goes
> on, one CPU/core is largely in iowait, but the other 7 are largely
> idle, and we generally don't have any trouble with RAM or other
> resource exhaustion.

Are your check results going to the same disk partition where all the I/O is happening? If Nagios is stuck waiting for disk, moving them somewhere else may just fix your problem.
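One quick way to answer that question is to compare the filesystem reported by df for the two directories. The sketch below assumes a POSIX df; the function names and the example paths are placeholders, not paths taken from this thread.

```shell
#!/bin/sh
# Print the filesystem (device) that holds a path, using POSIX df output.
fs_of() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Succeeds (exit 0) when both paths report the same filesystem.
# Caveat: pseudo-devices such as "tmpfs" can share a name across
# separate mounts, so a match there is not conclusive.
same_filesystem() {
    [ "$(fs_of "$1")" = "$(fs_of "$2")" ]
}

# Example with placeholder paths:
#   if same_filesystem /var/lib/mysql /usr/local/nagios/var/spool/checkresults; then
#       echo "check results share a filesystem with the database"
#   fi
```

If the two do share a filesystem, the check result directory can be moved by changing check_result_path in nagios.cfg.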